McGraw-Hill Ryerson


                      Mathematics of
                      Data Management




This book was distributed by Jack Truong for use at William Lyon Mackenzie Collegiate Institute.
CHAPTER 1
                  Tools for Data Management




                  Specific Expectations                                                           Section

                  Locate data to answer questions of significance or personal interest, by           1.3
                  searching well-organized databases.

                  Use the Internet effectively as a source for databases.                           1.3

                  Create database or spreadsheet templates that facilitate the manipulation     1.2, 1.3, 1.4
                  and retrieval of data from large bodies of information that have a variety
                  of characteristics.

                  Represent simple iterative processes, using diagrams that involve                 1.1
                  branches and loops.

                  Represent complex tasks or issues, using diagrams.                              1.1, 1.5

                  Solve network problems, using introductory graph theory.                          1.5

                  Represent numerical data, using matrices, and demonstrate an                    1.6, 1.7
                  understanding of terminology and notation related to matrices.

                  Demonstrate proficiency in matrix operations, including addition, scalar         1.6, 1.7
                  multiplication, matrix multiplication, the calculation of row sums, and the
                  calculation of column sums, as necessary to solve problems, with and
                  without the aid of technology.

                  Solve problems drawn from a variety of applications, using matrix               1.6, 1.7
                  methods.
Chapter Problem
VIA Rail Routes
When travelling by bus, train, or airplane, you usually want to reach your
destination without any stops or transfers. However, it is not always possible
to reach your destination by a non-stop route. The following map shows the
VIA Rail routes for eight major cities. The arrows represent routes on which
you do not have to change trains.

[Map: VIA Rail routes linking Sudbury, Ottawa, Montréal, Kingston, Toronto,
London, Niagara Falls, and Windsor]

 1. a) List several routes you have travelled where you were able to reach
       your destination directly.
    b) List a route where you had to change vehicles exactly once before
       reaching your destination.
 2. a) List all the possible routes from Montréal to Toronto by VIA Rail.
    b) Which route would you take to get from Montréal to Toronto in the
       least amount of time? Explain your reasoning.
 3. a) List all the possible routes from Kingston to London.
    b) Give a possible reason why VIA Rail chooses not to have a direct train
       from Kingston to London.

This chapter introduces graph theory, matrices, and technology that you can
use to model networks like the one shown. You will learn techniques for
determining the number of direct and indirect routes from one city to another.
The chapter also discusses useful data-management tools including iterative
processes, databases, software, and simulations.
Review of Prerequisite Skills

If you need help with any of the skills listed in purple below, refer to Appendix A.

    1. Order of operations Evaluate each expression.
       a) (−4)(5) + (2)(−3)
       b) (−2)(3) + (5)(−3) + (8)(7)
       c) (1)(0) + (1)(1) + (0)(0) + (0)(1)
       d) (2)(4) + 12/3 − (3)^2

    2. Substituting into equations Given f(x) = 3x^2 − 5x + 2 and
       g(x) = 2x − 1, evaluate each expression.
       a) f(2)
       b) g(2)
       c) f(g(−1))
       d) f(g(1))
       e) f(f(2))
       f) g(f(2))

    3. Solving equations Solve for x.
       a) 2x − 3 = 7
       b) 5x + 2 = −8
       c) x/2 − 5 = 5
       d) 4x − 3 = 2x − 1
       e) x^2 = 25
       f) x^3 = 125
       g) 3(x + 1) = 2(x − 1)
       h) (2x − 5)/2 = (3x − 1)/4

    4. Graphing data In a sample of 1000 Canadians, 46% have type O blood,
       43% have type A, 8% have type B, and 3% have type AB. Represent these
       data with a fully-labelled circle graph.

    5. Graphing data Organize the following set of data using a fully-labelled
       double-bar graph.

       City             Snowfall (cm)    Total Precipitation (cm)
       St. John’s           322.1                148.2
       Charlottetown        338.7                120.1
       Halifax              261.4                147.4
       Fredericton          294.5                113.1
       Québec City          337.0                120.8
       Montréal             214.2                 94.0
       Ottawa               221.5                 91.1
       Toronto              135.0                 81.9
       Winnipeg             114.8                 50.4
       Regina               107.4                 36.4
       Edmonton             129.6                 46.1
       Calgary              135.4                 39.9
       Vancouver             54.9                116.7
       Victoria              46.9                 85.8
       Whitehorse           145.2                 26.9
       Yellowknife          143.9                 26.7

    6. Graphing data The following table lists the average annual full-time
       earnings for males and females. Illustrate these data using a
       fully-labelled double-line graph.

       Year       Women ($)       Men ($)
       1989        28 219         42 767
       1990        29 050         42 913
       1991        29 654         42 575
       1992        30 903         42 984
       1993        30 466         42 161
       1994        30 274         43 362
       1995        30 959         42 338
       1996        30 606         41 897
       1997        30 484         43 804
       1998        32 553         45 070




4         MHR • Tools for Data Management
    7. Using spreadsheets Refer to the spreadsheet section of Appendix B,
       if necessary.
       a) Describe how to refer to a specific cell.
       b) Describe how to refer to a range of cells in the same row.
       c) Describe how to copy data into another cell.
       d) Describe how to move data from one column to another.
       e) Describe how to expand the width of a column.
       f) Describe how to add another column.
       g) What symbol must precede a mathematical expression?

    8. Similar triangles Determine which of the following triangles are
       similar. Explain your reasoning.

       [Figure: triangles ABC, DEF, and GHJ with side-length labels
       3, 2, 4 (ABC); 7, 4, 6 (DEF); and 12, 6, 9 (GHJ)]

    9. Number patterns Describe each of the following patterns. Show the next
       three terms.
       a) 65, 62, 59, …
       b) 100, 50, 25, …
       c) 1, −1/2, 1/4, −1/8, …
       d) a, b, aa, bb, aaa, bbbb, aaaa, bbbbbbbb, …

   10. Ratios of areas Draw two squares on a sheet of grid paper, making the
       dimensions of the second square half those of the first.
       a) Use algebra to calculate the ratio of the areas of the two squares.
       b) Confirm this ratio by counting the number of grid units contained
          in each square.
       c) If you have access to The Geometer’s Sketchpad or similar software,
          confirm the area ratio by drawing a square, dilating it by a factor
          of 0.5, and measuring the areas of the two squares. Refer to the
          help menu in the software, if necessary.

   11. Simplifying expressions Expand and simplify each expression.
       a) (x − 1)^2
       b) (2x + 1)(x − 4)
       c) −5x(x − 2y)
       d) 3x(x − y)^2
       e) (x − y)(3x)^2
       f) (a + b)(c − d)

   12. Fractions, percents, decimals Express as a decimal.
       a) 5/20          b) 23/50         c) 2/3
       d) 138/12        e) 6/7           f) 73%

   13. Fractions, percents, decimals Express as a percent.
       a) 0.46          b) 4/5           c) 1/30
       d) 2.25          e) 11/8




                                                                   Review of Prerequisite Skills • MHR   5
1.1         The Iterative Process

  If you look carefully at the branches of a tree,
  you can see the same pattern repeated over
  and over, but getting smaller toward the end
  of each branch. A nautilus shell repeats the
  same shape on a larger and larger scale from
  its tip to its opening. You yourself repeat
  many activities each day. These three
  examples all involve an iterative process.
  Iteration is a process of repeating the same
  procedure over and over. The following
  activities demonstrate this process.




      INVESTIGATE & INQUIRE: Developing a Sort Algorithm

      Often you need to sort data using one or more criteria, such as alphabetical
      or numerical order. Work with a partner to develop an algorithm to sort the
      members of your class in order of their birthdays. An algorithm is a
      procedure or set of rules for solving a problem.

       1. Select two people and compare their birthdays.
       2. Rank the person with the later birthday second.
       3. Now, compare the next person’s birthday with the last ranked birthday.
          Rank the later birthday of those two last.
       4. Describe the continuing process you will use to find the classmate with
          the latest birthday.
       5. Describe the process you would use to find the person with the second
          latest birthday. With whom do you stop comparing?
       6. Describe a process to rank all the remaining members of your class
          by their birthdays.
       7. Illustrate your process with a diagram.


  The process you described is an iterative process because it involves repeating
  the same set of steps throughout the algorithm. Computers can easily be
  programmed to sort data using this process.
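
  As a rough illustration of how a computer might carry out this process, the
  Python sketch below repeatedly finds the latest remaining birthday and ranks
  it last. The function name sort_by_birthday, the sample names, and the
  (month, day) representation of a birthday are all assumptions made for this
  example; your own algorithm may differ in its details.

```python
def sort_by_birthday(people):
    """Return a new list of (name, (month, day)) pairs, earliest birthday first."""
    ranked = list(people)  # work on a copy so the original list is unchanged
    # Each pass finds the latest birthday among positions 0..end and ranks it last.
    for end in range(len(ranked) - 1, 0, -1):
        latest = 0
        for i in range(1, end + 1):
            # Compare the next person with the latest birthday found so far.
            if ranked[i][1] > ranked[latest][1]:
                latest = i
        # Place the latest birthday at the end of the unranked group.
        ranked[latest], ranked[end] = ranked[end], ranked[latest]
    return ranked


# Hypothetical classmates, with birthdays written as (month, day).
classmates = [("Ana", (7, 14)), ("Ben", (2, 3)), ("Chloe", (11, 30)), ("Dev", (7, 2))]
ordered = sort_by_birthday(classmates)
print([name for name, _ in ordered])
```

  Each pass stops one position earlier than the previous pass, because the
  people already ranked at the end do not need to be compared again.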


  6      MHR • Tools for Data Management
INVESTIGATE & INQUIRE: The Sierpinski Triangle

Method 1: Pencil and Paper
 1. Using isometric dot paper, draw a large equilateral triangle with side
    lengths of 32 units.
 2. Divide this equilateral triangle into four smaller equilateral triangles.
 3. Shade the middle triangle. What fraction of the original triangle is shaded?
 4. For each of the unshaded triangles, repeat this process. What fraction of
    the original triangle is shaded?
 5. For each of the unshaded triangles, repeat this process again. What fraction
    of the original triangle is shaded now?
 6. Predict the fraction of the original triangle that would be shaded for the
    fourth and fifth steps in this iterative process.
 7. Predict the fraction of the original triangle that would be shaded if this
    iterative process continued indefinitely.
Method 2: The Geometer’s Sketchpad®
 1. Open a new sketch and a new script.
 2. Position both windows side by side.
 3. Click on REC in the script window.
 4. In the sketch window, construct a triangle. Shift-click on each side of the
    triangle. Under the Construct menu, choose Point at Midpoint and then
    Polygon Interior of the midpoints.
 5. Shift-click on one vertex and the two adjacent midpoints. Choose Loop
    in your script.
 6. Repeat step 5 for the other two vertices.
 7. Shift-click on the three midpoints. From the Display menu,
    choose Hide Midpoints.
 8. Stop your script.
 9. Open a new sketch. Construct a new triangle. Mark the
    three vertices. Play your script at a recursion depth of at
    least 3. You may increase the speed by clicking on Fast.
10. a) What fraction of the original triangle is shaded
        i) after one recursion?
        ii) after two recursions?
        iii) after three recursions?
     b) Predict what fraction would be shaded after four and five recursions.
     c) Predict the fraction of the original triangle that would be shaded if
        this iterative (recursion) process continued indefinitely.
11. Experiment with recursion scripts to design patterns with repeating shapes.
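
One way to check your predictions is to let a short program repeat the shading
step. The Python sketch below assumes that, at every step, each unshaded
triangle is divided into four equal triangles and the middle one is shaded. The
function name shaded_fractions is made up for this illustration.

```python
from fractions import Fraction


def shaded_fractions(steps):
    """Return the shaded fraction of the original triangle after each step."""
    shaded = Fraction(0)
    unshaded = Fraction(1)
    results = []
    for _ in range(steps):
        # One quarter of the unshaded area (the middle triangles) becomes shaded.
        shaded += unshaded / 4
        # Three quarters of the unshaded area remains unshaded.
        unshaded = unshaded * 3 / 4
        results.append(shaded)
    return results


for step, fraction in enumerate(shaded_fractions(5), start=1):
    print(step, fraction)
```

Using exact fractions rather than decimals makes the pattern in the numerators
and denominators easier to see.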



                                                                                  1.1 The Iterative Process • MHR   7
The Sierpinski triangle is named after the Polish mathematician Waclaw
Sierpinski (1882−1969). It is an example of a fractal, a geometric figure that
is generally created using an iterative process. A key feature of fractals is
that they are made of self-similar shapes. As the shapes become smaller and
smaller, they keep the same geometrical characteristics as the original larger
shape. Fractal geometry is a very rich area of study. Fractals can be used to
model plants, trees, economies, or the honeycomb pattern in human bones.

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links to learn more about the
Sierpinski triangle and fractals. Choose an interesting fractal and describe
how it is self-similar.



Example 1 Modelling With a Fractal

Fractals can model the branching of a tree. Describe
the algorithm used to model the tree shown.

Solution
Begin with a 1-unit segment. Branch off at 60° with two
segments, each one half the length of the previous branch.
Repeat this process for a total of three iterations.
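
The algorithm in this solution can be sketched in Python. The sketch rests on
two assumptions that the solution does not spell out: each new branch turns 60°
to the left or right of the branch it grows from, and "three iterations" means
three rounds of branching after the initial segment. The names fractal_tree and
turn_deg are chosen only for this illustration.

```python
import math


def fractal_tree(iterations=3, turn_deg=60.0):
    """Return the tree as a list of segments ((x1, y1), (x2, y2))."""
    segments = []
    # Each growing tip is (start point, heading in degrees, segment length).
    tips = [((0.0, 0.0), 90.0, 1.0)]  # initial segment: 1 unit, pointing up
    for _ in range(iterations + 1):   # initial segment plus the branching rounds
        next_tips = []
        for (x, y), heading, length in tips:
            end = (x + length * math.cos(math.radians(heading)),
                   y + length * math.sin(math.radians(heading)))
            segments.append(((x, y), end))
            # Two new branches, each half as long, turned left and right.
            for turn in (turn_deg, -turn_deg):
                next_tips.append((end, heading + turn, length / 2))
        tips = next_tips
    return segments


tree = fractal_tree()
total_length = sum(math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in tree)
print(len(tree), "segments, total length", round(total_length, 3))
```

Under these assumptions the tree has 1 + 2 + 4 + 8 = 15 segments, and each
round of branching adds a total length of 1 unit because the number of
branches doubles while their length halves.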

Arrow diagrams can illustrate iterations. Such diagrams show the sequence
of steps in the process.

Example 2 The Water Cycle

Illustrate the water cycle using an arrow diagram.

Solution
The water, or hydrologic, cycle is an iterative process. Although the timing
of the precipitation can vary, the cycle will repeat itself indefinitely.

[Figure: The Water Cycle. Labelled stages: condensation, precipitation,
surface runoff, percolation, water table, groundwater, lake, streams and
rivers, ocean, transpiration, evaporation]




Example 3 Tree Diagram
a)   Illustrate the results of a best-of-five hockey playoff series between
     Ottawa and Toronto using a tree diagram.
b)   How many different outcomes of the series are possible?


Solution
a) For each game, the tree diagram has two branches, one representing a win
   by Ottawa (O) and the other a win by Toronto (T). Each set of branches
   represents a new game in the playoff round. As soon as one team wins three
   games, the playoff round ends, so the branch representing that sequence
   also stops.

   [Figure: tree diagram with columns for the 1st, 2nd, 3rd, 4th, and 5th
   games; each branch splits into O and T until one team has three wins]

b) By counting the endpoints of the branches, you can determine that there
   are 20 possible outcomes for this series.
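
The count in part b) can be checked by letting a program follow every branch
of the tree. The Python sketch below is one possible way to do this; the
function name series_outcomes and the use of strings such as "OTO" to record a
sequence of winners are assumptions made for this illustration.

```python
def series_outcomes(wins_needed=3, teams=("O", "T")):
    """List every sequence of game winners in a series that stops at wins_needed wins."""
    outcomes = []

    def play(history):
        # Branch once for each team that could win the next game.
        for team in teams:
            games = history + team
            if games.count(team) == wins_needed:
                outcomes.append(games)  # series over, so this branch stops
            else:
                play(games)             # series continues with another game

    play("")
    return outcomes


outcomes = series_outcomes()
print(len(outcomes))
print(outcomes[:4])
```

Sorting the outcomes by length shows how many series end in three, four, or
five games.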


Example 4 Recursive Formula

The recursive formula t_n = 3t_{n−1} − t_{n−2} defines a sequence of numbers.
Find the next five terms in the sequence given that the initial or seed
values are t_1 = 1 and t_2 = 3.

Solution

        t_3 = 3t_2 − t_1        t_4 = 3t_3 − t_2         t_5 = 3t_4 − t_3
            = 3(3) − 1              = 3(8) − 3               = 3(21) − 8
            = 8                     = 21                     = 55

        t_6 = 3t_5 − t_4        t_7 = 3t_6 − t_5
            = 3(55) − 21            = 3(144) − 55
            = 144                   = 377

The next five terms are 8, 21, 55, 144, and 377.
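
A recursive formula translates naturally into a loop. The Python sketch below
is one way to generate the terms; the function name recursive_sequence and its
parameters are chosen only for this illustration, and the rule is passed in so
that other two-term recursive formulas can be tried.

```python
def recursive_sequence(seeds, count, rule):
    """Return the first `count` terms, starting from the two seed values."""
    terms = list(seeds)
    while len(terms) < count:
        # rule receives the previous term and the term before that one.
        terms.append(rule(terms[-1], terms[-2]))
    return terms


# t_n = 3t_{n-1} - t_{n-2} with seed values t_1 = 1 and t_2 = 3
terms = recursive_sequence([1, 3], 7, lambda prev, prev2: 3 * prev - prev2)
print(terms)
```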




Key Concepts

     • Iteration occurs in many natural and mathematical processes. Iterative
       processes can create fractals.

     • A process that repeats itself can be illustrated using arrows and loops.

     • A tree diagram can illustrate all the possible outcomes of a repeated process
       involving two or more choices at each step.
     • For recursive functions, the first step is calculated using initial or seed
       values. Each successive term is then calculated using the result of the
       preceding step.

     Communicate Your Understanding

      1. Describe how fractals have been used
         to model the fern leaf shown on the
         right.




      2. Describe your daily routine as an iterative process.



Practise

 A
 1. Which of the following involve an iterative process?
      a) the cycle of a washing machine
      b) your reflections in two mirrors that face each other
      c) the placement of the dials on an automobile dashboard
      d) a chart of sunrise and sunset times
      e) substituting a value for the variable in a quadratic equation
      f) a tessellating pattern, such as paving bricks that fit together
         without gaps

 2. The diagram below illustrates the carbon-oxygen cycle. Draw arrows to show
    the gains and losses of carbon dioxide.

    [Figure: The Carbon-Oxygen Cycle. Labels: world atmospheric carbon dioxide
    supply; industrial activity; combustion; soil, animal, and plant
    respiration; photosynthesis; surface exchange; ocean; land; CO2 in
    seawater; plankton; marine animals; sediments; organic sediments
    (hydrocarbons); peat; coal; petroleum; natural gas; fossil fuels; molten
    rocks; calcium carbonate in rock]




3. Draw a tree diagram representing the playoffs of eight players in a singles
    tennis tournament. The tree diagram should show the winner of each game
    continuing to the next round until a champion is decided.

Apply, Solve, Communicate

B
4. Draw a diagram to represent the food chain.

5. Communication Describe how the tracing of heartbeats on a cardiac monitor
    or electrocardiogram is iterative. Illustrate your description with a
    sketch.

6. In the first investigation, on page 6, you developed a sort algorithm in
    which new data were compared to the lowest ranked birthday until the
    latest birthday was found. Then, the second latest, third latest, and so
    on were found in the same manner.
    a) Write a sort algorithm in which this process is reversed so that the
       highest ranked item is found instead of the lowest.
    b) Write a sort algorithm in which you compare the first two data, then
       the second and third, then the third and fourth, and so on,
       interchanging the order of the data in any pair where the second item
       is ranked higher.

7. Application Sierpinski’s carpet is similar to Sierpinski’s triangle, except
    that it begins with a square. This square is divided into nine smaller
    squares and the middle one is shaded. Use paper and pencil or a drawing
    program to construct Sierpinski’s carpet to at least three stages. Predict
    what fraction of the original square will be shaded after n stages.

8. Application In 1904 the Swedish mathematician Helge von Koch (1870−1924)
    developed a fractal based on an equilateral triangle. Using either paper
    and pencil or a drawing program, such as The Geometer’s Sketchpad, draw a
    large equilateral triangle and trisect each side. Replace each middle
    segment with two segments the same length as the middle segment, forming
    an equilateral triangle with the base removed, as shown below.

    Repeat the process of trisection and replacement on each of the 12 smaller
    segments. If you are using a computer program, continue this process for
    at least two more iterations.
    a) How many segments are there after three iterations?
    b) How many segments are there after four iterations?
    c) What pattern can you use to predict the number of segments after
       n iterations?

9. The first two terms of a sequence are given as t_1 = 2 and t_2 = 4. The
    recursion formula is t_n = (t_{n−1})^2 − 3t_{n−2}. Determine the next four
    terms in the sequence.




10. Each of the following fractal trees has a            a) Select a starting point near the centre of
     different algorithm. Assume that each tree             a sheet of grid paper. Assign the numbers
     begins with a segment 1 unit long.                     1 to 4 to the directions north, south,
     a) Illustrate or describe the algorithm for            east, or west in any order. Now, generate
        each fractal tree.                                  random whole numbers between 1 and 4
                                                            using a die, coin, or graphing calculator.
        i)                      ii)
                                                            Draw successive line segments one unit
                                                            long in the directions corresponding to
                                                            the random numbers until you reach an
                                                            edge of the paper.
                                                         b) How would a random walk be affected if
                                                            it were self-avoiding, that is, not allowed
        iii)                                                to intersect itself? Repeat part a) using
                                                            this extra rule.
                                                         c) Design your own random walk with a
                                                            different set of rules. After completing
                                                            the walk, trade drawings with a classmate
                                                            and see if you can deduce the rules for
                                                            each other’s walk.



                                                              www.mcgrawhill.ca/links/MDM12

                                                        To learn more about chaos theory, visit the above
                                                        web site and follow the links. Describe an aspect
                                                                of chaos theory that interests you.

     b) What is the total length of the branches in each tree?
     c) An interesting shape on a fractal tree is a spiral, which you can trace by tracing
        a branch to its extremity. Are all spirals within a tree self-similar?
     d) Write your own set of rules for a fractal tree. Draw the tree using paper and
        pencil or a drawing program.

11. Inquiry/Problem Solving Related to fractals is the mathematical study of chaos, in which
     no accurate prediction of an outcome can be made. A random walk can illustrate such
     “chaotic” outcomes.

12. Use the given values for $t_1$ to find the successive terms of the following recursive
     formulas. Continue until a pattern appears. Describe the pattern and make a prediction
     for the value of the nth term.
     a) $t_n = 2^{-t_{n-1}}$; $t_1 = 0$
     b) $t_n = \sqrt{t_{n-1}}$; $t_1 = 256$
     c) $t_n = \dfrac{1}{t_{n-1}}$; $t_1 = 2$




ACHIEVEMENT CHECK
Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application

13. a) Given $t_1 = 1$, list the next five terms for the recursion formula
        $t_n = n \times t_{n-1}$.
     b) In this sequence, $t_k$ is a factorial number, often written as $k!$ Show that
        $t_k = k!$
            $= k(k - 1)(k - 2) \cdots (2)(1)$.
     c) Explain what 8! means. Evaluate 8!
     d) Explain why factorial numbers can be considered an iterative process.
     e) Note that
          $(2^5)(5!)$
        $= (2 \times 2 \times 2 \times 2 \times 2)(5 \times 4 \times 3 \times 2 \times 1)$
        $= (2 \times 5)(2 \times 4)(2 \times 3)(2 \times 2)(2 \times 1)$
        $= 10 \times 8 \times 6 \times 4 \times 2$
        which is the product of the first five even positive integers. Write a formula
        for the product of the first n even positive integers. Explain why your
        formula is correct.
     f) Write $\dfrac{10!}{(2^5)(5!)}$ as a product of consecutive odd integers.
     g) Write a factorial formula for the product of
        i) the first six odd positive integers
        ii) the first ten odd positive integers
        iii) the first n odd positive integers

 C
14. Inquiry/Problem Solving Recycling can be considered an iterative process. Research
     the recycling process for a material such as newspaper, aluminum, or glass and
     illustrate the process with an arrow diagram.

15. Inquiry/Problem Solving The infinite series
     $S = \cos\theta + \cos^2\theta + \cos^3\theta + \cdots$ can be illustrated by drawing
     a circle centred at the origin, with radius of 1. Draw an angle θ and, on the x-axis,
     label the point (cos θ, 0) as $P_1$. Draw a new circle, centred at $P_1$, with radius
     of cos θ. Continue this iterative process. Predict the length of the line segment
     defined by the infinite series $S = \cos\theta + \cos^2\theta + \cos^3\theta + \cdots$.

16. Communication Music can be written using fractal patterns. Look up this type of music
     in a library or on the Internet. What characteristics does fractal music have?

17. Computers use binary (base 2) code to represent numbers as a series of ones and zeros.

        Base 10    Binary
           0          0
           1          1
           2         10
           3         11
           4        100
           ⋮          ⋮

     a) Describe an algorithm for converting integers from base 10 to binary.
     b) Write each of the following numbers in binary.
        i) 16          ii) 21
        iii) 37        iv) 130
     c) Convert the following binary numbers to base 10.
        i) 1010        ii) 100000
        iii) 111010    iv) 111111111




1.2          Data Management Software


       INVESTIGATE & INQUIRE: Software Tools

       1. List every computer program you can think of that can be used to
          manage data.
       2. Sort the programs into categories, such as word-processors and spreadsheets.
       3. Indicate the types of data each category of software would be best suited
          to handle.
       4. List the advantages and disadvantages of each category of software.
       5. Decide which of the programs on your list would be best for storing and
          accessing the lists you have just made.


  Most office and business software manages data of some kind. Schedulers and
  organizers manage lists of appointments and contacts. E-mail programs allow
  you to store, access, and sort your messages. Word-processors help you manage
  your documents and often have sort and outline functions for organizing data
  within a document. Although designed primarily for managing financial
  information, spreadsheets can perform calculations related to the management
  and analysis of a wide variety of data. Most of these programs can easily
  transfer data to other applications.

  Database programs, such as Microsoft® Access and Corel®
  Paradox®, are powerful tools for handling large numbers of
  records. These programs produce relational databases,
  ones in which different sets of records can be linked and
  sorted in complex ways based on the data contained in the
  records. For example, many organizations use a relational
  database to generate a monthly mailing of reminder letters
  to people whose memberships are about to expire. However,
  these complex relational database programs are difficult to
  learn and can be frustrating to use until you are thoroughly
  familiar with how they work. Partly for this reason, there
  are thousands of simpler database programs designed for
  specific types of data, such as book indexes or family trees.

  Of particular interest for this course are programs that can
  do statistical analysis of data. Such programs range from
  modest but useful freeware to major data-analysis packages
  costing thousands of dollars. The more commonly used
  programs include MINITAB™, SAS, and SST (Statistical Software Tools). To demonstrate
  statistical software, some examples in this book have alternative solutions that use
  Fathom™, a statistical software package specifically designed for use in schools.

Data management programs can perform complex calculations and link, search, sort, and
graph data. The examples in this section use a spreadsheet to illustrate these operations.
A spreadsheet is software that arranges data in rows and columns. For basic spreadsheet
instructions, please refer to the spreadsheet section of Appendix B. If you are not already
familiar with spreadsheets, you may find it helpful to try each of the examples yourself
before answering the Practise questions at the end of the section. The two most
commonly used spreadsheets are Corel Quattro® Pro and Microsoft Excel.

Formulas and Functions
A formula entered in a spreadsheet cell can perform calculations based on values or
formulas contained in other cells. Formulas retrieve data from other cells by using
cell references to indicate the rows and columns where the data are located. In the
formulas C2*0.05 and D5+E5, each reference is to an individual cell. In both
Microsoft® Excel and Corel® Quattro® Pro, it is good practice to begin a formula
with an equals sign. Although not always necessary, the equals sign ensures that a
formula is calculated rather than being interpreted as text.

Built-in formulas are called functions. Many functions, such as the SUM function or MAX
function, use references to a range of cells. In Corel® Quattro® Pro, precede a function
with an @ symbol. For example, to find the total of the values in cells A2 through A6, you
would enter
Corel® Quattro® Pro: @SUM(A2..A6)                Microsoft® Excel: =SUM(A2:A6)
Similarly, to find the total for a block of cells from A2 through B6, enter
Corel® Quattro® Pro: @SUM(A2..B6)                Microsoft® Excel: =SUM(A2:B6)

A list of functions is available in the Insert menu by selecting Function…. You may select
from a list of functions in categories such as Financial, Math & Trig, and Database.

Example 1 Using Formulas and Functions

The first three columns of the spreadsheet on
the right list a student’s marks on tests and
assignments for the first half of a course.
Determine the percent mark for each test or
assignment and calculate an overall midterm mark.




Solution

In column D, enter formulas with cell referencing to find the percent for
each individual mark. For example, in cell D2, you could use the formula
=B2/C2*100.

Use the SUM function to find totals for columns B and C, and then
convert to percent in cell D12 to find the midterm mark.
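
The same two calculations can be sketched outside a spreadsheet. The Python sketch below is only an illustration: the marks are invented (the values in the spreadsheet figure are not reproduced here), and the names `marks`, `percents`, and `midterm` are assumptions made for this example.

```python
# Hypothetical marks: (item, mark earned, mark possible).
# These values are invented for illustration only.
marks = [
    ("Test 1", 28, 40),
    ("Assignment 1", 18, 20),
    ("Test 2", 33, 50),
]

# Column D: percent for each individual item, like B2/C2*100 filled down.
percents = [(item, earned / possible * 100) for item, earned, possible in marks]

# Totals for columns B and C (the SUM function), then convert to a percent.
total_earned = sum(earned for _, earned, _ in marks)
total_possible = sum(possible for _, _, possible in marks)
midterm = total_earned / total_possible * 100

for item, pct in percents:
    print(f"{item}: {pct:.1f}%")
print(f"Midterm mark: {midterm:.1f}%")
```

Because the midterm mark is computed from the column totals, each item is weighted by the marks it is out of. With these sample values the result (about 71.8%) differs from the plain average of the three percents (about 75.3%).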




Relative and Absolute Cell References
Spreadsheets automatically adjust cell references whenever cells are copied,
moved, or sorted. For example, if you copy a SUM function, used to calculate the
sum of cells A3 to E3, from cell F3 to cell F4, the spreadsheet will change the cell
references in the copy to A4 and E4. Thus, the value in cell F4 will be the sum of
those in cells A4 to E4, rather than being the same as the value in F3.

Because the cell references are relative to a location, this automatic adjustment
is known as relative cell referencing. If the formula references need to be kept
exactly as written, use absolute cell referencing. Enter dollar signs before the
row and column references to block automatic adjustment of the references.
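
To see what the automatic adjustment does, the following Python sketch imitates how a single cell reference changes when a formula is copied. It is a simplified model written for this illustration, not the code any spreadsheet actually uses; the function names `col_to_num`, `num_to_col`, and `copy_reference` are hypothetical. A dollar sign before the column letter or the row number blocks the adjustment of that part of the reference.

```python
import re


def col_to_num(col):
    """Convert column letters (A, B, ..., Z, AA, ...) to a 1-based number."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a 1-based column number back to column letters."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def copy_reference(ref, rows_down, cols_right):
    """Return the reference as it would appear after the formula is copied.

    Parts of the reference preceded by '$' are absolute and stay fixed;
    the other parts are relative and shift with the copy.
    """
    match = re.fullmatch(r"(\$?)([A-Z]+)(\$?)(\d+)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col_lock, col, row_lock, row = match.groups()
    if not col_lock:
        col = num_to_col(col_to_num(col) + cols_right)
    if not row_lock:
        row = str(int(row) + rows_down)
    return f"{col_lock}{col}{row_lock}{row}"


# Copying from F3 to F4 moves the formula one row down.
print(copy_reference("A3", 1, 0))    # relative: becomes A4
print(copy_reference("$A$3", 1, 0))  # absolute: stays $A$3
```

In this model, copying the sum of cells A3 to E3 one row down turns the references into A4 and E4, while a reference written as $A$3 is unchanged wherever the formula is copied.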

Fill and Series Features
When a formula or function is to be copied to several adjoining cells, as for the
percent calculations in Example 1, you can use the Fill feature instead of Copy.
Click once on the cell to be copied, then click and drag across or down through
the range of cells into which the formula is to be copied.

To create a sequence of numbers, enter the first two values in adjoining cells,
then select Edit/Fill/Series to continue the sequence.


Example 2 Using the Fill Feature

The relationship between Celsius and Fahrenheit temperatures is given by
the formula Fahrenheit = 1.8 × Celsius + 32. Use a spreadsheet to produce
a conversion table for temperatures from 1°C to 15°C.

Solution
Enter 1 into cell E2 and 2 into cell E3.
Use the Fill feature to put the numbers 3
through 15 into cells E4 to E16. Enter the
conversion formula =E2*1.8+32 into cell
F2. Then, use the Fill feature to copy the
formula into cells F3 through F16. Note
that the values in these cells show that the
cell references in the formulas did change
when copied. These changes are an
example of relative cell referencing.
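
For comparison, the table can be sketched in a few lines of Python. This is only an illustration of the idea of applying one formula down a column; the names `celsius` and `table` are assumptions made for this example.

```python
# Column E: the Celsius values 1 to 15 (the series built with the Fill feature).
celsius = list(range(1, 16))

# Column F: the same formula applied to each row, like E2*1.8+32 filled down.
table = [(c, c * 1.8 + 32) for c in celsius]

for c, f in table:
    print(f"{c:>2} °C = {f:.1f} °F")
```

Each row uses its own Celsius value in the formula, which is what relative cell referencing achieves when the formula in F2 is filled down.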




Charting
Another important feature of spreadsheets is the ability to display numerical
data in the form of charts or graphs, thereby making the data easier to
understand. The first step is to select the range of cells to be graphed. For
non-adjoining fields, hold down the Ctrl key while highlighting the cells.
Then, use the Chart feature to specify how you want the chart to appear.

You can produce legends and a title for your graph as well as labels for the
axes. Various two- and three-dimensional versions of bar, line, and circle
graphs are available in the menus.

Example 3 Charting

The results and standings of a hockey
league are listed in this spreadsheet.
Produce a two-dimensional bar chart
using the TEAM and POINTS columns.




Solution

Holding down the Ctrl key, highlight
cells A1 to A7 and then G1 to G7.
Use the Chart feature and follow the
on-screen instructions to customize
your graph. You will see a version of
the bar graph as shown here.




Sorting
Spreadsheets have the capability to sort data alphabetically, numerically, by date,
and so on. The sort can use multiple criteria in sequence. Cell references will
adjust to the new locations of the sorted data. To sort, select the range of cells
to be sorted. Then, use the Sort feature.

Select the criteria under which the data are to be sorted. A sort may be made in
ascending or descending order based on the data in any given column. A sort
with multiple criteria can include a primary sort, a secondary sort within it, and
a tertiary sort within the secondary sort.


Example 4 Sorting

Rank the hockey teams in Example 3, counting points first (in descending
order), then wins (in descending order), and finally losses (in ascending order).

Solution

When you select the Sort feature, the pop-up window asks if there is a header
row. Confirming that there is a header row excludes the column headings from
the sort so that they are left in place. Next, set up a three-stage sort:
• a primary sort in descending order, using the points column
• then, a secondary sort in descending order, using the wins column
• finally, a tertiary sort in ascending order, using the losses column
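
The three-stage sort can also be sketched in Python. The standings below are invented for illustration (the league table in the figure is not reproduced here), and the names `teams` and `ranked` are assumptions. The sort key lists the criteria in order of priority; negating a value gives descending order for that criterion.

```python
# Hypothetical standings, invented for illustration only.
teams = [
    {"team": "Hawks",  "wins": 8, "losses": 3, "points": 17},
    {"team": "Lions",  "wins": 7, "losses": 5, "points": 16},
    {"team": "Bears",  "wins": 8, "losses": 4, "points": 17},
    {"team": "Wolves", "wins": 7, "losses": 4, "points": 16},
    {"team": "Owls",   "wins": 6, "losses": 2, "points": 16},
    {"team": "Eagles", "wins": 5, "losses": 7, "points": 12},
]

# Primary: points (descending); secondary: wins (descending);
# tertiary: losses (ascending).
ranked = sorted(teams, key=lambda t: (-t["points"], -t["wins"], t["losses"]))

for rank, t in enumerate(ranked, start=1):
    print(rank, t["team"], t["points"], t["wins"], t["losses"])
```

With these sample values, Hawks and Bears are tied on points and wins, so the tertiary sort on losses places Hawks first. Owls have the fewest losses but rank below Lions because the losses column is consulted only when points and wins are both tied.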

Searching
To search for data in individual cells, select Find and Replace.

Then, in the dialogue box, enter the data and the criteria under which you are
searching. You have the option to search or to search and replace.

A filtered search displays only the rows that contain the data you are
looking for.

Arrows will appear at the top of each column containing data. Clicking on an
arrow opens a pull-down menu where you can select the data you wish to find.
The filter will then display only the rows containing these data. You can filter
for a specific value or select custom… to use criteria such as greater than, begins
with, and does not contain. To specify multiple criteria, click the And or Or
options. You can set different filter criteria for each column.


Example 5 Filtered Search

In the hockey-league spreadsheet from Example 3, use a filtered search to list
only those teams with fewer than 16 points.




Solution

In Microsoft® Excel, select
Data/Filter/Autofilter to begin the
filter process. Click on the arrow in
the POINTS column and select
custom… In the dialogue window,
select is less than and key in 16.

In Corel® Quattro® Pro, you use
Tools/Quickfilter/custom….




Now, the filter shows only the rows
representing teams with fewer than
16 points.
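
A filter of this kind amounts to keeping only the rows that meet a condition. The Python sketch below shows the idea with invented point totals; the names `teams` and `fewer_than_16` are assumptions made for this example.

```python
# Hypothetical point totals, invented for illustration only.
teams = [
    {"team": "Hawks",  "points": 17},
    {"team": "Bears",  "points": 16},
    {"team": "Wolves", "points": 15},
    {"team": "Eagles", "points": 12},
]

# Keep only the rows where POINTS is less than 16.
fewer_than_16 = [t for t in teams if t["points"] < 16]

for t in fewer_than_16:
    print(t["team"], t["points"])
```

The original list is not changed by the filter, just as a spreadsheet filter hides rows without deleting them. A team with exactly 16 points is excluded because the condition is "is less than", not "is less than or equal to".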




Adding and Referencing Worksheets
To add worksheets within your spreadsheet file, click on one of the sheet tabs
at the bottom of the data area. You can enter data onto the additional
worksheet using any of the methods described above or you can copy and
paste data from the first worksheet or from another file.

To reference data from cells in another worksheet, preface the cell reference
with the worksheet number for the cells.

Such references allow data entered in sheet A or sheet 1 to be manipulated in
another sheet without changing the values or order of the original data. Data
edited in the original sheet will be automatically updated in the other sheets
that refer to it. Any sort performed in the original sheet will carry through to
any references in other sheets, but any other data in the secondary sheets will
not be affected. Therefore, it is usually best to either reference all the data in
the secondary sheets or to sort the data only in the secondary sheets.

Project Prep
The calculation, sorting, and charting capabilities of spreadsheets could be
particularly useful for your tools for data management project.




Example 6 Sheet Referencing

Reference the goals for (GF) and goals against (GA) for the hockey teams
in Example 3 on a separate sheet and rank the teams by their goals scored.

Solution

Sheet 2 needs only the data in the columns titled GF and GA
in sheet 1. Notice that cell C2 contains a cell reference to sheet 1.
This reference ensures the data in cell F2 of sheet 1 will carry
through to cell C2 of sheet 2 even if the data in sheet 1 are edited.
Although the referenced and sorted data on sheet 2 appear
as shown, the order of the teams on sheet 1 is unchanged.
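
The relationship between the two sheets can be imitated in Python. This is a loose analogy, not a description of how a spreadsheet works internally: the data are invented, the function `build_sheet2` is hypothetical, and it has to be called again after an edit, whereas a spreadsheet updates its references automatically.

```python
# Sheet 1: hypothetical data, invented for illustration only.
sheet1 = [
    {"team": "Hawks",  "GF": 41, "GA": 30},
    {"team": "Bears",  "GF": 47, "GA": 35},
    {"team": "Wolves", "GF": 38, "GA": 29},
]


def build_sheet2(source):
    """Build sheet 2 from sheet 1: copy the referenced columns, rank by GF."""
    view = [{"team": r["team"], "GF": r["GF"], "GA": r["GA"]} for r in source]
    return sorted(view, key=lambda r: r["GF"], reverse=True)


# Ranking on sheet 2 before any edit.
before = [r["team"] for r in build_sheet2(sheet1)]

# Edit a value on sheet 1, then re-evaluate the references on sheet 2.
sheet1[0]["GF"] = 50
after = [r["team"] for r in build_sheet2(sheet1)]

print(before)                       # ranking on sheet 2 before the edit
print(after)                        # ranking on sheet 2 after the edit
print([r["team"] for r in sheet1])  # sheet 1 keeps its original order
```

The edit made on sheet 1 shows up in the ranking on sheet 2, while the order of the rows on sheet 1 is never changed by the sort.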

   Key Concepts

   • Thousands of computer programs are available for managing data. These
     programs range from general-purpose software, such as word-processors and
     spreadsheets, to highly specialized applications for specific types of data.
   • A spreadsheet is a software application that is used to enter, display, and
     manipulate data in rows and columns. Spreadsheet formulas perform
     calculations based on values or formulas contained in other cells.
   • Spreadsheets normally use relative cell referencing, which automatically
     adjusts cell references whenever cells are copied, moved, or sorted. Absolute
     cell referencing keeps formula references exactly as written.
   • Spreadsheets can produce a wide variety of charts and perform sophisticated
     sorts and searches of data.
   • You can add worksheets to a file and reference cells in one sheet from
     another sheet.

   Communicate Your Understanding

    1. Explain how you could use a word-processor as a data management tool.
    2. Describe the advantages and drawbacks of relational database programs.
    3. Explain what software you would choose if you wanted to determine
       whether there was a relationship between class size and subject in your
       school. Would you choose different software if you were going to look at
       class sizes in all the schools in Ontario?
    4. Give an example of a situation requiring relative cell referencing and one
       requiring absolute cell referencing.
    5. Briefly describe three advantages that spreadsheets have over hand-written
       tables for storing and manipulating data.


Practise
 A
 1. Application Set up a spreadsheet page in which you have entered the following lists
     of data. For the appropriate functions, look under the Statistical category in the
     Function list.
     Student marks:
        65, 88, 56, 76, 74, 99, 43, 56, 72, 81, 80, 30, 92
     Dentist appointment times in minutes:
        45, 30, 40, 32, 60, 38, 41, 45, 40, 45
     a) Sort each set of data from smallest to greatest.
     b) Calculate the mean (average) value for each set of data.
     c) Determine the median (middle) value for each set of data.
     d) Determine the mode (most frequent) value for each set of data.

 2. Using the formula features of the spreadsheet available in your school, write
     a formula for each of the following:
     a) the sum of the numbers stored in cells A1 to A9
     b) the largest number stored in cells F3 to K3
     c) the smallest number in the block from A1 to K4
     d) the sum of the cells A2, B5, C7, and D9
     e) the mean, median, and mode of the numbers stored in the cells F5 to M5
     f) the square root of the number in cell A3
     g) the cube of the number in cell B6
     h) the number in cell D2 rounded off to four decimal places
     i) the number of cells between cells D3 and M9 that contain data
     j) the product of the values in cells A1, B3, and C5 to C10
     k) the value of π

Apply, Solve, Communicate
 B
 3. Set up a spreadsheet page that converts angles in degrees to radians using the
     formula Radians = π × Degrees/180, for angles from 0° to 360° in steps of 5°. Use
     the series capabilities to build the data in the Degrees column. Use π as defined
     by the spreadsheet. Calculations should be rounded to the nearest hundredth.

 4. The first set of data below represents the number of sales of three brands of CD
     players at two branches of Mad Dog Music. Enter the data into a spreadsheet using
     two rows and three columns.

        Branch     Brand A   Brand B   Brand C
        Store P       12         4         8
        Store Q        9        15         6

     The second set of data represents the prices for these CD players. Enter the data
     using one column into a second sheet of the same spreadsheet workbook.

        Brand    Price
          A      $102
          B       $89
          C      $145

     Set up a third sheet of the spreadsheet workbook to reference the first two sets of
     data and calculate the total revenue from CD player sales at each Mad Dog Music
     store.

 5. Application In section 1.1, question 12, you predicted the value of the nth term of
     the recursion formulas listed below. Verify your predictions by using a spreadsheet
     to calculate the first ten terms for each formula.
     a) $t_n = 2^{-t_{n-1}}$; $t_1 = 0$
     b) $t_n = \sqrt{t_{n-1}}$; $t_1 = 256$
     c) $t_n = \dfrac{1}{t_{n-1}}$; $t_1 = 2$


 6. a) Enter the data shown in the table below into a spreadsheet and set up a second
        sheet with relative cell references to the Name, Fat, and Fibre cells in the
        original sheet.

Nutritional Content of 14 Breakfast Cereals (amounts in grams)
Name             Protein     Fat  Sugars  Starch   Fibre   Other  TOTALS
Alphabits            2.4     1.1    12.0    12.0     0.9     1.6
Bran Flakes          4.4     1.2     6.3     4.7    11.0     2.4
Cheerios             4.0     2.3     0.8    18.7     2.2     2.0
Crispix              2.2     0.3     3.2    22.0     0.5     1.8
Froot Loops          1.3     0.8    14.0    12.0     0.5     1.4
Frosted Flakes       1.4     0.2    12.0    15.0     0.5     0.9
Just Right           2.2     0.8     6.6    17.0     1.4     2.0
Lucky Charms         2.1     1.0    13.0    11.0     1.4     1.5
Nuts ’n Crunch       2.3     1.6     7.1    16.5     0.7     1.8
Rice Krispies        2.1     0.4     2.9    22.0     0.3     2.3
Shreddies            2.9     0.6     5.0    16.0     3.5     2.0
Special K            5.1     0.4     2.5    20.0     0.4     1.6
Sugar Crisp          2.0     0.7    14.0    11.0     1.1     1.2
Trix                 0.9     1.6    13.0    12.0     1.1     1.4
AVERAGES
MAXIMUM
MINIMUM

     b) On the first sheet, calculate the values for the TOTALS column and AVERAGES
        row.
     c) Determine the maximum and minimum values in each column.
     d) Rank the cereals using fibre content in decreasing order as a primary criterion,
        protein content in decreasing order as a secondary criterion, and sugar content
        in increasing order as a tertiary criterion.
     e) Make three circle graphs or pie charts: one for the averages row in part b), one
        for the cereal at the top of the list in part d), and one for the cereal at the
        bottom of the list in part d).
     f) Perform a search in the second sheet to find the cereals containing less than
        1 g of fat and more than 1.5 g of fibre. Make a three-dimensional bar graph of
        the results.

 C
 7. In section 1.1, question 10, you described the algorithm used to draw each fractal
     tree below. Assuming the initial segment is 4 cm in each tree, use a spreadsheet to
     determine the total length of a spiral in each tree, calculated to 12 iterative
     stages.
     a) [fractal tree figure]
     b) [fractal tree figure]

 8. Communication Describe how to lock column and row headings in your spreadsheet
     software so that they remain visible when you scroll through a spreadsheet.

 9. Inquiry/Problem Solving Outline a spreadsheet algorithm to calculate
     n × (n − 1) × (n − 2) × … × 3 × 2 × 1 for any natural number n without using the
     built-in factorial function.



TECHNOLOGY EXTENSION

            Introduction to Fathom™
Fathom™ is a statistics software package that offers a variety of powerful
data-analysis tools in an easy-to-use format. This section introduces the most basic
features of Fathom™: entering, displaying, sorting, and filtering data. A
complete guide is available on the Fathom™ CD. The real power of this
software will be demonstrated in later chapters with examples that apply its
sophisticated tools to statistical analysis and simulations.

Appendix B includes details on all the Fathom™ functions used in this text.

When you enter data into Fathom™, it creates a collection, an object that
contains the data. Fathom™ can then use the data from the collection to
produce other objects, such as a graph, table, or statistical test. These secondary
objects display and analyse the data from the collection, but they do not actually
contain the data themselves. If you delete a graph, table, or statistical test, the
data still remain in the collection.

Fathom™ displays a collection as a rectangular window
with gold balls in it. The gold balls of the collection
represent the original or “raw” data. Each of the gold balls
represents a case. Each case in a collection can have a
number of attributes. For example, the cases in a collection
of medical records could have attributes such as the
patient’s name, age, sex, height, weight, blood pressure, and
so on. There are two basic types of attributes, categorical
(such as male/female) and continuous (such as height or
mass). The case table feature displays the cases in a
collection in a format similar to a spreadsheet, with a row
for each case and a column for each attribute. You can add,
modify, and delete cases using a case table.

Example 1 Tables and Graphs

a) Set up a collection for the hockey league standings from section 1.2,
     Example 3 on page 17.
b) Graph the Team and Points attributes.


Solution

a) To enter the data, start Fathom™ and drag
     the case table icon from the menu bar
     down onto the work area.




Click on the attribute <new>, type the heading Team, and press Enter. Fathom™
   will automatically create a blank cell for data under the heading and start a new
   column to the right of the first. Enter the heading Wins at the top of the new
   column, and continue this process to enter the rest of the headings. You can type
   entries into the cells under the headings in much the same way as you would enter
   data into the cells of a spreadsheet.




   Note that Fathom™ has stored your data as Collection 1, which will remain intact
   even if you delete the case table used to enter the data. To give the collection a
   more descriptive name, double-click on Collection 1 and type in HockeyStats.

b) Drag the graph icon onto the work area. Now, drag the Team attribute from
   the case table to the x-axis of the graph and the Points attribute to the y-axis of the
   graph.







   Your graph should look like this:




Fathom™ can easily sort or filter data using the various attributes.

Example 2 Sorting and Filtering

a) Rank the hockey teams in Example 1 by points first, then by wins if two teams
   have the same number of points, and finally by losses if two teams have the
   same number of points and wins.
b) List only those teams with fewer than 16 points.
c) Set up a separate table showing only the goals for (GF) and goals against (GA)
   data for the teams and rank the teams by their goals scored.

Solution

a) To Sort the data, right-click on the Points attribute and choose Sort Descending.
     Fathom™ will list the team with the most points first, with the others
     following in descending order by their point totals. To set the secondary sort,
     right-click on the Wins attribute and choose Sort Descending. Similarly, right-
     click on the Losses attribute and choose Sort Ascending for the final sort, giving
     the result below.




b) To Filter the data, from the Data menu, choose Add Filter. Click on the
     plus sign beside Attributes.
     Now, double-click on the Points attribute, choose the less-than button (<), and
     type 16. Click the Apply button and then OK.

     The results should look like this:




The Filter is listed at the bottom as Points < 16.

c) Click on HockeyStats, and then drag a new table onto the work area.
   Click on the Wins attribute. From the Display menu, choose Hide
   Attribute. Use the same method to hide the Losses, Ties, and Points
   attributes. Right-click the GF attribute and use Sort Descending to
   rank the teams.
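
   The sorting, filtering, and attribute-hiding steps in Example 2 can also be
   sketched in a general-purpose language. The following Python sketch is only an
   illustration, not part of the Fathom™ procedure. The team names and numbers are
   hypothetical (they are not the standings from section 1.2), and the variable
   names are made up for this example.

```python
# Hypothetical standings, invented for illustration only.
teams = [
    {"Team": "Comets",  "Wins": 7, "Losses": 5, "Ties": 0, "Points": 14, "GF": 38, "GA": 33},
    {"Team": "Eagles",  "Wins": 6, "Losses": 5, "Ties": 2, "Points": 14, "GF": 31, "GA": 36},
    {"Team": "Blades",  "Wins": 7, "Losses": 2, "Ties": 3, "Points": 17, "GF": 35, "GA": 30},
    {"Team": "Dynamos", "Wins": 6, "Losses": 4, "Ties": 2, "Points": 14, "GF": 29, "GA": 31},
    {"Team": "Aces",    "Wins": 8, "Losses": 3, "Ties": 1, "Points": 17, "GF": 40, "GA": 28},
]

# a) Rank by points (descending), then wins (descending), then losses (ascending).
#    Negating a number reverses its sort order inside the key tuple.
ranked = sorted(teams, key=lambda t: (-t["Points"], -t["Wins"], t["Losses"]))

# b) Filter: keep only the teams with fewer than 16 points.
under_16 = [t for t in ranked if t["Points"] < 16]

# c) A separate table with only the Team, GF, and GA attributes, ranked by GF.
goals_table = sorted(
    ({"Team": t["Team"], "GF": t["GF"], "GA": t["GA"]} for t in teams),
    key=lambda row: row["GF"],
    reverse=True,
)

for row in ranked:
    print(row["Team"], row["Points"], row["Wins"], row["Losses"])
```

   Putting all three sort keys into one tuple means the ties are broken in a
   single pass, so the order in which the sorts are applied does not matter.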




 1. Enter the data from Example 1 into Fathom™. Use the built-in
   functions in Fathom™ to find the following.
   a) the mean of goals against (GA)
   b) the largest value of goals for (GF)
   c) the smallest value of GF
   d) the sum of GA
   e) the sum of GA and GF for each case

   For details on functions in Fathom™, see the Fathom™ section of
   Appendix B or consult the Fathom™ Help screen or manual.

 2. a) Set up a new collection with the following student marks:
      65, 88, 56, 76, 74, 99, 43, 56, 72, 81, 80, 30, 92
   b) Sort the marks from lowest to highest.
   c) Calculate the mean mark.
   d) Determine the median (middle) mark.

 3. Explain how you would create a graph of class size versus subjects in
   your school using Fathom™.

 4. Briefly compare the advantages and disadvantages
   of using Fathom™ and spreadsheets for
   storing and manipulating data.
                                                                 www.mcgrawhill.ca/links/MDM12

                                                           For more examples, data, and information on how
                                                             to use Fathom™, visit the above web site and
                                                                           follow the links.




1.3           Databases

  A database is an organized
  store of records. Databases
  may contain information
  about almost any subject—
  incomes, shopping habits,
  demographics, features of
  cars, and so on.




       INVESTIGATE & INQUIRE: Databases in a Library

       In your school or local public library, log on to the library catalogue.

        1. Describe the types of fields under which a search can be conducted
          (e.g., subject).
        2. Conduct a search for a specific topic of your choice.
        3. Describe the results of your search. How is the information
          presented to the user?


       INVESTIGATE & INQUIRE: The E-STAT Database

        1. Connect to the Statistics Canada web site and go to the E-STAT database.
          Your school may have a direct link to this database. If not, you can follow
          the Web Connection links shown here. You may need to get a password
          from your teacher to log in.
        2. Locate the database showing the educational attainment data for Canada
          by following these steps:
           a) Click on Data.
           b) Under the heading People, click on Education.
           c) Click on Educational Attainment, then under the heading Census
              databases, select Educational Attainment again.
           d) Select Education, Mobility and Migration for the latest census.

        www.mcgrawhill.ca/links/MDM12
        To connect to E-STAT visit the above web site and follow the links.


3. Scroll down to the heading University, pop. 15 years and over by highest level of
       schooling, hold down the Ctrl key, and select all four subcategories under this
       heading. View the data in each of the following formats:
        a) table    b) bar graph     c) map
     4. Describe how the data are presented in each instance. What are the
       advantages and disadvantages of each format? Which format do you
       think is the most effective for displaying this data? Explain why.
     5. Compare the data for the different provinces and territories. What
       conclusions could you draw from this data?


A database record is a set of data that is treated as a unit. A record is usually
divided into fields that are reserved for specific types of information. For
example, the record for each person in a telephone book has four fields: last
name, first name or initial, address, and telephone number. This database is
sorted in alphabetical order using the data in the first two fields. You search this
database by finding the page with the initial letters of a person’s name and then
simply reading down the list.

A music store will likely keep its inventory records on a computerized database.
The record for each different CD could have fields for information, such as title,
artist, publisher, music type, price, number in stock, and a product code (for
example, the bar code number). The computer can search such databases for
matches in any of the data fields. The staff of the music store would be able to
quickly check if a particular CD was in stock and tell the customer the price and
whether the store had any other CDs by the same artist.
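
The idea of records divided into fields can be sketched in code. The following Python example is a minimal illustration under stated assumptions: the CD titles, artists, prices, and product codes are invented, and the names `CDRecord` and `search` are hypothetical, not part of any real inventory system.

```python
from dataclasses import dataclass


@dataclass
class CDRecord:
    """One record; each attribute below is a field."""
    title: str
    artist: str
    publisher: str
    music_type: str
    price: float
    in_stock: int
    product_code: str


# Hypothetical inventory, invented for illustration only.
inventory = [
    CDRecord("Northern Lights", "The Loons", "Maple Records", "folk", 17.99, 4, "0001"),
    CDRecord("Prairie Sky", "The Loons", "Maple Records", "folk", 15.99, 0, "0002"),
    CDRecord("City Nights", "Jazz Trio North", "Lakeside Music", "jazz", 19.99, 2, "0003"),
]


def search(records, **criteria):
    """Return the records whose fields match every field=value pair given."""
    return [
        record for record in records
        if all(getattr(record, field) == value for field, value in criteria.items())
    ]


# Is a particular CD in stock, and what does it cost?
for cd in search(inventory, title="Northern Lights"):
    print(cd.title, cd.price, "in stock" if cd.in_stock > 0 else "out of stock")

# Which other CDs does the store list by the same artist?
print([cd.title for cd in search(inventory, artist="The Loons")])
```

Because `search` accepts any field name, the same function answers questions about title, artist, music type, or product code, which is the kind of flexibility the music store example describes.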

Databases in a Library
A library catalogue is a database. In the past, library databases were accessed
through a card catalogue. Most libraries are now computerized, with books listed
by title, author, publisher, subject, a Dewey Decimal or Library of Congress
catalogue number, and an international standard book number (ISBN). Records
can be sorted and searched using the information in any of the fields.
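
One way to picture sorting and cross-referencing by field is to build an index for each field. The Python sketch below is an assumption-laden illustration, not a description of how any actual library system works: the book entries and catalogue numbers are invented, and `build_index` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical catalogue entries, invented for illustration only.
catalogue = [
    {"title": "Rivers of Canada", "author": "A. Author", "subject": "geography", "number": "917.1"},
    {"title": "Counting Crowds", "author": "B. Writer", "subject": "statistics", "number": "519.5"},
    {"title": "Arctic Journeys", "author": "A. Author", "subject": "geography", "number": "917.19"},
]


def build_index(records, field):
    """Group records by the value they hold in one field."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record)
    return index


# One index per field lets the same records be looked up in several ways.
by_author = build_index(catalogue, "author")
by_subject = build_index(catalogue, "subject")

# Sorting uses a field as the key, much like a card catalogue filed by title.
sorted_by_title = sorted(catalogue, key=lambda record: record["title"])

print([r["title"] for r in by_author["A. Author"]])
print([r["title"] for r in by_subject["statistics"]])
print([r["title"] for r in sorted_by_title])
```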

Such catalogues are examples of a well-organized database because they
are easy to access using keywords and searches in multiple fields, many of
which are cross-referenced. Often, school libraries are linked to other
libraries. Students have access to a variety of print and online databases in
the library. One powerful online database is Electric Library Canada, a
database of books, newspapers, magazines, and television and radio
transcripts. Your school probably has access to it or a similar library
database. Your local public library may also have online access to Electric
Library Canada.

Project Prep
Skills in researching library and on-line databases will help you find the
information needed for your tools for data management project.


Statistics Canada
Statistics Canada is the federal government department responsible for
collecting, summarizing, analysing, and storing data relevant to Canadian
demographics, education, health, and so on. Statistics Canada maintains a
number of large databases using data collected from a variety of sources
including its own research and a nation-wide census. One such database is
CANSIM II (the updated version of the Canadian Socio-economic Information
Management System), which profiles the Canadian people, economy, and
industries. Although Statistics Canada charges a fee for access to some of its
data, a variety of CANSIM II data is available to the public for free on Statistics
Canada’s web site.

Statistics Canada also has a free educational database, called E-STAT. It gives
access to many of Statistics Canada’s extensive, well-organized databases,
including CANSIM II. E-STAT can display data in a variety of formats and
allows students to download data into a spreadsheet or statistical software
program.

Data in Action
By law, Statistics Canada is required to conduct a census of Canada’s
population and agriculture every five years. For the 2001 census, Statistics
Canada needed about 37 000 people to distribute the questionnaires.
Entering the data from the approximately 13.2 million questionnaires will
take about 5 billion keystrokes.




Key Concepts

  • A database is an organized store of records. A well-organized database can be
    easily accessed through searches in multiple fields that are cross-referenced.

  • Although most databases are computerized, many are available in print form.



  Communicate Your Understanding

    1. For a typical textbook, describe how the table of contents and the index are
       sorted. Why are they sorted differently?

    2. Describe the steps you need to take in order to access the 1860−61 census
       results through E-STAT.


Practise

A
1. Which of the following would be considered databases? Explain your
    reasoning.
    a) a dictionary
    b) stock-market listings
    c) a catalogue of automobile specifications and prices
    d) credit card records of customers’ spending habits
    e) an essay on Shakespeare’s Macbeth
    f) a teacher’s mark book
    g) the Guinness World Records book
    h) a list of books on your bookshelf

Apply, Solve, Communicate

B
2. Describe each field you would include in a database of
    a) a person’s CD collection
    b) a computer store’s software inventory
    c) a school’s textbook inventory
    d) the backgrounds of the students in a school
    e) a business’s employee records

3. a) Describe how you would locate a database showing the ethnic makeup of
       your municipality. List several possible sources.
    b) If you have Internet access, log onto E-STAT and go to the data on
       ethnic origins of people in large urban centres:
       i)   Select Data on the Table of Contents page.
       ii)  Select Population and Demography.
       iii) Under Census, select Ethnic Origin.
       iv)  Select Ethnic Origin and Visible Minorities for the latest census
            in large urban centres.
       v)   Enter a postal code for an urban area and select two or more
            ethnic origins while holding down the Ctrl key.
       vi)  View table, bar graph, and map in turn and describe how the data
            are presented in each instance.
    c) Compare these results with the data you get if you leave the postal
       code section line blank. What conclusions could you draw from the two
       sets of data?

4. Application
     a) Describe how you could find data to compare employment for males and
        females. List several possible sources.
     b) If you have Internet access, log onto E-STAT and go to the data on
        employment and work activity:
        i)   Under the People heading, select Labour.
        ii)  Under the Census databases heading, select Salaries and Wages.
        iii) Select Sources of Income (Latest census, Provinces, Census
             Divisions, Municipalities).
        iv)  While holding down the Ctrl key, click on All persons with
             employment income by work activity, Males with employment income
             by work activity, and Females with employment income by work
             activity.
        v)   Download this data as a spreadsheet file. Record the path and
             file name for the downloaded data.
     c) Open the data file with a spreadsheet. You may have to convert the
        format to match your spreadsheet software. Use your spreadsheet to
        i)   calculate the percentage difference between male and female
             employment
        ii)  display all fields as a bar graph

 5. Communication Go to the reference area of your school or local library
     and find a published database in print form.
     a) Briefly describe how the database is organized.
     b) Describe how to search the database.
     c) Make a list of five books that are set up as databases. Explain why
        they would be considered databases.

 6. Application The Internet is a link between many databases. Search
     engines, such as Yahoo Canada, Lycos, Google, and Canoe, are large
     databases of web sites. Each search engine organizes its database
     differently.
     a) Use three different search engines to conduct a search using the
        keyword automobile. Describe how each search engine presents its data.
     b) Compare the results of searches with three different search engines
        using the following keywords:
        i)   computer monitors
        ii)  computer+monitors
        iii) computer or monitors
        iv)  “computer monitors”

 7. Chapter Problem Use the Internet to check whether the map of VIA Rail
     routes at the start of this chapter is up-to-date. Are there still no
     trains that go from Montréal or Kingston right through to Windsor?

 8. Communication Log on to the Electric Library Canada web site or a similar
     database available in your school library. Enter your school’s username
     and password. Perform a search for magazine articles, newspaper articles,
     and radio transcripts about the “brain drain” or another issue of
     interest to you. Describe the results of your search. How many articles
     are listed? How are the articles described? What other information is
     provided?


1.4           Simulations

  A simulation is an experiment,
  model, or activity that imitates
  real or hypothetical conditions.
  The newspaper article shown here
  describes how astrophysicists used
  computers to simulate a collision
  between Earth and a planet the
  size of Mars, an event that would
  be impossible to measure directly. The simulation showed that such a collision
  could have caused both the formation of the moon and the rotation of Earth,
  strengthening an astronomical theory put forward in the 1970s.

        INVESTIGATE & INQUIRE: Simulations

       For each of the following, describe what is being simulated, the
       advantages of using a simulation, and any drawbacks.

       a)   crash test dummies           b) aircraft simulators
       c)   wind tunnels                 d) zero-gravity simulator
       e)   3-D movies                    f) paint-ball games
       g) movie stunt actors             h) grow lights
       i)   architectural scale models

  In some situations, especially those with many variables, it can be difficult to
  calculate an exact value for a quantity. In such cases, simulations often can provide
  a good estimate. Simulations can also help verify theoretical calculations.

  Example 1 Simulating a Multiple-Choice Test

  When writing a multiple-choice test, you may have wondered “What are my
  chances of passing just by guessing?” Suppose that you make random guesses on a
  test with 20 questions, each having a choice of 5 answers. Intuitively, you would
  assume that your mark will be somewhere around 4 out of 20 since there is a 1 in 5
  chance of guessing right on each question. However, it is possible that you could
  get any number of the questions right—anywhere from zero to a perfect score.
  a)    Devise a simulation for making guesses on the multiple-choice test.
  b)    Run the simulation 100 times and use the results to estimate the mark
        you are likely to get, on average.
  c)    Would it be practical to run your simulation 1000 times or more?

Solution 1         Using Pencil and Paper

a) Select any five cards from a deck of cards. Designate one of these cards to
     represent guessing the correct answer on a question. Shuffle the five cards
     and choose one at random. If it is the designated card, then you got the
     first question right. If one of the other four cards is chosen, then you got
     the question wrong.

     Put the chosen card back with the others and repeat the process 19 times
     to simulate answering the rest of the questions on the test. Keep track of
     the number of right answers you obtained.
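
     The card procedure in part a) can be mirrored in a short program. The
     following Python sketch is an illustration only and is not part of the
     pencil-and-paper solution. The card names and the function name
     `simulate_test` are made up for this example.

```python
import random

# Five cards; one of them is designated as "the correct answer".
CARDS = ["ace", "two", "three", "four", "five"]
CORRECT_CARD = "ace"


def simulate_test(num_questions=20):
    """Simulate guessing on one test and return the number of right answers."""
    score = 0
    for _ in range(num_questions):
        # Drawing from the full set each time matches putting the chosen card
        # back and reshuffling before the next question.
        drawn = random.choice(CARDS)
        if drawn == CORRECT_CARD:
            score += 1
    return score


print("Score on one simulated test:", simulate_test())
```

     Each call to `simulate_test` stands for one complete run of the 20 draws,
     so the result will vary from run to run.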

b) You could run 100 simulations by repeating the process in part a) over and
     over. However, you would have to choose a card 2000 times, which would
     be quite tedious. Instead, form a group with some of your classmates and
     pool your results, so that each student has to run only 10 to 20 repetitions
     of the simulation.

     Make a table of the scores on the 100 simulated tests and calculate the
     mean score. You will usually find that this average is fairly close to the
     intuitive estimate of a score around 4 out of 20. However, a mean does not
     tell the whole story. Tally up the number of times each score appears in
     your table. Now, construct a bar graph showing the frequency for each
     score. Your graph will look something like the one shown.
     [Bar graph: frequency (vertical axis, 0 to 25) of each score (horizontal
     axis, 0 to 20) in the 100 simulated tests]

     This graph gives a much more detailed picture of the results you could
     expect. Although 4 is the most likely score, there is also a good chance of
     getting 2, 3, 5, or 6, but the chance of guessing all 20 questions correctly is
     quite small.
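
     The tally and bar graph described in part b) can be sketched in Python as
     follows. This is a minimal illustration under stated assumptions, not the
     method used in this solution: a random integer from 1 to 5 stands in for
     drawing a card, with 1 taken as the correct guess, and the function names
     are hypothetical. The results change every time the program runs.

```python
import random
from collections import Counter
from statistics import mean


def simulate_test(num_questions=20, num_choices=5):
    """Return the number of correct guesses on one simulated test."""
    # A guess counts as correct when the random integer equals 1.
    return sum(random.randint(1, num_choices) == 1 for _ in range(num_questions))


def run_simulations(num_tests=100):
    """Return a list with one score for each simulated test."""
    return [simulate_test() for _ in range(num_tests)]


scores = run_simulations(100)
frequency = Counter(scores)  # how many times each score appears

print(f"Mean score: {mean(scores):.2f}")

# A text version of the bar graph: one '#' for each test with that score.
for score in range(21):
    print(f"{score:2d} | {'#' * frequency[score]}")
```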

c) Running the simulation 1000 times would require shuffling the five cards
     and picking one 20 000 times—possible, but not practical.



Solution 2    Using a Graphing Calculator

See Appendix B for more details on how to use the graphing calculator and
software functions in Solutions 2 to 4.

a) You can use random numbers as the basis for a simulation. If you
   generate random integers from 1 to 5, you can have 1 correspond to
   a correct guess and 2 through 5 correspond to wrong answers.

   Use the STAT EDIT menu to view lists L1 and L2. Make sure both lists
   are empty. Scroll to the top of L1 and enter the randInt function from
   the MATH PRB menu. This function produces random integers.

   Enter 1 for the lower limit, 5 for the upper limit, and 20 for the
   number of trials. L1 will now contain 20 random integers between 1
   and 5. Next, sort the list with the SortA function on the LIST OPS menu.
   Press 2nd 1 to enter L1 into the sort function. When you return to L1,
   the numbers in it will appear in ascending order. Now, you can easily
   scroll down the list to determine how many correct answers there were
   in this simulation.

b) The simplest way to simulate 100 tests is to repeat the procedure in
   part a) and keep track of the results by entering the number of correct
   answers in L2. Again, you may want to pool results with your classmates
   to reduce the number of times you have to enter the same formula over
   and over. If you know how to program your calculator, you can set it to
   re-enter the formulas for you automatically. However, unless you are
   experienced in programming the calculator, it will probably be faster for
   you to just re-key the formulas.

   Once you have the scores from 100 simulations in L2, calculate the
   average using the mean function on the LIST MATH menu. To see which
   scores occur most frequently, plot L2 using STAT PLOT.
   i)    Turn off all plots except Plot1.
   ii)   For Type, choose the bar-graph icon and enter L2 for Xlist.
         Freq should be 1, the default value.
   iii) Use ZOOM/ZoomStat to set the window for the data. Press
         WINDOW to check the window settings. Set Xscl to 1 so that the
         bars correspond to integers.
   iv) Press GRAPH to display the bar graph.

c) It is possible to program the calculator to run a large number of
   simulations automatically. However, the maximum list length on the
   TI-83 Plus is 999, so you would have to use at least two lists to run
   the simulation 1000 times or more.




Solution 3    Using a Spreadsheet

a) Spreadsheets have built-in functions that you can use to generate and count
     the random numbers for the simulation.

     The RAND() function produces a random real number that is equal to or
     greater than zero and less than one. The INT function rounds a real number
     down to the nearest integer. Combine these functions to generate a random
     integer between 1 and 5.

     Enter the formula INT(RAND()*5)+1 or RANDBETWEEN(1,5) in A1 and copy
     it down to A20. Next, use the COUNTIF function to count the number of 1s
     in column A. Record this score in cell A22.

     In Microsoft® Excel, you can use RANDBETWEEN only if you have the Analysis
     Toolpak installed.




b) To run 100 simulations, copy A1:A22 into columns B through CV using the
     Fill feature. Then, use the average function to find the mean score for the
     100 simulated tests. Record this average in cell B23.

     Next, use the COUNTIF function to find the number of times each possible
     score occurs in cells A22 to CV22. Enter the headings SUMMARY, Score, and
     Frequency in cells A25, A26, and A27, respectively. Then, enter 0 in cell B26
     and highlight cells B26 through V26. Use the Fill feature to enter the
     integers 0 through 20 in cells B26 through V26. In B27, enter the formula
     for the number of zero scores; in C27, the number of 1s; in D27, the
     number of 2s; and so on, finishing with V27 having the number of perfect
     scores. Note that by using absolute cell referencing you can simply copy
     the COUNTIF function from B27 to the other 20 cells.

   Finally, use the Chart feature to plot frequency versus score. Highlight
   cells A26 through V27, then select Insert/Chart/XY.




c) The method in part b) can easily handle 1000 simulations or more.


Solution 4   Using Fathom™

a) Fathom™ also has built-in functions to generate
   random numbers and count the scores in the
   simulations.

   Launch Fathom™ and open a new document if
   necessary. Drag a new collection box to the
   document and rename it MCTest. Right-click on
   the box and create 20 new cases.

   Drag a case table to the work area. You should
   see your 20 cases listed. Expand the table if you
   cannot see them all on the screen.

   Rename the <new> column Guess. Right-click on
   Guess and select Edit Formula, Expand Functions,
   then Random Numbers. Enter 1,5 into the
   randomInteger() function and click OK to fill the
   Guess column with random integers between 1
   and 5. Scroll down the column to see how many
   correct guesses there are in this simulation.



b) You can run a new simulation just by pressing Ctrl-Y, which will fill the
     Guess column with a new set of random numbers. Better still, you can set
     Fathom™ to repeat the simulation 100 times automatically
     and keep track of the number of correct guesses.

     First, set up the count function. Right-click on the collection box and select
     Inspect Collection. Select the Measures tab and rename the <new> column
     Score. Then, right-click the column below Formula and select Edit Formula,
     Functions, Statistical, then One Attribute. Select count, enter Guess = 1
     between the brackets, and click OK to count the number of correct guesses
     in your case table.

     Click on the MCTest collection box. Now, select Analyse, Collect Measures
     from the main menu bar, which creates a new collection box called Measures
     from MCTest. Click on this box and drag a new case table to the document.
     Fathom™ will automatically run five simulations of the multiple-choice test
     and show the results in this case table.

     To simulate 100 tests, right-click on the Measures from MCTest collection box
     and select Inspect Collection. Turn off the animation in order to speed up the
     simulation. Change the number of measures to 100. Then, click on the
     Collect More Measures button. You should now have 100 measures in the
     case table for Measures from MCTest.

     Next, use the mean function to find the average score for these simulations.
     Go back to the Inspect Measures from MCTest collection box and change the
     column heading <new> to Average. Right-click on Formula and select Edit
     Formula, Functions, Statistical, then One Attribute. Select mean, enter Score
     between the brackets, and select OK to display the mean mark on the 100
     tests.

     Finally, plot a histogram of the scores from the simulations. Drag the graph
     icon onto the work area. Then, drag the Score column from the Measures
     from MCTest case table to the horizontal axis of the graph. Fathom™ then
     automatically produces a dot plot of your data. To display a histogram
     instead, simply click the menu in the upper right hand corner of the graph
     and choose Histogram.




c) Fathom™ can easily run this simulation 1000 times or more. All you have
   to do is change the number of measures.
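
   The effect of increasing the number of simulated tests can also be explored
   in a general-purpose language. The Python sketch below is an illustration
   under stated assumptions and is not a Fathom™ feature: the function names
   are hypothetical, and a random integer of 1 out of 1 to 5 counts as a
   correct guess. The printed means differ from run to run, but they can be
   compared with the intuitive estimate of 4 out of 20.

```python
import random
from statistics import mean


def simulate_test(num_questions=20, num_choices=5):
    """Return the number of correct guesses on one simulated test."""
    return sum(random.randint(1, num_choices) == 1 for _ in range(num_questions))


def mean_score(num_tests):
    """Run the simulation num_tests times and return the mean score."""
    return mean(simulate_test() for _ in range(num_tests))


# Changing the number of runs is a one-value change, as in Fathom.
for num_tests in (100, 1000, 10000):
    print(f"{num_tests:6d} tests: mean score = {mean_score(num_tests):.2f}")
```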

   Key Concepts

   • Simulations can be useful tools for estimating quantities that are difficult to
     calculate and for verifying theoretical calculations.

   • A variety of simulation methods are available, ranging from simple manual
     models to advanced technology that makes large-scale simulations feasible.

   Communicate Your Understanding

    1. Make a table summarizing the pros and cons of the four simulation methods
      used in Example 1.

    2. A manufacturer of electric motors has a failure rate of 0.2% on one of its
      products. A quality-control inspector needs to know the range of the number
      of failures likely to occur in a batch of 1000 of these motors. Which tool
      would you use to simulate this situation? Give reasons for your choice.


Practise

 A
 1. Write a graphing calculator formula for
     a) generating 100 random integers between 1 and 25
     b) generating 24 random integers between −20 and 20

 2. Write a spreadsheet formula for
     a) generating 100 random numbers between 1 and 25
     b) generating 100 random integers between 1 and 25
     c) generating 16 random integers between −40 and 40
     d) counting the number of entries that equal 42.5 in the range C10 to V40

Apply, Solve, Communicate

 B
 3. Communication Identify two simulations you use in everyday life and list
     the advantages of using each simulation.

 4. Describe three other manual methods you could use to simulate the
     multiple-choice test in Example 1.

 5. Communication
     a) Describe a calculation or mechanical process you could use to produce
        random integers.
     b) Could you use a telephone book to generate random numbers? Explain
        why or why not.

 6. Application A brother and sister each tell the truth two thirds of the
     time. The brother stated that he owned the car he was driving. The sister
     said he was telling the truth. Develop a simulation to show whether you
     should believe them.

 7. Inquiry/Problem Solving Consider a random walk in which a coin toss
     determines the direction of each step. On the odd-numbered tosses, walk
     one step north for heads and one step south for tails. On even-numbered
     tosses, walk one step east for heads and one step west for tails.
     a) Beginning at position (0, 0) on a Cartesian graph, simulate this random
        walk for 100 steps. Note the coordinates where you finish.
     b) Repeat your simulation 10 times and record the results.
     c) Use these results to formulate a hypothesis about the endpoints for
        this random walk.
     d) Change the rules of the random walk and investigate the effect on the
        end points.

ACHIEVEMENT CHECK
Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application

 8. a) Use technology to simulate rolling two dice 100 times and record the
        sum of the two dice each time. Make a histogram of the sums.
     b) Which sum occurs most often? Explain why this sum is likely to occur
        more often than the other sums.
     c) Which sum or sums occur least often? Explain this result.
     d) Suppose three dice are rolled 100 times and the sums are recorded.
        What sums would you expect to be the most frequent and least
        frequent? Give reasons for your answers.

 C
 9. Communication Describe a quantity that would be difficult to calculate or
     to measure in real life. Outline a simulation procedure you could use to
     determine this quantity.


1.5 Graph Theory

  Graph theory is a branch of mathematics in which graphs or networks are used
  to solve problems in many fields. Graph theory has many applications, such as
  • setting examination timetables
  • colouring maps
  • modelling chemical compounds
  • designing circuit boards
  • building computer, communication, or transportation networks
  • determining optimal paths

  In graph theory, a graph is unlike the traditional Cartesian graph used for
  graphing functions and relations. A graph (also known as a network) is a
  collection of line segments and nodes. Mathematicians usually call the nodes
  vertices and the line segments edges. Networks can illustrate the relationships
  among a great variety of objects or sets.

  This network is an illustration of the subway system in Toronto. In order to
  show the connections between subway stations, this map is not to scale. In fact,
  networks are rarely drawn to scale.




      INVESTIGATE & INQUIRE: Map Colouring

      In each of the following diagrams the lines represent borders between
      countries. Countries joined by a line segment are considered neighbours,
      but countries joining at only a single point are not.

       1. Determine the smallest number of colours needed for each map such
         that all neighbouring countries have different colours.
          [Maps a) to e)]




     2. Make a conjecture regarding the maximum number of colours needed
        to colour a map. Why do you think your conjecture is correct?



Although the above activity is based on maps, it is very mathematical.
It is about solving problems involving connectivity. Each country could
be represented as a node or vertex. Each border could be represented
by a segment or edge.


Example 1 Representing Maps With Networks

Represent each of the following maps with a network.
[Maps: a) a map of four countries labelled A, B, C, and D; b) a map of six
countries labelled A, B, C, D, E, and F]


Solution

a) Let A, B, C, and D be vertices representing countries A, B, C, and
     D, respectively. A shares a border with both B and D but not with
     C, so A should be connected by edges to B and D only. Similarly, B
     is connected to only A and C; C, to only B and D; and D, to only A
     and C.

b) Let A, B, C, D, E, and F be vertices representing countries A, B, C,
     D, E, and F, respectively. Note that the positions of the vertices are
     not important, but their interconnections are. A shares borders with
     B, C, and F, but not with D or E. Connect A with edges to B, C,
     and F only. Use the same process to draw the rest of the edges.




As components of networks, edges could represent connections such as roads,
wires, pipes, or air lanes, while vertices could represent cities, switches, airports,
computers, or pumping stations. The networks could be used to carry vehicles,
power, messages, fluid, airplanes, and so on.

If two vertices are connected by an edge, they are considered to be adjacent.
In the network on the right, A and B are adjacent, as are B and C. A and C are
not adjacent.

[Network with vertices A, B, and C, including a loop at C]

The number of edges that begin or end at a vertex is called the degree of the
vertex. In the network, A has degree 1, B has degree 2, and C has degree 3. The
loop at C counts as both an edge beginning at C and an edge ending at C.

Any connected sequence of vertices is called a path. If the path begins and ends
at the same vertex, the path is called a circuit. A circuit is independent of the
starting point. Instead, the circuit depends on the route taken.
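
The definitions of adjacency and degree can be checked with a short Python
sketch. The edge list below is an assumption reconstructed from the
description of the three-vertex network (A joined to B, B joined to C, and a
loop at C); the helper names degree and adjacent are illustrative only.

```python
# Assumed edge list for the network described in the text:
# A-B, B-C, and a loop at C (an edge that joins C to itself).
edges = [("A", "B"), ("B", "C"), ("C", "C")]

def degree(vertex, edges):
    """Count the edge ends at a vertex; a loop contributes two ends."""
    return sum((u == vertex) + (v == vertex) for u, v in edges)

def adjacent(u, v, edges):
    """Two vertices are adjacent if some edge joins them."""
    return (u, v) in edges or (v, u) in edges

for name in "ABC":
    print(name, "has degree", degree(name, edges))

print("A and B adjacent:", adjacent("A", "B", edges))
print("A and C adjacent:", adjacent("A", "C", edges))
```

Counting edge ends rather than edges is what makes the loop at C add 2 to the
degree of C.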

Example 2 Circuits

Determine if each path is a circuit.
[Diagrams a), b), and c): each marks a path on a network with vertices
A, B, C, and D]

Solution
a) Path: BC to CD to DA
     Since this path begins at B and ends at A, it is not a circuit.

b) Path: BC to CD to DA to AB
     This path begins at B and ends at B, so it is a circuit.

c) Path: CA to AB to BC to CD to DA
     Since this path begins at C and ends at A, it is not a circuit.
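
The test used in this solution can be written as a small Python sketch. It
assumes each path is recorded as the sequence of vertices visited; the
function name is_circuit is illustrative only.

```python
def is_circuit(path):
    """A path, given as its sequence of vertices, is a circuit when it
    begins and ends at the same vertex."""
    return len(path) > 1 and path[0] == path[-1]

# The three paths of Example 2, written as vertex sequences.
paths = {
    "a": ["B", "C", "D", "A"],            # BC to CD to DA
    "b": ["B", "C", "D", "A", "B"],       # BC to CD to DA to AB
    "c": ["C", "A", "B", "C", "D", "A"],  # CA to AB to BC to CD to DA
}

for label, path in paths.items():
    print(label, is_circuit(path))
```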


A network is connected if and only if there is at least one path connecting each pair of
vertices. A complete network is a network with an edge between every pair of vertices.




     Connected but not complete: Not all vertices are joined directly.
     Connected and complete: All vertices are joined to each other by edges.
     Neither connected nor complete: Not all vertices are joined.
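
Both definitions can be tested mechanically. The following Python sketch is
one possible way to do so, not a standard routine: the function names and the
three four-vertex networks (square, full, and split) are made up for
illustration and are not the networks pictured.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """Search outward from one vertex; connected if every vertex is reached."""
    vertices = list(vertices)
    if not vertices:
        return True
    neighbours = {v: set() for v in vertices}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for w in neighbours[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)

def is_complete(vertices, edges):
    """Complete if every pair of distinct vertices is joined by an edge."""
    joined = {frozenset(edge) for edge in edges}
    return all(frozenset(pair) in joined for pair in combinations(vertices, 2))

square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
full = square + [("A", "C"), ("B", "D")]   # square plus both diagonals
split = [("A", "B"), ("C", "D")]           # two separate pieces

for name, edges in [("square", square), ("full", full), ("split", split)]:
    print(name, is_connected("ABCD", edges), is_complete("ABCD", edges))
```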

In a traceable network all the vertices are connected to at least one other
vertex and all the edges can be travelled exactly once in a continuous path.
[Diagrams: networks with vertices A, B, C, and D, and networks with vertices
P, Q, R, and S]

Traceable: All vertices are connected to at least one other vertex, and the
path from A to B to C to D to A to C includes all the edges without
repeating any of them.

Non-traceable: No continuous path can travel all the edges only once.


Example 3 The Seven Bridges of Koenigsberg

The eighteenth-century German town of Koenigsberg (now the Russian
city of Kaliningrad) was situated on two islands and the banks of the
Pregel River. Koenigsberg had seven bridges as shown in the map.

People of the town believed—but could not prove—that it was
impossible to tour the town, crossing each bridge exactly once,
regardless of where the tour started or finished. Were they right?

Solution

Reduce the map to a simple network of vertices and edges. Let vertices
A and C represent the mainland, with B and D representing the islands.
Each edge represents a bridge joining two parts of the town.

[Network with vertices A, B, C, and D]

If, for example, you begin at vertex D, you will leave and eventually return
but, because D has a degree of 3, you will have to leave again.

Conversely, if you begin elsewhere, you will pass through vertex D at some
point, entering by one edge and leaving by another. But, because D has
degree 3, you must return in order to trace the third edge and, therefore,
must end at D. So, your path must either begin or end at vertex D. Because
all the vertices are of odd degree, the same argument applies to all the other
vertices. Since you cannot begin or end at more than two vertices, the network
is non-traceable. Therefore, it is indeed impossible to traverse all the town’s
bridges without crossing one twice.


Leonhard Euler developed this proof of Example 3 in 1735. He laid the foundations
for the branch of mathematics now called graph theory. Among other discoveries,
Euler found the following general conditions about the traceability of networks.
 • A connected network is traceable if it has only vertices of even degree
   (even vertices) or exactly two vertices of odd degree (odd vertices).

 • If the network has two vertices of odd degree, the tracing path must begin
   at one vertex of odd degree and end at the other vertex of odd degree.
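
Euler's conditions can be applied by counting odd vertices, as in the Python
sketch below. The Koenigsberg edge list is an assumption: it follows the usual
layout of the seven bridges and agrees with the solution to Example 3 (D has
degree 3 and every vertex is odd). The second edge list is the traceable
network whose path runs A to B to C to D to A to C. The function names are
illustrative, and the network is assumed to be connected.

```python
from collections import Counter

def odd_vertices(edges):
    """Return the vertices of odd degree (a loop adds 2 to its vertex)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)

def is_traceable(edges):
    """Euler's condition, assuming the network is connected."""
    return len(odd_vertices(edges)) in (0, 2)

# Assumed bridge layout: A and C are the river banks, B and D the islands.
koenigsberg = [("A", "B"), ("A", "B"), ("B", "C"), ("B", "C"),
               ("A", "D"), ("B", "D"), ("C", "D")]

# Edges of the path A to B to C to D to A to C.
kite = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]

print(odd_vertices(koenigsberg), is_traceable(koenigsberg))
print(odd_vertices(kite), is_traceable(kite))
```

For the second network the two odd vertices are A and C, which are exactly the
start and finish of the tracing path.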


Example 4 Traceability and Degree

For each of the following networks,
a) list the number of vertices with odd degree and with even degree
b)   determine if the network is traceable
[Networks i) to iv)]




Solution
i)   a) 3 even vertices, 0 odd vertices     b) traceable
ii)  a) 0 even vertices, 4 odd vertices     b) non-traceable
iii) a) 3 even vertices, 2 odd vertices     b) traceable
iv)  a) 1 even vertex, 4 odd vertices       b) non-traceable

If it is possible for a network to be drawn on a two-dimensional surface so
that the edges do not cross anywhere except at vertices, it is planar.

Example 5 Planar Networks

Determine whether each of the following networks is planar.
[Networks a) to e)]

Solution

a)    Planar

b)    Planar

c)    Planar

d)    The network can be redrawn so that its edges do not cross.
     Therefore, the network is planar.

e)    The network cannot be redrawn as a planar network.
     Therefore, the network is non-planar.


Example 6 Map Colouring (The Four-Colour Problem)

A graphic designer is working on a logo representing the different tourist
regions in Ontario. What is the minimum number of colours required for
the design shown on the right to have all adjacent areas coloured
differently?

[Logo design with regions A, B, C, D, and E]

Solution

Because the logo is two-dimensional, you can redraw it as a planar network
as shown on the right. This network diagram can help you see the
relationships between the regions. The vertices represent the regions and
the edges show which regions are adjacent. Vertices A and E both connect
to the three other vertices but not to each other. Therefore, A and E can
have the same colour, but it must be different from the colours for B, C,
and D. Vertices B, C, and D all connect to each other, so they require
three different colours. Thus, a minimum of four colours is necessary for
the logo.
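
The result can be confirmed by trying every possible colouring, as in this
Python sketch. The edge list is taken from the solution above (A and E each
connect to B, C, and D; B, C, and D all connect to each other). The function
name min_colours is illustrative, and the brute-force search is practical only
for small networks such as this one.

```python
from itertools import product

def min_colours(vertices, edges):
    """Try 1, 2, 3, ... colours until some assignment gives the two ends
    of every edge different colours."""
    for k in range(1, len(vertices) + 1):
        for choice in product(range(k), repeat=len(vertices)):
            colour = dict(zip(vertices, choice))
            if all(colour[u] != colour[v] for u, v in edges):
                return k, colour
    return 0, {}  # only reached when there are no vertices

regions = ["A", "B", "C", "D", "E"]
borders = [("A", "B"), ("A", "C"), ("A", "D"),
           ("E", "B"), ("E", "C"), ("E", "D"),
           ("B", "C"), ("B", "D"), ("C", "D")]

k, colouring = min_colours(regions, borders)
print(k)          # number of colours needed
print(colouring)  # one colouring that uses k colours
```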




This example is a specific case of a famous problem in graph
theory called the four-colour problem. As you probably
conjectured in the investigation at the start of this
section, the maximum number of colours required in
any planar map is four. This fact had been suspected
for more than a century but was not proven until 1976. The
proof by Wolfgang Haken and Kenneth Appel at
the University of Illinois required a supercomputer to
break the proof down into cases and many years of
verification by other mathematicians. Non-planar maps
can require more colours.

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links to find out more about the
four-colour problem. Write a short report on the history of the four-colour
problem.

Example 7 Scheduling

The mathematics department has five committees. Each of these committees
meets once a month. Membership on these committees is as follows:

Committee A: Szczachor, Large, Ellis
Committee B: Ellis, Wegrynowski, Ho, Khan
Committee C: Wegrynowski, Large
Committee D: Andrew, Large, Szczachor
Committee E: Bates, Card, Khan, Szczachor

What is the minimum number of time slots needed to schedule the
committee meetings with no conflicts?

Solution

Draw the schedule as a network, with each vertex representing a different
committee and each edge representing a potential conflict between
committees (a person on two or more committees). Analyse the network
as if you were colouring a map.

[Network with vertices A, B, C, D, and E]

The network can be drawn as a planar graph. Therefore, at most four time
slots are needed to “colour” this graph. Because Committee A
is connected to the four other committees (degree 4), at least two
time slots are necessary: one for committee A and at least one for
all the other committees. Because each of the other nodes has degree 3,
each is also connected to two of the other committees, so
at least one more time slot is necessary. In fact, three time slots are
sufficient since B is not connected to D and C is not connected to E.

  Time Slot   Committees
      1           A
      2           B, D
      3           C, E

Project Prep
Graph theory provides problem-solving techniques that will be useful in
your tools for data management project.
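
The schedule in Example 7 can be reproduced with a short Python sketch that
builds the conflict network from the membership lists and then searches for
the smallest number of time slots. The function name schedule is illustrative,
and the exhaustive search is an assumption made for simplicity; it is
practical only for a handful of committees.

```python
from itertools import combinations, product

committees = {
    "A": {"Szczachor", "Large", "Ellis"},
    "B": {"Ellis", "Wegrynowski", "Ho", "Khan"},
    "C": {"Wegrynowski", "Large"},
    "D": {"Andrew", "Large", "Szczachor"},
    "E": {"Bates", "Card", "Khan", "Szczachor"},
}

# An edge joins two committees that share at least one member.
conflicts = [(p, q) for p, q in combinations(sorted(committees), 2)
             if committees[p] & committees[q]]

def schedule(names, conflicts):
    """Find the fewest slots such that no conflicting pair shares a slot."""
    for k in range(1, len(names) + 1):
        for choice in product(range(1, k + 1), repeat=len(names)):
            slot = dict(zip(names, choice))
            if all(slot[p] != slot[q] for p, q in conflicts):
                return k, slot
    return 0, {}  # only reached when there are no committees

names = sorted(committees)
slots_needed, slot = schedule(names, conflicts)
print(conflicts)
print(slots_needed, slot)
```

Each edge in conflicts corresponds to at least one person who sits on both
committees, so committees that share a time slot never share a member.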


Key Concepts

     • In graph theory, a graph is also known as a network and is a collection of line
       segments (edges) and nodes (vertices).

     • If two vertices are connected by an edge, they are adjacent. The degree of a
       vertex is equal to the number of edges that begin or end at the vertex.

     • A path is a connected sequence of vertices. A path is a circuit if it begins and
       ends at the same vertex.

     • A connected network has at least one path connecting each pair of vertices.
       A complete network has an edge connecting every pair of vertices.

     • A connected network is traceable if it has only vertices of even degree (even
       vertices) or exactly two vertices of odd degree (odd vertices). If the network
       has two vertices of odd degree, the tracing must begin at one of the odd
       vertices and end at the other.

     • A network is planar if its edges do not cross anywhere except at the vertices.

     • The maximum number of colours required to colour any planar map is four.


     Communicate Your Understanding

      1. Describe how to convert a map into a network. Use an example to aid in
        your description.

      2. A network has five vertices of even degree and three vertices of odd degree.
        Using a diagram, show why this graph cannot be traceable.

      3. A modern zoo contains natural habitats for its animals. However, many of
        the animals are natural enemies and cannot be placed in the same habitat.
        Describe how to use graph theory to determine the number of different
        habitats required.




Practise

A
1. For each network,
    i) find the degree of each vertex
    ii) state whether the network is traceable
    a) [Network with vertices A, B, C, D, and E]
    b) [Network with vertices P, Q, R, S, T, and U]

2. Draw a network diagram representing the maps in questions 1d) and 1e) of
    the investigation on pages 41 and 42.

3. a) Look at a map of Canada. How many colours are needed to colour the ten
         provinces and three territories of Canada?
    b) How many colours are needed if the map includes the U.S.A. coloured
         with a single colour?

Apply, Solve, Communicate

B
4. The following map is made up of curved lines that cross each other and
    stop only at the boundary of the map. Draw three other maps using similar
    lines. Investigate the four maps and make a conjecture of how many
    colours are needed for this type of map.
    [Map]

5. Is it possible to add one bridge to the Koenigsberg map to make it
    traceable? Provide evidence for your answer.

6. Inquiry/Problem Solving The following chart indicates the subjects studied
    by five students.

    Student      Subjects
    C. Powell    English, French, History, Music
    B. Bates     Calculus, French, Geometry, Physics
    G. Farouk    Calculus, French, Geography, Music
    E. Ho        Calculus, English, Geometry, Mathematics of Data Management
    N. Khan      English, Geography, Mathematics of Data Management, Physics

    a) Draw a network to illustrate the overlap of subjects these students
         study.
    b) Use your network to design an examination timetable without conflicts.
    (Hint: Consider each subject to be one vertex of a network.)

7. A highway inspector wants to travel each road shown once and only once to
    inspect for winter damage. Determine whether it is possible to do so for
    each map shown below.
    a) [Map]
    b) [Map]
 8. Inquiry/Problem Solving
     a) Find the degree of each vertex in the network shown.
        [Network with vertices A, B, C, and D]
     b) Find the sum of the degrees of the vertices.
     c) Compare this sum with the number of edges in the network. Investigate
        other networks and determine the sum of the degrees of their vertices.
     d) Make a conjecture from your observations.

 9. a) The following network diagram of the main floor of a large house uses
        vertices to represent rooms and edges to represent doorways. The
        exterior of the house can be treated as one room. Sketch a floor plan
        based on this network.
        [Network with vertices Library, Conservatory, Kitchen, Dining Room,
        Family Room, Hallway, Living Room, Tea Room, Parlour, and Exterior]
     b) Draw a floor plan and a network diagram for your own home.

10. Application
     a) Three houses are located at positions A, B, and C, respectively.
        Water, gas, and electrical utilities are located at positions D, E,
        and F, respectively. Determine whether the houses can each be
        connected to all three utilities without any of the connections
        crossing. Provide evidence for your decision. Is it necessary to
        reposition any of the utilities? Explain.
        [Diagram: houses A, B, and C in one column; utilities D, E, and F in
        a second column]
     b) Show that a network representing two houses attached to n utilities
        is planar.

11. The four Anderson sisters live near each other and have connected their
     houses by a network of paths such that each house has a path leading
     directly to each of the other three houses. None of these paths
     intersect. Can their brother Warren add paths from his house to each of
     his sisters’ houses without crossing any of the existing paths?

12. In the diagram below, a sheet of paper with a circular hole cut out
     partially covers a drawing of a closed figure. Given that point A is
     inside the closed figure, determine whether point B is inside or
     outside. Provide reasons for your answer.
     [Diagram showing points A and B]
13. Application A communications network              15. In a communications network, the optimal
   between offices of a company needs to                  path is the one that provides the fastest link.
   provide a back-up link in case one part of a          In the network shown, all link times are in
   path breaks down. For each network below,             seconds.
   determine which links need to be backed up.             Thunder Bay
   Describe how to back up the links.                                             2.7
   a)   Thunder Bay                                                                            Sudbury
                                  Sudbury                                  4.5          1.7
                                                                                          North Bay
                                                                                 2.3        2.0
                            North Bay
                                                                                  0.5    Ottawa
                                                                    Kitchener
        Kitchener                  Ottawa                                           0.8
                                                                          1.2 0.6
                                                                                        1.2
                                                                     Windsor 1.4      Hamilton

                                 Hamilton                Determine the optimal path from
         Windsor                                          a) Thunder Bay to Windsor
   b)                                                     b) Hamilton to Sudbury
                        Charlottetown
                                                          c) Describe the method you used to
          Halifax
                                                             estimate the optimal path.
          Toronto
                              Montréal                16. A salesperson must travel by air to all of the
         Kingston
                                                         cities shown in the diagram below. The
                              Winnipeg
        Saskatoon                                        diagram shows the cheapest one-way fare for
flights between the cities. Determine the least expensive travel route beginning and
ending in Toronto.

[Figure: airfare map connecting Vancouver, Calgary, Thunder Bay, Sudbury, Windsor, Toronto, Montréal, and Halifax, with a fare in dollars marked on each route]

14. During an election campaign, a politician will visit each of the cities on the map below.

[Figure: road map connecting Stratford, Waterloo, Guelph, Orangeville, Woodstock, Cambridge, Brantford, and Hamilton, with a distance marked on each road]

    a) Is it possible to visit each city only once?
    b) Is it possible to begin and end in the same city?
    c) Find the shortest route for visiting all the cities. (Hint: You can usually find the
        shortest paths by considering the shortest edge at each vertex.)

ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

17. The diagram below shows the floor plan of a house.

     [Figure: floor plan of a house]

     a) Find a route that passes through each doorway of this house exactly once.
     b) Use graph theory to explain why such a route is possible.
     c) Where could you place two exterior doors so that it is possible to start
         outside the house, pass through each doorway exactly once, and end up on
         the exterior again? Explain your reasoning.
     d) Is a similar route possible if you add three exterior doors instead of two?
         Explain your answer.

 C
18. a) Six people at a party are seated at a table. No three people at the table know
         each other. For example, if Aaron knows Carmen and Carmen knows Allison, then
         Aaron and Allison do not know each other. Show that at least three of the six
         people seated at the table must be strangers to each other. (Hint: Model this
         situation using a network with six vertices.)
     b) Show that, among five people, it is possible that no three all know each
         other and that no three are all strangers.

19. Inquiry/Problem Solving Use graph theory to determine if it is possible to draw the
     diagram below using only three strokes of a pencil.

     [Figure: diagram to be drawn]

20. Communication
     a) Can a connected graph of six vertices be planar? Explain your answer.
     b) Can a complete graph of six vertices be planar? Explain.

21. Can the graph below represent a map in two dimensions? Explain.

     [Figure: graph with vertices A, B, C, D, and E]

22. Can a network have exactly one vertex with an odd degree? Provide evidence to
     support your answer.

23. Communication A graph is regular if all its vertices have the same degree. Consider
     graphs that do not have either loops connecting a vertex back to itself or multiple
     edges connecting any pair of vertices.
     a) Draw the four regular planar graphs that have four vertices.
     b) How many regular planar graphs with five vertices are there?
     c) Explain the difference between your results in parts a) and b).




1.6         Modelling With Matrices

  A matrix is a rectangular array of numbers used to manage and organize data,
  somewhat like a table or a page in a spreadsheet. Matrices are made up of horizontal
  rows and vertical columns and are usually enclosed in square brackets. Each number
  appearing in the matrix is called an entry. For instance,

  $$A = \begin{bmatrix} 5 & -2 & 3 \\ 2 & 1 & 0 \end{bmatrix}$$

  is a matrix with two rows and three columns, with entries 5, −2, and 3 in the first row
  and entries 2, 1, and 0 in the second row. The dimensions of this matrix are 2 × 3.
  A matrix with m rows and n columns has dimensions of m × n.
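
  The definition above can be tried out in code. The following is only a minimal sketch in Python, assuming a matrix is stored as a list of rows; the helper name `dimensions` is made up for this illustration and is not part of the text.

```python
# The 2 x 3 matrix A from the paragraph above, stored as a list of rows.
A = [
    [5, -2, 3],
    [2, 1, 0],
]


def dimensions(matrix):
    """Return (m, n): the number of rows and the number of columns."""
    m = len(matrix)
    n = len(matrix[0]) if m else 0
    return m, n


m, n = dimensions(A)
print(f"A has dimensions {m} x {n}")
print("First row:", A[0])
print("Second row:", A[1])
```

  Rows are counted first and columns second, which matches the m × n convention.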


      INVESTIGATE & INQUIRE: Olympic Medal Winners

      At the 1998 Winter Olympic games in
      Nagano, Japan, Germany won 12 gold,
      9 silver, and 8 bronze medals; Norway
      won 10 gold, 10 silver, and 5 bronze
      medals; Russia won 9 gold, 6 silver, and
      3 bronze medals; Austria won 3 gold,
      5 silver, and 9 bronze medals; Canada
      won 6 gold, 5 silver, and 4 bronze
      medals; and the United States won
      6 gold, 3 silver, and 4 bronze medals.

       1. Organize the data using a matrix
         with a row for each type of medal
         and a column for each country.
       2. State the dimensions of the matrix.
       3. a) What is the meaning of the entry in row 3, column 1?
          b) What is the meaning of the entry in row 2, column 4?
       4. Find the sum of all the entries in the first row of the matrix. What is the
         significance of this row sum? What would the column sum represent?
       5. Use your matrix to estimate the number of medals each country would
         win if the number of Olympic events were to be increased by 20%.
       6. a) Interchange the rows and columns in your matrix by “reflecting” the
             matrix in the diagonal line beginning at row 1, column 1.
          b) Does this transpose matrix provide the same information? What
             are its dimensions?
       7. State one advantage of using matrices to represent data.



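In general, use a capital letter as the symbol for a matrix and represent each
entry using the corresponding lowercase letter with two indices. For example,

$$
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\qquad
B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}
\qquad
C = \begin{bmatrix}
c_{11} & c_{12} & c_{13} & \cdots & c_{1n} \\
c_{21} & c_{22} & c_{23} & \cdots & c_{2n} \\
c_{31} & c_{32} & c_{33} & \cdots & c_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
c_{m1} & c_{m2} & c_{m3} & \cdots & c_{mn}
\end{bmatrix}
$$

Here, $a_{ij}$, $b_{ij}$, and $c_{ij}$ represent the entries in row i and column j of these matrices.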

The transpose of a matrix is indicated by a superscript t, so the transpose of A is
shown as $A^t$. A matrix with only one row is called a row matrix, and a matrix
with only one column is a column matrix. A matrix with the same number of
rows as columns is called a square matrix.

$$
\begin{bmatrix} 1 & -2 & 5 & -9 \end{bmatrix}
\qquad
\begin{bmatrix} -3 \\ 0 \\ 5 \end{bmatrix}
\qquad
\begin{bmatrix} 3 & 4 & 9 \\ -1 & 0 & 2 \\ 5 & -10 & -3 \end{bmatrix}
$$

From left to right: a row matrix, a column matrix, and a square matrix.
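
One way to experiment with these definitions is sketched below in Python, again assuming a matrix is a list of rows. The function names `transpose` and `describe` are illustrative choices only. Note that Python counts positions from 0, so the entry written with indices i and j in the text is found at `matrix[i - 1][j - 1]`.

```python
def transpose(matrix):
    """Interchange rows and columns: the entry in row i, column j moves to row j, column i."""
    return [list(column) for column in zip(*matrix)]


def describe(matrix):
    """List which of the three named shapes apply to a matrix."""
    rows, cols = len(matrix), len(matrix[0])
    kinds = []
    if rows == 1:
        kinds.append("row matrix")
    if cols == 1:
        kinds.append("column matrix")
    if rows == cols:
        kinds.append("square matrix")
    return kinds


# The three example matrices shown above.
row = [[1, -2, 5, -9]]
column = [[-3], [0], [5]]
square = [[3, 4, 9], [-1, 0, 2], [5, -10, -3]]

for name, matrix in [("row", row), ("column", column), ("square", square)]:
    print(name, describe(matrix), "transpose:", transpose(matrix))
```

Transposing the row matrix produces a column matrix, and transposing a square matrix leaves its dimensions unchanged.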


Example 1 Representing Data With a Matrix

The numbers of seats in the House of Commons won by each party in the
federal election in 1988 were Bloc Québécois (BQ), 0; Progressive Conservative
Party (PC), 169; Liberal Party (LP), 83; New Democratic Party (NDP), 43;
Reform Party (RP), 0; Other, 0. In 1993, the numbers of seats won were BQ, 54;
PC, 2; LP, 177; NDP, 9; RP, 52; Other, 1. In 1997, the numbers of seats won
were BQ, 44; PC, 20; LP, 155; NDP, 21; RP, 60; Other, 1.
a)   Organize the data using a matrix S with a row for each political party.
b)   What are the dimensions of your matrix?
c)   What does the entry $s_{43}$ represent?
d)   What entry has the value 52?
e)   Write the transpose matrix for S. Does $S^t$ provide the same information
     as S?
f)   The results from the year 2000 federal election were Bloc Québécois, 38;
     Progressive Conservative, 12; Liberal, 172; New Democratic Party, 13;
     Canadian Alliance (formerly Reform Party), 66; Other, 0. Update your
     matrix to include the results from the 2000 federal election.




Solution

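a) With columns for 1988, 1993, and 1997 (left to right) and one row for each party:

$$
S = \begin{bmatrix}
0 & 54 & 44 \\
169 & 2 & 20 \\
83 & 177 & 155 \\
43 & 9 & 21 \\
0 & 52 & 60 \\
0 & 1 & 1
\end{bmatrix}
\begin{matrix} \text{BQ} \\ \text{PC} \\ \text{LP} \\ \text{NDP} \\ \text{RP} \\ \text{Other} \end{matrix}
$$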

     Labelling the rows and columns in large matrices can help you keep track
     of what the entries represent.

b) The dimensions of the matrix are 6 × 3.

c) The entry $s_{43}$ shows that the NDP won 21 seats in 1997.

d) The entry $s_{52}$ has the value 52.

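e) The transpose matrix, with columns for BQ, PC, LP, NDP, RP, and Other (left to right)
     and one row for each year, is

$$
S^t = \begin{bmatrix}
0 & 169 & 83 & 43 & 0 & 0 \\
54 & 2 & 177 & 9 & 52 & 1 \\
44 & 20 & 155 & 21 & 60 & 1
\end{bmatrix}
\begin{matrix} 1988 \\ 1993 \\ 1997 \end{matrix}
$$

     Comparing the entries in the two matrices shows that they do contain
     exactly the same information.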

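f) With columns for 1988, 1993, 1997, and 2000 (left to right):

$$
\begin{bmatrix}
0 & 54 & 44 & 38 \\
169 & 2 & 20 & 12 \\
83 & 177 & 155 & 172 \\
43 & 9 & 21 & 13 \\
0 & 52 & 60 & 66 \\
0 & 1 & 1 & 0
\end{bmatrix}
\begin{matrix} \text{BQ} \\ \text{PC} \\ \text{LP} \\ \text{NDP} \\ \text{CA (RP)} \\ \text{Other} \end{matrix}
$$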
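
The steps in this solution can also be checked with a short program. The sketch below is one possible approach in Python, assuming the matrix is stored as a list of rows; the helper names `entry` and `find_value` are invented for this illustration. Because Python indexes from 0, the helpers convert to and from the 1-based indices used in the text.

```python
parties = ["BQ", "PC", "LP", "NDP", "RP", "Other"]
years = [1988, 1993, 1997]

# Matrix S from part a): one row per party, one column per election year.
S = [
    [0, 54, 44],
    [169, 2, 20],
    [83, 177, 155],
    [43, 9, 21],
    [0, 52, 60],
    [0, 1, 1],
]


def entry(matrix, i, j):
    """Return the entry in row i, column j, using the 1-based indices of the text."""
    return matrix[i - 1][j - 1]


def find_value(matrix, value):
    """Return the 1-based (row, column) positions of every entry equal to value."""
    return [
        (i + 1, j + 1)
        for i, row in enumerate(matrix)
        for j, x in enumerate(row)
        if x == value
    ]


# Part b): dimensions.
print(len(S), "x", len(S[0]))

# Part c): the entry in row 4, column 3, with its row and column labels.
print(parties[3], years[2], entry(S, 4, 3))

# Part d): where the value 52 appears.
print(find_value(S, 52))

# Part e): the transpose has one row per year.
S_t = [list(column) for column in zip(*S)]

# Part f): append the 2000 results as a fourth column.
results_2000 = [38, 12, 172, 13, 66, 0]
S_2000 = [row + [seats] for row, seats in zip(S, results_2000)]
for party, row in zip(parties, S_2000):
    print(party, row)
```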


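Two matrices are equal only if each entry in one matrix is equal to the
corresponding entry in the other. For example,

$$
\begin{bmatrix} \dfrac{3}{2} & \sqrt{16} & (-2)^3 \\ 5^{-1} & -4 & -(-2) \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 1.5 & 4 & -8 \\ \dfrac{1}{5} & -4 & 2 \end{bmatrix}
$$

are equal matrices.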
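
The comparison in this example can be carried out entry by entry. The following Python sketch is illustrative only: it uses the standard `fractions` module so that values such as 3/2 and 1/5 are compared exactly rather than as rounded decimals, and the function name `matrices_equal` is an assumption made for this example.

```python
from fractions import Fraction
import math

# Each entry is written the way it appears in the first matrix.
left = [
    [Fraction(3, 2), math.isqrt(16), (-2) ** 3],
    [Fraction(5) ** -1, -4, -(-2)],
]

# The second matrix, with every entry already simplified.
right = [
    [Fraction("1.5"), 4, -8],
    [Fraction(1, 5), -4, 2],
]


def matrices_equal(p, q):
    """Two matrices are equal if they have the same dimensions and matching entries."""
    if len(p) != len(q):
        return False
    return all(
        len(row_p) == len(row_q) and all(x == y for x, y in zip(row_p, row_q))
        for row_p, row_q in zip(p, q)
    )


print(matrices_equal(left, right))
```

Matrices with different dimensions are never equal, even if they contain the same numbers, which is why the function checks the number of rows and the length of each row first.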




Two or more matrices can be added or subtracted, provided that their
dimensions are the same. To add or subtract matrices, add or subtract the
corresponding entries of each matrix. For example,

$$
\begin{bmatrix} 2 & -1 & 5 \\ 0 & 7 & -8 \end{bmatrix}
+ \begin{bmatrix} 0 & 5 & -3 \\ -2 & 4 & -1 \end{bmatrix}
= \begin{bmatrix} 2 & 4 & 2 \\ -2 & 11 & -9 \end{bmatrix}
$$

Matrices can be multiplied by a scalar or constant. To multiply a matrix by a
scalar, multiply each entry of the matrix by the scalar. For example,

$$
-3 \begin{bmatrix} 4 & 5 \\ -6 & 0 \\ 3 & -8 \end{bmatrix}
= \begin{bmatrix} -12 & -15 \\ 18 & 0 \\ -9 & 24 \end{bmatrix}
$$
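
Both operations work entry by entry, so they translate directly into code. The sketch below is a minimal Python illustration using lists of rows; the function names `add` and `scalar_multiply` are chosen for this example and do not come from the text.

```python
def add(p, q):
    """Add two matrices with the same dimensions, entry by entry."""
    if len(p) != len(q) or any(len(rp) != len(rq) for rp, rq in zip(p, q)):
        raise ValueError("matrices must have the same dimensions")
    return [[x + y for x, y in zip(rp, rq)] for rp, rq in zip(p, q)]


def scalar_multiply(k, matrix):
    """Multiply every entry of the matrix by the scalar k."""
    return [[k * x for x in row] for row in matrix]


# The addition example from the text.
total = add([[2, -1, 5], [0, 7, -8]], [[0, 5, -3], [-2, 4, -1]])
print(total)

# The scalar multiplication example from the text.
scaled = scalar_multiply(-3, [[4, 5], [-6, 0], [3, -8]])
print(scaled)
```

Subtraction can be written as `add(p, scalar_multiply(-1, q))`, since subtracting a matrix is the same as adding −1 times that matrix.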

Example 2 Inventory Problem

The owner of Lou’s ’Lectronics Limited has two stores. The manager takes
inventory of their top-selling items at the end of the week and notes that at the
eastern store, there are 5 video camcorders, 7 digital cameras, 4 CD players,
10 televisions, 3 VCRs, 2 stereo systems, 7 MP3 players, 4 clock radios, and
1 DVD player in stock. At the western store, there are 8 video camcorders,
9 digital cameras, 3 CD players, 8 televisions, 1 VCR, 3 stereo systems, 5 MP3
players, 10 clock radios, and 2 DVD players in stock. During the next week,
the eastern store sells 3 video camcorders, 2 digital cameras, 4 CD players,
3 televisions, 3 VCRs, 1 stereo system, 4 MP3 players, 1 clock radio, and no
DVD players. During the same week, the western store sells 5 video
camcorders, 3 digital cameras, 3 CD players, 8 televisions, no VCRs, 1 stereo
system, 2 MP3 players, 7 clock radios, and 1 DVD player. The warehouse then
sends each store 4 video camcorders, 3 digital cameras, 4 CD players,
4 televisions, 5 VCRs, 2 stereo systems, 2 MP3 players, 3 clock radios, and
1 DVD player.
a)       Use matrices to determine how many of each item are in stock at the stores
         after receiving the new stock from the warehouse.
b)       Immediately after receiving the new stock, the manager phones the head
         office and requests an additional 25% of the items presently in stock in
         anticipation of an upcoming one-day sale. How many of each item will be
         in stock at each store?




Solution 1    Using Pencil and Paper

a) Let matrix A represent the initial inventory, matrix B represent the number of items
   sold, and matrix C represent the items in the first shipment of new stock.
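   In each matrix, the first column is the eastern store (E) and the second column is the
   western store (W). The rows are labelled beside matrix A.

$$
A = \begin{bmatrix} 5 & 8 \\ 7 & 9 \\ 4 & 3 \\ 10 & 8 \\ 3 & 1 \\ 2 & 3 \\ 7 & 5 \\ 4 & 10 \\ 1 & 2 \end{bmatrix}
\begin{matrix} \text{camcorders} \\ \text{cameras} \\ \text{CD players} \\ \text{TVs} \\ \text{VCRs} \\ \text{stereos} \\ \text{MP3 players} \\ \text{clock radios} \\ \text{DVD players} \end{matrix}
\qquad
B = \begin{bmatrix} 3 & 5 \\ 2 & 3 \\ 4 & 3 \\ 3 & 8 \\ 3 & 0 \\ 1 & 1 \\ 4 & 2 \\ 1 & 7 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 4 & 4 \\ 3 & 3 \\ 4 & 4 \\ 4 & 4 \\ 5 & 5 \\ 2 & 2 \\ 2 & 2 \\ 3 & 3 \\ 1 & 1 \end{bmatrix}
$$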

   Since the dimensions of matrices A, B, and C are the same, matrix addition and
   subtraction can be performed. Then, the stock on hand before the extra shipment is
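$$
D = A - B + C
= \begin{bmatrix} 5 & 8 \\ 7 & 9 \\ 4 & 3 \\ 10 & 8 \\ 3 & 1 \\ 2 & 3 \\ 7 & 5 \\ 4 & 10 \\ 1 & 2 \end{bmatrix}
- \begin{bmatrix} 3 & 5 \\ 2 & 3 \\ 4 & 3 \\ 3 & 8 \\ 3 & 0 \\ 1 & 1 \\ 4 & 2 \\ 1 & 7 \\ 0 & 1 \end{bmatrix}
+ \begin{bmatrix} 4 & 4 \\ 3 & 3 \\ 4 & 4 \\ 4 & 4 \\ 5 & 5 \\ 2 & 2 \\ 2 & 2 \\ 3 & 3 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 6 & 7 \\ 8 & 9 \\ 4 & 4 \\ 11 & 4 \\ 5 & 6 \\ 3 & 4 \\ 5 & 5 \\ 6 & 6 \\ 2 & 2 \end{bmatrix}
$$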
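b) Let E represent the stock in the stores after the extra shipment from the warehouse.

$$
E = 125\% \times D
= 1.25 \begin{bmatrix} 6 & 7 \\ 8 & 9 \\ 4 & 4 \\ 11 & 4 \\ 5 & 6 \\ 3 & 4 \\ 5 & 5 \\ 6 & 6 \\ 2 & 2 \end{bmatrix}
= \begin{bmatrix} 7.5 & 8.75 \\ 10 & 11.25 \\ 5 & 5 \\ 13.75 & 5 \\ 6.25 & 7.5 \\ 3.75 & 5 \\ 6.25 & 6.25 \\ 7.5 & 7.5 \\ 2.5 & 2.5 \end{bmatrix}
$$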
   Assuming the manager rounds to the nearest whole number, the eastern store will
   have 8 video camcorders, 10 digital cameras, 5 CD players, 14 televisions,
   6 VCRs, 4 stereo systems, 6 MP3 players, 8 clock radios, and 3 DVD players in
   stock. The western store will have 9 video camcorders, 11 digital cameras,
   5 CD players, 5 televisions, 8 VCRs, 5 stereo systems, 6 MP3 players, 8 clock
   radios, and 3 DVD players in stock.
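
The pencil-and-paper calculation can be reproduced with a short program. The sketch below uses Python with plain lists and is only one way to set this up. The helper `round_half_up` is an assumption introduced here: Python's built-in `round` rounds halves to the nearest even number (so 2.5 becomes 2), whereas the solution above rounds 2.5 up to 3.

```python
import math

items = [
    "camcorders", "cameras", "CD players", "TVs", "VCRs",
    "stereos", "MP3 players", "clock radios", "DVD players",
]

# Each row is [eastern store, western store], in the same order as items.
A = [[5, 8], [7, 9], [4, 3], [10, 8], [3, 1], [2, 3], [7, 5], [4, 10], [1, 2]]  # initial stock
B = [[3, 5], [2, 3], [4, 3], [3, 8], [3, 0], [1, 1], [4, 2], [1, 7], [0, 1]]    # items sold
C = [[4, 4], [3, 3], [4, 4], [4, 4], [5, 5], [2, 2], [2, 2], [3, 3], [1, 1]]    # first shipment

# Part a): D = A - B + C, computed entry by entry.
D = [
    [a - b + c for a, b, c in zip(row_a, row_b, row_c)]
    for row_a, row_b, row_c in zip(A, B, C)
]

# Part b): E = 1.25 * D, a scalar multiplication.
E = [[1.25 * d for d in row] for row in D]


def round_half_up(x):
    """Round a non-negative number to the nearest whole number, with halves rounded up."""
    return math.floor(x + 0.5)


E_rounded = [[round_half_up(e) for e in row] for row in E]

for item, (east, west) in zip(items, E_rounded):
    print(f"{item}: eastern {east}, western {west}")
```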


Solution 2     Using a Graphing Calculator

a) As in the pencil-and-paper solution, let matrix A represent the initial
     inventory, matrix B the items sold, and matrix C the first shipment of
     new stock. Use the MATRX EDIT menu to store matrices. Press ENTER
     to select a matrix name, then key in the dimensions and the entries.
     The calculator will store the matrix until it is cleared or overwritten.
     Matrix names and entries appear in square brackets on the calculator
     screen.

     Use the MATRX NAMES menu to copy the matrices into the expression
     for D, the matrix representing the stock on hand before the extra
     shipment. Just move the cursor to the matrix you need and press ENTER.




b) To find the stock on hand after the extra shipment for the one-day sale,
     multiply matrix D by 1.25 and store the result in matrix E. Then, you
     can use the round function in the MATH NUM menu to display the
     closest whole numbers for the entries in matrix E.


Solution 3     Using a Spreadsheet

a) You can easily perform matrix operations using a spreadsheet. It is also
     easy to add headings and row labels to keep track of what the entries
     represent. Enter each matrix using two adjacent columns: matrix A (initial
     stock) in columns A and B, matrix B (sales) in columns C and D, and
     matrix C (new stock) in columns E and F.

     To find the amount of stock on hand after the first shipment from the
     warehouse, enter the formula A3-C3+E3 in cell H3.

     Then, use the Fill feature to copy this formula for the rest of the entries
     in columns H and I.

b) Use the Fill feature in a similar way to copy the formula for the entries
     in matrix E, the stock on hand after the extra shipment from the
     warehouse. You can use the ROUND function to find the nearest whole
     number automatically. The formula for cell J3, the first entry, is
     ROUND(1.25*H3,0).




Key Concepts

• A matrix is used to manage and organize data.

• A matrix made up of m rows and n columns has dimensions m × n.

• Two matrices are equal if they have the same dimensions and all corresponding
  entries are equal.

• The transpose matrix is found by interchanging rows with the corresponding
  columns.

• To add or subtract matrices, add or subtract the corresponding entries of each
  matrix. The dimensions of the matrices must be the same.
• To multiply a matrix by a scalar, multiply each entry of the matrix by the scalar.

Communicate Your Understanding

 1. Describe how to determine the dimensions of any matrix.

 2. Describe how you know whether two matrices are equal. Use an example to
   illustrate your answer.

 3. Can transpose matrices ever be equal? Explain.

 4. a) Describe how you would add two matrices. Give an example.
    b) Explain why the dimensions of the two matrices need to be the same to add
       or subtract them.

 5. Describe how you would perform scalar multiplication on a matrix. Give an
   example.



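Practise

 A
 1. State the dimensions of each matrix.

     a) $\begin{bmatrix} 4 & 5 & -1 \\ -2 & 3 & 8 \end{bmatrix}$

     b) $\begin{bmatrix} 1 & 0 & -7 \end{bmatrix}$

     c) $\begin{bmatrix} 3 & -9 & -6 \\ 5 & 4 & 7 \\ 1 & 0 & 8 \\ 8 & -1 & 2 \end{bmatrix}$

 2. For the matrix $A = \begin{bmatrix} -5 & 3 & 2 \\ 6 & 0 & -1 \\ 4 & 8 & -3 \\ 7 & 1 & -4 \end{bmatrix}$,
     a) state the value in entry
          i) $a_{21}$          ii) $a_{43}$          iii) $a_{13}$
     b) state the entry with value
          i) 4          ii) −3          iii) 1

 3. Let $A = \begin{bmatrix} a & b & c & d & e \\ f & g & h & i & j \\ k & l & m & n & o \\ p & q & r & s & t \end{bmatrix}$
     and $B = \begin{bmatrix} u & v \\ w & x \\ y & z \end{bmatrix}$.

     For each of the following, replace $a_{ij}$ or $b_{ij}$ with its corresponding entry in the
     above matrices to reveal a secret message.
     a) $a_{33}a_{11}a_{45}a_{43}a_{24}a_{13}a_{15}a_{44}$   $a_{11}a_{43}a_{15}$   $a_{21}b_{11}a_{34}$
     b) $a_{24}$   $a_{32}a_{35}b_{12}a_{15}$   $a_{33}a_{11}a_{45}a_{23}$
     c) $b_{21}a_{35}b_{21}$   $a_{45}a_{23}a_{24}a_{44}$   $a_{24}a_{44}$   $a_{21}b_{11}a_{34}$

 4. a) Give two examples of row matrices and two examples of column matrices.
     b) State the dimensions of each matrix in part a).

 5. a) Give two examples of square matrices.
     b) State the dimensions of each matrix in part a).

 6. a) Write a 3 × 4 matrix, A, with the property that entry $a_{ij} = i + j$.
     b) Write a 4 × 4 matrix, B, with the property that entry
         $b_{ij} = \begin{cases} 3 & \text{if } i = j \\ i \times j & \text{if } i \neq j \end{cases}$

 7. Solve for w, x, y, and z.

     a) $\begin{bmatrix} x & 4 \\ -2 & 4z - 2 \end{bmatrix} = \begin{bmatrix} 3 & y - 1 \\ w & 6 \end{bmatrix}$

     b) [Matrix equation garbled in the source. Legible entries on the left side: w, 3, x2, 2y, 3z;
         on the right side: 8, −8, 2y, 9, 2z − 5]

 8. Let $A = \begin{bmatrix} 2 & -1 \\ 3 & 9 \\ 5 & 0 \\ -4 & 1 \end{bmatrix}$,
     $B = \begin{bmatrix} 3 & 4 \\ -6 & 1 \\ 8 & 2 \\ -1 & -5 \end{bmatrix}$,
     and $C = \begin{bmatrix} 3 & -2 & 6 & 5 \\ 1 & 4 & 0 & -8 \end{bmatrix}$.
     Calculate, if possible,
     a) A + B               b) B + A               c) B − C
     d) 3A                  e) $-\frac{1}{2}B$     f) 2(B − A)
     g) 3A − 2B

 9. Let $A = \begin{bmatrix} 8 & -6 \\ 1 & -2 \\ -4 & 5 \end{bmatrix}$,
     $B = \begin{bmatrix} 0 & -1 \\ 2 & 4 \\ 9 & -3 \end{bmatrix}$,
     and $C = \begin{bmatrix} 2 & 3 \\ 8 & -6 \\ 4 & 1 \end{bmatrix}$.
     Show that
     a) A + B = B + A (commutative property)
     b) (A + B) + C = A + (B + C) (associative property)
     c) 5(A + B) = 5A + 5B (distributive property)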



10. Find the values of w, x, y, and z if

$$
\begin{bmatrix} 5 & -1 & 2 \\ 4 & x & -8 \\ 7 & 0 & 3 \end{bmatrix}
+ 2 \begin{bmatrix} 6 & y & 5 \\ -3 & 2 & 1 \\ 2 & -3 & z \end{bmatrix}
= \frac{1}{2} \begin{bmatrix} 34 & 10 & 24 \\ -4 & 24 & -12 \\ 2w & -12 & 42 \end{bmatrix}
$$

11. Solve each equation.

     a) $\begin{bmatrix} 3 & 0 & 8 \\ 2 & 2 & -5 \end{bmatrix} + A = \begin{bmatrix} 7 & 0 & 1 \\ -4 & 3 & -2 \end{bmatrix}$

     b) $\begin{bmatrix} 5 & 7 \\ 4 & 0 \\ -1 & -3 \end{bmatrix} + y \begin{bmatrix} 1 & 6 \\ 0 & -4 \\ 2 & 5 \end{bmatrix} = \begin{bmatrix} 7 & 19 \\ 4 & -8 \\ 3 & 7 \end{bmatrix}$

Apply, Solve, Communicate

 B
12. Application The map below shows driving distances between five cities in Ontario.

     [Figure: map of Thunder Bay, Sault Ste. Marie, North Bay, Ottawa, and Toronto; the
     distances shown are 710 km, 425 km, 365 km, 350 km, 655 km, and 400 km]

     a) Represent the driving distances between each pair of cities with a matrix, A.
     b) Find the transpose matrix, $A^t$.
     c) Explain how entry $a_{23}$ in matrix A and entry $a_{32}$ in matrix $A^t$ are related.

13. Nobel prizes are awarded for physics, chemistry, physiology/medicine, literature,
     peace, and economic sciences. The top five Nobel prize-winning countries are U.S.A.
     with 67 Nobel prizes in physics, 43 in chemistry, 78 in physiology/medicine, 10 in
     literature, 18 in peace, and 25 in economic sciences; U.K. with 21 Nobel prizes in
     physics, 25 in chemistry, 24 in physiology/medicine, 8 in literature, 13 in
     peace, and 7 in economic sciences; Germany with 20 Nobel prizes in physics, 27 in
     chemistry, 16 in physiology/medicine, 7 in literature, 4 in peace, and 1 in economic
     sciences; France with 12 Nobel prizes in physics, 7 in chemistry, 7 in physiology/
     medicine, 12 in literature, 9 in peace, and 1 in economic sciences; and Sweden with
     4 Nobel prizes in physics, 4 in chemistry, 7 in physiology/medicine, 7 in literature,
     5 in peace, and 2 in economic sciences.
     a) Represent this data as a matrix, N. What are the dimensions of N?
     b) Use row or column sums to calculate how many Nobel prizes have been
         awarded to citizens of each country.

14. The numbers of university qualifications (degrees, certificates, and diplomas) granted
     in Canada for 1997 are as follows: social sciences, 28 421 males and 38 244 females;
     education, 8036 males and 19 771 females; humanities, 8034 males and 13 339 females;
     health professions and occupations, 3460 males and 9613 females; engineering and
     applied sciences, 10 125 males and 2643 females; agriculture and biological sciences,
     4780 males and 6995 females; mathematics and physical sciences, 6749 males and 2989
     females; fine and applied arts, 1706 males and 3500 females; arts and sciences, 1730
     males and 3802 females.

     The numbers for 1998 are as follows: social sciences, 27 993 males and 39 026 females;
     education, 7565 males and 18 391 females; humanities, 7589 males and 13 227 females;
     health professions and occupations, 3514 males and 9144 females; engineering and
     applied sciences, 10 121 males and 2709 females; agriculture and biological sciences,
     4779 males and 7430 females;

     mathematics and physical sciences, 6876 males and 3116 females; fine and
     applied arts, 1735 males and 3521 females; arts and sciences, 1777 males
     and 3563 females.
     a) Enter two matrices in a graphing calculator or spreadsheet: one
        two-column matrix for males and females receiving degrees in 1997 and
        a second two-column matrix for the number of males and females
        receiving degrees in 1998.
     b) How many degrees were granted to males in 1997 and 1998 for each
        field of study?
     c) How many degrees were granted to females in 1997 and 1998 for each
        field of study?
     d) What is the average number of degrees granted to females in 1997 and
        1998 for each field of study?

15. Application The table below shows the population of Canada by age and
     gender in the year 2000.

     Age Group   Number of Males   Number of Females
     0−4               911 028             866 302
     5−9             1 048 247             996 171
     10−14           1 051 525             997 615
     15−19           1 063 983           1 007 631
     20−24           1 063 620           1 017 566
     25−29           1 067 870           1 041 900
     30−34           1 154 071           1 129 095
     35−39           1 359 796           1 335 765
     40−44           1 306 705           1 304 538
     45−49           1 157 288           1 162 560
     50−54           1 019 061           1 026 032
     55−59             769 591             785 657
     60−64             614 659             641 914
     65−69             546 454             590 435
     70−74             454 269             544 008
     75−79             333 670             470 694
     80−84             184 658             309 748
     85−89              91 455             190 960
     90+                34 959              98 587

     a) Create two matrices using the above data, one for males and another
        for females.
     b) What is the total population for each age group?
     c) Suppose that Canada’s population grows by 1.5% in all age groups.
        Calculate the anticipated totals for each age group.

16. Chapter Problem
     a) Prepare a matrix showing the connections for the VIA Rail routes
        shown on page 3. Use a 1 to indicate a direct connection from one
        city to another city. Use a 0 to indicate no direct connection from
        one city to another city. Also, use a 0 to indicate no direct
        connection from a city to itself.
     b) What does the entry in row 4, column 3 represent?
     c) What does the entry in row 3, column 4 represent?
     d) Explain the significance of the relationship between your answers in
        parts b) and c).
     e) Describe what the sum of the entries in the first row represents.
     f) Describe what the sum of the entries in the first column represents.
     g) Explain why your answers in parts e) and f) are the same.

 C
17. Inquiry/Problem Solving Show that for any m × n matrices, A and B
     a) $(A^t)^t = A$      b) $(A + B)^t = A^t + B^t$

18. Communication Make a table to compare matrix calculations with graphing
     calculators and with spreadsheets. What are the advantages,
     disadvantages, and limitations of these technologies?

19. Inquiry/Problem Solving Search the newspaper for data that could be
     organized in a matrix. What calculations could you perform with these
     data in matrix form? Is there any advantage to using matrices for these
     calculations?
1.7           Problem Solving With Matrices

  The previous section demonstrated how to use matrices to model, organize, and
  manipulate data. With multiplication techniques, matrices become a powerful
  tool in a wide variety of applications.

      INVESTIGATE & INQUIRE: Matrix Multiplication

      The National Hockey League standings on March 9, 2001 in the
      Northeast Division are shown below along with the league’s point
      system for a win, loss, tie, or overtime loss (OTL).
       Team        Win   Loss   Tie   OTL
       Ottawa       39     17     8     3
       Buffalo      36     25     5     1
       Toronto      31     23    10     5
       Boston       28     27     6     7
       Montréal     23     36     5     4

       Score   Points
       Win        2
       Loss       0
       Tie        1
       OTL        1

       1. Calculate the number of points
         for each team in the Northeast
         Division using the above tables.
         Explain your method.
       2. a) Represent the team standings
              as a 5 × 4 matrix, A.
          b) Represent the points system
              as a column matrix, B.
       3. Describe a procedure for
         determining the total points for
         Ottawa using the entries in row
         1 of matrix A and column 1 of
         matrix B.
       4. How could you apply this
         procedure to find the points
         totals for the other four teams?
       5. Represent the total points for each team as a column matrix, C.
         How are the dimensions of C related to those of A and B?
       6. Would it make sense to define matrix multiplication using a
         procedure such that A × B = C? Explain your reasoning.




In the above investigation, matrix A has dimensions 5 × 4 and matrix B has
dimensions 4 × 1. Two matrices can be multiplied when their inner dimensions
are equal. The outer dimensions are the dimensions of the resultant matrix
when matrices A and B are multiplied.

$$A_{5 \times 4} \times B_{4 \times 1} = C_{5 \times 1}$$

Here the inner dimensions (4 and 4) are the same, and the outer dimensions
(5 and 1) give the dimensions of the resultant matrix.
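As a rough illustration of the dimension rule, the sketch below multiplies the 5 × 4 standings matrix from the investigation by the 4 × 1 points matrix. The helper name `mat_mul` and the use of plain Python lists of rows are assumptions made for this example; they are not part of the text.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows.

    Raises ValueError when the inner dimensions are not equal.
    """
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must be equal")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

teams = ["Ottawa", "Buffalo", "Toronto", "Boston", "Montréal"]

# 5 x 4 matrix: one row per team; columns are Win, Loss, Tie, OTL
standings = [
    [39, 17, 8, 3],
    [36, 25, 5, 1],
    [31, 23, 10, 5],
    [28, 27, 6, 7],
    [23, 36, 5, 4],
]

# 4 x 1 matrix: points for a Win, Loss, Tie, OTL
points = [[2], [0], [1], [1]]

# (5 x 4) times (4 x 1) gives a 5 x 1 column of point totals
totals = mat_mul(standings, points)
for team, row in zip(teams, totals):
    print(team, row[0])
```

Each total is the sum of the products of a row of the standings with the points column, which is the procedure the investigation asks you to describe.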




Example 1 Multiplying Matrices

Matrix A represents the proportion of students at a high school who have part-time
jobs on Saturdays and the length of their shifts. Matrix B represents the number of
students at each grade level.

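$$
A = \begin{bmatrix}
0.20 & 0.10 & 0.20 & 0.15 \\
0.25 & 0.30 & 0.25 & 0.45 \\
0.05 & 0.25 & 0.15 & 0.10
\end{bmatrix}
\begin{matrix} \le 4\ \text{h} \\ 4.1 - 6\ \text{h} \\ > 6\ \text{h} \end{matrix}
\qquad
B = \begin{bmatrix}
120 & 130 \\
137 & 155 \\
103 & 110 \\
95 & 92
\end{bmatrix}
\begin{matrix} \text{Gr 9} \\ \text{Gr 10} \\ \text{Gr 11} \\ \text{Gr 12} \end{matrix}
$$

The columns of A are Gr 9, Gr 10, Gr 11, and Gr 12. The columns of B are M and F.
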
a) Calculate AB. Interpret what each entry represents.
b)   Calculate BA, if possible.

Solution

a) A and B have the same inner dimensions, so multiplication is possible
     and their product will be a 3 × 2 matrix: $A_{3 \times 4} \times B_{4 \times 2} = C_{3 \times 2}$

$$
\begin{aligned}
AB &= \begin{bmatrix}
0.20 & 0.10 & 0.20 & 0.15 \\
0.25 & 0.30 & 0.25 & 0.45 \\
0.05 & 0.25 & 0.15 & 0.10
\end{bmatrix}
\begin{bmatrix}
120 & 130 \\
137 & 155 \\
103 & 110 \\
95 & 92
\end{bmatrix} \\
&= \begin{bmatrix}
(0.20)(120) + (0.10)(137) + (0.20)(103) + (0.15)(95) & (0.20)(130) + (0.10)(155) + (0.20)(110) + (0.15)(92) \\
(0.25)(120) + (0.30)(137) + (0.25)(103) + (0.45)(95) & (0.25)(130) + (0.30)(155) + (0.25)(110) + (0.45)(92) \\
(0.05)(120) + (0.25)(137) + (0.15)(103) + (0.10)(95) & (0.05)(130) + (0.25)(155) + (0.15)(110) + (0.10)(92)
\end{bmatrix} \\
&\doteq \begin{bmatrix}
73 & 77 \\
140 & 148 \\
65 & 71
\end{bmatrix}
\end{aligned}
$$

     Approximately 73 males and 77 females work up to 4 h; 140 males and
     148 females work 4−6 h; and 65 males and 71 females work more than
     6 h on Saturdays.

b) For $B_{4 \times 2} \times A_{3 \times 4}$, the inner dimensions (2 and 3) are not the same, so BA cannot be calculated.
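The calculation in Example 1 can be checked with a short script. This is only a sketch: the helper names `can_multiply` and `mat_mul` are made up for illustration, and the matrices are entered by hand from the example.

```python
def can_multiply(a, b):
    """Return True when the inner dimensions of a and b are equal."""
    return len(a[0]) == len(b)

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    if not can_multiply(a, b):
        raise ValueError("inner dimensions must be equal")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Matrix A: proportion of students by shift length (rows) and grade (columns)
A = [
    [0.20, 0.10, 0.20, 0.15],   # up to 4 h
    [0.25, 0.30, 0.25, 0.45],   # 4.1 to 6 h
    [0.05, 0.25, 0.15, 0.10],   # more than 6 h
]

# Matrix B: number of students by grade (rows) and gender (columns M, F)
B = [
    [120, 130],
    [137, 155],
    [103, 110],
    [95, 92],
]

AB = mat_mul(A, B)                                   # 3 x 2 product
rounded = [[round(x) for x in row] for row in AB]    # nearest whole student
print(rounded)

# B is 4 x 2 and A is 3 x 4, so the inner dimensions 2 and 3 differ
print(can_multiply(B, A))
```

Rounding each entry to the nearest whole number reproduces the approximate matrix in part a), and the final check reports that BA is not defined.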




Technology is an invaluable tool for solving problems that involve large
amounts of data.

Example 2 Using Technology to Multiply Matrices

The following table shows the number and gender of full-time students
enrolled at each university in Ontario one year.
University      Full-Time Students   Males (%)   Females (%)
Brock                   6509            43           57
Carleton              12 376            55           45
Guelph                11 773            38           62
Lakehead                5308            48           52
Laurentian              3999            43           57
McMaster              13 797            46           54
Nipissing               1763            34           66
Ottawa                16 825            42           58
Queen’s               13 433            44           56
Ryerson               10 266            47           53
Toronto               40 420            44           56
Trent                   3764            36           64
Waterloo              17 568            55           45
Western               21 778            46           54
Wilfrid Laurier         6520            45           55
Windsor                 9987            46           54
York                  27 835            39           61

a)   Set up two matrices, one listing the numbers of full-time students at
     each university and the other the percents of males and females.
b)   Determine the total number of full-time male students and the total
     number of full-time female students enrolled in Ontario universities.


Solution 1    Using a Graphing Calculator

a) Use the MATRX EDIT menu to store a 1 × 17 matrix for the numbers of
     full-time students and a 17 × 2 matrix for the percents of males and
     females. Enter the percents in decimal form (for example, 0.43 for 43%)
     so that the product gives numbers of students.



b) To multiply matrices, use the MATRX NAMES menu. Copy the matrices
     into an expression such as [A]*[B] or [A][B].

     There are 100 299 males and 123 622 females enrolled in Ontario
     universities.



You can also enter matrices directly into an expression by using the square
     brackets keys. This method is simpler for small matrices, but does not store
     the matrix in the MATRX NAMES menu.
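The same product can be sketched in a few lines of code. The layout below (a list of tuples, then a 1 × 17 matrix and a 17 × 2 matrix) is an assumption chosen to mirror the calculator solution; `mat_mul` is a hypothetical helper, not a calculator or spreadsheet function.

```python
# (university, full-time students, % male, % female) from the table in Example 2
data = [
    ("Brock", 6509, 43, 57), ("Carleton", 12376, 55, 45),
    ("Guelph", 11773, 38, 62), ("Lakehead", 5308, 48, 52),
    ("Laurentian", 3999, 43, 57), ("McMaster", 13797, 46, 54),
    ("Nipissing", 1763, 34, 66), ("Ottawa", 16825, 42, 58),
    ("Queen's", 13433, 44, 56), ("Ryerson", 10266, 47, 53),
    ("Toronto", 40420, 44, 56), ("Trent", 3764, 36, 64),
    ("Waterloo", 17568, 55, 45), ("Western", 21778, 46, 54),
    ("Wilfrid Laurier", 6520, 45, 55), ("Windsor", 9987, 46, 54),
    ("York", 27835, 39, 61),
]

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must be equal")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

students = [[n for _, n, _, _ in data]]                       # 1 x 17
proportions = [[m / 100, f / 100] for _, _, m, f in data]     # 17 x 2, as decimals

totals = mat_mul(students, proportions)                       # 1 x 2
males, females = (round(x) for x in totals[0])
print(males, females)
```

Placing the 1 × 17 matrix first makes the inner dimensions match (17 and 17) and leaves a 1 × 2 result: one total for males and one for females.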


Solution 2    Using a Spreadsheet

Enter the number of full-time students at each university as a 17 × 1 matrix in
cells B2 to B18. This placement leaves you the option of putting labels in the
first row and column. Enter the proportion of male and female students as a
2 × 17 matrix in cells D2 to T3.

Both Corel® Quattro Pro and Microsoft® Excel have built-in functions for
multiplying matrices, although the procedures in the two programs differ
somewhat.

Corel® Quattro Pro:
On the Tools menu, select Numeric Tools/Multiply. In the pop-up window,
enter the cell ranges for the two matrices you want to multiply and the cell
where you want the resulting matrix to start. Note that you must list the 2 × 17
matrix first.




Project Prep: You can apply these techniques for matrix multiplication to the
calculations for your tools for data management project.


Microsoft® Excel:
The MMULT(matrix1,matrix2) function will calculate the product of the two
matrices but displays only the first entry of the resulting matrix. Use the INDEX
function to retrieve the entry for a specific row and column of the matrix.




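Identity matrices have the form

$$
I = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{bmatrix}
$$

with entries of 1 along the main diagonal and zeros for all other entries. The
identity matrix with dimensions n × n is represented by $I_n$. It can easily be
shown that $A_{m \times n} I_n = A_{m \times n}$ for any m × n matrix A.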

For most square matrices, there exists an inverse matrix $A^{-1}$ with the
property that $AA^{-1} = A^{-1}A = I$. Note that $A^{-1} \neq \frac{1}{A}$.

For 2 × 2 matrices,

$$
AA^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} w & x \\ y & z \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
$$

Multiplying the matrices gives four simultaneous equations for w, x, y, and z.
Solving these equations yields

$$
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
$$

You can confirm that $A^{-1}A = I$, also. If ad = bc, then $A^{-1}$ does not
exist since it would require dividing by zero.
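The four simultaneous equations are not written out in the text. One way to set them up and solve them, using the same symbols a, b, c, d and w, x, y, z, is sketched here.

```latex
\begin{aligned}
aw + by &= 1, & ax + bz &= 0, \\
cw + dy &= 0, & cx + dz &= 1.
\end{aligned}

% Eliminate y from the left-hand pair: d(aw + by) - b(cw + dy) = d
(ad - bc)\,w = d
\quad\Rightarrow\quad
w = \frac{d}{ad - bc}, \qquad y = \frac{-c}{ad - bc}.

% Eliminate z from the right-hand pair: d(ax + bz) - b(cx + dz) = -b
(ad - bc)\,x = -b
\quad\Rightarrow\quad
x = \frac{-b}{ad - bc}, \qquad z = \frac{a}{ad - bc}.
```

Every solution has the denominator ad − bc, which is why the inverse does not exist when ad = bc.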

The formulas for the inverses of larger matrices can be determined in the
same way as for 2 × 2 matrices, but the calculations become much more
involved. However, it is relatively easy to find the inverses of larger matrices
with graphing calculators since they have the necessary formulas built in.


Example 3 Calculating the Inverse Matrix

Calculate, if possible, the inverse of

a) $A = \begin{bmatrix} 3 & 7 \\ 4 & -2 \end{bmatrix}$    b) $B = \begin{bmatrix} 6 & 8 \\ 3 & 4 \end{bmatrix}$

Solution 1     Using Pencil and Paper

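a)
$$
\begin{aligned}
A^{-1} &= \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \\
&= \frac{1}{(3)(-2) - (7)(4)} \begin{bmatrix} -2 & -7 \\ -4 & 3 \end{bmatrix} \\
&= -\frac{1}{34} \begin{bmatrix} -2 & -7 \\ -4 & 3 \end{bmatrix} \\
&= \begin{bmatrix} \frac{1}{17} & \frac{7}{34} \\ \frac{2}{17} & -\frac{3}{34} \end{bmatrix}
\end{aligned}
$$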

b)   For B, ad − bc = (6)(4) − (8)(3) = 0, so $B^{-1}$ does not exist.
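The pencil-and-paper method translates directly into code. The function name `inverse_2x2` is hypothetical; the sketch uses Python's `fractions.Fraction` so the entries stay as exact fractions rather than decimals.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Return the inverse of a 2 x 2 matrix [[a, b], [c, d]], or None if ad = bc."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        return None          # the formula would require dividing by zero
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[3, 7], [4, -2]]
B = [[6, 8], [3, 4]]

print(inverse_2x2(A))   # exact fractions for each entry
print(inverse_2x2(B))   # None, since (6)(4) - (8)(3) = 0
```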


Solution 2     Using a Graphing Calculator

a) Use the MATRX EDIT menu to store the 2 × 2 matrix. Retrieve it with the
     MATRX NAMES menu, then use the $x^{-1}$ key to find the inverse. To verify
     that the decimal numbers shown are equal to the fractions in the
     pencil-and-paper solution, use the ►Frac function from the MATH NUM menu.




b) For B, the calculator shows that the inverse cannot be calculated.




Solution 3   Using a Spreadsheet

The spreadsheet functions for inverse matrices are similar to those for matrix
multiplication.
a) Enter the matrix in cells A1 to B2.

   In Corel® Quattro Pro, use Tools/Numeric Tools/Invert… to enter the range
   of cells for the matrix and the cell where you want the inverse matrix to start.
   Use the Fraction feature to display the entries as fractions rather than
   decimal numbers.

   In Microsoft® Excel, use the MINVERSE function to produce the inverse
   matrix and the INDEX function to access the entries in it. If you put absolute
   cell references in the MINVERSE function for the first entry, you can use the
   Fill feature to generate the formulas for the other entries.
   Use the Fraction feature to display the entries as fractions rather than
   decimal numbers.




During the 1930s, Lester Hill, an American mathematician, developed methods
for using matrices to decode messages. The following example illustrates a
simplified version of Hill’s technique.


Example 4 Coding a Message Using Matrices

a)   Encode the message PHONE ME TONIGHT using 2 × 2 matrices.
b)   Determine the matrix key required to decode the message.

Solution

a) Write the message using 2 × 2 matrices. Fill in any missing entries with
     the letter Z.

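$$
\begin{bmatrix} \text{P} & \text{H} \\ \text{O} & \text{N} \end{bmatrix},\
\begin{bmatrix} \text{E} & \text{M} \\ \text{E} & \text{T} \end{bmatrix},\
\begin{bmatrix} \text{O} & \text{N} \\ \text{I} & \text{G} \end{bmatrix},\
\begin{bmatrix} \text{H} & \text{T} \\ \text{Z} & \text{Z} \end{bmatrix}
$$
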
     Replace each letter with its corresponding number in the alphabet.

      A  B  C  D  E  F  G  H  I  J  K  L  M
      1  2  3  4  5  6  7  8  9 10 11 12 13

      N  O  P  Q  R  S  T  U  V  W  X  Y  Z
     14 15 16 17 18 19 20 21 22 23 24 25 26


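$$
\begin{bmatrix} 16 & 8 \\ 15 & 14 \end{bmatrix},\
\begin{bmatrix} 5 & 13 \\ 5 & 20 \end{bmatrix},\
\begin{bmatrix} 15 & 14 \\ 9 & 7 \end{bmatrix},\
\begin{bmatrix} 8 & 20 \\ 26 & 26 \end{bmatrix}
$$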


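     Now, encode the message by multiplying with a coding matrix that only
     the sender and receiver know. Suppose that you chose
     $C = \begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}$ as your coding matrix.

$$
\begin{aligned}
\begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} 16 & 8 \\ 15 & 14 \end{bmatrix} &= \begin{bmatrix} 63 & 38 \\ 110 & 68 \end{bmatrix} \\
\begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} 5 & 13 \\ 5 & 20 \end{bmatrix} &= \begin{bmatrix} 20 & 59 \\ 35 & 105 \end{bmatrix} \\
\begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} 15 & 14 \\ 9 & 7 \end{bmatrix} &= \begin{bmatrix} 54 & 49 \\ 93 & 84 \end{bmatrix} \\
\begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} 8 & 20 \\ 26 & 26 \end{bmatrix} &= \begin{bmatrix} 50 & 86 \\ 92 & 152 \end{bmatrix}
\end{aligned}
$$
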


     You would send the message as 63, 38, 110, 68, 20, 59, 35, 105, 54, 49, 93,
     84, 50, 86, 92, 152.




b) First, rewrite the coded message as 2 × 2 matrices.


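$$
\begin{bmatrix} 63 & 38 \\ 110 & 68 \end{bmatrix},\
\begin{bmatrix} 20 & 59 \\ 35 & 105 \end{bmatrix},\
\begin{bmatrix} 54 & 49 \\ 93 & 84 \end{bmatrix},\
\begin{bmatrix} 50 & 86 \\ 92 & 152 \end{bmatrix}
$$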


   You can decode the message with the inverse matrix for the coding matrix.
       $C^{-1} \times CM = C^{-1}C \times M = IM = M$
   where M is the message matrix and C is the coding matrix.
   Thus, the decoding matrix, or key, is the inverse matrix of the coding
   matrix. For the coding matrix used in part a), the key is

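$$
\begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}^{-1}
= \frac{1}{(3)(2) - (5)(1)} \begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix}
= \begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix}
$$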

   Multiplying the coded message by this key gives
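$$
\begin{aligned}
\begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix} \begin{bmatrix} 63 & 38 \\ 110 & 68 \end{bmatrix} &= \begin{bmatrix} 16 & 8 \\ 15 & 14 \end{bmatrix} \\
\begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix} \begin{bmatrix} 20 & 59 \\ 35 & 105 \end{bmatrix} &= \begin{bmatrix} 5 & 13 \\ 5 & 20 \end{bmatrix} \\
\begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix} \begin{bmatrix} 54 & 49 \\ 93 & 84 \end{bmatrix} &= \begin{bmatrix} 15 & 14 \\ 9 & 7 \end{bmatrix} \\
\begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix} \begin{bmatrix} 50 & 86 \\ 92 & 152 \end{bmatrix} &= \begin{bmatrix} 8 & 20 \\ 26 & 26 \end{bmatrix}
\end{aligned}
$$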
   The decoded message is 16, 8, 15, 14, 5, 13, 5, 20, 15, 14, 9, 7, 8, 20, 26, 26.
   Replacing each number with its corresponding letter in the alphabet gives
   PHONEMETONIGHTZZ, the original message with the two Zs as fillers.
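The encoding and decoding steps of Example 4 can be sketched in code. The function names (`encode`, `decode`, `to_blocks`, `flatten`, `mat_mul_2x2`) are invented for this illustration. The sketch follows the example's conventions: letters map to 1 through 26, the message is split into 2 × 2 blocks filled row by row, and Z is used as filler.

```python
def mat_mul_2x2(a, b):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]

def to_blocks(numbers):
    """Group a flat list into 2 x 2 matrices, four numbers (two rows) at a time."""
    return [[numbers[i:i + 2], numbers[i + 2:i + 4]]
            for i in range(0, len(numbers), 4)]

def flatten(blocks):
    """Read a sequence of 2 x 2 matrices back into one flat list, row by row."""
    return [x for block in blocks for row in block for x in row]

def encode(message, coding):
    letters = [ch for ch in message.upper() if ch.isalpha()]
    letters += ["Z"] * (-len(letters) % 4)               # fill the last block with Z
    numbers = [ord(ch) - ord("A") + 1 for ch in letters]  # A = 1, ..., Z = 26
    return flatten(mat_mul_2x2(coding, block) for block in to_blocks(numbers))

def decode(numbers, key):
    plain = flatten(mat_mul_2x2(key, block) for block in to_blocks(numbers))
    return "".join(chr(n + ord("A") - 1) for n in plain)

C = [[3, 1], [5, 2]]        # coding matrix from part a)
KEY = [[2, -1], [-5, 3]]    # its inverse, the key from part b)

coded = encode("PHONE ME TONIGHT", C)
print(coded)
print(decode(coded, KEY))
```

Running the sketch reproduces the coded list sent in part a) and recovers PHONEMETONIGHTZZ in part b).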



Matrix multiplication and inverse matrices are the basis for many computerized
encryption systems like those used for electronic transactions between banks and
income tax returns filed over the Internet.

Transportation and communication networks can be represented using matrices,
called network matrices. Such matrices provide information on the number
of direct links between two vertices or points (such as people or places). The
advantage of depicting networks using matrices is that information on indirect
routes can be found by performing calculations with the network matrix.

To construct a network matrix, let each vertex (point) be represented as a row
and as a column in the matrix. Use 1 to represent a direct link and 0 to
represent no direct link. A vertex may be linked to another vertex in one
direction or in both directions. Assume that a vertex does not link with itself,
so each entry in the main diagonal is 0. Note that the network matrix provides
information only on direct links.


Example 5 Using Matrices to Model a Network

Matrixville Airlines offers flights between eight cities as shown in the figure.

[Figure: route map connecting Buenos Aires; Honolulu; Kingston, Jamaica; London, England; New Delhi; Paris; Toronto; and Vancouver]

a)   Represent the network using a matrix, A. Organize the matrix so the
     cities are placed in alphabetical order.
b)   Calculate $A^2$. What information does it contain?
c)   How many indirect routes with exactly one change of planes are there
     from London to Buenos Aires?
d)   Calculate $A + A^2$. What information does it contain?
e)   Explain what the entry from Vancouver to Paris in $A + A^2$ represents.
f)   Calculate $A^3$. Compare this calculation with the one for $A^2$.
g)   Explain the significance of any entry in matrix $A^3$.


Solution

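a)
$$
A = \begin{array}{c|cccccccc}
  & B & H & K & L & N & P & T & V \\ \hline
B & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\
H & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
K & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
L & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 \\
N & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
P & 1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 \\
T & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 1 \\
V & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0
\end{array}
$$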

b) Since the dimensions of matrix A are 8 × 8, you may prefer to use a
     calculator or software for this calculation.

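$$
A^2 = \begin{bmatrix}
2 & 1 & 1 & 2 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 2 & 1 & 1 & 2 & 1 & 2 \\
2 & 1 & 1 & 5 & 1 & 2 & 3 & 1 \\
1 & 0 & 1 & 1 & 2 & 1 & 2 & 1 \\
1 & 1 & 2 & 2 & 1 & 4 & 2 & 2 \\
1 & 0 & 1 & 3 & 2 & 2 & 6 & 1 \\
1 & 1 & 2 & 1 & 1 & 2 & 1 & 2
\end{bmatrix}
$$

     The entries in $A^2$ show the number of indirect routes with exactly one
     change of planes. $A^2$ does not contain any information on direct routes.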


c) There are two indirect routes with exactly one change of planes from
     London to Buenos Aires.

     London → Paris → Buenos Aires
     London → Toronto → Buenos Aires

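d)
$$
A + A^2 = \begin{bmatrix}
2 & 1 & 1 & 2 & 1 & 2 & 2 & 1 \\
1 & 1 & 1 & 1 & 0 & 1 & 1 & 1 \\
1 & 1 & 2 & 2 & 1 & 2 & 2 & 2 \\
2 & 1 & 2 & 5 & 2 & 3 & 4 & 2 \\
1 & 0 & 1 & 2 & 2 & 2 & 2 & 1 \\
2 & 1 & 2 & 3 & 2 & 4 & 3 & 2 \\
2 & 1 & 2 & 4 & 2 & 3 & 6 & 2 \\
1 & 1 & 2 & 2 & 1 & 2 & 2 & 2
\end{bmatrix}
$$

     Since A shows the number of direct routes and $A^2$ shows the number of
     routes with one change of planes, $A + A^2$ shows the number of routes with
     at most one change of planes.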

e) The entry in row 8, column 6 of $A + A^2$ shows that there are two routes
     with a maximum of one change of planes from Vancouver to Paris.
     Vancouver → Toronto → Paris
     Vancouver → London → Paris

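f)
$$
A^3 = \begin{bmatrix}
2 & 1 & 3 & 5 & 3 & 6 & 8 & 3 \\
1 & 0 & 1 & 3 & 2 & 2 & 6 & 1 \\
3 & 1 & 2 & 8 & 3 & 4 & 9 & 2 \\
5 & 3 & 8 & 8 & 7 & 11 & 12 & 8 \\
3 & 2 & 3 & 7 & 2 & 6 & 5 & 3 \\
6 & 2 & 4 & 11 & 6 & 6 & 12 & 4 \\
8 & 6 & 9 & 12 & 5 & 12 & 8 & 9 \\
3 & 1 & 2 & 8 & 3 & 4 & 9 & 2
\end{bmatrix}
$$

     The calculation of $A^3 = A^2 \times A$ is more laborious than that for
     $A^2 = A \times A$ since $A^2$ has substantially fewer zero entries than A
     does. A calculator or spreadsheet could be useful.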

g) The entries in $A^3$ tell you the number of indirect routes with exactly two
     changes of planes between each pair of cities.
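The powers of a network matrix are tedious by hand, so a short script is a natural check. In this sketch the list `LINKS` is read off matrix A in part a) and every link is treated as two-way, since A is symmetric. The names `network_matrix`, `mat_mul`, and `mat_add` are illustrative helpers, not part of the text.

```python
CITIES = ["Buenos Aires", "Honolulu", "Kingston", "London",
          "New Delhi", "Paris", "Toronto", "Vancouver"]   # alphabetical order

# Direct links read off matrix A (each one runs in both directions)
LINKS = [
    ("Buenos Aires", "Paris"), ("Buenos Aires", "Toronto"),
    ("Honolulu", "Toronto"), ("Kingston", "London"), ("Kingston", "Toronto"),
    ("London", "New Delhi"), ("London", "Paris"), ("London", "Toronto"),
    ("London", "Vancouver"), ("New Delhi", "Paris"), ("Paris", "Toronto"),
    ("Toronto", "Vancouver"),
]

def network_matrix(cities, links):
    """Build a square 0/1 matrix with a 1 for each direct link and 0 elsewhere."""
    index = {city: k for k, city in enumerate(cities)}
    a = [[0] * len(cities) for _ in cities]
    for u, v in links:
        a[index[u]][index[v]] = 1
        a[index[v]][index[u]] = 1
    return a

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def mat_add(a, b):
    """Add two matrices of the same dimensions entry by entry."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = network_matrix(CITIES, LINKS)
A2 = mat_mul(A, A)           # routes with exactly one change of planes
A3 = mat_mul(A2, A)          # routes with exactly two changes of planes
A_plus_A2 = mat_add(A, A2)   # routes with at most one change of planes

pos = {city: k for k, city in enumerate(CITIES)}
print(A2[pos["London"]][pos["Buenos Aires"]])      # part c)
print(A_plus_A2[pos["Vancouver"]][pos["Paris"]])   # part e)
```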




Key Concepts

     • To multiply two matrices, their inner dimensions must be the same. The outer
       dimensions give the dimensions of the resultant matrix: $A_{m \times n} \times B_{n \times p} = C_{m \times p}$. To
       find the entry with row i and column j of matrix AB, multiply the entries of
       row i of matrix A with the corresponding entries of column j of matrix B, and
       then add the resulting products together.

     • The inverse of the 2 × 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is $A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$,
       provided that ad ≠ bc. Larger inverse matrices can be found using a graphing
       calculator or a spreadsheet.

     • To represent a network as a matrix, use a 1 to indicate a direct link and a 0 to
       indicate no direct link. Calculations with the square of a network matrix and
       its higher powers give information on the various direct and indirect routings
       possible.

     Communicate Your Understanding

      1. Explain how multiplying matrices is different from scalar multiplication of
        matrices.

      2. Describe the steps you would take to multiply
         $\begin{bmatrix} 4 & -2 \\ 1 & 5 \end{bmatrix} \begin{bmatrix} 3 & 6 & 2 & -1 \\ 0 & 4 & 5 & 7 \end{bmatrix}$.

      3. Is it possible to find an inverse for a matrix that is not square? Why or why
        not?

      4. Explain why a network matrix must be square.
      5. Describe how you would represent the following network as a matrix.
         How would you find the number of routes with up to three changeovers?

         [Figure: network diagram with vertices A, B, C, D, and E]



Practise

 A
 1. Let A =
               ΄
             4 7
            −3 −5           1΅
                            0 ,B= 2
                                  −7 ΄         ΅
                                              9 ,
                                              0

         ΄              ΅                           ΄ ΅
          1 5  8                                1
     C = 2 0 −4 , D = −3
         −3 −2 8       2         ΄       5΅
                                         1 ,E= 2 .
                                               −3
     Calculate, if possible,
     a) BD     b) DB     c) $B^2$     d) EA
     e) AC     f) CE     g) DA

 2. Given $A = \begin{bmatrix} 4 & 2 \\ 0 & -1 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 3 \\ -2 & 0 \end{bmatrix}$, show
     that $A^2 + 2B^3 = \begin{bmatrix} 16 & -30 \\ 24 & 1 \end{bmatrix}$.


3. If $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, show that $A^4 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$,
    the 2 × 2 zero matrix.

4. Let A =
                ΄ 5 −1 ΅, B = ΄ −2 4 ΅, C = ΄ 1 −3΅.
                  2
                     0           3
                                   0          0 7
    Show that
    a) A(B + C) = AB + AC (distributive property)
    b) (AB)C = A(BC) (associative property)
    c) AB ≠ BA (not commutative)

8. Application Calculators Galore has three stores in Matrixville. The
    downtown store sold 12 business calculators, 40 scientific calculators,
    and 30 graphing calculators during the past week. The northern store
    sold 8 business calculators, 30 scientific calculators, and 21 graphing
    calculators during the same week, and the southern store sold
    10 business calculators, 25 scientific calculators, and 23 graphing
    calculators. What were the total weekly sales for each store if the
    average price of a business calculator is $40, a scientific calculator
    is $30, and a graphing calculator is $150?

5. Find the inverse matrix, if it exists.

    a) [ 0  −1 ]       b) [  4  −6 ]       c) [  3   0 ]
       [ 2   4 ]          [ −2   3 ]          [ −6   1 ]

    d) [ 5   3 ]       e) [  4   2 ]
       [ 4   2 ]          [ 10   5 ]

6. Use a graphing calculator or a spreadsheet
    to calculate the inverse matrix, if it exists.

    a) A = [  1  −3   1 ]
           [ −2   1   3 ]
           [  0  −1   0 ]

    b) B = [ −2   0   5 ]
           [  2  −1  −1 ]
           [  3   4   0 ]

    c) C = [  2  −1   1   0 ]
           [  0   1   0   2 ]
           [ −2  −1   0   0 ]
           [  1   0  −1   0 ]

Apply, Solve, Communicate

B
7. For A =
                ΄−2 − 4 ΅ and B = ΄ 5 7΅, show that
                  2
                      5             3 4
    a) (A⁻¹)⁻¹ = A
    b) (AB)⁻¹ = B⁻¹A⁻¹
    c) (Aᵗ)⁻¹ = (A⁻¹)ᵗ

9. Application The manager at Sue’s Restaurant prepares the following
    schedule for the next week.

    Employee   Mon.   Tues.  Wed.   Thurs.  Fri.   Sat.   Sun.   Wage Per Hour
    Chris      −      8      −      8       8      −      −      $7.00
    Lee        4      4      −      −       6.5    4      4      $6.75
    Jagjeet    −      4      4      4       4      8      8      $7.75
    Pierre     −      3      3      3       3      8      −      $6.75
    Ming       8      8      8      8       −      −      −      $11.00
    Bobby      −      −      3      5       5      8      −      $8.00
    Nicole     3      3      3      3       3      −      −      $7.00
    Louis      8      8      8      8       8      −      −      $12.00
    Glenda     8      −      −      8       8      8      8      $13.00
    Imran      3      4.5    4      3       5      −      −      $7.75

    a) Create matrix A to represent the number of hours worked per day for
       each employee.
    b) Create matrix B to represent the hourly wage earned by each
       employee.
    c) Use a graphing calculator or spreadsheet to calculate the earnings
       of each employee for the coming week.
    d) What is the restaurant’s total payroll for these employees?
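
The kind of calculation asked for in question 9 can also be sketched in a few lines of Python. The two-employee schedule below is hypothetical (it is not the restaurant's data), and the names `hours`, `wages`, and `weekly_earnings` are illustrative only. Each employee's earnings are taken as the row sum of the hours matrix multiplied by that employee's hourly wage.

```python
# Hypothetical two-employee schedule (not the data from question 9).
# Each row of `hours` is one employee; each column is one day, Mon. to Sun.
hours = [
    [4, 0, 4, 0, 6.5, 0, 0],   # employee 1
    [0, 8, 0, 8, 8, 0, 0],     # employee 2
]
wages = [6.75, 7.00]           # hourly wage of each employee, in dollars


def weekly_earnings(hours, wages):
    """Return each employee's earnings: (row sum of hours) x (hourly wage)."""
    return [sum(row) * wage for row, wage in zip(hours, wages)]


earnings = weekly_earnings(hours, wages)
total_payroll = sum(earnings)

print(earnings)        # [97.875, 168.0]
print(total_payroll)   # 265.875
```

A spreadsheet or graphing calculator would do the same work by multiplying the hours matrix by a column matrix of ones (to get the row sums) and then scaling by the wages.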


10. According to a 1998 general social survey conducted by Statistics
     Canada, the ten most popular sports for people at least 15 years old
     are as follows:

     Sport                    Total (%)   Male (%)   Female (%)
     Golf                     7.4         11.1       3.9
     Ice Hockey               6.2         12.0       0.5
     Baseball                 5.5         8.0        3.1
     Swimming                 4.6         3.6        5.6
     Basketball               3.2         4.6        1.9
     Volleyball               3.1         3.3        2.8
     Soccer                   3.0         4.6        1.5
     Tennis                   2.7         3.6        1.8
     Downhill/Alpine Skiing   2.7         2.9        2.6
     Cycling                  2.5         3.0        2.0

     In 1998, about 11 937 000 males and 12 323 000 females in Canada were
     at least 15 years old. Determine how many males and how many females
     declared each of the above sports as their favourite. Describe how
     you used matrices to solve this problem.

11. Application A company manufacturing designer T-shirts produces five
     sizes: extra-small, small, medium, large, and extra-large. The
     material and labour needed to produce a box of 100 shirts depends on
     the size of the shirts.

     Size          Cloth per shirt (m²)   Labour per 100 shirts (h)
     Extra-small   0.8                    8
     Small         0.9                    8.5
     Medium        1.2                    9
     Large         1.5                    10
     Extra-large   2.0                    11

     a) How much cloth and labour are required to fill an order for 1200
        small, 1500 medium, 2500 large, and 2000 extra-large T-shirts?
     b) If the company pays $6.30 per square metre for fabric and $10.70
        per hour for labour, find the cost per box for each size of
        T-shirt.
     c) What is the total cost of cloth and labour for filling the order
        in part a)?

12. Use the coding matrix
         ΄ −2 −3 ΅
            2
               5
     to encode each message.
     a) BIRTHDAY PARTY FRIDAY
     b) SEE YOU SATURDAY NIGHT

13. Application Use the decoding matrix
         [ −1   2 ]
         [  2  −3 ]
     to decode each message.
     a) 64, 69, 38, 45, 54, 68, 31, 44, 5, 115, 3, 70, 40, 83, 25, 49
     b) 70, 47, 39, 31, 104, 45, 61, 25, 93, 68, 57, 44, 55, 127, 28, 76

14. a) Create a secret message about 16 to 24 letters long using the
        coding matrix
         [ 3   5 ]
         [ 1   2 ].
     b) Trade messages with a classmate and decode each other’s messages.

15. Quality education at a school requires open communication among many
     people.

     [Network diagram: Superintendent, Administration, Teachers, Guidance,
     Students, Parents]

     a) Represent this network as a matrix, A.
     b) Explain the meaning of any entry, aᵢⱼ, of matrix A.
     c) Describe what the sum of the entries in the third column
        represents.
     d) Calculate A².

     e) How many indirect links exist with exactly one intermediary
        between the principal and parents? List these links.
     f) Calculate A + A². Explain what information this matrix provides.

16. Network matrices provide another approach to the Koenigsberg bridges
     example on page 44.

     [Diagram: the seven Koenigsberg bridges (Blacksmith Bridge, Honey
     Bridge, Wooden Bridge, Merchants Bridge, Green Bridge, Connecting
     Bridge, and High Bridge), with labels A to G]

     Use network matrices to answer the following questions.
     a) How many ways can you get from Honey Bridge to Connecting Bridge
        by crossing only one of the other bridges? List these routes.
     b) How many ways can you get from Blacksmith Bridge to Connecting
        Bridge without crossing more than one of the other bridges?
     c) Is it possible to travel from Wooden Bridge to Green Bridge
        without crossing at least two other bridges?

17. Chapter Problem Use network matrices to find the number of VIA Rail
     routes from
     a) Toronto to Montréal with up to two change-overs
     b) Kingston to London with up to three change-overs

C
18. Inquiry/Problem Solving Create your own network problem, then
     exchange problems with a classmate. Solve both problems and compare
     your solutions with those of your classmate. Can you suggest any
     improvements for either set of solutions?

19. Show how you could use inverse matrices to solve any system of
     equations in two variables whose matrix of coefficients has an
     inverse.

20. Communication Research encryption techniques on the Internet. What is
     meant by 128-bit encryption? How does the system of private and
     public code keys work?

21. Inquiry/Problem Solving
     a) Suppose you receive a coded message like the one in Example 4,
        but you do not know the coding matrix or its inverse. Describe
        how you could use a computer to break the code and decipher the
        message.
     b) Describe three methods you could use to make a matrix code harder
        to break.

22. a) Show that, for any m × n matrix A and any n × p matrix B,
        (AB)ᵗ = BᵗAᵗ.
     b) Show that, if a square matrix C has an inverse C⁻¹, then Cᵗ also
        has an inverse, and (Cᵗ)⁻¹ = (C⁻¹)ᵗ.
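
The matrix coding used in questions 12 to 14 can be sketched in Python. This is only an illustration under stated assumptions: letters are numbered A = 1 to Z = 26, the message is written into 2 × 2 blocks row by row, each block is multiplied on the left by the coding matrix, and a short final block is padded with Z. The function names and the sample message MATH are hypothetical, and the inverse routine handles only coding matrices with a determinant of 1 or −1.

```python
def multiply_2x2(p, q):
    """Return the product of two 2 x 2 matrices stored as nested lists."""
    return [
        [p[i][0] * q[0][j] + p[i][1] * q[1][j] for j in range(2)]
        for i in range(2)
    ]


def inverse_2x2(m):
    """Return the integer inverse of a 2 x 2 matrix with determinant 1 or -1."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det not in (1, -1):
        raise ValueError("this sketch only handles determinants of 1 or -1")
    # For det = 1 or -1, dividing by det is the same as multiplying by det.
    return [[d * det, -b * det], [-c * det, a * det]]


def encode(message, coding_matrix):
    """Number the letters (A=1, ..., Z=26) and encode them in 2 x 2 blocks."""
    numbers = [ord(ch) - ord("A") + 1 for ch in message.upper() if ch.isalpha()]
    while len(numbers) % 4 != 0:
        numbers.append(26)  # pad with Z (an assumption of this sketch)
    coded = []
    for k in range(0, len(numbers), 4):
        block = [numbers[k:k + 2], numbers[k + 2:k + 4]]
        product = multiply_2x2(coding_matrix, block)
        coded.extend(product[0] + product[1])
    return coded


def decode(coded, coding_matrix):
    """Undo encode() by multiplying each block by the inverse matrix."""
    decoding_matrix = inverse_2x2(coding_matrix)
    letters = []
    for k in range(0, len(coded), 4):
        block = [coded[k:k + 2], coded[k + 2:k + 4]]
        product = multiply_2x2(decoding_matrix, block)
        letters.extend(chr(n + ord("A") - 1) for n in product[0] + product[1])
    return "".join(letters)


coding_matrix = [[3, 5], [1, 2]]        # the coding matrix from question 14
secret = encode("MATH", coding_matrix)
print(secret)                           # [139, 43, 53, 17]
print(decode(secret, coding_matrix))    # MATH
```

Decoding works because multiplying a coded block by the inverse matrix gives back the original block of letter numbers.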




Review of Key Concepts

1.1 The Iterative Process
Refer to the Key Concepts on page 10.

 1. a) Draw a tree diagram showing your direct ancestors going back four
        generations.
     b) How many direct ancestors do you have in four generations?

 2. a) Describe the algorithm used to build the iteration shown.
     b) Continue the iteration for eight more rows.
     c) Describe the resulting iteration.

                   MATH
                 MATHMATH
               MATH    MATH
             MATHMATHMATHMATH
           MATH            MATH
         MATHMATH        MATHMATH

 3. a) Construct a Pythagoras fractal tree using the following algorithm.
        Step 1: Construct a square.
        Step 2: Construct an isosceles right triangle with the hypotenuse
                on one side of the square.
        Step 3: Construct a square on each of the other sides of the
                triangle.
        Repeat this process, with the newly drawn squares to a total of
        four iterations.
     b) If the edges in the first square are 4 cm, determine the total
        area of all the squares in the fourth iteration.
     c) Determine the total area of all the squares in the diagram.

 4. Design an iterative process using the percent reduction capabilities
     of a photocopier.

1.2 Data Management Software
Refer to the Key Concepts on page 21.

 5. List three types of software that can be used for data management,
     giving an example of the data analysis you could do with each type.

 6. Evaluate each spreadsheet expression.
     a) F2+G7–A12
        where F2=5, G7= –9, and A12=F2+G7
     b) PROD(D3,F9)
        where D3=6 and F9=5
     c) SQRT(B1)
        where B1=144

 7. Describe how to reference cells A3 to A10 in one sheet of a
     spreadsheet into cells B2 to B9 in another sheet.

 8. Use a spreadsheet to convert temperatures between −30° C and 30° C to
     the Fahrenheit scale, using the formula
     Fahrenheit = 1.8 × Celsius + 32. Describe how you would list
     temperatures at two-degree intervals in the Celsius column.

1.3 Databases
Refer to the Key Concepts on page 31.

 9. Describe the characteristics of a well-organized database.

10. Outline a design for a database of a shoe store’s customer list.

11. a) Describe the types of data that are available from Statistics
        Canada’s E-STAT database.
     b) What can you do with the data once you have accessed them?
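
The conversion formula quoted in question 8, Fahrenheit = 1.8 × Celsius + 32, can also be tried outside a spreadsheet. The short Python sketch below is one possible illustration; the names `celsius_to_fahrenheit`, `celsius_column`, and `table` are made up for this example. It builds the Celsius column in steps of two degrees and applies the formula to each entry, much as a spreadsheet formula would be filled down a column.

```python
def celsius_to_fahrenheit(celsius):
    """Apply the formula Fahrenheit = 1.8 x Celsius + 32."""
    return 1.8 * celsius + 32


# Celsius values from -30 to 30 inclusive, at two-degree intervals.
celsius_column = list(range(-30, 31, 2))

# Pair each Celsius value with its Fahrenheit equivalent.
table = [(c, celsius_to_fahrenheit(c)) for c in celsius_column]

for c, f in table:
    print(c, round(f, 1))
```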



12. What phrase would you enter into a search engine to find
     a) the top-selling cookbook in Canada?
     b) the first winner of the Fields medal?
     c) a list of movies in which bagpipes are played?

1.4 Simulations
Refer to the Key Concepts on page 39.

13. List three commonly used simulations and a reason why each is used.

14. Write out the function to generate a random integer between 18 and 65
     using
     a) a graphing calculator
     b) a spreadsheet

15. A chocolate bar manufacturer prints one of a repeating sequence of 50
     brainteasers on the inside of the wrapper for each of its chocolate
     bars. Describe a manual simulation you could use to estimate the
     chances of getting two chocolate bars with the same brainteaser if
     you treat yourself to one of the bars every Friday for five weeks.

16. Outline how you would use technology to run a simulation 500 times
     for the scenario in question 15.

1.5 Graph Theory
Refer to the Key Concepts on page 48.

17. How many colours are needed to colour each of the following maps?
     a) [Map with regions A, B, C, D, and E]
     b) [Map with regions A, B, C, D, E, F, and G]

18. State whether each network is
     i)   connected
     ii)  traceable
     iii) planar
     a) [Network with vertices A, B, C, and D]
     b) [Network with vertices P, Q, R, S, T, and U]
     c) [Network with vertices J, K, L, M, and N]

19. For each network in question 18, verify that V − E + R = 2, where V is
     the number of vertices, E is the number of edges, and R is the number
     of regions in a graph.

20. The following is a listing of viewing requests submitted by patrons of
     a classic film festival. Use graph theory to set up the shortest
     viewing schedule that has no conflicts for any of these patrons.
     Person A: Gone With the Wind, Curse of The Mummy, Citizen Kane
     Person B: Gone With the Wind, Jane Eyre
     Person C: The Amazon Queen, West Side Story, Citizen Kane
     Person D: Jane Eyre, Gone With the Wind, West Side Story
     Person E: The Amazon Queen, Ben Hur

21. Below is a network showing the relationships among a group of
     children. The vertices are adjacent if the children are friends.

     [Network diagram with vertices Sarah, Mai, Deqa, Priya, Tanya, and
     Afra]

     a) Rewrite the network in table form.
     b) Are these children all friends with each other?
     c) Who has the most friends?
     d) Who has the fewest friends?

1.6 Modelling With Matrices
Refer to the Key Concepts on page 59.

22. For the matrix

         A = [  2  −1   5 ]
             [  0   4   3 ]
             [  7  −8  −6 ]
             [ −2   9   1 ],

     a) state the dimensions
     b) state the value of entry
        i) a₃₂       ii) a₁₃       iii) a₄₁
     c) list the entry with value
        i) 3         ii) 9         iii) −1

23. Write a 4 × 3 matrix, A, with the property that aᵢⱼ = i × j for all
     entries.

24. Given

         A = [  3   2  −1 ]
             [ −7   0   5 ],

         B = [ 8  −2 ]
             [ 3   4 ]
             [ 2   5 ],

         C = [ −5   1  −4 ]
             [  6   9   0 ], and

         D = [  4   3 ]
             [ −1   7 ]
             [  6   2 ].

     Calculate, if possible,
     a) A + C       b) C − B
     c) A + B       d) 3D
     e) −(1/2)C     f) 3(B + D)
     g) Aᵗ + B      h) Bᵗ + Cᵗ

25. The manager of a sporting goods store takes inventory at the end of
     the month and finds 15 basketballs, 17 volleyballs, 4 footballs,
     15 baseballs, 8 soccer balls, 12 packs of tennis balls, and 10 packs
     of golf balls. The manager orders and receives a shipment of
     10 basketballs, 3 volleyballs, 15 footballs, 20 baseballs, 12 soccer
     balls, 5 packs of tennis balls, and 15 packs of golf balls. During
     the next month, the store sells 17 basketballs, 13 volleyballs,
     17 footballs, 12 baseballs, 12 soccer balls, 16 packs of tennis
     balls, and 23 packs of golf balls.
     a) Represent the store’s stock using three matrices, one each for
        the inventory, new stock received, and items sold.
     b) How many of each item is in stock at the end of the month?
     c) At the beginning of the next month, the manager is asked to send
        20% of the store’s stock to a new branch that is about to open.
        How many of each item will be left at the manager’s store?

26. Outline the procedure you would use to subtract one matrix from
     another
     a) manually
     b) using a graphing calculator
     c) using a spreadsheet



1.7 Problem Solving With Matrices
Refer to the Key Concepts on page 74.

27. Let A =
              ΄ −6 5΅, B = ΄ −5 7 ΅,
                 4
                   3
                              1 0

         C = [  3   6  −1 ]
             [  2   0   4 ]
             [ −5  −2   8 ], and

         D = [  5 ]
             [  4 ]
             [ −3 ].

     Calculate, if possible,
     a) AB         b) BA         c) A²
     d) DC         e) C²

28. a) Write the transpose of matrices

         A = [ 1   5 ]   and   B = [ 0   4 ]
             [ 8  −2 ]             [ 6  −1 ].

     b) Show whether (AB)ᵗ = BᵗAᵗ.

29. A small accounting firm charges $50 per hour for preparing payrolls,
     $60 per hour for corporate tax returns, and $75 per hour for audited
     annual statements. The firm did the following work for three of its
     clients:
     XYZ Limited, payrolls 120 hours, tax returns 10 hours, auditing
     10 hours
     YZX Limited, payrolls 60 hours, tax returns 8 hours, auditing 8 hours
     ZXY Limited, payrolls 200 hours, tax returns 15 hours, auditing
     20 hours
     a) Use matrices to determine how much the accounting firm should
        bill each client.
     b) How can you determine the total billed to the three clients?

30. Suppose you were to encode a message by writing it in matrix form and
     multiplying by a coding matrix. Would your message be more secure if
     you then multiplied the resulting matrices by another coding matrix
     with the same dimensions as the first one? Explain why or why not.

31. a) Write an equation to show the relationship between a matrix and
        its inverse.
     b) Show that

         [  1.5    0     −1 ]
         [ 20     −1.5  −13 ]
         [ −7.5    0.5    5 ]

        is the inverse of

         [  4   2   6 ]
         [ 10   0   2 ]
         [  5   3   9 ].

     c) Find the inverse of

         [ 4   5 ]
         [ 2   3 ].

32. The following diagram illustrates the food chains in a pond.

     [Diagram: food-chain network with vertices Plants, Small Fish, Large
     Fish, Snails, and Bacteria]

     a) Represent these food chains as a network matrix, A.
     b) Calculate A².
     c) How many indirect links with exactly one intermediate step are
        there from plants to snails?
     d) Calculate A + A². Explain the meaning of any entry in the
        resulting matrix.
     e) Calculate A³.
     f) List all the links with two intermediate steps from plants to
        bacteria.
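
Powers of a network matrix, as used in questions 15 and 32, can be explored with a short Python sketch. The four-vertex directed network below is hypothetical (it is not the school or pond network), and the function names are illustrative. An entry of A counts direct links, an entry of A² counts links with exactly one intermediate step, and an entry of A + A² counts links with at most one intermediate step.

```python
def multiply(p, q):
    """Return the product of two square matrices stored as nested lists."""
    n = len(p)
    return [
        [sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def add(p, q):
    """Return the entry-by-entry sum of two matrices of the same size."""
    return [[a + b for a, b in zip(row_p, row_q)] for row_p, row_q in zip(p, q)]


# Hypothetical directed network on vertices 0, 1, 2, 3.
# A[i][j] = 1 means there is a direct link from vertex i to vertex j.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

A2 = multiply(A, A)          # links with exactly one intermediate step
A_plus_A2 = add(A, A2)       # links with at most one intermediate step

print(A2)          # [[0, 0, 1, 2], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
print(A_plus_A2)   # [[0, 1, 2, 2], [0, 0, 1, 2], [0, 0, 0, 1], [0, 0, 0, 0]]
```

For instance, the entry 2 in the first row and last column of A² reflects the two routes from vertex 0 to vertex 3 with one stop in between: through vertex 1 and through vertex 2.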

Chapter Test

ACHIEVEMENT CHART

 Category    Knowledge/      Thinking/Inquiry/   Communication             Application
             Understanding   Problem Solving
 Questions   All             8, 9, 14            1, 2, 5, 6, 7, 8, 9, 14   9, 10, 13, 14


 1. a) Describe an iterative process you could use to draw the red path.
     b) Complete the path.

 2. Find the first few terms of the recursion formula
     tₙ = 1/(tₙ₋₁ + 2), given t₁ = 0.
     Is there a pattern to these terms? If so, describe the pattern.

 3. A “fan-out” calling system is frequently used to spread news quickly
     to a large number of people such as volunteers for disaster relief.
     The first person calls three people. Each of those people calls an
     additional three people; each of whom calls an additional three
     people, and so on.
     a) Use a tree diagram to illustrate a fan-out calling system with
        sufficient levels to call 50 people.
     b) How many levels would be sufficient to call 500 people?

 4. Rewrite each of the following expressions as spreadsheet functions.
     a) C1+C2+C3+C4+C5+C6+C7+C8
     b) The smallest value between cells A5 and G5
     c) (5 − √6)/(10 + 15)

 5. Suppose that, on January 10, you borrowed $1000 at 6% per year
     compounded monthly (0.5% per month). You will be expected to repay
     $88.88 a month for 1 year. However, the final payment will be less
     than $88.88. You set up a spreadsheet with the following column
     headings: MONTH, BALANCE, PAYMENT, INTEREST, PRINCIPAL, NEW BALANCE
     The first row of entries would be:
     MONTH: February
     BALANCE: 1000.00
     PAYMENT: 88.88
     INTEREST: 5.00
     PRINCIPAL: 83.88
     NEW BALANCE: 916.12
     Describe how you would
     a) use the cell referencing formulas and the Fill feature to
        complete the table
     b) determine the size of the final payment on January 10 of the
        following year
     c) construct a line graph showing the declining balance

 6. Describe how you would design a database of the daily travel logs for
     a company’s salespersons.

 7. Describe three different ways to generate random integers between 1
     and 50.

 8. a) Redraw this map as a network.
     b) How many colours are needed to colour the map? Explain your
        reasoning.
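
The arithmetic behind the first spreadsheet row in question 5 can be checked with a short Python sketch. The function name `amortization_row` and its parameter names are hypothetical, and rounding each amount to the nearest cent is an assumption of this sketch rather than something the question specifies. Interest is 0.5% of the balance, the principal is the payment minus the interest, and the new balance is the old balance minus the principal.

```python
def amortization_row(balance, payment=88.88, monthly_rate=0.005):
    """Return (interest, principal, new_balance) for one monthly payment.

    Amounts are rounded to the nearest cent (an assumption of this sketch).
    """
    interest = round(balance * monthly_rate, 2)
    principal = round(payment - interest, 2)
    new_balance = round(balance - principal, 2)
    return interest, principal, new_balance


# First row of the table: the balance starts at $1000.00.
interest, principal, new_balance = amortization_row(1000.00)
print(interest, principal, new_balance)   # 5.0 83.88 916.12
```

Calling the function again with the new balance gives the next row, which mirrors what filling a cell-referencing formula down a spreadsheet column does.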


 9. A salesperson must visit each of the towns on the following map.

     [Map: road network joining Pinkford, Orangeton, Blacktown, Brownhill,
     Redville, Blueton, Whiteford, and Greenside; the roads are labelled
     with the distances 55, 67, 50, 60, 55, 46, 35, 86, 53, 38, 40, and 49]

     a) Is there a route that goes through each town only once? Explain.
     b) Find the shortest route that begins and ends in Pinkford and goes
        through all the towns. Show that it is the shortest route.

     b) What is the value of entry a₂₃?
     c) Identify the entry of matrix A with value −2.
     d) Is it possible to calculate A²? Explain.

12. Let

         A = [ 2 ]
             [ 1 ]
             [ 5 ],

         B = [7 5 0],

         C = [ 4   8 ]
             [ 5  −3 ],

         D = [ 9   1 ]
             [ 2  −7 ], and

         E = [  8  −2 ]
             [  5   0 ]
             [ −4   1 ].

     Calculate, if possible,
     a) 2C + D   b) A + B   c) AD   d) EC   e) Eᵗ
                                                                            13. A local drama club staged a variety show for
10. The following map                                                          four evenings. The admission for adults was
   shows the bridges                                                           $7.00, for students $4.00, and for children
   of Uniontown,                                                               13 years of age and under $2.00. On
   situated on the                                                             Wednesday, 52 adult tickets, 127 student
   banks of a river and on three islands. Use                                  tickets, and 100 child tickets were sold; on
   graph theory to determine if a continuous                                   Thursday, 67 adult tickets, 139 student
   path could traverse all the bridges once each.                              tickets, and 115 child tickets were sold; on
                                                                               Friday, 46 adult tickets, 115 student tickets,


                ΄                        ΅
             4 −2 6
            –8 5 9                                                             and 102 child tickets were sold; and on
11. Let A =          .                                                         Saturday, 40 adult tickets, 101 student
             0 1 −1
             3 −7 −3                                                           tickets, and 89 child tickets were sold. Use
                                                                               matrices to calculate how much money was
    a) State the dimensions of matrix A.
                                                                               collected from admissions.

     ACHIEVEMENT CHECK

  Knowledge/Understanding                Thinking/Inquiry/Problem Solving         Communication                      Application
14. The network diagram below gives the cost of flights between
    five Canadian cities.
    [Network diagram: flights among Vancouver, Winnipeg, Toronto, Montréal,
    and Halifax. Fare labels as extracted, without their routes: $579, $249,
    $469, $199, $269, $438, $398, $508]
    a) Construct a network matrix A for these routes.
    b) Calculate $A^2$ and $A^3$.
    c) How many ways can a person travel from Halifax to
       Vancouver by changing planes exactly twice? Describe
       each route. Which route is most economical?


Tools for Data Management Project


     Wrap-Up
 Implementing Your Action Plan
  1. With your whole class or a small group,
     brainstorm criteria for ranking universities
     and community colleges. List the three
     universities or colleges that you think will
     most likely be the best choices for you.

  2. Have a class discussion on weighting
     systems.

  3. Look up the Maclean’s university and
     community college rankings in a library or
     on the Internet. Note the criteria that
     Maclean’s uses.

  4. Determine your own set of criteria. These
     may include those that Maclean’s uses or
     others, such as travelling distances,
     programs offered, size of the city or town
     where you would be living, and
     opportunities for part-time work.

  5. Choose the ten criteria you consider most
     important. Research any data you need to
     rate universities and colleges with these
     criteria.

  6. Assign a weighting factor to each of the ten
     criteria. For example, living close to home
     may be worth a weighting of 5 and tuition
     cost may be worth a weighting of 7.

  7. Use a spreadsheet and matrix methods to
     determine an overall score for each
     university or community college in
     Ontario. Then, rank the universities or
     community colleges on the spreadsheet.
     Compare your rankings with those in
     Maclean’s magazine. Explain the similarities
     or differences.

  8. From your rankings, select the top five
     universities or community colleges. Draw
     a diagram of the distances from each
     university or college to the four others and
     to your home. Then, use graph theory to
     determine the most efficient way to visit
     each of the five universities or community
     colleges during a five-day period, such as a
     March break vacation.

  9. Based on your project, select your top three
     choices. Comment on how this selection
     compares with your original list of top
     choices.

 Suggested Resources
  • Maclean’s magazine rankings of universities
    and community colleges
  • Other publications ranking universities and
    community colleges
  • University and community college calendars
  • Guidance counsellors
  • Map of Ontario
  • Spreadsheets

 Refer to section 9.3 for information on
 implementing an action plan and Appendix C
 for information on research techniques.

 www.mcgrawhill.ca/links/MDM12
 For details of the Maclean’s rankings of universities
 and colleges, visit the above web site and follow
 the links.
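
The weighted scoring described in steps 6 and 7 of the action plan amounts to multiplying a matrix of ratings by a column of weights. The sketch below is one possible way to check a spreadsheet result, not a prescribed method. The school names, the three criteria, and every rating are hypothetical placeholders; only the weightings of 5 and 7 come from the example in step 6, and a real project would use ten criteria.

```python
# Hypothetical ratings (0 to 10) on three criteria; a real project would use ten.
criteria = ["close to home", "tuition cost", "program choice"]
weights = [5, 7, 6]  # 5 and 7 echo the example in step 6; 6 is made up

# Each row holds one school's ratings, in the same order as `criteria`.
ratings = {
    "School A": [8, 4, 9],
    "School B": [3, 9, 7],
    "School C": [6, 6, 5],
}


def overall_scores(ratings, weights):
    """Multiply each row of ratings by the weight column and add the products."""
    return {
        name: sum(r * w for r, w in zip(row, weights))
        for name, row in ratings.items()
    }


scores = overall_scores(ratings, weights)

# Rank the schools from highest overall score to lowest.
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]}")
```

Changing a single weight and re-running the script is a quick way to see how sensitive the ranking is to the weighting system.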




Evaluating Your Project
 1. Reflect on your weighting formula and
    whether you believe it fairly ranks the
    universities and community colleges in
    Ontario.

 2. Compare your rating system to that used
    by one of your classmates. Can you suggest
    improvements to either system?

 3. What went well in this project?

 4. If you were to do the project over again,
    what would you change? Why?

 5. If you had more time, how would you
    extend this project?

 6. What factors could change between now
    and when you make your final decision
    about which university or college to attend?

Presentation
 Prepare a written report on your findings.
 Include
 • the raw data
 • a rationale for your choice of criteria
 • a rationale for your weightings
 • a printout of your spreadsheet
 • a diagram showing the distances between
   your five highest-ranked universities or
   community colleges and the route you would
   use to visit them
 • a summary of your findings




  Preparing for
  the Culminating Project

  Applying Project Skills
  Consider how the data management tools you
  used on this project could be applied to the
  culminating project in Chapter 9 to
  • access resources
  • carry out research
  • carry out an action plan
  • evaluate your project
  • summarize your findings in a written report

  Keeping on Track
  Now is a good time to draw up a schedule
  for your culminating project and to
  investigate methods for selecting a topic.
  Refer to Chapter 9 for an overview of how to
  prepare a major project. Section 9.1 suggests
  methods for choosing a topic. Also, consider
  how to find the information you will need in
  order to choose your topic.

  [Flow chart: Define the Problem, Define Your Task, Develop an Action Plan,
  Implement Your Action Plan, Evaluate Your Investigation and Its Results,
  Prepare Written Report, Present Your Investigation and Its Results,
  Constructively Critique the Presentations of Others; a loop is labelled
  Refine/Redefine]




Career Connection
                                           Cryptographer
     In this digital era, information is sent with blinding
     speed around the world. These transmissions need to be
     both secure and accurate. Although best known for their
     work on secret military codes, cryptographers also
     design and test computerized encryption systems that
     protect a huge range of sensitive data including
     telephone conversations among world leaders, business
     negotiations, data sent by credit-card readers in retail
     stores, and financial transactions on the Internet.
     Encrypted passwords prevent hackers from reading or
     disrupting critical databases. Even many everyday
     devices, such as garage-door openers and TV remote
     controls, use codes.

     Cryptographers also develop error-correcting codes.
     Adding these special codes to a signal allows a computer
     receiving it to detect and correct errors that have
     occurred during transmission. Such codes have
     numerous applications including CD players,
     automotive computers, cable TV networks, and pictures
     sent back to Earth by interplanetary spacecraft.

     Modern cryptography is a marriage of mathematics and
     computers. A cryptographer must have a background
     in logic, matrices, combinatorics, and computer
     programming as well as fractal, chaos, number, and
     graph theory. Cryptographers work for a wide variety
     of organizations including banks, government offices,
     the military, software developers, and universities.




                 www.mcgrawhill.ca/links/MDM12

              Visit the above web site and follow the links
               for more information about a career as a
                cryptographer and about other careers
                         related to mathematics.




Statistics Project


Life Expectancies
Background
Do women live longer than men? Do people live longer in warmer climates? Are
people living longer today than 50 years ago? Do factors such as education and
income affect life expectancy? In this project, you will answer such questions by
applying the statistical techniques described in the next two chapters.

Your Task
Research and analyse current data on life expectancies in Canada, and perhaps in
other countries. You will use statistical analysis to compare and contrast the data,
detect trends, predict future life expectancies, and identify factors that may affect
life expectancies.

Developing an Action Plan
You will need to find sources of data on life expectancies and to choose the kinds
of comparisons you want to make. You will also have to decide on a method for
handling the data and appropriate techniques for analysing them.




CHAPTER 2
                  Statistics of One Variable




                  Specific Expectations                                                             Section

                  Locate data to answer questions of significance or personal interest, by             2.2
                  searching well-organized databases.

                  Use the Internet effectively as a source for databases.                             2.2

                  Demonstrate an understanding of the purpose and the use of a variety             2.3, 2.4
                  of sampling techniques.

                  Describe different types of bias that may arise in surveys.                         2.4

                  Illustrate sampling bias and variability by comparing the characteristics      2.4, 2.5, 2.6
                  of a known population with the characteristics of samples taken
                  repeatedly from that population, using different sampling techniques.

                  Organize and summarize data from secondary sources, using                      2.1, 2.2, 2.5,
                  technology.                                                                         2.6

                  Compute, using technology, measures of one-variable statistics (i.e.,            2.5, 2.6
                  the mean, median, mode, range, interquartile range, variance, and
                  standard deviation), and demonstrate an understanding of the
                  appropriate use of each measure.

                  Interpret one-variable statistics to describe characteristics of a data set.     2.5, 2.6

                  Describe the position of individual observations within a data set, using           2.6
                  z-scores and percentiles.

                  Explain examples of the use and misuse of statistics in the media.                  2.4

                  Assess the validity of conclusions made on the basis of statistical studies,     2.5, 2.6
                  by analysing possible sources of bias in the studies and by calculating
                  and interpreting additional statistics, where possible.

                  Explain the meaning and the use in the media of indices based on                    2.2
                  surveys.
In earlier times they had no
                                                                  statistics, and so they had
                                                                  to fall back on lies. Hence
                                                                  the huge exaggerations of
                                                                  primitive literature—giants
                                                                  or miracles or wonders!
                                                                  They did it with lies and
                                                                  we do it with statistics; but
                                                                  it is all the same.
                                                                  —Stephen Leacock (1869–1944)



                                                                  Facts are stubborn, but
                                                                  statistics are more pliable.
                                                                  —Mark Twain (1835–1910)




Chapter Problem
Contract Negotiations
François is a young NHL hockey player whose first major-league contract
is up for renewal. His agent wants to bargain for a better salary based
on François’ strong performance over his first five seasons with the
team. Here are some of François’ statistics for the past five seasons.

 Season   Games   Goals   Assists   Points
   1        20       3        4        7
   2        45       7       11       18
   3        76      19       25       44
   4        80      19       37       56
   5        82      28       36       64
  Total    303      76      113      189

 1. How could François’ agent use these
    statistics to argue for a substantial pay
    increase for his client?

 2. Are there any trends in the data that the
    team’s manager could use to justify a
    more modest increase?

As these questions suggest, statistics could be used to argue both for
and against a large salary increase for François. However, the statistics
themselves are not wrong or contradictory. François’ agent and the team’s
manager will, understandably, each emphasize only the statistics that
support their bargaining positions. Such selective use of statistics is
one reason why they sometimes receive negative comments such as the
quotations above. Also, even well-intentioned researchers sometimes
inadvertently use biased methods and produce unreliable results. This
chapter explores such sources of error and methods for avoiding them.
Properly used, statistical analysis is a powerful tool for detecting
trends and drawing conclusions, especially when you have to deal with
large sets of data.
Review of Prerequisite Skills

If you need help with any of the skills listed in purple below, refer to Appendix A.

 1. Fractions, percents, decimals The following
     amounts are the total cost for the items
     including the 7% goods and services tax
     (GST) and an 8% provincial sales tax (PST).
     Determine the price of each item.
     a) watch $90.85
     b) CD $19.54
     c) bicycle $550.85
     d) running shoes $74.39

 2. Fractions, percents, decimals
     a) How much will Josh make if he receives
        an 8% increase on his pay of $12.50/h?
     b) What is the net increase in Josh’s take-
        home pay if the payroll deductions total
        17%?

 3. Fractions, percents, decimals What is the
     percent reduction on a sweater marked
     down from $50 to $35?

 4. Fractions, percents, decimals Determine
     the cost, including taxes, of a VCR sold at
     a 25% discount from its original price of
     $219.

 5. Mean, median, mode Calculate the mean,
     median, and mode for each set of data.
     a) 22, 26, 28, 27, 26
     b) 11, 19, 14, 23, 16, 26, 30, 29
     c) 10, 18, 30, 43, 18, 13, 10
     d) 70, 30, 25, 52, 12, 70
     e) 370, 260, 155, 102, 126, 440
     f) 24, 32, 37, 24, 32, 38, 32, 36, 35, 42

 6. Graphing data Consider the following
     graph, which shows the average price
     of thingamajigs over time.
     [Graph: Price of Thingamajigs ($), axis marked from 1.40 to 1.90,
     versus Year, 1996 to 2001]
     a) What was the price of thingamajigs
        in 1996?
     b) In what year did the price first rise
        above $1.50?
     c) Describe the overall trend over the
        time period shown.
     d) Estimate the percent increase in the
        price of thingamajigs from 1996 to 2001.
     e) List the domain and range of these data.

 7. Graphing data The table below gives the
     number of CDs sold at a music store on
     each day of the week for one week.

     Day          Number of CDs Sold
     Monday               48
     Tuesday              52
     Wednesday            44
     Thursday             65
     Friday              122
     Saturday            152
     Sunday               84

     Display the data on a circle graph.




2.1         Data Analysis With Graphs

  Statistics is the gathering, organization, analysis, and
  presentation of numerical information. You can apply
  statistical methods to almost any kind of data.
  Researchers, advertisers, professors, and sports
  announcers all make use of statistics. Often, researchers
  gather large quantities of data since larger samples
  usually give more accurate results. The first step in the
  analysis of such data is to find ways to organize, analyse,
  and present the information in an understandable form.

      INVESTIGATE & INQUIRE: Using Graphs to Analyse Data

      1. Work in groups or as a class to design a fast and efficient way to survey your
         class about a simple numerical variable, such as the students’ heights or the
         distances they travel to school.
      2. Carry out your survey and record all the results in a table.
      3. Consider how you could organize these results to look for any trends or
         patterns. Would it make sense to change the order of the data or to divide
         them into groups? Prepare an organized table and see if you can detect any
         patterns in the data. Compare your table to those of your classmates. Which
         methods work best? Can you suggest improvements to any of the tables?
      4. Make a graph that shows how often each value or group of values occurs in
         your data. Does your graph reveal any patterns in the data? Compare your
         graph to those drawn by your classmates. Which graph shows the data most
         clearly? Do any of the graphs have other advantages? Explain which graph
         you think is the best overall.
      5. Design a graph showing the total of the frequencies of all values of the
         variable up to a given amount. Compare this cumulative-frequency graph
         to those drawn by your classmates. Again, decide which design works best
         and look for ways to improve your own graph and those of your classmates.


  The unprocessed information collected for a study is called raw data. The quantity
  being measured is the variable. A continuous variable can have any value within
  a given range, while a discrete variable can have only certain separate values (often
  integers). For example, the height of students in your school is a continuous
  variable, but the number of students in each class is a discrete variable. Often, it is useful to
  know how frequently the different values of a variable occur in a set of data.
  Frequency tables and frequency diagrams can give a convenient overview of the
  distribution of values of the variable and reveal trends in the data.

A histogram is a special form of bar graph in which the areas of the bars are
proportional to the frequencies of the values of the variable. The bars in a histogram
are connected and represent a continuous range of values. Histograms are used for
variables whose values can be arranged in numerical order, especially continuous
variables, such as weight, temperature, or travel time. Bar graphs can represent all
kinds of variables, including the frequencies of separate categories that have no set
order, such as hair colour or citizenship. A frequency polygon can illustrate the
same information as a histogram or bar graph. To form a frequency polygon, plot
frequencies versus variable values and then join the points with straight lines.

[Figure, three panels:
 Histogram: Frequency versus Travel Time to School (min)
 Bar Graph: Frequency versus Hair Colour (Red, Blond, Brown, Black, Purple, Green)
 Frequency Polygon: Frequency versus Travel Time to School (min)]
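
As a rough illustration of what lies behind a histogram or frequency polygon, the sketch below groups a continuous variable into equal-width intervals and counts the values in each. The travel times are invented for this example and are not the data behind the figure. Plotting each frequency at the midpoint of its interval is a common convention for frequency polygons and is an assumption here, as is the function name `grouped_frequencies`.

```python
# Hypothetical travel times to school, in minutes (made up for illustration).
travel_times = [4, 7, 12, 18, 9, 22, 15, 11, 6, 27, 13, 8, 16, 21, 10]


def grouped_frequencies(values, start, width, n_intervals):
    """Count how many values fall in each interval [lower, lower + width)."""
    counts = [0] * n_intervals
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_intervals:
            counts[index] += 1
    return counts


start, width, n_intervals = 0, 5, 6
counts = grouped_frequencies(travel_times, start, width, n_intervals)

# Heights of the histogram bars, one per interval.
for i, count in enumerate(counts):
    lower = start + i * width
    print(f"{lower:>2} to {lower + width:<2} {'#' * count}")

# Points to join with straight lines for a frequency polygon
# (assumed convention: one point at the midpoint of each interval).
polygon_points = [(start + i * width + width / 2, c) for i, c in enumerate(counts)]
```

Because the intervals all have the same width, the bar heights are proportional to the bar areas, which is what the definition of a histogram requires.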

A cumulative-frequency graph or ogive shows the running
total of the frequencies from the lowest value up.

[Figure: cumulative-frequency graph, Cumulative Frequency versus
Travel Time to School (min)]

www.mcgrawhill.ca/links/MDM12
To learn more about histograms, visit the above
web site and follow the links. Write a short
description of how to construct a histogram.

Example 1 Frequency Tables and Diagrams

Here are the sums of the two numbers from 50 rolls of a pair of standard dice.
    11   4   4  10   8   7   6   6   5  10   7   9   8   8
     4   7   9  11  12  10   3   7   6   9   5   8   6   8
     2   6   7   5  11   2   5   5   6   6   5   2  10   9
     6   5   5   5   3   9   8   2

a) Use a frequency table to organize these data.
b) Are any trends or patterns apparent in this table?
c) Use a graph to illustrate the information in the frequency table.


d)   Create a cumulative-frequency table and graph for the data.
e)   What proportion of the data has a value of 6 or less?

Solution

a) Go through the data and tally the frequency of each value of
     the variable as shown in the table below.

      Sum    Tally         Frequency
       2     ||||              4
       3     ||                2
       4     |||               3
       5     ||||| ||||        9
       6     ||||| |||         8
       7     |||||             5
       8     ||||| |           6
       9     |||||             5
      10     ||||              4
      11     |||               3
      12     |                 1

b) The table does reveal a pattern that was not
     obvious from the raw data. From the
     frequency column, notice that the middle
     values tend to be the most frequent while
     the high and low values are much less
     frequent.


c) The bar graph or frequency polygon makes the pattern in the data more
     apparent.

     [Figure: bar graph and frequency polygon, each showing Frequency (0 to 8) against Sum (2 to 12)]


d) Add a column for cumulative frequencies to the table. Each value in this
     column is the running total of the frequencies of each sum up to and
     including the one listed in the corresponding row of the sum column.
     Graph these cumulative frequencies against the values of the variable.

      Sum       Tally          Frequency       Cumulative Frequency
        2       ||||               4                    4
        3       ||                 2                    6
        4       |||                3                    9
        5       ||||| ||||         9                   18
        6       ||||| |||          8                   26
        7       |||||              5                   31
        8       ||||| |            6                   37
        9       |||||              5                   42
       10       ||||               4                   46
       11       |||                3                   49
       12       |                  1                   50

     [Figure: cumulative-frequency graph showing Cumulative Frequency (0 to 50) against Sum (2 to 12)]

e) From either the cumulative-frequency column or the diagram, you can
     see that 26 of the 50 outcomes had a value of 6 or less.
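
The running totals in parts d) and e) can also be checked with a few lines of code. The following is a minimal sketch in Python (a language this book does not otherwise use); the frequencies are copied from the table above, and the variable names are chosen only for illustration.

```python
from itertools import accumulate

# Frequency of each sum, copied from the frequency table in part a).
frequency = {2: 4, 3: 2, 4: 3, 5: 9, 6: 8, 7: 5,
             8: 6, 9: 5, 10: 4, 11: 3, 12: 1}

sums = sorted(frequency)

# Each cumulative frequency is the running total of the frequencies
# up to and including that sum.
cumulative = dict(zip(sums, accumulate(frequency[s] for s in sums)))

total = cumulative[sums[-1]]                  # number of outcomes
proportion_6_or_less = cumulative[6] / total  # answer to part e)

for s in sums:
    print(f"{s:>3} {frequency[s]:>10} {cumulative[s]:>10}")
print("Proportion with a value of 6 or less:", proportion_6_or_less)
```

The last line reports 26 out of 50, or 0.52, which agrees with the value read from the cumulative-frequency column.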



When the number of measured values is large, data are usually grouped into
classes or intervals, which make tables and graphs easier to construct and
interpret. Generally, it is convenient to use from 5 to 20 equal intervals that
cover the entire range from the smallest to the largest value of the variable.
The interval width should be an even fraction or multiple of the measurement
unit for the variable. Technology is particularly helpful when you are working
with large sets of data.


Example 2 Working With Grouped Data

This table lists the daily high temperatures in July for a city in southern Ontario.
Day                  1      2      3      4    5    6    7    8    9    10   11
Temperature (°C)     27    25     24      30   32   31   29   24   22   19   21

Day                  12    13     14      15   16   17   18   19   20   21   22
Temperature (°C)     25    26     31      33   33   30   29   27   28   26   27

Day                  23    24     25      26   27   28   29   30   31
Temperature (°C)     22    18     20      25   26   29   32   31   28

a)   Group the data and construct a frequency table, a histogram or frequency
     polygon, and a cumulative-frequency graph.
b)   On how many days was the maximum temperature 25°C or less? On how
     many days did the temperature exceed 30°C?

See Appendix B for more detailed information about technology functions
and keystrokes.

Solution 1     Using a Graphing Calculator

a) The range of the data is 33°C − 18°C = 15°C. You could use five 3-degree
     intervals, but then many of the recorded temperatures would fall on the
     interval boundaries. You can avoid this problem by using eight 2-degree
     intervals with the lower limit of the first interval at 17.5°C. The upper limit
     of the last interval will be 33.5°C.
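
     The choice of intervals can be tried out before any graphing is done.
     The sketch below, written in Python rather than for the calculator used
     in this solution, sorts the July temperatures into the eight 2-degree
     intervals described above. The variable names are illustrative only.

```python
# Daily high temperatures (°C) for July, from the table in Example 2.
temperatures = [27, 25, 24, 30, 32, 31, 29, 24, 22, 19, 21,
                25, 26, 31, 33, 33, 30, 29, 27, 28, 26, 27,
                22, 18, 20, 25, 26, 29, 32, 31, 28]

lower_limit = 17.5   # lower limit of the first interval
width = 2            # interval width in degrees
n_intervals = 8      # the last interval ends at 17.5 + 8 * 2 = 33.5

# Count how many temperatures fall in each interval.
frequencies = [0] * n_intervals
for t in temperatures:
    index = int((t - lower_limit) // width)
    frequencies[index] += 1

# Midpoints are used later for the frequency polygon.
midpoints = [lower_limit + width * (i + 0.5) for i in range(n_intervals)]

for m, f in zip(midpoints, frequencies):
    print(f"{m - 1:.1f} to {m + 1:.1f}   midpoint {m:.1f}   frequency {f}")
```

     Because the limits end in .5 and the data are whole degrees, no
     temperature can land on an interval boundary.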

     Use the STAT EDIT menu to make sure that lists L1 to L4 are clear, and then
     enter the temperature data into L1. Use STAT PLOT to turn on Plot1 and
     select the histogram icon. Next, adjust the window settings. Set Xmin and
     Xmax to the lower and upper limits for your intervals and set Xscl to the
     interval width. Ymin should be 0. Press GRAPH to display the histogram,
     then adjust Ymax and Yscl, if necessary.




You can now use the TRACE instruction and the arrow keys to determine the
   tally for each of the intervals. Enter the midpoints of the intervals into L2
   and the tallies into L3. Turn off Plot1 and set up Plot2 as an x-y line plot of
   lists L2 and L3 to produce a frequency polygon.




   Use the cumSum( function from the LIST OPS menu to find the running totals
   of the frequencies in L3 and store the totals in L4. Now, an x-y line plot of L2
   and L4 will produce a cumulative-frequency graph.




b) Since you know that all the temperatures were in whole degrees, you can see
   from the cumulative frequencies in L4 that there were 11 days on which the
   maximum temperature was no higher than 25°C. You can also get this
   information from the cumulative-frequency graph.

   You cannot determine the exact number of days with temperatures over
   30°C from the grouped data because temperatures from 29.5°C to 31.5°C
   are in the same interval. However, by interpolating the cumulative-
   frequency graph, you can see that there were about 6 days on which the
   maximum temperature was 31°C or higher.
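
   The interpolation can be sketched numerically. The Python code below
   assumes that the cumulative frequencies are plotted against the upper
   limits of the intervals (an assumption made for this sketch; a plot
   against midpoints gives a slightly different estimate). The cumulative
   totals were computed from the July data, and the helper function
   interpolate is hypothetical, written only for this illustration.

```python
# Upper limits of the eight intervals and the running totals of days.
upper_limits = [19.5, 21.5, 23.5, 25.5, 27.5, 29.5, 31.5, 33.5]
cumulative = [2, 4, 6, 11, 17, 22, 27, 31]


def interpolate(x, xs, ys):
    """Estimate y at x by joining neighbouring points with straight lines."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the plotted range")


# Whole-degree temperatures of 30°C or less lie below 30.5°C.
days_up_to_30 = interpolate(30.5, upper_limits, cumulative)
estimate_31_or_higher = cumulative[-1] - days_up_to_30

print("Estimated days at 31°C or higher:", estimate_31_or_higher)
```

   Under this assumption the estimate is 6.5 days, in line with the reading
   of about 6 days from the graph. It remains an estimate because the
   interval from 29.5°C to 31.5°C mixes days at 30°C with days at 31°C.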


Solution 2   Using a Spreadsheet

a) Enter the temperature data into column A and the midpoints of the intervals
   into column B. Use the COUNTIF function in column C to tally the
   cumulative frequency for each interval. If you use absolute cell referencing,
   you can copy the formula down the column and then change just the upper
   limit in the counting condition. Next, in column D, find the frequency for
   each interval by finding the difference between its cumulative frequency
   and the one for the previous interval.

   You can then use the Chart feature to produce a frequency polygon by
   graphing columns B and D. Similarly, charting columns B and C will
   produce a cumulative-frequency graph.


In Corel® Quattro® Pro, you can also use the Histogram tool in the
     Tools/Numeric Tools/Analysis menu to automatically tally the frequencies
     and cumulative frequencies.
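
   The spreadsheet method counts cumulative frequencies first and then
   subtracts to get the frequency of each interval. A rough equivalent in
   Python, offered only as a sketch and not as the spreadsheet's own
   formulas, is shown below; the list names are illustrative.

```python
# Daily high temperatures (°C) for July, from the table in Example 2.
temperatures = [27, 25, 24, 30, 32, 31, 29, 24, 22, 19, 21,
                25, 26, 31, 33, 33, 30, 29, 27, 28, 26, 27,
                22, 18, 20, 25, 26, 29, 32, 31, 28]

# Upper limits of the eight 2-degree intervals: 19.5, 21.5, ..., 33.5.
upper_limits = [19.5 + 2 * i for i in range(8)]

# Like COUNTIF with a "less than or equal to" condition: for each upper
# limit, count every temperature at or below it.
cumulative = [sum(1 for t in temperatures if t <= u) for u in upper_limits]

# Frequency of an interval = its cumulative frequency minus the previous one.
previous = [0] + cumulative[:-1]
frequencies = [c - p for c, p in zip(cumulative, previous)]

print("Cumulative:", cumulative)
print("Frequency: ", frequencies)
```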




b) As in the solution using a graphing calculator, you can see from the
     cumulative frequencies that there were 11 days on which the maximum
     temperature was no higher than 25°C. Also, you can estimate from the
     cumulative-frequency graph that there were 6 days on which the maximum
     temperature was 31°C or higher. Note that you could use the COUNTIF
     function with the raw data to find the exact number of days with
     temperatures over 30°C.
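
     Counting directly from the raw data, as the note above suggests, takes
     one condition per question. This Python sketch plays the role of the
     COUNTIF function; it is an illustration, not spreadsheet syntax.

```python
# Daily high temperatures (°C) for July, from the table in Example 2.
temperatures = [27, 25, 24, 30, 32, 31, 29, 24, 22, 19, 21,
                25, 26, 31, 33, 33, 30, 29, 27, 28, 26, 27,
                22, 18, 20, 25, 26, 29, 32, 31, 28]

# Count the days that satisfy each condition.
days_25_or_less = sum(1 for t in temperatures if t <= 25)
days_over_30 = sum(1 for t in temperatures if t > 30)

print("Days at 25°C or less:", days_25_or_less)
print("Days over 30°C:", days_over_30)
```

     The exact count of days over 30°C is 7, slightly more than the estimate
     of about 6 read from the grouped data.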




A relative-frequency table or diagram shows the frequency of a data
group as a fraction or percent of the whole data set.

Project Prep
You may find frequency-distribution diagrams useful for your
statistics project.

Example 3 Relative-Frequency Distribution

Here are a class’ scores obtained on a data-management examination.

78    81    55    60    65    86    44    90
77    71    62    39    80    72    70    64
88    73    61    70    75    96    51    73
59    68    65    81    78    67

a)   Construct a frequency table that includes a column for relative frequency.
b)   Construct a histogram and a frequency polygon.
c)   Construct a relative-frequency histogram and a relative-frequency polygon.
d)   What proportion of the students had marks between 70% and 79%?

Solution

a) The lowest and highest scores are 39% and 96%, which give a range
     of 57%. An interval width of 5 is convenient, so you could use
     13 intervals as shown here. To determine the relative frequencies,
     divide the frequency by the total number of scores. For example, the
     relative frequency of the first interval is 1/30, showing that
     approximately 3% of the class scored between 34.5% and 39.5%.

     Score (%)     Midpoint    Tally       Frequency    Relative Frequency
     34.5−39.5        37       |               1              0.033
     39.5−44.5        42       |               1              0.033
     44.5−49.5        47       −               0              0
     49.5−54.5        52       |               1              0.033
     54.5−59.5        57       ||              2              0.067
     59.5−64.5        62       ||||            4              0.133
     64.5−69.5        67       ||||            4              0.133
     69.5−74.5        72       ||||| |         6              0.200
     74.5−79.5        77       ||||            4              0.133
     79.5−84.5        82       |||             3              0.100
     84.5−89.5        87       ||              2              0.067
     89.5−94.5        92       |               1              0.033
     94.5−99.5        97       |               1              0.033


b) The frequency polygon can be superimposed onto the same grid
     as the histogram.

     [Figure: histogram with the frequency polygon superimposed, showing Frequency (0 to 6) against Score (interval midpoints 37 to 97)]




c) Draw the relative-frequency histogram and the relative-frequency polygon
     using the same procedure as for a regular histogram and frequency
     polygon. As you can see, the only difference is the scale of the y-axis.

     [Figure: relative-frequency histogram and polygon, showing Relative Frequency (0 to 0.2) against Score (interval midpoints 37 to 97)]


d) To determine the proportion of students with marks in the 70s, add the relative
     frequencies of the interval from 69.5 to 74.5 and the interval from 74.5 to 79.5:
     0.200 + 0.133 = 0.333

     Thus, about 33% of the class (10 of the 30 students) had marks between
     70% and 79%.
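
     The relative frequencies in this example can be reproduced with a short
     calculation. The Python sketch below groups the 30 scores into the
     13 intervals from part a) and divides each frequency by the number of
     scores. The variable names are illustrative only.

```python
# Examination scores (%) from Example 3.
scores = [78, 81, 55, 60, 65, 86, 44, 90,
          77, 71, 62, 39, 80, 72, 70, 64,
          88, 73, 61, 70, 75, 96, 51, 73,
          59, 68, 65, 81, 78, 67]

lower_limit = 34.5   # lower limit of the first interval
width = 5            # interval width
n_intervals = 13     # intervals run from 34.5 up to 99.5

# Tally the scores by interval.
frequencies = [0] * n_intervals
for s in scores:
    frequencies[int((s - lower_limit) // width)] += 1

# Relative frequency = frequency / total number of scores.
relative = [f / len(scores) for f in frequencies]

# Intervals 69.5-74.5 and 74.5-79.5 are at positions 7 and 8.
in_the_70s = relative[7] + relative[8]

for i, (f, r) in enumerate(zip(frequencies, relative)):
    low = lower_limit + i * width
    print(f"{low:.1f}-{low + width:.1f} {f:>4} {r:>8.3f}")
print("Proportion with marks in the 70s:", round(in_the_70s, 3))
```

     The relative frequencies always add to 1, since every score belongs to
     exactly one interval.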



Categorical data are given labels rather than being measured numerically.
For example, surveys of blood types, citizenship, or favourite foods all produce
categorical data. Circle graphs (also known as pie charts) and pictographs
are often used instead of bar graphs to illustrate categorical data.


Example 4 Presenting Categorical Data

The table below shows Canadians’ primary use of the Internet in 1999.

Primary Use                          Households (%)
E-mail                                    15.8
Electronic banking                         4.2
Purchase of goods and services             3.6
Medical or health information              8.6
Formal education/training                  5.8
Government information                     7.8
Other specific information                14.7
General browsing                          14.2
Playing games                              6.7
Chat groups                                4.7
Other Internet services                    5.8
Obtaining music                            5.0
Listening to the radio                     3.1

Illustrate these data with
a)   a circle graph
b)   a pictograph




Solution

a) [Figure: circle graph titled "Home Internet Use", with one sector for each category in the table, labelled with the category name and its percent of households]




b) There are numerous ways to represent the data with a pictograph.
     The one shown here has the advantages of being simple and visually
     indicating that the data involve computers.
     [Figure: pictograph titled "Home Internet Use", with one row of icons for each category in the table. Each icon represents 2% of households.]
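
Both graphs in this example rest on simple scaling of the percents in the table. As a rough sketch in Python, the sector angle for the circle graph is the percent of 360°, and the number of icons in the pictograph is the percent divided by 2. The names used in the code are illustrative only.

```python
# Percent of households by primary use of the Internet (1999), from the table.
internet_use = {
    "E-mail": 15.8,
    "Electronic banking": 4.2,
    "Purchase of goods and services": 3.6,
    "Medical or health information": 8.6,
    "Formal education/training": 5.8,
    "Government information": 7.8,
    "Other specific information": 14.7,
    "General browsing": 14.2,
    "Playing games": 6.7,
    "Chat groups": 4.7,
    "Other Internet services": 5.8,
    "Obtaining music": 5.0,
    "Listening to the radio": 3.1,
}

PERCENT_PER_ICON = 2  # each pictograph icon represents 2% of households

# Circle graph: each sector takes its share of the full 360 degrees.
sector_angle = {use: pct / 100 * 360 for use, pct in internet_use.items()}

# Pictograph: number of icons (possibly fractional) for each category.
icons = {use: pct / PERCENT_PER_ICON for use, pct in internet_use.items()}

for use, pct in internet_use.items():
    print(f"{use:<32}{pct:>6.1f}%"
          f"{sector_angle[use]:>8.1f} degrees{icons[use]:>7.2f} icons")
```

For example, E-mail at 15.8% needs a sector of about 56.9° and just under 8 icons. Fractional icons are one reason the values on a pictograph can be hard to read accurately.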



You can see from the example above that circle graphs are good for showing the
sizes of categories relative to the whole and to each other. Pictographs can use a
wide variety of visual elements to clarify the data and make the graph more
interesting. However, with both circle graphs and pictographs, the relative
frequencies for the categories can be hard to read accurately. While a well-
designed pictograph can be a useful tool, you will sometimes see pictographs
with distorted or missing scales or confusing graphics.



Key Concepts

  • Variables can be either continuous or discrete.

  • Frequency-distribution tables and diagrams are useful methods of summarizing
    large amounts of data.

  • When the number of measured values is large, data are usually grouped into
    classes or intervals. This technique is particularly helpful with continuous
    variables.
  • A frequency diagram shows the frequencies of values in each individual
    interval, while a cumulative-frequency diagram shows the running total of
    frequencies from the lowest interval up.

  • A relative-frequency diagram shows the frequency of each interval as a
    proportion of the whole data set.

  • Categorical data can be presented in various forms, including bar graphs,
    circle graphs (or pie charts), and pictographs.


  Communicate Your Understanding

      1. a) What information does a histogram present?
        b) Explain why you cannot use categorical data in a histogram.

      2. a) What is the difference between a frequency diagram and a cumulative-
            frequency diagram?
        b) What are the advantages of each of these diagrams?

      3. a) What is the difference between a frequency diagram and a relative-
            frequency diagram?
        b) What information can be easily read from a frequency diagram?
         c) What information can be easily read from a relative-frequency diagram?

      4. Describe the strengths and weaknesses of circle graphs and pictographs.




Practise

A
1. Explain the problem with the intervals in each of the following tables.

    a)    Age (years)      Frequency
               28−32            6
               33−38            8
               38−42           11
               42−48            9
               48−52            4

    b)     Score (%)       Frequency
               61−65            5
               66−70           11
               71−75            7
               76−80            4
               91−95            1

2. Would you choose a histogram or a bar graph with separated bars for the
    data listed below? Explain your choices.
    a) the numbers from 100 rolls of a standard die
    b) the distances 40 athletes throw a shot-put
    c) the ages of all players in a junior lacrosse league
    d) the heights of all players in a junior lacrosse league

3. A catering service conducted a survey asking respondents to choose from
    six different hot meals.

    Meal Chosen                                   Number
    Chicken cordon bleu                             16
    New York steak                                  20
    Pasta primavera (vegetarian)                     9
    Lamb chop                                       12
    Grilled salmon                                  10
    Mushroom stir-fry with almonds (vegetarian)      5

    a) Create a circle graph to illustrate these data.
    b) Use the circle graph to determine what percent of the people surveyed
       chose vegetarian dishes.
    c) Sketch a pictograph for the data.
    d) Use the pictograph to determine whether more than half of the
       respondents chose red-meat dishes.

4. a) Estimate the number of hours you spent each weekday on each of the
       following activities: eating, sleeping, attending class, homework,
       a job, household chores, recreation, other.
    b) Present this information using a circle graph.
    c) Present the information using a pictograph.

Apply, Solve, Communicate

5. The examination scores for a biology class are shown below.

       68   77   91   66   52   58   79   94   81
       60   73   57   44   58   71   78   80   54
       87   43   61   90   41   76   55   75   49

    a) Determine the range for these data.
    b) Determine a reasonable interval size and number of intervals.
    c) Produce a frequency table for the grouped data.
    d) Produce a histogram and frequency polygon for the grouped data.
    e) Produce a relative-frequency polygon for the data.
    f) Produce a cumulative-frequency polygon for the data.
    g) What do the frequency polygon, the relative-frequency polygon, and the
       cumulative-frequency polygon each illustrate best?

B
6. a) Sketch a bar graph to show the results you would expect if you were
       to roll a standard die 30 times.
    b) Perform the experiment or simulate it with software or the
       random-number generator of a graphing calculator. Record the results
       in a table.
    c) Produce a bar graph for the data you collected.
    d) Compare the bar graphs from a) and c). Account for any discrepancies
       you observe.

7. Application In order to set a reasonable price for a “bottomless” cup of
    coffee, a restaurant owner recorded the number of cups each customer
    ordered on a typical afternoon.

       2   1   2   3   0   1   1   1   2   2
       1   3   1   4   2   0   1   2   3   1

    a) Would you present these data in a grouped or ungrouped format?
       Explain your choice.
    b) Create a frequency table and diagram.
    c) Create a cumulative-frequency diagram.
    d) How can the restaurant owner use this information to set a price for
       a cup of coffee? What additional information would be helpful?

8. Application The list below shows the value of purchases, in dollars, by
    30 customers at a clothing store.

       55.40    48.26    28.31   14.12   88.90   34.45
       51.02    71.87   105.12   10.19   74.44   29.05
       43.56    90.66    23.00   60.52   43.17   28.49
       67.03    16.18    76.05   45.68   22.76   36.73
       39.92   112.48    81.21   56.73   47.19   34.45

    a) Would you present these data in a grouped or ungrouped format?
       Explain your choice.
    b) Create a frequency table and diagram.
    c) Create a cumulative-frequency diagram.
    d) How might the store owner use this information in planning sales
       promotions?

9. The speeds of 24 motorists ticketed for exceeding a 60-km/h limit are
    listed below.

       75   72   66   80   75   70   71   82
       69   70   72   78   90   75   76   80
       75   96   91   77   76   84   74   79

    a) Construct a frequency-distribution table for these data.
    b) Construct a histogram and frequency polygon.
    c) Construct a cumulative-frequency diagram.
    d) How many of the motorists exceeded the speed limit by 15 km/h or less?
    e) How many exceeded the speed limit by over 20 km/h?

10. Communication (Chapter Problem) This table summarizes the salaries for
    François’ hockey team.

       Salary ($)        Number of Players
          300 000                2
          500 000                3
          750 000                8
          900 000                6
        1 000 000                2
        1 500 000                1
        3 000 000                1
        4 000 000                1

    a) Reorganize these data into appropriate intervals and present them in
       a frequency table.
    b) Create a histogram for these data.
    c) Identify and explain any unusual features about this distribution.


11. Communication
     a) What is the sum of all the relative frequencies for any set of data?
     b) Explain why this sum occurs.

12. The following relative-frequency polygon was constructed for the
     examination scores for a class of 25 students. Construct the
     frequency-distribution table for the students’ scores.

     [Figure: relative-frequency polygon showing Relative Frequency (0 to 0.32 in steps of 0.04) against Score (35 to 95 in steps of 10)]

13. Inquiry/Problem Solving The manager of a rock band suspects that MP3 web
     sites have reduced sales of the band’s CDs. A survey of fans last year
     showed that at least 50% had purchased two or more of the band’s CDs.
     A recent survey of 40 fans found they had purchased the following
     numbers of the band’s CDs.

        2   1   2   1   3   1   4   1   0   1
        0   2   4   1   0   5   2   3   4   1
        2   1   1   1   3   1   0   5   4   2
        3   1   1   0   2   2   0   0   1   3

     Does the new data support the manager’s theory? Show the calculations
     you made to reach your conclusion, and illustrate the results with a
     diagram.

C
14. Inquiry/Problem Solving
     a) What are the possible outcomes for a roll of two “funny dice” that
        have faces with the numbers 1, 1, 3, 5, 6, and 7?
     b) Sketch a relative-frequency polygon to show the results you would
        expect if these dice were rolled 100 times.
     c) Explain why your graph has the shape it does.
     d) Use software or a graphing calculator to simulate rolling the funny
        dice 100 times, and draw a relative-frequency polygon for the results.
     e) Account for any differences between the diagrams in parts b) and d).

15. This cumulative-frequency diagram shows the distribution of the
     examination scores for a statistics class.

     [Figure: cumulative-frequency diagram showing Cumulative Frequency (0 to 30 in steps of 5) against Score (interval boundaries 34.5 to 94.5 in steps of 10)]

     a) What interval contains the greatest number of scores? Explain how
        you can tell.
     b) How many scores fall within this interval?

16. Predict the shape of the relative-frequency diagram for the examination
     scores of a first-year university calculus class. Explain why you chose
     the shape you did. Assume that students enrolled in a wide range of
     programs take this course. State any other assumptions that you need
     to make.

2.2 Indices

  In the previous section, you used
  tables and graphs of frequencies to
  summarize data. Indices are another
  way to summarize data and
  recognize trends. An index relates
  the value of a variable (or group of
  variables) to a base level, which is
  often the value on a particular date.
  The base level is set so that the
  index produces numbers that are
  easy to understand and compare.
  Indices are used to report on a wide
  variety of variables, including prices
  and wages, ultraviolet levels in
  sunlight, and even the readability
  of textbooks.




        INVESTIGATE & INQUIRE: Consumer Price Index

        The graph below shows Statistics Canada’s consumer price index (CPI),
        which tracks the cost of over 600 items that would be purchased by a
        typical family in Canada. For this chart, the base is the cost of the
        same items in 1992.

        [Figure: time-series graph of the Unadjusted Consumer Price Index
        (1992 = 100). Vertical axis: index values from 104 to 118; horizontal
        axis: 1996 to 2001.]


         1. What trend do you see in this graph? Estimate the annual rate of increase.
         2. Estimate the annual rate of increase for the period from 1992 to 1996.
           Do you think the difference between this rate and the one from 1996 to
           2001 is significant? Why or why not?
         3. What was the index value in February of 1998? What does this value tell
           you about consumer prices at that time?


  104     MHR • Statistics of One Variable
         4. What would be the best way to estimate what the consumer price index
           will be in May of 2003? Explain your reasoning.
         5. Explain how the choice of the vertical scale in the graph emphasizes
           changes in the index. Do you think this emphasis could be misleading?
           Why or why not?


The best-known Canadian business index is the S&P/TSX Composite Index,
managed for the Toronto Stock Exchange by Standard & Poor’s Corporation.
Introduced in May, 2002, this index is a continuation of the TSE 300 Composite
Index®, which goes back to 1977. The S&P/TSX Composite Index is a measure
of the total market value of the shares of over 200 of the largest companies traded
on the Toronto Stock Exchange. The index is the current value of these stocks
divided by their total value in a base year and then multiplied by a scaling factor.
When there are significant changes (such as takeovers or bankruptcies) in any of
the companies in the index, the scaling factor is adjusted so that the values of the
index remain directly comparable to earlier values. Note that the composite index
weights each company by the total value of its shares (its market capitalization)
rather than by the price of the individual shares. The S&P/TSX Composite Index
usually indicates trends for major Canadian corporations reasonably well, but it
does not always accurately reflect the overall Canadian stock market.
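
The index calculation described above (current value divided by base-year value, then multiplied by a scaling factor) can be sketched in a few lines of Python. This is only an illustration, not the actual S&P/TSX procedure: the function name index_value, the market values, and the scaling factor of 1000 are assumptions chosen for the example.

```python
def index_value(current_total, base_total, scale=1000.0):
    """Current value relative to the base-year value, times a scaling factor."""
    if base_total <= 0:
        raise ValueError("base_total must be positive")
    return current_total / base_total * scale


# Hypothetical total market values, in billions of dollars (not real data).
base_year_total = 250.0
current_total = 500.0

print(index_value(current_total, base_year_total))  # 2000.0
```

With these made-up values, the stocks are worth twice what they were in the base year, so the index is twice the base level.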

Time-series graphs are often used to show how indices change over time.
Such graphs plot variable values versus time and join the adjacent data points
with straight lines.

Example 1 Stock Market Index

The following graph shows the TSE 300 Composite Index® from 1971 to 2001.

[Figure: time-series graph of the TSE 300 Index (1975 = 1000). Vertical axis: index values from 0 to 10 000; horizontal axis: 1971 to 2001.]

a) What does the notation “1975 = 1000” mean?
b) By what factor did the index grow over the period shown?
c) Estimate the rate of growth of the index during the 1980s.

                                                                                                          2.2 Indices • MHR   105
Solution

a) The notation indicates that the index shows the stock prices relative to what
      they were in 1975. This 1975 base has been set at 1000. An index value of
      2000 would mean that overall the stocks of the 300 companies in the index
      are selling for twice what they did in 1975.

b) From the graph, you can see that the index increased from about 1000 in 1971
      to about 10 000 in 2001. Thus, the index increased by a factor of approximately
      10 over this period.

c) To estimate the rate of growth of the index during the 1980s, approximate
      the time-series graph with a straight line during that 10-year interval.
      Then, calculate the slope of the line.

      \[ m = \frac{\text{rise}}{\text{run}} \doteq \frac{3700 - 1700}{10} = 200 \]

      The TSE 300 Composite Index® rose about 200 points a year during the 1980s.

www.mcgrawhill.ca/links/MDM12
For more information on stock indices, visit the above web site and follow the
links. Write a brief description of the rules for inclusion in the various
market indices.
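
The slope estimate in part c) of the solution can also be checked with a short Python sketch. The function name average_annual_change is an assumption made for this illustration; the values 1700 and 3700 are the estimates read from the graph in the solution.

```python
def average_annual_change(start_value, end_value, years):
    """Slope of the straight line joining two points on a time-series graph."""
    return (end_value - start_value) / years


# Index values estimated from the graph for the start and end of the 1980s.
print(average_annual_change(1700, 3700, 10))  # 200.0
```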



Statistics Canada calculates a variety of carefully researched economic indices.
For example, there are price indices for new housing, raw materials, machinery
and equipment, industrial products, and farm products. Most of these indices
are available with breakdowns by province or region and by specific categories,
such as agriculture, forestry, or manufacturing. Statisticians, economists, and the
media make extensive use of these indices. (See section 1.3 for information on
how to access Statistics Canada data.)

The consumer price index (CPI) is the most widely reported of these
economic indices because it is an important measure of inflation. Inflation is
a general increase in prices, which corresponds to a decrease in the value of
money. To measure the average change in retail prices across Canada,
Statistics Canada monitors the retail prices of a set of over 600 goods and
services including food, shelter, clothing, transportation, household items,
health and personal care, recreation and education, and alcohol and tobacco
products. These items are representative of purchases by typical Canadians
and are weighted according to estimates of the total amount Canadians spend
on each item. For example, milk has a weighting of 0.69% while tea has a
weighting of only 0.06%.

Data in Action
Statistics Canada usually publishes the consumer price index for each month in
the third week of the following month. Over 60 000 price quotations are
collected for each update.
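
To see why the basket weightings matter, the following Python sketch estimates how much a price increase for a single item would move a weighted index. The weights for milk and tea come from the text; the 10% price increases and the helper name contribution are hypothetical, and the calculation assumes a simple fixed-weight basket rather than Statistics Canada's full method.

```python
# Basket weights from the text, expressed as fractions of total spending.
weights = {"milk": 0.0069, "tea": 0.0006}

# Hypothetical price increases (as fractions); NOT Statistics Canada figures.
price_changes = {"milk": 0.10, "tea": 0.10}


def contribution(item):
    """Approximate effect of one item's price change on the index, in percentage points."""
    return weights[item] * price_changes[item] * 100


for item in weights:
    print(f"{item}: {contribution(item):.3f} percentage points")
```

Under these assumptions, the same 10% price increase moves the index more than ten times as much for milk as for tea, because consumers spend far more on milk.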




Example 2 Consumer Price Index

The following graph shows the amount by which the consumer price index
changed since the same month of the previous year.

[Figure: time-series graph of the Percent Change in CPI. Vertical axis: 0 to 3; horizontal axis: May 1996 to May 2001.]

a) What does this graph tell you about changes in the CPI from 1996 to 2001?
b) Estimate the mean annual change in the CPI for this period.

Solution
a) Note that the graph above shows the annual changes in the CPI, unlike
   the graph on page 104, which illustrates the value of the CPI for any
   given month. From the above graph, you can see that the annual change
   in the CPI varied between 0.5% and 4% from 1996 to 2001. Overall,
   there is an upward trend in the annual change during this period.

b) You can estimate the mean annual change by drawing a horizontal line
   such that the total area between the line and the parts of the curve
   above it is approximately equal to the total area between the line and
   the parts of the curve below it. As shown above, this line meets the
   y-axis near 2%.

   Thus, the mean annual increase in the CPI was roughly 2% from 1996
   to 2001.

Project Prep
If your statistics project examines how a variable changes over time, a
time-series graph may be an effective way to illustrate your findings.
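
The equal-area method in part b) of the solution works because the mean is the level at which the amounts above and below balance. The Python sketch below illustrates this with evenly spaced readings; the readings are hypothetical values invented for the example, not values taken from the graph.

```python
# Hypothetical, evenly spaced readings of the annual percent change in the CPI.
readings = [1.5, 1.0, 2.0, 1.0, 2.5, 3.0, 3.0]

mean_change = sum(readings) / len(readings)

# Total distance of the readings above and below the horizontal line at the mean.
above = sum(r - mean_change for r in readings if r > mean_change)
below = sum(mean_change - r for r in readings if r < mean_change)

print(mean_change)   # 2.0
print(above, below)  # 2.5 2.5
```

Because the readings are evenly spaced in time, these two totals approximate the areas above and below the line, which is why balancing the areas by eye gives a reasonable estimate of the mean.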


The consumer price index and the cost of living index are not quite the same.
The cost of living index measures the cost of maintaining a constant standard
of living. If consumers like two similar products equally well, their standard
of living does not change when they switch from one to the other. For example,
if you like both apples and pears, you might start buying more apples and
fewer pears if the price of pears went up while the price of apples was
unchanged. Thus, your cost of living index increases less than the consumer
price index does.

www.mcgrawhill.ca/links/MDM12
For more information about Statistics Canada indices, visit the above web site
and follow the links to Statistics Canada.

                                                                                                     2.2 Indices • MHR   107
Indices are also used in many other fields, including science, sociology,
medicine, and engineering. There are even indices of the clarity of writing.


Example 3 Readability Index

The Gunning fog index is a measure of the readability of prose. This index
estimates the years of schooling required to read the material easily.

Gunning fog index = 0.4(average words per sentence + percent “hard” words)

where “hard” words are all words over two syllables long except proper nouns,
compounds of easy words, and verbs whose third syllable is ed or es.
a)    Calculate the Gunning fog index for a book with an average sentence
      length of 8 words and a 20% proportion of hard words.
b)    What are the advantages and limitations of this index?

Solution
a) Gunning fog index = 0.4(8 + 20)
                            = 11.2
      The Gunning fog index shows that the book is written at a level
      appropriate for readers who have completed grade 11.

b) The Gunning fog index is easy to use and understand. It generates a
      grade-level rating, which is often more useful than a readability rating
      on an arbitrary scale, such as 1 to 10 or 1 to 100. However, the index
      assumes that bigger words and longer sentences always make prose
      harder to read. A talented writer could use longer words and sentences
      and still be more readable than writers who cannot clearly express their
      ideas. The Gunning fog index cannot, of course, evaluate literary merit.

Project Prep
You may want to use an index to summarize and compare sets of data in your
statistics project.
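
The formula in Example 3 is simple enough to compute by hand, but a short Python sketch shows how it could be applied to many books at once. The function name gunning_fog is an assumption for this illustration; the sketch takes the two averages as given and does not attempt to count syllables or identify "hard" words.

```python
def gunning_fog(avg_words_per_sentence, percent_hard_words):
    """Gunning fog index = 0.4(average words per sentence + percent hard words)."""
    return 0.4 * (avg_words_per_sentence + percent_hard_words)


# Values from part a) of Example 3: 8 words per sentence, 20% hard words.
print(round(gunning_fog(8, 20), 1))  # 11.2
```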




                           www.mcgrawhill.ca/links/MDM12

                        Visit the above web site to find a link to a
                    readability-index calculator. Determine the reading
                               level of a novel of your choice.




Key Concepts

  • An index can summarize a set of data. Indices usually compare the values of a
    variable or group of variables to a base value.

  • Indices have a wide variety of applications in business, economics, science, and
    other fields.

  • A time-series graph is a line graph that shows how a variable changes over time.
  • The consumer price index (CPI) tracks the overall price of a representative
    basket of goods and services, making it a useful measure of inflation.


  Communicate Your Understanding

    1. What are the key features of a time-series graph?

    2. a) Name three groups who would be interested in the new housing price index.
       b) How would this information be important for each group?

    3. Explain why the consumer price index is not the same as the cost of living index.



Practise

A
1. Refer to the consumer price index graph on page 104.
    a) By how many index points did the CPI increase from January, 1992 to
       January, 2000?
    b) Express this increase as a percent.
    c) Estimate what an item that cost
       i)  $7.50 in 1992 cost in April, 1998
       ii) $55 in August, 1997 cost in May, 2000

Apply, Solve, Communicate

2. a) Explain why there is a wide variety of items in the CPI basket.
    b) Is the percent increase for the price of each item in the CPI basket
       the same? Explain.

B
3. Refer to the graph of the TSE 300 Composite Index® on page 105.
    a) When did this index first reach five times its base value?
    b) Estimate the growth rate of the index from 1971 to 1977. What does
       this growth rate suggest about the Canadian economy during this
       period?
    c) During what two-year period did the index grow most rapidly? Explain
       your answer.
    d) Could a straight line be a useful mathematical model for the TSE 300
       Composite Index®? Explain why or why not.

4. Communication
    a) Define inflation.
    b) In what way do the consumer price index and the new housing price
       index provide a measure of inflation?

    c) How would you expect these two indices to be related?
    d) Why do you think that they would be related in this way?

5. Application Consider the following time-series graph for the consumer
    price index.

    [Figure: time-series graph of the Consumer Price Index (1992 = 100). Vertical axis: 0 to 100; horizontal axis: 1980 to 2000.]

    a) Identify at least three features of this graph that are different
       from the CPI graph on page 104.
    b) Explain two advantages that the graph shown here has over the one on
       page 104.
    c) Explain two disadvantages of the graph shown here compared to the one
       on page 104.
    d) Estimate the year in which the CPI was at 50.
    e) Explain the significance of the result in part d) in terms of prices
       in 1992.

6. Application The following graph illustrates the CPI both with and without
    energy price changes.

    [Figure: time-series graph of the Percent Change in CPI, with two series: All Items, and All Items Excluding Energy. Vertical axis: 0 to 4.0; horizontal axis: May 1997 to May 2001.]

    a) How is this graph different from the one on page 107?
    b) Describe how the overall trend in energy costs compares to that of
       the CPI for the period shown.
    c) What insight is gained by removing the energy component of the CPI?
    d) Estimate the overall increase in the energy-adjusted CPI for the
       period shown.
    e) Discuss how your result in part d) compares to the value found in
       part b) of Example 2.

7. Chapter Problem François’ agent wants to bargain for a better salary based
    on François’ statistics for his first five seasons with the team.
    a) Produce a time-series graph for François’ goals, assists, and points
       over the past five years.
    b) Calculate the mean number of goals, assists, and points per game
       played during each of François’ five seasons.
    c) Generate a new time-series graph based on the data from part b).
    d) Which time-series graph will the agent likely use, and which will the
       team’s manager likely use during the contract negotiations? Explain.
    e) Explain the method or technology that you used to answer parts a)
       to d).

8. Aerial surveys of wolves in Algonquin Park produced the following
    estimates of their population density.

       Year        Wolves/100 km²
       1988–89     4.91
       1989–90     2.47
       1990–91     2.80
       1991–92     3.62
       1992–93     2.53
       1993–94     2.23
       1994–95     2.82
       1995–96     2.75
       1996–97     2.33
       1997–98     3.04
       1998–99     1.59

    a) Using 1988–89 as a base, construct an index for these data.
    b) Comment on any trends that you observe.

9. Use Statistics Canada web sites or other sources to find statistics for
    the following and describe any trends you notice.
    a) the population of Canada
    b) the national unemployment rate
    c) the gross domestic product

10. Inquiry/Problem Solving
    a) Use data from E-STAT or other sources to generate a time-series graph
       that shows the annual number of crimes in Canada for the period
       1989−1999. If using E-STAT, look in the Nation section under
       Justice/Crimes and Offences.
    b) Explain any patterns that you notice.
    c) In what year did the number of crimes peak?
    d) Suggest possible reasons why the number of crimes peaked in that
       year. What other statistics would you need to confirm whether these
       reasons are related to the peak in the number of crimes?

11. a) Use data from E-STAT or other sources to generate a time-series graph
       that shows the number of police officers in Canada for the period
       1989−1999. If using E-STAT, look in the Nation section under
       Justice/Police services.
    b) In what ways are the patterns in these data similar to the patterns
       in the data in question 10? In what ways are the patterns different?
    c) In what year did the number of police officers peak?
    d) Explain how this information could affect your answer to part d) of
       question 10.

12. Communication Use the Internet, a library, or other resources to research
    two indices not discussed in this section. Briefly describe what each
    index measures, recent trends in the index, and any explanation or
    rationale for these trends.

13. Inquiry/Problem Solving The pictograph below shows total greenhouse-gas
    emissions for each province and territory in 1996.

    [Figure: pictograph using circles of different sizes. Legend circle sizes: 638, 50 200, 99 800, 149 500, and 199 100 kilotonnes of CO2 equivalent.]

    a) Which two provinces have the highest levels of greenhouse-gas
       emissions?
    b) Are the diameters or areas of the circles proportional to the numbers
       they represent? Justify your answer.
    c) What are the advantages and disadvantages of presenting these data as
       a pictograph?
    d) Which provinces have the highest levels of greenhouse-gas emissions
       per geographic area?
    e) Is your answer to part d) what you would have expected? How can you
       account for such relatively high levels in these areas?
    f) Research information from E-STAT or other sources to determine the
       greenhouse-gas emissions per person for each province.

ACHIEVEMENT CHECK

Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

14. The graph below shows the national unemployment rate from January, 1997,
    to June, 2001.

    [Graph: Unemployment Rate, Seasonally Adjusted. Vertical axis in %, from
    6.0 to 10.0; horizontal axis from 1997 to 2001.]

    a) Describe the overall trend for the period shown.
    b) When did the unemployment rate reach its lowest level?
    c) Estimate the overall unemployment rate for the period shown.
    d) Explain what the term seasonally adjusted means.
    e) Who is more likely to use this graph in an election campaign, the
       governing party or an opposing party? Explain.
    f) How might an opposing party produce a graph showing rising
       unemployment without changing the data? Why would they produce such
       a graph?

C
15. A Pareto chart is a type of frequency diagram in which the frequencies
    for categorical data are shown by connected bars arranged in descending
    order of frequency. In a random survey, commuters listed their most
    common method of travelling to the downtown of a large city.

    Method                  Number of Respondents
    Automobile: alone                26
    Automobile: car pool             35
    Bus/Streetcar                    52
    Train                            40
    Bicycle/Walking                  13

    a) Construct a Pareto chart for these data.
    b) Describe the similarities and differences between a Pareto chart and
       other frequency diagrams.

    www.mcgrawhill.ca/links/MDM12
    For more information about Pareto charts, visit the above web site and
    follow the links. Give two examples of situations where you would use a
    Pareto chart. Explain your reasoning.

16. Pick five careers of interest to you.
    a) Use resources such as CANSIM II, E-STAT, newspapers, or the Internet
       to obtain information about entry-level income levels for these
       professions.
    b) Choose an effective method to present your data.
    c) Describe any significant information you discovered.

17. a) Research unemployment data for Ontario over the past 20 years.
    b) Present the data in an appropriate form.
    c) Conduct additional research to account for any trends or unusual
       features of the data.
    d) Predict unemployment trends for both the short term and the long
       term. Explain your predictions.



112                     MHR • Statistics of One Variable
2.3         Sampling Techniques

  Who will win the next federal election? Are
  Canadians concerned about global warming? Should
  a Canadian city bid to host the next Olympic Games?
  Governments, political parties, advocacy groups, and
  news agencies often want to know the public’s
  opinions on such questions. Since it is not feasible to
  ask every citizen directly, researchers often survey a
  much smaller group and use the results to estimate
  the opinions of the entire population.

       INVESTIGATE & INQUIRE: Extrapolating From a Sample

       1. Work in groups or as a class to design a survey to determine the opinions of
         students in your school on a subject such as favourite movies, extra-curricular
         activities, or types of music.
       2. Have everyone in your class answer the survey.
       3. Decide how to categorize and record the results. Could you refine the survey
         questions to get results that are easier to work with? Explain the changes you
         would make.
       4. How could you organize and present the data to make it easier to recognize
         any patterns? Can you draw any conclusions from the data?
       5. a) Extrapolate your data to estimate the opinions of the entire school
             population. Explain your method.
          b) Describe any reasons why you think the estimates in part a) may be inaccurate.
          c) How could you improve your survey methods to get more valid results?



  In statistics, the term population refers to all individuals who belong to a
  group being studied. In the investigation above, the population is all the
  students in your school, and your class is a sample of that population. The
  population for a statistical study depends on the kind of data being collected.

  Example 1 Identifying a Population

  Identify the population for each of the following questions.
  a)   Whom do you plan to vote for in the next Ontario election?
  b)   What is your favourite type of baseball glove?
  c)   Do women prefer to wear ordinary glasses or contact lenses?


                                                                          2.3 Sampling Techniques • MHR   113
Solution

a) The population consists of those people in Ontario who will be eligible to
      vote on election day.

b) The population would be just those people who play baseball. However,
      you might want to narrow the population further. For example, you might
      be interested only in answers from local or professional baseball players.

c) The population is all women who use corrective lenses.




Once you have identified the population, you need to decide how you will
obtain your data. If the population is small, it may be possible to survey the
entire group. For larger populations, you need to use an appropriate sampling
technique. If selected carefully, a relatively small sample can give quite accurate
results.

The group of individuals who actually have a chance of being selected is called
the sampling frame. The sampling frame varies depending on the sampling
technique used. Here are some of the most commonly used sampling
techniques.


Simple Random Sample
In a simple random sample, every member of the population has an equal
chance of being selected and the selection of any particular individual does not
affect the chances of any other individual being chosen. Choosing the sample
randomly reduces the risk that selected members will not be representative of
the whole population. You could select the sample by drawing names randomly
or by assigning each member of the population a unique number and then using
a random-number generator to determine which members to include.
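
The numbering approach described above can be sketched in a few lines of Python. This is only an illustration: the population of 880 numbered students, the sample size of 20, and the function name simple_random_sample are assumptions made for the example, not part of the text.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members, each with an equal chance of selection."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: every student is assigned a unique number from 1 to 880.
students = list(range(1, 881))
sample = simple_random_sample(students, 20, seed=1)
print(sorted(sample))
```

Because random.sample draws without replacement, no student can be selected twice.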


Systematic Sample
For a systematic sample, you go through the population sequentially and select
members at regular intervals. The sample size and the population size determine
the sampling interval.
$$\text{interval} = \frac{\text{population size}}{\text{sample size}}$$
For example, if you wanted the sample to be a tenth of the population, you
would select every tenth member of the population, starting with one chosen
randomly from among the first ten in sequence.



Example 2 Designing a Systematic Sample

A telephone company is planning a marketing survey of its 760 000 customers.
For budget reasons, the company wants a sample size of about 250.
a)   Suggest a method for selecting a systematic sample.
b)   What expense is most likely to limit the sample size?

Solution

a) First, determine the sampling interval.
     $$\begin{aligned}
     \text{interval} &= \frac{\text{population size}}{\text{sample size}} \\
                     &= \frac{760\,000}{250} \\
                     &= 3040
     \end{aligned}$$
     The company could randomly select one of the first 3040 names on its list
     of customers and then choose every 3040th customer from that point on.
     For simplicity, the company might choose to select every 3000th customer
     instead.

b) The major cost is likely to be salaries for the staff to call and interview the
     customers.
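
As a rough sketch of part a), the following Python code computes the sampling interval and lists the positions on the customer list that would be selected. The function name systematic_sample and the use of list positions 1 to 760 000 to stand in for customers are assumptions made for illustration.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the sampling interval and the list positions selected."""
    interval = population_size // sample_size
    rng = random.Random(seed)
    # Random starting point among the first `interval` members of the list.
    start = rng.randint(1, interval)
    # Every `interval`-th position from the starting point onward.
    positions = list(range(start, population_size + 1, interval))
    return interval, positions

interval, positions = systematic_sample(760_000, 250, seed=0)
print(interval)        # sampling interval from Example 2
print(len(positions))  # number of customers selected
print(positions[:3])   # first few positions on the customer list
```

Only the starting point is random; once it is fixed, every other selection is determined by the interval.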



Stratified Sample
Sometimes a population includes groups of members who share common
characteristics, such as gender, age, or education level. Such groups are called
strata. A stratified sample has the same proportion of members from each
stratum as the population does.

Example 3 Designing a Stratified Sample

Before booking bands for the school dances, the students’ council at Statsville
High School wants to survey the music preferences of the student body. The
following table shows the enrolment at the school.
     Grade    Number of Students
      9              255
      10             232
      11             209
      12             184
     Total           880

a)   Design a stratified sample for a survey of 25% of the student body.
b)   Suggest other ways to stratify this sample.

Solution

a) To obtain a stratified sample with the correct proportions, simply select
      25% of the students in each grade level, as shown below.

      Grade    Number of Students    Relative Frequency    Number Surveyed
       9              255                   0.29                  64
       10             232                   0.26                  58
       11             209                   0.24                  52
       12             184                   0.21                  46
      Total           880                   1.00                 220

b) The sample could be stratified according to gender or age instead of grade level.
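
The allocation in part a) can be checked with a short Python sketch. The enrolment figures come from Example 3; the variable names are assumptions made for illustration, and rounding to the nearest whole student is one reasonable choice rather than the only one.

```python
# Enrolment by grade, taken from the table in Example 3.
enrolment = {9: 255, 10: 232, 11: 209, 12: 184}
fraction = 0.25  # survey 25% of the student body

total = sum(enrolment.values())
allocation = {}
for grade, count in enrolment.items():
    relative_frequency = count / total
    # Same proportion from every stratum, rounded to a whole number of students.
    allocation[grade] = round(count * fraction)
    print(grade, count, round(relative_frequency, 2), allocation[grade])

print("Total surveyed:", sum(allocation.values()))
```

Within each grade, the students to survey would still need to be chosen at random, for example with a simple random sample of that grade.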



Other Sampling Techniques
Cluster Sample: If certain groups are likely to be representative of the entire
population, you can use a random selection of such groups as a cluster sample. For
example, a fast-food chain could save time and money by surveying all its employees
at randomly selected locations instead of surveying randomly selected employees
throughout the chain.

Multi-Stage Sample: A multi-stage sample uses several levels of random
sampling. If, for example, your population consisted of all Ontario households, you
could first randomly sample from all cities and townships in Ontario, then randomly
sample from all subdivisions or blocks within the selected cities and townships, and
finally randomly sample from all houses within the selected subdivisions or blocks.
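
A minimal sketch of the multi-stage idea, assuming a made-up nested data set: 6 regions, each with 5 blocks of 10 houses. The names, the counts, and the function multi_stage_sample are hypothetical and serve only to show the three levels of random selection.

```python
import random

def multi_stage_sample(regions, n_regions, n_blocks, n_houses, seed=None):
    """Randomly pick regions, then blocks within them, then houses within those."""
    rng = random.Random(seed)
    chosen = []
    for region in rng.sample(sorted(regions), n_regions):          # stage 1
        blocks = regions[region]
        for block in rng.sample(sorted(blocks), n_blocks):         # stage 2
            chosen.extend(rng.sample(blocks[block], n_houses))     # stage 3
    return chosen

# Hypothetical population, invented purely for illustration.
regions = {
    f"region{r}": {
        f"r{r}-block{b}": [f"r{r}-b{b}-house{h}" for h in range(10)]
        for b in range(5)
    }
    for r in range(6)
}

households = multi_stage_sample(regions, n_regions=3, n_blocks=2, n_houses=4, seed=2)
print(len(households))
print(households[:4])
```

A cluster sample would stop after the first stage and survey every household in the selected groups.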

Voluntary-Response Sample: In a voluntary-response sample, the researcher
simply invites any member of the population to participate in the survey. The results
from the responses of such surveys can be skewed because the people who choose to
respond are often not representative of the population. Call-in shows and mail-in
surveys rely on voluntary-response samples.

Convenience Sample: Often, a sample is selected simply because it is easily
accessible. While obviously not as random as some of the other techniques, such
convenience samples can sometimes yield helpful information. The investigation
at the beginning of this section used your class as a convenience sample.


   Key Concepts

   • A carefully selected sample can provide accurate information about a population.

   • Selecting an appropriate sampling technique is important to ensure that the
     sample reflects the characteristics of the population. Randomly selected samples
     have a good chance of being representative of the population.

   • The choice of sampling technique will depend on a number of factors, such as
     the nature of the population, cost, convenience, and reliability.


Communicate Your Understanding

    1. What are the advantages and disadvantages of using a sample to estimate the
       characteristics of a population?

    2. Discuss whether a systematic sample is a random sample.

    3. a) Explain the difference between stratified sampling and cluster sampling.
       b) Suggest a situation in which it would be appropriate to use each of these
          two sampling techniques.



Practise

A
1. Identify the population for each of the following questions.
    a) Who should be the next president of the students’ council?
    b) Who should be next year’s grade-10 representative on the students’
       council?
    c) What is your favourite soft drink?
    d) Which Beatles song was the best?
    e) How effective is a new headache remedy?

2. Classify the sampling method used in each of the following scenarios.
    a) A radio-show host invites listeners to call in with their views on
       banning smoking in restaurants.
    b) The Heritage Ministry selects a sample of recent immigrants such that
       the proportions from each country of origin are the same as for all
       immigrants last year.
    c) A reporter stops people on a downtown street to ask what they think
       of the city’s lakefront.
    d) A school guidance counsellor arranges interviews with every fifth
       student on the alphabetized attendance roster.
    e) A statistician conducting a survey randomly selects 20 cities from
       across Canada, then 5 neighbourhoods from each of the cities, and
       then 3 households from each of the neighbourhoods.
    f) The province randomly chooses 25 public schools to participate in a
       new fundraising initiative.

3. What type(s) of sample would be appropriate for
    a) a survey of engineers, technicians, and managers employed by a
       company?
    b) determining the most popular pizza topping?
    c) measuring customer satisfaction for a department store?

Apply, Solve, Communicate

B
4. Natasha is organizing the annual family picnic and wants to arrange a
    menu that will appeal to children, teens, and adults. She estimates that
    she has enough time to survey about a dozen people. How should Natasha
    design a stratified sample if she expects 13 children, 8 teens, and
    16 adults to attend the picnic?



5. Communication Find out, or estimate, how many students attend your
    school. Describe how you would design a systematic sample of these
    students. Assume that you can survey about 20 students.

6. The newly elected Chancellor of the Galactic Federation is interested in
    the opinions of all citizens regarding economic conditions in the
    galaxy. Unfortunately, she does not have the resources to visit every
    populated planet or to send delegates to them. Describe how the
    Chancellor might organize a multi-stage sample to carry out her survey.

7. Communication A community centre chooses 15 of its members at random and
    asks them to have each member of their families complete a short
    questionnaire.
    a) What type of sample is the community centre using?
    b) Are the 15 community-centre members a random sample of the
       community? Explain.
    c) To what extent are the family members randomly chosen?

8. Application A students’ council is conducting a poll of students as they
    enter the cafeteria.
    a) What sampling method is the student council using?
    b) Discuss whether this method is appropriate for surveying students’
       opinions on
       i)  the new mural in the cafeteria
       ii) the location for the graduation prom
    c) Would another sampling technique be better for either of the surveys
       in part b)?

9. Application The host of a call-in program invites listeners to comment
    on a recent trade by the Toronto Maple Leafs. One caller criticizes the
    host, stating that the sampling technique is not random. The host
    replies: “So what? It doesn’t matter!”
    a) What sampling technique is the call-in show using?
    b) Is the caller’s statement correct? Explain.
    c) Is the host’s response mathematically correct? Why or why not?

C
10. Look in newspapers and periodicals or on the Internet for an article
    about a study involving a systematic, stratified, cluster, or
    multi-stage sample. Comment on the suitability of the sampling technique
    and the validity of the study. Present your answer in the form of a
    brief report. Include any suggestions you have for improving the study.

11. Inquiry/Problem Solving Design a data-gathering method that uses a
    combination of convenience and systematic sampling techniques.

12. Inquiry/Problem Solving Pick a professional sport that has championship
    playoffs each year.
    a) Design a multi-stage sample to gather your schoolmates’ opinions on
       which team is likely to win the next championship.
    b) Describe how you would carry out your study and illustrate your
       findings.
    c) Research the media to find what the professional commentators are
       predicting. Do you think these opinions would be more valid than the
       results of your survey? Why or why not?




2.4         Bias in Surveys

  The results of a survey can be accurate only if the sample is representative of the
  population and the measurements are objective. The methods used for choosing
  the sample and collecting the data must be free from bias. Statistical bias is
  any factor that favours certain outcomes or responses and hence systematically
  skews the survey results. Such bias is often unintentional. A researcher may
  inadvertently use an unsuitable method or simply fail to recognize a factor
  that prevents a sample from being fully random. Regrettably, some people
  deliberately bias surveys in order to get the results they want. For this reason,
  it is important to understand not only how to use statistics, but also how to
  recognize the misuse of statistics.




      INVESTIGATE & INQUIRE: Bias in a Survey

      1. What sampling technique is the pollster in this cartoon likely to be using?
      2. What is wrong with his survey methods? How could he improve them?
      3. Do you think the bias in this survey is intentional? Why or why not?
      4. Will this bias seriously distort the results of the survey? Explain your
         reasoning.
      5. What point is the cartoonist making about survey methods?
      6. Sketch your own cartoon or short comic strip about data management.



  Sampling bias occurs when the sampling frame does not reflect the
  characteristics of the population. Biased samples can result from problems
  with either the sampling technique or the data-collection method.




                                                                               2.4 Bias in Surveys • MHR   119
Example 1 Sampling Bias

Identify the bias in each of the following surveys and suggest how it could be
avoided.
a) A survey asked students at a high-school football game whether a fund for
    extra-curricular activities should be used to buy new equipment for the
    football team or instruments for the school band.
b)    An aid agency in a developing country wants to know what proportion of
      households have at least one personal computer. One of the agency’s staff
      members conducts a survey by calling households randomly selected from
      the telephone directory.

Solution

a) Since the sample includes only football fans, it is not representative of the
      whole student body. A poor choice of sampling technique makes the results
      of the survey invalid. A random sample selected from the entire student
      body would give unbiased results.

b) There could be a significant number of households without telephones.
      Such households are unlikely to have computers. Since the telephone survey
      excludes these households, it will overestimate the proportion of households
      that have computers. By using a telephone survey as the data-collection
      method, the researcher has inadvertently biased the sample. Visiting
      randomly selected households would give a more accurate estimate of the
      proportion that have computers. However, this method of data collection
      would be more time-consuming and more costly than a telephone survey.
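
The overestimate described in part b) can be illustrated with a small calculation. The household counts below are invented purely for illustration; they assume that computers are rare among households without telephones, as the solution suggests.

```python
# Hypothetical household counts, invented purely for illustration.
households = (
    [{"phone": True, "computer": True}] * 100
    + [{"phone": True, "computer": False}] * 300
    + [{"phone": False, "computer": True}] * 10
    + [{"phone": False, "computer": False}] * 590
)

def proportion_with_computer(group):
    """Fraction of the households in `group` that have a computer."""
    return sum(h["computer"] for h in group) / len(group)

# Whole population versus the sampling frame reachable by telephone.
true_proportion = proportion_with_computer(households)
phone_frame = [h for h in households if h["phone"]]
phone_estimate = proportion_with_computer(phone_frame)

print(f"All households:       {true_proportion:.2f}")
print(f"Telephone households: {phone_estimate:.2f}")
```

Even a perfectly random sample drawn from the telephone directory estimates the second figure, not the first, because the sampling frame itself differs from the population.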


Non-response bias occurs when particular groups are under-represented in a
survey because they choose not to participate. Thus, non-response bias is a form
of sampling bias.

Example 2 Non-Response Bias

A science class asks every fifth student entering the cafeteria to answer a survey
on environmental issues. Less than half agree to complete the questionnaire.
The completed questionnaires show that a high proportion of the respondents
are concerned about the environment and well-informed about environmental
issues. What bias could affect these results?

Solution
The students who chose not to participate in the survey are likely to be those
least interested in environmental issues. As a result, the sample may not be
representative of all the students at the school.
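
A short calculation shows how strongly unequal response rates can distort such a survey. All of the numbers below are assumptions chosen for illustration: 200 students asked, 30% of them concerned about the environment, and response rates of 80% and 20% for the two groups.

```python
# Hypothetical figures, chosen only to illustrate the effect.
students_asked = 200
share_concerned = 0.30          # true share of concerned students
response_rate_concerned = 0.80  # concerned students usually agree to answer
response_rate_others = 0.20     # other students usually decline

concerned_respondents = students_asked * share_concerned * response_rate_concerned
other_respondents = students_asked * (1 - share_concerned) * response_rate_others
respondents = concerned_respondents + other_respondents

overall_response_rate = respondents / students_asked
observed_share = concerned_respondents / respondents

print(f"Overall response rate:      {overall_response_rate:.0%}")
print(f"True share concerned:       {share_concerned:.0%}")
print(f"Share among questionnaires: {observed_share:.0%}")
```

Under these assumptions fewer than half of the students respond, and the completed questionnaires suggest a much higher level of concern than actually exists in the school.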


To avoid non-response bias, researchers must ensure that the sampling process is
truly random. For example, they could include questions that identify members of
particular groups to verify that they are properly represented in the sample.

Measurement bias occurs when the data-collection method consistently either
under- or overestimates a characteristic of the population. While random errors
tend to cancel out, a consistent measurement error will skew the results of a
survey. Often, measurement bias results from a data-collection process that affects
the variable it is measuring.
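
The difference between random errors and a consistent measurement error can be seen in a small simulation. The true value of 110, the random error with a standard deviation of 5, and the consistent error of 8 are all hypothetical values chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the example is repeatable
true_value = 110.0       # hypothetical true value of the characteristic
n = 10_000               # number of measurements

# Random errors only: each reading is off by a random amount in either direction.
random_error_readings = [true_value + rng.gauss(0, 5) for _ in range(n)]

# Consistent error: every reading is also 8 units too low (hypothetical bias).
biased_readings = [true_value - 8 + rng.gauss(0, 5) for _ in range(n)]

mean_random = statistics.mean(random_error_readings)
mean_biased = statistics.mean(biased_readings)

print(f"True value:                 {true_value:.1f}")
print(f"Mean, random errors only:   {mean_random:.1f}")
print(f"Mean, with consistent bias: {mean_biased:.1f}")
```

Taking more measurements brings the first mean closer to the true value, but it does nothing to correct the second.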

Example 3 Measurement Bias

Identify the bias in each of the following surveys and suggest how it could be avoided.
a)   A highway engineer suggests that an economical way to survey traffic speeds on
     an expressway would be to have the police officers who patrol the highway record
     the speed of the traffic around them every half hour.
b)   As part of a survey of the “Greatest Hits of All Time,” a radio station asks its
     listeners: Which was the best song by the Beatles?
     i)   Help!                   ii) Nowhere Man
     iii) In My Life              iv) Other:
c)   A poll by a tabloid newspaper includes the question: “Do you favour the proposed
     bylaw in which the government will dictate whether you have the right to smoke
     in a restaurant?”

Solution

a) Most drivers who are speeding will slow down when they see a police cruiser. A
     survey by police cruisers would underestimate the average traffic speed. Here, the
     data-collection method would systematically decrease the variable it is measuring.
     A survey by unmarked cars or hidden speed sensors would give more accurate
     results.

b) The question was intended to remind listeners of some of the Beatles’ early
     recordings that might have been overshadowed by their later hits. However, some
     people will choose one of the suggested songs as their answer even though they
     would not have thought of these songs without prompting. Such leading questions
     usually produce biased results. The survey would more accurately determine
     listeners’ opinions if the question did not include any suggested answers.

c) This question distracts attention from the real issue, namely smoking in
     restaurants, by suggesting that the government will infringe on the respondents’
     rights. Such loaded questions contain wording or information intended to
     influence the respondents’ answers. A question with straightforward neutral
     language will produce more accurate data. For example, the question could read
     simply: “Should smoking in restaurants be banned?”

Response bias occurs when participants in a survey deliberately give false
or misleading answers. The respondents might want to influence the results
unduly, or they may simply be afraid or embarrassed to answer sensitive
questions honestly.

Project Prep
When gathering data for your statistics project, you will need to ensure
that the sampling process is free from bias.

Example 4 Response Bias

A teacher has just explained a particularly difficult concept to her class and
wants to check that all the students have grasped this concept. She realizes
that if she asks those who did not understand to put up their hands, these
students may be too embarrassed to admit that they could not follow the
lesson. How could the teacher eliminate this response bias?

Solution

The teacher could say: “This material is very difficult. Does anyone want me
to go over it again?” This question is much less embarrassing for students to
answer honestly, since it suggests that it is normal to have difficulty with the
material. Better still, she could conduct a survey that lets the students answer
anonymously. The teacher could ask the students to rate their understanding
on a scale of 1 to 5 and mark the ratings on slips of paper, which they would
deposit in a box. The teacher can then use these ballots to decide whether to
review the challenging material at the next class.


As the last two examples illustrate, careful wording of survey questions is
essential for avoiding bias. Researchers can also use techniques such as follow-up
questions and guarantees of anonymity to eliminate response bias. For a study to
be valid, all aspects of the sampling process must be free from bias.


   Key Concepts

   • Sampling, measurement, response, and non-response bias can all invalidate
     the results of a survey.

   • Intentional bias can be used to manipulate statistics in favour of a certain
     point of view.

   • Unintentional bias can be introduced if the sampling and data-collection
     methods are not chosen carefully.

   • Leading and loaded questions contain language that can influence the
     respondents’ answers.




Communicate Your Understanding

    1. Explain the difference between a measurement bias and a sampling bias.

    2. Explain how a researcher could inadvertently bias a study.

    3. Describe how each of the following might use intentional bias:
       a) the media
       b) a marketing department
       c) a lobby group




Practise

A
1. Classify the bias in each of the following scenarios.
    a) Members of a golf and country club are polled regarding the
       construction of a highway interchange on part of their golf course.
    b) A group of city councillors are asked whether they have ever taken
       part in an illegal protest.
    c) A random poll asks the following question: “The proposed casino will
       produce a number of jobs and economic activity in and around your
       city, and it will also generate revenue for the provincial
       government. Are you in favour of this forward-thinking initiative?”
    d) A survey uses a cluster sample of Toronto residents to determine
       public opinion on whether the provincial government should increase
       funding for public transit.

Apply, Solve, Communicate

2. For each scenario in question 1, suggest how the survey process could be
    changed to eliminate bias.

3. Communication Reword each of the following questions to eliminate the
    measurement bias.
    a) In light of the current government’s weak policies, do you think
       that it is time for a refreshing change at the next federal election?
    b) Do you plan to support the current government at the next federal
       election, in order that they can continue to implement their
       effective policies?
    c) Is first-year calculus as brutal as they say?
    d) Which of the following is your favourite male movie star?
       i)   Al Pacino            ii) Keanu Reeves
       iii) Robert DeNiro        iv) Jack Nicholson
       v)   Antonio Banderas     vi) Other:
    e) Do you think that fighting should be eliminated from professional
       hockey so that skilled players can restore the high standards of the
       game?

B
4. Communication
    a) Write your own example of a leading question and a loaded question.
    b) Write an unbiased version for each of these two questions.




ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

 5. A school principal wants to survey data-management students to determine
    whether having computer Internet access at home improves their success in
    this course.
    a) What type of sample would you suggest? Why? Describe a technique
       for choosing the sample.
    b) The following questions were drafted for the survey questionnaire.
       Identify any bias in the questions and suggest a rewording to
       eliminate the bias.
       i)  Can your family afford high-speed Internet access?
       ii) Answer the question that follows your mark in data management.
           Over 80%: How many hours per week do you spend on the Internet
           at home?
           60−80%: Would home Internet access improve your mark in data
           management?
           Below 60%: Would increased Internet access at school improve
           your mark in data management?
    c) Suppose the goal is to convince the school board that every
       data-management student needs daily access to computers and the
       Internet in the classroom. How might you alter your sampling
       technique to help achieve the desired results in this survey? Would
       these results still be statistically valid?

 6. Application A talk-show host conducts an on-air survey about
    re-instituting capital punishment in Canada. Six out of ten callers
    voice their support for capital punishment. The next day, the host
    claims that 60% of Canadians are in favour of capital punishment. Is
    this claim statistically valid? Explain your reasoning.

C
 7. a) Locate an article from a newspaper, periodical, or Internet site that
       involves a study that contains bias.
    b) Briefly describe the study and its findings.
    c) Describe the nature of the bias inherent in the study.
    d) How has this bias affected the results of the study?
    e) Suggest how the study could have eliminated the bias.

 8. Inquiry/Problem Solving Do you think that the members of Parliament are
    a representative sample of the population? Why or why not?




2.5         Measures of Central Tendency
  It is often convenient to use a central value
  to summarize a set of data. People
  frequently use a simple arithmetic average
  for this purpose. However, there are several
  different ways to find values around which a
  set of data tends to cluster. Such values are
  known as measures of central tendency.




      INVESTIGATE & INQUIRE: Not Your Average Average

      François is an NHL hockey player whose first major-league contract is up
      for renewal. His agent is bargaining with the team’s general manager.

      Agent: Based on François’ strong performance, we can accept no less than
      the team’s average salary.
      Manager: Agreed, François deserves a substantial increase. The team is willing
      to pay François the team’s average salary, which is $750 000 a season.
      Agent: I’m certain that we calculated the average salary to be $1 000 000 per
      season. You had better check your arithmetic.
      Manager: There is no error, my friend. Half of the players earn $750 000 or
      more, while half of the players receive $750 000 or less. $750 000 is a fair offer.

      This table lists the current salaries for the team.
          Salary ($)     Number of Players
            300 000                2
            500 000                3
            750 000                8
            900 000                6
          1 000 000                2
          1 500 000                1
          3 000 000                1
          4 000 000                1

       1. From looking at the table, do you think the agent or the manager is correct?
          Explain why.


      2. Find the mean salary for the team. Describe how you calculated this
        amount.
      3. Find the median salary. What method did you use to find it?
      4. Were the statements by François’ agent and the team manager correct?
      5. Explain the problem with the use of the term average in these
        negotiations.


In statistics, the three most commonly used measures of central tendency are
the mean, median, and mode. Each of these measures has its particular
advantages and disadvantages for a given set of data.

A mean is defined as the sum of the values of a variable divided by the number
of values. In statistics, it is important to distinguish between the mean of a
population and the mean of a sample of that population. The sample mean will
approximate the actual mean of the population, but the two means could have
different values. Different symbols are used to distinguish the two kinds of
means: The Greek letter mu, µ, represents a population mean, while x̄, read as
“x-bar,” represents a sample mean. Thus,

$$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{\sum x}{N} \qquad\text{and}\qquad \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x}{n}$$

where ∑x is the sum of all values of X in the population or sample, N is the
number of values in the entire population, and n is the number of values in a
sample. Note that ∑ , the capital Greek letter sigma, is used in mathematics
as a symbol for “the sum of.” If no limits are shown above or below the sigma,
the sum includes all of the data.

Usually, the mean is what people are referring to when they use the term
average in everyday conversation.

The median is the middle value of the data when they are ranked from highest
to lowest. When there is an even number of values, the median is the midpoint
between the two middle values.

The mode is the value that occurs most frequently in a distribution. Some
distributions do not have a mode, while others have several.

Some distributions have outliers, which are values distant from the majority of
the data. Outliers have a greater effect on means than on medians. For example,
the mean and median for the salaries of the hockey team in the investigation
have substantially different values because of the two very high salaries for the
team’s star players.
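
The gap between the two figures quoted in the investigation can be checked directly. The following is a minimal Python sketch, not part of the original text, that expands the salary table into one value per player and applies the standard `statistics` module. The variable names are illustrative assumptions.

```python
from statistics import mean, median

# Salary table from the investigation: salary in dollars -> number of players.
salary_counts = {
    300_000: 2,
    500_000: 3,
    750_000: 8,
    900_000: 6,
    1_000_000: 2,
    1_500_000: 1,
    3_000_000: 1,
    4_000_000: 1,
}

# Expand the frequency table into one entry per player.
salaries = [salary for salary, count in salary_counts.items() for _ in range(count)]

team_mean = mean(salaries)      # sum of all salaries divided by the number of players
team_median = median(salaries)  # midpoint of the two middle salaries (even count)

print(f"players: {len(salaries)}")
print(f"mean:    {team_mean}")
print(f"median:  {team_median}")
```

For this table the mean works out to $1 000 000 and the median to $750 000, so the agent and the manager are each quoting a different kind of "average."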

Example 1 Determining Mean, Median, and Mode

Two classes that wrote the same physics examination had the following results.
Class A    71   82   55   76   66   71   90   84   95   64   71   70   83   45   73   51    68
Class B    54   80   12   61   73   69   92   81   80   61   75   74   15   44   91   63    50   84

a)   Determine the mean, median, and mode for each class.
b)   Use the measures of central tendency to compare the performance of the
     two classes.
c)   What is the effect of any outliers on the mean and median?

Solution
www.mcgrawhill.ca/links/MDM12
For more information about means, medians, and modes, visit the above web site and
follow the links. For each measure, give an example of a situation where that measure
is the best indicator of the centre of the data.

a) For class A, the mean is

     $$\bar{x} = \frac{\sum x}{n} = \frac{71 + 82 + \dots + 68}{17} = \frac{1215}{17} = 71.5$$

     When the marks are ranked from highest to lowest, the middle value is 71.
     Therefore, the median mark for class A is 71. The mode for class A is also 71
     since this mark is the only one that occurs three times.
     Similarly, the mean mark for class B is $\frac{54 + 80 + \dots + 84}{18} = 64.4$. When the marks
     are ranked from highest to lowest, the two middle values are 69 and 73, so the
     median mark for class B is $\frac{69 + 73}{2} = 71$. There are two modes since the values 61
     and 80 both occur twice. However, the sample is so small that all the values occur
     only once or twice, so these modes may not be a reliable measure.

b) Although the mean score for class A is significantly higher than that for class B, the
     median marks for the two classes are the same. Notice that the measures of central
     tendency for class A agree closely, but those for class B do not.

c) A closer examination of the raw data shows that, aside from the two extremely low
     scores of 15 and 12 in class B, the distributions are not all that different. Without
     these two outlying marks, the mean for class B would be 70.8, almost the same as
     the mean for class A. Because of the relatively small size of class B, the effect of the
     outliers on its mean is significant. However, the values of these outliers have no
     effect on the median for class B. Even if the two outlying marks were changed to
     numbers in the 60s, the median mark would not change because it would still be
     greater than the two marks.

The median is often a better measure of central tendency than the mean for small
data sets that contain outliers. For larger data sets, the effect of outliers on the
mean is less significant.
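
The measures in Example 1 can be reproduced with a short Python sketch. This is an illustration rather than part of the original solution: the helper `summarize` and the variable names are assumptions, and `multimode` requires Python 3.8 or later. The last step replaces the two outlying marks in class B with 65, one possible choice of "numbers in the 60s," to show which measures respond.

```python
from statistics import mean, median, multimode

class_a = [71, 82, 55, 76, 66, 71, 90, 84, 95, 64, 71, 70, 83, 45, 73, 51, 68]
class_b = [54, 80, 12, 61, 73, 69, 92, 81, 80, 61, 75, 74, 15, 44, 91, 63, 50, 84]


def summarize(marks):
    """Return the count, mean (one decimal place), median, and all modes."""
    return {
        "n": len(marks),
        "mean": round(mean(marks), 1),
        "median": median(marks),
        "modes": sorted(multimode(marks)),
    }


summary_a = summarize(class_a)
summary_b = summarize(class_b)

# Replace the two outlying marks (12 and 15) with a mark in the 60s.
# The cutoff of 20 and the replacement value 65 are arbitrary choices.
class_b_adjusted = [65 if mark < 20 else mark for mark in class_b]
summary_b_adjusted = summarize(class_b_adjusted)

for label, summary in [
    ("Class A", summary_a),
    ("Class B", summary_b),
    ("Class B, outliers moved to 65", summary_b_adjusted),
]:
    print(label, summary)
```

Moving the two outliers into the 60s raises the mean for class B but leaves its median at 71, which is the behaviour described in part c) of the solution.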


Example 2 Comparing Samples to a Population

Compare the measures of central tendency for each class in Example 1 to those
for all the students who wrote the physics examination.

Solution 1    Using a Graphing Calculator

Use the STAT EDIT menu to check that lists L1 and L2 are clear. Then, enter the data
for class A in L1 and the data for class B in L2. Next, use the augment( function
from the LIST OPS menu to combine L1 and L2, and store the result in L3.
You can use the mean( and median( functions from the LIST MATH menu
to find the mean and median for each of the three lists. You can also find these
measures by using the 1-Var Stats command from the STAT CALC menu. To find
the modes, sort the lists with the SortA( function from the LIST OPS menu, and
then scroll down through the lists to find the most frequent values. Alternatively,
you can use STAT PLOT to display a histogram for each list and read the x-values
for the tallest bars with the TRACE instruction.




Note that the mean for class A overestimates the population mean, while the
mean for class B underestimates it. The measures of central tendency for class
A are reasonably close to those for the whole population of students who wrote
the physics examination, but the two sets of measures are not identical. Because
both of the low-score outliers happen to be in class B, it is a less representative
sample of the population.

Solution 2    Using a Spreadsheet
Enter the data for class A and class B in separate columns. The AVG and MEAN
functions in Corel® Quattro® Pro will calculate the mean for any range of cells
you specify, as will the AVERAGE function in Microsoft® Excel.
In both spreadsheets, you can use the MEDIAN and MODE functions to find the
median and mode for each class and for the combined data for both classes. Note
that all these functions ignore any blank cells in a specified range. The MODE
function reports only one mode even if the data have two or more modes.



Solution 3   Using Fathom™

Drag the case table icon to the workspace and name the attribute for the first
column Marks. Enter the data for class A and change the name of the collection
from Collection1 to ClassA. Use the same method to enter the marks for class B
into a collection called ClassB. To create a collection with the combined data,
first open another case table and name the collection Both. Then, go back to the
class A case table and use the Edit menu to select all cases and then copy them.
Return to the Both case table and select Paste Cases from the Edit menu. Copy
the cases from the class B table in the same way.

Project Prep
In your statistics project, you may find measures of central tendency useful for
describing your data.

Now, right-click on the class A collection to open the inspector. Click the
Measures tab, and create Mean, Median, and Mode measures. Use the Edit
Formula menu to enter the formulas for these measures. Use the same
procedure to find the mean, median, and mode for the other two
collections. Note from the screen below that Fathom™ uses a complicated
formula to find modes. See the Help menu or the Fathom™ section of
Appendix B for details.
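
As a further cross-check on Example 2, the comparison of each class with the combined group can be sketched in Python. This is not one of the textbook's solutions: the list `both` plays the role of the combined list L3 or the Both collection, and all names are illustrative assumptions. `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

class_a = [71, 82, 55, 76, 66, 71, 90, 84, 95, 64, 71, 70, 83, 45, 73, 51, 68]
class_b = [54, 80, 12, 61, 73, 69, 92, 81, 80, 61, 75, 74, 15, 44, 91, 63, 50, 84]

# Combine the two samples into the population of all students who wrote the exam.
both = class_a + class_b

for name, marks in (("Class A", class_a), ("Class B", class_b), ("Both", both)):
    print(
        f"{name:8s} n={len(marks):2d} "
        f"mean={mean(marks):.1f} "
        f"median={median(marks)} "
        f"modes={sorted(multimode(marks))}"
    )

population_mean = mean(both)
```

Unlike a spreadsheet function that reports a single mode, `multimode` returns every value tied for the highest frequency, so the two modes of class B both appear.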




Chapter 8 discusses a method for calculating how representative of a population
a sample is likely to be.

Sometimes, certain data within a set are more significant than others. For
example, the mark on a final examination is often considered to be more
important than the mark on a term test for determining an overall grade for
a course. A weighted mean gives a measure of central tendency that reflects
the relative importance of the data:


$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_i w_i x_i}{\sum_i w_i}$$

where $\sum_i w_i x_i$ is the sum of the weighted values and $\sum_i w_i$ is the sum of the various
weighting factors.

Weighted means are often used in calculations of indices.


Example 3 Calculating a Weighted Mean

The personnel manager for Statsville Marketing Limited considers five criteria
when interviewing a job applicant. The manager gives each applicant a score
between 1 and 5 in each category, with 5 as the highest score. Each category has
a weighting between 1 and 3. The following table lists a recent applicant’s scores
and the company’s weighting factors.
 Criterion                  Score, xi      Weighting Factor, wi
Education                       4                   2
Job experience                  2                   2
Interpersonal skills            5                   3
Communication skills            5                   3
References                      4                   1

a)    Determine the weighted mean score for this job applicant.
b)    How does this weighted mean differ from the unweighted mean?
c)    What do the weighting factors indicate about the company’s hiring
      priorities?




Solution

a) To compute the weighted mean, find the sum of the products of each score
   and its weighting factor.
   $$\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i} = \frac{2(4) + 2(2) + 3(5) + 3(5) + 1(4)}{2 + 2 + 3 + 3 + 1} = \frac{46}{11} = 4.2$$

   Therefore, this applicant had a weighted-mean score of approximately 4.2.

b) The unweighted mean is simply the sum of unweighted scores divided by 5.
   $$\bar{x} = \frac{\sum x}{n} = \frac{4 + 2 + 5 + 5 + 4}{5} = 4$$

   Without the weighting factors, this applicant would have a mean score of
   4 out of 5.

c) Judging by these weighting factors, the company places a high importance
   on an applicant’s interpersonal and communication skills, moderate
   importance on education and job experience, and some, but low, importance
   on references.
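
The calculation in Example 3 can be written as a small Python sketch. The function name `weighted_mean` and the data layout are assumptions made for illustration. The unweighted mean is obtained here by giving every score a weight of 1.

```python
# (criterion, score x_i, weighting factor w_i) from the table in Example 3
ratings = [
    ("Education", 4, 2),
    ("Job experience", 2, 2),
    ("Interpersonal skills", 5, 3),
    ("Communication skills", 5, 3),
    ("References", 4, 1),
]


def weighted_mean(pairs):
    """Sum of w * x over all (x, w) pairs, divided by the sum of the weights."""
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in pairs) / total_weight


pairs = [(x, w) for _, x, w in ratings]
weighted = weighted_mean(pairs)

# With equal weights the formula reduces to the ordinary mean.
unweighted = weighted_mean([(x, 1) for x, _ in pairs])

print(f"weighted mean:   {weighted:.1f}")
print(f"unweighted mean: {unweighted:.1f}")
```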

When a set of data has been grouped into intervals, you can approximate
the mean using the formula

$$\mu \doteq \frac{\sum_i f_i m_i}{\sum_i f_i} \qquad \bar{x} \doteq \frac{\sum_i f_i m_i}{\sum_i f_i}$$


where mi is the midpoint value of an interval and fi the frequency for that
interval.

You can estimate the median for grouped data by taking the midpoint of the
interval within which the median is found. This interval can be found by
analysing the cumulative frequencies.




Example 4 Calculating the Mean and Median for Grouped Data

A group of children were asked how many hours a day they spend watching
television. The following table summarizes their responses.

   Number of Hours     Number of Children, fi
        0−1                      1
        1−2                      4
        2−3                      7
        3−4                      3
        4−5                      2
        5−6                      1

a)    Determine the mean and median number of hours for this distribution.
b)    Why are these values simply approximations?

Solution
a) First, find the midpoints and cumulative frequencies for the intervals. Then, use the
      midpoints and the frequencies for the intervals to calculate an estimate for the mean.
         Number of           Midpoint,     Number of       Cumulative                fixi
           Hours                xi         Children, fi    Frequency
            0−1                  0.5             1               1                    0.5
            1−2                  1.5             4               5                    6
            2−3                  2.5             7              12                   17.5
            3−4                  3.5             3              15                   10.5
            4−5                  4.5             2              17                    9
            5−6                  5.5             1              18                    5.5
                                            ∑ fi = 18                            ∑ fi xi = 49

      $$\bar{x} \doteq \frac{\sum_i f_i x_i}{\sum_i f_i} = \frac{49}{18} = 2.7$$

      Therefore, the mean time the children spent watching television is approximately
      2.7 h a day.

      To determine the median, you must identify the interval in which the middle value
      occurs. There are 18 data values, so the median is the mean of the ninth and tenth
      values. According to the cumulative-frequency column, both of these occur within
      the interval of 2−3 h. Therefore, an approximate value for the median is 2.5 h.

b) These values for the mean and median are approximate because you do not know
      where the data lie within each interval. For example, the child whose viewing time
      is listed in the first interval could have watched anywhere from 0 to 60 min of
      television a day. If the median value is close to one of the boundaries of the
      interval, then taking the midpoint of the interval as the median could give an error
      of almost 30 min.
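
The grouped-data estimates in Example 4 can be sketched in Python as follows. The function names are assumptions made for illustration. The text does not say what to do when the two middle values fall in different intervals, so averaging the two midpoints in that case is an assumption of this sketch, not the textbook's rule.

```python
# (lower bound, upper bound, frequency f_i) for each interval in Example 4
intervals = [(0, 1, 1), (1, 2, 4), (2, 3, 7), (3, 4, 3), (4, 5, 2), (5, 6, 1)]


def grouped_mean(table):
    """Estimate the mean as sum(f_i * midpoint_i) / sum(f_i)."""
    total = sum(f for _, _, f in table)
    return sum(f * (lo + hi) / 2 for lo, hi, f in table) / total


def interval_of(rank, table):
    """Return the (lo, hi) interval holding the value at a 1-based rank,
    found by running through the cumulative frequencies."""
    cumulative = 0
    for lo, hi, f in table:
        cumulative += f
        if rank <= cumulative:
            return lo, hi
    raise ValueError("rank exceeds the number of data values")


def grouped_median(table):
    """Estimate the median as the midpoint of the interval holding the middle value(s)."""
    n = sum(f for _, _, f in table)
    # One middle rank if n is odd, two if n is even.
    ranks = [(n + 1) // 2] if n % 2 else [n // 2, n // 2 + 1]
    midpoints = [sum(interval_of(rank, table)) / 2 for rank in ranks]
    # Assumption: average the midpoints if the two middle ranks straddle intervals.
    return sum(midpoints) / len(midpoints)


print(f"estimated mean:   {grouped_mean(intervals):.1f} h")
print(f"estimated median: {grouped_median(intervals):.1f} h")
```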



Key Concepts

  • The three principal measures of central tendency are the mean, median, and
    mode. The measures for a sample can differ from those for the whole population.

  • The mean is the sum of the values in a set of data divided by the number of
    values in the set.

  • The median is the middle value when the values are ranked in order. If there
    are two middle values, then the median is the mean of these two middle values.

  • The mode is the most frequently occurring value.

  • Outliers can have a dramatic effect on the mean if the sample size is small.

  • A weighted mean can be a useful measure when all the data are not of equal
    significance.

  • For data grouped into intervals, the mean and median can be estimated using
    the midpoints and frequencies of the intervals.

  Communicate Your Understanding

    1. Describe a situation in which the most useful measure of central tendency is
       a) the mean         b) the median       c)   the mode

    2. Explain why a weighted mean would be used to calculate an index such as the
       consumer price index.
    3. Explain why the formula $\bar{x} \doteq \frac{\sum_i f_i m_i}{\sum_i f_i}$ gives only an approximate value for the
       mean for grouped data.


Practise

A
1. For each set of data, calculate the mean, median, and mode.
    a) 2.4 3.5 1.9 3.0 3.5 2.4 1.6 3.8 1.2
       2.4 3.1 2.7 1.7 2.2 3.3
    b) 10 15 14 19 18 17 12 10 14 15 18
       20 9 14 11 18
2. a) List a set of eight values that has no mode.
    b) List a set of eight values that has a median that is not one of the
       data values.
    c) List a set of eight values that has two modes.
    d) List a set of eight values that has a median that is one of the
       data values.

Apply, Solve, Communicate

3. Stacey got 87% on her term work in chemistry and 71% on the final
    examination. What will her final grade be if the term mark counts for
    70% and the final examination counts for 30%?

4. Communication Determine which measure of central tendency is most
    appropriate for each of the following sets of data. Justify your choice
    in each case.
    a) baseball cap sizes
    b) standardized test scores for 2000 students
    c) final grades for a class of 18 students
    d) lifetimes of mass-produced items, such as batteries or light bulbs

B
5. An interviewer rates candidates out of 5 for each of three criteria:
    experience, education, and interview performance. If the first two
    criteria are each weighted twice as much as the interview, determine
    which of the following candidates should get the job.
       Criterion       Nadia     Enzo     Stephan
       Experience        4         5         5
       Education         4         4         3
       Interview         4         3         4

6. Determine the effect the two outliers have on the mean mark for all the
    students in Example 2. Explain why this effect is different from the
    effect the outliers had on the mean mark for class B.

7. Application The following table shows the grading system for Xabbu’s
    calculus course.
       Term Mark                                         Overall Mark
       Knowledge and understanding (K/U) 35%             Term mark 70%
       Thinking, inquiry, problem solving (TIPS) 25%     Final examination 30%
       Communication (C) 15%
       Application (A) 25%
    a) Determine Xabbu’s term mark if he scored 82% in K/U, 71% in TIPS,
       85% in C, and 75% in A.
    b) Determine Xabbu’s overall mark if he scored 65% on the final
       examination.

8. Application An academic award is to be granted to the student with the
    highest overall score in four weighted categories. Here are the scores
    for the three finalists.
       Criterion                      Weighting   Paulo   Janet   Jamie
       Academic achievement               3         4       3       5
       Extra-curricular activities        2         4       4       4
       Community service                  2         2       5       3
       Interview                          1         5       5       4
    a) Calculate each student’s mean score without considering the
       weighting factors.
    b) Calculate the weighted-mean score for each student.
    c) Who should win the award? Explain.

9. Al, a shoe salesman, needs to restock his best-selling sandal. Here is a
    list of the sizes of the pairs he sold last week. This sandal does not
    come in half-sizes.
       10  7  6  8  7 10  5 10  7  9
       11  4  6  7 10 10  7  8 10  7
        9  7 10  4  7  7 10 11
    a) Determine the three measures of central tendency for these sandals.
    b) Which measure has the greatest significance for Al? Explain.
    c) What other value is also significant?
    d) Construct a histogram for the data. What might account for the shape
       of this histogram?

10. Communication Last year, the mean number of goals scored by a player on
    Statsville’s soccer team was 6.
    a) How many goals did the team score last year if there were 15 players
       on the team?
    b) Explain how you arrived at the answer for part a) and show why your
       method works.


11. Inquiry/Problem Solving The following table shows the salary structure
    of Statsville Plush Toys, Inc. Assume that salaries exactly on an
    interval boundary have been placed in the higher interval.
       Salary Range ($000)        Number of Employees
              20−30                       12
              30−40                       24
              40−50                       32
              50−60                       19
              60−70                        9
              70−80                        3
              80−90                        0
              90−100                       1
    a) Determine the approximate mean salary for an employee of this firm.
    b) Determine the approximate median salary.
    c) How much does the outlier influence the mean and median salaries?
       Use calculations to justify your answer.

12. Inquiry/Problem Solving A group of friends and relatives get together
    every Sunday for a little pick-up hockey. The ages of the 30 regulars
    are shown below.
       22 28 32 45 48 19 20 52 50 21
       30 46 21 38 45 49 18 25 23 46
       51 24 39 48 28 20 50 33 17 48
    a) Determine the mean, median, and mode for this distribution.
    b) Which measure best describes these data? Explain your choice.
    c) Group these data into six intervals and produce a frequency table.
    d) Illustrate the grouped data with a frequency diagram. Explain why
       the shape of this frequency diagram could be typical for such groups
       of hockey players.
    e) Produce a cumulative-frequency diagram.
    f) Determine a mean, median, and mode for the grouped data. Explain any
       differences between these measures and the ones you calculated in
       part a).

13. The modal interval for grouped data is the interval that contains more
    data than any other interval.
    a) Determine the modal interval(s) for your data in part d) of
       question 12.
    b) Is the modal interval a useful measure of central tendency for this
       particular distribution? Why or why not?

14. a) Explain the effect outliers have on the median of a distribution.
       Use examples to support your explanation.
    b) Explain the effect outliers have on the mode of a distribution.
       Consider different cases and give examples of each.

C
15. The harmonic mean is defined as $\left(\sum_i \frac{1}{n x_i}\right)^{-1}$, where n is the number
    of values in the set of data.
    a) Use a harmonic mean to find the average price of gasoline for a
       driver who bought $20 worth at 65¢/L last week and another $20 worth
       at 70¢/L this week.
    b) Describe the types of calculations for which the harmonic mean is
       useful.

16. The geometric mean is defined as $\sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$, where n is the
    number of values in the set of data.
    a) Use the geometric mean to find the average annual increase in a
       labour contract that gives a 4% raise the first year and a 2% raise
       for the next three years.
    b) Describe the types of calculations for which the geometric mean is
       useful.


2.6 Measures of Spread

  The measures of central tendency indicate
  the central values of a set of data. Often,
  you will also want to know how closely the
  data cluster around these centres.




        INVESTIGATE & INQUIRE: Spread in a Set of Data

      For a game of basketball, a group of friends split into two randomly chosen
      teams. The heights of the players are shown in the table below.
                      Falcons                           Ravens
             Player             Height (cm)    Player            Height (cm)
           Laura                   183         Sam                  166
           Jamie                   165         Shannon              163
           Deepa                   148         Tracy                168
           Colleen                 146         Claudette            161
           Ingrid                  181         Maria                165
           Justiss                 178         Amy                  166
           Sheila                  154         Selena               166

        1. Judging by the raw data in this table, which team do you think has a
           height advantage? Explain why.
        2. Do the measures of central tendency confirm that the teams are
           mismatched? Why or why not?
        3. Explain how the distributions of heights on the two teams might give
           one of them an advantage. How could you use a diagram to illustrate
           the key difference between the two teams?



  The measures of spread or dispersion of a data set are quantities that indicate
  how closely a set of data clusters around its centre. Just as there are several
  measures of central tendency, there are also different measures of spread.

  136     MHR • Statistics of One Variable
Standard Deviation and Variance
A deviation is the difference between an individual value in a set of data and
the mean for the data.

For a population, $\text{deviation} = x - \mu$
For a sample, $\text{deviation} = x - \bar{x}$

The larger the size of the deviations, the greater the spread in the data. Values less
than the mean have negative deviations. If you simply add up all the deviations for
a data set, they will cancel out. You could use the sum of the absolute values of the
deviations as a measure of spread. However, statisticians have shown that a root-
mean-square quantity is a more useful measure of spread. The standard deviation
is the square root of the mean of the squares of the deviations.
The lowercase Greek letter sigma, σ, is the symbol for the standard deviation
of a population, while the letter s stands for the standard deviation of a sample.
Population standard deviation
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Sample standard deviation
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where N is the number of data in the population and n is the number in the
sample.

Note that the formula for s has n − 1 in the denominator instead of n. This
denominator compensates for the fact that a sample taken from a population tends
to underestimate the deviations in the population. Remember that the sample
mean, $\bar{x}$, is not necessarily equal to the population mean, µ. Since $\bar{x}$ is the central
value of the sample, the sample data cluster closer to $\bar{x}$ than to µ. When n is large,
the formula for s approaches that for σ.
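
One way to see the effect of the n − 1 denominator is to list every possible small sample from a known population and average the results. The Python sketch below is an illustration only, not part of the original text. It treats the Falcons' heights as the whole population, uses samples of size 2 drawn with replacement (an arbitrary choice), and compares variances rather than standard deviations, because the correction is exact for variances under these assumptions. The helper name `sum_sq_dev` is made up for this example.

```python
from itertools import product
from statistics import mean, pvariance

# Falcons' heights (cm), treated here as a complete population.
population = [183, 165, 148, 146, 181, 178, 154]
sigma_squared = pvariance(population)  # population variance, divisor N


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    x_bar = mean(sample)
    return sum((x - x_bar) ** 2 for x in sample)


n = 2  # sample size (an arbitrary choice for this sketch)
# Every possible ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

avg_divisor_n = mean(sum_sq_dev(s) / n for s in samples)
avg_divisor_n_minus_1 = mean(sum_sq_dev(s) / (n - 1) for s in samples)

print(f"population variance:       {sigma_squared:.1f}")
print(f"average using divisor n:   {avg_divisor_n:.1f}")
print(f"average using divisor n-1: {avg_divisor_n_minus_1:.1f}")
```

Dividing by n gives an average that falls well short of the population variance (about 108.6 compared with 217.1), while dividing by n − 1 matches it.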

Also note that the standard deviation gives greater weight to the larger deviations
since it is based on the squares of the deviations.

The mean of the squares of the deviations is another useful measure. This quantity
is called the variance and is equal to the square of the standard deviation.
Population variance
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Sample variance
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$


Example 1 Using a Formula to Calculate Standard Deviations

Use means and standard deviations to compare the distribution of heights for
the two basketball teams listed in the table on page 136.




                                                                         2.6 Measures of Spread • MHR   137
Solution

Since you are considering the teams as two separate populations, use the
mean and standard deviation formulas for populations. First, calculate the
mean height for the Falcons.
$$\mu = \frac{\sum x}{N} = \frac{1155}{7} = 165$$
Next, calculate all the deviations and their squares.
 Falcons                Height (cm)     Deviation, x − µ   (x − µ)²
 Laura                     183                 18            324
 Jamie                     165                  0               0
 Deepa                     148                −17            289
 Colleen                   146                −19            361
 Ingrid                    181                 16            256
 Justiss                   178                 13            169
 Sheila                    154                −11            121
 Sum                      1155                  0          1520

Now, you can determine the standard deviation.

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{1520}{7}} = 14.7$$

Therefore, the Falcons have a mean height of 165 cm with a standard deviation
of 14.7 cm.

Similarly, you can determine that the Ravens also have a mean height of 165 cm,
but their standard deviation is only 2.1 cm. Clearly, the Falcons have a much
greater spread in height than the Ravens. Since the two teams have the same
mean height, the difference in the standard deviations indicates that the Falcons
have some players who are taller than any of the Ravens, but also some
players who are shorter.
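
The calculations for both teams can be checked with a few lines of Python. This is a sketch that follows the population formulas step by step; the function name `population_stats` is chosen for illustration and is not from the text.

```python
from math import sqrt

falcons = [183, 165, 148, 146, 181, 178, 154]
ravens = [166, 163, 168, 161, 165, 166, 166]


def population_stats(heights):
    """Return (mean, standard deviation) using the population formulas."""
    n = len(heights)
    mu = sum(heights) / n
    deviations = [x - mu for x in heights]
    # The deviations always cancel out, which is why they are squared.
    assert abs(sum(deviations)) < 1e-9
    sigma = sqrt(sum(d ** 2 for d in deviations) / n)
    return mu, sigma


for name, heights in [("Falcons", falcons), ("Ravens", ravens)]:
    mu, sigma = population_stats(heights)
    print(f"{name}: mean = {mu:.0f} cm, standard deviation = {sigma:.1f} cm")
```

The script reports a mean of 165 cm for both teams, with standard deviations of 14.7 cm for the Falcons and 2.1 cm for the Ravens.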

If you were to consider either of the basketball teams in the example above as a
sample of the whole group of players, you would use the formula for s to calculate
the team’s standard deviation. In this case, you would be using the sample to
estimate the characteristics of a larger population. However, the teams are very
small samples, so they could have significant random variations, as the difference
in their standard deviations demonstrates.

For large samples the calculation of standard deviation can be quite tedious.
However, most business and scientific calculators have built-in functions for
such calculations, as do spreadsheets and statistical software.

See Appendix B for more detailed information about technology functions and
keystrokes.


Example 2 Using Technology to Calculate Standard Deviations

A veterinarian has collected data on the life spans of a rare breed of cats.
Life Spans (in years)
 16   18   19    12     11   15   20   21   18   15   16   13   16   22
 18   19   17    14      9   14   15   19   20   15   15

Determine the mean, standard deviation, and the variance for these data.

Solution 1      Using a Graphing Calculator

Use the ClrList command to make sure list L1 is clear, then enter the data
into it. Use the 1-Var Stats command from the STAT CALC menu to calculate
a set of statistics including the mean and the standard deviation. Note that
the calculator displays both a sample standard deviation, Sx, and a
population standard deviation, σx. Use Sx since you are dealing with a
sample in this case. Find the variance by calculating the square of Sx.

The mean life span for this breed of cats is about 16.3 years with a standard
deviation of 3.2 years and a variance of 10.1. Note that variances are usually
stated without units. The units for this variance are years squared.

Solution 2      Using a Spreadsheet

Enter the data into your spreadsheet
program. With Corel® Quattro® Pro, you
can use the AVG, STDS, and VARS functions
to calculate the mean, sample standard
deviation, and sample variance. In
Microsoft® Excel, the equivalent functions
are AVERAGE, STDEV, and VAR.




Solution 3    Using Fathom™

Drag a new case table onto the workspace, name the attribute for the first
column Lifespan, and enter the data. Right-click to open the inspector, and click
the Measures tab. Create Mean, StdDev, and Variance measures and select the
formulas for the mean, standard deviation, and variance from the Edit Formula/
Functions/Statistical/One Attribute menu.
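
A general-purpose programming language offers a fourth route. The sketch below uses Python's standard `statistics` module, which is not one of the tools covered in this example; its `stdev` and `variance` functions use the sample formulas (divisor n − 1).

```python
from statistics import mean, stdev, variance

life_spans = [16, 18, 19, 12, 11, 15, 20, 21, 18, 15, 16, 13, 16, 22,
              18, 19, 17, 14, 9, 14, 15, 19, 20, 15, 15]

x_bar = mean(life_spans)
s = stdev(life_spans)             # sample standard deviation
s_squared = variance(life_spans)  # sample variance

print(f"mean = {x_bar:.1f}, s = {s:.1f}, variance = {s_squared:.1f}")
```

The rounded results agree with the values found above: a mean of 16.3 years, a standard deviation of 3.2 years, and a variance of 10.1.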




If you are working with grouped data, you can estimate the standard
deviation using the following formulas.

For a population,
$$\sigma \doteq \sqrt{\frac{\sum f_i (m_i - \mu)^2}{N}}$$
For a sample,
$$s \doteq \sqrt{\frac{\sum f_i (m_i - \bar{x})^2}{n - 1}}$$

where $f_i$ is the frequency for a given interval and $m_i$ is the midpoint of the
interval. However, calculating standard deviations from raw, ungrouped
data will give more accurate results.

Project Prep: In your statistics project, you may wish to use an appropriate
measure of spread to describe the distribution of your data.
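
The grouped-data formulas can be sketched in Python as follows. The function name `grouped_std`, its `sample` parameter, and the three intervals are all hypothetical, made up to illustrate the calculation; they do not come from the text.

```python
from math import sqrt


def grouped_std(midpoints, frequencies, sample=True):
    """Estimate the standard deviation from interval midpoints and frequencies."""
    n = sum(frequencies)
    # Estimated mean: each midpoint weighted by its frequency.
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))
    divisor = n - 1 if sample else n
    return sqrt(sum_sq / divisor)


# Hypothetical grouped data: intervals 5-15, 15-25, and 25-35.
midpoints = [10, 20, 30]
frequencies = [1, 2, 1]

print(round(grouped_std(midpoints, frequencies, sample=False), 2))  # population estimate
print(round(grouped_std(midpoints, frequencies, sample=True), 2))   # sample estimate
```

For these made-up data the estimates are about 7.07 (population formula) and 8.16 (sample formula).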


Quartiles and Interquartile Ranges
Quartiles divide a set of ordered data into four groups with equal numbers of
values, just as the median divides data into two equally sized groups. The three
“dividing points” are the first quartile (Q1), the median (sometimes called the
second quartile or Q2), and the third quartile (Q3). Q1 and Q3 are the medians
of the lower and upper halves of the data.
[Figure: number line from the lowest datum to the highest datum, marking the first quartile (Q1), the median (Q2), and the third quartile (Q3), with the interquartile range labelled.]




Recall that when there are an even number of data, you take the midpoint
between the two middle values as the median. If the number of data below the
median is even, Q1 is the midpoint between the two middle values in this half
of the data. Q3 is determined in a similar way.

The interquartile range is Q3 − Q1, which is the range of the middle half of the
data. The larger the interquartile range, the larger the spread of the central half
of the data. Thus, the interquartile range provides a measure of spread. The
semi-interquartile range is one half of the interquartile range. Both these
ranges indicate how closely the data are clustered around the median.

A box-and-whisker plot of the data illustrates these measures. The box shows
the first quartile, the median, and the third quartile. The ends of the “whiskers”
represent the lowest and highest values in the set of data. Thus, the length of the
box shows the interquartile range, while the left whisker shows the range of the
data below the first quartile, and the right whisker shows the range above the
third quartile.

[Figure: box-and-whisker plot. Labels: lowest datum, Q1, median (Q2), Q3, highest datum, interquartile range.]


A modified box-and-whisker plot is often used when the data contain outliers.
By convention, any point that is at least 1.5 times the box length away from the
box is classified as an outlier. A modified box-and-whisker plot shows such
outliers as separate points instead of including them in the whiskers. This method
usually gives a clearer illustration of the distribution.

[Figure: modified box-and-whisker plot. Labels: lowest datum, Q1, median (Q2), Q3, highest datum, interquartile range, outliers (shown as separate points).]




Example 3 Determining Quartiles and Interquartile Ranges

A random survey of people at a science-fiction convention asked them how
many times they had seen Star Wars. The results are shown below.
     3     4   2   8   10    5   1   15       5   16       6       3       4       9       12    3    30        2    10    7

a)       Determine the median, the first and third quartiles, and the interquartile and
         semi-interquartile ranges. What information do these measures provide?
b)       Prepare a suitable box plot of the data.
c)       Compare the results in part a) to those from last year’s survey, which found
         a median of 5.1 with an interquartile range of 8.0.

Solution 1         Using Pencil and Paper

a) First, put the data into numerical order.

           1   2   2    3   3    3   4    4       5    5       6       7       8       9    10       10    12       15    16   30

         The median is either the middle datum or, as in this case, the mean of the
         two middle data:
         $$\text{median} = \frac{5 + 6}{2} = 5.5$$

         The median value of 5.5 indicates that half of the people surveyed had seen
         Star Wars less than 5.5 times and the other half had seen it more than 5.5
         times.

         To determine Q1, find the median of the lower half of the data. Again, there
         are two middle values, both of which are 3. Therefore, Q1 = 3.

         Similarly, the two middle values of the upper half of the data are both 10, so
         Q3 = 10.

         Since Q1 and Q3 are the boundaries for the central half of the data, they show
         that half of the people surveyed have seen Star Wars between 3 and 10 times.

         Q3 − Q1 = 10 − 3 = 7

         Therefore, the interquartile range is 7. The semi-interquartile range is half
         this value, or 3.5. These ranges indicate the spread of the central half of the
         data.
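
         The pencil-and-paper steps above can be sketched in Python. The function
         name `quartiles` is chosen for illustration. It takes Q1 and Q3 as the
         medians of the lower and upper halves; leaving the median out of both
         halves when the number of data is odd is an assumption, since this
         example has an even number of data.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3), with Q1 and Q3 as medians of the two halves.

    Assumption: when the number of data is odd, the median itself is left
    out of both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(ordered), median(upper)


viewings = [3, 4, 2, 8, 10, 5, 1, 15, 5, 16, 6, 3, 4, 9, 12, 3, 30, 2, 10, 7]
q1, med, q3 = quartiles(viewings)
iqr = q3 - q1

print(f"Q1 = {q1}, median = {med}, Q3 = {q3}")
print(f"interquartile range = {iqr}, semi-interquartile range = {iqr / 2}")
```

         The script gives Q1 = 3, a median of 5.5, and Q3 = 10, so the
         interquartile range is 7 and the semi-interquartile range is 3.5.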




b) The value of 30 at the end of the ordered data is clearly an outlier. Therefore,
   a modified box-and-whisker plot will best illustrate this set of data.
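
   You can confirm that 30 is an outlier by applying the convention of 1.5 times
   the box length. The Python sketch below is illustrative only; the names
   `lower_fence` and `upper_fence` are not terms used in this text.

```python
viewings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9, 10, 10, 12, 15, 16, 30]
q1, q3 = 3, 10        # quartiles found in part a)
box_length = q3 - q1  # the interquartile range

# A point at least 1.5 box lengths beyond either end of the box is an outlier.
lower_fence = q1 - 1.5 * box_length
upper_fence = q3 + 1.5 * box_length
outliers = [x for x in viewings if x <= lower_fence or x >= upper_fence]

print(f"cut-off values: {lower_fence} and {upper_fence}")
print(f"outliers: {outliers}")
```

   The cut-off values are −7.5 and 20.5, so 30 is the only outlier in this set of data.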




   [Figure: modified box-and-whisker plot of the data. Horizontal axis: Viewings of Star Wars, scaled from 0 to 30.]


c) Comparing the two surveys shows that the median number of viewings is
   higher this year and the data are somewhat less spread out.


Solution 2       Using a Graphing Calculator
a) Use the STAT EDIT menu to enter the data into a list. Use the 1-Var Stats
   command from the STAT CALC menu to calculate the statistics for your list.
   Scroll down to see the values for the median, Q1, and Q3. Use the values for
   Q1 and Q3 to calculate the interquartile and semi-interquartile ranges.




b) Use STAT PLOT to select a modified box plot of your list. Press GRAPH to
   display the box-and-whisker plot and adjust the window settings, if necessary.




Solution 3       Using Fathom™

a) Drag a new case table onto the workspace, create an attribute called
   StarWars, and enter your data. Open the inspector and create Median, Q1,
   Q3, and IQR measures. Use the Edit Formula/Functions/Statistical/One
   Attribute menu to enter the formulas for the median, quartiles, and
   interquartile range.




b) Drag the graph icon onto the workspace, then drop the StarWars attribute
      on the x-axis of the graph. Select Box Plot from the drop-down menu in the
      upper right corner of the graph.




Although a quartile is, strictly speaking, a single value, people sometimes speak
of a datum being within a quartile. What they really mean is that the datum is in
the quarter whose upper boundary is the quartile. For example, if a value x1 is
“within the first quartile,” then x1 ≤ Q1. Similarly, if x2 is “within the third
quartile,” then the median ≤ x2 ≤ Q3.


Example 4 Classifying Data by Quartiles

In a survey of low-risk mutual funds, the median annual yield was 7.2%, while Q1
was 5.9% and Q3 was 8.3%. Describe the following funds in terms of quartiles.
 Mutual Fund               Annual Yield (%)
 XXY Value                        7.5
 YYZ Dividend                     9.0
 ZZZ Bond                         7.2

Solution

The yield for the XXY Value fund was between the median and Q3. You might
see this fund described as being in the third quartile or having a third-quartile
yield.

YYZ Dividend’s yield was above Q3. This fund might be termed a fourth- or
top-quartile fund.

ZZZ Bond’s yield was equal to the median. This fund could be described as a
median fund or as having median performance.
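
The classification in this example can be written as a short Python function. This is a sketch: the function name `describe_by_quartile` and the handling of values that fall exactly on a boundary are assumptions based on the informal "within a quartile" wording, not a standard rule.

```python
def describe_by_quartile(value, q1, median, q3):
    """Describe a value using the informal 'within a quartile' wording."""
    if value == median:
        return "median"
    if value <= q1:
        return "first quartile"
    if value < median:
        return "second quartile"
    if value <= q3:
        return "third quartile"
    return "fourth (top) quartile"


funds = {"XXY Value": 7.5, "YYZ Dividend": 9.0, "ZZZ Bond": 7.2}
for name, annual_yield in funds.items():
    print(name, "->", describe_by_quartile(annual_yield, q1=5.9, median=7.2, q3=8.3))
```

The output places XXY Value in the third quartile, YYZ Dividend in the fourth (top) quartile, and ZZZ Bond at the median.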


Percentiles
Percentiles are similar to quartiles, except that percentiles divide the data into
100 intervals that have equal numbers of values. Thus, k percent of the data are
less than or equal to the kth percentile, Pk, and (100 − k) percent are greater than or
equal to Pk. Standardized tests often use percentiles to convert raw scores to
scores on a scale from 1 to 100. As with quartiles, people sometimes use the
term percentile to refer to the intervals rather than their boundaries.


Example 5 Percentiles

An audio magazine tested 60 different models of speakers and gave each one an
overall rating based on sound quality, reliability, efficiency, and appearance. The
raw scores for the speakers are listed below in ascending order, reading down each column.
 35      47   57    62   64    67   72   76    83   90
 38      50   58    62   65    68   72   78    84   91
 41      51   58    62   65    68   73   79    86   92
 44      53   59    63   66    69   74   81    86   94
 45      53   60    63   67    69   75   82    87   96
 45      56   62    64   67    70   75   82    88   98

a)    If the Audio Maximizer Ultra 3000 scored at the 50th percentile, what was
      its raw score?
b)    What is the 90th percentile for these data?
c)    Does the SchmederVox’s score of 75 place it at the 75th percentile?

Solution

a) Half of the raw scores are less than or equal to the 50th percentile and half
      are greater than or equal to it. From the table, you can see that 67 divides
      the data in this way. Therefore, the Audio Maximizer Ultra 3000 had a raw
      score of 67.

b) The 90th percentile is the boundary between the lower 90% of the scores
      and the top 10%. In the table, you can see that the top 10% of the scores
      are in the 10th column. Therefore, the 90th percentile is the midpoint
      between values of 88 and 90, which is 89.

c) First, determine 75% of the number of raw scores.
      60 × 75% = 45
     There are 45 scores less than or equal to the 75th percentile. Therefore, the
     75th percentile is the midpoint between the 45th and 46th scores. These
     two scores are 79 and 81, so the 75th percentile is 80. The SchmederVox’s
     score of 75 is below the 75th percentile.
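
The counting method used in this solution can be sketched in Python. The function name `percentile` is chosen for illustration. The example only covers the case where k% of the number of scores is a whole number; rounding up to the next score otherwise is an assumption.

```python
def percentile(sorted_scores, k):
    """Return the kth percentile (0 < k < 100) of a sorted list of scores."""
    n = len(sorted_scores)
    count, remainder = divmod(n * k, 100)
    if remainder == 0:
        # Midpoint between the count-th and (count + 1)-th scores.
        return (sorted_scores[count - 1] + sorted_scores[count]) / 2
    # Assumption: otherwise take the next score up, at position count + 1.
    return sorted_scores[count]


# The 60 speaker scores, read down each column of the table.
scores = sorted([
    35, 38, 41, 44, 45, 45, 47, 50, 51, 53, 53, 56,
    57, 58, 58, 59, 60, 62, 62, 62, 62, 63, 63, 64,
    64, 65, 65, 66, 67, 67, 67, 68, 68, 69, 69, 70,
    72, 72, 73, 74, 75, 75, 76, 78, 79, 81, 82, 82,
    83, 84, 86, 86, 87, 88, 90, 91, 92, 94, 96, 98,
])

for k in (50, 75, 90):
    print(f"P{k} = {percentile(scores, k)}")
```

The script gives 67 for the 50th percentile, 80 for the 75th percentile, and 89 for the 90th percentile.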



Z-Scores
A z-score is the number of standard deviations that a datum is from the mean.
You calculate the z-score by dividing the deviation of a datum by the standard
deviation.
For a population,
$$z = \frac{x - \mu}{\sigma}$$
For a sample,
$$z = \frac{x - \bar{x}}{s}$$

Variable values below the mean have negative z-scores, values above the mean
have positive z-scores, and values equal to the mean have a zero z-score.
Chapter 8 describes z-scores in more detail.


Example 6 Determining Z-Scores

Determine the z-scores for the Audio Maximizer Ultra 3000 and SchmederVox
speakers.

Solution

You can use a calculator, spreadsheet, or statistical software to determine that
the mean is 68.1 and the standard deviation is 15.2 for the speaker scores in
Example 5.




Now, use the mean and standard deviation to calculate the z-scores for the
two speakers.

For the Audio Maximizer Ultra 3000,
$$z = \frac{x - \bar{x}}{s} = \frac{67 - 68.1}{15.2} = -0.072$$




For the SchmederVox,
$$z = \frac{x - \bar{x}}{s} = \frac{75 - 68.1}{15.2} = 0.45$$

The Audio Maximizer Ultra 3000 has a z-score of −0.072, indicating that it is
approximately 7% of a standard deviation below the mean. The SchmederVox
speaker has a z-score of 0.45, indicating that it is approximately half a standard
deviation above the mean.
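
These z-scores can also be computed directly from the raw scores. The Python sketch below uses the standard `statistics` module and treats the 60 speakers as a sample; the function name `z_score` is chosen for illustration. Because it uses the unrounded mean and standard deviation, its results can differ slightly in the last digit from a calculation with the rounded values 68.1 and 15.2.

```python
from statistics import mean, stdev

# The 60 speaker scores from the percentile example.
scores = [
    35, 38, 41, 44, 45, 45, 47, 50, 51, 53, 53, 56,
    57, 58, 58, 59, 60, 62, 62, 62, 62, 63, 63, 64,
    64, 65, 65, 66, 67, 67, 67, 68, 68, 69, 69, 70,
    72, 72, 73, 74, 75, 75, 76, 78, 79, 81, 82, 82,
    83, 84, 86, 86, 87, 88, 90, 91, 92, 94, 96, 98,
]

x_bar = mean(scores)
s = stdev(scores)  # sample standard deviation (divisor n - 1)


def z_score(x):
    """Number of standard deviations that x lies from the mean."""
    return (x - x_bar) / s


print(f"mean = {x_bar:.1f}, s = {s:.1f}")
for name, raw in [("Audio Maximizer Ultra 3000", 67), ("SchmederVox", 75)]:
    print(f"{name}: z = {z_score(raw):.2f}")
```

To two decimal places, the script gives z-scores of −0.07 for the Audio Maximizer Ultra 3000 and 0.45 for the SchmederVox.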



   Key Concepts

   • The variance and the standard deviation are measures of how closely a set of
     data clusters around its mean. The variance and standard deviation of a sample
     may differ from those of the population the sample is drawn from.

   • Quartiles are values that divide a set of ordered data into four intervals with
     equal numbers of data, while percentiles divide the data into 100 intervals.

   • The interquartile range and semi-interquartile range are measures of how
     closely a set of data clusters around its median.

   • The z-score of a datum is a measure of how many standard deviations the
     value is from the mean.

   Communicate Your Understanding

    1. Explain how the term root-mean-square applies to the calculation of the
       standard deviation.

    2. Why does the semi-interquartile range give only an approximate measure of
       how far the first and third quartiles are from the median?

    3. Describe the similarities and differences between the standard deviation and
       the semi-interquartile range.

    4. Are the median, the second quartile, and the 50th percentile always equal?
       Explain why or why not.




Practise
 A
 1. Determine the mean, standard deviation,
      and variance for the following samples.
      a) Scores on a data management quiz
         (out of 10 with a bonus question):
               5   7   9   6   5  10   8   2
              11   8   7   7   6   9   5   8
      b) Costs for books purchased including
         taxes (in dollars):
              12.55   15.31   21.98   45.35   19.81
              33.89   29.53   30.19   38.20

 2. Determine the median, Q1, Q3, the
      interquartile range, and semi-interquartile
      range for the following sets of data.
      a) Number of home runs hit by players
         on the Statsville little league team:
              6   4   3   8   9   11   6   5   15
      b) Final grades in a geography class:
              88 56 72 67 59 48 81 62
              90 75 75 43 71 64 78 84

 3. For a recent standardized test, the median
      was 88, Q1 was 67, and Q3 was 105. Describe
      the following scores in terms of quartiles.
      a) 8
      b) 81
      c) 103

 4. What percentile corresponds to
      a) the first quartile?
      b) the median?
      c) the third quartile?

 5. Convert these raw scores to z-scores.
       18   15   26   20   21

Apply, Solve, Communicate
 B
 6. The board members of a provincial
      organization receive a car allowance for
      travel to meetings. Here are the distances
      the board logged last year (in kilometres).
              44  18 125  80  63  42  35  68  52
              75 260  96 110  72  51
      a) Determine the mean, standard deviation,
         and variance for these data.
      b) Determine the median, interquartile
         range, and semi-interquartile range.
      c) Illustrate these data using a box-and-
         whisker plot.
      d) Identify any outliers.

 7. The nurses’ union collects data on the hours
      worked by operating-room nurses at the
      Statsville General Hospital.
         Hours Per Week     Number of Employees
               12                     1
               32                     5
               35                     7
               38                     8
               42                     5
      a) Determine the mean, variance, and
         standard deviation for the nurses’ hours.
      b) Determine the median, interquartile
         range, and semi-interquartile range.
      c) Illustrate these data using a box-and-
         whisker plot.

 8. Application
      a) Predict the changes in the standard
         deviation and the box-and-whisker plot
         if the outlier were removed from the data
         in question 7.
      b) Remove the outlier and compare the
         new results to your original results.
      c) Account for any differences between your
         prediction and your results in part b).

 9. Application (Chapter Problem) Here are the current salaries for
      François’ team.
            Salary ($)        Number of Players
              300 000                  2
              500 000                  3
              750 000                  8
              900 000                  6
            1 000 000                  2
            1 500 000                  1
            3 000 000                  1
            4 000 000                  1
      a) Determine the standard deviation,
         variance, interquartile range, and
         semi-interquartile range for these data.
      b) Illustrate the data with a modified
         box-and-whisker plot.
      c) Determine the z-score of François’
         current salary of $300 000.
      d) What will the new z-score be if François’
         agent does get him a million-dollar
         contract?

10. Communication Carol’s golf drives have a
      mean of 185 m with a standard deviation
      of 25 m, while her friend Chi-Yan shoots
      a mean distance of 170 m with a standard
      deviation of 10 m. Explain which of the two
      friends is likely to have a better score in a
      round of golf. What assumptions do you
      have to make for your answer?

11. Under what conditions will Q1 equal one of
      the data points in a distribution?

12. a) Construct a set of data in which Q1 = Q3
         and describe a situation in which this
         equality might occur.
      b) Will such data sets always have a median
         equal to Q1 and Q3? Explain your
         reasoning.

13. Is it possible for a set of data to have a
      standard deviation much smaller than its
      semi-interquartile range? Give an example
      or explain why one is not possible.

14. Inquiry/Problem Solving A business-
      travellers’ association rates hotels on a
      variety of factors including price, cleanliness,
      services, and amenities to produce an overall
      score out of 100 for each hotel. Here are the
      ratings for 50 hotels in a major city.
            39   50   56   60   65   68   73   77   81   87
            41   50   56   60   65   68   74   78   81   89
            42   51   57   60   66   70   74   78   84   91
            44   53   58   62   67   71   75   79   85   94
            48   55   59   63   68   73   76   80   86   96
      a) What score represents
           i) the 50th percentile?
          ii) the 95th percentile?
      b) What percentile corresponds to a rating
         of 50?
      c) The travellers’ association lists hotels
         above the 90th percentile as “highly
         recommended” and hotels between the
         75th and 90th percentiles as
         “recommended.” What are the minimum
         scores for the two levels of recommended
         hotels?

ACHIEVEMENT CHECK
Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application

15. a) A data-management teacher has two
         classes whose midterm marks have
         identical means. However, the standard
         deviations for each class are significantly
         different. Describe what these measures
         tell you about the two classes.
      b) If two sets of data have the same mean,
         can one of them have a larger standard
         deviation and a smaller interquartile
         range than the other? Give an example
         or explain why one is not possible.


 C
16. Show that $\sum (x - \bar{x}) = 0$ for any distribution.

17. a) Show that $s = \sqrt{\dfrac{n(\sum x^2) - (\sum x)^2}{n(n - 1)}}$.
       (Hint: Use the fact that $\sum x = n\bar{x}$.)
    b) What are two advantages of using the formula in part a) for calculating
       standard deviations?

18. Communication The midrange of a set of data is defined as half of the sum of the
    highest value and the lowest value. The incomes for the employees of Statsville
    Lawn Ornaments Limited are listed below (in thousands of dollars).
       28  34  49  22  50  31  55  32  73  21
       63 112  35  19  44  28  59  85  47  39
    a) Determine the midrange and interquartile range for these data.
    b) What are the similarities and differences between these two measures of spread?

19. The mean absolute deviation of a set of data is defined as
    $\dfrac{\sum |x - \bar{x}|}{n}$, where $|x - \bar{x}|$ is the absolute value of the
    difference between each data point and the mean.
    a) Calculate the mean absolute deviation and the standard deviation for the data
       in question 18.
    b) What are the similarities and differences between these two measures of spread?
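
The formulas in questions 17 and 19 can be compared on a small data set. The following is a minimal sketch in Python, not a solution to either question: the function names and the five-value sample are hypothetical, and the standard deviation is assumed to be the sample form with n − 1 in the denominator, which matches the n(n − 1) in question 17.

```python
import math

def std_dev_definition(data):
    """Sample standard deviation from the squared deviations about the mean."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

def std_dev_computational(data):
    """Sample standard deviation from the sums of x and x squared only."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

def mean_absolute_deviation(data):
    """Mean of the absolute differences between each value and the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n

# Hypothetical sample, chosen only so the arithmetic is easy to check by hand.
sample = [2, 4, 4, 6, 9]

print(std_dev_definition(sample))
print(std_dev_computational(sample))
print(mean_absolute_deviation(sample))
```

The two standard-deviation functions should agree up to rounding, while the mean absolute deviation is generally a different number because it does not square the deviations.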



      Career Connection
                                           Statistician
  Use of statistics today is so widespread that there are numerous career
  opportunities for statisticians in a broad range of fields. Governments,
  medical-research laboratories, sports agencies, financial groups, and
  universities are just a few of the many organizations that employ statisticians.
  Current trends suggest an ongoing need for statisticians in many areas.

  A statistician is engaged in the collection, analysis, presentation, and
  interpretation of data in a variety of forms. Statisticians provide insight
  into which data are likely to be reliable and whether valid conclusions or
  predictions can be drawn from them. A research statistician might develop
  new statistical techniques or applications.

  Because computers are essential for analysing large amounts of data, a
  statistician should possess a strong background in computers as well as
  mathematics. Many positions call for a minimum of a bachelor’s or master’s
  degree. Research at a university or work for a consulting firm usually
  requires a doctorate.

  www.mcgrawhill.ca/links/MDM12
  For more information about a career as a statistician and other careers
  related to mathematics, visit the above web site and follow the links.




Review of Key Concepts

2.1 Data Analysis With Graphs
Refer to the Key Concepts on page 100.

 1. The following data show monthly sales of houses by a real-estate agency.
       6     5     7     6     8     3     5     4     6
       7     5     9     5     6     6     7
    a) Construct an ungrouped frequency table for this distribution.
    b) Create a frequency diagram.
    c) Create a cumulative-frequency diagram.

 2. A veterinary study recorded the masses in grams of 25 kittens at birth.
     240     300     275     350     280     260     320
     295     340     305     280     265     300     275
     315     285     320     325     275     270     290
     245     235     305     265
    a) Organize these data into groups.
    b) Create a frequency table and histogram.
    c) Create a frequency polygon.
    d) Create a relative-frequency diagram.

 3. A class of data-management students listed their favourite board games.

     Game                     Frequency
     Pictionary®                  10
     Chess                         5
     Trivial Pursuit®              8
     MONOPOLY®                     3
     Balderdash®                   6
     Other                         4

    a) What type of data does this table show? Explain your reasoning.
    b) Graph these data using an appropriate format.
    c) Explain why you chose the type of graph you did.

2.2 Indices
Refer to the Key Concepts on page 109.

The following graph shows four categories from the basket of goods and services
used to calculate the consumer price index.

[Graph: Consumer Price Index for Ontario (1992 = 100), 1990 to 2000, with one line
each for Fresh Vegetables, Coffee and Tea, Rent, and Fuel Oil and Other Fuel]

 4. a) What is this type of graph called?
    b) Which of the four categories had the greatest increase during the period
       shown?
    c) Why do all four graphs intersect at 1992?
    d) Which category was
       i) the most volatile?
       ii) the least volatile?
    e) Suggest reasons for this difference in volatility.

 5. a) If a tin of coffee cost $5.99 in 1992, what would you expect it to cost in
       i) 1995?
       ii) 1990?
    b) What rent would a typical tenant pay in 2000 for an apartment that had a rent
       of $550 per month in 1990?
    c) What might you expect to pay for broccoli in 2000, if the average price
       you paid in 1996 was $1.49 a bunch?
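
Estimates like those in question 5 rest on one proportion: if prices in a category follow its index, a price in one year scales by the ratio of the two index values. The following is a minimal sketch in Python, assuming that proportional relationship holds. The function name `scale_price` and the index values are hypothetical and are not read from the graph.

```python
def scale_price(price, index_from, index_to):
    """Estimate a price in another year by scaling with the ratio of index values."""
    if index_from <= 0:
        raise ValueError("index_from must be positive")
    return price * index_to / index_from

# Hypothetical example: an item costs $4.00 when the index is 100.
# If the index later reads 125, the proportional estimate is $5.00.
estimate = scale_price(4.00, index_from=100, index_to=125)
print(f"${estimate:.2f}")
```

When neither year is the base year, both index values have to be read from the graph, since only the base year is fixed at 100.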

2.3 Sampling Techniques
Refer to the Key Concepts on page 116.

 6. a) Explain the difference between a stratified sample and a systematic sample.
    b) Describe a situation where a convenience sample would be an appropriate
       technique.
    c) What are the advantages and disadvantages of a voluntary-response sample?

 7. Suppose you are conducting a survey that you would like to be as representative as
    possible of the entire student body at your school. However, you have time to visit
    only six classes and to process data from a total of 30 students.
    a) What sampling technique would you use?
    b) Describe how you would select the students for your sample.

 8. Drawing names from a hat and using a random-number generator are two ways to
    obtain a simple random sample. Describe two other ways of selecting a random sample.

2.4 Bias in Surveys
Refer to the Key Concepts on page 122.

 9. Identify the type of bias in each of the following situations and state whether the
    bias is due to the sampling technique or the method of data collection.
    a) A survey asks a group of children whether or not they should be allowed
       unlimited amounts of junk food.
    b) A teacher asks students to raise their hands if they have ever told a harmless lie.
    c) A budding musician plays a new song for family members and friends to see if it is
       good enough to record professionally.
    d) Every fourth person entering a public library is asked: “Do you think Carol
       Shields should receive the Giller prize for her brilliant and critically acclaimed
       new novel?”

10. For each situation in question 9, suggest how the statistical process could be changed
    to remove the bias.

2.5 Measures of Central Tendency
Refer to the Key Concepts on page 133.

11. a) Determine the mean, median, and mode for the data in question 1.
    b) Which measure of central tendency best describes these data? Explain your
       reasoning.

12. a) Use your grouped data from question 2 to estimate the mean and median masses
       for the kittens.
    b) Determine the actual mean and median masses from the raw data.
    c) Explain any differences between your answers to parts a) and b).

13. a) For what type of “average” will the following statement always be true?
       “There are as many people with below-average ages as there are with
       above-average ages.”
    b) Is this statement likely to be true for either of the other measures of central
       tendency discussed in this chapter? Why or why not?




14. Angela is applying to a university engineering program that weights an
    applicant’s eight best grade-12 marks as shown in the following table.

    Subjects                                                           Weighting
    Calculus, chemistry, geometry and discrete mathematics, physics        3
    Computer science, data management, English                             2
    Other                                                                  1

    Angela’s grade-12 final marks are listed below.

    Subject                               Mark
    Calculus                               95
    English                                89
    Geometry and discrete mathematics      94
    Physical education                     80
    Computer science                       84
    Chemistry                              90
    Mathematics of data management         87
    Physics                                92

    a) Calculate Angela’s weighted average.
    b) Calculate Angela’s unweighted average.
    c) Explain why the engineering program would use this weighting system.

15. Describe three situations where the mode would be the most appropriate measure of
    central tendency.

2.6 Measures of Spread
Refer to the Key Concepts on page 147.

16. a) Determine the standard deviation, the interquartile range, and the
       semi-interquartile range for the data in question 1.
    b) Create a box-and-whisker plot for these data.
    c) Are there any outliers in the data? Justify your answer.

17. a) Explain why you cannot calculate the semi-interquartile range if you know
       only the difference between either Q3 and the median or median and Q1.
    b) Explain how you could determine the semi-interquartile range if you did know
       both of the differences in part a).

18. a) For the data in question 2, determine
       i) the first and third quartiles
       ii) the 10th, 25th, 75th, and 90th percentiles
    b) Would you expect any of the values in part a) to be equal? Why or why not?

19. The scores on a precision-driving test for prospective drivers at a transit company
    have a mean of 100 and a standard deviation of 15.
    a) Determine the z-score for each of the following raw scores.
       i) 85     ii) 135     iii) 100     iv) 62
    b) Determine the raw score corresponding to each of the following z-scores.
       i) 1      ii) −2      iii) 1.5     iv) −1.2

20. Dr. Simba’s fourth-year class in animal biology has only 12 students. Their scores on
    the midterm examination are shown below.
       50     71     65     54     84     69     82
       67     52     52     86     85
    a) Calculate the mean and median for these data. Compare these two statistics.
    b) Calculate the standard deviation and the semi-interquartile range. Compare these
       statistics and comment on what you notice.
    c) Which measure of spread is most suitable for describing this data set?
       Explain why.
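
Question 19 converts between raw scores and z-scores in both directions. The following is a minimal sketch of the two conversions in Python, using the usual definition z = (x − mean) / standard deviation. The function names and the mean of 70 and standard deviation of 8 are hypothetical and are not the values from the question.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """Invert the z-score formula to recover the raw score."""
    return mean + z * sd

# Hypothetical distribution: mean 70, standard deviation 8.
print(z_score(82, mean=70, sd=8))      # 12 above the mean is 1.5 standard deviations
print(raw_score(-0.5, mean=70, sd=8))  # half a standard deviation below the mean
```

A raw score equal to the mean always has a z-score of zero, which gives a quick check on either function.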



Chapter Test

ACHIEVEMENT CHART

       Category    Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication       Application
       Questions   All                       10, 11                             4, 6, 7, 8, 9, 11   5, 6, 11


Use the following set of data-management final examination scores to answer
questions 1 through 5.

 92   48   59   62   66   98   70   70   55   63
 70   97   61   53   56   64   46   69   58   64

 1. a) Group these data into intervals and create a frequency table.
    b) Produce a frequency diagram and a frequency polygon.
    c) Produce a cumulative-frequency diagram.

 2. Determine the
    a) three measures of central tendency
    b) standard deviation and variance
    c) interquartile and semi-interquartile ranges

 3. a) Produce a modified box-and-whisker plot for this distribution.
    b) Identify any outliers.
    c) Identify and explain any other unusual features of this graph.

 4. Explain which of the three measures of central tendency is most appropriate to
    describe this distribution of marks and why the other two measures are not appropriate.

 5. Students with scores above the 90th percentile receive a book prize.
    a) How many students will receive prizes?
    b) What are these students’ scores?

 6. An interview committee graded three short-listed candidates for a management position
    as shown below. The scores are on a scale of 1 to 5, with 5 as the top score.

    Criterion              Weight   Clarise   Pina   Steven
    Education                 2        3        3       4
    Experience                2        4        5       3
    Interpersonal skills      3        3        3       5
    First interview           1        5        4       3

    Who should the committee hire based on these data? Justify your choice.

 7. Describe the type of sample used in each of the following scenarios.
    a) A proportionate number of boys and girls are randomly selected from a class.
    b) A software company randomly chooses a group of schools in a particular school
       district to test a new timetable program.
    c) A newspaper prints a questionnaire and invites its readers to mail in their
       responses.
    d) A telephone-survey company uses a random-number generator to select
       which households to call.
    e) An interviewer polls people passing by on the street.

 8. A group of 8 children in a day-care centre are to be interviewed about their favourite
    games. Describe how you would select a systematic sample if there are 52 children
    at the centre.
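
Question 6 depends on a weighted mean, in which each score counts in proportion to its weight. The following is a minimal sketch of that calculation in Python. The function name `weighted_mean` and the scores and weights are hypothetical and are not taken from the committee's table.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical scores with weights 3, 2, and 1.
scores = [80, 90, 70]
weights = [3, 2, 1]

print(weighted_mean(scores, weights))       # (240 + 180 + 70) / 6
print(weighted_mean(scores, [1, 1, 1]))     # equal weights give the ordinary mean
```

Setting every weight to 1 reduces the weighted mean to the ordinary mean, so the two can be compared directly.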



 9. a) Identify the bias in the following surveys and explain the effect it could have on
       their results.
       i) Parents of high-school students were asked: “Do you think that students
          should be released from school a half hour early on Friday, free to run
          around and get into trouble?”
       ii) Audience members at an investment workshop were asked to raise their
          hands if they had been late with a bill payment within the last six months.
       iii) A random survey of corporate executives asked: “Do you favour
          granting a cable-television licence for a new economics and business
          channel?”
    b) Suggest how to eliminate the bias in each of the surveys in part a).

10. A mutual-fund company proudly advertises that all of its funds have “first-quartile
    performance.” What mathematical errors has the company made in this advertisement?


     ACHIEVEMENT CHECK

 Knowledge/Understanding                        Thinking/Inquiry/Problem Solving        Communication               Application
11. The graph below shows the stock price for an Ontario technology
    company over a one-month period in 2001.

    [Graph: Stock Price ($) on the vertical axis, scaled from 18 to 30; dates on the
    horizontal axis marked at August 23 and 25 and September 1, 8, 15, and 22, 2001]

    a) When did the stock reach its lowest value during the period shown?
       Suggest a possible reason for this low point.
    b) Compare the percent drop in stock price from September 1 to
       September 8 to the drop during the following week.
    c) Sketch a new graph and provide a commentary that the company
       could use to encourage investors to buy the company’s stock.




CHAPTER 3
                  Statistics of Two Variables




                  Specific Expectations                                                             Section

                  Define the correlation coefficient as a measure of the fit of a scatter           3.1, 3.2, 3.3,
                  graph to a linear model.                                                            3.5

                  Calculate the correlation coefficient for a set of data, using graphing         3.1, 3.2, 3.3,
                  calculators or statistical software.                                                3.5

                  Demonstrate an understanding of the distinction between cause-effect           3.1, 3.2, 3.3,
                  relationships and the mathematical correlation between variables.                 3.4, 3.5

                  Describe possible misuses of regression.                                       3.2, 3.3, 3.5

                  Explain examples of the use and misuse of statistics in the media.                  3.5

                  Assess the validity of conclusions made on the basis of statistical studies,   3.2, 3.3, 3.4,
                  by analysing possible sources of bias in the studies and by calculating             3.5
                  and interpreting additional statistics, where possible.

                  Demonstrate an understanding of the purpose and the use of a variety of          3.4, 3.5
                  sampling techniques.

                  Organize and summarize data from secondary sources, using                      3.1, 3.2, 3.3,
                  technology.                                                                       3.4, 3.5

                  Locate data to answer questions of significance or personal interest, by        3.1, 3.2, 3.4,
                  searching well-organized databases.                                                 3.5

                  Use the Internet effectively as a source for databases.                        3.1, 3.2, 3.4,
                                                                                                      3.5
Chapter Problem
Job Prospects
Gina is in her second year of business studies at university and she is starting to
think about a job upon graduation. She has two primary concerns: the job market and
expected income. Gina does some research at the university’s placement centre and
finds employment statistics for graduates of her program and industry surveys of
entry-level salaries.

 Year   Number of    Number Hired       Mean Starting
        Graduates    Upon Graduation    Salary ($000)
 1992      172            151               26
 1993      180            160               27
 1994      192            140               28
 1995      170            147               27.5
 1996      168            142               27
 1997      176            155               26.5
 1998      180            160               27
 1999      192            162               29
 2000      200            172               31
 2001      220            180               34

 1. How could Gina graph this data to estimate
    a) her chances of finding a job in her field when she graduates in two years?
    b) her starting salary?

 2. What assumptions does Gina have to make for her predictions? What other
    factors could affect the accuracy of Gina’s estimates?

This chapter introduces statistical techniques for measuring relationships
between two variables. As you will see, these techniques will enable Gina to make
more precise estimates of her job prospects.

Two-variable statistics have an enormous range of applications including industrial
processes, medical studies, and environmental issues. In fact, they apply to almost any
field where you need to determine if a change in one variable affects another.
Review of Prerequisite Skills

If you need help with any of the skills listed in purple below, refer to Appendix A.

 1. Scatter plots For each of the following sets of data, create a scatter plot and
    describe any patterns you see.
    a)    x     y          b)    x     y
          3    18                4     6
          5    15                7     2
          8    12               13    17
          9    10               14     5
         12     8               23    19
         15     4               24    11
         17     1               25    30
                                33    21
                                36    29
                                40    39
                                42    26
                                46    32

 2. Scatter plots For each plot in question 1,
    i) graph the line of best fit and calculate its equation
    ii) estimate the x- and y-intercepts
    iii) estimate the value of y when x = 7

 5. Graphing exponential functions
    a) Identify the base and the numerical coefficient for each of the following
       functions.
       i) $y = 0.5(3)^x$     ii) $y = 2^x$     iii) $y = 100(0.5)^x$
    b) Graph each of the functions in part a).
    c) Explain what happens to the value of x as the curves in part b) approach
       the x-axis.

 6. Sigma notation Calculate each sum without the use of technology.
    a) $\sum_{i=1}^{8} i$          b) $\sum_{i=1}^{5} i^2$

 7. Sigma notation Given $\bar{x} = 2.5$, calculate each sum without the use of
    technology.
    a) $\sum_{i=1}^{6} (i - \bar{x})$          b) $\sum_{i=1}^{4} (i - \bar{x})^2$

 8. Sigma notation
    a) Repeat questions 6 and 7 using appropriate technology such as a
       graphing calculator or a spreadsheet.
    b) Explain the method that you chose.

 3. Graphing linear equations Determine the               9. Sampling (Chapter 2) Briefly explain each
      slope and y-intercept for the lines defined by         of the following terms.
      the following equations, and then graph the            a) simple random sample
      lines.
                                                             b) systematic sample
      a) y = 3x − 4              b) y = −2x + 6
                                                             c) outlier
      c) 12x − 6y = 7
                                                         10. Bias (Chapter 2)
 4. Graphing quadratic functions Graph the
                                                             a) Explain the term measurement bias.
      following functions and estimate any x- and
      y-intercepts.                                          b) Give an example of a survey method
                                                                  containing unintentional measurement bias.
      a) y = 2x2
                                                             c) Give an example of a survey method
      b) y = x2 + 5x − 6
                                                                  containing intentional measurement bias.
      c) y = −3x2 + x + 2
                                                             d) Give an example of sampling bias.



3.1         Scatter Plots and Linear Correlation

  Does smoking cause lung cancer? Is job performance related to marks in high
  school? Do pollution levels affect the ozone layer in the atmosphere? Often the
  answers to such questions are not clear-cut, and inferences have to be made from
  large sets of data. Two-variable statistics provide methods for detecting relationships
  between variables and for developing mathematical models of these relationships.

  The visual pattern in a graph or plot can often reveal the nature of the relationship
  between two variables.

      INVESTIGATE & INQUIRE: Visualizing Relationships Between Variables

      A study examines two new obedience-training
      methods for dogs. The dogs were randomly selected
      to receive from 5 to 16 h of training in one of
      the two training programs. The dogs were assessed
      using a performance test graded out of 20.
          Rogers Method               Laing System
        Hours         Score         Hours        Score
           10          12             8            10
           15          16             6             9
            7          10            15            12
           12          15            16             7
            8            9            9            11
            5            8           11             7
            8          11            10             9
           16          19            10             6
           10          14             8            15
       1. Could you determine which of the two training systems is more effective by
         comparing the mean scores? Could you calculate another statistic that would
         give a better comparison? Explain your reasoning.
       2. Consider how you could plot the data for the Rogers Method. What do
         you think would be the best method? Explain why.
       3. Use this method to plot the data for the Rogers Method. Describe any
         patterns you see in the plotted data.
       4. Use the same method to plot the data for the Laing System and describe
         any patterns you see.
       5. Based on your data plots, which training method do you think is more
         effective? Explain your answer.

       6. Did your plotting method make it easy to compare the two sets of data?
         Are there ways you could improve your method?
      7. a) Suggest factors that could influence the test scores but have not been
               taken into account.
           b) How could these factors affect the validity of conclusions drawn from
               the data provided?


In data analysis, you are often trying to discern whether one variable, the
dependent (or response) variable, is affected by another variable, the
independent (or explanatory) variable. Variables have a linear correlation
if changes in one variable tend to be proportional to changes in the other.
Variables X and Y have a perfect positive (or direct) linear correlation if
Y increases at a constant rate as X increases. Similarly, X and Y have a perfect
negative (or inverse) linear correlation if Y decreases at a constant rate as
X increases.

A scatter plot shows such relationships graphically, usually with the
independent variable as the horizontal axis and the dependent variable as
the vertical axis. The line of best fit is the straight line that passes as close
as possible to all of the points on a scatter plot. The stronger the correlation,
the more closely the data points cluster around the line of best fit.

Example 1 Classifying Linear Correlations

Classify the relationship between the variables X and Y for the data shown
in the following diagrams.
[Figure: six scatter plots of y versus x, labelled a) to f)]
Solution

a) The data points are clustered around a line that rises to the right (positive
     slope), indicating that Y tends to increase as X increases. Although the
     points are not perfectly lined up, there is a strong positive linear correlation
     between X and Y.

b) The data points are all exactly on a line that slopes down to the right, so
     Y decreases as X increases. In fact, the changes in Y are exactly proportional
     to the changes in X. There is a perfect negative linear correlation between X
     and Y.

c) No discernible linear pattern exists. As X increases, Y appears to change
     randomly. Therefore, there is zero linear correlation between X and Y.

d) A definite positive trend exists, but it is not as clear as the one in part a).
     Here, X and Y have a moderate positive linear correlation.

e) A slight positive trend exists. X and Y have a weak positive linear correlation.

f)   A definite negative trend exists, but it is hard to classify at a glance. Here,
     X and Y have a moderate or strong negative linear correlation.


As Example 1 shows, a scatter plot often can give only a rough indication of the
correlation between two variables. Obviously, it would be useful to have a more
precise way to measure correlation. Karl Pearson (1857−1936) developed a
formula for estimating such a measure. Pearson, who also invented the term
standard deviation, was a key figure in the development of modern statistics.


The Correlation Coefficient
To develop a measure of correlation, mathematicians first defined the
covariance of two variables in a sample:
$$s_{XY} = \frac{1}{n-1}\sum (x - \bar{x})(y - \bar{y})$$

where n is the size of the sample, x represents individual values of the variable
X, y represents individual values of the variable Y, $\bar{x}$ is the mean of X, and
$\bar{y}$ is the mean of Y.

Recall from Chapter 2 that the symbol Σ means “the sum of.” Thus, the
covariance is the sum of the products of the deviations of x and y for all the data
points divided by n − 1. The covariance depends on how the deviations of the
two variables are related. For example, the covariance will have a large positive
value if both $x - \bar{x}$ and $y - \bar{y}$ tend to be large at the same time, and a negative
value if one tends to be positive when the other is negative.
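
As a rough illustration of the covariance formula above, the following Python sketch computes $s_{XY}$ directly from the deviations. The function name `sample_covariance` and the five data pairs are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
def sample_covariance(xs, ys):
    """Return s_XY = sum((x - x_bar) * (y - y_bar)) / (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n  # mean of X
    y_bar = sum(ys) / n  # mean of Y
    # Sum the products of the paired deviations, then divide by n - 1.
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data (not from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(sample_covariance(xs, ys))        # 1.5: deviations tend to share a sign
print(sample_covariance(xs, ys[::-1]))  # -1.5: deviations tend to have opposite signs
```

Reversing the y-values makes large x-values pair with small y-values, so the products of the deviations are mostly negative and the covariance changes sign.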


The correlation coefficient, r, is the covariance divided by the product of the
standard deviations for X and Y:
$$r = \frac{s_{XY}}{s_X \times s_Y}$$

where sX is the standard deviation of X and sY is the standard deviation of Y.
This coefficient gives a quantitative measure of the strength of a linear
correlation. In other words, the correlation coefficient indicates how closely the
data points cluster around the line of best fit. The correlation coefficient is also
called the Pearson product-moment coefficient of correlation (PPMC) or
Pearson’s r.
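
The definition of r can be tried out with a few lines of Python. This is only a sketch: the function name `correlation_from_definition` and the data are hypothetical, and it relies on `statistics.stdev`, which computes the sample standard deviation (dividing by n − 1), to match the covariance formula used in this section.

```python
from statistics import mean, stdev


def correlation_from_definition(xs, ys):
    """Return r = s_XY / (s_X * s_Y) using sample statistics."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Sample covariance: sum of products of deviations divided by n - 1.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    # stdev() is the sample standard deviation, which also divides by n - 1.
    return s_xy / (stdev(xs) * stdev(ys))


# Hypothetical data (not from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation_from_definition(xs, ys)
print(round(r, 3))  # 0.775
```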

The correlation coefficient always has values in the range from −1 to 1. Consider
a perfect positive linear correlation first. For such correlations, changes in the
dependent variable Y are directly proportional to changes in the independent
variable X, so Y = aX + b, where a is a positive constant. It follows that

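$$
\begin{aligned}
s_{XY} &= \frac{1}{n-1}\sum (x - \bar{x})(y - \bar{y}) \\
       &= \frac{1}{n-1}\sum (x - \bar{x})[(ax + b) - (a\bar{x} + b)] \\
       &= \frac{1}{n-1}\sum (x - \bar{x})(ax - a\bar{x}) \\
       &= \frac{1}{n-1}\sum a(x - \bar{x})^2 \\
       &= a\,\frac{\sum (x - \bar{x})^2}{n-1} \\
       &= a s_X^2
\end{aligned}
$$

$$
\begin{aligned}
s_Y &= \sqrt{\frac{\sum (y - \bar{y})^2}{n-1}} \\
    &= \sqrt{\frac{\sum [(ax + b) - (a\bar{x} + b)]^2}{n-1}} \\
    &= \sqrt{\frac{\sum (ax - a\bar{x})^2}{n-1}} \\
    &= \sqrt{\frac{a^2 \sum (x - \bar{x})^2}{n-1}} \\
    &= a\sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} \\
    &= a s_X
\end{aligned}
$$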

Substituting into the equation for the correlation coefficient gives
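$$
\begin{aligned}
r &= \frac{s_{XY}}{s_X s_Y} \\
  &= \frac{a s_X^2}{s_X (a s_X)} \\
  &= 1
\end{aligned}
$$

[Figure: scatter plot of Y versus X for r = 1]
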
Similarly, r = −1 for a perfect negative linear correlation.
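
The two perfect cases can also be checked numerically. The sketch below builds y-values that lie exactly on a line Y = aX + b and computes r from the definitions in this section. The helper name `pearson_r`, the x-values, and the constants a and b are all hypothetical choices for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Compute r = s_XY / (s_X * s_Y) from the sample definitions."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


xs = [1, 2, 4, 7, 11]                  # hypothetical x-values
rising = [3 * x + 2 for x in xs]       # a = 3 > 0, b = 2
falling = [-0.5 * x + 10 for x in xs]  # a = -0.5 < 0, b = 10

r_up = pearson_r(xs, rising)
r_down = pearson_r(xs, falling)
print(round(r_up, 6), round(r_down, 6))  # 1.0 -1.0
```

The results are rounded because floating-point arithmetic can leave a tiny error in the last decimal places.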

For two variables with no correlation, Y is equally likely to increase or decrease
as X increases. The terms in $\sum (x - \bar{x})(y - \bar{y})$ are randomly positive or
negative and tend to cancel each other. Therefore, the correlation coefficient is
close to zero if there is little or no correlation between the variables. For
moderate linear correlations, the summation terms partially cancel out.

[Figure: two scatter plots of Y versus X, one with r = 0 and one with r = –0.5]

The following diagram illustrates how the correlation coefficient corresponds
to the strength of a linear correlation.
 Correlation Coefficient, r      Linear Correlation
 –1                              perfect negative
 –1 to –0.67                     strong negative
 –0.67 to –0.33                  moderate negative
 –0.33 to 0                      weak negative
 0 to 0.33                       weak positive
 0.33 to 0.67                    moderate positive
 0.67 to 1                       strong positive
 1                               perfect positive
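
The scale above can be turned into a small lookup function. This is a sketch only: the name `classify_correlation` is hypothetical, and the treatment of the boundary values (counting exactly 0.33 as moderate and exactly 0.67 as strong) is an assumption, since the diagram does not say which side a boundary value belongs to.

```python
def classify_correlation(r):
    """Describe a linear correlation using the 0.33 / 0.67 scale."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "zero"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.67:   # assumption: boundary counts as strong
        strength = "strong"
    elif size >= 0.33:   # assumption: boundary counts as moderate
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


for r in (-1, -0.5, 0, 0.2, 0.9):
    print(r, classify_correlation(r))
```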


Using algebraic manipulation and the fact that $\sum x = n\bar{x}$, Pearson showed that

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

where n is the number of data points in the sample, x represents individual
values of the variable X, and y represents individual values of the variable Y.
(Note that $\sum x^2$ is the sum of the squares of all the individual values of X,
while $(\sum x)^2$ is the square of the sum of all the individual values.)
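
The text does not show the algebra behind this formula. One way to fill in the steps, using only the definitions of $s_{XY}$, $s_X$, and $s_Y$ given earlier together with $\sum x = n\bar{x}$ and $\sum y = n\bar{y}$, is sketched here.

$$
\begin{aligned}
\sum (x - \bar{x})(y - \bar{y}) &= \sum xy - \bar{x}\sum y - \bar{y}\sum x + n\bar{x}\bar{y} \\
  &= \sum xy - \frac{(\sum x)(\sum y)}{n} \\
  &= \frac{n\sum xy - (\sum x)(\sum y)}{n} \\[4pt]
\sum (x - \bar{x})^2 &= \frac{n\sum x^2 - (\sum x)^2}{n}, \qquad
\sum (y - \bar{y})^2 = \frac{n\sum y^2 - (\sum y)^2}{n} \\[4pt]
r = \frac{s_{XY}}{s_X s_Y}
  &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} \\
  &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
\end{aligned}
$$

In the second-to-last line the factors of $n - 1$ in the covariance and the two standard deviations cancel, and in the last line the factors of $n$ cancel in the same way.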

Like the alternative formula for standard deviations (page 150), this formula for
r avoids having to calculate all the deviations individually. Many scientific and
statistical calculators have built-in functions for calculating the correlation
coefficient.

It is important to be aware that increasing the number of data points used in
determining a correlation improves the accuracy of the mathematical model.
Some of the examples and exercise questions have a fairly small set of data in
order to simplify the computations. Larger data sets can be found in the e-book
that accompanies this text.




Example 2 Applying the Correlation Coefficient Formula

A farmer wants to determine whether there is a relationship between the mean
temperature during the growing season and the size of his wheat crop. He
assembles the following data for the last six crops.
Mean Temperature (°C) Yield (tonnes/hectare)
            4                           1.6
            8                           2.4
           10                           2.0
            9                           2.6
           11                           2.1
            6                           2.2
a)    Does a scatter plot of these data indicate any linear correlation between
      the two variables?
b)    Compute the correlation coefficient.
c)    What can the farmer conclude about the relationship between the mean
      temperatures during the growing season and the wheat yields on his farm?

Solution

a) The farmer wants to know whether the crop yield depends on temperature.
      Here, temperature is the independent variable, X, and crop yield is the
      dependent variable, Y. The scatter plot has a somewhat positive trend, so
      there appears to be a moderate positive linear correlation.

      [Figure: scatter plot of Yield (T/ha) versus Mean Temperature (ºC)]

b) To compute r, set up a table to calculate the quantities required
      by the formula.
       Temperature, x    Yield, y       x²           y²            xy
             4              1.6          16          2.56           6.4
             8              2.4          64          5.76          19.2
            10              2.0         100          4.00          20.0
             9              2.6          81          6.76          23.4
            11              2.1         121          4.41          23.1
             6              2.2          36          4.84          13.2

         Σx = 48        Σy = 12.9    Σx² = 418   Σy² = 28.33   Σxy = 105.3




Now compute r, using the formula:

$$
\begin{aligned}
r &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \\
  &= \frac{6(105.3) - (48)(12.9)}{\sqrt{[6(418) - (48)^2][6(28.33) - (12.9)^2]}} \\
  &= \frac{631.8 - 619.2}{\sqrt{(2508 - 2304)(169.98 - 166.41)}} \\
  &= \frac{12.6}{26.99} \\
  &= 0.467
\end{aligned}
$$

Data in Action
From 1992 to 2001, Canada produced an average of 27 million tonnes of wheat a
year. About 70% of this crop was exported.

     The correlation coefficient for crop yield versus mean temperature is
     approximately 0.47, which confirms a moderate positive linear correlation.

c)   It appears that the crop yield tends to increase somewhat as the mean temperature
     for the growing season increases. However, the farmer cannot conclude that higher
     temperatures cause greater crop yields. Other variables could account for the
     correlation. For example, the lower temperatures could be associated with heavy
     rains, which could lower yields by flooding fields or leaching nutrients from the soil.


The important principle that a correlation does not prove the existence of a cause-
and-effect relationship between two variables is discussed further in section 3.4.
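
The arithmetic in part b) of Example 2 can be double-checked with a short script. The sketch below rebuilds the column sums from the farmer's six data pairs and applies the same formula. The variable names are illustrative, and Python is used here only as one convenient tool, not as the method the text prescribes.

```python
from math import sqrt

# Data from Example 2: mean temperature (°C) and yield (tonnes/hectare).
temps = [4, 8, 10, 9, 11, 6]
yields = [1.6, 2.4, 2.0, 2.6, 2.1, 2.2]

n = len(temps)
sum_x = sum(temps)
sum_y = sum(yields)
sum_x2 = sum(x * x for x in temps)
sum_y2 = sum(y * y for y in yields)
sum_xy = sum(x * y for x, y in zip(temps, yields))

# r = [n*Sxy - Sx*Sy] / sqrt([n*Sx2 - Sx^2][n*Sy2 - Sy^2])
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_x2, round(sum_xy, 1), round(sum_y2, 2))  # 48 418 105.3 28.33
print(round(r, 3))                                        # 0.467
```

The sums are rounded before printing because decimal values such as 2.4 are stored only approximately in floating point.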

Example 3 Using Technology to Determine Correlation Coefficients

Determine whether there is a linear correlation between horsepower and fuel
consumption for these five vehicles by creating a scatter plot and calculating the
correlation coefficient.
 Vehicle                       Horsepower, x   Fuel Consumption (L/100 km), y
Midsize sedan                      105                      6.7
Minivan                            170                     23.5
Small sports utility vehicle       124                      5.9
Midsize motorcycle                  17                      3.4
Luxury sports car                  296                      8.4


Solution 1      Using a Graphing Calculator

See Appendix B for more details on the graphing calculator and software
functions used in this section.

Use the ClrList command to make sure lists L1 and L2 are clear, then enter
the horsepower data in L1 and the fuel consumption figures in L2.

To display a scatter plot, first make sure that all functions in the Y= editor
are either clear or turned off. Then, use STAT PLOT to select PLOT1.




Turn the plot on, select the scatter-plot icon, and enter L1 for XLIST and L2 for
YLIST. (Some of these settings may already be in place.) From the ZOOM menu,
select 9:ZoomStat. The calculator will automatically optimize the window
settings and display the scatter plot.

To calculate the correlation coefficient, from the CATALOG menu, select
DiagnosticOn, then select the LinReg(ax+b) instruction from the STAT CALC menu.
The calculator will perform a series of statistical calculations using the data in
lists L1 and L2. The last line on the screen shows that the correlation coefficient
is approximately 0.353.

Therefore, there is a moderate linear correlation
between horsepower and fuel consumption for
the five vehicles.




Solution 2    Using a Spreadsheet

Set up three columns and enter the data from the table above. Highlight the
numerical data and use your spreadsheet’s Chart feature to display a scatter plot.
Both Corel® Quattro® Pro and Microsoft® Excel have a CORREL function that
allows you to calculate the correlation coefficient easily. The scatter plot and
correlation coefficient indicate a moderate correlation between horsepower and
fuel consumption.




Solution 3    Using Fathom™
Create a new collection by setting up a case table with three attributes: Vehicle,
Hp, and FuelUse. Enter the data for the five cases. To create a scatter plot, drag
the graph icon onto the work area and drop the Hp attribute on the x-axis and
the FuelUse attribute on the y-axis.



To calculate the correlation coefficient, right-click on the collection and select Inspect
Collection. Select the Measures tab and name a new measure PPMC. Right-click this
measure and select Edit Formula, then Functions/Statistical/Two Attributes/correlation.
When you enter the Hp and FuelUse attributes in the correlation function,
Fathom™ will calculate the correlation coefficient for these data.

Again, the scatter plot and correlation coefficient show a moderate linear
correlation.
Project Prep
For your statistics project, you may be investigating the linear correlation
between two variables. A graphing calculator or computer software may be a
valuable aid for this analysis.

Notice that the scatter plots in Example 3 have an outlier at (170, 23.5).
Without this data point, you would have a strong positive linear correlation.
Section 3.2 examines the effect of outliers in more detail.
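
The effect of the outlier in Example 3 can be seen by computing r twice, once with all five vehicles and once without the minivan. The following Python sketch is one possible way to do this and is not one of the three tools used in the example. The helper name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(points):
    """Correlation coefficient from a list of (x, y) pairs, using the sums formula."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_y2 = sum(y * y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# (horsepower, fuel consumption in L/100 km) from the table in Example 3.
vehicles = [(105, 6.7), (170, 23.5), (124, 5.9), (17, 3.4), (296, 8.4)]
outlier = (170, 23.5)  # the minivan
without_outlier = [p for p in vehicles if p != outlier]

r_all = pearson_r(vehicles)
r_trimmed = pearson_r(without_outlier)
print(round(r_all, 3))      # 0.353
print(round(r_trimmed, 3))  # 0.924
```

With only four or five data points, a single unusual value has a large influence on r, which is one reason larger data sets give more reliable results.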

   Key Concepts

   • Statistical studies often find linear correlations between two variables.

   • A scatter plot can often reveal the relationship between two variables. The
     independent variable is usually plotted on the horizontal axis and the
     dependent variable on the vertical axis.

   • Two variables have a linear correlation if changes in one variable tend to be
     proportional to changes in the other. Linear correlations can be positive or
     negative and vary in strength from zero to perfect.

   • The correlation coefficient, r, is a quantitative measure of the correlation
     between two variables. Negative values indicate negative correlations while
     positive values indicate positive correlations. The greater the absolute value
     of r, the stronger the linear correlation, with zero indicating no correlation
     at all and 1 indicating a perfect correlation.

   • Manual calculations of correlation coefficients can be quite tedious, but a
     variety of powerful technology tools are available for such calculations.


Communicate Your Understanding

      1. Describe the advantages and disadvantages of using a scatter plot or the
         correlation coefficient to estimate the strength of a linear correlation.

      2. a) What is the meaning of a correlation coefficient of
            i) −1?
            ii) 0?
            iii) 0.5?
         b) Can the correlation coefficient have a value greater than 1?
            Why or why not?

      3. A mathematics class finds a correlation coefficient of 0.25 for the students’
         midterm marks and their driver’s test scores and a coefficient of −0.72 for
         their weight-height ratios and times in a 1-km run. Which of these two
         correlations is stronger? Explain your answer.


Practise
 A
 1. Classify the type of linear correlation that you would expect with the
      following pairs of variables.
      a) hours of study, examination score
      b) speed in excess of the speed limit, amount charged on a traffic fine
      c) hours of television watched per week, final mark in calculus
      d) a person’s height, sum of the digits in the person’s telephone number
      e) a person’s height, the person’s strength

 2. Identify the independent variable and the dependent variable in a
      correlational study of
      a) heart disease and cholesterol level
      b) hours of basketball practice and free-throw success rate
      c) amount of fertilizer used and height of plant
      d) income and level of education
      e) running speed and pulse rate

Apply, Solve, Communicate
 B
 3. For a week prior to their final physics examination, a group of friends
      collect data to see whether time spent studying or time spent watching
      TV had a stronger correlation with their marks on the examination.

       Hours Studied   Hours Watching TV   Examination Score
            10                 8                  72
            11                 7                  67
            15                 4                  81
            14                 3                  93
             8                 9                  54
             5                10                  66

      a) Create a scatter plot of hours studied versus examination score.
         Classify the linear correlation.
      b) Create a similar scatter plot for the hours spent watching TV.
      c) Which independent variable has a stronger correlation with the
         examination scores? Explain.


      d) Calculate the correlation coefficient for hours studied versus
         examination score and for hours watching TV versus examination
         score. Do these answers support your answer to c)? Explain.

 4. Application Refer to the tables in the investigation on page 159.
      a) Determine the correlation coefficient and classify the linear
         correlation for the data for each training method.
      b) Suppose that you interchanged the dependent and independent
         variables, so that the test scores appear on the horizontal axis of a
         scatter plot and the hours of training appear on the vertical axis.
         Predict the effect this change will have on the scatter plot and the
         correlation coefficient for each set of data.
      c) Test your predictions by plotting the data and calculating the
         correlation coefficients with the variables reversed. Explain any
         differences between your results and your predictions in part b).

 6. Application Six classmates compared their arm spans and their scores on
      a recent mathematics test as shown in the following table.

       Arm Span (m)      Score
            1.5            82
            1.4            71
            1.7            75
            1.6            66
            1.6            90
            1.8            73

      a) Illustrate these data with a scatter plot.
      b) Determine the correlation coefficient and classify the linear
         correlation.
      c) What can the students conclude from their data?

 5. c) Does the computed r-value agree with the classification you made in
         part a)? Explain why or why not.
      d) Identify any outliers in the data.
      e) Suggest possible reasons for any outliers identified in part d).

5. A company studied whether there was a                        7. a) Use data in the table on page 157 to
                                                              pte
  relationship between its employees’ years of           ha            create a scatter plot that compares the size
                                                     C


                                                                r




  service and number of days absent. The data                          of graduating classes in Gina’s program to
                                                                m
                                                     P




                                                     r
                                                         oble
  for eight randomly selected employees are                            the number of graduates who found jobs.
  shown below.                                                      b) Classify the linear correlation.
                      Years of       Days Absent                    c) Determine the linear correlation
    Employee          Service         Last Year
                                                                       coefficient.
    Jim                   5                 2
    Leah                  2                 6                   8. a) Search sources such as E-STAT,
    Efraim                7                 3                          CANSIM II, the Internet, newspapers,
    Dawn                  6                 3                          and magazines for pairs of variables that
    Chris                 4                 4                          exhibit
    Cheyenne              8                 0                          i)   a strong positive linear correlation
    Karrie                1                 2                          ii) a strong negative linear correlation
    Luke                 10                 1
                                                                       iii) a weak or zero linear correlation
   a) Create a scatter plot for these data and                      b) For each pair of variables in part a),
      classify the linear correlation.                                 identify the independent variable and
   b) Calculate the correlation coefficient.                            the dependent variable.


 9. Find a set of data for two variables known to have a perfect positive
    linear correlation. Use these data to demonstrate that the correlation
    coefficient for such variables is 1. Alternatively, find a set of data
    with a perfect negative correlation and show that the correlation
    coefficient is −1.

10. Communication
    a) Would you expect to see a correlation between the temperature at an
       outdoor track and the number of people using the track? Why or why
       not?
    b) Sketch a typical scatter plot of this type of data.
    c) Explain the key features of your scatter plot.

11. Inquiry/Problem Solving Refer to data tables in the investigation on
    page 159.
    a) How could the Rogers Training Company graph the data so that their
       training method looks particularly good?
    b) How could Laing Limited present the same data in a way that favours
       their training system?
    c) How could a mathematically knowledgeable consumer detect the
       distortions in how the two companies present the data?

 C
12. Inquiry/Problem Solving
    a) Prove that interchanging the independent and dependent variables
       does not change the correlation coefficient for any set of data.
    b) Illustrate your proof with calculations using a set of data selected
       from one of the examples or exercise questions in this section.

13. a) Search sources such as newspapers, magazines, and the Internet for a
       set of two-variable data with
       i)   a moderate positive linear correlation
       ii)  a moderate negative correlation
       iii) a correlation in which |r| > 0.9
    b) Outline any conclusions that you can make from each set of data. Are
       there any assumptions inherent in these conclusions? Explain.
    c) Pose at least two questions that could form the basis for further
       research.

14. a) Sketch scatter plots of three different patterns of data that you
       think would have zero linear correlation.
    b) Explain why r would equal zero for each of these patterns.
    c) Use Fathom™ or a spreadsheet to create a scatter plot that looks
       like one of your patterns and calculate the correlation coefficient.
       Adjust the data points to get r as close to zero as you can.




3.2         Linear Regression

  Regression is an analytic technique for
  determining the relationship between a dependent
  variable and an independent variable. When the
  two variables have a linear correlation, you can
  develop a simple mathematical model of the
  relationship between the two variables by finding
  a line of best fit. You can then use the equation
  for this line to make predictions by interpolation
  (estimating between data points) and
  extrapolation (estimating beyond the range of
  the data).


      I N V E S T I G AT E & I N Q U I R E : Modelling a Linear Relationship

      A university would like to construct a mathematical model to predict
      first-year marks for incoming students based on their achievement in grade 12.
      A comparison of these marks for a random sample of first-year students is
      shown below.
      Grade 12 Average      85     90     76     78     88       84   76     96      86      85
      First-Year Average    74     83     68     70     75       72   64     91      78      86

       1. a) Construct a scatter plot for these data. Which variable should be
             placed on the vertical axis? Explain.
          b) Classify the linear correlation for these data, based on the scatter
             plot.
       2. a) Estimate and draw a line of best fit for the data.
          b) Measure the slope and y-intercept for this line, and write an equation
             for it in the form y = mx + b.
       3. Use this linear model to predict
          a) the first-year average for a student who had an 82 average in
             grade 12
          b) the grade-12 average for a student with a first-year average of 60
       4. a) Use software or the linear regression instruction of a graphing
             calculator to find the slope and y-intercept for the line of best fit.
             (Note that most graphing calculators use a instead of m to represent
             slope.)
          b) Are this slope and y-intercept close to the ones you measured in
             question 2? Why or why not?


         c) Estimate how much the new values for slope and y-intercept will change
            your predictions in question 3. Check your estimate by recalculating your
            predictions using the new values and explain any discrepancies.
      5. List the factors that could affect the accuracy of these mathematical models.
        Which factor do you think is most critical? How could you test how much
        effect this factor could have?



It is fairly easy to “eyeball” a good estimate of the line of best fit on a scatter
plot when the linear correlation is strong. However, an analytic method using a
least-squares fit gives more accurate results, especially for weak correlations.

Consider the line of best fit in the following scatter plot. A dashed blue line
shows the residual or vertical deviation of each data point from the line of best
fit. The residual is the difference between the values of y at the data point and
at the point that lies on the line of best fit and has the same x-coordinate as the
data point. Notice that the residuals are positive for points above the line and
negative for points below the line. The boxes show the squares of the residuals.
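[Figure: scatter plot of y versus x with a line of best fit. Dashed lines mark the residuals, and boxes show the squares of the residuals.]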

For the line of best fit in the least-squares method,
• the sum of the residuals is zero (the positive and negative residuals cancel out)
• the sum of the squares of the residuals has the least possible value
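
These two properties can be checked numerically. The sketch below is a minimal illustration in Python, using a small made-up data set (not from the text) whose least-squares line works out to y = 1.9x. The function names `residuals` and `sum_of_squares` are illustrative choices.

```python
# Illustrative data (not from the text); its least-squares line is y = 1.9x + 0.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]


def residuals(a, b, xs, ys):
    """Vertical deviation of each data point from the line y = ax + b."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


def sum_of_squares(a, b, xs, ys):
    """Sum of the squared residuals for the line y = ax + b."""
    return sum(r ** 2 for r in residuals(a, b, xs, ys))


best = residuals(1.9, 0.0, xs, ys)
print([round(r, 2) for r in best])   # positive above the line, negative below
print(round(sum(best), 10))          # the residuals cancel out

# Nudging the slope or the intercept away from the least-squares values
# makes the sum of the squared residuals larger.
print(round(sum_of_squares(1.9, 0.0, xs, ys), 4))
print(round(sum_of_squares(2.0, 0.0, xs, ys), 4))
print(round(sum_of_squares(1.9, 0.2, xs, ys), 4))
```

Trying other nearby slopes and intercepts should give the same outcome: the residual sum of squares never drops below its value for the least-squares line.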

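Although the algebra is daunting, it can be shown that this line has the equation

$$y = ax + b, \quad \text{where } a = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} \text{ and } b = \bar{y} - a\bar{x}$$

Recall from Chapter 2 that $\bar{x}$ is the mean of x and $\bar{y}$ is the mean of y. Many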
statistics texts use an equation with the form y = a + bx, so you may sometimes
see the equations for a and b reversed.
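
The formulas for a and b translate almost directly into code. The following Python sketch is one possible implementation; the function name `least_squares_line` and the sample data are illustrative assumptions, not part of the text.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = ax + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope: a = [n(sum of xy) - (sum of x)(sum of y)] / [n(sum of x^2) - (sum of x)^2]
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: b = (mean of y) - a * (mean of x)
    b = sum_y / n - a * (sum_x / n)
    return a, b


# Illustrative data, not taken from the text.
a, b = least_squares_line([1, 2, 3, 4], [2, 4, 5, 8])
print(round(a, 4), round(b, 4))
```

Note that the denominator is zero when every x-value is the same, so this sketch assumes the data contain at least two different x-values.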




Example 1 Applying the Least-Squares Formula

This table shows data for the full-time employees of a small company.

     Age (years)     Annual Income ($000)
         33                  33
         25                  31
         19                  18
         44                  52
         50                  56
         54                  60
         38                  44
         29                  35

a) Use a scatter plot to classify the correlation between age and income.
b) Find the equation of the line of best fit analytically.
c) Predict the income for a new employee who is 21 and an employee retiring
   at age 65.

Solution

a) The scatter plot suggests a strong positive linear
     correlation between age and income level.

     [Figure: scatter plot of Income (vertical axis) versus Age (horizontal axis)]

b) To determine the equation of the line of best fit, organize the data into
     a table and compute the sums required for the formula.
         Age, x       Income, y          x²             xy
            33            33            1089           1089
            25            31             625            775
            19            18             361            342
            44            52            1936           2288
            50            56            2500           2800
            54            60            2916           3240
            38            44            1444           1672
            29            35             841           1015

         ∑x = 292      ∑y = 329      ∑x² = 11 712    ∑xy = 13 221

     Substitute these totals into the formula for a.

     $$
     \begin{aligned}
     a &= \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} \\
       &= \frac{8(13\,221) - (292)(329)}{8(11\,712) - (292)^2} \\
       &= \frac{9700}{8432} \\
       &\approx 1.15
     \end{aligned}
     $$

     To determine b, you also need the means of x and y.

     $$\bar{x} = \frac{\sum x}{n} = \frac{292}{8} = 36.5 \qquad \bar{y} = \frac{\sum y}{n} = \frac{329}{8} = 41.125$$

     $$b = \bar{y} - a\bar{x} = 41.125 - 1.15(36.5) = -0.85$$
      Now, substitute the values of a and b into the equation for the line of best fit.
      y = ax + b
        = 1.15x − 0.85
      Therefore, the equation of the line of best fit is y = 1.15x − 0.85.
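
      The arithmetic in part b) can be verified with a short script. This Python sketch recomputes the four sums and then the slope and intercept from the data in the table; the variable names are illustrative. It also shows the effect of rounding: carrying the unrounded slope through the calculation gives an intercept of about −0.86 rather than −0.85.

```python
ages = [33, 25, 19, 44, 50, 54, 38, 29]        # x, in years
incomes = [33, 31, 18, 52, 56, 60, 44, 35]     # y, in $000

n = len(ages)
sum_x = sum(ages)
sum_y = sum(incomes)
sum_x2 = sum(x * x for x in ages)
sum_xy = sum(x * y for x, y in zip(ages, incomes))
print(sum_x, sum_y, sum_x2, sum_xy)   # 292 329 11712 13221

a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept computed two ways: with the slope rounded to 1.15 (as in the
# worked solution) and with the full-precision slope.
b_from_rounded_a = sum_y / n - round(a, 2) * (sum_x / n)
b_from_full_a = sum_y / n - a * (sum_x / n)
print(round(a, 4), round(b_from_rounded_a, 2), round(b_from_full_a, 2))
```

      The difference between the two intercepts is small here, but it is a reminder to keep extra digits in intermediate steps when accuracy matters.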

c) Use the equation of the line of best fit as a model.

      For a 21-year-old employee,                  For a 65-year-old employee,
      y = ax + b                                   y = ax + b
        = 1.15(21) − 0.85                            = 1.15(65) − 0.85
        = 23.3                                       = 73.9

      Therefore, you would expect the new employee to have an income of about
      $23 300 and the retiring employee to have an income of about $73 900. Note
      that the second estimate is an extrapolation beyond the range of the data, so
      it could be less accurate than the first estimate, which is interpolated between
      two data points.
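
      The distinction between interpolation and extrapolation can be made explicit in code. In this Python sketch, the helper names `predict_income` and `kind_of_estimate` are hypothetical; the model coefficients are the rounded values from part b).

```python
ages = [33, 25, 19, 44, 50, 54, 38, 29]   # ages in the sample, from the table


def predict_income(age, a=1.15, b=-0.85):
    """Predicted income in $000 from the model y = ax + b."""
    return a * age + b


def kind_of_estimate(age, sample_ages):
    """Label an estimate by whether the age lies inside the sampled range."""
    if min(sample_ages) <= age <= max(sample_ages):
        return "interpolation"
    return "extrapolation"


for age in (21, 65):
    print(age, round(predict_income(age), 1), kind_of_estimate(age, ages))
```

      Since the sampled ages run from 19 to 54, the estimate for age 21 is labelled an interpolation and the estimate for age 65 an extrapolation.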


Note that the slope a indicates only how y varies with x on the line of best fit.
The slope does not tell you anything about the strength of the correlation
between the two variables. It is quite possible to have a weak correlation with
a large slope or a strong correlation with a small slope.
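
To see that slope and correlation strength are independent, consider the Python sketch below. Both data sets are made up for illustration, and `slope_and_r` is a hypothetical helper that applies the least-squares slope formula together with the standard computational formula for the correlation coefficient.

```python
from math import sqrt


def slope_and_r(xs, ys):
    """Return the least-squares slope and the correlation coefficient r."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    slope = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    r = (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return slope, r


xs = [1, 2, 3, 4, 5, 6]
tight_but_flat = [1.10, 1.21, 1.29, 1.41, 1.50, 1.59]   # points hug a gentle line
steep_but_scattered = [30, -20, 45, -10, 60, 15]        # points scatter widely

for ys in (tight_but_flat, steep_but_scattered):
    slope, r = slope_and_r(xs, ys)
    print(round(slope, 2), round(r, 2))
```

The first data set has a slope of roughly 0.1 with r very close to 1, while the second has a slope of roughly 3 with r below 0.2.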


Example 2 Linear Regression Using Technology

Researchers monitoring the numbers of wolves and rabbits in a wildlife reserve
think that the wolf population depends on the rabbit population since wolves
prey on rabbits. Over the years, the researchers collected the following data.
Year                   1994     1995        1996   1997   1998   1999   2000     2001
Rabbit Population        61       72         78     76     65     54     39       43
Wolf Population          26       33         42     49     37     30     24       19

a)    Determine the line of best fit and the correlation coefficient for these data.
b)    Graph the data and the line of best fit. Do these data support the
      researchers’ theory?



Solution 1   Using a Graphing Calculator

a) You can use the calculator’s linear regression instruction to find both the line
   of best fit and the correlation coefficient. Since the theory is that the wolf
   population depends on the rabbit population, the rabbit population is the
   independent variable and the wolf population is the dependent variable.

   Use the STAT EDIT menu to enter the rabbit data into list L1 and the wolf
   data into L2. Set DiagnosticOn, and then use the STAT CALC menu to select
   LinReg(ax+b).




   The equation of the line of best fit is y = 0.58x − 3.1 and the correlation
   coefficient is 0.87.

b) Store the equation for the line of best fit as a function, Y1. Then, use the
   STAT PLOT menu to set up the scatter plot. By displaying both Y1 and the
   scatter plot, you can see how closely the data points are distributed around
   the line of best fit.




   The correlation coefficient and the scatter plot show a strong positive linear
   correlation between the variables. This correlation supports the researchers’
   theory, but does not prove that changes in the rabbit population are the
   cause of the changes in the wolf population.
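
   If no graphing calculator is at hand, the same values can be checked with a few lines of code. This Python sketch is an alternative to the calculator steps, not part of them; it applies the least-squares formulas and the standard computational formula for r to the rabbit and wolf data.

```python
from math import sqrt

rabbits = [61, 72, 78, 76, 65, 54, 39, 43]   # independent variable, x
wolves = [26, 33, 42, 49, 37, 30, 24, 19]    # dependent variable, y

n = len(rabbits)
sx, sy = sum(rabbits), sum(wolves)
sxy = sum(x * y for x, y in zip(rabbits, wolves))
sx2 = sum(x * x for x in rabbits)
sy2 = sum(y * y for y in wolves)

a = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)          # slope
b = sy / n - a * sx / n                                # y-intercept
r = (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

print(round(a, 2), round(b, 1), round(r, 2))   # 0.58 -3.1 0.87
```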


Solution 2   Using a Spreadsheet

Set up a table with the data for the rabbit and wolf populations. You can
calculate the correlation coefficient with the CORREL function. Use the Chart
feature to create a scatter plot.

In Corel® Quattro® Pro, you can find the equation of the line of best fit by
selecting Tools/Numeric Tools/Regression. Enter the cell ranges for the data,
and the program will display regression calculations including the constant (b),
the x-coefficient (or slope, a), and r².


In Microsoft® Excel, you can find the equation of the line of best fit by selecting
Chart/Add Trendline. Check that the default setting is Linear. Select the straight
line that appears on your chart, then click Format/Selected Trendline/Options.
Check the Display equation on chart box. You can also display r².
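
Spreadsheets often report r² rather than r. For a straight-line fit, r has the same magnitude as the square root of r² and the same sign as the slope. The Python sketch below illustrates this with the rabbit and wolf data; it computes r² as the fraction of the variation in y accounted for by the line, which is a standard interpretation assumed here rather than taken from this section. The slope is calculated from deviations about the means, which is algebraically equivalent to the least-squares formula.

```python
from math import copysign, sqrt

rabbits = [61, 72, 78, 76, 65, 54, 39, 43]
wolves = [26, 33, 42, 49, 37, 30, 24, 19]

n = len(rabbits)
mean_x = sum(rabbits) / n
mean_y = sum(wolves) / n
sxx = sum((x - mean_x) ** 2 for x in rabbits)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rabbits, wolves))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# r-squared: 1 minus (sum of squared residuals / total variation in y)
ss_residual = sum((y - (slope * x + intercept)) ** 2
                  for x, y in zip(rabbits, wolves))
ss_total = sum((y - mean_y) ** 2 for y in wolves)
r_squared = 1 - ss_residual / ss_total

# Recover r: magnitude from the square root, sign from the slope
r = copysign(sqrt(r_squared), slope)
print(round(r_squared, 2), round(r, 2))
```

Taking the sign from the slope matters only when the slope is negative, since a square root alone would always give a positive value.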




Solution 3    Using Fathom™

Drag a new case table to the workspace, create attributes for Year, Rabbits, and
Wolves, and enter the data. Drag a new graph to the workspace, then drag the
Rabbits attribute to the x-axis and the Wolves attribute to the y-axis. From the
Graph menu, select Least Squares Line. Fathom™ will display r² and the
equation for the line of best fit. To calculate the correlation coefficient directly,
select Inspect Collection, click the Measures tab, then create a new measure by
selecting Functions/Statistical/Two Attributes/correlation and entering Rabbits
and Wolves as the attributes.

Project Prep

When analysing two-variable data for your statistics project, you may wish to
develop a linear model, particularly if a strong linear correlation is evident.




In Example 2, the sample size is small, so you should be cautious about
making generalizations from it. Small samples have a greater chance of not
being representative of the whole population. Also, outliers can seriously
affect the results of a regression on a small sample.


Example 3 The Effect of Outliers

To evaluate the performance of one of its instructors, a driving school
tabulates the number of hours of instruction and the driving-test scores
for the instructor’s students.
Instructional Hours   10     15      21      6      18     20      12
Student’s Score       78     85      96     75      84     45      82
a)   What assumption is the management of the driving school making?
     Is this assumption reasonable?
b)   Analyse these data to determine whether they suggest that the instructor
     is an effective teacher.
c)   Comment on any data that seem unusual.
d)   Determine the effect of any outliers on your analysis.

Solution
a) The management of the driving school is assuming that the correlation
     between instructional hours and test scores is an indication of the
     instructor’s teaching skills. Such a relationship could be difficult to prove
     definitively. However, the assumption would be reasonable if the driving
     school has found that some instructors have consistently strong
     correlations between the time spent with their students and the students’
     test scores while other instructors have consistently weaker correlations.

b) The number of hours of instruction is the independent variable. You
     could analyse the data using any of the methods in the two previous
     examples. For simplicity, a spreadsheet solution is shown here.

     Except for an obvious outlier at (20, 45), the scatter plot below indicates
     a strong positive linear correlation. At first glance, it appears that the
     number of instructional hours is positively correlated to the students’ test
     scores. However, the linear regression analysis yields a line of best fit with
     the equation y = −0.13x + 80 and a correlation coefficient of −0.05.

     These results indicate that there is virtually a zero linear correlation, and
     the line of best fit even has a negative slope! The outlier has a dramatic
     impact on the regression results because it is distant from the other data
     points and the sample size is quite small. Although the scatter plot looked


      favourable, the regression analysis suggests that the instructor’s lessons had
      no positive effect on the students’ test results.




c) The fact that the outlier is substantially below all the other data points
      suggests that some special circumstance may have caused an abnormal result.
      For instance, there might have been an illness or emotional upset that
      affected this one student’s performance on the driving test. In that case, it
      would be reasonable to exclude this data point when evaluating the driving
      instructor.

d) Remove the outlier from your data table and repeat your analysis.




      Notice that the line of best fit is now much closer to the data points and has
      a positive slope. The correlation coefficient, r, is 0.93, indicating a strong
      positive linear correlation between the number of instructional hours and
      the driver’s test scores. This result suggests that the instructor may be an
      effective teacher after all. It is quite possible that the original analysis was
      not a fair evaluation. However, to do a proper evaluation, you would need
      a larger set of data, more information about the outlier, or, ideally, both.
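
      The effect of the outlier can be reproduced outside a spreadsheet. The Python sketch below fits the driving-school data twice, once with all seven students and once with the point (20, 45) removed. The helper name `linear_fit` is an illustrative assumption, and the correlation coefficient uses the standard computational formula.

```python
from math import sqrt


def linear_fit(points):
    """Return (slope, intercept, r) for a list of (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sx2 = sum(x * x for x, _ in points)
    sy2 = sum(y * y for _, y in points)
    slope = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    intercept = sy / n - slope * sx / n
    r = (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return slope, intercept, r


# (instructional hours, student's score) from the table in Example 3
data = [(10, 78), (15, 85), (21, 96), (6, 75), (18, 84), (20, 45), (12, 82)]
without_outlier = [p for p in data if p != (20, 45)]

for label, points in (("all data", data), ("outlier removed", without_outlier)):
    slope, intercept, r = linear_fit(points)
    print(label, round(slope, 2), round(intercept), round(r, 2))
```

      With all the data, the results should agree with the values quoted in part b) (slope about −0.13, intercept about 80, r about −0.05); with the outlier removed, r should come out near 0.93 with a positive slope.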




As Example 3 demonstrates, outliers can skew a regression analysis, but they
could also simply indicate that the data really do have large variations.
A comprehensive analysis of a set of data should look for outliers, examine
their possible causes and their effect on the analysis, and discuss whether
they should be excluded from the calculations. As you observed in Chapter 2,
outliers have less effect on larger samples.

Project Prep

If your statistics project involves a linear relationship that contains
outliers, you will need to consider carefully their impact on your results,
and how you will deal with them.




           www.mcgrawhill.ca/links/MDM12

       Visit the above web site and follow the links to
        learn more about linear regression. Describe
           an application of linear regression that
                        interests you.


   Key Concepts

   • Linear regression provides a means for analytically determining a line of best
     fit. In the least-squares method, the line of best fit is the line which minimizes
     the sum of the squares of the residuals while having the sum of the residuals
     equal zero.

   • You can use the equation of the line of best fit to predict the value of one of
     the two variables given the value of the other variable.

   • The correlation coefficient is a measure of how well a regression line fits a set
     of data.

   • Outliers and small sample sizes can reduce the accuracy of a linear model.


   Communicate Your Understanding

    1. What does the correlation coefficient reveal about the line of best fit
       generated by a linear regression?

    2. Will the correlation coefficient always be negative when the slope of the
       line of best fit is negative? Explain your reasoning.

    3. Describe the problem that outliers present for a regression analysis and
       outline what you could do to resolve this problem.




Practise

 A
 1. Identify any outliers in the following sets of data and explain your
    choices.
    a)  X     25     34     43     55     92    105     16
        Y     30     41     52     66     18    120     21
    b)  X      5      7      6      6      4      8
        Y    304     99    198    205    106      9

 2. a) Perform a linear regression analysis to generate the line of best
       fit for each set of data in question 1.
    b) Repeat the linear regressions in part a), leaving out any outliers.
    c) Compare the lines of best fit in parts a) and b).

Apply, Solve, Communicate

 B
 3. Use the formula for the method of least squares to verify the slope and
    intercept values you found for the data in the investigation on
    page 171. Account for any discrepancies.

 4. Use software or a graphing calculator to verify the regression results
    in Example 1.

 5. Application The following table lists the heights and masses for a
    group of fire-department trainees.

    Height (cm)     Mass (kg)
        177             91
        185             88
        173             82
        169             79
        188             87
        182             85
        175             79

    a) Create a scatter plot and classify the linear correlation.
    b) Apply the method of least squares to generate the equation of the
       line of best fit.
    c) Predict the mass of a trainee whose height is 165 cm.
    d) Predict the height of a 79-kg trainee.
    e) Explain any discrepancy between your answer to part d) and the
       actual height of the 79-kg trainee in the sample group.

 6. A random survey of a small group of high-school students collected
    information on the students’ ages and the number of books they had read
    in the past year.

    Age (years)     Books Read
        16               5
        15               3
        18               8
        17               6
        16               4
        15               4
        14               5
        17              15

    a) Create a scatter plot for this data. Classify the linear correlation.
    b) Determine the correlation coefficient and the equation of the line
       of best fit.
    c) Identify the outlier.
    d) Repeat part b) with the outlier excluded.
    e) Does removing the outlier improve the linear model? Explain.
    f) Suggest other ways to improve the model.
    g) Do your results suggest that the number of books a student reads
       depends on the student’s age? Explain.




7. Application Market research has provided the following data on the
  monthly sales of a licensed T-shirt for a popular rock band.
      Price ($)       Monthly Sales
          10             2500
          12             2200
          15             1600
          18             1200
          20              800
          24              250
  a) Create a scatter plot for these data.
  b) Use linear regression to model these data.
  c) Predict the sales if the shirts are priced at $19.
  d) The vendor has 1500 shirts in stock and the band is going to finish
     its concert tour in a month. What is the maximum price the vendor
     can charge and still avoid having shirts left over when the band
     stops touring?

8. Communication MDM Entertainment has produced a series of TV specials
  on the lives of great mathematicians. The executive producer wants to
  know if there is a linear correlation between production costs and
  revenue from the sales of broadcast rights. The costs and gross sales
  revenue for productions in 2001 and 2002 were as follows (amounts in
  millions of dollars).
               2001                         2002
    Cost ($M)     Sales ($M)       Cost ($M)   Sales ($M)
       5.5            15.4            2.7          5.2
       4.1            12.1            1.9          1.0
       1.8             6.9            3.4          3.4
       3.2             9.4            2.1          1.9
       4.2             1.5            1.4          1.5
  a) Create a scatter plot using the data for the productions in 2001.
     Do there appear to be any outliers? Explain.
  b) Determine the correlation coefficient and the equation of the line
     of best fit.
  c) Repeat the linear regression analysis with any outliers removed.
  d) Repeat parts a) and b) using the data for the productions in 2002.
  e) Repeat parts a) and b) using the combined data for productions in
     both 2001 and 2002. Do there still appear to be any outliers?
  f) Which of the four linear equations do you think is the best model
     for the relationship between production costs and revenue? Explain
     your choice.
  g) Explain why the executive producer might choose to use the equation
     from part d) to predict the income from MDM’s 2003 productions.

9. Chapter Problem At Gina’s university, there are 250 business students
  who expect to graduate in 2006.
  a) Model the relationship between the total number of graduates and
     the number hired by performing a linear regression on the data in
     the table on page 157. Determine the equation of the line of best
     fit and the correlation coefficient.
  b) Use this linear model to predict how many graduates will be hired
     in 2006.
  c) Identify any outliers in this scatter plot and suggest possible
     reasons for an outlier. Would any of these reasons justify
     excluding the outlier from the regression calculations?
  d) Repeat part a) with the outlier removed.
  e) Compare the results in parts a) and d). What assumptions do you
     have to make?


10. Communication Refer to Example 2, which describes population data for
      wolves and rabbits in a wildlife reserve. An alternate theory has it
      that the rabbit population depends on the wolf population since the
      wolves prey on the rabbits.
      a) Create a scatter plot of rabbit population versus wolf population
         and classify the linear correlation. How are your data points
         related to those in Example 2?
      b) Determine the correlation coefficient and the equation of the
         line of best fit. Graph this line on your scatter plot.
      c) Is the equation of the line of best fit the inverse of that found
         in Example 2? Explain.
      d) Plot both populations as a time series. Can you recognize a
         pattern or relationship between the two series? Explain.
      e) Does the time series suggest which population is the dependent
         variable? Explain.

11. The following table lists the mathematics of data management marks
      and grade 12 averages for a small group of students.
        Mathematics of Data             Grade 12
        Management Mark                 Average
                 74                         77
                 81                         87
                 66                         68
                 53                         67
                 92                         85
                 45                         55
                 80                         76
      a) Using Fathom™ or The Geometer’s Sketchpad,
         i)   create a scatter plot for these data
         ii)  add a moveable line to the scatter plot and construct the
              geometric square for the deviation of each data point from
              the moveable line
         iii) generate a dynamic sum of the areas of these squares
         iv)  manoeuvre the moveable line to the position that minimizes
              the sum of the areas of the squares
         v)   record the equation of this line
      b) Determine the equation of the line of best fit for this set of
         data.
      c) Compare the equations you found in parts a) and b). Explain any
         differences or similarities.

12. Application Use E-STAT or other sources to obtain the annual consumer
      price index figures from 1914 to 2000.
      a) Download this information into a spreadsheet or statistical
         software, or enter it into a graphing calculator. (If you use a
         graphing calculator, enter the data from every third year.) Find
         the line of best fit and comment on whether a straight line
         appears to be a good model for the data.
      b) What does the slope of the line of best fit tell you about the
         rate of inflation?
      c) Find the slope of the line of best fit for the data for just the
         last 20 years, and then repeat the calculation using only the
         data for the last 5 years.
      d) What conclusions can you make by comparing the three slopes?
         Explain your reasoning.




ACHIEVEMENT CHECK
  Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

13. The Worldwatch Institute has collected the following data on
     concentrations of carbon dioxide (CO₂) in the atmosphere.

                 Year              CO₂ Level (ppm)
                 1975                    331
                 1976                    332
                 1977                    333.7
                 1978                    335.3
                 1979                    336.7
                 1980                    338.5
                 1981                    339.8
                 1982                    341
                 1983                    342.6
                 1984                    344.3
                 1985                    345.7
                 1986                    347
                 1987                    348.8
                 1988                    351.4
                 1989                    352.7
                 1990                    354
                 1991                    355.5
                 1992                    356.2
                 1993                    357
                 1994                    358.8
                 1995                    360.7

     a) Use technology to produce a scatter plot of these data and
        describe any correlation that exists.
     b) Use a linear regression to find the line of best fit for the
        data. Discuss the reliability of this model.
     c) Use the regression equation to predict the level of atmospheric
        CO₂ that you would expect today.
     d) Research current CO₂ levels. Are the results close to the
        predicted level? What factors could have affected the trend?

 C
14. Suppose that a set of data has a perfect linear correlation except
     for two outliers, one above the line of best fit and the other an
     equal distance below it. The residuals of these two outliers are
     equal in magnitude, but one is positive and the other negative.
     Would you agree that a perfect linear correlation exists because
     the effects of the two residuals cancel out? Support your opinion
     with mathematical reasoning and a diagram.

15. Inquiry/Problem Solving Recall the formulas for the line of best fit
     using the method of least squares that minimizes the squares of
     vertical deviations.
     a) Modify these formulas to produce a line of best fit that
        minimizes the squares of horizontal deviations.
     b) Do you think your modified formulas will produce the same
        equation as the regular least-squares formula?
     c) Use your modified formula to calculate a line of best fit for
        one of the examples in this section. Does your line have the
        same equation as the line of best fit in the example? Is your
        equation the inverse of the equation in the example? Explain
        why or why not.

16. a) Calculate the residuals for all of the data points in Example 3
        on page 177. Make a plot of these residuals versus the
        independent variable, X, and comment on any pattern you see.
     b) Explain how you could use such residual plots to detect
        outliers.


3.3           Non-Linear Regression

  Many relationships between two variables follow patterns that are not linear. For
  example, square-law, exponential, and logarithmic relationships often appear in
  the natural sciences. Non-linear regression is an analytical technique for finding
  a curve of best fit for data from such relationships. The equation for this curve
  can then be used to model the relationship between the two variables.

  As you might expect, the calculations for curves are more complicated than those
  for straight lines. Graphing calculators have built-in regression functions for a
  variety of curves, as do some spreadsheets and statistical programs. Once you
  enter the data and specify the type of curve, these technologies can automatically
  find the best-fit curve of that type. They can also calculate the coefficient of
  determination, r², which is a useful measure of how closely a curve fits the data.

        INVESTIGATE & INQUIRE: Bacterial Growth

      A laboratory technician monitors the growth of a bacterial
      culture by scanning it every hour and estimating the number
      of bacteria. The initial population is unknown.
        Time (h)         0      1       2     3      4      5      6       7
        Population       ?     10      21     43     82    168    320     475


        1. a) Create a scatter plot and classify the linear correlation.
           b) Determine the correlation coefficient and the line of
               best fit.
           c) Add the line of best fit to your scatter plot. Do you think
               this line is a satisfactory model? Explain why or why not.
        2. a) Use software or a graphing calculator to find a curve
               of best fit with a
                i) quadratic regression of the form y = ax² + bx + c
               ii) cubic regression of the form y = ax³ + bx² + cx + d
           b) Graph these curves onto a scatter plot of the data.
           c) Record the equation and the coefficient of
               determination, r², for the curves.
           d) Use the equations to estimate the initial population
               of the bacterial culture. Do these estimates seem
               reasonable? Why or why not?

        See Appendix B for details on using technology for non-linear
        regressions.




3. a) Perform an exponential regression on the data. Graph the curve of best
              fit and record its equation and coefficient of determination.
          b) Use this model to estimate the initial population.
          c) Do you think the exponential equation is a better model for the growth
              of the bacterial culture than the quadratic or cubic equations? Explain
              your reasoning.



Recall that Pearson’s correlation coefficient, r, is a measure of the linearity
of the data, so it can indicate only how closely a straight line fits the data.
However, the coefficient of determination, r², is defined such that it applies
to any type of regression curve.

$$
r^2 = \frac{\text{variation in } y \text{ explained by variation in } x}{\text{total variation in } y}
    = \frac{\sum (y_{est} - \bar{y})^2}{\sum (y - \bar{y})^2}
$$

where ȳ is the mean y value, y_est is the value estimated by the best-fit curve for
a given value of x, and y is the actual observed value for a given value of x.
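[Figure: a curve of best fit on X and Y axes, with the mean ȳ marked on the
Y-axis. For a data point (x, y) and the corresponding point (x, y_est) on the
curve, the figure labels the total deviation, the explained deviation, and the
unexplained deviation.]

The total variation is the sum of the squares of the deviations for all of the
individual data points.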

The coefficient of determination can have values from 0 to 1. If the curve is a
perfect fit, then y_est and y will be identical for each value of x. In this case, the
variation in x accounts for all of the variation in y, so r² = 1. Conversely, if the
curve is a poor fit, the total of (y_est − ȳ)² will be much smaller than the total of
(y − ȳ)², since the variation in x will account for only a small part of the total
variation in y. Therefore, r² will be close to 0. For any given type of regression,
the curve of best fit will be the one that has the highest value for r².
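
The definition of r² above can be turned into a few lines of code. The following is a minimal sketch, not a procedure from this text: the function name `coefficient_of_determination` and the sample values are made up for illustration, and the ratio is meant to be applied to estimates that come from a least-squares curve of best fit.

```python
def coefficient_of_determination(y_observed, y_estimated):
    """Return r^2 = sum((y_est - y_mean)^2) / sum((y - y_mean)^2).

    y_observed  : the actual y values
    y_estimated : the values given by the regression curve at the same x values
    """
    if len(y_observed) != len(y_estimated):
        raise ValueError("both lists must have the same length")
    y_mean = sum(y_observed) / len(y_observed)
    explained = sum((y_est - y_mean) ** 2 for y_est in y_estimated)
    total = sum((y - y_mean) ** 2 for y in y_observed)
    return explained / total


# Hypothetical data, chosen only to show the two extreme cases.
y = [2.0, 4.0, 6.0, 8.0]

# Perfect fit: the estimates equal the observations.
r2_perfect = coefficient_of_determination(y, y)

# Useless model: it predicts the mean for every x, so nothing is explained.
r2_flat = coefficient_of_determination(y, [5.0, 5.0, 5.0, 5.0])

print(r2_perfect, r2_flat)
```

The two cases correspond to the extremes described in the paragraph above: identical estimates and observations at one end, and a model that explains none of the variation in y at the other.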

For graphing calculators and Microsoft® Excel, the procedures for non-linear
regression are almost identical to those for linear regression. At present, Corel®
Quattro® Pro and Fathom™ do not have built-in functions for non-linear
regression.




Exponential Regression
Exponential regressions produce equations with the form y = ab^x or y = ae^(kx),
where e = 2.718 28…, an irrational number commonly used as the base for
exponents and logarithms. These two forms are equivalent, and it is
straightforward to convert from one to the other.
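
The conversion between the two forms can be written out as a short derivation. This is a sketch that relies only on the identity b = e^(ln b); the symbols a, b, and k are those of the two forms above.

```latex
\begin{aligned}
y &= a b^{x} = a \left(e^{\ln b}\right)^{x} = a e^{(\ln b)\,x}
  &&\Rightarrow\quad k = \ln b \\
y &= a e^{kx} = a \left(e^{k}\right)^{x}
  &&\Rightarrow\quad b = e^{k}
\end{aligned}
```

For instance, a base of b = 1.93 corresponds to k = ln 1.93, which is approximately 0.66.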


Example 1 Exponential Regression

Generate an exponential regression for the bacterial culture in the investigation
on page 184. Graph the curve of best fit and determine its equation and the
coefficient of determination.

Solution 1 Using a Graphing Calculator

Use the ClrList command from the STAT EDIT menu to clear lists L1 and L2,
and then enter the data. Set DiagnosticOn so that regression calculations will
display the coefficient of determination. From the STAT CALC menu, select the
non-linear regression function ExpReg. If you do not enter any list names, the
calculator will use L1 and L2 by default.




The equation for the curve of best fit is y = 5.70(1.93)^x, and the coefficient of
determination is r² = 0.995. Store the equation as Y1. Use STAT PLOT to display
a scatter plot of the data along with Y1. From the ZOOM menu, select
9:ZoomStat to adjust the window settings automatically.
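
For readers working without a graphing calculator, the same regression can be sketched in Python. This example assumes that the exponential regression is carried out by fitting a least-squares line to the points (x, ln y), which is a common way to implement it; the function name `exponential_regression` is made up for illustration, and NumPy is assumed to be installed. The r² reported here is the one for the straight-line fit to the transformed data.

```python
import numpy as np

# Data from the investigation; time 0 is omitted because that population is unknown.
time_h = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
population = np.array([10, 21, 43, 82, 168, 320, 475], dtype=float)


def exponential_regression(x, y):
    """Fit y = a * b**x by a least-squares line through the points (x, ln y)."""
    log_y = np.log(y)
    slope, intercept = np.polyfit(x, log_y, 1)  # ln y = slope * x + intercept
    a = np.exp(intercept)
    b = np.exp(slope)
    r = np.corrcoef(x, log_y)[0, 1]  # linear correlation of the transformed data
    return a, b, r ** 2


a, b, r2 = exponential_regression(time_h, population)
print(f"y = {a:.2f}({b:.2f})^x, r^2 = {r2:.3f}")
```

The printed values should be close to the equation and coefficient of determination shown by the calculator, although small differences in the last digit are possible.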




Solution 2 Using a Spreadsheet

Enter the data into two columns. Next, highlight these columns and use the
Chart feature to create an x-y scatter plot.




Select Chart/Add Trendline and then choose Exponential regression. Then, select
the curve that appears on your chart, and click Format/Selected Trendline/Options.
Check the option boxes to display the equation and r².




The equation of the best-fit curve is y = 5.7e^(0.66x) and the coefficient of
determination is r² = 0.995. This equation appears different from the one found
with the graphing calculator. In fact, the two forms are equivalent, since
e^0.66 ≈ 1.93.


Power and Polynomial Regression
In power regressions, the curve of best fit has an equation with the form y = ax^b.

Example 2 Power Regression

For a physics project, a group of students videotape a ball dropped from the top
of a 4-m high ladder, which they have marked every 10 cm. During playback,
they stop the videotape every tenth of a second and compile the following table
for the distance the ball travelled.
Time (s)        0.1    0.2    0.3     0.4     0.5   0.6   0.7    0.8    0.9     1.0
Distance (m)   0.05    0.2    0.4     0.8     1.2   1.7   2.4    3.1    3.9     4.9

a)   Does a linear model fit the data well?
b)   Use a power regression to find a curve of best fit for the data. Does the
     power-regression curve fit the data more closely than the linear model does?
c)   Use the equation for the regression curve to predict
     i) how long the ball would take to fall 10 m
     ii) how far the ball would fall in 5 s




Solution 1      Using a Graphing Calculator

a) Although the linear correlation coefficient is 0.97, a scatter plot of the data
      shows a definite curved pattern. Since the y-intercept of the line of best
      fit is b = −1.09, the linear model predicts an initial position of about
      −1.1 m and clearly does not fit the first part of the data well. Also, the
      pattern in the scatter plot suggests the linear model could give
      inaccurate predictions for times beyond 1 s.




b) From the STAT CALC menu, select the non-linear regression function PwrReg
      and then follow the same steps as in Example 1.




      The equation for the curve of best fit is y = 4.83x². The coefficient of
      determination and a graph on the scatter plot show that the quadratic
      curve is almost a perfect fit to the data.

c) Substitute the known values into the equation for the quadratic curve of
   best fit:

$$
\text{i)}\quad
\begin{aligned}
10 &= 4.83x^2 \\
x^2 &= \frac{10}{4.83} \\
x &= \sqrt{\frac{10}{4.83}} \\
  &\approx 1.4
\end{aligned}
\qquad\qquad
\text{ii)}\quad
\begin{aligned}
y &= 4.83(5)^2 \\
  &= 4.83(25) \\
  &\approx 121
\end{aligned}
$$

   The quadratic model predicts that
   i) the ball would take approximately 1.4 s to fall 10 m
   ii) the ball would fall 121 m in 5 s
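
The power regression and the two predictions can also be sketched in Python. This example assumes the regression is done by fitting a least-squares line to the points (ln x, ln y), a common implementation that may not match a particular calculator digit for digit; the function name `power_regression` is made up for illustration, and NumPy is assumed to be installed. The predictions use the rounded model y = 4.83x² from the solution.

```python
import numpy as np

# Data from the table in Example 2.
time_s = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
distance_m = np.array([0.05, 0.2, 0.4, 0.8, 1.2, 1.7, 2.4, 3.1, 3.9, 4.9])


def power_regression(x, y):
    """Fit y = a * x**b by a least-squares line through the points (ln x, ln y)."""
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(intercept), slope  # a, b


a_fit, b_fit = power_regression(time_s, distance_m)
print(f"fitted model: y = {a_fit:.2f} x^{b_fit:.2f}")

# Predictions from the rounded model used in the solution, y = 4.83 x^2.
a_model, b_model = 4.83, 2.0
time_for_10_m = (10 / a_model) ** (1 / b_model)  # solve 10 = a * x**b for x
distance_in_5_s = a_model * 5 ** b_model
print(f"time to fall 10 m: {time_for_10_m:.1f} s")
print(f"distance fallen in 5 s: {distance_in_5_s:.0f} m")
```

The fitted exponent should come out close to 2 and the fitted coefficient close to the value in the solution, which is why the curve is described as quadratic.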



Solution 2      Using a Spreadsheet

a) As in Solution 1, the scatter plot shows that a curve might be a
      better model.




b) Use the Chart feature as in Example 1, but select Power when adding the
     trend line.




     The equation for the curve of best fit is y = 4.83x². The graph and the value
     for r² show that the quadratic curve is almost a perfect fit to the data.

c) Use the equation for the curve of best fit to enter formulas for the two
     values you want to predict, as shown in cells A13 and B14 in the screen
     above.


Example 3 Polynomial Regression

Suppose that the laboratory technician takes further measurements of the
bacterial culture in Example 1.
Time (h)           8    9    10     11     12     13     14
Population     630     775   830   980    1105   1215   1410

a)   Discuss the effectiveness of the exponential model from Example 1 for
     the new data.
b)   Find a new exponential curve of best fit.
c)   Find a better curve of best fit. Comment on the effectiveness of the new
     model.




Solution

a) If you add the new data to the scatter plot, you will see that the
      exponential curve determined earlier, y = 5.7(1.9)^x, is no longer a
      good fit.

b) If you perform a new exponential regression on all 14 data points, you
      obtain the equation y = 18(1.4)^x with a coefficient of determination of
      r² = 0.88. From the graph, you can see that this curve is not a
      particularly good fit either.

      Because of the wide range of non-linear regression options, you can
      insist on a fairly high value of r² when searching for a curve of best fit
      to model the data.

c) If you perform a quadratic regression, you get a much better fit with the
      equation y = 4.0x² + 55x − 122 and a coefficient of determination of
      r² = 0.986.

      This quadratic model will probably serve well for interpolating between
      most of the data shown, but may not be accurate for times before 3 h
      and after 14 h. At some point between 2 h and 3 h, the curve intersects
      the x-axis, indicating a negative population prior to this time. Clearly
      the quadratic model is not accurate in this range.

      Similarly, if you zoom out, you will notice a problem beyond 14 h. The
      rate of change of the quadratic curve continues to increase after 14 h,
      but the trend of the data does not suggest such an increase. In fact,
      from 7 h to 14 h the trend appears quite linear.
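
The quadratic regression in part c) can be reproduced, at least approximately, with a least-squares polynomial fit. The sketch below assumes NumPy is installed and uses `numpy.polyfit`; the variable names are made up for illustration, and the coefficients it prints may differ slightly from the rounded values quoted in the solution.

```python
import numpy as np

# All 14 measurements: Example 1 (1 h to 7 h) plus the new data (8 h to 14 h).
time_h = np.arange(1, 15, dtype=float)
population = np.array(
    [10, 21, 43, 82, 168, 320, 475, 630, 775, 830, 980, 1105, 1215, 1410],
    dtype=float,
)

# Least-squares quadratic y = a x^2 + b x + c; coefficients come highest power first.
coeffs = np.polyfit(time_h, population, 2)
fitted = np.polyval(coeffs, time_h)

# Coefficient of determination, using the definition given earlier in this section.
mean_pop = population.mean()
r2 = np.sum((fitted - mean_pop) ** 2) / np.sum((population - mean_pop) ** 2)

# Where the curve crosses the x-axis; before this time the model gives a
# negative population, which is the limitation discussed in the solution.
x_intercepts = np.roots(coeffs)

print("coefficients a, b, c:", np.round(coeffs, 1))
print("r^2:", round(float(r2), 3))
print("x-intercepts:", np.round(x_intercepts, 2))
```

Printing the x-intercepts makes it easy to see the range of times for which the quadratic model gives a negative population.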


It is important to recognize the limitations of regression curves. One
interesting property of polynomial regressions is that for a set of n data
points, a polynomial function of degree n − 1 can be produced which
perfectly fits the data, that is, with r² = 1.
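
This property can be checked numerically. The sketch below is an illustration only: it takes the first four measurements of the bacterial culture, fits a cubic (degree n − 1 = 3) with `numpy.polyfit`, and then extrapolates to 7 h. NumPy is assumed to be installed and the variable names are made up.

```python
import numpy as np

# First four measurements of the bacterial culture (1 h to 4 h).
x = np.array([1, 2, 3, 4], dtype=float)
y = np.array([10, 21, 43, 82], dtype=float)

# With n = 4 points, a polynomial of degree n - 1 = 3 passes through all of them.
coeffs = np.polyfit(x, y, len(x) - 1)
fitted = np.polyval(coeffs, x)

# Largest gap between the curve and the data; it should be zero up to rounding.
max_residual = np.max(np.abs(fitted - y))

# Extrapolate to 7 h, where the measured population was 475.
predicted_at_7 = np.polyval(coeffs, 7.0)

print("largest residual:", max_residual)
print("prediction at 7 h:", round(float(predicted_at_7), 1), "(measured: 475)")
```

The cubic reproduces the four points it was built from, yet its prediction at 7 h is about 361, well below the measured 475. A perfect fit to the data in hand says little about values outside that range.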

For example, you can determine the equation for a line (a first-degree
polynomial) with two points and the equation for a quadratic (a second-degree
polynomial) with three points. However, these polynomials are not always the
best models for the data. Often, these curves can give inaccurate predictions
when extrapolated.

Sometimes, you can find that several different types of curves fit closely
to a set of data. Extrapolating to an initial or final state may help
determine which model is the most suitable. Also, the mathematical
model should show a logical relationship between the variables.

Project Prep
Non-linear models may be useful when you are analysing two-variable data in
your statistics project.
Key Concepts

       • Some relationships between two variables can be modelled using non-linear
         regressions such as quadratic, cubic, power, polynomial, and exponential
         curves.

       • The coefficient of determination, r², is a measure of how well a regression
         curve fits a set of data.

       • Sometimes more than one type of regression curve can provide a good fit for
         data. To be an effective model, however, the curve must be useful for
         extrapolating beyond the data.

       Communicate Your Understanding

        1. A data set for two variables has a linear correlation coefficient of 0.23. Does
          this value preclude a strong correlation between the variables? Explain why
          or why not.

        2. A best-fit curve for a set of data has a coefficient of determination of r² = 0.76.
          Describe some techniques you can use to improve the model.



Practise

 A
 1. Match each of the following coefficients of determination with one of
    the diagrams below.
    a) 0        b) 0.5        c) 0.9        d) 1

    [Diagrams i) to iv) are not reproduced here.]

 2. For each set of data use software or a graphing calculator to find the
    equation and coefficient of determination for a curve of best fit.

          a)                  b)                  c)
        x       y           x       y           x       y
      −2.8     0.6        −2.7     1.6         1.1     2.5
      −3.5    −5.8        −3.5    −3           3.5    11
      −2       3          −2.2     3           2.8     8.6
      −1       6          −0.5    −0.5         2.3     7
       0.2     4           0       1.3         0       1
       1       1           0.6     4.7         3.8    14
      −1.5     5          −1.8     1.7         1.4     4.2
       1.4    −3.1        −3.8    −7          −4       0.2
       0.7     3          −1.3     0.6        −1.3     0.6
      −0.3     6.1         0.8     7           3      12
      −3.3    −3.1         0.5     2.7         4.1    17
      −4      −7          −1       1.5         2.2     5
       2      −5.7        −3      −1.1        −2.7     0.4




Apply, Solve, Communicate

 B
 3. The heights of a stand of pine trees were measured along with the area
      under the cone formed by their branches.
      Height (m)        Area (m²)
          2.0              5.9
          1.5              3.4
          1.8              4.8
          2.4              8.6
          2.2              7.3
          1.2              2.1
          1.8              4.9
          3.1             14.4
      a) Create a scatter plot of these data.
      b) Determine the correlation coefficient and the equation of the
         line of best fit.
      c) Use a power regression to calculate a coefficient of
         determination and an equation for a curve of best fit.
      d) Which model do you think is more accurate? Explain why.
      e) Use the more accurate model to predict
         i) the area under a tree whose height is 2.7 m
         ii) the height of a tree whose area is 30 m²
      f) Suggest a reason why the height and circumference of a tree might
         be related in the way that the model in part d) suggests.

 4. Application The biologist Max Kleiber (1893−1976) pioneered research
      on the metabolisms of animals. In 1932, he determined the
      relationship between an animal’s mass and its energy requirements or
      basal metabolic rate (BMR). Here are data for eight animals.
      Animal              Mass (kg)     BMR (kJ/day)
      Frog                   0.018         0.050
      Squirrel               0.90          1.0
      Cat                    3.0           2.6
      Monkey                 7.0           4.0
      Baboon                30            14
      Human                 60            25
      Dolphin              160            44
      Camel                530           116
      a) Create a scatter plot and explain why Kleiber thought a
         power-regression curve would fit the data.
      b) Use a power regression to find the equation of the curve of best
         fit. Can you rewrite the equation so that it has exponents that
         are whole numbers? Do so, if possible, or explain why not.
      c) Is this power equation a good mathematical model for the
         relationship between an animal’s mass and its basal metabolic
         rate? Explain why or why not.
      d) Use the equation of the curve of best fit to predict the basal
         metabolic rate of
         i) a 15-kg dog
         ii) a 2-tonne whale

 5. Application As a sample of a radioactive element decays into more
      stable elements, the amount of radiation it gives off decreases. The
      level of radiation can be used to estimate how much of the original
      element remains. Here are measurements for a sample of radium-227.
      Time (h)   Radiation Level (%)
          0          100
          1           37
          2           14
          3            5.0
          4            1.8
          5            0.7
          6            0.3



      a) Create a scatter plot for these data.
      b) Use an exponential regression to find the equation for the
         curve of best fit.
      c) Is this equation a good model for the radioactive decay of this
         element? Explain why or why not.
      d) A half-life is the time it takes for half of the sample to
         decay. Use the regression equation to estimate the half-life of
         radium-227.

 Chapter Problem
 6. a) Create a time-series graph for the mean starting salary of the
         graduates who find jobs. Describe the pattern that you see.
      b) Use non-linear regression to construct a curve of best fit for
         the data. Record the equation of the curve and the coefficient
         of determination.
      c) Comment on whether this equation is a good model for the
         graduates’ starting salaries.

 7. An engineer testing the transmitter for a new radio station measures
      the radiated power at various distances from the transmitter. The
      engineer’s readings are in microwatts per square metre.

       Distance (km)   Power Level (µW/m²)
            2.0              510
            5.0               78
            8.0               32
           10.0               19
           12.0               14
           15.0                9
           20.0                5

      a) Find an equation for a curve of best fit for these data that
         has a coefficient of determination of at least 0.98.
      b) Use the equation for this curve of best fit to estimate the
         power level at a distance of
         i) 1.0 km from the transmitter
         ii) 4.0 km from the transmitter
         iii) 50.0 km from the transmitter

 8. Communication Logistic curves are often a good model for population
      growth. These curves have equations with the form
      $y = \frac{c}{1 + ae^{-bx}}$, where a, b, and c are constants.
      Consider the following data for the bacterial culture in Example 1:

       Time (h)        0     1     2     3     4     5
       Population      ?    10    21    43    82   168

       Time (h)        6     7     8     9    10    11
       Population    320   475   630   775   830   980

       Time (h)       12    13    14    15    16    17
       Population   1105  1215  1410  1490  1550  1575

       Time (h)       18    19    20
       Population   1590  1600  1600

      a) Use software or a graphing calculator to find the equation and
         coefficient of determination for the logistic curve that best
         fits the data for the bacteria population from 1 to 20 h.
      b) Graph this curve on a scatter plot of the data.
      c) How well does this curve appear to fit the entire data set?
         Describe the shape of the curve.
      d) Write a brief paragraph to explain why you think a bacterial
         population may exhibit this type of growth pattern.




 9. Inquiry/Problem Solving The following table shows the estimated
      population of a crop-destroying insect.

         Year      Population (billions)
         1995              100
         1996              130
         1997              170
         1998              220
         1999              285
         2000              375
         2001              490

      a) Determine an exponential curve of best fit for the population
         data.
      b) Suppose that 100 million of an arachnid that preys on the
         insect are imported from overseas in 1995. Assuming the
         arachnid population doubles every year, estimate when it would
         equal 10% of the insect population.
      c) What further information would you need in order to estimate
         the population of the crop-destroying insect once the arachnids
         have been introduced?
      d) Write an expression for the size of this population.

 C
10. Use technology to calculate the coefficient of determination for two
      of the linear regression examples in section 3.2. Is there any
      relationship between these coefficients of determination and the
      linear correlation coefficients for these examples?

11. Inquiry/Problem Solving Use a software program, such as Microsoft®
      Excel, to analyse these two sets of data:

          Data Set A          Data Set B
           x      y            x      y
           2      5            2      6
           4      7            4      5
           6      2            7     –4
           8      5            9      1
                              12      2

      a) For each set of data,
         i) determine the degree of polynomial regression that will
            generate a perfectly fit regression curve
         ii) perform the polynomial regression and record the value of
            r² and the equation of the regression curve
      b) Assess the effectiveness of the best-fit polynomial curve as a
         model for the trend of the set of data.
      c) For data set B,
         i) explain why the best-fit polynomial curve is an
            unsatisfactory model
         ii) generate a better model and record the value of r² and the
            equation of your new best-fit curve
         iii) explain why this curve is a better model than the
            polynomial curve found in part a)




3.4         Cause and Effect

  Usually, the main reason for a correlational study is
  to find evidence of a cause-and-effect relationship.
  A health researcher may wish to prove that even
  mild exercise reduces the risk of heart disease. A
  chemical company developing an oil additive
  would like to demonstrate that it improves engine
  performance. A school board may want to know
  whether calculators help students learn
  mathematics. In each of these cases, establishing
  a strong correlation between the variables is just
  the first step in determining whether one affects
  the other.



      INVESTIGATE & INQUIRE: Correlation Versus Cause and Effect

      1. List the type of correlation that you would expect to observe between the
         following pairs of variables. Also list whether you think the correlation is
         due to a cause-and-effect relationship or some other factor.
          a) hours spent practising at a golf driving range, golf drive distance
          b) hours spent practising at a golf driving range, golf score
          c) size of corn harvest, size of apple harvest
          d) score on a geometry test, score on an algebra test
          e) income, number of CDs purchased
      2. Compare your list with those of your classmates and discuss any differences.
         Would you change your list because of factors suggested by your classmates?
      3. Suggest how you could verify whether there is a cause-and-effect relationship
         between each pair of variables.

  A strong correlation does not prove that the changes in one variable cause
  changes in the other. There are various types and degrees of causal relationships
  between variables.

  Cause-and-Effect Relationship: A change in X produces a change in Y. Such
  relationships are sometimes clearly evident, especially in physical processes.
  For example, increasing the height from which you drop an object increases its
  impact velocity. Similarly, increasing the speed of a production line increases
  the number of items produced each day (and, perhaps, the rate of defects).


Common-Cause Factor: An external variable causes two variables to change
in the same way. For example, suppose that a town finds that its revenue from
parking fees at the public beach each summer correlates with the local tomato
harvest. It is extremely unlikely that cars parked at the beach have any effect on
the tomato crop. Instead good weather is a common-cause factor that increases
both the tomato crop and the number of people who park at the beach.
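
The beach-and-tomato situation can be imitated with a small simulation. The
sketch below is illustrative only: every number in it is synthetic, and the
names `weather`, `parking`, and `tomatoes`, along with the linear formulas,
are assumptions made for the demonstration rather than data from a real town.
Weather drives both quantities and neither one affects the other, yet the two
are strongly correlated. Looking only at summers with nearly the same weather
holds the common cause roughly constant.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(1)  # fixed seed so the run is repeatable

# Synthetic "summers": weather is a score from 0 (poor) to 1 (excellent).
weather = [random.random() for _ in range(2000)]
# Hypothetical formulas: each variable depends on weather plus its own noise.
parking = [1000 + 5000 * w + random.gauss(0, 500) for w in weather]
tomatoes = [20 + 80 * w + random.gauss(0, 8) for w in weather]

r_all = pearson_r(parking, tomatoes)

# Hold the common cause roughly constant: keep only average-weather summers.
similar = [i for i, w in enumerate(weather) if 0.45 <= w <= 0.55]
r_similar = pearson_r([parking[i] for i in similar],
                      [tomatoes[i] for i in similar])

print(f"all summers:             r = {r_all:.2f}")
print(f"similar-weather summers: r = {r_similar:.2f}")
```

In this made-up setting, the correlation across all summers should be strong,
while the correlation among similar-weather summers should be much weaker,
because little variation in the common cause remains to move both variables
together.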

Reverse Cause-and-Effect Relationship: The dependent and independent
variables are reversed in the process of establishing causality. For example,
suppose that a researcher observes a positive linear correlation between the
amount of coffee consumed by a group of medical students and their levels of
anxiety. The researcher theorizes that drinking coffee causes nervousness, but
instead finds that nervous people are more likely to drink coffee.

Accidental Relationship: A correlation exists without any causal relationship
between variables. For example, the number of females enrolled in
undergraduate engineering programs and the number of “reality” shows on
television both increased for several years. These two variables have a positive
linear correlation, but it is likely entirely coincidental.

Presumed Relationship: A correlation does not seem to be accidental even
though no cause-and-effect relationship or common-cause factor is apparent.
For example, suppose you found a correlation between people’s level of fitness
and the number of adventure movies they watched. It seems logical that a
physically fit person might prefer adventure movies, but it would be difficult
to find a common cause or to prove that the one variable affects the other.



Example 1 Causal Relationships

Classify the relationships in the following situations.
a)  The rate of a chemical reaction increases with temperature.
b)    Leadership ability has a positive correlation with academic achievement.
c)    The prices of butter and motorcycles have a strong positive correlation
      over many years.
d)    Sales of cellular telephones had a strong negative correlation with ozone
      levels in the atmosphere over the last decade.
e)    Traffic congestion has a strong correlation with the number of urban
      expressways.




Solution

a) Cause-and-effect relationship: Higher temperatures cause faster reaction rates.

b) Presumed relationship: A positive correlation between leadership ability and
   academic achievement seems logical, yet there is no apparent common-cause
   factor or cause-and-effect relationship.

c) Common-cause factor: Inflation has caused parallel increases in the prices
   of butter and motorcycles over the years.

d) Accidental relationship: The correlation between sales of cellular telephones
   and ozone levels is largely coincidental. However, it is possible that the
   chemicals used to manufacture cellular telephones cause a small portion
   of the depletion of the ozone layer.

e) Cause-and-effect relationship and reverse cause-and-effect relationship:
   Originally expressways were built to relieve traffic congestion, so traffic
   congestion did lead to the construction of expressways in major cities
   throughout North America. However, numerous studies over the last
   20 years have shown that urban expressways cause traffic congestion by
   encouraging more people to use cars.



As Example 1 demonstrates, several types of causal relationships can be involved
in the same situation. Determining the nature of causal relationships can be
further complicated by the presence of extraneous variables that affect either
the dependent or the independent variable. Here, extraneous means external
rather than irrelevant.

For example, you might expect to see a strong positive correlation between term
marks and final examination results for students in your class since both these
variables are affected by each student’s aptitude and study habits. However,
there are extraneous factors that could affect the examination results, including
the time each student had for studying before the examination, the individual
examination schedules, and varying abilities to work well under pressure.

In order to reduce the effect of extraneous variables, researchers often compare
an experimental group to a control group. These two groups should be as
similar as possible, so that extraneous variables will have about the same effect
on both groups. The researchers vary the independent variable for the
experimental group but not for the control group. Any difference in the
dependent variables for the two groups can then be attributed to the changes
in the independent variable.
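
One common way to make two groups similar, which the text above does not
spell out, is to assign subjects at random within each category of a known
extraneous variable. The sketch below assumes a hypothetical list of subjects,
each tagged with a category label (the stratum). The function name
`assign_groups` and the sample labels "A" and "B" are illustrative assumptions.

```python
import random
from collections import defaultdict

def assign_groups(subjects, seed=None):
    """Split (name, stratum) pairs into experimental and control groups.

    Subjects are shuffled within each stratum and dealt out alternately,
    so both groups receive (nearly) the same number from every stratum.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for name, stratum in subjects:
        by_stratum[stratum].append(name)

    experimental, control = [], []
    for stratum in sorted(by_stratum):
        names = by_stratum[stratum]
        rng.shuffle(names)
        experimental.extend((n, stratum) for n in names[0::2])
        control.extend((n, stratum) for n in names[1::2])
    return experimental, control

# Hypothetical subjects: 6 in category "A" and 4 in category "B".
subjects = ([(f"S{i}", "A") for i in range(6)]
            + [(f"S{i}", "B") for i in range(6, 10)])
experimental, control = assign_groups(subjects, seed=0)
print("experimental:", experimental)
print("control:     ", control)
```

Because the shuffle is random, no subject's other characteristics influence
which group he or she joins, so extraneous variables that were not measured
also tend to balance out, at least in large groups.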




Example 2 Using a Control Group

A medical researcher wants to test a new drug believed to help smokers
overcome the addictive effects of nicotine. Fifty people who want to quit
smoking volunteer for the study. The researcher carefully divides the volunteers
into two groups, each with an equal number of moderate and heavy smokers.
One group is given nicotine patches with the new drug, while the second group
uses ordinary nicotine patches. Fourteen people in the first group quit smoking
completely, as do nine people in the second group.
a)    Identify the experimental group, the control group, the independent
      variable, and the dependent variable.
b)    Can the researcher conclude that the new drug is effective?
c)    What further study should the researcher do?

Solution

a) The experimental group consists of the volunteers being given nicotine
      patches with the new drug, while the control group consists of the
      volunteers being given the ordinary patches. The independent variable is
      the presence of the new drug, and the dependent variable is the number
      of volunteers who quit smoking.

b) The results of the study are promising, but the researcher has not proven
      that the new drug is effective. The sample size is relatively small, which is
      prudent for an early trial of a new drug that could have unknown side-
      effects. However, the sample is small enough that the results could be
      affected by random statistical fluctuations or extraneous variables, such as
      the volunteers’ work environments, previous attempts to quit, and the
      influence of their families and friends.

c) Assuming that the new drug does not have any serious side-effects, the
      researcher should conduct further studies with larger groups and try to
      select the experimental and control groups to minimize the effect of all
      extraneous variables. The researcher might also conduct a study with
      several experimental groups that receive different dosages of the new drug.
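
The remark in part b) about random fluctuations can be made concrete with a
counting argument. The sketch below assumes, although the example does not say
so explicitly, that the 50 volunteers were split into two groups of 25. It then
supposes that the new drug has no effect at all, so that the same 23 volunteers
would have quit whichever patch they received, and computes the probability
that 14 or more of them end up in the first group purely by chance. The
function name `chance_of_at_least` is illustrative.

```python
from math import comb

def chance_of_at_least(k_min, group_size, quitters, total):
    """Probability that at least k_min of the quitters fall in one group
    of group_size when group membership is assigned purely at random."""
    ways_total = comb(total, group_size)
    k_max = min(group_size, quitters)
    ways = sum(
        comb(quitters, k) * comb(total - quitters, group_size - k)
        for k in range(k_min, k_max + 1)
    )
    return ways / ways_total

# From Example 2: 14 + 9 = 23 quitters among 50 volunteers.
# Assumption: two groups of 25 volunteers each.
p = chance_of_at_least(k_min=14, group_size=25, quitters=23, total=50)
print(f"Probability of 14 or more quitters in one group by chance: {p:.3f}")
```

If the printed probability is not small (say, well above 0.05), then a 14-to-9
split is the kind of difference that chance alone produces fairly often, which
supports the caution expressed in part b).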



When designing a study or interpreting a correlation, you often need
background knowledge and insight to recognize the causal relationships
present. Here are some techniques that can help determine whether a
correlation is the result of a cause-and-effect relationship.




• Use sampling methods that hold the extraneous variables constant.
• Conduct similar investigations with different samples and check for
  consistency in the results.
• Remove, or account for, possible common-cause factors.

The later chapters in this book introduce probability theory and some
statistical methods for a more quantitative approach to determining
cause-and-effect relationships.

Project Prep
In your statistics project, you may wish to consider cause-and-effect
relationships and extraneous variables that could affect your study.

   Key Concepts

   • Correlation does not necessarily imply a cause-and-effect relationship.
     Correlations can also result from common-cause factors, reverse cause-and-
     effect relationships, accidental relationships, and presumed relationships.

   • Extraneous variables can invalidate conclusions based on correlational
     evidence.

   • Comparison with a control group can help remove the effect of extraneous
     variables in a study.


   Communicate Your Understanding

     1. Why does a strong linear correlation not imply cause and effect?

     2. What is the key characteristic of a reverse cause-and-effect relationship?

     3. Explain the difference between a common-cause factor and an extraneous
        variable.

     4. Why are control groups used in statistical studies?



Practise

 A
 1. Identify the most likely type of causal relationship between each of
      the following pairs of variables. Assume that a strong positive
      correlation has been observed with the first variable as the
      independent variable.
      a) alcohol consumption, incidence of automobile accidents
      b) score on physics examination, score on calculus examination
      c) increase in pay, job performance
      d) population of rabbits, consumer price index
      e) number of scholarships received, number of job offers upon
         graduation
      f) coffee consumption, insomnia
      g) funding for athletic programs, number of medals won at Olympic
         games

 2. For each of the following common-cause relationships, identify the
      common-cause factor. Assume a positive correlation between each
      pair of variables.
      a) number of push-ups performed in one minute, number of sit-ups
         performed in one minute
      b) number of speeding tickets, number of accidents
      c) amount of money invested, amount of money spent

Apply, Solve, Communicate

 3. A civil engineer examining traffic flow problems in a large city
      observes that the number of traffic accidents is positively
      correlated with traffic density and concludes that traffic density
      is likely to be a major cause of accidents. What alternative
      conclusion should the engineer consider?

 B
 4. Communication An elementary school is testing a new method for
      teaching grammar. Two similar classes are taught the same
      material, one with the established method and the other with the
      new method. When both classes take the same test, the class taught
      with the established method has somewhat higher marks.
      a) What extraneous variables could influence the results of this
         study?
      b) Explain whether the study gives the school enough evidence to
         reject the new method.
      c) What further studies would you recommend for comparing the two
         teaching methods?

 5. Communication An investor observes a positive correlation between
      the stock prices of two competing computer companies. Explain what
      type of causal relationship is likely to account for this
      correlation.

 6. Application A random survey of students at Statsville High School
      found that their interest in computer games is positively
      correlated with their marks in mathematics.
      a) How would you classify this causal relationship?
      b) Suppose that a follow-up study found that students who had
         increased the time they spent playing computer games tended to
         improve their mathematics marks. Assuming that this study held
         all extraneous variables constant, would you change your
         assessment of the nature of the causal relationship? Explain
         why or why not.

 7. a) The net assets of Custom Industrial Renovations Inc., an
         industrial construction contractor, have a strong negative
         linear correlation with those of MuchMega-Fun, a toy
         distributor. How would you classify the causal relationship
         between these two variables?
      b) Suppose that the two companies are both subsidiaries of
         Diversified Holdings Ltd., which often shifts investment
         capital between them. Explain how this additional information
         could change your interpretation of the correlation in part a).

 8. Communication Aunt Gisele simply cannot sleep unless she has her
      evening herbal tea. However, the package for the tea does not list
      any ingredients known to induce sleep. Outline how you would
      conduct a study to determine whether the tea really does help
      people sleep.

 9. Find out what a double-blind study is and briefly explain the
      advantages of using this technique in studies with a control
      group.

 Chapter Problem
10. a) The data on page 157 show a positive correlation between the
         size of the graduating class and the number of


         graduates hired. Does this correlation mean that increasing the
         number of graduates causes a higher demand for them? Explain
         your answer.
      b) A recession during the first half of the 1990s reduced the
         demand for business graduates. Review the data on page 157 and
         describe any trends that may be caused by this recession.

 ACHIEVEMENT CHECK
 Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

11. The table below lists numbers of divorces and personal bankruptcies
      in Canada for the years 1976 through 1985.

            Year        Divorces     Bankruptcies
            1976         54 207         10 049
            1977         55 370         12 772
            1978         57 155         15 938
            1979         59 474         17 876
            1980         62 019         21 025
            1981         67 671         23 036
            1982         70 436         30 643
            1983         68 567         26 822
            1984         65 172         22 022
            1985         61 976         19 752

      a) Create a scatter plot and classify the linear correlation
         between the number of divorces and the number of bankruptcies.
      b) Perform a regression analysis. Record the equation of the line
         of best fit and the correlation coefficient.
      c) Identify an external variable that could be a common-cause
         factor.
      d) Describe what further investigation you could do to analyse the
         possible relationship between divorces and bankruptcies.

12. Search the E-STAT, CANSIM II, or other databases for a set of data
      on two variables with a positive linear correlation that you
      believe to be accidental. Explain your findings and reasoning.

 C
13. Use a library, the Internet, or other resources to find information
      on the Hawthorne effect and the placebo effect. Briefly explain
      what these effects are, how they can affect a study, and how
      researchers can avoid having their results skewed by these
      effects.

14. Inquiry/Problem Solving In a behavioural study of responses to
      violence, an experimental group was shown violent images, while a
      control group was shown neutral images. From the initial results,
      the researchers suspect that the gender of the people in the
      groups may be an extraneous variable. Suggest how the study could
      be redesigned to
      a) remove the extraneous variable
      b) determine whether gender is part of the cause-and-effect
         relationship

15. Look for material in the media or on the Internet that incorrectly
      uses correlational evidence to claim that a cause-and-effect
      relationship exists between the two variables. Briefly describe
      a) the nature of the correlational study
      b) the cause and effect claimed or inferred
      c) the reasons why cause and effect was not properly proven,
         including any extraneous variables that were not accounted for
      d) how the study could be improved


3.5           Critical Analysis

  Newspapers and radio and television news programs often run stories involving
  statistics. Indeed, the news media often commission election polls or surveys on
  major issues. Although the networks and major newspapers are reasonably careful
  about how they present statistics, their reporters and editors often face tight
  deadlines and lack the time and mathematical knowledge to thoroughly critique
  statistical material. You should be particularly careful about accepting statistical
  evidence from sources that could be biased. Lobby groups and advertisers like to
  use statistics because they appear scientific and objective. Unfortunately, statistics
  from such sources are sometimes flawed by unintentional or, occasionally, entirely
  deliberate bias. To judge the conclusions of a study properly, you need information
  about its sampling and analytical methods.


        INVESTIGATE & INQUIRE: Statistics in the Media

        1. Find as many instances as you can of statistical claims made in the media or
          on the Internet, including news stories, features, and advertisements. Collect
          newspaper and magazine clippings, point-form notes of radio and television
          stories, and printouts of web pages.
        2. Compare the items you have collected with those found by your classmates.
          What proportion of the items provide enough information to show that they
          used valid statistical methods?
        3. Select several of the items. For each one, discuss
           a) the motivation for the statistical study
           b) whether the statistical evidence justifies the claim being made

  The examples in this section illustrate how you can apply analytical tools to
  assess the results of statistical studies.

  www.mcgrawhill.ca/links/MDM12
  Visit the above web site and follow the links to learn more about how statistics
  can be misused. Describe two examples of the misuse of statistics.


Example 1 Sample Size and Technique

A manager wants to know if a new aptitude test accurately predicts
employee productivity. The manager has all 30 current employees
write the test and then compares their scores to their
productivities as measured in the most recent performance reviews.

 Test Score    Productivity
     98             78
     57             81
     82             83
     76             44
     65             62
     72             89
     91             85
     87             71
     81             76
     39             71
     50             66
     75             90
     71             48
     89             80
     82             83
     95             72
     56             72
     71             90
     68             74
     77             51
     59             65
     83             47
     75             91
     66             77
     48             63
     61             58
     78             55
     70             73
     68             75
     64             69

The data is ordered alphabetically by employee surname. In order
to simplify the calculations, the manager selects a systematic
sample using every seventh employee. Based on this sample, the
manager concludes that the company should hire only applicants
who do well on the aptitude test. Determine whether the
manager’s analysis is valid.

Solution

A linear regression of the systematic sample produces a line of best
fit with the equation y = 0.55x + 33 and a correlation coefficient of
r = 0.98, showing a strong linear correlation between productivity
and scores on the aptitude test. Thus, these calculations seem to
support the manager’s conclusion. However, the manager has made
the questionable assumption that a systematic sample will be
representative of the population. The sample is so small that
statistical fluctuations could seriously affect the results.

[Figure: scatter plot of Productivity (y) versus Test Score (x) for the systematic sample]



Examine the raw data. A scatter plot with all 30 data points does not show
any clear correlation at all. A linear regression yields a line of best fit with the
equation y = 0.15x + 60 and a correlation coefficient of only 0.15.

[Figure: scatter plot of Productivity (y) versus Test Score (x) for all 30 employees]


Thus, the new aptitude test will probably be useless for predicting employee
productivity. Clearly, the sample was far from representative. The manager’s
choice of an inappropriate sampling technique has resulted in a sample size too
small to make any valid conclusions.
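
The two regressions in Example 1 can be checked with a short program. The following is a minimal sketch in Python rather than the graphing calculator or spreadsheet used elsewhere in this chapter. The helper name linear_fit is an assumption made for illustration, and the data are typed in from the table in Example 1.

```python
# Test scores and productivities from the Example 1 table, in listed order.
scores = [98, 57, 82, 76, 65, 72, 91, 87, 81, 39, 50, 75, 71, 89, 82,
          95, 56, 71, 68, 77, 59, 83, 75, 66, 48, 61, 78, 70, 68, 64]
productivity = [78, 81, 83, 44, 62, 89, 85, 71, 76, 71, 66, 90, 48, 80, 83,
                72, 72, 90, 74, 51, 65, 47, 91, 77, 63, 58, 55, 73, 75, 69]


def linear_fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Systematic sample: every seventh employee (the 7th, 14th, 21st, and 28th).
sample_x = scores[6::7]
sample_y = productivity[6::7]

sample_fit = linear_fit(sample_x, sample_y)
full_fit = linear_fit(scores, productivity)

print(f"sample of {len(sample_x)}: slope = {sample_fit[0]:.2f}, "
      f"intercept = {sample_fit[1]:.1f}, r = {sample_fit[2]:.2f}")
print(f"all {len(scores)} employees: slope = {full_fit[0]:.2f}, "
      f"intercept = {full_fit[1]:.1f}, r = {full_fit[2]:.2f}")
```

Notice that the systematic sample contains only four data points, which is why its correlation coefficient can differ so much from the one for the full data set.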



In Example 1, the manager should have done an analysis using all of the data
available. Even then the data set is still somewhat small to use as a basis for a
major decision such as changing the company’s hiring procedures. Remember
that small samples are also particularly vulnerable to the effects of outliers.
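
One way to see how much a small sample can fluctuate is to repeat the manager's systematic sampling with each possible starting employee. The sketch below is an illustration only, not part of the manager's analysis. It assumes the data from the Example 1 table, and the helper name pearson_r is hypothetical.

```python
# Test scores and productivities from the Example 1 table, in listed order.
scores = [98, 57, 82, 76, 65, 72, 91, 87, 81, 39, 50, 75, 71, 89, 82,
          95, 56, 71, 68, 77, 59, 83, 75, 66, 48, 61, 78, 70, 68, 64]
productivity = [78, 81, 83, 44, 62, 89, 85, 71, 76, 71, 66, 90, 48, 80, 83,
                72, 72, 90, 74, 51, 65, 47, 91, 77, 63, 58, 55, 73, 75, 69]


def pearson_r(xs, ys):
    """Return the linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


# Take every seventh employee, starting in turn from the 1st through the 7th.
r_by_start = {}
for start in range(7):
    xs = scores[start::7]
    ys = productivity[start::7]
    r_by_start[start + 1] = pearson_r(xs, ys)

for start, r in r_by_start.items():
    print(f"starting at employee {start}: r = {r:+.2f}")
```

The manager's sample corresponds to starting at the seventh employee. Compare its correlation coefficient with the values for the other starting points before trusting any single small sample.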

Example 2 Extraneous Variables and Sample Bias

An advertising blitz by SuperFast Computer Training Inc. features profiles of
some of its young graduates. The number of months of training that these
graduates took, their job titles, and their incomes appear prominently in the
advertisements.
 Graduate                          Months of Training    Income ($000)
 Sarah, software developer                  9                 85
 Zack, programmer                           6                 63
 Eli, systems analyst                       8                 72
 Yvette, computer technician                5                 52
 Kulwinder, web-site designer               6                 66
 Lynn, network administrator                4                 60

a)    Analyse the company’s data to determine the strength of the linear
      correlation between the amount of training the graduates took and their
      incomes. Classify the linear correlation and find the equation of the linear
      model for the data.
b)    Use this model to predict the income of a student who graduates from
      the company’s two-year diploma program after 20 months of training.
      Does this prediction seem reasonable?
c)    Does the linear correlation show that SuperFast’s training accounts for
      the graduates’ high incomes? Identify possible extraneous variables.
d)    Discuss any problems with the sampling technique and the data.


Solution
a)    The scatter plot for income versus months of training shows a definite
      positive linear correlation. The regression line is y = 5.44x + 31.9, and
      the correlation coefficient is 0.90. There appears to be a strong positive
      correlation between the amount of training and income.

b)   As shown in cell C9 in the screen above, substituting 20 months into the linear
     regression equation predicts an income of approximately
     y = 5.44(20) + 31.9
       = 141
     Therefore, the linear model predicts that a graduate who has taken 20 months
     of training will make about $141 000 a year. This amount is extremely high
     for a person with a two-year diploma and little or no job experience. The
     prediction suggests that the linear model may not be accurate, especially when
     applied to the company’s longer programs.
c)   Although the correlation between SuperFast’s training and the graduates’
     incomes appears to be quite strong, the correlation by itself does not prove
     that the training causes the graduates’ high incomes. A number of extraneous
     variables could contribute to the graduates’ success, including experience prior
     to taking the training, aptitude for working with computers, access to a high-
     end computer at home, family or social connections in the industry, and the
     physical stamina to work very long hours.
d)   The sample is small and could have intentional bias. There is no indication
     that the individuals in the advertisements were randomly chosen from the
     population of SuperFast’s students. Quite likely, the company carefully selected
     the best success stories in order to give potential customers inflated
     expectations of future earnings. Also, the company shows youthful graduates,
     but does not actually state that the graduates earned their high incomes
     immediately after graduation. It may well have taken the graduates years of
     hard work to reach the income levels listed in the advertisements. Further, the
     amounts given are incomes, not salaries. The income of a graduate working
     for a small start-up company might include stock options that could turn out
     to be worthless. In short, the advertisements do not give you enough
     information to properly evaluate the data.
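
The calculations in parts a) and b) of Example 2 can be reproduced with the sketch below. It is written in Python as an alternative to the spreadsheet mentioned in the solution. The variable names and the helper linear_fit are assumptions for illustration, and the data come from the advertisement table above.

```python
# Months of training and incomes (in thousands of dollars) from the table.
months = [9, 6, 8, 5, 6, 4]
income = [85, 63, 72, 52, 66, 60]


def linear_fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


slope, intercept, r = linear_fit(months, income)

# Part b): substitute 20 months into the linear model.
target_months = 20
predicted_income = slope * target_months + intercept

# The model is only supported by data between the smallest and largest x-values.
within_data_range = min(months) <= target_months <= max(months)

print(f"y = {slope:.2f}x + {intercept:.1f}, r = {r:.2f}")
print(f"predicted income after {target_months} months: "
      f"about ${predicted_income:.0f} thousand")
print(f"prediction lies within the range of the data: {within_data_range}")
```

The last line of output is a reminder that 20 months lies far outside the 4 to 9 months covered by the data, so the prediction is an extrapolation.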



Example 2 had several fairly obvious extraneous variables. However, extraneous
variables are sometimes difficult to recognize. Such hidden or lurking
variables can also invalidate conclusions drawn from statistical results.


Example 3 Detecting a Hidden Variable

An arts council is considering whether to fund the start-up of a local youth
orchestra. The council has a limited budget and knows that the number of
youth orchestras in the province has been increasing. The council needs to
know whether starting another youth orchestra will help the development of
young musicians. One measure of the success of such programs is the number
of youth-orchestra players who go on to professional orchestras. The council
has collected the following data.
 Year    Number of Youth Orchestras         Number of Players Becoming Professionals
 1991                10                                      16
 1992                11                                      18
 1993                12                                      20
 1994                12                                      23
 1995                14                                      26
 1996                14                                      32
 1997                16                                      13
 1998                16                                      16
 1999                18                                      20
 2000                20                                      26
a)    Does a linear regression allow you to determine whether the council
      should fund a new youth orchestra? Can you draw any conclusions from
      other analysis?
b)    Suppose you discover that one of the country’s professional orchestras
      went bankrupt in 1997. How does this information affect your analysis?

Solution

a)    A scatter plot of the number of youth-orchestra members who go on to
      play professionally versus the number of youth orchestras shows that there
      may be a weak positive linear correlation. The correlation coefficient is
      0.16, indicating that the linear correlation is very weak. Therefore, you
      might conclude that starting another youth orchestra will not help the
      development of young musicians. However, notice that the data points seem
      to form two clusters in the scatter plot, one on the left side and the other
      on the right. This unusual pattern suggests the presence of a hidden
      variable, which could affect your analysis. You will need more information
      to determine the nature and effect of the possible hidden variable.




   You have enough data to produce a time-series graph of the numbers of
   young musicians who go on to professional orchestras. This graph also has
   two clusters of data points. The numbers rise from 1991 to 1996, drop
   substantially in 1997, and then rise again. This pattern suggests that
   something unusual happened in 1997.




b) The collapse of a major orchestra means both that there is one less orchestra
   hiring young musicians and that about a hundred experienced players are
   suddenly available for work with the remaining professional orchestras.
   The resulting drop in the number of young musicians hired by professional
   orchestras could account for the clustering of data points you observed in
   part a). Because of the change in the number of jobs available for young
   musicians, it makes sense to analyse the clusters separately.




      Observe that the two sets of data both exhibit a strong linear correlation.
      The correlation coefficients are 0.93 for the data prior to 1997 and 0.94 for
      the data from 1997 on. The number of players who go on to professional
      orchestras is strongly correlated with the number of youth orchestras. So,
      funding the new orchestra may be a worthwhile project for the arts council.

      The presence of a hidden variable, the collapse of a major orchestra,
      distorted the data and masked the underlying pattern. However, splitting
      the data into two sets results in smaller sample sizes, so you still have to
      be cautious about drawing conclusions.
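
The effect of splitting the data at 1997 can be checked with the sketch below, which computes the correlation coefficient for the whole data set and for each cluster separately. The helper name pearson_r is an assumption for illustration. The values depend on the table as reproduced here, so they may not agree with the coefficients quoted above to the last digit.

```python
# Data from the arts council table in Example 3.
years = list(range(1991, 2001))
orchestras = [10, 11, 12, 12, 14, 14, 16, 16, 18, 20]
professionals = [16, 18, 20, 23, 26, 32, 13, 16, 20, 26]


def pearson_r(xs, ys):
    """Return the linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


# Split the data at the year the professional orchestra went bankrupt.
split = years.index(1997)

r_all = pearson_r(orchestras, professionals)
r_before = pearson_r(orchestras[:split], professionals[:split])
r_after = pearson_r(orchestras[split:], professionals[split:])

print(f"all years:    r = {r_all:.2f} (n = {len(years)})")
print(f"1991 to 1996: r = {r_before:.2f} (n = {split})")
print(f"1997 to 2000: r = {r_after:.2f} (n = {len(years) - split})")
```

The sample sizes printed beside each coefficient show why caution is still needed: the second cluster contains only four data points.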



When evaluating claims based on statistical studies, you must assess the
methods used for collecting and analysing the data. Some critical questions
are:

• Is the sampling process free from intentional and unintentional bias?

• Could any outliers or extraneous variables influence the results?

• Are there any unusual patterns that suggest the presence of a hidden
  variable?

• Has causality been inferred with only correlational evidence?

Project Prep

When collecting and analysing data for your statistics project, you can
apply the concepts in this section to ensure that your conclusions are
valid.




Key Concepts

  • Although the major media are usually responsible in how they present statistics,
    you should be cautious about accepting any claim that does not include
    information about the sampling technique and analytical methods used.

  • Intentional or unintentional bias can invalidate statistical claims.

  • Small sample sizes and inappropriate sampling techniques can distort the data
    and lead to erroneous conclusions.

  • Extraneous variables must be eliminated or accounted for.

  • A hidden variable can skew statistical results and yet still be hard to detect.

  Communicate Your Understanding

    1. Explain how a small sample size can lead to invalid conclusions.

    2. A city councillor states that there are problems with the management of the
       police department because the number of reported crimes in the city has risen
       despite increased spending on law enforcement. Comment on the validity of
       this argument.

    3. Give an example of a hidden variable not mentioned in this section, and
       explain why this variable would be hard to detect.



Apply, Solve, Communicate

A

1. An educational researcher discovers that levels of mathematics anxiety are
    negatively correlated with attendance in mathematics class. The researcher
    theorizes that poor attendance causes mathematics anxiety. Suggest an
    alternate interpretation of the evidence.

2. A survey finds a correlation between the proportion of high school students
    who own a car and the students’ ages. What hidden variable could affect
    this study?

B

3. A student compares height and grade average with four friends and collects
    the following data.

       Height (cm)    Grade Average (%)
          171                 73
          145                 91
          162                 70
          159                 81
          178                 68

    From this table, the student concludes that taller students tend to get
    lower marks.
    a) Does a regression analysis support the student’s conclusion?
    b) Why are the results of this analysis invalid?
    c) How can the student get more accurate results?
4. Inquiry/Problem Solving A restaurant chain randomly surveys its customers
    several times a year. Since the surveys show that the level of customer
    satisfaction is rising over time, the company concludes that its customer
    service is improving. Discuss the validity of the surveys and the
    conclusion based on these surveys.

5. Application A teacher offers the following data to show that good
    attendance is important.

       Days Absent    Final Grade
            8              72
            2              75
            0              82
           11              68
           15              66
           20              30

    A student with a graphing calculator points out that the data indicate
    that anyone who misses 17 days or more is in danger of failing the course.
    a) Show how the student arrived at this conclusion.
    b) Identify and explain the problems that make this conclusion invalid.
    c) Outline statistical methods to avoid these problems.

6. Chapter Problem Using a graphing calculator, Gina found the cubic curve of
    best fit for the salary data in the table on page 157. This curve has a
    coefficient of determination of 0.98, indicating an almost perfect fit to
    the data. The equation of the cubic curve is
    starting salary = 0.0518y³ – 310y² + 618 412y – 411 344 091
    where the salary is given in thousands of dollars and y is the year of
    graduation.
    a) What mean starting salary does this model predict for Gina’s class
       when they graduate in 2005?
    b) Is this prediction realistic? Explain.
    c) Explain why this model generated such an inaccurate prediction despite
       having a high value for the coefficient of determination.
    d) Suggest methods Gina could use to make a more accurate prediction.

7. Communication Find a newspaper or magazine article, television commercial,
    or web page that misuses statistics of two variables. Perform a critical
    analysis using the techniques in this chapter. Present your findings in a
    brief report.

8. Application A manufacturing company keeps records of its overall annual
    production and its number of employees. Data for a ten-year period are
    shown below.

       Year   Number of Employees    Production (000)
       1992           158                   75
       1993           165                   81
       1994           172                   84
       1995           148                   68
       1996           130                   58
       1997           120                   51
       1998            98                   50
       1999           105                   57
       2000           110                   62
       2001           120                   70

    a) Create a scatter plot to see if there is a linear correlation between
       annual production and number of employees. Classify the correlation.
    b) At some point, the company began to lay off workers. When did these
       layoffs begin?
    c) Does the scatter plot suggest the presence of a hidden variable? Could
       the layoffs account for the pattern you see? Explain why or why not.
    d) The company’s productivity is its annual production divided by the
       number of employees. Create a time-series graph for the company’s
       productivity.
    e) Find the line of best fit for the graph in part d).
    f) The company has adopted a better management system. When do you think
       the new system was implemented? Explain your reasoning.

C

9. Search E-STAT, CANSIM II, or other sources for time-series data for the
    price of a commodity such as gasoline, coffee, or computer memory. Analyse
    the data and comment on any evidence of a hidden variable. Conduct further
    research to determine if there are any hidden variables. Write a brief
    report outlining your analysis and conclusions.

10. Inquiry/Problem Solving A study conducted by Stanford University found
    that behavioural counselling for people who had suffered a heart attack
    reduced the risk of a further heart attack by 45%. Outline how you would
    design such a study. List the independent and dependent variables you
    would use and describe how you would account for any extraneous variables.

    Career Connection
                                     Economist
 Economists apply statistical methods to develop mathematical models of the
 production and distribution of wealth. Governments, large businesses, and
 consulting firms are employers of economists. Some of the functions
 performed by an economist include
 • recognizing and interpreting domestic and international market trends
 • using supply and demand analysis to assess market potential and set prices
 • identifying factors that affect economic growth, such as inflation and
   unemployment
 • advising governments on fiscal and monetary policies
 • optimizing the economic activity of financial institutions and large
   businesses

 Typically, a bachelor’s degree in economics is necessary to enter this field.
 However, many positions require a master’s or doctorate degree or
 specialized training. Since economists often deal with large amounts of data,
 a strong background in statistics and an ability to work with computers are
 definite assets.

 An economist can expect to earn a comfortable living. Most employment
 opportunities for economists are in large cities. The current demand for
 economists is reasonably strong and likely to remain so for the foreseeable
 future, as governments and large businesses will continue to need the
 information and analysis that economists provide.

 www.mcgrawhill.ca/links/MDM12
 Visit the above web site and follow the links to learn more about a career
 as an economist and other related careers.



Review of Key Concepts

3.1 Scatter Plots and Linear Correlation
Refer to the Key Concepts on page 167.

 1. a) Classify the linear correlation in each scatter plot shown below.

       [Figure: three scatter plots, each with axes labelled x and y]

    b) Determine the correlation coefficient for data points in the scatter
       plots in part a).
    c) Do these correlation coefficients agree with your answers in part a)?

 2. A survey of a group of randomly selected students compared the number of
    hours of television they watched per week with their grade averages.

      Hours Per Week       12   10    5    3   15   16    8
      Grade Average (%)    70   85   82   88   65   75   68

    a) Create a scatter plot for these data. Classify the linear correlation.
    b) Determine the correlation coefficient.
    c) Can you make any conclusions about the effect that watching television
       has on academic achievement? Explain.

3.2 Linear Regression
Refer to the Key Concepts on page 179.

 3. Use the method of least squares to find the equation for the line of best
    fit for the data in question 2.

 4. The scores for players’ first and second games at a bowling tournament are
    shown below.

      First Game     169  150  202  230  187  177  164
      Second Game    175  162  195  241  185  235  171

    a) Create a scatter plot for these data.
    b) Determine the correlation coefficient and the line of best fit.
    c) Identify any outliers.
    d) Repeat part b) with the outliers removed.
    e) A player scores 250 in the first game. Use both linear models to
       predict this player’s score for the second game. How far apart are the
       two predictions?

3.3 Non-Linear Regression
Refer to the Key Concepts on page 191.

 5. An object is thrown straight up into the air. The table below shows the
    height of the object as it ascends.

      Time (s)      0    0.1   0.2   0.3   0.4   0.5   0.6
      Height (m)    0     1    1.8   2.6   3.2   3.8   4.2

    a) Create a scatter plot for these data.
b) Perform a non-linear regression for these          8. a) Explain the relationship between
       data. Record the equation of the curve of                experimental and control groups.
       best fit and the coefficient of                         b) Why is a control group needed in
       determination.                                           some statistical studies?
    c) Use your model to predict the maximum
       height of the object.                              9. a) Explain the difference between an
                                                                accidental relationship and a presumed
    d) Use your model to predict how long the
                                                                relationship.
       object will be in the air.
                                                             b) Provide an example of each.
    e) Do you think that your model is
       accurate? Explain.

 6. The table shows the distance travelled by a car as a function of time.

       Time (s)    Distance (m)
           0             0
           2             6
           4            22
           6            50
           8            90
          10           140
          12           190
          14           240
          16           290
          18           340
          20           380
          22           410
          24           430
          26           440
          28           440

    a) Determine a curve of best fit to model the data.
    b) Do you think the equation for this curve of best fit is a good model for
       the situation? Explain your reasoning.
    c) Describe what the driver did between 0 and 28 s.

3.4 Cause and Effect
Refer to the Key Concepts on page 199.
 7. Define or explain the following terms and provide an example of each one.
    a) common-cause factor
    b) reverse cause-and-effect relationship
    c) extraneous variable

10. The price of eggs is positively correlated with wages. Explain why you cannot
    conclude that raising the price of eggs should produce a raise in pay.

11. An educational researcher compiles data on Internet use and scholastic
    achievement for a random selection of students, and observes a strong positive
    linear correlation. She concludes that Internet use improves student grades.
    Comment on the validity of this conclusion.

3.5 Critical Analysis
Refer to the Key Concepts on page 209.
12. A teacher is trying to determine whether a new spelling game enhances learning.
    In his gifted class, he finds a strong positive correlation between use of the
    game and spelling-test scores. Should the teacher recommend the use of the game
    in all English classes at his school? Explain your answer.

13. a) Explain what is meant by the term hidden variable.
    b) Explain how you might detect the presence of a hidden variable in a set of
       data.




                                                                            Review of Key Concepts • MHR   213
Chapter Test

ACHIEVEMENT CHART

                          Knowledge/           Thinking/Inquiry/
       Category                                                    Communication          Application
                         Understanding          Problem Solving
       Questions               All                 5, 7, 10        1, 5, 6, 8, 10         3, 4, 7, 10


 1. Explain or define each of the following                             d) Use this model to predict the average
      terms.                                                               word length in a book recommended
      a) perfect negative linear correlation                               for 12-year olds.
      b) experimental research                                     Use the following information in order to
      c) outlier                                                   answer questions 4–6.
      d) extraneous variable
                                                                   Jerome has kept track of the hours he spent
      e) hidden variable                                           studying and his marks on examinations.
 2. Match the following.                                             Subject                    Hours Studied   Mark
                                                                     Mathematics, grade 9                5      70
         Correlation Type                       Coefficient, r
                                                                     English, grade 9                    3      65
      a) strong negative linear                       1
                                                                     Science, grade 9                    4      68
      b) direct                                       0.6            Geography, grade 9                 4       72
      c) weak positive linear                         0.3            French, grade 9                     2      38
      d) moderate positive linear                    −0.8            Mathematics, grade 10               7      74
      e) perfect negative linear                     −1              English, grade 10                   5      69
                                                                     Science, grade 10                  6       71
 3. The following set of data relates mean word                      History, grade 10                  5       75
      length and recommended age level for a set                     Mathematics, grade 11              12      76
      of children’s books.
                                                                     English, grade 11                   9      74
       Recommended Age        Mean Word Length
                                                                     Physics, grade 11                  14      78
                   4                     3.5
                   6                     5.5                        4. a) Create a scatter plot for Jerome’s data
                                                                           and classify the linear correlation.
                   5                     4.6
                   6                     5.0                           b) Perform a regression analysis. Identify
                   7                     5.2                               the equation of the line of best fit as y1,
                                                                           and record the correlation coefficient.
                   9                     6.5
                   8                     6.1                           c) Identify any outliers.
                   5                     4.9                           d) Repeat part b) with the outlier removed.
      a) Create a scatter plot and classify the                            Identify this line as y2.
         linear correlation.                                        5. Which of the two linear models found in
      b) Determine the correlation coefficient.                        question 4 gives a more optimistic
      c) Determine the line of best fit.                               prediction for Jerome’s upcoming biology
                                                                      examination? Explain.
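
Questions 3 and 4 ask for a correlation coefficient and a line of best fit. For readers working without a graphing calculator, the following is a minimal Python sketch of those two calculations, applied to the word-length table in question 3. The function name `linear_fit` and the variable names are illustrative assumptions, not notation from this book, and the sketch uses the standard least-squares formulas rather than any particular calculator's routine.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Data copied from the table in question 3
ages = [4, 6, 5, 6, 7, 9, 8, 5]
word_lengths = [3.5, 5.5, 4.6, 5.0, 5.2, 6.5, 6.1, 4.9]

slope, intercept, r = linear_fit(ages, word_lengths)
print(f"y = {slope:.3f}x + {intercept:.3f}, r = {r:.3f}")
```

The same function could be applied to Jerome's data in question 4, once with all twelve points and once with any outlier removed, to compare the two lines.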

 6. a) Identify at least three extraneous variables in Jerome’s study.
    b) Suggest some ways that Jerome might improve the validity of his study.

 7. A phosphorescent material can glow in the dark by absorbing energy from light
    and then gradually re-emitting it. The following table shows the light levels
    for a phosphorescent plastic.

         Time (h)     Light Level (lumens)
            0               0.860
            1               0.695
            2               0.562
            3               0.455
            4               0.367
            5               0.305
            6               0.247

    a) Create a scatter plot for the data.
    b) Perform a quadratic regression. Record the equation of the curve of best
       fit and the coefficient of determination.
    c) Repeat part b) for an exponential regression.
    d) Compare how well these two models fit the data.
    e) According to each model, what will be the light level after 10 h?
    f) Which of these two models is superior for extrapolating beyond 6 h? Explain.

 8. Explain how you could minimize the effects of extraneous variables in a
    correlation study.

 9. Provide an example of a reverse cause-and-effect relationship.

     ACHIEVEMENT CHECK

 Knowledge/Understanding     Thinking/Inquiry/Problem Solving            Communication                    Application
10. The table below contains data from the Ontario Road Safety Annual Report
    for 1999.

       Age             Licensed      Number of      % of Drivers in Age
                       Drivers       Collisions     Group in Collisions
       16                 85 050        1 725              2.0
       17                105 076        7 641              7.3
       18                114 056        9 359              8.2
       19                122 461        9 524              7.8
       20                123 677        9 320              7.5
       21–24             519 131       36 024              6.9
       25–34           1 576 673       90 101              5.7
       35–44           1 895 323       90 813              4.8
       45–54           1 475 588       60 576              4.1
       55–64             907 235       31 660              3.5
       65–74             639 463       17 598              2.8
       75 and older      354 581        9 732              2.7
       Total           7 918 314      374 073              4.7

    a) Organize the data so that the age intervals are consistent. Create a
       scatter plot of the proportion of drivers involved in collisions versus age.
    b) Perform a regression analysis. Record the equations of the curves of best
       fit for each regression you try as well as the coefficient of determination.
    c) In Ontario, drivers over 80 must take vision and knowledge tests every two
       years to renew their licences. However, these drivers no longer have to
       take road tests as part of the review. Advocacy groups for seniors had
       lobbied the Ontario government for this change. How could such groups have
       used your data analysis to support their position?

                                                                                                   Chapter Test • MHR   215
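
Question 7 of the Chapter Test calls for an exponential regression. One common way to carry this out by hand or in code is to fit a straight line to the natural logarithm of the dependent variable; the sketch below does this in Python for the light-level table. The function name `exponential_fit` is a hypothetical helper, and the coefficient of determination is computed on the linearized data (time versus ln of light level), which is an assumption of this sketch and may differ slightly from the value a calculator or spreadsheet reports.

```python
from math import exp, log


def exponential_fit(xs, ys):
    """Fit y = a * b**x by least squares on (x, ln y); return (a, b, r_squared)."""
    n = len(xs)
    log_ys = [log(y) for y in ys]  # requires every y to be positive
    mean_x = sum(xs) / n
    mean_ly = sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((ly - mean_ly) ** 2 for ly in log_ys)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
    slope = sxy / sxx
    intercept = mean_ly - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # computed for the linearized data
    return exp(intercept), exp(slope), r_squared


# Data copied from the table in question 7
hours = [0, 1, 2, 3, 4, 5, 6]
light = [0.860, 0.695, 0.562, 0.455, 0.367, 0.305, 0.247]

a, b, r2 = exponential_fit(hours, light)
print(f"light = {a:.3f} * {b:.3f}**t, r^2 = {r2:.4f}")
print(f"predicted level at 10 h: {a * b ** 10:.3f}")
```

Because the fit is done on logarithms, it weights relative errors rather than absolute ones, so a tool that fits the curve directly may give slightly different coefficients.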
Statistics Project


  Wrap-Up
 Implementing Your Action Plan                       Suggested Resources
  1. Look up the most recent census data from        • Statistics Canada web sites and publications
     Statistics Canada. Pick a geographical          • Embassies and consulates
     region and study the data on age of all
                                                     • United Nations web sites and publications
     respondents by gender. Conjecture a
                                                       such as UNICEF’s CyberSchoolbus and
     relationship between age and the relative
                                                       World Health Organization reports
     numbers of males and females. Use a table
     and a graph to organize and present the         • Statistical software (the Fathom™ sample
     data. Does the set of data support your           documents include census data for Beverly
     conjecture?                                       Hills, California)
                                                     • Spreadsheets
  2. You may want to compare the data you
                                                     • Graphing calculators
      analysed in step 1 to the corresponding
      data for other regions of Canada or for
      other countries. Identify any significant
      similarities or differences between the data
      sets. Suggest reasons for any differences             www.mcgrawhill.ca/links/MDM12
      you notice.
                                                        Visit the web site above to find links to various
  3. Access data on life expectancies in Canada                        census databases.
      for males and females from the 1920s to
      the present. Do life expectancies appear to
      be changing over time? Is there a
      correlation between these two variables?       Evaluating Your Project
      If so, use regression analysis to predict      To help assess your own project, consider the
      future life expectancies for males and         following questions.
      females in Canada.
                                                      1. Are the data you selected appropriate?
  4. Access census data on life expectancies in
                                                      2. Are your representations of the data
      the various regions of Canada. Select
                                                        effective?
      another attribute from the census data and
      conjecture whether there is a correlation       3. Are the mathematical models that you used
      between this variable and life expectancies.      reliable?
      Analyse data from different regions to see
      if the data support your conjecture.            4. Who would be interested in your findings?
                                                        Is there a potential market for this
                                                        information?




216   MHR • Statistics Project
5. Are there questions that arose from your                Presentation
    research that warrant further investigation?            Present the findings of your investigation in
    How would you go about addressing these                 one or more of the following forms:
    issues in a future project?                             • written report
 6. If you were to do this project again, what              • oral presentation
    would you do differently? Why?                          • computer presentation (using software such
                                                              as Corel® Presentations™ or Microsoft®
Section 9.4 describes methods for evaluating
                                                              PowerPoint®)
your own work.
                                                            • web page
                                                            • display board
                                                            Remember to include a bibliography. See
                                                            section 9.5 and Appendix D for information on
                                                            how to prepare a presentation.

 Preparing for
 the Culminating Project

 Applying Project Skills                                    be the focus of your project and begun to
 Throughout this statistics project, you have               gather relevant data. Section 9.2 provides
 developed skills in statistical research and               suggestions to help you clearly define your
 analysis that may be helpful in preparing                  task. Your next steps are to develop and
 your culminating project:                                  implement an action plan.

 • making a conjecture or hypothesis                        Make sure there are enough data to support
 • using technology to access, organize, and                your work. Decide on the best way to
   analyse data                                             organize and present the data. Then,
                                                            determine what analysis you need to do. As
 • applying a variety of statistical tools
                                                            you begin to work with the data, you may
 • comparing two sets of data
                                                            find that they are not suitable or that further
 • presenting your findings
                                                            research is necessary. Your analysis may lead
                                                            to a new approach or topic that you would
 Keeping on Track
                                                            like to pursue. You may find it necessary to
 At this point, you should have a good idea of              refine or alter the focus of your project. Such
 the basic nature of your culminating project.              changes are a normal part of the development
 You should have identified the issue that will              and implementation process.



  [Flow chart: Define the Problem → Define Your Task → Develop an Action Plan →
  Implement Your Action Plan → Evaluate Your Investigation and Its Results →
  Prepare a Written Report → Present Your Investigation and Its Results →
  Constructively Critique the Presentations of Others, with a Refine/Redefine
  loop back to the earlier steps]




                                                                                  Statistics Project: Wrap-Up • MHR         217
Cumulative Review: Chapters 1 to 3

 1. Let $A = \begin{bmatrix} 7 & 3 \\ 0 & -2 \\ -5 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 8 & 4 \\ -5 & 1 \end{bmatrix}$, and
    $C = \begin{bmatrix} -8 & 0 \\ 5 & 6 \\ 9 & -3 \end{bmatrix}$. Calculate, if possible,
      a) $-2(A + C)$           b) $AC$
      c) $(BA)^t$              d) $B^2$
      e) $C^2$                 f) $B^{-1}$

                                                         5. Classify the type of linear correlation that
                                                            you would expect for each pair of variables.
                                                            a) air temperature, altitude
                                                            b) income, athletic ability
                                                            c) people’s ages from 1 to 20 years, their
                                                                 masses
                                                            d) people’s ages from 21 to 40 years, their
                                                                 masses
                                                         6. Identify the most likely causal relationship
                                                           between each of the following pairs of variables.
 2. a) Describe the iterative process used to
                                                            a) grade point average, starting salary upon
         generate the table below.
                                                                 graduation
      b) Continue the process until all the cells
                                                            b) grade in chemistry, grade in physics
         are filled.
                                                            c) sales of symphony tickets, carrot harvest
                     17   16       15    14   13            d) monthly rainfall, monthly umbrella sales
                     18    5        4     3   12
                           6        1     2   11         7. a) Sketch a map that can be coloured using
                           7        8     9   10                 only three colours.
                                                            b) Reconfigure your map as a network.

                                                         8. State whether each of the following
 3. Which of the following would you consider              networks is
      to be databases? Explain your reasoning.              i) connected          ii) traceable   iii) planar
      a) a novel                                           Provide evidence for your decisions.
      b) school attendance records                          a) [Network diagram with vertices A, B, C, and D]
      c) the home page of a web site                        b) [Network diagram with vertices P, Q, R, S, and T]
      d) an advertising flyer from a department
         store
 4. What sampling techniques are most likely
      to be used for the following surveys? Explain
      each of your choices.                              9. Use a tree diagram to represent the
      a) a radio call-in show                              administrative structure of a school that has
      b) a political poll                                  a principal, vice-principals, department
      c) a scientific study                                 heads, assistant heads, and teachers.
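
Question 1 turns on whether each matrix operation is defined before it can be calculated. As a rough illustration, the following Python sketch checks dimensions before adding or multiplying and returns `None` when an operation is not possible. The helper names and the small matrices `P` and `Q` are hypothetical examples chosen for this sketch; they are not the matrices from the question.

```python
def add(m1, m2):
    """Add two matrices; return None if their dimensions differ."""
    if len(m1) != len(m2) or len(m1[0]) != len(m2[0]):
        return None
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def multiply(m1, m2):
    """Multiply m1 by m2; return None unless columns of m1 equal rows of m2."""
    if len(m1[0]) != len(m2):
        return None
    return [[sum(m1[i][k] * m2[k][j] for k in range(len(m2)))
             for j in range(len(m2[0]))]
            for i in range(len(m1))]


def transpose(m):
    """Interchange the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]


def inverse_2x2(m):
    """Invert a 2 x 2 matrix; return None if the determinant is zero."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        return None
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical matrices for illustration only
P = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
Q = [[2, 0], [1, 3]]           # 2 x 2

print(multiply(P, Q))      # defined: (3 x 2)(2 x 2) gives a 3 x 2 matrix
print(multiply(Q, P))      # None: Q has 2 columns but P has 3 rows
print(transpose(P))        # 2 x 3 matrix
print(inverse_2x2(Q))      # defined because the determinant is not zero
```

Returning `None` for an undefined operation mirrors the "if possible" wording of the question: the dimension test comes first, and the arithmetic only runs when the test passes.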




218     MHR • Cumulative Review: Chapters 1 to 3
10. A renowned jazz pianist living in Toronto               b) Create a histogram and a cumulative-
   often goes on tours in the United States. For               frequency diagram for the data.
   the tour shown below, which city has the                 c) What proportion of the families surveyed
   most routes                                                 earn an annual income of $60 000 or less?
    a) with exactly one stopover?
                                                        13. Classify the bias in each of the following
    b) with no more than two stopovers?
                                                            situations. Explain your reasoning in each case.
    [Network diagram: routes linking Toronto, Buffalo,      a) At a financial planning seminar, the
    Detroit, Cleveland, New York, Chicago, Pittsburgh,         audience were asked to raise their hands
    Philadelphia, and Washington]                              if they had ever considered declaring
                                                               bankruptcy.
                                                            b) A supervisor asked an employee if he
                                                               would mind working late for a couple
                                                               of hours on Friday evening.
                                                            c) A survey asked neighbourhood dog-
11. The following are responses to a survey that
                                                               owners if dogs should be allowed to run
   asked: “On average, how many hours per                      free in the local park.
   week do you read for pleasure?”
                                                            d) An irascible talk show host listed the
1 3 0 0 7 2 0 1 10 5 2 2 2 0 1 4 0 8 3 1 3                     mayor’s blunders over the last year and
0 0 2 15 4 9 1 6 7 0 3 3 14 5 7 0 1 1 0 10 0                   invited listeners to call in and express
   Use a spreadsheet to                                        their opinions on whether the mayor
                                                               should resign.
    a) sort the data from smallest to largest value
    b) determine the mean hours of pleasure             14. The scores in a recent bowling tournament
        reading                                             are shown in the following table.
    c) organize the data into a frequency table              150 260 213 192 176 204 138 214 298 188
        with appropriate intervals                           168 195 225 170 260 254 195 177 149 224
    d) make a histogram of the information in                260 222 167 182 207 221 185 163 112 189
        part c)
                                                            a) Calculate the mean, median, and mode
12. The annual incomes of 40 families surveyed                 for this distribution. Which measure
   at random are shown in the table.                           would be the most useful? Which would
                                                               be the least useful? Explain your choices.
                     Income ($000)
                                                            b) Determine the standard deviation, first
 28.5    38       61   109      42    56    19
                                                               quartile, third quartile, and interquartile
 27      44.5     81    36      39    51    40.5
 67      28       60    87      58   120   111                 range.
 73      65       34    54      16.5 135    70.5            c) Explain what each of the quantities in part
 59      47       92    38      55    84.5 107                 b) tells you about the distribution of scores.
 71      59       26.5  76      50
                                                            d) What score is the 50th percentile for this
    a) Group these data into 8 to 12 intervals                 distribution?
        and create a frequency table.
e) Is the player who scored 222 above the           a) Create a time-series graph for these data.
         80th percentile? Explain why or why not.        b) Based on this graph, what level of sales
                                                                         would you predict for 2003?
15. The players on a school baseball team
      compared their batting averages and the             c) List three factors that could affect the
      hours they spent at the batting practice.                          accuracy of your prediction.
       Batting Average Practice Hours                    d) Compute an index value for the sales
            0.220               20                                       each year using the 1997 sales as a base.
            0.215               18                                       What information do the index values
            0.185               15                                       provide?
            0.170               14                        e) Suppose that this salesperson is thinking
            0.200               18                                       of changing jobs. Outline how she could
            0.245               22                                       use the sales index to convince other
            0.230               19
                                                                         employers to hire her.
            0.165               15
            0.205               17                    18. The following time-series graph shows the
      a) Identify the independent variable and           Consumer Price Index (CPI) for the period
         dependent variable. Explain your choices.       1971 to 2001.
      b) Produce a scatter plot for the data and            [Graph: Consumer Price Index (CPI) (1992 = 100),
         classify the linear correlation.                   vertical axis marked at 50, 100, and 150, versus
      c) Determine the correlation coefficient and          Year, horizontal axis marked from 1975 to 2000]
         the equation of the line of best fit.
      d) Use this linear model to predict the
         batting average for players who had
         batting practice for
         i) 16 h    ii) 13 h         iii) 35 h
      e) Discuss how accurate you think each of
         these predictions will be.
16. Describe a method you could use to detect
                                                          a) What is the base for this index? When
      outliers in a sample.                                              did the CPI equal half of this base value?
17. A bright, young car salesperson has made the         b) Approximately how many times did the
      following gross sales with her first employer.                      average price of goods double from 1971
        Year   Gross Sales ($ millions)                                  to 1992?
        1997              0.8                             c) Which decade on this graph had the
        1998              1.1                                            highest rate of inflation? Explain your
        1999              1.6                                            answer.
        2000              2.3
                                                         d) Estimate the overall rate of inflation for
        2001              3.5
                                                                         the period from 1971 to 2001.
        2002              4.7
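
Question 17 d) asks for index values relative to a base year. An index expresses each value as a percent of the base value, so the base year itself has an index of 100. The short Python sketch below applies that definition to the gross sales table; the function name `index_values` and its parameters are illustrative assumptions rather than notation from this book.

```python
def index_values(values, base_value, base_index=100.0):
    """Express each value relative to base_value, with the base set to base_index."""
    return [base_index * v / base_value for v in values]


# Data copied from the table in question 17
years = [1997, 1998, 1999, 2000, 2001, 2002]
gross_sales = [0.8, 1.1, 1.6, 2.3, 3.5, 4.7]  # $ millions

# Use the 1997 sales (the first entry) as the base
indices = index_values(gross_sales, base_value=gross_sales[0])
for year, idx in zip(years, indices):
    print(f"{year}: {idx:.1f}")
```

Choosing a different base year only rescales the series; the ratio between any two index values stays the same.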


Probability Project


Designing a Game
Background
Many games introduce elements of chance
with random processes. For example, card
games use shuffled cards, board games often
use dice, and bingo uses randomly selected
numbers.

Your Task
Design and then analyse a game for two or
more players, involving some form of
random process. One of the players may
assume the role of dealer or game master.

Developing an Action Plan
You will need to decide on one or more
instruments of chance, such as dice, cards,
coins, coloured balls, a random-number
generator, a spinner, or a nail maze.
Recommend a method of tracking progress
or keeping score, such as a game board or
tally sheet. Create the rules of the game.
Submit a proposal to your teacher outlining
the concept and purpose of your game.




                                              Probability Project: Introduction • MHR   221
CHAPTER 4
                  Permutations and Organized Counting


                  Specific Expectations                                                         Section

                  Represent complex tasks or issues, using diagrams.                              4.1

                  Solve introductory counting problems involving the additive and            4.1, 4.2, 4.3
                  multiplicative counting principles.

                  Express the answers to permutation and combination problems, using           4.2, 4.3
                  standard combinatorial symbols.

                  Evaluate expressions involving factorial notation, using appropriate         4.2, 4.3
                  methods.

                  Solve problems, using techniques for counting permutations where some           4.3
                  objects may be alike.

                  Identify patterns in Pascal’s triangle and relate the terms of Pascal’s      4.4, 4.5
                  triangle to values of $\binom{n}{r}$, to the expansion of a binomial, and to the
                  solution of related problems.


                  Communicate clearly, coherently, and precisely the solutions to counting   4.1, 4.2, 4.3,
                  problems.                                                                     4.4, 4.5
Chapter Problem
Students’ Council Elections
Most high schools in Ontario have a students’ council comprised of students
from each grade. These students are elected representatives, and a part of their
function is to act as a liaison between the staff and the students. Often, these
students are instrumental in fundraising and in coordinating events, such as
school dances and sports.

A students’ council executive could consist of a president, vice-president,
secretary, treasurer, social convenor, fundraising chair, and four grade
representatives. Suppose ten students have been nominated to fill these
positions. Five of the nominees are from grade 12, three are from grade 11, and
the other two are a grade 9 and a grade 10 student.

 1. In how many ways could the positions of president and vice-president be
    filled by these ten students if all ten are eligible for these positions?
    How many ways are there if only the grades 11 and 12 students are eligible?

 2. The grade representatives must represent their current grade level. In how
    many ways could the grade representative positions be filled?

You could answer both of these questions by systematically listing all the
possibilities and then counting them. In this chapter, you will learn easier and
more powerful techniques that can also be applied to much more complex
situations.
Review of Prerequisite Skills

If you need help with any of the skills listed in purple below, refer to Appendix A.

 1. Tree diagrams Draw a tree diagram to illustrate the number of ways a
    quarter, a dime, and a nickel can come up heads or tails if you toss one
    after the other.

 2. Tree diagrams
    a) Draw a tree diagram to illustrate the possible outcomes of tossing a
       coin and rolling a six-sided die.
    b) How many possible outcomes are there?

 3. Number patterns The manager of a grocery store asks a stock clerk to
    arrange a display of canned vegetables in a triangular pyramid like the one
    shown. Assume all cans are the same size and shape.
    a) How many cans is the tallest complete pyramid that the clerk can make
       with 100 cans of vegetables?
    b) How many cans make up the base level of the pyramid in part a)?
    c) How many cans are in the full pyramid in part a)?
    d) What is the sequence of the numbers of cans in the levels of the
       pyramid?

 4. Number patterns What is the greatest possible number of rectangles that
    can be drawn on a
    a) 1 by 5 grid?      b) 2 by 5 grid?      c) 3 by 5 grid?      d) 4 by 5 grid?

 5. Evaluating expressions Evaluate each expression given x = 5, y = 4, and
    z = 3.
    a) $\frac{8y(x + 2)(y + 2)(z + 2)}{(x - 3)(y + 3)(z + 2)}$
    b) $\frac{(x - 2)^3 (y + 2)^2 (z + 1)^2}{y(x + 1)^2 (y - 1)}$
    c) $\frac{(x + 4)(y - 2)(z + 3)}{(y - 1)(x - 3)z} + \frac{(x - 1)^2 (z + 1)y}{(x - 3)^4 (y + 4)}$

 6. Order of operations Evaluate.
    a) $5(4) + (-1)^3 (3)^2$
    b) $\frac{(10 - 2)^2 (10 - 3)^2}{(10 - 2)^2 - (10 - 3)^2}$
    c) $\frac{6(6 - 1)(6 - 2)(6 - 3)(6 - 4)(6 - 5)}{3(3 - 1)(3 - 2)}$
    d) $\frac{50(50 - 1)(50 - 2) \cdots (50 - 49)}{48(48 - 1)(48 - 2) \cdots (48 - 47)}$
    e) $\frac{12 \times 11 \times 10 \times 9}{6^2} + \frac{10 \times 9 \times 8 \times 7}{2^4} - \frac{8 \times 7 \times 6 \times 5}{42}$

 7. Simplifying expressions Simplify.
    a) $\frac{x^2 - xy + 2x}{2x}$
    b) $\frac{(4x + 8)^2}{16}$
    c) $\frac{14(3x^2 + 6)}{7 \times 6}$
    d) $\frac{x(x - 1)(x - 2)(x - 3)}{x^2 - 2x}$
    e) $\frac{2y + 1}{x} + \frac{16y + 4}{4x}$
224     MHR • Permutations and Organized Counting
4.1         Organized Counting

  The techniques and mathematical logic for counting possible arrangements or
  outcomes are useful for a wide variety of applications. A computer programmer
  writing software for a game or industrial process would use such techniques, as
  would a coach planning a starting line-up, a conference manager arranging
  a schedule of seminars, or a school board trying to make the most efficient use
  of its buses.

  Combinatorics is the branch of mathematics dealing with ideas and methods
  for counting, especially in complex situations. These techniques are also
  valuable for probability calculations, as you will learn in Chapter 6.


      INVESTIGATE & INQUIRE: Licence Plates

      Until 1997, most licence plates for passenger cars in Ontario had three
      numbers followed by three letters. Suppose the provincial government
      had wanted all the vehicles
      registered in Ontario to have plates
      with the letters O, N, and T.

       1. Draw a diagram to illustrate all
         the possibilities for arranging
         these three letters assuming
         that the letters can be repeated.
         How many possibilities are
         there?
       2. How could you calculate the
         number of possible three-letter groups
         without listing them all?
       3. Predict how many three-letter groups
         the letters O, N, T, and G can
         form.
       4. How many three-letter groups
         do you think there would be if
         you had a choice of five letters?
       5. Suggest a general strategy for
         counting all the different
         possibilities in situations like
         those above.



                                                                         4.1 Organized Counting • MHR   225
When you have to make a series of choices, you can usually determine the
total number of possibilities without actually counting each one individually.


Example 1 Travel Itineraries

Martin lives in Kingston and is planning a trip to Vienna, Austria. He checks
a web site offering inexpensive airfares and finds that if he travels through
London, England, the fare is much lower. There are three flights available
from Toronto to London and two flights from London to Vienna. If Martin
can take a bus, plane, or train from Kingston to Toronto, how many ways can
he travel from Kingston to Vienna?

Solution
You can use a tree diagram to illustrate and count Martin’s choices.

Martin's Choices

Bus        Flight A      Flight 1
                         Flight 2
           Flight B      Flight 1
                         Flight 2
           Flight C      Flight 1
                         Flight 2
Train      Flight A      Flight 1
                         Flight 2
           Flight B      Flight 1
                         Flight 2
           Flight C      Flight 1
                         Flight 2
Plane      Flight A      Flight 1
                         Flight 2
           Flight B      Flight 1
                         Flight 2
           Flight C      Flight 1
                         Flight 2

This diagram suggests another way to determine the number of options Martin
has for his trip.

Choices for the first portion of trip:      3
Choices for the second portion of trip:     3
Choices for the third portion of trip:      2
Total number of choices:                    3 × 3 × 2 = 18

In all, Martin has 18 ways to travel from Kingston to Vienna.
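
The branches of the tree diagram in Example 1 can also be listed by a short program. The sketch below is only an illustration, assuming Python and its standard itertools module; the list names are made up for this example and do not come from the text.

```python
from itertools import product

# Stage options taken from Example 1; the variable names are illustrative.
to_toronto = ["bus", "plane", "train"]             # Kingston to Toronto
to_london = ["Flight A", "Flight B", "Flight C"]   # Toronto to London
to_vienna = ["Flight 1", "Flight 2"]               # London to Vienna

# product() yields one tuple per complete branch of the tree diagram.
itineraries = list(product(to_toronto, to_london, to_vienna))

print(len(itineraries))   # 18
print(itineraries[0])     # ('bus', 'Flight A', 'Flight 1')
```

Listing every branch is practical here only because the numbers are small; the count itself comes from multiplying the number of options at each stage.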
Example 2 Stereo Systems

Javon is looking at stereos in an electronics store. The store has five types of
receivers, four types of CD players, and five types of speakers. How many
different choices of stereo systems does this store offer?

Solution

For each choice of receiver, Javon could choose any one of the CD players.
Thus, there are 5 × 4 = 20 possible combinations of receivers and CD players.
For each of these combinations, Javon could then choose one of the five kinds
of speakers.

The store offers a total of 5 × 4 × 5 = 100 different stereo systems.




These types of counting problems illustrate the fundamental or multiplicative
counting principle:

If a task or process is made up of stages with separate choices, the total
number of choices is m × n × p × …, where m is the number of choices
for the first stage, n is the number of choices for the second stage, p is the
number of choices for the third stage, and so on.
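
As a rough sketch, the principle can be written as a one-line calculation. The function name count_choices is hypothetical, chosen for this illustration, and the code assumes Python 3.8 or later for math.prod.

```python
from math import prod

def count_choices(*stage_sizes):
    """Multiply the number of choices at each stage: m x n x p x ..."""
    return prod(stage_sizes)

print(count_choices(3, 3, 2))   # Example 1: 18 itineraries
print(count_choices(5, 4, 5))   # Example 2: 100 stereo systems
```

The function only applies when each stage offers the same number of choices no matter what was chosen at the earlier stages.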


Example 3 Applying the Fundamental Counting Principle

A school band often performs at benefits and other functions outside the
school, so its members are looking into buying band uniforms. The band
committee is considering four different white shirts, dress pants in grey, navy,
or black, and black or grey vests with the school crest. How many different
designs for the band uniform is the committee considering?

Solution

First stage: choices for the white shirts, m = 4
Second stage: choices for the dress pants, n = 3
Third stage: choices for the vests, p = 2
The total number of possibilities is
m × n × p = 4 × 3 × 2
          = 24
The band committee is considering 24 different possible uniforms.

Project Prep
You can use the fundamental or multiplicative counting principle to help design
the game for your probability project.


In some situations, an indirect method makes a calculation easier.

Example 4 Indirect Method

Leora, a triathlete, has four pairs of running shoes loose in her gym bag.
In how many ways can she pull out two unmatched shoes one after the other?

Solution

You can find the number of ways of picking unmatched shoes by subtracting the
number of ways of picking matching ones from the total number of ways of
picking any two shoes.

There are eight possibilities when Leora pulls out the first shoe, but only seven
when she pulls out the second shoe. By the fundamental counting principle, the
number of ways Leora can pick any two shoes out of the bag is 8 × 7 = 56. She
could pick each of the matched pairs in two ways: left shoe then right shoe or right
shoe then left shoe. Thus, there are 4 × 2 = 8 ways of picking a matched pair.

Leora can pull out two unmatched shoes in 56 − 8 = 48 ways.
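
One way to check the indirect method in Example 4 is to list every ordered draw and sort the draws into matched and unmatched. This is a minimal sketch, assuming Python; labelling each shoe by a pair number and a side is an assumption made for the illustration.

```python
from itertools import permutations

# Label each shoe by (pair number, side); four pairs give eight shoes.
shoes = [(pair, side) for pair in range(4) for side in ("left", "right")]

# Ordered draws of two different shoes: 8 x 7 = 56.
draws = list(permutations(shoes, 2))

# A draw is matched when both shoes carry the same pair number.
matched = [d for d in draws if d[0][0] == d[1][0]]
unmatched = [d for d in draws if d[0][0] != d[1][0]]

print(len(draws), len(matched), len(unmatched))   # 56 8 48
```

The direct count of unmatched draws agrees with the subtraction 56 − 8 = 48.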

Sometimes you will have to count several subsets of possibilities separately.

Example 5 Signal Flags

Sailing ships used to send messages with signal flags flown from their masts.
How many different signals are possible with a set of four distinct flags if a
minimum of two flags is used for each signal?

Solution

A ship could fly two, three, or four signal flags.

Signals with two flags:       4 × 3 = 12
Signals with three flags:     4 × 3 × 2 = 24
Signals with four flags:      4 × 3 × 2 × 1 = 24
Total number of signals:     12 + 24 + 24 = 60

Thus, the total number of signals possible with these flags is 60.
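
The separate counts in Example 5 can be reproduced by listing the ordered arrangements for each signal length and then adding the results. The sketch below assumes Python, and the flag labels A to D are placeholders rather than real signal flags.

```python
from itertools import permutations

flags = ["A", "B", "C", "D"]   # four distinct flags; labels are placeholders

# Count the ordered arrangements for each allowed signal length.
counts = {k: len(list(permutations(flags, k))) for k in (2, 3, 4)}

# The three cases cannot occur together, so the counts are added.
total = sum(counts.values())

print(counts)   # {2: 12, 3: 24, 4: 24}
print(total)    # 60
```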


In Example 5, you were counting actions that could not occur at the same time.
When counting such mutually exclusive actions, you can apply the additive
counting principle or rule of sum:
If one mutually exclusive action can occur in m ways, a second in n ways,
a third in p ways, and so on, then there are m + n + p + … ways in which
one of these actions can occur.



   Key Concepts

   • Tree diagrams are a useful tool for organized counting.

   • If you can choose from m items of one type and n items of another, there are
     m × n ways to choose one item of each type (fundamental or multiplicative
     counting principle).

   • If you can choose from either m items of one type or n items of another type,
     then the total number of ways you can choose an item is m + n (additive
     counting principle).

   • Both the multiplicative and the additive counting principles also apply to
     choices of three or more types of items.

   • Sometimes an indirect method provides an easier way to solve a problem.




Communicate Your Understanding

    1. Explain the fundamental counting principle in your own words and give
       an example of how you could apply it.
    2. Are there situations where the fundamental counting principle does not
       apply? If so, give one example.
    3. Can you always use a tree diagram for organized counting? Explain your
       reasoning.



Practise
A
 1. Construct a tree diagram to illustrate the possible contents of a sandwich
    made from white or brown bread, ham, chicken, or beef, and mustard or
    mayonnaise. How many different sandwiches are possible?

 2. In how many ways can you roll either a sum of 4 or a sum of 11 with a pair
    of dice?

 3. In how many ways can you draw a 6 or a face card from a deck of 52 playing
    cards?

 4. How many ways are there to draw a 10 or a queen from the 24 cards in a
    euchre deck, which has four 10s and four queens?

 5. Use tree diagrams to answer the following:
    a) How many different soccer uniforms are possible if there is a choice of
       two types of shirts, three types of shorts, and two types of socks?
    b) How many different three-scoop cones can be made from vanilla,
       chocolate, and strawberry ice cream?
    c) Suppose that a college program has six elective courses, three on
       English literature and three on the other arts. If the college requires
       students to take one of the English courses and one of the other arts
       courses, how many pairs of courses will satisfy these requirements?

Apply, Solve, Communicate
 6. Ten different books and four different pens are sitting on a table. One of
    each is selected. Should you use the rule of sum or the product rule to
    count the number of possible selections? Explain your reasoning.

B
 7. Application A grade 9 student may build a timetable by selecting one course
    for each period, with no duplication of courses. Period 1 must be science,
    geography, or physical education. Period 2 must be art, music, French, or
    business. Periods 3 and 4 must each be mathematics or English.
    a) Construct a tree diagram to illustrate the choices for a student’s
       timetable.
    b) How many different timetables could a student choose?

 8. A standard die is rolled five times. How many different outcomes are
    possible?

 9. A car manufacturer offers three kinds of upholstery material in five
    different colours for this year’s model. How many upholstery options would
    a buyer have? Explain your reasoning.

10. Communication In how many ways can a student answer a true-false test that
    has six questions? Explain your reasoning.



11. The final score of a soccer game is 6 to 3. How many different scores were
    possible at half-time?

12. A large room has a bank of five windows. Each window is either open or
    closed. How many different arrangements of open and closed windows are
    there?

13. Application A Canadian postal code uses six characters. The first, third,
    and fifth are letters, while the second, fourth, and sixth are digits.
    A U.S.A. zip code contains five characters, all digits.
    a) How many codes are possible for each country?
    b) How many more possible codes does the one country have than the other?

14. When three-digit area codes were introduced in 1947, the first digit had to
    be a number from 2 to 9 and the middle digit had to be either 1 or 0. How
    many area codes were possible under this system?

15. Asha builds new homes and offers her customers a choice of brick, aluminium
    siding, or wood for the exterior, cedar or asphalt shingles for the roof,
    and radiators or forced-air for the heating system. How many different
    configurations is Asha offering?

16. a) In how many ways could you choose two fives, one after the other, from a
       deck of cards?
    b) In how many ways could you choose a red five and a spade, one after the
       other?
    c) In how many ways could you choose a red five or a spade?
    d) In how many ways could you choose a red five or a heart?
    e) Explain which counting principles you could apply in parts a) to d).

17. Chapter Problem Ten students have been nominated for a students’ council
    executive. Five of the nominees are from grade 12, three are from grade 11,
    and the other two are from grades 9 and 10.
    a) In how many ways could the nominees fill the positions of president and
       vice-president if all ten are eligible for these senior positions?
    b) How many ways are there to fill these positions if only grade 11 and
       grade 12 students are eligible?

18. Communication
    a) How many different licence plates could be made using three numbers
       followed by three letters?
    b) In 1997, Ontario began issuing licence plates with four letters followed
       by three numbers. How many different plates are possible with this new
       system?
    c) Research the licence plate formats used in the other provinces. Compare
       and contrast these formats briefly and suggest reasons for any
       differences between the formats.

19. In how many ways can you arrange the letters of the word think so that the
    t and the h are separated by at least one other letter?

20. Application Before the invention of the telephone, Samuel Morse (1791−1872)
    developed an efficient system for sending messages as a series of dots and
    dashes (short or long pulses). International code, a modified version of
    Morse code, is still widely used.
    a) How many different characters can the international code represent with
       one to four pulses?
    b) How many pulses would be necessary to represent the 72 letters of the
       Cambodian alphabet using a system like Morse code?

ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

21. Ten finalists are competing in a race at the Canada Games.
    a) In how many different orders can the competitors finish the race?
    b) How many ways could the gold, silver, and bronze medals be awarded?
    c) One of the finalists is a friend from your home town. How many of the
       possible finishes would include your friend winning a medal?
    d) How many possible finishes would leave your friend out of the medal
       standings?
    e) Suppose one of the competitors is injured and cannot finish the race.
       How does that affect your previous answers?
    f) How would the competitor’s injury affect your friend’s chances of
       winning a medal? Explain your reasoning. What assumptions have you made?

C
22. A locksmith has ten types of blanks for keys. Each blank has five different
    cutting positions and three different cutting depths at each position,
    except the first position, which only has two depths. How many different
    keys are possible with these blanks?

23. Communication How many 5-digit numbers are there that include the digit 5
    and exclude the digit 8? Explain your solution.

24. Inquiry/Problem Solving Your school is purchasing a new type of combination
    lock for the student lockers. These locks have 40 positions on their dials
    and use a three-number combination.
    a) How many combinations are possible if consecutive numbers cannot be the
       same?
    b) Are there any assumptions that you have made? Explain.
    c) Assuming that the first number must be dialled clockwise from 0, how
       many different combinations are possible?
    d) Suppose the first number can also be dialled counterclockwise from 0.
       Explain the effect this change has on the number of possible
       combinations.
    e) If you need four numbers to open the lock, how many different
       combinations are possible?

25. Inquiry/Problem Solving In chess, a knight can move either two squares
    horizontally plus one vertically or two squares vertically plus one
    horizontally.
    a) If a knight starts from one corner of a standard 8 × 8 chessboard, how
       many different squares could it reach after
       i)   one move?
       ii)  two moves?
       iii) three moves?
    b) Could you use the fundamental counting principle to calculate the
       answers for part a)? Why or why not?




4.2           Factorials and Permutations

        In many situations, you need to determine the number of different orders
        in which you can choose or place a set of items.


        INVESTIGATE & INQUIRE: Numbers of Arrangements

        Consider how many different ways a president and a vice-president could
        be chosen from eight members of a students’ council.

         1. a) Have one person in your class make two signs, writing President
               on one and Vice-President on the other. Now, choose two people
               to stand at the front of the class. Using the signs to indicate which
               person holds each position, decide in how many ways you can
               choose a president and a vice-president from the two people at
               the front of the class.
            b) Choose three students to be at the front of the class. Again using
               the signs to indicate who holds each position, determine how many
               ways you can choose a president and a vice-president from the three
               people at the front of the class.
            c) Repeat the process with four students. Do you see a pattern in the
               number of ways a president and a vice-president can be chosen from
               the different sizes of groups? If so, what is the pattern? If not,
               continue the process with five students and then with six students.
            d) When you see a pattern, predict how many ways a president and
               a vice-president can be chosen from the eight members of the
               students’ council.
            e) Suggest other ways of simulating the selection of a president and
               a vice-president for the students’ council.




    2. Suppose that each of the eight members of the students’ council has to
       give a brief speech at an assembly. Consider how you could determine
       the number of different orders in which they could speak.
        a) Choose two students from your class and list all the possible orders
           in which they could speak.
        b) Choose three students and list all the possible orders in which they
           could speak.
        c) Repeat this process with four students.
        d) Is there an easy method to organize the list so that you could
           include all the possibilities?
        e) Is this method related to your results in question 1? Explain.
        f) Can you use your method to predict the number of different
           orders in which eight students could give speeches?


Many counting and probability calculations involve the product of a series of
consecutive integers. You can use factorial notation to write such expressions
more easily. For any natural number n,

n! = n × (n − 1) × (n − 2) × (n − 3) × … × 3 × 2 × 1

This expression is read as n factorial.


Example 1 Evaluating Factorials

Calculate each factorial.
a)  2!          b) 4!             c)      8!

Solution
a) 2! = 2 × 1
      =2

b) 4! = 4 × 3 × 2 × 1
      = 24

c) 8! = 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1
      = 40 320
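The definition above can also be sketched as a short loop. This is only an illustrative Python version, not part of the original text; the function name `factorial` is a hypothetical helper chosen for this example.

```python
def factorial(n):
    """Return n! for a natural number n by repeated multiplication."""
    if n < 1:
        raise ValueError("this sketch handles natural numbers only")
    product = 1
    for k in range(n, 0, -1):  # multiply by n, n - 1, ..., 2, 1
        product *= k
    return product


# The three parts of Example 1
for n in (2, 4, 8):
    print(n, factorial(n))
```

The loop multiplies the same factors, in the same order, as the written expansions in Example 1.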


As you can see from Example 1, n! increases dramatically as n becomes larger.
However, calculators and computer software provide an easy means of
calculating the larger factorials. Most scientific and graphing calculators have
a factorial key or function.


Example 2 Using Technology to Evaluate Factorials

Calculate.
a)  21!           b)   53!           c)   70!

Solution 1     Using a Graphing Calculator

Enter the number on the home screen and then use the ! function on the
MATH PRB menu to calculate the factorial.
a) 21! = 21 × 20 × 19 × 18 × … × 2 × 1
       = 5.1091 × 10^19

b) 53! = 53 × 52 × 51 × … × 3 × 2 × 1
       = 4.2749 × 10^69

c) Entering 70! on a graphing calculator gives an ERR:OVERFLOW message, since
   70! > 10^100, which is the largest number the calculator can handle. In fact, 69! is
   the largest factorial you can calculate directly on TI-83 series calculators.
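Software that stores integers exactly does not run into this overflow. As a rough illustration (an addition to the text, assuming Python 3 and its standard `math.factorial` function), the sketch below compares a few factorials with the 10^100 limit quoted above; the name `LIMIT` is introduced only for this example.

```python
import math

LIMIT = 10**100  # the calculator limit quoted in part c)

for n in (21, 53, 69, 70):
    value = math.factorial(n)  # exact integer, no overflow
    status = "below the limit" if value < LIMIT else "above the limit"
    # .4e shows four decimal places in scientific notation
    print(f"{n}! = {value:.4e} ({status})")
```

The comparison is consistent with part c): 69! is below 10^100 and 70! is above it.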

Solution 2     Using a Spreadsheet

Both Corel® Quattro® Pro and Microsoft® Excel have a built-in factorial
function with the syntax FACT(n).




Example 3 Evaluating Factorial Expressions

Evaluate.
a) 10!/5!                b) 83!/79!

Solution

In both these expressions, you can divide out the common terms in the
numerator and denominator.
a) 10!/5! = (10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1)/(5 × 4 × 3 × 2 × 1)
          = 10 × 9 × 8 × 7 × 6
          = 30 240

b) 83!/79! = (83 × 82 × 81 × 80 × 79 × 78 × … × 2 × 1)/(79 × 78 × … × 2 × 1)
           = 83 × 82 × 81 × 80
           = 44 102 880

Note that by dividing out the common terms, you can use a calculator to evaluate
this expression even though the factorials are too large for the calculator.
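The same idea of dividing out common terms can be sketched in code: multiply only the factors that survive the cancellation. The helper name `factorial_quotient` is hypothetical, and the sketch assumes Python 3.8 or later for `math.prod`.

```python
import math


def factorial_quotient(n, m):
    """Return n!/m! for n >= m >= 0 by multiplying only m + 1, ..., n."""
    if not 0 <= m <= n:
        raise ValueError("expected 0 <= m <= n")
    return math.prod(range(m + 1, n + 1))


print(factorial_quotient(10, 5))   # part a) of Example 3
print(factorial_quotient(83, 79))  # part b) of Example 3
```

Neither 83! nor 79! is ever computed here, which mirrors the pencil-and-paper shortcut.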

Example 4 Counting Possibilities
The senior choir has rehearsed five songs for an upcoming assembly.
In how many different orders can the choir perform the songs?

Solution
There are five ways to choose the first song, four ways to choose the second,
three ways to choose the third, two ways to choose the fourth, and only one way
to choose the final song. Using the fundamental counting principle, the total
number of different ways is
5 × 4 × 3 × 2 × 1 = 5!
                  = 120
The choir can sing the five songs in 120 different orders.
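One way to check a count like this is to list every order and count the list. The sketch below is an illustration only; the single-letter song titles are placeholders, since the text does not name the songs.

```python
import math
from itertools import permutations

songs = ["A", "B", "C", "D", "E"]  # placeholder titles for the five songs

orders = list(permutations(songs))  # every possible running order
print(len(orders))                  # number of orders found by listing
print(math.factorial(len(songs)))   # number of orders predicted by 5!
```

Listing is practical only for small sets, which is why the factorial formula is the preferred tool.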

Example 5      Indirect Method
In how many ways could ten questions on a test be arranged, if the easiest
question and the most difficult question
a) are side-by-side?
b)   are not side-by-side?




Solution

a) Treat the easiest question and the most difficult question as a unit making
      nine items that are to be arranged. The two questions can be arranged in 2!
      ways within their unit.
      9! × 2! = 725 760
      The questions can be arranged in 725 760 ways if the easiest question
      and the most difficult question are side-by-side.

b) Use the indirect method. The number of arrangements with the easiest and
      most difficult questions separated is equal to the total number of possible
      arrangements less the number with the two questions side-by-side:
      10! − 9! × 2! = 3 628 800 − 725 760
                    = 2 903 040
      The questions can be arranged in 2 903 040 ways if the easiest question
      and the most difficult question are not side-by-side.
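The indirect method can be checked by brute force on a smaller case. The sketch below is a hedged illustration: it uses six items instead of ten (an assumption made only so the enumeration runs quickly), labels the easiest question 0 and the most difficult question n − 1, and uses the hypothetical helper name `count_side_by_side`.

```python
from itertools import permutations
from math import factorial


def count_side_by_side(n):
    """Count orderings of n items in which items 0 and n - 1 are adjacent."""
    count = 0
    for order in permutations(range(n)):
        if abs(order.index(0) - order.index(n - 1)) == 1:
            count += 1
    return count


n = 6  # small stand-in for the ten questions in Example 5
together = count_side_by_side(n)
apart = factorial(n) - together  # indirect method: total minus side-by-side

print(together, factorial(n - 1) * factorial(2))
print(apart, factorial(n) - factorial(n - 1) * factorial(2))
```

For each line of output, the enumerated count and the formula value should agree, which supports using (n − 1)! × 2! and n! − (n − 1)! × 2! for n = 10 as well.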

A permutation of n distinct items is an arrangement of all the items in a definite
order. The total number of such permutations is denoted by nPn or P(n, n).

There are n possible ways of choosing the first item, n − 1 ways of choosing the
second, n − 2 ways of choosing the third, and so on. Applying the fundamental
counting principle as in Example 4 gives

nPn = n × (n − 1) × (n − 2) × (n − 3) × … × 3 × 2 × 1
    = n!


Example 6 Applying the Permutation Formula
In how many different orders can eight nominees for the students’ council give
their speeches at an assembly?

Solution
8P8 = 8!
    = 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1
    = 40 320
There are 40 320 different orders in which the eight nominees can give their speeches.

Example 7 Student Government
In how many ways could a president and a vice-president be chosen from a group
of eight nominees?

Solution
Using the fundamental counting principle, there are 8 × 7, or 56, ways to choose
a president and a vice-president.

A permutation of n distinct items taken r at a time is an arrangement of r
of the n items in a definite order. Such permutations are sometimes called
r-arrangements of n items. The total number of possible arrangements of
r items out of a set of n is denoted by nPr or P(n, r).

There are n ways of choosing the first item, n − 1 ways of choosing the second
item, and so on down to n − r + 1 ways of choosing the rth item. Using the
fundamental counting principle,

nPr = n(n − 1)(n − 2)…(n − r + 1)

It is often more convenient to rewrite this expression in terms of factorials.

nPr = n!/(n − r)!

The denominator divides out completely, as in Example 3, so these two ways
of writing nPr are equivalent.

Project Prep
The permutations formula could be a useful tool for your probability project.
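The two ways of writing nPr can be compared in a short sketch. This is an illustration rather than a definitive implementation; `n_p_r` is a hypothetical helper name, and `math.perm` assumes Python 3.8 or later.

```python
import math


def n_p_r(n, r):
    """Number of arrangements of r items taken from n distinct items."""
    if not 0 <= r <= n:
        raise ValueError("expected 0 <= r <= n")
    result = 1
    for k in range(n, n - r, -1):  # n, n - 1, ..., n - r + 1
        result *= k
    return result


# Product form, factorial form, and the built-in function side by side
for n, r in ((8, 2), (8, 8)):
    factorial_form = math.factorial(n) // math.factorial(n - r)
    print(n, r, n_p_r(n, r), factorial_form, math.perm(n, r))
```

The pairs (8, 2) and (8, 8) correspond to Examples 7 and 6; all three columns should match for each pair.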


Example 8 Applying the Permutation Formula

In a card game, each player is dealt a face down “reserve” of 13 cards that
can be turned up and used one by one during the game. How many different
sequences of reserve cards could a player have?

Solution 1    Using Pencil and Paper

Here, you are taking 13 cards from a deck of 52.
52P13 = 52!/(52 − 13)!
      = 52!/39!
      = 52 × 51 × 50 × … × 41 × 40
      = 3.9542 × 10^21
There are approximately 3.95 × 10^21 different sequences of reserve cards a
player could have.

Solution 2    Using a Graphing Calculator
Use the nPr function on the MATH PRB menu.

There are approximately 3.95 × 10^21 different
sequences of reserve cards a player could turn
up during one game.



Solution 3     Using a Spreadsheet

Both Corel® Quattro® Pro and Microsoft® Excel have a permutations function
with the syntax PERMUT(n,r).




There are approximately 3.95 × 10^21 different sequences of reserve cards a
player could turn up during one game.




   Key Concepts

   • A factorial indicates the multiplication of consecutive natural numbers.
     n! = n(n − 1)(n − 2) × … × 1.
   • The number of permutations of n distinct items chosen n at a time in a
     definite order is nPn = n!

   • The number of permutations of r items taken from n distinct items is
     nPr = n!/(n − r)!.


   Communicate Your Understanding

      1. Explain why it is convenient to write the expression for the number of
        possible permutations in terms of factorials.

      2. a) Is (−3)! possible? Explain your answer.
        b) In how many ways can you order an empty list, or zero items? What does
           this tell you about the value of 0!? Check your answer using a calculator.



Practise

A

1. Express in factorial notation.
    a) 6 × 5 × 4 × 3 × 2 × 1
    b) 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1
    c) 3 × 2 × 1
    d) 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1

2. Evaluate.
    a) 7!/4!             b) 11!/9!
    c) 8!/(5! 2!)        d) 15!/(3! 8!)
    e) 85!/82!           f) 14!/(4! 5!)

3. Express in the form nPr.
    a) 6 × 5 × 4
    b) 9 × 8 × 7 × 6
    c) 20 × 19 × 18 × 17
    d) 101 × 100 × 99 × 98 × 97
    e) 76 × 75 × 74 × 73 × 72 × 71 × 70

4. Evaluate without using technology.
    a) P(10, 4)        b) P(16, 4)     c) 5P2
    d) 9P4             e) 7!

5. Use either a spreadsheet or a graphing or
    scientific calculator to verify your answers
    to question 4.

Apply, Solve, Communicate

6. a) How many ways can you arrange the
       letters in the word factor?
    b) How many ways can Ismail arrange
       four different textbooks on the shelf in
       his locker?
    c) How many ways can Laura colour
       4 adjacent regions on a map if she has
       a set of 12 coloured pencils?

B

7. Simplify each of the following in factorial
    form. Do not evaluate.
    a) 12 × 11 × 10 × 9!
    b) 72 × 7!
    c) (n + 4)(n + 5)(n + 3)!

8. Communication Explain how a factorial is an
    iterative process.

9. Seven children are to line up for a photograph.
    a) How many different arrangements are
       possible?
    b) How many arrangements are possible if
       Brenda is in the middle?
    c) How many arrangements are possible if
       Ahmed is on the far left and Yen is on
       the far right?
    d) How many arrangements are possible if
       Hanh and Brian must be together?

10. A 12-volume encyclopedia is to be placed on
     a shelf. How many incorrect arrangements
     are there?

11. In how many ways can the 12 members of
     a volleyball team line up, if the captain and
     assistant captain must remain together?

12. Ten people are to be seated at a rectangular
     table for dinner. Tanya will sit at the head of
     the table. Henry must not sit beside either
     Wilson or Nancy. In how many ways can the
     people be seated for dinner?

13. Application Joanne prefers classical and
     pop music. If her friend Charlene has five
     classical CDs, four country and western
     CDs, and seven pop CDs, in how many
     orders can Joanne and Charlene play the
     CDs Joanne likes?

14. In how many ways can the valedictorian,
     class poet, and presenter of the class gift
     be chosen from a class of 20 students?

15. Application If you have a standard deck of
     52 cards, in how many different ways can
     you deal out
     a) 5 cards?                b) 10 cards?
     c) 5 red cards?            d) 4 queens?

16. Inquiry/Problem Solving Suppose you are
     designing a coding system for data relayed
     by a satellite. To make transmission errors
     easier to detect, each code must have no
     repeated digits.
     a) If you need 60 000 different codes, how
        many digits long should each code be?
     b) How many ten-digit codes can you
        create if the first three digits must be 1,
        3, or 6?

17. Arnold Schoenberg (1874−1951) pioneered
     serialism, a technique for composing music
     based on a tone row, a sequence in which
     each of the 12 tones in an octave is played
     only once. How many tone rows are possible?

18. Chapter Problem Consider the students’ council
     described on page 223 at the beginning of this chapter.
     a) In how many ways can the secretary,
        treasurer, social convenor, and
        fundraising chair be elected if all ten
        nominees are eligible for any of these
        positions?
     b) In how many ways can the council be
        chosen if the president and vice-
        president must be grade 12 students and
        the grade representatives must represent
        their current grade level?

19. Inquiry/Problem Solving A student has
     volunteered to photograph the school’s
     championship basketball team for the
     yearbook. In order to get the perfect
     picture, the student plans to photograph the
     ten players and their coach lined up in every
     possible order. Determine whether this plan
     is practical.

ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

20. Wayne has a briefcase with a three-digit
     combination lock. He can set the
     combination himself, and his favourite
     digits are 3, 4, 5, 6, and 7. Each digit can
     be used at most once.
     a) How many permutations of three of
        these five digits are there?
     b) If you think of each permutation as a
        three-digit number, how many of these
        numbers would be odd numbers?
     c) How many of the three-digit numbers
        are even numbers and begin with a 4?
     d) How many of the three-digit numbers are
        even numbers and do not begin with a 4?
     e) Is there a connection among the four
        answers above? If so, state what it is and
        why it occurs.

C

21. TI-83 series calculators use the definition
     (−1/2)! = √π. Research the origin of this
     definition and explain why it is useful for
     mathematical calculations.

22. Communication How many different ways
     can six people be seated at a round table?
     Explain your reasoning.

23. What is the highest power of 2 that divides
     evenly into 100!?

24. A committee of three teachers is to select
     the winner from among ten students
     nominated for a special award. The teachers
     each make a list of their top three choices in
     order. The lists have only one name in
     common, and that name has a different rank
     on each list. In how many ways could the
     teachers have made their lists?

4.3 Permutations With Some Identical Items

  Often, you will deal with permutations in which some items are identical.

      INVESTIGATE & INQUIRE: What Is in a Name?

      1. In their mathematics class, John and Jenn calculate the number of
         permutations of all the letters of their first names.
         a) How many permutations do you think John finds?
         b) List all the permutations of John’s name.
         c) How many permutations do you think Jenn finds?
         d) List all the permutations of Jenn’s name.
         e) Why do you think there are different numbers of permutations for
            the two names?
      2. a) List all the permutations of the letters in your first name. Is the
            number of permutations different from what you would calculate
            using the nPn = n! formula? If so, explain why.
         b) List and count all the permutations of a word that has two identical
            pairs of letters. Compare your results with those your classmates
            found with other words. What effect do the identical letters have
            on the number of different permutations?
         c) Predict how many permutations you could make with the letters in the
            word googol. Work with several classmates to verify your prediction by
            writing out and counting all of the possible permutations.
      3. Suggest a general formula for the number of permutations of a word that
         has two or more identical letters.


  As the investigation above suggests, you can develop a general formula for
  permutations in which some items are identical.



Example 1 Permutations With Some Identical Elements

Compare the different permutations for the words DOLE, DOLL, and LOLL.

Solution

The following are all the permutations of DOLE:
DOLE        DOEL           DLOE         DLEO          DEOL          DELO
ODLE        ODEL           OLDE         OLED          OEDL          OELD
LODE        LOED           LDOE         LDEO          LEOD          LEDO
EOLD        EODL           ELOD         ELDO          EDOL          EDLO

There are 24 permutations of the four letters in DOLE. This number matches
what you would calculate using 4P4 = 4!

To keep track of the permutations of the letters in the word DOLL, use a
subscript to distinguish the one L from the other.
DOLL1         DOL1L         DLOL1        DLL1O         DL1OL      DL1LO
ODLL1         ODL1L         OLDL1        OLL1D         OL1DL      OL1LD
LODL1         LOL1D         LDOL1        LDL1O         LL1OD      LL1DO
L1OLD         L1ODL         L1LOD        L1LDO         L1DOL      L1DLO

Of the 24 arrangements listed here, only 12 are actually different from each
other. Since the two Ls are in fact identical, each permutation in the list
duplicates another one: if the two Ls in a permutation trade places (as in
DOLL1 and DOL1L), the resulting permutation is the same as the original
one. The two Ls can trade places in 2P2 = 2! ways.

Thus, the number of different arrangements is
4!/2! = 24/2
      = 12

In other words, to find the number of permutations, you divide the total number
of arrangements by the number of ways in which you can arrange the identical
letters. For the letters in DOLL, there are four ways to choose the first letter,
three ways to choose the second, two ways to choose the third, and one way to
choose the fourth. You then divide by the 2! or 2 ways that you can arrange the
two Ls.

Similarly, you can use subscripts to distinguish the three Ls in LOLL, and then
highlight the duplicate arrangements.
L2OLL1        L2OL1L        L2LOL1        L2LL1O        L2L1OL      L2L1LO
OL2LL1        OL2L1L        OLL2L1        OLL1L2        OL1L2L      OL1LL2
LOL2L1        LOL1L2        LL2OL1        LL2L1O        LL1OL2      LL1L2O
L1OLL2        L1OL2L        L1LOL2        L1LL2O        L1L2OL      L1L2LO



Only four of these arrangements are actually different from each other (LOLL,
OLLL, LLOL, and LLLO). As with the other two words, there are 24 possible
arrangements if you distinguish between the identical Ls. Here, the three
identical Ls can trade places in 3P3 = 3! ways.
Thus, the number of permutations is 4!/3! = 4.

You can generalize the argument in Example 1 to show that the number of
permutations of a set of n items of which a are identical is n!/a!.
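The counts in Example 1 can be confirmed by generating every arrangement and discarding duplicates. The sketch below is an illustration only; `distinct_arrangements` is a hypothetical helper name, and using a set to drop duplicates is a choice made for this example.

```python
from itertools import permutations
from math import factorial


def distinct_arrangements(word):
    """Count the different arrangements of the letters in word."""
    # permutations() treats equal letters as distinct; the set removes repeats
    return len(set(permutations(word)))


# Each word is paired with a, the number of identical letters it contains
for word, a in (("DOLE", 1), ("DOLL", 2), ("LOLL", 3)):
    print(word, distinct_arrangements(word), factorial(4) // factorial(a))
```

For each word, the enumerated count and the value of 4!/a! should agree.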

Example 2 Tile Patterns

Tanisha is laying out tiles for the edge of a mosaic. How many patterns can she
make if she uses four yellow tiles and one each of blue, green, red, and grey
tiles?

Solution

Here, n = 8 and a = 4.
8!/4! = 8 × 7 × 6 × 5
      = 1680
Tanisha can make 1680 different patterns with the eight tiles.

Example 3 Permutation With Several Sets of Identical Elements

The word bookkeeper is unusual in that it has three consecutive double letters.
How many permutations are there of the letters in bookkeeper?

Solution
If each letter were different, there would be 10! permutations, but there are two
os, two ks, and three es. You must divide by 2! twice to allow for the duplication
of the os and ks, and then divide by 3! to allow for the three es:
10!/(2!2!3!) = (10 × 9 × 8 × 7 × 6 × 5 × 4)/(2 × 2)
             = 151 200
There are 151 200 permutations of the letters in bookkeeper.


The number of permutations of a set of n objects containing a identical
objects of one kind, b identical objects of a second kind, c identical objects
of a third kind, and so on is n!/(a!b!c!…).
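This general formula translates directly into a few lines of code. The following is a minimal sketch, not a definitive implementation; `multiset_permutations` is a hypothetical helper name, and the tile labels are placeholders for the colours in Example 2.

```python
from collections import Counter
from math import factorial


def multiset_permutations(items):
    """Return n!/(a!b!c!...) where a, b, c, ... count each repeated item."""
    counts = Counter(items)        # how many times each item appears
    total = factorial(len(items))  # n!, as if every item were different
    for count in counts.values():
        total //= factorial(count)  # divide out each group of identical items
    return total


print(multiset_permutations("bookkeeper"))  # Example 3

tiles = ["yellow"] * 4 + ["blue", "green", "red", "grey"]
print(multiset_permutations(tiles))         # Example 2
```

Items that appear only once contribute a factor of 1! = 1, so they leave the result unchanged.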



Example 4 Applying the Formula for Several Sets of Identical Elements

Barbara is hanging a display of clothing imprinted with the school’s crest on a
line on a wall in the cafeteria. She has five sweatshirts, three T-shirts, and four
pairs of sweatpants. In how many ways can Barbara arrange the display?

Solution
Here, a = 5, b = 3, c = 4, and the total number of items is 12.

So,
n!/(a!b!c!) = 12!/(5!3!4!)
            = 27 720
Barbara can arrange the display in 27 720 different ways.

Project Prep
The game you design for your probability project could involve
permutations of identical objects.



   Key Concepts

   • When dealing with permutations of n items that include a identical items
     of one type, b identical items of another type, and so on, you can use the
     formula n!/(a!b!c!…).


   Communicate Your Understanding

      1. Explain why there are fewer permutations of a given number of items if some
        of the items are identical.

      2. a) Explain why the formula for the number of permutations when some items
            are identical has the denominator a!b!c!… instead of a × b × c… .
        b) Will there ever be cases where this denominator is larger than the
           numerator? Explain.
        c) Will there ever be a case where the formula does not give a whole number
           answer? What can you conclude about the denominator and the numerator?
           Explain your reasoning.




Practise

A

1. Identify the indistinguishable items in each
    situation.
    a) The letters of the word mathematics are
       arranged.
    b) Dina has six notebooks, two green and
       four white.
    c) The cafeteria prepares 50 chicken
       sandwiches, 100 hamburgers, and
       70 plates of French fries.
    d) Thomas and Richard, identical twins,
       are sitting with Marianna and Megan.

2. How many permutations are there of all
    the letters in each name?
    a) Inverary      b) Beamsville
    c) Mattawa       d) Penetanguishene

3. How many different five-digit numbers
    can be formed using three 2s and two 5s?

4. How many different six-digit numbers are
    possible using the following numbers?
    a) 1, 2, 3, 4, 5, 6     b) 1, 1, 1, 2, 3, 4
    c) 1, 3, 3, 4, 4, 5     d) 6, 6, 6, 6, 7, 8

Apply, Solve, Communicate

B

5. Communication A coin is tossed eight times.
    In how many different orders could five
    heads and three tails occur? Explain your
    reasoning.

6. Inquiry/Problem Solving How many 7-digit
    even numbers less than 3 000 000 can be
    formed using all the digits 1, 2, 2, 3, 5, 5, 6?

7. Kathryn’s soccer team played a good season,
    finishing with 16 wins, 3 losses, and 1 tie. In
    how many orders could these results have
    happened? Explain your reasoning.

8. a) Calculate the number of permutations for
       each of the jumbled words in this puzzle.
    b) Estimate how long it would take to solve
       this puzzle by systematically writing out
       the permutations.

    [Word jumble puzzle] © Tribune Media Services, Inc. All Rights Reserved. Reprinted with Permission.

    www.mcgrawhill.ca/links/MDM12
    For more word jumbles and other puzzles, visit
    the above web site and follow the links. Find
    or generate two puzzles for a classmate
    to solve.

9. Application Roberta is a pilot for a small
    airline. If she flies to Sudbury three times,
    Timmins twice, and Thunder Bay five times
    before returning home, how many different
    itineraries could she follow? Explain your
    reasoning.

10. After their training run, six members of a
     track team split a bag of assorted doughnuts.
     How many ways can the team share the
     doughnuts if the bag contains
     a) six different doughnuts?
     b) three each of two varieties?
     c) two each of three varieties?

11. As a project for the photography class,
      Haseeb wants to create a linear collage
      of photos of his friends. He creates a
      template with 20 spaces in a row. If
      Haseeb has 5 identical photos of each
      of 4 friends, in how many ways can he
      make his collage?

12. Communication A used car lot has four
      green flags, three red flags, and two blue
      flags in a bin. In how many ways can the
      owner arrange these flags on a wire
      stretched across the lot? Explain your
      reasoning.

13. Application Malik wants to skateboard over
      to visit his friend Gord who lives six blocks
      away. Gord’s house is two blocks west and
      four blocks north of Malik’s house. Each
      time Malik goes over, he likes to take a
      different route. How many different routes
      are there for Malik if he only travels west or
      north?

        ACHIEVEMENT CHECK
  Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application

14. Fran is working on a word puzzle and is
      looking for four-letter “scrambles” from
      the clue word calculate.
       a) How many of the possible four-letter
          scrambles contain four different letters?
       b) How many contain two a’s and one
          other pair of identical letters?
       c) How many scrambles consist of any
          two pairs of identical letters?
       d) What possibilities have you not yet
          taken into account? Find the number
          of scrambles for each of these cases.
       e) What is the total number of four-letter
          scrambles taking all cases into account?

15. Chapter Problem Ten students have been nominated for the
      positions of secretary, treasurer, social
      convenor, and fundraising chair. In how
      many ways can these positions be filled if
      the Norman twins are running and plan to
      switch positions on occasion for fun since
      no one can tell them apart?

16. Inquiry/Problem Solving In how many ways
      can all the letters of the word CANADA be
      arranged if the consonants must always be
      in the order in which they occur in the word
      itself?

 C
17. Glen works part time stocking shelves in a
      grocery store. The manager asks him to
      make a pyramid display using 72 cans of
      corn, 36 cans of peas, and 57 cans of carrots.
      Assume all the cans are the same size and
      shape. On his break, Glen tries to work out
      how many different ways he could arrange
      the cans into a pyramid shape with a
      triangular base.
       a) Write a formula for the number of
          different ways Glen could stack the
          cans in the pyramid.
       b) Estimate how long it will take Glen to
          calculate this number of permutations
          by hand.
       c) Use computer software or a calculator
          to complete the calculation.

18. How many different ways are there of
      arranging seven green and eight brown
      bottles in a row, so that exactly one pair
      of green bottles is side-by-side?

19. In how many ways could a class of
      18 students divide into groups of
      3 students each?



4.4               Pascal’s Triangle

  The array of numbers shown below is called Pascal’s
  triangle in honour of the French mathematician Blaise
  Pascal (1623−1662). Although it is believed that the
  14th-century Chinese mathematician Chu Shi-kie
  knew of this array and some of its applications, Pascal
  discovered it independently at age 13. Pascal found
  many mathematical uses for the array, especially in
  probability theory.

  Pascal’s method for building his triangle is a simple
  iterative process similar to those described in
  section 1.1. In Pascal’s triangle, each term is equal
  to the sum of the two terms immediately above it.
  The first and last terms in each row are both equal
  to 1 since the only term immediately above them is
  also always a 1.

  If t_{n,r} represents the term in row n, position r, then
  t_{n,r} = t_{n-1,r-1} + t_{n-1,r}.

  For example, t_{6,2} = t_{5,1} + t_{5,2}. Note that both the row
  and position labelling begin with 0.
  [Figure: Chu Shi-kie’s triangle]

                 1                Row 0   t_{0,0}
               1   1              Row 1   t_{1,0}  t_{1,1}
             1   2   1            Row 2   t_{2,0}  t_{2,1}  t_{2,2}
           1   3   3   1          Row 3   t_{3,0}  t_{3,1}  t_{3,2}  t_{3,3}
         1   4   6   4   1        Row 4   t_{4,0}  t_{4,1}  t_{4,2}  t_{4,3}  t_{4,4}
       1   5  10  10   5   1      Row 5   t_{5,0}  t_{5,1}  t_{5,2}  t_{5,3}  t_{5,4}  t_{5,5}
     1   6  15  20  15   6   1    Row 6   t_{6,0}  t_{6,1}  t_{6,2}  t_{6,3}  t_{6,4}  t_{6,5}  t_{6,6}
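
Pascal’s method can also be sketched in a few lines of code. The Python below is only an illustration, not part of the original text: the function name pascal_rows is hypothetical, and it assumes at least one row is requested. It builds each row from the one above using the rule that each interior term is the sum of the two terms immediately above it.

```python
def pascal_rows(n_rows):
    """Return rows 0 .. n_rows-1 of Pascal's triangle (assumes n_rows >= 1)."""
    rows = [[1]]  # row 0 holds the single term t(0,0) = 1
    for n in range(1, n_rows):
        prev = rows[-1]
        # Interior terms follow t(n,r) = t(n-1,r-1) + t(n-1,r);
        # the first and last terms of every row are 1.
        row = [1] + [prev[r - 1] + prev[r] for r in range(1, n)] + [1]
        rows.append(row)
    return rows


for n, row in enumerate(pascal_rows(7)):
    print(f"Row {n}:", *row)
```

Running the loop prints rows 0 to 6, which can be compared with the array above.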



                                 www.mcgrawhill.ca/links/MDM12

                           Visit the above web site and follow the links to
                          learn more about Pascal’s triangle. Write a brief
                             report about an application or an aspect of
                                   Pascal’s triangle that interests you.




INVESTIGATE & INQUIRE: Row Sums

       1. Find the sums of the numbers in each of the first six rows of Pascal’s
            triangle and list these sums in a table.
       2. Predict the sum of the entries in
             a) row 7            b) row 8            c) row 9
       3. Verify your predictions by calculating the sums of the numbers in rows
            7, 8, and 9.
       4. Predict the sum of the entries in row n of Pascal’s triangle.
       5. List any other patterns you find in Pascal’s triangle. Compare your list
            with those of your classmates. Do their lists suggest further patterns you
            could look for?



In his book Mathematical Carnival, Martin Gardner describes Pascal’s triangle
as “so simple that a 10-year old can write it down, yet it contains such
inexhaustible riches and links with so many seemingly unrelated aspects of
mathematics, that it is surely one of the most elegant of number arrays.”


Example 1 Pascal’s Method

a)    The first six terms in row 25 of Pascal’s triangle are 1, 25, 300, 2300,
      12 650, and 53 130. Determine the first six terms in row 26.
b)    Use Pascal’s method to write a formula for each of the following terms:
      i)     t_{12,5}
      ii)    t_{40,32}
      iii)   t_{n+1,r+1}



Solution

a)     t_{26,0} = 1
       t_{26,1} = 1 + 25 = 26
       t_{26,2} = 25 + 300 = 325
       t_{26,3} = 300 + 2300 = 2600
       t_{26,4} = 2300 + 12 650 = 14 950
       t_{26,5} = 12 650 + 53 130 = 65 780

b) i)     t_{12,5} = t_{11,4} + t_{11,5}
      ii)    t_{40,32} = t_{39,31} + t_{39,32}
      iii)   t_{n+1,r+1} = t_{n,r} + t_{n,r+1}
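
The arithmetic in part a) can be checked with a short sketch. The helper name next_row_start is hypothetical and not from the text. It takes the known leading terms of one row and returns the same number of leading terms of the next row by adding each pair of neighbours.

```python
def next_row_start(known_terms):
    """Given the first k terms of row n, return the first k terms of row n + 1."""
    result = [1]  # every row starts with 1
    # Each later term is the sum of two neighbouring terms in the row above.
    for left, right in zip(known_terms, known_terms[1:]):
        result.append(left + right)
    return result


row_25_start = [1, 25, 300, 2300, 12650, 53130]
print(next_row_start(row_25_start))  # [1, 26, 325, 2600, 14950, 65780]
```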




248         MHR • Permutations and Organized Counting
Example 2 Row Sums

Which row in Pascal’s triangle has the sum of its terms equal to 32 768?

Solution

From the investigation on page 248, you know that the sum of the
terms in any row n is 2^n. Dividing 32 768 by 2 repeatedly, you find that
32 768 = 2^15. Thus, it is row 15 of Pascal’s triangle that has terms totalling 32 768.
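
The repeated division by 2 can be written as a small loop. This is a sketch only, and the function name row_with_sum is an assumption made for illustration. It returns the row number when the total is a power of 2 and None otherwise.

```python
def row_with_sum(total):
    """Return n such that 2**n == total, or None if total is not a power of 2."""
    n = 0
    # Halve the total until it reaches 1 or stops being even.
    while total > 1 and total % 2 == 0:
        total //= 2
        n += 1
    return n if total == 1 else None


print(row_with_sum(32768))  # 15
```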



Example 3 Divisibility

Determine whether t_{n,2} is divisible by t_{n,1} in each row of Pascal’s triangle.

Solution

      Row       t_{n,2} / t_{n,1}     Divisible?
     0 and 1          n/a                n/a
        2             0.5                no
        3             1                  yes
        4             1.5                no
        5             2                  yes
        6             2.5                no
        7             3                  yes

It appears that t_{n,2} is divisible by t_{n,1} only in odd-numbered rows.
However, 2t_{n,2} is divisible by t_{n,1} in all rows that have three or more terms.
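
The pattern in the table can be tested on more rows with a short sketch. The helper pascal_row is a hypothetical name, and the choice to check rows 2 to 11 is arbitrary. Each printed line gives the row number, whether t_{n,2} is divisible by t_{n,1}, and whether 2t_{n,2} is divisible by t_{n,1}.

```python
def pascal_row(n):
    """Build row n of Pascal's triangle by applying Pascal's method n times."""
    row = [1]
    for _ in range(n):
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row


for n in range(2, 12):
    row = pascal_row(n)
    t1, t2 = row[1], row[2]
    # First flag: t(n,2) divisible by t(n,1); second flag: 2*t(n,2) divisible by t(n,1).
    print(n, t2 % t1 == 0, (2 * t2) % t1 == 0)
```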



Example 4 Triangular Numbers

Coins can be arranged in the shape of an equilateral triangle as shown.




a)   Continue the pattern to determine the numbers of coins in triangles
     with four, five, and six rows.
b)   Locate these numbers in Pascal’s triangle.
c)   Relate Pascal’s triangle to the number of coins in a triangle with n rows.
d)   How many coins are in a triangle with 12 rows?


Solution

a) The numbers of coins in the triangles follow the pattern 1 + 2 + 3 + … as
      shown in the table below.

b) The numbers of coins in the triangles match the entries on the third
      diagonal of Pascal’s triangle.
       Number of Rows    Number of Coins    Term in Pascal’s Triangle
             1                  1                  t_{2,2}
             2                  3                  t_{3,2}
             3                  6                  t_{4,2}
             4                 10                  t_{5,2}
             5                 15                  t_{6,2}
             6                 21                  t_{7,2}

                             1
                           1   1
                         1   2   1
                       1   3   3   1
                     1   4   6   4   1
                   1   5  10  10   5   1
                 1   6  15  20  15   6   1
               1   7  21  35  35  21   7   1

c) Compare the entries in the first and third columns of the table. The row
      number of the term from Pascal’s triangle is always one greater than the
      number of rows in the equilateral triangle. The position of the term in the
      row, r, is always 2. Thus, the number of coins in a triangle with n rows is
      equal to the term t_{n+1,2} in Pascal’s triangle.

d) t_{12+1,2} = t_{13,2}
              = 78
      A triangle with 12 rows contains 78 coins.

Numbers that correspond to the number of items stacked in a triangular array
are known as triangular numbers. Notice that the nth triangular number is
also the sum of the first n positive integers.
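
The link between triangular numbers and Pascal’s triangle can be checked numerically. In this sketch the names pascal_row and coins_in_triangle are hypothetical. It compares the sum 1 + 2 + … + n with the term t_{n+1,2} for triangles of 1 to 12 rows.

```python
def pascal_row(n):
    """Build row n of Pascal's triangle by applying Pascal's method n times."""
    row = [1]
    for _ in range(n):
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row


def coins_in_triangle(rows):
    """Number of coins in a triangle with the given number of rows: 1 + 2 + ... + rows."""
    return sum(range(1, rows + 1))


for rows in range(1, 13):
    # The count should match the term in row (rows + 1), position 2.
    assert coins_in_triangle(rows) == pascal_row(rows + 1)[2]

print(coins_in_triangle(12))  # 78
```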

Example 5 Perfect Squares

Can you find a relationship between perfect squares and the sums of pairs of
entries in Pascal’s triangle?

Solution
Again, look at the third diagonal in Pascal’s triangle.

    n    n^2    Entries in Pascal’s Triangle    Terms in Pascal’s Triangle
    1      1                1                    t_{2,2}
    2      4              1 + 3                  t_{2,2} + t_{3,2}
    3      9              3 + 6                  t_{3,2} + t_{4,2}
    4     16              6 + 10                 t_{4,2} + t_{5,2}

Each perfect square greater than 1 is equal to the sum of a pair of adjacent
terms on the third diagonal of Pascal’s triangle: n^2 = t_{n,2} + t_{n+1,2} for n > 1.
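
One way to see why this relationship holds is sketched below. It assumes the standard closed form for the sum of the first k positive integers, 1 + 2 + … + k = k(k + 1)/2, which this section does not derive. By Example 4, t_{k+1,2} is that sum, so t_{n,2} = (n − 1)n/2 and t_{n+1,2} = n(n + 1)/2.

```latex
\begin{align*}
t_{n,2} + t_{n+1,2} &= \frac{(n-1)n}{2} + \frac{n(n+1)}{2} \\
                    &= \frac{n\left[(n-1) + (n+1)\right]}{2} \\
                    &= \frac{n \cdot 2n}{2} \\
                    &= n^2
\end{align*}
```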



Key Concepts

  • Each term in Pascal’s triangle is equal to the sum of the two adjacent terms in
    the row immediately above: t_{n,r} = t_{n-1,r-1} + t_{n-1,r}, where t_{n,r} represents
    the term in row n, position r.

  • The sum of the terms in row n of Pascal’s triangle is 2^n.

  • The terms in the third diagonal of Pascal’s triangle are triangular numbers.
    Many other number patterns occur in Pascal’s triangle.


  Communicate Your Understanding

    1. Describe the symmetry in Pascal’s triangle.

    2. Explain why the triangular numbers in Example 4 occur in Pascal’s triangle.



Practise

A
1. For future use, make a diagram of the first
    12 rows of Pascal’s triangle.

2. Express as a single term from Pascal’s
    triangle.
    a) t_{7,2} + t_{7,3}
    b) t_{51,40} + t_{51,41}
    c) t_{18,12} − t_{17,12}
    d) t_{n,r} − t_{n-1,r}

3. Determine the sum of the terms in each of
    these rows in Pascal’s triangle.
    a) row 12
    b) row 20
    c) row 25
    d) row (n − 1)

4. Determine the row number for each of the
    following row sums from Pascal’s triangle.
    a) 256                    b) 2048
    c) 16 384                 d) 65 536

Apply, Solve, Communicate

B
5. Inquiry/Problem Solving
    a) Alternately add and subtract the terms
       in each of the first seven rows of Pascal’s
       triangle and list the results in a table
       similar to the one below.

         Row      Sum/Difference       Result
          0         1                     1
          1         1 − 1                 0
          2         1 − 2 + 1             0
          3         1 − 3 + 3 − 1         0
          ⋮

    b) Predict the result of alternately adding
       and subtracting the entries in the eighth
       row. Verify your prediction.
    c) Predict the result for the nth row.

6. a) Predict the sum of the squares of the
       terms in the nth row of Pascal’s triangle.
    b) Predict the result of alternately adding
       and subtracting the squares of the terms
       in the nth row of Pascal’s triangle.



 7. Communication
      a) Compare the first four powers of 11 with
         entries in Pascal’s triangle. Describe any
         pattern you notice.
      b) Explain how you could express row 5 as
         a power of 11 by regrouping the entries.
      c) Demonstrate how to express rows 6 and
         7 as powers of 11 using the regrouping
         method from part b). Describe your
         method clearly.

 8. a) How many diagonals are there in
         i)   a quadrilateral?
         ii)  a pentagon?
         iii) a hexagon?
      b) Find a relationship between entries in
         Pascal’s triangle and the maximum
         number of diagonals in an n-sided
         polygon.
      c) Use part b) to predict how many
         diagonals are in a heptagon and an
         octagon. Verify your prediction by
         drawing these polygons and counting the
         number of possible diagonals in each.

 9. Make a conjecture about the divisibility
      of the terms in prime-numbered rows
      of Pascal’s triangle. Confirm that your
      conjecture is valid up to row 11.

10. a) Which rows of Pascal’s triangle contain
         only odd numbers? Is there a pattern to
         these rows?
      b) Are there any rows that have only even
         numbers?
      c) Are there more even or odd entries in
         Pascal’s triangle? Explain how you
         arrived at your answer.

11. Application Oranges can be piled in a
      tetrahedral shape as shown. The first pile
      contains one orange, the second contains
      four oranges, the third contains ten oranges,
      and so on. The numbers of items in such
      stacks are known as tetrahedral numbers.
      a) Relate the number of oranges in the nth
         pile to entries in Pascal’s triangle.
      b) What is the 12th tetrahedral number?

12. a) Relate the sum of the squares of the first
         n positive integers to entries in Pascal’s
         triangle.
      b) Use part a) to predict the sum of the
         squares of the first ten positive integers.
         Verify your prediction by adding the
         numbers.

13. Inquiry/Problem Solving A straight line
      drawn through a circle divides it into two
      regions.
      a) Determine the maximum number of
         regions formed by n straight lines drawn
         through a circle. Use Pascal’s triangle to
         help develop a formula.
      b) What is the maximum number of regions
         inside a circle cut by 15 lines?

14. Describe how you would set up a
      spreadsheet to calculate the entries in
      Pascal’s triangle.



 C
15. The Fibonacci sequence is 1, 1, 2, 3, 5, 8,
     13, 21, … . Each term is the sum of the
     previous two terms. Find a relationship
     between the Fibonacci sequence and the
     following version of Pascal’s triangle.
     1
     1 1
     1 2 1
     1 3 3 1
     1 4 6 4 1
     1 5 10 10 5 1
     1 6 15 20 15 6 1
     1 7 21 35 35 21 7 1
     …

16. Application Toothpicks are laid out to
     form triangles as shown below. The first
     triangle contains 3 toothpicks, the second
     contains 9 toothpicks, the third
     contains 18 toothpicks, and so on.
     a) Relate the number of toothpicks in the
        nth triangle to entries in Pascal’s triangle.
     b) How many toothpicks would the
        10th triangle contain?

17. Design a 3-dimensional version of Pascal’s
     triangle. Use your own criteria for the
     layers. The base may be any regular
     geometric shape, but each successive layer
     must have larger dimensions than the one
     above it.

18. a) Write the first 20 rows of Pascal’s
        triangle on a sheet of graph paper,
        placing each entry in a separate square.
     b) Shade in all the squares containing
        numbers divisible by 2.
     c) Describe, in detail, the patterns
        produced.
     d) Repeat this process for entries divisible
        by other whole numbers. Observe the
        resulting patterns and make a conjecture
        about the divisibility of the terms in
        Pascal’s triangle by various whole
        numbers.

19. Communication
     a) Describe the iterative process used to
        generate the terms in the triangle below.

                          1/1
                       1/2   1/2
                    1/3   1/6   1/3
                 1/4   1/12  1/12  1/4
              1/5   1/20  1/30  1/20  1/5
           1/6   1/30  1/60  1/60  1/30  1/6

     b) Write the entries for the next two rows.
     c) Describe three patterns in this triangle.
     d) Research why this triangle is called the
        harmonic triangle. Briefly explain the
        origin of the name, listing your source(s).




4.5           Applying Pascal’s Method

  The iterative process that generates the terms in Pascal’s triangle can also
  be applied to counting paths or routes between two points. Consider
  water being poured into the top bucket in the diagram. You can use
  Pascal’s method to count the different paths that water overflowing
  from the top bucket could take to each of the buckets in the
  bottom row.

  [Diagram: rows of buckets labelled with the number of paths to each:
  1; then 1, 1; then 1, 2, 1; then 1, 3, 3, 1.]

  The water has one path to each of the buckets in the second
  row. There is one path to each outer bucket of the third
  row, but two paths to the middle bucket, and so on.

  The numbers in the diagram match those in Pascal’s
  triangle because they were derived using the same
  method: Pascal’s method.




        INVESTIGATE & INQUIRE: Counting Routes

        Suppose you are standing at the corner of Pythagoras
        Street and Kovalevsky Avenue, and want to reach the
        corner of Fibonacci Terrace and Euler Boulevard. To
        avoid going out of your way, you would travel only
        east and south. Notice that you could start out by
        going to the corner of either Euclid Street and
        Kovalevsky Avenue or Pythagoras Street and de
        Fermat Drive.

        [Map of a street grid. East-west: Kovalevsky Avenue, de Fermat Drive,
        Agnes Road, Hypatia Street, Wiles Lane, Fibonacci Terrace. North-south:
        Pythagoras Street, Euclid Street, Descartes Street, Germain Street,
        Sierpinski Street, Gauss Street, Euler Boulevard.]

         1. How many routes are possible to the corner of
            Euclid Street and de Fermat Drive from your
            starting point? Sketch the street grid and mark
            the number of routes onto it.
         2. a) Continue to travel only east or south. How
               many routes are possible from the start to the corner of
               i) Descartes Street and Kovalevsky Avenue?
               ii) Pythagoras Street and Agnes Road?
               iii) Euclid Street and Agnes Road?
               iv) Descartes Street and de Fermat Drive?
               v) Descartes Street and Agnes Road?
            b) List the routes you counted in part a).
         3. Consider your method and the resulting numbers. How do they relate to
            Pascal’s triangle?
         4. Continue to mark the number of routes possible on your sketch until you
            have reached the corner of Fibonacci Terrace and Euler Boulevard. How
            many different routes are possible?
         5. Describe the process you used to find the number of routes from Pythagoras
            Street and Kovalevsky Avenue to Fibonacci Terrace and Euler Boulevard.


Example 1 Counting Paths in an Array

Determine how many different paths will spell PASCAL if you start at the
top and proceed to the next row by moving diagonally left or right.
             P
            A A
           S S S
          C C C C
           A A A
            L L

Solution
Starting at the top, record the number of possible paths moving diagonally
to the left and right as you proceed to each different letter. For instance,
there is one path from P to the left A and one path from P to the right A.
There is one path from an A to the left S, two paths from an A to the
middle S, and one path from an A to the right S.

Continuing with this counting reveals that there are 10 different paths
leading to each L. Therefore, a total of 20 paths spell PASCAL.

                    P
                A(1)  A(1)
             S(1)  S(2)  S(1)
          C(1)  C(3)  C(3)  C(1)
             A(4)  A(6)  A(4)
               L(10)  L(10)
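
The counting in this solution can be sketched in code. The function name count_word_paths is hypothetical, and the sketch assumes that each row has exactly one more or one fewer letter than the row above it, as in the PASCAL array. It takes the number of letters in each row and returns the number of paths reaching each letter of the last row.

```python
def count_word_paths(row_widths):
    """Count diagonal paths to each letter of the last row.

    Assumes consecutive rows differ in width by exactly 1.
    """
    counts = [1] * row_widths[0]
    for width in row_widths[1:]:
        prev = counts
        if width > len(prev):
            # Widening row: letter j sits below letters j-1 and j of the row above.
            counts = [
                (prev[j - 1] if j >= 1 else 0) + (prev[j] if j < len(prev) else 0)
                for j in range(width)
            ]
        else:
            # Narrowing row: letter j sits below letters j and j+1 of the row above.
            counts = [prev[j] + prev[j + 1] for j in range(width)]
    return counts


paths = count_word_paths([1, 2, 3, 4, 3, 2])  # rows P, A A, S S S, C C C C, A A A, L L
print(paths, sum(paths))  # [10, 10] 20
```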



Example 2 Counting Paths on a Checkerboard

On the checkerboard shown, the checker can travel only diagonally upward.
It cannot move through a square containing an X. Determine the number of
paths from the checker’s current position to the top of the board.



[Figure: checkerboard showing the checker’s starting position and one square marked X]




Solution

Use Pascal’s method to find the number of paths to each successive
position. There is one path possible into each of the squares diagonally
adjacent to the checker’s starting position. From the second row there
are four paths to the third row: one path to the third square from the
left, two to the fifth square, and one to the seventh square. Continue
this process for the remaining four rows. The square containing an X
gets a zero or no number since there are no paths through this blocked
square.

     5       9       8       8
         5       4       4       4
     1       4       x       4
         1       3       3       1
             1       2       1
                 1       1

From left to right, there are 5, 9, 8, and 8 paths to the white squares at
the top of the board, making a total of 30 paths.
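
The same tally can be sketched in code. Everything about the board encoding here is an assumption inferred from the solution rather than taken from the original figure: columns are numbered 0 to 7 from the left, the checker starts in column 4 of row 0, the X sits in row 4, column 4, and the top of the board is row 6. The function name count_checker_paths is hypothetical.

```python
def count_checker_paths(start_col, n_rows, n_cols, blocked):
    """Count diagonal-upward paths to each square of the top row.

    blocked is a set of (row, col) squares the checker cannot enter.
    Returns a dict mapping column -> number of paths for the last row.
    """
    counts = {start_col: 1}  # row 0: the checker's starting square
    for row in range(1, n_rows):
        nxt = {}
        for col, ways in counts.items():
            for new_col in (col - 1, col + 1):  # diagonal left or right
                if 0 <= new_col < n_cols and (row, new_col) not in blocked:
                    nxt[new_col] = nxt.get(new_col, 0) + ways
        counts = nxt
    return counts


# Assumed layout: 8 columns, 7 rows, checker in column 4, X at row 4, column 4.
top = count_checker_paths(start_col=4, n_rows=7, n_cols=8, blocked={(4, 4)})
print([top[c] for c in sorted(top)], sum(top.values()))  # [5, 9, 8, 8] 30
```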



   Key Concepts

   • Pascal’s method involves adding two neighbouring terms in order to find the
     term below.

   • Pascal’s method can be applied to counting paths in a variety of arrays and
     grids.



   Communicate Your Understanding

      1. Suggest a context in which you could apply Pascal’s method, other than those
        in the examples above.

      2. Which of the numbers along the perimeter of a map tallying possible routes
        are always 1? Explain.


Practise

 A
 1. Fill in the missing numbers using Pascal’s
      method.
                        495
                           825
                3003   2112

 2. In the following arrangements of letters,
      start from the top and proceed to the next
      row by moving diagonally left or right. How
      many different paths will spell each word?
      a)           P
                  A A
                 T T T
                T T T T
               E E E E E
              R R R R R R
             N N N N N N N
            S S S S S S S S


    b)         M
              A A
             T T T
            H H H H
           E E E E E
          M M M M M M
         A A A A A A A
          T T T T T T
           I I I I I
            C C C C
             S S S
    c)       T
            R R
           I I I
          A A A A
           N N N
            G G
           L L L
          E E E E

 3. The first nine terms of a row of Pascal’s
    triangle are shown below. Determine the first
    nine terms of the previous and next rows.
    1 16 120 560 1820 4368 8008 11 440 12 870

Apply, Solve, Communicate
 B
 4. Determine the number of possible routes
    from A to B if you travel only south or east.
    a) [Figure: street grid from A to B]
    b) [Figure: street grid from A to B]
    c) [Figure: street grid from A to B]

 5. Sung is three blocks east and five blocks
    south of her friend’s home. How many
    different routes are possible if she walks
    only west or north?

 6. Ryan lives four blocks north and five blocks
    west of his school. Is it possible for him to
    take a different route to school each day,
    walking only south and east? Assume that
    there are 194 school days in a year.

 7. A checker is placed on a checkerboard as
    shown. The checker may move diagonally
    upward. Although it cannot move into a
    square with an X, the checker may jump over
    the X into the diagonally opposite square.
    [Figure: checkerboard with two squares marked x]
    a) How many paths are there to the top of
       the board?
    b) How many paths would there be if the
       checker could move both diagonally and
       straight upward?

 8. Inquiry/Problem Solving
    a) If a checker is placed as shown below,
       how many possible paths are there for
       that checker to reach the top of the game
       board? Recall that checkers can travel
       only diagonally on the white squares, one
       square at a time, moving upward.
       [Figure: checkerboard with starting squares labelled 1, 2, 3, 4]
    b) When a checker reaches the opposite
       side, it becomes a “king.” If the starting
       squares are labelled 1 to 4, from left to
       right, from which starting square does a
       checker have the most routes to become
       a king? Verify your statement.

 9. Application The following diagrams represent
    communication networks between a
    company’s computer centres in various cities.
    [Figure: network diagrams linking Thunder Bay, Sudbury, North Bay,
    Kitchener, Windsor, Hamilton, Ottawa, Toronto, Kingston, Halifax,
    Charlottetown, Montréal, Winnipeg, Saskatoon, Edmonton, and Vancouver]
    a) How many routes are there from
       Windsor to Thunder Bay?
    b) How many routes are there from
       Ottawa to Sudbury?
    c) How many routes are there from
       Montréal to Saskatoon?
    d) How many routes are there from
       Vancouver to Charlottetown?
    e) If the direction were reversed, would the
       number of routes be the same for parts a)
       to d)? Explain.

10. To outfox the Big Bad Wolf, Little Red
    Riding Hood mapped all the paths through
    the woods to Grandma’s house. How many
    different routes could she take, assuming she
    always travels from left to right?
    [Figure: map of paths from Little Red Riding Hood's House to Grandma's House]

11. Communication A popular game show uses a
    more elaborate version of the Plinko board
    shown below. Contestants drop a peg into
    one of the slots at the top of the upright
    board. The peg is equally likely to go left
    or right at each post it encounters.
    [Figure: Plinko board with slots 1 to 6 at the top and compartments
    $100, $1000, $0, $5000, $0, $1000, $100 at the bottom]
    a) Into which slot should contestants drop
       their pegs to maximize their chances of
       winning the $5000 prize? Which slot
       gives contestants the least chance of
       winning this prize? Justify your answers.
    b) Suppose you dropped 100 pegs into the
       slots randomly, one at a time. Sketch a
       graph of the number of pegs likely to wind
       up in each compartment at the bottom of
       the board. How is this graph related to
       those described in earlier chapters?

12. Inquiry/Problem Solving
    a) Build a new version of Pascal’s triangle,
       using the formula for t_{n,r} on page 247,
       but start with t_{0,0} = 2.
    b) Investigate this triangle and state a
       conjecture about its terms.
    c) State a conjecture about the sum of the
       terms in each row.

13. Inquiry/Problem Solving Develop a formula
    relating t_{n,r} of Pascal’s triangle to the terms
    in row n − 3.




ACHIEVEMENT CHECK

 Knowledge/Understanding     Thinking/Inquiry/Problem Solving     Communication     Application

14. The grid below shows the streets in Anya’s
    neighbourhood.
    [Figure: street grid with intersections A, B, C, and D marked]
    a) If she only travels east and north, how
       many different routes can Anya take
       from her house at intersection A to her
       friend’s house at intersection B?
    b) How many of the routes in part a) have
       only one change of direction?
    c) Suppose another friend lives at
       intersection C. How many ways can
       Anya travel from A to B, meeting her
       friend at C along the way?
    d) How many ways can she travel to B
       without passing through C? Explain
       your reasoning.
    e) If Anya takes any route from A to B, is she
       more likely to pass through intersection C
       or D? Explain your reasoning.

 C
15. Develop a general formula to determine the
    number of possible routes to travel n blocks
    north and m blocks west.

16. Inquiry/Problem Solving In chess, a knight
    moves in L-shaped jumps consisting of two
    squares along a row or column plus one
    square at a right angle. On a standard 8 × 8
    chessboard, the starting position for a knight
    is the second square of the bottom row. If
    the knight travels upward on every move,
    how many routes can it take to the top of
    the board?

17. Inquiry/Problem Solving Water is poured
    into the top bucket of a triangular stack of
    2-L buckets. When each bucket is full, the
    water overflows equally on both sides into
    the buckets immediately below. How much
    water will have been poured into the top
    bucket when at least one of the buckets in
    the bottom row is full?
    [Figure: triangular stack of buckets with the bottom row labelled A, B, C, D, E, F]

18. Application Is it possible to arrange a
    pyramid of buckets such that the bottom
    layer will fill evenly when water overflows
    from the bucket at the top of the pyramid?

19. Application Enya is standing in the centre
    square of a 9 by 9 grid. She travels outward
    one square at a time, moving diagonally or
    along a row or column. How many different
    paths can Enya follow to the perimeter?

20. Communication Describe how a chessboard
    path activity involving Pascal’s method is
    related to network diagrams like those in
    section 1.5. Would network diagrams for
    such activities be planar? Explain.

Review of Key Concepts

4.1 Organized Counting
Refer to the Key Concepts on page 228.

 1. A restaurant has a daily special with soup
    or salad for an appetizer; fish, chicken, or a
    vegetarian dish for the entrée; and cake, ice
    cream, or fruit salad for dessert. Use a tree
    diagram to illustrate all the different meals
    possible with this special.

 2. A theatre company has a half-price offer for
    students who buy tickets for at least three of
    the eight plays presented this season. How
    many choices of three plays would a student
    have?

 3. In how many different orders can a
    photographer pose a row of six people
    without having the tallest person beside
    the shortest one?

 4. A transporter truck has three compact cars, a
    station wagon, and a minivan on its trailer.
    In how many ways can the driver load the
    shipment so that one of the heavier vehicles
    is directly over the rear axle of the trailer?

4.2 Factorials and Permutations
Refer to the Key Concepts on page 238.

 5. For what values of n is n! less than 2^n?
    Justify your answer.

 6. A band has recorded five hit singles. In how
    many different orders could the band play
    three of these five songs at a concert?

 7. In how many ways could a chairperson,
    treasurer, and secretary be chosen from a
    12-member board of directors?

4.3 Permutations With Some Identical Items
Refer to the Key Concepts on page 244.

 8. How many different ten-digit telephone
    numbers contain four 2s, three 3s, and
    three 7s?

 9. a) How many permutations are there of
       the letters in the word baseball?
    b) How many begin with the letter a?
    c) How many end with the letter e?

10. Find the number of 4 × 4 patterns you can
    make using eight white, four grey, and four
    blue floor tiles.

4.4 Pascal’s Triangle
Refer to the Key Concepts on page 251.

11. Write out the first five rows of Pascal’s
    triangle.

12. What is the sum of the entries in the
    seventh row of Pascal’s triangle?

13. Describe three patterns in Pascal’s triangle.

4.5 Applying Pascal’s Method
Refer to the Key Concepts on page 256.

14. Explain why Pascal’s method can be
    considered an iterative process.

15. How many paths through the array shown
    will spell SIERPINSKI?
                  S
                 I I
                E E E
                 R R
                P P P
               I I I I
                N N N
               S S S S
                K K K
                 I I


Chapter Test

ACHIEVEMENT CHART

    Category     Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application
    Questions    All                       4, 7, 8                            1, 3, 8         3, 4, 5, 6, 8

1. Natasha tosses four coins one after the other.
   a) In how many different orders could
      heads or tails occur?
   b) Draw a tree diagram to illustrate all the
      possible results.
   c) Explain how your tree diagram
      corresponds to your calculation in part a).

2. Evaluate the following by first expressing
   each in terms of factorials.
   a) _{15}P_{6}     b) P(6, 2)     c) _{7}P_{3}
   d) _{9}P_{9}      e) P(7, 0)

3. Suppose you are designing a remote control
   that uses short, medium, or long pulses of
   infrared light to send control signals to a
   device.
   a) How many different control codes can
      you define using
      i) three pulses?
      ii) one, two, or three pulses?
   b) Explain how the multiplicative and
      additive counting principles apply in
      your calculations for part a).

4. a) How many four-digit numbers can you
      form with the digits 1, 2, 3, 4, 5, 6, and 7
      if no digit is repeated?
   b) How many of these four-digit numbers
      are odd numbers?
   c) How many of them are even numbers?

5. How many ways are there to roll either a
   6 or a 12 with two dice?

6. How many permutations are there of the
   letters of each of the following words?
   a) data      b) management      c) microwave

7. A number of long, thin sticks are lying in a
   pile at odd angles such that the sticks cross
   each other.
   a) Relate the maximum number of
      intersection points of n sticks to entries
      in Pascal’s triangle.
   b) What is the maximum number of
      intersection points with six overlapping
      sticks?


     ACHIEVEMENT CHECK

 Knowledge/Understanding     Thinking/Inquiry/Problem Solving         Communication                    Application
8. At a banquet, four couples are sitting along one side of a table with men and women
   alternating.
    a) How many seating arrangements are possible for these eight people?
   b) How many arrangements are possible if each couple sits together? Explain your reasoning.
    c) How many arrangements are possible if no one is sitting beside his or her partner?
   d) Explain why the answers from parts b) and c) do not add up to the answer from part a).


CHAPTER 5
Combinations and the Binomial Theorem


                  Specific Expectations                                                         Section

                  Use Venn diagrams as a tool for organizing information in counting              5.1
                  problems.

                  Solve introductory counting problems involving the additive and            5.1, 5.2, 5.3
                  multiplicative counting principles.

                  Express answers to permutation and combination problems, using             5.1, 5.2, 5.3
                  standard combinatorial symbols.

                  Evaluate expressions involving factorial notation, using appropriate         5.2, 5.3
                  methods.

                  Solve problems, using techniques for counting combinations.                  5.2, 5.3

                  Identify patterns in Pascal’s triangle and relate the terms of Pascal’s         5.4
                  triangle to values of \binom{n}{r}, to the expansion of a binomial, and to the
                  solution of related problems.

                  Communicate clearly, coherently, and precisely the solutions to counting   5.1, 5.2, 5.3,
                  problems.                                                                       5.4
Chapter Problem
Radio Programming
Jeffrey works as a DJ at a local radio
station. He does the drive shift from 16 00
to 20 00, Monday to Friday. Before going
on the air, he must choose the music he will
play during these four hours.

The station has a few rules that Jeffrey
must follow, but he is allowed quite a bit
of leeway. Jeffrey must choose all his music
from the top 100 songs for the week and he
must play at least 12 songs an hour. In his
first hour, all his choices must be from the
top-20 list.

 1. In how many ways can Jeffrey choose
    the music for his first hour?

 2. In how many ways can he program the
    second hour if he chooses at least
    10 songs that are in positions 15 to 40
    on the charts?

 3. Over his 4-h shift, he will play at least
    48 songs from the top 100. In how
    many ways can he choose these songs?

In these questions, Jeffrey can play the
songs in any order. Such questions can be
answered with the help of combinatorics,
the branch of mathematics introduced in
Chapter 4. However, the permutations in
Chapter 4 dealt with situations where the
order of items was important. Now, you
will learn techniques you can apply in
situations where order is not important.
Review of Prerequisite Skills

If you need help with any of the skills listed in purple below, refer to Appendix A.

 1. Factorials (section 4.2) Evaluate.
    a) 8!              b) 8!/5!
    c) 24!/22!         d) 3! × 4!

 2. Permutations (section 4.2) Evaluate mentally.
    a) _{5}P_{5}       b) _{10}P_{2}
    c) _{12}P_{1}      d) _{7}P_{3}

 3. Permutations (section 4.2) Evaluate manually.
    a) _{10}P_{5}      b) P(16, 2)
    c) _{10}P_{10}     d) P(8, 5)

 4. Permutations (section 4.2) Evaluate using
    software or a calculator.
    a) _{50}P_{25}     b) P(37, 16)
    c) _{29}P_{29}     d) _{46}P_{23}

 5. Organized counting (section 4.1) Every
    Canadian aircraft has five letters in its
    registration. The first letter must be C, the
    second letter must be F or G, and the last
    three letters have no restrictions. If repeated
    letters are allowed, how many aircraft can be
    registered with this system?

 6. Applying permutations (Chapter 4)
    a) How many arrangements are there of
       three different letters from the word
       kings?
    b) How many arrangements are there of
       all the letters of the word management?
    c) How many ways could first, second, and
       third prizes be awarded to 12 entrants in
       a mathematics contest?

 7. Exponent laws Use the exponent laws to
    simplify each of the following.
    a) (−3y)^0                      b) (−4x)^3
    c) 15(7x)^4(4y)^2               d) 21(x^3)^2 (1/x^2)^5
    e) (4x^0 y)^2 (3x^2 y)^3        f) (1/2)^4 (3x^2)(2y)^3
    g) (−3x y)(−5x^2 y)^2           h) (1/3)^0 (−2xy)^3

 8. Simplifying expressions Expand and simplify.
    a) (x − 5)^2                    b) (5x − y)^2
    c) (x^2 + 5)^2                  d) (x + 3)(x − 5)^2
    e) (x^2 − y)^2                  f) (2x + 3)^2
    g) (x − 4)^2 (x − 2)            h) (2x^2 + 3y)^2
    i) (2x + 1)^2 (x − 2)           j) (x + y)(x − 2y)^2

 9. Sigma notation Rewrite the following
    using sigma notation.
    a) 1 + 2 + 4 + 8 + 16
    b) x + 2x^2 + 3x^3 + 4x^4 + 5x^5
    c) 1/2 + 1/3 + 1/4 + 1/5 + …

10. Sigma notation Expand.
    a) \sum_{n=2}^{5} 2n
    b) \sum_{n=1}^{4} x^n / n!
    c) \sum_{n=1}^{5} (2^n + n^2)




5.1         Organized Counting With Venn Diagrams

  In Chapter 4, you used tree diagrams as a tool for counting items when the
  order of the items was important. This section introduces a type of diagram that
  helps you organize data about groups of items when the order of the items is
  not important.

      INVESTIGATE & INQUIRE: Visualizing Relationships Between Groups

      A group of students meet regularly to plan the dances at Vennville High School.
      Amar, Belinda, Charles, and Danica are on the dance committee, and Belinda,
      Charles, Edith, Franco, and Geoff are on the students’ council. Hans and Irena
      are not members of either group, but they attend meetings as reporters for the
      school newspaper.

       1. Draw two circles to represent the
         dance committee and the students’
         council. Where on the diagram
         would you put initials
         representing the students who are
          a) on the dance committee?
          b) on the students’ council?
          c) on the dance committee and
             the students’ council?
          d) not on either the dance
             committee or the students’
             council?
       2. Redraw your diagram marking on
         it the number of initials in each
         region. What relationships can
         you see among these numbers?



  Your sketch representing the dance committee and the students’ council is a simple
  example of a Venn diagram. The English logician John Venn (1834−1923)
  introduced such diagrams as a tool for analysing situations where there is some
  overlap among groups of items, or sets. Circles represent different sets and a
  rectangular box around the circles represents the universal set, S, from which all
  the items are drawn. This box is usually labelled with an S in the top left corner.




The items in a set are often called the elements or members of the set. The
size of a circle in a Venn diagram does not have to be proportional to the
number of elements in the set the circle represents. When some items in a set
are also elements of another set, these items are common elements and the sets
are shown as overlapping circles. If all elements of a set C are also elements of
set A, then C is a subset of A. A Venn diagram would show this set C as a
region contained within the circle for set A.

[Figure: Venn diagram of universal set S containing overlapping circles A and B; the overlap is labelled "Common elements of A and B"]
The common elements are a subset of both A and B.

www.mcgrawhill.ca/links/MDM12
To learn more about Venn diagrams, visit the above web site and follow the links.
Describe an example of how Venn diagrams can be used to organize information.

You can use Venn diagrams to organize information for situations in which the
number of items in a group is important but the order of the items is not.

Example 1 Common Elements

There are 10 students on the volleyball team and 15 on the basketball team.
When planning a field trip with both teams, the coach has to arrange
transportation for a total of only 19 students.
a)    Use a Venn diagram to illustrate this situation.
b)    Explain why you cannot use the additive counting principle to find the
      total number of students on the teams.
c)    Determine how many students are on both teams.
d)    Determine the number of students in the remaining regions of your
      diagram and explain what these regions represent.

Solution

a) Some students must be on both the volleyball and the basketball
      team. Draw a box with an S in the top left-hand corner. Draw
      and label two overlapping circles to represent the volleyball and
      basketball teams.
      [Figure: box labelled S containing overlapping circles VB and BB]




b) The additive counting principle (or rule of sum) applies only to mutually
    exclusive events or items. However, it is possible for students to be on both
    teams. If you simply add the 10 students on the volleyball team to the 15
    students on the basketball team, you get a total of 25 students, which is
    too high because the students who play on both teams have been counted twice.

c) The difference between the total in part b) and the total number of students
    actually on the two teams is equal to the number of students who are
    members of both teams. Thus, 25 − 19 = 6 students play on both teams. In
    the Venn diagram, these 6 students are represented by the area where the
    two circles overlap.

d) There are 10 − 6 = 4 students in the section of the VB circle that
    does not overlap with the BB circle. These are the students who
    play only on the volleyball team. Similarly, the non-overlapping
    portion of the BB circle represents the 15 − 6 = 9 students who
    play only on the basketball team.
    [Figure: Venn diagram with 4 in VB only, 6 in the overlap, and 9 in BB only]
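
The arithmetic in parts c) and d) can be sketched as a short calculation. This is an illustrative sketch only; the function name `venn_regions` and its parameters are hypothetical and do not come from the text.

```python
def venn_regions(n_a, n_b, n_either):
    """Split two overlapping sets into (only A, both, only B) counts."""
    # Simple addition overcounts by exactly the number of common elements.
    both = n_a + n_b - n_either
    return n_a - both, both, n_b - both


only_vb, both, only_bb = venn_regions(10, 15, 19)
print(only_vb, both, only_bb)  # 4 6 9
```

The three regions add up to the 19 students who need transportation, which is a quick check on the diagram.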


Example 1 illustrates the principle of inclusion and exclusion. If you are
counting the total number of elements in two groups or sets that have common
elements, you must subtract the common elements so that they are not included
twice.
Principle of Inclusion and Exclusion for Two Sets
For sets A and B, the total number of elements in either A or B is the number
in A plus the number in B minus the number in both A and B.

 n(A or B) = n(A) + n(B) − n(A and B),
 where n(X) represents the number of elements in a set X.

The set of all elements in either set A or set B is the union of A and B, which is
often written as A ∪ B. Similarly, the set of all elements in both A and B is the
intersection of A and B, written as A ∩ B. Thus the principle of inclusion and
exclusion for two sets can also be stated as
 n(A ∪ B) = n(A) + n(B) − n( A ∩ B)

Note that the additive counting principle (or rule of sum) could be considered
a special case of the principle of inclusion and exclusion that applies only when
sets A and B have no elements in common, so that n(A and B) = 0. The
principle of inclusion and exclusion can also be applied to three or more sets.
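
The two-set formula can be checked with a short Python sketch. The student labels below are hypothetical placeholders, chosen only so that the set sizes match Example 1 (10 on volleyball, 15 on basketball, 6 on both). The sketch compares the formula with a direct count of the union.

```python
# Hypothetical rosters: the labels are placeholders, not real data.
# They are built so the sizes match Example 1.
both = {f"both_{i}" for i in range(1, 7)}                  # 6 students on both teams
volleyball = both | {f"vb_only_{i}" for i in range(1, 5)}   # 6 + 4 = 10 students
basketball = both | {f"bb_only_{i}" for i in range(1, 10)}  # 6 + 9 = 15 students

# Principle of inclusion and exclusion for two sets:
# n(A or B) = n(A) + n(B) - n(A and B)
n_union = len(volleyball) + len(basketball) - len(volleyball & basketball)

print(n_union)                       # 19
print(len(volleyball | basketball))  # 19, counted directly from the union
```

Both counts agree because building the union directly lists each student only once.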




                                                       5.1 Organized Counting With Venn Diagrams • MHR   267
Example 2 Applying the Principle of Inclusion and Exclusion

A drama club is putting on two one-act plays. There are 11 actors in the
Feydeau farce and 7 in the Molière piece.
a) If 3 actors are in both plays, how many actors are there in all?
b)    Use a Venn diagram to calculate how many students are in only one
      of the two plays.

Solution

a) Calculate the total number of students in the two plays using the principle of
      inclusion and exclusion.

      n(total) = n(Feydeau) + n(Molière) − n(Feydeau and Molière)
               = 11 + 7 − 3
               = 15

      There are 15 students involved in the two one-act plays.

b) There are 3 students in the overlap between the two circles. So,
      there must be 11 − 3 = 8 students in the region for Feydeau only
      and 7 − 3 = 4 students in the region for Molière only.

      [Venn diagram: box S containing circles F and M. F only: 8; overlap: 3; M only: 4.]

      Thus, a total of 8 + 4 = 12 students are in only one of the two
      plays.



As in the first example, using a Venn diagram can clarify the relationships
between several sets and subsets.

Example 3 Working With Three Sets

Of the 140 grade 12 students at Vennville High School, 52 have signed up for
biology, 71 for chemistry, and 40 for physics. The science students include 15
who are taking both biology and chemistry, 8 who are taking chemistry and
physics, 11 who are taking biology and physics, and 2 who are taking all three
science courses.
a) How many students are not taking any of these three science courses?
b)    Illustrate the enrolments with a Venn diagram.

Solution

a) Extend the principle of inclusion and exclusion to three sets. Total the
   numbers of students in each course, subtract the numbers of students taking
   two courses, then add the number taking all three. This procedure subtracts
   out the students who have been counted twice because they are in two
   courses, and then adds back those who are in all three courses, since they
   were counted three times and then subtracted three times.

   For simplicity, let B stand for biology, C stand for chemistry, and P stand for
   physics. Then, the total number of students taking at least one of these
   three courses is

   n(total) = n(B) + n(C) + n(P) − n(B and C) − n(C and P) − n(B and P) + n(B and C and P)
            = 52 + 71 + 40 − 15 − 8 − 11 + 2
            = 131

   There are 131 students taking one or more of the three science courses.
   To find the number of grade 12 students who are not taking any of these
   science courses, subtract 131 from the total number of grade 12 students.

   Thus, 140 − 131 = 9 students are not taking any of these three science
   courses in grade 12.

b) For this example, it is easiest to start with the overlap among the
   three courses and then work outward. Since there are 2 students
   taking all three courses, mark 2 in the centre of the diagram where
   the three circles overlap.

   Next, consider the adjacent regions representing the students who
   are taking exactly two of the three courses.

   Biology and chemistry: Of the 15 students taking these two courses,
   2 are also taking physics, so 13 students are taking only biology
   and chemistry.
   Chemistry and physics: 8 students less the 2 in the centre region
   leaves 6.
   Biology and physics: 11 − 2 = 9.

   Now, consider the regions representing students taking only one
   of the science courses.

   Biology: Of the 52 students taking this course, 13 + 2 + 9 = 24 are
   in the regions overlapping with the other two courses, leaving
   28 students who are taking biology only.
   Chemistry: 71 students less the 13 + 2 + 6 leaves 50.
   Physics: 40 − (9 + 2 + 6) = 23.

   [Venn diagram: box S containing circles B, C, and P. B only: 28; C only: 50;
   P only: 23; B and C only: 13; B and P only: 9; C and P only: 6; all three: 2;
   outside the circles: 9.]

   Adding all the numbers within the circles gives a total of 131.
   Thus, there must be 140 − 131 = 9 grade 12 students who are
   not taking any of the three science courses, which agrees with
   the answer found in part a).
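
The region-by-region reasoning in Example 3 can be sketched in Python. The function name `three_set_regions` and its parameter names are illustrative assumptions, not notation from this text. It works outward from the centre of the diagram, as in part b), and also applies the three-set formula from part a).

```python
def three_set_regions(n_b, n_c, n_p, n_bc, n_cp, n_bp, n_bcp, n_universal):
    """Return the count in each region of a three-set Venn diagram."""
    # Regions with exactly two sets: remove the centre from each pairwise overlap.
    bc_only = n_bc - n_bcp
    cp_only = n_cp - n_bcp
    bp_only = n_bp - n_bcp
    # Regions with exactly one set: remove everything that overlaps another set.
    b_only = n_b - (bc_only + bp_only + n_bcp)
    c_only = n_c - (bc_only + cp_only + n_bcp)
    p_only = n_p - (bp_only + cp_only + n_bcp)
    # Principle of inclusion and exclusion for three sets.
    at_least_one = n_b + n_c + n_p - n_bc - n_cp - n_bp + n_bcp
    return {
        "b_only": b_only, "c_only": c_only, "p_only": p_only,
        "bc_only": bc_only, "cp_only": cp_only, "bp_only": bp_only,
        "all_three": n_bcp,
        "at_least_one": at_least_one,
        "none": n_universal - at_least_one,
    }

# Enrolment figures from Example 3.
regions = three_set_regions(52, 71, 40, 15, 8, 11, 2, 140)
print(regions["at_least_one"])  # 131
print(regions["none"])          # 9
```

Adding the seven regions inside the circles gives the same 131 as the formula, which is the check made at the end of part b).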



Key Concepts

   • Venn diagrams can help you visualize the relationships between sets of items,
     especially when the sets have some items in common.

   • The principle of inclusion and exclusion gives a formula for finding the
     number of items in the union of two or more sets. For two sets, the formula
     is n(A or B) = n(A) + n(B) − n(A and B).


   Communicate Your Understanding

      1. Describe the principal use of Venn diagrams.

      2. Is the universal set the same for all Venn diagrams? Explain why or
         why not.

      3. Explain why the additive counting principle can be used in place of the
         principle of inclusion and exclusion for mutually exclusive sets.



Practise

 A
 1. Let set A consist of an apple, an orange,
    and a pear and set B consist of the apple
    and a banana.

    a) List the elements of
       i)   A and B
       ii)  A or B
       iii) S
       iv)  S ∩ B
       v)   A ∪ B ∪ S

    b) List the value of
       i)   n(A) + n(B)
       ii)  n(A or B)
       iii) n(S)
       iv)  n(A ∪ B)
       v)   n(S ∩ A)

    c) List all subsets containing exactly two
       elements for
       i)   A
       ii)  B
       iii) A ∪ B

 2. A recent survey of a group of students found
    that many participate in baseball, football,
    and soccer. The Venn diagram below shows
    the results of the survey.

    [Venn diagram: three overlapping circles labelled Football, Baseball, and
    Soccer inside the universal set S, with region counts 27, 8, 10, 6, 3, 4,
    and 19 inside the circles and 5 outside them.]

    a) How many students participated in the survey?
    b) How many of these students play both soccer and baseball?
    c) How many play only one sport?
    d) How many play football and soccer?
    e) How many play all three sports?
    f) How many do not play soccer?

Apply, Solve, Communicate

B
3. Of the 220 graduating students in a school,
    110 attended the semi-formal dance and 150
    attended the formal dance. If 58 students
    attended both events, how many graduating
    students did not attend either of the two
    dances? Illustrate your answer with a Venn
    diagram.

4. Application A survey of 1000 television
    viewers conducted by a local television
    station produced the following data:
    •   40% watch the news at 12 00
    •   60% watch the news at 18 00
    •   50% watch the news at 23 00
    •   25% watch the news at 12 00 and at 18 00
    •   20% watch the news at 12 00 and at 23 00
    •   20% watch the news at 18 00 and at 23 00
    •   10% watch all three news broadcasts
    a) What percent of those surveyed watch at
         least one of these programs?
    b) What percent watch none of these news
         broadcasts?
    c) What percent view the news at 12 00 and
         at 18 00, but not at 23 00?
    d) What percent view only one of these
         shows?
    e) What percent view exactly two of these
         shows?

5. Suppose the Canadian Embassy in the
    Netherlands has 32 employees, all of whom
    speak both French and English. In addition,
    22 of the employees speak German and 15
    speak Dutch. If there are 10 who speak both
    German and Dutch, how many of the
    employees speak neither German nor
    Dutch? Illustrate your answer with a Venn
    diagram.

6. Application There are 900 employees at
    CantoCrafts Inc. Of these, 615 are female,
    345 are under 35 years old, 482 are single,
    295 are single females, 187 are singles
    under 35 years old, 190 are females
    under 35 years old, and 120 are single
    females under 35 years old. Use a Venn
    diagram to determine how many employees
    are married males who are at least
    35 years old.

7. Communication A survey of 100 people who
    volunteered information about their reading
    habits showed that
    •   75 read newspapers daily
    •   35 read books at least once a week
    •   45 read magazines regularly
    •   25 read both newspapers and books
    •   15 read both books and magazines
    •   10 read newspapers, books, and
        magazines
    a) Construct a Venn diagram to determine
         the maximum number of people in the
         survey who read both newspapers and
         magazines.
    b) Explain why you cannot determine
         exactly how many of the people
         surveyed read both newspapers and
         magazines.

8. Chapter Problem Jeffrey works as a DJ at a local radio station.
    On occasion, he chooses some of the songs
    he will play based on the phone-in requests
    received by the switchboard the previous
    day. Jeffrey’s list of 200 possible selections
    includes
    •   all the songs in the top 100
    •   134 hard-rock songs
    •   50 phone-in requests
    •   45 hard-rock songs in the top 100
    •   20 phone-in requests in the top 100
    •   24 phone-in requests for hard-rock songs
    Use a Venn diagram to determine
    a) how many phone-in requests were for
         hard-rock songs in the top 100
    b) how many of the songs in the top 100
         were neither phone-in requests nor
         hard-rock selections

C
9. Inquiry/Problem Solving The Vennville
    junior hockey team has 12 members who
    can play forward, 8 who can play defence,
    and 2 who can be goalies. What is the
    smallest possible size of the team if
    a) no one plays more than one position?
    b) no one plays both defence and forward?
    c) three of the players are able to play
         defence or forward?
    d) both the goalies can play forward but not
         defence?

10. Inquiry/Problem Solving Use the principle of
    inclusion and exclusion to develop a formula
    for the number of elements in
    a) three sets    b) four sets     c) n sets



                Career Connection
                Forensic Scientist

               The field of forensic science could be attractive to those with a
               mathematics and science background. The job of a forensic scientist is to
               identify, analyse, and match items collected from crime scenes.

               Forensic scientists most often work in a forensic laboratory. Such
               laboratories examine and analyse physical evidence, including controlled
               substances, biological materials, firearms and ammunition components,
               and DNA samples.

               Forensic scientists may have specialities such as fingerprints, ballistics,
               clothing and fibres, footprints, tire tracks,
               DNA profiling, or crime scene analysis.

               Modern forensic science combines
               mathematics and computers. A forensic
               scientist should have a background in
               combinatorics, biology, and the physical
               sciences. Forensic scientists work for a wide
               variety of organizations including police forces,
               government offices, and the military.

               www.mcgrawhill.ca/links/MDM12
               For more information about forensic science
               and other careers related to mathematics, visit
               the above web site and follow the links. Write a
               brief description of how combinatorics could be
               used by forensic scientists.




5.2           Combinations

  In Chapter 4, you learned about permutations—arrangements in which the
  order of the items is specified. However, in many situations, order does not
  matter. For example, in many card games, what is in your hand is important, but
  the order in which it was dealt is not.

       INVESTIGATE & INQUIRE: Students’ Council

      Suppose the students at a secondary school elect a council of eight
      members, two from each grade. This council then chooses two of its
      members as co-chairpersons. How could you calculate the number of
      different pairs of members who could be chosen as the co-chairs?

      Choose someone in the class to record your answers to the following
      questions on a blackboard or an overhead projector.

      a)   Start with the simplest case. Choose two students to stand at the front
           of the class. In how many ways can you choose two co-chairs from this
           pair of students?
      b) Choose three students to be at the front of the class. In how many ways
           can you choose two co-chairs from this trio?
      c)   In how many ways can you choose two co-chairs from a group of four
           students?
      d) In how many ways can you choose two
           co-chairs from a group of five students?
           Do you see a pattern developing? If so,
           what is it? If not, try choosing from a
           group of six students and then from a
           group of seven students while
           continuing to look for a pattern.
      e)   When you see a pattern, predict the
           number of ways two co-chairs can be
           chosen from a group of eight students.
      f)   Can you suggest how you could find
           the answers for this investigation from
           the numbers of permutations you
           found in the investigation in section
           4.2?




                                                                                5.2 Combinations • MHR   273
In the investigation on the previous page, you were dealing with a situation in
which you were selecting two people from a group, but the order in which you
chose the two did not matter. In a permutation, there is a difference between
selecting, say, Bob as president and Margot as vice-president as opposed to
selecting Margot as president and Bob as vice-president. If you select Bob and
Margot as co-chairs, the order in which you select them does not matter since
they will both have the same job.

A selection from a group of items without regard to order is called a
combination.

Example 1 Comparing Permutations and Combinations

a)    In how many ways could Alana, Barbara, Carl, Domenic, and Edward fill
      the positions of president, vice-president, and secretary?
b)    In how many ways could these same five people form a committee with
      three members? List the ways.
c)    How are the numbers of ways in parts a) and b) related?

Solution

a) Since the positions are different, order is important. Use a permutation, nPr.
      There are five people to choose from, so n = 5. There are three people
      being chosen, so r = 3. The number of permutations is 5P3 = 60.

      There are 60 ways Alana, Barbara, Carl, Domenic, and Edward could fill
      the positions of president, vice-president, and secretary.

b) The easiest way to find all committee combinations is to write them in an
      ordered fashion. Let A represent Alana, B represent Barbara, C represent
      Carl, D represent Domenic, and E represent Edward.

      The possible combinations are:
      ABC       ABD       ABE        ACD              ACE
      ADE       BCD       BCE        BDE              CDE

      All other possible arrangements include the same three people as one of the
      combinations listed above. For example, ABC is the same as ACB, BAC,
      BCA, CAB, and CBA since order is not important.

      So, there are only ten ways Alana, Barbara, Carl, Domenic and Edward can
      form a three-person committee.




c) In part a), there were 60 possible permutations, while in part b), there were
   10 possible combinations. The difference is a factor of 6. This factor is
   ${}_3P_3 = 3!$, the number of possible arrangements of the three people in each
   combination. Thus,

   \[
   \begin{aligned}
   \text{number of combinations}
     &= \frac{\text{number of permutations}}{\text{number of permutations of the objects selected}} \\
     &= \frac{{}_5P_3}{3!} \\
     &= \frac{60}{6} \\
     &= 10
   \end{aligned}
   \]
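
The relationship in part c) can be checked by listing the selections with Python's standard `itertools` module. This is only a sketch: the single-letter labels follow the abbreviations used in part b), and the variable names are illustrative.

```python
from itertools import combinations, permutations
from math import factorial

people = ["A", "B", "C", "D", "E"]  # Alana, Barbara, Carl, Domenic, Edward

# Order matters: president, vice-president, secretary.
officer_slates = list(permutations(people, 3))
# Order does not matter: a three-member committee.
committees = list(combinations(people, 3))

print(len(officer_slates))               # 60
print(len(committees))                   # 10
print(["".join(c) for c in committees])  # ABC, ABD, ABE, ACD, ACE, ADE, ...

# Each committee can be arranged in 3! = 6 orders, so 60 = 10 x 6.
assert len(officer_slates) == len(committees) * factorial(3)
```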

Combinations of n distinct objects taken r at a time
The number of combinations of r objects chosen from a set of n distinct objects is

\[
{}_nC_r = \frac{{}_nP_r}{r!}
        = \frac{\dfrac{n!}{(n-r)!}}{r!}
        = \frac{n!}{(n-r)!\,r!}
\]

The notations nCr, C(n, r), and $\binom{n}{r}$ are all equivalent. Many people prefer the form
$\binom{n}{r}$ when a number of combinations are multiplied together. The symbol nCr is used
most often in this text since it is what appears on most scientific and graphing calculators.
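
As a minimal sketch, the factorial formula can be written as a small Python function. The name `n_choose_r` is an illustrative choice, not a built-in. Python 3.8 and later also provide `math.comb`, which is used here only as a cross-check.

```python
from math import comb, factorial

def n_choose_r(n, r):
    """Number of combinations of r objects from n distinct objects: n! / ((n - r)! r!)."""
    if not 0 <= r <= n:
        raise ValueError("this sketch requires 0 <= r <= n")
    # Integer division is exact because the result is always a whole number.
    return factorial(n) // (factorial(n - r) * factorial(r))

print(n_choose_r(5, 3))  # 10, the committee count from Example 1
print(comb(5, 3))        # 10, the same value from the standard library
```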

Example 2 Applying the Combinations Formula
How many different sampler dishes with 3 different flavours could you get
at an ice-cream shop with 31 different flavours?

Solution
There are 31 flavours, so n = 31. The sampler dish has 3 flavours, so r = 3.
\[
\begin{aligned}
{}_{31}C_3 &= \frac{31!}{(31-3)!\,3!} \\
           &= \frac{31!}{28!\,3!} \\
           &= \frac{31 \times 30 \times 29}{3 \times 2} \\
           &= 4495
\end{aligned}
\]
There are 4495 possible sampler combinations.

Note that the number of combinations in Example 2 was easy to calculate
because the number of items chosen, r, was quite small.

Example 3 Calculating Numbers of Combinations Manually

A ballet choreographer wants 18 dancers for a scene.
a)    In how many ways can the choreographer choose the dancers if the
      company has 20 dancers? 24 dancers?
b)    How would you advise the choreographer to choose the dancers?

Solution

a) When n and r are close in value, nCr can be calculated mentally.
      With n = 20 and r = 18,
      \[
      \begin{aligned}
      {}_{20}C_{18} &= \frac{20!}{(20-18)!\,18!} \\
                    &= \frac{20 \times 19}{2!} \\
                    &= 190
      \end{aligned}
      \]

      Mentally: 20 ÷ 2 = 10. Then, 10 × 19 = 190.

      The choreographer could choose from 190 different combinations of the
      20 dancers.

      With n = 24 and r = 18, nCr can be calculated manually or with a basic
      calculator once you have divided out the common terms in the factorials.
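      \[
      \begin{aligned}
      {}_{24}C_{18} &= \frac{24!}{(24-18)!\,18!} \\
                    &= \frac{24 \times 23 \times 22 \times 21 \times 20 \times 19}{6!} \\
                    &= \frac{24 \times 23 \times 22 \times 21 \times 20 \times 19}{6 \times 5 \times 4 \times 3 \times 2 \times 1} \\
                    &= 23 \times 11 \times 7 \times 4 \times 19 \\
                    &= 134\,596
      \end{aligned}
      \]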

      With the 4 additional dancers, the choreographer now has a choice of
      134 596 combinations.

b) From part a), you can see that it would be impractical for the choreographer
      to try every possible combination. Instead the choreographer could use an
      indirect method and try to decide which dancers are least likely to be
      suitable for the scene.
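
Dividing out the common terms in the factorials, as in part a), can be sketched in Python. The function name `n_choose_r_cancelled` is an illustrative assumption. The sketch cancels the larger of r! and (n − r)! and then builds the answer one factor at a time, so it never forms a full factorial.

```python
def n_choose_r_cancelled(n, r):
    """Evaluate nCr after cancelling the larger factorial in the denominator."""
    if not 0 <= r <= n:
        raise ValueError("this sketch requires 0 <= r <= n")
    k = min(r, n - r)  # number of factors left after cancelling the larger factorial
    result = 1
    for i in range(1, k + 1):
        # After step i the running value is itself a number of combinations,
        # (n - k + i) objects taken i at a time, so the division is exact.
        result = result * (n - k + i) // i
    return result

print(n_choose_r_cancelled(20, 18))  # 190
print(n_choose_r_cancelled(24, 18))  # 134596
```

For 20 dancers only two factors remain, which is why that case can be done mentally, while 24 dancers leaves six factors.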




Even though there are fewer combinations of n objects than there are permutations,
the numbers of combinations are often still too large to calculate manually.

Example 4 Using Technology to Calculate Numbers of Combinations

Each player in a bridge game receives a hand of 13 cards dealt from a standard
deck. How many different bridge hands are possible?

For details of calculator and software functions, refer to Appendix B.

Solution 1     Using a Graphing Calculator

Here, the order in which the player receives the cards does not matter. What
you want to determine is the number of different combinations of cards a player
could have once the dealing is complete. So, the answer is simply 52C13. You
can evaluate 52C13 by using the nCr function on the MATH PRB menu of a
graphing calculator. This function is similar to the nPr function used for
permutations.

There are about 635 billion possible bridge hands.

Solution 2     Using a Spreadsheet

Most spreadsheet programs have a combinations function for calculating numbers of
combinations. In Microsoft® Excel, this function is the COMBIN(n,r) function. In
Corel® Quattro® Pro, this function is the COMB(r,n) function.
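
A programming language can serve the same purpose as the calculator and spreadsheet functions. As a sketch that goes beyond the tools named in this example, Python 3.8 and later provide `math.comb(n, r)` in the standard library.

```python
from math import comb

# Number of 13-card hands that can be dealt from a standard 52-card deck.
bridge_hands = comb(52, 13)

print(f"{bridge_hands:,}")  # 635,013,559,600
```

The exact value agrees with the estimate of about 635 billion possible bridge hands.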




You now have a variety of methods for finding numbers of combinations:
paper-and-pencil calculations, factorials, scientific or graphing calculators, and
software. When appropriate, you can also apply both of the counting
principles described in Chapter 4.

Project Prep
Techniques for calculating numbers of combinations could be helpful for
designing the game in your probability project, especially if your game uses cards.

Example 5 Using the Counting Principles With Combinations

Ursula runs a small landscaping business. She has on hand 12 kinds of rose
bushes, 16 kinds of small shrubs, 11 kinds of evergreen seedlings, and 18 kinds
of flowering lilies. In how many ways can Ursula fill an order if a customer
wants
a) 15 different varieties consisting of 4 roses, 3 shrubs, 2 evergreens,
    and 6 lilies?
b)   either 4 different roses or 6 different lilies?

Solution

a) The order in which Ursula chooses the plants does not matter.

      The number of ways of choosing the roses is 12C4.
      The number of ways of choosing the shrubs is 16C3.
      The number of ways of choosing the evergreens is 11C2.
      The number of ways of choosing the lilies is 18C6.

      Since varying the rose selection for each different selection of the shrubs,
      evergreens, and lilies produces a different choice of plants, you can apply
      the fundamental (multiplicative) counting principle. Multiply the series of
      combinations to find the total number of possibilities.

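      \[
      \begin{aligned}
      {}_{12}C_4 \times {}_{16}C_3 \times {}_{11}C_2 \times {}_{18}C_6
        &= 495 \times 560 \times 55 \times 18\,564 \\
        &= 2.830\,267\,44 \times 10^{11}
      \end{aligned}
      \]

      Ursula has over 283 billion ways of choosing the plants for her customer.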


b) Ursula can choose the 4 rose bushes in 12C4 ways.

      She can choose the 6 lilies in 18C6 ways.

      Since the customer wants either the rose bushes or the lilies, you can apply
      the additive counting principle to find the total number of possibilities.

      \[
      \begin{aligned}
      {}_{12}C_4 + {}_{18}C_6 &= 495 + 18\,564 \\
                              &= 19\,059
      \end{aligned}
      \]

      Ursula can fill the order for either roses or lilies in 19 059 ways.
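
Both parts of Example 5 can be reproduced with a short Python sketch. The variable names are illustrative, and `math.comb` (Python 3.8 and later) stands in for the nCr function of a calculator.

```python
from math import comb

roses = comb(12, 4)       # 495 ways to choose 4 of 12 roses
shrubs = comb(16, 3)      # 560 ways to choose 3 of 16 shrubs
evergreens = comb(11, 2)  # 55 ways to choose 2 of 11 evergreens
lilies = comb(18, 6)      # 18 564 ways to choose 6 of 18 lilies

# Part a): every choice is made, so multiply (multiplicative counting principle).
full_order = roses * shrubs * evergreens * lilies
# Part b): one choice or the other is made, so add (additive counting principle).
roses_or_lilies = roses + lilies

print(full_order)       # 283026744000
print(roses_or_lilies)  # 19059
```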

As you can see, even relatively simple situations can produce very large
numbers of combinations.

   Key Concepts

   • A combination is a selection of objects in which order is not important.

   • The number of combinations of n distinct objects taken r at a time is denoted
     as nCr, C(n, r), or $\binom{n}{r}$ and is equal to $\dfrac{n!}{(n-r)!\,r!}$.
   • The multiplicative and additive counting principles can be applied to problems
     involving combinations.


Communicate Your Understanding

    1. Explain why n objects have more possible permutations than combinations.
         Use a simple example to illustrate your explanation.

    2. Explain whether you would use combinations, permutations, or another
         method to calculate the number of ways of choosing
         a) three items from a menu of ten items
         b) an appetizer, an entrée, and a dessert from a menu with three appetizers,
               four entrées, and five desserts

    3. Give an example of a combination expression you could calculate
         a) by hand
         b) algebraically
         c) only with a calculator or computer




Practise

A
1. Evaluate using a variety of methods.
    Explain why you chose each method.
    a) 21C19          b) 30C28
    c) 18C5           d) 16C3
    e) 19C4           f) 25C20

2. Evaluate the following pairs of combinations
    and compare their values.
    a) 11C1, 11C10
    b) 11C2, 11C9
    c) 11C3, 11C8

Apply, Solve, Communicate

B
3. Communication In how many ways could you
    choose 2 red jellybeans from a package of
    15 red jellybeans? Explain your reasoning.

4. How many ways can 4 cards be chosen from
    a deck of 52, if the order in which they are
    chosen does not matter?

5. How many groups of 3 toys can a child
    choose to take on a vacation from a toy box
    containing 11 toys?

6. How many sets of 6 questions for a test can
    be chosen from a list of 22 questions?

7. In how many ways can a teacher select
    5 students from a class of 23 to make a
    bulletin-board display? Explain your
    reasoning.

8. As a promotion, a video store decides to give
    away posters for recently released movies.
    a) If posters are available for 27 recent
        releases, in how many ways could the
        video-store owner choose 8 different
        posters for the promotion?
    b) Are you able to calculate the number
        of ways mentally? Why or why not?




9. Communication A club has 11 members.
    a) How many different 2-member committees could the club form?
    b) How many different 3-member committees could the club form?
    c) In how many ways can a club president, treasurer, and secretary be
       chosen?
    d) By what factor do the answers in parts b) and c) differ? How do you
       account for this difference?

10. Fritz has a deck of 52 cards, and he may choose any number of these
    cards, from none to all. Use a spreadsheet or Fathom™ to calculate and
    graph the number of combinations for each of Fritz’s choices.

11. Application A track club, a swim club, and a cycling club are forming a
    joint committee to organize a triathlon. The committee will have two
    members from each club. In how many ways can the committee be formed if
    ten runners, eight swimmers, and seven cyclists volunteer to serve on it?

12. In how many ways can a jury of 6 men and 6 women be chosen from a group
    of 10 men and 15 women?

13. Inquiry/Problem Solving There are 15 technicians and 11 chemists working
    in a research laboratory. In how many ways could they form a 5-member
    safety committee if the committee
    a) may be chosen in any way?
    b) must have exactly one technician?
    c) must have exactly one chemist?
    d) must have exactly two chemists?
    e) may be all technicians or all chemists?

14. Chapter Problem Jeffrey, a DJ at a local radio station, is choosing the
    music he will play on his shift. He must choose all his music from the
    top 100 songs for the week and he must play at least 12 songs an hour.
    In his first hour, all his choices must be from the top-20 list.
    a) In how many ways can Jeffrey choose the songs for his first hour if
       he wants to choose exactly 12 songs?
    b) In how many ways can Jeffrey choose the 12 songs if he wants to pick
       8 of the top 10 and 4 from the songs listed from 11 to 20 on the
       chart?
    c) In how many ways can Jeffrey choose either 12 or 13 songs to play in
       the first hour of his shift?
    d) In how many ways can Jeffrey choose the songs if he wants to play up
       to 15 songs in the first hour?

15. The game of euchre uses only 24 of the cards from a standard deck. How
    many different five-card euchre hands are possible?

16. Application A taxi is shuttling 11 students to a concert. The taxi can
    hold only 4 students. In how many ways can 4 students be chosen for
    a) the taxi’s first trip?
    b) the taxi’s second trip?

17. Diane is making a quilt. She needs three pieces with a yellow undertone,
    two pieces with a blue undertone, and four pieces with a white
    undertone. If she has six squares with a yellow undertone, five with a
    blue undertone, and eight with a white undertone to choose from, in how
    many ways can she choose the squares for the quilt?




18. Inquiry/Problem Solving At a family reunion, everyone greets each other
    with a handshake. If there are 20 people at the reunion, how many
    handshakes take place?

ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

19. A basketball team consists of five players: one centre, two forwards,
    and two guards. The senior squad at Vennville Central High School has
    two centres, six forwards, and four guards.
    a) How many ways can the coach pick the two starting guards for a game?
    b) How many different starting lineups are possible if all team members
       play their specified positions?
    c) How many of these starting lineups include Dana, the team’s 185-cm
       centre?
    d) Some coaches designate the forwards as power forward and small
       forward. If all six forwards are adept in either position, how would
       this designation affect the number of possible starting lineups?
    e) As the league final approaches, the centre Dana, forward Ashlee, and
       guard Hollie are all down with a nasty flu. Fortunately, the five
       healthy forwards can also play the guard position. If the coach can
       assign these players as either forwards or guards, will the number of
       possible starting lineups be close to the number in part b)? Support
       your answer mathematically.
    f) Is the same result achieved if the forwards play their regular
       positions but the guards can play as either forwards or guards?
       Explain your answer.

20. In the game of bridge, each player is dealt a hand of 13 cards from a
    standard deck of 52 cards.
    a) By what factor does the number of possible bridge hands differ from
       the number of ways a bridge hand could be dealt to a player? Explain
       your reasoning.
    b) Use combinations to write an expression for the number of bridge
       hands that have exactly five clubs, two spades, three diamonds, and
       three hearts.
    c) Use combinations to write an expression for the number of bridge
       hands that have exactly five hearts.
    d) Use software or a calculator to evaluate the expressions in parts b)
       and c).

C
21. There are 18 students involved in the class production of Arsenic and
    Old Lace.
    a) In how many ways can the teacher cast the play if there are five male
       roles and seven female roles and the class has nine male and nine
       female students?
    b) In how many ways can the teacher cast the play if Jean will play the
       young female part only if Jovane plays the male lead?
    c) In how many ways can the teacher cast the play if all the roles could
       be played by either a male or a female student?

22. A large sack contains six basketballs and five volleyballs. Find the
    number of combinations of four balls that can be chosen from the sack if
    a) they may be any type of ball
    b) two must be volleyballs and two must be basketballs
    c) all four must be volleyballs
    d) none may be volleyballs



5.3             Problem Solving With Combinations

  In the last section, you considered the number of ways of choosing r items from
  a set of n distinct items. This section will examine situations where you want to
  know the total number of possible combinations of any size that you could
  choose from a given number of items, some of which may be identical.

        INVESTIGATE & INQUIRE: Combinations of Coins

        1. a) How many different sums of money can you create with a
              penny and a nickel? List these sums.
           b) How many different sums can you create
              with a penny, a nickel, and a dime? List
              them.
           c) Predict how many different sums you
              can create with a penny, a nickel,
              a dime, and a quarter. Test your
              conjecture by listing the possible sums.
        2. a) How many different sums of money
              can you create with two pennies
              and a dime? List them.
           b) How many different sums can you
              create with three pennies and a dime?
           c) Predict how many sums you can create with
              four pennies and a dime. Test your conjecture.
              Can you see a pattern developing? If so, what is it?



  Example 1 All Possible Combinations of Distinct Items

  An artist has an apple, an orange, and a pear in his refrigerator. In how many
  ways can the artist choose one or more pieces of fruit for a still-life painting?

  Solution
  The artist has two choices for each piece of fruit: either include it in the
  painting or leave it out. Thus, the artist has a total of 2 × 2 × 2 = 8 choices.
  Note that one of these choices is to leave out the apple, the orange, and the
  pear. However, the artist wants at least one piece of fruit in his painting.
  Thus, he has $2^3 - 1 = 7$ combinations to choose from.



You can apply the same logic to any group of distinct items.

The total number of combinations containing at least one item chosen
from a group of n distinct items is $2^n - 1$.

Remember that combinations are subsets of the group of n objects.
A null set is a set that has no elements. Thus,

A set with n distinct elements has $2^n$ subsets, including the null set.


Example 2 Applying the Formula for Numbers of Subsets

In how many ways can a committee with at least one member be appointed
from a board with six members?

Solution

The board could choose 1, 2, 3, 4, 5, or 6 people for the committee, so n = 6.
Since the committee must have at least one member, use the formula that
excludes the null set.

$2^6 - 1 = 64 - 1 = 63$

There are 63 ways to choose a committee of at least one person from a
six-member board.
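
The counts in Examples 1 and 2 can be checked by listing every selection of at least one item. The following is a minimal Python sketch, not part of the original text; the function name `nonempty_subsets` and the board-member labels A to F are made up for illustration.

```python
from itertools import combinations

def nonempty_subsets(items):
    """Return every selection of one or more of the given distinct items."""
    items = list(items)
    return [
        subset
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]

fruit = ["apple", "orange", "pear"]        # the artist's fruit in Example 1
board = ["A", "B", "C", "D", "E", "F"]     # hypothetical labels for Example 2

fruit_choices = nonempty_subsets(fruit)
board_choices = nonempty_subsets(board)

# Each listed count should agree with 2**n - 1 for the same n.
print(len(fruit_choices), 2 ** len(fruit) - 1)
print(len(board_choices), 2 ** len(board) - 1)
```

Listing the selections becomes impractical quickly, since the number of subsets doubles with each added item, which is why the formula is the usual tool.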

Example 3 All Possible Combinations With Some Identical Items

Kate is responsible for stocking the coffee room at her office. She can purchase
up to three cases of cookies, four cases of soft drinks, and two cases of coffee
packets without having to send the order through the accounting department.
How many different direct purchases can Kate make?

Solution

Kate can order more than one of each kind of item, so this situation involves
combinations in which some items are alike.
• Kate may choose to buy three or two or one or no cases of cookies, so she
  has four ways to choose cookies.
• Kate may choose to buy four or three or two or one or no cases of soft
  drinks, so she has five ways to choose soft drinks.
• Kate may choose to buy two or one or no cases of coffee packets, so she has
  three ways to choose coffee.



                                                            5.3 Problem Solving With Combinations • MHR   283
[Tree diagram: four branches for cookies (0, 1, 2, or 3 cases); each splits into five branches for soft drinks (0 to 4 cases); each of those splits into three branches for coffee (0, 1, or 2 cases).]


As shown on the first branch of the diagram above, one of these choices is
purchasing no cookies, no soft drinks, and no coffee. Since this choice is not
a purchase at all, subtract it from the total number of choices.

Thus, Kate can make 4 × 5 × 3 − 1 = 59 different direct purchases.



In a situation where you can choose all, some, or none of the p items available,
you have p + 1 choices. You can then apply the fundamental (multiplicative)
counting principle if you have successive choices of different kinds of items.
Always consider whether the choice of not picking any items makes sense. If it
does not, subtract 1 from the total.

Combinations of Items in Which Some are Alike

If at least one item is chosen, the total number of selections that can be
made from p items of one kind, q items of another kind, r items of
another kind, and so on, is ( p + 1)(q + 1)(r + 1) … − 1
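
One way to test this rule is to list every possible order and count the ones that contain at least one item. The sketch below does this for Kate's limits in Example 3. It is an illustration only; the function name `count_selections` and its `allow_empty` parameter are assumptions, not part of the text.

```python
from itertools import product

def count_selections(limits, allow_empty=False):
    """Count selections where kind i can be taken 0, 1, ..., limits[i] times."""
    selections = list(product(*(range(limit + 1) for limit in limits)))
    if not allow_empty:
        # Drop the single selection that takes nothing of any kind.
        selections = [s for s in selections if any(s)]
    return len(selections)

kate_limits = (3, 4, 2)   # cases of cookies, soft drinks, coffee packets
kate_purchases = count_selections(kate_limits)
formula = (3 + 1) * (4 + 1) * (2 + 1) - 1

print(kate_purchases, formula)   # the two counts should agree
```

Setting `allow_empty=True` keeps the "buy nothing" choice, which corresponds to leaving off the final subtraction of 1.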

Having identical elements in a set reduces the number of possible combinations
when you choose r items from that set. You cannot calculate this number by
simply dividing by a factorial as you did with permutations in section 4.3. Often,
you have to consider a large number of cases individually. However, some
situations have restrictive conditions that make it much easier to count the
number of possible combinations.
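
To see how identical items reduce the count, consider a made-up collection of five letters: two a's, two b's, and one c. The Python sketch below, which is an illustration and not from the text, compares the number of ways to choose three letters when all five are treated as different with the number of selections that actually look different.

```python
from itertools import combinations
from math import comb

letters = sorted("aabbc")   # hypothetical collection: a, a, b, b, c

# combinations() treats the two a's (and the two b's) as different objects,
# so putting the results in a set merges selections that look the same.
distinct_selections = set(combinations(letters, 3))

print(comb(len(letters), 3))       # count if all five letters were distinct
print(len(distinct_selections))    # count once identical letters are merged
print(sorted(distinct_selections))
```

The drop from the first count to the second is not a simple factorial ratio, which is why such problems are usually handled case by case.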


Example 4 Combinations With Some Identical Items

The director of a short documentary has found five rock songs, two blues tunes,
and three jazz pieces that suit the theme of the film. In how many ways can the
director choose three pieces for the soundtrack if she wants the film to include
some jazz?




Solution 1   Counting Cases

The director can select exactly one, two, or three jazz pieces.
Case 1: One jazz piece
        The director can choose the one jazz piece in $_3C_1$ ways and two of
        the seven non-jazz pieces in $_7C_2$ ways. Thus, there are
        $_3C_1 \times {}_7C_2 = 63$ combinations of music with one jazz piece.
Case 2: Two jazz pieces
        The director can choose the two jazz pieces in $_3C_2$ ways and one of
        the seven non-jazz pieces in $_7C_1$ ways. There are
        $_3C_2 \times {}_7C_1 = 21$ combinations with two jazz pieces.
Case 3: Three jazz pieces
        The director can choose the three jazz pieces and none of the seven
        non-jazz pieces in only one way: $_3C_3 \times {}_7C_0 = 1$.

The total number of combinations with at least one jazz piece is 63 + 21 + 1 = 85.

Solution 2   Indirect Method
You can find the total number of possible combinations of three pieces of music
and subtract those that do not have any jazz.

The total number of ways of choosing any three pieces from the ten available is
$_{10}C_3 = 120$. The number of ways of not picking any jazz, that is, choosing only
from the non-jazz pieces, is $_7C_3 = 35$.

Thus, the number of ways of choosing at least one jazz piece is 120 − 35 = 85.
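
Both solutions can be compared against a direct listing of every three-piece soundtrack. The following Python sketch is one way to do so; the piece labels such as `rock0` and `jazz2` are placeholders, since only the numbers of pieces in each category come from Example 4.

```python
from itertools import combinations
from math import comb

# Placeholder labels: five rock songs, two blues tunes, three jazz pieces.
pieces = (
    [f"rock{i}" for i in range(5)]
    + [f"blues{i}" for i in range(2)]
    + [f"jazz{i}" for i in range(3)]
)

def has_jazz(selection):
    """True if the selection contains at least one jazz piece."""
    return any(name.startswith("jazz") for name in selection)

# Direct listing: keep only the three-piece selections that include jazz.
brute_force = sum(1 for s in combinations(pieces, 3) if has_jazz(s))

# Solution 1: add the cases with exactly 1, 2, or 3 jazz pieces.
by_cases = sum(comb(3, k) * comb(7, 3 - k) for k in (1, 2, 3))

# Solution 2: all selections minus those with no jazz.
indirect = comb(10, 3) - comb(7, 3)

print(brute_force, by_cases, indirect)
```

The indirect method needs only two combination values, while the number of cases in Solution 1 grows with the size of the selection.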

Here is a summary of ways to approach questions involving choosing or
selecting objects.
Is order important?
Yes: Use permutations. Can the same objects be selected more than once
     (like digits for a telephone number)?
     Yes: Use the fundamental counting principle.
     No: Are some of the objects identical?
         Yes: Use the formula $\frac{n!}{a!\,b!\,c!\cdots}$.
         No: Use $_nP_r = \frac{n!}{(n - r)!}$.
No: Use combinations. Are you choosing exactly r objects?
    Yes: Could some of the objects be identical?
         Yes: Count the individual cases.
         No: Use $_nC_r = \frac{n!}{(n - r)!\,r!}$.
    No: Are some of the objects identical?
        Yes: Use $(p + 1)(q + 1)(r + 1) - 1$ to find the total number of
             combinations with at least one object.
        No: Use $2^n$ to find the total number of combinations; subtract 1
            if you do not want to include the null set.
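
Each formula in this summary can be written as a short function. The sketch below is illustrative only: the function names are made up, the small input values are chosen just to show each branch, and it assumes Python 3.8 or later for `math.comb`, `math.perm`, and `math.prod`. The "count the individual cases" branch has no single formula, so it is not included.

```python
from math import comb, factorial, perm, prod

def arrangements_with_identical(*group_sizes):
    """n!/(a!b!c!...): arrangements of n objects with groups of identical ones."""
    n = sum(group_sizes)
    return factorial(n) // prod(factorial(size) for size in group_sizes)

def selections_at_least_one(*group_sizes):
    """(p + 1)(q + 1)(r + 1)... - 1: selections of at least one item."""
    return prod(size + 1 for size in group_sizes) - 1

def subsets(n, include_null_set=True):
    """2**n subsets of n distinct items, optionally without the null set."""
    return 2 ** n if include_null_set else 2 ** n - 1

print(10 ** 3)                              # order matters, repeats allowed: 3 digits
print(arrangements_with_identical(2, 1))    # arrangements of a, a, b
print(perm(5, 2))                           # nPr with n = 5, r = 2
print(comb(5, 2))                           # nCr with n = 5, r = 2
print(selections_at_least_one(3, 4, 2))     # Kate's purchases in Example 3
print(subsets(6, include_null_set=False))   # committees in Example 2
```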



                                                           5.3 Problem Solving With Combinations • MHR   285
Key Concepts

  • Use the formula $(p + 1)(q + 1)(r + 1) \cdots - 1$ to find the total number of
    selections of at least one item that can be made from p items of one kind,
    q of a second kind, r of a third kind, and so on.

  • A set with n distinct elements has $2^n$ subsets, including the null set.

  • For combinations with some identical elements, you often have to consider
    all possible cases individually.

  • In a situation where you must choose at least one particular item, either
    consider the total number of choices available minus the number without the
    desired item or add all the cases in which it is possible to have the desired
    item.

  Communicate Your Understanding

      1. Give an example of a situation where you would use the formula
         $(p + 1)(q + 1)(r + 1) \cdots - 1$. Explain why this formula applies.

      2. Give an example of a situation in which you would use the expression $2^n - 1$.
         Explain your reasoning.

      3. Using examples, describe two different ways to solve a problem where at least one
         particular item must be chosen. Explain why both methods give the same answer.


Practise

A
1. How many different sums of money can you make with a penny, a dime, a
    one-dollar coin, and a two-dollar coin?

2. How many different sums of money can be made with one $5 bill, two $10
    bills, and one $50 bill?

3. How many subsets are there for a set with
    a) two distinct elements?
    b) four distinct elements?
    c) seven distinct elements?

4. In how many ways can a committee with eight members form a subcommittee
    with at least one person on it?

B
5. Determine whether the following questions involve permutations or
    combinations and list any formulas that would apply.
    a) How many committees of 3 students can be formed from 12 students?
    b) In how many ways can 12 runners finish first, second, and third in
       a race?
    c) How many outfits can you assemble from three pairs of pants, four
       shirts, and two pairs of shoes?
    d) How many two-letter arrangements can be formed from the word star?


Apply, Solve, Communicate

6. Seven managers and eight sales representatives volunteer to attend a
    trade show. Their company can afford to send five people. In how many
    ways could they be selected
    a) without any restriction?
    b) if there must be at least one manager and one sales representative
       chosen?

7. Application A cookie jar contains three chocolate-chip, two
    peanut-butter, one lemon, one almond, and five raisin cookies.
    a) In how many ways can you reach into the jar and select some cookies?
    b) In how many ways can you select some cookies, if you must include at
       least one chocolate-chip cookie?

8. A project team of 6 students is to be selected from a class of 30.
    a) How many different teams can be selected?
    b) Pierre, Gregory, and Miguel are students in this class. How many of
       the teams would include these 3 students?
    c) How many teams would not include Pierre, Gregory, and Miguel?

9. The game of euchre uses only the 9s, 10s, jacks, queens, kings, and aces
    from a standard deck of cards. How many five-card hands have
    a) all red cards?
    b) at least two red cards?
    c) at most two red cards?

10. If you are dealing from a standard deck of 52 cards,
    a) how many different 4-card hands could have at least one card from
       each suit?
    b) how many different 5-card hands could have at least one spade?
    c) how many different 5-card hands could have at least two face cards
       (jacks, queens, or kings)?

11. The number 5880 can be factored into prime divisors as
    2 × 2 × 2 × 3 × 5 × 7 × 7.
    a) Determine the total number of divisors of 5880.
    b) How many of the divisors are even?
    c) How many of the divisors are odd?

12. Application A theme park has a variety of rides. There are seven roller
    coasters, four water rides, and nine story rides. If Stephanie wants to
    try one of each type of ride, how many different combinations of rides
    could she choose?

13. Shuwei finds 11 shirts in his size at a clearance sale. How many
    different purchases could Shuwei make?

14. Communication Using the summary on page 285, draw a flow chart for
    solving counting problems.

15. a) How many different teams of 4 students could be chosen from the
       15 students in the grade-12 Mathematics League?
    b) How many of the possible teams would include the youngest student in
       the league?
    c) How many of the possible teams would exclude the youngest student?

16. Inquiry/Problem Solving
    a) Use combinations to determine how many diagonals there are in
       i) a pentagon          ii) a hexagon
    b) Draw sketches to verify your answers in part a).

17. A school is trying to decide on new school colours. The students can
    choose three colours from gold, black, green, blue, red, and white, but
    they know that another school has already chosen black, gold, and red.
    How many different combinations of three colours can the students
    choose?
18. Application The social convenor has 12 volunteers to work at a school
    dance. Each dance requires 2 volunteers at the door, 4 volunteers on the
    floor, and 6 floaters. Joe and Jim have not volunteered before, so the
    social convenor does not want to assign them to work together. In how
    many ways can the volunteers be assigned?

19. Chapter Problem Jeffrey is a DJ at a local radio station. For the second
    hour of his shift, he must choose all his music from the top 100 songs
    for the week. Jeffrey will play exactly 12 songs during this hour.
    a) How many different stacks of discs could Jeffrey pull from the
       station’s collection if he chooses at least 10 songs that are in
       positions 15 to 40 on the charts?
    b) Jeffrey wants to start his second hour with a hard-rock song and
       finish with a pop classic. How many different play lists can Jeffrey
       prepare if he has chosen 4 hard rock songs, 5 soul pieces, and 3 pop
       classics?
    c) Jeffrey has 8 favourite songs currently on the top 100 list. How many
       different subsets of these songs could he choose to play during his
       shift?

ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

20. There are 52 white keys on a piano. The lowest key is A. The keys are
    designated A, B, C, D, E, F, and G in succession, and then the sequence
    of letters repeats, ending with a C for the highest key.
    a) If five notes are played simultaneously, in how many ways could the
       notes all be
       i) As?                  ii) Gs?
       iii) the same letter?   iv) different letters?
    b) If the five keys are played in order, how would your answers in
       part a) change?

21. Communication
    a) How many possible combinations are there for the letters in the three
       circles for each of the clue words in this puzzle?
    b) Explain why you cannot answer part a) with a single $_nC_r$
       calculation for each word.

    [Puzzle figure] © Tribune Media Services, Inc. All Rights Reserved. Reprinted with Permission.

22. Determine the number of ways of selecting four letters, without regard
    for order, from the word parallelogram.

C
23. Inquiry/Problem Solving Suppose the artist in Example 1 of this section
    had two apples, two oranges, and two pears in his refrigerator. How many
    combinations does he have to choose from if he wants to paint a
    still-life with
    a) two pieces of fruit?
    b) three pieces of fruit?
    c) four pieces of fruit?

24. How many different sums of money can be formed from one $2 bill, three
    $5 bills, two $10 bills, and one $20 bill?


5.4         The Binomial Theorem

  Recall that a binomial is a polynomial
  with just two terms, so it has the form
  a + b. Expanding $(a + b)^n$ becomes very
  laborious as n increases. This section
  introduces a method for expanding
  powers of binomials. This method is
  useful both as an algebraic tool and
  for probability calculations, as you
  will see in later chapters.




                                                                                                                    Blaise Pascal

      INVESTIGATE & INQUIRE: Patterns in the Binomial Expansion

      1. Expand each of the following and simplify fully.
         a) $(a + b)^1$                 b) $(a + b)^2$      c) $(a + b)^3$
         d) $(a + b)^4$                 e) $(a + b)^5$
      2. Study the terms in each of these expansions. Describe how the degree
         of each term relates to the power of the binomial.
      3. Compare the terms in Pascal’s triangle to the expansions in question 1.
         Describe any pattern you find.
      4. Predict the terms in the expansion of $(a + b)^6$.



  In section 4.4, you found a number of patterns in Pascal’s triangle. Now
  that you are familiar with combinations, there is another important
  pattern that you can recognize. Each term in Pascal’s triangle corresponds
  to a value of $_nC_r$.

                  1                                       $_0C_0$
                1   1                                 $_1C_0$  $_1C_1$
              1   2   1                           $_2C_0$  $_2C_1$  $_2C_2$
            1   3   3   1                     $_3C_0$  $_3C_1$  $_3C_2$  $_3C_3$
          1   4   6   4   1               $_4C_0$  $_4C_1$  $_4C_2$  $_4C_3$  $_4C_4$
        1       5 10 10 5 1                           C0
                                                      5      5
                                                                 C1     5
                                                                            C2    5
                                                                                      C3     5
                                                                                                C4        C5
                                                                                                          5
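The correspondence between the two triangles can be checked numerically. The following is a minimal sketch in Python, not part of the original text: `pascal_row` is a hypothetical helper name that builds a row using only the addition rule, and `math.comb` from the standard library evaluates \({}_nC_r\).

```python
from math import comb

def pascal_row(n):
    """Build row n of Pascal's triangle by repeatedly adding adjacent terms."""
    row = [1]
    for _ in range(n):
        # Each interior term is the sum of the two terms above it.
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row

for n in range(6):
    combos = [comb(n, r) for r in range(n + 1)]
    # The additive construction and the combinations agree term by term.
    assert pascal_row(n) == combos
    print(n, combos)
```

The loop covers only rows 0 to 5, the rows shown above, so it illustrates the pattern rather than proving it.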




Comparing the two triangles above, you can see that \(t_{n,r} = {}_nC_r\).
Recall that Pascal’s method for creating his triangle uses the relationship

\[ t_{n,r} = t_{n-1,\,r-1} + t_{n-1,\,r}. \]

So, this relationship must apply to combinations as well.

Pascal’s Formula

\[ {}_nC_r = {}_{n-1}C_{r-1} + {}_{n-1}C_r \]


Proof:
\[
\begin{aligned}
{}_{n-1}C_{r-1} + {}_{n-1}C_r
  &= \frac{(n-1)!}{(r-1)!\,(n-r)!} + \frac{(n-1)!}{r!\,(n-r-1)!} \\
  &= \frac{r(n-1)!}{r(r-1)!\,(n-r)!} + \frac{(n-1)!\,(n-r)}{r!\,(n-r)(n-r-1)!} \\
  &= \frac{r(n-1)!}{r!\,(n-r)!} + \frac{(n-1)!\,(n-r)}{r!\,(n-r)!} \\
  &= \frac{(n-1)!}{r!\,(n-r)!}\,[r + (n-r)] \\
  &= \frac{(n-1)! \times n}{r!\,(n-r)!} \\
  &= \frac{n!}{r!\,(n-r)!} \\
  &= {}_nC_r
\end{aligned}
\]

This proof shows that the values of \({}_nC_r\) do indeed follow the pattern that
creates Pascal’s triangle. It follows that \({}_nC_r = t_{n,r}\) for all the terms in Pascal’s
triangle.

Example 1 Applying Pascal’s Formula to Combinations

Rewrite each of the following using Pascal’s formula.
a) \({}_{12}C_8\)          b) \({}_{19}C_5 + {}_{19}C_6\)


Solution

a) \({}_{12}C_8 = {}_{11}C_7 + {}_{11}C_8\)          b) \({}_{19}C_5 + {}_{19}C_6 = {}_{20}C_6\)
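
Both parts of Example 1 can be confirmed by evaluating each side. This is a minimal sketch, assuming Python's standard `math.comb`; the function name `pascal_formula_holds` is hypothetical and introduced only for illustration.

```python
from math import comb

def pascal_formula_holds(n, r):
    """Check nCr == (n-1)C(r-1) + (n-1)Cr for an interior term, 1 <= r <= n - 1."""
    return comb(n, r) == comb(n - 1, r - 1) + comb(n - 1, r)

# Part a): both sides of 12C8 = 11C7 + 11C8
print(comb(12, 8), comb(11, 7) + comb(11, 8))

# Part b): both sides of 19C5 + 19C6 = 20C6
print(comb(19, 5) + comb(19, 6), comb(20, 6))

# Spot-check the identity for every interior term of rows 2 to 29.
print(all(pascal_formula_holds(n, r) for n in range(2, 30) for r in range(1, n)))
```

A numeric check like this covers only the rows tested; the algebraic proof above is what establishes the formula for all n and r.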


As you might expect from the investigation at the beginning of this section,
the coefficients of the terms in the expansion of \((a + b)^n\) correspond to the
terms in row n of Pascal’s triangle. Thus, you can write these coefficients in
combinatorial form.


The Binomial Theorem

\[ (a + b)^n = {}_nC_0\,a^n + {}_nC_1\,a^{n-1}b + {}_nC_2\,a^{n-2}b^2 + \dots + {}_nC_r\,a^{n-r}b^r + \dots + {}_nC_n\,b^n \]

or

\[ (a + b)^n = \sum_{r=0}^{n} {}_nC_r\,a^{n-r}b^r \]



Example 2 Applying the Binomial Theorem

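Use combinations to expand \((a + b)^6\).

Solution

\[
\begin{aligned}
(a + b)^6 &= \sum_{r=0}^{6} {}_6C_r\,a^{6-r}b^r \\
  &= {}_6C_0\,a^6 + {}_6C_1\,a^5b + {}_6C_2\,a^4b^2 + {}_6C_3\,a^3b^3 + {}_6C_4\,a^2b^4 + {}_6C_5\,ab^5 + {}_6C_6\,b^6 \\
  &= a^6 + 6a^5b + 15a^4b^2 + 20a^3b^3 + 15a^2b^4 + 6ab^5 + b^6
\end{aligned}
\]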

Example 3 Binomial Expansions Using Pascal’s Triangle

Use Pascal’s triangle to expand
a) \((2x - 1)^4\)
b) \((3x - 2y)^5\)


Solution

a) Substitute 2x for a and −1 for b. Since the exponent is 4, use the terms
     in row 4 of Pascal’s triangle as the coefficients: 1, 4, 6, 4, and 1. Thus,

\[
\begin{aligned}
(2x - 1)^4 &= 1(2x)^4 + 4(2x)^3(-1) + 6(2x)^2(-1)^2 + 4(2x)(-1)^3 + 1(-1)^4 \\
  &= 16x^4 + 4(8x^3)(-1) + 6(4x^2)(1) + 4(2x)(-1) + 1 \\
  &= 16x^4 - 32x^3 + 24x^2 - 8x + 1
\end{aligned}
\]

b) Substitute 3x for a and −2y for b, and use the terms from row 5 as coefficients.

\[
\begin{aligned}
(3x - 2y)^5 &= 1(3x)^5 + 5(3x)^4(-2y) + 10(3x)^3(-2y)^2 + 10(3x)^2(-2y)^3 + 5(3x)(-2y)^4 + 1(-2y)^5 \\
  &= 243x^5 - 810x^4y + 1080x^3y^2 - 720x^2y^3 + 240xy^4 - 32y^5
\end{aligned}
\]
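
The numeric coefficients in both parts of Example 3 can be reproduced by combining each row entry with the matching powers of the two constants. This is a minimal sketch under the assumption that the binomial has the form (px + qy)^n; `expand_linear_binomial` is a hypothetical name.

```python
from math import comb

def expand_linear_binomial(p, q, n):
    """Coefficients of x^(n-r) * y^r in (p*x + q*y)^n, listed for r = 0..n."""
    return [comb(n, r) * p ** (n - r) * q ** r for r in range(n + 1)]

# Part a): (2x - 1)^4, treating the constant -1 as q with y = 1
print(expand_linear_binomial(2, -1, 4))

# Part b): (3x - 2y)^5
print(expand_linear_binomial(3, -2, 5))
```

Because q is negative in both cases, the signs alternate, as they do in the worked solutions.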


Example 4 Expanding Binomials Containing Negative Exponents

Use the binomial theorem to expand and simplify \(\left(x + \frac{2}{x^2}\right)^4\).




Solution
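Substitute \(x\) for \(a\) and \(\frac{2}{x^2}\) for \(b\).

\[
\begin{aligned}
\left(x + \frac{2}{x^2}\right)^4
  &= \sum_{r=0}^{4} {}_4C_r\,x^{4-r}\left(\frac{2}{x^2}\right)^r \\
  &= {}_4C_0\,x^4 + {}_4C_1\,x^3\left(\frac{2}{x^2}\right) + {}_4C_2\,x^2\left(\frac{2}{x^2}\right)^2 + {}_4C_3\,x\left(\frac{2}{x^2}\right)^3 + {}_4C_4\left(\frac{2}{x^2}\right)^4 \\
  &= 1x^4 + 4x^3\left(\frac{2}{x^2}\right) + 6x^2\left(\frac{4}{x^4}\right) + 4x\left(\frac{8}{x^6}\right) + 1\left(\frac{16}{x^8}\right) \\
  &= x^4 + 8x + 24x^{-2} + 32x^{-5} + 16x^{-8}
\end{aligned}
\]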
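
When both parts of the binomial are powers of the same variable, each term's exponent can be tracked alongside its coefficient. The sketch below assumes a binomial of the form (c1·x^e1 + c2·x^e2)^n; the function name and parameter names are hypothetical.

```python
from math import comb

def expand_power_binomial(c1, e1, c2, e2, n):
    """Terms of (c1*x^e1 + c2*x^e2)^n as (coefficient, exponent of x) pairs."""
    terms = []
    for r in range(n + 1):
        coeff = comb(n, r) * c1 ** (n - r) * c2 ** r
        # Exponents add when powers of x are multiplied.
        exponent = e1 * (n - r) + e2 * r
        terms.append((coeff, exponent))
    return terms

# (x + 2/x^2)^4 written as (1*x^1 + 2*x^(-2))^4
print(expand_power_binomial(1, 1, 2, -2, 4))
```

In this example the exponent drops by 3 from one term to the next, because each step trades one factor of x for one factor of x^(-2).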

Example 5 Patterns With Combinations

Using the patterns in Pascal’s triangle from the investigation and Example 4 in
section 4.4, write each of the following in combinatorial form.
a) the sum of the terms in row 5 and row 6
b)    the sum of the terms in row n
c)    the first 5 triangular numbers
d)    the nth triangular number

Solution

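a) Row 5:
        1 + 5 + 10 + 10 + 5 + 1
      = \({}_5C_0 + {}_5C_1 + {}_5C_2 + {}_5C_3 + {}_5C_4 + {}_5C_5\)
      = 32
      = \(2^5\)

   Row 6:
        1 + 6 + 15 + 20 + 15 + 6 + 1
      = \({}_6C_0 + {}_6C_1 + {}_6C_2 + {}_6C_3 + {}_6C_4 + {}_6C_5 + {}_6C_6\)
      = 64
      = \(2^6\)

b) From part a) it appears that \({}_nC_0 + {}_nC_1 + \dots + {}_nC_n = 2^n\).

      Using the binomial theorem,

\[
\begin{aligned}
2^n &= (1 + 1)^n \\
  &= {}_nC_0 \times 1^n + {}_nC_1 \times 1^{n-1} \times 1 + \dots + {}_nC_n \times 1^n \\
  &= {}_nC_0 + {}_nC_1 + \dots + {}_nC_n
\end{aligned}
\]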

c)        n           Triangular Number            Combinatorial Form
          1                     1                  \({}_2C_2\)
          2                     3                  \({}_3C_2\)
          3                     6                  \({}_4C_2\)
          4                     10                 \({}_5C_2\)
          5                     15                 \({}_6C_2\)

d) The nth triangular number is \({}_{n+1}C_2\).
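
The two patterns in Example 5 can be tabulated side by side. This is an illustrative sketch only; `row_sum` and `triangular` are hypothetical helper names, and `math.comb` supplies the combinations.

```python
from math import comb

def row_sum(n):
    """Sum of the terms in row n of Pascal's triangle, written as nC0 + ... + nCn."""
    return sum(comb(n, r) for r in range(n + 1))

def triangular(n):
    """nth triangular number, computed directly as 1 + 2 + ... + n."""
    return sum(range(1, n + 1))

# Columns: n, row sum, 2^n, nth triangular number, (n+1)C2
for n in range(1, 7):
    print(n, row_sum(n), 2 ** n, triangular(n), comb(n + 1, 2))
```

In each printed line the second and third values agree, as do the fourth and fifth, for the rows tested.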




Example 6 Factoring Using the Binomial Theorem

    Rewrite \(1 + 10x^2 + 40x^4 + 80x^6 + 80x^8 + 32x^{10}\) in the form \((a + b)^n\).

    Solution

    There are six terms, so the exponent must be 5.
    The first term of a binomial expansion is \(a^n\), so a must be 1.
    The final term is \(32x^{10} = (2x^2)^5\), so \(b = 2x^2\).
    Therefore, \(1 + 10x^2 + 40x^4 + 80x^6 + 80x^8 + 32x^{10} = (1 + 2x^2)^5\).
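
The first and last terms only suggest candidates for a and b, so it is worth confirming that the middle coefficients also match. The sketch below does this with the substitution u = x^2, which is an assumption made for convenience; `matches_binomial` is a hypothetical name.

```python
from math import comb

def matches_binomial(coeffs, p, q):
    """True if coeffs are the coefficients of (p + q*u)^n, where n = len(coeffs) - 1."""
    n = len(coeffs) - 1
    expected = [comb(n, r) * p ** (n - r) * q ** r for r in range(n + 1)]
    return coeffs == expected

# Coefficients of 1 + 10u + 40u^2 + 80u^3 + 80u^4 + 32u^5, with u = x^2
print(matches_binomial([1, 10, 40, 80, 80, 32], 1, 2))
```

If any middle coefficient differed, the function would return False, signalling that the polynomial is not a perfect power of that binomial.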



  Key Concepts

  • The coefficients of the terms in the expansion of \((a + b)^n\) correspond to the
    terms in row n of Pascal’s triangle.

  • The binomial \((a + b)^n\) can also be expanded using combinatorial symbols:

\[ (a + b)^n = {}_nC_0\,a^n + {}_nC_1\,a^{n-1}b + {}_nC_2\,a^{n-2}b^2 + \dots + {}_nC_n\,b^n \quad\text{or}\quad \sum_{r=0}^{n} {}_nC_r\,a^{n-r}b^r \]

  • The degree of each term in the binomial expansion of \((a + b)^n\) is n.

  • Patterns in Pascal’s triangle can be summarized using combinatorial symbols.

  Communicate Your Understanding

    1. Describe how Pascal’s triangle and the binomial theorem are related.

    2. a) Describe how you would use Pascal’s triangle to expand \((2x + 5y)^9\).
         b) Describe how you would use the binomial theorem to expand \((2x + 5y)^9\).

    3. Relate the sum of the terms in the nth row of Pascal’s triangle to the total
         number of subsets of a set of n elements. Explain the relationship.


Practise

A
 1. Rewrite each of the following using Pascal’s formula.
    a) \({}_{17}C_{11}\)                     b) \({}_{43}C_{36}\)
    c) \({}_{n+1}C_{r+1}\)                   d) \({}_{32}C_4 + {}_{32}C_5\)
    e) \({}_{15}C_{10} + {}_{15}C_9\)        f) \({}_nC_r + {}_nC_{r+1}\)
    g) \({}_{18}C_9 - {}_{17}C_9\)           h) \({}_{24}C_8 - {}_{23}C_7\)
    i) \({}_nC_r - {}_{n-1}C_{r-1}\)

 2. Determine the value of k in each of these terms from the binomial
    expansion of \((a + b)^{10}\).
    a) \(210a^6b^k\)         b) \(45a^kb^8\)         c) \(252a^kb^k\)

 3. How many terms would be in the expansion of the following binomials?
    a) \((x + y)^{12}\)      b) \((2x - 3y)^5\)      c) \((5x - 2)^{20}\)

 4. For the following terms from the expansion of \((a + b)^{11}\), state the
    coefficient in both \({}_nC_r\) and numeric form.
    a) \(a^2b^9\)            b) \(a^{11}\)           c) \(a^6b^5\)
Apply, Solve, Communicate

 B
 5. Using the binomial theorem and patterns in Pascal’s triangle, simplify
    each of the following.
    a) \({}_9C_0 + {}_9C_1 + \dots + {}_9C_9\)
    b) \({}_{12}C_0 - {}_{12}C_1 + {}_{12}C_2 - \dots - {}_{12}C_{11} + {}_{12}C_{12}\)
    c) \(\sum_{r=0}^{15} {}_{15}C_r\)          d) \(\sum_{r=0}^{n} {}_nC_r\)

 6. If \(\sum_{r=0}^{n} {}_nC_r = 16\,384\), determine the value of n.

 7. a) Write formulas in combinatorial form for the following. (Refer to
       section 4.4, if necessary.)
       i) the sum of the squares of the terms in the nth row of Pascal’s
          triangle
       ii) the result of alternately adding and subtracting the squares of
          the terms in the nth row of Pascal’s triangle
       iii) the number of diagonals in an n-sided polygon
    b) Use your formulas from part a) to determine
       i) the sum of the squares of the terms in row 15 of Pascal’s triangle
       ii) the result of alternately adding and subtracting the squares of
          the terms in row 12 of Pascal’s triangle
       iii) the number of diagonals in a 14-sided polygon

 8. How many terms would be in the expansion of \((x^2 + x)^8\)?

 9. Use the binomial theorem to expand and simplify the following.
    a) \((x + y)^7\)               b) \((2x + 3y)^6\)
    c) \((2x - 5y)^5\)             d) \((x^2 + 5)^4\)
    e) \((3a^2 + 4c)^7\)           f) \(5(2p - 6c^2)^5\)

10. Communication
    a) Find and simplify the first five terms of the expansion of \((3x + y)^{10}\).
    b) Find and simplify the first five terms of the expansion of \((3x - y)^{10}\).
    c) Describe any similarities and differences between the terms in
       parts a) and b).

11. Use the binomial theorem to expand and simplify the following.
    a) \(\left(x^2 - \frac{1}{x}\right)^5\)                  b) \(\left(2y + \frac{3}{y^2}\right)^4\)
    c) \(\left(\sqrt{x} + 2x^2\right)^6\)                    d) \(\left(k + \frac{k}{m^2}\right)^5\)
    e) \(\left(\sqrt{y} - \frac{2}{\sqrt{y}}\right)^7\)      f) \(2\left(3m^2 - \frac{2}{\sqrt{m}}\right)^4\)

12. Application Rewrite the following expansions in the form \((a + b)^n\),
    where n is a positive integer.
    a) \(x^6 + 6x^5y + 15x^4y^2 + 20x^3y^3 + 15x^2y^4 + 6xy^5 + y^6\)
    b) \(y^{12} + 8y^9 + 24y^6 + 32y^3 + 16\)
    c) \(243a^5 - 405a^4b + 270a^3b^2 - 90a^2b^3 + 15ab^4 - b^5\)

13. Communication Use the binomial theorem to simplify each of the
    following. Explain your results.
    a) \(\left(\frac{1}{2}\right)^5 + 5\left(\frac{1}{2}\right)^5 + 10\left(\frac{1}{2}\right)^5 + 10\left(\frac{1}{2}\right)^5 + 5\left(\frac{1}{2}\right)^5 + \left(\frac{1}{2}\right)^5\)
    b) \((0.7)^7 + 7(0.7)^6(0.3) + 21(0.7)^5(0.3)^2 + \dots + (0.3)^7\)
    c) \(7^9 - 9 \times 7^8 + 36 \times 7^7 - \dots - 7^0\)

14. a) Expand \(\left(x + \frac{2}{x}\right)^4\) and compare it with the
       expansion of \(\frac{1}{x^4}(x^2 + 2)^4\).
    b) Explain your results.


15. Use your knowledge of algebra and the binomial theorem to expand and
    simplify each of the following.
    a) \((25x^2 + 30xy + 9y^2)^3\)
    b) \((3x - 2y)^5(3x + 2y)^5\)

16. Application
    a) Calculate an approximation for \((1.2)^9\) by expanding \((1 + 0.2)^9\).
    b) How many terms do you have to evaluate to get an approximation
       accurate to two decimal places?

17. In a trivia contest, Adam has drawn a topic he knows nothing about, so he
    makes random guesses for the ten true/false questions. Use the binomial
    theorem to help find
    a) the number of ways that Adam can answer the test using exactly four
       trues
    b) the number of ways that Adam can answer the test using at least one
       true

    ACHIEVEMENT CHECK
    Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application

18. a) Expand \((h + t)^5\).
    b) Explain how this expansion can be used to determine the number of
       ways of getting exactly h heads when five coins are tossed.
    c) How would your answer in part b) change if six coins are being
       tossed? How would it change for n coins? Explain.

 C
19. Find the first three terms, ranked by degree of the terms, in each
    expansion.
    a) \((x + 3)(2x + 5)^4\)
    b) \((2x + 1)^2(4x - 3)^5\)
    c) \((x^2 - 5)^9(x^3 + 2)^6\)

20. Inquiry/Problem Solving
    a) Use the binomial theorem to expand \((x + y + z)^2\) by first
       rewriting it as \([x + (y + z)]^2\).
    b) Repeat part a) with \((x + y + z)^3\).
    c) Using parts a) and b), predict the expansion of \((x + y + z)^4\).
       Verify your prediction by using the binomial theorem to expand
       \((x + y + z)^4\).
    d) Write a formula for \((x + y + z)^n\).
    e) Use your formula to expand and simplify \((x + y + z)^5\).

21. a) In the expansion of \((x + y)^5\), replace x and y with B and G,
       respectively. Expand and simplify.
    b) Assume that a couple has an equal chance of having a boy or a girl.
       How would the expansion in part a) help find the number of ways of
       having k girls in a family with five children?
    c) In how many ways could a family with five children have exactly
       three girls?
    d) In how many ways could they have exactly four boys?

22. A simple code consists of a string of five symbols that represent
    different letters of the alphabet. Each symbol is either a dot (•) or a
    dash (–).
    a) How many different letters are possible using this code?
    b) How many coded letters will contain exactly two dots?
    c) How many different coded letters will contain at least one dash?




Review of Key Concepts

5.1 Organized Counting With Venn Diagrams
Refer to the Key Concepts on page 270.

 1. Which regions in the diagram below correspond to
    a) the union of sets A and B?
    b) the intersection of sets B and C?
    c) A ∩ C?
    d) either B or S?

    [Venn diagram of sets A, B, and C within S, with regions labelled R1 to R8]

 2. a) Write the equation for the number of elements contained in either of
       two sets.
    b) Explain why the principle of inclusion and exclusion subtracts the
       last term in this equation.
    c) Give a simple example to illustrate your explanation.

 3. A survey of households in a major city found that
    • 96% had colour televisions
    • 65% had computers
    • 51% had dishwashers
    • 63% had colour televisions and computers
    • 49% had colour televisions and dishwashers
    • 31% had computers and dishwashers
    • 30% had all three
    a) List the categories of households not included in these survey
       results.
    b) Use a Venn diagram to find the proportion of households in each of
       these categories.

5.2 Combinations
Refer to the Key Concepts on page 278.

 4. Evaluate the following and indicate any calculations that could be done
    manually.
    a) \({}_{41}C_8\)          b) \({}_{33}C_{15}\)
    c) \({}_{25}C_{17}\)       d) \({}_{50}C_{10}\)
    e) \({}_{10}C_8\)          f) \({}_{15}C_{13}\)
    g) \({}_5C_4\)             h) \({}_{25}C_{24}\)
    i) \({}_{15}C_{11}\)       j) \({}_{25}C_{20}\)
    k) \({}_{16}C_8\)          l) \({}_{30}C_{26}\)

 5. A track and field club has 12 members who are runners and 10 members
    who specialize in field events. The club has been invited to send a
    team of 3 runners and 2 field athletes to an out-of-town meet. How many
    different teams could the club send?

 6. A bridge hand consists of 13 cards. How many bridge hands include
    5 cards of one suit, 6 cards of a second, and 2 cards of a third?

 7. Explain why combination locks should really be called permutation
    locks.

5.3 Problem Solving With Combinations
Refer to the Key Concepts on page 286.

 8. At Subs Galore, you have a choice of lettuce, onions, tomatoes, green
    peppers, mushrooms, cheese, olives, cucumbers, and hot peppers on your
    submarine sandwich. How many ways can you “dress” your sandwich?



 9. Ballots for municipal elections usually list candidates for several
    different positions. If a resident can vote for a mayor, two
    councillors, a school trustee, and a hydro commissioner, how many
    combinations of positions could the resident choose to mark on the
    ballot?

10. There are 12 questions on an examination, and each student must answer
    8 questions including at least 4 of the first 5 questions. How many
    different combinations of questions could a student choose to answer?

11. Naomi invites eight friends to a party on short notice, so they may not
    all be able to come. How many combinations of guests could attend the
    party?

12. In how many ways could 15 different books be divided equally among
    3 people?

13. The camera club has five members, and the mathematics club has eight.
    There is only one member common to both clubs. In how many ways could a
    committee of four people be formed with at least one member from each
    club?

5.4 The Binomial Theorem
Refer to the Key Concepts on page 293.

14. Without expanding \((x + y)^5\), determine
    a) the number of terms in the expansion
    b) the value of k in the term \(10x^ky^2\)

15. Use Pascal’s triangle to expand
    a) \((x + y)^8\)
    b) \((4x - y)^6\)
    c) \((2x + 5y)^4\)
    d) \((7x - 3)^5\)

16. Use the binomial theorem to expand
    a) \((x + y)^6\)
    b) \((6x - 5y)^4\)
    c) \((5x + 2y)^5\)
    d) \((3x - 2)^6\)

17. Write the first three terms of the expansion of
    a) \((2x + 5y)^7\)
    b) \((4x - y)^6\)

18. Describe the steps in the binomial expansion of \((2x - 3y)^6\).

19. Find the last term in the binomial expansion of
    \(\left(\frac{1}{x^2} + 2x\right)^5\).

20. Find the middle term in the binomial expansion of
    \(\left(\sqrt{x} + \frac{5}{\sqrt{x}}\right)^8\).

21. In the expansion of \((a + x)^6\), the first three terms are
    1 + 3 + 3.75. Find the values of a and x.

22. Use the binomial theorem to expand and simplify \((y^2 - 2)^6(y^2 + 2)^6\).

23. Write \(1024x^{10} - 3840x^8 + 5760x^6 - 4320x^4 + 1620x^2 - 243\) in the
    form \((a + b)^n\). Explain your steps.


Chapter Test

ACHIEVEMENT CHART

       Category     Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application
       Questions    All                        12                                  6, 12            5, 6, 7, 8, 9


 1. Evaluate each of the following. List any calculations that require a
    calculator.
    a) \({}_{25}C_{25}\)
    b) \({}_{52}C_1\)
    c) \({}_{12}C_3\)
    d) \({}_{40}C_{15}\)

 2. Rewrite each of the following as a single combination.
    a) \({}_{10}C_7 + {}_{10}C_8\)
    b) \({}_{23}C_{15} - {}_{22}C_{14}\)

 3. Use Pascal’s triangle to expand
    a) \((3x - 4)^4\)
    b) \((2x + 3y)^7\)

 4. Use the binomial theorem to expand
    a) \((8x - 3)^5\)
    b) \((2x - 5y)^6\)

 5. A student fundraising committee has 14 members, including 7 from
    grade 12. In how many ways can a 4-member subcommittee for commencement
    awards be formed if
    a) there are no restrictions?
    b) the subcommittee must be all grade-12 students?
    c) the subcommittee must have 2 students from grade 12 and 2 from other
       grades?
    d) the subcommittee must have no more than 3 grade-12 students?

 6. A track club has 20 members.
    a) In how many ways can the club choose 3 members to help officiate at
       a meet?
    b) In how many ways can the club choose a starter, a marshal, and a
       timer?
    c) Should your answers to parts a) and b) be the same? Explain why or
       why not.

 7. Statistics on the grade-12 courses taken by students graduating from a
    secondary school showed that
    • 85 of the graduates had taken a science course
    • 75 of the graduates had taken a second language
    • 41 of the graduates had taken mathematics
    • 43 studied both science and a second language
    • 32 studied both science and mathematics
    • 27 had studied both a second language and mathematics
    • 19 had studied all three subjects
    a) Use a Venn diagram to determine the minimum number of students who
       could be in this graduating class.
    b) How many students studied mathematics, but neither science nor a
       second language?




 8. A field-hockey team played seven games and won four of them. There were
    no ties.
    a) How many arrangements of the four wins and three losses are possible?
    b) In how many of these arrangements would the team have at least two
       wins in a row?

 9. A restaurant offers an all-you-can-eat Chinese buffet with the following
    items:
    • egg roll, wonton soup
    • chicken wings, chicken balls, beef, pork
    • steamed rice, fried rice, chow mein
    • chop suey, mixed vegetables, salad
    • fruit salad, custard tart, almond cookie
    a) How many different combinations of items could you have?
    b) The restaurant also has a lunch special with your choice of one item
       from each group. How many choices do you have with this special?

10. In the expansion of $(1 + x)^n$, the first three terms are
    1 − 0.9 + 0.36. Find the values of x and n.

11. Use the binomial theorem to expand and simplify $(4x^2 - 12x + 9)^3$.

12. A small transit bus has 8 window seats and 12 aisle seats. Ten passengers
    board the bus and select seats at random. How many seating arrangements
    have all the window seats occupied if which passenger is in a seat
    a) does not matter?
    b) matters?




       ACHIEVEMENT CHECK

 Knowledge/Understanding    Thinking/Inquiry/Problem Solving         Communication               Application
13. The students’ council is having pizza at their next meeting. There are 20
   council members, 6 of whom are vegetarian. A committee of 3 will order six
   pizzas from a pizza shop that has a special price for large pizzas with up to
   three toppings. The shop offers ten different toppings.
   a) How many different pizza committees can the council choose if there must
        be at least one vegetarian and one non-vegetarian on the committee?
   b) In how many ways could the committee choose exactly three toppings for a
        pizza?
   c) In how many ways could the committee choose up to three toppings for a
        pizza?
   d) The committee wants as much variety as possible in the toppings. They
        decide to order each topping exactly once and to have at least one topping
        on each pizza. Describe the different cases possible when distributing the
        toppings in this way.
   e) For one of these cases, determine the number of ways of choosing and
        distributing the ten toppings.


CHAPTER 6
                  Introduction to Probability




                  Specific Expectations                                                            Section

                  Use Venn diagrams as a tool for organizing information in counting                 6.5
                  problems.

                  Solve problems, using techniques for counting permutations where some              6.3
                  objects may be alike.

                  Solve problems, using techniques for counting combinations.                        6.3

                  Solve probability problems involving combinations of simple events,           6.3, 6.4, 6.5,
                  using counting techniques.                                                         6.6

                  Interpret probability statements, including statements about odds, from a     6.1, 6.2, 6.3,
                  variety of sources.                                                           6.4, 6.5, 6.6

                  Design and carry out simulations to estimate probabilities in situations           6.3
                  for which the calculation of the theoretical probabilities is difficult or
                  impossible.

                  Assess the validity of some simulation results by comparing them with the          6.3
                  theoretical probabilities, using the probability concepts developed in the
                  course.

                  Represent complex tasks or issues, using diagrams.                              6.1, 6.5

                  Represent numerical data, using matrices, and demonstrate an                       6.6
                  understanding of terminology and notation related to matrices.

                  Demonstrate proficiency in matrix operations, including addition, scalar           6.6
                  multiplication, matrix multiplication, the calculation of row sums, and the
                  calculation of column sums, as necessary to solve problems, with and
                  without the aid of technology.

                  Solve problems drawn from a variety of applications, using matrix                  6.6
                  methods.
Chapter Problem
Genetic Probabilities
Biologists are studying a deer population in a provincial conservation area.
The biologists know that many of the bucks (male deer) in the area have an
unusual “cross-hatched” antler structure, which seems to be genetic in origin.
Of 48 randomly tagged deer, 26 were does (females), 22 were bucks, and 7 of
the bucks had cross-hatched antlers.

Several of the does have small bald patches on their hides. This condition
also seems to have some genetic element. Careful long-term study has found
that female offspring of does with bald patches have a 65% likelihood of
developing the condition themselves, while offspring of healthy does have
only a 20% likelihood of developing it. Currently, 30% of the does have bald
patches.

 1. Out of ten deer randomly captured, how many would you expect to have
    either cross-hatched antlers or bald patches?

 2. Do you think that the proportion of does with the bald patches will
    increase, decrease, or remain relatively stable?

In this chapter, you will learn methods that the biologists could use to
calculate probabilities from their samples and to make predictions about the
deer population.
Review of Prerequisite Skills

If you need help with any of the skills listed in purple below, refer to Appendix A.

 1. Fractions, percents, decimals Express each decimal as a percent.
    a) 0.35                     b) 0.04
    c) 0.95                     d) 0.008
    e) 0.085                    f) 0.375

 2. Fractions, percents, decimals Express each percent as a decimal.
    a) 15%                      b) 3%
    c) 85%                      d) 6.5%
    e) 26.5%                    f) 75.2%

 3. Fractions, percents, decimals Express each percent as a fraction in
    simplest form.
    a) 12%                      b) 35%
    c) 67%                      d) 4%
    e) 0.5%                     f) 98%

 4. Fractions, percents, decimals Express each fraction as a percent. Round
    answers to the nearest tenth, if necessary.
    a) $\frac{1}{4}$            b) $\frac{13}{15}$
    c) $\frac{11}{14}$          d) $\frac{7}{10}$
    e) $\frac{4}{9}$            f) $\frac{13}{20}$

 5. Tree diagrams A coin is flipped three times. Draw a tree diagram to
    illustrate all possible outcomes.

 6. Tree diagrams In the game of backgammon, you roll two dice to determine
    how you can move your counters. Suppose you roll first one die and then
    the other and you need to roll 9 or more to move a counter to safety.
    Use a tree diagram to list the different rolls in which
    a) you make at least 9
    b) you fail to move your counter to safety

 7. Fundamental counting principle (section 4.1) Benoit is going skating on
    a cold wintry day. He has a toque, a watch cap, a beret, a heavy scarf,
    a light scarf, leather gloves, and wool gloves. In how many different
    ways can Benoit dress for the cold weather?

 8. Additive counting principle (section 4.1) How many 13-card bridge hands
    include either seven hearts or eight diamonds?

 9. Venn diagrams (section 5.1)
    a) List the elements for each of the following sets for whole numbers
       from 1 to 10 inclusive.
       i) E, the set of even numbers
       ii) O, the set of odd numbers
       iii) C, the set of composite numbers
       iv) P, the set of perfect squares
    b) Draw a diagram to illustrate how the following sets are related.
       i) E and O
       ii) E and C
       iii) O and P
       iv) E, C, and P




10. Principle of inclusion and exclusion (section 5.1)
    a) Explain the principle of inclusion and exclusion.
    b) A gift store stocks baseball hats in red or green colours. Of the
       35 hats on display on a given day, 20 are green. As well, 18 of the
       hats have a grasshopper logo on the brim. Suppose 11 of the red hats
       have logos. How many hats are red, or have logos, or both?

11. Factorials (section 4.2) Evaluate.
    a) $6!$                     b) $0!$
    c) $\frac{16!}{14!}$        d) $\frac{12!}{9!\,3!}$
    e) $\frac{100!}{98!}$       f) $\frac{16!}{10! \times 8!}$

12. Permutations (section 4.2) Evaluate.
    a) $_{5}P_{3}$              b) $_{7}P_{1}$
    c) $P(6, 2)$                d) $_{9}P_{9}$
    e) $_{100}P_{1}$            f) $P(100, 2)$

13. Permutations (section 4.2) A baseball team has 13 members. If a batting
    line-up consists of 9 players, how many different batting line-ups are
    possible?

14. Permutations (section 4.2) What is the maximum number of three-digit
    area codes possible if the area codes cannot start with either 1 or 0?

15. Combinations (section 5.2) Evaluate these expressions.
    a) $_{6}C_{3}$              b) $C(4, 3)$
    c) $_{8}C_{8}$              d) $_{11}C_{0}$
    e) $\binom{6}{4} \times \binom{7}{5}$   f) $\binom{100}{1}$
    g) $_{20}C_{2}$             h) $_{20}C_{18}$

16. Combinations (section 5.2) A pizza shop has nine toppings available. How
    many different three-topping pizzas are possible if each topping is
    selected no more than once?

17. Combinations (section 5.3) A construction crew has 12 carpenters and
    5 drywallers. How many different safety committees could they form if
    the members of this committee are
    a) any 5 of the crew?
    b) 3 carpenters and 2 drywallers?

18. Matrices (section 1.6) Identify any square matrices among the following.
    Also identify any column or row matrices.
    a) $\begin{bmatrix} 3 & 4 \\ 0 & 1 \end{bmatrix}$
    b) $\begin{bmatrix} 0.4 & 0.3 & 0.2 \end{bmatrix}$
    c) $\begin{bmatrix} 1 & 0 \\ 0.5 & 0.5 \\ 0.8 & 0.6 \end{bmatrix}$
    d) $\begin{bmatrix} -2 & 3 & 9 \\ 0 & 11 & -4 \\ 3 & 6 & -1 \end{bmatrix}$
    e) $\begin{bmatrix} 49 & 63 \\ 25 & 14 \\ 72 & 9 \end{bmatrix}$
    f) $\begin{bmatrix} 8 \\ 16 \\ 32 \end{bmatrix}$

19. Matrices (section 1.7) Given $A = \begin{bmatrix} 0.3 & 0.7 \end{bmatrix}$
    and $B = \begin{bmatrix} 0.4 & 0.6 \\ 0.55 & 0.45 \end{bmatrix}$, perform the
    following matrix operations, if possible. If the operation is not
    possible, explain why.
    a) $A \times B$             b) $B \times A$
    c) $B^2$                    d) $B^3$
    e) $A^2$                    f) $A \times A^t$
6.1 Basic Probability Concepts

  How likely is rain tomorrow? What are the
  chances that you will pass your driving test on
  the first attempt? What are the odds that the
  flight will be on time when you go to meet
  someone at the airport?

  Probability is the branch of mathematics that
  attempts to predict answers to questions like
  these. As the word probability suggests, you can
  often predict only what might happen.
  However, you may be able to calculate how
  likely it is. For example, if the weather report
  forecasts a 90% chance of rain, there is still
  that slight possibility that sunny skies will
  prevail. While there are no sure answers, in
  this case it probably will rain.

        INVESTIGATE & INQUIRE: A Number Game

        Work with a partner. Have each partner take three identical slips of paper,
        number them 1, 2, and 3, and place them in a hat, bag, or other container.
        For each trial, both partners will randomly select one of their three slips of
        paper. Replace the slips after each trial. Score points as follows:
        • If the product of the two numbers shown is less than the sum, Player A gets
          a point.
        • If the product is greater than the sum, Player B gets a point.
        • If the product and sum are equal, neither player gets a point.

         1. Predict who has the advantage in this game. Explain why you think so.

         2. Decide who will be Player A by flipping a coin or using the random
            number generator on a graphing calculator. Organize your results in a
            table like the one below.
                       Trial             1     2   3   4   5     6     7    8     9      10
             Number drawn by A
             Number drawn by B
             Product
             Sum
             Point awarded to:



3. a) Record the results for 10 trials. Total the points and determine the winner.
           Do the results confirm your prediction? Have you changed your opinion on
           who has the advantage? Explain.
        b) To estimate a probability for each player getting a point, divide the number
           of points each player earned by the total number of trials.

     4. a) Perform 10 additional trials and record point totals for each player over all
           20 trials. Estimate the probabilities for each player, as before.
        b) Are the results for 20 trials consistent with the results for 10 trials? Explain.
        c) Are your results consistent with those of your classmates? Comment on
           your findings.

     5. Based on your results for 20 trials, predict how many points each player will
       have after 50 trials.

     6. Describe how you could alter the game so that the other player has the advantage.



The investigation you have just completed is an example of a probability experiment.
In probability, an experiment is a well-defined process consisting of a number of trials
in which clearly distinguishable outcomes, or possible results, are observed.

The sample space, S, of an experiment is the set of all possible outcomes. For the
sum/product game in the investigation, the outcomes are all the possible pairings of
slips drawn by the two players. For example, if Player A draws 1 and Player B draws 2,
you can label this outcome (1, 2). In this particular game, the result is the same for the
outcomes (1, 2) and (2, 1), but with different rules it might be important who draws
which number, so it makes sense to view the two outcomes as different.

Outcomes are often equally likely. In the sum/product game, each possible pairing of
numbers is as likely as any other. Outcomes are often grouped into events. An
example of an event is drawing slips for which the product is greater than the sum, and
there are several outcomes in which this event happens. Different events often have
different chances of occurring. Events are usually labelled with capital letters.


Example 1 Outcomes and Events

Let event A be a point awarded to Player A in the sum/product game.
List the outcomes that make up event A.

Solution
Player A earns a point if the sum of the two numbers is greater than the
product. This event is sometimes written as event A = {sum > product}.
A useful technique in probability is to tabulate the possible outcomes.


Sums                                 Products
                        Player A                                 Player A
                    1      2       3                       1       2        3
  Player B   1      2      3       4       Player B   1    1       2        3
             2      3      4       5                  2    2       4        6
             3      4      5       6                  3    3       6        9

Use the tables shown to list the outcomes where the sum is greater than the product:
(1, 1), (1, 2), (1, 3), (2, 1), (3, 1)
These outcomes make up event A. Using this list, you can also write event A as
event A = {(1, 1), (1, 2), (1, 3), (2, 1), (3, 1)}
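
The tabulation in Example 1 can also be checked by listing the outcomes with a short program. The following is a minimal Python sketch, not part of the original example; the names `slips`, `sample_space`, and `event_a` are illustrative choices. Each outcome is written as (number drawn by Player A, number drawn by Player B).

```python
from itertools import product

# The three slips each player can draw.
slips = [1, 2, 3]

# Sample space S: every ordered pair (A's number, B's number).
sample_space = list(product(slips, repeat=2))

# Event A: the sum of the two numbers is greater than the product.
event_a = [(a, b) for a, b in sample_space if a + b > a * b]

print(len(sample_space))  # 9
print(event_a)            # [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1)]
```

Treating (1, 2) and (2, 1) as separate outcomes matches the convention adopted in the text, so the list has nine entries rather than six.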


The probability of event A, P(A), is a quantified measure of the likelihood that
the event will occur. The probability of an event is always a value between 0 and 1.
A probability of 0 indicates that the event is impossible, and 1 signifies that the
event is a certainty. Most events in probability studies fall somewhere between
these extreme values. Probabilities less than 0 or greater than 1 have no
meaning. Probabilities can be expressed as fractions, decimals, or percents.
Probabilities expressed as percents are always between 0% and 100%. For
example, a 70% chance of rain tomorrow means the same as a probability of 0.7,
or $\frac{7}{10}$, that it will rain.
The three basic types of probability are
• empirical probability, based on direct observation or experiment
• theoretical probability, based on mathematical analysis
• subjective probability, based on informed guesswork

The empirical probability of a particular event (also called experimental or
relative frequency probability) is determined by dividing the number of times
that the event actually occurs in an experiment by the number of trials. In the
sum/product investigation, you were calculating empirical probabilities. For
example, if you had found that in the first ten trials, the product was greater
than the sum four times, then the empirical probability of this event would be

$$P(\text{product} > \text{sum}) = \frac{4}{10} = \frac{2}{5} \text{ or } 0.4$$

The theoretical probability of a particular event is deduced from analysis of
the possible outcomes. Theoretical probability is also called classical or a priori
probability. A priori is Latin for “from the preceding,” meaning based on
analysis rather than experiment.


For example, if all possible outcomes are equally likely, then

$$P(A) = \frac{n(A)}{n(S)}$$

where n(A) is the number of outcomes in which event A can occur, and n(S)
is the total number of possible outcomes. You used tables to list the
outcomes for A in Example 1, and this technique allows you to find the
theoretical probability P(A) by counting n(A) = 5 and n(S) = 9. Another way
to determine the values of n(A) and n(S) is by organizing the information in
a tree diagram.

Project Prep
You will need to determine theoretical probabilities to design and analyse
your game in the probability project.

Example 2 Using a Tree Diagram to Calculate Probability

Determine the theoretical probabilities for each key event in the sum/product
game.

Solution
The tree diagram shows the nine possible outcomes, each equally likely, for
the sum/product game.

  First slip   Second slip   Product        Sum
      1             1           1      <     2
      1             2           2      <     3
      1             3           3      <     4
      2             1           2      <     3
      2             2           4      =     4
      2             3           6      >     5
      3             1           3      <     4
      3             2           6      >     5
      3             3           9      >     6

Let event A be a point for Player A, event B a point for Player B, and
event C a tie between sum and product. From the tree diagram, five of the
nine possible outcomes have the sum greater than the product. Therefore,
the theoretical probability of this event is

$$P(A) = \frac{n(A)}{n(S)} = \frac{5}{9}$$

Similarly,

$$P(B) = \frac{n(B)}{n(S)} = \frac{3}{9} \quad \text{and} \quad P(C) = \frac{n(C)}{n(S)} = \frac{1}{9}$$


In Example 2, you know that one, and only one, of the three events will occur.
Whenever events are defined so that exactly one of them must occur on each
trial, the sum of their probabilities equals 1.

$$P(A) + P(B) + P(C) = \frac{5}{9} + \frac{3}{9} + \frac{1}{9} = 1$$

Here, the numerator in each fraction represents the number of ways that each
event can occur. The total of these numerators is the total number of possible
outcomes, which is equal to the denominator.
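
The counting in Example 2 can be reproduced with exact fractions. The sketch below is an illustration only, assuming the same nine equally likely outcomes; the helper name `probability` is hypothetical and simply applies n(event)/n(S).

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (A's number, B's number), all equally likely.
outcomes = list(product([1, 2, 3], repeat=2))
n_s = len(outcomes)


def probability(condition):
    """Return n(event) / n(S) for the outcomes that satisfy `condition`."""
    favourable = [pair for pair in outcomes if condition(*pair)]
    return Fraction(len(favourable), n_s)


p_a = probability(lambda a, b: a + b > a * b)   # point for Player A
p_b = probability(lambda a, b: a * b > a + b)   # point for Player B
p_c = probability(lambda a, b: a * b == a + b)  # tie

# Fraction reduces automatically, so 3/9 is displayed as 1/3.
print(p_a, p_b, p_c)    # 5/9 1/3 1/9
print(p_a + p_b + p_c)  # 1
```

Because exactly one of the three conditions holds for every outcome, the three counts add up to n(S) and the probabilities add up to 1.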



Empirical probabilities may differ sharply from theoretical probabilities when
only a few trials are made. Such statistical fluctuation can result in an event
occurring more frequently or less frequently than theoretical probability
suggests. Over a large number of trials, however, statistical fluctuations tend to
cancel each other out, and empirical probabilities usually approach theoretical
values. Statistical fluctuations often appear in sports, for example, where a team
can enjoy a temporary winning streak that is not sustainable over an entire
season.
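
One way to see statistical fluctuation is to simulate the sum/product game for different numbers of trials. The following Python sketch is an illustration under stated assumptions: the function name `empirical_probability_a` and the seed value are arbitrary choices, and Python's pseudo-random generator stands in for drawing slips from a container.

```python
import random


def empirical_probability_a(trials, rng):
    """Estimate P(A) as (points for Player A) / (number of trials)."""
    points_a = 0
    for _ in range(trials):
        a = rng.choice([1, 2, 3])  # slip drawn by Player A
        b = rng.choice([1, 2, 3])  # slip drawn by Player B
        if a + b > a * b:          # sum greater than product
            points_a += 1
    return points_a / trials


rng = random.Random(12)  # fixed seed (arbitrary) so the run can be repeated
for trials in (10, 100, 10_000):
    print(trials, empirical_probability_a(trials, rng))
```

With only 10 trials the estimate can land well away from the theoretical value of 5/9 (about 0.556); with thousands of trials it is usually much closer.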

In most problems, you will be determining theoretical probability. Therefore,
from now on you may take the term probability to mean theoretical probability
unless stated otherwise.


Example 3 Dice Probabilities

Many board games involve a roll of two six-sided dice to see how far you may
move your pieces or counters. What is the probability of rolling a total of 7?

Solution
The table shows the totals for all possible rolls of two dice.
                          First Die
                   1    2    3    4    5    6
              1    2    3    4    5    6    7
              2    3    4    5    6    7    8
Second Die    3    4    5    6    7    8    9
              4    5    6    7    8    9   10
              5    6    7    8    9   10   11
              6    7    8    9   10   11   12

To calculate the probability of a particular total, count the number of times it
appears in the table. For event A = {rolling 7},

$$P(A) = \frac{n(A)}{n(S)} = \frac{n(\text{rolls totalling 7})}{n(\text{all possible rolls})} = \frac{6}{36} = \frac{1}{6}$$

The probability of rolling a total of 7 is $\frac{1}{6}$.
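
The count from the table can be confirmed by enumerating all rolls of two dice. This is a short Python sketch rather than part of the original solution; the names `rolls`, `totals`, and `p_seven` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered rolls (first die, second die) of two six-sided dice.
rolls = list(product(range(1, 7), repeat=2))

# Count how many rolls give each total, as in the table.
totals = Counter(first + second for first, second in rolls)

p_seven = Fraction(totals[7], len(rolls))
print(totals[7], len(rolls))  # 6 36
print(p_seven)                # 1/6
```

The same `totals` counter gives n(A) for any other total, for example `totals[2]` for rolling a 2.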




A useful and important concept in probability is the complement of an event.
The complement of event A, A′ or ~A, is the event that “event A does not
happen.” Thus, whichever outcomes make up A, all the other outcomes make
up A′. Because A and A′ together include all possible outcomes, the sum of their
probabilities must be 1. Thus,
P(A) + P(A′) = 1      and     P(A′) = 1 − P(A)

[Venn diagram: event A shown as a region inside the sample space; the region outside A is labelled A′]




The event A′ is usually called “A-prime,” or sometimes “not-A”; ~A is called
“tilde-A.”


Example 4 The Complement of an Event

What is the probability that a randomly drawn integer between 1 and 40,
inclusive, is not a perfect square?

Solution

Let event A = {a perfect square}. Then, the complement of A is the event
A′ = {not a perfect square}. In this case, you need to calculate P(A′), but it is easier
to do this by finding P(A) first. There are six perfect squares between 1 and
40: 1, 4, 9, 16, 25, and 36. The probability of a perfect square is, therefore,

$$P(A) = \frac{n(A)}{n(S)} = \frac{6}{40} = \frac{3}{20}$$

Thus,

$$P(A') = 1 - P(A) = 1 - \frac{3}{20} = \frac{17}{20}$$

There is a $\frac{17}{20}$ or 85% chance that a random integer between 1 and 40 will not
be a perfect square.
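
The complement calculation in Example 4 can be checked against a direct count. The sketch below is illustrative only; it assumes the integers 1 to 40 inclusive are equally likely and uses `math.isqrt` (available in Python 3.8 and later) to test for perfect squares.

```python
from fractions import Fraction
from math import isqrt

integers = range(1, 41)  # 1 to 40 inclusive

# Event A: the integer is a perfect square.
squares = [n for n in integers if isqrt(n) ** 2 == n]
p_square = Fraction(len(squares), len(integers))

# Complement rule: P(A') = 1 - P(A).
p_not_square = 1 - p_square

# Direct count of the integers that are not perfect squares.
non_squares = [n for n in integers if isqrt(n) ** 2 != n]
p_direct = Fraction(len(non_squares), len(integers))

print(squares)       # [1, 4, 9, 16, 25, 36]
print(p_not_square)  # 17/20
print(p_direct)      # 17/20
```

Both routes give the same value, but counting the six perfect squares is less work than counting the 34 integers that are not.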




Subjective probability, the third basic type of probability, is an estimate of
likelihood based on intuition and experience—an educated guess. For example,
a well-prepared student may be 90% confident of passing the next data
management test. Subjective probabilities often figure in everyday speech in
expressions such as “I think the team has only a 10% chance of making the
finals this year.”


Example 5 Determining Subjective Probability

Estimate the probability that
a) the next pair of shoes you buy will be the same size as the last pair you
    bought
b)    an expansion baseball team will win the World Series in their first season
c)    the next person to enter a certain coffee shop will be male

Solution

a) There is a small chance that the size of your feet has changed significantly
      or that different styles of shoes may fit you differently, so 80–90% would be
      a reasonable subjective probability that your next pair of shoes will be the
      same size as your last pair.

b) Expansion teams rarely do well during their first season, and even
      strong teams have difficulty winning the World Series. The subjective
      probability of a brand-new team winning the World Series is close to
      zero.

c) Without more information about the coffee shop in question, your best
      estimate is to assume that the shop’s patrons are representative of the
      general population. This assumption gives a subjective probability of
      50% that the next customer will be male.

www.mcgrawhill.ca/links/MDM12
For some interesting baseball statistics, visit the above web site and follow
the links. Write a problem that could be solved using probabilities.


Note that the answers in Example 5 contain estimates, assumptions, and, in some
cases, probability ranges. While not as rigorous a measure as theoretical or
empirical probability, subjective probabilities based on educated guesswork can
still prove useful in some situations.




Key Concepts

• A probability experiment is a well-defined process in which clearly identifiable
  outcomes are measured for each trial.

• An event is a collection of outcomes satisfying a particular condition. The
  probability of an event can range between 0 (impossible) and 1 or 100%
  (certain).

• The empirical probability of an event is the number of times the event occurs
  divided by the total number of trials.
• The theoretical probability of an event A is given by $P(A) = \frac{n(A)}{n(S)}$, where
  n(A) is the number of outcomes making up A, n(S) is the total number
  of outcomes in the sample space S, and all outcomes are equally likely
  to occur.

• A subjective probability is based on intuition and previous experience.

• If the probability of event A is given by P(A), then the probability of the
  complement of A is given by P(A′) = 1 − P(A).
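
The difference between theoretical and empirical probability can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: it uses Python's standard random module with an arbitrary fixed seed, and the helper names are hypothetical, not from the text. It compares n(A)/n(S) for the event "rolling a 5 or 6 with one fair die" with an estimate from simulated trials, and also applies the complement rule.

```python
import random
from fractions import Fraction

def theoretical_probability(event, sample_space):
    """P(A) = n(A) / n(S), assuming all outcomes are equally likely."""
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))

def empirical_probability(event, sample_space, trials, rng):
    """Number of trials in which the event occurs, divided by the total trials."""
    hits = sum(1 for _ in range(trials) if rng.choice(sample_space) in event)
    return hits / trials

die = [1, 2, 3, 4, 5, 6]        # sample space S for one fair die
five_or_six = {5, 6}            # event A (chosen only for illustration)

p_theory = theoretical_probability(five_or_six, die)
p_complement = 1 - p_theory     # P(A') = 1 - P(A)

rng = random.Random(2024)       # fixed seed (arbitrary) so runs are repeatable
p_observed = empirical_probability(five_or_six, die, 10_000, rng)

print("theoretical P(A): ", p_theory)       # 1/3
print("theoretical P(A'):", p_complement)   # 2/3
print("empirical P(A) from 10 000 trials:", round(p_observed, 3))
```

The empirical value should land near 1/3 but will rarely match it exactly, which is the statistical fluctuation discussed in this section.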


Communicate Your Understanding

 1. Give two synonyms for the word probability.

 2. a) Explain why P(A) + P(A′) = 1.
    b) Explain why probabilities less than 0 or greater than 1 have no meaning.

 3. Explain the difference between theoretical, empirical, and subjective
   probability. Give an example of how you would determine each type.

 4. Describe three situations in which statistical fluctuations occur.

 5. a) Describe a situation in which you might determine the probability of
       event A indirectly by calculating P(A′) first.
    b) Will this method always yield the same result as calculating P(A) directly?
    c) Defend your answer to part b) using an explanation or proof, supported
       by an example.




Practise

 A

 1. Determine the probability of
      a) tossing heads with a single coin
      b) tossing two heads with two coins
      c) tossing at least one head with three coins
      d) rolling a composite number with one die
      e) not rolling a perfect square with two dice
      f) drawing a face card from a standard deck of cards

 2. Estimate a subjective probability of each of the following events. Provide
      a rationale for each estimate.
      a) the sun rising tomorrow
      b) it never raining again
      c) your passing this course
      d) your getting the next job you apply for

 3. Recall the sum/product game at the beginning of this section. Suppose that
      the game were altered so that the slips of paper showed the numbers 2, 3,
      and 4, instead of 1, 2, and 3.
      a) Identify all the outcomes that will produce each of the three possible
         events
         i) p > s        ii) p < s      iii) p = s
      b) Which player has the advantage in this situation?

Apply, Solve, Communicate

 4. The town planning department surveyed residents of a town about home
      ownership. The table shows the results of the survey.

       Residents    At Address Less    At Address More    Total for
                    Than 2 Years       Than 2 Years       Category
       Owners           2000               8000            10 000
       Renters          4500               1500             6 000
       Total            6500               9500            16 000

      Determine the following probabilities.
      a) P(resident owns home)
      b) P(resident rents and has lived at present address less than two years)
      c) P(homeowner has lived at present address more than two years)

 B

 5. Application Suppose your school’s basketball team is playing a four-game
      series against another school. So far this season, each team has won three
      of the six games in which they faced each other.
      a) Draw a tree diagram to illustrate all possible outcomes of the series.
      b) Use your tree diagram to determine the probability of your school
         winning exactly two games.
      c) What is the probability of your school sweeping the series (winning
         all four games)?
      d) Discuss any assumptions you made in the calculations in parts b) and c).

 6. Application Suppose that a graphing calculator is programmed to generate a
      random natural number between 1 and 10 inclusive. What is the probability
      that the number will be prime?

 7. Communication
      a) A game involves rolling two dice. Player A wins if the throw totals
         5, 7, or 9. Player B wins if any other total is thrown. Which player
         has the advantage? Explain.
      b) Suppose the game is changed so that Player A wins if 5, 7, or doubles
         (both dice showing the same number) are thrown. Who has the advantage
         now? Explain.
      c) Design a similar game in which each player has an equal chance of
         winning.



 8. (Chapter Problem)
      a) Based on the randomly tagged sample, what is the empirical probability
         that a deer captured at random will be a doe?
      b) If ten deer are captured at random, how many would you expect to be
         bucks?

 C

 9. Inquiry/Problem Solving Refer to the prime number experiment in question 6.
      What happens to the probability if you change the upper limit of the
      sample space? Use a graphing calculator or appropriate computer software
      to investigate this problem. Let A be the event that the random natural
      number will be a prime number. Let the random number be between 1 and n
      inclusive. Predict what you think will happen to P(A) as n increases.
      Investigate P(A) as a function of n, and reflect on your hypothesis. Did
      you observe what you expected? Why or why not?

10. Suppose that the Toronto Blue Jays face the New York Yankees in the division
      final. In this best-of-five series, the winner is the first team to win
      three games. The games are played in Toronto and in New York, with Toronto
      hosting the first, second, and if needed, fifth games. The consensus among
      experts is that Toronto has a 65% chance of winning at home and a 40%
      chance of winning in New York.
      a) Construct a tree diagram to illustrate all the possible outcomes.
      b) What is the chance of Toronto winning in three straight games?
      c) For each outcome, add to your tree diagram the probability of that
         outcome.
      d) Communication Explain how you found your answers to parts b) and c).

11. Communication Prior to a municipal election, a public-opinion poll
      determined that the probability of each of the four candidates winning
      was as follows:
         Jonsson    10%
         Trimble    32%
         Yakamoto   21%
         Audette    37%
      a) How will these probabilities change if Jonsson withdraws from the race
         after ballots are cast?
      b) How will these probabilities change if Jonsson withdraws from the race
         before ballots are cast?
      c) Explain why your answers to a) and b) are different.

12. Inquiry/Problem Solving It is known from studying past tests that the
      correct answers to a certain university professor’s multiple-choice tests
      exhibit the following pattern.

         Correct Answer     Percent of Questions
               A                   15%
               B                   25%
               C                   30%
               D                   15%
               E                   15%

      a) Devise a strategy for guessing that would maximize a student’s chances
         for success, assuming that the student has no idea of the correct
         answers. Explain your method.
      b) Suppose that the study of past tests revealed that the correct answer
         choice for any given question was the same as that of the immediately
         preceding question only 10% of the time. How would you use this
         information to adjust your strategy in part a)? Explain your reasoning.



6.2           Odds

  Odds are another way to
  express a level of confidence
  about an outcome. Odds are
  commonly used in sports and
  other areas. Odds are often
  used when the probability of
  an event versus its complement
  is of interest, for example
  whether a sprinter will win
  or lose a race or whether a
  basketball team will make it
  to the finals.




        INVESTIGATE & INQUIRE: Tennis Tournament

        For an upcoming tennis tournament, a television commentator estimates
        that the top-seeded (highest-ranked) player has “a 25% probability of
        winning, but her odds of winning are only 1 to 3.”

         1. a) If event A is the top-seeded player winning the tournament, what is A′?
           b) Determine P(A′).

         2. a) How are the odds of the top-seeded player winning related to P(A) and
               P(A′)?
           b) Should the television commentator be surprised that the odds were only
               1 to 3? Why or why not?

         3. a) What factors might the commentator consider when estimating the
               probability of the top-seeded player winning the tournament?
           b) How accurate do you think the commentator’s estimate is likely to
               be? Would you consider such an estimate primarily a classical, an
               empirical, or a subjective probability? Explain.

        www.mcgrawhill.ca/links/MDM12
        For more information about tennis rankings and other tennis statistics,
        visit the above web site and follow the links. Locate some statistics
        about a tennis player of your choice. Use odds to describe these
        statistics.




The odds in favour of an event’s occurring are given by the ratio of the
probability that the event will occur to the probability that it will not occur.
$$\text{odds in favour of } A = \frac{P(A)}{P(A')}$$
Giving odds in favour of an event is a common way to express a probability.


Example 1 Determining Odds

A messy drawer contains three red socks, five white socks, and four black socks.
What are the odds in favour of randomly drawing a red sock?

Solution

Let the event A be drawing a red sock. The probability of this event is

$$\begin{aligned}
P(A) &= \frac{3}{12} \\
     &= \frac{1}{4}
\end{aligned}$$

The probability of not drawing a red sock is

$$\begin{aligned}
P(A') &= 1 - P(A) \\
      &= \frac{3}{4}
\end{aligned}$$

Using the definition of odds,

$$\begin{aligned}
\text{odds in favour of } A &= \frac{P(A)}{P(A')} \\
                            &= \frac{1/4}{3/4} \\
                            &= \frac{1}{3}
\end{aligned}$$

Therefore, the odds in favour of drawing a red sock are $\frac{1}{3}$, or less than 1. You
are more likely not to draw a red sock. These odds are commonly written as
1:3, which is read as “one to three.”

Project Prep
A useful feature you could include in your probability project is a
calculation of the odds of winning your game.


Notice in Example 1 that the ratio of red socks to other socks is 3:9, which
is the same as the odds in favour of drawing a red sock. In fact, the odds in
favour of an event A can also be found using
$$\text{odds in favour of } A = \frac{n(A)}{n(A')}$$
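
The count-based form of the odds can be checked with a few lines of Python. This is a minimal sketch, assuming the sock counts from Example 1; the function name odds_in_favour is hypothetical, and Fraction from the standard library is used only to reduce the ratio to lowest terms.

```python
from fractions import Fraction

def odds_in_favour(n_event, n_complement):
    """Return the odds in favour of A, n(A) : n(A'), in lowest terms.

    Raises ZeroDivisionError if n(A') is 0, since the odds are then undefined.
    """
    ratio = Fraction(n_event, n_complement)
    return ratio.numerator, ratio.denominator

# Sock counts from Example 1
socks = {"red": 3, "white": 5, "black": 4}
n_red = socks["red"]
n_other = sum(socks.values()) - n_red      # n(A') = 9

h, k = odds_in_favour(n_red, n_other)
print(f"odds in favour of a red sock: {h}:{k}")   # 1:3

# The same odds obtained from the probabilities P(A) = 1/4 and P(A') = 3/4
p_red = Fraction(n_red, sum(socks.values()))
odds_from_probabilities = p_red / (1 - p_red)
print(odds_from_probabilities)                    # 1/3
```

Both routes give the same ratio because the common denominator n(S) cancels when P(A) is divided by P(A′).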



A common variation on the theme of odds is to express the odds against an event
happening.

$$\text{odds against } A = \frac{P(A')}{P(A)}$$


Example 2 Odds Against an Event

If the chance of a snowstorm in Windsor, Ontario, in January is estimated at
0.4, what are the odds against Windsor’s having a snowstorm next January? Is a
January snowstorm more likely than not?

Solution

Let event A = {snowstorm in January}.
Since P(A) + P(A′) = 1,

$$\begin{aligned}
\text{odds against } A &= \frac{P(A')}{P(A)} \\
                       &= \frac{1 - P(A)}{P(A)} \\
                       &= \frac{1 - 0.4}{0.4} \\
                       &= \frac{0.6}{0.4} \\
                       &= \frac{3}{2}
\end{aligned}$$
The odds against a snowstorm are 3:2, which is greater than 1:1. So a
snowstorm is less likely to occur than not.
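
The calculation in Example 2 can be sketched in Python as follows. The helper name odds_against is hypothetical. The probability is passed as the string "0.4" so that Fraction stores it exactly as 2/5; passing the float 0.4 would carry binary rounding error into the ratio.

```python
from fractions import Fraction

def odds_against(p_event):
    """Return the odds against A, P(A') : P(A), in lowest terms."""
    p = Fraction(p_event)
    ratio = (1 - p) / p          # P(A') / P(A), with P(A') = 1 - P(A)
    return ratio.numerator, ratio.denominator

h, k = odds_against("0.4")       # snowstorm probability from Example 2
print(f"odds against a snowstorm: {h}:{k}")   # 3:2

# Odds against greater than 1:1 mean the event is less likely than not.
less_likely_than_not = h > k
print(less_likely_than_not)                   # True
```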


Sometimes, you might need to convert an expression of odds into a probability.
You can do this conversion by expressing P(A′) in terms of P(A).

Example 3 Probability From Odds

A university professor, in an effort to promote good attendance habits, states
that the odds of passing her course are 8 to 1 when a student misses fewer
than five classes. What is the probability that a student with good attendance
will pass?

Solution

Let the event A be that a student with good attendance passes. Since
odds in favour of A = $\frac{P(A)}{P(A')}$,

$$\begin{aligned}
\frac{8}{1} &= \frac{P(A)}{P(A')} \\
            &= \frac{P(A)}{1 - P(A)} \\
8 - 8P(A) &= P(A) \\
8 &= 9P(A) \\
P(A) &= \frac{8}{9}
\end{aligned}$$

The probability that a student with good attendance will pass is $\frac{8}{9}$, or
approximately 89%.


In general, it can be shown that if the odds in favour of A = $\frac{h}{k}$, then $P(A) = \frac{h}{h+k}$.


Example 4 Using the Odds-Probability Formula

The odds of Rico’s hitting a home run are 2:7. What is the probability of Rico’s
hitting a home run?

Solution

Let A be the event that Rico hits a home run. Then, h = 2 and k = 7, and

$$\begin{aligned}
P(A) &= \frac{h}{h+k} \\
     &= \frac{2}{2+7} \\
     &= \frac{2}{9}
\end{aligned}$$
Rico has approximately a 22% chance of hitting a home run.
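
The odds-probability formula is easy to apply in code. The sketch below, with the hypothetical helper probability_from_odds, reproduces the results of Examples 3 and 4 using exact fractions from Python's standard library.

```python
from fractions import Fraction

def probability_from_odds(h, k):
    """If the odds in favour of A are h:k, then P(A) = h / (h + k)."""
    return Fraction(h, h + k)

p_pass = probability_from_odds(8, 1)       # Example 3: odds of passing are 8 to 1
p_home_run = probability_from_odds(2, 7)   # Example 4: odds of a home run are 2:7

print(p_pass, round(float(p_pass), 3))           # 8/9 0.889
print(p_home_run, round(float(p_home_run), 3))   # 2/9 0.222
```

Note that odds of 1:1 give a probability of 1/2, which matches the idea of an event that is exactly as likely to occur as not.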



   Key Concepts
   • The odds in favour of A are given by the ratio $\frac{P(A)}{P(A')}$.

   • The odds against A are given by the ratio $\frac{P(A')}{P(A)}$.

   • If the odds in favour of A are $\frac{h}{k}$, then $P(A) = \frac{h}{h+k}$.




Communicate Your Understanding

      1. Explain why the terms odds and probability have different meanings. Give an
         example to illustrate your answer.

      2. Would you prefer the odds in favour of passing your next data management
         test to be 1:3 or 3:1? Explain your choice.

      3. Explain why odds can be greater than 1, but probabilities must be between
         0 and 1.


Practise

 A

 1. Suppose the odds in favour of good weather tomorrow are 3:2.
      a) What are the odds against good weather tomorrow?
      b) What is the probability of good weather tomorrow?

 2. The odds against the Toronto Argonauts winning the Grey Cup are estimated
      at 19:1. What is the probability that the Argos will win the cup?

 3. Determine the odds in favour of rolling each of the following sums with a
      standard pair of dice.
      a) 12                     b) 5 or less
      c) a prime number         d) 1

 4. Calculate the odds in favour of each event.
      a) New Year’s Day falling on a Friday
      b) tossing three tails with three coins
      c) not tossing exactly two heads with three coins
      d) randomly drawing a black 6 from a complete deck of 52 cards
      e) a random number from 1 to 9 inclusive being even

Apply, Solve, Communicate

 B

 5. Greta’s T-shirt drawer contains three tank tops, six V-neck T-shirts, and
      two sleeveless shirts. If she randomly draws a shirt from the drawer, what
      are the odds that she will
      a) draw a V-neck T-shirt?
      b) not draw a tank top?

 6. Application If the odds in favour of Boris beating Elena in a chess game
      are 5 to 4, what is the probability that Elena will win an upset victory
      in a best-of-five chess tournament?

 7. (Chapter Problem)
      a) Based on the randomly tagged sample, what are the odds in favour of a
         captured deer being a cross-hatched buck?
      b) What are the odds against capturing a doe?

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links for more information about
Canadian wildlife.




 8. The odds against A, by definition, are equivalent to the odds in favour of
     A′. Use this definition to show that the odds against A are equal to the
     reciprocal of the odds in favour of A.

 9. Application Suppose the odds of the Toronto Maple Leafs winning the Stanley
     Cup are 1:5, while the odds of the Montréal Canadiens winning the Stanley
     Cup are 2:13. What are the odds in favour of either Toronto or Montréal
     winning the Stanley Cup?

10. What are the odds against drawing
     a) a face card from a standard deck?
     b) two face cards?

ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

11. Mike has a loaded (or unfair) six-sided die. He rolls the die 200 times and
     determines the following probabilities for each score:
     P(1) = 0.11
     P(2) = 0.02
     P(3) = 0.18
     P(4) = 0.21
     P(5) = 0.40
     a) What is P(6)?
     b) Mike claims that the odds in favour of tossing a prime number with this
        die are the same as with a fair die. Do you agree with his claim?
     c) Using Mike’s die, devise a game with odds in Mike’s favour that an
        unsuspecting person would be tempted to play. Use probabilities to show
        that the game is in Mike’s favour. Explain why a person who does not
        realize that the die is loaded might be tempted by this game.

12. George estimates that there is a 30% chance of rain the next day if he
     waters the lawn, a 40% chance if he washes the car, and a 50% chance if he
     plans a trip to the beach. Assuming George’s estimates are accurate, what
     are the odds
     a) in favour of rain tomorrow if he waters the lawn?
     b) in favour of rain tomorrow if he washes the car?
     c) against rain tomorrow if he plans a trip to the beach?

 C

13. Communication A volleyball coach claims that at the next game, the odds of
     her team winning are 3:1, the odds against losing are 5:1, and the odds
     against a tie are 7:1. Are these odds possible? Explain your reasoning.

14. Inquiry/Problem Solving Aki is a participant on a trivia-based game show.
     He has an equal likelihood on any given trial of being asked a question
     from one of six categories: Hollywood, Strange Places, Number Fun, Who?,
     Having a Ball, and Write On! Aki feels that he has a 50/50 chance of
     getting Having a Ball or Strange Places questions correct, but thinks he
     has a 90% probability of getting any of the other questions right. If Aki
     has to get two of three questions correct, what are his odds of winning?

15. Inquiry/Problem Solving Use logic and mathematical reasoning to show that
     if the odds in favour of A are given by $\frac{h}{k}$, then $P(A) = \frac{h}{h+k}$.
     Support your reasoning with an example.



6.3            Probabilities Using Counting Techniques

  How likely is it that, in a game of cards, you will be dealt just the hand that you
  need? Most card players accept this question as an unknown, enjoying the
  unpredictability of the game, but it can also be interesting to apply counting
  analysis to such problems.

  In some situations, the possible outcomes are not easy or convenient to count
  individually. In many such cases, the counting techniques of permutations and
  combinations (see Chapters 4 and 5, respectively) can be helpful for calculating
  theoretical probabilities, or you can use a simulation to determine an empirical
  probability.

        INVESTIGATE & INQUIRE: Fishing Simulation

        Suppose a pond has only three types of fish: catfish, trout, and
        bass, in the ratio 5:2:3. There are 50 fish in total. Assuming you
        are allowed to catch only three fish before throwing them back,
        consider the following two events:
        • event A = {catching three trout}
        • event B = {catching the three types of fish, in alphabetical order}

         1. Carry out the following probability experiment, independently
           or with a partner. You can use a hat or paper bag to represent
           the pond, and some differently coloured chips or markers to
           represent the fish. How many of each type of fish should you
           release into the pond? Count out the appropriate numbers
           and shake the container to simulate the fish swimming
           around.

         2. Draw a tree diagram to illustrate the different possible outcomes of this
           experiment.

         3. Catch three fish, one at a time, and record the results in a table. Replace all
           three fish and shake the container enough to ensure that they are randomly
           distributed. Repeat this process for a total of ten trials.

         4. Based on these ten trials, determine the empirical probability of event A,
           catching three trout. How accurate do you think this value is? Compare your
           results with those of the rest of the class. How can you obtain a more accurate
           empirical probability?

         5. Repeat step 4 for event B, which is to catch a bass, catfish, and trout in order.



6. Perform step 3 again for 10 new trials. Calculate the empirical probabilities
        of events A and B, based on your 20 trials. Do you think these probabilities
        are more accurate than those from 10 trials? Explain why or why not.

     7. If you were to repeat the experiment for 50 or 100 trials, would your results
        be more accurate? Why or why not?

     8. In this investigation, you knew exactly how many of each type of fish were in
        the pond because they were counted out at the beginning. Describe how you
        could use the techniques of this investigation to estimate the ratios of
        different species in a real pond.
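
If chips and a paper bag are not at hand, the fishing experiment can also be run on a computer. The following is a rough sketch rather than a prescribed method: it assumes the 5:2:3 ratio translates to 25 catfish, 10 trout, and 15 bass in a pond of 50, uses Python's random.sample to catch three fish in order without replacement, and relies on an arbitrary fixed seed. The function names are hypothetical.

```python
import random

def build_pond():
    """50 fish in the ratio 5:2:3 (catfish : trout : bass)."""
    return ["catfish"] * 25 + ["trout"] * 10 + ["bass"] * 15

def run_trials(trials, rng):
    """Return the empirical probabilities of events A and B."""
    pond = build_pond()
    count_a = 0
    count_b = 0
    for _ in range(trials):
        # Catch three fish one at a time without replacement; the list keeps
        # the order of the catch. All three go back before the next trial.
        catch = rng.sample(pond, 3)
        if catch == ["trout", "trout", "trout"]:      # event A
            count_a += 1
        if catch == ["bass", "catfish", "trout"]:     # event B, alphabetical order
            count_b += 1
    return count_a / trials, count_b / trials

rng = random.Random(1)   # fixed seed (arbitrary) so runs are repeatable
for trials in (10, 100, 50_000):
    p_a, p_b = run_trials(trials, rng)
    print(f"{trials:>6} trials: P(A) ~ {p_a:.4f}, P(B) ~ {p_b:.4f}")
```

With only 10 trials, event A will often not occur at all, so the estimate is unreliable; the estimates settle down as the number of trials grows.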


This section examines methods for determining the theoretical probabilities of
successive or multiple events.

Example 1 Using Permutations

Two brothers enter a race with five friends. The racers draw lots to determine
their starting positions. What is the probability that the older brother will start
in lane 1 with his brother beside him in lane 2?

Solution

A permutation nPr, or P(n, r), is the number of ways to select r objects from a set
of n objects, in a certain order. (See Chapter 4 for more about permutations.)
The sample space is the total number of ways the first two lanes can be
occupied. Thus,
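$$\begin{aligned}
n(S) &= {}_7P_2 \\
     &= \frac{7!}{(7 - 2)!} \\
     &= \frac{7!}{5!} \\
     &= \frac{7 \times 6 \times 5!}{5!} \\
     &= 42
\end{aligned}$$

The specific outcome of the older brother starting in lane 1 and the younger
brother starting in lane 2 can only happen one way, so n(A) = 1. Therefore,

$$\begin{aligned}
P(A) &= \frac{n(A)}{n(S)} \\
     &= \frac{1}{42}
\end{aligned}$$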
The probability that the older brother will start in lane 1 next to his brother in
lane 2 is $\frac{1}{42}$, or approximately 2.4%.
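
The count in Example 1 can be confirmed by computer in two ways: with the permutation formula, and by listing every ordered pair of racers for lanes 1 and 2. This is a small sketch assuming Python 3.8 or later (for math.perm); the racer labels are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import perm

# Two brothers and five friends (labels are illustrative only)
racers = ["older brother", "younger brother"] + [f"friend {i}" for i in range(1, 6)]

# Formula: number of ordered ways to fill lanes 1 and 2 from 7 racers
n_s = perm(len(racers), 2)

# Brute force: list every (lane 1, lane 2) assignment and count the favourable one
lane_assignments = list(permutations(racers, 2))
favourable = [pair for pair in lane_assignments
              if pair == ("older brother", "younger brother")]

p_a = Fraction(len(favourable), len(lane_assignments))

print(n_s, len(lane_assignments))   # 42 42
print(p_a, round(float(p_a), 4))    # 1/42 0.0238
```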

Example 2 Probability Using Combinations

A focus group of three members is to be randomly selected from a medical
team consisting of five doctors and seven technicians.
a) What is the probability that the focus group will be comprised of
    doctors only?
b)    What is the probability that the focus group will not be comprised of
      doctors only?

Solution

a) A combination nCr, also written C(n, r) or $\binom{n}{r}$, is the number of ways to
      select r objects from a set of n objects, in any order. (See Chapter 5 for more
      about combinations.) Let event A be selecting three doctors to form the
      focus group. The number of possible ways to make this selection is

      n(A) = 5C3
                 5!
           = ᎏᎏ
             3!(5 − 3)!
             5 × 4 × 3!
           = ᎏᎏ
               3! × 2!
             20
           = ᎏᎏ
              2
           = 10

      However, the focus group can consist of any three people from the team of 12.
$$
\begin{aligned}
n(S) &= {}_{12}C_3 \\
     &= \frac{12!}{3!\,(12-3)!} \\
     &= \frac{12 \times 11 \times 10 \times 9!}{3! \times 9!} \\
     &= \frac{1320}{6} \\
     &= 220
\end{aligned}
$$

      The probability of selecting a focus group of doctors only is
$$P(A) = \frac{n(A)}{n(S)} = \frac{10}{220} = \frac{1}{22}$$

      The probability of selecting a focus group consisting of three doctors is $\frac{1}{22}$,
      or approximately 0.045.

b) Either the focus group is comprised of doctors only, or it is not.
   Therefore, the probability of the complement of A, P(A′), gives
   the desired result.

$$P(A') = 1 - P(A) = 1 - \frac{1}{22} = \frac{21}{22}$$

   So, the probability of selecting a focus group not comprised of
   doctors only is $\frac{21}{22}$, or approximately 0.955.

Project Prep
When you determine the classical probabilities for your probability project,
you may need to apply the counting techniques of permutations and
combinations.
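
The counts in Example 2 can be reproduced in a few lines of Python. This is only a sketch for checking the arithmetic, not part of the original solution; math.comb requires Python 3.8 or later, and the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

n_a = comb(5, 3)    # ways to choose 3 of the 5 doctors
n_s = comb(12, 3)   # ways to choose any 3 of the 12 team members

p_a = Fraction(n_a, n_s)   # P(A): doctors only
p_not_a = 1 - p_a          # P(A'): not doctors only

print(n_a, n_s, p_a, p_not_a)  # 10 220 1/22 21/22
```

Using Fraction keeps the results exact, so the reduced fractions match the hand calculation.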


Example 3 Probability Using the Fundamental Counting Principle

What is the probability that two or more students out of a class of 24 will have
the same birthday? Assume that no students were born on February 29.

Solution 1 Using Pencil and Paper
The simplest method is to find the probability of the complementary event that
no two people in the class have the same birthday.

Pick two students at random. The second student has a different birthday than
the first for 364 of the 365 possible birthdays. Thus, the probability that the
two students have different birthdays is $\frac{364}{365}$. Now add a third student. Since
there are 363 ways this person can have a different birthday from the other
two students, the probability that all three students have different birthdays
is $\frac{364}{365} \times \frac{363}{365}$. Continuing this process, the probability that no two of the
24 people have the same birthday is

$$
\begin{aligned}
P(A') &= \frac{n(A')}{n(S)} \\
      &= \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \cdots \times \frac{342}{365} \\
      &\approx 0.462
\end{aligned}
$$

$$
\begin{aligned}
P(A) &= 1 - P(A') \\
     &\approx 1 - 0.462 \\
     &= 0.538
\end{aligned}
$$
The probability that at least two people in the group have the same birthday
is approximately 0.538.




Solution 2 Using a Graphing Calculator

Use the iterative functions of a graphing calculator to evaluate the
formula above much more easily. The prod( function on the LIST
MATH menu will find the product of a series of numbers. The
seq( function on the LIST OPS menu generates a sequence for the
range you specify. Combining these two functions allows you to
calculate the probability in a single step.
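
For readers without a graphing calculator, the same product can be evaluated in Python. The sketch below is an alternative to the calculator method, not a transcription of it; the function name prob_shared_birthday is made up for illustration, and it assumes 365 equally likely birthdays, as in the example.

```python
from fractions import Fraction
from math import prod

def prob_shared_birthday(group_size, days=365):
    """P(at least two of group_size people share a birthday).

    Assumes every birthday is equally likely and ignores February 29.
    """
    # Probability that all birthdays differ: 364/365 x 363/365 x ...
    p_all_different = prod(
        Fraction(days - k, days) for k in range(1, group_size)
    )
    return 1 - p_all_different

p = prob_shared_birthday(24)
print(round(float(p), 3))  # 0.538
```

The generator expression plays the role of the sequence of factors, and prod multiplies them together.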


   Key Concepts

   • In probability experiments with many possible outcomes, you can apply the
     fundamental counting principle and techniques using permutations and
     combinations.

   • Permutations are useful when order is important in the outcomes;
     combinations are useful when order is not important.

   Communicate Your Understanding

      1. In the game of bridge, each player is dealt 13 cards out of the deck of 52.
         Explain how you would determine the probability of a player receiving
         a) all hearts          b) all hearts in ascending order

      2. a) When should you apply permutations in solving probability problems,
            and when should you apply combinations?
         b) Provide an example of a situation where you would apply permutations
            to solve a probability problem, other than those in this section.
         c) Provide an example of a situation where you would apply combinations
            to solve a probability problem, other than those in this section.


Practise

 A
 1. Four friends, two females and two males, are playing contract bridge.
    Partners are randomly assigned for each game. What is the probability that
    the two females will be partners for the first game?

 2. What is the probability that at least two out of a group of eight friends
    will have the same birthday?

 3. A fruit basket contains five red apples and three green apples. Without
    looking, you randomly select two apples. What is the probability that
    a) you will select two red apples?
    b) you will not select two green apples?

 4. Refer to Example 1. What is the probability that the two brothers will
    start beside each other in any pair of lanes?


Apply, Solve, Communicate

B
5. An athletic committee with three members is to be randomly selected from a
    group of six gymnasts, four weightlifters, and eight long-distance runners.
    Determine the probability that
    a) the committee is comprised entirely of runners
    b) the committee is represented by each of the three types of athletes

6. A messy drawer contains three black socks, five blue socks, and eight white
    socks, none of which are paired up. If the owner grabs two socks without
    looking, what is the probability that both will be white?

7. a) A family of nine has a tradition of drawing two names from a hat to see
       whom they will each buy presents for. If there are three sisters in the
       family, and the youngest sister is always allowed the first draw,
       determine the probability that the youngest sister will draw both of the
       other two sisters’ names. If she draws her own name, she replaces it and
       draws another.
    b) Suppose that the tradition is modified one year, so that the first person
       whose name is drawn is to receive a “main” present, and the second a less
       expensive, “fun” present. Determine the probability that the youngest
       sister will give a main present to the middle sister and a fun present to
       the eldest sister.

8. Application
    a) Laura, Dave, Monique, Marcus, and Sarah are going to a party. What is the
       probability that two of the girls will arrive first?
    b) What is the probability that the friends will arrive in order of
       ascending age?
    c) What assumptions must be made in parts a) and b)?

9. A hockey team has two goalies, six defenders, eight wingers, and four
    centres. If the team randomly selects four players to attend a charity
    function, what is the likelihood that
    a) they are all wingers?
    b) no goalies or centres are selected?

10. Application A lottery promises to award ten grand-prize trips to Hawaii and
    sells 5 400 000 tickets.
    a) Determine the probability of winning a grand prize if you buy
       i)   1 ticket
       ii)  10 tickets
       iii) 100 tickets
    b) Communication How many tickets do you need to buy in order to have a 5%
       chance of winning a grand prize? Do you think this strategy is sensible?
       Why or why not?
    c) How many tickets do you need to ensure a 50% chance of winning?

11. Suki is enrolled in one data-management class at her school and Leo is in
    another. A school quiz team will have four volunteers, two randomly selected
    from each of the two classes. Suki is one of five volunteers from her class,
    and Leo is one of four volunteers from his. Calculate the probability of the
    two being on the team and explain the steps in your calculation.




12. Chapter Problem
    a) Suppose 4 of the 22 tagged bucks are randomly chosen for a behaviour
       study. What is the probability that
       i)  all four bucks have the cross-hatched antlers?
       ii) at least one buck has cross-hatched antlers?
    b) If two of the seven cross-hatched males are randomly selected for a
       health study, what is the probability that the eldest of the seven will
       be selected first, followed by the second eldest?

ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

13. Suppose a bag contains the letters to spell probability.
    a) How many four-letter arrangements are possible using these letters?
    b) What is the probability that Barb chooses four letters from the bag in
       the order that spell her name?
    c) Pick another four-letter arrangement and calculate the probability that
       it is chosen.
    d) What four-letter arrangement would be most likely to be picked? Explain
       your reasoning.

C
14. Communication Refer to the fishing investigation at the beginning of this
    section.
    a) Determine the theoretical probability of
       i)  catching three trout
       ii) catching a bass, catfish, and trout in alphabetical order
    b) How do these results compare with the empirical probabilities from the
       investigation? How do you account for any differences?
    c) Could the random-number generator of a graphing calculator be used to
       simulate this investigation? If so, explain how. If not, explain why.
    d) Outline the steps you would use to model this problem with software such
       as Fathom™ or a spreadsheet.
    e) Is the assumption that the fish are randomly distributed likely to be
       completely correct? Explain. What other assumptions might affect the
       accuracy of the calculated probabilities?

15. A network of city streets forms square blocks as shown in the diagram.

    [Diagram: grid of square city blocks with the Library and the Pool marked]

    Jeanine leaves the library and walks toward the pool at the same time as
    Miguel leaves the pool and walks toward the library. Neither person follows
    a particular route, except that both are always moving toward their
    destination. What is the probability that they will meet if they both walk
    at the same rate?

16. Inquiry/Problem Solving A committee is formed by randomly selecting from
    eight nurses and two doctors. What is the minimum committee size that
    ensures at least a 90% probability that it will not be comprised of nurses
    only?

6.4         Dependent and Independent Events

  If you have two examinations next Tuesday, what is the probability that you will
  pass both of them? How can you predict the risk that a critical network server
  and its backup will both fail? If you flip an ordinary coin repeatedly and get
  heads 99 times in a row, is the next toss almost certain to come up tails?

  In such situations, you are dealing with compound events involving two or
  more separate events.

      INVESTIGATE & INQUIRE: Getting Out of Jail in MONOPOLY®

      While playing MONOPOLY® for the first
      time, Kenny finds himself in jail. To get out
      of jail, he needs to roll doubles on a pair of
      standard dice.

       1. Determine the probability that Kenny
          will roll doubles on his first try.

       2. Suppose that Kenny fails to roll doubles
          on his first two turns in jail. He reasons
          that on his next turn, his odds are now
          50/50 that he will get out of jail. Explain
          how Kenny has reasoned this.

       3. Do you agree or disagree with Kenny’s
          reasoning? Explain.

       4. What is the probability that Kenny will
          get out of jail on his third attempt?

       5. After how many turns is Kenny certain to
          roll doubles? Explain.

       6. Kenny’s opponent, Roberta, explains to
          Kenny that each roll of the dice is an
          independent event and that, since the
          relatively low probability of rolling doubles never changes from trial to trial,
          Kenny may never get out of jail and may as well just forfeit the game. Explain
          the flaws in Roberta’s analysis.




In some situations involving compound events, the occurrence of one event
has no effect on the occurrence of another. In such cases, the events are
independent.

Example 1 Simple Independent Events

a)    A coin is flipped and turns up heads. What is the probability that the
      second flip will turn up heads?
b)    A coin is flipped four times and turns up heads each time. What is the
      probability that the fifth trial will be heads?
c)    Find the probability of tossing five heads in a row.
d)    Comment on any difference between your answers to parts b) and c).


Solution
a) Because these events are independent, the outcome of the first toss has no
      effect on the outcome of the second toss. Therefore, the probability of
      tossing heads the second time is 0.5.

b) Although you might think “tails has to come up sometime,” there is still a
      50/50 chance on each independent toss. The coin has no memory of the
      past four trials! Therefore, the fifth toss still has just a 0.5 probability of
      coming up heads.

c) Construct a tree diagram to represent five tosses of the coin.


      [Tree diagram: five successive coin tosses, each toss branching into H and T.
      Only the branches in which the first flip turns up heads are drawn. There is
      an equal number of outcomes in which the first flip turns up tails.]




   The number of outcomes doubles with each trial. After the fifth toss,
   there are $2^5$, or 32, possible outcomes, only one of which is five heads in a
   row. So, the probability of five heads in a row, prior to any coin tosses,
   is $\frac{1}{32}$, or 0.031 25.
d) The probability in part c) is much less than in part b). In part b), you
   calculate only the probability for the fifth trial on its own. In part c), you
   are finding the probability that every one of five separate events actually
   happens.
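
The difference between parts b) and c) can be seen by listing all the outcomes that the tree diagram represents. The following Python sketch is an illustration only, assuming a fair coin so that every sequence is equally likely; the variable names are chosen here for illustration.

```python
from itertools import product

# All equally likely sequences of five tosses (the leaves of the tree).
outcomes = list(product("HT", repeat=5))

# Part c): sequences consisting of five heads.
five_heads = [o for o in outcomes if all(toss == "H" for toss in o)]
p_five_heads = len(five_heads) / len(outcomes)

# Part b): restrict to sequences that begin with four heads, then ask how
# often the fifth toss is also heads.
four_heads_first = [o for o in outcomes if o[:4] == ("H", "H", "H", "H")]
p_fifth_heads = sum(o[4] == "H" for o in four_heads_first) / len(four_heads_first)

print(len(outcomes), p_five_heads, p_fifth_heads)  # 32 0.03125 0.5
```

Only two of the 32 sequences begin with four heads, and exactly one of those ends in heads, which is why the fifth toss on its own still has a probability of 0.5.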


Example 2 Probability of Two Different Independent Events

A coin is flipped while a die is rolled. What is the probability of flipping
heads and rolling 5 in a single trial?

Solution

Here, two independent events occur in a single trial. Let A be the event of
flipping heads, and B be the event of rolling 5. The notation P(A and B)
represents the compound, or joint, probability that both events, A and B,
will occur simultaneously. For independent events, the probabilities can
simply be multiplied together.

$$
\begin{aligned}
P(A \text{ and } B) &= P(A) \times P(B) \\
                    &= \frac{1}{2} \times \frac{1}{6} \\
                    &= \frac{1}{12}
\end{aligned}
$$

The probability of simultaneously flipping heads while rolling 5 is $\frac{1}{12}$, or
approximately 8.3%.


In general, the compound probability of two independent events can be
calculated using the product rule for independent events:

 P(A and B) = P(A) × P(B)

From the example above, you can see that the product rule for independent
events agrees with common sense. The product rule can also be derived
mathematically from the fundamental counting principle (see Chapter 4).




Proof:
A and B are separate events and so they correspond to separate sample spaces,
$S_A$ and $S_B$.

Their probabilities are thus

$$P(A) = \frac{n(A)}{n(S_A)} \quad \text{and} \quad P(B) = \frac{n(B)}{n(S_B)}.$$

Call the sample space for the compound event S, as usual.

You know that

$$P(A \text{ and } B) = \frac{n(A \text{ and } B)}{n(S)} \qquad (1)$$

Because A and B are independent, you can apply the fundamental counting
principle to get an expression for n(A and B).

$$n(A \text{ and } B) = n(A) \times n(B) \qquad (2)$$

Similarly, you can also apply the fundamental counting principle to get an
expression for n(S).

$$n(S) = n(S_A) \times n(S_B) \qquad (3)$$

Substitute equations (2) and (3) into equation (1).

$$
\begin{aligned}
P(A \text{ and } B) &= \frac{n(A)\,n(B)}{n(S_A)\,n(S_B)} \\
                    &= \frac{n(A)}{n(S_A)} \times \frac{n(B)}{n(S_B)} \\
                    &= P(A) \times P(B)
\end{aligned}
$$
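
The counting argument in the proof can be made concrete with the coin and die of Example 2. The Python sketch below is an illustration under the assumption of a fair coin and a fair six-sided die; it builds the compound sample space explicitly and compares the counted probability with the product of the separate probabilities.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]            # sample space S_A
die = [1, 2, 3, 4, 5, 6]     # sample space S_B

# Compound sample space S: every (coin, die) pair, n(S) = 2 x 6 = 12.
sample_space = list(product(coin, die))

# Outcomes in which A (heads) and B (rolling 5) both occur.
favourable = [(c, d) for c, d in sample_space if c == "H" and d == 5]

p_joint = Fraction(len(favourable), len(sample_space))
p_a = Fraction(coin.count("H"), len(coin))
p_b = Fraction(die.count(5), len(die))

print(p_joint, p_a * p_b)  # 1/12 1/12
```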


Example 3 Applying the Product Rule for Independent Events

Soo-Ling travels the same route to work every day. She has determined that
there is a 0.7 probability that she will wait for at least one red light and that
there is a 0.4 probability that she will hear her favourite new song on her way
to work.
a) What is the probability that Soo-Ling will not have to wait at a red light
    and will hear her favourite song?
b)    What are the odds in favour of Soo-Ling having to wait at a red light and
      not hearing her favourite song?




Solution

a) Let A be the event of Soo-Ling having to wait at a red light, and B be the
   event of hearing her favourite song. Assume A and B to be independent events.
   In this case, you are interested in the combination A′ and B.
   P(A′ and B) = P(A′) × P(B)
               = (1 − P(A)) × P(B)
               = (1 − 0.7) × 0.4
               = 0.12
   There is a 12% chance that Soo-Ling will hear her favourite song and not
   have to wait at a red light on her way to work.

b) P(A and B′) = P(A) × P(B′)
                = P(A) × (1 − P(B))
                = 0.7 × (1 − 0.4)
                = 0.42
   The probability of Soo-Ling having to wait at a red light and not hearing her
   favourite song is 42%.
   The odds in favour of this happening are

$$
\begin{aligned}
\text{odds in favour} &= \frac{P(A \text{ and } B')}{1 - P(A \text{ and } B')} \\
                      &= \frac{42\%}{100\% - 42\%} \\
                      &= \frac{42}{58} \\
                      &= \frac{21}{29}
\end{aligned}
$$

   The odds in favour of Soo-Ling having to wait at a red light and not hearing
   her favourite song are 21:29.
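
Both parts of Example 3 can be checked with exact fractions. The Python sketch below is only an illustration; the helper name odds_in_favour is made up, and, as in the solution, the two events are assumed to be independent.

```python
from fractions import Fraction

def odds_in_favour(p):
    """Return the odds in favour as a (for, against) pair in lowest terms."""
    ratio = p / (1 - p)
    return ratio.numerator, ratio.denominator

p_red = Fraction(7, 10)    # P(A): waiting for at least one red light
p_song = Fraction(4, 10)   # P(B): hearing the favourite song

p_part_a = (1 - p_red) * p_song   # P(A' and B)
p_part_b = p_red * (1 - p_song)   # P(A and B')

print(p_part_a, p_part_b, odds_in_favour(p_part_b))  # 3/25 21/50 (21, 29)
```

Here 3/25 is 0.12 and 21/50 is 0.42, matching the hand calculation.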


In some cases, the probable outcome of an event, B, depends directly on the
outcome of another event, A. When this happens, the events are said to be
dependent. The conditional probability of B, P(B | A), is the probability that
B occurs, given that A has already occurred.

Example 4 Probability of Two Dependent Events

A professional hockey team has eight wingers. Three of these wingers are 30-goal
scorers, or “snipers.” Every fall the team plays an exhibition match with the club’s
farm team. In order to make the match more interesting for the fans, the coaches
agree to select two wingers at random from the pro team to play for the farm
team. What is the probability that two snipers will play for the farm team?

Solution

Let A = {first winger is a sniper} and B = {second winger is a sniper}. Three
of the eight wingers are snipers, so the probability of the first winger selected
being a sniper is

$$P(A) = \frac{3}{8}$$

If the first winger selected is a sniper, then there are seven remaining wingers
to choose from, two of whom are snipers. Therefore,

$$P(B \mid A) = \frac{2}{7}$$

Applying the fundamental counting principle, the probability of randomly
selecting two snipers for the farm team is the number of ways of selecting two
snipers divided by the number of ways of selecting any two wingers.

$$P(A \text{ and } B) = \frac{3 \times 2}{8 \times 7} = \frac{3}{28}$$

There is a $\frac{3}{28}$, or approximately 10.7%, probability that two professional snipers
will play for the farm team in the exhibition game.
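
The result of Example 4 can be checked by listing every ordered selection of two wingers. In the Python sketch below, the labels S1 to S3 (snipers) and W1 to W5 (other wingers) are hypothetical names introduced for illustration; the last lines compare the enumeration with the product P(A) × P(B | A) formed from the two probabilities found above.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical labels: three snipers and five other wingers.
wingers = ["S1", "S2", "S3", "W1", "W2", "W3", "W4", "W5"]

# Every ordered way to pick a first and a second winger: 8 x 7 = 56.
ordered_pairs = list(permutations(wingers, 2))

# Selections in which both wingers are snipers: 3 x 2 = 6.
both_snipers = [
    pair for pair in ordered_pairs
    if pair[0].startswith("S") and pair[1].startswith("S")
]

p_enumerated = Fraction(len(both_snipers), len(ordered_pairs))
p_product = Fraction(3, 8) * Fraction(2, 7)   # P(A) x P(B | A)

print(p_enumerated, p_product)  # 3/28 3/28
```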


Notice in Example 4 that, when two events A and B are dependent, you can still
multiply probabilities to find the probability that they both happen. However,
you must use the conditional probability for the second event. Thus, the
probability that both events will occur is given by the product rule for
dependent events:

P(A and B) = P(A) × P(B | A)

This reads as: “The probability that both A and B will occur equals the
probability of A times the probability of B given that A has occurred.”

Project Prep
When designing your game for the probability project, you may decide to include
situations involving independent or dependent events. If so, you will need to
apply the appropriate product rule in order to determine classical probabilities.

Example 5 Conditional Probability From Compound Probability

Serena’s computer sometimes crashes while she is trying to use her e-mail
program, OutTake. When OutTake “hangs” (stops responding to commands),
Serena is usually able to close OutTake without a system crash. In a computer
magazine, she reads that the probability of OutTake hanging in any 15-min
period is 2.5%, while the chance of OutTake and the operating system failing
together in any 15-min period is 1%. If OutTake is hanging, what is the
probability that the operating system will crash?


Solution

Let event A be OutTake hanging, and event B be an operating system failure.
Since event A can trigger event B, the two events are dependent. In fact,
you need to find the conditional probability P(B | A). The data from the
magazine tells you that P(A) = 2.5%, and P(A and B) = 1%. Therefore,
$$
\begin{aligned}
P(A \text{ and } B) &= P(A) \times P(B \mid A) \\
1\% &= 2.5\% \times P(B \mid A) \\
P(B \mid A) &= \frac{1\%}{2.5\%} \\
            &= 0.4
\end{aligned}
$$
There is a 40% chance that the operating system will crash when OutTake is
hanging.


Example 5 suggests a useful rearrangement of the product rule for dependent events.

$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$
This equation is sometimes used to define the conditional probability P(B | A ).
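
The rearranged rule translates directly into a small function. The Python sketch below applies it to the figures from Example 5; the function name conditional_probability is made up for illustration, and the guard against P(A) = 0 is an added assumption, since the ratio is undefined in that case.

```python
from fractions import Fraction

def conditional_probability(p_a_and_b, p_a):
    """Return P(B | A) = P(A and B) / P(A)."""
    if p_a == 0:
        raise ValueError("P(B | A) is undefined when P(A) = 0")
    return p_a_and_b / p_a

p_hang = Fraction(25, 1000)           # P(A) = 2.5%
p_hang_and_crash = Fraction(1, 100)   # P(A and B) = 1%

print(conditional_probability(p_hang_and_crash, p_hang))  # 2/5
```

The result 2/5 is the 40% chance found in Example 5.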


   Key Concepts

   • If A and B are independent events, then the probability of both occurring is
     given by P(A and B) = P(A) × P(B).

   • If event B is dependent on event A, then the conditional probability of B given
     A is P(B | A). In this case, the probability of both events occurring is given by
     P(A and B) = P(A) × P(B | A).

   Communicate Your Understanding

    1. Consider the probability of randomly drawing an ace from a standard deck
      of cards. Discuss whether or not successive trials of this experiment are
      independent or dependent events. Consider cases in which drawn cards are
       a) replaced after each trial
       b) not replaced after each trial

    2. Suppose that for two particular events A and B, it is true that P(B | A) = P(B).
       What does this imply about the two events? (Hint: Try substituting this
       equation into the product rule for dependent events.)




Practise

 A
 1. Classify each of the following as independent or dependent events.

          First Event                                  Second Event
      a)  Attending a rock concert on Tuesday night    Passing a final examination the following Wednesday morning
      b)  Eating chocolate                             Winning at checkers
      c)  Having blue eyes                             Having poor hearing
      d)  Attending an employee training session       Improving personal productivity
      e)  Graduating from university                   Running a marathon
      f)  Going to a mall                              Purchasing a new shirt

 2. Amitesh estimates that he has a 70% chance of making the basketball team
    and a 20% chance of having failed his last geometry quiz. He defines a
    “really bad day” as one in which he gets cut from the team and fails his
    quiz. Assuming that Amitesh will receive both pieces of news tomorrow, how
    likely is it that he will have a really bad day?

 3. In the popular dice game Yahtzee®, a Yahtzee occurs when five identical
    numbers turn up on a set of five standard dice. What is the probability of
    rolling a Yahtzee on one roll of the five dice?

Apply, Solve, Communicate

 B
 4. There are two tests for a particular antibody. Test A gives a correct
    result 95% of the time. Test B is accurate 89% of the time. If a patient is
    given both tests, find the probability that
    a) both tests give the correct result
    b) neither test gives the correct result
    c) at least one of the tests gives the correct result

 5. a) Rocco and Biff are two koala bears participating in a series of animal
       behaviour tests. They each have 10 min to solve a maze. Rocco has an 85%
       probability of succeeding if he can smell the eucalyptus treat at the
       other end. He can smell the treat 60% of the time. Biff has a 70% chance
       of smelling the treat, but when he does, he can solve the maze only 75%
       of the time. Neither bear will try to solve the maze unless he smells the
       eucalyptus. Determine which koala bear is more likely to enjoy a tasty
       treat on any given trial.
    b) Communication Explain how you arrived at your conclusion.

 6. Shy Tenzin’s friends assure him that if he asks Mikala out on a date, there
    is an 85% chance that she will say yes. If there is a 60% chance that Tenzin
    will summon the courage to ask Mikala out to the dance next week, what are
    the odds that they will be seen at the dance together?

 7. When Ume’s hockey team uses a “rocket launch” breakout, she has a 55%
    likelihood of receiving a cross-ice pass while at full speed. When she
    receives such a pass, the probability of getting her slapshot away is
    $\frac{1}{3}$. Ume’s slapshot scores 22% of the time. What is the probability of Ume
    scoring with her slapshot when her team tries a rocket launch?

 8. Inquiry/Problem Solving Show that if A and B are dependent events, then the
    conditional probability P(A | B) is given by

    $$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}.$$

334        MHR • Introduction to Probability
 9. A consultant’s study found Megatran’s call
      centre had a 5% chance of transferring a
      call about schedules to the lost articles
      department by mistake. The same study
      shows that, 1% of the time, customers
      calling for schedules have to wait on hold,
      only to discover that they have been
      mistakenly transferred to the lost articles
      department. What are the chances that a
      customer transferred to lost articles will be
      put on hold?

10. Pinder has examinations coming up in data
      management and biology. He estimates that
      his odds in favour of passing the data-
      management examination are 17:3 and his
      odds against passing the biology examination
      are 3:7. Assume these to be independent
      events.
      a) What is the probability that Pinder will
         pass both exams?
      b) What are the odds in favour of Pinder
         failing both exams?
      c) What factors could make these two
         events dependent?

11. Inquiry/Problem Solving How likely is it for
      a group of five friends to have the same birth
      month? State any assumptions you make for
      your calculation.

12. Chapter Problem Determine the probability
      that a captured deer has the bald patch
      condition.

13. Communication Five different CD-ROM
      games, Garble, Trapster, Zoom!, Bungie,
      and Blast ’Em, are offered as a promotion
      by SugarRush cereals. One game is
      randomly included with each box of cereal.
      a) Determine the probability of getting all
         5 games if 12 boxes are purchased.
      b) Explain the steps in your solution.
      c) Discuss any assumptions that you make
         in your analysis.

14. Application A critical circuit in a
      communication network relies on a set of
      eight identical relays. If any one of the relays
      fails, it will disrupt the entire network. The
      design engineer must ensure a 90%
      probability that the network will not fail
      over a five-year period. What is the
      maximum tolerable probability of failure for
      each relay?

 C
15. a) Show that if a coin is tossed n times, the
         probability of tossing n heads is given by
         $P(A) = \left(\frac{1}{2}\right)^n$.
      b) What is the probability of getting at least
         one tail in seven tosses?

16. What is the probability of not throwing 7 or
      doubles for six consecutive throws with a
      pair of dice?

17. Laurie, an avid golfer, gives herself a 70%
      chance of breaking par (scoring less than 72
      on a round of 18 holes) if the weather is
      calm, but only a 15% chance of breaking par
      on windy days. The weather forecast gives a
      40% probability of high winds tomorrow.
      What is the likelihood that Laurie will break
      par tomorrow, assuming that she plays one
      round of golf?

18. Application The Tigers are leading the
      Storm one game to none in a best-of-five
      playoff series. After a playoff win, the
      probability of the Tigers winning the next
      game is 60%, while after a loss, their
      probability of winning the next game drops
      by 5%. The first team to win three games
      takes the series. Assume there are no ties.
      What is the probability of the Storm coming
      back to win the series?
                                                                     6.4 Dependent and Independent Events • MHR   335
6.5 Mutually Exclusive Events

  The phone rings. Jacques is really hoping that it is one of his friends calling
  about either softball or band practice. Could the call be about both?

  In such situations, more than one event could occur during a single trial. You
  need to compare the events in terms of the outcomes that make them up. What
  is the chance that at least one of the events happens? Is the situation “either/or,”
  or can both events occur?

        INVESTIGATE & INQUIRE: Baseball Pitches

        Marie, at bat for the Coyotes, is facing Anton, who is
        pitching for the Power Trippers. Anton uses three pitches: a
        fastball, a curveball, and a slider. Marie feels she has a good
        chance of making a base hit, or better, if Anton throws either
        a fastball or a slider. The count is two strikes and three balls.
        In such full-count situations, Anton goes to his curveball one
        third of the time, his slider half as often, and his fastball the
        rest of the time.

         1. Determine the probability of Anton throwing his
            a) curveball                b) slider   c) fastball

         2. a) What is the probability that Marie will get the pitch
                she does not want?
            b) Explain how you can use this information to
                determine the probability that Marie will get a pitch
                she likes.

         3. a) Show another method of determining this probability.
            b) Explain your method.

         4. What do your answers to questions 2 and 3 suggest
            about the probabilities of events that cannot happen
            simultaneously?


  The possible events in this investigation are said to be mutually exclusive (or
  disjoint) since they cannot occur at the same time. The pitch could not be both a
  fastball and a slider, for example. In this particular problem, you were interested in
  the probability of either of two favourable events. You can use the notation P(A or B)
  to stand for the probability of either A or B occurring.


Example 1 Probability of Mutually Exclusive Events

Teri attends a fundraiser at which 15 T-shirts are being given away as door
prizes. Door prize winners are randomly given a shirt from a stock of 2 black
shirts, 4 blue shirts, and 9 white shirts. Teri really likes the black and blue shirts,
but is not too keen on the white ones. Assuming that Teri wins the first door
prize, what is the probability that she will get a shirt that she likes?

Solution

Let A be the event that Teri wins a black shirt, and B be the event that she
wins a blue shirt.
$P(A) = \frac{2}{15}$ and $P(B) = \frac{4}{15}$

Teri would be happy if either A or B occurred.
There are 2 + 4 = 6 non-white shirts, so

$$P(A \text{ or } B) = \frac{6}{15} = \frac{2}{5}$$

The probability of Teri winning a shirt that she likes is $\frac{2}{5}$ or 40%. Notice that
this probability is simply the sum of the probabilities of the two mutually
exclusive events.
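
The arithmetic in Example 1 can be checked with a short Python sketch. The shirt counts come from the example; the variable names are illustrative only, and exact fractions are used so that the result matches the hand calculation.

```python
from fractions import Fraction

# Shirt counts taken from Example 1.
shirts = {"black": 2, "blue": 4, "white": 9}
total = sum(shirts.values())  # 15 shirts in all

p_black = Fraction(shirts["black"], total)
p_blue = Fraction(shirts["blue"], total)

# A shirt cannot be both black and blue, so the two events are
# mutually exclusive and their probabilities simply add.
p_liked = p_black + p_blue

print(p_liked)         # 2/5
print(float(p_liked))  # 0.4
```

Adding the two fractions gives the same value as counting the 6 non-white shirts out of 15 directly.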


When events A and B are mutually exclusive, the probability that A or B
will occur is given by the addition rule for mutually exclusive events:

P(A or B) = P(A) + P(B)

A Venn diagram shows mutually exclusive events as non-overlapping,
or disjoint. Thus, you can apply the additive counting principle (see
Chapter 4) to prove this rule.

[Venn diagram: sample space S containing two non-overlapping regions, A and B]

Proof:
If A and B are mutually exclusive events, then

$$
\begin{aligned}
P(A \text{ or } B) &= \frac{n(A \text{ or } B)}{n(S)} \\
&= \frac{n(A) + n(B)}{n(S)} && \text{A and B are disjoint sets, and thus share no elements.} \\
&= \frac{n(A)}{n(S)} + \frac{n(B)}{n(S)} \\
&= P(A) + P(B)
\end{aligned}
$$

In some situations, events are non-mutually exclusive, which means
that they can occur simultaneously. For example, consider a board game
in which you need to roll either an 8 or doubles, using two dice.

Sum of two dice (rows: first die; columns: second die)

        1   2   3   4   5   6
   1    2   3   4   5   6   7
   2    3   4   5   6   7   8
   3    4   5   6   7   8   9
   4    5   6   7   8   9  10
   5    6   7   8   9  10  11
   6    7   8   9  10  11  12

Notice that in one outcome, rolling two fours, both events have
occurred simultaneously. Hence, these events are not mutually
exclusive. Counting the outcomes in the diagram shows that the
probability of rolling either an 8 or doubles is $\frac{10}{36}$ or $\frac{5}{18}$. You
need to take care not to count the (4, 4) outcome twice. You are
applying the principle of inclusion and exclusion, which was explained
in greater detail in Chapter 5.
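
One way to confirm this count is to list all 36 outcomes and let a program do the counting. The following Python sketch is only an illustration (the set names are not from the text); it builds the two events as sets so that the shared (4, 4) outcome is counted once.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

eights = {(a, b) for a, b in outcomes if a + b == 8}
doubles = {(a, b) for a, b in outcomes if a == b}

either = eights | doubles  # set union counts (4, 4) only once
both = eights & doubles    # the outcomes shared by both events

p_either = Fraction(len(either), len(outcomes))

# Inclusion and exclusion: add the two counts, then subtract the overlap.
p_rule = Fraction(len(eights) + len(doubles) - len(both), len(outcomes))

print(sorted(both))  # [(4, 4)]
print(p_either)      # 5/18
print(p_rule)        # 5/18
```

Simply adding the 5 ways to roll an 8 and the 6 doubles would give 11 outcomes, one too many, because (4, 4) belongs to both events.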


Example 2 Probability of Non-Mutually Exclusive Events

A card is randomly selected from a standard deck of cards. What is the
probability that either a heart or a face card (jack, queen, or king) is selected?

Solution

Let event A be that a heart is selected, and event B be that a face card is
selected.
$P(A) = \frac{13}{52}$ and $P(B) = \frac{12}{52}$

If you add these probabilities, you get

$$P(A) + P(B) = \frac{13}{52} + \frac{12}{52} = \frac{25}{52}$$
However, since the jack, queen, and king of hearts are in both A and B, the
sum P(A) + P(B) actually includes these outcomes twice.
A ♣ 2 ♣ 3 ♣ 4 ♣ 5 ♣ 6 ♣ 7 ♣ 8 ♣ 9 ♣ 10 ♣ J ♣ Q ♣ K ♣

A ♦ 2 ♦ 3 ♦ 4 ♦ 5 ♦ 6 ♦ 7 ♦ 8 ♦ 9 ♦ 10 ♦ J ♦ Q ♦ K ♦

A ♥ 2 ♥ 3 ♥ 4 ♥ 5 ♥ 6 ♥ 7 ♥ 8 ♥ 9 ♥ 10 ♥ J ♥ Q ♥ K ♥

A ♠ 2 ♠ 3 ♠ 4 ♠ 5 ♠ 6 ♠ 7 ♠ 8 ♠ 9 ♠ 10 ♠ J ♠ Q ♠ K ♠

Based on the diagram, the actual theoretical probability of drawing either
a heart or a face card is $\frac{22}{52}$, or $\frac{11}{26}$. You can find the correct value by subtracting
the probability of selecting the three elements that were counted twice.




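$$P(A \text{ or } B) = \frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}$$

[Venn diagram: sample space S with overlapping regions "Hearts" (P = 13/52) and "Face card" (P = 12/52); the overlap, "Heart and face card", has P = 3/52]

The probability that either a heart or a face card is selected is $\frac{11}{26}$.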
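
The result of Example 2 can also be verified by building the deck and counting. This Python sketch is a minimal illustration; the rank and suit labels and the helper `prob` are assumptions made for the example, not notation from the text.

```python
from fractions import Fraction

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for suit in suits for rank in ranks]  # 52 cards

hearts = {card for card in deck if card[1] == "hearts"}
faces = {card for card in deck if card[0] in {"J", "Q", "K"}}


def prob(event):
    """Probability of an event, assuming every card is equally likely."""
    return Fraction(len(event), len(deck))


# Adding P(A) and P(B) counts the J, Q, and K of hearts twice.
p_sum = prob(hearts) + prob(faces)

# Subtracting P(A and B) corrects the double count.
p_either = p_sum - prob(hearts & faces)

print(p_sum)                  # 25/52
print(p_either)               # 11/26
print(prob(hearts | faces))   # 11/26, counted directly
```

The direct count of the union and the corrected sum agree, which is what subtracting the overlap is meant to achieve.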



When events A and B are non-mutually exclusive, the probability that
A or B will occur is given by the addition rule for non-mutually
exclusive events:

P(A or B) = P(A) + P(B) − P(A and B)

[Venn diagram: sample space S containing overlapping regions A and B; the overlap is labelled "A and B"]

Project Prep

When analysing the possible outcomes for your game in the
probability project, you may need to consider mutually exclusive or
non-mutually exclusive events. If so, you will need to apply the
appropriate addition rule to determine theoretical probabilities.

Example 3 Applying the Addition Rule for Non-Mutually Exclusive Events

An electronics manufacturer is testing a new product to see
whether it requires a surge protector. The tests show that a
voltage spike has a 0.2% probability of damaging the
product’s power supply, a 0.6% probability of damaging
downstream components, and a 0.1% probability of
damaging both the power supply and other components.
Determine the probability that a voltage spike will damage
the product.

Solution

Let A be damage to the power supply and C be
damage to other components.

[Venn diagram: sample space S with overlapping regions A and C, labelled 0.2 (A), 0.1 (overlap), and 0.6 (C)]

The overlapping region represents the probability that
a voltage surge damages both the power supply and another
component. The probability that either A or C occurs is
given by
P(A or C) = P(A) + P(C) − P(A and C)
           = 0.2% + 0.6% − 0.1%
           = 0.7%
There is a 0.7% probability that a voltage spike will damage the product.
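
The same calculation can be written as a small reusable function. The Python sketch below is an illustration only: the function name `p_either` and the variable names are hypothetical, and the percentages from Example 3 are entered as decimal probabilities (0.2% = 0.002).

```python
from decimal import Decimal


def p_either(p_a, p_c, p_both):
    """Addition rule for non-mutually exclusive events."""
    return p_a + p_c - p_both


# Values from Example 3, written as probabilities rather than percents.
p_supply = Decimal("0.002")      # damage to the power supply
p_downstream = Decimal("0.006")  # damage to downstream components
p_both = Decimal("0.001")        # damage to both

p_damage = p_either(p_supply, p_downstream, p_both)
print(p_damage)  # 0.007
```

`Decimal` is used here instead of ordinary floating-point numbers so that small values such as 0.002 are stored exactly and the result can be compared directly with the hand calculation of 0.7%.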


Key Concepts

   • If A and B are mutually exclusive events, then the probability of either A or B
     occurring is given by P(A or B) = P(A) + P(B).

   • If A and B are non-mutually exclusive events, then the probability of either
     A or B occurring is given by P(A or B) = P(A) + P(B) − P(A and B).

   Communicate Your Understanding

      1. Are an event and its complement mutually exclusive? Explain.

      2. Explain how to determine the probability of randomly throwing either a
           composite number or an odd number using a pair of dice.

      3. a) Explain the difference between independent events and mutually
               exclusive events.
           b) Support your explanation with an example of each.
            c) Why do you add probabilities in one case and multiply them in the other?



Practise
 A
 1. Classify each pair of events as mutually
      exclusive or non-mutually exclusive.

            Event A                                            Event B
      a)    Randomly drawing a grey sock from a drawer         Randomly drawing a wool sock from a drawer
      b)    Randomly selecting a student with brown eyes       Randomly selecting a student on the honour roll
      c)    Having an even number of students in your class    Having an odd number of students in your class
      d)    Rolling a six with a die                           Rolling a prime number with a die
      e)    Your birthday falling on a Saturday next year      Your birthday falling on a weekend next year
      f)    Getting an A on the next test                      Passing the next test
      g)    Calm weather at noon tomorrow                      Stormy weather at noon tomorrow
      h)    Sunny weather next week                            Rainy weather next week

 2. Nine members of a baseball team are
      randomly assigned field positions. There are
      three outfielders, four infielders, a pitcher,
      and a catcher. Troy is happy to play any
      position except catcher or outfielder.
      Determine the probability that Troy will
      be assigned to play
      a) catcher
      b) outfielder
      c) a position he does not like

 3. A car dealership analysed its customer
      database and discovered that in the last
      model year, 28% of its customers chose a
      2-door model, 46% chose a 4-door model,
      19% chose a minivan, and 7% chose a
      4-by-4 vehicle. If a customer was selected
      randomly from this database, what is the
      probability that the customer
      a) bought a 4-by-4 vehicle?
      b) did not buy a minivan?
      c) bought a 2-door or a 4-door model?
      d) bought a minivan or a 4-by-4 vehicle?

Apply, Solve, Communicate
 B
 4. As a promotion, a resort has a draw for free
      family day-passes. The resort considers July,
      August, March, and December to be
      “vacation months.”
      a) If the free passes are randomly dated,
         what is the probability that a day-pass
         will be dated within
         i) a vacation month?
         ii) June, July, or August
      b) Draw a Venn diagram of the events in
         part a).

 5. A certain provincial park has 220 campsites.
      A total of 80 sites have electricity. Of the 52
      sites on the lakeshore, 22 of them have
      electricity. If a site is selected at random, what
      is the probability that
      a) it will be on the lakeshore?
      b) it will have electricity?
      c) it will either have electricity or be on the
         lakeshore?
      d) it will be on the lakeshore and not have
         electricity?

 6. A market-research firm monitored 1000
      television viewers, consisting of 800 adults
      and 200 children, to evaluate a new comedy
      series that aired for the first time last week.
      Research indicated that 250 adults and
      148 children viewed some or all of the
      program. If one of the 1000 viewers was
      selected, what is the probability that
      a) the viewer was an adult who did not
         watch the new program?
      b) the viewer was a child who watched
         the new program?
      c) the viewer was an adult or someone
         who watched the new program?

 7. Application In an animal-behaviour study,
      hamsters were tested with a number
      of intelligence tasks, as shown in the
      table below.

      Number of Tests    Number of Hamsters
      0                  10
      1                   6
      2                   4
      3                   3
      4 or more           5

      If a hamster is randomly chosen from this
      study group, what is the likelihood that the
      hamster has participated in
      a) exactly three tests?
      b) fewer than two tests?
      c) either one or two tests?
      d) no tests or more than three tests?

 8. Communication
      a) Prove that, if A and B are non-mutually
         exclusive events, the probability of either
         A or B occurring is given by
         P(A or B) = P(A) + P(B) − P(A and B).
      b) What can you conclude if P(A and B) = 0?
         Give reasons for your conclusion.

 9. Inquiry/Problem Solving Design a game in
      which the probability of drawing a winning
      card from a standard deck is between 55%
      and 60%.

10. Chapter Problem Determine the probability
      that a captured deer has either cross-hatched
      antlers or bald patches. Are these events
      mutually exclusive? Why or why not?

11. The eight members of the debating club
      pose for a yearbook photograph. If they line
      up randomly, what is the probability that
      a) either Hania will be first in the row or
         Aaron will be last?
      b) Hania will be first and Aaron will not be
         last?

ACHIEVEMENT CHECK
  Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application

12. Consider a Stanley Cup playoff series in
      which the Toronto Maple Leafs hockey
      team faces the Ottawa Senators. Toronto
      hosts the first, second, and if needed, fifth
      and seventh games in this best-of-seven
      contest. The Leafs have a 65% chance of
      beating the Senators at home in the first
      game. After that, they have a 60% chance
      of a win at home if they won the previous
      game, but a 70% chance if they are
      bouncing back from a loss. Similarly, the
      Leafs’ chances of victory in Ottawa are
      40% after a win and 45% after a loss.
      a) Construct a tree diagram to illustrate
         all the possible outcomes of the first
         three games.
      b) Consider the following events:
         A = {Leafs lose the first game but go
         on to win the series in the fifth game}
         B = {Leafs win the series in the fifth
         game}
         C = {Leafs lose the series in the fifth
         game}
         Identify all the outcomes that make up
         each event, using strings of letters, such
         as LLSLL. Are any pairs from these
         three events mutually exclusive?
      c) What is the probability of event A in
         part b)?
      d) What is the chance of the Leafs winning
         in exactly five games?
      e) Explain how you found your answers to
         parts c) and d).

 C
13. A grade 12 student is selected at random to
      sit on a university liaison committee. Of the
      120 students enrolled in the grade 12
      university-preparation mathematics courses,
      • 28 are enrolled in data management only
      • 40 are enrolled in calculus only
      • 15 are enrolled in geometry only
      • 16 are enrolled in both data management
        and calculus
      • 12 are enrolled in both calculus and geometry
      • 6 are enrolled in both geometry and data
        management
      • 3 are enrolled in all three of data
        management, calculus, and geometry
      a) Draw a Venn diagram to illustrate this
         situation.
      b) Determine the probability that the
         student selected will be enrolled in either
         data management or calculus.
      c) Determine the probability that the
         student selected will be enrolled in only
         one of the three courses.

14. Application For a particular species of cat,
      the odds against a kitten being born with
      either blue eyes or white spots are 3:1. If the
      probability of a kitten exhibiting only one of
      these traits is equal and the probability of
      exhibiting both traits is 10%, what are the
      odds in favour of a kitten having blue eyes?

15. Communication
      a) A standard deck of cards is shuffled and
         three cards are selected. What is the
         probability that the third card is either
         a red face card or a king if the king of
         diamonds and the king of spades are
         selected as the first two cards?
      b) Does this probability change if the first
         two cards selected are the queen of
         diamonds and the king of spades? Explain.



16. Inquiry/Problem Solving The table below lists
      the degrees granted by Canadian universities
      from 1994 to 1998 in various fields of study.
      a) If a Canadian university graduate from
         1998 is chosen at random, what is the
         probability that the student is
         i) a male?
         ii) a graduate in mathematics and
             physical sciences?
         iii) a male graduating in mathematics and
             physical sciences?
         iv) not a male graduating in mathematics
             and physical sciences?
         v) a male or a graduate in mathematics
             and physical sciences?
      b) If a male graduate from 1996 is selected
         at random, what is the probability that
         he is graduating in mathematics and
         physical sciences?
      c) If a mathematics and physical sciences
         graduate is selected at random from the
         period 1994 to 1996, what is the
         probability that the graduate is a male?
      d) Do you think that being a male and
         graduating in mathematics and physical
         sciences are independent events? Give
         reasons for your hypothesis.

                                                    1994      1995        1996        1997         1998
            Canada                                178 074   178 066     178 116     173 937      172 076
             Male                                  76 470    76 022      75 106      73 041       71 949
             Female                               101 604   102 044     103 010     100 896      100 127
            Social sciences                        69 583    68 685      67 862      66 665       67 019
             Male                                  30 700    29 741      29 029      28 421       27 993
             Female                                38 883    38 944      38 833      38 244       39 026
            Education                              30 369    30 643      29 792      27 807       25 956
             Male                                    9093      9400        8693        8036         7565
             Female                                21 276    21 243      21 099      19 771       18 391
            Humanities                             23 071    22 511      22 357      21 373       20 816
             Male                                    8427      8428        8277        8034         7589
             Female                                14 644    14 083      14 080      13 339       13 227
            Health professions and occupations     12 183    12 473      12 895      13 073       12 658
             Male                                    3475      3461        3517        3460         3514
             Female                                  8708      9012        9378        9613         9144
            Engineering and applied sciences       12 597    12 863      13 068      12 768       12 830
             Male                                  10 285    10 284      10 446      10 125       10 121
             Female                                  2312      2579        2622        2643         2709
            Agriculture and biological sciences    10 087    10 501      11 400      11 775       12 209
             Male                                    4309      4399        4756        4780         4779
             Female                                  5778      6102        6644        6995         7430
            Mathematics and physical sciences        9551      9879        9786        9738         9992
             Male                                    6697      6941        6726        6749         6876
             Female                                  2854      2938        3060        2989         3116
            Fine and applied arts                    5308      5240        5201        5206         5256
             Male                                    1773      1740        1780        1706         1735
             Female                                  3535      3500        3421        3500         3521
            Arts and sciences                        5325      5271        5755        5532         5340
             Male                                    1711      1628        1882        1730         1777
             Female                                  3614      3643        3873        3802         3563


6.6            Applying Matrices to Probability Problems

  In some situations, the probability of an
  outcome depends on the outcome of the
  previous trial. Often this pattern appears in
  stock market trends, weather patterns,
  athletic performance, and consumer habits.
  Dependent probabilities can be calculated
  using Markov chains, a powerful probability
  model pioneered about a century ago by the
  Russian mathematician Andrei Markov.

        INVESTIGATE & INQUIRE: Running Late

        Although Marla tries hard to be punctual, the demands of her home life and the
        challenges of commuting sometimes cause her to be late for work. When she is
        late, she tries especially hard to be punctual the next day. Suppose that the
        following pattern emerges: If Marla is punctual on any given day, then there is
        a 70% chance that she will be punctual the next day and a 30% chance that she
        will be late. On days she is late, however, there is a 90% chance that she will be
        punctual the next day and just a 10% chance that she will be late. Suppose
        Marla is punctual on the first day of the work week.

         1. Create a tree diagram of the possible outcomes for the second and third
           days. Show the probability for each branch.

         2. a) Describe two branches in which Marla is punctual on day 3.
            b) Use the product rule for dependent events on page 332 to calculate the
                compound probability of Marla being punctual on day 2 and on day 3.
            c) Find the probability of Marla being late on day 2 and punctual on day 3.
            d) Use the results from parts b) and c) to determine the probability that
                Marla will be punctual on day 3.

         3. Repeat question 2 for the outcome of Marla being late on day 3.

         4. a) Create a 1 × 2 matrix A in which the first element is the probability that
                Marla is punctual and the second element is the probability that she is
                late on day 1. Recall that Marla is punctual on day 1.
            b) Create a 2 × 2 matrix B in which the elements in each row represent
                conditional probabilities that Marla will be punctual and late. Let the first
                row be the probabilities after a day in which Marla was punctual, and the
                second row be the probabilities after a day in which she was late.


            c) Evaluate $A \times B$ and $A \times B^2$.
            d) Compare the results of part c) with your answers to questions 2 and 3.
                Explain what you notice.
            e) What does the first row of the matrix $B^2$ represent?



The matrix model you have just developed is an example of a Markov chain,
a probability model in which the outcome of any trial depends directly on the
outcome of the previous trial. Using matrix operations can simplify
probability calculations, especially in determining long-term trends.

The 1 × 2 matrix A in the investigation is an initial probability vector, $S^{(0)}$,
and represents the probabilities of the initial state of a Markov chain. The
2 × 2 matrix B is a transition matrix, P, and represents the probabilities of
moving from any initial state to a new state in any trial.

These matrices have been arranged such that the product $S^{(0)} \times P$ generates
the row matrix that gives the probabilities of each state after one trial. This
matrix is called the first-step probability vector, $S^{(1)}$. In general, the
nth-step probability vector, $S^{(n)}$, can be obtained by repeatedly multiplying the
probability vector by P. Sometimes these vectors are also called first-state
and nth-state vectors, respectively.
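
The repeated multiplication described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `step` and the variable names are hypothetical, and the numbers are the ones given for Marla in the investigation, with the states ordered (punctual, late).

```python
def step(state, transition):
    """Multiply a row vector by a square transition matrix."""
    n = len(transition)
    return [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]


# Marla's chain from the investigation; states are ordered (punctual, late).
A = [1.0, 0.0]        # she is punctual on day 1
B = [[0.7, 0.3],      # probabilities after a punctual day
     [0.9, 0.1]]      # probabilities after a late day

day2 = step(A, B)     # first-step probability vector
day3 = step(day2, B)  # second-step probability vector

print("day 2:", [round(x, 2) for x in day2])
print("day 3:", [round(x, 2) for x in day3])
```

Each call to `step` performs one trial of the chain, so calling it n times starting from the initial vector gives the nth-step probability vector.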

Notice that each entry in a probability vector or a transition matrix is a
probability and must therefore be between 0 and 1. The possible states in a
Markov chain are always mutually exclusive events, one of which must occur
at each stage. Therefore, the entries in a probability vector must sum to 1, as
must the entries in each row of the transition matrix.
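
These conditions are easy to check mechanically. The following Python sketch is one possible way to do so; the function names and the tolerance `tol` (which allows for rounding in decimal data) are assumptions made for illustration, not part of the text.

```python
import math


def is_probability_vector(v, tol=1e-9):
    """True if every entry lies in [0, 1] and the entries sum to 1."""
    return all(0 <= x <= 1 for x in v) and math.isclose(sum(v), 1.0, abs_tol=tol)


def is_transition_matrix(m, tol=1e-9):
    """True if m is square and every row is a probability vector."""
    n = len(m)
    return n > 0 and all(
        len(row) == n and is_probability_vector(row, tol) for row in m
    )


# Data from Example 1 (Video Vic's and MovieMaster).
print(is_probability_vector([0.5, 0.5]))
print(is_transition_matrix([[0.6, 0.4], [0.7, 0.3]]))
```

The square-matrix test reflects the fact that each state needs both a row (where the chain is now) and a column (where it goes next).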

Example 1 Probability Vectors

Two video stores, Video Vic’s and MovieMaster, have just opened in a new
residential area. Initially, they each have half of the market for rented movies.
A customer who rents from Video Vic’s has a 60% probability of renting from
Video Vic’s the next time and a 40% chance of renting from MovieMaster.
On the other hand, a customer initially renting from MovieMaster has only
a 30% likelihood of renting from MovieMaster the next time and a 70%
probability of renting from Video Vic’s.
a)   What is the initial probability vector?
b)   What is the transition matrix?
c)   What is the probability of a customer renting a movie from each store
     the second time?
d)   What is the probability of a customer renting a movie from each store
     the third time?
e)   What assumption are you making in part d)? How realistic is it?
Solution

a) Initially, each store has 50% of the market, so the initial probability vector is

      \[ S^{(0)} = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix} \]

      where the entries are ordered VV (Video Vic’s), MM (MovieMaster).

b) The first row of the transition matrix represents the probabilities for the
      second rental by customers whose initial choice was Video Vic’s. There is a
      60% chance that the customer returns, so the first entry is 0.6. It is 40%
      likely that the customer will rent from MovieMaster, so the second entry
      is 0.4.
      Similarly, the second row of the transition matrix represents the probabilities
      for the second rental by customers whose first choice was MovieMaster.
      There is a 30% chance that a customer will return on the next visit, and a
      70% chance that the customer will try Video Vic’s.
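      \[ P = \begin{bmatrix} 0.6 & 0.4 \\ 0.7 & 0.3 \end{bmatrix} \]

      Both the rows (current store) and the columns (next store) are ordered VV, MM.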
      Regardless of which store the customer chooses the first time, you are
      assuming that there are only two choices for the next visit. Hence, the
      sum of the probabilities in each row equals one.

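c) To find the probabilities of a customer renting from either store on the
      second visit, calculate the first-step probability vector, $S^{(1)}$:

      \[
      \begin{aligned}
      S^{(1)} &= S^{(0)}P \\
              &= \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}
                 \begin{bmatrix} 0.6 & 0.4 \\ 0.7 & 0.3 \end{bmatrix} \\
              &= \begin{bmatrix} 0.65 & 0.35 \end{bmatrix}
      \end{aligned}
      \]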

      This new vector shows that there is a 65% probability that a customer will
      rent a movie from Video Vic’s on the second visit to a video store and a 35%
      chance that the customer will rent from MovieMaster.

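d) To determine the probabilities of which store a customer will pick on the
      third visit, calculate the second-step probability vector, $S^{(2)}$:

      \[
      \begin{aligned}
      S^{(2)} &= S^{(1)}P \\
              &= \begin{bmatrix} 0.65 & 0.35 \end{bmatrix}
                 \begin{bmatrix} 0.6 & 0.4 \\ 0.7 & 0.3 \end{bmatrix} \\
              &= \begin{bmatrix} 0.635 & 0.365 \end{bmatrix}
      \end{aligned}
      \]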

      So, on a third visit, a customer is 63.5% likely to rent from Video Vic’s and
      36.5% likely to rent from MovieMaster.




e) To calculate the second-step probabilities, you assume that the conditional
     transition probabilities do not change. This assumption might not be
     realistic since customers who are 70% likely to switch away from
     MovieMaster may not be as much as 40% likely to switch back, unless they
     forget why they switched in the first place. In other words, Markov chains
     have no long-term memory. They recall only the latest state in predicting
     the next one.


Note that the result in Example 1d) could be calculated in another way.

\[
\begin{aligned}
S^{(2)} &= S^{(1)}P \\
        &= (S^{(0)}P)P \\
        &= S^{(0)}(PP) \quad \text{since matrix multiplication is associative} \\
        &= S^{(0)}P^2
\end{aligned}
\]

Similarly, $S^{(3)} = S^{(0)}P^3$, and so on. In general, the nth-step probability vector, $S^{(n)}$,
is given by

\[ S^{(n)} = S^{(0)}P^n \]

This result enables you to determine higher-state probability vectors easily
using a graphing calculator or software.
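
As one example of the software route, the formula $S^{(n)} = S^{(0)}P^n$ can be sketched in plain Python. The helper names `mat_mul`, `mat_pow`, and `nth_step` are hypothetical and chosen only for this illustration; the data are from Example 1.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_pow(p, n):
    """Raise a square matrix to a non-negative integer power."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


def nth_step(s0, p, n):
    """Return the nth-step probability vector S(0) times P to the power n."""
    return mat_mul([s0], mat_pow(p, n))[0]


# Example 1: Video Vic's (first entry) and MovieMaster (second entry).
P = [[0.6, 0.4], [0.7, 0.3]]
S0 = [0.5, 0.5]
print([round(x, 3) for x in nth_step(S0, P, 2)])  # [0.635, 0.365], as in Example 1 d)
```

Computing the power of P once and then multiplying by the initial vector is convenient when several different initial vectors need to be compared.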


Example 2 Long-Term Market Share

A marketing-research firm has tracked the sales of three brands of hockey sticks.
Each year, on average,
• Player-One keeps 70% of its customers, but loses 20% to Slapshot and 10%
  to Extreme Styx
• Slapshot keeps 65% of its customers, but loses 10% to Extreme Styx and 25%
  to Player-One
• Extreme Styx keeps 55% of its customers, but loses 30% to Player-One and
  15% to Slapshot
a)   What is the transition matrix?
b)   Assuming each brand begins with an equal market share, determine the
     market share of each brand after one, two, and three years.
c)   Determine the long-range market share of each brand.
d)   What assumption must you make to answer part c)?




Solution 1 Using Pencil and Paper

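a) The transition matrix is

      \[ P = \begin{bmatrix} 0.7 & 0.2 & 0.1 \\ 0.25 & 0.65 & 0.1 \\ 0.3 & 0.15 & 0.55 \end{bmatrix} \]

      where both the rows and the columns are ordered P (Player-One),
      S (Slapshot), E (Extreme Styx).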
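b) Assuming each brand begins with an equal market share, the initial
      probability vector is

      \[ S^{(0)} = \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix} \]

      To determine the market shares of each brand after one year, compute the
      first-step probability vector.

      \[
      \begin{aligned}
      S^{(1)} &= S^{(0)}P \\
              &= \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix}
                 \begin{bmatrix} 0.7 & 0.2 & 0.1 \\ 0.25 & 0.65 & 0.1 \\ 0.3 & 0.15 & 0.55 \end{bmatrix} \\
              &= \begin{bmatrix} 0.41\overline{6} & 0.\overline{3} & 0.25 \end{bmatrix}
      \end{aligned}
      \]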

      So, after one year Player-One will have a market share of approximately
      42%, Slapshot will have 33%, and Extreme Styx will have 25%.

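      Similarly, you can predict the market shares after two years using

      \[
      \begin{aligned}
      S^{(2)} &= S^{(1)}P \\
              &= \begin{bmatrix} 0.41\overline{6} & 0.\overline{3} & 0.25 \end{bmatrix}
                 \begin{bmatrix} 0.7 & 0.2 & 0.1 \\ 0.25 & 0.65 & 0.1 \\ 0.3 & 0.15 & 0.55 \end{bmatrix} \\
              &= \begin{bmatrix} 0.45 & 0.3375 & 0.2125 \end{bmatrix}
      \end{aligned}
      \]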

      After two years, Player-One will have approximately 45% of the market,
      Slapshot will have 34%, and Extreme Styx will have 21%.

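      The probabilities after three years are given by

      \[
      \begin{aligned}
      S^{(3)} &= S^{(2)}P \\
              &= \begin{bmatrix} 0.45 & 0.3375 & 0.2125 \end{bmatrix}
                 \begin{bmatrix} 0.7 & 0.2 & 0.1 \\ 0.25 & 0.65 & 0.1 \\ 0.3 & 0.15 & 0.55 \end{bmatrix} \\
              &\approx \begin{bmatrix} 0.463 & 0.341 & 0.196 \end{bmatrix}
      \end{aligned}
      \]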

      After three years, Player-One will have approximately 46% of the market,
      Slapshot will have 34%, and Extreme Styx will have 20%.



c) The results from part b) suggest that the relative market shares may be
   converging to a steady state over a long period of time. You can test this
   hypothesis by calculating higher-state vectors and checking for stability.

   For example,

   \[
   \begin{aligned}
   S^{(10)} &= S^{(9)}P \approx \begin{bmatrix} 0.471 & 0.347 & 0.182 \end{bmatrix} \\
   S^{(11)} &= S^{(10)}P \approx \begin{bmatrix} 0.471 & 0.347 & 0.182 \end{bmatrix}
   \end{aligned}
   \]

   To three decimal places, the values of $S^{(10)}$ and $S^{(11)}$ are equal. It is
   easy to verify that all higher-step vectors $S^{(n)}$ agree with them as well. The
   Markov chain has reached a steady state. A steady-state vector is a probability
   vector that remains unchanged when multiplied by the transition matrix. A steady
   state has been reached if

   \[ S^{(n)} = S^{(n)}P = S^{(n+1)} \]

   In this case, the steady-state vector [0.471 0.347 0.182] indicates that, over a
   long period of time, Player-One will have approximately 47% of the market
   for hockey sticks, while Slapshot and Extreme Styx will have 35% and 18%,
   respectively, based on current trends.

d) The assumption you make in part c) is that the transition matrix does not
   change, that is, the market trends stay the same over the long term.


Solution 2 Using a Graphing Calculator

a) Use the MATRX EDIT menu to enter and store a matrix for the transition matrix B.

b) Similarly, enter the initial probability vector as matrix A. Then, use the MATRX
   EDIT menu to enter the calculation A × B on the home screen. The resulting
   matrix shows the market shares after one year are 42%, 33%, and 25%,
   respectively.




   To find the second-step probability vector, use the formula $S^{(2)} = S^{(0)}P^2$.
   Enter $A \times B^2$ using the MATRX NAMES menu and the x² key. After
   two years, therefore, the market shares are 45%, 34%, and 21%,
   respectively.




   Similarly, enter $A \times B^3$ to find the third-step probability vector. After
   three years, the market shares are 46%, 34%, and 20%, respectively.




c) Higher-state probability vectors are easy to determine with a graphing
      calculator.

      \[
      \begin{aligned}
      S^{(10)} &= S^{(0)}P^{10} \approx \begin{bmatrix} 0.471 & 0.347 & 0.182 \end{bmatrix} \\
      S^{(100)} &= S^{(0)}P^{100} \approx \begin{bmatrix} 0.471 & 0.347 & 0.182 \end{bmatrix}
      \end{aligned}
      \]

      $S^{(10)}$ and $S^{(100)}$ are equal to three decimal places. Any tiny remaining
      difference between them is unimportant since the original data has only
      two significant digits. Thus, [0.471 0.347 0.182] is a steady-state
      vector, and the long-term market shares are predicted to be about 47%,
      35%, and 18% for Player-One, Slapshot, and Extreme Styx, respectively.
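
The search for a steady state in Example 2 can also be automated. The sketch below keeps multiplying by the transition matrix until two successive vectors agree to within a small tolerance. The names `step` and `steady_state`, the tolerance `tol`, and the cap `max_steps` are assumptions made for this illustration.

```python
def step(state, transition):
    """Multiply a row vector by a square transition matrix."""
    n = len(transition)
    return [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]


def steady_state(initial, transition, tol=1e-9, max_steps=10_000):
    """Iterate until successive vectors differ by less than tol in every entry.

    Returns the final vector and the number of steps taken.
    """
    state = list(initial)
    for n in range(1, max_steps + 1):
        nxt = step(state, transition)
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt, n
        state = nxt
    raise ValueError("no steady state found within max_steps")


# Example 2: Player-One, Slapshot, Extreme Styx.
P = [[0.70, 0.20, 0.10],
     [0.25, 0.65, 0.10],
     [0.30, 0.15, 0.55]]
S0 = [1 / 3, 1 / 3, 1 / 3]

vector, steps = steady_state(S0, P)
print([round(x, 3) for x in vector], "after", steps, "steps")
```

The `max_steps` cap matters because not every Markov chain settles down; for a chain that keeps oscillating, the function raises an error instead of looping forever.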

Regular Markov chains always achieve a steady state. A Markov chain is
regular if the transition matrix P or some power of P has no zero entries.
Thus, regular Markov chains are fairly easy to identify. A regular Markov
chain will reach the same steady state regardless of the initial probability vector.

Project Prep
In the probability project, you may need to use Markov chains to determine
long-term probabilities.

Example 3 Steady State of a Regular Markov Chain

Suppose that Player-One and Slapshot initially split most of the market evenly
between them, and that Extreme Styx, a relatively new company, starts with a
10% market share.
a) Determine each company’s market share after one year.
b)    Predict the long-term market shares.

Solution

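a) The initial probability vector is

      \[ S^{(0)} = \begin{bmatrix} 0.45 & 0.45 & 0.1 \end{bmatrix} \]

      Using the same transition matrix as in Example 2,

      \[
      \begin{aligned}
      S^{(1)} &= S^{(0)}P \\
              &= \begin{bmatrix} 0.45 & 0.45 & 0.1 \end{bmatrix}
                 \begin{bmatrix} 0.7 & 0.2 & 0.1 \\ 0.25 & 0.65 & 0.1 \\ 0.3 & 0.15 & 0.55 \end{bmatrix} \\
              &= \begin{bmatrix} 0.4575 & 0.3975 & 0.145 \end{bmatrix}
      \end{aligned}
      \]

      These market shares differ from those in Example 2, where
      $S^{(1)} = \begin{bmatrix} 0.41\overline{6} & 0.\overline{3} & 0.25 \end{bmatrix}$.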

b)   \[ S^{(100)} = S^{(0)}P^{100} \approx \begin{bmatrix} 0.471 & 0.347 & 0.182 \end{bmatrix} \]

     In the long term, the steady state is the same as in Example 2. Notice that
     although the short-term results differ as seen in part a), the same steady
     state is achieved in the long term.
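
Both claims about regular chains (that regularity can be identified from the powers of P, and that the steady state does not depend on the starting vector) can be checked with a short Python sketch. The function names are hypothetical. The regularity test tracks only which entries are non-zero, and it stops at the power (n − 1)² + 1, relying on a standard result (Wielandt's bound) that if any power of an n × n matrix with non-negative entries has no zero entries, this power already does.

```python
def step(state, transition):
    """Multiply a row vector by a square transition matrix."""
    n = len(transition)
    return [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]


def is_regular(p):
    """True if P or some power of P has no zero entries."""
    n = len(p)
    base = [[x > 0 for x in row] for row in p]   # pattern of non-zero entries
    power = base
    for _ in range((n - 1) ** 2 + 1):            # checks P, P^2, ..., up to the bound
        if all(all(row) for row in power):
            return True
        power = [
            [any(power[i][k] and base[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return False


P = [[0.70, 0.20, 0.10],
     [0.25, 0.65, 0.10],
     [0.30, 0.15, 0.55]]

starts = {"Example 2": [1 / 3, 1 / 3, 1 / 3], "Example 3": [0.45, 0.45, 0.10]}
results = {}
for label, state in starts.items():
    for _ in range(100):                         # compute the 100th-step vector
        state = step(state, P)
    results[label] = [round(x, 3) for x in state]

print(is_regular(P))
print(results)
```

Working with the pattern of non-zero entries, instead of the numerical powers themselves, avoids any concern about very small products being rounded to zero.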



The steady state of a regular Markov chain can also be determined analytically.

Example 4 Analytic Determination of Steady State

The weather near a certain seaport follows this pattern: If it is a calm day, there
is a 70% chance that the next day will be calm and a 30% chance that it will be
stormy. If it is a stormy day, the chances are 50/50 that the next day will also be
stormy. Determine the long-term probability for the weather at the port.

Solution
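The transition matrix for this Markov chain is

\[ P = \begin{bmatrix} 0.7 & 0.3 \\ 0.5 & 0.5 \end{bmatrix} \]

where both the rows and the columns are ordered C (calm), S (stormy).

The steady-state vector will be a 1 × 2 matrix, $S^{(n)} = \begin{bmatrix} p & q \end{bmatrix}$.

The Markov chain will reach a steady state when $S^{(n)} = S^{(n)}P$, so

\[
\begin{aligned}
\begin{bmatrix} p & q \end{bmatrix}
  &= \begin{bmatrix} p & q \end{bmatrix}
     \begin{bmatrix} 0.7 & 0.3 \\ 0.5 & 0.5 \end{bmatrix} \\
  &= \begin{bmatrix} 0.7p + 0.5q & 0.3p + 0.5q \end{bmatrix}
\end{aligned}
\]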

Setting first elements equal and second elements equal gives two equations in
two unknowns. These equations are dependent, so they define only one
relationship between p and q.

p = 0.7p + 0.5q
q = 0.3p + 0.5q

Subtracting the second equation from the first gives

p − q = 0.4p
    q = 0.6p




Now, use the fact that the sum of probabilities at any state must equal 1,

\[
\begin{aligned}
p + q &= 1 \\
p + 0.6p &= 1 \\
p &= \frac{1}{1.6} \\
  &= 0.625 \\
q &= 1 - p \\
  &= 0.375
\end{aligned}
\]
So, the steady-state vector for the weather is [0.625 0.375]. Over the long term,
there will be a 62.5% probability of a calm day and 37.5% chance of a stormy
day at the seaport.
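
The same algebra works for any two-state chain. If a is the probability of leaving the first state and b is the probability of leaving the second, the steady-state condition gives ap = bq, and combining this with p + q = 1 gives p = b/(a + b) and q = a/(a + b). The Python sketch below applies this with exact fractions; the function name is hypothetical, and converting each decimal through its printed form with `str` is an assumption that suits tidy textbook data.

```python
from fractions import Fraction


def two_state_steady_state(p):
    """Steady-state vector [p, q] of a 2 x 2 transition matrix, as exact fractions."""
    a = Fraction(str(p[0][1]))   # probability of leaving the first state
    b = Fraction(str(p[1][0]))   # probability of leaving the second state
    if a + b == 0:
        raise ValueError("neither state is ever left, so any vector is steady")
    return [b / (a + b), a / (a + b)]


# Example 4: calm (first state) and stormy (second state).
P = [[0.7, 0.3],
     [0.5, 0.5]]
calm, stormy = two_state_steady_state(P)
print(calm, stormy)                  # 5/8 3/8
print(float(calm), float(stormy))    # 0.625 0.375
```

For the seaport data, a = 0.3 and b = 0.5, so p = 0.5/0.8 = 0.625, in agreement with the solution above.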



   Key Concepts

   • The theory of Markov chains can be applied to probability models in which
     the outcome of one trial directly affects the outcome of the next trial.

   • Regular Markov chains eventually reach a steady state, which can be used to
     make long-term predictions.



   Communicate Your Understanding

      1. Why must a transition matrix always be square?

      2. Given an initial probability vector $S^{(0)} = \begin{bmatrix} 0.4 & 0.6 \end{bmatrix}$ and a transition matrix

        \[ P = \begin{bmatrix} 0.5 & 0.5 \\ 0.3 & 0.7 \end{bmatrix} \]

        state which of the following equations is easier to use
        for determining the third-step probability vector:
        $S^{(3)} = S^{(2)}P$ or $S^{(3)} = S^{(0)}P^3$
        Explain your choice.

      3. Explain how you can determine whether a Markov chain has reached a
        steady state after k trials.

      4. What property or properties must events A, B, and C have if they are the
        only possible different states of a Markov chain?




Practise

A

1. Which of the following cannot be an initial probability vector? Explain why.
    a) [0.2 0.45 0.25]
    b) [0.29 0.71]
    c) $\begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}$
    d) [0.4 −0.1 0.7]
    e) [0.4 0.2 0.15 0.25]

2. Which of the following cannot be a transition matrix? Explain why.
    a) $\begin{bmatrix} 0.3 & 0.3 & 0.4 \\ 0.1 & 0 & 0.9 \\ 0.2 & 0.3 & 0.4 \end{bmatrix}$
    b) $\begin{bmatrix} 0.2 & 0.8 \\ 0.65 & 0.35 \end{bmatrix}$
    c) $\begin{bmatrix} 0.5 & 0.1 & 0.4 \\ 0.3 & 0.22 & 0.48 \end{bmatrix}$

3. Two competing companies, ZapShot and E-pics, manufacture and sell digital
    cameras. Customer surveys suggest that the companies’ market shares can be
    modelled using a Markov chain with the following initial probability
    vector $S^{(0)}$ and transition matrix P.

    \[ S^{(0)} = \begin{bmatrix} 0.67 & 0.33 \end{bmatrix} \qquad
       P = \begin{bmatrix} 0.6 & 0.4 \\ 0.5 & 0.5 \end{bmatrix} \]

    Assume that the first element in the initial probability vector pertains
    to ZapShot. Explain the significance of
    a) the elements in the initial probability vector
    b) each element of the transition matrix
    c) each element of the product $S^{(0)}P$

Apply, Solve, Communicate

B

4. Refer to question 3.
    a) Which company do you think will increase its long-term market share,
       based on the information provided? Explain why you think so.
    b) Calculate the steady-state vector for the Markov chain.
    c) Which company increased its market share over the long term?
    d) Compare this result with your answer to part a). Explain any differences.

5. For which of these transition matrices will the Markov chain be regular?
    In each case, explain why.
    a) $\begin{bmatrix} 0.2 & 0.8 \\ 0.95 & 0.05 \end{bmatrix}$
    b) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
    c) $\begin{bmatrix} 0.1 & 0.6 & 0.3 \\ 0.33 & 0.3 & 0.37 \\ 0.5 & 0 & 0.5 \end{bmatrix}$

6. Gina noticed that the performance of her baseball team seemed to depend on
    the outcome of their previous game. When her team won, there was a 70%
    chance that they would win the next game. If they lost, however, there was
    only a 40% chance that they would win their next game.
    a) What is the transition matrix of the Markov chain for this situation?
    b) Following a loss, what is the probability that Gina’s team will win
       two games later?
    c) What is the steady-state vector for the Markov chain, and what does
       it mean?
7. Application Two popcorn manufacturers, Ready-Pop and ButterPlus, are
    competing for the same market. Trends indicate that 65% of consumers who
    purchase Ready-Pop will stay with Ready-Pop the next time, while 35% will
    try ButterPlus. Among those who purchase ButterPlus, 75% will buy
    ButterPlus again and 25% will switch to Ready-Pop. Each popcorn producer
    initially has 50% of the market.
    a) What is the initial probability vector?
    b) What is the transition matrix?
    c) Determine the first- and second-step probability vectors.
    d) What is the long-term probability that a customer will buy Ready-Pop?

8. Inquiry/Problem Solving The weather pattern for a certain region is as
    follows. On a sunny day, there is a 50% probability that the next day will
    be sunny, a 30% chance that the next day will be cloudy, and a 20% chance
    that the next day will be rainy. On a cloudy day, the probability that the
    next day will be cloudy is 35%, while it is 40% likely to be rainy and 25%
    likely to be sunny the next day. On a rainy day, there is a 45% chance
    that it will be rainy the next day, a 20% chance that the next day will be
    sunny, and a 35% chance that the next day will be cloudy.
    a) What is the transition matrix?
    b) If it is cloudy on Wednesday, what is the probability that it will be
       sunny on Saturday?
    c) What is the probability that it will be sunny four months from today,
       according to this model?
    d) What assumptions must you make in part c)? Are they realistic? Why or
       why not?

9. Application On any given day, the stock price for Bluebird Mutual may rise,
    fall, or remain unchanged. These states, R, F, and U, can be modelled by a
    Markov chain with the transition matrix:

    \[ \begin{bmatrix} 0.75 & 0.15 & 0.1 \\ 0.25 & 0.6 & 0.15 \\ 0.4 & 0.4 & 0.2 \end{bmatrix} \]

    Both the rows and the columns are ordered R, F, U.
    a) If, after a day of trading, the value of Bluebird’s stock has fallen,
       what is the probability that it will rise the next day?
    b) If Bluebird’s value has just risen, what is the likelihood that it will
       rise one week from now?
    c) Assuming that the behaviour of the Bluebird stock continues to follow
       this established pattern, would you consider Bluebird to be a safe
       investment? Explain your answer, and justify your reasoning with
       appropriate calculations.

10. Chapter Problem Assume that each doe produces one female offspring. Let
    the two states be D, a normal doe, and B, a doe with bald patches.
    Determine
    a) the initial probability vector
    b) the transition matrix for each generation of offspring
    c) the long-term probability of a new-born doe developing bald patches
    d) Describe the assumptions which are inherent in this analysis. What
       other factors could affect the stability of this Markov chain?
C

12. Communication Refer to Example 4 on page 351.
     a) Suppose that the probability of stormy weather on any day following a
        calm day increases by 0.1. Estimate the effect this change will have
        on the steady state of the Markov chain. Explain your prediction.
     b) Calculate the new steady-state vector and compare the result with your
        prediction. Discuss any difference between your estimate and the
        calculated steady state.
     c) Repeat parts a) and b) for the situation in which the probability of
        stormy weather following either a calm or a stormy day increases by
        0.1, compared to the data in Example 4.
     d) Discuss possible factors that might cause the mathematical model to
        be altered.

13. For each of the transition matrices below, decide whether the Markov chain
     is regular and whether it approaches a steady state. (Hint: An irregular
     Markov chain could still have a steady-state vector.)
     a) $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
     b) $\begin{bmatrix} 0 & 1 \\ 0.5 & 0.5 \end{bmatrix}$

ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application

11. When Mazemaster, the mouse, is placed in a maze like the one shown below,
     he will explore the maze by picking the doors at random to move from
     compartment to compartment. A transition takes place when Mazemaster
     moves through one of the doors into another compartment. Since all the
     doors lead to other compartments, the probability of moving from a
     compartment back to the same compartment in a single transition is zero.

     [Figure: maze with eight compartments; 1, 2, 3 in the left column, 4 and 5
     in the middle, 6, 7, 8 in the right column]

     a) Construct the transition matrix, P, for the Markov chain.
     b) Use technology to calculate $P^2$, $P^3$, and $P^4$.
     c) If Mazemaster starts in compartment 1,
         what is the probability that he will be in
         compartment 4 after
         i)      two transitions?
                                                                             c)
                                                                                  ΄1
                                                                                   0.5
                                                                                          0
                                                                                          0.5   ΅
         ii)     three transitions?                                     14. Refer to Example 2 on page 347.
         iii) four transitions?                                              a) Using a graphing calculator, find P 100.
    d) Predict where Mazemaster is most likely                                    Describe this matrix.
         to be in the long run. Explain the                                  b) Let S (0) = [a b c]. Find an expression for
         reasoning for your prediction.                                           the value of S (0)P 100. Does this expression
     e) Calculate the steady-state vector. Does                                   depend on S (0), P, or both?
         it support your prediction? If not,                                 c) What property of a regular Markov
         identify the error in your reasoning in                                  chain can you deduce from your answer
         part d).                                                                 to part b)?
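
For readers who want to try the matrix-power questions above with software other than a graphing calculator, here is a minimal Python sketch. The transition matrix `P`, the two starting vectors, and the helper names `mat_mul` and `mat_pow` are all hypothetical and chosen only for illustration; none of them comes from the examples or exercises in this section. Exact fractions are used so that the steady-state check involves no rounding.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def mat_pow(p, n):
    """Raise a square matrix to a positive integer power by repeated multiplication."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result

# Hypothetical two-state transition matrix (each row sums to 1).
P = [[F(1, 2), F(1, 2)],
     [F(1, 4), F(3, 4)]]

P50 = mat_pow(P, 50)

# Two different hypothetical initial probability vectors, as 1 x 2 matrices.
start_a = [[F(1), F(0)]]
start_b = [[F(1, 5), F(4, 5)]]

long_run_a = mat_mul(start_a, P50)[0]
long_run_b = mat_mul(start_b, P50)[0]

# For this particular P, the vector [1/3, 2/3] satisfies S P = S.
steady = [[F(1, 3), F(2, 3)]]

print([float(x) for x in long_run_a])
print([float(x) for x in long_run_b])
print(mat_mul(steady, P) == steady)
```

In this sketch both starting vectors lead to essentially the same long-run vector, which is the kind of behaviour question 14 asks you to investigate for a regular Markov chain.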



                                                                        6.6 Applying Matrices to Probability Problems • MHR   355
15. Inquiry/Problem Solving The transition matrix for a Markov chain
      with steady-state vector of
      $\begin{bmatrix} \frac{7}{13} & \frac{6}{13} \end{bmatrix}$ is
      $\begin{bmatrix} 0.4 & 0.6 \\ m & n \end{bmatrix}$.
      Determine the unknown transition matrix elements, $m$ and $n$.
      Career Connection
                                            Investment Broker
  Many people use the services of an investment
  broker to help them invest their earnings. An
  investment broker provides advice to clients
  on how to invest their money, based on their
  individual goals, income, and risk tolerance,
  among other factors. An investment broker can
  work for a financial institution, such as a bank
  or trust company, or a brokerage, which is a
  company that specializes in investments. An
  investment broker typically buys, sells, and
  trades a variety of investment items, including
  stocks, bonds, mutual funds, and treasury bills.

  An investment broker must be able to read and
  interpret a variety of financial data including
  periodicals and corporate reports. Based on
  experience and sound mathematical principles,
  the successful investment broker must be able
  to make reasonable predictions of uncertain
  outcomes.

  Because of the nature of this industry, earnings
  often depend directly on performance. An
  investment broker typically earns a commission,
  similar to that for a sales representative. In the
  short term, the investment broker can expect
  some fluctuations in earnings. In the long
  term, strong performers can expect a very
  comfortable living, while weak performers
  are not likely to last long in the field.

  Usually, an investment broker requires a
  minimum of a bachelor’s degree in economics
  or business, although related work experience
  in investments or sales is sometimes an
  acceptable substitute. A broker must have
  a licence from the provincial securities
  commission and must pass specialized courses
  in order to trade in specific investment
  products such as securities, options, and futures
  contracts. The chartered financial analyst (CFA)
  designation is recommended for brokers
  wishing to enter the mutual-fund field or other
  financial-planning services.

  www.mcgrawhill.ca/links/MDM12
  Visit the above web site and follow the links to
  find out more about an investment broker and
  other careers related to mathematics.



356     MHR • Introduction to Probability
Review of Key Concepts

6.1 Basic Probability Concepts
Refer to the Key Concepts on page 311.

 1. A bag of marbles contains seven whites, five blacks, and eight
    cat’s-eyes. Determine the probability that a randomly drawn
    marble is
    a) a white marble
    b) a marble that is not black

 2. When a die was rolled 20 times, 4 came up five times.
    a) Determine the empirical probability of rolling a 4 with a die
       based on the 20 trials.
    b) Determine the theoretical probability of rolling a 4 with a die.
    c) How can you account for the difference between the results of
       parts a) and b)?

 3. Estimate the subjective probability of each event and provide a
    rationale for your decision.
    a) All classes next week will be cancelled.
    b) At least one severe snow storm will occur in your area next
       winter.

6.2 Odds
Refer to the Key Concepts on page 317.

 4. Determine the odds in favour of flipping three coins and having
    them all turn up heads.

 5. A restaurant owner conducts a study that measures the frequency of
    customer visits in a given month. The results are recorded in the
    following table.

     Number of Visits    Number of Customers
            1                    4
            2                    6
            3                    7
        4 or more                3

    Based on this survey, calculate
    a) the odds that a customer visited the restaurant exactly three
       times
    b) the odds in favour of a customer having visited the restaurant
       fewer than three times
    c) the odds against a customer having visited the restaurant more
       than three times

6.3 Probabilities Using Counting Techniques
Refer to the Key Concepts on page 324.

 6. Suppose three marbles are selected at random from the bag of
    marbles in question 1.
    a) Draw a tree diagram to illustrate all possible outcomes.
    b) Are all possible outcomes equally likely? Explain.
    c) Determine the probability that all three selected marbles are
       cat’s-eyes.
    d) Determine the probability that none of the marbles drawn are
       cat’s-eyes.

 7. The Sluggers baseball team has a starting line-up consisting of
    nine players, including Tyrone and his sister Amanda. If the
    batting order is randomly assigned, what is the probability that
    Tyrone will bat first, followed by Amanda?

 8. A three-member athletics council is to be randomly chosen from ten
    students, five of whom are runners. The council positions are
    president, secretary, and treasurer. Determine the probability that
    a) the committee is comprised of all runners
    b) the committee is comprised of the three eldest runners
    c) the eldest runner is president, second eldest runner is
       secretary, and third eldest runner is treasurer
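
As a study aid for the odds and counting questions above, the following Python sketch shows one way to convert a probability into odds and to compute a selection probability with combinations. The function names and all numbers (a probability of 3/8, and a bag with 4 red items out of 10) are hypothetical and are not taken from the review questions.

```python
from fractions import Fraction
from math import comb

def odds_in_favour(p):
    """Return odds in favour as a pair (a, b), meaning a:b, from P(A) : P(not A)."""
    ratio = p / (1 - p)
    return (ratio.numerator, ratio.denominator)

def odds_against(p):
    """Return odds against as a pair (a, b), meaning a:b, from P(not A) : P(A)."""
    ratio = (1 - p) / p
    return (ratio.numerator, ratio.denominator)

def prob_all_of_one_kind(kind_count, total, draws):
    """Probability that every item drawn (without replacement) is of one kind."""
    return Fraction(comb(kind_count, draws), comb(total, draws))

# Hypothetical event with probability 3/8.
p = Fraction(3, 8)
print(odds_in_favour(p))   # (3, 5), read as 3:5
print(odds_against(p))     # (5, 3), read as 5:3

# Hypothetical bag: 4 red items among 10, two drawn at random.
print(prob_all_of_one_kind(4, 10, 2))   # 2/15
```

Working with `Fraction` rather than decimals keeps the odds in lowest terms automatically, which matches how odds are usually reported.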
6.4 Dependent and Independent Events
Refer to the Key Concepts on page 333.

 9. Classify each of the following pairs of events as independent or
      dependent.

             First Event                 Second Event
      a)     Hitting a home run          Catching a pop fly
             while at bat                while in the field
      b)     Staying up late             Sleeping in the next day
      c)     Completing your             Passing your
             calculus review             calculus exam
      d)     Randomly selecting          Randomly selecting
             a shirt                     a tie

10. Bruno has just had job interviews with two separate firms: Golden
      Enterprises and Outer Orbit Manufacturing. He estimates that he
      has a 40% chance of receiving a job offer from Golden and a 75%
      chance of receiving an offer from Outer Orbit.
      a) What is the probability that Bruno will receive both job
           offers?
      b) Is Bruno applying the concept of theoretical, empirical, or
           subjective probability? Explain.

12. During a marketing blitz, a telemarketer conducts phone
      solicitations continuously from 16:00 until 20:00. Suppose that
      you have a 20% probability of being called during this blitz. If
      you generally eat dinner between 18:00 and 18:30, how likely is
      it that the telemarketer will interrupt your dinner?

6.5 Mutually Exclusive Events
Refer to the Key Concepts on page 340.

13. Classify each pair of events as mutually exclusive or non-mutually
      exclusive.

             First Event                 Second Event
      a)     Randomly selecting          Randomly selecting
             a classical CD              a rock CD
      b)     Your next birthday          Your next birthday
             occurring on a              occurring on a
             Wednesday                   weekend
      c)     Ordering a                  Ordering a
             hamburger with              hamburger with no
             cheese                      onions
      d)     Rolling a perfect           Rolling an even
             square with a die           number with a die
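
The product rule for independent events and the addition rule for events that may overlap can also be checked numerically. The Python sketch below is only an illustration: the probabilities 3/10 and 1/2, and the die events "greater than 4", "odd", and "less than 3", are hypothetical and are not the events listed in the questions above.

```python
from fractions import Fraction

def p_both_independent(p_a, p_b):
    """Product rule: P(A and B) = P(A) * P(B), valid only for independent events."""
    return p_a * p_b

# Hypothetical independent events with probabilities 3/10 and 1/2.
print(p_both_independent(Fraction(3, 10), Fraction(1, 2)))   # 3/20

# Sample space for one roll of a fair six-sided die.
outcomes = set(range(1, 7))

def prob(event):
    """Probability of an event (a subset of outcomes), all outcomes equally likely."""
    return Fraction(len(event), len(outcomes))

def mutually_exclusive(a, b):
    """Two events are mutually exclusive when they share no outcomes."""
    return len(a & b) == 0

greater_than_4 = {5, 6}
odd = {1, 3, 5}
less_than_3 = {1, 2}

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_union = prob(greater_than_4) + prob(odd) - prob(greater_than_4 & odd)
print(p_union)                                        # 2/3
print(mutually_exclusive(greater_than_4, odd))        # False, both contain 5
print(mutually_exclusive(greater_than_4, less_than_3))  # True
```

When two events are mutually exclusive, the subtracted term is zero and the addition rule reduces to P(A) + P(B).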
  • 1. McGraw-Hili Ryerson D�-J� Mathematics of J:;j�jJ�0ajSJajJ-J This book was distributed by Jack Truong for use at William Lyon Mackenzie Collegiate Institute.
  • 2. 1 PT ER Tools for Data Management CHA Specific Expectations Section Locate data to answer questions of significance or personal interest, by 1.3 searching well-organized databases. Use the Internet effectively as a source for databases. 1.3 Create database or spreadsheet templates that facilitate the manipulation 1.2, 1.3, 1.4 and retrieval of data from large bodies of information that have a variety of characteristics. Represent simple iterative processes, using diagrams that involve 1.1 branches and loops. Represent complex tasks or issues, using diagrams. 1.1, 1.5 Solve network problems, using introductory graph theory. 1.5 Represent numerical data, using matrices, and demonstrate an 1.6, 1.7 understanding of terminology and notation related to matrices. Demonstrate proficiency in matrix operations, including addition, scalar 1.6, 1.7 multiplication, matrix multiplication, the calculation of row sums, and the calculation of column sums, as necessary to solve problems, with and without the aid of technology. Solve problems drawn from a variety of applications, using matrix 1.6, 1.7 methods.
  • 3. Chapter Problem VIA Rail Routes 1. a) List several routes you have When travelling by bus, train, or airplane, travelled where you were able to you usually want to reach your destination reach your destination directly. without any stops or transfers. However, b) List a route where you had to it is not always possible to reach your change vehicles exactly once before destination by a non-stop route. The reaching your destination. following map shows the VIA Rail routes 2. a) List all the possible routes from for eight major cities. The arrows Montréal to Toronto by VIA Rail. represent routes on which you do not have b) Which route would you take to get to change trains. from Montréal to Toronto in the Montréal least amount of time? Explain your Sudbury reasoning. 3. a) List all the possible routes from Ottawa Kingston to London. Kingston b) Give a possible reason why VIA Rail chooses not to have a direct train Toronto from Kingston to London. This chapter introduces graph theory, matrices, and technology that you can use London to model networks like the one shown. You Niagara Falls will learn techniques for determining the Windsor number of direct and indirect routes from one city to another. The chapter also discusses useful data-management tools including iterative processes, databases, software, and simulations.
  • 4. Review of Prerequisite Skills If you need help with any of the skills listed in purple below, refer to Appendix A. 1. Order of operations Evaluate each 5. Graphing data Organize the following set of expression. data using a fully-labelled double-bar graph. a) (−4)(5) + (2)(−3) City Snowfall (cm) Total Precipitation (cm) b) (−2)(3) + (5)(−3) + (8)(7) St. John’s 322.1 148.2 Charlottetown 338.7 120.1 c) (1)(0) + (1)(1) + (0)(0) + (0)(1) Halifax 261.4 147.4 12 d) (2)(4) + ᎏᎏ − (3)2 Fredericton 294.5 113.1 3 Québec City 337.0 120.8 2. Substituting into equations Given Montréal 214.2 94.0 f (x) = 3x2 − 5x + 2 and g(x) = 2x − 1, Ottawa 221.5 91.1 evaluate each expression. Toronto 135.0 81.9 Winnipeg 114.8 50.4 a) f (2) Regina 107.4 36.4 b) g(2) Edmonton 129.6 46.1 c) f (g(−1)) Calgary 135.4 39.9 d) f ( g(1)) Vancouver 54.9 116.7 Victoria 46.9 85.8 e) f ( f (2)) Whitehorse 145.2 26.9 f) g( f (2)) Yellowknife 143.9 26.7 3. Solving equations Solve for x. 6. Graphing data The following table lists the a) 2x − 3 = 7 average annual full-time earnings for males b) 5x + 2 = −8 and females. Illustrate these data using a x fully-labelled double-line graph. c) ᎏ − 5 = 5 2 d) 4x − 3 = 2x − 1 Year Women ($) Men ($) 1989 28 219 42 767 e) x2 = 25 1990 29 050 42 913 f) x3 = 125 1991 29 654 42 575 g) 3(x + 1) = 2(x − 1) 1992 30 903 42 984 2x − 5 3x − 1 1993 30 466 42 161 h) ᎏ = ᎏ 2 4 1994 30 274 43 362 1995 30 959 42 338 4. Graphing data In a sample of 1000 1996 30 606 41 897 Canadians, 46% have type O blood, 43% 1997 30 484 43 804 have type A, 8% have type B, and 3% have 1998 32 553 45 070 type AB. Represent these data with a fully- labelled circle graph. 4 MHR • Tools for Data Management
  • 5. 7. Using spreadsheets Refer to the spreadsheet 10. Ratios of areas Draw two squares on a sheet section of Appendix B, if necessary. of grid paper, making the dimensions of the a) Describe how to refer to a specific cell. second square half those of the first. b) Describe how to refer to a range of cells a) Use algebra to calculate the ratio of the in the same row. areas of the two squares. c) Describe how to copy data into another b) Confirm this ratio by counting the cell. number of grid units contained in each square. d) Describe how to move data from one column to another. c) If you have access to The Geometer’s Sketchpad or similar software, confirm e) Describe how to expand the width of a the area ratio by drawing a square, column. dilating it by a factor of 0.5, and f) Describe how to add another column. measuring the areas of the two squares. g) What symbol must precede a Refer to the help menu in the software, mathematical expression? if necessary. 8. Similar triangles Determine which of the 11. Simplifying expressions Expand and simplify following triangles are similar. Explain each expression. your reasoning. a) (x – 1)2 B 3 A D b) (2x + 1)(x – 4) 2 7 4 c) –5x(x – 2y) 4 C E d) 3x(x – y)2 F 6 G e) (x – y)(3x)2 12 f) (a + b)(c – d) 6 12. Fractions, percents, decimals Express as a J 9 decimal. H 5 23 2 a) ᎏᎏ b) ᎏᎏ c) ᎏᎏ 9. Number patterns Describe each of the 20 50 3 following patterns. Show the next three 138 6 d) ᎏᎏ e) ᎏᎏ f) 73% terms. 12 7 a) 65, 62, 59, … 13. Fractions, percents, decimals Express as a b) 100, 50, 25, … percent. 1 1 1 4 1 c) 1, − ᎏ , ᎏ , − ᎏ , … a) 0.46 b) ᎏᎏ c) ᎏᎏ 2 4 8 5 30 d) a, b, aa, bb, aaa, bbbb, aaaa, bbbbbbbb, … 11 d) 2.25 e) ᎏᎏ 8 Review of Prerequisite Skills • MHR 5
  • 6. 1.1 The Iterative Process

If you look carefully at the branches of a tree, you can see the same pattern repeated over and over, but getting smaller toward the end of each branch. A nautilus shell repeats the same shape on a larger and larger scale from its tip to its opening. You yourself repeat many activities each day. These three examples all involve an iterative process. Iteration is a process of repeating the same procedure over and over. The following activities demonstrate this process.

INVESTIGATE & INQUIRE: Developing a Sort Algorithm

Often you need to sort data using one or more criteria, such as alphabetical or numerical order. Work with a partner to develop an algorithm to sort the members of your class in order of their birthdays. An algorithm is a procedure or set of rules for solving a problem.

1. Select two people and compare their birthdays.
2. Rank the person with the later birthday second.
3. Now, compare the next person’s birthday with the last ranked birthday. Rank the later birthday of those two last.
4. Describe the continuing process you will use to find the classmate with the latest birthday.
5. Describe the process you would use to find the person with the second latest birthday. With whom do you stop comparing?
6. Describe a process to rank all the remaining members of your class by their birthdays.
7. Illustrate your process with a diagram.

The process you described is an iterative process because it involves repeating the same set of steps throughout the algorithm. Computers can easily be programmed to sort data using this process.
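One way a computer might carry out the birthday sort is sketched below in Python. This is only an illustration, not the algorithm you are expected to write: the function name `rank_by_birthday`, the classmates, and the choice to store birthdays as (month, day) pairs are all assumptions made for the example.

```python
def rank_by_birthday(people):
    """Sort (name, birthday) pairs from earliest to latest birthday.

    Each pass compares neighbours and carries the latest remaining
    birthday to the end of the unranked part of the list.
    """
    ranked = list(people)       # work on a copy of the class list
    unranked = len(ranked)      # positions that are not yet settled
    while unranked > 1:
        for i in range(unranked - 1):
            # The later birthday of each pair moves toward the end.
            if ranked[i][1] > ranked[i + 1][1]:
                ranked[i], ranked[i + 1] = ranked[i + 1], ranked[i]
        unranked -= 1           # the latest birthday is now in place
    return ranked


# Hypothetical classmates; birthdays are (month, day) pairs.
classmates = [("Ana", (9, 14)), ("Ben", (2, 3)), ("Chi", (12, 25)), ("Dev", (5, 30))]
for name, birthday in rank_by_birthday(classmates):
    print(name, birthday)
```

The outer loop repeats the same comparison step on a shorter and shorter list, which is what makes the process iterative.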
  • 7. INVESTIGATE & INQUIRE: The Sierpinski Triangle

Method 1: Pencil and Paper

1. Using isometric dot paper, draw a large equilateral triangle with side lengths of 32 units.
2. Divide this equilateral triangle into four smaller equilateral triangles.
3. Shade the middle triangle. What fraction of the original triangle is shaded?
4. For each of the unshaded triangles, repeat this process. What fraction of the original triangle is shaded?
5. For each of the unshaded triangles, repeat this process again. What fraction of the original triangle is shaded now?
6. Predict the fraction of the original triangle that would be shaded for the fourth and fifth steps in this iterative process.
7. Predict the fraction of the original triangle that would be shaded if this iterative process continued indefinitely.

Method 2: The Geometer’s Sketchpad®

1. Open a new sketch and a new script.
2. Position both windows side by side.
3. Click on REC in the script window.
4. In the sketch window, construct a triangle. Shift-click on each side of the triangle. Under the Construct menu, choose Point at Midpoint and then Polygon Interior of the midpoints.
5. Shift-click on one vertex and the two adjacent midpoints. Choose Loop in your script.
6. Repeat step 5 for the other two vertices.
7. Shift-click on the three midpoints. From the Display menu, choose Hide Midpoints.
8. Stop your script.
9. Open a new sketch. Construct a new triangle. Mark the three vertices. Play your script at a recursion depth of at least 3. You may increase the speed by clicking on Fast.
10. a) What fraction of the original triangle is shaded
       i) after one recursion?
       ii) after two recursions?
       iii) after three recursions?
    b) Predict what fraction would be shaded after four and five recursions.
    c) Predict the fraction of the original triangle that would be shaded if this iterative (recursion) process continued indefinitely.
11. Experiment with recursion scripts to design patterns with repeating shapes.
  • 8. The Sierpinski triangle is named after the Polish mathematician, Waclaw Sierpinski (1882–1969). It is an example of a fractal, a geometric figure that is generally created using an iterative process. One part of the process is that fractals are made of self-similar shapes. As the shapes become smaller and smaller, they keep the same geometrical characteristics as the original larger shape. Fractal geometry is a very rich area of study. Fractals can be used to model plants, trees, economies, or the honeycomb pattern in human bones.

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links to learn more about the Sierpinski triangle and fractals. Choose an interesting fractal and describe how it is self-similar.

Example 1 Modelling With a Fractal

Fractals can model the branching of a tree. Describe the algorithm used to model the tree shown.

Solution

Begin with a 1-unit segment. Branch off at 60° with two segments, each one half the length of the previous branch. Repeat this process for a total of three iterations.

Arrow diagrams can illustrate iterations. Such diagrams show the sequence of steps in the process.

Example 2 The Water Cycle

Illustrate the water cycle using an arrow diagram.

Solution

The water, or hydrologic, cycle is an iterative process. Although the timing of the precipitation can vary, the cycle will repeat itself indefinitely.

[Figure: The Water Cycle, an arrow diagram linking condensation, precipitation, transpiration, evaporation, surface runoff, percolation, lake, streams and rivers, ocean, water table, and groundwater]
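The branching rule in Example 1 can also be tabulated with a short program. The Python sketch below is an illustration only: the function name `tree_levels` and its parameters are assumptions, and it tracks just the number and length of the segments, not the 60° branching angle or the positions of the branches.

```python
def tree_levels(iterations, trunk=1.0, ratio=0.5, branches=2):
    """Return (level, segment_count, segment_length) for each level of the tree."""
    levels = [(0, 1, trunk)]        # level 0 is the single 1-unit trunk
    count, length = 1, trunk
    for level in range(1, iterations + 1):
        count *= branches           # every tip sprouts two new segments
        length *= ratio             # each new segment is half as long
        levels.append((level, count, length))
    return levels


levels = tree_levels(3)
total_length = sum(count * length for _, count, length in levels)
for level, count, length in levels:
    print(f"iteration {level}: {count} segments of length {length}")
print("total length of all segments:", total_length)
```

Each pass through the loop applies the same rule to the result of the previous pass, so the program mirrors the iterative description in the solution.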
  • 9. Example 3 Tree Diagram

a) Illustrate the results of a best-of-five hockey playoff series between Ottawa and Toronto using a tree diagram.
b) How many different outcomes of the series are possible?

Solution

a) For each game, the tree diagram has two branches, one representing a win by Ottawa (O) and the other a win by Toronto (T). Each set of branches represents a new game in the playoff round. As soon as one team wins three games, the playoff round ends, so the branch representing that sequence also stops.

[Figure: tree diagram with O and T branches for the 1st, 2nd, 3rd, 4th, and 5th games]

b) By counting the endpoints of the branches, you can determine that there are 20 possible outcomes for this series.

Example 4 Recursive Formula

The recursive formula $t_n = 3t_{n-1} - t_{n-2}$ defines a sequence of numbers. Find the next five terms in the sequence given that the initial or seed values are $t_1 = 1$ and $t_2 = 3$.

Solution

$t_3 = 3t_2 - t_1 = 3(3) - 1 = 8$
$t_4 = 3t_3 - t_2 = 3(8) - 3 = 21$
$t_5 = 3t_4 - t_3 = 3(21) - 8 = 55$
$t_6 = 3t_5 - t_4 = 3(55) - 21 = 144$
$t_7 = 3t_6 - t_5 = 3(144) - 55 = 377$

The next five terms are 8, 21, 55, 144, and 377.
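The calculation in Example 4 is easy to automate, because each new term depends only on the two terms before it. The following Python sketch shows one possible approach; the function name `recursive_sequence` is an assumption made for illustration.

```python
def recursive_sequence(t1, t2, count):
    """Return the first `count` terms of t_n = 3*t_(n-1) - t_(n-2)."""
    terms = [t1, t2]                             # the seed values
    while len(terms) < count:
        # Each new term uses the two most recent terms.
        terms.append(3 * terms[-1] - terms[-2])
    return terms[:count]


print(recursive_sequence(1, 3, 7))  # [1, 3, 8, 21, 55, 144, 377]
```

Changing the seed values changes every later term, which is why a recursive formula is incomplete without them.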
  • 10. Key Concepts

• Iteration occurs in many natural and mathematical processes. Iterative processes can create fractals.
• A process that repeats itself can be illustrated using arrows and loops.
• A tree diagram can illustrate all the possible outcomes of a repeated process involving two or more choices at each step.
• For recursive functions, the first step is calculated using initial or seed values, then each successive term is calculated using the result of the preceding step.

Communicate Your Understanding

1. Describe how fractals have been used to model the fern leaf shown on the right.
2. Describe your daily routine as an iterative process.

Practise

A

1. Which of the following involve an iterative process?
   a) the cycle of a washing machine
   b) your reflections in two mirrors that face each other
   c) the placement of the dials on an automobile dashboard
   d) a chart of sunrise and sunset times
   e) substituting a value for the variable in a quadratic equation
   f) a tessellating pattern, such as paving bricks that fit together without gaps

2. The diagram below illustrates the carbon-oxygen cycle. Draw arrows to show the gains and losses of carbon dioxide.
   [Figure: The Carbon-Oxygen Cycle, showing the world atmospheric carbon dioxide supply, soil, animal, and plant respiration, photosynthesis, combustion, industrial activity, surface exchange between ocean and land, CO2 in seawater, plankton and marine animals, peat, coal, fossil fuels, petroleum, natural gas, molten rocks, sediments, organic sediments (hydrocarbons), and calcium carbonate in rock]
  • 11.

3. Draw a tree diagram representing the playoffs of eight players in a singles tennis tournament. The tree diagram should show the winner of each game continuing to the next round until a champion is decided.

Apply, Solve, Communicate

B

4. Draw a diagram to represent the food chain.

5. Communication Describe how the tracing of heartbeats on a cardiac monitor or electrocardiogram is iterative. Illustrate your description with a sketch.

6. In the first investigation, on page 6, you developed a sort algorithm in which new data were compared to the lowest ranked birthday until the latest birthday was found. Then, the second latest, third latest, and so on were found in the same manner.
   a) Write a sort algorithm in which this process is reversed so that the highest ranked item is found instead of the lowest.
   b) Write a sort algorithm in which you compare the first two data, then the second and third, then the third and fourth, and so on, interchanging the order of the data in any pair where the second item is ranked higher.

7. Application Sierpinski’s carpet is similar to Sierpinski’s triangle, except that it begins with a square. This square is divided into nine smaller squares and the middle one is shaded. Use paper and pencil or a drawing program to construct Sierpinski’s carpet to at least three stages. Predict what fraction of the original square will be shaded after n stages.

8. Application In 1904 the Swedish mathematician Helge von Koch (1870–1924) developed a fractal based on an equilateral triangle. Using either paper and pencil or a drawing program, such as The Geometer’s Sketchpad, draw a large equilateral triangle and trisect each side. Replace each middle segment with two segments the same length as the middle segment, forming an equilateral triangle with the base removed, as shown below.
   [Figure: an equilateral triangle after one trisection-and-replacement step]
   Repeat the process of trisection and replacement on each of the 12 smaller segments. If you are using a computer program, continue this process for at least two more iterations.
   a) How many segments are there after three iterations?
   b) How many segments are there after four iterations?
   c) What pattern can you use to predict the number of segments after n iterations?

9. The first two terms of a sequence are given as $t_1 = 2$ and $t_2 = 4$. The recursion formula is $t_n = (t_{n-1})^2 - 3t_{n-2}$. Determine the next four terms in the sequence.
  • 12.

10. Each of the following fractal trees has a different algorithm. Assume that each tree begins with a segment 1 unit long.
   a) Illustrate or describe the algorithm for each fractal tree.
      [Figure: fractal trees i), ii), and iii)]
   b) What is the total length of the branches in each tree?
   c) An interesting shape on a fractal tree is a spiral, which you can trace by tracing a branch to its extremity. Are all spirals within a tree self-similar?
   d) Write your own set of rules for a fractal tree. Draw the tree using paper and pencil or a drawing program.

11. Inquiry/Problem Solving Related to fractals is the mathematical study of chaos, in which no accurate prediction of an outcome can be made. A random walk can illustrate such “chaotic” outcomes.
   a) Select a starting point near the centre of a sheet of grid paper. Assign the numbers 1 to 4 to the directions north, south, east, or west in any order. Now, generate random whole numbers between 1 and 4 using a die, coin, or graphing calculator. Draw successive line segments one unit long in the directions corresponding to the random numbers until you reach an edge of the paper.
   b) How would a random walk be affected if it were self-avoiding, that is, not allowed to intersect itself? Repeat part a) using this extra rule.
   c) Design your own random walk with a different set of rules. After completing the walk, trade drawings with a classmate and see if you can deduce the rules for each other’s walk.

www.mcgrawhill.ca/links/MDM12
To learn more about chaos theory, visit the above web site and follow the links. Describe an aspect of chaos theory that interests you.

12. Use the given values for t1 to find the successive terms of the following recursive formulas. Continue until a pattern appears. Describe the pattern and make a prediction for the value of the nth term.
   a) tn = 2−tn−1; t1 = 0
   b) $t_n = \sqrt{t_{n-1}}$; $t_1 = 256$
   c) $t_n = \frac{1}{t_{n-1}}$; $t_1 = 2$
  • 13. ACHIEVEMENT CHECK

Knowledge/Understanding · Thinking/Inquiry/Problem Solving · Communication · Application

13. a) Given $t_1 = 1$, list the next five terms for the recursion formula $t_n = n \times t_{n-1}$.
    b) In this sequence, $t_k$ is a factorial number, often written as $k!$. Show that
       $t_k = k! = k(k - 1)(k - 2)\dots(2)(1)$.
    c) Explain what 8! means. Evaluate 8!
    d) Explain why factorial numbers can be considered an iterative process.
    e) Note that
       $(2^5)(5!) = (2 \times 2 \times 2 \times 2 \times 2)(5 \times 4 \times 3 \times 2 \times 1)$
       $= (2 \times 5)(2 \times 4)(2 \times 3)(2 \times 2)(2 \times 1)$
       $= 10 \times 8 \times 6 \times 4 \times 2$
       which is the product of the first five even positive integers. Write a formula for the product of the first n even positive integers. Explain why your formula is correct.
    f) Write $\frac{10!}{(2^5)(5!)}$ as a product of consecutive odd integers.
    g) Write a factorial formula for the product of
       i) the first six odd positive integers
       ii) the first ten odd positive integers
       iii) the first n odd positive integers

C

14. Inquiry/Problem Solving Recycling can be considered an iterative process. Research the recycling process for a material such as newspaper, aluminum, or glass and illustrate the process with an arrow diagram.

15. Inquiry/Problem Solving The infinite series $S = \cos\theta + \cos^2\theta + \cos^3\theta + \dots$ can be illustrated by drawing a circle centred at the origin, with radius of 1. Draw an angle θ and, on the x-axis, label the point (cos θ, 0) as P1. Draw a new circle, centred at P1, with radius of cos θ. Continue this iterative process. Predict the length of the line segment defined by the infinite series $S = \cos\theta + \cos^2\theta + \cos^3\theta + \dots$.

16. Communication Music can be written using fractal patterns. Look up this type of music in a library or on the Internet. What characteristics does fractal music have?

17. Computers use binary (base 2) code to represent numbers as a series of ones and zeros.

    Base 10   Binary
    0         0
    1         1
    2         10
    3         11
    4         100
    ⋮         ⋮

    a) Describe an algorithm for converting integers from base 10 to binary.
    b) Write each of the following numbers in binary.
       i) 16   ii) 21   iii) 37   iv) 130
    c) Convert the following binary numbers to base 10.
       i) 1010   ii) 100000   iii) 111010   iv) 111111111
  • 14. 1.2 Data Management Software

INVESTIGATE & INQUIRE: Software Tools

1. List every computer program you can think of that can be used to manage data.
2. Sort the programs into categories, such as word-processors and spreadsheets.
3. Indicate the types of data each category of software would be best suited to handle.
4. List the advantages and disadvantages of each category of software.
5. Decide which of the programs on your list would be best for storing and accessing the lists you have just made.

Most office and business software manages data of some kind. Schedulers and organizers manage lists of appointments and contacts. E-mail programs allow you to store, access, and sort your messages. Word-processors help you manage your documents and often have sort and outline functions for organizing data within a document. Although designed primarily for managing financial information, spreadsheets can perform calculations related to the management and analysis of a wide variety of data. Most of these programs can easily transfer data to other applications.

Database programs, such as Microsoft® Access and Corel® Paradox®, are powerful tools for handling large numbers of records. These programs produce relational databases, ones in which different sets of records can be linked and sorted in complex ways based on the data contained in the records. For example, many organizations use a relational database to generate a monthly mailing of reminder letters to people whose memberships are about to expire. However, these complex relational database programs are difficult to learn and can be frustrating to use until you are thoroughly familiar with how they work. Partly for this reason, there are thousands of simpler database programs designed for specific types of data, such as book indexes or family trees.

Of particular interest for this course are programs that can do statistical analysis of data.
Such programs range from modest but useful freeware to major data-analysis packages costing thousands of dollars. The more commonly used programs include MINITAB™, SAS, and SST (Statistical Software Tools).

  • 15. To demonstrate statistical software, some examples in this book have alternative solutions that use Fathom™, a statistical software package specifically designed for use in schools.

Data management programs can perform complex calculations and link, search, sort, and graph data. The examples in this section use a spreadsheet to illustrate these operations. A spreadsheet is software that arranges data in rows and columns. For basic spreadsheet instructions, please refer to the spreadsheet section of Appendix B. If you are not already familiar with spreadsheets, you may find it helpful to try each of the examples yourself before answering the Practise questions at the end of the section. The two most commonly used spreadsheets are Corel® Quattro® Pro and Microsoft® Excel.

Formulas and Functions

A formula entered in a spreadsheet cell can perform calculations based on values or formulas contained in other cells. Formulas retrieve data from other cells by using cell references to indicate the rows and columns where the data are located. In the formulas C2*0.05 and D5+E5, each reference is to an individual cell. In both Microsoft® Excel and Corel® Quattro® Pro, it is good practice to begin a formula with an equals sign. Although not always necessary, the equals sign ensures that a formula is calculated rather than being interpreted as text.

Built-in formulas are called functions. Many functions, such as the SUM function or MAX function, use references to a range of cells. In Corel® Quattro® Pro, precede a function with an @ symbol. For example, to find the total of the values in cells A2 through A6, you would enter

Corel® Quattro® Pro: @SUM(A2..A6)
Microsoft® Excel: SUM(A2:A6)

Similarly, to find the total for a block of cells from A2 through B6, enter

Corel® Quattro® Pro: @SUM(A2..B6)
Microsoft® Excel: SUM(A2:B6)

A list of formulas is available in the Insert menu by selecting Function…. You may select from a list of functions in categories such as Financial, Math & Trig, and Database.

Example 1 Using Formulas and Functions

The first three columns of the spreadsheet on the right list a student’s marks on tests and assignments for the first half of a course. Determine the percent mark for each test or assignment and calculate an overall midterm mark.
  • 16. Solution

In column D, enter formulas with cell referencing to find the percent for each individual mark. For example, in cell D2, you could use the formula B2/C2*100. Use the SUM function to find totals for columns B and C, and then convert to percent in cell D12 to find the midterm mark.

Relative and Absolute Cell References

Spreadsheets automatically adjust cell references whenever cells are copied, moved, or sorted. For example, if you copy a SUM function, used to calculate the sum of cells A3 to E3, from cell F3 to cell F4, the spreadsheet will change the cell references in the copy to A4 and E4. Thus, the value in cell F4 will be the sum of those in cells A4 to E4, rather than being the same as the value in F3. Because the cell references are relative to a location, this automatic adjustment is known as relative cell referencing.

If the formula references need to be kept exactly as written, use absolute cell referencing. Enter dollar signs before the row and column references, as in $A$3, to block automatic adjustment of the references.

Fill and Series Features

When a formula or function is to be copied to several adjoining cells, as for the percent calculations in Example 1, you can use the Fill feature instead of Copy. Click once on the cell to be copied, then click and drag across or down through the range of cells into which the formula is to be copied. To create a sequence of numbers, enter the first two values in adjoining cells, then select Edit/Fill/Series to continue the sequence.
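To see how relative and absolute references behave when a formula is copied, it can help to imitate the adjustment in a few lines of Python. The sketch below is a simplified model, not the way any particular spreadsheet is implemented. The names `copy_formula`, `col_to_num`, and `num_to_col` are assumptions, and the pattern would also misread function names that end in digits, such as LOG10.

```python
import re

# One cell reference: optional $, column letters, optional $, row digits.
CELL = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def col_to_num(col):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a column number back to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def copy_formula(formula, rows=0, cols=0):
    """Shift relative references by the given offset; leave $ parts fixed."""
    def shift(match):
        col_lock, col, row_lock, row = match.groups()
        if not col_lock:                      # relative column
            col = num_to_col(col_to_num(col) + cols)
        if not row_lock:                      # relative row
            row = str(int(row) + rows)
        return f"{col_lock}{col}{row_lock}{row}"
    return CELL.sub(shift, formula)


print(copy_formula("SUM(A3:E3)", rows=1))     # SUM(A4:E4)
print(copy_formula("B2/$C$12*100", rows=1))   # B3/$C$12*100
```

The first call reproduces the F3-to-F4 copy described above. The second shows a hypothetical percent formula in which the total in C12 stays fixed because both its column and its row are preceded by dollar signs.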
  • 17. Example 2 Using the Fill Feature

The relationship between Celsius and Fahrenheit temperatures is given by the formula Fahrenheit = 1.8 × Celsius + 32. Use a spreadsheet to produce a conversion table for temperatures from 1°C to 15°C.

Solution

Enter 1 into cell E2 and 2 into cell E3. Use the Fill feature to put the numbers 3 through 15 into cells E4 to E16. Enter the conversion formula E2*1.8+32 into cell F2. Then, use the Fill feature to copy the formula into cells F3 through F16. Note that the values in these cells show that the cell references in the formulas did change when copied. These changes are an example of relative cell referencing.

Charting

Another important feature of spreadsheets is the ability to display numerical data in the form of charts or graphs, thereby making the data easier to understand. The first step is to select the range of cells to be graphed. For non-adjoining fields, hold down the Ctrl key while highlighting the cells. Then, use the Chart feature to specify how you want the chart to appear. You can produce legends and a title for your graph as well as labels for the axes. Various two- and three-dimensional versions of bar, line, and circle graphs are available in the menus.

Example 3 Charting

The results and standings of a hockey league are listed in this spreadsheet. Produce a two-dimensional bar chart using the TEAM and POINTS columns.
  • 18. Solution

Holding down the Ctrl key, highlight cells A1 to A7 and then G1 to G7. Use the Chart feature and follow the on-screen instructions to customize your graph. You will see a version of the bar graph as shown here.

Sorting

Spreadsheets have the capability to sort data alphabetically, numerically, by date, and so on. The sort can use multiple criteria in sequence. Cell references will adjust to the new locations of the sorted data. To sort, select the range of cells to be sorted. Then, use the Sort feature.

Select the criteria under which the data are to be sorted. A sort may be made in ascending or descending order based on the data in any given column. A sort with multiple criteria can include a primary sort, a secondary sort within it, and a tertiary sort within the secondary sort.

Example 4 Sorting

Rank the hockey teams in Example 3, counting points first (in descending order), then wins (in descending order), and finally losses (in ascending order).

Solution

When you select the Sort feature, the pop-up window asks if there is a header row. Confirming that there is a header row excludes the column headings from the sort so that they are left in place. Next, set up a three-stage sort:
• a primary sort in descending order, using the points column
• then, a secondary sort in descending order, using the wins column
• finally, a tertiary sort in ascending order, using the losses column
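The three-stage sort in Example 4 can be imitated outside a spreadsheet. The Python sketch below uses made-up standings, since the actual league data appear only in the spreadsheet figure; the team names and numbers are assumptions for illustration.

```python
# Hypothetical standings: (team, wins, losses, points).
# These are NOT the data from Example 3.
standings = [
    ("Eagles", 7, 3, 16),
    ("Hawks", 8, 4, 16),
    ("Owls", 5, 6, 11),
    ("Falcons", 7, 2, 16),
]

# Primary: points, descending. Secondary: wins, descending.
# Tertiary: losses, ascending. Negating a number reverses its order.
ranked = sorted(standings, key=lambda row: (-row[3], -row[1], row[2]))

for team, wins, losses, points in ranked:
    print(team, points, wins, losses)
```

The secondary and tertiary criteria matter only when teams are tied on the earlier ones. Here three teams have 16 points, so wins and then losses decide their order.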
  • 19. Searching

To search for data in individual cells, select Find and Replace. Then, in the dialogue box, enter the data and the criteria under which you are searching. You have the option to search or to search and replace.

A filtered search allows you to search for rows containing the data for which you are searching. Arrows will appear at the top of each column containing data. Clicking on an arrow opens a pull-down menu where you can select the data you wish to find. The filter will then display only the rows containing these data. You can filter for a specific value or select custom… to use criteria such as greater than, begins with, and does not contain. To specify multiple criteria, click the And or Or options. You can set different filter criteria for each column.

Example 5 Filtered Search

In the hockey-league spreadsheet from Example 3, use a filtered search to list only those teams with fewer than 16 points.
  • 20. Solution

In Microsoft® Excel, select Data/Filter/Autofilter to begin the filter process. Click on the arrow in the POINTS column and select custom… In the dialogue window, select is less than and key in 16. In Corel® Quattro® Pro, you use Tools/Quickfilter/custom…. Now, the filter shows only the rows representing teams with fewer than 16 points.

Adding and Referencing Worksheets

To add worksheets within your spreadsheet file, click on one of the sheet tabs at the bottom of the data area. You can enter data onto the additional worksheet using any of the methods described above or you can copy and paste data from the first worksheet or from another file.

To reference data from cells in another worksheet, preface the cell reference with the worksheet number for the cells.

Such references allow data entered in sheet A or sheet 1 to be manipulated in another sheet without changing the values or order of the original data. Data edited in the original sheet will be automatically updated in the other sheets that refer to it. Any sort performed in the original sheet will carry through to any references in other sheets, but any other data in the secondary sheets will not be affected. Therefore, it is usually best to either reference all the data in the secondary sheets or to sort the data only in the secondary sheets.

Project Prep
The calculation, sorting, and charting capabilities of spreadsheets could be particularly useful for your tools for data management project.
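The relationship between an original sheet and a sheet that references it can be modelled in a few lines of Python. This is a rough analogy under stated assumptions, not a description of how Excel or Quattro Pro works internally: `sheet1` holds hypothetical goals-for (GF) and goals-against (GA) data, and the function `sheet2` stands in for a second sheet that looks up its values in sheet 1 every time it is read.

```python
# Hypothetical data for the original sheet.
sheet1 = [
    {"TEAM": "Eagles", "GF": 30, "GA": 25},
    {"TEAM": "Hawks", "GF": 41, "GA": 22},
    {"TEAM": "Owls", "GF": 27, "GA": 33},
]


def sheet2():
    """Rebuild the secondary sheet from sheet 1, ranked by goals for."""
    referenced = [
        {"TEAM": row["TEAM"], "GF": row["GF"], "GA": row["GA"]}
        for row in sheet1
    ]
    return sorted(referenced, key=lambda row: row["GF"], reverse=True)


before = [row["TEAM"] for row in sheet2()]
sheet1[2]["GF"] = 45                  # edit a value in the original sheet
after = [row["TEAM"] for row in sheet2()]
original_order = [row["TEAM"] for row in sheet1]

print(before)          # ranking on sheet 2 before the edit
print(after)           # ranking on sheet 2 after the edit
print(original_order)  # order of the teams on sheet 1
```

The edit to sheet 1 shows up in the second sheet the next time it is read, while sorting on the second sheet leaves the order of sheet 1 untouched.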
  • 21. Example 6 Sheet Referencing

Reference the goals for (GF) and goals against (GA) for the hockey teams in Example 3 on a separate sheet and rank the teams by their goals scored.

Solution

Sheet 2 needs only the data in the columns titled GF and GA in sheet 1. Notice that cell C2 contains a cell reference to sheet 1. This reference ensures the data in cell F2 of sheet 1 will carry through to cell C2 of sheet 2 even if the data in sheet 1 is edited. Although the referenced and sorted data on sheet 2 appear as shown, the order of the teams on sheet 1 is unchanged.

Key Concepts

• Thousands of computer programs are available for managing data. These programs range from general-purpose software, such as word-processors and spreadsheets, to highly specialized applications for specific types of data.
• A spreadsheet is a software application that is used to enter, display, and manipulate data in rows and columns. Spreadsheet formulas perform calculations based on values or formulas contained in other cells.
• Spreadsheets normally use relative cell referencing, which automatically adjusts cell references whenever cells are copied, moved, or sorted. Absolute cell referencing keeps formula references exactly as written.
• Spreadsheets can produce a wide variety of charts and perform sophisticated sorts and searches of data.
• You can add additional worksheets to a file and reference these sheets to cells in another sheet.

Communicate Your Understanding

1. Explain how you could use a word-processor as a data management tool.
2. Describe the advantages and drawbacks of relational database programs.
3. Explain what software you would choose if you wanted to determine whether there was a relationship between class size and subject in your school. Would you choose different software if you were going to look at class sizes in all the schools in Ontario?
4. Give an example of a situation requiring relative cell referencing and one requiring absolute cell referencing.
5. Briefly describe three advantages that spreadsheets have over hand-written tables for storing and manipulating data.
  • 22. Practise

A

1. Application Set up a spreadsheet page in which you have entered the following lists of data. For the appropriate functions, look under the Statistical category in the Function list.
   Student marks: 65, 88, 56, 76, 74, 99, 43, 56, 72, 81, 80, 30, 92
   Dentist appointment times in minutes: 45, 30, 40, 32, 60, 38, 41, 45, 40, 45
   a) Sort each set of data from smallest to greatest.
   b) Calculate the mean (average) value for each set of data.
   c) Determine the median (middle) value for each set of data.
   d) Determine the mode (most frequent) value for each set of data.

2. Using the formula features of the spreadsheet available in your school, write a formula for each of the following:
   a) the sum of the numbers stored in cells A1 to A9
   b) the largest number stored in cells F3 to K3
   c) the smallest number in the block from A1 to K4
   d) the sum of the cells A2, B5, C7, and D9
   e) the mean, median, and mode of the numbers stored in the cells F5 to M5
   f) the square root of the number in cell A3
   g) the cube of the number in cell B6
   h) the number in cell D2 rounded off to four decimal places
   i) the number of cells between cells D3 and M9 that contain data
   j) the product of the values in cells A1, B3, and C5 to C10
   k) the value of π

Apply, Solve, Communicate

B

3. Set up a spreadsheet page that converts angles in degrees to radians using the formula Radians = π × Degrees/180, for angles from 0° to 360° in steps of 5°. Use the series capabilities to build the data in the Degrees column. Use π as defined by the spreadsheet. Calculations should be rounded to the nearest hundredth.

4. The first set of data below represents the number of sales of three brands of CD players at two branches of Mad Dog Music. Enter the data into a spreadsheet using two rows and three columns.

   Branch    Brand A   Brand B   Brand C
   Store P   12        4         8
   Store Q   9         15        6

   The second set of data represents the prices for these CD players. Enter the data using one column into a second sheet of the same spreadsheet workbook.

   Brand   Price
   A       $102
   B       $89
   C       $145

   Set up a third sheet of the spreadsheet workbook to reference the first two sets of data and calculate the total revenue from CD player sales at each Mad Dog Music store.

5. Application In section 1.1, question 12, you predicted the value of the nth term of the recursion formulas listed below. Verify your predictions by using a spreadsheet to calculate the first ten terms for each formula.
   a) tn = 2−tn−1; t1 = 0
   b) $t_n = \sqrt{t_{n-1}}$; $t_1 = 256$
   c) $t_n = \frac{1}{t_{n-1}}$; $t_1 = 2$
  • 23.

6. a) Enter the data shown in the table below into a spreadsheet and set up a second sheet with relative cell references to the Name, Fat, and Fibre cells in the original sheet.

   Nutritional Content of 14 Breakfast Cereals (amounts in grams)

   Name             Protein   Fat   Sugars   Starch   Fibre   Other   TOTALS
   Alphabits        2.4       1.1   12.0     12.0      0.9    1.6
   Bran Flakes      4.4       1.2    6.3      4.7     11.0    2.4
   Cheerios         4.0       2.3    0.8     18.7      2.2    2.0
   Crispix          2.2       0.3    3.2     22.0      0.5    1.8
   Froot Loops      1.3       0.8   14.0     12.0      0.5    1.4
   Frosted Flakes   1.4       0.2   12.0     15.0      0.5    0.9
   Just Right       2.2       0.8    6.6     17.0      1.4    2.0
   Lucky Charms     2.1       1.0   13.0     11.0      1.4    1.5
   Nuts ’n Crunch   2.3       1.6    7.1     16.5      0.7    1.8
   Rice Krispies    2.1       0.4    2.9     22.0      0.3    2.3
   Shreddies        2.9       0.6    5.0     16.0      3.5    2.0
   Special K        5.1       0.4    2.5     20.0      0.4    1.6
   Sugar Crisp      2.0       0.7   14.0     11.0      1.1    1.2
   Trix             0.9       1.6   13.0     12.0      1.1    1.4
   AVERAGES
   MAXIMUM
   MINIMUM

   b) On the first sheet, calculate the values for the TOTALS column and AVERAGES row.
   c) Determine the maximum and minimum values in each column.
   d) Rank the cereals using fibre content in decreasing order as a primary criterion, protein content in decreasing order as a secondary criterion, and sugar content in increasing order as a tertiary criterion.
   e) Make three circle graphs or pie charts: one for the averages row in part b), one for the cereal at the top of the list in part d), and one for the cereal at the bottom of the list in part d).
   f) Perform a search in the second sheet to find the cereals containing less than 1 g of fat and more than 1.5 g of fibre. Make a three-dimensional bar graph of the results.

C

7. In section 1.1, question 10, you described the algorithm used to draw each fractal tree below. Assuming the initial segment is 4 cm in each tree, use a spreadsheet to determine the total length of a spiral in each tree, calculated to 12 iterative stages.
   [Figure: fractal trees a) and b)]

8. Communication Describe how to lock column and row headings in your spreadsheet software so that they remain visible when you scroll through a spreadsheet.

9. Inquiry/Problem Solving Outline a spreadsheet algorithm to calculate $n \times (n - 1) \times (n - 2) \times \dots \times 3 \times 2 \times 1$ for any natural number n without using the built-in factorial function.
TECHNOLOGY EXTENSION
Introduction to Fathom™

Fathom™ is a statistics software package that offers a variety of powerful data-analysis tools in an easy-to-use format. This section introduces the most basic features of Fathom™: entering, displaying, sorting, and filtering data. A complete guide is available on the Fathom™ CD. The real power of this software will be demonstrated in later chapters with examples that apply its sophisticated tools to statistical analysis and simulations.

Appendix B includes details on all the Fathom™ functions used in this text.

When you enter data into Fathom™, it creates a collection, an object that contains the data. Fathom™ can then use the data from the collection to produce other objects, such as a graph, table, or statistical test. These secondary objects display and analyse the data from the collection, but they do not actually contain the data themselves. If you delete a graph, table, or statistical test, the data still remain in the collection.

Fathom™ displays a collection as a rectangular window with gold balls in it. The gold balls of the collection represent the original or “raw” data. Each of the gold balls represents a case. Each case in a collection can have a number of attributes. For example, the cases in a collection of medical records could have attributes such as the patient’s name, age, sex, height, weight, blood pressure, and so on. There are two basic types of attributes, categorical (such as male/female) and continuous (such as height or mass). The case table feature displays the cases in a collection in a format similar to a spreadsheet, with a row for each case and a column for each attribute. You can add, modify, and delete cases using a case table.

Example 1 Tables and Graphs
a) Set up a collection for the hockey league standings from section 1.2, Example 3 on page 17.
b) Graph the Team and Points attributes.
Solution
a) To enter the data, start Fathom™ and drag the case table icon from the menu bar down onto the work area.
Click on the attribute <new>, type the heading Team, and press Enter. Fathom™ will automatically create a blank cell for data under the heading and start a new column to the right of the first. Enter the heading Wins at the top of the new column, and continue this process to enter the rest of the headings. You can type entries into the cells under the headings in much the same way as you would enter data into the cells of a spreadsheet.

Note that Fathom™ has stored your data as Collection 1, which will remain intact even if you delete the case table used to enter the data. To give the collection a more descriptive name, double-click on Collection 1 and type in HockeyStats.

b) Drag the graph icon onto the work area. Now, drag the Team attribute from the case table to the x-axis of the graph and the Points attribute to the y-axis of the graph.

[Figure: the resulting graph of Points versus Team]
Fathom™ can easily sort or filter data using the various attributes.

Example 2 Sorting and Filtering
a) Rank the hockey teams in Example 1 by points first, then by wins if two teams have the same number of points, and finally by losses if two teams have the same number of points and wins.
b) List only those teams with fewer than 16 points.
c) Set up a separate table showing only the goals for (GF) and goals against (GA) data for the teams and rank the teams by their goals scored.

Solution
a) To sort the data, right-click on the Points attribute and choose Sort Descending. Fathom™ will list the team with the most points first, with the others following in descending order by their point totals. To set the secondary sort, right-click on the Wins attribute and choose Sort Descending. Similarly, right-click on the Losses attribute and choose Sort Ascending for the final sort.

[Figure: the sorted case table]

b) To filter the data, from the Data menu, choose Add Filter. Click on the plus sign beside Attributes. Now, double-click on the Points attribute, choose the less-than button (<), and type 16. Click the Apply button and then OK.

[Figure: the filtered case table]
The filter is listed at the bottom as Points < 16.

c) Click on HockeyStats, and then drag a new table onto the work area. Click on the Wins attribute. From the Display menu, choose Hide Attribute. Use the same method to hide the Losses, Ties, and Points attributes. Right-click the GF attribute and use Sort Descending to rank the teams.

1. Enter the data from Example 1 into Fathom™. Use the built-in functions in Fathom™ to find the following.
a) the mean of goals against (GA)
b) the largest value of goals for (GF)
c) the smallest value of GF
d) the sum of GA
e) the sum of GA and GF for each case

For details on functions in Fathom™, see the Fathom™ section of Appendix B or consult the Fathom™ Help screen or manual.

2. a) Set up a new collection with the following student marks: 65, 88, 56, 76, 74, 99, 43, 56, 72, 81, 80, 30, 92
b) Sort the marks from lowest to highest.
c) Calculate the mean mark.
d) Determine the median (middle) mark.

3. Explain how you would create a graph of class size versus subjects in your school using Fathom™.

4. Briefly compare the advantages and disadvantages of using Fathom™ and spreadsheets for storing and manipulating data.

www.mcgrawhill.ca/links/MDM12
For more examples, data, and information on how to use Fathom™, visit the above web site and follow the links.
1.3 Databases

A database is an organized store of records. Databases may contain information about almost any subject: incomes, shopping habits, demographics, features of cars, and so on.

INVESTIGATE & INQUIRE: Databases in a Library
In your school or local public library, log on to the library catalogue.
1. Describe the types of fields under which a search can be conducted (e.g., subject).
2. Conduct a search for a specific topic of your choice.
3. Describe the results of your search. How is the information presented to the user?

INVESTIGATE & INQUIRE: The E-STAT Database
1. Connect to the Statistics Canada web site and go to the E-STAT database. Your school may have a direct link to this database. If not, you can follow the Web Connection links shown here. You may need to get a password from your teacher to log in.
2. Locate the database showing the educational attainment data for Canada by following these steps:
a) Click on Data.
b) Under the heading People, click on Education.
c) Click on Educational Attainment, and then under the heading Census databases, select Educational Attainment again.
d) Select Education, Mobility and Migration for the latest census.

www.mcgrawhill.ca/links/MDM12
To connect to E-STAT visit the above web site and follow the links.
3. Scroll down to the heading University, pop. 15 years and over by highest level of schooling, hold down the Ctrl key, and select all four subcategories under this heading. View the data in each of the following formats:
a) table
b) bar graph
c) map
4. Describe how the data are presented in each instance. What are the advantages and disadvantages of each format? Which format do you think is the most effective for displaying this data? Explain why.
5. Compare the data for the different provinces and territories. What conclusions could you draw from this data?

A database record is a set of data that is treated as a unit. A record is usually divided into fields that are reserved for specific types of information. For example, the record for each person in a telephone book has four fields: last name, first name or initial, address, and telephone number. This database is sorted in alphabetical order using the data in the first two fields. You search this database by finding the page with the initial letters of a person’s name and then simply reading down the list.

A music store will likely keep its inventory records on a computerized database. The record for each different CD could have fields for information, such as title, artist, publisher, music type, price, number in stock, and a product code (for example, the bar code number). The computer can search such databases for matches in any of the data fields. The staff of the music store would be able to quickly check if a particular CD was in stock and tell the customer the price and whether the store had any other CDs by the same artist.

Databases in a Library
A library catalogue is a database. In the past, library databases were accessed through a card catalogue. Most libraries are now computerized, with books listed by title, author, publisher, subject, a Dewey Decimal or Library of Congress catalogue number, and an international standard book number (ISBN). Records can be sorted and searched using the information in any of the fields.

Such catalogues are examples of a well-organized database because they are easy to access using keywords and searches in multiple fields, many of which are cross-referenced. Often, school libraries are linked to other libraries. Students have access to a variety of print and online databases in the library. One powerful online database is Electric Library Canada, a database of books, newspapers, magazines, and television and radio transcripts. Your school probably has access to it or a similar library database. Your local public library may also have online access to Electric Library Canada.

Project Prep
Skills in researching library and on-line databases will help you find the information needed for your tools for data management project.
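The music store example above can be sketched in a few lines of Python. This is only an illustration of records, fields, searching, and sorting; the class name CDRecord, the helper functions, and every title, artist, and price are made up for the example and do not come from any real inventory system.

```python
from dataclasses import dataclass


@dataclass
class CDRecord:
    """One record; each attribute below is a field (hypothetical layout)."""
    title: str
    artist: str
    music_type: str
    price: float
    in_stock: int
    product_code: str


# Invented sample data, used only to demonstrate searching and sorting.
inventory = [
    CDRecord("Northern Lights", "The Example Band", "rock", 18.99, 3, "0001"),
    CDRecord("Quiet Rivers", "A. Sample", "jazz", 21.50, 0, "0002"),
    CDRecord("Southern Skies", "The Example Band", "rock", 16.99, 5, "0003"),
]


def search(records, field, value):
    """Return every record whose named field matches the value exactly."""
    return [r for r in records if getattr(r, field) == value]


def sort_by(records, field):
    """Return the records in ascending order of the named field."""
    return sorted(records, key=lambda r: getattr(r, field))


# Other CDs by the same artist, as the store staff might look them up.
for record in search(inventory, "artist", "The Example Band"):
    print(record.title, record.price, record.in_stock)
```

Because search and sort_by take the field name as an argument, the same two functions work for any field, which mirrors the idea that a computerized database can be searched or sorted on any of its fields.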
Statistics Canada
Statistics Canada is the federal government department responsible for collecting, summarizing, analysing, and storing data relevant to Canadian demographics, education, health, and so on. Statistics Canada maintains a number of large databases using data collected from a variety of sources including its own research and a nation-wide census. One such database is CANSIM II (the updated version of the Canadian Socio-economic Information Management System), which profiles the Canadian people, economy, and industries. Although Statistics Canada charges a fee for access to some of its data, a variety of CANSIM II data is available to the public for free on Statistics Canada’s web site.

Statistics Canada also has a free educational database, called E-STAT. It gives access to many of Statistics Canada’s extensive, well-organized databases, including CANSIM II. E-STAT can display data in a variety of formats and allows students to download data into a spreadsheet or statistical software program.

Data in Action
By law, Statistics Canada is required to conduct a census of Canada’s population and agriculture every five years. For the 2001 census, Statistics Canada needed about 37 000 people to distribute the questionnaires. Entering the data from the approximately 13.2 million questionnaires will take about 5 billion keystrokes.
Key Concepts
• A database is an organized store of records. A well-organized database can be easily accessed through searches in multiple fields that are cross-referenced.
• Although most databases are computerized, many are available in print form.

Communicate Your Understanding
1. For a typical textbook, describe how the table of contents and the index are sorted. Why are they sorted differently?
2. Describe the steps you need to take in order to access the 1860−61 census results through E-STAT.

Practise
A
1. Which of the following would be considered databases? Explain your reasoning.
a) a dictionary
b) stock-market listings
c) a catalogue of automobile specifications and prices
d) credit card records of customers’ spending habits
e) an essay on Shakespeare’s Macbeth
f) a teacher’s mark book
g) the Guinness World Records book
h) a list of books on your bookshelf

Apply, Solve, Communicate
B
2. Describe each field you would include in a database of
a) a person’s CD collection
b) a computer store’s software inventory
c) a school’s textbook inventory
d) the backgrounds of the students in a school
e) a business’s employee records

3. a) Describe how you would locate a database showing the ethnic makeup of your municipality. List several possible sources.
b) If you have Internet access, log onto E-STAT and go to the data on ethnic origins of people in large urban centres:
i) Select Data on the Table of Contents page.
ii) Select Population and Demography.
iii) Under Census, select Ethnic Origin.
iv) Select Ethnic Origin and Visible Minorities for the latest census in large urban centres.
v) Enter a postal code for an urban area and select two or more ethnic origins while holding down the Ctrl key.
vi) View table, bar graph, and map in turn and describe how the data are presented in each instance.
c) Compare these results with the data you get if you leave the postal code section line blank. What conclusions could you draw from the two sets of data?
4. Application
a) Describe how you could find data to compare employment for males and females. List several possible sources.
b) If you have Internet access, log onto E-STAT and go to the data on employment and work activity:
i) Under the People heading, select Labour.
ii) Under the Census databases heading, select Salaries and Wages.
iii) Select Sources of Income (Latest census, Provinces, Census Divisions, Municipalities).
iv) While holding down the Ctrl key, click on All persons with employment income by work activity, Males with employment income by work activity, and Females with employment income by work activity.
v) Download this data as a spreadsheet file. Record the path and file name for the downloaded data.
c) Open the data file with a spreadsheet. You may have to convert the format to match your spreadsheet software. Use your spreadsheet to
i) calculate the percentage difference between male and female employment
ii) display all fields as a bar graph

5. Communication Go to the reference area of your school or local library and find a published database in print form.
a) Briefly describe how the database is organized.
b) Describe how to search the database.
c) Make a list of five books that are set up as databases. Explain why they would be considered databases.

6. Application The Internet is a link between many databases. Search engines, such as Yahoo Canada, Lycos, Google, and Canoe, are large databases of web sites. Each search engine organizes its database differently.
a) Use three different search engines to conduct a search using the keyword automobile. Describe how each search engine presents its data.
b) Compare the results of searches with three different search engines using the following keywords:
i) computer monitors
ii) computer+monitors
iii) computer or monitors
iv) “computer monitors”

7. Chapter Problem Use the Internet to check whether the map of VIA Rail routes at the start of this chapter is up-to-date. Are there still no trains that go from Montréal or Kingston right through to Windsor?

8. Communication Log on to the Electric Library Canada web site or a similar database available in your school library. Enter your school’s username and password. Perform a search for magazine articles, newspaper articles, and radio transcripts about the “brain drain” or another issue of interest to you. Describe the results of your search. How many articles are listed? How are the articles described? What other information is provided?
1.4 Simulations

A simulation is an experiment, model, or activity that imitates real or hypothetical conditions. The newspaper article shown here describes how astrophysicists used computers to simulate a collision between Earth and a planet the size of Mars, an event that would be impossible to measure directly. The simulation showed that such a collision could have caused both the formation of the moon and the rotation of Earth, strengthening an astronomical theory put forward in the 1970s.

INVESTIGATE & INQUIRE: Simulations
For each of the following, describe what is being simulated, the advantages of using a simulation, and any drawbacks.
a) crash test dummies
b) aircraft simulators
c) wind tunnels
d) zero-gravity simulator
e) 3-D movies
f) paint-ball games
g) movie stunt actors
h) grow lights
i) architectural scale models

In some situations, especially those with many variables, it can be difficult to calculate an exact value for a quantity. In such cases, simulations often can provide a good estimate. Simulations can also help verify theoretical calculations.

Example 1 Simulating a Multiple-Choice Test
When writing a multiple-choice test, you may have wondered “What are my chances of passing just by guessing?” Suppose that you make random guesses on a test with 20 questions, each having a choice of 5 answers. Intuitively, you would assume that your mark will be somewhere around 4 out of 20 since there is a 1 in 5 chance of guessing right on each question. However, it is possible that you could get any number of the questions right, anywhere from zero to a perfect score.
a) Devise a simulation for making guesses on the multiple-choice test.
b) Run the simulation 100 times and use the results to estimate the mark you are likely to get, on average.
c) Would it be practical to run your simulation 1000 times or more?
Solution 1 Using Pencil and Paper
a) Select any five cards from a deck of cards. Designate one of these cards to represent guessing the correct answer on a question. Shuffle the five cards and choose one at random. If it is the designated card, then you got the first question right. If one of the other four cards is chosen, then you got the question wrong. Put the chosen card back with the others and repeat the process 19 times to simulate answering the rest of the questions on the test. Keep track of the number of right answers you obtained.

b) You could run 100 simulations by repeating the process in part a) over and over. However, you would have to choose a card 2000 times, which would be quite tedious. Instead, form a group with some of your classmates and pool your results, so that each student has to run only 10 to 20 repetitions of the simulation.

Make a table of the scores on the 100 simulated tests and calculate the mean score. You will usually find that this average is fairly close to the intuitive estimate of a score around 4 out of 20. However, a mean does not tell the whole story. Tally up the number of times each score appears in your table. Now, construct a bar graph showing the frequency for each score. Your graph will look something like the one shown.

[Figure: bar graph of frequency (0 to 25) versus score (0 to 20)]

This graph gives a much more detailed picture of the results you could expect. Although 4 is the most likely score, there is also a good chance of getting 2, 3, 5, or 6, but the chance of guessing all 20 questions correctly is quite small.

c) Running the simulation 1000 times would require shuffling the five cards and picking one 20 000 times, which is possible but not practical.
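The card-drawing procedure in Solution 1 can also be sketched in a general-purpose language. The Python sketch below is not one of the textbook's methods; it simply imitates the same steps, with the function names simulate_test and run_simulations chosen for this illustration. Drawing the integer 1 stands in for drawing the designated card.

```python
import random
from collections import Counter


def simulate_test(num_questions=20, num_choices=5, rng=random):
    """Simulate one test answered entirely by guessing; return the score."""
    score = 0
    for _ in range(num_questions):
        # Drawing 1 plays the role of the designated "correct" card.
        if rng.randint(1, num_choices) == 1:
            score += 1
    return score


def run_simulations(num_tests=100, seed=None):
    """Run several simulated tests and return the list of scores."""
    rng = random.Random(seed)  # a fixed seed makes a run repeatable
    return [simulate_test(rng=rng) for _ in range(num_tests)]


scores = run_simulations(num_tests=100, seed=1)
mean_score = sum(scores) / len(scores)
frequency = Counter(scores)  # tally of how often each score appears

print(f"mean score: {mean_score:.2f} out of 20")
for score in range(21):
    # A text bar graph: one '#' per simulated test with this score.
    print(f"{score:2d} {'#' * frequency[score]}")
```

Changing num_tests to 1000 or more takes no extra effort here, which is the practical difference from the pencil-and-paper method in part c).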
Solution 2 Using a Graphing Calculator
a) You can use random numbers as the basis for a simulation. If you generate random integers from 1 to 5, you can have 1 correspond to a correct guess and 2 through 5 correspond to wrong answers.

See Appendix B for more details on how to use the graphing calculator and software functions in Solutions 2 to 4.

Use the STAT EDIT menu to view lists L1 and L2. Make sure both lists are empty. Scroll to the top of L1 and enter the randInt function from the MATH PRB menu. This function produces random integers. Enter 1 for the lower limit, 5 for the upper limit, and 20 for the number of trials. L1 will now contain 20 random integers between 1 and 5. Next, sort the list with the SortA function on the LIST OPS menu. Press 2nd 1 to enter L1 into the sort function. When you return to L1, the numbers in it will appear in ascending order. Now, you can easily scroll down the list to determine how many correct answers there were in this simulation.

b) The simplest way to simulate 100 tests is to repeat the procedure in part a) and keep track of the results by entering the number of correct answers in L2. Again, you may want to pool results with your classmates to reduce the number of times you have to enter the same formula over and over. If you know how to program your calculator, you can set it to re-enter the formulas for you automatically. However, unless you are experienced in programming the calculator, it will probably be faster for you to just re-key the formulas.

Once you have the scores from 100 simulations in L2, calculate the average using the mean function on the LIST MATH menu. To see which scores occur most frequently, plot L2 using STAT PLOT.
i) Turn off all plots except Plot1.
ii) For Type, choose the bar-graph icon and enter L2 for Xlist. Freq should be 1, the default value.
iii) Use ZOOM/ZoomStat to set the window for the data. Press WINDOW to check the window settings. Set Xscl to 1 so that the bars correspond to integers.
iv) Press GRAPH to display the bar graph.

c) It is possible to program the calculator to run a large number of simulations automatically. However, the maximum list length on the TI-83 Plus is 999, so you would have to use at least two lists to run the simulation 1000 times or more.
Solution 3 Using a Spreadsheet
a) Spreadsheets have built-in functions that you can use to generate and count the random numbers for the simulation.

The RAND() function produces a random real number that is equal to or greater than zero and less than one. The INT function rounds a real number down to the nearest integer. Combine these functions to generate a random integer between 1 and 5.

Enter the formula INT(RAND( )*5)+1 or RANDBETWEEN(1,5) in A1 and copy it down to A20. Next, use the COUNTIF function to count the number of 1s in column A. Record this score in cell A22.

In Microsoft® Excel, you can use RANDBETWEEN only if you have the Analysis Toolpak installed.

b) To run 100 simulations, copy A1:A22 into columns B through CV using the Fill feature. Then, use the average function to find the mean score for the 100 simulated tests. Record this average in cell B23.

Next, use the COUNTIF function to find the number of times each possible score occurs in cells A22 to CV22. Enter the headings SUMMARY, Score, and Frequency in cells A25, A26, and A27, respectively. Then, enter 0 in cell B26 and highlight cells B26 through V26. Use the Fill feature to enter the integers 0 through 20 in cells B26 through V26. In B27, enter the formula for the number of zero scores; in C27, the number of 1s; in D27, the number of 2s; and so on, finishing with V27 having the number of perfect scores. Note that by using absolute cell referencing you can simply copy the COUNTIF function from B27 to the other 20 cells.

Finally, use the Chart feature to plot frequency versus score. Highlight cells A26 through V27, then select Insert/Chart/XY.

c) The method in part b) can easily handle 1000 simulations or more.

Solution 4 Using Fathom™
a) Fathom™ also has built-in functions to generate random numbers and count the scores in the simulations.

Launch Fathom™ and open a new document if necessary. Drag a new collection box to the document and rename it MCTest. Right-click on the box and create 20 new cases.

Drag a case table to the work area. You should see your 20 cases listed. Expand the table if you cannot see them all on the screen.

Rename the <new> column Guess. Right-click on Guess and select Edit Formula, Expand Functions, then Random Numbers. Enter 1,5 into the randomInteger() function and click OK to fill the Guess column with random integers between 1 and 5. Scroll down the column to see how many correct guesses there are in this simulation.
b) You can run a new simulation just by pressing Ctrl-Y, which will fill the Guess column with a new set of random numbers. Better still, you can set Fathom™ to repeat the simulation 100 times automatically and keep track of the number of correct guesses.

First, set up the count function. Right-click on the collection box and select Inspect Collection. Select the Measures tab and rename the <new> column Score. Then, right-click the column below Formula and select Edit Formula, Functions, Statistical, then One Attribute. Select count, enter Guess = 1 between the brackets, and click OK to count the number of correct guesses in your case table.

Click on the MCTest collection box. Now, select Analyse, Collect Measures from the main menu bar, which creates a new collection box called Measures from MCTest. Click on this box and drag a new case table to the document. Fathom™ will automatically run five simulations of the multiple-choice test and show the results in this case table.

To simulate 100 tests, right-click on the Measures from MCTest collection box and select Inspect Collection. Turn off the animation in order to speed up the simulation. Change the number of measures to 100. Then, click on the Collect More Measures button. You should now have 100 measures in the case table for Measures from MCTest.

Next, use the mean function to find the average score for these simulations. Go back to the Inspect Measures from MCTest collection box and change the column heading <new> to Average. Right-click on Formula and select Edit Formula, Functions, Statistical, then One Attribute. Select mean, enter Score between the brackets, and select OK to display the mean mark on the 100 tests.

Finally, plot a histogram of the scores from the simulations. Drag the graph icon onto the work area. Then, drag the Score column from the Measures from MCTest case table to the horizontal axis of the graph. Fathom™ then automatically produces a dot plot of your data. To display a histogram instead, simply click the menu in the upper right-hand corner of the graph and choose Histogram.
c) Fathom™ can easily run this simulation 1000 times or more. All you have to do is change the number of measures.

Key Concepts
• Simulations can be useful tools for estimating quantities that are difficult to calculate and for verifying theoretical calculations.
• A variety of simulation methods are available, ranging from simple manual models to advanced technology that makes large-scale simulations feasible.

Communicate Your Understanding
1. Make a table summarizing the pros and cons of the four simulation methods used in Example 1.
2. A manufacturer of electric motors has a failure rate of 0.2% on one of its products. A quality-control inspector needs to know the range of the number of failures likely to occur in a batch of 1000 of these motors. Which tool would you use to simulate this situation? Give reasons for your choice.
Practise
A
1. Write a graphing calculator formula for
a) generating 100 random integers between 1 and 25
b) generating 24 random integers between −20 and 20

2. Write a spreadsheet formula for
a) generating 100 random numbers between 1 and 25
b) generating 100 random integers between 1 and 25
c) generating 16 random integers between −40 and 40
d) counting the number of entries that equal 42.5 in the range C10 to V40

Apply, Solve, Communicate
B
3. Communication Identify two simulations you use in everyday life and list the advantages of using each simulation.

4. Describe three other manual methods you could use to simulate the multiple-choice test in Example 1.

5. Communication
a) Describe a calculation or mechanical process you could use to produce random integers.
b) Could you use a telephone book to generate random numbers? Explain why or why not.

6. Application A brother and sister each tell the truth two thirds of the time. The brother stated that he owned the car he was driving. The sister said he was telling the truth. Develop a simulation to show whether you should believe them.

7. Inquiry/Problem Solving Consider a random walk in which a coin toss determines the direction of each step. On the odd-numbered tosses, walk one step north for heads and one step south for tails. On even-numbered tosses, walk one step east for heads and one step west for tails.
a) Beginning at position (0, 0) on a Cartesian graph, simulate this random walk for 100 steps. Note the coordinates where you finish.
b) Repeat your simulation 10 times and record the results.
c) Use these results to formulate a hypothesis about the endpoints for this random walk.
d) Change the rules of the random walk and investigate the effect on the end points.

ACHIEVEMENT CHECK
Knowledge/Understanding, Thinking/Inquiry/Problem Solving, Communication, Application
8. a) Use technology to simulate rolling two dice 100 times and record the sum of the two dice each time. Make a histogram of the sums.
b) Which sum occurs most often? Explain why this sum is likely to occur more often than the other sums.
c) Which sum or sums occur least often? Explain this result.
d) Suppose three dice are rolled 100 times and the sums are recorded. What sums would you expect to be the most frequent and least frequent? Give reasons for your answers.

C
9. Communication Describe a quantity that would be difficult to calculate or to measure in real life. Outline a simulation procedure you could use to determine this quantity.
1.5 Graph Theory

Graph theory is a branch of mathematics in which graphs or networks are used to solve problems in many fields. Graph theory has many applications, such as
• setting examination timetables
• colouring maps
• modelling chemical compounds
• designing circuit boards
• building computer, communication, or transportation networks
• determining optimal paths

In graph theory, a graph is unlike the traditional Cartesian graph used for graphing functions and relations. A graph (also known as a network) is a collection of line segments and nodes. Mathematicians usually call the nodes vertices and the line segments edges. Networks can illustrate the relationships among a great variety of objects or sets.

This network is an illustration of the subway system in Toronto. In order to show the connections between subway stations, this map is not to scale. In fact, networks are rarely drawn to scale.

INVESTIGATE & INQUIRE: Map Colouring
In each of the following diagrams the lines represent borders between countries. Countries joined by a line segment are considered neighbours, but countries joining at only a single point are not.
1. Determine the smallest number of colours needed for each map such that all neighbouring countries have different colours.
[Figure: maps a) and b)]
[Figure: maps c), d), and e)]
2. Make a conjecture regarding the maximum number of colours needed to colour a map. Why do you think your conjecture is correct?

Although the above activity is based on maps, it is very mathematical. It is about solving problems involving connectivity. Each country could be represented as a node or vertex. Each border could be represented by a segment or edge.

Example 1 Representing Maps With Networks
Represent each of the following maps with a network.
[Figure: map a) with countries A, B, C, and D; map b) with countries A, B, C, D, E, and F]

Solution
a) Let A, B, C, and D be vertices representing countries A, B, C, and D, respectively. A shares a border with both B and D but not with C, so A should be connected by edges to B and D only. Similarly, B is connected to only A and C; C, to only B and D; and D, to only A and C.
[Figure: network for map a)]

b) Let A, B, C, D, E, and F be vertices representing countries A, B, C, D, E, and F, respectively. Note that the positions of the vertices are not important, but their interconnections are. A shares borders with B, C, and F, but not with D or E. Connect A with edges to B, C, and F only. Use the same process to draw the rest of the edges.
[Figure: network for map b)]
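One way to record the network from part a) of Example 1 is as a list of borders (edges) from which the neighbours of each country (vertex) are collected. The Python sketch below is only an illustration; the function name build_network and the dictionary-of-sets layout are assumptions made for this example, not notation from the text.

```python
def build_network(borders):
    """Build a mapping from each vertex to the set of its neighbours."""
    network = {}
    for first, second in borders:
        # Each border is an edge, so it is recorded at both of its ends.
        network.setdefault(first, set()).add(second)
        network.setdefault(second, set()).add(first)
    return network


# The four borders of map a) as described in the solution:
# A with B and D, B with A and C, C with B and D, D with A and C.
borders_a = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
network_a = build_network(borders_a)

for country in sorted(network_a):
    print(country, "shares a border with", sorted(network_a[country]))
```

As the solution notes, only the interconnections matter: nothing in this representation records where a vertex is drawn.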
  • 43. As components of networks, edges could represent connections such as roads, wires, pipes, or air lanes, while vertices could represent cities, switches, airports, computers, or pumping stations. The networks could be used to carry vehicles, power, messages, fluid, airplanes, and so on.

If two vertices are connected by an edge, they are considered to be adjacent. In the network on the right, A and B are adjacent, as are B and C. A and C are not adjacent.

[Figure: network with vertices A, B, and C, including a loop at C]

The number of edges that begin or end at a vertex is called the degree of the vertex. In the network, A has degree 1, B has degree 2, and C has degree 3. The loop counts as both an edge beginning at C and an edge ending at C.

Any connected sequence of vertices is called a path. If the path begins and ends at the same vertex, the path is called a circuit. A circuit is independent of the starting point. Instead, the circuit depends on the route taken.

Example 2 Circuits
Determine if each path is a circuit.

[Figure: three networks, a), b), and c), each with vertices A, B, C, and D]

Solution
a) Path: BC to CD to DA. Since this path begins at B and ends at A, it is not a circuit.
b) Path: BC to CD to DA to AB. This path begins at B and ends at B, so it is a circuit.
c) Path: CA to AB to BC to CD to DA. Since this path begins at C and ends at A, it is not a circuit.

A network is connected if and only if there is at least one path connecting each pair of vertices. A complete network is a network with an edge between every pair of vertices.

[Figure: three networks with the following captions]
Connected but not complete: Not all vertices are joined directly.
Connected and complete: All vertices are joined to each other by edges.
Neither connected nor complete: Not all vertices are joined.
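The definitions on this page can be checked mechanically. The following is a minimal Python sketch, not part of the original text: it assumes a network is stored as a list of edges (pairs of vertex labels), and the function names are hypothetical. The sample edge list is the small A, B, C network described above, with the loop at C written as the pair ("C", "C").

```python
from collections import Counter

# Assumed encoding of the small network above: A-B, B-C, and a loop at C.
edges = [("A", "B"), ("B", "C"), ("C", "C")]

def degrees(edge_list):
    """Count the edge ends at each vertex; a loop adds 2 to its vertex."""
    count = Counter()
    for u, v in edge_list:
        count[u] += 1
        count[v] += 1
    return dict(count)

def are_adjacent(edge_list, u, v):
    """Two distinct vertices are adjacent if some edge joins them."""
    return any({a, b} == {u, v} for a, b in edge_list)

def is_circuit(path):
    """A path (list of vertices) is a circuit if it begins and ends at the same vertex."""
    return len(path) > 1 and path[0] == path[-1]

def is_connected(vertices, edge_list):
    """True if every vertex can be reached from the first one (vertices must be non-empty)."""
    vertices = list(vertices)
    reached, frontier = {vertices[0]}, [vertices[0]]
    while frontier:
        current = frontier.pop()
        for a, b in edge_list:
            if current in (a, b):
                other = b if a == current else a
                if other not in reached:
                    reached.add(other)
                    frontier.append(other)
    return reached == set(vertices)

def is_complete(vertices, edge_list):
    """True if every pair of distinct vertices is joined by an edge."""
    vs = list(vertices)
    return all(are_adjacent(edge_list, u, v)
               for i, u in enumerate(vs) for v in vs[i + 1:])

print(degrees(edges))                              # {'A': 1, 'B': 2, 'C': 3}
print(are_adjacent(edges, "A", "C"))               # False
print(is_circuit(["B", "C", "D", "A", "B"]))       # True, as in Example 2 b)
print(is_connected("ABC", edges), is_complete("ABC", edges))  # True False
```

The sample network is connected but not complete, because no edge joins A directly to C.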
  • 44. In a traceable network all the vertices are connected to at least one other vertex and all the edges can be travelled exactly once in a continuous path.

[Figure: two networks, one with vertices A, B, C, and D, the other with vertices P, Q, R, and S]
Traceable: All vertices are connected to at least one other vertex, and the path from A to B to C to D to A to C includes all the edges without repeating any of them.
Non-traceable: No continuous path can travel all the edges only once.

Example 3 The Seven Bridges of Koenigsberg
The eighteenth-century German town of Koenigsberg (now the Russian city of Kaliningrad) was situated on two islands and the banks of the Pregel River. Koenigsberg had seven bridges as shown in the map. People of the town believed, but could not prove, that it was impossible to tour the town, crossing each bridge exactly once, regardless of where the tour started or finished. Were they right?

Solution
Reduce the map to a simple network of vertices and edges. Let vertices A and C represent the mainland, with B and D representing the islands. Each edge represents a bridge joining two parts of the town.

[Figure: network with vertices A, B, C, and D]

If, for example, you begin at vertex D, you will leave and eventually return but, because D has a degree of 3, you will have to leave again.

[Figure: the two cases at vertex D, beginning at D and beginning elsewhere]

Conversely, if you begin elsewhere, you will pass through vertex D at some point, entering by one edge and leaving by another. But, because D has degree 3, you must return in order to trace the third edge and, therefore, must end at D.

  • 45. So, your path must either begin or end at vertex D. Because all the vertices are of odd degree, the same argument applies to all the other vertices. Since you cannot begin or end at more than two vertices, the network is non-traceable. Therefore, it is indeed impossible to traverse all the town’s bridges without crossing one twice.

Leonhard Euler developed this proof of Example 3 in 1735. He laid the foundations for the branch of mathematics now called graph theory. Among other discoveries, Euler found the following general conditions about the traceability of networks.
• A connected network is traceable if it has only vertices of even degree (even vertices) or exactly two vertices of odd degree (odd vertices).
• If the network has two vertices of odd degree, the tracing path must begin at one vertex of odd degree and end at the other vertex of odd degree.

Example 4 Traceability and Degree
For each of the following networks,
a) list the number of vertices with odd degree and with even degree
b) determine if the network is traceable

[Figure: four networks, i) to iv)]

Solution
Network   a) Even vertices   a) Odd vertices   b) Traceable?
i)        3                  0                 traceable
ii)       0                  4                 non-traceable
iii)      3                  2                 traceable
iv)       1                  4                 non-traceable

If it is possible for a network to be drawn on a two-dimensional surface so that the edges do not cross anywhere except at vertices, it is planar.

Example 5 Planar Networks
Determine whether each of the following networks is planar.

[Figure: five networks, a) to e)]
  • 46. Solution
a) Planar
b) Planar
c) Planar
d) The network can be redrawn so that no edges cross. [Figure: redrawn network] Therefore, the network is planar.
e) The network cannot be redrawn as a planar network. [Figure: attempted redrawing] Therefore, the network is non-planar.

Example 6 Map Colouring (The Four-Colour Problem)
A graphic designer is working on a logo representing the different tourist regions in Ontario. What is the minimum number of colours required for the design shown on the right to have all adjacent areas coloured differently?

[Figure: logo with regions A, B, C, D, and E]

Solution
Because the logo is two-dimensional, you can redraw it as a planar network as shown on the right. This network diagram can help you see the relationships between the regions. The vertices represent the regions and the edges show which regions are adjacent.

[Figure: planar network with vertices A, B, C, D, and E]

Vertices A and E both connect to the three other vertices but not to each other. Therefore, A and E can have the same colour, but it must be different from the colours for B, C, and D. Vertices B, C, and D all connect to each other, so they require three different colours. Thus, a minimum of four colours is necessary for the logo.
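The reasoning in Example 6 can be confirmed by trying every possible colouring. The sketch below is an illustration only, not the textbook's method: it assumes the logo network has exactly the edges described in the solution (A and E each joined to B, C, and D; B, C, and D joined to each other), and the function name min_colours is hypothetical. Brute force is practical here only because the network is tiny.

```python
from itertools import product

# Edges of the logo network as described in the solution to Example 6.
logo_vertices = ["A", "B", "C", "D", "E"]
logo_edges = [("A", "B"), ("A", "C"), ("A", "D"),
              ("E", "B"), ("E", "C"), ("E", "D"),
              ("B", "C"), ("B", "D"), ("C", "D")]

def min_colours(vertices, edge_list):
    """Return (k, colouring) for the smallest k that colours the network properly.

    Tries k = 1, 2, 3, ... colours and tests every assignment, so it is
    only suitable for very small networks.
    """
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            colour = dict(zip(vertices, assignment))
            if all(colour[u] != colour[v] for u, v in edge_list):
                return k, colour
    return None  # reached only for an empty network or one containing a loop

k, colour = min_colours(logo_vertices, logo_edges)
print(k)                            # 4
print(colour["A"] == colour["E"])   # True: A and E share a colour
```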
  • 47. This example is a specific case of a famous problem in graph theory called the four-colour problem. As you probably conjectured in the investigation at the start of this section, the maximum number of colours required in any planar map is four. This fact had been suspected for more than a century but was not proven until 1976. The proof by Wolfgang Haken and Kenneth Appel at the University of Illinois required a supercomputer to break the proof down into cases and many years of verification by other mathematicians. Non-planar maps can require more colours.

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links to find out more about the four-colour problem. Write a short report on the history of the four-colour problem.

Example 7 Scheduling
The mathematics department has five committees. Each of these committees meets once a month. Membership on these committees is as follows:
Committee A: Szczachor, Large, Ellis
Committee B: Ellis, Wegrynowski, Ho, Khan
Committee C: Wegrynowski, Large
Committee D: Andrew, Large, Szczachor
Committee E: Bates, Card, Khan, Szczachor
What is the minimum number of time slots needed to schedule the committee meetings with no conflicts?

Solution
Draw the schedule as a network, with each vertex representing a different committee and each edge representing a potential conflict between committees (a person on two or more committees). Analyse the network as if you were colouring a map.

[Figure: network with vertices A, B, C, D, and E]

The network can be drawn as a planar graph. Therefore, at most four time slots are needed to “colour” this graph. Because Committee A is connected to the four other committees (degree 4), at least two time slots are necessary: one for committee A and at least one for all the other committees. Because each of the other nodes has degree 3, at least one more time slot is necessary. In fact, three time slots are sufficient since B is not connected to D and C is not connected to E.

Time Slot   Committees
1           A
2           B, D
3           C, E

Project Prep
Graph theory provides problem-solving techniques that will be useful in your tools for data management project.
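The scheduling method of Example 7 can be sketched in Python. This is a minimal illustration under stated assumptions, not the textbook's procedure: the conflict network is built from the membership lists, and time slots are then assigned with a simple greedy rule (take committees in alphabetical order and give each the lowest slot not used by a conflicting committee). The function names are hypothetical, and a greedy rule is not guaranteed to find the minimum for every network, although it does here.

```python
from itertools import combinations

committees = {
    "A": {"Szczachor", "Large", "Ellis"},
    "B": {"Ellis", "Wegrynowski", "Ho", "Khan"},
    "C": {"Wegrynowski", "Large"},
    "D": {"Andrew", "Large", "Szczachor"},
    "E": {"Bates", "Card", "Khan", "Szczachor"},
}

def conflict_edges(membership):
    """Join two committees with an edge if they share at least one member."""
    return [(x, y) for x, y in combinations(sorted(membership), 2)
            if membership[x] & membership[y]]

def greedy_slots(membership):
    """Assign each committee the lowest-numbered slot free of conflicts."""
    neighbours = {c: set() for c in membership}
    for x, y in conflict_edges(membership):
        neighbours[x].add(y)
        neighbours[y].add(x)
    slot = {}
    for c in sorted(membership):
        used = {slot[n] for n in neighbours[c] if n in slot}
        s = 1
        while s in used:
            s += 1
        slot[c] = s
    return slot

print(conflict_edges(committees))
print(greedy_slots(committees))  # {'A': 1, 'B': 2, 'C': 3, 'D': 2, 'E': 3}
```

The result matches the timetable in the solution: A alone in slot 1, B and D in slot 2, and C and E in slot 3.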
  • 48. Key Concepts • In graph theory, a graph is also known as a network and is a collection of line segments (edges) and nodes (vertices). • If two vertices are connected by an edge, they are adjacent. The degree of a vertex is equal to the number of edges that begin or end at the vertex. • A path is a connected sequence of vertices. A path is a circuit if it begins and ends at the same vertex. • A connected network has at least one path connecting each pair of vertices. A complete network has an edge connecting every pair of vertices. • A connected network is traceable if it has only vertices of even degree (even vertices) or exactly two vertices of odd degree (odd vertices). If the network has two vertices of odd degree, the tracing must begin at one of the odd vertices and end at the other. • A network is planar if its edges do not cross anywhere except at the vertices. • The maximum number of colours required to colour any planar map is four. Communicate Your Understanding 1. Describe how to convert a map into a network. Use an example to aid in your description. 2. A network has five vertices of even degree and three vertices of odd degree. Using a diagram, show why this graph cannot be traceable. 3. A modern zoo contains natural habitats for its animals. However, many of the animals are natural enemies and cannot be placed in the same habitat. Describe how to use graph theory to determine the number of different habitats required. 48 MHR • Tools for Data Management
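The traceability condition in the Key Concepts can be checked by counting odd vertices. The sketch below is an illustration only. It assumes the network is connected and stored as a list of edges, and the function names are hypothetical. The Koenigsberg edge list is an assumed encoding based on the classical bridge layout (two bridges from each bank to island B, one from each bank to island D, and one between the islands); the map itself is not reproduced in this text. The second network is the traceable example with path A to B to C to D to A to C.

```python
from collections import Counter

def odd_vertices(edge_list):
    """Return the vertices of odd degree, in sorted order."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)

def is_traceable(edge_list):
    """Euler's condition for a connected network: 0 or 2 odd vertices.

    Connectivity is assumed, not checked.
    """
    return len(odd_vertices(edge_list)) in (0, 2)

# Assumed encoding of the seven bridges (A, C: mainland; B, D: islands).
koenigsberg = [("A", "B"), ("A", "B"), ("C", "B"), ("C", "B"),
               ("A", "D"), ("C", "D"), ("B", "D")]

# The traceable example: a four-sided circuit plus the edge A-C.
square_with_diagonal = [("A", "B"), ("B", "C"), ("C", "D"),
                        ("D", "A"), ("A", "C")]

print(odd_vertices(koenigsberg), is_traceable(koenigsberg))
# ['A', 'B', 'C', 'D'] False
print(odd_vertices(square_with_diagonal), is_traceable(square_with_diagonal))
# ['A', 'C'] True
```

In the second network the odd vertices are A and C, which agrees with the tracing path beginning at A and ending at C.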
  • 49. Practise 5. Is it possible to add one bridge to the Koenigsberg map to make it traceable? A Provide evidence for your answer. 1. For each network, 6. Inquiry/Problem Solving The following chart i) find the degree of each vertex indicates the subjects studied by five students. ii) state whether the network is traceable C. Powell B. Bates G. Farouk a) A b) P E English Calculus Calculus S French French French History Geometry Geography Q U B Music Physics Music D T E. Ho N. Khan C R Calculus English 2. Draw a network diagram representing the English Geography maps in questions 1d) and 1e) of the Geometry Mathematics of Data investigation on pages 41 and 42. Mathematics of Data Management Management Physics 3. a) Look at a map of Canada. How many colours are needed to colour the ten a) Draw a network to illustrate the overlap provinces and three territories of Canada? of subjects these students study. b) How many colours are needed if the map b) Use your network to design an includes the U.S.A. coloured with a examination timetable without conflicts. single colour? (Hint: Consider each subject to be one vertex of a network.) Apply, Solve, Communicate 7. A highway inspector wants to travel each B road shown once and only once to inspect 4. The following map is made up of curved for winter damage. Determine whether it is lines that cross each other and stop only at possible to do so for each map shown below. the boundary of the map. Draw three other a) maps using similar lines. Investigate the four maps and make a conjecture of how many colours are needed for this type of map. b) 1.5 Graph Theory • MHR 49
  • 50. 8. Inquiry/Problem Solving 10. Application a) Find the degree of each vertex in the a) Three houses are located at positions A, network shown. B, and C, respectively. Water, gas, and A electrical utilities are located at positions D, E, and F, respectively. Determine whether the houses can each be B connected to all three utilities without D any of the connections crossing. Provide evidence for your decision. Is it necessary to reposition any of the utilities? Explain. C b) Find the sum of the degrees of the A D vertices. B E c) Compare this sum with the number of edges in the network. Investigate other networks and determine the sum of the C F degrees of their vertices. b) Show that a network representing two d) Make a conjecture from your observations. houses attached to n utilities is planar. 11. The four Anderson sisters live near each 9. a) The following network diagram of the main floor of a large house uses vertices other and have connected their houses by to represent rooms and edges to a network of paths such that each house has represent doorways. The exterior of the a path leading directly to each of the other house can be treated as one room. Sketch three houses. None of these paths intersect. a floor plan based on this network. Can their brother Warren add paths from his house to each of his sisters’ houses Library Conservatory without crossing any of the existing paths? 12. In the diagram below, a sheet of paper with Kitchen Dining a circular hole cut out partially covers a Room Family drawing of a closed figure. Given that point Hallway Room A is inside the closed figure, determine Living Room whether point B is inside or outside. Provide Tea Room reasons for your answer. Parlour Exterior b) Draw a floor plan and a network diagram A for your own home. B 50 MHR • Tools for Data Management
  • 51. 13. Application A communications network 15. In a communications network, the optimal between offices of a company needs to path is the one that provides the fastest link. provide a back-up link in case one part of a In the network shown, all link times are in path breaks down. For each network below, seconds. determine which links need to be backed up. Thunder Bay Describe how to back up the links. 2.7 a) Thunder Bay Sudbury Sudbury 4.5 1.7 North Bay 2.3 2.0 North Bay 0.5 Ottawa Kitchener Kitchener Ottawa 0.8 1.2 0.6 1.2 Windsor 1.4 Hamilton Hamilton Determine the optimal path from Windsor a) Thunder Bay to Windsor b) b) Hamilton to Sudbury Charlottetown c) Describe the method you used to Halifax estimate the optimal path. Toronto Montréal 16. A salesperson must travel by air to all of the Kingston cities shown in the diagram below. The Winnipeg Saskatoon diagram shows the cheapest one-way fare for Edmonton flights between the cities. Determine the Vancouver least expensive travel route beginning and ending in Toronto. 14. During an election campaign, a politician Thunder Bay will visit each of the cities on the map below. $319 $150 Sudbury Waterloo 55 Guelph Vancouver $225 Stratford 31 60 $378 $175 63 23 $111 $349 41 23 Toronto Cambridge Orangeville $378 $213 46 51 Calgary $349 Woodstock 52 116 25 Halifax $119 $218 38 $321 Brantford 45 Hamilton $109 Windsor Montréal a) Is it possible to visit each city only once? $399 b) Is it possible to begin and end in the same city? c) Find the shortest route for visiting all the cities. (Hint: You can usually find the shortest paths by considering the shortest edge at each vertex.) 1.5 Graph Theory • MHR 51
  • 52. ACHIEVEMENT CHECK 19. Inquiry/Problem Solving Use graph theory to determine if it is possible to draw the Knowledge/ Thinking/Inquiry/ Understanding Problem Solving Communication Application diagram below using only three strokes of a pencil. 17. The diagram below shows the floor plan of a house. 20. Communication a) Find a route that passes through each a) Can a connected graph of six vertices doorway of this house exactly once. be planar? Explain your answer. b) Use graph theory to explain why such b) Can a complete graph of six vertices a route is possible. be planar? Explain. c) Where could you place two exterior 21. Can the graph below represent a map in two doors so that it is possible to start dimensions. Explain. outside the house, pass through each doorway exactly once, and end up on B the exterior again? Explain your reasoning. d) Is a similar route possible if you add A C three exterior doors instead of two? Explain your answer. E D C 22. Can a network have exactly one vertex with 18. a) Six people at a party are seated at a table. an odd degree? Provide evidence to support No three people at the table know each your answer. other. For example, if Aaron knows Carmen and Carmen knows Allison, then 23. Communication A graph is regular if all its Aaron and Allison do not know each vertices have the same degree. Consider other. Show that at least three of the six graphs that do not have either loops people seated at the table must be connecting a vertex back to itself or multiple strangers to each other. (Hint: Model this edges connecting any pair of vertices. situation using a network with six a) Draw the four regular planar graphs that vertices.) have four vertices. b) Show that, among five people, it is b) How many regular planar graphs with possible that no three all know each five vertices are there? other and that no three are all strangers. c) Explain the difference between your results in parts a) and b). 52 MHR • Tools for Data Management
  • 53. 1.6 Modelling With Matrices

A matrix is a rectangular array of numbers used to manage and organize data, somewhat like a table or a page in a spreadsheet. Matrices are made up of horizontal rows and vertical columns and are usually enclosed in square brackets. Each number appearing in the matrix is called an entry. For instance,

\[
A = \begin{bmatrix} 5 & -2 & 3 \\ 2 & 1 & 0 \end{bmatrix}
\]

is a matrix with two rows and three columns, with entries 5, −2, and 3 in the first row and entries 2, 1, and 0 in the second row. The dimensions of this matrix are 2 × 3. A matrix with m rows and n columns has dimensions of m × n.

INVESTIGATE & INQUIRE: Olympic Medal Winners
At the 1998 Winter Olympic games in Nagano, Japan, Germany won 12 gold, 9 silver, and 8 bronze medals; Norway won 10 gold, 10 silver, and 5 bronze medals; Russia won 9 gold, 6 silver, and 3 bronze medals; Austria won 3 gold, 5 silver, and 9 bronze medals; Canada won 6 gold, 5 silver, and 4 bronze medals; and the United States won 6 gold, 3 silver, and 4 bronze medals.
1. Organize the data using a matrix with a row for each type of medal and a column for each country.
2. State the dimensions of the matrix.
3. a) What is the meaning of the entry in row 3, column 1?
   b) What is the meaning of the entry in row 2, column 4?
4. Find the sum of all the entries in the first row of the matrix. What is the significance of this row sum? What would the column sum represent?
5. Use your matrix to estimate the number of medals each country would win if the number of Olympic events were to be increased by 20%.
6. a) Interchange the rows and columns in your matrix by “reflecting” the matrix in the diagonal line beginning at row 1, column 1.
   b) Does this transpose matrix provide the same information? What are its dimensions?
7. State one advantage of using matrices to represent data.
  • 54. In general, use a capital letter as the symbol for a matrix and represent each entry using the corresponding lowercase letter with two indices. For example,

\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\qquad
B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}
\qquad
C = \begin{bmatrix}
c_{11} & c_{12} & c_{13} & \cdots & c_{1n} \\
c_{21} & c_{22} & c_{23} & \cdots & c_{2n} \\
c_{31} & c_{32} & c_{33} & \cdots & c_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
c_{m1} & c_{m2} & c_{m3} & \cdots & c_{mn}
\end{bmatrix}
\]

Here, $a_{ij}$, $b_{ij}$, and $c_{ij}$ represent the entries in row i and column j of these matrices.

The transpose of a matrix is indicated by a superscript t, so the transpose of A is shown as $A^t$. A matrix with only one row is called a row matrix, and a matrix with only one column is a column matrix. A matrix with the same number of rows as columns is called a square matrix.

\[
\underbrace{\begin{bmatrix} 1 & -2 & 5 & -9 \end{bmatrix}}_{\text{a row matrix}}
\qquad
\underbrace{\begin{bmatrix} -3 \\ 0 \\ 5 \end{bmatrix}}_{\text{a column matrix}}
\qquad
\underbrace{\begin{bmatrix} 3 & 4 & 9 \\ -1 & 0 & 2 \\ 5 & -10 & -3 \end{bmatrix}}_{\text{a square matrix}}
\]

Example 1 Representing Data With a Matrix
In the 1988 federal election, the numbers of seats in the House of Commons won by each party were Bloc Québécois (BQ), 0; Progressive Conservative Party (PC), 169; Liberal Party (LP), 83; New Democratic Party (NDP), 43; Reform Party (RP), 0; Other, 0. In 1993, the numbers of seats won were BQ, 54; PC, 2; LP, 177; NDP, 9; RP, 52; Other, 1. In 1997, the numbers of seats won were BQ, 44; PC, 20; LP, 155; NDP, 21; RP, 60; Other, 1.
a) Organize the data using a matrix S with a row for each political party.
b) What are the dimensions of your matrix?
c) What does the entry $s_{43}$ represent?
d) What entry has the value 52?
e) Write the transpose matrix for S. Does $S^t$ provide the same information as S?
f) The results from the year 2000 federal election were Bloc Québécois, 38; Progressive Conservative, 12; Liberal, 172; New Democratic Party, 13; Canadian Alliance (formerly Reform Party), 66; Other, 0. Update your matrix to include the results from the 2000 federal election.
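  • 55. Solution
a) With columns for 1988, 1993, and 1997 (left to right) and one labelled row for each party:

\[
S = \begin{bmatrix}
0 & 54 & 44 \\
169 & 2 & 20 \\
83 & 177 & 155 \\
43 & 9 & 21 \\
0 & 52 & 60 \\
0 & 1 & 1
\end{bmatrix}
\begin{matrix} \text{BQ} \\ \text{PC} \\ \text{LP} \\ \text{NDP} \\ \text{RP} \\ \text{Other} \end{matrix}
\]

Labelling the rows and columns in large matrices can help you keep track of what the entries represent.
b) The dimensions of the matrix are 6 × 3.
c) The entry $s_{43}$ shows that the NDP won 21 seats in 1997.
d) The entry $s_{52}$ has the value 52.
e) The transpose matrix, with columns for BQ, PC, LP, NDP, RP, and Other (left to right) and one labelled row for each year, is

\[
S^t = \begin{bmatrix}
0 & 169 & 83 & 43 & 0 & 0 \\
54 & 2 & 177 & 9 & 52 & 1 \\
44 & 20 & 155 & 21 & 60 & 1
\end{bmatrix}
\begin{matrix} 1988 \\ 1993 \\ 1997 \end{matrix}
\]

Comparing the entries in the two matrices shows that they do contain exactly the same information.
f) With columns for 1988, 1993, 1997, and 2000 (left to right):

\[
\begin{bmatrix}
0 & 54 & 44 & 38 \\
169 & 2 & 20 & 12 \\
83 & 177 & 155 & 172 \\
43 & 9 & 21 & 13 \\
0 & 52 & 60 & 66 \\
0 & 1 & 1 & 0
\end{bmatrix}
\begin{matrix} \text{BQ} \\ \text{PC} \\ \text{LP} \\ \text{NDP} \\ \text{CA (RP)} \\ \text{Other} \end{matrix}
\]

Two matrices are equal only if each entry in one matrix is equal to the corresponding entry in the other. For example,

\[
\begin{bmatrix} \dfrac{3}{2} & \sqrt{16} & (-2)^3 \\ 5^{-1} & -4 & -(-2) \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 1.5 & 4 & -8 \\ \dfrac{1}{5} & -4 & 2 \end{bmatrix}
\]

are equal matrices.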
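The answers to Example 1 can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original text: it stores the matrix S as a list of rows, and the helper names (dimensions, entry, transpose) are hypothetical. Because the text numbers rows and columns from 1 while Python lists start at 0, the entry helper subtracts 1 from each index.

```python
# Rows: BQ, PC, LP, NDP, RP, Other. Columns: 1988, 1993, 1997.
S = [
    [0, 54, 44],
    [169, 2, 20],
    [83, 177, 155],
    [43, 9, 21],
    [0, 52, 60],
    [0, 1, 1],
]

def dimensions(matrix):
    """Return (rows, columns)."""
    return len(matrix), len(matrix[0])

def entry(matrix, i, j):
    """Return the entry in row i, column j, counting from 1 as in the text."""
    return matrix[i - 1][j - 1]

def transpose(matrix):
    """Interchange rows and columns."""
    return [list(column) for column in zip(*matrix)]

print(dimensions(S))             # (6, 3)
print(entry(S, 4, 3))            # 21  (NDP seats in 1997)
print(entry(S, 5, 2))            # 52  (RP seats in 1993)
print(dimensions(transpose(S)))  # (3, 6)
print(transpose(transpose(S)) == S)  # True
```

Python compares nested lists entry by entry, so the == test mirrors the definition of equal matrices given above.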
  • 56. Two or more matrices can be added or subtracted, provided that their dimensions are the same. To add or subtract matrices, add or subtract the corresponding entries of each matrix. For example,

\[
\begin{bmatrix} 2 & -1 & 5 \\ 0 & 7 & -8 \end{bmatrix}
+
\begin{bmatrix} 0 & 5 & -3 \\ -2 & 4 & -1 \end{bmatrix}
=
\begin{bmatrix} 2 & 4 & 2 \\ -2 & 11 & -9 \end{bmatrix}
\]

Matrices can be multiplied by a scalar or constant. To multiply a matrix by a scalar, multiply each entry of the matrix by the scalar. For example,

\[
-3 \begin{bmatrix} 4 & 5 \\ -6 & 0 \\ 3 & -8 \end{bmatrix}
=
\begin{bmatrix} -12 & -15 \\ 18 & 0 \\ -9 & 24 \end{bmatrix}
\]

Example 2 Inventory Problem
The owner of Lou’s ’Lectronics Limited has two stores. The manager takes inventory of their top-selling items at the end of the week and notes that at the eastern store, there are 5 video camcorders, 7 digital cameras, 4 CD players, 10 televisions, 3 VCRs, 2 stereo systems, 7 MP3 players, 4 clock radios, and 1 DVD player in stock. At the western store, there are 8 video camcorders, 9 digital cameras, 3 CD players, 8 televisions, 1 VCR, 3 stereo systems, 5 MP3 players, 10 clock radios, and 2 DVD players in stock. During the next week, the eastern store sells 3 video camcorders, 2 digital cameras, 4 CD players, 3 televisions, 3 VCRs, 1 stereo system, 4 MP3 players, 1 clock radio, and no DVD players. During the same week, the western store sells 5 video camcorders, 3 digital cameras, 3 CD players, 8 televisions, no VCRs, 1 stereo system, 2 MP3 players, 7 clock radios, and 1 DVD player. The warehouse then sends each store 4 video camcorders, 3 digital cameras, 4 CD players, 4 televisions, 5 VCRs, 2 stereo systems, 2 MP3 players, 3 clock radios, and 1 DVD player.
a) Use matrices to determine how many of each item are in stock at the stores after receiving the new stock from the warehouse.
b) Immediately after receiving the new stock, the manager phones the head office and requests an additional 25% of the items presently in stock in anticipation of an upcoming one-day sale. How many of each item will be in stock at each store?
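  • 57. Solution 1 Using Pencil and Paper
a) Let matrix A represent the initial inventory, matrix B represent the number of items sold, and matrix C represent the items in the first shipment of new stock. In each matrix, the first column is the eastern store (E) and the second column is the western store (W). The rows represent, from top to bottom, camcorders, cameras, CD players, TVs, VCRs, stereos, MP3 players, clock radios, and DVD players.

\[
A = \begin{bmatrix} 5 & 8 \\ 7 & 9 \\ 4 & 3 \\ 10 & 8 \\ 3 & 1 \\ 2 & 3 \\ 7 & 5 \\ 4 & 10 \\ 1 & 2 \end{bmatrix}
\qquad
B = \begin{bmatrix} 3 & 5 \\ 2 & 3 \\ 4 & 3 \\ 3 & 8 \\ 3 & 0 \\ 1 & 1 \\ 4 & 2 \\ 1 & 7 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 4 & 4 \\ 3 & 3 \\ 4 & 4 \\ 4 & 4 \\ 5 & 5 \\ 2 & 2 \\ 2 & 2 \\ 3 & 3 \\ 1 & 1 \end{bmatrix}
\]

Since the dimensions of matrices A, B, and C are the same, matrix addition and subtraction can be performed. Then, the stock on hand before the extra shipment is

\[
D = A - B + C =
\begin{bmatrix} 5 & 8 \\ 7 & 9 \\ 4 & 3 \\ 10 & 8 \\ 3 & 1 \\ 2 & 3 \\ 7 & 5 \\ 4 & 10 \\ 1 & 2 \end{bmatrix}
-
\begin{bmatrix} 3 & 5 \\ 2 & 3 \\ 4 & 3 \\ 3 & 8 \\ 3 & 0 \\ 1 & 1 \\ 4 & 2 \\ 1 & 7 \\ 0 & 1 \end{bmatrix}
+
\begin{bmatrix} 4 & 4 \\ 3 & 3 \\ 4 & 4 \\ 4 & 4 \\ 5 & 5 \\ 2 & 2 \\ 2 & 2 \\ 3 & 3 \\ 1 & 1 \end{bmatrix}
=
\begin{bmatrix} 6 & 7 \\ 8 & 9 \\ 4 & 4 \\ 11 & 4 \\ 5 & 6 \\ 3 & 4 \\ 5 & 5 \\ 6 & 6 \\ 2 & 2 \end{bmatrix}
\]

b) Let E represent the stock in the stores after the extra shipment from the warehouse.

\[
E = 125\% \times D = 1.25
\begin{bmatrix} 6 & 7 \\ 8 & 9 \\ 4 & 4 \\ 11 & 4 \\ 5 & 6 \\ 3 & 4 \\ 5 & 5 \\ 6 & 6 \\ 2 & 2 \end{bmatrix}
=
\begin{bmatrix} 7.5 & 8.75 \\ 10 & 11.25 \\ 5 & 5 \\ 13.75 & 5 \\ 6.25 & 7.5 \\ 3.75 & 5 \\ 6.25 & 6.25 \\ 7.5 & 7.5 \\ 2.5 & 2.5 \end{bmatrix}
\]

Assuming the manager rounds to the nearest whole number, the eastern store will have 8 video camcorders, 10 digital cameras, 5 CD players, 14 televisions, 6 VCRs, 4 stereo systems, 6 MP3 players, 8 clock radios, and 3 DVD players in stock. The western store will have 9 video camcorders, 11 digital cameras, 5 CD players, 5 televisions, 8 VCRs, 5 stereo systems, 6 MP3 players, 8 clock radios, and 3 DVD players in stock.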
  • 58. Solution 2 Using a Graphing Calculator a) As in the pencil-and-paper solution, let matrix A represent the initial inventory, matrix B the items sold, and matrix C the first shipment of new stock. Use the MATRX EDIT menu to store matrices. Press ENTER to select a matrix name, then key in the dimensions and the entries. The calculator will store the matrix until it is cleared or overwritten. Matrix names and entries appear in square brackets on the calculator screen. Use the MATRX NAMES menu to copy the matrices into the expression for D, the matrix representing the stock on hand before the extra shipment. Just move the cursor to the matrix you need and press ENTER. b) To find the stock on hand after the extra shipment for the one-day sale, multiply matrix D by 1.25 and store the result in matrix E. Then, you can use the round function in the MATH NUM menu to display the closest whole numbers for the entries in matrix E. Solution 3 Using a Spreadsheet a) You can easily perform matrix operations using a spreadsheet. It is also easy to add headings and row labels to keep track of what the entries represent. Enter each matrix using two adjacent columns: matrix A (initial stock) in columns A and B, matrix B (sales) in columns C and D, and matrix C (new stock) in columns E and F. To find the amount of stock on hand after the first shipment from the warehouse, enter the formula A3–C3+E3 in cell H3. Then, use the Fill feature to copy this formula for the rest of the entries in columns H and I. b) Use the Fill feature in a similar way to copy the formula for the entries in matrix E, the stock on hand after the extra shipment from the warehouse. You can use the ROUND function to find the nearest whole number automatically. The formula for cell J3, the first entry, is ROUND(1.25*H3,0). 58 MHR • Tools for Data Management
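As a further illustration alongside the three solutions above, the same calculation can be sketched in Python. This is not part of the original text: matrices are stored as lists of [eastern, western] rows, and the helper names are hypothetical. One assumption matters for part b): halves are rounded up, to match the stock figures quoted in Solution 1. Python's built-in round rounds halves to the nearest even number (so 2.5 would become 2), which is why the sketch uses floor(x + 0.5) instead.

```python
import math

# Each row is [eastern store, western store]; rows run from camcorders
# down to DVD players, in the order used in Solution 1.
A = [[5, 8], [7, 9], [4, 3], [10, 8], [3, 1], [2, 3], [7, 5], [4, 10], [1, 2]]
B = [[3, 5], [2, 3], [4, 3], [3, 8], [3, 0], [1, 1], [4, 2], [1, 7], [0, 1]]
C = [[4, 4], [3, 3], [4, 4], [4, 4], [5, 5], [2, 2], [2, 2], [3, 3], [1, 1]]

def same_dimensions(a, b):
    return len(a) == len(b) and all(len(r) == len(s) for r, s in zip(a, b))

def add(a, b):
    """Add corresponding entries; the dimensions must match."""
    if not same_dimensions(a, b):
        raise ValueError("matrices must have the same dimensions")
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def scale(k, a):
    """Multiply every entry of the matrix by the scalar k."""
    return [[k * x for x in row] for row in a]

def subtract(a, b):
    """Subtract corresponding entries by adding -1 times b."""
    return add(a, scale(-1, b))

def round_half_up(a):
    """Round each entry to the nearest whole number, with halves rounded up."""
    return [[math.floor(x + 0.5) for x in row] for row in a]

D = add(subtract(A, B), C)        # stock after the first shipment
E = round_half_up(scale(1.25, D)) # stock after the extra 25%

print(D)
print(E)
```

The first column of E gives the eastern store's final stock and the second column gives the western store's.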
  • 59. Key Concepts • A matrix is used to manage and organize data. • A matrix made up of m rows and n columns has dimensions m × n. • Two matrices are equal if they have the same dimensions and all corresponding entries are equal. • The transpose matrix is found by interchanging rows with the corresponding columns. • To add or subtract matrices, add or subtract the corresponding entries of each matrix. The dimensions of the matrices must be the same. • To multiply a matrix by a scalar, multiply each entry of the matrix by the scalar. Communicate Your Understanding 1. Describe how to determine the dimensions of any matrix. 2. Describe how you know whether two matrices are equal. Use an example to illustrate your answer. 3. Can transpose matrices ever be equal? Explain. 4. a) Describe how you would add two matrices. Give an example. b) Explain why the dimensions of the two matrices need to be the same to add or subtract them. 5. Describe how you would perform scalar multiplication on a matrix. Give an example. 1.6 Modelling With Matrices • MHR 59
  • 60. Practise 5. a) Give two examples of square matrices. b) State the dimensions of each matrix in A part a). 1. State the dimensions of each matrix. 6. a) Write a 3 × 4 matrix, A, with the 5 −1 a) ΄ 4 −2 3 8 ΅ b) [1 0 −7] property that entry aij = i + j. b) Write a 4 × 4 matrix, B, with the property 3 if i = j Άi × j if i ΄ ΅ 3 −9 −6 that entry bij = 5 4 7 j c) 1 0 8 7. Solve for w, x, y, and z. 8 −1 2 a) ΄ −2 x 4 4z − 2 ΅ ΄ = 3 w y−1 6 ΅ ΄ ΅ −5 3 2 6 0 −1 ΄w ΅ = ΄ 8 −8 2y 9 ΅ 3 2. For the matrix A = , b) x2 4 8 −3 2y 3z 2z − 5 7 1 −4 a) state the value in entry ΄ ΅ ΄ ΅ 2 −1 3 4 i) a21 ii) a43 iii) a13 3 9 −6 1 8. Let A = ,B= , b) state the entry with value 5 0 8 2 −4 1 −1 −5 i) 4 ii) −3 iii) 1 3 −2 and C = ΄ 6 5 . ΅ ΄ ΅ a b c d e 1 4 0 −8 f g h i j 3. Let A = Calculate, if possible, k l m n o a) A + B b) B + A c) B − C p q r s t 1 d) 3A e) −ᎏᎏB f) 2(B − A) ΄ ΅ u v 2 g) 3A − 2B and B = w x . y z ΄ ΅ ΄ ΅ 8 −6 0 −1 For each of the following, replace aij or bij 9. Let A = 1 −2 , B = 2 4 , with its corresponding entry in the above −4 5 9 −3 matrices to reveal a secret message. ΄ ΅ 2 3 a) a33a11a45a43a24a13a15a44 8 −6 . and C = a11a43a15 a21b11a34 4 1 b) a24 a32a35b12a15 a33a11a45a23 Show that c) b21a35b21 a45a23a24a44 a) A + B = B + A a24a44 a21b11a34 (commutative property) 4. a) Give two examples of row matrices and b) (A + B) + C = A + (B + C ) two examples of column matrices. (associative property) b) State the dimensions of each matrix in c) 5(A + B) = 5A + 5B part a). (distributive property) 60 MHR • Tools for Data Management
  • 61. 10. Find the values of w, x, y, and z if sciences; U.K. with 21 Nobel prizes in ΄ ΅ ΄ ΅ 5 −1 2 6 y 5 physics, 25 in chemistry, 24 in 4 x −8 + 2 −3 2 1 physiology/medicine, 8 in literature, 13 in 7 0 3 2 −3 z peace, and 7 in economic sciences; Germany with 20 Nobel prizes in physics, 27 in ΄ ΅ 34 10 24 1 chemistry, 16 in physiology/medicine, 7 in = ᎏᎏ −4 24 −12 literature, 4 in peace, and 1 in economic 2 2w −12 42 sciences; France with 12 Nobel prizes in 11. Solve each equation. physics, 7 in chemistry, 7 in physiology/ medicine, 12 in literature, 9 in peace, and 1 in a) ΄3 2 0 8 ΅ 2 −5 + A = 7 −4 ΄ 0 1 3 −2 ΅ economic sciences; and Sweden with 4 Nobel prizes in physics, 4 in chemistry, 7 in ΄ ΅ ΄ ΅΄ ΅ 5 7 1 6 7 19 physiology/medicine, 7 in literature, 5 in b) 4 0 +y 0 −4 = 4 −8 peace, and 2 in economic sciences. −1 −3 2 5 3 7 a) Represent this data as a matrix, N. What Apply, Solve, Communicate are the dimensions of N ? b) Use row or column sums to calculate B how many Nobel prizes have been 12. Application The map below shows driving awarded to citizens of each country. distances between five cities in Ontario. 14. The numbers of university qualifications Thunder Bay (degrees, certificates, and diplomas) granted in Canada for 1997 are as follows: social sciences, 28 421 males and 38 244 females; 710 km education, 8036 males and 19 771 females; humanities, 8034 males and 13 339 females; North Bay health professions and occupations, 3460 Sault Ste. Ottawa males and 9613 females; engineering and 425 km 365 km Marie 350 km applied sciences, 10 125 males and 2643 655 km 400 km females; agriculture and biological sciences, Toronto 4780 males and 6995 females; mathematics and physical sciences, 6749 males and 2989 a) Represent the driving distances between females; fine and applied arts, 1706 males each pair of cities with a matrix, A. and 3500 females; arts and sciences, 1730 b) Find the transpose matrix, At. males and 3802 females. 
c) Explain how entry a23 in matrix A and The numbers for 1998 are as follows: social entry a32 in matrix At are related. sciences, 27 993 males and 39 026 females; 13. Nobel prizes are awarded for physics, education, 7565 males and 18 391 females; chemistry, physiology/medicine, literature, humanities, 7589 males and 13 227 females; peace, and economic sciences. The top five health professions and occupations, 3514 Nobel prize-winning countries are U.S.A. males and 9144 females; engineering and with 67 Nobel prizes in physics, 43 in applied sciences, 10 121 males and 2709 chemistry, 78 in physiology/medicine, 10 in females; agriculture and biological sciences, literature, 18 in peace, and 25 in economic 4779 males and 7430 females; 1.6 Modelling With Matrices • MHR 61
  • 62. mathematics and physical sciences, 6876 males and 3116 females; fine and applied arts, 1735 males and 3521 females; arts and sciences, 1777 males and 3563 females.
a) Enter two matrices in a graphing calculator or spreadsheet: one two-column matrix for males and females receiving degrees in 1997 and a second two-column matrix for the number of males and females receiving degrees in 1998.
b) How many degrees were granted to males in 1997 and 1998 for each field of study?
c) How many degrees were granted to females in 1997 and 1998 for each field of study?
d) What is the average number of degrees granted to females in 1997 and 1998 for each field of study?

15. Application The table below shows the population of Canada by age and gender in the year 2000.

Age Group   Number of Males   Number of Females
0−4         911 028           866 302
5−9         1 048 247         996 171
10−14       1 051 525         997 615
15−19       1 063 983         1 007 631
20−24       1 063 620         1 017 566
25−29       1 067 870         1 041 900
30−34       1 154 071         1 129 095
35−39       1 359 796         1 335 765
40−44       1 306 705         1 304 538
45−49       1 157 288         1 162 560
50−54       1 019 061         1 026 032
55−59       769 591           785 657
60−64       614 659           641 914
65−69       546 454           590 435
70−74       454 269           544 008
75−79       333 670           470 694
80−84       184 658           309 748
85−89       91 455            190 960
90+         34 959            98 587

a) Create two matrices using the above data, one for males and another for females.
b) What is the total population for each age group?
c) Suppose that Canada’s population grows by 1.5% in all age groups. Calculate the anticipated totals for each age group.

16. (Chapter Problem)
a) Prepare a matrix showing the connections for the VIA Rail routes shown on page 3. Use a 1 to indicate a direct connection from one city to another city. Use a 0 to indicate no direct connection from one city to another city. Also, use a 0 to indicate no direct connection from a city to itself.
b) What does the entry in row 4, column 3 represent?
c) What does the entry in row 3, column 4 represent?
d) Explain the significance of the relationship between your answers in parts b) and c).
e) Describe what the sum of the entries in the first row represents.
f) Describe what the sum of the entries in the first column represents.
g) Explain why your answers in parts e) and f) are the same.

C

17. Inquiry/Problem Solving Show that for any m × n matrices, A and B
a) $(A^t)^t = A$
b) $(A + B)^t = A^t + B^t$

18. Communication Make a table to compare matrix calculations with graphing calculators and with spreadsheets. What are the advantages, disadvantages, and limitations of these technologies?

19. Inquiry/Problem Solving Search the newspaper for data that could be organized in a matrix. What calculations could you perform with these data in matrix form? Is there any advantage to using matrices for these calculations?
  • 63. 1.7 Problem Solving With Matrices

The previous section demonstrated how to use matrices to model, organize, and manipulate data. With multiplication techniques, matrices become a powerful tool in a wide variety of applications.

INVESTIGATE & INQUIRE: Matrix Multiplication

The National Hockey League standings on March 9, 2001 in the Northeast Division are shown below, along with the league’s point system for a win, loss, tie, or overtime loss (OTL).

Team       Win   Loss   Tie   OTL
Ottawa     39    17     8     3
Buffalo    36    25     5     1
Toronto    31    23     10    5
Boston     28    27     6     7
Montréal   23    36     5     4

Score   Points
Win     2
Loss    0
Tie     1
OTL     1

1. Calculate the number of points for each team in the Northeast Division using the above tables. Explain your method.
2. a) Represent the team standings as a 5 × 4 matrix, A.
   b) Represent the points system as a column matrix, B.
3. Describe a procedure for determining the total points for Ottawa using the entries in row 1 of matrix A and column 1 of matrix B.
4. How could you apply this procedure to find the points totals for the other four teams?
5. Represent the total points for each team as a column matrix, C. How are the dimensions of C related to those of A and B?
6. Would it make sense to define matrix multiplication using a procedure such that A × B = C? Explain your reasoning.
  • 64. In the above investigation, matrix A has dimensions 5 × 4 and matrix B has dimensions 4 × 1. Two matrices can be multiplied when their inner dimensions are equal. The outer dimensions are the dimensions of the resultant matrix when matrices A and B are multiplied.

\[ A_{5 \times 4} \times B_{4 \times 1} = C_{5 \times 1} \]

Here the inner dimensions (4 and 4) are the same, and the outer dimensions (5 and 1) give the dimensions of the resultant matrix.

Example 1 Multiplying Matrices

Matrix A represents the proportion of students at a high school who have part-time jobs on Saturdays and the length of their shifts. Matrix B represents the number of students at each grade level.

\[
A = \begin{array}{c|cccc}
 & \text{Gr 9} & \text{Gr 10} & \text{Gr 11} & \text{Gr 12} \\ \hline
\le 4\ \text{h} & 0.20 & 0.10 & 0.20 & 0.15 \\
4.1 - 6\ \text{h} & 0.25 & 0.30 & 0.25 & 0.45 \\
> 6\ \text{h} & 0.05 & 0.25 & 0.15 & 0.10
\end{array}
\qquad
B = \begin{array}{c|cc}
 & \text{M} & \text{F} \\ \hline
\text{Gr 9} & 120 & 130 \\
\text{Gr 10} & 137 & 155 \\
\text{Gr 11} & 103 & 110 \\
\text{Gr 12} & 95 & 92
\end{array}
\]

a) Calculate AB. Interpret what each entry represents.
b) Calculate BA, if possible.

Solution

a) A and B have the same inner dimensions, so multiplication is possible and their product will be a 3 × 2 matrix:

\[ A_{3 \times 4} \times B_{4 \times 2} = C_{3 \times 2} \]

\[
AB = \begin{bmatrix}
0.20 & 0.10 & 0.20 & 0.15 \\
0.25 & 0.30 & 0.25 & 0.45 \\
0.05 & 0.25 & 0.15 & 0.10
\end{bmatrix}
\begin{bmatrix}
120 & 130 \\
137 & 155 \\
103 & 110 \\
95 & 92
\end{bmatrix}
\]

\[
= \begin{bmatrix}
(0.20)(120) + (0.10)(137) + (0.20)(103) + (0.15)(95) & (0.20)(130) + (0.10)(155) + (0.20)(110) + (0.15)(92) \\
(0.25)(120) + (0.30)(137) + (0.25)(103) + (0.45)(95) & (0.25)(130) + (0.30)(155) + (0.25)(110) + (0.45)(92) \\
(0.05)(120) + (0.25)(137) + (0.15)(103) + (0.10)(95) & (0.05)(130) + (0.25)(155) + (0.15)(110) + (0.10)(92)
\end{bmatrix}
\]

\[
\doteq \begin{bmatrix}
73 & 77 \\
140 & 148 \\
65 & 71
\end{bmatrix}
\]

Approximately 73 males and 77 females work up to 4 h; 140 males and 148 females work 4.1 to 6 h; and 65 males and 71 females work more than 6 h on Saturdays.

b) For $B_{4 \times 2} \times A_{3 \times 4}$, the inner dimensions are not the same, so BA cannot be calculated.
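The row-by-column procedure in Example 1 can also be sketched in a few lines of Python. This is only an illustration, not a method from the text: the function name `matmul` and the list-of-rows representation are assumptions made for the sketch.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows (illustrative helper)."""
    inner_a, inner_b = len(a[0]), len(b)
    # The product exists only when the inner dimensions agree.
    if inner_a != inner_b:
        raise ValueError(f"inner dimensions differ: {inner_a} and {inner_b}")
    # Entry (i, j) is row i of a times column j of b, summed.
    return [[sum(a[i][k] * b[k][j] for k in range(inner_a))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Matrices A (3 x 4) and B (4 x 2) from Example 1.
A = [[0.20, 0.10, 0.20, 0.15],
     [0.25, 0.30, 0.25, 0.45],
     [0.05, 0.25, 0.15, 0.10]]
B = [[120, 130], [137, 155], [103, 110], [95, 92]]

AB = matmul(A, B)
rounded = [[round(x) for x in row] for row in AB]
print(rounded)  # [[73, 77], [140, 148], [65, 71]]
```

Calling `matmul(B, A)` raises a `ValueError`, which mirrors part b): a 4 × 2 matrix cannot be multiplied by a 3 × 4 matrix.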
  • 65. Technology is an invaluable tool for solving problems that involve large amounts of data.

Example 2 Using Technology to Multiply Matrices

The following table shows the number and gender of full-time students enrolled at each university in Ontario one year.

University        Full-Time Students   Males (%)   Females (%)
Brock             6509                 43          57
Carleton          12 376               55          45
Guelph            11 773               38          62
Lakehead          5308                 48          52
Laurentian        3999                 43          57
McMaster          13 797               46          54
Nipissing         1763                 34          66
Ottawa            16 825               42          58
Queen’s           13 433               44          56
Ryerson           10 266               47          53
Toronto           40 420               44          56
Trent             3764                 36          64
Waterloo          17 568               55          45
Western           21 778               46          54
Wilfrid Laurier   6520                 45          55
Windsor           9987                 46          54
York              27 835               39          61

a) Set up two matrices, one listing the numbers of full-time students at each university and the other the percents of males and females.
b) Determine the total number of full-time male students and the total number of full-time female students enrolled in Ontario universities.

Solution 1 Using a Graphing Calculator

a) Use the MATRX EDIT menu to store a 1 × 17 matrix for the numbers of full-time students and a 17 × 2 matrix for the percents of males and females.

b) To multiply matrices, use the MATRX NAMES menu. Copy the matrices into an expression such as [A]*[B] or [A][B].

There are 100 299 males and 123 622 females enrolled in Ontario universities.
  • 66. You can also enter matrices directly into an expression by using the square brackets keys. This method is simpler for small matrices, but does not store the matrix in the MATRX NAMES menu.

Solution 2 Using a Spreadsheet

Enter the number of full-time students at each university as a 17 × 1 matrix in cells B2 to B18. This placement leaves you the option of putting labels in the first row and column. Enter the proportion of male and female students as a 2 × 17 matrix in cells D2 to T3.

Both Corel® Quattro Pro and Microsoft® Excel have built-in functions for multiplying matrices, although the procedures in the two programs differ somewhat.

Corel® Quattro Pro: On the Tools menu, select Numeric Tools/Multiply. In the pop-up window, enter the cell ranges for the two matrices you want to multiply and the cell where you want the resulting matrix to start. Note that you must list the 2 × 17 matrix first.

Microsoft® Excel: The MMULT(matrix1,matrix2) function will calculate the product of the two matrices but displays only the first entry of the resulting matrix. Use the INDEX function to retrieve the entry for a specific row and column of the matrix.

Project Prep
You can apply these techniques for matrix multiplication to the calculations for your tools for data management project.
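As a third option alongside the calculator and spreadsheet solutions, Example 2 can be checked with a short Python sketch. The variable names are assumptions made for illustration; the data are the enrolment figures and percents from the table, kept as whole-number percents and divided by 100 at the end.

```python
# 1 x 17 matrix: full-time students at each university, in table order.
students = [6509, 12376, 11773, 5308, 3999, 13797, 1763, 16825, 13433,
            10266, 40420, 3764, 17568, 21778, 6520, 9987, 27835]

# 17 x 2 matrix: (percent male, percent female) for each university.
percents = [(43, 57), (55, 45), (38, 62), (48, 52), (43, 57), (46, 54),
            (34, 66), (42, 58), (44, 56), (47, 53), (44, 56), (36, 64),
            (55, 45), (46, 54), (45, 55), (46, 54), (39, 61)]

# Row-by-column products give a 1 x 2 result: [total males, total females].
totals = [sum(n * row[j] for n, row in zip(students, percents)) / 100
          for j in range(2)]

males, females = (round(t) for t in totals)
print(males, females)  # 100299 123622
```

The rounded totals agree with the figures quoted in Solution 1.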
  • 67. Identity matrices have the form

\[
I = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\]

with entries of 1 along the main diagonal and zeros for all other entries. The identity matrix with dimensions n × n is represented by $I_n$. It can easily be shown that $A_{m \times n} I_n = A_{m \times n}$ for any m × n matrix A.

For most square matrices, there exists an inverse matrix $A^{-1}$ with the property that $AA^{-1} = A^{-1}A = I$. Note that $A^{-1} \neq \frac{1}{A}$.

For 2 × 2 matrices,

\[
AA^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} w & x \\ y & z \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

Multiplying the matrices gives four simultaneous equations for w, x, y, and z. Solving these equations yields

\[
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\]

You can confirm that $A^{-1}A = I$, also. If ad = bc, then $A^{-1}$ does not exist since it would require dividing by zero.

The formulas for the inverses of larger matrices can be determined in the same way as for 2 × 2 matrices, but the calculations become much more involved. However, it is relatively easy to find the inverses of larger matrices with graphing calculators since they have the necessary formulas built in.
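The four simultaneous equations for the 2 × 2 case can be written out explicitly. The elimination steps below are one possible route to the formula, filled in as a worked sketch rather than quoted from the text; they assume $ad - bc \neq 0$.

```latex
\begin{aligned}
&\text{Multiplying out } AA^{-1} = I: \\
&\quad aw + by = 1, \qquad cw + dy = 0, \qquad ax + bz = 0, \qquad cx + dz = 1. \\[4pt]
&\text{Eliminate } y \text{ from the first pair } (d \times \text{first} - b \times \text{second}): \\
&\quad (ad - bc)\,w = d \;\Rightarrow\; w = \frac{d}{ad - bc}. \\[4pt]
&\text{Eliminate } w \text{ from the first pair } (a \times \text{second} - c \times \text{first}): \\
&\quad (ad - bc)\,y = -c \;\Rightarrow\; y = \frac{-c}{ad - bc}. \\[4pt]
&\text{Eliminate } z \text{ from the second pair } (d \times \text{first} - b \times \text{second}): \\
&\quad (ad - bc)\,x = -b \;\Rightarrow\; x = \frac{-b}{ad - bc}. \\[4pt]
&\text{Eliminate } x \text{ from the second pair } (a \times \text{second} - c \times \text{first}): \\
&\quad (ad - bc)\,z = a \;\Rightarrow\; z = \frac{a}{ad - bc}. \\[4pt]
&\text{Hence } A^{-1} = \begin{bmatrix} w & x \\ y & z \end{bmatrix}
 = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\end{aligned}
```

Every one of the four solutions divides by $ad - bc$, which is why no inverse exists when ad = bc.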
  • 68. Example 3 Calculating the Inverse Matrix

Calculate, if possible, the inverse of

a) $A = \begin{bmatrix} 3 & 7 \\ 4 & -2 \end{bmatrix}$
b) $B = \begin{bmatrix} 6 & 8 \\ 3 & 4 \end{bmatrix}$

Solution 1 Using Pencil and Paper

a)
\[
\begin{aligned}
A^{-1} &= \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \\
&= \frac{1}{(3)(-2) - (7)(4)} \begin{bmatrix} -2 & -7 \\ -4 & 3 \end{bmatrix} \\
&= -\frac{1}{34} \begin{bmatrix} -2 & -7 \\ -4 & 3 \end{bmatrix} \\
&= \begin{bmatrix} \frac{1}{17} & \frac{7}{34} \\ \frac{2}{17} & -\frac{3}{34} \end{bmatrix}
\end{aligned}
\]

b) For B, ad − bc = (6)(4) − (8)(3) = 0, so $B^{-1}$ does not exist.

Solution 2 Using a Graphing Calculator

a) Use the MATRX EDIT menu to store the 2 × 2 matrix. Retrieve it with the MATRX NAMES menu, then use the $x^{-1}$ key to find the inverse. To verify that the decimal numbers shown are equal to the fractions in the pencil-and-paper solution, use the ►Frac function from the MATH NUM menu.

b) For B, the calculator shows that the inverse cannot be calculated.
  • 69. Solution 3 Using a Spreadsheet

The spreadsheet functions for inverse matrices are similar to those for matrix multiplication.

a) Enter the matrix in cells A1 to B2.

In Corel® Quattro Pro, use Tools/Numeric Tools/Invert… to enter the range of cells for the matrix and the cell where you want the inverse matrix to start. Use the Fraction feature to display the entries as fractions rather than decimal numbers.

In Microsoft® Excel, use the MINVERSE function to produce the inverse matrix and the INDEX function to access the entries in it. If you put absolute cell references in the MINVERSE function for the first entry, you can use the Fill feature to generate the formulas for the other entries. Use the Fraction feature to display the entries as fractions rather than decimal numbers.
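For comparison with the three solutions to Example 3, the 2 × 2 inverse formula can be sketched in Python using exact fractions. The function name `inverse_2x2` and the choice to return `None` when no inverse exists are assumptions made for this illustration.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Return the inverse of a 2 x 2 matrix, or None if ad - bc = 0."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        return None  # the formula would require dividing by zero
    # (1 / (ad - bc)) * [[d, -b], [-c, a]], kept as exact fractions.
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A_inv = inverse_2x2([[3, 7], [4, -2]])
B_inv = inverse_2x2([[6, 8], [3, 4]])

print(A_inv)  # entries 1/17, 7/34, 2/17, -3/34
print(B_inv)  # None
```

Working with `Fraction` rather than decimals plays the same role as the ►Frac and Fraction features mentioned in Solutions 2 and 3.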
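  • 70. During the 1930s, Lester Hill, an American mathematician, developed methods for using matrices to decode messages. The following example illustrates a simplified version of Hill’s technique.

Example 4 Coding a Message Using Matrices

a) Encode the message PHONE ME TONIGHT using 2 × 2 matrices.
b) Determine the matrix key required to decode the message.

Solution

a) Write the message using 2 × 2 matrices. Fill in any missing entries with the letter Z.

\[
\begin{bmatrix} P & H \\ O & N \end{bmatrix},\;
\begin{bmatrix} E & M \\ E & T \end{bmatrix},\;
\begin{bmatrix} O & N \\ I & G \end{bmatrix},\;
\begin{bmatrix} H & T \\ Z & Z \end{bmatrix}
\]

Replace each letter with its corresponding number in the alphabet.

A   B   C   D   E   F   G   H   I   J   K   L   M
1   2   3   4   5   6   7   8   9   10  11  12  13

N   O   P   Q   R   S   T   U   V   W   X   Y   Z
14  15  16  17  18  19  20  21  22  23  24  25  26

\[
\begin{bmatrix} 16 & 8 \\ 15 & 14 \end{bmatrix},\;
\begin{bmatrix} 5 & 13 \\ 5 & 20 \end{bmatrix},\;
\begin{bmatrix} 15 & 14 \\ 9 & 7 \end{bmatrix},\;
\begin{bmatrix} 8 & 20 \\ 26 & 26 \end{bmatrix}
\]

Now, encode the message by multiplying with a coding matrix that only the sender and receiver know. Suppose that you chose $C = \begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}$ as your coding matrix.

\[
\begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}
\begin{bmatrix} 16 & 8 \\ 15 & 14 \end{bmatrix}
= \begin{bmatrix} 63 & 38 \\ 110 & 68 \end{bmatrix}
\]
\[
\begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}
\begin{bmatrix} 5 & 13 \\ 5 & 20 \end{bmatrix}
= \begin{bmatrix} 20 & 59 \\ 35 & 105 \end{bmatrix}
\]
\[
\begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}
\begin{bmatrix} 15 & 14 \\ 9 & 7 \end{bmatrix}
= \begin{bmatrix} 54 & 49 \\ 93 & 84 \end{bmatrix}
\]
\[
\begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}
\begin{bmatrix} 8 & 20 \\ 26 & 26 \end{bmatrix}
= \begin{bmatrix} 50 & 86 \\ 92 & 152 \end{bmatrix}
\]

You would send the message as 63, 38, 110, 68, 20, 59, 35, 105, 54, 49, 93, 84, 50, 86, 92, 152.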
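The encoding steps in part a) can be sketched in Python as follows. The function names are hypothetical, and the `decode` helper is a simplification that only handles coding matrices whose determinant is 1 or −1 (as is the case for C here), so that the inverse has whole-number entries; it is included to show the round trip that part b) works through by hand.

```python
C = [[3, 1], [5, 2]]  # coding matrix from Example 4

def mul2(p, q):
    """Product of two 2 x 2 matrices."""
    return [[p[i][0] * q[0][j] + p[i][1] * q[1][j] for j in range(2)]
            for i in range(2)]

def to_blocks(numbers):
    """Split a flat list into 2 x 2 matrices, filled row by row."""
    return [[numbers[i:i + 2], numbers[i + 2:i + 4]]
            for i in range(0, len(numbers), 4)]

def flatten(blocks):
    return [x for block in blocks for row in block for x in row]

def encode(message, key=C):
    letters = [ch for ch in message.upper() if "A" <= ch <= "Z"]
    letters += ["Z"] * (-len(letters) % 4)          # pad with Z, as in the text
    numbers = [ord(ch) - ord("A") + 1 for ch in letters]  # A=1, ..., Z=26
    return flatten(mul2(key, block) for block in to_blocks(numbers))

def decode(numbers, key=C):
    (a, b), (c, d) = key
    det = a * d - b * c
    if det not in (1, -1):
        raise ValueError("this sketch assumes a key with determinant 1 or -1")
    # For det = 1 or -1, 1/det equals det, so the inverse has integer entries.
    inverse = [[d * det, -b * det], [-c * det, a * det]]
    plain = flatten(mul2(inverse, block) for block in to_blocks(numbers))
    return "".join(chr(n + ord("A") - 1) for n in plain)

encoded = encode("PHONE ME TONIGHT")
print(encoded)
# [63, 38, 110, 68, 20, 59, 35, 105, 54, 49, 93, 84, 50, 86, 92, 152]
print(decode(encoded))  # PHONEMETONIGHTZZ
```

The printed list matches the coded message given in the solution, and decoding returns the original letters with the two filler Zs.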
  • 71. b) First, rewrite the coded message as 2 × 2 matrices.

\[
\begin{bmatrix} 63 & 38 \\ 110 & 68 \end{bmatrix},\;
\begin{bmatrix} 20 & 59 \\ 35 & 105 \end{bmatrix},\;
\begin{bmatrix} 54 & 49 \\ 93 & 84 \end{bmatrix},\;
\begin{bmatrix} 50 & 86 \\ 92 & 152 \end{bmatrix}
\]

You can decode the message with the inverse matrix for the coding matrix.

\[ C^{-1} \times CM = C^{-1}C \times M = IM = M \]

where M is the message matrix and C is the coding matrix.

Thus, the decoding matrix, or key, is the inverse matrix of the coding matrix. For the coding matrix used in part a), the key is

\[
\begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}^{-1}
= \frac{1}{(3)(2) - (5)(1)} \begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix}
= \begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix}
\]

Multiplying the coded message by this key gives

\[
\begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix}
\begin{bmatrix} 63 & 38 \\ 110 & 68 \end{bmatrix}
= \begin{bmatrix} 16 & 8 \\ 15 & 14 \end{bmatrix}
\]
\[
\begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix}
\begin{bmatrix} 20 & 59 \\ 35 & 105 \end{bmatrix}
= \begin{bmatrix} 5 & 13 \\ 5 & 20 \end{bmatrix}
\]
\[
\begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix}
\begin{bmatrix} 54 & 49 \\ 93 & 84 \end{bmatrix}
= \begin{bmatrix} 15 & 14 \\ 9 & 7 \end{bmatrix}
\]
\[
\begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix}
\begin{bmatrix} 50 & 86 \\ 92 & 152 \end{bmatrix}
= \begin{bmatrix} 8 & 20 \\ 26 & 26 \end{bmatrix}
\]

The decoded message is 16, 8, 15, 14, 5, 13, 5, 20, 15, 14, 9, 7, 8, 20, 26, 26. Replacing each number with its corresponding letter in the alphabet gives PHONEMETONIGHTZZ, the original message with the two Zs as fillers.

Matrix multiplication and inverse matrices are the basis for many computerized encryption systems like those used for electronic transactions between banks and income tax returns filed over the Internet.

Transportation and communication networks can be represented using matrices, called network matrices. Such matrices provide information on the number of direct links between two vertices or points (such as people or places). The advantage of depicting networks using matrices is that information on indirect routes can be found by performing calculations with the network matrix.

To construct a network matrix, let each vertex (point) be represented as a row and as a column in the matrix. Use 1 to represent a direct link and 0 to represent no direct link. A vertex may be linked to another vertex in one direction or in both directions. Assume that a vertex does not link with itself, so each entry in the main diagonal is 0. Note that the network matrix provides information only on direct links.
  • 72. Example 5 Using Matrices to Model a Network

Matrixville Airlines offers flights between eight cities as shown on the right.

[Route map showing flights among Buenos Aires; Honolulu; Kingston, Jamaica; London, England; New Delhi; Paris; Toronto; and Vancouver]

a) Represent the network using a matrix, A. Organize the matrix so the cities are placed in alphabetical order.
b) Calculate $A^2$. What information does it contain?
c) How many indirect routes with exactly one change of planes are there from London to Buenos Aires?
d) Calculate $A + A^2$. What information does it contain?
e) Explain what the entry from Vancouver to Paris in $A + A^2$ represents.
f) Calculate $A^3$. Compare this calculation with the one for $A^2$.
g) Explain the significance of any entry in matrix $A^3$.

Solution

a)
\[
A = \begin{array}{c|cccccccc}
 & B & H & K & L & N & P & T & V \\ \hline
B & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\
H & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
K & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
L & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 \\
N & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
P & 1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 \\
T & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 1 \\
V & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0
\end{array}
\]

b) Since the dimensions of matrix A are 8 × 8, you may prefer to use a calculator or software for this calculation.

\[
A^2 = \begin{bmatrix}
2 & 1 & 1 & 2 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 2 & 1 & 1 & 2 & 1 & 2 \\
2 & 1 & 1 & 5 & 1 & 2 & 3 & 1 \\
1 & 0 & 1 & 1 & 2 & 1 & 2 & 1 \\
1 & 1 & 2 & 2 & 1 & 4 & 2 & 2 \\
1 & 0 & 1 & 3 & 2 & 2 & 6 & 1 \\
1 & 1 & 2 & 1 & 1 & 2 & 1 & 2
\end{bmatrix}
\]

The entries in $A^2$ show the number of indirect routes with exactly one change of planes. $A^2$ does not contain any information on direct routes.
  • 73. c) There are two indirect routes with exactly one change of planes from London to Buenos Aires.

London → Paris → Buenos Aires
London → Toronto → Buenos Aires

d)
\[
A + A^2 = \begin{bmatrix}
2 & 1 & 1 & 2 & 1 & 2 & 2 & 1 \\
1 & 1 & 1 & 1 & 0 & 1 & 1 & 1 \\
1 & 1 & 2 & 2 & 1 & 2 & 2 & 2 \\
2 & 1 & 2 & 5 & 2 & 3 & 4 & 2 \\
1 & 0 & 1 & 2 & 2 & 2 & 2 & 1 \\
2 & 1 & 2 & 3 & 2 & 4 & 3 & 2 \\
2 & 1 & 2 & 4 & 2 & 3 & 6 & 2 \\
1 & 1 & 2 & 2 & 1 & 2 & 2 & 2
\end{bmatrix}
\]

Since A shows the number of direct routes and $A^2$ shows the number of routes with one change of planes, $A + A^2$ shows the number of routes with at most one change of planes.

e) The entry in row 8, column 6 of $A + A^2$ shows that there are two routes with a maximum of one change of planes from Vancouver to Paris.

Vancouver → Toronto → Paris
Vancouver → London → Paris

f)
\[
A^3 = \begin{bmatrix}
2 & 1 & 3 & 5 & 3 & 6 & 8 & 3 \\
1 & 0 & 1 & 3 & 2 & 2 & 6 & 1 \\
3 & 1 & 2 & 8 & 3 & 4 & 9 & 2 \\
5 & 3 & 8 & 8 & 7 & 11 & 12 & 8 \\
3 & 2 & 3 & 7 & 2 & 6 & 5 & 3 \\
6 & 2 & 4 & 11 & 6 & 6 & 12 & 4 \\
8 & 6 & 9 & 12 & 5 & 12 & 8 & 9 \\
3 & 1 & 2 & 8 & 3 & 4 & 9 & 2
\end{bmatrix}
\]

The calculation of $A^3 = A^2 \times A$ is more laborious than that for $A^2 = A \times A$ since $A^2$ has substantially fewer zero entries than A does. A calculator or spreadsheet could be useful.

g) The entries in $A^3$ tell you the number of indirect routes with exactly two changes of planes between each pair of cities.
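The calculations in Example 5 can be reproduced with a short Python sketch. The list of links below is read off from the 1 entries of matrix A in the solution (not from the route map itself), and the helper names are assumptions made for illustration.

```python
cities = ["B", "H", "K", "L", "N", "P", "T", "V"]  # alphabetical, as in A

# Two-way links taken from the entries of 1 in matrix A.
links = [("B", "P"), ("B", "T"), ("H", "T"), ("K", "L"), ("K", "T"),
         ("L", "N"), ("L", "P"), ("L", "T"), ("L", "V"), ("N", "P"),
         ("P", "T"), ("T", "V")]

def network_matrix(vertices, edges):
    """1 for a direct link, 0 otherwise; diagonal stays 0."""
    index = {v: k for k, v in enumerate(vertices)}
    m = [[0] * len(vertices) for _ in vertices]
    for u, v in edges:
        m[index[u]][index[v]] = 1
        m[index[v]][index[u]] = 1  # flights run in both directions
    return m

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = network_matrix(cities, links)
A2 = matmul(A, A)                  # routes with exactly one change
A3 = matmul(A2, A)                 # routes with exactly two changes
at_most_one_change = matadd(A, A2)

i = {c: k for k, c in enumerate(cities)}
print(A2[i["L"]][i["B"]])                  # London to Buenos Aires: 2
print(at_most_one_change[i["V"]][i["P"]])  # Vancouver to Paris: 2
```

Because every link is two-way, A and all of its powers are symmetric, which gives a quick consistency check on any hand calculation.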
  • 74. Key Concepts

• To multiply two matrices, their inner dimensions must be the same. The outer dimensions give the dimensions of the resultant matrix: $A_{m \times n} \times B_{n \times p} = C_{m \times p}$. To find the entry with row i and column j of matrix AB, multiply the entries of row i of matrix A with the corresponding entries of column j of matrix B, and then add the resulting products together.

• The inverse of the 2 × 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is $A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$, provided that ad ≠ bc. Inverses of larger matrices can be found using a graphing calculator or a spreadsheet.

• To represent a network as a matrix, use a 1 to indicate a direct link and a 0 to indicate no direct link. Calculations with the square of a network matrix and its higher powers give information on the various direct and indirect routings possible.

Communicate Your Understanding

1. Explain how multiplying matrices is different from scalar multiplication of matrices.
2. Describe the steps you would take to multiply ΄ 4 −2 ΅΄ 3 1 5 0 6 4 2 −1 . 5 7 ΅
3. Is it possible to find an inverse for a matrix that is not square? Why or why not?
4. Explain why a network matrix must be square.
5. Describe how you would represent the following network as a matrix. How would you find the number of routes with up to three changeovers?

[Network diagram with vertices A, B, C, D, and E]

Practise Calculate, if possible, a) BD b) DB c) B2 d) EA A e) AC f) CE g) DA 1. Let A = ΄ 4 7 −3 −5 1΅ 0 ,B= 2 −7 ΄ ΅ 9 , 0 2. Given A = ΄ 4 −1 ΅ and B = ΄−2 3 ΅, show 2 0 ΄ ΅ ΄ ΅ 1 5 8 1 0 0 C = 2 0 −4 , D = −3 −3 −2 8 2 ΄ 5΅ 1 ,E= 2 . −3 ΄ that A2 + 2B3 = 16 −30 . ΅ 24 1
  • 75. 8. Application Calculators Galore has three 3. If A = ΄0 0 ΅ 0 1 , show that A4 = ΄ 0 0 ΅ 0 , 0 stores in Matrixville. The downtown store sold 12 business calculators, 40 scientific the 2 × 2 zero matrix. calculators, and 30 graphing calculators during the past week. The northern store 4. Let A = ΄ 5 −1 ΅, B = ΄ −2 4 ΅, C = ΄ 1 −3΅. 2 0 3 0 0 7 sold 8 business calculators, 30 scientific calculators, and 21 graphing calculators Show that during the same week, and the southern a) A(B + C) = AB + AC store sold 10 business calculators, (distributive property) 25 scientific calculators, and 23 graphing b) (AB)C = A(BC ) calculators. What were the total weekly sales (associative property) for each store if the average price of a c) AB ≠ BA business calculator is $40, a scientific (not commutative) calculator is $30, and a graphing calculator is $150? 5. Find the inverse matrix, if it exists. 9. Application The manager at Sue’s Restaurant 4 −6 a) ΄ ΅ 0 −1 2 4 b) ΄ ΅ −2 3 c) ΄ 3 −6 0 1 ΅ prepares the following schedule for the next week. d) ΄ 5 3΅ 4 2 ΄ 4 2΅ e) 10 5 Employee Mon. Tues. Chris − 8 Wed. Thurs. − 8 Fri. Sat. Sun. 8 − − Wage Per Hour $7.00 6. Use a graphing calculator or a spreadsheet Lee 4 4 − − 6.5 4 4 $6.75 to calculate the inverse matrix, if it exists. Jagjeet − 4 4 4 4 8 8 $7.75 Pierre − 3 3 3 3 8 − $6.75 ΄ ΅ 1 −3 1 Ming 8 8 8 8 − − − $11.00 a) A = −2 1 3 Bobby − − 3 5 5 8 − $8.00 0 −1 0 Nicole 3 3 3 3 3 − − $7.00 ΄ ΅ −2 0 5 Louis 8 8 8 8 8 − − $12.00 b) B = 2 −1 −1 Glenda 8 − − 8 8 8 8 $13.00 3 4 0 Imran 3 4.5 4 3 5 − − $7.75 a) Create matrix A to represent the number ΄ ΅ 2 −1 1 0 of hours worked per day for each 0 1 0 2 c) C = employee. −2 −1 0 0 b) Create matrix B to represent the hourly 1 0 −1 0 wage earned by each employee. Apply, Solve, Communicate c) Use a graphing calculator or spreadsheet to calculate the earnings of each B employee for the coming week. 7. 
For A = ΄−2 − 4 ΅ and B = ΄ 5 7΅, show that 2 5 3 4 d) What is the restaurant’s total payroll for these employees? a) (A−1)−1 = A b) (AB)−1 = B −1A−1 c) (A t )−1 = (A−1) t 1.7 Problem Solving With Matrices • MHR 75
  • 76. 10. According to a 1998 general social survey c) What is the total cost of cloth and labour conducted by Statistics Canada, the ten most for filling the order in part a)? popular sports for people at least 15 years old are as follows: 12. Use the coding matrix each message. ΄ −2 −3 ΅ to encode 2 5 Sport Total (%) Male (%) Female (%) Golf 7.4 11.1 3.9 a) BIRTHDAY PARTY FRIDAY Ice Hockey 6.2 12.0 0.5 b) SEE YOU SATURDAY NIGHT Baseball 5.5 8.0 3.1 Swimming 4.6 3.6 5.6 13. Application Use the decoding matrix ΄ −1 −3 ΅ to decode each message. Basketball 3.2 4.6 1.9 2 Volleyball 3.1 3.3 2.8 2 Soccer 3.0 4.6 1.5 a) 64, 69, 38, 45, 54, 68, 31, 44, 5, 115, 3, Tennis 2.7 3.6 1.8 70, 40, 83, 25, 49 Downhill/Alpine Skiing 2.7 2.9 2.6 b) 70, 47, 39, 31, 104, 45, 61, 25, 93, 68, 57, Cycling 2.5 3.0 2.0 44, 55, 127, 28, 76 In 1998, about 11 937 000 males and 14. a) Create a secret message about 16 to 24 12 323 000 females in Canada were at least letters long using the coding 15 years old. Determine how many males and how many females declared each of the ΄ matrix 3 5 . 1 2 ΅ above sports as their favourite. Describe how b) Trade messages with a classmate and you used matrices to solve this problem. decode each other’s messages. 11. Application A company manufacturing 15. Quality education at a school requires open designer T-shirts produces five sizes: extra- communication among many people. small, small, medium, large, and extra-large. Superintendent The material and labour needed to produce a box of 100 shirts depends on the size of the shirts. Cloth per Labour per Administration Size shirt (m2) 100 shirts (h) Extra-small 0.8 8 Small 0.9 8.5 Teachers Guidance Medium 1.2 9 Large 1.5 10 Extra-large 2.0 11 Students Parents a) How much cloth and labour are required to fill an order for 1200 small, 1500 a) Represent this network as a matrix, A. medium, 2500 large, and 2000 extra- b) Explain the meaning of any entry, ai j , of large T-shirts? matrix A. 
b) If the company pays $6.30 per square c) Describe what the sum of the entries in metre for fabric and $10.70 per hour for the third column represents. labour, find the cost per box for each size d) Calculate A2. of T-shirt. 76 MHR • Tools for Data Management
  • 77. e) How many indirect links exist with C exactly one intermediary between the 18. Inquiry/Problem Solving Create your own principal and parents? List these links. network problem, then exchange problems f) Calculate A + A2. Explain what with a classmate. Solve both problems and information this matrix provides. compare your solutions with those of your 16. Network matrices provide another approach classmate. Can you suggest any to the Koenigsberg bridges example on improvements for either set of solutions? page 44. 19. Show how you could use inverse matrices Blacksmith to solve any system of equations in two Bridge Honey Wooden variables whose matrix of coefficients has Bridge Bridge an inverse. D E G 20. Communication Research encryption Merchants C Bridge techniques on the Internet. What is meant by 128-bit encryption? How does the system A F of private and public code keys work? B High Green Bridge 21. Inquiry/Problem Solving Bridge Connecting Bridge a) Suppose you receive a coded message like the one in Example 4, but you do Use network matrices to answer the not know the coding matrix or its following questions. inverse. Describe how you could use a a) How many ways can you get from computer to break the code and decipher Honey Bridge to Connecting Bridge by the message. crossing only one of the other bridges? b) Describe three methods you could use List these routes. to make a matrix code harder to break. b) How many ways can you get from 22. a) Show that, for any m × n matrix A and Blacksmith Bridge to Connecting Bridge any n × p matrix B, (AB)t = B tA t. without crossing more than one of the other bridges? b) Show that, if a square matrix C has an inverse C –1, then C t also has an inverse, c) Is it possible to travel from Wooden and (C t )–1 = (C –1) t. Bridge to Green Bridge without crossing at least two other bridges? 17. 
Use network matrices to find the number pt ha e of VIA Rail routes from C r a) Toronto to Montréal with up to two m P r oble change-overs b) Kingston to London with up to three change-overs 1.7 Problem Solving With Matrices • MHR 77
  • 78. Review of Key Concepts 1.1 The Iterative Process 1.2 Data Management Software Refer to the Key Concepts on page 10. Refer to the Key Concepts on page 21. 1. a) Draw a tree diagram showing your direct 5. List three types of software that can be used ancestors going back four generations. for data management, giving an example of b) How many direct ancestors do you have the data analysis you could do with each in four generations? type. 2. a) Describe the algorithm used to build the 6. Evaluate each spreadsheet expression. iteration shown. a) F2+G7–A12 b) Continue the iteration for eight more where F2=5, G7= –9, and A12=F2+G7 rows. b) PROD(D3,F9) c) Describe the resulting iteration. where D3=6 and F9=5 c) SQRT(B1) MATH MATHMATH where B1=144 MATH MATH 7. Describe how to reference cells A3 to A10 MATHMATHMATHMATH MATH MATH in one sheet of a spreadsheet into cells B2 MATHMATH MATHMATH to B9 in another sheet. 3. a) Construct a Pythagoras fractal tree using 8. Use a spreadsheet to convert temperatures the following algorithm. between −30° C and 30° C to the Step 1: Construct a square. Fahrenheit scale, using the formula Step 2: Construct an isosceles right Fahrenheit = 1.8 × Celsius + 32. Describe triangle with the hypotenuse on how you would list temperatures at two- one side of the square. degree intervals in the Celsius column. Step 3: Construct a square on each of the other sides of the triangle. 1.3 Databases Repeat this process, with the newly Refer to the Key Concepts on page 31. drawn squares to a total of four 9. Describe the characteristics of a well- iterations. organized database. b) If the edges in the first square are 4 cm, determine the total area of all the squares 10. Outline a design for a database of a shoe in the fourth iteration. store’s customer list. c) Determine the total area of all the 11. a) Describe the types of data that are squares in the diagram. available from Statistics Canada’s 4. 
Design an iterative process using the percent E-STAT database. reduction capabilities of a photocopier. b) What can you do with the data once you have accessed them? 78 MHR • Tools for Data Management
  • 79. 12. What phrase would you enter into a search 18. State whether each network is engine to find i) connected a) the top-selling cookbook in Canada? ii) traceable b) the first winner of the Fields medal? iii) planar c) a list of movies in which bagpipes are a) A C b) P Q played? 1.4 Simulations U R Refer to the Key Concepts on page 39. B D 13. List three commonly used simulations and T S a reason why each is used. c) L 14. Write out the function to generate a random J M integer between 18 and 65 using K a) a graphing calculator N b) a spreadsheet 19. For each network in question 18, verify that 15. A chocolate bar manufacturer prints one of a repeating sequence of 50 brainteasers on V − E + R = 2, where V is the number of the inside of the wrapper for each of its vertices, E is the number of edges, and R is chocolate bars. Describe a manual the number of regions in a graph. simulation you could use to estimate the 20.The following is a listing of viewing requests chances of getting two chocolate bars with submitted by patrons of a classic film the same brainteaser if you treat yourself to festival. Use graph theory to set up the one of the bars every Friday for five weeks. shortest viewing schedule that has no 16. Outline how you would use technology to conflicts for any of these patrons. run a simulation 500 times for the scenario Person A: Gone With the Wind, Curse of The in question 15. Mummy, Citizen Kane Person B: Gone With the Wind, Jane Eyre 1.5 Graph Theory Person C: The Amazon Queen, West Side Refer to the Key Concepts on page 48. Story, Citizen Kane Person D: Jane Eyre, Gone With the Wind, 17. How many colours are needed to colour West Side Story each of the following maps? Person E: The Amazon Queen, Ben Hur a) b) A B C C D A B D E F E G Review of Key Concepts • MHR 79
  • 80. 21. Below is a network showing the Calculate, if possible, relationships among a group of children. a) A + C b) C − B The vertices are adjacent if the children are friends. c) A + B d) 3D 1 Sarah Mai e) −ᎏᎏ C f) 3(B + D) 2 g) A t + B h) B t + C t Deqa Priya 25. The manager of a sporting goods store takes inventory at the end of the month and finds Tanya Afra 15 basketballs, 17 volleyballs, 4 footballs, 15 baseballs, 8 soccer balls, 12 packs of a) Rewrite the network in table form. tennis balls, and 10 packs of golf balls. The b) Are these children all friends with each manager orders and receives a shipment of other? 10 basketballs, 3 volleyballs, 15 footballs, c) Who has the most friends? 20 baseballs, 12 soccer balls, 5 packs of d) Who has the fewest friends? tennis balls, and 15 packs of golf balls. During the next month, the store sells 1.6 Modelling With Matrices 17 basketballs, 13 volleyballs, 17 footballs, Refer to the Key Concepts on page 59. 12 baseballs, 12 soccer balls, 16 packs of tennis balls, and 23 packs of golf balls. ΄ ΅ 2 −1 5 a) Represent the store’s stock using three 0 4 3 22. For the matrix A = , matrices, one each for the inventory, new 7 −8 −6 stock received, and items sold. −2 9 1 b) How many of each item is in stock at the a) state the dimensions end of the month? b) state the value of entry c) At the beginning of the next month, the i) a32 ii) a13 iii) a41 manager is asked to send 20% of the c) list the entry with value store’s stock to a new branch that is about to open. How many of each item i) 3 ii) 9 iii) −1 will be left at the manager’s store? 23. Write a 4 × 3 matrix, A, with the property 26. Outline the procedure you would use to that aij = i × j for all entries. subtract one matrix from another ΄ ΅ 8 −2 a) manually 2 −1 , B = 24. Given A = ΄ 3 −7 0 5 ΅ 3 4 , 2 5 b) using a graphing calculator c) using a spreadsheet ΄ ΅ 4 3 1 −4 , and D = −1 C= ΄ −5 6 9 0 ΅ 6 7 . 2 80 MHR • Introduction to Probability
  • 81. 1.7 Problem Solving With Matrices 31. a) Write an equation to show the Refer to the Key Concepts on page 74. relationship between a matrix and its inverse. ΄ ΅ −1 27. Let A = ΄ −6 5΅, B = ΄ −5 7 ΅, 4 3 1 0 b) Show that 20 1.5 0 −1.5 −13 is the −7.5 0.5 5 ΄ ΅ ΄ ΅ 3 6 −1 5 ΄ ΅ C = 2 0 4 , and D = 4 . 4 2 6 −5 −2 8 −3 inverse of 10 0 2 . Calculate, if possible, 5 3 9 a) AB b) BA c) A2 d) DC e) C 2 c) Find the inverse of ΄ 4 5 ΅. 2 3 28. a) Write the transpose of matrices 32. The following diagram illustrates the food ΄ ΅ A = 1 5 and B = 0 4 . 8 −2 ΄ 6 −1 ΅ chains in a pond. b) Show whether (AB) = B tA t. t Plants 29. A small accounting firm charges $50 per Small Fish Large Fish hour for preparing payrolls, $60 per hour for corporate tax returns, and $75 per hour for audited annual statements. The firm did Snails Bacteria the following work for three of its clients: XYZ Limited, payrolls 120 hours, tax a) Represent these food chains as a network returns 10 hours, auditing 10 hours matrix, A. YZX Limited, payrolls 60 hours, tax b) Calculate A2. returns 8 hours, auditing 8 hours c) How many indirect links with exactly ZXY Limited, payrolls 200 hours, tax one intermediate step are there from returns 15 hours, auditing 20 hours plants to snails? a) Use matrices to determine how much the d) Calculate A + A2. Explain the meaning accounting firm should bill each client. of any entry in the resulting matrix. b) How can you determine the total billed e) Calculate A3. to the three clients? f) List all the links with two intermediate 30. Suppose you were to encode a message by steps from plants to bacteria. writing it in matrix form and multiplying by a coding matrix. Would your message be more secure if you then multiplied the resulting matrices by another coding matrix with the same dimensions as the first one? Explain why or why not. Review of Key Concepts • MHR 81
  • 82. Chapter Test

ACHIEVEMENT CHART
Category     Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication              Application
Questions    All                       8, 9, 14                           1, 2, 5, 6, 7, 8, 9, 14    9, 10, 13, 14

1. a) Describe an iterative process you could use to draw the red path.
   b) Complete the path.

2. Find the first few terms of the recursion formula \(t_n = \frac{1}{t_{n-1} + 2}\), given \(t_1 = 0\). Is there a pattern to these terms? If so, describe the pattern.

3. A “fan-out” calling system is frequently used to spread news quickly to a large number of people such as volunteers for disaster relief. The first person calls three people. Each of those people calls an additional three people, each of whom calls an additional three people, and so on.
   a) Use a tree diagram to illustrate a fan-out calling system with sufficient levels to call 50 people.
   b) How many levels would be sufficient to call 500 people?

4. Rewrite each of the following expressions as spreadsheet functions.
   a) C1+C2+C3+C4+C5+C6+C7+C8
   b) The smallest value between cells A5 and G5
   c) \(\frac{5 - \sqrt{6}}{10 + 15}\)

5. Suppose that, on January 10, you borrowed $1000 at 6% per year compounded monthly (0.5% per month). You will be expected to repay $88.88 a month for 1 year. However, the final payment will be less than $88.88. You set up a spreadsheet with the following column headings: MONTH, BALANCE, PAYMENT, INTEREST, PRINCIPAL, NEW BALANCE. The first row of entries would be:
   MONTH: February
   BALANCE: 1000.00
   PAYMENT: 88.88
   INTEREST: 5.00
   PRINCIPAL: 83.88
   NEW BALANCE: 916.12
   Describe how you would
   a) use the cell referencing formulas and the Fill feature to complete the table
   b) determine the size of the final payment on January 10 of the following year
   c) construct a line graph showing the declining balance

6. Describe how you would design a database of the daily travel logs for a company’s salespersons.

7. Describe three different ways to generate random integers between 1 and 50.

8. a) Redraw this map as a network.
   b) How many colours are needed to colour the map? Explain your reasoning.
  • 83. 9. A salesperson must visit each of the towns b) What is the value of entry a23? on the following map. c) Identify the entry of matrix A with Pinkford 55 Orangeton value −2. 67 Blacktown d) Is it possible to calculate A2? Explain. 50 60 55 ΄΅ 46 2 Blueton Brownhill Redville 35 86 12. Let A = 1 , B = [7 5 0], C = 4 8 , 5 5 −3 ΄ ΅ 53 38 40 ΄ ΅ Whiteford 49 Greenside 8 −2 a) Is there a route that goes through each ΄ 9 1 ΅ D = 2 −7 , and E = 5 0 . −4 1 town only once? Explain. b) Find the shortest route that begins and Calculate, if possible, ends in Pinkford and goes through all the a) 2C + D b) A + B c) AD d) EC e) E t towns. Show that it is the shortest route. 13. A local drama club staged a variety show for 10. The following map four evenings. The admission for adults was shows the bridges $7.00, for students $4.00, and for children of Uniontown, 13 years of age and under $2.00. On situated on the Wednesday, 52 adult tickets, 127 student banks of a river and on three islands. Use tickets, and 100 child tickets were sold; on graph theory to determine if a continuous Thursday, 67 adult tickets, 139 student path could traverse all the bridges once each. tickets, and 115 child tickets were sold; on Friday, 46 adult tickets, 115 student tickets, ΄ ΅ 4 −2 6 –8 5 9 and 102 child tickets were sold; and on 11. Let A = . Saturday, 40 adult tickets, 101 student 0 1 −1 3 −7 −3 tickets, and 89 child tickets were sold. Use matrices to calculate how much money was a) State the dimensions of matrix A. collected from admissions. ACHIEVEMENT CHECK Knowledge/Understanding Thinking/Inquiry/Problem Solving Communication Application 14. The network diagram below gives the cost of flights between Montréal $579 $249 five Canadian cities. $469 a) Construct a network matrix A for these routes. $199 Halifax Vancouver $269 b) Calculate A2 and A3. Winnipeg $438 $398 c) How many ways can a person travel from Halifax to $508 Vancouver by changing planes exactly twice? 
Describe Toronto each route. Which route is most economical? Chapter Test • MHR 83
  • 84. Tools for Data Management Project: Wrap-Up

Implementing Your Action Plan
1. With your whole class or a small group, brainstorm criteria for ranking universities and community colleges. List the three universities or colleges that you think will most likely be the best choices for you.
2. Have a class discussion on weighting systems.
3. Look up the Maclean’s university and community college rankings in a library or on the Internet. Note the criteria that Maclean’s uses.
4. Determine your own set of criteria. These may include those that Maclean’s uses or others, such as travelling distances, programs offered, size of the city or town where you would be living, and opportunities for part-time work.
5. Choose the ten criteria you consider most important. Research any data you need to rate universities and colleges with these criteria.
6. Assign a weighting factor to each of the ten criteria. For example, living close to home may be worth a weighting of 5 and tuition cost may be worth a weighting of 7.
7. Use a spreadsheet and matrix methods to determine an overall score for each university or community college in Ontario. Then, rank the universities or community colleges on the spreadsheet. Compare your rankings with those in Maclean’s magazine. Explain the similarities or differences.
8. From your rankings, select the top five universities or community colleges. Draw a diagram of the distances from each university or college to the four others and to your home. Then, use graph theory to determine the most efficient way to visit each of the five universities or community colleges during a five-day period, such as a March break vacation.
9. Based on your project, select your top three choices. Comment on how this selection compares with your original list of top choices.

Suggested Resources
• Maclean’s magazine rankings of universities and community colleges
• Other publications ranking universities and community colleges
• University and community college calendars
• Guidance counsellors
• Map of Ontario
• Spreadsheets

Refer to section 9.3 for information on implementing an action plan and Appendix C for information on research techniques.

www.mcgrawhill.ca/links/MDM12
For details of the Maclean’s rankings of universities and colleges, visit the above web site and follow the links.

84 MHR • Tools for Data Management Project
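The weighted scoring described in steps 6 and 7 amounts to multiplying a ratings matrix by a column of weights. The sketch below illustrates the idea with only two criteria, using the weightings 5 and 7 mentioned in step 6. The school names and the ratings out of 10 are hypothetical placeholders, not data from the project.

```python
# Weighted overall scores: each school's score is the sum of rating x weight.
# ASSUMPTION: the schools and their ratings (out of 10) are made up for
# illustration; only the weightings 5 and 7 come from step 6 of the project.
criteria = ["close to home", "tuition cost"]
weights = [5, 7]

ratings = {
    "School A": [8, 6],
    "School B": [4, 9],
    "School C": [7, 7],
}

def overall_scores(ratings, weights):
    """Multiply each row of ratings by the weights and add the products."""
    return {
        name: sum(r * w for r, w in zip(row, weights))
        for name, row in ratings.items()
    }

scores = overall_scores(ratings, weights)

# Rank from highest overall score to lowest.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(name, score)
```

A full version of the project would use ten criteria, so each row of ratings and the list of weights would have ten entries, but the calculation stays the same.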
  • 85. Evaluating Your Project
1. Reflect on your weighting formula and whether you believe it fairly ranks the universities and community colleges in Ontario.
2. Compare your rating system to that used by one of your classmates. Can you suggest improvements to either system?
3. What went well in this project?
4. If you were to do the project over again, what would you change? Why?
5. If you had more time, how would you extend this project?
6. What factors could change between now and when you make your final decision about which university or college to attend?

Presentation
Prepare a written report on your findings. Include
• the raw data
• a rationale for your choice of criteria
• a rationale for your weightings
• a printout of your spreadsheet
• a diagram showing the distances between your five highest-ranked universities or community colleges and the route you would use to visit them
• a summary of your findings

Preparing for the Culminating Project

Applying Project Skills
Consider how the data management tools you used on this project could be applied to the culminating project in Chapter 9 to
• access resources
• carry out research
• carry out an action plan
• evaluate your project
• summarize your findings in a written report

Keeping on Track
Now is a good time to draw up a schedule for your culminating project and to investigate methods for selecting a topic. Refer to Chapter 9 for an overview of how to prepare a major project. Section 9.1 suggests methods for choosing a topic. Also, consider how to find the information you will need in order to choose your topic.

[Flow chart: Define the Problem → Define Your Task → Develop an Action Plan → Implement Your Action Plan → Evaluate Your Investigation and Its Results → Prepare Written Report → Present Your Investigation and Its Results → Constructively Critique the Presentations of Others, with a Refine/Redefine loop]

Tools for Data Management Project: Wrap-Up • MHR 85
  • 86. Career Connection
Cryptographer

In this digital era, information is sent with blinding speed around the world. These transmissions need to be both secure and accurate. Although best known for their work on secret military codes, cryptographers also design and test computerized encryption systems that protect a huge range of sensitive data including telephone conversations among world leaders, business negotiations, data sent by credit-card readers in retail stores, and financial transactions on the Internet. Encrypted passwords prevent hackers from reading or disrupting critical databases. Even many everyday devices, such as garage-door openers and TV remote controls, use codes.

Cryptographers also develop error-correcting codes. Adding these special codes to a signal allows a computer receiving it to detect and correct errors that have occurred during transmission. Such codes have numerous applications including CD players, automotive computers, cable TV networks, and pictures sent back to Earth by interplanetary spacecraft.

Modern cryptography is a marriage of mathematics and computers. A cryptographer must have a background in logic, matrices, combinatorics, and computer programming as well as fractal, chaos, number, and graph theory. Cryptographers work for a wide variety of organizations including banks, government offices, the military, software developers, and universities.

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links for more information about a career as a cryptographer and about other careers related to mathematics.

86 MHR • Tools for Data Management
  • 87. Statistics Project
Life Expectancies

Background
Do women live longer than men? Do people live longer in warmer climates? Are people living longer today than 50 years ago? Do factors such as education and income affect life expectancy? In this project, you will answer such questions by applying the statistical techniques described in the next two chapters.

Your Task
Research and analyse current data on life expectancies in Canada, and perhaps in other countries. You will use statistical analysis to compare and contrast the data, detect trends, predict future life expectancies, and identify factors that may affect life expectancies.

Developing an Action Plan
You will need to find sources of data on life expectancies and to choose the kinds of comparisons you want to make. You will also have to decide on a method for handling the data and appropriate techniques for analysing them.
  • 88. CHAPTER 2
Statistics of One Variable

Specific Expectations                                                                                   Section

Locate data to answer questions of significance or personal interest, by searching well-organized databases.   2.2

Use the Internet effectively as a source for databases.   2.2

Demonstrate an understanding of the purpose and the use of a variety of sampling techniques.   2.3, 2.4

Describe different types of bias that may arise in surveys.   2.4

Illustrate sampling bias and variability by comparing the characteristics of a known population with the characteristics of samples taken repeatedly from that population, using different sampling techniques.   2.4, 2.5, 2.6

Organize and summarize data from secondary sources, using technology.   2.1, 2.2, 2.5, 2.6

Compute, using technology, measures of one-variable statistics (i.e., the mean, median, mode, range, interquartile range, variance, and standard deviation), and demonstrate an understanding of the appropriate use of each measure.   2.5, 2.6

Interpret one-variable statistics to describe characteristics of a data set.   2.5, 2.6

Describe the position of individual observations within a data set, using z-scores and percentiles.   2.6

Explain examples of the use and misuse of statistics in the media.   2.4

Assess the validity of conclusions made on the basis of statistical studies, by analysing possible sources of bias in the studies and by calculating and interpreting additional statistics, where possible.   2.5, 2.6

Explain the meaning and the use in the media of indices based on surveys.   2.2
  • 89. In earlier times they had no statistics, and so they had to fall back on lies. Hence the huge exaggerations of primitive literature: giants or miracles or wonders! They did it with lies and we do it with statistics; but it is all the same.
Stephen Leacock (1869–1944)

Facts are stubborn, but statistics are more pliable.
Mark Twain (1835–1910)

Chapter Problem
Contract Negotiations

François is a young NHL hockey player whose first major-league contract is up for renewal. His agent wants to bargain for a better salary based on François’ strong performance over his first five seasons with the team. Here are some of François’ statistics for the past five seasons.

Season   Games   Goals   Assists   Points
1          20       3        4        7
2          45       7       11       18
3          76      19       25       44
4          80      19       37       56
5          82      28       36       64
Total     303      76      113      189

1. How could François’ agent use these statistics to argue for a substantial pay increase for his client?
2. Are there any trends in the data that the team’s manager could use to justify a more modest increase?

As these questions suggest, statistics could be used to argue both for and against a large salary increase for François. However, the statistics themselves are not wrong or contradictory. François’ agent and the team’s manager will, understandably, each emphasize only the statistics that support their bargaining positions. Such selective use of statistics is one reason why they sometimes receive negative comments such as the quotations above. Also, even well-intentioned researchers sometimes inadvertently use biased methods and produce unreliable results. This chapter explores such sources of error and methods for avoiding them. Properly used, statistical analysis is a powerful tool for detecting trends and drawing conclusions, especially when you have to deal with large sets of data.
  • 90. Review of Prerequisite Skills
If you need help with any of the skills listed in purple below, refer to Appendix A.

1. Fractions, percents, decimals: The following amounts are the total cost for the items including the 7% goods and services tax (GST) and an 8% provincial sales tax (PST). Determine the price of each item.
   a) watch $90.85
   b) CD $19.54
   c) bicycle $550.85
   d) running shoes $74.39

2. Fractions, percents, decimals:
   a) How much will Josh make if he receives an 8% increase on his pay of $12.50/h?
   b) What is the net increase in Josh’s take-home pay if the payroll deductions total 17%?

3. Fractions, percents, decimals: What is the percent reduction on a sweater marked down from $50 to $35?

4. Fractions, percents, decimals: Determine the cost, including taxes, of a VCR sold at a 25% discount from its original price of $219.

5. Mean, median, mode: Calculate the mean, median, and mode for each set of data.
   a) 22, 26, 28, 27, 26
   b) 11, 19, 14, 23, 16, 26, 30, 29
   c) 10, 18, 30, 43, 18, 13, 10
   d) 70, 30, 25, 52, 12, 70
   e) 370, 260, 155, 102, 126, 440
   f) 24, 32, 37, 24, 32, 38, 32, 36, 35, 42

6. Graphing data: Consider the following graph, which shows the average price of thingamajigs over time.
   [Graph: Price of Thingamajigs ($), from 1.40 to 1.90, versus Year, from 1996 to 2001]
   a) What was the price of thingamajigs in 1996?
   b) In what year did the price first rise above $1.50?
   c) Describe the overall trend over the time period shown.
   d) Estimate the percent increase in the price of thingamajigs from 1996 to 2001.
   e) List the domain and range of these data.

7. Graphing data: The table below gives the number of CDs sold at a music store on each day of the week for one week.

   Day          Number of CDs Sold
   Monday        48
   Tuesday       52
   Wednesday     44
   Thursday      65
   Friday       122
   Saturday     152
   Sunday        84

   Display the data on a circle graph.

90 MHR • Statistics of One Variable
  • 91. 2.1 Data Analysis With Graphs

Statistics is the gathering, organization, analysis, and presentation of numerical information. You can apply statistical methods to almost any kind of data. Researchers, advertisers, professors, and sports announcers all make use of statistics. Often, researchers gather large quantities of data since larger samples usually give more accurate results. The first step in the analysis of such data is to find ways to organize, analyse, and present the information in an understandable form.

INVESTIGATE & INQUIRE: Using Graphs to Analyse Data
1. Work in groups or as a class to design a fast and efficient way to survey your class about a simple numerical variable, such as the students’ heights or the distances they travel to school.
2. Carry out your survey and record all the results in a table.
3. Consider how you could organize these results to look for any trends or patterns. Would it make sense to change the order of the data or to divide them into groups? Prepare an organized table and see if you can detect any patterns in the data. Compare your table to those of your classmates. Which methods work best? Can you suggest improvements to any of the tables?
4. Make a graph that shows how often each value or group of values occurs in your data. Does your graph reveal any patterns in the data? Compare your graph to those drawn by your classmates. Which graph shows the data most clearly? Do any of the graphs have other advantages? Explain which graph you think is the best overall.
5. Design a graph showing the total of the frequencies of all values of the variable up to a given amount. Compare this cumulative-frequency graph to those drawn by your classmates. Again, decide which design works best and look for ways to improve your own graph and those of your classmates.

The unprocessed information collected for a study is called raw data. The quantity being measured is the variable. A continuous variable can have any value within a given range, while a discrete variable can have only certain separate values (often integers). For example, the height of students in your school is a continuous variable, but the number in each class is a discrete variable. Often, it is useful to know how frequently the different values of a variable occur in a set of data. Frequency tables and frequency diagrams can give a convenient overview of the distribution of values of the variable and reveal trends in the data.

2.1 Data Analysis With Graphs • MHR 91
  • 92. A histogram is a special form of bar graph in which the areas of the bars are proportional to the frequencies of the values of the variable. The bars in a histogram are connected and represent a continuous range of values. Histograms are used for variables whose values can be arranged in numerical order, especially continuous variables, such as weight, temperature, or travel time. Bar graphs can represent all kinds of variables, including the frequencies of separate categories that have no set order, such as hair colour or citizenship. A frequency polygon can illustrate the same information as a histogram or bar graph. To form a frequency polygon, plot frequencies versus variable values and then join the points with straight lines.

[Figures: Histogram of Frequency versus Travel Time to School (min); Bar Graph of Frequency versus Hair Colour (Red, Blond, Brown, Black, Purple, Green); Frequency Polygon of Frequency versus Travel Time to School (min)]

A cumulative-frequency graph or ogive shows the running total of the frequencies from the lowest value up.

[Figure: cumulative-frequency graph of Cumulative Frequency versus Travel Time to School (min)]

www.mcgrawhill.ca/links/MDM12
To learn more about histograms, visit the above web site and follow the links. Write a short description of how to construct a histogram.

Example 1 Frequency Tables and Diagrams
Here are the sums of the two numbers from 50 rolls of a pair of standard dice.

11   4   4  10   8   7   6   6   5  10
 7   9   8   8   4   7   9  11  12  10
 3   7   6   9   5   8   6   8   2   6
 7   5  11   2   5   5   6   6   5   2
10   9   6   5   5   5   3   9   8   2

a) Use a frequency table to organize these data.
b) Are any trends or patterns apparent in this table?
c) Use a graph to illustrate the information in the frequency table.
  • 93. d) Create a cumulative-frequency table and graph for the data.
e) What proportion of the data has a value of 6 or less?

Solution
a) Go through the data and tally the frequency of each value of the variable as shown in the table below.

   Sum   Tally          Frequency
    2    ||||            4
    3    ||              2
    4    |||             3
    5    ||||/ ||||      9
    6    ||||/ |||       8
    7    ||||/           5
    8    ||||/ |         6
    9    ||||/           5
   10    ||||            4
   11    |||             3
   12    |               1

b) The table does reveal a pattern that was not obvious from the raw data. From the frequency column, notice that the middle values tend to be the most frequent while the high and low values are much less frequent.

c) The bar graph or frequency polygon makes the pattern in the data more apparent.

   [Figures: bar graph and frequency polygon of Frequency versus Sum, for sums from 2 to 12]

d) Add a column for cumulative frequencies to the table. Each value in this column is the running total of the frequencies of each sum up to and including the one listed in the corresponding row of the sum column. Graph these cumulative frequencies against the values of the variable.

   Sum   Tally          Frequency   Cumulative Frequency
    2    ||||            4            4
    3    ||              2            6
    4    |||             3            9
    5    ||||/ ||||      9           18
    6    ||||/ |||       8           26
    7    ||||/           5           31
    8    ||||/ |         6           37
    9    ||||/           5           42
   10    ||||            4           46
   11    |||             3           49
   12    |               1           50

   [Figure: cumulative-frequency graph of Cumulative Frequency versus Sum, for sums from 2 to 12]

e) From either the cumulative-frequency column or the diagram, you can see that 26 of the 50 outcomes had a value of 6 or less.
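The tallying and running totals in Example 1 can also be checked with a short program. The following is a minimal sketch in Python, which is not one of the tools used in this book. The function name `frequency_table` is chosen here for illustration only.

```python
from collections import Counter
from itertools import accumulate

# Sums from the 50 rolls of a pair of dice listed in Example 1.
rolls = [11, 4, 4, 10, 8, 7, 6, 6, 5, 10,
         7, 9, 8, 8, 4, 7, 9, 11, 12, 10,
         3, 7, 6, 9, 5, 8, 6, 8, 2, 6,
         7, 5, 11, 2, 5, 5, 6, 6, 5, 2,
         10, 9, 6, 5, 5, 5, 3, 9, 8, 2]

def frequency_table(data):
    """Return (value, frequency, cumulative frequency) rows in ascending order."""
    counts = Counter(data)                 # tally each value
    values = sorted(counts)                # only values that actually occur
    freqs = [counts[v] for v in values]
    return list(zip(values, freqs, accumulate(freqs)))

table = frequency_table(rolls)
for value, freq, cum in table:
    print(f"{value:>3} {freq:>3} {cum:>3}")

# Part e): number and proportion of outcomes with a sum of 6 or less.
cumulative_by_value = {value: cum for value, _, cum in table}
cum_at_6 = cumulative_by_value[6]
print(cum_at_6, "of", len(rolls), "=", cum_at_6 / len(rolls))
```

The last column printed is the running total, so the row for a sum of 6 gives the answer to part e) directly.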
  • 94. When the number of measured values is large, data are usually grouped into classes or intervals, which make tables and graphs easier to construct and interpret. Generally, it is convenient to use from 5 to 20 equal intervals that cover the entire range from the smallest to the largest value of the variable. The interval width should be an even fraction or multiple of the measurement unit for the variable. Technology is particularly helpful when you are working with large sets of data.

See Appendix B for more detailed information about technology functions and keystrokes.

Example 2 Working With Grouped Data
This table lists the daily high temperatures in July for a city in southern Ontario.

Day                 1   2   3   4   5   6   7   8   9  10  11
Temperature (°C)   27  25  24  30  32  31  29  24  22  19  21

Day                12  13  14  15  16  17  18  19  20  21  22
Temperature (°C)   25  26  31  33  33  30  29  27  28  26  27

Day                23  24  25  26  27  28  29  30  31
Temperature (°C)   22  18  20  25  26  29  32  31  28

a) Group the data and construct a frequency table, a histogram or frequency polygon, and a cumulative-frequency graph.
b) On how many days was the maximum temperature 25°C or less? On how many days did the temperature exceed 30°C?

Solution 1 Using a Graphing Calculator
a) The range of the data is 33°C − 18°C = 15°C. You could use five 3-degree intervals, but then many of the recorded temperatures would fall on the interval boundaries. You can avoid this problem by using eight 2-degree intervals with the lower limit of the first interval at 17.5°C. The upper limit of the last interval will be 33.5°C.

Use the STAT EDIT menu to make sure that lists L1 to L4 are clear, and then enter the temperature data into L1. Use STAT PLOT to turn on Plot1 and select the histogram icon. Next, adjust the window settings. Set Xmin and Xmax to the lower and upper limits for your intervals and set Xscl to the interval width. Ymin should be 0. Press GRAPH to display the histogram, then adjust Ymax and Yscl, if necessary.
  • 95. You can now use the TRACE instruction and the arrow keys to determine the tally for each of the intervals. Enter the midpoints of the intervals into L2 and the tallies into L3. Turn off Plot1 and set up Plot2 as an x-y line plot of lists L2 and L3 to produce a frequency polygon.

Use the cumSum( function from the LIST OPS menu to find the running totals of the frequencies in L3 and store the totals in L4. Now, an x-y line plot of L2 and L4 will produce a cumulative-frequency graph.

b) Since you know that all the temperatures were in whole degrees, you can see from the cumulative frequencies in L4 that there were 11 days on which the maximum temperature was no higher than 25°C. You can also get this information from the cumulative-frequency graph.

You cannot determine the exact number of days with temperatures over 30°C from the grouped data because temperatures from 29.5°C to 31.5°C are in the same interval. However, by interpolating the cumulative-frequency graph, you can see that there were about 6 days on which the maximum temperature was 31°C or higher.

Solution 2 Using a Spreadsheet
a) Enter the temperature data into column A and the midpoints of the intervals into column B. Use the COUNTIF function in column C to tally the cumulative frequency for each interval. If you use absolute cell referencing, you can copy the formula down the column and then change just the upper limit in the counting condition.

Next, find the frequency for each interval by finding the difference between its cumulative frequency and the one for the previous interval. You can then use the Chart feature to produce a frequency polygon by graphing columns B and D. Similarly, charting columns B and C will produce a cumulative-frequency graph.
  • 96. In Corel® Quattro® Pro, you can also use the Histogram tool in the Tools/Numeric Tools/Analysis menu to automatically tally the frequencies and cumulative frequencies.

b) As in the solution using a graphing calculator, you can see from the cumulative frequencies that there were 11 days on which the maximum temperature was no higher than 25°C. Also, you can estimate from the cumulative-frequency graph that there were 6 days on which the maximum temperature was 31°C or higher. Note that you could use the COUNTIF function with the raw data to find the exact number of days with temperatures over 30°C.
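The grouping in Example 2 can be reproduced outside a calculator or spreadsheet. The sketch below uses Python, which is an assumption of this illustration rather than a tool from the example, with the same eight 2-degree intervals starting at 17.5°C. The function name `grouped_frequencies` is hypothetical.

```python
from itertools import accumulate

# Daily high temperatures (°C) for July, days 1 to 31, from Example 2.
temps = [27, 25, 24, 30, 32, 31, 29, 24, 22, 19, 21,
         25, 26, 31, 33, 33, 30, 29, 27, 28, 26, 27,
         22, 18, 20, 25, 26, 29, 32, 31, 28]

LOWER, WIDTH, N_INTERVALS = 17.5, 2, 8   # intervals 17.5-19.5, ..., 31.5-33.5

def grouped_frequencies(data, lower, width, n_intervals):
    """Tally data into equal intervals starting at `lower`."""
    freqs = [0] * n_intervals
    for x in data:
        k = int((x - lower) // width)     # index of the interval containing x
        if 0 <= k < n_intervals:
            freqs[k] += 1
    return freqs

freqs = grouped_frequencies(temps, LOWER, WIDTH, N_INTERVALS)
midpoints = [LOWER + WIDTH * (k + 0.5) for k in range(N_INTERVALS)]
cumulative = list(accumulate(freqs))      # running totals, like cumSum(

for m, f, c in zip(midpoints, freqs, cumulative):
    print(f"{m:5.1f} {f:3d} {c:3d}")

# Part b): exact counts taken from the raw data, like COUNTIF on column A.
days_25_or_less = sum(1 for t in temps if t <= 25)
days_over_30 = sum(1 for t in temps if t > 30)
print(days_25_or_less, days_over_30)
```

Counting the raw data gives 11 days at 25°C or less, matching the cumulative frequencies, and exactly 7 days over 30°C, close to the estimate of about 6 read from the cumulative-frequency graph.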
  • 97. A relative-frequency table or diagram shows the frequency of a data group as a fraction or percent of the whole data set.

Project Prep
You may find frequency-distribution diagrams useful for your statistics project.

Example 3 Relative-Frequency Distribution
Here are a class’ scores obtained on a data-management examination.

78  81  55  60  65  86  44  90
77  71  62  39  80  72  70  64
88  73  61  70  75  96  51  73
59  68  65  81  78  67

a) Construct a frequency table that includes a column for relative frequency.
b) Construct a histogram and a frequency polygon.
c) Construct a relative-frequency histogram and a relative-frequency polygon.
d) What proportion of the students had marks between 70% and 79%?

Solution
a) The lowest and highest scores are 39% and 96%, which give a range of 57%. An interval width of 5 is convenient, so you could use 13 intervals as shown here. To determine the relative frequencies, divide the frequency by the total number of scores. For example, the relative frequency of the first interval is 1/30, showing that approximately 3% of the class scored between 34.5% and 39.5%.

   Score (%)    Midpoint   Tally      Frequency   Relative Frequency
   34.5–39.5    37         |          1           0.033
   39.5–44.5    42         |          1           0.033
   44.5–49.5    47         –          0           0
   49.5–54.5    52         |          1           0.033
   54.5–59.5    57         ||         2           0.067
   59.5–64.5    62         ||||       4           0.133
   64.5–69.5    67         ||||       4           0.133
   69.5–74.5    72         ||||/ |    6           0.200
   74.5–79.5    77         ||||       4           0.133
   79.5–84.5    82         |||        3           0.100
   84.5–89.5    87         ||         2           0.067
   89.5–94.5    92         |          1           0.033
   94.5–99.5    97         |          1           0.033

b) The frequency polygon can be superimposed onto the same grid as the histogram.

   [Figure: histogram with superimposed frequency polygon of Frequency versus Score, with interval midpoints from 37 to 97]
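The relative-frequency column in part a) is each frequency divided by the total number of scores. As a hedged illustration, the Python sketch below rebuilds the table for the 13 intervals of width 5 starting at 34.5. The function name `relative_frequency_table` is hypothetical, and Python itself is an assumption of this sketch.

```python
# Examination scores (%) for the 30 students in Example 3.
scores = [78, 81, 55, 60, 65, 86, 44, 90,
          77, 71, 62, 39, 80, 72, 70, 64,
          88, 73, 61, 70, 75, 96, 51, 73,
          59, 68, 65, 81, 78, 67]

def relative_frequency_table(data, lower, width, n_intervals):
    """Return rows of (lower limit, upper limit, midpoint, frequency, relative frequency)."""
    n = len(data)
    rows = []
    for k in range(n_intervals):
        lo = lower + k * width
        hi = lo + width
        freq = sum(1 for x in data if lo <= x < hi)   # scores inside this interval
        rows.append((lo, hi, (lo + hi) / 2, freq, freq / n))
    return rows

rows = relative_frequency_table(scores, lower=34.5, width=5, n_intervals=13)
for lo, hi, mid, freq, rel in rows:
    print(f"{lo:5.1f}-{hi:5.1f} {mid:5.1f} {freq:3d} {rel:6.3f}")

# Proportion of marks in the 70s: intervals 69.5-74.5 and 74.5-79.5.
proportion_70s = rows[7][4] + rows[8][4]
print(round(proportion_70s, 3))
```

Because the scores are whole numbers and the interval limits end in .5, no score can fall on a boundary, so each score is counted in exactly one interval.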
  • 98. c) Draw the relative-frequency histogram and the relative-frequency polygon using the same procedure as for a regular histogram and frequency polygon. As you can see, the only difference is the scale of the y-axis.

   [Figure: relative-frequency histogram and polygon of Relative Frequency versus Score, with interval midpoints from 37 to 97]

d) To determine the proportion of students with marks in the 70s, add the relative frequencies of the interval from 69.5 to 74.5 and the interval from 74.5 to 79.5:

   0.200 + 0.133 = 0.333

   Thus, 33% of the class had marks between 70% and 79%.

Categorical data are given labels rather than being measured numerically. For example, surveys of blood types, citizenship, or favourite foods all produce categorical data. Circle graphs (also known as pie charts) and pictographs are often used instead of bar graphs to illustrate categorical data.

Example 4 Presenting Categorical Data
The table below shows Canadians’ primary use of the Internet in 1999.

Primary Use                        Households (%)
E-mail                             15.8
Electronic banking                  4.2
Purchase of goods and services      3.6
Medical or health information       8.6
Formal education/training           5.8
Government information              7.8
Other specific information         14.7
General browsing                   14.2
Playing games                       6.7
Chat groups                         4.7
Other Internet services             5.8
Obtaining music                     5.0
Listening to the radio              3.1

Illustrate these data with
a) a circle graph
b) a pictograph
  • 99. Solution
a) [Circle graph: Home Internet Use, with one sector for each primary use, labelled with the category name and its percent of households]

b) There are numerous ways to represent the data with a pictograph. The one shown here has the advantages of being simple and visually indicating that the data involve computers.

   [Pictograph: Home Internet Use, with one row of symbols for each primary use. Each symbol represents 2% of households.]

You can see from the example above that circle graphs are good for showing the sizes of categories relative to the whole and to each other. Pictographs can use a wide variety of visual elements to clarify the data and make the graph more interesting. However, with both circle graphs and pictographs, the relative frequencies for the categories can be hard to read accurately. While a well-designed pictograph can be a useful tool, you will sometimes see pictographs with distorted or missing scales or confusing graphics.
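To draw the circle graph by hand, each percent must be converted to a sector angle, and the pictograph needs a number of symbols for each category. The Python sketch below shows one way to do both calculations. The function names are hypothetical, and the rule of 2% per symbol follows the pictograph key in Example 4.

```python
# Primary use of the Internet (% of households), from the table in Example 4.
internet_use = {
    "E-mail": 15.8,
    "Electronic banking": 4.2,
    "Purchase of goods and services": 3.6,
    "Medical or health information": 8.6,
    "Formal education/training": 5.8,
    "Government information": 7.8,
    "Other specific information": 14.7,
    "General browsing": 14.2,
    "Playing games": 6.7,
    "Chat groups": 4.7,
    "Other Internet services": 5.8,
    "Obtaining music": 5.0,
    "Listening to the radio": 3.1,
}

def sector_angles(percents):
    """Sector angle in degrees for each category of a circle graph."""
    total = sum(percents.values())
    return {label: 360 * p / total for label, p in percents.items()}

def pictograph_symbols(percents, percent_per_symbol=2):
    """Number of symbols (possibly fractional) for each category."""
    return {label: p / percent_per_symbol for label, p in percents.items()}

angles = sector_angles(internet_use)
symbols = pictograph_symbols(internet_use)
for label in internet_use:
    print(f"{label:32s} {angles[label]:6.1f} degrees {symbols[label]:5.2f} symbols")
```

Dividing by the total, rather than assuming the percents add to exactly 100, keeps the angles adding to 360° even when the data have been rounded.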
  • 100. Key Concepts
• Variables can be either continuous or discrete.
• Frequency-distribution tables and diagrams are useful methods of summarizing large amounts of data.
• When the number of measured values is large, data are usually grouped into classes or intervals. This technique is particularly helpful with continuous variables.
• A frequency diagram shows the frequencies of values in each individual interval, while a cumulative-frequency diagram shows the running total of frequencies from the lowest interval up.
• A relative-frequency diagram shows the frequency of each interval as a proportion of the whole data set.
• Categorical data can be presented in various forms, including bar graphs, circle graphs (or pie charts), and pictographs.

Communicate Your Understanding
1. a) What information does a histogram present?
   b) Explain why you cannot use categorical data in a histogram.
2. a) What is the difference between a frequency diagram and a cumulative-frequency diagram?
   b) What are the advantages of each of these diagrams?
3. a) What is the difference between a frequency diagram and a relative-frequency diagram?
   b) What information can be easily read from a frequency diagram?
   c) What information can be easily read from a relative-frequency diagram?
4. Describe the strengths and weaknesses of circle graphs and pictographs.
  • 101. Practise
A
1. Explain the problem with the intervals in each of the following tables.
   a) Age (years)   Frequency
      28−32         6
      33−38         8
      38−42         11
      42−48         9
      48−52         4
   b) Score (%)     Frequency
      61−65         5
      66−70         11
      71−75         7
      76−80         4
      91−95         1
2. Would you choose a histogram or a bar graph with separated bars for the data listed below? Explain your choices.
   a) the numbers from 100 rolls of a standard die
   b) the distances 40 athletes throw a shot-put
   c) the ages of all players in a junior lacrosse league
   d) the heights of all players in a junior lacrosse league
3. A catering service conducted a survey asking respondents to choose from six different hot meals.
      Meal Chosen                                    Number
      Chicken cordon bleu                            16
      New York steak                                 20
      Pasta primavera (vegetarian)                   9
      Lamb chop                                      12
      Grilled salmon                                 10
      Mushroom stir-fry with almonds (vegetarian)    5
   a) Create a circle graph to illustrate these data.
   b) Use the circle graph to determine what percent of the people surveyed chose vegetarian dishes.
   c) Sketch a pictograph for the data.
   d) Use the pictograph to determine whether more than half of the respondents chose red-meat dishes.
4. a) Estimate the number of hours you spent each weekday on each of the following activities: eating, sleeping, attending class, homework, a job, household chores, recreation, other.
   b) Present this information using a circle graph.
   c) Present the information using a pictograph.

Apply, Solve, Communicate
5. The examination scores for a biology class are shown below.
      68 77 91 66 52 58 79 94 81
      60 73 57 44 58 71 78 80 54
      87 43 61 90 41 76 55 75 49
   a) Determine the range for these data.
   b) Determine a reasonable interval size and number of intervals.
   c) Produce a frequency table for the grouped data.
   d) Produce a histogram and frequency polygon for the grouped data.
   e) Produce a relative-frequency polygon for the data.
   f) Produce a cumulative-frequency polygon for the data.
   g) What do the frequency polygon, the relative-frequency polygon, and the cumulative-frequency polygon each illustrate best?
  • 102. B
6. a) Sketch a bar graph to show the results you would expect if you were to roll a standard die 30 times.
   b) Perform the experiment or simulate it with software or the random-number generator of a graphing calculator. Record the results in a table.
   c) Produce a bar graph for the data you collected.
   d) Compare the bar graphs from a) and c). Account for any discrepancies you observe.
7. Application In order to set a reasonable price for a “bottomless” cup of coffee, a restaurant owner recorded the number of cups each customer ordered on a typical afternoon.
      2 1 2 3 0 1 1 1 2 2
      1 3 1 4 2 0 1 2 3 1
   a) Would you present these data in a grouped or ungrouped format? Explain your choice.
   b) Create a frequency table and diagram.
   c) Create a cumulative-frequency diagram.
   d) How can the restaurant owner use this information to set a price for a cup of coffee? What additional information would be helpful?
8. Application The list below shows the value of purchases, in dollars, by 30 customers at a clothing store.
      55.40  48.26   28.31   14.12  88.90  34.45
      51.02  71.87   105.12  10.19  74.44  29.05
      43.56  90.66   23.00   60.52  43.17  28.49
      67.03  16.18   76.05   45.68  22.76  36.73
      39.92  112.48  81.21   56.73  47.19  34.45
   a) Would you present these data in a grouped or ungrouped format? Explain your choice.
   b) Create a frequency table and diagram.
   c) Create a cumulative-frequency diagram.
   d) How might the store owner use this information in planning sales promotions?
9. The speeds of 24 motorists ticketed for exceeding a 60-km/h limit are listed below.
      75 72 66 80 75 70 71 82
      69 70 72 78 90 75 76 80
      75 96 91 77 76 84 74 79
   a) Construct a frequency-distribution table for these data.
   b) Construct a histogram and frequency polygon.
   c) Construct a cumulative-frequency diagram.
   d) How many of the motorists exceeded the speed limit by 15 km/h or less?
   e) How many exceeded the speed limit by over 20 km/h?
10. Communication (Chapter Problem) This table summarizes the salaries for François’ hockey team.
      Salary ($)    Number of Players
      300 000       2
      500 000       3
      750 000       8
      900 000       6
      1 000 000     2
      1 500 000     1
      3 000 000     1
      4 000 000     1
   a) Reorganize these data into appropriate intervals and present them in a frequency table.
   b) Create a histogram for these data.
   c) Identify and explain any unusual features about this distribution.
  • 103. 11. Communication
   a) What is the sum of all the relative frequencies for any set of data?
   b) Explain why this sum occurs.
12. The following relative-frequency polygon was constructed for the examination scores for a class of 25 students. Construct the frequency-distribution table for the students’ scores.
   [Relative-frequency polygon: vertical axis Relative Frequency, from 0 to 0.32 in steps of 0.04; horizontal axis Score, from 35 to 95 in steps of 10]
13. Inquiry/Problem Solving The manager of a rock band suspects that MP3 web sites have reduced sales of the band’s CDs. A survey of fans last year showed that at least 50% had purchased two or more of the band’s CDs. A recent survey of 40 fans found they had purchased the following numbers of the band’s CDs.
      2 1 2 1 3 1 4 1 0 1
      0 2 4 1 0 5 2 3 4 1
      2 1 1 1 3 1 0 5 4 2
      3 1 1 0 2 2 0 0 1 3
   Do the new data support the manager’s theory? Show the calculations you made to reach your conclusion, and illustrate the results with a diagram.
C
14. Inquiry/Problem Solving
   a) What are the possible outcomes for a roll of two “funny dice” that have faces with the numbers 1, 1, 3, 5, 6, and 7?
   b) Sketch a relative-frequency polygon to show the results you would expect if these dice were rolled 100 times.
   c) Explain why your graph has the shape it does.
   d) Use software or a graphing calculator to simulate rolling the funny dice 100 times, and draw a relative-frequency polygon for the results.
   e) Account for any differences between the diagrams in parts b) and d).
15. This cumulative-frequency diagram shows the distribution of the examination scores for a statistics class.
   [Cumulative-frequency diagram: vertical axis Cumulative Frequency, from 0 to 30 in steps of 5; horizontal axis Score, from 34.5 to 94.5 in steps of 10]
   a) What interval contains the greatest number of scores? Explain how you can tell.
   b) How many scores fall within this interval?
16. Predict the shape of the relative-frequency diagram for the examination scores of a first-year university calculus class. Explain why you chose the shape you did. Assume that students enrolled in a wide range of programs take this course. State any other assumptions that you need to make.
  • 104. 2.2 Indices
In the previous section, you used tables and graphs of frequencies to summarize data. Indices are another way to summarize data and recognize trends. An index relates the value of a variable (or group of variables) to a base level, which is often the value on a particular date. The base level is set so that the index produces numbers that are easy to understand and compare. Indices are used to report on a wide variety of variables, including prices and wages, ultraviolet levels in sunlight, and even the readability of textbooks.

INVESTIGATE & INQUIRE: Consumer Price Index
The graph below shows Statistics Canada’s consumer price index (CPI), which tracks the cost of over 600 items that would be purchased by a typical family in Canada. For this chart, the base is the cost of the same items in 1992.
[Time-series graph: Unadjusted Consumer Price Index (1992 = 100); vertical axis from 104 to 118; horizontal axis from 1996 to 2001]
1. What trend do you see in this graph? Estimate the annual rate of increase.
2. Estimate the annual rate of increase for the period from 1992 to 1996. Do you think the difference between this rate and the one from 1996 to 2001 is significant? Why or why not?
3. What was the index value in February of 1998? What does this value tell you about consumer prices at that time?
  • 105. 4. What would be the best way to estimate what the consumer price index will be in May of 2003? Explain your reasoning.
5. Explain how the choice of the vertical scale in the graph emphasizes changes in the index. Do you think this emphasis could be misleading? Why or why not?

The best-known Canadian business index is the S&P/TSX Composite Index, managed for the Toronto Stock Exchange by Standard & Poor’s Corporation. Introduced in May, 2002, this index is a continuation of the TSE 300 Composite Index®, which goes back to 1977. The S&P/TSX Composite Index is a measure of the total market value of the shares of over 200 of the largest companies traded on the Toronto Stock Exchange. The index is the current value of these stocks divided by their total value in a base year and then multiplied by a scaling factor. When there are significant changes (such as takeovers or bankruptcies) in any of the companies in the index, the scaling factor is adjusted so that the values of the index remain directly comparable to earlier values. Note that the composite index weights each company by the total value of its shares (its market capitalization) rather than by the price of the individual shares. The S&P/TSX Composite Index usually indicates trends for major Canadian corporations reasonably well, but it does not always accurately reflect the overall Canadian stock market.

Time-series graphs are often used to show how indices change over time. Such graphs plot variable values versus time and join the adjacent data points with straight lines.

Example 1 Stock Market Index
The following graph shows the TSE 300 Composite Index® from 1971 to 2001.
[Time-series graph: TSE 300 Index (1975 = 1000); vertical axis from 0 to 10 000; horizontal axis from 1971 to 2001]
a) What does the notation “1975 = 1000” mean?
b) By what factor did the index grow over the period shown?
c) Estimate the rate of growth of the index during the 1980s.
  • 106. Solution
a) The notation indicates that the index shows the stock prices relative to what they were in 1975. This 1975 base has been set at 1000. An index value of 2000 would mean that overall the stocks of the 300 companies in the index are selling for twice what they did in 1975.
b) From the graph, you can see that the index increased from about 1000 in 1971 to about 10 000 in 2001. Thus, the index increased by a factor of approximately 10 over this period.
c) To estimate the rate of growth of the index during the 1980s, approximate the time-series graph with a straight line during that 10-year interval. Then, calculate the slope of the line.
   $$m = \frac{\text{rise}}{\text{run}} \doteq \frac{3700 - 1700}{10} = 200$$
   The TSE 300 Composite Index® rose about 200 points a year during the 1980s.

www.mcgrawhill.ca/links/MDM12 For more information on stock indices, visit the above web site and follow the links. Write a brief description of the rules for inclusion in the various market indices.

Statistics Canada calculates a variety of carefully researched economic indices. For example, there are price indices for new housing, raw materials, machinery and equipment, industrial products, and farm products. Most of these indices are available with breakdowns by province or region and by specific categories, such as agriculture, forestry, or manufacturing. Statisticians, economists, and the media make extensive use of these indices. (See section 1.3 for information on how to access Statistics Canada data.)

The consumer price index (CPI) is the most widely reported of these economic indices because it is an important measure of inflation. Inflation is a general increase in prices, which corresponds to a decrease in the value of money. To measure the average change in retail prices across Canada, Statistics Canada monitors the retail prices of a set of over 600 goods and services including food, shelter, clothing, transportation, household items, health and personal care, recreation and education, and alcohol and tobacco products. These items are representative of purchases by typical Canadians and are weighted according to estimates of the total amount Canadians spend on each item. For example, milk has a weighting of 0.69% while tea has a weighting of only 0.06%.

Data in Action: Statistics Canada usually publishes the consumer price index for each month in the third week of the following month. Over 60 000 price quotations are collected for each update.
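The two estimates in the solution to Example 1 are simple enough to check with a short Python sketch. The function names `growth_factor` and `average_annual_change` are hypothetical; the index values are the approximate readings quoted in the solution, so the results are estimates and not exact figures.

```python
def growth_factor(start_value, end_value):
    """Factor by which an index grew between two readings."""
    return end_value / start_value


def average_annual_change(start_value, end_value, years):
    """Slope of the straight line joining two index readings (points per year)."""
    return (end_value - start_value) / years


# Approximate readings quoted in the solution to Example 1.
factor = growth_factor(1000, 10_000)           # 1971 to 2001
slope = average_annual_change(1700, 3700, 10)  # the 1980s

print(f"Growth factor, 1971 to 2001: about {factor:.0f}")
print(f"Average growth in the 1980s: about {slope:.0f} points per year")
```

Note that the slope is an average in index points per year. It does not imply that the index rose by the same amount, or the same percent, in each year of the decade.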
  • 107. Example 2 Consumer Price Index
The following graph shows the amount by which the consumer price index changed since the same month of the previous year.
[Time-series graph: Percent Change in CPI; vertical axis from 0 to 3; horizontal axis from May 1996 to May 2001]
a) What does this graph tell you about changes in the CPI from 1996 to 2001?
b) Estimate the mean annual change in the CPI for this period.

Solution
a) Note that the graph above shows the annual changes in the CPI, unlike the graph on page 104, which illustrates the value of the CPI for any given month. From the above graph, you can see that the annual change in the CPI varied between 0.5% and 4% from 1996 to 2001. Overall, there is an upward trend in the annual change during this period.
b) You can estimate the mean annual change by drawing a horizontal line such that the total area between the line and the parts of the curve above it is approximately equal to the total area between the line and the parts of the curve below it. As shown above, this line meets the y-axis near 2%. Thus, the mean annual increase in the CPI was roughly 2% from 1996 to 2001.

Project Prep: If your statistics project examines how a variable changes over time, a time-series graph may be an effective way to illustrate your findings.

The consumer price index and the cost of living index are not quite the same. The cost of living index measures the cost of maintaining a constant standard of living. If consumers like two similar products equally well, their standard of living does not change when they switch from one to the other. For example, if you like both apples and pears, you might start buying more apples and fewer pears if the price of pears went up while the price of apples was unchanged. Thus, your cost of living index increases less than the consumer price index does.

www.mcgrawhill.ca/links/MDM12 For more information about Statistics Canada indices, visit the above web site and follow the links to Statistics Canada.
  • 108. Indices are also used in many other fields, including science, sociology, medicine, and engineering. There are even indices of the clarity of writing.

Example 3 Readability Index
The Gunning fog index is a measure of the readability of prose. This index estimates the years of schooling required to read the material easily.
   Gunning fog index = 0.4(average words per sentence + percent “hard” words)
where “hard” words are all words over two syllables long except proper nouns, compounds of easy words, and verbs whose third syllable is ed or es.
a) Calculate the Gunning fog index for a book with an average sentence length of 8 words and a 20% proportion of hard words.
b) What are the advantages and limitations of this index?

Solution
a) Gunning fog index = 0.4(8 + 20) = 11.2
   The Gunning fog index shows that the book is written at a level appropriate for readers who have completed grade 11.
b) The Gunning fog index is easy to use and understand. It generates a grade-level rating, which is often more useful than a readability rating on an arbitrary scale, such as 1 to 10 or 1 to 100. However, the index assumes that bigger words and longer sentences always make prose harder to read. A talented writer could use longer words and sentences and still be more readable than writers who cannot clearly express their ideas. The Gunning fog index cannot, of course, evaluate literary merit.

Project Prep: You may want to use an index to summarize and compare sets of data in your statistics project.

www.mcgrawhill.ca/links/MDM12 Visit the above web site to find a link to a readability-index calculator. Determine the reading level of a novel of your choice.
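The formula in Example 3 translates directly into code. The sketch below is a minimal illustration with a hypothetical function name, `gunning_fog`. It takes the two summary figures as inputs and does not attempt to count sentences or identify “hard” words in a passage, since that would require syllable-counting rules the example does not spell out.

```python
def gunning_fog(average_words_per_sentence, percent_hard_words):
    """Gunning fog index from the formula in Example 3.

    percent_hard_words is a percent (for example, 20 for 20%), not a fraction.
    """
    return 0.4 * (average_words_per_sentence + percent_hard_words)


# The book in part a): 8 words per sentence on average, 20% hard words.
index = gunning_fog(8, 20)
print(f"Gunning fog index: {index:.1f}")
```

Passing the hard-word share as a percent keeps the code consistent with the worked solution, 0.4(8 + 20) = 11.2.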
  • 109. Key Concepts
• An index can summarize a set of data. Indices usually compare the values of a variable or group of variables to a base value.
• Indices have a wide variety of applications in business, economics, science, and other fields.
• A time-series graph is a line graph that shows how a variable changes over time.
• The consumer price index (CPI) tracks the overall price of a representative basket of goods and services, making it a useful measure of inflation.

Communicate Your Understanding
1. What are the key features of a time-series graph?
2. a) Name three groups who would be interested in the new housing price index.
   b) How would this information be important for each group?
3. Explain why the consumer price index is not the same as the cost of living index.

Practise
A
1. Refer to the consumer price index graph on page 104.
   a) By how many index points did the CPI increase from January, 1992 to January, 2000?
   b) Express this increase as a percent.
   c) Estimate what an item that cost
      i) $7.50 in 1992 cost in April, 1998
      ii) $55 in August, 1997 cost in May, 2000

Apply, Solve, Communicate
2. a) Explain why there is a wide variety of items in the CPI basket.
   b) Is the percent increase for the price of each item in the CPI basket the same? Explain.
B
3. Refer to the graph of the TSE 300 Composite Index® on page 105.
   a) When did this index first reach five times its base value?
   b) Estimate the growth rate of the index from 1971 to 1977. What does this growth rate suggest about the Canadian economy during this period?
   c) During what two-year period did the index grow most rapidly? Explain your answer.
   d) Could a straight line be a useful mathematical model for the TSE 300 Composite Index®? Explain why or why not.
4. Communication
   a) Define inflation.
   b) In what way do the consumer price index and the new housing price index provide a measure of inflation?
  • 110. c) How would you expect these two indices to be related?
   d) Why do you think that they would be related in this way?
5. Application Consider the following time-series graph for the consumer price index.
   [Time-series graph: Consumer Price Index (1992 = 100); vertical axis from 0 to 100; horizontal axis from 1980 to 2000]
   a) Identify at least three features of this graph that are different from the CPI graph on page 104.
   b) Explain two advantages that the graph shown here has over the one on page 104.
   c) Explain two disadvantages of the graph shown here compared to the one on page 104.
   d) Estimate the year in which the CPI was at 50.
   e) Explain the significance of the result in part d) in terms of prices in 1992.
6. Application The following graph illustrates the CPI both with and without energy price changes.
   [Time-series graph: Percent Change in CPI, with one line for All Items and one for All Items Excluding Energy; vertical axis from 0 to 4.0; horizontal axis from May 1997 to May 2001]
   a) How is this graph different from the one on page 107?
   b) Describe how the overall trend in energy costs compares to that of the CPI for the period shown.
   c) What insight is gained by removing the energy component of the CPI?
   d) Estimate the overall increase in the energy-adjusted CPI for the period shown.
   e) Discuss how your result in part d) compares to the value found in part b) of Example 2.
7. (Chapter Problem) François’ agent wants to bargain for a better salary based on François’ statistics for his first five seasons with the team.
   a) Produce a time-series graph for François’ goals, assists, and points over the past five years.
   b) Calculate the mean number of goals, assists, and points per game played during each of François’ five seasons.
   c) Generate a new time-series graph based on the data from part b).
   d) Which time-series graph will the agent likely use, and which will the team’s manager likely use during the contract negotiations? Explain.
   e) Explain the method or technology that you used to answer parts a) to d).
8. Aerial surveys of wolves in Algonquin Park produced the following estimates of their population density.
      Year       Wolves/100 km²
      1988–89    4.91
      1989–90    2.47
      1990–91    2.80
      1991–92    3.62
      1992–93    2.53
      1993–94    2.23
      1994–95    2.82
      1995–96    2.75
      1996–97    2.33
      1997–98    3.04
      1998–99    1.59
  • 111. a) Using 1988–89 as a base, construct an index for these data.
   b) Comment on any trends that you observe.
9. Use Statistics Canada web sites or other sources to find statistics for the following and describe any trends you notice.
   a) the population of Canada
   b) the national unemployment rate
   c) the gross domestic product
10. Inquiry/Problem Solving
   a) Use data from E-STAT or other sources to generate a time-series graph that shows the annual number of crimes in Canada for the period 1989−1999. If using E-STAT, look in the Nation section under Justice/Crimes and Offences.
   b) Explain any patterns that you notice.
   c) In what year did the number of crimes peak?
   d) Suggest possible reasons why the number of crimes peaked in that year. What other statistics would you need to confirm whether these reasons are related to the peak in the number of crimes?
11. a) Use data from E-STAT or other sources to generate a time-series graph that shows the number of police officers in Canada for the period 1989−1999. If using E-STAT, look in the Nation section under Justice/Police services.
   b) In what ways are the patterns in these data similar to the patterns in the data in question 10? In what ways are the patterns different?
   c) In what year did the number of police officers peak?
   d) Explain how this information could affect your answer to part d) of question 10.
12. Communication Use the Internet, a library, or other resources to research two indices not discussed in this section. Briefly describe what each index measures, recent trends in the index, and any explanation or rationale for these trends.
13. Inquiry/Problem Solving The pictograph below shows total greenhouse-gas emissions for each province and territory in 1996.
   [Pictograph: one circle for each province and territory; the legend circles represent 638, 50 200, 99 800, 149 500, and 199 100 kilotonnes of CO2 equivalent]
   a) Which two provinces have the highest levels of greenhouse-gas emissions?
   b) Are the diameters or areas of the circles proportional to the numbers they represent? Justify your answer.
   c) What are the advantages and disadvantages of presenting these data as a pictograph?
   d) Which provinces have the highest levels of greenhouse-gas emissions per geographic area?
   e) Is your answer to part d) what you would have expected? How can you account for such relatively high levels in these areas?
   f) Research information from E-STAT or other sources to determine the greenhouse-gas emissions per person for each province.
  • 112. ACHIEVEMENT CHECK
Knowledge/Understanding · Thinking/Inquiry/Problem Solving · Communication · Application
14. The graph below shows the national unemployment rate from January, 1997, to June, 2001.
   [Time-series graph: Unemployment Rate, Seasonally Adjusted (%); vertical axis from 6.0 to 10.0; horizontal axis from 1997 to 2001]
   a) Describe the overall trend for the period shown.
   b) When did the unemployment rate reach its lowest level?
   c) Estimate the overall unemployment rate for the period shown.
   d) Explain what the term seasonally adjusted means.
   e) Who is more likely to use this graph in an election campaign, the governing party or an opposing party? Explain.
   f) How might an opposing party produce a graph showing rising unemployment without changing the data? Why would they produce such a graph?
C
15. A Pareto chart is a type of frequency diagram in which the frequencies for categorical data are shown by connected bars arranged in descending order of frequency. In a random survey, commuters listed their most common method of travelling to the downtown of a large city.
      Method                  Number of Respondents
      Automobile: alone       26
      Automobile: car pool    35
      Bus/Streetcar           52
      Train                   40
      Bicycle/Walking         13
   a) Construct a Pareto chart for these data.
   b) Describe the similarities and differences between a Pareto chart and other frequency diagrams.
   www.mcgrawhill.ca/links/MDM12 For more information about Pareto charts, visit the above web site and follow the links. Give two examples of situations where you would use a Pareto chart. Explain your reasoning.
16. Pick five careers of interest to you.
   a) Use resources such as CANSIM II, E-STAT, newspapers, or the Internet to obtain information about entry-level income levels for these professions.
   b) Choose an effective method to present your data.
   c) Describe any significant information you discovered.
17. a) Research unemployment data for Ontario over the past 20 years.
   b) Present the data in an appropriate form.
   c) Conduct additional research to account for any trends or unusual features of the data.
   d) Predict unemployment trends for both the short term and the long term. Explain your predictions.
  • 113. 2.3 Sampling Techniques
Who will win the next federal election? Are Canadians concerned about global warming? Should a Canadian city bid to host the next Olympic Games? Governments, political parties, advocacy groups, and news agencies often want to know the public’s opinions on such questions. Since it is not feasible to ask every citizen directly, researchers often survey a much smaller group and use the results to estimate the opinions of the entire population.

INVESTIGATE & INQUIRE: Extrapolating From a Sample
1. Work in groups or as a class to design a survey to determine the opinions of students in your school on a subject such as favourite movies, extra-curricular activities, or types of music.
2. Have everyone in your class answer the survey.
3. Decide how to categorize and record the results. Could you refine the survey questions to get results that are easier to work with? Explain the changes you would make.
4. How could you organize and present the data to make it easier to recognize any patterns? Can you draw any conclusions from the data?
5. a) Extrapolate your data to estimate the opinions of the entire school population. Explain your method.
   b) Describe any reasons why you think the estimates in part a) may be inaccurate.
   c) How could you improve your survey methods to get more valid results?

In statistics, the term population refers to all individuals who belong to a group being studied. In the investigation above, the population is all the students in your school, and your class is a sample of that population. The population for a statistical study depends on the kind of data being collected.

Example 1 Identifying a Population
Identify the population for each of the following questions.
a) Whom do you plan to vote for in the next Ontario election?
b) What is your favourite type of baseball glove?
c) Do women prefer to wear ordinary glasses or contact lenses?
  • 114. Solution
a) The population consists of those people in Ontario who will be eligible to vote on election day.
b) The population would be just those people who play baseball. However, you might want to narrow the population further. For example, you might be interested only in answers from local or professional baseball players.
c) The population is all women who use corrective lenses.

Once you have identified the population, you need to decide how you will obtain your data. If the population is small, it may be possible to survey the entire group. For larger populations, you need to use an appropriate sampling technique. If selected carefully, a relatively small sample can give quite accurate results.

The group of individuals who actually have a chance of being selected is called the sampling frame. The sampling frame varies depending on the sampling technique used. Here are some of the most commonly used sampling techniques.

Simple Random Sample
In a simple random sample, every member of the population has an equal chance of being selected and the selection of any particular individual does not affect the chances of any other individual being chosen. Choosing the sample randomly reduces the risk that selected members will not be representative of the whole population. You could select the sample by drawing names randomly or by assigning each member of the population a unique number and then using a random-number generator to determine which members to include.

Systematic Sample
For a systematic sample, you go through the population sequentially and select members at regular intervals. The sample size and the population size determine the sampling interval.
   $$\text{interval} = \frac{\text{population size}}{\text{sample size}}$$
For example, if you wanted the sample to be a tenth of the population, you would select every tenth member of the population, starting with one chosen randomly from among the first ten in sequence.
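The systematic-sampling procedure described above (compute the interval, pick a random starting point within the first interval, then take every member at that spacing) can be sketched in Python. The function name `systematic_sample` and the population of 100 numbered members are hypothetical. The sketch assumes the population is already in a list and that the interval is rounded down to a whole number.

```python
import random


def systematic_sample(population, sample_size, rng=None):
    """Select members at a regular interval, starting from a random position.

    The interval is population size divided by sample size, rounded down.
    """
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    rng = rng or random.Random()
    interval = len(population) // sample_size
    start = rng.randrange(interval)  # random position within the first interval
    return population[start::interval][:sample_size]


# Hypothetical population: members numbered 1 to 100, sampled one in ten.
members = list(range(1, 101))
sample = systematic_sample(members, 10, random.Random(1))
print(sample)
```

Passing a seeded `random.Random` instance makes the starting point repeatable, which is convenient for checking work. A real survey would normally leave the seed unset.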
  • 115. Example 2 Designing a Systematic Sample
A telephone company is planning a marketing survey of its 760 000 customers. For budget reasons, the company wants a sample size of about 250.
a) Suggest a method for selecting a systematic sample.
b) What expense is most likely to limit the sample size?

Solution
a) First, determine the sampling interval.
   $$\text{interval} = \frac{\text{population size}}{\text{sample size}} = \frac{760\,000}{250} = 3040$$
   The company could randomly select one of the first 3040 names on its list of customers and then choose every 3040th customer from that point on. For simplicity, the company might choose to select every 3000th customer instead.
b) The major cost is likely to be salaries for the staff to call and interview the customers.

Stratified Sample
Sometimes a population includes groups of members who share common characteristics, such as gender, age, or education level. Such groups are called strata. A stratified sample has the same proportion of members from each stratum as the population does.

Example 3 Designing a Stratified Sample
Before booking bands for the school dances, the students’ council at Statsville High School wants to survey the music preferences of the student body. The following table shows the enrolment at the school.
      Grade    Number of Students
      9        255
      10       232
      11       209
      12       184
      Total    880
a) Design a stratified sample for a survey of 25% of the student body.
b) Suggest other ways to stratify this sample.
  • 116. Solution
a) To obtain a stratified sample with the correct proportions, simply select 25% of the students in each grade level, as shown in the table.

Grade    Number of Students    Relative Frequency    Number Surveyed
9        255                   0.29                  64
10       232                   0.26                  58
11       209                   0.24                  52
12       184                   0.21                  46
Total    880                   1.00                  220

b) The sample could be stratified according to gender or age instead of grade level.

Other Sampling Techniques

Cluster Sample: If certain groups are likely to be representative of the entire population, you can use a random selection of such groups as a cluster sample. For example, a fast-food chain could save time and money by surveying all its employees at randomly selected locations instead of surveying randomly selected employees throughout the chain.

Multi-Stage Sample: A multi-stage sample uses several levels of random sampling. If, for example, your population consisted of all Ontario households, you could first randomly sample from all cities and townships in Ontario, then randomly sample from all subdivisions or blocks within the selected cities and townships, and finally randomly sample from all houses within the selected subdivisions or blocks.

Voluntary-Response Sample: In a voluntary-response sample, the researcher simply invites any member of the population to participate in the survey. The results from the responses of such surveys can be skewed because the people who choose to respond are often not representative of the population. Call-in shows and mail-in surveys rely on voluntary-response samples.

Convenience Sample: Often, a sample is selected simply because it is easily accessible. While obviously not as random as some of the other techniques, such convenience samples can sometimes yield helpful information. The investigation at the beginning of this section used your class as a convenience sample.

Key Concepts
• A carefully selected sample can provide accurate information about a population.
• Selecting an appropriate sampling technique is important to ensure that the sample reflects the characteristics of the population. Randomly selected samples have a good chance of being representative of the population.
• The choice of sampling technique will depend on a number of factors, such as the nature of the population, cost, convenience, and reliability.
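The arithmetic behind the systematic sample in Example 2 and the stratified sample in Example 3 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes that the customer list is numbered from 1 to 760 000 and that each stratum is rounded to the nearest whole student.

```python
import random

def sampling_interval(population_size, sample_size):
    """Interval for a systematic sample: population size divided by sample size."""
    return population_size // sample_size

def systematic_sample(population_size, interval, rng):
    """Pick a random start among the first `interval` members, then take every interval-th one."""
    start = rng.randint(1, interval)
    return list(range(start, population_size + 1, interval))

def stratified_counts(strata, fraction):
    """Number of members to survey in each stratum, rounded to the nearest whole person."""
    return {name: round(size * fraction) for name, size in strata.items()}

# Example 2: 760 000 customers, sample size of about 250.
interval = sampling_interval(760_000, 250)
positions = systematic_sample(760_000, interval, random.Random(1))

# Example 3: survey 25% of each grade at Statsville High School.
enrolment = {9: 255, 10: 232, 11: 209, 12: 184}
counts = stratified_counts(enrolment, 0.25)

print(interval, len(positions))
print(counts, sum(counts.values()))
```

Whatever starting position is chosen, stepping through the list by the sampling interval yields 250 customers, and the rounded grade-level counts add up to 25% of the 880 students.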
  • 117. Communicate Your Understanding
1. What are the advantages and disadvantages of using a sample to estimate the characteristics of a population?
2. Discuss whether a systematic sample is a random sample.
3. a) Explain the difference between stratified sampling and cluster sampling.
   b) Suggest a situation in which it would be appropriate to use each of these two sampling techniques.

Practise
A
1. Identify the population for each of the following questions.
   a) Who should be the next president of the students’ council?
   b) Who should be next year’s grade-10 representative on the student council?
   c) What is your favourite soft drink?
   d) Which Beatles song was the best?
   e) How effective is a new headache remedy?
2. Classify the sampling method used in each of the following scenarios.
   a) A radio-show host invites listeners to call in with their views on banning smoking in restaurants.
   b) The Heritage Ministry selects a sample of recent immigrants such that the proportions from each country of origin are the same as for all immigrants last year.
   c) A reporter stops people on a downtown street to ask what they think of the city’s lakefront.
   d) A school guidance counsellor arranges interviews with every fifth student on the alphabetized attendance roster.
   e) A statistician conducting a survey randomly selects 20 cities from across Canada, then 5 neighbourhoods from each of the cities, and then 3 households from each of the neighbourhoods.
   f) The province randomly chooses 25 public schools to participate in a new fundraising initiative.
3. What type(s) of sample would be appropriate for
   a) a survey of engineers, technicians, and managers employed by a company?
   b) determining the most popular pizza topping?
   c) measuring customer satisfaction for a department store?

Apply, Solve, Communicate
B
4. Natasha is organizing the annual family picnic and wants to arrange a menu that will appeal to children, teens, and adults. She estimates that she has enough time to survey about a dozen people. How should Natasha design a stratified sample if she expects 13 children, 8 teens, and 16 adults to attend the picnic?
  • 118. 5. Communication Find out, or estimate, how many students attend your school. Describe how you would design a systematic sample of these students. Assume that you can survey about 20 students.
6. The newly elected Chancellor of the Galactic Federation is interested in the opinions of all citizens regarding economic conditions in the galaxy. Unfortunately, she does not have the resources to visit every populated planet or to send delegates to them. Describe how the Chancellor might organize a multi-stage sample to carry out her survey.
7. Communication A community centre chooses 15 of its members at random and asks them to have each member of their families complete a short questionnaire.
   a) What type of sample is the community centre using?
   b) Are the 15 community-centre members a random sample of the community? Explain.
   c) To what extent are the family members randomly chosen?
8. Application A students’ council is conducting a poll of students as they enter the cafeteria.
   a) What sampling method is the student council using?
   b) Discuss whether this method is appropriate for surveying students’ opinions on
      i) the new mural in the cafeteria
      ii) the location for the graduation prom
   c) Would another sampling technique be better for either of the surveys in part b)?
9. Application The host of a call-in program invites listeners to comment on a recent trade by the Toronto Maple Leafs. One caller criticizes the host, stating that the sampling technique is not random. The host replies: “So what? It doesn’t matter!”
   a) What sampling technique is the call-in show using?
   b) Is the caller’s statement correct? Explain.
   c) Is the host’s response mathematically correct? Why or why not?

C
10. Look in newspapers and periodicals or on the Internet for an article about a study involving a systematic, stratified, cluster, or multi-stage sample. Comment on the suitability of the sampling technique and the validity of the study. Present your answer in the form of a brief report. Include any suggestions you have for improving the study.
11. Inquiry/Problem Solving Design a data-gathering method that uses a combination of convenience and systematic sampling techniques.
12. Inquiry/Problem Solving Pick a professional sport that has championship playoffs each year.
   a) Design a multi-stage sample to gather your schoolmates’ opinions on which team is likely to win the next championship.
   b) Describe how you would carry out your study and illustrate your findings.
   c) Research the media to find what the professional commentators are predicting. Do you think these opinions would be more valid than the results of your survey? Why or why not?
  • 119. 2.4 Bias in Surveys

The results of a survey can be accurate only if the sample is representative of the population and the measurements are objective. The methods used for choosing the sample and collecting the data must be free from bias. Statistical bias is any factor that favours certain outcomes or responses and hence systematically skews the survey results. Such bias is often unintentional. A researcher may inadvertently use an unsuitable method or simply fail to recognize a factor that prevents a sample from being fully random. Regrettably, some people deliberately bias surveys in order to get the results they want. For this reason, it is important to understand not only how to use statistics, but also how to recognize the misuse of statistics.

INVESTIGATE & INQUIRE: Bias in a Survey
1. What sampling technique is the pollster in this cartoon likely to be using?
2. What is wrong with his survey methods? How could he improve them?
3. Do you think the bias in this survey is intentional? Why or why not?
4. Will this bias seriously distort the results of the survey? Explain your reasoning.
5. What point is the cartoonist making about survey methods?
6. Sketch your own cartoon or short comic strip about data management.

Sampling bias occurs when the sampling frame does not reflect the characteristics of the population. Biased samples can result from problems with either the sampling technique or the data-collection method.
  • 120. Example 1 Sampling Bias Identify the bias in each of the following surveys and suggest how it could be avoided. a) A survey asked students at a high-school football game whether a fund for extra-curricular activities should be used to buy new equipment for the football team or instruments for the school band. b) An aid agency in a developing country wants to know what proportion of households have at least one personal computer. One of the agency’s staff members conducts a survey by calling households randomly selected from the telephone directory. Solution a) Since the sample includes only football fans, it is not representative of the whole student body. A poor choice of sampling technique makes the results of the survey invalid. A random sample selected from the entire student body would give unbiased results. b) There could be a significant number of households without telephones. Such households are unlikely to have computers. Since the telephone survey excludes these households, it will overestimate the proportion of households that have computers. By using a telephone survey as the data-collection method, the researcher has inadvertently biased the sample. Visiting randomly selected households would give a more accurate estimate of the proportion that have computers. However, this method of data collection would be more time-consuming and more costly than a telephone survey. Non-response bias occurs when particular groups are under-represented in a survey because they choose not to participate. Thus, non-response bias is a form of sampling bias. Example 2 Non-Response Bias A science class asks every fifth student entering the cafeteria to answer a survey on environmental issues. Less than half agree to complete the questionnaire. The completed questionnaires show that a high proportion of the respondents are concerned about the environment and well-informed about environmental issues. What bias could affect these results? 
Solution The students who chose not to participate in the survey are likely to be those least interested in environmental issues. As a result, the sample may not be representative of all the students at the school. 120 MHR • Statistics of One Variable
  • 121. To avoid non-response bias, researchers must ensure that the sampling process is truly random. For example, they could include questions that identify members of particular groups to verify that they are properly represented in the sample. Measurement bias occurs when the data-collection method consistently either under- or overestimates a characteristic of the population. While random errors tend to cancel out, a consistent measurement error will skew the results of a survey. Often, measurement bias results from a data-collection process that affects the variable it is measuring. Example 3 Measurement Bias Identify the bias in each of the following surveys and suggest how it could be avoided. a) A highway engineer suggests that an economical way to survey traffic speeds on an expressway would be to have the police officers who patrol the highway record the speed of the traffic around them every half hour. b) As part of a survey of the “Greatest Hits of All Time,” a radio station asks its listeners: Which was the best song by the Beatles? i) Help! ii) Nowhere Man iii) In My Life iv) Other: c) A poll by a tabloid newspaper includes the question: “Do you favour the proposed bylaw in which the government will dictate whether you have the right to smoke in a restaurant?” Solution a) Most drivers who are speeding will slow down when they see a police cruiser. A survey by police cruisers would underestimate the average traffic speed. Here, the data-collection method would systematically decrease the variable it is measuring. A survey by unmarked cars or hidden speed sensors would give more accurate results. b) The question was intended to remind listeners of some of the Beatles’ early recordings that might have been overshadowed by their later hits. However, some people will choose one of the suggested songs as their answer even though they would not have thought of these songs without prompting. Such leading questions usually produce biased results. 
The survey would more accurately determine listeners’ opinions if the question did not include any suggested answers. c) This question distracts attention from the real issue, namely smoking in restaurants, by suggesting that the government will infringe on the respondents’ rights. Such loaded questions contain wording or information intended to influence the respondents’ answers. A question with straightforward neutral language will produce more accurate data. For example, the question could read simply: “Should smoking in restaurants be banned?” 2.4 Bias in Surveys • MHR 121
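The claim that random errors tend to cancel out while a consistent measurement error skews the result can be illustrated with a small simulation. The sketch below is not based on any data from the text: the true mean speed of 110 km/h, the 5 km/h random scatter, and the 8 km/h slowdown near a police cruiser are all hypothetical values chosen only for illustration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so that the run is repeatable
TRUE_SPEED = 110.0       # hypothetical true mean speed, in km/h
SLOWDOWN = 8.0           # hypothetical consistent drop in speed near a cruiser, in km/h
N = 10_000               # number of simulated readings

# Random error only: readings scatter evenly above and below the true value.
random_only = [TRUE_SPEED + rng.gauss(0, 5) for _ in range(N)]

# Random error plus a consistent shift: every driver slows down when a cruiser is in sight.
with_bias = [TRUE_SPEED - SLOWDOWN + rng.gauss(0, 5) for _ in range(N)]

mean_random = statistics.mean(random_only)
mean_biased = statistics.mean(with_bias)

print(round(mean_random, 1), round(mean_biased, 1))
```

With this many readings, the first mean lands very close to the true speed because the random errors average out, while the second mean stays low by about the size of the consistent shift no matter how many readings are taken.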
  • 122. Response bias occurs when participants in a survey deliberately give false or misleading answers. The respondents might want to influence the results unduly, or they may simply be afraid or embarrassed to answer sensitive questions honestly.

Project Prep: When gathering data for your statistics project, you will need to ensure that the sampling process is free from bias.

Example 4 Response Bias

A teacher has just explained a particularly difficult concept to her class and wants to check that all the students have grasped this concept. She realizes that if she asks those who did not understand to put up their hands, these students may be too embarrassed to admit that they could not follow the lesson. How could the teacher eliminate this response bias?

Solution
The teacher could say: “This material is very difficult. Does anyone want me to go over it again?” This question is much less embarrassing for students to answer honestly, since it suggests that it is normal to have difficulty with the material. Better still, she could conduct a survey that lets the students answer anonymously. The teacher could ask the students to rate their understanding on a scale of 1 to 5 and mark the ratings on slips of paper, which they would deposit in a box. The teacher can then use these ballots to decide whether to review the challenging material at the next class.

As the last two examples illustrate, careful wording of survey questions is essential for avoiding bias. Researchers can also use techniques such as follow-up questions and guarantees of anonymity to eliminate response bias. For a study to be valid, all aspects of the sampling process must be free from bias.

Key Concepts
• Sampling, measurement, response, and non-response bias can all invalidate the results of a survey.
• Intentional bias can be used to manipulate statistics in favour of a certain point of view.
• Unintentional bias can be introduced if the sampling and data-collection methods are not chosen carefully.
• Leading and loaded questions contain language that can influence the respondents’ answers.
  • 123. Communicate Your Understanding
1. Explain the difference between a measurement bias and a sampling bias.
2. Explain how a researcher could inadvertently bias a study.
3. Describe how each of the following might use intentional bias
   a) the media
   b) a marketing department
   c) a lobby group

Practise
A
1. Classify the bias in each of the following scenarios.
   a) Members of a golf and country club are polled regarding the construction of a highway interchange on part of their golf course.
   b) A group of city councillors are asked whether they have ever taken part in an illegal protest.
   c) A random poll asks the following question: “The proposed casino will produce a number of jobs and economic activity in and around your city, and it will also generate revenue for the provincial government. Are you in favour of this forward-thinking initiative?”
   d) A survey uses a cluster sample of Toronto residents to determine public opinion on whether the provincial government should increase funding for public transit.

Apply, Solve, Communicate
2. For each scenario in question 1, suggest how the survey process could be changed to eliminate bias.
3. Communication Reword each of the following questions to eliminate the measurement bias.
   a) In light of the current government’s weak policies, do you think that it is time for a refreshing change at the next federal election?
   b) Do you plan to support the current government at the next federal election, in order that they can continue to implement their effective policies?
   c) Is first-year calculus as brutal as they say?
   d) Which of the following is your favourite male movie star?
      i) Al Pacino  ii) Keanu Reeves  iii) Robert DeNiro  iv) Jack Nicholson  v) Antonio Banderas  vi) Other:
   e) Do you think that fighting should be eliminated from professional hockey so that skilled players can restore the high standards of the game?

B
4. Communication
   a) Write your own example of a leading question and a loaded question.
   b) Write an unbiased version for each of these two questions.
  • 124. ACHIEVEMENT CHECK
Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application

5. A school principal wants to survey data-management students to determine whether having computer Internet access at home improves their success in this course.
   a) What type of sample would you suggest? Why? Describe a technique for choosing the sample.
   b) The following questions were drafted for the survey questionnaire. Identify any bias in the questions and suggest a rewording to eliminate the bias.
      i) Can your family afford high-speed Internet access?
      ii) Answer the question that follows your mark in data management.
          Over 80%: How many hours per week do you spend on the Internet at home?
          60–80%: Would home Internet access improve your mark in data management?
          Below 60%: Would increased Internet access at school improve your mark in data management?
   c) Suppose the goal is to convince the school board that every data-management student needs daily access to computers and the Internet in the classroom. How might you alter your sampling technique to help achieve the desired results in this survey? Would these results still be statistically valid?

6. Application A talk-show host conducts an on-air survey about re-instituting capital punishment in Canada. Six out of ten callers voice their support for capital punishment. The next day, the host claims that 60% of Canadians are in favour of capital punishment. Is this claim statistically valid? Explain your reasoning.

C
7. a) Locate an article from a newspaper, periodical, or Internet site that involves a study that contains bias.
   b) Briefly describe the study and its findings.
   c) Describe the nature of the bias inherent in the study.
   d) How has this bias affected the results of the study?
   e) Suggest how the study could have eliminated the bias.
8. Inquiry/Problem Solving Do you think that the members of Parliament are a representative sample of the population? Why or why not?
  • 125. 2.5 Measures of Central Tendency

It is often convenient to use a central value to summarize a set of data. People frequently use a simple arithmetic average for this purpose. However, there are several different ways to find values around which a set of data tends to cluster. Such values are known as measures of central tendency.

INVESTIGATE & INQUIRE: Not Your Average Average

François is an NHL hockey player whose first major-league contract is up for renewal. His agent is bargaining with the team’s general manager.

Agent: Based on François’ strong performance, we can accept no less than the team’s average salary.
Manager: Agreed, François deserves a substantial increase. The team is willing to pay François the team’s average salary, which is $750 000 a season.
Agent: I’m certain that we calculated the average salary to be $1 000 000 per season. You had better check your arithmetic.
Manager: There is no error, my friend. Half of the players earn $750 000 or more, while half of the players receive $750 000 or less. $750 000 is a fair offer.

This table lists the current salaries for the team.

Salary ($)    Number of Players
300 000       2
500 000       3
750 000       8
900 000       6
1 000 000     2
1 500 000     1
3 000 000     1
4 000 000     1

1. From looking at the table, do you think the agent or the manager is correct? Explain why.
  • 126. 2. Find the mean salary for the team. Describe how you calculated this amount.
3. Find the median salary. What method did you use to find it?
4. Were the statements by François’ agent and the team manager correct?
5. Explain the problem with the use of the term average in these negotiations.

In statistics, the three most commonly used measures of central tendency are the mean, median, and mode. Each of these measures has its particular advantages and disadvantages for a given set of data.

A mean is defined as the sum of the values of a variable divided by the number of values. In statistics, it is important to distinguish between the mean of a population and the mean of a sample of that population. The sample mean will approximate the actual mean of the population, but the two means could have different values. Different symbols are used to distinguish the two kinds of means: The Greek letter mu, $\mu$, represents a population mean, while $\bar{x}$, read as “x-bar,” represents a sample mean. Thus,

$$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{\sum x}{N} \qquad \text{and} \qquad \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x}{n}$$

where $\sum x$ is the sum of all values of X in the population or sample, N is the number of values in the entire population, and n is the number of values in a sample. Note that $\sum$, the capital Greek letter sigma, is used in mathematics as a symbol for “the sum of.” If no limits are shown above or below the sigma, the sum includes all of the data.

Usually, the mean is what people are referring to when they use the term average in everyday conversation.

The median is the middle value of the data when they are ranked from highest to lowest. When there is an even number of values, the median is the midpoint between the two middle values.

The mode is the value that occurs most frequently in a distribution. Some distributions do not have a mode, while others have several.

Some distributions have outliers, which are values distant from the majority of the data.
Outliers have a greater effect on means than on medians. For example, the mean and median for the salaries of the hockey team in the investigation have substantially different values because of the two very high salaries for the team’s star players. 126 MHR • Statistics of One Variable
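One way to see how the two very high salaries pull the mean away from the median is to recompute the measures from the salary table in the investigation. The following Python sketch assumes the table as given (24 players in total) and uses the standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# Salary table from the investigation: salary ($) -> number of players.
salary_table = {
    300_000: 2, 500_000: 3, 750_000: 8, 900_000: 6,
    1_000_000: 2, 1_500_000: 1, 3_000_000: 1, 4_000_000: 1,
}

# Expand the frequency table into a list with one salary per player.
salaries = [salary for salary, count in salary_table.items() for _ in range(count)]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
mode_salary = statistics.mode(salaries)

# Leave out the two highest salaries to see how much they affect each measure.
without_stars = sorted(salaries)[:-2]
mean_without_stars = statistics.mean(without_stars)
median_without_stars = statistics.median(without_stars)

print(mean_salary, median_salary, mode_salary)
print(round(mean_without_stars), median_without_stars)
```

Removing the two star players' salaries lowers the mean by more than $200 000, while the median does not move at all.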
  • 127. Example 1 Determining Mean, Median, and Mode

Two classes that wrote the same physics examination had the following results.

Class A: 71 82 55 76 66 71 90 84 95 64 71 70 83 45 73 51 68
Class B: 54 80 12 61 73 69 92 81 80 61 75 74 15 44 91 63 50 84

a) Determine the mean, median, and mode for each class.
b) Use the measures of central tendency to compare the performance of the two classes.
c) What is the effect of any outliers on the mean and median?

www.mcgrawhill.ca/links/MDM12
For more information about means, medians, and modes, visit the above web site and follow the links. For each measure, give an example of a situation where that measure is the best indicator of the centre of the data.

Solution
a) For class A, the mean is

$$\bar{x} = \frac{\sum x}{n} = \frac{71 + 82 + \dots + 68}{17} = \frac{1215}{17} \approx 71.5$$

When the marks are ranked from highest to lowest, the middle value is 71. Therefore, the median mark for class A is 71. The mode for class A is also 71 since this mark is the only one that occurs three times.

Similarly, the mean mark for class B is $\frac{54 + 80 + \dots + 84}{18} \approx 64.4$. When the marks are ranked from highest to lowest, the two middle values are 69 and 73, so the median mark for class B is $\frac{69 + 73}{2} = 71$. There are two modes since the values 61 and 80 both occur twice. However, the sample is so small that all the values occur only once or twice, so these modes may not be a reliable measure.

b) Although the mean score for class A is significantly higher than that for class B, the median marks for the two classes are the same. Notice that the measures of central tendency for class A agree closely, but those for class B do not.

c) A closer examination of the raw data shows that, aside from the two extremely low scores of 15 and 12 in class B, the distributions are not all that different. Without these two outlying marks, the mean for class B would be 70.8, almost the same as the mean for class A. Because of the relatively small size of class B, the effect of the outliers on its mean is significant. However, the values of these outliers have no effect on the median for class B. Even if the two outlying marks were changed to numbers in the 60s, the median mark would not change because it would still be greater than both of these marks.
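The measures in part a) of Example 1 can be verified with a short Python sketch. It assumes Python 3.8 or later, where the standard `statistics` module provides `multimode`, which returns every value tied for most frequent. The helper name `summarize` is illustrative only.

```python
import statistics

class_a = [71, 82, 55, 76, 66, 71, 90, 84, 95, 64, 71, 70, 83, 45, 73, 51, 68]
class_b = [54, 80, 12, 61, 73, 69, 92, 81, 80, 61, 75, 74, 15, 44, 91, 63, 50, 84]

def summarize(marks):
    """Return (mean to one decimal place, median, sorted list of modes)."""
    return (
        round(statistics.mean(marks), 1),
        statistics.median(marks),
        sorted(statistics.multimode(marks)),
    )

summary_a = summarize(class_a)
summary_b = summarize(class_b)

print("Class A:", summary_a)
print("Class B:", summary_b)
```

For class B, the list of modes contains both 61 and 80, which matches the observation that a small sample can have several modes, none of them especially meaningful.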
  • 128. The median is often a better measure of central tendency than the mean for small data sets that contain outliers. For larger data sets, the effect of outliers on the mean is less significant. Example 2 Comparing Samples to a Population Compare the measures of central tendency for each class in Example 1 to those for all the students who wrote the physics examination. Solution 1 Using a Graphing Calculator Use the STAT EDIT menu to check that lists L1 and L2 are clear. Then, enter the data for class A in L1 and the data for class B in L2. Next, use the augment( function from the LIST OPS menu to combine L1 and L2, and store the result in L3. You can use the mean( and median( functions from the LIST MATH menu to find the mean and median for each of the three lists. You can also find these measures by using the 1-Var Stats command from the STAT CALC menu. To find the modes, sort the lists with the SortA( function from the LIST OPS menu, and then scroll down through the lists to find the most frequent values. Alternatively, you can use STAT PLOT to display a histogram for each list and read the x-values for the tallest bars with the TRACE instruction. Note that the mean for class A overestimates the population mean, while the mean for class B underestimates it. The measures of central tendency for class A are reasonably close to those for the whole population of students who wrote the physics examination, but the two sets of measures are not identical. Because both of the low-score outliers happen to be in class B, it is a less representative sample of the population. Solution 2 Using a Spreadsheet Enter the data for class A and class B in separate columns. The AVG and MEAN functions in Corel® Quattro® Pro will calculate the mean for any range of cells you specify, as will the AVERAGE function in Microsoft® Excel. 
In both spreadsheets, you can use the MEDIAN and MODE functions to find the median and mode for each class and for the combined data for both classes. Note that all these functions ignore any blank cells in a specified range. The MODE function reports only one mode even if the data have two or more modes.
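As a cross-check on the calculator and spreadsheet results, the comparison of each class with the whole population can also be sketched in Python. Joining the two lists plays the same role as the augment( function on the graphing calculator. The sketch assumes Python 3.8 or later for `statistics.multimode`, which lists all modes rather than reporting only one.

```python
import statistics

class_a = [71, 82, 55, 76, 66, 71, 90, 84, 95, 64, 71, 70, 83, 45, 73, 51, 68]
class_b = [54, 80, 12, 61, 73, 69, 92, 81, 80, 61, 75, 74, 15, 44, 91, 63, 50, 84]
both = class_a + class_b   # all students who wrote the examination

results = {}
for name, marks in [("Class A", class_a), ("Class B", class_b), ("Both", both)]:
    results[name] = (
        round(statistics.mean(marks), 1),      # mean, to one decimal place
        statistics.median(marks),              # median
        sorted(statistics.multimode(marks)),   # every mode, not just one
    )
    print(name, results[name])
```

The mean for the combined data falls between the two class means, while the median for the combined data is the same as the median for each class.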
  • 129. Solution 3 Using Fathom™

Drag the case table icon to the workspace and name the attribute for the first column Marks. Enter the data for class A and change the name of the collection from Collection1 to ClassA. Use the same method to enter the marks for class B into a collection called ClassB. To create a collection with the combined data, first open another case table and name the collection Both. Then, go back to the class A case table and use the Edit menu to select all cases and then copy them. Return to the Both case table and select Paste Cases from the Edit menu. Copy the cases from the class B table in the same way.

Now, right-click on the class A collection to open the inspector. Click the Measures tab, and create Mean, Median, and Mode measures. Use the Edit Formula menu to enter the formulas for these measures. Use the same procedure to find the mean, median, and mode for the other two collections. Note from the screen below that Fathom™ uses a complicated formula to find modes. See the Help menu or the Fathom™ section of Appendix B for details.

Project Prep: In your statistics project, you may find measures of central tendency useful for describing your data.
  • 130. Chapter 8 discusses a method for calculating how representative of a population a sample is likely to be.

Sometimes, certain data within a set are more significant than others. For example, the mark on a final examination is often considered to be more important than the mark on a term test for determining an overall grade for a course. A weighted mean gives a measure of central tendency that reflects the relative importance of the data:

$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_i w_i x_i}{\sum_i w_i}$$

where $\sum_i w_i x_i$ is the sum of the weighted values and $\sum_i w_i$ is the sum of the various weighting factors. Weighted means are often used in calculations of indices.

Example 3 Calculating a Weighted Mean

The personnel manager for Statsville Marketing Limited considers five criteria when interviewing a job applicant. The manager gives each applicant a score between 1 and 5 in each category, with 5 as the highest score. Each category has a weighting between 1 and 3. The following table lists a recent applicant’s scores and the company’s weighting factors.

Criterion               Score, x_i    Weighting Factor, w_i
Education               4             2
Job experience          2             2
Interpersonal skills    5             3
Communication skills    5             3
References              4             1

a) Determine the weighted mean score for this job applicant.
b) How does this weighted mean differ from the unweighted mean?
c) What do the weighting factors indicate about the company’s hiring priorities?
  • 131. Solution
a) To compute the weighted mean, find the sum of the products of each score and its weighting factor.

$$\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i} = \frac{2(4) + 2(2) + 3(5) + 3(5) + 1(4)}{2 + 2 + 3 + 3 + 1} = \frac{46}{11} \approx 4.2$$

Therefore, this applicant had a weighted-mean score of approximately 4.2.

b) The unweighted mean is simply the sum of unweighted scores divided by 5.

$$\bar{x} = \frac{\sum x}{n} = \frac{4 + 2 + 5 + 5 + 4}{5} = 4$$

Without the weighting factors, this applicant would have a mean score of 4 out of 5.

c) Judging by these weighting factors, the company places a high importance on an applicant’s interpersonal and communication skills, moderate importance on education and job experience, and some, but low, importance on references.

When a set of data has been grouped into intervals, you can approximate the mean using the formula

$$\mu \approx \frac{\sum_i f_i m_i}{\sum_i f_i} \qquad \bar{x} \approx \frac{\sum_i f_i m_i}{\sum_i f_i}$$

where $m_i$ is the midpoint value of an interval and $f_i$ the frequency for that interval.

You can estimate the median for grouped data by taking the midpoint of the interval within which the median is found. This interval can be found by analysing the cumulative frequencies.
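The weighted-mean calculation in Example 3 can be checked with a short Python sketch. The function name `weighted_mean` and the layout of the data as (score, weighting factor) pairs are choices made for this illustration only.

```python
# Example 3: criterion -> (score x_i, weighting factor w_i)
applicant = {
    "Education": (4, 2),
    "Job experience": (2, 2),
    "Interpersonal skills": (5, 3),
    "Communication skills": (5, 3),
    "References": (4, 1),
}

def weighted_mean(pairs):
    """Sum of w_i * x_i divided by the sum of w_i, for (x_i, w_i) pairs."""
    pairs = list(pairs)
    total_weight = sum(weight for _, weight in pairs)
    return sum(score * weight for score, weight in pairs) / total_weight

weighted = weighted_mean(applicant.values())

# Giving every criterion a weight of 1 reduces the formula to the ordinary mean.
unweighted = weighted_mean((score, 1) for score, _ in applicant.values())

print(round(weighted, 1), unweighted)
```

Setting every weighting factor to 1 reproduces the unweighted mean of part b), which shows that the ordinary mean is a special case of the weighted mean.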
  • 132. Example 4 Calculating the Mean and Median for Grouped Data

A group of children were asked how many hours a day they spend watching television. The following table summarizes their responses.

Number of Hours    Number of Children, f_i
0–1                1
1–2                4
2–3                7
3–4                3
4–5                2
5–6                1

a) Determine the mean and median number of hours for this distribution.
b) Why are these values simply approximations?

Solution
a) First, find the midpoints and cumulative frequencies for the intervals. Then, use the midpoints and the frequencies for the intervals to calculate an estimate for the mean.

Number of Hours    Midpoint, x_i    Number of Children, f_i    Cumulative Frequency    f_i x_i
0–1                0.5              1                          1                       0.5
1–2                1.5              4                          5                       6
2–3                2.5              7                          12                      17.5
3–4                3.5              3                          15                      10.5
4–5                4.5              2                          17                      9
5–6                5.5              1                          18                      5.5
                                    ∑ f_i = 18                                         ∑ f_i x_i = 49

$$\bar{x} \approx \frac{\sum_i f_i x_i}{\sum_i f_i} = \frac{49}{18} \approx 2.7$$

Therefore, the mean time the children spent watching television is approximately 2.7 h a day.

To determine the median, you must identify the interval in which the middle value occurs. There are 18 data values, so the median is the mean of the ninth and tenth values. According to the cumulative-frequency column, both of these occur within the interval of 2–3 h. Therefore, an approximate value for the median is 2.5 h.

b) These values for the mean and median are approximate because you do not know where the data lie within each interval. For example, the child whose viewing time is listed in the first interval could have watched anywhere from 0 to 60 min of television a day. If the median value is close to one of the boundaries of the interval, then taking the midpoint of the interval as the median could give an error of almost 30 min.
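The grouped-data estimates in Example 4 follow a fixed procedure: multiply each midpoint by its frequency for the mean, and step through the cumulative frequencies for the median. The Python sketch below works through that procedure. The function names and the (lower bound, upper bound, frequency) layout are assumptions made for this illustration.

```python
# Example 4: (lower bound, upper bound, frequency) for each interval, in hours.
intervals = [(0, 1, 1), (1, 2, 4), (2, 3, 7), (3, 4, 3), (4, 5, 2), (5, 6, 1)]

def grouped_mean(intervals):
    """Estimate the mean from interval midpoints weighted by frequency."""
    total = sum(freq for _, _, freq in intervals)
    return sum((low + high) / 2 * freq for low, high, freq in intervals) / total

def interval_of(position, intervals):
    """Return the interval holding the value at a 1-based rank, using cumulative frequencies."""
    cumulative = 0
    for low, high, freq in intervals:
        cumulative += freq
        if position <= cumulative:
            return (low, high)
    raise ValueError("position exceeds the number of data values")

n = sum(freq for _, _, freq in intervals)
mean_hours = grouped_mean(intervals)

# Ranks of the middle value(s): the 9th and 10th when n = 18.
lower_rank, upper_rank = (n + 1) // 2, n // 2 + 1
median_interval = interval_of(lower_rank, intervals)
same_interval = median_interval == interval_of(upper_rank, intervals)
median_hours = sum(median_interval) / 2   # midpoint of the median interval

print(n, round(mean_hours, 1), median_interval, median_hours)
```

If the two middle ranks fell in different intervals, `same_interval` would be False and the midpoint of a single interval would no longer be a sensible estimate, so the sketch records that check explicitly.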
  • 133. Key Concepts
• The three principal measures of central tendency are the mean, median, and mode. The measures for a sample can differ from those for the whole population.
• The mean is the sum of the values in a set of data divided by the number of values in the set.
• The median is the middle value when the values are ranked in order. If there are two middle values, then the median is the mean of these two middle values.
• The mode is the most frequently occurring value.
• Outliers can have a dramatic effect on the mean if the sample size is small.
• A weighted mean can be a useful measure when all the data are not of equal significance.
• For data grouped into intervals, the mean and median can be estimated using the midpoints and frequencies of the intervals.

Communicate Your Understanding
1. Describe a situation in which the most useful measure of central tendency is
   a) the mean
   b) the median
   c) the mode
2. Explain why a weighted mean would be used to calculate an index such as the consumer price index.
3. Explain why the formula $\bar{x} \approx \frac{\sum_i f_i m_i}{\sum_i f_i}$ gives only an approximate value for the mean for grouped data.

Practise
A
1. For each set of data, calculate the mean, median, and mode.
   a) 2.4 3.5 1.9 3.0 3.5 2.4 1.6 3.8 1.2 2.4 3.1 2.7 1.7 2.2 3.3
   b) 10 15 14 19 18 17 12 10 14 15 18 20 9 14 11 18
2. a) List a set of eight values that has no mode.
   b) List a set of eight values that has a median that is not one of the data values.
   c) List a set of eight values that has two modes.
   d) List a set of eight values that has a median that is one of the data values.

Apply, Solve, Communicate
3. Stacey got 87% on her term work in chemistry and 71% on the final examination. What will her final grade be if the term mark counts for 70% and the final examination counts for 30%?
  • 134. 4. Communication Determine which measure of central tendency is most appropriate for each of the following sets of data. Justify your choice in each case.
a) baseball cap sizes
b) standardized test scores for 2000 students
c) final grades for a class of 18 students
d) lifetimes of mass-produced items, such as batteries or light bulbs

B
5. An interviewer rates candidates out of 5 for each of three criteria: experience, education, and interview performance. If the first two criteria are each weighted twice as much as the interview, determine which of the following candidates should get the job.

Criterion     Nadia    Enzo    Stephan
Experience    4        5       5
Education     4        4       3
Interview     4        3       4

6. Determine the effect the two outliers have on the mean mark for all the students in Example 2. Explain why this effect is different from the effect the outliers had on the mean mark for class B.

7. Application The following table shows the grading system for Xabbu’s calculus course.

Term Mark
Knowledge and understanding (K/U)             35%
Thinking, inquiry, problem solving (TIPS)     25%
Communication (C)                             15%
Application (A)                               25%

Overall Mark
Term mark                                     70%
Final examination                             30%

a) Determine Xabbu’s term mark if he scored 82% in K/U, 71% in TIPS, 85% in C, and 75% in A.
b) Determine Xabbu’s overall mark if he scored 65% on the final examination.

8. Application An academic award is to be granted to the student with the highest overall score in four weighted categories. Here are the scores for the three finalists.

Criterion                      Weighting    Paulo    Janet    Jamie
Academic achievement           3            4        3        5
Extra-curricular activities    2            4        4        4
Community service              2            2        5        3
Interview                      1            5        5        4

a) Calculate each student’s mean score without considering the weighting factors.
b) Calculate the weighted-mean score for each student.
c) Who should win the award? Explain.

9. Al, a shoe salesman, needs to restock his best-selling sandal. Here is a list of the sizes of the pairs he sold last week. This sandal does not come in half-sizes.
10 7 6 8 7 10 5 10 7 9
11 4 6 7 10 10 7 8 10 7
9 7 10 4 7 7 10 11
a) Determine the three measures of central tendency for these sandals.
b) Which measure has the greatest significance for Al? Explain.
c) What other value is also significant?
d) Construct a histogram for the data. What might account for the shape of this histogram?

10. Communication Last year, the mean number of goals scored by a player on Statsville’s soccer team was 6.
a) How many goals did the team score last year if there were 15 players on the team?
b) Explain how you arrived at the answer for part a) and show why your method works.
  • 135. 11. Inquiry/Problem Solving The following table shows the salary structure of Statsville Plush Toys, Inc. Assume that salaries exactly on an interval boundary have been placed in the higher interval.

Salary Range ($000)    Number of Employees
20−30                  12
30−40                  24
40−50                  32
50−60                  19
60−70                  9
70−80                  3
80−90                  0
90−100                 1

a) Determine the approximate mean salary for an employee of this firm.
b) Determine the approximate median salary.
c) How much does the outlier influence the mean and median salaries? Use calculations to justify your answer.

12. Inquiry/Problem Solving A group of friends and relatives get together every Sunday for a little pick-up hockey. The ages of the 30 regulars are shown below.
22 28 32 45 48 19 20 52 50 21
30 46 21 38 45 49 18 25 23 46
51 24 39 48 28 20 50 33 17 48
a) Determine the mean, median, and mode for this distribution.
b) Which measure best describes these data? Explain your choice.
c) Group these data into six intervals and produce a frequency table.
d) Illustrate the grouped data with a frequency diagram. Explain why the shape of this frequency diagram could be typical for such groups of hockey players.
e) Produce a cumulative-frequency diagram.
f) Determine a mean, median, and mode for the grouped data. Explain any differences between these measures and the ones you calculated in part a).

13. The modal interval for grouped data is the interval that contains more data than any other interval.
a) Determine the modal interval(s) for your data in part d) of question 12.
b) Is the modal interval a useful measure of central tendency for this particular distribution? Why or why not?

14. a) Explain the effect outliers have on the median of a distribution. Use examples to support your explanation.
b) Explain the effect outliers have on the mode of a distribution. Consider different cases and give examples of each.

C
15. The harmonic mean is defined as $\left(\sum_i \dfrac{1}{n x_i}\right)^{-1}$, where n is the number of values in the set of data.
a) Use a harmonic mean to find the average price of gasoline for a driver who bought $20 worth at 65¢/L last week and another $20 worth at 70¢/L this week.
b) Describe the types of calculations for which the harmonic mean is useful.

16. The geometric mean is defined as $\sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$, where n is the number of values in the set of data.
a) Use the geometric mean to find the average annual increase in a labour contract that gives a 4% raise the first year and a 2% raise for the next three years.
b) Describe the types of calculations for which the geometric mean is useful.
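The two definitions in questions 15 and 16 translate directly into code. The following Python sketch is only an illustration: the function names are made up here, and the sample values are hypothetical rather than taken from the exercises. It assumes Python 3.8 or later for `math.prod`.

```python
import math

def harmonic_mean(values):
    """Inverse of the sum of 1/(n * x_i), as in the definition in question 15."""
    n = len(values)
    return 1 / sum(1 / (n * x) for x in values)

def geometric_mean(values):
    """nth root of the product of the n values, as in question 16."""
    return math.prod(values) ** (1 / len(values))

# Hypothetical data, not from the exercises
speeds = [40, 60]   # km/h over two equal distances
factors = [2, 8]    # two successive growth factors

print(harmonic_mean(speeds))    # close to 48, lower than the arithmetic mean of 50
print(geometric_mean(factors))  # close to 4
```

Because of floating-point rounding, the printed results may differ from 48 and 4 in the last decimal places.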
  • 136. 2.6 Measures of Spread

The measures of central tendency indicate the central values of a set of data. Often, you will also want to know how closely the data cluster around these centres.

INVESTIGATE & INQUIRE: Spread in a Set of Data

For a game of basketball, a group of friends split into two randomly chosen teams. The heights of the players are shown in the table below.

Falcons                     Ravens
Player     Height (cm)      Player       Height (cm)
Laura      183              Sam          166
Jamie      165              Shannon      163
Deepa      148              Tracy        168
Colleen    146              Claudette    161
Ingrid     181              Maria        165
Justiss    178              Amy          166
Sheila     154              Selena       166

1. Judging by the raw data in this table, which team do you think has a height advantage? Explain why.
2. Do the measures of central tendency confirm that the teams are mismatched? Why or why not?
3. Explain how the distributions of heights on the two teams might give one of them an advantage. How could you use a diagram to illustrate the key difference between the two teams?

The measures of spread or dispersion of a data set are quantities that indicate how closely a set of data clusters around its centre. Just as there are several measures of central tendency, there are also different measures of spread.
  • 137. Standard Deviation and Variance

A deviation is the difference between an individual value in a set of data and the mean for the data.

For a population, deviation = $x - \mu$
For a sample, deviation = $x - \bar{x}$

The larger the size of the deviations, the greater the spread in the data. Values less than the mean have negative deviations. If you simply add up all the deviations for a data set, they will cancel out. You could use the sum of the absolute values of the deviations as a measure of spread. However, statisticians have shown that a root-mean-square quantity is a more useful measure of spread. The standard deviation is the square root of the mean of the squares of the deviations.

The lowercase Greek letter sigma, σ, is the symbol for the standard deviation of a population, while the letter s stands for the standard deviation of a sample.

Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

where N is the number of data in the population and n is the number in the sample.

Note that the formula for s has n − 1 in the denominator instead of n. This denominator compensates for the fact that a sample taken from a population tends to underestimate the deviations in the population. Remember that the sample mean, $\bar{x}$, is not necessarily equal to the population mean, µ. Since $\bar{x}$ is the central value of the sample, the sample data cluster closer to $\bar{x}$ than to µ. When n is large, the formula for s approaches that for σ.

Also note that the standard deviation gives greater weight to the larger deviations since it is based on the squares of the deviations.

The mean of the squares of the deviations is another useful measure. This quantity is called the variance and is equal to the square of the standard deviation.

Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Example 1 Using a Formula to Calculate Standard Deviations

Use means and standard deviations to compare the distribution of heights for the two basketball teams listed in the table on page 136.
  • 138. Solution

Since you are considering the teams as two separate populations, use the mean and standard deviation formulas for populations. First, calculate the mean height for the Falcons.

$$\mu = \frac{\sum x}{N} = \frac{1155}{7} = 165$$

Next, calculate all the deviations and their squares.

Falcons    Height (cm)    Deviation, x − µ    (x − µ)²
Laura      183            18                  324
Jamie      165            0                   0
Deepa      148            −17                 289
Colleen    146            −19                 361
Ingrid     181            16                  256
Justiss    178            13                  169
Sheila     154            −11                 121
Sum        1155           0                   1520

Now, you can determine the standard deviation.

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{1520}{7}} \doteq 14.7$$

Therefore, the Falcons have a mean height of 165 cm with a standard deviation of 14.7 cm. Similarly, you can determine that the Ravens also have a mean height of 165 cm, but their standard deviation is only 2.1 cm. Clearly, the Falcons have a much greater spread in height than the Ravens. Since the two teams have the same mean height, the difference in the standard deviations indicates that the Falcons have some players who are taller than any of the Ravens, but also some players who are shorter.

If you were to consider either of the basketball teams in the example above as a sample of the whole group of players, you would use the formula for s to calculate the team’s standard deviation. In this case, you would be using the sample to estimate the characteristics of a larger population. However, the teams are very small samples, so they could have significant random variations, as the difference in their standard deviations demonstrates.
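The calculation in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch: the helper names `mean` and `std_dev` and the `sample` flag are illustrative choices, not notation from the text.

```python
import math

# Heights (cm) from the table on page 136
falcons = [183, 165, 148, 146, 181, 178, 154]
ravens = [166, 163, 168, 161, 165, 166, 166]

def mean(values):
    return sum(values) / len(values)

def std_dev(values, sample=False):
    """Square root of the mean squared deviation.

    Divides by N for a population, or by n - 1 when sample=True.
    """
    m = mean(values)
    sum_of_squares = sum((x - m) ** 2 for x in values)
    divisor = len(values) - 1 if sample else len(values)
    return math.sqrt(sum_of_squares / divisor)

for name, heights in (("Falcons", falcons), ("Ravens", ravens)):
    print(name,
          mean(heights),
          round(std_dev(heights), 1),               # population formula (sigma)
          round(std_dev(heights, sample=True), 1))  # sample formula (s)
```

Treating a team as a sample instead of a population gives a slightly larger value, because the sum of squared deviations is divided by n − 1 rather than N.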
  • 139. For large samples the calculation of standard deviation can be quite tedious. However, most business and scientific calculators have built-in functions for such calculations, as do spreadsheets and statistical software.

See Appendix B for more detailed information about technology functions and keystrokes.

Example 2 Using Technology to Calculate Standard Deviations

A veterinarian has collected data on the life spans of a rare breed of cats.

Life Spans (in years)
16 18 19 12 11
15 20 21 18 15
16 13 16 22 18
19 17 14 9 14
15 19 20 15 15

Determine the mean, standard deviation, and the variance for these data.

Solution 1 Using a Graphing Calculator

Use the ClrList command to make sure list L1 is clear, then enter the data into it. Use the 1-Var Stats command from the STAT CALC menu to calculate a set of statistics including the mean and the standard deviation. Note that the calculator displays both a sample standard deviation, Sx, and a population standard deviation, σx. Use Sx since you are dealing with a sample in this case. Find the variance by calculating the square of Sx.

The mean life span for this breed of cats is about 16.3 years with a standard deviation of 3.2 years and a variance of 10.1. Note that variances are usually stated without units. The units for this variance are years squared.

Solution 2 Using a Spreadsheet

Enter the data into your spreadsheet program. With Corel® Quattro® Pro, you can use the AVG, STDS, and VARS functions to calculate the mean, sample standard deviation, and sample variance. In Microsoft® Excel, the equivalent functions are AVERAGE, STDEV, and VAR.
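The same statistics can also be obtained in a general-purpose language. The sketch below uses Python's standard `statistics` module. It is offered as an additional illustration and is not one of the tools covered in the text.

```python
import statistics

# Life spans (in years) from Example 2
life_spans = [16, 18, 19, 12, 11, 15, 20, 21, 18, 15, 16, 13, 16,
              22, 18, 19, 17, 14, 9, 14, 15, 19, 20, 15, 15]

mean = statistics.mean(life_spans)
s = statistics.stdev(life_spans)           # sample standard deviation (divides by n - 1)
variance = statistics.variance(life_spans) # sample variance
sigma = statistics.pstdev(life_spans)      # population standard deviation, for comparison

print(round(mean, 1), round(s, 1), round(variance, 1))
```

Here `stdev` and `variance` correspond to the calculator's Sx and its square, while `pstdev` corresponds to σx.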
  • 140. Solution 3 Using Fathom™

Drag a new case table onto the workspace, name the attribute for the first column Lifespan, and enter the data. Right-click to open the inspector, and click the Measures tab. Create Mean, StdDev, and Variance measures and select the formulas for the mean, standard deviation, and variance from the Edit Formula/Functions/Statistical/One Attribute menu.

If you are working with grouped data, you can estimate the standard deviation using the following formulas.

For a population: $\sigma \doteq \sqrt{\dfrac{\sum f_i (m_i - \mu)^2}{N}}$
For a sample: $s \doteq \sqrt{\dfrac{\sum f_i (m_i - \bar{x})^2}{n - 1}}$

where fi is the frequency for a given interval and mi is the midpoint of the interval. However, calculating standard deviations from raw, ungrouped data will give more accurate results.

Project Prep: In your statistics project, you may wish to use an appropriate measure of spread to describe the distribution of your data.

Quartiles and Interquartile Ranges

Quartiles divide a set of ordered data into four groups with equal numbers of values, just as the median divides data into two equally sized groups. The three “dividing points” are the first quartile (Q1), the median (sometimes called the second quartile or Q2), and the third quartile (Q3). Q1 and Q3 are the medians of the lower and upper halves of the data.

[Figure: ordered data from the lowest datum to the highest datum, divided by the first quartile (Q1), the median (Q2), and the third quartile (Q3); the interquartile range spans Q1 to Q3.]
  • 141. Recall that when there are an even number of data, you take the midpoint between the two middle values as the median. If the number of data below the median is even, Q1 is the midpoint between the two middle values in this half of the data. Q3 is determined in a similar way.

The interquartile range is Q3 − Q1, which is the range of the middle half of the data. The larger the interquartile range, the larger the spread of the central half of the data. Thus, the interquartile range provides a measure of spread. The semi-interquartile range is one half of the interquartile range. Both these ranges indicate how closely the data are clustered around the median.

A box-and-whisker plot of the data illustrates these measures. The box shows the first quartile, the median, and the third quartile. The ends of the “whiskers” represent the lowest and highest values in the set of data. Thus, the length of the box shows the interquartile range, while the left whisker shows the range of the data below the first quartile, and the right whisker shows the range above the third quartile.

[Figure: box-and-whisker plot labelled with the lowest datum, Q1, the median (Q2), Q3, the highest datum, and the interquartile range.]

A modified box-and-whisker plot is often used when the data contain outliers. By convention, any point that is at least 1.5 times the box length away from the box is classified as an outlier. A modified box-and-whisker plot shows such outliers as separate points instead of including them in the whiskers. This method usually gives a clearer illustration of the distribution.

[Figure: modified box-and-whisker plot labelled with the lowest datum, Q1, the median (Q2), Q3, the highest datum, the interquartile range, and outliers plotted as separate points.]
  • 142. Example 3 Determining Quartiles and Interquartile Ranges

A random survey of people at a science-fiction convention asked them how many times they had seen Star Wars. The results are shown below.
3 4 2 8 10 5 1 15 5 16
6 3 4 9 12 3 30 2 10 7

a) Determine the median, the first and third quartiles, and the interquartile and semi-interquartile ranges. What information do these measures provide?
b) Prepare a suitable box plot of the data.
c) Compare the results in part a) to those from last year’s survey, which found a median of 5.1 with an interquartile range of 8.0.

Solution 1 Using Pencil and Paper

a) First, put the data into numerical order.
1 2 2 3 3 3 4 4 5 5 6 7 8 9 10 10 12 15 16 30

The median is either the middle datum or, as in this case, the mean of the two middle data:

$$\text{median} = \frac{5 + 6}{2} = 5.5$$

The median value of 5.5 indicates that half of the people surveyed had seen Star Wars less than 5.5 times and the other half had seen it more than 5.5 times.

To determine Q1, find the median of the lower half of the data. Again, there are two middle values, both of which are 3. Therefore, Q1 = 3. Similarly, the two middle values of the upper half of the data are both 10, so Q3 = 10. Since Q1 and Q3 are the boundaries for the central half of the data, they show that half of the people surveyed have seen Star Wars between 3 and 10 times.

$$Q_3 - Q_1 = 10 - 3 = 7$$

Therefore, the interquartile range is 7. The semi-interquartile range is half this value, or 3.5. These ranges indicate the spread of the central half of the data.
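Part a) of Solution 1 can be checked with a short Python sketch that follows the pencil-and-paper method: the quartiles are the medians of the lower and upper halves of the ordered data. The function names are illustrative, and leaving the median out of both halves when the number of data is odd is an assumption. The outlier test applies the 1.5-times-box-length convention described for modified box-and-whisker plots.

```python
def median(sorted_values):
    """Middle value, or the mean of the two middle values."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using medians of the lower and upper halves.

    Needs at least two values. For an odd count, the median itself is
    excluded from both halves (an assumption).
    """
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data), median(data[-half:])

# Survey results from Example 3
viewings = [3, 4, 2, 8, 10, 5, 1, 15, 5, 16, 6, 3, 4, 9, 12, 3, 30, 2, 10, 7]

q1, q2, q3 = quartiles(viewings)
iqr = q3 - q1
semi_iqr = iqr / 2

# Points at least 1.5 box lengths away from the box count as outliers
outliers = [x for x in viewings if x <= q1 - 1.5 * iqr or x >= q3 + 1.5 * iqr]

print(q1, q2, q3, iqr, semi_iqr, outliers)
```

Statistical packages often use slightly different quartile conventions, so their results for Q1 and Q3 may not match this method exactly.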
  • 143. b) The value of 30 at the end of the ordered data is clearly an outlier. Therefore, a modified box-and-whisker plot will best illustrate this set of data.

[Figure: modified box-and-whisker plot of the survey data; horizontal axis "Viewings of Star Wars," scaled from 0 to 30.]

c) Comparing the two surveys shows that the median number of viewings is higher this year and the data are somewhat less spread out.

Solution 2 Using a Graphing Calculator

a) Use the STAT EDIT menu to enter the data into a list. Use the 1-Var Stats command from the STAT CALC menu to calculate the statistics for your list. Scroll down to see the values for the median, Q1, and Q3. Use the values for Q1 and Q3 to calculate the interquartile and semi-interquartile ranges.

b) Use STAT PLOT to select a modified box plot of your list. Press GRAPH to display the box-and-whisker plot and adjust the window settings, if necessary.

Solution 3 Using Fathom™

a) Drag a new case table onto the workspace, create an attribute called StarWars, and enter your data. Open the inspector and create Median, Q1, Q3, and IQR measures. Use the Edit Formula/Functions/Statistical/One Attribute menu to enter the formulas for the median, quartiles, and interquartile range.
  • 144. b) Drag the graph icon onto the workspace, then drop the StarWars attribute on the x-axis of the graph. Select Box Plot from the drop-down menu in the upper right corner of the graph.

Although a quartile is, strictly speaking, a single value, people sometimes speak of a datum being within a quartile. What they really mean is that the datum is in the quarter whose upper boundary is the quartile. For example, if a value x1 is “within the first quartile,” then x1 ≤ Q1. Similarly, if x2 is “within the third quartile,” then the median ≤ x2 ≤ Q3.

Example 4 Classifying Data by Quartiles

In a survey of low-risk mutual funds, the median annual yield was 7.2%, while Q1 was 5.9% and Q3 was 8.3%. Describe the following funds in terms of quartiles.

Mutual Fund     Annual Yield (%)
XXY Value       7.5
YYZ Dividend    9.0
ZZZ Bond        7.2

Solution

The yield for the XXY Value fund was between the median and Q3. You might see this fund described as being in the third quartile or having a third-quartile yield.

YYZ Dividend’s yield was above Q3. This fund might be termed a fourth- or top-quartile fund.

ZZZ Bond’s yield was equal to the median. This fund could be described as a median fund or as having median performance.
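The classification in Example 4 can be written as a small Python function. This is a sketch only: the function name and the label strings are illustrative, and assigning a value that lies exactly on Q1 or Q3 to the lower quarter follows the "x1 ≤ Q1" wording above.

```python
def quartile_label(x, q1, median, q3):
    """Describe a value by the quarter of the data it falls in."""
    if x == median:
        return "median"
    if x <= q1:
        return "first quartile"
    if x < median:
        return "second quartile"
    if x <= q3:
        return "third quartile"
    return "fourth quartile"

# Survey values and fund yields (%) from Example 4
Q1, MEDIAN, Q3 = 5.9, 7.2, 8.3
funds = {"XXY Value": 7.5, "YYZ Dividend": 9.0, "ZZZ Bond": 7.2}

for fund, annual_yield in funds.items():
    print(fund, "->", quartile_label(annual_yield, Q1, MEDIAN, Q3))
```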
  • 145. Percentiles

Percentiles are similar to quartiles, except that percentiles divide the data into 100 intervals that have equal numbers of values. Thus, k percent of the data are less than or equal to the kth percentile, Pk, and (100 − k) percent are greater than or equal to Pk. Standardized tests often use percentiles to convert raw scores to scores on a scale from 1 to 100. As with quartiles, people sometimes use the term percentile to refer to the intervals rather than their boundaries.

Example 5 Percentiles

An audio magazine tested 60 different models of speakers and gave each one an overall rating based on sound quality, reliability, efficiency, and appearance. The raw scores for the speakers are listed in ascending order below (reading down each column).

35 47 57 62 64 67 72 76 83 90
38 50 58 62 65 68 72 78 84 91
41 51 58 62 65 68 73 79 86 92
44 53 59 63 66 69 74 81 86 94
45 53 60 63 67 69 75 82 87 96
45 56 62 64 67 70 75 82 88 98

a) If the Audio Maximizer Ultra 3000 scored at the 50th percentile, what was its raw score?
b) What is the 90th percentile for these data?
c) Does the SchmederVox’s score of 75 place it at the 75th percentile?

Solution

a) Half of the raw scores are less than or equal to the 50th percentile and half are greater than or equal to it. From the table, you can see that 67 divides the data in this way. Therefore, the Audio Maximizer Ultra 3000 had a raw score of 67.

b) The 90th percentile is the boundary between the lower 90% of the scores and the top 10%. In the table, you can see that the top 10% of the scores are in the 10th column. Therefore, the 90th percentile is the midpoint between values of 88 and 90, which is 89.

c) First, determine 75% of the number of raw scores.
60 × 75% = 45
There are 45 scores less than or equal to the 75th percentile. Therefore, the 75th percentile is the midpoint between the 45th and 46th scores. These two scores are 79 and 81, so the 75th percentile is 80. The SchmederVox’s score of 75 is below the 75th percentile.
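The method used in Example 5 can be sketched in Python as follows. The function name is illustrative. The example only covers cases where k percent of the number of data is a whole number, so the rounding-up rule used otherwise is an assumption, as is the restriction to k between 1 and 99.

```python
import math

# Raw scores from Example 5, copied row by row from the table, then sorted
rows = [
    [35, 47, 57, 62, 64, 67, 72, 76, 83, 90],
    [38, 50, 58, 62, 65, 68, 72, 78, 84, 91],
    [41, 51, 58, 62, 65, 68, 73, 79, 86, 92],
    [44, 53, 59, 63, 66, 69, 74, 81, 86, 94],
    [45, 53, 60, 63, 67, 69, 75, 82, 87, 96],
    [45, 56, 62, 64, 67, 70, 75, 82, 88, 98],
]
scores = sorted(score for row in rows for score in row)

def percentile(sorted_values, k):
    """kth percentile (1 <= k <= 99) by the midpoint method of Example 5."""
    n = len(sorted_values)
    if (n * k) % 100 == 0:
        count = n * k // 100  # number of values at or below the percentile
        # Midpoint between the count-th and (count + 1)-th values (1-based)
        return (sorted_values[count - 1] + sorted_values[count]) / 2
    # Assumption: otherwise take the value at the next whole position
    return sorted_values[math.ceil(n * k / 100) - 1]

print(percentile(scores, 50), percentile(scores, 90), percentile(scores, 75))
```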
  • 146. Z-Scores

A z-score is the number of standard deviations that a datum is from the mean. You calculate the z-score by dividing the deviation of a datum by the standard deviation.

For a population: $z = \dfrac{x - \mu}{\sigma}$
For a sample: $z = \dfrac{x - \bar{x}}{s}$

Variable values below the mean have negative z-scores, values above the mean have positive z-scores, and values equal to the mean have a zero z-score. Chapter 8 describes z-scores in more detail.

Example 6 Determining Z-Scores

Determine the z-scores for the Audio Maximizer Ultra 3000 and SchmederVox speakers.

Solution

You can use a calculator, spreadsheet, or statistical software to determine that the mean is 68.1 and the standard deviation is 15.2 for the speaker scores in Example 5. Now, use the mean and standard deviation to calculate the z-scores for the two speakers.

For the Audio Maximizer Ultra 3000,

$$z = \frac{x - \bar{x}}{s} = \frac{67 - 68.1}{15.2} \doteq -0.072$$

  • 147. For the SchmederVox,

$$z = \frac{x - \bar{x}}{s} = \frac{75 - 68.1}{15.2} \doteq 0.45$$

The Audio Maximizer Ultra 3000 has a z-score of −0.072, indicating that it is approximately 7% of a standard deviation below the mean. The SchmederVox speaker has a z-score of 0.45, indicating that it is approximately half a standard deviation above the mean.

Key Concepts

• The variance and the standard deviation are measures of how closely a set of data clusters around its mean. The variance and standard deviation of a sample may differ from those of the population the sample is drawn from.
• Quartiles are values that divide a set of ordered data into four intervals with equal numbers of data, while percentiles divide the data into 100 intervals.
• The interquartile range and semi-interquartile range are measures of how closely a set of data clusters around its median.
• The z-score of a datum is a measure of how many standard deviations the value is from the mean.

Communicate Your Understanding

1. Explain how the term root-mean-square applies to the calculation of the standard deviation.
2. Why does the semi-interquartile range give only an approximate measure of how far the first and third quartiles are from the median?
3. Describe the similarities and differences between the standard deviation and the semi-interquartile range.
4. Are the median, the second quartile, and the 50th percentile always equal? Explain why or why not.
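Returning to Example 6, the z-score formula for a sample can be sketched in Python. The function name is an illustrative choice, and the mean and standard deviation are the rounded values quoted in the solution.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Rounded statistics for the speaker scores, as quoted in Example 6
MEAN, S = 68.1, 15.2

z_audio_maximizer = z_score(67, MEAN, S)
z_schmedervox = z_score(75, MEAN, S)

print(round(z_audio_maximizer, 3), round(z_schmedervox, 2))
```

A negative result places the score below the mean and a positive result places it above, in line with the description of z-scores above.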
  • 148. Practise
A
1. Determine the mean, standard deviation, and variance for the following samples.
a) Scores on a data management quiz (out of 10 with a bonus question):
5 7 9 6 5 10 8 2
11 8 7 7 6 9 5 8
b) Costs for books purchased including taxes (in dollars):
12.55 15.31 21.98 45.35 19.81
33.89 29.53 30.19 38.20

2. Determine the median, Q1, Q3, the interquartile range, and semi-interquartile range for the following sets of data.
a) Number of home runs hit by players on the Statsville little league team:
6 4 3 8 9 11 6 5 15
b) Final grades in a geography class:
88 56 72 67 59 48 81 62
90 75 75 43 71 64 78 84

3. For a recent standardized test, the median was 88, Q1 was 67, and Q3 was 105. Describe the following scores in terms of quartiles.
a) 8
b) 81
c) 103

4. What percentile corresponds to
a) the first quartile?
b) the median?
c) the third quartile?

5. Convert these raw scores to z-scores.
18 15 26 20 21

Apply, Solve, Communicate
B
6. The board members of a provincial organization receive a car allowance for travel to meetings. Here are the distances the board logged last year (in kilometres).
44 18 125 80 63 42 35 68 52
75 260 96 110 72 51
a) Determine the mean, standard deviation, and variance for these data.
b) Determine the median, interquartile range, and semi-interquartile range.
c) Illustrate these data using a box-and-whisker plot.
d) Identify any outliers.

7. The nurses’ union collects data on the hours worked by operating-room nurses at the Statsville General Hospital.

Hours Per Week    Number of Employees
12                1
32                5
35                7
38                8
42                5

a) Determine the mean, variance, and standard deviation for the nurses’ hours.
b) Determine the median, interquartile range, and semi-interquartile range.
c) Illustrate these data using a box-and-whisker plot.

8. Application
a) Predict the changes in the standard deviation and the box-and-whisker plot if the outlier were removed from the data in question 7.
b) Remove the outlier and compare the new results to your original results.
c) Account for any differences between your prediction and your results in part b).
  • 149. 9. Application (Chapter Problem) Here are the current salaries for François’ team.

Salary ($)    Number of Players
300 000       2
500 000       3
750 000       8
900 000       6
1 000 000     2
1 500 000     1
3 000 000     1
4 000 000     1

a) Determine the standard deviation, variance, interquartile range, and semi-interquartile range for these data.
b) Illustrate the data with a modified box-and-whisker plot.
c) Determine the z-score of François’ current salary of $300 000.
d) What will the new z-score be if François’ agent does get him a million-dollar contract?

10. Communication Carol’s golf drives have a mean of 185 m with a standard deviation of 25 m, while her friend Chi-Yan shoots a mean distance of 170 m with a standard deviation of 10 m. Explain which of the two friends is likely to have a better score in a round of golf. What assumptions do you have to make for your answer?

11. Under what conditions will Q1 equal one of the data points in a distribution?

12. a) Construct a set of data in which Q1 = Q3 and describe a situation in which this equality might occur.
b) Will such data sets always have a median equal to Q1 and Q3? Explain your reasoning.

13. Is it possible for a set of data to have a standard deviation much smaller than its semi-interquartile range? Give an example or explain why one is not possible.

14. Inquiry/Problem Solving A business-travellers’ association rates hotels on a variety of factors including price, cleanliness, services, and amenities to produce an overall score out of 100 for each hotel. Here are the ratings for 50 hotels in a major city.
39 50 56 60 65 68 73 77 81 87
41 50 56 60 65 68 74 78 81 89
42 51 57 60 66 70 74 78 84 91
44 53 58 62 67 71 75 79 85 94
48 55 59 63 68 73 76 80 86 96
a) What score represents
i) the 50th percentile?
ii) the 95th percentile?
b) What percentile corresponds to a rating of 50?
c) The travellers’ association lists hotels above the 90th percentile as “highly recommended” and hotels between the 75th and 90th percentiles as “recommended.” What are the minimum scores for the two levels of recommended hotels?

ACHIEVEMENT CHECK (Knowledge/Understanding, Thinking/Inquiry/Problem Solving, Communication, Application)
15. a) A data-management teacher has two classes whose midterm marks have identical means. However, the standard deviations for each class are significantly different. Describe what these measures tell you about the two classes.
b) If two sets of data have the same mean, can one of them have a larger standard deviation and a smaller interquartile range than the other? Give an example or explain why one is not possible.
  • 150. C
16. Show that $\sum (x - \bar{x}) = 0$ for any distribution.

17. a) Show that $s = \sqrt{\dfrac{n\left(\sum x^2\right) - \left(\sum x\right)^2}{n(n - 1)}}$. (Hint: Use the fact that $\sum x = n\bar{x}$.)
b) What are two advantages of using the formula in part a) for calculating standard deviations?

18. Communication The midrange of a set of data is defined as half of the sum of the highest value and the lowest value. The incomes for the employees of Statsville Lawn Ornaments Limited are listed below (in thousands of dollars).
28 34 49 22 50 31 55 32 73 21
63 112 35 19 44 28 59 85 47 39
a) Determine the midrange and interquartile range for these data.
b) What are the similarities and differences between these two measures of spread?

19. The mean absolute deviation of a set of data is defined as $\dfrac{\sum |x - \bar{x}|}{n}$, where $|x - \bar{x}|$ is the absolute value of the difference between each data point and the mean.
a) Calculate the mean absolute deviation and the standard deviation for the data in question 18.
b) What are the similarities and differences between these two measures of spread?

Career Connection: Statistician

Use of statistics today is so widespread that there are numerous career opportunities for statisticians in a broad range of fields. Governments, medical-research laboratories, sports agencies, financial groups, and universities are just a few of the many organizations that employ statisticians. Current trends suggest an ongoing need for statisticians in many areas.

A statistician is engaged in the collection, analysis, presentation, and interpretation of data in a variety of forms. Statisticians provide insight into which data are likely to be reliable and whether valid conclusions or predictions can be drawn from them. A research statistician might develop new statistical techniques or applications.

Because computers are essential for analysing large amounts of data, a statistician should possess a strong background in computers as well as mathematics. Many positions call for a minimum of a bachelor’s or master’s degree. Research at a university or work for a consulting firm usually requires a doctorate.

www.mcgrawhill.ca/links/MDM12
For more information about a career as a statistician and other careers related to mathematics, visit the above web site and follow the links.
  • 151. Review of Key Concepts

2.1 Data Analysis With Graphs
Refer to the Key Concepts on page 100.

1. The following data show monthly sales of houses by a real-estate agency.
6 5 7 6 8 3 5 4 6 7 5 9 5 6 6 7
a) Construct an ungrouped frequency table for this distribution.
b) Create a frequency diagram.
c) Create a cumulative-frequency diagram.

2. A veterinary study recorded the masses in grams of 25 kittens at birth.
240 300 275 350 280 260 320 295 340 305
280 265 300 275 315 285 320 325 275 270
290 245 235 305 265
a) Organize these data into groups.
b) Create a frequency table and histogram.
c) Create a frequency polygon.
d) Create a relative-frequency diagram.

3. A class of data-management students listed their favourite board games.

Game               Frequency
Pictionary®        10
Chess              5
Trivial Pursuit®   8
MONOPOLY®          3
Balderdash®        6
Other              4

a) What type of data does this table show? Explain your reasoning.
b) Graph these data using an appropriate format.
c) Explain why you chose the type of graph you did.

2.2 Indices
Refer to the Key Concepts on page 109.

The following graph shows four categories from the basket of goods and services used to calculate the consumer price index.

[Figure: Consumer Price Index for Ontario (1992 = 100), 1990 to 2000, vertical axis from 0 to 200, with lines for Fresh Vegetables, Coffee and Tea, Rent, and Fuel Oil and Other Fuel.]

4. a) What is this type of graph called?
b) Which of the four categories had the greatest increase during the period shown?
c) Why do all four graphs intersect at 1992?
d) Which category was
i) the most volatile?
ii) the least volatile?
e) Suggest reasons for this difference in volatility.

5. a) If a tin of coffee cost $5.99 in 1992, what would you expect it to cost in
i) 1995?
ii) 1990?
b) What rent would a typical tenant pay in 2000 for an apartment that had a rent of $550 per month in 1990?
c) What might you expect to pay for broccoli in 2000, if the average price you paid in 1996 was $1.49 a bunch?
  • 152. 2.3 Sampling Techniques
Refer to the Key Concepts on page 116.
6. a) Explain the difference between a stratified sample and a systematic sample.
   b) Describe a situation where a convenience sample would be an appropriate technique.
   c) What are the advantages and disadvantages of a voluntary-response sample?
7. Suppose you are conducting a survey that you would like to be as representative as possible of the entire student body at your school. However, you have time to visit only six classes and to process data from a total of 30 students.
   a) What sampling technique would you use?
   b) Describe how you would select the students for your sample.
8. Drawing names from a hat and using a random-number generator are two ways to obtain a simple random sample. Describe two other ways of selecting a random sample.

2.4 Bias in Surveys
Refer to the Key Concepts on page 122.
9. Identify the type of bias in each of the following situations and state whether the bias is due to the sampling technique or the method of data collection.
   a) A survey asks a group of children whether or not they should be allowed unlimited amounts of junk food.
   b) A teacher asks students to raise their hands if they have ever told a harmless lie.
   c) A budding musician plays a new song for family members and friends to see if it is good enough to record professionally.
   d) Every fourth person entering a public library is asked: “Do you think Carol Shields should receive the Giller prize for her brilliant and critically acclaimed new novel?”
10. For each situation in question 9, suggest how the statistical process could be changed to remove the bias.

2.5 Measures of Central Tendency
Refer to the Key Concepts on page 133.
11. a) Determine the mean, median, and mode for the data in question 1.
    b) Which measure of central tendency best describes these data? Explain your reasoning.
12. a) Use your grouped data from question 2 to estimate the mean and median masses for the kittens.
    b) Determine the actual mean and median masses from the raw data.
    c) Explain any differences between your answers to parts a) and b).
13. a) For what type of “average” will the following statement always be true? “There are as many people with below-average ages as there are with above-average ages.”
    b) Is this statement likely to be true for either of the other measures of central tendency discussed in this chapter? Why or why not?
  • 153. 14. Angela is applying to a university engineering program that weights an applicant’s eight best grade-12 marks as shown in the following table.
    Subjects                                                            Weighting
    Calculus, chemistry, geometry and discrete mathematics, physics     3
    Computer science, data management, English                          2
    Other                                                               1
    Angela’s grade-12 final marks are listed below.
    Subject                               Mark
    Calculus                              95
    English                               89
    Geometry and discrete mathematics     94
    Physical education                    80
    Computer science                      84
    Chemistry                             90
    Mathematics of data management        87
    Physics                               92
    a) Calculate Angela’s weighted average.
    b) Calculate Angela’s unweighted average.
    c) Explain why the engineering program would use this weighting system.
15. Describe three situations where the mode would be the most appropriate measure of central tendency.

2.6 Measures of Spread
Refer to the Key Concepts on page 147.
16. a) Determine the standard deviation, the interquartile range, and the semi-interquartile range for the data in question 1.
    b) Create a box-and-whisker plot for these data.
    c) Are there any outliers in the data? Justify your answer.
17. a) Explain why you cannot calculate the semi-interquartile range if you know only the difference between either Q3 and the median or median and Q1.
    b) Explain how you could determine the semi-interquartile range if you did know both of the differences in part a).
18. a) For the data in question 2, determine
       i) the first and third quartiles
       ii) the 10th, 25th, 75th, and 90th percentiles
    b) Would you expect any of the values in part a) to be equal? Why or why not?
19. The scores on a precision-driving test for prospective drivers at a transit company have a mean of 100 and a standard deviation of 15.
    a) Determine the z-score for each of the following raw scores.
       i) 85   ii) 135   iii) 100   iv) 62
    b) Determine the raw score corresponding to each of the following z-scores.
       i) 1   ii) −2   iii) 1.5   iv) −1.2
20. Dr. Simba’s fourth-year class in animal biology has only 12 students. Their scores on the midterm examination are shown below.
    50 71 65 54 84 69 82 67 52 52 86 85
    a) Calculate the mean and median for these data. Compare these two statistics.
    b) Calculate the standard deviation and the semi-interquartile range. Compare these statistics and comment on what you notice.
    c) Which measure of spread is most suitable for describing this data set? Explain why.
  • 154. Chapter Test

ACHIEVEMENT CHART
Category     Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication          Application
Questions    All                       10, 11                             4, 6, 7, 8, 9, 11      5, 6, 11

Use the following set of data-management final examination scores to answer questions 1 through 5.
92 48 59 62 66 98 70 70 55 63 70 97 61 53 56 64 46 69 58 64
1. a) Group these data into intervals and create a frequency table.
   b) Produce a frequency diagram and a frequency polygon.
   c) Produce a cumulative-frequency diagram.
2. Determine the
   a) three measures of central tendency
   b) standard deviation and variance
   c) interquartile and semi-interquartile ranges
3. a) Produce a modified box-and-whisker plot for this distribution.
   b) Identify any outliers.
   c) Identify and explain any other unusual features of this graph.
4. Explain which of the three measures of central tendency is most appropriate to describe this distribution of marks and why the other two measures are not appropriate.
5. Students with scores above the 90th percentile receive a book prize.
   a) How many students will receive prizes?
   b) What are these students’ scores?
6. An interview committee graded three short-listed candidates for a management position as shown below. The scores are on a scale of 1 to 5, with 5 as the top score.
   Criterion              Weight   Clarise   Pina   Steven
   Education              2        3         3      4
   Experience             2        4         5      3
   Interpersonal skills   3        3         3      5
   First interview        1        5         4      3
   Who should the committee hire based on these data? Justify your choice.
7. Describe the type of sample used in each of the following scenarios.
   a) A proportionate number of boys and girls are randomly selected from a class.
   b) A software company randomly chooses a group of schools in a particular school district to test a new timetable program.
   c) A newspaper prints a questionnaire and invites its readers to mail in their responses.
   d) A telephone-survey company uses a random-number generator to select which households to call.
   e) An interviewer polls people passing by on the street.
8. A group of 8 children in a day-care centre are to be interviewed about their favourite games. Describe how you would select a systematic sample if there are 52 children at the centre.
  • 155. 9. a) Identify the bias in the following surveys and explain the effect it could have on their results.
      i) Parents of high-school students were asked: “Do you think that students should be released from school a half hour early on Friday, free to run around and get into trouble?”
      ii) Audience members at an investment workshop were asked to raise their hands if they had been late with a bill payment within the last six months.
      iii) A random survey of corporate executives asked: “Do you favour granting a cable-television licence for a new economics and business channel?”
   b) Suggest how to eliminate the bias in each of the surveys in part a).
10. A mutual-fund company proudly advertises that all of its funds have “first-quartile performance.” What mathematical errors has the company made in this advertisement?

ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application
11. The graph below shows the stock price for an Ontario technology company over a one-month period in 2001.
    [Graph: Stock Price ($), on a scale from 18 to 30, plotted against date from late August 2001 to late September 2001]
    a) When did the stock reach its lowest value during the period shown? Suggest a possible reason for this low point.
    b) Compare the percent drop in stock price from September 1 to September 8 to the drop during the following week.
    c) Sketch a new graph and provide a commentary that the company could use to encourage investors to buy the company’s stock.
Chapter Test • MHR 155
  • 156. CHAPTER 3
Statistics of Two Variables

Specific Expectations                                                           Section

Define the correlation coefficient as a measure of the fit of a scatter graph to a linear model.     3.1, 3.2, 3.3, 3.5

Calculate the correlation coefficient for a set of data, using graphing calculators or statistical software.     3.1, 3.2, 3.3, 3.5

Demonstrate an understanding of the distinction between cause-effect relationships and the mathematical correlation between variables.     3.1, 3.2, 3.3, 3.4, 3.5

Describe possible misuses of regression.     3.2, 3.3, 3.5

Explain examples of the use and misuse of statistics in the media.     3.5

Assess the validity of conclusions made on the basis of statistical studies, by analysing possible sources of bias in the studies and by calculating and interpreting additional statistics, where possible.     3.2, 3.3, 3.4, 3.5

Demonstrate an understanding of the purpose and the use of a variety of sampling techniques.     3.4, 3.5

Organize and summarize data from secondary sources, using technology.     3.1, 3.2, 3.3, 3.4, 3.5

Locate data to answer questions of significance or personal interest, by searching well-organized databases.     3.1, 3.2, 3.4, 3.5

Use the Internet effectively as a source for databases.     3.1, 3.2, 3.4, 3.5
  • 157. Chapter Problem
Job Prospects
Gina is in her second year of business studies at university and she is starting to think about a job upon graduation. She has two primary concerns: the job market and expected income. Gina does some research at the university’s placement centre and finds employment statistics for graduates of her program and industry surveys of entry-level salaries.

Year    Number of Graduates    Number Hired Upon Graduation    Mean Starting Salary ($000)
1992    172                    151                             26
1993    180                    160                             27
1994    192                    140                             28
1995    170                    147                             27.5
1996    168                    142                             27
1997    176                    155                             26.5
1998    180                    160                             27
1999    192                    162                             29
2000    200                    172                             31
2001    220                    180                             34

1. How could Gina graph this data to estimate
   a) her chances of finding a job in her field when she graduates in two years?
   b) her starting salary?
2. What assumptions does Gina have to make for her predictions? What other factors could affect the accuracy of Gina’s estimates?

This chapter introduces statistical techniques for measuring relationships between two variables. As you will see, these techniques will enable Gina to make more precise estimates of her job prospects.
Two-variable statistics have an enormous range of applications including industrial processes, medical studies, and environmental issues. In fact, they apply to almost any field where you need to determine if a change in one variable affects another.
  • 158. Review of Prerequisite Skills
If you need help with any of the skills listed in purple below, refer to Appendix A.
1. Scatter plots For each of the following sets of data, create a scatter plot and describe any patterns you see.
   a)  x    y          b)  x    y
       3    18             4    6
       5    15             7    2
       8    12             13   17
       9    10             14   5
       12   8              23   19
       15   4              24   11
       17   1              25   30
                           33   21
                           36   29
                           40   39
                           42   26
                           46   32
2. Scatter plots For each plot in question 1,
   i) graph the line of best fit and calculate its equation
   ii) estimate the x- and y-intercepts
   iii) estimate the value of y when x = 7
3. Graphing linear equations Determine the slope and y-intercept for the lines defined by the following equations, and then graph the lines.
   a) $y = 3x - 4$
   b) $y = -2x + 6$
   c) $12x - 6y = 7$
4. Graphing quadratic functions Graph the following functions and estimate any x- and y-intercepts.
   a) $y = 2x^2$
   b) $y = x^2 + 5x - 6$
   c) $y = -3x^2 + x + 2$
5. Graphing exponential functions
   a) Identify the base and the numerical coefficient for each of the following functions.
      i) $y = 0.5(3)^x$   ii) $y = 2^x$   iii) $y = 100(0.5)^x$
   b) Graph each of the functions in part a).
   c) Explain what happens to the value of x as the curves in part b) approach the x-axis.
6. Sigma notation Calculate each sum without the use of technology.
   a) $\sum_{i=1}^{8} i$   b) $\sum_{i=1}^{5} i^2$
7. Sigma notation Given $\bar{x} = 2.5$, calculate each sum without the use of technology.
   a) $\sum_{i=1}^{6} (i - \bar{x})$   b) $\sum_{i=1}^{4} (i - \bar{x})^2$
8. Sigma notation
   a) Repeat questions 6 and 7 using appropriate technology such as a graphing calculator or a spreadsheet.
   b) Explain the method that you chose.
9. Sampling (Chapter 2) Briefly explain each of the following terms.
   a) simple random sample
   b) systematic sample
   c) outlier
10. Bias (Chapter 2)
    a) Explain the term measurement bias.
    b) Give an example of a survey method containing unintentional measurement bias.
    c) Give an example of a survey method containing intentional measurement bias.
    d) Give an example of sampling bias.
158 MHR • Statistics of Two Variables
  • 159. 3.1 Scatter Plots and Linear Correlation
Does smoking cause lung cancer? Is job performance related to marks in high school? Do pollution levels affect the ozone layer in the atmosphere? Often the answers to such questions are not clear-cut, and inferences have to be made from large sets of data. Two-variable statistics provide methods for detecting relationships between variables and for developing mathematical models of these relationships. The visual pattern in a graph or plot can often reveal the nature of the relationship between two variables.

INVESTIGATE & INQUIRE: Visualizing Relationships Between Variables
A study examines two new obedience-training methods for dogs. The dogs were randomly selected to receive from 5 to 16 h of training in one of the two training programs. The dogs were assessed using a performance test graded out of 20.

Rogers Method        Laing System
Hours   Score        Hours   Score
10      12           8       10
15      16           6       9
7       10           15      12
12      15           16      7
8       9            9       11
5       8            11      7
8       11           10      9
16      19           10      6
10      14           8       15

1. Could you determine which of the two training systems is more effective by comparing the mean scores? Could you calculate another statistic that would give a better comparison? Explain your reasoning.
2. Consider how you could plot the data for the Rogers Method. What do you think would be the best method? Explain why.
3. Use this method to plot the data for the Rogers Method. Describe any patterns you see in the plotted data.
4. Use the same method to plot the data for the Laing System and describe any patterns you see.
5. Based on your data plots, which training method do you think is more effective? Explain your answer.
3.1 Scatter Plots and Linear Correlation • MHR 159
  • 160. 6. Did your plotting method make it easy to compare the two sets of data? Are there ways you could improve your method?
7. a) Suggest factors that could influence the test scores but have not been taken into account.
   b) How could these factors affect the validity of conclusions drawn from the data provided?

In data analysis, you are often trying to discern whether one variable, the dependent (or response) variable, is affected by another variable, the independent (or explanatory) variable. Variables have a linear correlation if changes in one variable tend to be proportional to changes in the other. Variables X and Y have a perfect positive (or direct) linear correlation if Y increases at a constant rate as X increases. Similarly, X and Y have a perfect negative (or inverse) linear correlation if Y decreases at a constant rate as X increases.
A scatter plot shows such relationships graphically, usually with the independent variable as the horizontal axis and the dependent variable as the vertical axis. The line of best fit is the straight line that passes as close as possible to all of the points on a scatter plot. The stronger the correlation, the more closely the data points cluster around the line of best fit.

Example 1 Classifying Linear Correlations
Classify the relationship between the variables X and Y for the data shown in the following diagrams.
[Six scatter plots, labelled a) to f), each with x on the horizontal axis and y on the vertical axis]
  • 161. Solution
a) The data points are clustered around a line that rises to the right (positive slope), indicating definitely that Y increases as X increases. Although the points are not perfectly lined up, there is a strong positive linear correlation between X and Y.
b) The data points are all exactly on a line that slopes down to the right, so Y decreases as X increases. In fact, the changes in Y are exactly proportional to the changes in X. There is a perfect negative linear correlation between X and Y.
c) No discernible linear pattern exists. As X increases, Y appears to change randomly. Therefore, there is zero linear correlation between X and Y.
d) A definite positive trend exists, but it is not as clear as the one in part a). Here, X and Y have a moderate positive linear correlation.
e) A slight positive trend exists. X and Y have a weak positive linear correlation.
f) A definite negative trend exists, but it is hard to classify at a glance. Here, X and Y have a moderate or strong negative linear correlation.

As Example 1 shows, a scatter plot often can give only a rough indication of the correlation between two variables. Obviously, it would be useful to have a more precise way to measure correlation. Karl Pearson (1857-1936) developed a formula for estimating such a measure. Pearson, who also invented the term standard deviation, was a key figure in the development of modern statistics.

The Correlation Coefficient
To develop a measure of correlation, mathematicians first defined the covariance of two variables in a sample:
$$s_{XY} = \frac{1}{n-1}\sum (x - \bar{x})(y - \bar{y})$$
where n is the size of the sample, x represents individual values of the variable X, y represents individual values of the variable Y, $\bar{x}$ is the mean of X, and $\bar{y}$ is the mean of Y. Recall from Chapter 2 that the symbol $\sum$ means “the sum of.” Thus, the covariance is the sum of the products of the deviations of x and y for all the data points divided by n − 1. The covariance depends on how the deviations of the two variables are related. For example, the covariance will have a large positive value if both $x - \bar{x}$ and $y - \bar{y}$ tend to be large at the same time, and a negative value if one tends to be positive when the other is negative.
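The covariance formula can be tried out with a short program. The following Python sketch is an illustration only and is not part of the original text: the function name `sample_covariance` and the two small data sets are made up for this example. It shows the covariance coming out positive when the deviations tend to have the same sign and negative when they tend to have opposite signs.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of the products of the deviations, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar = sum(xs) / n  # mean of X
    y_bar = sum(ys) / n  # mean of Y
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Hypothetical data, chosen only to illustrate the sign of the covariance.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 5]    # y tends to rise as x rises
ys_down = [5, 4, 5, 4, 2]  # y tends to fall as x rises

print(sample_covariance(xs, ys_up))    # 1.5
print(sample_covariance(xs, ys_down))  # -1.5
```

Note that the size of the covariance depends on the units of the data, which is one reason a standardized measure is more convenient for comparing correlations.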
  • 162. The correlation coefficient, r, is the covariance divided by the product of the standard deviations for X and Y:
$$r = \frac{s_{XY}}{s_X \times s_Y}$$
where $s_X$ is the standard deviation of X and $s_Y$ is the standard deviation of Y. This coefficient gives a quantitative measure of the strength of a linear correlation. In other words, the correlation coefficient indicates how closely the data points cluster around the line of best fit. The correlation coefficient is also called the Pearson product-moment coefficient of correlation (PPMC) or Pearson’s r.

The correlation coefficient always has values in the range from −1 to 1. Consider a perfect positive linear correlation first. For such correlations, changes in the dependent variable Y are directly proportional to changes in the independent variable X, so Y = aX + b, where a is a positive constant. It follows that
$$
\begin{aligned}
s_{XY} &= \frac{1}{n-1}\sum (x - \bar{x})(y - \bar{y}) \\
&= \frac{1}{n-1}\sum (x - \bar{x})[(ax + b) - (a\bar{x} + b)] \\
&= \frac{1}{n-1}\sum (x - \bar{x})(ax - a\bar{x}) \\
&= \frac{1}{n-1}\sum a(x - \bar{x})^2 \\
&= a\,\frac{\sum (x - \bar{x})^2}{n-1} \\
&= a s_X^2
\end{aligned}
$$
and
$$
\begin{aligned}
s_Y &= \sqrt{\frac{\sum (y - \bar{y})^2}{n-1}} \\
&= \sqrt{\frac{\sum [(ax + b) - (a\bar{x} + b)]^2}{n-1}} \\
&= \sqrt{\frac{\sum (ax - a\bar{x})^2}{n-1}} \\
&= \sqrt{\frac{a^2 \sum (x - \bar{x})^2}{n-1}} \\
&= a\sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} \\
&= a s_X
\end{aligned}
$$
Substituting into the equation for the correlation coefficient gives
$$
\begin{aligned}
r &= \frac{s_{XY}}{s_X s_Y} \\
&= \frac{a s_X^2}{s_X (a s_X)} \\
&= 1
\end{aligned}
$$
[Scatter plot: Y versus X with all points on a line that rises to the right, r = 1]
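The result r = 1 for data of the form Y = aX + b can also be checked numerically. The sketch below is a hedged illustration, not part of the original text: the function name `correlation`, the x-values, and the constants a = 3 and b = 4 are arbitrary choices made for this example. It computes r as the covariance divided by the product of the two standard deviations.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r = s_XY / (s_X * s_Y), using n - 1 in each statistic."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical x-values and constants, chosen only for illustration.
xs = [2, 5, 7, 11, 12]
a, b = 3, 4

r_pos = correlation(xs, [a * x + b for x in xs])   # points on a rising line
r_neg = correlation(xs, [-a * x + b for x in xs])  # points on a falling line

# Rounding hides tiny floating-point errors in the last decimal places.
print(round(r_pos, 6))  # 1.0
print(round(r_neg, 6))  # -1.0
```

Changing a or b (with a not equal to zero) should leave the magnitude of r at 1, since both constants cancel in the derivation.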
  • 163. Similarly, r = −1 for a perfect negative linear correlation.
For two variables with no correlation, Y is equally likely to increase or decrease as X increases. The terms in $\sum (x - \bar{x})(y - \bar{y})$ are randomly positive or negative and tend to cancel each other. Therefore, the correlation coefficient is close to zero if there is little or no correlation between the variables. For moderate linear correlations, the summation terms partially cancel out.
[Scatter plots: Y versus X for r = 0 and for r = −0.5]

The following diagram illustrates how the correlation coefficient corresponds to the strength of a linear correlation.
[Diagram: scale of the correlation coefficient, r, from −1 to 1. Negative linear correlation: perfect at −1, strong from −1 to −0.67, moderate from −0.67 to −0.33, weak from −0.33 to 0. Positive linear correlation: weak from 0 to 0.33, moderate from 0.33 to 0.67, strong from 0.67 to 1, perfect at 1.]

Using algebraic manipulation and the fact that $\sum x = n\bar{x}$, Pearson showed that
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$
where n is the number of data points in the sample, x represents individual values of the variable X, and y represents individual values of the variable Y. (Note that $\sum x^2$ is the sum of the squares of all the individual values of X, while $(\sum x)^2$ is the square of the sum of all the individual values.)
Like the alternative formula for standard deviations (page 150), this formula for r avoids having to calculate all the deviations individually. Many scientific and statistical calculators have built-in functions for calculating the correlation coefficient.
It is important to be aware that increasing the number of data points used in determining a correlation improves the accuracy of the mathematical model. Some of the examples and exercise questions have a fairly small set of data in order to simplify the computations. Larger data sets can be found in the e-book that accompanies this text.
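Pearson’s computational formula translates directly into a few lines of code. The Python sketch below is offered as an illustration under stated assumptions, not as part of the original text: the names `pearson_r` and `classify` and the sample data are hypothetical, and the treatment of the boundary values 0.33 and 0.67 in `classify` is an assumption, since the diagram does not say which category a boundary value belongs to.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient from the sums of x, y, x^2, y^2, and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def classify(r):
    """Describe r using the scale in the diagram (boundary handling is assumed)."""
    size = abs(r)
    if size == 0:
        return "zero"
    direction = "positive" if r > 0 else "negative"
    if size < 0.33:
        strength = "weak"
    elif size < 0.67:
        strength = "moderate"
    elif size < 1:
        strength = "strong"
    else:
        strength = "perfect"
    return f"{strength} {direction}"


# Hypothetical data, used only to exercise the two functions.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 3))  # 0.775
print(classify(r))  # strong positive
```

Because only running totals are needed, this version never computes the individual deviations, which is the same advantage the text notes for hand calculation.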
  • 164. Example 2 Applying the Correlation Coefficient Formula
A farmer wants to determine whether there is a relationship between the mean temperature during the growing season and the size of his wheat crop. He assembles the following data for the last six crops.

Mean Temperature (°C)    Yield (tonnes/hectare)
4                        1.6
8                        2.4
10                       2.0
9                        2.6
11                       2.1
6                        2.2

a) Does a scatter plot of these data indicate any linear correlation between the two variables?
b) Compute the correlation coefficient.
c) What can the farmer conclude about the relationship between the mean temperatures during the growing season and the wheat yields on his farm?

Solution
a) The farmer wants to know whether the crop yield depends on temperature. Here, temperature is the independent variable, X, and crop yield is the dependent variable, Y. The scatter plot has a somewhat positive trend, so there appears to be a moderate positive linear correlation.
[Scatter plot: Yield (T/ha) versus Mean Temperature (°C)]
b) To compute r, set up a table to calculate the quantities required by the formula.

Temperature, x    Yield, y           $x^2$               $y^2$                 xy
4                 1.6                16                  2.56                  6.4
8                 2.4                64                  5.76                  19.2
10                2.0                100                 4.00                  20.0
9                 2.6                81                  6.76                  23.4
11                2.1                121                 4.41                  23.1
6                 2.2                36                  4.84                  13.2
$\sum x = 48$     $\sum y = 12.9$    $\sum x^2 = 418$    $\sum y^2 = 28.33$    $\sum xy = 105.3$
  • 165. Now compute r, using the formula:
$$
\begin{aligned}
r &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \\
&= \frac{6(105.3) - (48)(12.9)}{\sqrt{[6(418) - (48)^2][6(28.33) - (12.9)^2]}} \\
&= \frac{631.8 - 619.2}{\sqrt{(2508 - 2304)(169.98 - 166.41)}} \\
&= \frac{12.6}{26.99} \\
&= 0.467
\end{aligned}
$$
The correlation coefficient for crop yield versus mean temperature is approximately 0.47, which confirms a moderate positive linear correlation.
c) It appears that the crop yield tends to increase somewhat as the mean temperature for the growing season increases. However, the farmer cannot conclude that higher temperatures cause greater crop yields. Other variables could account for the correlation. For example, the lower temperatures could be associated with heavy rains, which could lower yields by flooding fields or leaching nutrients from the soil.

Data in Action: From 1992 to 2001, Canada produced an average of 27 million tonnes of wheat a year. About 70% of this crop was exported.

The important principle that a correlation does not prove the existence of a cause-and-effect relationship between two variables is discussed further in section 3.4.

Example 3 Using Technology to Determine Correlation Coefficients
Determine whether there is a linear correlation between horsepower and fuel consumption for these five vehicles by creating a scatter plot and calculating the correlation coefficient.

Vehicle                        Horsepower, x    Fuel Consumption (L/100 km), y
Midsize sedan                  105              6.7
Minivan                        170              23.5
Small sports utility vehicle   124              5.9
Midsize motorcycle             17               3.4
Luxury sports car              296              8.4

See Appendix B for more details on the graphing calculator and software functions used in this section.

Solution 1 Using a Graphing Calculator
Use the ClrList command to make sure lists L1 and L2 are clear, then enter the horsepower data in L1 and the fuel consumption figures in L2.
To display a scatter plot, first make sure that all functions in the Y= editor are either clear or turned off. Then, use STAT PLOT to select PLOT1.
  • 166. Turn the plot on, select the scatter-plot icon, and enter L1 for XLIST and L2 for YLIST. (Some of these settings may already be in place.) From the ZOOM menu, select 9:ZoomStat. The calculator will automatically optimize the window settings and display the scatter plot. To calculate the correlation coefficient, from the CATALOG menu, select DiagnosticOn, then select the LinReg(ax+b) instruction from the STAT CALC menu. The calculator will perform a series of statistical calculations using the data in lists L1 and L2. The last line on the screen shows that the correlation coefficient is approximately 0.353. Therefore, there is a moderate linear correlation between horsepower and fuel consumption for the five vehicles. Solution 2 Using a Spreadsheet Set up three columns and enter the data from the table above. Highlight the numerical data and use your spreadsheet’s Chart feature to display a scatter plot. Both Corel® Quattro® Pro and Microsoft® Excel have a CORREL function that allows you to calculate the correlation coefficient easily. The scatter plot and correlation coefficient indicate a moderate correlation between horsepower and fuel consumption. Solution 3 Using Fathom™ Create a new collection by setting up a case table with three attributes: Vehicle, Hp, and FuelUse. Enter the data for the five cases. To create a scatter plot, drag the graph icon onto the work area and drop the Hp attribute on the x-axis and the FuelUse attribute on the y-axis. 166 MHR • Statistics of Two Variables
  • 167. To calculate the correlation coefficient, right-click on the collection and select Inspect Collection. Select the Measures tab and name a new measure PPMC. Right-click this measure and select Edit Formula, then Functions/Statistical/Two Attributes/correlation. When you enter the Hp and FuelUse attributes in the correlation function, Fathom™ will calculate the correlation coefficient for these data.
Again, the scatter plot and correlation coefficient show a moderate linear correlation.

Notice that the scatter plots in Example 3 have an outlier at (170, 23.5). Without this data point, you would have a strong positive linear correlation. Section 3.2 examines the effect of outliers in more detail.

Project Prep: For your statistics project, you may be investigating the linear correlation between two variables. A graphing calculator or computer software may be a valuable aid for this analysis.

Key Concepts
• Statistical studies often find linear correlations between two variables.
• A scatter plot can often reveal the relationship between two variables. The independent variable is usually plotted on the horizontal axis and the dependent variable on the vertical axis.
• Two variables have a linear correlation if changes in one variable tend to be proportional to changes in the other. Linear correlations can be positive or negative and vary in strength from zero to perfect.
• The correlation coefficient, r, is a quantitative measure of the correlation between two variables. Negative values indicate negative correlations while positive values indicate positive correlations. The greater the absolute value of r, the stronger the linear correlation, with zero indicating no correlation at all and 1 indicating a perfect correlation.
• Manual calculations of correlation coefficients can be quite tedious, but a variety of powerful technology tools are available for such calculations.
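The effect of the outlier in Example 3 can be seen by computing r twice, once with all five vehicles and once without the minivan. The following Python sketch is one possible way to do this and is not one of the technologies described in the text: the function name `pearson_r` and the variable names are chosen for this illustration, and the data are taken from the table in Example 3.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient from the sums of x, y, x^2, y^2, and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# (vehicle, horsepower, fuel consumption in L/100 km) from Example 3
vehicles = [
    ("Midsize sedan", 105, 6.7),
    ("Minivan", 170, 23.5),
    ("Small sports utility vehicle", 124, 5.9),
    ("Midsize motorcycle", 17, 3.4),
    ("Luxury sports car", 296, 8.4),
]

# All five vehicles
hp = [v[1] for v in vehicles]
fuel = [v[2] for v in vehicles]
r_all = pearson_r(hp, fuel)

# Same calculation with the outlier at (170, 23.5) left out
kept = [v for v in vehicles if v[0] != "Minivan"]
r_without = pearson_r([v[1] for v in kept], [v[2] for v in kept])

print(round(r_all, 3))      # 0.353
print(round(r_without, 3))  # 0.924
```

With only four or five data points, a single unusual value can move r from one category to another, which is why the text recommends larger data sets where possible.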
  • 168. Communicate Your Understanding
1. Describe the advantages and disadvantages of using a scatter plot or the correlation coefficient to estimate the strength of a linear correlation.
2. a) What is the meaning of a correlation coefficient of
      i) −1?
      ii) 0?
      iii) 0.5?
   b) Can the correlation coefficient have a value greater than 1? Why or why not?
3. A mathematics class finds a correlation coefficient of 0.25 for the students’ midterm marks and their driver’s test scores and a coefficient of −0.72 for their weight-height ratios and times in a 1-km run. Which of these two correlations is stronger? Explain your answer.

Practise
A
1. Classify the type of linear correlation that you would expect with the following pairs of variables.
   a) hours of study, examination score
   b) speed in excess of the speed limit, amount charged on a traffic fine
   c) hours of television watched per week, final mark in calculus
   d) a person’s height, sum of the digits in the person’s telephone number
   e) a person’s height, the person’s strength
2. Identify the independent variable and the dependent variable in a correlational study of
   a) heart disease and cholesterol level
   b) hours of basketball practice and free-throw success rate
   c) amount of fertilizer used and height of plant
   d) income and level of education
   e) running speed and pulse rate

Apply, Solve, Communicate
B
3. For a week prior to their final physics examination, a group of friends collect data to see whether time spent studying or time spent watching TV had a stronger correlation with their marks on the examination.
   Hours Studied    Hours Watching TV    Examination Score
   10               8                    72
   11               7                    67
   15               4                    81
   14               3                    93
   8                9                    54
   5                10                   66
   a) Create a scatter plot of hours studied versus examination score. Classify the linear correlation.
   b) Create a similar scatter plot for the hours spent watching TV.
   c) Which independent variable has a stronger correlation with the examination scores? Explain.
  • 169. d) Calculate the correlation coefficient for hours studied versus examination score and for hours watching TV versus examination score. Do these answers support your answer to c)? Explain.
4. Application Refer to the tables in the investigation on page 159.
   a) Determine the correlation coefficient and classify the linear correlation for the data for each training method.
   b) Suppose that you interchanged the dependent and independent variables, so that the test scores appear on the horizontal axis of a scatter plot and the hours of training appear on the vertical axis. Predict the effect this change will have on the scatter plot and the correlation coefficient for each set of data.
   c) Test your predictions by plotting the data and calculating the correlation coefficients with the variables reversed. Explain any differences between your results and your predictions in part b).
5. A company studied whether there was a relationship between its employees’ years of service and number of days absent. The data for eight randomly selected employees are shown below.
   Employee    Years of Service    Days Absent Last Year
   Jim         5                   2
   Leah        2                   6
   Efraim      7                   3
   Dawn        6                   3
   Chris       4                   4
   Cheyenne    8                   0
   Karrie      1                   2
   Luke        10                  1
   a) Create a scatter plot for these data and classify the linear correlation.
   b) Calculate the correlation coefficient.
   c) Does the computed r-value agree with the classification you made in part a)? Explain why or why not.
   d) Identify any outliers in the data.
   e) Suggest possible reasons for any outliers identified in part d).
6. Application Six classmates compared their arm spans and their scores on a recent mathematics test as shown in the following table.
   Arm Span (m)    Score
   1.5             82
   1.4             71
   1.7             75
   1.6             66
   1.6             90
   1.8             73
   a) Illustrate these data with a scatter plot.
   b) Determine the correlation coefficient and classify the linear correlation.
   c) What can the students conclude from their data?
7. Chapter Problem
   a) Use data in the table on page 157 to create a scatter plot that compares the size of graduating classes in Gina’s program to the number of graduates who found jobs.
   b) Classify the linear correlation.
   c) Determine the linear correlation coefficient.
8. a) Search sources such as E-STAT, CANSIM II, the Internet, newspapers, and magazines for pairs of variables that exhibit
      i) a strong positive linear correlation
      ii) a strong negative linear correlation
      iii) a weak or zero linear correlation
   b) For each pair of variables in part a), identify the independent variable and the dependent variable.
  • 170. 9. Find a set of data for two variables known to have a perfect positive linear correlation. Use these data to demonstrate that the correlation coefficient for such variables is 1. Alternatively, find a set of data with a perfect negative correlation and show that the correlation coefficient is −1.
10. Communication
 a) Would you expect to see a correlation between the temperature at an outdoor track and the number of people using the track? Why or why not?
 b) Sketch a typical scatter plot of this type of data.
 c) Explain the key features of your scatter plot.
11. Inquiry/Problem Solving Refer to data tables in the investigation on page 159.
 a) How could the Rogers Training Company graph the data so that their training method looks particularly good?
 b) How could Laing Limited present the same data in a way that favours their training system?
 c) How could a mathematically knowledgeable consumer detect the distortions in how the two companies present the data?

C
12. Inquiry/Problem Solving
 a) Prove that interchanging the independent and dependent variables does not change the correlation coefficient for any set of data.
 b) Illustrate your proof with calculations using a set of data selected from one of the examples or exercise questions in this section.
13. a) Search sources such as newspapers, magazines, and the Internet for a set of two-variable data with
  i) a moderate positive linear correlation
  ii) a moderate negative correlation
  iii) a correlation in which |r| > 0.9
 b) Outline any conclusions that you can make from each set of data. Are there any assumptions inherent in these conclusions? Explain.
 c) Pose at least two questions that could form the basis for further research.
14. a) Sketch scatter plots of three different patterns of data that you think would have zero linear correlation.
 b) Explain why r would equal zero for each of these patterns.
 c) Use Fathom™ or a spreadsheet to create a scatter plot that looks like one of your patterns and calculate the correlation coefficient. Adjust the data points to get r as close to zero as you can.
  • 171. 3.2 Linear Regression
Regression is an analytic technique for determining the relationship between a dependent variable and an independent variable. When the two variables have a linear correlation, you can develop a simple mathematical model of the relationship between the two variables by finding a line of best fit. You can then use the equation for this line to make predictions by interpolation (estimating between data points) and extrapolation (estimating beyond the range of the data).

INVESTIGATE & INQUIRE: Modelling a Linear Relationship
A university would like to construct a mathematical model to predict first-year marks for incoming students based on their achievement in grade 12. A comparison of these marks for a random sample of first-year students is shown below.

Grade 12 Average     85  90  76  78  88  84  76  96  86  85
First-Year Average   74  83  68  70  75  72  64  91  78  86

1. a) Construct a scatter plot for these data. Which variable should be placed on the vertical axis? Explain.
 b) Classify the linear correlation for this data, based on the scatter plot.
2. a) Estimate and draw a line of best fit for the data.
 b) Measure the slope and y-intercept for this line, and write an equation for it in the form y = mx + b.
3. Use this linear model to predict
 a) the first-year average for a student who had an 82 average in grade 12
 b) the grade-12 average for a student with a first-year average of 60
4. a) Use software or the linear regression instruction of a graphing calculator to find the slope and y-intercept for the line of best fit. (Note that most graphing calculators use a instead of m to represent slope.)
 b) Are this slope and y-intercept close to the ones you measured in question 2? Why or why not?
  • 172. c) Estimate how much the new values for slope and y-intercept will change your predictions in question 3. Check your estimate by recalculating your predictions using the new values and explain any discrepancies.
5. List the factors that could affect the accuracy of these mathematical models. Which factor do you think is most critical? How could you test how much effect this factor could have?

It is fairly easy to “eyeball” a good estimate of the line of best fit on a scatter plot when the linear correlation is strong. However, an analytic method using a least-squares fit gives more accurate results, especially for weak correlations.

Consider the line of best fit in the following scatter plot. A dashed blue line shows the residual or vertical deviation of each data point from the line of best fit. The residual is the difference between the values of y at the data point and at the point that lies on the line of best fit and has the same x-coordinate as the data point. Notice that the residuals are positive for points above the line and negative for points below the line. The boxes show the squares of the residuals.

[Figure: scatter plot on x- and y-axes with a line of best fit, dashed residuals, and boxes showing the squares of the residuals]

For the line of best fit in the least-squares method,
• the sum of the residuals is zero (the positive and negative residuals cancel out)
• the sum of the squares of the residuals has the least possible value

Although the algebra is daunting, it can be shown that this line has the equation y = ax + b, where

$$a = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \quad \text{and} \quad b = \bar{y} - a\bar{x}$$

Recall from Chapter 2 that $\bar{x}$ is the mean of x and $\bar{y}$ is the mean of y. Many statistics texts use an equation with the form y = a + bx, so you may sometimes see the equations for a and b reversed.
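The formulas for a and b can be tried out in a few lines of code. The following is a minimal Python sketch, not part of the text's calculator or spreadsheet methods. The function names and the four sample points are illustrative assumptions chosen only to keep the arithmetic small. It computes a and b from the sums, then checks the two least-squares properties: the residuals sum to zero, and nudging the slope or intercept only increases the sum of the squared residuals.

```python
# Minimal sketch of the least-squares formulas for the line y = ax + b.
# Function names and the sample points are illustrative, not from the text.

def least_squares_line(xs, ys):
    """Return (a, b) for y = ax + b using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - a * sum_x / n  # b = (mean of y) - a * (mean of x)
    return a, b


def residuals(xs, ys, a, b):
    """Vertical deviation of each data point from the line y = ax + b."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


def sum_of_squares(values):
    return sum(v * v for v in values)


# Illustrative points only.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]

a, b = least_squares_line(xs, ys)
res = residuals(xs, ys, a, b)

print("slope a =", a, " intercept b =", b)
print("sum of residuals =", sum(res))
print("sum of squared residuals =", sum_of_squares(res))

# Nudging a or b away from the least-squares values should only
# increase the sum of the squared residuals.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    worse = sum_of_squares(residuals(xs, ys, a + da, b + db))
    print("da = %+.1f, db = %+.1f -> sum of squares = %.4f" % (da, db, worse))
```

Because the function works directly from the sums of x, y, xy, and x², the same four totals that appear in a hand calculation can be printed and compared at each step.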
  • 173. Example 1 Applying the Least-Squares Formula
This table shows data for the full-time employees of a small company.

Age (years)   Annual Income ($000)
33            33
25            31
19            18
44            52
50            56
54            60
38            44
29            35

a) Use a scatter plot to classify the correlation between age and income.
b) Find the equation of the line of best fit analytically.
c) Predict the income for a new employee who is 21 and an employee retiring at age 65.

Solution
a) The scatter plot suggests a strong positive linear correlation between age and income level.

[Figure: scatter plot of Income (vertical axis, 15 to 65) versus Age (horizontal axis, 15 to 55)]

b) To determine the equation of the line of best fit, organize the data into a table and compute the sums required for the formula.

Age, x     Income, y   $x^2$            xy
33         33          1089             1089
25         31          625              775
19         18          361              342
44         52          1936             2288
50         56          2500             2800
54         60          2916             3240
38         44          1444             1672
29         35          841              1015
∑x = 292   ∑y = 329    ∑$x^2$ = 11 712  ∑xy = 13 221

Substitute these totals into the formula for a.

$$a = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{8(13\,221) - (292)(329)}{8(11\,712) - (292)^2} = \frac{9700}{8432} \doteq 1.15$$

  • 174. To determine b, you also need the means of x and y.

$$\bar{x} = \frac{\sum x}{n} = \frac{292}{8} = 36.5 \qquad \bar{y} = \frac{\sum y}{n} = \frac{329}{8} = 41.125$$

$$b = \bar{y} - a\bar{x} = 41.125 - 1.15(36.5) = -0.85$$

Now, substitute the values of a and b into the equation for the line of best fit.
y = ax + b = 1.15x − 0.85
Therefore, the equation of the line of best fit is y = 1.15x − 0.85.

c) Use the equation of the line of best fit as a model.
For a 21-year-old employee, y = ax + b = 1.15(21) − 0.85 = 23.3
For a 65-year-old employee, y = ax + b = 1.15(65) − 0.85 = 73.9
Therefore, you would expect the new employee to have an income of about $23 300 and the retiring employee to have an income of about $73 900. Note that the second estimate is an extrapolation beyond the range of the data, so it could be less accurate than the first estimate, which is interpolated between two data points.

Note that the slope a indicates only how y varies with x on the line of best fit. The slope does not tell you anything about the strength of the correlation between the two variables. It is quite possible to have a weak correlation with a large slope or a strong correlation with a small slope.

Example 2 Linear Regression Using Technology
Researchers monitoring the numbers of wolves and rabbits in a wildlife reserve think that the wolf population depends on the rabbit population since wolves prey on rabbits. Over the years, the researchers collected the following data.

Year                1994  1995  1996  1997  1998  1999  2000  2001
Rabbit Population   61    72    78    76    65    54    39    43
Wolf Population     26    33    42    49    37    30    24    19

a) Determine the line of best fit and the correlation coefficient for these data.
b) Graph the data and the line of best fit. Do these data support the researchers’ theory?
  • 175. Solution 1 Using a Graphing Calculator
a) You can use the calculator’s linear regression instruction to find both the line of best fit and the correlation coefficient. Since the theory is that the wolf population depends on the rabbit population, the rabbit population is the independent variable and the wolf population is the dependent variable. Use the STAT EDIT menu to enter the rabbit data into list L1 and the wolf data into L2. Set DiagnosticOn, and then use the STAT CALC menu to select LinReg(ax+b). The equation of the line of best fit is y = 0.58x − 3.1 and the correlation coefficient is 0.87.
b) Store the equation for the line of best fit as a function, Y1. Then, use the STAT PLOT menu to set up the scatter plot. By displaying both Y1 and the scatter plot, you can see how closely the data plots are distributed around the line of best fit. The correlation coefficient and the scatter plot show a strong positive linear correlation between the variables. This correlation supports the researchers’ theory, but does not prove that changes in the rabbit population are the cause of the changes in the wolf population.

Solution 2 Using a Spreadsheet
Set up a table with the data for the rabbit and wolf populations. You can calculate the correlation coefficient with the CORREL function. Use the Chart feature to create a scatter plot.
In Corel® Quattro® Pro, you can find the equation of the line of best fit by selecting Tools/Numeric Tools/Regression. Enter the cell ranges for the data, and the program will display regression calculations including the constant (b), the x-coefficient (or slope, a), and $r^2$.
  • 176. In Microsoft® Excel, you can find the equation of the line of best fit by selecting Chart/Add Trendline. Check that the default setting is Linear. Select the straight line that appears on your chart, then click Format/Selected Trendline/Options. Check the Display equation on chart box. You can also display $r^2$.

Solution 3 Using Fathom™
Drag a new case table to the workspace, create attributes for Year, Rabbits, and Wolves, and enter the data. Drag a new graph to the workspace, then drag the Rabbits attribute to the x-axis and the Wolves attribute to the y-axis. From the Graph menu, select Least Squares Line. Fathom™ will display $r^2$ and the equation for the line of best fit. To calculate the correlation coefficient directly, select Inspect Collection, click the Measures tab, then create a new measure by selecting Functions/Statistical/Two Attributes/correlation and entering Rabbits and Wolves as the attributes.

Project Prep
When analysing two-variable data for your statistics project, you may wish to develop a linear model, particularly if a strong linear correlation is evident.
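As a cross-check on the three technology solutions, the regression for Example 2 can also be computed in a general-purpose language. The sketch below is written in Python, which the text itself does not use. It works with deviations from the means, which is algebraically equivalent to the summation formula for a, and the function names are illustrative. The printed values should agree with the calculator results y = 0.58x − 3.1 and r = 0.87 after rounding.

```python
from math import sqrt

# Data from Example 2: rabbits are the independent variable (x),
# wolves the dependent variable (y).
rabbits = [61, 72, 78, 76, 65, 54, 39, 43]
wolves = [26, 33, 42, 49, 37, 30, 24, 19]


def mean(values):
    return sum(values) / len(values)


def linear_regression(xs, ys):
    """Return slope a, intercept b, and correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sums of products and squares of the deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    r = s_xy / sqrt(s_xx * s_yy)
    return a, b, r


a, b, r = linear_regression(rabbits, wolves)
print("slope a = %.2f, intercept b = %.1f, r = %.2f" % (a, b, r))
```

Swapping the two lists in the call to linear_regression gives a quick way to explore what happens when the independent and dependent variables are interchanged.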
  • 177. In Example 2, the sample size is small, so you should be cautious about making generalizations from it. Small samples have a greater chance of not being representative of the whole population. Also, outliers can seriously affect the results of a regression on a small sample.

Example 3 The Effect of Outliers
To evaluate the performance of one of its instructors, a driving school tabulates the number of hours of instruction and the driving-test scores for the instructor’s students.

Instructional Hours   10  15  21  6   18  20  12
Student’s Score       78  85  96  75  84  45  82

a) What assumption is the management of the driving school making? Is this assumption reasonable?
b) Analyse these data to determine whether they suggest that the instructor is an effective teacher.
c) Comment on any data that seem unusual.
d) Determine the effect of any outliers on your analysis.

Solution
a) The management of the driving school is assuming that the correlation between instructional hours and test scores is an indication of the instructor’s teaching skills. Such a relationship could be difficult to prove definitively. However, the assumption would be reasonable if the driving school has found that some instructors have consistently strong correlations between the time spent with their students and the students’ test scores while other instructors have consistently weaker correlations.
b) The number of hours of instruction is the independent variable. You could analyse the data using any of the methods in the two previous examples. For simplicity, a spreadsheet solution is shown here. Except for an obvious outlier at (20, 45), the scatter plot below indicates a strong positive linear correlation. At first glance, it appears that the number of instructional hours is positively correlated to the students’ test scores. However, the linear regression analysis yields a line of best fit with the equation y = −0.13x + 80 and a correlation coefficient of −0.05. These results indicate that there is virtually a zero linear correlation, and the line of best fit even has a negative slope! The outlier has a dramatic impact on the regression results because it is distant from the other data points and the sample size is quite small. Although the scatter plot looked
  • 178. favourable, the regression analysis suggests that the instructor’s lessons had no positive effect on the students’ test results. c) The fact that the outlier is substantially below all the other data points suggests that some special circumstance may have caused an abnormal result. For instance, there might have been an illness or emotional upset that affected this one student’s performance on the driving test. In that case, it would be reasonable to exclude this data point when evaluating the driving instructor. d) Remove the outlier from your data table and repeat your analysis. Notice that the line of best fit is now much closer to the data points and has a positive slope. The correlation coefficient, r, is 0.93, indicating a strong positive linear correlation between the number of instructional hours and the driver’s test scores. This result suggests that the instructor may be an effective teacher after all. It is quite possible that the original analysis was not a fair evaluation. However, to do a proper evaluation, you would need a larger set of data, more information about the outlier, or, ideally, both. 178 MHR • Statistics of Two Variables
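The effect of the outlier in Example 3 can be reproduced numerically. The following Python sketch is an illustration rather than the spreadsheet method shown in the solution, and the function name fit is an assumption. It applies the summation formulas for the slope, the intercept, and Pearson's r to the full data set and then to the data with the point (20, 45) removed.

```python
from math import sqrt

# Data from Example 3.
hours = [10, 15, 21, 6, 18, 20, 12]
scores = [78, 85, 96, 75, 84, 45, 82]


def fit(points):
    """Least-squares slope a, intercept b, and r for (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sx2 = sum(x * x for x, _ in points)
    sy2 = sum(y * y for _, y in points)
    a = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    b = sy / n - a * sx / n
    r = (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return a, b, r


all_points = list(zip(hours, scores))
# Exclude the suspected outlier at (20, 45).
without_outlier = [p for p in all_points if p != (20, 45)]

for label, pts in [("all data", all_points), ("outlier removed", without_outlier)]:
    a, b, r = fit(pts)
    print("%-16s a = %6.2f  b = %6.1f  r = %5.2f" % (label, a, b, r))
```

Running the fit both ways makes the comparison in parts b) and d) explicit: a single distant point in a sample of seven is enough to change the sign of the slope.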
  • 179. As Example 3 demonstrates, outliers can skew a regression analysis, but they could also simply indicate that the data really do have large variations. A comprehensive analysis of a set of data should look for outliers, examine their possible causes and their effect on the analysis, and discuss whether they should be excluded from the calculations. As you observed in Chapter 2, outliers have less effect on larger samples.

Project Prep
If your statistics project involves a linear relationship that contains outliers, you will need to consider carefully their impact on your results, and how you will deal with them.

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links to learn more about linear regression. Describe an application of linear regression that interests you.

Key Concepts
• Linear regression provides a means for analytically determining a line of best fit. In the least-squares method, the line of best fit is the line which minimizes the sum of the squares of the residuals while having the sum of the residuals equal zero.
• You can use the equation of the line of best fit to predict the value of one of the two variables given the value of the other variable.
• The correlation coefficient is a measure of how well a regression line fits a set of data.
• Outliers and small sample sizes can reduce the accuracy of a linear model.

Communicate Your Understanding
1. What does the correlation coefficient reveal about the line of best fit generated by a linear regression?
2. Will the correlation coefficient always be negative when the slope of the line of best fit is negative? Explain your reasoning.
3. Describe the problem that outliers present for a regression analysis and outline what you could do to resolve this problem.
  • 180. Practise
A
1. Identify any outliers in the following sets of data and explain your choices.
 a) X   25   34   43   55   92   105   16
    Y   30   41   52   66   18   120   21
 b) X   5     7    6     6     4     8
    Y   304   99   198   205   106   9
2. a) Perform a linear regression analysis to generate the line of best fit for each set of data in question 1.
 b) Repeat the linear regressions in part a), leaving out any outliers.
 c) Compare the lines of best fit in parts a) and b).

Apply, Solve, Communicate
B
3. Use the formula for the method of least squares to verify the slope and intercept values you found for the data in the investigation on page 171. Account for any discrepancies.
4. Use software or a graphing calculator to verify the regression results in Example 1.
5. Application The following table lists the heights and masses for a group of fire-department trainees.

Height (cm)   Mass (kg)
177           91
185           88
173           82
169           79
188           87
182           85
175           79

 a) Create a scatter plot and classify the linear correlation.
 b) Apply the method of least squares to generate the equation of the line of best fit.
 c) Predict the mass of a trainee whose height is 165 cm.
 d) Predict the height of a 79-kg trainee.
 e) Explain any discrepancy between your answer to part d) and the actual height of the 79-kg trainee in the sample group.
6. A random survey of a small group of high-school students collected information on the students’ ages and the number of books they had read in the past year.

Age (years)   Books Read
16            5
15            3
18            8
17            6
16            4
15            4
14            5
17            15

 a) Create a scatter plot for this data. Classify the linear correlation.
 b) Determine the correlation coefficient and the equation of the line of best fit.
 c) Identify the outlier.
 d) Repeat part b) with the outlier excluded.
 e) Does removing the outlier improve the linear model? Explain.
 f) Suggest other ways to improve the model.
 g) Do your results suggest that the number of books a student reads depends on the student’s age? Explain.
  • 181. 7. Application Market research has provided the following data on the monthly sales of a licensed T-shirt for a popular rock band.

Price ($)   Monthly Sales
10          2500
12          2200
15          1600
18          1200
20          800
24          250

 a) Create a scatter plot for these data.
 b) Use linear regression to model these data.
 c) Predict the sales if the shirts are priced at $19.
 d) The vendor has 1500 shirts in stock and the band is going to finish its concert tour in a month. What is the maximum price the vendor can charge and still avoid having shirts left over when the band stops touring?
8. Communication MDM Entertainment has produced a series of TV specials on the lives of great mathematicians. The executive producer wants to know if there is a linear correlation between production costs and revenue from the sales of broadcast rights. The costs and gross sales revenue for productions in 2001 and 2002 were as follows (amounts in millions of dollars).

2001                      2002
Cost ($M)   Sales ($M)    Cost ($M)   Sales ($M)
5.5         15.4          2.7         5.2
4.1         12.1          1.9         1.0
1.8         6.9           3.4         3.4
3.2         9.4           2.1         1.9
4.2         1.5           1.4         1.5

 a) Create a scatter plot using the data for the productions in 2001. Do there appear to be any outliers? Explain.
 b) Determine the correlation coefficient and the equation of the line of best fit.
 c) Repeat the linear regression analysis with any outliers removed.
 d) Repeat parts a) and b) using the data for the productions in 2002.
 e) Repeat parts a) and b) using the combined data for productions in both 2001 and 2002. Do there still appear to be any outliers?
 f) Which of the four linear equations do you think is the best model for the relationship between production costs and revenue? Explain your choice.
 g) Explain why the executive producer might choose to use the equation from part d) to predict the income from MDM’s 2003 productions.
9. Chapter Problem At Gina’s university, there are 250 business students who expect to graduate in 2006.
 a) Model the relationship between the total number of graduates and the number hired by performing a linear regression on the data in the table on page 157. Determine the equation of the line of best fit and the correlation coefficient.
 b) Use this linear model to predict how many graduates will be hired in 2006.
 c) Identify any outliers in this scatter plot and suggest possible reasons for an outlier. Would any of these reasons justify excluding the outlier from the regression calculations?
 d) Repeat part a) with the outlier removed.
 e) Compare the results in parts a) and d). What assumptions do you have to make?
  • 182. 10. Communication Refer to Example 2, which describes population data for wolves and rabbits in a wildlife reserve. An alternate theory has it that the rabbit population depends on the wolf population since the wolves prey on the rabbits.
 a) Create a scatter plot of rabbit population versus wolf population and classify the linear correlation. How are your data points related to those in Example 2?
 b) Determine the correlation coefficient and the equation of the line of best fit. Graph this line on your scatter plot.
 c) Is the equation of the line of best fit the inverse of that found in Example 2? Explain.
 d) Plot both populations as a time series. Can you recognize a pattern or relationship between the two series? Explain.
 e) Does the time series suggest which population is the dependent variable? Explain.
11. The following table lists the mathematics of data management marks and grade 12 averages for a small group of students.

Mathematics of Data Management Mark   Grade 12 Average
74                                    77
81                                    87
66                                    68
53                                    67
92                                    85
45                                    55
80                                    76

 a) Using Fathom™ or The Geometer’s Sketchpad,
  i) create a scatter plot for these data
  ii) add a moveable line to the scatter plot and construct the geometric square for the deviation of each data point from the moveable line
  iii) generate a dynamic sum of the areas of these squares
  iv) manoeuvre the moveable line to the position that minimizes the sum of the areas of the squares
  v) record the equation of this line
 b) Determine the equation of the line of best fit for this set of data.
 c) Compare the equations you found in parts a) and b). Explain any differences or similarities.
12. Application Use E-STAT or other sources to obtain the annual consumer price index figures from 1914 to 2000.
 a) Download this information into a spreadsheet or statistical software, or enter it into a graphing calculator. (If you use a graphing calculator, enter the data from every third year.) Find the line of best fit and comment on whether a straight line appears to be a good model for the data.
 b) What does the slope of the line of best fit tell you about the rate of inflation?
 c) Find the slope of the line of best fit for the data for just the last 20 years, and then repeat the calculation using only the data for the last 5 years.
 d) What conclusions can you make by comparing the three slopes? Explain your reasoning.
  • 183. ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application
13. The Worldwatch Institute has collected the following data on concentrations of carbon dioxide (CO2) in the atmosphere.

Year   CO2 Level (ppm)
1975   331
1976   332
1977   333.7
1978   335.3
1979   336.7
1980   338.5
1981   339.8
1982   341
1983   342.6
1984   344.3
1985   345.7
1986   347
1987   348.8
1988   351.4
1989   352.7
1990   354
1991   355.5
1992   356.2
1993   357
1994   358.8
1995   360.7

 a) Use technology to produce a scatter plot of these data and describe any correlation that exists.
 b) Use a linear regression to find the line of best fit for the data. Discuss the reliability of this model.
 c) Use the regression equation to predict the level of atmospheric CO2 that you would expect today.
 d) Research current CO2 levels. Are the results close to the predicted level? What factors could have affected the trend?

C
14. Suppose that a set of data has a perfect linear correlation except for two outliers, one above the line of best fit and the other an equal distance below it. The residuals of these two outliers are equal in magnitude, but one is positive and the other negative. Would you agree that a perfect linear correlation exists because the effects of the two residuals cancel out? Support your opinion with mathematical reasoning and a diagram.
15. Inquiry/Problem Solving Recall the formulas for the line of best fit using the method of least squares that minimizes the squares of vertical deviations.
 a) Modify these formulas to produce a line of best fit that minimizes the squares of horizontal deviations.
 b) Do you think your modified formulas will produce the same equation as the regular least-squares formula?
 c) Use your modified formula to calculate a line of best fit for one of the examples in this section. Does your line have the same equation as the line of best fit in the example? Is your equation the inverse of the equation in the example? Explain why or why not.
16. a) Calculate the residuals for all of the data points in Example 3 on page 177. Make a plot of these residuals versus the independent variable, X, and comment on any pattern you see.
 b) Explain how you could use such residual plots to detect outliers.
  • 184. 3.3 Non-Linear Regression
Many relationships between two variables follow patterns that are not linear. For example, square-law, exponential, and logarithmic relationships often appear in the natural sciences. Non-linear regression is an analytical technique for finding a curve of best fit for data from such relationships. The equation for this curve can then be used to model the relationship between the two variables.

As you might expect, the calculations for curves are more complicated than those for straight lines. Graphing calculators have built-in regression functions for a variety of curves, as do some spreadsheets and statistical programs. Once you enter the data and specify the type of curve, these technologies can automatically find the best-fit curve of that type. They can also calculate the coefficient of determination, $r^2$, which is a useful measure of how closely a curve fits the data.

INVESTIGATE & INQUIRE: Bacterial Growth
A laboratory technician monitors the growth of a bacterial culture by scanning it every hour and estimating the number of bacteria. The initial population is unknown.

Time (h)     0   1    2    3    4    5     6     7
Population   ?   10   21   43   82   168   320   475

1. a) Create a scatter plot and classify the linear correlation.
 b) Determine the correlation coefficient and the line of best fit.
 c) Add the line of best fit to your scatter plot. Do you think this line is a satisfactory model? Explain why or why not.
2. a) Use software or a graphing calculator to find a curve of best fit with a
  i) quadratic regression of the form $y = ax^2 + bx + c$
  ii) cubic regression of the form $y = ax^3 + bx^2 + cx + d$
 b) Graph these curves onto a scatter plot of the data.
 c) Record the equation and the coefficient of determination, $r^2$, for the curves.
 d) Use the equations to estimate the initial population of the bacterial culture. Do these estimates seem reasonable? Why or why not?

See Appendix B for details on using technology for non-linear regressions.
  • 185. 3. a) Perform an exponential regression on the data. Graph the curve of best fit and record its equation and coefficient of determination.
 b) Use this model to estimate the initial population.
 c) Do you think the exponential equation is a better model for the growth of the bacterial culture than the quadratic or cubic equations? Explain your reasoning.

Recall that Pearson’s correlation coefficient, r, is a measure of the linearity of the data, so it can indicate only how closely a straight line fits the data. However, the coefficient of determination, $r^2$, is defined such that it applies to any type of regression curve.

$$r^2 = \frac{\text{variation in } y \text{ explained by variation in } x}{\text{total variation in } y} = \frac{\sum (y_{est} - \bar{y})^2}{\sum (y - \bar{y})^2}$$

where $\bar{y}$ is the mean y value, $y_{est}$ is the value estimated by the best-fit curve for a given value of x, and y is the actual observed value for a given value of x.

[Figure: a data point (x, y) and the point (x, $y_{est}$) on the curve of best fit, with the total deviation from $\bar{y}$ split into an explained deviation and an unexplained deviation; axes X and Y]

The total variation is the sum of the squares of the deviations for all of the individual data points.

The coefficient of determination can have values from 0 to 1. If the curve is a perfect fit, then $y_{est}$ and y will be identical for each value of x. In this case, the variation in x accounts for all of the variation in y, so $r^2 = 1$. Conversely, if the curve is a poor fit, the total of $(y_{est} - \bar{y})^2$ will be much smaller than the total of $(y - \bar{y})^2$, since the variation in x will account for only a small part of the total variation in y. Therefore, $r^2$ will be close to 0. For any given type of regression, the curve of best fit will be the one that has the highest value for $r^2$.

For graphing calculators and Microsoft® Excel, the procedures for non-linear regression are almost identical to those for linear regression. At present, Corel® Quattro® Pro and Fathom™ do not have built-in functions for non-linear regression.
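The definition of the coefficient of determination can be applied directly once a fitted model supplies the estimated values. The Python sketch below is an illustration only, and its function names are assumptions. It uses the age and income data from Example 1 of Section 3.2 with a least-squares straight line standing in for the "curve", and computes the ratio of explained variation to total variation. For a least-squares line this ratio equals the square of Pearson's r, and a model that reproduces every data point exactly gives a ratio of 1.

```python
# Age and income data from Example 1 in Section 3.2.
ages = [33, 25, 19, 44, 50, 54, 38, 29]
incomes = [33, 31, 18, 52, 56, 60, 44, 35]


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = ax + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - a * sum_x / n
    return a, b


def coefficient_of_determination(ys, ys_est):
    """Explained variation divided by total variation in y."""
    y_bar = sum(ys) / len(ys)
    explained = sum((y_est - y_bar) ** 2 for y_est in ys_est)
    total = sum((y - y_bar) ** 2 for y in ys)
    return explained / total


a, b = least_squares_line(ages, incomes)
estimates = [a * x + b for x in ages]  # y_est for each observed x

r_squared = coefficient_of_determination(incomes, estimates)
print("r squared for the line of best fit = %.3f" % r_squared)

# A "perfect fit" reproduces every observed y, so the ratio is 1.
print("perfect fit:", coefficient_of_determination(incomes, incomes))
```

The same function accepts estimates from any type of curve, since it needs only the observed y values and the corresponding values of y_est.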
  • 186. Exponential Regression
Exponential regressions produce equations with the form $y = ab^x$ or $y = ae^{kx}$, where e = 2.718 28…, an irrational number commonly used as the base for exponents and logarithms. These two forms are equivalent, and it is straightforward to convert from one to the other.

Example 1 Exponential Regression
Generate an exponential regression for the bacterial culture in the investigation on page 184. Graph the curve of best fit and determine its equation and the coefficient of determination.

Solution 1 Using a Graphing Calculator
Use the ClrList command from the STAT EDIT menu to clear lists L1 and L2, and then enter the data. Set DiagnosticOn so that regression calculations will display the coefficient of determination. From the STAT CALC menu, select the non-linear regression function ExpReg. If you do not enter any list names, the calculator will use L1 and L2 by default.
The equation for the curve of best fit is $y = 5.70(1.93)^x$, and the coefficient of determination is $r^2 = 0.995$. Store the equation as Y1. Use STAT PLOT to display a scatter plot of the data along with Y1. From the ZOOM menu, select 9:ZoomStat to adjust the window settings automatically.

Solution 2 Using a Spreadsheet
Enter the data into two columns. Next, highlight these columns and use the Chart feature to create an x-y scatter plot.
  • 187. Select Chart/Add Trendline and then choose Exponential regression. Then, select the curve that appears on your chart, and click Format/Selected Trendline/Options. Check the option boxes to display the equation and $r^2$.
The equation of the best-fit curve is $y = 5.7e^{0.66x}$ and the coefficient of determination is $r^2 = 0.995$. This equation appears different from the one found with the graphing calculator. In fact, the two forms are equivalent, since $e^{0.66} \doteq 1.93$.

Power and Polynomial Regression
In power regressions, the curve of best fit has an equation with the form $y = ax^b$.

Example 2 Power Regression
For a physics project, a group of students videotape a ball dropped from the top of a 4-m high ladder, which they have marked every 10 cm. During playback, they stop the videotape every tenth of a second and compile the following table for the distance the ball travelled.

Time (s)       0.1    0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
Distance (m)   0.05   0.2   0.4   0.8   1.2   1.7   2.4   3.1   3.9   4.9

a) Does a linear model fit the data well?
b) Use a power regression to find a curve of best fit for the data. Does the power-regression curve fit the data more closely than the linear model does?
c) Use the equation for the regression curve to predict
 i) how long the ball would take to fall 10 m
 ii) how far the ball would fall in 5 s
  • 188. Solution 1 Using a Graphing Calculator

a) Although the linear correlation coefficient is 0.97, a scatter plot of the data shows a definite curved pattern. Since the y-intercept of the line of best fit is $b = -1.09$, the linear model predicts an initial position of about −1.1 m and clearly does not fit the first part of the data well. Also, the pattern in the scatter plot suggests the linear model could give inaccurate predictions for times beyond 1 s.

b) From the STAT CALC menu, select the non-linear regression function PwrReg and then follow the same steps as in Example 1. The equation for the curve of best fit is $y = 4.83x^2$. The coefficient of determination and a graph on the scatter plot show that the quadratic curve is almost a perfect fit to the data.

c) Substitute the known values into the equation for the quadratic curve of best fit:
   i) $10 = 4.83x^2$, so $x^2 = \dfrac{10}{4.83}$ and $x = \sqrt{\dfrac{10}{4.83}} \approx 1.4$
   ii) $y = 4.83(5)^2 = 4.83(25) \approx 121$

The quadratic model predicts that
   i) the ball would take approximately 1.4 s to fall 10 m
   ii) the ball would fall 121 m in 5 s

Solution 2 Using a Spreadsheet

a) As in Solution 1, the scatter plot shows that a curve might be a better model.
  • 189. b) Use the Chart feature as in Example 1, but select Power when adding the trend line. The equation for the curve of best fit is $y = 4.83x^2$. The graph and the value for $r^2$ show that the quadratic curve is almost a perfect fit to the data.

c) Use the equation for the curve of best fit to enter formulas for the two values you want to predict, as shown in cells A13 and B14 in the screen above.

Example 3 Polynomial Regression
Suppose that the laboratory technician takes further measurements of the bacterial culture in Example 1.

Time (h)      8     9     10    11    12     13     14
Population    630   775   830   980   1105   1215   1410

a) Discuss the effectiveness of the exponential model from Example 1 for the new data.
b) Find a new exponential curve of best fit.
c) Find a better curve of best fit. Comment on the effectiveness of the new model.
  • 190. Solution

a) If you add the new data to the scatter plot, you will see that the exponential curve determined earlier, $y = 5.7(1.9)^x$, is no longer a good fit.

b) If you perform a new exponential regression on all 14 data points, you obtain the equation $y = 18(1.4)^x$ with a coefficient of determination of $r^2 = 0.88$. From the graph, you can see that this curve is not a particularly good fit either. Because of the wide range of non-linear regression options, you can insist on a fairly high value of $r^2$ when searching for a curve of best fit to model the data.

c) If you perform a quadratic regression, you get a much better fit with the equation $y = 4.0x^2 + 55x - 122$ and a coefficient of determination of $r^2 = 0.986$. This quadratic model will probably serve well for interpolating between most of the data shown, but may not be accurate for times before 3 h and after 14 h. At some point between 2 h and 3 h, the curve intersects the x-axis, indicating a negative population prior to this time. Clearly the quadratic model is not accurate in this range. Similarly, if you zoom out, you will notice a problem beyond 14 h. The rate of change of the quadratic curve continues to increase after 14 h, but the trend of the data does not suggest such an increase. In fact, from 7 h to 14 h the trend appears quite linear. It is important to recognize the limitations of regression curves.

One interesting property of polynomial regressions is that for a set of $n$ data points, a polynomial function of degree $n - 1$ can be produced which perfectly fits the data, that is, with $r^2 = 1$. For example, you can determine the equation for a line (a first-degree polynomial) with two points and the equation for a quadratic (a second-degree polynomial) with three points. However, these polynomials are not always the best models for the data. Often, these curves can give inaccurate predictions when extrapolated.

Sometimes, you can find that several different types of curves fit closely to a set of data. Extrapolating to an initial or final state may help determine which model is the most suitable. Also, the mathematical model should show a logical relationship between the variables.

Project Prep
Non-linear models may be useful when you are analysing two-variable data in your statistics project.
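The remark about degree $n - 1$ polynomials can be checked numerically. The sketch below is an illustration only, not the calculator's or spreadsheet's method: it uses Lagrange interpolation (a technique swapped in here for convenience) on just the seven later measurements from Example 3, and the names `lagrange_value`, `new_data`, `residuals`, and `prediction_16h` are hypothetical.

```python
# Illustrative sketch: a degree n - 1 polynomial passes exactly through
# n data points, yet it can extrapolate very badly.

def lagrange_value(points, x):
    """Evaluate at x the unique polynomial of degree len(points) - 1
    that passes through every (xi, yi) pair in points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# The seven later measurements from Example 3: (time in h, population).
new_data = [(8, 630), (9, 775), (10, 830), (11, 980),
            (12, 1105), (13, 1215), (14, 1410)]

# The degree-6 polynomial reproduces every data point, so r^2 = 1.
residuals = [lagrange_value(new_data, x) - y for x, y in new_data]

# Two hours past the last measurement, the same polynomial is evaluated.
prediction_16h = lagrange_value(new_data, 16)

print("largest residual:", max(abs(r) for r in residuals))
print("predicted population at 16 h:", round(prediction_16h))
```

Under these assumptions the polynomial matches all seven points exactly but predicts a population of about −660 at 16 h, which is impossible. A perfect $r^2$ therefore says nothing about how well a curve extrapolates.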
  • 191. Key Concepts

• Some relationships between two variables can be modelled using non-linear regressions such as quadratic, cubic, power, polynomial, and exponential curves.
• The coefficient of determination, $r^2$, is a measure of how well a regression curve fits a set of data.
• Sometimes more than one type of regression curve can provide a good fit for data. To be an effective model, however, the curve must be useful for extrapolating beyond the data.

Communicate Your Understanding

1. A data set for two variables has a linear correlation coefficient of 0.23. Does this value preclude a strong correlation between the variables? Explain why or why not.
2. A best-fit curve for a set of data has a coefficient of determination of $r^2 = 0.76$. Describe some techniques you can use to improve the model.

Practise

A

1. Match each of the following coefficients of determination with one of the diagrams below.
   a) 0   b) 0.5   c) 0.9   d) 1
   [Diagrams i) to iv) not reproduced]

2. For each set of data use software or a graphing calculator to find the equation and coefficient of determination for a curve of best fit.

   a)               b)               c)
   x       y        x       y        x       y
   −2.8    0.6      −2.7    1.6      1.1     2.5
   −3.5    −5.8     −3.5    −3       3.5     11
   −2      3        −2.2    3        2.8     8.6
   −1      6        −0.5    −0.5     2.3     7
   0.2     4        0       1.3      0       1
   1       1        0.6     4.7      3.8     14
   −1.5    5        −1.8    1.7      1.4     4.2
   1.4     −3.1     −3.8    −7       −4      0.2
   0.7     3        −1.3    0.6      −1.3    0.6
   −0.3    6.1      0.8     7        3       12
   −3.3    −3.1     0.5     2.7      4.1     17
   −4      −7       −1      1.5      2.2     5
   2       −5.7     −3      −1.1     −2.7    0.4
  • 192. Apply, Solve, Communicate

B

3. The heights of a stand of pine trees were measured along with the area under the cone formed by their branches.

   Height (m)   Area (m²)
   2.0          5.9
   1.5          3.4
   1.8          4.8
   2.4          8.6
   2.2          7.3
   1.2          2.1
   1.8          4.9
   3.1          14.4

   a) Create a scatter plot of these data.
   b) Determine the correlation coefficient and the equation of the line of best fit.
   c) Use a power regression to calculate a coefficient of determination and an equation for a curve of best fit.
   d) Which model do you think is more accurate? Explain why.
   e) Use the more accurate model to predict
      i) the area under a tree whose height is 2.7 m
      ii) the height of a tree whose area is 30 m²
   f) Suggest a reason why the height and circumference of a tree might be related in the way that the model in part d) suggests.

4. Application The biologist Max Kleiber (1893–1976) pioneered research on the metabolisms of animals. In 1932, he determined the relationship between an animal’s mass and its energy requirements or basal metabolic rate (BMR). Here are data for eight animals.

   Animal     Mass (kg)   BMR (kJ/day)
   Frog       0.018       0.050
   Squirrel   0.90        1.0
   Cat        3.0         2.6
   Monkey     7.0         4.0
   Baboon     30          14
   Human      60          25
   Dolphin    160         44
   Camel      530         116

   a) Create a scatter plot and explain why Kleiber thought a power-regression curve would fit the data.
   b) Use a power regression to find the equation of the curve of best fit. Can you rewrite the equation so that it has exponents that are whole numbers? Do so, if possible, or explain why not.
   c) Is this power equation a good mathematical model for the relationship between an animal’s mass and its basal metabolic rate? Explain why or why not.
   d) Use the equation of the curve of best fit to predict the basal metabolic rate of
      i) a 15-kg dog
      ii) a 2-tonne whale

5. Application As a sample of a radioactive element decays into more stable elements, the amount of radiation it gives off decreases. The level of radiation can be used to estimate how much of the original element remains. Here are measurements for a sample of radium-227.

   Time (h)   Radiation Level (%)
   0          100
   1          37
   2          14
   3          5.0
   4          1.8
   5          0.7
   6          0.3
  • 193. a) Create a scatter plot for these data.
   b) Use an exponential regression to find the equation for the curve of best fit.
   c) Is this equation a good model for the radioactive decay of this element? Explain why or why not.
   d) A half-life is the time it takes for half of the sample to decay. Use the regression equation to estimate the half-life of radium-227.

6. Chapter Problem
   a) Create a time-series graph for the mean starting salary of the graduates who find jobs. Describe the pattern that you see.
   b) Use non-linear regression to construct a curve of best fit for the data. Record the equation of the curve and the coefficient of determination.
   c) Comment on whether this equation is a good model for the graduates’ starting salaries.

7. An engineer testing the transmitter for a new radio station measures the radiated power at various distances from the transmitter. The engineer’s readings are in microwatts per square metre.

   Distance (km)   Power Level (µW/m²)
   2.0             510
   5.0             78
   8.0             32
   10.0            19
   12.0            14
   15.0            9
   20.0            5

   a) Find an equation for a curve of best fit for these data that has a coefficient of determination of at least 0.98.
   b) Use the equation for this curve of best fit to estimate the power level at a distance of
      i) 1.0 km from the transmitter
      ii) 4.0 km from the transmitter
      iii) 50.0 km from the transmitter

8. Communication Logistic curves are often a good model for population growth. These curves have equations with the form $y = \dfrac{c}{1 + ae^{-bx}}$, where $a$, $b$, and $c$ are constants. Consider the following data for the bacterial culture in Example 1:

   Time (h)      0      1      2      3      4      5
   Population    ?      10     21     43     82     168

   Time (h)      6      7      8      9      10     11
   Population    320    475    630    775    830    980

   Time (h)      12     13     14     15     16     17
   Population    1105   1215   1410   1490   1550   1575

   Time (h)      18     19     20
   Population    1590   1600   1600

   a) Use software or a graphing calculator to find the equation and coefficient of determination for the logistic curve that best fits the data for the bacteria population from 1 to 20 h.
   b) Graph this curve on a scatter plot of the data.
   c) How well does this curve appear to fit the entire data set? Describe the shape of the curve.
   d) Write a brief paragraph to explain why you think a bacterial population may exhibit this type of growth pattern.
  • 194. 9. Inquiry/Problem Solving The following table shows the estimated population of a crop-destroying insect.

   Year   Population (billions)
   1995   100
   1996   130
   1997   170
   1998   220
   1999   285
   2000   375
   2001   490

   a) Determine an exponential curve of best fit for the population data.
   b) Suppose that 100 million of an arachnid that preys on the insect are imported from overseas in 1995. Assuming the arachnid population doubles every year, estimate when it would equal 10% of the insect population.
   c) What further information would you need in order to estimate the population of the crop-destroying insect once the arachnids have been introduced?
   d) Write an expression for the size of this population.

C

10. Use technology to calculate the coefficient of determination for two of the linear regression examples in section 3.2. Is there any relationship between these coefficients of determination and the linear correlation coefficients for these examples?

11. Inquiry/Problem Solving Use a software program, such as Microsoft® Excel, to analyse these two sets of data:

   Data Set A      Data Set B
   x     y         x     y
   2     5         2     6
   4     7         4     5
   6     2         7     −4
   8     5         9     1
                   12    2

   a) For each set of data,
      i) determine the degree of polynomial regression that will generate a perfectly fit regression curve
      ii) perform the polynomial regression and record the value of $r^2$ and the equation of the regression curve
   b) Assess the effectiveness of the best-fit polynomial curve as a model for the trend of the set of data.
   c) For data set B,
      i) explain why the best-fit polynomial curve is an unsatisfactory model
      ii) generate a better model and record the value of $r^2$ and the equation of your new best-fit curve
      iii) explain why this curve is a better model than the polynomial curve found in part a)
  • 195. 3.4 Cause and Effect

Usually, the main reason for a correlational study is to find evidence of a cause-and-effect relationship. A health researcher may wish to prove that even mild exercise reduces the risk of heart disease. A chemical company developing an oil additive would like to demonstrate that it improves engine performance. A school board may want to know whether calculators help students learn mathematics. In each of these cases, establishing a strong correlation between the variables is just the first step in determining whether one affects the other.

INVESTIGATE & INQUIRE: Correlation Versus Cause and Effect

1. List the type of correlation that you would expect to observe between the following pairs of variables. Also list whether you think the correlation is due to a cause-and-effect relationship or some other factor.
   a) hours spent practising at a golf driving range, golf drive distance
   b) hours spent practising at a golf driving range, golf score
   c) size of corn harvest, size of apple harvest
   d) score on a geometry test, score on an algebra test
   e) income, number of CDs purchased
2. Compare your list with those of your classmates and discuss any differences. Would you change your list because of factors suggested by your classmates?
3. Suggest how you could verify whether there is a cause-and-effect relationship between each pair of variables.

A strong correlation does not prove that the changes in one variable cause changes in the other. There are various types and degrees of causal relationships between variables.

Cause-and-Effect Relationship: A change in X produces a change in Y. Such relationships are sometimes clearly evident, especially in physical processes. For example, increasing the height from which you drop an object increases its impact velocity. Similarly, increasing the speed of a production line increases the number of items produced each day (and, perhaps, the rate of defects).
  • 196. Common-Cause Factor: An external variable causes two variables to change in the same way. For example, suppose that a town finds that its revenue from parking fees at the public beach each summer correlates with the local tomato harvest. It is extremely unlikely that cars parked at the beach have any effect on the tomato crop. Instead good weather is a common-cause factor that increases both the tomato crop and the number of people who park at the beach.

Reverse Cause-and-Effect Relationship: The dependent and independent variables are reversed in the process of establishing causality. For example, suppose that a researcher observes a positive linear correlation between the amount of coffee consumed by a group of medical students and their levels of anxiety. The researcher theorizes that drinking coffee causes nervousness, but instead finds that nervous people are more likely to drink coffee.

Accidental Relationship: A correlation exists without any causal relationship between variables. For example, the number of females enrolled in undergraduate engineering programs and the number of “reality” shows on television both increased for several years. These two variables have a positive linear correlation, but it is likely entirely coincidental.

Presumed Relationship: A correlation does not seem to be accidental even though no cause-and-effect relationship or common-cause factor is apparent. For example, suppose you found a correlation between people’s level of fitness and the number of adventure movies they watched. It seems logical that a physically fit person might prefer adventure movies, but it would be difficult to find a common cause or to prove that the one variable affects the other.

Example 1 Causal Relationships
Classify the relationships in the following situations.
a) The rate of a chemical reaction increases with temperature.
b) Leadership ability has a positive correlation with academic achievement.
c) The prices of butter and motorcycles have a strong positive correlation over many years.
d) Sales of cellular telephones had a strong negative correlation with ozone levels in the atmosphere over the last decade.
e) Traffic congestion has a strong correlation with the number of urban expressways.
  • 197. Solution

a) Cause-and-effect relationship: Higher temperatures cause faster reaction rates.
b) Presumed relationship: A positive correlation between leadership ability and academic achievement seems logical, yet there is no apparent common-cause factor or cause-and-effect relationship.
c) Common-cause factor: Inflation has caused parallel increases in the prices of butter and motorcycles over the years.
d) Accidental relationship: The correlation between sales of cellular telephones and ozone levels is largely coincidental. However, it is possible that the chemicals used to manufacture cellular telephones cause a small portion of the depletion of the ozone layer.
e) Cause-and-effect relationship and reverse cause-and-effect relationship: Originally expressways were built to relieve traffic congestion, so traffic congestion did lead to the construction of expressways in major cities throughout North America. However, numerous studies over the last 20 years have shown that urban expressways cause traffic congestion by encouraging more people to use cars.

As Example 1 demonstrates, several types of causal relationships can be involved in the same situation. Determining the nature of causal relationships can be further complicated by the presence of extraneous variables that affect either the dependent or the independent variable. Here, extraneous means external rather than irrelevant.

For example, you might expect to see a strong positive correlation between term marks and final examination results for students in your class since both these variables are affected by each student’s aptitude and study habits. However, there are extraneous factors that could affect the examination results, including the time each student had for studying before the examination, the individual examination schedules, and varying abilities to work well under pressure.

In order to reduce the effect of extraneous variables, researchers often compare an experimental group to a control group. These two groups should be as similar as possible, so that extraneous variables will have about the same effect on both groups. The researchers vary the independent variable for the experimental group but not for the control group. Any difference in the dependent variables for the two groups can then be attributed to the changes in the independent variable.
  • 198. Example 2 Using a Control Group

A medical researcher wants to test a new drug believed to help smokers overcome the addictive effects of nicotine. Fifty people who want to quit smoking volunteer for the study. The researcher carefully divides the volunteers into two groups, each with an equal number of moderate and heavy smokers. One group is given nicotine patches with the new drug, while the second group uses ordinary nicotine patches. Fourteen people in the first group quit smoking completely, as do nine people in the second group.

a) Identify the experimental group, the control group, the independent variable, and the dependent variable.
b) Can the researcher conclude that the new drug is effective?
c) What further study should the researcher do?

Solution

a) The experimental group consists of the volunteers being given nicotine patches with the new drug, while the control group consists of the volunteers being given the ordinary patches. The independent variable is the presence of the new drug, and the dependent variable is the number of volunteers who quit smoking.
b) The results of the study are promising, but the researcher has not proven that the new drug is effective. The sample size is relatively small, which is prudent for an early trial of a new drug that could have unknown side-effects. However, the sample is small enough that the results could be affected by random statistical fluctuations or extraneous variables, such as the volunteers’ work environments, previous attempts to quit, and the influence of their families and friends.
c) Assuming that the new drug does not have any serious side-effects, the researcher should conduct further studies with larger groups and try to select the experimental and control groups to minimize the effect of all extraneous variables. The researcher might also conduct a study with several experimental groups that receive different dosages of the new drug.
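The claim in part b) that random fluctuations could account for the result can be made concrete with a rough counting argument. The sketch below is not part of the example's solution and relies on ideas the later chapters develop. It assumes, for illustration only, that the fifty volunteers were split into two groups of 25 and that the drug has no effect, so the 23 people who quit would have quit in either group. The names `GROUP_SIZE` and `chance_of_at_least` are hypothetical.

```python
from math import comb

# Assumption for illustration: the 50 volunteers form two groups of 25.
GROUP_SIZE = 25
quit_with_drug = 14       # experimental group, from Example 2
quit_without_drug = 9     # control group, from Example 2
total_quit = quit_with_drug + quit_without_drug

def chance_of_at_least(k_min):
    """Probability that at least k_min of the total_quit quitters land in the
    experimental group when group membership is assigned purely at random."""
    total_ways = comb(2 * GROUP_SIZE, total_quit)
    favourable = sum(
        comb(GROUP_SIZE, k) * comb(GROUP_SIZE, total_quit - k)
        for k in range(k_min, min(GROUP_SIZE, total_quit) + 1)
    )
    return favourable / total_ways

p_chance = chance_of_at_least(quit_with_drug)
print(f"chance of 14 or more quitters in one group by luck alone: {p_chance:.3f}")
```

Under these assumptions the probability is about 0.13, so a split at least this lopsided would arise by chance roughly one time in eight even if the drug did nothing. That is consistent with the conclusion that the study is promising but not proof.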
When designing a study or interpreting a correlation, you often need background knowledge and insight to recognize the causal relationships present. Here are some techniques that can help determine whether a correlation is the result of a cause-and-effect relationship.
  • 199. • Use sampling methods that hold the extraneous variables constant.
• Conduct similar investigations with different samples and check for consistency in the results.
• Remove, or account for, possible common-cause factors.

The later chapters in this book introduce probability theory and some statistical methods for a more quantitative approach to determining cause-and-effect relationships.

Project Prep
In your statistics project, you may wish to consider cause-and-effect relationships and extraneous variables that could affect your study.

Key Concepts

• Correlation does not necessarily imply a cause-and-effect relationship. Correlations can also result from common-cause factors, reverse cause-and-effect relationships, accidental relationships, and presumed relationships.
• Extraneous variables can invalidate conclusions based on correlational evidence.
• Comparison with a control group can help remove the effect of extraneous variables in a study.

Communicate Your Understanding

1. Why does a strong linear correlation not imply cause and effect?
2. What is the key characteristic of a reverse cause-and-effect relationship?
3. Explain the difference between a common-cause factor and an extraneous variable.
4. Why are control groups used in statistical studies?

Practise

A

1. Identify the most likely type of causal relationship between each of the following pairs of variables. Assume that a strong positive correlation has been observed with the first variable as the independent variable.
   a) alcohol consumption, incidence of automobile accidents
   b) score on physics examination, score on calculus examination
   c) increase in pay, job performance
   d) population of rabbits, consumer price index
   e) number of scholarships received, number of job offers upon graduation
   f) coffee consumption, insomnia
   g) funding for athletic programs, number of medals won at Olympic games
  • 200. 2. For each of the following common-cause relationships, identify the common-cause factor. Assume a positive correlation between each pair of variables.
   a) number of push-ups performed in one minute, number of sit-ups performed in one minute
   b) number of speeding tickets, number of accidents
   c) amount of money invested, amount of money spent

Apply, Solve, Communicate

3. A civil engineer examining traffic flow problems in a large city observes that the number of traffic accidents is positively correlated with traffic density and concludes that traffic density is likely to be a major cause of accidents. What alternative conclusion should the engineer consider?

B

4. Communication An elementary school is testing a new method for teaching grammar. Two similar classes are taught the same material, one with the established method and the other with the new method. When both classes take the same test, the class taught with the established method has somewhat higher marks.
   a) What extraneous variables could influence the results of this study?
   b) Explain whether the study gives the school enough evidence to reject the new method.
   c) What further studies would you recommend for comparing the two teaching methods?

5. Communication An investor observes a positive correlation between the stock prices of two competing computer companies. Explain what type of causal relationship is likely to account for this correlation.

6. Application A random survey of students at Statsville High School found that their interest in computer games is positively correlated with their marks in mathematics.
   a) How would you classify this causal relationship?
   b) Suppose that a follow-up study found that students who had increased the time they spent playing computer games tended to improve their mathematics marks. Assuming that this study held all extraneous variables constant, would you change your assessment of the nature of the causal relationship? Explain why or why not.

7. a) The net assets of Custom Industrial Renovations Inc., an industrial construction contractor, have a strong negative linear correlation with those of MuchMega-Fun, a toy distributor. How would you classify the causal relationship between these two variables?
   b) Suppose that the two companies are both subsidiaries of Diversified Holdings Ltd., which often shifts investment capital between them. Explain how this additional information could change your interpretation of the correlation in part a).

8. Communication Aunt Gisele simply cannot sleep unless she has her evening herbal tea. However, the package for the tea does not list any ingredients known to induce sleep. Outline how you would conduct a study to determine whether the tea really does help people sleep.

9. Find out what a double-blind study is and briefly explain the advantages of using this technique in studies with a control group.

10. Chapter Problem
   a) The data on page 157 show a positive correlation between the size of the graduating class and the number of
  • 201. graduates hired. Does this correlation mean that increasing the number of graduates causes a higher demand for them? Explain your answer.
   b) A recession during the first half of the 1990s reduced the demand for business graduates. Review the data on page 157 and describe any trends that may be caused by this recession.

ACHIEVEMENT CHECK
Knowledge/Understanding · Thinking/Inquiry/Problem Solving · Communication · Application

11. The table below lists numbers of divorces and personal bankruptcies in Canada for the years 1976 through 1985.

   Year   Divorces   Bankruptcies
   1976   54 207     10 049
   1977   55 370     12 772
   1978   57 155     15 938
   1979   59 474     17 876
   1980   62 019     21 025
   1981   67 671     23 036
   1982   70 436     30 643
   1983   68 567     26 822
   1984   65 172     22 022
   1985   61 976     19 752

   a) Create a scatter plot and classify the linear correlation between the number of divorces and the number of bankruptcies.
   b) Perform a regression analysis. Record the equation of the line of best fit and the correlation coefficient.
   c) Identify an external variable that could be a common-cause factor.
   d) Describe what further investigation you could do to analyse the possible relationship between divorces and bankruptcies.

12. Search the E-STAT, CANSIM II, or other databases for a set of data on two variables with a positive linear correlation that you believe to be accidental. Explain your findings and reasoning.

C

13. Use a library, the Internet, or other resources to find information on the Hawthorne effect and the placebo effect. Briefly explain what these effects are, how they can affect a study, and how researchers can avoid having their results skewed by these effects.

14. Inquiry/Problem Solving In a behavioural study of responses to violence, an experimental group was shown violent images, while a control group was shown neutral images. From the initial results, the researchers suspect that the gender of the people in the groups may be an extraneous variable. Suggest how the study could be redesigned to
   a) remove the extraneous variable
   b) determine whether gender is part of the cause-and-effect relationship

15. Look for material in the media or on the Internet that incorrectly uses correlational evidence to claim that a cause-and-effect relationship exists between the two variables. Briefly describe
   a) the nature of the correlational study
   b) the cause and effect claimed or inferred
   c) the reasons why cause and effect was not properly proven, including any extraneous variables that were not accounted for
   d) how the study could be improved
  • 202. 3.5 Critical Analysis

Newspapers and radio and television news programs often run stories involving statistics. Indeed, the news media often commission election polls or surveys on major issues. Although the networks and major newspapers are reasonably careful about how they present statistics, their reporters and editors often face tight deadlines and lack the time and mathematical knowledge to thoroughly critique statistical material.

You should be particularly careful about accepting statistical evidence from sources that could be biased. Lobby groups and advertisers like to use statistics because they appear scientific and objective. Unfortunately, statistics from such sources are sometimes flawed by unintentional or, occasionally, entirely deliberate bias.

To judge the conclusions of a study properly, you need information about its sampling and analytical methods.

INVESTIGATE & INQUIRE: Statistics in the Media

1. Find as many instances as you can of statistical claims made in the media or on the Internet, including news stories, features, and advertisements. Collect newspaper and magazine clippings, point-form notes of radio and television stories, and printouts of web pages.
2. Compare the items you have collected with those found by your classmates. What proportion of the items provide enough information to show that they used valid statistical methods?
3. Select several of the items. For each one, discuss
   a) the motivation for the statistical study
   b) whether the statistical evidence justifies the claim being made

The examples in this section illustrate how you can apply analytical tools to assess the results of statistical studies.

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links to learn more about how statistics can be misused. Describe two examples of the misuse of statistics.
  • 203. Example 1 Sample Size and Technique

A manager wants to know if a new aptitude test accurately predicts employee productivity. The manager has all 30 current employees write the test and then compares their scores to their productivities as measured in the most recent performance reviews. The data are ordered alphabetically by employee surname. In order to simplify the calculations, the manager selects a systematic sample using every seventh employee. Based on this sample, the manager concludes that the company should hire only applicants who do well on the aptitude test. Determine whether the manager’s analysis is valid.

   Test Score   Productivity
   98           78
   57           81
   82           83
   76           44
   65           62
   72           89
   91           85
   87           71
   81           76
   39           71
   50           66
   75           90
   71           48
   89           80
   82           83
   95           72
   56           72
   71           90
   68           74
   77           51
   59           65
   83           47
   75           91
   66           77
   48           63
   61           58
   78           55
   70           73
   68           75
   64           69

Solution

A linear regression of the systematic sample produces a line of best fit with the equation $y = 0.55x + 33$ and a correlation coefficient of $r = 0.98$, showing a strong linear correlation between productivity and scores on the aptitude test. Thus, these calculations seem to support the manager’s conclusion. However, the manager has made the questionable assumption that a systematic sample will be representative of the population. The sample is so small that statistical fluctuations could seriously affect the results.

[Scatter plot of the systematic sample: Productivity versus Test Score]

Examine the raw data. A scatter plot with all 30 data points does not show any clear correlation at all. A linear regression yields a line of best fit with the equation $y = 0.15x + 60$ and a correlation coefficient of only 0.15.

[Scatter plot of all 30 data points: Productivity versus Test Score]
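The two regressions in this solution can be reproduced without a calculator. The sketch below is one possible way to do so, using the standard least-squares formulas. The helper name `linear_fit` is hypothetical, and the data list assumes the table above is read top to bottom in alphabetical order.

```python
from math import sqrt

# (test score, productivity) for all 30 employees, in the order of the table.
data = [
    (98, 78), (57, 81), (82, 83), (76, 44), (65, 62), (72, 89),
    (91, 85), (87, 71), (81, 76), (39, 71), (50, 66), (75, 90),
    (71, 48), (89, 80), (82, 83), (95, 72), (56, 72), (71, 90),
    (68, 74), (77, 51), (59, 65), (83, 47), (75, 91), (66, 77),
    (48, 63), (61, 58), (78, 55), (70, 73), (68, 75), (64, 69),
]

def linear_fit(points):
    """Return (slope, intercept, r) for the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)

# Systematic sample: every seventh employee (the 7th, 14th, 21st, and 28th).
sample = data[6::7]

slope_s, intercept_s, r_s = linear_fit(sample)
slope_all, intercept_all, r_all = linear_fit(data)

print(f"sample of {len(sample)}: slope {slope_s:.2f}, r = {r_s:.2f}")
print(f"all {len(data)} employees: slope {slope_all:.2f}, r = {r_all:.2f}")
```

The four sampled points give a correlation coefficient of about 0.98, while the full data set gives only about 0.15, which is the contrast the solution describes.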
  • 204. Thus, the new aptitude test will probably be useless for predicting employee productivity. Clearly, the sample was far from representative. The manager’s choice of an inappropriate sampling technique has resulted in a sample size too small to make any valid conclusions.

In Example 1, the manager should have done an analysis using all of the data available. Even then the data set is still somewhat small to use as a basis for a major decision such as changing the company’s hiring procedures. Remember that small samples are also particularly vulnerable to the effects of outliers.

Example 2 Extraneous Variables and Sample Bias

An advertising blitz by SuperFast Computer Training Inc. features profiles of some of its young graduates. The number of months of training that these graduates took, their job titles, and their incomes appear prominently in the advertisements.

Graduate                        Months of Training   Income ($000)
Sarah, software developer               9                 85
Zack, programmer                        6                 63
Eli, systems analyst                    8                 72
Yvette, computer technician             5                 52
Kulwinder, web-site designer            6                 66
Lynn, network administrator             4                 60

a) Analyse the company’s data to determine the strength of the linear correlation between the amount of training the graduates took and their incomes. Classify the linear correlation and find the equation of the linear model for the data.
b) Use this model to predict the income of a student who graduates from the company’s two-year diploma program after 20 months of training. Does this prediction seem reasonable?
c) Does the linear correlation show that SuperFast’s training accounts for the graduates’ high incomes? Identify possible extraneous variables.
d) Discuss any problems with the sampling technique and the data.

Solution

a) The scatter plot for income versus months of training shows a definite positive linear correlation. The regression line is y = 5.44x + 31.9, and the correlation coefficient is 0.90. There appears to be a strong positive correlation between the amount of training and income.
  • 205. b) As shown in cell C9 in the screen above, substituting 20 months into the linear regression equation predicts an income of approximately
y = 5.44(20) + 31.9
  = 141
Therefore, the linear model predicts that a graduate who has taken 20 months of training will make about $141 000 a year. This amount is extremely high for a person with a two-year diploma and little or no job experience. The prediction suggests that the linear model may not be accurate, especially when applied to the company’s longer programs.

c) Although the correlation between SuperFast’s training and the graduates’ incomes appears to be quite strong, the correlation by itself does not prove that the training causes the graduates’ high incomes. A number of extraneous variables could contribute to the graduates’ success, including experience prior to taking the training, aptitude for working with computers, access to a high-end computer at home, family or social connections in the industry, and the physical stamina to work very long hours.

d) The sample is small and could have intentional bias. There is no indication that the individuals in the advertisements were randomly chosen from the population of SuperFast’s students. Quite likely, the company carefully selected the best success stories in order to give potential customers inflated expectations of future earnings. Also, the company shows youthful graduates, but does not actually state that the graduates earned their high incomes immediately after graduation. It may well have taken the graduates years of hard work to reach the income levels listed in the advertisements. Further, the amounts given are incomes, not salaries. The income of a graduate working for a small start-up company might include stock options that could turn out to be worthless. In short, the advertisements do not give you enough information to properly evaluate the data.
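The regression in part a) and the prediction in part b) can be reproduced in a few lines. This is only a sketch under the assumption that the six advertised graduates are the whole data set. The helper names `linear_fit`, `predict`, and `is_extrapolation` are hypothetical and are not part of the spreadsheet solution described above.

```python
import math

# (months of training, income in $000) for the six advertised graduates.
GRADUATES = [(9, 85), (6, 63), (8, 72), (5, 52), (6, 66), (4, 60)]


def linear_fit(points):
    """Return (slope, intercept, r) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / math.sqrt(sxx * syy)


slope, intercept, r = linear_fit(GRADUATES)
months = [m for m, _ in GRADUATES]


def predict(training_months):
    """Income ($000) predicted by the linear model."""
    return slope * training_months + intercept


def is_extrapolation(training_months):
    """True when the input lies outside the range of the observed data."""
    return training_months < min(months) or training_months > max(months)


print(f"y = {slope:.2f}x + {intercept:.1f}, r = {r:.2f}")
print(f"20 months -> about ${predict(20):.0f} 000",
      "(extrapolated)" if is_extrapolation(20) else "")
```

The `is_extrapolation` check makes the weakness of part b) explicit: the advertised graduates trained for 4 to 9 months, so a prediction for 20 months lies well outside the data the model was fitted to.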
  • 206. Example 2 had several fairly obvious extraneous variables. However, extraneous variables are sometimes difficult to recognize. Such hidden or lurking variables can also invalidate conclusions drawn from statistical results.

Example 3 Detecting a Hidden Variable

An arts council is considering whether to fund the start-up of a local youth orchestra. The council has a limited budget and knows that the number of youth orchestras in the province has been increasing. The council needs to know whether starting another youth orchestra will help the development of young musicians. One measure of the success of such programs is the number of youth-orchestra players who go on to professional orchestras. The council has collected the following data.

Year   Number of Youth Orchestras   Number of Players Becoming Professionals
1991              10                                 16
1992              11                                 18
1993              12                                 20
1994              12                                 23
1995              14                                 26
1996              14                                 32
1997              16                                 13
1998              16                                 16
1999              18                                 20
2000              20                                 26

a) Does a linear regression allow you to determine whether the council should fund a new youth orchestra? Can you draw any conclusions from other analysis?
b) Suppose you discover that one of the country’s professional orchestras went bankrupt in 1997. How does this information affect your analysis?

Solution

a) A scatter plot of the number of youth-orchestra members who go on to play professionally versus the number of youth orchestras shows that there may be a weak positive linear correlation. The correlation coefficient is 0.16, indicating that the linear correlation is very weak. Therefore, you might conclude that starting another youth orchestra will not help the development of young musicians. However, notice that the data points seem to form two clusters in the scatter plot, one on the left side and the other on the right. This unusual pattern suggests the presence of a hidden variable, which could affect your analysis. You will need more information to determine the nature and effect of the possible hidden variable.
  • 207. You have enough data to produce a time-series graph of the numbers of young musicians who go on to professional orchestras. This graph also has two clusters of data points. The numbers rise from 1991 to 1996, drop substantially in 1997, and then rise again. This pattern suggests that something unusual happened in 1997.

b) The collapse of a major orchestra means both that there is one less orchestra hiring young musicians and that about a hundred experienced players are suddenly available for work with the remaining professional orchestras. The resulting drop in the number of young musicians hired by professional orchestras could account for the clustering of data points you observed in part a). Because of the change in the number of jobs available for young musicians, it makes sense to analyse the clusters separately.
  • 208. Observe that the two sets of data both exhibit a strong linear correlation. The correlation coefficients are 0.93 for the data prior to 1997 and 0.94 for the data from 1997 on. The number of players who go on to professional orchestras is strongly correlated to the number of youth orchestras. So, funding the new orchestra may be a worthwhile project for the arts council.

The presence of a hidden variable, the collapse of a major orchestra, distorted the data and masked the underlying pattern. However, splitting the data into two sets results in smaller sample sizes, so you still have to be cautious about drawing conclusions.

When evaluating claims based on statistical studies, you must assess the methods used for collecting and analysing the data. Some critical questions are:
• Is the sampling process free from intentional and unintentional bias?
• Could any outliers or extraneous variables influence the results?
• Are there any unusual patterns that suggest the presence of a hidden variable?
• Has causality been inferred with only correlational evidence?

Project Prep
When collecting and analysing data for your statistics project, you can apply the concepts in this section to ensure that your conclusions are valid.
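The effect of splitting the Example 3 data at 1997 can be sketched in code. This is a minimal illustration, assuming the ten rows of the council’s table and a split at the year of the bankruptcy. The names `correlation` and `SPLIT_YEAR` are hypothetical. With only four points in the later cluster, the coefficient is sensitive to small changes, so the computed value need not match the quoted figure exactly.

```python
import math

# (year, number of youth orchestras, players becoming professionals).
RECORDS = [
    (1991, 10, 16), (1992, 11, 18), (1993, 12, 20), (1994, 12, 23),
    (1995, 14, 26), (1996, 14, 32), (1997, 16, 13), (1998, 16, 16),
    (1999, 18, 20), (2000, 20, 26),
]


def correlation(pairs):
    """Pearson correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy)


SPLIT_YEAR = 1997  # year of the orchestra bankruptcy given in part b)

all_pairs = [(orch, pros) for _, orch, pros in RECORDS]
before = [(orch, pros) for year, orch, pros in RECORDS if year < SPLIT_YEAR]
after = [(orch, pros) for year, orch, pros in RECORDS if year >= SPLIT_YEAR]

r_all = correlation(all_pairs)
r_before = correlation(before)
r_after = correlation(after)

print(f"all years:        r = {r_all:.2f}")
print(f"before {SPLIT_YEAR}:      r = {r_before:.2f} ({len(before)} points)")
print(f"{SPLIT_YEAR} and later:   r = {r_after:.2f} ({len(after)} points)")
```

Printing the number of points beside each coefficient is a reminder of the caution stated above: each cluster is a much smaller sample than the original ten years.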
  • 209. Key Concepts

• Although the major media are usually responsible in how they present statistics, you should be cautious about accepting any claim that does not include information about the sampling technique and analytical methods used.
• Intentional or unintentional bias can invalidate statistical claims.
• Small sample sizes and inappropriate sampling techniques can distort the data and lead to erroneous conclusions.
• Extraneous variables must be eliminated or accounted for.
• A hidden variable can skew statistical results and yet still be hard to detect.

Communicate Your Understanding

1. Explain how a small sample size can lead to invalid conclusions.
2. A city councillor states that there are problems with the management of the police department because the number of reported crimes in the city has risen despite increased spending on law enforcement. Comment on the validity of this argument.
3. Give an example of a hidden variable not mentioned in this section, and explain why this variable would be hard to detect.

Apply, Solve, Communicate

A

1. An educational researcher discovers that levels of mathematics anxiety are negatively correlated with attendance in mathematics class. The researcher theorizes that poor attendance causes mathematics anxiety. Suggest an alternate interpretation of the evidence.

2. A survey finds a correlation between the proportion of high school students who own a car and the students’ ages. What hidden variable could affect this study?

B

3. A student compares height and grade average with four friends and collects the following data.

Height (cm)   Grade Average (%)
    171              73
    145              91
    162              70
    159              81
    178              68

From this table, the student concludes that taller students tend to get lower marks.
   a) Does a regression analysis support the student’s conclusion?
   b) Why are the results of this analysis invalid?
   c) How can the student get more accurate results?
  • 210. 4. Inquiry/Problem Solving A restaurant chain randomly surveys its customers several times a year. Since the surveys show that the level of customer satisfaction is rising over time, the company concludes that its customer service is improving. Discuss the validity of the surveys and the conclusion based on these surveys.

5. Application A teacher offers the following data to show that good attendance is important.

Days Absent   Final Grade
     8             72
     2             75
     0             82
    11             68
    15             66
    20             30

A student with a graphing calculator points out that the data indicate that anyone who misses 17 days or more is in danger of failing the course.
   a) Show how the student arrived at this conclusion.
   b) Identify and explain the problems that make this conclusion invalid.
   c) Outline statistical methods to avoid these problems.

6. Chapter Problem Using a graphing calculator, Gina found the cubic curve of best fit for the salary data in the table on page 157. This curve has a coefficient of determination of 0.98, indicating an almost perfect fit to the data. The equation of the cubic curve is
starting salary = 0.0518y³ – 310y² + 618 412y – 411 344 091
where the salary is given in thousands of dollars and y is the year of graduation.
   a) What mean starting salary does this model predict for Gina’s class when they graduate in 2005?
   b) Is this prediction realistic? Explain.
   c) Explain why this model generated such an inaccurate prediction despite having a high value for the coefficient of determination.
   d) Suggest methods Gina could use to make a more accurate prediction.

7. Communication Find a newspaper or magazine article, television commercial, or web page that misuses statistics of two variables. Perform a critical analysis using the techniques in this chapter. Present your findings in a brief report.

8. Application A manufacturing company keeps records of its overall annual production and its number of employees. Data for a ten-year period are shown below.

Year   Number of Employees   Production (000)
1992          158                  75
1993          165                  81
1994          172                  84
1995          148                  68
1996          130                  58
1997          120                  51
1998           98                  50
1999          105                  57
2000          110                  62
2001          120                  70

   a) Create a scatter plot to see if there is a linear correlation between annual production and number of employees. Classify the correlation.
   b) At some point, the company began to lay off workers. When did these layoffs begin?
   c) Does the scatter plot suggest the presence of a hidden variable? Could the layoffs account for the pattern you see? Explain why or why not.
   d) The company’s productivity is its annual production divided by the number of employees. Create a time-series graph for the company’s productivity.
   e) Find the line of best fit for the graph in part d).
   f) The company has adopted a better management system. When do you think the new system was implemented? Explain your reasoning.

  • 211. C

9. Search E-STAT, CANSIM II, or other sources for time-series data for the price of a commodity such as gasoline, coffee, or computer memory. Analyse the data and comment on any evidence of a hidden variable. Conduct further research to determine if there are any hidden variables. Write a brief report outlining your analysis and conclusions.

10. Inquiry/Problem Solving A study conducted by Stanford University found that behavioural counselling for people who had suffered a heart attack reduced the risk of a further heart attack by 45%. Outline how you would design such a study. List the independent and dependent variables you would use and describe how you would account for any extraneous variables.

Career Connection: Economist

Economists apply statistical methods to develop mathematical models of the production and distribution of wealth. Governments, large businesses, and consulting firms are employers of economists. Some of the functions performed by an economist include
• recognizing and interpreting domestic and international market trends
• using supply and demand analysis to assess market potential and set prices
• identifying factors that affect economic growth, such as inflation and unemployment
• advising governments on fiscal and monetary policies
• optimizing the economic activity of financial institutions and large businesses

Typically, a bachelor’s degree in economics is necessary to enter this field. However, many positions require a master’s or doctorate degree or specialized training. Since economists often deal with large amounts of data, a strong background in statistics and an ability to work with computers are definite assets.

An economist can expect to earn a comfortable living. Most employment opportunities for economists are in large cities. The current demand for economists is reasonably strong and likely to remain so for the foreseeable future, as governments and large businesses will continue to need the information and analysis that economists provide.

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links to learn more about a career as an economist and other related careers.
  • 212. Review of Key Concepts

3.1 Scatter Plots and Linear Correlation
Refer to the Key Concepts on page 167.

1. a) Classify the linear correlation in each scatter plot shown below.
   [Three scatter plots of y versus x]
   b) Determine the correlation coefficient for data points in the scatter plots in part a).
   c) Do these correlation coefficients agree with your answers in part a)?

2. A survey of a group of randomly selected students compared the number of hours of television they watched per week with their grade averages.

Hours Per Week       12   10    5    3   15   16    8
Grade Average (%)    70   85   82   88   65   75   68

   a) Create a scatter plot for these data. Classify the linear correlation.
   b) Determine the correlation coefficient.
   c) Can you make any conclusions about the effect that watching television has on academic achievement? Explain.

3.2 Linear Regression
Refer to the Key Concepts on page 179.

3. Use the method of least squares to find the equation for the line of best fit for the data in question 2.

4. The scores for players’ first and second games at a bowling tournament are shown below.

First Game     169   150   202   230   187   177   164
Second Game    175   162   195   241   185   235   171

   a) Create a scatter plot for these data.
   b) Determine the correlation coefficient and the line of best fit.
   c) Identify any outliers.
   d) Repeat part b) with the outliers removed.
   e) A player scores 250 in the first game. Use both linear models to predict this player’s score for the second game. How far apart are the two predictions?

3.3 Non-Linear Regression
Refer to the Key Concepts on page 191.

5. An object is thrown straight up into the air. The table below shows the height of the object as it ascends.

Time (s)      0   0.1   0.2   0.3   0.4   0.5   0.6
Height (m)    0   1     1.8   2.6   3.2   3.8   4.2

   a) Create a scatter plot for these data.
  • 213. b) Perform a non-linear regression for these data. Record the equation of the curve of best fit and the coefficient of determination.
   c) Use your model to predict the maximum height of the object.
   d) Use your model to predict how long the object will be in the air.
   e) Do you think that your model is accurate? Explain.

6. The table shows the distance travelled by a car as a function of time.

Time (s)   Distance (m)
    0            0
    2            6
    4           22
    6           50
    8           90
   10          140
   12          190
   14          240
   16          290
   18          340
   20          380
   22          410
   24          430
   26          440
   28          440

   a) Determine a curve of best fit to model the data.
   b) Do you think the equation for this curve of best fit is a good model for the situation? Explain your reasoning.
   c) Describe what the driver did between 0 and 28 s.

3.4 Cause and Effect
Refer to the Key Concepts on page 199.

7. Define or explain the following terms and provide an example of each one.
   a) common-cause factor
   b) reverse cause-and-effect relationship
   c) extraneous variable

8. a) Explain the relationship between experimental and control groups.
   b) Why is a control group needed in some statistical studies?

9. a) Explain the difference between an accidental relationship and a presumed relationship.
   b) Provide an example of each.

10. The price of eggs is positively correlated with wages. Explain why you cannot conclude that raising the price of eggs should produce a raise in pay.

11. An educational researcher compiles data on Internet use and scholastic achievement for a random selection of students, and observes a strong positive linear correlation. She concludes that Internet use improves student grades. Comment on the validity of this conclusion.

3.5 Critical Analysis
Refer to the Key Concepts on page 209.

12. A teacher is trying to determine whether a new spelling game enhances learning. In his gifted class, he finds a strong positive correlation between use of the game and spelling-test scores. Should the teacher recommend the use of the game in all English classes at his school? Explain your answer.

13. a) Explain what is meant by the term hidden variable.
    b) Explain how you might detect the presence of a hidden variable in a set of data.
  • 214. Chapter Test

ACHIEVEMENT CHART
Category                            Questions
Knowledge/Understanding             All
Thinking/Inquiry/Problem Solving    5, 7, 10
Communication                       1, 5, 6, 8, 10
Application                         3, 4, 7, 10

1. Explain or define each of the following terms.
   a) perfect negative linear correlation
   b) experimental research
   c) outlier
   d) extraneous variable
   e) hidden variable

2. Match the following.

Correlation Type                 Coefficient, r
a) strong negative linear          1
b) direct                          0.6
c) weak positive linear            0.3
d) moderate positive linear       −0.8
e) perfect negative linear        −1

3. The following set of data relates mean word length and recommended age level for a set of children’s books.

Recommended Age   Mean Word Length
       4                3.5
       6                5.5
       5                4.6
       6                5.0
       7                5.2
       9                6.5
       8                6.1
       5                4.9

   a) Create a scatter plot and classify the linear correlation.
   b) Determine the correlation coefficient.
   c) Determine the line of best fit.
   d) Use this model to predict the average word length in a book recommended for 12-year olds.

Use the following information in order to answer questions 4–6.
Jerome has kept track of the hours he spent studying and his marks on examinations.

Subject                   Hours Studied   Mark
Mathematics, grade 9            5          70
English, grade 9                3          65
Science, grade 9                4          68
Geography, grade 9              4          72
French, grade 9                 2          38
Mathematics, grade 10           7          74
English, grade 10               5          69
Science, grade 10               6          71
History, grade 10               5          75
Mathematics, grade 11          12          76
English, grade 11               9          74
Physics, grade 11              14          78

4. a) Create a scatter plot for Jerome’s data and classify the linear correlation.
   b) Perform a regression analysis. Identify the equation of the line of best fit as y1, and record the correlation coefficient.
   c) Identify any outliers.
   d) Repeat part b) with the outlier removed. Identify this line as y2.

5. Which of the two linear models found in question 4 gives a more optimistic prediction for Jerome’s upcoming biology examination? Explain.
  • 215. 6. a) Identify at least three extraneous variables in Jerome’s study.
   b) Suggest some ways that Jerome might improve the validity of his study.

7. A phosphorescent material can glow in the dark by absorbing energy from light and then gradually re-emitting it. The following table shows the light levels for a phosphorescent plastic.

Time (h)   Light Level (lumens)
   0             0.860
   1             0.695
   2             0.562
   3             0.455
   4             0.367
   5             0.305
   6             0.247

   a) Create a scatter plot for the data.
   b) Perform a quadratic regression. Record the equation of the curve of best fit and the coefficient of determination.
   c) Repeat part b) for an exponential regression.
   d) Compare how well these two models fit the data.
   e) According to each model, what will be the light level after 10 h?
   f) Which of these two models is superior for extrapolating beyond 6 h? Explain.

8. Explain how you could minimize the effects of extraneous variables in a correlation study.

9. Provide an example of a reverse cause-and-effect relationship.

ACHIEVEMENT CHECK
Knowledge/Understanding · Thinking/Inquiry/Problem Solving · Communication · Application

10. The table below contains data from the Ontario Road Safety Annual Report for 1999.

Age            Licensed Drivers   Number of Collisions   % of Drivers in Age Group in Collisions
16                  85 050               1 725                          2.0
17                 105 076               7 641                          7.3
18                 114 056               9 359                          8.2
19                 122 461               9 524                          7.8
20                 123 677               9 320                          7.5
21–24              519 131              36 024                          6.9
25–34            1 576 673              90 101                          5.7
35–44            1 895 323              90 813                          4.8
45–54            1 475 588              60 576                          4.1
55–64              907 235              31 660                          3.5
65–74              639 463              17 598                          2.8
75 and older       354 581               9 732                          2.7
Total            7 918 314             374 073                          4.7

   a) Organize the data so that the age intervals are consistent. Create a scatter plot of the proportion of drivers involved in collisions versus age.
   b) Perform a regression analysis. Record the equations of the curves of best fit for each regression you try as well as the coefficient of determination.
   c) In Ontario, drivers over 80 must take vision and knowledge tests every two years to renew their licences. However, these drivers no longer have to take road tests as part of the review. Advocacy groups for seniors had lobbied the Ontario government for this change. How could such groups have used your data analysis to support their position?
  • 216. Statistics Project Wrap-Up

Implementing Your Action Plan

1. Look up the most recent census data from Statistics Canada. Pick a geographical region and study the data on age of all respondents by gender. Conjecture a relationship between age and the relative numbers of males and females. Use a table and a graph to organize and present the data. Does the set of data support your conjecture?

2. You may want to compare the data you analysed in step 1 to the corresponding data for other regions of Canada or for other countries. Identify any significant similarities or differences between the data sets. Suggest reasons for any differences you notice.

3. Access data on life expectancies in Canada for males and females from the 1920s to the present. Do life expectancies appear to be changing over time? Is there a correlation between these two variables? If so, use regression analysis to predict future life expectancies for males and females in Canada.

4. Access census data on life expectancies in the various regions of Canada. Select another attribute from the census data and conjecture whether there is a correlation between this variable and life expectancies. Analyse data from different regions to see if the data support your conjecture.

Suggested Resources
• Statistics Canada web sites and publications
• Embassies and consulates
• United Nations web sites and publications such as UNICEF’s CyberSchoolbus and World Health Organization reports
• Statistical software (the Fathom™ sample documents include census data for Beverly Hills, California)
• Spreadsheets
• Graphing calculators

www.mcgrawhill.ca/links/MDM12
Visit the web site above to find links to various census databases.

Evaluating Your Project
To help assess your own project, consider the following questions.

1. Are the data you selected appropriate?
2. Are your representations of the data effective?
3. Are the mathematical models that you used reliable?
4. Who would be interested in your findings? Is there a potential market for this information?
  • 217. 5. Are there questions that arose from your research that warrant further investigation? How would you go about addressing these issues in a future project?
6. If you were to do this project again, what would you do differently? Why?

Section 9.4 describes methods for evaluating your own work.

Presentation
Present the findings of your investigation in one or more of the following forms:
• written report
• oral presentation
• computer presentation (using software such as Corel® Presentations™ or Microsoft® PowerPoint®)
• web page
• display board

Remember to include a bibliography. See section 9.5 and Appendix D for information on how to prepare a presentation.

Preparing for the Culminating Project

Applying Project Skills
Throughout this statistics project, you have developed skills in statistical research and analysis that may be helpful in preparing your culminating project:
• making a conjecture or hypothesis
• using technology to access, organize, and analyse data
• applying a variety of statistical tools
• comparing two sets of data
• presenting your findings

Keeping on Track
At this point, you should have a good idea of the basic nature of your culminating project. You should have identified the issue that will be the focus of your project and begun to gather relevant data. Section 9.2 provides suggestions to help you clearly define your task. Your next steps are to develop and implement an action plan.

Make sure there are enough data to support your work. Decide on the best way to organize and present the data. Then, determine what analysis you need to do. As you begin to work with the data, you may find that they are not suitable or that further research is necessary. Your analysis may lead to a new approach or topic that you would like to pursue. You may find it necessary to refine or alter the focus of your project. Such changes are a normal part of the development and implementation process.
[Flow chart: Define the Problem → Define Your Task → Develop an Action Plan → Implement Your Action Plan → Evaluate Your Investigation and Its Results → Prepare a Written Report → Present Your Investigation and Its Results → Constructively Critique the Presentations of Others, with a Refine/Redefine step]
  • 218. Cumulative Review: Chapters 1 to 3

1. Let $A = \begin{bmatrix} 7 & 3 \\ 0 & -2 \\ 4 & -5 \end{bmatrix}$, $B = \begin{bmatrix} 8 & -5 \\ 4 & 1 \end{bmatrix}$, and $C = \begin{bmatrix} -8 & 0 \\ 5 & 6 \\ 9 & -3 \end{bmatrix}$. Calculate, if possible,
   a) $-2(A + C)$
   b) $AC$
   c) $(BA)^t$
   d) $B^2$
   e) $C^2$
   f) $B^{-1}$

2. a) Describe the iterative process used to generate the table below.
   b) Continue the process until all the cells are filled.

   17   16   15   14   13
   18    5    4    3   12
   __    6    1    2   11
   __    7    8    9   10

3. Which of the following would you consider to be databases? Explain your reasoning.
   a) a novel
   b) school attendance records
   c) the home page of a web site
   d) an advertising flyer from a department store

4. What sampling techniques are most likely to be used for the following surveys? Explain each of your choices.
   a) a radio call-in show
   b) a political poll
   c) a scientific study

5. Classify the type of linear correlation that you would expect for each pair of variables.
   a) air temperature, altitude
   b) income, athletic ability
   c) people’s ages from 1 to 20 years, their masses
   d) people’s ages from 21 to 40 years, their masses

6. Identify the most likely causal relationship between each of the following pairs of variables.
   a) grade point average, starting salary upon graduation
   b) grade in chemistry, grade in physics
   c) sales of symphony tickets, carrot harvest
   d) monthly rainfall, monthly umbrella sales

7. a) Sketch a map that can be coloured using only three colours.
   b) Reconfigure your map as a network.

8. State whether each of the following networks is
   i) connected   ii) traceable   iii) planar
   Provide evidence for your decisions.
   [Network diagrams: a) vertices A, B, C, D; b) vertices P, Q, R, S, T]

9. Use a tree diagram to represent the administrative structure of a school that has a principal, vice-principals, department heads, assistant heads, and teachers.
  • 219. 10. A renowned jazz pianist living in Toronto often goes on tours in the United States. For the tour shown below, which city has the most routes
   a) with exactly one stopover?
   b) with no more than two stopovers?
   [Route network: Toronto, Buffalo, Detroit, New York, Cleveland, Chicago, Philadelphia, Pittsburgh, Washington]

11. The following are responses to a survey that asked: “On average, how many hours per week do you read for pleasure?”

   1  3  0  0  7  2  0  1 10  5  2  2  2  0  1  4  0  8  3  1  3
   0  0  2 15  4  9  1  6  7  0  3  3 14  5  7  0  1  1  0 10  0

   Use a spreadsheet to
   a) sort the data from smallest to largest value
   b) determine the mean hours of pleasure reading
   c) organize the data into a frequency table with appropriate intervals
   d) make a histogram of the information in part c)

12. The annual incomes of 40 families surveyed at random are shown in the table.

   Income ($000)
   28.5   38     61    109     42     56     19
   27     44.5   81     36     39     51     40.5
   67     28     60     87     58    120    111
   73     65     34     54     16.5  135     70.5
   59     47     92     38     55     84.5  107
   71     59     26.5   76     50

   a) Group these data into 8 to 12 intervals and create a frequency table.
   b) Create a histogram and a cumulative-frequency diagram for the data.
   c) What proportion of the families surveyed earn an annual income of $60 000 or less?

13. Classify the bias in each of the following situations. Explain your reasoning in each case.
   a) At a financial planning seminar, the audience were asked to raise their hands if they had ever considered declaring bankruptcy.
   b) A supervisor asked an employee if he would mind working late for a couple of hours on Friday evening.
   c) A survey asked neighbourhood dog-owners if dogs should be allowed to run free in the local park.
   d) An irascible talk show host listed the mayor’s blunders over the last year and invited listeners to call in and express their opinions on whether the mayor should resign.

14. The scores in a recent bowling tournament are shown in the following table.

   150  260  213  192  176  204  138  214  298  188
   168  195  225  170  260  254  195  177  149  224
   260  222  167  182  207  221  185  163  112  189

   a) Calculate the mean, median, and mode for this distribution. Which measure would be the most useful? Which would be the least useful? Explain your choices.
   b) Determine the standard deviation, first quartile, third quartile, and interquartile range.
   c) Explain what each of the quantities in part b) tells you about the distribution of scores.
   d) What score is the 50th percentile for this distribution?
  • 220.
e) Is the player who scored 222 above the 80th percentile? Explain why or why not.

15. The players on a school baseball team compared their batting averages and the hours they spent at the batting practice.
Batting Average   Practice Hours
0.220             20
0.215             18
0.185             15
0.170             14
0.200             18
0.245             22
0.230             19
0.165             15
0.205             17
a) Identify the independent variable and dependent variable. Explain your choices.
b) Produce a scatter plot for the data and classify the linear correlation.
c) Determine the correlation coefficient and the equation of the line of best fit.
d) Use this linear model to predict the batting average for players who had batting practice for
i) 16 h ii) 13 h iii) 35 h
e) Discuss how accurate you think each of these predictions will be.

16. Describe a method you could use to detect outliers in a sample.

17. A bright, young car salesperson has made the following gross sales with her first employer.
Year   Gross Sales ($ millions)
1997   0.8
1998   1.1
1999   1.6
2000   2.3
2001   3.5
2002   4.7
a) Create a time-series graph for these data.
b) Based on this graph, what level of sales would you predict for 2003?
c) List three factors that could affect the accuracy of your prediction.
d) Compute an index value for the sales each year using the 1997 sales as a base. What information do the index values provide?
e) Suppose that this salesperson is thinking of changing jobs. Outline how she could use the sales index to convince other employers to hire her.

18. The following time-series graph shows the Consumer Price Index (CPI) for the period 1971 to 2001.
[Time-series graph: Consumer Price Index (CPI) (1992 = 100), vertical axis marked at 50, 100, and 150, horizontal axis Year from 1975 to 2000]
a) What is the base for this index? When did the CPI equal half of this base value?
b) Approximately how many times did the average price of goods double from 1971 to 1992?
c) Which decade on this graph had the highest rate of inflation? Explain your answer.
d) Estimate the overall rate of inflation for the period from 1971 to 2001.
  • 221.
Probability Project
Designing a Game

Background
Many games introduce elements of chance with random processes. For example, card games use shuffled cards, board games often use dice, and bingo uses randomly selected numbers.

Your Task
Design and then analyse a game for two or more players, involving some form of random process. One of the players may assume the role of dealer or game master.

Developing an Action Plan
You will need to decide on one or more instruments of chance, such as dice, cards, coins, coloured balls, a random-number generator, a spinner, or a nail maze. Recommend a method of tracking progress or keeping score, such as a game board or tally sheet. Create the rules of the game. Submit a proposal to your teacher outlining the concept and purpose of your game.
  • 222.
CHAPTER 4
Permutations and Organized Counting

Specific Expectations                                                           Section

Represent complex tasks or issues, using diagrams.                                4.1

Solve introductory counting problems involving the additive and              4.1, 4.2, 4.3
multiplicative counting principles.

Express the answers to permutation and combination problems, using             4.2, 4.3
standard combinatorial symbols.

Evaluate expressions involving factorial notation, using appropriate           4.2, 4.3
methods.

Solve problems, using techniques for counting permutations where some            4.3
objects may be alike.

Identify patterns in Pascal’s triangle and relate the terms of Pascal’s        4.4, 4.5
triangle to values of $\binom{n}{r}$, to the expansion of a binomial, and
to the solution of related problems.

Communicate clearly, coherently, and precisely the solutions to counting   4.1, 4.2, 4.3,
problems.                                                                     4.4, 4.5
  • 223.
Chapter Problem
Students’ Council Elections

Most high schools in Ontario have a students’ council comprised of students from each grade. These students are elected representatives, and a part of their function is to act as a liaison between the staff and the students. Often, these students are instrumental in fundraising and in coordinating events, such as school dances and sports.

A students’ council executive could consist of a president, vice-president, secretary, treasurer, social convenor, fundraising chair, and four grade representatives. Suppose ten students have been nominated to fill these positions. Five of the nominees are from grade 12, three are from grade 11, and the other two are a grade 9 and a grade 10 student.

1. In how many ways could the positions of president and vice-president be filled by these ten students if all ten are eligible for these positions? How many ways are there if only the grades 11 and 12 students are eligible?
2. The grade representatives must represent their current grade level. In how many ways could the grade representative positions be filled?

You could answer both of these questions by systematically listing all the possibilities and then counting them. In this chapter, you will learn easier and more powerful techniques that can also be applied to much more complex situations.
  • 224. Review of Prerequisite Skills If you need help with any of the skills listed in purple below, refer to Appendix A. 1. Tree diagrams Draw a tree diagram to c) 3 by 5 grid? d) 4 by 5 grid? illustrate the number of ways a quarter, a dime, and a nickel can come up heads or tails if you toss one after the other. 2. Tree diagrams a) Draw a tree diagram to illustrate the possible outcomes of tossing a coin and 5. Evaluating expressions Evaluate each rolling a six-sided die. expression given x = 5, y = 4, and z = 3. b) How many possible outcomes are there? 8y(x + 2)( y + 2)(z + 2) a) ᎏᎏᎏ 3. Number patterns The manager of a grocery (x − 3)( y + 3)(z + 2) store asks a stock clerk to arrange a display (x − 2)3( y + 2)2(z + 1)2 b) ᎏᎏᎏ of canned vegetables in a triangular pyramid 2 y(x + 1)( y − 1) like the one shown. Assume all cans are the (x + 4)( y − 2)(z + 3) (x − 1)2(z + 1)y same size and shape. c) ᎏᎏᎏ + ᎏᎏ ( y − 1)(x − 3)z 4 (x − 3) ( y + 4) 6. Order of operations Evaluate. a) 5(4) + (–1)3(3)2 (10 − 2)2(10 − 3)2 b) ᎏᎏᎏ (10 − 2) − (10 − 3) 2 2 a) How many cans is the tallest complete 6(6 − 1)(6 − 2)(6 − 3)(6 − 4)(6 − 5) pyramid that the clerk can make with c) ᎏᎏᎏᎏ 3(3 − 1)(3 − 2) 100 cans of vegetables? 50(50 − 1)(50 − 2)…(50 − 49) b) How many cans make up the base level d) ᎏᎏᎏᎏ 48(48 − 1)(48 − 2)…(48 − 47) of the pyramid in part a)? 12 × 11 × 10 × 9 10 × 9 × 8 × 7 c) How many cans are in the full pyramid e) ᎏᎏ + ᎏᎏ 2 4 6 2 in part a)? 8×7×6×5 d) What is the sequence of the numbers of − ᎏᎏ 42 cans in the levels of the pyramid? 7. Simplifying expressions Simplify. 4. Number patterns What is the greatest x2 − xy + 2x (4x + 8)2 possible number of rectangles that can a) ᎏᎏ b) ᎏ 2x 16 be drawn on a 14(3x2 + 6) a) 1 by 5 grid? b) 2 by 5 grid? c) ᎏᎏ 7×6 x(x − 1)(x − 2)(x − 3) d) ᎏᎏᎏ 2 x − 2x 2y + 1 16y + 4 e) ᎏ + ᎏ x 4x 224 MHR • Permutations and Organized Counting
  • 225.
4.1 Organized Counting

The techniques and mathematical logic for counting possible arrangements or outcomes are useful for a wide variety of applications. A computer programmer writing software for a game or industrial process would use such techniques, as would a coach planning a starting line-up, a conference manager arranging a schedule of seminars, or a school board trying to make the most efficient use of its buses.

Combinatorics is the branch of mathematics dealing with ideas and methods for counting, especially in complex situations. These techniques are also valuable for probability calculations, as you will learn in Chapter 6.

INVESTIGATE & INQUIRE: Licence Plates
Until 1997, most licence plates for passenger cars in Ontario had three numbers followed by three letters. Suppose the provincial government had wanted all the vehicles registered in Ontario to have plates with the letters O, N, and T.
1. Draw a diagram to illustrate all the possibilities for arranging these three letters assuming that the letters can be repeated. How many possibilities are there?
2. How could you calculate the number of possible three-letter groups without listing them all?
3. Predict how many three-letter groups the letters O, N, T, and G can form.
4. How many three-letter groups do you think there would be if you had a choice of five letters?
5. Suggest a general strategy for counting all the different possibilities in situations like those above.
  • 226.
When you have to make a series of choices, you can usually determine the total number of possibilities without actually counting each one individually.

Example 1 Travel Itineraries
Martin lives in Kingston and is planning a trip to Vienna, Austria. He checks a web site offering inexpensive airfares and finds that if he travels through London, England, the fare is much lower. There are three flights available from Toronto to London and two flights from London to Vienna. If Martin can take a bus, plane, or train from Kingston to Toronto, how many ways can he travel from Kingston to Vienna?

Solution
You can use a tree diagram to illustrate and count Martin’s choices.
[Tree diagram “Martin’s Choices”: the first branches are Bus, Train, and Plane; each leads to Flight A, Flight B, or Flight C; each of those leads to Flight 1 or Flight 2]
This diagram suggests another way to determine the number of options Martin has for his trip.
Choices for the first portion of trip: 3
Choices for the second portion of trip: 3
Choices for the third portion of trip: 2
Total number of choices: 3 × 3 × 2 = 18
In all, Martin has 18 ways to travel from Kingston to Vienna.

Example 2 Stereo Systems
Javon is looking at stereos in an electronics store. The store has five types of receivers, four types of CD players, and five types of speakers. How many different choices of stereo systems does this store offer?

Solution
For each choice of receiver, Javon could choose any one of the CD players. Thus, there are 5 × 4 = 20 possible combinations of receivers and CD players. For each of these combinations, Javon could then choose one of the five kinds of speakers. The store offers a total of 5 × 4 × 5 = 100 different stereo systems.
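The tree diagram in Example 1 can also be generated by a short program. The following Python sketch is one possible way to do this; the list names and labels are illustrative assumptions, not part of the original example. Each itinerary is one path through the tree.

```python
from itertools import product

# Illustrative labels for Martin's options in Example 1 (names are assumptions).
ground_legs = ["bus", "plane", "train"]                 # Kingston to Toronto
first_flights = ["Flight A", "Flight B", "Flight C"]    # Toronto to London
second_flights = ["Flight 1", "Flight 2"]               # London to Vienna

# One itinerary = one choice from each stage, like one path through the tree diagram.
itineraries = list(product(ground_legs, first_flights, second_flights))

for itinerary in itineraries:
    print(" -> ".join(itinerary))

# The number of paths equals 3 x 3 x 2.
print(len(itineraries))
```

Listing every path is practical only for small problems; multiplying the number of choices at each stage gives the same total without generating the list.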
  • 227.
These types of counting problems illustrate the fundamental or multiplicative counting principle:

If a task or process is made up of stages with separate choices, the total number of choices is m × n × p × …, where m is the number of choices for the first stage, n is the number of choices for the second stage, p is the number of choices for the third stage, and so on.

Project Prep
You can use the fundamental or multiplicative counting principle to help design the game for your probability project.

Example 3 Applying the Fundamental Counting Principle
A school band often performs at benefits and other functions outside the school, so its members are looking into buying band uniforms. The band committee is considering four different white shirts, dress pants in grey, navy, or black, and black or grey vests with the school crest. How many different designs for the band uniform is the committee considering?

Solution
First stage: choices for the white shirts, m = 4
Second stage: choices for the dress pants, n = 3
Third stage: choices for the vests, p = 2
The total number of possibilities is
m × n × p = 4 × 3 × 2
= 24
The band committee is considering 24 different possible uniforms.

In some situations, an indirect method makes a calculation easier.

Example 4 Indirect Method
Leora, a triathlete, has four pairs of running shoes loose in her gym bag. In how many ways can she pull out two unmatched shoes one after the other?

Solution
You can find the number of ways of picking unmatched shoes by subtracting the number of ways of picking matching ones from the total number of ways of picking any two shoes.
There are eight possibilities when Leora pulls out the first shoe, but only seven when she pulls out the second shoe. By the fundamental counting principle, the number of ways Leora can pick any two shoes out of the bag is 8 × 7 = 56. She could pick each of the matched pairs in two ways: left shoe then right shoe or right shoe then left shoe. Thus, there are 4 × 2 = 8 ways of picking a matched pair.
Leora can pull out two unmatched shoes in 56 − 8 = 48 ways.
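The indirect method in Example 4 can be checked by listing every possible draw. The Python sketch below is only an illustration; labelling each shoe by a pair number and a side is an assumption made for the code.

```python
from itertools import permutations

# Label each shoe by (pair number, side); four pairs give eight shoes.
shoes = [(pair, side) for pair in range(1, 5) for side in ("left", "right")]

# Every ordered draw of two different shoes, one after the other.
draws = list(permutations(shoes, 2))

# A draw is matched when both shoes come from the same pair.
matched = [d for d in draws if d[0][0] == d[1][0]]
unmatched = [d for d in draws if d[0][0] != d[1][0]]

print(len(draws))      # all draws: 8 x 7
print(len(matched))    # matched pairs: 4 x 2
print(len(unmatched))  # total minus matched
```

Counting the unmatched draws directly gives the same result as subtracting the matched draws from the total, which is the point of the indirect method.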
  • 228.
Sometimes you will have to count several subsets of possibilities separately.

Example 5 Signal Flags
Sailing ships used to send messages with signal flags flown from their masts. How many different signals are possible with a set of four distinct flags if a minimum of two flags is used for each signal?

Solution
A ship could fly two, three, or four signal flags.
Signals with two flags: 4 × 3 = 12
Signals with three flags: 4 × 3 × 2 = 24
Signals with four flags: 4 × 3 × 2 × 1 = 24
Total number of signals: 12 + 24 + 24 = 60
Thus, the total number of signals possible with these flags is 60.

In Example 5, you were counting actions that could not occur at the same time. When counting such mutually exclusive actions, you can apply the additive counting principle or rule of sum:

If one mutually exclusive action can occur in m ways, a second in n ways, a third in p ways, and so on, then there are m + n + p + … ways in which one of these actions can occur.

Key Concepts
• Tree diagrams are a useful tool for organized counting.
• If you can choose from m items of one type and n items of another, there are m × n ways to choose one item of each type (fundamental or multiplicative counting principle).
• If you can choose from either m items of one type or n items of another type, then the total number of ways you can choose an item is m + n (additive counting principle).
• Both the multiplicative and the additive counting principles also apply to choices of three or more types of items.
• Sometimes an indirect method provides an easier way to solve a problem.
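The signal-flag count in Example 5 combines both principles: the multiplicative principle within each signal length and the additive principle across the lengths. A minimal Python sketch of that calculation follows; the flag names are placeholders chosen for the code.

```python
from itertools import permutations

flags = ["A", "B", "C", "D"]  # four distinct flags (placeholder names)

# Count the ordered signals for each allowed number of flags.
signals_by_length = {
    k: len(list(permutations(flags, k))) for k in (2, 3, 4)
}

# The three cases are mutually exclusive, so their counts are added.
total_signals = sum(signals_by_length.values())

print(signals_by_length)
print(total_signals)
```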
  • 229.
Communicate Your Understanding
1. Explain the fundamental counting principle in your own words and give an example of how you could apply it.
2. Are there situations where the fundamental counting principle does not apply? If so, give one example.
3. Can you always use a tree diagram for organized counting? Explain your reasoning.

Practise
A
1. Construct a tree diagram to illustrate the possible contents of a sandwich made from white or brown bread, ham, chicken, or beef, and mustard or mayonnaise. How many different sandwiches are possible?
2. In how many ways can you roll either a sum of 4 or a sum of 11 with a pair of dice?
3. In how many ways can you draw a 6 or a face card from a deck of 52 playing cards?
4. How many ways are there to draw a 10 or a queen from the 24 cards in a euchre deck, which has four 10s and four queens?
5. Use tree diagrams to answer the following:
a) How many different soccer uniforms are possible if there is a choice of two types of shirts, three types of shorts, and two types of socks?
b) How many different three-scoop cones can be made from vanilla, chocolate, and strawberry ice cream?
c) Suppose that a college program has six elective courses, three on English literature and three on the other arts. If the college requires students to take one of the English courses and one of the other arts courses, how many pairs of courses will satisfy these requirements?

Apply, Solve, Communicate
6. Ten different books and four different pens are sitting on a table. One of each is selected. Should you use the rule of sum or the product rule to count the number of possible selections? Explain your reasoning.
B
7. Application A grade 9 student may build a timetable by selecting one course for each period, with no duplication of courses. Period 1 must be science, geography, or physical education. Period 2 must be art, music, French, or business. Periods 3 and 4 must each be mathematics or English.
a) Construct a tree diagram to illustrate the choices for a student’s timetable.
b) How many different timetables could a student choose?
8. A standard die is rolled five times. How many different outcomes are possible?
9. A car manufacturer offers three kinds of upholstery material in five different colours for this year’s model. How many upholstery options would a buyer have? Explain your reasoning.
10. Communication In how many ways can a student answer a true-false test that has six questions? Explain your reasoning.
  • 230.
11. The final score of a soccer game is 6 to 3. How many different scores were possible at half-time?
12. A large room has a bank of five windows. Each window is either open or closed. How many different arrangements of open and closed windows are there?
13. Application A Canadian postal code uses six characters. The first, third, and fifth are letters, while the second, fourth, and sixth are digits. A U.S.A. zip code contains five characters, all digits.
a) How many codes are possible for each country?
b) How many more possible codes does the one country have than the other?
14. When three-digit area codes were introduced in 1947, the first digit had to be a number from 2 to 9 and the middle digit had to be either 1 or 0. How many area codes were possible under this system?
15. Asha builds new homes and offers her customers a choice of brick, aluminium siding, or wood for the exterior, cedar or asphalt shingles for the roof, and radiators or forced-air for the heating system. How many different configurations is Asha offering?
16. a) In how many ways could you choose two fives, one after the other, from a deck of cards?
b) In how many ways could you choose a red five and a spade, one after the other?
c) In how many ways could you choose a red five or a spade?
d) In how many ways could you choose a red five or a heart?
e) Explain which counting principles you could apply in parts a) to d).
17. Chapter Problem Ten students have been nominated for a students’ council executive. Five of the nominees are from grade 12, three are from grade 11, and the other two are from grades 9 and 10.
a) In how many ways could the nominees fill the positions of president and vice-president if all ten are eligible for these senior positions?
b) How many ways are there to fill these positions if only grade 11 and grade 12 students are eligible?
18. Communication
a) How many different licence plates could be made using three numbers followed by three letters?
b) In 1997, Ontario began issuing licence plates with four letters followed by three numbers. How many different plates are possible with this new system?
c) Research the licence plate formats used in the other provinces. Compare and contrast these formats briefly and suggest reasons for any differences between the formats.
19. In how many ways can you arrange the letters of the word think so that the t and the h are separated by at least one other letter?
20. Application Before the invention of the telephone, Samuel Morse (1791−1872) developed an efficient system for sending messages as a series of dots and dashes (short or long pulses). International code, a modified version of Morse code, is still widely used.
a) How many different characters can the international code represent with one to four pulses?
b) How many pulses would be necessary to represent the 72 letters of the Cambodian alphabet using a system like Morse code?
  • 231.
ACHIEVEMENT CHECK
Knowledge/Understanding   Thinking/Inquiry/Problem Solving   Communication   Application
21. Ten finalists are competing in a race at the Canada Games.
a) In how many different orders can the competitors finish the race?
b) How many ways could the gold, silver, and bronze medals be awarded?
c) One of the finalists is a friend from your home town. How many of the possible finishes would include your friend winning a medal?
d) How many possible finishes would leave your friend out of the medal standings?
e) Suppose one of the competitors is injured and cannot finish the race. How does that affect your previous answers?
f) How would the competitor’s injury affect your friend’s chances of winning a medal? Explain your reasoning. What assumptions have you made?

C
22. A locksmith has ten types of blanks for keys. Each blank has five different cutting positions and three different cutting depths at each position, except the first position, which only has two depths. How many different keys are possible with these blanks?
23. Communication How many 5-digit numbers are there that include the digit 5 and exclude the digit 8? Explain your solution.
24. Inquiry/Problem Solving Your school is purchasing a new type of combination lock for the student lockers. These locks have 40 positions on their dials and use a three-number combination.
a) How many combinations are possible if consecutive numbers cannot be the same?
b) Are there any assumptions that you have made? Explain.
c) Assuming that the first number must be dialled clockwise from 0, how many different combinations are possible?
d) Suppose the first number can also be dialled counterclockwise from 0. Explain the effect this change has on the number of possible combinations.
e) If you need four numbers to open the lock, how many different combinations are possible?
25. Inquiry/Problem Solving In chess, a knight can move either two squares horizontally plus one vertically or two squares vertically plus one horizontally.
a) If a knight starts from one corner of a standard 8 × 8 chessboard, how many different squares could it reach after
i) one move?
ii) two moves?
iii) three moves?
b) Could you use the fundamental counting principle to calculate the answers for part a)? Why or why not?
  • 232.
4.2 Factorials and Permutations

In many situations, you need to determine the number of different orders in which you can choose or place a set of items.

INVESTIGATE & INQUIRE: Numbers of Arrangements
Consider how many different ways a president and a vice-president could be chosen from eight members of a students’ council.
1. a) Have one person in your class make two signs, writing President on one and Vice-President on the other. Now, choose two people to stand at the front of the class. Using the signs to indicate which person holds each position, decide in how many ways you can choose a president and a vice-president from the two people at the front of the class.
b) Choose three students to be at the front of the class. Again using the signs to indicate who holds each position, determine how many ways you can choose a president and a vice-president from the three people at the front of the class.
c) Repeat the process with four students. Do you see a pattern in the number of ways a president and a vice-president can be chosen from the different sizes of groups? If so, what is the pattern? If not, continue the process with five students and then with six students.
d) When you see a pattern, predict how many ways a president and a vice-president can be chosen from the eight members of the students’ council.
e) Suggest other ways of simulating the selection of a president and a vice-president for the students’ council.
  • 233.
2. Suppose that each of the eight members of the students’ council has to give a brief speech at an assembly. Consider how you could determine the number of different orders in which they could speak.
a) Choose two students from your class and list all the possible orders in which they could speak.
b) Choose three students and list all the possible orders in which they could speak.
c) Repeat this process with four students.
d) Is there an easy method to organize the list so that you could include all the possibilities?
e) Is this method related to your results in question 1? Explain.
f) Can you use your method to predict the number of different orders in which eight students could give speeches?

Many counting and probability calculations involve the product of a series of consecutive integers. You can use factorial notation to write such expressions more easily. For any natural number n,

n! = n × (n − 1) × (n − 2) × (n − 3) × … × 3 × 2 × 1

This expression is read as n factorial.

Example 1 Evaluating Factorials
Calculate each factorial.
a) 2! b) 4! c) 8!

Solution
a) 2! = 2 × 1
= 2
b) 4! = 4 × 3 × 2 × 1
= 24
c) 8! = 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1
= 40 320

As you can see from Example 1, n! increases dramatically as n becomes larger. However, calculators and computer software provide an easy means of calculating the larger factorials. Most scientific and graphing calculators have a factorial key or function.
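The definition of n! translates directly into a loop that multiplies the consecutive integers from n down to 1. The Python sketch below is one way to write it; the function name is an assumption, and the function deliberately accepts only natural numbers, as in the definition above.

```python
import math

def factorial(n):
    """Return n! for a natural number n by multiplying n, n - 1, ..., 2, 1."""
    if n < 1:
        raise ValueError("this sketch handles natural numbers only, as in the definition")
    result = 1
    for k in range(n, 0, -1):  # k runs through n, n - 1, ..., 1
        result *= k
    return result

# The three factorials from Example 1.
for n in (2, 4, 8):
    print(n, factorial(n))

# Python's standard library provides the same calculation as math.factorial.
print(factorial(8) == math.factorial(8))
```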
  • 234.
Example 2 Using Technology to Evaluate Factorials
Calculate.
a) 21! b) 53! c) 70!

Solution 1 Using a Graphing Calculator
Enter the number on the home screen and then use the ! function on the MATH PRB menu to calculate the factorial.
a) 21! = 21 × 20 × 19 × 18 × … × 2 × 1
= $5.1091 \times 10^{19}$
b) 53! = 53 × 52 × 51 × … × 3 × 2 × 1
= $4.2749 \times 10^{69}$
c) Entering 70! on a graphing calculator gives an ERR:OVERFLOW message since $70! > 10^{100}$, which is the largest number the calculator can handle. In fact, 69! is the largest factorial you can calculate directly on TI-83 series calculators.

Solution 2 Using a Spreadsheet
Both Corel® Quattro® Pro and Microsoft® Excel have a built-in factorial function with the syntax FACT(n).
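A programming language with arbitrary-precision integers is another tool for Example 2. The Python sketch below is an illustration only; the helper name `sci` is an assumption. It computes each factorial exactly and then rounds it to scientific notation for comparison with the calculator results.

```python
import math

def sci(n):
    """Return n! rounded to five significant digits in scientific notation."""
    exact = math.factorial(n)       # exact integer, so there is no overflow
    return f"{float(exact):.4e}"    # converted to a float only for display

for n in (21, 53, 70):
    print(f"{n}! is approximately {sci(n)}")

# 70! exceeds 10^100, the limit mentioned for the graphing calculator.
print(math.factorial(70) > 10**100)
```

Because Python integers are not limited in size, 70! causes no overflow here, although the conversion to a float for display would fail for much larger factorials.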
  • 235.
Example 3 Evaluating Factorial Expressions
Evaluate.
a) $\frac{10!}{5!}$ b) $\frac{83!}{79!}$

Solution
In both these expressions, you can divide out the common terms in the numerator and denominator.
a) $\frac{10!}{5!} = \frac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{5 \times 4 \times 3 \times 2 \times 1}$
$= 10 \times 9 \times 8 \times 7 \times 6$
$= 30\,240$
b) $\frac{83!}{79!} = \frac{83 \times 82 \times 81 \times 80 \times 79 \times 78 \times \cdots \times 2 \times 1}{79 \times 78 \times \cdots \times 2 \times 1}$
$= 83 \times 82 \times 81 \times 80$
$= 44\,102\,880$
Note that by dividing out the common terms, you can use a calculator to evaluate this expression even though the factorials are too large for the calculator.

Example 4 Counting Possibilities
The senior choir has rehearsed five songs for an upcoming assembly. In how many different orders can the choir perform the songs?

Solution
There are five ways to choose the first song, four ways to choose the second, three ways to choose the third, two ways to choose the fourth, and only one way to choose the final song. Using the fundamental counting principle, the total number of different ways is
5 × 4 × 3 × 2 × 1 = 5!
= 120
The choir can sing the five songs in 120 different orders.

Example 5 Indirect Method
In how many ways could ten questions on a test be arranged, if the easiest question and the most difficult question
a) are side-by-side?
b) are not side-by-side?
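The idea of dividing out common terms in Example 3 can be expressed in code by multiplying only the factors that remain after the division. In the Python sketch below, `falling_product` is a name chosen for illustration.

```python
import math

def falling_product(n, k):
    """Multiply the k largest factors of n!: n x (n - 1) x ... x (n - k + 1)."""
    return math.prod(range(n - k + 1, n + 1))

# 10!/5! keeps the five factors 10, 9, 8, 7, 6.
print(falling_product(10, 5))

# 83!/79! keeps the four factors 83, 82, 81, 80.
print(falling_product(83, 4))

# The shortcut agrees with dividing the full factorials.
print(falling_product(83, 4) == math.factorial(83) // math.factorial(79))
```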
  • 236.
Solution
a) Treat the easiest question and the most difficult question as a unit making nine items that are to be arranged. The two questions can be arranged in 2! ways within their unit.
9! × 2! = 725 760
The questions can be arranged in 725 760 ways if the easiest question and the most difficult question are side-by-side.
b) Use the indirect method. The number of arrangements with the easiest and most difficult questions separated is equal to the total number of possible arrangements less the number with the two questions side-by-side:
10! − 9! × 2! = 3 628 800 − 725 760
= 2 903 040
The questions can be arranged in 2 903 040 ways if the easiest question and the most difficult question are not side-by-side.

A permutation of n distinct items is an arrangement of all the items in a definite order. The total number of such permutations is denoted by ${}_nP_n$ or $P(n, n)$.

There are n possible ways of choosing the first item, n − 1 ways of choosing the second, n − 2 ways of choosing the third, and so on. Applying the fundamental counting principle as in Example 4 gives

${}_nP_n = n \times (n - 1) \times (n - 2) \times (n - 3) \times \cdots \times 3 \times 2 \times 1$
$= n!$

Example 6 Applying the Permutation Formula
In how many different orders can eight nominees for the students’ council give their speeches at an assembly?

Solution
${}_8P_8 = 8!$
$= 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1$
$= 40\,320$
There are 40 320 different orders in which the eight nominees can give their speeches.

Example 7 Student Government
In how many ways could a president and a vice-president be chosen from a group of eight nominees?

Solution
Using the fundamental counting principle, there are 8 × 7, or 56, ways to choose a president and a vice-president.
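The reasoning in Example 5, treating the two questions as a single unit, can be tested by brute force on smaller tests before trusting it for ten questions. The Python sketch below is a check under that assumption, not part of the original solution; the function names are chosen for illustration, and questions 0 and 1 stand in for the easiest and most difficult questions.

```python
from itertools import permutations
from math import factorial

def count_adjacent(n):
    """Brute force: orders of n questions with questions 0 and 1 side-by-side."""
    count = 0
    for order in permutations(range(n)):
        if abs(order.index(0) - order.index(1)) == 1:
            count += 1
    return count

def adjacent_formula(n):
    """The 'treat the pair as a unit' count: (n - 1)! x 2!."""
    return factorial(n - 1) * factorial(2)

# Compare the formula with a direct count for small numbers of questions.
for n in range(2, 8):
    print(n, count_adjacent(n), adjacent_formula(n))

# Apply the formula to the ten questions in Example 5.
side_by_side = adjacent_formula(10)
not_side_by_side = factorial(10) - side_by_side
print(side_by_side, not_side_by_side)

# Example 7: ordered choices of president and vice-president from eight nominees.
print(len(list(permutations(range(8), 2))))
```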
  • 237.
A permutation of n distinct items taken r at a time is an arrangement of r of the n items in a definite order. Such permutations are sometimes called r-arrangements of n items. The total number of possible arrangements of r items out of a set of n is denoted by ${}_nP_r$ or $P(n, r)$.

There are n ways of choosing the first item, n − 1 ways of choosing the second item, and so on down to n − r + 1 ways of choosing the rth item. Using the fundamental counting principle,

${}_nP_r = n(n - 1)(n - 2) \cdots (n - r + 1)$

It is often more convenient to rewrite this expression in terms of factorials.

${}_nP_r = \frac{n!}{(n - r)!}$

The denominator divides out completely, as in Example 3, so these two ways of writing ${}_nP_r$ are equivalent.

Project Prep
The permutations formula could be a useful tool for your probability project.

Example 8 Applying the Permutation Formula
In a card game, each player is dealt a face down “reserve” of 13 cards that can be turned up and used one by one during the game. How many different sequences of reserve cards could a player have?

Solution 1 Using Pencil and Paper
Here, you are taking 13 cards from a deck of 52.
${}_{52}P_{13} = \frac{52!}{(52 - 13)!}$
$= \frac{52!}{39!}$
$= 52 \times 51 \times 50 \times \cdots \times 41 \times 40$
$= 3.9542 \times 10^{21}$
There are approximately $3.95 \times 10^{21}$ different sequences of reserve cards a player could have.

Solution 2 Using a Graphing Calculator
Use the nPr function on the MATH PRB menu.
There are approximately $3.95 \times 10^{21}$ different sequences of reserve cards a player could turn up during one game.
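The result of Example 8 can also be checked with a few lines of Python. This sketch is offered only as an additional check; `permutations_count` is a name chosen for illustration, while `math.perm` is the standard-library function (Python 3.8 and later) that performs the same calculation.

```python
import math

def permutations_count(n, r):
    """Number of arrangements of r items taken from n distinct items: n!/(n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

reserve_sequences = permutations_count(52, 13)

print(reserve_sequences)                          # exact integer value
print(f"{float(reserve_sequences):.4e}")          # rounded to scientific notation
print(reserve_sequences == math.perm(52, 13))     # agrees with the library function
```

Unlike a calculator display, the exact integer shows every digit of the count.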
  • 238.
Solution 3 Using a Spreadsheet
Both Corel® Quattro® Pro and Microsoft® Excel have a permutations function with the syntax PERMUT(n,r).
There are approximately $3.95 \times 10^{21}$ different sequences of reserve cards a player could turn up during one game.

Key Concepts
• A factorial indicates the multiplication of consecutive natural numbers. $n! = n(n - 1)(n - 2) \times \cdots \times 1$.
• The number of permutations of n distinct items chosen n at a time in a definite order is ${}_nP_n = n!$
• The number of permutations of r items taken from n distinct items is ${}_nP_r = \frac{n!}{(n - r)!}$.

Communicate Your Understanding
1. Explain why it is convenient to write the expression for the number of possible permutations in terms of factorials.
2. a) Is (−3)! possible? Explain your answer.
b) In how many ways can you order an empty list, or zero items? What does this tell you about the value of 0!? Check your answer using a calculator.
  • 239.
Practise
A
1. Express in factorial notation.
a) 6 × 5 × 4 × 3 × 2 × 1
b) 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1
c) 3 × 2 × 1
d) 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1
2. Evaluate.
a) $\frac{7!}{4!}$ b) $\frac{11!}{9!}$
c) $\frac{8!}{5!\,2!}$ d) $\frac{15!}{3!\,8!}$
e) $\frac{85!}{82!}$ f) $\frac{14!}{4!\,5!}$
3. Express in the form ${}_nP_r$.
a) 6 × 5 × 4
b) 9 × 8 × 7 × 6
c) 20 × 19 × 18 × 17
d) 101 × 100 × 99 × 98 × 97
e) 76 × 75 × 74 × 73 × 72 × 71 × 70
4. Evaluate without using technology.
a) $P(10, 4)$ b) $P(16, 4)$ c) ${}_5P_2$
d) ${}_9P_4$ e) 7!
5. Use either a spreadsheet or a graphing or scientific calculator to verify your answers to question 4.

Apply, Solve, Communicate
6. a) How many ways can you arrange the letters in the word factor?
b) How many ways can Ismail arrange four different textbooks on the shelf in his locker?
c) How many ways can Laura colour 4 adjacent regions on a map if she has a set of 12 coloured pencils?
B
7. Simplify each of the following in factorial form. Do not evaluate.
a) 12 × 11 × 10 × 9!
b) 72 × 7!
c) (n + 4)(n + 5)(n + 3)!
8. Communication Explain how a factorial is an iterative process.
9. Seven children are to line up for a photograph.
a) How many different arrangements are possible?
b) How many arrangements are possible if Brenda is in the middle?
c) How many arrangements are possible if Ahmed is on the far left and Yen is on the far right?
d) How many arrangements are possible if Hanh and Brian must be together?
10. A 12-volume encyclopedia is to be placed on a shelf. How many incorrect arrangements are there?
11. In how many ways can the 12 members of a volleyball team line up, if the captain and assistant captain must remain together?
12. Ten people are to be seated at a rectangular table for dinner. Tanya will sit at the head of the table. Henry must not sit beside either Wilson or Nancy. In how many ways can the people be seated for dinner?
13. Application Joanne prefers classical and pop music. If her friend Charlene has five classical CDs, four country and western CDs, and seven pop CDs, in how many orders can Joanne and Charlene play the CDs Joanne likes?
14. In how many ways can the valedictorian, class poet, and presenter of the class gift be chosen from a class of 20 students?
  • 240. 15. Application If you have a standard deck of 52 cards, in how many different ways can you deal out
   a) 5 cards?   b) 10 cards?   c) 5 red cards?   d) 4 queens?
16. Inquiry/Problem Solving Suppose you are designing a coding system for data relayed by a satellite. To make transmission errors easier to detect, each code must have no repeated digits.
   a) If you need 60 000 different codes, how many digits long should each code be?
   b) How many ten-digit codes can you create if the first three digits must be 1, 3, or 6?
17. Arnold Schoenberg (1874−1951) pioneered serialism, a technique for composing music based on a tone row, a sequence in which each of the 12 tones in an octave is played only once. How many tone rows are possible?
18. Chapter Problem Consider the students’ council described on page 223 at the beginning of this chapter.
   a) In how many ways can the secretary, treasurer, social convenor, and fundraising chair be elected if all ten nominees are eligible for any of these positions?
   b) In how many ways can the council be chosen if the president and vice-president must be grade 12 students and the grade representatives must represent their current grade level?
19. Inquiry/Problem Solving A student has volunteered to photograph the school’s championship basketball team for the yearbook. In order to get the perfect picture, the student plans to photograph the ten players and their coach lined up in every possible order. Determine whether this plan is practical.

ACHIEVEMENT CHECK (Knowledge/Understanding, Thinking/Inquiry/Problem Solving, Communication, Application)
20. Wayne has a briefcase with a three-digit combination lock. He can set the combination himself, and his favourite digits are 3, 4, 5, 6, and 7. Each digit can be used at most once.
   a) How many permutations of three of these five digits are there?
   b) If you think of each permutation as a three-digit number, how many of these numbers would be odd numbers?
   c) How many of the three-digit numbers are even numbers and begin with a 4?
   d) How many of the three-digit numbers are even numbers and do not begin with a 4?
   e) Is there a connection among the four answers above? If so, state what it is and why it occurs.
C
21. TI-83 series calculators use the definition $\left(-\dfrac{1}{2}\right)! = \sqrt{\pi}$. Research the origin of this definition and explain why it is useful for mathematical calculations.
22. Communication How many different ways can six people be seated at a round table? Explain your reasoning.
23. What is the highest power of 2 that divides evenly into 100!?
24. A committee of three teachers is to select the winner from among ten students nominated for a special award. The teachers each make a list of their top three choices in order. The lists have only one name in common, and that name has a different rank on each list. In how many ways could the teachers have made their lists?
  • 241. 4.3 Permutations With Some Identical Items
Often, you will deal with permutations in which some items are identical.

INVESTIGATE & INQUIRE: What Is in a Name?
1. In their mathematics class, John and Jenn calculate the number of permutations of all the letters of their first names.
   a) How many permutations do you think John finds?
   b) List all the permutations of John’s name.
   c) How many permutations do you think Jenn finds?
   d) List all the permutations of Jenn’s name.
   e) Why do you think there are different numbers of permutations for the two names?
2. a) List all the permutations of the letters in your first name. Is the number of permutations different from what you would calculate using the ${}_nP_n = n!$ formula? If so, explain why.
   b) List and count all the permutations of a word that has two identical pairs of letters. Compare your results with those your classmates found with other words. What effect do the identical letters have on the number of different permutations?
   c) Predict how many permutations you could make with the letters in the word googol. Work with several classmates to verify your prediction by writing out and counting all of the possible permutations.
3. Suggest a general formula for the number of permutations of a word that has two or more identical letters.

As the investigation above suggests, you can develop a general formula for permutations in which some items are identical.
  • 242. Example 1 Permutations With Some Identical Elements
Compare the different permutations for the words DOLE, DOLL, and LOLL.

Solution
The following are all the permutations of DOLE:
DOLE DOEL DLOE DLEO DEOL DELO
ODLE ODEL OLDE OLED OEDL OELD
LODE LOED LDOE LDEO LEOD LEDO
EOLD EODL ELOD ELDO EDOL EDLO
There are 24 permutations of the four letters in DOLE. This number matches what you would calculate using ${}_4P_4 = 4!$.

To keep track of the permutations of the letters in the word DOLL, use a subscript to distinguish the one L from the other.
DOLL₁ DOL₁L DLOL₁ DLL₁O DL₁OL DL₁LO
ODLL₁ ODL₁L OLDL₁ OLL₁D OL₁DL OL₁LD
LODL₁ LOL₁D LDOL₁ LDL₁O LL₁OD LL₁DO
L₁OLD L₁ODL L₁LOD L₁LDO L₁DOL L₁DLO
Of the 24 arrangements listed here, only 12 are actually different from each other. Since the two Ls are in fact identical, each of the permutations shown in black is duplicated by one of the permutations shown in red. If the two Ls in a permutation trade places, the resulting permutation is the same as the original one. The two Ls can trade places in ${}_2P_2 = 2!$ ways. Thus, the number of different arrangements is
$\dfrac{4!}{2!} = \dfrac{24}{2} = 12$
In other words, to find the number of permutations, you divide the total number of arrangements by the number of ways in which you can arrange the identical letters. For the letters in DOLL, there are four ways to choose the first letter, three ways to choose the second, two ways to choose the third, and one way to choose the fourth. You then divide by the 2! or 2 ways that you can arrange the two Ls.

Similarly, you can use subscripts to distinguish the three Ls in LOLL, and then highlight the duplicate arrangements.
L₂OLL₁ L₂OL₁L L₂LOL₁ L₂LL₁O L₂L₁OL L₂L₁LO
OL₂LL₁ OL₂L₁L OLL₂L₁ OLL₁L₂ OL₁L₂L OL₁LL₂
LOL₂L₁ LOL₁L₂ LL₂OL₁ LL₂L₁O LL₁OL₂ LL₁L₂O
L₁OLL₂ L₁OL₂L L₁LOL₂ L₁LL₂O L₁L₂OL L₁L₂LO
  • 243. The arrangements shown in black are the only different ones. As with the other two words, there are 24 possible arrangements if you distinguish between the identical Ls. Here, the three identical Ls can trade places in ${}_3P_3 = 3!$ ways. Thus, the number of permutations is $\dfrac{4!}{3!} = 4$.

You can generalize the argument in Example 1 to show that the number of permutations of a set of n items of which a are identical is $\dfrac{n!}{a!}$.

Example 2 Tile Patterns
Tanisha is laying out tiles for the edge of a mosaic. How many patterns can she make if she uses four yellow tiles and one each of blue, green, red, and grey tiles?

Solution
Here, n = 8 and a = 4.
$\dfrac{8!}{4!} = 8 \times 7 \times 6 \times 5 = 1680$
Tanisha can make 1680 different patterns with the eight tiles.

Example 3 Permutation With Several Sets of Identical Elements
The word bookkeeper is unusual in that it has three consecutive double letters. How many permutations are there of the letters in bookkeeper?

Solution
If each letter were different, there would be 10! permutations, but there are two os, two ks, and three es. You must divide by 2! twice to allow for the duplication of the os and ks, and then divide by 3! to allow for the three es:
$\dfrac{10!}{2!\,2!\,3!} = \dfrac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4}{2 \times 2} = 151\,200$
There are 151 200 permutations of the letters in bookkeeper.

The number of permutations of a set of n objects containing a identical objects of one kind, b identical objects of a second kind, c identical objects of a third kind, and so on is $\dfrac{n!}{a!\,b!\,c!\,\dots}$.
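The general formula can be compared against a direct listing, in the spirit of Example 1. The following Python sketch is only an illustration under stated assumptions: the function names `count_by_formula` and `count_by_listing` are hypothetical and do not come from the textbook. The first function divides n! by the factorial of each letter's repeat count; the second lists every ordering and keeps the distinct ones, which is practical only for short words.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def count_by_formula(word: str) -> int:
    """n! divided by a! b! c! ... for each group of identical letters."""
    total = factorial(len(word))
    for repeats in Counter(word).values():
        # Each division is exact, so integer division is safe.
        total //= factorial(repeats)
    return total


def count_by_listing(word: str) -> int:
    """Brute force: generate every ordering and count the distinct ones."""
    return len(set(permutations(word)))


if __name__ == "__main__":
    for word in ("DOLE", "DOLL", "LOLL"):
        print(word, count_by_formula(word), count_by_listing(word))
    # Listing all 10! orderings is wasteful, so only the formula is used here.
    print("bookkeeper", count_by_formula("bookkeeper"))
```

For the three four-letter words, the two counts should agree with the 24, 12, and 4 arrangements found in Example 1.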
  • 244. Example 4 Applying the Formula for Several Sets of Identical Elements
Barbara is hanging a display of clothing imprinted with the school’s crest on a line on a wall in the cafeteria. She has five sweatshirts, three T-shirts, and four pairs of sweatpants. In how many ways can Barbara arrange the display?

Solution
Here, a = 5, b = 3, c = 4, and the total number of items is 12. So,
$\dfrac{n!}{a!\,b!\,c!} = \dfrac{12!}{5!\,3!\,4!} = 27\,720$
Barbara can arrange the display in 27 720 different ways.

Project Prep
The game you design for your probability project could involve permutations of identical objects.

Key Concepts
• When dealing with permutations of n items that include a identical items of one type, b identical items of another type, and so on, you can use the formula $\dfrac{n!}{a!\,b!\,c!\,\dots}$.

Communicate Your Understanding
1. Explain why there are fewer permutations of a given number of items if some of the items are identical.
2. a) Explain why the formula for the number of permutations when some items are identical has the denominator a!b!c!… instead of a × b × c… .
   b) Will there ever be cases where this denominator is larger than the numerator? Explain.
   c) Will there ever be a case where the formula does not give a whole number answer? What can you conclude about the denominator and the numerator? Explain your reasoning.
  • 245. Practise
A
1. Identify the indistinguishable items in each situation.
   a) The letters of the word mathematics are arranged.
   b) Dina has six notebooks, two green and four white.
   c) The cafeteria prepares 50 chicken sandwiches, 100 hamburgers, and 70 plates of French fries.
   d) Thomas and Richard, identical twins, are sitting with Marianna and Megan.
2. How many permutations are there of all the letters in each name?
   a) Inverary   b) Beamsville   c) Mattawa   d) Penetanguishene
3. How many different five-digit numbers can be formed using three 2s and two 5s?
4. How many different six-digit numbers are possible using the following numbers?
   a) 1, 2, 3, 4, 5, 6   b) 1, 1, 1, 2, 3, 4   c) 1, 3, 3, 4, 4, 5   d) 6, 6, 6, 6, 7, 8

Apply, Solve, Communicate
B
5. Communication A coin is tossed eight times. In how many different orders could five heads and three tails occur? Explain your reasoning.
6. Inquiry/Problem Solving How many 7-digit even numbers less than 3 000 000 can be formed using all the digits 1, 2, 2, 3, 5, 5, 6?
7. Kathryn’s soccer team played a good season, finishing with 16 wins, 3 losses, and 1 tie. In how many orders could these results have happened? Explain your reasoning.
8. a) Calculate the number of permutations for each of the jumbled words in this puzzle.
   b) Estimate how long it would take to solve this puzzle by systematically writing out the permutations.
   [Figure: word jumble puzzle. © Tribune Media Services, Inc. All Rights Reserved. Reprinted with Permission.]

www.mcgrawhill.ca/links/MDM12
For more word jumbles and other puzzles, visit the above web site and follow the links. Find or generate two puzzles for a classmate to solve.

9. Application Roberta is a pilot for a small airline. If she flies to Sudbury three times, Timmins twice, and Thunder Bay five times before returning home, how many different itineraries could she follow? Explain your reasoning.
10. After their training run, six members of a track team split a bag of assorted doughnuts. How many ways can the team share the doughnuts if the bag contains
   a) six different doughnuts?
   b) three each of two varieties?
   c) two each of three varieties?
  • 246. 11. As a project for the photography class, Haseeb wants to create a linear collage of photos of his friends. He creates a template with 20 spaces in a row. If Haseeb has 5 identical photos of each of 4 friends, in how many ways can he make his collage?
12. Communication A used car lot has four green flags, three red flags, and two blue flags in a bin. In how many ways can the owner arrange these flags on a wire stretched across the lot? Explain your reasoning.
13. Application Malik wants to skateboard over to visit his friend Gord who lives six blocks away. Gord’s house is two blocks west and four blocks north of Malik’s house. Each time Malik goes over, he likes to take a different route. How many different routes are there for Malik if he only travels west or north?

ACHIEVEMENT CHECK (Knowledge/Understanding, Thinking/Inquiry/Problem Solving, Communication, Application)
14. Fran is working on a word puzzle and is looking for four-letter “scrambles” from the clue word calculate.
   a) How many of the possible four-letter scrambles contain four different letters?
   b) How many contain two as and one other pair of identical letters?
   c) How many scrambles consist of any two pairs of identical letters?
   d) What possibilities have you not yet taken into account? Find the number of scrambles for each of these cases.
   e) What is the total number of four-letter scrambles taking all cases into account?

15. Chapter Problem Ten students have been nominated for the positions of secretary, treasurer, social convenor, and fundraising chair. In how many ways can these positions be filled if the Norman twins are running and plan to switch positions on occasion for fun since no one can tell them apart?
16. Inquiry/Problem Solving In how many ways can all the letters of the word CANADA be arranged if the consonants must always be in the order in which they occur in the word itself?
C
17. Glen works part time stocking shelves in a grocery store. The manager asks him to make a pyramid display using 72 cans of corn, 36 cans of peas, and 57 cans of carrots. Assume all the cans are the same size and shape. On his break, Glen tries to work out how many different ways he could arrange the cans into a pyramid shape with a triangular base.
   a) Write a formula for the number of different ways Glen could stack the cans in the pyramid.
   b) Estimate how long it will take Glen to calculate this number of permutations by hand.
   c) Use computer software or a calculator to complete the calculation.
18. How many different ways are there of arranging seven green and eight brown bottles in a row, so that exactly one pair of green bottles is side-by-side?
19. In how many ways could a class of 18 students divide into groups of 3 students each?
  • 247. 4.4 Pascal’s Triangle
The array of numbers shown below is called Pascal’s triangle in honour of the French mathematician Blaise Pascal (1623−1662). Although it is believed that the 14th century Chinese mathematician Chu Shi-kie knew of this array and some of its applications, Pascal discovered it independently at age 13. Pascal found many mathematical uses for the array, especially in probability theory.
[Figure: Chu Shi-kie’s triangle]

Pascal’s method for building his triangle is a simple iterative process similar to those described in section 1.1. In Pascal’s triangle, each term is equal to the sum of the two terms immediately above it. The first and last terms in each row are both equal to 1 since the only term immediately above them is also always a 1.

If $t_{n,r}$ represents the term in row n, position r, then $t_{n,r} = t_{n-1,r-1} + t_{n-1,r}$. For example, $t_{6,2} = t_{5,1} + t_{5,2}$. Note that both the row and position labelling begin with 0.

Row 0:  1                      $t_{0,0}$
Row 1:  1 1                    $t_{1,0}$ $t_{1,1}$
Row 2:  1 2 1                  $t_{2,0}$ $t_{2,1}$ $t_{2,2}$
Row 3:  1 3 3 1                $t_{3,0}$ $t_{3,1}$ $t_{3,2}$ $t_{3,3}$
Row 4:  1 4 6 4 1              $t_{4,0}$ $t_{4,1}$ $t_{4,2}$ $t_{4,3}$ $t_{4,4}$
Row 5:  1 5 10 10 5 1          $t_{5,0}$ $t_{5,1}$ $t_{5,2}$ $t_{5,3}$ $t_{5,4}$ $t_{5,5}$
Row 6:  1 6 15 20 15 6 1       $t_{6,0}$ $t_{6,1}$ $t_{6,2}$ $t_{6,3}$ $t_{6,4}$ $t_{6,5}$ $t_{6,6}$

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links to learn more about Pascal’s triangle. Write a brief report about an application or an aspect of Pascal’s triangle that interests you.
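Because Pascal's method is iterative, it translates naturally into a short loop. The Python sketch below is a minimal illustration, not the textbook's own procedure; the function name `pascal_triangle` is hypothetical. It builds rows 0 to 6 using the rule that each interior term is the sum of the two terms immediately above it, with both row and position labels starting at 0.

```python
def pascal_triangle(last_row: int) -> list[list[int]]:
    """Rows 0..last_row, built with t(n, r) = t(n-1, r-1) + t(n-1, r)."""
    rows = [[1]]  # row 0
    for n in range(1, last_row + 1):
        prev = rows[-1]
        # The first and last terms are 1; each interior term is the sum
        # of the two terms immediately above it.
        row = [1] + [prev[r - 1] + prev[r] for r in range(1, n)] + [1]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for n, row in enumerate(pascal_triangle(6)):
        print(f"Row {n}:", *row)
```

With this layout, `pascal_triangle(6)[6][2]` plays the role of the term in row 6, position 2.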
  • 248. INVESTIGATE & INQUIRE: Row Sums
1. Find the sums of the numbers in each of the first six rows of Pascal’s triangle and list these sums in a table.
2. Predict the sum of the entries in
   a) row 7   b) row 8   c) row 9
3. Verify your predictions by calculating the sums of the numbers in rows 7, 8, and 9.
4. Predict the sum of the entries in row n of Pascal’s triangle.
5. List any other patterns you find in Pascal’s triangle. Compare your list with those of your classmates. Do their lists suggest further patterns you could look for?

In his book Mathematical Carnival, Martin Gardner describes Pascal’s triangle as “so simple that a 10-year old can write it down, yet it contains such inexhaustible riches and links with so many seemingly unrelated aspects of mathematics, that it is surely one of the most elegant of number arrays.”

Example 1 Pascal’s Method
a) The first six terms in row 25 of Pascal’s triangle are 1, 25, 300, 2300, 12 650, and 53 130. Determine the first six terms in row 26.
b) Use Pascal’s method to write a formula for each of the following terms:
   i) $t_{12,5}$   ii) $t_{40,32}$   iii) $t_{n+1,r+1}$

Solution
a) $t_{26,0} = 1$
   $t_{26,1} = 1 + 25 = 26$
   $t_{26,2} = 25 + 300 = 325$
   $t_{26,3} = 300 + 2300 = 2600$
   $t_{26,4} = 2300 + 12\,650 = 14\,950$
   $t_{26,5} = 12\,650 + 53\,130 = 65\,780$
b) i) $t_{12,5} = t_{11,4} + t_{11,5}$
   ii) $t_{40,32} = t_{39,31} + t_{39,32}$
   iii) $t_{n+1,r+1} = t_{n,r} + t_{n,r+1}$
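Example 1 a) needs only the start of a row, not the whole triangle. As a hedged illustration (the function name `next_row_start` is hypothetical), the Python sketch below takes the first k terms of row n and returns the first k terms of row n + 1 by adding neighbouring pairs.

```python
def next_row_start(prev_start: list[int]) -> list[int]:
    """Given the first k terms of row n, return the first k terms of row n + 1."""
    result = [1]  # the term in position 0 is always 1
    for r in range(1, len(prev_start)):
        # Each term is the sum of the two terms immediately above it.
        result.append(prev_start[r - 1] + prev_start[r])
    return result


# First six terms of row 25, as given in Example 1.
row_25_start = [1, 25, 300, 2300, 12650, 53130]
print(next_row_start(row_25_start))
```

The printed list should match the six terms of row 26 worked out in the solution to part a).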
  • 249. Example 2 Row Sums
Which row in Pascal’s triangle has the sum of its terms equal to 32 768?

Solution
From the investigation on page 248, you know that the sum of the terms in any row n is $2^n$. Dividing 32 768 by 2 repeatedly, you find that $32\,768 = 2^{15}$. Thus, it is row 15 of Pascal’s triangle that has terms totalling 32 768.

Example 3 Divisibility
Determine whether $t_{n,2}$ is divisible by $t_{n,1}$ in each row of Pascal’s triangle.

Solution
Row        $t_{n,2} / t_{n,1}$    Divisible?
0 and 1    n/a                    n/a
2          0.5                    no
3          1                      yes
4          1.5                    no
5          2                      yes
6          2.5                    no
7          3                      yes
It appears that $t_{n,2}$ is divisible by $t_{n,1}$ only in odd-numbered rows. However, $2t_{n,2}$ is divisible by $t_{n,1}$ in all rows that have three or more terms.

Example 4 Triangular Numbers
Coins can be arranged in the shape of an equilateral triangle as shown.
[Figure: coins arranged in equilateral triangles]
a) Continue the pattern to determine the numbers of coins in triangles with four, five, and six rows.
b) Locate these numbers in Pascal’s triangle.
c) Relate Pascal’s triangle to the number of coins in a triangle with n rows.
d) How many coins are in a triangle with 12 rows?
  • 250. Solution
a) The numbers of coins in the triangles follow the pattern 1 + 2 + 3 + … as shown in the table below.
b) The numbers of coins in the triangles match the entries on the third diagonal of Pascal’s triangle.

Number of Rows    Number of Coins    Term in Pascal’s Triangle
1                 1                  $t_{2,2}$
2                 3                  $t_{3,2}$
3                 6                  $t_{4,2}$
4                 10                 $t_{5,2}$
5                 15                 $t_{6,2}$
6                 21                 $t_{7,2}$

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1

c) Compare the entries in the first and third columns of the table. The row number of the term from Pascal’s triangle is always one greater than the number of rows in the equilateral triangle. The position of the term in the row, r, is always 2. Thus, the number of coins in a triangle with n rows is equal to the term $t_{n+1,2}$ in Pascal’s triangle.
d) $t_{12+1,2} = t_{13,2} = 78$
   A triangle with 12 rows contains 78 coins.

Numbers that correspond to the number of items stacked in a triangular array are known as triangular numbers. Notice that the nth triangular number is also the sum of the first n positive integers.

Example 5 Perfect Squares
Can you find a relationship between perfect squares and the sums of pairs of entries in Pascal’s triangle?

Solution
Again, look at the third diagonal in Pascal’s triangle.

n    $n^2$    Entries in Pascal’s Triangle    Terms in Pascal’s Triangle
1    1        1                               $t_{2,2}$
2    4        1 + 3                           $t_{2,2} + t_{3,2}$
3    9        3 + 6                           $t_{3,2} + t_{4,2}$
4    16       6 + 10                          $t_{4,2} + t_{5,2}$

Each perfect square greater than 1 is equal to the sum of a pair of adjacent terms on the third diagonal of Pascal’s triangle: $n^2 = t_{n,2} + t_{n+1,2}$ for n > 1.
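The patterns in Examples 2 to 5 can be spot-checked for many rows at once. The following Python sketch is an illustration under stated assumptions: the helper names `pascal_row` and `t` are hypothetical, and the checks cover only the rows listed in the code, so they support the patterns without proving them.

```python
def pascal_row(n: int) -> list[int]:
    """Row n of Pascal's triangle, built by repeated use of Pascal's method."""
    row = [1]
    for _ in range(n):
        row = [1] + [row[r - 1] + row[r] for r in range(1, len(row))] + [1]
    return row


def t(n: int, r: int) -> int:
    """The term in row n, position r (both counted from 0)."""
    return pascal_row(n)[r]


if __name__ == "__main__":
    # Example 2: the terms in row n total 2^n.
    print(all(sum(pascal_row(n)) == 2 ** n for n in range(16)))
    # Example 3: t(n,2) is divisible by t(n,1) exactly when n is odd.
    print(all((t(n, 2) % t(n, 1) == 0) == (n % 2 == 1) for n in range(2, 20)))
    # Example 4: the term in row n+1, position 2 equals 1 + 2 + ... + n.
    print(all(t(n + 1, 2) == sum(range(1, n + 1)) for n in range(1, 13)))
    # Example 5: n^2 = t(n,2) + t(n+1,2) for n > 1.
    print(all(n * n == t(n, 2) + t(n + 1, 2) for n in range(2, 13)))
```

Each line should print `True` if the corresponding pattern holds over the tested range.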
  • 251. Key Concepts
• Each term in Pascal’s triangle is equal to the sum of the two adjacent terms in the row immediately above: $t_{n,r} = t_{n-1,r-1} + t_{n-1,r}$ where $t_{n,r}$ represents the rth term in row n.
• The sum of the terms in row n of Pascal’s triangle is $2^n$.
• The terms in the third diagonal of Pascal’s triangle are triangular numbers. Many other number patterns occur in Pascal’s triangle.

Communicate Your Understanding
1. Describe the symmetry in Pascal’s triangle.
2. Explain why the triangular numbers in Example 4 occur in Pascal’s triangle.

Practise
A
1. For future use, make a diagram of the first 12 rows of Pascal’s triangle.
2. Express as a single term from Pascal’s triangle.
   a) $t_{7,2} + t_{7,3}$
   b) $t_{51,40} + t_{51,41}$
   c) $t_{18,12} - t_{17,12}$
   d) $t_{n,r} - t_{n-1,r}$
3. Determine the sum of the terms in each of these rows in Pascal’s triangle.
   a) row 12   b) row 20   c) row 25   d) row (n − 1)
4. Determine the row number for each of the following row sums from Pascal’s triangle.
   a) 256   b) 2048   c) 16 384   d) 65 536

Apply, Solve, Communicate
B
5. Inquiry/Problem Solving
   a) Alternately add and subtract the terms in each of the first seven rows of Pascal’s triangle and list the results in a table similar to the one below.
      Row    Sum/Difference      Result
      0      1                   1
      1      1 − 1               0
      2      1 − 2 + 1           0
      3      1 − 3 + 3 − 1       0
      ⋮
   b) Predict the result of alternately adding and subtracting the entries in the eighth row. Verify your prediction.
   c) Predict the result for the nth row.
6. a) Predict the sum of the squares of the terms in the nth row of Pascal’s triangle.
   b) Predict the result of alternately adding and subtracting the squares of the terms in the nth row of Pascal’s triangle.
  • 252. 7. Communication
   a) Compare the first four powers of 11 with entries in Pascal’s triangle. Describe any pattern you notice.
   b) Explain how you could express row 5 as a power of 11 by regrouping the entries.
   c) Demonstrate how to express rows 6 and 7 as powers of 11 using the regrouping method from part b). Describe your method clearly.
8. a) How many diagonals are there in
      i) a quadrilateral?
      ii) a pentagon?
      iii) a hexagon?
   b) Find a relationship between entries in Pascal’s triangle and the maximum number of diagonals in an n-sided polygon.
   c) Use part b) to predict how many diagonals are in a heptagon and an octagon. Verify your prediction by drawing these polygons and counting the number of possible diagonals in each.
9. Make a conjecture about the divisibility of the terms in prime-numbered rows of Pascal’s triangle. Confirm that your conjecture is valid up to row 11.
10. a) Which rows of Pascal’s triangle contain only odd numbers? Is there a pattern to these rows?
   b) Are there any rows that have only even numbers?
   c) Are there more even or odd entries in Pascal’s triangle? Explain how you arrived at your answer.
11. Application Oranges can be piled in a tetrahedral shape as shown. The first pile contains one orange, the second contains four oranges, the third contains ten oranges, and so on. The numbers of items in such stacks are known as tetrahedral numbers.
   [Figure: oranges piled in tetrahedral stacks]
   a) Relate the number of oranges in the nth pile to entries in Pascal’s triangle.
   b) What is the 12th tetrahedral number?
12. a) Relate the sum of the squares of the first n positive integers to entries in Pascal’s triangle.
   b) Use part a) to predict the sum of the squares of the first ten positive integers. Verify your prediction by adding the numbers.
13. Inquiry/Problem Solving A straight line drawn through a circle divides it into two regions.
   a) Determine the maximum number of regions formed by n straight lines drawn through a circle. Use Pascal’s triangle to help develop a formula.
   [Figure: circles divided into regions by straight lines]
   b) What is the maximum number of regions inside a circle cut by 15 lines?
14. Describe how you would set up a spreadsheet to calculate the entries in Pascal’s triangle.
  • 253. C
15. The Fibonacci sequence is 1, 1, 2, 3, 5, 8, 13, 21, … . Each term is the sum of the previous two terms. Find a relationship between the Fibonacci sequence and the following version of Pascal’s triangle.
   1
   1 1
   1 2 1
   1 3 3 1
   1 4 6 4 1
   1 5 10 10 5 1
   1 6 15 20 15 6 1
   1 7 21 35 35 21 7 1
   …
16. Application Toothpicks are laid out to form triangles as shown below. The first triangle contains 3 toothpicks, the second contains 9 toothpicks, the third contains 18 toothpicks, and so on.
   [Figure: triangles formed from toothpicks]
   a) Relate the number of toothpicks in the nth triangle to entries in Pascal’s triangle.
   b) How many toothpicks would the 10th triangle contain?
17. Design a 3-dimensional version of Pascal’s triangle. Use your own criteria for the layers. The base may be any regular geometric shape, but each successive layer must have larger dimensions than the one above it.
18. a) Write the first 20 rows of Pascal’s triangle on a sheet of graph paper, placing each entry in a separate square.
   b) Shade in all the squares containing numbers divisible by 2.
   c) Describe, in detail, the patterns produced.
   d) Repeat this process for entries divisible by other whole numbers. Observe the resulting patterns and make a conjecture about the divisibility of the terms in Pascal’s triangle by various whole numbers.
19. Communication
   a) Describe the iterative process used to generate the terms in the triangle below.
      1/1
      1/2  1/2
      1/3  1/6  1/3
      1/4  1/12  1/12  1/4
      1/5  1/20  1/30  1/20  1/5
      1/6  1/30  1/60  1/60  1/30  1/6
   b) Write the entries for the next two rows.
   c) Describe three patterns in this triangle.
   d) Research why this triangle is called the harmonic triangle. Briefly explain the origin of the name, listing your source(s).
  • 254. 4.5 Applying Pascal’s Method
The iterative process that generates the terms in Pascal’s triangle can also be applied to counting paths or routes between two points. Consider water being poured into the top bucket in the diagram. You can use Pascal’s method to count the different paths that water overflowing from the top bucket could take to each of the buckets in the bottom row.
[Figure: buckets arranged in rows of one, two, three, and four, labelled with the path counts 1; 1 1; 1 2 1; 1 3 3 1]
The water has one path to each of the buckets in the second row. There is one path to each outer bucket of the third row, but two paths to the middle bucket, and so on. The numbers in the diagram match those in Pascal’s triangle because they were derived using the same method, Pascal’s method.

INVESTIGATE & INQUIRE: Counting Routes
Suppose you are standing at the corner of Pythagoras Street and Kovalevsky Avenue, and want to reach the corner of Fibonacci Terrace and Euler Boulevard. To avoid going out of your way, you would travel only east and south. Notice that you could start out by going to the corner of either Euclid Street and Kovalevsky Avenue or Pythagoras Street and de Fermat Drive.
[Figure: street map showing Kovalevsky Avenue, de Fermat Drive, Agnes Road, Wiles Lane, Fibonacci Terrace, Pythagoras Street, Euclid Street, Descartes Street, Hypatia Street, Gauss Street, Germain Street, Sierpinski Street, and Euler Boulevard]
1. How many routes are possible to the corner of Euclid Street and de Fermat Drive from your starting point? Sketch the street grid and mark the number of routes onto it.
2. a) Continue to travel only east or south. How many routes are possible from the start to the corner of
      i) Descartes Street and Kovalevsky Avenue?
      ii) Pythagoras Street and Agnes Road?
      iii) Euclid Street and Agnes Road?
      iv) Descartes Street and de Fermat Drive?
      v) Descartes Street and Agnes Road?
   b) List the routes you counted in part a).
  • 255. 3. Consider your method and the resulting numbers. How do they relate to Pascal’s triangle?
4. Continue to mark the number of routes possible on your sketch until you have reached the corner of Fibonacci Terrace and Euler Boulevard. How many different routes are possible?
5. Describe the process you used to find the number of routes from Pythagoras Street and Kovalevsky Avenue to Fibonacci Terrace and Euler Boulevard.

Example 1 Counting Paths in an Array
Determine how many different paths will spell PASCAL if you start at the top and proceed to the next row by moving diagonally left or right.
P
A A
S S S
C C C C
A A A
L L

Solution
Starting at the top, record the number of possible paths moving diagonally to the left and right as you proceed to each different letter. For instance, there is one path from P to the left A and one path from P to the right A. There is one path from an A to the left S, two paths from an A to the middle S, and one path from an A to the right S.
Row of letters    Number of paths to each letter
P                 1
A A               1 1
S S S             1 2 1
C C C C           1 3 3 1
A A A             4 6 4
L L               10 10
Continuing with this counting reveals that there are 10 different paths leading to each L. Therefore, a total of 20 paths spell PASCAL.

Example 2 Counting Paths on a Checkerboard
On the checkerboard shown, the checker can travel only diagonally upward. It cannot move through a square containing an X. Determine the number of paths from the checker’s current position to the top of the board.
[Figure: checkerboard with a checker in the bottom row and one square marked X]
  • 256. Solution
Use Pascal’s method to find the number of paths to each successive position. There is one path possible into each of the squares diagonally adjacent to the checker’s starting position. From the second row there are four paths to the third row: one path to the third square from the left, two to the fifth square, and one to the seventh square. Continue this process for the remaining four rows. The square containing an X gets a zero or no number since there are no paths through this blocked square.
Path counts on the board, from the top row down to the row above the checker:
5 9 8 8
5 4 4 4
1 4 x 4
1 3 3 1
1 2 1
1 1
From left to right, there are 5, 9, 8, and 8 paths to the white squares at the top of the board, making a total of 30 paths.

Key Concepts
• Pascal’s method involves adding two neighbouring terms in order to find the term below.
• Pascal’s method can be applied to counting paths in a variety of arrays and grids.

Communicate Your Understanding
1. Suggest a context in which you could apply Pascal’s method, other than those in the examples above.
2. Which of the numbers along the perimeter of a map tallying possible routes are always 1? Explain.

Practise
A
1. Fill in the missing numbers using Pascal’s method.
   [Figure: array with the given entries 495, 825, 3003, and 2112 and blanks to fill in]
2. In the following arrangements of letters, start from the top and proceed to the next row by moving diagonally left or right. How many different paths will spell each word?
   a) P
      A A
      T T T
      T T T T
      E E E E E
      R R R R R R
      N N N N N N N
      S S S S S S S S
  • 257. b) M
      A A
      T T T
      H H H H
      E E E E E
      M M M M M M
      A A A A A A A
      T T T T T T
      I I I I I
      C C C C
      S S S
   c) T
      R R
      I I I
      A A A A
      N N N
      G G
      L L L
      E E E E
3. The first nine terms of a row of Pascal’s triangle are shown below. Determine the first nine terms of the previous and next rows.
   1   16   120   560   1820   4368   8008   11 440   12 870

Apply, Solve, Communicate
B
4. Determine the number of possible routes from A to B if you travel only south or east.
   [Figures: three street grids, a), b), and c), each with start A and destination B]
5. Sung is three blocks east and five blocks south of her friend’s home. How many different routes are possible if she walks only west or north?
6. Ryan lives four blocks north and five blocks west of his school. Is it possible for him to take a different route to school each day, walking only south and east? Assume that there are 194 school days in a year.
7. A checker is placed on a checkerboard as shown. The checker may move diagonally upward. Although it cannot move into a square with an X, the checker may jump over the X into the diagonally opposite square.
   [Figure: checkerboard with a checker and two squares marked X]
   a) How many paths are there to the top of the board?
   b) How many paths would there be if the checker could move both diagonally and straight upward?
8. Inquiry/Problem Solving
   a) If a checker is placed as shown below, how many possible paths are there for that checker to reach the top of the game board? Recall that checkers can travel only diagonally on the white squares, one square at a time, moving upward.
   [Figure: checkerboard with starting squares labelled 1, 2, 3, and 4]
  • 258. b) When a checker reaches the opposite side, it becomes a “king.” If the starting squares are labelled 1 to 4, from left to right, from which starting square does a checker have the most routes to become a king? Verify your statement.

9. Application The following diagrams represent communication networks between a company’s computer centres in various cities.
[Figure: network diagrams linking Thunder Bay, Charlottetown, Sudbury, Halifax, North Bay, Toronto, Ottawa, Montréal, Kitchener, Kingston, Winnipeg, Hamilton, Windsor, Saskatoon, Edmonton, and Vancouver]
a) How many routes are there from Windsor to Thunder Bay?
b) How many routes are there from Ottawa to Sudbury?
c) How many routes are there from Montréal to Saskatoon?
d) How many routes are there from Vancouver to Charlottetown?
e) If the direction were reversed, would the number of routes be the same for parts a) to d)? Explain.

10. To outfox the Big Bad Wolf, Little Red Riding Hood mapped all the paths through the woods to Grandma’s house. How many different routes could she take, assuming she always travels from left to right?
[Figure: map of paths from Little Red Riding Hood's House to Grandma's House]

11. Communication A popular game show uses a more elaborate version of the Plinko board shown below. Contestants drop a peg into one of the slots at the top of the upright board. The peg is equally likely to go left or right at each post it encounters.
[Figure: Plinko board with slots 1 2 3 4 5 6 at the top and compartments $100 $1000 $0 $5000 $0 $1000 $100 at the bottom]
a) Into which slot should contestants drop their pegs to maximize their chances of winning the $5000 prize? Which slot gives contestants the least chance of winning this prize? Justify your answers.
b) Suppose you dropped 100 pegs into the slots randomly, one at a time. Sketch a graph of the number of pegs likely to wind up in each compartment at the bottom of the board. How is this graph related to those described in earlier chapters?

12. Inquiry/Problem Solving
a) Build a new version of Pascal’s triangle, using the formula for $t_{n,r}$ on page 247, but start with $t_{0,0} = 2$.
b) Investigate this triangle and state a conjecture about its terms.
c) State a conjecture about the sum of the terms in each row.

13. Inquiry/Problem Solving Develop a formula relating $t_{n,r}$ of Pascal’s triangle to the terms in row n − 3.
  • 259. ACHIEVEMENT CHECK
Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application

14. The grid below shows the streets in Anya’s neighbourhood.
[Figure: street grid with intersections A, B, C, and D marked]
a) If she only travels east and north, how many different routes can Anya take from her house at intersection A to her friend’s house at intersection B?
b) How many of the routes in part a) have only one change of direction?
c) Suppose another friend lives at intersection C. How many ways can Anya travel from A to B, meeting her friend at C along the way?
d) How many ways can she travel to B without passing through C? Explain your reasoning.
e) If Anya takes any route from A to B, is she more likely to pass through intersection C or D? Explain your reasoning.

15. Develop a general formula to determine the number of possible routes to travel n blocks north and m blocks west.

16. Inquiry/Problem Solving In chess, a knight moves in L-shaped jumps consisting of two squares along a row or column plus one square at a right angle. On a standard 8 × 8 chessboard, the starting position for a knight is the second square of the bottom row. If the knight travels upward on every move, how many routes can it take to the top of the board?

17. Inquiry/Problem Solving Water is poured into the top bucket of a triangular stack of 2-L buckets. When each bucket is full, the water overflows equally on both sides into the buckets immediately below. How much water will have been poured into the top bucket when at least one of the buckets in the bottom row is full?
[Figure: triangular stack of buckets, with labels A B C D E F]

18. Application Is it possible to arrange a pyramid of buckets such that the bottom layer will fill evenly when water overflows from the bucket at the top of the pyramid?

19. Application Enya is standing in the centre square of a 9 by 9 grid. She travels outward one square at a time, moving diagonally or along a row or column. How many different paths can Enya follow to the perimeter?

20. Communication Describe how a chessboard path activity involving Pascal’s method is related to network diagrams like those in section 1.5. Would network diagrams for such activities be planar? Explain.
  • 260. Review of Key Concepts

4.1 Organized Counting
Refer to the Key Concepts on page 228.
1. A restaurant has a daily special with soup or salad for an appetizer; fish, chicken, or a vegetarian dish for the entrée; and cake, ice cream, or fruit salad for dessert. Use a tree diagram to illustrate all the different meals possible with this special.
2. A theatre company has a half-price offer for students who buy tickets for at least three of the eight plays presented this season. How many choices of three plays would a student have?
3. In how many different orders can a photographer pose a row of six people without having the tallest person beside the shortest one?
4. A transporter truck has three compact cars, a station wagon, and a minivan on its trailer. In how many ways can the driver load the shipment so that one of the heavier vehicles is directly over the rear axle of the trailer?

4.2 Factorials and Permutations
Refer to the Key Concepts on page 238.
5. For what values of n is $n!$ less than $2^n$? Justify your answer.
6. A band has recorded five hit singles. In how many different orders could the band play three of these five songs at a concert?
7. In how many ways could a chairperson, treasurer, and secretary be chosen from a 12-member board of directors?

4.3 Permutations With Some Identical Items
Refer to the Key Concepts on page 244.
8. How many different ten-digit telephone numbers contain four 2s, three 3s, and three 7s?
9. a) How many permutations are there of the letters in the word baseball?
b) How many begin with the letter a?
c) How many end with the letter e?
10. Find the number of 4 × 4 patterns you can make using eight white, four grey, and four blue floor tiles.

4.4 Pascal’s Triangle
Refer to the Key Concepts on page 251.
11. Write out the first five rows of Pascal’s triangle.
12. What is the sum of the entries in the seventh row of Pascal’s triangle?
13. Describe three patterns in Pascal’s triangle.

4.5 Applying Pascal’s Method
Refer to the Key Concepts on page 256.
14. Explain why Pascal’s method can be considered an iterative process.
15. How many paths through the array shown will spell SIERPINSKI?
S
I I
E E E
R R
P P P
I I I I
N N N
S S S S
K K K
I I
  • 261. Chapter Test

ACHIEVEMENT CHART
Category     Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application
Questions    All                        4, 7, 8                             1, 3, 8          3, 4, 5, 6, 8

1. Natasha tosses four coins one after the other.
a) In how many different orders could heads or tails occur?
b) Draw a tree diagram to illustrate all the possible results.
c) Explain how your tree diagram corresponds to your calculation in part a).

2. Evaluate the following by first expressing each in terms of factorials.
a) $_{15}P_{6}$ b) P(6, 2) c) $_{7}P_{3}$ d) $_{9}P_{9}$ e) P(7, 0)

3. Suppose you are designing a remote control that uses short, medium, or long pulses of infrared light to send control signals to a device.
a) How many different control codes can you define using
i) three pulses?
ii) one, two, or three pulses?
b) Explain how the multiplicative and additive counting principles apply in your calculations for part a).

4. a) How many four-digit numbers can you form with the digits 1, 2, 3, 4, 5, 6, and 7 if no digit is repeated?
b) How many of these four-digit numbers are odd numbers?
c) How many of them are even numbers?

5. How many ways are there to roll either a 6 or a 12 with two dice?

6. How many permutations are there of the letters of each of the following words?
a) data b) management c) microwave

7. A number of long, thin sticks are lying in a pile at odd angles such that the sticks cross each other.
a) Relate the maximum number of intersection points of n sticks to entries in Pascal’s triangle.
b) What is the maximum number of intersection points with six overlapping sticks?

ACHIEVEMENT CHECK
Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application

8. At a banquet, four couples are sitting along one side of a table with men and women alternating.
a) How many seating arrangements are possible for these eight people?
b) How many arrangements are possible if each couple sits together? Explain your reasoning.
c) How many arrangements are possible if no one is sitting beside his or her partner?
d) Explain why the answers from parts b) and c) do not add up to the answer from part a).
  • 262. CHAPTER 5
Combinations and the Binomial Theorem

Specific Expectations                                                           Section

Use Venn diagrams as a tool for organizing information in counting                 5.1
problems.

Solve introductory counting problems involving the additive and               5.1, 5.2, 5.3
multiplicative counting principles.

Express answers to permutation and combination problems, using                5.1, 5.2, 5.3
standard combinatorial symbols.

Evaluate expressions involving factorial notation, using appropriate            5.2, 5.3
methods.

Solve problems, using techniques for counting combinations.                     5.2, 5.3

Identify patterns in Pascal’s triangle and relate the terms of Pascal’s            5.4
triangle to values of $\binom{n}{r}$, to the expansion of a binomial, and to the
solution of related problems.

Communicate clearly, coherently, and precisely the solutions to counting     5.1, 5.2, 5.3, 5.4
problems.
  • 263. Chapter Problem
Radio Programming
Jeffrey works as a DJ at a local radio station. He does the drive shift from 16:00 to 20:00, Monday to Friday. Before going on the air, he must choose the music he will play during these four hours.

The station has a few rules that Jeffrey must follow, but he is allowed quite a bit of leeway. Jeffrey must choose all his music from the top 100 songs for the week and he must play at least 12 songs an hour. In his first hour, all his choices must be from the top-20 list.

1. In how many ways can Jeffrey choose the music for his first hour?
2. In how many ways can he program the second hour if he chooses at least 10 songs that are in positions 15 to 40 on the charts?
3. Over his 4-h shift, he will play at least 48 songs from the top 100. In how many ways can he choose these songs?

In these questions, Jeffrey can play the songs in any order. Such questions can be answered with the help of combinatorics, the branch of mathematics introduced in Chapter 4. However, the permutations in Chapter 4 dealt with situations where the order of items was important. Now, you will learn techniques you can apply in situations where order is not important.
  • 264. Review of Prerequisite Skills If you need help with any of the skills listed in purple below, refer to Appendix A. 1. Factorials (section 4.2) Evaluate. 7. Exponent laws Use the exponent laws to 8! simplify each of the following. a) 8! b) ᎏᎏ 5! a) (−3y)0 b) (−4x)3 24! c) ᎏᎏ d) 3! × 4! ΂ ΃ 1 5 22! c) 15(7x)4(4y)2 d) 21(x3)2 ᎏ2 x 2. Permutations (section 4.2) Evaluate mentally. ΂΃ 1 4 e) (4x0y)2(3x2y)3 f) ᎏᎏ (3x2)(2y)3 a) 5 P5 b) 10 P2 2 ΂΃ c) P1 d) 7 P3 1 0 12 g) (−3x y)(−5x2y)2 h) ᎏᎏ (−2xy)3 3 3. Permutations (section 4.2) Evaluate manually. a) P5 b) P(16, 2) 8. Simplifying expressions Expand and simplify. 10 c) P10 d) P(8, 5) a) (x − 5)2 b) (5x − y)2 10 c) (x2 + 5)2 d) (x + 3)(x − 5)2 4. Permutations (section 4.2) Evaluate using e) (x2 − y)2 f) (2x + 3)2 software or a calculator. g) (x − 4)2(x − 2) h) (2x2 + 3y)2 a) P25 b) P(37, 16) i) (2x + 1)2(x − 2) j) (x + y)(x − 2y)2 50 c) 29 P29 d) P 46 23 9. Sigma notation Rewrite the following 5. Organized counting (section 4.1) Every using sigma notation. Canadian aircraft has five letters in its registration. The first letter must be C, the a) 1 + 2 + 4 + 8 + 16 second letter must be F or G, and the last b) x +2x2 +3x3 + 4x4 + 5x5 three letters have no restrictions. If repeated 1 1 1 1 c) ᎏ + ᎏ + ᎏ + ᎏ + … letters are allowed, how many aircraft can be 2 3 4 5 registered with this system? 10. Sigma notation Expand. 5 6. Applying permutations (Chapter 4) a) How many arrangements are there of a) Α 2n n=2 three different letters from the word 4 xn kings? b) Αᎏ n=1 n! b) How many arrangements are there of 5 all the letters of the word management? c) Α (2 n=1 n + n2) c) How many ways could first, second, and third prizes be awarded to 12 entrants in a mathematics contest? 264 MHR • Combinations and the Binomial Theorem
  • 265. 5.1 Organized Counting With Venn Diagrams

In Chapter 4, you used tree diagrams as a tool for counting items when the order of the items was important. This section introduces a type of diagram that helps you organize data about groups of items when the order of the items is not important.

INVESTIGATE & INQUIRE: Visualizing Relationships Between Groups
A group of students meet regularly to plan the dances at Vennville High School. Amar, Belinda, Charles, and Danica are on the dance committee, and Belinda, Charles, Edith, Franco, and Geoff are on the students’ council. Hans and Irena are not members of either group, but they attend meetings as reporters for the school newspaper.

1. Draw two circles to represent the dance committee and the students’ council. Where on the diagram would you put initials representing the students who are
a) on the dance committee?
b) on the students’ council?
c) on the dance committee and the students’ council?
d) not on either the dance committee or the students’ council?
2. Redraw your diagram marking on it the number of initials in each region. What relationships can you see among these numbers?

Your sketch representing the dance committee and the students’ council is a simple example of a Venn diagram. The English logician John Venn (1834–1923) introduced such diagrams as a tool for analysing situations where there is some overlap among groups of items, or sets. Circles represent different sets and a rectangular box around the circles represents the universal set, S, from which all the items are drawn. This box is usually labelled with an S in the top left corner.
  • 266. The items in a set are often called the elements or members of the set. The size of a circle in a Venn diagram does not have to be proportional to the number of elements in the set the circle represents. When some items in a set are also elements of another set, these items are common elements and the sets are shown as overlapping circles. If all elements of a set C are also elements of set A, then C is a subset of A. A Venn diagram would show this set C as a region contained within the circle for set A.

[Figure: Venn diagram of universal set S with overlapping circles A and B; the overlap is labelled "Common elements of A and B"]

The common elements are a subset of both A and B.

www.mcgrawhill.ca/links/MDM12
To learn more about Venn diagrams, visit the above web site and follow the links. Describe an example of how Venn diagrams can be used to organize information.

You can use Venn diagrams to organize information for situations in which the number of items in a group is important but the order of the items is not.

Example 1 Common Elements
There are 10 students on the volleyball team and 15 on the basketball team. When planning a field trip with both teams, the coach has to arrange transportation for a total of only 19 students.
a) Use a Venn diagram to illustrate this situation.
b) Explain why you cannot use the additive counting principle to find the total number of students on the teams.
c) Determine how many students are on both teams.
d) Determine the number of students in the remaining regions of your diagram and explain what these regions represent.

Solution
a) Some students must be on both the volleyball and the basketball team. Draw a box with an S in the top left-hand corner. Draw and label two overlapping circles to represent the volleyball and basketball teams.
[Figure: Venn diagram of S with overlapping circles VB and BB]
  • 267. b) The additive counting principle (or rule of sum) applies only to mutually exclusive events or items. However, it is possible for students to be on both teams. If you simply add the 10 students on the volleyball team to the 15 students on the basketball team, you get a total of 25 students because the students who play on both teams have been counted twice.

c) The difference between the total in part b) and the total number of students actually on the two teams is equal to the number of students who are members of both teams. Thus, 25 − 19 = 6 students play on both teams. In the Venn diagram, these 6 students are represented by the area where the two circles overlap.

d) There are 10 − 6 = 4 students in the section of the VB circle that does not overlap with the BB circle. These are the students who play only on the volleyball team. Similarly, the non-overlapping portion of the BB circle represents the 15 − 6 = 9 students who play only on the basketball team.
[Figure: Venn diagram of S with circles VB and BB; 4 in VB only, 6 in the overlap, 9 in BB only]

Example 1 illustrates the principle of inclusion and exclusion. If you are counting the total number of elements in two groups or sets that have common elements, you must subtract the common elements so that they are not included twice.

Principle of Inclusion and Exclusion for Two Sets
For sets A and B, the total number of elements in either A or B is the number in A plus the number in B minus the number in both A and B.
n(A or B) = n(A) + n(B) − n(A and B),
where n(X) represents the number of elements in a set X.

The set of all elements in either set A or set B is the union of A and B, which is often written as A ∪ B. Similarly, the set of all elements in both A and B is the intersection of A and B, written as A ∩ B. Thus the principle of inclusion and exclusion for two sets can also be stated as
n(A ∪ B) = n(A) + n(B) − n(A ∩ B)

Note that the additive counting principle (or rule of sum) could be considered a special case of the principle of inclusion and exclusion that applies only when sets A and B have no elements in common, so that n(A and B) = 0. The principle of inclusion and exclusion can also be applied to three or more sets.
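The two-set principle can be checked with a short Python sketch. The text gives only the team sizes from Example 1, so the student ID numbers below are invented purely to reproduce those counts, and the helper name `union_size` is likewise an assumption made for illustration.

```python
# Hypothetical rosters: only the counts n(VB) = 10, n(BB) = 15 and
# n(VB or BB) = 19 come from Example 1; the ID numbers are made up.
volleyball = set(range(1, 11))   # students 1..10  -> 10 students
basketball = set(range(5, 20))   # students 5..19  -> 15 students

both = volleyball & basketball    # intersection, VB ∩ BB
either = volleyball | basketball  # union, VB ∪ BB


def union_size(n_a, n_b, n_both):
    """n(A or B) = n(A) + n(B) - n(A and B)."""
    return n_a + n_b - n_both


# The formula agrees with counting the union directly.
assert len(either) == union_size(len(volleyball), len(basketball), len(both))

print(len(both))    # prints 6
print(len(either))  # prints 19
```

Working with actual sets makes the double counting visible: the six shared IDs appear in both rosters but only once in the union.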
  • 268. Example 2 Applying the Principle of Inclusion and Exclusion
A drama club is putting on two one-act plays. There are 11 actors in the Feydeau farce and 7 in the Molière piece.
a) If 3 actors are in both plays, how many actors are there in all?
b) Use a Venn diagram to calculate how many students are in only one of the two plays.

Solution
a) Calculate the total number of students in the two plays using the principle of inclusion and exclusion.
n(total) = n(Feydeau) + n(Molière) − n(Feydeau and Molière)
= 11 + 7 − 3
= 15
There are 15 students involved in the two one-act plays.

b) There are 3 students in the overlap between the two circles. So, there must be 11 − 3 = 8 students in the region for Feydeau only and 7 − 3 = 4 students in the region for Molière only.
[Figure: Venn diagram of S with circles F and M; 8 in F only, 3 in the overlap, 4 in M only]
Thus, a total of 8 + 4 = 12 students are in only one of the two plays.

As in the first example, using a Venn diagram can clarify the relationships between several sets and subsets.

Example 3 Working With Three Sets
Of the 140 grade 12 students at Vennville High School, 52 have signed up for biology, 71 for chemistry, and 40 for physics. The science students include 15 who are taking both biology and chemistry, 8 who are taking chemistry and physics, 11 who are taking biology and physics, and 2 who are taking all three science courses.
a) How many students are not taking any of these three science courses?
b) Illustrate the enrolments with a Venn diagram.

Solution
a) Extend the principle of inclusion and exclusion to three sets. Total the numbers of students in each course, subtract the numbers of students taking two courses, then add the number taking all three. This procedure subtracts out the students who have been counted twice because they are in two
  • 269. courses, and then adds back those who were subtracted once too often because they were in all three courses. For simplicity, let B stand for biology, C stand for chemistry, and P stand for physics. Then, the total number of students taking at least one of these three courses is
n(total) = n(B) + n(C) + n(P) − n(B and C) − n(C and P) − n(B and P) + n(B and C and P)
= 52 + 71 + 40 − 15 − 8 − 11 + 2
= 131
There are 131 students taking one or more of the three science courses. To find the number of grade 12 students who are not taking any of these science courses, subtract 131 from the total number of grade 12 students. Thus, 140 − 131 = 9 students are not taking any of these three science courses in grade 12.

b) For this example, it is easiest to start with the overlap among the three courses and then work outward. Since there are 2 students taking all three courses, mark 2 in the centre of the diagram where the three circles overlap.

Next, consider the adjacent regions representing the students who are taking exactly two of the three courses.
Biology and chemistry: Of the 15 students taking these two courses, 2 are also taking physics, so 13 students are taking only biology and chemistry.
Chemistry and physics: 8 students less the 2 in the centre region leaves 6.
Biology and physics: 11 − 2 = 9.

Now, consider the regions representing students taking only one of the science courses.
Biology: Of the 52 students taking this course, 13 + 2 + 9 = 24 are in the regions overlapping with the other two courses, leaving 28 students who are taking biology only.
Chemistry: 71 students less the 13 + 2 + 6 leaves 50.
Physics: 40 − (9 + 2 + 6) = 23.

[Figure: completed Venn diagram of S with circles B, C, and P; 28 in B only, 50 in C only, 23 in P only, 13 in B and C only, 6 in C and P only, 9 in B and P only, 2 in all three, and 9 outside the circles]

Adding all the numbers within the circles gives a total of 131. Thus, there must be 140 − 131 = 9 grade 12 students who are not taking any of the three science courses, which agrees with the answer found in part a).
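Both parts of Example 3 can be reproduced with a minimal Python sketch. The function names `union_of_three` and `venn_regions` and the dictionary keys are assumptions chosen for illustration; the numbers are the enrolments given in the example.

```python
def union_of_three(n_b, n_c, n_p, n_bc, n_cp, n_bp, n_bcp):
    """Inclusion and exclusion for three sets: singles - pairs + triple."""
    return n_b + n_c + n_p - n_bc - n_cp - n_bp + n_bcp


def venn_regions(n_b, n_c, n_p, n_bc, n_cp, n_bp, n_bcp):
    """Work outward from the centre, as in part b) of the solution."""
    bc_only = n_bc - n_bcp   # in B and C but not P
    cp_only = n_cp - n_bcp   # in C and P but not B
    bp_only = n_bp - n_bcp   # in B and P but not C
    return {
        "all three": n_bcp,
        "B and C only": bc_only,
        "C and P only": cp_only,
        "B and P only": bp_only,
        "B only": n_b - (bc_only + bp_only + n_bcp),
        "C only": n_c - (bc_only + cp_only + n_bcp),
        "P only": n_p - (bp_only + cp_only + n_bcp),
    }


# Enrolments from Example 3: biology, chemistry, physics, then the overlaps.
counts = (52, 71, 40, 15, 8, 11, 2)
at_least_one = union_of_three(*counts)
regions = venn_regions(*counts)

print(at_least_one)        # prints 131
print(140 - at_least_one)  # prints 9
print(regions["B only"], regions["C only"], regions["P only"])  # prints 28 50 23
```

The seven region values sum to the same 131 that the formula gives, which is the agreement between parts a) and b) noted in the solution.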
  • 270. Key Concepts
• Venn diagrams can help you visualize the relationships between sets of items, especially when the sets have some items in common.
• The principle of inclusion and exclusion gives a formula for finding the number of items in the union of two or more sets. For two sets, the formula is n(A or B) = n(A) + n(B) − n(A and B).

Communicate Your Understanding
1. Describe the principal use of Venn diagrams.
2. Is the universal set the same for all Venn diagrams? Explain why or why not.
3. Explain why the additive counting principle can be used in place of the principle of inclusion and exclusion for mutually exclusive sets.

Practise
1. Let set A consist of an apple, an orange, and a pear and set B consist of the apple and a banana.
a) List the elements of
i) A and B
ii) A or B
iii) S
iv) S ∩ B
v) A ∪ B ∪ S
b) List the value of
i) n(A) + n(B)
ii) n(A or B)
iii) n(S)
iv) n(A ∪ B)
v) n(S ∩ A)
c) List all subsets containing exactly two elements for
i) A
ii) B
iii) A ∪ B

2. A recent survey of a group of students found that many participate in baseball, football, and soccer. The Venn diagram below shows the results of the survey.
[Figure: Venn diagram of S with circles Baseball, Football, and Soccer; region values as extracted: 27, 8, 10, 6, 3, 4, 19, 5]
  • 271. a) How many students participated in the survey?
b) How many of these students play both soccer and baseball?
c) How many play only one sport?
d) How many play football and soccer?
e) How many play all three sports?
f) How many do not play soccer?

Apply, Solve, Communicate

3. Of the 220 graduating students in a school, 110 attended the semi-formal dance and 150 attended the formal dance. If 58 students attended both events, how many graduating students did not attend either of the two dances? Illustrate your answer with a Venn diagram.

4. Application A survey of 1000 television viewers conducted by a local television station produced the following data:
• 40% watch the news at 12:00
• 60% watch the news at 18:00
• 50% watch the news at 23:00
• 25% watch the news at 12:00 and at 18:00
• 20% watch the news at 12:00 and at 23:00
• 20% watch the news at 18:00 and at 23:00
• 10% watch all three news broadcasts
a) What percent of those surveyed watch at least one of these programs?
b) What percent watch none of these news broadcasts?
c) What percent view the news at 12:00 and at 18:00, but not at 23:00?
d) What percent view only one of these shows?
e) What percent view exactly two of these shows?

5. Suppose the Canadian Embassy in the Netherlands has 32 employees, all of whom speak both French and English. In addition, 22 of the employees speak German and 15 speak Dutch. If there are 10 who speak both German and Dutch, how many of the employees speak neither German nor Dutch? Illustrate your answer with a Venn diagram.

6. Application There are 900 employees at CantoCrafts Inc. Of these, 615 are female, 345 are under 35 years old, 482 are single, 295 are single females, 187 are singles under 35 years old, 190 are females under 35 years old, and 120 are single females under 35 years old. Use a Venn diagram to determine how many employees are married males who are at least 35 years old.

7. Communication A survey of 100 people who volunteered information about their reading habits showed that
• 75 read newspapers daily
• 35 read books at least once a week
• 45 read magazines regularly
• 25 read both newspapers and books
• 15 read both books and magazines
• 10 read newspapers, books, and magazines
a) Construct a Venn diagram to determine the maximum number of people in the survey who read both newspapers and magazines.
b) Explain why you cannot determine exactly how many of the people surveyed read both newspapers and magazines.
  • 272. 8. (Chapter Problem) Jeffrey works as a DJ at a local radio station. On occasion, he chooses some of the songs he will play based on the phone-in requests received by the switchboard the previous day. Jeffrey’s list of 200 possible selections includes
• all the songs in the top 100
• 134 hard-rock songs
• 50 phone-in requests
• 45 hard-rock songs in the top 100
• 20 phone-in requests in the top 100
• 24 phone-in requests for hard-rock songs
Use a Venn diagram to determine
a) how many phone-in requests were for hard-rock songs in the top 100
b) how many of the songs in the top 100 were neither phone-in requests nor hard-rock selections

9. Inquiry/Problem Solving The Vennville junior hockey team has 12 members who can play forward, 8 who can play defence, and 2 who can be goalies. What is the smallest possible size of the team if
a) no one plays more than one position?
b) no one plays both defence and forward?
c) three of the players are able to play defence or forward?
d) both the goalies can play forward but not defence?

10. Inquiry/Problem Solving Use the principle of inclusion and exclusion to develop a formula for the number of elements in
a) three sets b) four sets c) n sets

Career Connection
Forensic Scientist
The field of forensic science could be attractive to those with a mathematics and science background. The job of a forensic scientist is to identify, analyse, and match items collected from crime scenes.

Forensic scientists most often work in a forensic laboratory. Such laboratories examine and analyse physical evidence, including controlled substances, biological materials, firearms and ammunition components, and DNA samples. Forensic scientists may have specialities such as fingerprints, ballistics, clothing and fibres, footprints, tire tracks, DNA profiling, or crime scene analysis.

Modern forensic science combines mathematics and computers. A forensic scientist should have a background in combinatorics, biology, and the physical sciences. Forensic scientists work for a wide variety of organizations including police forces, government offices, and the military.

www.mcgrawhill.ca/links/MDM12
For more information about forensic science and other careers related to mathematics, visit the above web site and follow the links. Write a brief description of how combinatorics could be used by forensic scientists.
  • 273. 5.2 Combinations

In Chapter 4, you learned about permutations, which are arrangements in which the order of the items is specified. However, in many situations, order does not matter. For example, in many card games, what is in your hand is important, but the order in which it was dealt is not.

INVESTIGATE & INQUIRE: Students’ Council
Suppose the students at a secondary school elect a council of eight members, two from each grade. This council then chooses two of its members as co-chairpersons. How could you calculate the number of different pairs of members who could be chosen as the co-chairs?

Choose someone in the class to record your answers to the following questions on a blackboard or an overhead projector.
a) Start with the simplest case. Choose two students to stand at the front of the class. In how many ways can you choose two co-chairs from this pair of students?
b) Choose three students to be at the front of the class. In how many ways can you choose two co-chairs from this trio?
c) In how many ways can you choose two co-chairs from a group of four students?
d) In how many ways can you choose two co-chairs from a group of five students? Do you see a pattern developing? If so, what is it? If not, try choosing from a group of six students and then from a group of seven students while continuing to look for a pattern.
e) When you see a pattern, predict the number of ways two co-chairs can be chosen from a group of eight students.
f) Can you suggest how you could find the answers for this investigation from the numbers of permutations you found in the investigation in section 4.2?
  • 274. In the investigation on the previous page, you were dealing with a situation in which you were selecting two people from a group, but the order in which you chose the two did not matter. In a permutation, there is a difference between selecting, say, Bob as president and Margot as vice-president as opposed to selecting Margot as president and Bob as vice-president. If you select Bob and Margot as co-chairs, the order in which you select them does not matter since they will both have the same job. A selection from a group of items without regard to order is called a combination.

Example 1 Comparing Permutations and Combinations
a) In how many ways could Alana, Barbara, Carl, Domenic, and Edward fill the positions of president, vice-president, and secretary?
b) In how many ways could these same five people form a committee with three members? List the ways.
c) How are the numbers of ways in parts a) and b) related?

Solution
a) Since the positions are different, order is important. Use a permutation, $_{n}P_{r}$. There are five people to choose from, so n = 5. There are three people being chosen, so r = 3. The number of permutations is $_{5}P_{3} = 60$. There are 60 ways Alana, Barbara, Carl, Domenic, and Edward could fill the positions of president, vice-president, and secretary.

b) The easiest way to find all committee combinations is to write them in an ordered fashion. Let A represent Alana, B represent Barbara, C represent Carl, D represent Domenic, and E represent Edward. The possible combinations are:
ABC ABD ABE ACD ACE
ADE BCD BCE BDE CDE
All other possible arrangements include the same three people as one of the combinations listed above. For example, ABC is the same as ACB, BAC, BCA, CAB, and CBA since order is not important. So, there are only ten ways Alana, Barbara, Carl, Domenic, and Edward can form a three-person committee.
  • 275. c) In part a), there were 60 possible permutations, while in part b), there were 10 possible combinations. The difference is a factor of 6. This factor is $_{3}P_{3} = 3!$, the number of possible arrangements of the three people in each combination. Thus,

$$\text{number of combinations} = \frac{\text{number of permutations}}{\text{number of permutations of the objects selected}} = \frac{{}_{5}P_{3}}{3!} = \frac{60}{6} = 10$$

Combinations of n distinct objects taken r at a time
The number of combinations of r objects chosen from a set of n distinct objects is

$${}_{n}C_{r} = \frac{{}_{n}P_{r}}{r!} = \frac{\dfrac{n!}{(n-r)!}}{r!} = \frac{n!}{(n-r)!\,r!}$$

The notations $_{n}C_{r}$, $C(n, r)$, and $\binom{n}{r}$ are all equivalent. Many people prefer the form $\binom{n}{r}$ when a number of combinations are multiplied together. The symbol $_{n}C_{r}$ is used most often in this text since it is what appears on most scientific and graphing calculators.

Example 2 Applying the Combinations Formula
How many different sampler dishes with 3 different flavours could you get at an ice-cream shop with 31 different flavours?

Solution
There are 31 flavours, so n = 31. The sampler dish has 3 flavours, so r = 3.

$${}_{31}C_{3} = \frac{31!}{(31-3)!\,3!} = \frac{31!}{28!\,3!} = \frac{31 \times 30 \times 29}{3 \times 2} = 4495$$

There are 4495 possible sampler combinations.
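The relationship between permutations and combinations can be confirmed with a short Python sketch. It assumes Python 3.8 or later, where `math.perm` and `math.comb` are available; the helper name `n_choose_r` is an assumption made for illustration.

```python
from itertools import combinations
from math import comb, factorial, perm


def n_choose_r(n, r):
    """nCr = nPr / r!, the definition derived in the text."""
    return perm(n, r) // factorial(r)


# Example 1: five people (A to E), three chosen.
people = "ABCDE"
committees = ["".join(group) for group in combinations(people, 3)]

print(perm(5, 3))        # prints 60
print(n_choose_r(5, 3))  # prints 10
print(committees)
# prints ['ABC', 'ABD', 'ABE', 'ACD', 'ACE', 'ADE', 'BCD', 'BCE', 'BDE', 'CDE']

# Example 2: 31 flavours, 3 chosen. The library function agrees with the formula.
print(n_choose_r(31, 3), comb(31, 3))  # prints 4495 4495
```

`itertools.combinations` generates each selection once in sorted order, which matches the ordered listing used in part b) of Example 1.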
  • 276. Note that the number of combinations in Example 2 was easy to calculate because the number of items chosen, \(r\), was quite small.

Example 3 Calculating Numbers of Combinations Manually
A ballet choreographer wants 18 dancers for a scene.
a) In how many ways can the choreographer choose the dancers if the company has 20 dancers? 24 dancers?
b) How would you advise the choreographer to choose the dancers?

Solution
a) When \(n\) and \(r\) are close in value, \({}_nC_r\) can be calculated mentally. With \(n = 20\) and \(r = 18\),
\[
{}_{20}C_{18} = \frac{20!}{(20-18)!\,18!} = \frac{20 \times 19}{2!} = 190
\]
(Mentally: 20 ÷ 2 = 10. Then, 10 × 19 = 190.)
The choreographer could choose from 190 different combinations of the 20 dancers.

With \(n = 24\) and \(r = 18\), \({}_nC_r\) can be calculated manually or with a basic calculator once you have divided out the common terms in the factorials.
\[
{}_{24}C_{18} = \frac{24!}{(24-18)!\,18!} = \frac{24 \times 23 \times 22 \times 21 \times 20 \times 19}{6!} = \frac{24 \times 23 \times 22 \times 21 \times 20 \times 19}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = 23 \times 11 \times 7 \times 4 \times 19 = 134\,596
\]
With the 4 additional dancers, the choreographer now has a choice of 134 596 combinations.

b) From part a), you can see that it would be impractical for the choreographer to try every possible combination. Instead the choreographer could use an indirect method and try to decide which dancers are least likely to be suitable for the scene.
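The idea of dividing out the common factorial terms in Example 3 can be sketched in code. This is only an illustration in Python: the function name `comb_by_cancelling` is hypothetical, and the sketch relies on the standard identity that choosing r objects from n gives the same count as choosing the n − r objects to leave out.

```python
from math import comb


def comb_by_cancelling(n: int, r: int) -> int:
    """Evaluate nCr without computing any full factorial."""
    k = min(r, n - r)            # nCr equals nC(n-r); use the smaller count
    result = 1
    for i in range(1, k + 1):
        # After this step, result equals C(n - k + i, i), so the
        # integer division is always exact.
        result = result * (n - k + i) // i
    return result


print(comb_by_cancelling(20, 18))   # only two factors survive: 20 x 19 / 2!
print(comb_by_cancelling(24, 18))   # six factors survive, divided by 6!
print(comb(24, 18))                 # library value for comparison
```

Working with the smaller of r and n − r is what makes the 20-dancer case quick enough to do mentally.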
  • 277. Even though there are fewer combinations of \(n\) objects than there are permutations, the numbers of combinations are often still too large to calculate manually.

Example 4 Using Technology to Calculate Numbers of Combinations
Each player in a bridge game receives a hand of 13 cards dealt from a standard deck. How many different bridge hands are possible?

(Margin note: For details of calculator and software functions, refer to Appendix B.)

Solution 1 Using a Graphing Calculator
Here, the order in which the player receives the cards does not matter. What you want to determine is the number of different combinations of cards a player could have once the dealing is complete. So, the answer is simply \({}_{52}C_{13}\). You can evaluate \({}_{52}C_{13}\) by using the nCr function on the MATH PRB menu of a graphing calculator. This function is similar to the nPr function used for permutations.
There are about 635 billion possible bridge hands.

Solution 2 Using a Spreadsheet
Most spreadsheet programs have a combinations function for calculating numbers of combinations. In Microsoft® Excel, this function is the COMBIN(n,r) function. In Corel® Quattro® Pro, this function is the COMB(r,n) function.

You now have a variety of methods for finding numbers of combinations: paper-and-pencil calculations, factorials, scientific or graphing calculators, and software. When appropriate, you can also apply both of the counting principles described in Chapter 4.

(Project Prep: Techniques for calculating numbers of combinations could be helpful for designing the game in your probability project, especially if your game uses cards.)

Example 5 Using the Counting Principles With Combinations
Ursula runs a small landscaping business. She has on hand 12 kinds of rose bushes, 16 kinds of small shrubs, 11 kinds of evergreen seedlings, and 18 kinds of flowering lilies. In how many ways can Ursula fill an order if a customer wants
a) 15 different varieties consisting of 4 roses, 3 shrubs, 2 evergreens, and 6 lilies?
b) either 4 different roses or 6 different lilies?
  • 278. Solution
a) The order in which Ursula chooses the plants does not matter.
The number of ways of choosing the roses is \({}_{12}C_4\).
The number of ways of choosing the shrubs is \({}_{16}C_3\).
The number of ways of choosing the evergreens is \({}_{11}C_2\).
The number of ways of choosing the lilies is \({}_{18}C_6\).
Since varying the rose selection for each different selection of the shrubs, evergreens, and lilies produces a different choice of plants, you can apply the fundamental (multiplicative) counting principle. Multiply the series of combinations to find the total number of possibilities.
\[
{}_{12}C_4 \times {}_{16}C_3 \times {}_{11}C_2 \times {}_{18}C_6 = 495 \times 560 \times 55 \times 18\,564 = 2.830\,267\,44 \times 10^{11}
\]
Ursula has over 283 billion ways of choosing the plants for her customer.

b) Ursula can choose the 4 rose bushes in \({}_{12}C_4\) ways. She can choose the 6 lilies in \({}_{18}C_6\) ways. Since the customer wants either the rose bushes or the lilies, you can apply the additive counting principle to find the total number of possibilities.
\[
{}_{12}C_4 + {}_{18}C_6 = 495 + 18\,564 = 19\,059
\]
Ursula can fill the order for either roses or lilies in 19 059 ways.

As you can see, even relatively simple situations can produce very large numbers of combinations.

Key Concepts
• A combination is a selection of objects in which order is not important.
• The number of combinations of \(n\) distinct objects taken \(r\) at a time is denoted as \({}_nC_r\), \(C(n, r)\), or \(\binom{n}{r}\) and is equal to \(\dfrac{n!}{(n-r)!\,r!}\).
• The multiplicative and additive counting principles can be applied to problems involving combinations.
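As a complement to the calculator and spreadsheet methods, the counts in Examples 4 and 5 can also be reproduced in a general-purpose language. The sketch below uses Python's `math.comb` (an assumption about the tool, not something the text prescribes); the variable names are illustrative.

```python
from math import comb

# Example 4: 13-card bridge hands from a 52-card deck.
bridge_hands = comb(52, 13)

# Example 5 a): multiplicative counting principle (roses AND shrubs AND ...).
order_a = comb(12, 4) * comb(16, 3) * comb(11, 2) * comb(18, 6)

# Example 5 b): additive counting principle (roses OR lilies).
order_b = comb(12, 4) + comb(18, 6)

print(f"{bridge_hands:,}")
print(f"{order_a:,}")
print(f"{order_b:,}")
```

Python integers have arbitrary precision, so these large counts are computed exactly rather than in scientific notation.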
  • 279. Communicate Your Understanding
1. Explain why n objects have more possible permutations than combinations. Use a simple example to illustrate your explanation.
2. Explain whether you would use combinations, permutations, or another method to calculate the number of ways of choosing
a) three items from a menu of ten items
b) an appetizer, an entrée, and a dessert from a menu with three appetizers, four entrées, and five desserts
3. Give an example of a combination expression you could calculate
a) by hand
b) algebraically
c) only with a calculator or computer

Practise
A
1. Evaluate using a variety of methods. Explain why you chose each method.
a) \({}_{21}C_{19}\) b) \({}_{30}C_{28}\) c) \({}_{18}C_5\) d) \({}_{16}C_3\) e) \({}_{19}C_4\) f) \({}_{25}C_{20}\)
2. Evaluate the following pairs of combinations and compare their values.
a) \({}_{11}C_1\), \({}_{11}C_{10}\)
b) \({}_{11}C_2\), \({}_{11}C_9\)
c) \({}_{11}C_3\), \({}_{11}C_8\)

Apply, Solve, Communicate
B
3. Communication In how many ways could you choose 2 red jellybeans from a package of 15 red jellybeans? Explain your reasoning.
4. How many ways can 4 cards be chosen from a deck of 52, if the order in which they are chosen does not matter?
5. How many groups of 3 toys can a child choose to take on a vacation from a toy box containing 11 toys?
6. How many sets of 6 questions for a test can be chosen from a list of 22 questions?
7. In how many ways can a teacher select 5 students from a class of 23 to make a bulletin-board display? Explain your reasoning.
8. As a promotion, a video store decides to give away posters for recently released movies.
a) If posters are available for 27 recent releases, in how many ways could the video-store owner choose 8 different posters for the promotion?
b) Are you able to calculate the number of ways mentally? Why or why not?
  • 280. 9. Communication A club has 11 members.
a) How many different 2-member committees could the club form?
b) How many different 3-member committees could the club form?
c) In how many ways can a club president, treasurer, and secretary be chosen?
d) By what factor do the answers in parts b) and c) differ? How do you account for this difference?
10. Fritz has a deck of 52 cards, and he may choose any number of these cards, from none to all. Use a spreadsheet or Fathom™ to calculate and graph the number of combinations for each of Fritz’s choices.
11. Application A track club, a swim club, and a cycling club are forming a joint committee to organize a triathlon. The committee will have two members from each club. In how many ways can the committee be formed if ten runners, eight swimmers, and seven cyclists volunteer to serve on it?
12. In how many ways can a jury of 6 men and 6 women be chosen from a group of 10 men and 15 women?
13. Inquiry/Problem Solving There are 15 technicians and 11 chemists working in a research laboratory. In how many ways could they form a 5-member safety committee if the committee
a) may be chosen in any way?
b) must have exactly one technician?
c) must have exactly one chemist?
d) must have exactly two chemists?
e) may be all technicians or all chemists?
14. Chapter Problem Jeffrey, a DJ at a local radio station, is choosing the music he will play on his shift. He must choose all his music from the top 100 songs for the week and he must play at least 12 songs an hour. In his first hour, all his choices must be from the top-20 list.
a) In how many ways can Jeffrey choose the songs for his first hour if he wants to choose exactly 12 songs?
b) In how many ways can Jeffrey choose the 12 songs if he wants to pick 8 of the top 10 and 4 from the songs listed from 11 to 20 on the chart?
c) In how many ways can Jeffrey choose either 12 or 13 songs to play in the first hour of his shift?
d) In how many ways can Jeffrey choose the songs if he wants to play up to 15 songs in the first hour?
15. The game of euchre uses only 24 of the cards from a standard deck. How many different five-card euchre hands are possible?
16. Application A taxi is shuttling 11 students to a concert. The taxi can hold only 4 students. In how many ways can 4 students be chosen for
a) the taxi’s first trip?
b) the taxi’s second trip?
17. Diane is making a quilt. She needs three pieces with a yellow undertone, two pieces with a blue undertone, and four pieces with a white undertone. If she has six squares with a yellow undertone, five with a blue undertone, and eight with a white undertone to choose from, in how many ways can she choose the squares for the quilt?
  • 281. 18. Inquiry/Problem Solving At a family reunion, everyone greets each other with a handshake. If there are 20 people at the reunion, how many handshakes take place?

ACHIEVEMENT CHECK (Knowledge/Understanding; Thinking/Inquiry/Problem Solving; Communication; Application)
19. A basketball team consists of five players: one centre, two forwards, and two guards. The senior squad at Vennville Central High School has two centres, six forwards, and four guards.
a) How many ways can the coach pick the two starting guards for a game?
b) How many different starting lineups are possible if all team members play their specified positions?
c) How many of these starting lineups include Dana, the team’s 185-cm centre?
d) Some coaches designate the forwards as power forward and small forward. If all six forwards are adept in either position, how would this designation affect the number of possible starting lineups?
e) As the league final approaches, the centre Dana, forward Ashlee, and guard Hollie are all down with a nasty flu. Fortunately, the five healthy forwards can also play the guard position. If the coach can assign these players as either forwards or guards, will the number of possible starting lineups be close to the number in part b)? Support your answer mathematically.
f) Is the same result achieved if the forwards play their regular positions but the guards can play as either forwards or guards? Explain your answer.

20. In the game of bridge, each player is dealt a hand of 13 cards from a standard deck of 52 cards.
a) By what factor does the number of possible bridge hands differ from the number of ways a bridge hand could be dealt to a player? Explain your reasoning.
b) Use combinations to write an expression for the number of bridge hands that have exactly five clubs, two spades, three diamonds, and three hearts.
c) Use combinations to write an expression for the number of bridge hands that have exactly five hearts.
d) Use software or a calculator to evaluate the expressions in parts b) and c).

C
21. There are 18 students involved in the class production of Arsenic and Old Lace.
a) In how many ways can the teacher cast the play if there are five male roles and seven female roles and the class has nine male and nine female students?
b) In how many ways can the teacher cast the play if Jean will play the young female part only if Jovane plays the male lead?
c) In how many ways can the teacher cast the play if all the roles could be played by either a male or a female student?
22. A large sack contains six basketballs and five volleyballs. Find the number of combinations of four balls that can be chosen from the sack if
a) they may be any type of ball
b) two must be volleyballs and two must be basketballs
c) all four must be volleyballs
d) none may be volleyballs
  • 282. 5.3 Problem Solving With Combinations
In the last section, you considered the number of ways of choosing r items from a set of n distinct items. This section will examine situations where you want to know the total number of possible combinations of any size that you could choose from a given number of items, some of which may be identical.

INVESTIGATE & INQUIRE: Combinations of Coins
1. a) How many different sums of money can you create with a penny and a nickel? List these sums.
b) How many different sums can you create with a penny, a nickel, and a dime? List them.
c) Predict how many different sums you can create with a penny, a nickel, a dime, and a quarter. Test your conjecture by listing the possible sums.
2. a) How many different sums of money can you create with two pennies and a dime? List them.
b) How many different sums can you create with three pennies and a dime?
c) Predict how many sums you can create with four pennies and a dime. Test your conjecture. Can you see a pattern developing? If so, what is it?

Example 1 All Possible Combinations of Distinct Items
An artist has an apple, an orange, and a pear in his refrigerator. In how many ways can the artist choose one or more pieces of fruit for a still-life painting?

Solution
The artist has two choices for each piece of fruit: either include it in the painting or leave it out. Thus, the artist has a total of \(2 \times 2 \times 2 = 8\) choices. Note that one of these choices is to leave out the apple, the orange, and the pear. However, the artist wants at least one piece of fruit in his painting. Thus, he has \(2^3 - 1 = 7\) combinations to choose from.
  • 283. You can apply the same logic to any group of distinct items.

The total number of combinations containing at least one item chosen from a group of \(n\) distinct items is \(2^n - 1\).

Remember that combinations are subsets of the group of n objects. A null set is a set that has no elements. Thus,

A set with \(n\) distinct elements has \(2^n\) subsets including the null set.

Example 2 Applying the Formula for Numbers of Subsets
In how many ways can a committee with at least one member be appointed from a board with six members?

Solution
The board could choose 1, 2, 3, 4, 5, or 6 people for the committee, so \(n = 6\). Since the committee must have at least one member, use the formula that excludes the null set.
\[
2^6 - 1 = 64 - 1 = 63
\]
There are 63 ways to choose a committee of at least one person from a six-member board.

Example 3 All Possible Combinations With Some Identical Items
Kate is responsible for stocking the coffee room at her office. She can purchase up to three cases of cookies, four cases of soft drinks, and two cases of coffee packets without having to send the order through the accounting department. How many different direct purchases can Kate make?

Solution
Kate can order more than one of each kind of item, so this situation involves combinations in which some items are alike.
• Kate may choose to buy three or two or one or no cases of cookies, so she has four ways to choose cookies.
• Kate may choose to buy four or three or two or one or no cases of soft drinks, so she has five ways to choose soft drinks.
• Kate may choose to buy two or one or no cases of coffee packets, so she has three ways to choose coffee.
  • 284. [Tree diagram: four branches for cookies (0, 1, 2, or 3 cases); each splits into five branches for soft drinks (0 to 4 cases); each of those splits into three branches for coffee (0, 1, or 2 cases).]

As shown on the first branch of the diagram above, one of these choices is purchasing no cookies, no soft drinks, and no coffee. Since this choice is not a purchase at all, subtract it from the total number of choices. Thus, Kate can make \(4 \times 5 \times 3 - 1 = 59\) different direct purchases.

In a situation where you can choose all, some, or none of the \(p\) items available, you have \(p + 1\) choices. You can then apply the fundamental (multiplicative) counting principle if you have successive choices of different kinds of items. Always consider whether the choice of not picking any items makes sense. If it does not, subtract 1 from the total.

Combinations of Items in Which Some are Alike
If at least one item is chosen, the total number of selections that can be made from \(p\) items of one kind, \(q\) items of another kind, \(r\) items of another kind, and so on, is
\[
(p + 1)(q + 1)(r + 1) \cdots - 1
\]

Having identical elements in a set reduces the number of possible combinations when you choose r items from that set. You cannot calculate this number by simply dividing by a factorial as you did with permutations in section 4.3. Often, you have to consider a large number of cases individually. However, some situations have restrictive conditions that make it much easier to count the number of possible combinations.

Example 4 Combinations With Some Identical Items
The director of a short documentary has found five rock songs, two blues tunes, and three jazz pieces that suit the theme of the film. In how many ways can the director choose three pieces for the soundtrack if she wants the film to include some jazz?
  • 285. Solution 1 Counting Cases
The director can select exactly one, two, or three jazz pieces.
Case 1: One jazz piece
The director can choose the one jazz piece in \({}_3C_1\) ways and two of the seven non-jazz pieces in \({}_7C_2\) ways. Thus, there are \({}_3C_1 \times {}_7C_2 = 63\) combinations of music with one jazz piece.
Case 2: Two jazz pieces
The director can choose the two jazz pieces in \({}_3C_2\) ways and one of the seven non-jazz pieces in \({}_7C_1\) ways. There are \({}_3C_2 \times {}_7C_1 = 21\) combinations with two jazz pieces.
Case 3: Three jazz pieces
The director can choose the three jazz pieces and none of the seven non-jazz pieces in only one way: \({}_3C_3 \times {}_7C_0 = 1\).
The total number of combinations with at least one jazz piece is 63 + 21 + 1 = 85.

Solution 2 Indirect Method
You can find the total number of possible combinations of three pieces of music and subtract those that do not have any jazz. The total number of ways of choosing any three pieces from the ten available is \({}_{10}C_3 = 120\). The number of ways of not picking any jazz, that is, choosing only from the non-jazz pieces, is \({}_7C_3 = 35\). Thus, the number of ways of choosing at least one jazz piece is 120 − 35 = 85.

Here is a summary of ways to approach questions involving choosing or selecting objects.

Is order important?
Yes: Use permutations. Can the same objects be selected more than once (like digits for a telephone number)?
    Yes: Use the fundamental counting principle.
    No: Are some of the objects identical?
        Yes: Use the formula \(\dfrac{n!}{a!\,b!\,c!\cdots}\).
        No: Use \({}_nP_r = \dfrac{n!}{(n-r)!}\).
No: Use combinations. Are you choosing exactly r objects?
    Yes: Could some of the objects be identical?
        Yes: Count the individual cases.
        No: Use \({}_nC_r = \dfrac{n!}{(n-r)!\,r!}\).
    No: Are some of the objects identical?
        Yes: Use \((p + 1)(q + 1)(r + 1) - 1\) to find the total number of combinations with at least one object.
        No: Use \(2^n\) to find the total number of combinations; subtract 1 if you do not want to include the null set.
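The two solution methods for the soundtrack example, and the earlier formulas for subsets and for items that are alike, can be cross-checked by brute force. The following Python sketch is illustrative only: the piece labels and variable names are assumptions, and each piece of music is treated as distinct, as in the worked solutions.

```python
from itertools import combinations
from math import comb

# Ten distinct pieces: five rock, two blues, three jazz (labels are made up).
pieces = (
    [f"rock{i}" for i in range(5)]
    + [f"blues{i}" for i in range(2)]
    + [f"jazz{i}" for i in range(3)]
)

# Counting cases: exactly one, two, or three jazz pieces.
by_cases = sum(comb(3, j) * comb(7, 3 - j) for j in (1, 2, 3))

# Indirect method: all selections minus those with no jazz.
indirect = comb(10, 3) - comb(7, 3)

# Brute force: list every selection and keep those containing some jazz.
brute = sum(
    1
    for trio in combinations(pieces, 3)
    if any(p.startswith("jazz") for p in trio)
)

# Committees of at least one person from six, and Kate's direct purchases.
committees = 2**6 - 1
purchases = (3 + 1) * (4 + 1) * (2 + 1) - 1

print(by_cases, indirect, brute, committees, purchases)
```

Agreement among the three counts for the soundtrack illustrates why the case-by-case and indirect methods are interchangeable.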
  • 286. Key Concepts
• Use the formula \((p + 1)(q + 1)(r + 1) \cdots - 1\) to find the total number of selections of at least one item that can be made from p items of one kind, q of a second kind, r of a third kind, and so on.
• A set with \(n\) distinct elements has \(2^n\) subsets including the null set.
• For combinations with some identical elements, you often have to consider all possible cases individually.
• In a situation where you must choose at least one particular item, either consider the total number of choices available minus the number without the desired item or add all the cases in which it is possible to have the desired item.

Communicate Your Understanding
1. Give an example of a situation where you would use the formula \((p + 1)(q + 1)(r + 1) \cdots - 1\). Explain why this formula applies.
2. Give an example of a situation in which you would use the expression \(2^n - 1\). Explain your reasoning.
3. Using examples, describe two different ways to solve a problem where at least one particular item must be chosen. Explain why both methods give the same answer.

Practise
A
1. How many different sums of money can you make with a penny, a dime, a one-dollar coin, and a two-dollar coin?
2. How many different sums of money can be made with one $5 bill, two $10 bills, and one $50 bill?
3. How many subsets are there for a set with
a) two distinct elements?
b) four distinct elements?
c) seven distinct elements?
4. In how many ways can a committee with eight members form a subcommittee with at least one person on it?
B
5. Determine whether the following questions involve permutations or combinations and list any formulas that would apply.
a) How many committees of 3 students can be formed from 12 students?
b) In how many ways can 12 runners finish first, second, and third in a race?
c) How many outfits can you assemble from three pairs of pants, four shirts, and two pairs of shoes?
d) How many two-letter arrangements can be formed from the word star?
  • 287. Apply, Solve, Communicate
6. Seven managers and eight sales representatives volunteer to attend a trade show. Their company can afford to send five people. In how many ways could they be selected
a) without any restriction?
b) if there must be at least one manager and one sales representative chosen?
7. Application A cookie jar contains three chocolate-chip, two peanut-butter, one lemon, one almond, and five raisin cookies.
a) In how many ways can you reach into the jar and select some cookies?
b) In how many ways can you select some cookies, if you must include at least one chocolate-chip cookie?
8. A project team of 6 students is to be selected from a class of 30.
a) How many different teams can be selected?
b) Pierre, Gregory, and Miguel are students in this class. How many of the teams would include these 3 students?
c) How many teams would not include Pierre, Gregory, and Miguel?
9. The game of euchre uses only the 9s, 10s, jacks, queens, kings, and aces from a standard deck of cards. How many five-card hands have
a) all red cards?
b) at least two red cards?
c) at most two red cards?
10. If you are dealing from a standard deck of 52 cards,
a) how many different 4-card hands could have at least one card from each suit?
b) how many different 5-card hands could have at least one spade?
c) how many different 5-card hands could have at least two face cards (jacks, queens, or kings)?
11. The number 5880 can be factored into prime divisors as 2 × 2 × 2 × 3 × 5 × 7 × 7.
a) Determine the total number of divisors of 5880.
b) How many of the divisors are even?
c) How many of the divisors are odd?
12. Application A theme park has a variety of rides. There are seven roller coasters, four water rides, and nine story rides. If Stephanie wants to try one of each type of ride, how many different combinations of rides could she choose?
13. Shuwei finds 11 shirts in his size at a clearance sale. How many different purchases could Shuwei make?
14. Communication Using the summary on page 285, draw a flow chart for solving counting problems.
15. a) How many different teams of 4 students could be chosen from the 15 students in the grade-12 Mathematics League?
b) How many of the possible teams would include the youngest student in the league?
c) How many of the possible teams would exclude the youngest student?
16. Inquiry/Problem Solving
a) Use combinations to determine how many diagonals there are in
i) a pentagon ii) a hexagon
b) Draw sketches to verify your answers in part a).
17. A school is trying to decide on new school colours. The students can choose three colours from gold, black, green, blue, red, and white, but they know that another school has already chosen black, gold, and red. How many different combinations of three colours can the students choose?
  • 288. 18. Application The social convenor has 12 volunteers to work at a school dance. Each dance requires 2 volunteers at the door, 4 volunteers on the floor, and 6 floaters. Joe and Jim have not volunteered before, so the social convenor does not want to assign them to work together. In how many ways can the volunteers be assigned?
19. Chapter Problem Jeffrey is a DJ at a local radio station. For the second hour of his shift, he must choose all his music from the top 100 songs for the week. Jeffrey will play exactly 12 songs during this hour.
a) How many different stacks of discs could Jeffrey pull from the station’s collection if he chooses at least 10 songs that are in positions 15 to 40 on the charts?
b) Jeffrey wants to start his second hour with a hard-rock song and finish with a pop classic. How many different play lists can Jeffrey prepare if he has chosen 4 hard rock songs, 5 soul pieces, and 3 pop classics?
c) Jeffrey has 8 favourite songs currently on the top 100 list. How many different subsets of these songs could he choose to play during his shift?

ACHIEVEMENT CHECK (Knowledge/Understanding; Thinking/Inquiry/Problem Solving; Communication; Application)
20. There are 52 white keys on a piano. The lowest key is A. The keys are designated A, B, C, D, E, F, and G in succession, and then the sequence of letters repeats, ending with a C for the highest key.
a) If five notes are played simultaneously, in how many ways could the notes all be
i) As? ii) Gs? iii) the same letter? iv) different letters?
b) If the five keys are played in order, how would your answers in part a) change?

21. Communication
a) How many possible combinations are there for the letters in the three circles for each of the clue words in this puzzle?
b) Explain why you cannot answer part a) with a single \({}_nC_r\) calculation for each word.
[Puzzle image. © Tribune Media Services, Inc. All Rights Reserved. Reprinted with Permission.]
22. Determine the number of ways of selecting four letters, without regard for order, from the word parallelogram.
C
23. Inquiry/Problem Solving Suppose the artist in Example 1 of this section had two apples, two oranges, and two pears in his refrigerator. How many combinations does he have to choose from if he wants to paint a still-life with
a) two pieces of fruit?
b) three pieces of fruit?
c) four pieces of fruit?
24. How many different sums of money can be formed from one $2 bill, three $5 bills, two $10 bills, and one $20 bill?
  • 289. 5.4 The Binomial Theorem
Recall that a binomial is a polynomial with just two terms, so it has the form \(a + b\). Expanding \((a + b)^n\) becomes very laborious as \(n\) increases. This section introduces a method for expanding powers of binomials. This method is useful both as an algebraic tool and for probability calculations, as you will see in later chapters.
[Portrait: Blaise Pascal]

INVESTIGATE & INQUIRE: Patterns in the Binomial Expansion
1. Expand each of the following and simplify fully.
a) \((a + b)^1\) b) \((a + b)^2\) c) \((a + b)^3\) d) \((a + b)^4\) e) \((a + b)^5\)
2. Study the terms in each of these expansions. Describe how the degree of each term relates to the power of the binomial.
3. Compare the terms in Pascal’s triangle to the expansions in question 1. Describe any pattern you find.
4. Predict the terms in the expansion of \((a + b)^6\).

In section 4.4, you found a number of patterns in Pascal’s triangle. Now that you are familiar with combinations, there is another important pattern that you can recognize. Each term in Pascal’s triangle corresponds to a value of \({}_nC_r\).

Pascal’s triangle in numerical form:
Row 0: 1
Row 1: 1 1
Row 2: 1 2 1
Row 3: 1 3 3 1
Row 4: 1 4 6 4 1
Row 5: 1 5 10 10 5 1

Pascal’s triangle in combinatorial form:
Row 0: \({}_0C_0\)
Row 1: \({}_1C_0\) \({}_1C_1\)
Row 2: \({}_2C_0\) \({}_2C_1\) \({}_2C_2\)
Row 3: \({}_3C_0\) \({}_3C_1\) \({}_3C_2\) \({}_3C_3\)
Row 4: \({}_4C_0\) \({}_4C_1\) \({}_4C_2\) \({}_4C_3\) \({}_4C_4\)
Row 5: \({}_5C_0\) \({}_5C_1\) \({}_5C_2\) \({}_5C_3\) \({}_5C_4\) \({}_5C_5\)
  • 290. Comparing the two triangles shown on page 289, you can see that \(t_{n,r} = {}_nC_r\). Recall that Pascal’s method for creating his triangle uses the relationship \(t_{n,r} = t_{n-1,\,r-1} + t_{n-1,\,r}\). So, this relationship must apply to combinations as well.

Pascal’s Formula
\[
{}_nC_r = {}_{n-1}C_{r-1} + {}_{n-1}C_r
\]

Proof:
\[
\begin{aligned}
{}_{n-1}C_{r-1} + {}_{n-1}C_r
&= \frac{(n-1)!}{(r-1)!\,(n-r)!} + \frac{(n-1)!}{r!\,(n-r-1)!} \\
&= \frac{r\,(n-1)!}{r\,(r-1)!\,(n-r)!} + \frac{(n-1)!\,(n-r)}{r!\,(n-r)\,(n-r-1)!} \\
&= \frac{r\,(n-1)!}{r!\,(n-r)!} + \frac{(n-1)!\,(n-r)}{r!\,(n-r)!} \\
&= \frac{(n-1)!}{r!\,(n-r)!}\,[\,r + (n-r)\,] \\
&= \frac{(n-1)! \times n}{r!\,(n-r)!} \\
&= \frac{n!}{r!\,(n-r)!} \\
&= {}_nC_r
\end{aligned}
\]
This proof shows that the values of \({}_nC_r\) do indeed follow the pattern that creates Pascal’s triangle. It follows that \({}_nC_r = t_{n,r}\) for all the terms in Pascal’s triangle.

Example 1 Applying Pascal’s Formula to Combinations
Rewrite each of the following using Pascal’s formula.
a) \({}_{12}C_8\) b) \({}_{19}C_5 + {}_{19}C_6\)

Solution
a) \({}_{12}C_8 = {}_{11}C_7 + {}_{11}C_8\)
b) \({}_{19}C_5 + {}_{19}C_6 = {}_{20}C_6\)

As you might expect from the investigation at the beginning of this section, the coefficients of each term in the expansion of \((a + b)^n\) correspond to the terms in row \(n\) of Pascal’s triangle. Thus, you can write these coefficients in combinatorial form.
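Pascal’s formula can also be checked numerically for small cases. The sketch below, in Python, builds rows of Pascal’s triangle by adding adjacent terms and compares them with combination values from `math.comb`; the function name `pascal_row` is illustrative, and a numerical check of this kind supplements the algebraic proof rather than replacing it.

```python
from math import comb


def pascal_row(n: int) -> list[int]:
    """Build row n of Pascal's triangle by adding adjacent terms of the previous row."""
    row = [1]
    for _ in range(n):
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row


# Each row built by addition should match the combinations nC0, nC1, ..., nCn.
for n in range(6):
    print(pascal_row(n), [comb(n, r) for r in range(n + 1)])

# The two rewrites from Example 1.
print(comb(12, 8) == comb(11, 7) + comb(11, 8))
print(comb(19, 5) + comb(19, 6) == comb(20, 6))
```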
  • 291. The Binomial Theorem
\[
(a + b)^n = {}_nC_0\,a^n + {}_nC_1\,a^{n-1}b + {}_nC_2\,a^{n-2}b^2 + \cdots + {}_nC_r\,a^{n-r}b^r + \cdots + {}_nC_n\,b^n
\]
or
\[
(a + b)^n = \sum_{r=0}^{n} {}_nC_r\,a^{n-r}b^r
\]

Example 2 Applying the Binomial Theorem
Use combinations to expand \((a + b)^6\).

Solution
\[
\begin{aligned}
(a + b)^6 &= \sum_{r=0}^{6} {}_6C_r\,a^{6-r}b^r \\
&= {}_6C_0\,a^6 + {}_6C_1\,a^5b + {}_6C_2\,a^4b^2 + {}_6C_3\,a^3b^3 + {}_6C_4\,a^2b^4 + {}_6C_5\,ab^5 + {}_6C_6\,b^6 \\
&= a^6 + 6a^5b + 15a^4b^2 + 20a^3b^3 + 15a^2b^4 + 6ab^5 + b^6
\end{aligned}
\]

Example 3 Binomial Expansions Using Pascal’s Triangle
Use Pascal’s triangle to expand
a) \((2x - 1)^4\) b) \((3x - 2y)^5\)

Solution
a) Substitute \(2x\) for \(a\) and \(-1\) for \(b\). Since the exponent is 4, use the terms in row 4 of Pascal’s triangle as the coefficients: 1, 4, 6, 4, and 1. Thus,
\[
\begin{aligned}
(2x - 1)^4 &= 1(2x)^4 + 4(2x)^3(-1) + 6(2x)^2(-1)^2 + 4(2x)(-1)^3 + 1(-1)^4 \\
&= 16x^4 + 4(8x^3)(-1) + 6(4x^2)(1) + 4(2x)(-1) + 1 \\
&= 16x^4 - 32x^3 + 24x^2 - 8x + 1
\end{aligned}
\]
b) Substitute \(3x\) for \(a\) and \(-2y\) for \(b\), and use the terms from row 5 as coefficients.
\[
\begin{aligned}
(3x - 2y)^5 &= 1(3x)^5 + 5(3x)^4(-2y) + 10(3x)^3(-2y)^2 + 10(3x)^2(-2y)^3 + 5(3x)(-2y)^4 + 1(-2y)^5 \\
&= 243x^5 - 810x^4y + 1080x^3y^2 - 720x^2y^3 + 240xy^4 - 32y^5
\end{aligned}
\]

Example 4 Expanding Binomials Containing Negative Exponents
Use the binomial theorem to expand and simplify \(\left(x + \dfrac{2}{x^2}\right)^4\).
  • 292. Solution
Substitute \(x\) for \(a\) and \(\dfrac{2}{x^2}\) for \(b\).
\[
\begin{aligned}
\left(x + \frac{2}{x^2}\right)^4 &= \sum_{r=0}^{4} {}_4C_r\,x^{4-r}\left(\frac{2}{x^2}\right)^r \\
&= {}_4C_0\,x^4 + {}_4C_1\,x^3\left(\frac{2}{x^2}\right) + {}_4C_2\,x^2\left(\frac{2}{x^2}\right)^2 + {}_4C_3\,x\left(\frac{2}{x^2}\right)^3 + {}_4C_4\left(\frac{2}{x^2}\right)^4 \\
&= 1x^4 + 4x^3\left(\frac{2}{x^2}\right) + 6x^2\left(\frac{4}{x^4}\right) + 4x\left(\frac{8}{x^6}\right) + 1\left(\frac{16}{x^8}\right) \\
&= x^4 + 8x + 24x^{-2} + 32x^{-5} + 16x^{-8}
\end{aligned}
\]

Example 5 Patterns With Combinations
Using the patterns in Pascal’s triangle from the investigation and Example 4 in section 4.4, write each of the following in combinatorial form.
a) the sum of the terms in row 5 and row 6
b) the sum of the terms in row n
c) the first 5 triangular numbers
d) the nth triangular number

Solution
a) Row 5:
\[
1 + 5 + 10 + 10 + 5 + 1 = {}_5C_0 + {}_5C_1 + {}_5C_2 + {}_5C_3 + {}_5C_4 + {}_5C_5 = 32 = 2^5
\]
Row 6:
\[
1 + 6 + 15 + 20 + 15 + 6 + 1 = {}_6C_0 + {}_6C_1 + {}_6C_2 + {}_6C_3 + {}_6C_4 + {}_6C_5 + {}_6C_6 = 64 = 2^6
\]
b) From part a) it appears that \({}_nC_0 + {}_nC_1 + \cdots + {}_nC_n = 2^n\). Using the binomial theorem,
\[
\begin{aligned}
2^n &= (1 + 1)^n \\
&= {}_nC_0 \times 1^n + {}_nC_1 \times 1^{n-1} \times 1 + \cdots + {}_nC_n \times 1^n \\
&= {}_nC_0 + {}_nC_1 + \cdots + {}_nC_n
\end{aligned}
\]
c)
n | Triangular Number | Combinatorial Form
1 | 1 | \({}_2C_2\)
2 | 3 | \({}_3C_2\)
3 | 6 | \({}_4C_2\)
4 | 10 | \({}_5C_2\)
5 | 15 | \({}_6C_2\)
d) The nth triangular number is \({}_{n+1}C_2\).
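The term-by-term structure of the binomial theorem lends itself to a short computational check. The following Python sketch is a hypothetical helper, not a method from the text: `expand_binomial` assumes both terms of the binomial are single-variable monomials of the form c·x^e and returns a dictionary that maps each exponent of x to its coefficient.

```python
from math import comb


def expand_binomial(ca: int, ea: int, cb: int, eb: int, n: int) -> dict[int, int]:
    """Expand (ca*x**ea + cb*x**eb)**n as {exponent of x: coefficient}."""
    terms: dict[int, int] = {}
    for r in range(n + 1):
        # General term: nCr * a**(n - r) * b**r
        coeff = comb(n, r) * ca ** (n - r) * cb**r
        exponent = ea * (n - r) + eb * r
        terms[exponent] = terms.get(exponent, 0) + coeff
    return terms


# Example 4: (x + 2/x^2)^4, writing 2/x^2 as 2*x**(-2).
print(expand_binomial(1, 1, 2, -2, 4))

# Example 3 a): (2x - 1)^4, writing -1 as -1*x**0.
print(expand_binomial(2, 1, -1, 0, 4))

# Example 5: row sums and triangular numbers.
print([sum(comb(n, r) for r in range(n + 1)) for n in (5, 6)])
print([comb(n + 1, 2) for n in range(1, 6)])
```

Negative exponents need no special handling here because only the coefficient and the exponent are tracked, not the value of x.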
  • 293. Example 6 Factoring Using the Binomial Theorem
Rewrite \(1 + 10x^2 + 40x^4 + 80x^6 + 80x^8 + 32x^{10}\) in the form \((a + b)^n\).

Solution
There are six terms, so the exponent must be 5. The first term of a binomial expansion is \(a^n\), so \(a\) must be 1. The final term is \(32x^{10} = (2x^2)^5\), so \(b = 2x^2\). Therefore,
\[
1 + 10x^2 + 40x^4 + 80x^6 + 80x^8 + 32x^{10} = (1 + 2x^2)^5
\]

Key Concepts
• The coefficients of the terms in the expansion of \((a + b)^n\) correspond to the terms in row \(n\) of Pascal’s triangle.
• The binomial \((a + b)^n\) can also be expanded using combinatorial symbols: \((a + b)^n = {}_nC_0\,a^n + {}_nC_1\,a^{n-1}b + {}_nC_2\,a^{n-2}b^2 + \cdots + {}_nC_n\,b^n\) or \(\displaystyle\sum_{r=0}^{n} {}_nC_r\,a^{n-r}b^r\)
• The degree of each term in the binomial expansion of \((a + b)^n\) is \(n\).
• Patterns in Pascal’s triangle can be summarized using combinatorial symbols.

Communicate Your Understanding
1. Describe how Pascal’s triangle and the binomial theorem are related.
2. a) Describe how you would use Pascal’s triangle to expand \((2x + 5y)^9\).
b) Describe how you would use the binomial theorem to expand \((2x + 5y)^9\).
3. Relate the sum of the terms in the nth row of Pascal’s triangle to the total number of subsets of a set of n elements. Explain the relationship.

Practise
A
1. Rewrite each of the following using Pascal’s formula.
a) \({}_{17}C_{11}\) b) \({}_{43}C_{36}\) c) \({}_{n+1}C_{r+1}\)
d) \({}_{32}C_4 + {}_{32}C_5\) e) \({}_{15}C_{10} + {}_{15}C_9\) f) \({}_nC_r + {}_nC_{r+1}\)
g) \({}_{18}C_9 - {}_{17}C_9\) h) \({}_{24}C_8 - {}_{23}C_7\) i) \({}_nC_r - {}_{n-1}C_{r-1}\)
2. Determine the value of k in each of these terms from the binomial expansion of \((a + b)^{10}\).
a) \(210a^6b^k\) b) \(45a^kb^8\) c) \(252a^kb^k\)
3. How many terms would be in the expansion of the following binomials?
a) \((x + y)^{12}\) b) \((2x - 3y)^5\) c) \((5x - 2)^{20}\)
4. For the following terms from the expansion of \((a + b)^{11}\), state the coefficient in both \({}_nC_r\) and numeric form.
a) \(a^2b^9\) b) \(a^{11}\) c) \(a^6b^5\)
Apply, Solve, Communicate

B

5. Using the binomial theorem and patterns in Pascal's triangle, simplify each of the following.
   a) ${}_9C_0 + {}_9C_1 + \dots + {}_9C_9$
   b) ${}_{12}C_0 - {}_{12}C_1 + {}_{12}C_2 - \dots - {}_{12}C_{11} + {}_{12}C_{12}$
   c) $\sum_{r=0}^{15} {}_{15}C_r$
   d) $\sum_{r=0}^{n} {}_nC_r$
6. If $\sum_{r=0}^{n} {}_nC_r = 16\,384$, determine the value of n.
7. a) Write formulas in combinatorial form for the following. (Refer to section 4.4, if necessary.)
      i) the sum of the squares of the terms in the nth row of Pascal's triangle
      ii) the result of alternately adding and subtracting the squares of the terms in the nth row of Pascal's triangle
      iii) the number of diagonals in an n-sided polygon
   b) Use your formulas from part a) to determine
      i) the sum of the squares of the terms in row 15 of Pascal's triangle
      ii) the result of alternately adding and subtracting the squares of the terms in row 12 of Pascal's triangle
      iii) the number of diagonals in a 14-sided polygon
8. How many terms would be in the expansion of $(x^2 + x)^8$?
9. Use the binomial theorem to expand and simplify the following.
   a) $(x + y)^7$
   b) $(2x + 3y)^6$
   c) $(2x - 5y)^5$
   d) $(x^2 + 5)^4$
   e) $(3a^2 + 4c)^7$
   f) $5(2p - 6c^2)^5$
10. Communication
   a) Find and simplify the first five terms of the expansion of $(3x + y)^{10}$.
   b) Find and simplify the first five terms of the expansion of $(3x - y)^{10}$.
   c) Describe any similarities and differences between the terms in parts a) and b).
11. Use the binomial theorem to expand and simplify the following.
   a) $\left(x^2 - \frac{1}{x}\right)^5$
   b) $\left(2y + \frac{3}{y^2}\right)^4$
   c) $\left(\sqrt{x} + 2x^2\right)^6$
   d) $\left(k + \frac{k}{m^2}\right)^5$
   e) $\left(\sqrt{y} - \frac{2}{\sqrt{y}}\right)^7$
   f) $2\left(3m^2 - \frac{2}{\sqrt{m}}\right)^4$
12. Application Rewrite the following expansions in the form $(a + b)^n$, where n is a positive integer.
   a) $x^6 + 6x^5y + 15x^4y^2 + 20x^3y^3 + 15x^2y^4 + 6xy^5 + y^6$
   b) $y^{12} + 8y^9 + 24y^6 + 32y^3 + 16$
   c) $243a^5 - 405a^4b + 270a^3b^2 - 90a^2b^3 + 15ab^4 - b^5$
13. Communication Use the binomial theorem to simplify each of the following. Explain your results.
   a) $\left(\frac{1}{2}\right)^5 + 5\left(\frac{1}{2}\right)^5 + 10\left(\frac{1}{2}\right)^5 + 10\left(\frac{1}{2}\right)^5 + 5\left(\frac{1}{2}\right)^5 + \left(\frac{1}{2}\right)^5$
   b) $(0.7)^7 + 7(0.7)^6(0.3) + 21(0.7)^5(0.3)^2 + \dots + (0.3)^7$
   c) $7^9 - 9 \times 7^8 + 36 \times 7^7 - \dots - 7^0$
14. a) Expand $\left(x + \frac{2}{x}\right)^4$ and compare it with the expansion of $\frac{1}{x^4}(x^2 + 2)^4$.
   b) Explain your results.
15. Use your knowledge of algebra and the binomial theorem to expand and simplify each of the following.
   a) $(25x^2 + 30xy + 9y^2)^3$
   b) $(3x - 2y)^5(3x + 2y)^5$
16. Application
   a) Calculate an approximation for $(1.2)^9$ by expanding $(1 + 0.2)^9$.
   b) How many terms do you have to evaluate to get an approximation accurate to two decimal places?
17. In a trivia contest, Adam has drawn a topic he knows nothing about, so he makes random guesses for the ten true/false questions. Use the binomial theorem to help find
   a) the number of ways that Adam can answer the test using exactly four trues
   b) the number of ways that Adam can answer the test using at least one true

ACHIEVEMENT CHECK (Knowledge/Understanding, Thinking/Inquiry/Problem Solving, Communication, Application)

18. a) Expand $(h + t)^5$.
   b) Explain how this expansion can be used to determine the number of ways of getting exactly h heads when five coins are tossed.
   c) How would your answer in part b) change if six coins are being tossed? How would it change for n coins? Explain.

C

19. Find the first three terms, ranked by degree of the terms, in each expansion.
   a) $(x + 3)(2x + 5)^4$
   b) $(2x + 1)^2(4x - 3)^5$
   c) $(x^2 - 5)^9(x^3 + 2)^6$
20. Inquiry/Problem Solving
   a) Use the binomial theorem to expand $(x + y + z)^2$ by first rewriting it as $[x + (y + z)]^2$.
   b) Repeat part a) with $(x + y + z)^3$.
   c) Using parts a) and b), predict the expansion of $(x + y + z)^4$. Verify your prediction by using the binomial theorem to expand $(x + y + z)^4$.
   d) Write a formula for $(x + y + z)^n$.
   e) Use your formula to expand and simplify $(x + y + z)^5$.
21. a) In the expansion of $(x + y)^5$, replace x and y with B and G, respectively. Expand and simplify.
   b) Assume that a couple has an equal chance of having a boy or a girl. How would the expansion in part a) help find the number of ways of having k girls in a family with five children?
   c) In how many ways could a family with five children have exactly three girls?
   d) In how many ways could they have exactly four boys?
22. A simple code consists of a string of five symbols that represent different letters of the alphabet. Each symbol is either a dot (•) or a dash (–).
   a) How many different letters are possible using this code?
   b) How many coded letters will contain exactly two dots?
   c) How many different coded letters will contain at least one dash?
Review of Key Concepts

5.1 Organized Counting With Venn Diagrams
Refer to the Key Concepts on page 270.

1. Which regions in the diagram below correspond to
   a) the union of sets A and B?
   b) the intersection of sets B and C?
   c) A ∩ C?
   d) either B or S?

   [Venn diagram: sets A, B, and C inside S, with regions labelled R1 to R8]

2. a) Write the equation for the number of elements contained in either of two sets.
   b) Explain why the principle of inclusion and exclusion subtracts the last term in this equation.
   c) Give a simple example to illustrate your explanation.
3. A survey of households in a major city found that
   • 96% had colour televisions
   • 65% had computers
   • 51% had dishwashers
   • 63% had colour televisions and computers
   • 49% had colour televisions and dishwashers
   • 31% had computers and dishwashers
   • 30% had all three
   a) List the categories of households not included in these survey results.
   b) Use a Venn diagram to find the proportion of households in each of these categories.

5.2 Combinations
Refer to the Key Concepts on page 278.

4. Evaluate the following and indicate any calculations that could be done manually.
   a) ${}_{41}C_8$
   b) ${}_{33}C_{15}$
   c) ${}_{25}C_{17}$
   d) ${}_{50}C_{10}$
   e) ${}_{10}C_8$
   f) ${}_{15}C_{13}$
   g) ${}_5C_4$
   h) ${}_{25}C_{24}$
   i) ${}_{15}C_{11}$
   j) ${}_{25}C_{20}$
   k) ${}_{16}C_8$
   l) ${}_{30}C_{26}$
5. A track and field club has 12 members who are runners and 10 members who specialize in field events. The club has been invited to send a team of 3 runners and 2 field athletes to an out-of-town meet. How many different teams could the club send?
6. A bridge hand consists of 13 cards. How many bridge hands include 5 cards of one suit, 6 cards of a second, and 2 cards of a third?
7. Explain why combination locks should really be called permutation locks.

5.3 Problem Solving With Combinations
Refer to the Key Concepts on page 286.

8. At Subs Galore, you have a choice of lettuce, onions, tomatoes, green peppers, mushrooms, cheese, olives, cucumbers, and hot peppers on your submarine sandwich. How many ways can you "dress" your sandwich?
9. Ballots for municipal elections usually list candidates for several different positions. If a resident can vote for a mayor, two councillors, a school trustee, and a hydro commissioner, how many combinations of positions could the resident choose to mark on the ballot?
10. There are 12 questions on an examination, and each student must answer 8 questions including at least 4 of the first 5 questions. How many different combinations of questions could a student choose to answer?
11. Naomi invites eight friends to a party on short notice, so they may not all be able to come. How many combinations of guests could attend the party?
12. In how many ways could 15 different books be divided equally among 3 people?
13. The camera club has five members, and the mathematics club has eight. There is only one member common to both clubs. In how many ways could a committee of four people be formed with at least one member from each club?

5.4 The Binomial Theorem
Refer to the Key Concepts on page 293.

14. Without expanding $(x + y)^5$, determine
   a) the number of terms in the expansion
   b) the value of k in the term $10x^ky^2$
15. Use Pascal's triangle to expand
   a) $(x + y)^8$
   b) $(4x - y)^6$
   c) $(2x + 5y)^4$
   d) $(7x - 3)^5$
16. Use the binomial theorem to expand
   a) $(x + y)^6$
   b) $(6x - 5y)^4$
   c) $(5x + 2y)^5$
   d) $(3x - 2)^6$
17. Write the first three terms of the expansion of
   a) $(2x + 5y)^7$
   b) $(4x - y)^6$
18. Describe the steps in the binomial expansion of $(2x - 3y)^6$.
19. Find the last term in the binomial expansion of $\left(\frac{1}{x^2} + 2x\right)^5$.
20. Find the middle term in the binomial expansion of $\left(\sqrt{x} + \frac{5}{\sqrt{x}}\right)^8$.
21. In the expansion of $(a + x)^6$, the first three terms are 1 + 3 + 3.75. Find the values of a and x.
22. Use the binomial theorem to expand and simplify $(y^2 - 2)^6(y^2 + 2)^6$.
23. Write $1024x^{10} - 3840x^8 + 5760x^6 - 4320x^4 + 1620x^2 - 243$ in the form $(a + b)^n$. Explain your steps.
Chapter Test

ACHIEVEMENT CHART
Category     Knowledge/Understanding    Thinking/Inquiry/Problem Solving    Communication    Application
Questions    All                        12                                  6, 12            5, 6, 7, 8, 9

1. Evaluate each of the following. List any calculations that require a calculator.
   a) ${}_{25}C_{25}$
   b) ${}_{52}C_1$
   c) ${}_{12}C_3$
   d) ${}_{40}C_{15}$
2. Rewrite each of the following as a single combination.
   a) ${}_{10}C_7 + {}_{10}C_8$
   b) ${}_{23}C_{15} - {}_{22}C_{14}$
3. Use Pascal's triangle to expand
   a) $(3x - 4)^4$
   b) $(2x + 3y)^7$
4. Use the binomial theorem to expand
   a) $(8x - 3)^5$
   b) $(2x - 5y)^6$
5. A student fundraising committee has 14 members, including 7 from grade 12. In how many ways can a 4-member subcommittee for commencement awards be formed if
   a) there are no restrictions?
   b) the subcommittee must be all grade-12 students?
   c) the subcommittee must have 2 students from grade 12 and 2 from other grades?
   d) the subcommittee must have no more than 3 grade-12 students?
6. A track club has 20 members.
   a) In how many ways can the club choose 3 members to help officiate at a meet?
   b) In how many ways can the club choose a starter, a marshal, and a timer?
   c) Should your answers to parts a) and b) be the same? Explain why or why not.
7. Statistics on the grade-12 courses taken by students graduating from a secondary school showed that
   • 85 of the graduates had taken a science course
   • 75 of the graduates had taken a second language
   • 41 of the graduates had taken mathematics
   • 43 studied both science and a second language
   • 32 studied both science and mathematics
   • 27 had studied both a second language and mathematics
   • 19 had studied all three subjects
   a) Use a Venn diagram to determine the minimum number of students who could be in this graduating class.
   b) How many students studied mathematics, but neither science nor a second language?
8. A field-hockey team played seven games and won four of them. There were no ties.
   a) How many arrangements of the four wins and three losses are possible?
   b) In how many of these arrangements would the team have at least two wins in a row?
9. A restaurant offers an all-you-can-eat Chinese buffet with the following items:
   • egg roll, wonton soup
   • chicken wings, chicken balls, beef, pork
   • steamed rice, fried rice, chow mein
   • chop suey, mixed vegetables, salad
   • fruit salad, custard tart, almond cookie
   a) How many different combinations of items could you have?
   b) The restaurant also has a lunch special with your choice of one item from each group. How many choices do you have with this special?
10. In the expansion of $(1 + x)^n$, the first three terms are 1 − 0.9 + 0.36. Find the values of x and n.
11. Use the binomial theorem to expand and simplify $(4x^2 - 12x + 9)^3$.
12. A small transit bus has 8 window seats and 12 aisle seats. Ten passengers board the bus and select seats at random. How many seating arrangements have all the window seats occupied if which passenger is in a seat
   a) does not matter?
   b) matters?

ACHIEVEMENT CHECK (Knowledge/Understanding, Thinking/Inquiry/Problem Solving, Communication, Application)

13. The students' council is having pizza at their next meeting. There are 20 council members, 6 of whom are vegetarian. A committee of 3 will order six pizzas from a pizza shop that has a special price for large pizzas with up to three toppings. The shop offers ten different toppings.
   a) How many different pizza committees can the council choose if there must be at least one vegetarian and one non-vegetarian on the committee?
   b) In how many ways could the committee choose exactly three toppings for a pizza?
   c) In how many ways could the committee choose up to three toppings for a pizza?
   d) The committee wants as much variety as possible in the toppings. They decide to order each topping exactly once and to have at least one topping on each pizza. Describe the different cases possible when distributing the toppings in this way.
   e) For one of these cases, determine the number of ways of choosing and distributing the ten toppings.
CHAPTER 6
Introduction to Probability

Specific Expectations (with the sections that address each)

Use Venn diagrams as a tool for organizing information in counting problems.
    Section: 6.5
Solve problems, using techniques for counting permutations where some objects may be alike.
    Section: 6.3
Solve problems, using techniques for counting combinations.
    Section: 6.3
Solve probability problems involving combinations of simple events, using counting techniques.
    Section: 6.3, 6.4, 6.5, 6.6
Interpret probability statements, including statements about odds, from a variety of sources.
    Section: 6.1, 6.2, 6.3, 6.4, 6.5, 6.6
Design and carry out simulations to estimate probabilities in situations for which the calculation of the theoretical probabilities is difficult or impossible.
    Section: 6.3
Assess the validity of some simulation results by comparing them with the theoretical probabilities, using the probability concepts developed in the course.
    Section: 6.3
Represent complex tasks or issues, using diagrams.
    Section: 6.1, 6.5
Represent numerical data, using matrices, and demonstrate an understanding of terminology and notation related to matrices.
    Section: 6.6
Demonstrate proficiency in matrix operations, including addition, scalar multiplication, matrix multiplication, the calculation of row sums, and the calculation of column sums, as necessary to solve problems, with and without the aid of technology.
    Section: 6.6
Solve problems drawn from a variety of applications, using matrix methods.
    Section: 6.6
Chapter Problem
Genetic Probabilities

Biologists are studying a deer population in a provincial conservation area. The biologists know that many of the bucks (male deer) in the area have an unusual "cross-hatched" antler structure, which seems to be genetic in origin. Of 48 randomly tagged deer, 26 were does (females), 22 were bucks, and 7 of the bucks had cross-hatched antlers.

Several of the does have small bald patches on their hides. This condition also seems to have some genetic element. Careful long-term study has found that female offspring of does with bald patches have a 65% likelihood of developing the condition themselves, while offspring of healthy does have only a 20% likelihood of developing it. Currently, 30% of the does have bald patches.

1. Out of ten deer randomly captured, how many would you expect to have either cross-hatched antlers or bald patches?
2. Do you think that the proportion of does with the bald patches will increase, decrease, or remain relatively stable?

In this chapter, you will learn methods that the biologists could use to calculate probabilities from their samples and to make predictions about the deer population.
Review of Prerequisite Skills

If you need help with any of the skills listed in purple below, refer to Appendix A.

1. Fractions, percents, decimals Express each decimal as a percent.
   a) 0.35
   b) 0.04
   c) 0.95
   d) 0.008
   e) 0.085
   f) 0.375
2. Fractions, percents, decimals Express each percent as a decimal.
   a) 15%
   b) 3%
   c) 85%
   d) 6.5%
   e) 26.5%
   f) 75.2%
3. Fractions, percents, decimals Express each percent as a fraction in simplest form.
   a) 12%
   b) 35%
   c) 67%
   d) 4%
   e) 0.5%
   f) 98%
4. Fractions, percents, decimals Express each fraction as a percent. Round answers to the nearest tenth, if necessary.
   a) $\frac{1}{4}$
   b) $\frac{13}{15}$
   c) $\frac{11}{14}$
   d) $\frac{7}{10}$
   e) $\frac{4}{9}$
   f) $\frac{13}{20}$
5. Tree diagrams A coin is flipped three times. Draw a tree diagram to illustrate all possible outcomes.
6. Tree diagrams In the game of backgammon, you roll two dice to determine how you can move your counters. Suppose you roll first one die and then the other and you need to roll 9 or more to move a counter to safety. Use a tree diagram to list the different rolls in which
   a) you make at least 9
   b) you fail to move your counter to safety
7. Fundamental counting principle (section 4.1) Benoit is going skating on a cold wintry day. He has a toque, a watch cap, a beret, a heavy scarf, a light scarf, leather gloves, and wool gloves. In how many different ways can Benoit dress for the cold weather?
8. Additive counting principle (section 4.1) How many 13-card bridge hands include either seven hearts or eight diamonds?
9. Venn diagrams (section 5.1)
   a) List the elements for each of the following sets for whole numbers from 1 to 10 inclusive.
      i) E, the set of even numbers
      ii) O, the set of odd numbers
      iii) C, the set of composite numbers
      iv) P, the set of perfect squares
   b) Draw a diagram to illustrate how the following sets are related.
      i) E and O
      ii) E and C
      iii) O and P
      iv) E, C, and P
10. Principle of inclusion and exclusion (section 5.1)
   a) Explain the principle of inclusion and exclusion.
   b) A gift store stocks baseball hats in red or green colours. Of the 35 hats on display on a given day, 20 are green. As well, 18 of the hats have a grasshopper logo on the brim. Suppose 11 of the red hats have logos. How many hats are red, or have logos, or both?
11. Factorials (section 4.2) Evaluate.
   a) $6!$
   b) $0!$
   c) $\frac{16!}{14!}$
   d) $\frac{12!}{9!\,3!}$
   e) $\frac{100!}{98!}$
   f) $\frac{16!}{10! \times 8!}$
12. Permutations (section 4.2) Evaluate.
   a) ${}_5P_3$
   b) ${}_7P_1$
   c) $P(6, 2)$
   d) ${}_9P_9$
   e) ${}_{100}P_1$
   f) $P(100, 2)$
13. Permutations (section 4.2) A baseball team has 13 members. If a batting line-up consists of 9 players, how many different batting line-ups are possible?
14. Permutations (section 4.2) What is the maximum number of three-digit area codes possible if the area codes cannot start with either 1 or 0?
15. Combinations (section 5.2) Evaluate these expressions.
   a) ${}_6C_3$
   b) $C(4, 3)$
   c) ${}_8C_8$
   d) ${}_{11}C_0$
   e) $\binom{6}{4} \times \binom{7}{5}$
   f) $\binom{100}{1}$
   g) ${}_{20}C_2$
   h) ${}_{20}C_{18}$
16. Combinations (section 5.2) A pizza shop has nine toppings available. How many different three-topping pizzas are possible if each topping is selected no more than once?
17. Combinations (section 5.3) A construction crew has 12 carpenters and 5 drywallers. How many different safety committees could they form if the members of this committee are
   a) any 5 of the crew?
   b) 3 carpenters and 2 drywallers?
18. Matrices (section 1.6) Identify any square matrices among the following. Also identify any column or row matrices.
   a) $\begin{bmatrix} 3 & 4 \\ 0 & 1 \end{bmatrix}$
   b) $\begin{bmatrix} 0.4 & 0.3 & 0.2 \end{bmatrix}$
   c) $\begin{bmatrix} 1 & 0 \\ 0.5 & 0.5 \\ 0.8 & 0.6 \end{bmatrix}$
   d) $\begin{bmatrix} -2 & 3 & 9 \\ 0 & 11 & -4 \\ 3 & 6 & -1 \end{bmatrix}$
   e) $\begin{bmatrix} 49 & 63 \\ 25 & 14 \\ 72 & 9 \end{bmatrix}$
   f) $\begin{bmatrix} 8 \\ 16 \\ 32 \end{bmatrix}$
19. Matrices (section 1.7) Given $A = \begin{bmatrix} 0.3 & 0.7 \end{bmatrix}$ and $B = \begin{bmatrix} 0.4 & 0.6 \\ 0.55 & 0.45 \end{bmatrix}$, perform the following matrix operations, if possible. If the operation is not possible, explain why.
   a) $A \times B$
   b) $B \times A$
   c) $B^2$
   d) $B^3$
   e) $A^2$
   f) $A \times A^t$
6.1 Basic Probability Concepts

How likely is rain tomorrow? What are the chances that you will pass your driving test on the first attempt? What are the odds that the flight will be on time when you go to meet someone at the airport?

Probability is the branch of mathematics that attempts to predict answers to questions like these. As the word probability suggests, you can often predict only what might happen. However, you may be able to calculate how likely it is. For example, if the weather report forecasts a 90% chance of rain, there is still that slight possibility that sunny skies will prevail. While there are no sure answers, in this case it probably will rain.

INVESTIGATE & INQUIRE: A Number Game

Work with a partner. Have each partner take three identical slips of paper, number them 1, 2, and 3, and place them in a hat, bag, or other container. For each trial, both partners will randomly select one of their three slips of paper. Replace the slips after each trial. Score points as follows:
• If the product of the two numbers shown is less than the sum, Player A gets a point.
• If the product is greater than the sum, Player B gets a point.
• If the product and sum are equal, neither player gets a point.

1. Predict who has the advantage in this game. Explain why you think so.
2. Decide who will be Player A by flipping a coin or using the random number generator on a graphing calculator. Organize your results in a table like the one below.

Trial                  1   2   3   4   5   6   7   8   9   10
Number drawn by A
Number drawn by B
Product
Sum
Point awarded to:
3. a) Record the results for 10 trials. Total the points and determine the winner. Do the results confirm your prediction? Have you changed your opinion on who has the advantage? Explain.
   b) To estimate a probability for each player getting a point, divide the number of points each player earned by the total number of trials.
4. a) Perform 10 additional trials and record point totals for each player over all 20 trials. Estimate the probabilities for each player, as before.
   b) Are the results for 20 trials consistent with the results for 10 trials? Explain.
   c) Are your results consistent with those of your classmates? Comment on your findings.
5. Based on your results for 20 trials, predict how many points each player will have after 50 trials.
6. Describe how you could alter the game so that the other player has the advantage.

The investigation you have just completed is an example of a probability experiment. In probability, an experiment is a well-defined process consisting of a number of trials in which clearly distinguishable outcomes, or possible results, are observed.

The sample space, S, of an experiment is the set of all possible outcomes. For the sum/product game in the investigation, the outcomes are all the possible pairings of slips drawn by the two players. For example, if Player A draws 1 and Player B draws 2, you can label this outcome (1, 2). In this particular game, the result is the same for the outcomes (1, 2) and (2, 1), but with different rules it might be important who draws which number, so it makes sense to view the two outcomes as different.

Outcomes are often equally likely. In the sum/product game, each possible pairing of numbers is as likely as any other. Outcomes are often grouped into events. An example of an event is drawing slips for which the product is greater than the sum, and there are several outcomes in which this event happens. Different events often have different chances of occurring. Events are usually labelled with capital letters.

Example 1 Outcomes and Events

Let event A be a point awarded to Player A in the sum/product game. List the outcomes that make up event A.

Solution

Player A earns a point if the sum of the two numbers is greater than the product. This event is sometimes written as event A = {sum > product}. A useful technique in probability is to tabulate the possible outcomes.
Sums
                 Player A
                 1    2    3
Player B    1    2    3    4
            2    3    4    5
            3    4    5    6

Products
                 Player A
                 1    2    3
Player B    1    1    2    3
            2    2    4    6
            3    3    6    9

Use the tables shown to list the outcomes where the sum is greater than the product:
(1, 1), (1, 2), (1, 3), (2, 1), (3, 1)
These outcomes make up event A. Using this list, you can also write event A as
event A = {(1, 1), (1, 2), (1, 3), (2, 1), (3, 1)}

The probability of event A, P(A), is a quantified measure of the likelihood that the event will occur. The probability of an event is always a value between 0 and 1. A probability of 0 indicates that the event is impossible, and 1 signifies that the event is a certainty. Most events in probability studies fall somewhere between these extreme values. Probabilities less than 0 or greater than 1 have no meaning. Probability can be expressed as fractions, decimals, or percents. Probabilities expressed as percents are always between 0% and 100%. For example, a 70% chance of rain tomorrow means the same as a probability of 0.7, or $\frac{7}{10}$, that it will rain.

The three basic types of probability are
• empirical probability, based on direct observation or experiment
• theoretical probability, based on mathematical analysis
• subjective probability, based on informed guesswork

The empirical probability of a particular event (also called experimental or relative frequency probability) is determined by dividing the number of times that the event actually occurs in an experiment by the number of trials. In the sum/product investigation, you were calculating empirical probabilities. For example, if you had found that in the first ten trials, the product was greater than the sum four times, then the empirical probability of this event would be
$$P(A) = \frac{4}{10} = \frac{2}{5} \text{ or } 0.4$$

The theoretical probability of a particular event is deduced from analysis of the possible outcomes. Theoretical probability is also called classical or a priori probability. A priori is Latin for "from the preceding," meaning based on analysis rather than experiment.
For example, if all possible outcomes are equally likely, then
$$P(A) = \frac{n(A)}{n(S)}$$
where n(A) is the number of outcomes in which event A can occur, and n(S) is the total number of possible outcomes. You used tables to list the outcomes for A in Example 1, and this technique allows you to find the theoretical probability P(A) by counting n(A) = 5 and n(S) = 9. Another way to determine the values of n(A) and n(S) is by organizing the information in a tree diagram.

Project Prep: You will need to determine theoretical probabilities to design and analyse your game in the probability project.

Example 2 Using a Tree Diagram to Calculate Probability

Determine the theoretical probabilities for each key event in the sum/product game.

Solution

The tree diagram shows the nine possible outcomes, each equally likely, for the sum/product game.

[Tree diagram, summarized as a table]
First draw    Second draw    Product         Sum
1             1              1          <    2
1             2              2          <    3
1             3              3          <    4
2             1              2          <    3
2             2              4          =    4
2             3              6          >    5
3             1              3          <    4
3             2              6          >    5
3             3              9          >    6

Let event A be a point for Player A, event B a point for Player B, and event C a tie between sum and product. From the tree diagram, five of the nine possible outcomes have the sum greater than the product. Therefore, the theoretical probability of this event is
$$P(A) = \frac{n(A)}{n(S)} = \frac{5}{9}$$
Similarly,
$$P(B) = \frac{n(B)}{n(S)} = \frac{3}{9} \quad \text{and} \quad P(C) = \frac{n(C)}{n(S)} = \frac{1}{9}$$

In Example 2, you know that one, and only one, of the three events will occur. The sum of the probabilities of all possible events always equals 1.
$$P(A) + P(B) + P(C) = \frac{5}{9} + \frac{3}{9} + \frac{1}{9} = 1$$
Here, the numerator in each fraction represents the number of ways that each event can occur. The total of these numerators is the total number of possible outcomes, which is equal to the denominator.
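The counting in Example 2 can also be carried out by listing every pairing of slips and sorting each one into event A, B, or C. This is a minimal sketch, assuming Python's standard `itertools` and `fractions` modules; the names `slips`, `classify`, and `probabilities` are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import product

slips = (1, 2, 3)

# Sample space S: every (Player A's draw, Player B's draw) pairing.
outcomes = list(product(slips, repeat=2))


def classify(a: int, b: int) -> str:
    """Return the event for one outcome of the sum/product game."""
    if a * b < a + b:
        return "A"  # product less than sum: point for Player A
    if a * b > a + b:
        return "B"  # product greater than sum: point for Player B
    return "C"      # product equals sum: no point awarded


# Count n(A), n(B), and n(C) over the whole sample space.
counts = {"A": 0, "B": 0, "C": 0}
for a, b in outcomes:
    counts[classify(a, b)] += 1

# Theoretical probability n(event) / n(S), valid because outcomes are equally likely.
probabilities = {event: Fraction(k, len(outcomes)) for event, k in counts.items()}

for event, p in probabilities.items():
    print(event, counts[event], p)
```

Note that `Fraction` reduces automatically, so the probability for event B is stored in lowest terms rather than as 3/9.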
Empirical probabilities may differ sharply from theoretical probabilities when only a few trials are made. Such statistical fluctuation can result in an event occurring more frequently or less frequently than theoretical probability suggests. Over a large number of trials, however, statistical fluctuations tend to cancel each other out, and empirical probabilities usually approach theoretical values. Statistical fluctuations often appear in sports, for example, where a team can enjoy a temporary winning streak that is not sustainable over an entire season.

In most problems, you will be determining theoretical probability. Therefore, from now on you may take the term probability to mean theoretical probability unless stated otherwise.

Example 3 Dice Probabilities

Many board games involve a roll of two six-sided dice to see how far you may move your pieces or counters. What is the probability of rolling a total of 7?

Solution

The table shows the totals for all possible rolls of two dice.

                      First Die
                      1    2    3    4    5    6
Second Die    1       2    3    4    5    6    7
              2       3    4    5    6    7    8
              3       4    5    6    7    8    9
              4       5    6    7    8    9    10
              5       6    7    8    9    10   11
              6       7    8    9    10   11   12

To calculate the probability of a particular total, count the number of times it appears in the table. For event A = {rolling 7},
$$P(A) = \frac{n(A)}{n(S)} = \frac{n(\text{rolls totalling 7})}{n(\text{all possible rolls})} = \frac{6}{36} = \frac{1}{6}$$
The probability of rolling a total of 7 is $\frac{1}{6}$.
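Example 3 and the earlier remarks on statistical fluctuation can be illustrated together: count the rolls exactly, then compare with a simulated relative frequency. This is a minimal sketch, assuming Python's standard `random`, `itertools`, and `fractions` modules; the function `estimate_p_seven`, its `seed` parameter, and the trial counts are illustrative choices, not part of the text.

```python
import random
from fractions import Fraction
from itertools import product

# Theoretical probability: list all 36 equally likely (first die, second die) rolls.
rolls = list(product(range(1, 7), repeat=2))
n_seven = sum(1 for first, second in rolls if first + second == 7)
p_seven = Fraction(n_seven, len(rolls))


def estimate_p_seven(trials: int, seed: int = 0) -> float:
    """Empirical probability of a total of 7 over the given number of simulated rolls."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    hits = 0
    for _ in range(trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / trials


print("theoretical:", p_seven)
for trials in (10, 100, 10_000):
    print(trials, "trials:", estimate_p_seven(trials))
```

With only a few trials the estimate may sit well away from 1/6; with many trials it usually settles close to it, which is the behaviour the text describes.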
A useful and important concept in probability is the complement of an event. The complement of event A, A′ or ~A, is the event that "event A does not happen." Thus, whichever outcomes make up A, all the other outcomes make up A′. Because A and A′ together include all possible outcomes, the sum of their probabilities must be 1. Thus,
$$P(A) + P(A') = 1 \quad \text{and} \quad P(A') = 1 - P(A)$$

[Venn diagram showing A and A′]

The event A′ is usually called "A-prime," or sometimes "not-A"; ~A is called "tilde-A."

Example 4 The Complement of an Event

What is the probability that a randomly drawn integer between 1 and 40 is not a perfect square?

Solution

Let event A = {a perfect square}. Then, the complement of A is the event A′ = {not a perfect square}. In this case, you need to calculate P(A′), but it is easier to do this by finding P(A) first. There are six perfect squares between 1 and 40: 1, 4, 9, 16, 25, and 36. The probability of a perfect square is, therefore,
$$P(A) = \frac{n(A)}{n(S)} = \frac{6}{40} = \frac{3}{20}$$
Thus,
$$P(A') = 1 - P(A) = 1 - \frac{3}{20} = \frac{17}{20}$$
There is a $\frac{17}{20}$ or 85% chance that a random integer between 1 and 40 will not be a perfect square.
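The complement calculation in Example 4 can be reproduced by counting. This is a minimal sketch, assuming Python 3.8 or later for `math.isqrt` and taking the sample space to be the integers 1 to 40 inclusive, as the solution does; the variable names are illustrative.

```python
from fractions import Fraction
from math import isqrt

# Sample space S: the integers from 1 to 40 inclusive (assumed, matching n(S) = 40).
sample_space = range(1, 41)

# Event A: the integer is a perfect square.
squares = [k for k in sample_space if isqrt(k) ** 2 == k]

p_square = Fraction(len(squares), len(sample_space))  # P(A)
p_not_square = 1 - p_square                           # P(A') = 1 - P(A)

# Direct count of the complement, to compare with the indirect result.
not_squares = [k for k in sample_space if k not in squares]
p_direct = Fraction(len(not_squares), len(sample_space))

print(squares, p_square, p_not_square, p_direct)
```

Counting the complement directly gives the same value as 1 − P(A); the indirect route is simply shorter here because there are far fewer perfect squares than non-squares.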
Subjective probability, the third basic type of probability, is an estimate of likelihood based on intuition and experience: an educated guess. For example, a well-prepared student may be 90% confident of passing the next data management test. Subjective probabilities often figure in everyday speech in expressions such as "I think the team has only a 10% chance of making the finals this year."

Example 5 Determining Subjective Probability

Estimate the probability that
a) the next pair of shoes you buy will be the same size as the last pair you bought
b) an expansion baseball team will win the World Series in their first season
c) the next person to enter a certain coffee shop will be male

Solution

a) There is a small chance that the size of your feet has changed significantly or that different styles of shoes may fit you differently, so 80–90% would be a reasonable subjective probability that your next pair of shoes will be the same size as your last pair.
b) Expansion teams rarely do well during their first season, and even strong teams have difficulty winning the World Series. The subjective probability of a brand-new team winning the World Series is close to zero.
c) Without more information about the coffee shop in question, your best estimate is to assume that the shop's patrons are representative of the general population. This assumption gives a subjective probability of 50% that the next customer will be male.

Note that the answers in Example 5 contain estimates, assumptions, and, in some cases, probability ranges. While not as rigorous a measure as theoretical or empirical probability, subjective probabilities based on educated guesswork can still prove useful in some situations.

Web connection: www.mcgrawhill.ca/links/MDM12
For some interesting baseball statistics, visit the above web site and follow the links. Write a problem that could be solved using probabilities.
Key Concepts

• A probability experiment is a well-defined process in which clearly identifiable outcomes are measured for each trial.
• An event is a collection of outcomes satisfying a particular condition. The probability of an event can range between 0 (impossible) and 1 or 100% (certain).
• The empirical probability of an event is the number of times the event occurs divided by the total number of trials.
• The theoretical probability of an event A is given by $P(A) = \frac{n(A)}{n(S)}$, where n(A) is the number of outcomes making up A, n(S) is the total number of outcomes in the sample space S, and all outcomes are equally likely to occur.
• A subjective probability is based on intuition and previous experience.
• If the probability of event A is given by P(A), then the probability of the complement of A is given by $P(A') = 1 - P(A)$.

Communicate Your Understanding

1. Give two synonyms for the word probability.
2. a) Explain why $P(A) + P(A') = 1$.
   b) Explain why probabilities less than 0 or greater than 1 have no meaning.
3. Explain the difference between theoretical, empirical, and subjective probability. Give an example of how you would determine each type.
4. Describe three situations in which statistical fluctuations occur.
5. a) Describe a situation in which you might determine the probability of event A indirectly by calculating $P(A')$ first.
   b) Will this method always yield the same result as calculating P(A) directly?
   c) Defend your answer to part b) using an explanation or proof, supported by an example.
  • 312. Practise
A
1. Determine the probability of
a) tossing heads with a single coin
b) tossing two heads with two coins
c) tossing at least one head with three coins
d) rolling a composite number with one die
e) not rolling a perfect square with two dice
f) drawing a face card from a standard deck of cards
2. Estimate a subjective probability of each of the following events. Provide a rationale for each estimate.
a) the sun rising tomorrow
b) it never raining again
c) your passing this course
d) your getting the next job you apply for
3. Recall the sum/product game at the beginning of this section. Suppose that the game were altered so that the slips of paper showed the numbers 2, 3, and 4, instead of 1, 2, and 3.
a) Identify all the outcomes that will produce each of the three possible events
i) p > s ii) p < s iii) p = s
b) Which player has the advantage in this situation?

Apply, Solve, Communicate
4. The town planning department surveyed residents of a town about home ownership. The table shows the results of the survey.

Residents   At Address Less Than 2 Years   At Address More Than 2 Years   Total for Category
Owners      2000                           8000                           10 000
Renters     4500                           1500                           6 000
Total       6500                           9500                           16 000

Determine the following probabilities.
a) P(resident owns home)
b) P(resident rents and has lived at present address less than two years)
c) P(homeowner has lived at present address more than two years)
B
5. Application Suppose your school’s basketball team is playing a four-game series against another school. So far this season, each team has won three of the six games in which they faced each other.
a) Draw a tree diagram to illustrate all possible outcomes of the series.
b) Use your tree diagram to determine the probability of your school winning exactly two games.
c) What is the probability of your school sweeping the series (winning all four games)?
d) Discuss any assumptions you made in the calculations in parts b) and c).
6. Application Suppose that a graphing calculator is programmed to generate a random natural number between 1 and 10 inclusive. What is the probability that the number will be prime?
7. Communication
a) A game involves rolling two dice. Player A wins if the throw totals 5, 7, or 9. Player B wins if any other total is thrown. Which player has the advantage? Explain.
b) Suppose the game is changed so that Player A wins if 5, 7, or doubles (both dice showing the same number) are thrown. Who has the advantage now? Explain.
c) Design a similar game in which each player has an equal chance of winning.
  • 313. 8. Chapter Problem
a) Based on the randomly tagged sample, what is the empirical probability that a deer captured at random will be a doe?
b) If ten deer are captured at random, how many would you expect to be bucks?
C
9. Inquiry/Problem Solving Refer to the prime number experiment in question 6. What happens to the probability if you change the upper limit of the sample space? Use a graphing calculator or appropriate computer software to investigate this problem. Let A be the event that the random natural number will be a prime number. Let the random number be between 1 and n inclusive. Predict what you think will happen to P(A) as n increases. Investigate P(A) as a function of n, and reflect on your hypothesis. Did you observe what you expected? Why or why not?
10. Suppose that the Toronto Blue Jays face the New York Yankees in the division final. In this best-of-five series, the winner is the first team to win three games. The games are played in Toronto and in New York, with Toronto hosting the first, second, and if needed, fifth games. The consensus among experts is that Toronto has a 65% chance of winning at home and a 40% chance of winning in New York.
a) Construct a tree diagram to illustrate all the possible outcomes.
b) What is the chance of Toronto winning in three straight games?
c) For each outcome, add to your tree diagram the probability of that outcome.
d) Communication Explain how you found your answers to parts b) and c).
11. Communication Prior to a municipal election, a public-opinion poll determined that the probability of each of the four candidates winning was as follows:
Jonsson 10%
Trimble 32%
Yakamoto 21%
Audette 37%
a) How will these probabilities change if Jonsson withdraws from the race after ballots are cast?
b) How will these probabilities change if Jonsson withdraws from the race before ballots are cast?
c) Explain why your answers to a) and b) are different.
12. Inquiry/Problem Solving It is known from studying past tests that the correct answers to a certain university professor’s multiple-choice tests exhibit the following pattern.

Correct Answer   Percent of Questions
A                15%
B                25%
C                30%
D                15%
E                15%

a) Devise a strategy for guessing that would maximize a student’s chances for success, assuming that the student has no idea of the correct answers. Explain your method.
b) Suppose that the study of past tests revealed that the correct answer choice for any given question was the same as that of the immediately preceding question only 10% of the time. How would you use this information to adjust your strategy in part a)? Explain your reasoning.
  • 314. 6.2 Odds
Odds are another way to express a level of confidence about an outcome. Odds are commonly used in sports and other areas. Odds are often used when the probability of an event versus its complement is of interest, for example whether a sprinter will win or lose a race or whether a basketball team will make it to the finals.

INVESTIGATE & INQUIRE: Tennis Tournament
For an upcoming tennis tournament, a television commentator estimates that the top-seeded (highest-ranked) player has “a 25% probability of winning, but her odds of winning are only 1 to 3.”
1. a) If event A is the top-seeded player winning the tournament, what is A′?
b) Determine P(A′).
2. a) How are the odds of the top-seeded player winning related to P(A) and P(A′)?
b) Should the television commentator be surprised that the odds were only 1 to 3? Why or why not?
3. a) What factors might the commentator consider when estimating the probability of the top-seeded player winning the tournament?
b) How accurate do you think the commentator’s estimate is likely to be? Would you consider such an estimate primarily a classical, an empirical, or a subjective probability? Explain.

www.mcgrawhill.ca/links/MDM12
For more information about tennis rankings and other tennis statistics, visit the above web site and follow the links. Locate some statistics about a tennis player of your choice. Use odds to describe these statistics.
  • 315. The odds in favour of an event’s occurring are given by the ratio of the probability that the event will occur to the probability that it will not occur.
$\text{odds in favour of } A = \frac{P(A)}{P(A')}$
Giving odds in favour of an event is a common way to express a probability.

Example 1 Determining Odds
A messy drawer contains three red socks, five white socks, and four black socks. What are the odds in favour of randomly drawing a red sock?

Solution
Let the event A be drawing a red sock. The probability of this event is
$P(A) = \frac{3}{12} = \frac{1}{4}$
The probability of not drawing a red sock is
$P(A') = 1 - P(A) = \frac{3}{4}$
Using the definition of odds,
$\text{odds in favour of } A = \frac{P(A)}{P(A')} = \frac{1/4}{3/4} = \frac{1}{3}$
Therefore, the odds in favour of drawing a red sock are $\frac{1}{3}$, or less than 1. You are more likely not to draw a red sock. These odds are commonly written as 1:3, which is read as “one to three.”

Project Prep
A useful feature you could include in your probability project is a calculation of the odds of winning your game.

Notice in Example 1 that the ratio of red socks to other socks is 3:9, or 1:3, which is the same as the odds in favour of drawing a red sock. In fact, the odds in favour of an event A can also be found using
$\text{odds in favour of } A = \frac{n(A)}{n(A')}$
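As a rough check on Example 1, the two ways of finding odds in favour (from probabilities and from counts) can be compared in Python. The helper name `odds_in_favour` is a hypothetical one chosen for this sketch; exact fractions are used so that the ratio reduces to lowest terms.

```python
from fractions import Fraction


def odds_in_favour(n_event, n_total):
    """Return (h, k) so that the odds in favour are h:k, using n(A) : n(A')."""
    if not 0 <= n_event < n_total:
        raise ValueError("need 0 <= n(A) < n(S) for the odds to be defined")
    ratio = Fraction(n_event, n_total - n_event)
    return ratio.numerator, ratio.denominator


# Sock drawer from Example 1.
red, white, black = 3, 5, 4
total = red + white + black

# Odds from probabilities: P(A) / P(A').
p_red = Fraction(red, total)
odds_from_probabilities = p_red / (1 - p_red)

# Odds from counts: n(A) / n(A').
h, k = odds_in_favour(red, total)

print(f"odds in favour of red: {h}:{k}")
```

Both routes reduce to the same ratio because the common denominator n(S) cancels when P(A) is divided by P(A′).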
  • 316. A common variation on the theme of odds is to express the odds against an event happening.
$\text{odds against } A = \frac{P(A')}{P(A)}$

Example 2 Odds Against an Event
If the chance of a snowstorm in Windsor, Ontario, in January is estimated at 0.4, what are the odds against Windsor’s having a snowstorm next January? Is a January snowstorm more likely than not?

Solution
Let event A = {snowstorm in January}. Since $P(A) + P(A') = 1$,
$\text{odds against } A = \frac{P(A')}{P(A)} = \frac{1 - P(A)}{P(A)} = \frac{1 - 0.4}{0.4} = \frac{0.6}{0.4} = \frac{3}{2}$
The odds against a snowstorm are 3:2, which is greater than 1:1. So a snowstorm is less likely to occur than not.

Sometimes, you might need to convert an expression of odds into a probability. You can do this conversion by expressing P(A′) in terms of P(A).

Example 3 Probability From Odds
A university professor, in an effort to promote good attendance habits, states that the odds of passing her course are 8 to 1 when a student misses fewer than five classes. What is the probability that a student with good attendance will pass?

Solution
Let the event A be that a student with good attendance passes. Since
$\text{odds in favour of } A = \frac{P(A)}{P(A')}$,
  • 317. $\frac{8}{1} = \frac{P(A)}{P(A')} = \frac{P(A)}{1 - P(A)}$
$8 - 8P(A) = P(A)$
$8 = 9P(A)$
$P(A) = \frac{8}{9}$
The probability that a student with good attendance will pass is $\frac{8}{9}$, or approximately 89%.

In general, it can be shown that if the odds in favour of $A = \frac{h}{k}$, then $P(A) = \frac{h}{h+k}$.

Example 4 Using the Odds-Probability Formula
The odds of Rico’s hitting a home run are 2:7. What is the probability of Rico’s hitting a home run?

Solution
Let A be the event that Rico hits a home run. Then, h = 2 and k = 7, and
$P(A) = \frac{h}{h+k} = \frac{2}{2+7} = \frac{2}{9}$
Rico has approximately a 22% chance of hitting a home run.

Key Concepts
• The odds in favour of A are given by the ratio $\frac{P(A)}{P(A')}$.
• The odds against A are given by the ratio $\frac{P(A')}{P(A)}$.
• If the odds in favour of A are $\frac{h}{k}$, then $P(A) = \frac{h}{h+k}$.
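The conversions summarized in the Key Concepts can be sketched in Python as follows. The function names are assumptions made for this illustration; the input values are the ones from the snowstorm, attendance, and home-run examples.

```python
from fractions import Fraction


def odds_against(p):
    """Odds against A as the ratio P(A') / P(A), with P(A') = 1 - P(A)."""
    p = Fraction(p)
    if not 0 < p <= 1:
        raise ValueError("P(A) must be greater than 0 and at most 1")
    return (1 - p) / p


def probability_from_odds(h, k):
    """If the odds in favour of A are h:k, then P(A) = h / (h + k)."""
    return Fraction(h, h + k)


snowstorm = odds_against("0.4")             # snowstorm example: odds against
passing = probability_from_odds(8, 1)       # attendance example: odds of 8 to 1
home_run = probability_from_odds(2, 7)      # Rico's home run: odds of 2:7

print(snowstorm, passing, home_run)
print(f"{float(passing):.0%}, {float(home_run):.0%}")
```

Passing the probability as the string "0.4" lets `Fraction` store it exactly as 2/5, which avoids the small rounding error a binary floating-point 0.4 would carry.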
  • 318. Communicate Your Understanding
1. Explain why the terms odds and probability have different meanings. Give an example to illustrate your answer.
2. Would you prefer the odds in favour of passing your next data management test to be 1:3 or 3:1? Explain your choice.
3. Explain why odds can be greater than 1, but probabilities must be between 0 and 1.

Practise
A
1. Suppose the odds in favour of good weather tomorrow are 3:2.
a) What are the odds against good weather tomorrow?
b) What is the probability of good weather tomorrow?
2. The odds against the Toronto Argonauts winning the Grey Cup are estimated at 19:1. What is the probability that the Argos will win the cup?
3. Determine the odds in favour of rolling each of the following sums with a standard pair of dice.
a) 12 b) 5 or less c) a prime number d) 1
4. Calculate the odds in favour of each event.
a) New Year’s Day falling on a Friday
b) tossing three tails with three coins
c) not tossing exactly two heads with three coins
d) randomly drawing a black 6 from a complete deck of 52 cards
e) a random number from 1 to 9 inclusive being even

Apply, Solve, Communicate
B
5. Greta’s T-shirt drawer contains three tank tops, six V-neck T-shirts, and two sleeveless shirts. If she randomly draws a shirt from the drawer, what are the odds that she will
a) draw a V-neck T-shirt?
b) not draw a tank top?
6. Application If the odds in favour of Boris beating Elena in a chess game are 5 to 4, what is the probability that Elena will win an upset victory in a best-of-five chess tournament?
7. Chapter Problem
a) Based on the randomly tagged sample, what are the odds in favour of a captured deer being a cross-hatched buck?
b) What are the odds against capturing a doe?

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links for more information about Canadian wildlife.
  • 319. 8. The odds against A, by definition, are equivalent to the odds in favour of A′. Use this definition to show that the odds against A are equal to the reciprocal of the odds in favour of A.
9. Application Suppose the odds of the Toronto Maple Leafs winning the Stanley Cup are 1:5, while the odds of the Montréal Canadiens winning the Stanley Cup are 2:13. What are the odds in favour of either Toronto or Montréal winning the Stanley Cup?
10. What are the odds against drawing
a) a face card from a standard deck?
b) two face cards?

ACHIEVEMENT CHECK (Knowledge/Understanding, Thinking/Inquiry/Problem Solving, Communication, Application)
11. Mike has a loaded (or unfair) six-sided die. He rolls the die 200 times and determines the following probabilities for each score:
P(1) = 0.11
P(2) = 0.02
P(3) = 0.18
P(4) = 0.21
P(5) = 0.40
a) What is P(6)?
b) Mike claims that the odds in favour of tossing a prime number with this die are the same as with a fair die. Do you agree with his claim?
c) Using Mike’s die, devise a game with odds in Mike’s favour that an unsuspecting person would be tempted to play. Use probabilities to show that the game is in Mike’s favour. Explain why a person who does not realize that the die is loaded might be tempted by this game.

12. George estimates that there is a 30% chance of rain the next day if he waters the lawn, a 40% chance if he washes the car, and a 50% chance if he plans a trip to the beach. Assuming George’s estimates are accurate, what are the odds
a) in favour of rain tomorrow if he waters the lawn?
b) in favour of rain tomorrow if he washes the car?
c) against rain tomorrow if he plans a trip to the beach?
C
13. Communication A volleyball coach claims that at the next game, the odds of her team winning are 3:1, the odds against losing are 5:1, and the odds against a tie are 7:1. Are these odds possible? Explain your reasoning.
14. Inquiry/Problem Solving Aki is a participant on a trivia-based game show. He has an equal likelihood on any given trial of being asked a question from one of six categories: Hollywood, Strange Places, Number Fun, Who?, Having a Ball, and Write On! Aki feels that he has a 50/50 chance of getting Having a Ball or Strange Places questions correct, but thinks he has a 90% probability of getting any of the other questions right. If Aki has to get two of three questions correct, what are his odds of winning?
15. Inquiry/Problem Solving Use logic and mathematical reasoning to show that if the odds in favour of A are given by $\frac{h}{k}$, then $P(A) = \frac{h}{h+k}$. Support your reasoning with an example.
  • 320. 6.3 Probabilities Using Counting Techniques
How likely is it that, in a game of cards, you will be dealt just the hand that you need? Most card players accept this question as an unknown, enjoying the unpredictability of the game, but it can also be interesting to apply counting analysis to such problems. In some situations, the possible outcomes are not easy or convenient to count individually. In many such cases, the counting techniques of permutations and combinations (see Chapters 4 and 5, respectively) can be helpful for calculating theoretical probabilities, or you can use a simulation to determine an empirical probability.

INVESTIGATE & INQUIRE: Fishing Simulation
Suppose a pond has only three types of fish: catfish, trout, and bass, in the ratio 5:2:3. There are 50 fish in total. Assuming you are allowed to catch only three fish before throwing them back, consider the following two events:
• event A = {catching three trout}
• event B = {catching the three types of fish, in alphabetical order}
1. Carry out the following probability experiment, independently or with a partner. You can use a hat or paper bag to represent the pond, and some differently coloured chips or markers to represent the fish. How many of each type of fish should you release into the pond? Count out the appropriate numbers and shake the container to simulate the fish swimming around.
2. Draw a tree diagram to illustrate the different possible outcomes of this experiment.
3. Catch three fish, one at a time, and record the results in a table. Replace all three fish and shake the container enough to ensure that they are randomly distributed. Repeat this process for a total of ten trials.
4. Based on these ten trials, determine the empirical probability of event A, catching three trout. How accurate do you think this value is? Compare your results with those of the rest of the class. How can you obtain a more accurate empirical probability?
5. Repeat step 4 for event B, which is to catch a bass, catfish, and trout in order.
  • 321. 6. Perform step 3 again for 10 new trials. Calculate the empirical probabilities of events A and B, based on your 20 trials. Do you think these probabilities are more accurate than those from 10 trials? Explain why or why not.
7. If you were to repeat the experiment for 50 or 100 trials, would your results be more accurate? Why or why not?
8. In this investigation, you knew exactly how many of each type of fish were in the pond because they were counted out at the beginning. Describe how you could use the techniques of this investigation to estimate the ratios of different species in a real pond.

This section examines methods for determining the theoretical probabilities of successive or multiple events.

Example 1 Using Permutations
Two brothers enter a race with five friends. The racers draw lots to determine their starting positions. What is the probability that the older brother will start in lane 1 with his brother beside him in lane 2?

Solution
A permutation ${}_nP_r$, or P(n, r), is the number of ways to select r objects from a set of n objects, in a certain order. (See Chapter 4 for more about permutations.) The sample space is the total number of ways the first two lanes can be occupied. Thus,
$n(S) = {}_7P_2 = \frac{7!}{(7-2)!} = \frac{7!}{5!} = \frac{7 \times 6 \times 5!}{5!} = 42$
The specific outcome of the older brother starting in lane 1 and the younger brother starting in lane 2 can only happen one way, so n(A) = 1. Therefore,
$P(A) = \frac{n(A)}{n(S)} = \frac{1}{42}$
The probability that the older brother will start in lane 1 next to his brother in lane 2 is $\frac{1}{42}$, or approximately 2.4%.
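The count in Example 1 can be cross-checked in Python, both with the permutation formula and by listing every way to fill lanes 1 and 2. The racer labels are placeholders invented for this sketch, and `math.perm` assumes Python 3.8 or later.

```python
import math
from fractions import Fraction
from itertools import permutations

# Seven racers: the two brothers plus five friends (labels are placeholders).
racers = ["older brother", "younger brother"] + [f"friend {i}" for i in range(1, 6)]

# n(S) from the formula 7P2 = 7! / (7 - 2)!.
n_s = math.perm(len(racers), 2)

# Brute-force check: every ordered pair (lane 1, lane 2).
lane_assignments = list(permutations(racers, 2))

# Event A: older brother in lane 1 and younger brother in lane 2.
favourable = [pair for pair in lane_assignments
              if pair == ("older brother", "younger brother")]

p_a = Fraction(len(favourable), len(lane_assignments))
print(n_s, p_a, round(float(p_a), 4))
```

Listing the outcomes is only practical for small cases like this one; for larger problems the formula is the realistic option.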
  • 322. Example 2 Probability Using Combinations
A focus group of three members is to be randomly selected from a medical team consisting of five doctors and seven technicians.
a) What is the probability that the focus group will be comprised of doctors only?
b) What is the probability that the focus group will not be comprised of doctors only?

Solution
a) A combination ${}_nC_r$, also written C(n, r) or $\binom{n}{r}$, is the number of ways to select r objects from a set of n objects, in any order. (See Chapter 5 for more about combinations.) Let event A be selecting three doctors to form the focus group. The number of possible ways to make this selection is
$n(A) = {}_5C_3 = \frac{5!}{3!(5-3)!} = \frac{5 \times 4 \times 3!}{3! \times 2!} = \frac{20}{2} = 10$
However, the focus group can consist of any three people from the team of 12.
$n(S) = {}_{12}C_3 = \frac{12!}{3!(12-3)!} = \frac{12 \times 11 \times 10 \times 9!}{3! \times 9!} = \frac{1320}{6} = 220$
The probability of selecting a focus group of doctors only is
$P(A) = \frac{n(A)}{n(S)} = \frac{10}{220} = \frac{1}{22}$
The probability of selecting a focus group consisting of three doctors is $\frac{1}{22}$, or approximately 0.045.
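The combination counts in part a) can be reproduced with Python's `math.comb` (available from Python 3.8). The variable names below are chosen for this sketch only.

```python
import math
from fractions import Fraction

doctors, technicians, group_size = 5, 7, 3

# n(A): ways to choose all three members from the five doctors.
n_a = math.comb(doctors, group_size)

# n(S): ways to choose any three members from the whole team of 12.
n_s = math.comb(doctors + technicians, group_size)

p_doctors_only = Fraction(n_a, n_s)
print(n_a, n_s, p_doctors_only, round(float(p_doctors_only), 3))
```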
  • 323. b) Either the focus group is comprised of doctors only, or it is not. Therefore, the probability of the complement of A, P(A′), gives the desired result.
$P(A') = 1 - P(A) = 1 - \frac{1}{22} = \frac{21}{22}$
So, the probability of selecting a focus group not comprised of doctors only is $\frac{21}{22}$, or approximately 0.955.

Project Prep
When you determine the classical probabilities for your probability project, you may need to apply the counting techniques of permutations and combinations.

Example 3 Probability Using the Fundamental Counting Principle
What is the probability that two or more students out of a class of 24 will have the same birthday? Assume that no students were born on February 29.

Solution 1 Using Pencil and Paper
The simplest method is to find the probability of the complementary event that no two people in the class have the same birthday. Pick two students at random. The second student has a different birthday than the first for 364 of the 365 possible birthdays. Thus, the probability that the two students have different birthdays is $\frac{364}{365}$. Now add a third student. Since there are 363 ways this person can have a different birthday from the other two students, the probability that all three students have different birthdays is $\frac{364}{365} \times \frac{363}{365}$. Continuing this process, the probability that none of the 24 people have the same birthday is
$P(A') = \frac{n(A')}{n(S)} = \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \cdots \times \frac{342}{365} \approx 0.462$
$P(A) = 1 - P(A') = 1 - 0.462 = 0.538$
The probability that at least two people in the group have the same birthday is approximately 0.538.
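The product in Solution 1 is tedious by hand, so here is a minimal Python sketch of the same calculation. It keeps the example's assumptions (365 equally likely birthdays, no February 29); the function name is hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math


def p_shared_birthday(group_size, days=365):
    """Probability that at least two people in the group share a birthday."""
    # Complement: everyone has a different birthday.
    # Person i (counting from 0) must avoid the i birthdays already taken.
    p_all_different = math.prod((days - i) / days for i in range(group_size))
    return 1 - p_all_different


p_no_match = 1 - p_shared_birthday(24)
p_match = p_shared_birthday(24)
print(round(p_no_match, 3), round(p_match, 3))
```

The first factor is 365/365 = 1 for the first student, so the product has the same value as the one that starts at 364/365.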
  • 324. Solution 2 Using a Graphing Calculator
Use the iterative functions of a graphing calculator to evaluate the formula above much more easily. The prod( function on the LIST MATH menu will find the product of a series of numbers. The seq( function on the LIST OPS menu generates a sequence for the range you specify. Combining these two functions allows you to calculate the probability in a single step.

Key Concepts
• In probability experiments with many possible outcomes, you can apply the fundamental counting principle and techniques using permutations and combinations.
• Permutations are useful when order is important in the outcomes; combinations are useful when order is not important.

Communicate Your Understanding
1. In the game of bridge, each player is dealt 13 cards out of the deck of 52. Explain how you would determine the probability of a player receiving
a) all hearts
b) all hearts in ascending order
2. a) When should you apply permutations in solving probability problems, and when should you apply combinations?
b) Provide an example of a situation where you would apply permutations to solve a probability problem, other than those in this section.
c) Provide an example of a situation where you would apply combinations to solve a probability problem, other than those in this section.

Practise
A
1. Four friends, two females and two males, are playing contract bridge. Partners are randomly assigned for each game. What is the probability that the two females will be partners for the first game?
2. What is the probability that at least two out of a group of eight friends will have the same birthday?
3. A fruit basket contains five red apples and three green apples. Without looking, you randomly select two apples. What is the probability that
a) you will select two red apples?
b) you will not select two green apples?
4. Refer to Example 1. What is the probability that the two brothers will start beside each other in any pair of lanes?
  • 325. Apply, Solve, Communicate
B
5. An athletic committee with three members is to be randomly selected from a group of six gymnasts, four weightlifters, and eight long-distance runners. Determine the probability that
a) the committee is comprised entirely of runners
b) the committee is represented by each of the three types of athletes
6. A messy drawer contains three black socks, five blue socks, and eight white socks, none of which are paired up. If the owner grabs two socks without looking, what is the probability that both will be white?
7. a) A family of nine has a tradition of drawing two names from a hat to see whom they will each buy presents for. If there are three sisters in the family, and the youngest sister is always allowed the first draw, determine the probability that the youngest sister will draw both of the other two sisters’ names. If she draws her own name, she replaces it and draws another.
b) Suppose that the tradition is modified one year, so that the first person whose name is drawn is to receive a “main” present, and the second a less expensive, “fun” present. Determine the probability that the youngest sister will give a main present to the middle sister and a fun present to the eldest sister.
8. Application
a) Laura, Dave, Monique, Marcus, and Sarah are going to a party. What is the probability that two of the girls will arrive first?
b) What is the probability that the friends will arrive in order of ascending age?
c) What assumptions must be made in parts a) and b)?
9. A hockey team has two goalies, six defenders, eight wingers, and four centres. If the team randomly selects four players to attend a charity function, what is the likelihood that
a) they are all wingers?
b) no goalies or centres are selected?
10. Application A lottery promises to award ten grand-prize trips to Hawaii and sells 5 400 000 tickets.
a) Determine the probability of winning a grand prize if you buy
i) 1 ticket
ii) 10 tickets
iii) 100 tickets
b) Communication How many tickets do you need to buy in order to have a 5% chance of winning a grand prize? Do you think this strategy is sensible? Why or why not?
c) How many tickets do you need to ensure a 50% chance of winning?
11. Suki is enrolled in one data-management class at her school and Leo is in another. A school quiz team will have four volunteers, two randomly selected from each of the two classes. Suki is one of five volunteers from her class, and Leo is one of four volunteers from his. Calculate the probability of the two being on the team and explain the steps in your calculation.
  • 326. 12. Chapter Problem
a) Suppose 4 of the 22 tagged bucks are randomly chosen for a behaviour study. What is the probability that
i) all four bucks have the cross-hatched antlers?
ii) at least one buck has cross-hatched antlers?
b) If two of the seven cross-hatched males are randomly selected for a health study, what is the probability that the eldest of the seven will be selected first, followed by the second eldest?

ACHIEVEMENT CHECK (Knowledge/Understanding, Thinking/Inquiry/Problem Solving, Communication, Application)
13. Suppose a bag contains the letters to spell probability.
a) How many four-letter arrangements are possible using these letters?
b) What is the probability that Barb chooses four letters from the bag in the order that spell her name?
c) Pick another four-letter arrangement and calculate the probability that it is chosen.
d) What four-letter arrangement would be most likely to be picked? Explain your reasoning.
C
14. Communication Refer to the fishing investigation at the beginning of this section.
a) Determine the theoretical probability of
i) catching three trout
ii) catching a bass, catfish, and trout in alphabetical order
b) How do these results compare with the empirical probabilities from the investigation? How do you account for any differences?
c) Could the random-number generator of a graphing calculator be used to simulate this investigation? If so, explain how. If not, explain why.
d) Outline the steps you would use to model this problem with software such as Fathom™ or a spreadsheet.
e) Is the assumption that the fish are randomly distributed likely to be completely correct? Explain. What other assumptions might affect the accuracy of the calculated probabilities?
15. A network of city streets forms square blocks as shown in the diagram.
[Diagram: street grid of square blocks with the Library and the Pool labelled]
Jeanine leaves the library and walks toward the pool at the same time as Miguel leaves the pool and walks toward the library. Neither person follows a particular route, except that both are always moving toward their destination. What is the probability that they will meet if they both walk at the same rate?
16. Inquiry/Problem Solving A committee is formed by randomly selecting from eight nurses and two doctors. What is the minimum committee size that ensures at least a 90% probability that it will not be comprised of nurses only?
  • 327. 6.4 Dependent and Independent Events
If you have two examinations next Tuesday, what is the probability that you will pass both of them? How can you predict the risk that a critical network server and its backup will both fail? If you flip an ordinary coin repeatedly and get heads 99 times in a row, is the next toss almost certain to come up tails? In such situations, you are dealing with compound events involving two or more separate events.

INVESTIGATE & INQUIRE: Getting Out of Jail in MONOPOLY®
While playing MONOPOLY® for the first time, Kenny finds himself in jail. To get out of jail, he needs to roll doubles on a pair of standard dice.
1. Determine the probability that Kenny will roll doubles on his first try.
2. Suppose that Kenny fails to roll doubles on his first two turns in jail. He reasons that on his next turn, his odds are now 50/50 that he will get out of jail. Explain how Kenny has reasoned this.
3. Do you agree or disagree with Kenny’s reasoning? Explain.
4. What is the probability that Kenny will get out of jail on his third attempt?
5. After how many turns is Kenny certain to roll doubles? Explain.
6. Kenny’s opponent, Roberta, explains to Kenny that each roll of the dice is an independent event and that, since the relatively low probability of rolling doubles never changes from trial to trial, Kenny may never get out of jail and may as well just forfeit the game. Explain the flaws in Roberta’s analysis.
  • 328. In some situations involving compound events, the occurrence of one event has no effect on the occurrence of another. In such cases, the events are independent.

Example 1 Simple Independent Events
a) A coin is flipped and turns up heads. What is the probability that the second flip will turn up heads?
b) A coin is flipped four times and turns up heads each time. What is the probability that the fifth trial will be heads?
c) Find the probability of tossing five heads in a row.
d) Comment on any difference between your answers to parts b) and c).

Solution
a) Because these events are independent, the outcome of the first toss has no effect on the outcome of the second toss. Therefore, the probability of tossing heads the second time is 0.5.
b) Although you might think “tails has to come up sometime,” there is still a 50/50 chance on each independent toss. The coin has no memory of the past four trials! Therefore, the fifth toss still has just a 0.5 probability of coming up heads.
c) Construct a tree diagram to represent five tosses of the coin.
[Tree diagram: five successive tosses, each branching into H and T; only the outcomes in which the first flip turns up heads are drawn. There is an equal number of outcomes in which the first flip turns up tails.]
  • 329. The number of outcomes doubles with each trial. After the fifth toss, there are $2^5$ or 32 possible outcomes, only one of which is five heads in a row. So, the probability of five heads in a row, prior to any coin tosses, is $\frac{1}{32}$ or 0.031 25.
d) The probability in part c) is much less than in part b). In part b), you calculate only the probability for the fifth trial on its own. In part c), you are finding the probability that every one of five separate events actually happens.

Example 2 Probability of Two Different Independent Events
A coin is flipped while a die is rolled. What is the probability of flipping heads and rolling 5 in a single trial?

Solution
Here, two independent events occur in a single trial. Let A be the event of flipping heads, and B be the event of rolling 5. The notation P(A and B) represents the compound, or joint, probability that both events, A and B, will occur simultaneously. For independent events, the probabilities can simply be multiplied together.
$P(A \text{ and } B) = P(A) \times P(B) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12}$
The probability of simultaneously flipping heads while rolling 5 is $\frac{1}{12}$ or approximately 8.3%.

In general, the compound probability of two independent events can be calculated using the product rule for independent events:
$P(A \text{ and } B) = P(A) \times P(B)$
From the example above, you can see that the product rule for independent events agrees with common sense. The product rule can also be derived mathematically from the fundamental counting principle (see Chapter 4).
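One way to see that the product rule agrees with direct counting is to list the combined sample space and count outcomes. The following Python sketch does this for the coin-and-die trial and for five coin tosses; all names in it are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Combined sample space: every (coin, die) pair, 2 x 6 outcomes.
sample_space = list(product(coin, die))

# Direct count of the joint event "heads and 5".
n_joint = sum(1 for c, d in sample_space if c == "H" and d == 5)
p_joint = Fraction(n_joint, len(sample_space))

# Product rule for independent events: P(A) x P(B).
p_product = Fraction(1, 2) * Fraction(1, 6)

# Five tosses in a row: 2^5 equally likely sequences, one of them all heads.
five_tosses = list(product(coin, repeat=5))
n_all_heads = sum(1 for seq in five_tosses if all(t == "H" for t in seq))
p_five_heads = Fraction(n_all_heads, len(five_tosses))

print(p_joint, p_product, len(five_tosses), p_five_heads)
```

The match between `p_joint` and `p_product` reflects the counting argument: the number of combined outcomes is the product of the sizes of the two separate sample spaces.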
  • 330. Proof: A and B are separate events and so they correspond to separate sample spaces, $S_A$ and $S_B$. Their probabilities are thus

$$P(A) = \frac{n(A)}{n(S_A)} \quad \text{and} \quad P(B) = \frac{n(B)}{n(S_B)}.$$

Call the sample space for the compound event S, as usual. You know that

$$P(A \text{ and } B) = \frac{n(A \text{ and } B)}{n(S)} \qquad (1)$$

Because A and B are independent, you can apply the fundamental counting principle to get an expression for n(A and B).

$$n(A \text{ and } B) = n(A) \times n(B) \qquad (2)$$

Similarly, you can also apply the fundamental counting principle to get an expression for n(S).

$$n(S) = n(S_A) \times n(S_B) \qquad (3)$$

Substitute equations (2) and (3) into equation (1).

$$P(A \text{ and } B) = \frac{n(A)\,n(B)}{n(S_A)\,n(S_B)} = \frac{n(A)}{n(S_A)} \times \frac{n(B)}{n(S_B)} = P(A) \times P(B)$$

Example 3 Applying the Product Rule for Independent Events

Soo-Ling travels the same route to work every day. She has determined that there is a 0.7 probability that she will wait for at least one red light and that there is a 0.4 probability that she will hear her favourite new song on her way to work.

a) What is the probability that Soo-Ling will not have to wait at a red light and will hear her favourite song?
b) What are the odds in favour of Soo-Ling having to wait at a red light and not hearing her favourite song?
  • 331. Solution

a) Let A be the event of Soo-Ling having to wait at a red light, and B be the event of hearing her favourite song. Assume A and B to be independent events. In this case, you are interested in the combination A′ and B.

P(A′ and B) = P(A′) × P(B)
            = (1 − P(A)) × P(B)
            = (1 − 0.7) × 0.4
            = 0.12

There is a 12% chance that Soo-Ling will hear her favourite song and not have to wait at a red light on her way to work.

b) P(A and B′) = P(A) × P(B′)
               = P(A) × (1 − P(B))
               = 0.7 × (1 − 0.4)
               = 0.42

The probability of Soo-Ling having to wait at a red light and not hearing her favourite song is 42%. The odds in favour of this happening are

$$\text{odds in favour} = \frac{P(A \text{ and } B')}{1 - P(A \text{ and } B')} = \frac{42\%}{100\% - 42\%} = \frac{42}{58} = \frac{21}{29}$$

The odds in favour of Soo-Ling having to wait at a red light and not hearing her favourite song are 21:29.

In some cases, the probable outcome of an event, B, depends directly on the outcome of another event, A. When this happens, the events are said to be dependent. The conditional probability of B, P(B | A), is the probability that B occurs, given that A has already occurred.

Example 4 Probability of Two Dependent Events

A professional hockey team has eight wingers. Three of these wingers are 30-goal scorers, or “snipers.” Every fall the team plays an exhibition match with the club’s farm team. In order to make the match more interesting for the fans, the coaches agree to select two wingers at random from the pro team to play for the farm team. What is the probability that two snipers will play for the farm team?
  • 332. Solution

Let A = {first winger is a sniper} and B = {second winger is a sniper}. Three of the eight wingers are snipers, so the probability of the first winger selected being a sniper is

$$P(A) = \frac{3}{8}$$

If the first winger selected is a sniper, then there are seven remaining wingers to choose from, two of whom are snipers. Therefore,

$$P(B \mid A) = \frac{2}{7}$$

Applying the fundamental counting principle, the probability of randomly selecting two snipers for the farm team is the number of ways of selecting two snipers divided by the number of ways of selecting any two wingers.

$$P(A \text{ and } B) = \frac{3 \times 2}{8 \times 7} = \frac{3}{28}$$

There is a $\frac{3}{28}$ or 10.7% probability that two professional snipers will play for the farm team in the exhibition game.

Notice in Example 4 that, when two events A and B are dependent, you can still multiply probabilities to find the probability that they both happen. However, you must use the conditional probability for the second event. Thus, the probability that both events will occur is given by the product rule for dependent events:

P(A and B) = P(A) × P(B | A)

This reads as: “The probability that both A and B will occur equals the probability of A times the probability of B given that A has occurred.”

Project Prep: When designing your game for the probability project, you may decide to include situations involving independent or dependent events. If so, you will need to apply the appropriate product rule in order to determine classical probabilities.

Example 5 Conditional Probability From Compound Probability

Serena’s computer sometimes crashes while she is trying to use her e-mail program, OutTake. When OutTake “hangs” (stops responding to commands), Serena is usually able to close OutTake without a system crash. In a computer magazine, she reads that the probability of OutTake hanging in any 15-min period is 2.5%, while the chance of OutTake and the operating system failing together in any 15-min period is 1%. If OutTake is hanging, what is the probability that the operating system will crash?
  • 333. Solution

Let event A be OutTake hanging, and event B be an operating system failure. Since event A can trigger event B, the two events are dependent. In fact, you need to find the conditional probability P(B | A). The data from the magazine tells you that P(A) = 2.5%, and P(A and B) = 1%. Therefore,

$$P(A \text{ and } B) = P(A) \times P(B \mid A)$$
$$1\% = 2.5\% \times P(B \mid A)$$
$$P(B \mid A) = \frac{1\%}{2.5\%} = 0.4$$

There is a 40% chance that the operating system will crash when OutTake is hanging.

Example 5 suggests a useful rearrangement of the product rule for dependent events.

$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$

This equation is sometimes used to define the conditional probability P(B | A).

Key Concepts

• If A and B are independent events, then the probability of both occurring is given by P(A and B) = P(A) × P(B).
• If event B is dependent on event A, then the conditional probability of B given A is P(B | A). In this case, the probability of both events occurring is given by P(A and B) = P(A) × P(B | A).

Communicate Your Understanding

1. Consider the probability of randomly drawing an ace from a standard deck of cards. Discuss whether or not successive trials of this experiment are independent or dependent events. Consider cases in which drawn cards are
   a) replaced after each trial
   b) not replaced after each trial
2. Suppose that for two particular events A and B, it is true that P(B | A) = P(B). What does this imply about the two events? (Hint: Try substituting this equation into the product rule for dependent events.)
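The rearranged product rule can be checked against the numbers in Examples 4 and 5. This is a minimal sketch, assuming the two wingers are picked one after the other without replacement; the winger labels and the helper `is_sniper` are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction
from itertools import permutations

# Eight wingers, three of them snipers (labels starting with "S").
wingers = ["S1", "S2", "S3", "W1", "W2", "W3", "W4", "W5"]

# Ordered selections of two different wingers: 8 x 7 = 56 outcomes.
picks = list(permutations(wingers, 2))

def is_sniper(name):
    return name.startswith("S")

n_a = sum(1 for first, _ in picks if is_sniper(first))
n_a_and_b = sum(1 for first, second in picks
                if is_sniper(first) and is_sniper(second))

p_a = Fraction(n_a, len(picks))              # P(A)
p_a_and_b = Fraction(n_a_and_b, len(picks))  # P(A and B)
p_b_given_a = p_a_and_b / p_a                # P(B | A) = P(A and B) / P(A)

print(p_a, p_b_given_a, p_a_and_b)  # 3/8 2/7 3/28

# Example 5: P(A) = 2.5% and P(A and B) = 1%.
p_crash_given_hang = Fraction(1, 100) / Fraction(25, 1000)
print(float(p_crash_given_hang))  # 0.4
```

Counting ordered pairs keeps the enumeration in step with the "first winger, second winger" wording of the solution.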
  • 334. Practise

A

1. Classify each of the following as independent or dependent events.

   First Event | Second Event
   a) Attending a rock concert on Tuesday night | Passing a final examination the following Wednesday morning
   b) Eating chocolate | Winning at checkers
   c) Having blue eyes | Having poor hearing
   d) Attending an employee training session | Improving personal productivity
   e) Graduating from university | Running a marathon
   f) Going to a mall | Purchasing a new shirt

2. Amitesh estimates that he has a 70% chance of making the basketball team and a 20% chance of having failed his last geometry quiz. He defines a “really bad day” as one in which he gets cut from the team and fails his quiz. Assuming that Amitesh will receive both pieces of news tomorrow, how likely is it that he will have a really bad day?

3. In the popular dice game Yahtzee®, a Yahtzee occurs when five identical numbers turn up on a set of five standard dice. What is the probability of rolling a Yahtzee on one roll of the five dice?

Apply, Solve, Communicate

B

4. There are two tests for a particular antibody. Test A gives a correct result 95% of the time. Test B is accurate 89% of the time. If a patient is given both tests, find the probability that
   a) both tests give the correct result
   b) neither test gives the correct result
   c) at least one of the tests gives the correct result

5. a) Rocco and Biff are two koala bears participating in a series of animal behaviour tests. They each have 10 min to solve a maze. Rocco has an 85% probability of succeeding if he can smell the eucalyptus treat at the other end. He can smell the treat 60% of the time. Biff has a 70% chance of smelling the treat, but when he does, he can solve the maze only 75% of the time. Neither bear will try to solve the maze unless he smells the eucalyptus. Determine which koala bear is more likely to enjoy a tasty treat on any given trial.
   b) Communication Explain how you arrived at your conclusion.

6. Shy Tenzin’s friends assure him that if he asks Mikala out on a date, there is an 85% chance that she will say yes. If there is a 60% chance that Tenzin will summon the courage to ask Mikala out to the dance next week, what are the odds that they will be seen at the dance together?

7. When Ume’s hockey team uses a “rocket launch” breakout, she has a 55% likelihood of receiving a cross-ice pass while at full speed. When she receives such a pass, the probability of getting her slapshot away is $\frac{1}{3}$. Ume’s slapshot scores 22% of the time. What is the probability of Ume scoring with her slapshot when her team tries a rocket launch?

8. Inquiry/Problem Solving Show that if A and B are dependent events, then the conditional probability P(A | B) is given by

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}.$$
  • 335. 9. A consultant’s study found Megatran’s call centre had a 5% chance of transferring a call about schedules to the lost articles department by mistake. The same study shows that, 1% of the time, customers calling for schedules have to wait on hold, only to discover that they have been mistakenly transferred to the lost articles department. What are the chances that a customer transferred to lost articles will be put on hold?

10. Pinder has examinations coming up in data management and biology. He estimates that his odds in favour of passing the data-management examination are 17:3 and his odds against passing the biology examination are 3:7. Assume these to be independent events.
   a) What is the probability that Pinder will pass both exams?
   b) What are the odds in favour of Pinder failing both exams?
   c) What factors could make these two events dependent?

11. Inquiry/Problem Solving How likely is it for a group of five friends to have the same birth month? State any assumptions you make for your calculation.

12. (Chapter Problem) Determine the probability that a captured deer has the bald patch condition.

13. Communication Five different CD-ROM games, Garble, Trapster, Zoom!, Bungie, and Blast ’Em, are offered as a promotion by SugarRush cereals. One game is randomly included with each box of cereal.
   a) Determine the probability of getting all 5 games if 12 boxes are purchased.
   b) Explain the steps in your solution.
   c) Discuss any assumptions that you make in your analysis.

14. Application A critical circuit in a communication network relies on a set of eight identical relays. If any one of the relays fails, it will disrupt the entire network. The design engineer must ensure a 90% probability that the network will not fail over a five-year period. What is the maximum tolerable probability of failure for each relay?

C

15. a) Show that if a coin is tossed n times, the probability of tossing n heads is given by $P(A) = \left(\frac{1}{2}\right)^n$.
   b) What is the probability of getting at least one tail in seven tosses?

16. What is the probability of not throwing 7 or doubles for six consecutive throws with a pair of dice?

17. Laurie, an avid golfer, gives herself a 70% chance of breaking par (scoring less than 72 on a round of 18 holes) if the weather is calm, but only a 15% chance of breaking par on windy days. The weather forecast gives a 40% probability of high winds tomorrow. What is the likelihood that Laurie will break par tomorrow, assuming that she plays one round of golf?

18. Application The Tigers are leading the Storm one game to none in a best-of-five playoff series. After a playoff win, the probability of the Tigers winning the next game is 60%, while after a loss, their probability of winning the next game drops by 5%. The first team to win three games takes the series. Assume there are no ties. What is the probability of the Storm coming back to win the series?
  • 336. 6.5 Mutually Exclusive Events

The phone rings. Jacques is really hoping that it is one of his friends calling about either softball or band practice. Could the call be about both? In such situations, more than one event could occur during a single trial. You need to compare the events in terms of the outcomes that make them up. What is the chance that at least one of the events happens? Is the situation “either/or,” or can both events occur?

INVESTIGATE & INQUIRE: Baseball Pitches

Marie, at bat for the Coyotes, is facing Anton, who is pitching for the Power Trippers. Anton uses three pitches: a fastball, a curveball, and a slider. Marie feels she has a good chance of making a base hit, or better, if Anton throws either a fastball or a slider. The count is two strikes and three balls. In such full-count situations, Anton goes to his curveball one third of the time, his slider half as often, and his fastball the rest of the time.

1. Determine the probability of Anton throwing his
   a) curveball
   b) slider
   c) fastball
2. a) What is the probability that Marie will get the pitch she does not want?
   b) Explain how you can use this information to determine the probability that Marie will get a pitch she likes.
3. a) Show another method of determining this probability.
   b) Explain your method.
4. What do your answers to questions 2 and 3 suggest about the probabilities of events that cannot happen simultaneously?

The possible events in this investigation are said to be mutually exclusive (or disjoint) since they cannot occur at the same time. The pitch could not be both a fastball and a slider, for example. In this particular problem, you were interested in the probability of either of two favourable events. You can use the notation P(A or B) to stand for the probability of either A or B occurring.
  • 337. Example 1 Probability of Mutually Exclusive Events

Teri attends a fundraiser at which 15 T-shirts are being given away as door prizes. Door prize winners are randomly given a shirt from a stock of 2 black shirts, 4 blue shirts, and 9 white shirts. Teri really likes the black and blue shirts, but is not too keen on the white ones. Assuming that Teri wins the first door prize, what is the probability that she will get a shirt that she likes?

Solution

Let A be the event that Teri wins a black shirt, and B be the event that she wins a blue shirt.

$$P(A) = \frac{2}{15} \quad \text{and} \quad P(B) = \frac{4}{15}$$

Teri would be happy if either A or B occurred. There are 2 + 4 = 6 non-white shirts, so

$$P(A \text{ or } B) = \frac{6}{15} = \frac{2}{5}$$

The probability of Teri winning a shirt that she likes is $\frac{2}{5}$ or 40%. Notice that this probability is simply the sum of the probabilities of the two mutually exclusive events.

When events A and B are mutually exclusive, the probability that A or B will occur is given by the addition rule for mutually exclusive events:

P(A or B) = P(A) + P(B)

A Venn diagram shows mutually exclusive events as non-overlapping, or disjoint. Thus, you can apply the additive counting principle (see Chapter 4) to prove this rule.

[Venn diagram: sample space S containing two non-overlapping sets, A and B.]

Proof: If A and B are mutually exclusive events, then

$$P(A \text{ or } B) = \frac{n(A \text{ or } B)}{n(S)}$$
$$= \frac{n(A) + n(B)}{n(S)} \qquad \text{(A and B are disjoint sets, and thus share no elements.)}$$
$$= \frac{n(A)}{n(S)} + \frac{n(B)}{n(S)}$$
$$= P(A) + P(B)$$
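The addition rule for mutually exclusive events can be confirmed on the T-shirt stock from Example 1. This is a minimal sketch, assuming each of the 15 shirts is equally likely to be handed out; the helper `prob` is a hypothetical name used only here.

```python
from fractions import Fraction

# Stock of door-prize shirts from Example 1.
shirts = ["black"] * 2 + ["blue"] * 4 + ["white"] * 9

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(sum(1 for s in shirts if event(s)), len(shirts))

p_black = prob(lambda s: s == "black")
p_blue = prob(lambda s: s == "blue")
p_black_or_blue = prob(lambda s: s in ("black", "blue"))

# No shirt is both black and blue, so the two probabilities simply add.
print(p_black, p_blue, p_black_or_blue)  # 2/15 4/15 2/5
```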
  • 338. In some situations, events are non-mutually exclusive, which means that they can occur simultaneously. For example, consider a board game in which you need to roll either an 8 or doubles, using two dice.

                 Second die
                 1   2   3   4   5   6
   First die 1   2   3   4   5   6   7
             2   3   4   5   6   7   8
             3   4   5   6   7   8   9
             4   5   6   7   8   9  10
             5   6   7   8   9  10  11
             6   7   8   9  10  11  12

Notice that in one outcome, rolling two fours, both events have occurred simultaneously. Hence, these events are not mutually exclusive. Counting the outcomes in the diagram shows that the probability of rolling either an 8 or doubles is $\frac{10}{36}$ or $\frac{5}{18}$. You need to take care not to count the (4, 4) outcome twice. You are applying the principle of inclusion and exclusion, which was explained in greater detail in Chapter 5.

Example 2 Probability of Non-Mutually Exclusive Events

A card is randomly selected from a standard deck of cards. What is the probability that either a heart or a face card (jack, queen, or king) is selected?

Solution

Let event A be that a heart is selected, and event B be that a face card is selected.

$$P(A) = \frac{13}{52} \quad \text{and} \quad P(B) = \frac{12}{52}$$

If you add these probabilities, you get

$$P(A) + P(B) = \frac{13}{52} + \frac{12}{52} = \frac{25}{52}$$

However, since the jack, queen, and king of hearts are in both A and B, the sum P(A) + P(B) actually includes these outcomes twice.

   A♣ 2♣ 3♣ 4♣ 5♣ 6♣ 7♣ 8♣ 9♣ 10♣ J♣ Q♣ K♣
   A♦ 2♦ 3♦ 4♦ 5♦ 6♦ 7♦ 8♦ 9♦ 10♦ J♦ Q♦ K♦
   A♥ 2♥ 3♥ 4♥ 5♥ 6♥ 7♥ 8♥ 9♥ 10♥ J♥ Q♥ K♥
   A♠ 2♠ 3♠ 4♠ 5♠ 6♠ 7♠ 8♠ 9♠ 10♠ J♠ Q♠ K♠

Based on the diagram, the actual theoretical probability of drawing either a heart or a face card is $\frac{22}{52}$, or $\frac{11}{26}$. You can find the correct value by subtracting the probability of selecting the three elements that were counted twice.
  • 339.

$$P(A \text{ or } B) = \frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}$$

[Venn diagram: sample space S with two overlapping sets, Hearts ($P = \frac{13}{52}$) and Face card ($P = \frac{12}{52}$); the overlap, Heart and face card, has $P = \frac{3}{52}$.]

The probability that either a heart or a face card is selected is $\frac{11}{26}$.

When events A and B are non-mutually exclusive, the probability that A or B will occur is given by the addition rule for non-mutually exclusive events:

P(A or B) = P(A) + P(B) − P(A and B)

[Venn diagram: sample space S with overlapping sets A and B; the overlap is labelled A and B.]

Example 3 Applying the Addition Rule for Non-Mutually Exclusive Events

An electronics manufacturer is testing a new product to see whether it requires a surge protector. The tests show that a voltage spike has a 0.2% probability of damaging the product’s power supply, a 0.6% probability of damaging downstream components, and a 0.1% probability of damaging both the power supply and other components. Determine the probability that a voltage spike will damage the product.

Project Prep: When analysing the possible outcomes for your game in the probability project, you may need to consider mutually exclusive or non-mutually exclusive events. If so, you will need to apply the appropriate addition rule to determine theoretical probabilities.

Solution

Let A be damage to the power supply and C be damage to other components.

[Venn diagram: sample space S with overlapping sets A and C, labelled 0.2, 0.1, and 0.6.]

The overlapping region represents the probability that a voltage surge damages both the power supply and another component. The probability that either A or C occurs is given by

P(A or C) = P(A) + P(C) − P(A and C)
          = 0.2% + 0.6% − 0.1%
          = 0.7%

There is a 0.7% probability that a voltage spike will damage the product.
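The addition rule for non-mutually exclusive events can be verified by counting sets directly. The sketch below, which assumes a standard 52-card deck and two fair dice, compares a direct count of "A or B" with P(A) + P(B) − P(A and B); all names in it are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

hearts = {card for card in deck if card[1] == "hearts"}
faces = {card for card in deck if card[0] in ("J", "Q", "K")}

def p(event, space):
    """Classical probability of an event within a finite sample space."""
    return Fraction(len(event), len(space))

# Direct count of the union versus the addition rule.
direct = p(hearts | faces, deck)
by_rule = p(hearts, deck) + p(faces, deck) - p(hearts & faces, deck)
print(direct, by_rule)  # 11/26 11/26

# Board-game example: rolling an 8 or doubles with two dice.
rolls = set(product(range(1, 7), repeat=2))
eights = {r for r in rolls if sum(r) == 8}
doubles = {r for r in rolls if r[0] == r[1]}
p_dice = p(eights | doubles, rolls)
print(p_dice)  # 5/18
```

Set union (`|`) counts the overlap once, which is exactly what subtracting P(A and B) achieves in the formula.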
  • 340. Key Concepts

• If A and B are mutually exclusive events, then the probability of either A or B occurring is given by P(A or B) = P(A) + P(B).
• If A and B are non-mutually exclusive events, then the probability of either A or B occurring is given by P(A or B) = P(A) + P(B) − P(A and B).

Communicate Your Understanding

1. Are an event and its complement mutually exclusive? Explain.
2. Explain how to determine the probability of randomly throwing either a composite number or an odd number using a pair of dice.
3. a) Explain the difference between independent events and mutually exclusive events.
   b) Support your explanation with an example of each.
   c) Why do you add probabilities in one case and multiply them in the other?

Practise

A

1. Classify each pair of events as mutually exclusive or non-mutually exclusive.

   Event A | Event B
   a) Randomly drawing a grey sock from a drawer | Randomly drawing a wool sock from a drawer
   b) Randomly selecting a student with brown eyes | Randomly selecting a student on the honour roll
   c) Having an even number of students in your class | Having an odd number of students in your class
   d) Rolling a six with a die | Rolling a prime number with a die
   e) Your birthday falling on a Saturday next year | Your birthday falling on a weekend next year
   f) Getting an A on the next test | Passing the next test
   g) Calm weather at noon tomorrow | Stormy weather at noon tomorrow
   h) Sunny weather next week | Rainy weather next week

2. Nine members of a baseball team are randomly assigned field positions. There are three outfielders, four infielders, a pitcher, and a catcher. Troy is happy to play any position except catcher or outfielder. Determine the probability that Troy will be assigned to play
   a) catcher
   b) outfielder
   c) a position he does not like

3. A car dealership analysed its customer database and discovered that in the last model year, 28% of its customers chose a 2-door model, 46% chose a 4-door model, 19% chose a minivan, and 7% chose a 4-by-4 vehicle. If a customer was selected randomly from this database, what is the probability that the customer
   a) bought a 4-by-4 vehicle?
   b) did not buy a minivan?
   c) bought a 2-door or a 4-door model?
   d) bought a minivan or a 4-by-4 vehicle?
  • 341. Apply, Solve, Communicate

B

4. As a promotion, a resort has a draw for free family day-passes. The resort considers July, August, March, and December to be “vacation months.”
   a) If the free passes are randomly dated, what is the probability that a day-pass will be dated within
      i) a vacation month?
      ii) June, July, or August
   b) Draw a Venn diagram of the events in part a).

5. A certain provincial park has 220 campsites. A total of 80 sites have electricity. Of the 52 sites on the lakeshore, 22 of them have electricity. If a site is selected at random, what is the probability that
   a) it will be on the lakeshore?
   b) it will have electricity?
   c) it will either have electricity or be on the lakeshore?
   d) it will be on the lakeshore and not have electricity?

6. A market-research firm monitored 1000 television viewers, consisting of 800 adults and 200 children, to evaluate a new comedy series that aired for the first time last week. Research indicated that 250 adults and 148 children viewed some or all of the program. If one of the 1000 viewers was selected, what is the probability that
   a) the viewer was an adult who did not watch the new program?
   b) the viewer was a child who watched the new program?
   c) the viewer was an adult or someone who watched the new program?

7. Application In an animal-behaviour study, hamsters were tested with a number of intelligence tasks, as shown in the table below.

   Number of Tests | Number of Hamsters
   0               | 10
   1               | 6
   2               | 4
   3               | 3
   4 or more       | 5

   If a hamster is randomly chosen from this study group, what is the likelihood that the hamster has participated in
   a) exactly three tests?
   b) fewer than two tests?
   c) either one or two tests?
   d) no tests or more than three tests?

8. Communication
   a) Prove that, if A and B are non-mutually exclusive events, the probability of either A or B occurring is given by P(A or B) = P(A) + P(B) − P(A and B).
   b) What can you conclude if P(A and B) = 0? Give reasons for your conclusion.

9. Inquiry/Problem Solving Design a game in which the probability of drawing a winning card from a standard deck is between 55% and 60%.

10. (Chapter Problem) Determine the probability that a captured deer has either cross-hatched antlers or bald patches. Are these events mutually exclusive? Why or why not?

11. The eight members of the debating club pose for a yearbook photograph. If they line up randomly, what is the probability that
   a) either Hania will be first in the row or Aaron will be last?
   b) Hania will be first and Aaron will not be last?
  • 342. ACHIEVEMENT CHECK

Knowledge/Understanding | Thinking/Inquiry/Problem Solving | Communication | Application

12. Consider a Stanley Cup playoff series in which the Toronto Maple Leafs hockey team faces the Ottawa Senators. Toronto hosts the first, second, and if needed, fifth and seventh games in this best-of-seven contest. The Leafs have a 65% chance of beating the Senators at home in the first game. After that, they have a 60% chance of a win at home if they won the previous game, but a 70% chance if they are bouncing back from a loss. Similarly, the Leafs’ chances of victory in Ottawa are 40% after a win and 45% after a loss.
   a) Construct a tree diagram to illustrate all the possible outcomes of the first three games.
   b) Consider the following events:
      A = {Leafs lose the first game but go on to win the series in the fifth game}
      B = {Leafs win the series in the fifth game}
      C = {Leafs lose the series in the fifth game}
      Identify all the outcomes that make up each event, using strings of letters, such as LLSLL. Are any pairs from these three events mutually exclusive?
   c) What is the probability of event A in part b)?
   d) What is the chance of the Leafs winning in exactly five games?
   e) Explain how you found your answers to parts c) and d).

C

13. A grade 12 student is selected at random to sit on a university liaison committee. Of the 120 students enrolled in the grade 12 university-preparation mathematics courses,
   • 28 are enrolled in data management only
   • 40 are enrolled in calculus only
   • 15 are enrolled in geometry only
   • 16 are enrolled in both data management and calculus
   • 12 are enrolled in both calculus and geometry
   • 6 are enrolled in both geometry and data management
   • 3 are enrolled in all three of data management, calculus, and geometry
   a) Draw a Venn diagram to illustrate this situation.
   b) Determine the probability that the student selected will be enrolled in either data management or calculus.
   c) Determine the probability that the student selected will be enrolled in only one of the three courses.

14. Application For a particular species of cat, the odds against a kitten being born with either blue eyes or white spots are 3:1. If the probability of a kitten exhibiting only one of these traits is equal and the probability of exhibiting both traits is 10%, what are the odds in favour of a kitten having blue eyes?

15. Communication
   a) A standard deck of cards is shuffled and three cards are selected. What is the probability that the third card is either a red face card or a king if the king of diamonds and the king of spades are selected as the first two cards?
   b) Does this probability change if the first two cards selected are the queen of diamonds and the king of spades? Explain.
  • 343. 16. Inquiry/Problem Solving The table below lists the degrees granted by Canadian universities from 1994 to 1998 in various fields of study.
   a) If a Canadian university graduate from 1998 is chosen at random, what is the probability that the student is
      i) a male?
      ii) a graduate in mathematics and physical sciences?
      iii) a male graduating in mathematics and physical sciences?
      iv) not a male graduating in mathematics and physical sciences?
      v) a male or a graduate in mathematics and physical sciences?
   b) If a male graduate from 1996 is selected at random, what is the probability that he is graduating in mathematics and physical sciences?
   c) If a mathematics and physical sciences graduate is selected at random from the period 1994 to 1996, what is the probability that the graduate is a male?
   d) Do you think that being a male and graduating in mathematics and physical sciences are independent events? Give reasons for your hypothesis.

   Field of study                       |    1994 |    1995 |    1996 |    1997 |    1998
   Canada                               | 178 074 | 178 066 | 178 116 | 173 937 | 172 076
     Male                               |  76 470 |  76 022 |  75 106 |  73 041 |  71 949
     Female                             | 101 604 | 102 044 | 103 010 | 100 896 | 100 127
   Social sciences                      |  69 583 |  68 685 |  67 862 |  66 665 |  67 019
     Male                               |  30 700 |  29 741 |  29 029 |  28 421 |  27 993
     Female                             |  38 883 |  38 944 |  38 833 |  38 244 |  39 026
   Education                            |  30 369 |  30 643 |  29 792 |  27 807 |  25 956
     Male                               |    9093 |    9400 |    8693 |    8036 |    7565
     Female                             |  21 276 |  21 243 |  21 099 |  19 771 |  18 391
   Humanities                           |  23 071 |  22 511 |  22 357 |  21 373 |  20 816
     Male                               |    8427 |    8428 |    8277 |    8034 |    7589
     Female                             |  14 644 |  14 083 |  14 080 |  13 339 |  13 227
   Health professions and occupations   |  12 183 |  12 473 |  12 895 |  13 073 |  12 658
     Male                               |    3475 |    3461 |    3517 |    3460 |    3514
     Female                             |    8708 |    9012 |    9378 |    9613 |    9144
   Engineering and applied sciences     |  12 597 |  12 863 |  13 068 |  12 768 |  12 830
     Male                               |  10 285 |  10 284 |  10 446 |  10 125 |  10 121
     Female                             |    2312 |    2579 |    2622 |    2643 |    2709
   Agriculture and biological sciences  |  10 087 |  10 501 |  11 400 |  11 775 |  12 209
     Male                               |    4309 |    4399 |    4756 |    4780 |    4779
     Female                             |    5778 |    6102 |    6644 |    6995 |    7430
   Mathematics and physical sciences    |    9551 |    9879 |    9786 |    9738 |    9992
     Male                               |    6697 |    6941 |    6726 |    6749 |    6876
     Female                             |    2854 |    2938 |    3060 |    2989 |    3116
   Fine and applied arts                |    5308 |    5240 |    5201 |    5206 |    5256
     Male                               |    1773 |    1740 |    1780 |    1706 |    1735
     Female                             |    3535 |    3500 |    3421 |    3500 |    3521
   Arts and sciences                    |    5325 |    5271 |    5755 |    5532 |    5340
     Male                               |    1711 |    1628 |    1882 |    1730 |    1777
     Female                             |    3614 |    3643 |    3873 |    3802 |    3563
  • 344. 6.6 Applying Matrices to Probability Problems

In some situations, the probability of an outcome depends on the outcome of the previous trial. Often this pattern appears in stock market trends, weather patterns, athletic performance, and consumer habits. Dependent probabilities can be calculated using Markov chains, a powerful probability model pioneered about a century ago by the Russian mathematician Andrei Markov.

INVESTIGATE & INQUIRE: Running Late

Although Marla tries hard to be punctual, the demands of her home life and the challenges of commuting sometimes cause her to be late for work. When she is late, she tries especially hard to be punctual the next day. Suppose that the following pattern emerges: If Marla is punctual on any given day, then there is a 70% chance that she will be punctual the next day and a 30% chance that she will be late. On days she is late, however, there is a 90% chance that she will be punctual the next day and just a 10% chance that she will be late. Suppose Marla is punctual on the first day of the work week.

1. Create a tree diagram of the possible outcomes for the second and third days. Show the probability for each branch.
2. a) Describe two branches in which Marla is punctual on day 3.
   b) Use the product rule for dependent events on page 332 to calculate the compound probability of Marla being punctual on day 2 and on day 3.
   c) Find the probability of Marla being late on day 2 and punctual on day 3.
   d) Use the results from parts b) and c) to determine the probability that Marla will be punctual on day 3.
3. Repeat question 2 for the outcome of Marla being late on day 3.
4. a) Create a 1 × 2 matrix A in which the first element is the probability that Marla is punctual and the second element is the probability that she is late on day 1. Recall that Marla is punctual on day 1.
   b) Create a 2 × 2 matrix B in which the elements in each row represent conditional probabilities that Marla will be punctual and late. Let the first row be the probabilities after a day in which Marla was punctual, and the second row be the probabilities after a day in which she was late.
  • 345. c) Evaluate $A \times B$ and $A \times B^2$.
   d) Compare the results of part c) with your answers to questions 2 and 3. Explain what you notice.
   e) What does the first row of the matrix $B^2$ represent?

The matrix model you have just developed is an example of a Markov chain, a probability model in which the outcome of any trial depends directly on the outcome of the previous trial. Using matrix operations can simplify probability calculations, especially in determining long-term trends.

The 1 × 2 matrix A in the investigation is an initial probability vector, $S^{(0)}$, and represents the probabilities of the initial state of a Markov chain. The 2 × 2 matrix B is a transition matrix, P, and represents the probabilities of moving from any initial state to a new state in any trial. These matrices have been arranged such that the product $S^{(0)} \times P$ generates the row matrix that gives the probabilities of each state after one trial. This matrix is called the first-step probability vector, $S^{(1)}$. In general, the nth-step probability vector, $S^{(n)}$, can be obtained by repeatedly multiplying the probability vector by P. Sometimes these vectors are also called first-state and nth-state vectors, respectively.

Notice that each entry in a probability vector or a transition matrix is a probability and must therefore be between 0 and 1. The possible states in a Markov chain are always mutually exclusive events, one of which must occur at each stage. Therefore, the entries in a probability vector must sum to 1, as must the entries in each row of the transition matrix.

Example 1 Probability Vectors

Two video stores, Video Vic’s and MovieMaster, have just opened in a new residential area. Initially, they each have half of the market for rented movies. A customer who rents from Video Vic’s has a 60% probability of renting from Video Vic’s the next time and a 40% chance of renting from MovieMaster. On the other hand, a customer initially renting from MovieMaster has only a 30% likelihood of renting from MovieMaster the next time and a 70% probability of renting from Video Vic’s.

a) What is the initial probability vector?
b) What is the transition matrix?
c) What is the probability of a customer renting a movie from each store the second time?
d) What is the probability of a customer renting a movie from each store the third time?
e) What assumption are you making in part d)? How realistic is it?
Solution

a) Initially, each store has 50% of the market, so the initial probability vector, with entries ordered VV (Video Vic’s) and MM (MovieMaster), is

$$S^{(0)} = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}$$

b) The first row of the transition matrix represents the probabilities for the second rental by customers whose initial choice was Video Vic’s. There is a 60% chance that the customer returns, so the first entry is 0.6. It is 40% likely that the customer will rent from MovieMaster, so the second entry is 0.4. Similarly, the second row of the transition matrix represents the probabilities for the second rental by customers whose first choice was MovieMaster. There is a 30% chance that a customer will return on the next visit, and a 70% chance that the customer will try Video Vic’s.

$$
P = \begin{array}{c|cc} & \text{VV} & \text{MM} \\ \hline \text{VV} & 0.6 & 0.4 \\ \text{MM} & 0.7 & 0.3 \end{array}
$$

Regardless of which store the customer chooses the first time, you are assuming that there are only two choices for the next visit. Hence, the sum of the probabilities in each row equals one.

c) To find the probabilities of a customer renting from either store on the second visit, calculate the first-step probability vector, $S^{(1)}$:

$$
S^{(1)} = S^{(0)}P = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix} \begin{bmatrix} 0.6 & 0.4 \\ 0.7 & 0.3 \end{bmatrix} = \begin{bmatrix} 0.65 & 0.35 \end{bmatrix}
$$

This new vector shows that there is a 65% probability that a customer will rent a movie from Video Vic’s on the second visit to a video store and a 35% chance that the customer will rent from MovieMaster.

d) To determine the probabilities of which store a customer will pick on the third visit, calculate the second-step probability vector, $S^{(2)}$:

$$
S^{(2)} = S^{(1)}P = \begin{bmatrix} 0.65 & 0.35 \end{bmatrix} \begin{bmatrix} 0.6 & 0.4 \\ 0.7 & 0.3 \end{bmatrix} = \begin{bmatrix} 0.635 & 0.365 \end{bmatrix}
$$

So, on a third visit, a customer is 63.5% likely to rent from Video Vic’s and 36.5% likely to rent from MovieMaster.
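The arithmetic in parts c) and d) can be checked with a short script. The following is a minimal sketch in plain Python, not part of the original solution; the helper name `step` is an assumption introduced here for illustration, and the numbers are those of Example 1.

```python
def step(state, transition):
    """Multiply a row probability vector by a transition matrix."""
    n_states = len(transition[0])
    return [
        sum(state[i] * transition[i][j] for i in range(len(state)))
        for j in range(n_states)
    ]

# Values from Example 1; states are ordered (Video Vic's, MovieMaster).
S0 = [0.5, 0.5]
P = [[0.6, 0.4],   # row for customers who last rented from Video Vic's
     [0.7, 0.3]]   # row for customers who last rented from MovieMaster

S1 = step(S0, P)   # first-step probability vector
S2 = step(S1, P)   # second-step probability vector

print([round(x, 3) for x in S1])   # [0.65, 0.35]
print([round(x, 3) for x in S2])   # [0.635, 0.365]
```

Each call to `step` performs one multiplication of the current row vector by P, so calling it repeatedly walks through the visits one at a time.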
e) To calculate the second-step probabilities, you assume that the conditional transition probabilities do not change. This assumption might not be realistic since customers who are 70% likely to switch away from MovieMaster may not be as much as 40% likely to switch back, unless they forget why they switched in the first place. In other words, Markov chains have no long-term memory. They recall only the latest state in predicting the next one.

Note that the result in Example 1d) could be calculated in another way.

$$
\begin{aligned}
S^{(2)} &= S^{(1)}P \\
&= (S^{(0)}P)P \\
&= S^{(0)}(PP) \quad \text{since matrix multiplication is associative} \\
&= S^{(0)}P^2
\end{aligned}
$$

Similarly, $S^{(3)} = S^{(0)}P^3$, and so on. In general, the nth-step probability vector, $S^{(n)}$, is given by

$$S^{(n)} = S^{(0)}P^n$$

This result enables you to determine higher-state probability vectors easily using a graphing calculator or software.

Example 2 Long-Term Market Share

A marketing-research firm has tracked the sales of three brands of hockey sticks. Each year, on average,
• Player-One keeps 70% of its customers, but loses 20% to Slapshot and 10% to Extreme Styx
• Slapshot keeps 65% of its customers, but loses 10% to Extreme Styx and 25% to Player-One
• Extreme Styx keeps 55% of its customers, but loses 30% to Player-One and 15% to Slapshot

a) What is the transition matrix?
b) Assuming each brand begins with an equal market share, determine the market share of each brand after one, two, and three years.
c) Determine the long-range market share of each brand.
d) What assumption must you make to answer part c)?
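Solution 1 Using Pencil and Paper

a) The transition matrix, with rows and columns ordered P (Player-One), S (Slapshot), and E (Extreme Styx), is

$$
P = \begin{array}{c|ccc} & \text{P} & \text{S} & \text{E} \\ \hline \text{P} & 0.7 & 0.2 & 0.1 \\ \text{S} & 0.25 & 0.65 & 0.1 \\ \text{E} & 0.3 & 0.15 & 0.55 \end{array}
$$

b) Assuming each brand begins with an equal market share, the initial probability vector is

$$S^{(0)} = \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix}$$

To determine the market shares of each brand after one year, compute the first-step probability vector.

$$
\begin{aligned}
S^{(1)} &= S^{(0)}P \\
&= \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix} \begin{bmatrix} 0.7 & 0.2 & 0.1 \\ 0.25 & 0.65 & 0.1 \\ 0.3 & 0.15 & 0.55 \end{bmatrix} \\
&= \begin{bmatrix} 0.41\overline{6} & 0.\overline{3} & 0.25 \end{bmatrix}
\end{aligned}
$$

So, after one year Player-One will have a market share of approximately 42%, Slapshot will have 33%, and Extreme Styx will have 25%.

Similarly, you can predict the market shares after two years using

$$
\begin{aligned}
S^{(2)} &= S^{(1)}P \\
&= \begin{bmatrix} 0.41\overline{6} & 0.\overline{3} & 0.25 \end{bmatrix} \begin{bmatrix} 0.7 & 0.2 & 0.1 \\ 0.25 & 0.65 & 0.1 \\ 0.3 & 0.15 & 0.55 \end{bmatrix} \\
&= \begin{bmatrix} 0.45 & 0.3375 & 0.2125 \end{bmatrix}
\end{aligned}
$$

After two years, Player-One will have approximately 45% of the market, Slapshot will have 34%, and Extreme Styx will have 21%.

The probabilities after three years are given by

$$
\begin{aligned}
S^{(3)} &= S^{(2)}P \\
&= \begin{bmatrix} 0.45 & 0.3375 & 0.2125 \end{bmatrix} \begin{bmatrix} 0.7 & 0.2 & 0.1 \\ 0.25 & 0.65 & 0.1 \\ 0.3 & 0.15 & 0.55 \end{bmatrix} \\
&= \begin{bmatrix} 0.463 & 0.341 & 0.196 \end{bmatrix}
\end{aligned}
$$

After three years, Player-One will have approximately 46% of the market, Slapshot will have 34%, and Extreme Styx will have 20%.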
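The same vectors can also be obtained from the general formula $S^{(n)} = S^{(0)}P^n$ instead of stepping year by year. The sketch below is an illustration in plain Python under that formula; the helper names `mat_mul`, `mat_pow`, and `nth_step` are assumptions introduced here, and the matrix is the one from part a).

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def mat_pow(m, n):
    """Raise a square matrix to a non-negative integer power."""
    size = len(m)
    # Start from the identity matrix, then multiply by m a total of n times.
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

def nth_step(s0, p, n):
    """Return the nth-step probability vector S(n) = S(0) P^n as a flat list."""
    return mat_mul([s0], mat_pow(p, n))[0]

# Transition matrix from Example 2.
P = [[0.70, 0.20, 0.10],   # Player-One
     [0.25, 0.65, 0.10],   # Slapshot
     [0.30, 0.15, 0.55]]   # Extreme Styx
S0 = [1 / 3, 1 / 3, 1 / 3]  # equal initial market shares

for n in (1, 2, 3):
    print(n, [round(x, 4) for x in nth_step(S0, P, n)])
```

Because matrix multiplication is associative, `nth_step(S0, P, 2)` gives the same vector as multiplying $S^{(1)}$ by P, which is what the pencil-and-paper solution does.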
c) The results from part b) suggest that the relative market shares may be converging to a steady state over a long period of time. You can test this hypothesis by calculating higher-state vectors and checking for stability. For example,

$$S^{(10)} = S^{(9)}P = \begin{bmatrix} 0.471 & 0.347 & 0.182 \end{bmatrix}$$

$$S^{(11)} = S^{(10)}P = \begin{bmatrix} 0.471 & 0.347 & 0.182 \end{bmatrix}$$

The values of $S^{(10)}$ and $S^{(11)}$ are equal to three decimal places. It is easy to verify that they are equal to all higher orders of $S^{(n)}$ as well. The Markov chain has reached a steady state. A steady-state vector is a probability vector that remains unchanged when multiplied by the transition matrix. A steady state has been reached if

$$S^{(n)} = S^{(n)}P = S^{(n+1)}$$

In this case, the steady-state vector [0.471 0.347 0.182] indicates that, over a long period of time, Player-One will have approximately 47% of the market for hockey sticks, while Slapshot and Extreme Styx will have 35% and 18%, respectively, based on current trends.

d) The assumption you make in part c) is that the transition matrix does not change, that is, the market trends stay the same over the long term.

Solution 2 Using a Graphing Calculator

a) Use the MATRX EDIT menu to enter and store a matrix for the transition matrix B.

b) Similarly, enter the initial probability vector as matrix A. Then, use the MATRX EDIT menu to enter the calculation $A \times B$ on the home screen. The resulting matrix shows the market shares after one year are 42%, 33%, and 25%, respectively.

To find the second-step probability vector use the formula $S^{(2)} = S^{(0)}P^2$. Enter $A \times B^2$ using the MATRX NAMES menu and the $x^2$ key. After two years, therefore, the market shares are 45%, 34%, and 21%, respectively.
Similarly, enter $A \times B^3$ to find the third-step probability vector. After three years, the market shares are 46%, 34%, and 20%, respectively.

c) Higher-state probability vectors are easy to determine with a graphing calculator.

$$S^{(10)} = S^{(0)}P^{10} = \begin{bmatrix} 0.471 & 0.347 & 0.182 \end{bmatrix}$$

$$S^{(100)} = S^{(0)}P^{100} = \begin{bmatrix} 0.471 & 0.347 & 0.182 \end{bmatrix}$$

$S^{(10)}$ and $S^{(100)}$ agree to three decimal places. Any tiny difference between them is unimportant since the original data has only two significant digits. Thus, [0.471 0.347 0.182] is a steady-state vector, and the long-term market shares are predicted to be about 47%, 35%, and 18% for Player-One, Slapshot, and Extreme Styx, respectively.

Regular Markov chains always achieve a steady state. A Markov chain is regular if the transition matrix P or some power of P has no zero entries. Thus, regular Markov chains are fairly easy to identify. A regular Markov chain will reach the same steady state regardless of the initial probability vector.

Project Prep: In the probability project, you may need to use Markov chains to determine long-term probabilities.

Example 3 Steady State of a Regular Markov Chain

Suppose that Player-One and Slapshot initially split most of the market evenly between them, and that Extreme Styx, a relatively new company, starts with a 10% market share.

a) Determine each company’s market share after one year.
b) Predict the long-term market shares.

Solution

a) The initial probability vector is

$$S^{(0)} = \begin{bmatrix} 0.45 & 0.45 & 0.1 \end{bmatrix}$$

Using the same transition matrix as in Example 2,

$$
\begin{aligned}
S^{(1)} &= S^{(0)}P \\
&= \begin{bmatrix} 0.45 & 0.45 & 0.1 \end{bmatrix} \begin{bmatrix} 0.7 & 0.2 & 0.1 \\ 0.25 & 0.65 & 0.1 \\ 0.3 & 0.15 & 0.55 \end{bmatrix} \\
&= \begin{bmatrix} 0.4575 & 0.3975 & 0.145 \end{bmatrix}
\end{aligned}
$$

These market shares differ from those in Example 2, where $S^{(1)} = \begin{bmatrix} 0.41\overline{6} & 0.\overline{3} & 0.25 \end{bmatrix}$.
b) $S^{(100)} = S^{(0)}P^{100} = \begin{bmatrix} 0.471 & 0.347 & 0.182 \end{bmatrix}$

In the long term, the steady state is the same as in Example 2. Notice that although the short-term results differ as seen in part a), the same steady state is achieved in the long term.

The steady state of a regular Markov chain can also be determined analytically.

Example 4 Analytic Determination of Steady State

The weather near a certain seaport follows this pattern: If it is a calm day, there is a 70% chance that the next day will be calm and a 30% chance that it will be stormy. If it is a stormy day, the chances are 50/50 that the next day will also be stormy. Determine the long-term probability for the weather at the port.

Solution

The transition matrix for this Markov chain, with rows and columns ordered C (calm) and S (stormy), is

$$
P = \begin{array}{c|cc} & \text{C} & \text{S} \\ \hline \text{C} & 0.7 & 0.3 \\ \text{S} & 0.5 & 0.5 \end{array}
$$

The steady-state vector will be a 1 × 2 matrix, $S^{(n)} = \begin{bmatrix} p & q \end{bmatrix}$. The Markov chain will reach a steady state when $S^{(n)} = S^{(n)}P$, so

$$
\begin{aligned}
\begin{bmatrix} p & q \end{bmatrix} &= \begin{bmatrix} p & q \end{bmatrix} \begin{bmatrix} 0.7 & 0.3 \\ 0.5 & 0.5 \end{bmatrix} \\
&= \begin{bmatrix} 0.7p + 0.5q & 0.3p + 0.5q \end{bmatrix}
\end{aligned}
$$

Setting first elements equal and second elements equal gives two equations in two unknowns. These equations are dependent, so they define only one relationship between p and q.

$$
\begin{aligned}
p &= 0.7p + 0.5q \\
q &= 0.3p + 0.5q
\end{aligned}
$$

Subtracting the second equation from the first gives

$$
\begin{aligned}
p - q &= 0.4p \\
q &= 0.6p
\end{aligned}
$$
Now, use the fact that the sum of probabilities at any state must equal 1,

$$
\begin{aligned}
p + q &= 1 \\
p + 0.6p &= 1 \\
p &= \frac{1}{1.6} = 0.625 \\
q &= 1 - p = 0.375
\end{aligned}
$$

So, the steady-state vector for the weather is [0.625 0.375]. Over the long term, there will be a 62.5% probability of a calm day and 37.5% chance of a stormy day at the seaport.

Key Concepts

• The theory of Markov chains can be applied to probability models in which the outcome of one trial directly affects the outcome of the next trial.
• Regular Markov chains eventually reach a steady state, which can be used to make long-term predictions.

Communicate Your Understanding

1. Why must a transition matrix always be square?

2. Given an initial probability vector $S^{(0)} = \begin{bmatrix} 0.4 & 0.6 \end{bmatrix}$ and a transition matrix $P = \begin{bmatrix} 0.5 & 0.5 \\ 0.3 & 0.7 \end{bmatrix}$, state which of the following equations is easier to use for determining the third-step probability vector:
$S^{(3)} = S^{(2)}P$ or $S^{(3)} = S^{(0)}P^3$
Explain your choice.

3. Explain how you can determine whether a Markov chain has reached a steady state after k trials.

4. What property or properties must events A, B, and C have if they are the only possible different states of a Markov chain?
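The analytic method of Example 4 can be sketched in code for any two-state chain. The steady-state condition $p = p \cdot P_{11} + q \cdot P_{21}$, combined with $p + q = 1$, rearranges to $p = P_{21} / (P_{12} + P_{21})$. The Python below is a hedged illustration of that rearrangement rather than a method given in the text; the function names `steady_state_2x2` and `step` are assumptions introduced here.

```python
def steady_state_2x2(p):
    """Steady-state row vector [x, y] of a 2 x 2 transition matrix.

    Solves x = x*p[0][0] + y*p[1][0] together with x + y = 1.
    """
    leave_first = p[0][1]    # probability of moving from state 1 to state 2
    leave_second = p[1][0]   # probability of moving from state 2 to state 1
    total = leave_first + leave_second
    if total == 0:
        # Identity matrix: every probability vector is already steady.
        raise ValueError("no unique steady state")
    x = leave_second / total
    return [x, 1 - x]

def step(state, p):
    """Multiply a row probability vector by a transition matrix."""
    return [
        sum(state[i] * p[i][j] for i in range(len(state)))
        for j in range(len(p[0]))
    ]

# Weather chain from Example 4; states are ordered (calm, stormy).
P = [[0.7, 0.3],
     [0.5, 0.5]]

steady = steady_state_2x2(P)
print([round(x, 3) for x in steady])            # [0.625, 0.375]
print([round(x, 3) for x in step(steady, P)])   # unchanged by another step
```

Multiplying the result by P once more and getting the same vector back is exactly the steady-state test $S^{(n)} = S^{(n)}P$ used in Example 2.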
Practise

A

1. Which of the following cannot be an initial probability vector? Explain why.
a) $\begin{bmatrix} 0.2 & 0.45 & 0.25 \end{bmatrix}$
b) $\begin{bmatrix} 0.29 & 0.71 \end{bmatrix}$
c) $\begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}$
d) $\begin{bmatrix} 0.4 & -0.1 & 0.7 \end{bmatrix}$
e) $\begin{bmatrix} 0.4 & 0.2 & 0.15 & 0.25 \end{bmatrix}$

2. Which of the following cannot be a transition matrix? Explain why.
a) $\begin{bmatrix} 0.3 & 0.3 & 0.4 \\ 0.1 & 0 & 0.9 \\ 0.2 & 0.3 & 0.4 \end{bmatrix}$
b) $\begin{bmatrix} 0.2 & 0.8 \\ 0.65 & 0.35 \end{bmatrix}$
c) $\begin{bmatrix} 0.5 & 0.1 & 0.4 \\ 0.3 & 0.22 & 0.48 \end{bmatrix}$

3. Two competing companies, ZapShot and E-pics, manufacture and sell digital cameras. Customer surveys suggest that the companies’ market shares can be modelled using a Markov chain with the following initial probability vector $S^{(0)}$ and transition matrix P.

$$S^{(0)} = \begin{bmatrix} 0.67 & 0.33 \end{bmatrix} \qquad P = \begin{bmatrix} 0.6 & 0.4 \\ 0.5 & 0.5 \end{bmatrix}$$

Assume that the first element in the initial probability vector pertains to ZapShot. Explain the significance of
a) the elements in the initial probability vector
b) each element of the transition matrix
c) each element of the product $S^{(0)}P$

Apply, Solve, Communicate

B

4. Refer to question 3.
a) Which company do you think will increase its long-term market share, based on the information provided? Explain why you think so.
b) Calculate the steady-state vector for the Markov chain.
c) Which company increased its market share over the long term?
d) Compare this result with your answer to part a). Explain any differences.

5. For which of these transition matrices will the Markov chain be regular? In each case, explain why.
a) $\begin{bmatrix} 0.2 & 0.8 \\ 0.95 & 0.05 \end{bmatrix}$
b) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
c) $\begin{bmatrix} 0.1 & 0.6 & 0.3 \\ 0.33 & 0.3 & 0.37 \\ 0.5 & 0 & 0.5 \end{bmatrix}$

6. Gina noticed that the performance of her baseball team seemed to depend on the outcome of their previous game. When her team won, there was a 70% chance that they would win the next game. If they lost, however, there was only a 40% chance that they would win their next game.
a) What is the transition matrix of the Markov chain for this situation?
b) Following a loss, what is the probability that Gina’s team will win two games later?
c) What is the steady-state vector for the Markov chain, and what does it mean?
7. Application Two popcorn manufacturers, Ready-Pop and ButterPlus, are competing for the same market. Trends indicate that 65% of consumers who purchase Ready-Pop will stay with Ready-Pop the next time, while 35% will try ButterPlus. Among those who purchase ButterPlus, 75% will buy ButterPlus again and 25% will switch to Ready-Pop. Each popcorn producer initially has 50% of the market.
a) What is the initial probability vector?
b) What is the transition matrix?
c) Determine the first- and second-step probability vectors.
d) What is the long-term probability that a customer will buy Ready-Pop?

8. Inquiry/Problem Solving The weather pattern for a certain region is as follows. On a sunny day, there is a 50% probability that the next day will be sunny, a 30% chance that the next day will be cloudy, and a 20% chance that the next day will be rainy. On a cloudy day, the probability that the next day will be cloudy is 35%, while it is 40% likely to be rainy and 25% likely to be sunny the next day. On a rainy day, there is a 45% chance that it will be rainy the next day, a 20% chance that the next day will be sunny, and a 35% chance that the next day will be cloudy.
a) What is the transition matrix?
b) If it is cloudy on Wednesday, what is the probability that it will be sunny on Saturday?
c) What is the probability that it will be sunny four months from today, according to this model?
d) What assumptions must you make in part c)? Are they realistic? Why or why not?

9. Application On any given day, the stock price for Bluebird Mutual may rise, fall, or remain unchanged. These states, R, F, and U, can be modelled by a Markov chain with the transition matrix:

$$
\begin{array}{c|ccc} & \text{R} & \text{F} & \text{U} \\ \hline \text{R} & 0.75 & 0.15 & 0.1 \\ \text{F} & 0.25 & 0.6 & 0.15 \\ \text{U} & 0.4 & 0.4 & 0.2 \end{array}
$$

a) If, after a day of trading, the value of Bluebird’s stock has fallen, what is the probability that it will rise the next day?
b) If Bluebird’s value has just risen, what is the likelihood that it will rise one week from now?
c) Assuming that the behaviour of the Bluebird stock continues to follow this established pattern, would you consider Bluebird to be a safe investment? Explain your answer, and justify your reasoning with appropriate calculations.

10. Chapter Problem Assume that each doe produces one female offspring. Let the two states be D, a normal doe, and B, a doe with bald patches. Determine
a) the initial probability vector
b) the transition matrix for each generation of offspring
c) the long-term probability of a new-born doe developing bald patches
d) Describe the assumptions which are inherent in this analysis. What other factors could affect the stability of this Markov chain?
ACHIEVEMENT CHECK
Knowledge/Understanding • Thinking/Inquiry/Problem Solving • Communication • Application

11. When Mazemaster, the mouse, is placed in a maze like the one shown below, he will explore the maze by picking the doors at random to move from compartment to compartment. A transition takes place when Mazemaster moves through one of the doors into another compartment. Since all the doors lead to other compartments, the probability of moving from a compartment back to the same compartment in a single transition is zero.

[Figure: maze with compartments numbered 1 to 8]

a) Construct the transition matrix, P, for the Markov chain.
b) Use technology to calculate $P^2$, $P^3$, and $P^4$.
c) If Mazemaster starts in compartment 1, what is the probability that he will be in compartment 4 after
i) two transitions?
ii) three transitions?
iii) four transitions?
d) Predict where Mazemaster is most likely to be in the long run. Explain the reasoning for your prediction.
e) Calculate the steady-state vector. Does it support your prediction? If not, identify the error in your reasoning in part d).

C

12. Communication Refer to Example 4 on page 351.
a) Suppose that the probability of stormy weather on any day following a calm day increases by 0.1. Estimate the effect this change will have on the steady state of the Markov chain. Explain your prediction.
b) Calculate the new steady-state vector and compare the result with your prediction. Discuss any difference between your estimate and the calculated steady state.
c) Repeat parts a) and b) for the situation in which the probability of stormy weather following either a calm or a stormy day increases by 0.1, compared to the data in Example 4.
d) Discuss possible factors that might cause the mathematical model to be altered.

13. For each of the transition matrices below, decide whether the Markov chain is regular and whether it approaches a steady state. (Hint: An irregular Markov chain could still have a steady-state vector.)
a) $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
b) $\begin{bmatrix} 0 & 1 \\ 0.5 & 0.5 \end{bmatrix}$
c) $\begin{bmatrix} 1 & 0 \\ 0.5 & 0.5 \end{bmatrix}$

14. Refer to Example 2 on page 347.
a) Using a graphing calculator, find $P^{100}$. Describe this matrix.
b) Let $S^{(0)} = \begin{bmatrix} a & b & c \end{bmatrix}$. Find an expression for the value of $S^{(0)}P^{100}$. Does this expression depend on $S^{(0)}$, P, or both?
c) What property of a regular Markov chain can you deduce from your answer to part b)?
15. Inquiry/Problem Solving The transition matrix for a Markov chain with steady-state vector of $\begin{bmatrix} \tfrac{7}{13} & \tfrac{6}{13} \end{bmatrix}$ is $\begin{bmatrix} 0.4 & 0.6 \\ m & n \end{bmatrix}$. Determine the unknown transition matrix elements, m and n.

Career Connection

Investment Broker

Many people use the services of an investment broker to help them invest their earnings. An investment broker provides advice to clients on how to invest their money, based on their individual goals, income, and risk tolerance, among other factors. An investment broker can work for a financial institution, such as a bank or trust company, or a brokerage, which is a company that specializes in investments. An investment broker typically buys, sells, and trades a variety of investment items, including stocks, bonds, mutual funds, and treasury bills.

An investment broker must be able to read and interpret a variety of financial data including periodicals and corporate reports. Based on experience and sound mathematical principles, the successful investment broker must be able to make reasonable predictions of uncertain outcomes.

Because of the nature of this industry, earnings often depend directly on performance. An investment broker typically earns a commission, similar to that for a sales representative. In the short term, the investment broker can expect some fluctuations in earnings. In the long term, strong performers can expect a very comfortable living, while weak performers are not likely to last long in the field.

Usually, an investment broker requires a minimum of a bachelor’s degree in economics or business, although related work experience in investments or sales is sometimes an acceptable substitute. A broker must have a licence from the provincial securities commission and must pass specialized courses in order to trade in specific investment products such as securities, options, and futures contracts. The chartered financial analyst (CFA) designation is recommended for brokers wishing to enter the mutual-fund field or other financial-planning services.

www.mcgrawhill.ca/links/MDM12
Visit the above web site and follow the links to find out more about an investment broker and other careers related to mathematics.
Review of Key Concepts

6.1 Basic Probability Concepts
Refer to the Key Concepts on page 311.

1. A bag of marbles contains seven whites, five blacks, and eight cat’s-eyes. Determine the probability that a randomly drawn marble is
a) a white marble
b) a marble that is not black

2. When a die was rolled 20 times, 4 came up five times.
a) Determine the empirical probability of rolling a 4 with a die based on the 20 trials.
b) Determine the theoretical probability of rolling a 4 with a die.
c) How can you account for the difference between the results of parts a) and b)?

3. Estimate the subjective probability of each event and provide a rationale for your decision.
a) All classes next week will be cancelled.
b) At least one severe snow storm will occur in your area next winter.

6.2 Odds
Refer to the Key Concepts on page 317.

4. Determine the odds in favour of flipping three coins and having them all turn up heads.

5. A restaurant owner conducts a study that measures the frequency of customer visits in a given month. The results are recorded in the following table.

Number of Visits    Number of Customers
1                   4
2                   6
3                   7
4 or more           3

Based on this survey, calculate
a) the odds that a customer visited the restaurant exactly three times
b) the odds in favour of a customer having visited the restaurant fewer than three times
c) the odds against a customer having visited the restaurant more than three times

6.3 Probabilities Using Counting Techniques
Refer to the Key Concepts on page 324.

6. Suppose three marbles are selected at random from the bag of marbles in question 1.
a) Draw a tree diagram to illustrate all possible outcomes.
b) Are all possible outcomes equally likely? Explain.
c) Determine the probability that all three selected marbles are cat’s-eyes.
d) Determine the probability that none of the marbles drawn are cat’s-eyes.

7. The Sluggers baseball team has a starting line-up consisting of nine players, including Tyrone and his sister Amanda. If the batting order is randomly assigned, what is the probability that Tyrone will bat first, followed by Amanda?

8. A three-member athletics council is to be randomly chosen from ten students, five of whom are runners. The council positions are president, secretary, and treasurer. Determine the probability that
a) the committee is comprised of all runners
b) the committee is comprised of the three eldest runners
c) the eldest runner is president, second eldest runner is secretary, and third eldest runner is treasurer
6.4 Dependent and Independent Events
Refer to the Key Concepts on page 333.

9. Classify each of the following pairs of events as independent or dependent.

    First Event                           Second Event
a)  Hitting a home run while at bat       Catching a pop fly while in the field
b)  Staying up late                       Sleeping in the next day
c)  Completing your calculus review       Passing your calculus exam
d)  Randomly selecting a shirt            Randomly selecting a tie

10. Bruno has just had job interviews with two separate firms: Golden Enterprises and Outer Orbit Manufacturing. He estimates that he has a 40% chance of receiving a job offer from Golden and a 75% chance of receiving an offer from Outer Orbit.
a) What is the probability that Bruno will receive both job offers?
b) Is Bruno applying the concept of theoretical, empirical, or subjective probability? Explain.

12. During a marketing blitz, a telemarketer conducts phone solicitations continuously from 16:00 until 20:00. Suppose that you have a 20% probability of being called during this blitz. If you generally eat dinner between 18:00 and 18:30, how likely is it that the telemarketer will interrupt your dinner?

6.5 Mutually Exclusive Events
Refer to the Key Concepts on page 340.

13. Classify each pair of events as mutually exclusive or non-mutually exclusive.

    First Event                                   Second Event
a)  Randomly selecting a classical CD             Randomly selecting a rock CD
b)  Your next birthday occurring on a Wednesday   Your next birthday occurring on a weekend
c)  Ordering a hamburger with cheese              Ordering a hamburger with no onions
d)  Rolling a perfect square with a die           Rolling an even number with a die