Visualising Road Traffic Accident Data

1.	 	 A Brief History of Data Visualisation
          The power and importance of effective visualisation have long been recognised.
          William Playfair (1759-1823), the founder of statistical graphics, contrasted his new graphical
          method with the tabular presentation of data as follows:

          ‘Information, that is imperfectly acquired, is generally imperfectly retained; and a man who
          has carefully inspected a printed table, finds when done, that he has only a very faint and
          partial idea of what he has read’ (1).


          This view has been echoed over the intervening 200 years. For example, Florence
          Nightingale (1820-1910) recognised the power of data visualisation as an effective aid for
          communicating issues of concern to a wide audience, particularly the impact of poor
          sanitation on mortality rates during the Crimean War. This is summarised in her statement of
          the power of graphics ‘to affect thro the eyes what we may fail to convey to the brains of the
          public through their word-proof ears’.
          	
          Graphical innovations were relatively absent in the first half of the 20th century,
          but renewed interest in visualisation followed the publication in 1962 of a paper entitled ‘The
          Future of Data Analysis’ (2) by the American statistician John W. Tukey. This paper is regarded
          as a landmark in data visualisation. Tukey suggested that we examine our data as a
          detective would examine the scene of a crime: not with a hypothesis (‘I’ll bet the butler did
          it’), but with an open mind and as few assumptions as possible.

          This approach was a radical departure from conventional data analysis (and research
          programmes in general), which tended to be based on the scientific principles of formulating a
          hypothesis, collecting appropriate data and finally using some test statistic to decide on the
          validity of the hypothesis. Tukey believed that by letting the data speak to us ‘we can learn the
          truths hidden beneath the random fluctuations, errors and general confusion seen in real
          data’.


          The publication in 1967 of Jacques Bertin's ‘Semiologie Graphique’ (3) was also an
          important milestone in the development of data visualisation. In his foreword to the English
          version of this text, published in 1983, Howard Wainer states that the text ‘is the most
          important work on graphics since the publication of William Playfair's Atlas. While William
          Playfair illustrated good graphic practice over 200 years previously he did not explain why the
          specific structures of his graphic forms and formats work’.

Cyril Connolly, IADT

          The development of a variety of highly specialised interactive computer
          systems during the 1970s allowed data to be analysed in a dynamic, iterative and visual
          manner. One of the early systems was PRIM-9 (4), developed at the Stanford Linear
          Accelerator Center. PRIM stood for Projection, Rotation, Isolation and Masking, and the system
          allowed the exploration of multidimensional data in up to nine dimensions. It ran on an IBM
          system, required a few million dollars' worth of computer and display hardware (the
          display unit alone was $400,000) and cost several hundred dollars an hour to use.


          Later developments in hardware and software allowed PRIM technology to become
          generally available on desktop computers. The innovative Apple Macintosh hardware
          and software, first produced during the mid-1980s, led the way in these developments
          with applications like MacSpin (5) and DataDesk (6). These changes in computer
          systems have, as William Cleveland states in his text Visualizing Data (7), ‘changed how
          we carry out visualisation but not its goals’.



2.	 	 Data Visualisation using DataDesk
	

            DataDesk was originally developed on the Apple Macintosh platform by Apple
            research fellow Paul Velleman during the latter part of the 1980s and subsequently
            became available on the Windows platform. The principal feature of DataDesk, in
            contrast to other mainstream data analysis applications, is the ability to interact with
            multiple linked views of a dataset, so that, for example, selecting a subset of cases
            in one view highlights them in all other views. This ability to ‘slice and dice’ data
            using dynamic and interactive tools brings statistics to life, generating interest in it and
            an appreciation of its importance in the decision-making process. Some examples
            of the use of DataDesk to explore Irish road accident data are shown below.
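
            The linked-view idea can be sketched outside DataDesk. The Python fragment
            below is a minimal illustration only, not DataDesk's implementation: the record
            fields (x, weekday, month) and the sample values are hypothetical, and a
            selection is modelled simply as a set of case indices that every view consults.

```python
from collections import Counter

# Hypothetical accident records: map x-coordinate, weekday (Sunday = 1) and month.
CASES = [
    {"x": 0.9, "weekday": 1, "month": 7},
    {"x": 0.8, "weekday": 3, "month": 1},
    {"x": 0.2, "weekday": 7, "month": 7},
    {"x": 0.1, "weekday": 1, "month": 8},
    {"x": 0.3, "weekday": 7, "month": 6},
]


def select(cases, predicate):
    """Return the indices of the cases picked out in one view."""
    return {i for i, case in enumerate(cases) if predicate(case)}


def linked_counts(cases, selected, field):
    """Bar-chart counts of `field`, restricted to the shared selection."""
    return Counter(cases[i][field] for i in sorted(selected))


# 'Slicing' the western part of the map (x < 0.5) updates both bar charts,
# because both are recomputed from the same set of selected indices.
west = select(CASES, lambda case: case["x"] < 0.5)
weekday_bars = linked_counts(CASES, west, "weekday")
month_bars = linked_counts(CASES, west, "month")
```

            Keeping the selection as a set of indices, rather than as a copy of the data, is
            what lets any number of views stay consistent with one another.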

            
      i)	   Regional Variation of Road Accidents

	

            The knife tool ‘slices’ over the east coast of Ireland’s accident scatterplot map in Figure
            1. The two bar charts to the right of this plot illustrate the daily (Sunday = 1, Saturday
            = 7) and monthly (January = 1, December = 12) distribution of accidents. From the
            plot, the distribution of accidents along the east coast appears to be fairly constant
            by weekday and month.




Figure 1: Spatial distribution of east and west coast accidents	



             If the knife is moved to the west coast, as shown in Figure 1, the bar charts update
             automatically and the distribution of accidents by weekday and month reveals a different
             pattern from that of the east coast. Accidents by weekday are lowest during midweek and
             highest at the weekends, while accidents by month are highest during the summer
             months and lowest during the winter months.


      ii)	   The Influence of Daylight Variation on Pedestrian Road Accidents

             Figure 2 illustrates the number of pedestrians killed in Ireland by month between 2000 and
             2006. The plot suggests a U profile, with fatalities higher in the winter months and lower in
             the summer months.




	




              Figure 2: Monthly Distribution of Fatal Pedestrian Road Accidents


              To investigate this pattern in more detail, a plot of the number of pedestrian fatalities by hour is
              generated. Browsing the hourly bar chart with the knife tool, it becomes clear that the U
              shape is explained by fatalities between 16:00 and 21:00 hours, as shown in Figure 3.




            Figure 3: Monthly distribution of fatal accidents between 16:00 and 21:00 (left) and excluding the
            hours 16:00-21:00 (right)


            This is further illustrated by examining the distribution of accidents excluding the hours 16:00
            to 21:00, as shown in Figure 3. The monthly bar chart now shows no evidence of a seasonal
            profile. The seasonal U profile of fatal pedestrian accidents between 16:00 and 21:00 is explained
            by the variation throughout the year in the amount of daylight during these hours
            (8).
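
            The comparison in Figure 3 amounts to splitting the records by an hour window and
            tabulating each part by month. The sketch below shows that step in Python under stated
            assumptions: the (month, hour) pairs are invented for illustration and are not the paper's
            data, and the window is treated as half-open, so 16:00 is included and 21:00 is excluded.

```python
from collections import Counter

# Hypothetical fatality records as (month, hour) pairs; not the paper's data.
FATALITIES = [(1, 17), (1, 18), (12, 19), (12, 8), (6, 23), (7, 13), (6, 17), (11, 20)]


def monthly_counts(records, start_hour, end_hour, inside=True):
    """Count records per month, inside or outside [start_hour, end_hour)."""
    counts = Counter()
    for month, hour in records:
        in_window = start_hour <= hour < end_hour
        if in_window == inside:
            counts[month] += 1
    return counts


# Left panel of the comparison: only the 16:00-21:00 window.
evening = monthly_counts(FATALITIES, 16, 21, inside=True)
# Right panel: every other hour of the day.
rest_of_day = monthly_counts(FATALITIES, 16, 21, inside=False)
```

            If the seasonal profile appears in the first tabulation but not in the second, the window
            accounts for it, which is the argument made from Figure 3.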


            For the winter months of December and January there is virtually no daylight during these
            hours and the corresponding number of fatal accidents is highest. For the summer months
            of June and July there is almost complete daylight between 4pm and 10pm and the number
            of pedestrian accidents is lowest.


    iii)	   Accident Profiling using Rotating Plots


            The French cartographer Jacques Bertin stated in his groundbreaking text Graphics and
            Graphic Information Processing (9) that ‘it is not sufficient to have data, to have statistics, in
            order to arrive at a decision. Items of data do not supply the information necessary for
            decision making. What must be seen are the relationships which emerge from consideration
            of the entire set of data’.


            This statement is illustrated by examining the age distributions of the driver, front seat
            passenger and rear seat passenger, coded as ageDr, ageFP and ageRP respectively. If we are restricted
            to working in what Edward Tufte (10) refers to as two-dimensional Flatland, we would
            generate three scatterplots examining the relationships between driver and front
            seat passenger, driver and rear seat passenger, and front seat and rear seat passenger, as
            shown in Figure 4.




          Figure 4: Scatterplots of the age of driver vs age of front seat passenger (left), age of driver vs age of rear
          seat passenger (centre) and age of front seat passenger vs age of rear seat passenger (right)

          While these plots illustrate the presence of up to three clusters, it is through the use of a
          rotating plot that we can see the overall relationships emerging from consideration of the
          entire set of data, as shown in Figure 5. After spending a short time rotating the data, a star
          shape becomes evident, with each arm corresponding to a distinctive cluster. Investigating
          the profile of each cluster is easy with DataDesk. By capturing a cluster with the lasso tool
          and dynamically linking it with the variables hour, primcoltype, ageDr, ageFP,
          ageRP, genderDr, genderFP and genderRP, the profile of that segment can be
          readily determined.
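
          What a rotating plot does at each frame can be written down compactly: the three-dimensional
          points are rotated and then projected onto the screen. The Python sketch below assumes
          rotation about the vertical axis only and a simple orthographic projection. It is an
          illustration of the general technique, not DataDesk's own algorithm, and the age triples
          are hypothetical.

```python
import math


def rotate_and_project(points, angle_deg):
    """Rotate 3-D points about the vertical (y) axis, then drop the depth axis."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    projected = []
    for x, y, z in points:
        screen_x = cos_t * x + sin_t * z  # horizontal screen coordinate
        projected.append((screen_x, y))   # y is unchanged by this rotation
    return projected


# Hypothetical (ageDr, ageFP, ageRP) triples, for illustration only.
AGES = [(20.0, 19.0, 21.0), (45.0, 43.0, 8.0)]

front_view = rotate_and_project(AGES, 0)   # horizontal axis shows ageDr
side_view = rotate_and_project(AGES, 90)   # horizontal axis shows ageRP
```

          At 0 and 90 degrees the projection reproduces two of the flat scatterplots. The intermediate
          angles are the views that only rotation can provide.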

          For example, in Figure 5 the centre cluster is selected. The linked variables suggest that this
          profile comprises young vehicle occupants, with a substantial number of accidents in the
          early hours of the morning and a high proportion of primcoltype code 2 values, which
          correspond to single vehicle accidents. In addition, the drivers are primarily male,
          with an excess of male over female passengers. In summary, this accident profile is
          explained by young male drivers with passengers of a similar age who are involved primarily
          in single vehicle accidents. The principal causal factor associated with this profile is alcohol
          and/or excessive speed.




	         Figure 5: Centre of star cluster with dynamically linked variables hour, type of collision, age and
          gender of vehicle occupants


         In contrast, selecting the southern arm of the star in Figure 6, we see a considerably different
         profile. The early morning surge is absent, as is the dominance of code 2 primcoltype. The
         driver and front seat passenger are of a similar but older age profile, with a considerably
         younger rear seat passenger. The drivers are primarily male and the front seat passengers are
         primarily female, while the distribution of male and female rear seat passengers is virtually the
         same. This profile appears to represent accidents involving parents with a young child in
         the rear seat.
	
         The ability to slice, brush and rotate data allows the analyst to discover hidden patterns
         and relationships, while also providing a framework for explaining more theoretical
         concepts, including the use of multivariate analysis techniques.




	        Figure 6: Southern arm of star cluster with dynamically linked variables hour, type of collision, age and
         gender of vehicle occupants


    
          In summary, data visualisation is described by the American psychologist and statistician
          Michael Friendly as ‘an approach to data analysis that focuses on insightful graphical display.
          The word “insightful” suggests that the goal is (we hope) to reveal some aspects of the data
          that might not be perceived, appreciated or absorbed by other means’ (11).







           References

   [1]     Playfair, William, Commercial and Political Atlas, London, 1786, pp xiii-xiv. Reprinted as Playfair’s
           Commercial and Political Atlas and Statistical Breviary, edited and introduced by Howard Wainer and
           Ian Spence, 2005, Cambridge University Press.

   [2]     Tukey, J.W., 1962, The future of data analysis, Annals of Mathematical Statistics,
           33: 1-67, 812.

   [3]     Bertin, J., Semiologie Graphique, 1967, Paris: Editions Gauthier-Villars. English translation by W.J.
           Berg as Semiology of Graphics, Madison, WI: University of Wisconsin Press, 1983 (reprinted in
           October 2010 by ESRI Press).

   [4]     Fisherkeller, M.A., Friedman, J.H., and Tukey, J.W., 1975, PRIM-9: an interactive multidimensional
           data display analysis system, Data: Its Use, Organisation and Management, 140-145. New York: The
           Association for Computing Machinery.

   [5]     Donoho, A.W., Donoho, D.L., and Gasko, M., 1988, MacSpin: Dynamic Graphics on a desktop
           computer. In W.S. Cleveland and M.E. McGill, eds., Dynamic Graphics for Statistics. Belmont, CA:
           Wadsworth, pp 331-351.

   [6]     Velleman, P.F., 1988, Data Desk. Ithaca, New York: Data Descriptions Inc.

   [7]     Cleveland, W.S., Visualizing Data, 1993, Hobart Press, page 2.

   [8]     Pedestrian Accidents in Ireland, Great Britain and Northern Ireland, 1998, National Roads Authority,
           Dublin.

   [9]     Bertin, J., La Graphique et le Traitement Graphique de l’Information, 1977, Paris: Flammarion. English
           translation by W.J. Berg and P. Scott as Graphics and Graphic Information Processing, 1981, Berlin:
           Walter de Gruyter & Co.

   [10]    Tufte, E.R., 1990, Envisioning Information, Graphics Press, pp 12-30.

   [11]    Friendly, M., 2001, Visualizing Categorical Data, SAS Institute Inc., Cary, NC, USA.




Cyril Connolly, Lecturer, IADT, Dun Laoghaire: Visualising Road Accident Data
