Creating an Accessible GIS Database from a Research Multisensor Precipitation Estimator

                                       Product



                               MET 4905, Section Two

                                     Spring 2005

                                Dr. Henry E. Fuelberg

                                    29 April 2005

                                 John L. Sullivan, Jr.
Sullivan 1


Introduction

       Geographical Information Systems have quickly become an essential tool for easily

analyzing vast amounts of spatial information. Recently, Joseph Marzen went to extensive

efforts to create a high-quality rainfall database for Florida, exhibiting the highly variable spatial (4 x 4

km² grid) and temporal (hourly) characteristics of precipitation in the state. His work included

in-depth quality control of the precipitation gauges to improve the output of the National

Weather Service’s Multisensor Precipitation Estimator (MPE) software. The more accurate

product was created for the entire Southeast River Forecast Center’s area of responsibility

(Marzen 2004). This paper will discuss the process involved in converting the database into an

ArcGIS-compatible format, specifically using the Microsoft Access DBASE IV format, as well

as the effectiveness and usability of such a database in a Geographic Information System (GIS).



Methodology

       The output files from the MPE runs consist of a 305 x 382 HRAP grid in an ASCII grid

file. There is one file for every hour of data, and the hourly files are compressed into one

archive for each year. Also included in the compressed yearly files are four separate files

giving the latitude, longitude, HRAP x-, and HRAP y-coordinates for the center of each 4 x

4 km² grid box of rainfall totals. These files are used to georeference the hourly ASCII precipitation

grid files. To work with the data, the files for the entire Florida Peninsula grid are

extracted to one directory where FORTRAN programs can be used to access and convert the

ASCII files into the comma-separated values (CSV) format. This is a format that is commonly

accessible by database software (e.g., Microsoft Access) and relatively simple to develop. The

programs were written for the specific intent of this project and involve concatenating the hourly

precipitation files with the four reference files in a deliberate manner.
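
       The concatenation step can be illustrated with a short Python sketch. The original programs were written in FORTRAN and are not reproduced here; the whitespace-separated, row-by-row file layout, the function names, and the CSV column names below are all assumptions made for the example, and the tiny 2 x 2 grids with made-up values stand in for the real 305 x 382 files.

```python
import csv
import io


def parse_grid(text):
    """Parse a whitespace-separated text grid into a list of rows of floats."""
    return [[float(v) for v in line.split()] for line in text.strip().splitlines()]


def grids_to_csv(precip, lat, lon, hrap_x, hrap_y, out):
    """Write one CSV row per grid box, pairing each rainfall value with its reference values.

    All five grids are assumed to have the same shape and ordering.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["hrap_x", "hrap_y", "lat", "lon", "precip_in"])
    for r, row in enumerate(precip):
        for c, value in enumerate(row):
            writer.writerow([int(hrap_x[r][c]), int(hrap_y[r][c]),
                             lat[r][c], lon[r][c], value])


# Hypothetical 2 x 2 stand-ins for one hourly file and the four reference files.
precip = parse_grid("0.00 0.12\n-0.0039 0.05")
lat = parse_grid("30.0 30.0\n29.9 29.9")
lon = parse_grid("-81.9 -81.8\n-81.9 -81.8")
hrap_x = parse_grid("1 2\n1 2")
hrap_y = parse_grid("2 2\n1 1")

buffer = io.StringIO()
grids_to_csv(precip, lat, lon, hrap_x, hrap_y, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing one row per grid box keeps the output in a form that database software can import directly.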

       In working with Dr. Harry Cooper, one of our first tasks was to create a monthly

precipitation sums database for the Black Creek Basin, a relatively small watershed in Clay

County, Florida. It became evident in working with the data that it would be advantageous to

use this region as a test in creating a GIS database capable of summing, averaging, and analyzing

the data over user-defined intervals. The procedure described herein was carried out for this

specific region, but with the idea that the code can be easily modified to create the same database

for any region (assuming that the MPE product has already been run and output to the

appropriate format). However, one of the issues that came to light during the process is that, in

its current form, the database can get very large quite rapidly. Possible solutions or workarounds

to this issue are discussed in the Future Use section later in this paper.



Choosing the Right Database

       Many trial runs of the FORTRAN programs were repeated until the database was

eventually exportable into a satisfactory Microsoft Access database. Some of the first trials

attempted to incorporate the data directly into the ArcGIS software; however, this method

was abandoned because the GIS software proved less capable than Access for this task. Although the

method of storing data in ArcMap is similar to that of Access, the data is much more easily

accessible through Access. Most of the advantages come in the ability of Access to perform

advanced queries of data much more efficiently, especially when calculating sums or averages of

precipitation. Access can also be made into a more user-friendly environment that

is easily expandable. Microsoft Access was chosen over other database software packages

because of its widespread availability and because it gets the job done

efficiently without advanced expertise in database management. A further point

in favor of Access is that ArcGIS is designed to work well side by side with the

Microsoft Access product.

       One of the other factors for keeping the database separate from the GIS software is to

make it unnecessary to have the GIS software on the same machine to query the database and

extract the data for a particular study. The data can simply be extracted from the Access

database and imported into the GIS software for incorporation into maps. Another important goal

was to streamline the process so that creating a map of any particular query takes as few steps

as possible. Through Access, queries can be automated and macros can be programmed to run more

advanced processes with the click of a button (Fig. 1).

Figure 1. Screenshot of an example of a macro created in Microsoft Access. This macro opens the
query dialog.


Creating the Database Format and Miscellaneous Nuances

       The first step in extracting the data into CSV files was to find the HRAP coordinates

encompassing the Black Creek Basin. The HRAP polygon grid (supplied by St. Johns Water

Management District) was overlaid on a map of the basin. Working with Dr. Cooper, I selected the

square polygons of the 4 x 4 km² grid that lie over the basin. Any grid box that contained any part

of the basin was chosen. The Black Creek Basin selection contains 123 of the 4 x 4 km² grid boxes,

a small fraction of the total 116,510 boxes for the Florida Peninsula (Fig. 2).

Figure 2. The HRAP grid for the Florida peninsula is shown in black; the Black Creek Basin grid is
highlighted in orange. Black Creek Basin is a very small portion of the Florida NEXRAD grid.

       The database contains a line of data for each hour of each grid box over the six-year data

period. In the early stages of the database creation, the latitude and longitude of the center of

each 4 x 4 km² grid box were put into each line of hourly precipitation so that the data could be

directly imported to GIS as point events. Since this method was abandoned, the latitude and

longitude values could be removed from each line, helping to keep the disk space consumption of

the database down.
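
       To give a sense of scale, the number of records implied by "one line per hour per grid box" can be worked out with a few lines of Python. This is only a rough sketch: it assumes every hour from 1 January 1996 through 31 December 2001 has a record for every box, and the function names are illustrative.

```python
from datetime import datetime

PERIOD_START = datetime(1996, 1, 1)  # first hour of the data period
PERIOD_END = datetime(2002, 1, 1)    # first instant after 31 December 2001


def hours_in_period(start, end):
    """Number of whole hours between two datetimes."""
    return int((end - start).total_seconds() // 3600)


def record_count(n_boxes, start=PERIOD_START, end=PERIOD_END):
    """One record per grid box per hour."""
    return n_boxes * hours_in_period(start, end)


hours = hours_in_period(PERIOD_START, PERIOD_END)
basin_records = record_count(123)            # Black Creek Basin selection
peninsula_records = record_count(305 * 382)  # full 116,510-box grid
print(hours, basin_records, peninsula_records)
```

Under these assumptions the six-year period holds 52,608 hours, so the 123-box basin table has 6,470,784 rows, while the full grid would need roughly 6.1 billion.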

       One important issue to overcome was how to incorporate missing data into the final maps.

In the hourly ASCII files, a value of -0.0039 indicates that data is missing for that time

in that grid box. This also created a problem in that it would interfere with summing over periods

containing missing data. The solution is to keep three columns relating to the actual precipitation in the

final database: the raw value directly from the ASCII file, a field denoting whether data was

missing for that time, and a modified precipitation value (in inches). The raw data field contains

the precipitation for that hour in inches, unless data is missing, in which case -0.0039 is placed

there to signify that data was missing for that time. The default value in the missing data column

is a “0” quantity. In the case of missing data,

however, the missing data column contains a

“1” and the modified precipitation field would

contain a value of 0.00 inches. When data is

not missing (a raw value greater than or equal

to 0.00 inches), the raw value is assigned to the

modified precipitation value and the missing

data column is left at default (“0”). Essentially

the raw data field could be eliminated, but it

remains for reference and the sake of future

endeavors.
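
       The three-column scheme described above can be sketched in Python as follows. The function name and the sample values are made up for illustration; the rule itself (a negative raw value means missing, the flag is 1, and the modified value is 0.00) follows the description in the text.

```python
MISSING_SENTINEL = -0.0039  # value the hourly ASCII files use for missing data


def precipitation_fields(raw):
    """Return (raw, missing_flag, modified) for one hourly value in inches."""
    if raw < 0.0:
        # Not "greater than or equal to 0.00 inches", so the hour is missing.
        return raw, 1, 0.0
    return raw, 0, raw


hourly = [0.10, MISSING_SENTINEL, 0.00, 0.25]  # hypothetical raw values
rows = [precipitation_fields(v) for v in hourly]

total = sum(modified for _, _, modified in rows)
missing_hours = sum(flag for _, flag, _ in rows)
naive_total = sum(hourly)  # summing the raw column lets the sentinel leak into the total

print(total, missing_hours, naive_total)
```

Summing the modified column gives the rainfall total, summing the flag column gives the number of missing hours, and the raw column is left untouched for reference.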

Figure 3. Sample of an output map from ArcMap GIS software. Notice the grid colored to intervals of
precipitation amounts.

       The missing data and modified precipitation columns reduce the processing that

the database query must perform when summing over arbitrary data periods. The Access

query is set up to sum the modified precipitation column and the missing data column, giving

results of the total precipitation (in inches) and the number of missing hours for that period. This

creates an output table easy to import into ArcMap, allowing for a color coded grid layout of

precipitation and a label denoting the number of missing hours on each grid (Fig. 3). It is

necessary for the user to know whether a period had an abundance of missing data values, so as

not to assume that there was no precipitation for an entire period when all of the data is missing. On

a related note, there was no radar data available for December 1999, 27-31 May 2000, April

2001, or December 2001. In those cases, placeholder database records were generated using a

separate FORTRAN program and imported into the Access database. These hours were given

values corresponding to missing data.

       The database contains nine columns. Besides the I_J coordinate that identifies each grid box

(Fig. 4), they are, in order: MPE zone (PENINSULA for all of Black Creek), month, day, year, hour,

raw precipitation value, missing data, and modified

precipitation amount (inches). All times are in Coordinated Universal Time (UTC), and each timestamp

marks the end of the hour over which the rainfall fell (e.g., 1200 UTC denotes rainfall for 1100-1159 UTC).
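
       The end-of-hour convention is easy to get wrong around midnight, so a small Python sketch may help. The helper name is illustrative and not part of the original database.

```python
from datetime import datetime, timedelta


def accumulation_window(label):
    """Return the first and last minute covered by an end-of-hour UTC label."""
    start = label - timedelta(hours=1)
    end = label - timedelta(minutes=1)
    return start, end


# A 1200 UTC record covers 1100-1159 UTC on the same day.
start, end = accumulation_window(datetime(1997, 7, 12, 12, 0))
window = f"{start:%H%M}-{end:%H%M} UTC"

# A 0000 UTC record covers the last hour of the previous day.
midnight_start, midnight_end = accumulation_window(datetime(1997, 7, 13, 0, 0))

print(window, midnight_start.date())
```

A record labeled 0000 UTC therefore belongs to the rainfall of the preceding calendar day, which matters when daily totals are built from the hourly records.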



Using the Database

       The power of Access to group sums based on certain fields allows the sums to be created

based on each grid box, thereby allowing the values to be easily queried, summed, and sent to a

new database table at once (Fig. 4).
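
       The grouped sum can be sketched outside of Access as well. The example below uses Python's built-in sqlite3 module purely as a stand-in for the Access query; the table name, the column names, the box identifiers, the HHMM integer hour format, and the sample values are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE hourly (
           i_j TEXT, zone TEXT, month INTEGER, day INTEGER, year INTEGER,
           hour INTEGER, raw REAL, missing INTEGER, precip REAL)"""
)

# Hypothetical records for two grid boxes on 12 July 1997.
rows = [
    ("box_a", "PENINSULA", 7, 12, 1997, 1100, 0.50, 0, 0.50),     # outside the interval
    ("box_a", "PENINSULA", 7, 12, 1997, 1200, 0.10, 0, 0.10),
    ("box_a", "PENINSULA", 7, 12, 1997, 1300, -0.0039, 1, 0.00),  # missing hour
    ("box_a", "PENINSULA", 7, 12, 1997, 2300, 0.25, 0, 0.25),
    ("box_b", "PENINSULA", 7, 12, 1997, 1200, 0.00, 0, 0.00),
    ("box_b", "PENINSULA", 7, 12, 1997, 1300, 0.05, 0, 0.05),
]
conn.executemany("INSERT INTO hourly VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Send the grouped sums to a new table, one row per grid box.
conn.execute(
    "CREATE TABLE interval_sums (i_j TEXT, precip_sum REAL, missing_hours INTEGER)"
)
conn.execute(
    """INSERT INTO interval_sums
       SELECT i_j, SUM(precip), SUM(missing)
       FROM hourly
       WHERE year = ? AND month = ? AND day = ? AND hour BETWEEN ? AND ?
       GROUP BY i_j""",
    (1997, 7, 12, 1200, 2300),  # end-of-hour labels covering 1100-2259 UTC
)

result = conn.execute(
    "SELECT i_j, precip_sum, missing_hours FROM interval_sums ORDER BY i_j"
).fetchall()
print(result)
```

Because the records are stamped with the end of each hour, the labels 1200 through 2300 select the rainfall that fell between 1100 and 2259 UTC.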




Figure 4. Screenshot of the query tool used to access the data in the database and create a new table
with the sums of precipitation and missing data, along with the I_J coordinates that will be used to
link the data to individual grid boxes. The criteria allow the user to query based on any interval of
hours. This query will sum the precipitation and missing hours on 12 July 1997 1100 - 2259 UTC.
The output table will be grouped based on the I_J coordinate, which serves as the ID of
each grid box. This is used to match the data to a spatial shapefile in ArcMap.


The table is created from the query (Fig. 5).




                  Figure 5. The table is output from the query containing I_J
                  coordinate, sum of precipitation in inches, and sum of
                  missing data (over specified query interval). This is the
                  table from the Figure 4 example (12 July 1997 1100 – 2259
                  UTC).




                           Figure 6. Screenshot of the join dialog
                           from ArcMap. This is where the software
                           is set up to link the I_J values from the data
                           table to the spatial I_J coordinates of the
                           grid boxes.


The database table is saved and can then be imported into the GIS where the data is joined based

on the I_J coordinates (Fig. 6). The coordinates in the table can be found in the spatial polygon

shapefile of the NEXRAD radar grid.
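
       ArcMap performs this join through its dialog, but the matching logic itself is simple and can be sketched in Python. The dictionaries below are hypothetical stand-ins for the shapefile attribute table and the query output, and keeping unmatched grid boxes with empty values is an assumption of the sketch rather than a description of the settings used in ArcMap.

```python
def join_on_i_j(grid_boxes, sums):
    """Attach query results to grid boxes by I_J key, keeping boxes with no match."""
    by_key = {row["i_j"]: row for row in sums}
    joined = []
    for box in grid_boxes:
        match = by_key.get(box["i_j"])
        joined.append({
            **box,
            "precip_sum": match["precip_sum"] if match else None,
            "missing_hours": match["missing_hours"] if match else None,
        })
    return joined


# Hypothetical attribute rows from the grid shapefile and the query output table.
grid_boxes = [{"i_j": "box_a"}, {"i_j": "box_b"}, {"i_j": "box_c"}]
sums = [
    {"i_j": "box_a", "precip_sum": 0.35, "missing_hours": 1},
    {"i_j": "box_b", "precip_sum": 0.05, "missing_hours": 0},
]

joined = join_on_i_j(grid_boxes, sums)
print(joined)
```

Each grid box picks up the sums whose I_J value matches its own, which is what allows the polygons to be shaded and labeled.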


Figure 7 shows that precipitation values from the database can then be displayed on the map in a

responsive and useful way.




Figure 7. Screenshot of the ArcMap software showing the precipitation displayed in the HRAP
grid and overlaid with rivers and lakes. This is the example from previous figures (12 July 1997
1100 – 2259 UTC).


The power to make useful maps is provided in the GIS. The final map can be exported to a

variety of formats and used digitally, or printed to paper (Fig. 8). The following image is the

example from Figures 4, 5, 6, and 7 exported as an image in JPEG format.




Figure 8. Sample map of the Black Creek Basin; shaded for the sum of precipitation
between 1100 UTC and 2259 UTC on 12 July 1997. Missing hours are displayed in the
grid boxes. Note that the zeroes represent that there was no missing data for this interval.


          The most advantageous aspect of the final database is that it can be queried in almost

unlimited ways. Not only can precipitation and missing data sums be calculated over any

interval of whole hours between 1 January 1996 and 31 December 2001, but by simply changing a

dropdown box, an average, maximum value, or standard deviation can be calculated just as

easily.
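
          Swapping the aggregate is a one-word change in the query, which the following Python sketch imitates with a lookup table of functions. The names and sample values are illustrative only, and the sample standard deviation is used here as an assumption about what the dropdown option computes.

```python
import statistics

# Aggregates a user might pick; stdev is the sample standard deviation.
AGGREGATES = {
    "sum": sum,
    "average": statistics.mean,
    "maximum": max,
    "standard deviation": statistics.stdev,
}


def aggregate_by_box(records, how):
    """Group (i_j, precip) records by grid box and apply the chosen aggregate."""
    groups = {}
    for i_j, precip in records:
        groups.setdefault(i_j, []).append(precip)
    func = AGGREGATES[how]
    return {i_j: func(values) for i_j, values in groups.items()}


# Hypothetical hourly values in inches for two grid boxes.
records = [
    ("box_a", 0.0), ("box_a", 0.5), ("box_a", 1.0),
    ("box_b", 0.25), ("box_b", 0.25),
]

totals = aggregate_by_box(records, "sum")
averages = aggregate_by_box(records, "average")
maxima = aggregate_by_box(records, "maximum")
spreads = aggregate_by_box(records, "standard deviation")
print(totals, averages, maxima, spreads)
```

The grouping by grid box stays the same in every case; only the function applied to each group changes.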



Future Use

          One of the biggest drawbacks of such a large GIS database is the disk space needed to

store the data, but it is a necessary aspect of analyzing data at spatially and temporally high

resolutions. There are two options for creating this database for another region. One method

would create databases on an as-needed basis, while the other method would

create one master database from which any region of data could be selected. In the

former, the HRAP coordinates for a specific region can be chosen and that database can be

created by modifying the FORTRAN program to output the HRAP/NEXRAD grid boxes that are

part of the chosen region into a new CSV file. The file could then be imported to a new Access

database. The latter method would involve modifying the FORTRAN program to output all of

the HRAP coordinates into one large CSV file. This file would be on the order of 300 GB and

would not be very efficient for querying data. Simplifying the database to use the least amount

of text possible is an option to cut down on the size, but there is not much that can be done about

the number of records. When running a query, it could take a daunting amount of time for

Microsoft Access to run through such a large number of records. It would be possible to trim the

HRAP grid somewhat, so as not to include grid boxes far off the coast, but there would still be a large

number of records to search. With the processing and storage capabilities currently

available, the optimal method would be to create databases for smaller regions, possibly by

county. This would also ease the transfer of data, which is important considering

that the data might in the future be made available on an internet server.
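
          The benefit of regional databases can be estimated with simple proportional scaling. The sketch below assumes that storage grows linearly with the number of grid boxes (same hours and same record width for every box) and takes the 300 GB figure as the order-of-magnitude estimate it is; the function name is illustrative.

```python
FULL_GRID_BOXES = 305 * 382  # 116,510 boxes in the full grid
FULL_GRID_SIZE_GB = 300.0    # order-of-magnitude estimate for the master CSV file


def regional_size_gb(n_boxes):
    """Scale the full-grid estimate down to a region with n_boxes grid boxes."""
    return FULL_GRID_SIZE_GB * n_boxes / FULL_GRID_BOXES


black_creek_gb = regional_size_gb(123)
print(round(black_creek_gb, 2))
```

On this rough basis a region the size of the Black Creek Basin selection would need about 0.32 GB, around a thousandth of the master file.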



Conclusions

        The process of creating an hourly precipitation database from the MPE product took

some time to implement, but can be repeated rather quickly and with much greater ease for

other regions. The database has turned out to be a very useful tool for analyzing the distribution

of rainfall across the Black Creek Basin. The database is easy to use and can create sums of

rainfall that are easily imported into ArcMap. The power to analyze the data once it is available

in the Geographical Information System increases drastically, with the ability to overlay

numerous layers of data and understand the correlation (or not) of rainfall with other features.

        I have found this project to be very rewarding. The details that were learned in the

process of generically formatting data for broad future use will be beneficial to my future

endeavors in research and as a scientist. It was useful to gain more experience with the GIS

software as well. The ability to make data more accessible to the masses is an important step in

understanding the scientific reasons for occurrences within the dataset.

        A copy of this paper and more images that were created from this database can be found

online at http://bertha.met.fsu.edu/~jsull/. Special thanks belong to Denny VanCleve for his help

in familiarizing me with the data, previous FORTRAN programs, and scripts used to complete

this project.


References

Marzen, J.L., 2004: Development of a Florida High-Resolution Multisensor Precipitation Dataset
      for 1996-2001 -- Quality Control and Verification. M.Sc. Thesis, Dept. Meteorology. The
      Florida State University (unpublished).

Xie, H., X. Zhou, E. Vivoni, E. Small, and J.M.H. Hendrickx, 2005: Development of a GIS
       based NEXRAD precipitation database: automated approaches for data processing and
       visualization. Comp. & Geosci., 31, 65-76.

Gis rainfallstudy

  • 1. Creating an Accessible GIS Database from a Research Multisensor Precipitation Estimator Product MET 4905, Section Two Spring 2005 Dr. Henry E. Fuelberg 29 April 2005 John L. Sullivan, Jr.
  • 2. Sullivan 1 Introduction Geographical Information Systems have quickly become an essential tool for easily analyzing vast amounts of spatial information. Recently, Joseph Marzen went to extensive efforts to create a high-quality rainfall database for Florida, exhibiting the extreme spatial (4 x 4 km2 grid) and temporal (hourly) characteristics of precipitation in the state. His work included in-depth quality control of the precipitation gauges to improve the output of the National Weather Service’s Multisensor Precipitation Estimator (MPE) software. The more accurate product was created for the entire Southeast River Forecast Center’s area of responsibility (Marzen 2004). This paper will discuss the process involved in converting the database into an ArcGIS compatible format, specifically using the Microsoft Access DBASE IV format, as well as the effectiveness and usability of such a database in a Geographic Information System (GIS). Methodology The output files from the MPE runs consist of a 305 x 382 HRAP grid in an ASCII grid file. The files are arranged such that there is one file for every hour of data compressed into one file for each year. Also included in the compressed yearly files are four separate files referencing the latitude, longitude, HRAP x-, and HRAP y- coordinates for the center of each 4 x 4 km2 grid box of rainfall totals. These files are used to correlate the ASCII hourly precipitation grid files. In order to work with the data, the files for the entire Florida Peninsula grid are extracted to one directory where FORTRAN programs can be used to access and convert the ASCII files into the comma-separated values (CSV) format. This is a format that is commonly accessible by database software (i.e., Microsoft Access) and relatively simple to develop. The
  • 3. Sullivan 2 programs were written for the specific intent of this project and involve concatenating the hourly precipitation files with the four reference files in a deliberate manner. In working with Dr. Harry Cooper, one of our first tasks was to create a monthly precipitation sums database for the Black Creek Basin, a relatively small watershed in Clay County, Florida. It became evident in working with the data that it would be advantageous to use this region as a test in creating a GIS database capable of summing, averaging, and analyzing the data over user-defined intervals. The procedure described herein was carried out for this specific region, but with the idea that the code can be easily modified to create the same database for any region (assuming that the MPE product has already been run and output to the appropriate format). However, one of the issues that came to light during the process is that, in its current form, the database can get very large quite rapidly. Possible solutions or workarounds to this issue are discussed in the Future Use section later in this paper. Choosing the Right Database Many trial runs of the FORTRAN programs were repeated until the database was eventually exportable into a satisfactory Microsoft Access database. Some of the first trials of the database attempted to incorporate it directly into the ArcGIS software; however, this method was abandoned due to the capability of the GIS software versus that of Access. Although the method of storing data in ArcMap is similar to that of Access, the data is much more easily accessible through Access. Most of the advantages come in the ability of Access to perform advanced queries of data much more efficiently, especially when calculating sums or averages of precipitation. Access also has the ability to be made into a more user friendly environment that is easily expandable. Microsoft Access was chosen over other database software packages
  • 4. Sullivan 3 because of its widespread availability in the market and the fact that it gets the job done efficiently without advanced expertise in database management. Another tidbit to push the deciding factor in favor of Access is that ArcGIS is designed to work well side-by-side with the Microsoft Access product. One of the other factors for keeping the database separate from the GIS software is to make it unnecessary to have the GIS software on the same machine to query the database and extract the data for a particular study. The data can be simply extracted from the Access database and imported to the GIS software for incorporation into maps. Another important concept was to streamline the process as to have the least amount of necessary steps possible to create a map of any particular query. Through Access queries can be automated and macros can be programmed to run more advanced Figure 1. Screenshot of an example of a macro processes with the click of a button (Fig. created in Microsoft Access. This macro opens the query dialog. 1).
  • 5. Sullivan 4 Creating the Database Format and Miscellaneous Nuances The first step in extracting the data into CSV files was to find the HRAP coordinates encompassing the Black Creek Basin. The HRAP polygon grid (supplied by St. Johns Water Management District) was overlaid on a map of the Basin. Working with Dr. Cooper, the square polygons of the 4 x 4 km2 grid were selected over the basin. Any grid box that contained any part of the basin was chosen. The Black Creek Basin contains 123 of the 4 x 4 km2 grid boxes, a small fraction of the total 116,510 boxes for the Florida Peninsula (Fig. 2). The database contains a line of data for each hour of each grid box over the six year data period. In the early stages of the Figure 2. The HRAP grid for the Florida peninsula is shown in database creation, the black; the Black Creek Basin grid is highlighted in orange. Black Creek Basin is a very small portion of the Florida latitude and longitude of the NEXRAD grid. center of each 4 x 4 km2 grid box was put into each line of hourly precipitation so that the data could be directly imported to
  • 6. Sullivan 5 GIS as point events. Since this method was abandoned, the latitude and longitude values could be removed from each line, helping to keep the disk space consumption of the database down. One important nuance to overcome was to incorporate missing data into the final maps. From the hourly ASCII files, a value of -0.0039 indicates that there is missing data for that time in that grid. This also created a problem in that it would inhibit summing over missing data periods. The solution is that there are three columns relating to the actual precipitation in the final database: the raw value directly from the ASCII file, a field denoting whether data was missing for that time, and a modified precipitation value (in inches). The raw data field contains the precipitation for that hour in inches, unless data is missing, in which case -0.0039 is placed there to signify that data was missing for that time. The default value in the missing data column is a “0” quantity. In the case of missing data, however, the missing data column contains a “1” and the modified precipitation field would contain a value of 0.00 inches. When data is not missing (a raw value greater than or equal to 0.00 inches), the raw value is assigned to the modified precipitation value and the missing data column is left at default (“0”). Essentially the raw data field could be eliminated, but it remains for reference and the sake of future endeavors. The missing data and modified Figure 3. Sample of an output map from precipitation columns make less processing that ArcMap GIS software. Notice the grid colored to intervals of precipitation amounts.
  • 7. the database query must accomplish when summing over arbitrary data periods. The Access query is set up to sum the modified precipitation column and the missing data column, giving the total precipitation (in inches) and the number of missing hours for that period. This creates an output table that is easy to import into ArcMap, allowing for a color-coded grid layout of precipitation and a label denoting the number of missing hours on each grid box (Fig. 3). The user needs to know whether a period had an abundance of missing data, so as not to assume that no precipitation fell during a period for which all of the data is missing.

    On a related note, there was no radar data available for December 1999, 27-31 May 2000, April 2001, or December 2001. For those periods, the database records were generated using a separate FORTRAN program and imported into the Access database, with every hour given the values corresponding to missing data.

    The database contains nine columns: the I_J grid-box coordinate used to link records to individual grid boxes (Fig. 4), plus the following, in order: MPE zone (PENINSULA for all of Black Creek), month, day, year, hour, raw precipitation value, missing data, and modified precipitation amount (inches). All times are in Coordinated Universal Time (UTC) and denote the rainfall at the end of the hour (e.g., 1200 UTC holds the rainfall for 1100-1159 UTC).

    Using the Database

    Access can group sums by chosen fields, so the sums can be computed for each grid box, allowing the values to be queried, summed, and written to a new database table in a single step (Fig. 4).
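The three precipitation columns described above can be illustrated with a minimal Python sketch. It is not the FORTRAN program used for this project; the function names and the small comparison tolerance are assumptions made only for illustration, while the -0.0039 sentinel and the 0/1 flag follow the text.

```python
MISSING_SENTINEL = -0.0039  # value the hourly ASCII files use for missing data


def encode_hour(raw_value):
    """Return (raw, missing_flag, modified_precip) for one grid box and hour.

    The tolerance is an assumption: the sentinel is parsed from text, so an
    exact floating-point comparison is avoided.
    """
    if abs(raw_value - MISSING_SENTINEL) < 1e-6:
        # Missing hour: flag it and contribute nothing to the rainfall sum.
        return (raw_value, 1, 0.0)
    # Valid hour: the modified value is simply the raw value.
    return (raw_value, 0, raw_value)


def summarize(raw_values):
    """Sum modified precipitation and count missing hours for one grid box."""
    rows = [encode_hour(value) for value in raw_values]
    total_precip = sum(row[2] for row in rows)
    missing_hours = sum(row[1] for row in rows)
    return total_precip, missing_hours


# Four hypothetical hourly values (inches) for a single grid box.
total, missing = summarize([0.10, -0.0039, 0.00, 0.25])
print(f"total = {total:.2f} in, missing hours = {missing}")
```

Because missing hours contribute 0.00 inches and a flag of 1, a plain sum over both columns gives the rainfall total and the missing-hour count without any special handling inside the query.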
  • 8. Figure 4. Screenshot of the query tool used to access the data in the database and create a new table with the sums of precipitation and missing data, along with the I_J coordinates that will be used to link the data to individual grid boxes. The criteria allow the user to query over any interval of hours. This query sums the precipitation and missing hours for 1100-2259 UTC on 12 July 1997. The output table is grouped by the I_J coordinate, which serves as the ID of each grid box and is used to match the data to a spatial shapefile in ArcMap.
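The grouped query shown in Figure 4 can be approximated outside Access. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it is not the Access query itself. The table name `hourly`, the column names, the I_J strings, and the sample values are all hypothetical, and the hour column is assumed to store the hour-ending label as an integer (12 for 1200 UTC), so that 1100-2259 UTC corresponds to labels 12 through 23.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE hourly (
           i_j TEXT, zone TEXT, month INTEGER, day INTEGER, year INTEGER,
           hour INTEGER, raw REAL, missing INTEGER, precip REAL)"""
)

# Hypothetical records for two grid boxes on 12 July 1997.
rows = [
    ("100_200", "PENINSULA", 7, 12, 1997, 12, 0.10, 0, 0.10),
    ("100_200", "PENINSULA", 7, 12, 1997, 13, -0.0039, 1, 0.00),
    ("100_200", "PENINSULA", 7, 12, 1997, 9, 0.50, 0, 0.50),  # outside interval
    ("101_200", "PENINSULA", 7, 12, 1997, 12, 0.25, 0, 0.25),
    ("101_200", "PENINSULA", 7, 12, 1997, 23, 0.05, 0, 0.05),
]
conn.executemany("INSERT INTO hourly VALUES (?,?,?,?,?,?,?,?,?)", rows)

# Sum modified precipitation and missing-hour flags per grid box.
query = """
    SELECT i_j, SUM(precip) AS precip_sum, SUM(missing) AS missing_hours
    FROM hourly
    WHERE year = ? AND month = ? AND day = ? AND hour BETWEEN ? AND ?
    GROUP BY i_j
    ORDER BY i_j
"""
sums = conn.execute(query, (1997, 7, 12, 12, 23)).fetchall()

for i_j, precip_sum, missing_hours in sums:
    print(i_j, round(precip_sum, 2), missing_hours)
```

Swapping `SUM(precip)` for `AVG(precip)` or `MAX(precip)` gives other per-box statistics in the same way, which is roughly what changing the aggregate in the Access query tool does.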
  • 9. The table is created from the query (Fig. 5).

    Figure 5. The table output from the query, containing the I_J coordinate, the sum of precipitation in inches, and the sum of missing data (over the specified query interval). This is the table from the Figure 4 example (12 July 1997, 1100-2259 UTC).
  • 10. Figure 6. Screenshot of the join dialog from ArcMap. This is where the software is set up to link the I_J values from the data table to the spatial I_J coordinates of the grid boxes.

    The database table is saved and can then be imported into the GIS, where the data is joined to the grid based on the I_J coordinates (Fig. 6). The same coordinates appear in the spatial polygon shapefile of the NEXRAD radar grid.
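The join on I_J can be pictured as a keyed lookup. The following Python sketch is only an analogy for the attribute join performed in ArcMap, not a description of how ArcMap implements it; the function name, the dictionary fields, and the I_J strings are hypothetical.

```python
def join_on_i_j(grid_boxes, sums):
    """Attach precipitation sums to grid-box records that share an I_J key.

    Grid boxes with no matching row in the sums table keep None values,
    so every box in the grid remains in the result.
    """
    sums_by_key = {i_j: (precip, missing) for i_j, precip, missing in sums}
    joined = []
    for box in grid_boxes:
        precip, missing = sums_by_key.get(box["i_j"], (None, None))
        joined.append({**box, "precip_sum": precip, "missing_hours": missing})
    return joined


# Hypothetical grid-box attributes and a query output table.
grid_boxes = [{"i_j": "100_200"}, {"i_j": "101_200"}, {"i_j": "102_200"}]
sums = [("100_200", 0.10, 1), ("101_200", 0.30, 0)]

joined = join_on_i_j(grid_boxes, sums)
for record in joined:
    print(record)
```

Each grid box then carries the values needed for shading (the precipitation sum) and labeling (the number of missing hours).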
  • 11. Figure 7 shows that precipitation values from the database can then be displayed on the map in a responsive and useful way.

    Figure 7. Screenshot of the ArcMap software showing the precipitation displayed in the HRAP grid and overlaid with rivers and lakes. This is the example from the previous figures (12 July 1997, 1100-2259 UTC).

    The GIS provides the power to make useful maps. The final map can be exported to a variety of formats and used digitally, or printed on paper (Fig. 8). The following image is the example from Figures 4, 5, 6, and 7, exported in JPEG format.
  • 12. Figure 8. Sample map of the Black Creek Basin, shaded by the sum of precipitation between 1100 UTC and 2259 UTC on 12 July 1997. The number of missing hours is displayed in each grid box. Note that the zeroes indicate that no data was missing for this interval.
  • 13. The most advantageous aspect of the final database is that it can be queried in an almost unlimited number of ways. Not only can precipitation and missing data sums be calculated over any span of hours between 1 January 1996 and 31 December 2001, but by simply changing a dropdown box, an average, maximum value, or standard deviation can be calculated just as easily.

    Future Use

    One of the biggest drawbacks of such a large GIS database is the disk space needed to store the data, but this is a necessary cost of analyzing data at spatially and temporally high resolutions. There are two options for creating this database for another region. One is to create databases on an as-needed basis; the other is to create one master database from which any region of data could be selected. In the former, the HRAP coordinates for a specific region are chosen and the database is created by modifying the FORTRAN program to output the HRAP/NEXRAD grid boxes that are part of the chosen region into a new CSV file. The file can then be imported into a new Access database. The latter method would involve modifying the FORTRAN program to output all of the HRAP grid boxes into one large CSV file. This file would be on the order of 300 GB and would not be very efficient to query. Simplifying the database to use the least amount of text possible would cut down on the size, but not much can be done about the number of records. It could take a daunting amount of time for Microsoft Access to run a query through such a large number of records. The HRAP grid could be trimmed somewhat, so as not to include grid boxes far off the coast, but a large number of records would still have to be searched. With the processing and storage capabilities currently
  • 14. available, the optimal method would be to create databases for smaller regions, possibly by county. This would also ease the transfer of data, which is important considering that the data might in the future be made available on an internet server.

    Conclusions

    The process of creating an hourly precipitation database from the MPE product took some time to implement, but it can be repeated rather quickly and with much greater ease for other regions. The database has turned out to be a very useful tool for analyzing the distribution of rainfall across the Black Creek Region. The database is easy to use and can create sums of rainfall that are easily imported into ArcMap. The power to analyze the data increases drastically once it is available in the Geographical Information System, with the ability to overlay numerous layers of data and examine whether rainfall correlates with other features.

    I have found this project to be very rewarding. The details learned in the process of generically formatting data for broad future use will be beneficial to my future endeavors in research and as a scientist. It was useful to gain more experience with the GIS software as well. Making data more accessible to the masses is an important step in understanding the scientific reasons for occurrences within the dataset.

    A copy of this paper and more images that were created from this database can be found online at http://bertha.met.fsu.edu/~jsull/.

    Special thanks belong to Denny VanCleve for his help in familiarizing me with the data, previous FORTRAN programs, and scripts used to complete this project.
  • 15. References

    Marzen, J.L., 2004: Development of a Florida High-Resolution Multisensor Precipitation Dataset for 1996-2001 -- Quality Control and Verification. M.Sc. thesis, Dept. of Meteorology, The Florida State University (unpublished).

    Xie, H., X. Zhou, E. Vivoni, E. Small, and J.M.H. Hendrickx, 2005: Development of a GIS based NEXRAD precipitation database: automated approaches for data processing and visualization. Comp. & Geosci., 31, 65-76.