From the course: Geospatial Raster Data Analytics in Python

Overview of raster data

- [Instructor] Now we will overview the most important practical features and characteristics of raster data. The defining feature of raster data is that it is a data model in which the real world is represented by a grid of cells. These cells are usually visualized as squares, are often called pixels, and are organized into rows and columns. The values stored in these pixels can represent a wide range of spatial features, such as temperature, elevation, or population, each measured within the corresponding grid cell.

Raster data is described by its characteristic spatial resolution, which is derived from the pixel size and captures the level of detail in the raster. Larger pixels correspond to less detailed data, while smaller pixels correspond to more detailed data.

Besides spatial resolution, we may also talk about spectral resolution, which describes the number and type of spectral bands a raster contains. For example, an elevation map is a single-band raster, capturing only one value per pixel: the elevation. An RGB image, by contrast, is a three-band raster, capturing three color values per pixel: red, green, and blue. Some satellite images can even have more than a dozen bands, each capturing a different part of the electromagnetic spectrum, such as infrared or ultraviolet. In short, the spectral resolution tells us how many different values are stored in each pixel.

A piece of raster data is essentially an array, or matrix, of values. These values may be numbers, such as temperature; categories, such as land cover classification types; or even triplets of numbers, such as the RGB color codes of a satellite image. Such arrays of values are indexed by two identifiers, the row and column indices, which are both integers. To map this grid to real geographical coordinates, we need a process called spatial referencing.
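To make the array view concrete, here is a minimal NumPy sketch. The elevation values, the 500 m by 400 m extent, and the helper names `band_count` and `pixel_size` are hypothetical, chosen only for illustration. It assumes the band-first `(bands, rows, cols)` layout that rasterio's `read()` returns for multi-band data; other tools may put the band axis last.

```python
import numpy as np

# Hypothetical single-band elevation raster: 4 rows x 5 columns, values in metres
elevation = np.array([
    [120.0, 121.5, 123.0, 124.2, 125.0],
    [119.0, 120.4, 122.1, 123.8, 124.6],
    [118.2, 119.7, 121.0, 122.9, 124.1],
    [117.5, 118.9, 120.3, 121.7, 123.3],
])

# Hypothetical three-band RGB raster of the same size, stored band-first
# as (bands, rows, cols) with 8-bit colour values
rgb = np.zeros((3, 4, 5), dtype=np.uint8)
rgb[0] = 200  # red band
rgb[1] = 150  # green band
rgb[2] = 50   # blue band


def band_count(raster: np.ndarray) -> int:
    """Spectral resolution as a band count: a 2-D array is one band,
    a 3-D band-first array has raster.shape[0] bands."""
    return 1 if raster.ndim == 2 else raster.shape[0]


def pixel_size(extent_width: float, extent_height: float,
               n_rows: int, n_cols: int) -> tuple[float, float]:
    """Spatial resolution: width and height of one pixel in map units."""
    return extent_width / n_cols, extent_height / n_rows


# Pixels are addressed by integer (row, column) indices
print(elevation[2, 3])        # value stored in row 2, column 3
print(rgb[:, 2, 3])           # the red, green and blue values of that pixel
print(band_count(elevation), band_count(rgb))
# Hypothetical extent: 500 m wide, 400 m tall, covered by 4 rows x 5 columns
print(pixel_size(500.0, 400.0, n_rows=4, n_cols=5))
```

Note that nothing in these arrays says where on Earth the pixels are; attaching that information to the grid is the job of spatial referencing.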
Spatial referencing means applying a set of mathematical transformations that convert the image space of the grid, expressed in integer row and column indices, into real-world coordinate space. This usually involves scaling each of the two axes by its own factor and shifting the grid to its real-world origin, so that it matches the geographical coordinates. Spatial referencing is particularly important when we want to match our raster data to another dataset, for instance a vector dataset, and conduct a combined analysis. Later, in the hands-on parts of this course, we will learn how to do this in practice using Python.

In this slide, I would like to review the most commonly used raster file formats. As you will see, they are all somehow linked to image storage, since a raster looks very much like a picture whose pixels store information. The most widely used, and also the one we will use in this course, is GeoTIFF, which usually comes with the .tif or .tiff file extension. GeoTIFF files are great for spatial referencing, for working with other GIS tools, libraries, and software, and for spatial analysis that requires high precision. On the downside, these files, especially at higher spatial resolutions, tend to be quite large, which makes processing and scaling challenging. NetCDF, with the .nc extension, is another widely used format. It handles multidimensional data very well and is therefore suitable for multispectral imaging and time series data. It is a more advanced type of raster data, which we are not going to cover in this course. Finally, it is worth mentioning JPEG2000, JPEG, and PNG image files, which are sometimes used as raster data. However, since they usually lack built-in spatial referencing, their usage is rather limited.

Now I would like to list a few possible applications. For further reading, I recommend the article titled "100 Earth-Shattering Remote Sensing Applications and Uses."
One of them, for instance, is the insurance industry, where satellite images are often used to assess the damage caused by natural hazards such as hurricanes. Flood risk assessment also relies heavily on so-called digital elevation models, a classic example of single-band raster data. Urban planning can greatly benefit from raster data as well, for instance by quantifying the amount of greenery and built-up area in different development sites. Environmental applications of raster data include monitoring illegal logging and deforestation. Finally, a very exciting and perhaps somewhat unusual domain for spatial data science is archeology, where raster data can be used to detect hidden ancient sites and even long-forgotten human settlements.

In conclusion, in these slides we learned that raster data consists of grid cells, or pixels, which may store one or multiple values, depending on the raster's spectral resolution. Additionally, the size of each pixel determines the spatial resolution of the raster. We usually need to spatially reference raster data, which means mapping the row and column indices to real geographical coordinates measured in a given coordinate reference system (CRS). Finally, raster data is best stored in GeoTIFF format, as we will see during the upcoming hands-on sessions in Python.
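Mapping row and column indices to coordinates can be sketched as an affine transformation. The sketch below assumes GDAL's six-number geotransform convention (origin x, pixel width, row rotation, origin y, column rotation, pixel height) and a hypothetical north-up raster with 30 m pixels whose top-left corner sits at (500000, 4100000) in some projected CRS measured in metres. The helpers `pixel_to_world` and `world_to_pixel` are illustrative names, not a library API; libraries such as rasterio expose an equivalent transform for a GeoTIFF file, though with a different parameter order.

```python
from typing import Tuple

# GDAL-style geotransform:
# (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]

# Hypothetical north-up raster: 30 m pixels, top-left corner at (500000, 4100000).
# pixel_height is negative because row indices grow downwards while y grows upwards.
gt: GeoTransform = (500000.0, 30.0, 0.0, 4100000.0, 0.0, -30.0)


def pixel_to_world(gt: GeoTransform, row: int, col: int,
                   center: bool = True) -> Tuple[float, float]:
    """Map integer (row, col) indices to map coordinates.

    With center=True the coordinates of the pixel centre are returned,
    otherwise those of its top-left corner.
    """
    offset = 0.5 if center else 0.0
    c, r = col + offset, row + offset
    x = gt[0] + c * gt[1] + r * gt[2]
    y = gt[3] + c * gt[4] + r * gt[5]
    return x, y


def world_to_pixel(gt: GeoTransform, x: float, y: float) -> Tuple[int, int]:
    """Inverse mapping for north-up rasters: which pixel contains (x, y)?

    Rotated geotransforms are not handled, and the result is not checked
    against the raster's size, so it may fall outside the grid.
    """
    if gt[2] != 0 or gt[4] != 0:
        raise ValueError("rotated geotransforms are not handled in this sketch")
    col = int((x - gt[0]) // gt[1])
    row = int((y - gt[3]) // gt[5])
    return row, col


x, y = pixel_to_world(gt, row=2, col=3)
print(x, y)                      # 500105.0 4099925.0
print(world_to_pixel(gt, x, y))  # (2, 3)
```

The inverse function is what lets you look up the raster value under, for example, a point taken from a vector dataset.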

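The urban planning example, quantifying greenery and built-up area, boils down to counting pixels per class and multiplying by the area of one pixel. Here is a minimal NumPy sketch, assuming a hypothetical classified raster in which land cover classes are stored as integer codes (1 = greenery, 2 = built-up, 3 = water) and pixels are 10 m by 10 m; the class codes and the `class_areas` helper are illustrative only.

```python
import numpy as np

# Hypothetical class codes of a small classified land cover raster
CLASS_NAMES = {1: "greenery", 2: "built-up", 3: "water"}

landcover = np.array([
    [1, 1, 2, 2],
    [1, 2, 2, 3],
    [1, 1, 2, 3],
])


def class_areas(raster: np.ndarray, pixel_width: float,
                pixel_height: float) -> dict[int, float]:
    """Total area covered by each class code, in squared map units."""
    # abs() because pixel height is often stored as a negative number
    pixel_area = abs(pixel_width * pixel_height)
    codes, counts = np.unique(raster, return_counts=True)
    return {int(code): int(count) * pixel_area
            for code, count in zip(codes, counts)}


areas = class_areas(landcover, pixel_width=10.0, pixel_height=-10.0)
total = sum(areas.values())
for code, area in areas.items():
    print(f"{CLASS_NAMES[code]}: {area:.0f} m2 ({area / total:.0%} of the area)")
```

Because the result depends directly on the pixel area, the same class counts give very different areas at different spatial resolutions.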