From the course: Geospatial Raster Data Analytics in Python

Creating dummy raster data

- In the second chapter, we are going to get hands-on and practical with raster data in Python. First, we are going to learn how to create synthetic raster data from scratch, and then we'll learn how to explore, analyze, dissect, and finally export raster data. This way, we will gain hands-on experience in dealing with raster data, which we will later apply to real-life data sets. In this video, we will learn how to create simple raster data by hand. While such a data set may not be useful for real-life analytical projects, it'll help you a lot to deeply understand the structure and origin of raster data.

First, we need to import the necessary libraries for the computations we are going to do, including creating the arrays that store our raster data values. Now we will learn how to create a single-band random data set, which will consist of a grid where each grid cell contains one single value. A real-life example of this kind of scenario could be, for instance, grid-level population data, measuring the total number of inhabitants in each grid cell.

To create such synthetic data, we start by creating a table or an array of random numbers using NumPy. This data table will provide the values for our first single-band raster file. For the sake of this example, let's create a grid of 100 by 100 pixels, where the row and column numbers are stored in the following variables. Then let's generate the array of random numbers using the built-in random module of the NumPy library. Additionally, to match the format of typical raster image files representing colors, we scale the random numbers, originally in the zero to one range, to the zero to 255 range. This is because color values in most image formats are commonly represented as integers between zero and 255. Here you can see the random values between zero and one, and now between zero and 255. Let's call this data table single data.
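The steps described so far can be sketched roughly as follows. This is a minimal sketch, not the exact code from the video: the variable names `n_rows`, `n_cols`, `unit_values`, and `single_data` are assumptions, and `np.random.rand` is one of several NumPy functions that produce uniform random values in the zero to one range.

```python
import numpy as np

# Grid size; these variable names are assumptions, the lecture does not spell them out.
n_rows = 100
n_cols = 100

# Uniform random floats in the [0, 1) range, one value per grid cell.
unit_values = np.random.rand(n_rows, n_cols)

# Scale the values to the 0 to 255 range used by typical image formats.
single_data = unit_values * 255

# Compare the value ranges before and after scaling.
print(unit_values.min(), unit_values.max())
print(single_data.min(), single_data.max())
```

At this point the values are still floating-point numbers; the conversion to integers is a separate step.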
Finally, we convert these scaled numbers to unsigned eight-bit integers, a standard data type representing pixel values in raster images. This ensures that each pixel's value is stored efficiently as a single byte. We can confirm that this data table indeed has 100 rows and 100 columns, as we planned, by querying the length of the array, and also the length of one element, an arbitrary row of that array. Finally, we can access a single element of the grid, the value of a given pixel, by using the row and column indices, for instance, the (0, 0) element or the (10, 10) element.

In the second part of this lecture, we create an example of a multi-band raster data set. To be more specific, we are going to create a triple-band random array of data, which will correspond to regular RGB images. Such data types are commonly used in practical applications as well. For example, images taken by regular cameras store data in RGB format, where each band or channel corresponds to one of the primary colors: red, green, or blue. Each channel holds the intensity values for that specific color, and together they form the complete image.

Now we are going to follow the previous logic and create the three bands, the R, the G, and the B band, separately, using the previously introduced random number generation method. So first I just copy this line of code and start renaming the variables. I will call this red band, and we use the same random generation method to create the green and the blue band as well. At this point, each band is stored in a separate grid or a separate array. Now we have to stack them together to arrive at a single combined data structure, using the following command. Now let's look at the data, which will immediately show us that here, each element of the grid, each pixel, contains a triplet of integer numbers. Each corresponds to the RGB values of that given pixel. As you can see, three values here, three values here, and three values here as well.
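As a rough illustration of the band creation and stacking described above, here is a minimal sketch. The helper `random_band` and the names `red_band`, `green_band`, `blue_band`, and `rgb_data` are hypothetical, and `np.dstack` is assumed as the stacking call because it yields one triplet per pixel; the video may use a different but equivalent function.

```python
import numpy as np

n_rows, n_cols = 100, 100

def random_band(rows, cols):
    """Hypothetical helper: one band of random values stored as unsigned 8-bit integers."""
    scaled = np.random.rand(rows, cols) * 255  # floats in the 0 to 255 range
    return scaled.astype(np.uint8)             # one byte per pixel

# Create the three bands separately, following the single-band logic.
red_band = random_band(n_rows, n_cols)
green_band = random_band(n_rows, n_cols)
blue_band = random_band(n_rows, n_cols)

# Stack along a third axis so each pixel holds an (R, G, B) triplet.
# np.dstack is an assumption here, chosen for its (rows, cols, bands) layout.
rgb_data = np.dstack((red_band, green_band, blue_band))

print(rgb_data.shape)  # (100, 100, 3)
print(rgb_data.dtype)  # uint8
```

Note that this layout puts the band index last, which matches how ordinary image arrays are usually arranged.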
Using the previous method, we can double-check the number of elements in the array, which should be 100 in each direction, and we can also easily access a single element of this grid, again using the previous indexing logic. And here we can see that a single element, the value of a given pixel, is indeed a triplet. And having said that, we have arrived at the end of this lecture, where we learned how to use one of the most standard Python libraries, NumPy, to generate synthetic grid data. This prepares us for the upcoming lecture, where we will learn how to turn this into a geospatial grid or raster data.
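The size check and the pixel lookup mentioned in this lecture could look something like the sketch below. The array `rgb_data` here is a stand-in generated directly with `np.random.randint`, an assumption made only to keep the example self-contained, and the variable names are illustrative.

```python
import numpy as np

# Stand-in for the stacked RGB array; generated directly here (an assumption)
# so that the example runs on its own.
rgb_data = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)

# Number of rows, then number of columns in the first row.
print(len(rgb_data))     # 100
print(len(rgb_data[0]))  # 100

# The shape attribute reports rows, columns, and bands in one go.
print(rgb_data.shape)    # (100, 100, 3)

# Access a single pixel by its row and column indices; the result is a triplet.
pixel = rgb_data[10, 10]
red, green, blue = pixel
print(pixel)
```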
