From the course: Learning Data Analytics: 1 Foundations
Modeling data in Power Query
You don't always have back-end access to a database and querying tools to use to model data. So what do you do when you don't? You can use Power Query. It's built right inside of Excel. We can bring our data in and through the use of merge queries, model it the way we need it. I want to take a look at our wage data and some reusable data from postal code information that we got from geonames.org. So the very first thing we'll do is go to Power Query through the Data tab. I'll go to Data, Get Data, From File, and From Workbook. I'll navigate to my exercise files. And the very first thing I want to do is pull in my postal code data. And I'll choose Postal Code. Immediately, I noticed something a little bit off with my postal codes. When postal codes are not three digits long. Let's go ahead and choose transform data and see if we can discover the problem. I see the header is called text postal code. But what it actually did is it changed the type. Power Query attempts to understand the data types and will automatically create them for you. I'm gonna actually delete this step. And when I do that, I see my text postal code and all my leading zeros come back. I want to go ahead and change this to a text data type. That way it keeps those leading zeros. I don't have to worry about it again. I actually don't need the zip code anymore. I can right-click and remove it. This data only represents the United States. Still a lot of data. Let's go ahead and do close and load. And this will bring this postal code data directly to the spreadsheet. Notice that it maintained by leading zeros, which Excel normally drops. All right, let's go grab our next data set. I'll go to data. I'll choose get data. I'll go from file, from workbook, and I'll choose my wage survey data, and I'll choose import. I'll choose survey, and I'll go ahead and transform the data. I have 343 responses from my survey. One of the very first things I want to check is the postal code. I'm going to scroll over. First thing I'm going to do is right click and move it all the way to the beginning. Just more convenient when I get ready to merge it. I also noticed that it's a one, two, three or a number data type. And really to merge or to join, they need to be the same data type. So I'll go ahead and change this to a text. Okay, great, let me expand my queries there. So I have my postal code data and I have my survey data. Okay, now I have my two data sets ready to merge. On the Home tab of my Ribbon in Power Query, I'll go to Merge Queries. If I choose Merge Queries here, then it will simply merge these two data sets into one. I like to keep mine separate. So I'll hit the drop-down and tell it to merge the queries as new. Now it's asking me to select the tables. I'll go ahead and choose my postal code below. And just like a query where we have to join the two fields, here we select the fields that should match. Now I have my survey data at the top, and I have my postal code data at the bottom. I have both the postal codes selected. and I see my join type, left outer, all from the first and matching from the second. So it's going to show me all of my survey data and where there's a matching postal code. It tells me that it's matching 337 of my 343 rows. That means either I have a miskeyed postal code or there's a postal code missing. I'll go ahead and click okay. And now I have my new merged query. Okay, what I'll do now is right-click that Merge1 and choose Rename. And I'll call this Survey Postal Codes. Okay, I wanna create another query because I wanna analyze just the survey results of the people in the state of Alabama and the 67 counties. So I'll go ahead and choose Merge Queries. I'll do Merge Queries as New, and I'll perform the same join. I'll choose survey, I'll choose postal code. Power Query doesn't offer left and right on the screen. It offers top and bottom, but the outer still works the same. So I'm less concerned about left or right, more concerned about outer and using those screen prompts. In Power Query, top is the left and bottom is the right. All right, so I'll choose a left outer. I'll choose my postal codes, and then I'll click OK. Let's go ahead and call this one Alabama Survey Results. OK, so I'm going to go ahead and close my query settings, and I'm going to scroll over. So here I see my postal codes table. In a query, we drag and drop or double-click fields. In a merge query, we just expand. And here we can tell it what we want to bring over. I want the city, the state, and the county. I can click OK. Now, I'm going to report on this in Maps, so I might need that latitude and longitude. Let me go back to my query settings, which I closed earlier, and hit that gear shape. Let's go ahead and add the latitude and longitude. And then I'll click okay. I'll go ahead and go back to survey postal codes. Scroll over here to the merge. And here I want everything but that text postal code. Don't need time zone for my reporting either. I'll go ahead and click okay. Notice that it appended the postal code name. If I don't really want the table name in there, I can also address that. You see this default column name prefix? I can delete that. And I can click OK. And then I just get those primary city and states. Let me do the same thing for my Alabama survey results. Again, that's just dependent upon what you want in your reporting. Perfect. Okay, so I have one final step, and that's to actually filter for the state of Alabama. So let me scroll back over. I'll hit my dropdown by my state. I'll choose Alabama and click OK. All right, and then I'm ready to load all of this to the spreadsheet. Okay, I'm on my Alabama survey results. I'll go to sheet four and notice that it tells me I'm in my survey postal codes. I'll go back to sheet five so that on-screen query and connections can help you. So I'll go ahead and name this Alabama only. I'll go to sheet four, name this all survey with postal, original data, and my postal codes data. Rephrase, I'll go ahead and delete sheet one. I don't need it. Okay. If I click back on my Alabama survey results, notice that it takes me directly to the Alabama sheet. Okay. I'll go ahead and save my work. Modeling data has become so much easier over the years. We model data to get it ready for further cleaning, for better reporting, but just remember a little bit of knowledge will take you a long way.
Contents
-
-
-
-
-
-
-
-
Understanding ETL in data1m 56s
-
Cleaning data using Excel macros6m 15s
-
Cleaning data with Power Query6m 30s
-
Working with reusable data4m 47s
-
Modeling data with queries7m 32s
-
Modeling data in Power Query8m 34s
-
Challenge: Rename headers in Power Query47s
-
Solution: Rename headers in Power Query4m 11s
-
-
-