From the course: Learning Data Analytics: 1 Foundations

Cleaning data using Excel macros

From the course: Learning Data Analytics: 1 Foundations

Cleaning data using Excel macros

You may find in an organization they're using older methods of cleaning data. They may be using Excel macros. It could be that that file just hasn't been touched in a while, or they haven't adopted tools like Power Query. It's important to know when you inherit a file, you inherit the knowledge and the style of the person who is responsible for that file. However, macros are a great way to determine what needs to be cleaned, and also a great opportunity for process improvement. Okay, let me show you how to record a few macros and show you how to look inside a macro. Okay, I'm in my spending trend survey. And every time I download this data, I see the same type of information. It's structured exactly the same. The only difference is there's new data each time. Okay, so I have my developer tab turned on in my ribbon. You can go to customize your ribbon to turn it on if it's not there. I'll choose record macro. This will be step one. This step one will copy the sheet. Okay, I'll go ahead and right click the sheet. I'll choose move or copy. I'll create a copy. I'll move it to the end and I'll click okay. I'll go ahead and click back on sheet and stop recording. Again, I wanna keep a copy of that sheet so I can refer back to it to test my macros later. Now I'm ready to start the cleaning process. The very first thing I'm gonna do is delete the columns I don't need. I'll start by recording a macro. I'll call this one step two. I'll call it delete columns or delcol. You make that plural. Okay, I don't need the collector ID for my reporting. I also don't need the end date. I'll go ahead and highlight IP address all the way to that custom data in G. I'll right click and delete that. Okay, great. This has me set up for exactly what I need. I'll go ahead and stop recording. I'll record my next step. Call this one step three. Size, data. Okay, I'll go ahead and hit that select all and double-click my column headings, and that will size my data. I'll click back on A1, I'll stop recording. Okay, again, every time I download this data, it has row two. Row two is additional information about the headers. So I need to go ahead and address that in my cleaning. Okay, let me go ahead and do record macro. I'll call this one step four, clean headers. I'll click OK. All right, I'll go ahead and copy this self-describe and paste it there into E1. I'll go ahead and delete row 2 because I no longer need it. I'll go ahead and address these headers. I'll name them age, gender, education, perfect. All right, I'll go back over to A1 and then I'll stop recording. Now I'm ready to test my macro, but at the same time, I actually want to record myself running my steps. So I'll go ahead and delete my sheet. I'll go ahead and name this back to sheet because that's what it'll be looking for is a sheet name named sheet. I'll go record my macro. This is called Run All. I'll click OK. I'll go to Macros. I'll run step one. You see, I'll have sheet two down below. I'll go to Macros again. I'll run step two. Do you see how it deleted my columns? I'll go to Macros. I'll do step three. It's sizing my data. Perfect. I'll go to macros one last time and clean my headers. I'll choose run. Awesome. Then I can stop recording. Now going forward, if I want to run that macro, I can just run the run all. Let me show you that. Okay. I'll change the sheet name back to sheet again, because when I download it from my survey tool, that's what it's called. I'll go to macros. Run all is selected, and I'll choose Run. And just like that, I have clean data. So you can see why companies would adopt macros. However, with the advent of Power Query, these options are made much simpler. Let me show you how you can look at the inside of a macro so you can determine what cleaning steps have occurred. So I'll go to Visual Basic. I'll expand Modules. and I'll see module one. When I double-click module one, it shows me which all columns, sheets, information I changed. These are a whole lot easier to read when you actually perform those steps, but you will notice it tells you what's selected, if it were copied, and was it pasted? And where did it go? You can see where we changed the name of things. Again, this is great information to have because you can apply it later, or if you need to tweak the macro, you can simply include it in the code. After 20 years of using Excel and with data, I know you're going to encounter files with macros. Just knowing how to look inside the code will let you know what cleaning routines were performed, how to extend them, or how to replace them.

Contents