From the course: Complete Guide to Databricks for Data Engineering

Use the select function in PySpark

- [Instructor] Now it's time to select some specific columns, because so far we have been displaying all the columns every time. Just as in SQL, if you want to select only specific columns from your data frame, you can do that. How? Let us see. For example, we have a data frame, and out of that we want to select only a few columns. So I can use the select function and pass the names of all the columns that I want to view. For example: age, gender, and, let's say, customer type. This again gives me another data frame. I can store it in a different variable, data frame one, and now I can say "display data frame one". If you look at the output, you will find that it shows only three columns, not all the columns. Sometimes I also want to create a derived column. A derived column means you take an existing column, do some manipulation on it, and view the resulting value. How can you do that? Let us see. For example, I…
