From the course: Complete Guide to Databricks for Data Engineering


Handle nulls in PySpark


- [Instructor] Handling null values is something every data engineer must know, because whatever files you read and whatever operations you perform, there is a very high chance that you will encounter null problems. So let's see how we can handle these null values. When you read a data frame and you suspect that null values might exist in any of the columns, and you want to remove those rows, you can use a function called na.drop. This na.drop function removes the records where any value in any column is null. For example, if there are five columns in a data frame and a null value exists in any one of those columns, then that specific row will be eliminated. Before using na.drop, let me show you the total count of rows in our actual data frame with df.count(). We'll print that. Then we execute: data frame one is equal to df.na.drop(), and let's see the total count after that. So we get some…
