Handle nulls in PySpark - Databricks Tutorial
From the course: Complete Guide to Databricks for Data Engineering
- [Instructor] Handling null values is something every data engineer must know, because whatever files you read and whatever operations you perform, there is a very high chance you will run into null values. So let's see how we can handle them. When you read a DataFrame and suspect that null values might exist in any of its columns, and you want to remove them, you can use the function na.drop. This na.drop function removes every record where any column value is null. For example, if a DataFrame has five columns and a null value exists in any one of them, that specific row is eliminated. Before using na.drop, let me show you the total row count of our original DataFrame with df.count. We'll print that. Then we execute df1 = df.na.drop() and check the total count afterwards. So we get some…
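Below is a minimal sketch of the workflow described above, assuming a Spark session is available (as it is in a Databricks notebook); the sample data is illustrative and stands in for whatever file the course actually reads.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative DataFrame containing some null values
data = [
    (1, "Alice", "NY"),
    (2, None, "LA"),
    (3, "Carol", None),
]
df = spark.createDataFrame(data, ["id", "name", "city"])

print(df.count())   # total rows before dropping nulls -> 3

# Drop every row that contains a null in ANY column
df1 = df.na.drop()
print(df1.count())  # rows remaining after the drop -> 1

By default na.drop() uses how="any", so a single null in any column is enough to eliminate the row; passing how="all" instead would drop only rows where every column is null.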
Contents
- Use filter and where transformations in PySpark (8m 30s)
- Add or remove columns in PySpark (8m 56s)
- Use the select function in PySpark (6m 16s)
- Use UNION and DISTINCT in PySpark (5m 31s)
- Handle nulls in PySpark (8m 39s)
- Use sortBy and orderBy in PySpark (9m 38s)
- Use groupBy and aggregation in PySpark (8m 27s)
- Manipulate strings in PySpark (14m 21s)
- Handle date manipulation in PySpark (9m 37s)
- Handle timestamp manipulation in PySpark (4m 29s)