Removing empty columns in PowerBI
When consuming large datasets, the data does not typically only contain a large number of rows, but also a large number of columns. Depending on the imported payload, many of the columns might be empty columns, i.e. columns completely filled with null values.
While this in itself is not a massive issue, it can be a challenge when building a PowerBI report as you need to identify which of the columns are relevant for the visualisations used, and many columns mean a large list to choose from when building visuals.
Take for instance a table, like this where colA is empty:
Typical approach suggested
When people search for a way to resolve this, the often get the following approach:
One of the issues with this approach though is that you lose the original column names.
When for instance using the above table, the output looks something like this where colA is no longer there, colB is now Column1, colC is Column2 and so on.:
Manual approach while retaining the column names
To retain the column names, the process is slightly different.
In order to retain the column names, first we're making the table headers the first row of the table. This can be done via Transform > Use first row as headers > Use headers as first row.
If we transpose the table now via Transform > Transpose, the column headers are retained in the first column.
However, since the columns are shown in the first column, there are no Blank Rows (there is always at least one field in the row populated), so the second step above will no longer work.
Instead, we can do the following:
First we add a column via Add Column > Custom Column and in the formula, we can add the following:
= each if List.NonNullCount(Record.ToList(_))<2 then true else false
Essentially we check if less than 2 fields are populated with a value or not. Instead of less than 2, we could have made it equals 1 but to be safe, I added less than two in case the table had empty records.
A new column is added with logical true/false values for each row as can be seen in the screen shot below.
To remove the empty columns, we can filter the new column by only show FALSE values (thus removing all rows where one column or less are populated).
When done, we can remove the custom column as this would no longer be needed.
Finally, we can transpose the table back and promote the first row back to the headers via Transform > Use first row as headers.
If the preference is to have the option above as a reusable power query function, the following can be added to a blank query, where all individual steps are combined in a single step:
(selectedTable as table) =>
let RemoveColsViaPivot =
Table.PromoteHeaders(
Table.Transpose(Table.RemoveColumns(
Table.SelectRows(
Table.AddColumn(
Table.Transpose(Table.DemoteHeaders(selectedTable)),
"Custom",
each if List.NonNullCount(Record.ToList(_))<2 then true else false
),
each [Custom] = false
),{"Custom"})), [PromoteAllScalars=true])
in RemoveColsViaPivot
Copy the content in a blank query and give the Query a name. In a table, simply call it via:
#"newstepname" = QueryName(#"previousstepname")
Or if you want to use it as a step directly in the table, use the following instead:
#"newstepname" =
Table.PromoteHeaders(
Table.Transpose(Table.RemoveColumns(
Table.SelectRows(
Table.AddColumn(
Table.Transpose(Table.DemoteHeaders(<previousstepname>)),
"Custom",
each if List.NonNullCount(Record.ToList(_))<2 then true else false
),
each [Custom] = false
),{"Custom"})), [PromoteAllScalars=true])
Replace <previousstepname> with the name of the step which contains the table where the columns need to be removed.
Another option in Power Query
If Power Query is used, there is an easier way to get the same result.
When looking at the Table.RemoveColumns function (https://learn.microsoft.com/en-us/powerquery-m/table-removecolumns), you can remove multiple columns at once by identifying them as a list:
Table.RemoveColumns(#"previousstepname",{"colA","colB",...}
So in this alternate option, we can loop through all columns and check if they contain other values than null and if not, adding the column name to a list, otherwise the value null is added.
So if a column only contains null values, the function ascertains the table is empty:
And if the column contains any other value, a non-empty table is returned:
The resulting list with column names can become the input for Table.RemoveColumns after all null values are removed from the resulting list.
The resulting function with the loop looks something like this:
(selectedTable as table) =>
Table.RemoveColumns(
selectedTable,
List.RemoveNulls(
List.Generate(()=>
[i=0],
each [i]<Table.ColumnCount(selectedTable),
each [i=[i]+1],
each let i = [i] in
if Table.RowCount(Table.SelectRows(selectedTable,
each Record.Field(_,Table.ColumnNames(selectedTable){i})<>null)) = 0
then Table.ColumnNames(selectedTable){i}
else null
)
)
)
Or as a step:
#"newstepname" = Table.RemoveColumns(
selectedTable, List.RemoveNulls(
List.Generate(()=>[i=0],
each [i]<Table.ColumnCount(selectedTable),
each [i=[i]+1],
each let i = [i] in
if Table.RowCount(Table.SelectRows(
selectedTable,
each Record.Field(_,Table.ColumnNames(selectedTable){i})<>null)) = 0
then Table.ColumnNames(selectedTable){i}
else null
)
)
)