Description
I'd like to see a function that turns a ragged array into a sparse one. This typically comes up when a "factor" with non-mutually exclusive choices is tentatively recorded using a group of drop-downs.
For example, suppose you have such a "factor" with legal values `A`/`B`/`C`/`D` recorded over three variables `col1`, `col2` and `col3`.
id | col1 | col2 | col3 |
---|---|---|---|
1 | A | B | C |
2 | B | C | NA |
3 | D | NA | NA |
4 | B | D | NA |
Calling such a function, indicating that `col1`, `col2` and `col3` encode the same information, would yield
id | A | B | C | D |
---|---|---|---|---|
1 | T | T | T | F |
2 | F | T | T | F |
3 | F | F | F | T |
4 | F | T | F | T |
Options would include the ability to set a prefix for the new variable names to avoid collisions, and to create the `NA` column.
I have run into this use case many times in medical surveys, where disease history is badly recorded using multiple drop-down lists or sets of checkboxes. IIRC, Google surveys also treat sets of checkboxes this way, with one column containing semicolon-separated values. This could be dealt with using a call to `separate`, then a call to `binarize`.
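
To make the semicolon-separated case concrete, here is a rough sketch of what can be done today. It is not the proposed `binarize` (which does not exist yet); it swaps in `separate_rows()` instead of `separate()` and then spreads the result. The `survey` data frame and its `history` column are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical survey export: one column of semicolon-separated choices
survey <- tibble(
  id      = 1:3,
  history = c("A;B", "C", "B;D")
)

wide <- survey %>%
  # One row per (id, choice) pair
  separate_rows(history, sep = ";") %>%
  # Guard against a choice being listed twice for the same id
  distinct() %>%
  mutate(present = TRUE) %>%
  # One logical column per choice; missing combinations become FALSE
  spread(key = history, value = present, fill = FALSE)
```

A dedicated function would presumably skip the intermediate long format, which is the part that gets expensive on large inputs.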
Playing around a bit with `spread` and `gather` allows this behavior, but this can be CPU/memory heavy on large data frames.
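
For reference, the `gather`/`spread` workaround looks roughly like the sketch below, using the example data from the tables above. The intermediate names `source`, `level` and `present` are arbitrary choices for this illustration, and this is only one way to write it, not the implementation from the PR.

```r
library(dplyr)
library(tidyr)

# Example data from the first table above
df <- tibble(
  id   = 1:4,
  col1 = c("A", "B", "D", "B"),
  col2 = c("B", "C", NA, "D"),
  col3 = c("C", NA, NA, NA)
)

result <- df %>%
  # Stack col1..col3 into a single column, dropping the NA padding
  gather(key = "source", value = "level", col1:col3, na.rm = TRUE) %>%
  # Which drop-down a value came from is irrelevant
  select(-source) %>%
  # Guard against the same value being entered twice for one id
  distinct() %>%
  mutate(present = TRUE) %>%
  # One logical column per level; missing combinations become FALSE
  spread(key = level, value = present, fill = FALSE)
```

Note that with `na.rm = TRUE` an `id` whose columns are all `NA` disappears from the result entirely, which is one reason an option to create the `NA` column would be useful.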
There is a (pre-tidyeval) implementation in PR #288