FR: Add function to tidy non-mutually exclusive factors surveyed over multiple inputs #384

@MaximeWack

I'd like to see a function that turns a ragged array into a sparse one. This typically comes up when a "factor" with non-mutually exclusive choices is recorded through a group of drop-downs.

For example, suppose such a "factor" with legal values A/B/C/D is recorded over three variables, col1, col2 and col3:

id col1 col2 col3
1 A B C
2 B C NA
3 D NA NA
4 B D NA

Calling such a function, and indicating that col1, col2 and col3 encode the same information, would yield:

id A B C D
1 T T T F
2 F T T F
3 F F F T
4 F T F T

Options would include the ability to set a prefix for the new variable names, to avoid collisions, and to create a column for NA.
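To make the requested behavior concrete, here is a minimal sketch of the semantics in plain Python. It is not a tidyr API and not the implementation from the PR: the function name `binarize` is borrowed from the wording of this issue, and the parameters `prefix` and `include_na` are hypothetical names for the two options described above. Rows are represented as dictionaries and NA as `None`.

```python
def binarize(rows, cols, prefix="", include_na=False):
    """Hypothetical helper: one boolean column per distinct value found in `cols`."""
    # Collect the distinct non-missing levels across all grouped columns.
    levels = sorted({row[c] for row in rows for c in cols if row[c] is not None})
    out = []
    for row in rows:
        present = {row[c] for c in cols}
        # Keep every column that is not part of the group (here, the id).
        new_row = {k: v for k, v in row.items() if k not in cols}
        for level in levels:
            new_row[prefix + level] = level in present
        if include_na:
            # Assumption: the NA column is True when at least one slot is empty.
            new_row[prefix + "NA"] = None in present
        out.append(new_row)
    return out


# The example table from this issue, with NA encoded as None.
rows = [
    {"id": 1, "col1": "A", "col2": "B", "col3": "C"},
    {"id": 2, "col1": "B", "col2": "C", "col3": None},
    {"id": 3, "col1": "D", "col2": None, "col3": None},
    {"id": 4, "col1": "B", "col2": "D", "col3": None},
]

wide = binarize(rows, ["col1", "col2", "col3"])
prefixed = binarize(rows, ["col1", "col2", "col3"], prefix="hx_", include_na=True)
```

With the default arguments, `wide` holds one row per id with boolean columns A, B, C and D, matching the table above. How the NA column should be defined (any empty slot versus all slots empty) is an open design choice; the sketch picks the former.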

I have run into this use case many times in medical surveys, where disease history is badly recorded using multiple drop-down lists or sets of checkboxes. IIRC, Google surveys also treat sets of checkboxes in a similar way, with one column containing semicolon-separated values. This can be dealt with using a call to separate followed by a call to binarize.
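The single-column case can be sketched the same way, again in plain Python rather than as a tidyr API. The function name `binarize_delimited`, the column name `history` and the sample rows are all made up for illustration; the data mirrors the A/B/C/D example above, with the choices packed into one semicolon-separated string.

```python
def binarize_delimited(rows, col, sep=";", prefix=""):
    """Hypothetical helper: split a delimited column, then build indicator columns."""

    def choices(value):
        # Treat a missing value (None) as "no choice selected".
        if value is None:
            return set()
        return {part.strip() for part in value.split(sep) if part.strip()}

    # Distinct levels across all rows, in a stable sorted order.
    levels = sorted(set().union(*(choices(row[col]) for row in rows)))
    out = []
    for row in rows:
        selected = choices(row[col])
        new_row = {k: v for k, v in row.items() if k != col}
        for level in levels:
            new_row[prefix + level] = level in selected
        out.append(new_row)
    return out


# Illustrative data only: one column holding semicolon-separated choices.
survey = [
    {"id": 1, "history": "A;B;C"},
    {"id": 2, "history": "B; C"},
    {"id": 3, "history": "D"},
    {"id": 4, "history": None},
]

flags = binarize_delimited(survey, "history")
```

Splitting and binarizing in one pass avoids materializing the intermediate ragged columns, which is roughly what the separate-then-binarize route would create.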

Playing around a bit with spread and gather can produce this behavior, but it can be CPU- and memory-heavy on large data frames.

There is a (pre-tidyeval) implementation in PR #288.

Labels: feature (a feature request or enhancement), pivoting ♻️ (pivot rectangular data to different "shapes")
