Memory issues and (potential) chunk-wise processing of peaks data #304

Using a `MsBackendSql` for a very large MS experiment, we ran into memory issues while processing the data (see issue #303). All operations on peaks data go through the internal `.peaksapply` function, which by default splits the data by `dataStorage(x)` and processes the resulting parts in parallel (even if *processing* only means loading the data from the raw data files). For in-memory backends and, for example, `MsBackendSql`, `dataStorage` has only a single value, so no splitting or parallel processing takes place and the peaks data of the whole experiment is handled in one go.
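To illustrate the splitting behaviour described above, here is a minimal base R sketch. It is not the actual `.peaksapply` code: `apply_by_group`, the toy `peaks` list and the storage labels are hypothetical, and a plain `lapply` stands in for the parallel step.

```r
# Hypothetical stand-in for peaks data: one intensity vector per spectrum.
peaks <- list(c(10, 200, 30), c(5, 50), c(1, 2, 3, 4), c(100, 7))

# Hypothetical helper (an assumption, not the Spectra implementation):
# apply FUN to the peaks of each group defined by `f`, one group at a time.
apply_by_group <- function(peaks, f, FUN) {
    groups <- split(seq_along(peaks), f)
    # A parallel version would replace this outer lapply.
    res <- lapply(groups, function(i) lapply(peaks[i], FUN))
    # Put the results back into the original order of the spectra.
    idx <- unlist(groups, use.names = FALSE)
    out <- vector("list", length(peaks))
    out[idx] <- unlist(res, recursive = FALSE, use.names = FALSE)
    out
}

# File-based case: two distinct storage values, hence two groups.
file_storage <- c("a.mzML", "b.mzML", "a.mzML", "b.mzML")
max_by_file <- apply_by_group(peaks, file_storage, max)

# In-memory or SQL-like case: a single storage value, hence a single
# group that contains every spectrum (no splitting at all).
single_storage <- rep("<db>", length(peaks))
n_groups_single <- length(split(seq_along(peaks), single_storage))
```

With a single storage value the whole data set ends up in one group, which matches the situation that triggered the memory issue.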

Advantages of splitting and parallel processing

- performance: for computationally intense processing steps.
- lower memory demand: only the data currently being processed would be loaded into memory.

These advantages matter most for very large data sets. The downside of splitting and parallel processing is the overhead of splitting and combining the data. Thus, in many cases (e.g. for in-memory backends) it might actually be better **not** to split.

Maybe it would make sense to add a parameter that lets the user enable parallel/split processing even if backends other than `MsBackendMzR` are used.
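A rough sketch of what such a user-controlled option could look like, in base R. Everything here is an assumption made for illustration: `process_chunkwise`, its `chunk_size` parameter and the `load_fun` callback are hypothetical names, not existing Spectra API. The default `chunk_size = Inf` keeps today's behaviour of processing everything at once.

```r
# Hypothetical sketch: process `n` spectra in chunks of `chunk_size`.
# `load_fun(i)` is assumed to return the peaks for the spectra with
# indices `i` (e.g. by querying a database); FUN is applied per spectrum.
process_chunkwise <- function(n, load_fun, FUN, chunk_size = Inf) {
    idx <- seq_len(n)
    # chunk_size = Inf puts all indices into one chunk (no splitting).
    chunks <- split(idx, ceiling(idx / chunk_size))
    res <- lapply(chunks, function(i) {
        pks <- load_fun(i)   # only this chunk's peaks are loaded
        lapply(pks, FUN)     # results are typically much smaller
    })
    unlist(res, recursive = FALSE, use.names = FALSE)
}

# Toy "backend": peaks are generated on request and the size of each
# load is recorded to show how much data is held at once.
loaded_sizes <- integer(0)
toy_load <- function(i) {
    loaded_sizes <<- c(loaded_sizes, length(i))
    lapply(i, function(k) seq_len(k))
}

chunked <- process_chunkwise(5, toy_load, sum, chunk_size = 2)
chunk_loads <- loaded_sizes

loaded_sizes <- integer(0)
unchunked <- process_chunkwise(5, toy_load, sum)
single_load <- loaded_sizes
```

In this toy example the chunked call loads at most two spectra at a time and the default call loads all five at once, while both return the same result. The price is the splitting and combining overhead mentioned above, which is why an opt-in parameter seems preferable to always splitting.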
