Memory issues and (potential) chunk-wise processing of peaks data #304

@jorainer

Using an MsBackendSql for a very large MS experiment, we ran into memory issues when processing the data (see issue #303). All operations that work on peaks data use the internal .peaksapply function, which by default splits the processing based on dataStorage(x) and performs the operation in parallel (even if processing only means loading the data from the raw data files). For in-memory backends and for backends such as MsBackendSql, dataStorage has only a single value, so no splitting or parallel processing is performed.

Advantages of splitting and parallel processing

  • performance: for computationally intense processing steps.
  • lower memory demand: only the data currently being processed would be loaded into memory.

These advantages matter most for very large data sets. The downside of splitting and parallel processing is the overhead of splitting the data and combining the results. Thus, in many cases (e.g. for in-memory backends) it might actually be better not to split at all.

Maybe it would make sense to add a parameter that lets the user enable parallel/split processing, even if backends other than MsBackendMzR are used.
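
To make the idea more concrete, here is a minimal base R sketch of what user-controlled chunk-wise processing could look like. It is not Spectra code: chunk_factor, chunk_apply and the chunk_size parameter are hypothetical names used for illustration only, and a plain list stands in for the per-spectrum peaks data. With the default chunk_size = Inf nothing is split, so backends that do not benefit from chunking pay no splitting overhead.

```r
# Hypothetical helper (not part of Spectra): build a factor that assigns
# consecutive elements to chunks of at most `chunk_size` elements.
chunk_factor <- function(n, chunk_size = Inf) {
    if (is.infinite(chunk_size) || chunk_size >= n)
        return(factor(rep(1L, n)))      # one chunk only: no splitting
    factor(ceiling(seq_len(n) / chunk_size))
}

# Hypothetical helper: apply FUN to one chunk at a time and combine the
# per-chunk results in the original order. FUN receives a list (the chunk)
# and is assumed to return a list of the same length.
chunk_apply <- function(x, FUN, chunk_size = Inf) {
    f <- chunk_factor(length(x), chunk_size)
    res <- lapply(split(x, f), FUN)     # only one chunk is processed at a time
    unlist(res, recursive = FALSE, use.names = FALSE)
}

# Toy data standing in for peaks data: one numeric vector per spectrum.
peaks <- list(c(1, 2), 3, c(4, 5, 6), c(7, 8), 9)
sum_intensities <- function(chunk) lapply(chunk, sum)

chunked <- chunk_apply(peaks, sum_intensities, chunk_size = 2)
unchunked <- chunk_apply(peaks, sum_intensities)
```

In this sketch both calls give the same result; only the number of elements handled per call to FUN differs. The lapply call could be swapped for a parallel equivalent where the processing step is expensive enough to justify it.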