Description
Using a `MsBackendSql` for a very large MS experiment, we ran into memory issues (see issue #303) when processing the data. All operations on peaks data use the internal `.peaksapply` function, which by default splits the processing based on `dataStorage(x)` and performs the operation in parallel (even if processing only means loading the data from the raw data files). For in-memory backends and, for example, `MsBackendSql`, `dataStorage` has only a single value, so no splitting or parallel processing is performed.
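The effect of splitting by storage location can be illustrated with a small base R sketch. It is not the `.peaksapply` implementation: `apply_by_storage` is a hypothetical helper, the numeric vector stands in for peaks data, and the file names and the `"<memory>"` label are made-up example values.

```r
# Toy stand-in for split-based processing; NOT the actual .peaksapply code.
# `values` plays the role of the peaks data, `storage` the role of
# dataStorage(x). Both the helper and its arguments are hypothetical.
apply_by_storage <- function(values, storage, FUN) {
    f <- factor(storage, levels = unique(storage))
    chunks <- split(values, f)      # one chunk per distinct storage value
    res <- lapply(chunks, FUN)      # a parallel lapply could be swapped in here
    list(n_chunks = length(chunks), result = unsplit(res, f))
}

x <- c(1, 2, 3, 4, 5, 6)
double_it <- function(z) z * 2

# File-based backend: one storage value per raw file, so three chunks.
by_file <- apply_by_storage(
    x, c("a.mzML", "a.mzML", "b.mzML", "b.mzML", "c.mzML", "c.mzML"), double_it)

# In-memory or SQL backend: a single storage value, so a single chunk.
single <- apply_by_storage(x, rep("<memory>", 6), double_it)
```

Both calls return the same result; only the number of chunks differs. With a single storage value everything is processed in one piece, which is why neither the memory nor the parallelization benefit applies.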
Advantages of splitting and parallel processing
- performance: for computationally intense processing steps.
- lower memory demand: only the data currently being processed would be loaded into memory.
These advantages matter mostly for very large data sets. The downside of splitting and parallel processing is the overhead of splitting and recombining the data, so in many cases (e.g. for in-memory backends) it may be better not to do it.
It might make sense to add a parameter that lets the user enable split/parallel processing even when a backend other than `MsBackendMzR` is used.
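One possible shape for such a parameter, sketched in base R: the function `process_chunked` and its `chunk_size` argument are hypothetical names used only for illustration and are not part of any existing API. The default keeps the current behaviour (no splitting), and the user opts in by setting a finite chunk size.

```r
# Hypothetical sketch of user-controlled chunked processing.
# chunk_size = Inf (the default) processes everything in one call, which
# avoids the split/combine overhead for small or in-memory data.
process_chunked <- function(values, FUN, chunk_size = Inf) {
    n <- length(values)
    if (is.infinite(chunk_size) || chunk_size >= n)
        return(FUN(values))
    # Assign consecutive elements to chunks of at most chunk_size elements.
    f <- factor(ceiling(seq_len(n) / chunk_size))
    # Each chunk is processed separately; only one chunk would need to be
    # loaded at a time. A parallel lapply could be used here instead.
    unsplit(lapply(split(values, f), FUN), f)
}

double_it <- function(z) z * 2L
all_at_once <- process_chunked(1:10, double_it)
in_chunks <- process_chunked(1:10, double_it, chunk_size = 3)
```

Both calls give the same result; the second one processes the data in four chunks of at most three elements each. Decoupling the chunking from `dataStorage` in this way would make it available for any backend.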