Hello,
I am running Bambu on 9 cDNA samples sequenced on the PromethION. Each ".bam" file is ~30GB in size after read filtering, so Bambu is processing ~270GB of data in total. I noticed that the memory requirements are getting quite high: the job fails with 500GB of RAM but completes if I increase the RAM to 1000GB. Here is the command I am running:
se_novel <- bambu(reads = bam,
                  annotations = bambuAnnotations,
                  rcOutDir = "./bambu_processed_files/",
                  genome = fa_file,
                  lowMemory = TRUE,
                  ncore = 8,
                  opt.discovery = list(min.sampleNumber = 5, min.readCount = 5))
The bam variable is a character vector with the paths to the 9 ".bam" files.
I was wondering whether I am using the rcOutDir option correctly?
It is my understanding that this option is supposed to help with multi-sample runs, but I am not sure whether there is an intermediate step that I am missing. I ask because in the long run I intend to use Bambu to process several dozen, if not hundreds, of cDNA samples generated on the PromethION, and with RAM requirements growing like this, memory will probably become the limiting factor for larger sample numbers.
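One idea I had, based on my reading of the vignette, is to split the run in two passes: first generate the read class files one sample at a time (so only one BAM is processed per call), then rerun on the full vector with the same rcOutDir so Bambu picks up the cached read classes instead of re-reading the BAMs. A sketch of what I mean (the sample file names are placeholders, and I am not certain quant = FALSE is the right way to do a cache-only first pass -- please correct me if this is not how rcOutDir is meant to be used):

```r
library(bambu)

# placeholder paths for illustration; in practice `bam` holds my 9 BAM paths
bam <- c("sample1.bam", "sample2.bam", "sample3.bam")

# Pass 1: process one sample at a time so only one ~30GB BAM is in
# memory per call; rcOutDir caches the read class files on disk.
for (b in bam) {
  bambu(reads = b,
        annotations = bambuAnnotations,
        genome = fa_file,
        rcOutDir = "./bambu_processed_files/",
        quant = FALSE,       # skip quantification on this pass (my assumption)
        lowMemory = TRUE)
}

# Pass 2: multi-sample discovery + quantification; with the same rcOutDir,
# Bambu should reuse the cached read classes rather than re-reading the BAMs.
se_novel <- bambu(reads = bam,
                  annotations = bambuAnnotations,
                  genome = fa_file,
                  rcOutDir = "./bambu_processed_files/",
                  lowMemory = TRUE,
                  ncore = 8,
                  opt.discovery = list(min.sampleNumber = 5, min.readCount = 5))
```

Is this the intended workflow, or does the single combined call already do the equivalent internally?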
Any help with this and/or tips on how to avoid out of memory errors with a large number of large samples will be much appreciated!