Hello,
I am running Bambu on 9 cDNA samples sequenced on the PromethION. Each ".bam" file is ~30GB in size after read filtering, so Bambu is processing ~270GB of data in total. I noticed that the memory requirements are getting quite high: the job fails with 500GB of RAM but completes if I increase the RAM to 1000GB. Here is the command I am running:
se_novel <- bambu(reads = bam,
                  annotations = bambuAnnotations,
                  rcOutDir = "./bambu_processed_files/",
                  genome = fa_file,
                  lowMemory = TRUE,
                  ncore = 8,
                  opt.discovery = list(min.sampleNumber = 5, min.readCount = 5))
The bam variable is a character vector with the paths to the 9 ".bam" files.
I was wondering whether I am using the rcOutDir option correctly?
It is my understanding that this option is supposed to help with multi-sample runs, but I am not sure whether there is an intermediate step that I am missing. I ask because in the long run I intend to use Bambu to process several dozen, if not hundreds, of cDNA samples generated on the PromethION, and with RAM requirements growing like this, memory will probably become the limiting factor for larger sample numbers.
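One idea I had, based on my reading of the vignette, is to split the run in two passes: first generate the read class files one sample at a time (so only one BAM is processed per call), then rerun on the full vector with the same rcOutDir so Bambu picks up the cached read classes instead of re-reading the BAMs. A sketch of what I mean (the sample file names are placeholders, and I am not certain quant = FALSE is the right way to do a cache-only first pass -- please correct me if this is not how rcOutDir is meant to be used):

```r
library(bambu)

# placeholder paths for illustration; in practice `bam` holds my 9 BAM paths
bam <- c("sample1.bam", "sample2.bam", "sample3.bam")

# Pass 1: process one sample at a time so only one ~30GB BAM is in
# memory per call; rcOutDir caches the read class files on disk.
for (b in bam) {
  bambu(reads = b,
        annotations = bambuAnnotations,
        genome = fa_file,
        rcOutDir = "./bambu_processed_files/",
        quant = FALSE,       # skip quantification on this pass (my assumption)
        lowMemory = TRUE)
}

# Pass 2: multi-sample discovery + quantification; with the same rcOutDir,
# Bambu should reuse the cached read classes rather than re-reading the BAMs.
se_novel <- bambu(reads = bam,
                  annotations = bambuAnnotations,
                  genome = fa_file,
                  rcOutDir = "./bambu_processed_files/",
                  lowMemory = TRUE,
                  ncore = 8,
                  opt.discovery = list(min.sampleNumber = 5, min.readCount = 5))
```

Is this the intended workflow, or does the single combined call already do the equivalent internally?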
Any help with this and/or tips on how to avoid out of memory errors with a large number of large samples will be much appreciated!