Skip to content

Single cell wrapper and removal of barcodesToFilter, sampleName, cleanReads#561

Merged
ch99l merged 11 commits intodevel_pre_v4from
sc_spatial_wrapper
Apr 20, 2026
Merged

Single cell wrapper and removal of barcodesToFilter, sampleName, cleanReads#561
ch99l merged 11 commits intodevel_pre_v4from
sc_spatial_wrapper

Conversation

@ch99l
Copy link
Copy Markdown
Collaborator

@ch99l ch99l commented Apr 16, 2026

Related Issue

Fixes # 570

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (please specify if the change breaks existing functionality)
    • Non breaking change (the feature doesn't change existing functionality)
    • Breaking change (the feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance optimization

Description

Summary of Changes

  1. Remove deprecated parameters – barcodesToFilter, sampleName, cleanReads
  2. Renamed demultiplexed -> extractBarcodeUMI
  3. Created a bambu.singlecell() function wrapper to run bambu in single-cell mode
    a. Group related single-cell parameters (dedupUMI, clusters, extractBarcodeUMI) into a opt.singlecell list argument

Implementation Details

NAMESPACE

  • Added export(bambu.singlecell)

R/bambu.R

bambu() signature: Removed demultiplexed, sampleNames, cleanReads, barcodesToFilter; added extractBarcodeUMI = FALSE.

mode = "multiplexed": Now sets extractBarcodeUMI <- TRUE (previously set demultiplexed <- TRUE + cleanReads <- TRUE).

checkInputs() call: sampleNames argument removed.

setBiocParallelParameters() and bambu.processReads() calls: Receive extractBarcodeUMI instead of demultiplexed; sampleNames, cleanReads,
barcodesToFilter removed from the latter.

assignReadClasstoTranscripts() call (quant loop): demultiplexed → extractBarcodeUMI.

New function bambu.singlecell(): Added as a wrapper around bambu() that unpacks opt.singlecell (extracting extractBarcodeUMI, dedupUMI,
clusters) and passes all other args through.


R/bambu-assignDist.R

demultiplexed → extractBarcodeUMI in the assignReadClasstoTranscripts() signature and its generateColData() call.


R/bambu-extendAnnotations-utilityExtend.R

No functional changes — only trailing whitespace removed.


R/bambu-processReads.R

bambu.processReads(): Removed demultiplexed, cleanReads, sampleNames, barcodesToFilter; added extractBarcodeUMI = FALSE. Removed
sampleNames-based renaming of reads. Non-barcode CB assignment changed from sampleNames[i] to names(reads)[i]. All !isFALSE(demultiplexed)
checks replaced with extractBarcodeUMI.

bambu.processReadsByFile(): Same parameter simplification; barcodesToFilter filtering step removed; demultiplexed → extractBarcodeUMI
throughout.

bambu.readsByFile(): Same parameter simplification; barcodesToFilter filtering step removed; demultiplexed → extractBarcodeUMI throughout.


R/bambu_utilityFunctions.R

setBiocParallelParameters(): demultiplexed → extractBarcodeUMI; condition simplified from isFALSE(demultiplexed) to !extractBarcodeUMI.

checkInputs(): sampleNames parameter and its length-mismatch validation removed.

generateColData(): demultiplexed → extractBarcodeUMI throughout (signature, doc comment, and all conditionals).


R/prepareDataFromBam.R

cleanReads parameter removed entirely. File-path-based barcode mapping (where demultiplexed could be a CSV path) removed. The demultiplexed
= TRUE vs path branching collapsed — extractBarcodeUMI = TRUE is now the only mode, always parsing CB/UB tags or CB_UMI#READNAME read names.
Entire cleanReads logic removed (soft/hard clip calculation and primary alignment selection by minimum 5' clip). cells and umi variables
removed. demultiplexed → extractBarcodeUMI in remaining checks.


R/readWrite.R

Comment-only update: demultiplexed → extractBarcodeUMI in an inline comment.

Impact of Changes

The implemented changes do not affect the results (only deprecated arguments are removed)

Checklist

  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • [] I have made corresponding changes to the documentation (vignettes, man pages).
    Note: Documentation to be updated later
  • I have tested the code on a full dataset, and any differences have been described in the Impact of Changes section.

@ch99l
Copy link
Copy Markdown
Collaborator Author

ch99l commented Apr 16, 2026

[UPDATED]

Transcript Comparison Report (comparing SummarisedExperimentObject from devel_pre_v4 and sc_spatial_wrapper)

Uploading transcript_comparison_report (2).pdf…

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors bambu’s single-cell/multiplexed interface by removing deprecated parameters, renaming the barcode/UMI extraction flag, and adding a single-cell wrapper intended to group single-cell options.

Changes:

  • Removed deprecated arguments (barcodesToFilter, sampleNames, cleanReads) and renamed demultiplexedextractBarcodeUMI across the pipeline.
  • Simplified barcode/UMI extraction in BAM preprocessing to only support CB/UB tags or CB_UMI#READNAME parsing.
  • Added bambu.singlecell() wrapper and exported it via NAMESPACE.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
R/bambu.R Updates bambu() signature/logic for extractBarcodeUMI, simplifies clusters handling, and adds exported bambu.singlecell() wrapper.
R/bambu-processReads.R Propagates renamed/simplified single-cell parameters through read processing and removes deprecated inputs.
R/prepareDataFromBam.R Removes barcode-map-file and cleanReads logic; retains CB/UB/read-name parsing under extractBarcodeUMI.
R/bambu_utilityFunctions.R Updates parallel parameter selection and colData generation to use extractBarcodeUMI; removes sampleNames input checks.
R/bambu-assignDist.R Renames function parameter and call site to extractBarcodeUMI.
R/bambu-extendAnnotations-utilityExtend.R Removes CSV fallback for clusters and updates barcode matching logic.
R/readWrite.R Comment update from demultiplexed to extractBarcodeUMI.
NAMESPACE Exports the new bambu.singlecell() entrypoint.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread R/bambu-extendAnnotations-utilityExtend.R Outdated
Comment thread R/prepareDataFromBam.R Outdated
Comment thread R/readWrite.R Outdated
Comment thread R/bambu.R
Comment thread R/bambu.R Outdated
Comment thread R/bambu.R Outdated
Comment thread R/bambu_utilityFunctions.R
Comment thread R/bambu-processReads.R
@jonathangoeke
Copy link
Copy Markdown
Member

Code review

Found 1 issue:

  1. Cluster barcode matching regression in isore.extendAnnotations.clusters(): the old code used gsub('demultiplexed','', metadata(readClassList[[1]])$samples) to strip filename prefixes before matching user-provided cluster barcodes. This PR replaced it with a direct match(clusters[[i]], metadata(readClassList[[1]])$samples) without any prefix stripping. However, barcodes are still prefixed with the BAM filename at bambu-processReads.R:71 via paste0(names(reads)[i], '_', CB). Since user-provided cluster barcode vectors won't include these prefixes, match() will return all NAs, and every cluster will be silently skipped by the if(length(index)<20) next guard. The existing TODO comment at line 894 confirms this is a known unresolved problem that this change makes worse.

print(names(clusters)[i])
###TODO need to account for the sample name here which is added to the barcode
index <- match(clusters[[i]], metadata(readClassList[[1]])$samples)
index <- index[!is.na(index)]
print(length(index))

Related barcode prefixing at:

if(extractBarcodeUMI){
mcols(readGrgList[[i]])$CB <- paste0(names(reads)[i], '_', mcols(readGrgList[[i]])$CB)
} else{

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@jonathangoeke
Copy link
Copy Markdown
Member

Code review (below default threshold)

The following issues scored between 25-75 out of 100 confidence — below the default reporting threshold of 80. Included here for completeness.

Score 75:

  1. clusters[[i]] in the quantification loop uses the BAM-file/sample index i to index into clusters. If clusters is a flat named list (as documented in bambu.singlecell()), then clusters[[1]] returns only the first cluster's barcodes rather than all clusters for sample 1. For single-sample runs with multiple clusters, only the first cluster is quantified.

bambu/R/bambu.R

Lines 276 to 278 in 7f55cae

if(!is.null(clusters)){
iter <- clusters[[i]]
}

  1. bambu.singlecell() documents clusters as "A named list mapping cluster names to barcode vectors", but the quantification loop expects a list-of-lists (one entry per BAM file). Users following the documented format will get incorrect quantification results with multiple samples.

bambu/R/bambu.R

Lines 332 to 335 in 7f55cae

#' barcode. Defaults to FALSE}
#' \item{clusters}{A named list mapping cluster names to barcode vectors,
#' used for cluster-level transcript discovery. Defaults to NULL}
#' }

  1. The user-facing message at prepareDataFromBam.R:73 still references "check the barcode map input", but the file-based barcode mapping functionality was removed by this PR. The message should reference CB/UB BAM tags instead.

numNoCBs <- sum(is.na(mcols(readGrgList)$CB))
if(numNoCBs > 0){
message("Removing ", numNoCBs, " reads that were not assigned barcodes. If this is unexpected check the barcode map input")
readGrgList <- readGrgList[!is.na(mcols(readGrgList)$CB)]

  1. The new exported bambu.singlecell() function has no test coverage. Given the significant parameter refactoring in this PR (removing barcodesToFilter, sampleName, cleanReads, renaming demultiplexed), there is regression risk for single-cell workflows.

bambu/R/bambu.R

Lines 338 to 357 in 7f55cae

bambu.singlecell <- function(reads, annotations = NULL, genome = NULL, NDR = NULL,
mode = NULL, opt.discovery = NULL, opt.em = NULL, opt.singlecell = NULL,
rcOutDir = NULL, discovery = TRUE, assignDist = TRUE, quant = TRUE,
stranded = FALSE, ncore = 1, yieldSize = NULL, trackReads = FALSE,
returnDistTable = FALSE, lowMemory = FALSE, sampleData = NULL,
fusionMode = FALSE, verbose = FALSE, quantData = NULL,
processByChromosome = FALSE, processByBam = TRUE) {
extractBarcodeUMI <- isTRUE(opt.singlecell$extractBarcodeUMI)
dedupUMI <- isTRUE(opt.singlecell$dedupUMI)
clusters <- opt.singlecell$clusters
bambu(reads = reads, annotations = annotations, genome = genome, NDR = NDR,
mode = mode, opt.discovery = opt.discovery, opt.em = opt.em,
rcOutDir = rcOutDir, discovery = discovery, assignDist = assignDist,
quant = quant, stranded = stranded, ncore = ncore, yieldSize = yieldSize,
trackReads = trackReads, returnDistTable = returnDistTable,
lowMemory = lowMemory, sampleData = sampleData, fusionMode = fusionMode,
verbose = verbose, extractBarcodeUMI = extractBarcodeUMI,
quantData = quantData, dedupUMI = dedupUMI, clusters = clusters,
processByChromosome = processByChromosome, processByBam = processByBam)
}

Score 25:

  1. !extractBarcodeUMI in setBiocParallelParameters() is not NA-safe (unlike the old isFALSE(demultiplexed)). In practice extractBarcodeUMI cannot currently be NA due to upstream guards, so this is defensive-coding only.

#===# set parallel options: otherwise use parallel to distribute samples
bpParameters$workers <- ifelse(length(reads) == 1 & !extractBarcodeUMI, 1, ncore)
bpParameters$progressbar <- ifelse(length(reads) > 1 & !verbose, TRUE, FALSE)

  1. processByBam=TRUE path hardcodes index = 1 for all samples. Pre-existing bug, but the removal of sampleNames removes a potential downstream workaround.

processByChromosome = processByChromosome, trackReads = trackReads, fusionMode = fusionMode,
extractBarcodeUMI = extractBarcodeUMI, dedupUMI = dedupUMI, index = 1)},
BPPARAM = bpParameters)

🤖 Generated with Claude Code

Comment thread R/bambu-processReads.R
Comment thread R/bambu-processReads.R Outdated
Comment thread R/prepareDataFromBam.R
Comment thread R/bambu.R Outdated
@ch99l ch99l force-pushed the sc_spatial_wrapper branch from f259cdd to d26f253 Compare April 16, 2026 13:29
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (1)

R/prepareDataFromBam.R:76

  • dedupUMI=TRUE currently does not require extractBarcodeUMI=TRUE, but the dedup logic implicitly depends on CB/UMI being present; with extractBarcodeUMI=FALSE it will silently skip (or behave unpredictably). Add an explicit validation (or automatically enable extractBarcodeUMI) when dedupUMI is requested.
    if(dedupUMI){
        #UMI deduplication by barcode
        start.ptm <- proc.time()
        numUMIs <- length(na.omit(unique(mcols(readGrgList)$UMI))) # remove NA UMIs
        if(numUMIs > 100){

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread R/bambu.R
Comment thread R/bambu.R Outdated
Comment thread R/bambu.R Outdated
Comment thread R/bambu-extendAnnotations-utilityExtend.R
Comment thread R/bambu-processReads.R
Comment thread R/prepareDataFromBam.R
Comment thread R/prepareDataFromBam.R Outdated
Comment thread R/bambu.R
Copy link
Copy Markdown
Member

@jonathangoeke jonathangoeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • simplify input to bambu.singlecell
  • add more description in code
  • update documentation to remove incorrect parts
  • add draft issue regarding documentation updates
  • writeBambuOutput function: is that section necessary?

Comment thread R/bambu.R Outdated
Comment thread R/bambu.R
Comment thread R/bambu.R Outdated
Comment thread R/readWrite.R Outdated
…wrapper, include single cell .bam example, update writeBambuOutput
Copy link
Copy Markdown
Member

@jonathangoeke jonathangoeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment to change order of code in bambu()

Copy link
Copy Markdown
Member

@jonathangoeke jonathangoeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please test again after all changes

@ch99l ch99l requested a review from SuiYue-2308 April 17, 2026 04:14
Copy link
Copy Markdown
Collaborator

@SuiYue-2308 SuiYue-2308 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the code manually, and have tested on single cell data, for both one sample and multiple samples. I can not get the result from the pacBio demo data, but I think that is related to the demo data it self, all of the read classes were filtered out from this sample. I will update the demo data later. One small thing is that maybe we can message the changes in writeBambuOutput, like "We don't support .... anymore" to users.
The code changes look good to me, I approve it.

@ch99l ch99l merged commit 6dcb1d5 into devel_pre_v4 Apr 20, 2026
1 of 4 checks passed
@ch99l ch99l deleted the sc_spatial_wrapper branch April 20, 2026 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

implement bambu.singlecell / bambu.spatial function as a wrapper to call bambu in single cell or spatial mode

5 participants