Bug report
combine(by:) silently drops matching items when one side of the join has a GString value inside a Map key and the other has a String with the same textual content.
Expected behavior and actual behavior
Expected: When two channels are joined with combine(by: [0]) and the keys at index 0 are Maps that compare equal via equals(), the operator should emit one output tuple for each left/right pair that shares a key.
Actual: No output is emitted for keys where the Map values have mismatched types (specifically GString vs String). The Map keys compare equal via .equals() but produce different hashCode() values, so the HashMap lookups in CombineOp.emit() place the left and right items in different buckets and never pair them. The workflow completes successfully with incomplete results — no warning or error is raised.
This is particularly insidious during resume: meta maps reconstructed from Kryo-deserialized cache entries can end up with subtly different types than their equivalents from freshly-executed tasks, causing a handful of items in a large channel to be silently dropped.
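This is related to but distinct from:
- groupTuple, fixed by GroupTupleOp.normalizeKey(). The fix there doesn't recurse into Map values.
- groupKey vs plain value asymmetry in groupTuple. Resolved docs-only.
- groupKey unwrapping in join/combine. Closed in favor of operators v2.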
Steps to reproduce the problem
repro.nf:
```groovy
workflow {
    def xs = [1, 2]

    // LEFT: meta.chrom is a GString (produced by runtime interpolation)
    ch_left = Channel.of(
        [[id: xs[0], chrom: "chr${xs[0]}"], "left_1.txt"],
        [[id: xs[1], chrom: "chr${xs[1]}"], "left_2.txt"]
    )

    // RIGHT: meta.chrom is a String (produced by concatenation)
    ch_right = Channel.of(
        [[id: xs[0], chrom: 'chr' + xs[0]], "right_1.txt"],
        [[id: xs[1], chrom: 'chr' + xs[1]], "right_2.txt"]
    )

    // Sanity: maps compare equal via equals() but have different hashCode
    def m1 = [id: xs[0], chrom: "chr${xs[0]}"]
    def m2 = [id: xs[0], chrom: 'chr' + xs[0]]
    println "LEFT meta.chrom class = ${m1.chrom.getClass().name}"
    println "RIGHT meta.chrom class = ${m2.chrom.getClass().name}"
    println "m1.equals(m2) = ${m1.equals(m2)}"
    println "hashCodes match? = ${m1.hashCode() == m2.hashCode()}"

    // Expected: 2 combined tuples
    // Actual: 0 combined tuples (silent drop)
    ch_left
        .combine(ch_right, by: [0])
        .view { "COMBINED: $it" }
}
```
Run: nextflow run repro.nf
Program output
```
LEFT meta.chrom class = org.codehaus.groovy.runtime.GStringImpl
RIGHT meta.chrom class = java.lang.String
m1.equals(m2) = true
hashCodes match? = false
```
No COMBINED: lines are printed. The workflow exits cleanly with no warnings.
If the left channel is changed so both sides use String values (or both use GString values), the combine produces both expected tuples:
```
COMBINED: [[id:1, chrom:chr1], left_1.txt, right_1.txt]
COMBINED: [[id:2, chrom:chr2], left_2.txt, right_2.txt]
```
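Since both sides combine correctly once they carry the same value types, one possible user-side workaround is to convert GString values in the meta map to String before combining. The sketch below is plain Groovy (assumed runnable with the groovy command, Groovy 3 or later); normalizeMeta is a hypothetical helper name, and it only converts top-level values, so GStrings nested inside Lists or Maps within meta are left untouched.

```groovy
// Hypothetical user-side helper: copy a meta map, converting any top-level
// GString value to String. Nested maps/lists are NOT traversed.
def normalizeMeta = { Map meta ->
    meta.collectEntries { k, v -> [k, v instanceof GString ? v.toString() : v] }
}

def xs = [1, 2]
def leftMeta  = [id: xs[0], chrom: "chr${xs[0]}"]   // GString value
def rightMeta = [id: xs[0], chrom: 'chr' + xs[0]]   // String value

def fixedLeft = normalizeMeta(leftMeta)
println "hashCodes match before: ${leftMeta.hashCode() == rightMeta.hashCode()}"
println "hashCodes match after:  ${fixedLeft.hashCode() == rightMeta.hashCode()}"
```

In repro.nf the same closure could be applied to each left item with a map step ahead of combine(by: [0]); that wiring is a suggestion and has not been tested against every Nextflow syntax mode.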
Environment
- Nextflow version: 25.04.8 (also reproduces on 25.10.4; not tested on master)
- Java version: OpenJDK 21.0.10
- Operating system: macOS 25.4.0 (Darwin)
- Bash version: zsh 5.9
Additional context
Root cause
CombineOp.emit() (modules/nextflow/src/main/groovy/nextflow/extension/CombineOp.groovy) stores items in two HashMaps, leftValues and rightValues, keyed by the value(s) at the by: indices:
```groovy
synchronized void emit( DataflowWriteChannel target, int index, List p, v ) {
    if( leftValues[p] == null ) leftValues[p] = []
    if( rightValues[p] == null ) rightValues[p] = []
    // ...
}
```
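The pairing around these two HashMaps can be approximated in plain Groovy. This is a simplified, hypothetical model: the emit closure, leftValues, rightValues, and output are local stand-ins, and the elided part of CombineOp.emit() is not reproduced. Each new item is stored under its key and paired with whatever the other side already holds under an equal key. With a GString-valued key on the left and a String-valued key on the right, the two items land in separate entries and nothing is paired.

```groovy
// Simplified stand-in for the left/right bucketing in CombineOp.emit().
def leftValues  = new HashMap<Object, List>()
def rightValues = new HashMap<Object, List>()
def output = []

// index 0 = left channel, index 1 = right channel
def emit = { int index, key, value ->
    leftValues.putIfAbsent(key, [])
    rightValues.putIfAbsent(key, [])
    if( index == 0 ) {
        leftValues[key] << value
        // pair with every right item already stored under an equal key
        rightValues[key].each { r -> output << [key, value, r] }
    }
    else {
        rightValues[key] << value
        leftValues[key].each { l -> output << [key, l, value] }
    }
}

def n = 1
emit(0, [id: n, chrom: "chr${n}"], 'left_1.txt')   // GString value in key
emit(1, [id: n, chrom: 'chr' + n], 'right_1.txt')  // String value in key

println "paired tuples: ${output.size()}"
println "distinct left keys: ${leftValues.size()}"
```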
The key p is produced by DataflowHelper.makeKey() → KeyPair.addKey() → KeyPair.safeStr(). safeStr() converts top-level GString → String but does not recurse into Map or List values:
```groovy
// KeyPair.groovy:43
static private safeStr(key) {
    key instanceof GString ? key.toString() : key
}
```
When the key element is a Map containing GString values, those nested GStrings remain unconverted. The Map's hashCode() differs between a GString-valued and String-valued version, so HashMap lookups miss.
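The gap can be reproduced outside Nextflow with a standalone copy of the one-line conversion above. safeStrCurrent is a hypothetical name for that copy, defined as a script method; the script assumes plain Groovy 3 or later.

```groovy
// Standalone copy of the current top-level-only conversion in KeyPair.safeStr()
def safeStrCurrent(key) {
    key instanceof GString ? key.toString() : key
}

def n = 1
def topLevel = safeStrCurrent("chr${n}")
def nested   = safeStrCurrent([id: n, chrom: "chr${n}"])

println topLevel.getClass().name       // java.lang.String
println nested.chrom.getClass().name   // still a GString implementation class
```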
Suggested fix
Extend KeyPair.safeStr() to recurse into Map values and List elements (including nested Maps and Lists), short-circuiting when no GString or nested collection is present to avoid allocation in the common case. This would fix combine, join, cross, and phase in one place. GroupTupleOp.normalizeKey() (added in #6400) can be reduced to a thin wrapper that delegates to the same utility.
```groovy
static private safeStr(key) {
    if( key instanceof GString )
        return key.toString()
    if( key instanceof Map ) {
        // short-circuit (no allocation) when no GString or nested Map/List is present
        def needsConversion = key.values().any { it instanceof GString || it instanceof Map || it instanceof List }
        if( !needsConversion ) return key
        def result = new LinkedHashMap(key.size())
        key.each { k, v -> result.put(k, safeStr(v)) }
        return result
    }
    if( key instanceof List ) {
        def needsConversion = key.any { it instanceof GString || it instanceof Map || it instanceof List }
        if( !needsConversion ) return key
        def result = new ArrayList(key.size())
        for( e in key ) result.add(safeStr(e))
        return result
    }
    return key
}
```
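To check the suggested fix in isolation, the method above can be copied into a plain Groovy script. Here it sits outside the KeyPair class, so the static/private modifiers are dropped; this is a test harness, not the upstream code. The script checks that a GString-valued meta map and its String-valued twin normalize to keys that hit the same HashMap entry, that GStrings nested more than one level deep are converted, and that an already-plain map is returned unchanged (the short-circuit path).

```groovy
// Script-level copy of the proposed safeStr() (same logic as above)
def safeStr(key) {
    if( key instanceof GString )
        return key.toString()
    if( key instanceof Map ) {
        // nothing to convert: return the original map without allocating
        def needsConversion = key.values().any { it instanceof GString || it instanceof Map || it instanceof List }
        if( !needsConversion ) return key
        def result = new LinkedHashMap(key.size())
        key.each { k, v -> result.put(k, safeStr(v)) }
        return result
    }
    if( key instanceof List ) {
        def needsConversion = key.any { it instanceof GString || it instanceof Map || it instanceof List }
        if( !needsConversion ) return key
        def result = new ArrayList(key.size())
        for( e in key ) result.add(safeStr(e))
        return result
    }
    return key
}

def n = 1
def left  = safeStr([id: n, chrom: "chr${n}"])   // GString value is converted
def right = safeStr([id: n, chrom: 'chr' + n])   // already plain: returned as-is

// The normalized keys now land in the same HashMap entry.
def buckets = new HashMap()
buckets.put(left, 'left_1.txt')
println "lookup with right key: ${buckets.get(right)}"   // left_1.txt

// Recursion also reaches GStrings nested more than one level deep.
def deep = safeStr([[id: n, tags: ["t${n}"]]])
println deep[0].tags[0].getClass().name                // java.lang.String

// Short-circuit path: a map with only plain values is returned unchanged.
def plain = [id: n, chrom: 'chr1']
println "plain map returned as-is: ${safeStr(plain).is(plain)}"   // true
```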