combine(by:) silently drops matching items when meta map values have GString vs String mismatch #7053

@robsyme

Bug report

combine(by:) silently drops matching items when one side of the join has a GString value inside a Map key and the other has a String with the same textual content.

Expected behavior and actual behavior

Expected: When two channels are joined with combine(by: [0]) and the keys at index 0 are Maps that compare equal via equals(), the operator should emit one output tuple per matching key.

Actual: No output is emitted for keys where the Map values have mismatched types (specifically GString vs String). The Map keys compare equal via .equals() but produce different hashCode() values, so the HashMap lookups in CombineOp.emit() place the left and right items in different buckets and never pair them. The workflow completes successfully with incomplete results — no warning or error is raised.

This is particularly insidious during resume: meta maps reconstructed from Kryo-deserialized cache entries can end up with subtly different types than their equivalents from freshly-executed tasks, causing a handful of items in a large channel to be silently dropped.
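The equals()/hashCode() mismatch described above can be seen without Nextflow. The following is a minimal plain-Groovy sketch, not the operator's actual code: `buckets` is a hypothetical stand-in for the HashMaps used inside CombineOp, and the variable names are illustrative only.

```groovy
// Plain-Groovy sketch; `buckets` is a hypothetical stand-in for the operator's HashMaps
def n = 1
def gKey = [id: n, chrom: "chr${n}"]   // chrom is a GString
def sKey = [id: n, chrom: 'chr' + n]   // chrom is a java.lang.String

def buckets = new HashMap()
buckets.put(gKey, ['left_1.txt'])

def sameByEquals = gKey.equals(sKey)                 // Groovy map comparison, as in the repro
def sameHash = gKey.hashCode() == sKey.hashCode()    // GString and String hash differently
def found = buckets.get(sKey)                        // lookup with the String-valued key

println "equals:   ${sameByEquals}"
println "sameHash: ${sameHash}"
println "lookup:   ${found}"
```

The lookup with the String-valued key does not find the entry stored under the GString-valued key, which mirrors how the left and right items fail to pair.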

This is related to but distinct from:

Steps to reproduce the problem

repro.nf:

workflow {
    def xs = [1, 2]

    // LEFT: meta.chrom is a GString (produced by runtime interpolation)
    ch_left = Channel.of(
        [[id: xs[0], chrom: "chr${xs[0]}"], "left_1.txt"],
        [[id: xs[1], chrom: "chr${xs[1]}"], "left_2.txt"]
    )

    // RIGHT: meta.chrom is a String (produced by concatenation)
    ch_right = Channel.of(
        [[id: xs[0], chrom: 'chr' + xs[0]], "right_1.txt"],
        [[id: xs[1], chrom: 'chr' + xs[1]], "right_2.txt"]
    )

    // Sanity: maps compare equal via equals() but have different hashCode
    def m1 = [id: xs[0], chrom: "chr${xs[0]}"]
    def m2 = [id: xs[0], chrom: 'chr' + xs[0]]
    println "LEFT meta.chrom class  = ${m1.chrom.getClass().name}"
    println "RIGHT meta.chrom class = ${m2.chrom.getClass().name}"
    println "m1.equals(m2)          = ${m1.equals(m2)}"
    println "hashCodes match?       = ${m1.hashCode() == m2.hashCode()}"

    // Expected: 2 combined tuples
    // Actual:   0 combined tuples (silent drop)
    ch_left
        .combine(ch_right, by: [0])
        .view { "COMBINED: $it" }
}

Run: nextflow run repro.nf

Program output

LEFT meta.chrom class  = org.codehaus.groovy.runtime.GStringImpl
RIGHT meta.chrom class = java.lang.String
m1.equals(m2)          = true
hashCodes match?       = false

No COMBINED: lines are printed. The workflow exits cleanly with no warnings.

If the left channel is changed so both sides use String values (or both use GString values), the combine produces both expected tuples:

COMBINED: [[id:1, chrom:chr1], left_1.txt, right_1.txt]
COMBINED: [[id:2, chrom:chr2], left_2.txt, right_2.txt]
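One way to get String values on both sides is to call toString() on the interpolated value where the meta map is built. This is a plain-Groovy sketch under that assumption, not a full Nextflow script; `leftMeta` and `rightMeta` are illustrative names.

```groovy
def n = 1
def interpolated = "chr${n}".toString()   // force the GString to a java.lang.String
def concatenated = 'chr' + n              // already a java.lang.String

def leftMeta  = [id: n, chrom: interpolated]
def rightMeta = [id: n, chrom: concatenated]

println "class       = ${interpolated.getClass().name}"
println "hashes same = ${leftMeta.hashCode() == rightMeta.hashCode()}"
```

This only addresses the call site; it does not help when the mismatch is introduced elsewhere, such as the resume case described earlier.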

Environment

  • Nextflow version: 25.04.8 (also reproduces on 25.10.4; not tested on master)
  • Java version: OpenJDK 21.0.10
  • Operating system: macOS 25.4.0 (Darwin)
  • Bash version: zsh 5.9

Additional context

Root cause

CombineOp.emit() (modules/nextflow/src/main/groovy/nextflow/extension/CombineOp.groovy) stores items in two HashMaps, leftValues and rightValues, keyed by the value(s) at the by: indices:

synchronized void emit( DataflowWriteChannel target, int index, List p, v ) {
    if( leftValues[p] == null ) leftValues[p] = []
    if( rightValues[p] == null ) rightValues[p] = []
    // ...
}

The key p is produced by DataflowHelper.makeKey() → KeyPair.addKey() → KeyPair.safeStr(). safeStr() converts a top-level GString to a String but does not recurse into Map or List values:

// KeyPair.groovy:43
static private safeStr(key) {
    key instanceof GString ? key.toString() : key
}

When the key element is a Map containing GString values, those nested GStrings remain unconverted. The Map's hashCode() differs between a GString-valued and String-valued version, so HashMap lookups miss.
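To illustrate the gap, here is a small sketch that applies the same top-level-only conversion to a bare GString and to a Map holding one. `safeStrTopLevel` is a hypothetical standalone copy of the one-line logic quoted above, written as a script method so it can run outside Nextflow.

```groovy
// Hypothetical standalone copy of the top-level-only conversion
def safeStrTopLevel(key) {
    key instanceof GString ? key.toString() : key
}

def n = 1
def topLevel = safeStrTopLevel("chr${n}")                 // bare GString is converted
def nested   = safeStrTopLevel([id: n, chrom: "chr${n}"]) // Map is returned untouched

println "top-level class = ${topLevel.getClass().name}"
println "nested class    = ${nested.chrom.getClass().name}"
```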

Suggested fix

Extend KeyPair.safeStr() to recurse into Map values and List elements, short-circuiting when a collection holds no GString and no nested Map or List, to avoid allocation in the common case. This fixes combine, join, cross, and phase in one place. GroupTupleOp.normalizeKey() (added in #6400) can be reduced to a thin wrapper that delegates to the same utility.

static private safeStr(key) {
    if( key instanceof GString )
        return key.toString()
    if( key instanceof Map ) {
        // short-circuit when no value is a GString or a nested Map/List
        def needsConversion = key.values().any { it instanceof GString || it instanceof Map || it instanceof List }
        if( !needsConversion ) return key
        def result = new LinkedHashMap(key.size())
        key.each { k, v -> result.put(k, safeStr(v)) }
        return result
    }
    if( key instanceof List ) {
        def needsConversion = key.any { it instanceof GString || it instanceof Map || it instanceof List }
        if( !needsConversion ) return key
        def result = new ArrayList(key.size())
        for( e in key ) result.add(safeStr(e))
        return result
    }
    return key
}
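A quick way to sanity-check the idea outside Nextflow is sketched below. `normalizeKey` here is a hypothetical, compact stand-in for the proposed recursion (it omits the short-circuit and always copies), so it only demonstrates the intended effect on hashing and lookup, not the allocation behaviour of the proposal.

```groovy
// Hypothetical compact stand-in for the proposed recursive conversion
def normalizeKey(key) {
    if( key instanceof GString ) return key.toString()
    if( key instanceof Map ) return key.collectEntries { k, v -> [(k): normalizeKey(v)] }
    if( key instanceof List ) return key.collect { normalizeKey(it) }
    return key
}

def n = 1
def left  = normalizeKey([id: n, chrom: "chr${n}", tags: ["t${n}"]])
def right = normalizeKey([id: n, chrom: 'chr' + n, tags: ['t' + n]])

// After normalization both keys contain only Strings, so they hash alike
def buckets = new HashMap()
buckets.put(left, 'left_1.txt')
def hit = buckets.get(right)

println "hashes same = ${left.hashCode() == right.hashCode()}"
println "lookup      = ${hit}"
```

Note that, like the proposal, this sketch converts only values; Map keys that are themselves GStrings are left as they are.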
