Node reuse scans dotnet.exe processes and spends seconds per command-line query on Windows #13550

@clairernovotny

Summary

dotnet build, dotnet clean, and dotnet restore are dramatically slower with the .NET 11 preview SDK / MSBuild 18.6 when node reuse is enabled. The slowdown reproduces with a small multitargeted SDK-style solution.

The root cause appears to be the MSBuild 18.5 change-wave node-reuse filtering path. When worker nodes are hosted as dotnet.exe MSBuild.dll, MSBuild scans all running dotnet.exe processes, retrieves each command line via WMI, and parses /nodemode to determine whether the process is a reusable MSBuild node. On my Windows machine, each command-line query takes about 1.3s. With several unrelated background dotnet.exe processes from VS Code / C# Dev Kit / Uno extensions, the scan adds several seconds before any meaningful target work starts. Stale reusable MSBuild.exe /nodemode:1 nodes add further delay during the system-wide node count at shutdown.

Disabling only the 18.5+ change wave with MSBUILDDISABLEFEATURESFROMVERSION=18.5 makes the same repro fast again.

Environment

  • OS: Windows
  • SDK: 11.0.100-preview.3.26207.106
  • MSBuild: 18.6.0.20806
  • MSBuild source tag inspected: v11.0.0-preview.3.26207.106
  • Invocation host: dotnet
  • Shell: PowerShell

Minimal Repro

This repro creates a small multitargeted SDK-style solution.

$ErrorActionPreference = 'Stop'

$root = Join-Path $env:TEMP 'msbuild-node-reuse-repro'
Remove-Item $root -Recurse -Force -ErrorAction SilentlyContinue
New-Item -ItemType Directory -Path $root | Out-Null
Set-Location $root

dotnet new sln -n Repro

foreach ($name in 'P1','P2','P3') {
    dotnet new classlib -n $name
    dotnet sln Repro.slnx add "$name\$name.csproj"

    @"
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFrameworks>net8.0;net10.0;net11.0</TargetFrameworks>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
</Project>
"@ | Set-Content "$name\$name.csproj" -Encoding UTF8
}

dotnet restore -v:quiet

# Baseline: node reuse enabled, default behavior.
$sw = [Diagnostics.Stopwatch]::StartNew()
dotnet clean Repro.slnx -v:quiet
$sw.Stop()
"default clean: {0:n3}s" -f $sw.Elapsed.TotalSeconds

# A/B test: same command, but disable the 18.5+ MSBuild change wave.
$env:MSBUILDDISABLEFEATURESFROMVERSION = '18.5'
$sw = [Diagnostics.Stopwatch]::StartNew()
dotnet clean Repro.slnx -v:quiet
$sw.Stop()
Remove-Item Env:\MSBUILDDISABLEFEATURESFROMVERSION
"18.5 wave disabled clean: {0:n3}s" -f $sw.Elapsed.TotalSeconds

I also reproduced this with a single multitargeted SDK project:

dotnet build P1\P1.csproj --no-restore -v:quiet

Observed Results

On my machine:

| Scenario | Time |
| --- | --- |
| Single multitargeted project, default dotnet build --no-restore -v:quiet | ~26.8s |
| Same command with MSBUILDDISABLEFEATURESFROMVERSION=18.5 | ~1.1s |
| Same command after killing stale idle MSBuild worker nodes, default behavior | ~7.2s |
| Three-project .slnx clean, default behavior | ~39s |
| Three-project .slnx clean with MSBUILDDISABLEFEATURESFROMVERSION=18.5 | ~1.7s |
| Real solution clean with MSBUILDDISABLEFEATURESFROMVERSION=18.5 | ~2.0s |

-nr:false also makes the minimal repro fast, but that is only a workaround. The problem is in the node-reuse candidate discovery/counting path.

Expected Behavior

Node reuse should not add several seconds to simple SDK builds/cleans/restores because unrelated dotnet.exe processes happen to be running.

For a trivial multitargeted SDK project, default node reuse should be in the same ballpark as -nr:false, not 20x slower.
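
For reference, the slowdown factors implied by the timings under Observed Results can be worked out as follows. This is only arithmetic on the numbers reported in this issue; the `timings` dictionary and the `slowdown` helper are names made up for the illustration.

```python
# Timings in seconds, copied from the Observed Results section of this issue.
# Each pair is (default behavior, MSBUILDDISABLEFEATURESFROMVERSION=18.5).
timings = {
    "single project build": (26.8, 1.1),
    "three-project clean": (39.0, 1.7),
}


def slowdown(default_s: float, wave_disabled_s: float) -> float:
    """Return how many times slower the default run is than the A/B run."""
    return default_s / wave_disabled_s


ratios = {name: slowdown(*pair) for name, pair in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x slower with the 18.5 wave enabled")
```

Both scenarios come out above the 20x mark on this machine.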

Actual Behavior

Before target execution starts, MSBuild spends seconds scanning unrelated dotnet.exe processes and retrieving their command lines to determine whether they are reusable worker nodes.

The same scan also happens during shutdown/reuse decisions when counting system-wide active nodes, and stale reusable worker nodes from another SDK version amplify the delay.

Diagnostic Evidence

With MSBUILDDEBUGCOMM=1, the comm trace for the single-project repro shows that worker launch and pipe handshake are fast. The delay is command-line filtering:

Starting to acquire 2 new or existing node(s) to establish nodes from ID 2 to 3...
Filtering 4 candidate processes by NodeMode OutOfProcNode for process name 'dotnet'
... +1324ms: Skipping process 93372 - NodeMode mismatch. Expected: OutOfProcNode, Found: <null>. Command line: dotnet ... uno.vscode.dll ...
... +1301ms: Skipping process 66284 - NodeMode mismatch. Expected: OutOfProcNode, Found: <null>. Command line: dotnet ... Uno.LSP.Host.dll ...
... +1334ms: Skipping process 101716 - NodeMode mismatch. Expected: OutOfProcNode, Found: <null>. Command line: "C:\Program Files\dotnet\dotnet.exe" "... Microsoft.VisualStudio.ProjectSystem.Server.BuildHost.dll"
... +1330ms: Skipping process 158432 - NodeMode mismatch. Expected: OutOfProcNode, Found: <null>. Command line: "C:\Program Files\dotnet\dotnet.exe" build P1\P1.csproj --no-restore -v:quiet
Filtered to 0 processes matching NodeMode OutOfProcNode
Could not connect to existing process, now creating a process...
Launching node from C:\Program Files\dotnet\sdk\11.0.100-preview.3.26207.106\MSBuild.dll
Successfully launched C:\Program Files\dotnet\dotnet.exe node with PID 107352
Attempting connect to PID 107352 with pipe MSBuild107352 with timeout 30000 ms
Successfully connected to pipe MSBuild107352...!
Successfully connected to created node 3 which is PID 107352

The worker launch/handshake phase is not the expensive part. It completes in milliseconds once the scan has finished.
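
To put a number on the filtering cost, the four "Skipping process" lines in the trace above can be summed. The sketch below is a small helper for reading such traces, not part of MSBuild. It assumes the `+<n>ms` prefix is the time elapsed since the previous trace line, and it shortens the command lines; `SKIP_LINE` and `skip_delays` are hypothetical names.

```python
import re

# The four "Skipping process" lines from the MSBUILDDEBUGCOMM trace above,
# with the trailing command lines removed for brevity.
TRACE = """\
... +1324ms: Skipping process 93372 - NodeMode mismatch. Expected: OutOfProcNode, Found: <null>.
... +1301ms: Skipping process 66284 - NodeMode mismatch. Expected: OutOfProcNode, Found: <null>.
... +1334ms: Skipping process 101716 - NodeMode mismatch. Expected: OutOfProcNode, Found: <null>.
... +1330ms: Skipping process 158432 - NodeMode mismatch. Expected: OutOfProcNode, Found: <null>.
"""

# Matches the "+<n>ms: Skipping process <pid>" part as it appears in this trace.
SKIP_LINE = re.compile(r"\+(\d+)ms: Skipping process (\d+)")


def skip_delays(trace: str) -> dict:
    """Map each skipped PID to the delay in milliseconds logged on its line."""
    return {int(pid): int(ms) for ms, pid in SKIP_LINE.findall(trace)}


delays = skip_delays(TRACE)
total_ms = sum(delays.values())
print(f"{len(delays)} processes skipped, {total_ms} ms spent filtering")
```

Under that assumption, a single acquisition pass over four unrelated processes accounts for a bit over five seconds before any node is launched.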

The same trace also showed shutdown-time node counting delays:

System-wide node count: 7, threshold: 96, this instance has: 2 nodes
System-wide node count: 7, threshold: 96, this instance has: 0 nodes

Those were caused by stale reusable worker nodes:

"C:\Program Files\dotnet\sdk\10.0.300-preview.0.26177.108\MSBuild.exe" /noautoresponse /nologo /nodemode:1 /nodeReuse:true /low:false

After killing those idle MSBuild.exe /nodemode:1 /nodeReuse:true processes, the single-project repro improved from ~26.8s to ~7.2s, leaving the remaining cost from scanning unrelated live dotnet.exe processes.

Suspected Source Path

The relevant code appears to be:

  • src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs
    • GetNodes(...) calls GetPossibleRunningNodes(msbuildLocation, expectedNodeMode) when node reuse is requested.
    • GetPossibleRunningNodes(...) enables node-mode filtering under ChangeWaves.Wave18_5.
    • FilterProcessesByNodeMode(...) iterates candidate processes and calls process.TryGetCommandLine(...).
  • src/Framework/Utilities/ProcessExtensions.cs
    • On Windows, TryGetCommandLine(...) calls Windows.GetCommandLine(process.Id).
    • Windows.GetCommandLine(...) uses WMI COM and runs:
      SELECT CommandLine FROM Win32_Process WHERE ProcessId='<pid>'

This is the slow operation in the trace.
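
The filtering step itself is conceptually simple once the command line is available. The following is a rough Python model of that logic, not the MSBuild implementation: `parse_node_mode` and `filter_by_node_mode` are hypothetical names, the regular expression only recognizes the `/nodemode:<value>` form seen in this issue, and PID 99999 is a placeholder because the issue does not record the stale node's PID.

```python
import re
from typing import Dict, List, Optional

# Assumption: the switch has the form "/nodemode:<value>", as in the stale-node
# command line quoted in this issue. MSBuild's real parser may accept more forms.
NODEMODE = re.compile(r"(?:^|\s)/nodemode:(\S+)", re.IGNORECASE)


def parse_node_mode(command_line: str) -> Optional[str]:
    """Return the raw /nodemode value, or None when the switch is absent."""
    match = NODEMODE.search(command_line)
    return match.group(1) if match else None


def filter_by_node_mode(command_lines: Dict[int, str], expected: str) -> List[int]:
    """Keep only the PIDs whose command line carries the expected node mode."""
    return [pid for pid, cl in command_lines.items() if parse_node_mode(cl) == expected]


# Command lines copied from this issue. 99999 is a hypothetical PID.
STALE_NODE = r'"C:\Program Files\dotnet\sdk\10.0.300-preview.0.26177.108\MSBuild.exe" /noautoresponse /nologo /nodemode:1 /nodeReuse:true /low:false'
CURRENT_BUILD = r'"C:\Program Files\dotnet\dotnet.exe" build P1\P1.csproj --no-restore -v:quiet'

candidates = {99999: STALE_NODE, 158432: CURRENT_BUILD}
print(filter_by_node_mode(candidates, "1"))
```

The string matching is cheap; in the trace, the cost is entirely in obtaining each command line to feed into it.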

The inspected source also shows .NET-hosted CLI worker nodes being launched as dotnet.exe MSBuild.dll, so the candidate process name becomes dotnet, which includes many unrelated background processes:

  • VS Code extension hosts
  • C# Dev Kit project system build host
  • language server processes
  • other unrelated dotnet tools
  • the current dotnet build process itself

Why This Looks Like an MSBuild Regression

The following A/B isolates the issue:

# Slow
dotnet build P1\P1.csproj --no-restore -v:quiet

# Fast
$env:MSBUILDDISABLEFEATURESFROMVERSION = '18.5'
dotnet build P1\P1.csproj --no-restore -v:quiet
Remove-Item Env:\MSBUILDDISABLEFEATURESFROMVERSION

Disabling the 18.5+ change wave removes the slowdown while using the same SDK and same project.

Workarounds

These avoid or reduce the symptom, but they are not good defaults for normal development:

  • -nr:false / /nodeReuse:false
  • MSBUILDDISABLENODEREUSE=1
  • MSBUILDDISABLEFEATURESFROMVERSION=18.5
  • Killing stale idle MSBuild.exe / dotnet.exe MSBuild.dll worker nodes

-m:1 can also avoid some worker-node behavior in some cases, but it is not an acceptable general workaround because it disables normal parallel build scaling.

Potential Fix Direction

The expensive part is using WMI command-line retrieval for every candidate process during node reuse discovery/counting. Possible alternatives:

  • Avoid scanning all dotnet.exe processes when the target worker pipe name is keyed by PID and only actual worker nodes expose MSBuild<pid> pipes.
  • Cache candidate command-line results within a build invocation and reuse them between acquisition and shutdown/counting.
  • Prefer a cheaper test before WMI, such as probing the expected MSBuild worker pipe or another marker that only worker nodes have.
  • Avoid system-wide command-line counting during shutdown, or make it best-effort/cheap.
  • Use the process image plus an MSBuild-specific pipe/mutex/handshake check instead of WMI for unrelated dotnet.exe processes.
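
As an illustration of the caching idea in the list above, here is a minimal Python sketch. It is not a proposed patch: `CommandLineCache`, `fake_wmi_lookup`, and `slow_calls` are hypothetical names, and the lookup function stands in for the WMI query rather than calling it.

```python
from typing import Callable, Dict, Optional


class CommandLineCache:
    """Memoize per-PID command-line lookups for the lifetime of one build invocation."""

    def __init__(self, slow_lookup: Callable[[int], Optional[str]]) -> None:
        self._slow_lookup = slow_lookup
        self._cache: Dict[int, Optional[str]] = {}
        self.slow_calls = 0  # counts how often the expensive lookup actually ran

    def get(self, pid: int) -> Optional[str]:
        if pid not in self._cache:
            self.slow_calls += 1
            self._cache[pid] = self._slow_lookup(pid)
        return self._cache[pid]


# Stand-in for the WMI query, using PIDs and shortened command lines from the trace.
FAKE_COMMAND_LINES = {
    93372: "dotnet ... uno.vscode.dll ...",
    66284: "dotnet ... Uno.LSP.Host.dll ...",
    101716: '"C:\\Program Files\\dotnet\\dotnet.exe" "... Microsoft.VisualStudio.ProjectSystem.Server.BuildHost.dll"',
    158432: '"C:\\Program Files\\dotnet\\dotnet.exe" build P1\\P1.csproj --no-restore -v:quiet',
}


def fake_wmi_lookup(pid: int) -> Optional[str]:
    return FAKE_COMMAND_LINES.get(pid)


cache = CommandLineCache(fake_wmi_lookup)

# First pass models node acquisition, second pass models the shutdown-time count.
for _ in range(2):
    for pid in FAKE_COMMAND_LINES:
        cache.get(pid)

print(f"expensive lookups: {cache.slow_calls}")
```

With the cache, the second pass over the same PIDs performs no expensive lookups. A real implementation would also need to guard against PID reuse, for example by keying on PID plus process start time.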

Notes

This was originally noticed while investigating slow builds in a real multitargeted .NET library solution. The MSBuild communication trace and the MSBUILDDISABLEFEATURESFROMVERSION=18.5 A/B point to MSBuild node reuse candidate filtering.
