ML Agent does not utilize user-provided initial_code for first iteration #53

@yqwang96

Problem Description

When using the ML Agent for machine learning tasks, the framework struggles to find a first feasible solution. The root cause is that the ML Agent does not use the user-provided initial_code configuration when the memory is empty (that is, on the first iteration).

Root Cause Analysis

The framework already supports the initial_code configuration in EvolveConfig, and it is correctly passed to the Context. However, the Math Agent and the ML Agent handle it inconsistently.

Math Agent (Correct Implementation)

Location: agents/math_agent/planner/plan_agent.py:87-118

init_solution = context.init_solution
init_evaluation = context.init_evaluation
init_score = 0.0
if context.init_score is not None:
    init_score = context.init_score
init_parent = {
    "solution": init_solution,
    "score": init_score,
    "evaluation": init_evaluation,
    "summary": "This is the initial solution, it has no parents...",
}

parent = self.database.sample_solution(island_id)
parent_dict = parent if parent else init_parent  # ✅ Correct: uses init_parent as fallback

ML Agent (Problematic Implementation)

Location: agents/ml_agent/planner/ml_planner.py:138-144

parent = self.database.sample_solution(context.island_id)

parent_dict = parent if parent else {}  # ❌ Problem: empty dict, initial_code is lost
parent_json = json.dumps(parent_dict, ensure_ascii=False, indent=2)

Impact

When memory is empty:

  • Math Agent: Uses the user-provided initial_code as the initial parent
  • ML Agent: Uses an empty dict {}, so the prompt contains no reference code at all

This causes ML Agent to:

  1. Have no code baseline to reference
  2. Rely entirely on the LLM to generate the complete workflow from scratch
  3. Produce output of unstable quality, which makes early iterations inefficient
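
To make the impact concrete, here is a minimal sketch that serializes both fallbacks with the same json.dumps call used in ml_planner.py. The contents of init_parent below are illustrative placeholders (the "solution" string in particular is made up), not values taken from the repository.

```python
import json

# What the ML planner currently serializes when memory is empty.
empty_parent_json = json.dumps({}, ensure_ascii=False, indent=2)

# Hypothetical initial parent, shaped like the Math Agent's init_parent.
init_parent = {
    "solution": "print('baseline')",  # placeholder for the user's initial_code
    "score": 0.0,
    "evaluation": "",
    "summary": "This is the initial solution, it has no parents...",
}
init_parent_json = json.dumps(init_parent, ensure_ascii=False, indent=2)

print(empty_parent_json)  # the prompt only receives an empty JSON object
print(init_parent_json)   # the prompt receives the baseline code and its score
```

With the empty dict, the serialized parent carries no information, so the planner prompt has nothing to build on.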

Proposed Solution

Minimal fix: modify ml_planner.py to use the init_parent fallback pattern (same as the Math Agent):

# Add after line 90 in run() method
init_solution = context.init_solution
init_evaluation = context.init_evaluation or ""
init_score = 0.0
if context.init_score is not None:
    init_score = context.init_score

init_parent = {
    "solution": init_solution,
    "score": init_score,
    "evaluation": init_evaluation,
    "summary": "This is the initial solution provided by user, start evolution from here",
}

# Modify line 143
parent_dict = parent if parent else (init_parent if init_solution else {})
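
The same fallback logic can be sketched as a standalone helper, which would make it easy to unit test outside the planner. The function name build_parent_dict and its signature are hypothetical and not part of the existing codebase; the argument names mirror the context fields used above.

```python
from typing import Any, Dict, Optional


def build_parent_dict(
    parent: Optional[Dict[str, Any]],
    init_solution: Optional[str],
    init_score: Optional[float] = None,
    init_evaluation: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the sampled parent, else the initial solution, else {}.

    Hypothetical helper: mirrors the proposed change to ml_planner.py.
    """
    if parent:
        # Memory already holds a solution: use it unchanged.
        return parent
    if not init_solution:
        # No user-provided code either: keep the current empty-dict behavior.
        return {}
    return {
        "solution": init_solution,
        "score": init_score if init_score is not None else 0.0,
        "evaluation": init_evaluation or "",
        "summary": "This is the initial solution provided by user, start evolution from here",
    }


# First iteration with user-provided code: the initial solution becomes the parent.
first_parent = build_parent_dict(None, "print('baseline')", init_score=0.75)
print(first_parent["solution"], first_parent["score"])
```

Keeping the empty-dict branch means configurations without initial_code behave exactly as they do today.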

Usage Example

After the fix, users can provide initial code in task_config.yaml:

evolve:
  initial_code: |
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    train = pd.read_csv('train.csv')
    X = train.drop('target', axis=1)
    y = train['target']

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    scores = cross_val_score(model, X, y, cv=5)
    print(f"CV Score: {scores.mean():.4f}")
  initial_score: 0.75  # Optional, will be auto-evaluated if not provided

Related Files

  • src/loongflow/framework/pes/context/config.py - Configuration definition
  • src/loongflow/framework/pes/pes_agent.py - PES Agent main logic
  • agents/math_agent/planner/plan_agent.py - Math Agent correct implementation reference
  • agents/ml_agent/planner/ml_planner.py - ML Agent problematic location
  • agents/ml_agent/executor/ml_executor.py - ML Executor (may need adaptation)
