Clarification of the "control flow" page in the "DataOps" User Guide section by emassoulie · Pull Request #2010 · skrub-data/skrub

emassoulie · 2026-04-01T15:39:56Z

Proposed reordering and renaming of the "Control flow" page, to better indicate the use of the page:

- The notions of "eager" and "deferred" execution are defined a little earlier
- Added subsection names
- A few error examples have been reduced

jeromedockes

jeromedockes · 2026-04-01T16:09:54Z

+Running complex operations on DataOps variables: deferred evaluation
+====================================================================
+
+Why DataOps cannot handle complex operations


They can. Maybe something like "Why some operations need to be inside functions" or similar.

jeromedockes · 2026-04-01T16:11:30Z

-This remains true even if we have provided a value for ``orders`` and we can
-see a result for that value:


I think this still needs to be stated somewhere, because it can be confusing. A reader may ask: "What do you mean it will be computed later? I can see it right here in the repr."
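
A toy sketch of that confusion, in plain Python rather than skrub: a node can display a preview computed from an example value, while the actual result is still recomputed from whatever input is supplied later. The `Node` class, its `eval` method, and `column_names` are hypothetical names that only mimic the idea; this is not how skrub is implemented.

```python
class Node:
    """Toy deferred computation: a function plus an optional preview."""

    def __init__(self, func, example=None):
        self.func = func
        # The preview is computed once, from the example value only.
        self.preview = None if example is None else func(example)

    def eval(self, value):
        # The real result is computed from the value supplied at evaluation time.
        return self.func(value)

    def __repr__(self):
        return f"<Node preview={self.preview!r}>"


def column_names(table):
    # A dict of columns stands in for a dataframe in this sketch.
    return list(table)


example_orders = {"item": ["pen", "cup"], "price": [1.5, 3.0]}
columns = Node(column_names, example=example_orders)

# The repr shows a result, but only for the example value.
preview_repr = repr(columns)
# Evaluating later with different data gives a different result.
later_result = columns.eval({"item": ["mug"]})

print(preview_repr)
print(later_result)
```

The preview is a convenience for the example data only; code that must react to the actual columns has to run at evaluation time, which is why the sentence being removed here seems worth keeping.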

jeromedockes · 2026-04-01T16:14:05Z

-transformation that we apply must not modify its input, but leave it unchanged
-and return a new value.
-
-Consider the transformers in a scikit-learn pipeline: each computes a new


From oral discussions, I remember that this comparison can help readers understand why each node must return a new value instead of modifying its input.
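
A minimal illustration of the point, in plain Python with no skrub or scikit-learn involved: a step that mutates its input silently changes the upstream value, while a step that works on a shallow copy leaves it intact. The function names and the `upstream` dict are made up for this sketch.

```python
def add_total_in_place(order):
    # Mutates its input: the upstream value is changed as a side effect.
    order["total"] = order["price"] * order["qty"]
    return order


def add_total(order):
    # Works on a shallow copy and returns a new value; the input is untouched.
    result = dict(order)
    result["total"] = result["price"] * result["qty"]
    return result


upstream = {"price": 2.0, "qty": 3}

# Copying version: the upstream value keeps its original keys.
new_value = add_total(upstream)
keys_after_copying_step = sorted(upstream)

# In-place version: the upstream value itself now has an extra key.
mutated = add_total_in_place(upstream)
keys_after_in_place_step = sorted(upstream)

print(keys_after_copying_step)
print(keys_after_in_place_step)
```

Any other step that reads `upstream` after the in-place version has run would see the extra key, which is the kind of hidden coupling the removed paragraph warned about.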

rcap107

Thanks for the PR @emassoulie, I left a few comments with some suggested changes

rcap107 · 2026-04-20T14:27:37Z

-columns: it is a skrub DataOp that will produce a list of columns, later,
+over the columns. This is the way any computation on any variable is usually run,
+referred to here as *eager* evaluation. However, ``orders.columns`` is not an actual
+list of columns: it is a skrub DataOp that will produce a list of columns, later,


I wonder if this section could benefit from a small example, something like

>>> import pandas as pd
>>> import skrub
>>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> df
>>> a = skrub.var("df", df)
>>> cols = a.columns
>>> cols.skb.eval({"df": df.drop(columns="b")})

to show that the result of the evaluation depends on what is inside the variable.

rcap107 · 2026-04-20T14:36:42Z

-:func:`deferred` function. But we should make a (shallow) copy of the inputs and
-return a new value.
-
 Finally, there are other situations where using :func:`deferred` can be helpful:


I think the "Finally..." paragraph should be moved to the end of the previous section, possibly together with the examples.

rcap107 · 2026-04-20T14:37:29Z

+>>> csv_path = skrub.var("csv_path")
+>>> data = skrub.deferred(pd.read_csv)(csv_path)
+
 Unpacking multiple outputs from deferred functions


I wonder if, after editing the rest of this section, the part about unpacking should be put into a dropdown as additional explanation, something like "note about unpacking".

rcap107 · 2026-04-20T14:40:16Z

-applying ``deferred`` and calling the function as shown above we can use
+
+.. warning::
+  DataOps are evaluated *lazily* (we are building a pipeline, not immediately


I am not sure I understand what this section is saying

Eloi Massoulié added 2 commits April 1, 2026 17:20

Proposed shortening

22ce762

Reinserting subsection and adding subsection names

016d906

jeromedockes reviewed Apr 1, 2026


Eloi Massoulié added 2 commits April 9, 2026 15:25

Adjustments

d1d3e68

Alternate warning on lazy evaluation

0ba8d0c

rcap107 reviewed Apr 20, 2026


emassoulie wants to merge 4 commits into skrub-data:main from emassoulie:doc-dataops-rearrange-control-flow
