
Regrouping pages on model tuning in the DataOps User Guide #2014

Draft
emassoulie wants to merge 1 commit into main from issue-2003-regroup-dataops-hyperparameter-pages



@emassoulie
Contributor

The five pages on validation and tuning at the end of the DataOps section of the User Guide have been fused into three main parts:

  • Validating a DataOps model
  • Hyperparameter tuning
  • Going further with Optuna

These parts have examples that should be shortened (or made into "example" pages), and subsection titles that have been made more explicit.

@emassoulie
Contributor Author

Marked as draft because, in addition to the restructuring, the pages also need to be trimmed a little (with the larger examples perhaps moved to their own section).

Member

@rcap107 left a comment


Thanks for the PR, @emassoulie. Overall, I think it's an improvement over the current documentation. I left a couple of comments where I was not convinced by the wording, but aside from that, I think we can merge this soon.


- Here are the different kinds of choices, along with their default outcome when
- we are not using hyperparameter search:
+ Skrub provides over 10 different ``choose`` methods for tuning use cases, all detailed
Member


I wouldn't say there are 10 "different" choose methods: there are only 4 (from, int, float and bool), but they can be used in different ways.


- Splitting the data in train and test sets
- =========================================
+ More advanced train/test splitting
Member


I think the original title should be kept here: this section isn't about a more advanced way of defining train and test splits; we really are just splitting the data in two.
