7 changes: 2 additions & 5 deletions CHANGES.rst
@@ -22,6 +22,8 @@ Changes
 - The row indices of training and testing samples are now also included in the
   dictionaries produced by :meth:`DataOp.skb.iter_cv_splits`. :pr:`2012` by
   :user:`Jérôme Dockès <jeromedockes>`.
+- :func:`fetch_toxicity` now returns a shuffled version of the dataset by default.
+  :pr:`1892` by :user:`Riccardo Cappuzzo <rcap107>`.

 Bugfixes
 --------
@@ -133,11 +135,6 @@ Changes
   :pr:`1819` by :user:`Eloi Massoulié <emassoulie>`
 - :func:`compute_ngram_distance` has been renamed to :func:`_compute_ngram_distance` and is now a private function.
   :pr:`1838` by :user:`Siddharth Baleja <siddharthbaleja>`.
-- The repository wheel has been made smaller by removing some material that was
-  not necessary for using the library. Benchmarks are now available in a separate
-  `repository <https://github.com/skrub-data/skrub-benchmarks>`__.
-  :pr:`1893` by :user:`Riccardo Cappuzzo <rcap107>`.
-

 Bugfixes
 --------
2 changes: 1 addition & 1 deletion skrub/datasets/_fetching.py
@@ -352,7 +352,7 @@ def fetch_toxicity(data_home=None):
     path : str
         The path to the toxicity CSV file.
     """
-    return load_simple_dataset("toxicity", data_home)
+    return load_simple_dataset("toxicity_shuffled", data_home)


 def fetch_videogame_sales(data_home=None):
7 changes: 7 additions & 0 deletions skrub/datasets/_utils.py
@@ -106,6 +106,13 @@
         ],
         "sha256": "ee187c119925ea4cdb9abd7f0f3758159f042e71b172cafe5b784d79c7590ce3",
     },
+    "toxicity_shuffled": {
+        "urls": [
+            "https://github.com/skrub-data/skrub-data-files/raw/refs/heads/main/toxicity_shuffled.zip",
+            "https://osf.io/download/zebm7",
+        ],
+        "sha256": "01382d19987c04faab8c7b10dfa87719ae4af273dba08a48a95b0c9a69aeb009",
+    },
     "traffic_violations": {
         "urls": [
             "https://github.com/skrub-data/skrub-data-files/raw/refs/heads/main/traffic_violations.zip",
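
The `toxicity_shuffled` registry entry lists two download URLs and one SHA-256 digest. This diff does not show how skrub consumes those fields, so the following is only a minimal sketch of one plausible pattern: try each mirror in order and accept the first payload whose checksum matches. The names `sha256_of`, `fetch_first_valid`, and the `download` callable are hypothetical and are not part of skrub's API.

```python
import hashlib
from typing import Callable, Iterable


def sha256_of(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def fetch_first_valid(
    urls: Iterable[str],
    expected_sha256: str,
    download: Callable[[str], bytes],
) -> bytes:
    """Return the first payload whose SHA-256 digest matches the expected one.

    `download` is a hypothetical callable mapping a URL to the raw bytes of
    the archive. It is injected so the logic can be exercised offline.
    """
    errors = []
    for url in urls:
        try:
            payload = download(url)
        except OSError as exc:
            # Network or I/O failure: remember it and try the next mirror.
            errors.append(f"{url}: {exc}")
            continue
        if sha256_of(payload) == expected_sha256:
            return payload
        # The download succeeded but the content is not what was registered.
        errors.append(f"{url}: checksum mismatch")
    raise OSError("all mirrors failed: " + "; ".join(errors))
```

With this design, an unreachable primary URL falls through to the mirror, and a corrupted or outdated archive is rejected rather than returned, because the digest is checked before the payload is handed back.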
2 changes: 1 addition & 1 deletion skrub/datasets/tests/test_fetching.py
@@ -85,7 +85,7 @@ def test_fetch_employee_salaries():
         ("midwest_survey", (2494, 29)),
         ("open_payments", (73558, 6)),
         ("traffic_violations", (1578154, 43)),
-        ("toxicity", (1000, 2)),
+        ("toxicity_shuffled", (1000, 2)),
         ("videogame_sales", (16572, 11)),
         ("bike_sharing", (17379, 11)),
     ],