DiskFrame
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/CRAN-RELEASE b/CRAN-RELEASE
Lines changed: 2 additions & 2 deletions
diff --git a/DESCRIPTION b/DESCRIPTION
Lines changed: 1 addition & 1 deletion
diff --git a/NEWS.md b/NEWS.md
Lines changed: 1 addition & 1 deletion
diff --git a/R/recommend_nchunks.r b/R/recommend_nchunks.r
Lines changed: 3 additions & 2 deletions
diff --git a/README.Rmd b/README.Rmd
Lines changed: 16 additions & 11 deletions
diff --git a/README.md b/README.md
Lines changed: 41 additions & 37 deletions
diff --git a/book/02-intro-disk-frame.Rmd b/book/02-intro-disk-frame.Rmd
Lines changed: 2 additions & 2 deletions
diff --git a/book/03-concepts.Rmd b/book/03-concepts.Rmd
Lines changed: 2 additions & 2 deletions
diff --git a/book/08-more-epic.Rmd b/book/08-more-epic.Rmd
Lines changed: 0 additions & 30 deletions
@@ -51,3 +51,5 @@ cran-mirror
 misc/disk.frame-report_files/
 .httr-oauth
 README_cache
+vignettes/
+README.html
@@ -1,2 +1,2 @@
-This package was submitted to CRAN on 2019-11-23.
-Once it is accepted, delete this file and tag the release (commit a5f1cee4f9).
+This package was submitted to CRAN on 2019-12-18.
+Once it is accepted, delete this file and tag the release (commit 5c386003ef).
@@ -2,7 +2,7 @@ Type: Package
 Package: disk.frame
 Title: Larger-than-RAM Disk-Based Data Manipulation Framework
 Version: 0.3.0
-Date: 2019-12-15
+Date: 2019-12-17
 Authors@R: c(
   person("Dai", "ZJ", email = "zhuojia.dai@gmail.com", role = c("aut", "cre")),
   person("Jacky", "Poon", role = c("ctb"))
 
@@ -1,5 +1,5 @@
 # disk.frame 0.3.0
-* experimental group-by framework!
+* experimental one-stage group-by framework!
 * bug fixes for data.table trigger by integration with tidyfast
 * removed assertthat from imports
 * add benchmarkme to Suggests
 
@@ -120,7 +120,7 @@ df_ram_size <- function() {
 
       if(is.na(ram_size)) {
         warning("RAM size can't be determined. Assume you have 16GB of RAM.")
-        warning("Please report this error github.com/xiaodaigh/disk.frame/issues")
+        warning("Please report this error at github.com/xiaodaigh/disk.frame/issues")
         warning(glue::glue("Please include your operating system, R version, and if using RStudio the Rstudio version number"))
         return(16)
       } else {
@@ -130,7 +130,8 @@ df_ram_size <- function() {
     } else{
       if(is.na(ram_size)) {
         warning("RAM size can't be determined. Assume you have 16GB of RAM.")
-        warning("Please report this error github.com/xiaodaigh/disk.frame/issues")
+        warning("Please try installing benchmarkme with install.packages('benchmarkme') and try again.")
+        warning("If the error persists, please report this error at github.com/xiaodaigh/disk.frame/issues")
         warning(glue::glue("Please include your operating system, R version, and if using RStudio the Rstudio version number"))
         return(16)
       } else {
 
@@ -15,11 +15,6 @@ knitr::opts_chunk$set(
 ```
 # disk.frame <img src="inst/figures/disk.frame.png" align="right">
 
-<details>
-  <summary>Please take a moment to star the disk.frame Github repo if you like disk.frame. It keeps me going. </summary>
-<iframe src="https://ghbtns.com/github-btn.html?user=xiaodaigh&repo=disk.frame&type=star&count=true&size=large" frameborder="0" scrolling="0" width="160px" height="30px"></iframe>
-</details>
-
 <!-- badges: start -->
 <!-- ![disk.frame logo](inst/figures/disk.frame.png?raw=true "disk.frame logo") -->
 <!-- badges: end -->
@@ -64,7 +59,7 @@ install.packages("disk.frame", repo="https://cran.rstudio.com")
 Please see these vignettes and articles about `{disk.frame}`
 
   - [Quick start:
-    `{disk.frame}`](https://daizj.net/disk.frame/articles/intro-disk-frame.html)
+    `{disk.frame}`](https://diskframe.com/articles/intro-disk-frame.html)
     which replicates the `sparklyr` vignette for manipulating the
     `nycflights13` flights data.
   - [Ingesting data into `{disk.frame}`](https://diskframe.com/articles/ingesting-data.html) which lists some commons way of creating disk.frames
@@ -158,8 +153,8 @@ flights.df %>%
 
 
 
-### Group by
-Starting from {disk.frame} v0.2.2, there is for support `group_by` for a limited set of functions. For example:
+### Group-by
+Starting from `{disk.frame}` v0.3.0, `group_by` is supported for a limited set of functions. For example:
 
 ```r
 result_from_disk.frame = iris %>% 
@@ -178,11 +173,11 @@ result_from_disk.frame = iris %>%
   collect
 ```
 
-The results should be exactly the same as if applying the same group-by operations on a data.frame. If not then please [report a bug](https://github.com/xiaodaigh/disk.frame/issues).
+The results should be exactly the same as if applying the same group-by operations on a data.frame. If not, please [report a bug](https://github.com/xiaodaigh/disk.frame/issues).
 
 #### List of supported group-by functions
 
-If a function you like is missing, please make a feature request [here](https://github.com/xiaodaigh/disk.frame/issues). It is a limitation that function that depend on the order a column can only obtained using estimated methods.
+If a function you like is missing, please make a feature request [here](https://github.com/xiaodaigh/disk.frame/issues). It is a limitation that functions that depend on the order of a column can only be obtained using estimated methods.
 
 | Function | Exact/Estimate | Notes |
 | -- | -- | -- |
@@ -304,7 +299,7 @@ Thank you to all our backers! [[Become a backer](https://opencollective.com/disk
 
 <a href="https://opencollective.com/diskframe#backers" target="_blank"><img src="https://opencollective.com/diskframe/backers.svg?width=890"></a>
 
-### Sponsors
+### Sponsor and back `{disk.frame}`
 
 Support `{disk.frame}` development by becoming a sponsor. Your logo will show up here with a link to your website. [[Become a sponsor](https://opencollective.com/diskframe#sponsor)]
 
@@ -315,6 +310,16 @@ Support `{disk.frame}` development by becoming a sponsor. Your logo will show up
 **Do you need help with machine learning and data science in R, Python, or Julia?**
 I am available for Machine Learning/Data Science/R/Python/Julia consulting! [Email me](mailto:dzj@analytixware.com)
 
+## Non-financial ways to contribute
+
+Do you wish to give back to the open-source community in non-financial ways? Here are some ways you can contribute:
+
+* Write a blog post about your experience with `{disk.frame}`. I would love to learn more about how `{disk.frame}` has helped you
+* Tweet or post on social media (e.g. LinkedIn) about `{disk.frame}` to help promote it
+* Bring attention to typos and grammatical errors by correcting them and making a PR, or simply by [raising an issue here](https://github.com/xiaodaigh/disk.frame/issues)
+* Star the [`{disk.frame}` GitHub repo](https://github.com/xiaodaigh/disk.frame)
+* Star any repo that `{disk.frame}` depends on, e.g. [`{fst}`](https://github.com/fstpackage/fst) and [`{future}`](https://github.com/HenrikBengtsson/future)
+
 ## Download Counts & Build Status
 
 [![](https://cranlogs.r-pkg.org/badges/disk.frame)](https://cran.r-project.org/package=disk.frame)
 
@@ -3,14 +3,6 @@
 
 # disk.frame <img src="inst/figures/disk.frame.png" align="right">
 
-<details>
-
-<summary>Please take a moment to star the disk.frame Github repo if you
-like disk.frame. It keeps me going. </summary>
-<iframe src="https://ghbtns.com/github-btn.html?user=xiaodaigh&repo=disk.frame&type=star&count=true&size=large" frameborder="0" scrolling="0" width="160px" height="30px"></iframe>
-
-</details>
-
 <!-- badges: start -->
 
 <!-- ![disk.frame logo](inst/figures/disk.frame.png?raw=true "disk.frame logo") -->
@@ -63,7 +55,7 @@ install.packages("disk.frame", repo="https://cran.rstudio.com")
 Please see these vignettes and articles about `{disk.frame}`
 
   - [Quick start:
-    `{disk.frame}`](https://daizj.net/disk.frame/articles/intro-disk-frame.html)
+    `{disk.frame}`](https://diskframe.com/articles/intro-disk-frame.html)
     which replicates the `sparklyr` vignette for manipulating the
     `nycflights13` flights data.
   - [Ingesting data into
@@ -225,21 +217,18 @@ flights.df %>%
   filter(year == 2013) %>% 
   mutate(origin_dest = paste0(origin, dest)) %>% 
   head(2)
-#>   year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
-#> 1 2013     1   1      517            515         2      830            819
-#> 2 2013     1   1      533            529         4      850            830
-#>   arr_delay carrier flight tailnum origin dest air_time distance hour minute
-#> 1        11      UA   1545  N14228    EWR  IAH      227     1400    5     15
-#> 2        20      UA   1714  N24211    LGA  IAH      227     1416    5     29
-#>             time_hour origin_dest
-#> 1 2013-01-01 05:00:00      EWRIAH
-#> 2 2013-01-01 05:00:00      LGAIAH
+#>   year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
+#> 1 2013     1   1      517            515         2      830            819        11      UA   1545  N14228
+#> 2 2013     1   1      533            529         4      850            830        20      UA   1714  N24211
+#>   origin dest air_time distance hour minute           time_hour origin_dest
+#> 1    EWR  IAH      227     1400    5     15 2013-01-01 05:00:00      EWRIAH
+#> 2    LGA  IAH      227     1416    5     29 2013-01-01 05:00:00      LGAIAH
 ```
 
-### Group by
+### Group-by
 
-Starting from {disk.frame} v0.2.2, there is for support `group_by` for a
-limited set of functions. For example:
+Starting from `{disk.frame}` v0.3.0, `group_by` is supported for a
+limited set of functions. For example:
 
 ``` r
 result_from_disk.frame = iris %>% 
@@ -259,14 +248,14 @@ result_from_disk.frame = iris %>%
 ```
 
 The results should be exactly the same as if applying the same group-by
-operations on a data.frame. If not then please [report a
+operations on a data.frame. If not, please [report a
 bug](https://github.com/xiaodaigh/disk.frame/issues).
 
 #### List of supported group-by functions
 
 If a function you like is missing, please make a feature request
 [here](https://github.com/xiaodaigh/disk.frame/issues). It is a
-limitation that function that depend on the order a column can only
+limitation that functions that depend on the order of a column can only be
 obtained using estimated methods.
 
 | Function     | Exact/Estimate | Notes                                      |
@@ -290,6 +279,7 @@ obtained using estimated methods.
 
 ``` r
 library(data.table)
+#> data.table 1.12.8 using 6 threads (see ?getDTthreads).  Latest news: r-datatable.com
 #> 
 #> Attaching package: 'data.table'
 #> The following object is masked from 'package:purrr':
@@ -336,31 +326,27 @@ To find out where the disk.frame is stored on disk:
 ``` r
 # where is the disk.frame stored
 attr(flights.df, "path")
-#> [1] "C:\\Users\\RTX2080\\AppData\\Local\\Temp\\Rtmpgv1Q1Y\\filebf052f045d8.df"
+#> [1] "C:\\Users\\RTX2080\\AppData\\Local\\Temp\\Rtmpeoxh5E\\file4c5c517b5f0c.df"
 ```
 
 A number of data.frame functions are implemented for disk.frame
 
 ``` r
 # get first few rows
 head(flights.df, 1)
-#>    year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
-#> 1: 2013     1   1      517            515         2      830            819
-#>    arr_delay carrier flight tailnum origin dest air_time distance hour minute
-#> 1:        11      UA   1545  N14228    EWR  IAH      227     1400    5     15
-#>              time_hour
-#> 1: 2013-01-01 05:00:00
+#>    year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
+#> 1: 2013     1   1      517            515         2      830            819        11      UA   1545  N14228
+#>    origin dest air_time distance hour minute           time_hour
+#> 1:    EWR  IAH      227     1400    5     15 2013-01-01 05:00:00
 ```
 
 ``` r
 # get last few rows
 tail(flights.df, 1)
-#>    year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
-#> 1: 2013     9  30       NA            840        NA       NA           1020
-#>    arr_delay carrier flight tailnum origin dest air_time distance hour minute
-#> 1:        NA      MQ   3531  N839MQ    LGA  RDU       NA      431    8     40
-#>              time_hour
-#> 1: 2013-09-30 08:00:00
+#>    year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
+#> 1: 2013     9  30       NA            840        NA       NA           1020        NA      MQ   3531  N839MQ
+#>    origin dest air_time distance hour minute           time_hour
+#> 1:    LGA  RDU       NA      431    8     40 2013-09-30 08:00:00
 ```
 
 ``` r
@@ -427,7 +413,7 @@ backer](https://opencollective.com/diskframe#backer)\]
 
 <a href="https://opencollective.com/diskframe#backers" target="_blank"><img src="https://opencollective.com/diskframe/backers.svg?width=890"></a>
 
-### Sponsors
+### Sponsor and back `{disk.frame}`
 
 Support `{disk.frame}` development by becoming a sponsor. Your logo will
 show up here with a link to your website. \[[Become a
@@ -442,6 +428,24 @@ or Julia?** I am available for Machine Learning/Data
 Science/R/Python/Julia consulting\! [Email
 me](mailto:dzj@analytixware.com)
 
+## Non-financial ways to contribute
+
+Do you wish to give back to the open-source community in non-financial
+ways? Here are some ways you can contribute:
+
+  - Write a blog post about your experience with `{disk.frame}`. I would
+    love to learn more about how `{disk.frame}` has helped you
+  - Tweet or post on social media (e.g. LinkedIn) about `{disk.frame}`
+    to help promote it
+  - Bring attention to typos and grammatical errors by correcting them
+    and making a PR, or simply by [raising an issue
+    here](https://github.com/xiaodaigh/disk.frame/issues)
+  - Star the [`{disk.frame}` GitHub
+    repo](https://github.com/xiaodaigh/disk.frame)
+  - Star any repo that `{disk.frame}` depends on,
+    e.g. [`{fst}`](https://github.com/fstpackage/fst) and
+    [`{future}`](https://github.com/HenrikBengtsson/future)
+
 ## Download Counts & Build Status
 
 [![](https://cranlogs.r-pkg.org/badges/disk.frame)](https://cran.r-project.org/package=disk.frame)
 
@@ -226,7 +226,7 @@ The `by` variables that were used to shard the dataset are called the `shardkey`
 
 ## Group-by
 
-`{disk.frame}` implements the `group_by` operation some caveats. In the `{disk.frame}` framework, only a set functions are supported in `summarize`. However, the user can create more custom `group-by` functions can be defined. For more information see [group-by](10-group-by.Rmd)
+`{disk.frame}` implements the `group_by` operation with some caveats. In the `{disk.frame}` framework, only a limited set of functions is supported in `summarize`. However, the user can define additional custom `group-by` functions.
 
 ```{r, dependson='asdiskframe'}
 flights.df %>%
@@ -290,7 +290,7 @@ flights.df %>%
 
 `{disk.frame}` supports all `data.frame` operations, unlike Spark which can only perform those operations that Spark has implemented. Hence windowing functions like `min_rank` and `rank` are supported out of the box. 
 
-For the following example, we will use the `hard_group_by` which performs a group-by and also reorganises the chunks so that all records with the same `year`, `month`, and `day` end up in the same chunk. This is typically not adviced, as `hard_group_by` can be slow for large datasets.
+For the following example, we will use the `hard_group_by` which performs a group-by and also reorganises the chunks so that all records with the same `year`, `month`, and `day` end up in the same chunk. This is typically not advised, as `hard_group_by` can be slow for large datasets.
 
 ```{r, dependson='asdiskframe'}
 # Find the most and least delayed flight each day
 
@@ -60,9 +60,9 @@ future::nbrOfWorkers()
 
 ## How `{disk.frame}` works
 
-When `df %>% some_fn %>% collect` is callled. The `some_fn` is applied to each chunk of `df`. The collect will row-bind the results from `some_fn(chunk)`together if the returned value of `some_fn` is a data.frame, or it will return a `list` containing the results of `some_fn`.
+When `df %>% some_fn %>% collect` is called, `some_fn` is applied to each chunk of `df`. `collect` will then row-bind the results from `some_fn(chunk)` together if the returned value of `some_fn` is a data.frame; otherwise it will return a `list` containing the results of `some_fn`.
 
-The session that receives these results is called the **main session**. In general, we should try to minimise the amount of data passed from the worker sessions back to the main session, because passing data around can be slow.
+The session that receives these results is called the **main session**. In general, we should try to minimize the amount of data passed from the worker sessions back to the main session, because passing data around can be slow.
 
 Also, please note that there is no communication between the workers, except for workers passing data back to the main session.
 
 
@@ -244,33 +244,3 @@ So there you go! {disk.frame} can be even more "epic"! Here are the two main tak
 1. Load CSV files as many individual files if possible to take advantage of multi-core parallelism
 2. `srckeep` is your friend! Disk IO is often the bottleneck in data manipulation, and you can reduce disk IO by specifying only columns that you will use with `srckeep(c(columns1, columns2, ...))`.
 
-## Advertisements
-
-### Interested in learning {disk.frame} in a structured course?
-
-Please register your interest at:
-
-https://leanpub.com/c/taminglarger-than-ramwithdiskframe
-
-### Open Collective 
-
-If you like disk.frame and want to speed up its development or perhaps you have a feature request? Please consider sponsoring {disk.frame} on Open Collective. Your logo will show up here with a link to your website.
-
-#### Backers
-
-Thank you to all our backers! 🙏 [[Become a backer](https://opencollective.com/diskframe#backer)]
-
-<a href="https://opencollective.com/diskframe#backers" target="_blank"><img src="https://opencollective.com/diskframe/backers.svg?width=890"></a>
-
-[![Backers on Open Collective](https://opencollective.com/diskframe/backers/badge.svg)](#backers)
-
-#### Sponsors
-
- [[Become a sponsor](https://opencollective.com/diskframe#sponsor)]
-
- [![Sponsors on Open Collective](https://opencollective.com/diskframe/sponsors/badge.svg)](#sponsors) 
-
-### Contact me for consulting
-
-**Do you need help with machine learning and data science in R, Python, or Julia?**
-I am available for Machine Learning/Data Science/R/Python/Julia consulting! [Email me](mailto:dzj@analytixware.com)
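
Note on the `book/03-concepts.Rmd` hunk: the chunk-wise behaviour described under "How `{disk.frame}` works" can be illustrated with a small base R sketch. This is not the package's implementation; a plain list of data.frames stands in for the chunks of a disk.frame, and `collect_like` is a hypothetical helper written only for this illustration (`some_fn` is the placeholder name used in the text). The workers here are simulated with `lapply`, so nothing runs in parallel.

```r
# Toy stand-in for a disk.frame: iris split into three 50-row chunks.
chunks <- split(iris, rep(1:3, each = 50))

# Placeholder per-chunk function: returns a small one-row data.frame,
# so little data has to travel back to the "main session".
some_fn <- function(chunk) {
  data.frame(n = nrow(chunk), sum_sepal_length = sum(chunk$Sepal.Length))
}

# Each chunk is processed independently; chunks never talk to each other.
per_chunk <- lapply(chunks, some_fn)

# Hypothetical collect-like step: row-bind when every result is a
# data.frame, otherwise hand back the list of results unchanged.
collect_like <- function(results) {
  if (all(vapply(results, is.data.frame, logical(1)))) {
    do.call(rbind, results)
  } else {
    results
  }
}

combined <- collect_like(per_chunk)

# Second stage, done in the main session on the small combined result.
overall_mean <- sum(combined$sum_sepal_length) / sum(combined$n)
```

Returning per-chunk sums and counts instead of raw rows keeps the data sent back to the main session small, which is the point the text makes about minimizing data transfer.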