Fix for the "cannot allocate vector of size" error
For a fuller introduction from the package author, please access this link.
Instead of loading everything into RAM at once, you divide your data into chunks on disk. To quote the author of the disk.frame package: "we go from 'R can only deal with data that fits in RAM' to 'R can deal with any data that fits on disk'." While data.frame processes data in RAM, disk.frame stores and processes data on the hard drive, and it also allows parallel processing across chunks.
library("disk.frame")
# setup_disk.frame() # sets up background workers equal to the number of CPU c res setup_disk.frame(workers =\ 2) \# or you number of workers options(future.globals.maxSize = \Inf) # large dataset can be transferred between sessions
```r
# to convert a data.frame to a disk.frame
data.df <- as.disk.frame(original_data_frame)
# attr(data.df, "path") # path to where the disk.frame is stored
```
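As a quick, hedged sketch (the toy data below is illustrative, not from the original example): conversion writes chunk files to disk, and `collect()` reads them back into a regular data.frame.

```r
# minimal round trip with a hypothetical toy data.frame
toy <- data.frame(id = 1:1e5, value = runif(1e5))
toy.df <- as.disk.frame(toy)
head(collect(toy.df)) # collect() loads all chunks back into RAM
delete(toy.df)        # remove the on-disk chunk files when finished
```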
```r
# to convert one large CSV
# takes care of splitting the large CSV into smaller chunks
diskf <- disk.frame::csv_to_disk.frame(path_to_csv_file)
# you can also specify outdir = and overwrite = TRUE
```
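Spelled out with those arguments (file paths here are placeholders, not real ones):

```r
# hypothetical paths; outdir controls where the chunk files are written
diskf <- disk.frame::csv_to_disk.frame(
  "data/big_file.csv",
  outdir    = "data/big_file.df",
  overwrite = TRUE
)
```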
```r
# to convert multiple CSVs
multiple_CSV <- c(path_to_csv_file1, path_to_csv_file2)
diskf <- disk.frame::csv_to_disk.frame(multiple_CSV)
```
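If the CSVs sit in one folder, a common pattern is to gather the paths with `list.files()` (folder path and pattern below are assumptions for illustration):

```r
# gather every CSV in a folder, then convert them in one call
multiple_CSV <- list.files("data/monthly", pattern = "\\.csv$", full.names = TRUE)
diskf <- disk.frame::csv_to_disk.frame(multiple_CSV)
```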
```r
# for faster performance, use srckeep() to read only the columns you need
result <- diskf %>%
  srckeep(c("column1", "column2")) %>%
  dplyr::filter(column1 > 0) %>% # illustrative condition
  collect() # collect() pulls the result back into RAM as a data.frame
```
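The same chunked pipeline works with other dplyr verbs too; recent disk.frame versions support exact `group_by()` results for common summary functions such as `sum()`. A hedged sketch with made-up column names:

```r
# grouped aggregation over a disk.frame; collect() materializes the result
summary_tbl <- diskf %>%
  srckeep(c("group_col", "value_col")) %>% # hypothetical column names
  dplyr::group_by(group_col) %>%
  dplyr::summarize(total = sum(value_col)) %>%
  collect()
```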