fix for "cannot allocate vector of size"
For more detail, see the package author's introduction to disk.frame.
Instead of loading everything at once into your RAM, you divide your data into chunks.
To quote the author of the disk.frame package: "we go from 'R can only deal with data that fits in RAM' to 'R can deal with any data that fits on disk'."
While data.frame holds and processes data in RAM, disk.frame stores and processes data on the hard drive.
disk.frame also allows parallel processing.
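To make the chunking idea concrete, here is a minimal sketch in base R that does not use disk.frame at all. It reads a CSV a few rows at a time and keeps only a running total in memory. The file, the column names (`id`, `value`) and the chunk size are made up for illustration; disk.frame's own chunking is more sophisticated than this.

```r
# Hypothetical example data: write a small CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 2), csv_path, row.names = FALSE)

chunk_size <- 3  # rows per chunk (illustrative; real chunks would be much larger)

# Open the file once and read the header line ourselves
con <- file(csv_path, open = "r")
col_names <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

total <- 0
rows_seen <- 0
repeat {
  # Each call continues from where the previous one stopped
  chunk <- tryCatch(
    read.csv(con, header = FALSE, nrows = chunk_size, col.names = col_names),
    error = function(e) NULL  # read.csv raises an error once no lines remain
  )
  if (is.null(chunk) || nrow(chunk) == 0) break

  # Only this chunk is in memory; fold it into the running summary
  total <- total + sum(chunk$value)
  rows_seen <- rows_seen + nrow(chunk)

  if (nrow(chunk) < chunk_size) break  # a short chunk means end of file
}
close(con)

print(c(rows = rows_seen, total = total))
```

Only one chunk is held in RAM at any time, so peak memory use depends on `chunk_size` and not on the size of the file.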
    library("disk.frame")

    # setup_disk.frame() with no arguments sets up background workers
    # equal to the number of CPU cores
    setup_disk.frame(workers = 2) # or your preferred number of workers

    # allow large datasets to be transferred between sessions
    options(future.globals.maxSize = Inf)

    # to convert a data.frame to a disk.frame
    data.df <- as.disk.frame(original_data_frame)
    # attr(data.df, "path") # path to where the disk.frame is stored

    # to convert one large CSV
    # (takes care of splitting the large CSV into smaller chunks)
    diskf <- disk.frame::csv_to_disk.frame(path_to_csv_file)
    # you can also specify outdir = , overwrite = TRUE

    # to convert multiple CSVs
    multiple_CSV <- c(path_to_csv_file1, path_to_csv_file2)
    diskf <- disk.frame::csv_to_disk.frame(multiple_CSV)

    # for faster performance, specify which columns to load from disk
    result <- diskf %>%
      srckeep(c("column1", "column2")) %>%
      dplyr::filter() # add your filter conditions here
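The snippets above use placeholder names, so here is a small end-to-end sketch on the built-in `mtcars` dataset. It assumes disk.frame and dplyr are installed. The output directory, the number of chunks and the filter condition are arbitrary choices for illustration, not recommendations from the package author.

```r
library(disk.frame)
library(dplyr)

# Two background workers; adjust to your machine
setup_disk.frame(workers = 2)

# Store mtcars on disk, split into 4 chunks (illustrative values)
cars_df <- as.disk.frame(
  mtcars,
  outdir = file.path(tempdir(), "mtcars.df"),
  nchunks = 4,
  overwrite = TRUE
)

# Load only the two columns we need, filter chunk by chunk,
# then collect() the result back into an ordinary data.frame in RAM
four_cyl <- cars_df %>%
  srckeep(c("mpg", "cyl")) %>%
  filter(cyl == 4) %>%
  collect()

print(nchunks(cars_df))
print(head(four_cyl))
```

Nothing is pulled into RAM until `collect()` is called, so it is usually worth filtering and selecting columns first so that the collected result is small.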