Fix for "cannot allocate vector of size"
For the package author's own introduction, please follow this link.
Instead of loading everything into RAM at once, you divide your data into chunks. To quote the author of the disk.frame package: "we go from 'R can only deal with data that fits in RAM' to 'R can deal with any data that fits on disk'." While a data.frame is held and processed entirely in RAM, a disk.frame stores its data on the hard drive and processes it from there. disk.frame also allows parallel processing.
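To make the chunking idea concrete, here is a minimal base R sketch. It is not how disk.frame works internally; it only illustrates processing a file piece by piece so that just a few rows sit in memory at a time. The toy data, csv_path, and chunk_size are made up for the example.

```r
# Illustrative only: a toy CSV written to a temporary file (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 2), csv_path, row.names = FALSE)

chunk_size <- 3   # rows held in memory at any one time (arbitrary choice)
total <- 0        # running sum of the `value` column
n_rows <- 0       # running count of rows seen

con <- file(csv_path, open = "r")
header <- readLines(con, n = 1)            # keep the header so each chunk can be parsed
repeat {
  lines <- readLines(con, n = chunk_size)  # read the next few lines only
  if (length(lines) == 0) break            # end of file reached
  chunk <- read.csv(text = c(header, lines))
  total <- total + sum(chunk$value)
  n_rows <- n_rows + nrow(chunk)
}
close(con)

total / n_rows   # mean of `value`, computed without loading the whole file
```

disk.frame automates this pattern: it stores the chunks as files on disk and applies your operations to each chunk for you.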
library("disk.frame")
# setup_disk.frame() # sets up background workers equal to the number of CPU cores
setup_disk.frame(workers = 2) # or your preferred number of workers
options(future.globals.maxSize = Inf) # allows large datasets to be transferred between sessions
# to convert a data.frame to a disk.frame
data.df <- as.disk.frame(original_data_frame)
# attr(data.df, "path") # path to where the disk.frame is stored
# to convert one large CSV
# (takes care of splitting the large CSV into smaller chunks)
diskf <- disk.frame::csv_to_disk.frame(path_to_csv_file) # you can also specify outdir = and overwrite = TRUE
# to convert multiple CSVs
multiple_CSV <- c(path_to_csv_file1, path_to_csv_file2)
diskf <- disk.frame::csv_to_disk.frame(multiple_CSV)
# for faster performance, specify which columns to read from disk
result <- diskf %>%
  srckeep(c("column1", "column2")) %>%
  dplyr::filter() # put your filter conditions inside filter()
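A pipeline like the one above is typically finished with collect() to bring the (now much smaller) result back into RAM as an ordinary data frame. The following is a small end-to-end sketch under assumptions: the toy data, the column names, the output directory, and the filter condition are all invented for illustration, and the code only runs if disk.frame and dplyr are installed.

```r
# Hedged sketch: toy data and names are hypothetical, not from the original post.
if (requireNamespace("disk.frame", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  library(disk.frame)
  library(dplyr)

  toy <- data.frame(
    column1 = 1:100,
    column2 = rep(c("a", "b"), 50),
    column3 = rnorm(100)
  )

  # Store the data on disk as 4 chunks inside a temporary folder.
  toy.df <- as.disk.frame(
    toy,
    outdir = file.path(tempdir(), "toy.df"),
    nchunks = 4,
    overwrite = TRUE
  )

  result <- toy.df %>%
    srckeep(c("column1", "column2")) %>%  # read only these columns from disk
    filter(column1 > 90) %>%              # applied to each chunk
    collect()                             # gather the filtered rows into RAM

  nrow(result)
}
```

Because srckeep() limits what is read from disk and filter() runs chunk by chunk, only the rows and columns you actually need end up in memory.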