Fix for "cannot allocate vector of size"

For the package author's own introduction, please access this link

Instead of loading everything into RAM at once, you divide your data into chunks. To quote the author of the disk.frame package: "we go from 'R can only deal with data that fits in RAM' to 'R can deal with any data that fits on disk'." While a data.frame is processed entirely in RAM, a disk.frame stores and processes data on the hard drive.

disk.frame also allows parallel processing.

library("disk.frame")
# setup_disk.frame() # sets up background workers equal to the number of CPU cores
setup_disk.frame(workers = 2) # or your preferred number of workers
options(future.globals.maxSize = Inf) # allow large datasets to be transferred between sessions

# to convert a data.frame to a disk.frame
data.df <- as.disk.frame(original_data_frame)
# attr(data.df, "path") # path to where the disk.frame is stored
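
A quick sanity check after converting (a minimal sketch, assuming the conversion above succeeded; these are standard disk.frame accessors):

head(data.df) # peek at the first rows without loading everything into RAM
nrow(data.df) # total number of rows across all chunks
nchunks(data.df) # how many chunks the data was split into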

# to convert one large CSV
# csv_to_disk.frame() takes care of splitting the large CSV into smaller chunks
diskf <- disk.frame::csv_to_disk.frame(path_to_csv_file) # you can also specify outdir = and overwrite = TRUE
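
For example, to control where the chunk files are written (the output directory name below is hypothetical):

diskf <- disk.frame::csv_to_disk.frame(
  path_to_csv_file,
  outdir = "large_data.df", # hypothetical directory for the chunk files
  overwrite = TRUE # replace an existing disk.frame at that location
)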

# to convert multiple CSVs
multiple_CSV <- c(path_to_csv_file1, path_to_csv_file2)
diskf <- disk.frame::csv_to_disk.frame(multiple_CSV)

# for faster performance, use srckeep() to read only the columns you need
result <- data.df %>%
  srckeep(c("column1", "column2")) %>% # read only these columns from disk
  dplyr::filter() %>% # add your filter condition here
  collect() # collect() brings the result back into RAM as a data.frame
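
The chunk files persist on disk after your session ends. A minimal cleanup sketch using disk.frame's delete():

# remove the on-disk chunk files once they are no longer needed
delete(data.df)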