Fix for "cannot allocate vector of size"

For more of the package author's introduction, please access this link

Instead of loading everything into your RAM at once, you divide your data into chunks. To quote the author of the disk.frame package: we go from "R can only deal with data that fits in RAM" to "R can deal with any data that fits on disk." While a data.frame is processed entirely in RAM, a disk.frame stores and processes data on the hard drive. disk.frame also allows parallel processing.

library("disk.frame") 
# setup_disk.frame() # sets up background workers equal to the number of CPU c res setup_disk.frame(workers =\ 2) \# or you number of workers options(future.globals.maxSize = \Inf) # large dataset can be transferred between sessions
# attr(data.df, "path") # path to where the disk.frame is 


# to convert a data.frame to a disk.frame
data.df <- as.disk.frame(original_data_frame)
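A minimal sketch of what this looks like in practice, using the built-in iris data frame (the object name data.df is just for illustration):

data.df <- as.disk.frame(iris)
nrow(data.df)    # same number of rows as the original data.frame
nchunks(data.df) # how many chunks the data was split into on disk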

# to convert one large CSV
# takes care of splitting the large CSV into smaller ones
diskf <- disk.frame::csv_to_disk.frame(path_to_csv_file) # you can also specify outdir = and overwrite = TRUE
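For instance, a sketch with those options spelled out (the file paths here are hypothetical):

diskf <- disk.frame::csv_to_disk.frame(
  "data/large_file.csv",         # hypothetical path to the large CSV
  outdir = "data/large_file.df", # folder where the chunks are written
  overwrite = TRUE               # replace any existing disk.frame at outdir
)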

# to convert multiple CSVs
multiple_CSV <- c(path_to_csv_file1, path_to_csv_file2)
diskf <- disk.frame::csv_to_disk.frame(multiple_CSV)
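In practice you rarely type the paths out one by one; a common pattern is to collect every CSV in a folder first (the folder name below is hypothetical):

csv_files <- list.files("data/monthly", pattern = "\\.csv$", full.names = TRUE)
diskf <- disk.frame::csv_to_disk.frame(csv_files)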

# for faster performance, use srckeep() to read only the columns you manipulate
result <- diskf %>%
  srckeep(c("column1", "column2")) %>%
  dplyr::filter() # add your filter condition here
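Note that dplyr verbs on a disk.frame are lazy; you call collect() to actually run the chunked computation and bring the result back into RAM as a regular data.frame. A sketch of the full pipeline (the filter condition is hypothetical):

result <- diskf %>%
  srckeep(c("column1", "column2")) %>%
  dplyr::filter(column1 > 0) %>% # hypothetical condition
  collect() # runs the computation chunk by chunk and returns a data.frame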