Apache Arrow
More information can be found at Ursa Labs.
This example is taken from the arrow package vignettes.
arrow
The arrow package is most useful when working with large datasets, including data too big to load into memory at once.
Prep
library("arrow", warn.conflicts = FALSE)
library("dplyr", warn.conflicts = FALSE)
Check whether your arrow build includes S3 support.
arrow::arrow_with_s3()
If this returns TRUE, copy the data to a local directory. The data originally comes from https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
arrow::copy_files("s3://ursa-labs-taxi-data", "nyc-taxi")
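The check and the copy can be combined so that the download is not repeated on every run. The following is a minimal sketch, not part of the original vignette: the helper name sync_if_needed is hypothetical, and it assumes that an existing local directory means the data has already been copied.

```r
library("arrow", warn.conflicts = FALSE)

# Hypothetical helper: copy a remote dataset only when it is not already
# present locally. Returns TRUE if a copy was made, FALSE if it was skipped.
sync_if_needed <- function(source, dest) {
  # Assumption: an existing directory means the data was synced earlier.
  if (dir.exists(dest)) {
    return(FALSE)
  }
  # Copying from an S3 URI needs an arrow build with S3 support.
  if (!arrow_with_s3()) {
    stop("This arrow build has no S3 support; reinstall arrow with S3 enabled.")
  }
  copy_files(source, dest)
  TRUE
}
```

Calling sync_if_needed("s3://ursa-labs-taxi-data", "nyc-taxi") would then behave like the copy_files() call above on the first run and do nothing afterwards. The full copy is large, so expect the first run to take a while.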
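Since the data is stored as Parquet files in directories named by year and month, we open it with open_dataset() and name those two partition levels: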
ds <- open_dataset("nyc-taxi", partitioning = c("year", "month"))
You can then work with the dataset as usual, for example with dplyr verbs. Printing ds shows the files it contains and its schema.
ds
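To illustrate what "as usual" can look like without downloading the taxi data, here is a small sketch on a toy dataset. Everything in it is an assumption made for illustration: the toy data frame, the total_amount column, and the toy-taxi directory are hypothetical stand-ins, laid out in year/month directories like the real data. It assumes a recent arrow version that supports group_by() and summarise() on datasets.

```r
library("arrow", warn.conflicts = FALSE)
library("dplyr", warn.conflicts = FALSE)

# Hypothetical toy data standing in for the taxi trips.
toy <- data.frame(
  year = c(2009L, 2009L, 2010L, 2010L),
  month = c(1L, 2L, 1L, 1L),
  total_amount = c(10, 20, 30, 50)
)

# Write it as Parquet files under <year>/<month>/ directories.
# hive_style = FALSE gives bare directory names such as 2009/1/ rather
# than year=2009/month=1/, so the partition names must be supplied on read.
toy_dir <- file.path(tempdir(), "toy-taxi")
write_dataset(toy, toy_dir, format = "parquet",
              partitioning = c("year", "month"), hive_style = FALSE)

# Open it the same way as the taxi data: the directory levels become
# the columns year and month.
toy_ds <- open_dataset(toy_dir, partitioning = c("year", "month"))

# dplyr verbs build a query lazily; collect() runs it and returns a tibble.
result <- toy_ds %>%
  filter(year == 2010) %>%
  group_by(month) %>%
  summarise(mean_amount = mean(total_amount)) %>%
  collect()
```

Because the filter is on a partition column, arrow only needs to read the files under the matching directories, which is what makes this pattern practical on datasets that do not fit in memory.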