Apache Arrow

More information can be found at Ursa Labs.

This example is taken from the arrow package vignettes.

arrow

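The arrow package is most useful when working with big data: it lets you query a dataset split across many files, including data too large to fit in memory.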

Prep

library("arrow", warn.conflicts = FALSE)
library("dplyr", warn.conflicts = FALSE)

Check whether your arrow build includes S3 support:

arrow::arrow_with_s3()

If it returns TRUE, you can sync the NYC taxi trip-record data (originally published at https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) from S3 to a local nyc-taxi directory:

arrow::copy_files("s3://ursa-labs-taxi-data", "nyc-taxi")

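Because the files are in Parquet format (the default format for open_dataset()) and are stored in year/month directories (e.g. 2009/01/data.parquet), we can open them as a single dataset, naming the two path segments year and month: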

ds <- open_dataset("nyc-taxi", partitioning = c("year", "month"))
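To see the directory layout that partitioning = c("year", "month") expects without downloading the taxi data, here is a minimal sketch on a made-up data frame (the trips columns, their values, and the toy-taxi folder are hypothetical). It assumes a recent arrow release in which write_dataset() has the hive_style argument; setting it to FALSE writes plain year/month folders such as 2019/1/, similar to the taxi bucket, and open_dataset() then maps those path segments back to columns by position.

```r
library("arrow", warn.conflicts = FALSE)

# Hypothetical toy data standing in for the taxi trips
trips <- data.frame(
  year  = c(2019L, 2019L, 2020L),
  month = c(1L, 2L, 1L),
  fare  = c(10.5, 22.0, 7.25)
)

# Write Parquet files into plain "<year>/<month>/" folders (not "year=2019/")
root <- file.path(tempdir(), "toy-taxi")
write_dataset(trips, root, format = "parquet",
              partitioning = c("year", "month"), hive_style = FALSE)

# One file per year/month combination present in the data
print(list.files(root, recursive = TRUE))

# The first path segment becomes year, the second becomes month
toy_ds <- open_dataset(root, partitioning = c("year", "month"))
toy_ds
```

The year and month values live only in the folder names, not inside the Parquet files, which is why open_dataset() needs the partitioning argument to recover them.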

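You can then work with the dataset much like a data frame, for example with dplyr verbs. Printing it shows the number of files and the schema: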

ds
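A minimal sketch of what using the dataset "as usual" can look like with dplyr, run on a small hypothetical dataset (the columns, values, and the toy-trips folder are made up) so that it works without the taxi download. filter(), select() and mutate() only build a lazy query on the Arrow dataset; collect() runs it and returns a tibble, and the final grouping and summarising happen in ordinary dplyr.

```r
library("arrow", warn.conflicts = FALSE)
library("dplyr", warn.conflicts = FALSE)

# Hypothetical trips, written as a Hive-style ("year=2019/") Parquet dataset
trips <- data.frame(
  year            = c(2019L, 2019L, 2019L, 2020L),
  passenger_count = c(1L, 2L, 1L, 1L),
  fare_amount     = c(10, 20, 30, 40),
  tip_amount      = c(1, 5, 3, 8)
)
root <- file.path(tempdir(), "toy-trips")
write_dataset(trips, root, partitioning = "year")
toy_ds <- open_dataset(root)  # Hive-style partitions are detected automatically

# Build the query lazily; no data is read yet
query <- toy_ds %>%
  filter(year == 2019) %>%
  select(passenger_count, fare_amount, tip_amount) %>%
  mutate(tip_pct = 100 * tip_amount / fare_amount)

# collect() executes the query in Arrow; the rest runs in dplyr on the result
result <- query %>%
  collect() %>%
  group_by(passenger_count) %>%
  summarise(n = n(), mean_tip_pct = mean(tip_pct)) %>%
  arrange(passenger_count)

print(result)
```

Collecting before summarise() keeps the sketch simple and portable; recent arrow releases can also evaluate many aggregations inside Arrow before collect(), which matters when the filtered data is still large.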
Mike Nguyen, PhD
Visiting Scholar

My research interests include marketing and social science.
