My 2 Cents on Sharing Your Research (Or How Not to Get Lost in the Data Jungle)
Sharing research data is like hosting a party. You want everything to be in the right place, accessible to everyone, but not too chaotic.
Tools that can help you create experiments at scale:
Empirica (recommended)
nodeGame
oTree: Python-based
LIONESS Lab
Breadboard: for network experiments
Turktool: surveys for Amazon MTurk
jsPsych: JavaScript-based
psiTurk: surveys for Amazon MTurk
To solve the problem of scaling:
From the DevOps/IT side:
Add memory and CPU
Set up RStudio Connect across multiple machines
From the R/Shiny engineering side:
Use JavaScript to reduce CPU usage
Extract heavy computations into separate processes (e.g., shiny.worker, Plumber)
Use a database
To make a Shiny app faster, you can cache a reactive expression or render function by piping it into bindCache() in your script.
# %>% bindCache()
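As a minimal sketch of how that pipe might be used, the toy app below caches a plot keyed on a slider value. It assumes shiny 1.6.0 or later (where bindCache() was introduced) and loads magrittr for the %>% pipe. The input name `bins`, the output name `hist`, and the Sys.sleep() call standing in for slow work are all made up for illustration.

```r
library(shiny)
library(magrittr)  # provides the %>% pipe

ui <- fluidPage(
  sliderInput("bins", "Number of bins", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

server <- function(input, output, session) {
  output$hist <- renderPlot({
    Sys.sleep(2)  # stand-in for an expensive computation (illustrative only)
    hist(faithful$waiting, breaks = input$bins)
  }) %>%
    bindCache(input$bins)  # cache key: the current slider value
}

# Launch the app only when run interactively
if (interactive()) shinyApp(ui, server)
```

With this setup, returning the slider to a value it has already visited should reuse the cached plot instead of recomputing it. Anything that changes the output needs to be part of the cache key passed to bindCache().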
Connect from R to Wharton Research Data Services
# Set up a connection from R to WRDS; see
# https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-r/r-from-your-computer/
library(RPostgres)
library(dplyr)
# I set up the WRDS connection beforehand. Use your own username and password here.
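For readers who have not set up the connection yet, here is a rough sketch of what it might look like. The host, port, and database name follow the WRDS guide linked above as I understand it, so check them against the current version of that page. The helper `connect_wrds()` and the environment variable `WRDS_USER` are hypothetical names of mine. The sketch also assumes your password is stored in a ~/.pgpass file, and that your account has access to the `crsp.dsf` table used in the test query.

```r
library(DBI)
library(RPostgres)

# Hypothetical helper. WRDS_USER is an assumed environment variable name,
# not a WRDS convention. With no password argument, the driver falls back
# to libpq defaults such as ~/.pgpass.
connect_wrds <- function(user = Sys.getenv("WRDS_USER")) {
  dbConnect(
    Postgres(),
    host    = "wrds-pgdata.wharton.upenn.edu",
    port    = 9737,
    dbname  = "wrds",
    sslmode = "require",
    user    = user
  )
}

if (interactive()) {
  wrds <- connect_wrds()
  # Pull a few rows to confirm the connection works
  # (crsp.dsf is an assumption; it requires a CRSP subscription)
  sample_rows <- dbGetQuery(wrds, "SELECT * FROM crsp.dsf LIMIT 10")
  print(sample_rows)
  dbDisconnect(wrds)
}
```

Keeping the username in an environment variable rather than in the script is one way to avoid committing credentials when you share your code.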
Comprehensive patent data can be found in the following sources:
United States
NBER patent data or link
Search link for individual patent: link
Patent API
USPTO - United States Patent and Trademark Office
Patent ranking by orgs
Bulk Data Storage System: repository for raw public bulk data
For Researchers
Patent Assignment Dataset: details patent assignments since 1970, with schema, descriptions, and code
Pre-Grant Publications Data Download Tables, with example code. Note that organizations here are identified differently from Compustat and CRSP, so they are hard to match.
More information can be found in the CRSP/Compustat Merged Database Guide.
Identifiers that can change:
Ticker: an abbreviation used to uniquely identify publicly traded shares of a stock; it can be reassigned to another company.
CUSIP: a company can have multiple CUSIPs due to structural changes.
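To see why changing identifiers matter when matching firms, here is a small base-R sketch of a date-aware ticker lookup. Everything in it is hypothetical: the `ticker_history` table, its column names, the tickers, and the dates are invented for illustration and are not taken from CRSP or Compustat.

```r
# Hypothetical ticker history: the same ticker belongs to different
# companies in different periods. All values are made up.
ticker_history <- data.frame(
  company_id = c(1L, 2L, 2L),
  ticker     = c("ABC", "ABC", "XYZ"),
  valid_from = as.Date(c("2000-01-01", "2010-01-01", "2005-01-01")),
  valid_to   = as.Date(c("2008-12-31", "2020-12-31", "2009-12-31")),
  stringsAsFactors = FALSE
)

# Return the company id(s) that held `ticker` on `on_date`
lookup_company <- function(history, ticker, on_date) {
  on_date <- as.Date(on_date)
  hit <- history$ticker == ticker &
    history$valid_from <= on_date &
    history$valid_to >= on_date
  history$company_id[hit]
}

lookup_company(ticker_history, "ABC", "2005-06-30")  # returns 1
lookup_company(ticker_history, "ABC", "2015-06-30")  # returns 2
```

The same ticker resolves to two different companies depending on the date, so a join on ticker alone would silently mix them. Matching on ticker plus a validity window, as sketched here, is one way to avoid that.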