My 2 Cents on Sharing Your Research (Or How Not to Get Lost in the Data Jungle)
Sharing research data is like hosting a party. You want everything to be in the right place, accessible to everyone, but not too chaotic.
Tools that can help you create experiments at scale:
Empirica (recommended)
nodeGame
oTree: Python-based
LIONESS Lab
Breadboard: for network experiments
Turktool: surveys for Amazon MTurk
jsPsych: JavaScript-based
psiTurk: surveys for Amazon MTurk
To solve the problem of scaling a Shiny app:
From DevOps/IT:
Add memory, CPU
Set up RStudio Connect across multiple machines
From R/Shiny engineer:
Use JavaScript to reduce CPU usage
Extract computations: Shiny worker, Plumber
Use a database
For faster performance in a Shiny app, you can pipe a reactive expression or render function into bindCache() in your script:
# %>% bindCache()
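As a rough illustration of the caching idea, here is a minimal sketch of a Shiny app whose plot is cached on its input. The app itself (the `bins` slider, the `hist` output, and the use of the built-in `faithful` dataset) is made up for this example. It calls `bindCache()` directly rather than through `%>%`, so it does not depend on a pipe package being loaded; the two forms should be equivalent.

```r
library(shiny)

# Hypothetical UI: one slider that controls the number of histogram bins.
ui <- fluidPage(
  sliderInput("bins", "Number of bins", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

server <- function(input, output, session) {
  # bindCache() keys the cached plot on input$bins, so a slider value that
  # has been seen before is served from the cache instead of re-rendered.
  output$hist <- bindCache(
    renderPlot({
      hist(faithful$waiting, breaks = input$bins, main = "Waiting times")
    }),
    input$bins
  )
}

# Build the app object without launching it.
app <- shinyApp(ui, server)
```

To try it interactively, call `runApp(app)`. Caching pays off mostly when the cached expression is slow and many users request the same inputs.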
Connect from R to Wharton Research Data Services
To set up a connection from R to WRDS, follow the instructions (here)
library(RPostgres)
library(tidyverse)
# I've set up the WRDS connection beforehand.
# Please use your own username and password here.
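For readers who have not set up the connection yet, the sketch below shows roughly what it might look like. The host, port, and database name are the ones given in WRDS's own R instructions at the time of writing, so please check them against the current WRDS documentation. The function name `connect_wrds` and the environment variables `WRDS_USER` and `WRDS_PASSWORD` are hypothetical names chosen for this example.

```r
# Sketch only: open a PostgreSQL connection to WRDS.
# Credentials are read from (hypothetical) environment variables so that
# they do not end up hard-coded in a shared script.
connect_wrds <- function(user = Sys.getenv("WRDS_USER"),
                         password = Sys.getenv("WRDS_PASSWORD")) {
  DBI::dbConnect(
    RPostgres::Postgres(),
    host     = "wrds-pgdata.wharton.upenn.edu",  # assumed from WRDS docs
    port     = 9737,                             # assumed from WRDS docs
    dbname   = "wrds",
    sslmode  = "require",
    user     = user,
    password = password
  )
}

# Usage (not run here, since it needs a valid WRDS account):
# wrds <- connect_wrds()
# DBI::dbGetQuery(wrds, "SELECT table_schema, table_name
#                        FROM information_schema.tables LIMIT 10")
# DBI::dbDisconnect(wrds)
```

Wrapping the call in a function keeps credentials out of the analysis code and makes it easy to reconnect after a dropped session.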
Comprehensive patent data can be found here
United States
NBER patent data or link
Search link for individual patent: link
Patent API
USPTO - United States Patent and Trademark Office
Patent ranking by organization
Bulk Data Storage System: repository for raw public bulk data
For Researchers
Patent Assignment Dataset: detailed information on patent assignments since 1970, with schema, description, and code
Pre-Grant Publications Data Download Tables, with example code. Note that organizations here are recorded differently than in Compustat and CRSP, which makes them hard to match.
More information can be found in the CRSP/COMPUSTAT MERGED DATABASE GUIDE
Identifiers that can change:
Ticker: an abbreviation used to uniquely identify publicly traded shares of a stock; it can be reassigned to another company.
CUSIP: A company can have multiple CUSIPs due to structural changes.
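To see why a reassigned ticker matters when merging data, here is a small base R sketch. All values are made up, and `firm_id` is a hypothetical stand-in for a permanent identifier. It is not the actual CRSP/Compustat linking procedure, only an illustration of matching on a ticker together with the date range in which that ticker was valid.

```r
# Toy ticker history: every value below is invented for illustration.
# Ticker "ABC" belongs to firm 1 until 2009 and is then reassigned to firm 2.
ticker_history <- data.frame(
  ticker  = c("ABC", "ABC", "XYZ"),
  firm_id = c(1L, 2L, 3L),  # hypothetical permanent identifier
  start   = as.Date(c("2000-01-01", "2010-01-01", "2005-01-01")),
  end     = as.Date(c("2009-12-31", "2020-12-31", "2020-12-31")),
  stringsAsFactors = FALSE
)

# Observations that only carry a ticker and a date.
observations <- data.frame(
  ticker = c("ABC", "ABC", "XYZ"),
  date   = as.Date(c("2005-06-30", "2015-06-30", "2015-06-30")),
  stringsAsFactors = FALSE
)

# Join on ticker alone: this produces every ticker/firm combination.
candidates <- merge(observations, ticker_history, by = "ticker")

# Keep only the rows whose date falls inside the ticker's validity window.
in_window <- candidates$date >= candidates$start &
  candidates$date <= candidates$end
matched <- candidates[in_window, c("ticker", "date", "firm_id")]
matched <- matched[order(matched$ticker, matched$date), ]
rownames(matched) <- NULL
```

Joining on the ticker alone would attach both firms to each "ABC" observation. After the date filter, the 2005 and 2015 "ABC" observations map to two different firms.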