# Working Papers

**Identifying Prediction Mistakes in Observational Data**. 2022.

Decision makers, such as doctors, judges, and managers, make consequential choices based on predictions of unknown outcomes. Do these decision makers make systematic prediction mistakes based on the available information? If so, in what ways are their predictions systematically biased? Uncovering systematic prediction mistakes is difficult because the preferences and information sets of decision makers are unknown to researchers. In this paper, I characterize behavioral and econometric assumptions under which systematic prediction mistakes can be identified in empirical settings such as hiring, pretrial release, and medical testing. I derive a statistical test for whether the decision maker makes systematic prediction mistakes under these assumptions and show how supervised machine learning-based models can be used to apply this test. I provide methods for conducting inference on the ways in which the decision maker's predictions are systematically biased. As an illustration, I apply this econometric framework to analyze the pretrial release decisions of judges in New York City, and I estimate that at least 20% of judges make systematic prediction mistakes about failure-to-appear risk given defendant characteristics.

**Included and Excluded Instruments in Structural Estimation** (with Isaiah Andrews, Nano Barahona, Matthew Gentzkow and Jesse Shapiro). 2022.

We consider the choice of instrumental variables when a researcher’s structural model may be misspecified. We contrast included instruments, which have a direct causal effect on the outcome holding constant the endogenous variable of interest, with excluded instruments, which do not. We show conditions under which the researcher’s estimand maintains an interpretation in terms of causal effects of the endogenous variable under excluded instruments but not under included instruments. We apply our framework to estimation of a linear instrumental variables model, and of differentiated goods demand models under price endogeneity. We show that the distinction between included and excluded instruments is quantitatively important in simulations based on an application. We extend our results to a dynamic setting by studying estimation of production function parameters under input endogeneity.
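
To make the included/excluded distinction concrete in the linear IV case, here is a small simulation sketch. It is not from the paper: the data-generating process, the parameter values, and the function names `just_identified_iv` and `simulate` are hypothetical. When the instrument `z` also has a direct effect `gamma` on the outcome, the just-identified IV estimand becomes `beta + gamma * Var(z) / Cov(z, x)` rather than the causal effect `beta` of the endogenous variable.

```python
import numpy as np

def just_identified_iv(y, x, z):
    """Just-identified IV estimate: Cov(z, y) / Cov(z, x)."""
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

def simulate(gamma, beta=1.0, n=200_000, seed=0):
    """Simulate y = beta * x + gamma * z + u with x endogenous (hypothetical DGP).

    gamma == 0 makes z an excluded instrument; gamma != 0 gives z a direct
    effect on y, i.e. an included instrument.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    u = rng.normal(size=n)
    x = z + u + rng.normal(size=n)  # x is correlated with the structural error u
    y = beta * x + gamma * z + u
    return just_identified_iv(y, x, z)

print(simulate(gamma=0.0))  # close to beta = 1.0
print(simulate(gamma=0.5))  # close to beta + gamma = 1.5, not beta
```

In this simulated design `Var(z) = Cov(z, x) = 1`, so the included instrument shifts the estimand by exactly `gamma`.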

**Design-Based Uncertainty for Quasi-Experiments** (with Jonathan Roth). 2022.

Conventional standard errors reflect the fact that the observed data are sampled from an infinite super-population, but this approach to uncertainty may be unnatural in settings where all units in the population are observed (e.g., all 50 U.S. states). In such settings, it may be more natural to view the uncertainty as design-based, i.e., arising from the stochastic assignment of treatment. This paper develops a design-based framework for uncertainty that is suitable for analyzing "quasi-experimental" settings commonly studied in economics. A key feature of our framework is that each unit has an idiosyncratic probability of receiving treatment, but these idiosyncratic probabilities are unknown to the researcher. We derive conditions under which difference-in-differences (DiD) and related estimators are unbiased for an interpretable causal estimand. When the DiD estimator is unbiased, conventional confidence intervals are valid but potentially conservative in large populations. An interesting feature of our setting is that conventional standard errors tend to be more conservative when treatment probabilities differ across units, which helps to mitigate undercoverage from bias. As a result, conventional confidence intervals for DiD can potentially still have correct coverage even if the design-based analog to parallel trends does not hold exactly. Our results also have implications for the appropriate level to cluster standard errors and for the analysis of instrumental variables.
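
For reference, the canonical two-group, two-period DiD estimator that these results concern can be sketched as below. This is a generic textbook version, not the paper's code; the function name `did_2x2` and the example data are hypothetical.

```python
import numpy as np

def did_2x2(y_pre, y_post, treated):
    """Two-group, two-period difference-in-differences estimate.

    Returns the mean pre-to-post change among treated units minus the
    mean pre-to-post change among untreated units.
    """
    y_pre = np.asarray(y_pre, dtype=float)
    y_post = np.asarray(y_post, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    change = y_post - y_pre
    return change[treated].mean() - change[~treated].mean()

# Hypothetical example: four units (e.g. states), the last two treated.
est = did_2x2(y_pre=[1.0, 2.0, 3.0, 4.0],
              y_post=[2.0, 3.0, 6.0, 7.0],
              treated=[False, False, True, True])
print(est)  # 2.0
```

In the design-based view, the variability of `est` comes from which units end up in `treated`, not from resampling units from a super-population.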

**Counterfactual Risk Assessments under Unmeasured Confounding** (with Amanda Coston and Edward Kennedy). 2022.

Statistical risk assessments inform consequential decisions such as pretrial release in criminal justice and loan approvals in consumer finance. Such risk assessments make counterfactual predictions, predicting the likelihood of an outcome under a proposed decision (e.g., what would happen if we approved this loan?). A central challenge, however, is that there may have been unobserved confounders that jointly affected past decisions and outcomes in the historical data. This paper therefore proposes a tractable mean outcome sensitivity model that bounds the extent to which unmeasured confounders could affect outcomes on average. Under the mean outcome sensitivity model, the conditional likelihood of the outcome under the proposed decision, popular predictive performance metrics (accuracy, calibration, TPR, FPR, etc.), and commonly-used predictive disparities are partially identified, and we derive their sharp identified sets. We then solve three tasks that are essential to deploying statistical risk assessments in high-stakes settings. First, we propose a learning procedure based on doubly-robust pseudo-outcomes that estimates bounds on the conditional likelihood of the outcome under the proposed decision, and derive a bound on its integrated mean square error. Second, we show how our estimated bounds on the conditional likelihood of the outcome under the proposed decision can be translated into a robust decision-making policy, and derive bounds on its worst-case regret relative to the max-min optimal decision rule. Third, we develop estimators, based on efficient influence functions and cross-fitting, of the bounds on the predictive performance metrics of an existing risk assessment; these estimators require only black-box access to the risk assessment.
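
As background for the "doubly-robust pseudo-outcomes" mentioned above, the standard AIPW pseudo-outcome for the conditional mean outcome under a decision, valid when there is no unmeasured confounding, can be sketched as follows. The bound-specific pseudo-outcomes under the mean outcome sensitivity model are not reproduced here; the function name, the coding `d = 1` for the proposed decision, and the example values are assumptions for illustration.

```python
import numpy as np

def dr_pseudo_outcome(y, d, pi_hat, mu_hat):
    """AIPW pseudo-outcome for E[Y(1) | X] under no unmeasured confounding.

    y:      observed outcomes (only used where d == 1)
    d:      1 if the proposed decision was taken historically, else 0
    pi_hat: estimated P(D = 1 | X)
    mu_hat: estimated E[Y | X, D = 1]
    """
    y, d, pi_hat, mu_hat = (np.asarray(a, dtype=float) for a in (y, d, pi_hat, mu_hat))
    # Regression prediction plus an inverse-propensity-weighted residual correction.
    return mu_hat + d / pi_hat * (y - mu_hat)

phi = dr_pseudo_outcome(y=[1.0, 0.0], d=[1, 0], pi_hat=[0.5, 0.5], mu_hat=[0.5, 0.5])
print(phi)  # [1.5 0.5]
```

A second-stage regression of `phi` on the features (ideally with cross-fitting for `pi_hat` and `mu_hat`) then estimates the conditional likelihood of the outcome under the proposed decision.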

**An Economic Approach to Regulating Algorithms** (with Jon Kleinberg, Jens Ludwig and Sendhil Mullainathan). 2021.

There is growing concern about "algorithmic bias": that predictive algorithms used in decision-making might bake in or exacerbate discrimination in society. We argue that such concerns are naturally addressed using the tools of welfare economics. This approach overturns prevailing wisdom about the remedies for algorithmic bias. First, when a social planner builds the algorithm herself, her equity preference has no effect on the training procedure. So long as the data, however biased, contain signal, they will be used and the learning algorithm will be the same. Equity preferences alone provide no reason to alter how information is extracted from data, only how that information enters decision-making. Second, when private (possibly discriminatory) actors are the ones building algorithms, optimal regulation involves algorithmic disclosure but otherwise no restriction on training procedures. Under such disclosure, the use of algorithms strictly reduces the extent of discrimination relative to a world in which humans make all the decisions.

**When do Common Time Series Estimands have Nonparametric Causal Meaning?** (with Neil Shephard). 2021.

In this paper, we introduce the nonparametric, direct potential outcome system as a foundational framework for analyzing dynamic causal effects of assignments on outcomes in observational time series settings. Using this framework, we provide conditions under which common predictive time series estimands, such as the impulse response function, generalized impulse response function, local projection, and local projection instrumental variables, have a nonparametric causal interpretation in terms of such dynamic causal effects.
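
For concreteness, a bare-bones local projection, the slope from regressing `y[t + h]` on a constant and `x[t]` at each horizon `h`, can be sketched as below. It omits the controls, lags, and robust standard errors used in practice, and the function name and simulated series are hypothetical; the paper's question is when such slope coefficients have a causal meaning, which this code does not address.

```python
import numpy as np

def local_projection(y, x, horizons):
    """OLS local projections: for each h, regress y[t + h] on a constant and x[t]."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    betas = []
    for h in horizons:
        lhs = y[h:]               # y_{t+h}
        rhs = x[: len(x) - h]     # x_t, aligned with y_{t+h}
        design = np.column_stack([np.ones_like(rhs), rhs])
        coef, *_ = np.linalg.lstsq(design, lhs, rcond=None)
        betas.append(coef[1])     # slope on x_t
    return np.array(betas)

# Hypothetical series in which x affects y with a one-period delay.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.5 * x[:-1]
print(local_projection(y, x, horizons=[0, 1, 2]))  # middle entry is 0.5 up to rounding
```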

**A More Credible Approach to Parallel Trends** (with Jonathan Roth). 2022. *Conditionally accepted, The Review of Economic Studies.*

This paper proposes tools for robust inference in difference-in-differences and event-study designs where the parallel trends assumption may be violated. Instead of requiring that parallel trends holds exactly, we impose restrictions on how different the post-treatment violations of parallel trends can be from the pre-treatment differences in trends ("pre-trends"). The causal parameter of interest is partially identified under these restrictions. We introduce two approaches that guarantee uniformly valid inference under the imposed restrictions, and we derive novel results showing that they have desirable power properties in our context. We illustrate how economic knowledge can inform the restrictions on the possible violations of parallel trends in two economic applications. We also highlight how our approach can be used to conduct sensitivity analyses showing what causal conclusions can be drawn under various restrictions on the possible violations of the parallel trends assumption.
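
As a simplified illustration of how such a restriction yields partial identification, consider one restriction of this type, assumed here for illustration: the change in the parallel-trends violation from the reference period to the first post-treatment period is at most `mbar` times the largest period-to-period change in the pre-trends. Assuming pre-period event-study coefficients measure the violation exactly (no anticipation), the identified set for the first post-treatment effect is an interval around its coefficient. The sketch ignores sampling uncertainty, which the paper's uniformly valid inference procedures handle; the function name and inputs are hypothetical.

```python
import numpy as np

def bounds_first_post_period(beta_pre, beta_post1, mbar):
    """Identified set for the first post-treatment effect (illustrative).

    beta_pre: event-study coefficients for pre-periods -K, ..., -1; period 0
        is the reference period, normalized to 0.
    beta_post1: event-study coefficient for the first post-treatment period.
    mbar: how large the post-treatment change in the trend violation may be,
        relative to the largest period-to-period pre-trend change.
    """
    pre = np.append(np.asarray(beta_pre, dtype=float), 0.0)  # append reference period
    max_pre_change = np.max(np.abs(np.diff(pre)))
    half_width = mbar * max_pre_change
    return float(beta_post1 - half_width), float(beta_post1 + half_width)

print(bounds_first_post_period(beta_pre=[0.1, 0.3], beta_post1=1.0, mbar=0.0))  # (1.0, 1.0)
```

Setting `mbar = 0` imposes exact parallel trends after treatment and point-identifies the effect; larger values of `mbar` widen the interval, which is the kind of sensitivity analysis the abstract describes.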

# Publications

**Panel Experiments and Dynamic Causal Effects: A Finite Population Perspective** (with Iavor Bojinov and Neil Shephard). 2021. Quantitative Economics.

In panel experiments, we randomly assign units to different interventions, measure their outcomes, and repeat the procedure in several periods. Using the potential outcomes framework, we define finite population dynamic causal effects that capture the relative effectiveness of alternative treatment paths. For a rich class of dynamic causal effects, we provide a nonparametric estimator that is unbiased over the randomization distribution and derive its finite population limiting distribution as either the sample size or the duration of the experiment increases. We develop two methods for inference: a conservative test for weak null hypotheses and an exact randomization test for sharp null hypotheses. We further analyze the finite population probability limit of linear fixed effects estimators. These commonly-used estimators do not recover a causally interpretable estimand if there are dynamic causal effects and serial correlation in the assignments, highlighting the value of our proposed estimator.
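
The idea of an estimator that is unbiased over the randomization distribution can be illustrated in its simplest special case: a Horvitz-Thompson estimate of the average treatment effect in a single period, with known assignment probabilities. This sketch assumes, for simplicity, that outcomes do not depend on past assignments (no carryover), so it does not capture the dynamic, path-dependent effects the paper studies; the function name and inputs are hypothetical. Unbiasedness follows because `E[w / p] = 1` and `E[(1 - w) / (1 - p)] = 1` for each unit.

```python
import numpy as np

def ht_contemporaneous_effect(y, w, p):
    """Horvitz-Thompson estimate of the average period-t treatment effect.

    y: observed outcomes in period t; w: 0/1 assignments in period t;
    p: known probabilities that each unit is treated in period t.
    Assumes (hypothetically) no carryover effects from earlier periods.
    """
    y, w, p = (np.asarray(a, dtype=float) for a in (y, w, p))
    # Inverse-probability-weight treated and control outcomes, then average.
    return np.mean(y * w / p - y * (1 - w) / (1 - p))

print(ht_contemporaneous_effect(y=[3.0, 1.0], w=[1, 0], p=[0.5, 0.5]))  # 2.0
```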

**Characterizing Fairness over the Set of Good Models under Selective Labels** (with Amanda Coston and Alexandra Chouldechova). 2021. International Conference on Machine Learning (ICML 2021).

Algorithmic risk assessments are used to inform decisions in a wide variety of high-stakes settings. Often multiple predictive models deliver similar overall performance but differ markedly in their predictions for individual cases, an empirical phenomenon known as the "Rashomon Effect." These models may have different properties over various groups, and therefore have different predictive fairness properties. We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance, or "the set of good models." Our framework addresses the empirically relevant challenge of selectively labelled data in the setting where the selection decision and outcome are unconfounded given the observed data features. Our framework can be used to 1) replace an existing model with one that has better fairness properties; or 2) audit for predictive bias. We illustrate these use cases on a real-world credit-scoring task and a recidivism prediction task.
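
The notion of "the set of good models" can be sketched for a finite list of candidate models: keep every model whose loss is within a tolerance `epsilon` of the best, then report the range of a fairness metric over those models. The paper works with model classes and corrects for selective labels; this enumeration is a simplification, and the function name, losses, and disparity values below are hypothetical.

```python
import numpy as np

def fairness_range_over_good_models(losses, disparities, epsilon):
    """Range of a fairness metric over the 'set of good models'.

    losses: overall loss of each candidate model (lower is better).
    disparities: a predictive-disparity metric for each candidate model.
    epsilon: tolerance defining "similar overall performance".
    """
    losses = np.asarray(losses, dtype=float)
    disparities = np.asarray(disparities, dtype=float)
    # Keep every model whose loss is within epsilon of the best loss.
    good = losses <= losses.min() + epsilon
    return float(disparities[good].min()), float(disparities[good].max())

# Hypothetical candidates: two near-optimal models, one clearly worse.
print(fairness_range_over_good_models(
    losses=[0.30, 0.31, 0.40],
    disparities=[0.10, -0.05, 0.20],
    epsilon=0.02,
))  # (-0.05, 0.1)
```

A wide range signals that a model with better fairness properties may be available at little cost in overall performance, while the range itself can serve as an audit of a deployed model.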

**An Economic Perspective on Algorithmic Fairness** (with Jon Kleinberg, Jens Ludwig and Sendhil Mullainathan). 2020. AEA Papers and Proceedings, 110, Pp. 91-95. *Non-refereed*.

There are widespread concerns that the growing use of machine learning algorithms in important decisions may reproduce and reinforce existing discrimination against legally protected groups. Most of the attention to date on issues of "algorithmic bias" or "algorithmic fairness" has come from computer scientists and machine learning researchers. We argue that concerns about algorithmic fairness are at least as much about questions of how discrimination manifests itself in data, decision-making under uncertainty, and optimal regulation. To fully answer these questions, an economic framework is necessary, and as a result, economists have much to contribute.

**Bias In, Bias Out? Evaluating the Folk Wisdom** (with Jonathan Roth). 2020. 1st Symposium on the Foundations of Responsible Computing (FORC 2020), LIPIcs, 156, Pp. 6:1-6:15.

We evaluate the folk wisdom that algorithmic decision rules trained on data produced by biased human decision-makers necessarily reflect this bias. We consider a setting where training labels are only generated if a biased decision-maker takes a particular action, and so "biased" training data arise due to discriminatory selection into the training data. In our baseline model, the more biased the decision-maker is against a group, the more the algorithmic decision rule favors that group. We refer to this phenomenon as bias reversal. We then clarify the conditions that give rise to bias reversal. Whether a prediction algorithm reverses or inherits bias depends critically on how the decision-maker affects the training data as well as the label used in training. We illustrate our main theoretical results in a simulation study applied to the New York City Stop, Question and Frisk dataset.

**Algorithmic Fairness** (with Jon Kleinberg, Jens Ludwig and Sendhil Mullainathan). 2018. AEA Papers and Proceedings, 108, Pp. 22-27. *Non-refereed*.

Concerns that algorithms may discriminate against certain groups have led to numerous efforts to 'blind' the algorithm to race. We argue that this intuitive perspective is misleading and may do harm. Our primary result is exceedingly simple, yet often overlooked. A preference for fairness should not change the choice of estimator. Equity preferences can change how the estimated prediction function is used (e.g., different thresholds for different groups) but the function itself should not change. We show in an empirical example for college admissions that the inclusion of variables such as race can increase both equity and efficiency.
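
The separation between estimation and use can be sketched in a few lines: every applicant is scored by the same prediction function, and an equity preference enters only through group-specific decision thresholds. The function name, scores, group labels, and thresholds below are hypothetical.

```python
def decide(scores, groups, thresholds):
    """Apply group-specific thresholds to a single, shared prediction function.

    scores: predicted probabilities from the same estimator for everyone.
    groups: group label for each applicant.
    thresholds: dict mapping group label -> admission threshold (hypothetical).
    """
    return [score >= thresholds[group] for score, group in zip(scores, groups)]

admit = decide(scores=[0.62, 0.55, 0.48],
               groups=["A", "B", "B"],
               thresholds={"A": 0.60, "B": 0.50})
print(admit)  # [True, True, False]
```

Changing the equity preference changes only the `thresholds` dictionary; the estimator that produced `scores` is left untouched.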