# Working Papers

**From Predictive Algorithms to Automatic Generation of Anomalies** (with Sendhil Mullainathan). Updated April 2023.

[ Abstract | Draft ]

We ask how machine learning can change a crucial step of the scientific process in economics: the advancement of theories through the discovery of "anomalies." Canonical examples of anomalies include the Allais Paradox and the Kahneman-Tversky choice experiments, which are concrete examples of menus of lotteries that highlighted flaws in expected utility theory and spurred the development of new theories for decision-making under uncertainty. We develop an econometric framework for anomaly generation and two algorithmic procedures to generate anomalies (if they exist) when provided a formal theory and data that the theory seeks to explain. Our algorithmic procedures are general since anomalies play an important role across a wide variety of fields in economics. As an illustration, we apply our procedures to generate anomalies for expected utility theory using simulated lottery choice data by an individual who behaves according to cumulative prospect theory. We produce novel anomalies for the independence axiom based on the probability weighting function that to our knowledge have not been noticed before. While this illustration is specific, it is our view that automatic anomaly generation can accelerate the development of new theories.

**Counterfactual Risk Assessments under Unmeasured Confounding** (with Amanda Coston and Edward Kennedy). Updated Feb 2023.

[ Abstract | Draft ]

Statistical risk assessments inform consequential decisions such as pretrial release in criminal justice and loan approvals in consumer finance. Such risk assessments make counterfactual predictions, predicting the likelihood of an outcome under a proposed decision (e.g., what would happen if we approved this loan?). A central challenge, however, is that there may have been unmeasured confounders that jointly affected past decisions and outcomes in the historical data. This paper proposes a tractable mean outcome sensitivity model that bounds the extent to which unmeasured confounders could affect outcomes on average. The mean outcome sensitivity model partially identifies the conditional likelihood of the outcome under the proposed decision, popular predictive performance metrics (e.g., accuracy, calibration, TPR, FPR), and commonly-used predictive disparities. We derive their sharp identified sets, and we then solve three tasks that are essential to deploying statistical risk assessments in high-stakes settings. First, we propose a doubly-robust learning procedure for the bounds on the conditional likelihood of the outcome under the proposed decision. Second, we translate our estimated bounds on the conditional likelihood of the outcome under the proposed decision into a robust, plug-in decision-making policy. Third, we develop doubly-robust estimators of the bounds on the predictive performance of an existing risk assessment. We apply our methods to analyze a real-world credit-scoring task, illustrating how varying assumptions on unmeasured confounding leads to substantive changes in the credit score's predictions and evaluations of its predictive disparities.

**Design-Based Uncertainty for Quasi-Experiments** (with Jonathan Roth). Updated Nov 2022.

[ Abstract | Draft ]

Conventional standard errors reflect the fact that the observed data is sampled from an infinite super-population, but this approach to uncertainty may be unnatural in settings where all units in the population are observed (e.g. all 50 U.S. states). In such settings, it may be more natural to view the uncertainty as design-based, i.e. arising from the stochastic assignment of treatment. This paper develops a design-based framework for uncertainty that is suitable for analyzing "quasi-experimental" settings commonly studied in economics. A key feature of our framework is that each unit has an idiosyncratic probability of receiving treatment, but these idiosyncratic probabilities are unknown to the researcher. We derive conditions under which difference-in-differences (DiD) and related estimators are unbiased for an interpretable causal estimand. When the DiD estimator is unbiased, conventional confidence intervals are valid but potentially conservative in large populations. An interesting feature of our setting is that conventional standard errors tend to be more conservative when treatment probabilities differ across units, which helps to mitigate undercoverage from bias. As a result, conventional confidence intervals for DiD can potentially still have correct coverage even if the design-based analog to parallel trends does not hold exactly. Our results also have implications for the appropriate level to cluster standard errors and for the analysis of instrumental variables.

**Identifying Prediction Mistakes in Observational Data**. Updated Oct 2022. *Revision requested, The Quarterly Journal of Economics.*

[ Abstract | Draft | Supplement ]

Decision makers, such as doctors, judges, and managers, make consequential choices based on predictions of unknown outcomes. Do these decision makers make systematic prediction mistakes based on the available information? If so, in what ways are their predictions systematically biased? Uncovering systematic prediction mistakes is difficult as the preferences and information sets of decision makers are unknown to researchers. In this paper, I characterize behavioral and econometric assumptions under which systematic prediction mistakes can be identified in empirical settings such as hiring, pretrial release, and medical testing. I derive a statistical test for whether the decision maker makes systematic prediction mistakes under these assumptions and show how supervised machine learning based models can be used to apply this test. I provide methods for conducting inference on the ways in which the decision maker's predictions are systematically biased. As an illustration, I apply this econometric framework to analyze the pretrial release decisions of judges in New York City, and I estimate that at least 20% of judges make systematic prediction mistakes about failure to appear risk given defendant characteristics.

**Included and Excluded Instruments in Structural Estimation** (with Isaiah Andrews, Nano Barahona, Matthew Gentzkow and Jesse Shapiro). Updated April 2022.

[ Abstract | Draft ]

We consider the choice of instrumental variables when a researcher’s structural model may be misspecified. We contrast included instruments, which have a direct causal effect on the outcome holding constant the endogenous variable of interest, with excluded instruments, which do not. We show conditions under which the researcher’s estimand maintains an interpretation in terms of causal effects of the endogenous variable under excluded instruments but not under included instruments. We apply our framework to estimation of a linear instrumental variables model, and of differentiated goods demand models under price endogeneity. We show that the distinction between included and excluded instruments is quantitatively important in simulations based on an application. We extend our results to a dynamic setting by studying estimation of production function parameters under input endogeneity.

**When do Common Time Series Estimands have Nonparametric Causal Meaning?** (with Neil Shephard). Updated Oct 2021.

[ Abstract | Draft ]

In this paper, we introduce the nonparametric, direct potential outcome system as a foundational framework for analyzing dynamic causal effects of assignments on outcomes in observational time series settings. Using this framework, we provide conditions under which common predictive time series estimands, such as the impulse response function, generalized impulse response function, local projection, and local projection instrumental variables, have a nonparametric causal interpretation in terms of such dynamic causal effects.

**An Economic Approach to Regulating Algorithms** (with Jon Kleinberg, Jens Ludwig and Sendhil Mullainathan). Updated Jan 2021.

[ Abstract | Draft | Slides ]

There is growing concern about "algorithmic bias" - that predictive algorithms used in decision-making might bake in or exacerbate discrimination in society. We argue that such concerns are naturally addressed using the tools of welfare economics. This approach overturns prevailing wisdom about the remedies for algorithmic bias. First, when a social planner builds the algorithm herself, her equity preference has no effect on the training procedure. So long as the data, however biased, contain signal, they will be used and the learning algorithm will be the same. Equity preferences alone provide no reason to alter how information is extracted from data - only how that information enters decision-making. Second, when private (possibly discriminatory) actors are the ones building algorithms, optimal regulation involves algorithmic disclosure but otherwise no restriction on training procedures. Under such disclosure, the use of algorithms strictly reduces the extent of discrimination relative to a world in which humans make all the decisions.

# Publications

**A More Credible Approach to Parallel Trends** (with Jonathan Roth). 2023. The Review of Economic Studies.

[ Abstract | Draft | R package ]

This paper proposes tools for robust inference in difference-in-differences and event-study designs where the parallel trends assumption may be violated. Instead of requiring that parallel trends holds exactly, we impose restrictions on how different the post-treatment violations of parallel trends can be from the pre-treatment differences in trends ("pre-trends"). The causal parameter of interest is partially identified under these restrictions. We introduce two approaches that guarantee uniformly valid inference under the imposed restrictions, and we derive novel results showing that they have desirable power properties in our context. We illustrate how economic knowledge can inform the restrictions on the possible violations of parallel trends in two economic applications. We also highlight how our approach can be used to conduct sensitivity analyses showing what causal conclusions can be drawn under various restrictions on the possible violations of the parallel trends assumption.

**Panel Experiments and Dynamic Causal Effects: A Finite Population Perspective** (with Iavor Bojinov and Neil Shephard). 2021. Quantitative Economics.

[ Abstract | Draft | Published Version | Slides ]

In panel experiments, we randomly assign units to different interventions, measure their outcomes, and repeat the procedure over several periods. Using the potential outcomes framework, we define finite population dynamic causal effects that capture the relative effectiveness of alternative treatment paths. For a rich class of dynamic causal effects, we provide a nonparametric estimator that is unbiased over the randomization distribution and derive its finite population limiting distribution as either the sample size or the duration of the experiment increases. We develop two methods for inference: a conservative test for weak null hypotheses and an exact randomization test for sharp null hypotheses. We further analyze the finite population probability limit of linear fixed effects estimators. These commonly-used estimators do not recover a causally interpretable estimand if there are dynamic causal effects and serial correlation in the assignments, highlighting the value of our proposed estimator.

**Characterizing Fairness over the Set of Good Models under Selective Labels** (with Amanda Coston and Alexandra Chouldechova). 2021. International Conference on Machine Learning (ICML 2021).

[ Abstract | Draft | Published Version ]

Algorithmic risk assessments are used to inform decisions in a wide variety of high-stakes settings. Often multiple predictive models deliver similar overall performance but differ markedly in their predictions for individual cases, an empirical phenomenon known as the "Rashomon Effect." These models may have different properties over various groups, and therefore have different predictive fairness properties. We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance, or "the set of good models." Our framework addresses the empirically relevant challenge of selectively labelled data in the setting where the selection decision and outcome are unconfounded given the observed data features. Our framework can be used to 1) replace an existing model with one that has better fairness properties; or 2) audit for predictive bias. We illustrate these use cases on a real-world credit-scoring task and a recidivism prediction task.

**An Economic Perspective on Algorithmic Fairness** (with Jon Kleinberg, Jens Ludwig and Sendhil Mullainathan). 2020. AEA Papers and Proceedings, 110, Pp. 91-95. *Non-refereed*.

[ Abstract | Published Version ]

There are widespread concerns that the growing use of machine learning algorithms in important decisions may reproduce and reinforce existing discrimination against legally protected groups. Most of the attention to date on issues of "algorithmic bias" or "algorithmic fairness" has come from computer scientists and machine learning researchers. We argue that concerns about algorithmic fairness are at least as much about questions of how discrimination manifests itself in data, decision-making under uncertainty, and optimal regulation. To fully answer these questions, an economic framework is necessary—and as a result, economists have much to contribute.

**Bias In, Bias Out? Evaluating the Folk Wisdom** (with Jonathan Roth). 2020. 1st Symposium on the Foundations of Responsible Computing (FORC 2020), LIPIcs, 156, Pp. 6:1-6:15.

[ Abstract | Draft | Published Version ]

We evaluate the folk wisdom that algorithmic decision rules trained on data produced by biased human decision-makers necessarily reflect this bias. We consider a setting where training labels are only generated if a biased decision-maker takes a particular action, and so "biased" training data arise due to discriminatory selection into the training data. In our baseline model, the more biased the decision-maker is against a group, the more the algorithmic decision rule favors that group. We refer to this phenomenon as bias reversal. We then clarify the conditions that give rise to bias reversal. Whether a prediction algorithm reverses or inherits bias depends critically on how the decision-maker affects the training data as well as the label used in training. We illustrate our main theoretical results in a simulation study applied to the New York City Stop, Question and Frisk dataset.

**Algorithmic Fairness** (with Jon Kleinberg, Jens Ludwig and Sendhil Mullainathan). 2018. AEA Papers and Proceedings, 108, Pp. 22-27. *Non-refereed*.

[ Abstract | Published Version ]

Concerns that algorithms may discriminate against certain groups have led to numerous efforts to 'blind' the algorithm to race. We argue that this intuitive perspective is misleading and may do harm. Our primary result is exceedingly simple, yet often overlooked. A preference for fairness should not change the choice of estimator. Equity preferences can change how the estimated prediction function is used (e.g., different thresholds for different groups) but the function itself should not change. We show in an empirical example for college admissions that the inclusion of variables such as race can increase both equity and efficiency.