Local Projections for Applied Economics

The dynamic causal effect of an intervention on an outcome is of paramount interest to applied macro- and microeconomics research. However, this question has generally been approached differently by the two literatures. In making the transition from traditional time series methods to applied microeconometrics, local projections can serve as a natural bridge. Local projections can translate the familiar language of vector autoregressions (VARs) and impulse responses into the language of potential outcomes and treatment effects. There are gains to be made by both literatures from greater integration of well-established methods in each. This review shows how to make these connections and points to potential areas of further research.


Introduction
Impulse response functions estimated with vector autoregressions (VARs) are a standard statistic used to investigate dynamic macroeconomic relationships. Though many associate impulse responses with Sims (1980), there are references in economics already in Frisch (1933). Of course, before their arrival in economics, impulse responses could trace their origin to the field of signal processing. A. W. Phillips (of Phillips curve fame) built a hydromechanical analog computer in 1949 (known as the Monetary National Income Analogue Computer, or MONIAC) to illustrate the inner workings of Keynesian and Robertsonian economics. The effects of monetary policy were modulated by the flow of water through a system of pipes and valves representing different sectors of the economy, which activated a pen that drew an impulse response on a roll of graphing paper.
Traditionally, estimation of impulse responses has been viewed as a time series exercise that requires characterizing the entire dynamic system under consideration in order to study how policy interventions propagate over time, just as Phillips's MONIAC did. VARs were just a convenient and useful empirical approximation to such a dynamic system. Local projections (Jordà, 2005) shifted this system perspective to one where the impulse response could be directly estimated with univariate methods, and without reference to other parts of the system.
Local projections (LPs) compare two conditional means of a future outcome given today's available information, one of which is subject to an intervention while the other is not. Immediately, one can think of this situation as comparing two forecasts under different circumstances, or as comparing the conditional means of treated versus control subpopulations. Further, because forecasts and impulse responses are tightly linked, it quickly becomes apparent that traditional time series concepts and policy evaluation ideas stemming from the Rubin Causal Model (Rubin, 1974) must be tightly connected as well.
One area can benefit from the time series tradition of modeling dynamic relationships, as much as the other area can benefit from a rich tradition in the identification of causal effects.
The flexibility of LPs, which helps establish this macro-micro nexus, is at the same time a potential weakness. Because LPs are a univariate semi-parametric approach, they cannot compete in mean-squared error terms with a correctly specified traditional structural multivariate time series model (see, e.g., Plagborg-Møller & Wolf, 2021; Li, Plagborg-Møller & Wolf, 2022), even though in population they estimate the same response in many settings (again, see Plagborg-Møller & Wolf, 2021). This should come as no surprise. The more restrictions one can place in describing the data, the more efficient the estimates, the smaller the mean squared forecast errors, and the broader the scope to experiment with policy variations within the model. Moreover, since many models of the macroeconomy have solutions (or approximate solutions) that consist of a system of linear difference/differential equations, it is natural to impose the same structure on the data to extract estimates of the deep parameters of the model. LPs are not universally preferable, and one must recognize those situations where alternative methods have an edge.
However, by the same token, neither are traditional multivariate time series models universally preferable. For example, the consistency of an impulse response estimator depends on the truncation lag used to specify the infinite-order approximation (see, e.g., Kuersteiner, 2005; Jordà, Singh & Taylor, 2020; Plagborg-Møller & Wolf, 2021). This issue of potential misspecification is easily resolved using LPs (see, e.g., Jordà, Singh & Taylor, 2020).
Moreover, the natural efficiency losses of a less restrictive model, such as LPs, can often be significantly reduced, as several authors have shown (see, e.g., Lusompa, 2021; Barnichon & Brownlees, 2019; Li et al., 2022; Montiel Olea & Plagborg-Møller, 2021). In addition, in infinite-order settings, Xu (2023) shows that LPs are semiparametrically efficient if the lag order is allowed to grow with the sample size. Importantly, just because a theoretical model of the economy is written in linear form, it does not mean that a structural linear multivariate model will describe the data correctly. More recently, the desire to stratify the responses according to some economic condition (see, e.g., Auerbach & Gorodnichenko, 2012; Jordà & Taylor, 2016; Tenreyro & Thwaites, 2016; Ramey & Zubairy, 2018) is trivially met using LPs, but it is much harder to meet using VARs. In general, nonlinearities can be investigated more easily in univariate rather than multivariate models.
The trade-off between VARs and LPs evokes that between least squares (OLS) and instrumental variables (IV) estimation. IV estimates are always less efficient (oftentimes, wildly so), yet much of the profession prefers them to OLS estimates, almost regardless of the efficiency loss. The premium is on bias over efficiency, not on minimizing mean-squared error loss. LPs by themselves do not resolve the issue of identification. However, researchers may prefer using LPs over VARs in settings where getting the dynamic response right is at a premium. More generally, efficiency losses in LPs can be greatly contained relative to the substantial bias improvements at medium to long horizons, especially with persistent data.
These issues become more pronounced as researchers tackle panel data and generally richer data sets. Moreover, the natural stratification resulting from the policy evaluation paradigm and the Kitagawa-Oaxaca-Blinder decomposition (Kitagawa, 1955; Oaxaca, 1973; Blinder, 1973) does not fit traditional structural time series models well, whereas it is naturally accommodated using LPs (see Cloyne, Jordà & Taylor, 2023). Going the other way, policy evaluation of interventions that have effects over time, or interventions administered over time, with perhaps different doses each time, could greatly benefit from the lessons learned over the past 40 years of applied macroeconomic research.
Extensions to panel data applications look like an especially fruitful area for LPs. In recent research, Dube, Girardi, Jordà & Taylor (2023) show that in difference-in-differences (DiD) settings with absorbing but heterogeneous treatments, LPs can greatly simplify the analysis and can even accommodate repeated treatments, thereby encompassing several of the methods recently proposed in the literature to tackle specific situations. Related designs, such as regression discontinuity, probably deserve further exploration with LPs.
This review focuses on the applied macro-micro nexus through the method of local projections. The goal is not to provide an encyclopedic review of the local projections literature, but rather to highlight recent developments and avenues for research. The more points of commonality between these two venerable literatures, the more opportunities there are to advance each field through cross-pollination. The review therefore spends the first few sections going over basic estimation and inferential procedures for LPs, and then dedicates the second half to showcasing LP applications that take advantage of widely used policy evaluation methods.

A brief introduction to local projections
Let me begin by briefly discussing the intuition behind local projections with a simple example. Suppose w_t refers to a vector of stationary random variables observed over t = 1, ..., T periods. I assume stationarity for simplicity, although it is not necessary more generally. Further assume that w_t = (w_{1t}, ..., w_{jt}, ..., w_{kt})' for j = 1, ..., k follows a simple VAR(1):

w_t = A w_{t-1} + ε_t.   (1)

It is well known that the response of w_{j,t+h} to a shock of size δ_i in ε_t is simply:

R_{ij}(h) = A^h_{[j,.]} δ_i,   h = 0, 1, ..., H,

where A^h_{[j,.]} denotes the j-th row of the matrix A raised to the h-th power. Here δ_i refers to the size of the shock for each component of ε_t chosen by the experimenter to reflect an identified experiment on the i-th variable in w_t. Since the residuals ε_t are usually correlated with one another, δ_i can be seen as the linear combination that recovers the underlying structural residual for the i-th variable. I set aside different ways to achieve identification (i.e., finding the right δ_i) until later sections. Finally, I use the notation R_{ij}(h) to denote the response of variable j to a shock in variable i, h periods after the initial intervention or shock.
Though this may seem like a restrictive example, note that the state-space representation of a VAR with p lags (a VAR(p)), and the approximate representation of other interesting stochastic processes, have a VAR(1) representation. The more general case is derived in Jordà (2005). Further, for cointegrated systems see Chong, Jordà & Taylor (2012), who show how to decompose an impulse response into the dynamics due to long-run equilibria and those due to short-run dynamics separately. Jordà (2005) and Plagborg-Møller & Wolf (2021) formally establish the asymptotic equivalency of LPs and VARs under a variety of identification assumptions, and Stock & Watson (2018) and Plagborg-Møller & Wolf (2021) present the conditions under which instrumental variables and LPs can be used to identify non-invertible systems. Importantly, it is not necessary to assume that the data are generated by a VAR; it simply helps with the intuition. As long as the model in Equation 1 accurately represents the data generating process (DGP), a consistent estimate of the coefficient matrix A is all that is needed to calculate the impulse response at any horizon.
The LP approach instead uses recursive substitution, yielding:

w_{t+h} = A^h w_t + A^{h-1} ε_{t+1} + ⋯ + A ε_{t+h-1} + ε_{t+h},   with A^0 = I.

The previous expression suggests that a regression of w_{j,t+h} on w_t, such as:

w_{j,t+h} = B_h^{[j,.]} w_t + v_{j,t+h};   h = 0, 1, ..., H,   (2)

gives us an estimate of the impulse response since R̂_{ij}(h) = B̂_h^{[j,.]} δ_i, which will be equal to A^h_{[j,.]} δ_i as long as the DGP coincides with that in Equation 1. Note that B_h = A^h in this simple example.
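To make the mechanics concrete, the sketch below simulates a bivariate VAR(1) and estimates Equation 2 horizon by horizon, comparing the LP estimates with the true responses A^h δ. The coefficient matrix, shock vector, sample size, and number of horizons are illustrative choices, not values from any application.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Illustrative VAR(1) DGP: w_t = A w_{t-1} + eps_t
A = np.array([[0.7, 0.1],
              [0.3, 0.5]])
T, H = 500, 12
w = np.zeros((T, 2))
for t in range(1, T):
    w[t] = A @ w[t - 1] + rng.normal(size=2)

delta = np.array([1.0, 0.0])                     # experiment: unit shock to variable 1
lp, true = [], []
for h in range(H + 1):
    # Horizon-h local projection of w_{2,t+h} on w_t (Equation 2), with a constant
    y = w[h:, 1]
    X = sm.add_constant(w[:T - h, :])
    B_h = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": h + 1}).params[1:]
    lp.append(B_h @ delta)
    true.append(np.linalg.matrix_power(A, h)[1, :] @ delta)

print(np.round(lp, 2))
print(np.round(true, 2))
```

Each horizon is a separate regression; collecting the relevant coefficient across horizons traces out the response.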
As discussed in Jordà (2005), each of these local projections can be estimated by least squares, one horizon at a time. I return to estimation in more detail below, where I discuss how to estimate local projections generically using the generalized method of moments (GMM), and in Section 7, where I provide formal conditions for IV estimation.
Here, though, there exist parallels with the literature on proxy VARs, where instruments are used to identify structural shocks from reduced-form shocks (see, e.g., Stock & Watson, 2012; Mertens & Ravn, 2013).

Transformations and multipliers
Macroeconomic data often exhibit trending behavior. Think of GDP, or the price level over time, for example. Such trends can often be well described by a unit root; in time series parlance, the data are I(1), or integrated of order one. If the data are log-transformed, the first difference can be interpreted as the approximate percentage change in the variable (for example, the growth rate of GDP or the rate of inflation). To fix ideas, let y_t denote the log of an I(1) variable, let the first difference be denoted Δy_t = y_t − y_{t−1}, and let the long difference be denoted Δ^h y_{t+h} = y_{t+h} − y_{t−1}. The latter measures the approximate percentage change in the outcome from t − 1 to h periods in the future. In addition, and for later use, let s_t denote a (randomly assigned) intervention of interest (to make things simple).
LPs can be estimated on the long differences (Δ^h y_{t+h}) or on the first differences (Δy_{t+h}) in response to an intervention s_t. However, the interpretation of the impulse response is different in each case. LPs on Δ^h y_{t+h} measure the overall percentage change in the outcome since the intervention. Notice that Δ^h y_{t+h} = (y_{t+h} − y_{t+h−1}) + (y_{t+h−1} − y_{t+h−2}) + ⋯ + (y_t − y_{t−1}) = Δy_{t+h} + ⋯ + Δy_t. Adjusting the notation to indicate that s is the intervention that affects the outcome y, however transformed, this means that the LP on the long difference measures the cumulation of the per-period percentage changes, that is, R_{sy}(h) = R_{sΔ^h y}(h) = Σ_{j=0}^{h} R_{sΔy}(j). A related statistic of interest is the multiplier. An early reference to the multiplier can be found in Keynes (1936). The Keynesian (fiscal) multiplier compares two dynamic responses.
The fiscal impetus in the first year in which a fiscal package is passed has effects on output that are felt over subsequent years. From this perspective (comparing several years of output gains to a single year of spending), the multiplier might seem quite large. However, fiscal packages are usually implemented over several years, so that the overall effect of the fiscal package is best evaluated as the ratio of the overall gains in output relative to the overall fiscal expenditures over the duration of the package.
Therefore, the multiplier can be calculated as the cumulative sum of the changes in GDP due to the fiscal package over the cumulative sum of the changes in the deficit due to the fiscal package. It is clear that the multiplier will be of interest in any setting in which an intervention is administered over several periods (call it a treatment plan) and one is interested in evaluating the overall effect of the treatment plan and not just the first intervention.
Consider a stripped-down model to fix ideas. Suppose y_t = γ s_t + u^y_t and that s_t = ρ s_{t−1} + u^s_t with E(u^y_t u^s_r) = 0 for any value of r. In this simple model the treatment variable s_t is randomly assigned, though treatments are serially correlated. It is easy to see that R_{sy}(h) = γ ρ^h and R_{ss}(h) = ρ^h. Define the multiplier as:

m_h = (Σ_{j=0}^{h} R_{sy}(j)) / (Σ_{j=0}^{h} R_{ss}(j)) = γ.

In economic terms, the overall effect of the treatment plan on the outcome happens to be the same as the effect on impact, though in more general settings this will not be the case. It is also useful to notice that, since the effect on impact is γ and there are no internal propagation dynamics in the outcome, the cumulative effect on the outcome is simply the sum of the treatments over time scaled by their per-period impact γ, so the multiplier equals γ.
The LP estimator for R_{sy}(h) can be obtained from the regression y_{t+h} = β_h s_t + v^y_{t+h}, and hence a direct estimate of Σ_{j=0}^{h} R_{sy}(j) can be obtained from the modified local projection Y_{t+h} = θ^y_h s_t + ν^y_{t+h}, where Y_{t+h} ≡ y_{t+h} + ⋯ + y_t. Clearly θ^y_h = Σ_{j=0}^{h} R_{sy}(j). One can similarly construct S_{t+h} ≡ s_{t+h} + ⋯ + s_t, estimate S_{t+h} = θ^s_h s_t + ν^s_{t+h}, and therefore obtain m_h = θ^y_h / θ^s_h. However, one can go one step further by noting that if an instrument z_t is available such that E(z_t u^j_t) = 0 for j = y, s, then:

m_h = E(Y_{t+h} z_t) / E(S_{t+h} z_t),

which can be directly estimated from the auxiliary LPIV (LP estimated with instrumental variables):

Y_{t+h} = m_h S_{t+h} + ν_{t+h},

estimated using z_t as an instrumental variable. This is the approach proposed in Ramey (2016). The advantage of using this direct approach is that standard errors can be obtained directly from the regression output. Note that estimating this local projection by OLS would not generate valid estimates even if S_{t+h} were completely randomly assigned. In general, of course, one would include a vector of controls x_t in the previous expression and use more than one instrument if additional instruments are available. Ramey (2016) provides a more complete discussion.
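As a numerical check on the multiplier logic, the sketch below simulates the stripped-down model above, with the treatment shifted by an instrument z_t, and computes m_h as the just-identified IV ratio of cumulated outcomes to cumulated treatments. All parameter values, and the way the instrument enters, are illustrative assumptions; with γ = 0.5 the estimated multiplier should hover around 0.5 at every horizon.

```python
import numpy as np

rng = np.random.default_rng(1)
T, H = 600, 8
gamma, rho = 0.5, 0.6

# Illustrative DGP: serially correlated treatment shifted by an instrument z_t
z = rng.normal(size=T)
s = np.zeros(T)
for t in range(1, T):
    s[t] = rho * s[t - 1] + 0.8 * z[t] + rng.normal()
y = gamma * s + rng.normal(size=T)

def forward_sum(x, h):
    """Return x_t + x_{t+1} + ... + x_{t+h}, aligned with time t."""
    return np.array([x[t:t + h + 1].sum() for t in range(len(x) - h)])

for h in range(H + 1):
    Y, S = forward_sum(y, h), forward_sum(s, h)
    zt = z[:T - h] - z[:T - h].mean()
    # Just-identified LPIV ratio: cov(z, Y) / cov(z, S)
    m_h = (zt @ (Y - Y.mean())) / (zt @ (S - S.mean()))
    print(h, round(m_h, 2))
```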

Inference
As Equation 2 shows, the residuals of a local projection generally have a moving average structure. Because they are dated t to t + h, they do not affect the consistency of the local projection estimate β̂_h. However, the residual serial correlation affects the construction of standard errors.
A semi-parametric solution offered by Jordà (2005) was to use a Newey-West heteroscedasticity and autocorrelation consistent (HAC) estimator. Though simple to use, several more efficient alternatives have been proposed in the literature that are worth reviewing. Perhaps one of the more elegant solutions has been proposed by Montiel Olea & Plagborg-Møller (2021) and consists of adding an additional lag to the LP. Lag augmentation is known to improve inference in autoregressive models (see Toda & Yamamoto, 1995; Dolado & Lütkepohl, 1996; Inoue & Kilian, 2020). A simple univariate example helps illustrate the main idea behind lag augmentation, though the method is shown to work with a generic VAR(p) data generating process.

Thus, suppose the data are generated by a simple AR(1) model such as w_t = a w_{t−1} + ε_t. For convenience, we may assume that w_t is strictly stationary with |a| < 1 and ε_t ∼ D(0, σ²). Consider estimating the local projection w_{t+h} = β_h w_t + v_{t+h}. Plugging the AR(1) into the local projection results in the expression:

w_{t+h} = β_h ε_t + γ_h w_{t−1} + v_{t+h},   with β_h = a^h, γ_h = a^{h+1}, and v_{t+h} = Σ_{j=0}^{h−1} a^j ε_{t+h−j}.

Clearly ε_t is not directly observable. However, we can use the Frisch-Waugh-Lovell logic to obtain β_h by regressing (w_{t+h} − γ_h w_{t−1}) on (w_t − a w_{t−1}), that is, by adding w_{t−1} as an additional regressor. The estimator from this auxiliary two-step regression is such that:

β̂_h − β_h = (Σ_t ε_t²)^{−1} Σ_t ε_t v_{t+h}.

The reason this approach works is that, under the assumptions made on ε_t, the term v_{t+h} ε_t is serially uncorrelated even if v_{t+h} itself is serially correlated. This feature comes from the assumption that ε_t is strictly stationary with E(ε_t | {ε_s}_{s≠t}) = 0. As a result, a simple way to obtain correct inference for the local projection is to add an additional lag as a regressor and then select a heteroscedasticity-robust estimator to compute the standard errors; there is no longer a need to correct for serial correlation. A second option is to use a parametric specification of the residual covariance matrix.
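A minimal sketch of the lag-augmented local projection for this AR(1) example follows: w_{t−1} is added as a regressor and heteroscedasticity-robust standard errors are used, with the non-augmented projection and Newey-West standard errors shown for comparison. The autoregressive coefficient, sample size, and truncation rule are illustrative choices.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T, H, a = 500, 10, 0.9

# Illustrative AR(1) DGP
w = np.zeros(T)
for t in range(1, T):
    w[t] = a * w[t - 1] + rng.normal()

for h in range(H + 1):
    y = w[1 + h:]                                      # w_{t+h}
    X = np.column_stack([w[1:T - h], w[:T - h - 1]])   # w_t and the extra lag w_{t-1}
    aug = sm.OLS(y, sm.add_constant(X)).fit(cov_type="HC1")   # robust SEs, no HAC needed
    # Comparison: non-augmented LP with Newey-West standard errors
    nw = sm.OLS(y, sm.add_constant(w[1:T - h])).fit(
        cov_type="HAC", cov_kwds={"maxlags": h + 1}
    )
    print(h, round(a ** h, 2),
          round(aug.params[1], 2), round(aug.bse[1], 3), round(nw.bse[1], 3))
```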
For example, Lusompa (2021) provides a simple FGLS procedure that takes advantage of previous local projection stages to correct the h-th stage residuals. Specifically, the idea is to estimate the first local projection (i.e., for h = 0) as usual and collect, for use in subsequent stages, the residuals {ε̂_t} and the estimate of the impulse response coefficient, say β̂_0. For h = 1, construct the left-hand side variable ỹ_{t+1} = y_{t+1} − β̂_0 ε̂_t and obtain β̂_1 from the local projection based on ỹ_{t+1}. For h = 2, construct the left-hand side variable as ỹ_{t+2} = y_{t+2} − (β̂_0 ε̂_{t+1} + β̂_1 ε̂_t). Similar adjustments to the left-hand side variable are applied at subsequent horizons.
Lusompa (2021) shows that it is not necessary for the DGP to be a VAR for this procedure to correct for residual serial correlation (as long as the data are strictly stationary). Monte Carlo evidence in his paper shows that FGLS generates considerable gains in efficiency, especially when the data are highly persistent. Moreover, Lusompa (2021) also provides a bootstrap version using the score wild bootstrap (see Kline & Santos, 2012) and a version for structural multi-step inference. Below I set up the GMM version of the local projection estimator, which will perhaps make the underpinnings of this procedure easier to understand.
Recent work by Xu (2023) shows that in settings where the true lag order is unknown and possibly infinite, LPs are semiparametrically efficient as long as the number of lags used as controls is allowed to grow with the sample size. This means that the efficiency loss relative to VARs diminishes the more lags one includes, and it effectively vanishes in the limit. The paper then proposes two robust methods of inference.
Yet a fourth option consists of shrinking the variation of the local projection coefficients. By adding some mild constraints, one can make considerable efficiency improvements yet retain much of the flexibility of local projections. A VAR does this automatically; with LPs this can be done in a variety of ways (see, e.g., Barnichon & Brownlees, 2019; Barnichon & Matthes, 2018; Miranda-Agrippino & Ricco, 2021). As an illustration, panel (a) of Figure 1 compares the error bands computed with Newey-West versus lag augmentation using simulated data. Panel (b) fits a Gaussian Basis Function instead (such as the one proposed in Barnichon & Matthes, 2018), and shows error bands constructed using a direct GMM estimation with Newey-West robust standard errors. Panel (a) shows that, for this example, Newey-West and lag augmentation generate very similar (nearly indistinguishable) bands, as the theory predicts. Panel (b) shows that smoothing the LP responses can generate considerable reductions in uncertainty.

Several bootstrap methods have also been proposed in the literature. The basic idea is as follows. First, estimate a VAR and generate bootstrap replicates of the data with it. Second, estimate LPs on these bootstrap replicates. The bootstrap sample of LPs can then be used to construct inference, and the procedure can be paired with a wild bootstrap to correct for potential heteroscedasticity. For example, Montiel Olea & Plagborg-Møller (2021) propose a parametric wild bootstrap procedure where data are simulated from a VAR and local projections are then fitted to the simulated data to construct percentile-t confidence intervals. The reader should consult Montiel-Olea & Plagborg-Møller (2019) and Montiel Olea & Plagborg-Møller (2021) for a detailed presentation of these procedures.

Joint estimation with GMM
A useful way to think about the estimation of local projections is through the generalized method of moments (GMM), which then naturally accommodates IV estimation. Let y_{t+h} be an outcome variable observed at time t + h, and let s_t be a treatment/intervention/policy variable whose effect on the outcome at some point in the future we are interested in characterizing. Let x_t refer to a 1 × k vector of exogenous and pre-determined variables that include lags of the outcome and the treatment variable. Let z_t denote a 1 × l vector of instruments for s_t, which naturally includes x_t. When no instruments are available, then z_t = x_t, as one would have in a situation where identification is achieved by conditioning on a rich set of right-hand side variables via regression control.
Then, the h-th local projection in a linear model satisfies the moment condition:

E[ z_t' (y_{t+h} − s_t β_h − x_t γ_h) ] = 0;   h = 0, 1, ..., H.

Stacking the left-hand side variables into the (H + 1) × 1 vector Y_t(H) = (y_t, y_{t+1}, ..., y_{t+H})', the (H + 1) × 1 vector of impulse response coefficients β = (β_0, β_1, ..., β_H)' can be estimated as the solution to the GMM objective function:

min_{β, γ}  [ (1/T) Σ_t g_t(β, γ) ]' W_T [ (1/T) Σ_t g_t(β, γ) ],   (5)

where g_t(β, γ) stacks the moment conditions z_t'(y_{t+h} − s_t β_h − x_t γ_h) across horizons h = 0, ..., H. In the overidentified case, it is common to use the Newey-West version of the optimal weighting matrix W_T, that is, the inverse of a heteroscedasticity and autocorrelation consistent estimate of the long-run variance of g_t, but one could take advantage of the known structure of the residual correlation in a local projection using, for example, the continuously updated estimator of Hansen, Heaton & Yaron (1996). Importantly, under standard regularity assumptions, estimation with Equation 5 delivers an estimate of the covariance matrix for β, say Σ_β, which will turn out to have important uses to conduct simultaneous inference, as I show next. Panel (b) of Figure 1 combines the GMM expressions just presented and assumes that the coefficients of the impulse response (the β in the previous expression) can be well approximated by a Gaussian Basis Function (which depends on only three parameters), as in Barnichon & Matthes (2018).
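To convey the flavor of Equation 5 without the full GMM machinery, the following sketch estimates each β_h by least squares and then recovers the joint covariance matrix Σ_β by applying a Newey-West correction to the stacked per-period moment contributions; the objects β̂ and Σ_β computed here are reused in the inference sketches below. The DGP (an observed i.i.d. shock and an AR(1) outcome), the kernel, and its truncation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T, H = 400, 10

# Illustrative DGP: observed iid shock s_t moving a persistent outcome y_t
s = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + 0.5 * s[t] + rng.normal()

# Horizon-by-horizon LP of y_{t+h} on the (demeaned) shock, keeping the per-period
# moment contributions so the responses can be treated as one joint system.
n = T - H
S = s[:n] - s[:n].mean()
beta = np.empty(H + 1)
psi = np.empty((n, H + 1))                 # influence functions, one column per horizon
for h in range(H + 1):
    yh = y[h:h + n] - y[h:h + n].mean()
    beta[h] = (S @ yh) / (S @ S)
    v = yh - beta[h] * S                   # LP residual, MA(h) by construction
    psi[:, h] = S * v / (S @ S / n)        # per-period contribution to beta_h

# Newey-West (Bartlett kernel) long-run covariance of the stacked contributions
L = H + 1
Omega = psi.T @ psi / n
for l in range(1, L + 1):
    w = 1 - l / (L + 1)
    G = psi[l:].T @ psi[:-l] / n
    Omega += w * (G + G.T)
Sigma_beta = Omega / n                     # joint covariance matrix of beta_hat

print(np.round(beta, 2))
print(np.round(np.sqrt(np.diag(Sigma_beta)), 3))
```

The diagonal of Σ_β gives the usual per-horizon standard errors; the off-diagonal terms are what joint inference exploits.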
Of course, we do not need to be limited by linearity, and later I explore some natural nonlinear extensions, but then care must be observed in interpreting the local projection. A simplified example illustrates how this should be done in a nonlinear setting. Suppose the treatment enters through a nonlinear transformation, say y_{t+h} = g(s_t) β_h + x_t γ_h + v_{t+h}; then note that the impulse response is no longer unique: it will depend on the values of the benchmark (say, s_t = s_0) and of the treatment (say, s_t = s_1), so that R_{sy}(h) = [g(s_1) − g(s_0)] β_h. In this and in many alternative nonlinear specifications (some of which I discuss below), local projections remain linear in the parameters, so that the GMM objective function can be easily set up and estimated.

Joint inference and significance bands
Impulse response plots typically include error bands around the response estimates to provide a measure of estimation uncertainty. However, they are often misused to make inferential statements about the shape of the impulse response. This is problematic because impulse response coefficients are correlated with one another, as Jordà (2009) pointed out.
The problem is akin to using individual t-ratios instead of a χ²- or F-test to conduct a joint hypothesis test in a linear regression with correlated regressors. This section tackles these issues by building on the inferential methods presented earlier and constructing appropriate simultaneous inference bands. Ultimately, we want to display error bands that allow us to produce valid inferential statements under a variety of scenarios.

Simultaneous inference
Asymptotically and quite generally, we may assume that a vector of impulse response estimates is approximately normally distributed around the true responses with covariance matrix Σ_β. The joint null hypothesis H_0: β = 0 could be tested with the traditional Wald statistic based on the Mahalanobis distance (Mahalanobis, 1936), which turns out to be the sum of squared t-ratios when the elements of β̂ are standardized using Σ_β. This Wald statistic has an asymptotic χ² distribution with critical value d(H, α). Jordà (2009) then proposed constructing the individual critical values for the confidence interval of each β_h by using Scheffé's S-method (see Scheffé, 1953), which in this example turns out to be d(H, α)/H. That is, Scheffé's S-method leads to more conservative error bands, but ones that have the correct coverage for any hypothesis test on the impulse response that can be expressed in linear form (such as joint significance).
However, Montiel-Olea & Plagborg-Møller (2019) provide a more elegant solution. The idea is to provide bands that can accommodate a variety of hypotheses of interest while providing the desired nominal coverage, say, with at least probability 1 − α. Specifically, one constructs an interval for each element of the response vector such that, in the worst-case scenario, the null hypothesis for the element that is farthest from the estimate still has the desired nominal coverage 1 − α. This is called the sup-t procedure, and Montiel-Olea & Plagborg-Møller (2019) show that it provides tighter bands than the Scheffé S-method.

Here is how it works. Suppose, as before, that the estimates of the impulse response of interest are such that β̂ → N(β, Σ_β). This asymptotic approximation can be justified under a variety of rather general assumptions that apply to most situations observed in practice.

Hence define the auxiliary vector η = (η_0, ..., η_H)' with elements η_h = (β̂_h − β_h)/σ̂_h, where σ̂_h denotes the standard error of β̂_h. The idea is to find the smallest critical value c such that the collection of intervals around the response estimates satisfies:

P( β_h ∈ [β̂_h − c σ̂_h, β̂_h + c σ̂_h] for all h = 0, ..., H ) ≥ 1 − α,

which amounts to finding the 1 − α quantile of max_h |η_h|. Alas, there are no tabulated values for the distribution of the maximum element of a normally distributed vector, so the critical value has to be constructed via Monte Carlo simulation: draw repeatedly from a normal distribution with mean zero and covariance equal to the correlation matrix implied by Σ_β, record the maximum absolute element of each draw, and take the 1 − α empirical quantile of these maxima as ĉ. Based on this principle, Montiel-Olea & Plagborg-Møller (2019) in addition provide bootstrap and Bayesian methods. The main advantage of constructing error bands in this manner is that inference on a subset of impulse response coefficients (e.g., are the coefficients for horizons 3 to 6 different from zero?) will be correct. Of course, this comes at the cost of more conservative bands.
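The simulation step is only a few lines; the sketch below reuses β̂ and Σ_β from the GMM sketch above, with the number of draws and the coverage level chosen purely for illustration.

```python
import numpy as np

# sup-t critical value by simulation: draw from the asymptotic distribution of the
# standardized estimates and take the 1 - alpha quantile of the maximum absolute element.
alpha = 0.05
se = np.sqrt(np.diag(Sigma_beta))
corr = Sigma_beta / np.outer(se, se)
rng = np.random.default_rng(11)
draws = rng.multivariate_normal(np.zeros(len(beta)), corr, size=10_000)
c_supt = np.quantile(np.abs(draws).max(axis=1), 1 - alpha)
lower, upper = beta - c_supt * se, beta + c_supt * se    # simultaneous 1 - alpha bands
print(round(c_supt, 2))
```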

Significance
In many applications, it is common to see an impulse response with error bands that straddle the zero line.Many authors therefore conclude that the response is not significant even though in many of these situations the response is uniformly positive (or negative).
An example is provided in Figure 2. The figure shows the response of 100 times the log consumer price index (CPI) in the U.S. to a Romer & Romer (2004) shock over the sample 1969:Q1-2007:Q4. The specification includes four lags of CPI inflation, real GDP growth, the federal funds rate, and the Romer and Romer shock itself.
The impulse response displayed in Figure 2 is typical of many applications. It shows a time profile that is roughly zero for about one year and negative over the remaining four years.
The point-wise error bands (shown at the 95% confidence level) straddle the zero line, thus leading many researchers to conclude that the impulse response is not significant. However, as the figure shows, a joint test of the null that all the response coefficients are zero is easily rejected (with a p-value of 8.07e-28). To make the point clearer, Figure 2 also displays two dashed lines. These are calculated by inverting the statistic for the null that all impulse response coefficients are zero. That is, I display approximate 95% significance bands constructed as ±σ̂_h d(H, α)/H since, under the null, the coefficient estimates are approximately uncorrelated. Note that in the figure, for about the first two years, the impulse response lies largely within these significance bands, but it clearly strays outside thereafter, thus confirming the result of the p-value (8.07e-28) reported for the joint test of significance.
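For completeness, the joint Wald test and the significance bands just described can be computed as follows, again reusing β̂ and Σ_β from the earlier sketch; the band formula simply mirrors the expression above.

```python
import numpy as np
from scipy.stats import chi2

n_coef = len(beta)                                  # number of response coefficients
wald = float(beta @ np.linalg.solve(Sigma_beta, beta))
p_value = chi2.sf(wald, df=n_coef)                  # joint test that the whole path is zero
d = chi2.ppf(0.95, df=n_coef)                       # d(H, alpha)
band = np.sqrt(np.diag(Sigma_beta)) * d / n_coef    # +/- significance band, as in the text
print(round(wald, 1), p_value, np.round(band, 2))
```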
Economically, the impulse response displayed shows that the CPI is about 2 percent lower after five years, or a decline in CPI inflation of about 0.4 percentage points per year on average over the five years, a non-negligible effect in economic terms even if the individual response coefficients are imprecisely estimated.
What explains this disparity? As explained earlier, impulse response coefficients are highly correlated. As in regressions with near-collinearity, individual t-statistics are not significant, but a joint test of significance overwhelmingly rejects the null. A good practice is therefore to report the joint test, which more closely corresponds to the scientific test of the hypothesis that the intervention has no effect on the outcome.

Identification
In a typical local projection of the form:

y_{t+h} = s_t β_h + x_t γ_h + v_{t+h};   h = 0, 1, ..., H,

with x_t containing exogenous and pre-determined variables, OLS estimates will be consistent as long as variation in s_t is exogenous given x_t. For example, the Cholesky identification assumption common in the VAR literature amounts to including in x_t the contemporaneous values of the system variables that are causally ordered first. However, since the goal is to ensure that variation in s_t is as good as exogenous, it seems that the safest route in general is to include all available information to ensure orthogonality of s_t, regardless of the position of s_t in the causal order.
Similarly, identification with other methods common in the VAR literature (such as long-run identification restrictions or sign-based identification, for example) can be easily incorporated, as shown in Plagborg-Møller & Wolf (2021). I refer the reader to their paper for more details.
As previewed in the introduction, one of the strong points of the policy evaluation literature is the emphasis on causation and hence on providing additional ways to approach identification. Expanding the idea behind regression control, I discuss in Section 8 inverse propensity score weighting, where the control for x_t is allowed to be semi-parametric (see, e.g., Hirano, Imbens & Ridder, 2003; Angrist, Jordà & Kuersteiner, 2018; Jordà & Taylor, 2016), based on ideas first discussed in Horvitz & Thompson (1952).
However, perhaps the more typical approach to identification in regression is the use of instrumental variables, which can control for endogeneity. I have sprinkled references to identification with instrumental variables at several points in the previous sections, in particular when discussing how to estimate local projections using GMM in Section 5. That said, it is useful to state formal conditions for when this approach is appropriate. Specifically, denote the vector of instruments for the intervention variable s_t as z_t. Further denote z^P_t = z_t − P(z_t | x_t) and similarly s^P_t = s_t − P(s_t | x_t), where P(w|v) means the projection of w onto v. The first condition is that the instruments must be relevant, that is:

• Relevance: E(s^P_t z^{P'}_t) ≠ 0.

Next, we need the instruments to be exogenous. The exogeneity condition in a local projections setting is slightly different than usual due to the dynamic structure of the problem. It can be stated as:

• Lead-lag exogeneity: E(v_{t+h} z^{P'}_t) = 0 for all h.

Stock & Watson (2018) and Plagborg-Møller & Wolf (2021) discuss these conditions in greater detail and provide more formal statements, though the main thrust of what is needed for IV estimation is summarized by the relevance and lead-lag exogeneity conditions just presented. Just as one can show the equivalence between VAR and LP responses, one can also show the equivalence of SVAR-IV and LPIV (see, e.g., Mertens & Montiel-Olea, 2018). I conclude this section with a brief statement about an advantage of local projections over VARs highlighted by these authors. It consists in noting that, although invertibility is necessary for proxy VARs (as VARs identified using instrumental variables are typically referred to), this condition is not required for local projections. Invertibility essentially means that the structural residuals can be recovered from the reduced-form residuals. The condition usually fails when the span of the reduced-form residuals is smaller than the span of the structural shocks, as is common, for example, in models with news about future shocks.

The dynamic and static effects of a policy intervention
In order to draw a closer link to the policy evaluation literature, I draw on variations of a simple model involving an outcome and a binary intervention or treatment. Suppose that y is an outcome variable of interest and s is a latent variable that determines policy/treatment/intervention according to I(s_t) = I_t ∈ {0, 1}. I leave the rule implicit in I_t undefined for the moment, though a simple example would be I_t = I(s_t > c) for some c, as is common when estimating a logit or probit model. In some settings, I take s to be directly observable. In such cases, clearly s ≠ 0 can be directly interpreted as the dose given to a treated unit. Further, suppose that:

y_t = ρ_yy y_{t−1} + β I_t + u^y_t,
s_t = ρ_sy y_{t−1} + ρ_ss s_{t−1} + u^s_t.   (6)

This expression has several useful features. First, it is written in structural form. The residuals u^y_t and u^s_t are orthogonal to each other (explaining the switch from the ε_t to the u_t notation). Second, the outcome variable and the policy variable allow for possible internal propagation dynamics. Third, interventions can be thought of as randomly assigned when ρ_sy = 0. When ρ_sy ≠ 0, interventions are endogenously determined by previous outcome values. Several interesting cases can be studied using this simple model.

No serial correlation.
If ρ_ij = 0 for i, j = s, y, there are no internal propagation dynamics. A sample of t = 1, ..., T observations behaves like a cross section. Hence, the effect of an intervention is β on impact and zero thereafter. This effect can be estimated as we would in a cross section, that is, by taking the following difference in sample means:

β̂ = (1/N_1) Σ_{t: I_t = 1} y_t − (1/N_0) Σ_{t: I_t = 0} y_t,   (7)

where N_1 and N_0 denote the number of treated and untreated observations, respectively. Of course, this could be estimated simply with the local projection consisting of regressing y_t on I_t. The estimate of the constant term would be the mean for the untreated units and the coefficient on I_t would be the effect of the intervention, β. It is easy to recognize from these two expressions the parallels with how one would estimate the treatment effect in a randomized controlled trial (RCT).
Using the potential outcomes notation, one would conjecture that the observed data come from a mixture distribution of two unobserved latent variables, y_t(1) for observations in the "treated" subpopulation and y_t(0) for the "control" subpopulation. Specifically, the observed data are y_t = y_t(1) I_t + y_t(0)(1 − I_t). Since y_t(j) for j = 0, 1 are not directly observable for each element of the sample, a quantity of interest is usually the average treatment effect, defined as τ(0) = E[y_t(1) − y_t(0)], which under random assignment can be directly estimated with Equation 7.

Serially correlated outcomes.
Suppose instead that ρ_yy = ρ ≠ 0 but ρ_ss = ρ_sy = 0, so that the outcome has internal propagation dynamics while the intervention remains randomly assigned and serially uncorrelated. In that case R_Iy = (β, βρ, ..., βρ^h, ...)'. That is, the intervention β is propagated by the internal dynamics of the outcome, but the assignment of the intervention I_t is still random. In principle, we can use the same difference in means as in the previous expression.
However, to improve the efficiency of the estimator, we would want to take advantage of regressing the outcome on y_{t−1} first since, in general:

y_{t+h} = ρ^h β I_t + ρ^{h+1} y_{t−1} + v_{t+h},

which is just the local projection of y_{t+h} on I_t and y_{t−1}. The residuals contain terms associated with future interventions or shocks. Under our assumptions, these are as good as randomly assigned, so they do not cause an inconsistency in the local projection estimate of ρ^h β. However, if the I_{t+h} are observable (and under the maintained assumptions for this example), nothing prevents one from including them as regressors (by construction they are uncorrelated with the u^y_{t+j} for any j), so that the local projection one would estimate becomes:

y_{t+h} = a_h I_t + a_{h−1} I_{t+1} + ⋯ + a_0 I_{t+h} + c_h y_{t−1} + v_{t+h},   v_{t+h} ∼ MA(h),

with a_j = ρ^j β for j = 0, ..., h and c_h = ρ^{h+1}, and where I use the shorthand MA(h) to indicate that the residuals have a moving-average structure of order h. In this case, we can therefore estimate the impulse response with a single local projection set at the desired length, that is, R̂_Iy = (â_0, ..., â_h)'.
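The following simulation sketch illustrates the point for this example: under random assignment, the horizon-h coefficient on I_t is roughly ρ^h β whether or not the future interventions I_{t+1}, ..., I_{t+h} are included as regressors. The parameter values and sample size are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
T, H, rho, beta = 2000, 5, 0.7, 1.0

# Illustrative DGP: persistent outcome, randomly assigned binary intervention
I = (rng.uniform(size=T) < 0.3).astype(float)
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + beta * I[t] + rng.normal()

for h in range(H + 1):
    yh = y[1 + h:]                                           # y_{t+h}
    It, ylag = I[1:T - h], y[:T - h - 1]                     # I_t and y_{t-1}
    futures = [I[1 + j:T - h + j] for j in range(1, h + 1)]  # I_{t+1}, ..., I_{t+h}
    b_short = sm.OLS(yh, sm.add_constant(np.column_stack([It, ylag]))).fit().params[1]
    b_long = sm.OLS(yh, sm.add_constant(np.column_stack([It, ylag] + futures))).fit().params[1]
    print(h, round(rho ** h * beta, 2), round(b_short, 2), round(b_long, 2))
```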
How would one approach estimating the average treatment effect in a similar fashion to Equation 7? Note that for this example one would be interested in τ(h) = E[y_{t+h}(1) − y_{t+h}(0) | Λ_{t+h}], where Λ_{t+h} = {I_{t+h}, ..., I_{t+1}; y_{t−1}} and, based on our example, y_{t−1} is a summary statistic for the effects of previous interventions. Conditioning on future treatments isolates the effect of the current treatment. Thus, let y_{t+h|Λ} denote the value of y_{t+h} conditional on Λ_{t+h} (say, from a regression of y_{t+h} on Λ_{t+h}); then an alternative estimate of the impulse response is:

R̂_Iy(h) = (1/N_1) Σ_{t: I_t = 1} y_{t+h|Λ} − (1/N_0) Σ_{t: I_t = 0} y_{t+h|Λ},

with N = N_0 + N_1. In a moment, the usefulness of this derivation will become apparent.

Serially correlated interventions
Suppose that ρ_ss = ρ ≠ 0 but ρ_yy = ρ_sy = 0. In this case interventions are serially correlated, but still randomly assigned. When a unit receives an intervention, it is likely that it will receive interventions in the next few periods since I_{t+h} = I(ρ^h s_t + u^s_{t+h} + ρ u^s_{t+h−1} + ⋯ + ρ^{h−1} u^s_{t+1}). Interventions are still as good as randomly assigned; however, the usual local projection in this case would include past values of the intervention as right-hand-side variables, that is:

y_{t+h} = β_h I_t + γ_{1h} I_{t−1} + γ_{2h} I_{t−2} + ⋯ + v_{t+h}.

In light of the previous example, it is natural to ask why one would not also include future values of the intervention as regressors, as is done in the definition of Λ_{t+h}. Recent papers from the difference-in-differences literature argue that local projections estimate the wrong object for this reason (see, e.g., De Chaisemartin & D'Haultfoeuille, 2022). However, this is just a confusion about the object of interest. In a typical impulse response, the effect of the intervention accommodates the possibility that future interventions will subsequently be administered with some probability, as is the case when s_t is serially correlated. That is, the usual "macroeconomics" response answers the question: if there is an intervention at time t, what is the likely effect on the outcome, recognizing that the intervention itself generates subsequent interventions? This is the effect we most likely see in the data. However, conditioning on future interventions is also valid but answers a different question, that of the effect of a one-off intervention.
De Chaisemartin & D'Haultfoeuille (2022) and others are interested in the effect of the intervention in isolation from any subsequent potential intervention. This is an equally legitimate question to ask. And here, once again, we can make a connection to a literature in applied macroeconomics that studies the fiscal multiplier (see, e.g., Mountford & Uhlig, 2009; Ramey, 2016; Ramey & Zubairy, 2018), as I will show.
That said, a key observation is worth noting. In panel data settings where treatment effects may be heterogeneous across units, the difference between these two approaches matters. In a traditional time series setting, an implicit yet critical assumption is that the effect of subsequent treatments is homogeneous, that is, the specific time at which treatment is administered does not alter the treatment effect, all else equal. In the burgeoning literature on difference-in-differences (DiD) estimation (see Roth, Sant'Anna, Bilinski & Poe, 2022, for an overview), it is becoming standard to assume that treatment effects are heterogeneous. I will return to this issue below.
By the same token, an issue often overlooked in the DiD literature is the role of expectations. That is, in a setting where agents expect interventions to follow after the initial (and possibly randomly assigned) intervention, their behavior will take such an eventuality into account. Thus, conditioning on past information and on future treatments will not completely account for the effect of expectations, except in situations where agents are completely backward looking, for example. It seems safer to instead adopt the standard macroeconomic practice of reporting the impulse response without removing the effect of future interventions, and to focus instead on measuring multiplier effects, as discussed earlier.
Summarizing, when interventions are serially correlated, an intervention today will likely be followed by subsequent interventions. The traditional impulse response measures the effect on the outcome of the entire intervention plan, that is, the intervention implemented today and the set of subsequent interventions expected to follow due to serial correlation. This is the effect we are likely to see in the data. Thus, a practitioner may well be interested in calculating a multiplier consisting of the sum total of the effect of the intervention plan on the outcome over some horizon, divided by the sum total of the interventions over that same horizon, as is done in the calculation of m_h in Section 3.
The policy evaluation literature tends to focus simply on the effect of the initial intervention by sterilizing the effect of subsequent interventions. In this setting, Angrist et al. (2018) show that a doubly robust estimate of the impulse response is:

R̂_Iy(h) = (1/N) Σ_t [ I_t y_{t+h|t−1} / p̂_t − (1 − I_t) y_{t+h|t−1} / (1 − p̂_t) ],

where p̂_t denotes the estimated propensity score of treatment at time t and N has been defined before. The doubly robust feature is reflected in the fact that the notation y_{t+h|t−1} indicates a regression of the outcome on past information.
That is, one controls via regression and via the propensity score. In practice, there are more efficient ways of estimating the model using doubly robust, inverse propensity score weighting. Importantly, standard errors should be adjusted for the first-stage estimation uncertainty in p̂_t. Of course, this could be done with, for example, a paired bootstrap.

The Kitagawa decomposition
In the previous examples, treatment/intervention is conveniently assigned at random, a situation rarely encountered in practice with observational data. With random assignment, covariates provide tighter, more efficient estimates of the treatment effect, but otherwise, whether they are included or not has no effect on bias. However, this view assumes that the influence of the covariates on the outcome remains impervious to treatment. This is implicitly assumed in a VAR. What if this assumption is wrong? What if the manner in which a covariate interacts with the outcome depends on whether treatment is administered or not?
In applied microeconomics, one can account for how covariates and treatment interact using a decomposition first proposed by the sociologist Evelyn Kitagawa (Kitagawa, 1955) and introduced to economics by Oaxaca (1973) and Blinder (1973). An extensive review of this decomposition is provided in Fortin, Lemieux & Firpo (2011). I hence refer to it as the Kitagawa decomposition. It turns out that the Kitagawa decomposition provides a natural way of thinking about how to stratify local projections and even estimate time-varying local projections while still using simple regression analysis. The results that I present next are based on Cloyne, Jordà & Taylor (2023).
Let me start with a simple cross-sectional setting first, with as stripped-down a notation as possible. Without loss of generality, one can write y(j) = μ_j + v_j for j = 0, 1, the two potential outcomes (1 for treated, 0 for control), where E(v_j) = 0. Covariates introduce heterogeneity. A simple way to model this heterogeneity is by assuming that v_j = (x − E(x)) γ_j + ε_j, which ensures E(v_j) = 0 by assuming that E(ε_j) = 0. Here x refers to a vector of exogenous or predetermined variables, which could include lags of the outcome and the treatment variables. Using the same notation as earlier, let I(s_t) = I_t ∈ {0, 1} denote the treatment indicator, which I will denote simply as I when the subscript is redundant for understanding the main ideas. Hence, the average treatment effect (under linearity) can be written as:

E[y | I = 1] − E[y | I = 0] = (μ_1 − μ_0) + E[x − E(x) | I = 1] γ_1 − E[x − E(x) | I = 0] γ_0.

Note that E(ε_j | I = j) = 0 for j = 0, 1 by assumption. Further, by adding and subtracting E[x − E(x) | I = 1] γ_0, the previous expression can be rearranged into:

E[y | I = 1] − E[y | I = 0] = (μ_1 − μ_0) + E[x − E(x) | I = 1] (γ_1 − γ_0) + { E[x − E(x) | I = 1] − E[x − E(x) | I = 0] } γ_0.   (8)

Equation 8 hence decomposes the effect of treatment into three components: (1) a direct effect coming from the difference in unconditional means between treated and control subpopulations; (2) an indirect effect due to differences in the manner the covariates affect the outcome, which leads to the natural hypothesis H_0: γ_1 = γ_0; and (3) a composition effect due to the fact that, in small samples, random assignment is imperfect. A test of the balance condition (if assignment is truly random, the means of the covariates should be the same in the treated and control subpopulations) is therefore a test of the null H_0: μ^x_1 = μ^x_0. Based on these standard derivations, Cloyne et al. (2023) show that, under fairly general assumptions, these three effects can be obtained from the augmented local projection:

y_{t+h} = μ^0_h + β_h I_t + (x_t − x̄) γ^0_h + I_t (x_t − x̄)(γ^1_h − γ^0_h) + v_{t+h},   (9)

for h = 0, 1, ..., H and t = h, ..., T, where v_{t+h} is a residual term. Based on this regression, which is linear in the parameters, one can calculate the three elements of the Kitagawa decomposition: the direct effect, μ̂^1_h − μ̂^0_h = β̂_h; the indirect effect, given by the interaction coefficients (γ̂^1_h − γ̂^0_h) evaluated at the treated units' average covariate values; and the composition effect, given by the difference in covariate means between treated and control units weighted by γ̂^0_h. To see how these pieces matter in a dynamic setting, consider a stylized example in which the treatment s_t is a fiscal consolidation with persistence ρ_s and impact effect β, and x_t denotes the monetary policy stance, with persistence ρ_x and interaction effect θ. If the monetary stance is neutral (x_t = 0), or if γ_0 = γ_1, then R_sy(h) = β ρ_s^h, the usual impulse response in a linear model. However, if the stance is not neutral (x_t ≠ 0) and γ_0 ≠ γ_1, then the term θ ρ_x^h x_t will modulate the initial response to a fiscal consolidation. It is easy to see that the local projection in Equation 9 would deliver direct estimates with which to construct R_sy(h) for any value of x_t.
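As a minimal illustration of the mechanics of Equation 9, the sketch below simulates a made-up DGP in which a randomly assigned treatment interacts with a persistent covariate, estimates the interacted local projection horizon by horizon, and reports the implied response at a neutral covariate value and one unit above it. The DGP, parameter values, and lag choices are illustrative assumptions, not anything drawn from Cloyne et al. (2023).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
T, H = 1000, 6

# Illustrative DGP: random binary treatment I_t, persistent covariate x_t, and an
# outcome whose loading on x_t differs with treatment status.
I = (rng.uniform(size=T) < 0.5).astype(float)
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.normal()
    y[t] = 0.6 * y[t - 1] + 0.5 * I[t] + (1.0 + 0.8 * I[t]) * x[t] + rng.normal()

xt = x - x.mean()
for h in range(H + 1):
    yh = y[1 + h:]                                   # y_{t+h}
    Ih, xh, ylag = I[1:T - h], xt[1:T - h], y[:T - h - 1]
    X = sm.add_constant(np.column_stack([Ih, xh, Ih * xh, ylag]))
    fit = sm.OLS(yh, X).fit(cov_type="HAC", cov_kwds={"maxlags": h + 1})
    beta_h, theta_h = fit.params[1], fit.params[3]
    # Response to treatment when the covariate is at its mean vs. one unit above it
    print(h, round(beta_h, 2), round(beta_h + theta_h, 2))
```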
Figure 3 provides an example of the type of analysis that the Kitagawa decomposition allows. It is based on a simulation of the simple example just discussed. Panel (a) of the figure shows that, for each observation in the sample, x_t attains a different value, which in turn accentuates or attenuates the response. This is most visible for the response to s_t on impact (the solid blue line). To reinforce this point, panel (b) shows the response when x_t = 0 in solid blue, with 2 standard error confidence bands. The response is purposely designed to be almost zero. However, note that depending on the value of the second treatment variable, x_t, the response can be greatly accentuated (nearly 2 on impact) or greatly attenuated (nearly −2 instead).
At this point, it is helpful to draw the connection to the multiplier calculation reported earlier. Let us focus on the fiscal experiment when the monetary stance is neutral, i.e., x_t = 0. In that case, R_sy(h) = β ρ_s^h, and the multiplier m_h is easily seen to be the same as that calculated in Section 3, that is, m_h = β. However, when x_t ≠ 0, then R_sy(h) = β ρ_s^h + θ ρ_x^h x_t, and in this case the multiplier is:

m_h(x_t) = (Σ_{j=0}^{h} R_sy(j)) / (Σ_{j=0}^{h} R_ss(j)) = β + K_h(x_t),   where K_h(x_t) ≡ θ x_t (Σ_{j=0}^{h} ρ_x^j) / (Σ_{j=0}^{h} ρ_s^j).

In other words, the earlier equivalency between the multiplier and the average treatment effect breaks down since now m_h(x_t) = β + K_h(x_t), which is a function of x_t.

Panel data local projections
The ability to estimate impulse responses with univariate regression greatly facilitates their calculation in panel data settings. Given a sample of i = 1, ..., n units observed over t = 1, ..., T periods, the local projection can be written as:

y_{i,t+h} = α^h_i + δ^h_t + β_h s_{it} + x_{it} γ_h + v_{i,t+h};   h = 0, 1, ..., H,

where α_i are unit fixed effects and δ_t are time fixed effects. Because lags of the endogenous variable are often included in x_{it}, potential incidental parameter biases could arise in short panels with a highly serially correlated endogenous variable (see, e.g., Álvarez & Arellano, 2003). In such situations, an Arellano-Bond estimator or subsequent refinements are recommended (see, e.g., Arellano & Bond, 1991; Arellano & Bover, 1995; Blundell & Bond, 1998).
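For concreteness, a bare-bones version of this panel local projection is sketched below, with unit and time fixed effects entered as dummy variables and standard errors clustered by unit. The panel dimensions, the randomly assigned intervention, and its effect size are invented for illustration; in the short-T, persistent-outcome case discussed above, an Arellano-Bond-type estimator would be used instead.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
N, T, H = 50, 40, 5

# Illustrative panel DGP with unit and time effects and a random intervention s_it
df = pd.DataFrame([(i, t) for i in range(N) for t in range(T)], columns=["unit", "time"])
alpha = rng.normal(size=N)[df["unit"]]
delta = rng.normal(size=T)[df["time"]]
df["s"] = rng.normal(size=N * T)
df["y"] = alpha + delta + 0.5 * df["s"] + rng.normal(size=N * T)

irf = []
for h in range(H + 1):
    d = df.copy()
    d["y_lead"] = d.groupby("unit")["y"].shift(-h)        # y_{i,t+h}
    d = d.dropna()
    fit = smf.ols("y_lead ~ s + C(unit) + C(time)", data=d).fit(
        cov_type="cluster", cov_kwds={"groups": d["unit"]}
    )
    irf.append(fit.params["s"])
print(np.round(irf, 2))    # with this static DGP: about 0.5 at h = 0, near zero after
```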
Panels, in principle, offer opportunities to take advantage of the cross-sectional and time-series dimensions to adjust standard errors for serial correlation and potential heteroscedasticity. Intuitively, clustering by unit/group uses the cross-sectional dimension to calculate autocovariances, thus adjusting for serial correlation non-parametrically while also adjusting for clustering and heteroscedasticity. Clustering by time exploits the time dimension to construct residual-variance estimates that vary by unit, thus correcting for heteroscedasticity non-parametrically.
However, the literature on clustered standard errors is rapidly evolving (see, e.g., Abadie, Athey, Imbens & Wooldridge, 2023). For example, Petersen (2009) emphasizes clustering by unit rather than using Driscoll-Kraay standard errors (Driscoll & Kraay, 1998), the panel version of a Newey-West standard error, which emphasizes large-T, small-N asymptotics. Clustering by group relies on having a large number of groups so that the asymptotic approximation works in favor of clustering over Driscoll-Kraay. That said, Petersen (2009) finds the biases of Driscoll-Kraay to be relatively small in many situations.
For short-T panels, Monte Carlo evidence seems to indicate that it is sufficient to use time fixed effects and one-way clustering. Cameron, Gelbach & Miller (2008, 2011) and Cameron & Miller (2015) emphasize that with small numbers of clusters, cluster-robust inference can be wildly incorrect (i.e., problems arise with small N, regardless of T). In particular, simulation evidence in Cameron & Miller (2015) shows that there can be significant distortions, leading them to recommend bootstrap-based procedures (see also MacKinnon, Nielsen & Webb, 2022). Generally speaking, cluster-robust standard errors (and two-way clustering in particular) are highly sensitive to having a sufficient number of groups and time periods for the asymptotic theory to provide a good approximation.
To my knowledge, there is no theoretical result yet justifying lag-augmentation procedures similar to those discussed earlier as a possible alternative or complement, though proving such a result seems possible. As an example of an application of local projections in panels, I now discuss recent work by Dube, Girardi, Jordà & Taylor (2023).
Difference-in-differences with multiple treated groups and treatment periods

It has been well documented (see, e.g., Callaway & Sant'Anna, 2021; De Chaisemartin & d'Haultfoeuille, 2020; Sun & Abraham, 2021; Goodman-Bacon, 2021) that in either static or distributed-lag specifications with multiple treated groups and treatment periods and heterogeneous treatment effects, the traditional two-way fixed effects (TWFE) estimator can be severely biased. This is true even when parallel trends holds, once staggered treatment effects are dynamic and possibly heterogeneous.
Previously treated units are invalid controls for currently treated units, which creates problems in distributed-lag specifications. However, this is easily handled with local projections by using the clean control condition of Cengiz, Dube, Lindner & Zipperer (2019).
In particular, let P_t = 0 for any period before the intervention and 1 thereafter, and let A_i = 0 for an untreated unit and 1 for a treated unit. Hence define D_{it} = P_t × A_i. In a simple setting with no covariates, the difference-in-differences estimator of dynamic treatment effects can be obtained from:

y_{i,t+h} − y_{i,t−1} = δ^h_t + β_h ΔD_{it} + v_{i,t+h};   h = 0, 1, ..., H,

by restricting the sample to observations that are either:

• newly treated: ΔD_{it} = 1, or
• clean controls: ΔD_{i,t+k} = 0 for k = −H, ..., h.
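A minimal simulation sketch of this estimator follows: a staggered-adoption panel with heterogeneous, permanent treatment effects is generated, and β_h is estimated on the restricted sample of newly treated observations and clean controls. The panel size, adoption dates, effect sizes, and the pooled OLS implementation with time dummies are illustrative simplifications, not the exact implementation in Dube et al. (2023).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
N, T, H = 200, 30, 4

# Illustrative staggered-adoption DGP: half the units adopt at a random date, with a
# permanent, unit-specific effect on the outcome thereafter; the rest are never treated.
rows = []
for i in range(N):
    adopt = rng.integers(10, 25) if i < N // 2 else np.inf
    effect = rng.normal(1.0, 0.3)                    # heterogeneous treatment effect
    for t in range(T):
        D = float(t >= adopt)
        rows.append((i, t, D, rng.normal() + effect * D))
df = pd.DataFrame(rows, columns=["unit", "time", "D", "y"]).sort_values(["unit", "time"])
df["dD"] = df.groupby("unit")["D"].diff()

irf = []
for h in range(H + 1):
    d = df.copy()
    d["dy"] = d.groupby("unit")["y"].shift(-h) - d.groupby("unit")["y"].shift(1)
    # Clean-control condition: no change in treatment status over the window k = -H,...,h
    window = [d.groupby("unit")["dD"].shift(-k).fillna(0) for k in range(-H, h + 1)]
    clean = np.all([w == 0 for w in window], axis=0)
    d = d[(d["dD"] == 1) | clean].dropna(subset=["dy", "dD"])
    fit = smf.ols("dy ~ dD + C(time)", data=d).fit(
        cov_type="cluster", cov_kwds={"groups": d["unit"]}
    )
    irf.append(fit.params["dD"])
print(np.round(irf, 2))    # should hover around the average effect of roughly 1.0
```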
The key advantage of local projections over distributed-lag TWFE estimators is that the differencing is in the outcomes, not in the treatments. Dube et al. (2023) show how the same estimator can be obtained by defining a dummy variable that equals 1 for unclean controls and appropriately interacting this dummy variable with the regressors. I refer the reader to the original paper for more details.
Simulation evidence shows that the local projections approach is easier to implement, is computationally much faster (which is important when using simulation-based inference, such as the bootstrap), and provides consistent estimates of the treatment effects. Thus, this example shows that there are potentially many gains from incorporating local projections in other common situations in applied microeconomics where treatments may have effects over more than one period. Examples may include regression discontinuity designs, synthetic control, and so on.

Conclusion
In this review, I have focused on presenting the basic tools of local projections estimation so as to establish the nexus between applications in applied macro- and applied microeconomics.
That said, of necessity there is a great deal that fell on the editing floor. Examples include nonlinear applications of local projections, such as to binary dependent data (see, e.g., Ferrari Minesso, Lebastard & Le Mezo, 2022) and quantile regression (see, e.g., Jordà, Kornejew, Schularick & Taylor, 2022); Bayesian estimation of local projections (see, e.g., Tanaka, 2020; Miranda-Agrippino & Ricco, 2021); and smoothing and shrinkage methods (see, e.g., Barnichon & Brownlees, 2019; Barnichon & Matthes, 2018), to name a few. More importantly, I have argued that impulse responses and dynamic treatment effects are close relatives, to the point that local projections can offer a bridge between two literatures that up to this point appear to have developed quite separately from one another: time series analysis and methods in applied microeconomics research. The review showcases several examples where these literatures intersect. The hope is that this review will spur much more research at this intersection.


Figure 2: Response of inflation to a Romer and Romer monetary shock.


Figure 3: Kitagawa decomposition of the impulse response.