Author: XIAN FORTES ÁLVAREZ
You can download the pdf of the article below.
ABSTRACT
This paper examines the viability of a forecast-based counterfactual approach, specifically causal ARIMA (C-ARIMA) modeling, for evaluating the short-run economic effects of the 2018 U.S.–China trade war. Focusing on key U.S. macroeconomic indicators, we construct time series forecasts based on pre-trade war data to estimate what outcomes would have occurred in the absence of policy intervention. By comparing these counterfactual forecasts to observed post-tariff outcomes, we aim to isolate the immediate causal effect of the trade shock without relying on structural or cross-sectional assumptions. However, diagnostic tests reveal residual autocorrelation in some series, indicating potential model misspecification and calling for caution in interpreting the results. These findings underscore both the promise and the pitfalls of using low-assumption time series tools for policy evaluation. Despite these limitations, C-ARIMA still offers valuable insights into trade war effects at a time when new concerns about commercial tensions have emerged.
1. Introduction
In April 2025, the United States announced a new wave of tariffs targeting imports from China, reigniting concerns about a renewed trade war between the world’s two largest economies. With memories of the 2018 U.S.–China trade conflict still fresh, policymakers, businesses, and researchers are urgently seeking to understand the potential economic consequences of escalating trade tensions. The 2018 episode provides a critical point of reference, offering a real-world case study of how trade barriers can disrupt key sectors of the economy in the short run. However, accurately isolating the causal effects of such large-scale policy interventions remains a methodological challenge, particularly when the effects are rapid and intertwined with broader economic fluctuations.
The U.S.–China trade war formally began in mid-2018, when the United States imposed a series of tariffs on Chinese goods, citing concerns over intellectual property practices and trade imbalances. China quickly retaliated with its own tariffs on American exports. Over the next year, the two countries engaged in a tit-for-tat escalation, affecting hundreds of billions of dollars in traded goods. Key moments in this chronology include the initial $34 billion in tariffs implemented in July 2018, followed by several rounds of tariff increases through late 2019. These policy moves delivered sharp shocks to the global economy, disrupted supply chains, and created a climate of uncertainty that weighed heavily on business investment and manufacturing activity.
Understanding the causal impact of such trade policy actions is critical for effective policymaking. Counterfactual analysis, that is, assessing what would have happened in the absence of the trade war, provides an essential tool for identifying the true economic effects, separate from unrelated trends or shocks. Yet constructing credible counterfactuals is often complicated by the complexity of global economic dynamics and the absence of natural experimental conditions.
In this study, we employ a forecast-based counterfactual approach rooted in causal ARIMA modeling (AutoRegressive Integrated Moving Average). Causal ARIMA offers a straightforward, time-series-driven method for estimating what key economic variables would have looked like without the intervention of new tariffs, based purely on pre-treatment dynamics. Unlike structural models or cross-sectional difference-in-differences designs, causal ARIMA does not rely on restrictive assumptions about economic behavior, sectoral interlinkages, or the validity of control groups. Its strength lies in its simplicity and directness: by using the historical path of a variable to predict its future under the assumption of no intervention, we can cleanly infer deviations attributable to the trade war.
Nonetheless, causal ARIMA is not without limitations. It assumes that pre-intervention patterns would have persisted absent the shock, potentially overlooking unobserved confounding trends. It also focuses on short-term dynamics, limiting its applicability for assessing long-term structural shifts. We therefore have to be careful when assessing causality with ARIMA models.
To bring clarity to the short-run economic impact of the 2018 U.S.–China trade war, we focus on the following indicators: the bilateral trade deficit with China (the direct target of tariff policy rhetoric), manufacturing employment (capturing labor market effects in a sensitive sector), the S&P 500 and Russell 2000 (capturing the stock market reaction), oil prices (reflecting global commodity market reactions to trade uncertainty), and the Dollar Index (showing the impact on the U.S. currency). By comparing counterfactual forecasts of these variables to their realized trajectories following the imposition of tariffs, we aim to provide an empirically grounded measure of the immediate consequences of the 2018 trade war. These insights, in turn, offer valuable lessons as the world stands on the brink of a potential new chapter in U.S.–China economic relations.
2. Previous Literature
A growing body of empirical research has investigated the economic consequences of the 2018 U.S.–China trade war, employing a range of econometric tools to quantify its effects on prices, trade flows, and welfare. Amiti, Redding, and Weinstein (2019) use a comprehensive framework that combines regression modeling, elasticity estimation, and structural welfare analysis to measure the pass-through of tariffs into domestic prices. Their results show that the full incidence of tariffs was borne by U.S. consumers and firms, leading to significant reductions in real income, estimated at $1.4 billion per month by the end of 2018. The study also documents major shifts in supply chains and trade patterns, highlighting the broader economic costs of the trade war.
Studies of this kind rely on structural models, cross-sectional regressions, and quasi-experimental designs such as difference-in-differences (DiD) to infer the effects of trade policy. While powerful, such methods typically depend on strong identifying assumptions, the availability of appropriate control groups, or sufficient cross-sectional variation in treatment exposure. In many macroeconomic settings, especially when policy changes affect the entire system or lack clear untreated units, these assumptions may not be tenable.
To address such challenges, Menchetti, Cipollini, and Mealli (2023) propose Causal-ARIMA (C-ARIMA), a novel framework that integrates ARIMA modeling with the principles of the Rubin Causal Model (RCM) to estimate the causal impact of interventions in time series data. The framework used in this paper is a simplification of the one proposed by Menchetti et al.
3. Methodology
3.1 ARIMA Models and Assumptions
This study uses Autoregressive Integrated Moving Average (ARIMA) models to forecast counterfactual outcomes in the absence of the trade policy intervention. ARIMA models are denoted as ARIMA(p, d, q), where:
- p is the order of the autoregressive (AR) component
- d is the degree of differencing required to make the series stationary
- q is the order of the moving average (MA) component.
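For a stationary time series $Y_t$, the ARMA(p, q) model can be expressed as:
$$Y_t = \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}, \tag{1}$$
where the residuals $\varepsilon_t$ are assumed to be Gaussian white noise:
$$\varepsilon_t \sim N(0, \sigma^2). \tag{2}$$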
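To make the recursion in equation (1) concrete, the following minimal sketch simulates an ARMA(p, q) process driven by Gaussian white noise as in equation (2). It is written in Python for illustration only (the analysis in this paper relies on R); the function name `simulate_arma` and the coefficient values are hypothetical and are not estimates from any series studied here.

```python
import random

def simulate_arma(phi, theta, n, sigma=1.0, seed=0):
    """Simulate Y_t = sum_i phi_i*Y_{t-i} + eps_t + sum_j theta_j*eps_{t-j}.

    Lags that fall before the start of the sample are treated as zero.
    """
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    y, eps = [], []
    for t in range(n):
        e_t = rng.gauss(0.0, sigma)  # Gaussian white noise, equation (2)
        ar_part = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma_part = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        y.append(ar_part + e_t + ma_part)
        eps.append(e_t)
    return y, eps

# Hypothetical ARMA(1, 1) with phi_1 = 0.6 and theta_1 = 0.3.
series, shocks = simulate_arma(phi=[0.6], theta=[0.3], n=200)
```

Returning the shocks alongside the series makes it easy to check by hand that each simulated value satisfies equation (1).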
For non-stationary series, the data is differenced d times to achieve stationarity, resulting in an ARIMA model. The validity of ARIMA forecasts relies on two main assumptions:
- Stationarity: The (differenced) time series should have constant mean, variance, and autocovariance over time.
- White noise residuals: The residuals should be uncorrelated and homoskedastic. Gaussianity is preferred for valid inference, though not strictly required for point forecasts.
If these assumptions are violated:
- Non-stationarity may lead to biased or inconsistent forecasts.
- Autocorrelated residuals suggest model misspecification, undermining the reliability of the estimated counterfactuals.
- Non-Gaussian residuals can affect confidence intervals but are less critical for point estimates.
3.2 Model Selection with auto.arima
The selection of the ARIMA order (p, d, q) is automated using the auto.arima() function from the forecast package in R. This function searches over combinations of p, d, and q to minimize the Bayesian Information Criterion (BIC):
$$\mathrm{BIC} = -2 \log L + k \log(n), \tag{3}$$
where L is the likelihood of the model, k is the number of estimated parameters, and n is the sample size. BIC penalizes model complexity, favoring more parsimonious models that are likely to generalize well.
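As a small illustration of how equation (3) trades fit against complexity, the sketch below compares two candidate models. It is in Python for illustration only; the log-likelihoods, parameter counts, and sample size are made-up numbers, not results from this paper, and the helper `bic` is a hypothetical name.

```python
import math

def bic(log_likelihood, k, n):
    """Equation (3): BIC = -2 log L + k log(n)."""
    return -2.0 * log_likelihood + k * math.log(n)

# Hypothetical candidates: the larger model fits slightly better
# (higher log-likelihood) but uses more parameters.
candidates = {
    "ARIMA(1,1,0)": {"loglik": -250.0, "k": 2},
    "ARIMA(3,1,2)": {"loglik": -248.5, "k": 6},
}
n_obs = 110  # hypothetical sample size

scores = {name: bic(c["loglik"], c["k"], n_obs) for name, c in candidates.items()}
best = min(scores, key=scores.get)  # the model with the lowest BIC is preferred
```

With these numbers the small gain in likelihood does not offset the penalty for four extra parameters, so the more parsimonious model is selected.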
3.3 Causal Estimation via Forecasting
Once the ARIMA model is fitted using pre-treatment data, forecasts are generated for the post-treatment period. The difference between the observed values $Y_t^{\mathrm{obs}}$ and the forecasted (counterfactual) values $Y_t(0)$ is interpreted as the point causal effect:
$$\tau_t = Y_t^{\mathrm{obs}} - Y_t(0). \tag{4}$$
This approach is grounded in the Rubin Causal Model (RCM), where only one potential outcome is observed and the other (counterfactual) must be estimated.
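The logic of equation (4) can be sketched with the simplest possible forecasting model. The example below, in Python for illustration only, assumes a random walk with drift, i.e. ARIMA(0,1,0) with a constant, as a stand-in for whatever model is actually selected; the data are toy numbers and all function names are hypothetical.

```python
def fit_drift(pre):
    """Estimate the drift as the mean first difference of the pre-treatment data."""
    diffs = [b - a for a, b in zip(pre, pre[1:])]
    return sum(diffs) / len(diffs)

def counterfactual_forecast(pre, horizon):
    """h-step forecasts of a random walk with drift: last value + h * drift."""
    drift = fit_drift(pre)
    return [pre[-1] + drift * h for h in range(1, horizon + 1)]

def pointwise_effects(observed_post, forecast):
    """Equation (4): tau_t = observed value minus counterfactual forecast."""
    return [obs - cf for obs, cf in zip(observed_post, forecast)]

pre = [100.0, 102.0, 104.0, 106.0, 108.0]  # toy pre-treatment observations
post = [107.0, 108.0, 110.0]               # toy post-treatment observations

forecast = counterfactual_forecast(pre, len(post))  # [110.0, 112.0, 114.0]
effects = pointwise_effects(post, forecast)         # [-3.0, -4.0, -4.0]
```

Only pre-treatment data enter the fit, so the forecast plays the role of the unobserved potential outcome and the negative values are read as estimated shortfalls relative to the no-intervention path.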
3.4 Intervention Timing and Training Window
The intervention or split date is variable-specific:
- For variables that react immediately to new information (e.g., stock indices), the announcement date of the tariffs is used. We take 22/03/2018 as the announcement date and use it as the split date for the S&P 500, Russell 2000, FRED crude oil prices, and the Dollar Index.
- For variables with delayed responses (e.g., exports), the implementation date of the tariffs is used. We take 01/06/2018 as the split date for Exports, Imports, Trade Balance, and Manufacturing Employment.
To capture only the trend that followed the financial crisis, the training data run from 2009 up to the split date. This selection mirrors the logic of regression discontinuity designs.
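The variable-specific split described above can be sketched as a simple lookup followed by a date filter. The Python snippet below is illustrative only: the exact start of the training window (1 January 2009) and the convention that the split date itself belongs to the post-treatment period are assumptions, and the variable labels and toy observations are hypothetical.

```python
from datetime import date

ANNOUNCEMENT = date(2018, 3, 22)    # split date for fast-reacting variables
IMPLEMENTATION = date(2018, 6, 1)   # split date for slow-reacting variables
TRAIN_START = date(2009, 1, 1)      # assumption: the text only says "2009"

SPLIT_DATES = {
    "S&P 500": ANNOUNCEMENT,
    "Russell 2000": ANNOUNCEMENT,
    "Crude Oil": ANNOUNCEMENT,
    "Dollar Index": ANNOUNCEMENT,
    "Exports": IMPLEMENTATION,
    "Imports": IMPLEMENTATION,
    "Trade Balance": IMPLEMENTATION,
    "Manufacturing Employment": IMPLEMENTATION,
}

def split_series(observations, variable):
    """Split (date, value) pairs into training and post-treatment samples."""
    split = SPLIT_DATES[variable]
    train = [(d, v) for d, v in observations if TRAIN_START <= d < split]
    post = [(d, v) for d, v in observations if d >= split]
    return train, post

# Toy monthly observations (values are made up).
toy = [
    (date(2008, 12, 1), 1.0),  # before 2009: dropped from training
    (date(2018, 5, 1), 2.0),
    (date(2018, 6, 1), 3.0),
    (date(2018, 7, 1), 4.0),
]
train, post = split_series(toy, "Exports")
```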
4. Findings
All graphical results are shown in the appendix (PDF).
4.1 Overview of Time Series Diagnostics
For each time series under analysis, we begin by displaying the historical evolution of the variable, accompanied by results from the Augmented Dickey-Fuller (ADF) test for unit roots. The ADF test evaluates the null hypothesis that the series possesses a unit root (i.e., is non-stationary). A rejection of the null suggests stationarity. However, if the series is trend-stationary (meaning it becomes stationary once a deterministic trend is removed), the ADF test may fail to reject the null even though the underlying series is predictable after detrending. Therefore, visual inspection of the time series is essential to complement formal testing.
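To illustrate the idea behind the unit root test, the sketch below implements the basic Dickey-Fuller regression with a constant, i.e. a regression of the first difference on the lagged level. This is a simplification, not the full ADF test used in the paper: it omits the lagged-difference (augmentation) terms and any trend term. The 5% critical value of -2.86 is the standard asymptotic value for the constant-only case; the function name and simulated data are hypothetical, and Python is used for illustration only.

```python
import math
import random

CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, constant, no trend

def dickey_fuller_t(y):
    """t-statistic on gamma in: diff(y)_t = alpha + gamma * y_{t-1} + u_t."""
    x = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    n = len(x)
    mean_x, mean_dy = sum(x) / n, sum(dy) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((v - mean_x) * (d - mean_dy) for v, d in zip(x, dy))
    gamma = sxy / sxx
    alpha = mean_dy - gamma * mean_x
    ssr = sum((d - alpha - gamma * v) ** 2 for v, d in zip(x, dy))
    se_gamma = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / se_gamma

# Simulated stationary AR(1) series with coefficient 0.5 (toy data).
rng = random.Random(1)
stationary = [0.0]
for _ in range(499):
    stationary.append(0.5 * stationary[-1] + rng.gauss(0.0, 1.0))

t_stat = dickey_fuller_t(stationary)
rejects_unit_root = t_stat < CRITICAL_5PCT  # reject the null if t is very negative
```

The statistic is compared with Dickey-Fuller critical values rather than the usual Student t values, because under the null of a unit root its distribution is non-standard.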
Although the ARIMA or SARIMA orders were selected automatically using the auto.arima() function, as mentioned above, the autocorrelation function (ACF) and partial autocorrelation function (PACF) of both the original and first-differenced series are also reported.
4.2 Counterfactual Forecasts and Estimated Effects
Using the ARIMA models fitted on the pre-trade war data, we generate counterfactual forecasts for the post-treatment period. These forecasts represent the expected path of the series in the absence of the trade war intervention. Each forecast is plotted alongside the actual observed data, with a 95% confidence interval (CI) shaded in gray to reflect forecast uncertainty. The point-wise difference between the observed and counterfactual values constitutes the estimated causal effect.
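The comparison of observed values against the forecast band can be sketched for the simplest case. The Python example below assumes a driftless random walk, i.e. ARIMA(0,1,0), for which the h-step forecast equals the last observed value and the forecast variance grows linearly with the horizon; real fitted models will generally have different interval formulas. The data are toy numbers and the function names are hypothetical.

```python
import math

def random_walk_bands(pre, horizon, z=1.96):
    """95% bands for a driftless random walk: last value +/- z * sigma * sqrt(h)."""
    diffs = [b - a for a, b in zip(pre, pre[1:])]
    sigma2 = sum(d * d for d in diffs) / len(diffs)  # innovation variance (zero mean assumed)
    bands = []
    for h in range(1, horizon + 1):
        half_width = z * math.sqrt(sigma2 * h)
        bands.append((pre[-1] - half_width, pre[-1], pre[-1] + half_width))
    return bands

def outside_band(observed_post, bands):
    """Flag post-treatment observations that fall outside the forecast band."""
    return [not (lo <= obs <= hi) for obs, (lo, _, hi) in zip(observed_post, bands)]

pre = [10.0, 11.0, 10.0, 11.0, 10.0]  # toy pre-treatment observations
post = [9.0, 6.0, 5.5]                # toy post-treatment observations

bands = random_walk_bands(pre, len(post))
flags = outside_band(post, bands)     # [False, True, True]
```

Because the band widens with the horizon, a deviation of a given size is easier to detect shortly after the intervention than many periods later.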
Among the analyzed variables, only U.S. Exports to China exhibit a statistically and visually significant deviation from the forecasted counterfactual. The series shows a pronounced negative divergence shortly after the tariff implementation, remaining outside the 95% CI for an extended period. This is consistent with the direct impact of Chinese retaliatory tariffs and trade frictions on U.S. export flows.
As a consequence, the U.S. Trade Balance with China also displays a noticeable short-run deterioration, with the observed series falling below the counterfactual trajectory immediately after the intervention.
In contrast, financial variables such as the S&P 500 Index and the Russell 2000 do not show immediate deviations following the announcement or implementation of tariffs. However, toward the end of 2018, both series experience a sharp downward movement, which may be indicative of a delayed market reaction as trade tensions escalated. Nonetheless, these fluctuations remain within the CI, and the delay in the reaction makes it harder to link it directly to the trade war.
A similar pattern is observed in the Brent Oil Price, which does not display any immediate reaction post-intervention.
The U.S. Dollar Index (DXY) shows a marginal positive deviation relative to the forecast. While this could be interpreted as a safe-haven effect in response to geopolitical uncertainty, the observed values remain at the edge of the CI, making the interpretation tentative at best.
4.3 Model Fit and Residual Diagnostics
Table 1 summarizes the fitted ARIMA models for each time series, including selected model orders, BIC, log-likelihood, and the p-value of the Box-Ljung test on residuals (at lag 24). The Box-Ljung test evaluates the null hypothesis of no residual autocorrelation. A high p-value (typically above 0.05) indicates that the residuals resemble white noise, satisfying one of the key ARIMA model assumptions.
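For reference, the statistic behind the Box-Ljung test can be sketched directly. The Python code below computes Q = n(n+2) Σ ρ_k²/(n−k) over the first m lags and converts it to a p-value using the closed-form chi-square survival function, which is exact when the degrees of freedom are even. Using m degrees of freedom without subtracting the number of fitted ARMA parameters is an assumption of this sketch; the function names and the toy residuals are hypothetical.

```python
import math

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(residuals, lags=24):
    """Ljung-Box statistic Q = n (n + 2) * sum_k rho_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorr(residuals, k) ** 2 / (n - k) for k in range(1, lags + 1)
    )

def chi2_sf_even(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Toy residuals with an obvious pattern: +1, -1, +1, -1, ...
toy_residuals = [(-1.0) ** t for t in range(100)]
q_stat = ljung_box_q(toy_residuals, lags=24)
p_value = chi2_sf_even(q_stat, df=24)  # a tiny p-value rejects white noise
```

A p-value below 0.05, as in this deliberately patterned example, corresponds to the failures of the white noise assumption discussed in this section.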

Only three series (Brent Oil, U.S. Exports to China, and Manufacturing Output) pass the residual diagnostics, with Box-Ljung p-values well above 0.05, indicating no significant autocorrelation. The remaining variables have Box-Ljung p-values close to 0.01 or below, indicating clear violations of the white noise assumption.
This has implications for interpretation. In cases where residuals are autocorrelated, the ARIMA model may be misspecified, casting doubt on the reliability of the counterfactual forecasts and thus the estimated causal effects. Possible explanations include:
- Short training window: The use of post-2009 data limits the information available for model estimation; with more observations, the residuals might behave like white noise. A small sample makes ARIMA estimation less accurate and reduces statistical power, especially for low-frequency macroeconomic series.
- Model selection limitations: While auto.arima() minimizes the BIC, it may select models that are suboptimal in terms of residual autocorrelation. BIC penalizes complexity, which may lead to underfitting when more complex dynamics (e.g., higher-order AR or MA terms) are actually needed.
- Inherent volatility: Financial series often exhibit volatility clustering, which standard ARIMA models are not designed to capture. This could explain the poor residual diagnostics for stock indices and similar variables.
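The volatility clustering mentioned in the last point can be checked informally by looking at the autocorrelation of squared residuals: if large shocks tend to follow large shocks, the squares are positively autocorrelated even when the signs are unpredictable. The Python sketch below uses made-up residuals with a calm and a turbulent regime; it is an illustrative diagnostic, not a procedure applied in this paper, and the names are hypothetical.

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Toy residuals: random signs, small magnitude in a calm regime,
# large magnitude in a turbulent regime.
rng = random.Random(0)
calm = [rng.choice([-1, 1]) * 0.1 for _ in range(20)]
turbulent = [rng.choice([-1, 1]) * 2.0 for _ in range(20)]
residuals = calm + turbulent

squared = [r * r for r in residuals]
rho_squared = autocorr(squared, 1)  # high value signals volatility clustering
```

A lag-1 autocorrelation of the squared residuals close to one, as in this constructed example, is the kind of pattern that a constant-variance ARIMA model leaves unexplained.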
5. Limitations and Conclusion
This study has explored the potential of the Causal-ARIMA (C-ARIMA) approach as a simple, transparent tool for evaluating the short-run economic effects of the 2018 U.S.–China trade war. By forecasting counterfactual trajectories for key macroeconomic and financial variables, we aimed to isolate the causal impact of the trade war without relying on structural assumptions or external control groups.
While the method successfully identified a statistically significant negative effect on U.S. exports to China, consistent with the introduction of Chinese retaliatory tariffs, its application revealed several important limitations. Most notably, diagnostic testing showed that in many cases the residuals from the fitted ARIMA models failed to resemble white noise, violating a key assumption of the method. As a result, the reliability of the counterfactual forecasts and corresponding causal interpretations is weakened for those series. In these cases, deviations from forecasted counterfactuals should be interpreted with caution.
Despite these limitations, the C-ARIMA framework remains a useful starting point for transparent, low-assumption policy evaluation. Its simplicity makes it appealing for preliminary policy analysis; however, some variables may require different approaches to counterfactual generation, such as GARCH models or machine learning forecasts.
References
- Fiammetta Menchetti, Fabio Cipollini, and Fabrizia Mealli. Combining counterfactual outcomes and ARIMA models for policy evaluation. Econometrics Journal, Volume 26, 2023, Pages 1–24. Available at: https://doi.org/10.1093/ecj/utac024.
- Mary Amiti, Stephen J. Redding, and David Weinstein. The Impact of the 2018 Trade War on U.S. Prices and Welfare. National Bureau of Economic Research, Working Paper No. 25672, March 2019. Available at: http://www.nber.org/papers/w25672.
The graphs can be found at the end of the PDF.