README - How to Build inflation_series_newvintages and inflation_series_oldvintages

Project folder:
C:\Users\tokay\Dropbox\ReisMacroLab\PostMortem\2-PC\Final_Replication - Revised

Date of this documentation:
2026-02-16

====================================================================
1) What this README covers
====================================================================

This README explains, step by step, how the two final inflation decomposition
plots are produced from raw inputs:

- inflation_series_newvintages.pdf
- inflation_series_oldvintages.pdf

It covers:

- which raw files are used,
- which scripts/notebooks to run,
- what transformations are applied,
- exact formulas used to construct the exported series,
- how Stata reads those series and produces the final PDFs.

This README reflects the CURRENT code in this folder (as of 2026-02-16).


====================================================================
2) Final outputs and immediate inputs
====================================================================

Final PDF outputs:

- inflation_series_newvintages.pdf
- inflation_series_oldvintages.pdf

These are created by Stata do-files:

- inflation_forecast_PC_newvintages.do
- inflation_forecast_PC_oldvintages.do

Those do-files read these CSV files:

- Data/inflationpredictors_series_newvintages.csv
- Data/inflationpredictors_series_oldvintages.csv

And these CSV files are created by notebooks:

- Creation_Data_newvintages.ipynb
- Creation_Data_oldvintages.ipynb

which read:

- Data/alldata_newvintages.csv
- Data/alldata_oldvintages.csv

created by:

- Create_alldata_newvintages.ipynb
- Create_alldata_oldvintages.ipynb

starting from:

- Data/alldata.csv

and replacing selected series using the raw vintage files listed in section 3.


====================================================================
3) Raw files used for vintage replacement
====================================================================

3.1 Files used for NEW vintages (2020Q2 onward)

All replacements in Create_alldata_newvintages.ipynb come from these files
(each line reads: file -> source column -> target column in alldata.csv):

- Data/Initial_Release_Food.xlsx                        -> CPIUFDSL     -> FoodPrice
- Data/Initial_Release_NROU.xlsx                        -> NROU         -> NROU
- Data/Initial_Release_UNRATE.xlsx                      -> UNRATE       -> UNRATE
- Data/Initial_Release_PCE_v2.xlsx                      -> PCECTPI      -> pceall
- Data/Initial_Release_PCE_IncludingEnergyandFood_v2.xlsx -> DNRGRG3Q086SBEA -> pceEnergy
- Data/Initial_Release_JTSJOL.xlsx                      -> JTSJOL       -> V_hwi
- Data/Initial_Release_UNEMPLOY.xlsx                    -> UNEMPLOY     -> Unemployment

What _v2 means (important):
- _v2 indicates second-pass versions of the two initial-release PCE extraction files.
- They were created during the vintage-revision cleanup so PCE and PCE-including-energy-and-food use the intended columns/layout in the replication pipeline.
- The current Create_alldata notebooks explicitly point to these _v2 files, so they are the operational source of truth for current runs.
- The non-_v2 files (Initial_Release_PCE.xlsx and Initial_Release_PCE_IncludingEnergyandFood.xlsx) are kept for archive/reference but are not the files mapped in current code.

Where the _v2 files are generated:
- Notebook: Data_vintages_properVF.ipynb
- It builds the two _v2 files from raw real-time panels:
  - Data/PCEPI_1.xlsx -> Data/Initial_Release_PCE_v2.xlsx
  - Data/DNRGRG3Q086SBEA_1.xlsx -> Data/Initial_Release_PCE_IncludingEnergyandFood_v2.xlsx
- Core logic in that notebook:
  - read sheet "Obs. By Real-Time Period",
  - extract FIRST release for each period_start_date (smallest realtime_start_date),
  - compute splice factor around base-change switch (old 2012=100 regime to new 2017=100 regime) using overlap periods,
  - apply splice factor to pre-switch observations so the full first-release series is in 2017-base units,
  - aggregate to quarterly series from 2020Q2 onward,
  - export single-sheet Excel output used by Create_alldata_* notebooks.
- Switch dates used there:
  - PCEPI series: SWITCH_RT_START = 2023-09-29
  - DNRGRG3Q086SBEA series: SWITCH_RT_START = 2023-09-28
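
The first-release extraction and splice can be sketched as below. This is a
minimal illustration, not the notebook's code. It assumes a long panel with
columns period_start_date, realtime_start_date and a hypothetical column named
value. The splice rule shown (mean new-base/old-base ratio over periods present
in both the last pre-switch and the first post-switch vintage) is an assumption,
since the exact overlap rule is not documented here. Quarterly aggregation is
left out.

```python
import pandas as pd

def first_release(panel: pd.DataFrame) -> pd.DataFrame:
    """For each period_start_date keep the row with the smallest realtime_start_date."""
    ordered = panel.sort_values(["period_start_date", "realtime_start_date"])
    return ordered.drop_duplicates("period_start_date", keep="first").reset_index(drop=True)

def splice_factor(panel: pd.DataFrame, switch_rt_start: str) -> float:
    """ASSUMED rule: mean new-base/old-base ratio over periods observed in both
    the last pre-switch vintage and the first post-switch vintage."""
    switch = pd.Timestamp(switch_rt_start)
    pre = panel[panel["realtime_start_date"] < switch]
    post = panel[panel["realtime_start_date"] >= switch]
    old = pre[pre["realtime_start_date"] == pre["realtime_start_date"].max()]
    new = post[post["realtime_start_date"] == post["realtime_start_date"].min()]
    overlap = old.merge(new, on="period_start_date", suffixes=("_old", "_new"))
    return float((overlap["value_new"] / overlap["value_old"]).mean())

def spliced_first_release(panel: pd.DataFrame, switch_rt_start: str) -> pd.DataFrame:
    """First-release series with pre-switch observations rescaled to the new base."""
    factor = splice_factor(panel, switch_rt_start)
    out = first_release(panel)
    pre_switch = out["realtime_start_date"] < pd.Timestamp(switch_rt_start)
    out.loc[pre_switch, "value"] *= factor
    return out

# Toy panel (made-up numbers): two old-base vintages, then one new-base vintage.
panel = pd.DataFrame({
    "period_start_date": pd.to_datetime(
        ["2023-06-01", "2023-06-01", "2023-07-01",
         "2023-06-01", "2023-07-01", "2023-08-01"]),
    "realtime_start_date": pd.to_datetime(
        ["2023-07-28", "2023-08-31", "2023-08-31",
         "2023-09-29", "2023-09-29", "2023-09-29"]),
    "value": [100.0, 100.0, 102.0, 90.0, 91.8, 92.0],
})
factor = splice_factor(panel, "2023-09-29")
result = spliced_first_release(panel, "2023-09-29")
```

In the toy panel the June and July first releases are rescaled by the factor,
while the August first release is already in the new base and is left as is.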

3.2 Files used for OLD vintages (pre-2020Q2 block)

In Create_alldata_oldvintages.ipynb, old-period replacement uses
(file -> source column -> target column in alldata.csv):

- Data/CPIUFDSL_20200410.csv          -> CPIUFDSL_20200410          -> FoodPrice
- Data/NROU_20200803.csv              -> NROU_20200803              -> NROU
- Data/UNRATE_20200403.csv            -> UNRATE_20200403            -> UNRATE
- Data/PCECTPI_20200429.csv           -> PCECTPI_20200429           -> pceall
- Data/PCE_food_2020Q1vint.csv        -> DNRGRG3Q086SBEA_20200429   -> pceEnergy
- Data/JTSJOL_20200707.csv            -> JTSJOL_20200707            -> V_hwi
- Data/UNEMPLOY_20200403.csv          -> UNEMPLOY_20200403          -> Unemployment

3.3 Files used for OLD vintages (2020Q2 onward block)

For 2020Q2 onward, Create_alldata_oldvintages.ipynb switches to the same
Initial_Release Excel files listed in section 3.1.


====================================================================
4) Optional preprocessing notebook for rebasing/linking
====================================================================

Notebook:
- Data_vintages.ipynb

This notebook is an optional helper. It can:

- link a known break at 2023Q3 (LINK_BREAK=True), then
- rebase index series so annual average of 2017 = 100,
- and save files with suffix _rebased_2017eq100.csv.

Important:
- In the CURRENT Create_alldata_* notebooks, the mapped input files are
  PCECTPI_20200429.csv and PCE_food_2020Q1vint.csv (not the rebased files).
- So Data_vintages.ipynb is useful for diagnostics or alternate preprocessing,
  but it is not directly called by Create_alldata_newvintages.ipynb or
  Create_alldata_oldvintages.ipynb in their current code.
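
The rebasing step alone can be illustrated as follows. This is a minimal
sketch, assuming the index is held in a pandas Series with a DatetimeIndex.
The function name rebase_to_2017 and the toy values are hypothetical, and the
break-linking step (LINK_BREAK) is not shown.

```python
import pandas as pd

def rebase_to_2017(series: pd.Series) -> pd.Series:
    """Rescale an index series so that its 2017 average equals 100."""
    base = series[series.index.year == 2017].mean()
    return series / base * 100.0

# Toy quarterly index (made-up numbers), 2016Q1 to 2018Q4.
idx = pd.date_range("2016-01-01", "2018-10-01", freq="QS")
raw = pd.Series([96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107],
                index=idx, dtype=float)
rebased = rebase_to_2017(raw)
mean_2017 = rebased[rebased.index.year == 2017].mean()
```

Rebasing is a pure rescaling, so growth rates computed from the rebased series
are unchanged.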


====================================================================
5) Build alldata_newvintages.csv (exact logic)
====================================================================

Notebook:
- Create_alldata_newvintages.ipynb

5.1 Base frame
- Read Data/alldata.csv
- Coerce year and quarter to numeric Int64.

5.2 Replacement window
- replace_mask = (year > 2020) OR (year == 2020 AND quarter >= 2)

Meaning:
- only 2020Q2 and later observations can be replaced.

5.3 Reading initial-release files
Function: read_initial_release(path, value_col)
- Reads sheet 'Obs., Initial Release Only'.
- Accepts multiple layouts:
  - explicit year + quarter columns,
  - a quarter string like 'YYYYQn',
  - or date-like columns (period_start_date, observation_date, date, etc.).
- Converts to quarterly by year/quarter and averages within quarter.
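
The quarterly averaging for the date-like layout can be sketched as below.
The helper name to_quarterly_mean and the toy values are illustrative only.
The real read_initial_release also handles the other two layouts, which are
not shown.

```python
import pandas as pd

def to_quarterly_mean(df: pd.DataFrame, date_col: str, value_col: str) -> pd.DataFrame:
    """Derive year/quarter from a date column and average the values within each quarter."""
    dates = pd.to_datetime(df[date_col])
    return (
        df.assign(year=dates.dt.year, quarter=dates.dt.quarter)
          .groupby(["year", "quarter"], as_index=False)[value_col]
          .mean()
    )

# Toy monthly data (made-up numbers): three months of Q2 and one month of Q3.
monthly = pd.DataFrame({
    "observation_date": ["2020-04-01", "2020-05-01", "2020-06-01", "2020-07-01"],
    "UNRATE": [14.8, 13.2, 11.0, 10.2],
})
quarterly = to_quarterly_mean(monthly, "observation_date", "UNRATE")
```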

5.4 Column-by-column replacement
For each mapped series (FoodPrice, NROU, UNRATE, pceall, pceEnergy, V_hwi,
Unemployment):
- Merge source quarterly values on (year, quarter).
- apply_mask = replace_mask AND replacement value is not missing.
- Replace target only on apply_mask.
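
A minimal sketch of the masked replacement for one series is shown below. The
temporary column name UNRATE_src and the numbers are hypothetical. The point
is that a value is overwritten only when it is inside the replacement window
and the source actually has a value.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({
    "year": [2020, 2020, 2020, 2020],
    "quarter": [1, 2, 3, 4],
    "UNRATE": [3.8, 13.0, 8.8, 6.8],
})
# Source has a 2020Q1 value (outside the window), a 2020Q2 value,
# a missing 2020Q3 value, and no 2020Q4 row at all.
source = pd.DataFrame({
    "year": [2020, 2020, 2020],
    "quarter": [1, 2, 3],
    "UNRATE_src": [3.5, 13.1, np.nan],
})

merged = base.merge(source, on=["year", "quarter"], how="left")
replace_mask = (merged["year"] > 2020) | (
    (merged["year"] == 2020) & (merged["quarter"] >= 2)
)
apply_mask = replace_mask & merged["UNRATE_src"].notna()
merged.loc[apply_mask, "UNRATE"] = merged.loc[apply_mask, "UNRATE_src"]
result = merged.drop(columns="UNRATE_src")
```

Only 2020Q2 changes here: 2020Q1 is outside the window, and 2020Q3 and 2020Q4
have no usable replacement, so the base values are kept.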

5.5 Recompute V_U after replacement
- vu_mask = replace_mask AND V_hwi not missing AND Unemployment not missing
- V_U = V_hwi / Unemployment on vu_mask

5.6 Save output
- Data/alldata_newvintages.csv


====================================================================
6) Build alldata_oldvintages.csv (exact logic)
====================================================================

Notebook:
- Create_alldata_oldvintages.ipynb

6.1 Base frame
- Read Data/alldata.csv
- Coerce year and quarter to numeric Int64.

6.2 Two masks (sample split)
- old_mask = (year < 2020) OR (year == 2020 AND quarter <= 1)
- new_mask = (year > 2020) OR (year == 2020 AND quarter >= 2)

Meaning:
- pre-2020Q2 uses old vintage files,
- 2020Q2+ uses initial-release files.
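
As a quick illustration, the two masks split the sample so that every row with
a valid year and quarter falls in exactly one block. The toy frame below is
made up for the check.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2019, 2020, 2020, 2020, 2021],
    "quarter": [4, 1, 2, 3, 1],
})
old_mask = (df["year"] < 2020) | ((df["year"] == 2020) & (df["quarter"] <= 1))
new_mask = (df["year"] > 2020) | ((df["year"] == 2020) & (df["quarter"] >= 2))

# True only if each row is covered by exactly one of the two masks.
covered_once = bool((old_mask ^ new_mask).all())
```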

6.3 Read old vintage CSVs
Function: read_vintage_csv(path, value_col)
- Reads CSV
- observation_date -> datetime
- creates year and quarter
- quarterly mean by (year, quarter)

6.4 Read new-vintage initial-release Excel files
Function: read_initial_release(path, value_col)
- Same logic as in section 5.3, with a fallback to the first sheet for the _v2 files.

6.5 Apply replacements
- For old_series_map: replace only where old_mask and replacement exists.
- For new_series_map: replace only where new_mask and replacement exists.

6.6 Recompute V_U
- where V_hwi and Unemployment are non-missing and Unemployment != 0:
  V_U = V_hwi / Unemployment
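
The guarded ratio can be sketched as follows (toy numbers). Rows with a missing
input or a zero denominator keep whatever V_U they already had.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "V_hwi": [7000.0, 6000.0, np.nan, 5000.0],
    "Unemployment": [5800.0, 0.0, 6000.0, 10000.0],
    "V_U": [9.9, 9.9, 9.9, 9.9],  # placeholder pre-existing values
})
ok = df["V_hwi"].notna() & df["Unemployment"].notna() & (df["Unemployment"] != 0)
df.loc[ok, "V_U"] = df.loc[ok, "V_hwi"] / df.loc[ok, "Unemployment"]
```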

6.7 Save output
- Data/alldata_oldvintages.csv


====================================================================
7) Build inflationpredictors_series_*.csv (exact logic)
====================================================================

Notebooks:
- Creation_Data_newvintages.ipynb
- Creation_Data_oldvintages.ipynb

Both notebooks are structurally the same; only the input/output file names differ.

7.1 Load and indexing
- Load corresponding alldata_*vintages.csv
- Create year_quarter as a string: year + 'Q' + quarter (e.g. '2020Q2')
- Sort by year_quarter
- Drop last row: df.drop(df.tail(1).index, inplace=True)
- set index to year_quarter

7.2 Construct transformed variables
- pceEnergy_rate = pct_change(pceEnergy, periods=4) * 100
- pceall_rate    = pct_change(pceall, periods=4) * 100
- POILBREUSDM_rate = pct_change(POILBREUSDM, periods=4) * 100
- ugap = UNRATE - NROU
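
With quarterly data, periods=4 gives a year-over-year percent change, so the
first four rows of each rate are missing. A minimal sketch with made-up
numbers:

```python
import pandas as pd

df = pd.DataFrame({
    "pceall": [100.0, 101.0, 102.0, 103.0, 104.0, 106.05],
    "UNRATE": [4.0, 4.1, 4.2, 4.0, 3.9, 3.8],
    "NROU": [4.4, 4.4, 4.4, 4.4, 4.4, 4.4],
})
# Year-over-year inflation in percent: value vs. the same quarter one year earlier.
df["pceall_rate"] = df["pceall"].pct_change(periods=4) * 100
# Unemployment gap in percentage points.
df["ugap"] = df["UNRATE"] - df["NROU"]
```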

Swaps-related transform:
- CPI_Quarterly_Rate = pct_change(CPI_end, periods=1)
- CPI_Annual_Rate    = pct_change(CPI_end, periods=4)
- Exp_Infl_10month   = (1 + Swap_1year/100) / (1 + two_month_before_NS)
- Swaps_modified     = (Exp_Infl_10month) ** (12/10) - 1
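
A worked example of the two swap formulas, under the units implied by the
formulas themselves: Swap_1year in percent and two_month_before_NS as a decimal
rate. That reading is an assumption, as is the interpretation that dividing out
two months of inflation leaves a 10-month expectation, which the 12/10 exponent
then annualizes. The input numbers are made up.

```python
def swaps_modified(swap_1year_pct: float, two_month_before_ns: float) -> float:
    """Annualized 10-month expected inflation (decimal), following the two formulas above."""
    exp_infl_10month = (1 + swap_1year_pct / 100) / (1 + two_month_before_ns)
    return exp_infl_10month ** (12 / 10) - 1

# Example: a 3% one-year swap rate and 0.5% inflation over the two months.
example = swaps_modified(3.0, 0.005)   # roughly 0.0299, i.e. about 3.0% annualized
```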

Labor tightness transforms:
- V_U = log(V_U)
- V_U_dum = V_U where V_U > 1, else 0

Then reset index.

7.3 Regression sample selection
- df_to_work = rows where year_quarter >= '1984Q1'

7.4 Baseline model used for exported decomposition series
- Create Date from year_quarter quarterly timestamp.
- Training sample:
  Date <= 2020-04-01 (i.e. through 2020Q2, with each quarter dated by its first day)
- Regressors: rr_mean, rr_std, ugap, pceEnergy_rate
- Dependent variable: pceall_rate

OLS:
- X_train = add_constant(training-sample regressors)
- model = OLS(Y_train, X_train, missing='drop').fit()

Prediction:
- predicted_inflation = model.predict(add_constant(X_full))

7.5 Contribution construction
Using estimated coefficients:
- rr_contribution_sd       = beta_rr_std * rr_std
- rr_mean_contribution     = beta_rr_mean * rr_mean
- ugap_contribution        = beta_ugap * ugap
- pceEnergy_rate_contribution = beta_energy * pceEnergy_rate

Then aggregate to plotting components:
- inflation exp = rr_mean_contribution + rr_contribution_sd
- Supply side   = pceEnergy_rate_contribution
- Labor market  = ugap_contribution

Constant:
- constant_contribution = beta_const if constant exists else 0
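
The decomposition can be checked with a small synthetic example: the constant
plus the three plotting components should add up to predicted_inflation row by
row. The notebooks use statsmodels OLS. To keep this sketch dependency-light it
uses numpy.linalg.lstsq instead, which gives the same coefficients for a
complete-case sample. All data below are randomly generated, not project data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "rr_mean": rng.normal(2.0, 0.3, n),
    "rr_std": rng.normal(0.5, 0.1, n),
    "ugap": rng.normal(0.0, 1.0, n),
    "pceEnergy_rate": rng.normal(1.0, 5.0, n),
})
df["pceall_rate"] = (
    0.5 + 0.6 * df["rr_mean"] + 0.2 * df["rr_std"]
    - 0.3 * df["ugap"] + 0.05 * df["pceEnergy_rate"]
    + rng.normal(0.0, 0.1, n)
)

regressors = ["rr_mean", "rr_std", "ugap", "pceEnergy_rate"]
train = df.iloc[:30]  # stand-in for the Date <= 2020-04-01 training sample

# Fit on the training rows only (column of ones = constant).
X_train = np.column_stack([np.ones(len(train)), train[regressors].to_numpy()])
beta, *_ = np.linalg.lstsq(X_train, train["pceall_rate"].to_numpy(), rcond=None)
const = beta[0]
b = dict(zip(regressors, beta[1:]))

# Contributions are computed on the FULL sample with the training coefficients.
out = pd.DataFrame({
    "constant_contribution": const,
    "inflation exp": b["rr_mean"] * df["rr_mean"] + b["rr_std"] * df["rr_std"],
    "Labor market": b["ugap"] * df["ugap"],
    "Supply side": b["pceEnergy_rate"] * df["pceEnergy_rate"],
})
X_full = np.column_stack([np.ones(n), df[regressors].to_numpy()])
out["predicted_inflation"] = X_full @ beta

parts = ["constant_contribution", "inflation exp", "Labor market", "Supply side"]
adds_up = bool(np.allclose(out[parts].sum(axis=1), out["predicted_inflation"]))
```

Because the model is linear, this identity holds exactly (up to floating-point
error) both in and out of the training sample.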

7.6 Export file structure
Export columns:
- Date
- pceall_rate
- predicted_inflation
- constant_contribution
- inflation exp
- Labor market
- Supply side

Then rename to final schema:
- pceall_rate -> actual_pi
- predicted_inflation -> predicted_pi
- inflation exp -> inflation_expectation
- Labor market -> labor_market
- Supply side -> supply_side
- constant_contribution -> const_contribution

Saved as:
- Data/inflationpredictors_series_newvintages.csv
- Data/inflationpredictors_series_oldvintages.csv


====================================================================
8) Build final PDF plots in Stata
====================================================================

8.1 New vintages plot
Run:
- inflation_forecast_PC_newvintages.do

It imports:
- Data/inflationpredictors_series_newvintages.csv

Key operations in the do-file:
- normalize column names (if needed)
- convert Date string to quarterly time variable tq
- create adjusted expectation series:
  inflation_expectation_net = inflation_expectation + const_contribution
- keep window 2021Q1 to 2024Q1
- plot actual, predicted, and 3 components
- export:
  inflation_series_newvintages.pdf
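
For cross-checking the numbers that end up in the plot, the same steps can be
mirrored in pandas. This is only a Python analogue of the operations listed
above, not a translation of the do-file. It assumes Date is stored as an
ISO-style date string, and the helper name plot_window is hypothetical.

```python
import pandas as pd

def plot_window(df: pd.DataFrame) -> pd.DataFrame:
    """Add a quarterly period, the net expectation series, and keep 2021Q1 to 2024Q1."""
    out = df.copy()
    out["tq"] = pd.to_datetime(out["Date"]).dt.to_period("Q")
    out["inflation_expectation_net"] = (
        out["inflation_expectation"] + out["const_contribution"]
    )
    keep = (out["tq"] >= pd.Period("2021Q1", freq="Q")) & (
        out["tq"] <= pd.Period("2024Q1", freq="Q")
    )
    return out.loc[keep].reset_index(drop=True)

# Toy rows (made-up numbers): one before, two inside, one after the window.
toy = pd.DataFrame({
    "Date": ["2020-10-01", "2021-01-01", "2024-01-01", "2024-04-01"],
    "inflation_expectation": [1.0, 1.5, 2.0, 2.5],
    "const_contribution": [0.5, 0.5, 0.5, 0.5],
})
window = plot_window(toy)
```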

8.2 Old vintages plot
Run:
- inflation_forecast_PC_oldvintages.do

IMPORTANT CURRENT ISSUE:
- This do-file currently sets:
  local csvfile "Data/inflationpredictors_series_newvintages.csv"
- For correct old-vintage plot, change it to:
  local csvfile "Data/inflationpredictors_series_oldvintages.csv"

Then run and export:
- inflation_series_oldvintages.pdf


====================================================================
9) Recommended full run order (from raw start)
====================================================================

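1. Verify that Data/alldata.csv and all files in section 3 exist in Data/.
2. Optional: run Data_vintages_properVF.ipynb to regenerate the two _v2 files
   (section 3.1), and/or Data_vintages.ipynb to regenerate the rebased helper
   files for diagnostics (section 4).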
3. Run Create_alldata_newvintages.ipynb.
4. Run Create_alldata_oldvintages.ipynb.
5. Run Creation_Data_newvintages.ipynb.
6. Run Creation_Data_oldvintages.ipynb.
7. Run inflation_forecast_PC_newvintages.do.
8. Edit inflation_forecast_PC_oldvintages.do to point to old CSV, then run it.
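
Step 1 can be automated with a short check. The sketch below lists the input
files named in sections 2 and 3 and reports any that are missing. The function
name missing_inputs is illustrative, and the check assumes it is run from the
project folder.

```python
from pathlib import Path

REQUIRED = [
    "Data/alldata.csv",
    # Section 3.1: initial-release Excel files
    "Data/Initial_Release_Food.xlsx",
    "Data/Initial_Release_NROU.xlsx",
    "Data/Initial_Release_UNRATE.xlsx",
    "Data/Initial_Release_PCE_v2.xlsx",
    "Data/Initial_Release_PCE_IncludingEnergyandFood_v2.xlsx",
    "Data/Initial_Release_JTSJOL.xlsx",
    "Data/Initial_Release_UNEMPLOY.xlsx",
    # Section 3.2: old-vintage CSV files
    "Data/CPIUFDSL_20200410.csv",
    "Data/NROU_20200803.csv",
    "Data/UNRATE_20200403.csv",
    "Data/PCECTPI_20200429.csv",
    "Data/PCE_food_2020Q1vint.csv",
    "Data/JTSJOL_20200707.csv",
    "Data/UNEMPLOY_20200403.csv",
]

def missing_inputs(root: str = ".") -> list:
    """Return the required input files that do not exist under root."""
    return [p for p in REQUIRED if not (Path(root) / p).exists()]

if __name__ == "__main__":
    missing = missing_inputs()
    print("All inputs found." if not missing else "Missing: " + ", ".join(missing))
```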


====================================================================
10) Validation checklist
====================================================================

After running everything, check:

- Data/alldata_newvintages.csv exists and has updated values from 2020Q2+.
- Data/alldata_oldvintages.csv exists and has:
  old vintage values through 2020Q1 and initial-release values from 2020Q2+.
- Data/inflationpredictors_series_newvintages.csv exists with columns:
  Date, actual_pi, predicted_pi, const_contribution,
  inflation_expectation, labor_market, supply_side.
- Data/inflationpredictors_series_oldvintages.csv exists with same schema.
- inflation_series_newvintages.pdf is updated.
- inflation_series_oldvintages.pdf is updated and actually uses oldvintages CSV.
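
The schema checks can be scripted. The sketch below reads only the header row
and reports expected columns that are missing. The function name
missing_columns is hypothetical, and the in-memory CSVs stand in for the two
Data/inflationpredictors_series_*.csv files.

```python
import io
import pandas as pd

EXPECTED_COLUMNS = [
    "Date", "actual_pi", "predicted_pi", "const_contribution",
    "inflation_expectation", "labor_market", "supply_side",
]

def missing_columns(csv_source) -> list:
    """Return expected columns absent from the CSV header (path or file-like object)."""
    header = pd.read_csv(csv_source, nrows=0)
    return [c for c in EXPECTED_COLUMNS if c not in header.columns]

# Stand-ins for real files: one with the full schema, one truncated.
good = io.StringIO(",".join(EXPECTED_COLUMNS) + "\n2021-01-01,1,1,1,1,1,1\n")
bad = io.StringIO("Date,actual_pi\n2021-01-01,1\n")
good_missing = missing_columns(good)
bad_missing = missing_columns(bad)
```

On the real outputs, pass the file paths instead, for example
missing_columns("Data/inflationpredictors_series_oldvintages.csv").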


====================================================================
11) Notes on interpretation
====================================================================

- The exported decomposition is model-based (OLS baseline model).
- inflation_expectation is the sum of rr_mean and rr_std contributions.
- labor_market is ugap contribution.
- supply_side is pceEnergy_rate contribution.
- const_contribution is the regression intercept; the Stata plotting do-files
  add it to inflation_expectation to form inflation_expectation_net.

End of README.

