{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3086df28",
   "metadata": {},
   "source": [
    "# Taiwan K-Shaped Card Spending\n",
    "\n",
    "This notebook reproduces the core New York Fed spending-index logic with Taiwan public aggregate data:\n",
    "\n",
    "- NCCC reported-income-bin credit-card spending\n",
    "- DGBAS income-level CPI deflators\n",
    "- DGBAS retail plus food-service benchmark\n",
    "- WID Taiwan wealth distribution appendix\n",
    "\n",
    "The target concept is **real credit-card signing/spending activity by reported-income bins**, not household average consumption."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ee73d1b",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "The notebook writes raw downloads to `.cache/taiwan-k-shaped-card-spending/raw/` and public aggregate artifacts to `public/data/taiwan-k-shaped-card-spending/`. Raw downloads are intentionally excluded from published site artifacts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d8df5df1",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-18T11:51:52.794876Z",
     "iopub.status.busy": "2026-06-18T11:51:52.794137Z",
     "iopub.status.idle": "2026-06-18T11:51:54.184046Z",
     "shell.execute_reply": "2026-06-18T11:51:54.184313Z"
    }
   },
   "outputs": [],
   "source": [
    "from __future__ import annotations\n",
    "\n",
    "import datetime as dt\n",
    "import hashlib\n",
    "import io\n",
    "import json\n",
    "import math\n",
    "import zipfile\n",
    "from pathlib import Path\n",
    "from typing import Any, Dict, Iterable, List, Mapping, Optional, Tuple\n",
    "from urllib.parse import urlparse\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "import requests\n",
    "from requests.adapters import HTTPAdapter\n",
    "from urllib3.util.retry import Retry\n",
    "\n",
    "plt.rcParams['font.sans-serif'] = ['Taipei Sans TC Beta']\n",
    "plt.rcParams['axes.unicode_minus'] = False\n",
    "\n",
    "SLUG = 'taiwan-k-shaped-card-spending'\n",
    "PROJECT_ROOT = Path.cwd()\n",
    "if not (PROJECT_ROOT / 'src/content.config.ts').exists():\n",
    "    PROJECT_ROOT = Path('/Users/qin/side_projects_workspace/pyivest')\n",
    "\n",
    "RAW_CACHE_DIR = PROJECT_ROOT / '.cache' / SLUG / 'raw'\n",
    "PUBLIC_DATA_DIR = PROJECT_ROOT / 'public' / 'data' / SLUG\n",
    "RAW_CACHE_DIR.mkdir(parents=True, exist_ok=True)\n",
    "PUBLIC_DATA_DIR.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "ALLOWED_HOSTS = {'bas.nccc.com.tw', 'nstatdb.dgbas.gov.tw', 'wid.world'}\n",
    "MAX_RESPONSE_BYTES = 25_000_000\n",
    "FETCH_DATE = dt.datetime.now(dt.timezone.utc).date().isoformat()\n",
    "\n",
    "NCCC_URL = (\n",
    "    'https://bas.nccc.com.tw/nccc-nop/OpenAPI/C03/'\n",
    "    'incomegroupsconsumption/{region_code}/{industry_code}'\n",
    ")\n",
    "DGBAS_CPI_URL = (\n",
    "    'https://nstatdb.dgbas.gov.tw/dgbasAll/webMain.aspx?sdmx/A030104015'\n",
    ")\n",
    "DGBAS_RETAIL_URL = (\n",
    "    'https://nstatdb.dgbas.gov.tw/dgbasAll/webMain.aspx?sdmx/A050107020'\n",
    ")\n",
    "WID_TW_URL = 'https://wid.world/bulk_download/WID_fulldataset_TW.zip'\n",
    "\n",
    "NCCC_INDUSTRY_CODES = {\n",
    "    'ALL': '全部產業',\n",
    "    'FD': '食',\n",
    "    'CT': '衣',\n",
    "    'LG': '住',\n",
    "    'TR': '行',\n",
    "    'EE': '文教康樂',\n",
    "    'DP': '百貨',\n",
    "    'OT': '其他',\n",
    "}\n",
    "INDUSTRY_NAME_TO_CODE = {\n",
    "    '食': 'FD',\n",
    "    '衣': 'CT',\n",
    "    '住': 'LG',\n",
    "    '行': 'TR',\n",
    "    '文教康樂': 'EE',\n",
    "    '百貨': 'DP',\n",
    "    '其他': 'OT',\n",
    "}\n",
    "INCOME_BAND_ORDER = [\n",
    "    '未達50萬',\n",
    "    '50(含)-75萬',\n",
    "    '75(含)-100萬',\n",
    "    '100(含)-125萬',\n",
    "    '125(含)-150萬',\n",
    "    '150(含)-175萬',\n",
    "    '175(含)-200萬',\n",
    "    '200(含)萬以上',\n",
    "]\n",
    "INCOME_BAND_BOUNDS_NTD = {\n",
    "    '未達50萬': (0.0, 500_000.0),\n",
    "    '50(含)-75萬': (500_000.0, 750_000.0),\n",
    "    '75(含)-100萬': (750_000.0, 1_000_000.0),\n",
    "    '100(含)-125萬': (1_000_000.0, 1_250_000.0),\n",
    "    '125(含)-150萬': (1_250_000.0, 1_500_000.0),\n",
    "    '150(含)-175萬': (1_500_000.0, 1_750_000.0),\n",
    "    '175(含)-200萬': (1_750_000.0, 2_000_000.0),\n",
    "    '200(含)萬以上': (2_000_000.0, None),\n",
    "}\n",
    "INCOME_GROUPS = {\n",
    "    'Lower': {\n",
    "        'label': 'Low personal-income proxy',\n",
    "        'percentile_proxy': 'bottom 20%',\n",
    "        'income_range': '未滿38.1萬',\n",
    "        'lower_ntd': 0.0,\n",
    "        'upper_ntd': 381_000.0,\n",
    "    },\n",
    "    'Middle': {\n",
    "        'label': 'Middle personal-income proxy',\n",
    "        'percentile_proxy': 'P20-P80',\n",
    "        'income_range': '38.1萬到未滿95.9萬',\n",
    "        'lower_ntd': 381_000.0,\n",
    "        'upper_ntd': 959_000.0,\n",
    "    },\n",
    "    'Higher': {\n",
    "        'label': 'High personal-income proxy',\n",
    "        'percentile_proxy': 'top 20%',\n",
    "        'income_range': '95.9萬以上',\n",
    "        'lower_ntd': 959_000.0,\n",
    "        'upper_ntd': None,\n",
    "    },\n",
    "}\n",
    "GROUP_LABELS = {\n",
    "    group: config['label'] for group, config in INCOME_GROUPS.items()\n",
    "}\n",
    "CPI_BRIDGE = {\n",
    "    'Lower': '最低20%所得家庭',\n",
    "    'Middle': '中間60%所得家庭',\n",
    "    'Higher': '最高20%所得家庭',\n",
    "}\n",
    "GROUP_ORDER = ['Lower', 'Middle', 'Higher']\n",
    "BASE_MONTHS = ['2018-01', '2020-01', '2023-01']\n",
    "MAIN_BASE_MONTH = '2023-01'\n",
    "START_MONTH = '2018-01'\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc18d316",
   "metadata": {},
   "source": [
    "## Safe API Client\n",
    "\n",
    "The client only allows the planned official/public hosts, requires HTTPS, uses bounded retries/timeouts, checks response size and content type, and caches raw responses with a content hash."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "160a6107",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-18T11:51:54.191418Z",
     "iopub.status.busy": "2026-06-18T11:51:54.191039Z",
     "iopub.status.idle": "2026-06-18T11:51:54.192222Z",
     "shell.execute_reply": "2026-06-18T11:51:54.192465Z"
    }
   },
   "outputs": [],
   "source": [
    "def build_session() -> requests.Session:\n",
    "    \"\"\"Return a requests session with bounded retry behavior.\"\"\"\n",
    "    retry = Retry(\n",
    "        total=3,\n",
    "        connect=3,\n",
    "        read=3,\n",
    "        backoff_factor=0.5,\n",
    "        status_forcelist=(429, 500, 502, 503, 504),\n",
    "        allowed_methods=frozenset(['GET']),\n",
    "    )\n",
    "    adapter = HTTPAdapter(max_retries=retry)\n",
    "    session = requests.Session()\n",
    "    session.mount('https://', adapter)\n",
    "    session.headers.update({'User-Agent': 'PyInvest public-data research'})\n",
    "    return session\n",
    "\n",
    "\n",
    "def assert_allowed_url(url: str) -> None:\n",
    "    \"\"\"Validate that a URL is HTTPS and on the explicit host allowlist.\"\"\"\n",
    "    parsed = urlparse(url)\n",
    "    if parsed.scheme != 'https':\n",
    "        raise ValueError(f'Only HTTPS URLs are allowed: {url}')\n",
    "    if parsed.hostname not in ALLOWED_HOSTS:\n",
    "        raise ValueError(f'Hostname is not allowed: {parsed.hostname}')\n",
    "\n",
    "\n",
    "def expected_extension(content_type: str, source_name: str) -> str:\n",
    "    \"\"\"Return a cache file extension based on source and content type.\"\"\"\n",
    "    if 'json' in content_type:\n",
    "        return 'json'\n",
    "    if 'zip' in content_type:\n",
    "        return 'zip'\n",
    "    if source_name.startswith('nccc'):\n",
    "        return 'csv'\n",
    "    return 'bin'\n",
    "\n",
    "\n",
    "def safe_get(\n",
    "    session: requests.Session,\n",
    "    url: str,\n",
    "    source_name: str,\n",
    "    expected_content: Iterable[str],\n",
    ") -> Path:\n",
    "    \"\"\"Download a URL safely and cache the raw response by hash.\"\"\"\n",
    "    assert_allowed_url(url)\n",
    "    response = session.get(url, timeout=(10, 60), stream=True)\n",
    "    response.raise_for_status()\n",
    "    content_type = response.headers.get('content-type', '').lower()\n",
    "    if not any(token in content_type for token in expected_content):\n",
    "        raise ValueError(\n",
    "            f'Unexpected content type for {source_name}: {content_type}'\n",
    "        )\n",
    "\n",
    "    chunks: List[bytes] = []\n",
    "    total_bytes = 0\n",
    "    for chunk in response.iter_content(chunk_size=1024 * 256):\n",
    "        if not chunk:\n",
    "            continue\n",
    "        total_bytes += len(chunk)\n",
    "        if total_bytes > MAX_RESPONSE_BYTES:\n",
    "            raise ValueError(f'Response too large for {source_name}')\n",
    "        chunks.append(chunk)\n",
    "\n",
    "    content = b''.join(chunks)\n",
    "    content_hash = hashlib.sha256(content).hexdigest()[:16]\n",
    "    extension = expected_extension(content_type, source_name)\n",
    "    cache_path = RAW_CACHE_DIR / (\n",
    "        f'{source_name}_{FETCH_DATE}_{content_hash}.{extension}'\n",
    "    )\n",
    "    if not cache_path.exists():\n",
    "        cache_path.write_bytes(content)\n",
    "    return cache_path\n",
    "\n",
    "\n",
    "def read_json(path: Path) -> Dict[str, Any]:\n",
    "    \"\"\"Read a JSON file as a dictionary.\"\"\"\n",
    "    return json.loads(path.read_text(encoding='utf-8'))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ea84998",
   "metadata": {},
   "source": [
    "## Download Sources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "af0c2f51",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-18T11:51:54.196011Z",
     "iopub.status.busy": "2026-06-18T11:51:54.195721Z",
     "iopub.status.idle": "2026-06-18T11:51:57.639742Z",
     "shell.execute_reply": "2026-06-18T11:51:57.640071Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloaded or reused 11 raw source files\n",
      "nccc_TWN_ALL: nccc_TWN_ALL_2026-06-18_6adc9cbfecba79c2.csv\n",
      "nccc_TWN_FD: nccc_TWN_FD_2026-06-18_bd0eb1337ff75d95.csv\n",
      "nccc_TWN_CT: nccc_TWN_CT_2026-06-18_cb216e588e3a2e68.csv\n",
      "nccc_TWN_LG: nccc_TWN_LG_2026-06-18_7872cf3237ed20c1.csv\n",
      "nccc_TWN_TR: nccc_TWN_TR_2026-06-18_a56b9a307d86d712.csv\n",
      "nccc_TWN_EE: nccc_TWN_EE_2026-06-18_6005ae5d8e3ca4b6.csv\n",
      "nccc_TWN_DP: nccc_TWN_DP_2026-06-18_b9ab27850626593e.csv\n",
      "nccc_TWN_OT: nccc_TWN_OT_2026-06-18_b146f302c8212861.csv\n",
      "dgbas_cpi_A030104015: dgbas_cpi_A030104015_2026-06-18_75f7db92415e66b3.json\n",
      "dgbas_retail_A050107020: dgbas_retail_A050107020_2026-06-18_d820b4fda7bb4e9c.json\n",
      "wid_fulldataset_TW: wid_fulldataset_TW_2026-06-18_0151a66857a9e2e0.zip\n"
     ]
    }
   ],
   "source": [
    "session = build_session()\n",
    "\n",
    "raw_paths: Dict[str, Path] = {}\n",
    "for industry_code in NCCC_INDUSTRY_CODES:\n",
    "    url = NCCC_URL.format(region_code='TWN', industry_code=industry_code)\n",
    "    raw_paths[f'nccc_TWN_{industry_code}'] = safe_get(\n",
    "        session=session,\n",
    "        url=url,\n",
    "        source_name=f'nccc_TWN_{industry_code}',\n",
    "        expected_content=('octet-stream', 'csv', 'text/plain'),\n",
    "    )\n",
    "\n",
    "raw_paths['dgbas_cpi_A030104015'] = safe_get(\n",
    "    session=session,\n",
    "    url=DGBAS_CPI_URL,\n",
    "    source_name='dgbas_cpi_A030104015',\n",
    "    expected_content=('json',),\n",
    ")\n",
    "raw_paths['dgbas_retail_A050107020'] = safe_get(\n",
    "    session=session,\n",
    "    url=DGBAS_RETAIL_URL,\n",
    "    source_name='dgbas_retail_A050107020',\n",
    "    expected_content=('json',),\n",
    ")\n",
    "raw_paths['wid_fulldataset_TW'] = safe_get(\n",
    "    session=session,\n",
    "    url=WID_TW_URL,\n",
    "    source_name='wid_fulldataset_TW',\n",
    "    expected_content=('zip',),\n",
    ")\n",
    "\n",
    "print(f'Downloaded or reused {len(raw_paths)} raw source files')\n",
    "for key, path in raw_paths.items():\n",
    "    print(f'{key}: {path.name}')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4172034",
   "metadata": {},
   "source": [
    "## Parse Sources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7fef16cd",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-18T11:51:57.659633Z",
     "iopub.status.busy": "2026-06-18T11:51:57.659154Z",
     "iopub.status.idle": "2026-06-18T11:51:58.095061Z",
     "shell.execute_reply": "2026-06-18T11:51:58.095241Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NCCC rows: 8232 2014-01-01 2026-03-01\n",
      "CPI rows: 17440 1981-01-01 2026-05-01\n",
      "Retail rows: 1312 1999-01-01 2026-04-01\n",
      "WID rows: 21 2018 2024\n"
     ]
    }
   ],
   "source": [
    "def parse_month_yyyymm(value: Any) -> pd.Timestamp:\n",
    "    \"\"\"Parse NCCC YYYYMM-like values into month-start timestamps.\"\"\"\n",
    "    text = str(value).strip()\n",
    "    if len(text) != 6 or not text.isdigit():\n",
    "        raise ValueError(f'Unexpected NCCC month: {value}')\n",
    "    return pd.Timestamp(year=int(text[:4]), month=int(text[4:]), day=1)\n",
    "\n",
    "\n",
    "def parse_dgbas_month(value: str) -> Optional[pd.Timestamp]:\n",
    "    \"\"\"Parse DGBAS SDMX monthly ids such as 2024-M5.\"\"\"\n",
    "    if '-M' not in value:\n",
    "        return None\n",
    "    year_text, month_text = value.split('-M', maxsplit=1)\n",
    "    return pd.Timestamp(year=int(year_text), month=int(month_text), day=1)\n",
    "\n",
    "\n",
    "def parse_nccc_csv(path: Path, source_industry_code: str) -> pd.DataFrame:\n",
    "    \"\"\"Parse a NCCC CSV response into a normalized monthly dataframe.\"\"\"\n",
    "    text = path.read_text(encoding='utf-8-sig')\n",
    "    frame = pd.read_csv(io.StringIO(text))\n",
    "    required = {\n",
    "        '年月',\n",
    "        '地區',\n",
    "        '信用卡產業別',\n",
    "        '年收入',\n",
    "        '信用卡交易筆數',\n",
    "        '信用卡交易金額[新臺幣]',\n",
    "    }\n",
    "    missing = required.difference(frame.columns)\n",
    "    if missing:\n",
    "        raise ValueError(f'NCCC columns missing: {sorted(missing)}')\n",
    "\n",
    "    parsed = pd.DataFrame({\n",
    "        'month': frame['年月'].map(parse_month_yyyymm),\n",
    "        'region_name': frame['地區'].astype(str),\n",
    "        'source_industry_code': source_industry_code,\n",
    "        'industry_name': frame['信用卡產業別'].astype(str),\n",
    "        'income_band': frame['年收入'].astype(str),\n",
    "        'tx_count': pd.to_numeric(frame['信用卡交易筆數']),\n",
    "        'card_amount_ntd': pd.to_numeric(frame['信用卡交易金額[新臺幣]']),\n",
    "    })\n",
    "    parsed['industry_code'] = parsed['industry_name'].map(INDUSTRY_NAME_TO_CODE)\n",
    "    parsed['income_band_order'] = parsed['income_band'].map(\n",
    "        {label: index + 1 for index, label in enumerate(INCOME_BAND_ORDER)}\n",
    "    )\n",
    "    parsed['avg_ticket_ntd'] = (\n",
    "        parsed['card_amount_ntd'] / parsed['tx_count'].replace(0, pd.NA)\n",
    "    )\n",
    "    return parsed\n",
    "\n",
    "\n",
    "def parse_dgbas_sdmx(payload: Mapping[str, Any]) -> pd.DataFrame:\n",
    "    \"\"\"Parse DGBAS SDMX-JSON-like payload into tidy observations.\"\"\"\n",
    "    data = payload['data']\n",
    "    structure = data['structure']\n",
    "    dimensions = sorted(\n",
    "        structure['dimensions']['series'],\n",
    "        key=lambda item: item.get('keyPosition', 0),\n",
    "    )\n",
    "    obs_values = structure['dimensions']['observation'][0]['values']\n",
    "    rows: List[Dict[str, Any]] = []\n",
    "\n",
    "    for series_key, series_payload in data['dataSets'][0]['series'].items():\n",
    "        key_parts = [int(part) for part in series_key.split(':')]\n",
    "        row_base: Dict[str, Any] = {}\n",
    "        for dim, value_index in zip(dimensions, key_parts):\n",
    "            dim_value = dim['values'][value_index]\n",
    "            dim_id = dim['id']\n",
    "            row_base[f'{dim_id}_id'] = dim_value['id']\n",
    "            row_base[f'{dim_id}_name'] = dim_value['name']\n",
    "\n",
    "        for obs_index, obs_payload in series_payload['observations'].items():\n",
    "            time_id = obs_values[int(obs_index)]['id']\n",
    "            month = parse_dgbas_month(time_id)\n",
    "            if month is None:\n",
    "                continue\n",
    "            value = obs_payload[0] if obs_payload else None\n",
    "            if value is None:\n",
    "                continue\n",
    "            row = dict(row_base)\n",
    "            row['month'] = month\n",
    "            row['value'] = float(value)\n",
    "            rows.append(row)\n",
    "\n",
    "    return pd.DataFrame(rows)\n",
    "\n",
    "\n",
    "def parse_wid_wealth(zip_path: Path) -> pd.DataFrame:\n",
    "    \"\"\"Parse WID Taiwan wealth-share data and construct p20p80.\"\"\"\n",
    "    with zipfile.ZipFile(zip_path) as archive:\n",
    "        wid_data = pd.read_csv(archive.open('WID_data_TW.csv'), sep=';')\n",
    "        wid_meta = pd.read_csv(archive.open('WID_metadata_TW.csv'), sep=';')\n",
    "\n",
    "    share = wid_data[\n",
    "        (wid_data['country'] == 'TW')\n",
    "        & (wid_data['variable'] == 'shwealj992')\n",
    "        & (wid_data['percentile'].isin(['p0p20', 'p0p80', 'p80p100']))\n",
    "        & (wid_data['year'].between(2018, 2024))\n",
    "    ].copy()\n",
    "    pivot = share.pivot_table(\n",
    "        index='year', columns='percentile', values='value', aggfunc='first'\n",
    "    )\n",
    "    pivot['p20p80'] = pivot['p0p80'] - pivot['p0p20']\n",
    "    tidy = pivot[['p0p20', 'p20p80', 'p80p100']].reset_index()\n",
    "    tidy = tidy.melt(\n",
    "        id_vars='year', var_name='wealth_percentile', value_name='wealth_share'\n",
    "    )\n",
    "    tidy['wealth_share_pct'] = tidy['wealth_share'] * 100\n",
    "\n",
    "    for base_year in [2018, 2020, 2023]:\n",
    "        base = tidy[tidy['year'] == base_year].set_index(\n",
    "            'wealth_percentile'\n",
    "        )['wealth_share_pct']\n",
    "        tidy[f'change_pp_since_{base_year}'] = tidy.apply(\n",
    "            lambda row: row['wealth_share_pct'] - base[row['wealth_percentile']],\n",
    "            axis=1,\n",
    "        )\n",
    "\n",
    "    meta_row = wid_meta[wid_meta['variable'] == 'shwealj992'].iloc[0]\n",
    "    tidy['wid_variable'] = 'shwealj992'\n",
    "    tidy['wid_shortname'] = meta_row['shortname']\n",
    "    tidy['wid_unit'] = meta_row['unit']\n",
    "    return tidy.sort_values(['year', 'wealth_percentile']).reset_index(drop=True)\n",
    "\n",
    "\n",
    "nccc_frames = []\n",
    "for industry_code in NCCC_INDUSTRY_CODES:\n",
    "    nccc_frames.append(\n",
    "        parse_nccc_csv(raw_paths[f'nccc_TWN_{industry_code}'], industry_code)\n",
    "    )\n",
    "nccc_raw = pd.concat(nccc_frames, ignore_index=True)\n",
    "nccc = nccc_raw[nccc_raw['source_industry_code'] == 'ALL'].copy()\n",
    "\n",
    "cpi_payload = read_json(raw_paths['dgbas_cpi_A030104015'])\n",
    "retail_payload = read_json(raw_paths['dgbas_retail_A050107020'])\n",
    "cpi_raw = parse_dgbas_sdmx(cpi_payload)\n",
    "retail_raw = parse_dgbas_sdmx(retail_payload)\n",
    "wid_wealth = parse_wid_wealth(raw_paths['wid_fulldataset_TW'])\n",
    "\n",
    "print('NCCC rows:', len(nccc), nccc['month'].min().date(), nccc['month'].max().date())\n",
    "print('CPI rows:', len(cpi_raw), cpi_raw['month'].min().date(), cpi_raw['month'].max().date())\n",
    "print('Retail rows:', len(retail_raw), retail_raw['month'].min().date(), retail_raw['month'].max().date())\n",
    "print('WID rows:', len(wid_wealth), wid_wealth['year'].min(), wid_wealth['year'].max())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5598bd4d",
   "metadata": {},
   "source": [
    "## Build Indices And Benchmark"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4ce434ec",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-18T11:51:58.109662Z",
     "iopub.status.busy": "2026-06-18T11:51:58.096474Z",
     "iopub.status.idle": "2026-06-18T11:51:58.208039Z",
     "shell.execute_reply": "2026-06-18T11:51:58.208259Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Latest common month: 2026-03\n",
      "Benchmark YoY correlation: 0.326\n",
      "income_group      month  k_gap_index_points  k_gap_log_points\n",
      "96           2026-01-01           34.526200         27.595324\n",
      "97           2026-02-01           19.727092         18.891795\n",
      "98           2026-03-01           23.695287         19.889980\n"
     ]
    }
   ],
   "source": [
    "def allocation_weight(\n",
    "    band_lower: float,\n",
    "    band_upper: Optional[float],\n",
    "    group_lower: float,\n",
    "    group_upper: Optional[float],\n",
    ") -> float:\n",
    "    \"\"\"Return the interval-overlap weight for a NCCC bin split.\"\"\"\n",
    "    if band_upper is None:\n",
    "        return 1.0 if group_upper is None and band_lower >= group_lower else 0.0\n",
    "    effective_group_upper = group_upper if group_upper is not None else band_upper\n",
    "    overlap_lower = max(band_lower, group_lower)\n",
    "    overlap_upper = min(band_upper, effective_group_upper)\n",
    "    overlap = max(0.0, overlap_upper - overlap_lower)\n",
    "    return overlap / (band_upper - band_lower)\n",
    "\n",
    "\n",
    "def build_allocation_frame() -> pd.DataFrame:\n",
    "    \"\"\"Build the bin-split allocation from NCCC bins to income proxies.\"\"\"\n",
    "    rows: List[Dict[str, Any]] = []\n",
    "    for income_band, (band_lower, band_upper) in INCOME_BAND_BOUNDS_NTD.items():\n",
    "        for group, config in INCOME_GROUPS.items():\n",
    "            weight = allocation_weight(\n",
    "                band_lower=band_lower,\n",
    "                band_upper=band_upper,\n",
    "                group_lower=float(config['lower_ntd']),\n",
    "                group_upper=config['upper_ntd'],\n",
    "            )\n",
    "            if weight <= 0:\n",
    "                continue\n",
    "            rows.append({\n",
    "                'income_band': income_band,\n",
    "                'income_band_lower_ntd': band_lower,\n",
    "                'income_band_upper_ntd': band_upper,\n",
    "                'income_group': group,\n",
    "                'income_group_label': config['label'],\n",
    "                'personal_income_proxy': config['percentile_proxy'],\n",
    "                'proxy_income_range': config['income_range'],\n",
    "                'allocation_weight': weight,\n",
    "                'cpi_income_level': CPI_BRIDGE[group],\n",
    "            })\n",
    "    return pd.DataFrame(rows)\n",
    "\n",
    "\n",
    "def allocate_nccc_to_income_proxy(frame: pd.DataFrame) -> pd.DataFrame:\n",
    "    \"\"\"Allocate fixed NCCC bins into personal-income proxy groups.\"\"\"\n",
    "    allocated = frame.merge(allocation_frame, on='income_band', how='left')\n",
    "    if allocated['allocation_weight'].isna().any():\n",
    "        missing = allocated.loc[\n",
    "            allocated['allocation_weight'].isna(), 'income_band'\n",
    "        ].drop_duplicates().tolist()\n",
    "        raise ValueError(f'Missing allocation weights for {missing}')\n",
    "    allocated['card_amount_ntd'] = (\n",
    "        allocated['card_amount_ntd'] * allocated['allocation_weight']\n",
    "    )\n",
    "    allocated['tx_count'] = allocated['tx_count'] * allocated['allocation_weight']\n",
    "    allocated['avg_ticket_ntd'] = (\n",
    "        allocated['card_amount_ntd'] / allocated['tx_count'].replace(0, pd.NA)\n",
    "    )\n",
    "    allocated['allocation_method'] = (\n",
    "        'linear split inside fixed NCCC personal-income bins at '\n",
    "        '381k and 959k annual income thresholds'\n",
    "    )\n",
    "    return allocated\n",
    "\n",
    "\n",
    "def add_index_columns(\n",
    "    frame: pd.DataFrame,\n",
    "    group_columns: List[str],\n",
    "    value_column: str,\n",
    "    output_column: str,\n",
    "    base_month: str,\n",
    ") -> pd.DataFrame:\n",
    "    \"\"\"Add a base-month index column to a dataframe.\"\"\"\n",
    "    indexed = frame.copy()\n",
    "    base_ts = pd.Timestamp(f'{base_month}-01')\n",
    "    base_values = indexed[indexed['month'] == base_ts].set_index(\n",
    "        group_columns\n",
    "    )[value_column]\n",
    "\n",
    "    def compute(row: pd.Series) -> float:\n",
    "        key = tuple(row[column] for column in group_columns)\n",
    "        if len(key) == 1:\n",
    "            key = key[0]\n",
    "        return 100 * row[value_column] / base_values.loc[key]\n",
    "\n",
    "    indexed[output_column] = indexed.apply(compute, axis=1)\n",
    "    return indexed\n",
    "\n",
    "\n",
    "allocation_frame = build_allocation_frame()\n",
    "allocation_weight_check = allocation_frame.groupby('income_band')[\n",
    "    'allocation_weight'\n",
    "].sum()\n",
    "if not allocation_weight_check.round(10).eq(1.0).all():\n",
    "    raise AssertionError('NCCC bin-split allocation weights do not sum to 1')\n",
    "\n",
    "cpi_total = cpi_raw[\n",
    "    (cpi_raw['code1_name'] == '總指數')\n",
    "    & (cpi_raw['fldid_name'].isin(CPI_BRIDGE.values()))\n",
    "].copy()\n",
    "cpi_total = cpi_total.rename(columns={'fldid_name': 'cpi_income_level'})\n",
    "cpi_total = cpi_total[['month', 'cpi_income_level', 'value']].rename(\n",
    "    columns={'value': 'cpi_index_raw'}\n",
    ")\n",
    "\n",
    "nccc_allocated = allocate_nccc_to_income_proxy(nccc)\n",
    "\n",
    "start_ts = pd.Timestamp(f'{START_MONTH}-01')\n",
    "latest_common_month = min(\n",
    "    nccc['month'].max(),\n",
    "    cpi_total['month'].max(),\n",
    "    retail_raw['month'].max(),\n",
    ")\n",
    "nccc_period_original = nccc[\n",
    "    (nccc['month'] >= start_ts) & (nccc['month'] <= latest_common_month)\n",
    "].copy()\n",
    "nccc_period = nccc_allocated[\n",
    "    (nccc_allocated['month'] >= start_ts)\n",
    "    & (nccc_allocated['month'] <= latest_common_month)\n",
    "].copy()\n",
    "cpi_period = cpi_total[\n",
    "    (cpi_total['month'] >= start_ts)\n",
    "    & (cpi_total['month'] <= latest_common_month)\n",
    "].copy()\n",
    "\n",
    "monthly_group = nccc_period.groupby(\n",
    "    [\n",
    "        'month',\n",
    "        'income_group',\n",
    "        'income_group_label',\n",
    "        'personal_income_proxy',\n",
    "        'proxy_income_range',\n",
    "        'cpi_income_level',\n",
    "    ],\n",
    "    as_index=False,\n",
    ").agg(card_amount_ntd=('card_amount_ntd', 'sum'), tx_count=('tx_count', 'sum'))\n",
    "monthly_group['avg_ticket_ntd'] = (\n",
    "    monthly_group['card_amount_ntd']\n",
    "    / monthly_group['tx_count'].replace(0, pd.NA)\n",
    ")\n",
    "monthly_group = monthly_group.merge(\n",
    "    cpi_period, on=['month', 'cpi_income_level'], how='left'\n",
    ")\n",
    "monthly_group['real_amount_proxy'] = (\n",
    "    monthly_group['card_amount_ntd'] / monthly_group['cpi_index_raw']\n",
    ")\n",
    "monthly_group['real_ticket_proxy'] = (\n",
    "    monthly_group['avg_ticket_ntd'] / monthly_group['cpi_index_raw']\n",
    ")\n",
    "monthly_group['income_group'] = pd.Categorical(\n",
    "    monthly_group['income_group'], categories=GROUP_ORDER, ordered=True\n",
    ")\n",
    "monthly_group = monthly_group.sort_values(['income_group', 'month'])\n",
    "monthly_group['nominal_amount_yoy'] = monthly_group.groupby(\n",
    "    'income_group', observed=True\n",
    ")['card_amount_ntd'].pct_change(12)\n",
    "monthly_group['real_amount_yoy'] = monthly_group.groupby(\n",
    "    'income_group', observed=True\n",
    ")['real_amount_proxy'].pct_change(12)\n",
    "monthly_group['tx_count_yoy'] = monthly_group.groupby(\n",
    "    'income_group', observed=True\n",
    ")['tx_count'].pct_change(12)\n",
    "\n",
    "all_indices = []\n",
    "for base_month in BASE_MONTHS:\n",
    "    temp = monthly_group.copy()\n",
    "    temp = add_index_columns(\n",
    "        temp, ['income_group'], 'card_amount_ntd', 'nominal_index', base_month\n",
    "    )\n",
    "    temp = add_index_columns(\n",
    "        temp, ['income_group'], 'cpi_index_raw', 'cpi_index', base_month\n",
    "    )\n",
    "    temp = add_index_columns(\n",
    "        temp, ['income_group'], 'real_amount_proxy', 'real_index', base_month\n",
    "    )\n",
    "    temp = add_index_columns(\n",
    "        temp, ['income_group'], 'tx_count', 'tx_index', base_month\n",
    "    )\n",
    "    temp = add_index_columns(\n",
    "        temp, ['income_group'], 'avg_ticket_ntd', 'ticket_index', base_month\n",
    "    )\n",
    "    temp = add_index_columns(\n",
    "        temp,\n",
    "        ['income_group'],\n",
    "        'real_ticket_proxy',\n",
    "        'real_ticket_index',\n",
    "        base_month,\n",
    "    )\n",
    "    temp['base_month'] = base_month\n",
    "    all_indices.append(temp)\n",
    "indices = pd.concat(all_indices, ignore_index=True)\n",
    "indices['income_group'] = indices['income_group'].astype(str)\n",
    "\n",
    "main_indices = indices[indices['base_month'] == MAIN_BASE_MONTH].copy()\n",
    "wide_real = main_indices.pivot(\n",
    "    index='month', columns='income_group', values='real_index'\n",
    ")\n",
    "k_gap = wide_real.reset_index()\n",
    "k_gap['k_gap_index_points'] = k_gap['Higher'] - k_gap['Lower']\n",
    "k_gap['k_gap_log_points'] = 100 * (\n",
    "    (k_gap['Higher'] / k_gap['Lower']).map(math.log)\n",
    ")\n",
    "\n",
    "nccc_total = nccc_period_original.groupby('month', as_index=False).agg(\n",
    "    nccc_card_amount_ntd=('card_amount_ntd', 'sum')\n",
    ")\n",
    "retail_food = retail_raw[\n",
    "    retail_raw['fldid_name'].isin(['零售業', '餐飲業'])\n",
    "].copy()\n",
    "retail_food = retail_food[\n",
    "    (retail_food['month'] >= start_ts)\n",
    "    & (retail_food['month'] <= latest_common_month)\n",
    "]\n",
    "retail_food = retail_food.groupby('month', as_index=False).agg(\n",
    "    retail_food_sales=('value', 'sum')\n",
    ")\n",
    "benchmark = nccc_total.merge(retail_food, on='month', how='inner')\n",
    "benchmark = benchmark[\n",
    "    (benchmark['month'] >= start_ts)\n",
    "    & (benchmark['month'] <= latest_common_month)\n",
    "].copy()\n",
    "for column in ['nccc_card_amount_ntd', 'retail_food_sales']:\n",
    "    base_value = benchmark.loc[\n",
    "        benchmark['month'] == pd.Timestamp(f'{MAIN_BASE_MONTH}-01'), column\n",
    "    ].iloc[0]\n",
    "    benchmark[f'{column}_index'] = 100 * benchmark[column] / base_value\n",
    "    benchmark[f'{column}_yoy'] = benchmark[column].pct_change(12)\n",
    "benchmark_yoy_corr = benchmark['nccc_card_amount_ntd_yoy'].corr(\n",
    "    benchmark['retail_food_sales_yoy']\n",
    ")\n",
    "\n",
    "bin_monthly = nccc_period_original.groupby(\n",
    "    ['month', 'income_band', 'income_band_order'],\n",
    "    as_index=False,\n",
    ").agg(card_amount_ntd=('card_amount_ntd', 'sum'), tx_count=('tx_count', 'sum'))\n",
    "bin_indices = add_index_columns(\n",
    "    bin_monthly,\n",
    "    ['income_band'],\n",
    "    'card_amount_ntd',\n",
    "    'nominal_index',\n",
    "    MAIN_BASE_MONTH,\n",
    ")\n",
    "\n",
    "print('Latest common month:', latest_common_month.strftime('%Y-%m'))\n",
    "print('Benchmark YoY correlation:', round(float(benchmark_yoy_corr), 3))\n",
    "print(k_gap.tail(3)[['month', 'k_gap_index_points', 'k_gap_log_points']])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13ad5d39",
   "metadata": {},
   "source": [
    "## QA Gates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "cf32d41a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-18T11:51:58.218026Z",
     "iopub.status.busy": "2026-06-18T11:51:58.217581Z",
     "iopub.status.idle": "2026-06-18T11:51:58.221854Z",
     "shell.execute_reply": "2026-06-18T11:51:58.222012Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                                                 name  passed  \\\n",
      "0                 required sources fetched and cached    True   \n",
      "1                       required NCCC columns present    True   \n",
      "2                  8 NCCC income bins exactly matched    True   \n",
      "3                                 NCCC key uniqueness    True   \n",
      "4               non-negative amounts and transactions    True   \n",
      "5   personal-income proxy allocation weights complete    True   \n",
      "6                                 CPI bridge complete    True   \n",
      "7                                 base months present    True   \n",
      "8      DGBAS benchmark parses retail and food-service    True   \n",
      "9    YoY overlap sufficient for benchmark correlation    True   \n",
      "10                             WID p20p80 constructed    True   \n",
      "\n",
      "                                               detail  \n",
      "0   11 raw files cached under /Users/qin/side_proj...  \n",
      "1   card_amount_ntd, income_band, industry_name, m...  \n",
      "2   未達50萬, 50(含)-75萬, 75(含)-100萬, 100(含)-125萬, 125...  \n",
      "3   month + industry_code + income_band is unique ...  \n",
      "4              card_amount_ntd >= 0 and tx_count >= 0  \n",
      "5   NCCC bins are split at 381k and 959k annual pe...  \n",
      "6   all personal-income proxy groups have a househ...  \n",
      "7                           2018-01, 2020-01, 2023-01  \n",
      "8                          批發、零售及餐飲業合計, 批發業, 零售業, 餐飲業  \n",
      "9                           87 overlapping YoY months  \n",
      "10  p20p80 = p0p80 - p0p20 for shwealj992 wealth s...  \n"
     ]
    }
   ],
   "source": [
    "def qa_gate(name: str, passed: bool, detail: str) -> Dict[str, Any]:\n",
    "    \"\"\"Create a standard QA gate result.\"\"\"\n",
    "    return {'name': name, 'passed': bool(passed), 'detail': detail}\n",
    "\n",
    "\n",
    "def scan_public_for_raw_files(public_dir: Path) -> bool:\n",
    "    \"\"\"Return True when public artifacts do not contain raw cache files.\"\"\"\n",
    "    forbidden_terms = {'raw', 'fulldataset', 'cache'}\n",
    "    for path in public_dir.rglob('*'):\n",
    "        lowered_parts = {part.lower() for part in path.parts}\n",
    "        if lowered_parts.intersection(forbidden_terms):\n",
    "            return False\n",
    "        if path.suffix.lower() == '.zip':\n",
    "            return False\n",
    "    return True\n",
    "\n",
    "\n",
    "expected_nccc_columns = {\n",
    "    'month',\n",
    "    'region_name',\n",
    "    'industry_name',\n",
    "    'income_band',\n",
    "    'tx_count',\n",
    "    'card_amount_ntd',\n",
    "}\n",
    "base_month_set = set(BASE_MONTHS)\n",
    "index_months = set(indices['month'].dt.strftime('%Y-%m'))\n",
    "yoy_overlap = benchmark[['nccc_card_amount_ntd_yoy', 'retail_food_sales_yoy']].dropna()\n",
    "wid_pivot_check = wid_wealth.pivot_table(\n",
    "    index='year', columns='wealth_percentile', values='wealth_share', aggfunc='first'\n",
    ")\n",
    "wid_constructed_ok = (\n",
    "    {'p0p20', 'p20p80', 'p80p100'}.issubset(wid_pivot_check.columns)\n",
    "    and wid_pivot_check.notna().all().all()\n",
    ")\n",
    "\n",
    "qa_gates: List[Dict[str, Any]] = [\n",
    "    qa_gate(\n",
    "        'required sources fetched and cached',\n",
    "        len(raw_paths) == 11 and all(path.exists() for path in raw_paths.values()),\n",
    "        f'{len(raw_paths)} raw files cached under {RAW_CACHE_DIR}',\n",
    "    ),\n",
    "    qa_gate(\n",
    "        'required NCCC columns present',\n",
    "        expected_nccc_columns.issubset(nccc.columns),\n",
    "        ', '.join(sorted(expected_nccc_columns)),\n",
    "    ),\n",
    "    qa_gate(\n",
    "        '8 NCCC income bins exactly matched',\n",
    "        set(nccc['income_band'].dropna().unique()) == set(INCOME_BAND_ORDER),\n",
    "        ', '.join(nccc['income_band'].drop_duplicates().tolist()),\n",
    "    ),\n",
    "    qa_gate(\n",
    "        'NCCC key uniqueness',\n",
    "        not nccc.duplicated(['month', 'industry_code', 'income_band']).any(),\n",
    "        'month + industry_code + income_band is unique in ALL endpoint',\n",
    "    ),\n",
    "    qa_gate(\n",
    "        'non-negative amounts and transactions',\n",
    "        (nccc['card_amount_ntd'].ge(0).all() and nccc['tx_count'].ge(0).all()),\n",
    "        'card_amount_ntd >= 0 and tx_count >= 0',\n",
    "    ),\n",
    "    qa_gate(\n",
    "        'personal-income proxy allocation weights complete',\n",
    "        allocation_weight_check.round(10).eq(1.0).all(),\n",
    "        'NCCC bins are split at 381k and 959k annual personal-income thresholds',\n",
    "    ),\n",
    "    qa_gate(\n",
    "        'CPI bridge complete',\n",
    "        main_indices['cpi_index_raw'].notna().all(),\n",
    "        'all personal-income proxy groups have a household income-level CPI proxy',\n",
    "    ),\n",
    "    qa_gate(\n",
    "        'base months present',\n",
    "        base_month_set.issubset(index_months),\n",
    "        ', '.join(sorted(base_month_set.intersection(index_months))),\n",
    "    ),\n",
    "    qa_gate(\n",
    "        'DGBAS benchmark parses retail and food-service',\n",
    "        {'零售業', '餐飲業'}.issubset(set(retail_raw['fldid_name'].unique())),\n",
    "        ', '.join(retail_raw['fldid_name'].drop_duplicates().tolist()),\n",
    "    ),\n",
    "    qa_gate(\n",
    "        'YoY overlap sufficient for benchmark correlation',\n",
    "        len(yoy_overlap) >= 60,\n",
    "        f'{len(yoy_overlap)} overlapping YoY months',\n",
    "    ),\n",
    "    qa_gate(\n",
    "        'WID p20p80 constructed',\n",
    "        wid_constructed_ok,\n",
    "        'p20p80 = p0p80 - p0p20 for shwealj992 wealth share',\n",
    "    ),\n",
    "]\n",
    "\n",
    "print(pd.DataFrame(qa_gates))\n",
    "if not all(gate['passed'] for gate in qa_gates):\n",
    "    raise AssertionError('One or more preliminary QA gates failed')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cdf595a",
   "metadata": {},
   "source": [
    "## Write Public Artifacts And Charts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "da615404",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-18T11:51:58.232956Z",
     "iopub.status.busy": "2026-06-18T11:51:58.232186Z",
     "iopub.status.idle": "2026-06-18T11:51:59.187158Z",
     "shell.execute_reply": "2026-06-18T11:51:59.187333Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wrote public CSV and PNG artifacts to /Users/qin/side_projects_workspace/pyivest/public/data/taiwan-k-shaped-card-spending\n"
     ]
    }
   ],
   "source": [
    "def write_csv(frame: pd.DataFrame, filename: str) -> Path:\n",
    "    \"\"\"Write a dataframe to the public artifact directory.\"\"\"\n",
    "    path = PUBLIC_DATA_DIR / filename\n",
    "    frame.to_csv(path, index=False)\n",
    "    return path\n",
    "\n",
    "\n",
    "def month_label(series: pd.Series) -> pd.Series:\n",
    "    \"\"\"Return YYYY-MM labels for month timestamps.\"\"\"\n",
    "    return pd.to_datetime(series).dt.strftime('%Y-%m')\n",
    "\n",
    "\n",
    "def plot_group_lines(\n",
    "    frame: pd.DataFrame,\n",
    "    value_column: str,\n",
    "    title: str,\n",
    "    ylabel: str,\n",
    "    output_name: str,\n",
    ") -> Path:\n",
    "    \"\"\"Plot Low/Middle/High personal-income proxy line chart.\"\"\"\n",
    "    fig, ax = plt.subplots(figsize=(10, 5.6))\n",
    "    for group in GROUP_ORDER:\n",
    "        subset = frame[frame['income_group'] == group]\n",
    "        ax.plot(\n",
    "            subset['month'],\n",
    "            subset[value_column],\n",
    "            linewidth=2.1,\n",
    "            label=GROUP_LABELS[group],\n",
    "        )\n",
    "    ax.axhline(100, color='#737373', linewidth=0.8, linestyle='--')\n",
    "    ax.set_title(title)\n",
    "    ax.set_ylabel(ylabel)\n",
    "    ax.set_xlabel('Month')\n",
    "    ax.grid(True, alpha=0.25)\n",
    "    ax.legend(loc='best')\n",
    "    fig.autofmt_xdate()\n",
    "    path = PUBLIC_DATA_DIR / output_name\n",
    "    fig.savefig(path, dpi=170, bbox_inches='tight')\n",
    "    plt.close(fig)\n",
    "    return path\n",
    "\n",
    "\n",
    "nccc_public = nccc_period[[\n",
    "    'month',\n",
    "    'region_name',\n",
    "    'industry_code',\n",
    "    'industry_name',\n",
    "    'income_band',\n",
    "    'income_band_order',\n",
    "    'income_group',\n",
    "    'income_group_label',\n",
    "    'personal_income_proxy',\n",
    "    'proxy_income_range',\n",
    "    'allocation_weight',\n",
    "    'allocation_method',\n",
    "    'tx_count',\n",
    "    'card_amount_ntd',\n",
    "    'avg_ticket_ntd',\n",
    "]].copy()\n",
    "nccc_public['month'] = month_label(nccc_public['month'])\n",
    "\n",
    "indices_public = indices[[\n",
    "    'base_month',\n",
    "    'month',\n",
    "    'income_group',\n",
    "    'income_group_label',\n",
    "    'personal_income_proxy',\n",
    "    'proxy_income_range',\n",
    "    'cpi_income_level',\n",
    "    'card_amount_ntd',\n",
    "    'tx_count',\n",
    "    'avg_ticket_ntd',\n",
    "    'cpi_index_raw',\n",
    "    'nominal_index',\n",
    "    'cpi_index',\n",
    "    'real_index',\n",
    "    'tx_index',\n",
    "    'ticket_index',\n",
    "    'real_ticket_index',\n",
    "    'nominal_amount_yoy',\n",
    "    'real_amount_yoy',\n",
    "    'tx_count_yoy',\n",
    "]].copy()\n",
    "indices_public['month'] = month_label(indices_public['month'])\n",
    "\n",
    "benchmark_public = benchmark.copy()\n",
    "benchmark_public['month'] = month_label(benchmark_public['month'])\n",
    "\n",
    "write_csv(nccc_public, 'nccc_income_industry_monthly.csv')\n",
    "write_csv(indices_public, 'real_card_spending_indices.csv')\n",
    "write_csv(benchmark_public, 'benchmark_monthly.csv')\n",
    "write_csv(wid_wealth, 'wid_wealth_distribution_appendix.csv')\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(10, 5.6))\n",
    "ax.plot(\n",
    "    benchmark['month'],\n",
    "    benchmark['nccc_card_amount_ntd_index'],\n",
    "    linewidth=2.2,\n",
    "    label='NCCC all credit-card signing amount',\n",
    ")\n",
    "ax.plot(\n",
    "    benchmark['month'],\n",
    "    benchmark['retail_food_sales_index'],\n",
    "    linewidth=2.2,\n",
    "    label='DGBAS retail + food-service sales',\n",
    ")\n",
    "ax.axhline(100, color='#737373', linewidth=0.8, linestyle='--')\n",
    "ax.set_title(f'Benchmark: NCCC vs official sales, YoY corr={benchmark_yoy_corr:.2f}')\n",
    "ax.set_ylabel('Index, 2023-01 = 100')\n",
    "ax.set_xlabel('Month')\n",
    "ax.grid(True, alpha=0.25)\n",
    "ax.legend(loc='best')\n",
    "fig.autofmt_xdate()\n",
    "fig.savefig(PUBLIC_DATA_DIR / 'benchmark.png', dpi=170, bbox_inches='tight')\n",
    "plt.close(fig)\n",
    "\n",
    "plot_group_lines(\n",
    "    main_indices,\n",
    "    'nominal_index',\n",
    "    'Nominal card signing amount by personal-income proxy',\n",
    "    'Index, 2023-01 = 100',\n",
    "    'nominal_spending.png',\n",
    ")\n",
    "plot_group_lines(\n",
    "    main_indices,\n",
    "    'real_index',\n",
    "    'Real card signing amount by personal-income proxy',\n",
    "    'Index, 2023-01 = 100',\n",
    "    'real_spending.png',\n",
    ")\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(10, 5.4))\n",
    "ax.plot(k_gap['month'], k_gap['k_gap_index_points'], linewidth=2.2)\n",
    "ax.axhline(0, color='#737373', linewidth=0.8, linestyle='--')\n",
    "ax.set_title('High-Low personal-income proxy real card signing K-gap')\n",
    "ax.set_ylabel('Index-point gap')\n",
    "ax.set_xlabel('Month')\n",
    "ax.grid(True, alpha=0.25)\n",
    "fig.autofmt_xdate()\n",
    "fig.savefig(PUBLIC_DATA_DIR / 'k_gap.png', dpi=170, bbox_inches='tight')\n",
    "plt.close(fig)\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(10.5, 6.0))\n",
    "for income_band in INCOME_BAND_ORDER:\n",
    "    subset = bin_indices[bin_indices['income_band'] == income_band]\n",
    "    ax.plot(\n",
    "        subset['month'],\n",
    "        subset['nominal_index'],\n",
    "        linewidth=1.6,\n",
    "        label=income_band,\n",
    "    )\n",
    "ax.axhline(100, color='#737373', linewidth=0.8, linestyle='--')\n",
    "ax.set_title('Eight fixed NCCC personal-income bins, nominal card signing index')\n",
    "ax.set_ylabel('Index, 2023-01 = 100')\n",
    "ax.set_xlabel('Month')\n",
    "ax.grid(True, alpha=0.22)\n",
    "ax.legend(loc='upper left', ncols=2, fontsize=8)\n",
    "fig.autofmt_xdate()\n",
    "fig.savefig(PUBLIC_DATA_DIR / 'income_bins_8line.png', dpi=170, bbox_inches='tight')\n",
    "plt.close(fig)\n",
    "\n",
    "fig, axes = plt.subplots(2, 1, figsize=(10, 7.2), sharex=True)\n",
    "for group in GROUP_ORDER:\n",
    "    subset = main_indices[main_indices['income_group'] == group]\n",
    "    axes[0].plot(subset['month'], subset['tx_index'], linewidth=2, label=group)\n",
    "    axes[1].plot(subset['month'], subset['ticket_index'], linewidth=2, label=group)\n",
    "axes[0].set_title('Transactions index')\n",
    "axes[0].set_ylabel('Index, 2023-01 = 100')\n",
    "axes[1].set_title('Average ticket index')\n",
    "axes[1].set_ylabel('Index, 2023-01 = 100')\n",
    "axes[1].set_xlabel('Month')\n",
    "for axis in axes:\n",
    "    axis.axhline(100, color='#737373', linewidth=0.8, linestyle='--')\n",
    "    axis.grid(True, alpha=0.25)\n",
    "    axis.legend(loc='best')\n",
    "fig.autofmt_xdate()\n",
    "fig.tight_layout()\n",
    "fig.savefig(\n",
    "    PUBLIC_DATA_DIR / 'transactions_ticket_diagnostic.png',\n",
    "    dpi=170,\n",
    "    bbox_inches='tight',\n",
    ")\n",
    "plt.close(fig)\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(12, 5.2))\n",
    "for percentile in ['p0p20', 'p20p80', 'p80p100']:\n",
    "    subset = wid_wealth[wid_wealth['wealth_percentile'] == percentile]\n",
    "    axes[0].plot(\n",
    "        subset['year'], subset['wealth_share_pct'], linewidth=2.1, label=percentile\n",
    "    )\n",
    "    axes[1].plot(\n",
    "        subset['year'],\n",
    "        subset['change_pp_since_2023'],\n",
    "        linewidth=2.1,\n",
    "        label=percentile,\n",
    "    )\n",
    "axes[0].set_title('WID Taiwan net personal wealth share')\n",
    "axes[0].set_ylabel('Share of net personal wealth (%)')\n",
    "axes[1].set_title('Change since 2023')\n",
    "axes[1].set_ylabel('Percentage-point change')\n",
    "for axis in axes:\n",
    "    axis.axhline(0, color='#737373', linewidth=0.8, linestyle='--')\n",
    "    axis.grid(True, alpha=0.25)\n",
    "    axis.legend(loc='best')\n",
    "    axis.set_xlabel('Year')\n",
    "fig.tight_layout()\n",
    "fig.savefig(\n",
    "    PUBLIC_DATA_DIR / 'wid_wealth_share_appendix.png',\n",
    "    dpi=170,\n",
    "    bbox_inches='tight',\n",
    ")\n",
    "plt.close(fig)\n",
    "\n",
    "print('Wrote public CSV and PNG artifacts to', PUBLIC_DATA_DIR)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18aa38f9",
   "metadata": {},
   "source": [
    "## Summary JSON And Expected Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "2ea48d56",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-18T11:51:59.197204Z",
     "iopub.status.busy": "2026-06-18T11:51:59.196615Z",
     "iopub.status.idle": "2026-06-18T11:51:59.216589Z",
     "shell.execute_reply": "2026-06-18T11:51:59.216753Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"schemaVersion\": 1,\n",
      "  \"latest_month\": \"2026-03\",\n",
      "  \"benchmark_yoy_corr\": 0.3263731568155817,\n",
      "  \"latest_real_index_lower\": 107.67662889332999,\n",
      "  \"latest_real_index_middle\": 111.31947259247002,\n",
      "  \"latest_real_index_higher\": 131.37191607861817,\n",
      "  \"latest_k_gap_index_points\": 23.69528718528818,\n",
      "  \"latest_k_gap_log_points\": 19.889979603902155,\n",
      "  \"post_k_gap_shift_index_points\": 13.90646319356825,\n",
      "  \"qa_passed\": 13,\n",
      "  \"qa_total\": 13\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "latest_month = latest_common_month.strftime('%Y-%m')\n",
    "latest_real = main_indices[main_indices['month'] == latest_common_month].set_index(\n",
    "    'income_group'\n",
    ")\n",
    "latest_k_gap = k_gap[k_gap['month'] == latest_common_month].iloc[0]\n",
    "post_start = pd.Timestamp('2023-01-01')\n",
    "pre_gap_mean = k_gap[k_gap['month'] < post_start]['k_gap_index_points'].mean()\n",
    "post_gap_mean = k_gap[k_gap['month'] >= post_start]['k_gap_index_points'].mean()\n",
    "post_k_gap_shift = post_gap_mean - pre_gap_mean\n",
    "\n",
    "summary_metrics = {\n",
    "    'latest_month': latest_month,\n",
    "    'benchmark_yoy_corr': float(benchmark_yoy_corr),\n",
    "    'latest_real_index_lower': float(latest_real.loc['Lower', 'real_index']),\n",
    "    'latest_real_index_middle': float(latest_real.loc['Middle', 'real_index']),\n",
    "    'latest_real_index_higher': float(latest_real.loc['Higher', 'real_index']),\n",
    "    'latest_k_gap_index_points': float(latest_k_gap['k_gap_index_points']),\n",
    "    'latest_k_gap_log_points': float(latest_k_gap['k_gap_log_points']),\n",
    "    'post_k_gap_shift_index_points': float(post_k_gap_shift),\n",
    "    'nccc_start_month': nccc['month'].min().strftime('%Y-%m'),\n",
    "    'nccc_latest_month': nccc['month'].max().strftime('%Y-%m'),\n",
    "    'analysis_start_month': START_MONTH,\n",
    "    'analysis_latest_month': latest_month,\n",
    "}\n",
    "\n",
    "expected_artifacts = [\n",
    "    'summary.json',\n",
    "    'expected_results.json',\n",
    "    'reproduction-manifest.json',\n",
    "    'chart_series.json',\n",
    "    'nccc_income_industry_monthly.csv',\n",
    "    'real_card_spending_indices.csv',\n",
    "    'benchmark_monthly.csv',\n",
    "    'wid_wealth_distribution_appendix.csv',\n",
    "    'data_quality_summary.csv',\n",
    "    'benchmark.png',\n",
    "    'nominal_spending.png',\n",
    "    'real_spending.png',\n",
    "    'k_gap.png',\n",
    "    'income_bins_8line.png',\n",
    "    'transactions_ticket_diagnostic.png',\n",
    "    'wid_wealth_share_appendix.png',\n",
    "]\n",
    "artifact_gate_inputs = {\n",
    "    'summary.json',\n",
    "    'expected_results.json',\n",
    "    'reproduction-manifest.json',\n",
    "    'chart_series.json',\n",
    "    'data_quality_summary.csv',\n",
    "}\n",
    "artifact_gate = qa_gate(\n",
    "    'all public data tables and figures generated',\n",
    "    all(\n",
    "        (PUBLIC_DATA_DIR / name).exists()\n",
    "        for name in expected_artifacts\n",
    "        if name not in artifact_gate_inputs\n",
    "    ),\n",
    "    f'{len(expected_artifacts) - len(artifact_gate_inputs)} data/figure artifacts',\n",
    ")\n",
    "raw_public_gate = qa_gate(\n",
    "    'no raw cache files under public artifacts',\n",
    "    scan_public_for_raw_files(PUBLIC_DATA_DIR),\n",
    "    'raw downloads are cached under .cache and excluded from public data',\n",
    ")\n",
    "qa_gates.extend([artifact_gate, raw_public_gate])\n",
    "qa_passed = sum(1 for gate in qa_gates if gate['passed'])\n",
    "qa_total = len(qa_gates)\n",
    "if qa_passed != qa_total:\n",
    "    raise AssertionError('Final QA gates failed')\n",
    "\n",
    "metadata = {\n",
    "    'schemaVersion': 1,\n",
    "    'slug': SLUG,\n",
    "    'title': 'Taiwan K-Shaped Card Spending',\n",
    "    'createdAt': dt.datetime.now(dt.timezone.utc).isoformat(),\n",
    "    'sources': {\n",
    "        'nccc': {\n",
    "            'urlTemplate': NCCC_URL,\n",
    "            'regionCode': 'TWN',\n",
    "            'industryCodes': NCCC_INDUSTRY_CODES,\n",
    "            'note': 'NCCC aggregate credit-card signing amount and transaction count by fixed personal-income bins; main groups are bin-split personal-income proxies.',\n",
    "        },\n",
    "        'dgbasIncomeCpi': {\n",
    "            'url': DGBAS_CPI_URL,\n",
    "            'functionCode': 'A030104015',\n",
    "        },\n",
    "        'dgbasRetailBenchmark': {\n",
    "            'url': DGBAS_RETAIL_URL,\n",
    "            'functionCode': 'A050107020',\n",
    "            'series': ['零售業', '餐飲業'],\n",
    "        },\n",
    "        'widTaiwanWealth': {\n",
    "            'url': WID_TW_URL,\n",
    "            'variable': 'shwealj992',\n",
    "            'note': 'WID appendix only; not an official Taiwan Open API.',\n",
    "        },\n",
    "    },\n",
    "    'incomeGroups': INCOME_GROUPS,\n",
    "    'allocationTable': json.loads(allocation_frame.to_json(orient='records', force_ascii=False)),\n",
    "    'cpiBridge': CPI_BRIDGE,\n",
    "    'baseMonths': BASE_MONTHS,\n",
    "    'mainBaseMonth': MAIN_BASE_MONTH,\n",
    "    'metrics': summary_metrics,\n",
    "    'qaGates': qa_gates,\n",
    "    'caveats': [\n",
    "        'NCCC data are aggregate credit-card signing amounts and transaction counts, not all household consumption.',\n",
    "        'NCCC fixed personal-income bins are split into percentile-proxy groups using within-bin interval weights.',\n",
    "        'DGBAS income-level CPI is household-based; it is used because no matching personal-income CPI is available.',\n",
    "        'WID wealth data are annual external estimates and only provide distributional context.',\n",
    "    ],\n",
    "}\n",
    "\n",
    "expected_results = {\n",
    "    'schemaVersion': 1,\n",
    "    'latest_month': latest_month,\n",
    "    'benchmark_yoy_corr': summary_metrics['benchmark_yoy_corr'],\n",
    "    'latest_real_index_lower': summary_metrics['latest_real_index_lower'],\n",
    "    'latest_real_index_middle': summary_metrics['latest_real_index_middle'],\n",
    "    'latest_real_index_higher': summary_metrics['latest_real_index_higher'],\n",
    "    'latest_k_gap_index_points': summary_metrics['latest_k_gap_index_points'],\n",
    "    'latest_k_gap_log_points': summary_metrics['latest_k_gap_log_points'],\n",
    "    'post_k_gap_shift_index_points': summary_metrics['post_k_gap_shift_index_points'],\n",
    "    'qa_passed': qa_passed,\n",
    "    'qa_total': qa_total,\n",
    "}\n",
    "\n",
    "chart_series = {\n",
    "    'realSpending': [\n",
    "        {\n",
    "            'month': row['month'].strftime('%Y-%m'),\n",
    "            'incomeGroup': row['income_group'],\n",
    "            'realIndex': float(row['real_index']),\n",
    "        }\n",
    "        for _, row in main_indices.iterrows()\n",
    "    ],\n",
    "    'kGap': [\n",
    "        {\n",
    "            'month': row['month'].strftime('%Y-%m'),\n",
    "            'kGapIndexPoints': float(row['k_gap_index_points']),\n",
    "            'kGapLogPoints': float(row['k_gap_log_points']),\n",
    "        }\n",
    "        for _, row in k_gap.iterrows()\n",
    "    ],\n",
    "    'benchmark': [\n",
    "        {\n",
    "            'month': row['month'].strftime('%Y-%m'),\n",
    "            'ncccIndex': float(row['nccc_card_amount_ntd_index']),\n",
    "            'retailFoodIndex': float(row['retail_food_sales_index']),\n",
    "        }\n",
    "        for _, row in benchmark.iterrows()\n",
    "    ],\n",
    "}\n",
    "\n",
    "reproduction_manifest = {\n",
    "    'schemaVersion': 1,\n",
    "    'slug': SLUG,\n",
    "    'publicArtifacts': expected_artifacts,\n",
    "    'rawCachePolicy': 'Raw API/ZIP downloads are cached under .cache and not published.',\n",
    "    'notebook': f'/notebooks/{SLUG}.ipynb',\n",
    "    'analysisWindow': {\n",
    "        'startMonth': START_MONTH,\n",
    "        'latestCommonMonth': latest_month,\n",
    "    },\n",
    "    'sourceRequirements': [\n",
    "        'NCCC OpenAPI C03 income-group credit-card spending CSV endpoints',\n",
    "        'DGBAS A030104015 income-level CPI JSON endpoint',\n",
    "        'DGBAS A050107020 retail and food-service sales JSON endpoint',\n",
    "        'WID Taiwan bulk dataset ZIP for annual wealth distribution appendix',\n",
    "    ],\n",
    "}\n",
    "\n",
    "(PUBLIC_DATA_DIR / 'summary.json').write_text(\n",
    "    json.dumps(metadata, ensure_ascii=False, indent=2), encoding='utf-8'\n",
    ")\n",
    "(PUBLIC_DATA_DIR / 'expected_results.json').write_text(\n",
    "    json.dumps(expected_results, ensure_ascii=False, indent=2), encoding='utf-8'\n",
    ")\n",
    "(PUBLIC_DATA_DIR / 'chart_series.json').write_text(\n",
    "    json.dumps(chart_series, ensure_ascii=False), encoding='utf-8'\n",
    ")\n",
    "(PUBLIC_DATA_DIR / 'reproduction-manifest.json').write_text(\n",
    "    json.dumps(reproduction_manifest, ensure_ascii=False, indent=2),\n",
    "    encoding='utf-8',\n",
    ")\n",
    "pd.DataFrame(qa_gates).to_csv(PUBLIC_DATA_DIR / 'data_quality_summary.csv', index=False)\n",
    "\n",
    "print(json.dumps(expected_results, ensure_ascii=False, indent=2))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65ec3612",
   "metadata": {},
   "source": [
    "## Interpretation Notes\n",
    "\n",
    "The public conclusion should stay inside the evidence boundary:\n",
    "\n",
    "- If benchmark fit is weak, call the result **credit-card payment activity divergence**, not household consumption divergence.\n",
    "- NCCC bins are fixed reported-income bins, not income percentiles.\n",
    "- WID is distributional wealth context only; it does not identify the wealth of NCCC cardholders."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.23"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
