{"id":1886,"date":"2026-04-20T04:46:14","date_gmt":"2026-04-20T04:46:14","guid":{"rendered":"https:\/\/inphronesys.com\/?p=1886"},"modified":"2026-04-20T04:46:14","modified_gmt":"2026-04-20T04:46:14","slug":"when-the-algorithm-is-wrong-and-the-expert-is-right","status":"publish","type":"post","link":"https:\/\/inphronesys.com\/?p=1886","title":{"rendered":"When the Algorithm Is Wrong and the Expert Is Right"},"content":{"rendered":"<p>Your statistical forecast doesn&#8217;t read the news. It cannot know that your key supplier&#8217;s factory caught fire on Tuesday, that your biggest competitor is running a clearance sale next month, or that the EU tightened a regulation overnight. History-based forecasts are silent on anything history hasn&#8217;t already seen \u2014 which is exactly why the expert in your planning meeting is still there. The question isn&#8217;t <em>whether<\/em> to use their judgment. The question is how to use it without making the forecast worse.<\/p>\n<p>For the last few posts in this series I&#8217;ve written about picking a forecasting model as a horse race. The M5 lesson was that simple methods still win. The six-model bake-off showed how to compare candidates honestly using MASE and cross-validation. Every one of those posts assumed something you don&#8217;t always get: that the past is a decent guide to the near future. Sometimes it isn&#8217;t. Sometimes the track changes and the model doesn&#8217;t know yet \u2014 and the only person who does is the human sitting in the planning meeting. This post is about what to do then, and about the quiet bias that makes that person&#8217;s judgment systematically wrong in a way almost no one writes about.<\/p>\n<h2>When the Model Doesn&#8217;t Know What You Know<\/h2>\n<p>Here is a monthly demand series for an industrial consumable. Forty-two observations, from January 2022 through June 2025. The first two years look like every textbook example: clean seasonal wave, gentle upward trend, a little noise. You could close your eyes, throw a dart at the forecasting toolbox, and get something respectable.<\/p>\n<p>Then March 2024 happens. A supplier goes offline. Demand collapses. Over three months the series sits at a trough of <strong>1,469 units<\/strong>, roughly 55% of the pre-shock baseline, before recovering over another three months. By late 2024 the series has settled into a new, lower steady state.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/inphronesys.com\/wp-content\/uploads\/2026\/04\/jfc_model_failure.png\" alt=\"ETS forecast vs. actual demand during a supplier disruption\" \/><\/p>\n<p>The dashed line is an ETS(A,A,A) model \u2014 additive error, additive trend, additive seasonality \u2014 trained on the first 26 months of data and asked to forecast through the shock. It does exactly what it was designed to do: extrapolate the pattern in its training window. At the trough, it predicts <strong>2,629 units<\/strong>. Actual demand is <strong>1,469 units<\/strong>. The model&#8217;s forecast is <strong>44% phantom demand<\/strong> \u2014 it predicts 1,160 units per month that don&#8217;t exist.<\/p>\n<p>No hyperparameter tuning would have fixed this. No clever feature engineering would have fixed it either. The disruption wasn&#8217;t in the training data, so from the model&#8217;s point of view it does not exist. This is the <strong>structural break<\/strong> problem \u2014 the single most important limitation of every forecasting method we&#8217;ve covered so far. 
Statistical methods assume the data-generating process in the future resembles the one in the past. When that assumption breaks, the forecast breaks with it, silently and confidently.<\/p>\n<p>The cases where this matters in supply chain are the ones that show up in every operations review:<\/p>\n<ul>\n<li><strong>New products with no history.<\/strong> The model has nothing to extrapolate from. Judgmental input (analogies to similar launches, channel intelligence, pre-order data) is the <em>only<\/em> signal.<\/li>\n<li><strong>Demand shocks.<\/strong> COVID, the Suez blockage, a regional conflict, a customer going bankrupt. By the time the shock is in the training data, the damage is done.<\/li>\n<li><strong>Promotional events.<\/strong> Next month&#8217;s clearance sale looks nothing like last year&#8217;s BAU month. The algorithm saw last year&#8217;s pattern; the planner sees this quarter&#8217;s promotional calendar.<\/li>\n<li><strong>Regulatory changes.<\/strong> A new tariff, a packaging rule, a material ban. Step-function effects that the model cannot anticipate because it has no mechanism to read policy documents.<\/li>\n<\/ul>\n<p>In each case the expert is not being asked to beat the algorithm on a level playing field. They&#8217;re being asked for information the algorithm never had access to. That is an entirely different \u2014 and entirely winnable \u2014 game.<\/p>\n<h2>The Bias No One Talks About: Anchoring<\/h2>\n<p>Here is the finding most likely to change how you run your planning meetings: <strong>expert adjustments to a statistical forecast are directionally correct but chronically under-sized.<\/strong><\/p>\n<p>Human judgment under uncertainty doesn&#8217;t start from a blank sheet of paper. It starts from whatever number was put in front of you \u2014 the anchor \u2014 and adjusts away from it. Daniel Kahneman and Amos Tversky documented this in the 1970s, and decades of forecasting research since have reproduced the pattern in the specific setting that matters to us: demand planners adjusting a baseline statistical forecast in response to real-world information they know the model is missing.<\/p>\n<p>The anchor is the baseline. The adjustment is toward the right answer. The trouble is that the adjustment almost always stops too early.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/inphronesys.com\/wp-content\/uploads\/2026\/04\/jfc_adjustment_bias.png\" alt=\"50 expert adjustments plotted against what the adjustment should have been\" \/><\/p>\n<p>That scatter plot is 50 synthetic adjustment opportunities, calibrated to match what the literature keeps finding. Each dot is one moment where a demand planner had to override a baseline forecast because they knew something the model didn&#8217;t \u2014 an incoming promotion, a known competitor stockout, a new account win. The x-axis is the adjustment that <em>would have been correct<\/em> in hindsight. The y-axis is the adjustment the planner actually applied.<\/p>\n<p>The grey dashed line at 45\u00b0 is what a perfectly calibrated override program would produce: when the correct answer is +20%, the expert applies +20%; when the correct answer is -15%, the expert applies -15%. The red line is what experts actually do. Its slope is <strong>0.56<\/strong>. Not 1.0. Experts move in the right direction on nearly every call \u2014 but they cover barely more than half the distance to the correct answer, on average.<\/p>
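\n<p>If you log both numbers for every override, that slope is one regression call away. Here is a minimal sketch, assuming a hypothetical <code>overrides<\/code> table with one row per override and two illustrative columns, <code>needed_pct<\/code> and <code>applied_pct<\/code>:<\/p>\n<pre><code class=\"language-r\"># Minimal sketch: estimate the anchoring slope from logged overrides.\n# Columns and values are illustrative; swap in your own override log.\noverrides &lt;- data.frame(\n  needed_pct  = c(18, -12, 25, -8, 10, -20),  # ex-post correct adjustment (%)\n  applied_pct = c(10, -7, 14, -4, 6, -11)     # what the planner applied (%)\n)\n\nfit &lt;- lm(applied_pct ~ needed_pct, data = overrides)\nslope &lt;- unname(coef(fit)[\"needed_pct\"])\n\n# Slope 1.00 = perfectly calibrated; well below 1 = systematic under-adjustment\ncat(sprintf(\"Anchoring slope: %.2f\\n\", slope))\n<\/code><\/pre>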
\n<p>This matters because the loudest objection to judgmental forecasting \u2014 &#8220;humans are random, they just add noise&#8221; \u2014 is, on a big enough sample, <strong>wrong<\/strong>. Humans aren&#8217;t random. They&#8217;re systematically under-adjusting. If your baseline says 1,000 units and the truth is 1,500, a planner presented with 1,000 as an anchor will typically propose something like 1,280, not 1,500. The direction is right. The magnitude is lazy. Fildes et al. (2009) documented the same systematic under-adjustment pattern empirically across thousands of real planner overrides at major forecasting organisations.<\/p>\n<p>This is the contrarian insight worth taking to your next S&amp;OP meeting: <strong>the bias in your override process is not what you think.<\/strong> It isn&#8217;t wild optimism and it isn&#8217;t gut feeling. It&#8217;s an anchor that pulls every override halfway back to the baseline. Once you&#8217;ve seen the pattern you can correct for it \u2014 by asking the planner to justify the magnitude of the adjustment, not just its direction, and by reviewing post-hoc whether adjustments were, on average, big enough.<\/p>\n<h2>When Expert Overrides Help \u2014 and When They Don&#8217;t<\/h2>\n<p>Anchoring is a bias in the <em>size<\/em> of the adjustment. But there&#8217;s a second, separate question: on which forecasts does any adjustment help at all? The honest answer is the one nobody selling a demand-planning product wants you to hear.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/inphronesys.com\/wp-content\/uploads\/2026\/04\/jfc_skill_comparison.png\" alt=\"Forecast error (MASE) by period type: statistical forecast vs. expert-adjusted forecast\" \/><\/p>\n<p>The chart shows three different kinds of forecasting period \u2014 normal demand, promotional events, and supply disruptions \u2014 and two methods on each one: a pure statistical forecast and an expert-adjusted version of that same forecast. The metric is MASE; lower is better, 1.0 is a seasonal-naive benchmark (a minimal implementation of the scoring rule follows the list below). The numbers come from an actual simulation: the same 42-month synthetic series, an ETS(A,A,A) fit on the first 26 months, and an expert override set to 55% of the true event magnitude \u2014 the very anchoring rate we just measured in the previous chart. The directional pattern is the one Fildes et al. (2009) and Syntetos et al. (2009) found empirically across thousands of real planner overrides.<\/p>\n<p>Three numbers to take to the Monday stand-up:<\/p>\n<ul>\n<li><strong>Normal periods: statistical wins (0.54 vs. 0.73).<\/strong> ETS already beats the seasonal-naive benchmark by almost half. Expert adjustment then drags MASE <strong>33% higher<\/strong>. When demand is running along its historical rails, the human tinkering with the number is mostly adding noise. Anchoring, mood, the loudest voice in the S&amp;OP room \u2014 they all leak in.<\/li>\n<li><strong>Promotional events: expert adjustment wins by 37%.<\/strong> The statistical forecast is stuck above the naive benchmark (1.76) because it cannot see next month&#8217;s promotional calendar. A planner who applies the override \u2014 even an anchored one that only covers 55% of the true spike \u2014 pulls MASE down to <strong>1.11<\/strong>.<\/li>\n<li><strong>Supply disruption: expert adjustment wins by 42%.<\/strong> Same logic. Pure ETS posts MASE 1.86 because it never saw the shock coming. The anchored override alone drops that to <strong>1.08<\/strong>. The model cannot know your supplier&#8217;s factory burned down. A planner with a supplier hotline and a phone can.<\/li>\n<\/ul>
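\n<p>For reference, here is the scoring rule in miniature: the denominator is the in-sample MAE of a seasonal-naive forecast (m = 12), the same convention the full script at the end of this post uses. The toy series below is purely illustrative.<\/p>\n<pre><code class=\"language-r\"># Minimal sketch: MASE with a seasonal-naive denominator (m = 12).\nmase &lt;- function(y_test, y_hat, y_train, m = 12) {\n  n &lt;- length(y_train)\n  scale &lt;- mean(abs(y_train[(m + 1):n] - y_train[1:(n - m)]))  # snaive in-sample MAE\n  mean(abs(y_test - y_hat)) \/ scale\n}\n\n# Toy check on an illustrative seasonal series\nset.seed(1)\ny_train &lt;- 100 + 10 * sin(2 * pi * (1:36) \/ 12) + rnorm(36, 0, 2)\ny_test  &lt;- 100 + 10 * sin(2 * pi * (37:42) \/ 12) + rnorm(6, 0, 2)\ny_hat   &lt;- 100 + 10 * sin(2 * pi * (37:42) \/ 12)  # a decent forecast\nmase(y_test, y_hat, y_train)  # below 1: beats the seasonal naive\n<\/code><\/pre>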
\n<p>The pattern is consistent across the forecasting literature and it has a clear interpretation. <strong>Judgment hurts when the model is doing fine. Judgment helps \u2014 a lot \u2014 when the model is flying blind.<\/strong> The biggest mistake in most planning departments is symmetric override policy: every forecast goes through a human review, every forecast gets adjusted, every adjustment feels like work well done. What you&#8217;re actually doing, three-quarters of the time, is injecting anchoring noise into a perfectly acceptable forecast.<\/p>\n<p>The answer is not to suppress expert judgment. It is to <strong>channel<\/strong> it toward the cases where it measurably helps.<\/p>\n<h2>The Delphi Method: Getting Expert Consensus Without Groupthink<\/h2>\n<p>Sometimes the thing you need from the experts isn&#8217;t a single override \u2014 it&#8217;s an estimate of something that has never happened before. Launch volume for a new product family. First-year demand for a market you&#8217;ve never served. Volume impact of a regulatory ban. No statistical method can help here, because there is no history to fit. You are asking the room.<\/p>\n<p>The problem with asking the room is the room. The most senior person speaks first and anchors everyone else. The quietest expert \u2014 often the one with the sharpest sensing on the question \u2014 stays quiet. An hour later you have a consensus forecast that is really one person&#8217;s opinion with six signatures.<\/p>\n<p>The <strong>Delphi method<\/strong>, developed at RAND in the late 1950s and early 1960s, solves this by making the process deliberately unsocial.<\/p>\n<ol>\n<li><strong>Round 1.<\/strong> Every expert submits their forecast in writing, independently, without knowing what anyone else wrote. No conversation. No anchoring.<\/li>\n<li><strong>Aggregation.<\/strong> A facilitator compiles the estimates \u2014 typically reporting the median, the interquartile range, and the anonymous rationales each expert provided.<\/li>\n<li><strong>Round 2.<\/strong> Every expert sees the aggregated picture and submits a revised estimate, along with a short rationale if they are far from the median.<\/li>\n<li><strong>Iterate.<\/strong> Repeat until the distribution converges \u2014 typically 2 to 4 rounds.<\/li>\n<\/ol>\n<p>The result is not a negotiated compromise. It is a genuine synthesis of independent judgments, weighted by the facts each expert brought but insulated from the social dynamics of a live meeting.<\/p>\n<p>In supply chain practice, the Delphi mechanism is worth keeping in your back pocket for two specific settings: <strong>pre-S&amp;OP demand sensing<\/strong> (where sales, marketing, product management, and operations need to produce a joint quantitative view of a new launch or a shifted market), and <strong>post-shock re-baselining<\/strong> (where the model is no longer trustworthy and the organisation needs an anchored-but-expert view of what the new normal looks like). Run it over email or a shared form. It works better than a meeting, and it takes less time.<\/p>
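\n<p>The facilitator&#8217;s aggregation step is trivially scriptable. A minimal sketch, with a hypothetical <code>estimates<\/code> table standing in for the round-1 submissions; the flagging rule (outside the interquartile range) is one reasonable way to operationalise &#8220;far from the median&#8221;:<\/p>\n<pre><code class=\"language-r\"># Minimal sketch of a Delphi aggregation round. Values are illustrative.\nestimates &lt;- data.frame(\n  expert   = c(\"sales\", \"marketing\", \"product\", \"ops\"),\n  forecast = c(12000, 18500, 15000, 14000)  # e.g. first-year launch volume\n)\n\nmed &lt;- median(estimates$forecast)\niqr &lt;- quantile(estimates$forecast, c(0.25, 0.75))\n\n# Experts outside the IQR are asked for a short written rationale next\n# round. Feedback stays anonymous, so no one anchors the room.\nestimates$needs_rationale &lt;-\n  estimates$forecast &lt; iqr[1] | estimates$forecast &gt; iqr[2]\n\ncat(sprintf(\"Median: %.0f | IQR: [%.0f, %.0f]\\n\", med, iqr[1], iqr[2]))\nprint(estimates)\n<\/code><\/pre>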
\n<h2>Rules for Overriding the Model<\/h2>\n<p>If you remember nothing else from this post, remember this: <strong>every override needs a reason, and not every reason is a good one.<\/strong> Here is a clean separation between the overrides that tend to help and the ones that tend to hurt.<\/p>\n<p><strong>Justified overrides<\/strong> (ones that typically improve the forecast):<\/p>\n<ul>\n<li><strong>A known, dateable event not in training history.<\/strong> A new promotion, a scheduled price change, a planned product launch, a known customer win. The model cannot see it. You can.<\/li>\n<li><strong>A confirmed supply-side disruption.<\/strong> Supplier stockout, factory fire, shipping-lane closure. The effect is real and measurable and outside the historical distribution.<\/li>\n<li><strong>A regulatory or contractual change with step-function effects.<\/strong> Tariff, ban, new contract kicking in, old contract expiring.<\/li>\n<li><strong>A structural break you can name.<\/strong> A business segment sold off, a major customer lost, a new channel opened. If you can write a one-sentence reason, the override is probably informed.<\/li>\n<\/ul>\n<p><strong>Likely-noise overrides<\/strong> (ones that typically make the forecast worse):<\/p>\n<ul>\n<li><strong>&#8220;Demand feels like it&#8217;s trending up.&#8221;<\/strong> Recency bias. Run it against a trend test before changing the number.<\/li>\n<li><strong>&#8220;Sales needs a higher number for the plan.&#8221;<\/strong> Motivated reasoning, and it is rampant. Plan targets are a negotiation artifact, not a forecast input.<\/li>\n<li><strong>&#8220;This feels low.&#8221;<\/strong> Gut feeling without a specific, nameable reason. This is the override that costs you the 33% accuracy hit in normal periods.<\/li>\n<li><strong>&#8220;Last month was high and I don&#8217;t trust it.&#8221;<\/strong> The one-data-point override. The model already de-weights single-period outliers far better than you can do by hand.<\/li>\n<\/ul>\n<p>The single best diagnostic question to put between an override and the plan is: <strong>&#8220;What do you know that the model doesn&#8217;t?&#8221;<\/strong> If the answer is a specific, dateable, explainable piece of information, change the number. If the answer is a feeling, don&#8217;t.<\/p>\n<h2>Integrating Judgment into Your S&amp;OP Process<\/h2>\n<p>An unstructured override program will silently erode your forecast accuracy. A structured one will measurably improve it. The difference is one number per override and a reporting cadence.<\/p>\n<p>Every override should log three things (a minimal log sketch follows the list):<\/p>\n<ol>\n<li><strong>The baseline value and the adjusted value.<\/strong> So you can calculate, after the fact, whether the adjustment was in the right direction and whether it was of the right magnitude.<\/li>\n<li><strong>The reason category.<\/strong> Pick from a short, enforced list: promotion, supply disruption, new product, regulatory, customer-specific event, <em>other<\/em>. The &#8220;other&#8221; bucket should hover around 5-10%; if it creeps above 20%, your categories are wrong or your overrides are.<\/li>\n<li><strong>The overrider.<\/strong> Who made the call. Not to blame anyone \u2014 to track skill by person. Some planners are meaningfully better than others at specific override categories, and the only way to find out is to measure.<\/li>\n<\/ol>
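\n<p>A minimal sketch of what that log and its quarterly roll-up can look like. The column names and toy values are illustrative; the roll-up compares baseline and adjusted absolute errors by reason category, which is the review described next:<\/p>\n<pre><code class=\"language-r\"># Minimal sketch: an override log plus a quarterly roll-up by reason.\nlibrary(dplyr)\n\noverride_log &lt;- tibble::tibble(\n  sku       = c(\"A-101\", \"A-101\", \"B-220\", \"B-220\"),  # illustrative SKUs\n  baseline  = c(1000, 1200, 500, 480),\n  adjusted  = c(1150, 1560, 430, 470),\n  actual    = c(1280, 1490, 400, 495),   # filled in once demand lands\n  reason    = c(\"promotion\", \"promotion\", \"supply_disruption\", \"gut_feel\"),\n  overrider = c(\"jk\", \"jk\", \"mt\", \"mt\")\n)\n\noverride_log |&gt;\n  mutate(err_base = abs(actual - baseline),\n         err_adj  = abs(actual - adjusted)) |&gt;\n  group_by(reason) |&gt;\n  summarise(mae_baseline = mean(err_base),\n            mae_adjusted = mean(err_adj),\n            helped       = mae_adjusted &lt; mae_baseline)\n<\/code><\/pre>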
\n<p>Then review quarterly. Two numbers worth pulling:<\/p>\n<ul>\n<li><strong>Override forecast skill by category.<\/strong> What is MASE on the baseline versus the adjusted value, split by reason category? You will almost certainly find that promotion and supply-disruption overrides add skill, while &#8220;gut feel&#8221; overrides subtract it. That evidence is how you shrink the &#8220;other&#8221; bucket over time.<\/li>\n<li><strong>The anchor-corrected magnitude.<\/strong> For the categories where overrides do add skill, compute the ratio of applied adjustment to eventual correct adjustment. If the ratio is systematically below 1.0 \u2014 and it almost always is \u2014 train planners to push adjustments further, and provide them with reference ranges drawn from historical analogues.<\/li>\n<\/ul>\n<p>This is the transition from reactive override (&#8220;it just felt low this month&#8221;) to structured judgmental forecasting \u2014 and it is the kind of quiet operational discipline that separates S&amp;OP processes that work from ones that merely exist.<\/p>\n<h2>Your Next Steps<\/h2>\n<ol>\n<li><strong>Pull your last 12 months of overrides and tag them by reason.<\/strong> Use the categories from the last section. Then compute MASE on the baseline vs. the adjusted value, split by category. If any category has adjusted-MASE <em>worse<\/em> than baseline-MASE, stop overriding in that category next quarter.<\/li>\n<li><strong>Measure anchoring in your own numbers.<\/strong> For every override where the actual has now landed, plot <code>applied adjustment %<\/code> against <code>needed adjustment %<\/code>. Fit a line. If the slope is below 0.7 (and it will be), your planners are under-adjusting systematically \u2014 and you now have the chart to say so.<\/li>\n<li><strong>Introduce a 30-second override-reason field in your planning tool.<\/strong> A free-text box plus a pick-list of six categories. No reason, no override. That one change will cut your &#8220;gut feel&#8221; overrides by half.<\/li>\n<li><strong>Run one Delphi round on your next new-product launch.<\/strong> Four experts, one shared template, two rounds, no meeting. Compare the convergent forecast to what came out of your traditional launch-planning meeting. If the Delphi version is tighter, you know what to do next time.<\/li>\n<li><strong>Add a shock-review step to your S&amp;OP calendar.<\/strong> Once a month, explicitly ask: &#8220;Is there anything we know that the baseline model can&#8217;t see?&#8221; If the honest answer is no, skip the override cycle entirely for that SKU bucket. 
The biggest gains come from knowing when <em>not<\/em> to interfere.<\/li>\n<\/ol>\n<h2>Interactive Dashboard<\/h2>\n<p>Put yourself in the demand planner&#8217;s seat \u2014 pick a scenario, add your expert adjustment, and see whether it helped or hurt.<\/p>\n<div class=\"dashboard-link\" style=\"margin: 2em 0; padding: 1.5em; background: #f8f9fa; border-left: 4px solid #0073aa; border-radius: 4px;\">\n<p style=\"margin: 0 0 0.5em 0; font-size: 1.1em;\"><strong>Interactive Dashboard<\/strong><\/p>\n<p style=\"margin: 0 0 1em 0;\">Explore the data yourself \u2014 adjust parameters and see the results update in real time.<\/p>\n<p><a style=\"display: inline-block; padding: 0.6em 1.2em; background: #0073aa; color: #fff; text-decoration: none; border-radius: 4px; font-weight: bold;\" href=\"https:\/\/inphronesys.com\/wp-content\/uploads\/2026\/04\/2026-04-23_Judgmental_Forecasting_Expert_Override_dashboard.html\" target=\"_blank\" rel=\"noopener\">Open Interactive Dashboard \u2192<\/a><\/p>\n<\/div>\n<details>\n<summary><strong>Show R Code<\/strong><\/summary>\n<pre><code class=\"language-r\"># =============================================================================\n# generate_judgmental_images.R \u2014 FPP3 Ch. 6: Judgmental Forecasting\n# =============================================================================\n# \"When the Algorithm Is Wrong and the Expert Is Right\"\n# Generates 3 images for the judgmental forecasting blog post.\n# Run from project root: Rscript Scripts\/generate_judgmental_images.R\n# =============================================================================\n\n# --- Setup ---\nsource(\"Scripts\/theme_inphronesys.R\")\n\nsuppressPackageStartupMessages({\n  library(fpp3)\n  library(ggplot2)\n  library(dplyr)\n  library(tidyr)\n  library(scales)\n})\n\nimg_dir &lt;- \"Images\"\nset.seed(42)\n\n# =============================================================================\n# 1. MODEL FAILURE DURING A STRUCTURAL BREAK\n# =============================================================================\n# Scenario: monthly demand for an industrial consumable. 
A stable, seasonal\n# demand series is interrupted by a supplier-disruption shock at month 27\n# (think: COVID lockdown, a port strike, or a key supplier going offline).\n# We train ETS(A,A,A) on pre-shock data and let it forecast through the shock.\n# The forecast stays on the old trend line \u2014 the algorithm cannot know.\n# =============================================================================\n\nn_months &lt;- 42\nmonth_idx &lt;- 1:n_months\nmonth_dates &lt;- seq(as.Date(\"2022-01-01\"), by = \"1 month\", length.out = n_months)\n\n# Base signal: steady growth + annual (12-month) seasonality\ntrend_m &lt;- 2200 + 18 * month_idx\nseasonal_m &lt;- 260 * sin(2 * pi * (month_idx - 3) \/ 12)\nnoise_m &lt;- rnorm(n_months, 0, 60)\nbaseline_m &lt;- trend_m + seasonal_m + noise_m\n\n# Structural break at month 27: demand collapses for 3 months, gradual recovery over 3 months\nshock_start &lt;- 27\nshock_end &lt;- 29\nrecovery_end &lt;- 32\ndemand_actual &lt;- baseline_m\ndemand_actual[shock_start:shock_end] &lt;- baseline_m[shock_start:shock_end] * 0.55\ndemand_actual[(shock_end + 1):recovery_end] &lt;-\n  baseline_m[(shock_end + 1):recovery_end] * seq(0.70, 0.95, length.out = 3)\n\ndemand_tbl &lt;- tibble(\n  month = yearmonth(month_dates),\n  month_date = month_dates,\n  idx = month_idx,\n  demand = round(demand_actual)\n)\n\ntrain_ts &lt;- demand_tbl |&gt;\n  filter(idx &lt; shock_start) |&gt;\n  select(month, demand) |&gt;\n  as_tsibble(index = month)\n\nfit &lt;- train_ts |&gt;\n  model(ETS(demand ~ error(\"A\") + trend(\"A\") + season(\"A\", period = 12)))\n\nfc &lt;- fit |&gt;\n  forecast(h = n_months - (shock_start - 1)) |&gt;\n  as_tibble() |&gt;\n  mutate(month_date = as.Date(month))\n\nforecast_line &lt;- tibble(\n  month_date = fc$month_date,\n  forecast = as.numeric(fc$.mean)\n)\n\nshock_date &lt;- month_dates[shock_start]\n\np1 &lt;- ggplot() +\n  geom_line(data = demand_tbl, aes(month_date, demand),\n            color = iph_colors$dark, linewidth = 0.9) +\n  geom_point(data = demand_tbl, aes(month_date, demand),\n             color = iph_colors$dark, size = 1.3) +\n  geom_line(data = forecast_line, aes(month_date, forecast),\n            color = iph_colors$red, linewidth = 0.9, linetype = \"dashed\") +\n  geom_vline(xintercept = shock_date,\n             color = iph_colors$grey, linewidth = 0.4, linetype = \"dotted\") +\n  annotate(\"text\",\n           x = shock_date, y = max(demand_tbl$demand) * 1.04,\n           label = \"Supplier disruption\\n(Mar 2024)\",\n           hjust = -0.05, vjust = 1, size = 3.3,\n           color = iph_colors$dark, family = \"Inter\", lineheight = 0.95) +\n  annotate(\"text\",\n           x = month_dates[38],\n           y = forecast_line$forecast[length(forecast_line$forecast)] + 200,\n           label = \"ETS forecast\\n(trained pre-shock)\",\n           hjust = 0.5, vjust = 0, size = 3.2,\n           color = iph_colors$red, family = \"Inter\", lineheight = 0.95) +\n  annotate(\"text\",\n           x = month_dates[shock_start + 1], y = min(demand_tbl$demand) - 120,\n           label = \"Actual demand\",\n           hjust = 0.5, vjust = 1, size = 3.2,\n           color = iph_colors$dark, family = \"Inter\") +\n  scale_x_date(date_breaks = \"6 months\", date_labels = \"%b %y\",\n               expand = expansion(mult = c(0.02, 0.05))) +\n  scale_y_continuous(labels = comma_format(),\n                     limits = c(min(demand_tbl$demand) - 300,\n                                max(max(demand_tbl$demand),\n                                    
max(forecast_line$forecast)) * 1.12)) +\n  labs(\n    title = \"The algorithm can't see the news\",\n    subtitle = \"ETS(A,A,A) forecast vs. actual monthly demand during a supplier disruption\",\n    x = NULL, y = \"Units \/ month\",\n    caption = \"Synthetic series: 42 monthly observations, ETS trained on months 1-26.\"\n  ) +\n  theme_inphronesys(grid = \"y\")\n\nggsave(file.path(img_dir, \"jfc_model_failure.png\"),\n       p1, width = 8, height = 5, dpi = 100, bg = \"white\")\ncat(\"Saved jfc_model_failure.png\\n\")\n\n# Key numbers for the blog\ntrough_idx_abs &lt;- (shock_start:shock_end)[which.min(demand_tbl$demand[shock_start:shock_end])]\nshock_trough_actual &lt;- demand_tbl$demand[trough_idx_abs]\nshock_trough_forecast &lt;- forecast_line$forecast[trough_idx_abs - shock_start + 1]\ngap_pct &lt;- round(100 * (shock_trough_forecast - shock_trough_actual) \/ shock_trough_forecast, 1)\ncat(\"  Actual trough (shock):\", shock_trough_actual, \"units (month\", trough_idx_abs, \")\\n\")\ncat(\"  ETS forecast at same month:\", round(shock_trough_forecast), \"units\\n\")\ncat(\"  Forecast overshoot vs. actual at trough:\", gap_pct, \"%\\n\")\n\n\n# =============================================================================\n# 2. ANCHORING + INSUFFICIENT ADJUSTMENT\n# =============================================================================\n# Scenario: 50 occasions where a demand planner had to adjust the statistical\n# baseline in response to known information (promos, product launches,\n# competitor stockouts). \"Needed\" adjustment = what ex-post turned out to be\n# correct. \"Expert\" adjustment = what the planner actually applied.\n# Experts adjust in the right direction but only ~55% of the way.\n# =============================================================================\n\nn_adj &lt;- 50\nneeded_pct &lt;- round(\n  c(rnorm(n_adj \/ 2, mean = 12, sd = 9),    # upward adjustments\n    rnorm(n_adj \/ 2, mean = -10, sd = 8)),  # downward adjustments\n  1\n)\nneeded_pct &lt;- pmax(pmin(needed_pct, 30), -30)  # clamp to [-30, +30]\n\n# Expert adjustment = anchored (55% of needed) + noise\nexpert_pct &lt;- round(\n  needed_pct * 0.55 + rnorm(n_adj, 0, 2.5),\n  1\n)\n\nadj_df &lt;- tibble(\n  needed = needed_pct,\n  expert = expert_pct\n)\n\nslope_fit &lt;- lm(expert ~ needed, data = adj_df)\nslope_est &lt;- round(coef(slope_fit)[2], 2)\nmean_shortfall &lt;- round(mean((abs(adj_df$needed) - abs(adj_df$expert)) \/\n                              pmax(abs(adj_df$needed), 0.5)) * 100, 1)\n\ncat(\"\\nAnchoring chart numbers:\\n\")\ncat(\"  n =\", n_adj, \"adjustment opportunities\\n\")\ncat(\"  Regression slope (expert ~ needed):\", slope_est,\n    \"(perfect = 1.00)\\n\")\ncat(\"  Mean magnitude shortfall:\", mean_shortfall, \"%\\n\")\n\np2 &lt;- ggplot(adj_df, aes(x = needed, y = expert)) +\n  geom_abline(slope = 1, intercept = 0,\n              color = iph_colors$grey, linetype = \"dashed\", linewidth = 0.5) +\n  geom_smooth(method = \"lm\", se = FALSE,\n              color = iph_colors$red, linewidth = 0.8) +\n  geom_point(color = iph_colors$blue, size = 2.6, alpha = 0.75) +\n  annotate(\"text\",\n           x = 14, y = 28,\n           label = \"Perfect adjustment\\n(45\\u00b0 line)\",\n           hjust = 0, vjust = 1, size = 3.2,\n           color = iph_colors$grey, family = \"Inter\", lineheight = 0.95) +\n  annotate(\"text\",\n           x = 10, y = -3,\n           label = paste0(\"Actual expert adjustment\\nslope \\u2248 \", slope_est),\n           hjust = 0, vjust = 1, size = 
3.2,\n           color = iph_colors$red, family = \"Inter\", lineheight = 0.95) +\n  annotate(\"text\",\n           x = -29, y = 27,\n           label = \"Experts move in the right direction\\nbut stop roughly halfway\",\n           hjust = 0, vjust = 1, size = 3.4, fontface = \"bold\",\n           color = iph_colors$dark, family = \"Inter\", lineheight = 0.95) +\n  scale_x_continuous(labels = function(x) paste0(x, \"%\"),\n                     limits = c(-32, 32), breaks = seq(-30, 30, 10)) +\n  scale_y_continuous(labels = function(x) paste0(x, \"%\"),\n                     limits = c(-32, 32), breaks = seq(-30, 30, 10)) +\n  labs(\n    title = \"Anchoring: the expert moves, but not far enough\",\n    subtitle = \"50 planner adjustments vs. the adjustment that turned out correct\",\n    x = \"Needed adjustment to the statistical baseline\",\n    y = \"Expert's actual adjustment\",\n    caption = \"Synthetic illustration calibrated to anchoring-bias findings in the forecasting literature.\"\n  ) +\n  theme_inphronesys(grid = \"xy\")\n\nggsave(file.path(img_dir, \"jfc_adjustment_bias.png\"),\n       p2, width = 8, height = 5, dpi = 100, bg = \"white\")\ncat(\"Saved jfc_adjustment_bias.png\\n\")\n\n\n# =============================================================================\n# 3. WHEN DOES EXPERT ADJUSTMENT PAY OFF?  (real simulation, not illustrative)\n# =============================================================================\n# Three scenarios \u2014 same base signal (trend + 12-month seasonal + noise).\n# ETS(A,A,A) trained on months 1-26, forecast months 27-42 (h = 16).\n# An event window in the test period differs across scenarios. The \"expert\"\n# applies an override to the forecast during the event window, calibrated at\n# 55% of the true event magnitude (per the anchoring pattern in Image 2).\n# MASE uses the seasonal naive (m = 12) in-sample MAE as the denominator.\n# =============================================================================\n\ntrain_cut &lt;- 26    # months 1..26 used for training\nm_season  &lt;- 12\n\n# Shared base signal (deterministic given seed already set at top of script)\nbase_signal &lt;- function() {\n  idx &lt;- 1:n_months\n  trend_m   &lt;- 2200 + 18 * idx\n  seasonal_m &lt;- 260 * sin(2 * pi * (idx - 3) \/ 12)\n  noise_m   &lt;- rnorm(n_months, 0, 60)\n  trend_m + seasonal_m + noise_m\n}\n\nbuild_scenario &lt;- function(label, event_start, event_end,\n                           actual_event_mult, expert_override,\n                           base_y) {\n  y &lt;- base_y\n  # Apply actual event to the true demand\n  if (actual_event_mult != 1) {\n    y[event_start:event_end] &lt;- y[event_start:event_end] * actual_event_mult\n  }\n\n  dates &lt;- seq(as.Date(\"2022-01-01\"), by = \"1 month\", length.out = n_months)\n  scen_tbl &lt;- tibble(\n    month = yearmonth(dates),\n    month_date = dates,\n    idx = 1:n_months,\n    demand = round(y)\n  )\n\n  train_ts &lt;- scen_tbl |&gt;\n    filter(idx &lt;= train_cut) |&gt;\n    select(month, demand) |&gt;\n    as_tsibble(index = month)\n\n  fit &lt;- train_ts |&gt;\n    model(ETS(demand ~ error(\"A\") + trend(\"A\") + season(\"A\", period = m_season)))\n  fc &lt;- fit |&gt;\n    forecast(h = n_months - train_cut) |&gt;\n    as_tibble() |&gt;\n    mutate(month_date = as.Date(month)) |&gt;\n    transmute(idx = (train_cut + 1):n_months,\n              month_date,\n              forecast_stat = as.numeric(.mean))\n\n  # Seasonal naive in-sample MAE (m=12) over training set\n  y_train &lt;- 
scen_tbl$demand[1:train_cut]\n  snaive_err &lt;- abs(y_train[(m_season + 1):train_cut] -\n                    y_train[1:(train_cut - m_season)])\n  snaive_mae &lt;- mean(snaive_err)\n\n  # Merge truth + forecast\n  test_tbl &lt;- scen_tbl |&gt;\n    filter(idx &gt; train_cut) |&gt;\n    left_join(fc |&gt; select(idx, forecast_stat), by = \"idx\")\n\n  # Expert-adjusted forecast: override applied to event window only\n  test_tbl &lt;- test_tbl |&gt;\n    mutate(\n      in_event = idx &gt;= event_start &amp; idx &lt;= event_end,\n      forecast_expert = ifelse(in_event,\n                               forecast_stat * (1 + expert_override),\n                               forecast_stat)\n    )\n\n  mase_stat  &lt;- mean(abs(test_tbl$demand - test_tbl$forecast_stat))   \/ snaive_mae\n  mase_exp   &lt;- mean(abs(test_tbl$demand - test_tbl$forecast_expert)) \/ snaive_mae\n\n  # Sweep override across a realistic range to find the optimum\n  sweep_pct &lt;- seq(-0.60, 0.60, by = 0.01)\n  mase_sweep &lt;- sapply(sweep_pct, function(ov) {\n    fc_adj &lt;- test_tbl$forecast_stat\n    fc_adj[test_tbl$in_event] &lt;- fc_adj[test_tbl$in_event] * (1 + ov)\n    mean(abs(test_tbl$demand - fc_adj)) \/ snaive_mae\n  })\n  optimal_override &lt;- sweep_pct[which.min(mase_sweep)]\n  optimal_mase     &lt;- min(mase_sweep)\n\n  list(\n    label = label,\n    event_start = event_start, event_end = event_end,\n    actual_event_mult = actual_event_mult,\n    expert_override = expert_override,\n    snaive_mae = snaive_mae,\n    mase_stat = mase_stat,\n    mase_expert = mase_exp,\n    optimal_override = optimal_override,\n    optimal_mase = optimal_mase,\n    train = scen_tbl |&gt; filter(idx &lt;= train_cut),\n    test = test_tbl,\n    sweep = tibble(override_pct = sweep_pct, mase = mase_sweep)\n  )\n}\n\n# Shared base \u2014 regenerate fresh so every scenario sees the same noise seed\nset.seed(42)\nbase_y_shared &lt;- base_signal()\n\nscen_normal &lt;- build_scenario(\n  label = \"Normal demand\",\n  event_start = 35, event_end = 37,\n  actual_event_mult = 1.0,   # no actual event\n  expert_override  = 0.08,   # expert wrongly feels \"Q4 will be strong\"\n  base_y = base_y_shared\n)\n\nscen_promo &lt;- build_scenario(\n  label = \"Promotional event\",\n  event_start = 33, event_end = 35,\n  actual_event_mult = 1.40,   # +40% promo spike (real, known to planner)\n  expert_override  = 0.22,    # 55% of 40%: anchored under-adjustment\n  base_y = base_y_shared\n)\n\nscen_disrupt &lt;- build_scenario(\n  label = \"Supply disruption\",\n  event_start = 27, event_end = 29,\n  actual_event_mult = 0.55,   # -45% crash (same as Image 1)\n  expert_override  = -0.2475, # 55% of -45%\n  base_y = base_y_shared\n)\n\nscenarios &lt;- list(normal = scen_normal,\n                  promo = scen_promo,\n                  disruption = scen_disrupt)\n\nskill_df &lt;- tibble(\n  period = rep(sapply(scenarios, function(s) s$label), each = 2),\n  method = rep(c(\"Statistical only\", \"Expert-adjusted\"), 3),\n  mase   = as.numeric(unlist(lapply(scenarios,\n              function(s) c(s$mase_stat, s$mase_expert))))\n) |&gt;\n  mutate(\n    period = factor(period, levels = c(\"Normal demand\",\n                                        \"Promotional event\",\n                                        \"Supply disruption\")),\n    method = factor(method, levels = c(\"Statistical only\",\n                                         \"Expert-adjusted\"))\n  )\n\n# Percentage change labels\ndelta_df &lt;- skill_df |&gt;\n  group_by(period) |&gt;\n  
summarise(\n    stat = mase[method == \"Statistical only\"],\n    adj  = mase[method == \"Expert-adjusted\"],\n    delta_pct = round(100 * (adj - stat) \/ stat, 0),\n    .groups = \"drop\"\n  ) |&gt;\n  mutate(label = paste0(ifelse(delta_pct &gt; 0, \"+\", \"\"), delta_pct, \"%\"))\n\ncat(\"\\nSkill comparison numbers (MASE):\\n\")\nprint(skill_df)\ncat(\"\\nExpert vs. statistical, % change in MASE (lower = better):\\n\")\nprint(delta_df)\n\nmethod_fill &lt;- c(\n  \"Statistical only\" = iph_colors$blue,\n  \"Expert-adjusted\"  = iph_colors$green\n)\n\ny_top &lt;- max(skill_df$mase) * 1.35\n\np3 &lt;- ggplot(skill_df, aes(x = period, y = mase, fill = method)) +\n  geom_col(position = position_dodge(width = 0.72),\n           width = 0.65) +\n  geom_text(aes(label = sprintf(\"%.2f\", mase)),\n            position = position_dodge(width = 0.72),\n            vjust = -0.4, size = 3.3, family = \"Inter\",\n            color = iph_colors$dark) +\n  geom_hline(yintercept = 1, color = iph_colors$grey,\n             linetype = \"dashed\", linewidth = 0.5) +\n  annotate(\"text\",\n           x = 0.55, y = 1.08,\n           label = \"Naive benchmark\",\n           hjust = 0, size = 3.1,\n           color = iph_colors$grey, family = \"Inter\") +\n  annotate(\"text\",\n           x = 1, y = max(skill_df$mase[skill_df$period == \"Normal demand\"]) + 0.22,\n           label = delta_df$label[1],\n           size = 3.4, fontface = \"bold\",\n           color = iph_colors$red, family = \"Inter\") +\n  annotate(\"text\",\n           x = 2, y = max(skill_df$mase[skill_df$period == \"Promotional event\"]) + 0.22,\n           label = delta_df$label[2],\n           size = 3.4, fontface = \"bold\",\n           color = iph_colors$green, family = \"Inter\") +\n  annotate(\"text\",\n           x = 3, y = max(skill_df$mase[skill_df$period == \"Supply disruption\"]) + 0.22,\n           label = delta_df$label[3],\n           size = 3.4, fontface = \"bold\",\n           color = iph_colors$green, family = \"Inter\") +\n  scale_fill_manual(values = method_fill, name = NULL) +\n  scale_y_continuous(limits = c(0, y_top),\n                     breaks = seq(0, ceiling(y_top), 0.5),\n                     expand = expansion(mult = c(0, 0.02))) +\n  labs(\n    title = \"Expert adjustment earns its keep in exceptional periods\",\n    subtitle = \"Forecast error (MASE, lower is better) by scenario and method\",\n    x = NULL, y = \"MASE\",\n    caption = \"ETS(A,A,A) on 42-month synthetic series; expert override = 55% of true event magnitude.\"\n  ) +\n  theme_inphronesys(grid = \"y\")\n\nggsave(file.path(img_dir, \"jfc_skill_comparison.png\"),\n       p3, width = 8, height = 5, dpi = 100, bg = \"white\")\ncat(\"Saved jfc_skill_comparison.png\\n\")\n\n\n# =============================================================================\n# 4. JSON EXPORT FOR THE DASHBOARD\n# =============================================================================\n# Charlie-zero consumes this file. 
It contains the three scenario series,\n# ETS forecasts, seasonal-naive MAE denominator, MASE values, and the full\n# override-sweep so the dashboard can reproduce the MASE curve live.\n# =============================================================================\n\nlibrary(jsonlite)\n\nscenario_to_json &lt;- function(s) {\n  list(\n    label = s$label,\n    event_start_idx = s$event_start,\n    event_end_idx   = s$event_end,\n    actual_event_mult = s$actual_event_mult,\n    expert_override   = s$expert_override,\n    snaive_mae        = s$snaive_mae,\n    mase_stat         = s$mase_stat,\n    mase_expert       = s$mase_expert,\n    optimal_override  = s$optimal_override,\n    optimal_mase      = s$optimal_mase,\n    train = list(\n      month = format(s$train$month_date, \"%Y-%m\"),\n      idx   = s$train$idx,\n      demand = s$train$demand\n    ),\n    test = list(\n      month = format(s$test$month_date, \"%Y-%m\"),\n      idx   = s$test$idx,\n      demand = s$test$demand,\n      forecast_stat   = round(s$test$forecast_stat, 2),\n      forecast_expert = round(s$test$forecast_expert, 2),\n      in_event = s$test$in_event\n    ),\n    sweep = list(\n      override_pct = s$sweep$override_pct,\n      mase         = round(s$sweep$mase, 4)\n    )\n  )\n}\n\nexport &lt;- list(\n  meta = list(\n    title = \"When the Algorithm Is Wrong and the Expert Is Right\",\n    n_months = n_months,\n    train_months = train_cut,\n    test_months = n_months - train_cut,\n    model = \"ETS(A,A,A), period = 12\",\n    mase_denominator = \"seasonal naive in-sample MAE (m = 12)\",\n    anchoring_slope = slope_est,\n    anchoring_n = n_adj\n  ),\n  scenarios = lapply(scenarios, scenario_to_json)\n)\n\njson_path &lt;- \"Dashboards\/jfc_simulator_data.json\"\nif (!dir.exists(\"Dashboards\")) dir.create(\"Dashboards\")\nwrite_json(export, json_path,\n           auto_unbox = TRUE, digits = 4, pretty = TRUE)\ncat(\"Saved\", json_path, \"\\n\")\n\n\n# =============================================================================\n# Summary \u2014 for the FINAL dataset message\n# =============================================================================\ncat(\"\\n========== FINAL DATASET SUMMARY ==========\\n\")\ncat(\"Image 1 \u2014 jfc_model_failure.png\\n\")\ncat(\"  n_months:\", n_months, \"\\n\")\ncat(\"  shock_start (month idx):\", shock_start,\n    \"=&gt;\", format(month_dates[shock_start], \"%b %Y\"), \"\\n\")\ncat(\"  shock low (actual units):\", shock_trough_actual, \"\\n\")\ncat(\"  ETS forecast at same month (units):\", round(shock_trough_forecast), \"\\n\")\ncat(\"  forecast overshoot at trough (%):\", gap_pct, \"\\n\\n\")\n\ncat(\"Image 2 \u2014 jfc_adjustment_bias.png\\n\")\ncat(\"  n_adjustments:\", n_adj, \"\\n\")\ncat(\"  regression slope (expert ~ needed):\", slope_est, \"\\n\")\ncat(\"  mean magnitude shortfall (%):\", mean_shortfall, \"\\n\\n\")\n\ncat(\"Image 3 \u2014 jfc_skill_comparison.png (computed from real simulation)\\n\")\nprint(skill_df, n = 6)\ncat(\"\\nPer-scenario diagnostics:\\n\")\nfor (s in scenarios) {\n  cat(sprintf(\"  %-20s snaive_MAE=%.1f  MASE_stat=%.2f  MASE_expert=%.2f  optimal_override=%+.0f%%  optimal_MASE=%.2f\\n\",\n              s$label, s$snaive_mae,\n              s$mase_stat, s$mase_expert,\n              s$optimal_override * 100, s$optimal_mase))\n}\ncat(\"\\nJSON export:\", json_path, \"\\n\")\ncat(\"Done.\\n\")\n<\/code><\/pre>\n<\/details>\n<h2>References<\/h2>\n<ul>\n<li>Hyndman, R. J., &amp; Athanasopoulos, G. (2021). 
<em>Forecasting: Principles and Practice<\/em> (3rd ed.), Chapter 6 \u2014 &#8220;Judgmental forecasts.&#8221; OTexts. <a href=\"https:\/\/otexts.com\/fpp3\/judgmental.html\">https:\/\/otexts.com\/fpp3\/judgmental.html<\/a><\/li>\n<li>Fildes, R., Goodwin, P., Lawrence, M., &amp; Nikolopoulos, K. (2009). &#8220;Effective forecasting and judgmental adjustments: an empirical evaluation and strategies for improvement in supply-chain planning.&#8221; <em>International Journal of Forecasting<\/em>, 25(1), 3\u201323. <a href=\"https:\/\/doi.org\/10.1016\/j.ijforecast.2008.11.010\">DOI: 10.1016\/j.ijforecast.2008.11.010<\/a><\/li>\n<li>Syntetos, A. A., Nikolopoulos, K., Boylan, J. E., Fildes, R., &amp; Goodwin, P. (2009). &#8220;The effects of integrating management judgement into intermittent demand forecasts.&#8221; <em>International Journal of Production Economics<\/em>, 118(1), 72\u201381. <a href=\"https:\/\/doi.org\/10.1016\/j.ijpe.2008.08.011\">DOI: 10.1016\/j.ijpe.2008.08.011<\/a><\/li>\n<li>Tversky, A., &amp; Kahneman, D. (1974). &#8220;Judgment under uncertainty: heuristics and biases.&#8221; <em>Science<\/em>, 185(4157), 1124\u20131131.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Statistical models don&#8217;t know about your supplier&#8217;s factory fire, your competitor&#8217;s clearance sale, or the regulation that just changed. Here&#8217;s when expert judgment beats the algorithm \u2014 and the biases that make it worse.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[206,293,115],"tags":[296,205,295,297,127,294,201,26],"class_list":["post-1886","post","type-post","status-publish","format-standard","hentry","category-forecasting","category-operations-management","category-supply-chain-management","tag-delphi","tag-demand-planning","tag-expert-override","tag-forecast-bias","tag-fpp3","tag-judgmental-forecasting","tag-sop","tag-supply-chain-analytics"],"_links":{"self":[{"href":"https:\/\/inphronesys.com\/index.php?rest_route=\/wp\/v2\/posts\/1886","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/inphronesys.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/inphronesys.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/inphronesys.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/inphronesys.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1886"}],"version-history":[{"count":1,"href":"https:\/\/inphronesys.com\/index.php?rest_route=\/wp\/v2\/posts\/1886\/revisions"}],"predecessor-version":[{"id":1887,"href":"https:\/\/inphronesys.com\/index.php?rest_route=\/wp\/v2\/posts\/1886\/revisions\/1887"}],"wp:attachment":[{"href":"https:\/\/inphronesys.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1886"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/inphronesys.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1886"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/inphronesys.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1886"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}