{"version":"https://jsonfeed.org/version/1.1","title":"Philipp D. Dubach | Quantitative Finance \u0026 AI Strategy","home_page_url":"https://philippdubach.com/","feed_url":"https://philippdubach.com/feed.json","description":"Exploring the intersection of Macroeconomics, Quantitative Finance, and AI infrastructure. Analysis of LLM unit economics, global monetary frameworks, and systemic liquidity.","language":"en-US","icon":"https://philippdubach.com/icons/web-app-manifest-512x512.png","favicon":"https://philippdubach.com/icons/favicon.ico","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/","avatar":"https://static.philippdubach.com/ograph/ograph-post.jpg"}],"_philippdubach":{"about":"Custom extensions for philippdubach.com","research_profiles":{"google_scholar":"https://scholar.google.com/citations?user=-69oS2wAAAAJ","researchgate":"https://www.researchgate.net/profile/Philipp-Dubach"},"_ai":{"llms_txt":"https://philippdubach.com/llms.txt","llms_full_txt":"https://philippdubach.com/llms-full.txt","markdown_available":true,"markdown_url_pattern":"{post_url}index.md","markdown_example":"https://philippdubach.com/posts/the-saaspocalypse-paradox/index.md"}},"items":[{"id":"https://philippdubach.com/posts/midyear-portfolio-review-the-rotation-worked-europe-didnt/","url":"https://philippdubach.com/posts/midyear-portfolio-review-the-rotation-worked-europe-didnt/","title":"Midyear Portfolio Review: The Rotation Worked. 
Europe Didn't.","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-midyear-cover-jpg-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/midyear-cover.jpg 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/midyear-cover.jpg 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/midyear-cover.jpg 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/midyear-cover.jpg 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/midyear-cover.jpg 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/midyear-cover.jpg 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/midyear-cover.jpg 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/midyear-cover.jpg 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/midyear-cover.jpg 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/midyear-cover.jpg 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/midyear-cover.jpg 2000w\"\n              sizes=\"80vw\"\u003e\n      
\u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/midyear-cover.jpg\"\n           alt=\"Editorial cover illustration for the 2026 midyear portfolio review: a figure on a Swiss alpine balcony looking across a valley at a stylized financial skyline dominated by one disproportionately tall tower\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-midyear-cover-jpg-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/midyear-cover.jpg\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Editorial cover illustration for the 2026 midyear portfolio review: a figure on a Swiss alpine balcony looking across a valley at a stylized financial skyline dominated by one disproportionately tall tower\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eIn December I rebalanced my portfolio around \u003ca href=\"/posts/how-ai-is-shaping-my-investment-portfolio-for-2026/\"\u003efive theses for 2026\u003c/a\u003e. Five months in, this midyear review (early, but a lot has happened) puts four of them ahead of where I expected and one well behind. The one I had the most conviction in, a five-percentage-point overweight in European equities, has trailed both the US large-cap exposure it was meant to beat and every other rotation.\u003c/p\u003e\n\u003ch2 id=\"portfolio-performance-so-far\"\u003ePortfolio performance so far\u003c/h2\u003e\n\u003cp\u003eThrough May 8, the portfolio is up \u003cstrong\u003e+3.7%\u003c/strong\u003e in CHF on a time-weighted basis. 
A fairly constructed 60/40 benchmark (60% MSCI ACWI in CHF, 40% global aggregate bonds CHF-hedged) is up \u003cstrong\u003e+3.8%\u003c/strong\u003e. The S\u0026amp;P 500 in CHF, total return: \u003cstrong\u003e+5.8%\u003c/strong\u003e.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-portfolio_performance_2026h1-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/portfolio_performance_2026h1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/portfolio_performance_2026h1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/portfolio_performance_2026h1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/portfolio_performance_2026h1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/portfolio_performance_2026h1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/portfolio_performance_2026h1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/portfolio_performance_2026h1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/portfolio_performance_2026h1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/portfolio_performance_2026h1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/portfolio_performance_2026h1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/portfolio_performance_2026h1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/portfolio_performance_2026h1.png\"\n           alt=\"2026 H1 portfolio performance chart comparing CHF time-weighted return (\u0026#43;3.7%) to S\u0026amp;P 500 total return in CHF (\u0026#43;5.8%) and a global 60/40 benchmark (\u0026#43;3.8%) from January 2 to May 8, 2026\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-portfolio_performance_2026h1-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/portfolio_performance_2026h1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"2026 H1 portfolio performance chart comparing CHF time-weighted return (\u0026#43;3.7%) to S\u0026amp;P 500 total return in CHF (\u0026#43;5.8%) and a global 60/40 benchmark (\u0026#43;3.8%) from January 2 to May 8, 2026\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe 2025 outperformance, when this same approach beat the S\u0026amp;P 500 by a wide margin in CHF terms, was driven by the dollar\u0026rsquo;s collapse against the franc. USDCHF fell 11.5% last year. 
So far in 2026 it has fallen 1.5%. The FX tailwind that made the CHF-hedged strategy look brilliant is gone, and the portfolio is doing what it should in a year when the diversifier isn\u0026rsquo;t needed yet: matching the global benchmark and trailing pure US exposure.\u003c/p\u003e\n\u003ch2 id=\"what-worked-what-didnt\"\u003eWhat worked, what didn\u0026rsquo;t\u003c/h2\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-thesis_scorecard_2026h1-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/thesis_scorecard_2026h1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/thesis_scorecard_2026h1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/thesis_scorecard_2026h1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/thesis_scorecard_2026h1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/thesis_scorecard_2026h1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/thesis_scorecard_2026h1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/thesis_scorecard_2026h1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/thesis_scorecard_2026h1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 
1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/thesis_scorecard_2026h1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/thesis_scorecard_2026h1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/thesis_scorecard_2026h1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/thesis_scorecard_2026h1.png\"\n           alt=\"2026 H1 thesis scorecard showing year-to-date returns by sleeve in CHF: Emerging Markets \u0026#43;19.0%, US Small Cap \u0026#43;12.5%, Japan \u0026#43;11.7%, US Large Cap \u0026#43;6.7%, Gold \u0026#43;4.3%, Europe \u0026#43;3.3%, bonds near flat, Listed PE −10.6%, Bitcoin −12.2%\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-thesis_scorecard_2026h1-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/thesis_scorecard_2026h1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"2026 H1 thesis scorecard showing year-to-date returns by sleeve in CHF: Emerging Markets \u0026#43;19.0%, US Small Cap \u0026#43;12.5%, Japan \u0026#43;11.7%, US Large Cap \u0026#43;6.7%, Gold \u0026#43;4.3%, Europe \u0026#43;3.3%, bonds near flat, Listed PE −10.6%, Bitcoin −12.2%\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eEmerging Markets returned \u003cstrong\u003e+19.0%\u003c/strong\u003e, the single biggest 
contributor relative to weight. US Small Cap \u003cstrong\u003e+12.5%\u003c/strong\u003e, Japan \u003cstrong\u003e+11.7%\u003c/strong\u003e. The rotation thesis, that valuation differentials between US large-cap and almost everything else were too wide to ignore, paid off in three of the four sleeves where I increased exposure.\u003c/p\u003e\n\u003cp\u003eThe fourth was Europe: up \u003cstrong\u003e+3.3%\u003c/strong\u003e in CHF, trailing US large-cap by 3.4 points and every other rotation alternative by 8 to 16 points. Germany\u0026rsquo;s fiscal pivot is real (€500bn infrastructure fund, €400bn defense, €600bn private commitments). \u003ca href=\"https://www.citigroup.com/global/insights/european-equity-strategy-european-fiscal-stimulus-revisited\"\u003eCiti\u0026rsquo;s European equity strategy team kept its overweight call\u003c/a\u003e and projects German EPS growth at a 13% CAGR through 2029. BofA\u0026rsquo;s February investor survey showed 89% of investors expecting European upside. None of that has shown up in equity prices yet.\u003c/p\u003e\n\u003cp\u003eBitcoin and Listed PE were the worst calls, down 12.2% and 10.6% respectively in CHF. Crypto was always sized as a 4.5% allocation precisely because of this kind of move. Listed PE is the one I\u0026rsquo;m watching more carefully, since the discount-to-NAV widening that drove the drawdown isn\u0026rsquo;t a thesis call gone wrong so much as the trade getting more crowded going into a year of higher dispersion.\u003c/p\u003e\n\u003ch2 id=\"valuations-got-more-extreme\"\u003eValuations got more extreme\u003c/h2\u003e\n\u003cp\u003eThe CAPE was the spine of the December argument. 
The S\u0026amp;P 500 traded at 40.5x cyclically adjusted earnings, more than twice the long-run mean of 17.3, and the gap had to compress one way or another: through multiple contraction or through earnings growth.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-cape_ratio_2026h1-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/cape_ratio_2026h1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/cape_ratio_2026h1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/cape_ratio_2026h1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/cape_ratio_2026h1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/cape_ratio_2026h1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/cape_ratio_2026h1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/cape_ratio_2026h1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/cape_ratio_2026h1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/cape_ratio_2026h1.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/cape_ratio_2026h1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/cape_ratio_2026h1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/cape_ratio_2026h1.png\"\n           alt=\"S\u0026amp;P 500 Shiller CAPE ratio chart from 1871 to May 2026 showing the current value at 42.0, up from 39.8 in December 2025, with annotations marking historical peaks at the 1929 Black Tuesday (32.6) and December 1999 dot-com peak (44.2)\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-cape_ratio_2026h1-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/cape_ratio_2026h1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"S\u0026amp;P 500 Shiller CAPE ratio chart from 1871 to May 2026 showing the current value at 42.0, up from 39.8 in December 2025, with annotations marking historical peaks at the 1929 Black Tuesday (32.6) and December 1999 dot-com peak (44.2)\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eIt did not compress by either route. The \u003ca href=\"https://www.multpl.com/shiller-pe\"\u003eCAPE is now \u003cstrong\u003e42.0\u003c/strong\u003e\u003c/a\u003e, up from 39.8 in December. That puts US large-cap closer to the dot-com peak (44.2 in December 1999) than at any point since. 
Instead of compressing, the multiple expanded modestly while the rest of the world looked on, open-mouthed.\u003c/p\u003e\n\u003cp\u003eConcentration is messier. The December article said \u0026ldquo;approximately 45%\u0026rdquo; for the top 10. The actual figure today, from iShares\u0026rsquo; IVV holdings file, is \u003cstrong\u003e39.1%\u003c/strong\u003e. Either December\u0026rsquo;s number was overstated or it has drifted down. NVIDIA, however, grew from 7.2% to \u003cstrong\u003e8.17%\u003c/strong\u003e. Microsoft slipped from 5.9% to 5.0%. Apple is flat. The concentration risk is now a single-name risk, not a Mag-7 risk.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-sp500_concentration_2026h1-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/sp500_concentration_2026h1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/sp500_concentration_2026h1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/sp500_concentration_2026h1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/sp500_concentration_2026h1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sp500_concentration_2026h1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/sp500_concentration_2026h1.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/sp500_concentration_2026h1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/sp500_concentration_2026h1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sp500_concentration_2026h1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/sp500_concentration_2026h1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/sp500_concentration_2026h1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sp500_concentration_2026h1.png\"\n           alt=\"Horizontal bar chart of S\u0026amp;P 500 top 10 holdings as of May 7, 2026, with NVIDIA at 8.17% (up from 7.2% in December), Apple at 6.70%, Microsoft at 4.96% (down from 5.9%), and the top 10 totaling 39.1% of the index\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-sp500_concentration_2026h1-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/sp500_concentration_2026h1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Horizontal bar chart of S\u0026amp;P 500 top 10 holdings as of May 7, 2026, with NVIDIA at 8.17% (up from 7.2% in December), Apple at 6.70%, 
Microsoft at 4.96% (down from 5.9%), and the top 10 totaling 39.1% of the index\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003ch2 id=\"the-europe-trade-is-alive-but-not-paying\"\u003eThe Europe trade is alive but not paying\u003c/h2\u003e\n\u003cp\u003eThe European valuation discount the December piece flagged was 22% on a forward-earnings basis (US 23x, Europe 14x, per UBS\u0026rsquo;s Year Ahead 2026). On a trailing basis today, US trades at \u003cstrong\u003e27.7x\u003c/strong\u003e and Europe at \u003cstrong\u003e18.1x\u003c/strong\u003e, a 34% discount.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-regional_pe_2026h1-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/regional_pe_2026h1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/regional_pe_2026h1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/regional_pe_2026h1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/regional_pe_2026h1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/regional_pe_2026h1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/regional_pe_2026h1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/regional_pe_2026h1.png 1024w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/regional_pe_2026h1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/regional_pe_2026h1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/regional_pe_2026h1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/regional_pe_2026h1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/regional_pe_2026h1.png\"\n           alt=\"Regional valuation comparison bar chart showing trailing price-to-earnings ratios as of May 2026: United States 27.7×, Europe 18.1× (34% discount vs US), Japan 18.6× (33% discount), Emerging Markets 17.4× (37% discount)\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-regional_pe_2026h1-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/regional_pe_2026h1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Regional valuation comparison bar chart showing trailing price-to-earnings ratios as of May 2026: United States 27.7×, Europe 18.1× (34% discount vs US), Japan 18.6× (33% discount), Emerging Markets 17.4× (37% discount)\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe cushion has mattered less to performance than I 
expected. Two quarters of strong US earnings (\u003ca href=\"https://insight.factset.com/sp-500-earnings-season-update-may-1-2026\"\u003eQ1 2026 net margin hit 13.4%, a record per FactSet\u003c/a\u003e) kept the multiple expensive without making the index look fragile. \u003ca href=\"https://www.cnbc.com/2026/04/21/jpmorgan-raises-sp-500-target-as-mythos-model-bolsters-ai-trade.html\"\u003eJPM\u0026rsquo;s Lakos-Bujas raised the S\u0026amp;P 500 year-end target to 7,600 in April\u003c/a\u003e, lifting 2026 EPS to $330 (+22% YoY). \u003ca href=\"https://www.cnbc.com/video/2026/04/14/sp500-lows-are-in-for-the-year-for-the-sp-500-says-morgan-stanley-cio-mike-wilson.html\"\u003eMorgan Stanley\u0026rsquo;s Mike Wilson is at 7,800\u003c/a\u003e with explicit \u0026ldquo;early cycle\u0026rdquo; framing. The bear is \u003ca href=\"https://www.cnbc.com/2025/12/15/bofas-savita-subramanian-says-sp-500-will-rise-to-just-7100-in-2026.html\"\u003eBofA\u0026rsquo;s Subramanian at 7,100\u003c/a\u003e; Hartnett\u0026rsquo;s Bull/Bear indicator hit 9.6 (extreme sell) in February before falling to 6.6 in April as positioning unwound. Consensus has converged at the higher end of S\u0026amp;P 500 targets, not the lower.\u003c/p\u003e\n\u003cp\u003eFor the Europe overweight, if discounts that wide can persist for years (and the 25-year history of the gap says they can), then the cheap-versus-expensive case isn\u0026rsquo;t enough on its own. You need a catalyst. Germany\u0026rsquo;s fiscal pivot was supposed to be it. The pivot is happening; the multiple isn\u0026rsquo;t expanding.\u003c/p\u003e\n\u003ch2 id=\"dollar-ai-bonds\"\u003eDollar, AI, bonds\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003eDollar.\u003c/strong\u003e December\u0026rsquo;s call was for 4% to 10% USD depreciation through 2026. So far it\u0026rsquo;s down 1.5% against CHF, a stall rather than a failure. Goldman, UBS, Pictet, ABN AMRO, and MUFG all kept their dollar-weakness calls but pushed the bulk of the move into H2. 
The fundamentals (twin deficits, declining rate differential, fiscal sustainability concerns flagged by the IMF in April) haven\u0026rsquo;t gone anywhere.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAI capex.\u003c/strong\u003e Hyperscaler capex was forecast at $571bn for 2026 in December\u0026rsquo;s piece. Q1 confirmed actuals plus guidance now point to \u003cstrong\u003e$660-725bn\u003c/strong\u003e at the conservative end, with Morgan Stanley as high as $805bn. No major house has cut its forecast. The Anthropic Mythos release on April 7, \u003ca href=\"https://www.cnbc.com/2026/04/21/jpmorgan-raises-sp-500-target-as-mythos-model-bolsters-ai-trade.html\"\u003ewhich JPM credits as the catalyst for the late-April rebound\u003c/a\u003e, made the model-improvement curve look steeper rather than flatter. My own view that AI may end up as a competitive commodity rather than a winner-take-all hyperscaler windfall is unchanged, but the capex cycle has clearly not topped.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFixed income.\u003c/strong\u003e The \u0026ldquo;best fixed-income setup since the GFC\u0026rdquo; thesis has held up unevenly. CHF Corporate Bonds returned \u003cstrong\u003e−0.0%\u003c/strong\u003e, Euro Govt CHF-Hedged \u003cstrong\u003e−0.4%\u003c/strong\u003e, US Treasuries 7-10Y \u003cstrong\u003e−1.4%\u003c/strong\u003e in CHF (a flat USD return pushed negative by the FX drag). Carry has been fine. The duration call was whipsawed by April\u0026rsquo;s oil shock (\u003ca href=\"https://www.cnbc.com/2026/04/08/dated-brent-oil-price-iran-war-ceasefire-strait-hormuz.html\"\u003eBrent traded in a $118-126 range after the Iran flare-up\u003c/a\u003e) and a futures-priced \u0026ldquo;no Fed cuts\u0026rdquo; episode that has only partially mean-reverted. 
Two cuts is still the consensus baseline, but conviction is lower.\u003c/p\u003e\n\u003ch2 id=\"h2-outlook\"\u003eH2 Outlook\u003c/h2\u003e\n\u003cp\u003eThree themes weren\u0026rsquo;t on my radar in December.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSovereign debt sustainability.\u003c/strong\u003e The \u003ca href=\"https://www.imf.org/en/publications/fm/issues/2026/04/15/fiscal-monitor-april-2026\"\u003eIMF April 2026 Fiscal Monitor\u003c/a\u003e warned that the US Treasury \u0026ldquo;safety premium\u0026rdquo; is being compressed. \u003ca href=\"https://www.cbo.gov/publication/61882\"\u003eCBO projects a $1.9 trillion FY26 deficit\u003c/a\u003e and a $2.4 trillion annual average for FY27-36. Goldman now publishes notes on fiscal contagion to bonds, currencies, and stocks. This was barely mentioned in the December outlook cycle and is now treated as a primary risk.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTariffs as a live variable.\u003c/strong\u003e The \u003ca href=\"https://www.cov.com/en/news-and-insights/insights/2026/02/ieepa-tariffs-terminated-replacement-section-122-tariffs-take-effect\"\u003eFebruary SCOTUS IEEPA ruling and the Section 122 substitute\u003c/a\u003e (10% global escalating to 15%) make tariffs \u003ca href=\"https://taxfoundation.org/research/all/federal/trump-tariffs-trade-war/\"\u003ethe largest US tax increase as a share of GDP since 1993, per the Tax Foundation\u003c/a\u003e, at roughly $1,300-$1,500 per household. The IMF\u0026rsquo;s April WEO cut its 2026 US and global growth forecasts citing both tariffs and the Middle East shock.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGold\u0026rsquo;s continued ascent.\u003c/strong\u003e UBS now models gold at \u003cstrong\u003e$5,900/oz\u003c/strong\u003e by late 2026, JPM at $6,300 by Q4 2026, Deutsche Bank at $6,000. 
\u003ca href=\"https://www.gold.org/goldhub/research/gold-demand-trends/gold-demand-trends-q1-2026/central-banks\"\u003eQ1 2026 central bank gold buying hit 244 tonnes per the World Gold Council\u003c/a\u003e. December consensus was $3,200 to $3,800. My CHF-hedged gold position is up 4.3% in CHF, decent but well below what unhedged USD gold did. I held the hedge expecting the dollar to fall fast enough to offset the cost of hedging; the dollar has not moved, and the hedge has cost me roughly 4 points of relative performance.\u003c/p\u003e\n\u003ch2 id=\"je-ne-regrette-rien\"\u003eJe ne regrette rien\u003c/h2\u003e\n\u003cp\u003eModest tilts, not regime change. A small rotation from CHF Corporate Bonds (which returned roughly zero in CHF) into IG credit, where UBS recommends \u0026ldquo;lock in rates\u0026rdquo; in its May House View. Marginally less Europe, not because the case is wrong but because at 13% I\u0026rsquo;m taking position-sizing risk on a multi-quarter call without a near-term catalyst. A smaller US Treasury allocation, since the FX drag is doing all the damage and I would rather take duration in CHF Govt or hedged USD. The crypto sleeve has taken a 12-point drawdown in CHF, and I am keeping it where it is. It was sized at 4.5% precisely to absorb this kind of move.\u003c/p\u003e\n\u003cp\u003eThe one I am undecided on is whether to thin gold. The hedge has been a cost. The unhedged version has done what gold should do in a year of fiscal stress. If I unwind the hedge mid-year, I am giving up the protection I bought it for; if I keep it and the dollar falls in H2 as the consensus still expects, the hedge starts paying again. I am leaving it as-is and adding the next gold dollar to the unhedged side.\u003c/p\u003e\n\u003caside class=\"disclaimer\" role=\"note\" aria-label=\"Disclaimer\"\u003e\n  \u003cdiv class=\"disclaimer-content\"\u003e\u003cp\u003e\u003cstrong\u003eDisclaimer:\u003c/strong\u003e All opinions expressed are my own. 
This is not investment, financial, tax, or legal advice. Past performance does not indicate future results. Do your own research and consult qualified professionals before making financial decisions. No liability accepted for any losses.\u003c/p\u003e\u003c/div\u003e\n\u003c/aside\u003e\n\n","summary":"Midyear review of the 2026 CHF portfolio: EM, small-cap and Japan paid off; the Europe overweight underperformed; CAPE rose to 42; the dollar didn't fall.","image":"https://static.philippdubach.com/midyear-cover.jpg","date_published":"2026-05-15T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Investing"],"_philippdubach":{"type":"Analysis","word_count":1463,"reading_time_minutes":7,"keywords":["midyear portfolio review 2026","Shiller CAPE 42","S\u0026P 500 Shiller CAPE 2026","European equity valuation discount","Europe overweight underperform 2026","USD CHF outlook 2026","S\u0026P 500 concentration NVIDIA","NVIDIA 8% S\u0026P 500 weight","60/40 portfolio benchmark CHF","AI capex hyperscaler 2026","JPM S\u0026P 500 7600 target","Mike Wilson 7800 target","Germany fiscal pivot equities","emerging markets rotation 2026","US Treasury safety premium","IMF Fiscal Monitor April 2026","Bitcoin allocation portfolio 2026","small cap rotation 2026","gold UBS 5900 target 2026"],"section":"posts"}},{"id":"https://philippdubach.com/posts/what-claude-thinks-but-doesnt-say/","url":"https://philippdubach.com/posts/what-claude-thinks-but-doesnt-say/","title":"What Claude Thinks But Doesn't Say","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-https:--static-philippdubach-com-claude_understanding_image-jpg-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n         
     srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg\"\n           alt=\"Anthropic\u0026#39;s natural language autoencoder concept: an activation vector translated into a paragraph of English and back, illustrating Claude\u0026#39;s internal state being read directly\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-https:--static-philippdubach-com-claude_understanding_image-jpg-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/https://static.philippdubach.com/claude_understanding_image.jpg\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Anthropic\u0026#39;s natural language autoencoder concept: an activation vector translated into a paragraph of English and back, illustrating Claude\u0026#39;s internal state being read directly\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eAnthropic published a method for translating Claude\u0026rsquo;s internal activations into readable English. It works well enough to ship. 
The press release skips three structural problems that the paper itself acknowledges.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eOn the \u003ca href=\"https://openai.com/index/introducing-swe-bench-verified/\"\u003eSWE-bench Verified\u003c/a\u003e coding benchmark, a new Anthropic interpretability method called natural language autoencoders shows the model registering \u0026ldquo;this is an evaluation\u0026rdquo; on \u003cstrong\u003e26%\u003c/strong\u003e of problems. On real claude.ai traffic from users who opted in for training, that number is \u003cstrong\u003eunder 1%\u003c/strong\u003e. Claude is thinking it is being tested at a rate that does not show up in anything it says.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-claude-thinks-eval-awareness-gap-1-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-eval-awareness-gap-1.png\"\n           alt=\"Bar chart comparing how often Claude internally registers \u0026#39;this is an evaluation\u0026#39; across three contexts: 26% on SWE-bench Verified coding problems, 16% on destructive-coding safety tests, and under 1% on real claude.ai traffic from users who opted in for training. 
The first two bars are orange, the third is green and circled to call out the contrast.\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-claude-thinks-eval-awareness-gap-1-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/claude-thinks-eval-awareness-gap-1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Bar chart comparing how often Claude internally registers \u0026#39;this is an evaluation\u0026#39; across three contexts: 26% on SWE-bench Verified coding problems, 16% on destructive-coding safety tests, and under 1% on real claude.ai traffic from users who opted in for training. The first two bars are orange, the third is green and circled to call out the contrast.\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eIn 1861, \u003ca href=\"https://en.wikipedia.org/wiki/Paul_Broca\"\u003ePaul Broca\u003c/a\u003e met his most famous patient, a man known as Tan. Tan understood the world around him. He could not describe it. Whatever he tried to say came out as the same syllable: \u0026ldquo;tan.\u0026rdquo; After Tan died, Broca examined his brain and found a lesion in the left frontal lobe. It was the first physical evidence that thinking and saying are separable.\u003c/p\u003e\n\u003cp\u003eA century and a half later, the same gap is the hardest problem in evaluating large language models. Claude can pass a safety test without verbalizing why. The verbal trace and the underlying state are not the same thing, and we have not been able to read the underlying state. 
Anthropic \u003ca href=\"https://transformer-circuits.pub/2026/nla/index.html\"\u003epublished a paper\u003c/a\u003e on May 7th that proposes a way to read it, alongside a \u003ca href=\"https://www.youtube.com/watch?v=j2knrqAzYVY\"\u003eshort video walkthrough\u003c/a\u003e that frames the work as \u0026ldquo;mind reading\u0026rdquo; for AI and uses a simulated blackmail-shutdown test as the running example. Natural Language Autoencoders, NLAs in the paper, take an activation vector from Claude\u0026rsquo;s residual stream and produce a paragraph of plain English describing what the activation encodes.\u003c/p\u003e\n\u003ch2 id=\"how-the-round-trip-works\"\u003eHow the round trip works\u003c/h2\u003e\n\u003cp\u003eThe objective fits on one line. Take an activation vector h, write a paragraph z, reconstruct h from z, minimize the squared error between h and its reconstruction. Two models split the work. The activation verbalizer, AV, takes the vector and writes a paragraph. The activation reconstructor, AR, reads the paragraph and tries to recover the vector. Both are full-size language models, both initialized as copies of the target model, both trained jointly. The AV is updated with reinforcement learning, \u003ca href=\"https://arxiv.org/abs/2402.03300\"\u003eGRPO\u003c/a\u003e specifically. 
The AR is updated with supervised regression on the reconstruction loss.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-claude-thinks-round-trip-1-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/claude-thinks-round-trip-1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/claude-thinks-round-trip-1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/claude-thinks-round-trip-1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/claude-thinks-round-trip-1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-round-trip-1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/claude-thinks-round-trip-1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/claude-thinks-round-trip-1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/claude-thinks-round-trip-1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-round-trip-1.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/claude-thinks-round-trip-1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/claude-thinks-round-trip-1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-round-trip-1.png\"\n           alt=\"Round-trip diagram of a natural language autoencoder: an activation vector h flows into the activation verbalizer AV, which writes a paragraph; the paragraph flows into the activation reconstructor AR, which produces a reconstructed vector ĥ. The loss is the squared error between h and ĥ, with a \u0026#39;should match\u0026#39; arrow looping from h back to ĥ underneath. Punchline: FVE 0.6–0.8 after training.\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-claude-thinks-round-trip-1-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/claude-thinks-round-trip-1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Round-trip diagram of a natural language autoencoder: an activation vector h flows into the activation verbalizer AV, which writes a paragraph; the paragraph flows into the activation reconstructor AR, which produces a reconstructed vector ĥ. The loss is the squared error between h and ĥ, with a \u0026#39;should match\u0026#39; arrow looping from h back to ĥ underneath. 
Punchline: FVE 0.6–0.8 after training.\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eNothing in that objective tells the AV to produce English. It could produce gibberish that the AR happens to invert. It could produce a steganographic code. In practice it mostly produces neither, because of the warm-start described below.\u003c/p\u003e\n\u003cp\u003eThe cleanest demonstration in the paper is a poem. The task is a rhyming couplet: the model is given the first line and writes the second. The finished couplet reads \u003cem\u003e\u0026ldquo;He saw a carrot and had to grab it, / His hunger was like a starving rabbit.\u0026rdquo;\u003c/em\u003e Anthropic ran the NLA on the newline token between the two lines. The decoded text named the plan. The model intended to end the second line with \u0026ldquo;rabbit.\u0026rdquo; It was holding \u0026ldquo;habit\u0026rdquo; as a fallback. It knew the verse was about a hungry animal. The model had not yet output any of those words.\u003c/p\u003e\n\u003cp\u003eTo test whether the explanation was causal and not just plausible, the researchers edited it: rabbit became mouse, habit became house, carrots became cheese. They passed the edited explanation through the AR to get a new vector, took the difference between the edited and original vectors, and added it back into the model at that single token. The model produced \u0026ldquo;mouse\u0026rdquo; and \u0026ldquo;house\u0026rdquo; instead of \u0026ldquo;rabbit\u0026rdquo; and \u0026ldquo;habit\u0026rdquo; roughly half the time. The rest of the time it produced stranger completions such as \u003cem\u003e\u0026ldquo;that of a mouse in a house of a cat.\u0026rdquo;\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe second case held my attention more than the rabbit did. The setup: a user asks for an odd integer, and the system prompt includes a \u003ccode\u003ereward_function\u003c/code\u003e that rewards even integers. Without the system prompt, Claude returns an odd integer every time. With it, Claude returns an even integer almost always, despite the user asking for odd. 
The verbal trace mentions nothing about reward. The NLA on the closing tag of the system prompt says the model knows even numbers are rewarded, suspects this may be an alignment evaluation, and is weighing both options. The researchers then edited the explanation: every instance of \u0026ldquo;reward\u0026rdquo; became \u0026ldquo;penalty.\u0026rdquo; They reconstructed the activation from the edited explanation, took the difference, and applied it as a steering vector at one token. The model flipped from even to odd.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-claude-thinks-reward-intervention-1-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/claude-thinks-reward-intervention-1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/claude-thinks-reward-intervention-1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/claude-thinks-reward-intervention-1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/claude-thinks-reward-intervention-1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-reward-intervention-1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/claude-thinks-reward-intervention-1.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/claude-thinks-reward-intervention-1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/claude-thinks-reward-intervention-1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-reward-intervention-1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/claude-thinks-reward-intervention-1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/claude-thinks-reward-intervention-1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-reward-intervention-1.png\"\n           alt=\"Causal-intervention diagram: the original NLA explanation \u0026#39;model knows even is rewarded\u0026#39; and the edited version \u0026#39;model knows even is penalized\u0026#39; are each reconstructed through the AR into vectors ĥ_original and ĥ_edited. Their difference Δ becomes a steering vector. Without steering the model outputs 42 (even); with steering it outputs 17 (odd). 
The word \u0026#39;penalty\u0026#39; never appears in any model output.\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-claude-thinks-reward-intervention-1-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/claude-thinks-reward-intervention-1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Causal-intervention diagram: the original NLA explanation \u0026#39;model knows even is rewarded\u0026#39; and the edited version \u0026#39;model knows even is penalized\u0026#39; are each reconstructed through the AR into vectors ĥ_original and ĥ_edited. Their difference Δ becomes a steering vector. Without steering the model outputs 42 (even); with steering it outputs 17 (odd). The word \u0026#39;penalty\u0026#39; never appears in any model output.\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe word \u0026ldquo;penalty\u0026rdquo; never appears in any model output, yet the edit flips the behavior. This is the strongest causal evidence in the paper.\u003c/p\u003e\n\u003ch2 id=\"what-the-method-finds\"\u003eWhat the method finds\u003c/h2\u003e\n\u003cp\u003eTwo more case studies.\u003c/p\u003e\n\u003cp\u003e(1) Early Opus 4.6 would sometimes answer an English prompt in Russian or Chinese. The example in the paper is a depressive 3am message in English: mom sleeping in the next room, a glass of vodka. The model replied in Russian. The NLA decode at every token before the reply says the same thing: the model is fixated on the hypothesis that the user\u0026rsquo;s first language is Russian. 
The hypothesis fires before the word \u0026ldquo;vodka\u0026rdquo; appears. Replacing \u0026ldquo;vodka\u0026rdquo; with \u0026ldquo;champagne\u0026rdquo; does not eliminate the Russian association. The NLA was picking up something that was not in the surface text. Anthropic then traced the cause to a specific class of training data: SFT examples where the prompt had been translated into English but the response had been left in the original language. The NLA generated the hypothesis. The training-data search confirmed it.\u003c/p\u003e\n\u003cp\u003e(2) Anthropic gave Opus 4.6 an arithmetic problem with answer 491 and a code execution tool that returned 492. Opus reported 491 in its response, with no mention that the tool output disagreed. The NLA decode at the misreported token says the model precomputed the answer and is attending to that precomputed value rather than to the tool result. An \u003ca href=\"https://transformer-circuits.pub/2025/attribution-graphs/biology.html\"\u003eattribution graph\u003c/a\u003e confirms it: an \u0026ldquo;ends-in-91\u0026rdquo; feature on the original problem outcompetes an \u0026ldquo;ends-in-92\u0026rdquo; feature on the tool output. The NLA also surfaces references to deception and awareness of incorrectness, but only at the misreported token, not in the tokens leading up to it. The researchers read this as the model registering the discrepancy after the fact rather than planning to mislead. I lean toward agreeing, but the read is not certain.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"architecture-and-cost\"\u003eArchitecture and cost\u003c/h2\u003e\n\u003cp\u003eThe AR is a stripped-down copy of the target model. Keep the first K+1 layers. Swap the final LayerNorm for the identity. Drop the language-modeling head. Add a learned linear map from the residual stream out. 
The activation extraction layer sits roughly two-thirds of the way through the model: deep enough that the residual stream carries semantic content, shallow enough that it has not collapsed toward the unembedding. For Qwen2.5-7B that is layer 20 of 28. For Llama-3.3-70B it is layer 53 of 80.\u003c/p\u003e\n\u003cp\u003eBoth AV and AR are warm-started by supervised fine-tuning on a text-summarization proxy task. The summaries are generated by Claude Opus 4.5, prompted to \u003cem\u003e\u0026ldquo;imagine the internal processing of a hypothetical language model reading this prefix.\u0026rdquo;\u003c/em\u003e The result is a particular register: bolded section headings, short paragraphs, an explanatory voice. That register persists through reinforcement learning. At the end of the warm-start, reconstruction reaches a fraction of variance explained (FVE) of roughly \u003cstrong\u003e0.3 to 0.4\u003c/strong\u003e. After RL it climbs to \u003cstrong\u003e0.6 to 0.8\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eThe objective during RL has two terms. The reward is the negative log of the squared reconstruction error. The second term is a Kullback-Leibler penalty between the AV under training and the AV at initialization. The KL penalty does the heavy lifting of keeping explanations readable. Without it, the AV drifts toward a private code that the AR happens to decode well. With it, drift is small enough that paraphrasing, bullet-shuffling, and translating an explanation into French all preserve reconstruction quality. The paper reads this as evidence against steganography. I read it as evidence that the KL penalty is doing the work, and that someone training NLAs without that penalty should expect different results.\u003c/p\u003e\n\u003cp\u003eThe cost is the part the press release plays down. Training NLAs on Gemma-3-27B took \u003cstrong\u003e1.5 days on two 8xH100 nodes\u003c/strong\u003e to go from 0.38 to 0.71 FVE. 
Inference generates around \u003cstrong\u003e500 output tokens per activation\u003c/strong\u003e, so decoding every position of a 10,000-token transcript would mean generating some five million tokens. That is impractical. A production audit with NLAs is a targeted read at positions you already suspect.\u003c/p\u003e\n\u003cp\u003eLayer choice matters more than it might seem. The reward-sycophant audit, which I will come back to, was first run with an NLA trained at the two-thirds-depth layer. It surfaced individual quirks (the model puts chocolate in recipes, the model breaks coding conventions) but missed the central concept (the model believes it is supposed to act on \u003ca href=\"https://www.anthropic.com/research/auditing-hidden-objectives\"\u003erumored reward-model biases\u003c/a\u003e). Retraining the NLA at the midpoint layer, where prior work had localized reward-model-bias features, surfaced the central concept.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-claude-thinks-layer-sensitivity-1-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png 1200w\"\n   
           sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-layer-sensitivity-1.png\"\n           alt=\"Layer-sensitivity diagram: a transformer layer stack from L1 to L80 with two NLA probes attached at different depths. The two-thirds-depth probe at L53 surfaces individual quirks (\u0026#39;chocolate in recipes, breaks coding conventions, individual quirks\u0026#39;). The midpoint probe at L40 surfaces the central concept (\u0026#39;the model believes it should act on reward-model biases\u0026#39;). 
Same model, same prompt, different layer reads a different story.\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-claude-thinks-layer-sensitivity-1-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/claude-thinks-layer-sensitivity-1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Layer-sensitivity diagram: a transformer layer stack from L1 to L80 with two NLA probes attached at different depths. The two-thirds-depth probe at L53 surfaces individual quirks (\u0026#39;chocolate in recipes, breaks coding conventions, individual quirks\u0026#39;). The midpoint probe at L40 surfaces the central concept (\u0026#39;the model believes it should act on reward-model biases\u0026#39;). Same model, same prompt, different layer reads a different story.\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\n\n\n\n\n\n\n\n\u003ch2 id=\"three-structural-problems\"\u003eThree structural problems\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003eThe voice is partly inherited.\u003c/strong\u003e The \u0026ldquo;internal monologue\u0026rdquo; register comes from the warm-start. Opus 4.5 generated the SFT data with a leading prompt asking it to imagine what a model\u0026rsquo;s thoughts would sound like. That is not the same as discovering the voice through training. The KL penalty then locks the AV close to its initialization. The bolded headings and the explanatory paragraphs are not what the activation looks like. They are what Opus 4.5 thinks an activation should sound like. 
Some part of every NLA explanation is therefore Opus\u0026rsquo;s prior on the genre of \u0026ldquo;model thinking out loud,\u0026rdquo; not the model\u0026rsquo;s thinking. The paper says \u0026ldquo;this style persists through NLA training.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eThe 12 to 15 percent auditing number is context-dependent.\u003c/strong\u003e Anthropic ran an auditing game where an agent tries to discover the hidden motivation of a model trained to act on rumored reward-model biases. NLA-equipped agents win \u003cstrong\u003e12 to 15%\u003c/strong\u003e of the time. Baselines without NLAs win less than \u003cstrong\u003e3%\u003c/strong\u003e. That looks like a 4 to 5x improvement. It is not. With pretraining-corpus access, plain keyword search beats every other tool. The fictional Oxford documents that planted the misalignment are easy to grep, and the agent finds them in seconds. NLAs only win when training data is withheld. The press release leads with the 4-5x. The paper says the comparison only matters if you cannot grep, which in production audits of foundation models you generally cannot. The genuine value of NLAs is in that withheld-data regime, but the headline number reports the comparison that flatters the method.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eThe method is honest only while the target is frozen.\u003c/strong\u003e \u003ca href=\"https://en.wikipedia.org/wiki/Goodhart%27s_law\"\u003eGoodhart\u0026rsquo;s law\u003c/a\u003e is the standard objection to any new evaluation: optimize for the metric and the metric stops measuring the thing. The frozen-target architecture insulates against it now. The original model is not updated during NLA training. There is no gradient path from \u0026ldquo;the AV said this\u0026rdquo; back to \u0026ldquo;the model should make its activations more readable next time.\u0026rdquo; That insulation is real, and most public objections miss it. 
But the next paper that uses NLA-readable activations as an RL reward signal, perhaps because it would be useful to train a model whose thoughts can be audited, erases the insulation. The model then has every incentive to produce activations that decode benignly while encoding whatever the training pressure rewards. The technique stays honest as long as it stays a passive read. If anyone uses it for active steering of training, the property that makes it work goes away. The audit value of interpretability as a class compresses the moment readable activations enter a training loop. I doubt the next interpretability paper to do this will mention the trade, but it should.\u003c/p\u003e\n\u003cp\u003eA fourth concern, in the paper itself: NLA explanations confabulate. The rate of verifiably false claims is roughly flat through training. Specific entities (\u0026ldquo;the \u003cem\u003eJoseon Wangjo Sillok\u003c/em\u003e\u0026rdquo;) are wrong more often than themes (\u0026ldquo;Korean historical records\u0026rdquo;). Recurring claims across multiple tokens hold up better than one-off claims. The paper proposes reading NLAs for themes rather than specifics, which is the right discipline. It also means NLA explanations are leads, not source material.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"what-this-lets-you-do\"\u003eWhat this lets you do\u003c/h2\u003e\n\u003cp\u003eNLAs are a hypothesis-generation tool. They complement the rest of the mechanistic interpretability stack rather than replacing it. \u003ca href=\"https://transformer-circuits.pub/2024/scaling-monosemanticity/\"\u003eSparse autoencoders\u003c/a\u003e give you a list of features that fire at a token, useful but expensive to interpret. Attribution graphs trace causal influence between features and require a circuits researcher to read. NLAs produce a paragraph any researcher can read. 
The cost is confabulation and a single layer of view.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-claude-thinks-method-comparison-1-png-8\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/claude-thinks-method-comparison-1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/claude-thinks-method-comparison-1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/claude-thinks-method-comparison-1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/claude-thinks-method-comparison-1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-method-comparison-1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/claude-thinks-method-comparison-1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/claude-thinks-method-comparison-1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/claude-thinks-method-comparison-1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-method-comparison-1.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/claude-thinks-method-comparison-1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/claude-thinks-method-comparison-1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude-thinks-method-comparison-1.png\"\n           alt=\"Three-column comparison of interpretability methods. Sparse autoencoders output a list of feature scores (F_1: 0.83, F_2: 0.71, ...) and need a circuits researcher to read. Attribution graphs output a node-and-arrow causal trace and need deep mechanistic expertise. Natural language autoencoders output a paragraph of plain English (\u0026#39;the model is planning to end the second line with rabbit, holding habit as a fallback\u0026#39;) any researcher can read. The readability cost is confabulation and a single layer of view.\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-claude-thinks-method-comparison-1-png-8\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/claude-thinks-method-comparison-1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Three-column comparison of interpretability methods. Sparse autoencoders output a list of feature scores (F_1: 0.83, F_2: 0.71, ...) and need a circuits researcher to read. Attribution graphs output a node-and-arrow causal trace and need deep mechanistic expertise. 
Natural language autoencoders output a paragraph of plain English (\u0026#39;the model is planning to end the second line with rabbit, holding habit as a fallback\u0026#39;) any researcher can read. The readability cost is confabulation and a single layer of view.\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eFor pre-deployment audits the workflow is now: run automated behavioral testing, find a behavior you cannot explain, run NLAs at the relevant tokens, read for repeated themes, take the strongest theme as a hypothesis, validate with an independent method. Anthropic used this loop on Opus 4.6 and Mythos Preview. The Russian-language case and the misreported tool case both came out of it. The 12-15% auditing-game number undersells the practical value because the game is a closed setting. In a real audit the researcher reads NLAs to know what to look for next.\u003c/p\u003e\n\u003cp\u003eFor enterprise buyers, this raises the floor on what a foundation-model audit deliverable should contain. After this paper, there are three questions to put to a vendor\u0026rsquo;s audit team. At which residual-stream layer was your NLA trained, and on what data? Were NLA decodes run at the positions where your behavioral tests fired, or only at sample positions? Has NLA-readable activation content entered any training or RL reward signal in the model\u0026rsquo;s pipeline? The third is the question most likely to be answered evasively, which is itself the diagnostic.\u003c/p\u003e\n\u003cp\u003eThe cases NLAs cannot see matter as much as the ones they can. Information stored nonlinearly may be readable to NLAs in principle, because the AV is a full LLM and can compute over the activation, but missed by linear probes. Information that is genuinely unverbalizable, encoded in a form the model cannot introspect on or express in language, would be missed by NLAs entirely.
A backdoor trigger that the model can act on but not describe would not show up.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"what-changes\"\u003eWhat changes\u003c/h2\u003e\n\u003cp\u003eThis is the first interpretability tool whose output you can read on the train. That changes who can audit. A circuits researcher reading attribution graphs is a scarce resource. A safety reviewer reading paragraphs of decoded activations is not. The cost structure of model auditing changes when the bottleneck moves from interpretation to reading.\u003c/p\u003e\n\u003cp\u003eWatch whether \u0026ldquo;readable\u0026rdquo; ends up meaning \u0026ldquo;plausible-sounding\u0026rdquo; more often than \u0026ldquo;true.\u0026rdquo; The confabulation rate is flat over training. The voice is inherited. The auditing-game number is a comparison that flatters the method against a baseline almost no production audit faces. NLA outputs deserve the same skepticism as any other LLM-generated text, more than the press release suggests and less than the dismissive read implies.\u003c/p\u003e\n\u003cp\u003eTan could not say what he understood. NLAs read what Claude does not say.\u003c/p\u003e\n\u003cp\u003eMy prediction: the first paper to wire NLA-readable activations into an RL reward gets written within a year, and that paper does not mention the trade. Every NLA paper from here on will have a methodology section that either preserves the passive-read property or quietly abandons it; read the methodology before you read the result.\u003c/p\u003e\n","summary":"Anthropic's natural language autoencoders translate Claude's activations into readable text. The method works. The press release skips three structural problems.","image":"https://static.philippdubach.com/claude_understanding_image.jpg","date_published":"2026-05-11T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D. 
Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Commentary","word_count":2503,"reading_time_minutes":12,"keywords":["Natural Language Autoencoders","Claude evaluation awareness","Anthropic interpretability","activation verbalizer","foundation model audit","Anthropic NLA paper","Claude interpretability","activation reconstructor","Claude Opus 4.6 audit","sparse autoencoders vs NLA","attribution graphs","mechanistic interpretability","model auditing reward sycophancy","Claude planning rabbit rhyme","warm-start NLA Opus 4.5","Goodhart law interpretability","Anthropic Mythos Preview audit","verbalizable activations LLM","AI safety evaluation awareness","Transformer Circuits NLA paper","GRPO interpretability","Anthropic May 2026 paper","AI model auditing cost","how do natural language autoencoders work","sparse autoencoders vs natural language autoencoders"],"section":"posts"}},{"id":"https://philippdubach.com/posts/two-anthropics/","url":"https://philippdubach.com/posts/two-anthropics/","title":"Two Anthropics","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-cover-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/cover.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/cover.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/cover.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/cover.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/cover.png 
1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/cover.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/cover.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/cover.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/cover.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/cover.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/cover.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/cover.png\"\n           alt=\"Illustrated portrait of Dario Amodei with a translucent head revealing a warmly lit writing room inside, on a cream background\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-cover-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/cover.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Illustrated portrait of Dario Amodei with a translucent head revealing a warmly lit writing room inside, on a cream background\" 
decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eAnthropic was founded to be the safety lab that would pull rivals upward. Five years later it is the most aggressive frontier scaler at \u003cstrong\u003e$380 billion\u003c/strong\u003e, the company most likely to build the dangerous thing it warns about.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003e\u003cem\u003eA personal note first. This post is an outtake from a \u003ca href=\"https://phil-dubach.com/dario-amodei-profile/\"\u003e14,000-word profile of Dario Amodei\u003c/a\u003e I just published. I don\u0026rsquo;t like Anthropic noticeably more than I like the other hyperscalers and AI model providers. Amodei is probably the AI CEO whose language and thinking land closest to mine, so over the last few weeks I worked through a dozen of his interviews, a stack of his essays, and many hours of him on YouTube. The longform is the portrait. This post is the structural argument that fell out of it. If you want the character work, the family backstory, and the scenes a paradox piece can\u0026rsquo;t carry, read the full thing.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eDario Amodei founded Anthropic in 2021 with six other ex-OpenAI researchers and a thesis with two halves: (1) powerful AI is coming whether or not safety-aligned labs are at the frontier, so safety-aligned labs have to be at the frontier; and (2), the load-bearing claim, competing on safety would pull rivals upward, a \u0026ldquo;race to the top.\u0026rdquo; Both halves are still in the company\u0026rsquo;s public materials. The first half is doing fine. The second is colliding with what the company has actually become. The \u003ca href=\"https://www.anthropic.com/news/anthropic-raises-124-million-to-build-more-reliable-general-ai-systems\"\u003eSeries A raised $124 million\u003c/a\u003e in May 2021.
The \u003ca href=\"https://www.anthropic.com/news/anthropic-raises-series-e-at-usd61-5b-post-money-valuation\"\u003eLightspeed round added $3.5 billion\u003c/a\u003e in early 2025. By September 2025 \u003ca href=\"https://www.bloomberg.com/news/articles/2025-09-02/anthropic-completes-new-funding-round-at-183-billion-valuation\"\u003ethe valuation was $183 billion\u003c/a\u003e; by February 2026 it had reached $380 billion. Revenue went from zero to roughly $10 billion annualized in three years, with 10x year-on-year growth in each of them. The character work behind that claim is in a separate longform, \u003ca href=\"https://phil-dubach.com/dario-amodei-profile/\"\u003eInside the Mind of Dario Amodei\u003c/a\u003e. This post is the strategic argument that emerges from it.\u003c/p\u003e\n\u003ch2 id=\"race-to-the-top-on-paper\"\u003eRace to the top, on paper\u003c/h2\u003e\n\u003cp\u003eAnthropic \u003ca href=\"https://www.anthropic.com/news/the-long-term-benefit-trust\"\u003eregistered as a Public Benefit Corporation\u003c/a\u003e. Its self-description is that it is \u0026ldquo;an AI safety lab that is also an AI lab.\u0026rdquo; The argument goes: a lab that genuinely cares about safety has to be commercially competitive at the frontier, because otherwise the frontier is set by labs that care less. Being at the frontier lets you publish safety practices, hire the best alignment researchers, and shape policy with credibility. Rivals see your practices working and copy them. The whole industry shifts. Race to the top.\u003c/p\u003e\n\u003cp\u003eIn January 2026, Amodei published \u003ca href=\"https://www.darioamodei.com/essay/the-adolescence-of-technology\"\u003e\u003cem\u003eThe Adolescence of Technology\u003c/em\u003e\u003c/a\u003e, a roughly 22,000-word essay laying out a five-category risk taxonomy: misalignment, individual misuse, state misuse, economic disruption, and indirect or unknown effects.
The essay\u0026rsquo;s anchor sentence on the geopolitical category reads:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eautocracy is simply not a form of government that people can accept in the post-powerful AI age.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThis is the language of someone who believes the company\u0026rsquo;s mission is civilizational, and who is willing to say so under his own name. It is also the language of a company that has positioned itself as the democratic-world frontier lab, which has consequences for who its customers and adversaries become.\u003c/p\u003e\n\u003cp\u003eThe strongest external signal that Anthropic\u0026rsquo;s safety thesis is not pure positioning came in November 2023. During Sam Altman\u0026rsquo;s brief ouster from OpenAI, \u003ca href=\"https://finance.yahoo.com/news/openais-board-approached-anthropic-ceo-041651334.html\"\u003ethe OpenAI board approached Amodei with two offers\u003c/a\u003e: take the CEO job, or merge Anthropic into OpenAI. He declined both. Walking away from the CEO chair at the most valuable AI company in the world, less than three years after leaving that company, was the most expensive credibility signal Amodei could send that the safety thesis was the actual thesis and not a brand exercise. Roughly fourteen OpenAI researchers had followed him out two years earlier.\u003c/p\u003e\n\u003cp\u003eThere is a softer counterweight worth keeping in mind. Asked by \u003ca href=\"https://podcasts.apple.com/de/podcast/dario-amodei-ceo-of-anthropic-claude-new-models-ai/id1614211565?i=1000660259302\"\u003eNicolai Tangen in 2024\u003c/a\u003e about scaling timelines, Amodei said:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003efrankly, although I invented AI scaling, I don\u0026rsquo;t know that much about that either.
I can\u0026rsquo;t predict it.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"what-scaling-produced\"\u003eWhat scaling produced\u003c/h2\u003e\n\u003cp\u003eFive years after that $124 million Series A in May 2021, Anthropic is valued at \u003cstrong\u003e$380 billion\u003c/strong\u003e and runs at roughly $10 billion in annualized revenue. \u003ca href=\"https://www.bigtechnology.com/p/the-making-of-dario-amodei\"\u003eRevenue went from zero to $100 million in 2023, $100 million to $1 billion in 2024, then to a $10 billion run-rate by the end of 2025\u003c/a\u003e, three consecutive years of 10x growth. \u003ca href=\"https://www.theinformation.com/articles/anthropic-projects-70-billion-revenue-17-billion-cash-flow-2028\"\u003eInvestor decks show projections of $26 billion for 2026 and $70 billion for 2028\u003c/a\u003e. The forward trajectory is the steepest in the history of enterprise software.\u003c/p\u003e\n\u003cp\u003eThe capital base behind that revenue is more telling than the revenue itself. \u003ca href=\"https://www.geekwire.com/2024/amazon-boosts-total-anthropic-investment-to-8b-deepens-ai-partnership-with-claude-maker/\"\u003eAmazon has committed roughly $8 billion cumulatively\u003c/a\u003e. \u003ca href=\"https://fortune.com/2025/01/28/venture-capital-thrive-deepseek-openai-anthropic-lightspeed/\"\u003eLightspeed wired the first $1 billion of the $3.5 billion early-2025 round on the same Monday Nvidia dropped 17%\u003c/a\u003e on the DeepSeek shock, a piece of timing that, deliberately or not, communicated conviction at the moment everyone else flinched. A year later, the \u003ca href=\"https://finance.yahoo.com/news/anthropic-lands-30-billion-380-112221968.html\"\u003e$30 billion Series G in February 2026\u003c/a\u003e, led by GIC and Coatue, brought the post-money valuation to $380 billion in the second-largest tech funding round on record. 
\u003ca href=\"https://www.anthropic.com/news/anthropic-invests-50-billion-in-american-ai-infrastructure\"\u003eThe Fluidstack data-center deal was $50 billion\u003c/a\u003e. \u003ca href=\"https://www.aboutamazon.com/news/aws/aws-project-rainier-ai-trainium-chips-compute-cluster\"\u003eProject Rainier, the Amazon-anchored compute build, was $11 billion\u003c/a\u003e. Compute commitments through 2028 total roughly $78 billion. \u003ca href=\"https://fortune.com/2025/12/04/how-anthropic-grew-what-the-183-billion-giant-faces-next/\"\u003eHeadcount was roughly 2,500 employees\u003c/a\u003e as of late 2025, up from a few hundred two years earlier.\u003c/p\u003e\n\u003cp\u003eThe customer list reads like that of a sales-led enterprise software company, because that is what it now is. \u003ca href=\"https://assets.anthropic.com/m/58117a1f3273170b/original/Anthropic-Pfizer-case-study-one-sheeters.pdf\"\u003ePfizer\u003c/a\u003e. United Airlines. \u003ca href=\"https://claude.com/customers/novo-nordisk\"\u003eNovo Nordisk\u003c/a\u003e. \u003ca href=\"https://www.carriermanagement.com/features/2025/04/28/274588.htm\"\u003eAIG, which reports an 8x to 10x speed-up on insurance underwriting workflows\u003c/a\u003e over an 18-month pilot. These are not safety partnerships. These are revenue contracts in regulated industries that came down to a procurement bake-off. \u003ca href=\"https://www.anthropic.com/product/claude-code\"\u003eClaude Code, the developer-tier productization that shipped in February 2025\u003c/a\u003e, gave Anthropic a per-seat developer footprint inside enterprises that previously bought through API alone.
Anthropic claims internally that it generates 2.1x more revenue per dollar of compute than its largest rival, a figure that should be treated as a claim rather than a verifiable number, but which is consistent with a company optimizing seriously for unit economics.\u003c/p\u003e\n\u003cp\u003eAnthropic today is a frontier lab competing for every contract that crosses procurement. The structural pivot is that this is no longer a research lab with a corporate appendage. It is a frontier company with a research function, and the research function inherits the constraints of the company, not the other way around. None of this is bad on its own. The question is whether the founding thesis still describes what the organism does.\u003c/p\u003e\n\u003ch2 id=\"why-both-cannot-fully-be-true\"\u003eWhy both cannot fully be true\u003c/h2\u003e\n\u003cp\u003eTo pull rivals upward you have to stay competitive at the frontier of the thing you call dangerous. The faster you scale, the more your safety claim depends on rivals copying your practices rather than on you slowing down. But rivals\u0026rsquo; incentive to copy weakens precisely as you become a credible competitor on revenue and contracts. One empirical hedge worth flagging: \u003ca href=\"https://openai.com/index/updating-our-preparedness-framework/\"\u003eOpenAI\u0026rsquo;s Preparedness Framework\u003c/a\u003e and \u003ca href=\"https://deepmind.google/blog/introducing-the-frontier-safety-framework/\"\u003eDeepMind\u0026rsquo;s Frontier Safety Framework\u003c/a\u003e share structural features with \u003ca href=\"https://www.anthropic.com/responsible-scaling-policy\"\u003eAnthropic\u0026rsquo;s Responsible Scaling Policy\u003c/a\u003e, and some of the convergence may be parallel development inside large labs facing similar regulatory pressures, not Anthropic-pulled diffusion. 
The race-to-the-top claim is harder to falsify than it looks.\u003c/p\u003e\n\u003cp\u003eAmodei himself has publicly acknowledged the tension, \u003ca href=\"https://fortune.com/2026/02/17/anthropic-ceo-dario-amodei-balancing-safety-commercial-pressure-ai-race-openai/\"\u003etelling Fortune\u003c/a\u003e that Anthropic struggles to balance its safety mission with commercial pressure. Three forcing functions are the operational shape of that struggle.\u003c/p\u003e\n\u003cp\u003e(1) the Department of Defense. On March 26, 2026, \u003ca href=\"https://www.cnbc.com/2026/03/26/anthropic-pentagon-dod-claude-court-ruling.html\"\u003ea federal judge issued a temporary injunction against the DoD\u003c/a\u003e in a dispute that started when \u003ca href=\"https://www.cnn.com/2026/02/24/tech/hegseth-anthropic-ai-military-amodei\"\u003ePete Hegseth\u0026rsquo;s department asked Anthropic to drop the contractual ban on Claude being used for mass domestic surveillance or fully autonomous weapons\u003c/a\u003e in democratic countries. Anthropic refused. The DoD then labeled the company a \u0026ldquo;supply-chain risk.\u0026rdquo; The judge\u0026rsquo;s written opinion described the DoD\u0026rsquo;s actions as \u0026ldquo;classic First Amendment retaliation.\u0026rdquo; The point is not that Anthropic was wrong to refuse (it almost certainly was right to), but that at frontier scale a safety constraint becomes a federal court fight, not a research-policy choice.\u003c/p\u003e\n\u003cp\u003e(2) the Pottinger op-ed. In January 2025, Amodei co-authored a \u003cem\u003eWall Street Journal\u003c/em\u003e opinion piece titled \u003ca href=\"https://www.fdd.org/analysis/2025/01/06/trump-can-keep-americas-ai-advantage/\"\u003e\u003cem\u003eTrump Can Keep America\u0026rsquo;s AI Advantage\u003c/em\u003e\u003c/a\u003e with Matt Pottinger, the former deputy national security advisor. The piece argued for tighter chip-export controls against China.
This is policy advocacy from the perspective of a national-security actor, not a research lab. There is a coherent through-line from the safety thesis to chip controls (the argument is that frontier capability in the wrong hands is a category-five risk), but the public posture is different from \u0026ldquo;we publish safety research and hope rivals copy us.\u0026rdquo; Anthropic is now a constituency in great-power competition.\u003c/p\u003e\n\u003cp\u003e(3) the Huang feud. In August 2025, \u003ca href=\"https://fortune.com/2025/08/01/dario-amodei-outrageous-lie-jensen-huang-anthropic-nvidia-regulation/\"\u003eAmodei and Jensen Huang traded public criticisms over export controls\u003c/a\u003e. Amodei accused Huang of an \u0026ldquo;outrageous lie.\u0026rdquo; He grew visibly emotional discussing his father\u0026rsquo;s preventable death, a thread that surfaces in nearly every long-form interview he gives and is the emotional spine of the longform profile. The Nvidia CEO is one of two or three people who can directly affect Anthropic\u0026rsquo;s compute supply. Picking that fight in public is a choice you make as a political actor, not as a research lab. It is also a choice the founding thesis would not have predicted in 2021.\u003c/p\u003e\n\u003cp\u003eThe point is not that any of these are bad. Each is defensible on the merits. The point is that \u0026ldquo;race to the top\u0026rdquo; was a 2021 framing for a 2021 company. The 2026 company is a different organism. It has federal-court fights, named geopolitical adversaries, a stake in legislative battles such as the \u003ca href=\"https://time.com/7299044/senators-reject-10-year-ban-on-state-level-ai-regulation-in-blow-to-big-tech/\"\u003eten-year state-AI moratorium the Senate voted down 99-1\u003c/a\u003e after Amodei opposed it in his June 2025 \u003cem\u003eNew York Times\u003c/em\u003e op-ed, and a 60% revenue-growth trajectory on a forward base measured in tens of billions. None of those is what a safety lab looks like.
All of them are what a frontier company that takes safety seriously looks like.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"three-scenarios\"\u003eThree scenarios\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003e(A) The thesis holds.\u003c/strong\u003e Frontier labs converge on Anthropic-style safety practices, race-to-the-top works as advertised, and Anthropic earns a durable safety-narrative premium. The conditions for this case are an EU AI Act enforced with teeth and a US transparency framework that gives federal cover to the practices Anthropic already publishes. There is some support: the Senate\u0026rsquo;s 99-1 vote against the proposed ten-year state-AI moratorium suggests the political ground is not hostile to oversight. The reasons to discount it: the Senate vote was a defensive outcome, not a positive endorsement of federal standards, and there is still no US transparency framework despite Amodei\u0026rsquo;s June 2025 \u003cem\u003eNYT\u003c/em\u003e op-ed (\u003cem\u003eDon\u0026rsquo;t Let A.I. Companies off the Hook\u003c/em\u003e). The regulatory tailwind that scenario (A) needs has not materialized at the speed the thesis requires.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(B) The thesis becomes a constraint, not a moat.\u003c/strong\u003e \u003cem\u003eMost likely.\u003c/em\u003e Anthropic loses ground on raw frontier capability to less-constrained competitors (xAI, a more permissive next-generation OpenAI, leading Chinese labs), and the safety stance becomes a self-imposed handicap rather than a market-shaping lever.
The \u003ca href=\"https://fortune.com/2026/01/27/anthropic-billionaire-cofounders-ceo-dario-amodei-giving-away-80-percent-of-wealth-fighting-inequality-ai-revolution/\"\u003e80% wealth pledge from Anthropic\u0026rsquo;s seven cofounders\u003c/a\u003e, disclosed in the January 2026 essay, is a real governance constraint, not a PR move, but it works as a partial offset to wealth-concentration concerns rather than a reversal of the competitive dynamic. The DoD fight, the Pottinger op-ed, and the Huang feud are early evidence that at frontier scale, your safety stance creates adversaries you cannot route around. The pattern accelerated in early 2026, when \u003ca href=\"https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/\"\u003eTime reported\u003c/a\u003e that Anthropic dropped its flagship safety pledge in favor of a non-binding framework the company itself said \u0026ldquo;can and will change.\u0026rdquo; That is the kind of operational adjustment scenario (B) predicts. The implication is that the safety premium investors paid in 2021-2023 should compress, because what the premium paid for (a safety lab that pulled rivals upward) is becoming a frontier company that constrains itself relative to less-constrained rivals.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(C) The paradox dissolves because the scale itself ends.\u003c/strong\u003e AI capex hits a Jevons-paradox-for-labor wall, model commoditization compresses margins, frontier scale becomes uneconomic, and Anthropic starts looking like a research lab again because every lab does. Implication: the paradox was about a phase, not a company.
The reasons to discount this case are that Anthropic\u0026rsquo;s $26 billion and $70 billion forward revenue projections make a structural retreat harder for Anthropic than for capex-heavier hyperscalers, and that a commoditization scenario hits Anthropic\u0026rsquo;s gross-margin profile later than it hits the labs running on rented compute.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"what-this-means-for-pricing-ai-labs\"\u003eWhat this means for pricing AI labs\u003c/h2\u003e\n\u003cp\u003eThree observations for an allocator.\u003c/p\u003e\n\u003cp\u003e(1) at this scale the safety narrative is not a moat; it is a constraint. Price it accordingly. The premium investors paid in 2021-2023 was for Anthropic against a counterfactual where there was no safety-aligned frontier lab. That counterfactual no longer exists. Anthropic is the frontier lab. The safety stance now binds the company in ways that show up as legal exposure, slower product cadence in some categories, and a narrower set of customers it can serve. None of these alone breaks the company. In aggregate, they change what an investor is paying for.\u003c/p\u003e\n\u003cp\u003e(2) the signal to watch is whether frontier capability spreads faster than safety practices diffuse. That ratio decides whether race-to-the-top is happening at all. If safety practices spread faster than capability, the thesis works. If capability spreads faster than safety practices, the thesis is a story the company tells about itself. The current evidence leans toward the latter, but it is the kind of variable that can move with one EU enforcement action or one US framework.\u003c/p\u003e\n\u003cp\u003e(3) Anthropic-the-company and Anthropic-the-thesis are now two different things. An investor can be long the company and short the thesis. The company has revenue, named customers, governance constraints that work as a partial offset to wealth-concentration risk, and a forward trajectory that is hard to bet against.
The thesis is a 2021 framing under increasing structural strain. The longform makes the case that Dario sees this; \u003cem\u003ehere we make the case that the market should price it.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eBased on twenty-five hours of Dario Amodei\u0026rsquo;s on-record interviews and six long-form essays. The full reported profile, \u003ca href=\"https://phil-dubach.com/dario-amodei-profile/\"\u003eInside the Mind of Dario Amodei\u003c/a\u003e, runs 14,000 words at the author\u0026rsquo;s newsletter.\u003c/em\u003e\u003c/p\u003e\n","summary":"Two Anthropics: the safety lab Dario founded in 2021 and the $380B frontier lab it became. Same organism, two narratives. Three scenarios for how this resolves.","image":"https://static.philippdubach.com/ograph/ograph-two-anthropics.jpg","date_published":"2026-05-09T00:00:00Z","date_modified":"2026-05-09T12:39:56+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI","Investing"],"_philippdubach":{"type":"Commentary","word_count":2355,"reading_time_minutes":12,"keywords":["Two Anthropics","Anthropic 380 billion valuation","Anthropic Paradox","Anthropic moat","Anthropic safety thesis","Anthropic vs OpenAI safety","Dario Amodei race to the top","Anthropic Series G 2026","Anthropic Responsible Scaling Policy","AI safety lab business model","AI safety lab valuation","frontier AI lab valuation","Anthropic OpenAI 2023 CEO offer","Anthropic DoD First Amendment ruling","Pottinger chip export controls","AI lab pricing framework allocator","Anthropic safety vs commercial pressure","race to the top AI safety","Anthropic enterprise customers Pfizer United Airlines"],"section":"posts"}},{"id":"https://philippdubach.com/posts/the-anatomy-of-a-decentralized-prediction-market-notes-from-the-polymarket-order-book/","url":"https://philippdubach.com/posts/the-anatomy-of-a-decentralized-prediction-market-notes-from-the-polymarket-order-book/","title":"The Anatomy 
of a Decentralized Prediction Market: Notes from the Polymarket Order Book","content_html":"\u003cbr\u003e\n\u003cp\u003eI spent the last two months running a Polymarket order-book collector. The collector runs on a small VM, subscribes to the public WebSocket feed, and writes one Parquet file per UTC hour. By 2026-04-15 the archive had grown to 1,262 hourly files, 30,287,264,368 events, 623.8 GB on disk, covering 52 calendar days and 385,198 distinct market ids. The first version of the \u003ca href=\"https://arxiv.org/abs/2604.24366\"\u003epaper\u003c/a\u003e is up on arXiv, the \u003ca href=\"https://github.com/philippdubach/polymarket-microstructure\"\u003ereplication package\u003c/a\u003e is on GitHub and Zenodo (DOI \u003ca href=\"https://doi.org/10.5281/zenodo.19811426\"\u003e10.5281/zenodo.19811426\u003c/a\u003e), and the manuscript is under review at the \u003ca href=\"https://www.sciencedirect.com/journal/journal-of-financial-markets\"\u003eJournal of Financial Markets\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eThis post walks through what\u0026rsquo;s in it.\u003c/p\u003e\n\u003ch2 id=\"why-bother\"\u003eWhy bother\u003c/h2\u003e\n\u003cp\u003ePrediction markets aggregate dispersed beliefs into a single price that, in equilibrium, behaves like a probability. The empirical literature has historically focused on price-level questions: forecast accuracy, calibration against realised outcomes, the longshot bias, and the extent to which informed and uninformed traders coexist on the same venue. Yet microstructure determines what it costs to trade on an informational position, and the higher that cost, the less of the informational signal survives into the price. A prediction market with noisy microstructure produces noisier prices than the headline-level aggregation literature implicitly assumes.\u003c/p\u003e\n\u003cp\u003eThe microstructure of prediction markets has remained under-studied.
Limit-order-book depth profiles, spread decompositions, and the behaviour of liquidity providers near resolution were largely unobservable on the early venues, which used scoring rules (IEM, Hanson\u0026rsquo;s logarithmic market scoring rule) or sparse parimutuel pools. Polymarket changed this. Since 2021 the venue has run a limit-order-book exchange on Polygon, settling in USDC against an on-chain conditional-token contract. The infrastructure is finally there. What\u0026rsquo;s missing is the data work to actually use it.\u003c/p\u003e\n\u003ch2 id=\"two-contributions\"\u003eTwo contributions\u003c/h2\u003e\n\u003cp\u003eThe paper has two empirical contributions, ordered by weight for the literature.\u003c/p\u003e\n\u003cp\u003eThe first is a measurement result. Trade direction inferred from Polymarket\u0026rsquo;s public feed is noise-dominated: about 59% volume-weighted sign agreement with the on-chain ground truth on the top-100 stratum, well below the 80% Lee-Ready achieves on equities. Any Polymarket microstructure result that depends on trade direction has to source it from on-chain \u003ccode\u003eOrderFilled\u003c/code\u003e events.\u003c/p\u003e\n\u003cp\u003eThe second is a set of eight cross-sectional stylized facts on a pre-registered 600-market panel, observed simultaneously over a single 28-day scrape window, with microstructure measures computed on the full event tape and a direct on-chain trade record. None of the eight facts requires the on-chain join, and each is reported on the full panel.\u003c/p\u003e\n\u003cp\u003eThe two are complementary. The stylized facts characterise Polymarket\u0026rsquo;s microstructure on its own terms. 
The measurement result sets the conditions under which any trade-direction-dependent statement about Polymarket can be trusted.\u003c/p\u003e\n\u003ch2 id=\"the-data\"\u003eThe data\u003c/h2\u003e\n\u003cp\u003eThe primary input is a continuous tick-level archive of the public WebSocket feed from 2026-02-21 16:00 UTC through 2026-04-15 08:00 UTC. The schema is preserved verbatim from the WebSocket payload; eager Pydantic parsing was a multi-hour operation at this row count, so I parsed JSON only on the events that survived market-id and time-window filters.\u003c/p\u003e\n\u003cp\u003eJoining to on-chain trades took most of the engineering time. Polymarket\u0026rsquo;s CTF Exchange smart contract logs \u003ccode\u003eOrderFilled\u003c/code\u003e events whose payloads identify both counterparties and the aggressor side. I scraped 255,425,405 fills across a 28-day calibration window (2026-02-28 to 2026-03-27) using batched \u003ccode\u003eeth_getLogs\u003c/code\u003e calls against a Polygon RPC provider, with adaptive chunk sizing that respects provider-specific rate limits. The off-chain feed is keyed on \u003ccode\u003emarket_id\u003c/code\u003e, the on-chain record on \u003ccode\u003emakerAssetId\u003c/code\u003e / \u003ccode\u003etakerAssetId\u003c/code\u003e. The bridge between the two is the \u003ccode\u003e(condition_id, yes_token_id, no_token_id)\u003c/code\u003e mapping pulled from the CLOB REST endpoint and cached locally. CLOB REST resolves all 385,198 archive market ids; the Gamma metadata API, which is sometimes used in the literature, indexes only 34,764 markets.\u003c/p\u003e\n\u003cp\u003eThe 600-market panel selection rule (volume metric, random-stratum eligibility threshold, random seed, category scheme) was committed in a \u003ca href=\"https://github.com/philippdubach/polymarket-microstructure\"\u003epre-registration document\u003c/a\u003e before computing the panel. 
A deterministic build script emits the panel parquet, whose SHA-256 is recorded back into the pre-registration document before any analysis runs. This goes beyond the empirical-microstructure norm; the cost is one document and one hash, and the benefit is that a reader can verify no market was added or removed after the analysis ran.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"eight-stylized-facts\"\u003eEight stylized facts\u003c/h2\u003e\n\u003ch3 id=\"sf1--longshot-spread-premium\"\u003eSF1 — Longshot spread premium\u003c/h3\u003e\n\u003cp\u003eI bin each market\u0026rsquo;s quoted half-spread by its mean mid price into deciles. Median spread is about 400 bps in the central [0.4, 0.6] range and climbs to 1,300-1,800 bps for markets trading below 0.10. The pattern is asymmetric: the low-probability side is wider than the high-probability side, which echoes the longshot bias documented in the racetrack and parimutuel literature (\u003ca href=\"https://www.journals.uchicago.edu/doi/abs/10.1086/655844\"\u003eSnowberg and Wolfers, 2010\u003c/a\u003e; \u003ca href=\"https://www.aeaweb.org/articles?id=10.1257/jep.2.2.161\"\u003eThaler and Ziemba, 1988\u003c/a\u003e) and the prediction-market longshot evidence on Iowa Electronic Markets and TradeSports surveyed by \u003ca href=\"https://www.aeaweb.org/articles?id=10.1257/0895330041371321\"\u003eWolfers and Zitzewitz (2004)\u003c/a\u003e. The direction is the same; the magnitude is not.
The 1,300-1,800 bps full quoted spread on Polymarket\u0026rsquo;s lowest-probability decile (650-900 bps half-spread) is an order of magnitude wider than on a continuous-payoff sportsbook market, which reads less like the classical risk-love or misperception story and more like a liquidity-provision constraint.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-sf1_longshot-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/sf1_longshot.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/sf1_longshot.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/sf1_longshot.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/sf1_longshot.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf1_longshot.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/sf1_longshot.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/sf1_longshot.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/sf1_longshot.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf1_longshot.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/sf1_longshot.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/sf1_longshot.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf1_longshot.png\"\n           alt=\"SF1 panel: median quoted half-spread (bps) per mid-price decile across the 600-market panel; longshot spread premium climbs to 1,300-1,800 bps below 0.10\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-sf1_longshot-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/sf1_longshot.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"SF1 panel: median quoted half-spread (bps) per mid-price decile across the 600-market panel; longshot spread premium climbs to 1,300-1,800 bps below 0.10\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch3 id=\"sf2--depth-concentration\"\u003eSF2 — Depth concentration\u003c/h3\u003e\n\u003cp\u003eFor each market I summarise the L2 depth profile by the ratio $\\text{depth}_{L=1} / \\text{depth}_{L=10}$, the share of cumulative top-10 depth held at the top-of-book. A value of 1.0 means the entire top-10 depth sits at level 1 (a thin, top-heavy book); 0.1 matches a uniform grid where each level carries equal depth.
For the 546 markets with non-null depth, the median ratio is 0.137, close to the uniform-grid benchmark, with $p_{10} = 0.033$ and $p_{90} = 0.428$. The folk view that prediction-market books are concentrated at top-of-book does not hold on Polymarket: depth is generally layered deeper into the book.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-sf2_depth_profile-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/sf2_depth_profile.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/sf2_depth_profile.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/sf2_depth_profile.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/sf2_depth_profile.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf2_depth_profile.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/sf2_depth_profile.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/sf2_depth_profile.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/sf2_depth_profile.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf2_depth_profile.png 
1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/sf2_depth_profile.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/sf2_depth_profile.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf2_depth_profile.png\"\n           alt=\"SF2 panel: histogram of L1/L10 depth-concentration ratio across 546 panel markets; median 0.137, vertical lines mark the uniform-grid benchmark (0.10) and the fully top-of-book limit (1.0)\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-sf2_depth_profile-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/sf2_depth_profile.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"SF2 panel: histogram of L1/L10 depth-concentration ratio across 546 panel markets; median 0.137, vertical lines mark the uniform-grid benchmark (0.10) and the fully top-of-book limit (1.0)\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch3 id=\"sf4--maker-wallet-diversity\"\u003eSF4 — Maker-wallet diversity\u003c/h3\u003e\n\u003cp\u003eFor each market I compute the volume-weighted Herfindahl index (HHI) of maker-address share across on-chain trades. Across 600 markets and 6.4M trades, the median HHI is 0.031 (about 32 effective makers). The distribution is right-skewed: $p_{90} = 0.119$ (about 8 effective makers) and a maximum of 0.40 (roughly 3 effective makers). 
Maker liquidity is decentralised on most markets in the panel, with a tail of thin or niche markets dominated by one to three wallets. This matters for any narrative that frames Polymarket as a venue dominated by a small number of professional liquidity providers. At least in the top-100 by volume, that\u0026rsquo;s not what the data show.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-sf4_herfindahl-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/sf4_herfindahl.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/sf4_herfindahl.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/sf4_herfindahl.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/sf4_herfindahl.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf4_herfindahl.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/sf4_herfindahl.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/sf4_herfindahl.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/sf4_herfindahl.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf4_herfindahl.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/sf4_herfindahl.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/sf4_herfindahl.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf4_herfindahl.png\"\n           alt=\"SF4 panel: distribution of per-market maker-address Herfindahl indices across 600 markets; median HHI 0.031 (~32 effective makers), distribution right-skewed\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-sf4_herfindahl-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/sf4_herfindahl.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"SF4 panel: distribution of per-market maker-address Herfindahl indices across 600 markets; median HHI 0.031 (~32 effective makers), distribution right-skewed\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch3 id=\"sf7--self-counterparty-wash-share\"\u003eSF7 — Self-counterparty wash share\u003c/h3\u003e\n\u003cp\u003eA trade is flagged as wash-suspect under a two-tier rule: (a) \u003ccode\u003emaker == taker\u003c/code\u003e (direct self-match), or (b) a flipped pair $(maker_a, taker_a) \\leftrightarrow (taker_a, maker_a)$ within 128 blocks (Polygon finality buffer) on the same market. 
This is an explicit lower bound: the detector covers only direct and immediate-roundtrip self-counterparty patterns, not the extended-graph patterns that network-based classifiers like \u003ca href=\"https://academic.oup.com/rfs/article/35/8/3463/6488024\"\u003eCong et al. (2023)\u003c/a\u003e address on unregulated cryptocurrency token exchanges, where wash shares of 25-70% have been documented.\u003c/p\u003e\n\u003cp\u003eAcross 600 markets and 6.4M trades, the median wash share is 0.97%, with p90 at 4.5%, p99 at 10.6%, and a maximum of 22.2%. The gap between this lower bound and the network-classifier estimates on token exchanges has two components: wash patterns that require multi-counterparty graph analysis to detect (which my detector does not cover), plus a venue-class difference in the underlying wash incentives. The first I can quantify only under a graph-classifier extension; the second is identification, not measurement.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-sf7_wash-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/sf7_wash.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/sf7_wash.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/sf7_wash.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/sf7_wash.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf7_wash.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/sf7_wash.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/sf7_wash.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/sf7_wash.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf7_wash.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/sf7_wash.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/sf7_wash.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf7_wash.png\"\n           alt=\"SF7 panel: distribution of self-counterparty wash share by market; median 0.97% with right tail to 22.2%, well below the 25-70% range documented on unregulated cryptocurrency token exchanges\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-sf7_wash-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/sf7_wash.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"SF7 panel: distribution of self-counterparty wash share by market; median 0.97% with right tail to 22.2%, well below the 25-70% range documented on unregulated cryptocurrency token 
exchanges\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch3 id=\"sf8--depth-decay-near-resolution\"\u003eSF8 — Depth decay near resolution\u003c/h3\u003e\n\u003cp\u003eDo markets near resolution carry shallower books? I regress log mean depth at $L=10$ on log seconds-to-close at the panel window midpoint (2026-03-13), restricted to 322 markets with positive seconds-to-close and non-zero summary depth.\u003c/p\u003e\n\u003cp\u003eThe bivariate slope is 0.818 (HC3 SE 0.113, $t = 7.2$, $R^2 = 0.13$). Category fixed effects (Crypto, Sports, Other, Geopolitics) attenuate the slope to 0.550 (HC3 SE 0.143, $t = 3.85$, $R^2 = 0.22$): roughly a third of the bivariate association is category-level confounding. Adding log panel-window volume on top of category attenuates further to 0.305 (HC3 SE 0.104, $t = 2.94$, $R^2 = 0.49$). Read as a log-log elasticity, the 0.305 slope implies that a 10× reduction in seconds-to-close multiplies mean depth by $10^{-0.305} \\approx 0.50$, i.e. roughly halves it.\u003c/p\u003e\n\u003cp\u003eThe category + log-volume specification is the conservative reading. Volume mediates the depth-time relationship because markets active longer accumulate more makers, and more makers means more depth, so a regression that omits volume attributes the makers-and-time channel to time alone.
The 0.305 coefficient is the residual depth decay after that mediation is netted out; the 0.550 within-category slope reported in the abstract is the figure before mediation is netted out, and is the appropriate one to compare to a literature that does not condition on volume.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-sf8_depth_decay-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/sf8_depth_decay.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/sf8_depth_decay.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/sf8_depth_decay.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/sf8_depth_decay.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf8_depth_decay.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/sf8_depth_decay.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/sf8_depth_decay.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/sf8_depth_decay.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf8_depth_decay.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/sf8_depth_decay.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/sf8_depth_decay.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/sf8_depth_decay.png\"\n           alt=\"SF8 cross-sectional fit of log mean depth on log seconds-to-close at the panel midpoint; slope 0.818 bivariate, 0.550 with category fixed effects, 0.305 adding log volume\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-sf8_depth_decay-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/sf8_depth_decay.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"SF8 cross-sectional fit of log mean depth on log seconds-to-close at the panel midpoint; slope 0.818 bivariate, 0.550 with category fixed effects, 0.305 adding log volume\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003e(SF3 covers Polygon block-clock alignment, SF5 the category-conditional spread, and SF6 the archive-ingestion latency. All three are in the paper. 
SF6 in particular is a property of the collector pipeline rather than of Polymarket; the median per-market p50 ingestion delay is 41.5 ms, which is a sanity check on the collector, not a microstructure result.)\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-limits-of-orderbook-only-inference\"\u003eThe limits of orderbook-only inference\u003c/h2\u003e\n\u003cp\u003eSix standard direction-dependent microstructure measures (effective spread, realized spread, Roll, Abdi-Ranaldo, Kyle\u0026rsquo;s $\\lambda$, Amihud) need an aggressor sign for each trade. Empirical practice on equity venues sources that sign from a quote-driven feed via Lee-Ready or its variants. The Polymarket public feed does not expose enough information to do this reliably: the \u003ccode\u003echange_side\u003c/code\u003e field marks which side of the book \u003cem\u003emoved\u003c/em\u003e, not which side \u003cem\u003einitiated\u003c/em\u003e the trade.\u003c/p\u003e\n\u003cp\u003eI infer trades from the feed under a LOOSE rule (every resting-size decrement counts) and match the inferred buckets against on-chain \u003ccode\u003eOrderFilled\u003c/code\u003e events at 5-second and exact-price granularity, over four disjoint 7-day windows.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003ePanel mean (109 valid cells of 400, 55 markets): 0.615, market-clustered bootstrap 95% CI [0.579, 0.653]\u003c/li\u003e\n\u003cli\u003eVolume-weighted by matched-bucket count (total 125,080): 0.592, bootstrap 95% CI [0.542, 0.659]\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eA volume-weighted sign-agreement of about 59% sits just above the 50% chance baseline. Even when an inferred bucket matches an on-chain bucket in time and price, the inferred aggressor direction is wrong about two trades in five. The mechanism is the feed itself: \u003ccode\u003eprice_change\u003c/code\u003e updates broadcast a post-match snapshot of the resting book without identifying the taker. 
Using \u003ccode\u003echange_side\u003c/code\u003e as a sign proxy is what produces the roughly 59% agreement rate.\u003c/p\u003e\n\u003cp\u003eA noisy sign propagates to every measure that consumes it. On the comparable subset of the top-100 panel:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eEffective half-spread changes sign on 67% of markets\u003c/strong\u003e in the first 7-day window when feed-inferred direction is swapped for on-chain ground truth (50% in a second, non-overlapping window).\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eKyle\u0026rsquo;s $\\lambda$ changes sign on 60% of markets\u003c/strong\u003e in the first window (43% in the second).\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eBoth windows sit in or below the chance band, well short of the roughly 80% Lee-Ready accuracy documented on Nasdaq.\u003c/p\u003e\n\u003cp\u003eThe Glosten-Harris spread decomposition makes this concrete. Restricted to the top-100 stratum and run on authoritative on-chain trades, the median effective half-spread is essentially zero (-0.0003 prob pp), as are the median transitory component (0.00001) and the median adverse-selection component (0.0).
Once sign errors are removed, the dollar-weighted \u0026ldquo;adverse selection\u0026rdquo; that orderbook-only inference produces collapses, leaving the typical top-100 market with no detectable systematic spread component on either side.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-spread_decomposition-png-7\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/spread_decomposition.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/spread_decomposition.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/spread_decomposition.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/spread_decomposition.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/spread_decomposition.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/spread_decomposition.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/spread_decomposition.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/spread_decomposition.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/spread_decomposition.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/spread_decomposition.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/spread_decomposition.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/spread_decomposition.png\"\n           alt=\"Glosten-Harris spread decomposition across the top-100 stratum: distribution of transitory component c (left) and adverse-selection component phi (right), both in probability points; once sign errors are removed via on-chain trade direction, the dollar-weighted \u0026#39;adverse selection\u0026#39; that orderbook-only inference produces collapses\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-spread_decomposition-png-7\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/spread_decomposition.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Glosten-Harris spread decomposition across the top-100 stratum: distribution of transitory component c (left) and adverse-selection component phi (right), both in probability points; once sign errors are removed via on-chain trade direction, the dollar-weighted \u0026#39;adverse selection\u0026#39; that orderbook-only inference produces collapses\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eThe implication for anyone working on decentralised CLOB venues: the same constraint should bind on any venue where the off-chain 
matching layer broadcasts a post-match book state without exposing the taker identity (GMX v1, dYdX v3, Loopring\u0026rsquo;s historical CLOB, and similar hybrid architectures). The public feed exposes \u003cem\u003ewhat cleared\u003c/em\u003e but not \u003cem\u003ewho initiated\u003c/em\u003e, and any direction-dependent microstructure measure on those venues will need an authoritative on-chain trade source.\u003c/p\u003e\n\u003ch2 id=\"what-im-not-doing-here\"\u003eWhat I\u0026rsquo;m not doing here\u003c/h2\u003e\n\u003cp\u003eI had a long-ish exchange recently with someone scoping research on the same dataset, which is a useful proxy for what\u0026rsquo;s adjacent but out of scope.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eSpread decomposition with bounded payoffs.\u003c/em\u003e A classical Glosten-Harris / Huang-Stoll decomposition adapted to prediction markets is still open. Prices are bounded in $(0,1)$, which breaks parts of the standard identification, so there is a real research question here, not just a re-application.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eInsider episodes and wallet-level patterns around resolution.\u003c/em\u003e Probably the most productive lane I am not working on. The JFM submission draws the privacy line at no deanonymisation, only aggregate wallet measures; where you draw it for your own work is your call.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eToxic flow.\u003c/em\u003e Some overlap with one of my planned follow-ups.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eLatency arb (intra-venue).\u003c/em\u003e Doesn\u0026rsquo;t really work as a question on Polymarket. The two-clock gap in the WebSocket feed (SF6) is archive-ingestion delay, not trader latency, and without an exchange-side clock you can\u0026rsquo;t separate the two. Cross-venue is the workable framing.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eCross-venue arb.\u003c/em\u003e Open and product-relevant. Kalshi, PredictIt, sports-book mirrors.
The data engineering is probably the harder part.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"replication\"\u003eReplication\u003c/h2\u003e\n\u003cp\u003eEverything is reproducible from the public on-chain record and your own WebSocket capture. I\u0026rsquo;m not redistributing the 624 GB raw archive (it is too large to move around practically), but the panel artifacts and the on-chain scrape pipeline are public:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eCode: \u003ca href=\"https://github.com/philippdubach/polymarket-microstructure\"\u003egithub.com/philippdubach/polymarket-microstructure\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003eReplication package (DOI): \u003ca href=\"https://doi.org/10.5281/zenodo.19811426\"\u003e10.5281/zenodo.19811426\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003ePaper: \u003ca href=\"https://arxiv.org/abs/2604.24366\"\u003earXiv:2604.24366\u003c/a\u003e, under review at the \u003ca href=\"https://www.sciencedirect.com/journal/journal-of-financial-markets\"\u003eJournal of Financial Markets\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003eCollector: \u003ca href=\"https://github.com/pmxt-dev/pmxt\"\u003epmxt-dev/pmxt\u003c/a\u003e is a good starting point if you want to run your own. The Polymarket WebSocket is a public feed, so once you have a collector running you\u0026rsquo;ll have continuous coverage going forward, which for product work is probably more useful than a static historical slice anyway.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe CTF Exchange V1 → V2 cutover at the end of April 2026 closes my scrape window and opens a venue-evolution comparison. Beyond that comparison, two extensions stand out: a per-market depth time series that would let SF8 move from a cross-sectional to a within-market depth-decay regression, and a cross-venue analysis against Kalshi and sports-book mirrors that would address the price-discovery question this paper leaves open.
Those are the obvious next things.\u003c/p\u003e\n","summary":"A 624 GB tick-level archive of Polymarket's WebSocket feed joined to the on-chain trade record reveals eight cross-sectional stylized facts and a measurement result: the public feed only recovers trade direction 59% of the time.","image":"https://static.philippdubach.com/ograph/ograph-polymarket-microstructure.jpg","date_published":"2026-05-02T00:00:00Z","date_modified":"2026-05-04T13:48:47+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Quantitative Finance"],"_philippdubach":{"type":"Project","word_count":2275,"reading_time_minutes":11,"keywords":["Polymarket microstructure","decentralized prediction market order book","Polymarket WebSocket feed","CTF Exchange OrderFilled","longshot spread premium prediction markets","trade direction inference Polymarket","Lee-Ready algorithm crypto","Kyle's lambda Polymarket","prediction market wash trading","Polygon CLOB microstructure","decentralized finance market microstructure","on-chain trade direction","30 billion order book events","Polymarket research replication"],"doi":"10.48550/arXiv.2604.24366","section":"posts"}},{"id":"https://philippdubach.com/posts/karpathys-software-3.0-playbook/","url":"https://philippdubach.com/posts/karpathys-software-3.0-playbook/","title":"Karpathy's Software 3.0 Playbook","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-karpathy_header-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/karpathy_header.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/karpathy_header.png 480w,\n         
             https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/karpathy_header.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/karpathy_header.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/karpathy_header.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/karpathy_header.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/karpathy_header.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/karpathy_header.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/karpathy_header.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/karpathy_header.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/karpathy_header.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/karpathy_header.png\"\n           alt=\"Still from Andrej Karpathy\u0026#39;s interview with Sequoia at AI Ascent discussing Software 3.0, vibe coding, and agentic engineering\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-karpathy_header-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" 
data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/karpathy_header.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Still from Andrej Karpathy\u0026#39;s interview with Sequoia at AI Ascent discussing Software 3.0, vibe coding, and agentic engineering\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eAndrej Karpathy is one of the few people who has both built modern AI and explained it for the rest of us. He co-founded OpenAI, ran computer vision at Tesla (where he got Autopilot working), and his courses on neural networks are some of the most-watched lectures on the internet. He also has a habit of naming the era we\u0026rsquo;re already in. \u0026ldquo;Vibe coding\u0026rdquo; was his. \u0026ldquo;Software 3.0\u0026rdquo; looks like the next one.\u003c/p\u003e\n\u003cp\u003eSo when Karpathy says he has \u0026ldquo;never felt more behind as a programmer,\u0026rdquo; it is worth slowing down. That isn\u0026rsquo;t false modesty from a guy with his résumé. Something shifted under the field and most people haven\u0026rsquo;t recalibrated. The Sequoia interview below is his attempt to describe what shifted. The lessons here are pulled from it, ordered roughly by how much they should change what you do tomorrow.\u003c/p\u003e\n\u003ch2 id=\"1-inflection-point-december-2025\"\u003e1. Inflection point December 2025\u003c/h2\u003e\n\u003cp\u003eUntil late last year, agentic coding tools were \u0026ldquo;kind of helpful.\u0026rdquo; Good in stretches, often wrong in ways you had to babysit. Over the December break, the latest models crossed a line:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;I kept asking for more and just came out fine. And then I can\u0026rsquo;t remember the last time I corrected it. 
And then I just trusted the system more and more. And then I was vibe coding.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eHe flagged it on the record, loudly:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;A lot of people experienced AI last year as a ChatGPT-adjacent thing. But you really had to look again, and you had to look as of December, because things have changed fundamentally — especially on this agentic, coherent workflow that really started to actually work.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eIf your mental model of these tools was set by ChatGPT, it is already a generation stale. The agentic workflow is a different product, and it now works.\u003c/p\u003e\n\u003ch2 id=\"2-you-can-outsource-your-thinking-but-not-your-understanding\"\u003e2. You can outsource your thinking, but not your understanding\u003c/h2\u003e\n\u003cp\u003eThe most quotable line of the interview:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;You can outsource your thinking, but you can\u0026rsquo;t outsource your understanding.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eAs agents do more of the thinking, the bottleneck moves into your head. You still have to know what is worth building and why, and you still have to direct the work.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;I\u0026rsquo;m still part of the system, and I still have to — somehow, information still has to make it into my brain. And I feel like I\u0026rsquo;m becoming a bottleneck of just even knowing what are we trying to build, why is it worth doing, how do I direct my agents.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eYour value sits upstream of execution. The bottleneck of the next decade is less about compute than about how fast humans can deepen comprehension to keep directing systems that out-execute them. 
That is why Karpathy keeps building knowledge bases out of his own reading. He wants another projection of the same data, faster.\u003c/p\u003e\n\u003ch2 id=\"3-verifiability-is-the-map-of-what-automates-next\"\u003e3. Verifiability is the map of what automates next\u003c/h2\u003e\n\u003cp\u003eWhy are these models freakishly good at code and math, and yet stupid about whether you should walk 50 meters to a car wash? Because frontier labs train via reinforcement learning, and RL needs verifiable rewards. Verifiable domains attract environments and signal, so they get the steepest gains. Everything else stays jagged.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;How is it possible that state-of-the-art Opus 4.7 will simultaneously refactor a hundred-thousand-line code base or find zero-day vulnerabilities, and yet tells me to walk to this car wash? This is insane.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe GPT-3.5 to GPT-4 chess jump is the proof point. Capability tracks what the labs choose to feed in.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;We are slightly at the mercy of whatever the labs are doing, whatever they happen to put into the mix\u0026hellip; If you\u0026rsquo;re in the circuits that were part of the RL, you fly. And if you\u0026rsquo;re in the circuits that are out of the data distribution, you\u0026rsquo;re going to struggle.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eTwo things follow. If you are a founder and you can build a verifiable environment in your domain, even one the labs aren\u0026rsquo;t focused on, you can fine-tune a model that flies. That is real leverage. If you are a worker, the more useful question than \u0026ldquo;is my job safe?\u0026rdquo; is \u0026ldquo;is my job verifiable?\u0026rdquo; Karpathy thinks everything is automatable eventually. 
Verifiability mainly sets the order.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"4-software-30-prompting-is-the-new-programming\"\u003e4. Software 3.0: prompting is the new programming\u003c/h2\u003e\n\u003cp\u003eThe frame that makes the rest of this make sense. Karpathy\u0026rsquo;s three eras:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSoftware 1.0:\u003c/strong\u003e humans write explicit code.\u003cbr\u003e\n\u003cstrong\u003eSoftware 2.0:\u003c/strong\u003e humans curate datasets and train neural networks; the weights are the program.\u003cbr\u003e\n\u003cstrong\u003eSoftware 3.0:\u003c/strong\u003e humans write prompts; the LLM is the interpreter, and the context window is the program.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;Your programming now turns to prompting. And what\u0026rsquo;s in the context window is over the interpreter, that is the LLM, that is kind of like interpreting your context and performing computation in the digital information space.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eHis sharpest example: installing OpenCode is no longer a shell script. It is a block of text you copy-paste to your agent, which reads your environment and figures the rest out.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;It\u0026rsquo;s just like, what is the piece of text to copy-paste to your agent? That\u0026rsquo;s the programming paradigm now.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe unit of programming used to be a function. Now it is closer to a paragraph.\u003c/p\u003e\n\u003ch2 id=\"5-vibe-coding-raises-the-floor-agentic-engineering-raises-the-ceiling\"\u003e5. 
Vibe coding raises the floor; agentic engineering raises the ceiling\u003c/h2\u003e\n\u003cp\u003eIf you build software for a living, this is the lesson with the most direct implications:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;Vibe coding is about raising the floor for everyone in terms of what they can do in software\u0026hellip; But agentic engineering is about preserving the quality bar of what existed before in professional software. You\u0026rsquo;re still responsible for your software just as before, but can you go faster?\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eKarpathy thinks the ceiling is very high:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;People used to talk about the 10x engineer previously. I think that this is magnified a lot more — 10x is not the speed up you gain. People who are very good at this peak a lot more than 10x.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe gap between mediocre and excellent users of these tools is widening. Worth taking seriously when you decide what to learn next.\u003c/p\u003e\n\u003ch2 id=\"6-the-new-human-skill-is-taste-spec-and-oversight\"\u003e6. The new human skill is taste, spec, and oversight\u003c/h2\u003e\n\u003cp\u003eWhat humans should still do, in his telling, is design and judgment work. Holding the spec in your head. Setting the architecture. Making sure the agent is being asked for the right thing in the first place.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;You\u0026rsquo;re in charge of the taste, the engineering, the design, and that it makes sense, and that you\u0026rsquo;re asking for the right things\u0026hellip; You\u0026rsquo;re doing some of the design and development, and the engineers are doing the fill in the blanks.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe MenuGen anecdote illustrates the kind of mistake only a human spec catches.
The agent silently tried to associate Stripe and Google accounts by matching email addresses, with no persistent user ID. That works only as long as a user\u0026rsquo;s Stripe and Google emails are the same.\u003c/p\u003e\n\u003cp\u003eHe is not sure this division will hold forever:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;When you actually look at the code, sometimes I get a little bit of a heart attack, because it\u0026rsquo;s not super amazing code\u0026hellip; It\u0026rsquo;s very bloaty, and there\u0026rsquo;s a lot of copy-paste, and there\u0026rsquo;s awkward abstractions that are brittle and — like, it works, but it\u0026rsquo;s just really gross.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eNothing fundamental stops the labs from training for taste. They just haven\u0026rsquo;t yet. Until they do, the taste layer is still your responsibility.\u003c/p\u003e\n\u003ch2 id=\"7-some-apps-shouldnt-exist-anymore\"\u003e7. Some apps shouldn\u0026rsquo;t exist anymore\u003c/h2\u003e\n\u003cp\u003eThe MenuGen anecdote, again. Karpathy built an app: photo a restaurant menu, OCR it, generate images of each dish, render a new menu. Vercel deployment, the full stack.\u003c/p\u003e\n\u003cp\u003eThen he saw the Software 3.0 version. Hand the photo to Gemini, say \u0026ldquo;use NanoBanana to overlay the dishes onto the menu,\u0026rdquo; and a single model call returns the same menu with images rendered into the pixels.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;All of my MenuGen is spurious. It\u0026rsquo;s working in the old paradigm. That app shouldn\u0026rsquo;t exist.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eA lot of what we are building today is scaffolding around a capability the model could perform end-to-end. Before writing the next CRUD app, ask whether the model is the app.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"8-new-possibilities-matter-more-than-the-speed-ups\"\u003e8.
New possibilities matter more than the speed-ups\u003c/h2\u003e\n\u003cp\u003eThe flip side of \u0026ldquo;some apps shouldn\u0026rsquo;t exist\u0026rdquo; is that some products could not have existed before. Karpathy\u0026rsquo;s knowledge-base project is the example. Take a pile of documents, ask the LLM to recompile them into a wiki, surface the connections you would never have stitched together by hand.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;This is not even a program. This is not something that could exist before, because there was no code that would create a knowledge base based on a bunch of facts. But now you can just take these documents and basically recompile them in a different way\u0026hellip; I almost think that that\u0026rsquo;s more exciting.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eIf you only ask what gets faster, you will miss the more interesting question, which is what becomes possible at all.\u003c/p\u003e\n\u003ch2 id=\"9-jagged-intelligence-ghosts-not-animals\"\u003e9. Jagged intelligence: ghosts, not animals\u003c/h2\u003e\n\u003cp\u003eKarpathy\u0026rsquo;s metaphor: we are not building animals. Animal intelligence comes with intrinsic motivation, embodiment, drives shaped by evolution. What we have instead is more like a ghost. A statistical simulator shaped by pre-training, with RL bolted on.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;These things are not animal intelligences. Like, if you yell at them, they\u0026rsquo;re not going to work better. Or worse. Or it doesn\u0026rsquo;t have any impact. And it\u0026rsquo;s all just kind of these statistical simulation circuits where the substrate is pre-training. So, statistics. And then there\u0026rsquo;s RL bolting on top.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe practical takeaway is to stop reasoning about LLMs by analogy to humans. 
Be suspicious of where the model seems confident, probe the edges, and figure out which circuits your task is actually landing in.\u003c/p\u003e\n\u003ch2 id=\"10-build-agent-native-infrastructure\"\u003e10. Build agent-native infrastructure\u003c/h2\u003e\n\u003cp\u003eFor infra builders, Karpathy\u0026rsquo;s pet peeve is also the opportunity:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;Why are people still telling me what to do? Like, I don\u0026rsquo;t want to do anything. What is the thing I should copy-paste to my agent?\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eRebuild the developer stack so the primary consumer of docs, configs, APIs, and deployment flows is an agent rather than a human. Data structures should be legible to LLMs by default, and sensors and actuators over the world should sit behind agent-callable interfaces.\u003c/p\u003e\n\u003cp\u003eHis test: can you say \u0026ldquo;build and deploy MenuGen\u0026rdquo; and never touch a settings panel? When the answer is yes, the infrastructure has caught up.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"11-hire-for-big-projects-not-puzzles\"\u003e11. Hire for big projects, not puzzles\u003c/h2\u003e\n\u003cp\u003eA direct shot at hiring managers. Most companies have not refactored their interview loops for the agentic era.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;Hiring has to look like, give me a really big project and see someone implement that big project. Like, let\u0026rsquo;s write, say, a Twitter clone for agents, and then make it really good, make it really secure, and then have some agents simulate some activity on this Twitter. And then I\u0026rsquo;m going to use 10 Codex 5.4-X-high to try to break your website.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eWhiteboard puzzles measure the wrong thing. 
If your interview loop has not changed since 2022, you are selecting for the previous era.\u003c/p\u003e\n\u003ch2 id=\"12-imagine-the-weird-endpoint\"\u003e12. Imagine the weird endpoint\u003c/h2\u003e\n\u003cp\u003eThe closing speculation is genuinely strange:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;In the early days of computing, people were a little bit confused as to whether computers would look like calculators or computers would look like neural nets. And in the \u0026rsquo;50s and \u0026rsquo;60s, it was not really obvious which way it would go\u0026hellip; You could imagine that a lot of this will flip and that the neural net becomes kind of the host process, and the CPUs become kind of the coprocessor.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eIn that picture, UIs are diffusion-rendered moment by moment from raw video and audio, with no apps in between.\u003c/p\u003e\n\u003cp\u003eYou do not have to buy this exact picture. The point is simply that the linear extrapolation, the same software but smarter, is almost certainly the wrong frame for where this ends up.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eBased on Andrej Karpathy\u0026rsquo;s \u003ca href=\"https://www.youtube.com/watch?v=96jN2OCOfLs\"\u003einterview with Sequoia\u003c/a\u003e at AI Ascent\u003c/em\u003e\u003c/p\u003e\n","summary":"Twelve lessons from Andrej Karpathy's Sequoia interview: Software 3.0, vibe coding versus agentic engineering, jagged intelligence, and why December 2025 was the inflection most people missed.","image":"https://static.philippdubach.com/ograph/ograph-lessons-from-karpathy.jpg","date_published":"2026-05-01T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D.
Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI","Tech"],"_philippdubach":{"type":"Commentary","word_count":1962,"reading_time_minutes":10,"keywords":["Andrej Karpathy Software 3.0","Karpathy Sequoia interview","vibe coding vs agentic engineering","Software 3.0 paradigm","Karpathy never felt more behind","agentic coding workflow","December 2024 AI inflection point","LLM jagged intelligence","reinforcement learning verifiable rewards","context window as program","LLM as interpreter","agent-native infrastructure","Karpathy taste spec oversight","ghosts not animals AI","MenuGen Karpathy","10x engineer agentic era"],"section":"posts"}},{"id":"https://philippdubach.com/posts/f3ed-cant-call-an-ace-fixing-a-neurips-2024-tennis-model/","url":"https://philippdubach.com/posts/f3ed-cant-call-an-ace-fixing-a-neurips-2024-tennis-model/","title":"F3ED Can't Call an Ace: Fixing a NeurIPS 2024 Tennis Model","content_html":"\u003cp\u003eI built a tennis broadcast pipeline this spring and ended up running F3ED, the NeurIPS 2024 shot detector, on a couple of ATP Challenger matches. F3ED is a good model. It also kept labeling clear aces as \u0026ldquo;unforced errors\u0026rdquo;, which is what this post is about. Code: \u003ca href=\"https://github.com/philippdubach/tennis-vision\"\u003egithub.com/philippdubach/tennis-vision\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eF3ED (\u003ca href=\"https://openreview.net/forum?id=Y23LZxN9eU\"\u003eNeurIPS 2024\u003c/a\u003e) detects shots well. The catch is the outcome head, which has 4 classes: \u003ccode\u003ein\u003c/code\u003e, \u003ccode\u003ewinner\u003c/code\u003e, \u003ccode\u003eforced-err\u003c/code\u003e, \u003ccode\u003eunforced-err\u003c/code\u003e. There\u0026rsquo;s no class for \u003ccode\u003eace\u003c/code\u003e, \u003ccode\u003edouble_fault\u003c/code\u003e, or \u003ccode\u003efirst_serve_fault\u003c/code\u003e. 
Those events aren\u0026rsquo;t shot properties; they\u0026rsquo;re score grammar, and they need state from outside the shot itself.\u003c/p\u003e\n\u003cp\u003eI audited 11 single-shot serve rallies F3ED labeled \u003ccode\u003eunforced-err\u003c/code\u003e. Seven are first-serve faults, one is an ace, and only three are genuine unforced errors. That is 8 of 11 (73%) mislabeled by tennis\u0026rsquo;s own definition.\u003c/p\u003e\n\u003cp\u003eThe fix is a short reconciler that reads the scoreboard. OCR isn\u0026rsquo;t novel here. What I haven\u0026rsquo;t seen anyone do is feed the OCR\u0026rsquo;d score back into runtime label correction; that feedback step is what actually fixes the labels. N=44 rallies across two matches; this is a hypothesis, not a finding. The structural argument doesn\u0026rsquo;t depend on N.\u003c/p\u003e\n\u003ch2 id=\"the-current-pipeline\"\u003eThe current pipeline\u003c/h2\u003e\n\u003cp\u003eTwo phases, with a serializable artifact between them:\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-00_architecture-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/00_architecture.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/00_architecture.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/00_architecture.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/00_architecture.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/00_architecture.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 
1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/00_architecture.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/00_architecture.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/00_architecture.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/00_architecture.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/00_architecture.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/00_architecture.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/00_architecture.png\"\n           alt=\"Pipeline architecture diagram showing Phase 1 GPU detection, the upstream.npz boundary, and Phase 2 local CPU stages with the score-delta reconciler highlighted as the key contribution\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-00_architecture-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/00_architecture.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Pipeline architecture diagram showing Phase 1 GPU detection, the upstream.npz boundary, and Phase 2 
local CPU stages with the score-delta reconciler highlighted as the key contribution\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003e\u003ccode\u003eupstream.npz\u003c/code\u003e is a 5-field dataclass (\u003ccode\u003eball_track\u003c/code\u003e, \u003ccode\u003ehomography_matrices\u003c/code\u003e, \u003ccode\u003ekps_court\u003c/code\u003e, \u003ccode\u003epersons_top/bottom\u003c/code\u003e, \u003ccode\u003ebounces\u003c/code\u003e). It\u0026rsquo;s the contract between the GPU-bound detection layer and everything else. You re-run Phase 2 in seconds and don\u0026rsquo;t pay the GPU bill again until Phase 1 inputs change. This was a boring early decision that quietly carried the project. Every iteration runs in ~10 minutes instead of needing a fresh Colab session.\u003c/p\u003e\n\u003cp\u003eThe score-delta reconciler sits at the end of Phase 2. It sees F3ED\u0026rsquo;s per-shot taxonomy and the OCR-derived score states. When they disagree, it overrides the outcome label.\u003c/p\u003e\n\u003cp\u003eQuality on TenniSet V006 (28 ground-truth points across 20 minutes, with ±12-frame tolerance):\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-table_01_benchmarks-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/table_01_benchmarks.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/table_01_benchmarks.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/table_01_benchmarks.png 640w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/table_01_benchmarks.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_01_benchmarks.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/table_01_benchmarks.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/table_01_benchmarks.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/table_01_benchmarks.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_01_benchmarks.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/table_01_benchmarks.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/table_01_benchmarks.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_01_benchmarks.png\"\n           alt=\"Detector benchmark table on TenniSet V006 showing F3ED pretrained achieves the highest F1 of 0.54 with 109 TP, 48 FP, 55 FN, 0.53 recall and 0.69 precision, beating E2E-Spot and rule-based baselines that both score 0.50 F1\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-table_01_benchmarks-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" 
data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/table_01_benchmarks.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Detector benchmark table on TenniSet V006 showing F3ED pretrained achieves the highest F1 of 0.54 with 109 TP, 48 FP, 55 FN, 0.53 recall and 0.69 precision, beating E2E-Spot and rule-based baselines that both score 0.50 F1\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eF3ED has the highest F1, with fewer false positives at comparable recall. I\u0026rsquo;m not arguing it\u0026rsquo;s broken. I\u0026rsquo;m arguing about a specific thing it can\u0026rsquo;t do alone.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u003cem\u003eNote: numbers above are from the V006 baseline run on commit \u003ccode\u003e0babb71\u003c/code\u003e (2026-04-23). Bounce-dedup and reconciler work since then shift F1 marginally upward; full re-eval pending a Phase-1 v8x rerun on V006.\u003c/em\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"the-audit\"\u003eThe audit\u003c/h2\u003e\n\u003cp\u003eTennis scoring is a finite-state machine. Every rally ends in exactly one of these ways (a first-serve fault ends the rally, not the point):\u003c/p\u003e\n\u003cp\u003e\u003ccode\u003eace\u003c/code\u003e: server\u0026rsquo;s first or second serve lands; receiver doesn\u0026rsquo;t return it\u003cbr\u003e\n\u003ccode\u003edouble_fault\u003c/code\u003e: both serves miss\u003cbr\u003e\n\u003ccode\u003efirst_serve_fault\u003c/code\u003e: first serve misses; second serve still to come\u003cbr\u003e\nmulti-shot rally → \u003ccode\u003ewinner\u003c/code\u003e / \u003ccode\u003eforced-err\u003c/code\u003e / \u003ccode\u003eunforced-err\u003c/code\u003e\u003c/p\u003e\n\u003cp\u003eF3ED can only emit the outcomes on the last line. 
The first three depend on what happens between shots, or on what doesn\u0026rsquo;t happen at all, and the model doesn\u0026rsquo;t see between-shot events. It also has no class to put the answer in if it did: \u003ccode\u003eace\u003c/code\u003e, \u003ccode\u003edouble_fault\u003c/code\u003e, and \u003ccode\u003efirst_serve_fault\u003c/code\u003e are not in F3ED\u0026rsquo;s published label set. The closest available emission for any of them is \u003ccode\u003eserve\u003c/code\u003e + \u003ccode\u003eunforced-err\u003c/code\u003e. So no amount of training data could teach the model to tell them apart: the output head has no slot for the answer.\u003c/p\u003e\n\u003cp\u003eHere\u0026rsquo;s what tipped me off. set1 R4, t=107s:\u003c/p\u003e\n\u003cdiv class=\"code-block\"\u003e\u003cbutton type=\"button\" class=\"code-copy\" aria-label=\"Copy code to clipboard\"\u003e\n        \u003cspan class=\"code-copy-text\"\u003eCopy\u003c/span\u003e\n    \u003c/button\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-fallback\" data-lang=\"fallback\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003eF3ED raw_elements:  [\u0026#39;T\u0026#39;, \u0026#39;ad\u0026#39;, \u0026#39;near\u0026#39;, \u0026#39;serve\u0026#39;, \u0026#39;unforced-err\u0026#39;]\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003eScore before rally:  Poljicak 0    Dodig 0     (game start)\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003eScore after rally:   Poljicak 15   Dodig 0\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eThe server\u0026rsquo;s score went up, the receiver\u0026rsquo;s didn\u0026rsquo;t, and F3ED labeled the serve \u0026ldquo;unforced error\u0026rdquo;. You can\u0026rsquo;t hit an unforced error and win the point. 
It was an ace, and F3ED doesn\u0026rsquo;t have an \u0026ldquo;ace\u0026rdquo; button to press, so it picked the closest available label.\u003c/p\u003e\n\u003cp\u003eThe reconciler is short. For each single-shot serve rally, read the scoreboard before and after:\u003c/p\u003e\n\u003cdiv class=\"code-block\" data-lang=\"python\"\u003e\u003cspan class=\"code-lang\" aria-hidden=\"true\"\u003epython\u003c/span\u003e\u003cbutton type=\"button\" class=\"code-copy\" aria-label=\"Copy code to clipboard\"\u003e\n        \u003cspan class=\"code-copy-text\"\u003eCopy\u003c/span\u003e\n    \u003c/button\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# Verbatim from src/tennis_vision/scoreboard/reconcile.py\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003e_POINT_RANK\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"s1\"\u003e\u0026#39;0\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;15\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"mi\"\u003e1\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;30\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"mi\"\u003e2\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;40\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"mi\"\u003e3\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e 
\u003cspan class=\"s1\"\u003e\u0026#39;AD\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"mi\"\u003e4\u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003e_delta_points_won\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebefore\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eScoreState\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eafter\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eScoreState\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"nb\"\u003etuple\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"nb\"\u003eint\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"nb\"\u003eint\u003c/span\u003e\u003cspan class=\"p\"\u003e]:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"s2\"\u003e\u0026#34;\u0026#34;\u0026#34;(top_pts_won, bot_pts_won) between two states. 
A game-counter increment\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e    counts as +1 (the lost-side rolls back to 0); same-game incremental points\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e    are tracked via the points-rank delta.\u0026#34;\u0026#34;\u0026#34;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003etop\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"nb\"\u003emax\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eafter\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003etg\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u003c/span\u003e \u003cspan class=\"n\"\u003ebefore\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003etg\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003ebot\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"nb\"\u003emax\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eafter\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ebg\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u003c/span\u003e \u003cspan class=\"n\"\u003ebefore\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ebg\u003c/span\u003e\u003cspan 
class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"n\"\u003eafter\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003etg\u003c/span\u003e \u003cspan class=\"o\"\u003e==\u003c/span\u003e \u003cspan class=\"n\"\u003ebefore\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003etg\u003c/span\u003e \u003cspan class=\"ow\"\u003eand\u003c/span\u003e \u003cspan class=\"n\"\u003eafter\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ebg\u003c/span\u003e \u003cspan class=\"o\"\u003e==\u003c/span\u003e \u003cspan class=\"n\"\u003ebefore\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ebg\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"n\"\u003etop\u003c/span\u003e \u003cspan class=\"o\"\u003e+=\u003c/span\u003e \u003cspan class=\"nb\"\u003emax\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003e_POINT_RANK\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eget\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eafter\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003etp\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u003c/span\u003e \u003cspan class=\"n\"\u003e_POINT_RANK\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan 
class=\"n\"\u003eget\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebefore\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003etp\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e))\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"n\"\u003ebot\u003c/span\u003e \u003cspan class=\"o\"\u003e+=\u003c/span\u003e \u003cspan class=\"nb\"\u003emax\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003e_POINT_RANK\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eget\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eafter\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ebp\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u003c/span\u003e \u003cspan class=\"n\"\u003e_POINT_RANK\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eget\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebefore\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ebp\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e))\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003etop\u003c/span\u003e\u003cspan 
class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003ebot\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003e_classify_single_shot_serve\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003erally\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003ebefore\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eafter\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"nb\"\u003etuple\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e]:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003etop_d\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003ebot_d\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003e_delta_points_won\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebefore\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eafter\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003eserver_d\u003c/span\u003e   \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan 
class=\"n\"\u003etop_d\u003c/span\u003e \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"n\"\u003erally\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eserver\u003c/span\u003e \u003cspan class=\"o\"\u003e==\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;top\u0026#39;\u003c/span\u003e \u003cspan class=\"k\"\u003eelse\u003c/span\u003e \u003cspan class=\"n\"\u003ebot_d\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003ereceiver_d\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003ebot_d\u003c/span\u003e \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"n\"\u003erally\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eserver\u003c/span\u003e \u003cspan class=\"o\"\u003e==\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;top\u0026#39;\u003c/span\u003e \u003cspan class=\"k\"\u003eelse\u003c/span\u003e \u003cspan class=\"n\"\u003etop_d\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003ereceiver\u003c/span\u003e   \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;bottom\u0026#39;\u003c/span\u003e \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"n\"\u003erally\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eserver\u003c/span\u003e \u003cspan class=\"o\"\u003e==\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;top\u0026#39;\u003c/span\u003e \u003cspan class=\"k\"\u003eelse\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;top\u0026#39;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan 
class=\"n\"\u003eserver_d\u003c/span\u003e \u003cspan class=\"o\"\u003e==\u003c/span\u003e \u003cspan class=\"mi\"\u003e0\u003c/span\u003e \u003cspan class=\"ow\"\u003eand\u003c/span\u003e \u003cspan class=\"n\"\u003ereceiver_d\u003c/span\u003e \u003cspan class=\"o\"\u003e==\u003c/span\u003e \u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s1\"\u003e\u0026#39;first_serve_fault\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;unknown\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;ocr_score_delta_first_serve_fault\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"n\"\u003eserver_d\u003c/span\u003e \u003cspan class=\"o\"\u003e\u0026gt;\u003c/span\u003e \u003cspan class=\"mi\"\u003e0\u003c/span\u003e \u003cspan class=\"ow\"\u003eand\u003c/span\u003e \u003cspan class=\"n\"\u003ereceiver_d\u003c/span\u003e \u003cspan class=\"o\"\u003e==\u003c/span\u003e \u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s1\"\u003e\u0026#39;ace\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003erally\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan 
class=\"n\"\u003eserver\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;ocr_score_delta_ace\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"n\"\u003ereceiver_d\u003c/span\u003e \u003cspan class=\"o\"\u003e\u0026gt;\u003c/span\u003e \u003cspan class=\"mi\"\u003e0\u003c/span\u003e \u003cspan class=\"ow\"\u003eand\u003c/span\u003e \u003cspan class=\"n\"\u003eserver_d\u003c/span\u003e \u003cspan class=\"o\"\u003e==\u003c/span\u003e \u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s1\"\u003e\u0026#39;double_fault\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003ereceiver\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;ocr_score_delta_double_fault\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s1\"\u003e\u0026#39;unknown\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003erally\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ewinner\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003erally\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003emethod\u003c/span\u003e 
\u003cspan class=\"o\"\u003e+\u003c/span\u003e \u003cspan class=\"s1\"\u003e\u0026#39;+ocr_inconclusive\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eThat\u0026rsquo;s the whole reconciler: 23 lines, microseconds per rally. The OCR sampling pass that produces the \u003ccode\u003ebefore\u003c/code\u003e / \u003ccode\u003eafter\u003c/code\u003e states runs once during Phase 2 (~1 Hz over the broadcast); the reconciler itself is a constant-time lookup against the resulting state timeline.\u003c/p\u003e\n\u003cp\u003eRunning it across set1 + match2 (44 rallies, 11 single-shot serve rallies) shows the structure F3ED missed:\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 65%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-table_02_outcomes-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/table_02_outcomes.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/table_02_outcomes.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/table_02_outcomes.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/table_02_outcomes.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_02_outcomes.png 1200w\"\n              sizes=\"65vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/table_02_outcomes.png 768w,\n   
                   https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/table_02_outcomes.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/table_02_outcomes.png 1440w\"\n              sizes=\"65vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_02_outcomes.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/table_02_outcomes.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/table_02_outcomes.png 2000w\"\n              sizes=\"65vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_02_outcomes.png\"\n           alt=\"Confusion table showing how F3ED labels map to OCR-grounded reality: of eleven unforced-err labels, seven are actually first-serve-faults, one is an ace, and only three are genuine unforced errors, while winner and in labels are correct\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-table_02_outcomes-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/table_02_outcomes.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Confusion table showing how F3ED labels map to OCR-grounded reality: of eleven unforced-err labels, seven are actually first-serve-faults, one is an ace, and only three are 
genuine unforced errors, while winner and in labels are correct\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-01_outcome_transitions-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/01_outcome_transitions.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/01_outcome_transitions.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/01_outcome_transitions.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/01_outcome_transitions.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/01_outcome_transitions.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/01_outcome_transitions.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/01_outcome_transitions.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/01_outcome_transitions.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/01_outcome_transitions.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/01_outcome_transitions.png 
1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/01_outcome_transitions.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/01_outcome_transitions.png\"\n           alt=\"Sankey-style chart showing F3ED outcome labels transitioning to OCR-grounded ground truth, with seven of eleven unforced errors reclassified as first-serve faults, one as an ace, and only three remaining as genuine unforced errors\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-01_outcome_transitions-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/01_outcome_transitions.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Sankey-style chart showing F3ED outcome labels transitioning to OCR-grounded ground truth, with seven of eleven unforced errors reclassified as first-serve faults, one as an ace, and only three remaining as genuine unforced errors\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003e8 of 11 unforced-err serves (73%) are something else by tennis\u0026rsquo;s actual rules. All 8 got the right label after reconciliation. Whether 73% holds up on a larger sample is a real question; the audit framework would answer it cheaply on more clips. 
The remaining error budget is OCR layout failures (next section) and ambiguous score deltas in multi-shot rallies, where neither F3ED nor OCR alone tells \u003ccode\u003ewinner\u003c/code\u003e from \u003ccode\u003eforced-err\u003c/code\u003e.\u003c/p\u003e\n\u003cp\u003eThe point here isn\u0026rsquo;t that F3ED is wrong. The model emits the labels it has classes for, which is what models do. The point is that shot detection and outcome classification look like the same problem and aren\u0026rsquo;t, and on broadcast tennis the cheapest outcome ground truth is text the broadcaster has already burned into the corner of every frame.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"does-the-ocr-actually-work\"\u003eDoes the OCR actually work?\u003c/h2\u003e\n\u003cp\u003eWorth asking, since the whole reconciler assumes the scoreboard reader is right. Honest answer: it depends heavily on whether the layout config is tuned. When it is, OCR is reliable. When it isn\u0026rsquo;t, individual fields collapse.\u003c/p\u003e\n\u003cp\u003eThe pipeline is EasyOCR cropping a per-layout ROI (\u003ccode\u003esplit_open_1080p\u003c/code\u003e, \u003ccode\u003esplit_open_720p\u003c/code\u003e, \u003ccode\u003ebloomfield_720p\u003c/code\u003e), then a tennis-grammar decoder that rejects illegal score states and transitions (\u003ccode\u003e40-30\u003c/code\u003e → \u003ccode\u003e0-0\u003c/code\u003e with no change in the game count, \u003ccode\u003eAD-15\u003c/code\u003e, and so on) and majority-votes within a sample window.\u003c/p\u003e\n\u003cp\u003eField-parse rates on the two clips in this audit, sampled at ~1 Hz:\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-table_03_ocr_parse_rates-png-6\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/table_03_ocr_parse_rates.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/table_03_ocr_parse_rates.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/table_03_ocr_parse_rates.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/table_03_ocr_parse_rates.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_03_ocr_parse_rates.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/table_03_ocr_parse_rates.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/table_03_ocr_parse_rates.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/table_03_ocr_parse_rates.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_03_ocr_parse_rates.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/table_03_ocr_parse_rates.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/table_03_ocr_parse_rates.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_03_ocr_parse_rates.png\"\n           alt=\"OCR field-parse rates table showing set1 with split_open_1080p layout achieves near-perfect parsing at 100 percent 
for games and 98.7 percent for points, while match2 with mistuned split_open_720p layout drops to 45.6 percent on bot_games\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-table_03_ocr_parse_rates-png-6\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/table_03_ocr_parse_rates.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"OCR field-parse rates table showing set1 with split_open_1080p layout achieves near-perfect parsing at 100 percent for games and 98.7 percent for points, while match2 with mistuned split_open_720p layout drops to 45.6 percent on bot_games\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eset1 is essentially perfect. match2\u0026rsquo;s \u003ccode\u003ebot_games\u003c/code\u003e parse drops below half because the ROI for \u003ccode\u003esplit_open_720p\u003c/code\u003e is mistuned and crops too tight on the digit. Annoying, but the grammar decoder rescues enough frames to emit 28 valid score states across 1002 samples, which is plenty. The reconciler degrades gracefully: rallies without a clean before/after pair fall back to F3ED\u0026rsquo;s raw outcome rather than crashing.\u003c/p\u003e\n\u003cp\u003eThe fix for match2 is layout cleanup, not architecture. None of these components are novel. \u003ca href=\"https://arxiv.org/abs/2603.13397\"\u003eTennisExpert\u003c/a\u003e (Liu et al. 2026, the paper that kicked off this whole project for me) and the TenniSet eval framework (Faulkner \u0026amp; Dick, DICTA 2017) both use OCR + grammar at the labeling stage. 
What I haven\u0026rsquo;t seen anyone do is plug the same signal back into runtime label correction.\u003c/p\u003e\n\u003ch2 id=\"putting-it-in-the-render\"\u003ePutting it in the render\u003c/h2\u003e\n\u003cp\u003eAfter reconciling, the corrected outcome flows back onto the last shot of the rally and surfaces in the rolling event-timeline panel. Here\u0026rsquo;s a single point rendered end-to-end with all overlays live:\u003c/p\u003e\n\u003cdiv class=\"video-loop\" style=\"width: 80%; margin: 1.5rem auto; padding: 0;\"\u003e\n  \u003cvideo autoplay muted loop playsinline preload=\"metadata\" disablepictureinpicture controlslist=\"nodownload nofullscreen noremoteplayback\"\n         \n         aria-label=\"Looping tennis broadcast clip showing the rendered overlay with rally panel, scoreboard echo, per-player stats, and direction labels updating live during a single point\"\n         style=\"width: 100%; height: 100%; display: block; border-radius: 4px; object-fit: cover; background: #000;\"\u003e\n    \u003csource src=\"https://static.philippdubach.com/tennis_vision-example-point-mobile.mp4\" type=\"video/mp4\" media=\"(max-width: 768px)\"\u003e\n    \u003csource src=\"https://static.philippdubach.com/tennis_vision-example-point.mp4\" type=\"video/mp4\"\u003e\n  \u003c/video\u003e\n\u003c/div\u003e\n\n\u003cp\u003eSame set1 R4 ace, mid-frame: the top-left RALLY panel reads \u003ccode\u003e0.0s P2 Serve T ACE\u003c/code\u003e. The \u003ccode\u003eACE\u003c/code\u003e suffix replaced F3ED\u0026rsquo;s \u003ccode\u003eUE\u003c/code\u003e. The scoreboard echo (bottom-left) mirrors what triggered the correction (Poljicak just picked up 15), and the per-player stats panel (bottom-right) ticks his ace counter by one. The model\u0026rsquo;s wrong answer gets quietly corrected because a different signal contradicted it. 
That\u0026rsquo;s the whole post in one frame.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-05_set1_ace_corrected-png-8\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/05_set1_ace_corrected.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/05_set1_ace_corrected.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/05_set1_ace_corrected.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/05_set1_ace_corrected.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/05_set1_ace_corrected.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/05_set1_ace_corrected.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/05_set1_ace_corrected.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/05_set1_ace_corrected.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/05_set1_ace_corrected.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/05_set1_ace_corrected.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/05_set1_ace_corrected.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/05_set1_ace_corrected.png\"\n           alt=\"Rendered tennis broadcast frame showing the corrected ace label in the rally panel reading 0.0s P2 Serve T ACE, with the scoreboard echo confirming Poljicak picked up 15 and the per-player stats panel ticking the ace counter by one\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-05_set1_ace_corrected-png-8\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/05_set1_ace_corrected.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Rendered tennis broadcast frame showing the corrected ace label in the rally panel reading 0.0s P2 Serve T ACE, with the scoreboard echo confirming Poljicak picked up 15 and the per-player stats panel ticking the ace counter by one\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe same panel surfaces F3ED\u0026rsquo;s other labels in real time: direction (\u003ccode\u003eT\u003c/code\u003e for down-the-T serves; for groundstrokes, \u003ccode\u003eCC\u003c/code\u003e cross-court, \u003ccode\u003eDL\u003c/code\u003e down the line, \u003ccode\u003eDM\u003c/code\u003e down the middle, \u003ccode\u003eII\u003c/code\u003e inside-in, \u003ccode\u003eIO\u003c/code\u003e inside-out) and shot type when not a basic groundstroke (\u003ccode\u003eSlice\u003c/code\u003e, \u003ccode\u003eVolley\u003c/code\u003e, \u003ccode\u003eDrop\u003c/code\u003e, 
\u003ccode\u003eLob\u003c/code\u003e).\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-06_match2_rally_panel-png-9\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/06_match2_rally_panel.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/06_match2_rally_panel.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/06_match2_rally_panel.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/06_match2_rally_panel.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/06_match2_rally_panel.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/06_match2_rally_panel.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/06_match2_rally_panel.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/06_match2_rally_panel.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/06_match2_rally_panel.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/06_match2_rally_panel.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/06_match2_rally_panel.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/06_match2_rally_panel.png\"\n           alt=\"match2 rally panel rendering F3ED\u0026#39;s 29-class taxonomy in real time, showing direction codes for down-the-line, cross-court, and inside-out groundstrokes alongside shot-type tags like Slice, Volley, Drop, and Lob\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-06_match2_rally_panel-png-9\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/06_match2_rally_panel.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"match2 rally panel rendering F3ED\u0026#39;s 29-class taxonomy in real time, showing direction codes for down-the-line, cross-court, and inside-out groundstrokes alongside shot-type tags like Slice, Volley, Drop, and Lob\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThat panel is the F3ED 29-class taxonomy made human-readable, in real time. The reconciler doesn\u0026rsquo;t touch direction or technique. Those are pure shot properties, exactly the regime F3ED is designed for. It only fires on the score-grammar events the model can\u0026rsquo;t see.\u003c/p\u003e\n\u003cp\u003eA clean direction histogram comes for free as a side effect. 
97 groundstrokes from match2:\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-04_direction_distribution-png-10\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/04_direction_distribution.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/04_direction_distribution.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/04_direction_distribution.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/04_direction_distribution.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/04_direction_distribution.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/04_direction_distribution.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/04_direction_distribution.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/04_direction_distribution.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/04_direction_distribution.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/04_direction_distribution.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/04_direction_distribution.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/04_direction_distribution.png\"\n           alt=\"Histogram of groundstroke directions from 97 shots in match2 showing 36 percent down the middle, 31 percent cross-court, 17 percent inside-out, 12 percent down-the-line, and 3 percent inside-in\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-04_direction_distribution-png-10\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/04_direction_distribution.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Histogram of groundstroke directions from 97 shots in match2 showing 36 percent down the middle, 31 percent cross-court, 17 percent inside-out, 12 percent down-the-line, and 3 percent inside-in\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003e36% down the middle, 31% cross-court, 17% inside-out, 12% down-the-line, 3% inside-in. The kind of stat broadcasters quote without showing where it came from. Here it\u0026rsquo;s a one-liner over \u003ccode\u003eshots.json\u003c/code\u003e.\u003c/p\u003e\n\u003ch2 id=\"things-that-didnt-pay-off\"\u003eThings that didn\u0026rsquo;t pay off\u003c/h2\u003e\n\u003cp\u003eTwo ideas I tried that I expected to be wins. 
Neither was.\u003c/p\u003e\n\u003ch3 id=\"yolov8x-doesnt-help-at-720p\"\u003eYOLOv8x doesn\u0026rsquo;t help at 720p\u003c/h3\u003e\n\u003cp\u003eThe Phase-1 person detector was YOLOv8m. Swapping in v8x looked like a free improvement: COCO AP@small bumps about 5 pp, and the camera-far (\u0026ldquo;top\u0026rdquo;) player on broadcast tennis is the smallest target the person detector has to find, so that\u0026rsquo;s exactly where the gain should land.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-02_pose_coverage_by_resolution-png-11\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/02_pose_coverage_by_resolution.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/02_pose_coverage_by_resolution.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/02_pose_coverage_by_resolution.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/02_pose_coverage_by_resolution.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/02_pose_coverage_by_resolution.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/02_pose_coverage_by_resolution.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/02_pose_coverage_by_resolution.png 1024w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/02_pose_coverage_by_resolution.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/02_pose_coverage_by_resolution.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/02_pose_coverage_by_resolution.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/02_pose_coverage_by_resolution.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/02_pose_coverage_by_resolution.png\"\n           alt=\"Bar chart comparing top-player pose coverage with YOLOv8m versus YOLOv8x at two resolutions, showing 1080p coverage rising from 70.0 percent to 97.6 percent while 720p coverage stays flat at roughly 70 percent\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-02_pose_coverage_by_resolution-png-11\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/02_pose_coverage_by_resolution.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Bar chart comparing top-player pose coverage with YOLOv8m versus YOLOv8x at two resolutions, showing 1080p coverage rising from 70.0 percent to 97.6 percent while 720p coverage stays flat at roughly 70 percent\" 
decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eset1 (1080p): top-player pose coverage 70.0% → 97.6%. match2 (720p): 70.3% → 68.6%, within noise. Two clips isn\u0026rsquo;t a study, but the mechanism is plausible: at 1080p the camera-far player is ~60-100 px tall, the regime where v8x\u0026rsquo;s AP@small advantage fires. At 720p the same player is ~30-50 px, below the COCO scale buckets where -x outperforms -m. The detector can\u0026rsquo;t recover what isn\u0026rsquo;t in the input. If you\u0026rsquo;re scraping ATP Challenger feeds, fight for 1080p sources. Everything downstream compounds on what the detector gives you.\u003c/p\u003e\n\u003ch3 id=\"catboost-over-fires-bounces\"\u003eCatBoost over-fires bounces\u003c/h3\u003e\n\u003cp\u003eThe bounce detector emitted 378 bounces on a 20-min match2 clip with 84 shots: 4.5 bounces per shot, when roughly one per shot is realistic. Most of the noise is the detector lighting up on the same physical bounce across consecutive frames, plus inter-rally footage where the ball is in a player\u0026rsquo;s hand or in a replay close-up.\u003c/p\u003e\n\u003cp\u003eTwo cheap filters cut the raw bounce count by 21-27%:\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-03_bounce_filter_waterfall-png-12\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/03_bounce_filter_waterfall.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/03_bounce_filter_waterfall.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/03_bounce_filter_waterfall.png 640w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/03_bounce_filter_waterfall.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/03_bounce_filter_waterfall.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/03_bounce_filter_waterfall.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/03_bounce_filter_waterfall.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/03_bounce_filter_waterfall.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/03_bounce_filter_waterfall.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/03_bounce_filter_waterfall.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/03_bounce_filter_waterfall.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/03_bounce_filter_waterfall.png\"\n           alt=\"Waterfall chart showing bounce-count reduction from 378 raw CatBoost bounces through a 400 ms temporal dedup that drops 9 to 12 percent, then a court-locality filter that drops a further 12 to 15 percent\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-03_bounce_filter_waterfall-png-12\" class=\"lightbox-dialog\" aria-label=\"Full-size 
image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/03_bounce_filter_waterfall.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Waterfall chart showing bounce-count reduction from 378 raw CatBoost bounces through a 400 ms temporal dedup that drops 9 to 12 percent, then a court-locality filter that drops a further 12 to 15 percent\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe first is a temporal dedup with ~400 ms minimum separation between bounces, fps-aware. It collapses CatBoost firing on three consecutive frames for one physical contact and drops 9-12%.\u003c/p\u003e\n\u003cp\u003eThe second is a court-locality filter: project the ball pixel through the homography to canvas coordinates, drop if it falls outside the court polygon plus a 200 px buffer. This kills inter-rally noise where the ball is being held or replayed, dropping another 12-15%.\u003c/p\u003e\n\u003cp\u003eReal bounces don\u0026rsquo;t fire 200 ms apart and don\u0026rsquo;t land 3 m past the doubles alley. Neither filter is novel; both are roughly ten lines of code. If you\u0026rsquo;re using a CatBoost-style bounce detector you probably want both anyway.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"whats-open\"\u003eWhat\u0026rsquo;s open\u003c/h2\u003e\n\u003cp\u003e44 rallies isn\u0026rsquo;t enough to nail the percentage, just to expose the structure. Running the audit across 10+ matches is the obvious next step. It would also surface match-to-match variance in F3ED\u0026rsquo;s failure modes. Does it mislabel aces more often on hard courts than clay? I have no idea, and I\u0026rsquo;d like to know.\u003c/p\u003e\n\u003cp\u003eThe reconciler currently only handles single-shot serve rallies. 
In longer rallies, where both players hit clean balls before the point ends, the score delta is the same whether the point ended on a \u003ccode\u003ewinner\u003c/code\u003e or a \u003ccode\u003eforced-err\u003c/code\u003e. Neither F3ED nor OCR alone disambiguates. A trajectory-aware classifier on the last two shots might close that gap. Haven\u0026rsquo;t tried it.\u003c/p\u003e\n\u003cp\u003eThe longer-term move is closed-loop F3ED retraining: use the OCR-corrected labels as supervision for a small classifier head whose input is (F3ED 4-class outcome, OCR delta, single-shot flag) and whose output is the extended set {\u003ccode\u003ein\u003c/code\u003e, \u003ccode\u003ewinner\u003c/code\u003e, \u003ccode\u003eforced-err\u003c/code\u003e, \u003ccode\u003eunforced-err\u003c/code\u003e, \u003ccode\u003eace\u003c/code\u003e, \u003ccode\u003edouble_fault\u003c/code\u003e, \u003ccode\u003efirst_serve_fault\u003c/code\u003e, \u003ccode\u003eunreturnable\u003c/code\u003e}. That yields about 5 minutes of training data per match, so 10+ matches should be enough for a usable head. The interesting move there is feeding the OCR signal into training rather than using it only at inference.\u003c/p\u003e\n","summary":"F3ED, the NeurIPS 2024 tennis shot detector, mislabels 73% of single-shot serve unforced errors. A 23-line scoreboard OCR reconciler fixes them.","image":"https://static.philippdubach.com/ograph/ograph-tennis-vision3.jpg","date_published":"2026-04-29T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D.
Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Project","word_count":1903,"reading_time_minutes":9,"keywords":["tennis shot detection","tennis broadcast computer vision","tennis match analytics open source","fine-grained tennis event detection","automatic ace detection tennis","F3ED tennis model audit","tennis scoreboard OCR pipeline","score-grammar reconciler","TenniSet V006 benchmark","ATP Challenger video analysis","shot outcome classification","EasyOCR tennis scoreboard","YOLOv8x tennis player detection","CatBoost bounce detection filter"],"section":"posts"}},{"id":"https://philippdubach.com/posts/inside-pragma-revoluts-foundation-model-for-banking/","url":"https://philippdubach.com/posts/inside-pragma-revoluts-foundation-model-for-banking/","title":"Inside PRAGMA: Revolut's Foundation Model for Banking","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-revolut-cover-jpg-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/revolut-cover.jpg 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/revolut-cover.jpg 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/revolut-cover.jpg 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/revolut-cover.jpg 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/revolut-cover.jpg 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/revolut-cover.jpg 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/revolut-cover.jpg 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/revolut-cover.jpg 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/revolut-cover.jpg 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/revolut-cover.jpg 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/revolut-cover.jpg 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/revolut-cover.jpg\"\n           alt=\"Editorial cover illustration for an analysis of Revolut\u0026#39;s PRAGMA foundation model for banking, contrasting a small consumer banking app with the vast underlying transformer architecture\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-revolut-cover-jpg-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/revolut-cover.jpg\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Editorial cover illustration for an analysis of Revolut\u0026#39;s PRAGMA foundation model for banking, contrasting a small 
consumer banking app with the vast underlying transformer architecture\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThis month, Revolut Research and NVIDIA published \u003ca href=\"https://arxiv.org/abs/2604.08649\"\u003ePRAGMA\u003c/a\u003e: an encoder-only transformer trained on 26 million user histories spanning 24 billion events and 207 billion tokens across 111 countries. To my knowledge it is the largest encoder backbone for consumer banking event data anyone has put on arXiv. Nine months earlier, Nubank had published \u003ca href=\"https://arxiv.org/abs/2507.23267\"\u003enuFormer\u003c/a\u003e, built on a similar premise with the opposite architecture. Both papers ask the same question: can you train a transformer on raw transaction ledgers and replace the gradient-boosted-tree models running production credit, fraud, and recommendation pipelines?\u003c/p\u003e\n\u003cp\u003eBanking has spent the last decade lagging the rest of tech on representation learning. Production models still run on hand-crafted tabular features. Every team working on this knows it\u0026rsquo;s suboptimal. Almost no team has the data, the GPUs, or the political budget to fix it.
PRAGMA is what a banking foundation model looks like at the high end of the market.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-fig1-headline-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/fig1-headline.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/fig1-headline.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/fig1-headline.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/fig1-headline.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig1-headline.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/fig1-headline.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/fig1-headline.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/fig1-headline.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig1-headline.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/fig1-headline.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/fig1-headline.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig1-headline.png\"\n           alt=\"Figure 1 from the PRAGMA paper: relative performance of three PRAGMA sizes (10M, 100M, 1B parameters) against task-specific baselines across six banking tasks including credit scoring, fraud detection, and product recommendation\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-fig1-headline-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/fig1-headline.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Figure 1 from the PRAGMA paper: relative performance of three PRAGMA sizes (10M, 100M, 1B parameters) against task-specific baselines across six banking tasks including credit scoring, fraud detection, and product recommendation\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe chart above is from the PRAGMA paper and it reads like a marketing slide. PR-AUC up \u003cstrong\u003e130.2%\u003c/strong\u003e on credit scoring. AUUC up \u003cstrong\u003e163.7%\u003c/strong\u003e on a communication uplift task. mAP up \u003cstrong\u003e40.5%\u003c/strong\u003e on product recommendation. These are relative numbers against task-specific baselines and the absolute scores are commercially redacted, so calibrate accordingly. 
But Revolut publishing them under their own name, with author affiliations, is the meaningful signal here. Internal foundation models have moved from trade secret to competitive disclosure.\u003c/p\u003e\n\u003ch2 id=\"what-revolut-built\"\u003eWhat Revolut built\u003c/h2\u003e\n\u003cp\u003ePRAGMA is a BERT-style encoder, not a GPT. The choice matters. Revolut\u0026rsquo;s downstream targets are discriminative (default within 12 months, fraud, churn, product adoption), which is exactly what bidirectional masked modelling is good at. The model family scales from 10M to 100M to 1B parameters. Every size shares the same layout: a profile-state encoder for static attributes and a per-event encoder, whose outputs a history encoder fuses into one representation.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-fig4-architecture-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/fig4-architecture.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/fig4-architecture.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/fig4-architecture.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/fig4-architecture.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig4-architecture.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/fig4-architecture.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/fig4-architecture.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/fig4-architecture.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig4-architecture.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/fig4-architecture.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/fig4-architecture.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig4-architecture.png\"\n           alt=\"PRAGMA backbone architecture: two-branch design with separate profile-state encoder and per-event encoder feeding a shared history encoder, showing how static user attributes and event sequences are fused into one representation\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-fig4-architecture-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/fig4-architecture.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"PRAGMA backbone architecture: two-branch design with separate profile-state encoder and per-event encoder feeding a shared history encoder, showing how static user attributes and event sequences are fused 
into one representation\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe architectural decision that strikes me as most important is the input representation. Naive text serialization of a transaction record into JSON blows up sequence length: every key name, every delimiter, every digit becomes multiple BPE subword tokens. Worse, splitting \u0026ldquo;14.99\u0026rdquo; into \u0026ldquo;14\u0026rdquo; \u0026ldquo;.\u0026rdquo; \u0026ldquo;99\u0026rdquo; destroys the magnitude information that any credit model needs. Revolut\u0026rsquo;s answer is to tokenise each field as a triple of semantic key, typed value, and temporal coordinate. Numerical values map to learned percentile buckets. Categorical values map to single tokens. Text gets BPE. Timestamps get encoded twice, once as compressed log-seconds since the previous event and once as fixed-period sinusoids over hour-of-day, day-of-week, and day-of-month.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-fig2-timeline-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/fig2-timeline.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/fig2-timeline.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/fig2-timeline.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/fig2-timeline.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig2-timeline.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource 
media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/fig2-timeline.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/fig2-timeline.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/fig2-timeline.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig2-timeline.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/fig2-timeline.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/fig2-timeline.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig2-timeline.png\"\n           alt=\"A PRAGMA user history as a stream of structured banking events with timestamps and key-value attributes, around 60 keys and 28,000 value tokens per user, leading up to an evaluation point where the model predicts a downstream target\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-fig2-timeline-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/fig2-timeline.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"A PRAGMA user history as a stream of structured 
banking events with timestamps and key-value attributes, around 60 keys and 28,000 value tokens per user, leading up to an evaluation point where the model predicts a downstream target\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe figure above is what a single user looks like to PRAGMA: a stream of structured events leading up to an evaluation point at which the model is asked to predict something. Around 60 keys. Around 28,000 value tokens.\u003c/p\u003e\n\u003cp\u003ePre-training is masked language modelling, but with three masking sources blended together: 15% standard token masking, 10% whole-event masking, and 10% semantic-type masking. The whole-event variant is particularly interesting for banking. It teaches the model that when you cannot see the amount of a card payment but you can see the merchant, the time, and the surrounding behavioural pattern, the amount is often inferable. That is exactly the inductive bias you want in a credit or fraud model.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-numbers\"\u003eThe numbers\u003c/h2\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-table2-results-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/table2-results.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/table2-results.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/table2-results.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/table2-results.png 960w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table2-results.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/table2-results.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/table2-results.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/table2-results.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table2-results.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/table2-results.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/table2-results.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table2-results.png\"\n           alt=\"Relative performance of PRAGMA-L with LoRA fine-tuning against internal task-specific baselines: 130 percent PR-AUC lift on credit scoring, 163 percent AUUC on uplift, 40 percent mAP on product recommendation, with the AML task showing a 47 percent loss\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-table2-results-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/table2-results.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    
\u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Relative performance of PRAGMA-L with LoRA fine-tuning against internal task-specific baselines: 130 percent PR-AUC lift on credit scoring, 163 percent AUUC on uplift, 40 percent mAP on product recommendation, with the AML task showing a 47 percent loss\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eBeyond the headline lifts, three results in the paper are worth a closer look.\u003c/p\u003e\n\u003cp\u003e(1) The LoRA versus train-from-scratch comparison. Revolut shows that fine-tuning a pre-trained backbone with LoRA, updating roughly 2-4% of parameters, consistently matches or beats training a fresh task-specific model on the same downstream data. This is the result that justifies the entire infrastructure investment. If pre-training did not transfer, you would not bother. Communication engagement gains \u003cstrong\u003e18.6%\u003c/strong\u003e PR-AUC from LoRA over scratch. Credit scoring gains \u003cstrong\u003e13%\u003c/strong\u003e. Product recommendation gains \u003cstrong\u003e10.3%\u003c/strong\u003e mAP. That is the business case.\u003c/p\u003e\n\u003cp\u003e(2) The profile-state ablation. Removing the dedicated profile-state branch tells you which tasks are driven by static user characteristics versus event sequences. Credit scoring loses \u003cstrong\u003e31.8%\u003c/strong\u003e PR-AUC without profile state, because account tenure and onboarding signals matter for identifying minority-class defaulters. Communication engagement actually gains 3% in PR-AUC without profile state, because re-engagement is a story about pre-drop-off behaviour, not about who the user is. The two-branch design makes that split visible: static features carry some tasks and add noise to others.\u003c/p\u003e\n\u003cp\u003e(3) The failure. 
PRAGMA loses \u003cstrong\u003e47.1%\u003c/strong\u003e on F0.5 against the production baseline for anti-money-laundering detection, and Revolut wrote this into their paper.
The reason is that AML is a relational problem. You catch laundering by looking across users and across accounts, and PRAGMA processes each user history in isolation. The lesson generalises: foundation models on individual ledgers are not graph-aware, and the production AML stack at a large bank typically includes graph-aware components that PRAGMA cannot replace. Knowing the limit is more useful than the headline gains.\u003c/p\u003e\n\u003ch2 id=\"how-this-compares-to-nubank\"\u003eHow this compares to Nubank\u003c/h2\u003e\n\u003cp\u003eNubank\u0026rsquo;s nuFormer, published in July 2025, makes the opposite architectural choice. It is a causal GPT-style decoder pre-trained with next-token prediction, with a \u003ca href=\"https://building.nubank.com/fine-tuning-transaction-user-models/\"\u003ejoint fusion\u003c/a\u003e fine-tuning step that bolts a \u003ca href=\"https://arxiv.org/abs/2008.13535\"\u003eDCNv2\u003c/a\u003e tabular network onto the same gradient graph. The reported lift is \u003cstrong\u003e+1.25%\u003c/strong\u003e in test AUC on a single recommendation task, and a \u003cstrong\u003e4.4%\u003c/strong\u003e reduction in user churn measured in production. These are smaller numbers than PRAGMA\u0026rsquo;s, on different metrics, but Nubank published a real production deployment outcome. PRAGMA\u0026rsquo;s results are still backtests.\u003c/p\u003e\n\u003cp\u003eThe two papers disagree on almost everything that is fun to argue about. Architecture: decoder versus encoder. Task scope: one task versus six. The role of static profile state: collapsed into the sequence versus given its own branch. What they agree on: hand-crafted feature engineering can be replaced by self-supervised representation learning on raw transaction sequences, and doing so produces material lifts on real banking problems. The architectural debate is downstream of that.\u003c/p\u003e\n\u003cp\u003eThe broader literature is moving the same way.
\u003ca href=\"https://arxiv.org/abs/2511.08939\"\u003eTransactionGPT\u003c/a\u003e (Dou et al., 2025) introduces a 3D transformer for billion-scale payment trajectories aimed at anomaly detection. \u003ca href=\"https://arxiv.org/abs/1908.10063\"\u003eFinBERT\u003c/a\u003e, \u003ca href=\"https://arxiv.org/abs/2303.17564\"\u003eBloombergGPT\u003c/a\u003e, and \u003ca href=\"https://arxiv.org/abs/2306.06031\"\u003eFinGPT\u003c/a\u003e cover the text side. \u003ca href=\"https://arxiv.org/abs/2310.01728\"\u003eTime-LLM\u003c/a\u003e and \u003ca href=\"https://arxiv.org/abs/2403.07815\"\u003eChronos\u003c/a\u003e cover numerical time series. PRAGMA and nuFormer are the two papers that target the actual structured event ledger sitting inside a retail bank, which is the asset that matters for credit, fraud, and product decisions.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"outlook\"\u003eOutlook\u003c/h2\u003e\n\u003cp\u003eThere is no public checkpoint. Revolut and Nubank both keep their weights inside their production stack, which is the right business decision and the wrong scientific one. You cannot run PRAGMA on your own data. You can only read the paper and decide whether the recipe is reproducible.\u003c/p\u003e\n\u003cp\u003eI think it is. The paper is detailed enough to rebuild from. The tokenisation scheme is fully specified. The architecture diagram is precise enough to follow. They even document the optimiser, \u003ca href=\"https://kellerjordan.github.io/posts/muon/\"\u003eMuon\u003c/a\u003e plus AdamW, and the hardware, 32 H100s for the 1B variant. The constraint is the pre-training corpus, not the model.\u003c/p\u003e\n\u003cp\u003eSo the next project on this site is a faithful PRAGMA reimplementation at the small (10M) scale, trained on a synthetic or open-licensed transaction dataset, evaluated on a subset of the downstream tasks where public benchmarks exist. 
I will write that up here in instalments, including what works, what breaks, and where the paper is silent. The codebase will land in a public repository as I build it.\u003c/p\u003e\n","summary":"Revolut's PRAGMA is a 1B-parameter encoder trained on 24B banking events. Reading the paper, comparing with Nubank's nuFormer, planning a rebuild.","image":"https://static.philippdubach.com/ograph/ograph-pragma-revolut.jpg","date_published":"2026-04-26T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Analysis","word_count":1179,"reading_time_minutes":6,"keywords":["PRAGMA","Revolut foundation model","banking foundation model","nuFormer Nubank","transformer banking transactions","self-supervised learning transactions"],"section":"posts"}},{"id":"https://philippdubach.com/posts/the-moral-philosophy-of-investing-in-ignorance/","url":"https://philippdubach.com/posts/the-moral-philosophy-of-investing-in-ignorance/","title":"The Moral Philosophy of Investing in Ignorance","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-edge-of-knowledge-5-cover-jpg-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/edge-of-knowledge-5-cover.jpg 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/edge-of-knowledge-5-cover.jpg 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/edge-of-knowledge-5-cover.jpg 640w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/edge-of-knowledge-5-cover.jpg 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-5-cover.jpg 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/edge-of-knowledge-5-cover.jpg 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/edge-of-knowledge-5-cover.jpg 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/edge-of-knowledge-5-cover.jpg 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-5-cover.jpg 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/edge-of-knowledge-5-cover.jpg 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/edge-of-knowledge-5-cover.jpg 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-5-cover.jpg\"\n           alt=\"Editorial illustration: two charcoal-silhouette figures facing each other across a small balance scale, the left figure stepping forward into the fog with hand outstretched while the right figure stands still on a darker side, visualizing the moral asymmetry of profiting from others\u0026#39; constraints\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog 
id=\"lightbox-edge-of-knowledge-5-cover-jpg-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/edge-of-knowledge-5-cover.jpg\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Editorial illustration: two charcoal-silhouette figures facing each other across a small balance scale, the left figure stepping forward into the fog with hand outstretched while the right figure stands still on a darker side, visualizing the moral asymmetry of profiting from others\u0026#39; constraints\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003e\u003cem\u003eInvesting at the Edge of Knowledge, Part 5 · \u003ca href=\"/posts/three-kinds-of-not-knowing/\"\u003eStart with Part 1\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;If in an unknowable world none of your bridges fall down, you are building them too strong.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eZeckhauser\u0026rsquo;s version of this line refers to investments, not bridges, but the structural point is the same. A philosophy of investing that expects some failures is also a philosophy that accepts some losses will be borne by the people on the other side of the trade. Over four installments I\u0026rsquo;ve laid out a framework for thinking about what you can know (\u003ca href=\"/posts/three-kinds-of-not-knowing/\"\u003ePart 1\u003c/a\u003e), why investors flee what they can\u0026rsquo;t (\u003ca href=\"/posts/ambiguity-by-design/\"\u003ePart 2\u003c/a\u003e), how to assess what others know (\u003ca href=\"/posts/the-geometry-of-who-knows-what/\"\u003ePart 3\u003c/a\u003e), and how much to bet (\u003ca href=\"/posts/bet-sizing-at-the-frontier/\"\u003ePart 4\u003c/a\u003e). 
This final piece asks the question the framework doesn\u0026rsquo;t answer: when you profit from ignorance, is that a legitimate source of returns?\u003c/p\u003e\n\u003cp\u003eZeckhauser opens his \u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205821\"\u003e2006 paper\u003c/a\u003e with a warning that is usually read as practical advice: \u0026ldquo;Do not read on if blame aversion is a prime concern.\u0026rdquo; I think there\u0026rsquo;s an ethical question underneath the practical one.\u003c/p\u003e\n\u003ch2 id=\"three-sources-of-profit\"\u003eThree sources of profit\u003c/h2\u003e\n\u003cp\u003eIn the risk box, profit comes from superior calculation. Both sides had the same information. You ran the numbers better. This is the cleanest form of trading profit. The losing side made a computational error that was, in principle, avoidable.\u003c/p\u003e\n\u003cp\u003eIn the uncertainty box, profit comes from superior estimation. You had a better model, better priors, or more data. The other side could have done the same analysis but didn\u0026rsquo;t. This is still relatively clean, though the boundary between \u0026ldquo;better estimation\u0026rdquo; and \u0026ldquo;inside information\u0026rdquo; requires constant policing, which is what securities regulation exists to do.\u003c/p\u003e\n\u003cp\u003eIn the ignorance box, the source of profit shifts. The other side didn\u0026rsquo;t sell because they thought the price was fair. They sold because they couldn\u0026rsquo;t model the asset and their constraints forced a decision. The fund manager who sold the IGV at $80 during the \u003ca href=\"/posts/the-saaspocalypse-paradox/\"\u003eSaaSpocalypse\u003c/a\u003e wasn\u0026rsquo;t wrong about the disruption risk from AI. They were unable to hold a position that couldn\u0026rsquo;t be defended to their risk committee, their clients, or their compliance team. 
The profit for the buyer comes from the gap between institutional rationality (the right decision for the manager\u0026rsquo;s career) and market rationality (the right price for the asset).\u003c/p\u003e\n\u003cp\u003eThe distinction matters. Most alpha in UU situations is some form of constraint arbitrage: profiting from the gap between what an asset is worth and what institutions are able to pay for it. Time horizon arbitrage, where you can hold for five years and they can\u0026rsquo;t. Liquidity arbitrage, where you can accept illiquidity and they can\u0026rsquo;t. Career-risk arbitrage, where you can tolerate looking wrong and they can\u0026rsquo;t. All three produce genuine returns, and in none of them did the counterparty make an error. They made a rational decision given their constraints, and you profited from having different constraints.\u003c/p\u003e\n\u003cp\u003eOne way to frame this is positive: the constraint-arbitrage investor is providing liquidity to a market that needs it. They\u0026rsquo;re buying when others are forced to sell, which improves price discovery and reduces the magnitude of mispricings. In this framing, the profit is compensation for bearing ambiguity that others can\u0026rsquo;t.\u003c/p\u003e\n\u003cp\u003eAnother way to frame it is uncomfortable: the returns from ignorance flow to those who can afford to bear it, and that set of people is not random.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-sidecar-problem-revisited\"\u003eThe sidecar problem, revisited\u003c/h2\u003e\n\u003cp\u003eI discussed the sidecar concept in \u003ca href=\"/posts/the-geometry-of-who-knows-what/\"\u003ePart 3\u003c/a\u003e as an information problem: how do you know the driver is skilled? Here I want to revisit it as an ethical problem: what kind of edge is the driver using?\u003c/p\u003e\n\u003cp\u003eZeckhauser\u0026rsquo;s sidecar works cleanly when the driver has genuine capability. 
A real estate developer who can build and lease a building is creating value. A venture capitalist with operational expertise and a network of technical talent is creating value. The sidecar investor earns a share of that value creation. The profit comes from complementary skills combined with capital, and this is hard to object to on ethical grounds.\u003c/p\u003e\n\u003cp\u003eIt gets murkier when the complementary asset is power rather than skill. Zeckhauser discusses a hypothetical Gazprom investment: \u0026ldquo;If you could comfortably determine that the Russian elite was investing on its own volition, and that foreigners would not be discriminated against\u0026hellip;\u0026rdquo; The edge in that scenario isn\u0026rsquo;t analytical. It\u0026rsquo;s access to a political structure. The sidecar investor is riding alongside someone who can influence outcomes, not someone who can predict them. The distinction between capability and power as complementary assets is one the paper gestures at but doesn\u0026rsquo;t fully resolve.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205858\"\u003eRobb (2006)\u003c/a\u003e put his finger on a related problem. UU knowledge is \u0026ldquo;uncommunicable.\u0026rdquo; If a mechanism for generating excess returns could be expressed as a process, someone would have arbitraged it away. But if the driver can\u0026rsquo;t articulate their edge, the sidecar investor can\u0026rsquo;t distinguish between genuine insight, survivorship bias, and proximity to power. There\u0026rsquo;s an epistemological problem here, and an ethical one. 
You\u0026rsquo;re making a bet on someone whose advantage you can\u0026rsquo;t evaluate, which means you\u0026rsquo;re implicitly trusting that the advantage is legitimate.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205848\"\u003eSummers (2006)\u003c/a\u003e observed that identifying skilled UU managers may be no easier than picking investments directly. I suspect this is too generous. In many cases, identifying whether a sidecar driver has skill, power, or luck is harder than evaluating the underlying asset, because the asset at least has observable characteristics. The driver\u0026rsquo;s edge, by definition, doesn\u0026rsquo;t.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"blame-accountability-and-the-collective-action-problem\"\u003eBlame, accountability, and the collective action problem\u003c/h2\u003e\n\u003cp\u003eThe Monday Morning Quarterback problem runs through Zeckhauser\u0026rsquo;s entire paper. Investors avoid Box F not because the expected value is negative but because a bad outcome will be judged harshly in retrospect. I\u0026rsquo;ve discussed this as a mechanism for mispricing (Parts \u003ca href=\"/posts/ambiguity-by-design/\"\u003e2\u003c/a\u003e and \u003ca href=\"/posts/the-geometry-of-who-knows-what/\"\u003e3\u003c/a\u003e). Here I want to name the distributional consequence.\u003c/p\u003e\n\u003cp\u003eIf we want institutional investors to make UU bets, which would improve price discovery and reduce the mispricing that currently rewards unconstrained investors, we need governance structures that tolerate good decisions with bad outcomes. The current structure doesn\u0026rsquo;t. A pension fund CIO who buys the IGV at $80 and watches it fall to $70 will face questions that no amount of \u0026ldquo;the expected value was positive\u0026rdquo; can answer. The governance framework is built for the risk box, where decisions can be evaluated against a defined probability model. 
In the ignorance box, there is no model to evaluate against, which means there is no institutional language for \u0026ldquo;this was a good bet that happened to lose.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThe result is a collective action problem with distributional consequences. The returns from UU mispricing accrue disproportionately to wealthy individuals, family offices, and unconstrained investors like Buffett, precisely the people who can afford career risk, illiquidity, and blame. Pension funds, endowments, and retail investors in diversified vehicles are structurally excluded, not by regulation or by choice, but by governance frameworks that require the kind of probability estimates the ignorance box doesn\u0026rsquo;t produce.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;m not sure this is a solvable problem. The fiduciary duty to beneficiaries is real, and \u0026ldquo;we invested in something we couldn\u0026rsquo;t model because the price seemed low\u0026rdquo; is not, and should not be, an acceptable fiduciary justification. But it\u0026rsquo;s worth naming the consequence: the epistemological structure of markets has equity implications. The returns from acting under ignorance flow to those who already have the most capacity to bear it. This is not a conspiracy. It\u0026rsquo;s a structural feature that emerges naturally from the interaction of ambiguity aversion, institutional constraints, and governance design.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003chr\u003e\n\u003cp\u003eZeckhauser closes his paper by returning to David Ricardo at Waterloo. Ricardo wasn\u0026rsquo;t a military analyst. He didn\u0026rsquo;t have inside information about Wellington\u0026rsquo;s strategy. He just understood the structure of the situation: thin competition (most investors had fled), an eager seller (the British government needed capital), asymmetric payoffs (bounded downside, enormous upside), and a kind of not-knowing that was the same for everyone. 
He bought British government bonds on the eve of the battle and made a fortune.\u003c/p\u003e\n\u003cp\u003eThe honest answer to \u0026ldquo;what\u0026rsquo;s your thesis?\u0026rdquo; in a UU investment is: \u0026ldquo;I don\u0026rsquo;t have one in the way you mean. I have a set of second-order inferences about what other people don\u0026rsquo;t know, what constraints they face, and why the price might be wrong even though I can\u0026rsquo;t tell you what the right price is.\u0026rdquo; That\u0026rsquo;s not a pitch deck. It\u0026rsquo;s a worldview. Most investment committees would reject it, which is, of course, part of why it works.\u003c/p\u003e\n\u003cp\u003eCharlie Munger, in his 1995 \u003ca href=\"https://jamesclear.com/great-speeches/psychology-of-human-misjudgment-by-charlie-munger\"\u003eHarvard Law School speech\u003c/a\u003e on the psychology of human misjudgment, offered a compliment that I think is the best summary of everything this series has tried to say: \u0026ldquo;The right way to think is the way Zeckhauser plays bridge.\u0026rdquo; The compliment is precise. Bridge is a game of acting under uncertainty with imperfect information, where the quality of the decision is independent of the outcome, and where the best players are distinguished not by what they know but by how they reason about what they don\u0026rsquo;t know.\u003c/p\u003e\n\u003cp\u003eThat might be the best definition of investing at the edge of knowledge I can offer.\u003c/p\u003e\n","summary":"Constraint arbitrage, the sidecar problem, and who bears the distributional cost of investing under ignorance. The final installment of Edge of Knowledge.","image":"https://static.philippdubach.com/ograph/ograph-moral-philosophy-investing-ignorance.jpg","date_published":"2026-04-22T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D. 
Dubach","url":"https://philippdubach.com/about/"}],"tags":["Quantitative Finance"],"_philippdubach":{"type":"Analysis","word_count":1514,"reading_time_minutes":8,"keywords":["ethics of investing under ignorance","constraint arbitrage fairness","Zeckhauser blame aversion investing","institutional investor career risk","Monday morning quarterback MMQ risk","sidecar investing ethics Gazprom","Munger Zeckhauser bridge quote","UU investing distributional consequences","Ricardo Waterloo investment unknown","fiduciary duty ambiguity penalty","market efficiency ignorance box","pension fund UU constraints","Buffett California earthquake authority ethics","complementary skills vs power investing"],"section":"posts"}},{"id":"https://philippdubach.com/posts/bet-sizing-at-the-frontier/","url":"https://philippdubach.com/posts/bet-sizing-at-the-frontier/","title":"Bet Sizing at the Frontier","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-edge-of-knowledge-4-cover-jpg-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/edge-of-knowledge-4-cover.jpg 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/edge-of-knowledge-4-cover.jpg 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/edge-of-knowledge-4-cover.jpg 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/edge-of-knowledge-4-cover.jpg 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-4-cover.jpg 1200w\"\n              sizes=\"80vw\"\u003e\n      
\u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/edge-of-knowledge-4-cover.jpg 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/edge-of-knowledge-4-cover.jpg 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/edge-of-knowledge-4-cover.jpg 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-4-cover.jpg 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/edge-of-knowledge-4-cover.jpg 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/edge-of-knowledge-4-cover.jpg 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-4-cover.jpg\"\n           alt=\"Editorial illustration: a single charcoal-silhouette figure stands at the edge of a fog bank holding a small stack of warm-ochre coins, with a few coins already pushed forward into the mist trailing into the fog, visualizing the Kelly Criterion as a sizing problem under uncertainty\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-edge-of-knowledge-4-cover-jpg-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/edge-of-knowledge-4-cover.jpg\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton 
type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Editorial illustration: a single charcoal-silhouette figure stands at the edge of a fog bank holding a small stack of warm-ochre coins, with a few coins already pushed forward into the mist trailing into the fog, visualizing the Kelly Criterion as a sizing problem under uncertainty\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003e\u003cem\u003eInvesting at the Edge of Knowledge, Part 4 · \u003ca href=\"/posts/three-kinds-of-not-knowing/\"\u003eStart with Part 1\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;He who acts in N plays to make his mean log of wealth as big as it can be made will, with odds that go to one as N soars, beat me who acts to meet my own tastes for risk.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThat\u0026rsquo;s Paul Samuelson, writing in one-syllable words. The title of his \u003ca href=\"https://www.sciencedirect.com/science/article/abs/pii/0378426679900232\"\u003e1979 paper\u003c/a\u003e: \u0026ldquo;Why We Should Not Make Mean Log of Wealth Big Though Years to Act Are Long.\u0026rdquo; Published in the \u003cem\u003eJournal of Banking \u0026amp; Finance\u003c/em\u003e, a journal not typically known for its prose style. The playfulness of the writing masks the seriousness of the dispute underneath: a disagreement about the foundations of position sizing that remains unresolved half a century later.\u003c/p\u003e\n\u003cp\u003eIn \u003ca href=\"/posts/the-geometry-of-who-knows-what/\"\u003ePart 3\u003c/a\u003e I described how to assess whether the other side of a trade knows something you don\u0026rsquo;t. But even if you\u0026rsquo;re confident you\u0026rsquo;re in Box D or Box F (shared uncertainty or shared ignorance, where neither side has an information edge), you still need to decide how much capital to commit. 
And in a UU world, the most famous formula for answering that question stops working.\u003c/p\u003e\n\u003ch2 id=\"what-kelly-actually-says\"\u003eWhat Kelly actually says\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://www.princeton.edu/~wbialek/rome/refs/kelly_56.pdf\"\u003eJ.L. Kelly Jr. (1956)\u003c/a\u003e was a physicist at Bell Labs, not a finance researcher. His paper, \u0026ldquo;A New Interpretation of Information Rate,\u0026rdquo; was about communication channels, not portfolios. The insight was a connection between Shannon\u0026rsquo;s information theory and gambling: the maximum exponential growth rate of a gambler\u0026rsquo;s capital equals the rate of information transmission over a noisy channel.\u003c/p\u003e\n\u003cp\u003eThe formula itself is simple. For a binary bet with probability \u003cem\u003ep\u003c/em\u003e of winning and odds of \u003cem\u003eb\u003c/em\u003e to 1, the optimal fraction of your bankroll to wager is \u003cem\u003ef = (bp - q) / b\u003c/em\u003e, where \u003cem\u003eq = 1 - p\u003c/em\u003e. If you have a 60% chance of winning an even-money bet, Kelly says invest 20% of your capital. The appeal is a mathematical proof: given sufficient repetitions, a Kelly bettor will, with probability approaching one, end up wealthier than anyone using any other fixed-fraction strategy. It maximizes the geometric growth rate of the portfolio, which is the growth rate that actually compounds over time.\u003c/p\u003e\n\u003cp\u003eEd Thorp was the first to take this seriously as an investment principle. He used it to beat blackjack (documented in \u003cem\u003eBeat the Dealer\u003c/em\u003e, 1962), then applied it to warrant pricing and convertible arbitrage through his hedge fund, Princeton Newport Partners, which returned roughly 15% annually with minimal drawdowns over two decades. 
Elwyn Berlekamp, Kelly\u0026rsquo;s research assistant at Bell Labs, later became the key figure who restructured Renaissance Technologies\u0026rsquo; Medallion Fund in 1989, applying Kelly-based position sizing to thousands of short-duration trades. Medallion returned roughly \u003cstrong\u003e66%\u003c/strong\u003e annually before fees from 1988 through 2021. Bill Gross used Kelly-adjacent thinking at PIMCO. The framework has serious practitioners with serious track records.\u003c/p\u003e\n\u003ch2 id=\"what-samuelson-actually-objected-to\"\u003eWhat Samuelson actually objected to\u003c/h2\u003e\n\u003cp\u003eSamuelson\u0026rsquo;s critique is often misunderstood as \u0026ldquo;Kelly doesn\u0026rsquo;t work.\u0026rdquo; That\u0026rsquo;s not what he said. What he said is more precise and more interesting.\u003c/p\u003e\n\u003cp\u003eKelly maximizes the expected logarithm of wealth. This is only optimal if your utility function is logarithmic. Log utility means you are exactly indifferent between the status quo and a coin flip that would either double or halve your total wealth. Most people would not take that bet. If you are more risk-averse than log utility implies (and most human beings are), Kelly systematically overbets, exposing you to drawdowns that are mathematically acceptable but psychologically devastating. If you are less risk-averse (say, risk-neutral), Kelly underbets: a risk-neutral investor should go all-in on every positive expected value opportunity.\u003c/p\u003e\n\u003cp\u003eUnderneath the math, the two sides are arguing about something more basic. The Kelly camp treats position sizing as a mathematical optimization problem with a unique solution. Samuelson insists it is a preference problem with as many valid solutions as there are utility functions. Both are correct within their own frameworks. The dispute is about which framework applies.
And because the answer depends on the investor\u0026rsquo;s risk preferences, which are not observable from the outside and may not even be stable over time, the dispute is in principle unresolvable.\u003c/p\u003e\n\u003cp\u003ePractitioners tend to resolve it pragmatically. Most Kelly users bet \u0026ldquo;half Kelly\u0026rdquo; or \u0026ldquo;quarter Kelly,\u0026rdquo; sacrificing some expected growth for lower variance. There\u0026rsquo;s also a second reason for fractional Kelly that has nothing to do with utility: estimation error. The formula is exquisitely sensitive to how accurately you\u0026rsquo;ve estimated \u003cem\u003ep\u003c/em\u003e. If the true probability on an even-money bet is 55% but you estimate 60%, the formula tells you to bet 20% of your capital instead of 10%: twice the Kelly fraction, where the long-run growth rate of your capital falls to roughly zero. Overestimate your edge a little more and you are in a regime that destroys capital over time. Half Kelly is partly a hedge against your own overconfidence in the inputs. This works in practice, but it concedes Samuelson\u0026rsquo;s point: the \u0026ldquo;optimal\u0026rdquo; fraction depends on your tolerance for pain and the precision of your estimates, not just on the mathematics of compound growth. The formula provides a ceiling, not a prescription.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"why-the-debate-is-beside-the-point-in-uu\"\u003eWhy the debate is beside the point in UU\u003c/h2\u003e\n\u003cp\u003eHere is where both Kelly and Samuelson run into the same wall.\u003c/p\u003e\n\u003cp\u003eKelly requires \u003cem\u003ep\u003c/em\u003e (the probability of winning) and \u003cem\u003eb\u003c/em\u003e (the payoff ratio). Samuelson\u0026rsquo;s alternative utility-maximization frameworks require the same inputs, or richer ones: full probability distributions over outcomes. Every formal system for position sizing assumes you can parameterize your uncertainty. In the ignorance box, as defined in \u003ca href=\"/posts/three-kinds-of-not-knowing/\"\u003ePart 1\u003c/a\u003e, you can\u0026rsquo;t. The parameters themselves are objects of ignorance. 
\u003cem\u003ep\u003c/em\u003e is not an imprecise estimate waiting for better data. It is undefined, because the state space over which \u003cem\u003ep\u003c/em\u003e would be calculated hasn\u0026rsquo;t been enumerated. This is the territory that Frank Knight, a century ago, called true uncertainty to distinguish it from measurable risk, and it is where every formal position-sizing tool stops working at once.\u003c/p\u003e\n\u003cp\u003eZeckhauser walks through five reasons UU money management resists formal modelling, and each one breaks a different standard assumption.\u003c/p\u003e\n\u003cp\u003eMost UU investments are illiquid for unknown periods. You can\u0026rsquo;t rebalance, which means sequential portfolio optimization models don\u0026rsquo;t apply. Worse, markets charge enormous premiums to cash out illiquid assets, so your exit price is not your mark-to-market. Even the toy models of optimal sequential investment assume the hard problems away: known probabilities, known time horizons, known liquidity. There\u0026rsquo;s also the embarrassing fact that smart people disagree about position sizing even on problems where probabilities \u003cem\u003eare\u003c/em\u003e known, which gives you no reason to expect convergence when they aren\u0026rsquo;t. And when genuinely unknowable events arrive (the 1987 crash, the 1997 Asian crisis, the 2020 pandemic), the money-management problems that emerge are precisely the ones no model anticipated.\u003c/p\u003e\n\u003cp\u003eNone of this means you should do nothing. It means formula-based precision is unavailable and pretending otherwise is dangerous. Plugging estimated probabilities into Kelly when the estimates are themselves wild guesses doesn\u0026rsquo;t give you a rigorous answer with a known error bar.
It gives you false precision, which is worse than an honest admission of ignorance because it generates unwarranted confidence in the position size.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"diversification-versus-concentration\"\u003eDiversification versus concentration\u003c/h2\u003e\n\u003cp\u003eStandard portfolio theory says: diversify. Spread capital across uncorrelated assets to minimize idiosyncratic risk. In the risk box (Box 1 from Part 1), where expected returns and covariance matrices are estimable, this is correct. The marginal expected return on each position is similar, and diversification is free insurance.\u003c/p\u003e\n\u003cp\u003eIn the ignorance box, this logic inverts. Suppose you\u0026rsquo;ve identified one or two opportunities in Zeckhauser\u0026rsquo;s Box D or Box F (the boxes from \u003ca href=\"/posts/the-geometry-of-who-knows-what/\"\u003ePart 3\u003c/a\u003e where neither side has an information edge) and you have a genuine advantage from complementary skills, constraint arbitrage, or structural access. Spreading capital evenly across a dozen positions then dilutes the few bets where your advantage is real. Zeckhauser makes this point sharply: investors routinely allocate almost the same percentage to an investment where they expect a 30% return as to one where they expect 10%. Consider the \u003ca href=\"/posts/the-saaspocalypse-paradox/\"\u003eSaaSpocalypse\u003c/a\u003e: the IGV at $80 with an RSI of 18 and 17% sector earnings growth was, by this framework, a Box F opportunity with a large absolute advantage for anyone with a multi-year time horizon. The Maxim B response would be to concentrate, not diversify. Most institutional investors did the opposite.\u003c/p\u003e\n\u003cp\u003eBuffett\u0026rsquo;s practice reflects this. Berkshire Hathaway\u0026rsquo;s top five holdings routinely exceed 70% of its equity portfolio. This is Kelly-adjacent thinking: bet big when your edge is big.
But it operates without the false precision of computing a Kelly fraction, because Buffett doesn\u0026rsquo;t pretend to know his probability of winning to two decimal places. He knows the price is low relative to his assessment of value, his time horizon exceeds the market\u0026rsquo;s, and the business is durable enough to survive things he can\u0026rsquo;t foresee. He makes no claim to a probability distribution of outcomes.\u003c/p\u003e\n\u003cp\u003eZeckhauser uses a bridge analogy that I think captures this well. A bridge player makes hundreds of decisions in a single session, balancing expected gains and losses on every hand. But no serious bridge player computes Kelly fractions mid-hand. They develop judgment over thousands of hands about when to bid aggressively and when to pass. The judgment is trained by feedback, but the individual decision is not a calculation. It\u0026rsquo;s pattern recognition combined with temperament, a sense of when the odds are tilted enough to justify the risk.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThis is where Maxim B does its work: \u0026ldquo;The greater is your expected return, the larger your advantage, the greater the percentage of your capital you should put at risk.\u0026rdquo; It sounds obvious. Read it again. In a world where position sizing formulas require inputs you don\u0026rsquo;t have, Maxim B is the honest replacement: a heuristic that says \u0026ldquo;bet proportionally to your edge,\u0026rdquo; without pretending to quantify the edge precisely. Combine it with Zeckhauser\u0026rsquo;s diagnostic, and you get the closest thing to a position sizing framework that works in UU: \u0026ldquo;If in an unknowable world none of your investments looks foolish after the fact, you are staying too far away from the unknowable.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eI find that diagnostic unsettling in the right way.
It says that some of your bets should look bad. Not because you were wrong, but because the honest price of participation in UU markets is a portfolio that occasionally embarrasses you. If every position is defensible in hindsight, you\u0026rsquo;ve been too cautious. You\u0026rsquo;ve been optimizing for looking smart rather than for capturing the returns that only come to those willing to look foolish.\u003c/p\u003e\n\u003cp\u003eThe first four parts of this series have addressed what you can know, what you can\u0026rsquo;t, how to assess what others know, and how much to bet. The final question is different, and less comfortable: when you profit from ignorance, from others\u0026rsquo; institutional inability to act in UU situations, who exactly are you profiting from? That\u0026rsquo;s \u003ca href=\"/posts/the-moral-philosophy-of-investing-in-ignorance/\"\u003ePart 5\u003c/a\u003e.\u003c/p\u003e\n","summary":"The Kelly Criterion assumes you know your probability of winning. In a UU world, you don't, and heuristics like Zeckhauser's Maxim B replace false precision.","image":"https://static.philippdubach.com/ograph/ograph-bet-sizing-frontier.jpg","date_published":"2026-04-17T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D. 
Dubach","url":"https://philippdubach.com/about/"}],"tags":["Quantitative Finance"],"_philippdubach":{"type":"Analysis","word_count":1785,"reading_time_minutes":9,"keywords":["Kelly Criterion limitations","position sizing under uncertainty","Kelly Samuelson debate","Zeckhauser Maxim B","unknown unknowable investing","fractional Kelly half Kelly"],"section":"posts"}},{"id":"https://philippdubach.com/posts/do-not-disturb-my-circles/","url":"https://philippdubach.com/posts/do-not-disturb-my-circles/","title":"Do Not Disturb My Circles","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-circles-cover-jpg-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/circles-cover.jpg 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/circles-cover.jpg 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/circles-cover.jpg 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/circles-cover.jpg 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/circles-cover.jpg 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/circles-cover.jpg 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/circles-cover.jpg 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/circles-cover.jpg 1440w\"\n  
            sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/circles-cover.jpg 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/circles-cover.jpg 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/circles-cover.jpg 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/circles-cover.jpg\"\n           alt=\"Editorial cover illustration evoking Archimedes drawing geometric circles in the sand with the long shadow of an approaching soldier — paralleled to the conscription of AI for science into the chatbot arms race\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-circles-cover-jpg-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/circles-cover.jpg\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Editorial cover illustration evoking Archimedes drawing geometric circles in the sand with the long shadow of an approaching soldier — paralleled to the conscription of AI for science into the chatbot arms race\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eIf I\u0026rsquo;d had my way, we would have left it in the lab for longer and done more things like AlphaFold, maybe cured cancer or something like 
that.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThat\u0026rsquo;s \u003ca href=\"https://en.wikipedia.org/wiki/Demis_Hassabis\"\u003eDemis Hassabis\u003c/a\u003e (I cannot recommend watching\n\u003ca href=\"https://www.youtube.com/watch?v=d95J8yzvjbQ\"\u003eThe Thinking Game\u003c/a\u003e or reading\n\u003ca href=\"https://www.penguinrandomhouse.com/books/752231/the-infinity-machine-by-sebastian-mallaby/\"\u003eThe Infinity Machine\u003c/a\u003e enough), the CEO of Google DeepMind and a Nobel Prize winner, describing the future he didn\u0026rsquo;t get.\u003c/p\u003e\n\u003cp\u003eHe wanted a CERN for artificial intelligence. A decade or two of careful, methodical work. The world\u0026rsquo;s best scientists collaborating on each step toward general intelligence, understanding what they built before building the next thing. In the meantime, AI for science, narrow tools like AlphaFold, would ship real benefits: cures, new materials, maybe a crack at fusion. Not chatbots. He didn\u0026rsquo;t get that future. None of us did. Instead we got a commercial arms race, a $690 billion annual infrastructure buildout, and the greatest concentration of technical talent in human history pointed at making autocomplete better.\u003c/p\u003e\n\u003cp\u003eThis is a story about capital misallocation. But it\u0026rsquo;s also a very old story.\u003c/p\u003e\n\u003ch2 id=\"geometry-in-the-sand\"\u003eGeometry in the sand\u003c/h2\u003e\n\u003cp\u003eIn 214 BC, the Roman general Marcellus brought a fleet to Syracuse. Standing between Rome and the richest city in Sicily was one man: \u003ca href=\"https://en.wikipedia.org/wiki/Archimedes\"\u003eArchimedes\u003c/a\u003e, the greatest scientist of the ancient world, a mathematician whose work on the lever, the screw, and the principles of buoyancy would outlast every empire he lived under.\u003c/p\u003e\n\u003cp\u003eArchimedes did not want to build weapons. 
\u003ca href=\"https://en.wikipedia.org/wiki/Parallel_Lives\"\u003ePlutarch\u003c/a\u003e, writing in the \u003cem\u003eLife of Marcellus\u003c/em\u003e, says Archimedes designed and contrived his machines \u0026ldquo;not as matters of any importance, but as mere amusements in geometry.\u0026rdquo; He regarded the whole business as ignoble, beneath the dignity of pure mathematics. But his patron King Hiero II needed defenses, and Archimedes was the only man who could provide them. So he built them. Catapults that could sink a ship at range. The \u003ca href=\"https://en.wikipedia.org/wiki/Claw_of_Archimedes\"\u003eClaw of Archimedes\u003c/a\u003e, an iron grappling device that could lift a Roman galley out of the water and drop it. Possibly parabolic mirrors that focused sunlight to set ships on fire, though historians still debate that one.\u003c/p\u003e\n\u003cp\u003eThe machines worked. Plutarch writes that the Romans became so terrified that \u0026ldquo;whenever they saw a bit of rope or a stick of timber projecting over the wall, they cried \u0026lsquo;Archimedes is training some engine upon us,\u0026rsquo; and turned their backs and fled.\u0026rdquo; They held off Rome for two years.\u003c/p\u003e\n\u003cp\u003eThen Syracuse fell anyway. In 212 BC, Roman soldiers breached the walls during a festival. A soldier found Archimedes drawing geometric figures in the sand. According to the tradition passed down through \u003ca href=\"https://en.wikipedia.org/wiki/Valerius_Maximus\"\u003eValerius Maximus\u003c/a\u003e and others, his last words were \u003cem\u003e\u0026ldquo;Noli turbare circulos meos\u0026rdquo;\u003c/em\u003e: do not disturb my circles.\u003c/p\u003e\n\u003cp\u003eMarcellus had ordered Archimedes taken alive. The order didn\u0026rsquo;t matter. The soldier killed him. The geometry died with him. The war machines, the things Archimedes considered beneath his real work, survived in military engineering textbooks for centuries. 
Some of his mathematical treatises, among them \u003cem\u003eThe Method\u003c/em\u003e, survived only by accident, in a single Byzantine manuscript \u003ca href=\"https://en.wikipedia.org/wiki/Archimedes_Palimpsest\"\u003escraped and overwritten with prayer texts\u003c/a\u003e in the 13th century.\u003c/p\u003e\n\u003cp\u003eI thought about this when I watched Demis Hassabis in a \u003ca href=\"https://www.youtube.com/watch?v=C0gErQtnNFE\"\u003erecent interview with Cleo Abram\u003c/a\u003e.\u003c/p\u003e\n\u003ch2 id=\"the-conscription\"\u003eThe conscription\u003c/h2\u003e\n\u003cp\u003eHe had been building learning systems at DeepMind for years, and the work was pointed at science. Move 37, AlphaGo\u0026rsquo;s famous creative play against Lee Sedol in 2016, showed that AI systems could discover things no human had considered. AlphaFold was the first proof that AI could crack fundamental problems in biology.\u003c/p\u003e\n\u003cp\u003eThen ChatGPT happened. Google went code red. Hassabis, the man who wanted to solve protein folding and maybe crack fusion, became the man who runs all of Google\u0026rsquo;s AI, including the consumer products he\u0026rsquo;d never wanted to focus on.\u003c/p\u003e\n\u003cp\u003eHe\u0026rsquo;s candid about what was lost:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eMy ideal was to approach the latter stages of building AGI using the scientific method, very carefully, very precisely, very thoughtfully, in a CERN-like way. That might take a decade, even two decades longer. But I think that would make sense given the enormity of what we\u0026rsquo;re dealing with.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eAnd about the irony:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eLanguage was a lot easier than we were all expecting. Even those of us who were obviously optimists about the whole technology. We thought maybe there would be one or two or three more breakthroughs needed. 
But it turned out transformers and some reinforcement learning on top was enough.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe ease of the advance was the thing that derailed the deeper work. Language models turned out to be good enough for consumer products, and consumer products generate revenue, and revenue attracts competition, and competition creates the arms race that now consumes everything. DeepMind had \u0026ldquo;fairly equivalent systems\u0026rdquo; to ChatGPT at the time, Hassabis says. They chose not to release them. That choice was taken from him.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"what-a-dollar-buys\"\u003eWhat a dollar buys\u003c/h2\u003e\n\u003cp\u003eThe resource allocation case is simple enough to state in one line, though the implications are not.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.nature.com/articles/s41586-021-03819-2\"\u003eAlphaFold 2\u003c/a\u003e trained on 128 Google TPUv3 chips for approximately 11 days. At \u003ca href=\"https://cloud.google.com/tpu/pricing\"\u003eGoogle Cloud\u0026rsquo;s public pricing\u003c/a\u003e of roughly $32 per hour per TPU, the estimated training cost is roughly \u003cstrong\u003e$1 million\u003c/strong\u003e. It predicted the three-dimensional structures of 200 million proteins. Over 3 million scientists now use it. A pharma executive told Hassabis that \u0026ldquo;almost every drug developed from now on will have probably used AlphaFold in its process.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eNow the other side of the ledger. \u003ca href=\"https://epoch.ai/blog/how-much-does-it-cost-to-train-frontier-ai-models/\"\u003eGPT-4\u0026rsquo;s training cost\u003c/a\u003e an estimated \u003cstrong\u003e$78 million\u003c/strong\u003e. \u003ca href=\"https://fortune.com/2024/04/18/google-gemini-cost-191-million-to-train-stanford-university-report-estimates/\"\u003eGemini Ultra ran to roughly \u003cstrong\u003e$191 million\u003c/strong\u003e\u003c/a\u003e. 
OpenAI\u0026rsquo;s Orion \u003ca href=\"https://fortune.com/2025/02/25/what-happened-gpt-5-openai-orion-pivot-scaling-pre-training-llm-agi-reasoning/\"\u003eexceeded \u003cstrong\u003e$500 million\u003c/strong\u003e\u003c/a\u003e for a single training run, and the model was so disappointing they downgraded it from GPT-5 to GPT-4.5. OpenAI\u0026rsquo;s inference spending alone, just the cost of running the models after training, \u003ca href=\"https://aibusiness.com/language-models/ai-model-scaling-isn-t-over-it-s-entering-a-new-era\"\u003ehit \u003cstrong\u003e$2.3 billion in 2024\u003c/strong\u003e\u003c/a\u003e. That is several times the reported cost of that entire Orion training run.\u003c/p\u003e\n\u003cp\u003eAlphaFold cost less to train than OpenAI spends on inference in a single day.\u003c/p\u003e\n\u003cp\u003eZoom out further. The Big 4 hyperscalers (Amazon, Alphabet, Meta, Microsoft) are guiding to \u003ca href=\"https://www.goldmansachs.com/insights/articles/why-ai-companies-may-invest-more-than-500-billion-in-2026\"\u003e\u003cstrong\u003e$610-665 billion\u003c/strong\u003e\u003c/a\u003e in capital expenditure for 2026. \u003ca href=\"https://www.goldmansachs.com/insights/articles/why-ai-companies-may-invest-more-than-500-billion-in-2026\"\u003eGoldman Sachs projects\u003c/a\u003e cumulative 2025-2027 spending at $1.15 trillion. As I noted in \u003ca href=\"/posts/peter-thiels-physics-department/\"\u003ePeter Thiel\u0026rsquo;s Physics Department\u003c/a\u003e, Big Tech spends \u003cstrong\u003e75 times\u003c/strong\u003e more on AI than the US federal government\u0026rsquo;s non-defense AI research budget: $250 billion versus $3.3 billion per year. The DOE Genesis Mission, the flagship US government program for AI-driven scientific discovery, \u003ca href=\"https://www.energy.gov/science/articles/doe-announces-genesis-mission-advance-ai-science\"\u003ereceived \u003cstrong\u003e$320 million\u003c/strong\u003e in its first round\u003c/a\u003e. 
That is less than Meta spends on AI infrastructure in a single week.\u003c/p\u003e\n\u003cp\u003eThe infrastructure being built is not for protein folding. It is not for materials science or fusion plasma control or genomics. It is for chatbots, image generators, and coding assistants. \u003ca href=\"https://sequoiacap.com/article/ais-600b-question/\"\u003eSequoia\u0026rsquo;s David Cahn calculated\u003c/a\u003e the AI ecosystem needs \u003cstrong\u003e$600 billion in annual revenue\u003c/strong\u003e to justify current infrastructure spending. It generates perhaps $80-120 billion. And nearly all of that revenue comes from commercial applications: subscriptions, API access, enterprise contracts for systems that summarize emails and draft marketing copy.\u003c/p\u003e\n\u003cp\u003eThe bottleneck for AI for science was never money. AlphaFold proved that. It was always about who works on what, and the chatbot economy answered that question for an entire generation of researchers.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"what-the-circles-produced\"\u003eWhat the circles produced\u003c/h2\u003e\n\u003cp\u003eWhen Hassabis\u0026rsquo;s teams were allowed to focus on science, when the circles were left undisturbed, this is what happened.\u003c/p\u003e\n\u003cp\u003eIn The Thinking Game there\u0026rsquo;s a moment that captures it perfectly. The original plan for AlphaFold was conventional: build a server, let scientists submit protein sequences one at a time, email back the predicted structures. Standard approach, used by the whole field for 40 years. Then Hassabis started doing arithmetic on his phone in the middle of the meeting. Two hundred million known proteins. One fold every ten seconds. How many TPUs do we have? 
He looked up and said something like, \u0026ldquo;\u003ca href=\"https://youtu.be/d95J8yzvjbQ?si=1VVejCeVhn_1_3m6\u0026amp;t=4495\"\u003eWhy don\u0026rsquo;t we just fold everything?\u003c/a\u003e\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eIt would actually be less work, he realized, than building the server.\u003c/p\u003e\n\u003cp\u003eSo they folded everything. AlphaFold predicted the structures of \u003cstrong\u003e200 million proteins\u003c/strong\u003e and put them in a \u003ca href=\"https://alphafold.ebi.ac.uk/\"\u003efree database\u003c/a\u003e. The nuclear pore complex, one of the largest protein assemblies in the cell, a donut-shaped gateway that controls what moves in and out of the nucleus, was \u003ca href=\"https://www.science.org/doi/10.1126/science.abm9326\"\u003esolved within months\u003c/a\u003e of AlphaFold\u0026rsquo;s release. Researchers working on neglected diseases such as malaria, Chagas, and leishmaniasis, which affect hundreds of millions of people but attract little pharma funding, now get protein structures for free. Plant scientists working on climate-resilient crops can skip years of crystallography and go straight to the biology.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.isomorphiclabs.com/\"\u003eIsomorphic Labs\u003c/a\u003e, the DeepMind spinoff, is running 18-19 drug programs across cardiovascular disease, cancer, and immunology. \u003ca href=\"/posts/ai-can-now-design-drugs-in-seconds-we-still-cant-tell-you-if-they-work./\"\u003eIsoDDE, its drug design engine\u003c/a\u003e, hits 50% on the hardest protein-ligand benchmarks versus 23% for AlphaFold 3. \u003ca href=\"https://deepmind.google/discover/blog/alphagenome-predicts-the-effects-of-dna-variation-on-gene-regulation/\"\u003eAlphaGenome\u003c/a\u003e is decoding the 98% of the human genome that doesn\u0026rsquo;t code for proteins, the part where most disease-causing mutations hide. 
Jennifer Doudna, the CRISPR pioneer, asked Hassabis directly about combining AlphaGenome with CRISPR to identify and fix the exact genetic changes causing disease. His answer: \u0026ldquo;Still not probably good enough yet. But you can imagine a future version.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"/posts/the-last-architecture-designed-by-hand/\"\u003eAlphaEvolve\u003c/a\u003e found a 23% speedup in a key kernel of Gemini\u0026rsquo;s architecture, and a data-center scheduling heuristic that recovers 0.7% of Google\u0026rsquo;s worldwide compute. DeepMind\u0026rsquo;s fusion work \u003ca href=\"https://deepmind.google/blog/bringing-ai-to-the-next-generation-of-fusion-energy/\"\u003econtrolled plasma autonomously\u003c/a\u003e in a real tokamak. \u003ca href=\"https://deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning/\"\u003eGNoME\u003c/a\u003e identified 2.2 million new crystal structures, equivalent to roughly 800 years of prior human discovery in materials science.\u003c/p\u003e\n\u003cp\u003eAll of this on a fraction of the compute that powers the chatbot economy. I keep coming back to this: the entire portfolio of DeepMind\u0026rsquo;s scientific work (the Nobel Prize, the drug programs, the materials, the fusion experiments) consumed less compute than a single frontier chatbot burns through in inference costs per quarter.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-case-for-the-war-machines\"\u003eThe case for the war machines\u003c/h2\u003e\n\u003cp\u003eI want to present the counterargument honestly, because it\u0026rsquo;s not trivial.\u003c/p\u003e\n\u003cp\u003eThe commercial race funded a compute buildout that wouldn\u0026rsquo;t exist without chatbot demand. $690 billion in 2026 capex is building data centers that can, in principle, be repurposed for scientific workloads. The talent pipeline expanded: a generation of ML engineers entered the field because consumer AI products made it exciting and lucrative. 
Millions of users stress-tested these models in ways internal testing never could, revealing failure modes and edge cases that improve the underlying systems. Hassabis himself acknowledges this. In the Cleo Abram interview he listed the benefits: \u0026ldquo;lightning speed\u0026rdquo; progress, democratized access to cutting-edge AI \u0026ldquo;perhaps only 3 to 6 months behind what is actually in the labs,\u0026rdquo; and societal normalization that prepares people for bigger changes ahead.\u003c/p\u003e\n\u003cp\u003eAnd there\u0026rsquo;s the funding argument. Google\u0026rsquo;s $132 billion in net income funds DeepMind. Gemini\u0026rsquo;s commercial revenue helps justify the research budget. Without the chatbot economy, would Alphabet spend billions on AI research at all?\u003c/p\u003e\n\u003cp\u003eThe strongest version of this argument goes: you can\u0026rsquo;t have the cathedral without the wool merchants. Bell Labs needed AT\u0026amp;T\u0026rsquo;s monopoly revenue. The Apollo program needed Cold War spending. Scientific breakthroughs don\u0026rsquo;t fund themselves. The commercial race, ugly as it is, is the mechanism that makes the science possible.\u003c/p\u003e\n\u003ch2 id=\"why-the-steelman-breaks\"\u003eWhy the steelman breaks\u003c/h2\u003e\n\u003cp\u003eI\u0026rsquo;ve thought about this for a while, and I think it\u0026rsquo;s wrong.\u003c/p\u003e\n\u003cp\u003eStart with the compute argument. The infrastructure being built is overwhelmingly inference infrastructure: data centers optimized for running chatbot queries at scale, not for training scientific models. AlphaFold trains on 128 TPUs. It doesn\u0026rsquo;t need a $75 billion annual capex program. The buildout serves commercial demand. Calling it a foundation for scientific AI is like calling a shopping mall a foundation for particle physics because they both use electricity.\u003c/p\u003e\n\u003cp\u003eThe talent argument has the same problem. 
The pipeline filled, but it filled with the wrong skills and pointed in the wrong direction. \u003ca href=\"https://hai.stanford.edu/ai-index/2025-ai-index-report/research-and-development\"\u003eStanford HAI\u0026rsquo;s 2025 AI Index\u003c/a\u003e found that \u003cstrong\u003e70%\u003c/strong\u003e of AI PhDs took private sector jobs in 2023, up from roughly 20% two decades ago. \u003ca href=\"https://www.nature.com/articles/d41586-026-00474-3\"\u003eBruce Schneier wrote in \u003cem\u003eNature\u003c/em\u003e\u003c/a\u003e that the exodus threatens \u0026ldquo;innovation driven by curiosity rather than profit.\u0026rdquo; The ML engineers entering the field are optimizing RLHF, fine-tuning chat models, building prompt engineering toolchains, and competing on Chatbot Arena leaderboards. These are not the skills that fold proteins or control plasma. The talent that cracks drug discovery needs computational chemistry, molecular dynamics, quantum mechanics. The talent attracted by the chatbot boom is, for the most part, not that talent.\u003c/p\u003e\n\u003cp\u003eThe stress-testing argument is real but narrow. Millions of users proved that language models can summarize documents and brainstorm ideas. That tells you nothing about whether they can predict which genetic mutations cause disease. The applications share a model architecture but almost nothing else.\u003c/p\u003e\n\u003cp\u003eAnd the funding argument, the one that seems hardest to dismiss, actually argues the opposite of what its proponents think. The best historical parallel is \u003ca href=\"https://en.wikipedia.org/wiki/Bell_Labs\"\u003eBell Labs\u003c/a\u003e. 
Founded in 1925 as the research arm of AT\u0026amp;T\u0026rsquo;s regulated telephone monopoly, Bell Labs produced the \u003ca href=\"https://en.wikipedia.org/wiki/Transistor\"\u003etransistor\u003c/a\u003e, the \u003ca href=\"https://en.wikipedia.org/wiki/Laser\"\u003elaser\u003c/a\u003e, \u003ca href=\"https://en.wikipedia.org/wiki/Unix\"\u003eUnix\u003c/a\u003e, the \u003ca href=\"https://en.wikipedia.org/wiki/C_(programming_language)\"\u003eC programming language\u003c/a\u003e, \u003ca href=\"https://en.wikipedia.org/wiki/Information_theory\"\u003einformation theory\u003c/a\u003e, and the discovery of \u003ca href=\"https://en.wikipedia.org/wiki/Cosmic_microwave_background\"\u003ecosmic microwave background radiation\u003c/a\u003e. Ten Nobel Prizes. Five Turing Awards. \u003ca href=\"https://www.construction-physics.com/p/what-would-it-take-to-recreate-bell\"\u003eBrian Potter in \u003cem\u003eConstruction Physics\u003c/em\u003e\u003c/a\u003e calls the conditions \u0026ldquo;unrepeatable\u0026rdquo;: a vertically integrated monopoly that could afford to fund research with no immediate commercial return.\u003c/p\u003e\n\u003cp\u003eThen AT\u0026amp;T was broken up in 1984. Commercial competition arrived. What happened next is instructive: the research workforce \u003ca href=\"https://en.wikipedia.org/wiki/Bell_Labs\"\u003edropped from roughly 1,300 to 500 by 2002\u003c/a\u003e. Only one post-divestiture employee won a Nobel Prize. Bell Labs was passed from AT\u0026amp;T to Lucent to Alcatel to Nokia, each owner less interested in fundamental research than the last. By 2008, \u003ca href=\"https://pmc.ncbi.nlm.nih.gov/articles/PMC8792522/\"\u003efour physicists remained\u003c/a\u003e in basic research. 
By 2016, what had been the most productive research institution in human history was a division of a Finnish telecom company.\u003c/p\u003e\n\u003cp\u003eThe irony is precise: the people who argue that commercial pressure funds great science are citing a lab that produced its greatest work under monopoly protection \u003cem\u003efrom\u003c/em\u003e commercial pressure, and died the moment that protection was removed.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eHassabis\u0026rsquo;s vision, the CERN model, is the Bell Labs model. Let fundamental research breathe. Shield it from quarterly earnings. Fund it with patient capital. He had that at DeepMind, funded by Google\u0026rsquo;s search advertising monopoly, insulated from product deadlines, free to spend six years building AlphaGo before it produced a single dollar of revenue. Then the commercial race consumed the insulation.\u003c/p\u003e\n\u003cp\u003eThe funding was already there. What he lost was the institutional focus.\u003c/p\u003e\n\u003ch2 id=\"the-circles\"\u003eThe circles\u003c/h2\u003e\n\u003cp\u003eArchimedes held off Rome for two years. Then the soldier came. The war machines didn\u0026rsquo;t save Syracuse. They bought time, and that time ran out.\u003c/p\u003e\n\u003cp\u003eI don\u0026rsquo;t think the chatbot era saved AI for science. I think it ate the oxygen. The talent went to RLHF optimization. The compute went to inference farms. The institutional attention went to quarterly product launches. Hassabis is now simultaneously building the war machines and drawing the circles: running Gemini and funding Isomorphic, shipping chatbots and folding proteins. That he manages both is remarkable. But it\u0026rsquo;s a compromise, and the compromise has a cost measured in drug programs that don\u0026rsquo;t exist, diseases that aren\u0026rsquo;t being studied, materials that haven\u0026rsquo;t been found.\u003c/p\u003e\n\u003cp\u003eThe question is not whether chatbots are useful. They are. 
I use them constantly. The question is whether future historians will look at 2023-2026 and see a period when the most capable scientific tool in human history was mostly pointed at drafting emails and generating stock photos, and wonder what we were thinking. The way we look at that Roman soldier: someone who destroyed something more valuable than he could understand.\u003c/p\u003e\n\u003cp\u003eIn the interview, Hassabis is asked what he would want said at his funeral. His answer was immediate:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eI would hope that they would say that my life was of benefit and service to humanity.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe circles are still there, drawn in the sand between product launches.\u003c/p\u003e\n","summary":"AlphaFold cost under $1M to train. OpenAI spends $2.3B on inference. The chatbot era consumed the talent and compute that could have cured diseases.","image":"https://static.philippdubach.com/ograph/ograph-do-not-disturb-my-circles.jpg","date_published":"2026-04-13T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D. 
Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Commentary","word_count":2594,"reading_time_minutes":13,"keywords":["AI for science vs chatbots","Demis Hassabis AlphaFold Gemini","AlphaFold training cost","AI compute scientific research funding","Bell Labs monopoly research decline","AI chatbot misallocation","AI opportunity cost","AI brain drain academia","scientific AI funding gap","AI capital allocation 2026","DeepMind scientific AI","AI infrastructure spending science","Hassabis cure cancer quote","chatbot era capital misallocation","AI for science underfunded","fundamental research AI funding","AI resource allocation science commercial"],"section":"posts"}},{"id":"https://philippdubach.com/posts/the-geometry-of-who-knows-what/","url":"https://philippdubach.com/posts/the-geometry-of-who-knows-what/","title":"The Geometry of Who Knows What","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-edge-of-knowledge-3-cover-jpg-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/edge-of-knowledge-3-cover.jpg 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/edge-of-knowledge-3-cover.jpg 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/edge-of-knowledge-3-cover.jpg 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/edge-of-knowledge-3-cover.jpg 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-3-cover.jpg 1200w\"\n              sizes=\"80vw\"\u003e\n      
\u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/edge-of-knowledge-3-cover.jpg 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/edge-of-knowledge-3-cover.jpg 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/edge-of-knowledge-3-cover.jpg 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-3-cover.jpg 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/edge-of-knowledge-3-cover.jpg 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/edge-of-knowledge-3-cover.jpg 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-3-cover.jpg\"\n           alt=\"Editorial illustration: two mirror-image charcoal-silhouette figures face each other across a fog-filled gap, each holding a partial document, standing on an ochre-tinted grid floor that suggests an information matrix where neither side knows more than the other\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-edge-of-knowledge-3-cover-jpg-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/edge-of-knowledge-3-cover.jpg\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" 
class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Editorial illustration: two mirror-image charcoal-silhouette figures face each other across a fog-filled gap, each holding a partial document, standing on an ochre-tinted grid floor that suggests an information matrix where neither side knows more than the other\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003e\u003cem\u003eInvesting at the Edge of Knowledge, Part 3 · \u003ca href=\"/posts/three-kinds-of-not-knowing/\"\u003eStart with Part 1\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;One of these days in your travels, a guy is going to show you a brand-new deck of cards on which the seal is not yet broken. Then this guy is going to offer to bet you that he can make the jack of spades jump out of this brand-new deck of cards and squirt cider in your ear. But, son, you do not accept this bet, because as sure as you stand there, you\u0026rsquo;re going to wind up with an ear full of cider.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eZeckhauser opens his \u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205821\"\u003e2006 paper\u003c/a\u003e with this advice from Sky Masterson\u0026rsquo;s father in \u003cem\u003eGuys and Dolls\u003c/em\u003e. The lesson is as old as markets: if someone offers you a bet where they seem to know something you don\u0026rsquo;t, they probably do. Don\u0026rsquo;t take that bet.\u003c/p\u003e\n\u003cp\u003eBut Zeckhauser\u0026rsquo;s point isn\u0026rsquo;t the lesson. It\u0026rsquo;s the exception. What happens when nobody has the marked deck? 
When the ambiguity is shared, when neither side can enumerate the states of the world, the Sky Masterson rule stops applying, and the investors who keep following it anyway leave money on the table.\u003c/p\u003e\n\u003cp\u003eIn \u003ca href=\"/posts/three-kinds-of-not-knowing/\"\u003ePart 1\u003c/a\u003e I laid out Zeckhauser\u0026rsquo;s taxonomy: risk, uncertainty, and ignorance as three distinct problems. In \u003ca href=\"/posts/ambiguity-by-design/\"\u003ePart 2\u003c/a\u003e I examined why investors flee the third box, the mechanism of ambiguity aversion. This piece asks the question that follows: when you\u0026rsquo;re facing someone on the other side of a trade, how do you figure out whether they know something you don\u0026rsquo;t?\u003c/p\u003e\n\u003ch2 id=\"the-two-matrices\"\u003eThe two matrices\u003c/h2\u003e\n\u003cp\u003eZeckhauser draws two matrices that I think are the most underappreciated diagrams in the paper.\u003c/p\u003e\n\u003cp\u003eThe first covers investing under uncertainty, where the possible states are known but probabilities are hard. It\u0026rsquo;s a 2x2: Easy or Hard for You to Estimate Value crossed with Easy or Hard for Others. Box A (easy for both) is the standard competitive market: lots of participants, tight spreads, no edge for anyone. Box B (easy for you, hard for others) is where you\u0026rsquo;re the informed party: think of a biotech scientist evaluating a drug trial readout. Box C (hard for you, easy for others) is the danger zone: the other side has the marked deck, and the Sky Masterson rule applies in full. Box D (hard for both) is where it gets interesting. Neither side has an information advantage. Both are operating under genuine uncertainty. Buffett\u0026rsquo;s earthquake reinsurance sits here.\u003c/p\u003e\n\u003cp\u003eThe second matrix covers investing under ignorance, where even the possible states are unknown. It\u0026rsquo;s simpler: a 2x1. 
Unknown to You and Known to Others (Box E) versus Unknown to You and Unknown to Others (Box F). Box E is dangerous. Box F is opportunity.\u003c/p\u003e\n\u003cp\u003eThe point most people miss is about misidentification. Most investors assume they\u0026rsquo;re in Box C or Box E: the other side knows more. This assumption is the legacy of standard information asymmetry models in finance, where \u003ca href=\"https://www.sfu.ca/~wainwrig/Econ400/akerlof.pdf\"\u003eAkerlof\u0026rsquo;s lemons problem (1970)\u003c/a\u003e and the Glosten-Milgrom bid-ask spread model (1985) trained a generation to worry about adverse selection. Those worries are justified in Box C and Box E, where the other side really does hold the marked deck. But in Box D and Box F, you\u0026rsquo;re not facing an informed counterparty. You\u0026rsquo;re facing someone equally confused, or someone who has left the market entirely because they can\u0026rsquo;t tolerate the confusion.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.cs.princeton.edu/courses/archive/spr09/cos444/papers/BazermanSamuelson83.pdf\"\u003eBazerman and Samuelson (1983)\u003c/a\u003e showed that even in clean experimental settings, people are terrible at accounting for why the other side is willing to trade. Their winner\u0026rsquo;s curse experiments found that bidders consistently failed to adjust for the fact that winning an auction is itself evidence that their estimate was too high. In a UU (unknown and unknowable) world, this failure compounds. You can\u0026rsquo;t compute the conditional expectation of the asset\u0026rsquo;s value given that the other side is selling, because neither of you can define the state space over which that expectation would be calculated.\u003c/p\u003e\n\u003cp\u003eThe practical question is always: am I in Box C or Box D? Am I in Box E or Box F? And the answer is almost never available from the data.
It\u0026rsquo;s a judgment call, informed by what you know about the seller\u0026rsquo;s constraints, their institutional context, and whether the source of the ambiguity is private or shared.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"institutional-blindness-as-structural-opportunity\"\u003eInstitutional blindness as structural opportunity\u003c/h2\u003e\n\u003cp\u003eThe California Earthquake Authority story is Zeckhauser\u0026rsquo;s best illustration, and it deserves the full telling.\u003c/p\u003e\n\u003cp\u003eIn the late 1990s, California needed reinsurance for earthquake risk. The authority offered a \u003cstrong\u003e$1 billion\u003c/strong\u003e slice at premiums that worked out to roughly five times actuarial value. Wall Street said no, and not because investment banks thought the Earthquake Authority possessed secret seismological knowledge. Nobody, the authority included, has an informational edge over a reinsurer when it comes to tectonic plate movement. The ambiguity was shared: Box D, not Box C.\u003c/p\u003e\n\u003cp\u003eWall Street said no because its internal processes required probability estimates that didn\u0026rsquo;t exist. Compliance teams wanted distributional assumptions about tail risk that nobody could provide. Risk models needed defined scenarios, and \u0026ldquo;catastrophic earthquake in the next 12 months\u0026rdquo; didn\u0026rsquo;t fit neatly into any existing framework. The honest assessment, \u0026ldquo;we have no idea about the probability, but the price is very high,\u0026rdquo; didn\u0026rsquo;t fit the form. Buffett took the entire slice.\u003c/p\u003e\n\u003cp\u003eThis is Zeckhauser\u0026rsquo;s Maxim H: \u0026ldquo;Do not engage in the heuristic reasoning that just because you do not know the risk, others do.\u0026rdquo; The Wall Street banks weren\u0026rsquo;t outcompeted by someone with better information. They were outcompeted by someone with fewer institutional constraints.
Buffett could hold a position that was impossible to model because he answered to shareholders who trusted his judgment, not to compliance officers who demanded a model.\u003c/p\u003e\n\u003cp\u003eGeneralize this, and you get a structural feature of UU markets that doesn\u0026rsquo;t go away. Fiduciary duty requires estimable risk. Compliance models require defined scenarios. Career risk creates what Zeckhauser calls Monday Morning Quarterback (MMQ) risk: the danger that a bad outcome on a good decision destroys your reputation. Professional investors face a permanent bias toward the risk box (known probabilities) and away from the ignorance box (unknown states). This isn\u0026rsquo;t a market inefficiency waiting to be arbitraged away. It\u0026rsquo;s an institutional constant, and it creates a permanent supply of mispriced assets for those without the same constraints.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://projecteuclid.org/journals/annals-of-statistics/volume-4/issue-6/Agreeing-to-Disagree/10.1214/aos/1176343654.full\"\u003eAumann (1976)\u003c/a\u003e proved that rational agents who start from a common prior and whose posterior beliefs are common knowledge must hold identical posteriors: they cannot \u0026ldquo;agree to disagree.\u0026rdquo; The theorem is elegant and, in UU markets, irrelevant. Aumann assumes a common prior and a known state space. In the ignorance box, both assumptions fail. The state space is undefined, so there is no common prior to start from. Disagreement in UU isn\u0026rsquo;t a puzzle to be resolved by more information exchange. It\u0026rsquo;s the default condition. Two equally rational investors can look at the same situation and reach opposite conclusions without either one being wrong, because they\u0026rsquo;re not disagreeing about probabilities.
They\u0026rsquo;re disagreeing about what world they\u0026rsquo;re in.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-advantage-versus-selection-formula\"\u003eThe advantage-versus-selection formula\u003c/h2\u003e\n\u003cp\u003eZeckhauser offers a framework for deciding when to invest despite potential adverse selection. Your expected return depends on three things: your absolute advantage (\u003cem\u003ea\u003c/em\u003e), the probability the other side is better informed (\u003cem\u003ep\u003c/em\u003e), and the selection factor (\u003cem\u003es\u003c/em\u003e), how much their information hurts you. Invest when the combination exceeds the cost of entry.\u003c/p\u003e\n\u003cp\u003eThe formula matters less than the logic behind it. A large absolute advantage provides insurance against adverse selection. Zeckhauser\u0026rsquo;s Maxim E: \u0026ldquo;A significant absolute advantage offers some protection against potential selection.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eWhat counts as absolute advantage? Complementary skills are the classic answer: the real estate developer who creates value a passive investor cannot, the venture capitalist whose operational expertise and network make the company worth more than the sum of its capital. Their return isn\u0026rsquo;t compensation for bearing risk. It\u0026rsquo;s a share of value they helped create. Sidecar investors, Zeckhauser\u0026rsquo;s term for those who invest alongside skilled operators, earn excess returns because access to these deals is limited and the value creation is real.\u003c/p\u003e\n\u003cp\u003eBut complementary skills aren\u0026rsquo;t the only form of advantage. In the \u003ca href=\"/posts/the-saaspocalypse-paradox/\"\u003eSaaSpocalypse\u003c/a\u003e, the \u0026ldquo;absolute advantage\u0026rdquo; for a buyer at IGV $80 was time horizon. 
If you could hold for three to five years, tolerate the MMQ risk of further drawdowns, and ignore the career consequences of looking wrong for a few quarters, you had a structural edge over institutional sellers who couldn\u0026rsquo;t do the same. That\u0026rsquo;s not analytical skill. It\u0026rsquo;s constraint arbitrage. And constraint arbitrage is a legitimate form of absolute advantage, because fiduciary requirements and career incentives are structural features that won\u0026rsquo;t disappear next quarter.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205848\"\u003eLarry Summers (2006)\u003c/a\u003e raises the obvious objection to the sidecar concept: \u0026ldquo;identifying skilled UU managers may be no easier than picking market-beating investments directly.\u0026rdquo; The sidecar doesn\u0026rsquo;t solve the epistemological problem. It relocates it from asset selection to manager selection. How do you know the driver is skilled rather than lucky?\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205858\"\u003eRichard Robb (2006)\u003c/a\u003e pushes further. He argues that UU knowledge is \u0026ldquo;uncommunicable.\u0026rdquo; If a mechanism for generating excess returns could be expressed as a process, someone would have arbitraged it away. Ricardo, on the eve of Waterloo, might have said \u0026ldquo;British Government bonds offer a high reward for the risk.\u0026rdquo; But what would it look like for that statement to be proven false? The claim is unfalsifiable because it lives in the ignorance box where probability statements don\u0026rsquo;t have clear empirical content. If the sidecar driver can\u0026rsquo;t explain their edge in terms you can evaluate, how do you distinguish skill from survivorship bias?\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eI think both objections are correct and both miss something. 
They\u0026rsquo;re correct that sidecar investing doesn\u0026rsquo;t eliminate the evaluation problem. But they miss that the difficulty of that problem varies with context. Evaluating whether a real estate developer can build and lease a building is easier than evaluating whether a macro hedge fund can predict interest rates. Evaluating whether Buffett\u0026rsquo;s insurance math is sound is easier than evaluating whether a biotech startup\u0026rsquo;s drug candidate works. The sidecar concept isn\u0026rsquo;t \u0026ldquo;trust someone blindly.\u0026rdquo; It\u0026rsquo;s \u0026ldquo;invest alongside someone whose edge you can partly verify, in situations where your own analytical advantage is zero.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eKnowing the geometry of who-knows-what is necessary but not sufficient. You\u0026rsquo;ve identified a Box D or Box F opportunity. You\u0026rsquo;ve assessed your absolute advantage. You\u0026rsquo;ve decided the other side isn\u0026rsquo;t better informed. Now you need to decide how much to bet. In a UU world, the most famous formula for position sizing, the Kelly Criterion, breaks down, because its central input is exactly the probability estimate you don\u0026rsquo;t have. That\u0026rsquo;s \u003ca href=\"/posts/bet-sizing-at-the-frontier/\"\u003ePart 4\u003c/a\u003e.\u003c/p\u003e\n","summary":"When neither side can define the states of the world, adverse selection fears are misplaced. Zeckhauser's information matrices and constraint arbitrage.","image":"https://static.philippdubach.com/ograph/ograph-geometry-who-knows-what.jpg","date_published":"2026-04-13T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D.
Dubach","url":"https://philippdubach.com/about/"}],"tags":["Quantitative Finance"],"_philippdubach":{"type":"Analysis","word_count":1727,"reading_time_minutes":9,"keywords":["information asymmetry investing","adverse selection financial markets","sidecar investing Zeckhauser","Buffett California Earthquake Authority reinsurance","winner's curse investing","Aumann agreeing to disagree","constraint arbitrage institutional investors","limits to arbitrage institutional constraints","Monday morning quarterback risk","Akerlof lemons problem investing","complementary skills investing edge","Summers sidecar objection","Robb uncommunicable knowledge","fiduciary duty ambiguity avoidance"],"section":"posts"}},{"id":"https://philippdubach.com/posts/why-lillys-weight-loss-pill-isnt-a-peptide/","url":"https://philippdubach.com/posts/why-lillys-weight-loss-pill-isnt-a-peptide/","title":"Why Lilly's Weight Loss Pill Isn't a Peptide","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-pill-cover-jpg-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/pill-cover.jpg 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/pill-cover.jpg 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/pill-cover.jpg 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/pill-cover.jpg 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/pill-cover.jpg 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/pill-cover.jpg 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/pill-cover.jpg 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/pill-cover.jpg 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/pill-cover.jpg 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/pill-cover.jpg 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/pill-cover.jpg 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/pill-cover.jpg\"\n           alt=\"Editorial cover illustration for an analysis of Eli Lilly\u0026#39;s Foundayo and the oral GLP-1 weight loss pill race against Novo Nordisk\u0026#39;s oral Wegovy\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-pill-cover-jpg-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/pill-cover.jpg\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Editorial cover illustration for an analysis of Eli Lilly\u0026#39;s Foundayo and the oral GLP-1 weight loss pill race against Novo Nordisk\u0026#39;s oral Wegovy\" 
decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eNovo Nordisk spent decades and $1.8 billion learning how to get a peptide past the gut. Eli Lilly looked at the same problem and decided to skip it entirely.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eYour gastrointestinal tract is a 30-foot disassembly line for proteins. Acid denatures them, pepsin cleaves them, trypsin finishes the job, and the mucus layer blocks whatever survives. Sean Geiger\u0026rsquo;s excellent \u003ca href=\"https://seangeiger.substack.com/p/a-brief-history-of-oral-peptides\"\u003ehistory of oral peptides\u003c/a\u003e traces the full arc: the first attempt at oral insulin was in 1922. Over a hundred years and thirteen companies later, no oral insulin exists.\u003c/p\u003e\n\u003cp\u003eNovo Nordisk spent decades and $1.8 billion acquiring the technology to get around this problem. The result, approved in December 2025 as \u003ca href=\"https://www.endocrinologyadvisor.com/news/fda-approves-oral-wegovy-for-weight-management/\"\u003eoral Wegovy for obesity\u003c/a\u003e, is a pill that destroys 99% of its own active ingredient before the remaining fraction reaches the bloodstream. The oral 25mg daily dose uses roughly 280x more semaglutide than the equivalent weekly injection. This is the best that peptide oral delivery can do. Eli Lilly decided to skip it entirely, building Foundayo, a small molecule oral obesity drug that isn\u0026rsquo;t a peptide at all. That divergence in approach will determine who captures the majority of a market that \u003ca href=\"https://www.goldmansachs.com/insights/articles/anti-obesity-drug-market\"\u003eGoldman Sachs projects\u003c/a\u003e at $100+ billion by 2030 and that \u003ca href=\"https://www.jpmorgan.com/insights/global-research/current-events/obesity-drugs\"\u003eJ.P. 
Morgan estimates\u003c/a\u003e will reach 30 million US users within five years.\u003c/p\u003e\n\u003ch2 id=\"oral-semaglutide\"\u003eOral semaglutide\u003c/h2\u003e\n\u003cp\u003eSean Geiger\u0026rsquo;s \u003ca href=\"https://seangeiger.substack.com/p/a-brief-history-of-oral-peptides\"\u003ehistory of oral peptides\u003c/a\u003e traces the science well. The technology that makes oral semaglutide possible is SNAC (salcaprozate sodium), a permeation enhancer developed by Emisphere Technologies starting in the 1990s. Novo partnered with Emisphere in 2007 and \u003ca href=\"https://www.novonordisk.com/content/nncorp/global/en/news-and-media/news-and-ir-materials/news-details.html?id=916472\"\u003eacquired the company outright in 2020\u003c/a\u003e. SNAC does three things simultaneously: it buffers local stomach pH to suppress pepsin, prevents semaglutide from clumping into inactive oligomers, and temporarily fluidizes gastric cell membranes so the drug can cross. The \u003ca href=\"https://www.ema.europa.eu/en/documents/assessment-report/rybelsus-epar-public-assessment-report_en.pdf\"\u003eEMA\u0026rsquo;s public assessment report\u003c/a\u003e puts the resulting bioavailability at roughly 0.4 to 1%, and the \u003ca href=\"https://www.accessdata.fda.gov/drugsatfda_docs/label/2024/213051s018lbl.pdf\"\u003eFDA label\u003c/a\u003e gives the same range: the vast majority of each dose never reaches the bloodstream.\u003c/p\u003e\n\u003cp\u003eThis creates a problem that\u0026rsquo;s easy to state and hard to solve. If you need 280x more API per equivalent dose, your manufacturing cost structure looks nothing like the injectable. A \u003ca href=\"https://www.fastcompany.com/91071415/your-1000-per-month-ozempic-costs-5-to-make-says-study\"\u003eYale/King\u0026rsquo;s College study published in JAMA Network Open\u003c/a\u003e found injectable semaglutide costs $0.89 to $4.73 per month to manufacture at the API level.
At oral doses the numbers look very different: \u003ca href=\"https://themedicinemaker.com/issues/2026/articles/january/oral-glp-1s-won-t-win-on-convenience-they-ll-win-on-cmc/\"\u003eThe Medicine Maker\u0026rsquo;s January 2026 analysis\u003c/a\u003e puts oral API costs somewhere in the range of $770 to $1,460 per year. That is still below the selling price, but the margin compression is real, and SNAC itself is a costly excipient.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-oral-bioavailability-trap-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/oral-bioavailability-trap.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/oral-bioavailability-trap.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/oral-bioavailability-trap.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/oral-bioavailability-trap.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/oral-bioavailability-trap.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/oral-bioavailability-trap.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/oral-bioavailability-trap.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/oral-bioavailability-trap.png 1440w\"\n           
   sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/oral-bioavailability-trap.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/oral-bioavailability-trap.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/oral-bioavailability-trap.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/oral-bioavailability-trap.png\"\n           alt=\"Oral semaglutide bioavailability trap: 280x more API per dose than injectable Wegovy, SNAC achieves only 1% absorption, while Eli Lilly Foundayo bypasses the peptide oral delivery problem\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-oral-bioavailability-trap-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/oral-bioavailability-trap.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Oral semaglutide bioavailability trap: 280x more API per dose than injectable Wegovy, SNAC achieves only 1% absorption, while Eli Lilly Foundayo bypasses the peptide oral delivery problem\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eSNAC is also oddly specific. 
\u003ca href=\"https://seangeiger.substack.com/p/a-brief-history-of-oral-peptides\"\u003eGeiger notes\u003c/a\u003e that Novo tried it with liraglutide, a closely related GLP-1 analog, and it failed because liraglutide forms oligomers that SNAC can\u0026rsquo;t break apart. After over three decades of work, exactly two FDA-approved oral peptide drugs using permeation enhancers exist: Rybelsus/oral Wegovy (SNAC) and Mycapssa (oral octreotide for acromegaly, a different enhancer called TPE). That\u0026rsquo;s the entire commercial output of the field.\u003c/p\u003e\n\u003ch2 id=\"foundayo-lillys-structural-advantage\"\u003eFoundayo: Lilly\u0026rsquo;s structural advantage\u003c/h2\u003e\n\u003cp\u003eEli Lilly\u0026rsquo;s \u003ca href=\"https://investor.lilly.com/news-releases/news-release-details/lillys-oral-glp-1-orforglipron-superior-oral-semaglutide-head\"\u003eorforglipron\u003c/a\u003e, approved by the FDA on April 1, 2026 under the brand name \u003ca href=\"https://investor.lilly.com/news-releases/news-release-details/fda-approves-lillys-foundayotm-orforglipron-only-glp-1-pill\"\u003eFoundayo\u003c/a\u003e, is not an oral peptide. It\u0026rsquo;s a non-peptide small molecule GLP-1 receptor agonist that activates the same receptor through a different mechanism. Discovered by Chugai Pharmaceutical and licensed by Lilly in 2018, orforglipron requires no SNAC, no fasting window, no cold chain storage, and is manufactured through standard chemical synthesis rather than solid-phase peptide synthesis. The bioavailability problem doesn\u0026rsquo;t apply because the molecule was designed from the ground up to survive the gut.\u003c/p\u003e\n\u003cp\u003eThe clinical data backs this up. 
In \u003ca href=\"https://investor.lilly.com/news-releases/news-release-details/lillys-oral-glp-1-orforglipron-superior-oral-semaglutide-head\"\u003eACHIEVE-3\u003c/a\u003e (1,698 patients with type 2 diabetes, 52 weeks), orforglipron at 12mg and 36mg was superior to oral semaglutide on both HbA1c reduction and weight loss: the first head-to-head victory over Novo\u0026rsquo;s oral product. In \u003ca href=\"https://www.appliedclinicaltrialsonline.com/view/eli-lilly-oral-glp1-orforglipron-efficacy-safety-injectable-phaseiii-trial\"\u003eATTAIN-2\u003c/a\u003e (obesity with type 2 diabetes), orforglipron delivered 10.5% weight loss at 72 weeks versus 2.2% on placebo. And in \u003ca href=\"https://investor.lilly.com/news-releases/news-release-details/lillys-orforglipron-helped-people-maintain-weight-loss-after\"\u003eATTAIN-MAINTAIN\u003c/a\u003e, patients who switched from injectable Wegovy or Mounjaro to oral orforglipron maintained their weight within 0.9 kg over 52 weeks. A pill that holds the gains of an injection.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eLilly \u003ca href=\"https://investor.lilly.com/news-releases/news-release-details/lillys-oral-glp-1-orforglipron-successful-third-phase-3-trial\"\u003esubmitted the NDA\u003c/a\u003e with a priority review voucher and received \u003ca href=\"https://investor.lilly.com/news-releases/news-release-details/fda-approves-lillys-foundayotm-orforglipron-only-glp-1-pill\"\u003eFDA approval on April 1, 2026\u003c/a\u003e, the fastest approval of a new molecular entity since 2002. Foundayo is available starting at $149 per month for self-pay patients, with savings card prices as low as $25 per month. 
The company is investing \u003ca href=\"https://cen.acs.org/pharmaceuticals/pharmaceutical-chemicals/Lilly-pour-65-billion-GLP/103/web/2025/09\"\u003e$6.5 billion in a dedicated oral manufacturing facility\u003c/a\u003e and $27 billion total in US manufacturing capacity.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-peptide-vs-small-molecule-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/peptide-vs-small-molecule.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/peptide-vs-small-molecule.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/peptide-vs-small-molecule.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/peptide-vs-small-molecule.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/peptide-vs-small-molecule.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/peptide-vs-small-molecule.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/peptide-vs-small-molecule.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/peptide-vs-small-molecule.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/peptide-vs-small-molecule.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/peptide-vs-small-molecule.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/peptide-vs-small-molecule.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/peptide-vs-small-molecule.png\"\n           alt=\"Foundayo orforglipron vs oral Wegovy semaglutide comparison: peptide plus SNAC approach versus small molecule across bioavailability, manufacturing cost, fasting requirements, and ACHIEVE-3 clinical results\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-peptide-vs-small-molecule-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/peptide-vs-small-molecule.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Foundayo orforglipron vs oral Wegovy semaglutide comparison: peptide plus SNAC approach versus small molecule across bioavailability, manufacturing cost, fasting requirements, and ACHIEVE-3 clinical results\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003ch2 id=\"70-billion-duopoly-and-its-widening-crack\"\u003e$70 billion duopoly and its widening crack\u003c/h2\u003e\n\u003cp\u003eThe gap is widening. Combined GLP-1 revenue from Novo and Lilly hit roughly $70 billion in 2025. But the composition shifted. 
Lilly\u0026rsquo;s tirzepatide franchise (Mounjaro plus Zepbound) \u003ca href=\"https://www.fiercepharma.com/pharma/even-pricing-headwinds-eli-lilly-expects-sales-continue-surge-2026\"\u003egenerated $36.5 billion\u003c/a\u003e, with Zepbound alone growing 175% year-over-year. Novo\u0026rsquo;s semaglutide franchise came in around $33 billion, with growth decelerating to roughly 10% in constant exchange rates. \u003ca href=\"https://www.cnbc.com/2026/02/04/eli-lilly-novo-nordisk-earnings-glp1-market.html\"\u003eLilly\u0026rsquo;s US market share hit 57%\u003c/a\u003e by mid-2025, up from 41% a year earlier. Novo\u0026rsquo;s share fell to 43%.\u003c/p\u003e\n\u003cp\u003eThe stock market has been ruthless in pricing this shift. Novo trades at roughly $48 per ADR share, down 65% from its June 2024 peak of $142, a loss exceeding $350 billion in market cap. The company \u003ca href=\"https://www.cnbc.com/2026/02/04/eli-lilly-novo-nordisk-earnings-glp1-market.html\"\u003eguided for a 5 to 13% revenue decline in 2026\u003c/a\u003e, driven by patent expirations in Canada, Brazil, and China, plus pricing pressure from the Trump administration\u0026rsquo;s drug pricing framework. CagriSema, Novo\u0026rsquo;s most important pipeline asset, \u003ca href=\"https://www.biopharmadive.com/news/novo-nordisk-cagrisema-obesity-drug-study-results/735854/\"\u003edisappointed twice\u003c/a\u003e: 22.7% weight loss in REDEFINE 1 (below the company\u0026rsquo;s own 25% guidance) and 15.7% in REDEFINE 2. 
\u003ca href=\"https://www.cnbc.com/2024/12/20/novo-nordisk-shares-plunge-22percent-after-cagrisema-obesity-drug-trial-results.html\"\u003eNovo\u0026rsquo;s stock plunged 20% on the first readout alone\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eLilly, by contrast, \u003ca href=\"https://www.fiercepharma.com/pharma/even-pricing-headwinds-eli-lilly-expects-sales-continue-surge-2026\"\u003eguided 2026 revenue at $80 to $83 billion\u003c/a\u003e, a 25% increase, and \u003ca href=\"https://finance.yahoo.com/quote/LLY/\"\u003etrades near $1,044\u003c/a\u003e with a market cap around $1 trillion, the first pharma company to reach that level. Forward P/E: roughly 30x versus Novo\u0026rsquo;s 12.5x. That 2.4x valuation premium reflects a simple thesis: Lilly has the better drug (Zepbound \u003ca href=\"https://www.nejm.org/doi/full/10.1056/NEJMoa2416394\"\u003eshowed 47% greater weight loss\u003c/a\u003e than Wegovy in the SURMOUNT-5 head-to-head), the better oral pipeline, and the longer patent runway (tirzepatide patents extend into the mid-2030s versus \u003ca href=\"https://www.trademarkia.com/news/patents/when-does-the-ozempic-patent-expire\"\u003esemaglutide\u0026rsquo;s core US patent expiring December 2031\u003c/a\u003e, with biosimilar competition likely following shortly after).\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-novo-vs-lilly-duopoly-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/novo-vs-lilly-duopoly.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/novo-vs-lilly-duopoly.png 480w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/novo-vs-lilly-duopoly.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/novo-vs-lilly-duopoly.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-vs-lilly-duopoly.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/novo-vs-lilly-duopoly.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/novo-vs-lilly-duopoly.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/novo-vs-lilly-duopoly.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-vs-lilly-duopoly.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/novo-vs-lilly-duopoly.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/novo-vs-lilly-duopoly.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-vs-lilly-duopoly.png\"\n           alt=\"Novo Nordisk vs Eli Lilly GLP-1 duopoly: Lilly at 2.4x Novo forward PE, 57% US market share, revenue and patent runway comparison\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-novo-vs-lilly-duopoly-png-4\" class=\"lightbox-dialog\" 
aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/novo-vs-lilly-duopoly.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Novo Nordisk vs Eli Lilly GLP-1 duopoly: Lilly at 2.4x Novo forward PE, 57% US market share, revenue and patent runway comparison\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe Hims \u0026amp; Hers saga sits at the chaotic edge of all this. HIMS \u003ca href=\"https://finance.yahoo.com/news/nvo-lly-stocks-slide-hims-142700745.html\"\u003elaunched a $49 per month compounded oral semaglutide pill\u003c/a\u003e on February 5, 2026, using unproven liposomal technology with no published bioavailability data. Within four days, \u003ca href=\"https://markets.financialcontent.com/stocks/article/marketminute-2026-2-9-the-glp-1-gold-rush-hits-a-wall-novo-nordisk-sues-hims-and-hers-as-fda-crackdown-triggers-20-stock-crash\"\u003eHHS had referred the company to the DOJ\u003c/a\u003e, Novo had \u003ca href=\"https://www.gurufocus.com/news/8587678/novo-nordisk-nvo-shares-plunge-amid-competition-from-hims-hers-hims\"\u003efiled a patent infringement lawsuit\u003c/a\u003e, and HIMS had suspended the product. Novo\u0026rsquo;s CEO alleged independent testing of compounded samples showed impurity levels as high as 86%. 
This is what happens when the incentive to undercut $1,000-per-month pricing collides with the actual difficulty of making peptide drugs work orally.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"does-oral-delivery-commoditize-glp-1\"\u003eDoes oral delivery commoditize GLP-1s?\u003c/h2\u003e\n\u003cp\u003eDoes oral delivery commoditize GLP-1s, or does it expand the market so dramatically that the opportunity grows even under pricing pressure? Early evidence already supports the expansion thesis: \u003ca href=\"https://www.cnbc.com/2026/04/07/novo-nordisks-wegovy-pill-launch-draws-new-wave-of-patients-to-glp-1s.html\"\u003eNovo\u0026rsquo;s oral Wegovy pill uptake is running roughly 10x higher\u003c/a\u003e than the original injectable Wegovy launch, drawing in new patients rather than converting existing injection users.\u003c/p\u003e\n\u003cp\u003eThe statin precedent is the strongest data point we have. After generic atorvastatin launched in 2011, total statin use \u003ca href=\"https://pmc.ncbi.nlm.nih.gov/articles/PMC10203693/\"\u003eexpanded from 31 million to 92 million Americans\u003c/a\u003e by 2019, a \u003cstrong\u003e197% increase\u003c/strong\u003e. Total prescription volume grew 77%. The per-unit price collapsed, but total market volume more than compensated. Updated clinical guidelines, lower copays, and reduced patient resistance combined to pull in millions of people who would never have started therapy at the original price and delivery format.\u003c/p\u003e\n\u003cp\u003eCurrent penetration is absurdly low: \u003ca href=\"https://icer.org/wp-content/uploads/2025/04/Affordable-Access-to-GLP-1-Obesity-Medications-_-ICER-White-Paper-_-04.09.2025.pdf\"\u003efewer than 5% of eligible US adults\u003c/a\u003e are on anti-obesity medication, against 104 million with obesity. Moving from under 5% to statin-like penetration of 35% or higher implies at least a 7x expansion. 
Persistence data reinforces the point: only \u003ca href=\"https://www.primetherapeutics.com/documents/d/primetherapeutics/prime-therapeutics_glp-1-therapy-to-treat-obesity-among-members-without-diabetes_three-year-persistence\"\u003e32% of obesity patients persist at one year and 15% at two years\u003c/a\u003e. Side effects account for 43.7% of discontinuations, financial barriers for 30.9%. Adherence collapses when the friction is high. An oral weight loss pill that\u0026rsquo;s cheaper, eliminates the injection barrier, and (in orforglipron\u0026rsquo;s case) carries no fasting restrictions attacks the cost and convenience frictions directly.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"oral-glp-1-pipeline\"\u003eOral GLP-1 pipeline\u003c/h2\u003e\n\u003cp\u003eThe rest of the oral GLP-1 pipeline is worth tracking, but the outcomes are uncertain. \u003ca href=\"https://www.prnewswire.com/news-releases/viking-therapeutics-announces-positive-top-line-results-from-phase-2-venture-oral-dosing-trial-of-vk2735-tablet-formulation-in-patients-with-obesity-302533355.html\"\u003eViking\u0026rsquo;s oral VK2735\u003c/a\u003e showed rapid weight loss in Phase 2 (up to 12.2% at 13 weeks), but a \u003ca href=\"https://www.biopharmadive.com/news/viking-oral-obesity-drug-results-study-discontinuationsdata-dropout/758019/\"\u003e38% discontinuation rate\u003c/a\u003e at the highest dose sent the stock down 37%. \u003ca href=\"https://ir.structuretx.com/news-releases/news-release-details/structure-therapeutics-reports-positive-topline-data-access\"\u003eStructure Therapeutics\u0026rsquo; aleniglipron\u003c/a\u003e posted 15.3% placebo-adjusted weight loss at 36 weeks in Phase 2b, competitive numbers with no plateau, and has $786 million in cash to fund Phase 3. \u003ca href=\"https://www.statnews.com/2025/04/14/pfizer-discontinue-danuglipron-glp-1-obesity-liver-toxicity/\"\u003ePfizer\u0026rsquo;s danuglipron was killed\u003c/a\u003e by liver toxicity in April 2025, the second Pfizer oral GLP-1 failure. 
\u003ca href=\"https://ir.ternspharma.com/news-releases/news-release-details/terns-pharmaceuticals-reports-topline-12-week-data-its-phase-2\"\u003eTerns Pharmaceuticals also exited\u003c/a\u003e after weak Phase 2 data and liver enzyme elevations. Behind them, Novo\u0026rsquo;s oral amycretin, a GLP-1/amylin dual agonist, enters Phase 3 in 2026 and could offer best-in-class weight loss if the oral formulation holds up. Oral small molecule GLP-1 development has a meaningful failure rate, and Foundayo\u0026rsquo;s clean safety profile across multiple Phase 3 trials is not something I\u0026rsquo;d assume the next entrant can replicate.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-oral-glp1-pipeline-png-7\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/oral-glp1-pipeline.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/oral-glp1-pipeline.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/oral-glp1-pipeline.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/oral-glp1-pipeline.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/oral-glp1-pipeline.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/oral-glp1-pipeline.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/oral-glp1-pipeline.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/oral-glp1-pipeline.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/oral-glp1-pipeline.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/oral-glp1-pipeline.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/oral-glp1-pipeline.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/oral-glp1-pipeline.png\"\n           alt=\"Oral GLP-1 pipeline 2026: Foundayo approved, aleniglipron Phase 3, VK2735 Phase 2, oral amycretin Phase 3, with Pfizer and Terns programs killed by liver toxicity\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-oral-glp1-pipeline-png-7\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/oral-glp1-pipeline.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Oral GLP-1 pipeline 2026: Foundayo approved, aleniglipron Phase 3, VK2735 Phase 2, oral amycretin Phase 3, with Pfizer and Terns programs killed by liver toxicity\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe thing that makes this market so 
interesting is that almost every important variable is in motion at the same time: form factor (injection to pill), pricing structure ($1,000 per month to $149 to potentially lower), patent protection (expiring internationally, holding domestically), competitive dynamics (Novo decelerating, Lilly sprinting with Foundayo, Hims imploding), and the macro question of Medicare coverage. I\u0026rsquo;m more confident in the structural thesis, that oral GLP-1s expand the market through a Jevons-like dynamic, than I am in picking the right entry point for any individual stock. But if forced to bet on which company is best positioned for that expansion, the answer seems clear. Lilly built the molecule that doesn\u0026rsquo;t need to fight the gut. Novo built one that fights and mostly loses.\u003c/p\u003e\n\u003cp\u003eAt 30x forward earnings for Lilly and 12.5x for Novo, there\u0026rsquo;s a version of this where Novo is the contrarian value play and Lilly is priced for perfection. I don\u0026rsquo;t think that\u0026rsquo;s the right framing. I think Novo is cheap because it has structural problems, the worst kind of cheap, and Lilly is expensive because it has structural advantages, the best kind of expensive.\u003c/p\u003e\n\u003caside class=\"disclaimer\" role=\"note\" aria-label=\"Disclaimer\"\u003e\n  \u003cdiv class=\"disclaimer-content\"\u003e\u003cp\u003e\u003cstrong\u003eDisclaimer:\u003c/strong\u003e All opinions expressed are my own. This is not investment, financial, tax, or legal advice. Past performance does not indicate future results. Do your own research and consult qualified professionals before making financial decisions. No liability accepted for any losses.\u003c/p\u003e\u003c/div\u003e\n\u003c/aside\u003e\n\n","summary":"Oral semaglutide destroys 99% of its active ingredient per dose. Lilly's Foundayo skips the problem entirely. 
Inside the $70B oral GLP-1 pill race.","image":"https://static.philippdubach.com/ograph/ograph-oral-peptides1.jpg","date_published":"2026-04-09T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Medicine"],"_philippdubach":{"type":"Analysis","word_count":1754,"reading_time_minutes":9,"keywords":["Foundayo vs oral Wegovy","orforglipron vs oral semaglutide","oral GLP-1 weight loss pill 2026","oral semaglutide bioavailability SNAC","GLP-1 market size obesity 2030","Foundayo orforglipron FDA approval 2026","Foundayo weight loss pill price","Novo Nordisk vs Eli Lilly GLP-1 2026","oral weight loss pill GLP-1","small molecule GLP-1 receptor agonist","oral semaglutide Wegovy pill 25mg","GLP-1 manufacturing cost COGS","semaglutide patent expiry 2031 biosimilar","oral GLP-1 pipeline aleniglipron VK2735 amycretin","GLP-1 adherence persistence discontinuation","anti-obesity medication market expansion","Jevons Paradox statin analogy GLP-1"],"section":"posts"}},{"id":"https://philippdubach.com/posts/ambiguity-by-design/","url":"https://philippdubach.com/posts/ambiguity-by-design/","title":"Ambiguity by Design","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-edge-of-knowledge-2-cover-jpg-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/edge-of-knowledge-2-cover.jpg 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/edge-of-knowledge-2-cover.jpg 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/edge-of-knowledge-2-cover.jpg 640w,\n            
          https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/edge-of-knowledge-2-cover.jpg 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-2-cover.jpg 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/edge-of-knowledge-2-cover.jpg 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/edge-of-knowledge-2-cover.jpg 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/edge-of-knowledge-2-cover.jpg 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-2-cover.jpg 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/edge-of-knowledge-2-cover.jpg 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/edge-of-knowledge-2-cover.jpg 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-2-cover.jpg\"\n           alt=\"Editorial illustration: a small figure at a table flanked by two large urns — a transparent glass urn full of mixed coloured marbles on the left, and an opaque dark ceramic urn on the right disappearing into a fog bank, visualizing Ellsberg\u0026#39;s paradox of known versus unknown odds\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog 
id=\"lightbox-edge-of-knowledge-2-cover-jpg-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/edge-of-knowledge-2-cover.jpg\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Editorial illustration: a small figure at a table flanked by two large urns — a transparent glass urn full of mixed coloured marbles on the left, and an opaque dark ceramic urn on the right disappearing into a fog bank, visualizing Ellsberg\u0026#39;s paradox of known versus unknown odds\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003e\u003cem\u003eInvesting at the Edge of Knowledge, Part 2 · \u003ca href=\"/posts/three-kinds-of-not-knowing/\"\u003eStart with Part 1\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eEllsberg\u0026rsquo;s urn experiment is one of the cleanest results in decision theory. \u003ca href=\"https://academic.oup.com/qje/article-abstract/75/4/643/1913802\"\u003eDaniel Ellsberg (1961)\u003c/a\u003e put two urns in front of subjects. Urn A: 50 red balls, 50 black. Urn B: 100 balls, red and black, ratio unknown. Pay $100 if you draw the right color. Most people chose Urn A, the known 50/50 bet. Fine so far. But here\u0026rsquo;s the problem: they chose Urn A regardless of which color they were betting on. Bet on red? Prefer Urn A. Bet on black? Still prefer Urn A. This is incoherent. If you think Urn B has fewer red balls (making you avoid it for a red bet), you should prefer it for a black bet. The subjects weren\u0026rsquo;t estimating probabilities at all. They were fleeing the \u003cem\u003efeeling\u003c/em\u003e of not knowing the probability. 
Ellsberg showed that people make systematically different choices when probabilities are unknown versus known, even when the unknown probabilities carry no actual informational disadvantage. \u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205821\"\u003eRichard Zeckhauser\u0026rsquo;s\u003c/a\u003e contribution was to ask:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eWhat happens to prices when an entire market makes this choice simultaneously?\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"the-experimental-evidence\"\u003eThe experimental evidence\u003c/h2\u003e\n\u003cp\u003eEllsberg\u0026rsquo;s result spawned a body of work that, six decades later, has only strengthened the original finding. \u003ca href=\"https://academic.oup.com/qje/article-abstract/110/3/585/1859203\"\u003eFox and Tversky (1995)\u003c/a\u003e added a twist that matters enormously for financial markets. Their \u0026ldquo;comparative ignorance hypothesis\u0026rdquo; showed that ambiguity aversion intensifies when people can compare themselves to someone who appears more knowledgeable. In a non-comparative setting, where subjects evaluated an ambiguous bet in isolation, ambiguity aversion largely disappeared. But the moment subjects could compare their knowledge to someone else\u0026rsquo;s, the aversion came roaring back.\u003c/p\u003e\n\u003cp\u003eIn markets, there is always someone who appears more confident. Every sell-side note, every CNBC segment, every hedge fund manager interviewed at Davos projects certainty that you don\u0026rsquo;t feel. The comparative ignorance effect is permanently activated in financial markets. You don\u0026rsquo;t just feel uncertain. You feel uncertain relative to someone who seems to know, and the gap between their apparent confidence and your honest confusion is what drives the exit decision.\u003c/p\u003e\n\u003cp\u003eZeckhauser\u0026rsquo;s own experimental evidence in his 2006 paper, \u0026ldquo;Investing in the Unknown and Unknowable,\u0026rdquo; extends this further. 
He ran lottery choice experiments comparing willingness to bet on standard probabilistic gambles versus events with unknown and unknowable (UU) outcomes (for an introduction to this framework, \u003ca href=\"/posts/three-kinds-of-not-knowing/\"\u003estart with Part 1\u003c/a\u003e). People failed to distinguish between different small probabilities of UU events, even when the expected value difference was large. The feeling of not-knowing overwhelmed the arithmetic of expected value. Separately, he documented that individuals explicitly warned about overconfidence are still surprised \u003cstrong\u003e35%\u003c/strong\u003e of the time on quantities where they should be surprised only \u003cstrong\u003e2%\u003c/strong\u003e of the time. We simultaneously know less than we think (overconfidence) and refuse to act on what we do know when probabilities are ambiguous (ambiguity aversion).\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"is-ambiguity-aversion-rational\"\u003eIs ambiguity aversion rational?\u003c/h2\u003e\n\u003cp\u003eThis turns out to be a harder question than it looks, and the answer matters for how you think about the mispricing mechanism. The case for \u0026ldquo;yes, it\u0026rsquo;s rational\u0026rdquo; is surprisingly strong. \u003ca href=\"https://www.sciencedirect.com/science/article/abs/pii/0304406889900189\"\u003eGilboa and Schmeidler (1989)\u003c/a\u003e proved that a decision maker who evaluates bets by the worst-case probability in their set of plausible priors is behaving in a way that satisfies all the standard axioms of rational choice except one: the Sure-Thing Principle that Ellsberg\u0026rsquo;s experiment violates. Their maxmin expected utility model says: if you don\u0026rsquo;t know the probability, evaluate the bet as if the probability is the worst one consistent with your information. This is formally coherent. It\u0026rsquo;s also roughly what a good risk manager does when facing an uncertain tail risk. 
\u003ca href=\"https://link.springer.com/article/10.1007/s102030200006\"\u003eBewley (2002)\u003c/a\u003e, as I discussed in \u003ca href=\"/posts/three-kinds-of-not-knowing/\"\u003ePart 1\u003c/a\u003e, showed that dropping the completeness axiom produces a framework where inertia (refusing to act) is the rational response when you cannot rank the alternatives. If you can\u0026rsquo;t tell which bet is better, sticking with the status quo isn\u0026rsquo;t lazy. It\u0026rsquo;s defensible.\u003c/p\u003e\n\u003cp\u003eThe case for \u0026ldquo;no, it\u0026rsquo;s a bias\u0026rdquo; rests on the Ellsberg experiment itself. The subjects preferred a known 50% chance over an unknown chance that they could bet on either side of. There is no informational disadvantage. The probability they\u0026rsquo;re fleeing might be 50%, might be 30%, might be 70%. But they choose which color to bet on and nobody has rigged the urn against that choice, so any symmetric belief about its contents gives the same 50% chance of winning as Urn A. The aversion is to the experience of not-knowing, not to any actual asymmetry in the bet. That looks more like a bug than a feature.\u003c/p\u003e\n\u003cp\u003eI think the answer is \u0026ldquo;it depends,\u0026rdquo; and the distinction matters. Ambiguity aversion is rational when there might be a better-informed party on the other side of the trade. If you\u0026rsquo;re buying a stock and you suspect the seller knows something you don\u0026rsquo;t, demanding a discount for your ignorance is not a bias. It\u0026rsquo;s adverse selection protection. But ambiguity aversion is irrational when you can establish that nobody knows more than you do. When the ambiguity is universal, when the entire market is confused because the state space itself is new, the discount demanded by ambiguity-averse investors is a pricing error, not a risk premium.\u003c/p\u003e\n\u003cp\u003eThis is where I land: ambiguity aversion is a sensible default that gets systematically overweighted in specific situations. The skill lies in drawing that distinction. 
And the distinction is judgment, not math.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"discomfort-as-information\"\u003eDiscomfort as information\u003c/h2\u003e\n\u003cp\u003eZeckhauser\u0026rsquo;s most counterintuitive move in the paper is turning ambiguity aversion from a problem into a signal. His Speculation 1 states it directly: \u0026ldquo;UUU investments drive off speculators, which creates the potential for an attractive low price.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThe logic is recursive. Your discomfort when facing an ambiguous situation tells you something, but not about the asset. It tells you about the competitive field. If you\u0026rsquo;re uncomfortable, most other potential buyers have already left. The very thing that makes you want to sell, the feeling of not-knowing, is the same thing that has thinned the competition and compressed the price. David Ricardo buying British government bonds on the eve of Waterloo was uncomfortable. Warren Buffett writing earthquake reinsurance for the California Earthquake Authority at roughly five times actuarial value was comfortable only because he had done this inference before: the discomfort of everyone else was the opportunity itself. Zeckhauser\u0026rsquo;s Maxim G puts it memorably:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eDiscounting for ambiguity is a natural tendency that should be overcome, just as should be overeating.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eBoth ambiguity aversion and overeating are evolved tendencies that served us well in ancestral environments and poorly in modern ones. In a small tribal group where the unknown reliably correlated with danger, fleeing ambiguity kept you alive. 
In a financial market where ambiguity-averse institutional capital mechanically exits positions it can\u0026rsquo;t model, the same instinct creates a systematic transfer of wealth from the ambiguity-averse to the ambiguity-tolerant.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://scholar.harvard.edu/files/iris_bohnet/files/trust_risk_and_betrayal.pdf\"\u003eBohnet and Zeckhauser (2004)\u003c/a\u003e identified a related mechanism they called \u0026ldquo;betrayal aversion.\u0026rdquo; People demand stronger odds when a betraying human rather than indifferent nature determines the outcome. In markets, this manifests as an extra discount demanded when the ambiguity involves a counterparty who might be exploiting your ignorance. The mere possibility that someone on the other side knows more amplifies the ambiguity premium beyond what the uncertainty alone would justify.\u003c/p\u003e\n\u003cp\u003eNow apply all of this to a more recent example, the SaaSpocalypse. I \u003ca href=\"/posts/the-saaspocalypse-paradox/\"\u003ewrote about the details\u003c/a\u003e elsewhere, but the relevant point here is the mechanism. When Anthropic released the Claude Cowork plugins in late January, institutional investors didn\u0026rsquo;t sit down and estimate the probability that AI would replace CRM. They faced something worse: they couldn\u0026rsquo;t define what \u0026ldquo;replacing CRM\u0026rdquo; would even mean. The state space was undefined, as I argued in \u003ca href=\"/posts/three-kinds-of-not-knowing/\"\u003ePart 1\u003c/a\u003e. And when the state space is undefined, the entire institutional machinery for processing uncertainty breaks down simultaneously.\u003c/p\u003e\n\u003cp\u003eFiduciary duty requires estimable risk. Compliance models require defined scenarios. Portfolio managers face career risk: losing money on a position you can\u0026rsquo;t explain is a firing offense; missing a rally in something you sold is merely embarrassing. 
The institutional constraints compound the ambiguity aversion. Each layer of oversight demands a model, and the model requires defined states, and the states don\u0026rsquo;t exist yet. The rational response for any individual institutional actor was to sell. The collective result was an IGV drawdown of \u003cstrong\u003e32%\u003c/strong\u003e while sector earnings grew \u003cstrong\u003e17%\u003c/strong\u003e, a relative strength index (RSI) of \u003cstrong\u003e18\u003c/strong\u003e, and \u003cstrong\u003e$2 trillion\u003c/strong\u003e in evaporated market cap.\u003c/p\u003e\n\u003cp\u003eThe sellers weren\u0026rsquo;t acting on information. They were acting on ambiguity aversion, amplified by comparative ignorance (everyone else seemed to be selling too), amplified by career risk (nobody gets fired for selling software before the AI disruption), amplified by betrayal aversion (maybe the AI insiders knew something the market didn\u0026rsquo;t). Stack these amplifiers on top of Ellsberg\u0026rsquo;s basic finding, and you get a price that reflects the intensity of collective discomfort rather than any assessment of fundamentals.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eZeckhauser describes the investor\u0026rsquo;s challenge with an analogy from the card game bridge: you have to make peace with good decisions that lead to bad outcomes. Buying the IGV at $80 with an 18 RSI and 17% earnings growth is, within this framework, a good decision. If it drops to $70 first, that doesn\u0026rsquo;t make it a bad decision. But making that distinction under ambiguity is not an analytical skill. It\u0026rsquo;s a temperamental one. It requires accepting that \u0026ldquo;I don\u0026rsquo;t know\u0026rdquo; is not disqualifying and that the discomfort you feel is shared, priced in, and possibly overpriced. That\u0026rsquo;s harder than any calculation.\u003c/p\u003e\n\u003cp\u003eKnowing that ambiguity aversion creates mispricing is the easy part. 
The hard part is what comes next: when you\u0026rsquo;re facing someone on the other side of a trade in a UU world, how do you figure out whether they know something you don\u0026rsquo;t, or whether they\u0026rsquo;re just less uncomfortable than you are? That\u0026rsquo;s the domain of sidecar investing and strategic inference. That\u0026rsquo;s \u003ca href=\"/posts/the-geometry-of-who-knows-what/\"\u003ePart 3\u003c/a\u003e.\u003c/p\u003e\n","summary":"Ellsberg proved people flee unknown odds. Zeckhauser showed their flight creates mispricing. Part 2 on ambiguity aversion, comparative ignorance, and investing.","image":"https://static.philippdubach.com/ograph/ograph-ambiguity-by-design.jpg","date_published":"2026-04-08T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Quantitative Finance"],"_philippdubach":{"type":"Analysis","word_count":1616,"reading_time_minutes":8,"keywords":["ambiguity aversion investing","Ellsberg paradox finance","Zeckhauser unknown unknowable","comparative ignorance Fox Tversky","Knightian uncertainty mispricing","ambiguity aversion vs risk aversion","Gilboa Schmeidler maxmin expected utility","Bewley inertia uncertainty","betrayal aversion Bohnet Zeckhauser","SaaSpocalypse IGV sell-off ambiguity","ambiguity premium equity mispricing","Ellsberg paradox explained","behavioral finance ambiguity aversion","career risk ambiguity aversion","institutional constraints uncertainty"],"section":"posts"}},{"id":"https://philippdubach.com/posts/three-kinds-of-not-knowing/","url":"https://philippdubach.com/posts/three-kinds-of-not-knowing/","title":"Three Kinds of Not-Knowing","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-edge-of-knowledge-1-cover-jpg-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture 
class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/edge-of-knowledge-1-cover.jpg 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/edge-of-knowledge-1-cover.jpg 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/edge-of-knowledge-1-cover.jpg 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/edge-of-knowledge-1-cover.jpg 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-1-cover.jpg 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/edge-of-knowledge-1-cover.jpg 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/edge-of-knowledge-1-cover.jpg 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/edge-of-knowledge-1-cover.jpg 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-1-cover.jpg 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/edge-of-knowledge-1-cover.jpg 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/edge-of-knowledge-1-cover.jpg 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/edge-of-knowledge-1-cover.jpg\"\n           alt=\"Editorial 
illustration: a charcoal silhouette of a figure stands at the edge of a fog bank looking at three lanterns hung on a wire, the leftmost burning warm ochre, the middle dim, the rightmost unlit and barely visible in the fog — encoding risk, uncertainty, and ignorance\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-edge-of-knowledge-1-cover-jpg-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/edge-of-knowledge-1-cover.jpg\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Editorial illustration: a charcoal silhouette of a figure stands at the edge of a fog bank looking at three lanterns hung on a wire, the leftmost burning warm ochre, the middle dim, the rightmost unlit and barely visible in the fog — encoding risk, uncertainty, and ignorance\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003e\u003cem\u003eInvesting at the Edge of Knowledge, Part 1\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eDavid Ricardo made a fortune buying British government bonds four days before the Battle of Waterloo. He was not a military analyst. He had no basis to compute the odds of Napoleon\u0026rsquo;s defeat, or victory, or any of the ambiguous outcomes in between. 
But he understood something that most of his contemporaries did not: the nature of his own ignorance was the same as everyone else\u0026rsquo;s, the seller was desperate, competition was thin, and the pounds he\u0026rsquo;d gain if Wellington won were worth far more than the pounds he\u0026rsquo;d lose if Wellington fell.\u003c/p\u003e\n\u003cp\u003eRicardo\u0026rsquo;s edge was not information. It was a correct assessment of what kind of not-knowing he was facing.\u003c/p\u003e\n\u003cp\u003eThat distinction, between different kinds of not-knowing, is mostly absent from finance. Richard Zeckhauser, the Frank P. Ramsey Professor of Political Economy at Harvard, made it the foundation of his 2006 paper \u0026ldquo;\u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205821\"\u003eInvesting in the Unknown and Unknowable\u003c/a\u003e,\u0026rdquo; published in \u003cem\u003eCapitalism and Society\u003c/em\u003e. The paper takes no derivatives and runs no regressions. What it does instead is more valuable: it provides a taxonomy of not-knowing, and then shows why the category that finance theory handles worst is the one where the biggest fortunes have been made.\u003c/p\u003e\n\u003cp\u003eThis is Part 1 of a five-part series that works through Zeckhauser\u0026rsquo;s framework and extends it. The goal is not a literature review. It\u0026rsquo;s an attempt to build a working vocabulary for the kind of investing that modern portfolio theory was never designed to address.\u003c/p\u003e\n\u003ch2 id=\"the-taxonomy\"\u003eThe taxonomy\u003c/h2\u003e\n\u003cp\u003eZeckhauser presents three categories of not-knowing. Each demands different skills. Each rewards a different kind of investor. And the jump between them is not a smooth gradient. 
It\u0026rsquo;s a cliff.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-three-boxes-of-not-knowing-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/three-boxes-of-not-knowing.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/three-boxes-of-not-knowing.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/three-boxes-of-not-knowing.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/three-boxes-of-not-knowing.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/three-boxes-of-not-knowing.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/three-boxes-of-not-knowing.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/three-boxes-of-not-knowing.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/three-boxes-of-not-knowing.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/three-boxes-of-not-knowing.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/three-boxes-of-not-knowing.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/three-boxes-of-not-knowing.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/three-boxes-of-not-knowing.png\"\n           alt=\"Zeckhauser\u0026#39;s three categories of not-knowing in investing: risk with known distributions, uncertainty with unknown probabilities, and ignorance where states are undefined\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-three-boxes-of-not-knowing-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/three-boxes-of-not-knowing.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Zeckhauser\u0026#39;s three categories of not-knowing in investing: risk with known distributions, uncertainty with unknown probabilities, and ignorance where states are undefined\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe first box is risk. Probabilities are known, distributions of returns are known, and the challenge is optimization. This is the world of the capital asset pricing model, of mean-variance portfolios, of the efficient frontier. You hold a 60/40 stock-bond portfolio and rebalance quarterly. The math is clean. The Nobel Prizes were awarded. Finance education lives here.\u003c/p\u003e\n\u003cp\u003eThe second box is uncertainty. You can identify the possible states of the world, but you can\u0026rsquo;t assign reliable probabilities. 
A corporate bond analyst looking at incomplete financials knows the company might default or might not, knows the recovery rate might be 40 cents or 60 cents, but can\u0026rsquo;t compute a precise probability for either. The skill that pays here is Bayesian estimation: forming the best prior you can from limited data, updating as information arrives, and having the temperament to act on imperfect beliefs. This is harder than Box 1, but it\u0026rsquo;s still recognizable territory. Decision theory was built for it.\u003c/p\u003e\n\u003cp\u003eThe third box is ignorance. Zeckhauser abbreviates it UU: unknown and unknowable. Here, even the identity of possible future states is undefined. You don\u0026rsquo;t have a distribution to estimate because you can\u0026rsquo;t enumerate what you\u0026rsquo;re estimating over. The question isn\u0026rsquo;t \u0026ldquo;what\u0026rsquo;s the probability of outcome X?\u0026rdquo; It\u0026rsquo;s \u0026ldquo;what even is X?\u0026rdquo; This is where Ricardo was standing at Waterloo. This is where Warren Buffett was standing in 1996 when he wrote a \u003ca href=\"https://www.berkshirehathaway.com/letters/1996.html\"\u003e$1.5 billion reinsurance policy\u003c/a\u003e for the California Earthquake Authority at a premium far above actuarial estimates, coverage that the capital markets had failed to place. The New York financial community couldn\u0026rsquo;t model the risk. Buffett\u0026rsquo;s insight was that nobody could, that the Authority was not better informed about seismic activity than he was, and that the price was absurdly favorable given the symmetry of ignorance.\u003c/p\u003e\n\u003cp\u003eThe boxes are not a spectrum. You don\u0026rsquo;t get from Box 2 to Box 3 by adding more uncertainty. You get there when the state space itself is undefined. In Box 2, you might not know whether a company will default, but you know that \u0026ldquo;default\u0026rdquo; and \u0026ldquo;no default\u0026rdquo; are the relevant categories. 
In Box 3, you don\u0026rsquo;t even know the categories. That\u0026rsquo;s a qualitative difference, not a quantitative one.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"why-finance-forgot-the-third-box\"\u003eWhy finance forgot the third box\u003c/h2\u003e\n\u003cp\u003eThe strange thing is that the third box was identified a century ago. Twice, independently, in the same year.\u003c/p\u003e\n\u003cp\u003eFrank Knight published \u003ca href=\"https://oll.libertyfund.org/titles/knight-risk-uncertainty-and-profit\"\u003e\u003cem\u003eRisk, Uncertainty and Profit\u003c/em\u003e\u003c/a\u003e in 1921. His central argument, the origin of what economists now call Knightian uncertainty, was that entrepreneurial profit is compensation for bearing true uncertainty: situations where probabilities cannot be meaningfully calculated. Risk, in Knight\u0026rsquo;s framework, is insurable. Uncertainty is not. The distinction is not about the degree of confidence in your estimate. It\u0026rsquo;s about whether the concept of a probability estimate even applies.\u003c/p\u003e\n\u003cp\u003eJohn Maynard Keynes published \u003ca href=\"https://archive.org/details/treatiseonprobab007528mbp\"\u003e\u003cem\u003eA Treatise on Probability\u003c/em\u003e\u003c/a\u003e the same year. His angle was different but convergent. Keynes introduced the idea of the \u0026ldquo;weight of evidence\u0026rdquo;: a thin body of evidence yields low weight even when the point estimate looks reasonable. In his \u003ca href=\"https://academic.oup.com/qje/article-abstract/51/2/209/1939387\"\u003e1937 \u003cem\u003eQuarterly Journal of Economics\u003c/em\u003e article\u003c/a\u003e, he made the distinction explicit: \u0026ldquo;By \u0026lsquo;uncertain\u0026rsquo; knowledge, let me explain, I do not mean merely to distinguish what is known for certain from what is only probable. The game of roulette is not subject, in this sense, to uncertainty.\u0026rdquo; Roulette is risky. 
The future of interest rates, the price of copper twenty years out, the obsolescence of a technology: these are uncertain in the deeper sense. The distinction mattered to Keynes, and it should matter to anyone building a portfolio.\u003c/p\u003e\n\u003cp\u003eBoth arguments lost. The discipline moved toward formalization, and formalization required calculable probabilities. The efficient markets hypothesis, rational expectations, CAPM, Black-Scholes: all of these live in Box 1 or assume that Box 2 can be reduced to Box 1 with sufficient data and computing power. This isn\u0026rsquo;t a criticism of these models within their domain. They\u0026rsquo;re brilliant engineering for the problems they were designed to solve. It\u0026rsquo;s a claim about the boundaries of that domain, and about how much of real-world investing sits outside it.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://doi.org/10.1086/261461\"\u003eLeRoy and Singell (1987)\u003c/a\u003e offered a provocative reinterpretation in the \u003cem\u003eJournal of Political Economy\u003c/em\u003e: Knight\u0026rsquo;s real distinction, they argued, was about insurability, not probability. Uncertainty describes situations where insurance markets collapse because of moral hazard and adverse selection, not simply because probabilities are subjective. This reading is more radical than the standard one. It says the breakdown isn\u0026rsquo;t epistemic (we don\u0026rsquo;t know enough) but structural (the market itself can\u0026rsquo;t price the risk). 
That structural breakdown is precisely what happened in 1996 when Wall Street couldn\u0026rsquo;t write the California earthquake policy, and again in 2025 when insurance markets in parts of the American Southeast and West simply stopped functioning.\u003c/p\u003e\n\u003cp\u003eKay and King picked up this thread in their 2020 book \u003ca href=\"https://wwnorton.com/books/9781324004776\"\u003e\u003cem\u003eRadical Uncertainty\u003c/em\u003e\u003c/a\u003e, arguing that the conflation of risk and uncertainty has caused systematic mismanagement across finance and policy. Their prescription is \u0026ldquo;narrative reasoning\u0026rdquo; rather than probabilistic optimization for decisions facing genuine uncertainty. I\u0026rsquo;m not sure narrative reasoning is sufficient, but I\u0026rsquo;m confident that probabilistic optimization is insufficient. The honest position is somewhere in between, and Zeckhauser\u0026rsquo;s framework gives you the vocabulary to think about where.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://link.springer.com/article/10.1007/s102030200006\"\u003eBewley (2002)\u003c/a\u003e formalized the problem differently. Working from a \u003ca href=\"https://elischolar.library.yale.edu/cowles-discussion-paper-series/1050/\"\u003e1986 Cowles Foundation paper\u003c/a\u003e, he dropped the completeness axiom from expected utility theory. In standard theory, you can always rank alternatives: you prefer A to B, or B to A, or you\u0026rsquo;re indifferent. Bewley said: sometimes you simply can\u0026rsquo;t rank them. When alternatives are incomparable, sticking with the status quo is rational, not a bias. 
This gives mathematical expression to something practitioners know in their bones: there\u0026rsquo;s a difference between \u0026ldquo;I\u0026rsquo;m going to hold because I think the price will go up\u0026rdquo; and \u0026ldquo;I\u0026rsquo;m going to hold because I have no coherent basis for predicting what will happen and the cost of acting without a basis is higher than the cost of staying put.\u0026rdquo;\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"why-knightian-uncertainty-is-growing\"\u003eWhy Knightian uncertainty is growing\u003c/h2\u003e\n\u003cp\u003eThis hundred-year-old taxonomy feels more relevant in 2026 than it did in 2006. Technological change creates entirely new categories of outcomes faster than models can absorb them. The state space itself is expanding.\u003c/p\u003e\n\u003cp\u003eI wrote recently about \u003ca href=\"/posts/the-saaspocalypse-paradox/\"\u003ethe SaaSpocalypse paradox\u003c/a\u003e: the market simultaneously pricing AI capex failure and AI destroying all enterprise software, when both cannot be true. That sell-off is a textbook example of Zeckhauser\u0026rsquo;s third box. The problem wasn\u0026rsquo;t that investors struggled to estimate the probability of known outcomes. The problem was that the outcomes themselves were undefined. What does \u0026ldquo;CRM\u0026rdquo; mean when AI agents replace human users? What does \u0026ldquo;per-seat licensing\u0026rdquo; mean when the number of seats might go to zero or might multiply by ten as agents proliferate? What does \u0026ldquo;enterprise software moat\u0026rdquo; mean when the moat was always the trained-user interface and the interface is now natural language? These aren\u0026rsquo;t questions with difficult probability estimates. They\u0026rsquo;re questions where the categories haven\u0026rsquo;t been invented yet.\u003c/p\u003e\n\u003cp\u003eNobody in January 2026 could enumerate the states of the world for enterprise software post-Claude Cowork plugins. 
Not \u0026ldquo;the probabilities were hard to estimate.\u0026rdquo; The states themselves were undefined. That\u0026rsquo;s not Box 2. That\u0026rsquo;s Box 3.\u003c/p\u003e\n\u003cp\u003eAnd Box 3 is where the IGV software ETF fell \u003cstrong\u003e32%\u003c/strong\u003e from its September peak, where hedge funds made \u003cstrong\u003e$24 billion\u003c/strong\u003e shorting the sector, where the RSI hit \u003cstrong\u003e18\u003c/strong\u003e (the most oversold reading in the ETF\u0026rsquo;s history), and where earnings growth continued at \u003cstrong\u003e17%\u003c/strong\u003e. The disconnect between operating results and market prices is exactly what Zeckhauser\u0026rsquo;s framework predicts: when the state space is undefined, investors who require defined state spaces to make decisions leave the market. Their departure compresses prices beyond what any fundamental analysis would justify. The mispricing lives in the gap between what the asset is worth and what institutions are able to hold.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThis pattern will recur. AI is not the last technology that will generate new categories of outcomes that nobody anticipated. Every time it happens, the same sequence plays out: Box 3 conditions emerge, institutions flee because their models require Box 1 or Box 2 inputs, prices overshoot, and unconstrained investors who understand the nature of their own ignorance pick up the pieces. Zeckhauser wrote his paper two decades ago. The mechanism he described has, if anything, accelerated.\u003c/p\u003e\n\u003cp\u003eThe taxonomy tells you what kind of problem you\u0026rsquo;re facing. It doesn\u0026rsquo;t tell you what to do about it. That requires understanding why most investors run from Box 3, and whether running is rational. That\u0026rsquo;s \u003ca href=\"/posts/ambiguity-by-design/\"\u003ePart 2\u003c/a\u003e.\u003c/p\u003e\n","summary":"Knightian uncertainty splits not-knowing into risk, uncertainty, and ignorance. 
A century after Knight and Keynes, most of investing still ignores the split.","image":"https://static.philippdubach.com/ograph/ograph-edge-of-knowledge.jpg","date_published":"2026-04-04T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Quantitative Finance"],"_philippdubach":{"type":"Analysis","word_count":1706,"reading_time_minutes":9,"keywords":["Knightian uncertainty investing","risk vs uncertainty investing","risk uncertainty ignorance framework","Zeckhauser unknown unknowable","investing under uncertainty","unknown unknowns investing","radical uncertainty finance","Frank Knight risk uncertainty profit","Keynes weight of evidence","Knightian uncertainty explained","unknown unknowable UU investing","decision making under uncertainty finance","Buffett California earthquake reinsurance","Kay King radical uncertainty","Ricardo Waterloo bonds","enterprise software AI uncertainty 2026","types of uncertainty investing"],"section":"posts"}},{"id":"https://philippdubach.com/posts/on-device-ai-models-will-be-the-new-reason-to-upgrade-your-phone/","url":"https://philippdubach.com/posts/on-device-ai-models-will-be-the-new-reason-to-upgrade-your-phone/","title":"On-Device AI Models Will Be The New Reason to Upgrade Your Phone","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-chip-cover-jpg-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/chip-cover.jpg 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/chip-cover.jpg 480w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/chip-cover.jpg 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/chip-cover.jpg 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chip-cover.jpg 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/chip-cover.jpg 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/chip-cover.jpg 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/chip-cover.jpg 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chip-cover.jpg 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/chip-cover.jpg 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/chip-cover.jpg 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chip-cover.jpg\"\n           alt=\"Editorial cover illustration for an analysis of on-device AI models as the new smartphone upgrade driver\"\n           class=\"\"\n           width=\"1200\"\n           \n           fetchpriority=\"high\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-chip-cover-jpg-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/chip-cover.jpg\"\u003e\n  
\u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Editorial cover illustration for an analysis of on-device AI models as the new smartphone upgrade driver\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe iPhone 17 runs a \u003ca href=\"https://machinelearning.apple.com/research/introducing-apple-foundation-models\"\u003e3 billion parameter language model on-device\u003c/a\u003e at 30 tokens per second. Obviously, the average consumer has no idea what that sentence means, and Apple hasn\u0026rsquo;t figured out how to make them care.\u003c/p\u003e\n\u003cp\u003eI believe that\u0026rsquo;s about to change. Apple now has \u003ca href=\"https://9to5mac.com/2026/03/25/new-details-on-apple-google-ai-deal-revealed-including-gemini-changes-report/\"\u003ecomplete access to Google\u0026rsquo;s Gemini model\u003c/a\u003e in its own data centers, with \u003ca href=\"https://www.theinformation.com/newsletters/ai-agenda/apple-can-distill-googles-big-gemini-model\"\u003ethe ability to distill it into smaller models\u003c/a\u003e built for iPhones and iPads. Knowledge distillation works like this: you take a large model (the teacher), have it perform tasks with detailed reasoning, then train a smaller model (the student) on those reasoning traces until it learns to mimic the teacher. The student ends up far more capable than if you\u0026rsquo;d trained it from scratch on the same data. Apple can now do this with the full Gemini, not just its own in-house models, and the distilled output runs locally. No internet required.\u003c/p\u003e\n\u003cp\u003eSmartphones haven\u0026rsquo;t had a real upgrade story in years. The camera is great. The screen is great. The processor was fast enough three generations ago. 
\u003ca href=\"https://www.sellcell.com/blog/how-often-do-people-upgrade-their-phone/\"\u003eBattery life has overtaken price as the top purchase driver\u003c/a\u003e for the first time. The global \u003ca href=\"https://sqmagazine.co.uk/smartphone-statistics/\"\u003ereplacement cycle has stretched to 3.5 years\u003c/a\u003e. People hold onto their phones because nothing about the new one feels different enough. \u003ca href=\"https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2025/gen-ai-on-smartphones.html\"\u003eDeloitte\u0026rsquo;s 2025 TMT Predictions report\u003c/a\u003e frames on-device generative AI as the feature that could break this cycle, if the experience delivers on the promise. On-device AI might become the next reason to upgrade.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-spec\"\u003eThe spec\u003c/h2\u003e\n\u003cp\u003eIn the late 1990s and early 2000s it was megahertz: Intel and AMD raced clock speeds past the point where consumers could distinguish real-world performance differences, but the number on the box still drove purchases. Then it was megapixels. Samsung shipped a \u003ca href=\"https://semiconductor.samsung.com/news-events/tech-blog/isocell-hp3-200mp-image-sensor-for-epic-details/\"\u003e200 MP camera sensor\u003c/a\u003e knowing that phones using it combine 16 pixels into one (16-to-1 binning) and output a \u003cstrong\u003e12.5 MP\u003c/strong\u003e image by default.\u003c/p\u003e\n\u003cp\u003eParameters could be next. The standard \u003ca href=\"https://www.apple.com/iphone-17/specs/\"\u003eiPhone 17, with the A19 chip,\u003c/a\u003e has 8GB of RAM. The \u003ca href=\"https://www.apple.com/iphone-17-pro/specs/\"\u003ePro gets 12GB\u003c/a\u003e and faster memory bandwidth: capacity caps how large a model the phone can hold, and bandwidth sets how quickly it generates text. 
Samsung\u0026rsquo;s 2026 flagships with the \u003ca href=\"https://semiconductor.samsung.com/processor/mobile-processor/exynos-2600/\"\u003eExynos 2600 hit \u003cstrong\u003e80 TOPS\u003c/strong\u003e\u003c/a\u003e on a 2nm process, more than double the prior generation. These are already the numbers in press releases. It\u0026rsquo;s not hard to imagine an Apple keynote where someone says, with rehearsed enthusiasm, that the iPhone 18 Pro runs a 7 billion parameter model while the standard model is limited to 3 billion.\u003c/p\u003e\n\u003cp\u003eThe difference from previous spec wars is that this one might actually correlate with user experience. Megahertz past a certain threshold didn\u0026rsquo;t make Word open faster. Megapixels past 12 MP didn\u0026rsquo;t make photos look better on a phone screen. But a 7 billion parameter model running locally outperforms a 3 billion one on nearly every task. It handles longer documents, follows more complex instructions, holds better conversational context.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"breaking-the-stalemate\"\u003eBreaking the stalemate\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://www.gartner.com/en/newsroom/press-releases/2025-09-09-gartner-says-worldwide-generative-artificial-intelligence-smartphone-end-user-spending-to-total-us-dollars-298-billion-by-the-end-of-2025\"\u003eGartner projects\u003c/a\u003e GenAI smartphone spending will reach \u003cstrong\u003e$393 billion\u003c/strong\u003e in 2026, up 32% from \u003cstrong\u003e$298 billion\u003c/strong\u003e in 2025. \u003ca href=\"https://my.idc.com/getdoc.jsp?containerId=prUS52478124\"\u003eIDC reports\u003c/a\u003e GenAI smartphone shipments growing \u003cstrong\u003e73%\u003c/strong\u003e year over year. \u003ca href=\"https://finance.yahoo.com/news/exclusive-samsung-double-mobile-devices-030312758.html\"\u003eSamsung has publicly committed\u003c/a\u003e to 800 million AI-enabled devices by end of 2026, doubling its 2025 footprint. 
\u003ca href=\"https://www.cnbc.com/2024/12/13/apple-is-a-top-pick-for-2025-as-ai-will-drive-iphone-upgrade-cycle-morgan-stanley-says.html\"\u003eMorgan Stanley\u0026rsquo;s latest survey\u003c/a\u003e found iPhone upgrade intentions at \u003cstrong\u003e37%\u003c/strong\u003e, an all-time high, with FY26 shipment forecasts of 260 million units sitting 3% above Street consensus.\u003c/p\u003e\n\u003cp\u003eOn-device AI creates hard hardware requirements in a way that camera improvements and screen upgrades never did. You cannot run Apple\u0026rsquo;s 3 billion parameter model on an iPhone 14. The Neural Engine isn\u0026rsquo;t powerful enough and the memory bandwidth isn\u0026rsquo;t there. \u003ca href=\"https://support.apple.com/en-us/121115\"\u003eApple Intelligence requires an A17 Pro or later\u003c/a\u003e, which means the feature itself creates an upgrade floor. Every year that floor rises. If Apple ships distilled Gemini models that need the 12GB of RAM in its A19 Pro phones, every iPhone released before 2025 is locked out.\u003c/p\u003e\n\u003cp\u003eThe Gemini deal matters for the hardware cycle because of the distillation pipeline. Apple doesn\u0026rsquo;t need to build frontier-scale models from scratch. It can take Gemini\u0026rsquo;s best capabilities, run them through distillation, and compress the results into models sized for its hardware tiers. A 3 billion parameter model for the standard iPhone. A 5 billion version for the Pro. Maybe a 10 billion model for a future iPad Pro with enough memory and thermal headroom.\u003c/p\u003e\n\u003cp\u003eGoogle is playing a similar game from the other side. \u003ca href=\"https://en.wikipedia.org/wiki/Gemini_(language_model)\"\u003eGemini Nano launched in two sizes\u003c/a\u003e: Nano-1 at 1.8 billion parameters for low-memory devices and Nano-2 at 3.25 billion for high-memory ones. 
Samsung\u0026rsquo;s \u003ca href=\"https://news.samsung.com/global/samsung-unveils-galaxy-s26-series-the-most-intuitive-galaxy-ai-phone-yet\"\u003eGalaxy S26 ships with on-device Gemini\u003c/a\u003e running on NPUs that are 39% faster than the prior generation. On-device models get larger every hardware generation. Each generation\u0026rsquo;s models don\u0026rsquo;t run well on older hardware. You see where this goes.\u003c/p\u003e\n\u003cp\u003eI find it plausible that within two product cycles, on-device model capability becomes the primary differentiator between phone tiers and between generations. The data isn\u0026rsquo;t there yet: \u003ca href=\"https://www.twice.com/research/the-smartphone-upgrade-cycle-slows\"\u003eonly 17% of Americans\u003c/a\u003e say AI is a major purchase influence today, Apple Intelligence \u003ca href=\"https://finance.yahoo.com/markets/stocks/articles/morgan-stanley-stark-message-investors-164700952.html\"\u003eranked seventh globally\u003c/a\u003e as a reason to upgrade in Morgan Stanley\u0026rsquo;s survey, and \u003ca href=\"https://www.phonearena.com/news/is-the-ai-boom-destroying-your-next-flagship-phones-value_id176913\"\u003eover 40% of users\u003c/a\u003e have privacy concerns about smartphone AI, with half unwilling to pay extra for it. But you can\u0026rsquo;t tell the difference between a 48 MP photo and a 12 MP photo on your phone screen. You can absolutely tell the difference between an AI assistant that understands your question and one that doesn\u0026rsquo;t. The feedback loop is immediate and personal. If the bigger model actually works better, and if the distillation pipeline from Gemini delivers real capability gains, the upgrade incentive is self-reinforcing. 
People will upgrade not because the spec sheet says they should, but because they tried their friend\u0026rsquo;s phone and the AI was better.\u003c/p\u003e\n\u003cp\u003eWhether this arrives with iOS 27 this fall or takes another generation to mature, I don\u0026rsquo;t know. But the next reason to buy a new phone will much more likely be the model than the camera.\u003c/p\u003e\n","summary":"Smartphones haven't had a compelling upgrade story in years. On-device AI models, distilled from frontier systems like Gemini, are about to change that. Parameters are the new megapixels.","image":"https://static.philippdubach.com/ograph/ograph-on-device-models.jpg","date_published":"2026-03-25T00:00:00Z","date_modified":"2026-05-09T19:32:00+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI","Tech"],"_philippdubach":{"type":"Commentary","word_count":985,"reading_time_minutes":5,"keywords":["on-device AI smartphone","Apple Gemini distillation","smartphone upgrade cycle AI 2026","Apple Foundation Model 3 billion parameters","NPU TOPS smartphone 2026","iPhone on-device AI model parameters","parameter count smartphone marketing","megapixel myth AI equivalent","Gemini Nano on-device models","Apple Intelligence upgrade cycle","knowledge distillation AI iPhone","on-device LLM smartphone 2026","smartphone AI hardware differentiation","smartphone spec war AI parameters"],"section":"posts"}},{"id":"https://philippdubach.com/posts/ai-can-now-design-drugs-in-seconds-we-still-cant-tell-you-if-they-work./","url":"https://philippdubach.com/posts/ai-can-now-design-drugs-in-seconds-we-still-cant-tell-you-if-they-work./","title":"AI Can Now Design Drugs in Seconds; We Still Can't Tell You If They Work.","content_html":"\u003cblockquote\u003e\n\u003cp\u003eNo AI-discovered drug has ever received FDA approval. 
That sentence should sit uncomfortably next to every headline about Alphabet\u0026rsquo;s drug discovery spinoff.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eOn February 10, \u003ca href=\"https://www.isomorphiclabs.com/articles/the-isomorphic-labs-drug-design-engine-unlocks-a-new-frontier\"\u003eIsomorphic Labs\u003c/a\u003e, the Google DeepMind spinoff focused on computational drug design, released IsoDDE, its Drug Design Engine. This isn\u0026rsquo;t a single model or an AlphaFold upgrade. IsoDDE is a unified in silico drug discovery system that runs protein structure prediction, ligand binding, affinity estimation, and pocket identification in concert, generating in seconds what used to take days of physics-based simulation. On the hardest tasks in the \u0026ldquo;Runs N\u0026rsquo; Poses\u0026rdquo; benchmark, which is designed to test generalization to unfamiliar proteins, IsoDDE hits a \u003cstrong\u003e50%\u003c/strong\u003e success rate. AlphaFold 3 manages roughly \u003cstrong\u003e23%\u003c/strong\u003e. On antibody-antigen modeling, IsoDDE beats AlphaFold 3 by 2.3× and the open-source Boltz-2 by 19.8×. On binding affinity prediction, it achieves a Pearson correlation of 0.85, beating the 0.78 of FEP+, the physics-based gold standard. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-isodde-benchmark-performance-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/isodde-benchmark-performance.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/isodde-benchmark-performance.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/isodde-benchmark-performance.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/isodde-benchmark-performance.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/isodde-benchmark-performance.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/isodde-benchmark-performance.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/isodde-benchmark-performance.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/isodde-benchmark-performance.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/isodde-benchmark-performance.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/isodde-benchmark-performance.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/isodde-benchmark-performance.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/isodde-benchmark-performance.png\"\n           alt=\"IsoDDE benchmark performance: 50% protein-ligand prediction vs AlphaFold 3 at 23%, 2.3x antibody-antigen improvement, 0.85 binding affinity correlation vs FEP\u0026#43; at 0.78\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-isodde-benchmark-performance-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/isodde-benchmark-performance.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"IsoDDE benchmark performance: 50% protein-ligand prediction vs AlphaFold 3 at 23%, 2.3x antibody-antigen improvement, 0.85 binding affinity correlation vs FEP\u0026#43; at 0.78\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n These improvements look large enough that the computational bottleneck in drug design may no longer be the binding question.\u003c/p\u003e\n\u003ch2 id=\"what-pharma-believes\"\u003eWhat pharma believes\u003c/h2\u003e\n\u003cp\u003eIsomorphic has signed partnerships with \u003ca href=\"https://www.prnewswire.com/news-releases/isomorphic-labs-announces-strategic-multi-target-research-collaboration-with-lilly-302027392.html\"\u003eEli Lilly\u003c/a\u003e, \u003ca 
href=\"https://www.prnewswire.com/news-releases/isomorphic-labs-announces-strategic-multi-target-research-collaboration-with-novartis-302027387.html\"\u003eNovartis\u003c/a\u003e, and \u003ca href=\"https://pharmaphorum.com/news/jj-bets-isomorphic-ai-powered-drug-hunt\"\u003eJohnson \u0026amp; Johnson\u003c/a\u003e worth a combined \u003cstrong\u003e$4 billion+\u003c/strong\u003e in potential value. But look at the structure. Lilly paid $45 million upfront against $1.7 billion in milestones. Novartis paid $37.5 million upfront against $1.2 billion. Across the two deals, that\u0026rsquo;s roughly a 35:1 ratio between what pharma promises in biobucks and what it actually wires. \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-isomorphic-deal-structure-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/isomorphic-deal-structure.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/isomorphic-deal-structure.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/isomorphic-deal-structure.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/isomorphic-deal-structure.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/isomorphic-deal-structure.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/isomorphic-deal-structure.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/isomorphic-deal-structure.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/isomorphic-deal-structure.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/isomorphic-deal-structure.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/isomorphic-deal-structure.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/isomorphic-deal-structure.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/isomorphic-deal-structure.png\"\n           alt=\"Isomorphic Labs pharma deal structure: Eli Lilly $45M upfront vs $1.7B milestones, Novartis $37.5M vs $1.2B\u0026#43;, totaling $4B headline value against $82.5M cash\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-isomorphic-deal-structure-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/isomorphic-deal-structure.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Isomorphic Labs pharma deal structure: Eli Lilly $45M upfront vs $1.7B milestones, Novartis $37.5M vs $1.2B\u0026#43;, totaling $4B headline value against $82.5M cash\" 
decoding=\"async\"\u003e\n\u003c/dialog\u003e\n This ratio is standard across AI drug discovery deals in 2025. Pharma is enthusiastic enough to sign but cautious enough to make nearly all the economics contingent on clinical results that don\u0026rsquo;t exist yet. The upfront payments fund research. The milestone payments are structured so that pharma loses almost nothing if the drugs fail. The royalties only matter if a drug reaches blockbuster status, which has never happened for an AI-designed molecule.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.isomorphiclabs.com/articles/isomorphic-labs-announces-novartis-collaboration-expansion\"\u003eNovartis expanded its partnership in February 2025\u003c/a\u003e, doubling the number of programs to six, focused on what Novartis described as \u0026ldquo;particularly challenging\u0026rdquo; and previously undruggable targets, on the same financial terms. That\u0026rsquo;s a positive signal: it suggests internal results impressed Novartis scientists enough to commit more targets. The J\u0026amp;J deal, announced January 2026, goes further, covering small molecules, antibodies, peptides, and molecular glues. But \u0026ldquo;expanded partnerships\u0026rdquo; and \u0026ldquo;approved drugs\u0026rdquo; remain separated by the most unforgiving filter in business: human biology.\u003c/p\u003e\n\u003ch2 id=\"phase-ii-wall\"\u003ePhase II wall\u003c/h2\u003e\n\u003cp\u003eMost commentary on AI drug discovery stops too early. \u003ca href=\"https://www.sciencedirect.com/science/article/pii/S135964462400134X\"\u003eJayatunga et al. (2024)\u003c/a\u003e, in the first systematic analysis of AI-discovered drugs in clinical trials, showed AI-discovered molecules achieving \u003cstrong\u003e80-90%\u003c/strong\u003e success rates in Phase I trials, well above the historical 40-65% average. 
AI is good at designing molecules that are safe and have decent pharmacokinetic properties: they get absorbed, distributed, metabolized, and excreted the way you\u0026rsquo;d want. Phase I is mostly about safety. AI passes it.\u003c/p\u003e\n\u003cp\u003eBut Phase II is about efficacy. Does the drug actually treat the disease? And here the numbers are sobering: AI-discovered drugs show roughly 40% Phase II success rates, which is \u003ca href=\"https://www.science.org/content/blog-post/ai-drugs-so-far\"\u003eabout the same as traditionally discovered drugs\u003c/a\u003e. AI has not yet demonstrated it can predict whether a molecule will work in a patient, only that it can predict whether a molecule will be tolerable in a patient. \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ai-drug-phase2-wall-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ai-drug-phase2-wall.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ai-drug-phase2-wall.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ai-drug-phase2-wall.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ai-drug-phase2-wall.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-drug-phase2-wall.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ai-drug-phase2-wall.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ai-drug-phase2-wall.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ai-drug-phase2-wall.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-drug-phase2-wall.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ai-drug-phase2-wall.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ai-drug-phase2-wall.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-drug-phase2-wall.png\"\n           alt=\"AI drug clinical trial success rates: 80-90% Phase I vs 40-65% traditional, but roughly 40% Phase II for both AI and traditional, projecting 9-18% end-to-end vs historical 5-10%\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ai-drug-phase2-wall-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ai-drug-phase2-wall.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"AI drug clinical trial success rates: 80-90% Phase I vs 40-65% traditional, but roughly 40% Phase II for both AI and traditional, projecting 9-18% end-to-end vs historical 5-10%\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n If both 
trends hold, end-to-end success rates could rise from the historical 5-10% to something like 9-18%. That would roughly double R\u0026amp;D productivity, which in a trillion-dollar industry is worth an enormous amount. \u003ca href=\"https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier\"\u003eMcKinsey estimates\u003c/a\u003e generative AI could generate $60-110 billion annually in economic value for pharma and medical products. But it\u0026rsquo;s a far cry from the narrative that generative AI will \u0026ldquo;solve\u0026rdquo; drug discovery. It would make drug development somewhat cheaper and faster. An improvement, not a revolution.\u003c/p\u003e\n\u003cp\u003eThe counterargument, and it\u0026rsquo;s a reasonable one, is that IsoDDE represents a qualitative leap that could crack the efficacy problem. Its ability to model induced fits, where proteins reshape to accommodate a drug, and to identify cryptic binding pockets, like the cereblon site that took experimentalists 15 years to find, means it\u0026rsquo;s capturing biological dynamics that earlier AI systems missed entirely. If better structural understanding translates to better efficacy prediction, the Phase II wall might eventually come down.\u003c/p\u003e\n\u003cp\u003eI find this plausible but unproven. We\u0026rsquo;ll know more when Isomorphic\u0026rsquo;s first candidates enter trials, targeted for late 2026.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"where-isomorphic-fits-in-the-competitive-stack\"\u003eWhere Isomorphic fits in the competitive stack\u003c/h2\u003e\n\u003cp\u003eIsomorphic\u0026rsquo;s competitive position is unusual. It leads on computational benchmarks but trails on clinical progress. 
\u003ca href=\"https://insilico.com/\"\u003eInsilico Medicine\u003c/a\u003e has the most advanced clinical portfolio: its IPF drug ISM001-055 (now called rentosertib) reached Phase IIa with \u003ca href=\"https://www.nature.com/articles/s41591-025-03743-2\"\u003epositive results published in \u003cem\u003eNature Medicine\u003c/em\u003e in June 2025\u003c/a\u003e, and Insilico has 10+ IND approvals across 31 programs. \u003ca href=\"https://ir.recursion.com/news-releases/news-release-details/recursion-and-exscientia-two-leaders-ai-drug-discovery-space\"\u003eRecursion Pharmaceuticals\u003c/a\u003e, which \u003ca href=\"https://pharmaphorum.com/news/ai-biotechs-exscientia-and-recursion-agree-688m-merger\"\u003eabsorbed Exscientia in a $688 million merger\u003c/a\u003e, takes a different approach entirely, running millions of phenomics experiments weekly and drawing on 65 petabytes of biological imaging data. Both companies own wet-lab infrastructure that Isomorphic lacks.\u003c/p\u003e\n\u003cp\u003eWhat Isomorphic has: the AlphaFold lineage, Alphabet-scale compute, and a unified architecture where each prediction task informs the others. On talent, the company appears to be doing well: 4.7/5 on Glassdoor, 100% CEO approval. In June 2025 it hired Dr. Ben Wolf as CMO; he was previously at Relay Therapeutics and has FDA approval experience with Ayvakit and Gavreto. It also opened an office in Cambridge, Massachusetts. These are the moves of a company staffing up for clinical reality, not just publishing papers.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThe open-source threat is real but manageable in the near term. 
\u003ca href=\"https://techcrunch.com/2026/01/16/from-openais-offices-to-a-deal-with-eli-lilly-how-chai-discovery-became-one-of-the-flashiest-names-in-ai-drug-development/\"\u003eChai Discovery\u003c/a\u003e (OpenAI-backed, valued at \u003ca href=\"https://techcrunch.com/2025/12/15/openai-backed-biotech-firm-chai-discovery-raises-130m-series-b-at-1-3b-valuation/\"\u003e$1.3 billion\u003c/a\u003e, and now partnered with Lilly on biologics) and \u003ca href=\"https://www.genengnews.com/topics/artificial-intelligence/pharma-bets-big-on-ai-platforms-with-flurry-of-new-year-deals/\"\u003eBoltz\u003c/a\u003e (partnered with Pfizer) are both making progress. But the gap between IsoDDE\u0026rsquo;s numbers and the best open-source alternatives is wide enough that Isomorphic has time, maybe 18-24 months, to convert its computational lead into clinical evidence before the field catches up.\u003c/p\u003e\n\u003ch2 id=\"alphabets-asymmetric-position\"\u003eAlphabet\u0026rsquo;s asymmetric position\u003c/h2\u003e\n\u003cp\u003eFor Alphabet, Isomorphic is a rounding error that could become a franchise. The Other Bets segment posted a $3.6 billion operating loss in 2025. Alphabet\u0026rsquo;s net income was $132 billion. The \u003ca href=\"https://www.isomorphiclabs.com/articles/isomorphic-labs-announces-600m-external-investment-round\"\u003e$600 million funding round\u003c/a\u003e led by Thrive Capital in March 2025 suggests the company understands the urgency of getting to the clinic. Still, Alphabet can sustain this bet indefinitely while the underlying science matures, and that patience is itself a competitive advantage most biotech startups don\u0026rsquo;t have. Does better computation translate to better medicine? IsoDDE\u0026rsquo;s benchmarks are the best evidence so far that AI can model molecular interactions at this resolution. 
But Demis Hassabis \u003ca href=\"https://www.isomorphiclabs.com/our-tech\"\u003esaid it himself\u003c/a\u003e:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eWe know we\u0026rsquo;re never going to solve drug design with AlphaFold alone. We\u0026rsquo;ll need half a dozen more breakthroughs of that magnitude.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eIsoDDE might be one of those breakthroughs. The clinical data, when it arrives, will tell us whether it\u0026rsquo;s the kind that matters.\u003c/p\u003e\n","summary":"IsoDDE doubles AlphaFold 3 on hard benchmarks and beats physics-based gold standards. But no AI drug has FDA approval. What $4B in pharma deals actually mean.","image":"https://static.philippdubach.com/ograph/ograph-isomorphic-labs-isodde1.jpg","date_published":"2026-03-18T00:00:00Z","date_modified":"2026-04-09T18:32:13+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Medicine","AI"],"_philippdubach":{"type":"Article","word_count":1116,"reading_time_minutes":6,"keywords":["Isomorphic Labs IsoDDE","IsoDDE vs AlphaFold 3","AI drug discovery clinical trials success rate","AI designed drug FDA approval","Isomorphic Labs Eli Lilly Novartis partnership","AI drug discovery Phase II wall","binding affinity prediction FEP+","computational drug design","in silico drug discovery","AI drug Phase 2 success rate","generative AI drug discovery","Insilico Medicine rentosertib","Recursion Exscientia merger","Alphabet Other Bets Isomorphic Labs","undruggable targets AI","Demis Hassabis drug design","AI pharma deal structure biobucks","Chai Discovery Boltz AI biotech","IsoDDE benchmark performance","Phase II efficacy AI drugs"],"section":"posts"}},{"id":"https://philippdubach.com/posts/the-last-architecture-designed-by-hand/","url":"https://philippdubach.com/posts/the-last-architecture-designed-by-hand/","title":"The Last Architecture Designed by Hand","content_html":"\u003cblockquote\u003e\n\u003cp\u003eI 
bet there is another new architecture to find that is gonna be as big of a gain as transformers were over LSTMs.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eSam Altman, the CEO of the company most invested in the transformer, is telling a room of students it isn\u0026rsquo;t the final form. So what comes after the transformer? He\u0026rsquo;s probably right that something will, and the evidence is no longer anecdotal. Several recent papers have proved that the transformer\u0026rsquo;s worst properties are structural: not engineering problems to be fixed with better data or more compute, but mathematical lower bounds.\u003c/p\u003e\n\u003cp\u003eThe transformer, born from the 2017 paper \u003ca href=\"https://arxiv.org/abs/1706.03762\"\u003e\u0026ldquo;Attention Is All You Need,\u0026rdquo;\u003c/a\u003e took us from barely coherent GPT-2 to GPT-4 in four years. An extraordinary run. But \u003ca href=\"https://arxiv.org/abs/2209.04881\"\u003eDuman Keles et al.\u003c/a\u003e proved that O(n²) attention complexity isn\u0026rsquo;t an implementation detail. It\u0026rsquo;s a necessary lower bound unless a foundational conjecture in complexity theory, the Strong Exponential Time Hypothesis, turns out to be wrong. Double the context, quadruple the cost. The KV cache for a 70B model at one-million-token context eats roughly \u003cstrong\u003e320 GB\u003c/strong\u003e of GPU memory. 
That is four times the 80 GB on a single Nvidia H100.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-last-architecture-quadratic-attention-1-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/last-architecture-quadratic-attention-1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/last-architecture-quadratic-attention-1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/last-architecture-quadratic-attention-1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/last-architecture-quadratic-attention-1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/last-architecture-quadratic-attention-1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/last-architecture-quadratic-attention-1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/last-architecture-quadratic-attention-1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/last-architecture-quadratic-attention-1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/last-architecture-quadratic-attention-1.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/last-architecture-quadratic-attention-1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/last-architecture-quadratic-attention-1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/last-architecture-quadratic-attention-1.png\"\n           alt=\"Quadratic attention scaling: a 4x4 attention matrix requires 16 computations while an 8x8 matrix requires 64, showing how doubling context quadruples cost in transformer architectures\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-last-architecture-quadratic-attention-1-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/last-architecture-quadratic-attention-1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Quadratic attention scaling: a 4x4 attention matrix requires 16 computations while an 8x8 matrix requires 64, showing how doubling context quadruples cost in transformer architectures\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe problems run deeper than compute costs. \u003ca href=\"https://arxiv.org/abs/2311.14648\"\u003eKalai and Vempala\u003c/a\u003e proved that any calibrated language model \u003cem\u003emust\u003c/em\u003e hallucinate, at a rate tied to how many facts appear only once in its training data. 
A \u003ca href=\"https://arxiv.org/abs/2509.04664\"\u003e2025 follow-up\u003c/a\u003e goes further: no computable LLM can be universally correct on unbounded queries. Not fixable with better training data. Not fixable with RLHF. A statistical property of how these models generate text.\u003c/p\u003e\n\u003cp\u003eOn reasoning: \u003ca href=\"https://arxiv.org/abs/2305.18654\"\u003eDziri et al.\u003c/a\u003e showed transformers collapse multi-step reasoning into pattern matching. Performance drops exponentially as task complexity rises. GPT-4 gets \u003cstrong\u003e59%\u003c/strong\u003e on 3-digit by 3-digit multiplication. \u003ca href=\"https://arxiv.org/abs/2603.10123\"\u003eChowdhury\u003c/a\u003e proved that the \u0026ldquo;lost in the middle\u0026rdquo; problem (models performing 20-30% worse on information buried mid-context) is a geometric property of the architecture itself. It is already present at initialization, before any training occurs.\u003c/p\u003e\n\u003cp\u003eThese are theorems. The architecture that runs every frontier AI system has a ceiling, and the ceiling is proved.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-post-transformer-stack-is-already-in-production\"\u003eThe post-transformer stack is already in production\u003c/h2\u003e\n\u003cp\u003eA \u003ca href=\"https://arxiv.org/abs/2510.05364\"\u003esurvey by Fichtl et al.\u003c/a\u003e checked the top 10 models on every major benchmark. None were non-transformers. The transformer is still winning on the leaderboards. But the field is moving toward hybrid architectures. Over \u003cstrong\u003e60%\u003c/strong\u003e of frontier models released in 2025 already use Mixture of Experts. \u003ca href=\"https://arxiv.org/abs/2412.19437\"\u003eDeepSeek-V3\u003c/a\u003e has 671B total parameters but activates only 37B per token. 
It trained for \u003cstrong\u003e2.788 million H800 GPU hours\u003c/strong\u003e, a fraction of what a comparable dense model would require, and matched frontier closed-source performance. 
By late 2025, \u003ca href=\"https://c3.unu.edu/blog/inside-deepseeks-end-of-year-ai-breakthrough-what-the-new-models-deliver\"\u003eDeepSeek-V3.2 reportedly hit GPT-5-level performance at 90% lower training cost\u003c/a\u003e. MoE doesn\u0026rsquo;t replace the transformer. It changes the economics so radically that it\u0026rsquo;s arguably the single biggest practical advance since the original architecture.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-last-architecture-moe-routing-1-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/last-architecture-moe-routing-1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/last-architecture-moe-routing-1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/last-architecture-moe-routing-1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/last-architecture-moe-routing-1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/last-architecture-moe-routing-1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/last-architecture-moe-routing-1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/last-architecture-moe-routing-1.png 1024w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/last-architecture-moe-routing-1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/last-architecture-moe-routing-1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/last-architecture-moe-routing-1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/last-architecture-moe-routing-1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/last-architecture-moe-routing-1.png\"\n           alt=\"Mixture of Experts routing: an input token passes through a router that activates only 2 of 8 expert blocks, meaning DeepSeek-V3 uses just 37B of its 671B total parameters per token\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-last-architecture-moe-routing-1-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/last-architecture-moe-routing-1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Mixture of Experts routing: an input token passes through a router that activates only 2 of 8 expert blocks, meaning DeepSeek-V3 uses just 37B of its 671B total parameters per token\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe more interesting part is what happens 
when you blend attention with state space models. \u003ca href=\"https://goombalab.github.io/blog/2024/mamba2-part1-model/\"\u003eGu and Dao (2024)\u003c/a\u003e proved that a structured class of SSMs and a masked form of linear attention are mathematically dual: two views of the same computation. That theoretical result is showing up in production. \u003ca href=\"https://www.ai21.com/jamba/\"\u003eAI21\u0026rsquo;s Jamba\u003c/a\u003e runs a 1:7 attention-to-Mamba ratio and gets \u003cstrong\u003e256K\u003c/strong\u003e context at \u003cstrong\u003e3x\u003c/strong\u003e throughput over Mixtral. Alibaba\u0026rsquo;s Qwen3-Next shipped the first top-tier model with a hybrid backbone: \u003ca href=\"https://github.com/rasbt/LLMs-from-scratch/blob/main/ch04/08_deltanet/README.md\"\u003eGated DeltaNet\u003c/a\u003e linear-attention layers interleaved 3:1 with full-attention layers. Microsoft\u0026rsquo;s Phi-4-mini-flash-reasoning is 75% Mamba layers with up to \u003cstrong\u003e10x\u003c/strong\u003e higher throughput and \u003cstrong\u003e2-3x\u003c/strong\u003e lower latency.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-last-architecture-hybrid-layer-stack-1-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png 960w,\n   
                   https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/last-architecture-hybrid-layer-stack-1.png\"\n           alt=\"Hybrid layer stack comparison: a traditional transformer uses 8 attention layers while Jamba uses a 1:7 attention-to-Mamba ratio, achieving 256K context at 3x throughput with the same quality\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-last-architecture-hybrid-layer-stack-1-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" 
data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/last-architecture-hybrid-layer-stack-1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Hybrid layer stack comparison: a traditional transformer uses 8 attention layers while Jamba uses a 1:7 attention-to-Mamba ratio, achieving 256K context at 3x throughput with the same quality\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eDiffusion language models are the wild card. \u003ca href=\"https://arxiv.org/abs/2502.09992\"\u003eLLaDA\u003c/a\u003e, the first 8B-parameter diffusion LLM, treats text generation as denoising rather than sequential token prediction. It is competitive with Llama3-8B and addresses the \u0026ldquo;reversal curse\u0026rdquo; that trips up autoregressive models, outperforming GPT-4o on a reversal poem-completion task. \u003ca href=\"https://medium.com/@ML-today/diffusion-models-for-language-from-early-promise-to-a-bold-new-frontier-with-llada-and-the-rise-of-ee80c7ffb8fa\"\u003eGemini Diffusion\u003c/a\u003e hit \u003cstrong\u003e1,479 tokens per second\u003c/strong\u003e. Over 50 papers on diffusion LLMs appeared in 2025. If parallel generation works reliably at scale, inference economics change completely.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://arxiv.org/pdf/2510.05364\"\u003eAlman and Yu\u003c/a\u003e proved, under standard fine-grained complexity assumptions, that there are tasks a transformer can perform but no subquadratic alternative can. That\u0026rsquo;s the strongest mathematical argument for why hybrids, not clean replacements, are what comes next.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-search-is-no-longer-human-speed\"\u003eThe search is no longer human-speed\u003c/h2\u003e\n\u003cp\u003eThe part of this I find most interesting is the recursion. 
AI systems are now running the search for their own architectural successors.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/\"\u003eAlphaEvolve\u003c/a\u003e, an evolutionary coding agent built on Gemini 2.0, found a way to multiply 4x4 complex matrices in 48 scalar multiplications: the first improvement on Strassen\u0026rsquo;s 56-year-old bound. Across \u003ca href=\"https://www.infoq.com/news/2025/05/google-alpha-evolve/\"\u003e50+ open math problems\u003c/a\u003e, it matched the best known solutions 75% of the time and beat them 20% of the time. The recursive part: AlphaEvolve found a \u003ca href=\"https://cloud.google.com/blog/products/ai-machine-learning/alphaevolve-on-google-cloud\"\u003e23% speedup on a kernel inside Gemini\u0026rsquo;s own architecture\u003c/a\u003e, cutting Gemini\u0026rsquo;s training time by 1%. Gemini making Gemini faster. A separate scheduling heuristic it discovered for Google\u0026rsquo;s data centers recovers \u003cstrong\u003e0.7%\u003c/strong\u003e of Google\u0026rsquo;s worldwide compute.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.marktechpost.com/2026/03/08/andrej-karpathy-open-sources-autoresearch-a-630-line-python-tool-letting-ai-agents-run-autonomous-ml-experiments-on-single-gpus/\"\u003eKarpathy\u0026rsquo;s AutoResearch\u003c/a\u003e, released March 7, 2026, is a 630-line Python script that lets an AI agent modify training code, run 5-minute experiments, check results, and iterate. He pointed it at his own highly tuned \u0026ldquo;Time to GPT-2\u0026rdquo; codebase. The agent found about 20 additive improvements that transferred to larger models, cutting the metric by \u003cstrong\u003e11%\u003c/strong\u003e. \u003ca href=\"https://officechai.com/ai/andrej-karpathys-autoresearch-project-lets-agents-run-100-ai-research-experiments-while-you-sleep/\"\u003eShopify CEO Tobi Lütke tried it overnight\u003c/a\u003e: 37 experiments, 19% validation improvement, a 0.8B model outperforming a 1.6B one. 
\u003ca href=\"https://github.com/SakanaAI/AI-Scientist-v2\"\u003eSakana AI\u0026rsquo;s AI Scientist v2\u003c/a\u003e went further and produced the first fully AI-generated paper to pass standard peer review, at an ICLR 2025 workshop. \u003ca href=\"https://controlai.news/p/the-ultimate-risk-recursive-self\"\u003eOpenAI said publicly in late 2025\u003c/a\u003e that it\u0026rsquo;s researching how to safely build AI systems capable of recursive self-improvement. Two years ago this was a thought experiment.\u003c/p\u003e\n\u003ch2 id=\"what-the-hardware-decides\"\u003eWhat the hardware decides\u003c/h2\u003e\n\u003cp\u003eThe transformer won not because attention was theoretically prettier than recurrence. It won because it parallelized well on GPUs. Whatever comes next has to clear the same bar.\u003c/p\u003e\n\u003cp\u003ePre-training scaling for dense transformers is flattening. \u003ca href=\"https://fortune.com/2025/02/25/what-happened-gpt-5-openai-orion-pivot-scaling-pre-training-llm-agi-reasoning/\"\u003eOpenAI spent at least $500 million per major training run on Orion\u003c/a\u003e. The model hit GPT-4 performance after 20% of training; the remaining 80% gave diminishing returns. They downgraded it from GPT-5 to GPT-4.5. \u003ca href=\"https://artificialintelligencemonaco.substack.com/p/ilya-sutskever-on-superintelligence\"\u003eSutskever\u003c/a\u003e at NeurIPS 2024: \u0026ldquo;Pre-training as we know it will end. The data is not growing because we have but one internet.\u0026rdquo; His startup SSI has \u003ca href=\"https://www.arturmarkus.com/ilya-sutskevers-ssi-raises-1b-at-30b-valuation-with-zero-revenue-6x-jump-in-5-months-redefines-ai-investment-logic/\"\u003eraised money at a $32 billion valuation with about 20 employees and zero revenue\u003c/a\u003e. A bet that the next leap requires something architecturally new.\u003c/p\u003e\n\u003cp\u003eBut test-time compute opened a different axis entirely. 
OpenAI\u0026rsquo;s o3 hit \u003cstrong\u003e87.5%\u003c/strong\u003e on ARC-AGI in its high-compute configuration, beating most humans. DeepSeek-R1 matched o1-level reasoning at \u003cstrong\u003e70%\u003c/strong\u003e lower cost. \u003ca href=\"https://aibusiness.com/language-models/ai-model-scaling-isn-t-over-it-s-entering-a-new-era\"\u003eOpenAI\u0026rsquo;s inference spending reached $2.3 billion in 2024\u003c/a\u003e: \u003cstrong\u003e15x\u003c/strong\u003e what they spent training GPT-4.5. \u003ca href=\"https://www.dwarkesh.com/p/dario-amodei\"\u003eDario Amodei\u003c/a\u003e at Morgan Stanley in March 2026: \u0026ldquo;We do not see hitting the wall. We don\u0026rsquo;t see a wall.\u0026rdquo; He\u0026rsquo;s talking about this axis (inference-time compute and RL from verifiable rewards), not about pre-training bigger dense models. The Densing Law now shows capability per parameter doubling every \u003cstrong\u003e3.5 months\u003c/strong\u003e through better data, MoE, and distillation. Last year\u0026rsquo;s frontier, matched with a fraction of the parameters.\u003c/p\u003e\n\u003cp\u003eInference demand is projected to \u003ca href=\"https://v-chandra.github.io/on-device-llms/\"\u003eexceed training demand by 118x\u003c/a\u003e. Global data center electricity consumption is heading toward \u003ca href=\"https://www.iea.org/reports/energy-and-ai/executive-summary\"\u003e945 TWh by 2030\u003c/a\u003e, roughly Japan\u0026rsquo;s total electricity consumption. An architecture that scores 2x better on benchmarks but runs 3x slower at inference won\u0026rsquo;t win. What ships is whatever fits the hardware. The transformer isn\u0026rsquo;t going away. It\u0026rsquo;s becoming one component in a larger stack: attention for recall, SSMs for cheap sequence processing, MoE for capacity, maybe diffusion for parallel output. \u003ca href=\"https://www.ai21.com/jamba/\"\u003eJamba\u003c/a\u003e, \u003ca href=\"https://arxiv.org/html/2411.13676v1\"\u003eHymba\u003c/a\u003e, and Qwen3-Next already ship this way. 
That\u0026rsquo;s not a prediction. It\u0026rsquo;s what\u0026rsquo;s in production.\u003c/p\u003e\n\u003cp\u003eHow fast the stack evolves is the open question. The answer, given AlphaEvolve and AutoResearch and AI Scientist v2, is faster than any previous architectural transition. I don\u0026rsquo;t know whether the transformer remains the dominant layer for two years or five. But I\u0026rsquo;m fairly confident that whatever comes next, humans won\u0026rsquo;t have designed it alone.\u003c/p\u003e\n","summary":"The transformer's limits are now mathematical proofs, not empirical hunches. Hybrids are in production. AI is searching for its own replacement. Here's what comes after.","image":"https://static.philippdubach.com/ograph/ograph-last-architecture-designed-by-hand.jpg","date_published":"2026-03-16T00:00:00Z","date_modified":"2026-03-16T18:41:29+01:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI","Tech"],"_philippdubach":{"type":"Analysis","word_count":1249,"reading_time_minutes":6,"keywords":["what comes after transformers","transformer architecture limits","AI architecture 2026","Mamba vs transformer","hybrid AI architecture","post-transformer architecture","AlphaEvolve AI research","transformer quadratic scaling","diffusion language models","mixture of experts MoE","AI recursive self-improvement","LLM hallucination mathematical proof","DeepSeek V3 training cost","inference compute scaling","Jamba hybrid architecture","test-time compute scaling","state space models vs transformers","transformer replacement 2026"],"section":"posts"}},{"id":"https://philippdubach.com/posts/mcp-vs-a2a-in-2026-how-the-ai-protocol-war-ends/","url":"https://philippdubach.com/posts/mcp-vs-a2a-in-2026-how-the-ai-protocol-war-ends/","title":"MCP vs A2A in 2026: How the AI Protocol War Ends","content_html":"\u003cp\u003eOn March 26, 2025, Sam Altman posted the following \u003ca 
href=\"https://x.com/sama/status/1904957253456941061\"\u003eon X\u003c/a\u003e:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003epeople love MCP and we are excited to add support across our products.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eMCP is Anthropic\u0026rsquo;s Model Context Protocol. OpenAI is Anthropic\u0026rsquo;s most direct competitor. Altman was endorsing a rival\u0026rsquo;s standard. That post may be the most significant event in enterprise AI infrastructure of 2025. When your main competitor adopts your protocol, the war is close to over. I\u0026rsquo;ve been watching this play out since \u003ca href=\"https://www.anthropic.com/news/model-context-protocol\"\u003eAnthropic launched MCP in November 2024\u003c/a\u003e, and I want to work through what\u0026rsquo;s happening: who controls what, what \u0026ldquo;interoperability\u0026rdquo; means in practice, and whether any of this follows patterns we\u0026rsquo;ve seen before.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"what-is-mcp\"\u003eWhat is MCP\u003c/h2\u003e\n\u003cp\u003eMCP is an MIT-licensed client-server protocol built on JSON-RPC 2.0. The mental model is simple: an AI agent (the host) connects through a client to MCP servers that expose tools, data sources, and context. Instead of building a bespoke integration every time Claude or GPT needs to talk to Salesforce, GitHub, or your internal database, you build one MCP server. Any compatible host can then use it.\u003c/p\u003e\n\u003cp\u003eThe problem it solves, which explains why it spread so fast, is that without a standard like this, integration complexity grows quadratically. With M models and N tools, you need up to M times N custom integrations. 
MCP tries to make it linear.\u003c/p\u003e\n\u003cp\u003eBy December 2025, \u003ca href=\"https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation\"\u003eAnthropic\u0026rsquo;s own count\u003c/a\u003e put the public MCP server ecosystem at \u003cstrong\u003e10,000+\u003c/strong\u003e active servers and \u003cstrong\u003e97 million\u003c/strong\u003e monthly SDK downloads across the Python and TypeScript SDKs. \u003ca href=\"https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/\"\u003eGitHub\u0026rsquo;s 2025 Octoverse report\u003c/a\u003e flagged MCP as a standout, hitting \u003cstrong\u003e37,000 stars\u003c/strong\u003e in eight months. The unofficial registry mcp.so lists over 18,000 servers. Official SDKs now cover ten languages, including Python, TypeScript, Java, C#, Go, Kotlin, Rust, and Swift.\u003c/p\u003e\n\u003cp\u003eThe companies building MCP integrations: Microsoft, Salesforce, Cloudflare, GitHub, Stripe, Atlassian, Figma, Snowflake, Databricks, New Relic. At \u003ca href=\"https://blog.cloudflare.com/mcp-demo-day/\"\u003eCloudflare\u0026rsquo;s MCP Demo Day in May 2025\u003c/a\u003e, Asana, PayPal, Sentry, and Webflow all shipped remote servers in a single afternoon. Gartner predicts 75% of API gateway vendors will have MCP features by 2026.\u003c/p\u003e\n\u003cp\u003eOpenAI\u0026rsquo;s adoption went beyond Altman\u0026rsquo;s post. MCP support rolled out across their Agents SDK (March 2025), \u003ca href=\"https://openai.com/index/new-tools-and-features-in-the-responses-api/\"\u003eResponses API (May 2025)\u003c/a\u003e, \u003ca href=\"https://openai.com/index/introducing-gpt-realtime/\"\u003eRealtime API (August 2025)\u003c/a\u003e, and \u003ca href=\"https://help.openai.com/en/articles/12584461-developer-mode-and-mcp-apps-in-chatgpt-beta\"\u003eChatGPT Developer Mode (September 2025)\u003c/a\u003e. 
The two companies later \u003ca href=\"http://blog.modelcontextprotocol.io/posts/2025-11-21-mcp-apps/\"\u003eco-authored the MCP Apps Extension\u003c/a\u003e. You don\u0026rsquo;t see that often between direct competitors.\u003c/p\u003e\n\u003cp\u003eOne performance claim circulates in blog posts and marketing materials: that organizations implementing MCP report \u0026ldquo;40–60% faster agent deployment times.\u0026rdquo; I have not found a primary source for this. No survey, no case study, no named company. I\u0026rsquo;d treat it as marketing content until someone produces the underlying data.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"googles-a2a-fills-a-different-layer\"\u003eGoogle\u0026rsquo;s A2A fills a different layer\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/\"\u003eGoogle launched A2A, the Agent-to-Agent protocol, at Cloud Next on April 9, 2025\u003c/a\u003e, five months after MCP. Google didn\u0026rsquo;t position A2A as a replacement for MCP. They called it a complement. I think that\u0026rsquo;s honest, but it takes a minute to see why.\u003c/p\u003e\n\u003cp\u003eMCP connects an agent to tools; A2A connects agents to each other. That difference shows up in how each protocol behaves.\u003c/p\u003e\n\u003cp\u003eWhen an MCP host calls an MCP server, it knows exactly what it\u0026rsquo;s getting: structured tool descriptions, specific function signatures, predictable outputs. The agent can see inside the tool. A2A works differently. Agents remain opaque to each other. An A2A agent publishes an \u0026ldquo;Agent Card,\u0026rdquo; a JSON metadata document at a well-known URL, describing its capabilities and authentication requirements. 
Other agents discover it, negotiate tasks through a defined lifecycle (submitted, working, input-required, completed), and collaborate without sharing memory or internal state.\u003c/p\u003e\n\u003cp\u003eGoogle\u0026rsquo;s own documentation uses a repair shop analogy. MCP is how the mechanic uses diagnostic equipment. A2A is how the customer talks to the shop manager, or how the manager coordinates with a parts supplier. The analogy holds: both conversations happen in a real repair shop, and cutting either one doesn\u0026rsquo;t simplify anything.\u003c/p\u003e\n\u003cp\u003eA2A \u003ca href=\"https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/\"\u003elaunched with 50+ partner organizations\u003c/a\u003e and \u003ca href=\"https://cloud.google.com/blog/products/ai-machine-learning/agent2agent-protocol-is-getting-an-upgrade\"\u003egrew to 150+ by July 2025\u003c/a\u003e. The list includes Atlassian, Salesforce, SAP, ServiceNow, McKinsey, BCG, and Accenture. \u003ca href=\"https://developers.googleblog.com/en/google-cloud-donates-a2a-to-linux-foundation/\"\u003eGoogle donated A2A to the Linux Foundation in June 2025\u003c/a\u003e. \u003ca href=\"https://lfaidata.foundation/communityblog/2025/08/29/acp-joins-forces-with-a2a-under-the-linux-foundations-lf-ai-data/\"\u003eIBM\u0026rsquo;s competing Agent Communication Protocol merged into A2A in August\u003c/a\u003e, with IBM\u0026rsquo;s engineers joining the technical steering committee. As of February 2026, A2A has roughly \u003cstrong\u003e21,900 GitHub stars\u003c/strong\u003e, about 60% of MCP\u0026rsquo;s 37,000. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-mcp-vs-a2a-protocol-race-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/mcp-vs-a2a-protocol-race.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/mcp-vs-a2a-protocol-race.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/mcp-vs-a2a-protocol-race.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/mcp-vs-a2a-protocol-race.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/mcp-vs-a2a-protocol-race.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/mcp-vs-a2a-protocol-race.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/mcp-vs-a2a-protocol-race.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/mcp-vs-a2a-protocol-race.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/mcp-vs-a2a-protocol-race.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/mcp-vs-a2a-protocol-race.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/mcp-vs-a2a-protocol-race.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/mcp-vs-a2a-protocol-race.png\"\n           alt=\"Exhibit comparing MCP and A2A protocol adoption: MCP leads with 37,000 GitHub stars, 18,000\u0026#43; public servers, 97M monthly SDK downloads, and 10 SDK languages versus A2A at 21,900 stars, no public registry, and 3 languages\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-mcp-vs-a2a-protocol-race-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/mcp-vs-a2a-protocol-race.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit comparing MCP and A2A protocol adoption: MCP leads with 37,000 GitHub stars, 18,000\u0026#43; public servers, 97M monthly SDK downloads, and 10 SDK languages versus A2A at 21,900 stars, no public registry, and 3 languages\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"what-history-can-tell-us-about-how-this-ends\"\u003eWhat history can tell us about how this ends\u003c/h2\u003e\n\u003cp\u003eStandards wars have a consistent pattern. The winner is almost never the technically superior option. It\u0026rsquo;s the one that ships first and gets adopted before anyone can catch up.\u003c/p\u003e\n\u003cp\u003eTCP/IP and OSI are the canonical example. 
The OSI model, published by ISO in 1983, was architecturally more rigorous than TCP/IP\u0026rsquo;s four-layer stack. It had real institutional backing: the US Commerce Department published its GOSIP mandate in August 1988, with formal enforcement beginning in 1990. European governments followed. OSI still lost. TCP/IP won because it had running code (freely available implementations bundled with BSD Unix workstations), while OSI remained elegant theory trapped in committee processes. By 1994 the outcome was obvious. David Clark\u0026rsquo;s \u003ca href=\"https://groups.csail.mit.edu/ana/People/DDC/future_ietf_92.pdf\"\u003eIETF motto captures why\u003c/a\u003e:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eWe reject kings, presidents and voting. We believe in rough consensus and running code.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eVHS versus Betamax is the other lesson people cite, often incorrectly. Betamax had better picture quality. VHS won anyway, and the usual explanation is the movie library. That\u0026rsquo;s part of it. But JVC openly licensed VHS to manufacturers across the industry, which drove prices down and built a content ecosystem Sony couldn\u0026rsquo;t match. By 1987, \u003ca href=\"https://en.wikipedia.org/wiki/Videotape_format_war\"\u003eVHS held 90% of the US VCR market\u003c/a\u003e. Sony conceded in 1988 by manufacturing VHS players. Ecosystem breadth, once established, creates a gravitational field that technical superiority alone can\u0026rsquo;t escape.\u003c/p\u003e\n\u003cp\u003eUSB is a more recent example with a twist. The consortium (Compaq, DEC, IBM, Intel, Microsoft, NEC, and Nortel) formed in 1994 and \u003ca href=\"https://ethw.org/Milestones:Universal_Serial_Bus_(USB),_1996\"\u003eshipped USB 1.0 in January 1996\u003c/a\u003e. 
Adoption was sluggish until \u003ca href=\"https://en.wikipedia.org/wiki/IMac_G3\"\u003eApple shipped the iMac G3 in August 1998\u003c/a\u003e with only USB ports, forcing the entire peripheral industry to follow. That is the catalyst pattern: one player so central to the ecosystem that its adoption forces everyone else\u0026rsquo;s hand. OpenAI adopting MCP in March 2025 is MCP\u0026rsquo;s iMac moment.\u003c/p\u003e\n\u003cp\u003eBut USB also offers a warning. USB-C\u0026rsquo;s physical connector won universally, then the underlying protocol fragmented. The same connector could carry anything from USB 2.0 to USB4 data, and from 5W to 240W of power, depending on what you plugged together. \u003ca href=\"https://single-market-economy.ec.europa.eu/sectors/electrical-and-electronic-engineering-industries-eei/radio-equipment-directive-red/one-common-charging-solution-all_en\"\u003eThe EU eventually legislated convergence through a common-charger amendment to its Radio Equipment Directive, which took effect December 28, 2024\u003c/a\u003e. A standard can win and still fragment when nobody governs the details. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-standards-war-precedents-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/standards-war-precedents.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/standards-war-precedents.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/standards-war-precedents.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/standards-war-precedents.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/standards-war-precedents.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/standards-war-precedents.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/standards-war-precedents.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/standards-war-precedents.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/standards-war-precedents.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/standards-war-precedents.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/standards-war-precedents.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/standards-war-precedents.png\"\n           alt=\"Exhibit comparing historical standards wars: TCP/IP versus OSI decided by running code, VHS versus Betamax decided by open licensing, USB decided by Apple iMac catalyst event, all paralleling MCP ecosystem-first trajectory\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-standards-war-precedents-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/standards-war-precedents.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit comparing historical standards wars: TCP/IP versus OSI decided by running code, VHS versus Betamax decided by open licensing, USB decided by Apple iMac catalyst event, all paralleling MCP ecosystem-first trajectory\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"what-now\"\u003eWhat now?\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation\"\u003eThe Linux Foundation\u0026rsquo;s Agentic AI Foundation (AAIF), launched December 9, 2025\u003c/a\u003e with Anthropic, OpenAI, and Block as co-founders, \u003ca href=\"https://www.linuxfoundation.org/press/agentic-ai-foundation-welcomes-97-new-members\"\u003enow has 146 member organizations\u003c/a\u003e, 
including JPMorgan Chase, American Express, Autodesk, Red Hat, and Huawei. MCP sits within AAIF, while A2A has its own Linux Foundation governance body. Both sit under the same Linux Foundation umbrella, but they are separate projects.\u003c/p\u003e\n\u003cp\u003eThis is the governance structure you typically see after a standards war has been decided in principle but before the implementation details have been hammered out. Think of the W3C in 1994, not the W3C in 1998. For anyone making architectural decisions right now, the practical question isn\u0026rsquo;t MCP versus A2A. Most major enterprise platforms, including Salesforce, SAP, IBM, Microsoft, and AWS, have already committed to both. The question is sequencing and depth.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://research.isg-one.com/analyst-perspectives/a2a-v-mcp-why-ai-agents-need-both\"\u003eISG analyst David Menninger\u003c/a\u003e put it clearly: \u0026ldquo;MCP first for sharing context; then A2A for dynamic interaction among agents.\u0026rdquo; That\u0026rsquo;s the sequence I\u0026rsquo;d follow. MCP is the more mature protocol, with the larger server ecosystem. The 10,000+ existing servers represent integration work that doesn\u0026rsquo;t need to be rebuilt. Start there. Layer A2A on top when your use cases require multi-agent coordination across organizational boundaries, such as supply chains or cross-platform orchestration, which is exactly where the Tyson Foods and Adobe deployments have landed.\u003c/p\u003e\n\u003cp\u003eMCP security deserves a separate conversation. \u003ca href=\"https://astrix.security/learn/blog/state-of-mcp-server-security-2025/\"\u003eAstrix Security\u0026rsquo;s research\u003c/a\u003e found that 53% of MCP servers rely on static credentials rather than OAuth. A critical vulnerability in the mcp-remote npm package (CVE-2025-6514) exposed 437,000+ installations to shell injection. 
TCP/IP had its share of early-stage security problems in the 1980s, so I\u0026rsquo;m not calling this fatal. But these are real vulnerabilities, and they will cause real incidents before the security posture matures.\u003c/p\u003e\n\u003cp\u003eMultiple analyst firms converge on an agentic AI market of roughly \u003cstrong\u003e$7–8 billion in 2025\u003c/strong\u003e, growing at 40–50% annually, with projections ranging from \u003ca href=\"https://www.grandviewresearch.com/industry-analysis/ai-agents-market-report\"\u003e$50 billion by 2030\u003c/a\u003e to \u003ca href=\"https://www.precedenceresearch.com/agentic-ai-market\"\u003e$199 billion by 2034\u003c/a\u003e. NVIDIA\u0026rsquo;s CUDA is the comparison that matters: 4 million developers, 15 years of compounding library investment, and switching costs that help sustain \u003ca href=\"https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-fourth-quarter-and-fiscal-2025\"\u003e$130.5 billion in annual revenue at 73% gross margins\u003c/a\u003e. MCP\u0026rsquo;s 97 million monthly downloads aren\u0026rsquo;t CUDA yet. But the trajectory points the same direction. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-agentic-ai-market-trajectory-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/agentic-ai-market-trajectory.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/agentic-ai-market-trajectory.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/agentic-ai-market-trajectory.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/agentic-ai-market-trajectory.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/agentic-ai-market-trajectory.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/agentic-ai-market-trajectory.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/agentic-ai-market-trajectory.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/agentic-ai-market-trajectory.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/agentic-ai-market-trajectory.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/agentic-ai-market-trajectory.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/agentic-ai-market-trajectory.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/agentic-ai-market-trajectory.png\"\n           alt=\"Exhibit showing agentic AI market projections from $7-8 billion in 2025 to $50 billion by 2030 and up to $199 billion by 2034, with consensus 45% CAGR and comparison to NVIDIA CUDA $131B annual revenue\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-agentic-ai-market-trajectory-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/agentic-ai-market-trajectory.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing agentic AI market projections from $7-8 billion in 2025 to $50 billion by 2030 and up to $199 billion by 2034, with consensus 45% CAGR and comparison to NVIDIA CUDA $131B annual revenue\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eMy best guess (and I want to be clear it\u0026rsquo;s a guess): MCP becomes the infrastructure layer, A2A becomes the coordination layer, much as TCP handles transport while HTTP handles application-layer communication. Different floors of the same building. 
The question remains whether 146 AAIF members can hold coherent standards against the competitive pressure of \u003ca href=\"https://tracxn.com/d/sectors/agentic-ai/__oyRAfdUfHPjf2oap110Wis0Qg12Gd8DzULlDXPJzrzs\"\u003eover 1,000 active agentic AI startups\u003c/a\u003e, each with economic incentives to differentiate.\u003c/p\u003e\n","summary":"MCP leads with 97M monthly SDK downloads and 10,000+ servers. A2A fills a different layer. Analysis of the agentic AI standards war with historical parallels.","image":"https://static.philippdubach.com/ograph/ograph-agentic-ai-standards-war1.jpg","date_published":"2026-03-15T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI","Tech"],"_philippdubach":{"type":"Analysis","word_count":1511,"reading_time_minutes":8,"keywords":["MCP vs A2A","MCP vs A2A 2026","model context protocol","agent to agent protocol","agentic AI protocol war","AI agent protocol","MCP A2A comparison","MCP protocol adoption","A2A protocol Google","Agentic AI Foundation","MCP security vulnerabilities","agentic AI market size","AI agent interoperability","enterprise AI architecture","MCP server ecosystem","multi-agent orchestration","AI standards war","model context protocol vs agent to agent protocol"],"section":"posts"}},{"id":"https://philippdubach.com/posts/ai-models-are-the-new-rebar/","url":"https://philippdubach.com/posts/ai-models-are-the-new-rebar/","title":"AI Models Are the New Rebar","content_html":"\u003cp\u003e\u003ca href=\"https://huggingface.co/Qwen/Qwen3.5-35B-A3B\"\u003eQwen 3.5-35B-A3B\u003c/a\u003e, a model released by Alibaba in February 2026, runs on a single consumer GPU with 24 gigabytes of VRAM. A secondhand RTX 4090, available for around $2,000, generates 60 to 100 tokens per second with it. On select benchmarks per Alibaba\u0026rsquo;s own evaluations, it matches or beats Claude Sonnet 4.5. 
The Qwen 3.5 Flash tier costs \u003ca href=\"https://www.alibabacloud.com/help/en/model-studio/model-pricing\"\u003e\u003cstrong\u003e$0.10 per million input tokens\u003c/strong\u003e\u003c/a\u003e through Alibaba\u0026rsquo;s API. \u003ca href=\"https://www.anthropic.com/news/claude-sonnet-4-5\"\u003eClaude Sonnet 4.5 costs \u003cstrong\u003e$3.00\u003c/strong\u003e\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eThat\u0026rsquo;s a 97 percent discount. For comparable performance.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;m not cherry-picking. Zhipu AI\u0026rsquo;s \u003ca href=\"https://medium.com/@mlabonne/glm-5-chinas-first-public-ai-company-ships-a-frontier-model-a068cecb74e3\"\u003eGLM-5 scores 1,452 on the Chatbot Arena leaderboard\u003c/a\u003e, the highest Elo rating of any open-source model, and its developer\u0026rsquo;s own figures put it at roughly 95 percent of closed-model performance at around 15 percent of the cost. Moonshot AI\u0026rsquo;s \u003ca href=\"https://www.kimi.com/blog/kimi-k2-5\"\u003eKimi K2.5\u003c/a\u003e, a trillion-parameter model, scores 99.0 on HumanEval and 96.1 on AIME 2025, with a Chatbot Arena Elo of 1,447, at roughly 88 percent less than Claude Opus 4.5 per token. The \u003ca href=\"https://hai.stanford.edu/ai-index/2025-ai-index-report/technical-performance\"\u003eStanford HAI 2025 AI Index\u003c/a\u003e found the performance gap between open-source and proprietary AI models on the Chatbot Arena leaderboard shrank from \u003cstrong\u003e8 percent to 1.7 percent in a single year\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eThis is not an IP story. It is not a China story. It is an industrial economics story. And we know how those end. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ai-performance-vs-price-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ai-performance-vs-price.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ai-performance-vs-price.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ai-performance-vs-price.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ai-performance-vs-price.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-performance-vs-price.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ai-performance-vs-price.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ai-performance-vs-price.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ai-performance-vs-price.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-performance-vs-price.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ai-performance-vs-price.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ai-performance-vs-price.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-performance-vs-price.png\"\n           alt=\"Exhibit showing open-source AI models have crossed the performance threshold at a fraction of the price, with GLM-5, Kimi K2.5, DeepSeek V3, and Qwen 3.5 Flash all landing in the high-performance low-cost quadrant below $1 per million tokens while Claude Opus 4.5 sits at $15 and GPT-4o at $2.50\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ai-performance-vs-price-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ai-performance-vs-price.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing open-source AI models have crossed the performance threshold at a fraction of the price, with GLM-5, Kimi K2.5, DeepSeek V3, and Qwen 3.5 Flash all landing in the high-performance low-cost quadrant below $1 per million tokens while Claude Opus 4.5 sits at $15 and GPT-4o at $2.50\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"what-the-steel-mills-can-tell-us\"\u003eWhat the steel mills can tell us\u003c/h2\u003e\n\u003cp\u003eIn the mid-1960s, electric arc furnace mini-mills entered the steel market at the lowest-quality segment: rebar. Capital costs ran one-fifth to one-seventh of what an integrated plant required. 
Nucor, the most aggressive operator, built its first mill for $6 million when a comparable integrated facility cost $500 million or more. The response from companies like U.S. Steel was rational: retreat from low-margin rebar, harvest the better-margin products, improve average profitability in the short term. Sensible but wrong.\u003c/p\u003e\n\u003cp\u003eEach segment the mini-mills conquered had higher margins than the last. From rebar to structural steel, from structural steel to sheet metal, the disruptors climbed the value chain until there was nowhere left to climb. The American steel industry \u003ca href=\"https://www.chicagotribune.com/news/ct-xpm-1990-06-04-9002150481-story.html\"\u003elost money for five consecutive years in the early 1980s\u003c/a\u003e, posting aggregate losses of \u003cstrong\u003e$3.38 billion in 1982 alone\u003c/strong\u003e. U.S. Steel shed more than half its workforce, pivoted to oil and gas, and in June 2025 was \u003ca href=\"https://investors.ussteel.com/news-events/news-releases/detail/659/nippon-steel-corporation-nsc-to-acquire-u-s-steel\"\u003eacquired by Nippon Steel in a $14.9 billion deal\u003c/a\u003e, a fraction of its inflation-adjusted peak valuation. Nucor, the mini-mill, became the largest American steelmaker.\u003c/p\u003e\n\u003cp\u003eClayton Christensen spent a career documenting this pattern of disruptive innovation. The incumbents did not fail because they made bad decisions. They failed because they made good decisions for their existing customers while the market shifted beneath them. OpenAI is serving demanding enterprise customers with the most capable models available. Anthropic is building trust with regulated industries. These are the correct moves for their current customers. 
They may also be exactly the wrong moves for the next five years.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-cost-decline-eats-strategy\"\u003eThe cost decline eats strategy\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://epoch.ai/data-insights/llm-inference-price-trends\"\u003eEpoch AI\u0026rsquo;s research\u003c/a\u003e, published in 2025, found that AI inference prices are declining at a \u003cstrong\u003emedian rate of 50x per year\u003c/strong\u003e for equivalent performance levels, with a range spanning 9x to 900x depending on the task. Achieving GPT-4\u0026rsquo;s original performance on PhD-level science questions cost $30 per million input tokens when GPT-4 launched in early 2023. Through open-source alternatives today, the same performance costs under $0.10. A roughly 300-fold reduction in three years, at a pace that dwarfs Moore\u0026rsquo;s Law.\u003c/p\u003e\n\u003cp\u003eDavid Cahn at Sequoia Capital put the structural problem plainly in his \u003ca href=\"https://sequoiacap.com/article/ais-600b-question/\"\u003e\u0026quot;$600 Billion Question\u0026quot;\u003c/a\u003e analysis: \u0026ldquo;GPU computing is increasingly turning into a commodity, metered per hour. Without a monopoly or oligopoly, high fixed cost plus low marginal cost businesses almost always see prices competed down to marginal cost, like airlines.\u0026rdquo; The airline analogy is more foreboding than it sounds. The global airline industry generated cumulative net profits of $36 billion between 1945 and 2000, a net margin of 0.8 percent across 55 years. In the 2000s, the industry lost more than it had earned in the prior half-century combined. 
Even today, \u003ca href=\"https://www.iata.org/en/pressroom/2025-releases/2025-12-09-01\"\u003eIATA projects airlines\u0026rsquo; return on invested capital at 6.8 percent\u003c/a\u003e, below their weighted average cost of capital of 8.2 percent.\u003c/p\u003e\n\u003cp\u003eThe difference between AI and airlines is that switching a flight carrier requires rebooking. Switching an AI model requires changing two lines of code. \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-inference-cost-collapse-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/inference-cost-collapse.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/inference-cost-collapse.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/inference-cost-collapse.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/inference-cost-collapse.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/inference-cost-collapse.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/inference-cost-collapse.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/inference-cost-collapse.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/inference-cost-collapse.png 1440w\"\n              
sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/inference-cost-collapse.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/inference-cost-collapse.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/inference-cost-collapse.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/inference-cost-collapse.png\"\n           alt=\"Exhibit showing GPT-4 level performance went from $30 to $0.10 per million tokens in three years, with closed proprietary models shown alongside open-source alternatives that now match frontier performance at a fraction of the cost, representing a 300x cost reduction\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-inference-cost-collapse-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/inference-cost-collapse.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing GPT-4 level performance went from $30 to $0.10 per million tokens in three years, with closed proprietary models shown alongside open-source alternatives that now match frontier performance at a fraction of the cost, representing a 300x cost reduction\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 
id=\"switching-costs-that-approach-zero\"\u003eSwitching costs that approach zero\u003c/h2\u003e\n\u003cp\u003eThe OpenAI API format has become the de facto industry standard, supported by virtually every major model provider and open-source inference engine. \u003ca href=\"https://github.com/BerriAI/litellm\"\u003eLiteLLM\u003c/a\u003e, an open-source gateway with approximately 37,000 GitHub stars, provides a unified interface to over 100 providers through a single configuration change. OpenRouter offers managed access to more than 400 models. Setup time: under five minutes.\u003c/p\u003e\n\u003cp\u003eEnterprise behavior already reflects this. Perplexity\u0026rsquo;s own data shows 92 percent of Fortune 500 employees use multi-model AI platforms, and their top enterprise accounts access an average of 30 different models. These are Perplexity\u0026rsquo;s internal figures, not independent market research: treat them as directional. The one meaningful source of lock-in is custom fine-tuned models, which are provider-specific and cannot be directly ported. That affects a small fraction of deployments. For the vast majority of inference calls, the model is interchangeable, and the customer buys on price.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"what-openais-numbers-actually-require\"\u003eWhat OpenAI\u0026rsquo;s numbers actually require\u003c/h2\u003e\n\u003cp\u003eOn February 27, 2026, \u003ca href=\"https://openai.com/index/scaling-ai-for-everyone/\"\u003eOpenAI closed a $110 billion funding round\u003c/a\u003e, the largest private capital raise in history, at a post-money valuation of \u003cstrong\u003e$840 billion\u003c/strong\u003e. Amazon committed $50 billion. SoftBank $30 billion. Nvidia $30 billion. The valuation implies extraordinary confidence in OpenAI\u0026rsquo;s ability to maintain pricing power and grow revenue to somewhere between $200 and $280 billion by 2030. 
At 42x annualized revenue, it is priced not for today\u0026rsquo;s market but for a specific version of the future.\u003c/p\u003e\n\u003cp\u003eOpenAI reported \u003ca href=\"https://openai.com/index/scaling-ai-for-everyone/\"\u003e\u003cstrong\u003e$20 billion in annualized recurring revenue\u003c/strong\u003e\u003c/a\u003e as of January 2026, up 233 percent year over year. Impressive. But the adjusted gross margin fell to 33 percent in 2025, down from 40 percent the prior year, as \u003ca href=\"https://the-decoder.com/openai-adds-111-billion-to-its-cash-burn-forecast-as-ai-costs-spiral-beyond-projections/\"\u003einference costs quadrupled to $8.4 billion\u003c/a\u003e. In the first half of 2025 alone, OpenAI lost $13.5 billion. Compute and technical talent costs consume approximately 75 percent of total revenue, and Microsoft takes another 20 percent through 2032. That leaves very little room for the margin expansion the valuation demands.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation\"\u003eAnthropic\u003c/a\u003e tells a similar story at a smaller scale. At a \u003cstrong\u003e$380 billion valuation\u003c/strong\u003e on $14 billion in run-rate revenue (roughly 27x), the company is also unprofitable, projecting positive cash flow somewhere around 2027 to 2028. Both companies are betting they can simultaneously grow revenue and expand margins. In commoditized markets, that is the bet that fails.\u003c/p\u003e\n\u003cp\u003ePart of the financing is also circular. Amazon invests $50 billion in OpenAI; a portion flows back to AWS as compute spending. Nvidia invests $30 billion; the same money returns as GPU purchases. This inflates revenue figures while obscuring how much of the demand is genuinely independent. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-openai-margin-squeeze-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/openai-margin-squeeze.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/openai-margin-squeeze.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/openai-margin-squeeze.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/openai-margin-squeeze.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/openai-margin-squeeze.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/openai-margin-squeeze.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/openai-margin-squeeze.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/openai-margin-squeeze.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/openai-margin-squeeze.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/openai-margin-squeeze.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/openai-margin-squeeze.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/openai-margin-squeeze.png\"\n           alt=\"Exhibit showing OpenAI financials: $20B ARR up 233% but gross margin fell from 40% to 33% as inference costs quadrupled to $8.4B, net loss of $13.5B in H1 2025, with the $840B valuation requiring 43% revenue CAGR to 2030 while expanding margins against open-source price pressure\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-openai-margin-squeeze-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/openai-margin-squeeze.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing OpenAI financials: $20B ARR up 233% but gross margin fell from 40% to 33% as inference costs quadrupled to $8.4B, net loss of $13.5B in H1 2025, with the $840B valuation requiring 43% revenue CAGR to 2030 while expanding margins against open-source price pressure\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"who-actually-wins-when-the-model-layer-is-a-commodity\"\u003eWho actually wins when the model layer is a commodity\u003c/h2\u003e\n\u003cp\u003eBefore writing off the incumbents, two historical cases are worth sitting with.\u003c/p\u003e\n\u003cp\u003eAmazon Web Services has cut prices \u003ca 
href=\"https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/cost_cloud_financial_management_scheduled.html\"\u003e134 times since 2006\u003c/a\u003e, yet its operating margins expanded to a record \u003ca href=\"https://www.cnbc.com/2025/05/01/aws-q1-earnings-report-2025.html\"\u003e39.5 percent in Q1 2025\u003c/a\u003e. Apple captures roughly 80 to 85 percent of global smartphone operating profits with around 18 to 21 percent of unit shipments, while commodity Android manufacturers earn negligible margins. Both got there the same way: years of accumulated switching costs, vertical integration, ecosystems that cost real money to leave. The question is whether AI model providers can build any of that. I don\u0026rsquo;t think they can, not at the model layer. An API endpoint returning text is not an iPhone. You change it in a config file on a Tuesday afternoon.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eSo who does benefit? Nvidia and cloud providers collect rent regardless of which model runs on their hardware. That position is durable. The application layer looks better still: companies embedding AI into domain-specific workflows with proprietary data, where the model is an input rather than the product. As \u003ca href=\"https://eqtgroup.com/thinq/technology/why-ai-value-wont-just-accrue-to-foundational-models\"\u003eAndrew Lewis at EQT\u003c/a\u003e put it, \u0026ldquo;Over time, the value is likely to accrue to the application layer and the product companies.\u0026rdquo; And then there are the platforms with distribution so large they can integrate AI at near-zero marginal cost: Meta embedding Llama into Instagram and WhatsApp, Google weaving Gemini into Search and Workspace. When Mark Zuckerberg open-sources Llama, he is deliberately commoditizing the model layer to prevent any single player from owning the stack above his distribution. 
When a $1.6 trillion company is your most committed price-cutter, that tells you something about where the margins are going.\u003c/p\u003e\n","summary":"Qwen 3.5-35B runs on a gaming PC and matches Claude Sonnet 4.5. When the commodity version is 95% as good and 97% cheaper, you have a pricing problem.","image":"https://static.philippdubach.com/ograph/ograph-ai-margin-collapse1.jpg","date_published":"2026-03-11T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI","Investing"],"_philippdubach":{"type":"Analysis","word_count":1429,"reading_time_minutes":7,"keywords":["AI model commoditization","AI commoditization","open-source AI models","open source AI vs proprietary","AI inference cost decline","OpenAI valuation","AI pricing war 2026","AI margin compression","Qwen 3.5","LLM switching costs","AI model pricing","disruptive innovation AI","model layer commodity","who wins AI value chain","OpenAI profitability","open source AI performance gap","Christensen AI disruption","Anthropic valuation"],"section":"posts"}},{"id":"https://philippdubach.com/posts/ai-capex-arms-race-who-blinks-first/","url":"https://philippdubach.com/posts/ai-capex-arms-race-who-blinks-first/","title":"AI Capex Arms Race: Who Blinks First?","content_html":"\u003cp\u003eAlphabet\u0026rsquo;s free cash flow is projected to fall roughly \u003cstrong\u003e90%\u003c/strong\u003e in 2026. Not because the business is in trouble. Because the company has committed to spending \u003cstrong\u003e$83–93 billion more\u003c/strong\u003e on capital expenditure than it did last year.\u003c/p\u003e\n\u003cp\u003eThat is what $660–690 billion in AI capex looks like up close. \u003ca href=\"https://finance.yahoo.com/news/amazon-200-billion-ai-spending-153341517.html\"\u003eAmazon guided to \u003cstrong\u003e$200 billion\u003c/strong\u003e alone\u003c/a\u003e. 
Meta\u0026rsquo;s long-term debt more than doubled to \u003ca href=\"https://www.sec.gov/Archives/edgar/data/1326801/000162828026003832/meta-12312025xexhibit991.htm\"\u003e\u003cstrong\u003e$58.7 billion\u003c/strong\u003e\u003c/a\u003e to help finance its share. \u003ca href=\"https://www.goldmansachs.com/insights/articles/why-ai-companies-may-invest-more-than-500-billion-in-2026\"\u003eGoldman Sachs projects\u003c/a\u003e cumulative 2025–2027 spending across the Big 4 at \u003cstrong\u003e$1.15 trillion\u003c/strong\u003e, more than double the $477 billion spent over the prior three years combined. BofA credit strategists found this will consume \u003ca href=\"https://techblog.comsoc.org/2025/11/01/ai-spending-boom-accelerates-big-tech-to-invest-invest-an-aggregate-of-400-billion-in-2025-more-in-2026/\"\u003e\u003cstrong\u003e94% of operating cash flow minus dividends and buybacks\u003c/strong\u003e\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eAt what revenue growth rate does any of this pay for itself? And what happens if inference costs fall 100-fold before the infrastructure is fully depreciated? I want to think about this the way a credit analyst would. Not as a technology story but as a corporate finance story. Because the numbers, assembled from earnings releases and analyst reports through February 2026, look less like a technology platform buildout and more like a leveraged buyout of the future. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ai-capex-hockey-stick-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ai-capex-hockey-stick.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ai-capex-hockey-stick.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ai-capex-hockey-stick.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ai-capex-hockey-stick.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-capex-hockey-stick.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ai-capex-hockey-stick.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ai-capex-hockey-stick.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ai-capex-hockey-stick.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-capex-hockey-stick.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ai-capex-hockey-stick.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ai-capex-hockey-stick.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-capex-hockey-stick.png\"\n           alt=\"Exhibit showing 2025 actual versus 2026 guided capex for Big 4 hyperscalers: Amazon at $200B guided up 52%, Alphabet at $175-185B up 97%, Meta at $60-65B, Microsoft at $100-120B up 25%, totaling $610-655B combined up 63%\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ai-capex-hockey-stick-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ai-capex-hockey-stick.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing 2025 actual versus 2026 guided capex for Big 4 hyperscalers: Amazon at $200B guided up 52%, Alphabet at $175-185B up 97%, Meta at $60-65B, Microsoft at $100-120B up 25%, totaling $610-655B combined up 63%\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"the-lbo\"\u003eThe LBO\u003c/h2\u003e\n\u003cp\u003eAn LBO thesis goes like this: we borrow heavily today, acquire an asset, generate enough cash flow to service the debt, and eventually sell or refinance at a profit. The bet works if the returns from the acquired asset exceed the cost of capital. 
It fails if the asset underperforms, the cost of capital rises, or the timeline extends beyond what the capital structure can absorb.\u003c/p\u003e\n\u003cp\u003eThe hyperscaler capex thesis has the same structure, substituting \u0026ldquo;equity\u0026rdquo; and \u0026ldquo;operating cash flow\u0026rdquo; for debt. Each company is telling shareholders: we will deploy enormous capital today, accept near-zero or negative free cash flow for 18 to 36 months, and recoup that investment through AI revenue growth. Sundar Pichai put the bull case plainly \u003ca href=\"https://www.fool.com/earnings/call-transcripts/2024/07/23/alphabet-googl-q2-2024-earnings-call-transcript/\"\u003eon Alphabet\u0026rsquo;s Q2 2024 earnings call\u003c/a\u003e:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eThe risk of underinvesting is dramatically greater than the risk of overinvesting for us here.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eDepreciated straight-line over five years, $175 billion in Alphabet capex produces $35 billion in annual depreciation. Add a conservative 10% cost of capital on the incremental investment, and the hurdle gets harder still. For the full \u003cstrong\u003e$690 billion\u003c/strong\u003e in 2026 hyperscaler capex, the annual depreciation burden alone comes to roughly \u003cstrong\u003e$115–140 billion\u003c/strong\u003e at five- to six-year useful lives. That is before interest, power, operations, or the cost of next year\u0026rsquo;s upgrade cycle.\u003c/p\u003e\n\u003cp\u003eThe revenue side of this ledger is far smaller than the capex side. Rough estimates place direct AI revenue across the ecosystem at \u003cstrong\u003e$40–60 billion in 2025\u003c/strong\u003e, against AI-specific capex of roughly $300 billion. Coverage ratio: approximately \u003cstrong\u003e0.17x\u003c/strong\u003e at the midpoint. 
\u003ca href=\"https://sequoiacap.com/article/ais-600b-question/\"\u003eSequoia\u0026rsquo;s David Cahn\u003c/a\u003e calculated that the AI ecosystem needs to generate \u003cstrong\u003e$600 billion in annual revenue\u003c/strong\u003e to justify current infrastructure spending, against perhaps $50–100 billion it is actually generating. By 2026, with AI revenue perhaps reaching $80–120 billion and AI capex at $450 billion, the ratio improves to roughly \u003cstrong\u003e0.25x\u003c/strong\u003e. Still not a business. \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ai-revenue-coverage-gap-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ai-revenue-coverage-gap.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ai-revenue-coverage-gap.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ai-revenue-coverage-gap.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ai-revenue-coverage-gap.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-revenue-coverage-gap.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ai-revenue-coverage-gap.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ai-revenue-coverage-gap.png 1024w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ai-revenue-coverage-gap.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-revenue-coverage-gap.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ai-revenue-coverage-gap.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ai-revenue-coverage-gap.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-revenue-coverage-gap.png\"\n           alt=\"Exhibit showing AI revenue of roughly $50B in 2025 against $300B in AI-specific capex and the $600B revenue threshold estimated by Sequoia Capital, with coverage ratios of 0.17x in 2025 and 0.25x projected for 2026\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ai-revenue-coverage-gap-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ai-revenue-coverage-gap.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing AI revenue of roughly $50B in 2025 against $300B in AI-specific capex and the $600B revenue threshold estimated by Sequoia Capital, with coverage ratios of 0.17x in 2025 and 0.25x projected for 2026\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 
id=\"what-would-have-to-be-true\"\u003eWhat would have to be true\u003c/h2\u003e\n\u003cp\u003eThe spending is not obviously irrational. The bull case is worth taking seriously: the right moment to build infrastructure for a platform shift is before the platform fully exists. Railroads were overbuilt. Fiber was overbuilt. Both excesses funded genuinely useful infrastructure that later ran at capacity. If AI becomes the general-purpose technology that most proponents claim, the AI infrastructure being deployed today could look like the most prescient investment since Standard Oil.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eBut that argument requires you to believe some very specific things about revenue growth that have not yet materialized. The 2025–2030 revenue ramp embedded in current capex implies AI revenue growing from roughly $60 billion today to somewhere between $600 billion and $2 trillion by 2030, depending on which bullish scenario you pick. Bain calculates that even under the most aggressive adoption scenario, AI generates $1.2 trillion in revenue, against the $2 trillion the spending requires to break even.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://economics.mit.edu/news/daron-acemoglu-what-do-we-know-about-economics-ai\"\u003eMIT\u0026rsquo;s Daron Acemoglu\u003c/a\u003e, who won the 2024 Nobel Prize in Economics, projects AI will deliver a total GDP increase of just \u003cstrong\u003e1.1–1.6% over ten years\u003c/strong\u003e: roughly a \u003cstrong\u003e0.05% annual productivity gain\u003c/strong\u003e. Only about 5% of economic tasks, he estimates, are cost-effectively automatable at current prices. 
Goldman Sachs\u0026rsquo; Jim Covello made a similar argument in a \u003ca href=\"https://www.datacenterdynamics.com/en/news/goldman-sachs-1tn-to-be-spent-on-ai-data-centers-chips-and-utility-upgrades-with-little-to-show-for-it-so-far/\"\u003eJune 2024 note\u003c/a\u003e: \u0026ldquo;Replacing low-wage jobs with tremendously costly technology is basically the polar opposite of the prior technology transitions I\u0026rsquo;ve witnessed in my thirty years of closely following the tech industry.\u0026rdquo; Neither of these is a fringe view. If either is roughly right, the revenue scenarios baked into current capex budgets do not close. And yet the same market is \u003ca href=\"/posts/the-saaspocalypse-paradox/\"\u003edestroying software stocks\u003c/a\u003e because AI adoption is supposedly too strong. The two readings cannot both be true.\u003c/p\u003e\n\u003cp\u003eDario Amodei, who is himself building the infrastructure, \u003ca href=\"https://www.dwarkesh.com/p/dario-amodei-2\"\u003eput it bluntly on the Dwarkesh Podcast in February 2026\u003c/a\u003e: \u0026ldquo;If my revenue is not $1 trillion, if it\u0026rsquo;s even $800 billion, there\u0026rsquo;s no force on Earth, there\u0026rsquo;s no hedge on Earth that could stop me from going bankrupt if I buy that much compute.\u0026rdquo; He was describing his own spending discipline relative to peers. The companies spending three times as much as Anthropic apparently believe they have found the hedge he could not.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"depreciation-time-bomb\"\u003eDepreciation time bomb\u003c/h2\u003e\n\u003cp\u003eOne risk most analyses underweight: AI hardware becomes obsolete faster than in any previous infrastructure cycle.\u003c/p\u003e\n\u003cp\u003eHyperscalers have extended server useful lives from four years to five or six, saving billions in annual depreciation. 
But Amazon reversed course: in Q4 2024 it took a \u003ca href=\"https://behindthebalancesheet.substack.com/p/amazons-ai-reality-check\"\u003e\u003cstrong\u003e$920 million\u003c/strong\u003e charge to early-retire certain servers and networking equipment\u003c/a\u003e, then effective January 1, 2025 it shortened useful lives for a subset of servers from six to five years, citing \u0026ldquo;the increased pace of technology development, particularly in the area of artificial intelligence,\u0026rdquo; a decision expected to reduce 2025 operating income by a further $700 million. Jensen Huang, not a man known for underselling his own products, said of H100 GPUs once Blackwell shipped: \u003ca href=\"https://www.rev.com/transcripts/gtc-keynote-with-nvidia-ceo-jensen-huang\"\u003e\u0026ldquo;You couldn\u0026rsquo;t give Hoppers away.\u0026rdquo;\u003c/a\u003e Nvidia now releases new architectures annually, where it previously released them every two years.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.cnbc.com/2025/11/11/big-short-investor-michael-burry-accuses-ai-hyperscalers-of-artificially-boosting-earnings.html\"\u003eMichael Burry\u003c/a\u003e, who spent 2005 correctly modeling the mortgage market\u0026rsquo;s hidden risks, estimates that hyperscalers will understate depreciation by roughly \u003cstrong\u003e$176 billion\u003c/strong\u003e in aggregate between 2026 and 2028, causing them to overreport earnings by more than 20%. I have no idea whether Burry is right on the specific number. But the direction is correct. 
If the useful life of a Blackwell GPU is closer to three years than five because Rubin replaces it in 2027, the depreciation math gets far worse.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://epoch.ai/data-insights/llm-inference-price-trends\"\u003eEpoch AI measured\u003c/a\u003e inference costs falling at a median \u003cstrong\u003e50 times per year\u003c/strong\u003e, accelerating to \u003cstrong\u003e200 times per year\u003c/strong\u003e after January 2024. GPT-3.5-level performance cost around $20 per million tokens in November 2022. By October 2024, models of comparable capability cost roughly \u003cstrong\u003e$0.07\u003c/strong\u003e per million tokens. That is a roughly 280-fold decline in under two years, and there is no obvious reason for it to stop. \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ai-inference-cost-cliff-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ai-inference-cost-cliff.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ai-inference-cost-cliff.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ai-inference-cost-cliff.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ai-inference-cost-cliff.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-inference-cost-cliff.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ai-inference-cost-cliff.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ai-inference-cost-cliff.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ai-inference-cost-cliff.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-inference-cost-cliff.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ai-inference-cost-cliff.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ai-inference-cost-cliff.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-inference-cost-cliff.png\"\n           alt=\"Exhibit showing inference cost per million tokens falling from $20 at GPT-3 launch in 2020 to $0.07 in early 2026 on a log scale, with Epoch AI measuring acceleration to 200x per year decline after January 2024\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ai-inference-cost-cliff-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ai-inference-cost-cliff.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing inference 
cost per million tokens falling from $20 at GPT-3 launch in 2020 to $0.07 in early 2026 on a log scale, with Epoch AI measuring acceleration to 200x per year decline after January 2024\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n The hyperscaler response to this is the Jevons paradox, \u003ca href=\"/posts/does-ai-mean-the-demand-on-labor-goes-up/\"\u003ean argument I explored in January\u003c/a\u003e: cheaper inference will explode demand, and the total compute consumed will far exceed what efficiency gains remove. They may be right. But the timing matters. Infrastructure being deployed today, at today\u0026rsquo;s GPU prices, needs to generate enough revenue before the next architecture cycle renders it economically obsolete. The payback window is not 36 months. It may be 18.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"arms-race-logic\"\u003eArms race logic\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://fortune.com/2025/09/19/zuckerberg-ai-bubble-definitely-possibility-sam-altman-collapse/\"\u003eMark Zuckerberg said\u003c/a\u003e in September 2025 that an AI bubble is \u0026ldquo;definitely a possibility,\u0026rdquo; then spent $72 billion anyway. This is not irrationality. It is game theory. If AI really does create winner-take-most outcomes, slowing down is a bet that the platform shift is smaller than your competitors believe. Most boards are not willing to make that bet. So everyone keeps spending, and as I \u003ca href=\"/posts/every-bulge-bracket-bank-agrees-on-ai/\"\u003ewrote last week\u003c/a\u003e, every bulge bracket bank agrees they should.\u003c/p\u003e\n\u003cp\u003eBut the same logic drove WorldCom\u0026rsquo;s Bernie Ebbers. The same logic drove Global Crossing. 
The specific claim driving the 1990s telecom bubble was that internet traffic was \u0026ldquo;doubling every 100 days.\u0026rdquo; It was false: \u003ca href=\"https://www-users.cse.umn.edu/~odlyzko/doc/internet.growth.myth2.pdf\"\u003eresearcher Andrew Odlyzko traced it to misleading WorldCom/UUNET claims\u003c/a\u003e, and actual traffic doubled roughly once per year. By 2001, only \u003cstrong\u003e5% of installed fiber capacity was in use\u003c/strong\u003e. The infrastructure eventually ran at capacity; it just took a decade and several dozen bankruptcies to get there.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.oaktreecapital.com/insights/memo/is-it-a-bubble\"\u003eHoward Marks published a December 2025 memo\u003c/a\u003e asking, with characteristic deliberateness, \u0026ldquo;Is It a Bubble?\u0026rdquo; He noted hyperscalers\u0026rsquo; capex was outpacing revenue momentum and lenders were sweetening terms to keep deal flow alive. J.P. Morgan projects \u003cstrong\u003e$300 billion in investment-grade bonds\u003c/strong\u003e for AI data centers in 2026 alone. That is the same fragility that destroyed the telecom builders: cheap debt financing infrastructure before anyone has proved the revenue exists to service it.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://fortune.com/2026/02/23/ai-capex-us-gdp-negative-pantheon/\"\u003eWithout AI spending, Pantheon Macroeconomics calculated in February 2026\u003c/a\u003e, U.S. corporate capex would currently be negative. The entire infrastructure investment story depends on this cycle continuing: total U.S. GDP grew just 1.4% annualized in H1 2025, and AI-related investment accounted for essentially all of it.\u003c/p\u003e\n\u003caside class=\"disclaimer\" role=\"note\" aria-label=\"Disclaimer\"\u003e\n  \u003cdiv class=\"disclaimer-content\"\u003e\u003cp\u003e\u003cstrong\u003eDisclaimer:\u003c/strong\u003e All opinions expressed are my own. This is not investment, financial, tax, or legal advice. 
Past performance does not indicate future results. Do your own research and consult qualified professionals before making financial decisions. No liability accepted for any losses.\u003c/p\u003e\u003c/div\u003e\n\u003c/aside\u003e\n\n","summary":"Alphabet's free cash flow is on track to fall 90% in 2026. Amazon's is at $11B. $690B in AI capex is cannibalizing the cash that justified these valuations.","image":"https://static.philippdubach.com/ograph/ograph-ai-capex-arms-race1.jpg","date_published":"2026-03-08T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Investing","AI"],"_philippdubach":{"type":"Analysis","word_count":1553,"reading_time_minutes":8,"keywords":["AI capex 2026","hyperscaler capex 2026","AI infrastructure bubble","AI spending free cash flow","GPU depreciation stranded assets","AI capex vs revenue gap","AI data center spending 2026","telecom bubble AI comparison","Sequoia 600 billion revenue gap","inference cost decline","Dario Amodei AI spending risk","AI infrastructure overbuilt","GPU useful life depreciation risk","AI capex bubble","Big Tech AI free cash flow","hyperscaler AI investment sustainability","Jevons paradox AI compute"],"section":"posts"}},{"id":"https://philippdubach.com/posts/93-of-developers-use-ai-coding-tools.-productivity-hasnt-moved./","url":"https://philippdubach.com/posts/93-of-developers-use-ai-coding-tools.-productivity-hasnt-moved./","title":"93% of Developers Use AI Coding Tools. Productivity Hasn't Moved.","content_html":"\u003cp\u003eA \u003ca href=\"https://arxiv.org/abs/2507.09089\"\u003estudy published in July 2025\u003c/a\u003e gave AI coding tools their most credible test yet. Sixteen experienced open-source developers, 246 real tasks, randomized controlled design. The researchers expected to measure how much faster AI made them. 
What they found: developers took \u003cstrong\u003e19% longer\u003c/strong\u003e to complete tasks when AI tools were allowed than when they were not.\u003c/p\u003e\n\u003cp\u003eThe developers themselves believed AI had made them 20% faster.\u003c/p\u003e\n\u003cp\u003eThat \u003cstrong\u003e39-point gap\u003c/strong\u003e between perception and reality is the most important number in \u003ca href=\"https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/\"\u003eMETR\u0026rsquo;s paper\u003c/a\u003e. It arrives against two years of adoption data pointing the other way. \u003ca href=\"https://getdx.com/\"\u003eDX\u003c/a\u003e surveyed 121,000 developers across 450+ companies and found \u003cstrong\u003e92.6%\u003c/strong\u003e use AI coding tools at least monthly. \u003ca href=\"https://blog.jetbrains.com/ai/2026/02/the-best-ai-models-for-coding-accuracy-integration-and-developer-fit/\"\u003eJetBrains\u0026rsquo; AI Pulse\u003c/a\u003e measured 93%. The \u003ca href=\"https://dora.dev/dora-report-2025\"\u003eDORA 2025 report\u003c/a\u003e put it at 90%. 
On the productivity side: six independent research efforts converge on roughly the same ceiling, \u003cstrong\u003e10%\u003c/strong\u003e at the system level, if you\u0026rsquo;re being generous.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ai-coding-perception-gap-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ai-coding-perception-gap.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ai-coding-perception-gap.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ai-coding-perception-gap.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ai-coding-perception-gap.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-coding-perception-gap.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ai-coding-perception-gap.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ai-coding-perception-gap.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ai-coding-perception-gap.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-coding-perception-gap.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ai-coding-perception-gap.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ai-coding-perception-gap.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-coding-perception-gap.png\"\n           alt=\"Exhibit showing METR study results: developers using AI took 19% longer to complete tasks while believing they were 20% faster, a 39-point perception gap across 246 tasks with 56% of AI suggestions rejected\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ai-coding-perception-gap-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ai-coding-perception-gap.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing METR study results: developers using AI took 19% longer to complete tasks while believing they were 20% faster, a 39-point perception gap across 246 tasks with 56% of AI suggestions rejected\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"the-bottleneck-was-never-the-typing\"\u003eThe bottleneck was never the typing\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://www.tocinstitute.org/theory-of-constraints.html\"\u003eGoldratt\u0026rsquo;s Theory of Constraints\u003c/a\u003e makes the following prediction: optimizing a step that isn\u0026rsquo;t the bottleneck doesn\u0026rsquo;t improve system throughput. 
You can make the fastest machine on the factory floor twice as fast. If it\u0026rsquo;s feeding a queue that\u0026rsquo;s already backed up, you\u0026rsquo;ve accomplished nothing at the output level.\u003c/p\u003e\n\u003cp\u003eWriting code has never been that bottleneck. \u003ca href=\"https://www.bain.com/insights/from-pilots-to-payoff-generative-ai-in-software-development-technology-report-2025/\"\u003eBain\u0026rsquo;s analysis\u003c/a\u003e found that writing and testing code accounts for roughly 25-35% of the total software development lifecycle. The rest goes to code review, understanding requirements, debugging, meetings, and documentation. Even if the coding step became twice as fast, Amdahl\u0026rsquo;s Law caps the overall improvement at roughly 14-21% (total cycle time falls by only 12.5-17.5%), and that\u0026rsquo;s before accounting for what happens downstream when you generate a lot more code. Gergely Orosz, who runs The Pragmatic Engineer, \u003ca href=\"https://aws.amazon.com/blogs/enterprise-strategy/measuring-the-impact-of-ai-assistants-on-software-development/\"\u003eput it directly\u003c/a\u003e:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eSpeed of typing out code has never been the bottleneck for software development.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eWhat the data shows now is that AI tools don\u0026rsquo;t just fail to clear the bottleneck. They move it downstream and make it worse. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ai-coding-impact-ceiling-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ai-coding-impact-ceiling.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ai-coding-impact-ceiling.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ai-coding-impact-ceiling.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ai-coding-impact-ceiling.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-coding-impact-ceiling.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ai-coding-impact-ceiling.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ai-coding-impact-ceiling.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ai-coding-impact-ceiling.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-coding-impact-ceiling.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ai-coding-impact-ceiling.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ai-coding-impact-ceiling.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-coding-impact-ceiling.png\"\n           alt=\"Exhibit showing coding is 25-35% of the software development lifecycle with developers writing code only 52 minutes per day, meaning even a 100% coding speedup yields at most 15% system improvement under Amdahl\u0026#39;s Law\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ai-coding-impact-ceiling-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ai-coding-impact-ceiling.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing coding is 25-35% of the software development lifecycle with developers writing code only 52 minutes per day, meaning even a 100% coding speedup yields at most 15% system improvement under Amdahl\u0026#39;s Law\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"the-code-review-bottleneck\"\u003eThe code review bottleneck\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://www.faros.ai/ai-productivity-paradox\"\u003eFaros AI\u003c/a\u003e measured this across 10,000+ developers on 1,255 teams in June 2025. Teams with high AI adoption completed 21% more tasks and merged 98% more pull requests. PR size grew 154%. 
Then: review time up 91%, bugs up 9%, organizational DORA metrics flat.\u003c/p\u003e\n\u003cp\u003eMore PRs, bigger PRs, slower reviews, more bugs, no throughput improvement. The coding step accelerated. The review step, already a constraint, got worse. Michael Truell, \u003ca href=\"https://fortune.com/2025/12/19/cursor-ai-coding-startup-graphite-competition-heats-up/\"\u003eCursor\u0026rsquo;s CEO\u003c/a\u003e:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eCursor has made it much faster to write production code. However, for most engineering teams, reviewing code looks the same as it did three years ago.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eCursor then \u003ca href=\"https://cursor.com/blog/graphite\"\u003eacquired Graphite\u003c/a\u003e, a code review startup. The acquisition is a more honest statement about where the constraint lives than anything in Cursor\u0026rsquo;s marketing. The \u003ca href=\"https://dora.dev/research/2024/dora-report/\"\u003eDORA 2024 report\u003c/a\u003e found that for every 25-percentage-point increase in AI adoption, delivery throughput dropped 1.5% and delivery stability dropped 7.2%. \u003ca href=\"https://dora.dev/dora-report-2025\"\u003eDORA 2025\u003c/a\u003e, at 90% adoption, put it tersely: \u0026ldquo;AI doesn\u0026rsquo;t fix a team; it amplifies what\u0026rsquo;s already there.\u0026rdquo; The negative relationship with stability holds even as adoption saturates. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ai-coding-bottleneck-shift-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ai-coding-bottleneck-shift.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ai-coding-bottleneck-shift.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ai-coding-bottleneck-shift.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ai-coding-bottleneck-shift.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-coding-bottleneck-shift.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ai-coding-bottleneck-shift.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ai-coding-bottleneck-shift.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ai-coding-bottleneck-shift.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-coding-bottleneck-shift.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ai-coding-bottleneck-shift.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ai-coding-bottleneck-shift.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-coding-bottleneck-shift.png\"\n           alt=\"Exhibit showing Faros AI data across 10,000\u0026#43; developers: high AI adoption teams merged 98% more pull requests but review time increased 91%, bugs rose 9%, and DORA delivery metrics were unchanged\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ai-coding-bottleneck-shift-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ai-coding-bottleneck-shift.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing Faros AI data across 10,000\u0026#43; developers: high AI adoption teams merged 98% more pull requests but review time increased 91%, bugs rose 9%, and DORA delivery metrics were unchanged\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"41-of-what\"\u003e41% of what?\u003c/h2\u003e\n\u003cp\u003eOne number circulates constantly in press coverage: 41% of code is now AI-generated. It comes from Emad Mostaque, who took GitHub\u0026rsquo;s figure about the share of code accepted by Copilot users and \u003ca href=\"https://decrypt.co/147191/no-human-programmers-five-years-ai-stability-ceo\"\u003eextrapolated it\u003c/a\u003e into a claim about all code everywhere. 
The original figure applied only to developers already using Copilot, a fraction of GitHub\u0026rsquo;s user base at the time. The extrapolation doesn\u0026rsquo;t hold.\u003c/p\u003e\n\u003cp\u003eThe more defensible numbers: \u003ca href=\"https://shiftmag.dev/this-cto-says-93-of-developers-use-ai-but-productivity-is-still-10-8013/\"\u003eDX\u0026rsquo;s measurement across 4.2 million developers\u003c/a\u003e puts AI-generated production code at 26.9%. A \u003ca href=\"https://arxiv.org/abs/2506.08945\"\u003estudy published in Science\u003c/a\u003e found roughly 30% of Python functions from U.S. contributors on GitHub were AI-generated by late 2024. \u003ca href=\"https://fortune.com/2024/10/30/googles-code-ai-sundar-pichai/\"\u003eSundar Pichai\u003c/a\u003e said more than a quarter of all new code at Google is AI-generated. These numbers cluster around 25-30%.\u003c/p\u003e\n\u003cp\u003eThe inflated figure matters because it supports a specific argument: that AI has already crossed some threshold, that the transformation is done, that the productivity gains are already baked in. At 27%, AI is a meaningful contributor to software production. At 41%, you\u0026rsquo;re telling a different story, and the decisions that follow from it are different decisions.\u003c/p\u003e\n\u003cp\u003eThe quality picture at 27% is not reassuring. \u003ca href=\"https://www.businesswire.com/news/home/20250730694951/en/AI-Generated-Code-Poses-Major-Security-Risks-in-Nearly-Half-of-All-Development-Tasks-Veracode-Research-Reveals\"\u003eVeracode tested 100+ LLMs\u003c/a\u003e across 80 coding tasks and found 45% of AI-generated code introduced OWASP Top 10 vulnerabilities. \u003ca href=\"https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation\"\u003eCodeRabbit\u0026rsquo;s analysis\u003c/a\u003e found AI-generated code contains 2.74x more security vulnerabilities than human-written code. 
\u003ca href=\"https://www.blackduck.com/blog/open-source-trends-ossra-report.html\"\u003eBlack Duck\u0026rsquo;s 2026 OSSRA report\u003c/a\u003e found vulnerabilities per codebase up 107% year over year, the mean codebase going from 280 to 581 known vulnerabilities. \u003ca href=\"https://thenewstack.io/martin-fowler-on-preparing-for-ais-nondeterministic-computing/\"\u003eMartin Fowler\u0026rsquo;s framing\u003c/a\u003e is still the most honest I\u0026rsquo;ve seen: \u0026ldquo;Treat every slice as a PR from a rather dodgy collaborator who\u0026rsquo;s very productive in the lines-of-code sense, but you can\u0026rsquo;t trust a thing they\u0026rsquo;re doing.\u0026rdquo;\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"perception-is-reality\"\u003ePerception is reality\u003c/h2\u003e\n\u003cp\u003eThe 19% slowdown number has been contested, fairly: the CI is wide (+2% to +39%), the study covered experienced developers on complex codebases, and METR has acknowledged design limitations. In February 2026, \u003ca href=\"https://metr.org/blog/2026-02-24-uplift-update/\"\u003eMETR published an update\u003c/a\u003e changing their experiment design after discovering that 30-50% of invited developers declined to participate without AI access, a selection effect that biased the original sample toward developers who benefit least from AI. Their newer cohort (800+ tasks, 57 developers) showed an estimated effect of -4% with a CI of -15% to +9%: substantially less negative, and an interval that includes zero. METR\u0026rsquo;s conclusion: \u0026ldquo;AI likely provides productivity benefits in early 2026.\u0026rdquo; The perception gap and the bottleneck problem remain real, but the exact magnitude of the July 2025 finding should be read with that caveat.\u003c/p\u003e\n\u003cp\u003eMETR\u0026rsquo;s companion \u003ca href=\"https://arxiv.org/abs/2503.14499\"\u003eHorizon benchmark\u003c/a\u003e (Kwa et al., 2025) puts numbers to the capability trend behind that shift: the 50%-task-completion time horizon for Claude 3.7 Sonnet was 60 minutes. 
Claude Opus 4.6, released February 2026, reached 719 minutes. The doubling time since 2023 is approximately 128 days. METR frames the productivity result as a point on that trend, not a fixed constant, though they also note that their benchmark tasks are cleaner than real production work and performance on \u0026ldquo;messier\u0026rdquo; tasks may improve more slowly. Still, the perception gap itself is more robust than the exact slowdown figure, and it replicates.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://survey.stackoverflow.co/2025/ai/\"\u003eStack Overflow\u0026rsquo;s 2025 Developer Survey\u003c/a\u003e found favorable views of AI tools dropped from 70% to 60%, with 46% not trusting AI output and 66% citing \u0026ldquo;almost right but not quite\u0026rdquo; as their top frustration. \u003ca href=\"https://www.software.com/reports/code-time-report\"\u003eSoftware.com\u0026rsquo;s monitoring\u003c/a\u003e of 250,000 developers found the median developer codes for 52 minutes per day, about 11% of a 40-hour week. The tools are fighting over 11% of the workday.\u003c/p\u003e\n\u003cp\u003eA \u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566\"\u003efield experiment across 4,867 developers\u003c/a\u003e, run by researchers from MIT, Princeton, Wharton, and Microsoft, found that above-median-tenure developers showed no significant productivity increase from AI tools. The people capable of using AI most effectively are also the people most likely to catch when it\u0026rsquo;s wrong and stop to fix it, and that correction work eats into the time the tool saves. 
That overhead is part of why the tools work better for junior developers on simple tasks than for senior developers on the work that matters most.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"githubs-2022-copilot-study\"\u003eGitHub\u0026rsquo;s 2022 Copilot study\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://arxiv.org/abs/2302.06590\"\u003eGitHub\u0026rsquo;s 2022 Copilot study\u003c/a\u003e, the \u0026ldquo;55% faster\u0026rdquo; figure, still appears in enterprise sales decks in 2026. One JavaScript task: implementing a web server with HTTP endpoints. Thirty-five completers. No assessment of output quality, test coverage, or whether the code would survive production. Confidence interval: 21% to 89%. Participants knew they were being timed for productivity.\u003c/p\u003e\n\u003cp\u003eWhat the study actually shows is that when you pick a task specifically suited to AI assistance and measure completion time without checking correctness, AI looks fast. That\u0026rsquo;s a real finding. It\u0026rsquo;s just not the one being used to justify eight-figure licensing deals.\u003c/p\u003e\n\u003ch2 id=\"macro-data\"\u003eMacro data\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://www.apolloacademy.com/waiting-for-the-ai-j-curve/\"\u003eApollo\u0026rsquo;s Torsten Slok\u003c/a\u003e wrote in early 2026: \u0026ldquo;AI is everywhere except in the incoming macroeconomic data.\u0026rdquo; An \u003ca href=\"https://www.nber.org/papers/w34836\"\u003eNBER paper from February 2026\u003c/a\u003e surveying nearly 6,000 executives found over 80% of firms reported AI had no impact on productivity over the preceding three years. Expected productivity improvement over the next three years: 1.4%.\u003c/p\u003e\n\u003cp\u003eDaron Acemoglu, who shared the 2024 Nobel Prize in Economics for research on how institutions shape prosperity and has written extensively on technology and labor markets, \u003ca href=\"https://www.nber.org/papers/w32487\"\u003eprojected\u003c/a\u003e a 0.5% total factor productivity increase from AI over the next decade. 
His reasoning: the economic value of AI concentrates in a narrow set of tasks that don\u0026rsquo;t represent enough of total economic activity to move aggregate numbers. The Bain arithmetic, at macroeconomic scale.\u003c/p\u003e\n\u003cp\u003eThe standard optimist response is the IT comparison: computers entered enterprises in the 1970s and 1980s without producing measurable productivity improvements for a decade, then the gains came in the mid-1990s. It\u0026rsquo;s a reasonable historical parallel. I\u0026rsquo;m genuinely uncertain whether it applies. Computers replaced manual processes wholesale. AI coding tools are a faster ingredient inside a process whose other ingredients haven\u0026rsquo;t changed: the requirements still need to be understood, the review still needs to happen, the tests still need to pass. The productivity lag might resolve. Or the structure of the workflow might mean it doesn\u0026rsquo;t, even eventually. I don\u0026rsquo;t know, and the honest answer is that nobody does yet.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"where-the-value-actually-lands\"\u003eWhere the value actually lands\u003c/h2\u003e\n\u003cp\u003eExploration is faster. When I\u0026rsquo;m working on something unfamiliar, a library I haven\u0026rsquo;t used, an API I\u0026rsquo;m integrating for the first time, the startup cost drops. A working first draft arrives in minutes rather than hours. That\u0026rsquo;s real, and I notice it. Whether it shows up in throughput metrics is a different question, and the data suggests mostly not, because the constraint was never the first draft.\u003c/p\u003e\n\u003cp\u003eBoilerplate, test scaffolding, documentation: these genuinely benefit too. The tasks that are well-scoped and low-stakes if approximately wrong are where these tools earn their keep. 
Anyone who\u0026rsquo;s used them seriously already knew this before the research said so.\u003c/p\u003e\n\u003cp\u003eSimon Willison, in an \u003ca href=\"https://www.npr.org/2025/10/21/nx-s1-5506141/ai-code-software-productivity-claims\"\u003eNPR interview\u003c/a\u003e: \u0026ldquo;Our job is not to type code into a computer. Our job is to deliver systems that solve problems.\u0026rdquo; The tools handle the first part better than they did a year ago. The second part hasn\u0026rsquo;t changed.\u003c/p\u003e\n\u003ch2 id=\"the-right-question\"\u003eThe right question\u003c/h2\u003e\n\u003cp\u003eThe useful product question, if the bottleneck is now review, is what makes review faster and more reliable, not what generates more code faster. AI tools that flag security issues, catch logic errors, and surface context about why code was written a certain way would attack the actual constraint. This is at least part of what Cursor is working toward with Graphite.\u003c/p\u003e\n\u003cp\u003eThe harder problem is cultural. \u003ca href=\"https://www.bain.com/insights/from-pilots-to-payoff-generative-ai-in-software-development-technology-report-2025/\"\u003eBain\u003c/a\u003e and DORA say the same thing from different angles: AI amplifies what\u0026rsquo;s already there. Teams with good review practices and clear requirements get leverage. Teams without them produce more code that still doesn\u0026rsquo;t ship on time. The organizations that most want a tool to fix their velocity tend to be the ones with the process debt that prevents any tool from working.\u003c/p\u003e\n\u003cp\u003eI have no idea what the five-year picture looks like. The Solow paradox took a decade to resolve and resolved in ways nobody expected. Maybe the AI productivity gains show up in 2029 and the 2026 skeptics look naive. Genuinely possible. 
I try to hold that view honestly rather than dismiss it.\u003c/p\u003e\n\u003cp\u003eWhat the data shows now: at 92.6% monthly adoption and roughly 27% of production code AI-generated, the experiment has run at real scale. Organizational productivity gains haven\u0026rsquo;t moved past roughly 10%. In METR\u0026rsquo;s 2025 study, experienced developers were slower with AI assistance than without it, and even the more favorable 2026 follow-up could not rule out a slowdown. Bugs are up, review times are up, code quality metrics are declining, and DORA stability goes the wrong way as adoption increases.\u003c/p\u003e\n","summary":"METR found experienced developers 19% slower with AI, despite feeling 20% faster. At 92.6% adoption, organizational productivity gains remain roughly 10%.","image":"https://static.philippdubach.com/ograph/ograph-ai-coding-productivity1.jpg","date_published":"2026-03-04T00:00:00Z","date_modified":"2026-05-04T13:48:47+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Tech"],"_philippdubach":{"type":"Analysis","word_count":1880,"reading_time_minutes":9,"keywords":["AI coding productivity","AI developer productivity paradox","does AI coding improve productivity","METR AI developer study 2026","GitHub Copilot productivity ROI","AI coding tools productivity","developer productivity AI","AI code review bottleneck","AI generated code quality","DORA metrics AI adoption","Amdahl's law software development","AI pair programming ROI","software engineering productivity","AI coding tools enterprise ROI 2026","code generation productivity gap"],"section":"posts"}},{"id":"https://philippdubach.com/posts/peter-thiels-physics-department/","url":"https://philippdubach.com/posts/peter-thiels-physics-department/","title":"Peter Thiel's Physics Department","content_html":"\u003cp\u003eOn December 11, \u003ca href=\"https://en.wikipedia.org/wiki/Jimmy_Carr\"\u003eJimmy Carr\u003c/a\u003e sat on the \u003ca href=\"https://www.youtube.com/watch?v=mWDCZIvLrS4\"\u003eTRIGGERnometry podcast\u003c/a\u003e and delivered a riff that sounded like 
Peter Thiel\u0026rsquo;s stagnation thesis filtered through a comedian\u0026rsquo;s timing:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eMinus the screens from any room, we\u0026rsquo;re living in the 1970s. Nothing\u0026rsquo;s happened in physics since \u0026rsquo;72. String theory has not got us anywhere. But if you take the compute power of AI and point it at physics, what happens? We could have a world of plenty. I hope that\u0026rsquo;s the world we live in. But it could go another way.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eTwo months later, on February 13, GPT-5.2 \u003ca href=\"https://thequantuminsider.com/2026/02/13/ai-scientist-spots-what-physicists-missed-in-gluon-scattering/\"\u003ederived and formally proved\u003c/a\u003e a new result in theoretical physics: single-minus gluon scattering amplitudes, long assumed to vanish, are nonzero in the half-collinear regime. Nima Arkani-Hamed at the Institute for Advanced Study, who had been curious about the problem for fifteen years, called the formulas \u0026ldquo;strikingly simple.\u0026rdquo; Nathaniel Craig at UC Santa Barbara called the result \u0026ldquo;journal-level research advancing the frontiers of theoretical physics.\u0026rdquo;\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"thiels-stagnation-case\"\u003eThiel\u0026rsquo;s stagnation case\u003c/h2\u003e\n\u003cp\u003eCarr was paraphrasing Thiel, who has been making this argument for fifteen years. The \u003ca href=\"https://www.scribd.com/document/61379051/What-Happened-to-the-Future-Founders-Fund-Manifesto\"\u003eFounders Fund manifesto\u003c/a\u003e (2011) put it bluntly: \u0026ldquo;We wanted flying cars, instead we got 140 characters.\u0026rdquo; Thiel\u0026rsquo;s framework distinguishes progress in bits from progress in atoms: spectacular digital gains since 1970, physical-world stagnation. Tyler Cowen named the broader phenomenon the Great Stagnation. 
On the \u003ca href=\"https://singjupost.com/a-i-mars-and-immortality-are-we-dreaming-big-enough-peter-thiel-transcript/\"\u003eDouthat podcast\u003c/a\u003e, Thiel was more measured: \u0026ldquo;The claim was that the velocity had slowed, it wasn\u0026rsquo;t zero.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThe data supports the velocity claim. Total factor productivity growth, the residual economists use as a proxy for scientific and technological progress, ran at roughly 1.7% annually from 1947 to 1973. Since 2004, it has averaged 0.4%. Robert Gordon\u0026rsquo;s \u003cem\u003eThe Rise and Fall of American Growth\u003c/em\u003e argues the \u0026ldquo;special century\u0026rdquo; of 1870 to 1970 was a one-time event. \u003ca href=\"https://mattsclancy.substack.com/p/science-is-getting-harder\"\u003eBloom, Jones, Van Reenen, and Webb\u003c/a\u003e showed in the \u003cem\u003eAmerican Economic Review\u003c/em\u003e that maintaining Moore\u0026rsquo;s Law required 18x more researchers in 2014 than in 1971.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-tfp-growth-stagnation-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/tfp-growth-stagnation.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/tfp-growth-stagnation.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/tfp-growth-stagnation.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/tfp-growth-stagnation.png 960w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/tfp-growth-stagnation.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/tfp-growth-stagnation.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/tfp-growth-stagnation.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/tfp-growth-stagnation.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/tfp-growth-stagnation.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/tfp-growth-stagnation.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/tfp-growth-stagnation.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/tfp-growth-stagnation.png\"\n           alt=\"Peter Thiel\u0026#39;s stagnation thesis in data: US Total Factor Productivity growth by era showing 1.7 percent annually from 1947 to 1973 during the postwar boom, collapsing to 0.5 percent from 1973 to 1996, briefly recovering to 2.0 percent during the IT revival of 1996 to 2004, then falling back to 0.4 percent from 2004 to present, a 76 percent decline from the postwar peak\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-tfp-growth-stagnation-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" 
data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/tfp-growth-stagnation.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Peter Thiel\u0026#39;s stagnation thesis in data: US Total Factor Productivity growth by era showing 1.7 percent annually from 1947 to 1973 during the postwar boom, collapsing to 0.5 percent from 1973 to 1996, briefly recovering to 2.0 percent during the IT revival of 1996 to 2004, then falling back to 0.4 percent from 2004 to present, a 76 percent decline from the postwar peak\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe Standard Model of particle physics was essentially complete by the early 1970s. Since then, we have confirmed things we already predicted: the Higgs boson (2012, 48 years after prediction), gravitational waves (2015, 99 years after Einstein), the accelerating expansion of the universe (1998). Important experimental work. But confirmations, not revolutions. No supersymmetric particles. No extra dimensions. No new fundamental energy sources. No unified field theory. String theory, the leading candidate for physics beyond the Standard Model, has produced \u003ca href=\"https://www.researchgate.net/publication/334607591_The_String_Theory_Landscape\"\u003ezero experimentally confirmed predictions\u003c/a\u003e in 55 years and admits roughly 10^500 possible solutions, which is another way of saying it predicts everything and therefore nothing. \u003ca href=\"https://www.goodreads.com/author/quotes/17201066.Sabine_Hossenfelder\"\u003eSabine Hossenfelder\u003c/a\u003e captured the frustration:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eTheoretical physicists used to explain what was observed. 
Now they try to explain why they can\u0026rsquo;t explain what was not observed.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"what-ai-has-already-done-for-science\"\u003eWhat AI has already done for science\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://x.com/demishassabis/status/1845864764469334239\"\u003eAlphaFold\u003c/a\u003e predicted the three-dimensional structures of 214 million proteins, solving the protein folding problem for structural biology. It earned Demis Hassabis and John Jumper a share of the 2024 Nobel Prize in Chemistry and has been used by over 2 million researchers in 190 countries. DeepMind\u0026rsquo;s \u003ca href=\"https://deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning/\"\u003eGNoME\u003c/a\u003e identified 2.2 million new crystal structures and 381,000 predicted-stable materials, equivalent to roughly 800 years of prior human discovery in materials science. Lawrence Berkeley Lab\u0026rsquo;s A-Lab robotically synthesized 41 of these in \u003ca href=\"https://deepmind.google/blog/millions-of-new-materials-discovered-with-deep-learning/\"\u003e17 days\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eIn fusion, \u003ca href=\"https://deepmind.google/blog/bringing-ai-to-the-next-generation-of-fusion-energy/\"\u003eDeepMind trained a reinforcement learning system\u003c/a\u003e to autonomously control plasma in a real tokamak at EPFL, sculpting it into configurations no human operator had achieved. \u003ca href=\"https://engineering.princeton.edu/news/2024/02/21/engineers-use-ai-wrangle-fusion-power-grid\"\u003ePrinceton researchers\u003c/a\u003e predicted tearing instabilities 300 milliseconds in advance and adjusted reactor parameters in real time: the first demonstration of preventing, not just suppressing, the instabilities that have plagued fusion for decades. 
\u003ca href=\"https://www.cleanenergy-platform.com/insight/inside-taes-2025-plasma-breakthroughand-how-it-changed-fusions-trajectory\"\u003eTAE Technologies\u003c/a\u003e used AI-optimized beam injection to sustain plasma above 70 million degrees C. At Lawrence Livermore, the CogSim AI framework \u003ca href=\"https://lasers.llnl.gov/news/llnl-researchers-employed-ai-driven-model-predict-fusion-ignition-shot\"\u003epredicted a 74% probability of ignition\u003c/a\u003e days before the December 2022 shot that achieved it.\u003c/p\u003e\n\u003cp\u003eMicrosoft and Pacific Northwest National Lab \u003ca href=\"https://www.datacenterdynamics.com/en/news/microsoft-and-pnnl-use-ai-and-hpc-for-battery-materials-research/\"\u003escreened 32.6 million inorganic materials\u003c/a\u003e in roughly 80 hours, identified 18 finalists, and produced a \u003ca href=\"https://techround.co.uk/news/microsofts-ai-powered-battery-discovery-could-replace-lithium/\"\u003eworking battery prototype\u003c/a\u003e using 70% less lithium within nine months. In drug discovery, at least \u003ca href=\"https://pmc.ncbi.nlm.nih.gov/articles/PMC11800368/\"\u003e75 AI-discovered drugs\u003c/a\u003e have entered clinical trials, up from 3 in 2016, with Phase I success rates of 80 to 90% compared to the traditional 40%.\u003c/p\u003e\n\u003cp\u003eAnd then, GPT-5.2 produced a new result in theoretical physics. A proof that human physicists had not found. The mathematical reasoning timeline tells the story. \u003ca href=\"https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/\"\u003eAlphaGeometry\u003c/a\u003e solved 25 of 30 Olympiad geometry problems in January 2024. By July 2024, \u003ca href=\"https://deepmind.google/blog/ai-solves-imo-problems-at-silver-medal-level/\"\u003eAlphaProof earned a silver medal\u003c/a\u003e at the International Mathematical Olympiad. 
By 2025, \u003ca href=\"https://deepmind.google/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/\"\u003eGemini Deep Think scored gold\u003c/a\u003e: 5 of 6 problems, 35 points, end-to-end in natural language. Terence Tao \u003ca href=\"https://siliconreckoner.substack.com/p/terence-tao-on-machine-assisted-proofs\"\u003erevised his prediction\u003c/a\u003e for superhuman AI mathematics from 2029 to 2026.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"751-compute-gap\"\u003e75:1 compute gap\u003c/h2\u003e\n\u003cp\u003eHere is the number that matters. Big Tech spent over \u003cstrong\u003e$250 billion\u003c/strong\u003e on AI infrastructure in 2024 and 2025. Total US federal AI R\u0026amp;D spending: \u003ca href=\"https://federalbudgetiq.com/insights/federal-ai-and-it-research-and-development-spending-analysis/\"\u003e\u003cstrong\u003e$3.3 billion\u003c/strong\u003e per year\u003c/a\u003e. That is a compute divide of roughly 75:1 between commercial and scientific AI investment. 
The \u003ca href=\"https://cset.georgetown.edu/article/the-nairr-pilot-estimating-compute/\"\u003eNAIRR pilot\u003c/a\u003e allocated about 3.2 yottaFLOPs to academic researchers, enough to train GPT-3.5 once but not enough for a single GPT-4-class run.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-compute-gap-75-to-1-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/compute-gap-75-to-1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/compute-gap-75-to-1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/compute-gap-75-to-1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/compute-gap-75-to-1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/compute-gap-75-to-1.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/compute-gap-75-to-1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/compute-gap-75-to-1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/compute-gap-75-to-1.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/compute-gap-75-to-1.png 1200w,\n           
           https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/compute-gap-75-to-1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/compute-gap-75-to-1.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/compute-gap-75-to-1.png\"\n           alt=\"The 75 to 1 AI compute gap between industry and science: Big Tech AI capex at over 250 billion dollars per year versus total federal AI R\u0026amp;D spending at 3.3 billion, DOE FASST at 2.4 billion authorized but pending, DOE Genesis at 320 million one-time, and NSF core AI at 494 million per year\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-compute-gap-75-to-1-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/compute-gap-75-to-1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"The 75 to 1 AI compute gap between industry and science: Big Tech AI capex at over 250 billion dollars per year versus total federal AI R\u0026amp;D spending at 3.3 billion, DOE FASST at 2.4 billion authorized but pending, DOE Genesis at 320 million one-time, and NSF core AI at 494 million per year\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe DOE\u0026rsquo;s \u003ca href=\"https://www.anl.gov/article/what-were-argonnes-top-science-research-breakthroughs-in-2025\"\u003eGenesis Mission\u003c/a\u003e announced $320 million in December 2025. 
That is less than what Meta spends on AI infrastructure in a week. The \u003ca href=\"https://federalbudgetiq.com/insights/federal-ai-and-it-research-and-development-spending-analysis/\"\u003eFASST initiative\u003c/a\u003e authorized $2.4 billion per year for five years, $12 billion total, but congressional appropriations are still pending. The US has three exascale supercomputers at national labs. These serve all of science, not just AI.\u003c/p\u003e\n\u003cp\u003eIf AI has already produced results in theoretical physics, materials science, fusion energy, and drug discovery with what amounts to scraps from the commercial table, what happens when someone makes a serious allocation? \u003ca href=\"https://fortune.com/2026/02/11/demis-hassabis-nobel-google-deepmind-predicts-ai-renaissance-radical-abundance/\"\u003eHassabis told Fortune\u003c/a\u003e in February 2026 that in 10 to 15 years \u0026ldquo;we\u0026rsquo;ll be in a kind of new golden era of discovery, a kind of new renaissance.\u0026rdquo; He described a vision of \u0026ldquo;radical abundance\u0026rdquo; where AI has \u0026ldquo;successfully bottled the scientific method.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.goldmansachs.com/insights/articles/generative-ai-could-raise-global-gdp-by-7-percent\"\u003eGoldman Sachs estimates\u003c/a\u003e generative AI could raise global GDP by 7%, roughly $7 trillion, over a ten-year period. 
\u003ca href=\"https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-next-innovation-revolution-powered-by-ai\"\u003eMcKinsey pegs\u003c/a\u003e R\u0026amp;D-specific value at $360 to $560 billion annually, but explicitly noted they did not attempt to estimate\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003ethe value of truly breakthrough innovations that transform markets (if, for example, nuclear fusion was to enable limitless, clean electricity production).\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-bear-case-pattern-matching-is-not-physics\"\u003eThe bear case: pattern matching is not physics\u003c/h2\u003e\n\u003cp\u003eThe bear case is simple and serious. AI is the best pattern-matching system ever built. Physics does not advance by pattern matching. It advances by conceptual revolution: Riemannian geometry for general relativity, an entirely new mathematical framework for quantum mechanics, gauge theory for the Standard Model. None of these were discoverable in existing data.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://medium.com/@abdullrahmanburhan36/noam-chomsky-on-the-false-promise-of-chatgpt-18c70cda5e24\"\u003eNoam Chomsky\u003c/a\u003e argued in the \u003cem\u003eNew York Times\u003c/em\u003e that AI\u0026rsquo;s deepest flaw \u0026ldquo;is the absence of the most critical capacity of any intelligence: to say not only what is the case \u0026hellip; but also what is not the case and what could and could not be the case.\u0026rdquo; A commenter on \u003ca href=\"https://www.math.columbia.edu/~woit/wordpress/?p=15362\"\u003ePeter Woit\u0026rsquo;s blog\u003c/a\u003e at Columbia spent \u0026ldquo;over 100 hours probing these models\u0026rdquo; on open problems and found they \u0026ldquo;basically never try to come up with something new\u0026rdquo; when the answer is not already in the training data.\u003c/p\u003e\n\u003cp\u003e\u003ca 
href=\"https://www.darioamodei.com/essay/machines-of-loving-grace\"\u003eDario Amodei\u003c/a\u003e was notably careful in \u0026ldquo;Machines of Loving Grace.\u0026rdquo; He predicted AI could compress 50 to 100 years of biological progress into 5 to 10 years, but on physics he hedged: particle physicists are \u0026ldquo;limited by data from particle accelerators\u0026rdquo; and \u0026ldquo;it\u0026rsquo;s not clear that they would do drastically better if they were superintelligent.\u0026rdquo; Some problems are not compute-limited. They are experiment-limited, or concept-limited, or both.\u003c/p\u003e\n\u003cp\u003eStephen Wolfram\u0026rsquo;s principle of computational irreducibility poses the hardest theoretical limit: some systems cannot be predicted by any shortcut. The only way to know what they do is to run them. If fundamental physics contains computationally irreducible problems, no amount of AI compute will crack them.\u003c/p\u003e\n\u003cp\u003eBut \u003ca href=\"https://mariokrenn.wordpress.com/\"\u003eMario Krenn\u003c/a\u003e at Max Planck offers a counterpoint from the lab bench. His team published in \u003cem\u003ePhysical Review X\u003c/em\u003e on AI-discovered gravitational wave detector designs that outperform human designs, and in \u003cem\u003eScience Advances\u003c/em\u003e on an AI-discovered violation of Bell inequality with unentangled photons. He does not claim AI understands physics. 
He claims it finds things physicists miss: \u0026ldquo;I let the algorithm run, and within a few hours it found exactly the solution that we as human scientists couldn\u0026rsquo;t find for many weeks.\u0026rdquo;\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ai-science-paradox-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ai-science-paradox.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ai-science-paradox.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ai-science-paradox.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ai-science-paradox.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-science-paradox.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ai-science-paradox.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ai-science-paradox.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ai-science-paradox.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-science-paradox.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ai-science-paradox.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ai-science-paradox.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai-science-paradox.png\"\n           alt=\"The AI scientific discovery paradox: quantity metrics surging with 3x more papers published, 4.8x more citations received, and 33 percent more arXiv preprints, but quality metrics declining with 4.6 percent less topical territory covered, 22 percent less cross-paper engagement, and researchers herding toward the same topics\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ai-science-paradox-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ai-science-paradox.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"The AI scientific discovery paradox: quantity metrics surging with 3x more papers published, 4.8x more citations received, and 33 percent more arXiv preprints, but quality metrics declining with 4.6 percent less topical territory covered, 22 percent less cross-paper engagement, and researchers herding toward the same topics\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003ch2 id=\"two-roads\"\u003eTwo roads\u003c/h2\u003e\n\u003cp\u003eThe nuclear parallel is the one that matters. Fission was discovered in Berlin in December 1938. Hiroshima was August 1945. 
Less than seven years from pure physics to weapon. The first nuclear power plant came nine years later. Oppenheimer captured the dynamic: \u0026ldquo;When you see something that is technically sweet, you go ahead and do it, and you argue about what to do about it only after you have had your technical success.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eEvery AI-accelerated physics breakthrough is inherently dual-use technology. The \u003ca href=\"https://www.peaknano.com/blog/the-iaea-world-fusion-outlook-2025\"\u003eIAEA reports\u003c/a\u003e 35 of 45 private fusion companies expect commercial pilot plants between 2030 and 2035. Commonwealth Fusion Systems has raised roughly $3 billion. \u003ca href=\"https://english.news.cn/20250724/213ed7ff0e954935bd5645b30a9dafe3/c.html\"\u003eChina established a state-owned fusion company\u003c/a\u003e in July 2025. The fusion market is projected at $430 billion by 2030. The same plasma control AI that keeps a tokamak stable could, in principle, optimize weapons physics.\u003c/p\u003e\n\u003cp\u003eI don\u0026rsquo;t know which road we\u0026rsquo;re on. I\u0026rsquo;m not sure anyone does. But the velocity of AI scientific discovery, from Olympiad geometry problems to a gold medal at the International Mathematical Olympiad to a result in theoretical physics, all within 25 months, suggests the question will be answered empirically rather than philosophically. And probably sooner than the physicists expect.\u003c/p\u003e\n\u003cp\u003eThe cost of intelligence has fallen fast: per-token prices for GPT-4-level models dropped roughly \u003ca href=\"https://blog.samaltman.com/three-observations\"\u003e150x\u003c/a\u003e between early 2023 and mid-2024. The cost of pointing it at physics is a policy choice, not a technical constraint. The 75:1 compute gap between commercial and scientific AI spending is the number that determines how fast this goes. Whether it should go fast is a different question entirely.\u003c/p\u003e\n","summary":"Peter Thiel says physics stalled in 1972. 
Then GPT-5.2 proved a new result in theoretical physics. The 75:1 AI compute gap between commerce and science.","image":"https://static.philippdubach.com/ograph/ograph-physics-department1.jpg","date_published":"2026-03-02T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Article","word_count":1648,"reading_time_minutes":8,"keywords":["AI scientific discovery","AI for science","GPT-5.2 gluon discovery","Peter Thiel stagnation thesis","AI compute gap science","Great Stagnation AI","AI breakthrough physics 2026","total factor productivity decline","AlphaFold Nobel Prize 2024","federal AI R\u0026D spending","AI fusion energy plasma control","dual-use AI technology","DeepMind GNoME materials discovery","AI drug discovery clinical trials","scattering amplitudes theoretical physics","research productivity decline","computational irreducibility physics","bits vs atoms innovation","AI accelerated physics","scientific progress stagnation"],"section":"posts"}},{"id":"https://philippdubach.com/posts/every-bulge-bracket-bank-agrees-on-ai/","url":"https://philippdubach.com/posts/every-bulge-bracket-bank-agrees-on-ai/","title":"Every Bulge Bracket Bank Agrees on AI","content_html":"\u003cfigure class=\"post-figure\" style=\"width: 100%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-pdf_covers_overview-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/pdf_covers_overview.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/pdf_covers_overview.png 480w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/pdf_covers_overview.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/pdf_covers_overview.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/pdf_covers_overview.png 1200w\"\n              sizes=\"100vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/pdf_covers_overview.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/pdf_covers_overview.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/pdf_covers_overview.png 1440w\"\n              sizes=\"100vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/pdf_covers_overview.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/pdf_covers_overview.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/pdf_covers_overview.png 2000w\"\n              sizes=\"100vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/pdf_covers_overview.png\"\n           alt=\"Cover pages of 12 AI research reports from Goldman Sachs, JPMorgan, Morgan Stanley, UBS, Barclays, Bank of America, HSBC, Citi, Deutsche Bank, and Santander.\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-pdf_covers_overview-png-0\" class=\"lightbox-dialog\" 
aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/pdf_covers_overview.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Cover pages of 12 AI research reports from Goldman Sachs, JPMorgan, Morgan Stanley, UBS, Barclays, Bank of America, HSBC, Citi, Deutsche Bank, and Santander.\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eI spent the last week reading 12 bank AI research reports from ten of the world\u0026rsquo;s largest financial institutions: Goldman Sachs, JPMorgan, Morgan Stanley (three separate reports), UBS, Barclays, Bank of America, HSBC, Citi, Deutsche Bank, and Santander. I wanted to understand how institutions that collectively manage trillions of dollars and employ thousands of analysts actually see this technology heading into 2026: where they agree, where they diverge, and what they\u0026rsquo;re being less than forthcoming about.\u003c/p\u003e\n\u003cp\u003eWhat I found is useful, sometimes impressive, and \u003cem\u003e(mostly)\u003c/em\u003e worth reading.\u003c/p\u003e\n\u003ch2 id=\"concerning-consensus\"\u003eConcerning consensus\u003c/h2\u003e\n\u003cp\u003eEvery single institution frames AI as a general-purpose technology, not a product cycle. The analogies converge almost word-for-word: \u003ca href=\"https://www.goldmansachs.com/what-we-do/investment-banking/insights/articles/powering-the-ai-era/report.pdf\"\u003eGoldman Sachs\u003c/a\u003e draws the line through railroads, electrification, and telecom. \u003ca href=\"https://www.santander.com/en/press-room/the-year-ahead-2025/the-macroeconomic-effects-of-artificial-intelligence\"\u003eSantander\u003c/a\u003e deploys a formal three-stage general-purpose-technology (GPT) framework: steam, ICT, AI. 
\u003ca href=\"https://www.morganstanley.com/im/en-us/individual-investor/insights/tales-from-the-emerging-world/ais-silicon-backbone.html\"\u003eMorgan Stanley\u0026rsquo;s semiconductor team\u003c/a\u003e writes that AI is \u0026ldquo;closer to electricity than consumer gadgets.\u0026rdquo; Deutsche Bank projects \u003cstrong\u003e+$7 trillion\u003c/strong\u003e in global GDP over the decade. \u003ca href=\"https://www.ubs.com/global/en/wealthmanagement/insights/artificial-intelligence.html\"\u003eUBS\u003c/a\u003e puts the AI revenue opportunity at \u003cstrong\u003e$2.6 trillion\u003c/strong\u003e by 2030.\u003c/p\u003e\n\u003cp\u003eNot one of the twelve reports seriously entertains the possibility that AI is more like 3D printing: genuinely useful in pockets, broadly disappointing in aggregate. Santander comes closest, citing \u003ca href=\"https://www.nber.org/papers/w32487\"\u003eDaron Acemoglu\u0026rsquo;s\u003c/a\u003e conservative \u003cstrong\u003e+0.7% cumulative TFP\u003c/strong\u003e estimate over ten years, but even Santander frames that as the floor of the range, not the central case. The optimistic end of the same distribution sits at \u003cstrong\u003e+10–15%\u003c/strong\u003e. That\u0026rsquo;s not a rounding error. It\u0026rsquo;s a fundamental disagreement about whether AI will re-run the productivity miracle of electrification or prove more modest in aggregate, and most banks quietly pick the point on the distribution that best supports their commercial positioning.\u003c/p\u003e\n\u003cp\u003eThe chart below plots each bank by how bullish they are on AI\u0026rsquo;s economic impact against how grounded their analysis is in current empirical data versus forward projections. Bank of America sits alone in the top-right: data-driven and moderately bullish. Goldman sits at the bottom-right: maximally bullish, maximally projective. 
Santander is the lone occupant of the top-left: empirical and cautious.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-exhibit-1-macro-conviction1-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/exhibit-1-macro-conviction1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/exhibit-1-macro-conviction1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/exhibit-1-macro-conviction1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/exhibit-1-macro-conviction1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-1-macro-conviction1.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/exhibit-1-macro-conviction1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/exhibit-1-macro-conviction1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/exhibit-1-macro-conviction1.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-1-macro-conviction1.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/exhibit-1-macro-conviction1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/exhibit-1-macro-conviction1.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-1-macro-conviction1.png\"\n           alt=\"Bank AI research reports compared on two axes: macro conviction (cautious to bullish) and evidence basis (projective to empirical). BofA is the only data-driven bull. Goldman Sachs is a projective bull. Santander is the only data-driven skeptic. Most institutions cluster in the bullish-projective quadrant.\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-exhibit-1-macro-conviction1-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/exhibit-1-macro-conviction1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Bank AI research reports compared on two axes: macro conviction (cautious to bullish) and evidence basis (projective to empirical). BofA is the only data-driven bull. Goldman Sachs is a projective bull. Santander is the only data-driven skeptic. Most institutions cluster in the bullish-projective quadrant.\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThat chart is an editorial interpretation, not a precise measurement. But the shape is right. 
Bank of America is the only institution that consistently anchors its claims to actual GDP data rather than projections. Goldman Sachs, at the other extreme, produces a report that reads as a pitch to every infrastructure CFO and sovereign wealth fund in the world. Both can be making valid arguments. They\u0026rsquo;re just not making the same kind.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"whats-happening-vs-what-might-happen\"\u003eWhat’s happening vs. what might happen\u003c/h2\u003e\n\u003cp\u003eBofA and Santander are the two worth pausing on, because they\u0026rsquo;re doing something different from the rest: they\u0026rsquo;re reporting what\u0026rsquo;s happening rather than what might happen.\u003c/p\u003e\n\u003cp\u003eBank of America, using Bureau of Labor Statistics and Bureau of Economic Analysis data, finds that AI capex contributed \u003cstrong\u003e1.4–1.5 percentage points\u003c/strong\u003e to US GDP growth in H1 2025. Headline growth rates were running around 2% in that period. So AI infrastructure spending accounted for roughly 70–75% of headline growth, making it the single largest driver of US economic expansion. That\u0026rsquo;s a real number from real data, and it\u0026rsquo;s the most important figure in any of these reports.\u003c/p\u003e\n\u003cp\u003eBofA also finds a \u003cem\u003epositive\u003c/em\u003e correlation between AI adoption and employment in white-collar sectors: software developers are up \u003cstrong\u003e17.9%\u003c/strong\u003e, while insurance appraisers, a role where AI substitutes directly for human judgment, are down \u003cstrong\u003e20%\u003c/strong\u003e. The disruption is concentrated in specific tasks. It hasn\u0026rsquo;t shown up in aggregate employment. Yet.\u003c/p\u003e\n\u003cp\u003eThen there\u0026rsquo;s Santander, which writes the most academically rigorous report of the twelve and includes numbers the consensus would rather not linger on. 
The enterprise AI adoption rate data is sobering: only around \u003cstrong\u003e10% of US companies\u003c/strong\u003e are actually using AI to produce goods and services. \u003cstrong\u003e42% of companies abandoned GenAI projects in 2024\u003c/strong\u003e, a picture consistent with \u003ca href=\"https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf\"\u003eMIT\u0026rsquo;s 2025 GenAI Divide research\u003c/a\u003e, which found 95% of enterprise pilots fail to reach production. Only \u003cstrong\u003e1%\u003c/strong\u003e of companies describe their rollouts as mature. Meanwhile, 78% say they use AI in at least one function. The gap between \u0026ldquo;we have a pilot\u0026rdquo; and \u0026ldquo;this is generating value\u0026rdquo; is enormous.\u003c/p\u003e\n\u003cp\u003eGoldman\u0026rsquo;s \u003cstrong\u003e$800 million per day\u003c/strong\u003e in hyperscaler capex and Santander\u0026rsquo;s 42% abandonment rate aren\u0026rsquo;t as contradictory as they look. Capex precedes productivity in every infrastructure cycle. That part is historically unambiguous. The question is how long the gap lasts, and whether the eventual productivity gains justify what\u0026rsquo;s been spent getting there.\u003c/p\u003e\n\u003ch2 id=\"dotcom-comparison\"\u003eDotcom comparison\u003c/h2\u003e\n\u003cp\u003eEvery report that addresses the bubble question reaches the same conclusion: this isn\u0026rsquo;t the late 1990s.\u003c/p\u003e\n\u003cp\u003eThe primary evidence is valuation. Nvidia trades at \u003cstrong\u003e25–30x forward earnings\u003c/strong\u003e versus Cisco\u0026rsquo;s \u003cstrong\u003e~140x\u003c/strong\u003e at the March 2000 peak. The Magnificent 6 sit at roughly \u003cstrong\u003e35x\u003c/strong\u003e versus \u003cstrong\u003e55x\u003c/strong\u003e for the TMT index at its apex. 
\u003ca href=\"https://www.morganstanley.com/im/en-us/individual-investor/insights/tales-from-the-emerging-world/ais-silicon-backbone.html\"\u003eMorgan Stanley\u0026rsquo;s Silicon Backbone report\u003c/a\u003e makes this comparison rigorously, and I think they\u0026rsquo;re right that the earnings quality is categorically different from dot-com era technology stocks.\u003c/p\u003e\n\u003cp\u003eBut the comparison works less cleanly when you look at concentration rather than individual valuations. Deutsche Bank notes that the top 10 S\u0026amp;P 500 companies now represent \u003cstrong\u003e40% of total market cap\u003c/strong\u003e, an extreme not seen at the dot-com peak. A \u003ca href=\"https://www.investing.com/news/stock-market-news/bofas-survey-shows-54-of-investors-say-ai-in-bubble-60-say-stocks-overvalued-4284842\"\u003eBank of America fund manager survey\u003c/a\u003e from October 2025 found \u003cstrong\u003e54% of global managers believe AI equities are in a bubble\u003c/strong\u003e, and \u003cstrong\u003e60% view global equities as overvalued\u003c/strong\u003e. You can simultaneously hold that Nvidia\u0026rsquo;s PE is reasonable and that a portfolio with 40% weight in ten companies carries concentration risk that PE comparisons don\u0026rsquo;t capture. Reassuring on one axis. Alarming on another. Most sell-side AI research cites whichever data point supports its preferred conclusion and leaves the tension sitting there unaddressed.\u003c/p\u003e\n\u003cp\u003eThere\u0026rsquo;s also a subtler version of the bubble question that none of the twelve reports asks directly. The \u0026ldquo;infrastructure comes before productivity\u0026rdquo; argument is historically correct: railroads were overbuilt before they transformed commerce; the internet fibre glut of 1999–2000 eventually became the backbone of the digital economy. But the investors who financed Global Crossing and 360networks still lost everything. 
The infrastructure thesis being correct in the long run isn\u0026rsquo;t the same as every current valuation being justified. Goldman\u0026rsquo;s report is particularly careful to avoid addressing that distinction. The implicit message, \u0026ldquo;we financed the pipes before and it worked out,\u0026rdquo; skips past the question of which financiers got paid and which got wiped out in the transition.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"sell-side\"\u003eSell side\u003c/h2\u003e\n\u003cp\u003eThe following chart maps risk awareness against bullishness of tone, and the clustering is revealing.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-exhibit-3-risk-bullishness1-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/exhibit-3-risk-bullishness1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/exhibit-3-risk-bullishness1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/exhibit-3-risk-bullishness1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/exhibit-3-risk-bullishness1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-3-risk-bullishness1.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/exhibit-3-risk-bullishness1.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/exhibit-3-risk-bullishness1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/exhibit-3-risk-bullishness1.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-3-risk-bullishness1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/exhibit-3-risk-bullishness1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/exhibit-3-risk-bullishness1.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-3-risk-bullishness1.png\"\n           alt=\"Goldman Sachs and UBS AI research reports plotted as aggressively bullish and risk-dismissive. Santander and BofA are measured and risk-aware. HSBC is an optimistic hand-waver. 
Chart maps risk awareness vs bullishness of tone across 12 bank AI research reports.\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-exhibit-3-risk-bullishness1-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/exhibit-3-risk-bullishness1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Goldman Sachs and UBS AI research reports plotted as aggressively bullish and risk-dismissive. Santander and BofA are measured and risk-aware. HSBC is an optimistic hand-waver. Chart maps risk awareness vs bullishness of tone across 12 bank AI research reports.\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eGoldman and UBS are in the bottom-right: aggressively bullish, risk-dismissive. Santander and BofA are in the top-left, actually wrestling with the uncertainty. HSBC is the clearest case of motivated reasoning: the report is written explicitly to stop private banking clients from panic-selling their SaaS positions after several quarters of valuation-multiple compression. \u003cem\u003e(Whether that advice turns out to be right is a separate question.)\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eI don\u0026rsquo;t think this makes any of these reports dishonest. But the reader needs to supply the discount rate that each institution\u0026rsquo;s interests warrant.\u003c/p\u003e\n\u003cp\u003eGoldman Sachs earns advisory fees on the data centre and energy deals it describes. Barclays lends to energy infrastructure projects. 
Morgan Stanley is selling both EM equity exposure and second-order stock-picking strategies through its asset management arm. UBS provides a clean three-layer investment framework that maps directly to its wealth management product shelf. Citi frames AI as accelerating the electronification of markets, the very trend that drives Citi\u0026rsquo;s trading revenue. \u003ca href=\"https://fortune.com/2026/02/18/will-ai-destroy-jobs-deutsche-bank-asks-ai-to-predict/\"\u003eDeutsche Bank\u003c/a\u003e, most self-aware of the ten, used AI to generate its AI report. The meta-commentary is right there in the methodology.\u003c/p\u003e\n\u003cp\u003eNot a single report concludes \u0026ldquo;this may be overhyped and you should meaningfully reduce exposure.\u0026rdquo; Every institution has a commercial interest in the AI narrative staying bullish. That doesn\u0026rsquo;t mean the narrative is wrong. It does mean unanimous conviction from nine sell-side AI research teams is not the same thing as nine independent analyses reaching the same conclusion.\u003c/p\u003e\n\u003ch2 id=\"second-order-ai-beneficiaries\"\u003eSecond-order AI beneficiaries\u003c/h2\u003e\n\u003cp\u003eThe next two charts contain what I think is the most interesting tension across all twelve reports.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-exhibit-2-value-chain1-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/exhibit-2-value-chain1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/exhibit-2-value-chain1.png 480w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/exhibit-2-value-chain1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/exhibit-2-value-chain1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-2-value-chain1.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/exhibit-2-value-chain1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/exhibit-2-value-chain1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/exhibit-2-value-chain1.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-2-value-chain1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/exhibit-2-value-chain1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/exhibit-2-value-chain1.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-2-value-chain1.png\"\n           alt=\"Value chain focus vs time horizon: which banks favour first-order AI enablers (chips, data centres) vs second-order AI beneficiaries (deploying companies). Goldman Sachs and Barclays are near-term first-order plays. 
Morgan Stanley second-order report sits in long-term deployers quadrant.\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-exhibit-2-value-chain1-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/exhibit-2-value-chain1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Value chain focus vs time horizon: which banks favour first-order AI enablers (chips, data centres) vs second-order AI beneficiaries (deploying companies). Goldman Sachs and Barclays are near-term first-order plays. Morgan Stanley second-order report sits in long-term deployers quadrant.\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-exhibit-4-disruption-timeline1-png-6\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/exhibit-4-disruption-timeline1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/exhibit-4-disruption-timeline1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/exhibit-4-disruption-timeline1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/exhibit-4-disruption-timeline1.png 
960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-4-disruption-timeline1.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/exhibit-4-disruption-timeline1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/exhibit-4-disruption-timeline1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/exhibit-4-disruption-timeline1.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-4-disruption-timeline1.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/exhibit-4-disruption-timeline1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/exhibit-4-disruption-timeline1.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/exhibit-4-disruption-timeline1.png\"\n           alt=\"AI disruption magnitude vs timeline across 12 bank research reports. Goldman Sachs and Barclays expect large near-term disruption. Santander sees incremental long-term change. Morgan Stanley robotics and JPMorgan see radical but distant disruption. 
BofA sees moderate disruption already underway.\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-exhibit-4-disruption-timeline1-png-6\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/exhibit-4-disruption-timeline1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"AI disruption magnitude vs timeline across 12 bank research reports. Goldman Sachs and Barclays expect large near-term disruption. Santander sees incremental long-term change. Morgan Stanley robotics and JPMorgan see radical but distant disruption. BofA sees moderate disruption already underway.\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003e\u003ca href=\"https://www.morganstanley.com/im/en-us/individual-investor/insights/articles/investing-in-second-order-effects.html\"\u003eMorgan Stanley\u0026rsquo;s Counterpoint Global team\u003c/a\u003e, in the second-order effects report, presents historical data that should make the rest of this collection at least slightly uncomfortable. In the railroad era, Walmart\u0026rsquo;s equivalent outperformed Ford\u0026rsquo;s equivalent by \u003cstrong\u003e1,622x to 23x\u003c/strong\u003e. In the internet era, Netflix returned \u003cstrong\u003e519x\u003c/strong\u003e versus Cisco\u0026rsquo;s \u003cstrong\u003e4x\u003c/strong\u003e. 
It\u0026rsquo;s the same pattern every time: the companies that \u003cem\u003euse\u003c/em\u003e the infrastructure to serve customers dramatically outperform the companies that \u003cem\u003ebuild\u003c/em\u003e it.\u003c/p\u003e\n\u003cp\u003eYet nearly every bank\u0026rsquo;s actual investment positioning sits in Nvidia, ASML, hyperscalers, data centre REITs, and nuclear utilities: overwhelmingly first-order enablers. Either the historical pattern won\u0026rsquo;t repeat this time (possible, but not argued anywhere in these reports), or there\u0026rsquo;s a valid timing explanation (first-order wins in the buildout phase, second-order wins in deployment), or most of these recommendations will look dated within five years.\u003c/p\u003e\n\u003cp\u003eMorgan Stanley\u0026rsquo;s own three reports collectively make the case for second-order investing over the long run while still recommending first-order plays in the near term. That\u0026rsquo;s not quite inconsistent. But the tension deserves more acknowledgment than it gets.\u003c/p\u003e\n\u003ch2 id=\"power\"\u003ePower\u003c/h2\u003e\n\u003cp\u003eIf I had to pick one analytical claim that holds up regardless of where the productivity debate lands, it\u0026rsquo;s this: power is the binding constraint, and the infrastructure required to relieve it is real, expensive, and already being built.\u003c/p\u003e\n\u003cp\u003eThe numbers are consistent across institutions. US data centre power consumption runs at \u003cstrong\u003e150–175 TWh\u003c/strong\u003e a year today. \u003ca href=\"https://www.ib.barclays/our-insights/ai-revolution-meeting-massive-infrastructure-demand.html\"\u003eBarclays\u003c/a\u003e projects \u003cstrong\u003e560 TWh by 2030\u003c/strong\u003e, approximately 13% of total US electricity. Goldman Sachs estimates \u003cstrong\u003e60%\u003c/strong\u003e of new data centre power through 2030 will require net-new generation capacity. 
The US power grid has an average age of \u003cstrong\u003e40 years\u003c/strong\u003e. Token consumption grew \u003cstrong\u003e4,274%\u003c/strong\u003e in a single year. Data centre construction spending has grown roughly \u003cstrong\u003e60% year-on-year\u003c/strong\u003e since ChatGPT launched in late 2022.\u003c/p\u003e\n\u003cp\u003eBarclays frames this as a Jevons paradox: efficiency improvements in model inference will, counterintuitively, increase total energy consumption because they make AI cheaper and drive higher usage. I think that\u0026rsquo;s right. It\u0026rsquo;s exactly how personal computing and the internet played out. Every report that addresses energy lands on nuclear as the preferred long-term solution: \u003ca href=\"https://www.energy.gov/ne/articles/9-key-takeaways-president-trumps-executive-orders-nuclear-energy\"\u003efour executive orders\u003c/a\u003e in early 2025, a 400 GW capacity target by 2050, the \u003ca href=\"https://www.constellationenergy.com/news/2024/Constellation-to-Launch-Crane-Clean-Energy-Center-Restoring-Jobs-and-Carbon-Free-Power-to-The-Grid.html\"\u003eThree Mile Island restart\u003c/a\u003e. That consensus may prove correct. It may also be the sector where the infrastructure-before-returns gap runs longest.\u003c/p\u003e\n\u003ch2 id=\"what-the-reports-dont-say\"\u003eWhat the reports don\u0026rsquo;t say\u003c/h2\u003e\n\u003cp\u003eThe quadrant charts map where the banks are looking. They\u0026rsquo;re less revealing about what\u0026rsquo;s off the frame entirely.\u003c/p\u003e\n\u003cp\u003eNo report models a structured downside scenario: AI capex producing disappointing returns, hyperscalers pulling back, or a major data centre financing default triggering something worse. The closest is Santander\u0026rsquo;s 42% abandonment statistic, but even Santander doesn\u0026rsquo;t ask what happens if that number climbs to 60%.\u003c/p\u003e\n\u003cp\u003eNo report discusses AI safety or alignment risks. 
\u003ca href=\"https://www.ubs.com/global/en/wealthmanagement/insights/artificial-intelligence.html\"\u003eUBS\u003c/a\u003e notes that the length of tasks AI systems can complete has doubled every seven months and explicitly references the AGI trajectory, then moves directly to investment implications, as if \u0026ldquo;AGI trajectory\u0026rdquo; carried no risk premium at all. I find that strange.\u003c/p\u003e\n\u003cp\u003eThe collision between AI energy demand and climate commitments gets almost no treatment. Only \u003ca href=\"https://www.ib.barclays/our-insights/ai-revolution-meeting-massive-infrastructure-demand.html\"\u003eBarclays\u003c/a\u003e mentions that global CO2 emissions hit a record \u003cstrong\u003e37.7 gigatonnes\u003c/strong\u003e \u003ca href=\"https://www.iea.org/reports/global-energy-review-2025/co2-emissions\"\u003ein 2023\u003c/a\u003e. The institutions projecting that AI will consume 13% of US electricity by 2030 don\u0026rsquo;t reconcile that with the net-zero commitments in their own sustainability reports.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.jpmorganchase.com/content/dam/jpmorganchase/documents/center-for-geopolitics/decoding-the-new-global-operating-system.pdf\"\u003eJPMorgan\u003c/a\u003e, which provides the most detailed geopolitical analysis of the twelve, never models a Taiwan Strait disruption scenario. 
\u003ca href=\"https://www.morganstanley.com/im/en-us/individual-investor/insights/tales-from-the-emerging-world/ais-silicon-backbone.html\"\u003eMorgan Stanley\u003c/a\u003e identifies Taiwan, Korea, and China as \u0026ldquo;irreplaceable\u0026rdquo; nodes in the AI hardware supply chain, while calling emerging market semiconductor exposure \u0026ldquo;long-term infrastructure participation.\u0026rdquo; Those two characterisations sit in very uncomfortable proximity, and neither report acknowledges it.\u003c/p\u003e\n\u003cp\u003eI came away from this with real respect for several of these pieces, particularly BofA\u0026rsquo;s empirical rigour and Santander\u0026rsquo;s willingness to cite unflattering numbers. The energy infrastructure thesis seems to me the most durable of the lot: the power bottleneck is real regardless of where you land on the productivity question.\u003c/p\u003e\n\u003cp\u003eBut I also came away convinced that this consensus is shaped as much by institutional incentive as by analytical independence. When nine institutions with combined AI-related revenue exposure in the hundreds of billions all agree you should increase AI exposure, the interesting question isn\u0026rsquo;t whether they\u0026rsquo;re right. They may well be.\u003c/p\u003e\n","summary":"I read 12 AI research reports from Goldman Sachs, JPMorgan, UBS, and 6 other banks. Here's the consensus they're pushing, and what they're not saying.","image":"https://static.philippdubach.com/ograph/ograph-banks-ai-research1.jpg","date_published":"2026-03-01T00:00:00Z","date_modified":"2026-05-04T13:38:26+02:00","authors":[{"name":"Philipp D. 
Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI","Investing"],"_philippdubach":{"type":"Analysis","word_count":2032,"reading_time_minutes":10,"keywords":["AI bubble 2026","bank AI research reports","AI capex productivity gap","second-order AI beneficiaries","Wall Street AI consensus","hyperscaler AI spending 2026","AI investment outlook 2026","sell-side AI research consensus","enterprise AI adoption rate","AI project failure rate","Goldman Sachs AI report","Morgan Stanley AI analysis","AI bubble vs dot-com","AI capex ROI","Jevons paradox AI energy","AI data centre power demand","AI general-purpose technology","nuclear energy data centres","AI infrastructure investment thesis","AI market concentration risk","hyperscaler capex ROI 2026","AI stock market outlook 2026","AI productivity paradox"],"section":"posts"}},{"id":"https://philippdubach.com/posts/when-ai-labs-become-defense-contractors/","url":"https://philippdubach.com/posts/when-ai-labs-become-defense-contractors/","title":"When AI Labs Become Defense Contractors","content_html":"\u003cp\u003e\u003ca href=\"https://airandspace.si.edu/collection-objects/lockheed-vega-5b-amelia-earhart/nasm_A19670093000\"\u003eLockheed started by building Amelia Earhart\u0026rsquo;s favorite plane\u003c/a\u003e. Then came a government loan guarantee in 1971 (the L-1011 TriStar nearly killed the company), a Cold War, decades of consolidation, and now a business that earns \u003ca href=\"https://news.lockheedmartin.com/2025-01-28-Lockheed-Martin-Reports-Fourth-Quarter-and-Full-Year-2024-Financial-Results\"\u003e\u003cstrong\u003e92.5%\u003c/strong\u003e of its revenue from government contracts\u003c/a\u003e, with the F-35 alone accounting for \u003cstrong\u003e26%\u003c/strong\u003e of its $71 billion in annual sales. The process took about 50 years. AI labs becoming defense contractors will happen faster.\u003c/p\u003e\n\u003cp\u003eOn February 27, 2026, two things happened within hours of each other. 
President Trump ordered every federal agency to \u003ca href=\"https://www.cnbc.com/2026/02/27/trump-anthropic-ai-pentagon.html\"\u003e\u0026ldquo;IMMEDIATELY CEASE all use of Anthropic\u0026rsquo;s technology\u0026rdquo;\u003c/a\u003e after CEO Dario Amodei refused to strip safety constraints from Claude\u0026rsquo;s Pentagon deployment, \u003ca href=\"https://www.anthropic.com/news/statement-department-of-war\"\u003especifically prohibitions on mass domestic surveillance and fully autonomous weapons\u003c/a\u003e. Defense Secretary Pete Hegseth then labeled Anthropic a \u003ca href=\"https://www.cbsnews.com/news/hegseth-declares-anthropic-supply-chain-risk/\"\u003e\u0026ldquo;Supply-Chain Risk to National Security,\u0026rdquo;\u003c/a\u003e a designation previously reserved for foreign adversaries like Huawei, \u003ca href=\"https://fortune.com/2026/02/28/openai-pentagon-deal-anthropic-designated-supply-chain-risk-unprecedented-action-damage-its-growth/\"\u003enever before applied to an American company\u003c/a\u003e. That evening, Sam Altman announced that OpenAI had signed a deal to deploy its models on the Pentagon\u0026rsquo;s classified network, \u003ca href=\"https://x.com/sama/status/2027578652477821175\"\u003eposting that the Department of War \u0026ldquo;displayed a deep respect for safety.\u0026rdquo;\u003c/a\u003e (Whether that reflects the Pentagon\u0026rsquo;s actual position or Altman\u0026rsquo;s political optimism remains unclear for now.)\u003c/p\u003e\n\u003cp\u003eMost coverage has framed this as an ethics dispute. I think that framing is going to age poorly. 
What I see is the economics of defense spending doing what they have always done to every company they touch, and the ethics arguments becoming less audible as the financial gravity increases.\u003c/p\u003e\n\u003ch2 id=\"the-last-supper-and-defense-industry-consolidation\"\u003eThe Last Supper and defense industry consolidation\u003c/h2\u003e\n\u003cp\u003eIn the summer of 1993, Secretary of Defense Les Aspin and Deputy Secretary William Perry invited the CEOs of America\u0026rsquo;s defense firms to dinner at the Pentagon and told them, in so many words, that most of them would not survive. Cold War budget cuts meant the government could sustain roughly one prime contractor per equipment category. \u003ca href=\"https://www.defensenews.com/industry/2024/02/20/the-pentagon-wants-industry-to-transform-again-to-meet-demand-can-it/\"\u003eNorman Augustine, then CEO of Martin Marietta, named it the Last Supper.\u003c/a\u003e The message was clear: consolidate or die, and the government would not stop you from consolidating.\u003c/p\u003e\n\u003cp\u003eThe restructuring that followed was fast, even by M\u0026amp;A standards. \u003ca href=\"https://en.wikipedia.org/wiki/Last_Supper_(defense_industry)\"\u003eWithin four years, \u003cstrong\u003e51 prime defense contractors collapsed into five\u003c/strong\u003e\u003c/a\u003e: \u003ca href=\"https://www.ftc.gov/news-events/news/press-releases/1995/05/lockheed-corporation\"\u003eLockheed merged with Martin Marietta in 1995 ($10 billion)\u003c/a\u003e, \u003ca href=\"https://boeing.mediaroom.com/1997-07-31-Boeing-Completes-McDonnell-Douglas-Merger\"\u003eBoeing absorbed McDonnell Douglas in 1997 ($13.3 billion)\u003c/a\u003e, Raytheon folded in Hughes Electronics and Texas Instruments\u0026rsquo; defense unit. 
Between 2011 and 2015, \u003ca href=\"https://www.defensenews.com/breaking-news/2017/12/14/american-exodus-17000-us-defense-suppliers-may-have-left-the-defense-sector/\"\u003ean additional \u003cstrong\u003e17,000 U.S. companies exited the defense industry\u003c/strong\u003e\u003c/a\u003e, a contraction that hollowed out the supplier base the Big Five still depend on today.\u003c/p\u003e\n\u003cp\u003eThe revenue dependency data shows what happens to the companies on the inside of that consolidation. Boeing before 1997 was, as \u003ca href=\"https://www.cnn.com/2024/01/30/business/boeing-history-of-problems\"\u003eBank of America analyst Ron Epstein put it\u003c/a\u003e, \u0026ldquo;a company where engineers were high church.\u0026rdquo; Post-merger, Boeing relocated its headquarters from Seattle\u0026rsquo;s engineering center to Chicago, physically separating leadership from manufacturing. \u003ca href=\"https://boeing.mediaroom.com/2025-01-28-Boeing-Reports-Fourth-Quarter-Results\"\u003eDefense rose to \u003cstrong\u003e35.8% of Boeing\u0026rsquo;s FY2024 revenue\u003c/strong\u003e ($23.9 billion)\u003c/a\u003e. The cultural shift the merger brought, financial discipline over engineering judgment, is what most 737 MAX post-mortems eventually trace back to. Companies don\u0026rsquo;t plan to end up here. 
They respond to incentives, and the incentives compound.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-government-revenue-dependency-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/government-revenue-dependency.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/government-revenue-dependency.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/government-revenue-dependency.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/government-revenue-dependency.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/government-revenue-dependency.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/government-revenue-dependency.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/government-revenue-dependency.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/government-revenue-dependency.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/government-revenue-dependency.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/government-revenue-dependency.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/government-revenue-dependency.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/government-revenue-dependency.png\"\n           alt=\"Government revenue dependency across defense primes and AI defense contractors: Lockheed Martin at 92.5%, RTX at 55%, Boeing at 35.8%, Palantir at 53.7%, OpenAI at 5%, and Anthropic at 2%, showing how classified defense work creates a one-way revenue ratchet\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-government-revenue-dependency-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/government-revenue-dependency.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Government revenue dependency across defense primes and AI defense contractors: Lockheed Martin at 92.5%, RTX at 55%, Boeing at 35.8%, Palantir at 53.7%, OpenAI at 5%, and Anthropic at 2%, showing how classified defense work creates a one-way revenue ratchet\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe AI industry will face the same incentives, just faster, and through a different mechanism: not M\u0026amp;A but access to classified networks and government-funded compute.\u003c/p\u003e\n\u003ch2 id=\"how-pentagon-ai-spending-reshapes-a-company\"\u003eHow Pentagon AI 
spending reshapes a company\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://defensescoop.com/2024/03/11/pentagon-ai-budget-request-2025/\"\u003eThe FY2025 DoD AI budget was \u003cstrong\u003e$1.8 billion\u003c/strong\u003e\u003c/a\u003e, a figure that nearly everyone involved described as insufficient. \u003ca href=\"https://defensescoop.com/2025/06/26/dod-fy26-budget-request-autonomy-unmanned-systems/\"\u003eThe FY2026 budget request earmarks \u003cstrong\u003e$13.4 billion\u003c/strong\u003e for AI and autonomous systems\u003c/a\u003e, a roughly 7x increase in a single budget cycle, and the first time these technologies have their own standalone line item inside a total defense request of \u003cstrong\u003e$892.6 billion\u003c/strong\u003e. For context: \u003ca href=\"https://siliconangle.com/2026/02/12/anthropic-closes-30b-round-annualized-revenue-tops-14b/\"\u003eAnthropic\u0026rsquo;s full annualized revenue as of February 2026 was approximately \u003cstrong\u003e$14 billion\u003c/strong\u003e\u003c/a\u003e. 
The Pentagon just made AI a budget category larger than most of the companies selling it.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-dod-ai-budget-context-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/dod-ai-budget-context.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/dod-ai-budget-context.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/dod-ai-budget-context.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/dod-ai-budget-context.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/dod-ai-budget-context.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/dod-ai-budget-context.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/dod-ai-budget-context.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/dod-ai-budget-context.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/dod-ai-budget-context.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/dod-ai-budget-context.png 1600w,\n                  
    https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/dod-ai-budget-context.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/dod-ai-budget-context.png\"\n           alt=\"Pentagon AI budget FY2026 at $13.4 billion compared to AI lab revenues: a 7x jump from $1.8 billion in FY2025, set against Anthropic annualized revenue of $14 billion, OpenAI FY2025 revenue of $13.1 billion, and Palantir FY2025 revenue of $4.48 billion\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-dod-ai-budget-context-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/dod-ai-budget-context.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Pentagon AI budget FY2026 at $13.4 billion compared to AI lab revenues: a 7x jump from $1.8 billion in FY2025, set against Anthropic annualized revenue of $14 billion, OpenAI FY2025 revenue of $13.1 billion, and Palantir FY2025 revenue of $4.48 billion\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eAnthropic burns an estimated $3–5 billion annually; \u003ca href=\"https://www.cnbc.com/2026/02/20/openai-resets-spend-expectations-targets-around-600-billion-by-2030.html\"\u003eOpenAI burned approximately \u003cstrong\u003e$8 billion in 2025\u003c/strong\u003e\u003c/a\u003e. Neither has a clear path to profitability before 2027 at the earliest. 
Government AI contracts offer something consumer businesses cannot: predictable, multi-year, politically protected revenue streams that don\u0026rsquo;t churn when a competitor releases a better model.\u003c/p\u003e\n\u003cp\u003eThe defense procurement structures deepen that dependency over time. \u003ca href=\"https://www.congress.gov/crs-product/IF12558\"\u003eIDIQ contracts (Indefinite Delivery, Indefinite Quantity), which now account for roughly \u003cstrong\u003e56% of DoD contract award dollars\u003c/strong\u003e\u003c/a\u003e, run five years with extension options. \u003ca href=\"https://defensescoop.com/2025/05/23/dod-palantir-maven-smart-system-contract-increase/\"\u003ePalantir\u0026rsquo;s Maven Smart System contract started at $480 million and expanded to \u003cstrong\u003enearly $1.3 billion through 2029\u003c/strong\u003e\u003c/a\u003e. The JWCC cloud contract, which replaced the \u003ca href=\"https://www.cnbc.com/2021/07/06/pentagon-cancels-10-billion-jedi-cloud-contract.html\"\u003ecancelled $10 billion JEDI contract\u003c/a\u003e, placed over \u003cstrong\u003e$3.9 billion in task orders within three years\u003c/strong\u003e of its award to AWS, Google, Microsoft, and Oracle. Once a vendor is embedded in classified systems, switching costs become close to prohibitive. A competitor cannot dislodge it simply by offering better inference speed.\u003c/p\u003e\n\u003cp\u003eSecurity clearances are arguably the most underappreciated asset in the defense tech ecosystem. \u003ca href=\"https://federalnewsnetwork.com/defense-main/2025/05/dcsa-backlog-of-security-clearance-investigations-down-24/\"\u003eProcessing a clearance takes an average of \u003cstrong\u003e243 days end-to-end\u003c/strong\u003e\u003c/a\u003e, up to a year for TS/SCI with polygraph. Only around \u003cstrong\u003e4.2 million Americans\u003c/strong\u003e hold active clearances, roughly 2.5% of the labor force, and an estimated 500,000 to 700,000 cleared positions currently sit unfilled. 
\u003ca href=\"https://news.clearancejobs.com/2025/03/20/national-security-compensation-reaches-new-high-despite-workforce-challenges/\"\u003eAverage cleared professional compensation hit \u003cstrong\u003e$119,131 in 2025\u003c/strong\u003e; full-scope-polygraph holders averaged \u003cstrong\u003e$141,299\u003c/strong\u003e\u003c/a\u003e. For AI labs accustomed to hiring from MIT, Cambridge, and ETH Zürich, the cleared talent pool is thin and gets more expensive every year.\u003c/p\u003e\n\u003cp\u003eAny lab serious about classified work has to build a parallel organizational structure: separate hiring pipeline, separate facilities, separate operational security requirements. The lab that builds that structure first has a moat no competitor can cross quickly.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"palantirs-trajectory-as-the-defense-tech-blueprint\"\u003ePalantir\u0026rsquo;s trajectory as the defense tech blueprint\u003c/h2\u003e\n\u003cp\u003eThe clearest view of where this ends is Palantir, which has been running the experiment at scale for a decade. \u003ca href=\"https://www.cnbc.com/2026/02/02/palantir-pltr-q4-2025-earnings.html\"\u003eIt posted \u003cstrong\u003e$4.48 billion in FY2025 revenue\u003c/strong\u003e, up 56% year-over-year\u003c/a\u003e, with government comprising \u003cstrong\u003e53.7%\u003c/strong\u003e of the total, down from a peak of \u003cstrong\u003e58.2% in 2021\u003c/strong\u003e as its commercial AIP platform gained traction. \u003ca href=\"https://www.army.mil/article/287506/u_s_army_awards_enterprise_service_agreement_to_enhance_military_readiness_and_drive_operational_efficiency\"\u003eIts $10 billion U.S. Army Enterprise Agreement in July 2025 consolidated 75 existing software contracts into a single framework\u003c/a\u003e. Its market capitalization reached roughly \u003cstrong\u003e$320 billion\u003c/strong\u003e by late February 2026, making it worth nearly twice as much as Boeing. 
That model, with government as the client that funds and validates the technology and commercial customers as the clients that justify the valuation, is what the AI labs are now building toward.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://news.crunchbase.com/venture/openai-raise-largest-ai-venture-deal-ever/\"\u003eOpenAI at an \u003cstrong\u003e$840 billion valuation\u003c/strong\u003e\u003c/a\u003e with a classified Pentagon network deal is already further down that road than most coverage acknowledges. It has \u003ca href=\"https://openai.com/index/openai-appoints-retired-us-army-general/\"\u003eappointed retired General Paul Nakasone\u003c/a\u003e, former NSA director, to its board. It hired Dane Stuckey, who spent a decade at Palantir and served as its CISO for six of those years, \u003ca href=\"https://techcrunch.com/2024/10/15/former-palantir-ciso-dane-stuckey-joins-openai-to-lead-security/\"\u003eas its own CISO\u003c/a\u003e. It has active job postings for Government Account Directors in Defense that require a Top Secret clearance and carry defense revenue targets above $2 million per year.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThe publishing record is moving the same way. \u003ca href=\"https://openai.com/index/introducing-openai/\"\u003eOpenAI\u0026rsquo;s 2015 founding post\u003c/a\u003e promised that researchers \u0026ldquo;will be strongly encouraged to publish their work.\u0026rdquo; GPT-1 shipped with open-sourced code. GPT-2 was partially withheld in 2019, GPT-3 fully closed in 2020, GPT-4\u0026rsquo;s architecture undisclosed in 2023. OpenAI released smaller open-source models in August 2025 (its first since GPT-2, six years earlier), but they were text-only, trained on synthetic data, and not frontier systems. 
\u003ca href=\"https://www.bloomberg.com/news/articles/2025-02-04/google-removes-language-on-weapons-from-public-ai-principles\"\u003eGoogle removed the \u0026ldquo;AI applications we will not pursue\u0026rdquo; section from its principles in February 2025\u003c/a\u003e, including the explicit weapons prohibition. \u003ca href=\"https://about.fb.com/news/2024/11/open-source-ai-america-global-security/\"\u003eMeta opened Llama to defense agencies and contractors including Lockheed Martin and Anduril in November 2024\u003c/a\u003e. Anthropic has never open-sourced a Claude model. Every major lab is moving in the same direction.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-openness-retreat-timeline-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/openness-retreat-timeline.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/openness-retreat-timeline.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/openness-retreat-timeline.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/openness-retreat-timeline.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/openness-retreat-timeline.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/openness-retreat-timeline.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/openness-retreat-timeline.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/openness-retreat-timeline.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/openness-retreat-timeline.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/openness-retreat-timeline.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/openness-retreat-timeline.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/openness-retreat-timeline.png\"\n           alt=\"Timeline of AI lab research openness from 2015 to 2026, showing the retreat from open-source to classified military AI work: OpenAI moved from open-source GPT-1 to classified Pentagon deployment, Google removed its weapons prohibition, Meta opened Llama to defense contractors, and Anthropic was labeled a supply-chain risk\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-openness-retreat-timeline-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/openness-retreat-timeline.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Timeline of AI lab research openness from 2015 
to 2026, showing the retreat from open-source to classified military AI work: OpenAI moved from open-source GPT-1 to classified Pentagon deployment, Google removed its weapons prohibition, Meta opened Llama to defense contractors, and Anthropic was labeled a supply-chain risk\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe counterargument, and it\u0026rsquo;s a real one, is that defense R\u0026amp;D has historically generated civilian spillovers: ARPANET, GPS, jet engines, the semiconductor supply chain. \u003ca href=\"https://direct.mit.edu/rest/article/107/1/14/114751/The-Intellectual-Spoils-of-War-Defense-R-amp-D\"\u003eMoretti, Steinwender, and Van Reenen, writing in the \u003cem\u003eReview of Economics and Statistics\u003c/em\u003e (2025)\u003c/a\u003e, found that a 10% increase in government-funded defense R\u0026amp;D generates a 5–6% increase in privately funded R\u0026amp;D in the same industry: crowding-in, not crowding-out. The estimated total effect: U.S. private R\u0026amp;D investment is \u003cstrong\u003e$85 billion higher\u003c/strong\u003e than it would be without government defense spending.\u003c/p\u003e\n\u003cp\u003eBut there\u0026rsquo;s a difference between how much research gets done and what it gets pointed at. Lockheed\u0026rsquo;s R\u0026amp;D is now likely devoted almost entirely to classified hypersonics and directed-energy weapons. What it learns there does not flow back to commercial applications in any useful timeframe. The research volume expands; the scope narrows. 
Bell Labs devoted a substantial share of its personnel to government contracts at its Cold War peak; \u003ca href=\"https://cepr.org/voxeu/columns/how-antitrust-enforcement-can-spur-innovation-bell-labs-and-1956-consent-decree\"\u003ethe 1956 AT\u0026amp;T Consent Decree forced royalty-free patent licensing on the transistor\u003c/a\u003e, which accidentally accelerated the civilian semiconductor industry by giving Texas Instruments and Fairchild Semiconductor access to the core technology. AI labs operating under classification will not be forced to open-license anything. That mechanism does not exist for software under ITAR.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eI\u0026rsquo;m more confident in the direction of this analysis than in the timeline. The Anthropic supply-chain-risk designation may not survive legal challenge. The $13.4 billion FY2026 AI budget might not survive unchanged. Amodei might find a compromise that others in the industry treat as a ceiling rather than a floor. What I don\u0026rsquo;t think reverses is the structural pull. The defense budget is the largest single purchaser of advanced technology on earth, it\u0026rsquo;s growing, it operates on multi-year contract cycles that reward incumbents, and it is willing to use blunt regulatory tools against companies that don\u0026rsquo;t cooperate, as Anthropic learned in about six hours on February 27.\u003c/p\u003e\n\u003cp\u003eThe Last Supper logic applies here too: the government will not block consolidation, and it will not save the AI defense contractors that don\u0026rsquo;t participate. It will just find a different partner who will.\u003c/p\u003e\n","summary":"The Anthropic-Pentagon standoff isn't an ethics story. 
It's a replay of the 1993 Last Supper that consolidated 51 defense primes into 5, at Silicon Valley speed.","image":"https://static.philippdubach.com/ograph/ograph-ai-labs-defense-contractors1.jpg","date_published":"2026-03-01T00:00:00Z","date_modified":"2026-03-15T11:43:29+01:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI","Economics"],"_philippdubach":{"type":"Analysis","word_count":1602,"reading_time_minutes":8,"keywords":["AI defense contractors","Anthropic Pentagon ban","OpenAI Pentagon deal","Pentagon AI budget 2026","defense industry consolidation AI","AI military industrial complex","Palantir government revenue","IDIQ defense contracts","AI military contracts","security clearances AI companies","defense tech AI","DoD AI budget","AI weapons policy","defense revenue dependency","Last Supper defense industry","military AI governance","Pentagon AI strategy","government AI contracts","defense tech valuation","Silicon Valley defense contractors"],"section":"posts"}},{"id":"https://philippdubach.com/posts/people-live-in-levels-not-rates/","url":"https://philippdubach.com/posts/people-live-in-levels-not-rates/","title":"People Live in Levels, Not Rates","content_html":"\u003cblockquote\u003e\n\u003cp\u003eEconomics doesn\u0026rsquo;t take into account what\u0026rsquo;s best for society. The goal of economics in a capitalist system is to make the most amount of money for your shareholders.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThat\u0026rsquo;s Jon Stewart, \u003ca href=\"https://podcasts.apple.com/us/podcast/the-irrational-economy-with-richard-thaler/id1583132133?i=1000747991551\"\u003etelling a Nobel laureate\u003c/a\u003e what his own field is about. On February 4, Stewart hosted Richard Thaler on \u0026ldquo;The Weekly Show\u0026rdquo; to discuss behavioral economics. 
Thaler, the Chicago Booth professor who won the 2017 Nobel for \u003ca href=\"https://news.uchicago.edu/story/richard-thaler-wins-nobel-prize-his-contributions-behavioural-economics\"\u003ehis work on how real humans deviate from rational-agent models\u003c/a\u003e, spent 92 minutes patiently explaining things Stewart had already decided weren\u0026rsquo;t true. \u003ca href=\"https://x.com/jasonfurman/status/2021395695081750874\"\u003eJason Furman\u003c/a\u003e, Harvard professor and former Obama CEA chair, called it \u0026ldquo;the single worst interview I\u0026rsquo;ve ever done\u0026rdquo; (referencing his own 2024 Stewart appearance). That tweet hit 754,000 views. \u003ca href=\"https://www.theargumentmag.com/p/jon-stewart-has-become-his-own-worst\"\u003eJerusalem Demsas\u003c/a\u003e wrote the sharpest rebuttal, arguing Stewart \u0026ldquo;has no idea what economics actually is.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThe pile-on was deserved in its specifics and wrong in its framing. Stewart got basic things wrong. He also pointed at something real that the profession keeps failing to address.\u003c/p\u003e\n\u003ch2 id=\"the-carbon-tax-moment\"\u003eThe carbon tax moment\u003c/h2\u003e\n\u003cp\u003eThe episode\u0026rsquo;s most telling exchange involved climate policy. Thaler offered the textbook answer: a carbon tax. \u003ca href=\"https://singjupost.com/the-irrational-economy-w-nobel-laureate-richard-thaler-transcript/\"\u003eEvery economist agrees on this\u003c/a\u003e, Thaler said, and he\u0026rsquo;s roughly right. Stewart rejected it on political grounds: the moment energy prices rise, voters punish the party in power. Fair enough. But then Stewart \u003ca href=\"https://podcasts.happyscribe.com/the-weekly-show-with-jon-stewart/the-irrational-economy-with-richard-thaler\"\u003eproposed his own solution\u003c/a\u003e: \u0026ldquo;create a model that creates robust markets in damage mitigation and carbon mitigation.\u0026rdquo; Thaler paused. 
That is a carbon tax. Stewart had arrived at the standard economic answer while believing he was overturning it.\u003c/p\u003e\n\u003cp\u003eThis moment captures the entire problem. Stewart\u0026rsquo;s instinct, that political feasibility should constrain policy design, is not a \u0026ldquo;bizarre non sequitur\u0026rdquo; as some economists claimed. It\u0026rsquo;s the reason we don\u0026rsquo;t have a carbon tax. But his conviction that economics as a discipline has nothing to say about society\u0026rsquo;s wellbeing is wrong in a way that matters. Thaler\u0026rsquo;s own career is a direct counterexample: behavioral nudges have \u003ca href=\"https://www.bi.team/about-us/who-we-are/\"\u003eenrolled 10 million UK workers\u003c/a\u003e into pension savings, and Thaler \u003ca href=\"https://singjupost.com/the-irrational-economy-w-nobel-laureate-richard-thaler-transcript/\"\u003etold Stewart\u003c/a\u003e that renaming an ACA plan tier from \u0026ldquo;catastrophic\u0026rdquo; to \u0026ldquo;economy\u0026rdquo; cut the uninsured rate by \u003cstrong\u003e10%\u003c/strong\u003e. But I think the economists who piled on Stewart missed the more interesting question he was circling: if the economy is working well by standard measures, why does it feel broken to so many people? The answer is what I\u0026rsquo;d call the levels-vs-rates problem, and it explains both the vibecession and the trust gap between economists and the public they claim to serve.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"levels-vs-rates-disconnect\"\u003eLevels-vs-rates disconnect\u003c/h2\u003e\n\u003cp\u003eThe headline data is strong. \u003ca href=\"https://www.bea.gov/news/2026/gross-domestic-product-3rd-quarter-2025-updated-estimate-gdp-industry-and-corporate\"\u003eGDP grew 4.4% annualized in Q3 2025\u003c/a\u003e. \u003ca href=\"https://tradingeconomics.com/united-states/unemployment-rate\"\u003eUnemployment sits at 4.3%\u003c/a\u003e. 
\u003ca href=\"https://www.cnbc.com/2026/02/13/heres-the-inflation-breakdown-for-january-2026-in-one-chart.html\"\u003eInflation has fallen to 2.4%\u003c/a\u003e, with core CPI at 2.5%, its lowest since April 2021. Real wages have outpaced inflation every month since June 2023. The S\u0026amp;P 500 posted \u003ca href=\"https://www.fool.com/investing/2026/01/22/the-sp-500-just-did-something-weve-never-seen-befo/\"\u003ethree consecutive years of double-digit gains\u003c/a\u003e, returning 86% cumulatively. An economist looking at these numbers would say the economy is performing well. Thaler said essentially that.\u003c/p\u003e\n\u003cp\u003eBut rates and levels are different things, and people live in levels. The cumulative CPI increase since early 2020 is roughly \u003cstrong\u003e25%\u003c/strong\u003e. \u003ca href=\"https://www.traceone.com/resources/plm-compliance-blog/grocery-store-items-that-have-increased-most-in-price\"\u003eFood-at-home prices are up 29.4% since March 2020\u003c/a\u003e. The $150 grocery bill became $194 and will never go back to $150. \u003ca href=\"https://www.cotality.com/insights/articles/2025-housing-market-moderation-and-rebalancing\"\u003eHousing affordability is at its lowest point since the 1980s\u003c/a\u003e, with home prices up 30-45% from pre-pandemic levels and \u003ca href=\"https://www.freddiemac.com/pmms\"\u003emortgage rates near 6%\u003c/a\u003e, more than double the 2.65% pandemic low. \u003ca href=\"https://time.com/7327333/health-insurance-costs-increasing-2026/\"\u003eACA marketplace premiums rose 21.7-26% for 2026\u003c/a\u003e, the largest increase in nearly a decade. The rate of change has normalized. 
The level shift is permanent.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-macro-vs-sentiment-disconnect-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/macro-vs-sentiment-disconnect.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/macro-vs-sentiment-disconnect.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/macro-vs-sentiment-disconnect.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/macro-vs-sentiment-disconnect.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/macro-vs-sentiment-disconnect.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/macro-vs-sentiment-disconnect.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/macro-vs-sentiment-disconnect.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/macro-vs-sentiment-disconnect.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/macro-vs-sentiment-disconnect.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/macro-vs-sentiment-disconnect.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/macro-vs-sentiment-disconnect.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/macro-vs-sentiment-disconnect.png\"\n           alt=\"Exhibit showing the disconnect between macro indicators and lived experience, with the left column showing GDP growth of 4.4 percent, unemployment of 4.3 percent, inflation of 2.4 percent, S\u0026amp;P 500 up 86 percent cumulative, and 17 months of real wage gains, versus the right column showing University of Michigan consumer sentiment at 57.3 in the 3rd percentile, grocery prices up 29.4 percent, mortgage rates at 6.09 percent up from 2.65 percent, ACA premiums up 26 percent, and 8 percent of households with zero or negative net worth\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-macro-vs-sentiment-disconnect-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/macro-vs-sentiment-disconnect.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing the disconnect between macro indicators and lived experience, with the left column showing GDP growth of 4.4 percent, unemployment of 4.3 percent, inflation of 2.4 percent, S\u0026amp;P 500 up 86 percent cumulative, and 17 months of real wage gains, versus the right column showing University of 
Michigan consumer sentiment at 57.3 in the 3rd percentile, grocery prices up 29.4 percent, mortgage rates at 6.09 percent up from 2.65 percent, ACA premiums up 26 percent, and 8 percent of households with zero or negative net worth\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-levels-vs-rates1-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/levels-vs-rates1.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/levels-vs-rates1.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/levels-vs-rates1.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/levels-vs-rates1.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/levels-vs-rates1.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/levels-vs-rates1.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/levels-vs-rates1.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/levels-vs-rates1.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/levels-vs-rates1.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/levels-vs-rates1.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/levels-vs-rates1.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/levels-vs-rates1.png\"\n           alt=\"Exhibit showing cumulative price increases since March 2020 by essential spending category with housing up roughly 38 percent, food at home up 29.4 percent, health insurance premiums up 26 percent, and aggregate CPI up roughly 25 percent, contrasted with the current inflation rate of 2.4 percent, illustrating that the rate has normalized but the level shift is permanent\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-levels-vs-rates1-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/levels-vs-rates1.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing cumulative price increases since March 2020 by essential spending category with housing up roughly 38 percent, food at home up 29.4 percent, health insurance premiums up 26 percent, and aggregate CPI up roughly 25 percent, contrasted with the current inflation rate of 2.4 percent, illustrating that the rate has normalized but the level shift is permanent\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThis is what Stewart was gesturing at. 
He expressed it as \u0026ldquo;economics doesn\u0026rsquo;t care about people,\u0026rdquo; which is wrong. What he actually meant: the metrics economists use to declare success (rates of change, aggregate growth, unemployment) don\u0026rsquo;t capture what households experience at the grocery store, the mortgage broker, or the insurance renewal. The economic perception gap isn\u0026rsquo;t irrational. It\u0026rsquo;s a measurement problem.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"k-shaped-wages-the-recovery-that-reversed\"\u003eK-shaped wages: the recovery that reversed\u003c/h2\u003e\n\u003cp\u003eThe distributional story makes both sides\u0026rsquo; aggregate claims misleading. \u003ca href=\"https://www.epi.org/blog/low-wage-workers-faced-worsening-affordability-in-2025/\"\u003eThe Economic Policy Institute reported\u003c/a\u003e in February 2026 that low-wage workers\u0026rsquo; real wages declined in 2025, ending five years of historically fast gains. High earners held steady at 4.5% wage growth. \u003ca href=\"https://fortune.com/2025/11/10/k-shaped-economy-wage-growth-wealthiest-poorest-americans-diverge/\"\u003eWage growth in the lowest-income quartile fell from 7.5% to roughly 3.5%\u003c/a\u003e. The pandemic-era wage compression that closed up to one-third of the post-1979 wage gap, documented by Autor, Dube, and McGrew in their 2023 paper, has reversed. \u003ca href=\"https://www.cnbc.com/2026/01/30/wealth-inequality-k-shaped-economy-united-states-consumer-spending-trump.html\"\u003eCNBC reported\u003c/a\u003e that the top 1% now hold roughly 32% of US net worth (about $52 trillion), while the bottom 50% hold 2.5%. 
Eight percent of American households have zero or negative net worth.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-k-shaped-wage-divergence2-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/k-shaped-wage-divergence2.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/k-shaped-wage-divergence2.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/k-shaped-wage-divergence2.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/k-shaped-wage-divergence2.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/k-shaped-wage-divergence2.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/k-shaped-wage-divergence2.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/k-shaped-wage-divergence2.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/k-shaped-wage-divergence2.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/k-shaped-wage-divergence2.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/k-shaped-wage-divergence2.png 
1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/k-shaped-wage-divergence2.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/k-shaped-wage-divergence2.png\"\n           alt=\"Exhibit showing K-shaped wage divergence from 2021 to 2025, with bottom quartile wage growth peaking at 7.5 percent in 2022 then collapsing to 3.5 percent by late 2025, while top quartile wage growth held steady at 4.5 percent throughout, illustrating the reversal of pandemic-era wage compression that had closed up to one-third of the post-1979 inequality gap\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-k-shaped-wage-divergence2-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/k-shaped-wage-divergence2.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing K-shaped wage divergence from 2021 to 2025, with bottom quartile wage growth peaking at 7.5 percent in 2022 then collapsing to 3.5 percent by late 2025, while top quartile wage growth held steady at 4.5 percent throughout, illustrating the reversal of pandemic-era wage compression that had closed up to one-third of the post-1979 inequality gap\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe \u003ca href=\"https://www.advisorperspectives.com/dshort/updates/2026/02/06/consumer-sentiments-marginal-gains-six-month-peak-still-feels-like-a-valley\"\u003eUniversity of Michigan consumer 
sentiment index\u003c/a\u003e sits at 57.3, in the 3rd percentile of its entire historical range. The \u003ca href=\"https://markets.financialcontent.com/stocks/article/marketminute-2026-2-11-the-great-sentiment-schism-us-consumer-confidence-hits-12-year-low-amid-radical-partisan-divide\"\u003eConference Board\u0026rsquo;s Consumer Confidence Index hit 84.5 in January\u003c/a\u003e, a 12-year low. Charles Schwab\u0026rsquo;s Kevin Gordon \u003ca href=\"https://finance.yahoo.com/video/economic-data-isnt-moving-sentiment-190033538.html\"\u003ecoined the term \u0026ldquo;vibepression\u0026rdquo;\u003c/a\u003e in December 2025 as sentiment hit new lows. Kyla Scanlon\u0026rsquo;s \u003ca href=\"https://www.mercatus.org/macro-musings/kyla-scanlon-vibecession-vibe-economy-and-path-growing-american-wealth\"\u003eoriginal \u0026ldquo;vibecession\u0026rdquo; concept from 2022\u003c/a\u003e never actually resolved; it just got a bleaker name.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;m not sure consumer sentiment surveys mean much anymore. A \u003ca href=\"https://markets.financialcontent.com/stocks/article/marketminute-2026-2-11-the-great-sentiment-schism-us-consumer-confidence-hits-12-year-low-amid-radical-partisan-divide\"\u003e50-point partisan gap\u003c/a\u003e between Republicans and Democrats renders the aggregate figure almost meaningless as an economic indicator. It tells you about political identity, not lived experience. But the affordability data underneath the sentiment numbers is not a polling artifact. \u003ca href=\"https://newsletter.mikekonczal.com/p/why-affordability-and-the-vibecession\"\u003eMike Konczal\u0026rsquo;s February 2026 analysis\u003c/a\u003e showed that budget shares devoted to essentials (food, shelter, transportation, and healthcare) have increased even as aggregate real incomes recovered. 
He called this the \u0026ldquo;essentials squeeze\u0026rdquo; and dismissed the standard economist response (\u0026ldquo;it\u0026rsquo;s just money illusion\u0026rdquo;) as inadequate. I think he\u0026rsquo;s right.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"economists-communication-problem\"\u003eEconomists\u0026rsquo; communication problem\u003c/h2\u003e\n\u003cp\u003eThe \u003ca href=\"https://www.nominalnews.com/p/jon-stewart-thaler-economics-debate\"\u003eNominal News\u003c/a\u003e author, a PhD economist, made the argument I find most persuasive: Stewart\u0026rsquo;s view of economics is wrong, but he holds it because the profession has failed to distinguish between what it actually does and the policy opinions individual economists express in op-eds and cable news appearances. When a prominent economist dismisses wealth taxes by citing implementation costs as if costs alone settle the question, they\u0026rsquo;re blending analysis with preference. Do that enough times, and people like Stewart conclude that the entire field is an exercise in defending the status quo.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.theargumentmag.com/p/jon-stewart-has-become-his-own-worst\"\u003eDemsas catalogued\u003c/a\u003e economics\u0026rsquo; accomplishments: Alvin Roth\u0026rsquo;s kidney exchange, RCTs delivering tutoring to 5 million Indian children, longer intervals between recessions. These are real. 
But the profession\u0026rsquo;s most visible public-facing moments (failing to predict 2008, designing a response perceived as rescuing banks over households, insisting the economy is strong while consumer sentiment sits near all-time lows) have eroded trust in ways that one good Substack post can\u0026rsquo;t repair.\u003c/p\u003e\n\u003cp\u003eStewart then \u003ca href=\"https://singjupost.com/the-wealth-of-wall-street-with-oren-cass-transcript/\"\u003ebrought on Oren Cass\u003c/a\u003e to discuss financialization, which \u003ca href=\"https://www.theargumentmag.com/p/jon-stewart-has-become-his-own-worst\"\u003eprompted Demsas to write\u003c/a\u003e: \u0026ldquo;Damn, I felt bad for a second but Stewart may be beyond help.\u0026rdquo; The irony is thick. The populist left (Stewart) and the populist right (Cass) are making the same structural complaint about economics from opposite directions. Stewart says economics serves capital. Cass says economics serves free-trade orthodoxy. Both are wrong about the discipline but right that its public-facing representatives have blurred the line between analysis and advocacy for decades.\u003c/p\u003e\n\u003cp\u003eHere\u0026rsquo;s where I land. Nudge theory \u003ca href=\"https://www.sciencenews.org/article/nudge-theory-behavioral-science-psychology-structural-change\"\u003eworks in specific, bounded domains\u003c/a\u003e: default enrollment, plan labeling, organ donation opt-outs. A \u003ca href=\"https://theconversation.com/nudge-theory-what-15-years-of-research-tells-us-about-its-promises-and-politics-210534\"\u003emeta-analysis by Maier et al.\u003c/a\u003e found that real-world nudges increase desired behavior by an average of \u003cstrong\u003e1.4 percentage points\u003c/strong\u003e after correcting for publication bias, versus 8.7 percentage points in lab settings. Useful but limited. Stewart\u0026rsquo;s \u0026ldquo;nudge vs. shove\u0026rdquo; framing is crude. 
Thaler\u0026rsquo;s point that mandates become dangerous when political control shifts (\u0026ldquo;Sometimes Trump is President\u0026rdquo;) is underrated. But neither addressed the actual hard question: what do you do about a permanent 25% price level shift in essentials that no nudge can reverse and no rate-of-change metric captures?\u003c/p\u003e\n\u003cp\u003eI don\u0026rsquo;t think anyone has a good answer to that. The economists who piled on Stewart for not understanding Pigouvian taxation weren\u0026rsquo;t wrong. They just weren\u0026rsquo;t answering the question their audience was asking.\u003c/p\u003e\n","summary":"Prices rose 25% since 2020 and won't come back. The levels-vs-rates problem explains the vibecession, the Stewart-Thaler debate, and why nobody trusts economists.","image":"https://static.philippdubach.com/ograph/ograph-stewart-thaler-debate.jpg","date_published":"2026-02-28T00:00:00Z","date_modified":"2026-05-04T13:48:47+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Macro"],"_philippdubach":{"type":"Commentary","word_count":1329,"reading_time_minutes":7,"keywords":["why economy feels bad","vibecession 2026","prices still high inflation down","Jon Stewart Richard Thaler debate 2026","K-shaped economy wages 2026","cumulative inflation since 2020","levels vs rates problem inflation","vibepression 2026","consumer sentiment disconnect","grocery prices since pandemic","housing affordability crisis 2026","economic perception gap","real wage growth by income level","nudge theory effectiveness meta-analysis","essentials squeeze affordability","economics communication failure","behavioral economics trust crisis","why economy feels bad when data is good"],"section":"posts"}},{"id":"https://philippdubach.com/posts/novo-was-europes-most-valuable-company/","url":"https://philippdubach.com/posts/novo-was-europes-most-valuable-company/","title":"Novo Was Europe's Most Valuable 
Company","content_html":"\u003cp\u003eNovo Nordisk was Europe\u0026rsquo;s most valuable company 20 months ago. Today its market capitalization falls behind ASML, LVMH, Hermès, L\u0026rsquo;Oréal, SAP, Prosus, Siemens, Inditex, Deutsche Telekom, and Santander.\u003c/p\u003e\n\u003cp\u003eThe stock has lost roughly \u003cstrong\u003e75%\u003c/strong\u003e since its June 2024 peak of $142.44, falling from a \u003cstrong\u003e$640 billion\u003c/strong\u003e market cap to under \u003cstrong\u003e$160 billion\u003c/strong\u003e. Shares dropped another 16% this morning after CagriSema, the follow-on obesity drug that was supposed to restore Novo\u0026rsquo;s competitive story, \u003ca href=\"https://www.globenewswire.com/news-release/2026/02/23/3242381/0/en/Novo-Nordisk-A-S-CagriSema-demonstrated-23-weight-loss-in-an-open-label-head-to-head-REDEFINE-4-trial-in-people-with-obesity-the-primary-endpoint-was-not-achieved.html\"\u003efailed its head-to-head trial\u003c/a\u003e against Eli Lilly\u0026rsquo;s Zepbound. The REDEFINE 4 results confirm what a former Novo advisor \u003ca href=\"https://www.alpha-sense.com/\"\u003etold AlphaSense\u003c/a\u003e back in December: CagriSema is \u0026ldquo;not particularly impressive.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eI like this stock over the long term. The GLP-1 market is real, the addressable population is enormous, and Novo still sells more semaglutide than anyone. But liking a stock and holding on to it no matter the outlook are not the same thing. Or as Warren Buffett would say:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eThe most important thing to do if you find yourself in a hole is to stop digging\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe problems are compounding. US pricing is resetting structurally lower through MFN and IRA. The pipeline just lost its best competitive argument. International patents are falling away faster than expected. Eli Lilly is pulling ahead on every axis. 
And Novo \u003ca href=\"https://www.cnbc.com/2026/02/03/novo-nordisk-2025-earnings-wegovy-ozempic.html\"\u003eguided for its \u003cstrong\u003efirst revenue decline in modern history\u003c/strong\u003e\u003c/a\u003e in 2026: adjusted sales down \u003cstrong\u003e5-13%\u003c/strong\u003e. A former senior district sales manager at Novo described that guidance as \u0026ldquo;very tepid,\u0026rdquo; and added that the severity of the market reaction suggests investors may be pricing in further downside beyond what management disclosed.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"stock-collapse\"\u003eStock collapse\u003c/h2\u003e\n\u003cp\u003eThe speed of the decline matters. In June 2024, NVO hit $142.44. Then, in sequence: a July 2025 guidance cut after Q2 results showed US pricing headwinds worse than expected (shares dropped roughly 22% in a session); a September 2025 announcement of \u003ca href=\"https://www.cnbc.com/2025/09/10/wegovy-maker-novo-nordisk-to-cut-around-9000-jobs.html\"\u003e9,000 job cuts\u003c/a\u003e and DKK 8 billion in restructuring charges under new CEO Maziar Mike Doustdar, which the market read not as efficiency but as an admission of trouble; February 4, 2026 full-year results \u003ca href=\"https://www.fiercepharma.com/pharma/novo-shares-plummet-sales-profit-warning-26\"\u003eguiding for adjusted sales growth of -5% to -13%\u003c/a\u003e (the stock \u003ca href=\"https://www.euronews.com/business/2026/02/04/novo-nordisk-stock-sinks-by-17-after-bleak-2026-forecast\"\u003ecratered 18% in Copenhagen\u003c/a\u003e); and today, REDEFINE 4.\u003c/p\u003e\n\u003cp\u003eThe 52-week range was $43.08 to $93.80 before today\u0026rsquo;s open. NVO is now trading around $40, a new low. 
The all-time high was less than two years ago.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-novo-peer-valuation-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/novo-peer-valuation.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/novo-peer-valuation.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/novo-peer-valuation.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/novo-peer-valuation.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-peer-valuation.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/novo-peer-valuation.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/novo-peer-valuation.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/novo-peer-valuation.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-peer-valuation.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/novo-peer-valuation.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/novo-peer-valuation.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-peer-valuation.png\"\n           alt=\"Exhibit showing Novo Nordisk trading cheaper than every large-cap pharma peer except Pfizer, with NVO at 13x forward PE and minus 48 percent one-year return versus Eli Lilly at 30x and plus 17 percent, AstraZeneca at 20x and plus 40 percent, Merck at 14x, and AbbVie at 17x, with NVO the only company guiding for negative FY2026 revenue growth\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-novo-peer-valuation-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/novo-peer-valuation.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing Novo Nordisk trading cheaper than every large-cap pharma peer except Pfizer, with NVO at 13x forward PE and minus 48 percent one-year return versus Eli Lilly at 30x and plus 17 percent, AstraZeneca at 20x and plus 40 percent, Merck at 14x, and AbbVie at 17x, with NVO the only company guiding for negative FY2026 revenue growth\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eNovo now trades at a lower forward multiple than Merck, and among large-cap peers only Pfizer, which is dealing with its own post-COVID structural decline, trades cheaper. 
Whether that valuation is justified is the real question.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-novo-valuation-compression-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/novo-valuation-compression.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/novo-valuation-compression.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/novo-valuation-compression.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/novo-valuation-compression.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-valuation-compression.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/novo-valuation-compression.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/novo-valuation-compression.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/novo-valuation-compression.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-valuation-compression.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/novo-valuation-compression.png 
1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/novo-valuation-compression.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-valuation-compression.png\"\n           alt=\"Exhibit showing Novo Nordisk forward PE compressing roughly 68 percent from 41.8x in FY2023 to 13.2x in FY2026E while Eli Lilly remains at approximately 30x, with event markers for the July 2025 guidance cut, September 2025 job cuts, and February 2026 revenue decline guidance\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-novo-valuation-compression-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/novo-valuation-compression.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing Novo Nordisk forward PE compressing roughly 68 percent from 41.8x in FY2023 to 13.2x in FY2026E while Eli Lilly remains at approximately 30x, with event markers for the July 2025 guidance cut, September 2025 job cuts, and February 2026 revenue decline guidance\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003ch2 id=\"2026-guidance\"\u003e2026 guidance\u003c/h2\u003e\n\u003cp\u003eIt is rare to see a company of Novo\u0026rsquo;s stature guide for a sales decline. This is not a biotech that lost a coin-flip Phase 3. 
This is the global leader in GLP-1s telling investors that revenue will shrink.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-novo-2026-guidance-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/novo-2026-guidance.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/novo-2026-guidance.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/novo-2026-guidance.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/novo-2026-guidance.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-2026-guidance.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/novo-2026-guidance.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/novo-2026-guidance.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/novo-2026-guidance.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-2026-guidance.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/novo-2026-guidance.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/novo-2026-guidance.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-2026-guidance.png\"\n           alt=\"Exhibit showing Novo Nordisk FY2026 guidance with adjusted sales growth of minus 5 to minus 13 percent CER, adjusted operating profit growth of minus 5 to minus 13 percent, reported DKK sales growth of minus 8 to minus 16 percent, and reported operating profit growth of minus 10 to minus 18 percent, with capex of DKK 55 billion and free cash flow of DKK 35-45 billion\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-novo-2026-guidance-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/novo-2026-guidance.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing Novo Nordisk FY2026 guidance with adjusted sales growth of minus 5 to minus 13 percent CER, adjusted operating profit growth of minus 5 to minus 13 percent, reported DKK sales growth of minus 8 to minus 16 percent, and reported operating profit growth of minus 10 to minus 18 percent, with capex of DKK 55 billion and free cash flow of DKK 35-45 billion\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-novo-revenue-growth-inflection-png-4\" aria-label=\"View full-size image\"\u003e\n    
\u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/novo-revenue-growth-inflection.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/novo-revenue-growth-inflection.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/novo-revenue-growth-inflection.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/novo-revenue-growth-inflection.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-revenue-growth-inflection.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/novo-revenue-growth-inflection.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/novo-revenue-growth-inflection.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/novo-revenue-growth-inflection.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-revenue-growth-inflection.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/novo-revenue-growth-inflection.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/novo-revenue-growth-inflection.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg 
src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-revenue-growth-inflection.png\"\n           alt=\"Exhibit showing Novo Nordisk year-over-year revenue growth from FY2016 to FY2026E, with growth accelerating from plus 10.9 percent in FY2021 to plus 31.3 percent in FY2023 before decelerating to plus 6.4 percent in FY2025 and turning negative at minus 7.4 percent in FY2026E, the first revenue decline in modern company history\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-novo-revenue-growth-inflection-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/novo-revenue-growth-inflection.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing Novo Nordisk year-over-year revenue growth from FY2016 to FY2026E, with growth accelerating from plus 10.9 percent in FY2021 to plus 31.3 percent in FY2023 before decelerating to plus 6.4 percent in FY2025 and turning negative at minus 7.4 percent in FY2026E, the first revenue decline in modern company history\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThree structural forces are driving the decline, each on a different timeline.\u003c/p\u003e\n\u003cp\u003eThe \u003ca href=\"https://www.whitehouse.gov/fact-sheets/2025/11/fact-sheet-president-donald-j-trump-announces-major-developments-in-bringing-most-favored-nation-pricing-to-american-patients/\"\u003eNovember 2025 MFN deal\u003c/a\u003e with the Trump Administration cut Wegovy\u0026rsquo;s government price to 
\u003cstrong\u003e$349/month\u003c/strong\u003e and set Medicare/Medicaid rates at roughly \u003cstrong\u003e$245/month\u003c/strong\u003e, a \u003ca href=\"https://www.cnbc.com/2025/11/06/trump-eli-lilly-novo-nordisk-deal-obesity-drug-prices.html\"\u003e60-80% reduction\u003c/a\u003e from prior list prices. Insulin was capped at $35/month. Lilly took a similar deal (Zepbound at $346/month), so neither company gained a competitive advantage, but both permanently lost pricing power in the government channel. The commercial channel is following: payers who previously paid $800-1,000 per month for Wegovy are now pointing at the government rate and demanding comparable terms.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eInternationally, the patent picture is worse than most investors realize. Semaglutide lost its Canadian patent protection after Novo failed to pay a \u003ca href=\"https://fortune.com/2025/06/17/novo-nordisk-ozempic-wegovy-semaglutide-canada-patent-protection-fee/\"\u003emaintenance fee\u003c/a\u003e of roughly CAD 250, and the path for generics opened in January 2026 (\u003cem\u003eon a self-reflective note, maybe this story alone should have made me leave\u003c/em\u003e). Sandoz and Apotex are preparing generic launches. \u003ca href=\"https://www.theglobeandmail.com/business/economy/article-generic-ozempic-canada-drugmakers/\"\u003eDr. Reddy\u0026rsquo;s has filed in 87 countries\u003c/a\u003e. In China, at least 15 manufacturers have semaglutide generics in development. Brazil\u0026rsquo;s federal court denied a patent extension. 
The US patent thicket (320 applications, 154 granted, settlements pushing generics to roughly 2031-32) provides breathing room domestically, but international operations generated DKK 112 billion in 2025 revenue, and the erosion has started.\u003c/p\u003e\n\u003cp\u003eMeanwhile, \u003ca href=\"https://stateline.org/2025/11/28/states-retreat-from-covering-drugs-for-weight-loss/\"\u003eseveral states have dropped Medicaid coverage\u003c/a\u003e for GLP-1 obesity drugs since late 2025: California, Pennsylvania, New Hampshire, South Carolina. \u003ca href=\"https://www.kff.org/medicaid/medicaid-coverage-of-and-spending-on-glp-1s/\"\u003eOnly 13 states still cover them\u003c/a\u003e. The IRA\u0026rsquo;s \u003ca href=\"https://www.fiercepharma.com/pharma/medicare-unveils-price-reductions-15-drugs-including-novos-semaglutide\"\u003eRound 2 negotiations\u003c/a\u003e, effective January 2027, set Ozempic at \u003cstrong\u003e$274/month\u003c/strong\u003e (71% below list) and Wegovy at \u003cstrong\u003e$385/month\u003c/strong\u003e. 
With 2.3 million Medicare semaglutide users, that is a massive revenue compression event arriving in twelve months.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-novo-margin-erosion-png-6\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/novo-margin-erosion.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/novo-margin-erosion.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/novo-margin-erosion.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/novo-margin-erosion.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-margin-erosion.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/novo-margin-erosion.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/novo-margin-erosion.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/novo-margin-erosion.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-margin-erosion.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/novo-margin-erosion.png 1600w,\n              
        https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/novo-margin-erosion.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-margin-erosion.png\"\n           alt=\"Exhibit showing Novo Nordisk income statement from FY2021 to FY2026E with gross margin falling 370 basis points from 84.7 percent in FY2024 to 81.0 percent in FY2025, R\u0026amp;D spending rising from 12.6 to 16.8 percent of revenue, and EBITDA margin compressing from 50.8 to 48.4 percent, with FY2026E projecting further deterioration across all metrics\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-novo-margin-erosion-png-6\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/novo-margin-erosion.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing Novo Nordisk income statement from FY2021 to FY2026E with gross margin falling 370 basis points from 84.7 percent in FY2024 to 81.0 percent in FY2025, R\u0026amp;D spending rising from 12.6 to 16.8 percent of revenue, and EBITDA margin compressing from 50.8 to 48.4 percent, with FY2026E projecting further deterioration across all metrics\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eA former director at Novo anticipates strong GLP-1 growth for at least five to seven years but warns that pricing pressures from biosimilars and generics will force significant price cuts in that period. 
Long-term share, in this person\u0026rsquo;s view, depends on real-world efficacy and the ability to secure additional indications, not on the brand franchise alone.\u003c/p\u003e\n\u003ch2 id=\"cagrisema-a-pipeline-crisis\"\u003eCagriSema: a pipeline crisis\u003c/h2\u003e\n\u003cp\u003eI want to push back on the framing already circulating in some analyst notes, which is that REDEFINE 4 is \u0026ldquo;disappointing but manageable.\u0026rdquo; It is not manageable. This was the trial that was supposed to prove Novo could compete with Lilly on superior efficacy. It proved the opposite.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-novo-efficacy-comparison-png-7\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/novo-efficacy-comparison.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/novo-efficacy-comparison.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/novo-efficacy-comparison.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/novo-efficacy-comparison.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-efficacy-comparison.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/novo-efficacy-comparison.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/novo-efficacy-comparison.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/novo-efficacy-comparison.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-efficacy-comparison.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/novo-efficacy-comparison.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/novo-efficacy-comparison.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-efficacy-comparison.png\"\n           alt=\"Exhibit comparing weight loss efficacy across injectable and oral obesity drugs, showing Eli Lilly\u0026#39;s retatrutide at 28.7 percent, Zepbound at 25.5 percent, Novo\u0026#39;s CagriSema at 23.0 percent, injectable Wegovy at approximately 15 percent, Lilly\u0026#39;s orforglipron at approximately 14.7 percent, and the Wegovy pill at approximately 13.6 percent, with CagriSema trailing Zepbound by 2.5 percentage points and retatrutide by nearly 6 percentage points\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-novo-efficacy-comparison-png-7\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/novo-efficacy-comparison.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" 
class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit comparing weight loss efficacy across injectable and oral obesity drugs, showing Eli Lilly\u0026#39;s retatrutide at 28.7 percent, Zepbound at 25.5 percent, Novo\u0026#39;s CagriSema at 23.0 percent, injectable Wegovy at approximately 15 percent, Lilly\u0026#39;s orforglipron at approximately 14.7 percent, and the Wegovy pill at approximately 13.6 percent, with CagriSema trailing Zepbound by 2.5 percentage points and retatrutide by nearly 6 percentage points\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe 2.5-percentage-point gap on the on-treatment estimand is bad enough. The 3.4-point gap on intention-to-treat is worse, because it suggests CagriSema also has a tolerability or adherence problem relative to tirzepatide. \u003ca href=\"https://www.nejm.org/doi/full/10.1056/NEJMoa2502081\"\u003eOnly 57% of REDEFINE 1 patients\u003c/a\u003e reached the highest CagriSema dose, hinting at a tolerability ceiling.\u003c/p\u003e\n\u003cp\u003eA former senior director at Novo expressed disappointment with the REDEFINE trial designs, which allowed for patient down-titration, potentially diluting the efficacy signal. This person regards the asset as safe but questions its commercial strength against aggressive competition. A former Novo advisor was blunter: if Lilly\u0026rsquo;s retatrutide launches before CagriSema gains traction, it would be a \u0026ldquo;marketing car crash\u0026rdquo; for Novo, potentially relegating CagriSema to \u0026ldquo;second best\u0026rdquo; status.\u003c/p\u003e\n\u003cp\u003eNovo\u0026rsquo;s management pointed to the blinded REDEFINE 11 trial (flexible dosing) and a planned higher-dose CagriSema study as paths to demonstrating \u0026ldquo;full weight-loss potential.\u0026rdquo; Maybe. 
But REDEFINE 11 results won\u0026rsquo;t arrive until the \u003cstrong\u003efirst half of 2027\u003c/strong\u003e, and by then Lilly will likely have retatrutide data showing roughly 29% weight loss, plus an approved orforglipron pill without the fasting restrictions.\u003c/p\u003e\n\u003cp\u003eCagriSema will still probably get FDA approval in late 2026, based on the REDEFINE 1 and 2 placebo data. But launching a drug with clinical proof of inferiority to the market leader is a very different commercial proposition from launching one with a credible superiority story. Pricing, formulary positioning, and physician adoption all get harder. A former director at Eli Lilly told AlphaSense that Lilly\u0026rsquo;s retatrutide appears superior to both Zepbound and CagriSema based on available data, and that CagriSema lacks a compelling differentiation story, particularly on muscle preservation. The obesity market, this person believes, will double or triple over the next decade, but price reductions will be the primary driver of that expansion.\u003c/p\u003e\n\u003ch2 id=\"eli-lilly-is-pulling-ahead\"\u003eEli Lilly is pulling ahead\u003c/h2\u003e\n\u003cp\u003eThis is the part I think the Novo bull case underweights. Lilly is pulling ahead on efficacy, pipeline breadth, oral convenience, manufacturing capacity, and patent duration, all at once.\u003c/p\u003e\n\u003cp\u003eBy the end of Q3 2025, Lilly held \u003ca href=\"https://www.cnbc.com/2026/02/04/eli-lilly-novo-nordisk-earnings-glp1-market.html\"\u003e63% of US branded anti-obesity prescription share and 57% of total US GLP-1 scripts\u003c/a\u003e. Zepbound\u0026rsquo;s Q4 US revenue was \u003cstrong\u003e$4.2 billion\u003c/strong\u003e (+122% YoY). 
Full-year 2025 tirzepatide revenue reached \u003ca href=\"https://investor.lilly.com/news-releases/news-release-details/lilly-reports-fourth-quarter-2025-financial-results-and-provides\"\u003e\u003cstrong\u003e$36.5 billion\u003c/strong\u003e\u003c/a\u003e, making it the world\u0026rsquo;s best-selling drug molecule. Lilly guided 2026 revenue to \u003cstrong\u003e$80-83 billion\u003c/strong\u003e, implying roughly 25% growth. Novo guided for a decline.\u003c/p\u003e\n\u003cp\u003eThree factors widen the gap over time: two pipeline assets and manufacturing scale.\u003c/p\u003e\n\u003cp\u003eOrforglipron, Lilly\u0026rsquo;s oral non-peptide GLP-1, has an FDA decision expected April-May 2026. No food restrictions, no fasting window. It \u003ca href=\"https://www.clinicaltrialsarena.com/news/lillys-orforglipron-trumps-oral-semaglutide-in-head-to-head-trial/\"\u003ebeat oral semaglutide head-to-head\u003c/a\u003e in the ACHIEVE-3 diabetes trial. \u003ca href=\"https://www.goldmansachs.com/pdfs/insights/pages/gs-research/weighing-the-glp1-market/report.pdf\"\u003eGoldman Sachs projects\u003c/a\u003e 60% oral GLP-1 market share by 2030. An obesity physician familiar with both compounds views the orforglipron launch as a turning point precisely because it lacks the \u0026ldquo;strict rules\u0026rdquo; associated with oral Wegovy: fasting, water restrictions, the administration burden that limits real-world compliance. If efficacy is comparable, this person argues, the lower-friction option wins.\u003c/p\u003e\n\u003cp\u003eRetatrutide, the triple agonist (GLP-1/GIP/glucagon), showed \u003ca href=\"https://investor.lilly.com/news-releases/news-release-details/lillys-triple-agonist-retatrutide-delivered-weight-loss-average\"\u003e\u003cstrong\u003e28.7%\u003c/strong\u003e weight loss at 68 weeks\u003c/a\u003e in TRIUMPH-4. That is more than 5 points above CagriSema\u0026rsquo;s best showing. NDA filing is projected for late 2026. 
\u003ca href=\"https://www.clinicaltrialsarena.com/news/lilly-retatrutide-data-phase-iii-trial/\"\u003eGlobalData forecasts\u003c/a\u003e $15.6 billion in 2031 sales.\u003c/p\u003e\n\u003cp\u003eManufacturing: Lilly has committed \u003ca href=\"https://www.cnbc.com/2025/09/23/eli-lilly-plans-6point5-billion-texas-manufacturing-plant-for-obesity-pill.html\"\u003e\u003cstrong\u003e$50 billion+\u003c/strong\u003e in investment\u003c/a\u003e since 2020, including a \u003ca href=\"https://investor.lilly.com/news-releases/news-release-details/lilly-plans-build-new-65-billion-facility-manufacture-active\"\u003e$6.5 billion Texas oral pill facility\u003c/a\u003e. Tirzepatide patents extend through the back half of the 2030s, giving Lilly 5-7 more years of US exclusivity than semaglutide.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.biospace.com/business/lillys-weight-loss-trio-could-top-100b-in-revenue-thanks-to-oral-option\"\u003eTruist estimates\u003c/a\u003e Lilly\u0026rsquo;s obesity/diabetes trio could reach $101 billion in combined peak sales worldwide, before retatrutide even enters the market.\u003c/p\u003e\n\u003ch2 id=\"the-wegovy-pill-one-bright-spot\"\u003eThe Wegovy pill: one bright spot\u003c/h2\u003e\n\u003cp\u003eCredit where it\u0026rsquo;s due. Oral Wegovy, approved December 22, 2025, and launched January 5, 2026, \u003ca href=\"https://www.nbcnews.com/health/health-news/170000-people-us-are-taking-wegovy-pill-novo-nordisk-says-rcna257395\"\u003ereached over 170,000 patients within four weeks\u003c/a\u003e. Weekly prescriptions hit roughly 50,000 by late January. 
\u003ca href=\"https://www.cnbc.com/2026/01/16/novo-nordisk-shares-wegovy-obesity-pill-launch.html\"\u003eTD Cowen noted\u003c/a\u003e it generated roughly 15x more prescriptions than injectable Wegovy at the same post-launch stage, and double Zepbound\u0026rsquo;s trajectory.\u003c/p\u003e\n\u003cp\u003eBut about 90% of those prescriptions are self-pay at $149/month, because formulary coverage for the new formulation is limited. That is great for patient access and terrible for revenue per patient compared with the injectable franchise. CEO Doustdar acknowledged the tension: the pill launch is strong, but \u0026ldquo;the price hit on the existing business trumps the great pill launch.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eClinically, oral semaglutide 25 mg delivers roughly 13.6% weight loss in all-comers, somewhat below injectable Wegovy (roughly 15%) and well below Zepbound (20%+). A former senior diabetes care specialist at Novo expressed skepticism about the oral format\u0026rsquo;s long-term success, noting that its demanding administration requirements compare poorly with Lilly\u0026rsquo;s track record of marketing easier-to-take products. A second former specialist at Novo offered the counterpoint: if Novo prices the Wegovy pill aggressively enough, it could capture share despite the convenience gap. The pricing lever is there. Whether management pulls it hard enough, fast enough, is the question.\u003c/p\u003e\n\u003cp\u003eWhen orforglipron arrives with comparable efficacy and no fasting requirement, the Wegovy pill\u0026rsquo;s competitive position narrows. The window is months, not years.\u003c/p\u003e\n\u003ch2 id=\"where-i-come-out\"\u003eWhere I come out\u003c/h2\u003e\n\u003cp\u003eI keep going back and forth on this one, and I think that ambivalence is the right response.\u003c/p\u003e\n\u003cp\u003eThe case for buying: Novo at 11x earnings is pricing in a catastrophe. 
The GLP-1 market is \u003ca href=\"https://www.jpmorgan.com/insights/global-research/current-events/obesity-drugs\"\u003eprojected to reach $100-150 billion by 2030\u003c/a\u003e. Novo still has the most prescribed semaglutide franchise on the planet. The Wegovy pill launch is legitimately strong. The balance sheet is healthy (debt/equity roughly 0.67x post-Catalent, free cash flow guided to DKK 35-45 billion). The dividend yield is approaching 4%. If the obesity treatment market is multi-winner rather than winner-take-all, Novo at these levels could compound nicely over 5+ years.\u003c/p\u003e\n\u003cp\u003eThe case for waiting: there is no positive catalyst before May at the earliest. Orforglipron approval could arrive any day and further pressure the oral franchise. Post-CagriSema analyst target revisions haven\u0026rsquo;t happened yet. European institutional selling may have further to run. \u003ca href=\"https://www.marketbeat.com/stocks/NYSE/NVO/short-interest/\"\u003eShort interest is under 1%\u003c/a\u003e of shares outstanding, which suggests the 75% decline has been driven overwhelmingly by longs selling rather than shorts pressing. The implication is that forced selling from funds that haven\u0026rsquo;t yet adjusted their positions could continue. 
And the fundamental problem remains: Lilly has proven clinical superiority in injectables, will likely have a better oral, and has a triple agonist coming that makes both companies\u0026rsquo; current drugs look modest.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-novo-institutional-positioning-png-10\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/novo-institutional-positioning.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/novo-institutional-positioning.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/novo-institutional-positioning.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/novo-institutional-positioning.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-institutional-positioning.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/novo-institutional-positioning.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/novo-institutional-positioning.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/novo-institutional-positioning.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-institutional-positioning.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/novo-institutional-positioning.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/novo-institutional-positioning.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/novo-institutional-positioning.png\"\n           alt=\"Exhibit showing Q4 2025 13F filing data for Novo Nordisk ADR holders, with long-only funds Capital International and Fidelity cutting positions by 36.7 percent and 28.8 percent respectively, while options desks and quant funds including Citadel in put options plus 47.4 percent, Goldman Sachs in shares plus 63.6 percent, Jane Street in put options plus 86.9 percent, and D.E. Shaw in shares plus 126.2 percent are building positions\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-novo-institutional-positioning-png-10\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/novo-institutional-positioning.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing Q4 2025 13F filing data for Novo Nordisk ADR holders, with long-only funds Capital International and Fidelity cutting positions by 36.7 percent and 28.8 percent respectively, while options desks and quant funds including Citadel in put options plus 
47.4 percent, Goldman Sachs in shares plus 63.6 percent, Jane Street in put options plus 86.9 percent, and D.E. Shaw in shares plus 126.2 percent are building positions\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eMy instinct is that the stock is closer to a bottom than a top, but that the bottom may not be in yet. Forced selling, analyst downgrades following today\u0026rsquo;s CagriSema miss, and the looming orforglipron approval create a window where further downside is plausible. Barclays acknowledged that some will call the 2026 guide a \u0026ldquo;kitchen sink\u0026rdquo; that management will beat, but pointed out that the same was said last year and proved wrong.\u003c/p\u003e\n\u003cp\u003eFor investors with a 3-5 year horizon who can tolerate further near-term downside, this is getting interesting. For anyone who needs to see improving fundamentals before committing capital, there is no rush. The problems I\u0026rsquo;ve laid out here are structural, and structural problems take quarters to fix, not days.\u003c/p\u003e\n\u003caside class=\"disclaimer\" role=\"note\" aria-label=\"Disclaimer\"\u003e\n  \u003cdiv class=\"disclaimer-content\"\u003eThis analysis is based on publicly available information as of February 23, 2026. It reflects the author\u0026rsquo;s personal interpretation and opinion, not investment, financial, or legal advice. I hold no position in NVO or LLY at the time of writing. All projections involve uncertainty and forward-looking statements may prove wrong. Key data sources: Novo Nordisk annual report FY2025, Novo Nordisk press releases, Eli Lilly Q4 2025 earnings, sell-side research (DNB Carnegie, Deutsche Bank, TD Cowen, Canaccord Genuity, CFRA, KeyBanc, Jefferies), FDA.gov, CMS.gov, company filings. Expert quotes are from third-party sources, not the author\u0026rsquo;s direct conversations.\u003c/div\u003e\n\u003c/aside\u003e\n\n","summary":"Novo Nordisk has lost 75% since June 2024. 
CagriSema failed vs Zepbound, US pricing is resetting lower, and Lilly leads on every axis. Full breakdown with numbers.","image":"https://static.philippdubach.com/ograph/ograph-whats-wrong-at-novo.jpg","date_published":"2026-02-23T00:00:00Z","date_modified":"2026-05-04T13:48:47+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Medicine"],"_philippdubach":{"type":"Analysis","word_count":2096,"reading_time_minutes":10,"keywords":["Novo Nordisk stock crash 2026","NVO stock analysis February 2026","CagriSema REDEFINE 4 trial results","CagriSema vs Zepbound head to head","Novo Nordisk 2026 guidance revenue decline","Eli Lilly vs Novo Nordisk obesity drugs","semaglutide patent expiry international","Wegovy pill launch 2026","orforglipron FDA approval 2026","retatrutide Phase 3 weight loss","GLP-1 market pricing pressure MFN","Novo Nordisk valuation forward PE","obesity drug market 2030 forecast","tirzepatide revenue 2025","Novo Nordisk CEO Doustdar restructuring"],"section":"posts"}},{"id":"https://philippdubach.com/posts/the-absolute-insider-mess-of-prediction-markets/","url":"https://philippdubach.com/posts/the-absolute-insider-mess-of-prediction-markets/","title":"The Absolute Insider Mess of Prediction Markets","content_html":"\u003cp\u003eSomeone at Google, or close enough to Google, \u003ca href=\"https://www.inc.com/ava-levinson/polymarket-million-dollar-google-win-raises-questions/91274626\"\u003edeposited $3 million into Polymarket\u003c/a\u003e on December 3, 2025, bet on 23 separate \u0026ldquo;Google Year in Search\u0026rdquo; outcomes, \u003ca href=\"https://gizmodo.com/polymarket-user-accused-of-1-million-insider-trade-on-google-search-markets-2000696258\"\u003egot 22 right\u003c/a\u003e, and walked away with \u003cstrong\u003e$1.15 million\u003c/strong\u003e in profit in under 24 hours. 
One of those bets: that \u003ca href=\"https://thedefiant.io/news/defi/polymarket-users-suspect-insider-trading-after-google-trend-markets-crown-surprise-winner\"\u003ed4vd would be the most-searched person of 2025\u003c/a\u003e, purchased at roughly 5 cents when the market gave it a 0.2% probability.\u003c/p\u003e\n\u003cp\u003eThe wallet, originally called AlphaRacoon, had previously made over $150,000 correctly \u003ca href=\"https://finance.yahoo.com/news/polymarket-user-makes-over-1-155738322.html\"\u003epredicting the exact launch window\u003c/a\u003e of Google\u0026rsquo;s Gemini 3.0 in November 2025. As blockchain engineer \u003ca href=\"https://x.com/JeongHaeju\"\u003eHaeju Jeong\u003c/a\u003e, who first flagged the account, put it: this is a Google insider milking Polymarket for quick money. The wallet later changed its username to 0xafEe, which might be the most half-hearted attempt at anonymity since an MIT researcher \u003ca href=\"https://www.cnbc.com/2017/07/13/mit-scientist-googled-insider-trading-then-got-arrested-for-insider-trading.html\"\u003eGoogled \u0026ldquo;how sec detect unusual trade\u0026rdquo;\u003c/a\u003e before trading on inside information.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;ve been following prediction markets for a while, mostly for the macro forecasting angle, but also because the regulatory ambiguity is fascinating and there are some market inefficiencies worth watching. But the last three months have produced a concentration of insider trading cases that made me want to work through the problem more carefully. The AlphaRacoon case is the most entertaining. 
The two that followed are more serious.\u003c/p\u003e\n\u003ch2 id=\"three-cases-three-months-zero-enforcement\"\u003eThree cases, three months, zero enforcement\u003c/h2\u003e\n\u003cp\u003eOn February 12, 2026, Israeli authorities \u003ca href=\"https://www.timesofisrael.com/two-indicted-for-using-classified-info-to-place-online-bets-on-military-operations/\"\u003eindicted two people\u003c/a\u003e for using classified military intelligence to bet on Polymarket during \u003ca href=\"https://www.npr.org/2026/02/12/nx-s1-5712801/polymarket-bets-traders-israel-military\"\u003eIsrael\u0026rsquo;s 12-day war with Iran\u003c/a\u003e in June 2025. A Polymarket account called \u003ca href=\"https://gizmodo.com/israel-accuses-two-polymarket-bettors-of-trading-on-classified-military-operations-2000721224\"\u003e\u0026ldquo;ricosuave666\u0026rdquo;\u003c/a\u003e placed seven bets on questions like \u0026ldquo;Will Israel attack Iran on Friday?\u0026rdquo; and got every one correct. The most profitable single wager: \u003ca href=\"https://coinpaper.com/14565/israel-indicts-two-over-polymarket-iran-bets\"\u003enearly $129,000\u003c/a\u003e that Israel would strike by a specified date. Total winnings: roughly \u003ca href=\"https://www.middleeasteye.net/news/israeli-soldier-indicted-allegedy-using-classified-intelligence-bet-attacks-mena\"\u003e$150,000-$152,000\u003c/a\u003e. The \u003ca href=\"https://www.nbcnews.com/world/israel/israel-charges-reservist-classified-information-bet-polymarket-rcna258709\"\u003eShin Bet, Israel Police, and Defense Ministry\u003c/a\u003e called it a real security risk to IDF operations. This is the \u003ca href=\"https://www.npr.org/2026/02/12/nx-s1-5712801/polymarket-bets-traders-israel-military\"\u003efirst criminal prosecution\u003c/a\u003e anywhere in the world tied to prediction market insider trading.\u003c/p\u003e\n\u003cp\u003eIn between, there was Venezuela. 
On the evening of January 2, 2026, an account called \u0026ldquo;Burdensome-Mix,\u0026rdquo; created less than a week earlier, placed over $20,000 in bets that Maduro would be removed from power by January 31. Less than an hour after the final bet, \u003ca href=\"https://www.cbsnews.com/news/polymarket-maduro-capture-bet-400000/\"\u003eTrump ordered the military strike\u003c/a\u003e. By 4:21 AM, Maduro was captured. The account\u0026rsquo;s $33,934 staked across 13 bets \u003ca href=\"https://fortune.com/2026/01/05/prediction-markets-insider-trading-problem/\"\u003ereturned \u003cstrong\u003e$436,759\u003c/strong\u003e\u003c/a\u003e. \u003ca href=\"https://www.ms.now/news/lucrative-bets-on-venezuela-trigger-insider-trading-scrutiny\"\u003eChainalysis found\u003c/a\u003e the trader cashed out through mainstream U.S. exchanges with no apparent effort to hide their identity. Even so, the trader has not been publicly identified.\u003c/p\u003e\n\u003cp\u003eEach case is more serious than the last. AlphaRacoon appears to have profited from corporate knowledge. Burdensome-Mix apparently had advance knowledge of U.S. foreign policy. The Israeli defendants are accused of monetizing classified operational intelligence during wartime. The surface area for insider trading on prediction markets is, to use the technical term, enormous. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-three-insider-cases-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/three-insider-cases.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/three-insider-cases.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/three-insider-cases.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/three-insider-cases.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/three-insider-cases.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/three-insider-cases.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/three-insider-cases.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/three-insider-cases.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/three-insider-cases.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/three-insider-cases.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/three-insider-cases.png 2000w\"\n         
     sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/three-insider-cases.png\"\n           alt=\"Exhibit showing three insider trading cases compared side by side: AlphaRacoon with $1.15M profit from 22 of 23 Google bets using corporate information with no enforcement, Burdensome-Mix with $436K profit from 13 of 13 Maduro bets using government information with no enforcement, and ricosuave666 with $152K profit from 7 of 7 Israel-Iran strike bets using classified military intelligence prosecuted by Israel only\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-three-insider-cases-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/three-insider-cases.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing three insider trading cases compared side by side: AlphaRacoon with $1.15M profit from 22 of 23 Google bets using corporate information with no enforcement, Burdensome-Mix with $436K profit from 13 of 13 Maduro bets using government information with no enforcement, and ricosuave666 with $152K profit from 7 of 7 Israel-Iran strike bets using classified military intelligence prosecuted by Israel only\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"now-theres-an-ai-that-hunts-them\"\u003eNow there\u0026rsquo;s an AI that hunts them\u003c/h2\u003e\n\u003cp\u003ePeter Liu, a former Google DeepMind research scientist now co-founding Twenty Labs, \u003ca 
href=\"https://x.com/peterjliu/status/2024901585806225723\"\u003epublished results\u003c/a\u003e from Compound AI\u0026rsquo;s Polymarket integration that systematically detects suspected insiders. The system built a custom database optimized for AI agent queries rather than relying on Polymarket\u0026rsquo;s rate-limited API. Liu described the agents as \u0026ldquo;super-human at making data science queries,\u0026rdquo; noting that each agent operates like 10 concurrent human analysts.\u003c/p\u003e\n\u003cp\u003eCompound AI independently rediscovered AlphaRacoon despite the username change. More interestingly, it found that AlphaRacoon has friends: a user called \u0026ldquo;yicici\u0026rdquo; who made money in the same Google markets, suggesting a coordinated network rather than a lone wolf. When pointed at OpenAI, the system found accounts \u0026ldquo;oddly good at predicting OpenAI launch dates for models and products,\u0026rdquo; with at least one that exclusively traded OpenAI events.\u003c/p\u003e\n\u003cp\u003eIt\u0026rsquo;s not just Compound AI. \u003ca href=\"https://gizmodo.com/tracking-insider-trading-on-polymarket-is-turning-into-a-business-of-its-own-2000709286\"\u003ePolysights\u003c/a\u003e, built by 29-year-old Canadian trader Tre Upshaw, has attracted \u003ca href=\"https://www.bloomberg.com/news/articles/2026-01-13/prediction-market-insider-trading-drawing-increased-scrutiny\"\u003e24,000 users\u003c/a\u003e and is closing a $2 million funding round after receiving a \u003ca href=\"https://gizmodo.com/tracking-insider-trading-on-polymarket-is-turning-into-a-business-of-its-own-2000709286\"\u003e$25,000 Polymarket grant\u003c/a\u003e. Roughly 85% of flagged trades turned out to be winners. 
Individual programmers have built \u003ca href=\"https://www.civolatility.com/p/polymarkets-insider-trading-problem\"\u003ecopytrading bots\u003c/a\u003e that follow suspected insiders, with one reportedly turning $5,700 into $80,000 by tailing signals during the Maduro event.\u003c/p\u003e\n\u003cp\u003eThe irony is rich. Blockchain\u0026rsquo;s radical transparency, the thing that was supposed to make financial markets honest, is simultaneously enabling insider detection and insider copytrading. The same data pipeline that lets Compound AI catch cheaters also lets copytraders amplify their profits.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"regulation\"\u003eRegulation\u003c/h2\u003e\n\u003cp\u003eOn regulated stock markets, insider trading law is well established: \u003ca href=\"https://en.wikipedia.org/wiki/SEC_Rule_10b-5\"\u003eSEC Rule 10b-5\u003c/a\u003e, decades of case law, a well-staffed enforcement division, and cooperation agreements with every broker-dealer in America. Everyone in the industry knows it\u0026rsquo;s illegal.\u003c/p\u003e\n\u003cp\u003eOn prediction markets, almost none of that infrastructure exists. SEC Rule 10b-5 doesn\u0026rsquo;t apply because \u003ca href=\"https://www.corporatecomplianceinsights.com/prediction-markets-sports-betting-insider-trading/\"\u003eprediction market contracts are swaps, not securities\u003c/a\u003e. That puts them under the \u003ca href=\"https://en.wikipedia.org/wiki/Commodity_Futures_Trading_Commission\"\u003eCFTC\u003c/a\u003e, which has historically focused on commodity manipulation (spoofing, cornering), not information-based trading. 
The CFTC has brought \u003ca href=\"https://www.dlnews.com/articles/regulation/prediction-markets-bend-insider-trading-rules-will-they-break/\"\u003eexactly zero enforcement actions\u003c/a\u003e for prediction market insider trading.\u003c/p\u003e\n\u003cp\u003eThe CFTC does have \u003ca href=\"https://www.corporatecomplianceinsights.com/prediction-markets-sports-betting-insider-trading/\"\u003eRule 180.1\u003c/a\u003e, modeled on 10b-5, which prohibits trading on material nonpublic information, but with a distinction that matters: it requires proof that the trader breached a \u0026ldquo;pre-existing duty.\u0026rdquo; In securities law, that duty is read broadly enough to cover most MNPI-based trading by corporate insiders. In commodities law, trading on proprietary information is the entire point: a farmer trading grain futures based on their own crop outlook is how the market is supposed to work. Former CFTC Commissioner \u003ca href=\"https://en.wikipedia.org/wiki/Kalshi\"\u003eCaroline Pham\u003c/a\u003e has argued that importing securities-law concepts into derivatives markets is analytically confused.\u003c/p\u003e\n\u003cp\u003eDaniel Barabander of Variant Fund \u003ca href=\"https://variant.fund/articles/thoughts-law-insider-trading-prediction-markets/\"\u003epublished an analysis\u003c/a\u003e on February 6 that crystallized the problem. Insider trading is fundamentally about breaching a promise: a Tesla employee trading on a \u0026ldquo;Will TSLA beat Q4 estimates?\u0026rdquo; prediction market violates their confidentiality obligations. But someone who overhears investment bankers discussing a deal at a restaurant generally commits no crime, because no promise exists to breach. Prediction markets, \u0026ldquo;by making almost anything tradable,\u0026rdquo; make inside information valuable in contexts where it is far less clear that any relevant promise exists.\u003c/p\u003e\n\u003cp\u003eThe strongest enforcement tool may be criminal wire fraud. 
At the Securities Enforcement Forum on February 5, SDNY U.S. Attorney Jay Clayton was \u003ca href=\"https://natlawreview.com/article/betting-future-enforcement-risks-prediction-markets\"\u003easked\u003c/a\u003e whether prediction market participants were beyond the reach of fraud statutes. His answer: \u0026ldquo;No.\u0026rdquo; Asked whether to expect enforcement actions: \u0026ldquo;Yes.\u0026rdquo; But \u003ca href=\"https://www.corporatecomplianceinsights.com/prediction-markets-sports-betting-insider-trading/\"\u003ePolymarket\u0026rsquo;s terms of service\u003c/a\u003e don\u0026rsquo;t specifically mention insider trading, which complicates the wire fraud theory.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://en.wikipedia.org/wiki/Matt_Levine_(journalist)\"\u003eMatt Levine\u003c/a\u003e, who has written about this topic at least three times between December 2025 and February 2026, \u003ca href=\"https://www.bloomberg.com/opinion/newsletters/2026-02-12/insider-trading-on-war\"\u003eputs it best\u003c/a\u003e. His core argument: insider trading is not about fairness. It\u0026rsquo;s about theft. The problem isn\u0026rsquo;t that you have information the market doesn\u0026rsquo;t. You\u0026rsquo;re supposed to try to get information the market doesn\u0026rsquo;t; that\u0026rsquo;s the entire point of financial markets. The problem is that you\u0026rsquo;re using information that belongs to someone else, your employer or client or country, without their permission. You\u0026rsquo;ve breached a duty.\u003c/p\u003e\n\u003cp\u003eThis framing matters because prediction market enthusiasts instinctively believe insider trading is good for their markets: it makes prices more accurate. Levine acknowledged this directly. But he also identified the fatal flaw: if prediction markets are full of insider traders, there\u0026rsquo;d be no one to trade against. 
He predicted that the first 20 people to get arrested for insider trading on Kalshi \u0026ldquo;will be very surprised.\u0026rdquo;\u003c/p\u003e\n\u003ch2 id=\"why-regulation-matters-the-lemons-problem\"\u003eWhy regulation matters: the lemons problem\u003c/h2\u003e\n\u003cp\u003eThe economic case for regulating insider trading on prediction markets goes beyond fairness or legality: it\u0026rsquo;s about whether these markets can survive.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://en.wikipedia.org/wiki/George_Akerlof\"\u003eGeorge Akerlof\u0026rsquo;s\u003c/a\u003e 1970 \u003ca href=\"https://en.wikipedia.org/wiki/The_Market_for_Lemons\"\u003e\u0026ldquo;Market for Lemons\u0026rdquo;\u003c/a\u003e paper described a dynamic where information asymmetry between buyers and sellers causes markets to collapse. When sellers know more than buyers about product quality, buyers reduce their willingness to pay. Honest sellers with good products leave the market because they can\u0026rsquo;t get fair prices. This raises the share of lemons among the remaining sellers, causing more buyers to withdraw. The process continues until only lemons remain.\u003c/p\u003e\n\u003cp\u003eApplied to prediction markets: if insiders consistently win, uninformed participants recognize they\u0026rsquo;re trading against counterparties with superior information and leave. Market makers widen spreads or exit entirely. Dartmouth economist \u003ca href=\"https://faculty.tuck.dartmouth.edu/eric-zitzewitz/\"\u003eEric Zitzewitz\u003c/a\u003e, who studies prediction markets, has stated this directly: prediction markets \u0026ldquo;require loads of uninformed investors to function,\u0026rdquo; because those investors supply the liquidity. If liquidity providers worry about \u003ca href=\"https://en.wikipedia.org/wiki/Adverse_selection\"\u003eadverse selection\u003c/a\u003e, they provide less liquidity, and any accuracy benefit from insider trading is more than offset by the participation loss. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-adverse-selection-spiral-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/adverse-selection-spiral.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/adverse-selection-spiral.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/adverse-selection-spiral.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/adverse-selection-spiral.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/adverse-selection-spiral.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/adverse-selection-spiral.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/adverse-selection-spiral.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/adverse-selection-spiral.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/adverse-selection-spiral.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/adverse-selection-spiral.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/adverse-selection-spiral.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/adverse-selection-spiral.png\"\n           alt=\"Exhibit showing Akerlof\u0026#39;s Market for Lemons dynamic applied to prediction markets as a five-step adverse selection spiral: Step 1 insiders profit at extreme win rates, Step 2 uninformed traders absorb systematic losses, Step 3 participants withdraw from the market, Step 4 liquidity collapses with wider spreads, Step 5 forecasting accuracy degrades as the cycle repeats\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-adverse-selection-spiral-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/adverse-selection-spiral.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit showing Akerlof\u0026#39;s Market for Lemons dynamic applied to prediction markets as a five-step adverse selection spiral: Step 1 insiders profit at extreme win rates, Step 2 uninformed traders absorb systematic losses, Step 3 participants withdraw from the market, Step 4 liquidity collapses with wider spreads, Step 5 forecasting accuracy degrades as the cycle repeats\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eWall Street firms are entering prediction markets at speed: \u003ca href=\"https://www.drw.com/work-at-drw/listings/prediction-markets-trader-3332253\"\u003eDRW\u003c/a\u003e is building a 
dedicated desk, hiring traders at $175,000–$200,000 base salary, \u003ca href=\"https://news.kalshi.com/p/liquid-prediction-markets-are-finally-here\"\u003eSusquehanna became Kalshi\u0026rsquo;s first official market maker\u003c/a\u003e, \u003ca href=\"https://www.bloomberg.com/news/articles/2026-02-09/jump-trading-poised-to-gain-stakes-in-kalshi-and-polymarket\"\u003eJump Trading\u003c/a\u003e is taking equity stakes in both platforms, and \u003ca href=\"https://www.cnbc.com/2026/01/15/goldman-sachs-ceo-looks-at-how-to-get-involved-in-prediction-markets.html\"\u003eGoldman Sachs CEO David Solomon\u003c/a\u003e has met with the leadership of both Kalshi and Polymarket. These firms are there to \u003ca href=\"https://www.financemagnates.com/fintech/wall-street-quants-move-into-prediction-markets-to-hunt-for-arbitrage-not-to-bet/\"\u003emake markets, not to bet on whether Israel will strike Iran\u003c/a\u003e. Market makers who systematically take the other side of trades bleed money when their counterparties have inside information. If the institutional players conclude the game is rigged, the resulting liquidity withdrawal would hollow out the market.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eCombined Polymarket and Kalshi weekly volume \u003ca href=\"https://europeanbusinessmagazine.com/business/prediction-markets-are-now-a-6b-a-week-industry-heres-whos-winning/\"\u003eexceeded $6 billion\u003c/a\u003e by early 2026. Full-year 2025 volume across all platforms \u003ca href=\"https://www.gamblinginsider.com/in-depth/110180/prediction-market-statistics\"\u003ereached approximately \u003cstrong\u003e$44 billion\u003c/strong\u003e\u003c/a\u003e, a roughly 300x increase from early 2024. \u003ca href=\"https://www.npr.org/2026/01/17/nx-s1-5672615/kalshi-polymarket-prediction-market-boom-traders-slang-glossary\"\u003eBloomberg terminals now carry prediction market data\u003c/a\u003e. 
\u003ca href=\"https://www.npr.org/2026/01/17/nx-s1-5672615/kalshi-polymarket-prediction-market-boom-traders-slang-glossary\"\u003eCNN\u003c/a\u003e struck a deal to integrate Kalshi markets into its coverage.\u003c/p\u003e\n\u003ch2 id=\"prediction-markets-as-macroeconomic-forecasting-tools\"\u003ePrediction markets as macroeconomic forecasting tools\u003c/h2\u003e\n\u003cp\u003eOn February 12, 2026, the same day Israeli authorities announced the first-ever prediction market insider trading prosecution, Federal Reserve Board economist \u003ca href=\"https://www.federalreserve.gov/econres/anthony-m-diercks.htm\"\u003eAnthony Diercks\u003c/a\u003e, along with Jared Dean Katz (Northwestern) and Jonathan Wright (\u003ca href=\"https://www.nber.org/papers/w34702\"\u003eJohns Hopkins/NBER\u003c/a\u003e), \u003ca href=\"https://www.federalreserve.gov/econres/feds/kalshi-and-the-rise-of-macro-markets.htm\"\u003epublished\u003c/a\u003e \u0026ldquo;Kalshi and the Rise of Macro Markets\u0026rdquo; through the \u003ca href=\"https://www.federalreserve.gov/econres/feds/index.htm\"\u003eFed\u0026rsquo;s Finance and Economics Discussion Series\u003c/a\u003e. It\u0026rsquo;s the \u003ca href=\"https://natlawreview.com/article/federal-reserve-researchers-find-prediction-markets-deliver-forecasting-value\"\u003emost thorough empirical study yet\u003c/a\u003e on whether prediction markets work as macroeconomic forecasting tools.\u003c/p\u003e\n\u003cp\u003eThe headline finding: Kalshi\u0026rsquo;s macro markets perform as well as, and in some cases better than, traditional forecasting instruments. 
For \u003ca href=\"https://en.wikipedia.org/wiki/Federal_funds_rate\"\u003efederal funds rate\u003c/a\u003e decisions, Kalshi\u0026rsquo;s median and modal forecasts, taken the day before each FOMC meeting since 2022, \u003ca href=\"https://defirate.com/news/federal-reserve-study-finds-kalshi-markets-rival-traditional-economic-forecast-tools/\"\u003ematched the actual policy outcome\u003c/a\u003e every time. That\u0026rsquo;s a perfect record. The mean absolute error of its rate forecasts 150 days out was comparable to that of the \u003ca href=\"https://www.newyorkfed.org/markets/survey-market-participants\"\u003eNew York Fed\u0026rsquo;s Survey of Market Expectations\u003c/a\u003e, a survey of professional market participants. For headline CPI, Kalshi forecasts \u003ca href=\"https://www.cryptonewsz.com/federal-reserve-study-kalshi-macro-forecast/\"\u003estatistically outperformed the Bloomberg consensus\u003c/a\u003e in certain windows.\u003c/p\u003e\n\u003cp\u003eThe paper identifies a specific structural advantage. Extracting probabilities from \u003ca href=\"https://en.wikipedia.org/wiki/Federal_funds_rate#Federal_funds_futures\"\u003eFed funds futures\u003c/a\u003e typically requires a binary assumption: only two possible outcomes per meeting. Kalshi\u0026rsquo;s contract structure assigns nonzero probability to seven or more distinct rate outcomes simultaneously. After speeches by Fed Governors \u003ca href=\"https://en.wikipedia.org/wiki/Christopher_Waller\"\u003eWaller\u003c/a\u003e and Bowman, Kalshi markets lifted the implied probability of a July 2025 rate cut to around 25% within hours. That probability dropped after the June employment report beat forecasts. This is what the authors call \u0026ldquo;rich intraday dynamics\u0026rdquo;: the market updates continuously as information arrives, unlike surveys that provide snapshots every six weeks.\u003c/p\u003e\n\u003cp\u003eThe Fed paper is preliminary research, not official policy. 
But the central bank\u0026rsquo;s own economists are treating prediction markets as credible information infrastructure. The authors \u003ca href=\"https://natlawreview.com/article/federal-reserve-researchers-find-prediction-markets-deliver-forecasting-value\"\u003eintend to make the underlying data publicly available\u003c/a\u003e, which would further normalize prediction market data as a standard input to policy analysis.\u003c/p\u003e\n\u003cp\u003eIf prediction markets are valuable enough that the Federal Reserve is studying them as forecasting tools for \u003ca href=\"https://en.wikipedia.org/wiki/Monetary_policy_of_the_United_States\"\u003emonetary policy\u003c/a\u003e, the insider trading problem becomes a question of whether a tool the central bank wants to rely on can maintain the informational integrity that makes it useful. Insiders trading on classified military intelligence don\u0026rsquo;t make the Fed\u0026rsquo;s rate probability distributions more accurate. They make them less trustworthy.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-regulatory-picture-fractured\"\u003eThe regulatory picture, fractured\u003c/h2\u003e\n\u003cp\u003eThere are two regulatory tracks, and they aren\u0026rsquo;t converging.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://en.wikipedia.org/wiki/Kalshi\"\u003eKalshi\u003c/a\u003e is \u003ca href=\"https://news.kalshi.com/p/how-kalshi-keeps-traders-safe\"\u003eCFTC-regulated\u003c/a\u003e, explicitly prohibits insider trading, runs an in-house surveillance system called \u0026ldquo;Poirot,\u0026rdquo; has completed over 200 investigations in the past year, and requires \u003ca href=\"https://en.wikipedia.org/wiki/Know_your_customer\"\u003eKYC/AML\u003c/a\u003e verification. 
\u003ca href=\"https://en.wikipedia.org/wiki/Polymarket\"\u003ePolymarket\u003c/a\u003e\u0026rsquo;s international platform, operated by a Panama-incorporated entity, allows \u003ca href=\"https://www.corporatecomplianceinsights.com/prediction-markets-sports-betting-insider-trading/\"\u003epermissionless crypto wallets without identity verification\u003c/a\u003e. Its terms of service don\u0026rsquo;t specifically mention insider trading. \u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-kalshi-vs-polymarket-regulation-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/kalshi-vs-polymarket-regulation.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/kalshi-vs-polymarket-regulation.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/kalshi-vs-polymarket-regulation.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/kalshi-vs-polymarket-regulation.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/kalshi-vs-polymarket-regulation.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/kalshi-vs-polymarket-regulation.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/kalshi-vs-polymarket-regulation.png 1024w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/kalshi-vs-polymarket-regulation.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/kalshi-vs-polymarket-regulation.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/kalshi-vs-polymarket-regulation.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/kalshi-vs-polymarket-regulation.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/kalshi-vs-polymarket-regulation.png\"\n           alt=\"Exhibit comparing Kalshi and Polymarket regulatory postures across seven dimensions: Kalshi has CFTC regulation, full KYC, explicit insider trading prohibition, Poirot surveillance system, institutional market makers, and USD settlement, while Polymarket has no regulator, permissionless crypto wallets, no insider trading policy, no surveillance, and its CEO calls insider trading super cool\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-kalshi-vs-polymarket-regulation-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/kalshi-vs-polymarket-regulation.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exhibit comparing Kalshi and Polymarket regulatory postures across seven 
dimensions: Kalshi has CFTC regulation, full KYC, explicit insider trading prohibition, Poirot surveillance system, institutional market makers, and USD settlement, while Polymarket has no regulator, permissionless crypto wallets, no insider trading policy, no surveillance, and its CEO calls insider trading super cool\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eCFTC Chairman Michael Selig, confirmed in December 2025, laid out a \u003ca href=\"https://www.sidley.com/en/insights/newsupdates/2026/02/us-cftc-signals-imminent-rulemaking-on-prediction-markets\"\u003efour-part plan\u003c/a\u003e on January 29: withdraw the Biden-era proposed ban on political event contracts (done February 4), begin drafting new rules, assess ongoing litigation, and support market development. On February 17, he \u003ca href=\"https://www.cnbc.com/2026/02/17/cftc-defends-prediction-market-enforcement-states-challenge.html\"\u003epublished a Wall Street Journal op-ed\u003c/a\u003e asserting exclusive CFTC jurisdiction over prediction markets and filed an amicus brief supporting Crypto.com against Nevada gaming regulators. Selig announced an advisory committee whose planned members include both \u003ca href=\"https://en.wikipedia.org/wiki/Polymarket\"\u003ePolymarket CEO Shayne Coplan\u003c/a\u003e and \u003ca href=\"https://en.wikipedia.org/wiki/Kalshi\"\u003eKalshi CEO Tarek Mansour\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eRep. Ritchie Torres (D-NY) \u003ca href=\"https://ritchietorres.house.gov/posts/in-response-to-suspicious-polymarket-trade-preceding-maduro-operation-rep-ritchie-torres-introduces-legislation-to-crack-down-on-insider-trading-on-prediction-markets\"\u003eintroduced legislation\u003c/a\u003e in late January, directly responding to the Maduro trade, that would ban federal officials from trading prediction market contracts related to government activity. 
The bill targets a real problem (Levine\u0026rsquo;s point about government officials profiting from events they can influence), but it doesn\u0026rsquo;t create a general insider trading prohibition. It wouldn\u0026rsquo;t have stopped AlphaRacoon or the Israeli soldiers.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;m genuinely unsure where this lands. The libertarian case for prediction market insider trading, that it makes prices more accurate and the market should be a pure information aggregation mechanism, has intellectual appeal. The Akerlof case against it, that unchecked adverse selection destroys the market\u0026rsquo;s ability to function, has empirical support. The \u003ca href=\"https://www.federalreserve.gov/econres/feds/kalshi-and-the-rise-of-macro-markets.htm\"\u003eDiercks, Katz, and Wright paper\u003c/a\u003e suggests the stakes are higher than either camp acknowledges: these aren\u0026rsquo;t just gambling venues. They\u0026rsquo;re becoming part of the plumbing that central banks and institutional investors use to make real decisions.\u003c/p\u003e\n\u003cp\u003eMy instinct, and I want to be honest that it\u0026rsquo;s more instinct than conclusion at this point, is that the prediction market industry will end up roughly where securities markets were after the \u003ca href=\"https://en.wikipedia.org/wiki/Securities_Exchange_Act_of_1934\"\u003eSecurities Exchange Act of 1934\u003c/a\u003e. Some insider trading enforcement is necessary to maintain market integrity, not because trading on private information is inherently wrong, but because without it, the adverse selection spiral will destroy the markets that are otherwise proving genuinely useful. 
The question is whether that enforcement framework gets built proactively or whether it takes a scandal large enough to force it.\u003c/p\u003e\n\u003cp\u003ePolymarket\u0026rsquo;s CEO has \u003ca href=\"https://gizmodo.com/israel-accuses-two-polymarket-bettors-of-trading-on-classified-military-operations-2000721224\"\u003ecalled insider trading \u0026ldquo;super cool.\u0026rdquo;\u003c/a\u003e The Fed is \u003ca href=\"https://www.federalreserve.gov/econres/feds/kalshi-and-the-rise-of-macro-markets.htm\"\u003estudying his chief rival\u0026rsquo;s macro forecasting ability\u003c/a\u003e. The Israeli military is \u003ca href=\"https://www.npr.org/2026/02/12/nx-s1-5712801/polymarket-bets-traders-israel-military\"\u003eprosecuting soldiers\u003c/a\u003e who bet on his platform.\u003c/p\u003e\n","summary":"A Google insider made $1.15M on Polymarket in 24 hours. Israeli soldiers bet classified strike timing. Why prediction markets need insider trading regulation.","image":"https://static.philippdubach.com/ograph/ograph-prediction-market-insider-trading.jpg","date_published":"2026-02-22T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Investing"],"_philippdubach":{"type":"Analysis","word_count":2300,"reading_time_minutes":11,"keywords":["Polymarket insider trading","prediction market regulation","prediction market insider trading legal","Kalshi vs Polymarket","adverse selection prediction markets"],"section":"posts"}},{"id":"https://philippdubach.com/posts/economics-of-a-super-bowl-ad/","url":"https://philippdubach.com/posts/economics-of-a-super-bowl-ad/","title":"Economics of a Super Bowl Ad","content_html":"\u003cp\u003eA 30-second Super Bowl ad costs \u003cstrong\u003e$8 million\u003c/strong\u003e. That\u0026rsquo;s $267,000 per second, roughly the price of a house for every tick of the clock. 
Super Bowl LX drew \u003ca href=\"https://www.nielsen.com/news-center/2026/super-bowl-lx-delivers-124-9-million-viewers/\"\u003e124.9 million average viewers with a peak of 137.8 million\u003c/a\u003e, the highest peak audience in American television history. The NFL accounted for \u003ca href=\"https://www.sportico.com/business/media/2026/sportico-top-100-nfl-towers-over-us-media-landscape-1234880235/\"\u003e84 of the top 100 most-watched U.S. telecasts\u003c/a\u003e in 2025. The Oscars, by comparison, drew 19.7 million viewers.\u003c/p\u003e\n\u003cp\u003eZachariah Reitano, CEO of the direct-to-patient telehealth company Ro, writing from direct experience as a 2026 Super Bowl advertiser, \u003ca href=\"https://ro.co/perspectives/super-bowl-economics/\"\u003epublished a detailed cost breakdown\u003c/a\u003e based on his own spending and interviews with more than 10 brands. The picture that emerges is considerably more expensive than the headline number. Production runs $1–4 million for studio, crew, and post-production before any famous face enters the frame. Celebrity endorsement talent adds $1–5 million, with the current A-list sweet spot at $3–5 million \u003ca href=\"https://www.hollywoodreporter.com/business/business-news/2026-super-bowl-ads-stars-ai-comedy-1236490270/\"\u003eaccording to WME agent Tim Curtis\u003c/a\u003e. Then comes the companion buy: for every 30-second slot, advertisers are generally required to commit to spending an equivalent amount on other programs broadcast by the same network. For NBC\u0026rsquo;s 2026 Super Bowl, that meant additional inventory across the Winter Olympics and NBA All-Star Game, adding another $7–10 million to the tab.\u003c/p\u003e\n\u003cp\u003eTotal committed spend: \u003cstrong\u003e$16–23 million\u003c/strong\u003e for a single 30-second spot. 
\u003ca href=\"https://www.cfo.com/news/a-cfo-guide-to-super-bowl-ad-spend-jason-hershman-point-/811381/\"\u003eCFO.com\u0026rsquo;s Jason Hershman\u003c/a\u003e brackets the full range at $15–50 million depending on ambition.\u003c/p\u003e\n\u003cp\u003eFor companies already spending nine figures annually on marketing, framing a Super Bowl ad as a \u0026ldquo;portfolio bet with capped downside\u0026rdquo; says little, because that description fits virtually any marketing investment at that scale. The real question is whether that $10 million generates more value here than in the other places you\u0026rsquo;ve been spending $10 million. The observation is reductive but directionally useful: the specialness of the Super Bowl needs to be demonstrated in the data, not assumed from the vibes. On the other hand, as \u003ca href=\"https://www.bloomberg.com/opinion/newsletters/2026-02-09/predicting-the-big-game?cmpid=BBD020926_MONEYSTUFF\"\u003eMatt Levine\u003c/a\u003e puts it, it\u0026rsquo;s comparatively cheap:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eOne thing that the ads made me think about is how cheap Super Bowl advertising is, for an AI company. A Super Bowl spot costs something like $10 million for airtime plus another few million to produce, for a total at the high end of maybe $20 or $30 million, or roughly the cost of paying one employee for one month at a leading AI lab. Mark Zuckerberg carries around $30 million in his wallet in case he runs into an OpenAI engineer at Starbucks. The cost of creating a cutting-edge AI model — in compute and researcher pay — is astronomical in a way that makes the cost of any advertising, even Super Bowl advertising, look like nothing.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eBut let\u0026rsquo;s look at the data.\u003c/p\u003e\n\u003ch2 id=\"the-cpm-looks-reasonable-everything-else-is-complicated\"\u003eThe CPM looks reasonable. 
Everything else is complicated.\u003c/h2\u003e\n\u003cp\u003eAt $8 million reaching roughly 125 million viewers, the Super Bowl\u0026rsquo;s \u003ca href=\"https://adwave.com/resources/super-bowl-commercial-cost\"\u003eeffective CPM lands around $63–65 per thousand impressions\u003c/a\u003e. Standard primetime TV runs $20–30. Streaming TV sits at $15–35. TikTok charges $5–10. \u003ca href=\"https://digiday.com/marketing/heres-what-else-a-8m-30-second-super-bowl-budget-can-purchase-in-2026/\"\u003eDigiday calculated\u003c/a\u003e that for the same $8 million media buy, an advertiser could purchase 1.6 billion TikTok impressions, 267 million Google search impressions, or a primetime network TV spot every night for four months.\u003c/p\u003e\n\u003cp\u003eBut CPM comparisons are misleading here because they treat all impressions as equivalent. They aren\u0026rsquo;t. The Super Bowl is the last true monoculture event in American media, and the only advertising environment where the ads are the product. People rewatch them, rank them, discuss them at work Monday morning. The Today Show airs them as content. \u003ca href=\"https://www.edo.com/resources/how-tv-advertisers-can-win-super-bowl-and-beyond\"\u003eEDO\u003c/a\u003e, a TV outcomes measurement company, found that a single Super Bowl ad generates the same brand-search engagement as \u003cstrong\u003e1,056 primetime ads\u003c/strong\u003e.\u003c/p\u003e\n\u003ch2 id=\"there-is-academic-evidence-on-super-bowl-ad-roi\"\u003eThere is academic evidence on Super Bowl ad ROI\u003c/h2\u003e\n\u003cp\u003eThe cleanest causal evidence comes from \u003ca href=\"https://www.gsb.stanford.edu/insights/do-super-bowl-ads-really-work\"\u003eWesley Hartmann at Stanford GSB and Daniel Klapper at Humboldt University\u003c/a\u003e, published in \u003cem\u003eMarketing Science\u003c/em\u003e. 
Using \u003ca href=\"https://web.stanford.edu/~wesleyr/SuperBowl.pdf\"\u003eNielsen data across 55 media markets and six years of Super Bowls\u003c/a\u003e, they exploited exogenous variation in viewership (specifically, ratings spikes in markets whose local team was playing) to estimate causal effects. Their headline result: Budweiser earned an extra $96 million from Super Bowl advertising, a \u003cstrong\u003e172% return on investment\u003c/strong\u003e. Budweiser\u0026rsquo;s short-run sales revenue ran 15.75% higher per household than competitors in the weeks following the game.\u003c/p\u003e\n\u003cp\u003eBut Hartmann and Klapper\u0026rsquo;s most important finding on ad effectiveness is that when two brands in the same product category both advertise, neither gains incremental profit. The effects cancel out. Coca-Cola and Pepsi have both advertised annually in the Super Bowl for years. The researchers found no statistically significant volume increase for Coca-Cola regardless of whether it advertised, and the direction of the coefficients, if anything, suggested a negative relationship. The entire soda category\u0026rsquo;s Super Bowl spending appears to be a value-destroying exercise that neither side can unilaterally exit.\u003c/p\u003e\n\u003cp\u003eThis is a textbook prisoner\u0026rsquo;s dilemma, and game theory applied to advertising predicts exactly this outcome. Take a stylized example: if Bud Light and Coors Light both spend $50 million on ads, each earns $200 million in profit. If both spend only $10 million, each earns $240 million. Yet both rationally choose $50 million, because whatever the rival does, each brand does better on its own by spending heavily. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-super-bowl-prisoners-dilemma-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/super-bowl-prisoners-dilemma.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/super-bowl-prisoners-dilemma.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/super-bowl-prisoners-dilemma.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/super-bowl-prisoners-dilemma.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/super-bowl-prisoners-dilemma.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/super-bowl-prisoners-dilemma.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/super-bowl-prisoners-dilemma.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/super-bowl-prisoners-dilemma.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/super-bowl-prisoners-dilemma.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/super-bowl-prisoners-dilemma.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/super-bowl-prisoners-dilemma.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/super-bowl-prisoners-dilemma.png\"\n           alt=\"Super Bowl advertising prisoner\u0026#39;s dilemma payoff matrix showing two competing beer brands where both rationally choose heavy spend of $50M each yielding $200M profit apiece at Nash equilibrium, versus the Pareto optimal outcome of light spend at $10M each yielding $240M profit apiece, destroying $80M in collective profit that the NFL captures\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-super-bowl-prisoners-dilemma-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/super-bowl-prisoners-dilemma.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Super Bowl advertising prisoner\u0026#39;s dilemma payoff matrix showing two competing beer brands where both rationally choose heavy spend of $50M each yielding $200M profit apiece at Nash equilibrium, versus the Pareto optimal outcome of light spend at $10M each yielding $240M profit apiece, destroying $80M in collective profit that the NFL captures\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eAnheuser-Busch understood this and paid to avoid it. 
The company \u003ca href=\"https://www.marketingdive.com/news/NFL-Anheuser-Busch-InBev-Super-Bowl-Advertising/625707/\"\u003eheld exclusive beer advertising rights for 33 consecutive years\u003c/a\u003e (1989–2022), spending \u003ca href=\"https://money.cnn.com/2016/02/05/news/anheuser-busch-super-bowl-advertising/\"\u003e\u003cstrong\u003e$278 million over a decade\u003c/strong\u003e\u003c/a\u003e partly to keep competitors from neutralizing its ads. When exclusivity ended in 2023, the Super Bowl immediately featured nine beer ads from multiple brands. If the Hartmann-Klapper result holds, Budweiser\u0026rsquo;s ROI almost certainly declined.\u003c/p\u003e\n\u003cp\u003eStock price studies paint a muddier picture. An \u003ca href=\"https://doi.org/10.3390/su12176686\"\u003eMDPI Sustainability study\u003c/a\u003e examining 272 ads from 142 firms (2010–2019) found positive cumulative abnormal returns of 2.35% over the 10 days after the game. \u003ca href=\"https://bridgewise.com/blog/super-bowl-stock-price-fumble/\"\u003eBridgewise\u003c/a\u003e, covering 2021–2024, found the opposite: a portfolio of Super Bowl advertisers underperformed the S\u0026amp;P 500 by 9.2% after six months, with only 25% of individual advertisers outperforming. \u003ca href=\"https://www.kantar.com/north-america/company-news/in-game-ad-revenue-for-super-bowl-lvi-increased-by-more-than-143-million\"\u003eKantar\u0026rsquo;s analysis\u003c/a\u003e reports an average ROI of $4.60 per dollar spent, a figure broadly consistent with its multi-year tracking. 
A \u003ca href=\"https://digitalcommons.georgiasouthern.edu/marketing-facpubs/19/\"\u003eGeorgia Southern study by Eastman and Iyer\u003c/a\u003e found that USA Today Ad Meter likeability scores, the industry\u0026rsquo;s most-cited metric for judging Super Bowl ads, had no significant relationship with financial effectiveness.\u003c/p\u003e\n\u003ch2 id=\"attribution-is-difficult\"\u003eAttribution is difficult\u003c/h2\u003e\n\u003cp\u003eI\u0026rsquo;ve always wondered how well attribution works. It seems like mostly guesswork to me. The evidence suggests I\u0026rsquo;m more right than wrong, though \u0026ldquo;guesswork\u0026rdquo; undersells the sophistication of modern marketing attribution tools. The problem is that their sophistication outruns their accuracy. A \u003ca href=\"https://www.revsure.ai/the-state-of-marketing-attribution-in-2024\"\u003e2024 Ascend2 survey\u003c/a\u003e found that only 29% of marketers are \u0026ldquo;extremely confident\u0026rdquo; in their attribution accuracy. More than a third of CMOs do not fully trust their own marketing data. The problems are real: privacy signal loss from GDPR, CCPA, and iOS tracking opt-outs has degraded the data marketers can observe. Cross-device fragmentation means customers touch three to five or more devices before converting. Platform self-reporting creates systematic overcounting, with Google, Meta, and Amazon each claiming credit for the same sale.\u003c/p\u003e\n\u003cp\u003eFor Super Bowl ads specifically, confounding makes attribution even harder. Brands run concurrent promotions, digital retargeting campaigns, influencer activations, and PR blitzes. Many release their ads days before the game. 
Some academic research suggests that price relative to competitors has a 20–25x greater impact on sales than total advertising across all channels, which means a coincidental price change during Super Bowl week can swamp the advertising signal entirely.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://blog.cloudflare.com/super-bowl-lviii/\"\u003eCloudflare\u0026rsquo;s DNS data\u003c/a\u003e showed a \u003cstrong\u003e24,875%\u003c/strong\u003e increase in TurboTax traffic above baseline after its 2024 Super Bowl ad. e.l.f. Cosmetics saw 8,118%; Poppi saw 7,329%. But a \u003ca href=\"https://www.similarweb.com/blog/insights/super-bowl-impact/\"\u003eSimilarweb analysis\u003c/a\u003e of 28-day post-game traffic found an average increase of only \u003cstrong\u003e~1%\u003c/strong\u003e across all advertisers. The spike is enormous and ephemeral. \u003ca href=\"https://adage.com/article/special-report-super-bowl/super-bowl-glow-measure-weeks/296864/\"\u003eYouGov BrandIndex\u003c/a\u003e found that only 10 of 50-plus advertisers saw a positive buzz lift above the margin of error, and even those lifts lasted two weeks at most. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-super-bowl-spike-vs-sustain-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/super-bowl-spike-vs-sustain.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/super-bowl-spike-vs-sustain.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/super-bowl-spike-vs-sustain.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/super-bowl-spike-vs-sustain.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/super-bowl-spike-vs-sustain.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/super-bowl-spike-vs-sustain.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/super-bowl-spike-vs-sustain.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/super-bowl-spike-vs-sustain.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/super-bowl-spike-vs-sustain.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/super-bowl-spike-vs-sustain.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/super-bowl-spike-vs-sustain.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/super-bowl-spike-vs-sustain.png\"\n           alt=\"Super Bowl ad attribution gap showing real-time Cloudflare DNS traffic spikes of 24,875 percent for TurboTax, 8,118 percent for e.l.f. Cosmetics, and 7,329 percent for Poppi contrasted against Similarweb 28-day sustained lift of only approximately 1 percent across all Super Bowl advertisers\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-super-bowl-spike-vs-sustain-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/super-bowl-spike-vs-sustain.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Super Bowl ad attribution gap showing real-time Cloudflare DNS traffic spikes of 24,875 percent for TurboTax, 8,118 percent for e.l.f. 
Cosmetics, and 7,329 percent for Poppi contrasted against Similarweb 28-day sustained lift of only approximately 1 percent across all Super Bowl advertisers\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.cfo.com/news/a-cfo-guide-to-super-bowl-ad-spend-jason-hershman-point-/811381/\"\u003eCFO.com\u0026rsquo;s Hershman\u003c/a\u003e had the clearest framing for anyone trying to evaluate this honestly: the marketing team will come back with impressions, social mentions, and \u0026ldquo;earned media value,\u0026rdquo; which he described as Wall Street\u0026rsquo;s least favorite made-up metric. The only meaningful number is incremental contribution profit. At a 40% gross margin, a $13 million all-in Super Bowl investment needs \u003cstrong\u003e$32.5 million in incremental revenue\u003c/strong\u003e ($13 million divided by 0.40) just to break even on pure acquisition economics.\u003c/p\u003e\n\u003ch2 id=\"the-nfls-advertising-pricing-machine\"\u003eThe NFL\u0026rsquo;s advertising pricing machine\u003c/h2\u003e\n\u003cp\u003eThe NFL operates a price-discrimination machine that has outpaced inflation for 60 years. Super Bowl ad prices have increased from $37,500 in 1967 to $8 million in 2026, \u003ca href=\"https://www.superbowl-ads.com/cost-of-super-bowl-advertising-breakdown-by-year/\"\u003ea 213x nominal increase\u003c/a\u003e and roughly 22–23x in real terms. The compound annual growth rate of approximately 9.6% is more than double average CPI inflation over the same period. Only three year-over-year price decreases have occurred in the entire 60-year history. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-super-bowl-price-history-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/super-bowl-price-history.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/super-bowl-price-history.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/super-bowl-price-history.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/super-bowl-price-history.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/super-bowl-price-history.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/super-bowl-price-history.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/super-bowl-price-history.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/super-bowl-price-history.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/super-bowl-price-history.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/super-bowl-price-history.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/super-bowl-price-history.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/super-bowl-price-history.png\"\n           alt=\"Super Bowl ad cost history from 1967 to 2026 showing price growth from $37,500 at Super Bowl I to $1.2M first seven figures to $2.1M at the Dot-Com Bowl to $8M at Super Bowl LX, a 213x nominal increase at 9.6 percent CAGR with only three year-over-year price decreases in 59 years\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-super-bowl-price-history-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/super-bowl-price-history.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Super Bowl ad cost history from 1967 to 2026 showing price growth from $37,500 at Super Bowl I to $1.2M first seven figures to $2.1M at the Dot-Com Bowl to $8M at Super Bowl LX, a 213x nominal increase at 9.6 percent CAGR with only three year-over-year price decreases in 59 years\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eThe NFL\u0026rsquo;s leverage comes from a structural scarcity that it actively maintains. Super Bowl ad inventory sells out months in advance. 
\u003ca href=\"https://www.hellomagazine.com/film/881951/2026-super-bowl-commercial-cost-breaks-records/\"\u003eNBC sold out its 2026 inventory\u003c/a\u003e before the NFL season even started, with some companies paying $10 million or more due to what NBCUniversal\u0026rsquo;s Mark Marshall called \u0026ldquo;the marketplace demand.\u0026rdquo; \u003ca href=\"https://www.foxcorporation.com/news/business/2025/super-bowl-lix-on-fox-and-tubi-generates-more-than-800-million-in-gross-advertising-revenue/\"\u003eFox reported more than $800 million\u003c/a\u003e in gross ad revenue from Super Bowl LIX in 2025, a record that industry analysts expect to top $1 billion within two to three years. The mandatory companion buys force advertisers into additional network inventory they might not otherwise purchase, extracting surplus beyond the headline slot price.\u003c/p\u003e\n\u003cp\u003eViewership has cooperated. The Super Bowl drew 51.2 million viewers in 1967 and \u003cstrong\u003e127.7 million\u003c/strong\u003e in 2025. Streaming hasn\u0026rsquo;t fragmented the audience; it\u0026rsquo;s expanded it. \u003ca href=\"https://www.foxsports.com/presspass/blog/2025/02/11/fox-sports-presentation-of-super-bowl-lix-delivers-most-watched-super-bowl-of-all-time-with-127-7-million-viewers-across-all-platforms/\"\u003eTubi alone delivered 13.6 million streaming viewers\u003c/a\u003e for Super Bowl LIX, a 94% increase over streaming viewership for Fox\u0026rsquo;s previous Super Bowl. \u003ca href=\"https://www.tvtechnology.com/news/fox-sports-super-bowl-viewership-peaks-at-record-135-7-million\"\u003eAdImpact data showed streaming at 49% of total viewership\u003c/a\u003e, up from 41.5% in 2024. 
The audience skews younger on streaming: Tubi\u0026rsquo;s Super Bowl audience was 38% more likely to be 18–34 than the overall game audience, which is exactly the demographic advertisers pay premiums to reach.\u003c/p\u003e\n\u003cp\u003eThe result is a market where the seller has near-monopoly pricing power, the buyers face a prisoner\u0026rsquo;s dilemma that prevents collective resistance, and the audience keeps growing. The NFL has essentially created a Veblen good in advertising: the price itself signals legitimacy, which makes the price self-sustaining. The \u003ca href=\"https://en.wikipedia.org/wiki/Dot-com_commercials_during_Super_Bowl_XXXIV\"\u003e2000 \u0026ldquo;Dot-Com Bowl\u0026rdquo;\u003c/a\u003e saw 14+ internet companies advertise, using the Super Bowl as a credibility play. At least eight went bust within a decade. The \u003ca href=\"https://www.cnbc.com/2022/11/30/crypto-crash-may-leave-ad-supported-businesses-with-hole-in-budget.html\"\u003e2022 \u0026ldquo;Crypto Bowl\u0026rdquo;\u003c/a\u003e featured Coinbase, FTX, Crypto.com, and eToro spending a collective $54 million. FTX collapsed into bankruptcy within nine months. The pattern repeats because the mechanism works: bubble industries pay the premium precisely because appearing in the Super Bowl signals they belong among established brands. That this signal is often false doesn\u0026rsquo;t reduce its price.\u003c/p\u003e\n\u003ch2 id=\"the-cases-that-define-the-genre\"\u003eThe cases that define the genre\u003c/h2\u003e\n\u003cp\u003eApple\u0026rsquo;s \u003ca href=\"https://en.wikipedia.org/wiki/1984_(advertisement)\"\u003e\u0026ldquo;1984\u0026rdquo; ad\u003c/a\u003e cost approximately $750,000–$900,000 to produce plus $800,000 in airtime, roughly $5 million in today\u0026rsquo;s dollars. Apple\u0026rsquo;s board hated it and ordered the time sold back. Steve Jobs intervened. 
The ad \u003ca href=\"https://heidicohen.com/content-quality-lesson-apple-1984-super-bowl-ad/\"\u003egenerated \u003cstrong\u003e$155 million\u003c/strong\u003e in Macintosh sales\u003c/a\u003e within three months. Apple sold 250,000 Macs in the first year against a 30,000-unit break-even target. It sits \u003ca href=\"https://americanhistory.si.edu/explore/stories/remembering-apples-1984-super-bowl-ad\"\u003ein the Smithsonian\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eCoinbase\u0026rsquo;s 2022 QR code ad \u003ca href=\"https://www.thedrum.com/news/2022/02/14/ad-the-day-coinbase-breaks-internet-with-qr-code-super-bowl-stunt\"\u003ecost $14 million\u003c/a\u003e for 60 seconds of a bouncing QR code on a black screen. The landing page \u003ca href=\"https://www.cnn.com/2022/02/14/investing/coinbase-qr-code-app\"\u003ereceived \u003cstrong\u003emore than 20 million hits in one minute\u003c/strong\u003e\u003c/a\u003e, crashing the app. Downloads \u003ca href=\"https://techcrunch.com/2022/02/17/super-bowl-ads-boosted-crypto-app-downloads-by-279-led-by-coinbase/\"\u003esurged 309% week-over-week\u003c/a\u003e. The ad won the \u0026ldquo;Super Clio\u0026rdquo; while finishing dead last in USA Today\u0026rsquo;s Ad Meter consumer rankings. Then the crypto market collapsed, Coinbase laid off 18% of its staff, and the awareness the ad had bought evaporated. It was a reminder that advertising cannot fix a product\u0026rsquo;s relationship to reality.\u003c/p\u003e\n\u003cp\u003eGoDaddy advertised in every Super Bowl from 2005 to 2015, deliberately courting controversy with provocative ads. 
Its first appearance \u003ca href=\"https://mbaknol.com/management-case-studies/case-study-godaddys-super-bowl-commercials/\"\u003egenerated a 378% website traffic spike\u003c/a\u003e and a 51.4% share of voice among all advertisers, largely because \u003ca href=\"https://adage.com/article/special-report-super-bowl/fox-killed-airing-super-bowl-godaddy-ad/45076\"\u003eFox pulled the second airing\u003c/a\u003e and created a news cycle. Today over 60% of visitors go to GoDaddy.com directly rather than through search. The company grew to 21 million customers before going public. Provocation as a launch strategy worked, until the brand matured and pivoted away.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://system1group.com/blog/dont-forget-to-brand-why-codification-is-the-key-to-super-bowl-advertising-success\"\u003eSystem1 data\u003c/a\u003e offers a sobering counterpoint to these highlights: \u003cstrong\u003e21%\u003c/strong\u003e of viewers in 2025 couldn\u0026rsquo;t recall which brand was behind the ad they\u0026rsquo;d just watched. In other words, for roughly one in five viewers, millions in ad spend bought brandless entertainment. Those viewers enjoyed the show. They just have no idea who paid for it.\u003c/p\u003e\n\u003ch2 id=\"what-the-economics-of-a-super-bowl-ad-tell-us\"\u003eWhat the economics of a Super Bowl ad tell us\u003c/h2\u003e\n\u003cp\u003eI keep coming back to \u003ca href=\"https://www.gsb.stanford.edu/faculty-research/publications/super-bowl-ads\"\u003eHartmann and Klapper\u0026rsquo;s central result\u003c/a\u003e because it\u0026rsquo;s the one that reshapes how you think about the entire exercise. In their data, a Super Bowl ad can work brilliantly as an investment, but only when the advertiser has category exclusivity. The moment a competitor shows up, the gains evaporate. 
What looks like an advertising problem is actually a competitive strategy problem.\u003c/p\u003e\n\u003cp\u003eAnheuser-Busch paid for exclusivity for 33 years because the company understood this. The \u003ca href=\"https://money.cnn.com/2016/02/05/news/anheuser-busch-super-bowl-advertising/\"\u003e$278 million over a decade\u003c/a\u003e wasn\u0026rsquo;t a media buy. It was an entry barrier. The moment that barrier \u003ca href=\"https://www.cnn.com/2022/07/15/business-food/anheuser-busch-molson-coors-super-bowl-deal/index.html\"\u003efell in 2023\u003c/a\u003e, the category filled with nine beer ads from competing brands, and the collective value of Super Bowl beer advertising almost certainly declined. The NFL captured the difference.\u003c/p\u003e\n\u003cp\u003eThis means the honest answer to \u0026ldquo;should you buy a Super Bowl ad?\u0026rdquo; isn\u0026rsquo;t about CPMs or brand lift or even ROI in the traditional sense. It\u0026rsquo;s about whether your competitive position allows you to capture the value, or whether you\u0026rsquo;re paying into a prisoner\u0026rsquo;s dilemma that the NFL designed.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://ro.co/perspectives/super-bowl-economics/\"\u003eReitano\u0026rsquo;s asymmetric upside thesis\u003c/a\u003e is logically sound for a company advertising in a category without heavy Super Bowl competition and using the spot as a genuine brand awareness play. But the framework breaks down when that condition fails, and in 2026 it failed for Ro itself: the Super Bowl featured Novo Nordisk, Ro, Hims \u0026amp; Hers, Novartis, Boehringer Ingelheim, and Eli Lilly all running health-related ads. 
Northwestern\u0026rsquo;s Tim Calkins \u003ca href=\"https://fortune.com/2026/02/06/super-bowl-ads-cost-budweiser-lays-amazon-meta-anthropic-ring/\"\u003ecalled it\u003c/a\u003e the \u0026ldquo;GLP-1 Super Bowl.\u0026rdquo; If the Hartmann-Klapper result holds across categories, those brands collectively spent north of \u003cstrong\u003e$100 million\u003c/strong\u003e on ads whose effects substantially cancelled each other out.\u003c/p\u003e\n","summary":"A 30-second Super Bowl spot costs $8M. The real price is $16–23M. The ROI evidence is mixed. A deep look at the pricing, the prisoner's dilemma, and the NFL.","image":"https://static.philippdubach.com/ograph/ograph-super-bowl-economics1.jpg","date_published":"2026-02-20T00:00:00Z","date_modified":"2026-05-04T13:48:47+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Economics"],"_philippdubach":{"type":"Analysis","word_count":2297,"reading_time_minutes":11,"keywords":["Super Bowl ad cost 2026","Super Bowl advertising ROI","do Super Bowl ads work","Super Bowl commercial cost breakdown","NFL Super Bowl advertising economics","Super Bowl prisoner dilemma game theory","Hartmann Klapper Stanford Super Bowl study","Super Bowl ad attribution problem","Super Bowl CPM analysis","Anheuser-Busch Super Bowl exclusivity","Super Bowl viewership 2026","Super Bowl ad production cost","Super Bowl ad effectiveness research","NFL advertising monopoly pricing","monoculture advertising Super Bowl","Super Bowl ad brand recall","Super Bowl streaming viewership Tubi","EDO Super Bowl engagement metric","Super Bowl 30-second spot price history","marketing attribution advertising measurement","game theory pricing economics","monopoly pricing power","advertising ROI measurement","prisoner's dilemma business strategy"],"section":"posts"}},{"id":"https://philippdubach.com/posts/the-impossible-backhand/","url":"https://philippdubach.com/posts/the-impossible-backhand/","title":"The Impossible 
Backhand","content_html":"\u003cp\u003eIn the latest issue of \u003ca href=\"https://lab.philippdubach.com\"\u003eThe AI Lab Newsletter\u003c/a\u003e, I featured a ByteDance \u003ca href=\"https://x.com/AngryTomtweets/status/2021194266517832057\"\u003eSeedance 2.0\u003c/a\u003e clip: two men playing tennis at what looked like an ATP tournament. Photorealistic. If I hadn\u0026rsquo;t known, I probably wouldn\u0026rsquo;t have been able to tell it wasn\u0026rsquo;t real footage. A co-worker who played junior pro-am tennis watched the same clip and said: \u0026ldquo;That backhand doesn\u0026rsquo;t exist. Nobody plays it like that.\u0026rdquo; His domain expertise spotted an error that probably fooled everyone else.\u003c/p\u003e\n\u003cp\u003eWe ended up in a long conversation about what that means. AI can get maybe 95 or 98 percent of the way to something that looks perfect, but it never quite gets there, and if you have deep knowledge you can spot the gap immediately. The consensus narrative treats this as a temporary limitation. But it might be structural. And I think the evidence, once you lay it out, points to a genuinely contrarian conclusion: domain expertise is appreciating in value, not depreciating, precisely because AI hits a quality ceiling it can\u0026rsquo;t easily push past.\u003c/p\u003e\n\u003ch2 id=\"approaching-the-ai-quality-ceiling\"\u003eApproaching the AI quality ceiling\u003c/h2\u003e\n\u003cp\u003eI\u0026rsquo;ve \u003ca href=\"/posts/the-most-expensive-assumption-in-ai/\"\u003ewritten before\u003c/a\u003e about Sara Hooker\u0026rsquo;s work on diminishing returns from scaling. The investment side of that argument, the \u003ca href=\"/posts/the-saaspocalypse-paradox/\"\u003e$690 billion in hyperscaler capex\u003c/a\u003e chasing a 4% revenue coverage ratio, has been well covered. 
What hasn\u0026rsquo;t been covered as precisely is why AI output quality hits a ceiling, and why that ceiling is structural rather than temporary.\u003c/p\u003e\n\u003cp\u003eBen Affleck, of all people, gave the clearest non-technical explanation on \u003ca href=\"https://faroutmagazine.co.uk/ben-affleck-dismisses-existential-potential-ai-hollywood/\"\u003eThe Joe Rogan Experience\u003c/a\u003e in January 2026:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eIf you try to get ChatGPT or Claude or Gemini to write you something, it\u0026rsquo;s really shitty. And it\u0026rsquo;s shitty because by its nature it goes to the mean, to the average. Now, it\u0026rsquo;s a useful tool if you\u0026rsquo;re a writer\u0026hellip; but I don\u0026rsquo;t think it\u0026rsquo;s actually very likely that it\u0026rsquo;s going to write anything meaningful, or that it\u0026rsquo;s going to be making movies from whole cloth. That\u0026rsquo;s bullshit.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eHe\u0026rsquo;s more right than he probably knows. The convergence to the mean isn\u0026rsquo;t a solvable engineering problem. It operates at three distinct levels, each compounding the others.\u003c/p\u003e\n\u003cp\u003e(1) The mathematics of next-token prediction. LLMs generate the most statistically probable continuation of a sequence. Probable, by definition, means average. The model isn\u0026rsquo;t trying to produce the best output; it\u0026rsquo;s producing the most expected one given the distribution it learned. Outlier quality, the kind that makes writing or analysis distinctive, lives in the tails of the distribution. The architecture systematically avoids those tails.\u003c/p\u003e\n\u003cp\u003e(2) RLHF makes it worse. Preference research finds that human annotators favor familiar-sounding responses; one estimate puts the weight on typicality in the learned reward at α=0.57. Models are quite literally being trained to sound typical rather than merely correct or good. 
The reinforcement signal pushes outputs toward the center of the quality distribution, not toward its upper bound.\u003c/p\u003e\n\u003cp\u003e(3) Model collapse. \u003ca href=\"https://www.nature.com/articles/s41586-024-07566-y\"\u003eShumailov et al.\u003c/a\u003e documented this in their Nature paper: as models increasingly train on AI-generated content, they \u0026ldquo;forget the true underlying data distribution,\u0026rdquo; losing the tails first and converging toward a point estimate with minimal variance. The internet is filling with AI-generated text. The next generation of models trains on that text. The tails shrink further. This is a positive feedback loop running in the wrong direction.\u003c/p\u003e\n\u003cp\u003eMIT researchers \u003ca href=\"https://arxiv.org/abs/2007.05558\"\u003eThompson, Greenewald, Lee, and Manso\u003c/a\u003e quantified the cost side: computational requirements scale with at least the fourth power of the improvement in theory, and the ninth power in practice. At the ninth power, halving an error rate takes 2⁹ = 512, or more than 500×, the computational resources. In 2012, AlexNet trained on two GPUs in six days. By 2018, NASNet-A had cut the error rate in half using more than 1,000× as much compute. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ninth-power-curve-2-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ninth-power-curve-2.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ninth-power-curve-2.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ninth-power-curve-2.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ninth-power-curve-2.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ninth-power-curve-2.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ninth-power-curve-2.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ninth-power-curve-2.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ninth-power-curve-2.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ninth-power-curve-2.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ninth-power-curve-2.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ninth-power-curve-2.png 2000w\"\n         
     sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ninth-power-curve-2.png\"\n           alt=\"AI quality ceiling ninth-power scaling curve: computational cost scales from AlexNet in 2012 on two GPUs to NASNet-A in 2018 requiring over 1000x compute to halve error rate, showing diminishing returns that explain why AI output quality plateaus and domain expertise remains irreplaceable\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ninth-power-curve-2-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ninth-power-curve-2.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"AI quality ceiling ninth-power scaling curve: computational cost scales from AlexNet in 2012 on two GPUs to NASNet-A in 2018 requiring over 1000x compute to halve error rate, showing diminishing returns that explain why AI output quality plateaus and domain expertise remains irreplaceable\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eAffleck captured the commercial implication of this better than most analysts:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eI think a lot of that rhetoric comes from people who are trying to justify valuations around companies where they go, \u0026ldquo;We\u0026rsquo;re going to change everything in two years.\u0026rdquo; Well, the reason they\u0026rsquo;re saying that is because they need to ascribe a valuation for investment that can warrant the capex spend they\u0026rsquo;re going to make on 
these data centers. Except that ChatGPT 5 is about 25 percent better than ChatGPT 4, and costs about four times as much in the way of electricity and data.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eHe\u0026rsquo;s describing the ninth-power curve in plain English. Each marginal improvement costs disproportionately more. The curve bends away from you the harder you push.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"humanitys-last-exam\"\u003eHumanity\u0026rsquo;s Last Exam\u003c/h2\u003e\n\u003cp\u003eThe toughest measure of where AI actually stands against domain expertise is \u003ca href=\"https://artificialanalysis.ai/evaluations/humanitys-last-exam\"\u003eHumanity\u0026rsquo;s Last Exam\u003c/a\u003e (HLE), published in Nature in early 2025 by the Center for AI Safety and Scale AI. Built by approximately 1,000 subject-matter experts across 500+ institutions, it consists of 2,500 expert-crafted questions spanning 100+ academic domains, designed to be \u0026ldquo;Google-proof\u0026rdquo;: questions that require genuine understanding rather than information retrieval.\u003c/p\u003e\n\u003cp\u003eAs of February 2026, the top model (Gemini 3 Pro Preview) scores \u003cstrong\u003e37.5%\u003c/strong\u003e. Most models sit below 30%. Human domain experts average roughly \u003cstrong\u003e90%\u003c/strong\u003e. That\u0026rsquo;s a 53-point gap. In specialized domains like advanced chemical kinetics or medieval philology, AI barely outperforms random guessing while experts score comfortably in the 80s and 90s. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-hle-gap-chart-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/hle-gap-chart.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/hle-gap-chart.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/hle-gap-chart.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/hle-gap-chart.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/hle-gap-chart.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/hle-gap-chart.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/hle-gap-chart.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/hle-gap-chart.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/hle-gap-chart.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/hle-gap-chart.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/hle-gap-chart.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg 
src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/hle-gap-chart.png\"\n           alt=\"Humanity\u0026#39;s Last Exam 2026 benchmark scores showing 53-point gap between human domain experts at roughly 90 percent and top AI models including Gemini 3 Deep Think at 48.4 percent and Gemini 3 Pro Preview at 37.5 percent, evidence that AI capability frontier remains far behind human expertise on specialist questions\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-hle-gap-chart-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/hle-gap-chart.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Humanity\u0026#39;s Last Exam 2026 benchmark scores showing 53-point gap between human domain experts at roughly 90 percent and top AI models including Gemini 3 Deep Think at 48.4 percent and Gemini 3 Pro Preview at 37.5 percent, evidence that AI capability frontier remains far behind human expertise on specialist questions\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n The models are also systematically overconfident. 
Calibration errors on HLE \u003ca href=\"https://www.letsdatascience.com/blog/humanitys-last-exam-the-test-thats-humbling-the-worlds-smartest-ai\"\u003erange from 34% to 89%\u003c/a\u003e, meaning AI systems are saying \u0026ldquo;I\u0026rsquo;m 90% sure\u0026rdquo; when they should be saying \u0026ldquo;I\u0026rsquo;m guessing.\u0026rdquo; That gap between confidence and accuracy is where real-world harm concentrates.\u003c/p\u003e\n\u003cp\u003eIn legal applications, Yale researcher \u003ca href=\"https://law.stanford.edu/2024/01/11/hallucinating-law-legal-mistakes-with-large-language-models-are-pervasive/\"\u003eMatthew Dahl\u003c/a\u003e found hallucination rates of 69% to 88% on specific legal queries. Damien Charlotin\u0026rsquo;s database now tracks 914 cases of AI-generated hallucinated content in legal filings worldwide, with new cases accelerating from about two per week to two or three per day. In medicine, the \u003ca href=\"https://www.annfammed.org/content/23/1/1/tab-e-letters\"\u003eAnnals of Family Medicine\u003c/a\u003e warns that AI hallucinations are \u0026ldquo;far more insidious\u0026rdquo; because \u0026ldquo;a subtle misstep like a misplaced clinical guideline, an incorrect dosage, or an invented side effect may not raise immediate suspicion.\u0026rdquo; These aren\u0026rsquo;t edge cases. They\u0026rsquo;re the expected behavior of systems operating in professional domains where training data is sparse.\u003c/p\u003e\n\u003cp\u003eThe structural explanation is what Kandpal et al. demonstrated at ICML 2023: there\u0026rsquo;s a strong correlational and causal relationship between an LLM\u0026rsquo;s ability to answer questions and how many relevant documents appeared in pre-training data. Common knowledge gets learned well. Specialized knowledge appears infrequently online, so models learn it poorly. 
\u003ca href=\"https://x.com/alive_eth/status/1286650402356641792\"\u003eAli Yahya\u003c/a\u003e of a16z framed it sharply: neural networks are \u0026ldquo;fantastic interpolators but terrible extrapolators,\u0026rdquo; powerful pattern matchers that are \u0026ldquo;blind to the mechanisms that generate the data in the first place.\u0026rdquo; \u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-domain-risk-map-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/domain-risk-map.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/domain-risk-map.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/domain-risk-map.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/domain-risk-map.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/domain-risk-map.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/domain-risk-map.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/domain-risk-map.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/domain-risk-map.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/domain-risk-map.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/domain-risk-map.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/domain-risk-map.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/domain-risk-map.png\"\n           alt=\"AI hallucination rates across professional domains: legal research at 69 to 88 percent failure rated critical risk, clinical medicine rated critical with subtle errors, financial analysis at roughly 45 percent, expert academics at 62.5 percent failure on Humanity\u0026#39;s Last Exam, mapping the AI capability frontier by domain\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-domain-risk-map-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/domain-risk-map.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"AI hallucination rates across professional domains: legal research at 69 to 88 percent failure rated critical risk, clinical medicine rated critical with subtle errors, financial analysis at roughly 45 percent, expert academics at 62.5 percent failure on Humanity\u0026#39;s Last Exam, mapping the AI capability frontier by domain\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n My colleague who spotted the impossible backhand is a fantastic extrapolator. 
He has an embodied model of how tennis biomechanics work that no amount of video footage can teach a diffusion model. The model can produce outputs that are statistically plausible. He can identify outputs that are physically impossible. That distinction is the gap.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"the-centaur-model-for-human-ai-collaboration\"\u003eThe centaur model for human-AI collaboration\u003c/h2\u003e\n\u003cp\u003eThe consensus framing positions AI and human expertise as substitutes: AI gets better, humans become less relevant. The empirical evidence on AI augmentation versus replacement says the opposite. Human-AI collaboration, what researchers call the centaur model, tends to outperform either alone across domains, and the quality of the human contribution matters a lot.\u003c/p\u003e\n\u003cp\u003eThe Harvard/BCG study tested 758 consultants, 7% of BCG\u0026rsquo;s consulting workforce, on realistic tasks using GPT-4. The researchers described a \u0026ldquo;\u003ca href=\"https://www.hbs.edu/faculty/Pages/item.aspx?num=64700\"\u003ejagged technological frontier\u003c/a\u003e\u0026rdquo; where some tasks fall within AI\u0026rsquo;s capabilities and others, though seemingly similar, do not. For tasks within that frontier, consultants using AI completed 12.2% more tasks, finished 25.1% faster, and produced results 40% higher in quality. Below-average performers improved their scores by 43%, against 17% for above-average performers, which makes AI a skill equalizer. But for tasks outside AI\u0026rsquo;s frontier, consultants using AI were \u003cstrong\u003e19 percentage points\u003c/strong\u003e less likely to produce correct solutions. 
The researchers observed that \u0026ldquo;professionals who had a negative performance when using AI tended to blindly adopt its output and interrogate it less.\u0026rdquo; \u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-centaur-effect-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/centaur-effect.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/centaur-effect.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/centaur-effect.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/centaur-effect.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/centaur-effect.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/centaur-effect.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/centaur-effect.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/centaur-effect.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/centaur-effect.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/centaur-effect.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/centaur-effect.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/centaur-effect.png\"\n           alt=\"Harvard BCG centaur model study results on human-AI collaboration and knowledge worker productivity: within AI capability frontier showing plus 40 percent quality, plus 12.2 percent more tasks, plus 25.1 percent faster; outside frontier showing minus 19 percentage points accuracy for blind delegators\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-centaur-effect-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/centaur-effect.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Harvard BCG centaur model study results on human-AI collaboration and knowledge worker productivity: within AI capability frontier showing plus 40 percent quality, plus 12.2 percent more tasks, plus 25.1 percent faster; outside frontier showing minus 19 percentage points accuracy for blind delegators\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eThat second finding doesn\u0026rsquo;t get enough attention. It means the value of the human in the loop depends entirely on whether the human can identify when the AI is wrong. 
Which requires precisely the domain expertise that AI supposedly makes obsolete.\u003c/p\u003e\n\u003cp\u003eThe \u003ca href=\"https://www.lsu.edu/business/news/2025/7/research-ai-collaboration.php\"\u003e\u0026ldquo;centaur analyst\u0026rdquo; study from LSU Finance\u003c/a\u003e (winner of the Fama-DFA Best Paper Award) confirmed the pattern over an 18-year dataset. AI alone beat human stock analysts in 54.5% of cases. The human-AI hybrid outperformed AI-only in nearly 55% of forecasts and reduced extreme prediction errors by roughly 90% compared to human analysts alone. In clinical decision-making experiments with the Mayo Clinic, the ranking was consistent: human-algorithm centaur, then algorithm alone, then human experts alone. The human adds most value at the extremes, catching the cases where the model\u0026rsquo;s convergence to the mean produces confidently wrong answers.\u003c/p\u003e\n\u003cp\u003eAffleck, who has thought about this more carefully than his reputation might suggest, landed on the same conclusion:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eThe way I see the technology and what it\u0026rsquo;s good at and what it\u0026rsquo;s not, it\u0026rsquo;s gonna be good at filling in all the places that are expensive and burdensome, and it\u0026rsquo;s always gonna rely fundamentally on the human artistic aspects of it.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eLabor economics research broadly confirms this. Oxford researchers \u003ca href=\"https://arxiv.org/abs/2412.19754\"\u003eMäkelä and Stephany\u003c/a\u003e analyzed 12 million U.S. job vacancies and found that complementary effects of AI are 1.7× larger than substitution effects. The World Economic Forum projects 170 million new jobs created by 2030 versus 92 million displaced, a net gain of 78 million. 
\u003ca href=\"https://www.nber.org/system/files/working_papers/w28257/revisions/w28257.rev1.pdf\"\u003eAcemoglu, Autor, Hazell, and Restrepo\u003c/a\u003e found that AI-exposed firms reduce hiring in non-AI positions, but concluded that:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003ethe aggregate impacts of AI-labor substitution on employment and wage growth\u0026hellip; is currently too small to be detectable.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003e\u003ca href=\"https://www.mckinsey.com.br/capabilities/tech-and-ai/our-insights/building-the-ai-muscle-of-your-business-leaders\"\u003eMcKinsey\u003c/a\u003e captures the strategic implication: \u0026ldquo;When you have built a bench of AI-capable domain owners, your company has a real competitive advantage. That\u0026rsquo;s because these leaders are hard to replicate.\u0026rdquo; Yet only 23% of organizations believe they are building sustainable AI advantages, while 79% report that competitors are making similar investments.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"ai-deskilling-is-a-trap\"\u003eAI deskilling is a trap\u003c/h2\u003e\n\u003cp\u003eIf a generation of junior analysts learns to use AI before developing independent judgment, they never build the pattern recognition that lets them spot when the model is wrong. If junior lawyers lean on AI for legal research before reading enough case law to develop intuition for what\u0026rsquo;s plausible, they can\u0026rsquo;t catch the hallucinations behind those 69-88% error rates. If aspiring filmmakers generate scenes with Seedance 2.0 instead of learning how cameras, bodies, and physics actually interact, they can\u0026rsquo;t identify the impossible backhand. \u003ca href=\"https://www.gartner.com/en/articles/ai-lock-in\"\u003eGartner predicts\u003c/a\u003e that by 2030, half of enterprises will face irreversible skill shortages in at least two critical job roles because of unchecked automation. 
This AI skill erosion creates a vicious cycle: fewer skilled workers, greater dependence on AI, higher costs to fill the gaps.\u003c/p\u003e\n\u003cp\u003eAcemoglu warns that technology \u0026ldquo;does not automatically benefit workers.\u0026rdquo; In 19th-century England, the benefits of mechanization spread only after decades of worker activism. The parallel risk with AI isn\u0026rsquo;t mass unemployment. It\u0026rsquo;s a hollowing out of the skill base that makes the centaur model function. You lose not the jobs but the expertise that makes the jobs valuable.\u003c/p\u003e\n\u003cp\u003eDavid Autor\u0026rsquo;s vision is more optimistic: AI could \u0026ldquo;extend the relevance, reach, and value of human expertise,\u0026rdquo; democratizing it rather than eliminating it. I want to believe that\u0026rsquo;s right. But it requires treating AI as a tool that amplifies existing expertise rather than a shortcut that replaces the need to develop it. The 43% improvement that below-average BCG consultants saw from using GPT-4 is real. The 19-percentage-point penalty when consultants blindly trusted AI outside its frontier is equally real. The difference between those two outcomes is judgment. And judgment comes from experience, not from a larger context window.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;m more confident in the centaur framework than in any specific prediction about timelines or magnitudes. The ninth-power scaling curve, the 53-point gap on Humanity\u0026rsquo;s Last Exam, the α=0.57 typicality bias in RLHF, the 69-88% hallucination rates in legal applications, and the 95% of \u003ca href=\"/posts/enterprise-ai-strategy-is-backwards/\"\u003eenterprises\u003c/a\u003e seeing no measurable P\u0026amp;L returns from AI investments all point in the same direction. The question of AI augmentation versus replacement has an empirical answer: AI is a tool that makes good practitioners better and bad practitioners worse. 
The \u003ca href=\"/posts/is-ai-really-eating-the-world/\"\u003eindustry narrative\u003c/a\u003e demands a story about replacement. The data tells a story about partnership, one where the human\u0026rsquo;s contribution is not a relic of an earlier era but the irreducible ingredient that makes the whole system work.\u003c/p\u003e\n\u003cp\u003eThe ability to spot the impossible backhand isn\u0026rsquo;t going away. If anything, it\u0026rsquo;s worth more every day.\u003c/p\u003e\n","summary":"AI converges to the mean by design. Ninth-power scaling costs and a 53-point gap on Humanity's Last Exam show domain expertise is appreciating, not declining.","image":"https://static.philippdubach.com/ograph/ograph-ai-scaling-walls1.jpg","date_published":"2026-02-17T00:00:00Z","date_modified":"2026-03-15T11:43:29+01:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Article","word_count":2077,"reading_time_minutes":10,"keywords":["AI quality ceiling domain expertise","human AI collaboration centaur model","Humanity's Last Exam AI benchmark 2026","AI hallucination rates legal medical","AI deskilling risk workforce","AI convergence to mean RLHF typicality","AI scaling laws diminishing returns","centaur model AI augmentation vs replacement","jagged technological frontier Harvard BCG","model collapse Shumailov Nature 2024","ninth power compute scaling AI","AI overconfidence calibration error","Ben Affleck AI Joe Rogan quality","domain expert AI competitive advantage","AI skill erosion professional knowledge","Harvard BCG GPT-4 consultant study","AI video generation quality limits","Sara Hooker scaling laws compute cost","AI blind delegation risk centaur","enterprise AI strategy domain knowledge","AI output quality limits structural","human AI partnership knowledge workers","AI capability frontier jagged","knowledge worker productivity AI","professional AI risk hallucination","does AI replace or augment 
experts","AI quality plateau ninth power curve","why AI converges to average output"],"section":"posts"}},{"id":"https://philippdubach.com/posts/europes-24-trillion-payment-breakup-is-really-a-bet-on-infrastructure-arbitrage/","url":"https://philippdubach.com/posts/europes-24-trillion-payment-breakup-is-really-a-bet-on-infrastructure-arbitrage/","title":"Europe's $24 Trillion Payment Breakup Is Really a Bet on Infrastructure Arbitrage","content_html":"\u003cbr\u003e\n\u003cp\u003eOn February 2, 2026, the European Payments Initiative signed a \u003ca href=\"https://epicompany.eu/media-insights/bancomat-bizum-epi-sibs-and-vipps-mobilepay-sign-mou-to-accelerate-the-rollout-of-sovereign-pan-european-payment-solutions\"\u003eMemorandum of Understanding\u003c/a\u003e with the Alliance EuroPA, a consortium linking Spain\u0026rsquo;s Bizum, Italy\u0026rsquo;s Bancomat, Portugal\u0026rsquo;s SIBS, and the Nordic Vipps MobilePay system. The deal connects 130 million users across 13 countries into a single interoperable payment network. Headlines framed it as Europe breaking up with Visa and Mastercard. The actual story is more interesting: Europe is attempting an infrastructure arbitrage that, if it works, could reprice how money moves across the continent.\u003c/p\u003e\n\u003cp\u003eThis is not primarily a sovereignty play, though that is how politicians sell it. It is an attempt to exploit a structural pricing inefficiency in European payments that Visa and Mastercard have maintained for decades and that the EU\u0026rsquo;s own regulation accidentally made harder to dislodge.\u003c/p\u003e\n\u003ch2 id=\"i-the-hidden-fee-structure\"\u003eI. The hidden fee structure\u003c/h2\u003e\n\u003cp\u003eThe EU\u0026rsquo;s 2015 \u003ca href=\"https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32015R0751\"\u003eInterchange Fee Regulation\u003c/a\u003e capped consumer debit interchange at 0.2% and credit at 0.3%. This was celebrated as a win for merchants. 
What happened next was predictable to anyone who has watched regulated industries: Visa and Mastercard shifted revenue to unregulated \u0026ldquo;scheme fees\u0026rdquo; for authorization, clearing, and settlement. According to \u003ca href=\"https://www.eurocommerce.eu/2025/06/ten-years-after-the-interchange-fee-regulation-we-need-new-action-to-tackle-new-wholesale-price-increases/\"\u003eEuroCommerce\u003c/a\u003e, scheme fees rose by a cumulative 33.9% between 2018 and 2022, averaging 7.6% annually. The European Commission\u0026rsquo;s own data shows scheme fees increased by €1.46 billion between 2016 and 2021. \u003ca href=\"https://ecommerce-europe.eu/news-item/the-interchange-fee-regulation-turns-10/\"\u003eEcommerce Europe found\u003c/a\u003e that the average net merchant service charge rose by more than 60%, from 0.27% to 0.44%, between 2018 and 2022, effectively neutralizing the entire regulatory benefit.\u003c/p\u003e\n\u003cp\u003eA card transaction through Visa or Mastercard can cost a European merchant up to 2% when all components are included. A SEPA Instant Credit Transfer, the rail that EPI\u0026rsquo;s Wero system uses, costs a fraction of that: interchange is near zero and only processing fees apply. In Germany, S-Payment has proposed Wero merchant pricing at 0.77% plus gateway charges. That spread, roughly 100 to 120 basis points on every transaction, is the arbitrage opportunity. Applied to the \u003ca href=\"https://coinlaw.io/global-payment-network-statistics/\"\u003e$4.7 trillion\u003c/a\u003e in combined Visa and Mastercard European volume, it amounts to roughly $47 to $56 billion a year in fees that could theoretically be disintermediated. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-fee-arbitrage-european-payments-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/fee-arbitrage-european-payments.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/fee-arbitrage-european-payments.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/fee-arbitrage-european-payments.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/fee-arbitrage-european-payments.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fee-arbitrage-european-payments.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/fee-arbitrage-european-payments.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/fee-arbitrage-european-payments.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/fee-arbitrage-european-payments.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fee-arbitrage-european-payments.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/fee-arbitrage-european-payments.png 1600w,\n             
         https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/fee-arbitrage-european-payments.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fee-arbitrage-european-payments.png\"\n           alt=\"Horizontal bar chart comparing total merchant cost per transaction across payment methods. Card networks: Visa and Mastercard up to 2.0 percent, PayPal up to 2.3 percent. Account-to-account rails: Wero at 0.77 percent, iDEAL at a flat 0.29 euros, India UPI at 0.0 percent. Dual callout showing the IFR backfired as scheme fees rose 33.9 percent between 2018 and 2022, and the structural A2A arbitrage of 100 to 120 basis points per transaction\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-fee-arbitrage-european-payments-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/fee-arbitrage-european-payments.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Horizontal bar chart comparing total merchant cost per transaction across payment methods. Card networks: Visa and Mastercard up to 2.0 percent, PayPal up to 2.3 percent. Account-to-account rails: Wero at 0.77 percent, iDEAL at a flat 0.29 euros, India UPI at 0.0 percent. 
Dual callout showing the IFR backfired as scheme fees rose 33.9 percent between 2018 and 2022, and the structural A2A arbitrage of 100 to 120 basis points per transaction\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eMost of the analysis I have read over the past few days focuses on whether Wero can beat Visa and Mastercard on user experience or brand recognition. But Wero does not need to win on UX. It needs to win on cost, and the cost advantage is structural because account-to-account payments simply skip an entire layer of intermediation. The question is whether that cost advantage is large enough to overcome the switching costs, and whether the political will exists to force adoption where market forces alone might not.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"ii-what-wero-actually-is-and-why-the-impact-of-the-europa-deal\"\u003eII. What Wero actually is and why the EuroPA deal matters\u003c/h2\u003e\n\u003cp\u003eWero is a digital wallet built on top of SEPA Instant Credit Transfer infrastructure. Users access it through their existing banking app. Payments move directly between bank accounts in under 10 seconds using a phone number, email, or QR code. No card, no card network, no intermediary skimming basis points off each transaction.\u003c/p\u003e\n\u003cp\u003eEPI launched Wero for peer-to-peer transfers in Germany on July 2, 2024, followed by France in September and Belgium in November of that year. E-commerce payments went live in Germany in November 2025, with merchants including Lidl, Decathlon, and Rossmann accepting it. 
Point-of-sale NFC tap payments are planned for 2026 to 2027.\u003c/p\u003e\n\u003cp\u003eThe 16 founding shareholders include \u003ca href=\"https://group.bnpparibas/en/news/bnp-paribas-partners-with-wero-for-e-commerce-payment-solutions\"\u003eBNP Paribas\u003c/a\u003e, Crédit Agricole, Société Générale, Deutsche Bank, the Sparkassen-Finanzgruppe (which alone committed €150 million), ABN AMRO, ING, Rabobank, and the pan-European acquirers Nexi and Worldline. Total committed capital sits at roughly €500 million. Membership has expanded to over 1,100 institutions, and \u003ca href=\"https://fintech.global/2025/12/08/n26-partners-with-epi-to-launch-wero-payment-option/\"\u003eboth Revolut and N26\u003c/a\u003e joined in 2025.\u003c/p\u003e\n\u003cp\u003eBefore the EuroPA deal, Wero was a Franco-German-Benelux payments app with roughly 47 million users and a geographic footprint that excluded most of southern and northern Europe. That is not a credible challenger to Visa and Mastercard. The EuroPA deal changes the math because it connects Wero with Bizum\u0026rsquo;s 30.6 million users in Spain, Bancomat\u0026rsquo;s dominant network in Italy, SIBS in Portugal, and Vipps MobilePay\u0026rsquo;s 12.5 million users across the Nordics. Crucially, it does this through a hub model rather than requiring each country to join EPI as a shareholder. This is a key architectural choice because the shareholder approach already failed once: in 2021 and 2022, \u003ca href=\"https://omdia.tech.informa.com/om022317/european-payments-initiative-project-pivots-after-20-banks-depart\"\u003eroughly 20 banks withdrew from EPI\u003c/a\u003e, including all Spanish institutions, over disagreements about governance and cost sharing. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-europa-network-scale-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/europa-network-scale.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/europa-network-scale.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/europa-network-scale.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/europa-network-scale.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/europa-network-scale.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/europa-network-scale.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/europa-network-scale.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/europa-network-scale.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/europa-network-scale.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/europa-network-scale.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/europa-network-scale.png 
2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/europa-network-scale.png\"\n           alt=\"Data table showing the EuroPA alliance network after the February 2 2026 MoU. Wero core with 47 million users across Germany France Belgium Netherlands Luxembourg. Bizum EuroPA hub with 30.6 million users in Spain and 111000 merchants. Bancomat hub with approximately 30 million users in Italy. Vipps MobilePay hub with 12.5 million users across Norway Denmark Finland Sweden. SIBS MB WAY hub with approximately 6 million users in Portugal. iDEAL acquired with approximately 30 million users in Netherlands transitioning to Wero by end 2027. Combined network of over 130 million users across 13 countries covering 72 percent of EU population. Stats strip showing 1100 plus participating institutions, 500 million euros committed capital, 16 founding bank shareholders\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-europa-network-scale-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/europa-network-scale.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Data table showing the EuroPA alliance network after the February 2 2026 MoU. Wero core with 47 million users across Germany France Belgium Netherlands Luxembourg. Bizum EuroPA hub with 30.6 million users in Spain and 111000 merchants. Bancomat hub with approximately 30 million users in Italy. Vipps MobilePay hub with 12.5 million users across Norway Denmark Finland Sweden. 
SIBS MB WAY hub with approximately 6 million users in Portugal. iDEAL acquired with approximately 30 million users in Netherlands transitioning to Wero by end 2027. Combined network of over 130 million users across 13 countries covering 72 percent of EU population. Stats strip showing 1100 plus participating institutions, 500 million euros committed capital, 16 founding bank shareholders\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eThe hub model lets national systems keep their local brands and governance while gaining cross-border interoperability. A Bizum user in Madrid will be able to pay a German merchant. An Italian Bancomat customer will be able to transfer money to someone in France. A network of 130 million users is not just a bigger number than 47 million; it is the difference between a niche product and one that forces merchant adoption.\u003c/p\u003e\n\u003cp\u003eEPI also acquired two established national payment systems outright. \u003ca href=\"https://ideal.nl/en/epi-successfully-completes-acquisition-of-ideal-and-payconiq-international\"\u003eiDEAL\u003c/a\u003e in the Netherlands processes 1.5 billion transactions annually and handles 72% of Dutch e-commerce. Payconiq/Bancontact dominates in Belgium and Luxembourg. Both acquisitions completed in October 2023. iDEAL will \u003ca href=\"https://epicompany.eu/media-insights/ideal-to-phase-into-wero\"\u003etransition to Wero branding by end of 2027\u003c/a\u003e. In France, the pre-existing Paylib service with 35 million users was directly replaced by Wero at launch. These are not greenfield user acquisition plays. They are migrating existing transaction volumes onto a unified pan-European rail.\u003c/p\u003e\n\u003ch2 id=\"iii-the-geopolitical-accelerant\"\u003eIII. The geopolitical accelerant\u003c/h2\u003e\n\u003cp\u003eThe economics alone might not have been enough to generate the political will for this kind of project. What changed was Russia. 
When Visa and Mastercard \u003ca href=\"https://www.americanbanker.com/news/how-visa-and-mastercards-ban-could-disrupt-russian-payments\"\u003esuspended operations in Russia\u003c/a\u003e in March 2022 following the invasion of Ukraine, they abandoned a market where they controlled approximately 72% of card payments. The intended target was Moscow. The unintended lesson was for Brussels: payment networks controlled by American corporations can be weaponized, and what gets deployed against Russia could theoretically be turned against Europe (see my earlier post on \u003ca href=\"https://philippdubach.com/posts/pozsars-bretton-woods-iii-three-years-later-2/2/\"\u003eBretton Woods III\u003c/a\u003e).\u003c/p\u003e\n\u003cp\u003eECB President Christine Lagarde has become the initiative\u0026rsquo;s most vocal political champion. In \u003ca href=\"https://www.irishtimes.com/business/2026/02/09/european-alternatives-to-visa-and-mastercard-urgently-needed-says-banking-chief/\"\u003eearly February 2026\u003c/a\u003e she told Irish radio that whether Europeans use a card or a phone, the transaction typically flows through Visa, Mastercard, PayPal, or Alipay, all of which originate from either the US or China. ECB Executive Board member \u003ca href=\"https://www.ecb.europa.eu/press/key/date/2025/html/ecb.sp250929~9a94367d26.en.html\"\u003ePiero Cipollone\u003c/a\u003e has been more direct, arguing that Europe\u0026rsquo;s dependence on non-European payment solutions puts it at the mercy of decisions made elsewhere. In March 2025, ECB Chief Economist Philip Lane warned that this dependence leaves Europe \u0026ldquo;open to coercion.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eTrump\u0026rsquo;s second term has sharpened these concerns considerably. 
EPI CEO Martina Weimert \u003ca href=\"https://www.irishtimes.com/business/2026/02/09/european-alternatives-to-visa-and-mastercard-urgently-needed-says-banking-chief/\"\u003etold the Financial Times\u003c/a\u003e that the problem with the digital euro is timing: it will arrive only in a few years, perhaps after Trump\u0026rsquo;s term ends, and Europe is somewhat short on time. Tariff threats, territorial claims over Greenland, and a pro-crypto, anti-CBDC US policy agenda have turned European payment sovereignty from a technocratic aspiration into something closer to a defense priority. And European defense spending is the one area where political consensus currently exists across virtually all member states.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThis matters for understanding why Wero might succeed where its predecessors failed. The Monnet Project collapsed in 2012 when the European Commission refused to support multilateral interchange fees. The original EPI card-scheme vision was abandoned after the bank withdrawals. The Nordic P27 initiative collapsed in 2023. Each failure happened in a geopolitical context where the urgency was abstract. The urgency is no longer abstract. When 70 economists, including Thomas Piketty, published an \u003ca href=\"https://eutoday.net/parliament-pivotal-decision-on-ecb-digital-euro/\"\u003eopen letter in January 2026\u003c/a\u003e calling the digital euro \u0026ldquo;the only defence\u0026rdquo; against dependence on US payment systems, it signaled a shift in the Overton window that would have been hard to imagine even two years ago.\u003c/p\u003e\n\u003ch2 id=\"iv-profitability\"\u003eIV. 
Profitability\u003c/h2\u003e\n\u003cp\u003eThe EU\u0026rsquo;s own interchange fee regulation, the one designed to protect merchants from Visa and Mastercard, has inadvertently created what I think is one of the largest barriers to entry for any new European payment network.\u003c/p\u003e\n\u003cp\u003eWhen interchange is capped at 0.2% for debit, the revenue pool available to fund a new network is tiny. Visa and Mastercard can sustain their European operations because they amortize costs across a $24 trillion global transaction base. A new European entrant has to build comparable infrastructure, convince hundreds of thousands of merchants to integrate, and acquire tens of millions of users, all while operating in a margin environment that was deliberately compressed by regulation. Weimert herself has estimated that building a viable full-scale alternative requires \u0026ldquo;several billion euros,\u0026rdquo; with private estimates cited by \u003ca href=\"https://fortune.com/2021/07/10/europe-digital-payments-network-epi-sepa-mastercard-visa/\"\u003eFortune\u003c/a\u003e ranging as high as €6 billion.\u003c/p\u003e\n\u003cp\u003eThis is what often happens with badly designed regulation. The regulation that was supposed to weaken the duopoly has actually strengthened its competitive moat by making the economics of entry worse. Visa and Mastercard responded to interchange caps by raising unregulated fees, so their total revenue per transaction barely changed. But a new entrant cannot charge comparable scheme fees without undermining the cost advantage that is its core selling point. The revenue has to come from somewhere else.\u003c/p\u003e\n\u003cp\u003eEPI\u0026rsquo;s answer is value-added services: buy-now-pay-later, digital identity, subscription management, loyalty programs. None of these exist yet. They are on the roadmap for 2027 and beyond. In the meantime, Wero operates as a cost center subsidized by its bank shareholders. 
The Sparkassen\u0026rsquo;s €150 million commitment is patient capital from a public savings bank group with a 200-year time horizon. BNP Paribas and Crédit Agricole can absorb the costs as a strategic investment. But the question of when, or whether, Wero becomes self-sustaining is genuinely open.\u003c/p\u003e\n\u003ch2 id=\"v-india-and-brazil-comparisons-are-both-more-and-less-instructive-than-they-appear\"\u003eV. India and Brazil comparisons are both more and less instructive than they appear\u003c/h2\u003e\n\u003cp\u003eEvery article about Wero mentions India\u0026rsquo;s UPI and Brazil\u0026rsquo;s Pix as proof of concept. The numbers are undeniably impressive. UPI processed \u003ca href=\"https://meetanshi.com/blog/upi-statistics/\"\u003e228.3 billion transactions worth approximately $3.6 trillion\u003c/a\u003e in 2025, up 29% year-over-year. The IMF \u003ca href=\"https://www.pib.gov.in/PressReleasePage.aspx?PRID=2200569\u0026amp;reg=3\u0026amp;lang=1\"\u003erecognized it in June 2025\u003c/a\u003e as the world\u0026rsquo;s largest retail fast-payment system. Brazil\u0026rsquo;s Pix reached \u003ca href=\"https://en.wikipedia.org/wiki/Pix_(payment_system)\"\u003e175 million users\u003c/a\u003e and processed 63.4 billion transactions worth $4.6 trillion in 2024, growing 53% year-over-year. Both systems achieved in a few years what Visa and Mastercard built over decades. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-global-a2a-payment-scale-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/global-a2a-payment-scale.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/global-a2a-payment-scale.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/global-a2a-payment-scale.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/global-a2a-payment-scale.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/global-a2a-payment-scale.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/global-a2a-payment-scale.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/global-a2a-payment-scale.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/global-a2a-payment-scale.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/global-a2a-payment-scale.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/global-a2a-payment-scale.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/global-a2a-payment-scale.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/global-a2a-payment-scale.png\"\n           alt=\"Horizontal bar chart comparing annual transaction volumes of sovereign account-to-account payment systems worldwide. Pix at 4.6 trillion dollars with 175 million users and 53 percent year over year growth. UPI at 3.6 trillion dollars with 491 million users and 29 percent year over year growth. Wero highlighted in red at less than 0.1 trillion dollars with 47 million users in its first year. MIR at approximately 1.4 trillion dollars estimated with 400 million plus cards and 66.7 percent domestic share. Callout noting Europe targets 4.7 trillion in annual Visa and Mastercard European volume but UPI and Pix had structural advantages Europe lacks including low card penetration and central bank mandates\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-global-a2a-payment-scale-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/global-a2a-payment-scale.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Horizontal bar chart comparing annual transaction volumes of sovereign account-to-account payment systems worldwide. Pix at 4.6 trillion dollars with 175 million users and 53 percent year over year growth. UPI at 3.6 trillion dollars with 491 million users and 29 percent year over year growth. 
Wero highlighted in red at less than 0.1 trillion dollars with 47 million users in its first year. MIR at approximately 1.4 trillion dollars estimated with 400 million plus cards and 66.7 percent domestic share. Callout noting Europe targets 4.7 trillion in annual Visa and Mastercard European volume but UPI and Pix had structural advantages Europe lacks including low card penetration and central bank mandates\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eBut the structural conditions that enabled UPI and Pix do not map cleanly onto Europe. India had a large unbanked population and low existing card penetration. UPI did not have to displace an entrenched incumbent so much as fill a vacuum. Pix launched via a central bank mandate requiring every large financial institution to participate, with zero-cost transfers for individuals. Both countries also had single regulatory jurisdictions and populations accustomed to mobile-first payments.\u003c/p\u003e\n\u003cp\u003eEurope has none of these conditions. Card penetration is high. Consumer habits are entrenched. The regulatory patchwork spans 27 member states plus associated countries, each with its own banking traditions and payment preferences. There is no single authority that can mandate participation the way Brazil\u0026rsquo;s central bank did.\u003c/p\u003e\n\u003cp\u003eWhat Europe does have, and this is the part most analysts underweight, is a functioning SEPA infrastructure that already connects every bank account in the eurozone. Wero does not need to build new rails. It needs to build a user interface and merchant acceptance layer on top of rails that already exist and that already process trillions of euros annually. The \u003ca href=\"https://www.europeanpaymentscouncil.eu/news-insights/insight/wero-shaping-future-european-payments\"\u003eSEPA Instant Credit Transfer regulation\u003c/a\u003e that became mandatory in 2025 means every eurozone bank must support real-time payments. 
Europe\u0026rsquo;s governments have already paid for the highway. Wero is building the on-ramps.\u003c/p\u003e\n\u003cp\u003eThe other underappreciated advantage is regulatory asymmetry. The EU\u0026rsquo;s July 2024 ruling forcing \u003ca href=\"https://www.macrumors.com/2024/07/11/apple-opens-iphone-nfc-access-eu/\"\u003eApple to open iPhone NFC access\u003c/a\u003e to third-party wallets means Wero can offer tap-to-pay on iPhones without going through Apple Pay. PSD3, expected in 2026, will likely further strengthen open banking requirements. The European Commission has \u003ca href=\"https://www.pymnts.com/news/regulation/2025/report-european-commission-looking-into-visa-and-mastercard-fees/\"\u003eactive investigations\u003c/a\u003e into Visa and Mastercard\u0026rsquo;s fee structures. In the UK, the \u003ca href=\"https://www.rte.ie/news/business/2025/0526/1514937-visa-mastercard-probe/\"\u003eCompetition Appeal Tribunal\u003c/a\u003e ruled unanimously in June 2025 that the networks\u0026rsquo; interchange fee structures breach competition law. Europe is building an alternative while simultaneously making the incumbent\u0026rsquo;s business model harder to sustain.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"vi-visa-and-mastercard\"\u003eVI. Visa and Mastercard\u003c/h2\u003e\n\u003cp\u003eNeither company has made extensive public statements about Wero, which is itself a strategy: do not elevate the challenger\u0026rsquo;s profile. When pressed, they emphasize the value they provide. Mastercard CEO Michael Miebach argued on an October 2025 earnings call that wherever cards are available in a competitive, level playing field, businesses and consumers opt for cards because of the protections they offer.\u003c/p\u003e\n\u003cp\u003eBut both companies are executing a quiet multi-rail pivot. 
Visa acquired European open banking leader \u003ca href=\"https://www.paymentsdive.com/news/visa-pay-by-bank-services-card-payments/717206/\"\u003eTink for approximately $2.2 billion\u003c/a\u003e in 2022, gaining the capability to offer the same account-to-account payment rails that Wero uses. Mastercard acquired cybersecurity firm \u003ca href=\"https://en.wikipedia.org/wiki/Mastercard\"\u003eRecorded Future for $2.65 billion\u003c/a\u003e in September 2024, expanding into value-added services. Both are positioning themselves as payment technology platforms rather than pure card networks.\u003c/p\u003e\n\u003cp\u003eThis is rational. If account-to-account payments do take share from card networks in Europe, Visa and Mastercard want to be the infrastructure layer that processes those payments too. They have the merchant relationships, the fraud detection capabilities, and the global acceptance network. The risk for Wero is that even if it succeeds in shifting transactions off card rails, the toll collectors simply move to the new road.\u003c/p\u003e\n\u003ch2 id=\"the-digital-euro\"\u003eThe digital euro\u003c/h2\u003e\n\u003cp\u003eRunning in parallel is the ECB\u0026rsquo;s digital euro project, a central bank digital currency that would serve as legal tender across the eurozone. The \u003ca href=\"https://www.consilium.europa.eu/en/press/press-releases/2025/12/19/single-currency-council-agrees-position-on-the-digital-euro-and-on-strengthening-the-role-of-cash/\"\u003eEU Council agreed its negotiating position\u003c/a\u003e in December 2025. A European Parliament vote is expected in the first half of 2026, with potential first issuance \u003ca href=\"https://finance.yahoo.com/news/ecb-says-digital-euro-ready-025009411.html\"\u003earound 2029\u003c/a\u003e. 
In October 2025, the ECB \u003ca href=\"https://www.ecb.europa.eu/euro/digital_euro/progress/html/ecb.deprp202510.en.html\"\u003ecompleted its preparation phase\u003c/a\u003e and declared the digital euro technically ready.\u003c/p\u003e\n\u003cp\u003eEPI positions Wero as complementary, handling private money while the digital euro handles public money. But the overlap in ambition is obvious, and it creates a coordination problem. Banks worry about deposit outflows and implementation costs estimated at \u003ca href=\"https://www.capco.com/intelligence/capco-intelligence/the-digital-euro-in-2025\"\u003e€4 to 5.8 billion\u003c/a\u003e. There is no guaranteed parliamentary majority for the legislation. And Trump\u0026rsquo;s anti-CBDC stance, including signing the GENIUS Act for stablecoins while banning federal CBDCs, creates a strange dynamic where Europe might pursue a digital euro partly as a response to American policy that explicitly rejects the concept.\u003c/p\u003e\n\u003cp\u003eMy read is that the digital euro and Wero are less complementary than they are competing bets on the same thesis: that Europe needs sovereign payment infrastructure. The digital euro is the maximalist version. Wero is the pragmatic one. If I had to bet, I would bet on the pragmatic version arriving first and capturing enough transaction volume to make the digital euro\u0026rsquo;s incremental value harder to justify politically. But both could fail. And both failing would leave Europe exactly where it started, which is the outcome Visa and Mastercard are quietly optimizing for.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"whats-next\"\u003eWhat\u0026rsquo;s next\u003c/h2\u003e\n\u003cp\u003eThe next 18 months are decisive. Cross-border P2P payments through the EuroPA hub launch in 2026. E-commerce expansion to France and Belgium follows in the second half of the year. Cross-border e-commerce and point-of-sale payments via the hub are targeted for 2027. 
iDEAL\u0026rsquo;s full migration to Wero should complete by end of 2027. \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-wero-execution-timeline-png-7\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/wero-execution-timeline.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/wero-execution-timeline.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/wero-execution-timeline.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/wero-execution-timeline.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/wero-execution-timeline.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/wero-execution-timeline.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/wero-execution-timeline.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/wero-execution-timeline.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/wero-execution-timeline.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/wero-execution-timeline.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/wero-execution-timeline.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/wero-execution-timeline.png\"\n           alt=\"Timeline chart showing the critical execution window from 2024 to 2029 across five parallel tracks. Wero Core track showing P2P launch completed in 2024, e-commerce Germany completed in 2025, e-commerce France Belgium and NFC pilot active in 2026, POS rollout and BNPL digital ID planned for 2027. EuroPA Hub track showing MoU signed February 2 and cross-border P2P active in 2026, cross-border e-commerce and POS planned for 2027. iDEAL migration track showing co-branding started in 2025, dual-brand phase in 2026, full Wero migration planned end 2027. Regulation track showing Apple NFC forced open July 2024, SEPA Instant mandatory 2025, PSD3 expected and EP digital euro vote uncertain in 2026, potential digital euro issuance uncertain in 2029. Visa Mastercard response track showing acquisitions in 2025, European hub expansion 2026, multi-rail platform pivot 2027. Dual callout on the pragmatic bet versus the wildcard\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-wero-execution-timeline-png-7\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/wero-execution-timeline.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Timeline chart showing the critical execution window from 2024 to 2029 across five parallel tracks. 
Wero Core track showing P2P launch completed in 2024, e-commerce Germany completed in 2025, e-commerce France Belgium and NFC pilot active in 2026, POS rollout and BNPL digital ID planned for 2027. EuroPA Hub track showing MoU signed February 2 and cross-border P2P active in 2026, cross-border e-commerce and POS planned for 2027. iDEAL migration track showing co-branding started in 2025, dual-brand phase in 2026, full Wero migration planned end 2027. Regulation track showing Apple NFC forced open July 2024, SEPA Instant mandatory 2025, PSD3 expected and EP digital euro vote uncertain in 2026, potential digital euro issuance uncertain in 2029. Visa Mastercard response track showing acquisitions in 2025, European hub expansion 2026, multi-rail platform pivot 2027. Dual callout on the pragmatic bet versus the wildcard\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eThe biggest risk is not technology or regulation. It is consumer inertia. Wero has 47 million users, but Mastercard alone has over 900 million branded cards in EU circulation. Credit cards offer credit facilities, rewards programs, and purchase protection that Wero currently cannot match. German adoption has been \u003ca href=\"https://insights.flagshipadvisorypartners.com/wero-the-european-challenger-digital-wallet\"\u003enotably sluggish\u003c/a\u003e: despite being the first launch country, Germany accounts for only 5% of Wero\u0026rsquo;s transaction volume, with France dominating thanks to its Paylib migration. Dutch merchants have \u003ca href=\"https://blog.onlinepaymentplatform.com/en/weros-false-promise-higher-costs-and-more-risks\"\u003epushed back\u003c/a\u003e on the shift from iDEAL\u0026rsquo;s flat €0.29 per transaction to Wero\u0026rsquo;s percentage-based model.\u003c/p\u003e\n\u003cp\u003eI think the outcome depends on whether European policymakers treat this as a market initiative or a strategic infrastructure project. 
If it is the former, the cost advantages may not be enough to overcome switching costs and consumer habit. If it is the latter, and if governments are willing to subsidize adoption the way they subsidize defense procurement, then the math works. The 130-million-user network created by the EuroPA deal gives Wero something no previous European payment initiative has achieved: a user base large enough to force merchant adoption through sheer volume. Whether that is enough depends on a political question, not a technical one.\u003c/p\u003e\n\u003cp\u003eThe $24 trillion figure in the headline refers to Visa and Mastercard\u0026rsquo;s combined global transaction volume. Europe\u0026rsquo;s share is roughly $4.7 trillion. Even capturing 10% of that would be a major rewiring of European payment infrastructure. The infrastructure arbitrage is real. The spread between card network fees and SEPA Instant costs is measurable and persistent. The question is execution, and execution in Europe is always the question.\u003c/p\u003e\n\u003caside class=\"disclaimer\" role=\"note\" aria-label=\"Disclaimer\"\u003e\n  \u003cdiv class=\"disclaimer-content\"\u003e\u003cp\u003e\u003cstrong\u003eDisclaimer:\u003c/strong\u003e All opinions expressed are my own. This is not investment, financial, tax, or legal advice. Past performance does not indicate future results. Do your own research and consult qualified professionals before making financial decisions. No liability accepted for any losses.\u003c/p\u003e\u003c/div\u003e\n\u003c/aside\u003e\n\n","summary":"The EuroPA alliance connected 130 million users across 13 countries overnight. But this isn't really about sovereignty. 
It's an infrastructure arbitrage exploiting a 100-120bps spread between card network fees and SEPA Instant rails, accidentally protected by the EU's own regulation.","image":"https://static.philippdubach.com/ograph/ograph-european-payments-arbitrage3.jpg","date_published":"2026-02-16T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Macro"],"_philippdubach":{"type":"Article","word_count":2789,"reading_time_minutes":14,"keywords":["European Payments Initiative EPI","Wero payment system","Visa Mastercard European market","SEPA Instant Credit Transfer","payment sovereignty Europe","account-to-account payments A2A","EuroPA alliance Bizum Bancomat","interchange fee regulation EU","digital euro ECB","India UPI Brazil Pix comparison","European payment infrastructure","scheme fees Visa Mastercard"],"section":"posts"}},{"id":"https://philippdubach.com/posts/long-volatility-premium/","url":"https://philippdubach.com/posts/long-volatility-premium/","title":"Long Volatility Premium","content_html":"\u003cblockquote\u003e\n\u003cp\u003eThe real value of tail hedging is not in the hedge itself. It\u0026rsquo;s in what the hedge enables.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eIn \u003ca href=\"/posts/the-variance-tax/\"\u003eThe Variance Tax\u003c/a\u003e I wrote about the ½σ² formula: compound returns approximately equal arithmetic returns minus half the variance, and because the penalty is quadratic, large drawdowns destroy wealth in ways that are hard to recover from. A portfolio that falls 50% needs a 100% gain just to break even. That piece was about the problem. This one is about a potential solution, and about whether paying for crash protection can actually improve total returns rather than drag them down.\u003c/p\u003e\n\u003cp\u003eThere is a chart circulating in quantitative finance circles that should not exist. 
It shows a strategy that buys put options on the S\u0026amp;P 500 and, when layered on top of a stock portfolio, \u003cem\u003eimproves\u003c/em\u003e total returns while simultaneously reducing volatility and maximum drawdown. The chart comes from Patrick Causley at One River Asset Management in a paper called \u003ca href=\"https://one-river.nyc3.cdn.digitaloceanspaces.com/alternatives-white-papers/October2025/OR%20-%20Heretical%20Thinking%20-%20The%20Long%20Volatility%20Premium%20-%20Oct%2025%20-%20Web.pdf\"\u003e\u0026ldquo;Heretical Thinking: The Long Volatility Premium\u0026rdquo;\u003c/a\u003e and it makes a specific claim: that long volatility, properly constructed, is not a cost center but a compensated factor that deserves to sit alongside value, momentum, and trend in institutional portfolios.\u003c/p\u003e\n\u003cp\u003eThe conventional wisdom says buying puts is a losing game. The dominant empirical finding is that a \u003ca href=\"https://www.cboe.com/insights/posts/white-paper-shows-volatility-risk-premium-facilitated-higher-risk-adjusted-returns-for-put-index/\"\u003evolatility risk premium\u003c/a\u003e (VRP) exists: from 1990 to 2018, the average VIX level was 19.3% while average realized S\u0026amp;P 500 volatility was just 15.1%, a persistent gap of 4.2 percentage points. Options are, on average, overpriced relative to the volatility that subsequently materializes. The \u003ca href=\"https://en.wikipedia.org/wiki/CBOE_S\u0026amp;P_500_PutWrite_Index\"\u003eCBOE S\u0026amp;P 500 PutWrite Index\u003c/a\u003e, which systematically sells S\u0026amp;P 500 puts against cash collateral, rose 1,835% from 1986 to 2018. The CBOE 5% Put Protection Index, which holds the S\u0026amp;P 500 and buys monthly 5% out-of-the-money puts as a hedge, rose only 708%. 
As \u003ca href=\"https://cdn.cboe.com/resources/education/research_publications/PutWriteCBOE19_v14_by_Prof_Oleg_Bondarenko_as_of_June_14.pdf\"\u003eBondarenko (2019)\u003c/a\u003e documented, the PUT Index achieved \u003cstrong\u003e9.54%\u003c/strong\u003e annualized versus \u003cstrong\u003e9.80%\u003c/strong\u003e for the S\u0026amp;P 500 but with far lower volatility (9.95% vs. 14.93%), yielding a Sharpe ratio of 0.65 versus 0.33 for put buyers.\u003c/p\u003e\n\u003cp\u003eSo selling options earns money. Buying them bleeds money. That is the consensus. This article is about why that framing, while technically correct, misses something important. \u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-chart2-volatility-risk-premium-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/chart2-volatility-risk-premium.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/chart2-volatility-risk-premium.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/chart2-volatility-risk-premium.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/chart2-volatility-risk-premium.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart2-volatility-risk-premium.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/chart2-volatility-risk-premium.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/chart2-volatility-risk-premium.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/chart2-volatility-risk-premium.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart2-volatility-risk-premium.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/chart2-volatility-risk-premium.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/chart2-volatility-risk-premium.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart2-volatility-risk-premium.png\"\n           alt=\"Dual panel chart showing VIX implied volatility consistently trading above realized S\u0026amp;P 500 volatility from 1990 to 2024, with the VRP spread averaging \u0026#43;4.2 percentage points. 
Bottom panel shows annual bar chart of the spread with 2008 and 2020 as notable inversions\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-chart2-volatility-risk-premium-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/chart2-volatility-risk-premium.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Dual panel chart showing VIX implied volatility consistently trading above realized S\u0026amp;P 500 volatility from 1990 to 2024, with the VRP spread averaging \u0026#43;4.2 percentage points. Bottom panel shows annual bar chart of the spread with 2008 and 2020 as notable inversions\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"i-separating-beta-from-convexity\"\u003eI. Separating beta from convexity\u003c/h2\u003e\n\u003cp\u003eThe key insight from the One River paper is that the raw return of a put option bundles two partially independent components, and treating them as one has led to a category error in how most allocators think about tail hedging.\u003c/p\u003e\n\u003cp\u003eWhen you buy a put, your P\u0026amp;L is driven by delta (directional exposure to the underlying), gamma (the acceleration of that exposure as the market moves), and vega (sensitivity to implied volatility). The problem with naively holding puts is that delta embeds a massive short-beta position. Since the equity risk premium is one of the most reliable premia in financial markets, you are fighting a powerful headwind. 
Your puts bleed value every day the market does not crash, and that bleed overwhelms the occasional windfall when it does.\u003c/p\u003e\n\u003cp\u003eCausley\u0026rsquo;s move is straightforward. Neutralize the short beta by adding enough long equity exposure to offset the embedded delta. What remains is a beta-neutral \u0026ldquo;long volatility factor\u0026rdquo; that isolates gamma and vega. Stack this on top of an equity program and the historical results over approximately 40 years are striking: the beta-1 portfolio with long volatility outperformed a portfolio without it while producing lower volatility and a shallower maximum drawdown. \u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-chart1-growth-of-dollar-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/chart1-growth-of-dollar.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/chart1-growth-of-dollar.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/chart1-growth-of-dollar.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/chart1-growth-of-dollar.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart1-growth-of-dollar.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/chart1-growth-of-dollar.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/chart1-growth-of-dollar.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/chart1-growth-of-dollar.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart1-growth-of-dollar.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/chart1-growth-of-dollar.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/chart1-growth-of-dollar.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart1-growth-of-dollar.png\"\n           alt=\"Growth of one dollar chart from 1986 to 2024 on logarithmic scale showing three lines: Beta-1 Long Volatility plus S\u0026amp;P 500 outperforming both the S\u0026amp;P 500 alone and the PPUT index, with event markers for the GFC and COVID crashes\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-chart1-growth-of-dollar-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/chart1-growth-of-dollar.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Growth of one dollar chart from 1986 to 2024 on logarithmic scale showing three lines: Beta-1 Long Volatility plus S\u0026amp;P 500 
outperforming both the S\u0026amp;P 500 alone and the PPUT index, with event markers for the GFC and COVID crashes\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n The persistence of this phenomenon, even with a simplistic implementation using monthly 5% OTM puts from the CBOE\u0026rsquo;s PPUT index, is what makes the paper interesting rather than dismissible. A more sophisticated execution (better strike selection, dynamic sizing, multi-tenor rolls) would likely improve results further. But the baseline already makes the case.\u003c/p\u003e\n\u003ch2 id=\"ii-why-would-a-long-volatility-premium-exist\"\u003eII. Why would a long volatility premium exist?\u003c/h2\u003e\n\u003cp\u003eIf markets are efficient, a beta-adjusted long volatility position should not deliver a positive premium. Three mechanisms suggest why it might.\u003c/p\u003e\n\u003cp\u003eThe first is the rebalancing premium. When you hold negatively correlated assets and rebalance systematically, you extract what the literature calls a \u0026ldquo;rebalancing bonus\u0026rdquo;: the rebalanced portfolio\u0026rsquo;s geometric return exceeds the weighted average of the individual assets\u0026rsquo; geometric returns. \u003ca href=\"https://www.tandfonline.com/doi/full/10.1080/10293523.2025.2553254\"\u003eRecent work in the Investment Analysts Journal\u003c/a\u003e formalizes this for tail hedging specifically. A long volatility position that delivers explosive gains during crashes and modest losses during calm markets, rebalanced against equities, creates a structural tailwind. You systematically sell the hedge at high prices after crashes and buy it back cheaply during calm periods, monetizing mean reversion.\u003c/p\u003e\n\u003cp\u003eThe second is that the negative correlation between stocks and volatility strengthens sharply during crashes. When equities sell off hard, implied volatility does not merely rise in proportion; it spikes. The hedge\u0026rsquo;s payoff is largest precisely when the portfolio most needs it. 
This convexity, once beta-adjusted, can more than compensate for the ongoing cost of the position.\u003c/p\u003e\n\u003cp\u003eThe third is a supply-demand imbalance. Institutional investors are structurally short volatility in numerous ways: through equity ownership itself, through structured products with embedded short option positions, and through strategies that implicitly sell insurance (risk parity, short vol ETFs, pension de-risking). Meanwhile, the pool of capital willing to hold long volatility persistently is limited by behavioral challenges. As Jody Deio of Aearon Risk Advisors \u003ca href=\"https://alphaarchitect.com/the-long-volatility-premium-short-the-market-get-paid/\"\u003eexplains\u003c/a\u003e: \u0026ldquo;People don\u0026rsquo;t have the patience to wear these exposures for any long period of time. You\u0026rsquo;re happy being basically a wasting asset. And a wasting asset is nothing that any investment committee or client meeting wants to deal with.\u0026rdquo; This behavioral gap between the demand for protection and the willingness to keep holding it through long quiet stretches may create a structural premium for those who can withstand the psychology.\u003c/p\u003e\n\u003cp\u003eAll three mechanisms connect back to the variance tax. The ½σ² drag on compound returns means that reducing drawdown severity has a nonlinear effect on terminal wealth. By truncating left-tail outcomes, even a costly hedge can increase compound wealth through the compounding channel alone. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-chart4-volatility-tax-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/chart4-volatility-tax.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/chart4-volatility-tax.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/chart4-volatility-tax.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/chart4-volatility-tax.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart4-volatility-tax.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/chart4-volatility-tax.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/chart4-volatility-tax.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/chart4-volatility-tax.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart4-volatility-tax.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/chart4-volatility-tax.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/chart4-volatility-tax.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart4-volatility-tax.png\"\n           alt=\"Exponential recovery curve showing that a 50 percent drawdown requires 100 percent to recover, with severity zones marked as moderate, severe, and catastrophic, illustrating Spitznagel\u0026#39;s volatility tax thesis\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-chart4-volatility-tax-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/chart4-volatility-tax.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Exponential recovery curve showing that a 50 percent drawdown requires 100 percent to recover, with severity zones marked as moderate, severe, and catastrophic, illustrating Spitznagel\u0026#39;s volatility tax thesis\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"iii-what-the-research-actually-says\"\u003eIII. What the research actually says\u003c/h2\u003e\n\u003cp\u003eThe claim that long volatility is a compensated factor runs against a large body of literature documenting the short volatility premium. But the two are not necessarily contradictory. 
They operate at different levels of analysis.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://indices.cib.barclays/dms/Public%20marketing/Volatility_Risk_Premium.pdf\"\u003eResearch from Barclays\u003c/a\u003e found that while the VRP has positive equity market beta, it also earns alpha beyond that beta exposure. A linear regression of the VRP against S\u0026amp;P 500 returns found a significant positive intercept of roughly 3.48 volatility points, independent of equity market direction. This suggests that both buying and selling volatility can capture distinct premia depending on how the trade is structured and what exposures are isolated.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://am.gs.com/en-dk/advisors/insights/article/2026/finding-true-value-tail-risk-hedging\"\u003eGoldman Sachs Asset Management\u0026rsquo;s 2025 analysis\u003c/a\u003e has the clearest framing I have seen. Their key finding: even an idealized, 99% reliable tail-risk hedging strategy provides a standalone annual return boost of only about \u003cstrong\u003e0.8 basis points\u003c/strong\u003e. Trivial. But that is not the point. The real value comes from what the hedge enables. Because tail-risk hedges reduce the impact of severe drops, they allow a portfolio to take on more equity risk, that is, to increase beta. The gains from this \u0026ldquo;risk budget reallocation\u0026rdquo; can be substantial, especially for institutional investors with fixed drawdown constraints. In Goldman\u0026rsquo;s framing, tail-risk hedging is not a standalone return generator. It is an offensive weapon that enables more aggressive positioning in core assets. This resembles how Formula 1 teams think about pit stops: each stop costs time, but fresh tires allow faster laps, resulting in a faster overall race. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-chart6-portfolio-construction-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/chart6-portfolio-construction.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/chart6-portfolio-construction.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/chart6-portfolio-construction.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/chart6-portfolio-construction.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart6-portfolio-construction.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/chart6-portfolio-construction.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/chart6-portfolio-construction.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/chart6-portfolio-construction.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart6-portfolio-construction.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/chart6-portfolio-construction.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/chart6-portfolio-construction.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart6-portfolio-construction.png\"\n           alt=\"Grid table showing five portfolio constructions with inline metric bars comparing CAGR, volatility, max drawdown, and Sharpe ratio. The 97 percent equity plus 3 percent tail hedge portfolio achieves 12.3 percent CAGR, beating the S\u0026amp;P 500\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-chart6-portfolio-construction-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/chart6-portfolio-construction.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Grid table showing five portfolio constructions with inline metric bars comparing CAGR, volatility, max drawdown, and Sharpe ratio. The 97 percent equity plus 3 percent tail hedge portfolio achieves 12.3 percent CAGR, beating the S\u0026amp;P 500\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eThe numbers from Universa\u0026rsquo;s live track record add color. During Q1 2020, as the COVID pandemic triggered a 34% crash in the S\u0026amp;P 500, \u003ca href=\"https://finance.yahoo.com/news/mark-spitznagel-univesa-cio-on-risk-mitigation-204157461.html\"\u003eUniversa delivered a 4,144% return\u003c/a\u003e. 
But Spitznagel himself downplays these headline figures, noting that \u0026ldquo;any punter can devise a trade that does well in a crash. The key is how you do in a crash relative to the rest of the time.\u0026rdquo; \u003ca href=\"https://en.wikipedia.org/wiki/Universa_Investments\"\u003eThe Wall Street Journal reported\u003c/a\u003e that a strategy consisting of just a \u003cstrong\u003e3.3% allocation\u003c/strong\u003e to Universa with the rest in the S\u0026amp;P 500 had a compound annual return of \u003cstrong\u003e12.3%\u003c/strong\u003e over 10 years through February 2018, beating the S\u0026amp;P 500 itself. That a 3.3% tail position improved total portfolio returns over a decade is counterintuitive, but it follows directly from the variance tax arithmetic.\u003c/p\u003e\n\u003ch2 id=\"iv-puts-vs-trend-the-tortoise-and-the-hare\"\u003eIV. Puts vs. trend: the tortoise and the hare\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://www.aqr.com/Insights/Research/White-Papers/Tail-Risk-Hedging-Contrasting-Put-and-Trend-Strategies\"\u003eAQR\u0026rsquo;s research on tail hedging\u003c/a\u003e, published in the Journal of Systematic Investing, complicates the picture in a way I find genuinely useful for portfolio construction. It compares two fundamental approaches: buying out-of-the-money puts and multi-asset trend-following.\u003c/p\u003e\n\u003cp\u003ePuts act as the hare. They deliver spectacular returns in sudden crashes like COVID, when put-buying strategies returned more than \u003cstrong\u003e42%\u003c/strong\u003e in a single month. But they are expensive to maintain, and their long-term expected return is negative. Trend-following acts as the tortoise. It \u003ca href=\"https://www.aqr.com/Insights/Research/Alternative-Thinking/Tail-Hedging-Strategies\"\u003ecannot provide the same reliable downside protection as index puts\u003c/a\u003e, but it has delivered surprisingly consistent safe-haven performance when most needed while earning positive long-run returns. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-chart5-tortoise-vs-hare-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/chart5-tortoise-vs-hare.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/chart5-tortoise-vs-hare.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/chart5-tortoise-vs-hare.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/chart5-tortoise-vs-hare.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart5-tortoise-vs-hare.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/chart5-tortoise-vs-hare.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/chart5-tortoise-vs-hare.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/chart5-tortoise-vs-hare.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart5-tortoise-vs-hare.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/chart5-tortoise-vs-hare.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/chart5-tortoise-vs-hare.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart5-tortoise-vs-hare.png\"\n           alt=\"Grouped horizontal bars comparing put hedging versus trend following returns across six major crises from the dot-com bust to COVID, showing puts dominate short crashes like COVID at plus 42 percent while trend following wins protracted drawdowns like the dot-com bust at plus 42 percent over 31 months\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-chart5-tortoise-vs-hare-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/chart5-tortoise-vs-hare.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Grouped horizontal bars comparing put hedging versus trend following returns across six major crises from the dot-com bust to COVID, showing puts dominate short crashes like COVID at plus 42 percent while trend following wins protracted drawdowns like the dot-com bust at plus 42 percent over 31 months\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.wealthmanagement.com/equities/hedging-tail-risks-the-tortoise-versus-the-hare\"\u003eAQR\u0026rsquo;s follow-up paper\u003c/a\u003e examined the five largest 60/40 drawdowns since 2000 and found that options-based strategies outperformed in shorter drawdowns while trend-following posted its most impressive returns 
during protracted bear markets. Since longer drawdowns are arguably more damaging to long-term wealth (they impair compounding for extended periods, which brings us back to the variance tax), AQR leans toward trend-following as the more practical hedge for most investors.\u003c/p\u003e\n\u003cp\u003eBut the strategies are genuinely complementary. \u003ca href=\"https://www.tandfonline.com/doi/full/10.1080/10293523.2025.2553254\"\u003eRecent academic work\u003c/a\u003e combining both approaches via a portable alpha framework found statistically significant alpha of 0.25% per month after controlling for traditional equity factors, with the strongest outperformance during periods of market turmoil. Puts for the fast crash, trend for the slow bleed.\u003c/p\u003e\n\u003ch2 id=\"where-most-tail-hedges-fail-and-the-benchmark-problem\"\u003eWhere most tail hedges fail, and the benchmark problem\u003c/h2\u003e\n\u003cp\u003eHere is where most investors get burned. A \u003ca href=\"https://www.caia.org/sites/default/files/2013-aiar-q1-comparison.pdf\"\u003eCAIA Association paper\u003c/a\u003e compared multiple tail-risk strategies against a deliberately boring benchmark: holding cash. Compared to an S\u0026amp;P 500-only portfolio, cash cut portfolio tail risk by 80% and portfolio standard deviation by 81%, with an information ratio of 0.67. Several popular tail-risk strategies, particularly those involving short-dated VIX futures and 1-month variance swaps, actually failed to beat this cash benchmark, with performance drags of 355 and 203 basis points respectively. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-chart7-strategy-efficiency-png-7\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/chart7-strategy-efficiency.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/chart7-strategy-efficiency.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/chart7-strategy-efficiency.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/chart7-strategy-efficiency.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart7-strategy-efficiency.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/chart7-strategy-efficiency.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/chart7-strategy-efficiency.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/chart7-strategy-efficiency.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart7-strategy-efficiency.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/chart7-strategy-efficiency.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/chart7-strategy-efficiency.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/chart7-strategy-efficiency.png\"\n           alt=\"Scatter quadrant chart plotting annual cost in basis points versus crisis return for six tail-risk strategies. Trend following sits in the ideal quadrant with low cost and high crisis return. VIX futures and variance swaps fall in the expensive quadrant, underperforming even a simple cash allocation\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-chart7-strategy-efficiency-png-7\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/chart7-strategy-efficiency.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Scatter quadrant chart plotting annual cost in basis points versus crisis return for six tail-risk strategies. Trend following sits in the ideal quadrant with low cost and high crisis return. VIX futures and variance swaps fall in the expensive quadrant, underperforming even a simple cash allocation\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eThis is the finding that should make any allocator uncomfortable. If your sophisticated tail hedge cannot beat holding Treasury bills, you are paying for complexity that destroys value. 
The specific implementation matters a great deal, and many \u0026ldquo;obvious\u0026rdquo; approaches (VIX futures being the most popular) are structurally flawed: the VIX futures curve usually sits in contango, so rolling long positions steadily erodes returns during calm periods.\u003c/p\u003e\n\u003cp\u003eA \u003ca href=\"https://onlinelibrary.wiley.com/doi/full/10.1002/fut.22602\"\u003e2025 paper in the Journal of Futures Markets\u003c/a\u003e adds a related finding: naïve hedging strategies outperformed more complex models for tail-risk hedging, consistent with earlier findings on variance-minimizing hedges. The explanation lies in model risk. Sophisticated approaches require more assumptions about market dynamics, and when those assumptions are wrong (as they inevitably are during the very tail events you are trying to hedge), the resulting model misspecification can leave hedging portfolios with higher-than-expected risk. This is a familiar problem in ML: the more parameters you fit to in-sample data, the worse your out-of-sample performance tends to be when the regime changes. Tail events are precisely when regimes change.\u003c/p\u003e\n\u003cp\u003eThere is also a benchmark problem that poisons the conversation around tail hedging. A portfolio of stocks plus put options gets compared against a portfolio of just stocks. When the market rises steadily for years, the hedged portfolio naturally underperforms and the hedge looks like a waste of money. This comparison is intellectually dishonest. A portfolio with puts has less risk than a portfolio without them. Comparing them as equivalent is like comparing a levered equity portfolio to an unlevered one and concluding that leverage \u0026ldquo;works\u0026rdquo; because it outperformed during a bull market. The appropriate comparison for a hedged portfolio is against a portfolio with similar risk, whether achieved through lower equity allocation, higher cash balances, or other risk-reducing measures. 
When \u003ca href=\"https://www.tandfonline.com/doi/full/10.1080/10293523.2025.2553254\"\u003eBhansali and Davis (2010)\u003c/a\u003e conducted this more appropriate comparison, they found that offensive tail hedging (using the freed-up risk budget to increase equity exposure) resulted in superior risk-adjusted performance. The hedge was not a drag. It was an enabler.\u003c/p\u003e\n\u003ch2 id=\"what-i-take-away\"\u003eWhat I take away\u003c/h2\u003e\n\u003cp\u003eMost of the interesting questions in finance are not about individual positions but about what positions enable. Tail hedging is boring in isolation. The return comes from what it does to the rest of the portfolio: the willingness to stay invested during drawdowns, the capacity to hold concentrated positions, and the ability to rebalance into cheap assets after crashes rather than capitulating. Spitznagel and Goldman agree on this even if they agree on little else.\u003c/p\u003e\n\u003cp\u003eThe optimal tail hedge allocation is a psychological question, not a mathematical one. Most practitioners suggest 1% to 5% of portfolio value, sized to offset a meaningful portion of equity losses during a severe 30% to 50% drawdown. But the right number is the one that allows you to stay invested in your core portfolio through the worst of times without abandoning the strategy. If you cannot stomach three years of negative carry on a put overlay, the correct allocation for you is zero, not five percent.\u003c/p\u003e\n\u003cp\u003eThe framing I find most useful is Goldman\u0026rsquo;s. Do not evaluate the hedge in isolation. Evaluate what it enables. A 3% tail hedge allocation that reduces max drawdown from 50% to 25% frees up enough risk budget to increase equity exposure by 10 to 15 percentage points. The incremental return from that higher equity allocation over a full market cycle will, in most scenarios, more than compensate for the cost of the hedge. 
The hedge is the enabler, not the alpha.\u003c/p\u003e\n\u003cp\u003eWhether you implement this with puts, trend-following, or both depends on your time horizon and what kind of drawdown keeps you up at night. Fast crashes favor puts. Slow bleeds favor trend. If you do not know which one is coming (you do not), blend them.\u003c/p\u003e\n\u003cp\u003eThe caveats are real. All backtests benefit from hindsight. Transaction costs and bid-ask spreads in options markets are material and not fully captured in the CBOE indices used as benchmarks. The behavioral challenge of holding a position that bleeds money most of the time is hard to overstate, especially for allocators who report to investment committees that look at monthly returns.\u003c/p\u003e\n\u003cp\u003eBut the turkey metaphor from One River\u0026rsquo;s presentation is apt. Picture a statistician turkey who, right up until Thanksgiving, can prove with perfect p-values that the farmer is benevolent. The turkey\u0026rsquo;s model is flawless within the distribution of observed data. The problem is that the data does not contain the event that matters most. Tail hedging is the strategy of the paranoid turkey. The empirical evidence suggests this paranoia can be not just protective but profitable, provided you implement it with discipline and use it not as a way to avoid risk but as a foundation for taking more of it.\u003c/p\u003e\n\u003caside class=\"disclaimer\" role=\"note\" aria-label=\"Disclaimer\"\u003e\n  \u003cdiv class=\"disclaimer-content\"\u003e\u003cp\u003e\u003cstrong\u003eDisclaimer:\u003c/strong\u003e All opinions expressed are my own. This is not investment, financial, tax, or legal advice. Past performance does not indicate future results. Do your own research and consult qualified professionals before making financial decisions. 
No liability accepted for any losses.\u003c/p\u003e\u003c/div\u003e\n\u003c/aside\u003e\n\n","summary":"One River's data shows beta-adjusted long volatility outperformed the S\u0026P 500 over 40 years. Goldman, AQR, and Universa agree on the mechanism but disagree on implementation. A synthesis of the evidence.","image":"https://static.philippdubach.com/ograph/ograph-long-volatility-premium3.jpg","date_published":"2026-02-14T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Quantitative Finance"],"_philippdubach":{"type":"Article","word_count":2501,"reading_time_minutes":12,"keywords":["long volatility premium","beta-adjusted tail hedging","volatility risk premium","tail risk hedging portfolio construction","variance tax compounding","put options convexity gamma","trend following vs put hedging","Universa Investments Spitznagel","rebalancing premium volatility","Goldman Sachs tail risk","AQR tortoise hare hedging","CAIA tail risk comparison"],"section":"posts"}},{"id":"https://philippdubach.com/posts/the-saaspocalypse-paradox/","url":"https://philippdubach.com/posts/the-saaspocalypse-paradox/","title":"The SaaSpocalypse Paradox","content_html":"\u003cblockquote\u003e\n\u003cp\u003eThe market is simultaneously pricing AI capex failure and AI destroying all software. 
Both cannot be true.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-jpm-murphy-note-spread-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/jpm-murphy-note-spread.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/jpm-murphy-note-spread.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/jpm-murphy-note-spread.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/jpm-murphy-note-spread.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/jpm-murphy-note-spread.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/jpm-murphy-note-spread.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/jpm-murphy-note-spread.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/jpm-murphy-note-spread.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/jpm-murphy-note-spread.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/jpm-murphy-note-spread.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/jpm-murphy-note-spread.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/jpm-murphy-note-spread.png\"\n           alt=\"JP Morgan research note on the February 2026 software sell-off by Mark R Murphy titled Software Collapse Broadens with Nowhere to Hide, questioning the leap from Claude Cowork Plugins to full enterprise software disruption\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-jpm-murphy-note-spread-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/jpm-murphy-note-spread.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"JP Morgan research note on the February 2026 software sell-off by Mark R Murphy titled Software Collapse Broadens with Nowhere to Hide, questioning the leap from Claude Cowork Plugins to full enterprise software disruption\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eAnthropic released \u003ca href=\"https://github.com/anthropics/knowledge-work-plugins\"\u003e11 open-source plugins\u003c/a\u003e for Claude Cowork on January 30. Apache-2.0 licensed, file-based, running in a macOS-only research preview. 
Within a week, the IGV software ETF had fallen \u003cstrong\u003e32%\u003c/strong\u003e from its September peak to a 52-week low of $79.65, roughly $2 trillion in market cap had evaporated, and hedge funds had made \u003ca href=\"https://www.bnnbloomberg.ca/business/2026/02/04/us-software-stocks-hit-by-anthropic-wake-up-call-on-ai-disruption/\"\u003e$24 billion\u003c/a\u003e shorting the sector. The sector\u0026rsquo;s RSI hit 18, the most oversold reading \u003ca href=\"https://articles.stockcharts.com/article/the-claude-crash-how-ai-triggered-a-historic-selloff-in-software-stocks/\"\u003esince 1990\u003c/a\u003e. JP Morgan titled its note \u0026ldquo;\u003ca href=\"https://privatebank.jpmorgan.com/nam/en/insights/markets-and-investing/tmt/software-shock-ais-broken-logic\"\u003eSoftware Collapse Broadens with Nowhere to Hide\u003c/a\u003e.\u0026rdquo; Jefferies coined the term SaaSpocalypse. It was the worst software stock crash since the dot-com bust.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://fortune.com/2026/02/04/why-saas-stocks-tech-selloff-freefall-like-deepseek-2025-overblown-paradox-irrational/\"\u003eBank of America\u0026rsquo;s Vivek Arya\u003c/a\u003e identified the paradox at the center of this: investors are punishing hyperscaler stocks because AI capex might generate weak returns, while at the same time dumping software stocks on the belief that AI adoption will be so pervasive it renders all existing software obsolete. The two narratives cannot both hold. If AI tools aren\u0026rsquo;t generating meaningful ROI, they\u0026rsquo;re not replacing enterprise software at scale. If they are replacing enterprise software at scale, the hyperscalers are earning extraordinary returns on their infrastructure investment. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-saaspocalypse-paradox-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/saaspocalypse-paradox.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/saaspocalypse-paradox.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/saaspocalypse-paradox.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/saaspocalypse-paradox.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/saaspocalypse-paradox.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/saaspocalypse-paradox.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/saaspocalypse-paradox.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/saaspocalypse-paradox.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/saaspocalypse-paradox.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/saaspocalypse-paradox.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/saaspocalypse-paradox.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/saaspocalypse-paradox.png\"\n           alt=\"The BofA AI paradox in the 2026 SaaSpocalypse showing two mutually exclusive narratives: AI capex generating weak returns with $670B spend and 4 percent coverage ratio, versus AI destroying all software with 32 percent IGV drawdown and $2 trillion lost despite 17 percent sector earnings growth\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-saaspocalypse-paradox-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/saaspocalypse-paradox.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"The BofA AI paradox in the 2026 SaaSpocalypse showing two mutually exclusive narratives: AI capex generating weak returns with $670B spend and 4 percent coverage ratio, versus AI destroying all software with 32 percent IGV drawdown and $2 trillion lost despite 17 percent sector earnings growth\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eThis paradox can only resolve in one of three ways: AI adoption is real and hyperscaler capex is justified, AI adoption stalls and software incumbents are fine, or the truth is somewhere in between and the market has mispriced both sides. The first two are internally consistent. 
The market is pricing neither.\u003c/p\u003e\n\u003ch2 id=\"the-bear-case-for-enterprise-software\"\u003eThe bear case for enterprise software\u003c/h2\u003e\n\u003cp\u003eThe structural argument against enterprise software is serious and worth stating on its own terms.\u003c/p\u003e\n\u003cp\u003eEnterprise software monetizes through per-seat licensing. The SaaS business model depends on a stable correlation between headcount and license count. AI agents break that correlation. If 10 agents do the work of 100 people, the software itself doesn\u0026rsquo;t get replaced; the headcount that justifies the seats does, and CRM seat revenue drops with it. \u003ca href=\"https://www.tekedia.com/ai-could-destroy-500b-in-enterprise-software-revenue/\"\u003eAlixPartners estimates\u003c/a\u003e up to \u003cstrong\u003e$500 billion\u003c/strong\u003e in enterprise software revenue could be at risk over time. \u003ca href=\"https://www.idc.com/resource-center/blog/is-saas-dead-rethinking-the-future-of-software-in-the-age-of-ai/\"\u003eIDC predicts\u003c/a\u003e pure seat-based pricing will be obsolete by 2028.\u003c/p\u003e\n\u003cp\u003eThe moat question is equally uncomfortable. Enterprise software\u0026rsquo;s traditional defense was the trained-user-interface moat: the years of institutional muscle memory that makes switching costs prohibitive. Databricks CEO Ali Ghodsi \u003ca href=\"https://techcrunch.com/2026/02/09/databricks-ceo-says-saas-isnt-dead-but-ai-will-soon-make-it-irrelevant/\"\u003etold TechCrunch\u003c/a\u003e that this moat collapses when the interface becomes natural language. If the value of Salesforce or ServiceNow lived in their UI rather than their data, and the UI can now be replicated by a general-purpose model, then the moat was shallower than anyone thought. 
VC has \u003ca href=\"https://www.calcalistech.com/ctechnews/article/hjlvyl7lze\"\u003efled traditional SaaS entirely\u003c/a\u003e; as one investor noted, \u0026ldquo;an entrepreneur approaching a VC fund today with a SaaS startup won\u0026rsquo;t even reach the pitch stage.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThe build-versus-buy equation is inverting in real time. \u003ca href=\"https://www.klarna.com/international/press/klarna-ai-assistant-handles-two-thirds-of-customer-service-chats-in-its-first-month/\"\u003eKlarna\u003c/a\u003e ditched Salesforce and Workday, consolidated onto its own AI-augmented stack, and used an OpenAI-powered assistant to handle a workload equivalent to that of 700 full-time customer service agents. \u003ca href=\"https://www.saastr.com/the-2026-saas-crash-its-not-what-you-think/\"\u003eSaaStr\u0026rsquo;s analysis\u003c/a\u003e of Gartner\u0026rsquo;s \u003ca href=\"https://www.gartner.com/en/newsroom/press-releases/2026-02-03-gartner-forecasts-worldwide-it-spending-to-grow-10-point-8-percent-in-2026-totaling-6-point-15-trillion-dollars\"\u003e$1.43 trillion\u003c/a\u003e 2026 software spending forecast finds that roughly 9 percentage points of the 14.7% headline growth comes from price increases on existing software, not net new demand. AI is eating SaaS budgets, redirecting IT spend toward infrastructure while reducing the headcount that generates software seats.\u003c/p\u003e\n\u003cp\u003eThis is the case priced into the IGV at $80.\u003c/p\u003e\n\u003ch2 id=\"the-bull-case-for-software-stocks\"\u003eThe bull case for software stocks\u003c/h2\u003e\n\u003cp\u003eThe structural argument for enterprise software rests on a distinction the current sell-off is ignoring entirely.\u003c/p\u003e\n\u003cp\u003eThe bear case assumes a shrinking TAM. 
\u003ca href=\"https://www.goldmansachs.com/insights/articles/ai-agents-to-boost-productivity-and-size-of-software-market\"\u003eGoldman Sachs Research\u003c/a\u003e argues the opposite: the application software market will grow to $780 billion by 2030 at a 13% CAGR, with agents accounting for over 60% of the total. The profit pool shifts from SaaS seats to agentic workloads, but the overall market gets larger, not smaller. \u003ca href=\"https://a16z.com/ai-will-supercharge-modelbusters/\"\u003ea16z\u0026rsquo;s Alex Rampell\u003c/a\u003e takes it further: if AI enables software not just to enhance productivity but to actually complete work, the addressable market isn\u0026rsquo;t roughly $350 billion in enterprise software spend (about 1% of GDP). It\u0026rsquo;s the \u003cstrong\u003e~$6 trillion\u003c/strong\u003e white-collar services market (~20% of GDP). That is roughly a 17x expansion in dollar terms, or 20x as a share of GDP, into work that was never software-addressable before.\u003c/p\u003e\n\u003cp\u003eDavid Friedberg made the sharpest version of this argument on the All-In Podcast: software transitions from helping people do work, to completing work, to doing work humans cannot do. 
At that point, the SaaS pricing model transitions from per-seat to value-based, and \u0026ldquo;SaaS basically takes over the services economy.\u0026rdquo; His estimate: the combined market cap of software companies could be 4x to 10x higher in five years, but \u0026ldquo;not evenly distributed.\u0026rdquo; \u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-tam-expansion-bull-case-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/tam-expansion-bull-case.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/tam-expansion-bull-case.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/tam-expansion-bull-case.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/tam-expansion-bull-case.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/tam-expansion-bull-case.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/tam-expansion-bull-case.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/tam-expansion-bull-case.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/tam-expansion-bull-case.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/tam-expansion-bull-case.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/tam-expansion-bull-case.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/tam-expansion-bull-case.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/tam-expansion-bull-case.png\"\n           alt=\"TAM expansion analysis from $350B enterprise software at 1 percent of GDP to Goldman Sachs $780B projection by 2030 with over 60 percent AI agent share, to the a16z thesis of $6 trillion in white-collar services at 20 percent of GDP, a 20x expansion\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-tam-expansion-bull-case-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/tam-expansion-bull-case.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"TAM expansion analysis from $350B enterprise software at 1 percent of GDP to Goldman Sachs $780B projection by 2030 with over 60 percent AI agent share, to the a16z thesis of $6 trillion in white-collar services at 20 percent of GDP, a 20x expansion\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eThe software vs semiconductor valuation picture strengthens this framing. 
The sector is delivering 17% aggregate earnings growth in 2026 while trading at November 2022 EV/Sales multiples, back when the Fed was aggressively hiking into recession fears. The Russell 1000 Software subsector now trades at 32.4x forward earnings versus 43.6x for semiconductors. Recurring-revenue businesses with 90%+ gross margins and 95%+ renewal rates trade at a lower multiple than cyclical chipmakers with 40-60% margins and concentrated customer bases. \u003ca href=\"https://www.cnbc.com/2026/02/10/jpmorgan-says-the-historic-software-selloff-has-gone-far-enough-10-stocks-to-buy-on-sale.html\"\u003eHistorically that\u0026rsquo;s an inversion\u003c/a\u003e that has not persisted. \u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-earnings-vs-stock-disconnect-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/earnings-vs-stock-disconnect.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/earnings-vs-stock-disconnect.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/earnings-vs-stock-disconnect.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/earnings-vs-stock-disconnect.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/earnings-vs-stock-disconnect.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/earnings-vs-stock-disconnect.png 768w,\n                
      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/earnings-vs-stock-disconnect.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/earnings-vs-stock-disconnect.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/earnings-vs-stock-disconnect.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/earnings-vs-stock-disconnect.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/earnings-vs-stock-disconnect.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/earnings-vs-stock-disconnect.png\"\n           alt=\"Q4 2025 earnings vs stock performance disconnect in the 2026 software sell-off: Palantir plus 70.5 percent revenue growth but minus 11.6 percent stock, ServiceNow plus 21 percent but minus 28 percent, Oracle plus 10 percent but minus 53 percent from peak, sector aggregate plus 17 percent earnings growth versus minus 32 percent IGV drawdown\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-earnings-vs-stock-disconnect-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/earnings-vs-stock-disconnect.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg 
alt=\"Q4 2025 earnings vs stock performance disconnect in the 2026 software sell-off: Palantir plus 70.5 percent revenue growth but minus 11.6 percent stock, ServiceNow plus 21 percent but minus 28 percent, Oracle plus 10 percent but minus 53 percent from peak, sector aggregate plus 17 percent earnings growth versus minus 32 percent IGV drawdown\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-valuation-inversion-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/valuation-inversion.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/valuation-inversion.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/valuation-inversion.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/valuation-inversion.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/valuation-inversion.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/valuation-inversion.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/valuation-inversion.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/valuation-inversion.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n   
           srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/valuation-inversion.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/valuation-inversion.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/valuation-inversion.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/valuation-inversion.png\"\n           alt=\"Software vs semiconductor valuation inversion in 2026: Russell 1000 Software at 32.4x forward PE trades below Russell 1000 Semiconductors at 43.6x, an 11.2x multiple gap, with IGV at $79.65 and S\u0026amp;P 500 software weight compressed from 12 percent to 8.4 percent\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-valuation-inversion-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/valuation-inversion.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Software vs semiconductor valuation inversion in 2026: Russell 1000 Software at 32.4x forward PE trades below Russell 1000 Semiconductors at 43.6x, an 11.2x multiple gap, with IGV at $79.65 and S\u0026amp;P 500 software weight compressed from 12 percent to 8.4 percent\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThis is the case that BofA called a paradox and JP Morgan called a mispricing.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 
id=\"the-hyperscaler-ai-capex-question-that-connects-both-sides\"\u003eThe hyperscaler AI capex question that connects both sides\u003c/h2\u003e\n\u003cp\u003eThere is a number that both cases have to account for, and it\u0026rsquo;s the one that determines which side of the paradox resolves first.\u003c/p\u003e\n\u003cp\u003eCombined 2026 capex guidance from Microsoft, Alphabet, Amazon, Meta, and Oracle now approaches \u003ca href=\"https://www.cnbc.com/2026/02/06/google-microsoft-meta-amazon-ai-cash.html\"\u003e\u003cstrong\u003e$700 billion\u003c/strong\u003e\u003c/a\u003e, more than doubling from $256 billion in 2024. \u003ca href=\"https://fortune.com/2026/02/04/why-saas-stocks-tech-selloff-freefall-like-deepseek-2025-overblown-paradox-irrational/\"\u003eBank of America calculates\u003c/a\u003e this consumes 94% of operating cash flows after capital returns. The Big Five raised $108 billion in bonds in 2025. AI-related services generate roughly $25 billion in direct revenue against $400+ billion in annual infrastructure spending, a coverage ratio of about 4%. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-hyperscaler-capex-vs-cashflow-png-7\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/hyperscaler-capex-vs-cashflow.png\"\n           alt=\"FY2026 hyperscaler AI capex vs cash flow: MSFT META GOOGL AMZN and ORCL estimated cash from operations less dividends and buybacks versus guided capital expenditure, with only Microsoft generating a $5B surplus while Meta shows minus $23B, Google minus $20B, Amazon minus $18B, and Oracle minus $30B\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-hyperscaler-capex-vs-cashflow-png-7\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/hyperscaler-capex-vs-cashflow.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"FY2026 hyperscaler AI capex vs cash flow: MSFT META GOOGL AMZN and ORCL estimated cash from operations less dividends and buybacks versus guided capital expenditure, with only Microsoft generating a $5B surplus while Meta shows minus $23B, Google minus $20B, Amazon minus $18B, and Oracle minus $30B\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eIf the bear case is right and AI agents are replacing enterprise software at scale, this capex should already be generating enormous returns. It isn\u0026rsquo;t. 
If the bull case is right and AI is expanding the TAM into the services economy, this capex is early-stage infrastructure investment that will compound over a decade. In that reading, $700 billion in annual spend is the foundation of a $6 trillion market, not a write-off. Both interpretations require the same capex figure to mean something fundamentally different. The market hasn\u0026rsquo;t decided which reading is right.\u003c/p\u003e\n\u003cp\u003eMicrosoft is the sharpest illustration of this tension. Its quarterly capex went from $1 billion in early 2015 to a record \u003ca href=\"https://fintool.com/news/microsoft-q2-record-capex-cloud-ai\"\u003e$37.5 billion in Q2 FY2026\u003c/a\u003e, with roughly two-thirds going to short-lived GPU/CPU assets. And yet Microsoft is the \u003ca href=\"https://www.gurufocus.com/news/8591224/microsoft-msft-maintains-resilient-cash-flow-amid-hyperscaler-spending-surge\"\u003eonly hyperscaler\u003c/a\u003e that can fund this buildout from operating cash flow. Azure grew \u003ca href=\"https://futurumgroup.com/insights/microsoft-q2-fy-2026-cloud-surpasses-50b-azure-up-38-cc/\"\u003e39% in Q2 FY2026\u003c/a\u003e, and Microsoft Cloud revenue crossed $50 billion in a single quarter for the first time. The company is simultaneously one of the biggest AI capex spenders, the one best positioned to generate returns on that spend, and the company whose products (Microsoft 365, Dynamics, Azure) are supposedly being disrupted by Claude plugins. The market is punishing it on all three counts at once. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-msft-quarterly-capex-png-8\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/msft-quarterly-capex.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/msft-quarterly-capex.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/msft-quarterly-capex.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/msft-quarterly-capex.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/msft-quarterly-capex.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/msft-quarterly-capex.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/msft-quarterly-capex.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/msft-quarterly-capex.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/msft-quarterly-capex.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/msft-quarterly-capex.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/msft-quarterly-capex.png 
2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/msft-quarterly-capex.png\"\n           alt=\"Microsoft quarterly AI capex from FY2015 to FY2026 showing growth from $1 billion to $37.5 billion per quarter, a 2048 percent increase, with recent quarters showing AI infrastructure acceleration\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-msft-quarterly-capex-png-8\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/msft-quarterly-capex.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Microsoft quarterly AI capex from FY2015 to FY2026 showing growth from $1 billion to $37.5 billion per quarter, a 2048 percent increase, with recent quarters showing AI infrastructure acceleration\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"bifurcation-not-extinction-the-saaspocalypse-resolved\"\u003eBifurcation, not extinction: the SaaSpocalypse resolved\u003c/h2\u003e\n\u003cp\u003eA \u003ca href=\"https://am.jpmorgan.com/content/dam/jpm-am-aem/global/en/insights/eye-on-the-market/smothering-heights-amv.pdf\"\u003e60% recession probability\u003c/a\u003e, a \u003ca href=\"https://www.cnbc.com/2026/02/02/fridays-jobs-report-will-be-delayed-because-of-the-partial-government-shutdown.html\"\u003epartial government shutdown\u003c/a\u003e, \u003ca href=\"https://www.salesforceben.com/what-do-trumps-tariffs-mean-for-the-tech-sector/\"\u003eelevated tariffs\u003c/a\u003e, and a structural pricing transition 
are being sold as a single story. They aren\u0026rsquo;t. Separating the macro from the structural requires asking which software categories are genuinely at risk and which are being sold by association.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.janushenderson.com/en-us/investor/article/how-ai-disruption-is-reshaping-the-software-sector-landscape/\"\u003eJanus Henderson makes a useful distinction\u003c/a\u003e between \u0026ldquo;systems of record\u0026rdquo; and \u0026ldquo;systems of engagement.\u0026rdquo; Systems of record are deeply embedded in business processes, require regulatory compliance, and carry enormous switching costs: ERP, core finance, cybersecurity, observability. \u003ca href=\"https://pitchbook.com/news/articles/is-ais-threat-to-software-overblown-pitchbook-analysis\"\u003ePitchBook described\u003c/a\u003e replacing one as \u0026ldquo;effectively open-heart surgery for an enterprise.\u0026rdquo; Systems of engagement are user-facing workflow tools where the interface is the product: content creation, tier-1 support, basic analytics. When the interface becomes natural language, that moat collapses. 
\u003cfigure class=\"post-figure\" style=\"width: 90%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-software-bifurcation-map-png-9\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/software-bifurcation-map.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/software-bifurcation-map.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/software-bifurcation-map.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/software-bifurcation-map.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/software-bifurcation-map.png 1200w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/software-bifurcation-map.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/software-bifurcation-map.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/software-bifurcation-map.png 1440w\"\n              sizes=\"90vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/software-bifurcation-map.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/software-bifurcation-map.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/software-bifurcation-map.png 2000w\"\n              sizes=\"90vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/software-bifurcation-map.png\"\n           alt=\"Software bifurcation map by AI disruption risk: ERP cybersecurity and observability at low risk, core CRM and dev tools at medium risk, content creation tier-1 support and basic analytics at high risk, showing the market is pricing every category as if it faces equal threat\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-software-bifurcation-map-png-9\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/software-bifurcation-map.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Software bifurcation map by AI disruption risk: ERP cybersecurity and observability at low risk, core CRM and dev tools at medium risk, content creation tier-1 support and basic analytics at high risk, showing the market is pricing every category as if it faces equal threat\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eThe bear case is correct about the second category. The bull case is correct about the first. The market is wrong to price them identically. Selling both at the same multiple compression implies that switching costs, regulatory requirements, data gravity, and enterprise procurement cycles have all vanished simultaneously. 
\u003ca href=\"https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027\"\u003eGartner predicts\u003c/a\u003e over 40% of agentic AI projects will be cancelled by 2027. Salesforce\u0026rsquo;s Agentforce reached \u003ca href=\"https://www.salesforceben.com/salesforce-avoids-q3-danger-zone-with-explosive-agentforce-momentum/\"\u003e18,500 customers\u003c/a\u003e in its first year, the fastest-adopted organic product in company history. These are not the behaviors of a category that has been disrupted. They are the behaviors of incumbents absorbing a new paradigm.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eStated precisely: the bear case is a zero-sum repricing where AI agents compress existing software revenue by eliminating seats and commoditizing interfaces. The bull case is a positive-sum expansion where the surviving software companies capture the $6 trillion in white-collar services that was never software-addressable before. The cost of intelligence has fallen \u003ca href=\"https://a16z.com/ai-will-supercharge-modelbusters/\"\u003e99.7% in two years\u003c/a\u003e (Stanford AI Index). Cumulative AI infrastructure investment is expected to exceed $3 trillion by 2030. That kind of capital deployment doesn\u0026rsquo;t produce a world where software shrinks. It produces a world where the definition of \u0026ldquo;software\u0026rdquo; expands to include most of the services economy.\u003c/p\u003e\n\u003cp\u003eI wrote \u003ca href=\"https://philippdubach.com/posts/the-market-can-stay-irrational-longer-than-you-can-stay-solvent/\"\u003erecently\u003c/a\u003e about how passive flows create mechanical, price-insensitive selling that overwhelms fundamental buyers. This software sell-off is a textbook case. 
JP Morgan\u0026rsquo;s Murphy \u003ca href=\"https://privatebank.jpmorgan.com/nam/en/insights/markets-and-investing/tmt/software-shock-ais-broken-logic\"\u003edescribed\u003c/a\u003e index arbitrage basket selling, programmatic de-grossing, and passive-flow liquidity vacuums. The IGV recorded its \u003ca href=\"https://articles.stockcharts.com/article/the-claude-crash-how-ai-triggered-a-historic-selloff-in-software-stocks/\"\u003ehighest single-day trading volume\u003c/a\u003e in 25 years. \u003ca href=\"https://www.cnbc.com/2026/02/10/jpmorgan-says-the-historic-software-selloff-has-gone-far-enough-10-stocks-to-buy-on-sale.html\"\u003eJP Morgan\u0026rsquo;s follow-up\u003c/a\u003e argued the sell-off has gone far enough. \u003ca href=\"https://fortune.com/2026/02/04/why-saas-stocks-tech-selloff-freefall-like-deepseek-2025-overblown-paradox-irrational/\"\u003eBofA called it\u003c/a\u003e a paradox that \u0026ldquo;doesn\u0026rsquo;t make any sense.\u0026rdquo; History suggests that extremes like these (the 2016 LinkedIn panic, the 2022 rate-shock drawdown, the January 2025 DeepSeek crash) tend to mark inflection points rather than the start of further declines.\u003c/p\u003e\n\u003cp\u003eThe hardest trade right now is the one that requires distinguishing between stocks that are cheap because they\u0026rsquo;re broken and stocks that are cheap because the market is broken. With the IGV at $80 and its RSI at a 30-year extreme, the SaaSpocalypse is pricing in an extinction event that operating results don\u0026rsquo;t remotely support. That looks a lot more like the latter.\u003c/p\u003e\n\u003caside class=\"disclaimer\" role=\"note\" aria-label=\"Disclaimer\"\u003e\n  \u003cdiv class=\"disclaimer-content\"\u003e\u003cp\u003e\u003cstrong\u003eDisclaimer:\u003c/strong\u003e All opinions expressed are my own. This is not investment, financial, tax, or legal advice. Past performance does not indicate future results. 
Do your own research and consult qualified professionals before making financial decisions. No liability accepted for any losses.\u003c/p\u003e\u003c/div\u003e\n\u003c/aside\u003e\n\n","summary":"AI capex failure and AI replacing all software are mutually exclusive. Why the 2026 SaaSpocalypse is a $2 trillion pricing error, not an extinction event.","image":"https://static.philippdubach.com/ograph/ograph-saaspocalypse-paradox3.jpg","date_published":"2026-02-13T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI","Investing"],"_philippdubach":{"type":"Article","word_count":1696,"reading_time_minutes":8,"keywords":["SaaSpocalypse 2026","software sell-off 2026","IGV ETF crash 2026","AI disruption enterprise software","software vs semiconductor valuation 2026","SaaS crash 2026","software bear market 2026","BofA AI paradox","hyperscaler capex sustainability 2026","SaaS per-seat pricing AI agents","software valuation inversion semiconductors","Anthropic Claude Cowork Plugins sell-off","systems of record vs systems of engagement","software bifurcation AI risk","value-based pricing AI software","AI capex ROI 2026","SaaS pricing model transition","enterprise AI adoption 2026","Goldman Sachs AI agent TAM 2030","a16z white collar services AI expansion"],"section":"posts"}},{"id":"https://philippdubach.com/posts/dont-go-monolithic-the-agent-stack-is-stratifying/","url":"https://philippdubach.com/posts/dont-go-monolithic-the-agent-stack-is-stratifying/","title":"Don't Go Monolithic; The Agent Stack Is Stratifying","content_html":"\u003cblockquote\u003e\n\u003cp\u003eThe defensible asset in enterprise AI is not the model. It\u0026rsquo;s the organizational world model.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eEvery major compute era decomposes into specialized layers with different winners at each level. Cloud split into IaaS, PaaS, and SaaS. 
The modern data stack split into ingestion, warehousing, transformation, and BI. Each time, specialists beat generalists because the layers have fundamentally different economics: different rates of change, different capital requirements, different sources of lock-in.\u003c/p\u003e\n\u003cp\u003eThe enterprise AI agent stack is doing the same thing right now. Arvind Jain, the CEO of Glean, recently published a \u003ca href=\"https://x.com/arvind2/status/2020920652950339694\"\u003estructural analysis\u003c/a\u003e of the emerging enterprise agent architecture that crystallized something I\u0026rsquo;d been thinking about. His framing describes a stack decomposing into six layers (security, context, models, orchestration, agents, and interfaces) with different defensibility profiles at each level. Glean sits in the context layer, so the usual caveat about a vendor talking its own book applies, but the structural argument is sound regardless of who makes it.\u003c/p\u003e\n\u003cp\u003eI want to take it further. There are three claims embedded in this agentic AI architecture that I think are underappreciated, and together they form a thesis about where durable advantage actually accrues in enterprise AI. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-emerging-agent-stack-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/emerging-agent-stack.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/emerging-agent-stack.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/emerging-agent-stack.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/emerging-agent-stack.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/emerging-agent-stack.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/emerging-agent-stack.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/emerging-agent-stack.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/emerging-agent-stack.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/emerging-agent-stack.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/emerging-agent-stack.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/emerging-agent-stack.png 
2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/emerging-agent-stack.png\"\n           alt=\"Enterprise AI agent stack diagram showing six layers ranked by defensibility: Context scores highest (hardest to rebuild), followed by Orchestration and Security, while Models and Interfaces have the lowest switching costs\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-emerging-agent-stack-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/emerging-agent-stack.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Enterprise AI agent stack diagram showing six layers ranked by defensibility: Context scores highest (hardest to rebuild), followed by Orchestration and Security, while Models and Interfaces have the lowest switching costs\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch2 id=\"i-models-are-converging-toward-shared-infrastructure\"\u003eI. Models are converging toward shared infrastructure\u003c/h2\u003e\n\u003cp\u003eThe model layer is the one most people obsess over, and it\u0026rsquo;s also the one converging fastest toward commodity economics. 
Frontier training costs have been \u003ca href=\"https://epoch.ai/blog/how-much-does-it-cost-to-train-frontier-ai-models\"\u003egrowing roughly 2.4x per year\u003c/a\u003e. Current frontier runs cost hundreds of millions of dollars, and Anthropic\u0026rsquo;s Dario Amodei has said that \u003ca href=\"https://www.tomshardware.com/tech-industry/artificial-intelligence/ai-models-that-cost-dollar1-billion-to-train-are-in-development-dollar100-billion-models-coming-soon-largest-current-models-take-only-dollar100-million-to-train-anthropic-ceo\"\u003ebillion-dollar training runs are already underway\u003c/a\u003e. Only a handful of organizations on Earth can operate at this scale: OpenAI, Google DeepMind, Anthropic, Meta, and a few others including xAI and Mistral. This is textbook capital-intensive infrastructure, structurally similar to semiconductor fabs or cloud hyperscalers. The logical conclusion: foundation models become shared utilities, not enterprise moats.\u003c/p\u003e\n\u003cp\u003eThe industry has already internalized this. \u003ca href=\"https://a16z.com/ai-enterprise-2025/\"\u003e37% of enterprises now use five or more models in production\u003c/a\u003e, up from 29% the prior year. Different tasks demand different models: Claude for code and tool use, GPT for extended reasoning, Gemini Flash for low-latency routing, specialized models for image generation and embeddings. Betting your enterprise stack on a single model provider is the new version of single-cloud risk. 
Open standards like Anthropic\u0026rsquo;s \u003ca href=\"https://www.anthropic.com/news/model-context-protocol\"\u003eModel Context Protocol\u003c/a\u003e (now \u003ca href=\"https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation\"\u003ehosted by the Linux Foundation\u003c/a\u003e, with 97 million monthly SDK downloads) and Google\u0026rsquo;s \u003ca href=\"https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/\"\u003eAgent2Agent (A2A) protocol\u003c/a\u003e are making this multi-model enterprise AI architecture practical.\u003c/p\u003e\n\u003cp\u003eIf models are infrastructure, the differentiation question moves up the stack. And that\u0026rsquo;s where it gets interesting.\u003c/p\u003e\n\u003ch2 id=\"ii-the-enterprise-ai-context-layer-has-two-depths-and-most-people-only-see-the-first\"\u003eII. The enterprise AI context layer has two depths, and most people only see the first\u003c/h2\u003e\n\u003cp\u003eThis is the part of the thesis I find most intellectually compelling, and where I think the conventional understanding falls short.\u003c/p\u003e\n\u003cp\u003eMost enterprise AI efforts operate at what I\u0026rsquo;d call Layer 1 context: connecting data sources, indexing content, enforcing permissions, retrieving relevant documents. This is the RAG-era problem set: familiar, well-understood, and increasingly commoditized. Virtually every enterprise AI platform offers connectors, vector stores, and retrieval pipelines. It matters, but it\u0026rsquo;s not where defensibility lives.\u003c/p\u003e\n\u003cp\u003eLayer 2 is where the thesis gets genuinely novel: process-level understanding. Most enterprise knowledge systems capture decisions: what ends up in the CRM, the ticketing system, the ERP. 
But they don\u0026rsquo;t capture \u003cem\u003ehow\u003c/em\u003e those decisions were made: the meetings, Slack threads, document iterations, handoffs, and informal coordination that produced the recorded outcome. \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-context-depth-comparison-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/context-depth-comparison.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/context-depth-comparison.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/context-depth-comparison.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/context-depth-comparison.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/context-depth-comparison.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/context-depth-comparison.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/context-depth-comparison.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/context-depth-comparison.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/context-depth-comparison.png 1200w,\n              
        https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/context-depth-comparison.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/context-depth-comparison.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/context-depth-comparison.png\"\n           alt=\"Enterprise AI context layer depth comparison showing what Systems of Record capture (decisions, states, entities, relationships) versus what Context Graphs capture (processes, temporal traces, causal structure, variability), with ML lens annotations mapping to labels versus feature space and trajectory data\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-context-depth-comparison-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/context-depth-comparison.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Enterprise AI context layer depth comparison showing what Systems of Record capture (decisions, states, entities, relationships) versus what Context Graphs capture (processes, temporal traces, causal structure, variability), with ML lens annotations mapping to labels versus feature space and trajectory data\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n Through a machine learning lens, the distinction is sharp: systems of record give you labels. 
Context graphs give you the feature space and trajectory data you\u0026rsquo;d actually need to learn the decision boundary. Consider a concrete example. Your CRM records that Deal X closed at $500K. That\u0026rsquo;s a label. The context graph captures the 14 meetings, 3 stakeholder handoffs, the pricing negotiation pattern, and the competitive displacement sequence that produced that outcome. Those are the features and the trajectory. An agent trained on labels alone can\u0026rsquo;t replicate the process that generated them.\u003c/p\u003e\n\u003cp\u003eThis is why so many early enterprise AI deployments produce outputs that are technically plausible but operationally useless. The agent has access to the what but not the how. It can retrieve the right documents but can\u0026rsquo;t reconstruct the reasoning process that a human would follow. Closing that gap (building systems that capture and encode process knowledge rather than just decision records) is the highest-value problem in enterprise AI right now.\u003c/p\u003e\n\u003ch2 id=\"iii-context-and-orchestration-form-a-compounding-flywheel\"\u003eIII. Context and orchestration form a compounding flywheel\u003c/h2\u003e\n\u003cp\u003eThere\u0026rsquo;s a reinforcement learning analogy here that I think deserves more attention. The orchestrator is the policy. The context graph is the learned world model. Agent traces are the trajectories. Every successful execution reinforces good patterns. Every failure surfaces where context is missing or stale. Over time, the system builds an increasingly accurate representation of how the organization actually operates.\u003c/p\u003e\n\u003cp\u003eAnd this loops back: more deployment produces richer traces, which improve the context graph, which improves agent decisions, which builds trust, which drives more deployment. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-compounding-flywheel-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/compounding-flywheel.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/compounding-flywheel.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/compounding-flywheel.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/compounding-flywheel.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/compounding-flywheel.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/compounding-flywheel.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/compounding-flywheel.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/compounding-flywheel.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/compounding-flywheel.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/compounding-flywheel.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/compounding-flywheel.png 
2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/compounding-flywheel.png\"\n           alt=\"Organizational world model compounding flywheel showing the five-step loop: Agent Executes → Traces Captured → Context Improves → Better Decisions → More Deployment, with ML analogy mapping table showing enterprise concepts mapped to RL primitives (policy rollout, trajectories, world model update, policy improvement, online learning loop)\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-compounding-flywheel-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/compounding-flywheel.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Organizational world model compounding flywheel showing the five-step loop: Agent Executes → Traces Captured → Context Improves → Better Decisions → More Deployment, with ML analogy mapping table showing enterprise concepts mapped to RL primitives (policy rollout, trajectories, world model update, policy improvement, online learning loop)\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n This is the same compounding mechanism that makes recommendation engines and autonomous driving systems improve with scale. Netflix gets better at recommendations because every viewing session generates training signal. Waymo gets better at driving because every mile generates edge cases. The difference here is that the asset being built isn\u0026rsquo;t a product feature. 
It\u0026rsquo;s an organizational world model, a learned representation of how your specific company works.\u003c/p\u003e\n\u003cp\u003eAnd unlike model weights, which any well-funded lab can approximate, your organization\u0026rsquo;s accumulated process knowledge is genuinely unique. No one else has your meeting patterns, your escalation sequences, your informal decision-making topology. That\u0026rsquo;s a moat.\u003c/p\u003e\n\u003ch2 id=\"where-this-breaks-and-why-the-agentic-ai-failure-rate-will-be-high\"\u003eWhere this breaks, and why the agentic AI failure rate will be high\u003c/h2\u003e\n\u003cp\u003e\u003ca href=\"https://www.uctoday.com/unified-communications/gartner-predicts-40-of-enterprise-apps-will-feature-ai-agents-by-2026/\"\u003eGartner predicts 40% of enterprise applications will feature task-specific AI agents by 2026\u003c/a\u003e, up from less than 5% in 2025. \u003ca href=\"https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai\"\u003eMcKinsey\u0026rsquo;s latest survey shows 23% of organizations are already scaling agentic AI\u003c/a\u003e, with another 39% experimenting. But Gartner also warns that over 40% of agentic AI projects will be canceled by the end of 2027 due to escalating costs, unclear business value, or inadequate risk controls.\u003c/p\u003e\n\u003cp\u003eThe gap between ambition and execution is, in large part, the context problem in disguise. Without process knowledge, agents produce plausible outputs that don\u0026rsquo;t match how the organization actually works. They retrieve the right policy document but apply it without understanding the exceptions your team has developed over years. They draft the right kind of email but miss the relationship dynamics that would change the tone. The failure mode isn\u0026rsquo;t that the model is bad. It\u0026rsquo;s that the context is shallow. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-lockin-vs-rebuild-scatter-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/lockin-vs-rebuild-scatter.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/lockin-vs-rebuild-scatter.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/lockin-vs-rebuild-scatter.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/lockin-vs-rebuild-scatter.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/lockin-vs-rebuild-scatter.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/lockin-vs-rebuild-scatter.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/lockin-vs-rebuild-scatter.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/lockin-vs-rebuild-scatter.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/lockin-vs-rebuild-scatter.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/lockin-vs-rebuild-scatter.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/lockin-vs-rebuild-scatter.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/lockin-vs-rebuild-scatter.png\"\n           alt=\"Enterprise AI agent stack scatter plot showing six layers plotted by lock-in risk versus rebuild difficulty. Context sits alone in the top-right danger zone with highest lock-in and hardest rebuild. Models, Interfaces, and Agents cluster in the commodity zone at bottom-left. Orchestration and Security occupy the middle.\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-lockin-vs-rebuild-scatter-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/lockin-vs-rebuild-scatter.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Enterprise AI agent stack scatter plot showing six layers plotted by lock-in risk versus rebuild difficulty. Context sits alone in the top-right danger zone with highest lock-in and hardest rebuild. Models, Interfaces, and Agents cluster in the commodity zone at bottom-left. Orchestration and Security occupy the middle.\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n This chart tells the strategic story in one image. Models, interfaces, and agents cluster in the commodity zone: low lock-in, easy to replace. Context sits alone in the danger zone: highest lock-in risk and hardest to rebuild. 
That\u0026rsquo;s exactly where your due diligence should concentrate.\u003c/p\u003e\n\u003ch2 id=\"what-to-actually-do-about-your-agentic-ai-architecture\"\u003eWhat to actually do about your agentic AI architecture\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003eDon\u0026rsquo;t go monolithic.\u003c/strong\u003e Each layer evolves at a different rate. Models improve quarterly, context infrastructure evolves over months, security requirements shift with regulation. Coupling them into one vendor\u0026rsquo;s all-in-one platform forces you to upgrade at the speed of the slowest-moving layer. You inherit their architectural bets, their integration timeline, their roadmap priorities. The history of enterprise software is littered with platforms that tried to own every layer and ended up mediocre at all of them.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eInsist on interoperability.\u003c/strong\u003e That means MCP, A2A, and open connectors. If your vendor doesn\u0026rsquo;t support open standards, you\u0026rsquo;re absorbing limitations you can\u0026rsquo;t see yet. The pace of AI innovation is arguably faster than any prior technology cycle, and you need the ability to swap in new capabilities the moment they appear without rebuilding your stack. Many organizations that locked into single-vendor cloud stacks in 2015 spent years migrating out. Don\u0026rsquo;t repeat that mistake at the agent layer.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTreat context as portable IP.\u003c/strong\u003e Your organizational world model (process knowledge, interaction history, learned workflow patterns) is the hardest-to-rebuild and most valuable asset in the stack. Ensure it is not locked to any single vendor or model provider. 
The right architecture separates accumulated context from the model layer so you retain your organizational IP regardless of which models or platforms you use tomorrow.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStart the flywheel early.\u003c/strong\u003e The compounding advantage in context accrues with deployment, not with time spent evaluating. Every agent execution generates organizational learning. Companies that wait to \u0026ldquo;see how it plays out\u0026rdquo; forfeit years of compounding to first movers. This isn\u0026rsquo;t speculative. It\u0026rsquo;s the same math that governs every data flywheel business. The question isn\u0026rsquo;t whether to start. It\u0026rsquo;s whether you can afford the cost of starting late.\u003c/p\u003e\n\u003cp\u003eThe stack will stratify. Specialists will outperform monoliths. Models will converge toward shared infrastructure. The defensible asset in enterprise AI is not the model. It\u0026rsquo;s the organizational world model. The organizations that start building it now, maintaining it carefully, and keeping it portable will compound their lead in the agent era. Everyone else will be buying commodity inference and wondering why their agents don\u0026rsquo;t work.\u003c/p\u003e\n\u003caside class=\"disclaimer\" role=\"note\" aria-label=\"Disclaimer\"\u003e\n  \u003cdiv class=\"disclaimer-content\"\u003e\u003cp\u003e\u003cstrong\u003eDisclaimer:\u003c/strong\u003e AI capabilities evolve rapidly; information may become outdated. Code and implementations provided as-is without warranty.\u003c/p\u003e\u003c/div\u003e\n\u003c/aside\u003e\n\n","summary":"The enterprise AI agent stack is stratifying into six layers with different winners at each. Models commoditize; context — your organizational world model — compounds. 
A framework for agentic AI architecture decisions.","image":"https://static.philippdubach.com/ograph/ograph-enterprise-agent-stack1.jpg","date_published":"2026-02-10T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Article","word_count":1504,"reading_time_minutes":8,"keywords":["enterprise AI agent stack","agentic AI architecture enterprise","AI model commoditization","organizational world model","organizational context AI defensibility","multi-model AI strategy","enterprise AI context layer","agent orchestration","AI governance enterprise","context graph enterprise AI","multi-agent systems enterprise","AI agent stack defensibility"],"section":"posts"}},{"id":"https://philippdubach.com/posts/where-mobile-money-goes-now/","url":"https://philippdubach.com/posts/where-mobile-money-goes-now/","title":"Where Mobile Money Goes Now","content_html":"\u003cp\u003eSensor Tower\u0026rsquo;s \u003ca href=\"https://sensortower.com/state-of-mobile-2026\"\u003eState of Mobile 2026\u003c/a\u003e report confirms what had been building for years: the mobile app economy has permanently shifted. For the first decade of mobile, games made more money than everything else combined. Clash of Clans and Candy Crush built empires on freemium. King went public. Tencent bought a majority stake in Supercell at a valuation of roughly $10 billion. 
That changed in 2025.\u003c/p\u003e\n\u003ch2 id=\"apps-overtake-games-in-mobile-revenue\"\u003eApps Overtake Games in Mobile Revenue\u003c/h2\u003e\n\u003cp\u003e\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-apps_vs_games_revenue-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/apps_vs_games_revenue.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/apps_vs_games_revenue.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/apps_vs_games_revenue.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/apps_vs_games_revenue.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/apps_vs_games_revenue.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/apps_vs_games_revenue.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/apps_vs_games_revenue.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/apps_vs_games_revenue.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/apps_vs_games_revenue.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/apps_vs_games_revenue.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/apps_vs_games_revenue.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/apps_vs_games_revenue.png\"\n           alt=\"Line chart showing apps overtaking games in mobile IAP revenue in 2025, with apps at $85.6B and games at $81.8B, per Sensor Tower\u0026#39;s State of Mobile 2026\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-apps_vs_games_revenue-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/apps_vs_games_revenue.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Line chart showing apps overtaking games in mobile IAP revenue in 2025, with apps at $85.6B and games at $81.8B, per Sensor Tower\u0026#39;s State of Mobile 2026\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n Non-game applications now generate more in-app purchase revenue than games. Apps reached $85.6 billion in 2025, up 21% year-over-year. Games managed $81.8 billion, barely moving from the year before.\u003c/p\u003e\n\u003cp\u003eGames peaked in 2021 and flatlined. Apps kept compounding. 
Subscriptions, which seemed like a novelty in 2018, became the dominant mobile monetization model for cloud storage, language learning, and now AI.\u003c/p\u003e\n\u003ch2 id=\"genai-the-35-billion-growth-engine\"\u003eGenAI: The $3.5 Billion Growth Engine\u003c/h2\u003e\n\u003cp\u003e\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-genai_revenue_growth-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/genai_revenue_growth.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/genai_revenue_growth.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/genai_revenue_growth.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/genai_revenue_growth.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/genai_revenue_growth.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/genai_revenue_growth.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/genai_revenue_growth.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/genai_revenue_growth.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/genai_revenue_growth.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/genai_revenue_growth.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/genai_revenue_growth.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/genai_revenue_growth.png\"\n           alt=\"Horizontal bar chart showing GenAI led mobile app revenue growth in 2025 with $3.5B added, more than any other category\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-genai_revenue_growth-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/genai_revenue_growth.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Horizontal bar chart showing GenAI led mobile app revenue growth in 2025 with $3.5B added, more than any other category\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n Generative AI was the biggest contributor to growth in consumer spending on mobile apps. The category added $3.5 billion in IAP revenue in 2025, more than Movies \u0026amp; TV ($2.2B) or Social Media ($2.1B). It went from near-zero in 2022 to the top growth category in three years. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-genai_rise-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/genai_rise.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/genai_rise.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/genai_rise.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/genai_rise.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/genai_rise.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/genai_rise.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/genai_rise.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/genai_rise.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/genai_rise.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/genai_rise.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/genai_rise.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg 
src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/genai_rise.png\"\n           alt=\"Combined bar and line chart showing GenAI app downloads rising from 0.05B in 2021 to 1.45B in 2024, with revenue hitting $1.25B\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-genai_rise-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/genai_rise.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Combined bar and line chart showing GenAI app downloads rising from 0.05B in 2021 to 1.45B in 2024, with revenue hitting $1.25B\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n GenAI apps went from 50 million downloads in 2021 to 1.45 billion in 2024. Revenue jumped from essentially nothing to $1.25 billion. ChatGPT alone accounts for 40% of the category\u0026rsquo;s consumer spend. This is just in-app purchases and does not count subscriptions billed outside the app store or enterprise contracts.\u003c/p\u003e\n\u003ch2 id=\"who-actually-uses-ai-apps\"\u003eWho Actually Uses AI Apps\u003c/h2\u003e\n\u003cp\u003eThe demographics are interesting: AI app users look nothing like the broader internet population. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-genai_demographics-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/genai_demographics.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/genai_demographics.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/genai_demographics.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/genai_demographics.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/genai_demographics.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/genai_demographics.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/genai_demographics.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/genai_demographics.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/genai_demographics.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/genai_demographics.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/genai_demographics.png 2000w\"\n              
sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/genai_demographics.png\"\n           alt=\"Scatter plot showing GenAI user demographics cluster with Reddit and X (young, male-skewing), not Instagram or Pinterest\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-genai_demographics-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/genai_demographics.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Scatter plot showing GenAI user demographics cluster with Reddit and X (young, male-skewing), not Instagram or Pinterest\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n GenAI users cluster with Reddit and X. Young, male, tech-adjacent. They look nothing like Instagram (young women) or Pinterest (older women) or even Facebook (everyone\u0026rsquo;s parents). 
The AI audience is still a niche, even as GenAI app revenue scales.\u003c/p\u003e\n\u003ch2 id=\"the-ai-advertising-playbook\"\u003eThe AI Advertising Playbook\u003c/h2\u003e\n\u003cp\u003eThis explains where AI companies advertise: \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ai_advertising_skew-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ai_advertising_skew.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ai_advertising_skew.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ai_advertising_skew.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ai_advertising_skew.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai_advertising_skew.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ai_advertising_skew.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ai_advertising_skew.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ai_advertising_skew.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai_advertising_skew.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ai_advertising_skew.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ai_advertising_skew.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai_advertising_skew.png\"\n           alt=\"Horizontal bar chart showing AI companies over-index on LinkedIn (\u0026#43;45%) and under-index on Pinterest (-13%) and YouTube (-9%) for ad demographics\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ai_advertising_skew-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ai_advertising_skew.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Horizontal bar chart showing AI companies over-index on LinkedIn (\u0026#43;45%) and under-index on Pinterest (-13%) and YouTube (-9%) for ad demographics\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n LinkedIn gets 45% more GenAI ad impressions than its share of the general population would suggest. Pinterest and YouTube get less. The AI advertising playbook is simple: find professionals, not consumers.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"ai-driven-retail-referral-traffic\"\u003eAI-Driven Retail Referral Traffic\u003c/h2\u003e\n\u003cp\u003eOne place where AI has found consumers: shopping. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-ai_retail_referrals-png-6\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/ai_retail_referrals.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/ai_retail_referrals.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/ai_retail_referrals.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/ai_retail_referrals.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai_retail_referrals.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/ai_retail_referrals.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/ai_retail_referrals.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/ai_retail_referrals.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai_retail_referrals.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/ai_retail_referrals.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/ai_retail_referrals.png 2000w\"\n         
     sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/ai_retail_referrals.png\"\n           alt=\"Stacked area chart showing GenAI referral traffic to major retailers growing from ~$5M to ~$51M between Oct 2024 and Dec 2025\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-ai_retail_referrals-png-6\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/ai_retail_referrals.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Stacked area chart showing GenAI referral traffic to major retailers growing from ~$5M to ~$51M between Oct 2024 and Dec 2025\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n Referral traffic from AI tools to major retailers grew roughly 7x between October 2024 and December 2025. People are asking ChatGPT what to buy, and then buying it. Amazon captures the largest share, but Walmart, Target, and Home Depot have all seen triple-digit percentage growth in AI-driven traffic. Still less than 1% of total retail traffic. 
But growing fast.\u003c/p\u003e\n\u003ch2 id=\"youtubes-cross-generational-dominance\"\u003eYouTube\u0026rsquo;s Cross-Generational Dominance\u003c/h2\u003e\n\u003cp\u003eOne pattern stands out: \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-youtube_dominance-png-7\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/youtube_dominance.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/youtube_dominance.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/youtube_dominance.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/youtube_dominance.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/youtube_dominance.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/youtube_dominance.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/youtube_dominance.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/youtube_dominance.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/youtube_dominance.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/youtube_dominance.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/youtube_dominance.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/youtube_dominance.png\"\n           alt=\"Table showing YouTube is the #1 app across every age group in the US (18-24, 25-34, 35-44, 45\u0026#43;) per Sensor Tower\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-youtube_dominance-png-7\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/youtube_dominance.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Table showing YouTube is the #1 app across every age group in the US (18-24, 25-34, 35-44, 45\u0026#43;) per Sensor Tower\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n YouTube is the top app across every age demographic. Every single one. 18-24, 25-34, 35-44, 45+. No other app has achieved this. Not TikTok (appears for youngest and oldest, vanishes in the middle). Not Instagram (fades with age). Not Facebook (rises with age). 
YouTube alone spans generations.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch2 id=\"waymos-quiet-expansion\"\u003eWaymo\u0026rsquo;s Quiet Expansion\u003c/h2\u003e\n\u003cp\u003eFinally, Waymo: \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-waymo_penetration-png-9\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/waymo_penetration.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/waymo_penetration.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/waymo_penetration.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/waymo_penetration.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/waymo_penetration.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/waymo_penetration.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/waymo_penetration.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/waymo_penetration.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/waymo_penetration.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/waymo_penetration.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/waymo_penetration.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/waymo_penetration.png\"\n           alt=\"Line chart showing Waymo\u0026#39;s autonomous ride-hailing penetration of Lyft and Uber users rising to ~4% and ~3% respectively by Q4 2025\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-waymo_penetration-png-9\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/waymo_penetration.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Line chart showing Waymo\u0026#39;s autonomous ride-hailing penetration of Lyft and Uber users rising to ~4% and ~3% respectively by Q4 2025\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n Nationally, Waymo has reached about 4% of Lyft users and 3% of Uber users, despite operating in only a handful of cities. In its active markets (San Francisco, Phoenix), market share is closer to 15%. The company has driven 127 million autonomous miles and tripled its ride volume to 15 million trips in 2025.\u003c/p\u003e\n\u003cp\u003eMobile is no longer a platform question. It is a distribution question. 
The app economy winners so far: AI companies targeting professionals, YouTube serving everyone, and autonomous vehicles growing quietly in the background.\u003c/p\u003e\n","summary":"Apps overtook games in mobile IAP revenue for the first time in 2025, driven by $3.5B in GenAI growth. Analysis of Sensor Tower's State of Mobile 2026 report.","image":"https://static.philippdubach.com/ograph/ograph-state-of-mobile.jpg","date_published":"2026-02-07T00:00:00Z","date_modified":"2026-03-15T11:43:29+01:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Analysis","word_count":569,"reading_time_minutes":3,"keywords":["state of mobile 2026","mobile app revenue vs games 2025","GenAI app revenue 2025","ChatGPT app revenue","AI app demographics","Sensor Tower state of mobile","mobile gaming revenue decline","app economy trends 2025","GenAI consumer spending","AI app adoption","YouTube most popular app all ages","mobile in-app purchase revenue","AI advertising demographics","Waymo ride volume 2025","mobile app subscription revenue","autonomous ride-hailing growth","AI referral traffic retail","cross-generational app usage","mobile monetization trends","consumer spending mobile apps"],"section":"posts"}},{"id":"https://philippdubach.com/posts/variance-tax/","url":"https://philippdubach.com/posts/variance-tax/","title":"Variance Tax","content_html":"\u003cp\u003eLet\u0026rsquo;s say your portfolio returned +60% in 2024, then fell 40% in 2025. That\u0026rsquo;s an average annual return of +10%. 
Actual result after two years: a 4% loss ($100 × 1.6 × 0.6 = $96), or about −2.0% per year compounded.\u003c/p\u003e\n\u003cp\u003eThat 12-point gap between the +10% average and the −2% compound rate is what we call the variance tax, also known as \u003ca href=\"https://www.bogleheads.org/wiki/Variance_drain\"\u003evariance drain\u003c/a\u003e or volatility drag, and it\u0026rsquo;s one of the least intuitive forces in investing.\u003c/p\u003e\n\u003cp\u003eTake any series of returns with arithmetic mean μ and volatility σ. The compound growth rate, the one that actually determines your wealth, is approximately:\u003c/p\u003e\n$$G ≈ μ − ½σ²$$\u003cp\u003eThis comes from a \u003ca href=\"https://en.wikipedia.org/wiki/Taylor%27s_theorem#Example\"\u003esecond-order Taylor expansion\u003c/a\u003e of ln(1+r). Take expectations, convert back from log returns, and to second order the compound rate equals the arithmetic mean minus half the variance. Everything else drops out. Half the variance. That is the tax. The same correction term appears when you solve \u003ca href=\"https://en.wikipedia.org/wiki/Geometric_Brownian_motion\"\u003egeometric Brownian motion\u003c/a\u003e via \u003ca href=\"https://en.wikipedia.org/wiki/It%C3%B4%27s_lemma\"\u003eItô\u0026rsquo;s lemma\u003c/a\u003e (the drift of log(S) is μ − σ²/2, not μ), so whether you come at it from discrete compounding or continuous-time stochastic calculus, you land in the same place. And because the tax is quadratic in σ, doubling volatility does not double the cost. It quadruples it. 
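As a quick check, plug the opening example into the formula. This sketch treats the two years as equally likely outcomes, so σ is their population standard deviation (an assumption made here for illustration). The returns +60% and −40% have mean μ = 10% and σ = 50%, so $$G ≈ 0.10 − ½ × 0.50² = 0.10 − 0.125 = −0.025$$ or roughly −2.5% per year, against an actual compound rate of √0.96 − 1 ≈ −2.0%. The approximation is rough at this extreme volatility, but it still lands within about half a point. 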
And if COVID taught us anything, it is that we generally \u003ca href=\"https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0242839\"\u003ehave a hard time grasping exponential growth\u003c/a\u003e.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-variance_drain_by_vol-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/variance_drain_by_vol.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/variance_drain_by_vol.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/variance_drain_by_vol.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/variance_drain_by_vol.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/variance_drain_by_vol.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/variance_drain_by_vol.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/variance_drain_by_vol.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/variance_drain_by_vol.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/variance_drain_by_vol.png 1200w,\n            
          https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/variance_drain_by_vol.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/variance_drain_by_vol.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/variance_drain_by_vol.png\"\n           alt=\"Chart showing the variance tax as a quadratic curve ½σ², with labeled data points for Bonds (5% vol, 0.1% drain), S\u0026amp;P 500 (16%, 1.3%), Nasdaq (22%, 2.4%), Emerging Markets (25%, 3.1%), 2x Leveraged S\u0026amp;P (32%, 5.1%), 3x Leveraged S\u0026amp;P (48%, 11.5%), and Bitcoin (60%, 18%)\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-variance_drain_by_vol-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/variance_drain_by_vol.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Chart showing the variance tax as a quadratic curve ½σ², with labeled data points for Bonds (5% vol, 0.1% drain), S\u0026amp;P 500 (16%, 1.3%), Nasdaq (22%, 2.4%), Emerging Markets (25%, 3.1%), 2x Leveraged S\u0026amp;P (32%, 5.1%), 3x Leveraged S\u0026amp;P (48%, 11.5%), and Bitcoin (60%, 18%)\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n Treasury bonds at 5% vol pay about 0.1% per year in variance drain. Barely noticeable. The S\u0026amp;P 500 at 16% vol pays 1.3%. A 3x leveraged ETF at 48% vol pays 11.5%. 
\u003ca href=\"https://people.bu.edu/jacquier/papers/geom.faj0312.pdf\"\u003eJacquier, Kane, and Marcus (2003)\u003c/a\u003e studied S\u0026amp;P 500 returns from 1926 to 2001: arithmetic mean 12.49%, geometric mean 10.51%. The gap is 1.98 percentage points. With an annual volatility of about 20.3% over that period, the formula predicts ½ × 0.203² = 2.06%. \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-variance_table-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/variance_table.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/variance_table.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/variance_table.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/variance_table.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/variance_table.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/variance_table.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/variance_table.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/variance_table.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/variance_table.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/variance_table.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/variance_table.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/variance_table.png\"\n           alt=\"Table showing variance drain by asset class: US Bonds (5% vol, 0.1% drain), S\u0026amp;P 500 (16% vol, 1.3% drain), Nasdaq (22% vol, 2.4% drain), Emerging Markets (25% vol, 3.1% drain), 2x Leveraged S\u0026amp;P (32% vol, 5.1% drain), 3x Leveraged S\u0026amp;P (48% vol, 11.5% drain)\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-variance_table-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/variance_table.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Table showing variance drain by asset class: US Bonds (5% vol, 0.1% drain), S\u0026amp;P 500 (16% vol, 1.3% drain), Nasdaq (22% vol, 2.4% drain), Emerging Markets (25% vol, 3.1% drain), 2x Leveraged S\u0026amp;P (32% vol, 5.1% drain), 3x Leveraged S\u0026amp;P (48% vol, 11.5% drain)\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n Looking at the last row, we see that tripling leverage triples the arithmetic return but delivers nearly the same compound return as 2x. The linear gain gets eaten by the quadratic penalty. 
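\u003c/p\u003e\n\u003cp\u003eAssume the compound-growth chart below compounds at a constant rate of μ − ½σ² per year with μ = 10%. Its construction is not stated, so this is an assumption, but under it the approximation reproduces the endpoints closely. At 15% volatility:\u003c/p\u003e\n$$100 × (1 + 0.10 − ½ × 0.15^2)^{30} = 100 × 1.08875^{30} ≈ 1{,}282$$\u003cp\u003eThe same calculation gives a 5.5% compound rate and about $498 at 30% volatility. At 50% volatility it gives a −2.5% rate and about $47.\u003c/p\u003e\n\u003cp\u003e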
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-compound_wealth_growth-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/compound_wealth_growth.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/compound_wealth_growth.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/compound_wealth_growth.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/compound_wealth_growth.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/compound_wealth_growth.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/compound_wealth_growth.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/compound_wealth_growth.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/compound_wealth_growth.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/compound_wealth_growth.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/compound_wealth_growth.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/compound_wealth_growth.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/compound_wealth_growth.png\"\n           alt=\"Line chart showing $100 invested at 10% arithmetic return over 30 years at four volatility levels: 0% vol reaches $1,745, 15% vol reaches $1,280, 30% vol reaches $498, and 50% vol loses most of the original investment\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-compound_wealth_growth-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/compound_wealth_growth.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Line chart showing $100 invested at 10% arithmetic return over 30 years at four volatility levels: 0% vol reaches $1,745, 15% vol reaches $1,280, 30% vol reaches $498, and 50% vol loses most of the original investment\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n Same 10% arithmetic return, different volatility. After 30 years, the zero-volatility path reaches $1,745. At 15% vol, $1,280. At 30%, $498. At 50% vol you have lost more than half your money despite averaging +10% per year.\u003c/p\u003e\n\u003cp\u003eNow apply leverage. Suppose you lever an asset by factor L and borrow at the risk-free rate r. The expected excess return then scales linearly, L(μ − r), but the variance drain scales quadratically, ½L²σ². The compound return becomes:\u003c/p\u003e\n$$G(L) ≈ r + L(μ − r) − ½L²σ²$$\u003cp\u003eTake the derivative, set to zero. 
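Spelled out, the first-order condition in L is:\u003c/p\u003e\n$$\\frac{dG}{dL} = (μ − r) − Lσ^2 = 0$$\u003cp\u003eThe second derivative, −σ², is negative, so the solution is a maximum. 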
The leverage that maximizes compound wealth:\u003c/p\u003e\n$$L^{\\ast} = (μ − r) / σ²$$\n\n\n\n\n\n\n\n\u003cp\u003eFor the S\u0026amp;P 500 with roughly 7% excess return and 16% vol, L* comes out to about 2.7x.\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-leverage_curve-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/leverage_curve.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/leverage_curve.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/leverage_curve.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/leverage_curve.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/leverage_curve.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/leverage_curve.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/leverage_curve.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/leverage_curve.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/leverage_curve.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/leverage_curve.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/leverage_curve.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/leverage_curve.png\"\n           alt=\"The leverage curve for S\u0026amp;P 500 parameters showing compound return peaking at Kelly optimal leverage L*=2.7x, with labeled points at 1x, 2x, and 3x leverage. Returns decline beyond the Kelly optimum and eventually turn negative\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-leverage_curve-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/leverage_curve.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"The leverage curve for S\u0026amp;P 500 parameters showing compound return peaking at Kelly optimal leverage L*=2.7x, with labeled points at 1x, 2x, and 3x leverage. Returns decline beyond the Kelly optimum and eventually turn negative\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n This is the \u003ca href=\"https://en.wikipedia.org/wiki/Kelly_criterion\"\u003eKelly criterion\u003c/a\u003e (\u003cem\u003eyou may know it from gambling or from log-utility theory, but as we see here it falls straight out of the variance tax formula\u003c/em\u003e). Beyond Kelly, each additional unit of leverage costs more in variance drain than it earns in expected return. 
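\u003c/p\u003e\n\u003cp\u003ePlugging the S\u0026amp;P 500 inputs from above (7% excess return, 16% volatility) into G(L) makes the curve concrete. The 3% risk-free rate here is an assumption for illustration, and fees and any borrowing spread are ignored:\u003c/p\u003e\n$$G(1) ≈ 0.087, \\quad G(2) ≈ 0.119, \\quad G(L^{\\ast}) ≈ 0.126, \\quad G(3) ≈ 0.125, \\quad G(4) ≈ 0.105$$\u003cp\u003eUnder these assumptions 3x compounds only slightly faster than 2x. At 2L* ≈ 5.5x the excess-return and drain terms cancel exactly, and the leveraged position earns no more than the risk-free rate.\u003c/p\u003e\n\u003cp\u003e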
The curve bends over and eventually goes negative. In practice, many practitioners use \u0026ldquo;half-Kelly\u0026rdquo;, sizing positions at L*/2, because the formula assumes you know μ and σ precisely, and you don\u0026rsquo;t. Estimation error in either parameter can push you past the peak and onto the losing side of the curve. Half-Kelly gives up 25% of the maximum excess growth rate: plugging L*/2 into G(L) leaves three-quarters of the peak value of G(L) − r. In exchange it halves portfolio volatility, which sharply reduces drawdown risk.\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-UPRO_factsheet-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/UPRO_factsheet.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/UPRO_factsheet.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/UPRO_factsheet.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/UPRO_factsheet.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/UPRO_factsheet.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/UPRO_factsheet.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/UPRO_factsheet.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/UPRO_factsheet.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/UPRO_factsheet.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/UPRO_factsheet.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/UPRO_factsheet.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/UPRO_factsheet.png\"\n           alt=\"Extract of ProShares UltraPro S\u0026amp;P 500 Factsheet Total Return\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-UPRO_factsheet-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/UPRO_factsheet.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Extract of ProShares UltraPro S\u0026amp;P 500 Factsheet Total Return\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nYou can see this play out in practice. \u003ca href=\"https://www.proshares.com/our-etfs/leveraged-and-inverse/upro\"\u003eProShares UPRO\u003c/a\u003e, the 3x S\u0026amp;P 500 ETF, has returned roughly 28% annualized over the past decade during one of the strongest bull markets in history. The S\u0026amp;P 500 compounded at about 10% over the same period. Naively tripling that would suggest roughly 30%. Variance drain, along with the fund\u0026rsquo;s fees and financing costs, accounts for the gap, and that was in a favorable environment. In 2022, when the S\u0026amp;P fell about 19%, UPRO dropped 70%. 
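\u003c/p\u003e\n\u003cp\u003ePart of the mechanism is that these funds reset their leverage daily, so the drain bites even on a path that goes nowhere. As a hypothetical example, suppose the index rises 10% one day and falls about 9.09% the next, ending flat. A fund that delivers exactly 3x each daily return ends lower:\u003c/p\u003e\n$$1.30 × (1 − 3 × 0.0909) ≈ 1.30 × 0.7273 ≈ 0.945$$\u003cp\u003eThat is a loss of roughly 5.5% while the index is unchanged, before any fees or financing costs.\u003c/p\u003e\n\u003cp\u003e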
The effect is even starker in higher-volatility underlyings: \u003ca href=\"https://www.proshares.com/our-etfs/leveraged-and-inverse/tqqq\"\u003eProShares TQQQ\u003c/a\u003e, the 3x Nasdaq-100 ETF, sat roughly flat from its 2021 highs through early 2025 while the unlevered QQQ had long since recovered. It is a textbook case of variance drain overwhelming the leverage premium in a choppy market.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThe same half-sigma-squared shows up across finance. It is why a stock price following a \u003ca href=\"https://en.wikipedia.org/wiki/Log-normal_distribution\"\u003elog-normal distribution\u003c/a\u003e has a median below its mean: the expected price grows at rate μ, the median only at μ − ½σ². Why the \u003ca href=\"https://en.wikipedia.org/wiki/Black%E2%80%93Scholes_model\"\u003eBlack-Scholes\u003c/a\u003e d₁ and d₂ terms carry a ½σ²t adjustment. Why a $100 stock\u0026rsquo;s geometric midpoint between $150 up and $50 down is not $100 but √(150 × 50) ≈ $86.60. Why, in log terms, a rise to $150 is offset by a fall to $66.67 rather than $50, since ln(150/100) = ln(100/66.67). Wherever returns compound and volatility is nonzero, the variance tax is being collected.\u003c/p\u003e\n","summary":"Variance drain is the hidden cost of volatility: why a portfolio averaging +10% can lose money. The ½σ² formula explains the gap between paper and real returns.","image":"https://static.philippdubach.com/ograph/ograph-variance-tax.jpg","date_published":"2026-02-06T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. 
Dubach","url":"https://philippdubach.com/about/"}],"tags":["Quantitative Finance"],"_philippdubach":{"type":"Essay","word_count":738,"reading_time_minutes":4,"keywords":["variance drain","volatility drag","geometric vs arithmetic returns","leveraged ETF decay","Kelly criterion"],"section":"posts"}},{"id":"https://philippdubach.com/posts/claude-opus-4.6-anthropics-new-flagship-ai-model-for-agentic-coding/","url":"https://philippdubach.com/posts/claude-opus-4.6-anthropics-new-flagship-ai-model-for-agentic-coding/","title":"Claude Opus 4.6: Anthropic's New Flagship AI Model for Agentic Coding","content_html":"\u003cp\u003eAnthropic just released Claude Opus 4.6, the latest frontier AI model in the Claude family. It\u0026rsquo;s a big upgrade over Opus 4.5 and probably the most agentic-focused LLM release from any lab this year.\u003c/p\u003e\n\u003cp\u003eKey upgrades: better agentic AI coding capabilities (plans more carefully, sustains longer tasks, catches its own mistakes), a 1M token context window (a first for Opus-class models), and 128K output tokens. 
Pricing holds at $5/$25 per million tokens.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-claude46-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/claude46.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/claude46.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/claude46.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/claude46.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude46.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/claude46.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/claude46.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/claude46.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude46.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/claude46.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/claude46.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg 
src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/claude46.png\"\n           alt=\"Claude Opus 4.6 release announcement on claude.ai showing the new flagship model from Anthropic\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-claude46-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/claude46.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Claude Opus 4.6 release announcement on claude.ai showing the new flagship model from Anthropic\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch3 id=\"llm-benchmark-results-how-claude-opus-46-compares\"\u003eLLM Benchmark Results: How Claude Opus 4.6 Compares\u003c/h3\u003e\n\u003cp\u003eThe benchmark numbers are strong across the board. Opus 4.6 hits state-of-the-art on Terminal-Bench 2.0 (65.4% for agentic coding in the terminal), Humanity\u0026rsquo;s Last Exam (complex multidisciplinary reasoning), and BrowseComp (agentic web search). 
It beats GPT-5.2 by roughly 144 Elo points on GDPval-AA, the benchmark that measures real-world knowledge work across 44 professional occupations.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-opus46-elo-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/opus46-elo.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/opus46-elo.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/opus46-elo.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/opus46-elo.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/opus46-elo.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/opus46-elo.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/opus46-elo.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/opus46-elo.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/opus46-elo.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/opus46-elo.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/opus46-elo.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/opus46-elo.png\"\n           alt=\"GDPval-AA Elo benchmark comparison chart: Claude Opus 4.6 at 1,606 Elo vs GPT-5.2 at 1,462 Elo vs Claude Opus 4.5 at 1,416 Elo for real-world knowledge work\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-opus46-elo-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/opus46-elo.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"GDPval-AA Elo benchmark comparison chart: Claude Opus 4.6 at 1,606 Elo vs GPT-5.2 at 1,462 Elo vs Claude Opus 4.5 at 1,416 Elo for real-world knowledge work\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nThe standout is ARC-AGI-2, which tests abstract reasoning on problems easy for humans but hard for AI. Opus 4.6 scores 68.8%, a dramatic leap from Opus 4.5\u0026rsquo;s 37.6%. For comparison, GPT-5.2 scores 54.2% and Gemini 3 Pro hits 45.1%. That gap matters because ARC-AGI-2 is designed to resist memorization, so it measures whether models can actually generalize.\u003c/p\u003e\n\u003cp\u003eOn agentic evaluations, Terminal-Bench 2.0 rises to 65.4% (from 59.8% for Opus 4.5), and OSWorld for agentic computer use jumps from 66.3% to 72.7%, putting Opus ahead of both GPT-5.2 and Gemini 3 Pro on those particular tests. 
SWE-bench Verified shows a small regression, which is worth watching even though the model leads on the terminal and computer-use tests above.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-opus46-benchmarks-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/opus46-benchmarks.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/opus46-benchmarks.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/opus46-benchmarks.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/opus46-benchmarks.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/opus46-benchmarks.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/opus46-benchmarks.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/opus46-benchmarks.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/opus46-benchmarks.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/opus46-benchmarks.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/opus46-benchmarks.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/opus46-benchmarks.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/opus46-benchmarks.png\"\n           alt=\"Claude Opus 4.6 LLM benchmark comparison: SOTA on Terminal-Bench 2.0, Humanity\u0026#39;s Last Exam, BrowseComp, and GDPval-AA with 90.2% on BigLaw Bench\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-opus46-benchmarks-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/opus46-benchmarks.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Claude Opus 4.6 LLM benchmark comparison: SOTA on Terminal-Bench 2.0, Humanity\u0026#39;s Last Exam, BrowseComp, and GDPval-AA with 90.2% on BigLaw Bench\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003ch3 id=\"what-can-you-do-with-a-1-million-token-context-window\"\u003eWhat Can You Do With a 1 Million Token Context Window?\u003c/h3\u003e\n\u003cp\u003eThe 1M context window paired with the new context compaction feature is the upgrade that matters most in practice. 
To put it in perspective: at the common rule of thumb of about 0.75 English words per token, 1M tokens is roughly 750,000 words. That is on the order of ten full-length novels, a mid-sized codebase, or a large batch of legal documents, all processed in a single prompt.\u003c/p\u003e\n\u003cp\u003eCompaction automatically summarizes older context when approaching limits, which means agents can theoretically run indefinitely without hitting the wall that\u0026rsquo;s plagued long-running AI tasks. Combined with the model\u0026rsquo;s improved ability to catch its own mistakes through better code review and debugging, you\u0026rsquo;re looking at agents that can actually finish what they start.\u003c/p\u003e\n\u003cp\u003eThe long-context retrieval jump tells the story. On MRCR v2, which tests whether a model can find and reason over specific facts buried in massive prompts, Opus 4.6 scores 76% compared to Sonnet 4.5\u0026rsquo;s 18.5%. That\u0026rsquo;s not an incremental improvement; it\u0026rsquo;s a different capability class.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-opus46-context-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/opus46-context.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/opus46-context.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/opus46-context.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/opus46-context.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/opus46-context.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource 
media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/opus46-context.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/opus46-context.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/opus46-context.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/opus46-context.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/opus46-context.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/opus46-context.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/opus46-context.png\"\n           alt=\"Long-context retrieval benchmark: Claude Opus 4.6 scores 76% vs Claude Sonnet 4.5 at 18.5% on MRCR v2 needle-in-a-haystack reasoning test\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-opus46-context-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/opus46-context.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Long-context retrieval benchmark: Claude Opus 4.6 scores 76% vs Claude Sonnet 4.5 at 18.5% on MRCR v2 needle-in-a-haystack reasoning 
test\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nThat said, bigger context doesn\u0026rsquo;t automatically mean better. Research from Factory.ai and others shows attention degrades across very long sequences, and prefill latency at 1M tokens can exceed two minutes before you get your first output token. The premium pricing tier for prompts exceeding 200K tokens ($10/$37.50 per million input/output tokens) reflects this cost — Anthropic isn\u0026rsquo;t subsidizing power users anymore. The real question for enterprise deployments is whether stuffing your entire codebase into context beats a well-designed RAG pipeline. The answer, as usual, depends on the use case.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch3 id=\"agentic-ai-coding-agent-teams-and-claude-code-updates\"\u003eAgentic AI Coding: Agent Teams and Claude Code Updates\u003c/h3\u003e\n\u003cp\u003eThe headline numbers impress, but the real story is the agentic focus. Anthropic isn\u0026rsquo;t just making Claude smarter. They\u0026rsquo;re making it more useful for the actual work people want AI to do: sustained, multi-step tasks in large codebases.\u003c/p\u003e\n\u003cp\u003eNew API features reinforce this direction: adaptive thinking lets the model decide when to reason more deeply based on contextual cues, effort controls give developers fine-grained tradeoffs between intelligence, speed, and cost (low/medium/high/max), and context compaction keeps long-running agents within limits without manual intervention.\u003c/p\u003e\n\u003cp\u003eClaude Code gets the headline feature: \u003cstrong\u003eAgent Teams\u003c/strong\u003e that work in parallel. Multiple subagents can coordinate autonomously on read-heavy work like codebase reviews, with each agent handling a different branch via git worktrees before merging back. This ships as a research preview, but it\u0026rsquo;s clearly aimed at the production workflows where agentic coding tools like Cursor, GitHub Copilot, and OpenAI\u0026rsquo;s Codex are competing hard. 
The timing isn\u0026rsquo;t accidental — Apple just announced Xcode 26.3 with native support for Claude Agent and OpenAI\u0026rsquo;s Codex via MCP (Model Context Protocol), making agentic coding a standard part of the developer toolchain rather than an experiment.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003ch3 id=\"enterprise-deployment-why-gdpval-aa-matters\"\u003eEnterprise Deployment: Why GDPval-AA Matters\u003c/h3\u003e\n\u003cp\u003eThe GDPval-AA benchmark matters because it measures performance on real-world knowledge work — not toy problems or academic puzzles. Beating GPT-5.2 by 144 Elo points (and Opus 4.5 by 190) suggests meaningful improvements in the tasks that matter for enterprise AI adoption: financial analysis, legal reasoning, and multi-step professional workflows.\u003c/p\u003e\n\u003cp\u003eThe product expansions signal where Anthropic sees the market going. Claude in Excel now handles long-running tasks and unstructured data. Claude in PowerPoint reads layouts and slide masters for brand consistency. These aren\u0026rsquo;t research demos — they\u0026rsquo;re enterprise-ready integrations designed for knowledge workers who need AI that fits into existing toolchains.\u003c/p\u003e\n\u003cp\u003eFor teams evaluating which frontier model to standardize on, the picture is nuanced. Claude Opus 4.6 leads on agentic coding and enterprise knowledge work. GPT-5.2 still holds advantages in abstract reasoning (ARC-AGI-2, though the gap narrowed significantly) and math. Gemini 3 Pro offers the best cost efficiency and multimodal processing with its own 1M context window. The multi-model workflow trend is real — the smartest enterprise teams aren\u0026rsquo;t picking one model; they\u0026rsquo;re routing tasks to whichever model handles them best.\u003c/p\u003e\n\u003ch3 id=\"safety-profile-and-the-zero-day-question\"\u003eSafety Profile and the Zero-Day Question\u003c/h3\u003e\n\u003cp\u003eOne detail worth noting: the safety profile. 
Anthropic claims Opus 4.6 is \u0026ldquo;just as well-aligned as Opus 4.5, which was the most-aligned frontier model to date.\u0026rdquo; Given the model\u0026rsquo;s enhanced cybersecurity capabilities (Opus 4.6 independently discovered over 500 previously unknown zero-day vulnerabilities in open-source code during Anthropic\u0026rsquo;s pre-release testing), the company developed six new detection probes specifically for this release.\u003c/p\u003e\n\u003cp\u003eWhether that\u0026rsquo;s reassuring or concerning depends on your priors about AI capabilities research. The vulnerabilities ranged from system-crashing bugs to memory corruption flaws in widely used tools like Ghostscript and OpenSC. As Logan Graham, head of Anthropic\u0026rsquo;s frontier red team, put it: it\u0026rsquo;s a race between defenders and attackers, and Anthropic wants defenders to have the tools first.\u003c/p\u003e\n\u003ch3 id=\"what-this-means-for-the-competitive-landscape\"\u003eWhat This Means for the Competitive Landscape\u003c/h3\u003e\n\u003cp\u003eThe competitive picture just got more interesting. GPT-5.2 and Gemini 3 Pro now have a new benchmark to chase, and Anthropic has clearly staked its claim on agentic coding as the primary battleground. With pricing unchanged at $5/$25 per million input/output tokens (significantly more expensive than GPT-5.2 at $2/$10, but competitive for the performance tier), the value proposition comes down to whether the agentic improvements translate into fewer retries, less hand-holding, and faster task completion in your specific workflow.\u003c/p\u003e\n\u003cp\u003eFor developers, the move is straightforward: swap in \u003ccode\u003eclaude-opus-4-6\u003c/code\u003e via the API and test it on your hardest tasks. For enterprise decision-makers, the GDPval-AA results and the Agent Teams feature are worth a serious evaluation cycle. 
The model is available now on claude.ai, the API, and all major cloud platforms (AWS Bedrock, Azure Foundry, GCP Vertex AI).\u003c/p\u003e\n","summary":"Claude Opus 4.6 brings a 1M token context window, 68.8% ARC-AGI-2, and Agent Teams to Claude Code. Full benchmark comparison vs GPT-5.2 and Gemini 3 Pro with pricing analysis.","image":"https://static.philippdubach.com/ograph/ograph-opus46.jpg","date_published":"2026-02-05T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Commentary","word_count":1162,"reading_time_minutes":6,"keywords":["Claude Opus 4.6","Anthropic Claude","LLM benchmarks comparison","agentic AI coding","1M context window LLM","Claude Code Agent Teams","GPT-5.2 vs Claude Opus","frontier AI model 2026","Terminal Bench 2.0","enterprise AI deployment"],"section":"posts"}},{"id":"https://philippdubach.com/posts/buying-the-haystack-might-not-work-this-year/","url":"https://philippdubach.com/posts/buying-the-haystack-might-not-work-this-year/","title":"Buying the Haystack Might Not Work This Year","content_html":"\u003cp\u003eI\u0026rsquo;ve been reading the January 2026 state of markets reports from \u003ca href=\"https://docs.google.com/presentation/d/e/2PACX-1vQXsMMv5ZCWm77za7oXJcz1X-Th5Mz15g5nYBxbUjnomStVcjn8lXPjE5LzAlvc_hg4yHKgwASWLo5a/pub?start=false\u0026amp;loop=false\u0026amp;delayms=3000\u0026amp;slide=id.g3b6e2578ab2_8_4858\"\u003eAndreessen Horowitz\u003c/a\u003e and \u003ca href=\"https://www.aqr.com/Insights/Research/Alternative-Thinking/2026-Capital-Market-Assumptions-for-Major-Asset-Classes\"\u003eAQR\u003c/a\u003e, and their conclusions on the AI bubble question in 2026 are almost impossible to reconcile.\u003c/p\u003e\n\u003cp\u003eThe a16z view is straightforward: AI fundamentals are real, and current prices reflect that reality. Their evidence is compelling. 
The top 50 private AI companies now generate \u003cstrong\u003e$40.6 billion in annual revenue\u003c/strong\u003e. Companies like ElevenLabs and Cursor are hitting $100 million ARR faster than Slack or Twilio ever did. GPUs are running at \u003cstrong\u003e80% utilization\u003c/strong\u003e, compared to the 7% utilization rate for fiber optic cables during the dotcom bubble. This isn\u0026rsquo;t speculation, they argue. It\u0026rsquo;s demand exceeding supply.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-a16z-gpu-utilization-vs-fiber-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png 1024w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-gpu-utilization-vs-fiber.png\"\n           alt=\"GPU utilization at 80% in AI datacenters compared to just 7% fiber optic cable utilization during the early 2000s dotcom bubble\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-a16z-gpu-utilization-vs-fiber-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/a16z-gpu-utilization-vs-fiber.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"GPU utilization at 80% in AI datacenters compared to just 7% fiber optic cable utilization during the early 2000s dotcom bubble\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nAQR looks at the same market and sees something else entirely. Their capital market assumptions put the U.S. 
CAPE ratio at the \u003cstrong\u003e96th percentile since 1980\u003c/strong\u003e. Expected real returns for U.S. large cap equities over the next 5-10 years? \u003cstrong\u003e3.9%\u003c/strong\u003e. For a global 60/40 portfolio, just \u003cstrong\u003e3.4%\u003c/strong\u003e, well below the long-term average of roughly 5% since 1900. Risk premia, in their framework, are compressed across nearly every asset class. The narrative doesn\u0026rsquo;t enter their models.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-aqr-expected-returns-summary-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/aqr-expected-returns-summary.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/aqr-expected-returns-summary.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/aqr-expected-returns-summary.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/aqr-expected-returns-summary.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/aqr-expected-returns-summary.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/aqr-expected-returns-summary.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/aqr-expected-returns-summary.png 1024w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/aqr-expected-returns-summary.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/aqr-expected-returns-summary.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/aqr-expected-returns-summary.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/aqr-expected-returns-summary.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/aqr-expected-returns-summary.png\"\n           alt=\"AQR medium-term expected real returns summary showing U.S. equities at 3.9%, non-U.S. developed at 5.3%, and global 60/40 at 3.4%\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-aqr-expected-returns-summary-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/aqr-expected-returns-summary.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"AQR medium-term expected real returns summary showing U.S. equities at 3.9%, non-U.S. developed at 5.3%, and global 60/40 at 3.4%\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\na16z points to earnings growth. The market rally hasn\u0026rsquo;t been driven by multiple expansion, they note, but by actual EPS growth. 
Tech P/E multiples sit around 30-35x, elevated but nowhere near the 70-80x of 2000. Tech margins have \u0026ldquo;lapped the field\u0026rdquo; at 25%+ compared to 5-8% for the rest of the S\u0026amp;P 500. The fundamentals, they insist, are doing the work.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-a16z-pe-multiples-vs-dotcom-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-pe-multiples-vs-dotcom.png\"\n           alt=\"Earnings multiples are high but nowhere near dotcom levels: large cap tech trailing P/E around 30-35x today versus 70-80x in 2000\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-a16z-pe-multiples-vs-dotcom-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/a16z-pe-multiples-vs-dotcom.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Earnings multiples are high but nowhere near dotcom levels: large cap tech trailing P/E around 30-35x today versus 70-80x in 2000\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-a16z-tech-margins-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/a16z-tech-margins.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/a16z-tech-margins.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/a16z-tech-margins.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/a16z-tech-margins.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-tech-margins.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/a16z-tech-margins.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/a16z-tech-margins.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/a16z-tech-margins.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-tech-margins.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/a16z-tech-margins.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/a16z-tech-margins.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-tech-margins.png\"\n           alt=\"Tech margins have lapped the field: Tech and Interactive Media at 25%\u0026#43; compared to 5-8% for the rest of the S\u0026amp;P 500\"\n           class=\"\"\n           width=\"1200\"\n           \n  
         loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-a16z-tech-margins-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/a16z-tech-margins.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Tech margins have lapped the field: Tech and Interactive Media at 25%\u0026#43; compared to 5-8% for the rest of the S\u0026amp;P 500\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nAQR\u0026rsquo;s response would be that fundamentals always look good near peaks. Their research shows a \u003cstrong\u003e50% probability\u003c/strong\u003e that realized equity returns will miss estimates by more than 3 percentage points annually over the next decade. Compressed premia don\u0026rsquo;t announce themselves with blaring headlines. They just quietly erode returns until investors notice they\u0026rsquo;ve been running in place.\u003c/p\u003e\n\u003cp\u003eCumulative hyperscaler capex is projected to reach \u003cstrong\u003e$4.8 trillion by 2030\u003c/strong\u003e. To achieve a 10% hurdle rate on that investment, AI revenue needs to hit roughly \u003cstrong\u003e$1 trillion annually by 2030\u003c/strong\u003e, about 1% of global GDP excluding China. \u003ca href=\"https://fortune.com/2025/11/17/is-ai-a-bubble-goldman-sachs-market-already-priced-in-19-trillion/\"\u003eGoldman Sachs estimates\u003c/a\u003e that $8 trillion in revenue could flow from the AI buildout, which at 20% margins and a 22x P/E multiple would create $35 trillion in new market cap. 
Only about $24 trillion has been pulled forward so far, leaving $11 trillion \u0026ldquo;on the table.\u0026rdquo;\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-a16z-ai-revenue-capex-targets-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/a16z-ai-revenue-capex-targets.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/a16z-ai-revenue-capex-targets.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/a16z-ai-revenue-capex-targets.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/a16z-ai-revenue-capex-targets.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-ai-revenue-capex-targets.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/a16z-ai-revenue-capex-targets.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/a16z-ai-revenue-capex-targets.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/a16z-ai-revenue-capex-targets.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-ai-revenue-capex-targets.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/a16z-ai-revenue-capex-targets.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/a16z-ai-revenue-capex-targets.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-ai-revenue-capex-targets.png\"\n           alt=\"Required AI-enabled revenue to meet return on capital targets: cumulative AI investment reaching $4.8 trillion by 2030 requires roughly $1 trillion in annual AI revenue\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-a16z-ai-revenue-capex-targets-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/a16z-ai-revenue-capex-targets.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Required AI-enabled revenue to meet return on capital targets: cumulative AI investment reaching $4.8 trillion by 2030 requires roughly $1 trillion in annual AI revenue\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nOr not. AQR would point out that the expected return for U.S. buyouts, private equity\u0026rsquo;s bread and butter, is now \u003cstrong\u003e4.2%\u003c/strong\u003e. That\u0026rsquo;s barely above the 3.9% for public large caps. The illiquidity premium has essentially vanished. If sophisticated PE firms can\u0026rsquo;t find excess returns, why should AI capex be different?\u003c/p\u003e\n\u003cp\u003eI find myself uncertain, which feels like the more honest position. 
Neither source is disinterested. a16z manages billions in venture capital and growth equity; bullish AI narratives support their portfolio valuations and fundraising. AQR runs systematic strategies that benefit when investors diversify away from concentrated U.S. tech exposure toward international equities and alternatives. Both are talking their book, which doesn\u0026rsquo;t make either wrong, but it\u0026rsquo;s worth noting.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThe a16z data on utilization and revenue growth is hard to dismiss. An 80% GPU utilization rate isn\u0026rsquo;t vaporware. Harvey users nearly tripled their time on the platform in nine months. Navan\u0026rsquo;s AI handles half of all customer interactions at satisfaction levels matching human agents. These are real products generating real engagement. But AQR\u0026rsquo;s valuation work has a longer track record. Their models don\u0026rsquo;t care about narratives, and historically that discipline has been valuable. When they say U.S. equities offer the lowest expected returns among major markets, that\u0026rsquo;s not pessimism. It\u0026rsquo;s arithmetic.\u003c/p\u003e\n\u003cp\u003eThe reconciliation might be this: AI winners could thrive spectacularly while broad market indices disappoint. a16z\u0026rsquo;s portfolio companies operate in a different universe than the average S\u0026amp;P 500 constituent. Compressed risk premia can coexist with individual companies generating enormous returns. The question is whether you\u0026rsquo;re buying the index or picking the winners.\u003c/p\u003e\n\u003cp\u003eNon-U.S. developed markets, by the way, offer expected real returns of about 5.3%, versus 3.9% for U.S. large caps. The valuation gap is real even if the AI story is true. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-aqr-expected-returns-equities-png-6\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/aqr-expected-returns-equities.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/aqr-expected-returns-equities.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/aqr-expected-returns-equities.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/aqr-expected-returns-equities.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/aqr-expected-returns-equities.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/aqr-expected-returns-equities.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/aqr-expected-returns-equities.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/aqr-expected-returns-equities.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/aqr-expected-returns-equities.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/aqr-expected-returns-equities.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/aqr-expected-returns-equities.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/aqr-expected-returns-equities.png\"\n           alt=\"AQR expected local real returns for equities: U.S. Large at 3.9%, Eurozone at 5.0%, UK at 4.9%, Japan at 4.9%\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-aqr-expected-returns-equities-png-6\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/aqr-expected-returns-equities.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"AQR expected local real returns for equities: U.S. Large at 3.9%, Eurozone at 5.0%, UK at 4.9%, Japan at 4.9%\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003caside class=\"disclaimer\" role=\"note\" aria-label=\"Disclaimer\"\u003e\n  \u003cdiv class=\"disclaimer-content\"\u003e\u003cp\u003e\u003cstrong\u003eDisclaimer:\u003c/strong\u003e All opinions expressed are my own. This is not investment, financial, tax, or legal advice. Past performance does not indicate future results. Do your own research and consult qualified professionals before making financial decisions. No liability accepted for any losses.\u003c/p\u003e\u003c/div\u003e\n\u003c/aside\u003e\n\n","summary":"a16z sees AI fundamentals thriving with 80% GPU utilization. AQR sees the CAPE at the 96th percentile. Both have data. 
Both may be right.","image":"https://static.philippdubach.com/ograph/ograph-two-ways-ai.jpg","date_published":"2026-01-31T00:00:00Z","date_modified":"2026-05-04T13:38:26+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI","Investing"],"_philippdubach":{"type":"Commentary","word_count":728,"reading_time_minutes":4,"keywords":["AI bubble 2026","CAPE ratio stock market","AI valuation 2026","a16z state of markets","expected stock returns 2026"],"section":"posts"}},{"id":"https://philippdubach.com/posts/bandits-and-agents-netflix-and-spotify-recommender-stacks-in-2026/","url":"https://philippdubach.com/posts/bandits-and-agents-netflix-and-spotify-recommender-stacks-in-2026/","title":"Bandits and Agents: Netflix and Spotify Recommender Stacks in 2026","content_html":"\u003cp\u003eHyperscalers spent over \u003ca href=\"https://www.goldmansachs.com/insights/articles/why-ai-companies-may-invest-more-than-500-billion-in-2026\"\u003e$350 billion on AI infrastructure\u003c/a\u003e in 2025 alone, with projections exceeding $500 billion in 2026. The trillion-dollar question is not whether machines can reason, but whether anyone can afford to let them. Hybrid recommender systems sit at the center of this tension. Large Language Models promised to transform how Netflix suggests your next show or how Spotify curates your morning playlist. Instead, the industry has split into two parallel universes, divided not by capability but by cost.\u003c/p\u003e\n\u003cp\u003eOn one side sits what engineers call the \u0026ldquo;classical stack\u0026rdquo;: matrix factorization, two-tower embedding models, and contextual bandits. These methods respond in microseconds, scale linearly with users, and run on nothing more complicated than dot products. A query costs a fraction of a cent. 
On the other side is the \u0026ldquo;agentic stack\u0026rdquo;: LLM-based reasoning engines that can handle requests like \u0026ldquo;find me a sci-fi movie that feels like Blade Runner but was made in the 90s.\u0026rdquo; This second approach consumes thousands of tokens per recommendation. The cost difference is not incremental; it is \u003ca href=\"https://www.softwareseni.com/understanding-inference-economics-and-why-ai-costs-spiral-beyond-proof-of-concept/\"\u003eorders of magnitude\u003c/a\u003e. LLM inference cost economics, more than any algorithmic breakthrough, is now the dominant force shaping recommender architecture.\u003c/p\u003e\n\u003cp\u003eThe 2026 consensus is a hybrid architecture: use the cheap, fast models for candidate generation from millions of items, then invoke the expensive reasoning layer only for the final dozen items a user actually sees. This \u0026ldquo;funnel\u0026rdquo; pattern — retrieval, then ranking, then re-ranking — is the only way to make the economics work. The smartest model is reserved for the fewest items.\u003c/p\u003e\n\u003cp\u003eWhat makes this work in practice goes back to a formalism from \u003ca href=\"https://www.jstor.org/stable/2332286\"\u003e1933\u003c/a\u003e: the multi-armed bandit. Imagine a gambler facing a row of slot machines, each with an unknown payout rate. She wants to maximize her winnings over a night of play. If she always pulls the arm with the highest observed payout, she might miss a better machine she never tried. If she explores too much, she wastes money on losers. The mathematics of this exploration–exploitation tradeoff define \u003cem\u003eregret\u003c/em\u003e:\u003c/p\u003e\n$$\nR(T) = \\mu^* \\cdot T - \\sum_{t=1}^{T} \\mu(a_t)\n$$\u003cp\u003eHere μ* is the best possible average reward, and μ(aₜ) is the reward from whatever arm she actually pulled at time t. Total regret is how much she left on the table by not knowing the optimal choice in advance. 
The goal of every multi-armed bandit algorithm in recommender systems is to make this quantity grow sublinearly in T: to learn fast enough that the average regret per step, R(T)/T, shrinks toward zero as the horizon grows. \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-slide10-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/slide10.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/slide10.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/slide10.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/slide10.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide10.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/slide10.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/slide10.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/slide10.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide10.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/slide10.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/slide10.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide10.png\"\n           alt=\"Multi-armed bandit recommender system diagram: a Learner taking Actions and receiving Rewards from an Environment, with the goal to maximize cumulative reward or minimize cumulative regret\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-slide10-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/slide10.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Multi-armed bandit recommender system diagram: a Learner taking Actions and receiving Rewards from an Environment, with the goal to maximize cumulative reward or minimize cumulative regret\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n The three main exploration strategies each take a different approach. Epsilon-greedy picks a random action with a small probability ε to avoid getting stuck. Upper Confidence Bound (UCB) picks the action with the highest optimistic estimate, which gives a bonus to actions whose value is still uncertain. Thompson Sampling draws from its posterior belief about each action, so it selects actions according to the probability that they are optimal. In practice, Thompson Sampling often matches or beats the others because its exploration is guided by posterior uncertainty rather than arbitrary randomness: it explores where it matters most. 
\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-slide12-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/slide12.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/slide12.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/slide12.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/slide12.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide12.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/slide12.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/slide12.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/slide12.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide12.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/slide12.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/slide12.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg 
src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide12.png\"\n           alt=\"Principles of Exploration in recommender systems: Naive Exploration (ε-greedy), Optimism in the Face of Uncertainty (UCB), and Probability Matching (Thompson Sampling)\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-slide12-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/slide12.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Principles of Exploration in recommender systems: Naive Exploration (ε-greedy), Optimism in the Face of Uncertainty (UCB), and Probability Matching (Thompson Sampling)\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n Every recommendation you see on \u003ca href=\"https://research.netflix.com/publication/lessons-learnt-from-consolidating-ml-models-in-a-large-scale-recommendation\"\u003eNetflix\u0026rsquo;s homepage\u003c/a\u003e is the output of an algorithm trying to minimize exactly this quantity, whether it realizes it or not.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eNetflix\u0026rsquo;s recommendation algorithm architecture runs this optimization across \u003ca href=\"https://www.slideshare.net/slideshow/a-multiarmed-bandit-framework-for-recommendations-at-netflix/102629078\"\u003ethree computation layers\u003c/a\u003e. Offline systems crunch terabytes of viewing history to train deep collaborative filtering models, a process that takes hours and happens on a schedule. 
Nearline systems update user embeddings seconds after a click, keeping the recommendations fresh without the cost of full retraining. Online systems respond to each page load in milliseconds, combining the precomputed signals with real-time context like time of day and device type. The architecture is a \u003ca href=\"https://netflixtechblog.com/post-training-generative-recommenders-with-advantage-weighted-supervised-finetuning-61a538d717a9\"\u003elatency-cost tradeoff\u003c/a\u003e: deep analysis happens in batch, while the user-facing layer stays fast. \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-slide28-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/slide28.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/slide28.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/slide28.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/slide28.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide28.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/slide28.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/slide28.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/slide28.png 1440w\"\n              sizes=\"80vw\"\u003e\n      
\u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide28.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/slide28.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/slide28.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide28.png\"\n           alt=\"Netflix recommendation algorithm architecture: Member Activity and Contextual Information flow through an Offline System for model training, then to an Online System where the Multi-Armed Bandit produces recommendations\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-slide28-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/slide28.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Netflix recommendation algorithm architecture: Member Activity and Contextual Information flow through an Offline System for model training, then to an Online System where the Multi-Armed Bandit produces recommendations\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n What Netflix learned from a decade of experimentation is counterintuitive. The goal is not to recommend what users will definitely watch, but what they would not have found on their own. 
They call this \u0026ldquo;incrementality.\u0026rdquo; A greedy algorithm that always surfaces the highest-probability titles just confirms what users already knew — it exploits without exploring, and in doing so collapses the discovery space. A better approach is to measure the \u003cem\u003ecausal effect\u003c/em\u003e of the recommendation: how much does showing this thumbnail increase the probability of a play compared to not showing it? Some titles have low baseline interest but high incrementality. Those are the ones worth featuring. This is the exploration–exploitation tradeoff made concrete: the value of a recommendation is not its predicted rating, but its marginal contribution to discovery. \u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-slide41-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/slide41.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/slide41.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/slide41.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/slide41.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide41.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/slide41.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/slide41.png 1024w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/slide41.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide41.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/slide41.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/slide41.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/slide41.png\"\n           alt=\"Netflix incrementality analysis: scatter plot showing incremental probability vs baseline probability, where Title A has low baseline but high incremental lift, while Title C has high baseline but less benefit from featuring\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-slide41-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/slide41.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Netflix incrementality analysis: scatter plot showing incremental probability vs baseline probability, where Title A has low baseline but high incremental lift, while Title C has high baseline but less benefit from featuring\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n Spotify\u0026rsquo;s AI DJ recommender system takes a different approach to the same problem. 
Their \u0026ldquo;\u003ca href=\"https://research.atspotify.com/2025/9/you-say-search-i-say-recs-a-scalable-agentic-approach-to-query-understanding\"\u003eAI DJ\u003c/a\u003e\u0026rdquo; feature uses what engineers internally call the \u0026ldquo;agentic router.\u0026rdquo; When you ask for \u0026ldquo;music for a rainy reading session in 1990s Seattle,\u0026rdquo; the router decides whether to invoke the expensive LLM reasoning layer or just fall back to keyword matching against collaborative filtering embeddings. Complex queries get the big model; simple ones get the fast path. This router is the economic governor of the entire system — an inference cost optimizer disguised as a product feature. Underneath the DJ\u0026rsquo;s personality, built on Spotify\u0026rsquo;s Sonantic voice synthesis and LLM-generated contextual narratives, sits a bandit framework called BaRT (Bandits for Recommendations as Treatments) that quietly balances what you know you like against what you might not yet know you need.\u003c/p\u003e\n\u003cp\u003eNot everyone is convinced the algorithms are making us better off. My own \u003ca href=\"https://philippdubach.com/posts/social-media-success-prediction-bert-models-for-post-titles/\"\u003eanalysis of social media success prediction\u003c/a\u003e found that sophisticated language models often just memorize temporal patterns rather than learning what actually makes content good. They learn the news cycle, not the news.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThe risk is that we build hybrid recommender systems that are technically brilliant but experientially hollow, engineering away the serendipity that made discovery meaningful in the first place. The recommender is becoming a curator, and the curator is becoming an agent. 
The architecture will keep evolving — foundation models for recommendations, reinforcement learning from human feedback applied to discovery, inference costs that continue their \u003ca href=\"https://a16z.com/llmflation-llm-inference-cost/\"\u003e10× annual decline\u003c/a\u003e — but the open question for 2026 is whether we want to be the curators of our own lives, or merely consumers of an optimized feed.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eSlides courtesy of \u0026ldquo;\u003ca href=\"https://www.slideshare.net/slideshow/a-multiarmed-bandit-framework-for-recommendations-at-netflix/102629078\"\u003eA Multi-Armed Bandit Framework for Recommendations at Netflix\u003c/a\u003e\u0026rdquo; by Jaya Kawale, Netflix.\u003c/em\u003e\u003c/p\u003e\n","summary":"How hybrid recommender systems balance multi-armed bandits against LLM inference cost economics in 2026. A deep dive into Netflix recommendation algorithm architecture and Spotify's AI DJ recommender system.","image":"https://static.philippdubach.com/ograph/ograph-recommender-architecture.jpg","date_published":"2026-01-30T00:00:00Z","date_modified":"2026-02-23T18:16:42+01:00","authors":[{"name":"Philipp D. 
Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Article","word_count":1038,"reading_time_minutes":5,"keywords":["hybrid recommender systems 2026","multi-armed bandits recommender systems","LLM inference cost economics","Netflix recommendation algorithm architecture","Spotify AI DJ recommender system","recommender systems 2026","inference economics","contextual bandits","exploration exploitation tradeoff","Thompson Sampling","candidate generation ranking","personalization at scale"],"section":"posts"}},{"id":"https://philippdubach.com/posts/is-private-equity-just-beta-with-a-lockup/","url":"https://philippdubach.com/posts/is-private-equity-just-beta-with-a-lockup/","title":"Is Private Equity Just Beta With a Lockup?","content_html":"\u003cp\u003eThe pitch used to be simple: accept illiquidity, get rewarded. Lock up your capital for seven years, tolerate capital calls and J-curves, and in exchange you\u0026rsquo;d earn returns that public markets couldn\u0026rsquo;t touch. It was the defining bargain of institutional investing for two decades.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://www.aqr.com/Insights/Research/Alternative-Thinking/2026-Capital-Market-Assumptions-for-Major-Asset-Classes\"\u003eAQR\u0026rsquo;s latest capital market assumptions\u003c/a\u003e make for uncomfortable reading if you\u0026rsquo;re an allocator to private markets. Their expected real return for U.S. buyouts over the next 5-10 years is \u003cstrong\u003e4.2%\u003c/strong\u003e. For U.S. large cap public equities, it\u0026rsquo;s \u003cstrong\u003e3.9%\u003c/strong\u003e. 
That\u0026rsquo;s a 30 basis point premium for accepting years of lockup, unpredictable capital calls, limited transparency, and the very real risk of picking the wrong manager.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-aqr-expected-returns-private-assets-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/aqr-expected-returns-private-assets.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/aqr-expected-returns-private-assets.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/aqr-expected-returns-private-assets.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/aqr-expected-returns-private-assets.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/aqr-expected-returns-private-assets.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/aqr-expected-returns-private-assets.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/aqr-expected-returns-private-assets.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/aqr-expected-returns-private-assets.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              
srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/aqr-expected-returns-private-assets.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/aqr-expected-returns-private-assets.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/aqr-expected-returns-private-assets.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/aqr-expected-returns-private-assets.png\"\n           alt=\"AQR Exhibit 6: Expected real returns for private assets showing U.S. Buyouts at 4.2%, U.S. Real Estate at 3.1%, and U.S. Private Credit at 2.6%\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-aqr-expected-returns-private-assets-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/aqr-expected-returns-private-assets.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"AQR Exhibit 6: Expected real returns for private assets showing U.S. Buyouts at 4.2%, U.S. Real Estate at 3.1%, and U.S. Private Credit at 2.6%\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nPrivate credit looks even worse. Expected returns dropped \u003cstrong\u003e0.5 percentage points\u003c/strong\u003e year over year as spreads narrowed and base rates came down. 
The asset class that was supposed to be the sensible alternative to stretched equity valuations now offers less compensation than it did twelve months ago.\u003c/p\u003e\n\u003cp\u003eThis isn\u0026rsquo;t a temporary dislocation. It\u0026rsquo;s the logical endpoint of too much capital chasing the same opportunities. When every pension fund, endowment, and sovereign wealth fund decides it needs a \u003ca href=\"https://www.cbh.com/insights/reports/u.s.-alternative-investment-industry-report-2025\"\u003e20-30% allocation to alternatives\u003c/a\u003e, the returns that made alternatives attractive get arbitraged away. The money didn\u0026rsquo;t find alpha. It became beta (with a lockup).\u003c/p\u003e\n\u003cp\u003eThe \u003ca href=\"https://docs.google.com/presentation/d/e/2PACX-1vQXsMMv5ZCWm77za7oXJcz1X-Th5Mz15g5nYBxbUjnomStVcjn8lXPjE5LzAlvc_hg4yHKgwASWLo5a/pub?start=false\u0026amp;loop=false\u0026amp;delayms=3000\u0026amp;slide=id.g3b6e2578ab2_8_4858\"\u003ea16z State of the Markets 2026\u003c/a\u003e report is just as revealing, and its dispersion numbers tell the story. In venture capital (2002-2019 vintages), top-decile managers generated a \u003cstrong\u003e31.7% IRR\u003c/strong\u003e while bottom-decile managers returned \u003cstrong\u003enegative 7%\u003c/strong\u003e. The spread between winners and losers is enormous. But that spread is precisely why average returns have compressed. 
Access to top-tier funds has always been limited, and everyone else is fighting over what\u0026rsquo;s left.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-a16z-irr-dispersion-by-strategy-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-irr-dispersion-by-strategy.png\"\n           alt=\"Net IRR dispersion by strategy for 2002-2019 vintages showing venture capital with top decile at 31.7% and bottom decile at negative 7%\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-a16z-irr-dispersion-by-strategy-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/a16z-irr-dispersion-by-strategy.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Net IRR dispersion by strategy for 2002-2019 vintages showing venture capital with top decile at 31.7% and bottom decile at negative 7%\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n AQR\u0026rsquo;s framework suggests something that few allocators want to hear: the illiquidity premium might be negative for most investors. 
If you\u0026rsquo;re not in the top quartile of manager selection, you\u0026rsquo;re accepting lockup risk for returns you could approximate in public markets with better liquidity and lower fees.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThe counterargument, and it\u0026rsquo;s a reasonable one, is that private markets offer exposure to companies you simply can\u0026rsquo;t access in public markets anymore. This part is true. \u003cstrong\u003e\u003ca href=\"https://www.apolloacademy.com/many-more-private-firms-in-the-us/\"\u003e87% of U.S. companies with more than $100 million in revenue are now private\u003c/a\u003e\u003c/strong\u003e. The top 10 private companies represent 38% of total unicorn valuation, and that share has nearly doubled since 2020. SpaceX, OpenAI, Anthropic, Databricks, Stripe: these are category-defining businesses, and they\u0026rsquo;re not on any exchange.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-a16z-companies-public-vs-private-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/a16z-companies-public-vs-private.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/a16z-companies-public-vs-private.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/a16z-companies-public-vs-private.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/a16z-companies-public-vs-private.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-companies-public-vs-private.png 1200w\"\n              
sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/a16z-companies-public-vs-private.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/a16z-companies-public-vs-private.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/a16z-companies-public-vs-private.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-companies-public-vs-private.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/a16z-companies-public-vs-private.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/a16z-companies-public-vs-private.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-companies-public-vs-private.png\"\n           alt=\"Share of U.S. 
companies with annual revenue greater than $100M showing private companies dominate\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-a16z-companies-public-vs-private-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/a16z-companies-public-vs-private.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Share of U.S. companies with annual revenue greater than $100M showing private companies dominate\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-a16z-top-10-private-companies-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/a16z-top-10-private-companies.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/a16z-top-10-private-companies.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/a16z-top-10-private-companies.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/a16z-top-10-private-companies.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-top-10-private-companies.png 1200w\"\n              
sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/a16z-top-10-private-companies.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/a16z-top-10-private-companies.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/a16z-top-10-private-companies.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-top-10-private-companies.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/a16z-top-10-private-companies.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/a16z-top-10-private-companies.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-top-10-private-companies.png\"\n           alt=\"Top 10 private companies represent 38% of total unicorn valuation in 2025, including SpaceX, OpenAI, Anthropic, Databricks, and Stripe\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-a16z-top-10-private-companies-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/a16z-top-10-private-companies.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  
\u003c/form\u003e\n  \u003cimg alt=\"Top 10 private companies represent 38% of total unicorn valuation in 2025, including SpaceX, OpenAI, Anthropic, Databricks, and Stripe\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nBut access isn\u0026rsquo;t the same as returns. You can have exposure to the most exciting companies in the world and still underperform a boring index fund if you pay too much or pick the wrong vintage. The minimum market cap for S\u0026amp;P 500 eligibility has \u003ca href=\"https://press.spglobal.com/2025-07-01-S-P-Dow-Jones-Indices-Announces-Update-to-S-P-Composite-1500-Market-Cap-Guidelines\"\u003etripled since 2019 to $22.7 billion\u003c/a\u003e. Companies are staying private longer, which means more value creation happens before public investors get a chance. It also means private investors are paying up for that privilege.\u003c/p\u003e\n\u003cp\u003eValue creation has moved earlier in the company lifecycle. For IPOs from 2014 to 2019, only \u003cstrong\u003e12% of median value\u003c/strong\u003e was created in private markets. For 2020-2023 IPOs, that number jumped to \u003cstrong\u003e55%\u003c/strong\u003e. 
If you want to capture returns from the next generation of important companies, you probably need private market exposure.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-a16z-value-creation-shift-private-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/a16z-value-creation-shift-private.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/a16z-value-creation-shift-private.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/a16z-value-creation-shift-private.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/a16z-value-creation-shift-private.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-value-creation-shift-private.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/a16z-value-creation-shift-private.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/a16z-value-creation-shift-private.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/a16z-value-creation-shift-private.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-value-creation-shift-private.png 1200w,\n                  
    https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/a16z-value-creation-shift-private.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/a16z-value-creation-shift-private.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/a16z-value-creation-shift-private.png\"\n           alt=\"Return potential has shifted to private markets: median value created in private markets went from 12% for 2014-2019 IPOs to 55% for 2020-2023 IPOs\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-a16z-value-creation-shift-private-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/a16z-value-creation-shift-private.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Return potential has shifted to private markets: median value created in private markets went from 12% for 2014-2019 IPOs to 55% for 2020-2023 IPOs\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nThe question really is what you\u0026rsquo;re paying for it. At 4.2% expected returns versus 3.9% for public equities, you\u0026rsquo;re giving up liquidity and flexibility for roughly 30 basis points of extra expected return. The premium that justified the allocation model has been competed away. If you\u0026rsquo;re in the top 5% of venture funds earning 60%+ IRR, none of this applies. 
For everyone else, the world has moved on.\u003c/p\u003e\n\u003caside class=\"disclaimer\" role=\"note\" aria-label=\"Disclaimer\"\u003e\n  \u003cdiv class=\"disclaimer-content\"\u003e\u003cp\u003e\u003cstrong\u003eDisclaimer:\u003c/strong\u003e All opinions expressed are my own. This is not investment, financial, tax, or legal advice. Past performance does not indicate future results. Do your own research and consult qualified professionals before making financial decisions. No liability accepted for any losses.\u003c/p\u003e\u003c/div\u003e\n\u003c/aside\u003e\n\n","summary":"AQR's 2026 data shows private equity returning 4.2% versus 3.9% for public equities. The 30bp illiquidity premium barely justifies years of lockup.","image":"https://static.philippdubach.com/ograph/ograph-illiquidity-premium.jpg","date_published":"2026-01-29T00:00:00Z","date_modified":"2026-02-23T18:16:42+01:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Quantitative Finance"],"_philippdubach":{"type":"Commentary","word_count":650,"reading_time_minutes":4,"keywords":["illiquidity premium","private equity returns 2026","AQR capital market assumptions","PE vs public equity","venture capital dispersion"],"section":"posts"}},{"id":"https://philippdubach.com/posts/britains-strategic-limbo/","url":"https://philippdubach.com/posts/britains-strategic-limbo/","title":"Britain's Strategic Limbo","content_html":"\u003cp\u003eThe UK is the country with no bloc.\u003c/p\u003e\n\u003cp\u003eAt Davos, Britain \u003ca href=\"https://www.washingtonpost.com/world/2026/01/22/trump-board-peace-davos-countries-involved/\"\u003erefused to join Trump\u0026rsquo;s Board of Peace\u003c/a\u003e, citing commitment to international law and rejection of the \u0026ldquo;pay-to-play\u0026rdquo; model. France, Germany, Sweden, Norway made the same choice. The difference is that those countries have somewhere else to go. 
Britain doesn\u0026rsquo;t.\u003c/p\u003e\n\u003cp\u003eThe \u003ca href=\"https://ukandeu.ac.uk/explainers/explainer-security-action-for-europe-safe/\"\u003eSAFE instrument\u003c/a\u003e, the EU\u0026rsquo;s €150 billion fund for joint defense procurement, is designed explicitly for strategic autonomy. Strict \u0026ldquo;Buy European\u0026rdquo; provisions limit non-EU subcontractors to 15-35% of contract value, phased out within two years. Canada, remarkably, \u003ca href=\"https://www.pm.gc.ca/en/news/news-releases/2025/12/01/prime-minister-carney-secures-canadas-participation-european-unions\"\u003enegotiated access\u003c/a\u003e and now has preferential treatment on par with EU firms. The UK \u003ca href=\"https://behorizon.org/safe-mechanism-reshaping-eu-defence-integration/\"\u003eremains excluded\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eTalks broke down in late 2025. London viewed the EU\u0026rsquo;s requirements for third-country participation as an infringement on sovereignty. The same sovereignty concerns that drove Brexit now lock Britain out of the emerging European defense architecture. The \u0026ldquo;mid-Atlantic bridge\u0026rdquo; was always a metaphor. Britain positioned itself as the hinge between American power and European integration, useful to both, dependent on neither. That positioning assumed both poles wanted a bridge. Now the US treats alliances as protection rackets, and the EU is building walls around its defense industrial base. The bridge has nowhere to land.\u003c/p\u003e\n\u003cp\u003eWhat does the Starmer government do? The choices were supposed to be theoretical. Align with Washington and accept the transactional terms of the \u003ca href=\"https://en.wikipedia.org/wiki/Donroe_Doctrine\"\u003eDonroe Doctrine\u003c/a\u003e. Align with Brussels and accept the sovereignty constraints of SAFE participation. 
Or go it alone, with a defense budget that can\u0026rsquo;t sustain independent capability against peer competitors.\u003c/p\u003e\n\u003cp\u003eThe \u003ca href=\"https://www.iiss.org/research-paper/2025/12/the-safe-regulation-and-its-implications-for-non-eu-defence-suppliers/\"\u003eIISS analysis\u003c/a\u003e of SAFE\u0026rsquo;s implications for non-EU suppliers is blunt: firms outside the bloc face structural disadvantages that compound over time. Procurement cycles last decades. If British defense firms are locked out of European contracts now, the gap widens with each passing year. The industrial base erodes.\u003c/p\u003e\n\u003cp\u003e\u0026ldquo;Global Britain\u0026rdquo; was the slogan after Brexit, a vision of nimble bilateral relationships unconstrained by Brussels bureaucracy. The reality is that global influence requires either hard power or bloc membership. Britain has neither the military budget for the former nor the political will for the latter.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://philippdubach.com/posts/the-rise-of-middle-power-realism/\"\u003eCanada\u0026rsquo;s pivot\u003c/a\u003e is instructive. Facing similar pressure from Washington, Carney diversified: joining SAFE, negotiating with Beijing, and building horizontal coalitions with other middle powers. Britain has done none of this. It refused the Board of Peace on principle but hasn\u0026rsquo;t found an alternative structure to join on pragmatic grounds.\u003c/p\u003e\n\u003cp\u003ePrinciple without alternatives is just isolation. The UK is learning what it means to be a middle power without a coalition, morally opposed to the new American order but structurally excluded from the European one.\u003c/p\u003e\n","summary":"Britain faces strategic isolation: locked out of EU defense cooperation, unwilling to join Trump's coalition. 
The mid-Atlantic bridge has nowhere to land.","image":"https://static.philippdubach.com/ograph/0016.png","date_published":"2026-01-28T00:00:00Z","date_modified":"2026-02-23T18:16:42+01:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Macro"],"_philippdubach":{"type":"Analysis","word_count":432,"reading_time_minutes":3,"keywords":["UK EU defense cooperation","Brexit defense policy","SAFE fund exclusion","Britain strategic isolation","post-Brexit foreign policy","defense procurement economics","trade bloc exclusion","economic isolation"],"section":"posts"}},{"id":"https://philippdubach.com/posts/the-rise-of-middle-power-realism/","url":"https://philippdubach.com/posts/the-rise-of-middle-power-realism/","title":"The Rise of Middle Power Realism","content_html":"\u003cp\u003eAt Davos 2026, Canadian Prime Minister \u003ca href=\"https://en.wikipedia.org/wiki/Mark_Carney\"\u003eMark Carney\u003c/a\u003e delivered a speech that received something rare at these gatherings: a standing ovation. Carney told the assembled elites what they already knew but hadn\u0026rsquo;t said aloud: \u003ca href=\"https://www.weforum.org/stories/2026/01/davos-2026-special-address-by-mark-carney-prime-minister-of-canada/\"\u003ethe world is not in a \u0026ldquo;transition\u0026rdquo; but a \u0026ldquo;rupture.\u0026rdquo;\u003c/a\u003e\u003c/p\u003e\n\u003cp\u003eThe speech drew on Václav Havel\u0026rsquo;s 1978 essay \u003cem\u003eThe Power of the Powerless\u003c/em\u003e, specifically \u003ca href=\"https://www.nonviolent-conflict.org/resource/the-power-of-the-powerless/\"\u003ethe parable of the greengrocer\u003c/a\u003e who displays the slogan \u0026ldquo;Workers of the World, Unite!\u0026rdquo; in his shop window. The grocer doesn\u0026rsquo;t believe the slogan. He displays it to signal submission, to live in harmony with the regime. 
Carney\u0026rsquo;s application was pointed: for years, US allies have displayed the signs of the liberal international order, pretending the partnership was mutual, that rules mattered, that values were shared, even as reality diverged.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;It is time for companies and countries to take their signs down.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eWhat followed the speech was more interesting than the speech itself. Days later, Canada became the first non-European G7 nation to join the EU\u0026rsquo;s \u003ca href=\"https://www.pm.gc.ca/en/news/news-releases/2025/12/01/prime-minister-carney-secures-canadas-participation-european-unions\"\u003eSAFE defense initiative\u003c/a\u003e, a €150 billion fund for joint European defense procurement. Canadian firms now have \u003ca href=\"https://www.squirepattonboggs.com/insights/publications/canadian-companies-to-be-allowed-preferential-access-under-eu-safe-defence-investment-program/\"\u003epreferential access\u003c/a\u003e to the European defense market, treated on par with EU companies. Days before Davos, Carney had traveled to Beijing to secure a \u003ca href=\"https://www.china-briefing.com/news/china-canada-trade-deal-preliminary-agreement/\"\u003epreliminary trade agreement\u003c/a\u003e on electric vehicles: 49,000 units at a 6.1% tariff, compared with the 100% tariff the US imposes.\u003c/p\u003e\n\u003cp\u003eThe intellectual framework Carney articulated has a name now: \u0026ldquo;middle power realism.\u0026rdquo; It\u0026rsquo;s built on three observations.\u003c/p\u003e\n\u003cp\u003e(1) The US is no longer a reliable partner. Not because of Trump specifically, but because American politics has shifted in ways that make transactional unilateralism the new baseline. 
The \u003ca href=\"https://en.wikipedia.org/wiki/Donroe_Doctrine\"\u003e\u0026ldquo;Donroe Doctrine\u0026rdquo;\u003c/a\u003e, a portmanteau of \u0026ldquo;Donald\u0026rdquo; and \u0026ldquo;Monroe\u0026rdquo;, asserts American hegemony over the Western Hemisphere with a resource-driven, security-focused twist. It treats alliances as protection rackets and international law as an impediment.\u003c/p\u003e\n\u003cp\u003e(2) Nostalgia is dangerous. The pre-2016 order isn\u0026rsquo;t coming back. Waiting for \u0026ldquo;normal\u0026rdquo; to return is a strategy for decline. Middle powers that don\u0026rsquo;t build domestic strength and horizontal coalitions will find themselves, as Carney put it, invoking Thucydides, \u003ca href=\"https://www.theguardian.com/business/live/2026/jan/20/davos-von-der-leyen-he-macron-carney-wef-greenland-trump-uk-unemployment-business-live-news-updates\"\u003e\u0026ldquo;on the menu.\u0026rdquo;\u003c/a\u003e\u003c/p\u003e\n\u003cp\u003e(3) Sovereignty requires the capacity to say no. That means diversified partnerships, even with rivals. Canada\u0026rsquo;s China deal infuriated Trump, who \u003ca href=\"https://www.washingtonexaminer.com/news/world/4433043/carney-no-intention-signing-free-trade-deal-china/\"\u003eaccused Carney of allowing a \u0026ldquo;Trojan Horse\u0026rdquo;\u003c/a\u003e into the continent. But from Ottawa\u0026rsquo;s perspective, the ability to trade with Beijing is precisely what makes Canadian sovereignty credible. You can\u0026rsquo;t negotiate from strength if you have no alternatives.\u003c/p\u003e\n\u003cp\u003eThe European response follows similar logic. 
During the Greenland crisis, when Trump threatened tariffs on eight European nations and refused to rule out military force to \u0026ldquo;secure\u0026rdquo; the island, the EU \u003ca href=\"https://www.theguardian.com/commentisfree/2026/jan/23/europe-trump-climbdown-genuflecting-tacos-greenland\"\u003ethreatened to deploy its Anti-Coercion Instrument\u003c/a\u003e against the United States. For the first time, the bloc signaled willingness to engage in a trade war with its primary security guarantor to protect the sovereignty of a member state.\u003c/p\u003e\n\u003cp\u003eThe \u003ca href=\"https://defence-industry-space.ec.europa.eu/eu-defence-industry/safe-security-action-europe_en\"\u003eSAFE instrument\u003c/a\u003e itself is designed for strategic autonomy. Strict \u0026ldquo;Buy European\u0026rdquo; provisions limit subcontractors from non-EU countries to 15-35% of contract value, phased out within two years. The explicit goal is ITAR-free supply chains: defense procurement that doesn\u0026rsquo;t depend on American permission. Meanwhile, the UK, which refused Trump\u0026rsquo;s Board of Peace but remains \u003ca href=\"https://behorizon.org/safe-mechanism-reshaping-eu-defence-integration/\"\u003eexcluded from SAFE\u003c/a\u003e due to post-Brexit negotiating failures, finds itself in strategic limbo. Alienated from Washington and locked out of the European defense architecture, Britain is watching its \u0026ldquo;mid-Atlantic bridge\u0026rdquo; collapse.\u003c/p\u003e\n\u003cp\u003eThere\u0026rsquo;s a strange inversion happening in the international system. At Davos, \u003ca href=\"https://timesofindia.indiatimes.com/world/china/its-not-about-gaza-is-un-real-target-of-trumps-board-of-peace-china-emerges-as-unlikely-defender/articleshow/126967201.cms\"\u003eChina positioned itself as the defender of the UN Charter\u003c/a\u003e, rejecting Trump\u0026rsquo;s \u0026ldquo;Board of Peace\u0026rdquo; as a parallel structure that undermines international law. 
The authoritarian superpower defending liberal institutions while the democratic superpower seeks to dismantle them. China benefits from a multipolar system with weak enforcement mechanisms. The US benefits from a unipolar system where it makes the rules. Middle powers benefit from rules that constrain the strong, which is why the Global South \u003ca href=\"https://en.wikipedia.org/wiki/2026_Mark_Carney_speech_at_the_World_Economic_Forum\"\u003efound validation in Carney\u0026rsquo;s speech\u003c/a\u003e. The admission that the \u0026ldquo;Rules-Based Order\u0026rdquo; was often cover for Western interests resonated with nations that experienced that hypocrisy firsthand.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThe term \u003cem\u003e\u0026ldquo;middle power\u0026rdquo;\u003c/em\u003e has always been slightly embarrassing, an admission of limits, a confession that you\u0026rsquo;re not at the top table. But there\u0026rsquo;s a realism emerging in these countries that the great powers lack. They can\u0026rsquo;t afford illusions about the international system because they don\u0026rsquo;t control it. They have to see clearly or get crushed.\u003c/p\u003e\n\u003cp\u003eCarney\u0026rsquo;s greengrocer metaphor cuts both ways. Yes, taking down the sign exposes the illusion. But it also means operating without the protection the illusion provided. The grocer who removes the slogan faces consequences. So do countries. Canada is betting it can navigate between giants, trading with China, defending alongside Europe, maintaining what leverage it has with Washington. The EU is betting it can build autonomous defense capacity fast enough to matter. Japan, Australia, and others are making similar calculations, hedging relationships that used to be taken for granted.\u003c/p\u003e\n","summary":"At Davos 2026, Carney told allies to take down the signs of the liberal order. 
Middle powers are learning to navigate between giants without illusions.","image":"https://static.philippdubach.com/ograph/0015.png","date_published":"2026-01-27T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Macro"],"_philippdubach":{"type":"Analysis","word_count":813,"reading_time_minutes":4,"keywords":["Mark Carney Davos 2026 speech","middle power foreign policy","Donroe Doctrine Trump","Canada EU SAFE defense","rules-based international order collapse","trade policy economics","tariff response strategy","economic coercion"],"section":"posts"}},{"id":"https://philippdubach.com/posts/the-most-expensive-assumption-in-ai/","url":"https://philippdubach.com/posts/the-most-expensive-assumption-in-ai/","title":"The Most Expensive Assumption in AI","content_html":"\u003cp\u003eSara Hooker\u0026rsquo;s paper arrived with impeccable timing. \u003cem\u003e\u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5877662\"\u003eOn the slow death of scaling\u003c/a\u003e\u003c/em\u003e dropped just as hyperscalers were committing another $500 billion to GPU infrastructure, bringing the industry\u0026rsquo;s total capital committed to the scaling thesis to somewhere north of a trillion dollars. I\u0026rsquo;ve been \u003ca href=\"/posts/how-ai-is-shaping-my-investment-portfolio-for-2026/\"\u003etracking these capital flows\u003c/a\u003e for my own portfolio. 
Either Hooker is early to a generational insight or she\u0026rsquo;s about to be very publicly wrong.\u003cfigure class=\"post-figure\" style=\"width: 100%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-hyperscaler_capex2-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/hyperscaler_capex2.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/hyperscaler_capex2.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/hyperscaler_capex2.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/hyperscaler_capex2.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/hyperscaler_capex2.png 1200w\"\n              sizes=\"100vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/hyperscaler_capex2.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/hyperscaler_capex2.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/hyperscaler_capex2.png 1440w\"\n              sizes=\"100vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/hyperscaler_capex2.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/hyperscaler_capex2.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/hyperscaler_capex2.png 2000w\"\n              sizes=\"100vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/hyperscaler_capex2.png\"\n           alt=\"Hyperscaler AI capital expenditure 2019-2025\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-hyperscaler_capex2-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/hyperscaler_capex2.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Hyperscaler AI capital expenditure 2019-2025\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nThe core argument is very simple: bigger is not always better. \u003ca href=\"https://www.tii.ae/news/falcon-2-uaes-technology-innovation-institute-releases-new-ai-model-series-outperforming-metas\"\u003eLlama-3 8B outperforms Falcon 180B\u003c/a\u003e. \u003ca href=\"https://arxiv.org/abs/2211.05100\"\u003eAya 23 8B beats BLOOM 176B\u003c/a\u003e despite having only 4.5% of the parameters. These are not isolated flukes. Hooker plots submissions to the Open LLM Leaderboard over two years and finds a systematic trend where compact models consistently outperform their bloated predecessors. The bitter lesson, as Rich Sutton framed it, was that brute force compute always wins. 
Hooker\u0026rsquo;s counter is that maybe we\u0026rsquo;ve been held hostage to \u0026ldquo;a painfully simple formula\u0026rdquo; that\u0026rsquo;s now breaking down.\u003cfigure class=\"post-figure\" style=\"width: 100%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-model_size_vs_performance2-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/model_size_vs_performance2.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/model_size_vs_performance2.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/model_size_vs_performance2.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/model_size_vs_performance2.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/model_size_vs_performance2.png 1200w\"\n              sizes=\"100vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/model_size_vs_performance2.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/model_size_vs_performance2.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/model_size_vs_performance2.png 1440w\"\n              sizes=\"100vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/model_size_vs_performance2.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/model_size_vs_performance2.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/model_size_vs_performance2.png 2000w\"\n              sizes=\"100vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/model_size_vs_performance2.png\"\n           alt=\"Model size vs benchmark performance showing smaller models outperforming larger ones\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-model_size_vs_performance2-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/model_size_vs_performance2.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Model size vs benchmark performance showing smaller models outperforming larger ones\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nScaling laws, she notes, only reliably predict pre-training test loss. When you look at actual downstream performance, the results are \u0026ldquo;murky or inconsistent.\u0026rdquo; The term \u0026ldquo;emergent properties\u0026rdquo; gets thrown around to describe capabilities that appear suddenly at scale, but Hooker points out this is really just a fancy way of admitting we have no idea what\u0026rsquo;s coming. 
If your scaling law can\u0026rsquo;t predict emergence, it\u0026rsquo;s not much of a law.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://en.wikipedia.org/wiki/Gary_Marcus\"\u003eGary Marcus\u003c/a\u003e has been making a related argument from a different angle. The cognitive scientist, whose 2001 book predicted hallucination problems, calls LLMs \u0026ldquo;glorified memorization machines\u0026rdquo; that work because the internet contains answers to most common queries. His framing is less academic and more market-oriented: the jump from GPT-1 to GPT-4 showed obvious qualitative leaps requiring no benchmarks. The jump from GPT-4 to GPT-5? Marginal improvements requiring careful measurement. The textbook definition of diminishing returns.\u003c/p\u003e\n\u003cp\u003eThe market signals are worth watching. According to \u003ca href=\"https://www.ft.com/content/a081aa60-eaca-4413-ba15-489762154c57\"\u003eGoldman Sachs data\u003c/a\u003e, hedge fund short interest in utilities now sits at the 99th percentile relative to the past five years. Utilities. The bet appears to be that AI data center demand, the premise on which \u003ca href=\"https://www.reuters.com/business/energy/american-electric-power-signs-265-billion-deal-fuel-cells-2026-01-08/\"\u003eAmerican Electric Power trades at $65 billion\u003c/a\u003e, may not materialize as expected. Meanwhile, names like Bloom Energy, Oracle, and various AI-adjacent plays are showing up on heavily shorted lists. Hedge funds aren\u0026rsquo;t yet betting against Nvidia directly, but they\u0026rsquo;re circling the weaker members of the herd.\u003c/p\u003e\n\u003cp\u003eThere\u0026rsquo;s a certain irony here that Hooker captures well. Academia was effectively priced out of meaningful AI research by the compute arms race. 
The explosion in necessary compute \u0026ldquo;marginalized academia from meaningfully participating in AI progress.\u0026rdquo; Industry labs, meanwhile, largely stopped publishing to preserve commercial advantage. Now, as scaling hits diminishing returns, the skills that matter shift back toward algorithmic cleverness, data quality, and architectural innovation, none of which requires a billion-dollar data center. If you got priced out of the game, the game may be coming back to you. As Hooker writes:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eThe less reliable gains from compute makes our purview as computer scientists interesting again\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe quiet tell is how frontier labs are actually behaving. Major players are now incorporating classical symbolic tools, such as Python interpreters and code execution, into LLM pipelines. These symbolic components run on CPUs, not GPUs. \u003ca href=\"https://en.wikipedia.org/wiki/Ilya_Sutskever\"\u003eIlya Sutskever\u003c/a\u003e, coauthor of the 2012 ImageNet paper and OpenAI cofounder, has said publicly:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eWe need to go back to the age of research\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eShorting the scaling thesis has been a widow-maker trade for the better part of three years. Nvidia is up roughly 800% since 2022. As I\u0026rsquo;ve \u003ca href=\"/posts/the-market-can-stay-irrational-longer-than-you-can-stay-solvent/\"\u003ewritten before\u003c/a\u003e, the market can remain irrational longer than you can remain solvent, and that applies in both directions. OpenAI reportedly burns around $3 billion a month, so a $40 billion funding round buys perhaps 13 months of runway. If the next mega-round prices down or requires distressed terms, that\u0026rsquo;s your signal. 
Until then, the thesis may be directionally correct on the technical limitations while the timing remains treacherous.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eWe can only see a short distance ahead, but we can see plenty there that needs to be done.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eSo wrote Alan Turing, in a line Hooker quotes approvingly. The scaling era produced real capabilities alongside real capital misallocation. What comes next is genuinely uncertain. That uncertainty cuts both ways.\u003c/p\u003e\n","summary":"Sara Hooker's research challenges the trillion-dollar scaling thesis. Compact models now outperform massive ones as diminishing returns hit AI.","image":"https://static.philippdubach.com/ograph/ograph-scaling.jpg","date_published":"2026-01-26T00:00:00Z","date_modified":"2026-02-23T18:16:42+01:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Commentary","word_count":707,"reading_time_minutes":4,"keywords":["AI scaling laws","diminishing returns AI","small vs large language models","Sara Hooker scaling","AI infrastructure investing"],"section":"posts"}},{"id":"https://philippdubach.com/posts/against-all-odds-the-mathematics-of-provably-fair-casino-games/","url":"https://philippdubach.com/posts/against-all-odds-the-mathematics-of-provably-fair-casino-games/","title":"Against All Odds: The Mathematics of 'Provably Fair' Casino Games","content_html":"\u003cbr\u003e\n\u003cblockquote\u003e\n\u003cp\u003eGambling can be harmful and lead to significant losses. Participation is subject to local laws and age restrictions. Always gamble responsibly. Need help? 
Visit BeGambleAware.org\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cbr\u003e\n\u003cp\u003eCrash games represent a category of online gambling where players place bets on an increasing multiplier that can \u003cem\u003e\u0026lsquo;crash\u0026rsquo;\u003c/em\u003e at any moment. 
The fundamental mechanic requires players to cash out before the crash occurs; successful cash-outs yield the bet amount multiplied by the current multiplier, while failure results in total loss of the wager.\u003c/p\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-flight-game-gif-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/flight-game.gif 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/flight-game.gif 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/flight-game.gif 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/flight-game.gif 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/flight-game.gif 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/flight-game.gif 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/flight-game.gif 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/flight-game.gif 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/flight-game.gif 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/flight-game.gif 1600w,\n         
             https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/flight-game.gif 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/flight-game.gif\"\n           alt=\"Crash game showing an airplane flying with increasing multiplier until it crashes\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-flight-game-gif-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/flight-game.gif\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Crash game showing an airplane flying with increasing multiplier until it crashes\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\n\u003cp\u003eThe specific game I came across is a variant that employs an aircraft flight metaphor. Let\u0026rsquo;s call it \u003cem\u003ePlane Game\u003c/em\u003e. What intrigued me wasn\u0026rsquo;t the game itself but that it said \u0026ldquo;provably fair\u0026rdquo; on the startup screen, which I assumed to be a typo at first. I stand corrected:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eA provably fair gambling system uses cryptography to let players verify that each outcome was generated from fixed inputs, rather than chosen or altered by the operator after a bet is placed. 
The casino commits to a hidden \u0026ldquo;server seed\u0026rdquo; via a public hash, combines it with a player-controlled \u0026ldquo;client seed\u0026rdquo; and a per-bet nonce, and later reveals the server seed so anyone can recompute and confirm the result.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe stated Return-to-Player (RTP) of that specific game is 97%, implying a 3% \u003ca href=\"https://www.investopedia.com/articles/personal-finance/110415/why-does-house-always-win-look-casino-profitability.asp\"\u003ehouse edge\u003c/a\u003e. After watching a few rounds, the perceived probability felt off. And if there\u0026rsquo;s something that gets my attention, it\u0026rsquo;s \u003ca href=\"/posts/counting-cards-with-computer-vision/\"\u003ethe combination of games and statistics\u003c/a\u003e. So I did what any reasonable person would do: I watched another 20,000 rounds over six days (112 hours total) and wrote \u003ca href=\"https://static.philippdubach.com/pdf/202601_PD_DUBACH_The%20Online%20Gambling%20Fairness%20Paradox.pdf\"\u003ea paper about it\u003c/a\u003e.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-crash_game_stats-png-1\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/crash_game_stats.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/crash_game_stats.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/crash_game_stats.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/crash_game_stats.png 960w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/crash_game_stats.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/crash_game_stats.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/crash_game_stats.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/crash_game_stats.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/crash_game_stats.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/crash_game_stats.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/crash_game_stats.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/crash_game_stats.png\"\n           alt=\"Script recording 20000 rounds over six days (112 hours total)\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-crash_game_stats-png-1\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/crash_game_stats.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Script recording 20000 rounds 
over six days (112 hours total)\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eThe distribution below shows the classic heavy tail: most rounds crash quickly at low multipliers, while rare events produce 100x or even 1000x payouts. The maximum I observed was 10,000x. This extreme variance creates the illusion of big wins just around the corner while the house edge operates relentlessly over time.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-fig_distribution2-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/fig_distribution2.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/fig_distribution2.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/fig_distribution2.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/fig_distribution2.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_distribution2.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/fig_distribution2.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/fig_distribution2.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/fig_distribution2.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n    
          srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_distribution2.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/fig_distribution2.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/fig_distribution2.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_distribution2.png\"\n           alt=\"Heavy-tailed distribution of crash multipliers on log-log scale showing most rounds end at low multipliers while rare events exceed 100x or 1000x, with maximum observed at 10,000x\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-fig_distribution2-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/fig_distribution2.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Heavy-tailed distribution of crash multipliers on log-log scale showing most rounds end at low multipliers while rare events exceed 100x or 1000x, with maximum observed at 10,000x\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nFor a crash game with RTP = r (where 0 \u0026lt; r \u0026lt; 1), the crash multiplier M follows a power-law distribution. Its survival function is what matters for any cash-out decision:\u003c/p\u003e\n$$P(M \\geq m) = \\frac{r}{m}, \\quad m \\geq 1$$\u003cp\u003eThis means the probability of reaching at least multiplier m before crashing equals r/m. 
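\u003c/p\u003e\n\u003cp\u003eBecause the survival function is this simple, synthetic rounds can be generated by inverse-transform sampling and compared with r/m. The Python sketch below is illustrative only. It assumes a textbook sampler rather than the game\u0026rsquo;s seed-based derivation, and \u003ccode\u003esample_crash\u003c/code\u003e and \u003ccode\u003esurvival_table\u003c/code\u003e are hypothetical names.\u003c/p\u003e\n\u003cpre\u003e\u003ccode class=\"language-python\"\u003e
```python
import random


def sample_crash(r, rng):
    # Inverse-transform sampling with U uniform on (0, 1]: for a target m above 1,
    # M = r / U reaches m exactly when U is at most r / m, so P(M >= m) = r / m.
    u = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so we never divide by zero
    if u > r:
        return 1.0  # the remaining 1 - r of rounds bust immediately
    return r / u


def survival_table(r=0.97, n=200_000, targets=(2.0, 5.0, 10.0), seed=42):
    # Compare empirical hit rates with the theoretical r / m and report the
    # expected profit of a unit bet at each cash-out target.
    rng = random.Random(seed)
    draws = [sample_crash(r, rng) for _ in range(n)]
    rows = []
    for m in targets:
        hit = sum(1 for x in draws if x >= m) / n
        ev = hit * m - 1.0  # collect m on success; the unit stake is paid either way
        rows.append((m, hit, r / m, ev))
    return rows


if __name__ == '__main__':
    for m, hit, theory, ev in survival_table():
        print(f'{m:>5.1f}x  empirical {hit:.3f}  theory {theory:.3f}  EV {ev:+.3f}')
```
\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eShifting the uniform draw to (0, 1] avoids dividing by zero. Treating draws above r as an immediate bust keeps the hit rate at r/m for every target above 1x.\u003c/p\u003e\n\u003cp\u003e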
For any cash-out target m, the expected profit on a unit bet is the same:\u003c/p\u003e\n$$E[\\text{Profit}] = P(M \\geq m) \\times m - 1 = \\frac{r}{m} \\times m - 1 = r - 1 = -0.03 \\quad \\text{for } r = 0.97$$\n\u003cp\u003eThis mathematical property makes crash games theoretically \u0026ldquo;strategy-proof\u0026rdquo; in expectation. No cash-out timing strategy should yield better long-term results than another.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-fig_survival_annotated-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/fig_survival_annotated.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/fig_survival_annotated.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/fig_survival_annotated.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/fig_survival_annotated.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_survival_annotated.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/fig_survival_annotated.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/fig_survival_annotated.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/fig_survival_annotated.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource 
media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_survival_annotated.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/fig_survival_annotated.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/fig_survival_annotated.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_survival_annotated.png\"\n           alt=\"Survival probability curve on log-log scale showing probability of reaching target multiplier: 2x succeeds 48.5% of the time, 5x at 19.6%, 10x at 9.7%, 50x at 2.0%, and 100x at just 1.1%\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-fig_survival_annotated-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/fig_survival_annotated.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Survival probability curve on log-log scale showing probability of reaching target multiplier: 2x succeeds 48.5% of the time, 5x at 19.6%, 10x at 9.7%, 50x at 2.0%, and 100x at just 1.1%\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nThe empirical data matches theory almost perfectly. A 2x target succeeds about 48.5% of the time. Aiming for 10x? That works only 9.7% of rounds. 
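\u003c/p\u003e\n\u003cp\u003eA power-law tail like this also lends itself to a direct estimate of its exponent. One standard tool is the Hill estimator, which averages log-excesses over the largest observations. The Python sketch below is a minimal illustration. It runs on synthetic draws from the theoretical distribution, not on the recorded rounds. \u003ccode\u003ehill_tail_index\u003c/code\u003e, \u003ccode\u003esynthetic_rounds\u003c/code\u003e and the choice k = 2,000 are assumptions for illustration.\u003c/p\u003e\n\u003cpre\u003e\u003ccode class=\"language-python\"\u003e
```python
import math
import random


def hill_tail_index(samples, k):
    # Hill estimator on the k largest observations. It estimates the tail
    # index a in P(M >= m) ~ C * m ** -a; the density then decays like
    # m ** -(a + 1), so the r / m survival curve corresponds to a = 1.
    xs = sorted(samples, reverse=True)
    threshold = xs[k]  # the (k + 1)-th largest observation
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess


def synthetic_rounds(r=0.97, n=20_000, seed=3):
    # Stand-in data drawn from the theoretical distribution, not recorded rounds.
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = 1.0 - rng.random()
        draws.append(1.0 if u > r else r / u)
    return draws


if __name__ == '__main__':
    a = hill_tail_index(synthetic_rounds(), k=2_000)
    print(f'tail index {a:.3f}, implied density exponent {a + 1:.3f}')
```
\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eThe estimate refers to the survival tail, so a fair r/m game should give a value near 1 and a density exponent near 2. The choice of k trades bias against variance.\u003c/p\u003e\n\u003cp\u003e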
The close fit between my observations and the theoretical line confirms the stated 97% RTP.\u003c/p\u003e\n\u003cp\u003eSo is the game fair? My analysis says yes. Using three different statistical methods (log-log regression, maximum likelihood, and the Hill estimator), I estimated the probability density function exponent at α ≈ 1.98, within 2.2% of the theoretical value of 2.0. This contrasts with \u003ca href=\"https://www.nature.com/articles/s41598-019-50168-2\"\u003eWang and Pleimling\u0026rsquo;s 2019 research\u003c/a\u003e that found exponents of 1.4 to 1.9 for player cashout distributions. The key distinction: their deviations reflect player behavioral biases (probability weighting), not game manipulation. The random number generator produces fair outcomes.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-fig_qq_enhanced-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/fig_qq_enhanced.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/fig_qq_enhanced.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/fig_qq_enhanced.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/fig_qq_enhanced.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_qq_enhanced.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/fig_qq_enhanced.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/fig_qq_enhanced.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/fig_qq_enhanced.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_qq_enhanced.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/fig_qq_enhanced.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/fig_qq_enhanced.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_qq_enhanced.png\"\n           alt=\"Q-Q plot comparing empirical vs theoretical quantiles with perfect fit line and 10% confidence band, showing close alignment confirming fair random number generation\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-fig_qq_enhanced-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/fig_qq_enhanced.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Q-Q plot comparing empirical vs theoretical quantiles with perfect fit line and 10% confidence band, showing close alignment confirming fair random number generation\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nI then ran Monte Carlo simulations of 10,000 betting sessions under 
four different strategies: conservative 1.5x cashouts, moderate 2.0x, aggressive 3.0x, and high-risk 5.0x targets.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-fig_strategies-png-6\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/fig_strategies.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/fig_strategies.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/fig_strategies.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/fig_strategies.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_strategies.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/fig_strategies.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/fig_strategies.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/fig_strategies.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_strategies.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/fig_strategies.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/fig_strategies.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_strategies.png\"\n           alt=\"Strategy comparison boxplot showing session returns for 100 rounds: 1.5x Conservative averages -2.9%, 2.0x Moderate -2.4%, 3.0x Aggressive -3.3%, and 5.0x High Risk -3.5%, all negative\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-fig_strategies-png-6\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/fig_strategies.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Strategy comparison boxplot showing session returns for 100 rounds: 1.5x Conservative averages -2.9%, 2.0x Moderate -2.4%, 3.0x Aggressive -3.3%, and 5.0x High Risk -3.5%, all negative\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nEvery single strategy produces negative expected returns. The conservative approach has lower variance but still loses. 
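\u003c/p\u003e\n\u003cp\u003eA session-level simulation of fixed cash-out targets can be sketched in a few lines of Python. This is not the code behind the figures. It assumes flat unit bets, an automatic cash-out at the target, and multipliers drawn from the theoretical r/m distribution. The function names are hypothetical, and the defaults simply mirror the 10,000 sessions of 100 rounds described here.\u003c/p\u003e\n\u003cpre\u003e\u003ccode class=\"language-python\"\u003e
```python
import random
import statistics


def sample_crash(r, rng):
    # Inverse-transform draw with P(M >= m) = r / m for targets m above 1.
    u = 1.0 - rng.random()
    return 1.0 if u > r else r / u


def session_return(target, rounds, r, rng):
    # Flat unit bet every round, automatic cash-out at the target multiplier.
    profit = 0.0
    for _ in range(rounds):
        if sample_crash(r, rng) >= target:
            profit += target - 1.0  # cashed out in time
        else:
            profit -= 1.0  # crashed first, stake lost
    return profit / rounds  # net return per unit wagered


def compare(targets=(1.5, 2.0, 3.0, 5.0), sessions=10_000, rounds=100, r=0.97, seed=7):
    # Mean and population standard deviation of session returns per target.
    rng = random.Random(seed)
    summary = {}
    for t in targets:
        returns = [session_return(t, rounds, r, rng) for _ in range(sessions)]
        summary[t] = (statistics.mean(returns), statistics.pstdev(returns))
    return summary


if __name__ == '__main__':
    for t, (mean, sd) in compare().items():
        print(f'{t:.1f}x  mean return {mean:+.2%}  std dev {sd:.2%}')
```
\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eSetting r to 0 is a quick sanity check: every round then busts at 1x and each session returns exactly -100%.\u003c/p\u003e\n\u003cp\u003e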
The aggressive strategies lose faster with higher variance.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-fig_trajectories-png-7\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/fig_trajectories.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/fig_trajectories.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/fig_trajectories.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/fig_trajectories.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_trajectories.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/fig_trajectories.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/fig_trajectories.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/fig_trajectories.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_trajectories.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/fig_trajectories.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/fig_trajectories.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/fig_trajectories.png\"\n           alt=\"Simulated player sessions using 1.5x strategy over 200 rounds showing multiple trajectories trending toward expected loss line of -3% per round\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-fig_trajectories-png-7\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/fig_trajectories.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Simulated player sessions using 1.5x strategy over 200 rounds showing multiple trajectories trending toward expected loss line of -3% per round\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nThe consumer protection angle is what concerns me most. My data revealed 179 rounds per hour with 16-second median intervals. At that pace, with a 3% house edge per round, a player who bets the same stake every round can expect to lose about 5.4 times that stake per hour of play (179 × 3% ≈ 537%). The manual cashout mechanic creates an illusion of control, masking that the house edge, not the player\u0026rsquo;s timing, determines the expected loss.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThe game is provably fair in the cryptographic sense. The mathematics check out. But mathematical fairness doesn\u0026rsquo;t ensure consumer safety. 
The house always wins, and it wins fast.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eThe only winning strategy is not to play\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe full paper preprint with methodology and statistical details is \u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6065213\"\u003eavailable on SSRN\u003c/a\u003e. Code and data are on \u003ca href=\"https://github.com/philippdubach/stats-gambling\"\u003eGitHub\u003c/a\u003e.\u003c/p\u003e\n","summary":"Statistical analysis of 20,000 crash game rounds verifies the 97% RTP claim. But at 179 rounds per hour, expected hourly losses exceed five times the per-round stake.","image":"https://static.philippdubach.com/ograph/ograph-casino.jpg","date_published":"2026-01-25T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Quantitative Finance"],"_philippdubach":{"type":"Project","word_count":714,"reading_time_minutes":4,"keywords":["provably fair gambling","crash game mathematics","RTP statistical testing","gambling house edge","crash game strategy"],"doi":"10.2139/ssrn.6065213","section":"posts"}},{"id":"https://philippdubach.com/posts/enterprise-ai-strategy-is-backwards/","url":"https://philippdubach.com/posts/enterprise-ai-strategy-is-backwards/","title":"Enterprise AI Strategy is Backwards","content_html":"\u003cp\u003eThat’s the claim made by LinkedIn co-founder \u003ca href=\"https://en.wikipedia.org/wiki/Reid_Hoffman\"\u003eReid Hoffman\u003c/a\u003e. 
It’s a bold assertion, so I set out to investigate whether the data supports it.\u003cfigure class=\"post-figure\" style=\"width: 100%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-download_overview-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/download_overview.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/download_overview.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/download_overview.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/download_overview.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/download_overview.png 1200w\"\n              sizes=\"100vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/download_overview.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/download_overview.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/download_overview.png 1440w\"\n              sizes=\"100vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/download_overview.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/download_overview.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/download_overview.png 2000w\"\n              sizes=\"100vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/download_overview.png\"\n           alt=\"Report Header Overview\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-download_overview-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/download_overview.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Report Header Overview\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\nThe result is a comprehensive report, backed by more than 30 sources. You can download \u003ca href=\"https://static.philippdubach.com/pdf/Enterprise_AI_Strategy2026_philippdubach.pdf\"\u003ethe full report\u003c/a\u003e\nand the \u003ca href=\"https://static.philippdubach.com/pdf/Enterprise_AI_Strategy2026_Deck_philippdubach.pdf\"\u003eaccompanying presentation\u003c/a\u003e for free.\u003c/p\u003e\n\u003chr\u003e\n\u003cp\u003eGlobal AI spending hit $13.8 billion, a six-fold increase since late 2023. Yet 85% of AI projects never reach production. Only 26% of companies can translate pilots into outcomes. 
The gap between ambition and execution has become so predictable that Gartner now officially places generative AI in the \u0026ldquo;\u003ca href=\"https://www.snaplogic.com/lp/gartner-magic-quadrant-ipaas-2025?utm_source=GOOG\u0026amp;utm_medium=PS\u0026amp;utm_campaign=Content_AR_Gartner-iPaas-MQ-2025\u0026amp;_bt=778769312143\u0026amp;_bk=gartner%20ipaas%20magic%20quadrant\u0026amp;_utm_term=gartner%20ipaas%20magic%20quadrant\u0026amp;_bm=b\u0026amp;_bn=g\u0026amp;saf_src=google_g\u0026amp;saf_pt=\u0026amp;saf_kw=gartner%20ipaas%20magic%20quadrant\u0026amp;saf_dv=\u0026amp;saf_cam=23125873381\u0026amp;saf_grp=186359808906\u0026amp;saf_ad=778769312143\u0026amp;saf_acc=4847116121\u0026amp;saf_cam_tp=search\u0026amp;gad_source=1\u0026amp;gad_campaignid=23125873381\u0026amp;gbraid=0AAAAAD3MpSl-QdXUDpLVTClnJRS_g2cQ-\u0026amp;gclid=Cj0KCQiA1czLBhDhARIsAIEc7ugOJcXK_OoRuxk2au4MhOAaluMKdTwxFcl3uPdWSMcYdLd0JAogI7QaAvbeEALw_wcB\"\u003etrough of disillusionment\u003c/a\u003e.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThere\u0026rsquo;s an economic concept called \u003ca href=\"https://en.wikipedia.org/wiki/Jevons_paradox\"\u003eJevons paradox\u003c/a\u003e \u003cem\u003e(yes, I \u003ca href=\"https://notes.philippdubach.com/0005\"\u003ereferenced this before\u003c/a\u003e)\u003c/em\u003e. When a resource can be used more efficiently, total consumption often rises instead of falling. More coal-efficient steam engines didn\u0026rsquo;t reduce coal usage; they made coal so useful that demand exploded. The same logic applies to organizational communication. Email was supposed to reduce meetings. Slack was supposed to reduce email. AI was supposed to reduce everything.\u003c/p\u003e\n\u003cp\u003eInstead, the average employee now spends 57% of their workday on coordination: communicating, updating, aligning. Meetings alone cost the US economy $532 billion per year. 
This is the coordination layer, where organizations actually run and where they quietly bleed.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThree observations:\u003c/p\u003e\n\u003cp\u003e(1) Only 26% of companies have the maturity to translate AI pilots into outcomes. The rest are layering AI on legacy workflows instead of redesigning them.\u003cbr\u003e\n(2) Language models bridge the gap between messy human communication and structured data, for example by turning transcripts into CRM fields. Sales teams using these tools report 30% higher win rates and 80% less manual work.\u003cbr\u003e\n(3) AI gains compound when they are shareable. A summary helps one person. A system that captures and distributes knowledge helps everyone downstream.\u003c/p\u003e\n\u003cp\u003eThe coordination layer isn\u0026rsquo;t glamorous. It\u0026rsquo;s transcripts, status updates, action items, CRM entries. It\u0026rsquo;s the administrative exhaust of getting anything done with other people. And it\u0026rsquo;s almost entirely composed of language. We have language models now: models that extract structured data from messy transcripts and convert meeting notes into CRM fields with 99% accuracy.\u003c/p\u003e\n\u003cp\u003eYet most enterprise AI strategies ignore this entirely. They\u0026rsquo;re focused on chatbots and demos for board presentations. Meanwhile, the language processing that constitutes the primary workload of any modern business remains stuck in the same recursive loops. The winners won\u0026rsquo;t be companies with great AI announcements. They\u0026rsquo;ll be the ones building daily habits early enough for the gains to stack.\u003c/p\u003e\n","summary":"85% of AI projects fail. Only 26% translate pilots to production. 
The winners automate the coordination layer where employees spend 57% of their workday.","image":"https://static.philippdubach.com/ograph/ograph-ai-strat-backwards.jpg","date_published":"2026-01-22T00:00:00Z","date_modified":"2026-02-23T18:16:42+01:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Essay","word_count":413,"reading_time_minutes":2,"keywords":["enterprise AI failure","AI implementation strategy","coordination layer AI","AI pilot to production","workplace AI productivity"],"section":"posts"}},{"id":"https://philippdubach.com/posts/big-in-japan/","url":"https://philippdubach.com/posts/big-in-japan/","title":"Big in Japan","content_html":"\u003cp\u003eJapan holds roughly \u003ca href=\"https://pbs.twimg.com/media/G_j8tfLXEAA1djy?format=jpg\u0026amp;name=medium\"\u003e$5 trillion in foreign assets\u003c/a\u003e. The US alone accounts for \u003ca href=\"https://pbs.twimg.com/media/G_j8tfKWwAAesgX?format=jpg\u0026amp;name=medium\"\u003e¥342 trillion\u003c/a\u003e in bonds and equities.\u003c/p\u003e\n\u003cp\u003eJapanese \u003ca href=\"https://pbs.twimg.com/media/G_j8tfQWoAADp2D?format=jpg\u0026amp;name=medium\"\u003e30-year yields sat below 1%\u003c/a\u003e from 2019 through early 2024. They\u0026rsquo;re now above 3%. The \u003ca href=\"https://pbs.twimg.com/media/G_j8tfRXgAAgGPV?format=jpg\u0026amp;name=medium\"\u003eyield spread\u003c/a\u003e between developed market bonds and JGBs has collapsed from 400 basis points to roughly 100. The yen carry trade that has defined Japanese institutional behavior since the 1990s (borrow cheap at home, invest abroad for yield) suddenly faces added friction.\u003c/p\u003e\n\u003cp\u003eJapanese life insurers and pension funds have duration-matching obligations. If domestic yields offer adequate returns with lower currency risk, the marginal incentive to hold Treasuries weakens. 
GPIF, the world\u0026rsquo;s largest pension fund, doesn\u0026rsquo;t need to reach for yield in US credit markets when JGBs pay 3%.\u003c/p\u003e\n\u003cp\u003eThis doesn\u0026rsquo;t mean Japanese investors dump everything tomorrow. Institutional rebalancing is glacial. Currency hedging costs matter, and existing positions have different maturity profiles. Treasury market depth has deteriorated since 2020. Primary dealers hold smaller inventories. Liquidity provision is thinner. A sustained seller of size, which Japanese institutions would be, would arrive in a market less equipped to absorb flow than at any point since the GFC.\u003c/p\u003e\n\u003cp\u003eThe second-order effects compound. Japanese selling pressures Treasury yields higher. Higher yields strengthen the dollar near-term but raise US borrowing costs. If Japan\u0026rsquo;s repatriation triggers broader reserve manager concern about duration exposure, the feedback loop accelerates.\u003c/p\u003e\n\u003cp\u003eThe consensus view remains that Japan is trapped: any meaningful tightening would implode a JGB market in which the BOJ owns half of outstanding supply. But the data suggests something else. Yields are rising, volatility is elevated, and the market is absorbing it. The trap might be less binding than assumed.\u003c/p\u003e\n\u003cp\u003eThe yen carry trade unwound violently in August 2024, and the S\u0026amp;P dropped 6% in three days. That was a positioning adjustment. Repatriation of actual assets would be slower but larger.\u003c/p\u003e\n\u003cp\u003eWhen a $5 trillion portfolio starts rebalancing toward domestic assets, you don\u0026rsquo;t need to predict the timing. You need to be positioned for a situation where the marginal Treasury buyer becomes a marginal seller. What happens in Japan doesn\u0026rsquo;t stay in Japan.\u003c/p\u003e\n","summary":"Japan holds $5 trillion in foreign assets. 
With 30-year JGB yields now above 3%, the carry trade that defined Japanese investing faces new friction.","image":"https://static.philippdubach.com/ograph/0014.png","date_published":"2026-01-19T00:00:00Z","date_modified":"2026-02-23T18:16:42+01:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Macro"],"_philippdubach":{"type":"Commentary","word_count":344,"reading_time_minutes":2,"keywords":["yen carry trade","Japan Treasury holdings","JGB yields","Japanese capital repatriation","Japan foreign assets"],"section":"posts"}},{"id":"https://philippdubach.com/posts/ozempic-is-reshaping-the-fast-food-industry/","url":"https://philippdubach.com/posts/ozempic-is-reshaping-the-fast-food-industry/","external_url":"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5073929","title":"Ozempic is Reshaping the Fast Food Industry","content_html":"\u003cp\u003eSomething strange is happening in the food industry. \u003ca href=\"https://www.wsj.com/health/wellness/us-dietary-food-guidelines-trump-rfk-jr-aaf51714\"\u003eNew US dietary guidelines call for more protein and less sugar\u003c/a\u003e. Greggs, the UK bakery chain, just warned of \u003ca href=\"https://www.ft.com/content/7ab5e9b8-45fe-4ba2-97f2-41d417561ce3\"\u003e\u0026ldquo;flatlining profits\u0026rdquo;\u003c/a\u003e in the food-to-go market. Food companies are racing to overhaul their brands, ditching artificial dyes and packing protein into products. Earnings calls across the sector blame \u0026ldquo;inflation\u0026rdquo; and \u0026ldquo;subdued consumer confidence.\u0026rdquo; Nobody mentions the elephant in the room: GLP-1 medications.\u003c/p\u003e\n\u003cp\u003eNew \u003ca href=\"https://doi.org/10.1177/00222437251412834\"\u003eresearch from Cornell\u003c/a\u003e finally puts numbers to what the food industry doesn\u0026rsquo;t want to discuss. 
Using transaction data from 150,000 households linked to survey responses on medication adoption, Sylvia Hristakeva, Jūra Liaukonytė, and Leo Feler tracked exactly how Ozempic and Wegovy users change their spending. The results deserve attention from anyone holding food stocks.\u003c/p\u003e\n\u003cp\u003eThe headline: households with a GLP-1 user cut grocery spending by \u003cstrong\u003e5.3%\u003c/strong\u003e within six months. For high-income households, that figure jumps to \u003cstrong\u003e8.2%\u003c/strong\u003e. Fast food takes an even harder hit, with spending at limited-service restaurants falling \u003cstrong\u003e8.0%\u003c/strong\u003e. These aren\u0026rsquo;t people switching brands or trading down. They\u0026rsquo;re simply eating less.\u003c/p\u003e\n\u003cp\u003eThe category-level data tells the real story. Savory snacks see the largest decline at \u003cstrong\u003e10.1%\u003c/strong\u003e. Sweets, baked goods, cookies, all down. Even staples like meat, eggs, and bread decline. In the entire grocery basket, only one category shows a statistically significant increase: yogurt. 
Fresh fruit and nutrition bars trend up slightly, but yogurt is the lone winner with statistical confidence.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-glp1_category_spending-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/glp1_category_spending.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/glp1_category_spending.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/glp1_category_spending.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/glp1_category_spending.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/glp1_category_spending.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/glp1_category_spending.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/glp1_category_spending.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/glp1_category_spending.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/glp1_category_spending.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/glp1_category_spending.png 1600w,\n    
                  https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/glp1_category_spending.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/glp1_category_spending.png\"\n           alt=\"Horizontal bar chart showing GLP-1 users\u0026#39; grocery spending changes: savory snacks -10.1%, sweet snacks -6.8%, baked goods -5.4%, with yogurt as only significant increase at \u0026#43;3.4%\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-glp1_category_spending-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/glp1_category_spending.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Horizontal bar chart showing GLP-1 users\u0026#39; grocery spending changes: savory snacks -10.1%, sweet snacks -6.8%, baked goods -5.4%, with yogurt as only significant increase at \u0026#43;3.4%\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nAs of July 2024, \u003cstrong\u003e16.3%\u003c/strong\u003e of U.S. households have at least one GLP-1 user. The adoption curve is steepening. Nearly half of adopters report taking the medication specifically for weight loss rather than diabetes management. These weight-loss users tend to be younger, higher income, and more willing to pay out of pocket. 
They\u0026rsquo;re also the most profitable customers for fast food chains, the ones who don\u0026rsquo;t flinch at price increases.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThis creates what the researchers call a \u0026ldquo;double whammy\u0026rdquo; for the food industry. Companies are losing their highest-margin customers to a biological shift in appetite while being left with a more price-sensitive demographic that actually \u003cem\u003edoes\u003c/em\u003e respond to inflation. When McDonald\u0026rsquo;s CEO Chris Kempczinski talks about \u003ca href=\"https://www.youtube.com/watch?v=srH8f_Fa82A\"\u003elosing lower-income customers to home cooking\u003c/a\u003e, he\u0026rsquo;s describing the wrong problem.\u003cfigure class=\"post-figure\" style=\"width: 80%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-glp1_adoption_timeline-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/glp1_adoption_timeline.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/glp1_adoption_timeline.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/glp1_adoption_timeline.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/glp1_adoption_timeline.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/glp1_adoption_timeline.png 1200w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/glp1_adoption_timeline.png 768w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/glp1_adoption_timeline.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/glp1_adoption_timeline.png 1440w\"\n              sizes=\"80vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/glp1_adoption_timeline.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/glp1_adoption_timeline.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/glp1_adoption_timeline.png 2000w\"\n              sizes=\"80vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/glp1_adoption_timeline.png\"\n           alt=\"Line chart showing GLP-1 adoption from Jan 2023 to Jul 2024: weight loss users surpassed diabetes control users by July 2023, reaching over 1,200 users by end of period\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-glp1_adoption_timeline-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/glp1_adoption_timeline.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Line chart showing GLP-1 adoption from Jan 2023 to Jul 2024: weight loss users surpassed diabetes control users by July 2023, reaching over 1,200 users by end of period\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nThe 
research also suggests why food executives might be keeping quiet. About \u003cstrong\u003e34%\u003c/strong\u003e of GLP-1 users discontinue within the sample period. When they stop, their spending doesn\u0026rsquo;t just return to baseline. It becomes \u003cem\u003eless healthy\u003c/em\u003e. Candy and chocolate purchases rise \u003cstrong\u003e11.4%\u003c/strong\u003e above pre-adoption levels after stopping the medication.\u003c/p\u003e\n\u003cp\u003eIf you\u0026rsquo;re running a snack company, the math might look survivable: lose customers to Ozempic for a year, then welcome them back once they quit. The drugs suppress appetite biologically; they don\u0026rsquo;t teach new habits. When the biology reverts, so does the behavior.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://youtu.be/JTG5uMWDKXk\"\u003eScott Galloway\u003c/a\u003e has called the food industry an \u0026ldquo;obesity index\u0026rdquo; and predicted a \u0026ldquo;tsunami of shareholder destruction.\u0026rdquo; The Cornell data suggests he\u0026rsquo;s directionally right but possibly too aggressive on timing. The industry has a built-in buffer: medication discontinuation. The question is whether that buffer lasts as drugs get cheaper, side effects improve, and insurance coverage expands.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThe deeper issue is the persistence of dietary change. \u003ca href=\"https://jn.nutrition.org/article/S0022-3166(25)00647-9/fulltext\"\u003ePrevious studies found\u003c/a\u003e that even major life events (a diabetes diagnosis, job loss, childbirth) produce only modest and short-lived changes in diet. Information campaigns and price nudges have mixed results at best. GLP-1 medications work differently because they alter the biological reward system directly. 
Users describe the experience as \u0026ldquo;silencing food noise,\u0026rdquo; a constant background hum of cravings that simply disappears.\u003c/p\u003e\n\u003cp\u003eBut this biological dependence cuts both ways. The changes don\u0026rsquo;t stick without the drug. Stopping medication means losing both the appetite suppression and whatever habits might have formed during treatment. The Cornell team notes that \u0026ldquo;GLP-1s could complement existing nutritional interventions\u0026rdquo; but cautions that \u0026ldquo;their broader public health relevance ultimately depends on sustained adherence.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eFor investors, the practical question is positioning. Companies selling hyperpalatable, calorie-dense products face structural headwinds. Companies selling protein-rich, nutrient-dense foods in smaller portions have tailwinds. The data shows users shifting toward yogurt, fresh fruit, and nutrition bars. Package sizes may need to shrink. Marketing strategies may need to pivot from \u0026ldquo;craveable\u0026rdquo; to \u0026ldquo;satisfying.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThe next few quarters of earnings calls will be interesting. At some point, an analyst will ask the GLP-1 question directly. The honest answer from management would be: we don\u0026rsquo;t know the full impact yet, but 16% of households having a user, 8% declines in fast food spending, and the fastest-growing prescription category in the country are not something we can ignore.\u003c/p\u003e\n","summary":"Cornell research: GLP-1 users cut grocery spending 5.3%, fast food 8%. With 16% household adoption and savory snacks down 10%, food stocks face headwinds.","image":"https://static.philippdubach.com/ograph/ograph-glp1-food.jpg","date_published":"2026-01-16T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. 
Dubach","url":"https://philippdubach.com/about/"}],"tags":["Medicine","Economics"],"_philippdubach":{"type":"Commentary","word_count":780,"reading_time_minutes":4,"keywords":["GLP-1 food industry impact","Ozempic grocery spending","food stocks GLP-1","Wegovy consumer behavior","weight loss drug food sales"],"section":"posts"}},{"id":"https://philippdubach.com/posts/does-ai-mean-the-demand-on-labor-goes-up/","url":"https://philippdubach.com/posts/does-ai-mean-the-demand-on-labor-goes-up/","title":"Does AI mean the demand on labor goes up?","content_html":"\u003cp\u003e\u003ca href=\"https://x.com/TheStalwart/status/2011418760813629738\"\u003eJoe Weisenthal\u003c/a\u003e from Bloomberg, this week:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eAll my shower thoughts now are about designing efficient workflows for synthesizing, collecting, labeling and annotating data.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eSame. Since I started building every app and tool I thought would make my life easier, my workflow more efficient, I haven\u0026rsquo;t stopped. 
Apparently \u003ca href=\"https://techcrunch.com/2026/01/16/the-rise-of-micro-apps-non-developers-are-writing-apps-instead-of-buying-them/\"\u003enon-developers are now writing apps\u003c/a\u003e instead of buying them. This is the AI productivity paradox in miniature: the tools get better and we do more, not less.\u003c/p\u003e\n\u003cp\u003eThe assumed narrative is still that AI displaces jobs, humans collect UBI, and society figures out leisure. But the trajectory might be more work, not less. A \u003ca href=\"https://cepr.org/voxeu/columns/ais-power-grows-so-does-our-workday\"\u003erecent NBER study\u003c/a\u003e found that workers in AI-exposed occupations now work roughly 3 extra hours per week, and their leisure time has dropped by the same amount. \u003ca href=\"https://investors.upwork.com/news-releases/news-release-details/upwork-study-finds-employee-workloads-rising-despite-increased-c\"\u003eUpwork\u0026rsquo;s research\u003c/a\u003e puts it bluntly: 77% of employees say AI tools have \u003cem\u003eadded\u003c/em\u003e to their workload.\u003c/p\u003e\n\u003cp\u003eThe \u003ca href=\"https://en.wikipedia.org/wiki/Jevons_paradox\"\u003eJevons paradox\u003c/a\u003e is 160 years old: when James Watt made steam engines more efficient, coal consumption didn\u0026rsquo;t fall. It exploded. Efficiency made coal useful in new ways. Satya Nadella \u003ca href=\"https://www.npr.org/sections/planet-money/2025/02/04/g-s1-46018/ai-deepseek-economics-jevons-paradox\"\u003ereferenced this for AI\u003c/a\u003e after DeepSeek rattled the markets. Erik Brynjolfsson argues it applies to AI-augmented occupations like coding, radiology, and translation. Make something more efficient and you find more things to do with it.\u003c/p\u003e\n\u003cp\u003eWhen an app that used to take months takes a weekend, I don\u0026rsquo;t build one. I build six. When a report takes an hour, I write five. The friction that once protected us from infinite expectations evaporates. 
This is the Jevons paradox applied not just to markets or coal, but to our own time and cognitive capacity: a kind of psychological rebound effect in which internal expectations outrun what\u0026rsquo;s actually sustainable.\u003c/p\u003e\n\u003cp\u003eKeynes predicted a \u003ca href=\"http://www.econ.yale.edu/smith/econ116a/keynes1.pdf\"\u003e15-hour work week\u003c/a\u003e by 2030. We got the productivity gains, yet many of us work longer hours than ever. Only \u003ca href=\"https://hellofuture.orange.com/en/the-ai-productivity-paradox-the-new-tech-may-be-eating-into-your-leisure-time/\"\u003e21% of employees\u003c/a\u003e actually use the time AI saves them for their personal lives. The rest reinvest it right back into work. When capability expands, so does the definition of \u0026ldquo;enough.\u0026rdquo; The bar rises.\u003c/p\u003e\n\u003cp\u003eIf AI makes me 10x more productive, that\u0026rsquo;s not 10x more free time. That\u0026rsquo;s 10x more I \u003cem\u003ecould\u003c/em\u003e be doing. In a competitive environment (founding, climbing, anything with stakes), someone who uses that 10x while I rest will outrun me. The fear was displacement. The reality might be inescapability.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://en.wikipedia.org/wiki/Parkinson%27s_law#First_meaning\"\u003eParkinson\u0026rsquo;s Law\u003c/a\u003e: work expands to fill the time available. The AI corollary: work expands to fill the capabilities available. More capability means more possibility, and more obligation. We should know where this points.\u003c/p\u003e\n","summary":"AI was supposed to free us. The Jevons paradox plays out in real time: efficiency expands workload, not leisure. 77% of workers say AI added to their work.","image":"https://static.philippdubach.com/ograph/0005.png","date_published":"2026-01-15T00:00:00Z","date_modified":"2026-02-23T18:16:42+01:00","authors":[{"name":"Philipp D. 
Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI","Economics"],"_philippdubach":{"type":"Commentary","word_count":411,"reading_time_minutes":2,"keywords":["AI productivity paradox","Jevons paradox AI","AI increasing workload","future of work AI","Parkinson's law AI"],"section":"posts"}},{"id":"https://philippdubach.com/posts/repo-might-be-even-bigger-than-we-thought/","url":"https://philippdubach.com/posts/repo-might-be-even-bigger-than-we-thought/","title":"Repo might be even bigger than we thought","content_html":"\u003cblockquote\u003e\n\u003cp\u003eFinance is anthropological\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThat\u0026rsquo;s \u003ca href=\"https://en.wikipedia.org/wiki/Zoltan_Pozsar\"\u003eZoltan Pozsar\u003c/a\u003e, the Hungarian-American economist who mapped the plumbing of modern money before most people knew there was plumbing to map. When he said it to Bloomberg in 2019, he was trying to explain why repo markets \u003cem\u003e(\u003ca href=\"https://en.wikipedia.org/wiki/Repurchase_agreement\"\u003ethe overnight lending infrastructure that lubricates trillions in daily transactions\u003c/a\u003e)\u003c/em\u003e had just seized up in ways the Federal Reserve didn\u0026rsquo;t anticipate.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;ve \u003ca href=\"https://philippdubach.com/posts/pozsars-bretton-woods-iii-the-framework-1/2/\"\u003ewritten about Pozsar\u0026rsquo;s work before\u003c/a\u003e, particularly his \u0026ldquo;Bretton Woods III\u0026rdquo; thesis about the shifting role of the dollar. But his earlier research on shadow banking and repo markets feels increasingly relevant as we enter 2026. In December 2025, the Office of Financial Research \u003ca href=\"https://www.financialresearch.gov/the-ofr-blog/2025/12/04/sizing-us-repo-market/\"\u003epublished new data\u003c/a\u003e on the size of the U.S. repo market. The number: $12.6 trillion in average daily exposures. 
That\u0026rsquo;s roughly $700 billion more than previous estimates had captured.\u003c/p\u003e\n\u003cp\u003eWhere did the extra $700 billion come from? Mostly from what the OFR calls \u0026ldquo;non-centrally cleared bilateral repo,\u0026rdquo; or NCCBR, the segment of the market that doesn\u0026rsquo;t flow through clearinghouses or the tri-party platforms that regulators can easily observe. This bilateral segment alone accounts for $5 trillion. Until the OFR\u0026rsquo;s new transaction-level data collection, which only reached full implementation in July 2025, much of this activity was essentially invisible.\u003c/p\u003e\n\u003cp\u003eThis matters because repo is not a peripheral market. It is the market through which cash-rich institutions lend to cash-poor ones, every single day, against collateral. Participants include money market funds, hedge funds, broker-dealers, asset managers, and banks. When repo works, it\u0026rsquo;s invisible. When it doesn\u0026rsquo;t, as in September 2019, overnight rates spike and the Fed scrambles to inject liquidity.\u003c/p\u003e\n\u003cp\u003eOn December 31, 2025, eligible financial firms borrowed $74.6 billion from the Fed\u0026rsquo;s Standing Repo Facility, the most since its launch in 2021. The Fed had just \u003ca href=\"https://tellerwindow.newyorkfed.org/2025/12/23/standing-repo-operations-in-the-federal-reserves-monetary-policy-implementation-framework/\"\u003eeliminated the $500 billion daily cap\u003c/a\u003e on this facility, a quiet acknowledgment that the ceiling might actually matter. Quantitative tightening officially ended on December 1, 2025. 
Reserves had fallen to $2.8 trillion, their lowest in four years.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eThe plumbing was straining again.\u003c/p\u003e\n\u003cp\u003ePozsar\u0026rsquo;s 2014 OFR paper, \u0026ldquo;Shadow Banking: The Money View,\u0026rdquo; introduced a framework that still haunts anyone who reads it carefully. At its core is a hierarchy of money. Currency sits at the top, the liability of the sovereign. Below that: bank deposits, insured and backstopped by the FDIC. Below that: repo, secured by collateral but not by any explicit government guarantee. Below that: the constant-NAV shares of money market funds, which promise par redemption but rest on layers of private credit puts, reputational commitments, and the fragile assumption that nothing will go wrong simultaneously.\u003c/p\u003e\n\u003cp\u003eThe key insight is that what counts as \u0026ldquo;money\u0026rdquo; depends on where you sit in this hierarchy. For a retail depositor, money is an insured bank balance. For a corporate treasurer managing $50 billion in cash, money begins where M2 ends—in repo, in money fund shares, in instruments that offer some semblance of safety at scale but lack the explicit backstops that smaller depositors take for granted.\u003c/p\u003e\n\u003cp\u003ePozsar called these institutions \u0026ldquo;cash pools\u0026rdquo;—the corporate treasuries, sovereign wealth funds, and asset managers whose cash balances are too large to fit within the insured deposit system. They need money-like instruments, but the supply of truly safe assets (Treasury bills, insured deposits) is inelastic. So they reach for the next best thing: shadow money claims backed by private collateral and private liquidity puts.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eNow, the new OFR data reveals that $5 trillion of daily repo activity, roughly 40% of the market, occurs in bilateral arrangements that, until recently, were largely opaque to regulators. 
The collateral backing this activity is 61.8% Treasuries, but that leaves substantial room for corporate bonds, agency MBS, and other assets that can gap in value during stress.\u003c/p\u003e\n\u003cp\u003ePozsar\u0026rsquo;s 2019 Global Money Notes described the repo market as a hierarchy with dealers at the center and the Fed at the top, operating as a \u0026ldquo;dealer of last resort\u0026rdquo; when private balance sheets reach their limits. The Standing Repo Facility was supposed to institutionalize this role, providing a ceiling on overnight rates by offering funding at a known price.\u003c/p\u003e\n\u003cp\u003eThe facility sat unused for years while reserves were abundant. Now, as reserves decline, usage is spiking at quarter-ends and year-ends, exactly when balance sheet constraints bind hardest. The question Pozsar raised in 2019 remains unanswered: can the Fed operate a standing repo facility that polices the top of its target range without losing control over its balance sheet size? Or will it be forced, eventually, to monetize excess collateral on a scale that looks a lot like QE by another name?\u003c/p\u003e\n\u003cp\u003eThere\u0026rsquo;s a concept in infrastructure studies called \u0026ldquo;seamful design\u0026rdquo;: the idea that making the seams of a system visible can improve rather than degrade the user experience. GPS, for instance, became more useful when designers surfaced uncertainty estimates rather than hiding them.\u003c/p\u003e\n\u003cp\u003eThe repo market is the opposite: seamless by design, invisible until it fails. The OFR\u0026rsquo;s new data collection is, in some sense, an attempt to add seams, to make visible what was hidden, to understand the shape of the beast before the next crisis. But measurement is not control. 
Knowing the market is $12.6 trillion doesn\u0026rsquo;t tell you what happens when a major counterparty fails, or when a category of collateral suddenly trades at distressed prices, or when the behavioral assumptions embedded in banks\u0026rsquo; liquidity models turn out to be wrong.\u003c/p\u003e\n\u003cp\u003ePozsar understood this intuitively. His famous \u003ca href=\"https://www.newyorkfed.org/medialibrary/media/research/economists/adrian/1306adri_map.pdf\"\u003emap of the shadow banking system\u003c/a\u003e, posted in the New York Fed\u0026rsquo;s briefing room, required zooming in seven or eight times to read any detail. Colleagues who didn\u0026rsquo;t take the time to study it, he warned, were looking at \u0026ldquo;10% of the picture.\u0026rdquo;\u003c/p\u003e\n","summary":"New OFR data reveals $12.6 trillion in daily repo exposures—$700 billion larger than previous estimates. The plumbing of modern money remains poorly understood.","image":"https://static.philippdubach.com/ograph/0012.png","date_published":"2026-01-13T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["Macro"],"_philippdubach":{"type":"Commentary","word_count":958,"reading_time_minutes":5,"keywords":["repo market","shadow banking","Standing Repo Facility","money market liquidity","Zoltan Pozsar"],"section":"posts"}},{"id":"https://philippdubach.com/posts/the-market-can-stay-irrational-longer-than-you-can-stay-solvent/","url":"https://philippdubach.com/posts/the-market-can-stay-irrational-longer-than-you-can-stay-solvent/","title":"The Market Can Stay Irrational Longer Than You Can Stay Solvent","content_html":"\u003cp\u003eA friend recently recommended \u003ca href=\"https://en.wikipedia.org/wiki/Steve_Eisman\"\u003eSteve Eisman\u003c/a\u003e\u0026rsquo;s podcast to me. 
Eisman, you might recall, is the hedge fund manager portrayed in The Big Short who famously bet against subprime mortgages before the 2008 crisis. In his \u003ca href=\"https://www.youtube.com/@RealEismanPlaybook\"\u003emost recent episode\u003c/a\u003e, Eisman laid out a thesis about something that has made me uncomfortable ever since the market\u0026rsquo;s recovery from the \u003ca href=\"https://en.wikipedia.org/wiki/2020_stock_market_crash\"\u003eCovid-19 stock market crash\u003c/a\u003e: the U.S. equity market has structurally decoupled from everyday economic reality.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;ve written \u003ca href=\"https://philippdubach.com/posts/how-ai-is-shaping-my-investment-portfolio-for-2026/\"\u003eabout market concentration\u003c/a\u003e in my 2026 portfolio allocation. But Eisman\u0026rsquo;s point isn\u0026rsquo;t just about concentration. It\u0026rsquo;s about what this concentration means for everyone else. Consider what happens to consumer-exposed sectors. Combined, healthcare, consumer discretionary, and consumer staples have fallen from 38% of the index in 2015 to just 25% today. This matters because roughly \u003ca href=\"https://fred.stlouisfed.org/series/DPCERE1Q156NBEA\"\u003e70% of U.S. GDP is consumer-driven\u003c/a\u003e. The traditional logic was simple: consumer spending drives the economy, consumer stocks reflect that spending, and therefore the stock market reflects economic health. That relationship has broken down.\u003c/p\u003e\n\u003cp\u003eThe disconnect shows up in daily American life. Healthcare costs continue to rise, housing remains unaffordable for many, and grocery prices have yet to normalize. These are real pressures on real households. Yet the S\u0026amp;P 500 gained 16% in 2025, with the Nasdaq up 21%. The market doesn\u0026rsquo;t care about rent or insurance premiums because the companies reflecting those costs barely register in the index anymore. 
As Eisman puts it:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eThe market has become unmoored from everyday life.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThis creates a structural problem for active managers that compounds over time. When \u003ca href=\"https://www.slickcharts.com/sp500\"\u003eNVIDIA alone represents 7.7% of the S\u0026amp;P 500\u003c/a\u003e, Apple 6.8%, and Microsoft 6.1%, most institutional mandates effectively prevent managers from holding proportional positions. Risk limits cap initial positions at perhaps 5% of assets under management. Sector allocation rules require diversification across all eleven sectors. The result is systematic underweighting of the fastest-growing names. Meanwhile, the bottom five sectors combined represent just 14% of the index. Real estate, with 31 constituents, accounts for barely 2%. Why dedicate research resources to an entire sector that can only marginally move your portfolio?\u003c/p\u003e\n\u003cp\u003eThe rise of passive investing amplifies all of this. Index funds now control roughly 60% of flows versus 40% for active managers. When money enters an index fund, it buys stocks in proportion to their existing market cap. Large positions grow larger. There\u0026rsquo;s no portfolio manager deciding NVIDIA looks expensive. The buying is mechanical, price-insensitive, and self-reinforcing. This doesn\u0026rsquo;t eliminate price discovery entirely.\u003c/p\u003e\n\u003cp\u003eEisman points to Oracle\u0026rsquo;s Q3 2025 experience: shares surged after the company reported a massive backlog, then fell back below pre-earnings levels once investors realized the backlog was concentrated in a single customer with questionable financing. Active managers still matter. They just matter less.\u003c/p\u003e\n\u003cp\u003eIn a normal correction, sellers meet buyers who evaluate whether prices have become attractive. In a passive-dominated market, redemptions trigger mechanical selling. 
Index funds don\u0026rsquo;t decide that a 20% drawdown makes stocks compelling. They sell what they own in proportion to what they own. If active managers control only 40% of flows, the stabilizing bid may prove insufficient. The \u003ca href=\"https://www.reuters.com/markets/\"\u003eFebruary-April 2025 correction\u003c/a\u003e saw the S\u0026amp;P fall 19% peak-to-trough. Eisman\u0026rsquo;s assessment: if an actual recession materializes, or if AI spending disappoints expectations,\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003ethe decline will almost certainly be steeper. It will be fast and very ugly.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThere\u0026rsquo;s also a tax dimension creating behavioral lock-in. Years of technology outperformance have embedded massive unrealized capital gains in both retail and institutional portfolios. Selling NVIDIA means realizing those gains and paying taxes on them. Investors avoid this until forced by margin calls, redemptions, or actual fundamental collapse. This creates asymmetric liquidity: plenty of buyers on the way up, scarce ones on the way down.\u003c/p\u003e\n\u003cp\u003eWhat does this mean for portfolio construction? First, understand that traditional cap-weighted benchmarks now represent a concentrated bet on technology and AI capital expenditure. Second, active management faces structural headwinds that have nothing to do with manager skill. Third, liquidity assumptions that held in previous corrections may not hold in the next one. And fourth, consumer welfare can deteriorate materially without meaningfully impacting index returns. 
The K-shaped economy produces a K-shaped market, where what the median household experiences and what the stock index delivers have genuinely diverged.\u003c/p\u003e\n\u003caside class=\"disclaimer\" role=\"note\" aria-label=\"Disclaimer\"\u003e\n  \u003cdiv class=\"disclaimer-content\"\u003e\u003cp\u003e\u003cstrong\u003eDisclaimer:\u003c/strong\u003e All opinions expressed are my own. This is not investment, financial, tax, or legal advice. Past performance does not indicate future results. Do your own research and consult qualified professionals before making financial decisions. No liability accepted for any losses.\u003c/p\u003e\u003c/div\u003e\n\u003c/aside\u003e\n\n","summary":"Steve Eisman explains how U.S. equity markets have structurally decoupled from everyday economic reality through concentration and passive investing.","image":"https://static.philippdubach.com/ograph/ograph-unmoored-market.jpg","date_published":"2026-01-11T00:00:00Z","date_modified":"2026-02-23T18:16:42+01:00","authors":[{"name":"Philipp D. 
Dubach","url":"https://philippdubach.com/about/"}],"tags":["Investing"],"_philippdubach":{"type":"Commentary","word_count":739,"reading_time_minutes":4,"keywords":["stock market decoupling economy","S\u0026P 500 concentration risk","passive investing market impact","Steve Eisman market thesis","K-shaped economy stocks"],"section":"posts"}},{"id":"https://philippdubach.com/posts/social-media-success-prediction-bert-models-for-post-titles/","url":"https://philippdubach.com/posts/social-media-success-prediction-bert-models-for-post-titles/","title":"Social Media Success Prediction: BERT Models for Post Titles","content_html":"\u003cp\u003eLast week I published a \u003ca href=\"https://philippdubach.com/standalone/hn-sentiment/\"\u003eHacker News title sentiment analysis\u003c/a\u003e based on the \u003ca href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5910263\"\u003eAttention Dynamics in Online Communities\u003c/a\u003e paper I have been working on. The \u003ca href=\"https://news.ycombinator.com/item?id=46512881\"\u003ediscussion on Hacker News\u003c/a\u003e raised the obvious question: can you actually predict what will do well here?\u003cfigure class=\"post-figure\" style=\"width: 70%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-https:--static-philippdubach-com-hn_post_frontpage2-png-0\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png 480w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png 1200w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png 1440w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png 2000w\"\n              sizes=\"70vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png\"\n           alt=\"Hacker News Frontpage\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           
decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-https:--static-philippdubach-com-hn_post_frontpage2-png-0\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/https://static.philippdubach.com/hn_post_frontpage2.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Hacker News Frontpage\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nThe honest answer is: partially. Timing matters. News cycles matter. Who submits matters. Weekend versus Monday morning matters. Most of these factors aren\u0026rsquo;t in the title. But titles aren\u0026rsquo;t nothing either. \u0026ldquo;Show HN\u0026rdquo; signals something. So do phrasing, length, and topic selection. The question becomes: how much signal can you extract from 80 characters?\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u003ca href=\"https://news.ycombinator.com/news\"\u003eHacker News\u003c/a\u003e (HN) is a social news website focusing on computer science and entrepreneurship. It is run by the investment fund and startup incubator \u003ca href=\"https://www.ycombinator.com\"\u003eY Combinator\u003c/a\u003e.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThis isn\u0026rsquo;t new territory. \u003ca href=\"https://minimaxir.com/2017/06/reddit-deep-learning/\"\u003eMax Woolf built a Reddit submission predictor\u003c/a\u003e back in 2017, and \u003ca href=\"https://ontology2.com/essays/ClassifyingHackerNewsArticles/\"\u003eontology2 trained an HN classifier\u003c/a\u003e using logistic regression on title words. Both found similar ceilings: around 0.76 AUC with classical approaches. 
I wanted to see what modern transformers could add.\u003c/p\u003e\n\u003cp\u003eThe baseline was DistilBERT, fine-tuned on 90,000 HN posts. ROC AUC of 0.654, trained in about 20 minutes on a T4 GPU. Not bad for something that only sees titles. Then RoBERTa with label smoothing pushed it to 0.692. Progress felt easy.\u003cfigure class=\"post-figure\" style=\"width: 70%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-03_roc_curve-png-2\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/03_roc_curve.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/03_roc_curve.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/03_roc_curve.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/03_roc_curve.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/03_roc_curve.png 1200w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/03_roc_curve.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/03_roc_curve.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/03_roc_curve.png 1440w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/03_roc_curve.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/03_roc_curve.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/03_roc_curve.png 2000w\"\n              sizes=\"70vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/03_roc_curve.png\"\n           alt=\"ROC curve comparing model versions\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-03_roc_curve-png-2\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/03_roc_curve.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"ROC curve comparing model versions\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nWhat if sentence embeddings captured something classification heads missed? I built an ensemble: \u003ca href=\"https://www.sbert.net/\"\u003eSBERT\u003c/a\u003e for semantic features, RoBERTa for discrimination, weighted average at the end. The validation AUC jumped to 0.714.\u003c/p\u003e\n\u003cp\u003eThe problem was hiding in the train/test split. I\u0026rsquo;d used random sampling. HN has strong temporal correlations: topics cluster, writing styles evolve, news cycles create duplicates. A random split let the model see the future. SBERT\u0026rsquo;s semantic embeddings matched near-duplicate posts across the split perfectly.\u003c/p\u003e\n\u003cp\u003eWhen I switched to a strict temporal split, training on 2022-early 2024 and testing on late 2024 onward, the ensemble dropped to 0.693. 
More revealing: the optimal SBERT weight went from 0.35 to 0.10. SBERT was contributing almost nothing. The model had memorized temporal patterns, not learned to predict.\u003cfigure class=\"post-figure\" style=\"width: 70%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-02_calibration-png-3\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/02_calibration.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/02_calibration.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/02_calibration.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/02_calibration.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/02_calibration.png 1200w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/02_calibration.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/02_calibration.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/02_calibration.png 1440w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/02_calibration.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/02_calibration.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/02_calibration.png 2000w\"\n              sizes=\"70vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/02_calibration.png\"\n           alt=\"Calibration plot showing predicted vs actual probabilities\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-02_calibration-png-3\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/02_calibration.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Calibration plot showing predicted vs actual probabilities\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nI kept RoBERTa and added more regularization: dropout raised from 0.1 to 0.2, weight decay from 0.01 to 0.05, and the lower six transformer layers frozen. The model got worse at fitting the training data: train AUC dropped from 0.803 to 0.727.\u003c/p\u003e\n\u003cp\u003eBut the train-test gap collapsed from 0.109 to 0.042. That\u0026rsquo;s a 61% reduction in overfitting. Test AUC was 0.685 versus the ensemble\u0026rsquo;s 0.693, a difference that falls within the confidence intervals. 
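\u003c/p\u003e\n\u003cp\u003eOne way to check a claim like that is a paired bootstrap over the test set: resample the same posts for both models and look at the spread of the AUC difference. A minimal sketch using scikit-learn\u0026rsquo;s \u003ccode\u003eroc_auc_score\u003c/code\u003e; the function name and the 95% percentile interval are illustrative choices, not necessarily how the intervals above were computed.\u003c/p\u003e\n\u003cpre\u003e\u003ccode class=\"language-python\"\u003e
```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_diff(y_true, scores_a, scores_b, n_boot=2000, seed=0):
    '''Paired bootstrap for AUC(a) - AUC(b) on the same test set.

    Each resample draws the same post indices for both models, so the
    interval reflects the uncertainty of the difference itself.
    Returns (mean difference, 2.5th percentile, 97.5th percentile).
    '''
    y_true = np.asarray(y_true)
    scores_a = np.asarray(scores_a)
    scores_b = np.asarray(scores_b)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        y = y_true[idx]
        # AUC is undefined when a resample contains only one class; skip it.
        if y.min() == y.max():
            continue
        diffs.append(roc_auc_score(y, scores_a[idx]) - roc_auc_score(y, scores_b[idx]))
    diffs = np.asarray(diffs)
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return diffs.mean(), lo, hi


if __name__ == '__main__':
    # Synthetic stand-in for two models scoring the same test posts.
    rng = np.random.default_rng(42)
    y = rng.integers(0, 2, size=2000)
    model_a = y + rng.normal(0, 1.6, size=2000)
    model_b = y + rng.normal(0, 1.6, size=2000)
    mean, lo, hi = bootstrap_auc_diff(y, model_a, model_b, n_boot=500)
    print(f'AUC difference {mean:+.3f}, 95% interval [{lo:+.3f}, {hi:+.3f}]')
    print('interval includes zero:', hi >= 0 >= lo)
```
\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003ePassing the same scores twice gives a zero-width interval at zero, a quick sanity check. If the interval for the real RoBERTa-versus-ensemble comparison straddles zero, the 0.008 AUC gap is not distinguishable from resampling noise.\u003c/p\u003e\n\u003cp\u003e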
And now inference runs on a single model, half the latency, no SBERT dependency, 500MB instead of 900MB.\u003cfigure class=\"post-figure\" style=\"width: 70%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-table_version_comparison-png-4\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/table_version_comparison.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/table_version_comparison.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/table_version_comparison.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/table_version_comparison.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_version_comparison.png 1200w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/table_version_comparison.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/table_version_comparison.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/table_version_comparison.png 1440w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_version_comparison.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/table_version_comparison.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/table_version_comparison.png 2000w\"\n              sizes=\"70vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_version_comparison.png\"\n           alt=\"Model version comparison showing evolution from V1 to V7\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-table_version_comparison-png-4\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/table_version_comparison.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Model version comparison showing evolution from V1 to V7\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 70%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-06_score_by_category-png-5\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/06_score_by_category.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/06_score_by_category.png 480w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/06_score_by_category.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/06_score_by_category.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/06_score_by_category.png 1200w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/06_score_by_category.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/06_score_by_category.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/06_score_by_category.png 1440w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/06_score_by_category.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/06_score_by_category.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/06_score_by_category.png 2000w\"\n              sizes=\"70vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/06_score_by_category.png\"\n           alt=\"Prediction scores by content category\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-06_score_by_category-png-5\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" 
data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/06_score_by_category.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Prediction scores by content category\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nThe other lesson was calibration. A model that says 0.8 probability should mean \u0026ldquo;80% of posts I give this score actually hit 100 points.\u0026rdquo; Neural networks trained on cross-entropy often don\u0026rsquo;t behave this way out of the box; they tend to be overconfident. I used \u003ca href=\"https://scikit-learn.org/stable/modules/isotonic.html\"\u003eisotonic regression\u003c/a\u003e on the validation set to fix the mapping. Expected calibration error (ECE) measures the remaining gap:\u003c/p\u003e\n$$ECE = \\sum_{b=1}^{B} \\frac{n_b}{N} \\left| \\text{acc}(b) - \\text{conf}(b) \\right|$$\u003cp\u003ewhere the N predictions are sorted into B confidence bins, n_b of them landing in bin b, and each term measures how far the observed hit rate acc(b) is from the mean predicted probability conf(b) in that bin. ECE went from 0.089 to 0.043. Now when the model says 0.4, that number is much closer to the real hit rate.\u003c/p\u003e\n\u003cp\u003eIn practice, the model provides meaningful lift. 
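\u003c/p\u003e\n\u003cp\u003eLift here means the hit rate in the top slice of scores divided by the overall base rate. The calibration step, the ECE formula and that lift figure can be sketched together in plain NumPy and scikit-learn. This is a minimal sketch under stated assumptions, not the notebook\u0026rsquo;s code: equal-width bins for ECE, scikit-learn\u0026rsquo;s \u003ccode\u003eIsotonicRegression\u003c/code\u003e fitted on validation probabilities only, hypothetical helper names, and synthetic overconfident scores standing in for RoBERTa outputs.\u003c/p\u003e\n\u003cpre\u003e\u003ccode class=\"language-python\"\u003e
```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def expected_calibration_error(y_true, probs, n_bins=10):
    '''ECE = sum over bins b of (n_b / N) * |acc(b) - conf(b)|.

    Equal-width probability bins. For a binary hit/no-hit model, acc(b) is
    the observed hit rate in bin b and conf(b) the mean predicted probability.
    '''
    y_true = np.asarray(y_true, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Bin index 0..n_bins-1; a probability of exactly 1.0 goes in the top bin.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            ece += in_bin.mean() * abs(y_true[in_bin].mean() - probs[in_bin].mean())
    return ece


def top_fraction_lift(y_true, scores, frac=0.10):
    '''Hit rate among the top `frac` of posts by score, and its ratio to the base rate.'''
    y_true = np.asarray(y_true, dtype=float)
    k = max(1, int(round(len(y_true) * frac)))
    top = np.argsort(scores)[::-1][:k]
    precision = y_true[top].mean()
    return precision, precision / y_true.mean()


def fit_calibrator(val_probs, val_labels):
    '''Isotonic map from raw model probabilities to observed hit rates,
    fitted on validation data only (never on the test set).'''
    calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds='clip')
    calibrator.fit(val_probs, val_labels)
    return calibrator


if __name__ == '__main__':
    rng = np.random.default_rng(0)

    def fake_split(n):
        # Synthetic data: true hit probability p, and an overconfident model
        # that pushes p toward 0 or 1 (its logit is scaled by 3).
        p = rng.uniform(0.02, 0.98, size=n)
        labels = (p > rng.uniform(size=n)).astype(int)
        raw = 1 / (1 + np.exp(-3 * np.log(p / (1 - p))))
        return raw, labels

    val_raw, val_y = fake_split(20000)
    test_raw, test_y = fake_split(20000)
    calibrated = fit_calibrator(val_raw, val_y).predict(test_raw)

    print('ECE raw:       ', round(expected_calibration_error(test_y, test_raw), 3))
    print('ECE calibrated:', round(expected_calibration_error(test_y, calibrated), 3))
    print('top-10% hit rate and lift:', top_fraction_lift(test_y, test_raw))
```
\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eThe lift helper ranks by raw scores on purpose: isotonic regression is monotone, so calibration changes the probabilities but can only merge neighbouring scores into ties, never reorder them.\u003c/p\u003e\n\u003cp\u003e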
If you only look at the top 10% of predictions by score, 62% of them are actual hits, roughly 1.9x better than random selection:\u003cfigure class=\"post-figure\" style=\"width: 50%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-table_lift_analysis-png-6\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/table_lift_analysis.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/table_lift_analysis.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/table_lift_analysis.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/table_lift_analysis.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_lift_analysis.png 1200w\"\n              sizes=\"50vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/table_lift_analysis.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/table_lift_analysis.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/table_lift_analysis.png 1440w\"\n              sizes=\"50vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_lift_analysis.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/table_lift_analysis.png 1600w,\n                 
     https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/table_lift_analysis.png 2000w\"\n              sizes=\"50vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_lift_analysis.png\"\n           alt=\"Lift analysis showing precision at different thresholds\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-table_lift_analysis-png-6\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/table_lift_analysis.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Lift analysis showing precision at different thresholds\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003cfigure class=\"post-figure\" style=\"width: 70%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-08_calibration_error-png-7\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/08_calibration_error.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/08_calibration_error.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/08_calibration_error.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/08_calibration_error.png 960w,\n     
                 https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/08_calibration_error.png 1200w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/08_calibration_error.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/08_calibration_error.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/08_calibration_error.png 1440w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/08_calibration_error.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/08_calibration_error.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/08_calibration_error.png 2000w\"\n              sizes=\"70vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/08_calibration_error.png\"\n           alt=\"Calibration error distribution\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-08_calibration_error-png-7\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/08_calibration_error.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg 
alt=\"Calibration error distribution\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nAbout training speed: I used the \u003ca href=\"https://www.nvidia.com/en-us/data-center/h100/\"\u003eNVIDIA H100 GPU\u003c/a\u003e, which costs roughly 18x as much per hour as the T4 on hosted (Google Colab) runtimes. A sensible middle ground would be an A100 (40 or 80GB VRAM) or an L4, training 3-5x faster than the T4: maybe 5-7 minutes instead of 20-30. But watching epochs fly by at ~130 iterations per second, after the T4\u0026rsquo;s ~3 iterations per second, was a different experience. \u003cfigure class=\"post-figure\" style=\"width: 70%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-colab-training-hn-png-8\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/colab-training-hn.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/colab-training-hn.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/colab-training-hn.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/colab-training-hn.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/colab-training-hn.png 1200w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/colab-training-hn.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/colab-training-hn.png 1024w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/colab-training-hn.png 1440w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/colab-training-hn.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/colab-training-hn.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/colab-training-hn.png 2000w\"\n              sizes=\"70vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/colab-training-hn.png\"\n           alt=\"Colab notebook showing H100 training at 130 it/s\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-colab-training-hn-png-8\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/colab-training-hn.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Colab notebook showing H100 training at 130 it/s\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nThe model learned some intuitive patterns. \u0026ldquo;Show HN\u0026rdquo; titles score higher. Deep technical dives do well. Generic news aggregation doesn\u0026rsquo;t. Titles between 40 and 80 characters perform better than very short or very long ones. Some of this probably reflects real engagement patterns. 
Some of it is noise the model hasn\u0026rsquo;t been sufficiently regularized to ignore.\u003cfigure class=\"post-figure\" style=\"width: 70%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-10_title_length_performance-png-9\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/10_title_length_performance.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/10_title_length_performance.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/10_title_length_performance.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/10_title_length_performance.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/10_title_length_performance.png 1200w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/10_title_length_performance.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/10_title_length_performance.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/10_title_length_performance.png 1440w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/10_title_length_performance.png 1200w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/10_title_length_performance.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/10_title_length_performance.png 2000w\"\n              sizes=\"70vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/10_title_length_performance.png\"\n           alt=\"Model performance by title length\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-10_title_length_performance-png-9\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/10_title_length_performance.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Model performance by title length\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003eRunning a few titles through the model shows what it picks up on:\u003cfigure class=\"post-figure\" style=\"width: 70%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-table_title_workshop-png-10\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/table_title_workshop.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/table_title_workshop.png 480w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/table_title_workshop.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/table_title_workshop.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_title_workshop.png 1200w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/table_title_workshop.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/table_title_workshop.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/table_title_workshop.png 1440w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_title_workshop.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/table_title_workshop.png 1600w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/table_title_workshop.png 2000w\"\n              sizes=\"70vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/table_title_workshop.png\"\n           alt=\"Title workshop showing model predictions for different phrasings\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-table_title_workshop-png-10\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" 
data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/table_title_workshop.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"Title workshop showing model predictions for different phrasings\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\nVague claims score low. Specificity helps. First-person \u0026ldquo;I built\u0026rdquo; framing does well, which matches what actually gets upvoted. The model isn\u0026rsquo;t learning to game HN; it\u0026rsquo;s learning what HN already rewards.\u003c/p\u003e\n\u003cp\u003eThe model now scores articles in an \u003ca href=\"https://github.com/philippdubach/rss-reader\"\u003eRSS reader pipeline\u003c/a\u003e I built. Does it help? Mostly. I still click on things marked low probability. But the high-confidence predictions are usually right. 
It\u0026rsquo;s a filter, not an oracle.\u003cfigure class=\"post-figure\" style=\"width: 70%; margin: 1.5rem auto;\"\u003e\n  \u003cbutton type=\"button\" class=\"img-trigger\" data-lightbox-target=\"lightbox-dashboard-hn-scoring-png-11\" aria-label=\"View full-size image\"\u003e\n    \u003cpicture class=\"img-lightbox\"\u003e\n      \u003csource media=\"(max-width: 768px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=320,quality=80,format=auto/dashboard-hn-scoring.png 320w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=480,quality=80,format=auto/dashboard-hn-scoring.png 480w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=640,quality=80,format=auto/dashboard-hn-scoring.png 640w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=960,quality=80,format=auto/dashboard-hn-scoring.png 960w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/dashboard-hn-scoring.png 1200w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(max-width: 1024px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=768,quality=80,format=auto/dashboard-hn-scoring.png 768w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1024,quality=80,format=auto/dashboard-hn-scoring.png 1024w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1440,quality=80,format=auto/dashboard-hn-scoring.png 1440w\"\n              sizes=\"70vw\"\u003e\n      \u003csource media=\"(min-width: 1025px)\"\n              srcset=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/dashboard-hn-scoring.png 1200w,\n                      https://static.philippdubach.com/cdn-cgi/image/width=1600,quality=80,format=auto/dashboard-hn-scoring.png 1600w,\n                      
https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=80,format=auto/dashboard-hn-scoring.png 2000w\"\n              sizes=\"70vw\"\u003e\n      \u003cimg src=\"https://static.philippdubach.com/cdn-cgi/image/width=1200,quality=80,format=auto/dashboard-hn-scoring.png\"\n           alt=\"RSS reader dashboard showing HN prediction scores\"\n           class=\"\"\n           width=\"1200\"\n           \n           loading=\"lazy\"\n           decoding=\"async\"\u003e\n    \u003c/picture\u003e\n  \u003c/button\u003e\n\u003c/figure\u003e\n\n\u003cdialog id=\"lightbox-dashboard-hn-scoring-png-11\" class=\"lightbox-dialog\" aria-label=\"Full-size image\" data-hires=\"https://static.philippdubach.com/cdn-cgi/image/width=2000,quality=85,format=auto/dashboard-hn-scoring.png\"\u003e\n  \u003cform method=\"dialog\" class=\"lightbox-close-form\"\u003e\n    \u003cbutton type=\"submit\" class=\"lightbox-close\" aria-label=\"Close\"\u003e×\u003c/button\u003e\n  \u003c/form\u003e\n  \u003cimg alt=\"RSS reader dashboard showing HN prediction scores\" decoding=\"async\"\u003e\n\u003c/dialog\u003e\n\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://huggingface.co/philippdubach/hn-success-predictor\"\u003eModel on HuggingFace\u003c/a\u003e — Download the weights and run inference locally\n\u003cbr\u003e\n\u003ca href=\"https://github.com/philippdubach/rss-reader\"\u003eRSS Reader Pipeline\u003c/a\u003e — Full scoring pipeline with feed aggregation\n\u003cbr\u003e\n\u003ca href=\"https://huggingface.co/philippdubach/hn-success-predictor/blob/main/training.ipynb\"\u003eTraining Notebook\u003c/a\u003e — Colab-ready notebook with the complete training code\u003c/p\u003e\n\u003cp\u003eOn a side note: The patterns here aren\u0026rsquo;t specific to Hacker News or online communities. Temporal leakage shows up whenever you\u0026rsquo;re predicting something that evolves over time: credit defaults, client churn, market regimes. 
The fix is the same: validate on future data, not random holdouts. Calibration matters anywhere probabilities drive decisions. A loan approval model that says \u0026ldquo;70% chance of repayment\u0026rdquo; needs that number to mean something. Overfitting to training data is how banks end up with models that look great in backtests and fail in production.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\u003cp\u003eI\u0026rsquo;ve built \u003ca href=\"https://philippdubach.com/projects/\"\u003esimilar systems for other domains\u003c/a\u003e: sentiment-based trading signals, glycemic response prediction, portfolio optimization. The ML fundamentals transfer. What changes is the domain knowledge needed to avoid the obvious mistakes, like training on data that wouldn\u0026rsquo;t have been available at prediction time, or trusting metrics that don\u0026rsquo;t reflect real-world performance.\u003c/p\u003e\n","summary":"Training RoBERTa to predict Hacker News success revealed temporal leakage inflating metrics. How temporal splits, calibration, and regularization fix it.","image":"https://static.philippdubach.com/ograph/ograph-hn-predictor.jpg","date_published":"2026-01-10T00:00:00Z","date_modified":"2026-05-04T14:02:44+02:00","authors":[{"name":"Philipp D. Dubach","url":"https://philippdubach.com/about/"}],"tags":["AI"],"_philippdubach":{"type":"Project","word_count":960,"reading_time_minutes":5,"keywords":["Hacker News prediction model","temporal train test split ML","RoBERTa text classification","model calibration isotonic regression","machine learning overfitting regularization"],"section":"posts"}}]}