<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Inside Voice]]></title><description><![CDATA[Notes from inside the AI systems everyone is trying to build, written by someone who builds them every day. ]]></description><link>https://www.insidevoice.ai</link><image><url>https://substackcdn.com/image/fetch/$s_!aNwX!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16a50cc1-73fa-4fbe-826c-27adf6fe731e_256x256.png</url><title>Inside Voice</title><link>https://www.insidevoice.ai</link></image><generator>Substack</generator><lastBuildDate>Tue, 19 May 2026 04:47:53 GMT</lastBuildDate><atom:link href="https://www.insidevoice.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Jabari Allen]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[insidevoiceai@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[insidevoiceai@substack.com]]></itunes:email><itunes:name><![CDATA[Jabari Allen]]></itunes:name></itunes:owner><itunes:author><![CDATA[Jabari Allen]]></itunes:author><googleplay:owner><![CDATA[insidevoiceai@substack.com]]></googleplay:owner><googleplay:email><![CDATA[insidevoiceai@substack.com]]></googleplay:email><googleplay:author><![CDATA[Jabari Allen]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[AI's Elephant In The Room]]></title><description><![CDATA[There are two stories being told about AI right now.]]></description><link>https://www.insidevoice.ai/p/ais-elephant-in-the-room</link><guid isPermaLink="false">https://www.insidevoice.ai/p/ais-elephant-in-the-room</guid><dc:creator><![CDATA[Jabari 
Allen]]></dc:creator><pubDate>Wed, 18 Mar 2026 17:10:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_Lb4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There are two stories being told about AI right now.</p><p>On one hand, there&#8217;s the story you&#8217;re probably more familiar with. It&#8217;s often in, behind, or around the headlines. It&#8217;s being told breathlessly and constantly: across industries, across the political spectrum, on every major news network, in company meetings. It says that AI is both imminently and, somehow, currently, <em>unnervingly capable</em>. Half unbridled exuberance, half white-knuckle anxiety.</p><p>The past several weeks have demonstrated the alluring hold this first story has. An <a href="https://github.com/openclaw/openclaw">open-source AI agent framework</a> goes viral, ships with <a href="https://conscia.com/blog/the-openclaw-security-crisis/">glaring security issues</a>, buys a guy a Hyundai, gets its creator hired by OpenAI, and sees its AI agent social network spinoff acquired by Meta. A <a href="https://www.msn.com/en-us/money/markets/meet-the-former-karaoke-company-that-sank-trucking-stocks/ar-AA1Wf8gJ">karaoke machine company</a> and a <a href="https://www.reuters.com/business/skittish-investors-spooked-dystopian-ai-outlooks-go-viral-2026-02-24/">well-written sci-fi dystopia</a> compete to see which can elicit a bigger jump scare from the stock market. The exuberance and the anxiety feed off each other.</p><p>There are some legitimate reasons this story persists. The St. Louis Fed found that <a href="https://www.stlouisfed.org/on-the-economy/2025/nov/state-generative-ai-adoption-2025">over half of U.S. 
adults now use generative AI</a> in some capacity, outpacing both PC and internet adoption at the same stage after their first mass-market products. In the software industry, AI coding assistants have been a breakout hit, with surveys showing adoption between 72% and 91%.</p><p>But there&#8217;s another story being told, too. It&#8217;s not as loud. It isn&#8217;t a secret; it just isn&#8217;t as provocative. It&#8217;s not typically told by the politicians, the headlines, the CEOs, or the industry analysts. You&#8217;ll hear it when talking to technical folks who regularly use or implement the technology. Their read on things tends to line up with the following:</p><ul><li><p><a href="https://www.spglobal.com/market-intelligence/en/news-insights/research/2025/10/generative-ai-shows-rapid-growth-but-yields-mixed-results">According to S&amp;P Global</a>, 42% of companies abandoned most of their AI initiatives, up sharply from 17% the prior year. The average organization scrapped 46% of proof-of-concept projects before reaching production.</p></li><li><p><a href="https://www.pwc.com/gx/en/issues/c-suite-insights/ceo-survey.html">PwC&#8217;s Global CEO Survey</a> found that 56% of CEOs report neither revenue increase nor cost decrease from AI over the prior 12 months. Only 12% report both.</p></li><li><p><a href="https://www.weforum.org/stories/2026/01/ceos-are-all-in-on-ai-but-anxieties-remain/">BCG&#8217;s AI Radar</a> reported that 60% of CEOs have intentionally slowed AI implementation due to concerns over errors and malfunctions.</p></li></ul><p>In other words, despite high usage and adoption in some areas, implementations are struggling.</p><p>The U.S. Census Bureau tells a similar story. AI use in actual production rose from 3.7% in September 2023 to <a href="https://www.ey.com/en_us/insights/ai/ai-powered-growth">roughly 10% by September 2025</a>. 
When the Bureau was later <a href="https://econlab.substack.com/p/the-census-bureau-was-undercounting">prodded to broaden its criteria</a> to include any business function, the number jumped to 17.6% (it&#8217;s safe to say this includes even nominal chatbot usage). Over half of U.S. adults use generative AI, yet fewer than 20% of businesses use it in actual production.</p><p>No matter how you look at it, there&#8217;s a large gap between these two stories, in both narrative and numbers. But both stories agree on one point: the capabilities are indeed real.</p><p>So if capability seemingly isn&#8217;t the problem, then what is?</p><h2>The Capability-Reliability Gap</h2><p>Anthropic put out <a href="https://www.anthropic.com/research/labor-market-impacts">an article a couple of weeks ago</a> introducing a measure that compares theoretical AI capability with observed real-world usage across a variety of jobs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_Lb4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_Lb4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!_Lb4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png 848w, 
https://substackcdn.com/image/fetch/$s_!_Lb4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!_Lb4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_Lb4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_Lb4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!_Lb4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png 848w, 
https://substackcdn.com/image/fetch/$s_!_Lb4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!_Lb4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baae5a4-4207-4026-a260-e5d58d00199b_2048x2048.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The gap between what AI can theoretically do and what it's actually being used for, across occupational categories. 
Source: Anthropic, &#8220;<a href="https://www.anthropic.com/research/labor-market-impacts">The Impact of AI on Labor Markets</a>&#8221;</figcaption></figure></div><p>This chart of theirs shows the massive gap between theoretical capability and actual usage. Even the category that is arguably the best fit (&#8220;Computer &amp; math&#8221; occupations) is supposedly at only 33% coverage. But the paper frames that entire gap as a diffusion timing problem.</p><blockquote><p>As capabilities advance, adoption spreads, and deployment deepens, the red area will grow to cover the blue.</p></blockquote><p>It&#8217;s a teleological assumption. It&#8217;s presented as inevitable.</p><p>The article briefly gestures at why the gap exists but offers no serious interrogation. This &#8220;capability-first&#8221; framing is deeply embedded in the AI industry and shows up everywhere.</p><p><a href="https://www.normaltech.ai/p/new-paper-towards-a-science-of-ai">A draft paper published last month</a> puts rigorous measurement behind what many of us have been feeling, and I believe it helps explain that coverage gap.</p><p>&#8220;Towards a Science of AI Agent Reliability,&#8221; by Stephan Rabanser, Sayash Kapoor, and Arvind Narayanan, evaluated 14 models from OpenAI, Google, and Anthropic across 18 months of releases. 
Their core finding was that a year and a half of rapid capability progress produced only modest reliability gains.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gdrE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea5d84d-3d52-4d23-8caa-2d2281b7d0dc_994x758.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gdrE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea5d84d-3d52-4d23-8caa-2d2281b7d0dc_994x758.png 424w, https://substackcdn.com/image/fetch/$s_!gdrE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea5d84d-3d52-4d23-8caa-2d2281b7d0dc_994x758.png 848w, https://substackcdn.com/image/fetch/$s_!gdrE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea5d84d-3d52-4d23-8caa-2d2281b7d0dc_994x758.png 1272w, https://substackcdn.com/image/fetch/$s_!gdrE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea5d84d-3d52-4d23-8caa-2d2281b7d0dc_994x758.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gdrE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea5d84d-3d52-4d23-8caa-2d2281b7d0dc_994x758.png" width="727.998046875" height="555.1534401722837" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fea5d84d-3d52-4d23-8caa-2d2281b7d0dc_994x758.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:758,&quot;width&quot;:994,&quot;resizeWidth&quot;:727.998046875,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gdrE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea5d84d-3d52-4d23-8caa-2d2281b7d0dc_994x758.png 424w, https://substackcdn.com/image/fetch/$s_!gdrE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea5d84d-3d52-4d23-8caa-2d2281b7d0dc_994x758.png 848w, https://substackcdn.com/image/fetch/$s_!gdrE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea5d84d-3d52-4d23-8caa-2d2281b7d0dc_994x758.png 1272w, https://substackcdn.com/image/fetch/$s_!gdrE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea5d84d-3d52-4d23-8caa-2d2281b7d0dc_994x758.png 1456w" sizes="100vw" loading="lazy"></picture></div></a>
<figcaption class="image-caption">Overall reliability across major AI providers improved only modestly over 18 months of rapid capability gains. Source: Rabanser, Kapoor, &amp; Narayanan (<a href="https://hal.cs.princeton.edu/reliability/">interactive dashboard</a>)</figcaption></figure></div><p>All three major providers cluster together here. It&#8217;s an industry-wide pattern.</p><p>It often feels like the AI industry keeps glossing over the fact that capability and reliability are fundamentally different qualities. We tend to use &#8220;accurate&#8221; and &#8220;reliable&#8221; interchangeably, but they describe different things. A model can ace a benchmark (capability/accuracy) and still be a liability in production (reliability). The authors put it well:</p><blockquote><p>When we consider a coworker to be reliable, we don&#8217;t just mean that they get things right most of the time. 
We mean something richer: they get it right consistently, not right today and wrong tomorrow on the same thing. They don&#8217;t fall apart when conditions aren&#8217;t perfect. They tell you when they&#8217;re unsure rather than confidently guessing. When they do mess up, their mistakes are more likely to be fixable than catastrophic.</p></blockquote><p>That &#8220;richer&#8221; meaning is what the paper breaks down into four dimensions:</p><blockquote><ul><li><p><strong>Consistency</strong>: Agents that can solve a task often fail on repeated attempts under identical conditions. Many models have trouble giving a consistent answer, with outcome consistency scores ranging from 30% to 75% across the board.</p></li><li><p><strong>Robustness</strong>: Most models handle genuine technical failures (server crashes, API timeouts) gracefully. But if we rephrase the instructions with the same semantic meaning, performance drops substantially.</p></li><li><p><strong>Predictability</strong>: Agents are not good at knowing when they&#8217;re wrong. This is the weakest dimension across the board. When agents report confidence, it often carries little signal. On one benchmark, most models couldn&#8217;t distinguish their correct predictions from incorrect ones better than chance.</p></li><li><p><strong>Safety</strong>: Recent models are noticeably better at avoiding constraint violations, though financial errors, such as incorrect charges, remain a common failure mode. We use safety narrowly to mean bounded harm when failures occur, not broader concerns like alignment. We are still iterating on how we measure safety, so we report it separately from the aggregate reliability score.</p></li></ul></blockquote><p>So far, scaling up to bigger models has not guaranteed improvements across all these dimensions. Calibration and robustness may improve while consistency takes a hit. 
More capability often brings more behavioral range: more ways to do the same thing differently each time. Yet standard evaluation practice is to run a benchmark once, report a number, and move on.</p><p>OpenAI&#8217;s latest and greatest model, GPT-5.4, released March 5, 2026, scores well on capability benchmarks while simultaneously <a href="https://artificialanalysis.ai/evaluations/omniscience">showing notably high hallucination rates</a> on tests measuring factual recall. And it&#8217;s not just OpenAI. The chart below shows a consistent pattern across both OpenAI and Anthropic: as models get more capable, their hallucination rates tend to go up, not down.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XdJp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a5a559-9872-4263-9df7-eeadb6359c21_1600x545.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XdJp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a5a559-9872-4263-9df7-eeadb6359c21_1600x545.png 424w, https://substackcdn.com/image/fetch/$s_!XdJp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a5a559-9872-4263-9df7-eeadb6359c21_1600x545.png 848w, https://substackcdn.com/image/fetch/$s_!XdJp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a5a559-9872-4263-9df7-eeadb6359c21_1600x545.png 1272w, 
https://substackcdn.com/image/fetch/$s_!XdJp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a5a559-9872-4263-9df7-eeadb6359c21_1600x545.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XdJp!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a5a559-9872-4263-9df7-eeadb6359c21_1600x545.png" width="1200" height="408.7912087912088" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52a5a559-9872-4263-9df7-eeadb6359c21_1600x545.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:496,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XdJp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a5a559-9872-4263-9df7-eeadb6359c21_1600x545.png 424w, https://substackcdn.com/image/fetch/$s_!XdJp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a5a559-9872-4263-9df7-eeadb6359c21_1600x545.png 848w, https://substackcdn.com/image/fetch/$s_!XdJp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a5a559-9872-4263-9df7-eeadb6359c21_1600x545.png 1272w, 
https://substackcdn.com/image/fetch/$s_!XdJp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a5a559-9872-4263-9df7-eeadb6359c21_1600x545.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Hallucination rates across major AI models. More capable models tend to hallucinate more, not less. 
Source: Artificial Analysis (<a href="https://artificialanalysis.ai/evaluations/omniscience">artificialanalysis.ai</a>)</figcaption></figure></div><p>Even when industry evaluations acknowledge consistency, the discourse at large tends to downplay it.</p><p>The oft-cited <a href="https://metr.org/time-horizons/">METR evaluations</a> show LLMs&#8217; growing ability to complete complex, lengthy software tasks. This is too often stated as &#8220;LLMs can now do 10 hours of work successfully,&#8221; rather than &#8220;LLMs can now do 10 hours of software-bound work successfully <strong>50% of the time</strong>, and 1 hour of software-bound work successfully <strong>80% of the time</strong>.&#8221; Put another way, imagine a colleague who hands you an hour&#8217;s worth of work five times a day. On average, four of those are fine. The fifth is complete slop, and you won&#8217;t know which one until you check. Now imagine that same colleague takes on a full day&#8217;s project. It&#8217;s a coin flip whether what they hand you is fantastic or unusable. Bigger wins, bigger headaches.</p><p>So the models are capable. They&#8217;re just not comparably reliable. If the AI industry doesn&#8217;t develop proper practices that meaningfully address the various dimensions of reliability, then that red area from Anthropic&#8217;s paper, representing actual usage, might not grow smoothly at all. It might stall or grow unevenly. It might require reliability scaffolding as a prerequisite. If reliability is the actual bottleneck, then what kind of problem is it?</p><h2>Reliability Is a Systems Problem</h2><p>All signs suggest that reliability isn&#8217;t a property you can scale into existence the way you can scale raw capability. Rather, it requires intentional design and engineering. It&#8217;s a systems problem. 
It requires architectural decisions, constraints, system design, evaluation, and deep contextual knowledge about the problem you&#8217;re trying to solve.</p><p><a href="https://www.science.org/doi/10.1126/science.aay2400">Pluribus</a>, a system built by Carnegie Mellon University and Facebook in 2019, beat elite professional players at six-player no-limit Texas hold&#8217;em. For years, the broader poker AI field had been throwing scale and deep learning at the problem without cracking multiplayer play. The system that actually beat the pros cost about $144 in cloud compute and ran on a single server. It was a compound system consisting of self-play algorithms, game abstraction layers, and a real-time search engine. Multiple components with distinct roles, working together.</p><p><a href="https://www.science.org/doi/10.1126/science.ade9097">Cicero</a>, a Diplomacy-playing bot that Meta built in 2022, operated on a similar principle: a language model for communication, a future simulator for strategy, a planning engine, all orchestrated together. In the research paper, the authors note that they designed it not to lie. Not because of alignment fears (they weren&#8217;t afraid it was going to blackmail anyone), but because it would&#8217;ve been unfair to other players otherwise.</p><p>Computer scientist Cal Newport explained in <a href="https://youtu.be/8MLbOulrLA0?si=dwupNJMSH5dmQdMl&amp;t=3341">a recent interview</a> that what made this possible was the architecture:</p><blockquote><p>There&#8217;s no amorphous neural network... it&#8217;s six or seven components. We know what they do. [...] They wrote that simulator. It&#8217;s not a neural network. They wrote it and they just coded it to say, don&#8217;t look at possibilities where you lie. So it doesn&#8217;t. So that machine can&#8217;t lie. There&#8217;s no out of control piece to this.</p></blockquote><p>Newport may be oversimplifying a bit there. Cicero&#8217;s honesty wasn&#8217;t a hard-coded rule, exactly. 
The team trained the dialogue model on a curated subset of human games where players weren&#8217;t lying, and architecturally constrained the system so that what Cicero says stays tethered to what it actually plans to do. Neural networks were very much involved. But his broader point stands: Cicero is a specialized, composed system rather than a single monolithic model doing everything. And because of that compositional architecture, the team could make a design decision about honesty and then actually enforce it.</p><p>What made these systems work was human judgment about architecture, constraints, and composition. The model was one component in a broader system.</p><p>None of this is radical or novel. This is just how functional software systems have always been built. The monolithic-model narrative implicitly asks us to believe that everything we know about how working technology gets developed, deployed, and maintained simply stops applying. That&#8217;s the extraordinary claim, not the idea that systems should be made up of specialized, legible, constrained components.</p><p>And the industry intuitively knows this. The meteoric rise of OpenClaw didn&#8217;t come from showcasing some breakthrough in raw model capability. It captured imaginations because it gave people the feeling of owning a composable system they could shape. The paradigm of fully managed frontier models and tooling structurally can&#8217;t offer that. You can use the same proprietary models under the hood, but owning the orchestration layer reintroduces the optionality that full dependence on a provider&#8217;s ecosystem can&#8217;t provide. And OpenClaw achieved this cultural moment even as it shipped with hundreds of security vulnerabilities and the reliability profile of a hobbyist&#8217;s proof-of-concept. 
Yes, it&#8217;s likely partly a fad, and we&#8217;ve seen prior waves of viral AI agent moments come and go (AutoGPT, BabyAGI, Manus, the rabbit R1&#8217;s &#8220;Large Action Model,&#8221; and others). But the gap between what people expect from these technologies and what they can actually deliver is probably narrower than it&#8217;s ever been, while still being much larger than most enthusiasts want to accept. That narrowing gap has produced a renewed and much larger base of interest in seriously pursuing composable, orchestrated systems. Even if OpenClaw itself isn&#8217;t the answer, the nerve it struck is telling.</p><p>I also see this in my work. I build production voice AI systems for businesses. Something as common as a scheduling voice agent, perhaps for a medical clinic, still requires significant engineering and experimentation. There is no tried-and-true way yet to build and deploy one that a clinic can simply turn on, mostly stop thinking about, and trust to work without issues. Off-the-shelf solutions still require considerable customization (not just because of AI agent unreliability, but also because of all the challenges typical of IT integration). Almost nobody talks about this because the dissonance between that reality and the dominant narratives is too uncomfortable to reconcile. These narratives are buoyed by persistent myths like <a href="https://www.insidevoice.ai/p/effortless-ai">the notion of &#8220;effortless AI&#8221;</a>, which I&#8217;ve written about before.</p><p>We&#8217;re simultaneously told these models are so powerful they&#8217;ll replace entire categories of jobs, and yet reliably automating a phone call to schedule an appointment requires a nontrivial amount of system design and engineering (not to mention ongoing operation and maintenance).</p><p>The AI Agent Reliability paper&#8217;s recommendations for deployers are pretty straightforward. 
They stress the importance of clearly distinguishing automation from augmentation. If a staff member reviewing medical appointments scheduled during off-hours via your voice agent catches an error, it may be a bit annoying for them to call the patient back later. But a fully autonomous AI agent that books an erroneous appointment, causing a patient to miss a visit with their doctor, is unacceptable.</p><p>The paper also recommends building internal evaluations tailored to the specific context and considering reliability thresholds before moving from sandbox to production, &#8220;the way aviation requires certification before service.&#8221;</p><p>That evaluation work itself can be a competitive advantage. It&#8217;s the type of institutional knowledge you can only build up by experimenting with your actual data. Generic benchmarks won&#8217;t tell you how a model will behave in your environment, with your data, or under your constraints. Customized evaluations are a necessity, and the work of building them is deeply contextual. It resists commodification.</p><p>You don&#8217;t have to take my word for any of this. NVIDIA, the company that has arguably benefited the most from the AI scaling gold rush, itself published a paper in September 2025 literally titled <a href="https://research.nvidia.com/labs/lpr/slm-agents/">&#8220;Small Language Models are the Future of Agentic AI&#8221;</a>. The paper argues that for the repetitive, specialized tasks that AI agents actually perform in practice, smaller models are more suitable and more economical, and that the future is heterogeneous systems where small specialized models handle the bulk of the work and large models get invoked selectively. For the company selling the massively expensive shovels, that&#8217;s a notable thing to put in writing.
NVIDIA also has its own growing family of open source smaller models built for exactly this kind of modular use, from language models (Nemotron) to speech-to-text models (Parakeet). There&#8217;s also NVIDIA&#8217;s Groq &#8220;acquisition&#8221;. <a href="https://www.cnbc.com/2025/12/24/nvidia-buying-ai-chip-startup-groq-for-about-20-billion-biggest-deal.html">NVIDIA spent roughly $20 billion</a> acquiring rights to a company&#8217;s technology whose entire selling point is custom AI chips built around low-latency, low-cost inference. That bet lines up with the view they laid out in the SLM paper. If the future is heterogeneous agentic systems running lots of smaller specialized models, you want inference infrastructure optimized for that, not just bigger clusters running bigger models.</p><h2>Optionality and Agency</h2><p>In addition to making reliability tractable, composed systems also preserve something else that the dominant paradigm structurally forecloses: optionality.</p><p>Organizations working within composed systems can keep their hands on all the various knobs. At each boundary and breakpoint, they get to make decisions about what to build, what to outsource, what to control tightly, and where to accept tradeoffs. Not every component will be locally owned or fully controlled. Sometimes a frontier model is absolutely the right tool for a specific part of the system or workflow. But the wider architecture itself preserves the agency to decide. Maybe one part of your system requires more external capabilities than you&#8217;d prefer, but you&#8217;re able to balance that with tighter control in other parts. More responsibility comes with that, yes. More upfront and maintenance labor too.
But it also means your institutional knowledge, your domain expertise, your proprietary data, all of it becomes woven into whatever your competitive edge is, rather than remaining an inert input to a commodity model that everyone else also has access to.</p><p>The monolithic frontier path collapses this. It concentrates dependency, homogenizes capability, and often leaves organizations waiting for someone else&#8217;s next model release to solve their problems.</p><p>The incentives for composed systems aren&#8217;t purely technical: compliance and privacy requirements, <a href="https://www.wiley.law/article-2026-State-AI-Bills-That-Could-Expand-Liability-Insurance-Risk">liability and risk considerations</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, competitive differentiation, and the basic economics of generating tokens from expensive frontier models. Everything about the current landscape points toward compound, specialized AI systems becoming the norm. The model as one component in a broader system, not the whole product.</p><h2>The Question That Follows</h2><p>Which raises a final question.</p><p>Hundreds of billions of dollars are being poured into hyperscale data centers right now. Communities are being asked to accept higher electricity prices, pollution, and strained resources. All of it is premised on the assumption that the future belongs to ever-larger general-purpose models and that there will be an insatiable demand for them.</p><p>But if the actual path to ubiquitous, reliable &#8220;AI that just works&#8221; runs through specialized, composed, contextual systems (systems that are smaller, more diverse, and more amenable to a distributed mosaic of compute architectures and environments), then the assumptions underwriting that investment may be fundamentally misaligned with where the real value is heading.
The role of large general-purpose models as but one component in a much broader ecosystem would stand in stark contrast to the center of gravity the current narratives require them to be.</p><p>What happens to all of that if it turns out most businesses, most use cases, most solutions don&#8217;t actually need them nearly as much as everyone currently assumes?</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Some insurers are already <a href="https://www.ft.com/content/abfe9741-f438-4ed6-a673-075ec177dc62">seeking to exclude AI risks from existing policies</a>.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[The Comforting Myth of Effortless AI]]></title><description><![CDATA[There&#8217;s a low, persistent hum in the air if you work anywhere near AI these days.]]></description><link>https://www.insidevoice.ai/p/effortless-ai</link><guid isPermaLink="false">https://www.insidevoice.ai/p/effortless-ai</guid><dc:creator><![CDATA[Jabari Allen]]></dc:creator><pubDate>Fri, 12 Dec 2025 19:21:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!aNwX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16a50cc1-73fa-4fbe-826c-27adf6fe731e_256x256.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There&#8217;s a low, persistent hum in the air if you work anywhere near AI these days. I&#8217;m not talking about the loud stuff (e.g. &#8220;AGI in five years&#8221;, &#8220;50% of jobs disappearing tomorrow&#8221;). It&#8217;s much quieter. A subtle siren song of sorts. It goes something like:</p><blockquote><p>The hardest problems are already on their way to being solved. The friction you&#8217;re feeling now is temporary.
If you just wait a little bit longer, the models will improve and all those integration headaches will go away.</p></blockquote><p>It sounds reasonable. Why spend time solving problems that might disappear on their own? Why build complex systems when the ground is still shifting underneath you?</p><p>This is the myth of &#8220;effortless AI&#8221;. The idea that the messy, unglamorous work of stitching new capabilities into real systems will soon be trivial if not outright obsolete. The myth says the <em>real</em> innovation is happening at the model layer and the infrastructure layer, that it&#8217;s happening fast, and that it&#8217;s being done elsewhere. Your job isn&#8217;t to solve hard problems; it&#8217;s to wait for the hard problems to get solved for you, then plug in the solutions when they&#8217;re ready.</p><p>What makes this tricky to talk about is that different strains of AI discourse overlap and twist around each other. You&#8217;ll see leaders from big AI labs warn about imminent societal collapse, politicians parrot similar talking points, books with press tours heralding dystopia... all against the backdrop of investment and infrastructure buildouts continuing at dizzying levels. These narratives reinforce each other and mix together into a general atmosphere of not just perpetual, substantial advancement of AI but anticipated swift, widespread <em>adoption and integration</em> of it. When I say &#8220;effortless AI&#8221;, I&#8217;m talking about that second part. The adoption and integration. And specifically from the perspective of those who will make up the lion&#8217;s share of it: enterprises.</p><p>Right now you may be thinking, &#8220;Who actually says implementing AI is effortless? I just saw Andrej Karpathy and Ilya Sutskever talking about how hard this stuff is and how many unsolved problems there are.&#8221; And you&#8217;re right. Many of us working in the field know better. But it&#8217;s seldom stated so bluntly.
That&#8217;s part of what makes it tricky. It&#8217;s hard to show absence. It&#8217;s hard to show the voided interstitial space where you&#8217;d expect more substance. It&#8217;s in the &#8220;but&#8221; that always follows any acknowledgment of difficulty. In the conversational scurry to safety: &#8220;but there&#8217;s new updates every week&#8221;, &#8220;it&#8217;s all changing so fast&#8221;, &#8220;we need to be ready&#8221;, &#8220;this is the worst it&#8217;ll ever be&#8221;. Don&#8217;t get me wrong: I do this too. That&#8217;s part of why I wanted to explore this. We launder the narrative of effortless AI by what we&#8217;re not saying. And that&#8217;s what can make it hard to nail down.</p><h2>Learned Innovationlessness</h2><p>&#8220;Effortless AI&#8221; is a comforting story. It takes the pressure off. It tells you that it&#8217;s fine to not have anything figured out yet, because it&#8217;s all just temporary. It allows us to conflate inaction with patience, disengagement with prudence.</p><p>But narratives shape behavior. And one behavior I keep seeing, in client conversations, in industry discourse, in the way some teams approach AI projects, is a kind of... &#8220;learned innovationlessness&#8221;.</p><p>It&#8217;s this strange paralysis where some people have convinced themselves that unless you&#8217;re working at the model layer, you&#8217;re incapable of &#8220;real&#8221; innovation. Everything else is derivative. Doomed to be absorbed by the next model release or the next framework or whatever OpenAI, Google, or Anthropic announces next quarter.</p><p>To be clear, the belief isn&#8217;t just &#8220;AI will get better&#8221;. That&#8217;s a given. The belief is &#8220;AI will get better <em>without me</em>, and therefore I don&#8217;t need to act.&#8221;</p><p>If you squint, it might look like optimism. But it&#8217;s more of an optimistic type of fatalism.</p><p>It removes personal agency.
It devalues context-specific engineering and creativity. It frames innovation as unnecessary and offloads responsibility to labs and vendors. It convinces people that their own domain knowledge, their own hard-won understanding of their business, their systems, their constraints, and their customers, is really of no consequence.</p><p>When you choose inaction, deferring to the comfortable narrative of inevitability, the only thing that actually becomes inevitable is what you lose out on. The learnings you would have accumulated. The institutional knowledge about how to make this stuff work in your specific context, in practice. Potentially even a voice in the discussions that will set standards and conventions.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>You&#8217;re not much better off than the doomers if you think your own creativity will be deprecated by some company&#8217;s future model update. That&#8217;s resignation dressed up as excitement.</p><h2>The Last Ten Miles</h2><p>I spend my days building production voice AI systems<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> for businesses. And what I keep running into are problems that neither better models nor widely accepted solutions have solved.</p><ul><li><p>Figuring out the right kind of telemetry for workflows that hinge on non-deterministic models, so you actually know what&#8217;s going on in your system and can explain when and why things go wrong.</p></li><li><p>Building guardrails for inputs and outputs to ensure both the integrity and the fidelity of the system.</p></li><li><p>Accounting for fluctuating throughput and concurrency needs and factoring in how that affects decisions to use cloud APIs vs.
dedicated compute / self-hosted deployments.</p></li><li><p>Navigating the latency expectations and accuracy tradeoffs that are inherent to real-world spoken conversations.</p></li><li><p>Dealing with compliance requirements in regulated industries like healthcare, where an intelligible system with auditable, human-readable traces is non-negotiable.</p></li><li><p>Designing human-in-the-loop workflows that don&#8217;t just treat your subject matter experts as fail-safes or rote approval-button pushers.</p></li><li><p>The joys of small-business IT systems.</p></li></ul><p>None of these problems are solved by the latest and greatest models or by the currently fashionable agent frameworks. The unavoidable work is figuring out how to actually integrate non-deterministic technology with the messy, specific, constrained reality of a given organization&#8217;s systems and needs.</p><p>You can see evidence of this everywhere, from study after study of enterprises struggling with implementation to conflicting reports of flattening adoption curves. And that&#8217;s all happening despite a proliferation of models, frameworks, platforms, SDKs, and protocols for building &#8220;AI agents&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. There are 60+ different approaches, from every major tech company and a small army of startups, with no universal standards and competing interoperability schemes.</p><p>If connecting AI models to enterprise systems were a solved problem, we would regularly be <em>seeing</em> unequivocally successful implementations, and we wouldn&#8217;t have this fragmented landscape. We&#8217;d have a handful of dominant patterns that the industry was starting to congeal around and some vendors offering holistic, consistently reliable, production-ready solutions.
We&#8217;d have growing consensus from tangible evidence, not just talk.</p><p>The fact that we don&#8217;t have that consensus, after years of intense effort by the best-resourced labs and companies on the planet, suggests that, well, we are still early and that the problems are genuinely hard. And distinctly varied. The variety of solutions reflects how varied and stubborn the problems themselves are.</p><p>To be clear, I&#8217;m not saying any of this necessarily means progress has stalled. All of it should be expected. General-purpose technologies have this characteristic: the <em>application layer</em> is where most of the value and complexity lives. Electricity was transformative (over <em>decades and decades</em>), but the hard part wasn&#8217;t generating power. It was rewiring factories, redesigning workflows, training workers, building appliances. The same goes for the internet. The &#8220;last mile&#8221; turned out to be most of the miles.</p><p>That same dynamic seems to be playing out with AI. The models themselves are increasingly commodified (the capability gap between frontier models has narrowed, and open source models continue to close in). But the integration challenges? Dealing with messy data, legacy systems, compliance requirements, edge cases, modularity, observability, human workflows, and stochastic failure modes? If we&#8217;re lucky, a better model might truly solve a couple of those. But most are systems problems, and they&#8217;re inherently tied to the implementation context.</p><p>Even if we fast-forward five years to GPT-N, it still won&#8217;t automatically know your team&#8217;s specific processes and workflows. It won&#8217;t know the quirks of the custom in-house API middleware you built to talk to some industry-specific, integration-hostile external system. It won&#8217;t know your compliance team&#8217;s audit and documentation requirements.
Capability is not the same as applicability.</p><h2>The Good News</h2><p>So the problems are hard. But I promise that&#8217;s a good thing.</p><p>There are plenty of technologists and teams who know that the edge is in the context, in the unique constraints. They understand that solving problems now is not wasted work, and they are not waiting.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><p>When you solve hard integration problems today, you&#8217;re not just solving for today&#8217;s models. You get an opportunity to build abstraction layers. Evaluation harnesses. Data flywheels. Orchestration architectures and design patterns that enable you to reliably leverage presently available models, rough edges and all.</p><p>It&#8217;s not always that simple. There are aspects that are difficult to abstract away. But that doesn&#8217;t change the fact that the more you solve today, the greater chance you have at designing a flexible and adaptable system. Not to mention a chance at cultivating valuable institutional technical knowledge that is hard to come by right now. You build the organizational muscle for iterating and adapting as the technology evolves.</p><p>Waiting for better models to solve your problems is not a strategy. It&#8217;s a decision to accumulate dependency instead of internal capability. It&#8217;s betting your future on someone else&#8217;s roadmap.</p><p>I think people would be surprised by what they can do with current AI models when they neither underestimate nor overestimate them. When they fully accept the limitations along with the capabilities. When they&#8217;re willing to stick with experimenting past the initial novelty, past the point where they have to accept that it is &#8220;just&#8221; a tool. A powerful tool, but still a tool. 
One that requires the same patient, unglamorous work of integration that every other worthwhile novel technology has required.</p><p>The best ideas won&#8217;t come from passive observers who only act in response to others. Lead with curiosity instead of expectations, and you develop both experience and flexibility.</p><h2>Reintroducing Gravity</h2><p>None of this is an argument against commodified or use-case-specific off-the-shelf solutions. It&#8217;s an argument against assuming those things will arrive before the work that makes them possible.</p><p>Not every team or business that wants to try out AI needs to be (or should be) building a whole new system from scratch. Everything I&#8217;ve said here is directed at those who are already building AI systems, are thinking about it, or want to. My hope is that what I&#8217;m saying evokes curiosity and a desire to experiment, not anxiety about the lack of answers or solutions.</p><p>While there are <em>many</em> things about AI that are worrisome, concerning, or outright harmful&#8230; it is still just a technology, a tool. And like any other tool, it&#8217;s entirely up to people, to us, how it will be developed, deployed, adopted, and integrated into our work and lives.</p><p>Right now I&#8217;m engaging with it as a geek who has always loved janky and experimental tech and as someone who loves demystifying technology for anyone who has questions or anxieties about it.
In other pieces, I&#8217;ll be engaging with it from other perspectives.</p><p>Whether you&#8217;re a layperson who is curious or skeptical about AI, a technologist using or implementing it, or a decision-maker considering AI initiatives, here are some ways to bring conversations back to earth if you catch a whiff of the &#8220;effortless AI&#8221; narrative.</p><ul><li><p><strong>Ask:</strong> &#8220;What&#8217;s the hardest unsolved problem in this implementation / product / solution, and what&#8217;s the plan to address it?&#8221;</p><ul><li><p>Not &#8220;what are the risks&#8221; (most people will have canned answers for that). Asking specifically about <em>unsolved</em> problems pushes people to admit where the real effort lives. If the answer is vague or hand-wavy, that&#8217;s a warning sign.</p></li></ul></li></ul><ul><li><p><strong>Notice:</strong> When someone tells you a problem will be solved by the next model release, ask (them or yourself) what&#8217;s being done about it <em>now</em>.</p><ul><li><p>If the answer is &#8220;nothing&#8221; or &#8220;waiting,&#8221; that&#8217;s the myth at play. If the answer is &#8220;we&#8217;re building something that might work, or might become obsolete, but we&#8217;re learning either way,&#8221; that&#8217;s someone who tries to create opportunities with the technology they have, within their contexts.</p></li></ul></li></ul><ul><li><p><strong>Experiment:</strong> The teams and companies getting value from AI right now are the ones that have accepted there isn&#8217;t an effortless path.</p><ul><li><p>It is always worthwhile to be thoughtful about how you spend your time and energy.
But be wary of convenient narratives that dim your curiosity, discourage interrogation, or sideline your intuition &#8211; that would prefer you distant from your own problems and dependent on the solutions of others.</p></li></ul></li></ul><p>The best way to become intimately acquainted with the problems you&#8217;re trying to solve is to, well, try to solve them.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>This is precisely why, if you&#8217;re concerned with governance, ethics, trust, safety, and/or regulation in AI, you too should be wary of these narratives. Focus on the capabilities of the present &#8211; the potential <em>harms</em> of the present.</p><p></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Keyword: <em>systems</em>. Not agents. A distinction I think is important, if only to convey the primacy of integration and interoperability.</p><p></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>A term whose definition still lacks consensus.</p><p></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>There&#8217;s a clear desire for more discourse around this.
See Dex Horthy&#8217;s popular recent talk at the AI Engineer Code Summit from late Nov 2025: <a href="https://www.youtube.com/watch?v=rmvDxxNubIg">No Vibes Allowed: Solving Hard Problems in Complex Codebases</a></p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[What to Expect Here]]></title><description><![CDATA[Talking about tech these past few years, specifically AI, has felt a bit like being pulled between extremes of forced "awe" and "horror".]]></description><link>https://www.insidevoice.ai/p/what-to-expect-here</link><guid isPermaLink="false">https://www.insidevoice.ai/p/what-to-expect-here</guid><dc:creator><![CDATA[Jabari Allen]]></dc:creator><pubDate>Fri, 08 Aug 2025 02:38:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!aNwX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16a50cc1-73fa-4fbe-826c-27adf6fe731e_256x256.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Talking about tech these past few years, specifically AI, has felt a bit like being pulled between extremes of forced "awe" and "horror". A techno &#8220;utopia&#8221; filled with superintelligent AI and robot butlers. Or doomsday scenarios where all jobs are gone and Skynet is your landlord. 
Whether the sentiments are sincere or not, they're exhausting.</p><p>Luckily there are some great voices out there that have already been adding more <a href="https://knightcolumbia.org/content/ai-as-normal-technology">nuanced</a>, <a href="https://youtu.be/oq_mZ40TBnk?si=WnRmfzkYLv0-BGo1&amp;t=1760">critical</a>, and <a href="https://www.youtube.com/watch?v=dm5JUtqV7zo&amp;t=1799s">grounded</a> analyses and <a href="https://www.youtube.com/watch?v=g-vDL5O2f_E">perspectives</a> that give everyday people and domain experts more practical tools to reason about the AI industry and the technology we're creating and propagating.</p><p>My hope is that <em>Inside Voice</em> can be a small contribution to those efforts as well. A place for regular people, tech workers, small businesses, startups, even big enterprises &#8212; anyone who's tired of unsolicited "answers to all their problems" and is instead looking for more and better questions being asked about the changing role of technology in our work and lives.</p><h1>Why am I doing this?</h1><p>I've been in tech for a decade and I'm now an AI consultant. I've worked extensively with clients on Voice AI projects in particular (think real-time transcription, voice agents, voice-powered interfaces). And whether it's at the beginning of a project or in the middle, many times the question "so why are we doing X this way?" will arise. I love those questions and want to make sure they keep getting asked. But too often, I see them getting drowned out by hype, fear, or the pressure to just ship something. I don't think that's good for business, and I definitely don't think it's conducive to building robust communities around the development of novel tech.</p><p>I want to surface and wrestle with more of those &#8220;why&#8221; questions. 
My goal is to provide technical literacy that feels like talking with your "tech-y friend": someone who always has time for your questions, doesn't talk down to you, never shrouds simple ideas in unnecessary mystique, and tries to learn more than they teach. An informed public encourages more and better discourse.</p><p>More people need to feel they have a say in where this technology goes. "Development" isn't just what engineers do; it's also the public sentiments we encourage, the habits and norms we cultivate, the everyday choices we make about what to use or refuse, and the conversations we do or don't have. The more clearly we understand the tools, the more agency we have in shaping their impact.</p><p>So how will this blog be different? You'll find fewer hot takes here and more sincere questions:</p><p><em>Why does so much "innovation" default to excess, to scaling at all costs?</em></p><p><em>What would it look like if users and businesses had more nimble and contextually appropriate tools and were able to prioritize data sovereignty?</em></p><p><em>How do we decide when new tech actually solves a problem versus just sounds impressive?</em></p><p><em>What should every person know before regularly interacting with generative AI tools in their daily lives?</em></p><h1>So&#8230;what exactly am I going to talk about?</h1><p>I have a lot of thoughts bouncing around in my head, but my posts will typically fall into the following topics/sections:</p><ul><li><p><strong>Industry Analysis</strong>  <br><em>[How to spot AI hype and ask better questions]</em><br>I'll poke at the hype cycles, the narratives, and the claims from the big players and AI labs, and then ask how it all works, who it affects, and why it matters for the rest of us.
The goal will always be to help develop your own critical lens for this technology so you can draw your own conclusions about industry claims past, present, and future.</p></li></ul><ul><li><p><strong>Workbench <br></strong><em>[Technical explorations and deep dives]</em><br>At my core, I love getting into the fine details of things, <a href="https://www.youtube.com/watch?v=4YVtd4EvpOo">the nitty gritty</a>. I love taking things apart, diagnosing bugs, fixing broken machines, and building custom solutions. So I might do a deep dive into things like the anatomy of a low-latency AI voice agent, or the reason voice bots keep interrupting you, or the importance of reciprocity and conversational repair for enjoyable conversations and how to translate that into code, or the nuances of customizing your own task-specific language models. Sometimes I'll have code. Sometimes I'll draw some diagrams. Sometimes it'll just be me banging my head against a problem and sharing the joys of that.</p></li><li><p><strong>AI Literacy<br></strong><em>[Plainspoken explainers to demystify AI technologies]</em><br>No matter what, I think generative AI will be here to stay in one way or another. So it is crucial that we continue to develop our own literacy around this technology. I'll provide accessible explainers and mental models that aim to demystify how these systems work so you can intentionally use them (or refuse them). I want people to fully understand what all these AI products and services mean when they say "use with caution".</p></li></ul><ul><li><p><strong>Sandbox</strong><br><em>[Cross-disciplinary connections and fun &#8220;what ifs&#8221;]</em><br>The grab bag, the playground. 
The place where I'll post the occasional "shower thought", wild connection, research deep dive, or speculative tangent that doesn't fit anywhere else.</p></li></ul><p>What you shouldn't expect:</p><ul><li><p>Thinly veiled promotion for <em>[insert latest cutting-edge AI product]</em>.</p></li></ul><ul><li><p>"Five prompt tricks to 10x your workflow" listicles.</p></li></ul><ul><li><p>Dire warnings about an AI apocalypse with a vibe that can only be described as "menacing glee".</p></li></ul><ul><li><p>Anything I wouldn't bother reading myself.</p></li></ul><p></p><div><hr></div><p></p><p>I want this blog to be a place where curiosity, play, healthy skepticism, and critical analysis are encouraged and appreciated as fertilizer for the soil that grows innovation and as fuel for our collective engines of creativity.</p><p>If you want hype, if you want doom, those places are easy enough to find. If you want something else that hopefully feels a bit more familiar, maybe more like the type of conversations you&#8217;d have with your friends or family or colleagues, then stick around and see if you like it here.</p><p>I don't have a content schedule. I'm not promising weekly posts. I'll write when I have something worth sharing.</p><p>If all of that works for you, welcome. If not, no hard feelings.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.insidevoice.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Inside Voice!
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item></channel></rss>