From posts in the Tech category, newest first. Each answer reads as a citable claim and links back to the source post for the code, the deploy notes, or the failure mode that motivated the project.
The angle: code that runs, not opinion pieces about the industry. Project work covers computer vision (a card counter built with OpenCV, an audit of F3ED, the NeurIPS 2024 tennis-shot detector), applied ML (visualizing PyTorch gradients, modeling postprandial glucose response with XGBoost), health-data engineering (a Python library for the Dexcom CGM), and agentic deployment (Cursor on a fresh Azure environment). The site itself is a recurring case study: Hugo on GitHub Pages plus five Cloudflare Workers handling social automation on Llama 4 Scout, content negotiation for AI crawlers, security headers, URL shortening backed by KV and D1, and analytics aggregation.
Answers lead with the version number, the benchmark, or the failure mode. Most questions come from engineers who want to know what actually shipped, so the answers say what shipped.
Karpathy frames three eras of software. Software 1.0 is humans writing explicit code. Software 2.0 is humans curating datasets and training neural networks, where the weights are the program. Software 3.0 is humans writing prompts, where the LLM is the interpreter and the context window is the program. The unit of programming shifts from a function to a paragraph.
From: Karpathy's Software 3.0 Playbook
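To make the shift concrete, a minimal sketch of one task under each paradigm. The sentiment example and the `llm.complete` call are illustrative, not from the talk:

```python
# Software 1.0: a human writes the logic explicitly.
def sentiment_v1(text: str) -> str:
    positive = ("great", "love", "excellent")
    return "positive" if any(w in text.lower() for w in positive) else "negative"

# Software 2.0: a human curates labeled data; the trained weights are the program.
# clf = train(labeled_examples)            # e.g. a small classifier over embeddings
# sentiment_v2 = lambda text: clf.predict(embed(text))

# Software 3.0: the prompt is the program and the LLM is the interpreter.
PROMPT = "Reply with exactly one word, positive or negative, for this text:\n{text}"
# sentiment_v3 = lambda text: llm.complete(PROMPT.format(text=text))
```

Same task three times; what changes is the artifact a human edits to change the behavior.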
Karpathy points to December 2024 as the inflection point. Before then, agentic tools were 'kind of helpful' but required constant correction. Over the December break, the latest models crossed a line where Karpathy stopped correcting them and started trusting the system. He flagged this on the record, warning that anyone whose mental model of AI was set by ChatGPT was already a generation stale.
From: Karpathy's Software 3.0 Playbook
Vibe coding raises the floor: it lets non-engineers build software they could not build before. Agentic engineering raises the ceiling: it lets professional engineers preserve the existing quality bar while moving much faster. Karpathy thinks the productivity gap for the best users now exceeds the old 10x engineer benchmark by a wide margin.
From: Karpathy's Software 3.0 Playbook
Frontier labs train models with reinforcement learning, which requires verifiable rewards. Verifiable domains attract environments and signal, so they get the steepest gains. Everything outside the verifiable distribution stays jagged. Karpathy's takeaway for founders is that building a verifiable environment in your domain is real leverage. For workers, the more useful question than 'is my job safe' is 'is my job verifiable.'
From: Karpathy's Software 3.0 Playbook
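What "verifiable" means in practice: a reward an RL loop can compute mechanically. A minimal sketch, assuming the candidate is a Python module judged by a pytest suite:

```python
import pathlib
import subprocess
import tempfile

def verifiable_reward(candidate_code: str, test_path: str) -> float:
    """Binary reward: 1.0 if the candidate passes the suite, else 0.0.
    The signal is computed with no human judge in the loop, which is
    what makes the domain trainable with RL. test_path should be an
    absolute path, since pytest runs in a throwaway working directory."""
    workdir = tempfile.mkdtemp()
    pathlib.Path(workdir, "candidate.py").write_text(candidate_code)
    result = subprocess.run(
        ["python", "-m", "pytest", test_path],
        cwd=workdir,
        capture_output=True,
    )
    return 1.0 if result.returncode == 0 else 0.0
```

Code, math, and games admit rewards like this; persuasion, taste, and strategy mostly do not, which is the jaggedness Karpathy describes.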
As agents do more execution, the bottleneck moves into the human's head. You still have to know what is worth building, why, and how to direct the work. Your value sits upstream of execution. Karpathy keeps building knowledge bases out of his own reading because the constraint of the next decade is less about compute than about how fast humans can deepen comprehension to keep directing systems that out-execute them.
From: Karpathy's Software 3.0 Playbook
Karpathy argues whiteboard puzzles measure the wrong thing. Hiring should look like giving someone a really big project, having them implement it, and then trying to break it. His example: build a Twitter clone for agents, make it secure, simulate activity, then have ten Codex 5.4-X-high instances try to break the website. If your interview loop has not changed since 2022, you are selecting for the previous era.
From: Karpathy's Software 3.0 Playbook
The average global smartphone replacement cycle has stretched to 3.5 years. Cameras, screens, and processors have reached a quality plateau where year-over-year improvements are incremental rather than transformative. Battery life has overtaken price as the top purchase driver for the first time, suggesting hardware differentiation has stalled.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
Google gave Apple complete access to the Gemini model in Apple's own data centers. Apple uses distillation: smaller models are trained on Gemini's reasoning outputs, yielding efficient models with Gemini-like performance at a fraction of the compute. These distilled models can run on-device without an internet connection.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
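The post describes distillation at a high level; this is a minimal sketch of the standard soft-label distillation loss (Hinton-style KL on temperature-softened logits), not Apple's actual training code:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student distributions.
    The student learns to mimic the teacher's full output distribution,
    not just its top-1 answer, which is where most of the signal lives."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t**2 so gradient magnitudes stay consistent across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * t * t
```

The teacher is a training-time dependency only; at inference the distilled student runs alone, which is the whole point for on-device deployment.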
Apple's on-device Foundation Model is a roughly 3 billion parameter language model optimized for Apple Silicon through innovations like KV-cache sharing and 2-bit quantization. It runs at 30 tokens per second on iPhone 15 Pro and powers Apple Intelligence features including summarization, writing tools, and Siri enhancements.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
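What 2-bit quantization means mechanically, as a naive symmetric quantizer in NumPy. Apple's actual scheme uses quantization-aware training and is not this simple:

```python
import numpy as np

def quantize_2bit(weights: np.ndarray):
    """Map each weight to one of 4 levels, {-1.5, -0.5, +0.5, +1.5} * scale,
    storing uint8 codes plus one float scale. Real schemes use small groups
    with a scale each; this just shows the size/precision trade."""
    scale = max(np.abs(weights).max() / 1.5, 1e-12)  # guard against all-zero input
    codes = np.clip(np.round(weights / scale + 1.5), 0, 3).astype(np.uint8)
    return codes, scale

def dequantize_2bit(codes: np.ndarray, scale: float) -> np.ndarray:
    return (codes.astype(np.float32) - 1.5) * scale
```

The win is size and memory bandwidth: 16-bit to 2-bit is an 8x cut in weight bytes, which is what makes a 3B-parameter model viable inside a phone's memory budget.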
Yes, and there are early signs of this. Samsung's Exynos 2600 markets 80 TOPS of NPU performance, more than double the prior generation. Samsung targets 800 million AI-enabled devices by end of 2026. But as with megapixels before them, raw TOPS or parameter counts may not correlate with actual user experience.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
It depends on your current device. On-device AI requires specific hardware: Apple Intelligence needs an A17 Pro or later, and Android AI features require recent NPUs. If your phone is more than two generations old, you cannot run the latest on-device models at all. Morgan Stanley's 2026 survey found iPhone upgrade intentions at an all-time high of 37%, driven partly by AI capabilities.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
Current smartphones run 1-3 billion parameter models natively. Apple's Foundation Model is roughly 3 billion parameters. Google's Gemini Nano ships at 1.8 to 3.25 billion parameters. Developers have also demonstrated running a 400 billion parameter Mixture of Experts model on iPhone 17 Pro, though only 17 billion parameters are active per inference pass.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
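The back-of-envelope behind that demo, with the usual caveats: uniform quantization assumed, KV cache, activations, and the router ignored.

```python
# Why 17B active out of 400B total matters: every parameter must be stored,
# but only the routed experts are read and multiplied per token, so per-token
# bandwidth and compute scale with active parameters, not total parameters.
total_params, active_params = 400e9, 17e9

for bits in (16, 4, 2):
    stored = total_params * bits / 8 / 1e9    # GB to hold all weights
    touched = active_params * bits / 8 / 1e9  # GB read per token
    print(f"{bits:>2}-bit: {stored:6.0f} GB stored, ~{touched:.2f} GB touched per token")
```

Even at 2-bit the full weight set (~100 GB) exceeds phone RAM, so a demo like this has to page experts in from flash; the ~4 GB touched per token is what makes that tractable at all.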
It is already happening. DeepMind's AlphaEvolve found a 23% kernel speedup inside Gemini's own architecture. Karpathy's AutoResearch discovered about 20 improvements on his own highly-tuned codebase, cutting the metric by 11%. Sakana AI's AI Scientist v2 produced the first AI-authored paper accepted through standard peer review. The timeline from thought experiment to working systems was faster than most expected.
From: The Last Architecture Designed by Hand
For dense transformers, evidence points to flattening. OpenAI's Orion model hit GPT-4 performance after just 20% of training, with diminishing returns for the remaining 80%. But test-time compute opened a different axis: OpenAI's inference spending hit $2.3 billion in 2024, 15x its training costs. The Densing Law shows capability per parameter doubling every 3.5 months through MoE, distillation, and better data curation.
From: The Last Architecture Designed by Hand
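Taking the 3.5-month doubling at face value, the implied multipliers:

```python
# Densing Law: capability density doubles every 3.5 months,
# i.e. multiplier(t) = 2 ** (t_months / 3.5).
for months in (3.5, 12, 24):
    print(f"{months:>4} months -> {2 ** (months / 3.5):5.1f}x capability per parameter")
# 12 months -> ~10.8x: a model roughly a tenth the size matches last year's quality.
```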
MCP is agent-to-tool; A2A is agent-to-agent. MCP connects AI agents to tools and data sources. A2A connects agents to each other for multi-agent collaboration. They operate at different architectural layers and are complementary, not competing.
From: MCP vs A2A in 2026: How the AI Protocol War Ends
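The layering is visible in the message shapes. Both protocols ride JSON-RPC 2.0; `tools/call` is MCP's real method name, while the A2A payload below is schematic, since method and field names have shifted across spec revisions:

```python
# MCP: an agent invoking a tool (agent-to-tool layer).
mcp_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "query_database", "arguments": {"sql": "SELECT 1"}},
}

# A2A: one agent delegating work to another agent (agent-to-agent layer).
# Illustrative shape; check the current A2A spec for exact method and part fields.
a2a_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "message/send",
    "params": {"message": {"role": "user",
                           "parts": [{"kind": "text", "text": "Reconcile Q3 invoices"}]}},
}
```

One speaks to a deterministic tool, the other to a peer that plans and responds asynchronously, which is why the two do not compete for the same layer.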
Start with MCP, then add A2A. For most enterprise deployments, MCP handles tool integration first, then A2A layers on when you need multi-agent coordination across organizational boundaries. AWS, Microsoft, Salesforce, SAP, and IBM already support both protocols.
From: MCP vs A2A in 2026: How the AI Protocol War Ends
Both are under the Linux Foundation. MCP sits within the Agentic AI Foundation (AAIF), which has 146 member organizations including Anthropic, OpenAI, and Block. A2A has its own governance body with 150+ partner organizations.
From: MCP vs A2A in 2026: How the AI Protocol War Ends