From posts in the Tech category, newest first. Each answer reads as a citable claim and links back to the source post for the code, the deploy notes, or the failure mode that motivated the project.
The angle: code that runs, not opinion pieces about the industry. Project work covers computer vision (a card counter built with OpenCV, an audit of F3ED, the NeurIPS 2024 tennis-shot detector), applied ML (visualizing PyTorch gradients, modeling postprandial glucose response with XGBoost), health-data engineering (a Python library for the Dexcom CGM), and agentic deployment (Cursor on a fresh Azure environment). The site itself is a recurring case study: Hugo on GitHub Pages plus five Cloudflare Workers handling social automation on Llama 4 Scout, content negotiation for AI crawlers, security headers, URL shortening backed by KV and D1, and analytics aggregation.
Answers lead with the version number, the benchmark, or the failure mode. Most questions come from engineers who want to know what actually shipped, so the answers say what shipped.
Karpathy frames three eras of software. Software 1.0 is humans writing explicit code. Software 2.0 is humans curating datasets and training neural networks, where the weights are the program. Software 3.0 is humans writing prompts, where the LLM is the interpreter and the context window is the program. The unit of programming shifts from a function to a paragraph.
From: Karpathy's Software 3.0 Playbook
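To make the shift concrete, a minimal sketch of one task under each paradigm. The sentiment example and the `llm.complete` call are illustrative, not from the talk:

```python
# Software 1.0: a human writes the logic explicitly.
def sentiment_v1(text: str) -> str:
    positive = ("great", "love", "excellent")
    return "positive" if any(w in text.lower() for w in positive) else "negative"

# Software 2.0: a human curates labeled data; the trained weights are the program.
# clf = train(labeled_examples)            # e.g. a small classifier over embeddings
# sentiment_v2 = lambda text: clf.predict(embed(text))

# Software 3.0: the prompt is the program and the LLM is the interpreter.
PROMPT = "Reply with exactly one word, positive or negative, for this text:\n{text}"
# sentiment_v3 = lambda text: llm.complete(PROMPT.format(text=text))
```

Same task three times; what changes is the artifact a human edits to change the behavior.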
Karpathy points to December 2024 as the inflection point. Before then, agentic tools were 'kind of helpful' but required constant correction. Over the December break, the latest models crossed a line where Karpathy stopped correcting them and started trusting the system. He flagged this on the record, warning that anyone whose mental model of AI was set by ChatGPT was already a generation stale.
From: Karpathy's Software 3.0 Playbook
Vibe coding raises the floor: it lets non-engineers build software they could not build before. Agentic engineering raises the ceiling: it lets professional engineers preserve the existing quality bar while moving much faster. Karpathy thinks the productivity gap for the best users now exceeds the old 10x engineer benchmark by a wide margin.
From: Karpathy's Software 3.0 Playbook
Frontier labs train models with reinforcement learning, which requires verifiable rewards. Verifiable domains attract environments and signal, so they get the steepest gains. Everything outside the verifiable distribution stays jagged. Karpathy's takeaway for founders is that building a verifiable environment in your domain is real leverage. For workers, the more useful question than 'is my job safe' is 'is my job verifiable.'
From: Karpathy's Software 3.0 Playbook
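What "verifiable" means in practice: a reward an RL loop can compute mechanically. A minimal sketch, assuming the candidate is a Python module judged by a pytest suite:

```python
import pathlib
import subprocess
import tempfile

def verifiable_reward(candidate_code: str, test_path: str) -> float:
    """Binary reward: 1.0 if the candidate passes the suite, else 0.0.
    The signal is computed with no human judge in the loop, which is
    what makes the domain trainable with RL. test_path should be an
    absolute path, since pytest runs in a throwaway working directory."""
    workdir = tempfile.mkdtemp()
    pathlib.Path(workdir, "candidate.py").write_text(candidate_code)
    result = subprocess.run(
        ["python", "-m", "pytest", test_path],
        cwd=workdir,
        capture_output=True,
    )
    return 1.0 if result.returncode == 0 else 0.0
```

Code, math, and games admit rewards like this; persuasion, taste, and strategy mostly do not, which is the jaggedness Karpathy describes.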
As agents do more execution, the bottleneck moves into the human's head. You still have to know what is worth building, why, and how to direct the work. Your value sits upstream of execution. Karpathy keeps building knowledge bases out of his own reading because the constraint of the next decade is less about compute than about how fast humans can deepen comprehension to keep directing systems that out-execute them.
From: Karpathy's Software 3.0 Playbook
Karpathy argues whiteboard puzzles measure the wrong thing. Hiring should look like giving someone a really big project, having them implement it, and then trying to break it. His example: build a Twitter clone for agents, make it secure, simulate activity, then have ten Codex 5.4-X-high instances try to break the website. If your interview loop has not changed since 2022, you are selecting for the previous era.
From: Karpathy's Software 3.0 Playbook
The average global smartphone replacement cycle has stretched to 3.5 years. Cameras, screens, and processors have reached a quality plateau where year-over-year improvements are incremental rather than transformative. Battery life has overtaken price as the top purchase driver for the first time, suggesting hardware differentiation has stalled.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
Google gave Apple complete access to the Gemini model in Apple's own data centers. Apple uses distillation: smaller models are trained on Gemini's reasoning outputs, yielding efficient models with Gemini-like performance at a fraction of the compute. These distilled models can run on-device without an internet connection.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
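The post describes distillation at a high level; this is a minimal sketch of the standard soft-label distillation loss (Hinton-style KL on temperature-softened logits), not Apple's actual training code:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student distributions.
    The student learns to mimic the teacher's full output distribution,
    not just its top-1 answer, which is where most of the signal lives."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t**2 so gradient magnitudes stay consistent across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * t * t
```

The teacher is a training-time dependency only; at inference the distilled student runs alone, which is the whole point for on-device deployment.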
Apple's on-device Foundation Model is a roughly 3 billion parameter language model optimized for Apple Silicon through innovations like KV-cache sharing and 2-bit quantization. It runs at 30 tokens per second on iPhone 15 Pro and powers Apple Intelligence features including summarization, writing tools, and Siri enhancements.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
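What 2-bit quantization means mechanically, as a naive symmetric quantizer in NumPy. Apple's actual scheme uses quantization-aware training and is not this simple:

```python
import numpy as np

def quantize_2bit(weights: np.ndarray):
    """Map each weight to one of 4 levels, {-1.5, -0.5, +0.5, +1.5} * scale,
    storing uint8 codes plus one float scale. Real schemes use small groups
    with a scale each; this just shows the size/precision trade."""
    scale = max(np.abs(weights).max() / 1.5, 1e-12)  # guard against all-zero input
    codes = np.clip(np.round(weights / scale + 1.5), 0, 3).astype(np.uint8)
    return codes, scale

def dequantize_2bit(codes: np.ndarray, scale: float) -> np.ndarray:
    return (codes.astype(np.float32) - 1.5) * scale
```

The win is size and memory bandwidth: 16-bit to 2-bit is an 8x cut in weight bytes, which is what makes a 3B-parameter model viable inside a phone's memory budget.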
Yes, and there are early signs of this. Samsung's Exynos 2600 markets 80 TOPS of NPU performance, more than double the prior generation. Samsung targets 800 million AI-enabled devices by end of 2026. But as with megapixels before them, raw TOPS or parameter counts may not correlate with actual user experience.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
It depends on your current device. On-device AI requires specific hardware: Apple Intelligence needs an A17 Pro or later, and Android AI features require recent NPUs. If your phone is more than two generations old, you cannot run the latest on-device models at all. Morgan Stanley's 2026 survey found iPhone upgrade intentions at an all-time high of 37%, driven partly by AI capabilities.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
Current smartphones run 1-3 billion parameter models natively. Apple's Foundation Model is roughly 3 billion parameters. Google's Gemini Nano ships at 1.8 to 3.25 billion parameters. Developers have also demonstrated running a 400 billion parameter Mixture of Experts model on iPhone 17 Pro, though only 17 billion parameters are active per inference pass.
From: On-Device AI Models Will Be The New Reason to Upgrade Your Phone
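The back-of-envelope behind that demo, with the usual caveats: uniform quantization assumed, KV cache, activations, and the router ignored.

```python
# Why 17B active out of 400B total matters: every parameter must be stored,
# but only the routed experts are read and multiplied per token, so per-token
# bandwidth and compute scale with active parameters, not total parameters.
total_params, active_params = 400e9, 17e9

for bits in (16, 4, 2):
    stored = total_params * bits / 8 / 1e9    # GB to hold all weights
    touched = active_params * bits / 8 / 1e9  # GB read per token
    print(f"{bits:>2}-bit: {stored:6.0f} GB stored, ~{touched:.2f} GB touched per token")
```

Even at 2-bit the full weight set (~100 GB) exceeds phone RAM, so a demo like this has to page experts in from flash; the ~4 GB touched per token is what makes that tractable at all.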
It is already happening. DeepMind's AlphaEvolve found a 23% kernel speedup inside Gemini's own architecture. Karpathy's AutoResearch discovered about 20 improvements on his own highly-tuned codebase, cutting the metric by 11%. Sakana AI's AI Scientist v2 produced the first AI-authored paper accepted through standard peer review. The timeline from thought experiment to working systems was faster than most expected.
From: The Last Architecture Designed by Hand
For dense transformers, evidence points to flattening. OpenAI's Orion model hit GPT-4 performance after just 20% of training, with diminishing returns for the remaining 80%. But test-time compute opened a different axis: OpenAI's inference spending hit $2.3 billion in 2024, 15x its training costs. The Densing Law shows capability per parameter doubling every 3.5 months through MoE, distillation, and better data curation.
From: The Last Architecture Designed by Hand
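Taking the 3.5-month doubling at face value, the implied multipliers:

```python
# Densing Law: capability density doubles every 3.5 months,
# i.e. multiplier(t) = 2 ** (t_months / 3.5).
for months in (3.5, 12, 24):
    print(f"{months:>4} months -> {2 ** (months / 3.5):5.1f}x capability per parameter")
# 12 months -> ~10.8x: a model roughly a tenth the size matches last year's quality.
```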
MCP is agent-to-tool; A2A is agent-to-agent. MCP connects AI agents to tools and data sources. A2A connects agents to each other for multi-agent collaboration. They operate at different architectural layers and are complementary, not competing.
From: MCP vs A2A in 2026: How the AI Protocol War Ends
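The layering is visible in the message shapes. Both protocols ride JSON-RPC 2.0; `tools/call` is MCP's real method name, while the A2A payload below is schematic, since method and field names have shifted across spec revisions:

```python
# MCP: an agent invoking a tool (agent-to-tool layer).
mcp_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "query_database", "arguments": {"sql": "SELECT 1"}},
}

# A2A: one agent delegating work to another agent (agent-to-agent layer).
# Illustrative shape; check the current A2A spec for exact method and part fields.
a2a_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "message/send",
    "params": {"message": {"role": "user",
                           "parts": [{"kind": "text", "text": "Reconcile Q3 invoices"}]}},
}
```

One speaks to a deterministic tool, the other to a peer that plans and responds asynchronously, which is why the two do not compete for the same layer.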
Start with MCP, then add A2A. For most enterprise deployments, MCP handles tool integration first, then A2A layers on when you need multi-agent coordination across organizational boundaries. AWS, Microsoft, Salesforce, SAP, and IBM already support both protocols.
From: MCP vs A2A in 2026: How the AI Protocol War Ends
Both are under the Linux Foundation. MCP sits within the Agentic AI Foundation (AAIF), which has 146 member organizations including Anthropic, OpenAI, and Block. A2A has its own governance body with 150+ partner organizations.
From: MCP vs A2A in 2026: How the AI Protocol War Ends