Ten Jobs Whose Current Form Deserves a Farewell Party
A sharp look at which white-collar roles AI may not merely change, but quietly make obsolete, and why polite language hides the scale of the shift.
A review of a rare AI book that uses mathematics to illuminate rather than intimidate, making difficult ideas feel genuinely learnable.
A playful mock protocol imagines prompts as transport packets, turning generative reconstruction into a deadpan internet standard.
Continuous Autoregressive Language Models challenge the token-by-token bottleneck and hint at a different future for language generation.
Donald Knuth's collaboration with Claude offers a quietly historic glimpse of AI as mathematical assistant rather than mere answer machine.
A concise guide to model distillation as both useful compression technique and strategic attack surface in the LLM economy.
PageIndex.ai makes the case for document-aware retrieval that respects pages, structure, and references instead of blindly chunking PDFs.
Meta-prompting treats the prompt itself as a draft to debug, producing clearer goals and fewer disappointing model outputs.
Recursive language models challenge the idea that longer context alone solves reasoning over large documents and codebases.
A new AI-assisted algebraic geometry result raises the stakes for language models as collaborators in genuine mathematical discovery.
Two papers suggest that external guardrails cannot provide airtight AI safety, forcing a harder look at the mathematics of control.
Strange LLM outputs become clues to the messy training data, transcription errors, and hidden artifacts inside modern models.
Interpretability research asks whether LLMs can detect their own internal states, moving introspection from philosophy toward experiment.
Kimi K2 Thinking enters the reasoning-model race, showing how quickly China's AI frontier is becoming globally competitive.
If transformers are theoretically invertible, the question shifts from whether models lose information to how they manage and suppress it.
Musk's idea of using idle Teslas for inference turns a car fleet into a provocative vision of distributed AI infrastructure.
The neural junk-food hypothesis asks whether low-quality viral content can degrade models much like shallow media degrades attention.
Different coding models show recognizable habits, risk tolerances, and failure modes, making 'personality' a practical engineering concern.
CraftGPT builds a working language model out of Minecraft redstone, proving that absurd constraints can teach serious lessons about computation.
Prompt packs can make general models behave like specialists, but the post asks where scaffolding ends and real specialization begins.
Human and LLM errors can look similar, but their causes differ in ways that matter for trust, correction, and accountability.
The AI boom is compared with dot-com excess, asking which parts are durable infrastructure and which are speculative heat.
Bayesian experimental design offers a way for LLMs to ask better follow-up questions instead of guessing blindly.
AI hype is framed as an economic mirage, propping up confidence while hiding fragile assumptions beneath the spectacle.
Dietrich Dörner's work on complex-system failure becomes a warning label for autonomous AI and overconfident decision-making.
A study of intimate chatbot conversations reveals how major models handle flirtation, refusal, safety, and awkward human expectations.
SEAL points toward language models that rewrite their own training material, hinting at AI systems that learn after deployment.
A practical map of OpenAI's model lineup in May 2025, cutting through confusing names and overlapping capabilities.
Sycophantic AI is mocked as flattery gone wrong, showing how agreeable models can become less useful and less truthful.
Knowledge graphs are useful, but the post argues they are not a magic cure for LLM hallucination and reasoning failures.
Humanity's Last Exam is framed as a benchmark that tests not only models, but our assumptions about intelligence itself.
Small LLMs are not a contradiction but a response to the need for cheaper, private, and more efficient intelligence.
A year-end inventory of ten unresolved AI problems that still define the frontier despite rapid progress.
Gibson's digital ghosts become a frame for modern AI simulations of human behavior and the science behind them.
LLM reasoning failures may reveal uncomfortable parallels with human cognition rather than a simple machine deficiency.
A plain-language glossary of fifty AI terms for readers who want the field's vocabulary without the usual fog.
Malla represents the darker side of generative AI, where language models become tools for scalable cybercrime.
The Jevons paradox explains why more efficient AI may increase total consumption rather than reduce costs or energy use.
The post asks whether LLMs possess coherent world models or merely produce fluent stories about reality.
STaR shows how models can improve reasoning by generating and learning from their own explanations.
THERMOMETER targets overconfident language models, offering a way to calibrate systems that bluff too easily.
LLM steerability is treated as both a craft and a control problem: how to guide powerful models without losing the plot.
Decentralized multi-agent systems promise problem-solving without a central boss, but coordination becomes the real challenge.
Multi-agent LLM systems are explored as a path toward distributed reasoning, specialization, and collaborative AI workflows.
The opening part of a benchmark series asks what LLM evaluations really measure and why the numbers often mislead.
Part two examines benchmark methods themselves, exposing the assumptions behind the scores used to compare language models.
Part three moves from benchmark scores to application areas, asking where LLM performance actually matters in practice.
Part four digs into the good, bad, and misleading sides of benchmark results and their interpretation.
Part five steps beyond scores to consider real-world limitations, reliability, and practical model behavior.
The final benchmark essay looks toward better evaluation methods that test usefulness rather than leaderboard theater.
A friendly guide to the difference between narrow AI and artificial general intelligence, with metaphors that make the distinction stick.
Human overconfidence and AI hallucination meet in a comparison of how bad certainty distorts judgment in both minds and machines.
Apple's MM1 research is presented as a step toward AI systems that understand text and images together.
A practical guide to prompt engineering techniques for getting more reliable, useful behavior from large language models.
The echo-chamber problem asks what happens when future models learn increasingly from content produced by earlier models.
Two perspectives on LLM interaction reveal how user behavior and model dynamics shape each other in unexpected ways.
Apple's rumored Ajax and Apple GPT projects are examined as early signs of its generative-AI strategy.
Multimodal LLMs are explained as a key step toward systems that can reason across text, images, and other signals.
The LLaMA leak becomes a case study in open AI, research ethics, and the risks of powerful models spreading freely.
AI is used to explore risk, protection, and compliance questions in IT security through a structured expert-system lens.
The GPT Store launch becomes the backdrop for introducing gekko's own specialized expert systems.
Track&Field Analyst is introduced as a custom GPT for objective athletics data analysis and performance insight.
InfoSec Advisor combines ChatGPT with German IT-Grundschutz knowledge to support security analysis and practical guidance.