The Trolley Problem at 60: What It Still Teaches Us

Philippa Foot's 1967 thought experiment has outlasted most philosophy papers. Here's why the trolley problem still matters — and what 60 years of research have uncovered.

The Thought Experiment That Outlasted Most of the Philosophy Around It

Most academic philosophy papers are read by a few hundred specialists, disputed briefly, and quietly forgotten. Philippa Foot's 1967 scenario about a runaway trolley has instead colonised undergraduate syllabuses, neuroscience labs, automotive engineering departments, and the comment sections of the internet. Sixty years after its publication, the trolley problem is not merely alive: it has become arguably the most recognised moral thought experiment in history, a shared cultural shorthand for the collision between cold arithmetic and something that feels irreducibly human.

Why? The simplest answer is that the scenario crystallises a genuine conflict rather than manufacturing a fake one. It does not ask you to imagine a world operating by different laws of physics. It asks only that you imagine a moment of terrible choice — and then confronts you with the fact that your answer probably does not survive a minor variation in the setup.

Foot's Original Formulation: It Was Never Really About Trolleys

Philippa Foot did not sit down in 1967 with the goal of creating an internet meme. She was working through a much thornier problem: the doctrine of double effect, a principle with roots in Aquinas that holds it can be permissible to cause harm as a foreseen but unintended side effect of achieving a good outcome, even when it would be impermissible to cause the same harm as a means to the same end. Foot introduced the trolley scenario in "The Problem of Abortion and the Doctrine of the Double Effect" as a vehicle (pun unavoidable) for testing where that doctrine held and where it cracked.

In her original version, the driver of a runaway tram can steer onto a side track, killing one worker, or continue straight, killing five. Most readers agreed the driver should turn. Foot contrasted this with a surgeon who might kill one healthy patient to harvest organs and save five dying ones, a case that felt obviously monstrous despite identical arithmetic. Her question was precise: what makes the tram case acceptable while the organ case is not? The doctrine of double effect offers one answer: in the tram scenario, the one death is a side effect; in the organ scenario, it is the mechanism. Foot herself preferred a different one, the distinction between the negative duty not to harm and the positive duty to give aid: the driver chooses between two harms, whereas the surgeon would inflict harm in order to provide aid. The trolley problem was, from its birth, a tool for dissecting moral logic, not a puzzle with a correct answer.

Thomson's Footbridge: Same Numbers, Opposite Gut

Judith Jarvis Thomson sharpened the experiment considerably with her 1985 paper "The Trolley Problem," published in the Yale Law Journal. Thomson, who had coined the name "trolley problem" in a 1976 paper, developed what philosophers now call the footbridge variant: you are standing on a bridge above the tracks, next to a large stranger. The trolley is coming. The only way to stop it and save five workers is to push the stranger off the bridge; his body will halt the trolley, but he will die. Would you push him? The vast majority of people, across cultures and decades of replication, say no.

The arithmetic is indistinguishable from the original: one life against five. Yet the intuition reverses. This is not a quirk of the experiment; it is the finding. Thomson's variation exposed a fracture in moral reasoning that has kept philosophers and, later, cognitive scientists occupied for four decades. Something about personal physical force, about using a body as an instrument, triggers a revulsion that pulling a lever does not. The question of why has driven the work ever since.

When Neuroscience Entered the Trolley Car

Joshua Greene, then a Princeton graduate student, put people inside fMRI scanners in 2001 and presented them with trolley-style dilemmas. His findings, published in Science, were striking: the footbridge scenario lit up regions of the brain associated with emotion and social cognition — the medial prefrontal cortex and posterior cingulate — far more intensely than the impersonal lever scenario. Greene interpreted this through the lens of what he called dual-process moral psychology: fast, emotional intuitions generated by one system clash with slower, deliberative utilitarian reasoning generated by another. When the emotional system wins, we refuse to push. When we override it — and some people do — we tend to give more consequentialist justifications.

Greene's work was controversial — some philosophers argued he was committing a naturalistic fallacy by using brain data to adjudicate between moral theories — but it opened a new empirical programme. Moral psychology became a legitimate science, not just a branch of armchair speculation. The trolley problem had become a laboratory instrument.

Trolleyology Goes to Silicon Valley

By the 2010s, the thought experiment had migrated from seminars to engineering offices. Self-driving car developers faced genuine versions of the dilemma: what decision rules should an autonomous vehicle follow when a collision is unavoidable and the algorithm must, in effect, choose who bears the risk? The question is not hypothetical. It sits inside real software deployed on public roads.

MIT's Moral Machine project, launched in 2016 and publishing its landmark results in Nature in 2018, collected roughly 40 million moral decisions from participants in 233 countries and territories. The data revealed something uncomfortable for anyone hoping to encode a universal ethical standard: preferences varied dramatically by culture, geography, and socioeconomic context. In some regions, participants prioritised younger lives; in others, higher-status individuals received implicit protection. In collectivist cultures, group survival was weighted differently than in individualistic ones. The trolley problem, scaled to the size of the global internet, did not produce convergence. It produced a map of moral heterogeneity.

  • Western participants tended to prioritise saving the most lives regardless of age or status
  • East Asian participants showed stronger deference to elders compared to Western samples
  • Participants from countries with stronger rule of law tended to penalise jaywalkers more heavily in collision scenarios
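  • Across all cultures, humans were consistently valued over animals, and children over adults, though the strength of those preferences varied
  • Beyond those broad tendencies, no single ranking of priorities held uniformly across all 233 countries and territories studied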

The Critics Have a Point — And It's Worth Hearing

Not everyone found the trolleyology boom illuminating. Philosopher Peter Unger, in his 1996 book "Living High and Letting Die," argued that intuitions elicited by trolley problems are artefacts of the scenarios themselves — morally irrelevant features like physical distance or the number of causal steps between action and outcome were doing the ethical work, not any principled distinction. Unger's "physical belief" hypothesis suggested that people's responses track shallow physical cues, not moral truths. On this view, the experiment does not teach us what ethics requires; it teaches us how badly calibrated our gut reactions are.

There is also a practical objection. Real emergencies rarely resemble trolley scenarios. Paramedics, pilots, and soldiers making split-second decisions operate under time pressure, incomplete information, and the weight of training — not the sterile binary of a thought experiment with stipulated certainty. Studies of actual decision-making in crisis situations suggest people rely heavily on pattern recognition and procedural rules, not real-time utilitarian calculation. The trolley problem, critics argue, may reveal something about moral intuitions without revealing much about moral action.

What Makes a Thought Experiment Last Sixty Years?

The durability of the trolley problem is not accidental. It survives because it does something rare: it makes a structural feature of ethical reasoning visible without requiring any technical vocabulary. You do not need to know what consequentialism or deontology mean to feel the pull of both when you encounter the scenario. The experiment renders the abstract concrete, the philosophical personal. That is extraordinarily difficult to achieve, and most thought experiments fail at it.

At its deepest level, the trolley problem is a question about whether our moral rules are rules or heuristics. If they are genuine rules — inviolable constraints on action — then the footbridge refusal is principled and the lever-pull is a concession to the doctrine of double effect. If they are heuristics — evolved intuitions that generally track good outcomes but can misfire — then our reluctance to push the large stranger is a cognitive bias to be examined, not a moral insight to be trusted. Both views have serious defenders. Neither has won. That unresolved tension is why the problem is still being taught, still being researched, and still being argued about in comment sections at midnight.

How SplitVote Users Split: A 50/50 Fracture After Millions of Votes

When SplitVote users encounter the trolley dilemma, the results are striking — not because one side dominates, but because it remains one of the most evenly divided questions on the entire platform. Despite generations of philosophical argument, empirical research, and cultural saturation, the scenario continues to cleave voters almost exactly in half. The lever-pullers and the non-pullers coexist in near-perfect equilibrium, a statistical reflection of the same tension Foot identified in 1967.

The data also shows variation by demographic and by how users arrive at the question. Users who have already encountered the footbridge variant, where the scenario requires direct physical contact, become significantly less likely to pull the lever in the original. Moral intuitions, it turns out, are not isolated; they shift in response to context, sequence, and framing. SplitVote captures that fluidity in real time, across a genuinely global population, in a way that laboratory experiments rarely can.
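
For readers curious how an order effect like this might be checked, here is a minimal Python sketch. It is not SplitVote's actual method: the function names are hypothetical and the vote counts are invented purely for illustration. It computes the overall lever-pull rate and a standard two-proportion z statistic comparing users who saw the lever case first with users who saw the footbridge case first.

```python
from math import sqrt


def pull_rate(pulled: int, total: int) -> float:
    """Share of voters who chose to pull the lever."""
    return pulled / total


def two_proportion_z(yes_a: int, n_a: int, yes_b: int, n_b: int) -> float:
    """z statistic for the difference between two independent proportions,
    using the pooled proportion for the standard error."""
    p_a, p_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Invented counts for illustration only; these are NOT SplitVote results.
lever_first = {"pulled": 560, "total": 1000}       # saw the lever case first
footbridge_first = {"pulled": 440, "total": 1000}  # saw the footbridge case first

overall = pull_rate(
    lever_first["pulled"] + footbridge_first["pulled"],
    lever_first["total"] + footbridge_first["total"],
)
z = two_proportion_z(
    lever_first["pulled"], lever_first["total"],
    footbridge_first["pulled"], footbridge_first["total"],
)

print(f"overall pull rate: {overall:.2f}")
print(f"order-effect z statistic: {z:.2f}")
```

With these made-up numbers the overall split is even, yet the two groups differ sharply. That is the pattern described above: a near-perfect aggregate balance can hide a strong effect of sequence and framing.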

The Diagnostic That Became a Mirror

Philippa Foot died in 2010, having watched her trolley scenario become something she never fully anticipated: a cultural touchstone, a neuroscience instrument, a design brief for AI engineers, and a litmus test for moral personality. She remained skeptical, to the end, of overconfident moral theorising — a philosopher who used the thought experiment to expose complexity, not to resolve it. That intellectual humility may be the most important thing the trolley problem models.

Sixty years on, the scenario has not told us what the right answer is. It has done something more valuable: it has shown us that we disagree, that our disagreement is structured and predictable, and that the disagreement itself is philosophically significant. A thought experiment that makes us see our own inconsistency clearly is not a failure. It is doing exactly what philosophy should do.

This article discusses moral philosophy, cognitive science research, and AI ethics for educational purposes. Vote data referenced reflects anonymised aggregate results from SplitVote users and does not constitute a representative sample of any population.