The Manhattan distance heuristic is the most-used heuristic in heuristic search, probably ever. Introduced informally in the 1960s for the 15 puzzle, formalised in the 1968 A* paper, still the first thing every AI textbook teaches. This article is about why it works, what it does well, and where it stops being enough.
The definition
For an N×N slide puzzle, for each tile (ignore the blank), compute:
distance = | row_current − row_goal | + | col_current − col_goal |
Sum across all tiles. That sum is the Manhattan distance of the current board from the goal.
Why "Manhattan"? The streets of Manhattan are a regular grid. To get from one corner to another, you walk so many blocks east-west plus so many blocks north-south. You cannot cut diagonally. The total number of blocks walked is the Manhattan distance — also called taxicab distance or L¹ norm.
For a tile, the same logic: it can move horizontally or vertically by one cell per move, but not diagonally. The minimum number of moves to get it from where it is to where it needs to go is the row-distance plus the column-distance.
Admissibility
A* requires the heuristic to be admissible: it must never overestimate the actual number of moves remaining. If the heuristic overestimates, A* might cut off the optimal path early and return a suboptimal solution.
Manhattan distance is admissible because:
- Each individual tile must travel at least its row-distance + column-distance, because moves are unit steps in row or column.
- Summing over all tiles cannot overshoot, because each move advances exactly one tile by exactly one cell. (It also moves the blank, but we're not counting the blank.)
- In fact, the true distance is at least the Manhattan distance because tiles often must be moved aside for others to pass — but never less.
So Manhattan ≤ actual. Admissible.
Consistency
A stronger property: consistency, sometimes called the monotonicity of the heuristic. A heuristic is consistent if, for every state s and every successor s' via move m:
h(s) ≤ cost(m) + h(s')
In English: the heuristic value cannot drop by more than one when you make one move.
Manhattan distance is consistent because a single move changes the Manhattan distance by exactly +1 or −1. (One tile moves by one cell, so its contribution to the heuristic changes by ±1; all other tiles' contributions are unchanged.)
Consistency matters because it guarantees A* never re-opens a state it has already closed. Implementations don't need to handle the inconsistency case, which simplifies code and proofs.
Why it works so well in practice
Three concrete properties:
It is cheap to compute. O(N²) per board — sum over all tiles, each contribution constant time. Computing h(s) takes microseconds on any modern computer.
It is tight enough to prune deeply. For 15 puzzles, the Manhattan heuristic value averages 75–90% of the true distance. A* with such a tight heuristic explores a tiny fraction of the state space.
It composes well with linear conflict. Adding 2 moves per "linear conflict" (two tiles in the same row, both belong to that row, but in the wrong order) gives a strictly tighter heuristic at marginal computational cost.
Where it falls short
Manhattan distance treats tiles as if they could pass through each other freely. They cannot. The deepest blind spot:
Conflicts in a column or row. If tiles 2 and 3 are both in row 1 but in the wrong order, Manhattan gives them their straight-line distances and notices nothing. In reality one of them has to be lifted out of the row, the other passed, and the first lifted back. That is at least 2 extra moves Manhattan doesn't see. Linear conflict patches this.
"Walking" structure. Manhattan ignores how tiles relate to each other within a row or column when they all belong to that row or column. A row of four tiles in the wrong permutation requires multiple "row swaps" — a structure that walking distance captures.
Disjoint subset interactions. For any disjoint partition of tiles, the optimal cost of permuting each partition into place is a tighter lower bound than Manhattan (which is the partition into singletons). Additive pattern databases exploit this.
The progression is: Manhattan → Manhattan + linear conflict → walking distance → pattern databases. Each is a strict improvement.
Why bigger boards need more
Manhattan distance averages about 75–90% of true distance for the 15 puzzle. For the 24 puzzle, it averages about 65%. For the 35 puzzle, about 55%.
The reason: as boards grow, tile conflicts and "walking structure" become more important relative to straight-line distances. Manhattan distance, which ignores both, underestimates more egregiously on larger boards.
This is why pattern databases are essentially mandatory for 24-puzzle and 35-puzzle solving. The Manhattan heuristic is fine for the 15 puzzle and 8 puzzle but loses too much information at larger sizes.
Geometric intuition
A nice way to think about Manhattan distance: imagine a flat plane with one tile sitting on it. The set of points the tile can reach in k moves is a diamond of side k — points at Manhattan distance ≤ k. (The Euclidean equivalent would be a circle of radius k.)
Manhattan distance is the natural geometry of grid-locomotion. When you have a movement model that is constrained to grid steps, Manhattan distance is the geometry; Euclidean distance is irrelevant.
What this means for app developers
If you are writing a hint button for a slide-puzzle app and want it to work in milliseconds:
- 3×3 — Manhattan + linear conflict is plenty. A* explores tens of states; finishes in microseconds.
- 4×4 — Manhattan + linear conflict is fast enough for a hint (one move, not the full optimal solution). For the full optimal solution, walking distance is the right choice.
- 5×5+ — Pattern databases. Manhattan is not enough to fit inside a hint button's time budget.
Slide Puzzle uses Manhattan + linear conflict for the 3×3 and 4×4 hint button, walking distance for the 4×4 difficulty calibration, and a 5+5+5+9 pattern database for the 5×5 and 6×6 calibration. The Manhattan heuristic is the foundation of all of them.
A final intuition
The Manhattan distance heuristic is what makes slide puzzles searchable at all. Without it, A* and IDA* would degenerate into BFS — and BFS does not finish on the 15 puzzle within the lifetime of the universe.
That heuristic is the single line of code that turns an intractable problem into a millisecond computation. It is worth understanding well.