How to diff two large JSON files with out-of-order keys

You have two JSON files that should be equivalent but somebody’s process reorders the keys, so diff shows every line as changed. You want the semantic diff, not the byte diff.

The Cheap Fix: jq -S

Sort keys deep, then diff:

diff <(jq -S . a.json) <(jq -S . b.json)

-S sorts object keys recursively. The output of jq is now deterministic across both files, so diff only shows the real changes. Zero extra tools required if you already have jq.
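
The one-liner relies on process substitution (`<(...)`), which bash, zsh, and ksh provide but plain POSIX sh does not. For a CI script that may run under a stricter shell, a minimal sketch using temporary files might look like the following. `semantic_diff` is a hypothetical helper name, not part of jq, and the sketch assumes `mktemp` is available:

```shell
# semantic_diff A B: diff two JSON files after sorting object keys.
# Hypothetical helper; assumes jq, diff, and mktemp are on PATH.
semantic_diff() {
  left=$(mktemp) || return 2
  right=$(mktemp) || { rm -f "$left"; return 2; }
  status=0
  if jq -S . "$1" > "$left" && jq -S . "$2" > "$right"; then
    # diff exits 0 when the inputs match and 1 when they differ
    diff "$left" "$right" || status=$?
  else
    # a file was missing or was not valid JSON
    status=2
  fi
  rm -f "$left" "$right"
  return "$status"
}
```

Called as `semantic_diff a.json b.json`, it follows diff's own exit-code convention (0 for equal, 1 for different), so it can gate a CI step directly.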

When jq -S Isn’t Enough: dyff

If the diff is still noisy - arrays that don’t match by index, or you want friendlier output - install dyff:

dyff between a.json b.json

dyff reads both YAML and JSON, matches arrays of objects by an identifying key rather than by position, and colours the output. It is useful for Kubernetes manifests and Helm values files.
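
To see why key sorting alone leaves noise here: `jq -S` reorders object keys but never array elements. If dyff is not an option, one jq-only workaround is to sort the array by an identifying field before diffing. The sketch below assumes a top-level `users` array whose objects each carry a unique `id`; both names are illustrative assumptions, and `normalise_users` is a hypothetical helper:

```shell
# normalise_users FILE: print FILE with object keys sorted and the
# (assumed) top-level "users" array ordered by its (assumed) "id" field.
normalise_users() {
  jq -S '.users |= sort_by(.id)' "$1"
}
```

Diffing the output of `normalise_users a.json` against `normalise_users b.json` then ignores element order. This only works when you know the path and the key in advance, which is the bookkeeping a key-matching tool takes off your hands.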

The Structural One: jd

jd produces machine-parseable diffs and supports JSON Patch format:

jd a.json b.json

Output looks like:

@ ["users",0,"email"]
- "[email protected]"
+ "[email protected]"

You can apply the patch elsewhere:

# write the diff in jd's native format
jd a.json b.json > patch.jd
# -p applies the diff in patch.jd to a.json
jd -p patch.jd a.json > new_b.json

Handy when you want to programmatically detect specific changes in CI.
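
As one way to act on that output in CI, the sketch below scans a jd-format diff for any change whose path ends in a given key. It assumes the `@ [path]` header format shown in the sample output; `changed_key` is a hypothetical helper, and the pattern is deliberately simple (it looks only at the last path element):

```shell
# changed_key KEY: read a jd-format diff on stdin and succeed (exit 0)
# if any "@ [...]" header path ends in "KEY". Hypothetical helper.
# Assumes KEY contains no regex metacharacters.
changed_key() {
  grep -q "^@ \[.*\"$1\"]\$"
}
```

A pipeline such as `jd a.json b.json | changed_key email && echo "an email changed"` would then flag only the changes you care about.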

Which One When

| Situation | Tool |
| --- | --- |
| Quick semantic diff, no install | diff <(jq -S ...) <(jq -S ...) |
| Human-readable, arrays of objects | dyff |
| Machine-readable, patch application | jd |
| YAML too (Kubernetes, Helm) | dyff |

The Trap Nobody Talks About

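jq -S normalises whitespace, key order, and string escapes, but it does not reliably normalise number formatting. Depending on the jq version, 1.0 and 1 can still diff (jq 1.7 and later preserve the original literal of numbers they do not modify), and so can 1e2 and 100. If your producers are inconsistent about integer-vs-float representation, normalise numbers too. Passing each number through arithmetic makes jq re-emit it in canonical form (tonumber will not do this, because it leaves existing numbers alone):

jq -S 'walk(if type == "number" then . + 0 else . end)'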

walk is a jq builtin from 1.6 onwards.
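
Number handling has changed between jq releases, so it is worth checking what your installed version does before relying on a normalisation filter. The sketch below is one way to do that. It passes each number through arithmetic (`. + 0`), on the assumption that arithmetic makes jq emit the value in its canonical double form; `normalise_numbers` is a hypothetical helper name:

```shell
# normalise_numbers: read JSON on stdin, sort keys, and pass every
# number through "+ 0" so that equal values should print identically.
# Assumption: arithmetic makes jq emit the canonical double form.
# Needs jq 1.6 or later for walk.
normalise_numbers() {
  jq -c -S 'walk(if type == "number" then . + 0 else . end)'
}

# Compare raw and normalised output for the same values:
#   printf '[1.0, 1, 1e2, 100]' | jq -c .
#   printf '[1.0, 1, 1e2, 100]' | normalise_numbers
```

If the raw output already prints the pairs identically, your jq version is normalising for you and the extra filter is unnecessary. Keep in mind that forcing values through doubles loses precision on integers too large for a double to represent exactly.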