How to diff two large JSON files with out-of-order keys
You have two JSON files that should be equivalent but somebody’s process reorders the keys, so diff shows every line as changed. You want the semantic diff, not the byte diff.
The Cheap Fix: jq -S
Sort keys deep, then diff:
diff <(jq -S . a.json) <(jq -S . b.json)
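-S (--sort-keys) sorts the keys of every object at every nesting level, and jq re-indents both files the same way, so diff only shows the real changes. Array order is left untouched. The <(...) process substitution needs bash or zsh. Zero extra tools required if you already have jq.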
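For environments without jq, the same "sort keys deep, then diff" idea can be sketched with Python's standard library. This is only an illustration of the technique, not a reimplementation of jq: the function names and the sample documents are made up for the example, and Python's number formatting may differ from jq's.

```python
from __future__ import annotations

import difflib
import json


def canonical_lines(text: str) -> list[str]:
    """Parse JSON text and re-serialise it with sorted keys and fixed indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2).splitlines()


def semantic_diff(a_text: str, b_text: str) -> list[str]:
    """Unified diff of the two documents after both have been canonicalised."""
    return list(
        difflib.unified_diff(
            canonical_lines(a_text),
            canonical_lines(b_text),
            fromfile="a.json",
            tofile="b.json",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical inputs: same data, different key order, one real change (port).
    a = '{"name": "svc", "port": 80, "tags": {"env": "prod", "tier": "web"}}'
    b = '{"tags": {"tier": "web", "env": "prod"}, "port": 8080, "name": "svc"}'
    for line in semantic_diff(a, b):
        print(line)
```

Only the port line shows up in the diff; the reordered keys, including the nested ones under tags, do not.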
When jq -S Isn’t Enough: dyff
If the diff is still noisy - arrays that don’t match by index, or you want friendlier output - install dyff:
dyff between a.json b.json
dyff reads YAML and JSON, understands arrays of objects (matching entries by an identifying key rather than by position), and colours the output. It is useful for Kubernetes manifests and Helm values files.
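To make "match by key rather than position" concrete, here is a minimal Python sketch of the idea. It is not dyff's actual algorithm: the helper names are hypothetical, the identifier field is assumed to be "name", and every object is assumed to carry a unique value for it.

```python
def index_by(items, key):
    """Map each object's identifying field to the object itself."""
    return {item[key]: item for item in items}


def diff_keyed_lists(old, new, key="name"):
    """Compare two lists of objects by identifier instead of by list position."""
    old_by_id = index_by(old, key)
    new_by_id = index_by(new, key)
    return {
        "removed": sorted(old_by_id.keys() - new_by_id.keys()),
        "added": sorted(new_by_id.keys() - old_by_id.keys()),
        "changed": sorted(
            k
            for k in old_by_id.keys() & new_by_id.keys()
            if old_by_id[k] != new_by_id[k]
        ),
    }


if __name__ == "__main__":
    # Hypothetical container lists: same entries reordered, one edited, one added.
    old = [
        {"name": "web", "image": "nginx:1.25"},
        {"name": "db", "image": "postgres:15"},
    ]
    new = [
        {"name": "db", "image": "postgres:16"},
        {"name": "web", "image": "nginx:1.25"},
        {"name": "cache", "image": "redis:7"},
    ]
    print(diff_keyed_lists(old, new))
    # {'removed': [], 'added': ['cache'], 'changed': ['db']}
```

A positional diff would report every element here as changed, because the reorder shifts each index; matching on the identifier reduces it to one addition and one edit.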
The Structural One: jd
jd produces machine-parseable diffs and supports JSON Patch format:
jd a.json b.json
Output looks like:
@ ["users",0,"email"]
- "[email protected]"
+ "[email protected]"
You can save the diff and apply it elsewhere. With -p, jd treats its first argument as a diff and applies it to the second:
jd a.json b.json > patch.jd
jd -p patch.jd a.json > new_b.json
Handy when you want to programmatically detect specific changes in CI.
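As one way to do that detection, the sketch below inspects a JSON Patch (RFC 6902) document and reports whether it modifies anything under a given path. It assumes the diff has already been produced in JSON Patch form (jd's help text lists this as -f patch; check your version) and loaded as a list of operations. The function names and the sample patch are hypothetical.

```python
import json

# Operations that write to "path". "test" only asserts a value, so it is skipped.
# Simplification: the "from" side of a "move" is not tracked here.
MODIFYING_OPS = {"add", "remove", "replace", "move", "copy"}


def touched_paths(patch_ops):
    """Return the JSON Pointer paths written to by modifying operations."""
    return [op["path"] for op in patch_ops if op.get("op") in MODIFYING_OPS]


def touches(patch_ops, prefix):
    """True if any modifying operation targets `prefix` or something below it."""
    return any(
        path == prefix or path.startswith(prefix + "/")
        for path in touched_paths(patch_ops)
    )


if __name__ == "__main__":
    # Hypothetical patch; in CI this would be read from the diff tool's output.
    patch = json.loads(
        """[
          {"op": "test", "path": "/users/0/email", "value": "old"},
          {"op": "remove", "path": "/users/0/email"},
          {"op": "add", "path": "/users/0/email", "value": "new"}
        ]"""
    )
    print(touches(patch, "/users"))    # True
    print(touches(patch, "/billing"))  # False
```

A CI job could fail the build, or request an extra review, when touches() returns True for a path that is supposed to stay frozen.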
Which One When
| Situation | Tool |
|---|---|
| Quick semantic diff, no install | diff <(jq -S ...) <(jq -S ...) |
| Human-readable, arrays of objects | dyff |
| Machine-readable, patch application | jd |
| YAML too (Kubernetes, Helm) | dyff |
The Trap Nobody Talks About
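jq -S normalises whitespace and quoting, but it does not reliably normalise number representation. Depending on the jq version, 1.0 and 1 can still diff, and so can 1e2 and 100. If your producers are inconsistent about integer-vs-float representation, normalise numbers too:
jq -S 'walk(if type == "number" then tonumber else . end)'
walk is a jq builtin from 1.6 onwards. Check the result against your jq version before relying on it: tonumber returns a value that is already a number unchanged, so this filter may leave the original spelling intact. If it does, the numbers need normalising in a separate pass before the diff.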
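If number spellings need normalising outside jq, one option is a small pre-processing step. The Python sketch below collapses integer-valued floats (1.0, 1e2) into integers (1, 100) before serialising with sorted keys. The function names are hypothetical, and the approach assumes every number fits in a double without loss, which fails for integers above 2^53 written in float notation.

```python
import json


def normalise_numbers(value):
    """Recursively turn integer-valued floats (1.0, 100.0) into ints (1, 100)."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    if isinstance(value, list):
        return [normalise_numbers(v) for v in value]
    if isinstance(value, dict):
        return {k: normalise_numbers(v) for k, v in value.items()}
    return value


def canonical(text):
    """Sorted-key, number-normalised serialisation suitable for feeding to diff."""
    return json.dumps(normalise_numbers(json.loads(text)), sort_keys=True, indent=2)


if __name__ == "__main__":
    # Hypothetical producers that disagree on how to spell the same numbers.
    a = '{"replicas": 1.0, "timeout": 1e2, "ratio": 0.5}'
    b = '{"ratio": 0.5, "timeout": 100, "replicas": 1}'
    print(canonical(a) == canonical(b))  # True
```

Non-integral values such as 0.5 are left as they are, so genuine numeric differences still show up in the diff.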