How to compare text changes
How diff algorithms work, line vs word vs character granularity, unified diffs, workflow for code, prose, and legal docs, tool comparison.
Text diffs are how developers, writers, legal teams, and reviewers see what actually changed. “I updated the doc” tells you nothing. A diff tells you exactly which words moved, which clauses were added, which numbers were tweaked. The difference between diff-illiterate and diff-fluent is hours a week in code review, contract review, and document review. This guide covers how diff algorithms work (line-based, word-based, character-based), the diff formats you’ll encounter (unified, context, side-by-side), practical diffing for code versus prose versus contracts, and the tools that go beyond naive line-level comparison.
What a diff actually computes
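A diff algorithm finds a small set of edits (insertions and deletions) that transforms one text into another. There are always many valid edit sequences. A short one is usually the most readable, which is why the classic algorithms aim for the minimum, and why the newer variants below sometimes accept a slightly longer script in exchange for more intuitive alignment.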
Myers diff algorithm: the standard since 1986. Used by git, most IDEs, and most online tools. Computes the longest common subsequence (LCS), then marks additions and deletions around it.
Patience diff: alternative used by Bazaar and git (with --patience). Better on structured text like code — aligns function signatures and braces more intuitively.
Histogram diff: a variant of patience that also handles lines occurring more than once, often the clearest for real code. It is not git’s default (Myers still is), so enable it globally if you want it: git config --global diff.algorithm histogram.
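To make the "common subsequence plus edits around it" idea concrete, here is a minimal sketch in Python. It uses the textbook dynamic-programming LCS table rather than Myers' faster algorithm, and the function name lcs_diff is made up for this illustration.

```python
def lcs_diff(old, new):
    """Return (tag, line) pairs: ' ' = unchanged, '-' = deleted, '+' = added."""
    n, m = len(old), len(new)
    # table[i][j] holds the LCS length of old[i:] and new[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table: keep matching lines, otherwise emit the edit that
    # preserves the longer remaining common subsequence.
    edits, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:
            edits.append((" ", old[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            edits.append(("-", old[i]))
            i += 1
        else:
            edits.append(("+", new[j]))
            j += 1
    edits.extend(("-", line) for line in old[i:])
    edits.extend(("+", line) for line in new[j:])
    return edits


old = ["alpha", "beta", "gamma"]
new = ["alpha", "delta", "gamma", "epsilon"]
for tag, line in lcs_diff(old, new):
    print(tag + line)
```

The table costs O(n·m) time and memory, which is fine for a sketch but is the reason production tools use Myers or its relatives instead.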
Granularity — line, word, or character
Line-based: the default in git and most tools. Fast, readable, but noisy when you’ve reformatted. A single wrapping change makes the whole paragraph look changed.
Word-based: highlights which words changed within a line. git’s --word-diff enables this. Much clearer for prose and contracts.
Character-based: highlights exactly which characters flipped. Overkill for most reading; useful for spotting typos or non-printing character changes (smart quotes, non-breaking spaces).
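The three granularities differ only in how the text is split into tokens before diffing. The sketch below uses Python's standard difflib to compare the same sentence at word and character level. The helper token_diff and the two patterns are assumptions for this example, and the [-…-] / {+…+} markers are borrowed from the plain output of git's --word-diff.

```python
import difflib
import re


def token_diff(old, new, pattern):
    """Diff two strings after splitting both into tokens with `pattern`."""
    a, b = re.findall(pattern, old), re.findall(pattern, new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if i1 < i2:  # tokens present only in the old text
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if j1 < j2:  # tokens present only in the new text
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)


WORDS = r"\s+|\S+"  # runs of whitespace and runs of non-whitespace
CHARS = r"(?s)."    # every single character, newlines included

old = "The fee is 100 dollars."
new = "The fee is 150 dollars."
print(token_diff(old, new, WORDS))  # The fee is [-100-]{+150+} dollars.
print(token_diff(old, new, CHARS))  # The fee is 1[-0-]{+5+}0 dollars.
```

The word-level result reads as "the amount changed", while the character-level result pinpoints the single digit, which is the trade-off described above.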
Common diff formats
Unified diff (git diff default):
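@@ -10,3 +10,4 @@
 unchanged line
-deleted line
+added line 1
+added line 2
 unchanged line
The header @@ -10,3 +10,4 @@ means “the 3 lines starting at line 10 of the old file correspond to the 4 lines starting at line 10 of the new file.” Lines prefixed with - exist only in the old file, lines prefixed with + exist only in the new file, and lines with a leading space are unchanged context.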
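As a rough illustration of how the hunk header relates to the lines it covers, the following Python sketch generates a unified diff with the standard difflib module and parses the header back into numbers. The file names, the sample lines, and the parse_hunk_header helper are invented for this example.

```python
import difflib
import re

# Twenty numbered lines; the new version replaces line 12 with two lines.
old = [f"line {n}\n" for n in range(1, 21)]
new = list(old)
new[11:12] = ["line 12 (edited)\n", "line 12b (new)\n"]

# n=3 asks for three lines of context on each side of the change.
diff = list(difflib.unified_diff(old, new, "old.txt", "new.txt", n=3))
print("".join(diff), end="")

HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HUNK.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    # A missing count means exactly one line.
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))


print(parse_hunk_header(diff[2]))  # (9, 7, 9, 8)
```

The old side spans 7 lines (3 context, 1 changed, 3 context) and the new side 8, because one line became two.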
Context diff (older, less common): shows 3 lines of context with ! markers for changes.
Side-by-side diff: tools render unified diffs visually with old on left, new on right. Better for long blocks; worse for many small changes.
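Three-way merge diff: shows base, local, and remote versions when resolving conflicts. Setting git’s merge.conflictStyle option to diff3 (git config --global merge.conflictStyle diff3) enables this, and it is a big readability improvement over the default two-way conflict markers.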
Diffing code — the standard workflow
git diff: unstaged changes vs. the index.
git diff --staged: staged vs. last commit.
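git diff main..feature: branch comparison (tip of main vs. tip of feature). Use three dots, git diff main...feature, to see only what feature changed since it forked from main.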
git diff commit1 commit2: between arbitrary commits.
Ignore whitespace: git diff -w strips whitespace-only changes — essential when reviewing after a formatter ran.
Word-level for prose: git diff --word-diff.
Stat summary: git diff --stat shows files with line counts.
Pickaxe: git log -S"searchterm" finds commits where that string was added or removed.
Diffing prose and documents
Code diff tools handle code well; prose diffs need different care.
Reformatting noise: hard line-wrapping at 80 chars makes diffs unreadable when one word changes. Prose should be soft-wrapped (one sentence per line or unwrapped entirely) to keep diffs clean.
Tracked changes (Word, Google Docs): native diff UX for non-technical users. Good for collaborative editing but not round-trippable with git.
Markdown diffs: treat markdown as prose for formatting, but validate structure changes (headings, lists) visually — diff tools don’t understand markdown semantically.
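The reformatting-noise point can be demonstrated with a small Python sketch: the same one-word edit is diffed once as hard-wrapped text and once after reflowing to one sentence per line. The sample paragraph, the naive sentence splitter, and both helper names are assumptions for illustration. The splitter breaks after ., !, or ? and will mis-split abbreviations such as "e.g.".

```python
import difflib
import re
import textwrap


def one_sentence_per_line(text):
    """Collapse hard wraps, then split after sentence-ending punctuation."""
    flat = " ".join(text.split())
    return re.split(r"(?<=[.!?])\s+", flat)


def changed_line_count(a, b):
    """Count lines that a line-based diff would mark as removed or added."""
    ops = difflib.SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    return sum((i2 - i1) + (j2 - j1)
               for op, i1, i2, j1, j2 in ops if op != "equal")


old_text = ("The contract begins in March. Payment is due within thirty "
            "days of each invoice. Late fees apply after that.")
new_text = old_text.replace("in March", "in early March")

# Hard-wrapped at 40 columns: one added word re-wraps the lines after it.
old_wrapped = textwrap.fill(old_text, width=40)
new_wrapped = textwrap.fill(new_text, width=40)
wrapped_changes = changed_line_count(old_wrapped.splitlines(),
                                     new_wrapped.splitlines())

# One sentence per line: only the edited sentence shows up.
sentence_changes = changed_line_count(one_sentence_per_line(old_wrapped),
                                      one_sentence_per_line(new_wrapped))

print("hard-wrapped lines changed:", wrapped_changes)
print("sentence lines changed:", sentence_changes)
```

With sentence-per-line input the diff touches one sentence (one removal, one addition), while the hard-wrapped version drags along every line the re-wrap shifted.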
Diffing contracts and legal documents
Redlining tools (Microsoft Word’s “Compare,” Litera, iManage) produce a legal-style redline showing deletions struck through and additions underlined. Standard for legal review.
Word-level is essential for contracts. A single changed word can shift liability.
Formatting diffs matter less; content diffs matter more. Use tools that let you ignore style-only changes.
Never sign without the redline. Best practice: circulate both the clean and the redlined version for review, and read the redline against the clean copy before anyone signs.
Common diff tools
git diff / git difftool: default for code. Configure difftool with Beyond Compare, Meld, or Delta.
Delta: syntax-highlighted, readable git diff replacement. Install it, then set it as the pager for diffs: git config pager.diff delta.
VS Code diff viewer: side-by-side with syntax highlighting. Opens on branch compare or from Git tab.
Beyond Compare, Meld, Kaleidoscope: dedicated diff apps. Heavy for casual use; powerful for complex merges.
Online diff checkers: fine for quick ad-hoc text comparison, but don’t upload sensitive content. Use local tools for anything private.
diff command (Unix): diff file1 file2. Venerable, minimal. -u for unified, -r for recursive directory diff.
Patch files
A diff saved to a file is a patch. Apply it with git apply patch.diff or patch -p1 < patch.diff (the -p1 strips the a/ and b/ path prefixes that git writes into its diffs).
Use cases: sharing changes without pushing/pulling, reviewing contributions to read-only repos, applying upstream fixes to forked code.
Patch hygiene: patches fail when the target file has drifted. Small, single-purpose patches apply cleanly; large patches break.
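Why drift breaks patches becomes clearer with a toy hunk applier. This Python sketch is a deliberately strict simplification: it applies one hunk at an exact line number and refuses if any context or deleted line no longer matches. Real tools such as patch and git apply are more forgiving (they can search nearby offsets), and the name apply_hunk is invented here.

```python
def apply_hunk(lines, start, hunk):
    """Apply one unified-diff hunk body to `lines` at 1-based line `start`.

    `hunk` is a list of strings prefixed with ' ', '-', or '+'.
    Raises ValueError when the target no longer matches the hunk.
    """
    pos = start - 1
    out = list(lines[:pos])
    for entry in hunk:
        tag, text = entry[0], entry[1:]
        if tag == "+":
            out.append(text)  # added line: nothing to verify in the target
        elif tag in (" ", "-"):
            # Context and deleted lines must match the target exactly.
            if pos >= len(lines) or lines[pos] != text:
                raise ValueError(f"hunk does not apply at line {pos + 1}")
            if tag == " ":
                out.append(text)
            pos += 1
        else:
            raise ValueError(f"unexpected hunk line: {entry!r}")
    return out + list(lines[pos:])


hunk = [" a", "-b", "+B", " c"]
print(apply_hunk(["a", "b", "c"], 1, hunk))  # ['a', 'B', 'c']

try:
    apply_hunk(["a", "x", "c"], 1, hunk)     # target has drifted
except ValueError as error:
    print("rejected:", error)
```

A small patch carries few lines that must still match, so it has fewer ways to be rejected than a large one.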
Reading diffs efficiently
Deletions first: read what was removed. Understanding what’s gone gives context for what’s new.
Context lines: expand context (-U10) when the default 3 lines aren’t enough. Too much context hides the change; too little hides the reason.
Split noisy diffs: reformatting + content changes in one commit are unreadable. Reformat in one commit, change content in the next. This is not optional.
Check whitespace characters: git marks trailing whitespace with a red background. Non-breaking spaces, tabs vs. spaces, and Windows/Unix line endings all show up in diffs as mysterious changes.
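When a diff shows two lines that look identical, making the invisible characters visible usually settles it. The following Python sketch is one way to do that. The reveal helper, its placeholder labels, and the choice to spell out every non-ASCII character by its Unicode name are assumptions for this example, and the last of these would be too aggressive for text that is legitimately non-English.

```python
import unicodedata

# Placeholder labels for characters that are easy to miss in a diff.
LABELS = {
    "\t": "<TAB>",
    "\r": "<CR>",
    "\u00a0": "<NBSP>",
    "\u200b": "<ZWSP>",
}


def reveal(line):
    """Return `line` with invisible or look-alike characters spelled out."""
    body = line.rstrip(" ")
    trailing_spaces = len(line) - len(body)
    out = []
    for ch in body:
        if ch in LABELS:
            out.append(LABELS[ch])
        elif ord(ch) > 127:
            fallback = f"U+{ord(ch):04X}"
            out.append("<" + unicodedata.name(ch, fallback) + ">")
        else:
            out.append(ch)
    return "".join(out) + "<SP>" * trailing_spaces


print(reveal("total:\u00a0100"))   # total:<NBSP>100
print(reveal("name\tvalue\r"))     # name<TAB>value<CR>
print(reveal("\u201cok\u201d"))    # smart quotes become named placeholders
print(reveal("done  "))            # done<SP><SP>
```

Running both versions of a suspicious line through reveal and diffing the results turns a "mysterious" change into an explicit one.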
Common mistakes
Reviewing reformatted code line-by-line. Impossible. Pre-format, re-commit, re-review.
Ignoring whitespace noise. CRLF vs LF, trailing whitespace, and tab sizes all hide real changes. Normalize line endings with .gitattributes.
Pasting binary files. Diff tools can’t show meaningful diffs of binary formats (images, PDFs, Word docs). Use format-specific tools.
Using character-level diff for code review. Too noisy. Line-level with word-level expansion is the sweet spot.
Skipping the word diff for prose. Line-level makes paragraph changes unreadable.
Comparing files in different encodings. UTF-8 vs CP1252 makes everything look changed. Normalize encoding first.
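Normalizing before comparing can be sketched in a few lines of Python. The example assumes you already know each file's encoding (detecting it is a separate problem), and the normalize helper and sample bytes are invented for illustration.

```python
def normalize(raw, encoding):
    """Decode bytes, unify line endings, and drop trailing spaces and tabs."""
    text = raw.decode(encoding)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


# The same content saved two ways: CP1252 with CRLF and trailing spaces,
# and UTF-8 with LF.
windows_bytes = "café\r\nprice: €5  \r\n".encode("cp1252")
unix_bytes = "café\nprice: €5\n".encode("utf-8")

print(windows_bytes == unix_bytes)  # False: the raw bytes differ
print(normalize(windows_bytes, "cp1252") == normalize(unix_bytes, "utf-8"))  # True
```

Diffing the normalized strings instead of the raw files leaves only genuine content changes in the output.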