I haven't quite filled that fourth box yet but I've made a lot of progress. I'm parsing both files into a syntax tree which I'll then use to calculate a diff between those files. This syntax tree is a combined abstract and concrete syntax tree: The black nodes form the AST and the gray nodes come from the CST and fill in the gaps between the nodes of the AST. The content in quotes comes directly from the source code while anything else is tree-types and structure. The grey nodes are actually two different types: necessary punctuation and optional whitespace; they are not differentiated in this visualization. Because of its structure and because it's neither an AST nor a CST I'm calling it an "expandable syntax tree" or EST (which then brings us to "diffEST" 😛 ). Up next: Fill that fourth box. &diffest