Problem solvers based on graphs are hard to get your head around at first, but then you get extremely elegant and powerful solutions to seemingly unsolvable problems.
The only gotchas are: 1) the time it takes to get your head around them, and 2) the algorithmic complexity of the resulting solution.
Graph theory is probably the most fulfilling application of math in computer science. In a way, graph-based algorithms do magic similar to AI, but in a fully deterministic manner. If you think about it more broadly, a graph resembles a subset of a neural network, but with only {0, 1} weights.
Maybe some day neural networks will be so obvious and well-known to the general public that this is how we'd explain graphs to kids: imagine an NN where the weights are always 0/1...
You know what happened millions of years ago, when monkeys started to use tools.
It's time for AI to learn how to use tools, like a formal mathematical solver. Well, it's already been done, but not with LLMs... soooo... academics only?
The general public believes 1/3 is smaller than 1/4.
If you like reasoning about a program in terms of expression trees/graphs, I recently discovered that Wolfram Language has built-ins for this:
https://reference.wolfram.com/language/ref/ExpressionTree.ht...
A great example of the value of academia and scientific research.
This seems, at least upon first read, analogous to global value numbering (GVN). Or, depending on how you look at it, common subexpression elimination (CSE). I am mostly wondering why they are not mentioned in the article.
GVN and CSE only identify duplicate/common subexpressions. They do not tell you where to place the computation of the common subexpression.
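As a rough illustration of that identification step (a hypothetical Python sketch, not anyone's actual compiler code): hash-consing-style value numbering assigns the same small integer to structurally identical subexpressions, which is exactly the "find the duplicates" half of CSE/GVN.

```python
# Hypothetical sketch: value numbering over expression trees.
# Structurally identical subexpressions receive the same value number --
# detecting duplication, but saying nothing about placement.

def value_number(expr, table):
    """expr is ("var", name) or (op, left, right); table maps canonical
    forms to small integers (the value numbers)."""
    if expr[0] == "var":
        key = expr
    else:
        op, left, right = expr
        key = (op, value_number(left, table), value_number(right, table))
    if key not in table:
        table[key] = len(table)
    return table[key]

# (a + b) * (a + b): both operands get the same value number.
shared = ("add", ("var", "a"), ("var", "b"))
e = ("mul", shared, shared)
t = {}
value_number(e, t)
```

Both `add` operands map to one number, so the duplication is detected; nothing here says where the single computation should live.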
The canonical algorithm to do that is to compute the dominance relation. A node X dominates Y if every path to Y must go through X. Once you have computed the dominance relation, if a common subexpression is located at nodes N1, N2, N3, you can place the computation at some shared dominator of N1, N2, and N3. Because dominance is a statement about /all/ paths, there is a unique lowest dominator [1]. This is exactly the "lowest single common ancestor."
Note that dominance is also defined for cyclic graphs. There may be faster algorithms to compute dominance for acyclic graphs. Expression graphs in non-lazy programming languages are almost always acyclic (whereas in a lazy language like Haskell, you can write cyclic expressions).
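The dominance computation described above can be sketched with the classic iterative dataflow algorithm (graph and node names invented for illustration; real compilers typically use faster algorithms such as Lengauer-Tarjan or Cooper-Harvey-Kennedy):

```python
def dominators(graph, root):
    """graph: dict node -> list of successors (cycles allowed).
    Returns a dict mapping each node to the set of its dominators."""
    nodes = set(graph) | {s for succs in graph.values() for s in succs}
    preds = {n: set() for n in nodes}
    for n, succs in graph.items():
        for s in succs:
            preds[s].add(n)

    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates everything"
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            # n is dominated by itself plus whatever dominates ALL its predecessors.
            pred_doms = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Diamond: r -> {a, b} -> c.  A subexpression appearing at a, b, and c
# has r as its lowest common dominator, so that is where it can be hoisted.
g = {"r": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
dom = dominators(g, "r")
```

Neither `a` nor `b` dominates `c` (each can be bypassed via the other branch), so the only legal hoist point shared by all three sites is `r`.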
[1] Claim. Let A, B, and C be reachable nodes. Suppose A and B both dominate C. Then either A dominates B or B dominates A.
Proof. Since A and B both dominate C and C is reachable, pick a path p from the root to C; both A and B occur on p. Consider the last occurrence of each on p, and assume WLOG that A's last occurrence precedes B's. Then the suffix of p from B's last occurrence to C contains no A. Now suppose A does not dominate B: then there is a path q from the root to B avoiding A. Concatenating q with that suffix gives a path from the root to C avoiding A, contradicting that A dominates C. Hence A dominates B.
Wondered about the same thing. Perhaps the author deals with graphs with no side effects or branches? It would then trivially become CSE on a single basic block.
SSA transformations are essentially equivalent to what the author appears to be doing in terms of let-bindings [0].
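A toy version of that correspondence, assuming a pure, straight-line block (the tuple encoding and all names are invented for illustration): each distinct subexpression gets computed once and bound to a fresh temporary, which is the let-binding view of CSE on a single basic block.

```python
# Hypothetical sketch: rewrite a straight-line block of pure assignments so
# that every distinct subexpression is computed exactly once, bound to a
# fresh temporary.  Only valid when expressions have no side effects.

def cse_block(assignments):
    """assignments: list of (name, expr); expr is ("var", n) or (op, l, r).
    Returns a new assignment list with shared subexpressions hoisted."""
    seen = {}   # canonical expr -> temp name holding its value
    out = []

    def visit(e):
        if e[0] == "var":
            return e
        op, left, right = e
        canon = (op, visit(left), visit(right))
        if canon not in seen:
            tmp = f"t{len(seen)}"
            seen[canon] = tmp
            out.append((tmp, canon))
        return ("var", seen[canon])

    for name, e in assignments:
        out.append((name, visit(e)))
    return out

block = [
    ("x", ("add", ("var", "a"), ("var", "b"))),
    ("y", ("mul", ("add", ("var", "a"), ("var", "b")), ("var", "c"))),
]
rewritten = cse_block(block)
# "a + b" now appears once, bound to a temporary that both x and y reuse.
```

With side effects or branches this naive hoisting is unsound, which is why the general case needs the dominance machinery discussed above.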
[0] https://dl.acm.org/doi/10.1145/278283.278285
Shit. I had a semi-hard problem at my last job, and I just realized that the most efficient way to solve it would have been to write a blog post about how proud I was of my (shitty) solution and wait for the snarky commenters to tell me the proper way to do it.
I love this fact about the internet! Thanks guys! Keep it up! Including the snarkiness. It's part of what makes it great!
(I am aware this is not a novel idea. Posting the wrong solution is a better way to get answers than asking for help. It is just fun to see it in action.)
I came here to mention this as well. If this problem was so critical to the company the author was working at, it seems negligent to spend a _year_ reinventing a solved problem from scratch, especially given the author's apparent history of compiler experience.