DFA-Foundation
Last updated
Last updated
Here is a classic iterative algorithm template for May & Forward analysis
Let’s view it in another way
Given a CFG with k nodes, the iterative algorithm updates OUT[n] for every node n in each iteration.
Assume the domain of the values in data flow analysis is V, let’s define a k-tuple: (OUT[n1], OUT[n2], …, OUT[nk]) as an element of set (V1 × V2 … × Vk) denoted as Vk, to hold the values of the analysis after each iteration.
Each iteration can be considered as taking an action to map an element of Vk to a new element of Vk, through applying the transfer functions and control-flow handing, abstracted as a function F: Vk → Vk
Then the algorithm outputs a series of k-tuples iteratively until a k-tuple is the same as the last one in two consecutive iterations.
X is a fixed point of function F if X = F(X)
We say the iterative algorithm reaches a fixed point.
Some examples of poset:
(S, ≤) , S is a set of integers
(S, substring) , S is a set of English words
(S, subset) , S is the power set of set {a,b,c}
Obviously, as long as there is an upper bound or a lower bound, the corresponding lub and glb also exist.
Some examples of lattice:
(S, ≤) , S is a set of integers
⊔ means “max”
⊓ means “min”
(S, subset) , S is the power set of set {a,b,c}
⊔ means ∪
⊓ means ∩
Some examples of complete lattice:
(S, ≤) , S is a set of integers
× not a complete lattice
it has no ⊔S(+∞)
(S, subset) , S is the power set of set {a,b,c}
√
Note: the definition of bounds implies that the bounds are not necessarily in the subsets (but they must be in the lattice)
A complete lattice is not necessarily a finite lattice!
The IN and OUT of every node can be seen as a latice.(data flow value)
Data flow analysis can be seen as iteratively applying transfer functions and meet/join operations on the values of a lattice
Proof:
two conditions of fixed point theorem:
L is finite
f is monotonic
If a product lattice Lk is a product of complete(and finite) lattices, i.e., (L, L, …, L), then Lk is also complete (and finite) =》 L is a finite and complete lattice
In each iteration, it is equivalent to think that we apply function F which consists of
transfer function fi: L → L for every node
join/meet function ⊔/⊓: L×L×L... → L for control-flow influence
How to prove function F is monotonic?
first, transfer function is obviously monotonic(Gen/Kill function is monotonic, IN/OUT never shrinks)
next, prove that ⊔/⊓ is monotonic
When will the algorithm reach the fixed point?
To sum up, we can draw these conclusions:
the iterative algorithm is guaranteed to terminate(reach the fixed point)
There may be more than one solution, but our solution is the best one(least/greatest fixed point)
Worst case of iterations: lattice height × nodes num in CFG
What is the meaning of top and bottom?
Take Reaching Definition for example:
Suppose there is a scenario where the program checks the undefine error. We set a dummy definition for each variable, which means that the variable is undefined. Obviously, they should be set to a value of 0 at first, indicating that all undefined variable will not reach this program point. This is an unsafe result because there may be some undefined variables. This is bottom.
For the top, the values are all 1 which means that all definitions may reach. This is a safe but useless result.
We need the fixed point to stay at the safe area and close to the Truth as long as possible. That is, we need the optimal solution.(least/greatest fixed point)
How precise is our solution?
MOP: Meet Over All Paths
MOP is only a conceptual metric, as it is not practical to enumerate all paths (there may be path explosions).
Bit-vector or Gen/Kill problems(set union/intersection for join/meet) are distributive.
But some analyses are not distributive
Given a variable x at program point p, determine whether x is guaranteed to hold a constant value at p.
The OUT of each node in CFG, includes a set of pairs (x, v) where x is a variable and v is the value held by x after that node
an optimization of Iterative Algorithm.
As iteration goes on in the iterative algorithm, we need to apply F to every node even if one node’s OUT changes.