DFA-Foundation

Iterative Algorithm

Here is a classic iterative algorithm template for May & Forward analysis

Let’s view it in another way

Given a CFG with k nodes, the iterative algorithm updates OUT[n] for every node n in each iteration.
Assume the domain of the values in data flow analysis is V, let’s define a k-tuple: (OUT[n1], OUT[n2], …, OUT[nk]) as an element of set (V1 × V2 … × Vk) denoted as Vk, to hold the values of the analysis after each iteration.
Each iteration can be considered as taking an action to map an element of Vk to a new element of Vk, through applying the transfer functions and control-flow handing, abstracted as a function F: Vk → Vk
Then the algorithm outputs a series of k-tuples iteratively until a k-tuple is the same as the last one in two consecutive iterations.

X is a fixed point of function F if X = F(X)

We say the iterative algorithm reaches a fixed point.

Partial Order

poset

Some examples of poset:

(S, ≤) , S is a set of integers
(S, substring) , S is a set of English words
(S, subset) , S is the power set of set {a,b,c}

upper & lower bounds

Obviously, as long as there is an upper bound or a lower bound, the corresponding lub and glb also exist.

Lattice

Some examples of lattice:

(S, ≤) , S is a set of integers
- ⊔ means “max”
- ⊓ means “min”
(S, subset) , S is the power set of set {a,b,c}
- ⊔ means ∪
- ⊓ means ∩

Semilattice

Complete Lattice

Some examples of complete lattice:

(S, ≤) , S is a set of integers
- × not a complete lattice
- it has no ⊔S（+∞）
(S, subset) , S is the power set of set {a,b,c}
- √

Note: the definition of bounds implies that the bounds are not necessarily in the subsets (but they must be in the lattice)

A complete lattice is not necessarily a finite lattice!

Product Lattice

DFA Framework via Lattice

The IN and OUT of every node can be seen as a latice.（data flow value）

Data flow analysis can be seen as iteratively applying transfer functions and meet/join operations on the values of a lattice

Monotonicity

Fixed Point Theorem

Proof：

Relate to Iterative Algorithm

two conditions of fixed point theorem:

L is finite
f is monotonic

If a product lattice Lk is a product of complete(and finite) lattices, i.e., (L, L, …, L), then Lk is also complete (and finite) =》 L is a finite and complete lattice

In each iteration, it is equivalent to think that we apply function F which consists of

transfer function fi: L → L for every node
join/meet function ⊔/⊓: L×L×L... → L for control-flow influence

How to prove function F is monotonic?

first, transfer function is obviously monotonic(Gen/Kill function is monotonic, IN/OUT never shrinks)

next, prove that ⊔/⊓ is monotonic

When will the algorithm reach the fixed point?

To sum up, we can draw these conclusions:

the iterative algorithm is guaranteed to terminate（reach the fixed point）
There may be more than one solution, but our solution is the best one（least/greatest fixed point）
Worst case of iterations: lattice height × nodes num in CFG

May & Must analysis

What is the meaning of top and bottom?

Take Reaching Definition for example：

Suppose there is a scenario where the program checks the undefine error. We set a dummy definition for each variable, which means that the variable is undefined. Obviously, they should be set to a value of 0 at first, indicating that all undefined variable will not reach this program point. This is an unsafe result because there may be some undefined variables. This is bottom.

For the top, the values are all 1 which means that all definitions may reach. This is a safe but useless result.

We need the fixed point to stay at the safe area and close to the Truth as long as possible. That is, we need the optimal solution.（least/greatest fixed point）

MOP

How precise is our solution?

MOP: Meet Over All Paths

MOP is only a conceptual metric, as it is not practical to enumerate all paths (there may be path explosions).

Bit-vector or Gen/Kill problems(set union/intersection for join/meet) are distributive.

But some analyses are not distributive

Constant Propagation

Given a variable x at program point p, determine whether x is guaranteed to hold a constant value at p.

The OUT of each node in CFG, includes a set of pairs (x, v) where x is a variable and v is the value held by x after that node

Worklist algorithm

an optimization of Iterative Algorithm.

As iteration goes on in the iterative algorithm, we need to apply F to every node even if one node’s OUT changes.

PreviousDFA NextInterprocedural Analysis

Last updated 10 months ago

Was this helpful?