DFA
Overview of DFA
DFA:Data Flow Analysis
How application-specific data flows through the nodes and edges of CFG(source code->IR->CFG)
most static analyzer tends to may analysis
may analysis
outputs information that may be true (over-approximation)
must analysis
outputs information that must be true (under-approximation)
different data-flow analysis applications have
different data abstraction and
different flow safe-approximation strategies and
different transfer functions and control-flow handlings
e.g., determine the sgin of a variable
data abstraction:+、-、0、unknown、undefined
transfer function:+ op + = + , + op - = -
control-flow handlings:union the signs at merges
Preliminaries of DFA
Input and Output States
Each execution of an IR statement transforms an input state to a new output state
The input/output state is associated with the program point before/after the statement
In each data-flow analysis application, we associate with every program point a data-flow value that represents an abstraction of the set of all possible program states that can be observed for that point.
DFA is to find a solution to a set of safe-approximation-directed constraints on the IN[s]’s and OUT[s]’s for all statements
constraints based on semantics of statements(transfer function)
constraints based on the flows of control
Transfer Function
Control Flow
Reaching Definitions analysis
A definition d at program point p reaches a point q if there is path from p to q such that d is not “killed” along that path
A definition of a variable v is a statement that assigns a value to v
how to be “killed”: new definition of v
Reaching definitions can be used to detect possible undefined variables.
e.g., introduce a dummy definition for each variable v at the entry of CFG, and if the dummy definition of v reaches a point p where v is used, then v may be used before definition (as undefined reaches v)
So reaching definitions analysis is may analysis.
(不放过动态运行时所有可能的路径)
Abstraction
The definitions of all the variables in a program can be represented by bit vectors.
Safe-Approximation
Transfer Function
genB:definitions generated in this BB.
killB: new definition D in this BB kills other definitions
The definitions in genB can reach the program point at OUT[B] and those in killB can't reach the program point at OUT[B]
obviously, same variables in definition bit vectors can not exist at the same time.
Control Flow
reaching definition is a may analysis.So here use union(∪) as the meet operator.
Algorithm
For Example:
CFG above can be transfered to pseudo-code below.
Iteration 0 —— init
Iteration 1
IN[B1] = 0000 0000 = OUT[Entry]
OUT[B1] = 1100 0000 generate D1、D2
IN[B2] = 1100 0000 = OUT[B1] ∪ OUT[B4]
OUT[B2] = 1011 0000 generate D3、D4,kill D2
IN[B3] = 1011 0000 = OUT[B2]
OUT[B3] = 0011 0010 generate D7,kill D1
IN[B4] = 1011 0000 = OUT[B2]
OUT[B4] = 0011 1100 generate D5、D6,kill D1
IN[B5] = 0011 1110 = OUT[B4] ∪ OUT[B3]
OUT[B5] = 0011 1011 generate D8, kill D6
Iteration 2
IN[B1] = 0000 0000 = OUT[Entry]
OUT[B1] = 1100 0000 generate D1、D2
IN[B2] = 1111 1100 = OUT[B1] ∪ OUT[B4]
OUT[B2] = 1011 1100 generate D3、D4,kill D2
IN[B3] = 1011 1100 = OUT[B2]
OUT[B3] = 0011 0110 generate D7,kill D1、D5
IN[B4] = 1011 1100 = OUT[B2]
OUT[B4] = 0011 1100 generate D5、D6,kill D1、D7、D8
IN[B5] = 0011 1110 = OUT[B4] ∪ OUT[B3]
OUT[B5] = 0011 1011 generate D8, kill D6
Iteration 3
IN[B1] = 0000 0000 = OUT[Entry]
OUT[B1] = 1100 0000 generate D1、D2
IN[B2] = 1111 1100 = OUT[B1] ∪ OUT[B4]
OUT[B2] = 1011 1100 generate D3、D4,kill D2
IN[B3] = 1011 1100 = OUT[B2]
OUT[B3] = 0011 0110 generate D7,kill D1、D5
IN[B4] = 1011 1100 = OUT[B2]
OUT[B4] = 0011 1100 generate D5、D6,kill D1、D7、D8
IN[B5] = 0011 1110 = OUT[B4] ∪ OUT[B3]
OUT[B5] = 0011 1011 generate D8, kill D6
The final result is the green part which means definitions can reach this point in the program.
Why this iterative algorithm can finally stop?
INs will not change if OUTs do not change
OUTs will not change if INs do not change
Reach a fixed point. Also related with monotonicity
Live Variables Analysis
Live variables analysis tells whether the value of variable v at program point p could be used along some path in CFG starting at p.If so, v is live at p; otherwise, v is dead at p.
information of live variables can be used for register allocations.
e.g., all registers are full and we need to use one, then we should favor using a register with a dead value.
Abstraction
All variables in a program can be represented by bit vectors.
Safe-Approximation
redefinition will break the path. forwards analysis requires record of previous state, so we use backwards analysis
One live path will be ok, so we use may analysis
Control Flow
Transfer Function
useB:variables used before redefined in B
defB:variables redefined in B
OUT[B]:variables live coming out of B
note: we focus on the live state of variables at some program point.
Algorithm
Iteration 0 —— init
Iteration 1
IN[Exit] = 000 0000
OUT[B5] = 000 0000 = IN[Exit]
IN[B5] = 000 1000 use p, redefine z
OUT[B3] = 000 1000 = IN[B5]
IN[B3] = 100 1000 use x
OUT[B4] = 000 1000 = IN[B5] ∪ IN[B2]
IN[B4] = 010 1000 use y, redefine x、q
OUT[B2] = 110 1000 = IN[B3] ∪ IN[B4]
IN[B2] = 100 1001 use k, redefine m、y
OUT[B1] = 100 1001 = IN[B2]
IN[B1] = 001 1101 use p、q、z, redefine x、y
Iteration 2
IN[Exit] = 000 0000
OUT[B5] = 000 0000 = IN[Exit]
IN[B5] = 000 1000 use p, redefine z
OUT[B3] = 000 1000 = IN[B5]
IN[B3] = 100 1000 use x
OUT[B4] = 100 1001 = IN[B5] ∪ IN[B2]
IN[B4] = 010 1001 use y, redefine x、q
OUT[B2] = 110 1001 = IN[B3] ∪ IN[B4]
IN[B2] = 100 1001 use k, redefine m、y
OUT[B1] = 100 1001 = IN[B2]
IN[B1] = 001 1101 use p、q、z, redefine x、y
Iteration 3
IN[Exit] = 000 0000
OUT[B5] = 000 0000 = IN[Exit]
IN[B5] = 000 1000 use p, redefine z
OUT[B3] = 000 1000 = IN[B5]
IN[B3] = 100 1000 use x
OUT[B4] = 100 1001 = IN[B5] ∪ IN[B2]
IN[B4] = 010 1001 use y, redefine x、q
OUT[B2] = 110 1001 = IN[B3] ∪ IN[B4]
IN[B2] = 100 1001 use k, redefine m、y
OUT[B1] = 100 1001 = IN[B2]
IN[B1] = 001 1101 use p、q、z, redefine x、y
The final result is the green part which means variable is live along the path(can be used in future) from this program point
Available Expressions Analysis
An expression x op y
is available at program point p if
all paths from the entry to p must pass through the evaluation of
x op y
after the last evaluation of
x op y
, there is no redefinition of x or y
available expressions can be used for detecting global common subexpressions.
Abstraction
All the expressions in a program can be represented by bit vectors.
Safe-Approximation
Transfer Function
Control Flow
All paths from entry to point p must pass through the evaluation of x op y
,so we use must analysis
for safety of the analysis, it may report an expression as unavailable even if it is truly available.(用于编译器优化,不能优化错误的内容, under-approximation)
Algorithm
Iteration 0 —— init
Iteration 1
OUT[Entry] = 00000
IN[B1] = 00000
OUT[B1] = 10000 generate E1
IN[B2] = 10000 OUT[B1] ∩ OUT[B4]
OUT[B2] = 01010 generate E2、E4,kill E1
IN[B3] = 01010 = OUT[B2]
OUT[B3] = 00011 generate E5,kill E2
IN[B4] = 01010 = OUT[B2]
OUT[B4] = 01110 generate E3、E4
IN[B5] = 00010 = OUT[B3] ∩ OUT[B4]
OUT[B5] = 01010 generate E4、E2,kill E3
Iteration 2
OUT[Entry] = 00000
IN[B1] = 00000
OUT[B1] = 10000 generate E1
IN[B2] = 00000 OUT[B1] ∩ OUT[B4]
OUT[B2] = 01010 generate E2、E4,kill E1
IN[B3] = 01010 = OUT[B2]
OUT[B3] = 00011 generate E5,kill E2
IN[B4] = 01010 = OUT[B2]
OUT[B4] = 01110 generate E3、E4
IN[B5] = 00010 = OUT[B3] ∩ OUT[B4]
OUT[B5] = 01010 generate E4、E2,kill E3
Summary
Last updated