Pointer Analysis Foundation
Last updated
Last updated
Domain and Notations
Essentially, pointer analysis is to propagate points-to information among pointers (variables & fields)
It’s about solving a system of inclusion constraints for pointers
Key to implementation: when pt(x) is changed, propagate the changed part to the related pointers of x
Pointer flow graph of a program is a directed graph that expresses how objects flow among the pointers in the program.
Nodes: Pointer = V ⋃ (O × F) A node n represents a variable or a field of an abstract object
Edges: Pointer × Pointer An edge x->y means that the objects pointed by pointer x may flow to (and also be pointed to by) pointer y
As shown above, the building of PFG depends on some points-to information which need a PFG to be obtained from.
So PFG is dynamically updated during pointer analysis.
Why ∆ = pts - pt(n) ?
differential propagation: avoid redundant propagation of points-to information and infinite loop.
When a edge is added to PFG, we need to propagate the source’s points-to information(if not empty) to the target.
When a node’s pointer set is updated, we need to propagate the differential information to it’s successors.
An Example:
Init
WL: [<b, {o1}>、<c, {o3}>]
PFG: b->a、c->d
pt(b) = {}、pt(a) = {}、pt(c) = {}、pt(d) = {}
Round 1
process <b, {o1}>
pt(b) = {o1}
WL: [<c, {o3}>、<a, {o1}>]
PFG: b->a、c->d
Round 2
process <c, {o3}>
pt(b) = {o1}、pt(c) = {o3}
WL: [<a, {o1}>、<d, {o3}>]
PFG: b->a、c->d、a->o3.f、d->o3.f
Round 3
process <a, {o1}>
pt(b) = {o1}、pt(c) = {o3}、pt(a) = {o1}
WL: [<d, {o3}>、<o3.f, {o1}>]
PFG: b->a、c->d、a->o3.f、d->o3.f
Round 4
process <d, {o3}>
pt(b) = {o1}、pt(c) = {o3}、pt(a) = {o1}、pt(d) = {o3}
WL: [<o3.f, {o1}>、<o3.f, {o3}>]
PFG: b->a、c->d、a->o3.f、d->o3.f、o3.f->e
Round 5
process <o3.f, {o1}>
pt(b) = {o1}、pt(c) = {o3}、pt(a) = {o1}、pt(d) = {o3}
pt(o3.f) = {o1}
WL: [<o3.f, {o3}>、<e, {o1}>]
PFG: b->a、c->d、a->o3.f、d->o3.f、o3.f->e
Round 6
process <o3.f, {o3}>
pt(b) = {o1}、pt(c) = {o3}、pt(a) = {o1}、pt(d) = {o3}
pt(o3.f) = {o1, o3}
WL: [<e, {o1}>、<e, {o3}>]
PFG: b->a、c->d、a->o3.f、d->o3.f、o3.f->e
Round 7
process <e, {o1}>
pt(b) = {o1}、pt(c) = {o3}、pt(a) = {o1}、pt(d) = {o3}
pt(o3.f) = {o1, o3}、pt(e) = {o1}
WL: [<e, {o3}>]
PFG: b->a、c->d、a->o3.f、d->o3.f、o3.f->e
Round 8
process <e, {o3}>
pt(b) = {o1}、pt(c) = {o3}、pt(a) = {o1}、pt(d) = {o3}
pt(o3.f) = {o1, o3}、pt(e) = {o1, o3}
WL: []
PFG: b->a、c->d、a->o3.f、d->o3.f、o3.f->e
inter-procedural pointer analysis requires call graph
CHA
resolve call targets based on the class hierarchy of caller’s declared type
imprecise, introduce spurious call graph edges and points-to relations
Pointer analysis
resolve call targets based on pt(receiver object)
more precise than CHA, both for call graph and points-to relations
Why not add PFG edge x->m
thisReceiver object should only flow to this variable of the corresponding target method
PFG edge x->m
thiswould introduce spurious points-to relations for this variables
this
variable is special as it can determine the real class type containing the called method
Obviously pointer analysis and call graph construction are mutually dependent.(The former needs call graph to provide call & return information to achieve interprocedural analysis.The latter needs pointer analysis to provide points-to information to obtain more precise call relations.)
To be efficient, only reachable methods and statements are analyzed. Reachable methods will be gradually discovered during analysis.
When a variable’s points-to information changes, find if statements like load
、store
、call
exists.
When a called method is found, dispatch it first, propagate the points-to information to this
variable in the method, try to add the method to the reachable method(extract it’s New
and Assign
statements), add edge between arguments and parameters, return variable and return-site variable.
An Example:
Initialization
RM = {A.main()}
WL = [<a, {o3}>、<b, {o4}>]
CG = {}
Round 1
Process <a, {o3}>
pt(a) = {o3}
RM = {A.main()}
WL = [<b, {o4}>]
Round 2
Process <b, {o4}>
pt(a) = {o3}、pt(b) = {o4}
Dispatch(o4.class, foo) = B.foo(A)
CG = {5->B.foo(A)}
RM = {A.main(), B.foo(A)}
WL = [<B.foo/this, {o4}>、<r, {o11}>、<B.foo.y, {o3}>]
PFG = {a->y, r->c}
Round 3
Process <B.foo/this, {o4}>
pt(a) = {o3}、pt(b) = {o4}、pt(B.foo/this)= {o4}
CG = {5->B.foo(A)}
RM = {A.main(), B.foo(A)}
WL = [<r, {o11}>、<y, {o3}>]
PFG = {a->y, r->c}
Round 4
Process <r, {o11}>
pt(a) = {o3}、pt(b) = {o4}、pt(B.foo/this)= {o4}、pt(r) = {o11}
CG = {5->B.foo(A)}
RM = {A.main(), B.foo(A)}
WL = [<y, {o3}>, <c, {o11}>]
PFG = {a->y, r->c}
Round 5
Process <y, {o3}>
pt(a) = {o3}、pt(b) = {o4}、pt(B.foo/this)= {o4}、pt(r) = {o11}
pt(y) = {o3}
CG = {5->B.foo(A)}
RM = {A.main(), B.foo(A)}
WL = [<c, {o11}>]
PFG = {a->y, r->c}
Round 6
Process <c, {o11}>
pt(a) = {o3}、pt(b) = {o4}、pt(B.foo/this)= {o4}、pt(r) = {o11}
pt(B.foo.y) = {o3}、pt(c) = {o11}
CG = {b.foo(a)->B.foo(A)}
RM = {A.main(), B.foo(A)}
WL = []
PFG = {a->y, r->c}