Java
  • About This Book
  • 🍖Prerequisites
    • 反射
      • 反射基本使用
      • 高版本JDK反射绕过
      • 反射调用命令执行
      • 反射构造HashMap
      • 方法句柄
    • 类加载
      • 动态加载字节码
      • 双亲委派模型
      • BCEL
      • SPI
    • RMI & JNDI
      • RPC Intro
      • RMI
      • JEP 290
      • JNDI
    • Misc
      • Unsafe
      • 代理模式
      • JMX
      • JDWP
      • JPDA
      • JVMTI
      • JNA
      • Java Security Manager
  • 👻Serial Journey
    • URLDNS
    • SerialVersionUID
    • Commons Collection 🥏
      • CC1-TransformedMap
      • CC1-LazyMap
      • CC6
      • CC3
      • CC2
    • FastJson 🪁
      • FastJson-Basic Usage
      • FastJson-TemplatesImpl
      • FastJson-JdbcRowSetImpl
      • FastJson-BasicDataSource
      • FastJson-ByPass
      • FastJson与原生反序列化(一)
      • FastJson与原生反序列化(二)
      • Jackson的原生反序列化利用
    • Other Components
      • SnakeYaml
      • C3P0
      • AspectJWeaver
      • Rome
      • Spring
      • Hessian
      • Hessian_Only_JDK
      • Kryo
      • Dubbo
  • 🌵RASP
    • JavaAgent
    • JVM
    • ByteCode
    • JNI
    • ASM 🪡
      • ASM Intro
      • Class Generation
      • Class Transformation
    • Rasp防御命令执行
    • OpenRASP
  • 🐎Memory Shell
    • Tomcat-Architecture
    • Servlet API
      • Listener
      • Filter
      • Servlet
    • Tomcat-Middlewares
      • Tomcat-Valve
      • Tomcat-Executor
      • Tomcat-Upgrade
    • Agent MemShell
    • WebSocket
    • 内存马查杀
    • IDEA本地调试Tomcat
  • ✂️JDBC Attack
    • MySQL JDBC Attack
    • H2 JDBC Attack
  • 🎨Templates
    • FreeMarker
    • Thymeleaf
    • Enjoy
  • 🎏MessageQueue
    • ActiveMQ CNVD-2023-69477
    • AMQP CVE-2023-34050
    • Spring-Kafka CVE-2023-34040
    • RocketMQ CVE-2023-33246
  • 🛡️Shiro
    • Shiro Intro
    • Request URI ByPass
    • Context Path ByPass
    • Remember Me反序列化 CC-Shiro
    • CB1与无CC依赖的反序列化链
  • 🍺Others
    • Deserialization Twice
    • A New Blazer 4 getter RCE
    • Apache Commons Jxpath
    • El Attack
    • Spel Attack
    • C3P0原生反序列化的JNDI打法
    • Log4j
    • Echo Tech
      • SpringBoot Under Tomcat
    • CTF 🚩
      • 长城杯-b4bycoffee (ROME反序列化)
      • MTCTF2022(CB+Shiro绕过)
      • CISCN 2023 西南赛区半决赛 (Hessian原生JDK+Kryo反序列化)
      • CISCN 2023 初赛 (高版本Commons Collections下其他依赖的利用)
      • CISCN 2021 总决赛 ezj4va (AspectJWeaver写字节码文件到classpath)
      • D^3CTF2023 (新的getter+高版本JNDI不出网+Hessian异常toString)
      • WMCTF2023(CC链花式玩法+盲读文件)
      • 第六届安洵杯网络安全挑战赛(CB PriorityQueue替代+Postgresql JDBC Attack+FreeMarker)
  • 🔍Code Inspector
    • CodeQL 🧶
      • Tutorial
        • Intro
        • Module
        • Predicate
        • Query
        • Type
      • CodeQL 4 Java
        • Basics
        • DFA
        • Example
    • SootUp ✨
      • Intro
      • Jimple
      • DFA
      • CG
    • Tabby 🔦
      • install
    • Theory
      • Static Analysis
        • Intro
        • IR & CFG
        • DFA
        • DFA-Foundation
        • Interprocedural Analysis
        • Pointer Analysis
        • Pointer Analysis Foundation
        • PTA-Context Sensitivity
        • Taint Anlysis
        • Datalog
Powered by GitBook
On this page
  • Overview of DFA
  • Preliminaries of DFA
  • Input and Output States
  • Transfer Function
  • Control Flow
  • Reaching Definitions analysis
  • Abstraction
  • Safe-Approximation
  • Live Variables Analysis
  • Abstraction
  • Safe-Approximation
  • Available Expressions Analysis
  • Abstraction
  • Safe-Approximation
  • Summary

Was this helpful?

  1. 🔍Code Inspector
  2. Theory
  3. Static Analysis

DFA

Overview of DFA

DFA:Data Flow Analysis

How application-specific data flows through the nodes and edges of CFG(source code->IR->CFG)

most static analyzer tends to may analysis

  • may analysis

    • outputs information that may be true (over-approximation)

  • must analysis

    • outputs information that must be true (under-approximation)

different data-flow analysis applications have

different data abstraction and

different flow safe-approximation strategies and

different transfer functions and control-flow handlings

e.g., determine the sgin of a variable

data abstraction:+、-、0、unknown、undefined

transfer function:+ op + = + , + op - = -

control-flow handlings:union the signs at merges

Preliminaries of DFA

Input and Output States

  • Each execution of an IR statement transforms an input state to a new output state

  • The input/output state is associated with the program point before/after the statement

In each data-flow analysis application, we associate with every program point a data-flow value that represents an abstraction of the set of all possible program states that can be observed for that point.

DFA is to find a solution to a set of safe-approximation-directed constraints on the IN[s]’s and OUT[s]’s for all statements

  • constraints based on semantics of statements(transfer function)

  • constraints based on the flows of control

Transfer Function

Control Flow

Reaching Definitions analysis

A definition d at program point p reaches a point q if there is path from p to q such that d is not “killed” along that path

  • A definition of a variable v is a statement that assigns a value to v

  • how to be “killed”: new definition of v

Reaching definitions can be used to detect possible undefined variables.

e.g., introduce a dummy definition for each variable v at the entry of CFG, and if the dummy definition of v reaches a point p where v is used, then v may be used before definition (as undefined reaches v)

So reaching definitions analysis is may analysis.

(不放过动态运行时所有可能的路径)

Abstraction

The definitions of all the variables in a program can be represented by bit vectors.

Safe-Approximation

Transfer Function

genB:definitions generated in this BB.

killB: new definition D in this BB kills other definitions

The definitions in genB can reach the program point at OUT[B] and those in killB can't reach the program point at OUT[B]

obviously, same variables in definition bit vectors can not exist at the same time.

Control Flow

reaching definition is a may analysis.So here use union(∪) as the meet operator.

Algorithm

For Example:

CFG above can be transfered to pseudo-code below.

D1
D2
do {
	D3
	D4
	if(condition){
		D5
		D6
	} else {
		D7
		break
	}
} while(condition)
D8
  • Iteration 0 —— init

  • Iteration 1

IN[B1] = 0000 0000 = OUT[Entry]

OUT[B1] = 1100 0000 generate D1、D2

IN[B2] = 1100 0000 = OUT[B1] ∪ OUT[B4]

OUT[B2] = 1011 0000 generate D3、D4,kill D2

IN[B3] = 1011 0000 = OUT[B2]

OUT[B3] = 0011 0010 generate D7,kill D1

IN[B4] = 1011 0000 = OUT[B2]

OUT[B4] = 0011 1100 generate D5、D6,kill D1

IN[B5] = 0011 1110 = OUT[B4] ∪ OUT[B3]

OUT[B5] = 0011 1011 generate D8, kill D6

  • Iteration 2

IN[B1] = 0000 0000 = OUT[Entry]

OUT[B1] = 1100 0000 generate D1、D2

IN[B2] = 1111 1100 = OUT[B1] ∪ OUT[B4]

OUT[B2] = 1011 1100 generate D3、D4,kill D2

IN[B3] = 1011 1100 = OUT[B2]

OUT[B3] = 0011 0110 generate D7,kill D1、D5

IN[B4] = 1011 1100 = OUT[B2]

OUT[B4] = 0011 1100 generate D5、D6,kill D1、D7、D8

IN[B5] = 0011 1110 = OUT[B4] ∪ OUT[B3]

OUT[B5] = 0011 1011 generate D8, kill D6

  • Iteration 3

IN[B1] = 0000 0000 = OUT[Entry]

OUT[B1] = 1100 0000 generate D1、D2

IN[B2] = 1111 1100 = OUT[B1] ∪ OUT[B4]

OUT[B2] = 1011 1100 generate D3、D4,kill D2

IN[B3] = 1011 1100 = OUT[B2]

OUT[B3] = 0011 0110 generate D7,kill D1、D5

IN[B4] = 1011 1100 = OUT[B2]

OUT[B4] = 0011 1100 generate D5、D6,kill D1、D7、D8

IN[B5] = 0011 1110 = OUT[B4] ∪ OUT[B3]

OUT[B5] = 0011 1011 generate D8, kill D6

The final result is the green part which means definitions can reach this point in the program.

Why this iterative algorithm can finally stop?

INs will not change if OUTs do not change

OUTs will not change if INs do not change

Reach a fixed point. Also related with monotonicity

Live Variables Analysis

Live variables analysis tells whether the value of variable v at program point p could be used along some path in CFG starting at p.If so, v is live at p; otherwise, v is dead at p.

information of live variables can be used for register allocations.

e.g., all registers are full and we need to use one, then we should favor using a register with a dead value.

Abstraction

All variables in a program can be represented by bit vectors.

Safe-Approximation

redefinition will break the path. forwards analysis requires record of previous state, so we use backwards analysis

One live path will be ok, so we use may analysis

Control Flow

Transfer Function

useB:variables used before redefined in B

defB:variables redefined in B

OUT[B]:variables live coming out of B

note: we focus on the live state of variables at some program point.

Algorithm

  • Iteration 0 —— init

  • Iteration 1

IN[Exit] = 000 0000

OUT[B5] = 000 0000 = IN[Exit]

IN[B5] = 000 1000 use p, redefine z

OUT[B3] = 000 1000 = IN[B5]

IN[B3] = 100 1000 use x

OUT[B4] = 000 1000 = IN[B5] ∪ IN[B2]

IN[B4] = 010 1000 use y, redefine x、q

OUT[B2] = 110 1000 = IN[B3] ∪ IN[B4]

IN[B2] = 100 1001 use k, redefine m、y

OUT[B1] = 100 1001 = IN[B2]

IN[B1] = 001 1101 use p、q、z, redefine x、y

  • Iteration 2

IN[Exit] = 000 0000

OUT[B5] = 000 0000 = IN[Exit]

IN[B5] = 000 1000 use p, redefine z

OUT[B3] = 000 1000 = IN[B5]

IN[B3] = 100 1000 use x

OUT[B4] = 100 1001 = IN[B5] ∪ IN[B2]

IN[B4] = 010 1001 use y, redefine x、q

OUT[B2] = 110 1001 = IN[B3] ∪ IN[B4]

IN[B2] = 100 1001 use k, redefine m、y

OUT[B1] = 100 1001 = IN[B2]

IN[B1] = 001 1101 use p、q、z, redefine x、y

  • Iteration 3

IN[Exit] = 000 0000

OUT[B5] = 000 0000 = IN[Exit]

IN[B5] = 000 1000 use p, redefine z

OUT[B3] = 000 1000 = IN[B5]

IN[B3] = 100 1000 use x

OUT[B4] = 100 1001 = IN[B5] ∪ IN[B2]

IN[B4] = 010 1001 use y, redefine x、q

OUT[B2] = 110 1001 = IN[B3] ∪ IN[B4]

IN[B2] = 100 1001 use k, redefine m、y

OUT[B1] = 100 1001 = IN[B2]

IN[B1] = 001 1101 use p、q、z, redefine x、y

The final result is the green part which means variable is live along the path(can be used in future) from this program point

Available Expressions Analysis

An expression x op y is available at program point p if

  1. all paths from the entry to p must pass through the evaluation of x op y

  2. after the last evaluation of x op y, there is no redefinition of x or y

available expressions can be used for detecting global common subexpressions.

Abstraction

All the expressions in a program can be represented by bit vectors.

Safe-Approximation

Transfer Function

Control Flow

All paths from entry to point p must pass through the evaluation of x op y,so we use must analysis

for safety of the analysis, it may report an expression as unavailable even if it is truly available.(用于编译器优化,不能优化错误的内容, under-approximation)

Algorithm

  • Iteration 0 —— init

  • Iteration 1

OUT[Entry] = 00000

IN[B1] = 00000

OUT[B1] = 10000 generate E1

IN[B2] = 10000 OUT[B1] ∩ OUT[B4]

OUT[B2] = 01010 generate E2、E4,kill E1

IN[B3] = 01010 = OUT[B2]

OUT[B3] = 00011 generate E5,kill E2

IN[B4] = 01010 = OUT[B2]

OUT[B4] = 01110 generate E3、E4

IN[B5] = 00010 = OUT[B3] ∩ OUT[B4]

OUT[B5] = 01010 generate E4、E2,kill E3

  • Iteration 2

OUT[Entry] = 00000

IN[B1] = 00000

OUT[B1] = 10000 generate E1

IN[B2] = 00000 OUT[B1] ∩ OUT[B4]

OUT[B2] = 01010 generate E2、E4,kill E1

IN[B3] = 01010 = OUT[B2]

OUT[B3] = 00011 generate E5,kill E2

IN[B4] = 01010 = OUT[B2]

OUT[B4] = 01110 generate E3、E4

IN[B5] = 00010 = OUT[B3] ∩ OUT[B4]

OUT[B5] = 01010 generate E4、E2,kill E3

Summary

PreviousIR & CFGNextDFA-Foundation

Last updated 7 months ago

Was this helpful?

image-20240312213047748
image-20240312213409543
image-20240404104015211
image-20240404105015142
image-20240404110234375
image-20240404105623734
image-20240404110250025
image-20240404110529658
image-20240404111804621
image-20240404111847156
image-20240404114642223
image-20240404115359792
image-20240404115705891
image-20240404122526868
image-20240404122932842
image-20240404125456185
image-20240404125631901
image-20240404125640830
image-20240404130057765
image-20240404131040928
image-20240404132406602
image-20240404132806730
image-20240404133110495
image-20240404135114804
image-20240404140102302
image-20240404140531596
image-20240404141422819
image-20240404142143318
image-20240404142943287
image-20240404143148713
image-20240404143825127
image-20240404102713528
image-20240404121117629