静态程序分析¶
- StartAt:
- 2022-7-16 
南京大学《软件分析》课程。
1. Course Introduction¶
- Sound and Complete¶
- Sound by default
- 在静态分析中,几乎所有的分析方法都追求 Sound 而非 Complete。 
   
- 静态分析的思维
- Symbol evaluation,避免使用编程语言本身的思维。 TODO 
- 静态分析的流程
- 抽象 
- 近似 - Transfer function <= 问题 + 语义 
- 控制流 <= flow merging 
 
 
- Top¶
- \(\top\),用来表示一个符号是未知的(Unknown) 
- Bottom¶
- \(\bot\), 用来表示一个符号是未定义的(Undefined ) 
2. Intermediate Representation¶
Intermediate Representation (IR) 是介于高级编程语言和字节码之间的一种中间表现形式,通常是 语言无关 + 机器无关的。用于编译优化和程序分析。
AST 和 IR 的异同:
 
常见的 IR 形式:
常见的 IR 语言:
- C - 天然的汇编抽象语言 
- LLVM IR 
- Soot (3AC) 
- Jimple (Typed 3AC) 
- Three-Addrress Code¶
- 
  
- Static Single-Assignment¶
  
- Control Flow Graph¶
  
- Basic Block¶
- Entry of BB must be dest of a JUMP instr. Exit of BB must be a JUMP instr.   
Some Soot Stuffs¶
$x: temp var of soot.
Java invoke type:
- special:
- constructor, super class mentod, private method 
- virtual:
- instance method call (virtual dispatch) 
- interface:
- can not optmizetion, check interface implementation 
- static:
- call static method 
- dynamic:
- for lambda 
3. Data Flow Analysis I¶
- Data Flow Analysis¶
- How Data is Flow on - #️⃣CFG?- Data:
- is application-specific data, an abstraction (such as - #️⃣Top,- #️⃣Bottom)
- Flow:
- through the nodes( - #️⃣Basic Blocks, statements) and edges (control flows) of CFG (program). Safe-approximation?
 - Different data flow analysis application has: - different data abstraction 
- different flow safe-approximation strategies (策略) 
- different transfer functions and control-flow handlings 
   
- May Analysis::¶
- Output information may be true (over-approximation, sound?). - Most static analyses is May Analysis. - 一般情况下 state 初始化为空集。 
- Must Analysis::¶
- Output information must be must (under-approximation, complete?). - 一般情况下 state 初始化为全集。 
Over- and under-approximation are both safty of analysis.
- Input/Ouput States
- The set of possible data flow values is the domain for this application.   - Target of - #️⃣Data Flow Analysis.¶
- Forward/Backward Analysis
- Nothing special. 
Application¶
Reaching Definitions Analysis¶
- Intra-producedural CFG 
- No alias (alias: Pointer Analysis) 
Dummy definition.
Bit Vectors
不动点计算
AB => A but exclude B
模版
停机:IN[S] 不变时,OUT[S] 不变
OUT[S] never shrinks
 
Iteration Cycles¶
4. Data Flow Analysis II¶
Live Variable Analysis¶
- Abstraction
- Use a bit vector, one bit for one variable assignment ( - v = x). 1 for alive and 0 for dead.
- Transfer function: (no formal)
- Do backward iteratiom, find variable use statements in every basic block, once found, set corrspoind bit according its value. 
- Merge: - 1 + ? -> 1(It is a may analysis)
- How to determine value? 
 
- Initial State
- All variables are dead, bit vector: - 000...00
