静态程序分析#

StartAt

2022-7-16

南京大学《软件分析》课程。

1. Course Introduction#

莱斯定理#

📖 递归可枚举语言 的所有非平凡(nontrival)性质都是 不可判定 的。 [rice0]

无法断言一个图灵机是否具有一个可以被图灵机计算的功能函数。[rice1]

../_images/2022-07-16_181851.png
rice0

📖 莱斯定理

rice1

https://zhuanlan.zhihu.com/p/339648002

Sound and Complete#
Sound by default

在静态分析中,几乎所有的分析方法都追求 Sound 而非 Complete。

../_images/2022-07-16_182054.png
静态分析的思维

Symbol evaluation,避免使用编程语言本身的思维。 TODO

静态分析的流程
  • 抽象

  • 近似

    • Transfer function <= 问题 + 语义

    • 控制流 <= flow merging

Top#

\(\top\),用来表示一个符号是未知的(Unknown)

Bottom#

\(\bot\), 用来表示一个符号是未定义的(Undefined )

2. Intermediate Representation#

Intermediate Representation (IR) 是介于高级编程语言和字节码之间的一种中间表现形式,通常是 语言无关 + 机器无关的。用于编译优化和程序分析。

AST 和 IR 的异同:

../_images/18492e22-a308-4a83-9c1a-6819d8025917.png

常见的 IR 形式:

常见的 IR 语言:

  • C - 天然的汇编抽象语言

  • LLVM IR

  • Soot (3AC)

  • Jimple (Typed 3AC)

Three-Addrress Code#

📖 Three-address_code

../_images/6e1456b0-5734-4083-8617-fb8676028fe3.png
Static Single-Assignment#
../_images/00cce13d-de57-41ac-b41b-84455cc2d6d7.png
Control Flow Graph#
../_images/0bc9d1e2-3ea7-4c5f-b1ea-6f948c8684fb.png
Basic Block#

Entry of BB must be dest of a JUMP instr. Exit of BB must be a JUMP instr.

../_images/dd34afeb-87e1-4cea-b8d5-d903eb179cb8.png

Some Soot Stuffs#

$x: temp var of soot.

Java invoke type:

special

constructor, super class mentod, private method

virtual

instance method call (virtual dispatch)

interface

can not optmizetion, check interface implementation

static

call static method

dynamic

for lambda

3. Data Flow Analysis I#

Data Flow Analysis#

How Data is Flow on #️⃣CFG?

Data

is application-specific data, an abstraction (such as #️⃣Top, #️⃣Bottom)

Flow

through the nodes(#️⃣Basic Blocks, statements) and edges (control flows) of CFG (program). Safe-approximation?

Different data flow analysis application has:

  • different data abstraction

  • different flow safe-approximation strategies (策略)

  • different transfer functions and control-flow handlings

../_images/screenshot-20220906-202633.png
May Analysis::#

Output information may be true (over-approximation, sound?).

Most static analyses is May Analysis.

一般情况下 state 初始化为空集。

Must Analysis::#

Output information must be must (under-approximation, complete?).

一般情况下 state 初始化为全集。

Over- and under-approximation are both safty of analysis.

Input/Ouput States

The set of possible data flow values is the domain for this application.

../_images/screenshot-20220906-203643.png

Target of #️⃣Data Flow Analysis.#

Forward/Backward Analysis

Nothing special.

Application#

Reaching Definitions Analysis#

  1. Intra-producedural CFG

  2. No alias (alias: Pointer Analysis)

Dummy definition.

Bit Vectors

不动点计算

AB => A but exclude B

模版

停机:IN[S] 不变时,OUT[S] 不变

OUT[S] never shrinks

../_images/screenshot-20220906-212807.png

Iteration Cycles#

4. Data Flow Analysis II#

Abstraction

Use a bit vector, one bit for one variable assignment (v = x). 1 for alive and 0 for dead.

Transfer function: (no formal)
  • Do backward iteratiom, find variable use statements in every basic block, once found, set corrspoind bit according its value.

  • Merge: 1 + ? -> 1 (It is a may analysis)

  • How to determine value?

Initial State

All variables are dead, bit vector: 000...00

评论

如果你有任何意见,请在此评论。 如果你留下了电子邮箱,我可能会通过 回复你。