Friday, August 23, 2013

Paper Reading: Enabling Sophisticated Analyses of x86 Binaries with RevGen

This paper was presented at HotDep'11.
The paper aims to make binary analysis easier with RevGen, a tool that translates traditional x86 binaries into LLVM IR instead of into the ad-hoc IRs used by different analysis systems.

There are many systems for binary program analysis, for example BitBlaze and CodeSurfer. However, each uses its own IR, which is hard to port to other systems and has not been formally verified.

LLVM, a popular compiler framework, is already widely used by analysis tools. For example, KLEE and Parfait are both LLVM-based.

LLVM

LLVM is a compiler framework with a compact, RISC-like instruction set. It supports an unlimited number of registers and has only around 30 opcodes, which makes analysis easier. In particular, only load and store instructions can access memory.
LLVM natively uses a Static Single Assignment (SSA) code representation, so data-flow and def-use graphs can be computed easily. Standard transformations such as function inlining, constant propagation, and dead store removal can also be applied.
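The Python sketch below makes the SSA point concrete. It uses a made-up tuple-based IR, not LLVM's real data structures, in which every name is assigned exactly once. Because of that property, def-use chains come out of a single pass, and constant propagation needs only a map from names to known values. The program `prog` and its undefined input `%x` are hypothetical.

```python
from collections import defaultdict

# Hypothetical SSA program: (dest, opcode, operands). Each dest is assigned once.
# "%x" is an input that is never defined, so its value is unknown.
prog = [
    ("%a", "const", (2,)),
    ("%b", "const", (3,)),
    ("%c", "add", ("%a", "%b")),
    ("%d", "mul", ("%c", "%x")),
]

def def_use_chains(program):
    """Map each defined name to the indices of the instructions that use it."""
    defined = {dest for dest, _, _ in program}
    uses = defaultdict(list)
    for i, (_, _, operands) in enumerate(program):
        for op in operands:
            if isinstance(op, str) and op in defined:
                uses[op].append(i)
    return dict(uses)

def propagate_constants(program):
    """Turn instructions whose operands are all known constants into constants."""
    known = {}  # name -> constant value; safe because SSA names are never reassigned
    result = []
    for dest, opcode, operands in program:
        vals = tuple(known.get(op, op) if isinstance(op, str) else op
                     for op in operands)
        value = None
        if all(isinstance(v, int) for v in vals):
            if opcode == "const":
                value = vals[0]
            elif opcode == "add":
                value = vals[0] + vals[1]
            elif opcode == "mul":
                value = vals[0] * vals[1]
        if value is None:
            result.append((dest, opcode, vals))  # keep, with known operands substituted
        else:
            known[dest] = value
            result.append((dest, "const", (value,)))
    return result

print(def_use_chains(prog))           # {'%a': [2], '%b': [2], '%c': [3]}
print(propagate_constants(prog)[-1])  # ('%d', 'mul', (5, '%x'))
```

`%c` folds to 5, but `%d` stays a multiplication because `%x` is unknown. Only its known operand is substituted.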
To translate binary code into LLVM, the following issues must be addressed:
  • pointer arithmetic
  • accommodating different stack layouts
  • transforming accesses to various code and data segments
  • handling indirect calls
  • producing semantically equivalent LLVM programs
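Indirect calls are hard to resolve statically because the target of something like `call eax` is only known at run time. One common approach in static binary translators is to route every indirect call through a dispatcher that maps original x86 entry addresses to translated functions. This is an assumption for illustration, not necessarily RevGen's exact mechanism. The addresses and function names below are hypothetical.

```python
# Hypothetical translated functions, keyed by their original x86 entry address.
def func_401000(x):
    return x + 1

def func_401020(x):
    return x * 2

TRANSLATED = {
    0x401000: func_401000,
    0x401020: func_401020,
}

class UnknownTargetError(Exception):
    """Raised when an indirect call jumps to code that was never translated."""

def dispatch_indirect_call(target_addr, *args):
    """Stand-in for a translated `call eax`: look up the target at run time."""
    func = TRANSLATED.get(target_addr)
    if func is None:
        # A real system might translate the target on demand or abort here.
        raise UnknownTargetError(hex(target_addr))
    return func(*args)

print(dispatch_indirect_call(0x401020, 5))  # 10
```

Direct calls can still be translated into plain LLVM calls. Only indirect calls pay for the table lookup.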

Challenges


  1. Extracting binary code's semantics
  2. Inferring type information

RevGen




Translating Blocks of Binary Code

1. Disassemble the binary code into micro-operations, which are later translated into LLVM instructions.
2. Map the micro-operations one-to-one onto LLVM instructions.
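The Python sketch below walks through these two steps for a single instruction, `add dword [ebx], eax`. The micro-operation names (`ld32`, `add32`, `st32`) are hypothetical. To keep the output short, registers are treated as plain LLVM values; a real translator would access a CPU-state structure and insert proper pointer casts. The sketch also shows why the LLVM side is easy to analyze: the x86 memory operand becomes an explicit load and store.

```python
def lower_add_mem_reg(mem_base, src_reg):
    """Split `add dword [mem_base], src_reg` into hypothetical micro-ops."""
    return [
        ("ld32", "t0", mem_base),        # t0 = 32-bit load from [mem_base]
        ("add32", "t1", "t0", src_reg),  # t1 = t0 + src_reg
        ("st32", "t1", mem_base),        # 32-bit store of t1 to [mem_base]
    ]

def to_llvm(micro_ops):
    """Emit exactly one LLVM-IR-like line per micro-op (1:1 mapping).

    Simplification: register names are used directly as pointers, with no
    inttoptr casts, so the output is illustrative rather than valid IR.
    """
    lines = []
    for op in micro_ops:
        kind = op[0]
        if kind == "ld32":
            _, dst, addr = op
            lines.append(f"%{dst} = load i32, i32* %{addr}")
        elif kind == "add32":
            _, dst, a, b = op
            lines.append(f"%{dst} = add i32 %{a}, %{b}")
        elif kind == "st32":
            _, val, addr = op
            lines.append(f"store i32 %{val}, i32* %{addr}")
        else:
            raise ValueError(f"unsupported micro-op: {kind}")
    return lines

for line in to_llvm(lower_add_mem_reg("ebx", "eax")):
    print(line)
# %t0 = load i32, i32* %ebx
# %t1 = add i32 %t0, %eax
# store i32 %t1, i32* %ebx
```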

Reconstructing the Control Flow Graph (CFG)

1. Each translated code block becomes an LLVM basic block, and the blocks are grouped into functions.
2. Functions are connected to one another through call instructions.
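A simplified version of CFG reconstruction can be sketched in Python. The sketch assumes a hypothetical, already-disassembled instruction list whose entries are `(address, kind, target)`, and it assumes every branch or call target is a known instruction address. A block starts at each leader: the entry point, a branch or call target, or an instruction following a control transfer. A function is the set of blocks reachable from its entry without following calls, and the call edges form the call graph. Real binaries add complications that this sketch ignores, such as indirect jumps and data mixed with code.

```python
# Hypothetical disassembly: (address, kind, target); kind is op/jmp/jcc/call/ret.
# Assumes every jmp/jcc/call target is the address of a listed instruction.
insns = [
    (0x10, "op", None),
    (0x11, "call", 0x20),
    (0x12, "jcc", 0x14),
    (0x13, "op", None),
    (0x14, "ret", None),
    (0x20, "op", None),   # callee starts here
    (0x21, "ret", None),
]

def build_blocks(insns):
    addrs = [a for a, _, _ in insns]
    next_addr = {a: (addrs[i + 1] if i + 1 < len(addrs) else None)
                 for i, a in enumerate(addrs)}

    # Leaders: entry point, branch/call targets, instructions after a transfer.
    leaders = {addrs[0]}
    for addr, kind, target in insns:
        if kind in ("jmp", "jcc", "call"):
            leaders.add(target)
        if kind in ("jmp", "jcc", "call", "ret") and next_addr[addr] is not None:
            leaders.add(next_addr[addr])

    # Each leader starts a basic block that runs until the next leader.
    blocks, current = {}, None
    for insn in insns:
        if insn[0] in leaders:
            current = insn[0]
            blocks[current] = []
        blocks[current].append(insn)

    # Intra-procedural successors; a call falls through to its return site.
    succ = {}
    for start, body in blocks.items():
        last, kind, target = body[-1]
        fall = [next_addr[last]] if next_addr[last] is not None else []
        if kind == "jmp":
            succ[start] = [target]
        elif kind == "jcc":
            succ[start] = [target] + fall
        elif kind == "ret":
            succ[start] = []
        else:  # "op" or "call"
            succ[start] = fall
    return blocks, succ

def group_functions(insns, blocks, succ):
    """A function = blocks reachable from an entry without following calls."""
    call_targets = sorted({t for _, k, t in insns if k == "call"})
    entries = list(dict.fromkeys([insns[0][0]] + call_targets))
    functions, call_graph = {}, {}
    for entry in entries:
        seen, stack = set(), [entry]
        while stack:
            b = stack.pop()
            if b not in seen:
                seen.add(b)
                stack.extend(succ[b])
        functions[entry] = sorted(seen)
        call_graph[entry] = sorted({t for b in seen
                                    for _, k, t in blocks[b] if k == "call"})
    return functions, call_graph

blocks, succ = build_blocks(insns)
functions, call_graph = group_functions(insns, blocks, succ)
print(functions)   # {16: [16, 18, 19, 20], 32: [32]}
print(call_graph)  # {16: [32], 32: []}
```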

Obtaining Analyzable LLVM Programs

In this step, the symbol table (which records library calls) and the relocation table are used to identify constant addresses. With that information, the translation can be completed.
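A short Python sketch shows the role of the relocation table. A 32-bit immediate such as `0x2010` could be either an integer or a pointer. If the loader patches that field, because it appears in the relocation table, it must be an address. If that address is an import slot listed in the symbol table, it corresponds to a library call. The tables and values here are hypothetical stand-ins for what a tool would parse from a PE or ELF file, and the binary is assumed to carry relocation information.

```python
# Hypothetical tables; a real tool would parse these from the PE/ELF file.
RELOCATIONS = {0x1005, 0x100C}                   # addresses of relocated 32-bit fields
IMPORTS = {0x3000: "printf", 0x3004: "malloc"}   # import slot address -> library symbol

def classify_immediate(field_addr, value):
    """Decide how a translator should treat a 32-bit immediate operand."""
    if field_addr not in RELOCATIONS:
        return ("int", value)              # not patched by the loader: plain constant
    if value in IMPORTS:
        return ("import", IMPORTS[value])  # points at an import slot: a library call
    return ("address", value)              # points into a code or data segment

print(classify_immediate(0x1001, 42))      # ('int', 42)
print(classify_immediate(0x1005, 0x3000))  # ('import', 'printf')
print(classify_immediate(0x100C, 0x2010))  # ('address', 8208)
```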
