Tuesday, August 14, 2012

S&P'12 : ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions

To deal with unpatched code clones at the scale of an OS distribution, Jiyong Jang, Abeer Agrawal, and David Brumley from Carnegie Mellon University published "ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions" at S&P'12. The paper proposes a system that finds unpatched code clones across an entire OS distribution.

Because an OS is so complex, there are always bugs somewhere in the system. Today, patching is the standard way to update an OS and fix the bugs that have been discovered. However, bugs also arise when the original buggy code is cloned into other subsystems. These clones carry the same flaw as the original bug but are hard to discover, so they can remain unpatched after the original is fixed.

Most previous work, such as MOSS and CCFinder, focuses on detecting all code clones in a system, which limits the amount of code it can handle. Many of these tools are also language dependent, which makes them unsuitable for an OS distribution, a multi-language environment.

In addition to finding more unpatched buggy code clones, the paper tries to answer the following questions.

  1. How much (potentially) vulnerable code can an attacker identify when a patch is released?
  2. How responsive is a new version of an OS to known security vulnerabilities?
  3. How many unpatched code clones persist?

Because of the sheer size of an OS distribution, the proposed ReDeBug system must scale well, be language agnostic, and keep the false detection rate low.

[Figure: ReDeBug workflow]

  The flowchart above describes the workflow of ReDeBug.

  • Pre-process the source
    1. Performs normalization and tokenization
    2. Moves an n-length window over the token stream. For each window, the resulting n-tokens are hashed into a Bloom filter
    3. Stores the Bloom filter for each source file in a raw data format
  • Check for unpatched code copies
    1. Extracts the original code snippet and the c tokens of context information from the pre-patch source
    2. Normalizes and tokenizes the extracted original buggy code snippets
    3. Hashes the n-token window into a set of hashes
    4. Bloom filter set membership test
  • Post-process the reported clones
    1. Performs an exact-matching test
    2. Excludes dead code
    3. Reports only non-dead code to the user
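
The pre-processing and checking steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: it assumes that each normalized source line counts as one token, that normalization means lowercasing and removing whitespace, and that n = 4. The names `normalize`, `tokenize`, `windows`, `BloomFilter`, `build_filter`, and `may_contain_clone` are hypothetical, and the context tokens and dead-code check are left out.

```python
import hashlib
import re


def normalize(line: str) -> str:
    """Assumed normalization: remove all whitespace and lowercase the line."""
    return re.sub(r"\s+", "", line).lower()


def tokenize(source: str) -> list:
    """Assumption: every non-empty normalized line is one token."""
    tokens = (normalize(line) for line in source.splitlines())
    return [t for t in tokens if t]


def windows(tokens: list, n: int) -> list:
    """Slide an n-length window over the token stream."""
    return ["\n".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class BloomFilter:
    """A small Bloom filter backed by a bytearray (illustrative sizes)."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_filter(source: str, n: int = 4) -> BloomFilter:
    """Pre-process one source file: hash every n-token window into a filter."""
    bloom = BloomFilter()
    for window in windows(tokenize(source), n):
        bloom.add(window)
    return bloom


def may_contain_clone(bloom: BloomFilter, buggy_snippet: str, n: int = 4) -> bool:
    """Report a candidate clone if every window of the buggy snippet is present."""
    snippet_windows = windows(tokenize(buggy_snippet), n)
    return bool(snippet_windows) and all(w in bloom for w in snippet_windows)
```

A Bloom filter can report false positives but never false negatives, so a hit from `may_contain_clone` is only a candidate. This is why the post-processing step runs an exact-matching test before anything is reported.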
