2012年12月23日 星期日

Malware Detection System Based on SplitScreen

This is a simple project implemented by python. With the power of SplitScreen approach proposed in usenix security, this system can detect if a file is belong to black list. To handle explosion of malware signature, match the signature one by one is not enough and more efficient approach to identify known malware. SplitScreen employs Bloom Filter Algorithms for signature management, distribution and  detection.

Bloom Filter

Data Structure and Algorithms

Bloom Filter composed by  an n bits bit-vector which initialized all bits to 0 and k different hash functions. Bloom Filter support two operation, add and query. Add operation is used to add new element in to Bloom Filter. Add operation first compute k hash functions of the element and then set correspond bit 1.
In the same way, query operation compute k hash functions of the element and then check if correspond bit is set. If all the correspond bits is set to 1, then we can claim the membership of this element. If one of the correspond bits is clear, this element is not belong to the set.
  

Time and Space Complexity

The time to add and query both have time complexity O(k) , where k is the constant we can set. So we can estimate the two operations complete in constant time. And Bloom Filter is also space-saving because it take only n bits to store Bloom Filter.

Split Screen

Architecture

Split Screen has three components : Malware Database, Server and Client Agent.
Fig 1 System Architecture
Malware Signature Database is used to store all malware sample, in my case the samples is come from vxheaven.  And Server is aim to maintain BloomFilter structure and update new Bloom Filter to Client Agent.
        

System Flow

In my system , server will check malware database for newly update malware and add new malware to Bloom Filter. Then send a command to client for updating BloomFilter.

Implementation

In my implementation, source code structure is as below:
[server]
FFBFServer.py
[client]
FFBFCli.py
client.py
[common]
BF_Operation.py
FeedForwardBloomFilter.py
DirectoryList.pyHash.py

First we look into Hash.py, Although Bloom Filter need k different hash , in practice we can use linear combination of two hash functions instead. So Hash.py only compute two hash value about a file. 
With Hash.py's help, FeedForwardBloomFilter.py implement add and query operation.
class FFBloomFilter(object):
    def add(self,fname):
        self.hashClass.hash(fname)
        val1 = int(self.hashClass.getHash(),16)
        val2 = int(self.hashClass.getHash2(),16)
        hashValue =val1%self.size
        for _ in range(self.hashNum):
            self.bf1[hashValue]+=1
            hashValue+=val2%self.size
            hashValue%=self.size
            
    def query(self,fname):
        self.hashClass.hash(fname)
        val1 = int(self.hashClass.getHash(),16)
        val2 = int(self.hashClass.getHash2(),16)
        hashValue =val1%self.size
        for _ in range(self.hashNum):
            if self.bf1[hashValue]==0 :
                return False
            hashValue+=val2%self.size
            hashValue%=self.size
            
        hashValue =val1%self.size
        for _ in range(self.hashNum):
            self.bf2[hashValue]+=1
            hashValue+=val2%self.size
            hashValue%=self.size
        return True

BF_Operation.py implement algorithms in SplitScreen as Fig 2
Fig 2 SplitScreen Architecture
class BF_Operation(object):
    def __init__(self):
        self.hashNum = 10
        self.bitSize = 1024*1024
    
    def FFBF_INIT(self,path,outBF):
        ffbf =  FeedForwardBloomFilter.FFBloomFilter(self.bitSize,self.hashNum)
        Dir = DirectoryList.DirList()
        Dir.ListAdd(path,ffbf)
        ffbf.serialize(outBF)
    
    def FFBF_SCAN(self,path,inBF,outBF):
        ffbf =  FeedForwardBloomFilter.FFBloomFilter(self.bitSize,self.hashNum)
        ffbf.parsing(inBF)
        Dir = DirectoryList.DirList()
        ffbf.clear()
        matchdata = []
    
        Dir.ListCheck(path,ffbf,matchdata)
        ffbf.serialize(outBF)
        return matchdata
Where ListAdd() and ListCheck() treat a path of directory as input and add(check) all files in directory. In the paper, there are two more algorithms, FFBF_HIT() and FFBF_verify(), which are not needed while we only need to test membership of files.

沒有留言:

張貼留言