This is a simple project implemented by python. With the power of SplitScreen approach proposed in usenix security, this system can detect if a file is belong to black list. To handle explosion of malware signature, match the signature one by one is not enough and more efficient approach to identify known malware. SplitScreen employs Bloom Filter Algorithms for signature management, distribution and detection.
Bloom Filter
Data Structure and Algorithms
Bloom Filter composed by an
n bits bit-vector which initialized all bits to 0 and
k different hash functions. Bloom Filter support two operation, add and query. Add operation is used to add new element in to Bloom Filter. Add operation first compute k hash functions of the element and then set correspond bit 1.
In the same way, query operation compute k hash functions of the element and then check if correspond bit is set. If all the correspond bits is set to 1, then we can claim the membership of this element. If one of the correspond bits is clear, this element is not belong to the set.
Time and Space Complexity
The time to add and query both have time complexity O(k) , where k is the constant we can set. So we can estimate the two operations complete in constant time. And Bloom Filter is also space-saving because it take only n bits to store Bloom Filter.
Split Screen
Architecture
Split Screen has three components : Malware Database, Server and Client Agent.
|
Fig 1 System Architecture |
Malware Signature Database is used to store all malware sample, in my case the samples is come from vxheaven. And Server is aim to maintain BloomFilter structure and update new Bloom Filter to Client Agent.
System Flow
In my system , server will check malware database for newly update malware and add new malware to Bloom Filter. Then send a command to client for updating BloomFilter.
Implementation
In my implementation, source code structure is as below:
[server]
FFBFServer.py
[client]
FFBFCli.py
client.py
[common]
BF_Operation.py
FeedForwardBloomFilter.py
DirectoryList.pyHash.py
First we look into Hash.py, Although Bloom Filter need k different hash , in practice we can use linear combination of two hash functions instead. So Hash.py only compute two hash value about a file.
With Hash.py's help, FeedForwardBloomFilter.py implement add and query operation.
class FFBloomFilter(object):
def add(self,fname):
self.hashClass.hash(fname)
val1 = int(self.hashClass.getHash(),16)
val2 = int(self.hashClass.getHash2(),16)
hashValue =val1%self.size
for _ in range(self.hashNum):
self.bf1[hashValue]+=1
hashValue+=val2%self.size
hashValue%=self.size
def query(self,fname):
self.hashClass.hash(fname)
val1 = int(self.hashClass.getHash(),16)
val2 = int(self.hashClass.getHash2(),16)
hashValue =val1%self.size
for _ in range(self.hashNum):
if self.bf1[hashValue]==0 :
return False
hashValue+=val2%self.size
hashValue%=self.size
hashValue =val1%self.size
for _ in range(self.hashNum):
self.bf2[hashValue]+=1
hashValue+=val2%self.size
hashValue%=self.size
return True
BF_Operation.py implement algorithms in SplitScreen as Fig 2
|
Fig 2 SplitScreen Architecture |
class BF_Operation(object):
def __init__(self):
self.hashNum = 10
self.bitSize = 1024*1024
def FFBF_INIT(self,path,outBF):
ffbf = FeedForwardBloomFilter.FFBloomFilter(self.bitSize,self.hashNum)
Dir = DirectoryList.DirList()
Dir.ListAdd(path,ffbf)
ffbf.serialize(outBF)
def FFBF_SCAN(self,path,inBF,outBF):
ffbf = FeedForwardBloomFilter.FFBloomFilter(self.bitSize,self.hashNum)
ffbf.parsing(inBF)
Dir = DirectoryList.DirList()
ffbf.clear()
matchdata = []
Dir.ListCheck(path,ffbf,matchdata)
ffbf.serialize(outBF)
return matchdata
Where ListAdd() and ListCheck() treat a path of directory as input and add(check) all files in directory. In the paper, there are two more algorithms, FFBF_HIT() and FFBF_verify(), which are not needed while we only need to test membership of files.