Monday, July 31, 2006

"Genius" Kornblum on fuzzy hashing

Jesse Kornblum was recently interviewed by the CyberSpeak podcast guys about his "fuzzy hashing" paper, which he'll be presenting at DFRWS.

Jesse, the author of hashing tools such as md5deep, tigerdeep, and whirlpooldeep, has come up with something called "fuzzy hashing," which can be used to counter attempts to disguise files by making small changes to them, as with Word documents (intellectual property crime) and images ('nuff said).

Jesse clearly explains the concept behind "fuzzy hashing" in his interview...in a nutshell, if you have two similar bitstreams (e.g., JPEGs or GIFs) that differ only by small changes, his tool (dubbed "SSDeep") will be able to tell you that they are similar...whereas tools like MD5Deep, which compute a cryptographic hash, can only tell you whether the files are exactly the same.

So what are the applications of something like this? Well, first off, it's not meant to find evidence; instead, it's an awesome data reduction tool. One of the examples Jesse used in his interview is Word documents that are printed out. When you print out a Word document, the time that the document was last printed is modified within the document itself...so an MD5 hash generated for the original document will not match the one generated for the printed document. The bitstreams are essentially the same, with some small modifications, and Jesse says that his tool will let you know that the two files are similar.
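To make the Word document example concrete, here's a minimal Python sketch. It does not use Jesse's tool or reproduce its algorithm; the standard library's `difflib.SequenceMatcher` stands in as a crude similarity measure, and the two byte strings (`original`, `printed`) are made-up stand-ins for a document before and after its "last printed" field changes. The point is only to show that an exact hash flips on a tiny change while a similarity score barely moves.

```python
import hashlib
from difflib import SequenceMatcher


def md5_hex(data: bytes) -> str:
    """Exact-match fingerprint: any changed byte yields a different digest."""
    return hashlib.md5(data).hexdigest()


def similarity(a: bytes, b: bytes) -> float:
    """Crude similarity score in [0, 1]; a stand-in, NOT the ssdeep algorithm."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical "documents": same body, different last-printed metadata.
body = b"Quarterly report. " * 50
original = body + b"Last printed: never"
printed = body + b"Last printed: 2006-07-31"

hashes_match = md5_hex(original) == md5_hex(printed)
score = similarity(original, printed)

print("MD5 digests match:", hashes_match)
print("Similarity score: %.3f" % score)
```

The MD5 comparison only answers "identical or not," while the score gives you something to rank and filter on, which is where the data reduction comes in.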

There are many other applications for this tool, including image identification and intellectual property theft investigations. So, go on over to CyberSpeak and give the podcast a listen. If you see Jesse at DFRWS or GMU2006 or HTCIA, say hi, and buy him a beer!

Interestingly, Jesse is presenting on Windows memory analysis at HTCIA (end of Oct)...I'll be presenting on the subject at GMU2006. Jesse's also presenting at GMU2006.

BTW...Jesse, if you're reading this...thanks for the shout-out in your CyberSpeak interview!
