Nilsimsa Hash Explained
Nilsimsa is an anti-spam focused locality-sensitive hashing algorithm originally proposed by the cmeclax remailer operator in 2001[1] and then reviewed by Ernesto Damiani et al. in their 2004 paper "An Open Digest-based Technique for Spam Detection".[2] The goal of Nilsimsa is to generate a hash digest of an email message such that the digests of two similar messages are similar to each other. Unlike cryptographic hash functions such as SHA-1 or MD5, where a small modification to a document produces a completely different hash, a small modification under Nilsimsa does not substantially change the resulting digest. The paper suggests that Nilsimsa satisfies three requirements:
- The digest identifying each message should not vary significantly (sic) for changes that can be produced automatically.
- The encoding must be robust against intentional attacks.
- The encoding should support an extremely low risk of false positives.
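The contrast with cryptographic hashes can be illustrated with a short Python sketch. It does not implement Nilsimsa itself: `toy_trigram_digest` is a hypothetical, simplified locality-sensitive digest written only for this illustration (character trigrams vote for buckets, and a bit is set when its bucket count exceeds the mean), and `bit_difference` is an assumed helper that counts differing bits between two hexadecimal digests. The 256-bit digest width and the example messages are likewise assumptions.

```python
import hashlib


def bit_difference(hex_a: str, hex_b: str) -> int:
    """Count the bits that differ between two equal-length hex digests."""
    if len(hex_a) != len(hex_b):
        raise ValueError("digests must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def toy_trigram_digest(text: str, bits: int = 256) -> str:
    """Toy locality-sensitive digest (illustration only, NOT real Nilsimsa).

    Every character trigram increments one of `bits` buckets; a digest bit
    is set when its bucket count is above the mean bucket count.
    """
    counts = [0] * bits
    for i in range(len(text) - 2):
        trigram = text[i:i + 3].encode("utf-8")
        # SHA-1 is used here only as a convenient way to pick a bucket.
        bucket = int.from_bytes(hashlib.sha1(trigram).digest()[:2], "big") % bits
        counts[bucket] += 1
    threshold = sum(counts) / bits
    value = 0
    for count in counts:
        value = (value << 1) | (1 if count > threshold else 0)
    return f"{value:0{bits // 4}x}"


# Two messages that differ by a single substituted character.
original = "Buy cheap watches now, limited offer, click the link below today"
modified = "Buy cheap watches now, limited offer, click the l1nk below today"

sha1_diff = bit_difference(
    hashlib.sha1(original.encode("utf-8")).hexdigest(),
    hashlib.sha1(modified.encode("utf-8")).hexdigest(),
)
toy_diff = bit_difference(toy_trigram_digest(original), toy_trigram_digest(modified))

print(f"SHA-1 bits changed (of 160): {sha1_diff}")
print(f"Toy digest bits changed (of 256): {toy_diff}")
```

A single substituted character touches only three trigrams, so at most six buckets of the toy digest change and the two toy digests stay nearly identical, whereas a cryptographic hash is designed so that roughly half of its output bits change. The real algorithm differs in its details, but the comparison of digests by counting differing bits follows the same idea.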
Subsequent testing on a range of file types identified the Nilsimsa hash as having a significantly higher false positive rate when compared to other similarity digest schemes such as TLSH, Ssdeep and Sdhash.[3]
Nilsimsa similarity matching was taken into consideration by Jesse Kornblum when developing fuzzy hashing in 2006,[4] which used the algorithms of spamsum by Andrew Tridgell (2002).[5]
Several implementations of Nilsimsa exist as open-source software.[6][7][8][9][10]
Notes and References
- Web site: Nilsimsa v.0.2.4. cmeclax remailer operator. 10 February 2002. https://web.archive.org/web/20050707005338/http://ixazon.dynip.com/~cmeclax/nilsimsa-0.2.4.tar.gz. 7 July 2005. 23 February 2014.
- Web site: Damiani et al. An Open Digest-based Technique for Spam Detection. 2004. 2013-09-01.
- Oliver et al. TLSH - A Locality Sensitive Hash. 4th Cybercrime and Trustworthy Computing Workshop. 2013. 2015-06-04.
- Web site: The Fuzzy Hashing Patent. Jesse Kornblum. 15 May 2008. LiveJournal. 23 February 2014. dead. https://web.archive.org/web/20160507201540/http://jessekornblum.livejournal.com/242493.html. 7 May 2016.
- Jesse Kornblum. Identifying almost identical files using context triggered piecewise hashing. DFRWS. 2006. 23 February 2014.
- Web site: py-nilsimsa - Python port of Nilsimsa locality-sensitive hash . github.com . 2016-11-08.
- Web site: Nilsimsa . Nilsimsa.rubyforge.org . 2013-09-01 . https://web.archive.org/web/20130615032426/http://nilsimsa.rubyforge.org/ . 2013-06-15 . dead .
- Web site: Digest::Nilsimsa . metacpan.org . 2013-09-01.
- Web site: golang nilsimsa - implements nilsimsa fuzzy hash by cmeclax. hersensch.im. en. 2018-04-25.
- Web site: node-nilsimsa - Node.JS port of Nilsimsa locality-sensitive hash . github.com . 2023-09-09.