Nilsimsa Hash Explained

Nilsimsa is an anti-spam focused locality-sensitive hashing algorithm originally proposed by the cmeclax remailer operator in 2001[1] and then reviewed by Ernesto Damiani et al. in their 2004 paper titled "An Open Digest-based Technique for Spam Detection".[2] The goal of Nilsimsa is to generate a hash digest of an email message such that the digests of two similar messages are similar to each other. In contrast to cryptographic hash functions such as SHA-1 or MD5, where a small modification to a document produces a completely different hash, a small modification under Nilsimsa does not substantially change the resulting digest. The paper suggests that Nilsimsa satisfies three requirements:

  1. The digest identifying each message should not vary significantly (sic) for changes that can be produced automatically.
  2. The encoding must be robust against intentional attacks.
  3. The encoding should support an extremely low risk of false positives.
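
The contrast with cryptographic hashes described above can be illustrated with a minimal Python sketch. It is not the Nilsimsa algorithm itself: it only shows that a one-character change scatters the bits of a SHA-1 digest, and how two locality-sensitive digests might be compared bit by bit. The helper names `bit_difference` and `similarity_score` are hypothetical, and the scoring convention (matching bits minus half the digest width, on hex-encoded 256-bit digests) is an assumption made for illustration rather than something stated in the sources cited here.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def similarity_score(hex_a: str, hex_b: str) -> int:
    """Hypothetical score: matching bits minus half the digest width.

    Assumes hex-encoded digests of equal length. For 256-bit digests the
    result ranges from -128 (every bit differs) to 128 (identical).
    """
    a, b = bytes.fromhex(hex_a), bytes.fromhex(hex_b)
    total_bits = 8 * len(a)
    return (total_bits - bit_difference(a, b)) - total_bits // 2


# Two messages that differ by a single trailing character.
original = b"Cheap meds, click here now"
modified = b"Cheap meds, click here now!"

# With a cryptographic hash, roughly half of the 160 bits are expected to flip.
sha_a = hashlib.sha1(original).digest()
sha_b = hashlib.sha1(modified).digest()
print(bit_difference(sha_a, sha_b), "of 160 SHA-1 bits differ")
```

A locality-sensitive scheme aims for the opposite behaviour: digests of the two messages above would be expected to share most of their bits, so a score such as the one computed by `similarity_score` would stay close to its maximum.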

Subsequent testing on a range of file types found that the Nilsimsa hash has a significantly higher false positive rate than other similarity digest schemes such as TLSH, ssdeep and sdhash.[3]

Nilsimsa similarity matching was taken into consideration by Jesse Kornblum when developing fuzzy hashing in 2006,[4] which used the algorithms of spamsum by Andrew Tridgell (2002).[5]

Several implementations of Nilsimsa exist as open-source software.[6] [7] [8] [9] [10]

Notes and References

  1. Nilsimsa v.0.2.4. cmeclax remailer operator. 10 February 2002. Archived at https://web.archive.org/web/20050707005338/http://ixazon.dynip.com/~cmeclax/nilsimsa-0.2.4.tar.gz on 7 July 2005. Retrieved 23 February 2014.
  2. Damiani et al. An Open Digest-based Technique for Spam Detection. 2004. Retrieved 2013-09-01.
  3. Oliver et al. TLSH - A Locality Sensitive Hash. 4th Cybercrime and Trustworthy Computing Workshop. 2013. Retrieved 2015-06-04.
  4. The Fuzzy Hashing Patent. Jesse Kornblum. 15 May 2008. LiveJournal. Archived at https://web.archive.org/web/20160507201540/http://jessekornblum.livejournal.com/242493.html on 7 May 2016. Retrieved 23 February 2014.
  5. Jesse Kornblum. Identifying almost identical files using context triggered piecewise hashing. DFRWS. 2006. Retrieved 23 February 2014.
  6. py-nilsimsa - Python port of Nilsimsa locality-sensitive hash. github.com. Retrieved 2016-11-08.
  7. Nilsimsa. Nilsimsa.rubyforge.org. Archived at https://web.archive.org/web/20130615032426/http://nilsimsa.rubyforge.org/ on 2013-06-15. Retrieved 2013-09-01.
  8. Digest::Nilsimsa. metacpan.org. Retrieved 2013-09-01.
  9. golang nilsimsa - implements nilsimsa fuzzy hash by cmeclax. hersensch.im. Retrieved 2018-04-25.
  10. node-nilsimsa - Node.JS port of Nilsimsa locality-sensitive hash. github.com. Retrieved 2023-09-09.