Count-min sketches for estimating password frequency within Hamming distance two

Example update of a count-min sketch at distance 1 and 2.

Abstract

The count-min sketch is a useful data structure for recording and estimating the frequency of string occurrences, such as passwords, in sub-linear space with high accuracy. However, it cannot be used to draw conclusions on groups of strings that are similar, for example close in Hamming distance. This paper introduces a variant of the count-min sketch which allows for estimating counts within a specified Hamming distance of the queried string. This variant can be used to prevent users from choosing popular passwords, like the original sketch, but it also allows for a more efficient method of analysing password statistics.

Keywords: count-min sketch, Bloom filter, password frequency, approximate string matching

Reference

Leah South, Douglas Stebila. Count-min sketches for estimating password frequency within Hamming distance two. In Colin Boyd, Leonie Simpson, editors, Proc. 18th Australasian Conference on Information Security and Privacy (ACISP) 2013, LNCS, vol. 7959, pp. 388-402. Springer, July 2013. © Springer.

Download

BibTeX