MAC address anonymization
MAC address anonymization performs a one-way function on a MAC address so that the result may be used in tracking systems for reporting and the general public, while making it nearly impossible to obtain the original MAC address from the result. The idea is that this process allows companies like Google,[1] Apple[2] and CrowdVision[3] - which track users' movements via their computer hardware - to simultaneously preserve the identities of the people they are tracking, while tracking the hardware itself.
Flawed approaches
Simple hashing
An example of MAC address anonymization would be to use a simple hash algorithm. Given an address of 11:22:33:44:55:66
, the MD5 hash algorithm produces eb341820cd3a3485461a61b1e97d31b1
(32 hexadecimal digits).[4] An address only one character different (11:22:33:44:55:67
) produces 391907146439938c9821856fa181052e
,[5] an entirely different hash due to the avalanche effect.
The problem lies in the fact that there are only 248 (281,474,976,710,656) possible MAC addresses. Given the encoding algorithm, an index can easily be created for each possible address. By using rainbow table compression, the index can be made small enough to be portable. Building the index is an embarrassingly parallel problem, and so the work can be accelerated greatly e.g. by renting a large amount of cloud computing resources temporarily.
For example, if a single CPU can compute 1,000,000 encrypted MACs per second, then generating the full table takes 8.9 CPU-years. With a fleet of 1,000 CPUs, this would only take around 78 hours. Using a rainbow table with a "depth" of 1,000,000 hashes per entry, the resulting table would only contain a few hundred million entries (a few GB) and require 0.5 seconds (on average, ignoring I/O time) to reverse any encrypted MAC into its original form.
In 2018, academics found that with modern computing equipment with the ability to calculate 6 billion MD5 hashes and 844 million SHA-256 hashes per second the authors are able to recover 100% of 1 million hashes in:[6]
Truncation
Another approach that has been tested is truncating the MAC Address by removing the Organizationally unique identifier (the first 24 bits of the 48 bit MAC Address).[7] However, as only 0.1% of the total Organizationally unique identifier space has been allocated and not all manufacturers fully utilise their allocated MAC Address space, this fails to offer any meaningful privacy benefit.[8] Furthermore, manufacturers will frequently assign contiguous address blocks to specific devices allows for fine-grained mapping of the devices in use - allowing the device type to be identified with only a small part of the MAC Address.[9]
Ali and Dyo approach
Due to the pitfalls of existing approaches, more robust anonymization approaches have been developed by academics.[10] In particular, Junade Ali and Vladimir Dyo developed an approach which works by:[11]
- Using computationally expensive hash functions like Bcrypt to prevent background knowledge attacks
- Truncating the resulting hash to achieve K-anonymity
The degree to which a resulting hash is truncated is a balancing act between the privacy offered and the desired collision rate (the probability that one anonymised MAC Address will overlap with another). Previous work has suggested that it is therefore difficult to control the anonymity set size when using approximations of the Birthday Paradox.[12] Instead, Ali and Dyo use the overall rate of collision in the dataset and provide that the probability of there being a collision p can be calculated by where there are m MAC Addresses and n possible hash digests. Therefore "for digests of 24 bits it is possible to store up to 168,617 MAC addresses with the rate of collisions less than 1%".
References
- ^ "Google Maps Has Been Tracking Your Every Move, And There's A Website To Prove It". Junkee. 15 August 2014. Retrieved 2016-04-10.
- ^ "How your iPhone has been tracking your every move in secret". Metro News. 2014-09-28. Retrieved 2016-04-11.
- ^ "iInside retail brochure: Leading the market in indoor location techno…". 2014-03-10.
- ^ echo -n "112233445566"|md5sum = eb341820cd3a3485461a61b1e97d31b1
- ^ echo -n "112233445567"|md5sum = 391907146439938c9821856fa181052e
- ^ Marx, Matthias; Zimmer, Ephraim; Mueller, Tobias; Blochberger, Maximilian; Federrath, Hannes (2018). Hashing of personally identifiable information is not sufficient. Gesellschaft für Informatik e.V. ISBN 978-3-88579-675-6.
- ^ Fuxjaeger, P.; Ruehrup, S.; Paulin, T.; Rainer, B. (Fall 2016). "Towards Privacy-Preserving Wi-Fi Monitoring for Road Traffic Analysis". IEEE Intelligent Transportation Systems Magazine. 8 (3): 63–74. doi:10.1109/MITS.2016.2573341. ISSN 1941-1197. S2CID 2646906.
- ^ Demir, Levent; Cunche, Mathieu; Lauradoux, Cédric (2014-06-11). "Analysing the privacy policies of Wi-Fi trackers". Proceedings of the 2014 workshop on physical analytics. Association for Computing Machinery. pp. 39–44. doi:10.1145/2611264.2611266. ISBN 978-1-4503-2825-8. S2CID 2624491.
- ^ Martin, Jeremy; Rye, Erik; Beverly, Robert (2016-12-05). "Decomposition of MAC address structure for granular device inference". Proceedings of the 32nd Annual Conference on Computer Security Applications. Los Angeles, California, US: Association for Computing Machinery. pp. 78–88. doi:10.1145/2991079.2991098. ISBN 978-1-4503-4771-6.
- ^ Feng, X.; Feng, Y.; Dawam, E. S. (August 2020). "Artificial Intelligence Cyber Security Strategy". 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). pp. 328–333. doi:10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00064. ISBN 978-1-7281-6609-4.
- ^ Ali, Junade; Dyo, Vladimir (2020-12-25). Practical Hash-based Anonymity for MAC Addresses. pp. 572–579. doi:10.5220/0009825105720579. ISBN 978-989-758-446-6.
- ^ Demir, L.; Kumar, A.; Cunche, M.; Lauradoux, C. (2018). "The Pitfalls of Hashing for Privacy". IEEE Communications Surveys and Tutorials. 20 (1): 551–565. doi:10.1109/COMST.2017.2747598. ISSN 1553-877X. S2CID 3571244.