Thursday, March 31, 2005

remote hardware fingerprinting - implications

Remote hardware fingerprinting affects anonymity. Whether or not that's a good thing depends on how you feel about anonymity. For example, it makes it easier to prove that what appear to be two different computers really are the same computer, which could help with prosecuting hackers. (Note: that's not the same as saying that what appear to be two different computer users are the same user, since the technique identifies the hardware, not the user.) On the other hand, there are times when anonymity is important, such as on message boards designed so that people can post anonymous questions about AIDS or mental illness.

Let's consider how powerful this technique is. First of all, there's a limit on how many different computers the technique can tell apart. Remember that the technique measures clock skew: how many seconds per day a clock gains or loses. The measurement's resolution is limited: while clock skew is fairly stable, it does vary somewhat, so the paper's authors can't measure it more closely than a few parts per million. At the same time, there's a practical limit on just how much clock skew can exist in a real system. After all, a clock that gains ten minutes a day would be very rare. So, if we take an upper bound of 10 minutes per day (about 7,000 us/sec) and, generously, a resolution of one part per million (1 us/sec), we get about 7,000 possible clock skews. Double it because clocks can gain or lose, and you're in the neighborhood of 14,000 possible skews. The number you could actually tell apart will be smaller because of measurement errors and such.
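To make the arithmetic above concrete, here is a rough sketch in Python. The function names are made up for illustration, and the inputs (10 minutes per day, 1 part per million) are just the generous assumptions from the paragraph above, not measured values.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def skew_ppm(drift_seconds_per_day):
    """Convert a daily drift in seconds to parts per million (us/sec)."""
    return drift_seconds_per_day / SECONDS_PER_DAY * 1_000_000


def distinguishable_skews(max_drift_seconds_per_day, resolution_ppm):
    """Count the skew values that can be told apart at a given resolution.

    Doubled because a clock can either gain or lose time.
    """
    per_direction = skew_ppm(max_drift_seconds_per_day) / resolution_ppm
    return 2 * per_direction


# Assumed inputs: at most 10 minutes (600 s) of drift per day,
# measured to 1 part per million.
max_ppm = skew_ppm(600)
count = distinguishable_skews(600, 1.0)
print(f"max skew: about {max_ppm:.0f} ppm, distinguishable skews: about {count:.0f}")
```

The exact figures come out a little under the rounded 7,000 and 14,000 used above, which doesn't change the conclusion.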

On the other hand, the technique in the paper gives you the clock's current value as well as the skew. That can help you identify more machines: if two machines have the same skew, but one's clock says it's 7:00 and the other says 10:00, you know they're two different machines. (In reality, the clocks are going to say things like the number of seconds since the last time the computer rebooted, rather than an actual time or date. The principle stays the same.) Also, there are techniques to tell different operating systems apart: different operating systems update their clocks at different rates and have other traits you can use to identify them. To be generous, let's say this technique can potentially distinguish in the neighborhood of 100,000 to 1,000,000 different machines. That's still far from the total number of machines on the Internet.

That last point bears repeating. This technique isn't like a classic "fingerprint". It does not let you uniquely identify every computer on the Internet.

One place it might be useful is in forensics: if Alice is trying to prove to a jury that Mallory broke into her web site, she may be able to submit additional evidence that Mallory's computer had the same clock skew as the intruder's computer. Mallory, of course, could counter that (1) there are hundreds of millions of computers out there, so there are hundreds of computers with the same skew as his, and (2) even if it was Mallory's computer, that doesn't mean Mallory was the one using it at the time of the break-in.
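Mallory's first argument is easy to put numbers on. The sketch below assumes that machines are spread evenly across all distinguishable fingerprints, which real clocks almost certainly are not, and it uses 300 million machines purely as an illustrative stand-in for "hundreds of millions". The function name is hypothetical.

```python
def expected_matches(population, distinguishable_fingerprints):
    """Average number of machines sharing any one fingerprint.

    Assumes machines are spread evenly across fingerprints
    (a simplification made only for this estimate).
    """
    return population / distinguishable_fingerprints


# Illustrative population only: "hundreds of millions" taken as 300 million.
population = 300_000_000

# Using the generous range of 100,000 to 1,000,000 distinguishable machines.
for fingerprints in (100_000, 1_000_000):
    shared = expected_matches(population, fingerprints)
    print(f"{fingerprints:>9,} fingerprints: about {shared:,.0f} machines each")
```

Even at the generous end of the range, hundreds of machines would be expected to look identical, so a matching skew narrows the field but doesn't single out one computer.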

Another place people could use it is in breaking the anonymity of trace files. The people who own large, high-bandwidth links will sometimes record all the traffic that goes across those links and make the record available for research purposes. Typically, they make two kinds of records available. One is just the protocol, the signalling traffic back and forth, without any of the data. The other kind has the packets and the data, or at least more of the data, but for privacy reasons it randomizes the IP addresses. With this technique, if you have one trace of each kind, in many cases you can match the clock skews between the traces, so you can match the addresses in the first trace to the packet data in the second. Since clock skews don't change much over time, the traces don't have to be from the same time.
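Here is a minimal sketch of that matching step. It assumes the skew of each host has already been estimated from each trace (that part isn't shown), and the function name, the 1 ppm tolerance, and the sample addresses and skews are all invented for illustration. It is not the procedure from the paper, just the simplest thing that captures the idea.

```python
def match_by_skew(real_skews, anon_skews, tolerance_ppm=1.0):
    """Pair randomized addresses with real ones that have the same skew.

    real_skews: real address -> estimated skew in ppm (from the first trace)
    anon_skews: randomized address -> estimated skew in ppm (second trace)
    Only unambiguous pairings are returned: a randomized address is matched
    when exactly one real address falls within the tolerance.
    """
    matches = {}
    for anon_addr, anon_skew in anon_skews.items():
        candidates = [
            addr
            for addr, skew in real_skews.items()
            if abs(skew - anon_skew) <= tolerance_ppm
        ]
        if len(candidates) == 1:
            matches[anon_addr] = candidates[0]
    return matches


# Made-up example data (documentation-range addresses, invented skews).
real = {"192.0.2.1": 12.0, "192.0.2.2": -48.5, "192.0.2.3": 13.5}
anon = {"a": 12.2, "b": -48.0, "c": 90.0, "d": 12.8}

result = match_by_skew(real, anon)
print(result)
```

In this made-up data, "a" and "b" each match exactly one real address, "c" matches none, and "d" sits within tolerance of two hosts and is left unmatched. That last case is the limit discussed earlier: when several machines share a skew, the technique can't tell them apart.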
