Tuesday, January 12, 2010

Why IP addresses are no longer enough to identify internet users

Richard Clayton has an excellent post explaining (in terms even a lawyer can understand) why the traditional formula of IP address plus timestamp is increasingly inadequate as a way of identifying internet users:
The basics are that you record an IP address and a timestamp; use the Regional Internet Registry records (RIPE, ARIN etc) to determine which ISP has been allocated the IP address; and then ask the ISP to use their internal records to determine which customer account was allocated the IP address at the relevant instant. All very simple in concept, but hung about — as the thesis explained — by considerable caveats as to whether the simple assumptions involved are actually true in a particular case.

One of the caveats concerned the use of Network Address Translation (NAT), whereby the IP addresses used by internal machines are mapped back and forth to external IP addresses that are visible on the global Internet. The most familiar NAT arrangement is that used by a great many home broadband users, who have one externally facing IP address, yet run multiple machines within the household.

Companies also use NAT. If they own sufficient IP addresses they may map one-to-one between internal and external addresses (usually for security reasons), or they may only have 4 or 8 external IP addresses, and will use some or all of them in parallel for dozens of internal machines.

Where NAT is in use, as my thesis explained, traceability becomes problematic because it is rare for the NAT equipment to generate logs to record the internal/external mapping, and even rarer for those logs to be preserved for any length of time. Without these logs, it is impossible to work out which internal user was responsible for the event being traced. However, in practice, all is not lost because law enforcement is usually able to use other clues to tell them which member of the household, or which employee, they wish to interview first.

Treating NAT with this degree of equanimity is no longer possible, and that’s because of the way in which the mobile telephone companies are providing Internet access.

The shortage of IPv4 addresses has meant that the mobile telcos have not been able to obtain huge blocks of address space to dish out one IP address per connected customer — the way in which ISPs have always worked. Instead, they are using relatively small address blocks and a NAT system, so that the same IP address is being simultaneously used by a large number of customers; often hundreds at a time.

This means that the only way in which they can offer a traceability service is if they are provided with an IP address and a timestamp AND ALSO with the TCP (or UDP) source port number. Without that source port value, the mobile firm can only narrow down the account being used to the extent that it must be one out of several hundred — and since those several hundred will have nothing in common, apart from their choice of phone company, law enforcement (or anyone else who cares) will be unable to go much further.
Edited to add (14.01.10):

In two follow up posts, Richard explains what this means for data retention rules (arguing that the IP address only approach of the Data Retention Directive is flawed) and considers the practicalities of identifying mobile internet users.

No comments:

Post a Comment