Given the media coverage around location-based services (LBS), it is easy to forget that most of these services depend on accurately identifying the user’s location. Most LBS offer two types of services: real-time services, which provide information about your current location, and account-level services, which provide information about your “home” base. For the latter, you cannot always use the IP address to locate the home base because the user might be traveling. At the same time, the accuracy of IP-based geolocation is decreasing due to the growing use of wifi and 3G/4G connections.
With the technique described in this Facebook research paper, social network information can be used to accurately predict the location of users who have not provided it. Note that this is a collective inference technique that uses social connections as the prediction mechanism.
Telcos can use a similar algorithm to help identify the addresses of prepaid users; social networks can use it to improve their user experience by “predicting” the location of their users and tailoring the web service to it. Retailers can use it to predict the location of their anonymous shoppers once the social network is extracted. Fraud involving prepaid cash cards can be detected more easily by establishing a home geography for the user.
From a practical perspective, here are the recommendations from this study:
“Based on these results, it seems that a good trade off is to predict from friend locations when an individual has 5 or more locatable friends, and from the user’s IP address if she has fewer than 5 friends with known addresses. Doing this causes the performance at 100 miles to slightly exceed the IP performance, and it is almost as good as strictly friend-based prediction at smaller distances.”
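The quoted rule of thumb can be sketched in a few lines of Python. This is only an illustration: the function name `hybrid_predict`, its parameters, and the idea of passing the friend-based predictor in as a callable are assumptions made here, not something taken from the paper.

```python
def hybrid_predict(friend_locations, ip_location, friend_predictor,
                   min_locatable_friends=5):
    """Pick a home-location estimate using the 5-friend rule of thumb.

    friend_locations: list of (lat, lon) tuples for friends with known addresses.
    ip_location: (lat, lon) derived from the user's IP address, or None.
    friend_predictor: hypothetical callable that maps a list of friend
        locations to a single (lat, lon) estimate.
    """
    if len(friend_locations) >= min_locatable_friends:
        # Enough locatable friends: trust the social graph.
        return friend_predictor(friend_locations)
    # Fewer than 5 locatable friends: fall back to IP geolocation.
    return ip_location
```

The threshold is left as a parameter because the value of 5 comes from this particular study and may not carry over to another network.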
Details
3.5 million US Facebook users had addresses that were accurately mapped to longitude and latitude using the TIGER/Line dataset. These users had an average of 10 friends, creating a graph with 30.6 million edges. Of these users, 2.9 million had at least one friend with an accurate address. This user base was shown to have similar online habits and geographic distribution to the overall population, i.e. the possibility of bias in the sample was low.
The main technique rests on an empirical finding: the probability of friendship as a function of distance has a reasonably good power-law fit with an exponent of -1. This means that, given a user’s friendship links, you can calculate the likelihood of any candidate location for that user. Of course, it would be prohibitive to maximize the likelihood over ALL locations. The authors find empirically that the most likely location is usually colocated with one of the friends. This means the algorithm only needs to calculate and compare probabilities for the known friend locations, which significantly reduces computation.
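A minimal Python sketch of this idea follows. The constants `scale` and `offset` in `friendship_probability` are placeholders chosen for illustration (only the exponent of -1 comes from the summary above), and the sketch scores only observed friendships; a fuller likelihood would also account for the people who are not friends, which is omitted here. All function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def friendship_probability(distance_miles, scale=0.002, offset=0.2, exponent=-1.0):
    """Power-law decay of friendship probability with distance.

    scale and offset are assumed values; offset keeps the probability
    finite when the distance is zero.
    """
    return scale * (offset + distance_miles) ** exponent

def predict_location(friend_locations):
    """Return the friend location that maximizes the friend-only log-likelihood."""
    if not friend_locations:
        return None

    def log_likelihood(candidate):
        return sum(
            math.log(friendship_probability(haversine_miles(candidate, friend)))
            for friend in friend_locations
        )

    # Only friend locations are evaluated as candidates, not every point on the map.
    return max(friend_locations, key=log_likelihood)

friends = [(37.77, -122.42), (37.78, -122.41), (37.80, -122.27), (40.71, -74.01)]
prediction = predict_location(friends)
```

With three friends around San Francisco Bay and one in New York, the prediction lands on one of the Bay Area friends, because placing the user there keeps most friendships short-range.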
To compare performance, the authors tried two approaches: (1) predicting the known address of each of the 2.9M users in turn, while leaving the other users’ known locations in place for that prediction, and (2) taking some users completely out of the graph.
Figure 11 shows the results of (1).
Figure 12 shows the results from (2). Interestingly, this uses iterative collective inference, with 3 iterations being optimal.
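One way to picture iterative collective inference is the Python sketch below: locations predicted in one round become inputs for the next. The function name, the dictionary-based graph, the injected `predictor` callable, and the choice to update all users at once in each round are assumptions for illustration; the paper's exact update scheme may differ.

```python
def iterative_inference(known, friends, predictor, iterations=3):
    """Propagate location estimates through the friendship graph.

    known: dict mapping user id -> (lat, lon) for users with known addresses.
    friends: dict mapping user id -> list of friend ids.
    predictor: hypothetical callable taking a list of (lat, lon) tuples and
        returning one (lat, lon) estimate, or None if it cannot predict.
    """
    estimates = dict(known)
    for _ in range(iterations):
        updated = dict(estimates)
        for user, friend_ids in friends.items():
            if user in known:
                continue  # never overwrite a known address
            # Use both known and previously predicted friend locations.
            locations = [estimates[f] for f in friend_ids if f in estimates]
            guess = predictor(locations)
            if guess is not None:
                updated[user] = guess
        estimates = updated
    return estimates
```

In a chain a-b-c where only a has a known address, b is located in the first round and c in the second, which shows why more than one iteration can help users whose friends are themselves unlocated.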
References
Lars Backstrom, Eric Sun, Cameron Marlow. Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity. WWW 2010, April 26–30, 2010, Raleigh, North Carolina, USA.





