Posts Tagged ‘predictions’

Integration with Facebook custom audiences

Thursday, August 9th, 2018 by Nick

We have enabled integration with Facebook custom audiences. This feature allows you to synchronize Sonamine predictions with Facebook custom audiences.

How might this be useful? For example, you can now retarget Facebook users who are likely to churn out of your game. With lookalike campaigns, you can acquire FB users who are most likely to convert to paying users or have the highest predicted LTV. The list goes on.

Predicting inactive players of Everquest II (2006 data)

Wednesday, July 13th, 2011 by Nick

Many online and mobile games such as MMORPGs rely on players returning to the game frequently.  Being able to predict which players will not return is a useful CRM technique.  This technique is commonly applied in companies with large numbers of customers, such as financial services firms (think CapitalOne) or mobile phone operators.

In this study, the goal was to see how accurately we could predict which players would cancel their subscription or become inactive for 30 days.  The authors proposed a novel way to extract predictors, borrowing sequence alignment, a technique commonly used to align gene sequences.

The system uses different predictor variables and machine learning algorithms, such as decision trees and neural networks.  The output of the described system is a prediction for each player, either inactive or not.  This prediction is then compared to the actual behavior of the player.  The true positive rate indicates how many of the actual inactive players were flagged correctly.  The results show that the true positive rate using machine learning techniques ranges from 50% to 90% (see the table below).


Unfortunately, the authors did not specify how many inactive users there were, or how many users the system flagged as inactive.  This matters because if there are only 10 inactive players, it is easy to find them by casting a wide net and flagging 1,000 players.  So this result, while encouraging, cannot by itself be used for marketing actions.

Another important way to evaluate how well the prediction system works is the false positive rate.  This rate indicates how many of the players flagged as inactive by the system were flagged wrongly.  The false positive rate is critical in determining whether these predictions can be profitably used in marketing retention campaigns.  Unfortunately, the authors did not provide any results on the false positive rate.
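To make the wide-net problem concrete, here is a minimal sketch of the two rates discussed above.  The 10 inactive players and 1,000 flagged players come from the example above; the assumption that 9 of the 10 were caught is ours, purely for illustration.  Note that the rate as defined in this post (share of flagged players that were wrong) is often called the false discovery rate in the statistics literature.

```python
def evaluate_flags(true_positives, flagged, actual_inactive):
    """Return (true positive rate, share of flagged players that were wrong).

    true_positives  -- flagged players who really became inactive
    flagged         -- all players the system flagged as inactive
    actual_inactive -- all players who really became inactive
    """
    true_positive_rate = true_positives / actual_inactive
    wrongly_flagged_share = (flagged - true_positives) / flagged
    return true_positive_rate, wrongly_flagged_share


# Illustrative numbers only: 10 real churners, 1,000 flags, 9 caught (assumed).
tpr, wrong = evaluate_flags(true_positives=9, flagged=1000, actual_inactive=10)
print(f"true positive rate: {tpr:.1%}, wrongly flagged: {wrong:.1%}")
```

A 90% true positive rate looks great on paper, yet in this made-up case almost every flagged player was not actually going to churn.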

Why false positive rate matters

Marketing retention campaigns have a business case calculation that depends on who gets the promotion.  In each promotion, there are two groups of players:

  1. Actual churners – these are the people that would have gone on to become inactive
  2. Non churners – these are people who would NOT have become inactive

You make money by saving players in group 1… that’s obvious.

But you lose money when group 2 players take your promotion offer.  If you have 10,000 people in this group and you give them a 10% discount, you are essentially giving up the revenue of 1,000 players.   In other words, you needlessly left money on the table.

So it is critical that your true positive rate is high, but it is even more critical that your false positive rate is low enough to make an overall positive business case.
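The business case above can be sketched as a small calculation.  This is a simplified model under our own assumptions, not a formula from the study: every flagged non churner takes the discount, only a fraction of flagged churners (`save_rate`, a hypothetical parameter) is actually retained, and each player is worth the same `revenue_per_player`.

```python
def campaign_net_revenue(churners_flagged, non_churners_flagged,
                         revenue_per_player, discount, save_rate):
    """Net revenue effect of a discount promotion (simplified, assumed model)."""
    # Group 1: churners we manage to keep now pay the discounted price.
    gained = churners_flagged * save_rate * revenue_per_player * (1 - discount)
    # Group 2: non churners would have paid full price anyway.
    lost = non_churners_flagged * revenue_per_player * discount
    return gained - lost


# The example above: 10,000 non churners and a 10% discount cost roughly
# the revenue of 1,000 players (revenue_per_player=1 counts in "players").
print(campaign_net_revenue(0, 10_000, 1.0, 0.10, save_rate=0.0))

# Hypothetical campaign with far fewer false positives.
print(campaign_net_revenue(2_000, 1_000, 10.0, 0.10, save_rate=0.3))
```

The sign of the result flips depending on how many non churners end up in the flagged list, which is why the false positive rate drives the business case.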

So what?

This study shows that machine learning techniques can help improve player insight.  To leverage this insight, marketers need to offer promotions that retain the players.  However, you need to build a business case by making sure that the false positive rate is low enough.

Shameless plug -> check out Sonamine Predictive Player Segments™ – ChurnSoon.  We have been doing this for multiple game developers around the globe.


The dataset had 8 months of data from Jan to Sep 2006.  It was taken from one particular Everquest II server.  Inactive is defined as either 1) canceling subscription or 2) being inactive for 30 days.

Shim, K. J., Srivastava, J.  Sequence Alignment Based Analysis of Player Behavior in Massively Multiplayer Online Role-Playing Games (MMORPGs).  Proceedings of the IEEE International Conference on Data Mining (ICDM-10). Workshop on Domain-Driven Data Mining.

Facebook social network analysis predicts location better than IP address

Friday, June 11th, 2010 by Nick

Given the media coverage around location based services (LBS), it is easy to forget that most of these services center around accurately identifying the user’s location.  In most LBS, you have two types of services: real time services, which provide information about your current location, and account level services, which provide information about your “home” base.  In the latter, you cannot always use the IP address to locate the home base because the user might be traveling.  At the same time, the accuracy of IP based geolocation is decreasing as more users connect over Wi-Fi and 3G/4G networks.

With the technique described in this Facebook research paper, social network information can be used to accurately predict the location of users who have not provided this location information.  Note that this is a collective inferencing technique that uses social connections as a prediction mechanism.

Telcos can use a similar algorithm to help identify the addresses of prepaid users; social networks can use it to improve their user experience by “predicting” the location of their users and tailoring the web service accordingly.  Retailers can use it to predict the location of their anonymous shoppers once the social network is extracted.  Fraud on prepaid cash cards can be detected more easily by establishing a home geography for the user.

From a practical perspective, here are the recommendations from this study:

“Based on these results, it seems that a good trade off is to predict from friend locations when an individual has 5 or more locatable friends, and from the user’s IP address if she has fewer than 5 friends with known addresses. Doing this causes the performance at 100 miles to slightly exceed the IP performance, and it is almost as good as strictly friend-based prediction at smaller distances.”
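The recommendation quoted above boils down to a simple switch.  Here is a minimal sketch; the function name and the string labels are ours, and the threshold of 5 comes straight from the quote.

```python
def choose_location_source(locatable_friends, threshold=5):
    """Pick which signal to predict a user's location from.

    locatable_friends -- number of the user's friends with a known address
    threshold         -- 5 in the quoted recommendation
    """
    if locatable_friends >= threshold:
        return "friends"   # predict from friend locations
    return "ip"            # fall back to IP based geolocation


for n in (0, 4, 5, 12):
    print(n, "locatable friends ->", choose_location_source(n))
```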


The addresses of 3.5 million US Facebook users were accurately geocoded to longitude and latitude using the TIGER/Line dataset.  These users had an average of 10 friends, creating a graph with 30.6 million edges.  2.9 million users had at least one friend with an accurate address.   This user base was shown to have similar online habits and geographic locations to the overall population, i.e. the possibility of bias in the sample was low.

The main technique rests on an empirical finding: the probability of friendship as a function of distance has a reasonably good power law fit with an exponent of -1.  This means that, based on the friendship links, you can calculate the likelihood of any one candidate location.  Of course, it would be prohibitive to maximize the likelihood over ALL locations.  The authors find empirically that the most likely location is usually colocated with one of the friends.  This means the algorithm only needs to calculate and compare probabilities for the known friend locations.  This optimization significantly reduces computation.
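Here is a rough sketch of that idea, not the authors' implementation.  It assumes a friendship probability of the form a / (b + d) for distance d in miles, where `A` and `B` are made-up illustrative constants (the small offset keeps the probability finite at zero distance), and it scores candidates using only the user's friends, ignoring any term for people who are not friends.

```python
import math

EARTH_RADIUS_MILES = 3958.8
A, B = 0.002, 0.2  # hypothetical constants, NOT the fitted values from the paper


def miles_between(p, q):
    """Great-circle (haversine) distance between two (lat, lon) points in miles."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, h)))


def friendship_probability(distance_miles):
    """Power law with exponent -1 (assumed form: a / (b + d))."""
    return A / (B + distance_miles)


def predict_location(friend_locations):
    """Return the friend location that maximizes the log likelihood."""
    if not friend_locations:
        raise ValueError("need at least one friend with a known location")

    def log_likelihood(candidate):
        return sum(math.log(friendship_probability(miles_between(candidate, f)))
                   for f in friend_locations)

    # Only the friends' own locations are tried as candidates.
    return max(friend_locations, key=log_likelihood)


# Made-up example: three friends around San Francisco Bay, one in New York.
friends = [(37.77, -122.42), (37.80, -122.27), (37.44, -122.14), (40.71, -74.01)]
print(predict_location(friends))
```

Because only friend locations are evaluated, a user with k locatable friends needs k candidate evaluations instead of a search over the whole map.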

To compare performance, the authors tried two approaches: (1) predicting the known address of each of the 2.9M users in turn, while leaving the other users' locations in the graph for the prediction, and (2) taking some users' locations completely out of the graph.

Figure 11 shows the results of (1).


Figure 12 shows the results from (2).  Interestingly, this approach uses iterative collective inferencing, with 3 iterations being optimal.



Lars Backstrom, Eric Sun, Cameron Marlow.  Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity.  WWW 2010, April 26–30, 2010, Raleigh, North Carolina, USA.