IBM researchers use social groups to predict churn (again)

Here is another data mining study showing how churn can be predicted using social network analysis. The main finding is that the off-net status of social-group leaders is a useful predictor of churn. The diagram below shows that the social strength of the group leader is correlated with an increased probability of churn.

[Figure: social strength of the group leader vs. probability of churn]

Lifts up to 8x were reported for the most valuable customers.

The main application here is to use social network analysis software such as Sonamine to identify groups and their leaders, and to add these as input predictors to existing churn prediction models.

Details

The paper is quite long and so we confine our discussion here to the parts where churn was predicted using social group analysis.

The data came from a mobile operator and covered 28 days of call records: a daily volume of 117M calls, 28M subscribers (16M on-net), and 800,000 subscribers who churned during the 28 days.

The overall method consists of five steps: (1) use call detail records to cluster users into tightly knit social groups; (2) identify specific characteristics of each group and use them to model churn of the group; (3) use the group characteristics to predict group churn in the hold-out validation and test samples; (4) use the rank of each member within the group to build an individual model of the probability of churning; and (5) combine the group churn score and the individual score by multiplying them together.

Users are grouped into social groups by first calculating the overlap between the neighbors of each pair of callers. The idea is that if two people call many of the same people, they are probably in the same social group. The network is then partitioned using fast clustering algorithms. (Note: Sonamine software has such a fast clustering algorithm, linear in the number of edges. Try it out yourself.)
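The exact overlap measure and partitioning algorithm are not spelled out here, so the following is only a minimal Python sketch under stated assumptions: the "overlap" is taken to be the Jaccard similarity of closed neighbor sets (a user plus everyone they call or are called by), computed only for directly connected pairs, and weighted label propagation stands in for the unnamed "fast algorithms". All function names are hypothetical.

```python
from collections import defaultdict

def build_neighbors(calls):
    """Undirected neighbor sets from (caller, callee) pairs in the call records."""
    nbrs = defaultdict(set)
    for a, b in calls:
        if a != b:
            nbrs[a].add(b)
            nbrs[b].add(a)
    return nbrs

def overlap_weights(nbrs):
    """Jaccard overlap of closed neighbor sets, only for directly connected pairs (assumption)."""
    weights = {}
    for a in nbrs:
        for b in nbrs[a]:
            if a < b:
                na, nb = nbrs[a] | {a}, nbrs[b] | {b}
                weights[(a, b)] = len(na & nb) / len(na | nb)
    return weights

def label_propagation(nbrs, weights, max_iter=20):
    """Weighted label propagation; a stand-in for the paper's partitioning step."""
    labels = {u: u for u in nbrs}
    for _ in range(max_iter):
        changed = False
        for u in sorted(nbrs):
            # Sum edge weights per neighbor label and adopt the heaviest label.
            score = defaultdict(float)
            for v in nbrs[u]:
                score[labels[v]] += weights[(min(u, v), max(u, v))]
            best = max(score.values())
            candidates = sorted(lab for lab, s in score.items() if s == best)
            if labels[u] not in candidates:  # keep the current label on ties
                labels[u] = candidates[0]
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for u, lab in labels.items():
        groups[lab].add(u)
    return list(groups.values())

# Toy example: two triangles joined by a single call between c and d.
calls = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
nbrs = build_neighbors(calls)
weights = overlap_weights(nbrs)
groups = label_propagation(nbrs, weights)
print(sorted(sorted(g) for g in groups))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The bridge call between c and d gets a low overlap weight (1/3) because the two users share few contacts, so the triangles stay separate groups. Label propagation is used only because it is simple and roughly linear in the number of edges per pass.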

A group is labeled as churning if more than one third of its members churned. Several group variables were then calculated, including size, number of on-net members, percentage of on-net members, social strength (max, min, ratio), calls made and received by the leader, and the average number of calls made and received. The leader is identified using PageRank with a damping factor of 0.85. (Note: Sonamine software has PageRank and other “leader identification” algorithms. Try it out yourself.)
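A minimal sketch of the group labeling rule and leader identification follows. It assumes a directed graph weighted by call counts in which PageRank flows from caller to callee (both the direction and the weighting are assumptions) and uses plain power iteration with damping 0.85. Only a few of the group variables above are computed; `group_features` and its dictionary keys are hypothetical names.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, max_iter=100, tol=1e-10):
    """Power-iteration PageRank on {(caller, callee): n_calls}; rank flows caller -> callee."""
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    if not nodes:
        return {}
    n = len(nodes)
    out_w = defaultdict(float)
    for (u, _), w in edges.items():
        out_w[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Teleport term plus mass from users with no outgoing calls, spread evenly.
        dangling = sum(rank[u] for u in nodes if out_w[u] == 0)
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for (u, v), w in edges.items():
            new[v] += damping * rank[u] * w / out_w[u]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank

def group_features(members, edges, on_net, churned):
    """A few of the group variables; key names are hypothetical."""
    inner = {(u, v): w for (u, v), w in edges.items() if u in members and v in members}
    ranks = pagerank(inner)
    leader = max(sorted(ranks), key=ranks.get) if ranks else None  # ties: smallest user ID
    n_on = len(members & on_net)
    return {
        "size": len(members),
        "n_on_net": n_on,
        "pct_on_net": n_on / len(members),
        "leader": leader,
        "leader_off_net": leader is not None and leader not in on_net,
        # Group churn label: more than one third of members churned.
        "churn_label": len(members & churned) > len(members) / 3,
    }

edges = {("b", "a"): 5, ("c", "a"): 5, ("d", "a"): 5, ("a", "b"): 1}
feats = group_features({"a", "b", "c", "d"}, edges, on_net={"a", "b"}, churned={"c", "d"})
print(feats["leader"], feats["churn_label"])  # a True
```

Here user a receives most of the group's calls, so it becomes the leader; two of the four members churned, which exceeds one third, so the group is labeled as churning.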

A decision tree was used to build a model that predicts group-churn based on the group’s input predictors.
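The tree settings are not given in this summary. As an illustration only, here is a small standard-library CART-style tree (Gini impurity, depth-limited) whose leaves store the fraction of churning groups, which can be read as a group churn score. In practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would be the usual choice; the toy rows below are made up, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(rows, labels, features, depth=0, max_depth=3, min_leaf=1):
    """rows: list of dicts of numeric group features; labels: 0/1 group churn labels."""
    p = sum(labels) / len(labels)
    best = None  # (weighted impurity, feature, threshold)
    if depth < max_depth and 0 < p < 1:
        for f in features:
            values = sorted({r[f] for r in rows})
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2  # candidate threshold between adjacent values
                left = [y for r, y in zip(rows, labels) if r[f] <= t]
                right = [y for r, y in zip(rows, labels) if r[f] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                imp = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
                if best is None or imp < best[0]:
                    best = (imp, f, t)
    if best is None or best[0] >= gini(labels):
        return {"leaf": p}  # leaf value = share of churning groups = group churn score
    _, f, t = best
    go_left = [r[f] <= t for r in rows]

    def subset(keep):
        idx = [i for i, g in enumerate(go_left) if g == keep]
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(*subset(True), features, depth + 1, max_depth, min_leaf),
        "right": build_tree(*subset(False), features, depth + 1, max_depth, min_leaf),
    }

def predict(tree, row):
    """Walk the tree to a leaf and return its group churn score."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Made-up toy groups: (features, churn label).
rows = [
    {"pct_on_net": 0.9, "leader_off_net": 0},
    {"pct_on_net": 0.8, "leader_off_net": 0},
    {"pct_on_net": 0.7, "leader_off_net": 1},
    {"pct_on_net": 0.3, "leader_off_net": 1},
    {"pct_on_net": 0.2, "leader_off_net": 1},
]
labels = [0, 0, 1, 1, 1]
tree = build_tree(rows, labels, ["pct_on_net", "leader_off_net"])
print(predict(tree, {"pct_on_net": 0.5, "leader_off_net": 1}))  # 1.0
```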

For the individual model, an exponentially decaying function was fitted to two variables: rank within the group (based on PageRank) and probability of churn. That is, for all users with rank 2, divide the number who churned by the total number of rank-2 users to get the probability of churn; do the same for every other rank, plot the results, and fit an exponential function. The function used in the paper was

$$\text{score} = 0.85^{\,\text{rank}-1}$$

Use this function to get the probability of churn for any rank; for a user at rank 2, the value is 0.85.
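The fitting procedure is not specified beyond "fit an exponential function", so this sketch assumes an ordinary least-squares fit of the log churn rate against rank minus 1, giving rate ≈ a · b^(rank − 1). The `(rank, churned)` input format, the function names, and the synthetic counts are all hypothetical.

```python
import math
from collections import defaultdict

def churn_rate_by_rank(members):
    """members: iterable of (rank, churned) pairs; returns {rank: churned / total}."""
    totals, churned = defaultdict(int), defaultdict(int)
    for rank, did_churn in members:
        totals[rank] += 1
        churned[rank] += int(did_churn)
    return {r: churned[r] / totals[r] for r in sorted(totals)}

def fit_exponential_decay(rates):
    """Least-squares fit of log(rate) = log(a) + (rank - 1) * log(b).

    Needs at least two ranks with a nonzero churn rate.
    """
    pts = [(r - 1, math.log(p)) for r, p in rates.items() if p > 0]  # log(0) is undefined
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    return math.exp(my - slope * mx), math.exp(slope)

# Synthetic counts built so that rate = 0.4 * 0.85 ** (rank - 1) exactly.
counts = {1: (40, 100), 2: (34, 100), 3: (289, 1000), 4: (24565, 100000)}
members = (
    [(r, 1) for r, (c, t) in counts.items() for _ in range(c)]
    + [(r, 0) for r, (c, t) in counts.items() for _ in range(t - c)]
)
rates = churn_rate_by_rank(members)
a, b = fit_exponential_decay(rates)
print(round(a, 4), round(b, 4))  # 0.4 0.85
```

Dividing every rate by the rank-1 rate before fitting would give a curve that starts at 1 for the leader, which matches the shape of the 0.85 decay above.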

Finally, to get an individual score, multiply the group score by the individual probability of churn. So if a user is rank 2 in the group and the group has a 25% chance of churning, multiply the two numbers together: 85% × 25% = 21.25%.
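The combination step is simple arithmetic. This sketch assumes ranks come from sorting group members by PageRank score (highest first, ties broken by user ID, an assumption) and applies a decay of 0.85 per rank step, so rank 2 gets 0.85 as in the example. The function names and scores are hypothetical.

```python
def ranks_within_group(pagerank_scores):
    """Map each member to a 1-based rank: highest PageRank first, ties by user ID."""
    order = sorted(pagerank_scores, key=lambda u: (-pagerank_scores[u], u))
    return {u: i + 1 for i, u in enumerate(order)}

def individual_score(group_score, rank, decay=0.85):
    """Individual churn score = group churn score * decay ** (rank - 1)."""
    if rank < 1:
        raise ValueError("rank starts at 1 (the group leader)")
    return group_score * decay ** (rank - 1)

# Hypothetical group with a 25% group churn score.
ranks = ranks_within_group({"a": 0.48, "b": 0.45, "c": 0.04, "d": 0.04})
scores = {u: individual_score(0.25, r) for u, r in ranks.items()}
print(ranks)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
print(round(scores["b"], 4))  # 0.2125
```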

References

Yossi Richter, Elad Yom-Tov, Noam Slonim. Predicting customer churn in mobile networks through analysis of social groups. In Proceedings of the SIAM International Conference on Data Mining (SDM), pp. 732–741, SIAM, 2010. The authors are at the IBM Haifa Research Lab.

Pursway (previously known as Datanetis) was mentioned in this article.

