Another telecom churn study proves effectiveness of SNA

Last week at the Predictive Analytics Show in San Francisco, Michael Driscoll unveiled another great study of how social network analysis (SNA) can help predict churn at a telecom.  Michael runs a consulting firm called Dataspora; he graduated from Harvard and has a PhD in Bioinformatics from Boston University.  (Disclaimer: Michael and I overlapped in college and we share common interests in large data predictive analytics.)

Here are some details given about the study: a North American mobile operator; a data slice with several million customers in three major regional markets; data from May to August 2009; the slice represented only about 10% of the total subscriber base.  The modeling was done in R, and most of the queries were submitted to a Greenplum database owned by the operator.

One finding of interest is that a subscriber with a churner in their social group was twice as likely to churn as someone who did not have a churner in their social group.  This finding is fairly consistent with those reported by vendors such as Pursway, Idiro and Xtract.  One question that naturally arises is whether the number of churners in your social group affects this increased tendency to churn.  Is there a threshold cut-off?
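To make the threshold question concrete, here is a minimal Python sketch of how one might tabulate churn rate against the number of churners in a subscriber's group.  This is not the method used in the study.  It assumes the "social group" is simply a subscriber's direct contacts, and the function name and inputs (`edges`, `prior_churners`, `later_churners`) are made up for illustration.

```python
from collections import defaultdict

def churn_rate_by_exposure(edges, prior_churners, later_churners):
    """Churn rate of remaining subscribers, keyed by how many of their
    direct contacts churned in an earlier window.

    edges: iterable of (a, b) contact pairs, treated as undirected
    prior_churners: set of ids that churned in the earlier window
    later_churners: set of ids that churned in the following window
    """
    # Assumption: social group = direct contacts in the call graph.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    totals = defaultdict(int)
    churned = defaultdict(int)
    for sub, nbrs in neighbors.items():
        if sub in prior_churners:
            continue  # already gone, so not at risk in the later window
        k = len(nbrs & prior_churners)  # number of churner contacts
        totals[k] += 1
        if sub in later_churners:
            churned[k] += 1

    return {k: churned[k] / totals[k] for k in sorted(totals)}
```

Dividing the rate for one or more churner contacts by the rate for zero gives the kind of "twice as likely" ratio quoted above, and reading the rates for k = 1, 2, 3 and so on shows whether there is a cut-off.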

It also turned out that if there were churner-churner relationships within the subscriber’s social group, that subscriber was 7x as likely to churn as the overall base rate.  Again this makes intuitive sense: some amount of peer pressure will lead one to churn.

One thing that this study did not do was compare the SNA results against the churn prediction model built by the operator’s internal data mining team.  Unless there are ways to integrate SNA findings with existing models, this is going to be a major impediment to getting SNA into production at telecoms.  (Sonamine plug: our software generates the SNA variables to be included as predictors in existing churn models, download evaluation edition here).
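As a rough illustration of what "SNA variables as predictors" can look like, the Python sketch below turns a contact graph into one row of features per subscriber, ready to be joined onto an existing churn model's input table by subscriber id.  It is not Sonamine's software or the study's code.  The feature names are hypothetical, and the social group is again assumed to be a subscriber's direct contacts.

```python
from collections import defaultdict
from itertools import combinations

def sna_features(edges, churners):
    """Return {subscriber_id: feature dict} from undirected contact pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    rows = {}
    for sub, nbrs in neighbors.items():
        churner_nbrs = nbrs & churners
        # Ties between two churners who are both in this subscriber's group.
        cc_ties = sum(
            1 for u, v in combinations(churner_nbrs, 2) if v in neighbors[u]
        )
        rows[sub] = {
            "degree": len(nbrs),
            "churner_neighbors": len(churner_nbrs),
            "churner_share": len(churner_nbrs) / len(nbrs),
            "churner_churner_ties": cc_ties,
        }
    return rows
```

Because the output is just extra columns keyed by subscriber, the existing model can be refit with and without them, which is the comparison the study left out.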

One other question raised was the scalability of this analysis.  Since it was only conducted for about 10% of the subscriber base and took about 1 hour to run, does this mean it would take 10 hours to complete for the full base (assuming run time grows linearly with the number of subscribers)?  This is going to be a really serious challenge.  The flip answer here is that Greenplum would parallelize things well.  Note however that much SNA work is iterative and does not do well in SQL.  The basic going rate for a Greenplum database is in the hundreds of thousands of dollars, and the same applies to other relational database systems such as Netezza.  In other words, is it worth half a million dollars or more to start with SNA analysis?  And how many DBAs do you know that would be thrilled to have another database in the IT environment?  All in all, it’s a difficult challenge to productionize this SNA into standard operational mode with weekly refreshes.
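To show what "iterative" means here, the following Python sketch finds connected groups of subscribers by repeatedly pushing the smallest id across every contact pair until nothing changes.  It is a generic textbook-style example chosen for illustration, not something taken from the study, and the function name is made up.  In a SQL setting each pass would roughly correspond to another join over the whole edge table, and the number of passes is not known in advance.

```python
def propagate_min_label(edges, max_passes=100):
    """Label each node with the smallest id reachable from it.

    Returns (labels, passes). Labels are updated in place during a pass,
    and every pass scans the full edge list once.
    """
    labels = {}
    for a, b in edges:
        labels.setdefault(a, a)
        labels.setdefault(b, b)

    passes = 0
    changed = True
    while changed and passes < max_passes:
        changed = False
        passes += 1
        for a, b in edges:
            low = min(labels[a], labels[b])
            if labels[a] != low or labels[b] != low:
                labels[a] = labels[b] = low
                changed = True
    return labels, passes
```

The final pass only confirms that nothing changed, so even a small graph needs at least two full scans of the edges, and long chains of contacts can need many more.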

Where there are challenges, there are opportunities.  Stay tuned to Sonamine’s answers to these problems.
