Influencer marketing for data miners (1)

August 25th, 2010 by Nick

Influencer marketing is pretty hot these days; companies such as Pursway, Influencer50, idiro and xtract are all players in this field. They all suggest that marketing can be improved by targeting messages to influential customers, who then spread the word to their networks of friends and colleagues. The concept is intuitive and easy to grasp.

The other undercurrent in marketing these days is the use of data and measurement. Almost everything is going digital, and hence can be measured, reported and, hopefully, “predicted”. CMOs want to allocate their marketing budgets optimally, and to do this they need to measure the effect of each marketing program to see whether the money being spent is working.

These two trends, influencer marketing and data mining, merge in the job of the marketing data analyst or data miner. Questions arise such as: how does influencer marketing perform compared with traditional direct or email marketing? What is the cost-per-sale or cost-per-acquisition of each of these programs?

The first thing to tackle is how to measure the effect of influencer marketing. It is very difficult to determine for sure that influencer A caused customer B to buy the product. It is similarly difficult to tease out other confounding factors, such as advertising, word-of-mouth from “non-influencers”, email promotions, retail-level coupons and other marketing campaigns.

One simple way to quantify influence is to make some assumptions. If customer B bought a product within 1 month of influencer A buying the same product, and we know that A and B are “linked”, then we attribute customer B’s purchase to influencer A. In other words, A gets a value of 1 because of B. Influencer A will have a large value when many of her “linked” friends purchase the same product within a month of her. This is a “first-order” approximation.
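A minimal sketch of this first-order attribution rule, assuming purchases are given as one date per customer, links as undirected pairs, and "1 month" as 30 days. The names `influence_scores`, `purchases` and `links` are hypothetical; same-day purchases are not credited in this sketch, which is also an assumption.

```python
from datetime import date

def influence_scores(purchases, links, window_days=30):
    """Count, for each influencer, linked customers who bought within
    window_days AFTER the influencer did (first-order attribution).

    purchases: dict customer -> purchase date of the product
    links: iterable of (a, b) pairs of "linked" customers (undirected)
    """
    scores = {}
    for a, b in links:
        # Either end of a link could be the influencer, so check both directions.
        for influencer, follower in ((a, b), (b, a)):
            if influencer in purchases and follower in purchases:
                lag = (purchases[follower] - purchases[influencer]).days
                # Assumption: same-day purchases (lag 0) earn no credit.
                if 0 < lag <= window_days:
                    scores[influencer] = scores.get(influencer, 0) + 1
    return scores

purchases = {"A": date(2010, 8, 1), "B": date(2010, 8, 15),
             "C": date(2010, 8, 20), "K": date(2010, 7, 1)}
links = [("A", "B"), ("A", "C"), ("K", "B")]
print(influence_scores(purchases, links))  # {'A': 2}
```

K gets no credit for B here because B bought 45 days after K, which is outside the 30-day window.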

Already, we can hear the objections. How do we “know” that A and B discussed the product in question? We don’t. Why one month and not 12 months? We don’t really know the timescale of influence, so this is probably a matter of trial and error. What if customer B also received an email promotion? Another good question; perhaps we can discount the influencer effect by 1/2 in that case. It’s not perfect, but for influencer marketing to be analyzed quantitatively, some trade-offs will have to be made.
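The trade-offs above can be made explicit as tunable parameters. The sketch below assumes a per-pair credit function (the name `pair_credit` is hypothetical) that halves the credit when the follower also received an email promotion, and lets the window length be varied during trial and error.

```python
def pair_credit(lag_days, follower_got_email, window_days=30, email_discount=0.5):
    """Credit one influencer receives for one linked follower's purchase.

    lag_days: days from the influencer's purchase to the follower's purchase
    follower_got_email: True if the follower also received an email promotion
    window_days, email_discount: tunable assumptions, not known constants
    """
    if not 0 < lag_days <= window_days:
        return 0.0
    # Discount the credit when a competing campaign may also explain the sale.
    return email_discount if follower_got_email else 1.0

# Trial and error on the timescale: a purchase 45 days later only counts
# once the window is widened beyond one month.
for window in (30, 90, 365):
    print(window, pair_credit(45, False, window_days=window))
```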

Now that there is a “metric” that defines the actual impact of an influencer (A influenced 10 purchases, but K only influenced 1 purchase), you can at least start to identify which of your customers are influencers!

Next post, we’ll discuss how to use traditional data mining and predictive modeling to score these influencers, so that you can target them!

IBM researchers use social groups to predict churn (again)

July 22nd, 2010 by Nick

Here is another data mining study showing how churn can be predicted using social network analysis. The main finding is that the off-net status of the leaders of social groups is a useful predictor of churn. The diagram below shows that the social strength of the leader (see Details for how this is calculated) is correlated with an increase in the probability of churn.

[Figure: probability of churn vs. social strength of the group leader]

Lifts up to 8x were reported for the most valuable customers.

The main application is to use social network analysis software such as Sonamine to identify groups and their leaders, and to add these as input predictors to existing churn prediction models.

Details

The paper is quite long and so we confine our discussion here to the parts where churn was predicted using social group analysis.

The data came from a mobile operator and covered 28 days of calls: a daily volume of 117M calls among 28M subscribers, 16M of them on-net. 800,000 subscribers churned during the 28 days.

The overall method consists of five steps: (1) use call data records to cluster users into tightly knit social groups; (2) identify specific characteristics of each group and use them to model churn of the group; (3) use the group characteristics to predict group churn in the hold-out validation and test samples; (4) use the rank of each member within the group to build an individual model of the probability of churning; (5) combine the group churn score and the individual score by multiplying them together.

Users are grouped into social groups by first calculating the overlap between the neighbors of each pair of callers. The idea is that if two people call many of the same people, they probably belong to the same social group. The network is then partitioned using fast algorithms. (Note: Sonamine software has such a fast clustering algorithm, which is linear in the number of edges. Try it out yourself.)
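To make the neighbor-overlap idea concrete, here is a small sketch. It assumes Jaccard overlap on closed neighborhoods (each caller plus the people they call) and uses a hypothetical threshold followed by union-find connected components as a simple stand-in for the paper's fast partitioning. Neither choice is taken from the paper. Only pairs that fall within some caller's closed neighborhood are compared, so an all-pairs comparison is avoided.

```python
from itertools import combinations

def overlap(adj, u, v):
    """Jaccard overlap of u's and v's closed call neighborhoods (assumed measure)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)

def social_groups(adj, threshold=0.5):
    """Join callers whose overlap exceeds a hypothetical threshold, then return
    the connected components found with union-find."""
    parent = {u: u for u in adj}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    seen = set()
    for w in adj:
        # Candidate pairs: callers inside w's closed neighborhood.
        for u, v in combinations(sorted(adj[w] | {w}), 2):
            if (u, v) in seen:
                continue
            seen.add((u, v))
            if overlap(adj, u, v) > threshold:
                parent[find(u)] = find(v)

    groups = {}
    for u in adj:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())

# Two tight triangles joined by a single call between c and d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
print(sorted(sorted(g) for g in social_groups(adj)))
```

The bridge call between c and d has an overlap of only 2/6, so it does not merge the two groups.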

A group is labeled as churned if more than one third of its members churned. Several group variables were then calculated, including size, # on-net, % on-net, social strength (max, min, ratio), calls made and received by the leader, and the average number of calls made and received. The leader is identified using PageRank with a damping factor of 0.85. (Note: Sonamine software has PageRank and other “leader identification” algorithms. Try it out yourself.)
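A minimal sketch of leader identification and group labeling follows. It runs power-iteration PageRank with the 0.85 damping factor mentioned above on an unweighted, directed caller-to-callee graph (the paper may weight edges, for example by call volume). It also applies the "more than one third churned" group label. All function and variable names are hypothetical.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank; adj maps each node to the set of nodes it calls."""
    nodes = list(adj)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = adj[u]
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

def group_churned(members, churned):
    """Label a group as churned if more than one third of its members churned."""
    return sum(m in churned for m in members) > len(members) / 3

calls = {"leader": {"a"}, "a": {"leader"}, "b": {"leader"}, "c": {"leader"}}
pr = pagerank(calls)
ranking = sorted(pr, key=pr.get, reverse=True)  # position 0 is rank 1, the leader
print(ranking[0])                              # leader
print(group_churned(set(calls), {"b", "c"}))   # True
```

The sorted `ranking` also gives each member's rank within the group, which is the input to the individual model described next.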

A decision tree was used to build a model that predicts group churn from the group’s input predictors.

For the individual model, an exponentially decaying function was fitted to two variables: rank within the group (based on PageRank) and the probability of churn. That is, for all users with rank 2, divide the number who churned by the total number of rank-2 users to get the probability of churn for that rank. Do the same for every other rank, plot the probabilities against rank, and fit an exponential function. The function used in the paper was score = 0.85^(rank-1). Use this function to get the churn score for any user: if the user has rank 2, the value is 0.85.
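The procedure above can be sketched as follows: compute the empirical churn rate at each rank, then fit p = a * b^(rank-1) by least squares on log(p). This log-linear least-squares fit is an assumed choice, since the paper's fitting method is not described here. Dividing the fitted curve by `a` normalizes it so that rank 1 scores 1, which matches the form 0.85^(rank-1).

```python
import math
from collections import defaultdict

def churn_rate_by_rank(members):
    """members: iterable of (rank_in_group, churned_bool). Returns rank -> churn rate."""
    totals, churns = defaultdict(int), defaultdict(int)
    for rank, churned in members:
        totals[rank] += 1
        churns[rank] += churned  # True counts as 1
    return {r: churns[r] / totals[r] for r in sorted(totals)}

def fit_decay(rates):
    """Fit p = a * b**(rank - 1) by least squares on log(p).

    Ranks with a zero churn rate are skipped; at least two distinct ranks are needed.
    Returns (a, b).
    """
    pts = [(r - 1, math.log(p)) for r, p in rates.items() if p > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    intercept = my - slope * mx
    return math.exp(intercept), math.exp(slope)

members = [(1, True), (1, False), (2, True), (2, False), (2, False), (2, False)]
print(churn_rate_by_rank(members))  # {1: 0.5, 2: 0.25}
print(fit_decay({r: 0.85 ** (r - 1) for r in range(1, 7)}))  # approximately (1.0, 0.85)
```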

Finally, to get an individual score, multiply the group churn score by the individual score. So if a user has rank 2 in the group and the group has a 25% chance of churning, multiply the two numbers together: 85% x 25% = 21.25%.
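The combination step is a single multiplication. This sketch uses the decay form score = 0.85^(rank-1) from above; the function name is hypothetical.

```python
def individual_churn_score(group_churn_prob, rank, decay=0.85):
    """Group churn score times the rank-based individual score decay**(rank - 1)."""
    return group_churn_prob * decay ** (rank - 1)

print(individual_churn_score(0.25, 2))  # 0.2125, i.e. the 21.25% worked example
```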

References

Yossi Richter, Elad Yom-Tov, Noam Slonim. Predicting customer churn in mobile networks through analysis of social groups. Proceedings of the SIAM International Conference on Data Mining (SDM), pp. 732-741, SIAM, 2010. The authors are at the IBM Haifa Research Lab.

Pursway (previously known as Datanetis) was mentioned in this article.

Browsing graphs improve online advertising targeting, with lifts between 1.8 and 6.0

July 12th, 2010 by Nick

So what? Overall business results

In this research, the authors provide convincing evidence that graph mining allows brands to identify better audience targets for display advertising. The basic idea is that if a brand already knows some users (cookies) who are customers or brand advocates, then the “social network” of those users is a better target audience for the brand. The social network of these “seed users” can be found using online browsing data. The findings are robust across multiple industries.

This improved cookie-level audience identification allows brands to lower the cost of online advertising while reaching the audience most likely to take a brand action.

Contact us to learn how the Sonamine graph mining platform can help you.

Details

The process has three steps: (1) construct a network graph of cookies based on their visits to infrequently visited pages; (2) to identify a target brand audience, identify a seed list of cookies that have an affinity to that brand; (3) use the network graph to identify which other cookies are similar or close to the seeds from step (2).
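Step (1) can be sketched as follows. Group log records by URL, drop pages that too many cookies visit, and link every pair of cookies that share a remaining page. The cutoff `max_page_visitors` that defines "infrequently visited" is a hypothetical parameter, and the URLs are made up.

```python
from collections import defaultdict
from itertools import combinations

def cookie_graph(visits, max_page_visitors=50):
    """Link cookies that visited the same infrequently visited page.

    visits: iterable of (cookie_id, url) pairs from browsing logs
    max_page_visitors: hypothetical cutoff for an "infrequently visited" page
    Returns dict (cookie_a, cookie_b) -> number of shared pages, with a < b.
    """
    visitors = defaultdict(set)
    for cookie, url in visits:
        visitors[url].add(cookie)
    edges = defaultdict(int)
    for url, cookies in visitors.items():
        if len(cookies) > max_page_visitors:
            continue  # popular pages say little about shared interests
        for a, b in combinations(sorted(cookies), 2):
            edges[(a, b)] += 1
    return dict(edges)

visits = [("c1", "/forum/rare-sneakers"), ("c2", "/forum/rare-sneakers"),
          ("c1", "/home"), ("c2", "/home"), ("c3", "/home")]
print(cookie_graph(visits, max_page_visitors=2))  # {('c1', 'c2'): 1}
```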

The data was sampled from browser logs of visits to user-generated content (UGC) sites over a 90-day period, yielding 10M unique cookies. The entire page browsing history of these 10M cookies was then extracted from the 90-day dataset. A link is created between two cookies if they were observed to visit the same UGC URL. For the second step, brand actors are defined as those browsers (here meaning people who viewed the page, not a web browser like Firefox or Internet Explorer) observed to have visited a particular brand-oriented page selected by the advertiser, e.g., a brand loyalty page, a customer login landing page, a purchase thank-you page, or simply the company’s home page. These brand actors are the seed nodes. In the experiments, seed lists ranged from 5,000 to 1M nodes, with an average of 100,000. Five different proximity (closeness) measures were used:

- POSCNT – the number of unique pages (content) that link the target browser to the seeds in its immediate neighborhood.

- MATL – over all links between the browser and the seed nodes, the maximum number of UGC pages behind any single link.

- maxCos – the maximum of the cosine similarity between the browser and any seed node.

- minEUD – the minimum Euclidean distance between the normalized content vector of a candidate node and that of any seed node.

- ATODD – the ratio of the number of a browser’s neighbors that are seed nodes to the number of its neighbors that are not seed nodes.
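Four of these measures are sketched below, based on a reading of the definitions above rather than the paper's code. Content vectors are assumed to be dicts mapping URL to visit count, and graph neighbors are assumed to be sets. MATL is left out because its definition depends on how link weights are stored. All names and data are hypothetical.

```python
import math

def cosine(x, y):
    dot = sum(v * y.get(k, 0) for k, v in x.items())
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0

def normalize(x):
    n = math.sqrt(sum(v * v for v in x.values()))
    return {k: v / n for k, v in x.items()} if n else dict(x)

def euclid(x, y):
    return math.sqrt(sum((x.get(k, 0) - y.get(k, 0)) ** 2 for k in x.keys() | y.keys()))

def poscnt(candidate, seeds, vectors):
    """Unique pages the candidate shares with any seed."""
    seed_pages = set().union(*(vectors[s].keys() for s in seeds))
    return len(vectors[candidate].keys() & seed_pages)

def max_cos(candidate, seeds, vectors):
    return max(cosine(vectors[candidate], vectors[s]) for s in seeds)

def min_eud(candidate, seeds, vectors):
    c = normalize(vectors[candidate])
    return min(euclid(c, normalize(vectors[s])) for s in seeds)

def atodd(candidate, seeds, neighbors):
    """Seed neighbors divided by non-seed neighbors."""
    nbrs = neighbors[candidate]
    seed_n = sum(1 for v in nbrs if v in seeds)
    non_seed = len(nbrs) - seed_n
    if non_seed == 0:
        return float("inf") if seed_n else 0.0
    return seed_n / non_seed

vectors = {"s1": {"p1": 1, "p2": 1}, "s2": {"p3": 1},
           "u": {"p1": 1, "p3": 1, "p9": 1}, "w": {"p9": 1}}
neighbors = {"u": {"s1", "s2", "w"}, "w": {"u"}}
seeds = {"s1", "s2"}
for c in ("u", "w"):
    print(c, poscnt(c, seeds, vectors), round(max_cos(c, seeds, vectors), 3),
          round(min_eud(c, seeds, vectors), 3), atodd(c, seeds, neighbors))
```

Each measure yields one ranked list of candidates. Higher values are better for POSCNT, maxCos and ATODD, while lower values are better for minEUD.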

From these five measures, the researchers created five ranked lists for each seed list. Each list was evaluated by comparing the percentage of future “brand actors” in the general population with the percentage of future “brand actors” in the ranked list. An AUC comparison was used to test whether the ranked lists performed better than “random” lists.
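For readers unfamiliar with AUC on a ranked list, here is one standard way to compute it, though not necessarily the paper's exact evaluation. AUC is the probability that a randomly chosen future brand actor is scored above a randomly chosen non-actor, with ties counting half, so a random list scores about 0.5. The names are hypothetical.

```python
def auc(scores, future_actors):
    """scores: dict cookie -> proximity score; future_actors: set of cookies
    that later became brand actors. Pairwise (Mann-Whitney) form of AUC."""
    pos = [s for c, s in scores.items() if c in future_actors]
    neg = [s for c, s in scores.items() if c not in future_actors]
    if not pos or not neg:
        raise ValueError("need both future actors and non-actors")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

scores = {"c1": 0.9, "c2": 0.7, "c3": 0.4, "c4": 0.1}
print(auc(scores, {"c1", "c3"}))  # 0.75
```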

This analysis was repeated for different industries: hotels, apparel (sporting, women’s, hip-hop), electronics, auto insurance, cellphones, credit reports, modeling agencies…

Detailed Results

The table below shows the results of univariate comparison of the different closeness measures.

[Table: univariate comparison of the closeness measures]

All of the proximity measures performed significantly better than random across all industries.

However, the best-performing measure varied by industry. This led the researchers to combine the five measures in a multivariate model using logistic regression, which we will not discuss here.

References

Foster Provost, Brian Dalessandro, Rod Hook, Xiaohan Zhang, Alan Murray.  Audience Selection for On-line Brand Advertising: Privacy-friendly Social Network Targeting. KDD’09, June 28–July 1, 2009, Paris, France.

Personal social network drives choice of mobile operator

June 16th, 2010 by Nick

There is now a growing body of evidence that mobile consumers’ decisions are influenced by their social networks. Here is another recent research paper that shows how: “The probability that a customer selects a mobile phone company increases with the number of members of her social network already subscribed to that firm”. The findings are significant at the 0.01 level.

So what?

Although in hindsight this finding seems logical and intuitive, it is nonetheless good to have reproducible research backing up the intuition. For marketers, the implications are clear. In retention, you can use SNA to isolate the social network of each subscriber, then improve your churn predictions by adding SNA-derived variables such as the percentage of the subscriber’s network that is off-net.
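As an illustration of one such SNA-derived variable, the sketch below computes each on-net subscriber's off-net share from call records. It assumes calls are treated as undirected contacts and that every number outside the operator's subscriber set counts as off-net. The names and data are hypothetical.

```python
def offnet_share(calls, on_net):
    """Fraction of each on-net subscriber's contacts who are off-net.

    calls: iterable of (caller, callee) pairs from call records (undirected here)
    on_net: set of the operator's own subscribers
    """
    contacts = {}
    for a, b in calls:
        contacts.setdefault(a, set()).add(b)
        contacts.setdefault(b, set()).add(a)
    return {
        sub: sum(c not in on_net for c in nbrs) / len(nbrs)
        for sub, nbrs in contacts.items()
        if sub in on_net
    }

calls = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("bob", "carol")]
on_net = {"alice", "bob"}
print(offnet_share(calls, on_net))
```

Here alice has two off-net contacts out of three and bob has one out of two. The resulting column can be joined to the churn model's input table.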

In acquisition, where the goal is to get competitors’ subscribers to switch, marketers can use member-get-member promotions. The target campaign list in this case would be SNA-based: use Sonamine to find communities in your current calling base, then identify the communities where your subscribers make up the bulk of the members. These are the communities to target with member-get-member campaigns.
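Given communities that have already been found, the selection step might look like this sketch. "The bulk of the community" is read here as more than half of the members being on-net, which is a hypothetical `min_share` of 0.5. For each selected community, the sketch returns the on-net referrers and the off-net prospects.

```python
def member_get_member_targets(communities, on_net, min_share=0.5):
    """communities: dict community_id -> set of members (precomputed).
    Returns (community_id, referrers, prospects) for communities whose on-net
    share exceeds min_share and that still contain off-net prospects."""
    targets = []
    for cid, members in communities.items():
        referrers = sorted(m for m in members if m in on_net)
        prospects = sorted(m for m in members if m not in on_net)
        if prospects and len(referrers) / len(members) > min_share:
            targets.append((cid, referrers, prospects))
    return targets

communities = {1: {"a", "b", "c", "x"}, 2: {"d", "y", "z"}}
on_net = {"a", "b", "c", "d"}
print(member_get_member_targets(communities, on_net))  # [(1, ['a', 'b', 'c'], ['x'])]
```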

Caveats: this study was done in a saturated mobile market with over 100% penetration.  The results will likely be different in less saturated markets.

Details

The study was conducted on a sample of students at a Spanish university. The sample was fairly small (221 students), but each student answered a detailed questionnaire. This gave very accurate data on the percentage of each subscriber’s social network that were customers of each of the 3 big Spanish mobile operators (Orange, Movistar and Vodafone). The authors then built a model equation with variables representing the key drivers.

One intriguing finding is that postpaid customers were significantly more likely to switch operators than prepaid customers! (p<0.01)

Reference

Juan Pablo Maicas, Yolanda Polo, Francisco Javier Sese. The role of (personal) network effects and switching costs in determining mobile users’ choice. Journal of Information Technology (2009) 24, 160–171.

Facebook social network analysis predicts location better than IP address

June 11th, 2010 by Nick

Given the media coverage of location-based services (LBS), it is easy to forget that most of these services depend on accurately identifying the user’s location. LBS fall into two types: real-time services, which provide information about your current location, and account-level services, which provide information about your “home” base. For the latter, you cannot always use the IP address to locate the home base, because the user might be traveling. At the same time, IP-based geolocation is becoming less accurate because of larger Wi-Fi networks and 3G/4G connections.

With the technique described in this Facebook research paper, social network information can be used to accurately predict the location of users who have not provided it. Note that this is a collective inference technique that uses social connections as the prediction mechanism.

Telcos can use a similar algorithm to help identify the addresses of prepaid users; social networks can use it to improve the user experience by “predicting” users’ locations and tailoring the web service to them. Retailers can use it to predict the location of anonymous shoppers once their social network has been extracted. Fraud involving prepaid cash cards can be detected more easily by establishing the user’s home geography.

From a practical perspective, here are the recommendations from this study:

“Based on these results, it seems that a good trade off is to predict from friend locations when an individual has 5 or more locatable friends, and from the user’s IP address if she has fewer than 5 friends with known addresses. Doing this causes the performance at 100 miles to slightly exceed the IP performance, and it is almost as good as strictly friend-based prediction at smaller distances.”
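The recommended trade-off amounts to a simple fallback rule. This sketch assumes both predictions have already been computed elsewhere; the function and argument names are hypothetical, and the threshold of 5 comes from the quote.

```python
def choose_location(friend_prediction, ip_prediction, n_locatable_friends, min_friends=5):
    """Use the friend-based prediction when at least min_friends friends have
    known addresses; otherwise fall back to IP geolocation."""
    return friend_prediction if n_locatable_friends >= min_friends else ip_prediction

print(choose_location("Raleigh, NC", "Durham, NC", 7))  # Raleigh, NC
print(choose_location("Raleigh, NC", "Durham, NC", 3))  # Durham, NC
```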

Details

The dataset consisted of 3.5 million US Facebook users whose addresses were accurately geocoded to latitude and longitude using the TIGER/Line dataset. These users had an average of 10 friends, creating a graph with 30.6 million edges. 2.9M users had at least one friend with an accurate address. This user base was shown to have online habits and geographic locations similar to those of the overall population, i.e., the likelihood of sample bias was low.

The main technique relies on an empirical finding: the probability of friendship as a function of distance has a reasonably good power-law fit with an exponent of -1. This means that, given a user’s friendship links, you can calculate the likelihood of any one candidate location. Of course, it would be prohibitive to maximize the likelihood over ALL locations. The authors found empirically that the most likely location is usually co-located with one of the friends, so the algorithm only needs to calculate and compare likelihoods at the known friend locations. This optimization significantly reduces computation.
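The core idea can be sketched as a maximum-likelihood search over friend locations only, with p(d) proportional to (d + offset)^-1. This is a simplification. `offset_miles` is a hypothetical constant that keeps d = 0 finite, the constant factor of p(d) is dropped because it does not change the argmax, and the full model's term for non-friends is ignored. Distances use the haversine formula with a mean Earth radius of 3958.8 miles.

```python
import math

def miles_between(p, q):
    """Great-circle (haversine) distance in miles between (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))

def predict_home(friend_locations, offset_miles=0.2):
    """Return the friend location that maximizes sum(log p(d)) over all friends,
    where p(d) is proportional to (d + offset_miles) ** -1 (simplified model)."""
    def log_likelihood(candidate):
        return sum(-math.log(miles_between(candidate, f) + offset_miles)
                   for f in friend_locations)
    return max(friend_locations, key=log_likelihood)

# Three friends around Raleigh-Durham, one in San Francisco (illustrative data).
friends = [(35.78, -78.64), (35.99, -78.90), (35.79, -78.78), (37.77, -122.42)]
print(predict_home(friends))
```

Because only candidates at friend locations are scored, the cost is quadratic in the number of friends rather than in the number of possible locations.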

To compare performance, the authors tried two approaches: (1) predicting the known address of each of the 2.9M users, without removing other users’ locations when making each prediction, and (2) taking some users completely out of the graph.

Figure 11 shows the results of (1).

[Figure 11]

Figure 12 shows the results from (2). Interestingly, this uses iterative collective inference, with 3 iterations being optimal.
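To show what iterative collective inference means, here is a small sketch. On each pass, every user with an unknown location is assigned a location inferred from friends whose locations are currently known or were predicted in the previous pass. Predictions can therefore propagate further through the graph on each iteration. For brevity, this sketch replaces the paper's distance-based likelihood with a majority vote over friends' location labels; the names are hypothetical.

```python
from collections import Counter

def collective_infer(friends, known, iterations=3):
    """friends: dict user -> set of friends; known: dict user -> location label.
    Returns predicted locations for users not in `known`."""
    predicted = {}
    for _ in range(iterations):
        current = {**known, **predicted}  # known labels plus last pass's predictions
        updated = {}
        for user, nbrs in friends.items():
            if user in known:
                continue
            votes = Counter(current[f] for f in nbrs if f in current)
            if votes:
                updated[user] = votes.most_common(1)[0][0]
        predicted = updated
    return predicted

# A chain a - b - c - d where only a's location is known.
friends = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
known = {"a": "Boston"}
print(collective_infer(friends, known, iterations=1))  # {'b': 'Boston'}
print(collective_infer(friends, known, iterations=3))
```

With one pass, only b (a direct friend of a) gets a prediction. By the third pass, the label has reached d as well.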

[Figure 12]

References

Lars Backstrom, Eric Sun, Cameron Marlow.  Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity.  WWW 2010, April 26–30, 2010, Raleigh, North Carolina, USA.