Posts Tagged ‘predictive analytics’

Game developers obtain significant revenue improvements using Sonamine

Thursday, August 11th, 2011 by Nick

At Casual Connect in July, we presented two game developer case studies using Sonamine predictive segments.

Easymode, a Korean game developer, achieved 63% higher conversion rates by showing pop-up promotions to the Sonamine ConvertSoon segment.

Gamepoint, a developer of real-time multiplayer social games, achieved 160% higher conversion rates using the Sonamine ConvertSoon predictive segment.

The presentation can be found here.

Investigative SNA or large scale SNA? Part 1

Wednesday, January 19th, 2011 by Nick

At the recent Predictive Analytics World show, Karl Rexer presented the highlights from the annual data miner survey. In all, 735 data miners from 60 countries (mostly North America and Europe) responded. You can contact Karl directly for inquiries on this survey.

In terms of algorithms used, it was interesting to see that

  • 12% of the data miners mentioned using Social Network Analysis
  • 9% of the data miners mentioned using Link Analysis.

This means there are at least 88 data miners out there using social network analysis!  This is quite a large number and we are glad to see this.

However this finding also raises some questions:

  • how are they using SNA?
  • what tools are they using?
  • are the models using SNA in production?
  • are data miners exploring individual communities or looking at large scale modeling?

Unfortunately the data does not provide any clues to the answers.  But it does bring up some key differences between various SNA tools.  We’ll touch briefly on one key difference and expand on it in the next post.

There are two distinct flavors of SNA. One is investigative in nature and involves an analyst discovering and analyzing individual social networks. In this flavor of SNA, a single finding of fraud or criminal activity is more than sufficient to cover the cost of the software and the analyst’s time. Financial services firms and law enforcement agencies use this type of SNA. Vendors include i2 and SAS.

The second is a large scale backend version of SNA.  In this case, you cannot have analysts looking at individual users, for example telephone customers.  Rather you are trying to add SNA variables into predictive analytics and data mining.  In many cases, this large scale SNA can be used as a filter to help narrow down the cases for investigative SNA.  Sonamine is the only vendor in this space, providing large scale SNA for up to billions of nodes in the social network.

It is likely that the data miners in the survey were using large scale SNA to add SNA insights into their predictive models. In the next post, we will cover the differences between these two types and how customers should evaluate the key requirements.

Browsing graphs improve online advertising targeting, with lifts between 1.8 and 6.0

Monday, July 12th, 2010 by Nick

So what and overall business results

In this research, the authors provide convincing evidence that graph network mining allows brands to identify better audience targets for display advertising. The basic idea is that if a brand already knows some users/cookies who are customers or brand advocates, their “social network” is a better targeting audience for the brand. You can find the social network of these “seed users” using online browsing data. These findings are robust across multiple industries.

This improved cookie-level audience identification allows brands to lower the cost of online advertising while reaching the audience most likely to take a brand action.

Contact us to learn how the Sonamine graph mining platform can help you.


The process has three steps: (1) construct a network graph of cookies based on their visits to infrequently visited pages; (2) to identify a target brand audience, identify a seed list of cookies that have an affinity to that brand; (3) use the network graph to identify which other cookies are similar or close to the seeds from step (2).
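As a rough illustration of these three steps, here is a minimal Python sketch. It assumes two cookies are linked whenever they visited the same URL; the function name `build_cookie_graph`, the cookie IDs, and the `ugc.example` URLs are hypothetical and are not taken from the paper. It is a toy version for a handful of cookies, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_cookie_graph(visits):
    """Step 1: link two cookies if they visited the same URL.

    visits: iterable of (cookie_id, url) pairs (hypothetical input format).
    Returns {cookie: {neighbor: set of shared URLs}}.
    """
    cookies_by_url = defaultdict(set)
    for cookie, url in visits:
        cookies_by_url[url].add(cookie)

    graph = defaultdict(lambda: defaultdict(set))
    for url, cookies in cookies_by_url.items():
        # Every pair of cookies that saw this URL gets an undirected link.
        for a, b in combinations(sorted(cookies), 2):
            graph[a][b].add(url)
            graph[b][a].add(url)
    return graph


# Made-up browsing data for illustration only.
visits = [
    ("c1", "ugc.example/page-a"),
    ("c2", "ugc.example/page-a"),
    ("c2", "ugc.example/page-b"),
    ("c3", "ugc.example/page-b"),
    ("c4", "ugc.example/page-c"),
]
graph = build_cookie_graph(visits)

# Step 2: a seed list of cookies with known brand affinity (assumed here).
seeds = {"c1"}

# Step 3: cookies directly linked to any seed are the candidate audience.
candidates = {n for s in seeds if s in graph for n in graph[s]} - seeds
print(sorted(candidates))
```

At the scale described in the paper (millions of cookies), an all-pairs expansion per URL like this would be too expensive for popular pages, which is one reason restricting the graph to infrequently visited pages matters.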

The data was sampled from browser logs of visits to user-generated content (UGC) sites over a 90-day period, resulting in 10M unique cookies. The entire page browsing history of these 10M unique cookies was then extracted from the 90-day dataset. A link is created between two cookies if they were observed to have visited the same UGC URL.

For the second step, brand actors are defined to be those browsers (here meaning people who have viewed the page, not web browsers like Firefox or Internet Explorer) observed to have visited a particular brand-oriented page selected by the advertiser, e.g., a brand loyalty page, a customer login landing page, a purchase thank-you page, or simply the company’s home page. The seed nodes are these brand actors. In the experiments, seed lists varied in size from 5,000 to 1M nodes, with an average of 100,000.

Five different proximity (closeness) metrics were used:

- POSCNT – the number of unique pages (content) that link the target browser to the seeds in its immediate neighborhood.

- MATL – for each link between the browser and the seed nodes, identify the maximum number of UGC pages for each browser.

- maxCos – the maximum of the cosine similarity between the browser and any seed node.

- minEUD – the minimum Euclidean distance between the normalized content vector of a candidate node and that of any seed node.

- ATODD – the ratio of the number of a browser’s neighbors that are seed nodes to the number of its neighbors that are not seed nodes.
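To make three of these measures concrete, the sketch below computes maxCos, minEUD and ATODD as they are described in the list above. It assumes each browser's content vector is a dictionary mapping a page URL to a visit count; that representation and all function names are illustrative choices, not the paper's. POSCNT and MATL are left out because the descriptions here are not precise enough to implement them faithfully.

```python
import math


def normalize(vec):
    """Scale a sparse vector (dict of page -> count) to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse content vectors."""
    na, nb = normalize(a), normalize(b)
    return sum(w * nb.get(k, 0.0) for k, w in na.items())


def euclidean(a, b):
    """Euclidean distance between the normalized content vectors."""
    na, nb = normalize(a), normalize(b)
    keys = set(na) | set(nb)
    return math.sqrt(sum((na.get(k, 0.0) - nb.get(k, 0.0)) ** 2 for k in keys))


def max_cos(candidate, seed_vectors):
    """maxCos: highest cosine similarity to any seed node."""
    return max(cosine(candidate, s) for s in seed_vectors)


def min_eud(candidate, seed_vectors):
    """minEUD: smallest normalized Euclidean distance to any seed node."""
    return min(euclidean(candidate, s) for s in seed_vectors)


def atodd(neighbors, seeds):
    """ATODD: seed neighbors divided by non-seed neighbors."""
    seed_n = len(neighbors & seeds)
    other_n = len(neighbors - seeds)
    if other_n == 0:
        # Convention chosen for this sketch: all-seed neighborhoods score inf.
        return float("inf") if seed_n else 0.0
    return seed_n / other_n


# Made-up vectors for illustration only.
candidate = {"page-a": 1, "page-b": 1}
seed_vectors = [{"page-a": 1}, {"page-c": 1}]
mc = max_cos(candidate, seed_vectors)
me = min_eud(candidate, seed_vectors)
ratio = atodd({"c1", "c2", "c3"}, {"c1"})
print(mc, me, ratio)
```

Because the vectors are normalized to unit length, the squared Euclidean distance equals 2 minus twice the cosine similarity, so maxCos and minEUD rank candidates identically when computed this way.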

From these five measures, they created 5 ranked lists for each seed list.  For each of these lists, an evaluation was done by comparing the percentage of the future “brand actors” in the general population versus the percentage of future “brand actors” in these ranked lists.  An AUC comparison was used to see if the ranked lists performed better than “random” lists.
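The evaluation described above can be sketched in a few lines of Python. The code computes AUC as the probability that a randomly chosen future brand actor is ranked above a randomly chosen non-actor, and lift as the brand-actor rate in the top of the ranked list divided by the rate in the general population. The scores and labels are made up, and the function names `auc` and `lift_at_k` are hypothetical; the paper's exact evaluation protocol may differ.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def lift_at_k(scores, labels, k):
    """Brand-actor rate among the top-k scored browsers over the base rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Made-up proximity scores; label 1 means the browser later became a brand actor.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

auc_value = auc(scores, labels)
lift_value = lift_at_k(scores, labels, k=2)
print(auc_value, lift_value)
```

A random ranking gives an AUC of about 0.5 and a lift of about 1.0, so values above those baselines indicate that the proximity measure is doing useful work.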

This analysis was repeated for different industries – hotels, apparel (sporting, women’s, hip-hop), electronics, auto insurance, cellphones, credit reports, modeling agencies…

Detailed Results

The table below shows the results of univariate comparison of the different closeness measures.


All of the proximity measures performed significantly better than random across all industries.

However, the best performing metric varied by industry. This led the researchers to develop a multivariate combination of these 5 measures using logistic regression. We will not discuss that model here.


Foster Provost, Brian Dalessandro, Rod Hook, Xiaohan Zhang, Alan Murray.  Audience Selection for On-line Brand Advertising: Privacy-friendly Social Network Targeting. KDD’09, June 28–July 1, 2009, Paris, France.