Browsing graphs improve online advertising targeting: lifts of 1.8 to 6.0

So what? Overall business results

In this research, the authors provide convincing evidence that graph mining allows brands to identify better audience targets for display advertising. The basic idea is simple. Suppose a brand already knows some users (cookies) who are customers or brand advocates. The “social network” of those users is then a better targeting audience for the brand. The social network of these “seed users” can be found using online browsing data. The findings are robust across multiple industries.

This improved cookie-level audience identification allows brands to lower the cost of online advertising while reaching the audience most likely to take a brand action.

Contact us to learn how the Sonamine graph mining platform can help you.

Details

The process has three steps:

(1) Construct a network graph of cookies based on their visits to infrequently visited pages.

(2) To identify a target audience for a brand, assemble a seed list of cookies that have an affinity to that brand.

(3) Use the network graph to identify which other cookies are similar or close to the seeds from step (2).
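The three steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors’ implementation. Browsing data is assumed to arrive as a list of (cookie, URL) pairs. “Infrequently visited” is approximated by a hypothetical cutoff, `max_page_visitors`, on the number of distinct cookies per page. Step (3) is reduced to the simplest possible notion of closeness: direct neighbors of a seed. All function names and the toy data are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_cookie_graph(visits, max_page_visitors=50):
    """Step 1: link two cookies if they visited the same infrequently visited page.

    visits: list of (cookie_id, url) pairs.
    max_page_visitors: hypothetical cutoff for "infrequently visited"
    (an assumption, not a value from the study).
    Returns {cookie: {neighbor: set_of_shared_urls}}.
    """
    visitors = defaultdict(set)
    for cookie, url in visits:
        visitors[url].add(cookie)

    graph = defaultdict(lambda: defaultdict(set))
    for url, cookies in visitors.items():
        if len(cookies) > max_page_visitors:
            continue  # skip popular pages; they say little about shared interests
        for a, b in combinations(sorted(cookies), 2):
            graph[a][b].add(url)
            graph[b][a].add(url)
    # Convert to plain dicts so lookups of unknown cookies do not create entries.
    return {a: dict(nbrs) for a, nbrs in graph.items()}


def select_seeds(visits, brand_urls):
    """Step 2: seeds are cookies seen on an advertiser-chosen brand page."""
    return {cookie for cookie, url in visits if url in brand_urls}


def candidates_near_seeds(graph, seeds):
    """Step 3 (simplest form): non-seed cookies directly linked to any seed."""
    return {n for s in seeds for n in graph.get(s, {}) if n not in seeds}


# Hypothetical toy data: short page IDs stand in for real URLs.
visits = [
    ("c1", "ugc/a"), ("c2", "ugc/a"), ("c2", "ugc/b"),
    ("c3", "ugc/b"), ("c4", "ugc/c"), ("c1", "brand/home"),
]
graph = build_cookie_graph(visits)
seeds = select_seeds(visits, {"brand/home"})
print(candidates_near_seeds(graph, seeds))  # {'c2'}
```

In the toy data, `c3` is two hops from the seed `c1`, so this one-hop version does not return it. The proximity measures described below give a graded score for each candidate instead of a yes/no answer.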

The data was sampled from browser logs of visits to user-generated content (UGC) sites over a 90-day period, yielding 10M unique cookies. The entire page-browsing history of these 10M cookies was then extracted from the 90-day dataset. A link is created between two cookies if they were observed to have visited the same UGC URL.

For the second step, brand actors are defined as those browsers observed to have visited a particular brand-oriented page selected by the advertiser. Here “browsers” means people who viewed the page, not web browsers such as Firefox or Internet Explorer. The brand page could be a brand loyalty page, a customer login landing page, a purchase thank-you page, or simply the company’s home page. These brand actors are the seed nodes. In the experiments, seed sets ranged from 5,000 to 1M nodes each, with an average of 100,000. Five different proximity (closeness) measures were used:

- POSCNT – the number of unique pages (content) that link the target browser to the seeds in its immediate neighborhood.

- MATL – over all links between the browser and seed nodes, the maximum number of UGC pages shared on any single link.

- maxCos – the maximum cosine similarity between the browser’s content vector and that of any seed node.

- minEUD – the minimum Euclidean distance between the normalized content vector of a candidate node and that of any seed node.

- ATODD – the ratio of the number of a browser’s neighbors that are seed nodes to the number of its neighbors that are not seed nodes.
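The five measures above can be computed for a single candidate browser as follows. This is a minimal sketch under several assumptions not stated in the study:

- The graph is stored as `{cookie: {neighbor: set_of_shared_pages}}`.
- Each cookie’s content vector is a sparse dict of UGC page to visit count.
- ATODD is taken to be infinite when a browser has seed neighbors but no non-seed neighbors.

maxCos and minEUD are computed by brute force against every seed. That is fine for illustration but would not scale to seed sets of 1M. Helper names are hypothetical.

```python
import math


def _cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {page: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def _normalize(u):
    """Scale a sparse vector to unit L2 length (empty if it is all zeros)."""
    n = math.sqrt(sum(w * w for w in u.values()))
    return {k: w / n for k, w in u.items()} if n else {}


def _euclidean(u, v):
    """Euclidean distance between two sparse vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def proximity_scores(browser, graph, vectors, seeds):
    """Compute the five closeness measures for one candidate (non-seed) browser.

    graph:   {cookie: {neighbor: set_of_shared_pages}}
    vectors: {cookie: {page: visit_count}}  (assumed content vectors)
    seeds:   set of seed cookies
    """
    neighbors = graph.get(browser, {})
    seed_links = {n: pages for n, pages in neighbors.items() if n in seeds}
    non_seed_count = len(neighbors) - len(seed_links)
    vec = vectors.get(browser, {})
    unit_vec = _normalize(vec)

    # POSCNT: distinct pages shared with any neighboring seed.
    poscnt = len(set().union(*seed_links.values()))
    # MATL: most pages shared over a single browser-seed link.
    matl = max((len(pages) for pages in seed_links.values()), default=0)
    # maxCos / minEUD: best match against any seed, linked or not.
    max_cos = max((_cosine(vec, vectors.get(s, {})) for s in seeds), default=0.0)
    min_eud = min(
        (_euclidean(unit_vec, _normalize(vectors.get(s, {}))) for s in seeds),
        default=math.inf,
    )
    # ATODD: seed neighbors per non-seed neighbor.
    if non_seed_count:
        atodd = len(seed_links) / non_seed_count
    else:
        # Assumption: infinite ratio if there are seed links but no other neighbors.
        atodd = math.inf if seed_links else 0.0
    return {"POSCNT": poscnt, "MATL": matl, "maxCos": max_cos,
            "minEUD": min_eud, "ATODD": atodd}


# Hypothetical example: browser "b" shares pages with seeds s1, s2 and non-seed x.
graph = {"b": {"s1": {"p1", "p2"}, "s2": {"p3"}, "x": {"p4"}}}
vectors = {
    "b": {"p1": 1, "p2": 1, "p3": 1, "p4": 1},
    "s1": {"p1": 1, "p2": 1},
    "s2": {"p3": 1, "p5": 1},
}
scores = proximity_scores("b", graph, vectors, seeds={"s1", "s2"})
print(scores)
```

Ranking every non-seed cookie by one of these values yields one of the ranked lists evaluated below. A smaller minEUD means closer, so that list would be sorted in ascending order.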

From these five measures, the authors created five ranked lists for each seed list. Each list was evaluated by comparing the percentage of future “brand actors” in the ranked list with the percentage of future “brand actors” in the general population, which gives the lift. An AUC comparison was used to test whether the ranked lists performed better than “random” lists.
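Both evaluation ideas can be sketched in a few lines. Lift at a cutoff k is the rate of future brand actors in the top k of a ranked list, divided by their rate in the whole population. AUC is the probability that a randomly chosen future brand actor is ranked above a randomly chosen non-actor, so a random list scores about 0.5. This is a minimal sketch assuming `scores` and `labels` are dicts over the same cookies. The exact evaluation protocol in the paper may differ, and the toy data is hypothetical. The pairwise AUC loop is quadratic. For large lists, a rank-based formula or a library routine such as scikit-learn’s `roc_auc_score` would be the usual choice.

```python
def lift_at_k(scores, labels, k):
    """Rate of future brand actors in the top-k of a ranked list,
    divided by their rate in the whole evaluated population."""
    ranked = sorted(labels, key=lambda c: scores[c], reverse=True)[:k]
    base_rate = sum(labels.values()) / len(labels)
    top_rate = sum(labels[c] for c in ranked) / len(ranked)
    return top_rate / base_rate


def auc(scores, labels):
    """Probability that a randomly chosen future brand actor is scored above a
    randomly chosen non-actor (ties count one half). A random list gives ~0.5."""
    pos = [scores[c] for c, y in labels.items() if y]
    neg = [scores[c] for c, y in labels.items() if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from one closeness measure, and whether each cookie
# later visited the brand page (True = future brand actor).
scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2, "e": 0.1}
labels = {"a": True, "b": False, "c": True, "d": False, "e": False}
print(lift_at_k(scores, labels, k=2))  # top-2 rate 0.5 vs. base rate 0.4
print(auc(scores, labels))  # 5 of 6 actor/non-actor pairs ordered correctly
```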

This analysis was repeated for different industries: hotels, apparel (sporting, women’s, hip-hop), electronics, auto insurance, cellphones, credit reports, modeling agencies…

Detailed Results

The table below shows the results of a univariate comparison of the different closeness measures.

[Table: results of the univariate comparison of the closeness measures]

All five proximity measures performed significantly better than random across all industries.

However, the best-performing measure varied by industry. This led the researchers to combine the five measures into a multivariate model using logistic regression, which we do not discuss here.
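The post does not describe the authors’ multivariate model. The following is a minimal pure-Python sketch, trained by batch gradient descent, that shows only what combining the five measures with logistic regression involves in general. The feature rows, labels, learning rate, and epoch count are all hypothetical. Infinite ATODD or minEUD values would need to be capped first. In practice, one would standardize the features and use an established implementation such as scikit-learn’s `LogisticRegression` rather than this loop.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    m, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error (probability minus label) drives the gradient.
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def combined_score(w, b, x):
    """Estimated probability that a browser becomes a brand actor."""
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical training rows: [POSCNT, MATL, maxCos, minEUD, ATODD]
X = [
    [3, 2, 0.7, 0.8, 2.0],
    [4, 3, 0.8, 0.6, 3.0],
    [0, 0, 0.1, 1.3, 0.0],
    [1, 1, 0.2, 1.2, 0.1],
]
y = [1, 1, 0, 0]  # 1 = later became a brand actor
w, b = train_logistic(X, y)
print([round(combined_score(w, b, x), 3) for x in X])
```

The fitted weights let the data decide how much each measure contributes. That is the appeal of a combined model when no single measure wins in every industry.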

References

Foster Provost, Brian Dalessandro, Rod Hook, Xiaohan Zhang, Alan Murray.  Audience Selection for On-line Brand Advertising: Privacy-friendly Social Network Targeting. KDD’09, June 28–July 1, 2009, Paris, France.

