Increasing cybersecurity breaches have led to identity theft and impersonation in fraudulent activity. In response, many companies and startups are now increasingly developing advanced techniques to detect and predict false identities. As in many things in life, it is the combination of different techniques, rather than one single method, that proves to be more practical and accurate. This paper discusses how text based and link analysis (graph theoretic) similarity measures are combined for better performance. In this blog post we will cover the link analysis based approach used in the paper.
Graph analysis setup
To identify the similarity between names based on link analysis, a mathematical undirected graph must be constructed where each vertex represents a unique name and an edge represents a relation between two names. The edge contains information about the strength of the relation, most often derived on the frequency of relations between the two names. Finally, a real world entity that is linked to its multiple different false identities. Practically, the job of finding a false identity is reduced to finding two vertices which have very high similarities.
Types of link similarity approaches
Overlapping neighbors approaches emphasize that if two vertices have a high overlap of neighbors, then the two vertices have a higher similarity. The triple connected approach is a variant which counts the number of connected triple vertices of which the vertex pair belongs to. These methods are available in the Sonamine graph scoring engine and server.
These approaches can be very sensitive to noise and generate large number of false positives. Boongoen and Shen proposed a uniqueness approach that takes into account not only the number of shared neighbors, but also the nature of the link between the two vertices and the shared neighbor. Take for example vertices A and B are both connected to C; the uniqueness of neighbor C for the pair is the sum of the edge weights from A to C and B to C, divided by the sum of all the edge weights coming into C. This uniqueness is further aggregated and normalized for all the common neighbors for A and B. This approach was found to be more effective in terrorist identification.
Connected path approach is proposed by Boongoen and Shen as a way to consider the expanded context of the two vertices and the shared neighbor. By looking at whether the paths of various lengths that connect the two vertices, this method operationalizes the intuition that the greater context around two similar vertices should also be similar – ie the paths that connect them should also be similar. Experiments A hybrid algorithm covering both text based and the connected path similarities was proposed and tested on the Terrorist dataset and Digital Bibliography and Library Project publication dataset (see references below). The proposed method was compared with standard techinques such as Jaro similarity, pagesim and simrank.
Results
The table below demonstrates how the proposed connected path approach outperforms all the established methods by a large margin. Clearly the graph theoretic approach adds to both precision and recall.
Moreover “the hybrid approach can further improve the performance obtained from the Connected-Path method alone (r is set to 4). Accordingly, the number of false positives substantially declines, especially within the top-200 and 400 sets. In addition, this positive trend also exists when the less accurate, but more efficient, Connected-Path estimation is exploited (i.e. r = 2) in the hybrid modelThe”
So what?
There are multiple applications of this alias detection method, the most obvious one being the one discussed. Other uses include detecting rotational churners or spinners in the mobile telecommunications world. Setting up the problem differently allows a credit department to determine if a new application is similar to past known fraudulent accounts. Vertex similarity in this way can be used for many predictive applications.
References
Tossapon Boongoen and Qiang Shen. Intelligent Hybrid Approach to False Identity Detection. ICAIL-2009 Barcelona, Spain
[Terrorist dataset] P. Hsiung, A. Moore, D. Neill, and J. Schneider. Alias detection in link data sets. In Proceedings of International Conference on Intelligence Analysis, 2005.
[DLBP dataset] S. Klink, P. Reuther, A. Weber, B. Walter, and M. Ley. Analysing social networks within bibliographical data. In Proceedings of International Conference on Database and Expert Systems Applications, Poland, pages 234–24
Tags: fraud, predictive analysis, shortest path





