Relationship analytics application notes
This post is part of a series on managing and analyzing graph data. Posts to date include:
- Graph data model basics
- Relationship analytics definition
- Relationship analytics applications (this post)
- Analysis of large graphs
In my recent post on graph data models, I cited various application categories for relationship analytics. For most applications, it’s hard to get a lot of details. Reasons include:
- In adversarial domains such as national security, anti-fraud, or search engine ranking, it’s natural to keep algorithms secret.
- The big exception — influencer analytics, aka social network analysis — is obscured by a major hype/reality gap (so, come to think of it, is a lot of other predictive modeling).
Even so, it’s fairly safe to say:
- Much of relationship analytics is about subgraph pattern matching.
- Much of relationship analytics is about identifying subgraph patterns that are predictive of certain characteristics or outcomes.
- An important kind of relationship analytics challenge is to identify influential individuals.
Notes on that middle point include:
- Pattern identification could be done through trial-and-error visualization, through predictive modeling, or through any form of investigative analytics in between.
- I presume what’s hardest about all this from a processing-performance standpoint would often be enumerating all the subgraphs that match a particular candidate pattern.
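To make the enumeration point concrete, here is a minimal sketch that finds every instance of one very simple candidate pattern, a triangle of three mutually connected nodes. The function name `find_triangles` and the sample edge list are invented for illustration; real pattern matching engines handle far more general patterns, and this is not meant to represent any particular product's approach.

```python
from itertools import combinations


def find_triangles(edges):
    """Enumerate every triangle (three mutually connected nodes)
    in an undirected graph given as a list of (node, node) pairs."""
    # Build an adjacency map: node -> set of neighbors.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    triangles = set()
    for node, nbrs in adj.items():
        # Every pair of neighbors that are themselves connected
        # closes a triangle with the current node.
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:
                triangles.add(tuple(sorted((node, u, v))))
    return sorted(triangles)


# Hypothetical toy graph: one triangle plus a dangling edge.
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]
triangles = find_triangles(edges)
```

Even for this tiny pattern, the work grows with the square of each node's neighbor count, which hints at why enumeration is likely to be the expensive step on large graphs.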
So I’m tempted to say “it’s all about subgraphs.” But it might be more accurate yet to say “It’s about paths”. Arguably, that’s saying the same thing; paths are subgraphs, and subgraphs are made up of paths, so a way of finding one is also a way of finding the other. But referring to paths nods to such standard tasks as:
- Finding the shortest path between two nodes.
- Calculating centrality metrics.
Paths are also simpler than subgraphs, and hence also simpler to think about.
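For readers who want to see the two standard tasks spelled out, the following is a small sketch under simplifying assumptions: the graph is undirected and unweighted, breadth-first search is used for the shortest path, and degree centrality (the simplest of the centrality metrics) stands in for centrality in general. All function names and the sample graph are made up for the example.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of undirected (node, node) pairs into node -> neighbor set."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns a list of nodes or None if unreachable."""
    if start not in adj or goal not in adj:
        return None
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


# Hypothetical graph: a long route a-b-c-d and a short route a-e-d.
adj = build_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "d")])
path = shortest_path(adj, "a", "d")
centrality = degree_centrality(adj)
```

Other centrality metrics, such as betweenness or closeness, are themselves defined in terms of shortest paths, which is part of why thinking in paths is convenient.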
Let’s drill down a bit more on the cases of influencer analysis and centrality. Telecom service providers around the world compete with relatively few of their peers (because they’re so geographically bound), and hence are pretty good about sharing technical ideas with each other. One application that has spread like wildfire is influencer analysis for churn control. The idea is to identify influential subscribers who, if they left your service, would be particularly likely to take other people with them, so that you can make great efforts to retain them. The key data used is CDRs (call detail records).
As with many things, it’s tough to separate influencer analysis adoption fact from fiction.
- The telecom case is surely real; I’ve heard of many examples.
- Social networking is a harder call. Top-down, the story sounds good; but bottom-up, I’m not so sure.*
- I’m quite dubious about attempts to use influencer analysis based on, say, credit card records; the detailed information about person-to-person connections isn’t there.
- National security clearly uses similar kinds of techniques, albeit for slightly different purposes.
Specific conclusions I’ve heard include:
- Who calls you is a better predictor than who you call of whether other cellular subscribers will churn along with you.
- Length of calls is an indicator of involvement or influence in terrorist networks (short ones suggest there’s serious business being done).
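The "who calls you" versus "who you call" distinction maps onto in-degree versus out-degree in a directed call graph. The sketch below only shows how those two candidate features could be derived from CDRs; it does not test the predictive claim. The three-field record layout (caller, callee, duration in seconds) and all names are assumptions for illustration, since real CDR formats carry many more fields.

```python
from collections import defaultdict


def caller_callee_counts(cdrs):
    """For each subscriber, return (distinct callers, distinct callees).

    `cdrs` is an iterable of (caller, callee, seconds) tuples,
    a hypothetical simplification of a call detail record.
    """
    callers_of = defaultdict(set)  # subscriber -> people who called them
    callees_of = defaultdict(set)  # subscriber -> people they called
    for caller, callee, _seconds in cdrs:
        callers_of[callee].add(caller)
        callees_of[caller].add(callee)

    subscribers = set(callers_of) | set(callees_of)
    return {
        s: (len(callers_of.get(s, ())), len(callees_of.get(s, ())))
        for s in subscribers
    }


# Invented sample records.
cdrs = [
    ("alice", "bob", 120),
    ("carol", "bob", 45),
    ("dave", "bob", 300),
    ("bob", "alice", 60),
]
counts = caller_callee_counts(cdrs)
```

In this toy data, `bob` is called by three distinct people but calls only one, so a model built on the first conclusion above would weight him more heavily than his outbound activity alone suggests.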
*For example, my Klout profile asserts I’m more influential about Airlines than about Databases or Software. A bit of manual intervention could surely change that, which just serves to underscore my doubts about the effectiveness of social network analytic automation.
One more thing — relationship analytics on social networks rarely works unless you take out a few spurious highly-connected nodes. The paradigmatic example is the local pizza parlor, which receives many phone calls, but is neither a terrorist mastermind nor a major influence upon telecom service churn. More on that point when I write about the partitioning of large graphs.
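One simple way to take out nodes like the pizza parlor is a degree cutoff, sketched below. The threshold, the function name `drop_hubs`, and the sample data are all assumptions for illustration; in practice the cutoff would need tuning, and a pure degree filter risks discarding genuinely influential nodes along with the spurious ones.

```python
def drop_hubs(edges, max_degree):
    """Remove every node with more than `max_degree` distinct neighbors,
    along with all edges that touch it.

    Returns (remaining_edges, removed_nodes).
    """
    # Count distinct neighbors so repeated calls between the same
    # pair do not inflate a node's degree.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    hubs = {node for node, nbrs in neighbors.items() if len(nbrs) > max_degree}
    kept = [(a, b) for a, b in edges if a not in hubs and b not in hubs]
    return kept, hubs


# Invented example: everyone calls the pizza parlor; only ann and bob
# actually know each other.
edges = [
    ("ann", "pizza"),
    ("bob", "pizza"),
    ("cat", "pizza"),
    ("dan", "pizza"),
    ("ann", "bob"),
]
kept, hubs = drop_hubs(edges, max_degree=3)
```

Without the filter, every pair of customers here would appear to be two hops apart via the pizza parlor, which is exactly the kind of false closeness that distorts path-based analysis.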
Comments
6 Responses to “Relationship analytics application notes”
Another area for graph analysis is “graph partitioning”, more commonly known as community detection.
There’s some interest in marketing segmentation based on the community of friends you are part of, as opposed to segmentation based on common user aspirations and behavior. The end goal is to address each community with a coherent marketing communications approach.
Of course, we would suspect some overlaps as birds of the same feather do flock together.
Looking at the examples, it looks like you’re using the term “graph analysis” to mean the same thing as “link analysis”. That’s fine with me, but if you mean something different you should spell that out.
For applications like claims fraud, insurers use link analysis/graph analysis to identify anomalous relationships among claimants, appraisers, service providers and policyholders, for example. If an unexpectedly large number of claims include the same parties, it’s a clue that there is a fraud ring operating. A claims fraud monitoring application will kick those claims over to an investigator.
Thomas,
I wouldn’t assume there’s a single, canonical definition of “link analysis”. So I couldn’t give you a precise compare/contrast.
[…] is still trying to figure out exactly which relationship analytics application areas it is pursuing. Yarcdata’s big multi-year design partner was a large intelligence agency, for […]