Earlier this year, I wrote a post called ‘Data, Equality and Insurance’, about a small survey that I’d undertaken into the equality policies of six leading insurers. I’d wanted to find out the extent to which those insurers were thinking of equality not just in terms of their one-to-one dealings with individual customers, but also in terms of their one-to-many dealings with customers overall. The survey grew out of an earlier blog post about the dangers that big data might present to the insurance sector in the form of inadvertent discrimination.
The reaction I got to that survey could be summed up as ‘it would certainly be a concern if it were to happen, but our systems aren’t configured that way, so it is not a worry’. That’s a revealing reaction, the fragility of which (when it comes to big data) I want to examine in this post.
First off, let’s be clear: I would be very surprised if any insurer in the UK was collecting and storing personally identifiable information (PII) about a customer and directly and systematically using that PII in underwriting or claims decisions in ways that would be illegal. End of problem? Not at all.
You may have heard of the case in 2012 of the US retailer Target using data mining techniques to predict which female customers might be pregnant, and then marketing pregnancy-related products to them, even, as in this case, before the customer had shared the news with her own family. Target had never collected data specifically showing that any particular customer was pregnant. Instead, their predictive analytics had gauged from her purchases that the customer was likely to be pregnant, and had targeted her for marketing accordingly. Her purchase decisions had in effect manufactured a new piece of personally identifiable information about her and associated it with her customer identity. And this was a very personal and sensitive piece of information.
Big data is often talked about in terms of its capacity to reveal new and startling insights about the relationships between different parameters. It does this through correlation clustering, in which what is known is the relationship between objects rather than the objects’ actual representations. Out of such analysis comes a piece of ‘manufactured information’: in this case, that the customer was very likely to be pregnant. Imagine, then, how easy it could be to ‘manufacture’ a variety of attributes about a customer from a pattern of activity found across disparate sources of data. It is quite possible that aligning such patterns of activity with relatively generic profiles of loss or fraud could produce decision patterns that turn out to be discriminatory.
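To make that idea concrete, here is a deliberately toy sketch in Python of how an attribute that was never collected can be inferred, or ‘manufactured’, from innocuous proxy signals. Every feature name, label and figure below is hypothetical and invented for illustration; this is not any retailer’s or insurer’s actual model, just the bare mechanism.

```python
# Toy illustration of 'manufactured' information: a sensitive attribute is
# never collected directly, but is inferred from proxy purchase signals.
# All feature names and data below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical purchase-history features for 1,000 past customers:
# counts of [unscented_lotion, vitamin_supplements, large_handbag] purchases.
X = rng.poisson(lam=[1.0, 1.5, 0.5], size=(1000, 3))

# A historical label the analyst happens to hold for this training set only
# (assumed here to come from, say, later baby-registry sign-ups).
y = (X @ np.array([0.8, 0.6, 0.4]) + rng.normal(size=1000) > 2.5).astype(int)

# Fit a simple classifier on the proxy signals.
model = LogisticRegression().fit(X, y)

# A new customer who has never disclosed anything sensitive:
new_customer = np.array([[3, 2, 1]])
# The model outputs a 'manufactured' probability for the sensitive attribute.
print(model.predict_proba(new_customer)[0, 1])
```

The point of the sketch is simply that the output probability behaves, for all practical purposes, like a new piece of PII, even though no question about the sensitive attribute was ever asked.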
The fact that insurers have never asked their customers about personal attributes whose use would be illegal in underwriting or claims decisions does not mean that such PII cannot be manufactured and factored into those decisions through the design and application of the algorithms inside predictive analytics models. Those algorithms pick up online purchases and social media activity and align them with a myriad of other data sources (some of dubious provenance) to find correlations of perceived significance. If, taken together, those correlations point to you being a different type of health or motor risk from the one the insurer is looking for, then up will go your premium: that is, if you’re still able to buy a policy at all.
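And here, again purely hypothetically, is how a handful of such manufactured scores could quietly become a loading on a premium. The field names, weights and figures are all assumptions of mine, chosen only to illustrate the drift I’m describing, not to describe any real pricing engine.

```python
# A deliberately simplified, hypothetical illustration of how 'manufactured'
# correlation scores could drift into pricing. Every field name, weight and
# figure below is an assumption made for illustration only.

def manufactured_risk_loading(signals: dict, weights: dict) -> float:
    """Combine weakly correlated behavioural scores into one loading factor."""
    score = sum(weights[k] * signals.get(k, 0.0) for k in weights)
    # 1.0 means no adjustment; cap the loading to keep the example readable.
    return min(1.0 + score, 1.5)

base_premium = 400.00  # hypothetical annual motor premium in GBP

# Scores a predictive model might attach to one customer, each derived from
# a different data source (purchases, social media, bought-in lists).
signals = {"late_night_activity": 0.6,
           "high_risk_postcode_affinity": 0.4,
           "inferred_health_proxy": 0.7}
weights = {"late_night_activity": 0.10,
           "high_risk_postcode_affinity": 0.15,
           "inferred_health_proxy": 0.20}

premium = base_premium * manufactured_risk_loading(signals, weights)
print(f"Quoted premium: {premium:.2f}")
# No single input is a protected attribute, yet the combined loading can
# systematically disadvantage particular groups of customers.
```

Nothing in that snippet would look illegal to a compliance review of the inputs taken one at a time; the problem only appears in what the combination does to particular groups of customers.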
So for senior insurance executives to really be sure that their firms aren’t starting to drift down the type of slippery slope described in this earlier post about ‘social sorting’, they need to understand more about big data and its ability to manufacture personally identifiable information about us, and the dangers of these clusters of correlations then being used to influence underwriting and claims decisions in ways that could significantly impact certain types of customers. The response I got to my survey indicates that they’ve still got some way to go on this.
But surely, some will ask, aren’t algorithms objective? And isn’t data just neutral fact? Far from it, and I’ll explain why in a forthcoming post.
This post owes much to the following paper by Kate Crawford and Jason Schultz, published by the New York University School of Law in October 2013: “Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms.”