An insurer told me recently that the only reason for a huge increase in a renewal premium was a new dataset they’d brought in. Let’s put aside a whole host of issues around stability and confidence in its underwriting, and concentrate instead on how they placed responsibility for that increase onto the new dataset. It was very much presented as an issue caused by the data, not by them as the insurer. It raises the obvious question: how objective is data? And, to pick up on a current trend, how objective is ‘big data’?
Data is often presented as factual, and the outputs of statistical analysis as objective. Yet is this really the case? Let’s look at data first. What data you decide to collect is a choice, as is the extent to which you recognise the context from which that data emerged. The way in which one dataset is then aligned and connected with other datasets involves judgements being made. And how you then weigh up the significance of a piece of data involves subjective assessments about what is high or low, acceptable or excessive. Then there’s the question of how representative a piece of past data is for your present-day decision.
The decisions then taken on the back of how that data’s significance has been judged are more widely recognised as subjective, although even then there are some who see a particular course of action as inevitable, so giving it an air of objectivity. That’s essentially how the aforementioned insurer was presenting its renewal premium, while at the same time, rather ironically, casting doubt on the reliability of its competitors’ quotes.
Messy Datasets
Big data has been described as “…datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.” Yet what defines ‘big data’ more than anything else is not so much the size of the datasets involved as the complex algorithms used to “mine messy and diverse datasets”. If some of those datasets originate from social media, then the data cleansing process involves decisions about which attributes and variables will be counted and which ignored. Not everyone uses Facebook or Twitter, and those organisations may not release all posts, so such datasets will be partial, to say the least, delineated by many subjective decisions. And as for much of the content that people post on social media – objectivity is not the word that first springs to mind. Using it for insurance is a bit like ‘anecdotal underwriting’.
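To make that concrete, here’s a minimal sketch of what a cleansing step might look like. The field names, thresholds and records are entirely hypothetical – the point is simply that ordinary-looking filtering choices decide which records and attributes ever reach the analysis.

```python
# Hypothetical example only: invented field names and thresholds, not any
# insurer's actual pipeline. It illustrates how routine "cleansing" choices
# determine which records and attributes survive into the dataset.

raw_posts = [
    {"user_id": 1, "text": "Just crashed my car again lol", "followers": 40},
    {"user_id": 2, "text": "", "followers": 12000},
    {"user_id": 3, "text": "Lovely quiet weekend gardening", "followers": 300},
]

MIN_TEXT_LENGTH = 10                 # choice: short posts are treated as noise
KEEP_FIELDS = {"user_id", "text"}    # choice: follower counts are ignored

def cleanse(posts):
    cleaned = []
    for post in posts:
        if len(post["text"]) < MIN_TEXT_LENGTH:
            continue  # records dropped here simply vanish from the analysis
        cleaned.append({k: v for k, v in post.items() if k in KEEP_FIELDS})
    return cleaned

print(cleanse(raw_posts))  # user 2 never makes it into the dataset
```

Neither threshold is wrong as such; the point is that each one is a judgement someone made, not a fact the data supplied.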
And then there are the algorithms that undertake all that data mining. They’re programmed by human beings, whose values and biases are embedded into the software’s source code: for example, in determining whether a particular correlation is statistically significant or not. Those people may know about software, but how expert are they in the type of decisions that their programmes will start to automate? As for independence, their contracts and salaries largely point in one direction.
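As a simple illustration of how that embedding happens (again, a hypothetical sketch rather than any insurer’s actual code), consider a developer deciding whether a candidate rating factor is ‘significant’ enough to use:

```python
# Minimal sketch of a significance check. The 0.05 cut-off and the minimum
# correlation are choices made by the programmer, not facts dictated by the data.

from scipy import stats

ALPHA = 0.05           # chosen by the developer
MIN_CORRELATION = 0.2  # likewise a judgement about what "matters"

def rating_factor_accepted(feature_values, claim_costs):
    r, p_value = stats.pearsonr(feature_values, claim_costs)
    # Whether this factor ever influences a premium hinges on two
    # hand-picked numbers hard-coded above.
    return abs(r) >= MIN_CORRELATION and p_value < ALPHA
```

Change either constant and a different set of factors flows into the pricing model – which is precisely the kind of human decision that gets relabelled as ‘what the data says’.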
One Easy Yardstick
There’s an easy yardstick to bear in mind when thinking about how objective data and data analytics are in the world of insurance: if both were truly objective, every insurer would arrive at the same quote and the same renewal premium for the same risk.
Does this matter? Well, think of it this way. Underwriting and claims decisions are increasingly being automated, to the point that underwriters are now often unable to understand how the end premiums have been generated. That has huge implications for the fairness of the outcomes being experienced by consumers. And not just the type of consumers seen as vulnerable by regulators such as the FCA, but a much wider range of people.
In gathering data together and applying analytics to it for key business decisions, insurance executives need to keep front of mind that the outputs their managers are presented with must still be subject to critical review and oversight. Outcomes are never ‘down to the data’: they’re down to the decisions people make.