5 sources of bias in analytics that insurers must address

Insurers in the UK are waiting to hear what the regulator has to say in its forthcoming report on data ethics. One of the big issues that the FCA’s report will cover is discrimination. And while most insurers will be expecting the focus to be on bias in data, the regulator will also address bias in analytics. So how might insurers structure their approach to bias in analytics?

I’ve written on several occasions in the past about discrimination being the biggest ethical challenge that insurers will face (more here). And ever since 2015, when a short survey I undertook pointed to UK insurers giving too little attention to discrimination vis-à-vis consumers, I’ve been urging clients to develop a discrimination prevention plan.

Such a plan must of course address bias in the data insurers collect or acquire. That is widely recognised as a source of discrimination risk. Less widely appreciated is the role played by bias in analytics.

In 2018, I took part in an event at the University of Zurich on how the design of insurance pricing analytics can lead to discriminatory outcomes. The message to the data scientists attending was clear – how you assemble the analytics for your pricing model can increase or decrease bias.

What this means is that even if you have done everything possible to eliminate bias in your data, it can still emerge through choices in how your model is designed, tested and deployed.

Five Categories of Bias in Analytics

Researchers have grouped bias in analytics into five categories, representing different stages of a model’s development. Here’s a summary of those categories:

The Representation Bias: this occurs when your model is trained upon data that under-represents the population for which your model will be used. This could be because the sampling methods used to build your training dataset were flawed, or because that dataset is out of date.
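
By way of illustration only, here is a minimal Python sketch of the kind of representation check a pricing team might run, assuming hypothetical segment labels and assumed population shares (neither is drawn from real insurance data):

```python
import pandas as pd

# Hypothetical training data and assumed target-population shares - illustrative only.
train = pd.DataFrame({"region": ["urban"] * 900 + ["rural"] * 100})
population_share = {"urban": 0.70, "rural": 0.30}

# Compare how each group is represented in the training data against the
# population the model will actually be used for.
train_share = train["region"].value_counts(normalize=True)
for group, expected in population_share.items():
    observed = float(train_share.get(group, 0.0))
    if abs(observed - expected) > 0.05:  # illustrative tolerance
        print(f"{group}: {observed:.0%} of training data vs {expected:.0%} of target population")
```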

The Measurement Bias: this occurs when you choose and measure the features and labels used in the model. If the measurement process is not consistent across groups within your target population, then bias occurs. The same happens if some groups generate more of the data being measured than other groups. And your approach to classification can also introduce problems (more on that here).
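
As a hedged sketch of how inconsistent measurement might be surfaced, suppose (hypothetically) that a proxy feature such as a credit score is recorded less reliably for one group than another:

```python
import pandas as pd

# Hypothetical quote records in which a proxy feature ('credit_score') is
# captured less reliably for group B than for group A.
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "credit_score": [620, 700, None, 680, 640, None, None, 710, None, 650],
})

# A large gap in how often the proxy is even recorded suggests the
# measurement process is not consistent across groups.
missing_rate = df["credit_score"].isna().groupby(df["group"]).mean()
print(missing_rate)
```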

The Aggregation Bias: this occurs when a ‘one size fits all’ model is used for a population that is more diverse than the model allows for. As a result, one model variable could mean different things across groups within that population. How you balance the simplicity and performance of your model matters.
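
To illustrate that trade-off, here is a minimal sketch using purely synthetic data in which the same variable (say, vehicle age) relates to claim cost in opposite directions for two segments; a single pooled model fits both badly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Two synthetic segments where the same variable drives cost in opposite directions.
x_a = rng.uniform(0, 10, 200)
y_a = 100 + 20 * x_a + rng.normal(0, 10, 200)
x_b = rng.uniform(0, 10, 200)
y_b = 400 - 15 * x_b + rng.normal(0, 10, 200)

X_all = np.concatenate([x_a, x_b]).reshape(-1, 1)
y_all = np.concatenate([y_a, y_b])

# The 'one size fits all' model explains almost nothing, while
# segment-specific models fit their own segment well.
pooled = LinearRegression().fit(X_all, y_all)
print("pooled R^2:", round(pooled.score(X_all, y_all), 2))
for name, x, y in [("segment A", x_a, y_a), ("segment B", x_b, y_b)]:
    model = LinearRegression().fit(x.reshape(-1, 1), y)
    print(f"{name} R^2:", round(model.score(x.reshape(-1, 1), y), 2))
```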

The Evaluation Bias: this occurs when your model is tested or benchmarked on data that doesn’t represent the target population for which the model will be used. This means that your model performs well on certain groups within that population, and badly on others.
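
The practical response is simple enough to sketch: report the evaluation metric per group as well as in aggregate, since an overall figure can hide poor performance on an under-represented group. The numbers below are hypothetical:

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Hypothetical hold-out results: predicted versus actual claim cost, tagged by group.
results = pd.DataFrame({
    "group":  ["A", "A", "A", "B", "B", "B"],
    "actual": [100, 120, 110, 300, 280, 310],
    "pred":   [105, 118, 112, 250, 240, 260],
})

print("overall MAE:", mean_absolute_error(results["actual"], results["pred"]))
for name, grp in results.groupby("group"):
    print(f"group {name} MAE:", mean_absolute_error(grp["actual"], grp["pred"]))
```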

The Deployment Bias: this occurs when there is a mismatch between the goal the model has been built to address, and the way in which it ends up being used. This could be in relation to the goal itself, or the population to which it is applied.
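
A data check cannot catch a mismatch of purpose, but the population side of deployment bias can at least be monitored. A minimal sketch, assuming hypothetical segment labels and build-time shares:

```python
import pandas as pd

# Assumed segment shares in the population the model was built for,
# versus the quotes actually being scored after deployment.
build_share = pd.Series({"motor_private": 0.9, "motor_fleet": 0.1})
live_quotes = pd.Series(["motor_fleet"] * 60 + ["motor_private"] * 40)

live_share = live_quotes.value_counts(normalize=True)
drift = (live_share - build_share).abs()
print(drift[drift > 0.1])  # segments where live use has drifted from the build assumption
```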

The Good, the Bad and the Response

None of these sources of bias in analytics are revolutionary. Insurers who have been putting data ethics at the heart of their digital strategies will be familiar with them, and knowledgeable about the challenges involved in addressing them. This applies to the development of their own models, and to the use of models from external commercial providers and in-sector partners.

Those insurers will be able to evidence the steps they've been taking and the results those steps have produced. What regulators like the FCA will be looking for is intent, direction and progress. They will know where the difficulties lie, and will be looking at how those have been addressed.

What we will find signalled in the FCA data ethics report will be practices that suggest some insurers are not addressing bias in their analytics. If the FCA have picked up on some of the practices I’ve reliably been told about, then their warnings will be sharp.

It is not hard to see a four-stage process emerge…

  • the regulator’s data ethics report confirms its concerns about bias in analytics.
  • supervisory technologies are then deployed to collect evidence of which firms have been really failing in their regulatory obligations.
  • the worst offending firms are told to switch off the offending algorithms until they can evidence how they've resolved the problem (as per this FTC ruling last month in the US).
  • the regulator levies a fine commensurate with the scale of misconduct and switches off the certification of those responsible for it.

The step that will trigger investor concern is the third, simply because it is so disruptive.

Summing Up

My short survey back in 2015 pointed to insurers under-appreciating the risk of discrimination in relation to consumers. Many will have upped their game since then, but the danger is that they've followed the broad public concern about bias in data, and not given enough attention to bias in analytics. Given the commitment the regulator has made to Parliament to address discrimination in insurance, and given the regulator's 'SupTech' capabilities, insurers need to prepare clear evidence of how they have been managing this issue.

It is without doubt a difficult issue to address, full of ethical dilemmas. Yet out of that evidence of intent, direction and progress must also emerge the accountability that the regulator is expecting.

More Information

You can obtain a PDF of this blog post by clicking on the button below – no email address is required. In the download, you will find examples set in the insurance world for each of the five forms of bias in analytics set out above.

Acknowledgement

Much of this article is based upon the following paper: A Framework for Understanding Unintended Consequences of Machine Learning (Cornell, 2020), by Harini Suresh and John Guttag, arXiv:1901.10002v3 [cs.LG].