The dangers of unintentional discrimination
There’s concern that insurers’ use of big data will result in some sectors of society suffering discrimination. Data mining could lead to someone with a ‘protected characteristic’ being treated unfairly, through more expensive cover, reduced cover or no cover at all. Insurance executives respond with dismay that anyone should think their firms would ever do such a thing. Their dismay is genuine, yet should they really be so confident?
I think it is highly unlikely that any established UK insurer would intentionally discriminate against someone with a ‘protected characteristic’. I chose those words carefully though. It is quite possible that insurers could drift into unintentional discrimination, and in this post I’ll outline how this might come about.
It’s worth remembering from the outset that data mining is meant to distinguish between individuals and to assign them the qualities possessed by those who seem statistically similar. It goes about this through a series of stages: labelling and targeting, collection and feature selection, and the use of proxies. Each stage carries the risk of unintentional discrimination.
- where data is not pre-labelled, or the labels are insufficiently precise, the choice of which label to attach to which data can introduce a subjective skew that may be free of intentional bias, but not unintentional bias. A model trained on that data to recognise patterns will then pick up those biases and systematically reproduce them throughout its outputs.
- even when the data is labelled, those values and labels may reflect past prejudice. The conscious prejudice or implicit bias of prior decision makers is then reproduced in new models without any of the current decision makers realising it.
- current data often enters the model as an input, some of it under the insurer’s control, some of it not. If those inputs contain conscious prejudice or implicit bias, that will be reproduced in the model. If Google can come unstuck in this way, couldn’t insurers?
- where people with ‘protected characteristics’ live on the margins of big data, their lives will be less ‘datafied’ and so under-represented in datasets. This may leave them as statistical outliers, triggering decisions with unfavourable repercussions. Once that pattern is established within a model, those decisions replicate themselves and the skew compounds.
- where data is unverified or relatively high level, data mining will fall back on proxies that are more readily and cheaply available and, inevitably, less granular. This introduces inferences that may look statistically sound but are in fact inaccurate, as the sketch after this list illustrates.
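To make the proxy point concrete, here is a minimal sketch in Python using synthetic data and hypothetical feature names (‘postcode_band’, ‘claims_history’); it is an illustration of the mechanism, not any insurer’s actual model. The protected characteristic is never given to the model, yet because a proxy correlates with it, and because the historical decisions the model learns from were partly prejudiced, the new model quietly reproduces the old disparity.

```python
# Sketch of proxy-driven bias on synthetic data. The protected characteristic
# is excluded from the model's inputs, but a correlated proxy carries a past
# prejudice into the new model's predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 20_000

# Protected characteristic, never used as a model input.
protected = rng.integers(0, 2, n)

# Hypothetical proxy (think area of residence): noisily correlated with the group.
postcode_band = (rng.random(n) < 0.3 + 0.5 * protected).astype(int)

# A genuinely risk-relevant feature, independent of the protected group.
claims_history = rng.poisson(1.0, n)

# Historical decisions: partly real risk, partly the prejudice of past decision makers.
logit = 0.8 * claims_history + 1.5 * protected - 2.0
declined = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Train only on the 'neutral' features; the protected column never appears.
X = np.column_stack([postcode_band, claims_history])
model = LogisticRegression().fit(X, declined)
pred = model.predict_proba(X)[:, 1]

# The model looks statistically respectable...
print(f"AUC: {roc_auc_score(declined, pred):.2f}")

# ...yet it reproduces the old disparity through the proxy alone.
for g in (0, 1):
    print(f"group {g}: mean predicted decline rate = {pred[protected == g].mean():.2f}")
```

Notice that nothing in the model’s headline performance gives any warning: on this synthetic data it posts a perfectly respectable score while predicting noticeably higher decline rates for one group, which is precisely the veneer of statistical rigour discussed below.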
While each of these five points may not appear that significant on its own, the real danger lies in them coming together within a data model that is learning to find relationships of significance, but which is in fact riddled with, at best, unconscious bias. And what is most worrying is that this might all be happening behind a veneer of statistical significance, of scientific rigour, of objectivity.
Insurance isn’t going to sweep away past patterns of discrimination on its own. Yet it is under a clear obligation not to reproduce those past patterns in how it works today, in how it prepares for tomorrow. That is why insurers need to make some clear and decisive decisions on how to ensure that people with protected characteristics receive fair outcomes from their engagement with the insurance market.
Where should they start? Firstly, individual insurers should break out of the mindset that ‘equality means employees’ and make some detailed policy commitments about ‘equality and customers’ (more here). And together, insurers should get on and agree a set of principles that frame their responsibilities around how they make use of big data. Having something in the public domain by the end of this year doesn’t sound too onerous.