The new uncertainties that anonymised data will introduce
In this second post about the ethical issues raised by how insurers are handling anonymised data, we’ll start by looking at the question of buying and selling datasets. Most insurers say that they will not sell on your personal details. So is their commitment not to sell matched by a commitment not to buy? After all, if it is ethical not to sell information about you, surely it must be unethical to buy information about you? Unfortunately, commitments not to buy data are virtually unknown in today’s insurance market.
In earlier stages of the ‘big data revolution’, a commitment to only buy data from which personal identifiers had been stripped might have been acceptable. However, those days are long gone: now only a very wide interpretation of what counts as a personal identifier would reduce the likelihood of our data being de-anonymised. So if your firm says it will respect policyholders’ privacy by not selling their personal details, it should give serious consideration to respecting their privacy by not buying in data with the expectation of de-anonymising something about them.
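To see why stripping names alone offers so little protection, here is a minimal sketch of how two datasets can be linked on the fields they share. Everything in it is hypothetical and invented for illustration: the field names, the records and the `link` helper are not drawn from any real insurer's or energy company's data.

```python
from collections import defaultdict

# Hypothetical fields that are not names, yet can narrow down who someone is.
QUASI_IDENTIFIERS = ("postcode_area", "birth_year", "sex")

# An "anonymised" dataset (made-up records): names removed, other fields kept.
anonymised_records = [
    {"postcode_area": "AB1", "birth_year": 1970, "sex": "F", "annual_kwh": 4200},
    {"postcode_area": "AB1", "birth_year": 1985, "sex": "M", "annual_kwh": 2900},
    {"postcode_area": "CD2", "birth_year": 1985, "sex": "M", "annual_kwh": 3100},
]

# A second, named dataset (also made up), such as a customer list.
named_records = [
    {"name": "Person A", "postcode_area": "AB1", "birth_year": 1970, "sex": "F"},
    {"name": "Person B", "postcode_area": "CD2", "birth_year": 1985, "sex": "M"},
    {"name": "Person C", "postcode_area": "CD2", "birth_year": 1985, "sex": "M"},
]


def key(record):
    """Combine the shared fields into a single lookup key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymised, named):
    """Return (name, record) pairs where exactly one named person fits."""
    names_by_key = defaultdict(list)
    for person in named:
        names_by_key[key(person)].append(person["name"])

    matches = []
    for record in anonymised:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique combination gives the person away
            matches.append((candidates[0], record))
    return matches


matches = link(anonymised_records, named_records)
for name, record in matches:
    print(name, "->", record)
```

In this toy data only the first record links to a single name. The second has no counterpart in the named list, and the third fits two people, so it stays ambiguous. Each extra shared field makes a unique fit more likely, which is why only a very wide reading of ‘personal identifier’ makes a real difference.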
Let’s move on to other repercussions from insurers’ aggregation of datasets. Most of the datasets being aggregated will have come from non-insurance sources, the personal data in them having been disclosed under a wide range of circumstances (some formal and some casual). The reliability of those disclosures will therefore vary just as widely. This throws up two problems.
The first relates to what that varying reliability means for claims. Aggregated datasets can show you underwriting correlations that help fine-tune your pricing and acceptances, for at that point you’re working with correlated trends whose reliability mainly affects your bottom line. When it comes to decisions about claims, the picture is quite different, for now you’re talking about individual cases, not cases in the round. The reliability of those aggregated datasets upon which you’ve made underwriting decisions now becomes very material. Can information the policyholder disclosed, say to their energy company two years ago, influence your decision on their claim, especially if that information has gone through first an anonymisation process at the energy company and then a de-anonymisation process within your firm? You as the insurer may have let it influence the premium you quoted, but it would hardly count as treating the customer fairly if you let it influence your decision on their claim. This is not an inconvenience for your data experts to find a workaround for – it is a structural flaw for any business plan that relies upon the value of big data flowing through into claims management.
The second problem that insurers will confront with data of varying origin and quality is that their underwriting will now have to factor in new and potentially significant sources of uncertainty. Insurance has always been underwritten on a mix of risk and uncertainty – new lines of business carry more uncertainty than risk (due to lack of data), while established lines (with many years of quality data) are largely about risk. At first glance, aggregating more and more datasets would seem to reduce uncertainty (thanks to more and more data), yet it can all too easily increase it.
Part of that increase in uncertainty will of course come from variations in data quality and, to some degree, underwriters can allow for this in their rates. What they are unlikely to do, however, is allow for the uncertainty that flows from the correlation and aggregation processes themselves. There’s nothing certain about correlation and aggregation: what they produce is, after all, just probabilities of relationships between variables. And if there’s one circumstance under which the parts won’t always add up to the whole, it’s in what all of us decide to disclose to different people, for different reasons, at different times, under different circumstances. The net result is the introduction of a new, potentially disruptive uncertainty into your underwriting.
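A small simulation can make the point about correlation concrete. The sketch below is purely illustrative and rests on assumptions of my own: both the ‘claims’ figures and the candidate variables are random noise, so any relationship between them exists by chance alone. The function names and the numbers of policyholders and variables are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_policyholders, n_variables, seed=0):
    """Largest absolute correlation between random 'claims' and random variables.

    Nothing here is genuinely related to anything else, so whatever
    correlation turns up is an artefact of searching through many variables.
    """
    rng = random.Random(seed)
    claims = [rng.gauss(0, 1) for _ in range(n_policyholders)]
    strongest = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_policyholders)]
        strongest = max(strongest, abs(pearson(claims, candidate)))
    return strongest


# Same seed, so the larger search includes every variable in the smaller one.
few = strongest_chance_correlation(n_policyholders=50, n_variables=5)
many = strongest_chance_correlation(n_policyholders=50, n_variables=500)
print(f"strongest chance correlation across 5 variables:   {few:.2f}")
print(f"strongest chance correlation across 500 variables: {many:.2f}")
```

The more variables you bring in from aggregated datasets, the stronger the best-looking correlation that pure chance will hand you. That is uncertainty created by the aggregation process itself, and it sits on top of any doubts about the quality of the underlying data.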
If this were 2007, I would be less confident in raising such a concern. But if the financial crisis of the last six years has taught us anything, it is that slicing and dicing something, then selling it on for someone else to recombine into new forms using complicated algorithms, is a market risk best not ignored a second time round.
Big data does hold many opportunities for insurers, but like many information revolutions, it comes with risks that could trap the unwary and perhaps lead markets astray.