Controls for Insurers' Use of Social Media Data
A recent article in the trade press reminds us of what insurers have been working on. I’m going to look at some key quotes from the article and use them to raise a series of control issues that insurers should be focussing on in respect of social media data.
The article appeared last week in the online US trade journal, ‘Insurance Thought Leadership’. It was a fairly well-balanced take on the opportunities and risks associated with social media data. The author was Michael de Waal, the CEO of software house Global IQX.
What was Learnt from the Admiral Situation?
Reference is made to a UK motor insurer who…
“created a personality-type assessment based on certain choices and actions of potential clients. These included what athletes the client likes/follows, how concise their writing was and how often they used exclamation marks in social media interactions. The data enabled the insurer to determine if the client was overconfident or reckless, traits associated with many high-risk drivers.”
This quote comes from a report by a UK law firm back in early 2017, and so post-dates the Admiral situation. Two things stand out. Firstly, the use of such assessments needs to be subject to what I would call a causation/correlation control. Such a control challenges users of these assessments on the degree of significance set within the algorithms that underpin them, and then weighs what is found against the context of the decisions that rely on them. Certain types of decisions require higher levels of significance.
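One way to picture such a control is as a rule that ties the evidence required of a social media feature to the stakes of the decision that uses it. The Python sketch below is illustrative only: the decision contexts, the p-value thresholds and the function name `passes_significance_control` are all assumptions made for this example, not figures from the article or from any regulator.

```python
# Hypothetical maximum p-values per decision context (assumed values).
# The higher the stakes for the consumer, the stronger the evidence required.
MAX_P_VALUE = {
    "marketing": 0.05,
    "underwriting": 0.01,
    "claims": 0.001,
}


def passes_significance_control(p_value: float, context: str) -> bool:
    """Return True if a feature's p-value is strong enough for this context."""
    if context not in MAX_P_VALUE:
        # An unknown context is treated as a control failure, not a pass.
        raise ValueError(f"No significance threshold defined for {context!r}")
    return p_value <= MAX_P_VALUE[context]


# Example: one feature (say, exclamation mark frequency) with an assumed p = 0.02
for context in MAX_P_VALUE:
    print(context, passes_significance_control(0.02, context))
```

A check like this says nothing about causation on its own. What it does is force the question of how strong the correlation is to be asked in the context of the decision relying on it.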
A second control that is obviously critical to such assessments is for discrimination. To what extent has the firm built bias controls into the training and testing of the algorithms upon which such assessments rely? And how are these monitored on an ongoing basis?
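As a rough illustration of what ongoing monitoring could look like, the sketch below compares the rate of favourable outcomes across groups and reports the ratio of the lowest rate to the highest. The data is made up, the function names are hypothetical, and the 0.8 alert level is an assumption borrowed from the 'four-fifths' rule of thumb used in US employment contexts, not an insurance standard.

```python
from collections import defaultdict


def favourable_rates(outcomes):
    """outcomes: iterable of (group, favourable: bool). Returns the rate per group."""
    totals = defaultdict(int)
    favourable = defaultdict(int)
    for group, is_favourable in outcomes:
        totals[group] += 1
        if is_favourable:
            favourable[group] += 1
    return {g: favourable[g] / totals[g] for g in totals}


def disparate_impact_ratio(outcomes):
    """Lowest group rate divided by highest group rate (1.0 means parity)."""
    rates = favourable_rates(outcomes)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody received a favourable outcome, so no disparity to measure
    return min(rates.values()) / highest


# Made-up data: group A gets 8 of 10 favourable outcomes, group B gets 5 of 10.
sample = [("A", True)] * 8 + [("A", False)] * 2 + [("B", True)] * 5 + [("B", False)] * 5
ALERT_BELOW = 0.8  # assumed alert level, see the note above

ratio = disparate_impact_ratio(sample)
print(round(ratio, 3), "review needed" if ratio < ALERT_BELOW else "ok")
```

A single ratio like this is only a starting point. It would need to be run on training data, on test data and on live decisions over time to serve as the kind of ongoing control asked about above.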
Remember when deciding how to set such controls that the UK public have said that they mistrust how insurers use their data (more here). So the returns that the firm wants from its investment in social media data need to be balanced against the risk of controls failing to address these ethical concerns.
What then might cause those controls to fail? The more obvious causes would be a weak ethical culture and insufficient attention to behavioural ethical risks (more here) and rationalisations (more on this soon).
Predicting Health
Here’s another quote…
“A life and health insurer in the U.S. tested behavioral data gathered from online retail sites and third-party databases as inputs for predictive modeling to determine the health risks of over 60,000 applicants. Examining user behavior helped the insurer get results similar to traditional medical examinations.”
The financial benefit for the insurer is obvious, but what about the financial benefit for the policyholder? Predicting health risk from behavioural data and using it in underwriting is one thing. Using it in claims and counter fraud is quite another thing. That’s because data coming out of a ‘one to many’ context can’t be readily picked up and used in a ‘one to one’ context.
Clearly, the context aspect of the causation/correlation control is important here. Other controls that come to mind are to do with decision quality. How well are quality controls in claims and counter fraud factoring in the extent to which the risk profile is being influenced by predictive modelling of behavioural data? If the predictive data and the real data differ, how is that reconciled?
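To make the reconciliation question concrete, here is a minimal sketch of a quality check that flags cases where a risk score predicted from behavioural data diverges from one based on verified data, so that a person looks at the case before a claims or counter fraud decision relies on it. The `RiskView` structure, the 0 to 1 score scale and the 0.15 tolerance are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class RiskView:
    policy_id: str
    predicted_score: float  # from behavioural modelling, assumed scale 0..1
    verified_score: float   # from verified data such as medical evidence, assumed scale 0..1


def needs_reconciliation(view: RiskView, tolerance: float = 0.15) -> bool:
    """Flag cases where predicted and verified risk differ by more than the tolerance."""
    return abs(view.predicted_score - view.verified_score) > tolerance


# Made-up cases: the first diverges widely, the second only slightly.
cases = [
    RiskView("P-001", predicted_score=0.70, verified_score=0.30),
    RiskView("P-002", predicted_score=0.40, verified_score=0.35),
]
for_review = [c.policy_id for c in cases if needs_reconciliation(c)]
print(for_review)
```

The sketch deliberately does not decide which score wins. That decision, and who makes it, is the substance of the control.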
Human State Sensing
Sentiment analysis has been talked about in insurance circles for some years now…
“Sentiment analysis, equipped with natural language processing (NLP), is a machine-learning technique that analyzes and interprets text. Sentiment analysis can take in written user information at scale and use it to assess a client's behavior. For example, sentiment analysis can read, analyze and collect information from a business's review section, flagging any potential risks that require further investigation.”
Again, those two controls I’ve mentioned earlier, relating to causation / correlation and to discrimination, matter a lot here. NLP can apply and perpetuate gender and cultural stereotypes (more here). How is this being addressed?
This next quote is specifically about image data…
“Insurers can leverage machine-vision applications to investigate the photos and videos to discover more about a client's lifestyle, including eating, exercise and smoking habits.”
It’s important to remember that those photos and videos are being interpreted not just in terms of ‘that person is holding a cigarette’, but also in terms of ‘that person’s face resembles that of a smoker’. Again, those same two controls mentioned earlier apply here too. There’s been plenty of evidence that not all face types are handled equally well by such applications.
You may recall this article I wrote back in 2019. It looked at how an insurer-funded research programme had “established” that you could predict a person’s mental health from how they smiled in a selfie photo. It relied on controversial and contested science, yet was being openly promoted by the insurer.
As your decision systems use more and more voice and image data, it’s important that what I call science-related controls are implemented. Such controls should weigh up the science that underpins the analysis undertaken by those algorithms. Is it contested and by whom? Does the basis upon which it is contested raise ethical issues? What societal issues are connected with it?
Am I asking a lot here? Well, in answering ‘no’, I’m reminded of the recent case of the behavioural scientists whose influential paper on dishonesty in declarations was subsequently withdrawn after it became clear that it was based upon a dishonest use of data. The use case around which this debacle emerged was that of a motor insurer’s use of their findings.
Key Controls for Insurers’ Use of Social Media Data
I’ve raised the need for causation / correlation controls and for discrimination controls above. I would add a few more in support of them:
Controls around consent – to what extent does your use of third-party data and data scraping techniques fall within the consent given by the consumer to a) your firm, and b) those sites from which the data has been drawn? Think of this on two levels: the legal level and the ‘consumer trust’ level. You may tick this control in terms of the former level, but what is your reason for not considering the latter?
Controls relating to explainability – to what extent does your use of social media data fall within what is known as logical relatedness (more here)? In other words, are you using social media data in ways that you could explain and justify to a non-insurance but still interested person?
Controls around ethical culture – to what extent has your firm’s project for social media data been assessed for ethical culture risks? How are you managing the cultural levers for such projects to ensure that the first two lines of defence are actually working?
Operational and financial controls relating to pressure – to what extent does the trend in cost / utility for your firm’s use of social media data put pressure on people to push control boundaries? What do protocols say about undue pressure and how do you monitor it?
Quality controls relating to identity – to what extent does your firm factor in the nature of social media use amongst different types of consumer?
To explain that last point: we know that people often present themselves in differing ways on different social media platforms. So for example, how my eldest daughter presents herself on LinkedIn is different from how she presents herself on TikTok and the like. All of us, from time immemorial, have presented different aspects of ourselves in different circumstances. Creating one identity from social media data scraped off different platforms ignores this. And if that one identity then influences decisions on that person’s policy and claim, those differences are lost. Clearly, that’s not good for the quality of automated decisions.
Anecdotal Underwriting
To end on a lighter note, back in 2012 I wrote an article about insurers’ use of social media data and introduced the phrase ‘anecdotal underwriting’. This reflected the fact that not everything posted on social media is serious or accurate! And this is true for some platforms in particular, hence anecdotal underwriting. So in the precise world of data and analytics, are you controlling for anecdotes in your decision systems?!