
Let’s talk about the never-ending world of Machine Learning

The setup of Machine Learning Operations (MLOps) opens the door to the untapped potential of our data models when put in the hands of our talented Graphene team.

Find out more with Nav Dhuti, Data Engineer at Carbon…


Once upon a time we had a small set of risks, claims and other data points to look at; however, the rapid growth of our business has significantly increased the number of data points in our data warehouse. That has made some of the tasks our team performs time-consuming.

 

For example, labelling claim descriptions for a few records is easy for a human, but it gets hard when one has to deal with thousands of data points from a previous year's portfolio on top of the usual business-as-usual data feeds. Labelling risk and claim descriptions becomes an almost impossible job in a high-volume data warehouse.

 

The easy solution to the problem is some kind of automation that uses a set of business rules to classify a risk or claim description into a generic label, which in turn opens the door to drawing deeper insights from the data.

 

So far so good. Let’s use fuzzy matching to get Carbon categories for each description and tell the world we are using machine learning.

 

However, human language is not easily classified, and there are many ways to write the same thing. For example, “car drove into the front door” and “house got hit by a vehicle” are semantically very similar, yet a set of logical rules will hardly ever match these descriptions to a common category.
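To make the point concrete, here is a minimal sketch using Python's built-in difflib to score lexical similarity. The exact scores are only indicative, but they show how two semantically equivalent descriptions can look completely different to a rule- or fuzzy-matching approach:

```python
from difflib import SequenceMatcher

def fuzzy_similarity(a: str, b: str) -> float:
    """Lexical (character-level) similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Semantically equivalent but lexically very different: the score is low,
# so a fuzzy-matching rule would miss the connection.
print(fuzzy_similarity("car drove into the front door",
                       "house got hit by a vehicle"))

# Minor wording changes, on the other hand, keep the score high even though
# lexical overlap is not really what we care about.
print(fuzzy_similarity("car drove into the front door",
                       "car drove into the back door"))
```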

 

Hence, machine learning is the go-to solution for this problem: a classification model can assign these descriptions to semantically accurate classes.

 

Now let’s build a machine learning model using some sort of public library out there on the internet and let the black box do its magic.

 

As we know, it's a huge investment to actually make machine learning models work in production and integrate them with all the other applications within the organisation. Most machine learning projects never make it past the proof-of-concept stage.

 

This leads some engineers to pick off-the-shelf frameworks that take a black-box approach to harnessing the power of machine learning. To cut a long story short, we looked into Google Cloud's managed services and picked Vertex AI, a Google Cloud service that lets us operationalise machine learning without worrying about the infrastructure, helping us focus purely on value generation.

Vertex AI is our gateway to MLOps, allowing us to streamline machine learning projects from the stage of exploratory data analysis to sharing model predictions with other apps in the Graphene ecosystem.

 

Okay, so we have a stage to perform on, but what about the performance? How is this helping with Carbon category classification?

 

Vertex AI helped us develop quick-and-dirty proofs of concept in its managed Jupyter notebooks (a place for data scientists to write code and evaluate their hypotheses). We started with claims data, as it comes in relatively small quantities. Our claims team prepared a list of Carbon categories for given claim descriptions, which we used to train our ML model and provide predictions for new claim descriptions.

 

Let's dive into this process a little deeper. In Natural Language Processing, where understanding the semantics of the text is important, generating high-quality text embeddings is critical. In simple terms, an embedding is a numeric representation of a text. But how do we generate an embedding for a given text that is semantically accurate? To solve this problem, we took the transfer learning route: we used Google's Universal Sentence Encoder (trained on billions of words from the internet) to generate semantically accurate embeddings for claim descriptions.
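As an illustration, here is roughly what generating those embeddings looks like with the publicly available Universal Sentence Encoder on TensorFlow Hub. The model handle and example descriptions below are illustrative rather than our production setup:

```python
import tensorflow_hub as hub

# Load Google's Universal Sentence Encoder from TensorFlow Hub
# (the handle below is the publicly available version 4).
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

claims = [
    "car drove into the front door",
    "house got hit by a vehicle",
    "leakage in the roof",
]

# Each description becomes a 512-dimensional vector; semantically similar
# texts end up close together in that vector space.
embeddings = embed(claims).numpy()
print(embeddings.shape)  # (3, 512)
```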

 

With high-quality embeddings in hand, we evaluated a few classification models and decided to stick with a simple k-nearest neighbour (KNN) classifier to classify claim categories. As a result, “leakage in the roof”, for example, is classified as “Water Damage”.
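Here is a rough sketch of what such a classifier could look like, using scikit-learn's KNeighborsClassifier on top of the embeddings. The training examples and the “Impact” label below are made up for illustration and are not Carbon's actual category list:

```python
import tensorflow_hub as hub
from sklearn.neighbors import KNeighborsClassifier

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

# Hypothetical labelled examples of the kind our claims team prepared.
descriptions = [
    "leakage in the roof",
    "water coming through the ceiling",
    "car drove into the front door",
    "house got hit by a vehicle",
]
categories = ["Water Damage", "Water Damage", "Impact", "Impact"]

# Fit a simple k-nearest neighbour classifier on the embeddings.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(embed(descriptions).numpy(), categories)

# Classify a new claim description.
print(knn.predict(embed(["burst pipe flooded the kitchen"]).numpy()))
```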

 

We have put a data pipeline in place to automatically classify new claim descriptions as the data arrives in our data warehouse, using our trained KNN model sitting behind Vertex AI-managed prediction endpoints. On the other side, our claims team flags any incorrect matches, and we use this active feedback to improve the model. Scheduled notebook executions on Vertex AI make the whole process seamless.
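For a flavour of the serving side of that pipeline, here is a hedged sketch of calling a Vertex AI prediction endpoint with the google-cloud-aiplatform SDK. The project, region, endpoint ID and instance schema are all placeholder assumptions, since the real payload depends on how the model's serving container is configured:

```python
from google.cloud import aiplatform

# Hypothetical project and endpoint identifiers; a real pipeline would read
# these from configuration rather than hard-coding them.
aiplatform.init(project="my-gcp-project", location="europe-west2")
endpoint = aiplatform.Endpoint("1234567890")  # endpoint hosting the trained KNN model

# New claim descriptions arriving in the warehouse are sent to the endpoint...
response = endpoint.predict(instances=[{"description": "leakage in the roof"}])

# ...and the predicted Carbon categories are written back alongside the claims.
for prediction in response.predictions:
    print(prediction)
```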

 

Vertex AI to classify claims, is that it?

 

Claims classification was our first project on the Vertex AI-enabled ML infrastructure. It has made it easy to train risk category classification models and has opened the door to new machine learning ideas.

 

Our philosophy of cross-team collaboration and full-cycle development is baked into our MLOps as well, allowing anyone in the team to take on machine learning projects, share findings and deploy into production.

 

Lastly, machine learning is going to help our team work faster and smarter, and discover the unseen in our rapidly growing data warehouse.
