Linked visuals for data exploration 🔍🎯

visualization
actuarial
Published

April 11, 2024

Here’s a type of visualization I haven’t gotten excited to share in a while:

A visualization on entirely static data!

Below are 8,155 policy data points in a scatterplot along with distributions for key features.

All visuals are interactive and linked (‘coordinated’); you can:

Visuals, summary stats and the table all update like ⚡ thanks to a neat coming-together of technology described below.

Duration is very significant here, so I map it to the color scale; sum assured is also significant, so I map it to the x axis of the plot.

There are still variations that can be explored by interacting with the other features. Outlier details can be followed up using the table.

Linked interaction is also helpful for higher-level exploration:

The value plotted is the present value of projected net future cashflows for term life assurance policies; more information is linked under ‘data origin & actuarial model’.

The policies are ‘in force’: duration in force indicates how long ago each policy started.

There are no reserve cashflows or other accounting smoothing included¹, so the value profile depends heavily on duration:

  • at later durations, policyholders are older and policies are usually loss-making, due to higher death/claim probabilities
  • whereas at early durations, policies are generally profit-making for the insurance company
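To make the duration effect concrete, here is a toy sketch in Python. It is not lifelib's Basic Term SE model: the mortality rates, 3% interest, 10% premium loading, entry age and 20-year term are all made-up assumptions, and there are no expenses, lapses or reserves. A level premium is fixed at issue, so at duration 0 future premiums exceed expected claims by the loading, while late in the term the rising death probabilities make the remaining claims outweigh the remaining premiums.

```python
# Toy sketch only, NOT lifelib's Basic Term SE model.
# Hypothetical assumptions: made-up mortality rising 10% per year of age,
# level annual premium, flat 3% interest, no expenses, lapses or reserves.

SUM_ASSURED = 100_000
ENTRY_AGE = 40
TERM = 20        # policy term in years
INTEREST = 0.03  # flat annual discount rate (assumption)
LOADING = 0.10   # premium margin over expected claims (assumption)
V = 1 / (1 + INTEREST)


def q(age: int) -> float:
    """Hypothetical annual death probability, rising with age."""
    return min(1.0, 0.0005 * 1.1 ** (age - 30))


def pv_claims_and_annuity(start: int) -> tuple[float, float]:
    """PV of expected claims, and of 1 per year of premium, for a
    policyholder alive at the start of policy year `start`."""
    pv_claims = pv_annuity = 0.0
    survival, discount = 1.0, 1.0
    for year in range(start, TERM):
        q_year = q(ENTRY_AGE + year)
        pv_annuity += survival * discount  # premium received at start of year
        # death claim paid at end of year, so discounted one extra year
        pv_claims += survival * q_year * SUM_ASSURED * discount * V
        survival *= 1 - q_year
        discount *= V
    return pv_claims, pv_annuity


# Level premium fixed at issue: expected claims plus the loading.
claims_at_issue, annuity_at_issue = pv_claims_and_annuity(0)
PREMIUM = (1 + LOADING) * claims_at_issue / annuity_at_issue


def pv_net_cf(duration: int) -> float:
    """PV of future premiums minus future claims for an in-force policy."""
    claims, annuity = pv_claims_and_annuity(duration)
    return PREMIUM * annuity - claims


for duration in range(0, TERM, 5):
    print(f"duration {duration:2d}: {pv_net_cf(duration):10.2f}")
```

The value is positive at duration 0 and negative by the end of the term: the same shape as the profile above, though with none of its actual numbers.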

🖌️🕹️

data origin & actuarial model

Data comes from running lifelib, “an open-source Python package featuring practical actuarial models, tools, and examples”, specifically the Basic Term SE model in its default configuration.

These default model points were randomly generated, but I removed some future new business and reversed a modelled effect of scaling by policy_count, so that each mark above now represents the value of one policy (pv_net_cf_pp).
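The two adjustments can be sketched in plain Python. The field names `duration_mth` and `pv_net_cf`, the convention that a negative `duration_mth` marks future new business, and the rows themselves are hypothetical assumptions for illustration; the actual steps are in the notebook. Only some future new business was removed above, whereas this sketch simply drops every row with a negative duration.

```python
# Hypothetical sketch: field names, the negative-duration convention and
# the rows below are illustrative assumptions, not lifelib's actual data.

model_points = [
    {"policy_id": 1, "duration_mth": 120, "policy_count": 50.0, "pv_net_cf": -25_000.0},
    {"policy_id": 2, "duration_mth": 6, "policy_count": 20.0, "pv_net_cf": 8_000.0},
    {"policy_id": 3, "duration_mth": -3, "policy_count": 40.0, "pv_net_cf": 12_000.0},
]


def per_policy_in_force(points: list[dict]) -> list[dict]:
    """Drop future new business, then undo the policy_count scaling."""
    result = []
    for point in points:
        if point["duration_mth"] < 0:  # not yet issued at valuation: skip
            continue
        result.append({
            **point,
            # total PV for the model point divided back to a single policy
            "pv_net_cf_pp": point["pv_net_cf"] / point["policy_count"],
        })
    return result


for row in per_policy_in_force(model_points):
    print(row["policy_id"], row["pv_net_cf_pp"])
```

Dividing by policy_count turns a model point's total value back into a per-policy value, which is what each mark in the scatterplot shows.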

The data manipulation I carried out in Python is in this notebook 📓.

Other adjustments are included with the visualization code².

interaction technology

I made this because I was excited to try Mosaic, a visualization framework from the UW Interactive Data Lab and the CMU Data Interaction Group. These are the same people who develop Vega and Vega-Lite³, along with other cutting-edge visualization and interaction work and research.

There’s more cool stuff behind the scenes. Mosaic is fast like ⚡ because it pushes processing down to a database, and DuckDB-Wasm is a database that makes this possible right in your browser, without the complexity of a remote database (although that’s also an option).
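As a rough illustration of pushing processing to a database (this is not Mosaic's API), the sketch below uses Python's built-in sqlite3 as a stand-in for DuckDB-Wasm, with a made-up policies table. A view such as a duration histogram is a GROUP BY query that returns only a few rows, and a brush on another view becomes a WHERE clause, so updating linked views means re-running small aggregate queries rather than looping over raw rows in the browser.

```python
# Rough illustration only, NOT Mosaic's API: Python's built-in sqlite3 stands
# in for DuckDB-Wasm, and the table, columns and rows are made up.
import sqlite3
from typing import Optional, Tuple

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE policies (duration INTEGER, sum_assured REAL, value REAL)")
con.executemany(
    "INSERT INTO policies VALUES (?, ?, ?)",
    [
        (1, 100_000, 250.0),
        (2, 200_000, 400.0),
        (8, 150_000, -300.0),
        (12, 300_000, -900.0),
        (15, 50_000, -120.0),
    ],
)


def duration_histogram(width: int, value_range: Optional[Tuple[float, float]] = None):
    """Policy counts per duration bin, optionally filtered by a brush on value."""
    # integer division bins each duration to the start of its bin
    sql = "SELECT (duration / :width) * :width AS bin, COUNT(*) FROM policies"
    params = {"width": width}
    if value_range is not None:
        # the brush selection on another view becomes a WHERE clause
        sql += " WHERE value BETWEEN :lo AND :hi"
        params.update(lo=value_range[0], hi=value_range[1])
    # the database does the binning and counting; only a few rows come back
    sql += " GROUP BY bin ORDER BY bin"
    return con.execute(sql, params).fetchall()


print(duration_histogram(5))                  # all policies
print(duration_histogram(5, (-1000.0, 0.0)))  # brushed: loss-making policies only
```

The design point is that the histogram's cost depends on the number of bins returned, not on how many rows the browser would otherwise have to touch.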

The code is ~250 lines, but a lot of it is repetition. I have the flexibility of SQL in there, alongside an elegant visualization API that builds on Observable Plot.

Obligatory

If ~8k data points don’t impress you, try 10 million or 7.6 million!

lifelib & Python

lifelib & Python are other awesome things I was glad to use here; I’ll post more on both in future.

Footnotes

  1. Reserves and accounting rules are very important in insurance! More on this in a future post.

  2. View Source linked at the bottom of this page

  3. I use Vega/Vega-Lite in most of my published visualization/interaction work
