Here’s a type of visualization I haven’t gotten excited to share in a while:
A visualization on entirely static data!
Below are 8,155 policy data points in a scatterplot along with distributions for key features.
All visuals are interactive and linked (‘coordinated’); you can:
drag to filter intervals on the scatterplot or the distributions above it
toggle the features below it
Visuals, summary stats and the table all update like ⚡ thanks to a neat coming-together of technology described below.
Duration is very significant here, so I map it to the color scale; sum assured likewise and I map it to the plot x axis.
There are still variances that can be explored using interactions on other features. Outlier details can be followed-up on using the table.
Linked interaction is also helpful for higher-level exploration:
How many policies are ‘profitable’ versus ‘unprofitable’ (roughly)? What are corresponding values?
Are new policies (low duration in force) ‘profitable’? Are all of them?
“value of future cashflows” & duration
The value is of calculated future cashflows for life term assurance policies; more information linked under ‘data origin & actuarial model’.
The policies are ‘in force’: duration in force indicating how long ago they started.
There are no reserve cashflows or other accounting smoothing included1, so the value profile is heavily dependant on duration:
at later durations policy holders are older, and are usually loss-making due to higher death/claim probabilities
whereas at early durations they are generally profit-making for the insurance company
🖌️🕹️
Show the code 📜
viewof x = {await vg.coordinator().exec( vg.sql`CREATE TABLE IF NOT EXISTS policy_data AS SELECT *, pv_net_cf/policy_count as pv_net_cf_pp, (age_at_entry-20)*4 as age_at_entry_color FROM 'https://calcwithdec.dev/posts/linked-visuals/policy-results-SE-noNB.parquet' --FROM 'http://127.0.0.1:8080/posts/mosaic-cashflow-aggregates/policy-results-SE-noNB.parquet' WHERE (duration_mth != policy_term*12) ` );// policy_data2 workaround for query needed for selection total// (see summary below)await vg.coordinator().exec(vg.sql`CREATE TABLE IF NOT EXISTS policy_data2 AS SELECT * from policy_data`);// can't param the color domain, so I remap age values in age_at_entry_color aboveconst $color_field = vg.Param.value("duration_mth");const color_scheme ="turbo";const $filter = vg.Selection.crossfilter();// for everythingconst $highlight = vg.Selection.intersect();// for termconst $highlight2 = vg.Selection.intersect();// for sexconst dots = vg.plot( vg.frame({ fill:"#fff9fd" }), vg.dot(vg.from("policy_data", { filterBy: $filter }), {x:"sum_assured",y:"pv_net_cf_pp",fill: vg.sql`if(${$color_field}='duration_mth',duration_mth, age_at_entry_color)`,symbol:"sex",minWidth:600,// tip: true, // too much noise and doesn't show p; table better atmr:2 }), vg.colorScheme(color_scheme), vg.aspectRatio(0.8),//vg.grid(true), // doesn't work with frame vg.xLabelAnchor("center"), vg.xLabelOffset(30), vg.xLabel("sum assured"), vg.yLabel("value future cashflows (premiums - claims - expenses incl. commissions)" ), vg.yLabelOffset(-20), vg.name("dots"), vg.intervalXY({ as: $filter }), vg.xDomain([0,1e6]), vg.yDomain([-12.6e3,22.8e3]),// -12.3k to 22.2k vg.width(550),//vg.minWidth(450), vg.height(400-100), vg.colorDomain(vg.Fixed), vg.axisY({ticks:3,tickFormat: d3.format(".1s") }), vg.axisX({// if axiX/Y of these appears earlier interactions break!tickFormat: d3.format(".1s") }),// vg.symbolDomain(vg.Fixed) not possible vg.margins({ left:35,top:20,bottom:30,right:20 }) );// now mostly a lot of repetition:const terms = vg.plot( vg.barY(vg.from("policy_data", { filterBy: $filter }), {x:"policy_term",y: vg.count(),fill:"green" }), vg.yAxis(null), vg.toggleX({ as: $filter }), vg.toggleX({ as: $highlight }), vg.highlight({ by: $highlight }), vg.xDomain(vg.Fixed), vg.axisX({ label:"policy term",labelAnchor:"center" }), vg.axisY({label:"#",labelAnchor:"center",ticks:2,tickFormat: d3.format(".1s") }), vg.width((450*5) /8-100), vg.height(80) );const ages = vg.plot( vg.rectY(vg.from("policy_data", { filterBy: $filter }), {x: vg.bin("age_at_entry"),// must be binned for intervalXy: vg.count(),fill:"steelblue",// how to condition on $color_field ?inset:0.5 }), vg.xAxis("top"), vg.xLabel("age at entry"), vg.xLabelAnchor("center"), vg.yAxis(null), vg.xTicks([]), vg.xDomain(vg.Fixed), vg.intervalX({ as: $filter }),// create an interval selection brush vg.xDomain(vg.Fixed),// don't change the x-axis domain across updates vg.axisY({label:"#",labelAnchor:"center",ticks:2,tickFormat: d3.format(".1s") }), vg.axisX({label:"",ticks: [20,40,60] }), vg.width(400), vg.height(80) );const durs = vg.plot( vg.rectY(vg.from("policy_data", { filterBy: $filter }), {x: vg.bin("duration_mth"),y: vg.count(),fill: vg.bin("duration_mth"),inset:0.5 }), vg.colorDomain(vg.Fixed), vg.colorScheme(color_scheme), vg.xAxis("top"), vg.xLabel("duration in force (yrs)"), vg.xLabelAnchor("center"), vg.yAxis(null), vg.xTicks([]), vg.xDomain(vg.Fixed), vg.intervalX({ as: $filter }),// create an interval selection brush vg.xDomain(vg.Fixed),// don't change the x-axis domain across updates vg.axisX({label:"",tickFormat: (d) => d /12,ticks: [0,120,180,240] }), vg.axisY({label:"#",labelAnchor:"center",ticks:2,tickFormat: d3.format(".1s") }), vg.width(400), vg.height(80) );const sex = vg.plot( vg.barY(vg.from("policy_data", { filterBy: $filter }), {x:"sex",y: vg.count(),fill:"purple" }), vg.toggleX({ as: $filter }), vg.toggleX({ as: $highlight2 }), vg.highlight({ by: $highlight2 }), vg.xDomain(vg.Fixed), vg.axisX({ label:"sex",labelAnchor:"center" }), vg.axisY({label:"#",labelAnchor:"center",ticks:2,tickFormat: d3.format(".1s") }), vg.width((450*3) /8-50), vg.height(80) );const summary_count = vg.plot( vg.text(vg.from("policy_data2", { filterBy: $filter }), {x:0.1,//vg.sum(0),y:0,//vg.sum(1),text: vg.sql`count(*)`,//vg.sum(vg.sql`pv_net_cf_pp`)fontSize:20,//fontFamily: "monospace",anchor:"right",fill: vg.sql`if(sum("pv_net_cf_pp") > 0, 3, 2)`// -> colorDomain -> category10 scheme }), vg.yDomain([0,1]), vg.xDomain([0,1]), vg.colorDomain([1,2,3,4]), vg.colorScheme("category10"), vg.xAxis(null), vg.yAxis(null), vg.axisY({label:"# selected policies ..",labelAnchor:"top",ticks: [] }), vg.height(50) );const summary_value = vg.plot( vg.text(vg.from("policy_data2", { filterBy: $filter }), {x:0.1,//vg.sum(0),y:0,//vg.sum(1),text: vg.sql`sum("pv_net_cf_pp")`,//vg.sum(vg.sql`pv_net_cf_pp`)fontSize:15,fontFamily:"monospace",anchor:"right",fill: vg.sql`if(sum("pv_net_cf_pp") > 0, 3, 2)`// -> colorDomain -> category10 scheme }), vg.yDomain([0,1]), vg.xDomain([0,1]), vg.colorDomain([1,2,3,4]), vg.colorScheme("category10"), vg.xAxis(null), vg.yAxis(null), vg.axisY({label:"selected total value ..",labelAnchor:"top",ticks: [] }), vg.height(50) );const table = vg.table({from:"policy_data",height:150,width:150,filterBy: $filter,format: { p: (d) => d },columns: ["p","pv_net_cf_pp"],// comment this for complete datawidth: { p:50 } });return vg.hconcat( vg.vconcat( ages, vg.vspace(10), durs, vg.vspace(10), dots, vg.hconcat(terms, sex),// 🚧 vg.menu({label:"🚧 color field ⚠️ ",as: $color_field,options: ["age_at_entry","duration_mth"] }) ), vg.hspace(10), vg.vconcat(vg.vspace(90), vg.vconcat(summary_count,summary_value), table, vg.vspace(100)) );}
data origin & actuarial model
Data comes from running lifelib: “an open-source Python package featuring practical actuarial models, tools, and examples”, specifically the Basic Term SE model and in it’s default configuration.
Default model points were randomly generated; but I removed some future new business and reversed a modelled effect of scaling by policy_count: so that now each mark above represents the value of a single policy (pv_net_cf_pp).
My data manipulation carried out in Python is in this notebook 📓.
Other adjustments are included with the visualization code2.
There’s more cool stuff behind-the-scenes. Mosaic is fast like ⚡ by pushing processing to a database, and DuckDB-Wasm is a database that makes this possible in your browser: without the complexity of a remote database, although that’s also an option.
The code is ~250 lines but is a lot of repetition, and I have the flexibility of SQL in there - besides an elegant visualization API that’s also facilated by Observable Plot.
lifelib & Python are other awesome things I was glad to use here; I’ll post more on both in future.
Show the code 📜
// import vgplot and configure Mosaic to use DuckDB-WASMvg = {const vg =awaitimport('https://cdn.jsdelivr.net/npm/@uwdata/vgplot@0.7.1/+esm'); vg.coordinator().databaseConnector(vg.wasmConnector());return vg;}
Footnotes
Reserves and accounting rules are very important in insurance! More on this in a future post↩︎