Loss Triangles are an essential tool for Property & Casualty actuary. They are a visual representation of the development of claims overtime, and are a starting point for many predictive analysis techniques such as the Chain Ladder method.
The code and example data is available at my GitHub.
For this example, we will look at a Policy Year triangle with 12 Month developments. Since the purpose of this article is to demonstrate how to create a loss triangle in Python, I will not go into too much of the technical, but look out for future posts where I’ll go into more details.
The following triangle (created using Excel and simulated data) is an example of our end goal. Data is presented as of December-31st-2017.
How do we interpret this triangle? Let’s take a look at the first row, denoted by Policy Year 2014. Any policy which starts (the “effective date”) anytime in 2014 will fall into this category. The columns denote the development in months. For policy year 20XX and development Y, the corresponding cell represents the cumulative claims reported for policies written in 20XX as of Y months from January-1-20XX.
For example, take a look at PY 2014 development 24. This tells us that: “As of Dec-31st-2015, there have been $1,000 worth of claims reported on policies written during 2014″. Take a moment to reason to yourself why the data presents itself as a triangle. You’ll note that as each 12 month period passes, a new entry is added for each policy year along the diagonal, hence increasing the size of the triangle.
Loss Triangles In Python
More often than not, loss triangles are created in Excel. While this works well for creating the triangles, learning how to replicate this process in Python (or R!) will allow you to easily apply analysis techniques via additional Python scripts.
First, let’s take a look at the raw transactional data (again, simulated for example purposes):
This looks like typical claims data. Each row represents a transaction, and we’re given the Policy ID, Claim ID, Dates and Transaction Amounts. In this example, we will be focusing on creating a Policy Year claims triangle as is the above example.
If you do not have Python set up yet, I recommenced doing a Google search for one of the many guides or videos to help. We will be using the NumPy and Pandas packages:
import numpy as np import pandas as pd
We will now import the data. It is good practice to check to see if there is any missing data as well:
path = "C:/MY_PATH/ClaimsExample.csv" claims = pd.read_csv(path, sep = ",") claims = pd.DataFrame(claims) # Check for missing data claims.isnull().any()
Out: Policy_ID False Claim_ID False Policy_Effective_Date False Accident_Date False Report_Date False Transaction_Date False Indemnity False Expenses False Total False dtype: bool
Now, the first thing we’ll need to calculate is the development or lag for each line. In this case, the lag is calculated as the Transaction Year – Policy Year + 1. Note the semantics here – we are calculating the amount reported for the Policy as at Dec/31 of the Transaction year, thus the “+1” is needed.
We must extract the Year from the given dates. The methodology used will depend on your data formatting.
claims['Policy_Year'] = [int(x[-4:]) for x in claims['Policy_Effective_Date']] claims['Transaction_Year'] = [int(x[-4:]) for x in claims['Transaction_Date']]
Next, we can calculate the lags.
claims['PY_Lag'] = claims['Transaction_Year'] - claims['Policy_Year'] + 1
Let’s take a look at a few rows of our data now for spot checking.
claims[:5] Out: Policy_ID Claim_ID Policy_Effective_Date Accident_Date Report_Date \ 0 101 1 1/1/2010 7/1/2010 8/1/2010 1 101 1 1/1/2010 7/1/2010 8/1/2010 2 101 1 1/1/2010 7/1/2010 8/1/2010 3 101 1 1/1/2010 7/1/2010 8/1/2010 4 101 1 1/1/2010 7/1/2010 8/1/2010 Transaction_Date Indemnity Expenses Total Policy_Year Transaction_Year \ 0 9/1/2010 100.0 50.0 150.0 2010 2010 1 11/1/2010 200.0 20.0 220.0 2010 2010 2 1/1/2011 300.0 0.0 300.0 2010 2011 3 3/1/2011 200.0 0.0 200.0 2010 2011 4 5/1/2011 50.0 0.0 50.0 2010 2011 PY_Lag 0 1 1 1 2 2 3 2 4 2
Looks like our code worked! As expected, lines with the same PY and TY show a lag of one (e.g. 12 months).
Next we get to use a bit of Pandas Dataframe manipulation. We are only interested in the development in the aggregate, thus we need to sum over the policies at each lag for each policy year. We’re going to combine losses and expenses in this example, thus we use the total column.
Here is the code and result, which I will walk through:
py_data = claims['Total'].groupby([claims['Policy_Year'],claims['PY_Lag']]).sum().reset_index()
py_data Out: Policy_Year PY_Lag Total 0 2010 1 370.0 1 2010 2 6910.0 2 2010 3 -200.0 3 2010 4 -600.0 4 2011 1 960.0 5 2011 2 4320.0 6 2011 3 100.0 7 2012 1 1090.0 8 2012 2 8550.0 9 2013 1 21720.0
This may seem complicated, but we can break it down into steps:
claims['Total']: We are choosing the Total column
groupby([claims['Policy_Year'],claims['PY_Lag']]): We want to see the total column in the aggregate for each combination of Policy_Year and PY_Lag
sum(): We are choosing sum as our aggregate function
reset_index(): This is a bit trickier, it forces the DataFrame to reset the “index” for each line so that Policy_Year and PY_Lag are forced into their own columns. This may be easier to explain by looking at the result without the reset_index function. Try it out yourself!
We are now ready to make out triangle. The Panadas function “pivot_table” works very similarly to Pivot Tables in Excel, which you might be more familiar with. We first need to calculate the cumulative sum at each lag for our policy years. This is done using a similar technique to above.
py_data['cumsum'] = py_data["Total"].groupby(py_data["Policy_Year"]).cumsum() py_cumu_triangle = pd.pivot_table(py_data, index = ["Policy_Year"], columns = ["PY_Lag"], values = ["cumsum"])
print(py_cumu_triangle) cumsum PY_Lag 1 2 3 4 Policy_Year 2010 370.0 7280.0 7080.0 6480.0 2011 960.0 5280.0 5380.0 NaN 2012 1090.0 9640.0 NaN NaN 2013 21720.0 NaN NaN NaN
Success! We have created a loss triangle using Python. If you are familiar with loss triangles you will notice that this one does not look like a triangle you would expect for stable business. That, however, is something to be discussed in a future post.
We are also sometimes interested in a incremental loss triangle. We can easily edit the code above to accomplish this. Try out the code below yourself:
py_triangle = pd.pivot_table(py_data, index = ["Policy_Year"], columns = ["PY_Lag"], values = ["Total"])
print(py_triangle) Total PY_Lag 1 2 3 4 Policy_Year 2010 370.0 6910.0 -200.0 -600.0 2011 960.0 4320.0 100.0 NaN 2012 1090.0 8550.0 NaN NaN 2013 21720.0 NaN NaN NaN