Actuarial Science in Python – Loss Triangles

Posted by

Loss Triangles are an essential tool for Property & Casualty actuary. They are a visual representation of the development of claims overtime, and are a starting point for many predictive analysis techniques such as the Chain Ladder method.

The code and example data is available at my GitHub.

For this example, we will look at a Policy Year  triangle with 12 Month developments. Since the purpose of this article is to demonstrate how to create a loss triangle in Python, I will not go into too much of the technical, but look out for future posts where I’ll go into more details.

The following triangle (created using Excel and simulated data) is an example of our end goal. Data is presented as of December-31st-2017.

ExampleCumulativeTriangle.PNG

How do we interpret this triangle? Let’s take a look at the first row, denoted by Policy Year 2014. Any policy which starts (the “effective date”) anytime in 2014 will fall into this category. The columns denote the development in months. For policy year 20XX and development Y, the corresponding cell represents the cumulative claims reported for policies written in 20XX as of Y months from January-1-20XX.

For example, take a look at PY 2014 development 24. This tells us that: “As of Dec-31st-2015, there have been $1,000 worth of claims reported on policies written during 2014″. Take a moment to reason to yourself why the data presents itself as a triangle. You’ll note that as each 12 month period passes, a new entry is added for each policy year along the diagonal, hence increasing the size of the triangle.

Loss Triangles In Python

More often than not, loss triangles are created in Excel. While this works well for creating the triangles, learning how to replicate this process in Python (or R!) will allow you to easily apply analysis techniques via additional Python scripts.

First, let’s take a look at the raw transactional data (again, simulated for example purposes):

This looks like typical claims data. Each row represents a transaction, and we’re given the Policy ID, Claim ID, Dates and Transaction Amounts. In this example, we will be focusing on creating a Policy Year claims triangle as is the above example.

If you do not have Python set up yet, I recommenced doing a Google search for one of the many guides or videos to help. We will be using the NumPy and Pandas packages:

import numpy as np
import pandas as pd

We will now import the data. It is good practice to check to see if there is any missing data as well:

path = "C:/MY_PATH/ClaimsExample.csv"
claims = pd.read_csv(path, sep = ",")
claims = pd.DataFrame(claims)

# Check for missing data
claims.isnull().any()
Out[8]: 
Policy_ID                False
Claim_ID                 False
Policy_Effective_Date    False
Accident_Date            False
Report_Date              False
Transaction_Date         False
Indemnity                False
Expenses                 False
Total                    False
dtype: bool

Now, the first thing we’ll need to calculate is the development or lag for each line. In this case, the lag is calculated as the Transaction Year – Policy Year + 1. Note the semantics here – we are calculating the amount reported for the Policy as at Dec/31 of the Transaction year, thus the “+1” is needed.

We must extract the Year from the given dates. The methodology used will depend on your data formatting.

claims['Policy_Year'] = [int(x[-4:]) for x in claims['Policy_Effective_Date']]
claims['Transaction_Year'] = [int(x[-4:]) for x in claims['Transaction_Date']]

Next, we can calculate the lags.

claims['PY_Lag'] = claims['Transaction_Year'] - claims['Policy_Year'] + 1

Let’s take a look at a few rows of our data now for spot checking.

claims[:5]
Out[23]: 
   Policy_ID  Claim_ID Policy_Effective_Date Accident_Date Report_Date  \
0        101         1              1/1/2010      7/1/2010    8/1/2010   
1        101         1              1/1/2010      7/1/2010    8/1/2010   
2        101         1              1/1/2010      7/1/2010    8/1/2010   
3        101         1              1/1/2010      7/1/2010    8/1/2010   
4        101         1              1/1/2010      7/1/2010    8/1/2010   

  Transaction_Date  Indemnity  Expenses  Total  Policy_Year  Transaction_Year  \
0         9/1/2010      100.0      50.0  150.0         2010              2010   
1        11/1/2010      200.0      20.0  220.0         2010              2010   
2         1/1/2011      300.0       0.0  300.0         2010              2011   
3         3/1/2011      200.0       0.0  200.0         2010              2011   
4         5/1/2011       50.0       0.0   50.0         2010              2011   

   PY_Lag  
0       1  
1       1  
2       2  
3       2  
4       2 

Looks like our code worked! As expected, lines with the same PY and TY show a lag of one (e.g. 12 months).

Next we get to use a bit of Pandas Dataframe manipulation. We are only interested in the development in the aggregate, thus we need to sum over the policies at each lag for each policy year. We’re going to combine losses and expenses in this example, thus we use the total column.

Here is the code and result, which I will walk through:

py_data = claims['Total'].groupby([claims['Policy_Year'],claims['PY_Lag']]).sum().reset_index()
py_data
Out[25]: 
   Policy_Year  PY_Lag    Total
0         2010       1    370.0
1         2010       2   6910.0
2         2010       3   -200.0
3         2010       4   -600.0
4         2011       1    960.0
5         2011       2   4320.0
6         2011       3    100.0
7         2012       1   1090.0
8         2012       2   8550.0
9         2013       1  21720.0

This may seem complicated, but we can break it down into steps:

  • claims['Total'] : We are choosing the Total column
  • groupby([claims['Policy_Year'],claims['PY_Lag']]) : We want to see the total column in the aggregate for each combination of Policy_Year and PY_Lag
  • sum(): We are choosing sum as our aggregate function
  • reset_index() : This is a bit trickier, it forces the DataFrame to reset the “index” for each line so that Policy_Year and PY_Lag are forced into their own columns. This may be easier to explain by looking at the result without the reset_index function. Try it out yourself!

We are now ready to make out triangle. The Panadas function “pivot_table” works very similarly to Pivot Tables in Excel, which you might be more familiar with. We first need to calculate the cumulative sum at each lag for our policy years. This is done using a similar technique to above.

py_data['cumsum'] = py_data["Total"].groupby(py_data["Policy_Year"]).cumsum()
py_cumu_triangle = pd.pivot_table(py_data, index = ["Policy_Year"], columns = ["PY_Lag"], values = ["cumsum"])
print(py_cumu_triangle)
              cumsum                        
PY_Lag             1       2       3       4
Policy_Year                                 
2010           370.0  7280.0  7080.0  6480.0
2011           960.0  5280.0  5380.0     NaN
2012          1090.0  9640.0     NaN     NaN
2013         21720.0     NaN     NaN     NaN

Success! We have created a loss triangle using Python. If you are familiar with loss triangles you will notice that this one does not look like a triangle you would expect for stable business. That, however, is something to be discussed in a future post.

Post Script:

We are also sometimes interested in a incremental loss triangle. We can easily edit the code above to accomplish this. Try out the code below yourself:

py_triangle = pd.pivot_table(py_data, index = ["Policy_Year"], columns = ["PY_Lag"], values = ["Total"])
print(py_triangle)
               Total                      
PY_Lag             1       2      3      4
Policy_Year                               
2010           370.0  6910.0 -200.0 -600.0
2011           960.0  4320.0  100.0    NaN
2012          1090.0  8550.0    NaN    NaN
2013         21720.0     NaN    NaN    NaN
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s