Actuarial Science in Python – Loss Triangles

Loss triangles are an essential tool for Property & Casualty actuaries. They are a visual representation of how claims develop over time, and they are the starting point for many predictive techniques such as the Chain Ladder method.

The code and example data are available on my GitHub.

For this example, we will look at a Policy Year triangle with 12-month developments. Since the purpose of this article is to demonstrate how to create a loss triangle in Python, I will not go too deep into the technical details, but look out for future posts where I will.

The following triangle (created in Excel from simulated data) is an example of our end goal. The data is presented as of December 31st, 2017.

[Image: example cumulative loss triangle (Excel, simulated data)]

How do we interpret this triangle? Let’s take a look at the first row, denoted Policy Year 2014. Any policy whose effective date falls anywhere in 2014 belongs to this row. The columns denote the development in months: for policy year 20XX and development Y, the corresponding cell is the cumulative claims reported for policies written in 20XX as of Y months after January 1st, 20XX.

For example, take a look at PY 2014 development 24. This tells us: “As of December 31st, 2015, $1,000 of claims have been reported on policies written during 2014.” Take a moment to reason through why the data forms a triangle: as each 12-month period passes, a new entry is added for each policy year along the latest diagonal, growing the triangle by one step.

Loss Triangles In Python

More often than not, loss triangles are created in Excel. While this works well for creating the triangles, learning how to replicate this process in Python (or R!) will allow you to easily apply analysis techniques via additional Python scripts.

First, let’s take a look at the raw transactional data (again, simulated for example purposes):

This looks like typical claims data. Each row represents a transaction, and we’re given the Policy ID, Claim ID, Dates and Transaction Amounts. In this example, we will focus on creating a Policy Year claims triangle like the one above.

If you do not have Python set up yet, I recommend doing a Google search for one of the many guides or videos to help. We will be using the NumPy and Pandas packages:

import numpy as np
import pandas as pd

We will now import the data. It is good practice to check to see if there is any missing data as well:

path = "C:/MY_PATH/ClaimsExample.csv"
claims = pd.read_csv(path, sep = ",")  # read_csv already returns a DataFrame

# Check for missing data
claims.isnull().any()
Out[8]: 
Policy_ID                False
Claim_ID                 False
Policy_Effective_Date    False
Accident_Date            False
Report_Date              False
Transaction_Date         False
Indemnity                False
Expenses                 False
Total                    False
dtype: bool
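No missing values here. If any of those checks had come back True, one hedged option (my addition, not part of the original example) is to fill missing dollar amounts with zero and drop rows that are missing the dates we need:

# Hypothetical clean-up, only needed if the check above flags anything
if claims.isnull().any().any():
    claims[['Indemnity', 'Expenses', 'Total']] = claims[['Indemnity', 'Expenses', 'Total']].fillna(0)
    claims = claims.dropna(subset=['Policy_Effective_Date', 'Transaction_Date'])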

Now, the first thing we’ll need to calculate is the development, or lag, for each line. In this case, the lag is calculated as Transaction Year – Policy Year + 1. Note the semantics here: we are measuring the amount reported for the policy as at December 31st of the transaction year, so a transaction occurring in the policy year itself belongs to the 12-month (lag 1) column, which is why the “+1” is needed.

We must extract the Year from the given dates. The methodology used will depend on your data formatting.

claims['Policy_Year'] = [int(x[-4:]) for x in claims['Policy_Effective_Date']]
claims['Transaction_Year'] = [int(x[-4:]) for x in claims['Transaction_Date']]
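The string slice above assumes the dates end in a four-digit year (as in “1/1/2010”). If your dates parse cleanly, an equivalent sketch using Pandas’ date parsing (an alternative I’m adding for illustration, not part of the original example) is:

# Let Pandas parse the dates, then take the year component
claims['Policy_Year'] = pd.to_datetime(claims['Policy_Effective_Date']).dt.year
claims['Transaction_Year'] = pd.to_datetime(claims['Transaction_Date']).dt.year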

Next, we can calculate the lags.

claims['PY_Lag'] = claims['Transaction_Year'] - claims['Policy_Year'] + 1
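Since a transaction cannot happen before the policy’s effective date, every lag should be at least 1. A quick optional sanity check (my addition):

# All lags should be >= 1 for policy-year development
assert (claims['PY_Lag'] >= 1).all()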

Let’s take a look at a few rows of our data now for spot checking.

claims[:5]
Out[23]: 
   Policy_ID  Claim_ID Policy_Effective_Date Accident_Date Report_Date  \
0        101         1              1/1/2010      7/1/2010    8/1/2010   
1        101         1              1/1/2010      7/1/2010    8/1/2010   
2        101         1              1/1/2010      7/1/2010    8/1/2010   
3        101         1              1/1/2010      7/1/2010    8/1/2010   
4        101         1              1/1/2010      7/1/2010    8/1/2010   

  Transaction_Date  Indemnity  Expenses  Total  Policy_Year  Transaction_Year  \
0         9/1/2010      100.0      50.0  150.0         2010              2010   
1        11/1/2010      200.0      20.0  220.0         2010              2010   
2         1/1/2011      300.0       0.0  300.0         2010              2011   
3         3/1/2011      200.0       0.0  200.0         2010              2011   
4         5/1/2011       50.0       0.0   50.0         2010              2011   

   PY_Lag  
0       1  
1       1  
2       2  
3       2  
4       2 

Looks like our code worked! As expected, lines with the same policy year and transaction year show a lag of one (i.e., the 12-month development).

Next we get to use a bit of Pandas DataFrame manipulation. We are only interested in development in the aggregate, so we need to sum over the policies at each lag for each policy year. We are combining losses and expenses in this example, so we use the Total column.

Here is the code and result, which I will walk through:

py_data = claims['Total'].groupby([claims['Policy_Year'],claims['PY_Lag']]).sum().reset_index()
py_data
Out[25]: 
   Policy_Year  PY_Lag    Total
0         2010       1    370.0
1         2010       2   6910.0
2         2010       3   -200.0
3         2010       4   -600.0
4         2011       1    960.0
5         2011       2   4320.0
6         2011       3    100.0
7         2012       1   1090.0
8         2012       2   8550.0
9         2013       1  21720.0

This may seem complicated, but we can break it down into steps:

  • claims['Total'] : We are choosing the Total column
  • groupby([claims['Policy_Year'],claims['PY_Lag']]) : We want to see the total column in the aggregate for each combination of Policy_Year and PY_Lag
  • sum(): We are choosing sum as our aggregate function
  • reset_index() : This one is a bit trickier: it resets the index so that Policy_Year and PY_Lag become ordinary columns instead of living in the index. This is easiest to see by comparing against the result without reset_index(); try it yourself, or see the short sketch below.
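To illustrate that last point, here is the same aggregation without reset_index(); this is a small sketch I’m adding to show that Policy_Year and PY_Lag end up in the row index rather than as columns:

# Without reset_index(), the grouping keys live in a MultiIndex
grouped = claims['Total'].groupby([claims['Policy_Year'], claims['PY_Lag']]).sum()
print(grouped.index.names)   # ['Policy_Year', 'PY_Lag']
print(grouped.head())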

We are now ready to make our triangle. The Pandas function “pivot_table” works very similarly to Pivot Tables in Excel, which you may already be familiar with. We first need to calculate the cumulative sum at each lag for our policy years, using a similar technique to the one above.

py_data['cumsum'] = py_data["Total"].groupby(py_data["Policy_Year"]).cumsum()
py_cumu_triangle = pd.pivot_table(py_data, index = ["Policy_Year"], columns = ["PY_Lag"], values = ["cumsum"])
print(py_cumu_triangle)
              cumsum                        
PY_Lag             1       2       3       4
Policy_Year                                 
2010           370.0  7280.0  7080.0  6480.0
2011           960.0  5280.0  5380.0     NaN
2012          1090.0  9640.0     NaN     NaN
2013         21720.0     NaN     NaN     NaN
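If the extra “cumsum” level on the columns bothers you, one optional bit of polish (my addition, not in the original) is to drop it so the columns are just the lags:

# Drop the outer column level left over from values=["cumsum"]
tidy = py_cumu_triangle.copy()
tidy.columns = tidy.columns.droplevel(0)
print(tidy)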

Success! We have created a loss triangle using Python. If you are familiar with loss triangles you will notice that this one does not look like a triangle you would expect for stable business. That, however, is something to be discussed in a future post.

Post Script:

We are also sometimes interested in an incremental loss triangle. We can easily edit the code above to accomplish this. Try out the code below yourself:

py_triangle = pd.pivot_table(py_data, index = ["Policy_Year"], columns = ["PY_Lag"], values = ["Total"])
print(py_triangle)
               Total                      
PY_Lag             1       2      3      4
Policy_Year                               
2010           370.0  6910.0 -200.0 -600.0
2011           960.0  4320.0  100.0    NaN
2012          1090.0  8550.0    NaN    NaN
2013         21720.0     NaN    NaN    NaN
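As a quick consistency check (my addition), cumulating the incremental triangle across the lags should reproduce the cumulative triangle from earlier:

# Row-wise cumulative sums of the incremental triangle should match
# the cumulative triangle built above (NaNs compare as equal here).
print(np.allclose(py_triangle.cumsum(axis=1).values, py_cumu_triangle.values, equal_nan=True))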

Magic the Gathering: Pack Cracking Statistics (Oath of the Gatewatch)

In addition to writing actuarial exams, I am a massive nerd when it comes to the game Magic: The Gathering. One thing that has always irked my statistician side about M:TG journalism is that the only statistic ever discussed is the expected value of opening packs of cards. I’m here to set that straight by demonstrating an even more important factor: variance.

Hold up, what is M:TG?

Magic: the Gathering is a trading card game that began in 1993 and has since exploded in popularity to become the most popular physical card game on the planet. If you have never played before I encourage you to find a friend that does (might be easier than you think!) and get them to teach you the basics. The game has something to offer for all skill levels and is a fantastic social activity.

The Basics of Pack Cracking

Each regular pack of Magic cards contains 15 cards distributed over the four main rarities: Common, Uncommon, Rare and Mythic. A normal pack contains 10 Commons, 3 Uncommons, 1 Rare OR 1 Mythic, and 1 “Land” card. There is also a chance that a “premium” foil card takes the place of one common. The relevant probabilities are as follows:

  • Chance of a Mythic Rare: 1 in 8 packs, or 12.5%
  • Chance of a Foil card (any rarity): 1 in 6 packs, or 16.7%

If you buy a “box” of cards, you get 36 packs. From this we can calculate the expected number (the Expected Value) of Mythics and Foils we get in our box:

  • Expected Mythics: 0.125*36 = 4.5
  • Expected Foils: 0.1666*36 = 6
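Both numbers are just the mean of a binomial distribution, E[X] = np, with n = 36 packs.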

Most people will stop here with the calculations. I’m going to go a step further and show you some other ways to look at what we expect from our boxes of cards.

The Simple Model

I will be using “RStudio” to complete the math for the rest of this article; feel free to skip over the code. I’m including it for anyone who would like to reproduce my results.

When we open a pack of cards, we are actually performing a Bernoulli trial. Put simply, there are two possible outcomes: a “Rare” or a “Mythic”. Using the probability from above, let’s simulate cracking an entire box of cards:

p.mythic <- 1/8

# Simulate a box (36 packs): 1 = Rare, 0 = Mythic
packs <- rbinom(36, 1, 1 - p.mythic)
results <- ifelse(packs == 1, "Rare", "Mythic")
results
 [1] "Rare"   "Rare"   "Mythic" "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Mythic" "Rare"
[12] "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Mythic" "Rare"   "Rare"   "Rare"   "Rare"
[23] "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"
[34] "Rare"   "Mythic" "Rare"

In a nicer form:

table(results)
results
Mythic   Rare
     4     32

Congrats! You got four mythics in your box. We can do the same thing to simulate our foils:

results2
   Foil No Foil
      4      32

Hmm, looks like we came up with a disappointing 4 foils (we expected 6). Ask yourself, why did this happen? You’re right, it’s variance.

Simulating 100,000 Boxes

To visualize the distribution of the number of mythics you get, we have to repeat this experiment many times. Many, many times. For those interested, check out the Law of Large Numbers. Let’s take a look at the results of this simulation:

boxes.mythics <- rbinom(100000, 36, p.mythic)
hist(boxes.mythics, col = "lightblue", xlab = "Number of Mythics",
     ylab = "Number of Boxes", main = "Distribution of Mythics in Boxes")

[Histogram: distribution of Mythics per box]
We can do a similar analysis for foils:

[Histogram: distribution of Foils per box]

I think this is a pretty great visualization: even though we “expect” to get 6 foils, it is actually more likely that we get some number other than 6. Some poor soul may only get 0-2 foils, while some lucky duck will walk away with 8 or more! Sure, we “knew” this already, but now we can see it plainly.

In fact, we can calculate the exact probability of getting a specified number of Mythics or Foils using a Binomial Distribution, which arises from performing repeated, independent Bernoulli trials. The resulting chart is depicted here:

[Chart: exact probabilities of each number of Mythics and Foils per box]
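For anyone who wants the exact numbers behind a chart like this: with p = 1/8 and 36 packs, the probability of opening exactly k mythics is the binomial probability

P(X = k) = \binom{36}{k} \left(\tfrac{1}{8}\right)^{k} \left(\tfrac{7}{8}\right)^{36-k}

and similarly for foils with p = 1/6.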

The Complicated Model

In the most recent Magic expansion, Oath of the Gatewatch, there are 70 commons, 60 uncommons, 42 rares and 12 mythics, for a total of 184 cards. There is also a very small chance of opening an Expedition land, which commands a high price. I will be using the same assumptions as above, with the following additional foil assumptions:

  • Commons and uncommons are equally distributed over the foils
  • The chance of your foil being a rare or a mythic is equal to their proportion of all cards (52/184), and the card is then roughly 4 times as likely to be a Foil Rare as a Foil Mythic
  • Expeditions will be treated the same as the calculated value for Foil Mythic Rares (the article states it is slightly less rare, but I have no way to make an educated guess as to how much and it is a very small amount anyway)

I know what some of you are saying: there is an often-repeated assumption that a Foil Mythic is a 1-in-216-pack occurrence, and that’s that. I scoured the internet for a source, and everything seems to trace back to a single article from StarCity, who themselves source a website that is now offline. Additionally, those numbers are for a large set (200+ cards), whereas Oath is a small set. From my numbers below, I hope you will agree that these are reasonable assumptions.

When we open a pack, we have multiple different possibilities:

  • Rare or Mythic
  • No Foil, Common Foil, Uncommon Foil, Rare Foil, Mythic Foil
  • No Expedition, Expedition

And any combination of the three. If you trust my math, here are the probabilities for each of the combinations:

[Table: probability of each Rare/Mythic, foil, and Expedition combination]

What we have here is a Multinomial Distribution. Just like before, we can simulate boxes. Let’s open 10 boxes and see what we get:

                                   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
Rare|No Foil|No Expedition           25   29   24   26   29   30   24   22   26    29
Mythic|No Foil|No Expedition          4    3    2    4    4    2    6    7    2     4
Rare|Common Foil|No Expedition        2    2    4    3    1    2    2    1    4     1
Rare|Uncommon Foil|No Expedition      4    0    1    0    0    1    2    2    1     0
Rare|Rare Foil|No Expedition          1    0    3    1    1    0    1    1    0     2
Mythic|Common Foil|No Expedition      0    0    0    1    1    0    1    3    1     0
Mythic|Uncommon Foil|No Expedition    0    1    1    0    0    0    0    0    1     0
Rare|Mythic Foil|No Expedition        0    0    1    1    0    1    0    0    0     0
Rare|No Foil|Expedition               0    1    0    0    0    0    0    0    1     0
Mythic|Rare Foil|No Expedition        0    0    0    0    0    0    0    0    0     0
Mythic|Mythic Foil|No Expedition      0    0    0    0    0    0    0    0    0     0
Mythic|No Foil|Expedition             0    0    0    0    0    0    0    0    0     0
Rare|Common Foil|Expedition           0    0    0    0    0    0    0    0    0     0
Rare|Uncommon Foil|Expedition         0    0    0    0    0    0    0    0    0     0
Rare|Rare Foil|Expedition             0    0    0    0    0    0    0    0    0     0
Mythic|Common Foil|Expedition         0    0    0    0    0    0    0    0    0     0
Mythic|Uncommon Foil|Expedition       0    0    0    0    0    0    0    0    0     0
Rare|Mythic Foil|Expedition           0    0    0    0    0    0    0    0    0     0
Mythic|Rare Foil|Expedition           0    0    0    0    0    0    0    0    0     0
Mythic|Mythic Foil|Expedition         0    0    0    0    0    0    0    0    0     0

Two Expeditions in 10 boxes. I hope they were good ones.

I plan on doing more Magic related posts in the future! If you liked this post and want to see more, message me your ideas on Reddit /u/NaturalBlogarithm and be sure to follow me @NatBlogarithm on Twitter so you don’t miss out.

Aspiring Actuary: Calculator Skills for Exam MFE

As I write this article, there is just over a month until the next sitting of the Society of Actuaries’ “Models for Financial Economics” (MFE) exam. If you plan on writing it this sitting, good luck! If I had one piece of advice, it would be to learn how to use your calculator quickly and accurately. I’m going to teach you some tricks that will make some of the seemingly daunting questions straightforward.

What calculator should I be using?

The SOA maintains a list of calculators that one can bring to an official exam. If you take just one thing away from this article, make it that the TI-30XS MultiView calculator is the only calculator you will ever need. (Well, possibly the BA II, but that’s for another article.) With enough practice and knowledge of the calculator’s unique functions, I guarantee you will finish problems much faster than with the single-view version. Let me convince you with a couple of examples.

Using the Memory Function

Did you just have to convert a nice, round annual effective interest rate to a disgusting, fortnightly-compounded discount rate? Yeah, there’s no way we’ll ever accurately type that number back in. This is where the “sto->” and “x y z” buttons come in handy. After you hit enter to get your answer, simply press “sto->”, “x”, “enter” and your result will be saved. You can then recall it in a later calculation by pressing the “x” button again.

Let’s try this out by converting 6% to its semi-annually compounded equivalent, saving it, and then accumulating $100 six months forward:

[Calculator screenshots: converting the rate, storing it, and accumulating the $100]

Easy, eh? Best part is, you can store up to seven variables at once, which really comes in handy during some of the lengthier binomial tree questions.

That was boring, my TI-30XIIS can do that too.

All right, I hear you. Let me turn your attention to a specific type of question: this one is #51 in the SOA’s official MFE practice questions (which, if you didn’t know about them, today is your lucky day, because you just gained an invaluable study resource). Here it is:

[Image: SOA MFE practice question #51]

Now, I’m going to focus on the part of the answer where you need to calculate the sample variance of the given data. I’ll leave the rest of the question as an exercise to the reader (I’ve always wanted to be able to say that).

If you are still reading this, you probably know that the formula for the sample variance is as follows:

S_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

Now, for this question we need the sample variance of the continuously compounded monthly returns. We find those numbers by taking the (natural) logarithm of the price one month over the price the month before, ending up with six values. Here is my chicken scratch version of doing this by hand:

[Image: hand calculation of the six log returns and their sample variance]
I really should have opted for the printer with a scanner on boxing day…

Can you imagine doing that during an exam? That’s not even the whole question! Let me show you how to do this much quicker.

The “data” function

See that oddly named “data” button on the second row of the multiview? Click it. You’ll come to this screen:

[Calculator screenshot: the data entry screen]

You’ll learn to love this screen. Enter the values we want to find the variance of one by one, pressing enter after each one. Note: I’m entering these as “ln(56/54) -> enter -> ln(48/56) -> enter …”. The calculator does the necessary calculation automatically!

[Calculator screenshot: the entered values in list L1]

Now, hit “2nd -> quit”. Your numbers will be saved. Now we’re ready to see some magic: hit “2nd -> data(stat)”. This brings us to the STATS screen. Since we are only dealing with list L1, hit “1: 1-Var Stats”, “Data: L1”, “FRQ: ONE”. Hit CALC.

[Calculator screenshots: the stats menu, 1-Var Stats setup, and the results screen]

This screen shows a list of different statistics that the calculator has calculated for you. We can see the first number, n, is our number of entries, x bar is our average, and Sx is our sample standard deviation! Scroll down to Sx, press enter, and it will show up as a variable in the calculation screen. Remember, variance is standard deviation squared, so square this number:

[Calculator screenshot: squaring Sx to get the sample variance]

Look familiar? That was much quicker than doing it by hand, and there is a much lower chance that you’ll mistype something when using your calculator. Using this method can really make a difference when it comes to these tedious calculations.

You probably noticed that there is more to the data screen than what we used. In a future article I’ll be looking at some methods for using the TI-30XS MultiView for Exam C, so look out for it!

-R

How much are your Student Council executives making?

The purpose of university student councils, in theory, is to give students a voice and representation to the administration of the university. Most also act as the governing body for all school clubs and provide student-oriented events and services. I personally would like to thank the Feds Used Books store for selling me countless textbooks at half the price that some poor soul paid two semesters ago.

While student councils do some great things, it doesn’t take long to find criticism of them online. Just search for the University of Waterloo’s Federation of Students (“Feds”) on our subreddit and take a look at the results. Here is a snippet of the titles from the top posts:

  • Why do you hate FEDS? (61 comments)
  • Can we just get rid of FEDs? (70 comments)
  • Good job Feds election /s (16 comments)

We get a similar feeling when we search for the University of Toronto’s student union, UTSU:

  • UTSU Fraud cover-up (119 comments)
  • How to disband UTSU? (74 comments)
  • An honest question: What does the UTSU do? (31 comments)

Why is there so much disdain for these organizations, which are composed of our fellow students working in our best interest? That is a question well beyond the scope of this post, but if you feel like spending your evening debating, it is an excellent question to ask on your school’s subreddit or OMG UW equivalent.

A surprising number of students (through no fault of their own) are unaware that some of the student council fees they pay go towards paying the people in the organization. The part I want to focus on is the salaries that the executives of these organizations are making. Seeing as we all pay for these salaries as part of our tuition, I believe that we should be aware of where our money is going.

Now, keep a couple things in mind:

  • These are full-time, year long positions. The people in these positions are delaying their graduation to provide a service to the student body.
  • I found the most recent salary numbers I could, but not all are for the 2015 year. You can see my list of sources below if you want to check them out.
  • Due to how some financial statements were laid out, some salaries required making an educated guess based on Total Executive Salaries divided by Number of Executives.
  • If you see a mistake and can provide a source I will gladly update the numbers.

Here are the salaries/stipends provided to the top level (President or equivalent) executives for the largest schools in the country:

[Chart: salaries/stipends of top-level student council executives by school]

These numbers include all benefits. 

Note that these salaries are comparable across the rest of the executive teams: all schools have between 4 and 6 people on the executive council who are paid the same amount as the president. Without a doubt, this graph shows that the Waterloo Feds executives are compensated more highly than their counterparts at other schools, though all schools seem to ensure their student council executives are comfortable during their tenure.

I hesitate to show this next graph, as it may not be a fair comparison. Each school has a different way of reporting incidental fees, and this will definitely affect how the fees are collected. For example, I believe that the budget for all Feds-supported clubs at Waterloo is lumped into the Feds fees, whereas at other schools there may be separate fees associated with different clubs. Nevertheless, here are the fees paid each semester by each full-time student to their student council:

[Chart: per-semester student council fees by school]

I’m interested in hearing your thoughts on these results. If you’re coming from Reddit, feel free to message me at /u/NaturalBlogarithm or send me a tweet at @NatBlogarithm.

My list of sources can be found in this Excel file. Call me out if something is wrong!