Actuarial Science in Python – Loss Triangles

Loss triangles are an essential tool for Property & Casualty actuaries. They are a visual representation of the development of claims over time, and a starting point for many predictive analysis techniques such as the Chain Ladder method.

The code and example data are available on my GitHub.

For this example, we will look at a Policy Year triangle with 12-month developments. Since the purpose of this article is to demonstrate how to create a loss triangle in Python, I will not go into too much technical detail, but look out for future posts where I’ll go into more depth.

The following triangle (created using Excel and simulated data) is an example of our end goal. Data is presented as of December 31st, 2017.

ExampleCumulativeTriangle.PNG

How do we interpret this triangle? Let’s take a look at the first row, denoted by Policy Year 2014. Any policy which starts (the “effective date”) anytime in 2014 will fall into this category. The columns denote the development in months. For policy year 20XX and development Y, the corresponding cell represents the cumulative claims reported for policies written in 20XX as of Y months from January-1-20XX.

For example, take a look at PY 2014 development 24. This tells us: “As of Dec 31st, 2015, there have been $1,000 worth of claims reported on policies written during 2014”. Take a moment to reason to yourself why the data presents itself as a triangle. You’ll note that as each 12-month period passes, a new entry is added for each policy year along the diagonal, hence increasing the size of the triangle.

Loss Triangles In Python

More often than not, loss triangles are created in Excel. While this works well for creating the triangles, learning how to replicate this process in Python (or R!) will allow you to easily apply analysis techniques via additional Python scripts.

First, let’s take a look at the raw transactional data (again, simulated for example purposes):

This looks like typical claims data. Each row represents a transaction, and we’re given the Policy ID, Claim ID, dates and transaction amounts. In this example, we will focus on creating a Policy Year claims triangle like the one in the example above.

If you do not have Python set up yet, I recommend doing a Google search for one of the many guides or videos to help. We will be using the NumPy and Pandas packages:

import numpy as np
import pandas as pd

We will now import the data. It is good practice to check to see if there is any missing data as well:

path = "C:/MY_PATH/ClaimsExample.csv"
claims = pd.read_csv(path, sep = ",")
claims = pd.DataFrame(claims)

# Check for missing data
claims.isnull().any()
Out[8]: 
Policy_ID                False
Claim_ID                 False
Policy_Effective_Date    False
Accident_Date            False
Report_Date              False
Transaction_Date         False
Indemnity                False
Expenses                 False
Total                    False
dtype: bool

Now, the first thing we’ll need to calculate is the development, or lag, for each line. In this case, the lag is calculated as Transaction Year – Policy Year + 1. Note the semantics here – we are calculating the amount reported for the policy as at Dec 31 of the transaction year, thus the “+1” is needed.

We must extract the Year from the given dates. The methodology used will depend on your data formatting.

claims['Policy_Year'] = [int(x[-4:]) for x in claims['Policy_Effective_Date']]
claims['Transaction_Year'] = [int(x[-4:]) for x in claims['Transaction_Date']]

Next, we can calculate the lags.

claims['PY_Lag'] = claims['Transaction_Year'] - claims['Policy_Year'] + 1

Let’s take a look at a few rows of our data now for spot checking.

claims[:5]
Out[23]: 
   Policy_ID  Claim_ID Policy_Effective_Date Accident_Date Report_Date  \
0        101         1              1/1/2010      7/1/2010    8/1/2010   
1        101         1              1/1/2010      7/1/2010    8/1/2010   
2        101         1              1/1/2010      7/1/2010    8/1/2010   
3        101         1              1/1/2010      7/1/2010    8/1/2010   
4        101         1              1/1/2010      7/1/2010    8/1/2010   

  Transaction_Date  Indemnity  Expenses  Total  Policy_Year  Transaction_Year  \
0         9/1/2010      100.0      50.0  150.0         2010              2010   
1        11/1/2010      200.0      20.0  220.0         2010              2010   
2         1/1/2011      300.0       0.0  300.0         2010              2011   
3         3/1/2011      200.0       0.0  200.0         2010              2011   
4         5/1/2011       50.0       0.0   50.0         2010              2011   

   PY_Lag  
0       1  
1       1  
2       2  
3       2  
4       2 

Looks like our code worked! As expected, lines with the same Policy Year and Transaction Year show a lag of one (i.e. 12 months).

Next we get to use a bit of pandas DataFrame manipulation. We are only interested in the development in the aggregate, thus we need to sum over the policies at each lag for each policy year. We’re going to combine losses and expenses in this example, so we use the Total column.

Here is the code and result, which I will walk through:

py_data = claims['Total'].groupby([claims['Policy_Year'],claims['PY_Lag']]).sum().reset_index()
py_data
Out[25]: 
   Policy_Year  PY_Lag    Total
0         2010       1    370.0
1         2010       2   6910.0
2         2010       3   -200.0
3         2010       4   -600.0
4         2011       1    960.0
5         2011       2   4320.0
6         2011       3    100.0
7         2012       1   1090.0
8         2012       2   8550.0
9         2013       1  21720.0

This may seem complicated, but we can break it down into steps:

  • claims['Total'] : We are choosing the Total column
  • groupby([claims['Policy_Year'],claims['PY_Lag']]) : We want to see the total column in the aggregate for each combination of Policy_Year and PY_Lag
  • sum(): We are choosing sum as our aggregate function
  • reset_index() : This is a bit trickier, it forces the DataFrame to reset the “index” for each line so that Policy_Year and PY_Lag are forced into their own columns. This may be easier to explain by looking at the result without the reset_index function. Try it out yourself!

We are now ready to make our triangle. The pandas function “pivot_table” works very similarly to Pivot Tables in Excel, which you might be more familiar with. We first need to calculate the cumulative sum at each lag for our policy years. This is done using a similar technique to the one above.

py_data['cumsum'] = py_data["Total"].groupby(py_data["Policy_Year"]).cumsum()
py_cumu_triangle = pd.pivot_table(py_data, index = ["Policy_Year"], columns = ["PY_Lag"], values = ["cumsum"])
print(py_cumu_triangle)
              cumsum                        
PY_Lag             1       2       3       4
Policy_Year                                 
2010           370.0  7280.0  7080.0  6480.0
2011           960.0  5280.0  5380.0     NaN
2012          1090.0  9640.0     NaN     NaN
2013         21720.0     NaN     NaN     NaN

Success! We have created a loss triangle using Python. If you are familiar with loss triangles you will notice that this one does not look like a triangle you would expect for stable business. That, however, is something to be discussed in a future post.

Post Script:

We are also sometimes interested in an incremental loss triangle. We can easily edit the code above to accomplish this. Try out the code below yourself:

py_triangle = pd.pivot_table(py_data, index = ["Policy_Year"], columns = ["PY_Lag"], values = ["Total"])
print(py_triangle)
               Total                      
PY_Lag             1       2      3      4
Policy_Year                               
2010           370.0  6910.0 -200.0 -600.0
2011           960.0  4320.0  100.0    NaN
2012          1090.0  8550.0    NaN    NaN
2013         21720.0     NaN    NaN    NaN

Writing Two Actuarial Exams at Once: A Reflection

For the May 2016 exam sitting I decided to take on the daunting task of writing CAS Exam 5 and 6C at the same time. This was my first attempt at both of the exams. Spoiler alert: I passed Exam 5, and ended up with a disappointingly close 5 on Exam 6C. Despite not getting the result that I had hoped for, I thought I would share some of my thoughts on the experience as I’ve had a few people ask.

Would I recommend you study for multiple exams at once?

We’ll start by answering this question and working backwards. It’s a tricky one to answer: On one hand, by passing one of the two exams, I have not “lost” an exam period and have still progressed closer to my designation. On the other hand, take a look at this chart of my past exam results:

ExamChart

It’s quite evident that there has been a drop in my results this past sitting compared to the average. In fact, getting a 6 on Exam 5 means I came just as close to failing both exams as I did to passing both. We do have to take into account that the upper level (5+) CAS exams are a totally different beast than the Prelims; however, I felt that I had adequately adjusted my study time to compensate for that and was still surprised by the lower results.

I would say that if you’re considering doubling up, you’ll have to ask yourself if you’d be willing to deal with the possibility that you could have passed one exam had you focussed solely on one or the other, but instead you failed both. It’s surprising how even failing one of the two really deflated my enthusiasm for having passed the other.

How do I split my study time?

I would estimate I spent 60% of my study time on Exam 6C and 40% of my time on Exam 5. They are vastly different exams with very little overlap. Sometimes this actually worked to my advantage, as when I got bored or frustrated with the material for one exam, I was more likely to switch over to studying for the other rather than opening up Facebook or Reddit. However, even with these synergies the time commitment was much higher. I lost quite a few weeknights and weekends to studying that, had it not been the dead of winter, I’m not so sure I would have been so committed to.

I would suggest studying for more closely related exams if you decide on doubling up. For example, MFE and FM have quite a lot in common, or even FM and MLC. There was the added factor that these were the first written CAS exams I had taken. Although I haven’t yet received my grade distribution, I’m willing to bet I lost quite a few marks on the CAS “question words”. The CAS has an excellent article which warns about this pitfall, which I feel I may have underestimated.

Was it worth it?

In hindsight, I don’t regret my decision, however I doubt I would do it again. It’s comforting to know that I will already have the basis for Exam 6 when I start studying for the Fall sitting, which may allow me to enjoy some more sunshine over the Summer months. It was disheartening to get so close to my ACAS, but hey, sometimes you need to slow down and smell the roses.

Congratulations to everyone who passed an exam this sitting – I did see on Reddit and the AO that at least a couple people passed 5 and S at the same time, fantastic job! I would love to hear thoughts from anyone else who has taken multiple exams.


Data Science: Where Are Canadians Travelling?

Where Are Canadians Travelling To?

Despite living in the second largest country in the world, sometimes you just need to get away from things, eh? Statistics Canada provides a yearly summary table depicting the top travel destinations for Canadians in the form of the Travel Survey of Residents of Canada (TSRC). The visualizations give us some insight into our habits and how global events affect our travel plans. The visualizations have all been written in R, largely using the ggplot2 package.

All code and a cleaned up data file can be found on my GitHub.

A Short Primer on the Data

I have compiled all the summary tables into a single file, as well as a CPI table and a Canadian Population table. The data is presented in yearly intervals, thus line charts will contain a single point for each year. The data collected spans from 2001-2015, and represents the top 15 countries visited by residents of Canada.

The travel data has five fields:
– Year
– Country
– Number of Visits
– Nights Spent
– Dollars Spent (in Canadian Dollars)

For those with questions about the survey methodology, data imputation strategy and the assumptions (i.e. visiting multiple countries), please check out the Official CANSIM Documentation.

The Results

For those who are interested in seeing a walkthrough of the code, read through the results and head to “Data Science In R: Canadian Travel Data”.

First, let’s look at a table of the countries based on the number of times they appear in the top 15:

Country Frequency
China 15
Cuba 15
Dominican Republic 15
France 15
Germany 15
Italy 15
Mexico 15
Netherlands 15
Spain 15
United Kingdom 15
United States 15
Hong Kong 14
Ireland 12
Switzerland 11
Australia 6
Jamaica 5
Japan 5
Austria 4
Greece 2
Bahamas 1

Does anything here surprise you? This chart is lacking much of the detail we need to draw conclusions. Instead, let’s take a look at the following Bump Chart. It ranks the countries in order of number of visits over time. Note that the labels account only for the countries left in 2015, as some of the countries fall off the chart as time progresses.

BumpChartRecolor

There are some interesting takeaways from this chart. As probably expected, Canadians have always made the United States their number one travel destination. The United Kingdom was competing with Mexico for second place for a number of years before falling, along with France, to 5th and 6th place respectively in 2009. The Netherlands also sat comfortably in 10th place until nearly dropping off the chart in the past few years. What could be the causes of this?

Let’s try to quantify some of these trips. By using a stacked area plot on the number of trips (“Visits”) taken, we can get a sense of the relativity:

VisitsCount

Wow! The United States is overwhelmingly the largest importer of Canadian Travelers. Maybe not too surprising though, considering 75% of Canadians live within 100 miles of the American Border.

How Long Are We Staying?

For this, I’ve taken the average number of nights stayed in 2015 per visit. For interest’s sake I’ve also overlaid the cost of a flight to that country, chosen as the lowest result Google provides for a Wednesday flight six weeks from the writing of this article to the country’s capital city. Note that for some countries, especially the United States, the cost of a flight can vary wildly, so these are far from rigorous.

AverageNights

Diving Into the US Data

Since the US represents such a large portion of the data, I thought it’d be interesting to take a deeper dive into the Canadian–American relationship. Based on the data available, we can answer a few questions:

How Many Visits Have Canadians Made to the US Over Time?

For this question, I’ve opted to show two ways of looking at the data. The unadjusted line shows the raw number of visits given by the data. I have also opted to make a population adjustment, bringing the number of visits to the 2015 level. This was done by using Statistics Canada Population Data, and using the following formula:

 Adjusted Visits 20XX = (Population of Canada 2015) / (Population of Canada 20XX) * (Number of Visits 20XX)
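As a rough sketch of that calculation in R (the function name and the numbers here are purely illustrative placeholders, not the actual StatCan figures):

# Scale a given year's visit count to the 2015 population level
adjust_visits <- function(visits, pop_year, pop_2015) {
  visits * pop_2015 / pop_year
}

# Hypothetical numbers for illustration only (populations in millions)
adjust_visits(visits = 15000, pop_year = 31.0, pop_2015 = 35.8)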

I have also added a couple notes which we can discuss after viewing the results:

Rplot01

You’ll notice a general upward trend, even adjusted for population; however, there are some noticeable drops. I’ve made two suggestions to explain these drops:

  1. The 2008 Recession reduced the desire or ability to travel to the United States due to lack of funds
  2. The sudden drop of the $CDN from par ($1 CDN = $1 USD) around 2013 made it less appealing to travel to the US due to the cost.

Do you disagree or have additional thoughts? Leave me a comment and let me know.

How Much Are Canadians Spending While in the US?

This is a bit trickier to answer. We have some competing forces at play here, and questions about how we calculate spending. Do we look at per night or per visit? How do we account for the exchange rate and inflation? Let’s first take a look at the raw data of average spending. Don’t forget that all the numbers are presented in Canadian dollars.

AverageSpentUS

At face value, it looks like we have been spending more per night as time progresses. However, we can tease more information out of this data. Here is my methodology for adjusting the amount paid (a small code sketch follows the steps):

  1. Say, for example, we assume in 2002 we spent 85 Canadian Dollars
  2. The exchange rate in 2002 was about $1 USD = $1.5 CDN. Thus we spent 85/1.5 = $57 USD.
  3. Accounting for inflation at 2% per year, this is now worth $57 * (1.02)^13 = $74 USD in 2015.
  4. The exchange rate in 2015 was about $1 USD = $1.25 CDN. Thus we spent $74 * 1.25 = $92.50 CDN
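To make those steps concrete, here is a minimal R sketch of the same adjustment (the function name, rates and 2% inflation are illustrative placeholders, not the exact CPI and FX values I pulled):

# Convert an amount spent in an earlier year into 2015 Canadian dollars:
# CDN -> USD at that year's FX rate, inflate to 2015, then USD -> CDN at the 2015 rate
adjust_to_2015 <- function(spent_cdn, fx_year, fx_2015, years_to_2015, inflation = 0.02) {
  spent_usd  <- spent_cdn / fx_year                        # step 2
  spent_2015 <- spent_usd * (1 + inflation)^years_to_2015  # step 3
  spent_2015 * fx_2015                                     # step 4
}

# The 2002 example above: roughly $92 CDN (the worked steps round at each stage)
adjust_to_2015(85, fx_year = 1.5, fx_2015 = 1.25, years_to_2015 = 13)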

To be a bit more technical, I have pulled the Canadian CPI to use as an inflation calculation. Additionally, the CAN-USD FX Rates are provided at January-1 of each year by Statistics Canada. Here is the resulting graph with these adjustments:

AdjustedSpent

This shows a much different picture. It looks like early in the decade, Canadians were spending more and more on their travels to the US, and in more recent years they’ve been spending less. Take these results with a grain of salt – there are still many possible confounding variables and questions we could ask ourselves about the data, such as:

  1. Can we make an adjustment for the number of nights stayed?
    • One of the largest travel expenses for most people is likely the flight costs. The more nights you stay, the more nights this cost will be spread over.
  2. Are Canadians choosing different travel destinations within the US?
    • There are near countless vacation destinations within the US. These all come with their own costs and stay duration.

Come up with your own questions and have a try at answering some of them yourself! You can find the code on my GitHub and a walkthrough of the code in my follow-up article, “Data Science In R: Canadian Travel Data”.

Sources:
Travel Data: http://www.statcan.gc.ca/tables-tableaux/sum-som/l01/cst01/arts37j-eng.htm
CPI: http://www.statcan.gc.ca/tables-tableaux/sum-som/l01/cst01/econ46a-eng.htm
Population: http://www5.statcan.gc.ca/cansim/a26
FX Rates: http://www.bankofcanada.ca/rates/exchange/legacy-noon-and-closing-rates/

Data Science in R: Canada Travelling Code Walkthrough


For background, first read through the results in my previous blog post. All code is available on my GitHub.

Data Sources and Notes

The data for this analysis was sourced from Statistics Canada. I’ve compiled it into one file (including CPI, FX and Population) called “Travel.xlsx”, available in the GitHub folder.

Importing and Cleaning the Data

I prefer using the openxlsx package simply because it allows you to easily import Excel files without converting them to .csv first. The code is pretty straightforward.

library(openxlsx)
fromCanada <- read.xlsx("Travel.xlsx", sheet = "TravelData")

Let’s take a look at the data:

> fromCanada
                 Country Visits Nights Spent_CDN Year
1          United States  20702 216628     19929 2015
2                 Mexico   1926  25650      2306 2015
3         United Kingdom   1192  16688      1714 2015
4                 France   1140  15041      1549 2015
5                   Cuba    800   8216       708 2015
6                Germany    574   5523       542 2015
7     Dominican Republic    487   6871       561 2015
8                  China    480  10042      1009 2015
9                  Italy    469   5871       583 2015
10                 Spain    399   4993       382 2015
11  Ireland, Republic Of    254   3510       291 2015
12             Australia    245   5158       664 2015
13             Hong Kong    243   3246       319 2015
...
179              Ireland    144   1633       185 2004
180          Switzerland    142    980       105 2004
181        United States  12666  97333      8075 2003
182               Mexico    716   7375       790 2003
183       United Kingdom    684   8624       821 2003
184               France    509   6468       671 2003
185                 Cuba    495   4408       451 2003
186   Dominican Republic    415   3983       403 2003
187              Germany    331   3297       278 2003
188                Italy    248   3279       384 2003
189          Netherlands    165   1401       110 2003
190                Spain    154   2017       199 2003
191          Switzerland    125    911        98 2003
192                Japan    122   1643       193 2003
193       Mainland China    115   2245       197 2003
194              Austria    109    586        77 2003

Something you may have noticed is that we have differing names for some countries over the years. This happens often with piecemeal data like this, as surveying methods may slightly differ each year. We can use the dplyr package to clean these up:

library(dplyr)
fromCanada <- fromCanada %>%
  mutate(Country = ifelse(Country == "Republic of Ireland" | Country == "Republic Of Ireland" | Country == "Ireland, Republic Of", "Ireland",
                          ifelse(Country == "Mainland China", "China", Country)))

There are a couple of powerful tools in the dplyr package used here:

  • The %>%, or pipe tool.

While not actually a true function in itself, the pipe makes the code look cleaner and improves readability. The pipe works as follows (side note – I can’t for the life of me get LaTeX to work on this WordPress.com blog. Any tips?):

x %>% f(y) = f(x,y)

As you can see, the pipe will insert whatever you put before it into whatever function comes after it. This is very useful for nested functions with many levels.
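As a toy illustration of the equivalence (not from the original script), with dplyr loaded these two lines give the same result:

library(dplyr)  # provides %>%

# Nested call versus piped call - both return 4.7
round(mean(c(1, 4, 9)), 1)
c(1, 4, 9) %>% mean() %>% round(1)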

  • mutate(data, …)

The mutate function creates new columns in your data frame while also preserving the old ones. What we’ve done here is told mutate to overwrite the current value in the Country column based on the If/Else/Or logic that follows. If you had chosen another column name, say Country_new, this column would have been added to the end.
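For example, a hypothetical call like the one below would leave Country untouched and append a new column at the end (Country_new is just an illustrative name):

# Adds Country_new instead of overwriting the Country column
fromCanada <- fromCanada %>%
  mutate(Country_new = toupper(Country))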

Lines 31-38 are a quick hack to assign the ranking of the countries to each year based on the number of visits. I’ll leave that one to read through yourself.

Bump Chart

Bump charts are a cool way to show changes over time, and especially useful with ranking data like we have. The ggplot2 package does a decent job of creating this, though it requires some adjustments to look nice.

library(ggplot2)

# YearRank (each country's yearly rank by visits, created earlier in the script)
# and the custom 'colors' palette are defined before this point
ggplot(fromCanada, aes(Year, YearRank, color = Country)) +
  geom_line(size = 1) +
  scale_y_continuous(breaks = c(1:15), trans = "reverse") +
  scale_x_continuous(breaks = c(2001:2015)) +
  theme(legend.position = "none") +
  ggtitle("Top 15 Travel Destinations by Canadians") +
  geom_text(data = subset(fromCanada, Year == "2015"), aes(label = Country, x = Year + 1.2),
            size = 3.5, hjust = 0.0) +
  scale_color_manual(values = colors) +
  theme(panel.background = element_blank())

BumpChartRecolor

Here are the parts that I consider the least intuitive:

  1. aes(..., color = Country)

Luckily for us, ggplot is a very smart function. By telling it to color the graph based on the Country, it knows to group the data for each country as its own line. Try it for yourself and see what the graph would look like without the grouping.

  2. geom_text(data = subset(fromCanada, Year == "2015"), aes(label = Country, x = Year +1.2), size =3.5, hjust = 0.0)

This one is a mouthful. Here is what it’s doing:

  • geom_text(): We want to insert text into our graph.
  • data = subset(fromCanada, Year == "2015"): We will be sourcing the text from the fromCanada dataframe, but only looking at the 2015 year.
  • aes(label = Country, x = Year +1.2): From the data, take the Country names and use that as our label. We want to place those labels on the x-axis at 2015 + 1.2 = 2016.2
  • size =3.5, hjust = 0.0: These adjust the size and horizontal placement of the labels.

Cumulative Visits Charts

Jumping down the code a bit, you’ll see that some clean-up has been done before creating the graphs. Take a look at the difference between the two following graphs:

Why does this happen? It has to do with the fact that some of the countries do not appear in all the years. Take a read through this StackExchange post for details. To fix this, we can use the xtabs function to insert zeros into the data for all missing points:

Visitsnew <- xtabs(Visits ~ Country + Year, fromCanada) %>%
  as.data.frame() %>%
  transmute(Country, Year, Visits = Freq)

  1. xtabs(Visits ~ Country + Year, fromCanada):
    • This creates a matrix of Country by Year, with Visits as the entry in the matrix. 0 is inserted if unknown.
  2. transmute(...)
    • Similar to mutate, however it throws out any columns you don’t name. Thus here we’re just renaming Freq to Visits.

Last Notes

A quick shout out again to the dplyr package – in line 226 you’ll notice the use of left_join(). While base R has similar functions, I prefer the dplyr version as it is almost identical to the more familiar SQL joins. Type ?left_join into R to see all the available joins.
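For anyone unfamiliar with it, here is a hedged sketch of that kind of join (the population data frame and the column names are illustrative, not the exact objects used on line 226 of the script):

library(dplyr)

# Attach each year's Canadian population to the travel data, matching on Year
# (assumes a 'population' data frame with Year and Population columns)
fromCanada <- left_join(fromCanada, population, by = "Year")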

Want more detail on any of the code? Leave me a comment and I’ll get back to you!

Aspiring Actuary: 5 Tips for Your First Actuarial Exam

As all good (aspiring) actuaries do I will start this article with a few assumptions. First, I assume you have some familiarity with the actuarial exam process (check out my previous article if needed). I’m also going to assume that you are writing one of the Computer-Based Testing (CBT) Exams – likely Exam P or Exam FM. Here are a few tips I’ve compiled that have helped me succeed with the “prelim” SOA/CAS exams.

1. Start Early

If I wanted to make this article even more click-baity than it already is, I would list this as Step 0. Underestimating the time commitment it takes to write these exams is the number one trap I see people falling into. I would equate the time spent on a prelim exam approximately to an upper-year university math course – something that you can’t (or shouldn’t) study for in just a few days. Remember, the pass mark on these exams is ~70%, which means there is less room for error than what you may be used to.

The exam syllabus has a wide breadth. I would use the 100 meter puddle analogy; you are covering a little bit of each of many topics. This lends itself well to setting a schedule. When you sign up for your exam date, take a look at the number of days you have left and set up an Excel spreadsheet with all the topics and your goal completion date. Don’t forget to leave yourself time for practice problems!

2. Get a Good Calculator, and Get Good With It.

This is a very close second in my list of most important tips. The exams will be crunched for time, and it is essential that you get familiar and accurate with your calculator. My Exam 5 (CAS – Basic Ratemaking) proctor made a comment after our exam about this: “It’s amazing to see you kids these days, bringing multiple calculators to the exam and using both your thumbs to type!”

Remember that the SOA has an approved list of calculators. For all exams I would recommend the TI-30XS MultiView (you can support this blog by buying it via the Amazon link). It has the advantage of letting you easily scroll through past calculations, as well as many summary statistic functions which can provide shortcuts for certain problems. Above all, the buttons on this calculator are responsive and accurate, meaning fewer mistakes. If you have any questions on how to use it please leave me a comment.

3. Choose the Right Study Material (For You)

You are “lucky” in the sense that the prelims have a large amount of study material to choose from. Many companies publish study guides, practice exams and online exams to help you learn the material and get a feel for the types of questions you’ll be asked. Of course, you can also study from the source material (the textbooks found on the syllabus), however many exam passers you speak to will argue for their preferred alternative study source.

Of course, the one caveat for these materials is that they can get quite expensive, and seeing as you are writing your first actuarial exam you may not yet have the backing of a company to pay for them. In this case I would recommend trying the free samples that each of the companies puts out – you’ll have the advantage of finding the study material that works best for you as well as getting some extra study time in.

4. Utilize Old Exams and Example Questions

Old exams and example questions are the single most important study tool at your disposal. They’re also free! For the CBT exams, the SOA has released sets of practice questions as well as online practice exams for Exam P and Exam FM. For whatever reason the SOA doesn’t seem to have the .PDFs posted on a page on their website, but they’re easy enough to find by Googling “SOA Exam X Practice Problems”.

There will always be debates as to whether these questions are harder or easier than the actual exams, but they are definitely an excellent way to familiarize yourself with the SOA’s exam problem wording. I recommend you study these closely and ensure that you fully understand the answers.

5. Ask Yourself If You’re Being Productive

You may have heard of the “100 hour rule” that is thrown around in actuarial circles. It states that for every hour of exam length, you should put in 100 hours’ worth of study time. For the first few exams this equates to 300 hours of quality studying. Remember, the pass rate for Exam P hovers around 50% (see Actuarial-Lookup.com). While there are always exceptions, I would be comfortable claiming that if you follow this rule and put in 300 hours of dedicated study time, I would expect you to be in the passing half of the group. This requires discipline – ask yourself while you’re studying if you’re actually studying. Did you spend 20 minutes of the last hour browsing Facebook? That time doesn’t count.

At the same time, ensure that you take breaks. Or, if you’re on a roll but finding the material isn’t sticking, I find that switching to a new topic or question set will reboot my motivation.

Magic the Gathering: Pack Cracking Statistics (Oath of the Gatewatch)

In addition to writing actuarial exams, I am a massive nerd when it comes to the game Magic: The Gathering. One thing that has always irked my statistician side about M:TG journalism is that when it comes to statistics, the only thing ever talked about is the expected value of opening packs of cards. I’m here to set that straight by demonstrating an even more important factor: variance.

Hold up, what is M:TG?

Magic: the Gathering is a trading card game that began in 1993 and has since exploded in popularity to become the most popular physical card game on the planet. If you have never played before I encourage you to find a friend that does (might be easier than you think!) and get them to teach you the basics. The game has something to offer for all skill levels and is a fantastic social activity.

The Basics of Pack Cracking

Each regular pack of Magic cards contains 15 cards distributed over the four main rarities: Common, Uncommon, Rare and Mythic. A normal pack will contain 10 Commons, 3 Uncommons, 1 Rare OR 1 Mythic and 1 “Land” card. There is also a chance that there will be a “premium” foil card taking the place of one common. These probabilities are as follows:

  • Chance of a Mythic Rare: 1 in 8 packs, or 12.5%
  • Chance of a Foil card (any rarity): 1 in 6 packs, or 16.7%

If you buy a “box” of cards, you get 36 packs. From this we can calculate the expected number (the Expected Value) of Mythics and Foils we get in our box:

  • Expected Mythics: 0.125*36 = 4.5
  • Expected Foils: 0.1666*36 = 6

Most people will stop here with the calculations. I’m going to go a step further and show you some other ways to look at what we expect from our boxes of cards.

The Simple Model

I will be using “RStudio” to complete the math for the rest of this article; feel free to skip over the coding. I’m adding it in for anyone who would like to reproduce my results.

When we open a pack of cards, we are actually performing a Bernoulli Trial. Put simply, this means we have two possible outcomes: a “Rare” or a “Mythic”. Using the probabilities from above, let’s simulate cracking an entire box of cards:

p.mythic <- c(1/8)

# Simulate a box (36 packs): label each pack's rare slot as a Rare or a Mythic
packs <- rbinom(36, 1, 1 - p.mythic)
results <- ifelse(packs == 1, "Rare", "Mythic")

results
 [1] "Rare"   "Rare"   "Mythic" "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Mythic" "Rare"
[12] "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Mythic" "Rare"   "Rare"   "Rare"   "Rare"
[23] "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"   "Rare"
[34] "Rare"   "Mythic" "Rare"

In a nicer form:

table(results)
Mythic   Rare
     4     32

Congrats! You got four mythics in your box. We can do the same thing to simulate our foils:

table(results2)
   Foil No Foil
      4      32

Hmm, looks like we came up with a disappointing 4 foils (we expected 6). Ask yourself, why did this happen? You’re right, it’s variance.

Simulating 100,000 Boxes

To visualize the distribution of the number of mythics you get, we have to repeat this experiment many times. Many, many times. For those interested, check out The Law of Large Numbers. Let’s take a look at the results of this simulation:

boxes.mythics <- rbinom(10000, 36, p.mythic)
hist(boxes.mythics, col = "lightblue", xlab = "Number of Mythics", ylab = "Number of Boxes", main = "Distribution of Mythics in Boxes")

mythics
We can do a similar analysis for foils:

Foils

I think this is a pretty great visualization, as we can see that even though we “expect” to get 6 foils, it’s actually more likely that we get a number other than 6. Some poor soul may only get 0-2 foils, while some lucky duck will walk away with 8 or more! Sure, we “knew” this already, but this proves it.

In fact, we can calculate the exact probability of getting a specified number of Mythics or Foils using a Binomial Distribution. The binomial distribution results from performing consecutive, independent Bernoulli trials. The resulting chart is depicted here:

chances
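If you want to reproduce the numbers behind that chart, dbinom gives the exact binomial probabilities; a quick sketch:

# Exact probability of opening k Mythics (or Foils) in a 36-pack box
k <- 0:12
mythic.probs <- dbinom(k, size = 36, prob = 1/8)  # Mythic: 1 in 8 packs
foil.probs   <- dbinom(k, size = 36, prob = 1/6)  # Foil:   1 in 6 packs
round(data.frame(k, mythic.probs, foil.probs), 4)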

The Complicated Model

In the most recent Magic expansion, Oath of the Gatewatch, there are 70 commons, 60 uncommons, 42 rares and 12 mythics for a total of 184 cards. There is also a very small chance of opening an Expedition Land, which commands a high price. I will be using the same assumptions as above, with the following additional foil assumptions:

  • Commons and uncommons are equally distributed over the foils
  • The chance of your foil being rare or mythic is equal to their proportion of all cards (52/184), and the card is then roughly 4 times as likely to be a Foil Rare as a Foil Mythic
  • Expeditions will be treated the same as the calculated value for Foil Mythic Rares (the article states it is slightly less rare, but I have no way to make an educated guess as to how much and it is a very small amount anyway)

I know what some of you are saying. There is an often repeated assumption that a Foil Mythic is a 1:216 pack occurrence and that’s that. I scoured the internet to find a source and everything seems to trace back to a single article from StarCity, who themselves source a website that is now offline. Additionally, those numbers are for a Large set (200+ cards) whereas Oath is a small-sized set. From my numbers below I hope you can agree that these seem like reasonable jumps.

When we open a pack, we have multiple different possibilities:

  • Rare or Mythic
  • No Foil, Common Foil, Uncommon Foil, Rare Foil, Mythic Foil
  • No Expedition, Expedition

And any combination of the three. If you trust my math, here are the probabilities for each of the combinations:

tablecards2

What we have here is a Multinomial Distribution. Just like before, we can simulate boxes – let’s open 10 boxes and see what we get (a code sketch follows, with the results of my full run below it):
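Here is a minimal sketch of that simulation using rmultinom. The probabilities below are rounded placeholders built from the assumptions above, with the Expedition outcomes omitted for brevity; the exact values (including Expeditions) are the ones behind my probability table.

# Per-pack probability of each Rare-slot/Foil combination
# (placeholder values for illustration; they must sum to 1)
probs <- c("Rare|No Foil"         = 0.730, "Mythic|No Foil"       = 0.104,
           "Rare|Common Foil"     = 0.052, "Mythic|Common Foil"   = 0.007,
           "Rare|Uncommon Foil"   = 0.052, "Mythic|Uncommon Foil" = 0.007,
           "Rare|Rare Foil"       = 0.032, "Mythic|Rare Foil"     = 0.005,
           "Rare|Mythic Foil"     = 0.009, "Mythic|Mythic Foil"   = 0.002)

# Each column is one simulated box of 36 packs
boxes <- rmultinom(10, size = 36, prob = probs)
rownames(boxes) <- names(probs)
boxes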

                                   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
Rare|No Foil|No Expedition           25   29   24   26   29   30   24   22   26    29
Mythic|No Foil|No Expedition          4    3    2    4    4    2    6    7    2     4
Rare|Common Foil|No Expedition        2    2    4    3    1    2    2    1    4     1
Rare|Uncommon Foil|No Expedition      4    0    1    0    0    1    2    2    1     0
Rare|Rare Foil|No Expedition          1    0    3    1    1    0    1    1    0     2
Mythic|Common Foil|No Expedition      0    0    0    1    1    0    1    3    1     0
Mythic|Uncommon Foil|No Expedition    0    1    1    0    0    0    0    0    1     0
Rare|Mythic Foil|No Expedition        0    0    1    1    0    1    0    0    0     0
Rare|No Foil|Expedition               0    1    0    0    0    0    0    0    1     0
Mythic|Rare Foil|No Expedition        0    0    0    0    0    0    0    0    0     0
Mythic|Mythic Foil|No Expedition      0    0    0    0    0    0    0    0    0     0
Mythic|No Foil|Expedition             0    0    0    0    0    0    0    0    0     0
Rare|Common Foil|Expedition           0    0    0    0    0    0    0    0    0     0
Rare|Uncommon Foil|Expedition         0    0    0    0    0    0    0    0    0     0
Rare|Rare Foil|Expedition             0    0    0    0    0    0    0    0    0     0
Mythic|Common Foil|Expedition         0    0    0    0    0    0    0    0    0     0
Mythic|Uncommon Foil|Expedition       0    0    0    0    0    0    0    0    0     0
Rare|Mythic Foil|Expedition           0    0    0    0    0    0    0    0    0     0
Mythic|Rare Foil|Expedition           0    0    0    0    0    0    0    0    0     0
Mythic|Mythic Foil|Expedition         0    0    0    0    0    0    0    0    0     0

Two Expeditions in 10 boxes. I hope they were good ones.

I plan on doing more Magic related posts in the future! If you liked this post and want to see more, message me your ideas on Reddit /u/NaturalBlogarithm and be sure to follow me @NatBlogarithm on Twitter so you don’t miss out.

Aspiring Actuary: Calculator Skills for Exam MFE

As I write this article there is just over a month until the next sitting of the Society of Actuaries “Models for Financial Economics” (MFE) Exam. If you plan on writing it this sitting, good luck! If I had one piece of advice it would be to learn how to use your calculator quickly and accurately – I’m going to teach you some tricks that will make some of the seemingly daunting questions straightforward.

What calculator should I be using?

The SOA maintains a list of calculators that one can bring to an official exam. If you’re going to take just one thing away from this article, make it that the TI-30XS MultiView Calculator is the only calculator you will ever need. (Well, possibly the BA II, but that’s for another article.) With enough practice and knowledge of the calculator’s unique functions I guarantee you will finish problems much faster than if you had the single view version. Let me convince you why with a couple of examples.

Using the Memory Function

Did you just have to convert a nice, round annual effective interest rate to a disgusting, compounded-fortnightly discount rate? Yeah, there’s no way we’ll ever accurately type that number back in. Here is where the “sto->” and “x y z” buttons come in handy. After you hit enter to get your answer, simply hit “sto->”, “x”, “enter” and your entry will be saved. You can then recall it in a later calculation by hitting the “x” button again.

Let’s try this out by converting 6% to its semi-annually compounded equivalent, saving it, then accumulating $100 today six months forward:

Screen1Screen2Screen3

Easy eh? Best part is, you can store up to seven variables at once which really comes in handy during some of the more lengthy binomial tree questions.

That was boring, my TI-30XIIS can do that too.

All right, I hear you. Let me turn your attention to a specific type of question; this one is #51 in the SOA’s official MFE practice questions (which, if you didn’t know about, today is your lucky day because you just gained an invaluable study resource). Here it is:

question

Now, I’m going to focus on the part of the answer where you need to calculate the sample variance of the given data. I’ll leave the rest of the question as an exercise to the reader (I’ve always wanted to be able to say that).

As someone who is still reading this, you probably know the formula for sample variance is as follows:

S_x^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2

Now, for this question we need the sample variance of the continuously compounded monthly returns. We find those numbers by taking the (natural) logarithm of the price one month over the price the month before, ending up with six values. Here is my chicken scratch version of doing this by hand:

scratch
I really should have opted for the printer with a scanner on boxing day…

Can you imagine doing that during an exam? That’s not even the whole question! Let me show you how to do this much quicker.

The “data” function

See that oddly named “data” button on the second row of the multiview? Click it. You’ll come to this screen:

Screen4

You’ll learn to love this screen. Enter the values that we want to find the variance of one by one, pressing enter after each one. Note – I’m entering these as “ln(56/54) -> enter -> ln(48/56) -> enter …”. The calculator does the necessary calculation automatically!

Screen5

Now, hit “2nd -> quit”. Your numbers will be saved. Now we’re ready to see some magic: hit “2nd -> data(stat)”. This brings us to the STATS screen. Since we are only dealing with line one, hit “1: 1-Var Stats”, “Data: L1”, “FRQ: ONE”. Hit CALC.

Screen6Screen7Screen8

This screen shows a list of different statistics that the calculator has calculated for you. We can see the first number, n, is our number of entries, x bar is our average, and Sx is our sample standard deviation! Scroll down to Sx, press enter, and it will show up as a variable in the calculation screen. Remember, variance is standard deviation squared, so square this number:

Screen9

Look familiar? That was much quicker than doing it by hand, and there is a much lower chance that you’ll mistype something when using your calculator. Using this method can really make a difference when it comes to these tedious calculations.

You probably noticed that there is more to the data screen than what we used. In a future article I’ll be looking at some methods for using the TI-30XS MultiView for Exam C, so look out for it!

-R

How much are your Student Council executives making?

The purpose of university student councils, in theory, is to give a voice to the students and representation to the administration of the university. Most also act as the governing body for all school clubs, and provide student-oriented events and services. I personally would like to thank the Feds Used Books store for selling me countless textbooks for half the price that some poor soul paid two semesters ago.

While student councils do some great things, it doesn’t take long to find their criticisms online. Just try to do a search for the University of Waterloo’s Federation of Students (“Feds”) on our subreddit and take a look at the results. Here is a snippet of the titles from the top posts:

  • Why do you hate FEDS? (61 comments)
  • Can we just get rid of FEDs? (70 comments)
  • Good job Feds election /s (16 comments)

We get a similar feeling when we search for the University of Toronto’s student union, UTSU:

  • UTSU Fraud cover-up (119 comments)
  • How to disband UTSU? (74 comments)
  • An honest question: What does the UTSU do? (31 comments)

Why is there so much disdain for these organizations, which are composed of our fellow students working in our best interest? That is a question well beyond the scope of this post, but if you feel like spending your evening debating, it is an excellent question to ask on your school’s subreddit or OMG UW equivalent.

A surprising number of students (by no fault of their own) are unaware that some of the student council fees they pay go towards paying the people in the organization. The part I want to focus on is the salaries that the executives of these organizations are making. Seeing as we all pay for these salaries as part of our tuition, I believe that we should be aware of where our money is going.

Now, keep a couple things in mind:

  • These are full-time, year long positions. The people in these positions are delaying their graduation to provide a service to the student body.
  • I found the most recent salary numbers I could, but not all are for the 2015 year. You can see my list of sources below if you want to check them out.
  • Due to how some financial statements were laid out, some salaries required making an educated guess based on Total Executive Salaries divided by Number of Executives.
  • If you see a mistake and can provide a source I will gladly update the numbers.

Here are the salaries/stipends provided to the top level (President or equivalent) executives for the largest schools in the country:

chart

These numbers include all benefits. 

Note that these salaries are comparable to the rest of the executive teams – all schools have between 4-6 people on the executive council who are getting paid the same amount as the president. Without a doubt, this graph shows that the Waterloo Feds executives are compensated more highly than their counterparts at other schools, though all schools seem to ensure their student council executives are comfortable during their tenure.

I hesitate to show this next graph, as it may not be a fair comparison. Each school has a different way of reporting incidental fees and this will definitely affect how the fees are collected. For example, I believe that the budget for all clubs at Waterloo supported by Feds is lumped together in the Feds fees, whereas at other schools there may be separate fees associated with different clubs. Nevertheless, here are the fees paid each semester by each full-time student to their student council:

fees

I’m interested in hearing your thoughts on these results – if you’re coming from Reddit feel free to message me at /u/NaturalBlogarithm or send me a tweet at @NatBlogarithm.

My list of sources can be found in this Excel file. Call me out if something is wrong!

Aspiring Actuary: Exam Basics

You know nothing about what it takes to become an actuary. At least, that is what I’m going to assume for this post. I know from experience that without guidance it can be difficult to find all the information you need to make an informed decision on pursuing an actuarial designation – through this series of posts I aim to alleviate some of your concerns.

This post assumes you are pursuing a designation in the US or Canada; there are separate organizations in the UK.

What exams do I need to take to become an actuary?

The first thing you should know is that there are two main regulating bodies that offer exams in North America to qualify actuaries. The Society of Actuaries – which will be my focus – offers exams for life insurance, pension and health actuaries (among others). The second is the Casualty Actuarial Society, who focus on property and casualty insurance.

At some point in your career you will have to decide which designation to pursue. Both societies offer two levels of designation: associate (ASA or ACAS) and fellowship (FSA or FCAS). Each level involves writing a number of rigorous exams on your own time and carries a large study commitment.

Luckily, as of January 2016 there are still four exams that are offered jointly by the SOA and CAS. This means that you aren’t locked into one path right away; you have some time to get your feet wet in the profession before you narrow your focus. Often, as I did, the companies that you do your co-ops or internships at will determine which exams you write. If you are doing an internship for a life insurance company you will be writing the life insurance (SOA) exams and vice versa. Many companies offer financial support to students to help with exam writing.

The ASA (Associate of the Society of Actuaries) designation requires the completion of five exams: P, FM, MFE, MLC and C. Note that all of these except for MLC will also count towards an ACAS designation.

Contents of ASA Exams

Click the links to be redirected to the SOA’s homepage for each exam. Registration, syllabi and study materials are all available there.

Exam P (Probability)

Exam P is where most people will begin writing exams. Overall it has little to do specifically with actuarial topics; it is more of a way to develop the mathematical background needed for later exams. You’ll find that a good understanding of calculus (Calculus II at Waterloo) and a basic understanding of statistics will help you pass this exam.

Exam FM (Financial Mathematics)

This exam is split into two sections. The first deals with interest theory and the time value of money. You will learn about discounting cash flows, annuities, loans and bonds. The second section deals with derivatives (of the stock market type, not calculus!), forward rates and different types of investment strategies. If you are early in your undergrad and are not in an actuarial science program I would expect you to have less exposure to this material – adjust your study time accordingly.

Exam MFE (Models for Financial Economics)

In many ways Exam MFE is the second level to exam FM. It is a very math intensive exam that will require you to know how to use your calculator accurately to stay on time. Recently updated, the syllabus now focusses mainly on the valuation of financial derivatives – you’ll learn to love (read: “love”) the Black-Scholes model.

Exam MLC (Models for Life Contingencies)

I have less to say about MLC as I have yet to take it (studying for the May 2016 sitting!) – I’ll be sure to update this post once I have finished. MLC takes many of the concepts from previous exams and adds a new twist: Payments only occur while the policyholder is alive. Here you will learn about life contingent annuities and the math behind the different types of insurance policies. Many people will say that MLC was the hardest of the ASA level exams, so be sure to set aside a lot of time to study for this one.

Exam C (Construction of Actuarial Models)

Covering a large array of topics, Exam C is the most statistics-intensive of the ASA level exams. Personally, I also found it to be the most interesting of the exams I’ve taken. The syllabus is split into loss models, credibility and simulation. Loss models encompass the bulk of the exam and require a strong understanding of calculus, sums of series and random variables. Many fellow students I have spoken with about Exam C agree that in addition to the mathematics, much of the difficulty comes from the breadth of the syllabus and the time it takes to learn it all.

Okay, I know the basics of the exams. Now what?

I plan on continuing to write posts related to exams. Hopefully some of the study habits I have picked up can help others to study more efficiently. If you have any specific questions please leave a comment or shoot me an email, maybe it will spark inspiration for a new article.