# OC R User Group Meet-Up

This a a summary of the presentation I gave at the October OC R User Group Meet Up. The focus on the talk was to solve the following problem statement using R.

I am investing in Lending Club notes. Specifically Grade A, 36 Month loans. I want to generate an expected cash flow for my investment.

## Note

Sometimes wordpress.com will format code like the pipes and R assignment incorrectly.  You can find the code for this post here:  https://github.com/bryantravissmith/OC-RUG

## Setup

I want to generate my expectations for grade A, 36 months loan. The main goal is to generate a framework for the solution that it can also be applied to other loans. I am starting off creating a data frame from all the join Lending Club Data from $Jun \ 30^{th}, 2016$ when I selected the target loans from tranches of loan that are mature.

library(dplyr)
library(reshape2)
library(ggplot2)
library(zoo)
library(stringr)
library(knitr)
library(DT)

# Previously loaded all LC data into a single data frame and saved as RData

lcA &lt;- lc %&gt;% mutate(
#Crate a Data Object for the Issue Date
issue_yearmon = as.Date(as.yearmon(issue_d,format=&quot;%b-%Y&quot;)),
#Create a Numeric Issue Year For Group By Calculations
issue_year = as.numeric(as.character(issue_yearmon,format=&quot;%Y&quot;)),
#Convert to Date object to calculate the 'Age' of the loan
last_pymnt_yearmon = as.Date(as.yearmon(last_pymnt_d,format=&quot;%b-%Y&quot;)),
#Converte interest rate to ta numeric value
interest = as.numeric(str_replace(int_rate,&quot;%&quot;,&quot;&quot;))
) %&gt;%
mutate(
#Make what are roughly 1 month wide in the time between the loan was
#originated and the time it stopped generating cash flows
AgeBucket = cut(as.numeric((last_pymnt_yearmon-issue_yearmon)/365),
breaks=seq(0,6,1/12),
include.lowest=T,
right=F)
) %&gt;%
mutate(
#Converte the Age Bucket to a Numeric Month or Statement value
Age = match(AgeBucket,levels(AgeBucket))
) %&gt;%
mutate(
#Get the total recieved payments for each loan
total_rec = total_rec_prncp+total_rec_int
) %&gt;%
#Limit to the data we are interested in for the problem
#Remove 2007 - stands out in graphs, actually doesn't change the results too much.
filter(term == &quot; 36 months&quot;,grade == &quot;A&quot;, issue_year  &lt; 2013, issue_year &gt; 2007)



## Cash Flow Expectation : Paid As Agreed

The simplest expectation is to just assume that everyone will make every schedule payment. The equation for the payment of a fix term loan is: $Payment = \frac{LoanAmount * MonthlyInterestRate}{1 - \frac{1}{(1+MonthlyInterestRate)^{Term}}}$

You can then use this to get the amortization schedule for the loan.

# Generates the expected payment for a fixed term loan.
payment_value &lt;- function(loan_amount,interest_rate,term_number) {
loan_amount*interest_rate/12/(1 - (1+interest_rate/12)^(-1*term_number))
}

#Gereates the scheduled payment behavior for a loan that is paid as agreed
get_amortization &lt;- function(term_number,interest_rate){
amortization &lt;- data.frame(statement = seq(0,term_number,1),
payment = c(0,rep(payment_value(1,interest_rate,term_number),term_number)),
interest_payment = 0,
principal_payment = 0,
start_balance = 0,
end_balance = 1)
for(i in (1:term_number+1)){
amortization[i,'start_balance'] = amortization[i-1,'end_balance']*(1+interest_rate/12)
amortization[i,'interest_payment'] = interest_rate*amortization[i-1,'end_balance']/12
amortization[i,'principal_payment'] = amortization[i,'payment'] - amortization[i,'interest_payment']
amortization[i,'end_balance'] = amortization[i,'start_balance']-amortization[i,'payment']
}
amortization[term_number+1,'end_balance']=0
return(amortization)
}

#Helper function to round numeric values when writing to data table
round_df &lt;- function(df, digits) {
nums &lt;- vapply(df, is.numeric, FUN.VALUE = logical(1))
df[,nums] &lt;- round(df[,nums], digits = digits)   (df) } #Printing values get_amortization(36,0.078) %&gt;%
round_df(4) %&gt;%
kable(format = &quot;markdown&quot;)

statement payment interest_payment principal_payment start_balance end_balance
0 0.0000 0.0000 0.0000 0.0000 1.0000
1 0.0312 0.0065 0.0247 1.0065 0.9753
2 0.0312 0.0063 0.0249 0.9816 0.9504
3 0.0312 0.0062 0.0251 0.9565 0.9253
4 0.0312 0.0060 0.0252 0.9313 0.9001
5 0.0312 0.0059 0.0254 0.9059 0.8747
6 0.0312 0.0057 0.0256 0.8803 0.8491
7 0.0312 0.0055 0.0257 0.8546 0.8234
8 0.0312 0.0054 0.0259 0.8287 0.7975
9 0.0312 0.0052 0.0261 0.8027 0.7714
10 0.0312 0.0050 0.0262 0.7764 0.7452
11 0.0312 0.0048 0.0264 0.7500 0.7188
12 0.0312 0.0047 0.0266 0.7235 0.6922
13 0.0312 0.0045 0.0267 0.6967 0.6655
14 0.0312 0.0043 0.0269 0.6698 0.6386
15 0.0312 0.0042 0.0271 0.6427 0.6115
16 0.0312 0.0040 0.0273 0.6154 0.5842
17 0.0312 0.0038 0.0274 0.5880 0.5567
18 0.0312 0.0036 0.0276 0.5604 0.5291
19 0.0312 0.0034 0.0278 0.5326 0.5013
20 0.0312 0.0033 0.0280 0.5046 0.4733
21 0.0312 0.0031 0.0282 0.4764 0.4452
22 0.0312 0.0029 0.0284 0.4481 0.4168
23 0.0312 0.0027 0.0285 0.4195 0.3883
24 0.0312 0.0025 0.0287 0.3908 0.3596
25 0.0312 0.0023 0.0289 0.3619 0.3307
26 0.0312 0.0021 0.0291 0.3328 0.3016
27 0.0312 0.0020 0.0293 0.3035 0.2723
28 0.0312 0.0018 0.0295 0.2740 0.2428
29 0.0312 0.0016 0.0297 0.2444 0.2131
30 0.0312 0.0014 0.0299 0.2145 0.1833
31 0.0312 0.0012 0.0301 0.1845 0.1532
32 0.0312 0.0010 0.0302 0.1542 0.1230
33 0.0312 0.0008 0.0304 0.1238 0.0925
34 0.0312 0.0006 0.0306 0.0931 0.0619
35 0.0312 0.0004 0.0308 0.0623 0.0310
36 0.0312 0.0002 0.0310 0.0312 0.0000

## Results: Paid As Agreed

I am now going to use the Lending Club data to calculate actual yield of grade A, 36 month loans and compare it to the expectations generated by the above payment schedule. I will do this calculation by year for each year of mature loans in Lending Club. We are assuming there is no prepayment and charge off behavior, so I will also calculate these to see scale of these behaviors.

lcA %&gt;%
# Perform the following calculation for each year
group_by(issue_year) %&gt;%
# Aggregrate for each year
summarise(
#Number of Loans
Count = n(),
#Dollars Loaned Out
DollarsFunded = sum(funded_amnt),
#Dollars collected from payments
DollarsRecieved = sum(total_rec),
#Percent Charge Offs
Chargeoffs = round(sum(ifelse(grepl(&quot;Charged Off&quot;,loan_status),1,0))/Count*100,1),
#Percent Prepayments
Prepays = round(sum(ifelse(grepl(&quot;Fully Paid&quot;,loan_status)&amp;(Age &lt; 36),1,0))/Count*100,1),     #The Percent Increase on the Dollars Loans     Yield = round(100*(DollarsRecieved/DollarsFunded-1),1),     #Dollar weighted interest rate for each year             Interest = round(weighted.mean(interest,funded_amnt),1)) %&gt;%
mutate(
#Applys the amortization schedule and sumes payments recieved for each loan
ExpectedYield = apply( (.),
1,
function(x) sum(get_amortization(36,
as.numeric(x['Interest'])/100)$payment))) %&gt;% mutate( #Rounds Expected Yield to a percent ExpectedYield = round(100*(ExpectedYield-1),1)) %&gt;% #Filter to relevant variables select(issue_year,DollarsFunded,Prepays,Chargeoffs,Interest,Yield,ExpectedYield) %&gt;% #Convert to % for display mutate(Prepays=paste0(Prepays,&quot;%&quot;), Chargeoffs=paste0(Chargeoffs,&quot;%&quot;), Interest=paste0(Interest,&quot;%&quot;), Yield=paste0(Yield,&quot;%&quot;), ExpectedYield = paste0(ExpectedYield,&quot;%&quot;)) %&gt;% #Create a data table kable( format = &quot;markdown&quot;)  issue_year DollarsFunded Prepays Chargeoffs Interest Yield ExpectedYield 2008 1899800 49.4% 6% 8.5% 7.6% 13.6% 2009 8700550 49.7% 6.7% 8.8% 7.6% 14.1% 2010 20650050 53.1% 4.4% 7.2% 7.4% 11.5% 2011 49784200 50.9% 6.4% 7.2% 6.4% 11.5% 2012 119545400 52.4% 7.2% 7.7% 6.4% 12.3% ## Review: Paid As Agreed I can see that there are signficant Prepayments and Charge Offs for each year of Lending Club, so I’ll try to include these in the expectations. ## Cash Flow Expectation : Account For Charge Offs I am going to include charge offs in the payment expectations. Once a loan charges off, it no longer makes payments. This makes it straight forward to include in the amortatization schedule because we only need to keep track of how many loans are left, and add the payments of the left over loans. In reality there is also deliquency behavior that should be accounted for, but we will have to do that another time. To execute this, I need to get get a feel for when the loans charge off in a given traunch of loans. # Charege Offs lcA %&gt;% #Perform the following calculations for each year group_by(issue_year) %&gt;% ## Count the number of leans each year mutate(count=n()) %&gt;% ## Perform a summary for each year and age of loan group_by(issue_year,Age) %&gt;% ## Find the number of loans that charge off each month summarise(bad = sum(ifelse(grepl(&quot;Charged Off&quot;,loan_status),1,0)), #Get the count value for the year #count is a vector of the save value for each group count = max(count)) %&gt;% ## ORder by the age for the cumsum arrange(Age) %&gt;% ## Get the cumlative total of bad loans over the course of the term mutate(total_bad = cumsum(bad)) %&gt;% ## Group by the traunch of loans group_by(issue_year) %&gt;% ##Get the total number of bad loans for the traunch mutate(max_total_bad = max(total_bad)) %&gt;% ##Plot the cumlative percentage of chargeoffs for each traunch ggplot(aes(Age,total_bad/max_total_bad,color=factor(issue_year)))+ geom_line()+ theme_bw()+ xlim(0,36)+ ylab('Percent Total Chargeoffs') The cumulative charge off behavior has a characteristic shape, so we can use this shape to include in the expected number of units making payments in a given month. We can generate this expectations by just doing a weighted average on these curves (thought there are more thoughtful ways to do this as well). I’m going to take this average, and smoothing fit, and make a function of out it to include in the amortization of the loan. percentChargeoffFunc &lt;- lcA %&gt;% #Perform the following calculations for each year group_by(issue_year) %&gt;% ## Count the number of leans each year mutate(count=n()) %&gt;% ## Perform a summary for each year and age of loan group_by(issue_year,Age) %&gt;% ## Find the number of loans that charge off each month summarise(bad = sum(ifelse(grepl(&quot;Charged Off&quot;,loan_status),1,0)), #Get the count value for the year #count is a vector of the save value for each group count = max(count)) %&gt;% ## ORder by the age for the cumsum arrange(Age) %&gt;% ## Limit to 36 months filter(Age &lt;= 36) %&gt;% ## Get the cumlative total of bad loans over the course of the term mutate(total_bad = cumsum(bad)) %&gt;% ## Group by the traunch of loans group_by(issue_year) %&gt;% ##Get the total number of bad loans for the traunch mutate(max_total_bad = max(total_bad)) %&gt;% ##Calculate percent bad mutate(percent_bad = total_bad/max_total_bad) %&gt;% ## Group by Age to do the weighted average of all the curves group_by(Age) %&gt;% ## Calculate the weighted average by the traunch size summarise(avg_percent_total_bad = weighted.mean(percent_bad,count)) %&gt;% ## Return Approx Function to describe Chargeoff Behavior (function(df) { ## Make a loess fit of the average to smooth out mod &lt;- loess(avg_percent_total_bad ~ Age,df,span=1/4) ## make the predictions pred &lt;- predict(mod,df) ## create a data frame of the age and smoothed predictions tmp &lt;- data.frame(Age = df$Age,pred_percent_total_bad=pred)     ## make sure the new smooth values are between 0 and 1     tmp$pred_percent_total_bad = tmp$pred_percent_total_bad-min(tmp$pred_percent_total_bad) tmp$pred_percent_total_bad = tmp$pred_percent_total_bad/max(tmp$pred_percent_total_bad)     # return approx function of the smooth function     approxfun(tmp$Age,tmp$pred_percent_total_bad,method='linear',yleft = 0,yright=1)   })  ## Plot Function Output data.frame(Age=seq(0,36),            pco = percentChargeoffFunc(seq(0,36))) %&gt;%
ggplot(aes(Age,pco))+
geom_line()+
theme_bw()+
xlim(0,36)+
ylab('Percent Total Chargeoffs') Now that I have the function, I can include the expected charge off function times the actual scale to update our cash flow expectations.

#Ammortizaiton Schedule for loans with Charge Off Expectations
get_amortization_chargeoff &lt;- function(term_number,interest_rate,percent_chargeoff){
amortization &lt;- data.frame(statement=seq(0,term_number,1),
payment=c(0,rep(payment_value(1,interest_rate,term_number),term_number)),
interest_payment = 0,
principal_payment=0,
start_balance=0,
end_balance=1,
#Added a units chargeoff which is the new curve times the measured scale
unit_chargeoff = percent_chargeoff*percentChargeoffFunc(seq(0,36)))
for(i in (1:term_number+1)){
amortization[i,'start_balance'] = amortization[i-1,'end_balance']*(1+interest_rate/12)
amortization[i,'interest_payment'] = interest_rate*amortization[i-1,'end_balance']/12
amortization[i,'principal_payment'] = amortization[i,'payment'] - amortization[i,'interest_payment']
amortization[i,'end_balance'] = amortization[i,'start_balance']-amortization[i,'payment']
}
amortization[term_number+1,'end_balance']=0
#Adds a payment recieved which is the schedule payments times the number of units left on a given statement
amortization &lt;- amortization %&gt;% mutate(payment_recieved = (1-unit_chargeoff)*payment)
return(amortization)
}

#Print output
get_amortization_chargeoff(36,0.07,0.06) %&gt;%
round_df(4) %&gt;%
kable( format = &quot;markdown&quot;)

statement payment interest_payment principal_payment start_balance end_balance unit_chargeoff payment_recieved
0 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000
1 0.0309 0.0058 0.0250 1.0058 0.9750 0.0000 0.0309
2 0.0309 0.0057 0.0252 0.9806 0.9498 0.0008 0.0309
3 0.0309 0.0055 0.0253 0.9553 0.9244 0.0019 0.0308
4 0.0309 0.0054 0.0255 0.9298 0.8989 0.0033 0.0308
5 0.0309 0.0052 0.0256 0.9042 0.8733 0.0051 0.0307
6 0.0309 0.0051 0.0258 0.8784 0.8475 0.0070 0.0307
7 0.0309 0.0049 0.0259 0.8525 0.8216 0.0088 0.0306
8 0.0309 0.0048 0.0261 0.8264 0.7955 0.0107 0.0305
9 0.0309 0.0046 0.0262 0.8002 0.7693 0.0130 0.0305
10 0.0309 0.0045 0.0264 0.7738 0.7429 0.0152 0.0304
11 0.0309 0.0043 0.0265 0.7472 0.7163 0.0172 0.0303
12 0.0309 0.0042 0.0267 0.7205 0.6896 0.0191 0.0303
13 0.0309 0.0040 0.0269 0.6937 0.6628 0.0212 0.0302
14 0.0309 0.0039 0.0270 0.6667 0.6358 0.0235 0.0302
15 0.0309 0.0037 0.0272 0.6395 0.6086 0.0258 0.0301
16 0.0309 0.0036 0.0273 0.6122 0.5813 0.0282 0.0300
17 0.0309 0.0034 0.0275 0.5847 0.5538 0.0305 0.0299
18 0.0309 0.0032 0.0276 0.5570 0.5261 0.0327 0.0299
19 0.0309 0.0031 0.0278 0.5292 0.4983 0.0345 0.0298
20 0.0309 0.0029 0.0280 0.5012 0.4704 0.0363 0.0298
21 0.0309 0.0027 0.0281 0.4731 0.4422 0.0383 0.0297
22 0.0309 0.0026 0.0283 0.4448 0.4139 0.0399 0.0296
23 0.0309 0.0024 0.0285 0.4164 0.3855 0.0414 0.0296
24 0.0309 0.0022 0.0286 0.3877 0.3569 0.0429 0.0296
25 0.0309 0.0021 0.0288 0.3589 0.3281 0.0447 0.0295
26 0.0309 0.0019 0.0290 0.3300 0.2991 0.0469 0.0294
27 0.0309 0.0017 0.0291 0.3008 0.2700 0.0490 0.0294
28 0.0309 0.0016 0.0293 0.2715 0.2407 0.0510 0.0293
29 0.0309 0.0014 0.0295 0.2421 0.2112 0.0528 0.0292
30 0.0309 0.0012 0.0296 0.2124 0.1815 0.0543 0.0292
31 0.0309 0.0011 0.0298 0.1826 0.1517 0.0556 0.0292
32 0.0309 0.0009 0.0300 0.1526 0.1217 0.0567 0.0291
33 0.0309 0.0007 0.0302 0.1224 0.0916 0.0577 0.0291
34 0.0309 0.0005 0.0303 0.0921 0.0612 0.0586 0.0291
35 0.0309 0.0004 0.0305 0.0616 0.0307 0.0594 0.0290
36 0.0309 0.0002 0.0307 0.0309 0.0000 0.0600 0.0290

## Results: Account For Charge Offs

lcA %&gt;%
# Perform the following calculation for each year
group_by(issue_year) %&gt;%
# Aggregrate for each year
summarise(
#Number of Loans
Count = n(),
#Dollars Loaned Out
DollarsFunded = sum(funded_amnt),
#Dollars collected from payments
DollarsRecieved = sum(total_rec),
#Percent Charge Offs
Chargeoffs = round(sum(ifelse(grepl(&quot;Charged Off&quot;,loan_status),1,0))/Count*100,1),
#Percent Prepayments
Prepays = round(sum(ifelse(grepl(&quot;Fully Paid&quot;,loan_status)&amp;(Age &lt; 36),1,0))/Count*100,1),     #The Percent Increase on the Dollars Loans     Yield = round(100*(DollarsRecieved/DollarsFunded-1),1),     #Dollar weighted interest rate for each year             Interest = round(weighted.mean(interest,funded_amnt),1)) %&gt;%
mutate(
#Applys the amortization schedule and sumes payments recieved for each loan
ExpectedYield = apply( (.),
1,
function(x) sum(get_amortization_chargeoff(36,
as.numeric(x['Interest'])/100,
as.numeric(x['Chargeoffs'])/100
)$payment_recieved))) %&gt;% mutate( #Rounds Expected Yield to a percent ExpectedYield = round(100*(ExpectedYield-1),1) ) %&gt;% #Filter to relevant variables select(issue_year,DollarsFunded,Prepays,Chargeoffs,Interest,Yield,ExpectedYield) %&gt;% #Convert to % for display mutate(Prepays=paste0(Prepays,&quot;%&quot;), Chargeoffs=paste0(Chargeoffs,&quot;%&quot;), Interest=paste0(Interest,&quot;%&quot;), Yield=paste0(Yield,&quot;%&quot;), ExpectedYield = paste0(ExpectedYield,&quot;%&quot;)) %&gt;% #Create a data table kable( format = &quot;markdown&quot;)  issue_year DollarsFunded Prepays Chargeoffs Interest Yield ExpectedYield 2008 1899800 49.4% 6% 8.5% 7.6% 10% 2009 8700550 49.7% 6.7% 8.8% 7.6% 10.1% 2010 20650050 53.1% 4.4% 7.2% 7.4% 8.9% 2011 49784200 50.9% 6.4% 7.2% 6.4% 7.7% 2012 119545400 52.4% 7.2% 7.7% 6.4% 8% ## Review: Account For Charge Offs I see that the expected values have decreased from the previous expectations. The expected and actuals are closer in line, but there is room for improvement. I have accounted for the charge offs, but not the prepayment behavior. ## Cash Flow Expectation : Account For Prepayments & Charge Offs I can now repeat the process I did for including the effect of charge offs in our cash flow expectations. I can see if there is a characteristic cumulative behavior, generate a function that approximates the behavior, and incorporate it into the amortization. # Prepays lcA %&gt;% ## Perform the following calculations for each year group_by(issue_year) %&gt;% ## Count the number of leans each year mutate(count=n()) %&gt;% ## Perform a summary for each year and age of loan group_by(issue_year,Age) %&gt;% ## Find the number of loans that prepayments each month summarise(prepay = sum(ifelse(grepl(&quot;Fully Paid&quot;,loan_status),1,0)), ## Get the count value for the year ## count is a vector of the save value for each group count = max(count)) %&gt;% ## ORder by the age for the cumsum arrange(Age) %&gt;% ## Can only prepay before the loan is mature filter(Age &lt; 36) %&gt;% ## Get the cumlative total of prepaid loans over the course of the term mutate(total_prepay = cumsum(prepay)) %&gt;% ## Group by the traunch of loans group_by(issue_year) %&gt;% ## Get the total number of prepayments for the traunch mutate(max_total_prepay = max(total_prepay)) %&gt;% ## Plot the cumlative percentage of prepaid loans for each traunch ggplot(aes(Age,total_prepay/max_total_prepay,color=factor(issue_year)))+ geom_line()+ theme_bw()+ xlim(0,36)+ ylab('Percent Total Prepayment') The cumulative prepayment behavior over the life does have a characteristic shape, so I will generate an approximation function similar to charge offs. percentPrepaymentFunc &lt;- lcA %&gt;% ## Perform the following calculations for each year group_by(issue_year) %&gt;% ## Count the number of leans each year mutate(count=n()) %&gt;% ## Perform a summary for each year and age of loan group_by(issue_year,Age) %&gt;% ## Find the number of loans that prepayments each month summarise(prepay = sum(ifelse(grepl(&quot;Fully Paid&quot;,loan_status),1,0)), ## Get the count value for the year ## count is a vector of the save value for each group count = max(count)) %&gt;% ## ORder by the age for the cumsum arrange(Age) %&gt;% ## Can only prepay before the loan is mature filter(Age &lt; 36) %&gt;% ## Get the cumlative total of prepaid loans over the course of the term mutate(total_prepay = cumsum(prepay)) %&gt;% ## Group by the traunch of loans group_by(issue_year) %&gt;% ## Get the total number of prepayments for the traunch mutate(max_total_prepay = max(total_prepay)) %&gt;% ## Calculate the percentage of prepayments that happen in a given ## month out of all of the prepayments that happen in the traunch mutate(percent_prepay = total_prepay/max_total_prepay) %&gt;% ## Calculate for eeach age group_by(Age) %&gt;% ## The weighted mean of the percentage by the size of the traunch summarise(avg_percent_total_prepay = weighted.mean(percent_prepay,count)) %&gt;% ## Return an approx function (function(df) { ## perform a loess fit to smoooth the function mod &lt;- loess(avg_percent_total_prepay ~ Age,df,span=1/4) ## predict the results pred &lt;- predict(mod,df) ## create a data frame to use to make the approx function tmp &lt;- data.frame(Age = df$Age,pred_percent_total_prepay=pred)     ## Make sure the fit produces values between 0 and 1     tmp$pred_percent_total_prepay = tmp$pred_percent_total_prepay-min(tmp$pred_percent_total_prepay) tmp$pred_percent_total_prepay = tmp$pred_percent_total_prepay/max(tmp$pred_percent_total_prepay)     ## Return the approx function     approxfun(tmp$Age,tmp$pred_percent_total_prepay,method='linear',yleft = 0,yright=1)   })  ## Plot the approx functions data.frame(Age=seq(0,36),            ppp = percentPrepaymentFunc(seq(0,36))) %&gt;%
ggplot(aes(Age,ppp))+
geom_line()+
theme_bw()+
xlim(0,36)+
ylab('Percent Total Chargeoffs') ## Results: Account For Prepayments & Charge Offs

I have the characteristic prepayment function and charge off function, so I can append it to the amortization schedule. Like loans that charge off, a prepaid loan does not generate payments after it is prepaid. Unlike charged off loans, it does return the principal. Once this is included, I suspect I will see higher than previously expected cashflows earlier due to returned principal.

#Ammortizaiton Schedule for loans with Charge Off and Prepayment Expectations
get_amortization_chargeoff_prepay &lt;- function(term_number,interest_rate,percent_chargeoff,percent_prepay){
amortization &lt;- data.frame(statement=seq(0,term_number,1),
payment=c(0,rep(payment_value(1,interest_rate,term_number),term_number)),
interest_payment = 0,
principal_payment=0,
start_balance=0,
end_balance=1,
#Units chargeoffed is measures scale * expected percent of total
unit_chargeoff = percent_chargeoff*percentChargeoffFunc(seq(0,36)),
#Units prepyament is measured scale * expected percent of total
unit_prepay = percent_prepay*percentPrepaymentFunc(seq(0,36)))
for(i in (1:term_number+1)){
amortization[i,'start_balance'] = amortization[i-1,'end_balance']*(1+interest_rate/12)
amortization[i,'interest_payment'] = interest_rate*amortization[i-1,'end_balance']/12
amortization[i,'principal_payment'] = amortization[i,'payment'] - amortization[i,'interest_payment']
amortization[i,'end_balance'] = amortization[i,'start_balance']-amortization[i,'payment']
}
amortization[term_number+1,'end_balance']=0
amortization &lt;- amortization %&gt;%
mutate(monthly_prepay = ifelse(is.na(lag(unit_prepay)),0,unit_prepay-lag(unit_prepay))) %&gt;%
mutate(payment_recieved = (1-unit_chargeoff-unit_prepay)*payment+monthly_prepay*(end_balance+payment))
return(amortization)
}

get_amortization_chargeoff_prepay(36,0.07,0.06,0.50) %&gt;%
round_df(4) %&gt;%
kable( format = &quot;markdown&quot;)

statement payment interest_payment principal_payment start_balance end_balance unit_chargeoff unit_prepay monthly_prepay payment_recieved
0 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
1 0.0309 0.0058 0.0250 1.0058 0.9750 0.0000 0.0000 0.0000 0.0309
2 0.0309 0.0057 0.0252 0.9806 0.9498 0.0008 0.0064 0.0064 0.0369
3 0.0309 0.0055 0.0253 0.9553 0.9244 0.0019 0.0135 0.0071 0.0372
4 0.0309 0.0054 0.0255 0.9298 0.8989 0.0033 0.0214 0.0079 0.0375
5 0.0309 0.0052 0.0256 0.9042 0.8733 0.0051 0.0298 0.0084 0.0374
6 0.0309 0.0051 0.0258 0.8784 0.8475 0.0070 0.0390 0.0093 0.0376
7 0.0309 0.0049 0.0259 0.8525 0.8216 0.0088 0.0492 0.0101 0.0377
8 0.0309 0.0048 0.0261 0.8264 0.7955 0.0107 0.0600 0.0108 0.0376
9 0.0309 0.0046 0.0262 0.8002 0.7693 0.0130 0.0718 0.0118 0.0377
10 0.0309 0.0045 0.0264 0.7738 0.7429 0.0152 0.0825 0.0107 0.0361
11 0.0309 0.0043 0.0265 0.7472 0.7163 0.0172 0.0928 0.0103 0.0352
12 0.0309 0.0042 0.0267 0.7205 0.6896 0.0191 0.1039 0.0111 0.0351
13 0.0309 0.0040 0.0269 0.6937 0.6628 0.0212 0.1168 0.0129 0.0356
14 0.0309 0.0039 0.0270 0.6667 0.6358 0.0235 0.1321 0.0153 0.0363
15 0.0309 0.0037 0.0272 0.6395 0.6086 0.0258 0.1474 0.0153 0.0353
16 0.0309 0.0036 0.0273 0.6122 0.5813 0.0282 0.1631 0.0157 0.0346
17 0.0309 0.0034 0.0275 0.5847 0.5538 0.0305 0.1788 0.0157 0.0336
18 0.0309 0.0032 0.0276 0.5570 0.5261 0.0327 0.1942 0.0154 0.0324
19 0.0309 0.0031 0.0278 0.5292 0.4983 0.0345 0.2096 0.0154 0.0315
20 0.0309 0.0029 0.0280 0.5012 0.4704 0.0363 0.2256 0.0160 0.0308
21 0.0309 0.0027 0.0281 0.4731 0.4422 0.0383 0.2432 0.0176 0.0305
22 0.0309 0.0026 0.0283 0.4448 0.4139 0.0399 0.2599 0.0167 0.0290
23 0.0309 0.0024 0.0285 0.4164 0.3855 0.0414 0.2759 0.0160 0.0278
24 0.0309 0.0022 0.0286 0.3877 0.3569 0.0429 0.2923 0.0163 0.0269
25 0.0309 0.0021 0.0288 0.3589 0.3281 0.0447 0.3098 0.0176 0.0262
26 0.0309 0.0019 0.0290 0.3300 0.2991 0.0469 0.3292 0.0194 0.0257
27 0.0309 0.0017 0.0291 0.3008 0.2700 0.0490 0.3486 0.0193 0.0244
28 0.0309 0.0016 0.0293 0.2715 0.2407 0.0510 0.3681 0.0196 0.0232
29 0.0309 0.0014 0.0295 0.2421 0.2112 0.0528 0.3872 0.0191 0.0219
30 0.0309 0.0012 0.0296 0.2124 0.1815 0.0543 0.4062 0.0189 0.0207
31 0.0309 0.0011 0.0298 0.1826 0.1517 0.0556 0.4249 0.0187 0.0195
32 0.0309 0.0009 0.0300 0.1526 0.1217 0.0567 0.4438 0.0190 0.0183
33 0.0309 0.0007 0.0302 0.1224 0.0916 0.0577 0.4632 0.0194 0.0172
34 0.0309 0.0005 0.0303 0.0921 0.0612 0.0586 0.4819 0.0187 0.0159
35 0.0309 0.0004 0.0305 0.0616 0.0307 0.0594 0.5000 0.0181 0.0147
36 0.0309 0.0002 0.0307 0.0309 0.0000 0.0600 0.5000 0.0000 0.0136

Now I can add the prepayment behavior into our cashflows and compare the expected to the actuals.

lcA %&gt;%
# Perform the following calculation for each year
group_by(issue_year) %&gt;%
# Aggregrate for each year
summarise(
#Number of Loans
Count = n(),
#Dollars Loaned Out
DollarsFunded = sum(funded_amnt),
#Dollars collected from payments
DollarsRecieved = sum(total_rec),
#Percent Charge Offs
Chargeoffs = round(sum(ifelse(grepl(&quot;Charged Off&quot;,loan_status),1,0))/Count*100,1),
#Percent Prepayments
Prepays = round(sum(ifelse(grepl(&quot;Fully Paid&quot;,loan_status)&amp;(Age &lt; 36),1,0))/Count*100,1),     #The Percent Increase on the Dollars Loans     Yield = round(100*(DollarsRecieved/DollarsFunded-1),1),     #Dollar weighted interest rate for each year             Interest = round(weighted.mean(interest,funded_amnt),1)) %&gt;%
mutate(
#Applys the amortization schedule and sumes payments recieved for each loan
ExpectedYield = apply( (.),
1,
function(x) sum(get_amortization_chargeoff_prepay(36,
as.numeric(x['Interest'])/100,
as.numeric(x['Chargeoffs'])/100,
as.numeric(x['Prepays'])/100
)\$payment_recieved))) %&gt;%
mutate(
#Rounds Expected Yield to a percent
ExpectedYield = round(100*(ExpectedYield-1),1)
) %&gt;%
#Filter to relevant variables
select(issue_year,DollarsFunded,Prepays,Chargeoffs,Interest,Yield,ExpectedYield) %&gt;%
#Convert to % for display
mutate(Prepays=paste0(Prepays,&quot;%&quot;),
Chargeoffs=paste0(Chargeoffs,&quot;%&quot;),
Interest=paste0(Interest,&quot;%&quot;),
Yield=paste0(Yield,&quot;%&quot;),
ExpectedYield = paste0(ExpectedYield,&quot;%&quot;)) %&gt;%
#Create a data table
kable(format = &quot;markdown&quot;)

issue_year DollarsFunded Prepays Chargeoffs Interest Yield ExpectedYield
2008 1899800 49.4% 6% 8.5% 7.6% 8.4%
2009 8700550 49.7% 6.7% 8.8% 7.6% 8.3%
2010 20650050 53.1% 4.4% 7.2% 7.4% 7.4%
2011 49784200 50.9% 6.4% 7.2% 6.4% 6.3%
2012 119545400 52.4% 7.2% 7.7% 6.4% 6.4%

## Review: Account For Prepayments & Charge Offs

Now I have a framework for cashflow expectations that comes within error of the actuals. This method requires that I generate exceptions for the expected charge off and prepayment scales to forecast cashflows, but once I have that, the rest is just straight forward math. The cashflow model that comes describes lending club actuals is using the paid as agreed model, then account for charge off and prepayment behavior. Even though there are a number of simplification and assumptions in this framework, it is an improved understanding of the expected cashflows for a grade A, 36 month term loan in Lending Club.