Risk Management Of Stocks Using Python
Learn different ways to evaluate the Value at Risk (VaR) of a stock
--
What is Value at Risk (VaR)?
Value at Risk (VaR) is the most common metric used by individuals, firms, and banks to determine the extent of potential financial loss over a given period of time. Financial institutions use VaR to assess the risk associated with an investment and to evaluate whether they have sufficient funds to cover potential losses; it also helps risk managers re-evaluate and adjust their investments to reduce the risk of larger losses. The typical questions that VaR helps answer are: what is the maximum loss on an investment of 100,000 over one year at a 95% confidence level? Or, what is the probability that losses will exceed 3% next year? The standard way of representing these scenarios is as a normal distribution, which makes the analysis and interpretation much easier.
There are three important factors at play — the confidence interval, the time period, and the investment amount or expected percentage of loss.
In this blog, we will cover the following topics:
- Exploratory data analysis (EDA)
- Calculation of returns
- Methodologies to calculate VaR
  - Historical method
  - Bootstrap method
  - Decay factor method
  - Monte Carlo simulation method
Exploratory data analysis (EDA)
Load the libraries
These are standard libraries for data loading, analysis, and visualization. The key point to note is the use of nsepy for fetching the stock data; an alternative is to use Yahoo Finance.
import numpy as np
import pandas as pd
import warnings
import statistics  # used later for statistics.mean in the Monte Carlo section
import matplotlib.pyplot as plt
from datetime import date
from tabulate import tabulate
from nsepy import get_history as gh
Load the data and initial settings
We will use the last year of data for TATA MOTORS. This can be any stock of interest, such as ICICI or DABUR. The data gives the daily value of the stock price.
initial_investment = 100000
startdate = date(2022,2,2)
end_date = date(2023,2,2)
stocksymbols = ['TATAMOTORS'] # This can be any stock
def load_stock_data(self):
    df = pd.DataFrame()
    for i in range(len(self.ticker)):
        ....
        ...
    return df
OUTPUT:
TATAMOTORS
Date
2022-10-14 396.25
2022-10-17 396.10
2022-10-18 404.25
2022-10-19 399.05
2022-10-20 398.10
....
....
2023-01-27 445.60
2023-01-30 443.65
2023-01-31 452.10
2023-02-01 446.65
2023-02-02 444.80
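For reference, here is a minimal sketch of what the elided loader logic could look like, written as a plain function rather than the class method above. It assumes nsepy's get_history (imported above as gh) returns a DataFrame with a 'Close' column; the original implementation may differ.
def load_stock_data(ticker, start, end):
    df = pd.DataFrame()
    for symbol in ticker:
        # fetch the daily history for each symbol and keep the closing price
        data = gh(symbol=symbol, start=start, end=end)
        df[symbol] = data['Close']
    return df

df = load_stock_data(stocksymbols, startdate, end_date)
print(df.tail())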
Calculate the returns
The calculation of returns is simple: we take the percentage change between the current value and the previous value.
(P(t+1) - P(t)) / P(t)
This can be achieved easily in Python using the pct_change method; let's also visualize the returns.
def stock_returns(self):
    df = self.load_stock_data()
    df.columns = ['Stock']
    returns = df.pct_change()
    returns.dropna(inplace=True)
    return returns
OUTPUT:
Stock
Date
2022-10-17 -0.000379
2022-10-18 0.020576
2022-10-19 -0.012863
2022-10-20 -0.002381
2022-10-21 -0.000126
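The returns can be visualized in many ways; one simple possibility (not necessarily the author's exact chart) is a line plot of the daily returns:
# simple line plot of the daily returns
plt.figure(figsize=(10, 4))
plt.plot(returns.index, returns['Stock'], color='navy', linewidth=0.8)
plt.axhline(0, color='gray', linestyle='--', linewidth=0.5)
plt.title('TATAMOTORS daily returns')
plt.xlabel('Date')
plt.ylabel('Daily return')
plt.show()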
We have accessed the data, loaded it, calculated the returns, and visualized the trends. Let us now try the four methods to calculate the VaR in the following sections.
Historical method
This is the simplest and most basic of all the methods, as it gives no importance to the distribution and tails. In this method, we take the return values from the previous section and sort them. As a standard, we take a confidence level of 95%, i.e., our focus shifts to the bottom 5%. The number of trading days in a year is 252, so all we need to do is calculate 5% of 252 days, which is 12.6, meaning we take the 13th lowest return, which turns out to be -0.039306 as shown below.
returns.sort_values('Stock').head(13)
OUTPUT:
2022-02-24 -0.102830
2022-09-26 -0.060506
2022-03-07 -0.055722
2022-02-14 -0.054926
2022-06-16 -0.051075
2022-06-13 -0.049877
2022-11-10 -0.048367
2022-03-04 -0.045413
2022-05-06 -0.041637
2022-05-12 -0.040835
2022-12-23 -0.040816
2022-05-19 -0.039745
2022-10-10 -0.039306
Python has a more elegant way to get to this value using NumPy's percentile function.
np.percentile(returns['Stock'], 5, interpolation = 'lower')
OUTPUT:
-0.039306077884265433
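Note: on recent NumPy releases (1.22 and later) the interpolation keyword of np.percentile is deprecated in favour of method; if you see a deprecation warning, the equivalent call is shown below.
# equivalent call on newer NumPy versions
var_hist = np.percentile(returns['Stock'], 5, method='lower')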
Here is the complete method that loads the data and calculates the VaR. The end result is a nice tabular format.
def var_historical(self):
    returns = obj_loadData.df_returns.copy()
    ....
    ....
    returns.sort_values('Stock').head(13)
    var_hist = np.percentile(returns['Stock'], 5, interpolation = 'lower')
    print(tabulate([[self.ticker, avg_rets, avg_std, var_hist]],
                   headers = ['Mean', 'Standard Deviation', 'VaR %'],
                   tablefmt = 'fancy_grid', stralign = 'center', numalign = 'center', floatfmt = ".4f"))
    return var_hist
def plot_shade(self, var_returns):
    ....
    plt.text(var_returns, 25, f'VAR {round(var_returns, 4)} @ 5%',
             horizontalalignment='right',
             size='small',
             color='navy')
    ....
    plt.gca().add_patch(rect)
OUTPUT:
╒════════════════╤════════╤══════════════════════╤══════════╕
│ │ Mean │ Standard Deviation │ VaR │
╞════════════════╪════════╪══════════════════════╪══════════╡
│ ['TATAMOTORS'] │ 0.0003 │ 0.0225 │ -0.039306│
╘════════════════╧════════╧══════════════════════╧══════════╛
The value at risk of -0.039306 indicates that at a 95% confidence level, there will be a maximum loss of 3.9%, or there is a 5% probability that the losses will exceed 3.9%. In monetary terms, for an investment of 100,000, we are 95% confident that the maximum loss will be 3,930.
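For reference, the elided plot_shade helper could be fleshed out roughly as follows: a histogram of the daily returns with the region beyond the VaR cut-off shaded. The exact layout is an assumption; only the plt.text call and the rectangle patch appear in the original, and this version is a plain function rather than the class method.
import matplotlib.patches as patches

def plot_shade(returns, var_returns):
    # histogram of the daily returns
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.hist(returns['Stock'], bins=50, color='skyblue', edgecolor='gray')
    # mark the VaR threshold
    ax.axvline(var_returns, color='navy', linestyle='--')
    ax.text(var_returns, 25, f'VAR {round(var_returns, 4)} @ 5%',
            horizontalalignment='right',
            size='small',
            color='navy')
    # shade everything to the left of the VaR threshold
    ymin, ymax = ax.get_ylim()
    xmin, _ = ax.get_xlim()
    rect = patches.Rectangle((xmin, ymin), var_returns - xmin, ymax - ymin,
                             color='gray', alpha=0.3)
    ax.add_patch(rect)
    ax.set_xlabel('Daily return')
    ax.set_ylabel('Frequency')
    plt.show()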
Bootstrap method
The bootstrap method is similar to the historical method, but in this case we sample the returns multiple times, say 100 or 1,000 times or more, calculate the VaR each time, and in the end take the average VaR. This is similar to the resampling done in the data science space, where a dataset is resampled many times and the model is retrained on each sample.
def var_bootstrap(self, iterations: int):
    def var_boot(data):
        ...
        ...
        return np.percentile(dff, 5, interpolation = 'lower')

    def bootstrap(data, func):
        sample = np.random.choice(data, len(data))
        return func(sample)

    def generate_sample_data(data, func, size):
        bs_replicates = np.empty(size)
        ....
        ....

    returns = obj_loadData.df_returns.copy()
    ....
    return np.mean(bootstrap_VaR)
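The elided pieces can be filled in roughly as follows. This is a minimal, self-contained sketch using plain functions rather than the class method above; it assumes the returns DataFrame from the previous section and uses the 500 iterations mentioned below.
def var_from_sample(sample):
    # 5th percentile of a resampled set of returns
    return np.percentile(sample, 5, interpolation = 'lower')

def bootstrap_once(data):
    # draw a sample of the same size, with replacement
    sample = np.random.choice(data, len(data))
    return var_from_sample(sample)

def bootstrap_var(data, iterations=500):
    replicates = np.empty(iterations)
    for i in range(iterations):
        replicates[i] = bootstrap_once(data)
    return replicates

bootstrap_VaR = bootstrap_var(returns['Stock'].values, iterations=500)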
We have resampled the data over 500 iterations and calculated the VaR on each run. As the process is random, it is expected that the values will be in the same vicinity as the historical method, i.e., -0.039306. Let us plot this distribution to validate our understanding.
The gray rectangle represents the resulting VaR values, which fall in the range of -0.033 to -0.042, which is reasonable. Now, let's take the mean of these values to arrive at the VaR and also visualize the significance level by highlighting the area.
var_bootstrap = np.mean(bootstrap_VaR)
print(f'The Bootstrap VaR measure is {np.mean(bootstrap_VaR)}')
return np.mean(bootstrap_VaR)
OUTPUT:
╒════════════════╤═════════════╕
│     Stock      │  Bootstrap  │
╞════════════════╪═════════════╡
│ ['TATAMOTORS'] │   -0.0369   │
╘════════════════╧═════════════╛
The value at risk of -0.0369 indicates that, at a 95% confidence level, there will be a maximum loss of 3.69%, or there is a 5% probability that the losses will exceed 3.69%. This is about 0.21 percentage points lower than the historical method, possibly due to the randomness introduced by resampling.
Decay factor method
In both the previous methods, there is no consideration of highs, lows, or market fluctuations, which means there is an inherent assumption that the future trend will be similar to the past year. The decay method addresses this issue by placing a higher weight on recent data. We will use a decay factor between 0 and 1, assigning the lowest weight to the oldest data point and the highest weight to the most recent data point.
decay_factor = 0.5 #we’re picking this arbitrarily
n = len(returns)
wts = [(decay_factor**(i-1) * (1-decay_factor))/(1-decay_factor**n)
for i in range(1, n+1)]
OUTPUT:
0.5,
0.25,
0.125,
0.0625,
0.03125,
....
....
2.210859150104178e-75,
1.105429575052089e-75,
5.527147875260445e-76]
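Because of the normalising denominator (1 - decay_factor**n), the weights sum to 1 and can be read as probabilities; a quick sanity check:
# the weights form a valid probability distribution
print(np.isclose(sum(wts), 1.0))   # True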
We will create a DataFrame that has the weights assigned to each data point.
wts_returns = pd.DataFrame(returns_recent_first['Stock'])
wts_returns['wts'] = wts
OUTPUT:
Stock wts
Date
2023-02-03 0.001461 0.50000
2023-02-02 -0.004142 0.25000
2023-02-01 -0.012055 0.12500
2023-01-31 0.019047 0.06250
2023-01-30 -0.004376 0.03125
....
....
2022-02-07 -0.011986 2.210859e-75
2022-02-04 -0.007730 1.105430e-75
2022-02-03 -0.003752 5.527148e-76
In the previous methods, we sorted the return values in ascending order and took the 13th lowest return value as the VaR. We could do that because each data point had the same weight of 1. In the decay factor method, however, each point has a different weight, so we cannot simply take the 13th lowest return value. Instead, we sum the weights until we hit the 0.05 mark, which is the 5% significance level; to make this easier, we use a cumulative sum.
sort_wts = wts_returns.sort_values(by='Stock')
sort_wts['Cumulative'] = sort_wts.wts.cumsum()
sort_wts
sort_wts = sort_wts.reset_index()
idx = sort_wts[sort_wts.Cumulative <= 0.05].Stock.idxmax()
sort_wts.filter(items = [idx], axis = 0)
OUTPUT:
Date Stock wts Cumulative
63 2022-06-02 -0.012258 6.681912e-52 7.488894e-04
64 2023-02-01 -0.012055 1.250000e-01 1.257489e-01
We find that the cumulative value 0.05 falls between rows 63 and 64. We will have to interpolate to get the value, which turns out to be approximately -0.0122.
xp = sort_wts.loc[idx:idx+1, 'Cumulative'].values
fp = sort_wts.loc[idx:idx+1, 'Stock'].values
var_decay = np.interp(0.05, xp, fp)
OUTPUT:
-0.01217808614447785
Here is the complete method that loads the data, generates and assigns weights, interpolates, and calculates the VaR.
def var_weighted_decay_factor(self):
    returns = obj_loadData.df_returns.copy()
    decay_factor = 0.5  # we're picking this arbitrarily
    n = len(returns)
    wts = [(decay_factor**(i-1) * (1-decay_factor))/(1-decay_factor**n) for i in range(1, n+1)]
    ....
    ....
    return var_decay
OUTPUT:
╒════════════════╤═════════╕
│     Stock      │  Decay  │
╞════════════════╪═════════╡
│ ['TATAMOTORS'] │ -0.0122 │
╘════════════════╧═════════╛
The decay method indicates that, at a 95% confidence level, there will be a maximum loss of 1.22%, or there is a 5% probability that the losses will exceed 1.22%. This is significantly lower than the other two methods, and it is due to the assignment of weights. The decay rate is set to 0.5; we can increase or decrease it to check for the most reasonable VaR. One approach would be to take a range of decay rates and run a simulation to get a range of VaR values, as sketched below.
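As a rough illustration of that idea, one could wrap the steps above in a small helper and sweep a few decay factors. The decay_var function below is a hypothetical convenience, not part of the original code; it mirrors the weighting, sorting, and interpolation steps and assumes (as in this dataset) that the cumulative weight of the worst returns stays below 5%.
def decay_var(returns, decay_factor):
    n = len(returns)
    wts = [(decay_factor**(i - 1) * (1 - decay_factor)) / (1 - decay_factor**n)
           for i in range(1, n + 1)]
    # order most-recent-first so the highest weight lands on the latest return
    recent_first = returns.sort_index(ascending=False).copy()
    recent_first['wts'] = wts
    sort_wts = recent_first.sort_values(by='Stock').reset_index()
    sort_wts['Cumulative'] = sort_wts.wts.cumsum()
    # last row whose cumulative weight is still below the 5% mark
    idx = sort_wts[sort_wts.Cumulative <= 0.05].Stock.idxmax()
    xp = sort_wts.loc[idx:idx + 1, 'Cumulative'].values
    fp = sort_wts.loc[idx:idx + 1, 'Stock'].values
    return np.interp(0.05, xp, fp)

for d in [0.5, 0.7, 0.9, 0.95, 0.99]:
    print(d, round(decay_var(returns, d), 4))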
Monte Carlo simulation method
This method is similar to the bootstrap method; the only difference is that instead of choosing data points from the existing set of return values, we generate a new set of return values from the same distribution. Let us understand it step by step.
We will first calculate the mean and the standard deviation of the returns.
returns_mean = returns['Stock'].mean()
returns_sd = returns['Stock'].std()
We will write a method to generate sets of values from a distribution with a mean of returns_mean and a standard deviation of returns_sd.
iterations = 1000

def simulate_values(mu, sigma, iterations):
    try:
        result = []
        for i in range(iterations):
            tmp_val = np.random.normal(mu, sigma, (len(returns)))
            var_hist = np.percentile(tmp_val, 5, interpolation = 'lower')
            result.append(var_hist)
        return result
    except Exception as e:
        print(f'An exception occurred while generating simulation values: {e}')
Let us now execute the method and compute the VaR for each of the 1,000 iterations.
sim_val = simulate_values(returns_mean,returns_sd, iterations)
tmp_df = pd.DataFrame(columns=['Iteration', 'VaR'])
tmp_df['Iteration'] = [i for i in range(1,iterations+1)]
tmp_df['VaR'] = sim_val
tmp_df.head(50)
print(f'The mean VaR is {statistics.mean(sim_val)}')
OUTPUT:
Iteration VaR
1 -0.034532
2 -0.035278
3 -0.034831
4 -0.033859
....
....
997 -0.035699
998 -0.038877
999 -0.038362
1000 -0.035165
╒════════════════╤═══════════════╕
│     Stock      │  Monte Carlo  │
╞════════════════╪═══════════════╡
│ ['TATAMOTORS'] │   -0.03716    │
╘════════════════╧═══════════════╛
In magnitude, the Monte Carlo VaR is lower than the historical method (-0.0393), higher than the decay method (-0.0122), and more or less the same as the bootstrap method (-0.0366).
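As a quick sanity check (this uses scipy, an extra dependency not used elsewhere in this post): since the simulation draws from a normal distribution with the sample mean and standard deviation, its average VaR should sit close to the analytical 5th percentile of that normal.
from scipy.stats import norm

# analytical 5th percentile of the fitted normal distribution
parametric_var = norm.ppf(0.05, loc=returns_mean, scale=returns_sd)
print(f'Parametric (normal) 5th percentile: {parametric_var:.4f}')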
It would be good to have a function that gives consolidated results from all the approaches in tabular form.
def show_summary(self):
    try:
        var_hist = self.var_historical()
        var_bs = self.var_bootstrap(500)  # 500 bootstrap iterations, as used earlier
        var_decay = self.var_weighted_decay_factor()
        var_MC = self.var_monte_carlo()
        print(tabulate([[self.ticker, var_hist, var_bs, var_decay, var_MC]],
                       headers = ['Historical', 'Bootstrap',
                                  'Decay', 'Monte Carlo'],
                       tablefmt = 'fancy_grid', stralign = 'center',
                       numalign = 'center', floatfmt = ".4f"))
    except Exception as e:
        print(f'An exception occurred while executing show_summary: {e}')
OUTPUT:
╒════════════════╤══════════════╤═════════════╤═════════╤═══════════════╕
│ Stock │ Historical │ Bootstrap │ Decay │ Monte Carlo │
╞════════════════╪══════════════╪═════════════╪═════════╪═══════════════╡
│ ['TATAMOTORS'] │ -0.0393 │ -0.0366 │ -0.0122 │ -0.0373 │
╘════════════════╧══════════════╧═════════════╧═════════╧═══════════════╛
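To express these figures in money terms, as was done for the historical method earlier, the fractional VaR can simply be multiplied by the investment amount. The sketch below reuses the variable names from show_summary and the initial_investment defined at the start, assuming they are in scope.
# translate each fractional VaR into a potential loss on the initial investment
for name, var in [('Historical', var_hist), ('Bootstrap', var_bs),
                  ('Decay', var_decay), ('Monte Carlo', var_MC)]:
    print(f'{name}: 95% confident the loss will not exceed '
          f'{abs(var) * initial_investment:,.0f}')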
The complete code can be found on GitHub.
Advantages of VaR
- It is easy to understand and interpret as a single metric for risk assessment, and it is used as a first-line analysis to gauge the pulse of an investment over a given period.
- It can be used for risk assessment of bonds, shares, or similar asset classes.
Disadvantages of VaR
- The accuracy of the assessment depends on the quality of the data and the assumptions, e.g., it is assumed that the data is normally distributed, economic factors are not considered, etc.
- VaR gives the maximum potential loss for a given period at a specific confidence level, but it doesn't tell us how big the potential losses could be beyond that point.
- It is difficult to use VaR for large portfolios, as the risk calculation has to be done for each of the assets, and the correlations between them have to be factored in as well.
Conclusion
VaR is widely used for buying, selling, and making recommendations because of its simplicity. There are multiple approaches to calculating VaR, and we covered four basic methods using Python. As there is no single standard for the calculation of VaR, different methods produce different results, as we saw in this blog. It is a good tool for a first-line analysis to gauge the risk involved in an asset class, and it is more effective when coupled with more sophisticated methods that factor in market trends, economic conditions, and other financial factors.
I hope you liked the article and found it helpful.
You can connect with me on LinkedIn and GitHub.
Disclaimer:
The blog is only for educational purposes and should not be used as the basis for making any real-world financial decisions.