# What is Value at Risk (VaR)?

Value at Risk (VaR) is one of the most common metrics used by individuals, firms, and banks to estimate the extent of potential financial loss over a given period of time. Financial institutions use VaR to assess the risk associated with an investment and to evaluate whether they have sufficient funds to cover potential losses; it also helps risk managers re-evaluate and adjust their investments to reduce the risk of larger losses. The typical questions VaR helps answer are: what is the maximum loss on an investment of 100,000 over one year at a 95% confidence level? Or, what is the probability that losses will exceed 3% next year? The standard way of representing these scenarios is in the form of a normal distribution, which makes the analysis and interpretation much easier.

There are three important factors at play: the confidence level, the time period, and the investment amount or expected percentage of loss.
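For instance, converting a percentage VaR into a currency amount is a single multiplication. The 3.9% figure below is purely illustrative:

```python
# Hypothetical numbers for illustration: a one-year VaR of 3.9%
# at a 95% confidence level on an investment of 100,000
initial_investment = 100_000
var_pct = 0.039

# The monetary VaR is simply the investment scaled by the percentage loss
var_amount = initial_investment * var_pct
print(f"We are 95% confident the one-year loss will not exceed {var_amount:,.0f}")
```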

In this blog, we will cover the following topics:

• Exploratory data analysis (EDA)
• Calculation of returns
• Methodologies to calculate VaR
- Historical method
- Bootstrap method
- Decay factor method
- Monte Carlo simulation method

# Exploratory data analysis (EDA)

These are standard libraries for data loading, analysis, and visualization. The key point to note is the use of nsepy for fetching the stock data. An alternative is to use Yahoo Finance.

```python
import numpy as np
import pandas as pd
import warnings
import matplotlib.pyplot as plt
from datetime import date
from tabulate import tabulate
from nsepy import get_history as gh
```

## Load the data and initial settings

We will use the last one year of data for TATA MOTORS, though this could be any stock of interest, such as ICICI or DABUR. This gives the daily value of the stock price.

```python
initial_investment = 100000
startdate = date(2022,2,2)
end_date = date(2023,2,2)
stocksymbols = ['TATAMOTORS']    # This can be any stock

def load_stock_data(self):
    df = pd.DataFrame()
    for i in range(len(self.ticker)):
        ....
        ...
    return df
```

Output:

```
            TATAMOTORS
Date
2022-10-14      396.25
2022-10-17      396.10
2022-10-18      404.25
2022-10-19      399.05
2022-10-20      398.10
...                ...
2023-01-27      445.60
2023-01-30      443.65
2023-01-31      452.10
2023-02-01      446.65
2023-02-02      444.80
```

## Calculate the returns

The calculation of returns is simple: we take the percentage change between the current value and the previous value.

(P(t+1) - P(t)) / P(t)

This can be achieved easily in Python using the pct_change method. Let's calculate and visualize the returns.

```python
def stock_returns(self):
    df = self.load_stock_data()
    df.columns = ['Stock']
    returns = df.pct_change()
    returns.dropna(inplace=True)
    return returns
```

Output:

```
               Stock
Date
2022-10-17 -0.000379
2022-10-18  0.020576
2022-10-19 -0.012863
2022-10-20 -0.002381
2022-10-21 -0.000126
```

We have accessed the data, loaded it, calculated the returns, and visualized the trends. Let us now try the four methods of calculating VaR in the next sections.

# Historical method

This is the simplest and most basic of all the methods, as it gives no importance to the distribution and its tails. In this method, we take the return values from the previous section and sort them. As a standard, we take a confidence level of 95%, i.e., our focus shifts to the bottom 5%. Since there are 252 trading days in a year, all we need to do is calculate 5% of 252 days, which is 12.6, meaning we take the 13th lowest return, which turns out to be -0.039306 as shown below.

```python
returns.sort_values('Stock').head(13)
```

Output:

```
2022-02-24 -0.102830
2022-09-26 -0.060506
2022-03-07 -0.055722
2022-02-14 -0.054926
2022-06-16 -0.051075
2022-06-13 -0.049877
2022-11-10 -0.048367
2022-03-04 -0.045413
2022-05-06 -0.041637
2022-05-12 -0.040835
2022-12-23 -0.040816
2022-05-19 -0.039745
2022-10-10 -0.039306
```

Python has a more elegant way to get to this value using NumPy's percentile function.

```python
np.percentile(returns['Stock'], 5, interpolation = 'lower')
```

Output:

```
-0.039306077884265433
```
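Note that NumPy 1.22 renamed the `interpolation` keyword of `np.percentile` to `method`; on recent NumPy versions the same call looks like this (the toy returns below are for illustration only):

```python
import numpy as np

# A small illustrative set of daily returns
toy_returns = np.array([-0.05, -0.03, -0.01, 0.00, 0.01, 0.02, 0.04])

# 'lower' snaps to the nearest observed value at or below the percentile,
# so the VaR is always an actual historical return
var_hist = np.percentile(toy_returns, 5, method='lower')
print(var_hist)
```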

Here is the complete method that loads the data and calculates the VaR. The end result is a nice tabular format.

```python
def var_historical(self):
    returns = obj_loadData.df_returns.copy()
    ....
    ....
    returns.sort_values('Stock').head(13)
    var_hist = np.percentile(returns['Stock'], 5, interpolation = 'lower')
    print(tabulate([[self.ticker, avg_rets, avg_std, var_hist]],
                   headers = ['Mean', 'Standard Deviation', 'VaR %'],
                   tablefmt = 'fancy_grid', stralign='center',
                   numalign='center', floatfmt=".4f"))
    return var_hist

def plot_shade(self, var_returns):
    ....
    plt.text(var_returns, 25, f'VAR {round(var_returns, 4)} @ 5%',
             horizontalalignment='right',
             size='small',
             color='navy')
    ....
    plt.gca().add_patch(rect)
```

Output:

```
╒════════════════╤════════╤══════════════════════╤═══════════╕
│                │  Mean  │  Standard Deviation  │    VaR    │
╞════════════════╪════════╪══════════════════════╪═══════════╡
│ ['TATAMOTORS'] │ 0.0003 │        0.0225        │ -0.039306 │
╘════════════════╧════════╧══════════════════════╧═══════════╛
```

The value at risk of -0.039306 indicates that at a 95% confidence level, there will be a maximum loss of 3.9%, or there is a 5% probability that the losses will exceed 3.9%. In monetary terms, for an investment of 100,000, we are 95% confident that the maximum loss will be 3,930.

# Bootstrap method

The Bootstrap method is similar to the historical method, but in this case we resample the returns many times (100, 1,000, or more), calculate the VaR on each sample, and finally take the average VaR. This is similar to the resampling done in the data science space, where a dataset is resampled many times and the model retrained on each sample.

```python
def var_bootstrap(self, iterations: int):

    def var_boot(data):
        ...
        ...
        return np.percentile(dff, 5, interpolation = 'lower')

    def bootstrap(data, func):
        sample = np.random.choice(data, len(data))
        return func(sample)

    def generate_sample_data(data, func, size):
        bs_replicates = np.empty(size)
        ....
        ....

    returns = obj_loadData.df_returns.copy()
    ....
    return np.mean(bootstrap_VaR)
```
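Since parts of the method above are elided, here is a self-contained sketch of the same idea; the synthetic normal returns (mean 0.0003, standard deviation 0.0225, matching the summary statistics reported earlier) are an assumption standing in for the real series:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic daily returns standing in for the real TATAMOTORS series
returns = rng.normal(loc=0.0003, scale=0.0225, size=252)

def bootstrap_var(data, iterations=500, alpha=5):
    """Resample with replacement, compute the alpha-percentile VaR on each
    sample, and return the average across iterations."""
    vars_ = np.empty(iterations)
    for i in range(iterations):
        sample = rng.choice(data, size=len(data), replace=True)
        vars_[i] = np.percentile(sample, alpha, method='lower')
    return vars_.mean()

print(f'Bootstrap VaR: {bootstrap_var(returns):.4f}')
```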

We have resampled the data over 500 iterations and calculated the VaR on each run. As the process is random, the values are expected to be in the same vicinity as the historical method's, i.e., -0.039306. Let us plot the distribution to validate our understanding.

The gray rectangle represents the return values, and it falls in the range of -0.033 to -0.042, which is good. Now let's take the mean of these values to arrive at the VaR, and also visualize the significance level by highlighting the area.

```python
var_bootstrap = np.mean(bootstrap_VaR)
print(f'The Bootstrap VaR measure is {np.mean(bootstrap_VaR)}')
return np.mean(bootstrap_VaR)
```

Output:

```
╒════════════════╤═════════════╕
│     Stock      │  Bootstrap  │
╞════════════════╪═════════════╡
│ ['TATAMOTORS'] │   -0.0369   │
╘════════════════╧═════════════╛
```

The value at risk of -0.0369 indicates that at a 95% confidence level, there will be a maximum loss of 3.69%, or there is a 5% probability that the losses will exceed 3.69%. This is 0.24 percentage points smaller in magnitude than the historical method's, possibly due to the randomness introduced by resampling.

# Decay factor method

In both the previous methods, there is no consideration for highs, lows, or market fluctuations, which means there is an inherent assumption that the future trend will resemble the past year. The decay method addresses this issue by placing a higher weight on recent data. We will use a decay factor between 0 and 1, i.e., assign the lowest weight to the oldest data point and the highest weight to the most recent data point.

```python
decay_factor = 0.5  # we're picking this arbitrarily
n = len(returns)
wts = [(decay_factor**(i-1) * (1-decay_factor))/(1-decay_factor**n)
       for i in range(1, n+1)]
```

Output:

```
[0.5,
 0.25,
 0.125,
 0.0625,
 0.03125,
 ...
 2.210859150104178e-75,
 1.105429575052089e-75,
 5.527147875260445e-76]
```
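The denominator (1 - decay_factor**n) normalises the geometric series, so the weights sum to one. This is easy to verify:

```python
decay_factor = 0.5
n = 252  # roughly one year of trading days

wts = [(decay_factor**(i - 1) * (1 - decay_factor)) / (1 - decay_factor**n)
       for i in range(1, n + 1)]

# The weights form a normalised geometric series and sum to 1
print(sum(wts))
```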

We will create a data frame that has a weight assigned to each data point.

```python
wts_returns = pd.DataFrame(returns_recent_first['Stock'])
wts_returns['wts'] = wts
```

Output:

```
               Stock           wts
Date
2023-02-03  0.001461       0.50000
2023-02-02 -0.004142       0.25000
2023-02-01 -0.012055       0.12500
2023-01-31  0.019047       0.06250
2023-01-30 -0.004376       0.03125
...              ...           ...
2022-02-07 -0.011986  2.210859e-75
2022-02-04 -0.007730  1.105430e-75
2022-02-03 -0.003752  5.527148e-76
```

In the previous methods, we sorted the return values in ascending order and took the 13th lowest return value as the VaR. We could do that because each data point had the same weight of 1. In the decay factor method, each point has a different weight, so we cannot simply take the 13th lowest return. Instead, we sum the weights until we hit the 0.05 mark, which is the 5% significance level; to make this easier, we use a cumulative sum.

```python
sort_wts = wts_returns.sort_values(by='Stock')
sort_wts['Cumulative'] = sort_wts.wts.cumsum()
sort_wts
sort_wts = sort_wts.reset_index()
idx = sort_wts[sort_wts.Cumulative <= 0.05].Stock.idxmax()
sort_wts.filter(items = [idx], axis = 0)
```

Output:

```
         Date     Stock           wts    Cumulative
63 2022-06-02 -0.012258  6.681912e-52  7.488894e-04
64 2023-02-01 -0.012055  1.250000e-01  1.257489e-01
```

We find that the cumulative value of 0.05 falls between rows 63 and 64. We will have to interpolate to get the value, which turns out to be -0.0122.

```python
xp = sort_wts.loc[idx:idx+1, 'Cumulative'].values
fp = sort_wts.loc[idx:idx+1, 'Stock'].values
var_decay = np.interp(0.05, xp, fp)
```

Output:

```
-0.01217808614447785
```

Here is the complete method that loads the data, generates and assigns weights, interpolates, and calculates the VaR.

```python
def var_weighted_decay_factor(self):
    returns = obj_loadData.df_returns.copy()
    decay_factor = 0.5  # we're picking this arbitrarily
    n = len(returns)
    wts = [(decay_factor**(i-1) * (1-decay_factor))/(1-decay_factor**n)
           for i in range(1, n+1)]
    ....
    ....
    return var_decay
```

Output:

```
╒════════════════╤═════════╕
│     Stock      │  Decay  │
╞════════════════╪═════════╡
│ ['TATAMOTORS'] │ -0.0122 │
╘════════════════╧═════════╛
```

The decay method indicates that at a 95% confidence level, there will be a maximum loss of 1.22%, or there is a 5% probability that the losses will exceed 1.22%. This is significantly lower than the other two methods, and it is due to the assignment of weights. The decay rate is set to 0.5; we can increase or decrease it to check for the most reasonable VaR. One approach is to take a range of decay rates and run a simulation to get a range of VaRs.
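That sweep over decay rates can be sketched as follows; the synthetic returns and the decay_var helper are illustrative assumptions, not the exact code used above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic daily returns (most recent first) standing in for the real series
returns = pd.Series(rng.normal(0.0003, 0.0225, 252))

def decay_var(returns, decay_factor, alpha=0.05):
    """Weighted-decay VaR: sort returns, accumulate weights, and
    interpolate the return at the alpha cumulative-weight mark."""
    n = len(returns)
    wts = np.array([(decay_factor**(i - 1) * (1 - decay_factor))
                    / (1 - decay_factor**n) for i in range(1, n + 1)])
    df = pd.DataFrame({'Stock': returns.values, 'wts': wts})
    df = df.sort_values('Stock')
    df['Cumulative'] = df['wts'].cumsum()
    return np.interp(alpha, df['Cumulative'].values, df['Stock'].values)

for d in (0.5, 0.9, 0.99):
    print(f'decay_factor={d}: VaR={decay_var(returns, d):.4f}')
```

As the decay factor approaches 1, the weights approach equal weighting and the result converges toward the plain historical VaR.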

# Monte Carlo simulation method

This method is similar to the Bootstrap method; the only difference is that instead of resampling from the existing set of return values, we generate new sets of return values from the same distribution. Let us understand it step by step.

We will calculate the mean and the standard deviation of the returns.

```python
returns_mean = returns['Stock'].mean()
returns_sd = returns['Stock'].std()
```

We will write a method to generate sets of values from a normal distribution with mean returns_mean and standard deviation returns_sd.

```python
iterations = 1000

def simulate_values(mu, sigma, iterations):
    try:
        result = []
        for i in range(iterations):
            tmp_val = np.random.normal(mu, sigma, (len(returns)))
            var_hist = np.percentile(tmp_val, 5, interpolation = 'lower')
            result.append(var_hist)
        return result
    except Exception as e:
        print(f'An exception occurred while generating simulation values: {e}')
```

Let us now execute the method and calculate the VaR for each of the 1000 iterations.

```python
import statistics  # needed for statistics.mean below

sim_val = simulate_values(returns_mean, returns_sd, iterations)
tmp_df = pd.DataFrame(columns=['Iteration', 'VaR'])
tmp_df['Iteration'] = [i for i in range(1, iterations+1)]
tmp_df['VaR'] = sim_val
tmp_df.head(50)
print(f'The mean VaR is {statistics.mean(sim_val)}')
```

Output:

```
Iteration        VaR
1          -0.034532
2          -0.035278
3          -0.034831
4          -0.033859
...
997        -0.035699
998        -0.038877
999        -0.038362
1000       -0.035165

╒════════════════╤═══════════════╕
│     Stock      │  Monte Carlo  │
╞════════════════╪═══════════════╡
│ ['TATAMOTORS'] │   -0.03716    │
╘════════════════╧═══════════════╛
```

The VaR is smaller in magnitude than the historical method's (-0.0393) and larger than the decay method's (-0.0122), and it is more or less the same as the bootstrap method's (-0.0366).

It would be good to have a function that gives consolidated results from all the approaches in tabular form.

```python
def show_summary(self):
    try:
        var_hist = self.var_historical()
        var_bs = self.var_bootstrap()
        var_decay = self.var_weighted_decay_factor()
        var_MC = self.var_monte_carlo()
        print(tabulate([[self.ticker, var_hist, var_bs, var_decay, var_MC]],
                       headers = ['Historical', 'Bootstrap',
                                  'Decay', 'Monte Carlo'],
                       tablefmt = 'fancy_grid', stralign='center',
                       numalign='center', floatfmt=".4f"))
    except Exception as e:
        print(f'An exception occurred while executing show_summary: {e}')
```

Output:

```
╒════════════════╤══════════════╤═════════════╤═════════╤═══════════════╕
│      Stock     │  Historical  │  Bootstrap  │  Decay  │  Monte Carlo  │
╞════════════════╪══════════════╪═════════════╪═════════╪═══════════════╡
│ ['TATAMOTORS'] │   -0.0393    │   -0.0366   │ -0.0122 │    -0.0373    │
╘════════════════╧══════════════╧═════════════╧═════════╧═══════════════╛
```

The complete code can be found on GitHub.

# Advantages of VaR

• It is easy to understand and interpret as a single metric for risk assessment, and it serves as a first-line analysis to gauge the pulse of an investment over a given period.
• It can be used for risk assessment of bonds, shares, and similar asset classes.

# Limitations of VaR

• The accuracy of the assessment depends on the quality of the data and the assumptions, e.g., the data is assumed to be normally distributed, economic factors are not considered, etc.
• VaR gives the maximum potential loss for a given period at a specific confidence level, but it does not tell us how big or small the losses beyond that point will be.
• It is difficult to use VaR for large portfolios, as the risk calculation must be done for each asset, and the correlations between assets have to be factored in as well.
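To illustrate the correlation point, a parametric (variance-covariance) portfolio VaR can be sketched as follows; the two synthetic assets, their covariance, and the 60/40 weights are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two synthetic, positively correlated daily return series
true_cov = np.array([[0.0004, 0.0002],
                     [0.0002, 0.0009]])
rets = rng.multivariate_normal([0.0003, 0.0005], true_cov, size=252)

weights = np.array([0.6, 0.4])        # portfolio allocation
cov = np.cov(rets, rowvar=False)      # estimated covariance matrix
port_mu = weights @ rets.mean(axis=0)
port_sigma = np.sqrt(weights @ cov @ weights)

z = 1.645  # one-sided z-score for a 95% confidence level
var_parametric = port_mu - z * port_sigma  # negative, like the percentile VaRs above
print(f'1-day parametric portfolio VaR at 95%: {var_parametric:.4f}')
```

The covariance term is what single-asset VaR ignores: the higher the correlation between the assets, the larger the portfolio VaR for the same individual volatilities.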

# Conclusion

VaR is a widely used metric for buying, selling, and recommending assets because of its simplicity. There are multiple approaches to calculating VaR, and we covered four basic methods using Python. As there is no single protocol or standard for the calculation of VaR, different methods yield different results, as we saw in this blog. It is a good first-line analysis to gauge the risk involved in an asset class, and it is more effective when coupled with sophisticated methods that factor in market trends, economic conditions, and other financial factors.

I hope you liked the article and found it helpful.

You can connect with me on LinkedIn and GitHub.

## Disclaimer

The blog is only for educational purposes and should not be used as the basis for making any real-world financial decisions.



Data Science enthusiast with experience in machine learning and passionate about building analytic apps. http://www.linkedin.com/in/amitvkulkarni2