We used a Random Forest classifier with cross-validation to predict the top-performing mutual funds for 3-year, 5-year and 10-year investments.

A good fund should deliver high returns with low risk, i.e. it should be less volatile. A fund may have returned more than 30% in the past, but if its volatility (risk) is high, it is advisable not to invest in it. We used the information available on the Investopedia website to decide which parameter values characterize a good fund.

Based on Investopedia, a good fund should have the following values:

  • R-squared: between 85 and 100
  • Return: as high as possible
  • Beta: greater than 1
  • Alpha: greater than 0
  • Sharpe ratio: greater than 0
  • Standard deviation: low
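The write-up does not say exactly how these criteria were turned into "good fund" class labels for the classifier. The sketch below shows one possible labelling under stated assumptions: the column names are hypothetical, and "as high as possible" return and "low" standard deviation are interpreted as above and below the median of the data set, respectively.

```python
import pandas as pd


def label_good_funds(df: pd.DataFrame) -> pd.Series:
    """Return 1 for funds meeting every criterion in the list, else 0.

    Assumptions (not from the original analysis): column names are
    hypothetical, and the qualitative "high return" / "low standard
    deviation" criteria are approximated with median cut-offs.
    """
    return_cut = df["return"].median()
    sd_cut = df["std_dev"].median()
    good = (
        df["r_squared"].between(85, 100)  # inclusive on both ends
        & (df["return"] > return_cut)
        & (df["beta"] > 1)
        & (df["alpha"] > 0)
        & (df["sharpe"] > 0)
        & (df["std_dev"] < sd_cut)
    )
    return good.astype(int)


# Made-up example values, for illustration only.
funds = pd.DataFrame({
    "fund": ["A", "B", "C", "D"],
    "r_squared": [90, 70, 95, 88],
    "return": [14.0, 16.0, 9.0, 12.0],
    "beta": [1.1, 1.2, 1.05, 0.9],
    "alpha": [1.5, 2.0, -0.5, 0.8],
    "sharpe": [1.2, 1.4, 0.3, 0.9],
    "std_dev": [11.0, 18.0, 10.0, 12.0],
})
print(label_good_funds(funds).tolist())  # [1, 0, 0, 0]
```

Only fund A passes: B fails the R-squared range, while C and D fall below the median return (13.0).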

The aggregated 3-year, 5-year and 10-year data were used to fit three Random Forest models, one for each investment horizon. The fund parameters used in the analysis were Alpha, Beta, Sharpe Ratio, Standard Deviation, Returns, R-squared and Expense Ratio. For each horizon, Random Forests with n_estimators ranging from 1 to 40 decision trees were evaluated to find the number of trees that gives the best model performance. The 'f1' scoring parameter (the F1 score) was used to evaluate each model.
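The tree-count search described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the original code: the data are synthetic (seven features standing in for the seven fund parameters), and 5-fold cross-validation is an assumption since the number of folds is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for one horizon's fund table: 7 features mirroring
# alpha, beta, Sharpe ratio, std. dev., return, R-squared, expense ratio.
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_redundant=0, random_state=0)


def f1_scores_by_tree_count(X, y, max_trees=40, cv=5, seed=0):
    """Map each n_estimators value (1..max_trees) to its per-fold F1 scores."""
    scores = {}
    for n in range(1, max_trees + 1):
        model = RandomForestClassifier(n_estimators=n, random_state=seed)
        scores[n] = cross_val_score(model, X, y, cv=cv, scoring="f1")
    return scores


scores = f1_scores_by_tree_count(X, y)
# Pick the tree count with the highest mean cross-validated F1 score.
best_n = max(scores, key=lambda n: scores[n].mean())
print(f"best n_estimators = {best_n}, mean F1 = {scores[best_n].mean():.3f}")
```

In the analysis this loop would run three times, once per horizon, on the corresponding labelled fund data.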

We plotted box plots of the cross-validated F1 score against the number of trees for the short-term and long-term investments and chose the model with the best score. The box plots for the 3-year, 5-year and 10-year investments are shown below.

Figure: Random Forest for 3 years

Figure: Random Forest for 5 years

Figure: Random Forest for 10 years
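A box plot of this kind can be drawn with matplotlib from a mapping of tree count to per-fold scores. The sketch below is illustrative only: the function name is hypothetical, and it uses randomly generated placeholder scores so that it runs on its own; it does not reproduce the original figures.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_scores_by_tree_count(scores, title):
    """Draw one box per n_estimators value from a {n: fold_scores} mapping."""
    tree_counts = sorted(scores)
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.boxplot([scores[n] for n in tree_counts])
    # Boxes sit at positions 1..len(tree_counts); label them with the tree counts.
    ax.set_xticks(range(1, len(tree_counts) + 1))
    ax.set_xticklabels([str(n) for n in tree_counts])
    ax.set_xlabel("Number of trees (n_estimators)")
    ax.set_ylabel("Cross-validated F1 score")
    ax.set_title(title)
    return fig


# Placeholder scores (random, not real results): 5 fold scores per tree count.
rng = np.random.default_rng(0)
demo_scores = {n: rng.uniform(0.6, 0.9, size=5) for n in range(1, 41)}
fig = plot_scores_by_tree_count(demo_scores, "Random Forest for 3 years")
```

Calling `fig.savefig("some_name.png")` writes the chart to disk. In practice `demo_scores` would be replaced by the per-fold F1 scores from the tree-count search.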

Based on the results for 1 to 40 decision trees, we selected the n_estimators value giving the highest F1 score for each model and used it to predict the top-performing funds for short-term and long-term investment.
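The final step can be sketched as refitting a Random Forest with the chosen tree count and ranking candidate funds by their predicted probability of being a "good fund". This is an assumed procedure rather than the authors' code: the data are synthetic, the fund names are hypothetical, `best_n = 25` is a placeholder for the value read off the box plots, and ranking by `predict_proba` is one reasonable way to pick "top" funds.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the first 300 rows train the model, the last 10 are
# hypothetical candidate funds to rank.
X, y = make_classification(n_samples=310, n_features=7, n_informative=5,
                           n_redundant=0, random_state=1)
X_train, y_train = X[:300], y[:300]
X_new = X[300:]
fund_names = [f"Fund_{i}" for i in range(len(X_new))]

best_n = 25  # placeholder: use the n_estimators chosen for this horizon

model = RandomForestClassifier(n_estimators=best_n, random_state=0)
model.fit(X_train, y_train)

# Column 1 of predict_proba is the probability of class 1 ("good fund").
proba_good = model.predict_proba(X_new)[:, 1]
ranking = (pd.DataFrame({"fund": fund_names, "p_good": proba_good})
           .sort_values("p_good", ascending=False)
           .reset_index(drop=True))
top_funds = ranking.head(5)
print(top_funds)
```

One such model would be fitted per horizon (3, 5 and 10 years), each with its own selected `best_n`.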