CMSC320 Final Project
Name: Ryan Downing
Due: May 18, 2020
In the past few decades, the world of finance has been transformed by data. Now more than ever, companies are using big data in an attempt to predict the future. In the stock market specifically, investors want to predict whether a stock's share price will appreciate or depreciate in the future.
This tutorial will walk through the data science process using data from: https://www.quandl.com/tables/SF1/SHARADAR-SF1
We will explore how visualizations can help investors understand market conditions, and then apply more advanced machine learning techniques to predict future performance. In particular, this tutorial will compare the prediction performance of a multivariate logistic regression against a random forest classifier.
import datetime
import requests
import psycopg2
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
SEED = 1234
np.random.seed(SEED)
CONNECTION_PARAMS = {
'database': ## REDACTED ##
'user': ##################
'password': ##############
'host': ##################
'port': #### REDACTED ####
}
conn = psycopg2.connect(**CONNECTION_PARAMS)
cur = conn.cursor()
The data used in this tutorial comes from a SQL database as defined above. To concentrate the scope of this analysis, we will look at the companies that are currently in the S&P 500 (taken from Wikipedia). In addition, we will restrict ourselves to data between January 1st, 2005 and December 31st, 2019. This gives us 15 years of data to work with, of which 12 years (2005-2016) will be used for training the model, and 3 years (2017-2019) will be reserved to test the model.
Each entry in the database represents an SEC 10-Q filing for a company, identified by `ticker`. The attributes represent the different data elements of the SEC filing. For this analysis, we will focus on those given by the income statement, statement of cash flows, and balance sheet. We will also look at many of the common metrics used by investors to assess companies. More thorough documentation of these attributes can be found here.
The code below constructs a SQL query to pull the necessary data from the database. To do this we must first pull the table of S&P 500 companies from the Wikipedia page and select the `Symbol` column, which gives us the ticker of each company. We must then create a list of all the attributes we want to pull from the database, `features`.
Since we are interested in the change of share price for a given ticker over time, I also add `datekey`, `ticker`, and `price` to the attributes list. We can then build a string query for the SQL database by joining the ticker conditions with `OR` and the date conditions with `AND`, and finally combining both of these with `AND` for the `WHERE` clause of the query.
The database also needs to know the list of parameters these conditions refer to, so we can simply build a list by combining the tickers with the start and end dates. Now we can create the full query using the `SELECT`, `FROM`, `WHERE` template.
Note: `dimension = 'ARQ'` refers to "as-reported quarterly," which avoids future bias; more information can be found here.
table = 'fundamental_raw' # table name in database
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
sp_tickers = pd.read_html(url)[0]['Symbol'].unique() # Gets table from wikipedia and selects unique tickers
start_date = datetime.datetime(year=2005, month=1, day=1)
end_date = datetime.datetime(year=2019, month=12, day=31)
# Income statement
IS = ['revenue', 'cor', 'sgna', 'rnd', 'opex',
'intexp', 'taxexp', 'ebit', 'netinc', 'eps']
# Statement of Cash flows
CF = ['capex', 'ncfbus', 'ncfinv', 'ncff',
'ncfdebt', 'ncfcommon', 'ncfdiv', 'ncfi',
'ncfo', 'ncfx', 'ncf', 'sbcomp', 'depamor']
# Balance Sheet
BS = ['assets', 'cashneq', 'investments', 'investmentsc',
'investmentsnc', 'deferredrev', 'deposits','ppnenet',
'inventory', 'taxassets', 'receivables', 'payables',
'intangibles', 'liabilities', 'equity', 'retearn',
'assetsc', 'assetsnc', 'liabilitiesc', 'liabilitiesnc',
'taxliabilities', 'debt', 'debtc', 'debtnc']
# Common Metrics
MT = ['marketcap', 'ev', 'fcf', 'gp', 'opinc', 'grossmargin',
'netmargin', 'ebitdamargin', 'payoutratio', 'evebitda',
'evebit', 'pe', 'sps', 'ps', 'pb', 'de', 'divyield',
'currentratio', 'workingcapital', 'fcfps', 'bvps']
features = IS + CF + BS + MT # create a single list of all the features we are interested in
attributes = ['datekey', 'ticker', 'price'] + features # Create list of all attributes
ticker_conds = ' OR '.join(['ticker = %s' for ticker in sp_tickers]) # query for each ticker
date_conds = 'datekey >= %s AND datekey <= %s' # date query
query_conds = f'({ticker_conds}) AND ({date_conds}) AND (dimension = \'ARQ\')' # combine to create full "WHERE" query
attrs_query = ', '.join(attributes)
params = list(sp_tickers) + [start_date, end_date] # create list of parameters for SQL query
fundamental_query = f'SELECT {attrs_query} FROM {table} WHERE {query_conds}' # create full query
Now that we have the full query constructed, we can execute it against the database using `cur.execute`. The response gives us the full dataset, which we can put straight into a pandas dataframe using the column names received from `cur.description`.
# Execute query and store into dataframe
cur.execute(fundamental_query, params)
response = cur.fetchall()
col_names = [desc.name for desc in cur.description]
fund = pd.DataFrame(data=response, columns=col_names)
fund.head()
To get this data into a usable format for analysis, we must do some cleaning to prepare it. Fortunately, this database is already in a fairly clean format. However, we must convert the `datekey` column to a `datetime` datatype and then rename it `date`. Once we have this, we can easily extract the year using `dt.year` to create a `year` column.
Since companies have the ability to post corrections to their SEC filings, there is a chance that a company has multiple reports for a single date. To correct for this, we should drop duplicate rows with respect to `ticker` and `date`.
fund['datekey'] = pd.to_datetime(fund['datekey'])
fund.rename(columns={'datekey': 'date'}, inplace=True)
fund['year'] = fund['date'].dt.year
fund.drop_duplicates(subset=['date', 'ticker'], keep='last', inplace=True)
fund.head()
Since we are not concerned with the share price itself, but rather the change in share price, we must calculate the percent change in price between report periods for each ticker. To do this we first ensure the data is sorted by date so the percent change is calculated between consecutive periods. Then we can group by `ticker` and use pandas' `pct_change()` method to calculate the price change for each ticker.
Calculating the change results in a `NaN` in the first row for each ticker, so we should drop these rows. Then we transform `price_change` into a binary `up` or `down` signal using `np.where`.
fund.sort_values('date', inplace=True)
fund['price_change'] = fund.groupby('ticker')['price'].pct_change()
fund.dropna(subset=['date', 'ticker', 'price', 'price_change'], inplace=True)
fund['movement'] = np.where(fund['price_change'] >= 0, 'up', 'down')
fund.head()
Since we are interested in creating a model to predict future price movement, let's first take a look at the data we are working with. To get an understanding of how `price_change` is distributed, we can create `distplot`s faceted on year to see how the distribution varies across years. We can immediately see that `price_change` tends to be normally distributed with a mean just above 0. There are exceptions to this, as we see in 2008 and 2009 (during the Great Financial Crisis), when the stock market experienced a strong decline as well as increased volatility, as seen by the increased spread.
g = sns.FacetGrid(fund, col='year', col_wrap=5) # create facet grid
g.map(sns.distplot, 'price_change') # plot yearly histograms
plt.xlim(-1, 1)
plt.show()
We can also take a look at the price trends for individual stocks, or a group of stocks as shown below. To do this we group by `ticker` once again and take a cumulative sum of the price changes for each ticker. With the cumulative returns stored in a new column, we can easily plot them across dates. I encourage you to change `ticker_subset` to look at different companies' performance.
ticker_subset = ['FB', 'AMZN', 'NFLX', 'GOOGL', 'AAPL']
df = fund[['date', 'ticker', 'price_change']].copy() # make a copy for plotting
df = df[df['ticker'].isin(ticker_subset)] # select only interested tickers
df['returns'] = df.groupby('ticker')['price_change'].cumsum() # calculate cumulative returns
plt.rcParams['figure.figsize'] = [12, 6] # adjust plot size
sns.lineplot(data=df, x='date', y='returns', hue='ticker')
plt.show()
Because our model is only concerned with whether the price moved up or down, and not by how much, we should take a look at how our `movement` variable changes over time. To do this we group by `year` and then calculate the proportion of `up` movements in each year. Since the total number of `movement`s is given by `x.shape[0]`, we can simply sum all the occurrences of `up` and then divide.
In the plot we can see that between 2005 and 2020, the proportion of `up` movements fluctuates between 50 and 75 percent, with the exception of 2008, which only experienced about 30% up movements.
yearly_movement = fund.groupby('year')['movement'].apply(lambda x: (x == 'up').sum() / x.shape[0])
yearly_movement.plot()
We can take a more detailed look at this by plotting the individual price changes across time. Since this results in a lot of data points, I take a 10% sample from the original dataset to use for plotting. Furthermore, we can create a `cut` function to label the market capitalization (`marketcap`) of each data point into three equal groups: small cap, medium cap, and large cap.
The plot supports what we concluded previously. The distribution appears to be normal, centered slightly above 0. We can also see the drawdown from 2008-2010 in a more detailed way, as many companies experience negative price changes.
def cut(x, bins, labels=None):
"""Splits x into [bins] equal groups"""
percentiles = np.arange(0, 100, 100 / bins)
cutoffs = np.nanpercentile(x, percentiles) # value of each percentile in x
out = np.digitize(x, cutoffs) # value of each group the respective x value falls in
if labels: # convert bins to label names if labels are given
lab_dict = dict(zip(range(1, bins+1), labels))
return pd.Series(out).replace(lab_dict).values
else:
return out
fund['mktcap_bin'] = cut(fund['marketcap'], 3, ['small cap', 'medium cap', 'large cap'])
plot_data = fund.sample(fund.shape[0] // 10, random_state=SEED) # select 10% of data points
sns.scatterplot(x='date', y='price_change', hue="mktcap_bin", alpha=.5,
data=plot_data, hue_order=['small cap', 'medium cap', 'large cap'])
del plot_data # no longer needed, clear from memory
plt.xlim(start_date, end_date)
plt.ylim(-1, 1)
plt.show()
Now that we have assigned a bin for the marketcap of each stock, let's take a closer look at how the `price_change` distribution varies between the groups. We can do this by creating violin plots for each marketcap bin where the y-axis is `price_change`. This graph shows us that while the mean `price_change` is about the same for each group, the spread decreases as marketcap increases. This makes sense because we expect larger companies to have lower volatility, which can be attributed to larger trading volumes, clearer news coverage of large companies, or many other factors.
# sort by mktcap_bin so the x-axis is in order: small, medium, large
sns.violinplot(data=fund.sort_values('mktcap_bin', ascending=False), inner='quartile', x='mktcap_bin', y='price_change')
plt.ylim(-1, 1)
plt.show()
Lastly, we can get some idea of the relationships between features by constructing a correlation matrix. For this example I will just look at the balance sheet attributes, in addition to the price. Still focusing on change, we calculate the `pct_change` for each attribute, grouping first by ticker and then selecting the relevant attributes, `attrs`. We then call pandas' `corr()` method to create a correlation matrix and use `plt.matshow` to plot it.
This matrix is not very telling, but it begins to give us an idea of how the different features are related. We are most concerned with the correlations with `price`, as seen in the last row and column. There are no strong correlations there, so perhaps combining some of these features with machine learning will be more useful. We do see stronger correlations where we would expect them, between the different asset and liability classes, which should trend together.
attrs = BS + ['price']
correlations = fund.groupby('ticker')[attrs].pct_change().corr()
plt.rcParams['figure.figsize'] = [15, 15]
plt.matshow(correlations)
plt.xticks(range(correlations.shape[1]), correlations.columns, rotation=90, fontsize=12)
plt.yticks(range(correlations.shape[1]), correlations.columns, fontsize=12)
cb = plt.colorbar()
cb.ax.tick_params(labelsize=14)
plt.show()
From our previous exploratory data analysis (EDA), we found that there may be a relationship between the standard deviation of a company's price changes and that company's marketcap. We can test this hypothesis using a linear fit against the relevant data. First we summarize each ticker, taking the mean marketcap and the standard deviation of price changes. We also take the log of the marketcaps to even out the distribution, since there are many small companies and only a few very large ones.
ticker_group = fund.groupby('ticker')
ticker_data = pd.DataFrame(index=fund['ticker'].unique()) # create new dataframe to hold data for each ticker
ticker_data['ticker'] = ticker_data.index
ticker_data['std'] = ticker_group['price_change'].std()
ticker_data['marketcap'] = ticker_group['marketcap'].mean()
ticker_data['log_mktcap'] = np.log(ticker_data['marketcap'].values)
ticker_data.dropna(inplace=True)
We can plot this relationship using seaborn's `lmplot`, which shows a slight negative relationship between these factors.
sns.lmplot(data=ticker_data, x='log_mktcap', y='std', aspect=2, scatter_kws={'alpha': .25})
plt.ylim(0, .5)
plt.show()
We can also use statsmodels to see the statistics of this fit. While the R-squared value is poor, accounting for only 1.8% of the variance, the $p\text{-value} = .003 \leq \alpha = .05$, which means there is a statistically significant relationship between a company's (log) marketcap and its share price's volatility.
import statsmodels.formula.api as smf
model = smf.ols('std~log_mktcap', data=ticker_data)
results = model.fit()
results.summary()
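If we want these statistics programmatically rather than reading them off the summary table, the fitted `results` object exposes them directly. A minimal sketch using the `results` object above:
# Pull key statistics straight from the fitted OLS results
print(f"R-squared: {results.rsquared:.3f}")
print(f"log_mktcap coefficient: {results.params['log_mktcap']:.4f}")
print(f"log_mktcap p-value: {results.pvalues['log_mktcap']:.4f}")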
Now that we have done some basic hypothesis testing, we can move on to more advanced machine learning techniques. To do this we must first convert the `movement` label to a binary `[1, 0]` score so it can be used in our models. As previously mentioned, we split the data at 2017/01/01 into a training set and an out-of-sample (oos) testing set.
Since we want to predict future price movement, we shift the `movement` column up by 1 so it refers to the next change in share price.
data = fund.copy()
data['movement'] = data.groupby('ticker')['movement'].shift(-1) # movement now associated with future movement
data['mvmt_binary'] = np.where(data['movement'] == 'up', 1, 0)
# Split training and testing set at January 1st, 2017
predictor_df = data[data['date'] < datetime.datetime(year=2017, month=1, day=1)]
oos_df = data[data['date'] >= datetime.datetime(year=2017, month=1, day=1)]
# Split into X and Y and convert to numpy matrices
X = predictor_df[features].values
y = predictor_df['mvmt_binary'].values
X_oos = oos_df[features].values
y_oos = oos_df['mvmt_binary'].values
We will be comparing the prediction performance of a logistic regression and a random forest, so we first import all the necessary modules from `sklearn`.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import auc
For the logistic regression we adjust the maximum number of iterations to be the length of the dataset to ensure the model converges. For the random forest I will be using 250 decision trees with a max depth of 10, and I also require at least 10 samples to split at a decision node.
Since the data is not perfectly clean, we will use sklearn's `SimpleImputer` to fill `NaN` values with the mean of the values for that feature. We then use `fit_transform` to clean our data sets and fit our models using the training data. Once the models are fitted, we use `predict_proba` to score our out-of-sample data against each model.
# Create classification models
log_reg = LogisticRegression(max_iter=X.shape[0])
rf = RandomForestClassifier(n_estimators=250, max_depth=10,
min_samples_split=10, max_features="auto")
imp = SimpleImputer(missing_values=np.nan, strategy='mean') # object for cleaning data
# Clean training and testing sets
train_clean = imp.fit_transform(X)
test_clean = imp.fit_transform(X_oos)
# Fit models
log_reg.fit(train_clean, y)
rf.fit(train_clean, y)
# Get scores of models
log_scores = log_reg.predict_proba(test_clean)[:,1]
rf_scores = rf.predict_proba(test_clean)[:,1]
We can look at the coefficients from the logistic regression and the feature importances from the random forest to understand how the different models are weighting the various features.
def pretty_dict(val, per_row=5):
"""Prints dict items to stdout where per_row refers
to the number of items printed on each line"""
out = ""
for i, (k, v) in enumerate(val.items(), 1):
out += f"'{k}': {v}, "
if i % per_row == 0:
out += '\n'
return out
# Create dict of features' coefficients, sort, and format for printing
log_coefs = dict(zip(features, log_reg.coef_[0]))
log_coefs = sorted(log_coefs.items(), key=lambda item: item[1], reverse=True)
log_coefs = {k: '{:.3e}'.format(v) for k, v in log_coefs}
# Create dict of features' importances, sort, and format for printing
rf_importances = dict(zip(features, rf.feature_importances_))
rf_importances = sorted(rf_importances.items(), key=lambda item: item[1], reverse=True)
rf_importances = {k: round(v, 3) for k, v in rf_importances}
print(f"Logistic Regression Coefficients:\n{pretty_dict(log_coefs, 4)}")
print(f"Random Forest Feature Importances:\n{pretty_dict(rf_importances, 5)}")
We can also export one of the decision trees from the random forest to an image to see the actual decision path of one of the trees in the model.
from sklearn.tree import export_graphviz
from IPython.display import Image
import pydot
estimator = rf.estimators_[0]
# Export decision tree as dot file
export_graphviz(estimator, out_file='tree.dot',
feature_names = features,
class_names = None,
rounded = True, proportion = False,
precision = 2, filled = True,
max_depth=3)
# Use pydot to convert dot file to png
(graph,) = pydot.graph_from_dot_file('tree.dot')
graph.write_png('tree.png')
Image(filename='tree.png') # display image
Using the scores computed from `predict_proba`, we can calculate the true positive rate, false positive rate, precision, and negative predictive value at various thresholds. Normally we would use a threshold of 0.5; however, by looking at these rates, we can find a more optimal cutoff to improve our overall model.
def get_metrics(scores, y_oos, points=1001):
metrics = []
pos = y_oos == 1 # boolean mask for where out-of-sample is 1
neg = y_oos == 0 # boolean mask for where out-of-sample is 0
pos_total = pos.sum() # total out-of-sample positives
neg_total = neg.sum() # total out-of-sample negatives
for i in np.linspace(0, 1, points):
pred = scores > i
tp = (pred[pos] == y_oos[pos]).sum() # Calculate true positives
fp = (pred[neg] != y_oos[neg]).sum() # Calculate false positives
fn = (pred[pos] != y_oos[pos]).sum() # Calculate false negatives
p = pred.sum() # Calculate number of positives predicted
n = pred.shape[0] - p # Number of negatives is total_predictions - positive_predictions
tpr = tp / pos_total # Calculate true positive rate
fpr = fp / neg_total # Calculate false positive rate
        ppv = tp / p if p else 0  # Calculate positive predictive value (precision)
        tn = (pred[neg] == y_oos[neg]).sum()  # Calculate true negatives
        npv = tn / n if n else 0  # Calculate negative predictive value
metrics.append([i, tpr, fpr, ppv, npv]) # add data to output
return pd.DataFrame(metrics, columns=['threshold', 'tpr', 'fpr', 'precision', 'npv']) # convert data to pandas df
Using the function above, we can easily create a dataframe of these scores at many thresholds between 0 and 1.
# Get metrics for each model and add column to label which model
log_metrics = get_metrics(log_scores, y_oos)
log_metrics['model'] = 'log'
rf_metrics = get_metrics(rf_scores, y_oos)
rf_metrics['model'] = 'rf'
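As a quick sanity check on `get_metrics`, we can compare its output against sklearn's `roc_curve` and `roc_auc_score`, which compute the same rates at thresholds taken from the scores themselves. A minimal sketch (small differences are expected because the threshold grids differ):
from sklearn.metrics import roc_curve, roc_auc_score
# Compare the hand-rolled AUROC with sklearn's direct computation for the random forest
fpr_sk, tpr_sk, _ = roc_curve(y_oos, rf_scores)
print(f"get_metrics AUROC (rf):   {auc(rf_metrics['fpr'], rf_metrics['tpr']):.4f}")
print(f"roc_auc_score AUROC (rf): {roc_auc_score(y_oos, rf_scores):.4f}")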
We can plot these metrics for both models by combining the dataframes, melting the different metrics into a single `metric` column, and then plotting them with threshold as the x-axis.
from plotnine import * # seaborn would not facet on metric
metrics = pd.concat([log_metrics, rf_metrics]) # combine metrics dataframes
# create single metrics column
metrics_melt = pd.melt(metrics, id_vars=['threshold', 'model'], var_name='metric', value_name='score')
ggplot(metrics_melt, aes(x='threshold', y='score', color='model')) + geom_line() + facet_wrap('metric')
We can also use sklearn's `auc` function to calculate the area under the ROC curve (AUROC), which will serve as our model performance score. The ROC curve measures the tradeoff between the false positive and true positive rates as the threshold is adjusted. Using this, we find that the logistic regression has an AUROC of 0.48, with the random forest slightly higher at 0.52. We can also see this by plotting the tradeoff, as the random forest curve sits slightly above that of the logistic model. From this result we can say that the random forest performs better than the logistic regression, but we cannot yet say whether this difference is statistically significant; for that we will use K-fold cross validation.
print(f"Logisitc Regression AUROC: {auc(log_metrics['fpr'], log_metrics['tpr'])}")
print(f"Random Forest AUROC: {auc(rf_metrics['fpr'], rf_metrics['tpr'])}")
plt.rcParams['figure.figsize'] = [8, 8] # Adjust figure size to be square
sns.lineplot(data=metrics, x='fpr', y='tpr', hue='model') # plot ROC curve
plt.title("ROC Curve")
plt.show()
K-Fold Cross Validation is a process by which we split our dataset into K equal sections, or folds, and use each section in turn as the testing set for the model, with the training set consisting of all the data except that fold.
Sklearn provides `StratifiedKFold` to build these datasets for us as an iterable of length K, where K is the number of folds.
splits = 10 # For my testing, I will use 10-Fold Cross Validation
cv_obj = StratifiedKFold(n_splits=splits)
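Before running the full cross-validation, we can peek at what `StratifiedKFold` yields: pairs of index arrays whose class balance mirrors the full dataset. A quick sketch using the `X` and `y` defined earlier (only the first three folds are shown):
# Inspect the first few folds: sizes of the index arrays and class balance in each test fold
for i, (train_idx, test_idx) in enumerate(cv_obj.split(X, y), 1):
    print(f"Fold {i}: train={train_idx.shape[0]}, test={test_idx.shape[0]}, "
          f"up-rate in test fold={y[test_idx].mean():.3f}")
    if i == 3:  # only show the first three folds
        break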
def get_cv_data(model, cv_obj, points=1001):
metrics = pd.DataFrame() # output dataframe
# Iterate through folds from cv_obj
for i, (train, test) in enumerate(cv_obj.split(X, y), 1):
# Clean datasets
train_clean = imp.fit_transform(X[train])
test_clean = imp.fit_transform(X[test])
# Fit and score model
model.fit(train_clean, y[train])
scores = model.predict_proba(test_clean)[:,1]
# Use scores to compute metrics
temp = get_metrics(scores, y[test])
temp['fold'] = i # Add column to specify fold
metrics = pd.concat([metrics, temp]) # Append data to output
return metrics
Using `get_cv_data` we can get the same data we looked at previously, but for each fold.
# Get cross validation data for each model and add column to label which model
log_cv = get_cv_data(log_reg, cv_obj)
log_cv['model'] = 'log'
rf_cv = get_cv_data(rf, cv_obj)
rf_cv['model'] = 'rf'
Once we add the additional column specifying the model, we can combine the two dataframes for plotting. We also want to take the mean of the metrics across the folds of each model. We do this by grouping by model and threshold, which creates groups of size K for each model (size 10 in our case). Then we can apply `mean()` to the groupby object and reset the index to get the mean of the metrics at every threshold.
We can also calculate the AUROC for each fold of each model by grouping by model and fold. Each group has data for every threshold in this case, so we can once again use sklearn's `auc` function on the `fpr` and `tpr` columns to get the AUROC score.
# Combine cv data and compute mean across folds
metrics_cv = pd.concat([log_cv, rf_cv])
summarised_cv = metrics_cv.groupby(['model', 'threshold']).mean().reset_index()
# Compute AUROC for each fold of each model
aucs_df = metrics_cv.groupby(['model', 'fold']).apply(lambda x: auc(x['fpr'], x['tpr']))
aucs_df = aucs_df.reset_index().rename(columns={0: 'auc'})
Now that we have this data collected, we can plot the ROC curves for each model. Below we plot the ROC curve for each fold using an alpha of .2, and then plot the mean ROC curve on top so we can see the benefit of averaging across the K folds.
# size is used to split up lines for each fold, but size is still 1 for all lines
sns.lineplot(data=metrics_cv, x='fpr', y='tpr', size='fold', hue='model',
size_order=[1 for _ in range(splits)], alpha=.2, legend=False)
sns.lineplot(data=summarised_cv, x='fpr', y='tpr', hue='model')
plt.title("ROC Curve")
plt.show()
Using the AUROC scores we calculated above, we can make a scatter plot of the scores for each model to visually compare them. In the plot below we see that the random forest model seems to consistently perform marginally better than the logistic model.
(ggplot(aucs_df, aes(x='model', y='auc')) +
geom_jitter(position=position_jitter(0.05)) +
coord_flip() +
labs(title = "AUC comparison",
x="Model",
y="Area under ROC curve"))
As in our first hypothesis test, we can use statsmodels' least squares regression to test the difference between the models. In the results table below, the R-squared value shows that the choice of model explains about 25% of the variance in AUROC. We can also now say that this performance difference is statistically significant, as $p\text{-value} = .026 \leq \alpha = .05$.
smf.ols('auc~model', data=aucs_df).fit().summary()
Now that we have completed the data processing pipeline, there are a few closing notes. First, we should note that quantitative finance is a multi-billion dollar industry, and while we did see a performance increase from the random forest model over the logistic regression, neither model performed very well. With an AUROC score of .52, the random forest model is only marginally better than a random 50/50 guess. We expected the random forest to be better because a decision tree is a more natural way of making an investment decision, simulating the way an asset manager might make decisions. Furthermore, random forests tend to perform better when working with many features, since it is easy to limit the features considered at each decision node. We can improve performance by adjusting the threshold according to the analysis we did before (see the sketch below); however, a better way to increase the model's performance would be to transform the data into more sophisticated features and reduce noise in the data. I encourage you to explore these ideas and follow a similar process to evaluate the new performance.
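As one concrete example of the threshold adjustment mentioned above, we could pick the cutoff that maximizes Youden's J statistic (tpr - fpr) from the metrics we already computed, rather than defaulting to 0.5. This is only a sketch using the `rf_metrics` dataframe from earlier; note that it tunes the threshold on the same out-of-sample data it is evaluated on, so in practice the tuning should be done on a separate validation set:
# Find the threshold that maximizes Youden's J = tpr - fpr for the random forest
j_stat = rf_metrics['tpr'] - rf_metrics['fpr']
best_row = rf_metrics.loc[j_stat.idxmax()]
print(f"Best threshold: {best_row['threshold']:.3f} "
      f"(tpr={best_row['tpr']:.3f}, fpr={best_row['fpr']:.3f})")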
Thank you for reading! I hope you have found this valuable in some way. If you have any questions, feel free to reach out via email: ryansdowning1@gmail.com