SamSuka
Ernie.QT



Common Alpha 010: Feature Importance & Generation

This post covers some of the useful packages I have used when building trading strategies with a machine learning framework. It is split into three parts: faster computation in Python, feature generation, and feature importance in machine learning.

Faster computation helps most when generating new features on a large dataset, because it cuts the time spent on pre-processing the data.

Faster Computation (Numba) - Python Package

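Background Info:

Numba is an open-source just-in-time (JIT) compiler that translates a subset of Python and NumPy code into machine code using LLVM.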

Set-up:

pip install numba

Usage:

Example:
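A minimal sketch of typical usage, assuming the common pattern of decorating a loop-heavy function with `@njit`. The function name `sum_of_squares` is only illustrative, and the `try/except` fallback is my own addition so the snippet still runs when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to a no-op decorator so the sketch runs without Numba.
    def njit(func=None, **kwargs):
        return func if func is not None else (lambda f: f)


@njit
def sum_of_squares(values):
    """Explicit loop; Numba compiles it to machine code on the first call."""
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(5, dtype=np.float64)  # [0, 1, 2, 3, 4]
result = sum_of_squares(data)          # the first call triggers compilation
print(result)                          # 0 + 1 + 4 + 9 + 16 = 30.0
```

The first call includes compilation time, so the benefit only shows up on repeated calls or large inputs.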

Comparison:

Example 1:
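A rough sketch of how such a comparison could be set up, assuming a maximum-drawdown loop as the workload and synthetic prices as input (both are illustrative choices of mine). The same function body is timed as plain Python and as a Numba-compiled version. Actual timings depend on the machine, so none are quoted here.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: no-op fallback so the sketch runs without Numba (no speed-up then).
    def njit(func=None, **kwargs):
        return func if func is not None else (lambda f: f)


def max_drawdown_py(prices):
    """Largest peak-to-trough decline, as a negative fraction of the peak."""
    peak = prices[0]
    worst = 0.0
    for i in range(prices.shape[0]):
        if prices[i] > peak:
            peak = prices[i]
        drawdown = (prices[i] - peak) / peak
        if drawdown < worst:
            worst = drawdown
    return worst


# Same body, compiled by Numba.
max_drawdown_nb = njit(max_drawdown_py)

# Synthetic price path (hypothetical data, not a real market series).
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.001, 200_000)))

max_drawdown_nb(prices)  # warm-up call so compilation time is excluded


def time_call(fn, arg):
    """Return the function result and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    out = fn(arg)
    return out, time.perf_counter() - start


py_val, py_sec = time_call(max_drawdown_py, prices)
nb_val, nb_sec = time_call(max_drawdown_nb, prices)
print(f"pure Python: {py_sec:.4f}s  numba: {nb_sec:.4f}s")
```

The warm-up call matters: without it, the Numba timing would mostly measure compilation instead of execution.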

Example 2:
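Another possible comparison, sketched under my own assumptions: a rolling standard deviation written as a Numba loop versus a vectorized NumPy version built on `sliding_window_view`. The window length and the synthetic input are arbitrary. Which one is faster depends on the window size and the hardware, so the snippet only prints the timings.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: no-op fallback so the sketch runs without Numba (no speed-up then).
    def njit(func=None, **kwargs):
        return func if func is not None else (lambda f: f)


@njit
def rolling_std_nb(values, window):
    """Sample standard deviation over a trailing window, NaN until it is full."""
    n = values.shape[0]
    out = np.full(n, np.nan)
    for i in range(window - 1, n):
        mean = 0.0
        for j in range(i - window + 1, i + 1):
            mean += values[j]
        mean /= window
        var = 0.0
        for j in range(i - window + 1, i + 1):
            diff = values[j] - mean
            var += diff * diff
        out[i] = np.sqrt(var / (window - 1))
    return out


def rolling_std_np(values, window):
    """Same quantity, vectorized with a strided view over all windows."""
    out = np.full(values.shape[0], np.nan)
    windows = np.lib.stride_tricks.sliding_window_view(values, window)
    out[window - 1:] = windows.std(axis=1, ddof=1)
    return out


rng = np.random.default_rng(0)
values = rng.normal(size=20_000)  # synthetic input, e.g. a return series
window = 20

rolling_std_nb(values[:100], window)  # warm-up call to exclude compilation time

start = time.perf_counter()
nb_out = rolling_std_nb(values, window)
nb_sec = time.perf_counter() - start

start = time.perf_counter()
np_out = rolling_std_np(values, window)
np_sec = time.perf_counter() - start

print(f"numba loop: {nb_sec:.4f}s  numpy vectorized: {np_sec:.4f}s")
```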

Feature Generation

Example (on Kline dataset):

Here I use ‘numba’ to write custom functions that compute common features. If you are backtesting on a larger dataset, such as 5-minute or even 1-minute resolution, the speed-up can make a huge difference.
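A sketch of what such custom functions might look like, assuming the Kline data is available as NumPy arrays of high, low and close prices. The four features (log returns, simple moving average, EMA, stochastic %K), the window lengths and the synthetic bars are illustrative choices, not a fixed recipe.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: no-op fallback so the sketch runs without Numba (no speed-up then).
    def njit(func=None, **kwargs):
        return func if func is not None else (lambda f: f)


@njit
def log_returns(close):
    """Bar-to-bar log return; the first bar has no predecessor, so it is NaN."""
    out = np.full(close.shape[0], np.nan)
    for i in range(1, close.shape[0]):
        out[i] = np.log(close[i] / close[i - 1])
    return out


@njit
def rolling_mean(values, window):
    """Simple moving average using a running sum, NaN until the window is full."""
    n = values.shape[0]
    out = np.full(n, np.nan)
    running = 0.0
    for i in range(n):
        running += values[i]
        if i >= window:
            running -= values[i - window]
        if i >= window - 1:
            out[i] = running / window
    return out


@njit
def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded at the first value."""
    alpha = 2.0 / (span + 1.0)
    out = np.empty(values.shape[0])
    out[0] = values[0]
    for i in range(1, values.shape[0]):
        out[i] = alpha * values[i] + (1.0 - alpha) * out[i - 1]
    return out


@njit
def stochastic_k(high, low, close, window):
    """Position of the close within the trailing high-low range, scaled to 0..100."""
    n = close.shape[0]
    out = np.full(n, np.nan)
    for i in range(window - 1, n):
        hi = high[i]
        lo = low[i]
        for j in range(i - window + 1, i):
            if high[j] > hi:
                hi = high[j]
            if low[j] < lo:
                lo = low[j]
        if hi > lo:
            out[i] = 100.0 * (close[i] - lo) / (hi - lo)
    return out


# Synthetic Kline bars (hypothetical), standing in for real exchange data.
rng = np.random.default_rng(1)
n_bars = 10_000
close = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.0005, n_bars)))
spread = np.abs(rng.normal(0.0, 0.0005, n_bars))
high = close * (1.0 + spread)
low = close * (1.0 - spread)

features = np.column_stack((
    log_returns(close),
    rolling_mean(close, 20),
    ema(close, 20),
    stochastic_k(high, low, close, 14),
))
print(features.shape)  # (10000, 4)
```

The leading NaN rows come from the warm-up windows and would need to be dropped before model training.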

Feature Importance

Below is an illustrative example of how to perform a time-series split for Random Forest Classification model training/testing, extract feature importance scores, and optionally carry out feature selection (via Recursive Feature Elimination, RFE) in a quant-trading scenario.

Assumptions:

1. Splits the dataset iteratively using TimeSeriesSplit (without shuffling), fitting a Random Forest on each split.

2. Recursive Feature Elimination (RFE) iteratively removes the least important features until a desired number of features remain.
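The two steps above can be sketched as follows with scikit-learn. The data is synthetic and the feature names (`feat_0` ... `feat_5`), the label construction, the number of splits and the number of features to keep are all placeholder assumptions. In practice the matrix would hold engineered features and the label something like the direction of the next bar.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, time-ordered dataset (hypothetical): only feat_0 and feat_1 carry signal.
rng = np.random.default_rng(42)
n_samples, n_features = 600, 6
feature_names = [f"feat_{i}" for i in range(n_features)]
X = rng.normal(size=(n_samples, n_features))
noise = rng.normal(scale=0.5, size=n_samples)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

# Step 1: walk forward through time; each fold trains on the past and tests on the future.
tscv = TimeSeriesSplit(n_splits=4)
fold_scores = []
fold_importances = []
for train_idx, test_idx in tscv.split(X):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))
    fold_importances.append(model.feature_importances_)

# Average the impurity-based importances across folds and rank the features.
mean_importance = np.mean(fold_importances, axis=0)
ranking = np.argsort(mean_importance)[::-1]
for idx in ranking:
    print(f"{feature_names[idx]}: {mean_importance[idx]:.3f}")
print("mean out-of-sample accuracy:", np.mean(fold_scores))

# Step 2 (optional): RFE drops the least important feature one at a time
# until 3 remain; here it is fitted on the training part of the last split.
last_train_idx, last_test_idx = list(tscv.split(X))[-1]
selector = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=3,
    step=1,
)
selector.fit(X[last_train_idx], y[last_train_idx])
selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
print("selected features:", selected)
```

Fitting RFE only on training indices keeps the test period untouched, which matters for avoiding look-ahead bias in a backtest.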

 

The example above is the template I started from for quant backtesting with a machine learning approach. More follow-up work is still needed on avoiding overfitting and evaluating the model. Meanwhile, the feature importance results can serve as a reference when developing rule-based strategies, since they indicate which features are useful from a machine learning perspective.

