Inference in Neural Networks using an Explainable Parameter Encoder Network

A Parameter Encoder Neural Network (PENN) (Pfitzinger 2021) is an explainable machine learning technique that solves two problems associated with traditional XAI algorithms:

  1. It permits the calculation of local parameter distributions. Parameter distributions are often more interesting than feature contributions — particularly in economic and financial applications — since the parameters disentangle the effect from the observation (the contribution can roughly be defined as the demeaned product of effect and observation).
  2. It solves a problem of biased contributions that is inherent to many traditional XAI algorithms. Particularly in the settings where neural networks are most powerful — interactive, dependent processes — traditional XAI can be biased by attributing an effect to each feature independently.
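The first point can be made concrete with a locally linear view of a fitted model. The formulation below is a sketch of the general idea, not PENN's exact construction:

```math
f(x) \approx \beta_0(x) + \sum_j \beta_j(x)\, x_j,
\qquad
\phi_j(x) \approx \beta_j(x)\,\bigl(x_j - \mathbb{E}[x_j]\bigr)
```

Traditional XAI methods report the contributions \(\phi_j(x)\), which mix the effect \(\beta_j(x)\) with the value of the observation \(x_j\) (the "demeaned product" above). PENN instead targets the local parameters \(\beta_j(x)\) directly.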

At the end of the tutorial, I will have estimated the following highly nonlinear parameter functions for a simulated regression with three variables:

A GitHub version of the code can be found here.

Evolving Themescapes: Powerful Auto-ML for Thematic Investment with tidyfit

Recent years have been marked by an unusual degree of geopolitical upheaval and crisis. In this post, I explore how this period has shifted the importance of different investment themes. Which trends have grown in importance? What can be discovered about evolving market priorities and the brave new world ahead?

To explore these questions, I draw on a data set of MSCI Thematic and Sector index returns, and calculate the regression-based importance of each theme for each sector over time. The analytical workflow is typical of the quantitative finance setting, essentially requiring the estimation of a large number of linear regressions that provide orthogonal exposures to different investment themes. Here the R package tidyfit (available on CRAN) can be extremely helpful, since it automates much of the machine learning pipeline for regularized regressions (Pfitzinger 2022).
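The core of such a workflow can be sketched in a few lines. This is a hedged illustration rather than the exact code behind the post: the data frame `df` and the column names `sector` and `sector_return` are assumptions, while `regress()`, the `m("enet")` wrapper and the `.cv` argument follow the tidyfit documentation:

```r
# Illustrative sketch only: `df` is an assumed long data frame with one
# sector's returns per group and one column of returns per MSCI theme.
library(dplyr)
library(tidyfit)

theme_betas <- df |>
  group_by(sector) |>
  regress(sector_return ~ .,    # one regression per sector
          m("enet"),            # elastic net (glmnet under the hood)
          .cv = "vfold_cv") |>  # cross-validate the penalty
  coef()                        # tidy tibble of estimated betas
```

Fitting the same specification on pre- and post-pandemic subsamples then gives the change in each theme's importance.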

MSCI provides thematic equity indexes for 17 different themes, ranging from digital health and cybersecurity to millennials and future education. The following plot shows the average change in each theme’s importance — measured as the change in the absolute standardized beta — from before to after the COVID-19 pandemic. The regression betas are estimated using an elastic net regression (discussed below). A positive value suggests that the theme has, on average, gained importance in recent years:

tidyfit: Extending the tidyverse with AutoML

tidyfit is an R package that facilitates and automates linear regression and classification modeling in a tidy environment. The package includes several methods, such as lasso, PLS and elastic net regressions, and can be augmented with custom methods. tidyfit builds on the tidymodels suite, but emphasizes automated modeling with a focus on linear regression and classification coefficients, which are its primary output.
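As a brief sketch of the interface (based on the package README; `df` and `y` are placeholder names, not part of the package), several methods can be fitted and tuned in a single call:

```r
# Sketch of tidyfit's interface; `df` is an assumed data frame with a
# numeric response `y` and numeric feature columns.
library(dplyr)
library(tidyfit)

fits <- df |>
  regress(y ~ .,
          m("lasso"), m("enet"), m("pls"),  # several methods side by side
          .cv = "vfold_cv")                 # automatic hyperparameter tuning

coef(fits)  # coefficients for all methods in one tidy tibble
```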

hfr: An R-Package for Hierarchical Regression Shrinkage

hfr is an R package that implements a novel graph-based regularized regression estimator: the Hierarchical Feature Regression (HFR). The method mobilizes insights from the domains of machine learning and graph theory to estimate robust parameters for a linear regression, constructing a supervised feature graph that decomposes parameters along its edges. The graph adjusts first for common variation and successively incorporates idiosyncratic patterns into the fitting process.

The result is group shrinkage of the parameters, where the extent of shrinkage is governed by a hyperparameter kappa that represents the size of the feature graph. At kappa = 1 the regression is unregularized, resulting in OLS parameters. At kappa < 1 the graph is shrunken, reducing the effective model size and regularizing the regression.
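In code, the estimator reduces to a single function call. The snippet below is a minimal sketch on simulated data; `hfr()` and its `kappa` argument follow the package documentation, but the example itself is mine:

```r
library(hfr)

# Simulate a small regression problem
set.seed(123)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
y <- as.numeric(x %*% rnorm(10) + rnorm(100))

fit_ols <- hfr(x, y, kappa = 1)    # kappa = 1: unregularized (OLS)
fit_hfr <- hfr(x, y, kappa = 0.5)  # kappa < 1: shrunken feature graph
coef(fit_hfr)
```

The package documentation also describes a cross-validation routine for selecting kappa.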

Scraping and visualising global news heatmaps with R

It’s nothing fancy, but here’s the code.