SC²S Colloquium - Apr 25, 2018
Date: Apr 25, 2018
Time: 15:00 - 16:00
Eric Koepke: Optimizing Hyperparameters in the SG++ Datamining Pipeline
In machine learning, there are often parameters of the model or of the training algorithm that must be fixed before the actual learning begins. These hyperparameters can make a big difference to the success of a machine learning model, especially as models grow more complex with advancing research. Advanced automatic hyperparameter optimization algorithms have been developed to find optimal hyperparameters as quickly as possible. I implement and compare two different approaches in the context of SG++, a toolbox that uses sparse grids to perform various classical machine learning tasks. Harmonica successively reduces the optimization search space, while Bayesian Optimization evaluates the most promising hyperparameter setting based on previous results. I test both on regression and density estimation tasks and discuss their strengths and weaknesses to show different use cases. Harmonica requires more resources but is trivial to parallelize and more thorough in its search; Bayesian Optimization converges faster and finds the optimal solution as long as certain conditions are met.
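The Bayesian optimization loop described in the abstract (fit a surrogate to past evaluations, then evaluate the most promising setting next) can be sketched in a few lines. This is an illustrative toy, not SG++ code: the 1-D objective, the RBF kernel, its length scale, and the lower-confidence-bound acquisition rule are all assumptions made for the sketch.

```python
import numpy as np

# Hypothetical 1-D "validation error" over a single hyperparameter in [0, 2];
# it stands in for an expensive model training + evaluation.
def validation_error(x):
    return np.sin(3 * x) + 0.5 * x

def rbf_kernel(a, b, length=0.3):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Gaussian-process surrogate: posterior mean and std at query points."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance 1 for the RBF kernel
    return mean, np.sqrt(np.maximum(var, 1e-12))

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 2.0, 200)
x_obs = rng.uniform(0.0, 2.0, 3)       # a few random initial evaluations
y_obs = validation_error(x_obs)

for _ in range(10):
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    # Lower confidence bound: prefer low predicted error and high uncertainty.
    acq = mean - 2.0 * std
    x_next = candidates[np.argmin(acq)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, validation_error(x_next))

best = x_obs[np.argmin(y_obs)]         # best hyperparameter value found
```

The acquisition function is what distinguishes this from random search: regions with low predicted error or little data are tried first, which is the "most promising setting based on previous results" behavior the abstract contrasts with Harmonica's search-space reduction.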
Benjamin Holzschuh: Asset Pricing with Hierarchical Clustering using Sparse Grids Density Estimation
Density estimation using sparse grids with an offline/online splitting has been shown to be a competitive approach for clustering large datasets. It can also easily be adapted to generate hierarchical clusterings, which contain more information than flat clusterings because they capture the structure of the underlying density. We demonstrate how hierarchical clustering and sparse grid density estimation can be used to price financial derivatives in a Black-Scholes model via Monte Carlo simulation, and discuss the advantages of this approach. Additionally, we present a method for dealing with uncertain data sources in the context of clustering by selecting representative hierarchical clusterings, and show what insights can be gained from these representatives.
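The Monte Carlo pricing step in a Black-Scholes model can be illustrated independently of the clustering machinery. A minimal sketch for a European call option, with made-up market parameters (not from the talk), checked against the closed-form Black-Scholes price:

```python
import numpy as np
from math import log, sqrt, exp, erf

# Illustrative parameters: spot, strike, risk-free rate, volatility, maturity.
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0

# Monte Carlo: simulate the terminal price under geometric Brownian motion
# and discount the average call payoff.
rng = np.random.default_rng(42)
n = 200_000
Z = rng.standard_normal(n)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * sqrt(T) * Z)
mc_price = exp(-r * T) * np.mean(np.maximum(ST - K, 0.0))

# Closed-form Black-Scholes price for comparison.
def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
d2 = d1 - sigma * sqrt(T)
bs_price = S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)
```

In the approach described above, the density estimated on a sparse grid would inform the sampling, rather than drawing directly from the lognormal terminal distribution as this toy does.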