1 - Python for the data munging
2 - Orange for a quick and dirty prototype
3 - NodeBox for the visualization
While I had some fun with Orange, I also experienced some issues. First and foremost was a bug that causes an error when attempting to score the validation set with a neural network (typically the best model).
If anyone is interested, I can provide a more in-depth discussion of the model's architecture. In short, I used k-means to cluster the data into four groups, then fit a separate model for each cluster. The data set I am using is... limited, something I hope to remedy in the future. These are the results for the validation set.
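The cluster-then-model setup described above could be sketched roughly as follows. This is a minimal standard-library illustration, not the Orange workflow actually used: the function names, the optional `init` parameter, and the "majority label per cluster" stand-in model are all assumptions made for the example. In practice each cluster would get a real classifier.

```python
import random
from collections import Counter

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

def kmeans(points, k, init=None, iters=50, seed=0):
    """Plain Lloyd's k-means; `init` (hypothetical) lets callers fix the start."""
    points = [tuple(p) for p in points]
    rng = random.Random(seed)
    centroids = [tuple(c) for c in init] if init else rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        new = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new == centroids:  # assignments have stopped changing
            break
        centroids = new
    return centroids

def fit_per_cluster(points, labels, centroids):
    """Stand-in model per cluster: the most common label among its members."""
    counts = {}
    for p, y in zip(points, labels):
        counts.setdefault(assign(p, centroids), Counter())[y] += 1
    return {c: ctr.most_common(1)[0][0] for c, ctr in counts.items()}

def predict(p, centroids, models):
    """Route a point to its cluster, then use that cluster's model."""
    return models.get(assign(p, centroids))
```

To score a new match, the feature vector is first routed to its nearest centroid and then handed to that cluster's model, so the validation set goes through the same two steps as the training data.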
The tools in use here include: Python for the data munging, Orange for the modeling, and d3.js for the visualization.
For data prep, I collected seven years' worth of match results and calculated rolling averages for each team as of the start of each game (about 2000 observations in total).
Variables include goals scored, shots on target, penalties received, etc., for both the home and away teams.
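One way to build those pre-match rolling averages is sketched below. The key detail is that each game's features use only earlier games, so the result being predicted never leaks into its own inputs. The field names (`home`, `away`, `home_goals`, `away_goals`) and the window of five games are assumptions for illustration; the original window length is not stated, and the same pattern would be repeated for shots on target, penalties, and so on.

```python
from collections import defaultdict, deque

def rolling_pre_match_averages(matches, window=5):
    """Average each team's goals over its previous `window` games.

    `matches` must already be sorted by date. Each match is a dict with the
    hypothetical keys 'home', 'away', 'home_goals', 'away_goals'.
    """
    # team -> goals scored in its most recent games (oldest dropped first)
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for m in matches:
        h, a = history[m["home"]], history[m["away"]]
        rows.append({
            "home": m["home"],
            "away": m["away"],
            # None until the team has played at least one earlier game
            "home_avg_goals": sum(h) / len(h) if h else None,
            "away_avg_goals": sum(a) / len(a) if a else None,
        })
        # Update the history only after this game's features are recorded.
        h.append(m["home_goals"])
        a.append(m["away_goals"])
    return rows
```

Rows with `None` (a team's first game in the data) would need to be dropped or filled before modeling.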
I tried several different algorithms, including decision trees, neural networks (a bug prevented scoring, so I dropped that approach), SVMs and the old faithful, logistic regression. I also tried clustering, then training models against each of the clusters.
The SVM proved the most robust when scoring validation data.
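The comparison step can be sketched as a small harness that scores every candidate on the same held-out validation set and ranks them. The metric here is plain accuracy, which is an assumption (the post does not say which measure was used), and the candidate models are hypothetical callables that map a feature row to a predicted label.

```python
def accuracy(predict, rows, labels):
    """Fraction of validation rows whose prediction matches the label."""
    hits = sum(1 for x, y in zip(rows, labels) if predict(x) == y)
    return hits / len(labels)

def rank_models(candidates, rows, labels):
    """Return (name, accuracy) pairs, best first, for a dict of predictors."""
    scores = {name: accuracy(fn, rows, labels) for name, fn in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A naive baseline such as "always predict a home win" is worth including among the candidates, since it shows how much of a model's score comes from the class balance alone.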