This blog post contains my solutions to the Max Margin Interval Trees tests for GSoC 2018. The tasks given are:

Easy: run some R code that shows you know how to train and test a decision tree model (rpart, partykit, etc). Bonus points if you can get trtf running for an interval regression problem, for example data(neuroblastomaProcessed, package="penaltyLearning"). Use 5-fold cross-validation to compare the learned decision tree models to a trivial baseline (which ignores the features and just learns the most likely prediction based on the train labels and always predicts that).

Medium: Read the partykit vignette to learn how to implement a new tree model using the partykit framework. Use it to re-implement a simple version of Breiman’s CART algorithm (rpart R package). Demonstrate the equivalence of your code and rpart on the data set in example(rpart).

Hard: Read the help page of the survival::survreg function, which can be used to fit a linear model for censored outputs. Use it as a sub-routine to implement a (slow) regression tree for interval censored output data. Search for the best possible split over all features: the best split is the one that maximizes the logLik of the survreg model. Demonstrate that your regression tree model works on a small subset of data(neuroblastomaProcessed, package="penaltyLearning").

Easy

Train and test a decision tree model using rpart and partykit. Use 5-fold cross-validation to compare the pruned tree with the trivial baseline model. Then run trtf on the neuroblastomaProcessed data set.
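
The cross-validation comparison can be sketched as follows. This is a minimal illustration and not the full solution: it assumes the kyphosis data that ships with rpart rather than neuroblastomaProcessed, it skips the pruning and trtf steps, and the names fold_id and cv_error are made up for this example.

```r
library(rpart)

set.seed(1)
data(kyphosis, package = "rpart")
n_folds <- 5

# Randomly assign each row to one of the five folds.
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(kyphosis)))

# For each fold, train on the other four folds and test on the held-out fold.
cv_error <- sapply(seq_len(n_folds), function(k) {
  train <- kyphosis[fold_id != k, ]
  test <- kyphosis[fold_id == k, ]

  # Classification tree with the rpart defaults.
  fit <- rpart(Kyphosis ~ Age + Number + Start, data = train, method = "class")
  tree_pred <- predict(fit, newdata = test, type = "class")

  # Trivial baseline: ignore the features and always predict
  # the most frequent label seen in the train set.
  baseline_pred <- names(which.max(table(train$Kyphosis)))

  c(tree = mean(as.character(tree_pred) != as.character(test$Kyphosis)),
    baseline = mean(baseline_pred != as.character(test$Kyphosis)))
})

# Mean misclassification rate over the five folds, one value per model.
rowMeans(cv_error)
```

Each column of cv_error holds the test error of one fold, so the row means give the cross-validated error of the tree and of the baseline side by side.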

Medium

Implement a decision tree model with the partykit framework, using the CART algorithm.
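
A rough sketch of how such a tree could be assembled from the partykit building blocks (partysplit, partynode, party) is shown below. It is an assumption-laden simplification: it handles only numeric predictors and a factor response, uses Gini impurity, stops on minsplit and minbucket only (no cp-based pruning, no surrogate splits), and the helper names gini, find_split, grow and simple_cart are invented for this example.

```r
library(partykit)
library(rpart)  # used for the kyphosis data and for the comparison

# Gini impurity of a label vector, weighted by the number of observations.
gini <- function(y) {
  p <- table(y) / length(y)
  length(y) * (1 - sum(p^2))
}

# Exhaustive search for the numeric split with the lowest total child impurity.
find_split <- function(y, X, minbucket) {
  best <- list(score = gini(y), varid = NA, threshold = NA)
  for (j in seq_along(X)) {
    values <- sort(unique(X[[j]]))
    midpoints <- (head(values, -1) + tail(values, -1)) / 2
    for (threshold in midpoints) {
      left <- X[[j]] <= threshold
      if (min(sum(left), sum(!left)) < minbucket) next
      score <- gini(y[left]) + gini(y[!left])
      if (score < best$score - 1e-12) {
        best <- list(score = score, varid = j, threshold = threshold)
      }
    }
  }
  if (is.na(best$varid)) return(NULL)
  partysplit(varid = as.integer(best$varid), breaks = best$threshold)
}

# Recursively grow the tree; node ids are assigned in depth-first order.
grow <- function(id, y, X, rows, minsplit = 20, minbucket = 7) {
  if (length(rows) < minsplit) return(partynode(id = id))
  sub <- X[rows, , drop = FALSE]
  sp <- find_split(y[rows], sub, minbucket)
  if (is.null(sp)) return(partynode(id = id))
  kid <- kidids_split(sp, data = sub)  # 1 = left child, 2 = right child
  left <- grow(id + 1L, y, X, rows[kid == 1], minsplit, minbucket)
  right <- grow(max(nodeids(left)) + 1L, y, X, rows[kid == 2], minsplit, minbucket)
  partynode(id = id, split = sp, kids = list(left, right))
}

simple_cart <- function(formula, data) {
  mf <- model.frame(formula, data)
  y <- model.response(mf)
  X <- mf[, -1, drop = FALSE]
  root <- grow(1L, y, X, seq_len(nrow(X)))
  # Predictors come first so that varid matches the column positions.
  dat <- cbind(X, mf[1])
  fitted <- data.frame("(fitted)" = fitted_node(root, data = X),
                       "(response)" = y, check.names = FALSE)
  as.constparty(party(root, data = dat, fitted = fitted,
                      terms = attr(mf, "terms")))
}

data(kyphosis, package = "rpart")
my_tree <- simple_cart(Kyphosis ~ Age + Number + Start, kyphosis)
rpart_fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Share of training rows on which the two models predict the same class.
my_pred <- predict(my_tree, type = "response")
rpart_pred <- predict(rpart_fit, type = "class")
agreement <- mean(as.character(my_pred) == as.character(rpart_pred))
```

Printing my_tree next to as.party(rpart_fit) is one way to check whether the split variables and thresholds match. Any difference would most likely come from the parts of rpart this sketch leaves out, such as the cp stopping rule.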

Hard

Fit a linear model for censored outputs with survreg, then implement a regression tree model for interval-censored data.
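
The split search at the heart of this task can be sketched as below. To keep the example self-contained it runs on simulated interval-censored data instead of neuroblastomaProcessed, fits a Gaussian survreg model with a single left/right indicator as the only covariate, and finds one split only (a full tree would apply the same search recursively to each child). The names split_loglik, min_bucket and best are hypothetical.

```r
library(survival)

set.seed(42)
n <- 80
X <- data.frame(x1 = runif(n), x2 = runif(n))

# Simulated latent output that depends on x1 only.
latent <- ifelse(X$x1 > 0.5, 2, -2) + rnorm(n)

# Observe only an interval around the latent value; NA marks an open end.
kind <- sample(c("left", "right", "interval"), n, replace = TRUE)
lower <- ifelse(kind == "left", NA, latent - runif(n, 0.2, 1))
upper <- ifelse(kind == "right", NA, latent + runif(n, 0.2, 1))

# Log-likelihood of a survreg model whose only covariate is the split indicator.
split_loglik <- function(lower, upper, is_left) {
  dat <- data.frame(lower = lower, upper = upper, is_left = is_left)
  fit <- tryCatch(
    suppressWarnings(
      survreg(Surv(lower, upper, type = "interval2") ~ is_left,
              data = dat, dist = "gaussian")),
    error = function(e) NULL)
  if (is.null(fit)) -Inf else as.numeric(logLik(fit))
}

# Slow exhaustive search: one survreg fit per feature and candidate threshold.
min_bucket <- 10
best <- list(feature = NA, threshold = NA, loglik = -Inf)
for (feature in names(X)) {
  values <- sort(unique(X[[feature]]))
  midpoints <- (head(values, -1) + tail(values, -1)) / 2
  for (threshold in midpoints) {
    is_left <- X[[feature]] <= threshold
    if (min(sum(is_left), sum(!is_left)) < min_bucket) next
    ll <- split_loglik(lower, upper, is_left)
    if (ll > best$loglik) {
      best <- list(feature = feature, threshold = threshold, loglik = ll)
    }
  }
}

best
```

Because the simulated output changes at x1 = 0.5, the selected split would be expected to fall on x1 near that value. With the real data, the rows of the feature matrix and the lower and upper target limits would take the place of X, lower and upper, with infinite limits recoded as NA.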