Feedback

The notions of task/window/accuracy are featured in the results, but I don't know what any of them are. Why do you use two words (window, task) for the same thing? Surely we should do better in later tasks, as we have more information? Do we have more information for later tasks? If not, what is the point of plotting with window as the x-axis?

Surely it is unfair to compare them on the best task. That is using test accuracy as a stopping criterion, which you are not allowed to do. You have to determine when to stop before seeing the test results. Either determine when to stop based on cross-validation, or decide when to stop up front (e.g., after all data has been seen, which I would interpret as at the last task).
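To be concrete about what I mean: here is a minimal sketch of validation-based stopping, with hypothetical per-task accuracy arrays `val_acc` and `test_acc` (not your numbers). The stopping task is chosen from validation accuracy alone; the test accuracy at that task is the only test number you may report.

```python
# Hypothetical per-task accuracies; the stopping decision must not look at test_acc.
val_acc  = [0.61, 0.68, 0.72, 0.70, 0.69]  # from cross-validation on training data
test_acc = [0.60, 0.66, 0.69, 0.73, 0.71]  # held out until the decision is made

# Pick the stopping task by validation accuracy only.
stop = max(range(len(val_acc)), key=lambda t: val_acc[t])

# Report the test accuracy at that pre-committed task (NOT max(test_acc)).
reported = test_acc[stop]

print(f"stop after task {stop}, reported test accuracy {reported:.2f}")
```

Note that under this protocol the reported number (0.69 here) is lower than the best test accuracy (0.73), which is exactly the point: picking the best test task inflates the result.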

"tuning the hyper-parameters resulted in reduced accuracy for the majority of the models" seems implausible to me. Is there really no signal, or did you start with the optimal parameters? Or is it just not statistically significant?

I need a better explanation of how you evaluated "Preprocessing and selection of features". Did you choose one feature or a subset of them? I can't work out which features were actually selected. Did you try L1 regularization, which tends to ignore features?
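For reference, this is the kind of thing I mean by L1 regularization for feature selection: a small numpy sketch (proximal gradient / ISTA on synthetic data, not your pipeline) where the L1 penalty drives the weights of irrelevant features to exactly zero, so the surviving nonzero weights identify the selected features.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]        # only features 0 and 1 matter

def lasso_ista(X, y, alpha=0.1, n_iter=500):
    """Minimize (1/2n)||Xw - y||^2 + alpha*||w||_1 by proximal gradient (ISTA)."""
    n = X.shape[0]
    w = np.zeros(X.shape[1])
    step = n / np.linalg.norm(X, 2) ** 2          # safe step: 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n              # gradient of the squared-error term
        z = w - step * grad
        # Soft-thresholding: the L1 proximal operator zeroes out small weights.
        w = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)
    return w

w = lasso_ista(X, y)
selected = np.flatnonzero(np.abs(w) > 1e-6)       # features surviving the penalty
print("weights:", np.round(w, 3))
print("selected features:", selected)
```

On this synthetic data the weights for features 2-4 are exactly zero, so `selected` recovers the two informative features. In practice you would use a library implementation (e.g. a lasso regression) rather than hand-rolled ISTA, but the mechanism is the same.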

We need references for the parts that are someone else's work.

DavidPoole (talk) 23:34, 24 April 2020