Predicting the Future: Language Modelling

Jeppe presented his third-year dissertation to the weekly gathering of Computer Science students at Queens’.

After a hard day of lectures, practicals and supervisions, the first slide in Jeppe’s presentation was quite a relief.

[Image: the first slide of Jeppe’s presentation]

It’s not a trick question. Being able to predict the next element in a sequence can be quite intuitive…to people.

[Image: slide from Jeppe’s presentation]

To model a language, more history than just the previous word needs to be considered: the average length of a sentence is 20 words. Jeppe explained that Recurrent Neural Networks (RNNs) are appropriate because they can retain more of the preceding context than earlier methods such as Bayesian N-gram models, which he described as limited to tracking only the 5 previous states. In a language modelling context, the RNN should be able to estimate the most likely next word given a partial sentence.
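To make the fixed-window limitation concrete, here is a minimal sketch of a plain count-based N-gram predictor. It is not Jeppe’s code and not a Bayesian model; the toy corpus and the function names `train_ngram` and `predict_next` are made up for illustration. The point is only that the prediction depends on the last n-1 words and nothing earlier.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n):
    """Count which word follows each (n-1)-word context in the training text."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

def predict_next(counts, history, n):
    """Return the most frequent follower of the last n-1 words, or None if unseen."""
    # Everything before the last n-1 words is thrown away: this is the
    # fixed-history limitation that an RNN is meant to relax.
    context = tuple(history[-(n - 1):]) if n > 1 else ()
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus, purely for illustration.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat sat on the sofa .").split()
model = train_ngram(corpus, n=3)

print(predict_next(model, "the dog sat".split(), n=3))  # prints: on
```

An RNN replaces the lookup table with a hidden state that is updated word by word, so in principle information from the start of the sentence can still influence the prediction at the end.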

Jeppe introduced Hessian-free (HF) optimisation, which has recently been used to train RNNs…slowly. He explained how this slowness can be reduced for larger data sets by running on GPUs, which offer speed-ups of between 5 and 100 times. Jeppe may extend the project to use a cluster of GPUs instead of just one, so that HF optimisation can be run in parallel, which should increase the speed further.
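For readers who have not met HF optimisation, the core idea can be sketched in a few lines: take a Newton-style step, but never build the Hessian. Instead, solve for the step with conjugate gradient, which only needs Hessian-vector products, and these can be approximated from two gradient evaluations. The sketch below is an illustration under stated assumptions, not Jeppe’s implementation: the toy quadratic objective and all function names are hypothetical, the products use a simple finite difference, and the damping and curvature approximations that practical variants for neural networks typically add are left out.

```python
def grad(x):
    # Gradient of a toy objective f(x, y) = 2x^2 + xy + y^2 - 3x - 4y
    # (hypothetical example, chosen so the minimum is easy to check by hand).
    return [4 * x[0] + x[1] - 3, x[0] + 2 * x[1] - 4]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def hessian_vector_product(grad_fn, x, v, eps=1e-6):
    """Approximate H v by a finite difference of gradients; H is never formed."""
    g0 = grad_fn(x)
    g1 = grad_fn([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]

def conjugate_gradient(hvp, b, max_iters=10, tol=1e-10):
    """Approximately solve H d = b using only products H v."""
    d = [0.0] * len(b)
    r = list(b)          # residual b - H d, with d starting at zero
    p = list(r)          # current search direction
    rs = dot(r, r)
    for _ in range(max_iters):
        if rs < tol:     # residual is small enough: stop early
            break
        Hp = hvp(p)
        alpha = rs / dot(p, Hp)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d

def hf_step(grad_fn, x):
    """One Hessian-free step: solve H d = -g by conjugate gradient, then move by d."""
    g = grad_fn(x)
    d = conjugate_gradient(lambda v: hessian_vector_product(grad_fn, x, v),
                           [-gi for gi in g])
    return [xi + di for xi, di in zip(x, d)]

x = hf_step(grad, [0.0, 0.0])
print(x)  # close to the true minimum at (2/7, 13/7)
```

Each conjugate gradient iteration costs roughly two gradient evaluations, and for a real network those are large matrix operations, which is the kind of work GPUs handle well.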

Possibly the most obvious applications of this powerful machine learning algorithm are predictive text on smartphones and speech recognition, both of which seem to have a lot of room for improvement.

Jeppe, we wish you the best of __?__.