Boundary Detection in Natural Language

Has it ever happened to you that you receive a text saying “Let’s eat Eduard” and start wondering whether your friend has joined the University of Cambridge Cannibalism Club? Well, worry no more, because Andi’s project is going to put all the missing punctuation back in and eliminate your worries. This Wednesday he gave us a presentation on the topic.

We now (hopefully) see that missing punctuation, or in more scientific terms, missing segment boundaries, can be harmful. But why is this problem important? Surely, if a human omits a crucial piece of punctuation from a written text, it’s their own fault. However, there is another way of obtaining text, apart from writing, which has grown really popular recently: speech recognition. Whether you use Siri, Cortana, Alexa or “OK, Google”, they all face largely the same challenge – breaking a continuous stream of words into segments to recover the original meaning. Of course, another way to obtain punctuation is to carefully consider pauses in speech, but we’re going to assume we do not have that information available.

The project is split into four stages as shown below:


First, we need to gather data. For this purpose, the British National Corpus was used, containing over 100 million words split into annotated segments. Some extra processing was required to remove noise that was unhelpful to the project. Finally, the corpus was filtered down to include only written text, omitting other types of text, like meeting minutes, which were not verified and could contain grammatical and linguistic errors.

Then, we move on to training a model. For this purpose, an n-gram model was used. As Andi explained, it is a type of probabilistic language model for predicting the next item in a sequence, in the form of an (n − 1)-order Markov model. In other words, we use the previous (n − 1) words to predict the next one.
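As a rough illustration (a toy sketch, not Andi’s actual implementation), a bigram model (n = 2) can be estimated from counts like this:

```python
from collections import Counter

def train_bigram_model(sentences):
    """Estimate P(next | previous) from a list of tokenised sentences (n = 2)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]   # sentence-boundary markers
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    # Maximum-likelihood estimate: count(w1 w2) / count(w1)
    return lambda w1, w2: bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

prob = train_bigram_model([["the", "cat", "eats"], ["the", "dog", "eats"]])
prob("the", "cat")   # 0.5: "the" is followed by "cat" in one of the two sentences
```

In a real system the counts would come from the filtered corpus, and some smoothing would be needed for unseen word pairs.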

Now, we can perform classification with this model. Based on the training set, we can estimate the probability that a given word or punctuation mark follows the previous (n − 1) words. After that, we can use that information to predict the punctuation using beam search. Essentially, we consider all possible branches (punctuation or next word), pruning branches with very small probability. This is beneficial, since it allows us to delay the decision point when the probabilities are similar, and only prune when the continuation of a branch is deemed improbable. You can see an example of this in the Figure below, where we delay the decision of whether “cat” is followed by “eats” or by punctuation in a bigram model.
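A toy sketch of this beam search (with made-up probabilities, not Andi’s actual model): each hypothesis either appends the next word directly or inserts a boundary token “.” first, and only the `beam_width` most probable hypotheses survive each step.

```python
import math

# Hand-picked toy bigram probabilities; "." is treated as an ordinary token.
TABLE = {("<s>", "the"): 0.9, ("the", "cat"): 0.5,
         ("cat", "eats"): 0.3, ("cat", "."): 0.6, (".", "the"): 0.9}
prob = lambda w1, w2: TABLE.get((w1, w2), 0.0)

def restore_boundaries(words, prob, beam_width=4):
    """Beam search over 'insert a boundary or not' decisions for each word."""
    beams = [(0.0, ["<s>"])]                      # (log-probability, tokens)
    for word in words:
        candidates = []
        for score, toks in beams:
            for ext in ([word], [".", word]):     # branch: plain word vs boundary
                logp, prev = score, toks[-1]
                for tok in ext:
                    logp += math.log(prob(prev, tok) or 1e-9)  # floor zero counts
                    prev = tok
                candidates.append((logp, toks + ext))
        beams = sorted(candidates, reverse=True)[:beam_width]  # prune
    return beams[0][1][1:]                        # best hypothesis, minus "<s>"
```

Because both branches stay in the beam until one becomes clearly improbable, the “cat eats” vs “cat.” decision is delayed exactly as described above.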


Finally, we evaluate the model. There are two commonly used metrics – recall and precision. Recall measures how many of the actual punctuation marks we identified. Precision measures how many of the punctuation marks we identified were actually correct. However, neither of these metrics is good on its own – it’s easy to get perfect recall by putting punctuation everywhere, and easy to get perfect precision by generating only the really obvious punctuation. To solve this, the F-score is introduced, which combines the two metrics into a single one.
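These metrics are simple to compute. A minimal sketch, treating predicted and actual boundaries as sets of positions:

```python
def f_score(predicted, actual):
    """Precision, recall and F1 over sets of predicted/actual boundary positions."""
    tp = len(predicted & actual)                    # correctly placed boundaries
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

f_score({3, 7, 12}, {3, 7, 9})   # precision 2/3, recall 2/3, F1 = 2/3
```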


Easter Revision Planning: 2016

In a departure from the previous theme, we focused this week on Easter revision. Eduard collected everyone’s plans and classified them as focusing either on when to revise or on what to revise. Some of us were chosen to present our plans to everyone else. Here’s a selection of the more memorable moments from the presentations.

When, Middle Level of Detail: Ben

Ben planned to do eight hours of work per day… on his dissertation alone. Not to mention revision for his final year exams and a three-day rowing trip. While he sketched out a rough work schedule – in sufficient detail to identify daily targets – he retained enough freedom to juggle sessions around as necessary.

When, High Level of Detail: Andy

Andy planned each day in great detail, segmenting it into blocks of exactly 1 hour 35 minutes. Andy’s plan should ensure that he stays on track, and can just run on autopilot from this stage onwards.

Image: Andy’s incredibly detailed plan.


When, Low Level of Detail: Alex no. 1

Alex started off by asserting that he was a great “believer in freedom”, and so it’s no surprise that his plan left plenty of room for improvisation. This should allow him to vary his schedule on a day-to-day basis, as per his mood.

What: Alex no. 2

Everyone was dazzled by his three-page plan written in LaTeX — until he mentioned that “none of this is the actual plan.” Fortunately, he did seem to have a good alternative plan, relying heavily on tripos questions (“Java questions are jolly useful”) to revise. This is a good strategy for getting to know the style of tripos questions while covering the material at the same time.

Image: An extract from Alex’s plan written in LaTeX.


What: Henry

Henry emphasised the importance of taking a break, planning in a week of gliding. Cambridge can be a lot of hard work, and it’s vital to take some time off to be able to focus fully during term-time.

What: Radu

Radu used fancy fitting algorithms to slot his study sessions into his timetable. Wisely, he prioritised the courses he needed most by assigning them the most sessions. The Part II dissertation was also once more a running theme.

How it can all go wrong: Eduard

To finish off, Eduard gave a demonstration of how a plan can go wrong, even for the best of us. Overall, the message was “be honest with yourself,” and keep a realistic set of goals in mind while revising over Easter.

Image: Eduard’s example (of what not to do) plan, including travel to Japan, FIFA, meeting with friends, and a worryingly small amount of time to work on his project.


Compiler Optimisation and Automatic Loop Parallelisation

This week the Queens’ Computer Scientists were given a fascinating talk by Katie regarding her Part II project on Compiler Optimisation and Automatic Loop Parallelisation.

The project looks at exploiting parallelism automatically, using compiler optimisations to distribute work across multiple cores. The benefits can be quite remarkable: if 50% of a program is parallelizable, the speedup from parallelisation can be as much as 2x, but a program with a parallelizable code fraction of 95% could see a speedup of up to 20x!
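These figures follow Amdahl’s law, which is easy to check numerically (a small illustrative helper, not part of the project itself):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when a fraction p of the work runs on n cores."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / cores)

amdahl_speedup(0.50, 1000)   # approaches 2x, no matter how many cores you add
amdahl_speedup(0.95, 1000)   # approaches 20x
```

The serial fraction of the program puts a hard ceiling on the achievable speedup, which is why finding as much parallelizable code as possible matters.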

To begin with, we were told about the basics of compiler optimisation.

Compiler optimisation aims to make code:

  • Smaller, e.g. dead code elimination, unreachable code elimination
  • Faster, e.g. auto-vectorisation, auto-parallelisation
  • Cheaper, e.g. register colouring, cache management


Image: the stages of compilation at which optimisations can be applied.

There are many types of optimisation, and each corresponds to a stage of compilation at which it can be applied. In the image above, we can see that it is possible to optimise the parse tree, the intermediate code or the target code. Katie chose to run her optimisations on the program’s intermediate code: the intermediate representation translated from the original source code naturally exposes basic blocks – straight-line sequences of code without control flow. These form distinct units of execution upon which optimisations can be carried out, saving time and space compared to running and storing the optimisation per individual instruction. Intermediate code also uses a smaller set of instructions than the original source language, which reduces the number of instruction combinations and structures the optimisation needs to look for.

Optimisation is carried out by analysis and transform passes. A transform pass changes the code written by the programmer, and an analysis pass ensures these changes are not dangerous, e.g. leading to unexpected behaviour or segmentation faults.

Unreachable code elimination is one such transform: it removes code that can never execute, optimising the size of the program. However, with this comes the issue of whether the transform is safe – will any code be broken?

When the analysis searches for a particular program property that cannot be decided, the optimisation approximates cautiously. One example is overestimating reachability:

Image: an example of cautiously overestimating reachability.

Katie adopted a similar approach in her project, implementing a compiler optimisation that parallelises loops. The aim was to speed programs up by utilising many cores.

It is possible to take a risk and try to parallelise code that is non-parallelizable, but the safer option is to underestimate parallelizability.

Image: an example loop considered for parallelisation.
After carrying out safety checks in the analysis of the above loop, Katie was able to optimise the code, extracting loops and running them as separate threads distributed across multiple cores. The checks included:

  • checking that there are no loop-carried dependencies, i.e. that later iterations are independent of earlier ones
  • checking that arrays don’t alias, to prevent hidden dependencies between iterations
  • ensuring the loops are bounded by known constants
  • ensuring there are no unexpected breaks or returns
  • handling reduction variables on a per-thread basis
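The effect of such a transform can be sketched in Python (purely illustrative – Katie’s optimisation operates on intermediate code, not Python): a loop whose iterations are independent is split into contiguous chunks, one per worker.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(f, n, workers=4):
    """Run `for i in range(n): out[i] = f(i)` with iterations split across workers.
    Only safe when iterations are independent (no loop-carried dependencies)."""
    out = [None] * n
    chunk = (n + workers - 1) // workers
    def run(start):                       # each worker handles one contiguous chunk
        for i in range(start, min(start + chunk, n)):
            out[i] = f(i)                 # distinct indices: workers never alias
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run, range(0, n, chunk)))
    return out

parallel_for(lambda i: i * i, 10)   # same result as the sequential loop
```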

Image: the loop after being extracted and run as separate threads.
To finish her presentation, Katie talked about how she ran a Mandelbrot set calculation with a maximum iteration count of 1000 to test the compiler optimisations she had created to spread the iterations across multiple cores. The graph below shows the data from the test:
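For reference, the core of such a Mandelbrot benchmark looks something like this (a minimal sketch, not Katie’s actual test code). Since each point’s iteration count is independent of every other point’s, the loop over points is an ideal candidate for automatic parallelisation:

```python
def mandelbrot_iterations(c, max_iter=1000):
    """Iterate z -> z^2 + c, counting steps until |z| escapes beyond 2."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter              # never escaped: treated as inside the set

mandelbrot_iterations(0j)        # 1000: the origin never escapes
mandelbrot_iterations(2 + 2j)    # 0: escapes immediately
```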

Procedurally generating closed-circuit racetracks

In this week’s talk we had another 3rd year student, Jamie, explain his project to us and why he chose to spend his time on this subject.

Jamie started out by giving some examples of procedurally generated content. The example he used was Minecraft and how 2D Perlin Noise was used for the terrain generation. However, he explained that different methods are used in procedural generation depending on the required properties of the result. He illustrated this by explaining that Perlin Worms are used to generate the caves in Minecraft in order to increase the connectivity of the caves.


He then went on to briefly touch upon the uses of procedural generation in audio, L-systems and city layout generation (as seen above), before getting to the main part of his talk: “Procedurally generating closed-circuit racetracks”.

Jamie started out by giving us some definitions that were necessary to understand the process. Firstly, he needed to define a racetrack, which was required to be a closed circuit and needed to have a start and a finish. Then, he talked about the different constraints one could put on the racetracks. He chose to have a number of “hard constraints”, which were properties the generated racetrack had to have, and also a number of “soft constraints”, which were properties that would be preferable but not necessary. The “hard constraints” were represented by a set of (in)equality constraints, so they had to take certain exact values. Each “soft constraint” was represented by an energy function, which assigned a racetrack a value based on how well it satisfied the constraint. The total “value” of a racetrack was then found by summing all the energy functions.

In order to write his code and generate racetracks, he needed a way of representing them, which he did using an ordered list of quadratic Bézier curves, each representing a piece of the racetrack with at most one type of curvature. The racetracks themselves were generated using something called Voronoi diagrams (as seen below).
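A quadratic Bézier curve is defined by three control points; evaluating it is a one-liner per coordinate (a quick illustration, not Jamie’s code):

```python
def quadratic_bezier(p0, p1, p2, t):
    """Point on a quadratic Bezier curve: B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2."""
    x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
    y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
    return (x, y)

quadratic_bezier((0, 0), (1, 2), (2, 0), 0.5)   # (1.0, 1.0), the apex of the arc
```

Since the curve bends only towards its single control point, each piece has at most one type of curvature, which is exactly the property the representation relies on.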


Voronoi diagrams are defined by a set of points, with the edges being the locus of positions equidistant from their two nearest points. Jamie then created racetracks by selecting a collection of these edges. Each racetrack is first checked against the “hard constraints” and is then optimised by minimising the sum of the energy functions of the “soft constraints” using gradient descent, which works a bit like rolling a ball down a slope (see picture below).
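Gradient descent itself can be sketched in a few lines (a toy one-dimensional version with a made-up energy function, purely for illustration):

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Repeatedly step downhill along the negative gradient of an energy function."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)     # the "ball" rolls a small step down the slope
    return x

# Toy energy E(x) = (x - 3)^2, whose gradient is 2(x - 3); the minimum is at x = 3.
gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

In the project the variable being optimised is the whole set of curve control points, and the energy is the sum of the soft-constraint functions, but the principle is the same.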


Jamie finished his presentation by giving us a demo of his code and explaining that, to finish off his project, he would run a user study to see whether the racetracks he generated are truly as good as, or maybe even better than, the content that is available so far.

Parallelization, Convolutional Neural Networks and other buzz-words

Last Wednesday we were given a talk by Ben, one of the third years at Queens’. His talk introduced us to some very recent popular computer science topics and a way to combine them under his Part II project “Parallelized Deep Learning for Convolutional Neural Networks on the Intel Xeon Phi”.

He started his talk by explaining how neural networks work. We saw how they aim to process information in a way similar to biological systems, such as the brain. Their structure resembles a brain, with multiple interconnected layers, each composed of multiple neurons. Neurons are just processing units that apply a predefined function to a set of inputs. Convolutional neural networks are a particular case in which the function applied in each neuron is a convolution. The following picture represents the structure of a neural network, and shows how one neuron applies its function.
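The convolution applied in such a layer can be illustrated with a small sketch (a naive “valid” convolution; real implementations are heavily optimised, and, like most deep-learning libraries, this omits the kernel flip, so it is strictly a cross-correlation):

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution: slide the kernel over the image, summing products."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            row.append(sum(image[y + j][x + i] * kernel[j][i]
                           for j in range(kh) for i in range(kw)))
        out.append(row)
    return out

convolve2d([[1, 2], [3, 4]], [[1, 0], [0, 1]])   # [[5]]
```

With a suitable kernel, such a neuron responds strongly to a local pattern – an edge, say – wherever it appears in the image, which is what makes convolutional layers good at tasks like digit recognition.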


Ben then explained that you can train such networks to make different “intelligent” decisions – his example was recognising handwritten digits. The training is done by minimising the error for each training sample – in a way similar to letting a ball roll down a hill.

The theory behind all of this dates back to around 1980, but it requires quite a lot of computational power, and only recent advances in technology have made using such a structure feasible. For his project, Ben uses the Intel Xeon Phi co-processor, specialised for parallel computing. By splitting and parallelising the calculation, Ben showed us how the training time can be reduced from several days to mere seconds.

Keep Talking and Nobody Explodes

This week’s Wednesday meeting saw the Queens’ CompScis participate in a somewhat different activity: bomb defusal.

In Keep Talking and Nobody Explodes, a player is locked in a room with a bomb that will imminently go off, and has five minutes (or less) to defuse the bomb by disarming several complicated modules. However, they have absolutely no idea how to defuse the bomb. Thankfully, the other players are equipped with a bomb defusal manual — but they can’t see the bomb, so everyone involved has to communicate quickly and clearly to stand a chance of survival.


Having arranged ourselves into threes and decided on team names, we got to work. The game proved to be quite interesting, and certainly tested our ability to convey information accurately and under pressure. We deciphered confusing symbols, disambiguated between lots of homophones, and tried to tackle a light displaying Morse code, although around half of us found it impossible to coherently read out the rapid sequence of dots and dashes.

Overall, we enjoyed improving our communications skills in a fast-paced and competitive game environment. Congratulations to Dhruv, Henry, and Tamara, who managed to defuse 9 bombs by the end of the session! Also thanks to the developers of Keep Talking and Nobody Explodes for providing us with copies of the game.

Game Theory – NIM and Sprague-Grundy

Last Wednesday we were given a talk by Radu, one of the third-year students at Queens’. His talk introduced us to some interesting and cool concepts in game theory – a topic that relates well to computer science, for example in problems where the optimal play must be found.

Radu’s talk focused on NIM-style games. The aim of the game is to be the last player able to make a move. The image below shows an example – there are three piles of coins, and on each turn a player is allowed to take as many coins as they like from a single pile (they must take at least one coin). For example, a player could take all the coins from pile one, and then it would be the opposing player’s turn.

Image: an example NIM position with three piles of coins.

Through the clever use of XORing the pile sizes, Radu went on to show that calculating and making the optimal play is very easy. Once a player has moved into a winning position – one that leaves the opponent facing piles whose XOR-sum is 0 – then as long as they maintain optimal play, they cannot lose.
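The strategy is short enough to write down (a small sketch of the standard NIM strategy):

```python
def nim_sum(piles):
    """XOR of all pile sizes; the player to move is losing iff this is 0."""
    s = 0
    for p in piles:
        s ^= p
    return s

def optimal_move(piles):
    """Return (pile_index, new_size) leaving the opponent a nim-sum of 0, or None."""
    s = nim_sum(piles)
    if s == 0:
        return None          # every move hands the opponent a winning position
    for i, p in enumerate(piles):
        if p ^ s < p:        # reducing this pile to p ^ s zeroes the nim-sum
            return (i, p ^ s)

optimal_move([3, 4, 5])      # (0, 1): shrink the first pile to 1, leaving [1, 4, 5]
```

After the move, nim_sum([1, 4, 5]) is 0, so whatever the opponent does, the winning player can always restore the XOR-sum to 0 on their next turn.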

Image: the “Poison Chocolate Problem”.

Moving on from the NIM game, Radu went on to show (via the Sprague-Grundy theorem) that many games can be reduced to a NIM game, and thus an optimal solution can be found. One such example, represented in the image above, is the “Poison Chocolate Problem”, where a player can take any row(s) or column(s) in a single turn, but must not be the player to eat the poisoned block.