
Probabilistic Programming

This week it was Jamie’s turn to give a talk – on Probabilistic Programming, an alternative to the black-box magic of neural networks. He initially highlighted some of the pitfalls of traditional neural networks, primarily the large amount of data required to train these models and the lack of interpretability of the internals of the model.

Probabilistic programming instead allows us to encode probability distributions in our programs rather than single values. This means we can kick-start learning by encoding our prior beliefs about a problem. The resulting model is also more interpretable, since the posterior distribution tells us how uncertain the model is about its conclusions.

Jamie then provided the motivating example of coal mining disasters, where probabilistic programming can be used to determine the switchpoint year in which the average number of disasters dropped from a high level to a low level. He showed that even with a simple uniform prior (that all years were equally likely to be the switchpoint), the model was able to recover an accurate posterior distribution over the switchpoint year.
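For the curious, below is a rough sketch of what such a switchpoint model can look like in PyMC3. The library choice is ours and the data is synthetic, so this is not Jamie's actual code:

```python
# A rough sketch of a coal-mining switchpoint model in PyMC3.
# Library choice and data are assumptions: the counts below are synthetic,
# not the historical dataset used in the talk.
import numpy as np
import pymc3 as pm

years = np.arange(1851, 1962)
# Synthetic yearly disaster counts: a high rate before 1890, a low rate after.
disaster_data = np.random.poisson(lam=np.where(years < 1890, 3.0, 1.0))

with pm.Model() as model:
    # Uniform prior: every year is equally likely to be the switchpoint.
    switchpoint = pm.DiscreteUniform("switchpoint", lower=years.min(), upper=years.max())

    # Disaster rates before and after the switchpoint.
    early_rate = pm.Exponential("early_rate", 1.0)
    late_rate = pm.Exponential("late_rate", 1.0)
    rate = pm.math.switch(switchpoint >= years, early_rate, late_rate)

    # Observed disaster counts are modelled as Poisson draws at that rate.
    pm.Poisson("disasters", mu=rate, observed=disaster_data)

    # Sampling gives the posterior over the switchpoint year.
    trace = pm.sample(2000, tune=1000)
```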



Second Year Summers

This week the current second-year students gave a series of quick presentations detailing their summers with a mild focus on computer-science related happenings.

Costin began the presentations. He attended the TechChallenge Summer Camp: over seven weeks, with two sessions per week, he met with students in Romania to try to implement some of what he had learned the year before. He said that the Cambridge course had given him a strong theoretical grounding, and that the camp improved his technical skills as well.

Next up, Alice told us about her internship at Softwire. She had a depressingly long two-hour commute each way, but in spite of this she loved it! She worked at a site with 120 staff and 30 interns, and was allocated to a group of six interns with an accompanying full-time employee and tasked with a project for the ten weeks she worked there. In her retelling, she stressed how much she enjoyed her time there. From morale events such as sword fighting and a trip to see Incredibles 2 to free food, snacks and access to a gym, she felt valued as an intern. She also enjoyed finally understanding git: “Git is terrifying, but now I understand it!”


Alice presenting her summer

Mukul was next to tell us about his internship at the cyber-security startup Jazz Networks. They mostly deal with internal threats (for example, when a company’s employees attempt sabotage). He was involved with the Machine Learning team, which is something he is intensely passionate about. His job was to cluster process file event data using unsupervised machine learning techniques, with the aim of detecting patterns that could identify potential threats. He said that he really enjoyed the communal culture: everyone ate in the same place, and he felt free to go and play table tennis when he felt stuck. At the end of his internship, the company took him on a trip to Mallorca, where he stayed in a villa owned by one of the company’s investors. The trip was to celebrate the launch of the product, and he particularly enjoyed swimming in the sea from the investor’s private yacht! In his spare time, he wrote a blog to teach the basics of deep learning intuitively.

Zeb was next. His presentation discussed the balance between working and enjoying the summer. He detailed how, when thinking about internships, he was looking for something shorter so that he could have time during the summer to relax and pursue some of his other passions. He worked for five weeks at Diffblue, an Oxford spin-out company whose main product is an automated test generator; Zeb joined the team working to increase the coverage it achieves. He told us how nice it was to apply knowledge from the course in industry. He enjoyed the friendly atmosphere and the academic background of the company, alongside the free lunches Deliveroo’d from local restaurants twice a week. In the rest of his summer, Zeb travelled to Paris, Marseilles, Austria and Amsterdam. Throughout the summer he also worked on creating videos for his YouTube channel. In his last three weeks he “finally fully relaxed”, and he recounted the films and Netflix shows he enjoyed watching. He thoroughly enjoyed his summer and told the first-year students that they shouldn’t feel pressured to commit to a long internship for their next summer.

Pavol worked in Edinburgh for a 12-week internship at Skyscanner. He’d never been to Scotland before and really enjoyed the city. Skyscanner is a much larger company with offices around the world, which was convenient because it allowed him to travel to London and Glasgow for different experiences. The company aims to be the most trusted and most used online travel brand in the world: the ultimate travel engine. He found it very interesting to see how every team in the company worked towards this same vision. The office had fun touches such as airplane seats, and he showed us a picture of how his offer arrived in a cute little suitcase. He worked in a team of 50 people, in a sub-team of 10. He worked on the company’s financial data, for example estimating profit from the web traffic on their sites so that it could be calculated daily rather than monthly. He developed a machine learning model there and managed to publish an article on the company blog so that they could use his research once he’d left.

During his summer he also travelled around Scotland, visiting Glasgow, the Highlands and their lochs, the sea and the Fringe Festival. He’d go on weekend trips regularly and went to Mallorca for four days, sleeping in a car rather than a villa as he drove around the whole island. He also went to Venice for a week with his family and to Porto with some friends. (He doesn’t recommend the waves for surfing: too big!) At the end of his internship, he went home to the countryside. His personal project throughout the summer was to cook 30 different dishes, which he succeeded in doing. His favourite part of the internship was the people he met, some of whom he went on trips with. He also liked generally chatting with people in the office and enjoyed the team building events, such as a barbecue and a treasure hunt.


Pavol showing off his cooking skills

 

Alice, Mukul and Pali were asked about the interview process for these internships, and they detailed it for the sake of the first years, including how they found out about the opportunities and how many interviews they had. When asked whether he would return, Mukul said he loved the startup culture but is keen to try working for a bigger company. Both Alice and Pali were given return offers. Overall the internships were presented as a very positive experience and a great use of their summer. Alastair ended the session with a reminder that a careers fair would be taking place at the Computer Lab in a month’s time: a great opportunity to find internships and generally get a feel for the companies that are interested in hiring students from Cambridge.

DLLAMA: Scaling LLAMA Into a Distributed Graph Database

This week, Daniel gave a presentation on his Part II project, which is called “DLLAMA: Scaling LLAMA Into a Distributed Graph Database.”

 

A graph database represents entities and relationships in the form of a graph instead of a table. In this approach, calculating the distance between two entities (see Bacon number) can be much more efficient. Daniel discussed three naive ways to represent graphs in a computer and pointed out the problems with each of them.

 

We could use a matrix to represent the connectivity between nodes by recording a “1” in the matrix when there is a connection, and “0” otherwise:

The adjacency-matrix representation of a graph
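To make this concrete, here is a tiny hypothetical four-node graph stored as an adjacency matrix (illustrative only, not from the talk):

```python
# Illustrative only: a tiny four-node graph (edges 0->1, 0->2, 1->2, 3->0)
# stored as an adjacency matrix.
import numpy as np

n = 4
adj = np.zeros((n, n), dtype=np.uint8)
for u, v in [(0, 1), (0, 2), (1, 2), (3, 0)]:
    adj[u, v] = 1

print(adj[0, 2])  # 1: edge 0->2 exists, and the lookup is O(1)
# The downside: storage is O(n^2) even when the graph has very few edges.
```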

Unfortunately, the matrix representation takes up a lot of space. Alternatively, we could use a linked-list representation to represent the graph:

The linked-list representation of the same graph

Unfortunately, a linked list representation has bad read performance and interacts poorly with the cache since data is spread throughout memory.
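A small illustrative sketch of the same graph with per-vertex linked lists of edges (again not from the talk) makes the locality problem easy to see:

```python
# Illustrative only: the same four-node graph with a singly linked list of
# outgoing edges per vertex, mirroring the linked-list representation above.
class EdgeNode:
    def __init__(self, dst, next_node=None):
        self.dst = dst
        self.next = next_node

heads = [None] * 4  # heads[u] points to u's first outgoing edge
for u, v in [(0, 1), (0, 2), (1, 2), (3, 0)]:
    heads[u] = EdgeNode(v, heads[u])

# Traversal chases pointers to objects scattered across the heap,
# which is exactly what hurts read performance and cache locality.
node = heads[0]
while node is not None:
    print(node.dst)
    node = node.next
```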

To solve some of these problems, Dan introduced a data structure called CSR (compressed sparse row). It is similar to the linked-list representation; the difference is that all the edges are stored in a single contiguous array. Doing so removes the high read costs of the linked-list representation, at the expense of a significant additional cost when adding edges:

The CSR representation
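As an illustration rather than Dan's implementation, here is how the same small graph might be packed into CSR form:

```python
# A minimal CSR sketch (not Dan's implementation): all edges live in one flat
# array, and an offsets array records where each vertex's edges begin.
import numpy as np

n = 4
edges = [(0, 1), (0, 2), (1, 2), (3, 0)]

# Count out-degrees, then prefix-sum them into the offsets array.
counts = np.zeros(n, dtype=np.int64)
for u, _ in edges:
    counts[u] += 1
offsets = np.concatenate(([0], np.cumsum(counts)))

# Fill the single contiguous edge array.
edge_array = np.empty(len(edges), dtype=np.int64)
fill = offsets[:-1].copy()
for u, v in edges:
    edge_array[fill[u]] = v
    fill[u] += 1

def neighbours(u):
    # A contiguous slice: cache-friendly to read, but inserting an edge
    # means shifting or rebuilding the arrays.
    return edge_array[offsets[u]:offsets[u + 1]]

print(neighbours(0))  # [1 2]
```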

 

This data structure (with a small improvement called “snapshots”) is used in LLAMA, a graph database engine. Dan modified LLAMA to support distributed processing across multiple computers, scaling the data structure out over a cluster. His system beats a single machine on read-heavy workloads, and he presented benchmark results showing that DLLAMA performed significantly better than Neo4j (another frequently used graph database) and the original version of LLAMA.

Interpolation in the Latent Space of Variational Autoencoders – What does it mean and why is it useful?

Dhruv’s Part II project is titled “Interpolation in the Latent Space of Variational Autoencoders”. None of us had any idea what that meant, so this week Dhruv aimed to help us all understand what it means, why it’s useful, and how it’s done, with the promise that it would try to add some science to the black box of machine learning.

The first step was to explain what a variational autoencoder is. The basic idea is that it takes an image in a particular domain (or some other data) and compresses it to a handful of numbers (in this project, just two), from which the image can be approximately reconstructed later. The training objective also encourages these latent values to follow a normal distribution, which, combined with the small number of values per image, should mean that they capture something meaningful about the image. The aim is to be able to interpolate between the latent values of two images and get a result that makes sense, such as interpolating between two slanted “1”s to get a straight “1”.


Interpolation between two slanted ones gives a straight one
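To make the idea concrete, below is a minimal PyTorch sketch (our assumption, not Dhruv's code) of a VAE with a two-dimensional latent space and a helper that interpolates between two images:

```python
# A hedged PyTorch sketch (our assumption, not Dhruv's actual code): a VAE
# with a 2-D latent space for 28x28 images, plus latent-space interpolation.
# Training (reconstruction loss + KL term) is omitted for brevity.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Sigmoid()
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.to_mu(h), self.to_logvar(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterisation trick
        return self.decoder(z), mu, logvar

def interpolate(model, x_a, x_b, steps=8):
    # Encode both images, walk a straight line between their latent means,
    # and decode each point back into image space.
    with torch.no_grad():
        z_a, _ = model.encode(x_a)
        z_b, _ = model.encode(x_b)
        alphas = torch.linspace(0, 1, steps).unsqueeze(1)
        zs = (1 - alphas) * z_a + alphas * z_b
        return model.decoder(zs).view(steps, 28, 28)

model = VAE()  # untrained here; in practice you would train it first
frames = interpolate(model, torch.rand(1, 28, 28), torch.rand(1, 28, 28))
print(frames.shape)  # torch.Size([8, 28, 28])
```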

The first dataset used to test the model is a set of hand-written numbers. The latent values chosen for each image are tightly clustered by the number it contains, which suggests the model is working well; interpolating between images of the same number usually looks natural. However, there are a few points where different numbers overlap, and there the model produces distorted combinations of the numbers when asked to interpolate across the region.


Plotting hand drawn numbers in the 2D latent space. Most of the numbers are clustered well, but there is some overlap between 5 and 3 in the centre of the plot.

The second dataset is a collection of photos of objects taken from several different angles, to try and interpolate between the angles and produce an image from a different perspective. Dhruv is still working on refining the model for this dataset, but currently it produces quite blurry images.

 

Deep Learning for Music Recommendation

This week, it was Andy’s turn to give a presentation about his Part II project on using Deep Learning to automatically tag music. Andy began by talking about how Spotify had millions of songs in its database and the need for machines to auto-tag these to form playlists.

 

He then explained some of the background theory that his project builds upon, such as:

  • Mel-frequency spectrograms (which represent the spectrum of frequencies on a perceptually motivated scale; a short example of computing one appears below the figure)
  • Deep networks and supervised learning
  • Convolutional layers and how deep networks learn through gradient descent

A mel-spectrogram of a typical music file.
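Here is the short example promised above: computing a mel-spectrogram with librosa. The library choice is ours, and the signal is a synthetic tone rather than real music:

```python
# A short sketch of computing a mel-spectrogram with librosa (the library is
# our assumption; the signal here is a synthetic sine wave, not real music).
import numpy as np
import librosa

sr = 22050
t = np.linspace(0, 3.0, int(3.0 * sr), endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 3 seconds of a 440 Hz tone

# 128 mel bands is a common choice for music-tagging models.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)  # log scale, closer to human perception

print(mel_db.shape)  # (n_mels, n_frames): the "image" a convolutional network consumes
```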

Finally, Andy described traditional algorithms for tagging, such as collaborative filtering, and explained how his project aimed to address their shortcomings. He described the network architecture and the research paper he was building on, as well as some preliminary results.


A diagram of a single neuron, and a deep network consisting of many neurons interconnected.
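To ground the architecture discussion, here is a hedged sketch of a small convolutional tagger that maps a mel-spectrogram to tag probabilities. The layer sizes and tag count are invented for illustration and are not the architecture Andy used:

```python
# A hedged sketch of a convolutional tagger over mel-spectrograms. The layer
# sizes and the tag count are invented for illustration and are not from
# Andy's project or the paper he followed.
import torch
import torch.nn as nn

class TaggerCNN(nn.Module):
    def __init__(self, n_tags=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse the time and frequency axes
        )
        self.classifier = nn.Linear(32, n_tags)

    def forward(self, mel):  # mel: (batch, 1, n_mels, n_frames)
        h = self.features(mel).flatten(1)
        return torch.sigmoid(self.classifier(h))  # independent per-tag probabilities

model = TaggerCNN()
fake_batch = torch.randn(4, 1, 128, 640)  # e.g. 128 mel bands, ~15 s of audio frames
print(model(fake_batch).shape)  # torch.Size([4, 50])
```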

Summapp

This evening Pavol Drotar gave a talk about Summapp, an Android application he wrote with a team of five friends. Summapp analyses an audio recording of a phone call and returns a list of key words and actions found in it. A beta version is available on the Google Play Store.

Summapp was implemented in Kotlin, a programming language which integrates closely with Java. The advantage of programming in Kotlin is its more concise syntax and built-in null safety compared to Java.

The app itself uses Google Speech to Text on the phone handset and a custom cloud-based service which in turn makes use of DialogFlow to extract important parts of the call. The results are then fed back to the user’s phone using Google Firebase.

Benefits of using Summapp:

  • Extracts key events, such as meeting places and times, from an audio recording of a phone call
  • Exports extracted events to Google Calendar and shares them with other users
  • Shows specific places described in the call on Google Maps
  • Identifies contacts mentioned in the phone call
  • Provides an organised history of calls

 

Cambridge Hackathon 2018

On Saturday 20 January, right before Lent term starts, about 300 enthusiastic hackers gather at 9 am in the Cambridge Corn Exchange to compete in Hack Cambridge Ternary, the 2018 edition of the Cambridge Hackathon. A long 24 hours of brainstorming, discussing, snacking and, above all, coding awaits them.

The Cambridge Hackathon is a student-run coding competition where teams compete to create the most cutting-edge, creative, sophisticated, or amusing product. In the 24 hours, the participants have to come up with ideas, develop the concepts, put it all together, and give a presentation of their achievement. There are also various companies with mentors present to help all the Hackers with their problems. (Also, they give away tons of swag.) Despite the limited time, amazing products are made every year.


View of an average table at the Hackathon

Queens’ was well represented at the Hackathon: CompScis from various years signed up for the event and developed some cool products. Aliyah, Jack, Jamie and Lorelyn developed an app which uses Microsoft Cognitive Services to scan payment receipts and summarise them for the user. Jirka and some others developed a system for Amazon’s Alexa which can tell jokes, store new jokes, and even rate your jokes! We (Lex and some others) developed a distributed system for fast and secure sharing of medical records.

 

During the 24-hour period there are many points at which a hacker can feel tired and hopeless, but pushing through results in some great products, which are definitely worth the struggle. All these great ideas were showcased on Sunday, and the variety was mind-blowing: body-controlled games, health applications, speech recognition, and much more.


The showcase

The Cambridge Hackathon is a great way to meet new people and develop coding skills, but most importantly to have fun. I would personally recommend it to anyone who has done some coding and wants to have a great time!