DLLAMA: Scaling LLAMA Into a Distributed Graph Database

This week, Daniel gave a presentation on his Part II project, which is called “DLLAMA: Scaling LLAMA Into a Distributed Graph Database.”


A graph database represents entities and relationships in the form of a graph instead of a table. In this approach, calculating the distance between two entities (see Bacon Number) would be much more efficient. Daniel discussed three naive ways to represent graphs in a computer and pointed out the problems of each of them.


We could use a matrix to represent the connectivity between nodes by recording a “1” in the matrix when there is a connection, and “0” otherwise:

matrix

Unfortunately, the matrix representation takes up a lot of space: it needs an entry for every possible pair of nodes, even when the graph is sparse. Alternatively, we could use a linked-list representation to represent the graph:

linked list

Unfortunately, the linked-list representation has bad read performance and interacts poorly with the cache, since the data is spread throughout memory.

To solve some of these problems, Dan introduced a data structure called CSR (Compressed Sparse Row). It is similar to the linked-list representation, except that all the edges are stored in a single array. This removes the high read costs of the linked-list representation, at the expense of a significant additional cost when adding edges:

CSR
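For a rough idea of what this looks like in code, here is a minimal Python sketch (illustrative only, not LLAMA’s actual implementation) that packs an edge list into a vertex-offset array plus a single edge array:

```python
# Minimal CSR (Compressed Sparse Row) sketch -- illustrative, not LLAMA's code.

def build_csr(num_vertices, edges):
    """Pack a list of (src, dst) edges into an offset array and one edge array."""
    # Count the out-degree of each vertex.
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1

    # offsets[v] .. offsets[v + 1] is the slice of edge_array holding v's neighbours.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]

    # Fill the single edge array.
    edge_array = [0] * len(edges)
    next_slot = list(offsets[:-1])
    for src, dst in edges:
        edge_array[next_slot[src]] = dst
        next_slot[src] += 1
    return offsets, edge_array

def neighbours(offsets, edge_array, v):
    # A read is one contiguous, cache-friendly slice of the array -- no pointer chasing.
    return edge_array[offsets[v]:offsets[v + 1]]

offsets, edge_array = build_csr(4, [(0, 1), (0, 2), (1, 2), (3, 0)])
print(neighbours(offsets, edge_array, 0))  # [1, 2]
```

Reading a vertex’s neighbours is just a slice of one array, but inserting an edge means shifting or rebuilding that array, which is exactly the write cost mentioned above.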


This data structure (with a small improvement called “snapshots”) is used in LLAMA, a graph database engine. Dan modified LLAMA to distribute both the processing and the data structure across multiple machines. For read-heavy workloads his system outperforms a single machine, and he presented benchmark results showing DLLAMA performing significantly better than both Neo4j (another widely used graph database) and the original version of LLAMA.


Interpolation in the Latent Space of Variational Autoencoders – What does it mean and why is it useful?

Dhruv’s Part II project is titled “Interpolation in the Latent Space of Variational Autoencoders”. None of us had any idea what that means, so this week Dhruv aimed to help us all understand what it means, why it’s useful, and how it’s done, with the promise that he would try to add some science to the black box of machine learning.

The first step was to explain what a variational autoencoder is. The basic idea is that it takes an image in a particular domain (or some other data) and converts it to a few numbers (in this project, just 2), from which the image can be approximately reconstructed later. The latent values are also encouraged to follow a normal distribution, which, together with the small number of values per image, should mean that the values represent something meaningful about the image. The aim is to let us interpolate between the values from two images and get an interpolation that makes sense, such as interpolating between slanted “1”s to get a straight “1”.


Interpolation between two slanted ones gives a straight one
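To make the interpolation step concrete, here is a minimal Python sketch. The encode and decode functions are hypothetical stand-ins for a trained VAE’s encoder and decoder (Dhruv’s actual model isn’t shown here); the interpolation itself is just a straight line between the two latent codes:

```python
import numpy as np

def interpolate(encode, decode, image_a, image_b, steps=8):
    """Walk in a straight line through latent space between two images.

    encode(image) -> 2-D latent vector; decode(z) -> reconstructed image.
    Both are assumed to come from an already-trained VAE.
    """
    z_a = encode(image_a)
    z_b = encode(image_b)
    frames = []
    for t in np.linspace(0.0, 1.0, steps):
        z = (1 - t) * z_a + t * z_b   # linear blend of the two latent codes
        frames.append(decode(z))
    return frames
```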

The first dataset used to test the model is a set of hand-written numbers. The values chosen for each of the images are tightly clustered by the number in the image, which means that the model is working well; when interpolating between images of the same number, it usually looks natural. However, there are a few points where different numbers are overlapping, which results in the model giving distorted combinations of the numbers when asked to interpolate across this region.


Plotting hand-drawn numbers in the 2D latent space. Most of the numbers are clustered well, but there is some overlap between 5 and 3 in the centre of the plot.

The second dataset is a collection of photos of objects taken from several different angles, to try and interpolate between the angles and produce an image from a different perspective. Dhruv is still working on refining the model for this dataset, but currently it produces quite blurry images.


Deep Learning for Music Recommendation

This week, it was Andy’s turn to give a presentation about his Part II project, which uses deep learning to automatically tag music. Andy began by talking about how Spotify has millions of songs in its database and why machines are needed to auto-tag them to form playlists.


He then explained some of the background theory that his project builds upon, such as:

  • Mel-frequency spectrograms, which represent the spectrum of frequencies on the mel scale, so that distances correspond roughly to perceived differences in pitch (see the short sketch after the figure below)
  • Deep networks and supervised learning
  • Convolutional layers and how deep networks learn through gradient descent

A mel-spectrogram of a typical music file.
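As a concrete aside, a mel-spectrogram like the one above can be computed in a few lines of Python with the librosa library. This is a common recipe rather than necessarily Andy’s exact pipeline, and the file name is just a placeholder:

```python
import librosa
import numpy as np

# "song.mp3" is a placeholder path, not a file from Andy's dataset.
y, sr = librosa.load("song.mp3", sr=22050)                  # audio samples + sample rate
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)  # power spectrogram on the mel scale
S_db = librosa.power_to_db(S, ref=np.max)                   # convert power to decibels
print(S_db.shape)  # (n_mels, time_frames): the "image" a convolutional network can consume
```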

Finally, Andy described traditional tagging algorithms such as collaborative filtering and explained how his project aims to address their shortcomings. He also described the network architecture and the research paper he was building on, as well as some preliminary results.


A diagram of a single neuron, and a deep network consisting of many neurons interconnected.

Summapp


This evening Pavol Drotar gave a talk about Summapp, an Android application he wrote with a team of five friends. Summapp analyses an audio recording of a phone call and returns a list of key words and actions found in it. A beta version is available on the Google Play Store.

Summapp was implemented in Kotlin, a programming language that interoperates closely with Java. The advantage of programming in Kotlin is that it offers more concise and safer code.

The app itself uses Google Speech-to-Text on the phone handset, plus a custom cloud-based service that in turn uses DialogFlow to extract the important parts of the call. The results are then fed back to the user’s phone via Google Firebase.

Benefits of using Summapp:

  • Extracts key events, such as meeting places and times, from an audio recording of a phone call
  • Exports extracted events to Google Calendar and shares them with other users
  • Shows specific places mentioned in the call on Google Maps
  • Identifies contacts mentioned in the phone call
  • Provides an organised history of calls


Cambridge Hackathon 2018

On Saturday 20 January, right before the start of Lent term, about 300 enthusiastic hackers gathered at 9 am in the Cambridge Corn Exchange to compete in Hack Cambridge Ternary – the 2018 edition of the Cambridge Hackathon. A long 24-hour period of brainstorming, discussing, snacking and, above all, coding awaited them.

The Cambridge Hackathon is a student-run coding competition where teams compete to create the most cutting-edge, creative, sophisticated, or amusing product. In the 24 hours, the participants have to come up with ideas, develop the concepts, put it all together, and give a presentation of their achievement. There are also various companies with mentors present to help all the Hackers with their problems. (Also, they give away tons of swag.) Despite the limited time, amazing products are made every year.


View of an average table at the Hackathon

Queens’ was well represented at the Hackathon: CompScis from various years signed up for the event and developed some cool products. Aliyah, Jack, Jamie and Lorelyn developed an app which uses Microsoft Cognitive Services to scan payment receipts and summarise them for the user. Jirka and some others developed a system for Amazon’s Alexa which can tell jokes, store new jokes, and even rate your jokes! We (Lex and some others) developed a distributed system for fast and secure sharing of medical records.


During the 24-hour period there are many points at which a hacker can feel tired and hopeless, but pushing through produces some great products, which are definitely worth the struggle. All these ideas were showcased on Sunday, and their variety was mind-blowing: body-controlled games, health applications, speech recognition, and much more.


The showcase

The Cambridge Hackathon is a great way to meet new people and develop coding skills, but most importantly to have fun. I would personally recommend it to anyone who has done some coding and wants to have a great time!

What the freshers did over Christmas

Our first weekly meeting of this term involved brief presentations by each of the first years on programming projects they worked on over the Christmas break. Before cracking on with the presentations, Alastair gave a recap of the main parts of a presentation to nail (script, speech, slides and body language), and we discussed what went well and what went badly during revision and project work over the break.

Zebulon started off by presenting his sudoku solver, written entirely in ML, a functional programming language. He showed us how he split his program up into submodules, how he represented the sudoku board, and detailed the algorithm he used to decide which number to place in each cell.
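Zebulon’s solver is written in ML and his exact cell-selection strategy isn’t reproduced here, but for flavour, a common backtracking approach looks roughly like this Python sketch:

```python
# A common backtracking sudoku solver, sketched in Python (not Zebulon's ML code).
# board is a 9x9 list of lists, with 0 marking an empty cell.

def candidates(board, r, c):
    """Digits that can legally go in cell (r, c)."""
    used = set(board[r]) | {board[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {board[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]

def solve(board):
    # Find the first empty cell; if there is none, the board is solved.
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for d in candidates(board, r, c):
                    board[r][c] = d
                    if solve(board):
                        return True
                    board[r][c] = 0   # undo and try the next digit
                return False          # no digit fits here: backtrack
    return True
```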

Adam created a renderer of the Mandelbrot set over the break. In his presentation he gave a description of what the Mandelbrot set is, how you calculate whether a point belongs in the set or not, and a demo of his program.
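The membership test itself is short: iterate z ← z² + c and check whether |z| stays bounded. A rough Python version of the standard escape-time calculation (a sketch, not Adam’s renderer):

```python
def mandelbrot_iterations(c, max_iter=100):
    """Return how many iterations the point c takes to escape, or max_iter
    if it never does (in which case c is treated as inside the set)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:        # once |z| > 2, the sequence is guaranteed to diverge
            return n
    return max_iter

print(mandelbrot_iterations(-1 + 0j))   # stays bounded: 100 (in the set)
print(mandelbrot_iterations(1 + 1j))    # escapes almost immediately: 1
```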

Alice also rendered the Mandelbrot set, as well as the Julia set, using the Python notebook environment introduced this year for the first years’ scientific computing course. She also created an audio representation of the Mandelbrot set, and altered the Julia set parameters over time to generate some funky pictures to accompany the music.

mandelbrot

Rahma rendered another fractal, this time the dragon curve, which is formed by starting from a single line and, at each iteration, taking the existing curve, rotating it 90 degrees and appending it to the end. She described how we can represent such a curve programmatically and how we can calculate the new definition of the curve at each iteration.

dragoncurve
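One common way to represent the curve programmatically is as a sequence of left/right turns, where each iteration appends a turn plus a flipped, reversed copy of the existing sequence. A rough Python sketch (Rahma’s representation may well differ):

```python
def dragon_turns(iterations):
    """Build the dragon curve's turn sequence: each iteration appends an 'R'
    followed by a reversed copy of the sequence with L and R swapped."""
    turns = ""
    for _ in range(iterations):
        flipped = turns[::-1].translate(str.maketrans("LR", "RL"))
        turns = turns + "R" + flipped
    return turns

def trace(turns):
    """Convert the turn sequence into the list of points the curve visits."""
    x, y, dx, dy = 0, 0, 1, 0            # start at the origin, heading right
    points = [(x, y), (x + dx, y + dy)]
    x, y = x + dx, y + dy
    for t in turns:
        dx, dy = (dy, -dx) if t == "R" else (-dy, dx)   # rotate the heading 90 degrees
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

print(dragon_turns(3))                   # RRLRRLL
print(len(trace(dragon_turns(10))))      # 1025 points, i.e. 1024 unit segments
```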

Costin implemented the Quine–McCluskey algorithm for minimising Boolean functions in C++. He detailed what the algorithm involves and the data representation he used to implement it.
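Costin’s C++ code isn’t shown here, but the heart of Quine–McCluskey (repeatedly merging implicants that differ in exactly one bit) can be sketched in a few lines of Python; the final prime-implicant-chart step is omitted:

```python
from itertools import combinations

def combine(a, b):
    """Merge two implicant strings (e.g. '100' and '101' -> '10-') if they
    differ in exactly one non-dash position; return None otherwise."""
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) == 1 and a[diffs[0]] != "-" and b[diffs[0]] != "-":
        i = diffs[0]
        return a[:i] + "-" + a[i + 1:]
    return None

def prime_implicants(minterms, width):
    """Combine minterms (given as integers) until only prime implicants remain."""
    current = {format(m, f"0{width}b") for m in minterms}
    primes = set()
    while current:
        used, nxt = set(), set()
        for a, b in combinations(sorted(current), 2):
            merged = combine(a, b)
            if merged is not None:
                nxt.add(merged)
                used |= {a, b}
        primes |= current - used      # anything that never merged is prime
        current = nxt
    return primes

# f(a, b) with minterms 0, 1, 3 reduces to a' + b, i.e. the implicants '0-' and '-1'.
print(prime_implicants([0, 1, 3], 2))
```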

Pavol spent some of his Christmas break analysing crime data from Cambridge, creating graphs to show crime rates in each month of the year, and areas of high and low crime.

Finally, Mukal gave a presentation on deep reinforcement learning, a topic he read up on over the break. He gave a brief overview of what reinforcement learning is, the notion of exploitation vs exploration, and the innovations DeepMind brought to the table, such as deep Q-learning.
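For context, deep Q-learning replaces the lookup table of classic tabular Q-learning with a neural network that estimates Q(state, action). The tabular update it builds on, together with the epsilon-greedy rule that trades off exploration and exploitation, looks roughly like this (an illustrative Python sketch, not material from Mukal’s talk):

```python
import random

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """One Q-learning step: nudge Q[state, action] towards the Bellman target
    reward + gamma * max over a' of Q[next_state, a']."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (target - old)

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Exploration vs exploitation: usually pick the best-known action,
    occasionally a random one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```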

Interactive Revision Workshop

As the term draws to a close, our minds begin to turn to the winter holidays. We think of getting the chance to sleep, spend time with family, see our friends, listen to Christmas music and, most importantly, sleep. However, this week’s meeting (presented by the second years) firmly reminded us that there was more to think about than that.


Jamie introduced the session


Jamie introduced the session by suggesting three things to do over the break. The first suggestion was to take a break: the term is long and busy, and he suggested taking a week off to catch up on sleep. This point was contested by some, however; Tamara pointed out that if you continue working into the first week of the holidays, you can complete all of the mandatory assignments early, relieving the pressure of completing them later in the break.

The second suggestion was to explore our interest in Computer Science. Throughout the term it is easy to see the subject you are studying as a means to an end, as merely content to learn for an exam. In reality, we are here to learn more about a subject we are passionate about, and that passion is what keeps us motivated to work hard. He therefore suggested researching and learning more about the topics we have found most interesting so far, which would have the added bonus of helping us understand those topics better.

The final suggestion was to revise. Instead of lecturing us on how to do this, the second years had the brilliant idea of splitting the group into three subgroups to discuss it amongst ourselves.


We discussed revision ideas in subgroups


Each group contained a mix of first, second and third years, which allowed us to learn from our more experienced peers and discuss ways of revising over the Christmas break. After talking it over in small groups, we compiled our ideas into a list of do’s and don’ts for revision on the board. It was really helpful to hear the older years’ first-hand experiences, especially on how revision can differ between topics: for example, discrete maths may lend itself to a more practical, question-based revision method, whereas algorithms might be better studied by writing our own notes and implementing examples in ML or Java.

At the end of the session we regrouped and wrote down all of the ideas we’d come up with:

Do’s | Don’ts
---- | ------
Identify your concerns | Don’t waste time on tiny bits or in general
Do practice questions | Don’t procrastinate
Do some exam questions | Don’t simply rely on exam questions
Refresh – revisit information you’ve learned at progressively longer intervals (e.g. flashcards / the Anki app) | One size doesn’t fit all subjects
Teaching – aim to understand something well enough to explain it | Don’t just do one subject at a time
Get set work out of the way | Don’t be distracted
Have a revision plan – schedule tasks, not time, and schedule by week, not by day | Don’t stay in the same location

Paper 3 (new course):

  • Discuss with friends
  • Discuss likely topics in supervisions
  • List core concepts
  • Papers from older courses?