A Non-context-free Metamorphic Virus

Last wednesday’s session was an enlightening view of the dark and mysterious world of computer viruses. Mistral was our guide, delivering a detailed and intriguing talk on the metamorphic virus he has been developing for his Part II Project

Given the wide misuse of the term by the media, it is worth clarifying what is actually meant by a virus, in the technical sense.

Virus (pl. viruses) is a program that can ‘infect’ other programs by modifying them to include a possibly evolved copy of itself.

- Fred Cohen

With this definition in mind, we can move onto metamorphic viruses, which are viruses which mutate their code in some way upon each infection.

To wet out appetite, Mistral conjured up screenshots of historic viruses (pictured), giving a brief history of computer viruses. The earliest viruses were mainly for the purposes of a joke and it was typically the intention of the creator to make it obvious to the user that their machine had been compromised. This usually took the form of displaying messages or images to the user. Indeed, mistral gave a demo of a DOS virus which he had written (!) which informed users that their PC had taken a ‘holiday’ and would then proceed to lock the PC for the entire day!

example_mistral

Mistral went on to describe various techniques deployed by anti-virus engines which attempt to detect infected processes and files on the system. He covered a range of topics, covering simple techniques such as simply hashing the whole (or parts of) the file to far more advanced techniques, including the use of finite automata to model and detect infections.

He explained the principle components of a metamorphic virus and how his virus has been designed. This list of components included a disassembler, assembler and linker as well as the analysis engine and transformation system.

The first step in the infection process if to generate the first patient, which is taken by infecting the first process with your virus, which would have been compiled into an executable. This form of the virus is the one which is transmitted and will evolve continually on each new infection. The virus will need to disassemble itself at each step, into some internal representation, analyse and transform this representation before lining and assembling the result into the new infected process. This is a huge topic but Mistral delivered detailed examples including instruction permutation and constant folding/unfolding.

The final part of Mistral’s talk involved a discussion of formal languages and automata and how viruses can be classified:

  • Language: Virus family
  • Word: Virus Instance
  • Grammar: Metamorphic engine
  • Automata: metamorphic virus detector

It turns out that there is a weakness in the design of such detection schemes and Mistral presented some of his ideas about how this can be overcome.

Overall, a thoroughly enjoyable talk, with some extremely interesting food for thought!

The Hassabis Fellowship in Computer Science

Exciting news.

One of our alumni Demis Hassabis has made a donation to college to endow two fellowships. One will be the Hassabis Fellowship in Computer Science and the other will be the Niccoli Fellowship in Natural Sciences in honour of Demis’ wife, Teresa. They met at Queens’ and she studied Natural Sciences so I think that’s very fitting.

My fellowship will henceforth be known as the Hassabis Fellowship in Computer Science.

The supervision system, and indeed the close working relationship between me and my students is very expensive to provide. Interest generated from the college’s endowment is used to subsidise this. Demis’ donation means that there is now money in the endowment specifically for the purpose of teaching Computer Science. This is a really great thing to ensure that we can carry on delivering the highest quality teaching and support to Computer Science students at Queens’.

I’m also pleased to report that the college’s Governing Body has decided to admit Demis as a Fellow Benefactor. The first ever Computer Science Fellow Benefactor at college!

Thank you Demis!

Anonymous Peer-to-Peer File Sharing

300 million people use peer-to-peer file sharing systems every month. However, with privacy being a growing concern among todays users, the need for an anonymised file sharing system is evident. This is the basis for Sam’s Part II Project, Anonymous Peer-to-Peer File Sharing.

Sam introduced us to the basic concepts of file sharing, both client-server and P2P methods. BitTorrent uses the peer to peer method which has the advantage of having higher availability, reliability and usually providing faster download speeds. There are many uses for file sharing, such as downloading music from Spotify or downloading games via apps like Steam (Eduard was particularly surprised to learn that Steam uses a P2P system). Sam argued that although these services are useful, there are many reasons to provide privacy and it is very easy for a malicious user to discover peers using the service.

sampres1

There are methods to stay anonymous when using P2P services such as using a VPN to connect, or using TOR. However, a good VPN is usually not free and usage can still be traced back to you, the user. TOR is slow and is incompatible with many clients.  As a solution to this problem, Sam has implemented a system that provides this desired anonymity using relays. The program first sets up a route to each peer and a tracker encrypts each layer; this is essential for the relays to know where to forward the data. BitTorrent is run once the route is set up, and data is encrypted using an end-to-end key.

sampres2

Sam demoed the project by showing us the transfer of a picture from one location to another on his computer and as a closing remark outlined some limitations to this approach. Unfortunately the tracker provides a central point of weakness to the system and must be trusted itself.

Sam’s project demonstrates a possible solution to meet the privacy concerns of today’s users whilst still allowing them to download all their songs and movies. Best of luck with your project Sam!

Knot Theory Presentation

This week at our weekly Queens’ Computer Science meeting, Eduard presented his Part II Project about Knot Theory. While Knot Theory is still a relatively new field of study and is largely being explored on a theoretical level, a significant application of Knot Theory has been to DNA in genetics.

The general problem of Knot Theory is to devise the best method to convert a knot, a twisted or otherwise manipulated circle in 3D space to a circle in 3D space, assuming that this can be done. Eduard’s project concentrates on trying out two different solutions to this problem and seeing which one more efficiently discovers the actions needed to result in a circle.

The first part of Eduard’s talk focused on the image processing aspect of his project. In order to transform a possibly unclear image to a graph in 2D, the pixels must be processed one at a time and converted to either black or white depending on whether or not the specific pixel represents a knot or the background. In addition to showing the image below, Eduard also performed a cool live demo in w he took a photo of a knot drawn on the white board with a marker and demonstrated the image processing, converting the imperfect hand-drawn image into a clear representation of the knot on the computer.

An example of an image containing a knot what needs to be processed before being used.

 

Once the redundant information in an image is removed, any bridges that were broken in the image need to be connected and the problem becomes how to figure out the center of each line to best be able to model the knot with a line and avoid various edge cases where the line may become stuck or curve back on itself while trying to trace the knot. The center of a knot segment can be computed by using circles of a finite radius, limited by the perimeter of the knot, sampled at various points in the knot and calculating the local maxima.

The knot can be represented in two different ways using Gauss codes, either by encoding the bridges with numbers and with + or – to represent either going over or under an intersection, or by using a three half-planes to represent the knot.

One method of using Gauss codes to represent a knot.

slide_18

A second way to visualize and represent a knot.

A finite number of the three Reidmeister moves can be applied to the knot diagrams to end up with a circle. Eduard plans to explore the two algorithms that solve the Knot Problem with various heuristics.

The three Reidemeister Moves which can be applied to a knot.

 

 

In addition to learning about the fascinating field of Knot Theory, we discussed some recommendations for giving good presentations, including committing the first thirty seconds of your talk to memory, since the audience will likely decide whether they will remain engaged within the first thirty seconds of a talk.

 

 

Requirements management in large systems

Last night we had a guest talk from Bob Salmon who was an undergraduate at Queens’ in 1989.

Bob talked about how adding a single requirement can drastically change the design of a large system. We went through some examples starting with the requirement that your system should be upgradeable. This is easy to do if its your laptop: install updates and reboot. But if you also have the requirement that it must work for a very large system. There might not be enough downtime available (e.g. overnight) for you to shut the system down to upgrade it. So instead you might run a series of copies to a shadow system whilst leaving the original system (the system of record) running. Once all the copies are done you can then flick a switch to deploy the new one. We also looked at architectural approaches to provide high-availability – it was left to the audience to think about how to provide high-availability upgrades!

Bob has worked on the UK smart meter project which is slated to install smart meters in every home in the UK by 2020. He talked about the challenges in building this system and how some of the different requirements are interacting to make things even more difficult. One interesting aspect of the Smart Meter proposal is that other devices in my house will be able to talk to the meters. For example, my washing machine might want to know when electricity is cheap. This requires firewalling within the house to ensure that consumer devices cannot interfere with the national energy distribution infrastructure – we have to watch out for trojan washing machines.

After the talk we went to dinner as usual and we talked about how the Computer Science course has changed (or not changed!) since Bob’s time.

Algorithmic Video Summarisation

With the first week of Lent term out of the way, our Wednesday meeting saw Jake present his part II project on algorithmic video summarisation.

With the volume of video footage recorded each year from 6 million CCTV cameras in the UK estimated at a dizzying 50 billion hours, it is evident that humans can no longer keep up with the task of picking out important events for police work. Jake’s solution aims to use a variety of video summarisation techniques to reduce videos to just the scenes containing interesting activity. In simple terms:

video-summarisation

He presented a variety of methods for determining just what sections of a video are ‘interesting’. We were shown an early demo implemented using the open source computer vision library OpenVC, and assured that the choice of Windows & Visual C++ was necessitated by performance alone. The demo took some CCTV-style footage (although punting on the River Cam rarely appears on Crimewatch, admittedly), and analysed the change in colour distribution between frames to determine the moments where the most action was taking place.

This proof of concept showed the legitimacy of an algorithmic approach for selecting regions of activity, and we were then introduced to a range of more sophisticated techniques that might be considered. These included comparisons to a median image (with interesting parallels to how modern video encoding is performed, with H.264’s use of keyframes), as well as motion and object detection, facial recognition, and the easily forgotten idea of examining audio data too.

Of course, there is no use comparing these approaches without some objective measure of the quality of their results. This evaluation will be a significant part of Jake’s project, as he seeks to compare the computed videos with manually produced summaries. He presented the novel concept of asking users to perform objective tasks from viewing the extracted footage, for instance counting the number of people entering a room, as a means of determining the reliability of the summaries.

Aside from utilising CCTV footage more effectively, this technology also has also potential uses in video browsing and retrieval systems, and consumer video composition apps (a step forward from auto-generated slideshows of holiday snaps, we can only hope). Best of luck, Jake.

hackathon

An Electronic Trading Hackathon (eth0)

Last weekend, Jane Street organised ‘eth0′ – the first edition of their hackathon. Being a quantitative trading firm, Jane Street write smart computer programs that are constantly trading on the stock market, implementing novel algorithms and ideas. This means that they need efficient and robust code, as well as profitable strategies.

Thus, the aim of the hackathon was to provide a similar experience to computer science students by getting teams to write bots that would compete on a mock exchange. Not missing the opportunity to strut our stuff, the team from Queens’ comprised Jeppe, Eduard and Sid.

The brief was simple: an online service was provided, to which bots could send JSON requests and execute trades. There were a few stocks – FOO, BAR, BAZ, QUUZ – that every team could trade (along with a few bots written by Jane Street). There was also an additional product called CORGE which was a kind of ETF (Exchange Traded Fund). This meant that it was a composite of other stocks – 0.1 FOO and 0.8 BAR. At any point, you could convert from one form to the other.

Given only that much information, the teams were to compete in three rounds. The scores were weighted such that your worst performances counted more than your best ones. This was to simulate the real world where a big loss is far more negative (potentially causing the firm to close down) than a profit is positive.

The first part of the hackathon turned out to be more a technical than financial task. The team from Queens’ spent a few hours just writing code to connect to the server, receive updates on prices and trades, store/ parse this data, and finally send responses.

When it finally got to the trading bits, Queens’ managed to implement two strategies.

The first is called ‘market making’. Essentially, every stock currently trading has two sets of prices. One of these is the ‘offer’ – the price someone is willing to sell the stock at. The other is the ‘bid’ – the price someone is willing to buy at. There is normally a difference between the highest bid and the lowest offer – known as the ‘spread’. The bot worked by placing a buy request just above the bid and a sell request just below the offer. When people wanted to either buy or sell, the bot’s price would be the most attractive and the trade would get executed. Thus, the bid-offer spread (albeit small) is the profit the bot makes.

Queens’ managed to get this working before the first round and was the only team to make money.

For the next rounds, another strategy was used – ‘ETF arbitrage’. Arbitrage means the simultaneous buying and selling of a product to make a ‘riskless profit’ from price differences. For example, if you could buy an apple for $1 at store A, and then sell it to store B at $1.01, you could technically make $0.01 without spending any of your money. You could then borrow money and do this ad infinitum – as long as it was profitable. The ETF arbitrage worked similarly with CORGE, FOO and BAR. This could be thought of as buying a basket of 10 apples for $1 and then selling each for $0.11. Since CORGE could always be converted to 0.1 FOO and 0.8 BAR – it should always be priced as 0.1 the price of FOO plus 0.8 the price of BAR. If it is ever priced less than this (like the apples), you can buy the stock and convert it to FOO and BAR which are worth more. Thus, you can then sell them and pocket the difference (individual apples being sold). If the basket were to be worth more, you could do the reverse transaction. The challenge was made harder with a few extra transaction costs, but the idea was the same.

Queens’ turned out to be pretty successful in round two – coming second, and finishing third overall due to a large loss in the final round.

All in all, the excessive code, food and (of course) Red Bull, contributed to a really fun Saturday. We’re looking forward to doing more of these.