Algorithmic Video Summarisation

With the first week of Lent term out of the way, our Wednesday meeting saw Jake present his Part II project on algorithmic video summarisation.

With the volume of video footage recorded each year from 6 million CCTV cameras in the UK estimated at a dizzying 50 billion hours, it is evident that humans can no longer keep up with the task of picking out important events for police work. Jake’s solution aims to use a variety of video summarisation techniques to reduce videos to just the scenes containing interesting activity. In simple terms:

[Image: video-summarisation]

He presented a variety of methods for determining just what sections of a video are ‘interesting’. We were shown an early demo implemented using the open-source computer vision library OpenCV, and assured that the choice of Windows & Visual C++ was necessitated by performance alone. The demo took some CCTV-style footage (although punting on the River Cam rarely appears on Crimewatch, admittedly), and analysed the change in colour distribution between frames to determine the moments where the most action was taking place.
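For readers who want a feel for the colour-distribution idea, here is a minimal sketch in plain Python. It is not Jake's OpenCV/C++ demo: the frame representation (a non-empty list of (r, g, b) tuples), the coarse 4-bins-per-channel histogram, the L1 distance and the 0.2 threshold are all assumptions made for illustration, and every function name is hypothetical.

```python
from collections import Counter


def colour_histogram(frame, bins_per_channel=4):
    """Normalised coarse RGB histogram of a frame.

    Assumption: a frame is a non-empty list of (r, g, b) tuples, 0-255 each.
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in frame)
    total = len(frame)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms (0 = same, 2 = disjoint)."""
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in set(h1) | set(h2))


def activity_scores(frames):
    """Colour-distribution change between each pair of consecutive frames."""
    hists = [colour_histogram(f) for f in frames]
    return [histogram_distance(a, b) for a, b in zip(hists, hists[1:])]


def interesting_frames(frames, threshold=0.2):
    """Indices of frames whose colour distribution differs from the previous
    frame by more than the (assumed) threshold."""
    return [i + 1 for i, s in enumerate(activity_scores(frames)) if s > threshold]


# Toy footage: an empty stretch of river, then a punt drifts in and out.
water = [(0, 0, 200)] * 100
punt = [(0, 0, 200)] * 60 + [(200, 0, 0)] * 40
frames = [water, water, punt, punt, water]
scores = activity_scores(frames)
flagged = interesting_frames(frames)
```

In this toy footage the score only jumps when the punt appears or leaves, so a real summariser would presumably keep a window of frames around each flagged index rather than the flagged frames alone.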

This proof of concept showed the legitimacy of an algorithmic approach for selecting regions of activity, and we were then introduced to a range of more sophisticated techniques that might be considered. These included comparisons to a median image (with interesting parallels to how modern video encoding is performed, with H.264’s use of keyframes), as well as motion and object detection, facial recognition, and the easily forgotten idea of examining audio data too.
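The median-image comparison can be sketched in a few lines too. Again this is an illustration under stated assumptions rather than anything shown in the talk: frames are taken to be flat lists of greyscale values of equal length, the tolerance of 25 grey levels is arbitrary, and the function names are made up.

```python
from statistics import median


def median_image(frames):
    """Per-pixel median across all frames: a rough estimate of the static
    background, since anything that moves occupies a pixel only briefly."""
    return [median(pixel_values) for pixel_values in zip(*frames)]


def foreground_fraction(frame, background, tolerance=25):
    """Fraction of pixels that differ from the background by more than
    `tolerance` grey levels (an assumed, arbitrary cut-off)."""
    changed = sum(1 for p, b in zip(frame, background) if abs(p - b) > tolerance)
    return changed / len(frame)


# Toy footage: four empty frames and one with a bright object in the middle.
empty = [10, 10, 10, 10]
busy = [10, 200, 200, 10]
frames = [empty, empty, busy, empty, empty]

background = median_image(frames)
fractions = [foreground_fraction(f, background) for f in frames]
```

Frames with a high foreground fraction are the candidates for the summary; the median is used here instead of the mean so that a briefly visible object does not smear into the background estimate.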

Of course, there is no use comparing these approaches without some objective measure of the quality of their results. This evaluation will be a significant part of Jake’s project, as he seeks to compare the computed videos with manually produced summaries. He presented the novel concept of asking users to perform objective tasks from viewing the extracted footage, for instance counting the number of people entering a room, as a means of determining the reliability of the summaries.
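The talk did not specify how computed and manual summaries would be scored against each other. One plausible option, sketched here purely as an assumption, is to treat each summary as a set of selected frame indices and report precision, recall and F1; the function name and the example frame ranges are hypothetical.

```python
def summary_overlap(computed, manual):
    """Precision, recall and F1 of a computed summary against a manual one.

    Assumption: each summary is an iterable of selected frame indices.
    """
    computed, manual = set(computed), set(manual)
    shared = len(computed & manual)
    precision = shared / len(computed) if computed else 0.0
    recall = shared / len(manual) if manual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if shared else 0.0
    return precision, recall, f1


# The algorithm kept frames 10-19; a human annotator picked frames 15-24.
precision, recall, f1 = summary_overlap(range(10, 20), range(15, 25))
```

A frame-overlap score like this says nothing about whether a viewer can still answer questions about the footage, which is exactly the gap the task-based evaluation is meant to fill.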

Aside from utilising CCTV footage more effectively, this technology also has potential uses in video browsing and retrieval systems, and in consumer video composition apps (a step forward from auto-generated slideshows of holiday snaps, we can only hope). Best of luck, Jake.
