At the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), held 20-27 June 2014, an international group of researchers presented a new algorithm that can determine, with about 80 per cent accuracy, whether a video is running forward or backward. By probing the visual arrow of time, this research could help create more realistic graphics for the entertainment industry and deepen our understanding of the visual world.
‘It’s kind of like learning what the structure of the visual world is,’ said William Freeman, a professor of computer science and engineering at MIT and one of the authors of the paper. ‘To study shape perception, you might invert a photograph to make everything that’s black white, and white black, and then check what you can still see and what you can’t. Here we’re doing a similar thing, by reversing time, then seeing what it takes to detect that change. We’re trying to understand the nature of the temporal signal.’
Freeman and his team wrote three separate algorithms that tackle the problem in different ways. All three were trained on a collection of short videos that had already been labelled as playing either forwards or backwards.
The algorithm that performed best begins by dividing a frame of video into a grid of hundreds of thousands of squares; then it divides each of those squares into a smaller, four-by-four grid. For each square in the smaller grid, it determines the direction and distance that clusters of pixels move from one frame to the next.
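The paper’s exact feature pipeline is not spelled out here, so the following is only a rough sketch of that descriptor step, assuming OpenCV’s Farnebäck optical flow as the motion estimator and an illustrative block size of 16 pixels; the function name and parameter choices are hypothetical.

```python
# Sketch of the per-block motion descriptor: each block of the frame gets a
# four-by-four sub-grid, and each cell of that sub-grid records the average
# direction and distance that pixels move between two consecutive frames.
import cv2


def motion_descriptors(prev_gray, next_gray, square=16):
    """One 4*4*2 = 32-number descriptor per `square` x `square` block of a
    grayscale frame pair: the mean flow vector (dx, dy) in each cell of the
    block's four-by-four sub-grid."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    h, w = h - h % square, w - w % square        # crop to a multiple of the block size
    sub = square // 4                            # pixel size of one sub-grid cell
    # Average the flow inside each sub-grid cell...
    cells = flow[:h, :w].reshape(h // sub, sub, w // sub, sub, 2).mean(axis=(1, 3))
    rows, cols = cells.shape[:2]
    # ...then gather each block's four-by-four grid of cells into one descriptor.
    blocks = cells.reshape(rows // 4, 4, cols // 4, 4, 2).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(-1, 32)                # one 32-D descriptor per block
```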
The algorithm then builds a dictionary of approximately 4,000 four-by-four grids, where each square in a grid represents a particular direction and degree of motion. The entries are chosen so that, taken together, they offer a good approximation of all the grids found in the training data. Finally, the algorithm combs through the example videos to learn whether particular combinations of dictionary grids tend to indicate forward or backward motion.
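The article does not name the clustering or classification machinery, but one standard way to build such a dictionary and use it for the forward/backward decision is sketched below, assuming k-means for the dictionary and a linear SVM for the classifier; motion_descriptors is the hypothetical helper from the previous sketch.

```python
# Sketch of the dictionary-and-classifier stage (assumptions: k-means builds
# the ~4,000-entry dictionary, each clip is summarised as a histogram of
# dictionary entries, and a linear SVM makes the forward/backward call).
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC


def build_dictionary(all_descriptors, n_words=4000):
    """Cluster the pooled training descriptors into ~4,000 prototype grids."""
    return MiniBatchKMeans(n_clusters=n_words, random_state=0).fit(all_descriptors)


def clip_histogram(clip_descriptors, dictionary):
    """Summarise one clip as a normalised histogram over dictionary entries."""
    words = dictionary.predict(clip_descriptors)
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)


def train_classifier(descriptors_per_clip, labels):
    """labels: +1 for a forward-playing clip, -1 for a reversed one."""
    dictionary = build_dictionary(np.vstack(descriptors_per_clip))
    X = np.array([clip_histogram(d, dictionary) for d in descriptors_per_clip])
    return dictionary, LinearSVC(C=1.0).fit(X, labels)
```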
Following standard practice in the field, the researchers divided their training data into three sets, sequentially training the algorithm on two of the sets and testing its performance against the third. The algorithm’s success rates were 74, 77, and 90 per cent.
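A minimal sketch of that evaluation protocol, reusing the hypothetical train_classifier and clip_histogram helpers from the previous sketch, might look like this:

```python
# Sketch of the three-way split: train on two subsets, test on the third,
# and rotate so that every subset serves once as the held-out test set.
import numpy as np
from sklearn.model_selection import KFold


def three_fold_accuracies(descriptors_per_clip, labels):
    labels = np.asarray(labels)
    accuracies = []
    for train_idx, test_idx in KFold(n_splits=3, shuffle=True,
                                     random_state=0).split(labels):
        dictionary, clf = train_classifier(
            [descriptors_per_clip[i] for i in train_idx], labels[train_idx])
        X_test = np.array([clip_histogram(descriptors_per_clip[i], dictionary)
                           for i in test_idx])
        accuracies.append(clf.score(X_test, labels[test_idx]))
    return accuracies   # one accuracy per held-out subset
```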
One vital aspect of the algorithm is that it can identify the specific regions of a frame it relies on to make its judgments. The visual cues the algorithm picks up on could point to the cues that the human visual system uses as well.
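The article does not say how those regions are recovered. One simple possibility, assuming the linear classifier and dictionary from the earlier sketches, is to score each block of a frame by the classifier weight of the dictionary entry it was assigned to:

```python
# Sketch of one way to localise the evidence: each block inherits the SVM
# weight of its dictionary word, so blocks whose words push the decision
# hardest (in either direction) can be highlighted as a heat map.
def block_contributions(clip_descriptors, dictionary, clf):
    """Per-block score: positive values argue 'forward', negative values
    argue 'backward' under a linear classifier over word histograms."""
    words = dictionary.predict(clip_descriptors)   # word index per block
    word_weights = clf.coef_.ravel()               # one weight per dictionary word
    return word_weights[words]                     # weight inherited by each block
```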
The next-best-performing algorithm was about 70 per cent accurate. It was based on the assumption that, in forward-moving video, motion tends to propagate outward rather than contracting inward. In a video of a break in pool, for instance, the cue ball is, initially, the only moving object. After it strikes the racked balls, motion begins to appear in a wider and wider radius from the point of contact.
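That intuition can be made concrete through the divergence of the optical-flow field: expanding motion has positive divergence, contracting motion negative. A minimal sketch, again assuming Farnebäck flow and an arbitrary motion threshold, might tally the two:

```python
# Sketch of the outward-vs-inward cue: compute the divergence of the flow
# field and measure how much of the motion is expanding versus contracting.
import cv2
import numpy as np


def expansion_score(prev_gray, next_gray):
    """Positive score suggests motion spreading outward (forward-like),
    negative suggests motion collapsing inward (reversed-like)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Divergence = d(flow_x)/dx + d(flow_y)/dy, via finite differences.
    divergence = (np.gradient(flow[..., 0], axis=1) +
                  np.gradient(flow[..., 1], axis=0))
    # Only count pixels with appreciable motion, to ignore static background.
    moving = np.linalg.norm(flow, axis=2) > 0.5    # threshold is an arbitrary choice
    return divergence[moving].sum() if moving.any() else 0.0
```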
The third algorithm was the least accurate, but it may be the most philosophically interesting. It attempts to offer a statistical definition of the direction of causation.
‘There’s a research area on causality,’ said Freeman. ‘And that’s actually really quite important, medically even, because in epidemiology, you can’t afford to run the experiment twice, to have people experience this problem and see if they get it, and have people do that and see if they don’t. But you see things that happen together and you want to figure out: “Did one cause the other?” There’s this whole area of study within statistics on “How can you figure out when something did cause something else?” And that relates in an indirect way to this study as well.’
If a ball rolling down a slope is filmed colliding with a bump, it is launched into the air after the contact. Played in reverse, however, the ball takes flight with no apparent cause: there is no bump at the spot where it leaves the ground. The researchers were able to capture this intuition, whether the cause of a motion is present in the scene, as a statistical relationship between a mathematical model of an object’s motion and the ‘noise’, or error, in the visual signal.
The approach works only if the object’s motion can be described by a linear equation, which is rarely the case for motions involving human agency. The algorithm can, however, determine whether the video it is applied to meets that criterion, and in those cases its performance is much better.
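The underlying idea resembles causal-inference work on time series: fit a simple linear (autoregressive) model to a tracked trajectory in both temporal directions, and prefer the direction in which the leftover noise looks independent of the past. The sketch below is a loose illustration of that idea, not the authors’ implementation; it uses a crude correlation-based proxy where a proper independence test would be used in practice.

```python
# Loose sketch of the residual-independence idea for a 1-D tracked trajectory
# x[0..T-1] (e.g. the vertical position of the ball). Assumptions: a low-order
# linear AR model, and a crude higher-order correlation as the independence proxy.
import numpy as np


def ar_residuals(x, order=2):
    """Least-squares fit of x[t] from its `order` previous samples;
    returns (residuals, lagged regressors)."""
    lagged = np.column_stack([x[i:len(x) - order + i] for i in range(order)])
    design = np.column_stack([lagged, np.ones(len(lagged))])   # add an intercept
    target = x[order:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef, lagged


def dependence(residuals, regressors):
    """Crude proxy for dependence: correlation between squared residuals and
    the regressors (near zero if the noise is truly independent of the past)."""
    r = residuals - residuals.mean()
    return max(abs(np.corrcoef(r ** 2, regressors[:, k])[0, 1])
               for k in range(regressors.shape[1]))


def likely_forward(x):
    """True if the forward direction leaves 'more independent' residuals than
    the reversed direction; only meaningful when the linear model fits."""
    x = np.asarray(x, float)
    res_fwd, reg_fwd = ar_residuals(x)
    res_bwd, reg_bwd = ar_residuals(x[::-1])
    return dependence(res_fwd, reg_fwd) < dependence(res_bwd, reg_bwd)
```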