Google’s Artificial Intelligence is able to create videos from a single image

The team at DeepMind, Google's advanced Artificial Intelligence lab, has unveiled new progress with a very interesting feature: called Transframer, it allows the Artificial Intelligence software in question to generate 30-second videos starting from a single image as input. At first glance it may seem like a nifty little trick, but the implications are much bigger than just a .GIF file.


The increasingly advanced Artificial Intelligence software

In reality, Transframer is something bigger: it is a new general-purpose framework for image modeling and vision tasks based on probabilistic frame prediction. The framework unifies a wide range of tasks, including image segmentation, view synthesis and video interpolation, and can generate video or other outputs from a single image together with one or more context frames.

We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. Our approach unifies a broad range of tasks, from image segmentation, to novel view synthesis and video interpolation. We pair this framework with an architecture we term Transframer, which uses U-Net and Transformer components to condition on annotated context frames, and outputs sequences of sparse, compressed image features.
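A shape-level sketch can make that pipeline concrete. Everything below is illustrative: the function names, dimensions and toy operations are my own stand-ins, not DeepMind's actual code. The idea is that a U-Net-style encoder turns each annotated context frame into a feature map, and a Transformer-like decoder attends over those features to emit a sequence of compressed image tokens:

```python
import numpy as np

rng = np.random.default_rng(0)

def unet_encode(frame, annotation):
    """Toy stand-in for the U-Net: fold the annotation into the frame,
    then downsample with 2x2 mean pooling to a coarse feature vector."""
    h, w = frame.shape
    x = frame + annotation  # naive conditioning on the annotation
    pooled = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return pooled.reshape(-1)

def transformer_decode(context_features, query_annotation, n_tokens=8, vocab=16):
    """Toy stand-in for the Transformer: weight context features by
    similarity to the query annotation (a single attention-like step),
    then project the summary to logits over a small token vocabulary."""
    q = np.full(context_features.shape[1], query_annotation)
    scores = context_features @ q
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    summary = weights @ context_features  # attention-weighted context summary
    proj = rng.standard_normal((summary.size, n_tokens * vocab))
    logits = summary @ proj
    return logits.reshape(n_tokens, vocab)  # one logit row per image token

# Two 8x8 context frames with scalar "annotations" (e.g. timestamps).
frames = [rng.standard_normal((8, 8)) for _ in range(2)]
annotations = [0.0, 1.0]
feats = np.stack([unet_encode(f, a) for f, a in zip(frames, annotations)])
logits = transformer_decode(feats, query_annotation=2.0)
tokens = logits.argmax(axis=1)  # greedy decode of the compressed image tokens
```

In the real model the token sequence encodes sparse, compressed image features (the abstract mentions DCT-style compression of images), which are then decoded back into the predicted frame.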

The proposed model, on which this Artificial Intelligence is based, has shown promising results on eight tasks in total, including semantic segmentation, image classification and optical flow prediction. What this article focuses on, however, is Transframer's ability to generate different videos, even if at low quality. The research team describes it as a state-of-the-art model for video synthesis that, from very little information, can generate coherent 30-second videos from a single image.

Being a framework dedicated to visual prediction, it operates on a collection of context images with associated annotations (timestamps, camera viewpoints, etc.) and a query annotation; the task is to predict a probability distribution over the target image. This allows it to "train" and consequently learn how a real object should look when viewed from a different angle.
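The interface just described can be sketched in a few lines. This is a hypothetical toy predictor, not the paper's method: it returns a per-pixel Gaussian over the queried frame, where context frames whose annotations (here, timestamps) are closest to the query annotation contribute most to the mean:

```python
import numpy as np

def predict_frame_distribution(context, query_annotation, noise=0.1):
    """Toy probabilistic frame predictor (illustrative only).

    context: list of (image, annotation) pairs, e.g. annotation = timestamp.
    Returns mean and standard deviation of a per-pixel Gaussian over the
    queried frame.
    """
    images = np.stack([img for img, _ in context])
    annos = np.array([a for _, a in context], dtype=float)
    # Softmax over negative annotation distance: nearer frames weigh more.
    scores = -np.abs(annos - query_annotation)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    mean = np.tensordot(weights, images, axes=1)
    std = np.full_like(mean, noise)
    return mean, std

def sample_frame(mean, std, rng):
    """Draw one frame from the predicted distribution."""
    return rng.normal(mean, std)

rng = np.random.default_rng(42)
# Two 4x4 "frames" observed at timestamps 0.0 and 1.0.
ctx = [(np.zeros((4, 4)), 0.0), (np.ones((4, 4)), 1.0)]
mean, std = predict_frame_distribution(ctx, query_annotation=0.5)
frame = sample_frame(mean, std, rng)
```

Repeatedly advancing the query timestamp, sampling a frame, and appending it to the context is, conceptually, how a single annotated image can be rolled out into a 30-second video.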

Transframer is state-of-the-art on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30-second videos from a single image without any explicit geometric information. A single generalist Transframer simultaneously produces promising results on 8 tasks, including semantic segmentation, image classification and optical flow prediction, with no task-specific architectural components, demonstrating that multi-task computer vision can be tackled using probabilistic image models. In principle, our approach can be applied to a wide range of applications that require learning the conditional structure of annotated image-format data.

Interesting developments for DeepMind

The developments were announced by Google on its blog and published as a scientific paper entitled "Transframer: Arbitrary Frame Prediction with Generative Models", whose abstract is quoted above.

Even if the videos shown are very low resolution, this is still a particularly interesting AI model. It demonstrates a real ability to perceive the depth and perspective of objects, producing a series of images that convey a sense of movement when played in sequence. There are certainly many fields of application for this technology.

The article Google's Artificial Intelligence can create videos from a single image was originally published on Tech CuE | Close-up Engineering.