When computers prioritise what they see
Computer vision is a branch of artificial intelligence (AI) that develops techniques enabling computers to see and understand visual content such as images and videos. When processing visual content, computers look for patterns and link them to labels that make sense to us humans. For example, if a machine is taught that two eyes, a nose and a mouth are features of a face, it will report that a face was detected whenever it finds those features in an image. The same goes for any other object. Consider a scene in which a computer programme detects different objects, such as a person and a car. Over the past decade, we have seen countless methods that successfully find objects in a scene and tell us what they are: they can detect the person and the car, draw a bounding box around each and even tell us which pixels belong to each. However, while computers can successfully detect objects, they do not indicate the order in which we would interpret the scene. This advancement therefore introduces a new problem in computer vision: answering questions such as “Which object in the scene attracts the most attention?” Saliency...