
GB2550209A - Method of controlling a state of a display of a device - Google Patents

Method of controlling a state of a display of a device

Info

Publication number
GB2550209A
GB2550209A GB1608484.0A GB201608484A
Authority
GB
United Kingdom
Prior art keywords
individual
display
state
detected
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1608484.0A
Other versions
GB201608484D0 (en)
Inventor
Woodhead Marc
Oakley Richard
Rogers David
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Holograph Ltd
LUCOZADE RIBENA SUNTORY Ltd
Original Assignee
Holograph Ltd
Lucozade Ribena Suntory Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Holograph Ltd, Lucozade Ribena Suntory Ltd
Priority to GB1608484.0A
Publication of GB201608484D0
Priority to EP17724434.0A
Priority to PCT/GB2017/051359
Publication of GB2550209A
Legal status: Withdrawn (current)

Classifications

    • G07F 9/009: User recognition or proximity detection
    • G06Q 30/0261: Targeted advertisements based on user location
    • G06V 40/161: Human faces; detection, localisation, normalisation
    • G07F 19/207: Automatic teller machines [ATMs]; surveillance aspects at ATMs
    • G07F 9/001: Interfacing with vending machines using mobile or wearable devices
    • G07F 9/023: Arrangements for display, data presentation or advertising

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Multimedia (AREA)
  • Game Theory and Decision Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method of controlling a state of a display 104 of a device 100, such as a vending machine, is disclosed. The method comprises operating the display 104 in a first state, for example by displaying first video, image or audio content, and detecting the presence of an object associated with an individual 116, such as the face of the individual or a logo. It is determined whether the object is within a first threshold distance from the device, and the display is operated in a second state, for example by displaying second video, image or audio content, if the object is determined to be within the first threshold distance. The presence of the object may be detected while the object is at a distance from the device greater than the first threshold distance. A device 100 with a processor and camera 102 is also provided.

Description

METHOD OF CONTROLLING A STATE OF A DISPLAY OF A DEVICE
Field
A method of controlling a state of a display of a device is disclosed. Specifically, but not exclusively, a method of controlling a state of a display of a device to operate the display in first and second states is disclosed.
Background
A large variety of devices exists which promote or sell products or services to users. Such devices typically have a mechanical or touch screen interface with which a user can interact to purchase such products or services. A vending machine is an example of one such device. Known vending machines have a transparent front door through which the products contained therein can be viewed. To purchase one of the products, a user inputs a number associated with the product into a mechanical or touch screen keypad attached to the vending machine. Once payment is made through any conventional means, the product is dispensed. Such known vending machines may also have static branding or advertising on the exterior of the machine.
Such known machines have limited capabilities to engage with a user, and rely on the user being drawn to the machine by seeing the static branding/advertising or contents from afar. This is generally ineffective as reliance is placed on a user being close enough to the machine to see the branding or products.
Other known machines may have a digital display screen to display dynamic content.
Such content may be an advert or series of adverts on a loop. While such adverts may be more clearly visible from a distance, they can be distracting and ultimately detract from the user engaging with the machine.
There is a desire to improve the interaction between a user and a machine.
Summary
An invention is set out in the claims.
According to an aspect, a method of controlling a state of a display of a device is provided. The method comprises: operating the display in a first state; detecting the presence of an object associated with an individual; determining whether the object is within a first threshold distance from the device; and operating the display in a second state if the object is determined to be within the first threshold distance. By carrying out these steps, the device is able to detect objects from afar and, as they draw closer, react to them.
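Purely as an illustrative sketch of these steps, the state selection could be expressed as follows in Python (the language used for the image-processing examples in the Annex); the threshold value and the detect_object_distance helper are assumptions for the sketch rather than part of the claims.

    # Minimal sketch of the claimed method, assuming a hypothetical helper
    # detect_object_distance() that returns the distance (in metres) of a
    # detected object from the device, or None when no object is detected.
    FIRST_THRESHOLD_M = 3.0  # illustrative value only

    def select_display_state(detect_object_distance):
        state = "first"  # e.g. attract content intended to be visible from afar
        distance = detect_object_distance()
        if distance is not None and distance <= FIRST_THRESHOLD_M:
            state = "second"  # e.g. content acknowledging the detected individual
        return state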
The object may be the face of an individual. When the face is detected, the individual is deemed to be looking at the device. Specifically, it is assumed that the individual is looking at the display of the device and the video and/or audio content being output. As the individual draws nearer, the face comes within a first threshold distance of the device. The determination that the individual is within the threshold distance, together with the detection of the individual's face while within this distance, causes the device to react by changing the state of the display. The state of the display is changed to a second, different state in which the video and/or audio output by the display (and/or a speaker associated with the display in the case of audio) is different from that of the first state.
The change of state of the display serves to inform the individual that the device is aware that it is being looked at. Particularly, the state of the display may be changed to encourage the individual to move closer to the device and/or engage with the device.
Such a changing of display state improves the interaction between the device and the individual, and incentivises the individual to interact with the device.
The second display state may also provide information to the individual, thereby simplifying the sales process. By only displaying the second display state when the individual is both within range of the device and the face of the individual is detected, the device can be operated in the optimum state for the distance of the individual from the device. For example, when the individual is far away from the device and the display is operated in the first state, the display may show a simple video with larger images designed to be visible from afar. When the individual turns to look at the display and moves closer, the display is operated in the second state, in which more detailed or intricate content can be shown.
As the individual draws yet closer to the device, the display can be operated in a yet further state in which more specific information relating to interaction with the device can be displayed. This is appropriate at this point because the individual would now be within range to touch the device, or to interact with the device using the wireless technology of the individual's phone, for example.
The methods described herein therefore provide a device with the capability to tailor the interaction with an individual based on the distance of the individual from the device, and based on whether the individual is actually looking at the device. The content displayed or played is specific to the distance of the individual from the device, thereby ensuring that content relevant to, and visible from, the specific distance is provided. The tailoring of content improves the engagement of the individual with the device.
The methods and devices described herein may also respond to specific objects associated with the individual, such as branding or logos on the clothing of individuals. In response to detecting such branding or logos, the device reacts to reward the individual with offers, deals or similar.
Figure Listing
Exemplary arrangements of the disclosure shall now be described with reference to the drawings in which:
Figure 1 illustrates a system having a device with a display arranged to be operated in multiple states;
Figure 2 is a flow diagram of a process of changing a state of the device;
Figure 3 illustrates a device configuration for setting a threshold range for the device;
Figure 4 illustrates a method of detecting an object associated with an individual; and
Figure 5 is a flow diagram of a process of interacting with the device in a handsfree manner.
Throughout the description and the drawings, like reference numerals refer to like parts.
Detailed Description
Figure 1 shows an embodiment of the device 100 in a networked system. The device 100 includes a variety of elements that provide capability for the device to detect an individual 116 near the device. To achieve such detection, the device comprises one or more cameras 102 and a processor 106. The cameras 102 may be HD cameras or stereoscopic cameras, for example. In the case of the cameras 102 being HD cameras, preferably two HD cameras are present at different locations of the device 100 to allow depth detection capability. The processor 106 is configured to receive image data from the one or more cameras 102, analyse said image data via software means, and determine whether an individual 116 is detected. The processing of the image data may be performed entirely by the device 100 itself, or may be performed by one or more other devices in communication with the device 100.
The device 100 also includes a display 104 to display content. The content may be stored in a memory 108 of the device 100, or the content may be provided from an external device having a data connection to the device 100. Alternatively, some content may be stored in the memory 108, while other content may be provided by the external device.
The external device may be a server 114. The display 104 is controlled by the processor 106 to operate in a plurality of states such that, when an individual 116 is detected, a change in the state of the display 104 serves to inform the individual 116 of the detection.
The device 100 may also comprise a payment system 110 to allow purchase of objects contained within the device 100. Such a payment system 110 may include near-field communication capabilities (NFC), or indeed any other payment mechanism known in the art that allows an individual 116 to exchange currency for goods.
The device 100 may be part of a network 112 including the device 100 and an external server 114. The network 112 provides a data connection to and from the device 100 and the external server 114. The device 100 may connect to the network 112 as an Ethernet connection, internet connection, 3G/4G connection, or any other data connection type that allows communication between the device 100 and the external server 114.
The device 100 may also comprise communication means 120 to communicate with a mobile wireless communications device 118 of the individual 116. The mobile wireless communications device 118 may be a mobile phone, tablet, laptop or any device able to communicate wirelessly with the device 100. The communication means 120 may provide communication via NFC, Bluetooth, internet, or any other form of wireless electronic communication known in the art. The device 100 may have multiple communication means 120, such as NFC and Bluetooth. The Bluetooth communication means may be a Bluetooth beacon.
Software on the individual’s mobile wireless communications device 118, such as an app, may allow interaction between the mobile wireless communications device 118 and the device 100.
The various software modules, classes and functions described in the appended Annex form part of the open source OpenCV library available at www.opencv.org. The skilled person is aware of the computer program code available in this library, and is able to implement such code to carry out the image processing functions described herein.
Figure 2 shows an overview of the operation of the device 100. At step 200, the device 100 is in an “attract mode” in which the display 104 is operated by the processor 106 in a first state. The first state could be the display 104 displaying predetermined content. For example, the first state could be displaying one or more of video or image content, and/or playing audio content via a speaker of the device 100.
While in the first state, the processor 106 monitors for objects associated with an individual. To monitor for objects, the processor 106 interprets image data received from the one or more cameras 102 and analyses the image data to determine whether any objects in the field of view of the one or more cameras 102 are identifiable by the processor. In an embodiment, the object is a face and the processor 106 interprets the image data to determine whether a face of an individual 116 is detected. Such a detection may be performed using image processing techniques well known to the skilled person. For example, a “CascadeClassifier” class module may be used, as detailed in the Annex of this specification. In any of the image processing steps described herein, the skilled person would understand how to make use of background subtraction techniques to identify objects. For example, the “background_subtraction module” functions, detailed in the Annex of this specification, may be used.
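As a non-authoritative sketch of this detection step, the OpenCV CascadeClassifier referred to above could be used as follows; the cascade file path assumes the opencv-python distribution and the parameter values are examples only.

    import cv2

    # Haar feature-based cascade for frontal faces (path assumes opencv-python).
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_faces(frame):
        """Return (x, y, w, h) rectangles for faces found in a BGR camera frame."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # scaleFactor and minNeighbors are illustrative tuning values.
        return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)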
When a face is detected by the device 100, an individual is deemed to have turned to look at the device. Each time a face is detected, at step 200 a face detection event may optionally be stored in the memory 108 of the device 100 as a “head turn”. The device 100 is therefore able to determine the number of individuals who have turned to look at the device 100 based on detecting faces. Instead of or in addition to storing the data regarding the number of head turns in the memory 108, the device 100 may transmit the head turn data to an external device such as the external server 114 or the mobile wireless communications device 118.
Alternatively or additionally to detecting faces, in an embodiment the processor 106 may compare the image data received from the one or more cameras 102 with a database of object data stored within the memory 108 of the device. Such a database may include data values corresponding to a variety of objects the device 100 is configured to detect. Data regarding the detection of objects may also be stored and/or transmitted in the same manner as the head turn data. An example of an object, other than a face, being detected is described below in relation to figure 4.
The content displayed on the display 104 while in the first state may be such that the content is visible and/or audible at a distance, since at step 200 the individual 116 may be far away from the device 100.
The skilled person would understand that the detection of a face or other object may be performed in different ways depending on the type of camera(s) 102 used. For example, in the case of the camera(s) 102 being HD cameras, individual still frame images are captured by the camera(s) 102 for analysis. Each image provided by an HD camera is handled by software (run by the processor 106) as a multidimensional array, for example a grayscale image. Two or more individual still frames may be compared by the software to determine whether a face or other object associated with the individual 116 has entered into a detection range of the device 100. Such a comparison may be performed using the known “NumPy” module, as detailed in the Annex of this specification. The two or more individual still frames may be successive images separated by the frame rate of the video capture. Alternatively, the images may be separated by a predetermined time. For example, still frames captured 5 seconds apart may be compared by the software. Any time duration or frame rate may be used, and ideally the frame rate is at least 10 FPS.
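A minimal sketch of such a frame comparison follows, assuming two equally sized greyscale still frames; the pixel and count thresholds are illustrative tuning values rather than values taken from this disclosure.

    import cv2

    def frames_differ(prev_gray, curr_gray, pixel_thresh=25, count_thresh=500):
        """Rough check for something new entering the scene between two frames."""
        diff = cv2.absdiff(prev_gray, curr_gray)              # per-pixel change
        _, mask = cv2.threshold(diff, pixel_thresh, 255, cv2.THRESH_BINARY)
        return cv2.countNonZero(mask) > count_thresh          # enough pixels changed?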
In the case that the camera 102 is a stereoscopic camera, the above steps also apply. However, the stereoscopic camera is able to capture multiple images at different angles, thereby providing a 3D model of the environment. The 3D model of the environment visible to the stereoscopic camera is built using device libraries, as the skilled person would understand. Such device libraries are provided by the manufacturer or developer of the stereoscopic camera, and allow functionality of the stereoscopic camera to be used to build the 3D model. Further, using the 3D model, background noise is removed using known techniques and non-static objects are detected as detailed above in relation to the HD camera. Optionally, any “faces” that do not have a 3D depth are discounted. This allows the device 100 to discount any pictures of faces when using a stereoscopic camera.
The still frames captured by either of the above camera types may not be stored in the memory 108. Instead, any images may be converted into several low resolution greyscale images. Each greyscale image may be tuned using known techniques to provide the best outline processing results for detecting the presence of an object associated with the individual 116.
The processor may additionally, at step 200 or at any other step, monitor for the presence of a human outline of an individual. Monitoring for a human outline, in addition to an object associated with an individual, improves the accuracy of detection since pictures of objects (such as faces or other objects detectable by the device 100) can be disregarded if no human outline is also detected. The detection of a human outline could be achieved by a number of techniques known to the skilled person. For example, the “Blob Extraction” functions detailed in the Annex of this specification may be used.
At step 200, if human outlines are detected then the device 100 may also count the number of individuals 116 who pass by the device 100 based on the detected human outlines. This may be termed the “footfall” past the device 100. The footfall may include any or all of: all individuals who pass by the device 100; only individuals who pass by the device 100 without a face being detected (i.e. individuals who do not look at the device 100); or only individuals who pass by the device 100 with a face or object detected (i.e. individuals who do look at the device 100). The skilled person would understand how to use the image processing techniques described herein to determine the footfall past the device 100. For example, the functions detailed in the “Movement detection” and “people_counter module” sections of the Annex of this specification may be used; a simple contour-based sketch is also given after the next paragraph.
The footfall may be stored in the memory 108 of the device 100. The device 100 is therefore able to determine the number of individuals who pass the device, and optionally whether the individuals did or did not look at the device 100 (based on detecting a face or not, as previously described). Instead of or in addition to storing the data regarding the footfall in the memory 108, the device 100 may transmit the footfall data to an external device such as the external server 114 or the mobile wireless communications device 118.
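One possible sketch of counting human outlines from a foreground mask, using the contour functions listed in the Annex; the area threshold is an assumption that depends on camera placement and frame resolution.

    import cv2

    def count_human_outlines(foreground_mask, min_area=2000):
        """Count large foreground blobs in a binary mask as candidate human outlines."""
        # findContours returns 2 or 3 values depending on the OpenCV version;
        # taking the second-to-last element works for both.
        contours = cv2.findContours(foreground_mask, cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_SIMPLE)[-2]
        return sum(1 for c in contours if cv2.contourArea(c) > min_area)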
At step 202, the device 100 monitors for movement of the detected individual 116 into a first predetermined range or distance from the device 100, as described below. In other words, the device 100 determines whether the individual 116 has walked towards or closer to the device 100. The first predetermined range or distance of the device 100 may be termed an “engage zone”. The range or size of the engage zone may be based on the positioning of the one or more cameras 102 on the device 100, or may have a predetermined value. The predetermined value may be, for example, input during a configuration of the device 100. The range may be, for example, 3 meters; however, other ranges may be used depending on the size of the device 100 and/or the area around the device 100.
In the case that the camera(s) 102 are HD cameras, the detection of the individual 116 entering into the engage zone may be achieved based on one or more dimensions of the detected face or other object. In the case that the face of the individual 116 is detected, the width and/or height of the detected face may be used to determine the distance of the individual 116 from the device 100. For example, a comparison between one or more dimensions of the detected face size and the corresponding dimension(s) of an average human face size provides an estimate of the distance between the device 100 and the individual 116. The skilled person would understand that such distance estimates can be made based on knowing the corresponding dimensions of an average face or the object at a certain distance from the device 100.
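A hedged sketch of such a distance estimate follows, using a pinhole-camera relationship between the detected face width in pixels and an average real-world face width; the average width and focal length below are assumptions for illustration and would in practice come from calibration of the camera(s) 102.

    AVERAGE_FACE_WIDTH_M = 0.16   # assumed average adult face width
    FOCAL_LENGTH_PX = 800.0       # assumed, obtained from camera calibration

    def estimate_distance_m(detected_face_width_px):
        """Pinhole estimate: distance = real_width * focal_length / pixel_width."""
        if detected_face_width_px <= 0:
            return None
        return AVERAGE_FACE_WIDTH_M * FOCAL_LENGTH_PX / detected_face_width_px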
In the case that an object (other than a face) associated with the individual 116 is detected, a physical aspect of the object may be used to determine the distance of the object, and therefore the individual 116, from the device 100. For example, the width, height or dimension of a particular aspect of the object or entire object may be used. A comparison between the detected width, height or dimension of the detected object and the corresponding known width, height or dimension of the object provides an estimate of the distance between the device 100 and the object, and therefore between the device 100 and the individual 116. The skilled person would understand that such distance estimates can be made based on knowing the corresponding dimensions of the object at a certain distance from the device 100.
Optionally, an estimate of the age of the individual 116 may also be calculated based on the ratio of detected face height to detected human outline height. Known ratios between average face height to body height (human outline height) can be used to estimate the age, since face height makes up a smaller proportion of body height as an individual grows taller.
In the case that the camera 102 is a stereoscopic camera, the distance of the individual 116 from the device 100, and therefore the detection of the individual 116 within the engage zone, may be achieved using the accurate depth perception possible with the stereoscopic camera. As mentioned above in relation to step 200, the stereoscopic camera provides the capability to build a 3D model of the environment, and therefore based on this the processor 106 can readily determine the distance of any detected objects from the device 100.
Optionally, the stereoscopic camera may determine the height of a human outline. Using a normalized input, this may be achieved in one of the following ways. During a configuration of the device 100, an object of a known height may be placed within the view of the stereoscopic camera at a known distance from the camera to calibrate the device 100. The calibration ensures that the device 100 is able to accurately determine the height of objects entering the 3D space. Alternatively, the height may be estimated based on the accurate depth perception provided by the stereoscopic camera and knowledge of the height of the stereoscopic camera from the ground, as the skilled person would understand.
Optionally, as for the HD camera, the stereoscopic camera also provides the possibility to estimate the age of the individual 116. This may be estimated based on the ratio of face height to human outline height, for example. The skilled person would understand that the stereoscopic camera provides more accurate estimations for all aspects due to the 3D models and depth perception possible.
For any camera type used, based on known data for average height and average face width and/or height for each gender, the most probable gender of the individual can optionally also be determined. Such a determination can therefore be made without having to obtain any personal information from the individual 116.
Additional data may also be used to improve the probability that the correct gender determination is made. This additional data may include detection of other objects that the device 100 has been configured to detect, as explained above in relation to step 200. For example, clothing length, hair length and shape of the individual may be used and matched against existing data for each gender. If the camera 102 is a stereoscopic camera, the probability of correct gender detection is increased as the use of a 3D model allows the detection of a greater range of objects. Average human width data for each gender may also be used to determine the probable clothing size of the individual.
Figure 3 shows an example of how the physical positioning of the one or more cameras 102 defines the range or size of the engage zone. The camera(s) 102 are directed down at an angle Θ defined between the device 100 and the central viewing axis of the camera 102. In the case where the device 100 is generally perpendicular to the floor, making Θ acute directs the central axis along which the camera 102 is facing towards the floor. An individual 306 is not detected by the camera 102 as the face 306a of the individual 306 is not within the view of the camera 102. However, an individual 304, who is closer to the device than the individual 306, would be detected as the face 304a of the individual 304 is within the view of the camera 102. Therefore, when the individual 304 is standing within a distance 302 from the device 100, the presence of an individual is detected. Put another way, if the individual 306 moves to the position of the individual 304, the device 100 will detect, by detecting the face 306a, that the individual 306 has moved within range of the device 100 and therefore into the engage zone.
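The geometry of figure 3 can be sketched as follows, under the assumptions that the camera sits above average face height, that Θ is measured between the device front and the camera's central axis, and that the camera has a known vertical field of view; all numeric values are illustrative only and not taken from the figure.

    import math

    def max_face_detection_distance_m(camera_height_m=2.0, face_height_m=1.6,
                                      theta_deg=60.0, vertical_fov_deg=40.0):
        """Horizontal distance at which an average face leaves the top of the camera's view."""
        # Upper field-of-view edge, measured from the device front (vertical);
        # the formula is only meaningful while this angle stays below 90 degrees.
        top_edge_deg = theta_deg + vertical_fov_deg / 2.0
        return (camera_height_m - face_height_m) * math.tan(math.radians(top_edge_deg))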
Returning to figure 2, at step 204, once it has been determined that the individual 116 is within the first predetermined range of the device 100, the device 100 leaves the “attract mode” and the processor 106 changes the state of the display 104. The state is changed from the first state of step 200 to a second state in which one or more of the displayed video, image or audio content is different from that of the first state. The second state may indicate to the individual 116 that the device 100 has detected the individual 116.
For example, in the first state a video may be displayed, and in the second state audio may be added to the video. Operating the state of the display 104 in the second state may mean adding new audio content to the same video or images as those displayed in the first state. The audio may be such that the individual is aware that the device 100 knows it is being looked at. As another example, the first state may be a first video and the second state may be a second, different video. The first video may be rolling digital advertising, and the second video may be an animation or other visual content indicating to the individual 116 that the device 100 knows it is being looked at. Many different variants of the first state and second state are possible.
As the device 100 optionally estimates the age of an individual as described above, the first and/or second state of the display 104 can be tailored based on the estimated age. For example, the first state may display a video, image or audio more appropriate for a younger individual if it is determined that the individual is a child. As another example, the processor 106 may prevent the display 104 from displaying certain content if it is determined that the individual is too young for such content to be appropriate. For example, there may be an age threshold below which certain content is prevented from being displayed/played in one or more of the states.
While in the second state, at step 206 the device 100 may monitor for the entry of the individual 116 into a second predetermined range or distance from the device 100, or “sell/display zone” of the device 100. The monitoring and detection of the individual 116 entering the second predetermined range may be performed in the same manner as the monitoring and detection of the individual 116 entering the first predetermined range or engage zone of the device 100, the difference being that the second predetermined range or distance is shorter than the first predetermined range. In other words, the second predetermined range represents a distance closer to the device 100 than the first predetermined range. The second predetermined range or distance is therefore within the first predetermined range or distance. The second range may be, for example, 1 meter. The size of the second predetermined range can vary depending on the size of the device 100 and/or the area the device 100 is in.
At step 208, if the device 100 detects that the individual 116 has entered into the sell/display zone, the processor changes the state of the display 104 into a third state. The third state is different from the first state and/or the second state. For example, the third state may be the display 104 displaying information regarding goods or services offered by the device 100; however, the third state may provide a variety of other visual or audio content.
In an embodiment, at any or all of steps 202 to 208, the device 100 may continuously or periodically monitor to determine whether the detected face of the individual 116 is still detected. In other words, the device 100 may determine whether the individual 116 is still looking at the device 100 at any or all of the steps 202 to 208. If the device 100 is no longer able to detect the face, i.e. the individual 116 looks away from the device 100, the device 100 may return to step 200, or indeed any other previous step of figure 2.
Additionally or alternatively, the device 100 may determine whether the individual 116 is still looking at the device 100 before advancing between any of the steps of figure 2. For example, before moving from step 202 to step 204, the device 100 may check that the face of the individual 116 is still detected. In other words, after the individual 116 has moved within the first predetermined range of the device 100 at step 202, the device 100 determines whether the individual 116 is still looking at the device before moving to step 204.
The device 100 may wait a predetermined length of time, for example 3 seconds, after failing to detect the face before returning to a previous step of figure 2. The predetermined length of time may be different depending on the specific step. By waiting a predetermined length of time, the individual 116 is able to briefly look away from the device 100 without the device 100 returning to a previous step. By continuously or periodically monitoring to determine whether the previously detected face is still present, the device 100 is able to revert to a previous state in the event that the individual 116 turns away and/or walks away from the device 100. In this way, the device 100 is quickly ready to interact with a new individual after the individual 116 has stopped interacting with (i.e. looking at) the device 100.
Figure 4 shows an example of an object other than a face being detected. In this example, the individual 116 is wearing clothing having an object 400 printed thereon. The memory 108 of the device 100 contains object data and/or images enabling the processor 106 to cross-reference image data of the object 400, obtained by the camera 102, with the object data and/or images stored in the memory 108. The stored object and/or image data may, for example, be obtained through machine learning as the skilled person would understand. For example, stored object data corresponding to detectable characteristics of the object 400 may be cross-referenced with the image data obtained from the camera 102 such that, after receiving the image data from the camera 102, the processor is able to interpret and match the image data to the object data to determine whether the object 400 is present.
If it is determined that the object 400 is present, the device 100 may react by playing a sound and/or changing the state of the display 104. The state of the display 104 may be changed from the first state to another state. For example, this other state could be the second, third or fourth state previously described, or could be a fifth, unique state. This other state may serve to inform the individual 116 that the object 400 has been detected. In response to detecting the object 400, the display 104 may offer discounts, deals or other information to the individual 116. The object may, for example, be a particular product held or worn by the individual 116, or a brand’s logo or wording. The offered discounts, deals or information may therefore be related to the specific product or branding detected. The changing of the state of the display 104 in response to the object 400 may only be permitted during the step 200 of figure 2, and/or may be permitted during any other step of figure 2 previously described.
In an embodiment, the device 100 may be a vending machine that contains various goods. The individual may access the goods by inputting a number or code allocated to the desired goods into a mechanical or touch screen keypad, and paying for the desired goods by any conventional means. For example, credit/debit cards may be used with an in-built card reader, coins or notes may be fed into the machine, or contactless payment via a contactless credit/debit card or mobile device may be available. The specific goods and the numbers/codes allocated to the goods may only be visible on the display 104 when the display is in the third state. Different or other information regarding the goods may also be displayed while in the third state. The third state of the display 104 therefore provides information regarding the goods or services provided by the device 100. For example, the third state may provide the individual with the necessary information to obtain one or more of the goods inside the vending machine.
The selection of specific goods may optionally be achieved in a contactless or hands-free manner, as detailed in relation to figure 5. At step 500, while the individual 116 is within the sell/display zone, the device 100 may estimate and monitor the face angle of the individual 116 to estimate where on the display 104 the individual 116 is looking. This estimation and monitoring may be based on an assumption that the eyes of the individual 116 are looking straight ahead, and therefore the individual 116 is looking in the direction determined by the face angle. Alternatively, this estimation and monitoring may be based on determining where the pupils of the individual are looking, as described in more detail below.
In the case that the camera(s) 102 are HD cameras, the face angle with respect to the camera(s) 102 may be estimated based on a detected eye and nose angle to determine a 3D position of the detected face. Optionally, the relative width and height of the detected face compared to the average face width and height may also be used, in combination with the detected eye and nose angle, to improve the estimation.
To detect the eye and/or nose angle, the device 100 may be “trained” via conventional means, such as machine learning, to recognize the eyes and nose of a detected face. After determining the position of the eyes and nose, path lines are created between the detected eyes and down the length of the detected nose. By aligning these path lines with the detected face, the angle of the face in relation to the camera(s) 102 can be calculated based on the lengths of the path lines compared to the expected lengths if the face were looking straight ahead at the camera(s) 102. For example, if the distance between the eyes is different than expected, the skilled person would understand that the detected face is likely to be turned to the side. Similarly, if the path line down the length of the nose is different than expected, the skilled person would understand that the detected face is likely to be tilted up or down. The specific calculations depend on the position of the camera(s) 102 on the device 100, and the determined distance of the individual 116 from the device 100.
The relative width and height of the detected face, compared to average face width and height, may also be used to improve the estimation. The skilled person would understand that, if the detected face width is different to the average face width at the determined distance from the device 100, the detected face is likely to be turned to the side. Similarly, the skilled person would understand that, if the detected face height is different to the average face height at the determined distance from the device 100, the detected face is likely to be tilted up or down. As such, the most likely position of the face and therefore head in relation to the camera(s) 102 is determined. The specific calculations depend on the position of the camera(s) 102 on the device 100, and the determined distance of the individual 116 from the device 100.
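As a non-authoritative sketch, the turning of the head can be estimated from how much the detected eye-to-eye (or pupil-to-pupil) distance has shrunk relative to the distance expected for a face looking straight at the camera at the same range; the expected value is assumed to come from the average-face comparison described above.

    import math

    def estimate_yaw_deg(detected_eye_gap_px, expected_eye_gap_px):
        """Rough head yaw: the projected eye gap shrinks by cos(yaw) as the head turns."""
        if expected_eye_gap_px <= 0:
            return None
        ratio = min(1.0, detected_eye_gap_px / expected_eye_gap_px)
        return math.degrees(math.acos(ratio))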
In the case that the camera 102 is a stereoscopic camera, using the 3D model previously described, the nose angle and/or eye angle can be readily determined based on estimating the orientation of the detected face with respect to the camera 102. The skilled person would know how to estimate where on the display 104 the eyes of the detected face are looking, based on an assumption that the eyes are looking straight ahead and the orientation of the detected face using the 3D model.
If the assumption that the eyes are looking straight ahead is removed, the estimation and monitoring of the face angle may be improved by, after identifying the location of the eyes, further identifying the location of the pupils specifically. This may also be achieved by “training” the device to identify pupils, as the skilled person would understand. The above-described path line techniques may then be used to create path lines between the pupils. By cross-referencing the length of the path line between the pupils with the intersection point of the path line down the length of the nose, based on an assumption that the eyes are equidistant from the nose and that the pupils look in the same direction, the direction in which the pupils are looking can be determined. For example, if the path line drawn between the pupils is offset to the left from the camera’s point of view (i.e. the intersection point of the nose path line is offset to the right of the center of the pupil path line), the processor 106 may determine that the pupils are looking to the right from the individual’s point of view. Based on the above description, the skilled person would understand how the intersection point of the nose path line and the path line between the pupils may be used to estimate where the pupils of the individual 116 are looking. Other methods may be used to determine where the individual 116 is looking based on pupil detection.
At step 502, if the device 100 determines that the individual has been looking at the same location on the display 104 for a predetermined duration, the device 100 determines that the location is selected by the individual 116. The predetermined duration may be, for example, 5 seconds; however, other durations may be used. Alternatively, the device 100 may, after the predetermined duration, prompt the individual 116 to perform a predetermined action to confirm the selected location. Such a predetermined action could be nodding, for example. The detection of a nod or other predetermined action of the individual 116 may be achieved using standard image processing techniques available to the skilled person. The location may be associated with a product, advertisement, or other selectable element displayed on the display 104. In the case of the device 100 being a vending machine, the location may correspond to a particular product contained within the vending machine.
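A minimal sketch of this dwell-to-select behaviour, assuming some other routine maps the estimated gaze to a screen region each frame; the 5-second duration follows the example given above.

    import time

    DWELL_SECONDS = 5.0  # example duration from the description

    class DwellSelector:
        def __init__(self):
            self._region = None
            self._since = None

        def update(self, gazed_region):
            """Feed the currently gazed screen region each frame; returns it once selected."""
            now = time.monotonic()
            if gazed_region != self._region:
                self._region, self._since = gazed_region, now
                return None
            if gazed_region is not None and now - self._since >= DWELL_SECONDS:
                self._region, self._since = None, None   # reset so the selection fires once
                return gazed_region
            return None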
At step 504, the display 104 indicates that the chosen location is selected. The individual may then interact with the device 100 to action the selection. In the case of the device 100 being a vending machine, at step 504 the individual may optionally be prompted to pay for the selected product by any of the means previously described.
Once payment is complete, at step 506 the display 104 may be operated in a fourth state by the processor 106. The fourth state may be displaying different video, picture or audio content from one or all of the other states. For example, a “thank you” message or equivalent may be displayed in the fourth state. The processor 106 may only operate the display 104 in the fourth state for a set duration, for example 5 seconds, before returning to a previous state of the display 104.
Alternatively, the process may go straight from step 504 to step 508. At step 508, the individual may cancel a selection indicated at step 504. Such a cancellation may be by way of the individual looking at a certain location of the display 104, distinct from the location chosen at step 504. For example, the individual may look at an “X” on the display 104 for a predetermined duration to signal to the device 100 that the location chosen at step 504 was an error/mistake. The device 100 may determine that the individual is looking at the “X” location in the same manner as that described above in relation to step 502.
Alternatively, to effect the cancellation at step 508 the individual may shake their head at the display 104. The detection of a head shake or other predetermined action of the individual 116 may be achieved using standard image processing techniques available to the skilled person. This action is detected by the device 100. Additionally or alternatively, the cancellation may be effected by the individual 116 moving out of the sell/display zone.
Once the device 100 detects the cancellation at step 508, the process returns to step 500 in which the device 100 monitors the face angle and eye angle of the individual 116 as previously described.
Optionally, the individual 116 may be able to interact with the device 100 at any time, regardless of the state of the display 104 or distance of the individual from the device 100.
The device 100 may collect various information and store such information in the memory 108 and/or send all or part of the information to an external device, such as the external server 114. In the case that the device 100 has one or more HD cameras, some or all of the following information may be collected:
• Number and frequency of human outlines passing the machine. For example, a count could be increased every time a human outline is detected.
• Number and frequency of head turns towards the machine while an individual is outside of or within the engage zone. For example, a count could be increased every time an outline of a human face is detected while the individual is within the engage zone.
• Number and frequency of sell/display zone interactions, and how this number or frequency is related to the above two points. For example, the number of times an individual interacted with the display 104 while the display 104 is in the third state may be logged, and this may be compared to the number of times a human outline is detected and/or the number of times a head turn is detected.
• Number and frequency of system resets.
• Machine fills and number of times a door of the device 100 is accessed (opened/closed).
• Data from the payment system 110 or any of the communication means 120, together with operational status information. Such operational status information may be a successful sale of a product, a failed sale of a product, or an offline state of the device 100, for example.
In the case that the device 100 has a stereoscopic camera, all of the above information may be collected, in addition to some or all of the following information:
• Number of individuals in a group. This is possible due to the 3D model made possible by the stereoscopic camera. Such information may only be obtained when the individuals are within a certain range of the device, such as the first and/or second predetermined ranges previously described. Alternatively, the number of individuals in a group may be detected at any range from the device 100, or any predetermined range from the device 100. For example, the number of individuals in a group may only be detected if the individuals are outside or inside a certain range of the device 100.
• Average distances of individuals from the device 100 and from each detected human outline.
• Height, weight/size, gender and age of each detected human outline.
The skilled person would know how to determine the number and frequency of human outlines, i.e. individuals, passing the machine. For example, this information may be determined using the “people_counter module”, “ForegroundBlob class” and “PeopleCounter class” described in the Annex of this specification.
In operating the state of the display 104 of the device 100, the processor 106 may make use of external data. The external data may, for example, be data obtained from an external source connected to the device 100 via the network 112. The external source could be the external server 114. For example, the processor 106 may make use of one or more of weather data, time of day data, location data, competition data or news data when controlling any one of the first, second, third, fourth or fifth states of the device 100. As an example, at night time the processor 106 may make use of time of day data such that the display 104 displays night-time related content during any of the states. As another example, when it is raining the processor 106 may make use of local weather data to control the display 104 to display corresponding weather-related content during any of the states.
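A minimal sketch of the time-of-day example, with illustrative hour boundaries and invented content file names.

    from datetime import datetime

    def pick_attract_content(now=None):
        """Choose night-time or daytime attract content (file names are invented)."""
        hour = (now or datetime.now()).hour
        return "night_attract_loop.mp4" if hour >= 21 or hour < 6 else "day_attract_loop.mp4"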
In an embodiment, the state of the display 104 of the device 100 may be controlled based on an identity associated with the specific individual 116. For example, software (such as an app) on the mobile wireless communications device 118 of the individual 116 may communicate with the device 100. Such a communication may be automatic, and may be based on a distance of the individual 116 from the device 100. For example, the mobile wireless communication device 118 may communicate via Bluetooth or NFC with the communication means 120 of the device 100.
The communication between the mobile wireless communications device 118 and the device 100 informs the device 100 of an identity associated with the specific individual.
For example, the communication may include an identifier known to the device 100, thereby allowing the device 100 to determine the identity associated with the specific individual 116. Such a determination may be achieved by matching the identifier with a list of identifiers stored in the memory 108, or may be achieved by sending the identifier to an external device over the network 112 and receiving data identifying the individual 116 in return. Once the device 100 is aware of the identity of the specific individual, the processor 106 can customize the state of the display 104 based on the identity. For example, the display 104 may be controlled to display a name or other detail associated with the individual. While a mobile wireless communications device 118 has been described, any device having wireless communications capabilities may be used in the same manner as that described above. For example, a wristband with NFC or other wireless capability may be used instead.
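A minimal sketch of the identifier lookup described above, with a local dictionary standing in for the list of identifiers in the memory 108 (or the reply from the external server 114); the identifier and name are invented for illustration.

    # Hypothetical identifier-to-identity table; a real device might instead
    # query the external server 114 over the network 112.
    KNOWN_IDENTIFIERS = {"0A1B2C3D": "Alex"}

    def resolve_identity(identifier):
        """Return a display-friendly identity for a known identifier, else None."""
        return KNOWN_IDENTIFIERS.get(identifier)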
The various methods described above may be implemented by a computer program product. The software resident on the device 100, executable by the processor 106, is an example of such a computer program product. The computer program product may include computer code arranged to instruct the device 100 to perform the functions of one or more of the various methods described above. The computer program and/or the code for performing such methods may be provided to an apparatus, such as the processor 106 and the device 100, on a computer readable medium or computer program product. The computer readable medium may be transitory or non-transitory. The computer readable medium could be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or a propagation medium for data transmission, for example for downloading the code over the Internet. Alternatively, the computer readable medium could take the form of a physical computer readable medium such as semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disc, or an optical disk, such as a CD-ROM, CD-R/W or DVD.
An apparatus such as the device 100 may be configured in accordance with such code to perform one or more processes in accordance with the various methods discussed herein. In one arrangement the apparatus comprises the processor 106, the memory 108, and the display 104. Typically, these are connected to a central bus structure, the display 104 being connected via a display adapter. The apparatus can also comprise one or more input devices (such as a mouse and/or keyboard) and/or a communications adapter for connecting the apparatus to other apparatus or networks, such as the network 112 and/or the server 114. In one arrangement, a database resides in the memory 108 of the computer system. Such an apparatus may take the form of a data processing system. Such a data processing system may be a distributed system. For example, such a data processing system may be distributed across the network 112.
Annex
CascadeClassifier
This class module is an object detector, and attempts to detect any object based on a cascade classifier given as argument. For example, a Haar Feature-based Cascade Classifier for faces may be used in order to detect faces in a given image. • detectMultiScale(image, scaleFactor, minNeighbors) -> rectangles
This function detects objects of different sizes in an image. The detected objects are returned as a list of rectangles. An image provided to the function is processed by the function. Various options can be specified, such as the “scaleFactor” used by the algorithm and the “minNeighbors” (the minimal number of neighbours needed for the algorithm). A module, such as the known NumPy module, creates an “empty” image from a multidimensional array in order to put a frame of part of another image inside the empty image.
One or more of the following functions may be used to draw on an image: • rectangle(image, topLeftCorner, bottomRightCorner, color, thickness) -> image
This function draws a rectangle on the image when 2 opposite points are given. • line(image, point1, point2, color, thickness) -> image
This function draws a line on the image from a first point (point1) to a second point (point2). • putText(image, text, origin, fontFace, fontScale, color, thickness, lineType)
This function adds text onto the image, with the text origin point being the bottomLeftCorner of the text, and fontFace being the font type of the text. The font type must be chosen from one of the following: FONT_HERSHEY_SIMPLEX, FONT_HERSHEY_PLAIN, FONT_HERSHEY_DUPLEX, FONT_HERSHEY_COMPLEX, FONT_HERSHEY_TRIPLEX,
FONT_HERSHEY_COMPLEX_SMALL, FONT_HERSHEY_SCRIPT_SIMPLEX, FONT_HERSHEY_SCRIPT_COMPLEX
Antialiasing may be used as line type (LINE_AA).
Blob Extraction
In order to monitor for the presence of an individual, “blobs” may be extracted from the images. A blob is a human outline. Such a blob extraction may be performed by a variety of methods. For example, to perform blob extraction, one or more of the following functions may be used: • subtract(array1, array2) -> arrayReturn
This function takes the per-element differences between two arrays. In the present case, this is therefore the per-element difference between two images. • add(array1, array2) -> arrayReturn
In contrast to the “subtract” function above, this function finds the per-element sum of two arrays. In the present case, this is therefore the per-element sum of two images. • mean(array) -> int
This function calculates an average of the elements of an array. • GaussianBlur(src, kernelSize, sigmaX) -> image
This function blurs an image using a Gaussian filter. The “kernelSize” is how big the blur is, and sigmaX is a parameter for the Gaussian filter. • MorphologyEx(array, operation, kernel) -> arrayReturned
This function performs a morphological transformation on the image. To remove noise, first an opening operation (MORPH_OPEN) may be performed, and then a closing operation (MORPH_CLOSE) may be performed. This sequence of operations removes noise from an image. • dilate(src, kernel, iterations) -> arrayReturned
This function dilates an image, which also has the effect of removing some noise. The “iterations” controls the number of times an image is dilated. • findContours(image, mode, method) -> image, contours, hierarchy
This function finds contours in a binary image. The CV_RETR_EXTERNAL retrieval mode may be used, which retrieves only the extreme outer contours. The CV_CHAIN_APPROX_SIMPLE approximation method may also be used, which takes only the end points of the segments composing the contours.
As a return value, only the second one (contours) may be taken, which is a list of all contours found, each contour being represented by a vector of points. The vector of points representing the contour may be termed a contour vector. • ContourArea(contour) -> int
This function may be used to obtain the number of non-zero values (i.e. colour or greyscale) in an active contour area. This function may therefore be used to estimate the size of the contour area by comparing the number of non-zero values with zero values. • moments(array) -> moments
This function may be used on a contour vector to calculate the moments of the contour area. From these moments, the centroid coordinates of the contour area can be calculated by performing the following: centroidX = int(M['m10']/M['m00']) centroidY = int(M['m01']/M['m00']) • boundingRect(array) -> x, y, w, h
Passing a contour vector to this function gives the bounding rectangle of the contour, with x, y being the coordinates of the top-left corner, and w and h being the width and height respectively. An illustrative blob-extraction sketch combining the above functions is given below.
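Purely by way of a sketch, the functions above might be combined into a blob-extraction pipeline along the following lines; the background frame, threshold value, kernel size and minimum area used here are illustrative assumptions.

import cv2
import numpy as np

def extract_blobs(frame, background, min_area=500):
    # Per-element difference between the current frame and a background image.
    diff = cv2.subtract(frame, background)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    _, mask = cv2.threshold(blurred, 25, 255, cv2.THRESH_BINARY)

    # Remove noise: opening, then closing, then dilation.
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    mask = cv2.dilate(mask, kernel, iterations=2)

    # Extreme outer contours only, keeping only segment end points. The [-2]
    # index accommodates both the 3-value and 2-value return conventions of
    # different OpenCV versions.
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]

    blobs = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:   # discard small areas
            continue
        M = cv2.moments(contour)
        if M['m00'] == 0:
            continue
        centroid = (int(M['m10'] / M['m00']), int(M['m01'] / M['m00']))
        blobs.append((cv2.boundingRect(contour), centroid))
    return blobs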
“people counter” module
This module includes the PeopleCounter class and the ForegroundBlob class described below. It also relies on a background_subtraction module, also described below. • ForegroundBlob class
This class stores the position of the blob as well as its age and the estimated number of individuals it has in it (number of heads). It will also store the side of the screen that the blob started on (if known). This allows the PeopleCounter class (below) to avoid counting blobs which do not pass from one side of the screen to the other.
The class contains a number of helper functions for blob-related tasks. These include finding the centroid of the blob (.centroid) and calculating the estimated number of heads in the blob.
The function to calculate the estimated number of heads in the blob finds local maxima points in the blob and counts them, each local maximum being considered likely to be a head. It requires the estimated number of heads to remain stable for several frames before it will increase its own estimate, and it will not decrease an estimate. This approach is accurate for small groups of people walking together in the same direction. A minimal sketch of such a class is given below.
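The following Python sketch suggests one possible shape for such a class. Apart from .centroid, which is mentioned above, the attribute and method names and the three-frame stability window are illustrative assumptions rather than the actual implementation.

class ForegroundBlob:
    # Stores the blob position, its age, the estimated number of heads and
    # the side of the screen the blob started on (if known).
    def __init__(self, bounding_rect, start_side=None):
        self.bounding_rect = bounding_rect    # (x, y, w, h)
        self.age = 0                          # frames this blob has existed
        self.estimated_heads = 1
        self.start_side = start_side          # 'left', 'right' or None
        self._stable_count = 0                # frames a higher count has held

    @property
    def centroid(self):
        x, y, w, h = self.bounding_rect
        return (x + w // 2, y + h // 2)

    def update_head_estimate(self, local_maxima_count, stable_frames=3):
        # Only increase the estimate once a higher count of local maxima
        # (likely heads) has remained stable for several frames; never decrease.
        if local_maxima_count > self.estimated_heads:
            self._stable_count += 1
            if self._stable_count >= stable_frames:
                self.estimated_heads = local_maxima_count
                self._stable_count = 0
        else:
            self._stable_count = 0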
• PeopleCounter class
This class is given frames of video and the equivalent foreground mask for each frame (normally created by a background subtraction algorithm, although a depth camera thresholded at a certain depth would also work, for example a stereoscopic camera). The captured frames are resized to a very low resolution before being sent into this class for analysis. This reduces CPU load and allows the system to run in real time. High-resolution images do not improve results very much; a high frame rate (ideally 30 frames per second), however, has a large impact on the accuracy of detection and tracking. Performance drops rapidly below 10 frames per second.
Internally, the class works by creating blobs (see the ForegroundBlob class) from the foreground mask that it considers likely to be individuals. It does this based on the size and area of each blob, as well as a history of the previous frames. It keeps track of movements between frames and handles merges and splits.
When a blob is destroyed (i.e. vanishes in the middle of an image), it continues to be tracked. If another blob appears nearby, it is assumed to be the same individual and the history from the previous blob is carried over to the new blob. A sketch of how frames might be fed to such a class is given below.
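In the sketch below, the PeopleCounter stub, the update method name, the target resolution and the use of OpenCV's MOG2 subtractor (in place of the module's own background subtraction) are all illustrative assumptions.

import cv2

class PeopleCounter:                      # stub standing in for the class above
    def update(self, frame, fg_mask):
        pass                              # real implementation tracks blobs here

capture = cv2.VideoCapture(0)             # camera index is an assumption
subtractor = cv2.createBackgroundSubtractorMOG2()
counter = PeopleCounter()

while True:
    ok, frame = capture.read()
    if not ok:
        break
    # Resize to a very low resolution to reduce CPU load; a high frame rate
    # matters far more than resolution for detection and tracking accuracy.
    small = cv2.resize(frame, (160, 120))
    fg_mask = subtractor.apply(small)
    counter.update(small, fg_mask)
capture.release()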
Movement detection
absdiff(array1, array2) -> arrayReturned
This function calculates the per-element absolute difference between two arrays. threshold(imageSource, thresholdValue, maxValue, type) -> returnValue, imageReturned
This function applies a fixed-level threshold to each array element. This may be used to remove remaining noise. countNonZero(array) -> int
This function counts non-zero array elements. normalize(imageSource, imageDestination, alpha, beta, norm) -> imageDestination
This function normalises the norm or value range of an array. For example, values of alpha=0 and beta=255, with NORM_MINMAX as the norm, may be used. This function is used to improve brightness and contrast in an image. cvtColor(image, colorSpace) -> imageReturned
This function changes the colour space of an image. For example, cv2.COLOR_BGR2GRAY may be used to convert a colour image to a grayscale image. blur(image, kernelSize)
This function is used to blur an image. The kernel size determines how much blur is applied. A short sketch combining the movement detection functions above is given below.
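By way of illustration, the movement detection functions above might be combined as follows; the blur kernel size, threshold value and changed-pixel criterion are assumptions.

import cv2

def movement_detected(previous_frame, current_frame, pixel_threshold=500):
    # Per-element absolute difference between two consecutive frames.
    diff = cv2.absdiff(previous_frame, current_frame)

    # Convert to greyscale, blur, and normalise brightness/contrast.
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    gray = cv2.blur(gray, (5, 5))
    gray = cv2.normalize(gray, None, alpha=0, beta=255,
                         norm_type=cv2.NORM_MINMAX)

    # Fixed-level threshold removes remaining noise.
    _, mask = cv2.threshold(gray, 25, 255, cv2.THRESH_BINARY)

    # Count non-zero (changed) pixels and compare against a criterion.
    return cv2.countNonZero(mask) > pixel_threshold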
“background subtraction” Module
BackgroundSubtraction class
An algorithm that may be used for the background subtraction is LBFuzzyAdaptiveSOM. The configuration for this algorithm is in a folder called config inside the module.
The Python code works by calling into the C++ program in a separate thread and then returns control to the main program. It uses lazy evaluation, so the background subtraction algorithm will not block the main program until the foreground image is accessed via .foreground_mask. After getting a raw foreground image from the algorithm, the code processes the mask to remove noise and to connect disconnected blobs together (as they are likely to belong to a single individual). If this module sees results that it considers to be wrong (a large chunk of foreground appearing in a single frame, for example), it will reload the algorithm to recover from the situation more quickly.
This algorithm does not need a high resolution image, but works best at the highest frame rate possible. To make better use of multiple cores in a processor, access to the .foreground_mask object may be delayed until it is needed (i.e. as late as possible). This helps ensure good usage of the multiple cores rather than blocking the main program until the algorithm completes. A sketch of this lazy-evaluation pattern is given below.
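The following Python sketch illustrates the lazy-evaluation pattern only in outline. The class and attribute names mirror those described above, but the internals, and the use of OpenCV's MOG2 subtractor in place of the LBFuzzyAdaptiveSOM algorithm called via C++, are illustrative assumptions.

import threading
import cv2

class BackgroundSubtraction:
    def __init__(self):
        # Stand-in algorithm; the module described above calls LBFuzzyAdaptiveSOM
        # in a C++ program rather than this OpenCV subtractor.
        self._algorithm = cv2.createBackgroundSubtractorMOG2()
        self._thread = None
        self._mask = None

    def process(self, frame):
        # Start processing in a separate thread and return control immediately.
        self._thread = threading.Thread(target=self._run, args=(frame,),
                                        daemon=True)
        self._thread.start()

    def _run(self, frame):
        self._mask = self._algorithm.apply(frame)

    @property
    def foreground_mask(self):
        # Lazy evaluation: block here, as late as possible, so that work on
        # other cores can continue while the algorithm runs.
        if self._thread is not None:
            self._thread.join()
        return self._mask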

Claims
1. A method of controlling a state of a display of a device, the method comprising: operating the display in a first state; detecting the presence of an object associated with an individual; determining whether the object is within a first threshold distance from the device; and operating the display in a second state if the object is determined to be within the first threshold distance.
2. The method of claim 1, wherein the presence of the object associated with the individual is detected while the object is at a distance from the device greater than the first threshold distance.
3. The method of claim 1 or 2, wherein the object is the face of the individual.
4. The method of claim 3, wherein the determining step further comprises detecting the presence of the face of the individual while the face is within the first threshold distance from the device.
5. The method of any preceding claim, further comprising: determining one or more dimensions of the detected object; obtaining one or more corresponding known or average dimensions of the object; comparing the one or more determined dimensions with the one or more corresponding known or average dimensions; and determining the distance of the object from the device based on the comparison.
6. The method of claim 5, wherein the obtaining step further comprises obtaining the one or more corresponding known or average dimensions of the object when the object is at a predetermined distance from the device.
7. The method of claim 6, wherein the distance of the object is further based on the predetermined distance from the device.
8. The method of any preceding claim, wherein operating the display in the first state comprises displaying first video, first audio or first image content on the display.
9. The method of any preceding claim, wherein operating the display in the second state comprises displaying second video, second audio or second image content on the display.
10. The method of any preceding claim, wherein the second state provides an indication to the individual that the object is detected.
11. The method of any preceding claim, further comprising determining the age, height or gender of the individual.
12. The method of claim 11, wherein the object is the face of the individual, and the age, height or gender is determined based in part on a detected dimension of the face of the individual.
13. The method of claim 11 or 12, wherein the second state of the display is at least in part based on one or more of the determined age, height or gender of the individual.
14. The method of any preceding claim, further comprising logging the number of objects detected over a first time period.
15. The method of claim 14, wherein logging the number of objects detected comprises counting the number of objects detected outside of the threshold distance.
16. The method of any preceding claim, further comprising detecting an outline of the individual.
17. The method of claim 16, wherein the object is the face of the individual, and further comprising determining the age, height or gender of the individual based on the outline of the individual and one or more dimensions of the face.
18. The method of claim 17, wherein the age, height or gender of the individual is based on a comparison between a height of the outline and a height of the face.
19. The method of any of claims 16 to 18, further comprising logging the number of outlines detected over a second time period.
20. The method of claim 19, wherein logging the number of outlines detected comprises counting the number of outlines detected.
21. The method of any preceding claim, further comprising: determining whether the object is within a second threshold distance from the device, the second threshold distance being closer to the device than the first threshold distance; and operating the display in a third state if the object is determined to be within the second threshold distance.
22. The method of claim 21, wherein operating the display in the third state comprises displaying third video, third audio or third image content on the display.
23. The method of claim 21 or 22, wherein the third state provides information regarding goods or services that may be purchased from the device.
24. The method of any of claims 21 to 23, further comprising, if the object is not detected for a predetermined period of time, changing the display from the third state to the second state or the first state.
25. The method of any of claims 21 to 24, further comprising: determining whether the object is within a third threshold distance from the device, the third threshold distance being closer to the device than the second threshold distance; and operating the display in a fourth state if the object is determined to be within the third threshold distance.
26. The method of claim 25, wherein operating the display in the fourth state comprises displaying fourth video, fourth audio or fourth image content on the display.
27. The method of claim 25 or 26, wherein the fourth state provides confirmation that a purchase has been completed.
28. The method of any of claims 25 to 27, further comprising, if the object is not detected for a predetermined period of time, changing the display from the fourth state to any one of the third state, the second state or the first state.
29. The method of any preceding claim, further comprising operating the display in a fifth state if the presence of the object is detected.
30. The method of claim 29, wherein operating the display in the fifth state comprises displaying fifth video, fifth audio or fifth image content on the display.
31. The method of claim 29 or 30, wherein the fifth state provides an indication to the individual that the object is detected.
32. The method of any preceding claim, further comprising, if the object is not detected for a predetermined period of time, changing the display from the second state to the first state.
33. The method of any preceding claim, further comprising determining a location on the display based on a face angle of the individual indicating the location.
34. The method of claim 33, wherein the location is further determined based on a detected eye angle and/or nose angle.
35. The method of claim 34, wherein the location is further determined based on a detected eye pupil.
36. The method of any of claims 33 to 35, further comprising: after determining the location on the display, monitoring for the face angle to remain indicating the location for a predetermined period; and if the face angle indicates the location for the predetermined period, changing a state of the display in response to the indication.
37. The method of any preceding claim, further comprising detecting a movement of the head of the individual, and changing a state of the display in response to the detection.
38. The method of claim 37, wherein the movement is a head nod or a head shake.
39. A computer readable medium comprising computer readable code operable, in use, to instruct a computer to perform the method of any preceding claim.
40. A computer program product comprising computer readable code operable, in use, to instruct a computer to perform the method of any one of claims 1 to 38.
41. A device comprising a memory having computer executable code stored thereon arranged to carry out the method of any of claims 1 to 38, and a processor arranged to execute the code stored in the memory.
42. A device comprising a display arranged to operate in multiple states, the device comprising a processor arranged to: operate the display in a first state; detect, based on data received from a camera of the device, the presence of an object associated with an individual; determine whether the object is within a first threshold distance from the device; and operate the display in a second state if the object is determined to be within the first threshold distance.
43. The device of claim 42, wherein the camera comprises two HD cameras.
44. The device of claim 42, wherein the camera comprises a stereoscopic camera.
45. The device of any of claims 42 to 44, wherein the device is a vending machine.
46. The device of any of claims 42 to 45, wherein the object is the face of the individual.
47. The device of any of claims 42 to 46, wherein the object is a symbol, logo or word associated with a brand.

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB1608484.0A GB2550209A (en) 2016-05-13 2016-05-13 Method of controlling a state of a display of a device
EP17724434.0A EP3455831A1 (en) 2016-05-13 2017-05-15 Method of controlling a state of a display of a device
PCT/GB2017/051359 WO2017194978A1 (en) 2016-05-13 2017-05-15 Method of controlling a state of a display of a device

Publications (2)

Publication Number Publication Date
GB201608484D0 GB201608484D0 (en) 2016-06-29
GB2550209A true GB2550209A (en) 2017-11-15

Family

ID=56320399

Country Status (3)

Country Link
EP (1) EP3455831A1 (en)
GB (1) GB2550209A (en)
WO (1) WO2017194978A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110266645A (en) * 2019-05-21 2019-09-20 平安科技(深圳)有限公司 Verification method, device, server and the medium of real time data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS60142469A (en) * 1983-12-29 1985-07-27 Fujitsu Ltd Unmanned control device having personal sensor
JP2008299745A (en) * 2007-06-01 2008-12-11 Fujitsu Frontech Ltd Automatic teller machine program, and method
WO2009089163A1 (en) * 2008-01-04 2009-07-16 Walker Digital, Llc Social and retail hotspots
KR20110070666A (en) * 2009-12-18 2011-06-24 한국전자통신연구원 Advertising content supply device and method
US20110298791A1 (en) * 2010-06-07 2011-12-08 Sony Corporation Information display device and display image control method
WO2016073938A1 (en) * 2014-11-07 2016-05-12 Cubic Corporation Transit vending machine with automatic user interface adaption

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7881822B2 (en) * 2004-05-05 2011-02-01 Provision Interactive Technologies, Inc. System and method for dispensing consumer products
US20090048711A1 (en) * 2007-08-15 2009-02-19 Deline Jonathan E Fuel dispenser
WO2011149423A1 (en) * 2010-05-24 2011-12-01 Xyz Wave Pte Ltd A method for vending at least one object using an apparatus for selecting an object to be vended and the apparatus thereof
AU2012204382A1 (en) * 2011-01-07 2013-07-18 Intercontinental Great Brands Llc Method and apparatus pertaining to an automated consumer-interaction experience
CN102610035B (en) * 2012-04-05 2014-02-12 广州广电运通金融电子股份有限公司 Financial self-service device and anti-peeping system and anti-peeping method thereof

Also Published As

Publication number Publication date
WO2017194978A1 (en) 2017-11-16
GB201608484D0 (en) 2016-06-29
EP3455831A1 (en) 2019-03-20


Legal Events

Date Code Title Description
COOA Change in applicant's name or ownership of the application

Owner name: LUCOZADE RIBENA SUNTORY LIMITED

Free format text: FORMER OWNERS: LUCOZADE RIBENA SUNTORY LIMITED;INFX SOLUTIONS HOLDINGS LIMITED

Owner name: HOLOGRAPH LIMITED

Free format text: FORMER OWNERS: LUCOZADE RIBENA SUNTORY LIMITED;INFX SOLUTIONS HOLDINGS LIMITED

732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20181025 AND 20181102

WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)