WO2025058615A1 - Methods and systems for a gaze-based image capture mode - Google Patents
Methods and systems for a gaze-based image capture mode
- Publication number
- WO2025058615A1 (PCT/US2023/032458)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gaze
- user
- image
- lens
- computer
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N23/611—Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
- H04N23/63—Control of cameras or camera modules by using electronic viewfinders
- H04N23/631—Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
- H04N23/64—Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
Definitions
- This application generally relates to a capture of a self-portrait (“selfie”) or other images.
- When a user attempts to take a selfie using a mobile phone, they are often trying to view a preview of themselves in the display screen, and press the shutter to capture the image. Accordingly, they often look at a central portion of the screen instead of looking at the camera aperture.
- However, looking at the central portion of the screen while simultaneously pressing the shutter button (e.g., located at the bottom of the screen) may be challenging.
- a computer-implemented method includes receiving, by an image capturing device, a facial image of one or more facial attributes of a user.
- the method includes providing the facial image of the one or more facial attributes to a trained gaze detection model.
- the method includes applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device.
- the method includes, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
- a computing device includes one or more processors and data storage that has stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing device to carry out operations.
- the operations may include receiving, by an image capturing device, a facial image of one or more facial attributes of a user.
- the operations may further include providing the facial image of the one or more facial attributes to a trained gaze detection model.
- the operations may also include applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device.
- the operations may additionally include, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
- an article of manufacture may include a non-transitory computer-readable medium having stored thereon program instructions that, upon execution by one or more processors of a computing device, cause the computing device to carry out operations.
- the operations may include receiving, by an image capturing device, a facial image of one or more facial attributes of a user.
- the operations may further include providing the facial image of the one or more facial attributes to a trained gaze detection model.
- the operations may also include applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device.
- the operations may additionally include, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
- Figure 2 is an example overview of a processing pipeline for gaze-based image capture, in accordance with example embodiments.
- Figure 4 illustrates example images for an image capture shutter button, in accordance with example embodiments.
- Figure 5 illustrates example images for a lens indicator for gaze-based image capture, in accordance with example embodiments.
- Figure 6 illustrates example images for an image capture shutter active region, in accordance with example embodiments.
- Figure 7 illustrates additional example images for an image capture shutter active region, in accordance with example embodiments.
- Figure 8 illustrates example calibrations for gaze-based image capture in portrait orientation, in accordance with example embodiments.
- Figure 9 illustrates example calibrations for gaze-based image capture in landscape orientation, in accordance with example embodiments.
- Figure 10 is a block diagram of an example computing device, in accordance with example embodiments.
- Figure 11 is a flowchart of a method, in accordance with example embodiments.
- Figure 12 is another flowchart of a method, in accordance with example embodiments.
- Example methods, devices, and systems are described herein. It should be understood that the words “example” and “exemplary” are used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as being an “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or features. Other embodiments can be utilized, and other changes can be made, without departing from the scope of the subject matter presented herein.
- This application generally relates to an automatic capture of a selfie.
- When a user attempts to take a selfie, it can be challenging to look at the camera (typically at one end of a mobile device) and simultaneously press the shutter button (typically at the other end of the mobile device).
- Generally, the user looks at the middle of the screen instead of the camera.
- This problem is further accentuated in foldable devices where the camera can be on one portion of the foldable device and the shutter button can be on the other portion of the foldable device. Accordingly, there is a need to enable a user to look at the camera when the image is being captured.
- A trained gaze detection model may be used to determine whether the user is looking at the camera and, upon a determination that the user is looking at the camera, to automatically capture an image of the user, display a visual cue to facilitate an image capture of an image of the user, or both.
- For example, the visual cue may be a countdown timer displayed prior to an automatic image capture, or the visual cue may be a modified shutter button (e.g., an enlarged version, a repositioned version, or a virtual active layer) so that it is easier for the user to capture the selfie while not having to look away from the camera.
- the camera determines whether the user is looking at the camera aperture.
- This may be achieved using a machine learning model that scans the user’s gaze to determine a gaze direction (e.g., output either as coordinates on the screen with respect to the coordinates of the location of the camera on the mobile device, or as a confidence measure of gazing at the camera).
- Upon a determination that the user is gazing at the camera, the image may be captured automatically.
- the device may provide a modified shutter button to enable the user to capture the image without having to look away from the camera aperture.
- the described features have accessibility related implementations. For example, for users with limited vision, a circular countdown timer around the camera aperture may be provided to direct them toward the camera aperture.
- Also, a modified shutter button (e.g., enlarged, repositioned, with flashing lights, more apparent colors, etc.) may be provided to direct such users to the capture button and enhance their image capture experience.
- As another example, for users that have misaligned eyes, the camera lens may be calibrated to accurately determine a direction of gaze for the user.
- In a mobile device equipped with a front camera, the camera aperture is typically at the top of the device, the shutter button to capture the image is at the bottom of the device, and a preview of the image is displayed between the camera aperture and the shutter button.
- an image capturing experience may be provided whereby the user is able to gaze at the camera aperture during image capture.
- the camera may be configured to capture the image automatically, and/or enable the user to capture the image without a need to shift their gaze away from the camera aperture.
- FIG. 1 is an example overview 100 of gaze-based image capture, in accordance with example embodiments.
- the operations may involve displaying, by a graphical user interface of a computing device, a preview image of a user’s face.
- device 105 may include graphical user interface (GUI) 110 displaying a preview image 115.
- device 105 may be a mobile device.
- graphical user interface 110 may be an interface that displays a captured image.
- graphical user interface 110 may be a live-view interface that displays a live-view preview of an image. As illustrated, the displaying of the image may involve providing a live-view preview of the image prior to a capture of the image.
- a "live-view preview" of an image should be understood to be an image or sequence of images (e.g., video) that is generated and displayed based on an image data stream from an image sensor of an image-capture device.
- image data may be generated by a camera's image sensor (or a portion, subset, or sampling of the pixels on the image sensor).
- This image data is representative of the field-of-view (FOV) of the camera, and thus indicative of the image that will be captured if the user taps the camera's shutter button or initiates image capture in some other manner.
- a camera device or other computing device may generate and display a live-view preview image based on the image data stream from the image sensor.
- the live-view preview image can be a real-time image feed (e.g., video), such that the user is informed of the camera's FOV in real-time.
- the image may be a frame of a plurality of frames of a video.
- the user may open a camera system of device 105 (e.g., with a touch screen or other mechanism), and may direct the camera (e.g., a front-facing camera with aperture 105A) to capture a self-portrait (e.g., a selfie), and/or direct the camera (e.g., a rear-facing camera) with an intent to capture an image of another person.
- Image capturing may be achieved by pressing a shutter button 105B.
- Graphical user interface 110 may display a live-view preview of the image.
- Device 105 may utilize one or more algorithms (e.g., an object detection algorithm, a face detection algorithm, a segmentation algorithm, and so forth) to identify one or more regions of interest in the image.
- a user-approved facial recognition algorithm may be applied to identify one or more individuals in the image as likely objects of interest.
- device 105 may have a history of user preferences, and may identify certain objects and/or individuals as being of high interest to the user.
- Although device 105 is illustrated with a front-facing camera with aperture 105A, the camera may be located on the rear side of device 105. In the event device 105 is a foldable device, the camera may be located on one or both panels of device 105.
- facial attributes may generally refer to any characteristic of a face that is indicative of gaze direction.
- facial attributes may include pupil attributes, such as a position of the pupils and/or a flow tracking a movement of the pupils.
- Also, for example, a position and/or movement of the head (e.g., a tilt of the head, a head pose, etc.), a position and/or movement of one or both eyebrows, a position and/or movement of one or both eyelids, and so forth may be used as an indication of gaze direction.
- device 105 may receive the facial image (e.g., a two-dimensional (2D) image patch including a face, and/or a 2D image patch including the face and 2D image patches including the eyes).
- a face region may be selected by a face detection algorithm, and device 105 may receive a cropped facial image of the face, and/or portions of the face that include the eyes, or portions around the eyes (e.g., including the eyebrows).
- eye corner landmarks may be utilized to identify such portions of the face that include the eyes.
- device 105 may detect a gaze direction.
- the facial image of the one or more facial attributes may be provided to a gaze detection model.
- the gaze detection model may have been trained to receive the facial image of the one or more facial attributes as an input and output a gaze direction.
- the gaze detection model may be trained on red-green-blue (RGB) images.
- the facial image may be converted to RGB prior to being provided to the gaze detection model.
- the trained gaze detection model may output a gaze vector (e.g., a three-dimensional (3D) vector) that indicates a direction of the gaze.
- the 3D gaze vector may be provided with reference to a coordinate system associated with device 105.
- For example, a gaze directed at the camera aperture (e.g., aperture 105A) may be associated with coordinates (0, 0, 1).
- the trained gaze detection model may output a gaze vector and a scalar numeral that indicates a likelihood that the gaze is directed toward the camera aperture (e.g., aperture 105A).
- the likelihood may be a confidence measure between “0” and “1,” where a value close to “0” is indicative of a low likelihood that the gaze is directed toward the camera aperture, and a value close to “1” is indicative of a high likelihood that the gaze is directed toward the camera aperture.
- a confidence measure of 0.163 may be indicative of a 16.3% likelihood that the gaze is directed toward the camera aperture.
- the trained gaze detection model may be a convolutional neural network (CNN) comprising a plurality of input and output layers.
- the gaze detection model may take an input such as a facial image including a face box and two eye patches.
- the model may predict a 3D gaze vector and a likelihood that the gaze is directed toward the camera aperture 105A.
- Different versions of such a model may be utilized based on available computational resources, a desired processing speed, and so forth.
- the gaze detection model may reside on device 105. Additional and/or alternative embodiments may involve a gaze detection model residing in a cloud server, or configured as a distributed system.
- a training of the gaze detection model may involve generating ground truth images that capture a head pose (e.g., yaw, pitch, and roll), and a gaze pose (e.g., alpha and beta).
- the learning may involve, for example, supervised learning, where human labelers may use an annotation tool to record a ground truth 3D gaze vector for a given face.
- the annotation tool may detect a plurality of cases and a predicted gaze direction.
- the annotator may reject an incorrect prediction, and/or correct the gaze direction for an incorrect prediction.
- the annotation tool may enable an annotator to change direction angles for an output gaze vector, and/or enter coordinates for the output gaze vector. A sketch of such a ground-truth record is shown below.
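The following is a minimal sketch of what such a ground-truth annotation record and accept/correct step might look like. The field names, the tuple formats, and the review flow are illustrative assumptions, not the disclosed annotation tool.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class GazeAnnotation:
    """One labeled example for supervised training of the gaze detection model."""
    image_path: str
    head_pose: Tuple[float, float, float]        # (yaw, pitch, roll)
    predicted_gaze: Tuple[float, float, float]   # model-suggested 3D gaze vector shown to the labeler
    ground_truth_gaze: Optional[Tuple[float, float, float]] = None

def review(annotation: GazeAnnotation,
           correction: Optional[Tuple[float, float, float]] = None) -> GazeAnnotation:
    """Accept the predicted gaze as ground truth, or record the annotator's corrected gaze vector."""
    annotation.ground_truth_gaze = correction if correction is not None else annotation.predicted_gaze
    return annotation
```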
- the gaze detection model may be an eye tracker based on a multilayer feed-forward CNN.
- a face detection algorithm may select the face region with associated eye corner landmarks, which may be used to crop the images down to the eye region alone. These cropped frames may be fed through two identical CNN towers with shared weights. Each convolutional layer may be followed by an average pooling layer.
- eye corner landmarks may be combined with the output of the two towers through fully connected layers. Also, for example, Rectified Linear Units (ReLUs) may be used for all layers except the final fully connected output layer, which may have no activation.
- the eye tracker may then output a location of a user’s gaze.
- Another approach to gaze detection may involve a deep neural network learning framework based on a differential eyes’ appearances network (DEANet) to estimate gaze direction. Pairs of image patches for left and right eyes may be simultaneously provided to a Siamese neural network (SNNet) that has two identical branches. The output from the two branches may be concatenated with head pose information, to obtain a differential gaze for the pairs of image patches.
- the gaze detection model may involve a model that generates 2D coordinates for the gaze direction.
- Some approaches involve using eye patches and an eye grid to construct a two-step training network for gaze detection on mobile devices.
- Other models may be based on eye patches, a full-face patch, and a face grid.
- device 105 may determine whether the gaze is directed toward the camera aperture 105A.
- For example, the 3D coordinates returned by the gaze detection model may be within a threshold range of (0, 0, 1), the coordinates associated with camera aperture 105A.
- Also, for example, the likelihood that the gaze is directed toward the camera aperture 105A may be within a threshold range of 1.0, indicative of a high likelihood that the gaze is directed toward the camera aperture 105A.
- device 105 may initiate a gaze detection timer that determines whether the gaze remains directed toward the camera aperture 105A for greater than a gaze threshold time. This helps ensure that the user intends to capture a selfie; a sketch of this dwell check is shown below.
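Below is a minimal sketch of the decision and dwell-timer logic described above. The thresholds, the polling interval, and the `get_gaze_estimate()` helper are illustrative assumptions; the disclosure only specifies that the gaze vector should be near (0, 0, 1) (or the confidence near 1.0) and remain so for longer than a gaze threshold time.

```python
import time
import numpy as np

# Illustrative thresholds; the disclosure does not fix specific values.
ANGLE_THRESHOLD_DEG = 10.0      # max angle between gaze vector and camera axis (0, 0, 1)
CONFIDENCE_THRESHOLD = 0.9      # min "gaze at camera" confidence
GAZE_THRESHOLD_TIME_S = 3.0     # dwell time before image capture is facilitated

def gaze_is_at_camera(gaze_vector, confidence):
    """Return True if the estimated gaze is directed at the camera aperture."""
    v = np.asarray(gaze_vector, dtype=float)
    v = v / np.linalg.norm(v)
    angle_deg = np.degrees(np.arccos(np.clip(v[2], -1.0, 1.0)))  # angle to (0, 0, 1)
    return angle_deg <= ANGLE_THRESHOLD_DEG and confidence >= CONFIDENCE_THRESHOLD

def wait_for_steady_gaze(get_gaze_estimate, timeout_s=10.0):
    """Return True once the gaze stays on the camera for the gaze threshold time."""
    dwell_start = None
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        gaze_vector, confidence = get_gaze_estimate()  # assumed per-frame model output
        if gaze_is_at_camera(gaze_vector, confidence):
            if dwell_start is None:
                dwell_start = time.monotonic()
            if time.monotonic() - dwell_start >= GAZE_THRESHOLD_TIME_S:
                return True
        else:
            dwell_start = None  # gaze left the aperture; restart the dwell timer
    return False
```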
- the camera may be in a manual image capture mode.
- the visual cue may involve providing an active area on GUI 110 that is configured to receive a user indication to capture the image.
- the active area may be an expanded area around the shutter button 105B to enable ease of access for the user.
- the active area may be a region closer to camera aperture 105 A, to enable the user to conveniently trigger the image capture without having to look away from the camera aperture 105 A.
- the active area may be a virtual layer overlaid on GUI 110.
- gaze control component 210 may trigger capture module 230 to automatically capture the image, without providing a visual cue 225.
- FIG. 7 illustrates additional example images for an image capture shutter active region, in accordance with example embodiments.
- Image 700A illustrates a device with an active region 705 around the shutter button.
- Image 700B illustrates the situation where the modified view of the shutter button involves a virtual layer 710 overlaid over the display component.
- When the camera detects a user interaction with the virtual layer 710, the camera is configured to perceive this as indicative of a user indication to capture the image. A sketch of such an active-region hit test is shown below.
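A minimal sketch of how an expanded active region or an overlaid virtual layer might be tested against a touch point follows. The rectangle coordinates, margin, and class names are illustrative assumptions; the disclosure only states that an interaction anywhere within the active area or virtual layer is treated as a user indication to capture the image.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

def expand(rect: Rect, margin: float) -> Rect:
    """Grow a shutter-button rectangle into a larger active region (image 700A)."""
    return Rect(rect.left - margin, rect.top - margin, rect.right + margin, rect.bottom + margin)

# Illustrative geometry for a 1080x2400 portrait screen.
shutter_button = Rect(440, 2150, 640, 2350)
active_region = expand(shutter_button, margin=150)   # enlarged hit area
virtual_layer = Rect(0, 0, 1080, 2400)               # full-screen overlay (image 700B)

def is_capture_tap(x: float, y: float, use_virtual_layer: bool) -> bool:
    """Treat a tap inside the active area (or anywhere on the virtual layer) as a shutter press."""
    target = virtual_layer if use_virtual_layer else active_region
    return target.contains(x, y)
```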
- Computing device 1000 may include a user interface module 1001, a network communications module 1002, one or more processors 1003, data storage 1004, one or more cameras 1018, one or more sensors 1020, and power system 1022, all of which may be linked together via a system bus, network, or other connection mechanism 1005.
- Wireline interface(s) 1008 can include one or more wireline transmitters, receivers, and/or transceivers, such as an Ethernet transceiver, a Universal Serial Bus (USB) transceiver, or similar transceiver configurable to communicate via a twisted pair wire, a coaxial cable, a fiberoptic link, or a similar physical connection to a wireline network.
- Data storage 1004 can include one or more non-transitory computer-readable storage media that can be read and/or accessed by at least one of one or more processors 1003.
- the one or more computer-readable storage media can include volatile and/or non-volatile storage components, such as optical, magnetic, organic or other memory or disc storage, which can be integrated in whole or in part with at least one of one or more processors 1003.
- data storage 1004 can be implemented using a single physical device (e.g., one optical, magnetic, organic or other memory or disc storage unit), while in other examples, data storage 1004 can be implemented using two or more physical devices.
- computer-readable instructions 1006 can include instructions that, when executed by processor(s) 1003, enable computing device 1000 to carry out operations.
- the operations may include receiving, by an image capturing device, a facial image of one or more facial attributes of a user.
- the operations may further include providing the facial image of the one or more facial attributes to a trained gaze detection model.
- the operations may also include applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device.
- the operations may additionally include, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
- computing device 1000 can include gaze detection module 1012.
- Gaze detection module 1012 can be configured to receive one or more facial features of the user and provide them to a trained gaze detection model. The trained gaze detection model may then indicate a direction of the gaze of the user. Gaze detection module 1012 may determine whether the user’s gaze is directed at the camera, and upon a determination that the user is gazing at the camera for a gaze threshold time, gaze detection module 1012 can trigger an automatic image capture of the user by one or more cameras 1018. In some embodiments, gaze detection module 1012 may display a visual cue to enable the user to continue to gaze at the camera for the gaze threshold time.
- computing device 1000 can include one or more cameras 1018.
- Camera(s) 1018 can include one or more image capture devices, such as still and/or video cameras, equipped to capture light and record the captured light in one or more images; that is, camera(s) 1018 can generate image(s) of captured light.
- the one or more images can be one or more still images and/or one or more images utilized in video imagery.
- Camera(s) 1018 can capture light and/or electromagnetic radiation emitted as visible light, infrared radiation, ultraviolet light, and/or as one or more other frequencies of light.
- computing device 1000 can include one or more sensors 1020. Sensors 1020 can be configured to measure conditions within computing device 1000 and/or conditions in an environment of computing device 1000 and provide data about these conditions.
- sensors 1020 can include one or more of: (i) sensors for obtaining data about computing device 1000, such as, but not limited to, a thermometer for measuring a temperature of computing device 1000, a battery sensor for measuring power of one or more batteries of power system 1022, and/or other sensors measuring conditions of computing device 1000; (ii) an identification sensor to identify other objects and/or devices, such as, but not limited to, a Radio Frequency Identification (RFID) reader, proximity sensor, one-dimensional barcode reader, two-dimensional barcode (e.g., Quick Response (QR) code) reader, and a laser tracker, where the identification sensors can be configured to read identifiers, such as RFID tags, barcodes, QR codes, and/or other devices and/or objects configured to be read and provide at least identifying information.
- Power system 1022 can include one or more batteries 1024 and/or one or more external power interfaces 1026 for providing electrical power to computing device 1000.
- Each battery of the one or more batteries 1024 can, when electrically coupled to the computing device 1000, act as a source of stored electrical power for computing device 1000.
- One or more batteries 1024 of power system 1022 can be configured to be portable. Some or all of one or more batteries 1024 can be readily removable from computing device 1000. In other examples, some or all of one or more batteries 1024 can be internal to computing device 1000, and so may not be readily removable from computing device 1000. Some or all of one or more batteries 1024 can be rechargeable.
- a rechargeable battery can be recharged via a wired connection between the battery and another power supply, such as by one or more power supplies that are external to computing device 1000 and connected to computing device 1000 via the one or more external power interfaces.
- one or more batteries 1024 can be non-rechargeable batteries.
- One or more external power interfaces 1026 of power system 1022 can include one or more wired-power interfaces, such as a USB cable and/or a power cord, that enable wired electrical power connections to one or more power supplies that are external to computing device 1000.
- One or more external power interfaces 1026 can include one or more wireless power interfaces, such as a Qi wireless charger, that enable wireless electrical power connections, such as via a Qi wireless charger, to one or more external power supplies.
- computing device 1000 can draw electrical power from the external power source via the established electrical power connection.
- power system 1022 can include related sensors, such as battery sensors associated with the one or more batteries or other types of electrical power sensors.
- Figure 11 is a flowchart of a method, in accordance with example embodiments.
- Method 1100 may include various blocks or steps. The blocks or steps may be carried out individually or in combination. The blocks or steps may be carried out in any order and/or in series or in parallel. Further, blocks or steps may be omitted or added to method 1100.
- Block 1110 involves receiving, by an image capturing device, a facial image of one or more facial attributes of a user.
- Block 1120 involves providing the facial image of the one or more facial attributes to a trained gaze detection model.
- Block 1130 involves applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device.
- Block 1140 involves, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
- the facilitating of the image capture involves enabling an automatic capture of the image of the user by the image capturing device.
- the facilitating of the image capture involves displaying, by the display component, a visual cue to facilitate the image capture.
- the displaying of the visual cue involves displaying a visible countdown timer near the lens indicating an amount of time left for an automatic image capture of the image.
- Some embodiments involve determining an expiry of the visible countdown timer near the lens. Such embodiments involve enabling an automatic capture of the image of the user by the image capturing device.
- Some embodiments involve determining, by the trained gaze detection model, that the gaze of the user is not directed at the lens prior to an expiry of the visible countdown timer near the lens. Such embodiments involve disabling an automatic capture of the image of the user by the image capturing device.
- the visible countdown timer near the lens may be an animated circular countdown timer around the lens.
- the displaying of the visual cue involves displaying a modified view of a shutter button on the display component to enable the user to capture the image.
- Some embodiments involve detecting a user interaction with the modified view of the shutter button. Such embodiments involve capturing the image of the user by the image capturing device.
- the modified view of the shutter button may be an enlarged view of the shutter button.
- the modified view of the shutter button involves a virtual layer overlaid over the display component, wherein a user interaction with the virtual layer is indicative of a user indication to capture the image.
- the modified view of the shutter button involves a repositioning of the shutter button on the display component.
- the determination that the gaze of the user is directed at the lens involves determining whether the gaze of the user is directed at the lens for a time that exceeds a gaze threshold time.
- the facilitating of the image capture may be performed upon a determination that the gaze of the user is directed at the lens for the time that exceeds the gaze threshold time.
- the facilitating of the image capture may not be performed upon a determination that the gaze of the user is not directed at the lens for the time that exceeds the gaze threshold time.
- Some embodiments involve, in response to the providing of the facial image of the one or more facial attributes to the trained gaze detection model, receiving, from the trained gaze detection model, a location of a gaze of the user.
- the determining of whether the gaze of the user is directed at the lens involves determining, based on the location of the gaze of the user, whether the user is looking in a vicinity of the lens.
- Some embodiments involve, in response to the providing of the facial image of the one or more facial attributes to the trained gaze detection model, receiving, from the trained gaze detection model, a confidence measure indicating a likelihood that the gaze of the user is directed at the lens.
- the determining of whether the gaze of the user is directed at the lens comprises determining whether the confidence measure exceeds a threshold confidence measure.
- the trained gaze detection model may be a convolutional neural network.
- Some embodiments involve calibrating the lens to align with a gaze direction of the user.
- the determination that the gaze of the user is directed at the lens may be based on the calibrated lens.
- the calibrating of the lens involves providing a virtual point located on the display component. Such embodiments also involve directing the user to look at the virtual point. Such embodiments additionally involve determining, by the trained gaze detection model, a direction of the user gaze with reference to a location of the virtual point. The calibrating of the lens comprises a correction of an offset between the direction of the user gaze and the location of the virtual point. A sketch of this offset correction is shown below.
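The following is a minimal sketch of such an offset correction, assuming the gaze model reports a 2D gaze point in screen coordinates. The variable names, the example values, and the averaging over several samples are illustrative assumptions rather than the disclosed method.

```python
import numpy as np

def estimate_gaze_offset(virtual_point_xy, gaze_samples_xy):
    """Average offset between where the user was asked to look and where the model says they looked."""
    samples = np.asarray(gaze_samples_xy, dtype=float)
    return np.asarray(virtual_point_xy, dtype=float) - samples.mean(axis=0)

def apply_calibration(raw_gaze_xy, offset_xy):
    """Correct a raw gaze estimate with the per-user calibration offset."""
    return np.asarray(raw_gaze_xy, dtype=float) + np.asarray(offset_xy, dtype=float)

# Example: the user is directed to look at a virtual point at (540, 120), but the
# uncalibrated model places their gaze slightly below and to the left of that point.
offset = estimate_gaze_offset((540, 120), [(512, 160), (518, 155), (515, 162)])
corrected = apply_calibration((500, 170), offset)
```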
- the image capturing device may be a component of a mobile device.
- the lens of the image capturing device may be situated at a same side of the mobile device as the display component.
- the lens of the image capturing device may be situated at a side of the mobile device opposite to the display component.
- the mobile device may be a foldable mobile device comprising first and second panels, wherein the image capturing device is located on the first panel, and wherein the display component is located on the second panel.
- the one or more facial attributes may be pupil attributes.
- Figure 12 is another flowchart of a method, in accordance with example embodiments.
- Method 1200 may include various blocks or steps. The blocks or steps may be carried out individually or in combination. The blocks or steps may be carried out in any order and/or in series or in parallel. Further, blocks or steps may be omitted or added to method 1200. The blocks of method 1200 may be carried out by various elements of computing device 1000 as illustrated and described in reference to Figure 10.
- Block 1210 involves determining that the user gaze is directed toward the camera.
- Block 1220 involves initializing the gaze detection timer. For example, a 3-second timer may be started.
- Block 1230 involves determining whether the user gaze remains directed toward the camera for longer than a gaze threshold time. For example, this may involve determining whether the user gaze remains directed toward the camera for longer than 3 seconds.
- If so, the process proceeds to determine whether the camera is configured to operate in an auto-capture mode or a manual-capture mode. For example, a user may have selected a preference for one of these modes. Such a preference may be pre-selected, and/or may be selected at the time of image capture.
- Block 1240 involves initializing an image capture timer.
- a lens indicator such as, for example, a countdown timer animation around the lens (e.g., illustrated in Figure 5), may be provided.
- an automatic image capture may be triggered.
- the process may optionally proceed to block 1245.
- an automatic image capture may be triggered (e.g., without providing an image capture timer).
- Block 1245 involves providing a modified shutter button (e.g., illustrated in Figures 6, and/or 7), and causing an image to be captured subsequent to receiving a user interaction with the modified shutter button (e.g., pressing the shutter button, hovering over the shutter button, and so forth).
- Block 1250 involves canceling the image capture. Subsequently, the process may loop back to block 1210. A sketch of this overall capture flow is shown below.
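Below is a minimal sketch of the flow of method 1200 (blocks 1210 through 1250). The `camera` object and its methods, the polling interval, and the 3-second values are illustrative assumptions based on the example values given above, not the disclosed implementation.

```python
import time

GAZE_THRESHOLD_TIME_S = 3.0   # example dwell time (block 1230)
CAPTURE_COUNTDOWN_S = 3.0     # example countdown before auto-capture (block 1240)

def run_gaze_capture(camera, auto_capture_mode: bool):
    while True:  # Block 1250 loops back to block 1210 whenever the gaze is lost.
        # Block 1210: wait until the gaze is directed toward the camera.
        while not camera.gaze_at_camera():
            time.sleep(0.05)
        # Blocks 1220/1230: the gaze must remain on the camera for the gaze threshold time.
        dwell_start = time.monotonic()
        while camera.gaze_at_camera():
            if time.monotonic() - dwell_start >= GAZE_THRESHOLD_TIME_S:
                break
            time.sleep(0.05)
        else:
            continue  # gaze lost before the threshold; cancel and start over
        if auto_capture_mode:
            # Block 1240: show a lens indicator (e.g., countdown animation around the lens),
            # then trigger an automatic capture unless the gaze leaves the aperture.
            countdown_start = time.monotonic()
            while camera.gaze_at_camera():
                remaining = CAPTURE_COUNTDOWN_S - (time.monotonic() - countdown_start)
                if remaining <= 0:
                    camera.capture()
                    return
                camera.show_lens_countdown(remaining)
                time.sleep(0.05)
            continue  # gaze lost during the countdown; cancel and start over
        # Block 1245: provide a modified shutter button and capture on user interaction.
        camera.show_modified_shutter_button()
        if camera.wait_for_shutter_interaction():
            camera.capture()
            return
```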
- a step or block that represents a processing of information can correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique.
- a step or block that represents a processing of information can correspond to a module, a segment, or a portion of program code (including related data).
- the program code can include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique.
- the program code and/or related data can be stored on any type of computer readable medium such as a storage device including a disk, hard drive, or other storage medium.
- the computer readable medium can also include non-transitory computer readable media such as computer-readable media that store data for short periods of time like register memory, processor cache, and random access memory (RAM).
- the computer readable media can also include non-transitory computer readable media that store program code and/or data for longer periods.
- the computer readable media may include secondary or persistent long-term storage, like read only memory (ROM), optical or magnetic disks, compact disc read only memory (CD-ROM), for example.
- the computer readable media can also be any other volatile or non-volatile storage systems.
- a computer readable medium can be considered a computer readable storage medium, for example, or a tangible storage device.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Studio Devices (AREA)
Abstract
An example method includes receiving, by an image capturing device, a facial image of one or more facial attributes of a user. The method also includes providing the facial image of the one or more facial attributes to a trained gaze detection model. The method additionally includes applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device. The method further includes, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
Description
METHODS AND SYSTEMS FOR A GAZE-BASED IMAGE CAPTURE MODE
BACKGROUND
[0001] Many modern computing devices, including mobile phones, personal computers, and tablets, include image capture devices, such as still and/or video cameras. The image capture devices can capture images, such as images that include people, animals, landscapes, and/or objects.
SUMMARY
[0002] This application generally relates to a capture of a self-portrait (“selfie”) or other images. For example, when a user attempts to take a selfie using a mobile phone, they are often trying to view a preview of themselves in the display screen, and press the shutter to capture the image. Accordingly, they often look at a central portion of the screen instead of looking at the camera aperture. However, looking at the central portion of the screen while simultaneously pressing the shutter button (e.g., located at the bottom of the screen) may be challenging. This can make it difficult to get a desirable selfie, especially if the user is attempting to make eye contact (e.g., attempting to look at the camera) while initiating image capture.
[0003] Some computer vision based post-capture editing processes have attempted to apply a correction that shows that the gaze is directed toward the camera. However, the generated image may not be realistic, and an unexpected change around human eyes may be easily noticeable as human beings are generally sensitive to how they appear in images. Additionally, there is a lack of an automatic mechanism that can apply gaze correction during image capture. Accordingly, there is a need for an automatic image capture mechanism that can capture an image of a user while their gaze is directed to the camera.
[0004] In a first aspect, a computer-implemented method is provided. The method includes receiving, by an image capturing device, a facial image of one or more facial attributes of a user. The method includes providing the facial image of the one or more facial attributes to a trained gaze detection model. The method includes applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device. The method includes, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
[0005] In a second aspect, a system is provided. The system may include one or more processors. The system may also include data storage, where the data storage has stored thereon
computer-executable instructions that, when executed by the one or more processors, cause the system to carry out operations. The operations may include receiving, by an image capturing device, a facial image of one or more facial attributes of a user. The operations may further include providing the facial image of the one or more facial attributes to a trained gaze detection model. The operations may also include applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device. The operations may additionally include, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
[0006] In a third aspect, a computing device is provided. The device includes one or more processors and data storage that has stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing device to carry out operations. The operations may include receiving, by an image capturing device, a facial image of one or more facial attributes of a user. The operations may further include providing the facial image of the one or more facial attributes to a trained gaze detection model. The operations may also include applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device. The operations may additionally include, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
[0007] In a fourth aspect, an article of manufacture is provided. The article of manufacture may include a non-transitory computer-readable medium having stored thereon program instructions that, upon execution by one or more processors of a computing device, cause the computing device to carry out operations. The operations may include receiving, by an image capturing device, a facial image of one or more facial attributes of a user. The operations may further include providing the facial image of the one or more facial attributes to a trained gaze detection model. The operations may also include applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device. The operations may additionally include, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
[0008] The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further
aspects, embodiments, and features will become apparent by reference to the figures and the following detailed description and the accompanying drawings.
BRIEF DESCRIPTION OF THE FIGURES
[0009] Figure 1 is an example overview of gaze-based image capture, in accordance with example embodiments.
[0010] Figure 2 is an example overview of a processing pipeline for gaze-based image capture, in accordance with example embodiments.
[0011] Figure 3 illustrates example images for gaze-based image capture, in accordance with example embodiments.
[0012] Figure 4 illustrates example images for an image capture shutter button, in accordance with example embodiments.
[0013] Figure 5 illustrates example images for a lens indicator for gaze-based image capture, in accordance with example embodiments.
[0014] Figure 6 illustrates example images for an image capture shutter active region, in accordance with example embodiments.
[0015] Figure 7 illustrates additional example images for an image capture shutter active region, in accordance with example embodiments.
[0016] Figure 8 illustrates example calibrations for gaze-based image capture in portrait orientation, in accordance with example embodiments.
[0017] Figure 9 illustrates example calibrations for gaze-based image capture in landscape orientation, in accordance with example embodiments.
[0018] Figure 10 is a block diagram of an example computing device, in accordance with example embodiments.
[0019] Figure 11 is a flowchart of a method, in accordance with example embodiments.
[0020] Figure 12 is another flowchart of a method, in accordance with example embodiments.
DETAILED DESCRIPTION
[0021] Example methods, devices, and systems are described herein. It should be understood that the words “example” and “exemplary” are used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as being an “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other
embodiments or features. Other embodiments can be utilized, and other changes can be made, without departing from the scope of the subject matter presented herein.
[0022] Thus, the example embodiments described herein are not meant to be limiting. Aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are contemplated herein.
[0023] Further, unless context suggests otherwise, the features illustrated in each of the figures may be used in combination with one another. Thus, the figures should be generally viewed as component aspects of one or more overall embodiments, with the understanding that not all illustrated features are necessary for each embodiment.
Overview
[0024] This application generally relates to an automatic capture of a selfie. When a user attempts to take a selfie, it can be challenging to look at the camera (typically at one end of a mobile device) and simultaneously press the shutter button (typically at the other end of the mobile device). Generally, the user looks at the middle of the screen instead of the camera. This problem is further accentuated in foldable devices where the camera can be on one portion of the foldable device and the shutter button can be on the other portion of the foldable device. Accordingly, there is a need to enable a user to look at the camera when the image is being captured.
[0025] As described herein, a trained gaze detection model may be used to determine whether the user is looking at the camera and, upon a determination that the user is looking at the camera, to automatically capture an image of the user, display a visual cue to facilitate an image capture of an image of the user, or both. For example, the visual cue may be a countdown timer displayed prior to an automatic image capture, or the visual cue may be a modified shutter button (e.g., an enlarged version, a repositioned version, or a virtual active layer) so that it is easier for the user to capture the selfie while not having to look away from the camera. The camera determines whether the user is looking at the camera aperture. This may be achieved using a machine learning model that scans the user’s gaze to determine a gaze direction (e.g., output either as coordinates on the screen with respect to the coordinates of the location of the camera on the mobile device, or as a confidence measure of gazing at the camera). Upon a determination that the user is gazing at the camera, the image may be captured automatically. In some embodiments, upon a determination that the user is gazing at the camera, the device may provide a modified shutter button to enable the user to capture the image without having to look away from the camera aperture.
[0026] As described here, the described features have accessibility related implementations. For example, for users with limited vision, a circular countdown timer around the camera aperture may be provided to direct them toward the camera aperture. Also, a modified shutter button (e.g., enlarged, repositioned, with flashing lights, more apparent colors, etc.) may be provided to direct such users to the capture button and enhance their image capture experience. As another example, for users that have misaligned eyes, the camera lens may be calibrated to accurately determine a direction of gaze for the user.
Example Gaze-based Image Capture Techniques
[0027] In a mobile device equipped with a front camera, the camera aperture is typically at the top of the device, the shutter button to capture the image is at the bottom of the device, and a preview of the image is displayed between the camera aperture and the shutter button. As a result, when a user attempts to take a selfie, they would like to preview their own expression at the time of image capture, direct their gaze at the camera aperture, and also press the shutter button to capture the selfie. An attempt to synchronize these acts can be challenging, and can diminish both the quality of user experience, and the quality of the captured image (e.g., the user is not looking at the camera, there is a motion blur, and so forth). Also, many users may believe they are looking at the camera aperture, but due to a different orientation of their eyeballs, their gaze may not be directed toward the camera aperture. Accordingly, there is a technical problem of enabling users to capture images where their gaze is correctly directed at the camera aperture during image capture. As described herein, an image capturing experience may be provided whereby the user is able to gaze at the camera aperture during image capture. The camera may be configured to capture the image automatically, and/or enable the user to capture the image without a need to shift their gaze away from the camera aperture.
[0028] Figure 1 is an example overview 100 of gaze-based image capture, in accordance with example embodiments. In some embodiments, the operations may involve displaying, by a graphical user interface of a computing device, a preview image of a user’s face. For example, device 105 may include graphical user interface (GUI) 110 displaying a preview image 115. In some embodiments, device 105 may be a mobile device. In some embodiments, graphical user interface 110 may be an interface that displays a captured image. In some embodiments, graphical user interface 110 may be a live-view interface that displays a live-view preview of an image. As illustrated, the displaying of the image may involve providing a live-view preview of the image prior to a capture of the image.
[0029] Herein a "live-view preview" of an image should be understood to be an image or sequence of images (e.g., video) that is generated and displayed based on an image data stream
from an image sensor of an image-capture device. For instance, image data may be generated by a camera's image sensor (or a portion, subset, or sampling of the pixels on the image sensor). This image data is representative of the field-of-view (FOV) of the camera, and thus indicative of the image that will be captured if the user taps the camera's shutter button or initiates image capture in some other manner. To help a user to decide how to position a camera for image capture, a camera device or other computing device may generate and display a live-view preview image based on the image data stream from the image sensor. The live-view preview image can be a real-time image feed (e.g., video), such that the user is informed of the camera's FOV in real-time. In some embodiments, the image may be a frame of a plurality of frames of a video.
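As a rough illustration of this mechanism, a live-view preview loop might look like the following sketch, which uses OpenCV as an assumed stand-in for the device's camera stack; the capture key, window name, and file name are arbitrary choices, not part of the disclosure.

```python
import cv2

# Open the default (e.g., front-facing) camera and stream frames as a live-view preview.
stream = cv2.VideoCapture(0)
try:
    while True:
        ok, frame = stream.read()        # each frame represents the camera's current FOV
        if not ok:
            break
        cv2.imshow("live-view preview", frame)
        key = cv2.waitKey(1) & 0xFF
        if key == ord("c"):              # user initiates capture; the previewed frame is saved
            cv2.imwrite("captured_image.png", frame)
        elif key == ord("q"):
            break
finally:
    stream.release()
    cv2.destroyAllWindows()
```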
[0030] In some embodiments, the user may open a camera system of device 105 (e.g., with a touch screen or other mechanism), and may direct the camera (e.g., a front-facing camera with aperture 105A) to capture a self-portrait (e.g., a selfie), and/or direct the camera (e.g., a rear-facing camera) with an intent to capture an image of another person. Image capturing may be achieved by pressing a shutter button 105B. Graphical user interface 110 may display a live-view preview of the image. Device 105 may utilize one or more algorithms (e.g., an object detection algorithm, a face detection algorithm, a segmentation algorithm, and so forth) to identify one or more regions of interest in the image. In some implementations, a user-approved facial recognition algorithm may be applied to identify one or more individuals in the image as likely objects of interest. For example, device 105 may have a history of user preferences, and may identify certain objects and/or individuals as being of high interest to the user.
[0031] Although device 105 is illustrated with a front-facing camera with aperture 105A, the camera may be located on the rear side of device 105. In the event device 105 is a foldable device, the camera may be located on one or both panels of device 105.
[0032] At block 120, device 105 may receive a facial image of one or more facial attributes of a user. The term “facial attributes,” as used herein, may generally refer to any characteristic of a face that is indicative of gaze direction. For example, facial attributes may include pupil attributes, such as a position of the pupils and/or a flow tracking a movement of the pupils. Also, for example, a position and/or movement of the head (e.g., a tilt of the head, a head pose, etc.), a position and/or movement of one or both eyebrows, a position and/or movement of one or both eyelids, and so forth may be used as an indication of gaze direction. [0033] In some embodiments, device 105 may receive the facial image (e.g., a two-dimensional (2D) image patch including a face, and/or a 2D image patch including the face and 2D image patches including the eyes). In some embodiments, a face region may be selected
by a face detection algorithm, and device 105 may receive a cropped facial image of the face, and/or portions of the face that include the eyes, or portions around the eyes (e.g., including the eyebrows). In some embodiments, eye corner landmarks may be utilized to identify such portions of the face that include the eyes.
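As a non-limiting illustration of the cropping described above, the following minimal sketch extracts a square eye patch from a face image given two eye corner landmarks; the function name, margin, and output size are assumptions made for the example rather than features of any particular embodiment.

```python
import numpy as np

def crop_eye_patch(face_img: np.ndarray, inner_corner, outer_corner,
                   margin: float = 0.4, out_size: int = 64) -> np.ndarray:
    """Crop a square patch around one eye given its two corner landmarks (x, y)."""
    (x1, y1), (x2, y2) = inner_corner, outer_corner
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0              # eye center
    half = max((abs(x2 - x1) * (1.0 + margin)) / 2.0, 1.0)  # half-width with margin
    h, w = face_img.shape[:2]
    left, right = int(max(cx - half, 0)), int(min(cx + half, w))
    top, bottom = int(max(cy - half, 0)), int(min(cy + half, h))
    patch = face_img[top:bottom, left:right]
    # Resize with simple nearest-neighbor index selection to keep the sketch dependency-free.
    ys = np.linspace(0, patch.shape[0] - 1, out_size).astype(int)
    xs = np.linspace(0, patch.shape[1] - 1, out_size).astype(int)
    return patch[np.ix_(ys, xs)]
```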
[0034] At block 125, device 105 may detect a gaze direction. For example, the facial image of the one or more facial attributes may be provided to a gaze detection model. The gaze detection model may have been trained to receive the facial image of the one or more facial attributes as an input and output a gaze direction. In some embodiments, the gaze detection model may be trained on red-green-blue (RGB) images. In the event the facial image (e.g., for the face portion, and/or the eyes) is in grayscale, the facial image may be converted to RGB prior to being provided to the gaze detection model.
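A minimal sketch of the grayscale handling described above might look as follows; the helper name is assumed for illustration, and the conversion simply replicates the single channel into three channels before the image is passed to the model.

```python
import numpy as np

def ensure_rgb(image: np.ndarray) -> np.ndarray:
    """Return an HxWx3 image; replicate the channel if the input is grayscale."""
    if image.ndim == 2:                           # HxW grayscale
        return np.repeat(image[..., None], 3, axis=-1)
    if image.ndim == 3 and image.shape[-1] == 1:  # HxWx1 grayscale
        return np.repeat(image, 3, axis=-1)
    return image                                  # already three-channel
```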
[0035] In some embodiments, the trained gaze detection model may output a gaze vector (e.g., a three-dimensional (3D) vector) that indicates a direction of the gaze. For example, the 3D gaze vector may be provided with reference to a coordinate system associated with device 105. For example, a gaze directed at the camera aperture (e.g., aperture 105A) may be associated with coordinates (0, 0, 1).
[0036] In some embodiments, the trained gaze detection model may output a gaze vector and a scalar value that indicates a likelihood that the gaze is directed toward the camera aperture (e.g., aperture 105A). For example, the likelihood may be a confidence measure between “0” and “1,” where a value close to “0” is indicative of a low likelihood that the gaze is directed toward the camera aperture, and a value close to “1” is indicative of a high likelihood that the gaze is directed toward the camera aperture. For example, a confidence measure of 0.163 may be indicative of a 16.3% likelihood that the gaze is directed toward the camera aperture.
[0037] In some embodiments, the trained gaze detection model may be a convolutional neural network (CNN) comprising a plurality of input and output layers. For example, the gaze detection model may take an input such as a facial image including a face box and two eye patches. The model may predict a 3D gaze vector and a likelihood that the gaze is directed toward the camera aperture 105A. Different versions of such a model may be utilized based on available computational resources, a desired processing speed, and so forth. The gaze detection model may reside on device 105. Additional and/or alternative embodiments may involve a gaze detection model residing in a cloud server, or configured as a distributed system.
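The following PyTorch sketch illustrates one possible shape such a model could take, with a face tower, a shared eye tower, a 3D gaze-vector head, and a likelihood head; the layer sizes and names are illustrative assumptions and are not intended to reproduce any specific trained model.

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Toy gaze model: a face crop plus two eye patches in, a 3D gaze vector and
    an aperture-directed likelihood out. All dimensions are illustrative."""
    def __init__(self):
        super().__init__()
        def tower() -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AvgPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
                nn.Flatten())
        self.face_tower = tower()
        self.eye_tower = tower()              # shared by both eye patches
        self.head = nn.Sequential(nn.Linear(3 * 32 * 16, 128), nn.ReLU())
        self.gaze_out = nn.Linear(128, 3)     # 3D gaze vector
        self.conf_out = nn.Linear(128, 1)     # likelihood gaze is at the aperture

    def forward(self, face, left_eye, right_eye):
        feats = torch.cat([self.face_tower(face),
                           self.eye_tower(left_eye),
                           self.eye_tower(right_eye)], dim=1)
        h = self.head(feats)
        gaze = self.gaze_out(h)
        conf = torch.sigmoid(self.conf_out(h)).squeeze(-1)
        return gaze, conf
```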
[0038] In some embodiments, a training of the gaze detection model may involve generating ground truth images that capture a head pose (e.g., yaw, pitch, and roll), and a gaze pose (e.g.,
alpha and beta). The learning may involve, for example, supervised learning, where human labelers may use an annotation tool to record a ground truth 3D gaze vector for a given face. For example, the annotation tool may detect a plurality of cases and a predicted gaze direction. The annotator may reject an incorrect prediction, and/or correct the gaze direction for an incorrect prediction. For example, the annotation tool may enable an annotator to change direction angles for an output gaze vector, and/or enter coordinates for the output gaze vector. [0039] In some embodiments, the gaze detection model may be an eye tracker based on a multilayer feed-forward CNN. A face detection algorithm may select the face region with associated eye corner landmarks, which may be used to crop the images down to the eye region alone. These cropped frames may be fed through two identical CNN towers with shared weights. Each convolutional layer may be followed by an average pooling layer. In some embodiments, eye corner landmarks may be combined with the output of the two towers through fully connected layers. Also, for example, Rectified Linear Units (ReLUs) may be used for all layers except the final fully connected output layer, which may have no activation. The eye tracker may then output a location of a user’s gaze.
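A minimal PyTorch sketch of such a two-tower eye tracker is shown below, assuming the two eye patches pass through one shared tower, the eye corner landmarks are concatenated before the fully connected layers, and the final output layer carries no activation; all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EyeTracker(nn.Module):
    """Two identical eye towers with shared weights, average pooling after each
    convolution, eye-corner landmarks merged through fully connected layers, and
    no activation on the final output (an on-screen gaze location)."""
    def __init__(self, n_landmarks: int = 8):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AvgPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(2),
            nn.Flatten())
        self.fc = nn.Sequential(
            nn.Linear(2 * 32 * 4 + n_landmarks, 64), nn.ReLU(),
            nn.Linear(64, 2))                  # final layer: no activation

    def forward(self, left_eye, right_eye, corner_landmarks):
        left = self.tower(left_eye)            # shared weights: same module
        right = self.tower(right_eye)
        merged = torch.cat([left, right, corner_landmarks], dim=1)
        return self.fc(merged)                  # (x, y) gaze location
```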
[0040] Another approach to gaze detection may involve a deep neural network learning framework based on a differential eyes’ appearances network (DEANet) to estimate gaze direction. Pairs of image patches for left and right eyes may be simultaneously provided to a Siamese neural network (SNNet) that has two identical branches. The output from the two branches may be concatenated with head pose information, to obtain a differential gaze for the pairs of image patches.
[0041] In some embodiments, the gaze detection model may involve a model that generates 2D coordinates for the gaze direction. Some approaches involve using eye patches and an eye grid to construct a two-step training network for gaze detection on mobile devices. Other models may be based on eye patches, a full-face patch, and a face grid.
[0042] At block 130, device 105 may determine whether the gaze is directed toward the camera aperture 105A. For example, the 3D coordinates returned by the gaze detection model may be within a threshold range of (0, 0, 1), the coordinates associated with camera aperture 105A. Also, for example, the likelihood that the gaze is directed toward the camera aperture 105A may be within a threshold range of 1.0, indicative of a high likelihood that the gaze is directed toward the camera aperture 105A. Upon a determination that the gaze is directed toward the camera aperture 105A, device 105 may initiate a gaze detection timer that determines whether the gaze remains directed toward the camera aperture 105A for greater than a gaze threshold time. This is to ensure that the user intends to capture a selfie. In the event that the gaze does not remain directed toward the camera aperture 105A for greater than the gaze threshold time, device 105 is configured to infer that the user does not intend to capture an image. In the event that the gaze remains directed toward the camera aperture 105A for greater than the gaze threshold time, device 105 is configured to infer that the user intends to capture an image. The process proceeds to block 135.
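By way of illustration, the aperture check and the gaze detection timer described in this block could be sketched as follows; the angular tolerance, confidence threshold, and hold time are assumed values, not values prescribed by the embodiments.

```python
import time
import numpy as np

APERTURE_DIR = np.array([0.0, 0.0, 1.0])   # gaze directed straight at the aperture
ANGLE_THRESHOLD_DEG = 10.0                  # assumed tolerance around the aperture
CONF_THRESHOLD = 0.8                        # assumed likelihood threshold
GAZE_HOLD_SECONDS = 3.0                     # assumed gaze threshold time

def gaze_at_aperture(gaze_vec, confidence) -> bool:
    """True if the gaze vector points near (0, 0, 1) and the confidence is high."""
    v = np.asarray(gaze_vec, dtype=float)
    v = v / np.linalg.norm(v)
    angle = np.degrees(np.arccos(np.clip(v @ APERTURE_DIR, -1.0, 1.0)))
    return angle <= ANGLE_THRESHOLD_DEG and confidence >= CONF_THRESHOLD

def user_intends_capture(frame_stream) -> bool:
    """Infer capture intent: the gaze must stay on the aperture for the hold time."""
    hold_start = None
    for gaze_vec, confidence in frame_stream:   # per-frame gaze model outputs
        if gaze_at_aperture(gaze_vec, confidence):
            hold_start = hold_start or time.monotonic()
            if time.monotonic() - hold_start >= GAZE_HOLD_SECONDS:
                return True                      # intent inferred: proceed to block 135
        else:
            hold_start = None                    # gaze left the aperture; reset the timer
    return False
```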
[0043] At block 135, device 105 may display an optional visual cue, indicated by the block with a dashed boundary. For example, a camera may be in an automatic image capture mode. Accordingly, in some embodiments, device 105 may display a countdown timer on GUI 110 near the lens indicating an amount of time left for an automatic image capture. The countdown timer may involve a display of a “3, 2, 1...” timer. In some embodiments, the countdown timer may be an animated circular countdown timer positioned proximate to the camera aperture 105A (e.g., around the lens). Some embodiments involve determining an expiry of the visible countdown timer near the lens, and enabling an automatic capture of the image of the user by the image capturing device. For example, as the circular timer completes one full circle, the camera is triggered to capture the image.
[0044] As described herein, a gaze detection timer indicates whether the user gaze is directed toward the lens for greater than a gaze threshold time. This enables device 105 to infer whether or not the user intends to capture the image. In the event the camera is in an automatic image capture mode, upon a determination that the user intends to capture an image (e.g., the user gaze is directed toward the lens for greater than a gaze threshold time), an optional second timer, such as a countdown timer, may be initialized and displayed. The countdown timer indicates an amount of time left before an automatic image capture is triggered.
[0045] In some embodiments, the camera may be in an automatic image capture mode, and device 105 may trigger an automatic image capture without displaying the optional second timer. For example, upon a determination that the user intends to capture an image (e.g., the user gaze is directed toward the lens for greater than a gaze threshold time), device 105 may automatically capture the image.
[0046] In some embodiments, the camera may be in a manual image capture mode. Accordingly, the visual cue may involve providing an active area on GUI 110 that is configured to receive a user indication to capture the image. For example, the active area may be an expanded area around the shutter button 105B to enable ease of access for the user. Also, for example, the active area may be a region closer to camera aperture 105A, to enable the user to conveniently trigger the image capture without having to look away from the camera aperture 105A. In some embodiments, the active area may be a virtual layer overlaid on GUI 110.
[0047] In the event the camera is a rear-facing camera, a first countdown timer (e.g., a series of flashing lights) may be provided via the rear-facing camera to a first user whose image is being captured, while an optional second synchronous countdown timer (e.g., an animated circle) may be provided on the front display to the second user who may be capturing the image of the first user. Also, for example, an automatic image capture may be performed without the countdown timers.
[0048] At block 140, the device 105 may capture the image. For example, an image 145 may be captured automatically with or without a countdown timer, or image 145 may be captured based on a user indication.
[0049] Figure 2 is an example overview of a processing pipeline for gaze-based image capture, in accordance with example embodiments. A device (e.g., device 105 of Figure 1) may be configured with a camera application 200. Frame pipeline 205 may receive one or more image frames from the camera sensor. In some embodiments, the frames may be of a lower resolution for more efficient processing.
[0050] The one or more image frames may be provided to gaze control component 210. In some embodiments, the Java Native Interface (JNI) 215 may provide a communication interface between camera application 200 and gaze capture pipeline 220. Accordingly, the images may be provided to gaze capture pipeline 220. Gaze capture pipeline 220 may be configured to apply a trained gaze detection model which may return a gaze vector, and/or a likelihood that the gaze is directed toward the camera aperture. JNI 215 provides this output of the gaze capture pipeline 220 to the gaze control component 210. As described previously, gaze control component 210 is configured to direct the image capture process. For example, gaze control component 210 may determine whether the gaze remains directed toward the camera aperture for longer than a gaze threshold time. Upon a determination that the gaze remains directed toward the camera aperture for longer than the gaze threshold time, gaze control component 210 may provide an optional visual cue to the user.
[0051] Also, for example, gaze control component 210 may determine whether camera application 200 is in an automatic capture mode or a manual capture mode. In the event the camera application 200 is in an automatic capture mode, gaze control component 210 may display an optional front lens indicator (e.g., a countdown timer) as a visual cue 225. In some embodiments, visual cue 225 may indicate to gaze control component 210 that the countdown timer has expired, and gaze control component 210 may trigger capture module 230 to capture the image.
[0052] In some embodiments, when the camera application 200 is in an automatic capture mode, upon a determination that the gaze remains directed toward the camera aperture for longer than the gaze threshold time, gaze control component 210 may trigger capture module 230 to automatically capture the image, without providing a visual cue 225.
[0053] In the event the camera application 200 is in a manual capture mode, gaze control component 210 may display a modified shutter button or a virtual screen overlay as visual cue 225. Upon an indication that the user has interacted with the visual cue 225, gaze control component 210 may trigger capture module 230 to capture the image.
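The mode-dependent behavior described in the preceding paragraphs can be summarized in a short control-flow sketch; the `ui` and `capture_module` interfaces below are hypothetical placeholders used only to make the logic concrete, and the three-second countdown is an assumed value.

```python
from enum import Enum, auto

class CaptureMode(Enum):
    AUTO = auto()
    MANUAL = auto()

def on_gaze_held(mode: CaptureMode, ui, capture_module, show_countdown: bool = True):
    """Called once the gaze has stayed on the aperture past the gaze threshold time."""
    if mode is CaptureMode.AUTO:
        if show_countdown:
            ui.show_lens_countdown(seconds=3)           # visual cue near the lens
            ui.on_countdown_expired(capture_module.capture)
        else:
            capture_module.capture()                    # immediate automatic capture
    else:
        ui.show_modified_shutter()                      # enlarged button or screen overlay
        ui.on_shutter_interaction(capture_module.capture)
```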
[0054] Figure 3 illustrates example images 300 for gaze-based image capture, in accordance with example embodiments. Image 305 illustrates an example situation where the user’s gaze is directed toward shutter button 310 located at the bottom of device 315. Image 320 illustrates an example situation where the user’s gaze is directed toward the preview image located at the center 325 of device 315. Image 330 illustrates an example situation where the user’s gaze is directed toward the camera aperture 335 located at the top of device 315. As illustrated, image 330 is a desired gaze direction for a selfie.
[0055] Figure 4 illustrates example images for an image capture shutter button, in accordance with example embodiments. Image 400A illustrates a device with an image displayed on the screen. The shutter button 405 may be located at the bottom of the screen, below the image. Image 400B illustrates a foldable device with first panel 410 and second panel 415. A first camera 420 may be located on the first panel 410, and a second camera 425 may be located on the second panel 415. The display component displaying the preview image may be located on the second panel 415. The shutter button 430 may be located at the bottom of the screen, below the preview image, on the second panel 415. It may be challenging for a user to look at the first camera 420 while pressing the shutter button 430. Also, capturing the image with the first camera 420 while looking at the preview image displayed on the second panel 415 is unlikely to result in a desirable image.
[0056] Figure 5 illustrates example images for a lens indicator for gaze-based image capture, in accordance with example embodiments. Image 500 illustrates how displaying of the visual cue involves displaying an animated circular countdown timer 505 near the lens (e.g., encircling the lens) indicating an amount of time left for an automatic image capture of the image. In some embodiments, the visual cue may also include a visual indicator such as arrow 510 to direct the user’s attention to the countdown timer 505.
[0057] Images 515-525 illustrate the steps from gaze detection to image capture. Image 515 illustrates the situation where the gaze is not directed at the camera. No countdown timer is
displayed. Image 520 illustrates the situation where the device has detected that the user gaze is directed at the camera for longer than the gaze threshold time. As a result, a visible countdown timer (e.g., visible countdown timer 505) is displayed. The timer expires when the circle is completed. As illustrated in image 525, the completed circle indicates an expiry of the visible countdown timer near the lens. As a result, automatic capture of the image of the user by the camera is triggered.
[0058] Figure 6 illustrates example images for an image capture shutter active region, in accordance with example embodiments. Image 600A illustrates a device with an active region 605 around the shutter button. Image 600B illustrates the situation where the displaying of the visual cue involves displaying a modified view 610 of the shutter button on the display component to enable the user to capture the image. When the camera detects a user interaction with the modified view 610 of the shutter button, the image of the user is captured by the camera. As shown, the modified view 610 of the shutter button may be an enlarged view of the shutter button. Although not illustrated here, in some embodiments, the modified view of the shutter button may involve a repositioning of the shutter button on the display component.
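A minimal sketch of an expanded active region around a shutter button is shown below; the rectangle geometry and margin are assumptions chosen only to illustrate how a tap slightly outside the visible button could still trigger capture.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float

    def expanded(self, margin: float) -> "Rect":
        """Grow the rectangle on all sides, e.g. to widen a shutter-button target."""
        return Rect(self.x - margin, self.y - margin,
                    self.w + 2 * margin, self.h + 2 * margin)

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h

# Example: a 96x96 shutter button whose active region is padded by 48 pixels.
shutter = Rect(492, 2200, 96, 96)
active_region = shutter.expanded(48)
assert active_region.contains(480, 2190)   # a tap just outside the button still fires
```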
[0059] Figure 7 illustrates additional example images for an image capture shutter active region, in accordance with example embodiments. Image 700A illustrates a device with an active region 705 around the shutter button. Image 700B illustrates the situation where the modified view of the shutter button involves a virtual layer 710 overlaid over the display component. When the camera detects a user interaction with the virtual layer 710, the camera is configured to perceive this as indicative of a user indication to capture the image.
[0060] Figure 8 illustrates example calibrations for gaze-based image capture in portrait orientation, in accordance with example embodiments. Image 800A illustrates an image displayed on a display component 805 of a device in portrait orientation. Display component 805 is shown to include five points from the top to the bottom, labeled A, B, C, D, E. Column 805B illustrates characteristics associated with each of these five points. For example, point A corresponds to coordinates (-0.063, 3.932), as may be output by a gaze detection model. Also, for example, point A corresponds to a probability or confidence measure 0.163, as may be output by a gaze detection model.
[0061] Some embodiments involve calibrating the lens to align with the gaze direction of the user. For example, the device may display the five points labeled A, B, C, D, E, on display component 805. The user may be directed to look at a point, say A. The gaze detection model may determine a direction of the user gaze with reference to a location of point A. For example,
the gaze detection model has data indicating the location of point A. However, the gaze direction of the user may be at another point A'. Accordingly, the calibrating of the lens may involve a correction of the offset between the gaze direction of the user (e.g., at A') and the location of the point (e.g., A). For example, a vector AA' may indicate a correction. Similar steps may be applied to additional points (e.g., B, C, D, E) on display component 805 to personalize the gaze detection model to the user, and/or apply a personalized correction to the camera lens.
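One simple way to realize such a personalized correction is a constant offset averaged over the calibration points, as in the sketch below; the coordinates and the constant-offset assumption are illustrative only, and richer per-region corrections are equally possible.

```python
import numpy as np

def calibration_offset(point_locations, measured_gazes) -> np.ndarray:
    """Average correction vector from the on-screen points (e.g., A..E) to the gaze
    locations the model reported while the user looked at each point (e.g., A'..E')."""
    points = np.asarray(point_locations, dtype=float)
    gazes = np.asarray(measured_gazes, dtype=float)
    return (points - gazes).mean(axis=0)        # mean of the vectors A'A, B'B, ...

def personalized_gaze(raw_gaze, offset) -> np.ndarray:
    """Apply the per-user correction to a new model output."""
    return np.asarray(raw_gaze, dtype=float) + offset

# Minimal illustration with made-up coordinates:
screen_points = [(0.0, 3.9), (0.0, 2.9), (0.0, 1.9), (0.0, 0.9), (0.0, 0.0)]
model_outputs = [(-0.06, 3.93), (-0.05, 2.95), (-0.04, 1.94), (-0.06, 0.93), (-0.05, 0.02)]
offset = calibration_offset(screen_points, model_outputs)
corrected = personalized_gaze((-0.05, 2.0), offset)
```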
[0062] Figure 9 illustrates example calibrations for gaze-based image capture in landscape orientation, in accordance with example embodiments. Image 900A illustrates an image displayed on a display component 905 of a device in landscape orientation. Display component 905 is shown to include five points from the left to the right, labeled A, B, C, D, E. Row 905B illustrates characteristics associated with each of these five points. For example, point A corresponds to coordinates (0.261, 0.443), as may be output by a gaze detection model. Also, for example, point A corresponds to a probability or confidence measure 0.982, as may be output by a gaze detection model. As described with respect to Figure 8, similar steps may be applied to points (e.g., A, B, C, D, E) on display component 905 to personalize the gaze detection model to the user, and/or apply a personalized correction to the camera lens.
[0063] Lens calibration, and/or personalization of the gaze detection model can be of particular assistance to users who may have misaligned pupils. For example, the user gaze direction measured by the gaze detection model may not accurately predict where the user is actually looking. For example, as described above, the user may be looking at a point A, whereas the user gaze may appear to be directed at point A'. Without a calibrated lens, the pupils in a user selfie may not be looking at the camera lens. However, once a correction is made by correlating points A and A' (e.g., using a vector AA'), the personalized gaze detection model is able to determine gaze direction for the user with a higher degree of accuracy, resulting in improved image quality.
Computing Device Architecture
[0064] Figure 10 is a block diagram of an example computing device 1000, in accordance with example embodiments. In particular, computing device 1000 shown in Figure 10 can be configured to perform at least one function described herein, including methods 1100 and/or 1200.
[0065] Computing device 1000 may include a user interface module 1001, a network communications module 1002, one or more processors 1003, data storage 1004, one or more
cameras 1018, one or more sensors 1020, and power system 1022, all of which may be linked together via a system bus, network, or other connection mechanism 1005.
[0066] User interface module 1001 can be operable to send data to and/or receive data from external user input/output devices. For example, user interface module 1001 can be configured to send and/or receive data to and/or from user input devices such as a touch screen, a computer mouse, a keyboard, a keypad, a touch pad, a trackball, a joystick, a voice recognition module, and/or other similar devices. User interface module 1001 can also be configured to provide output to user display devices, such as one or more cathode ray tubes (CRT), liquid crystal displays, light emitting diodes (LEDs), displays using digital light processing (DLP) technology, printers, light bulbs, and/or other similar devices, either now known or later developed. User interface module 1001 can also be configured to generate audible outputs, with devices such as a speaker, speaker jack, audio output port, audio output device, earphones, and/or other similar devices. User interface module 1001 can further be configured with one or more haptic devices that can generate haptic outputs, such as vibrations and/or other outputs detectable by touch and/or physical contact with computing device 1000. In some examples, user interface module 1001 can be used to provide a graphical user interface (GUI) for utilizing computing device 1000. For example, user interface module 1001 may be configured to provide a visual cue, a countdown timer, and so forth.
[0067] Network communications module 1002 can include one or more devices that provide one or more wireless interfaces 1007 and/or one or more wireline interfaces 1008 that are configurable to communicate via a network. Wireless interface(s) 1007 can include one or more wireless transmitters, receivers, and/or transceivers, such as a Bluetooth™ transceiver, a Zigbee® transceiver, a Wi-Fi™ transceiver, a WiMAX™ transceiver, an LTE™ transceiver, and/or other type of wireless transceiver configurable to communicate via a wireless network. Wireline interface(s) 1008 can include one or more wireline transmitters, receivers, and/or transceivers, such as an Ethernet transceiver, a Universal Serial Bus (USB) transceiver, or similar transceiver configurable to communicate via a twisted pair wire, a coaxial cable, a fiberoptic link, or a similar physical connection to a wireline network.
[0068] In some examples, network communications module 1002 can be configured to provide reliable, secured, and/or authenticated communications. For each communication described herein, information for facilitating reliable communications (e.g., guaranteed message delivery) can be provided, perhaps as part of a message header and/or footer (e.g., packet/message sequencing information, encapsulation headers and/or footers, size/time information, and transmission verification information such as cyclic redundancy check (CRC)
and/or parity check values). Communications can be made secure (e.g., be encoded or encrypted) and/or decrypted/decoded using one or more cryptographic protocols and/or algorithms, such as, but not limited to, Data Encryption Standard (DES), Advanced Encryption Standard (AES), a Rivest-Shamir-Adleman (RSA) algorithm, a Diffie-Hellman algorithm, a secure sockets protocol such as Secure Sockets Layer (SSL) or Transport Layer Security (TLS), and/or Digital Signature Algorithm (DSA). Other cryptographic protocols and/or algorithms can be used as well or in addition to those listed herein to secure (and then decrypt/decode) communications.
[0069] One or more processors 1003 can include one or more general purpose processors (e.g., central processing unit (CPU), etc.), and/or one or more special purpose processors (e.g., digital signal processors, tensor processing units (TPUs), graphics processing units (GPUs), application specific integrated circuits, etc.). One or more processors 1003 can be configured to execute computer-readable instructions 1006 that are contained in data storage 1004 and/or other instructions as described herein.
[0070] Data storage 1004 can include one or more non-transitory computer-readable storage media that can be read and/or accessed by at least one of one or more processors 1003. The one or more computer-readable storage media can include volatile and/or non-volatile storage components, such as optical, magnetic, organic or other memory or disc storage, which can be integrated in whole or in part with at least one of one or more processors 1003. In some examples, data storage 1004 can be implemented using a single physical device (e.g., one optical, magnetic, organic or other memory or disc storage unit), while in other examples, data storage 1004 can be implemented using two or more physical devices.
[0071] Data storage 1004 can include computer-readable instructions 1006 and perhaps additional data. In some examples, data storage 1004 can include storage required to perform at least part of the herein-described methods, scenarios, and techniques and/or at least part of the functionality of the herein-described devices and networks. In particular, computer- readable instructions 1006 can include instructions that, when executed by processor(s) 1003, enable computing device 1000 to provide for some or all of the functionality described herein. For example, data storage 1004 may store captured images, personalized correction factors, and so forth.
[0072] In some embodiments, computer-readable instructions 1006 can include instructions that, when executed by processor(s) 1003, enable computing device 1000 to carry out operations. The operations may include receiving, by an image capturing device, a facial image of one or more facial attributes of a user. The operations may further include providing the
facial image of the one or more facial attributes to a trained gaze detection model. The operations may also include applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device. The operations may additionally include, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
[0073] In some examples, computing device 1000 can include gaze detection module 1012. Gaze detection module 1012 can be configured to receive one or more facial features of the user and provide them to a trained gaze detection model. The trained gaze detection model may then indicate a direction of the gaze of the user. Gaze detection module 1012 may determine whether the user’s gaze is directed at the camera, and upon a determination that the user is gazing at the camera for a gaze threshold time, gaze detection module 1012 can trigger an automatic image capture of the user by one or more cameras 1018. In some embodiments, gaze detection module 1012 may display a visual cue to enable the user to continue to gaze at the camera for the gaze threshold time.
[0074] In some examples, computing device 1000 can include one or more cameras 1018. Camera(s) 1018 can include one or more image capture devices, such as still and/or video cameras, equipped to capture light and record the captured light in one or more images; that is, camera(s) 1018 can generate image(s) of captured light. The one or more images can be one or more still images and/or one or more images utilized in video imagery. Camera(s) 1018 can capture light and/or electromagnetic radiation emitted as visible light, infrared radiation, ultraviolet light, and/or as one or more other frequencies of light.
[0075] In some examples, computing device 1000 can include one or more sensors 1020. Sensors 1020 can be configured to measure conditions within computing device 1000 and/or conditions in an environment of computing device 1000 and provide data about these conditions. For example, sensors 1020 can include one or more of: (i) sensors for obtaining data about computing device 1000, such as, but not limited to, a thermometer for measuring a temperature of computing device 1000, a battery sensor for measuring power of one or more batteries of power system 1022, and/or other sensors measuring conditions of computing device 1000; (ii) an identification sensor to identify other objects and/or devices, such as, but not limited to, a Radio Frequency Identification (RFID) reader, proximity sensor, one-dimensional barcode reader, two-dimensional barcode (e.g., Quick Response (QR) code) reader, and a laser tracker, where the identification sensors can be configured to read identifiers, such as RFID tags, barcodes, QR codes, and/or other devices and/or object configured to be read and provide
at least identifying information; (iii) sensors to measure locations and/or movements of computing device 1000, such as, but not limited to, a tilt sensor, a gyroscope, an accelerometer, a Doppler sensor, a GPS device, a sonar sensor, a radar device, a laser-displacement sensor, and a compass; (iv) an environmental sensor to obtain data indicative of an environment of computing device 1000, such as, but not limited to, an infrared sensor, an optical sensor, a light sensor, a biosensor, a capacitive sensor, a touch sensor, a temperature sensor, a wireless sensor, a radio sensor, a movement sensor, a microphone, a sound sensor, an ultrasound sensor and/or a smoke sensor; and/or (v) a force sensor to measure one or more forces (e.g., inertial forces and/or G-forces) acting about computing device 1000, such as, but not limited to one or more sensors that measure: forces in one or more dimensions, torque, ground force, friction, and/or a zero moment point (ZMP) sensor that identifies ZMPs and/or locations of the ZMPs. Many other examples of sensors 1020 are possible as well.
[0076] Power system 1022 can include one or more batteries 1024 and/or one or more external power interfaces 1026 for providing electrical power to computing device 1000. Each battery of the one or more batteries 1024 can, when electrically coupled to the computing device 1000, act as a source of stored electrical power for computing device 1000. One or more batteries 1024 of power system 1022 can be configured to be portable. Some or all of one or more batteries 1024 can be readily removable from computing device 1000. In other examples, some or all of one or more batteries 1024 can be internal to computing device 1000, and so may not be readily removable from computing device 1000. Some or all of one or more batteries 1024 can be rechargeable. For example, a rechargeable battery can be recharged via a wired connection between the battery and another power supply, such as by one or more power supplies that are external to computing device 1000 and connected to computing device 1000 via the one or more external power interfaces. In other examples, some or all of one or more batteries 1024 can be non-rechargeable batteries.
[0077] One or more external power interfaces 1026 of power system 1022 can include one or more wired-power interfaces, such as a USB cable and/or a power cord, that enable wired electrical power connections to one or more power supplies that are external to computing device 1000. One or more external power interfaces 1026 can include one or more wireless power interfaces, such as a Qi wireless charger, that enable wireless electrical power connections, such as via a Qi wireless charger, to one or more external power supplies. Once an electrical power connection is established to an external power source using one or more external power interfaces 1026, computing device 1000 can draw electrical power from the external power source via the established electrical power connection. In some examples, power
system 1022 can include related sensors, such as battery sensors associated with the one or more batteries or other types of electrical power sensors.
Example Methods of Operation
[0079] Figure 11 is a flowchart of a method, in accordance with example embodiments. Method 1100 may include various blocks or steps. The blocks or steps may be carried out individually or in combination. The blocks or steps may be carried out in any order and/or in series or in parallel. Further, blocks or steps may be omitted or added to method 1100.
[0080] The blocks of method 1100 may be carried out by various elements of computing device 1000 as illustrated and described in reference to Figure 10.
[0081] Block 1110 involves receiving, by an image capturing device, a facial image of one or more facial attributes of a user.
[0082] Block 1120 involves providing the facial image of the one or more facial attributes to a trained gaze detection model.
[0083] Block 1130 involves applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device.
[0084] Block 1140 involves, upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
[0085] In some embodiments, the facilitating of the image capture involves enabling an automatic capture of the image of the user by the image capturing device.
[0086] In some embodiments, the facilitating of the image capture involves displaying, by the display component, a visual cue to facilitate the image capture.
[0087] In some embodiments, the displaying of the visual cue involves displaying a visible countdown timer near the lens indicating an amount of time left for an automatic image capture of the image.
[0088] Some embodiments involve determining an expiry of the visible countdown timer near the lens. Such embodiments involve enabling an automatic capture of the image of the user by the image capturing device.
[0089] Some embodiments involve determining, by the trained gaze detection model, that the gaze of the user is not directed at the lens prior to an expiry of the visible countdown timer near the lens. Such embodiments involve disabling an automatic capture of the image of the user by the image capturing device.
[0090] In some embodiments, the visible countdown timer near the lens may be an animated circular countdown timer around the lens.
[0091] In some embodiments, the displaying of the visual cue involves displaying a modified view of a shutter button on the display component to enable the user to capture the image.
[0092] Some embodiments involve detecting a user interaction with the modified view of the shutter button. Such embodiments involve capturing the image of the user by the image capturing device.
[0093] In some embodiments, the modified view of the shutter button may be an enlarged view of the shutter button.
[0094] In some embodiments, the modified view of the shutter button involves a virtual layer overlaid over the display component, wherein a user interaction with the virtual layer is indicative of a user indication to capture the image.
[0095] In some embodiments, the modified view of the shutter button involves a repositioning of the shutter button on the display component.
[0096] In some embodiments, the determination that the gaze of the user is directed at the lens involves determining whether the gaze of the user is directed at the lens for a time that exceeds a gaze threshold time.
[0097] In some embodiments, the facilitating of the image capture may be performed upon a determination that the gaze of the user is directed at the lens for the time that exceeds the gaze threshold time.
[0098] In some embodiments, the facilitating of the image capture may not be performed upon a determination that the gaze of the user is not directed at the lens for the time that exceeds the gaze threshold time.
[0099] Some embodiments involve, in response to the providing of the facial image of the one or more facial attributes to the trained gaze detection model, receiving, from the trained gaze detection model, a location of a gaze of the user. The determining of whether the gaze of the user is directed at the lens involves determining, based on the location of the gaze of the user, whether the user is looking in a vicinity of the lens.
[00100] Some embodiments involve, in response to the providing of the facial image of the one or more facial attributes to the trained gaze detection model, receiving, from the trained gaze detection model, a confidence measure indicating a likelihood that the gaze of the user is directed at the lens. The determining of whether the gaze of the user is directed at the lens comprises determining whether the confidence measure exceeds a threshold confidence measure.
[00101] In some embodiments, the trained gaze detection model may be a convolutional neural network.
[00102] Some embodiments involve calibrating the lens to align with a gaze direction of the user. The determination that the gaze of the user is directed at the lens may be based on the calibrated lens.
[00103] In some embodiments, the calibrating of the lens involves providing a virtual point located on the display component. Such embodiments also involve directing the user to look at the virtual point. Such embodiments additionally involve determining, by the trained gaze detection model, a direction of the user gaze with reference to a location of the virtual point. The calibrating of the lens comprises a correction of an offset between the direction of the user gaze and the location of the virtual point.
[00104] In some embodiments, the image capturing device may be a component of a mobile device.
[00105] In some embodiments, the lens of the image capturing device may be situated at a same side of the mobile device as the display component.
[00106] In some embodiments, the lens of the image capturing device may be situated at a side of the mobile device opposite to the display component.
[00107] In some embodiments, the mobile device may be a foldable mobile device comprising first and second panels, wherein the image capturing device is located on the first panel, and wherein the display component is located on the second panel.
[00108] In some embodiments, the one or more facial attributes may be pupil attributes.
[00109] Figure 12 is another flowchart of a method, in accordance with example embodiments. Method 1200 may include various blocks or steps. The blocks or steps may be
carried out individually or in combination. The blocks or steps may be carried out in any order and/or in series or in parallel. Further, blocks or steps may be omitted or added to method 1200. [00110] The blocks of method 1200 may be carried out by various elements of computing device 1000 as illustrated and described in reference to Figure 10.
[00111] Block 1210 involves determining that the user gaze is directed toward the camera.
[00112] Block 1220 involves initializing the gaze detection timer. For example, a 3- second timer may be started.
[00113] Block 1230 involves determining whether the user gaze remains directed toward the camera for longer than a gaze threshold time. For example, this may involve determining whether the user gaze remains directed toward the camera for longer than 3 seconds.
[00114] Upon a determination that the user gaze remains directed toward the camera for longer than the gaze threshold time, the process proceeds to determine whether the camera is configured to operate in an auto-capture mode or a manual-capture mode. For example, a user may have selected a preference for one of these modes. Such a preference may be pre-selected, and/or may be selected at the time of image capture.
[00115] In the event that the camera is configured to operate in an auto-capture mode, the process may optionally proceed to block 1240 (indicated by the block with a dashed boundary). Block 1240 involves initializing an image capture timer. For example, a lens indicator such as, for example, a countdown timer animation around the lens (e.g., illustrated in Figure 5), may be provided. Upon a determination that the image capture timer has expired, at block 1245, an automatic image capture may be triggered.
[00116] Also, for example, in the event that the camera is configured to operate in an auto-capture mode, the process may optionally proceed to block 1245. At block 1245, an automatic image capture may be triggered (e.g., without providing an image capture timer).
[00117] In the event that the camera is configured to operate in a manual-capture mode, the process proceeds to block 1245. Block 1245 involves providing a modified shutter button (e.g., illustrated in Figures 6, and/or 7), and causing an image to be captured subsequent to receiving a user interaction with the modified shutter button (e.g., pressing the shutter button, hovering over the shutter button, and so forth).
[00118] Upon a determination that the user gaze does not remain directed toward the camera for longer than the gaze threshold time, the process proceeds to block 1250. Block 1250 involves canceling the image capture. Subsequently, the process may loop back to block 1210.
[00119] The particular arrangements shown in the Figures should not be viewed as limiting. It should be understood that other embodiments may include more or less of each element shown in a given Figure. Further, some of the illustrated elements may be combined or omitted. Yet further, an illustrative embodiment may include elements that are not illustrated in the Figures.
[00120] A step or block that represents a processing of information can correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively or additionally, a step or block that represents a processing of information can correspond to a module, a segment, or a portion of program code (including related data). The program code can include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique. The program code and/or related data can be stored on any type of computer readable medium such as a storage device including a disk, hard drive, or other storage medium.
[00121] The computer readable medium can also include non-transitory computer readable media such as computer-readable media that store data for short periods of time like register memory, processor cache, and random access memory (RAM). The computer readable media can also include non-transitory computer readable media that store program code and/or data for longer periods. Thus, the computer readable media may include secondary or persistent long-term storage, like read only memory (ROM), optical or magnetic disks, compact disc read only memory (CD-ROM), for example. The computer readable media can also be any other volatile or non-volatile storage systems. A computer readable medium can be considered a computer readable storage medium, for example, or a tangible storage device.
[00122] While various examples and embodiments have been disclosed, other examples and embodiments will be apparent to those skilled in the art. The various disclosed examples and embodiments are for purposes of illustration and are not intended to be limiting, with the true scope being indicated by the following claims.
Claims
1. A computer-implemented method, comprising: receiving, by an image capturing device, a facial image of one or more facial attributes of a user; providing the facial image of the one or more facial attributes to a trained gaze detection model; applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device; and upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
2. The computer-implemented method of claim 1, wherein the facilitating of the image capture comprises enabling an automatic capture of the image of the user by the image capturing device.
3. The computer-implemented method of claim 1, wherein the facilitating of the image capture comprises displaying, by the display component, a visual cue to facilitate the image capture.
4. The computer-implemented method of claim 3, wherein the displaying of the visual cue comprises displaying a visible countdown timer near the lens indicating an amount of time left for an automatic image capture of the image.
5. The computer-implemented method of claim 4, further comprising: determining an expiry of the visible countdown timer near the lens; and enabling an automatic capture of the image of the user by the image capturing device.
6. The computer-implemented method of claim 4, further comprising: determining, by the trained gaze detection model, that the gaze of the user is not directed
at the lens prior to an expiry of the visible countdown timer near the lens; and disabling an automatic capture of the image of the user by the image capturing device.
7. The computer-implemented method of claim 4, wherein the visible countdown timer near the lens is an animated circular countdown timer around the lens.
8. The computer-implemented method of claim 3, wherein the displaying of the visual cue comprises displaying a modified view of a shutter button on the display component to enable the user to capture the image.
9. The computer-implemented method of claim 8, further comprising: detecting a user interaction with the modified view of the shutter button; and capturing the image of the user by the image capturing device.
10. The computer-implemented method of claim 8, wherein the modified view of the shutter button is an enlarged view of the shutter button.
11. The computer-implemented method of claim 8, wherein the modified view of the shutter button comprises a virtual layer overlaid over the display component, wherein a user interaction with the virtual layer is indicative of a user indication to capture the image.
12. The computer-implemented method of claim 8, wherein the modified view of the shutter button comprises a repositioning of the shutter button on the display component.
13. The computer-implemented method of claim 1, wherein the determination that the gaze of the user is directed at the lens further comprises: determining whether the gaze of the user is directed at the lens for a time that exceeds a gaze threshold time.
14. The computer-implemented method of claim 13, wherein the facilitating of the image capture is performed upon determining that the gaze of the user is directed at the lens for the time that exceeds the gaze threshold time.
15. The computer-implemented method of claim 13, wherein the facilitating of the
image capture is not performed upon determining that the gaze of the user is not directed at the lens for the time that exceeds the gaze threshold time.
16. The computer-implemented method of claim 1, further comprising: in response to the providing of the facial image of the one or more facial attributes to the trained gaze detection model, receiving, from the trained gaze detection model, a location of a gaze of the user, and wherein the determining of whether the gaze of the user is directed at the lens comprises determining, based on the location of the gaze of the user, whether the user is looking in a vicinity of the lens.
17. The computer-implemented method of claim 1, further comprising: in response to the providing of the facial image of the one or more facial attributes to the trained gaze detection model, receiving, from the trained gaze detection model, a confidence measure indicating a likelihood that the gaze of the user is directed at the lens, and wherein the determining of whether the gaze of the user is directed at the lens comprises determining whether the confidence measure exceeds a threshold confidence measure.
18. The computer-implemented method of claim 1, wherein the trained gaze detection model is a convolutional neural network.
19. The computer-implemented method of claim 1, further comprising: calibrating the lens to align with a gaze direction of the user, and wherein the determination that the gaze of the user is directed at the lens is based on the calibrated lens.
20. The computer-implemented method of claim 19, the calibrating of the lens further comprising: providing a virtual point located on the display component; directing the user to look at the virtual point; and determining, by the trained gaze detection model, a direction of the user gaze with reference to a location of the virtual point, and wherein the calibrating of the lens comprises a correction of an offset between the direction of the user gaze and the location of the virtual point.
21. The computer-implemented method of claim 1, wherein the image capturing device is a component of a mobile device.
22. The computer-implemented method of claim 21, wherein the lens of the image capturing device is situated at a same side of the mobile device as the display component.
23. The computer-implemented method of claim 21, wherein the lens of the image capturing device is situated at a side of the mobile device opposite to the display component.
24. The computer-implemented method of claim 21, wherein the mobile device is a foldable mobile device comprising first and second panels, wherein the image capturing device is located on the first panel, and wherein the display component is located on the second panel.
25. The computer-implemented method of claim 1, wherein the one or more facial attributes are pupil attributes.
26. A computing device, comprising: one or more processors; and data storage, wherein the data storage has stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing device to carry out functions comprising: receiving, by an image capturing device, a facial image of one or more facial attributes of a user; providing the facial image of the one or more facial attributes to a trained gaze detection model; applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device; and upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
27. An article of manufacture comprising one or more computer readable media
having computer-readable instructions stored thereon that, when executed by one or more processors of a computing device, cause the computing device to carry out functions comprising: receiving, by an image capturing device, a facial image of one or more facial attributes of a user; providing the facial image of the one or more facial attributes to a trained gaze detection model; applying the trained gaze detection model to determine whether a gaze of the user is directed at a lens of the image capturing device; and upon a determination that the gaze of the user is directed at the lens, facilitating, by a display component, an image capture of an image of the user such that the gaze of the user remains directed at the lens during the image capture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2023/032458 WO2025058615A1 (en) | 2023-09-12 | 2023-09-12 | Methods and systems for a gaze-based image capture mode |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2025058615A1 true WO2025058615A1 (en) | 2025-03-20 |
Family
ID=88287442
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/032458 WO2025058615A1 (en) | 2023-09-12 | 2023-09-12 | Methods and systems for a gaze-based image capture mode |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2025058615A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9344673B1 (en) * | 2014-03-14 | 2016-05-17 | Brian K. Buchheit | Enhancing a camera oriented user interface via an eye focus guide |
US20160323503A1 (en) * | 2014-01-30 | 2016-11-03 | Sharp Kabushiki Kaisha | Electronic device |
US20210303968A1 (en) * | 2018-10-08 | 2021-09-30 | Google Llc | Systems and Methods for Providing Feedback for Artificial Intelligence-Based Image Capture Devices |
US20220300072A1 (en) * | 2021-03-19 | 2022-09-22 | Nvidia Corporation | Personalized calibration functions for user gaze detection in autonomous driving applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23783969; Country of ref document: EP; Kind code of ref document: A1 |