CN120917404A - Methods, computer programs and devices for adjusting services consumed by humans, robots, vehicles, entertainment systems and smart home systems - Google Patents
Methods, computer programs and devices for adjusting services consumed by humans, robots, vehicles, entertainment systems and smart home systems
- Publication number
- CN120917404A (application number CN202480018565.4A)
- Authority
- CN
- China
- Prior art keywords
- human
- service
- event data
- adjusting
- optical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/01—Indexing scheme relating to G06F3/01
- G06F2203/011—Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Toys (AREA)
Abstract
Examples relate to methods, computer programs and devices for adjusting services consumed by humans, robots, vehicles, entertainment systems and smart home systems. A method for adjusting a service consumed by a human includes capturing optical event data of the human using an event-based sensor and analyzing the optical event data to detect a predefined emotional pattern. The method further includes determining information about an emotional state of the human based on the emotional pattern detected in the optical event data and adjusting the service based on the information about the emotional state of the human.
Description
Technical Field
Examples relate to methods, computer programs and devices for adjusting services consumed by humans, robots, vehicles, entertainment systems and smart home systems, and more particularly, but not exclusively, to concepts for adjusting services consumed by humans based on their emotional states.
Background
Conventionally, human emotion recognition is a challenging task because it is difficult to reveal a person's internal emotions. However, with facial expressions and body language, human emotion analysis becomes possible. For example, in daily communication, human emotions are mainly expressed through facial expressions. Facial expressions are complex in that they are based on a variety of factors and contributors. However, known models map certain human emotions to certain facial expressions. For example, it is well known that laughter or a smile is a happy expression, and that when a person laughs or smiles, the corners of their mouth are raised.
Disclosure of Invention
An example is based on the finding that micro-actions in a human face can be captured using an event-based optical sensor. Based on the optical event data, which includes information about the micro-actions, an emotional state of the human may be detected based on a predefined emotional pattern. The service may then be adapted or adjusted based on the emotional state of the human.
An example provides a method for adjusting a service consumed by a human. The method includes capturing optical event data of a human using an event-based sensor and analyzing the optical event data to detect a predefined emotional pattern. The method further includes determining information about an emotional state of the human based on the emotional pattern detected in the optical event data and adjusting the service based on the information about the emotional state of the human. The event-based optical data may have a higher frame rate or temporal resolution than conventional video data, so that shorter actions may be detected and considered in the determination of an emotional state.
For example, the service is a game, a movie, a smart home service, or a metaverse. As will be shown in the following description, there is a variety of services that may benefit from adjustment based on the emotional state of a human. Further, capturing may include capturing optical event data from a plurality of event-based sensors. Using event data from multiple sensors can improve the effective resolution of the data and enable more reliable emotion pattern detection. In some examples, capturing may include capturing optical event data in a three-dimensional (3D) data volume including a time component. The use of 3D data over time may thus enable a more detailed analysis of the predefined pattern.
Depending on the particular implementation in the example, the optical event data may include pixel-level polarity data regarding brightness, color, or contrast changes. Sampling only the changes in the optical quantity allows for a reduced data rate while still maintaining a high frame rate or sampling rate. Transmitting only differential data (e.g., changes in polarity, brightness, and/or color) allows for a reduction in the amount of data transmitted. For example, the optical event data has a sampling rate of at least 500 frames per second. Thereby, even very short optical changes can be tracked. Analysis may include determining micro-actions in the optical event data to detect a predefined emotional pattern. For example, a micro-action relates to one or more elements of the group of a human mouth, eyes, eyebrows, eyelids, lips, lower jaw, skin, cheeks and nose. Examples may enable tracking or acquisition of optical event data for micro-actions, allowing a detailed analysis of human facial expressions, for example. Thus, the analysis may include analyzing the facial expression of the human to detect the emotional state. Since facial expression may not be the only indicator of emotional state, the analysis may also include analyzing a body posture or pose of the human to detect the emotional state.
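The pixel-level polarity data described above can be illustrated with a short sketch. This is a hypothetical representation for illustration only: each event is assumed to be an (x, y, timestamp, polarity) tuple, which is a common way to model event-based sensor output; the function and variable names are not from the patent.

```python
# Hypothetical sketch of event-based optical data: each event is a
# (x, y, timestamp_us, polarity) tuple, where polarity +1/-1 encodes a
# pixel-level brightness increase or decrease. Names are illustrative.

def events_in_window(events, t_start_us, t_end_us):
    """Select events falling into a time window; at a 500 Hz-equivalent
    sampling rate a window of 2000 us corresponds to one 'frame'."""
    return [e for e in events if t_start_us <= e[2] < t_end_us]

def event_rate(events, window_us):
    """Events per second over the given window duration."""
    return len(events) / (window_us * 1e-6)

events = [
    (10, 12, 100, +1),   # brightness increased at pixel (10, 12)
    (11, 12, 450, +1),
    (10, 13, 1900, -1),  # brightness decreased
    (40, 40, 2500, +1),  # outside the first 2000 us window
]
frame = events_in_window(events, 0, 2000)
print(len(frame))               # 3 events in the first 2 ms window
print(event_rate(frame, 2000))  # 1500.0 events per second
```

The sketch shows why only differential data needs to be transmitted: static pixels produce no tuples at all, so the data volume scales with scene activity rather than with resolution times frame rate.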
As generally outlined above, in an example, any conceivable service may be adjusted based on the emotional state. Adjustments to the service may include one or more elements from the group of increasing or decreasing the agility of a game character, increasing or decreasing the audio volume, increasing or decreasing the speed of speech or audio content, changing the style of music, adjusting the room temperature, adjusting the brightness and/or color of lights, providing special effects in the service, adjusting the avatar of a human, and/or introducing other people to the human. The service may be provided in any environment, examples being a smart home, a car, virtual or augmented reality, entertainment, etc.
Examples also provide a computer program with a program code for performing the method as described herein when the computer program is executed in a computer, a processor or a programmable hardware component. Another example is an apparatus for adjusting a service consumed by a human, the apparatus comprising circuitry configured to perform any of the methods described herein. For example, the device also includes a plurality of event-based optical sensors having different fields of view for the user.
Other examples are a robot comprising the device, a vehicle comprising the device, an entertainment system comprising the device and/or a smart home system comprising the device.
Drawings
Some examples of devices and/or methods will be described below, by way of example only, and with reference to the accompanying drawings, in which:
FIG. 1 shows a block diagram of a flowchart of an example of a method for adjusting a service consumed by a human.
FIG. 2 shows a block diagram of an example of a device for adjusting services consumed by humans;
FIG. 3 shows an example of a discrete emotion model;
FIG. 4 illustrates an example of a dimensional emotion model;
FIG. 5 depicts an example of a facial expression;
FIG. 6 shows a block diagram of a method for detecting an emotional state of a human in an example;
FIG. 7 shows a block diagram of another example of a method for adjusting a service for a human based on an emotional state of the human;
FIG. 8 illustrates emotional state detection based on a predefined pattern;
FIG. 9 shows a block diagram of another example of a method for adjusting a service (game/movie or metaverse) for a human based on the emotional state of the human;
FIG. 10 shows a block diagram of the example of FIG. 9 with additional information;
FIG. 11 shows a block diagram of another example of a method for adjusting a robotic service for a human based on an emotional state of the human, and
Fig. 12 shows a block diagram of the example method of fig. 11 with additional information.
Detailed Description
Some examples will now be described in more detail with reference to the accompanying drawings. However, other possible examples are not limited to the features of these detailed examples. Other examples may include modifications to the features, equivalents and alternatives to the features. Furthermore, the terminology used herein to describe certain examples should not be limiting of other possible examples.
Throughout the description of the figures, the same or similar reference numerals refer to the same or similar elements and/or features, which may be the same or implemented in modified form while providing the same or similar functionality. The thickness of lines, layers and/or regions in the figures may also be exaggerated for clarity.
When two elements A and B are combined using "or", it is to be understood that all possible combinations are disclosed, i.e., only A, only B, and A and B, unless explicitly defined otherwise in the specific case. As alternative expressions for the same combinations, "at least one of A and B" or "A and/or B" may be used. The same applies to combinations of more than two elements.
If singular forms such as "a", "an", and "the" are used and the use of only a single element is not explicitly or implicitly defined as mandatory, other examples may also use several elements to perform the same function. If a function is described below as being implemented using multiple elements, other examples may implement the same function using a single element or a single processing entity. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used, specify the presence of the stated features, integers, steps, operations, elements, components, and/or groups thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, processes, elements, components, and/or groups thereof.
Fig. 1 shows a block diagram of a flowchart of an example of a method 10 for adjusting a service consumed by a human. The method 10 includes capturing 12 optical event data of a human using an event-based sensor and analyzing 14 the optical event data to detect a predefined emotional pattern. Method 10 further includes determining 16 information about the emotional state of the human based on the emotional pattern detected in the optical event data and adjusting 18 the service based on the information about the emotional state of the human.
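The four steps 12, 14, 16, and 18 of method 10 can be sketched as a minimal pipeline. This is an illustrative sketch only: the stub detection rule (upward mouth-corner motion mapped to a "smile" pattern) and all names are assumptions for demonstration, not the patented algorithm.

```python
# Minimal sketch of the four steps of method 10; the detection logic is a
# stand-in (a real system would analyze optical event data).

def capture_events(sensor):
    return sensor()                      # step 12: capture optical event data

def detect_pattern(events):
    # step 14: detect a predefined emotional pattern (stub rule:
    # upward mouth-corner motion -> "smile" pattern)
    return "smile" if events.get("mouth_corner_dy", 0) > 0 else None

def emotional_state(pattern):
    # step 16: map a detected pattern to an emotional state
    return {"smile": "happy"}.get(pattern, "neutral")

def adjust_service(state, service):
    # step 18: adjust the service based on the emotional state
    if state == "happy":
        service["music_volume"] += 1
    return service

service = {"music_volume": 5}
events = capture_events(lambda: {"mouth_corner_dy": 2})
state = emotional_state(detect_pattern(events))
print(state, adjust_service(state, service))  # happy {'music_volume': 6}
```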
Fig. 2 shows a block diagram of an example of a device 20 for adjusting services consumed by humans. The apparatus 20 includes circuitry 22 configured to perform the method 10 or any of the methods described herein. Fig. 2 also shows an optional entity 200 comprising the device 20. For example, the entity 200 may be a robot (industrial or home), a vehicle (car, bus, truck, train, airplane, etc.), an entertainment system (home, on-board, etc.), or a smart home system (light control, temperature control, etc.).
Circuitry 22 may be implemented using one or more processing units, one or more processing devices, or any means for processing, such as a processor, a computer, or programmable hardware components operable with correspondingly adapted software. In other words, the described functionality of circuitry 22 may also be implemented in software, which is then executed on one or more programmable hardware components. Such hardware components may include general-purpose processors, Digital Signal Processors (DSPs), Graphics Processing Units (GPUs), microcontrollers, etc. Other examples are devices or entities 200 that include examples of apparatus 20, such as automated systems, e.g., for entertainment, gaming, exercise, or vehicles.
As also shown in fig. 2, the device 20 may optionally (shown in phantom) include one or more interfaces 24 coupled to the circuitry 22. The one or more interfaces 24 may correspond to one or more inputs and/or outputs for receiving and/or transmitting information within a module, between modules, or between modules of different entities, where the information may be transmitted as digital (bit) values according to a specified code. For example, interface 24 may include interface circuitry configured to receive and/or transmit information. In an example, interface 24 may correspond to any means for obtaining, receiving, transmitting, or providing analog or digital signals or information, such as any connector, contact, pin, register, input port, output port, conductor, channel, etc., that allows a signal or information to be provided or obtained. The interface 24 may be configured to communicate (transmit, receive, or both) in a wireless or wired manner, and it may be configured to communicate with other internal or external components, i.e., to transmit and/or receive signals or information. The one or more interfaces 24 may include further components to enable communication in a (mobile) communication system, such as transceiver (transmitter and/or receiver) components, e.g., one or more Low Noise Amplifiers (LNAs), one or more Power Amplifiers (PAs), one or more diplexers, one or more duplex filters, one or more filters or filtering circuitry, one or more converters, one or more mixers, correspondingly adapted radio frequency components, etc. The interface 24 may be used to provide service adjustment information to devices that provide services to a user or human.
Human emotion recognition is a challenging task because it is difficult to reveal a person's internal emotions. However, with facial expressions and body language, human emotion analysis becomes possible. Studies reveal that in daily communication, human emotion is mainly (55%) expressed using facial expressions, see A. Mehrabian, Communicating Without Words, Psychology Today (1968) 53-55. A normal camera may not be sufficient to capture facial micro-expressions, which may be achieved using event-based vision sensors (EVS). EVS sensors can detect abrupt changes in expression, such as eyebrow movements, swallowing, pulsing blood vessels, or abrupt eye movements; for example, when people lie they tend to look downward, and when they recall they tend to look upward. Due to the limited speed and contrast sensitivity of conventional cameras, such changes and movements of the human face are difficult for these cameras to detect. Furthermore, frame-based data may have high redundancy in the data acquired for emotion analysis. Image-level camera sensing may cause large data storage and processing delays. Conventional cameras may have difficulty capturing facial micro-expressions that occur in a short amount of time, and using high-speed cameras may result in high costs.
Examples utilize one or more event sensors. An event camera is an imaging sensor that responds to local changes in, for example, brightness. Event cameras do not use a shutter to capture images as conventional (frame) cameras do. Instead, each pixel inside the event camera operates independently and asynchronously, reporting changes in brightness as they occur and otherwise remaining silent. Event cameras are considered a next-generation technology for computer vision tasks due to their high speed, high dynamic range, and data acquisition efficiency. Event cameras are used to capture actions in a scene; static objects do not activate event sensors.
In other examples, device 20 may also include one or more (e.g., multiple) event-based optical sensors, which may have different fields of view for the user. The one or more sensors may be used to capture 12 human optical event data. For example, the sensor may comprise a pixel-level sensor that operates independently and asynchronously. The pixel sensor may be activated upon detection of a change, i.e. if the optical characteristics captured by the pixel change, for example, when an action or event occurs. The optical event data may include pixel level polarity data regarding brightness, color, and/or contrast changes.
This may allow for lower latency, lower power consumption, and lower data processing requirements than a frame-based sensor that captures full-frame information in each sampling step. Event-based sensors may have a higher dynamic range than frame-based sensors, even when the latter use high-speed vision (high frame rates). For example, the optical event data has a sampling rate of at least 500 frames per second. Using event-based sensors, events may be recorded that would otherwise require a conventional camera running at 10,000 images/second and higher. Event-based sensors may allow for a latency of 40 to 200 μs at high bandwidth, e.g., 60 Meps (mega events per second) and above. Furthermore, even 3D capture can be achieved with event-based sensors. Capturing 12 may include capturing optical event data in a 3D data volume including a time component, which may form a data basis for very efficient and accurate motion detection.
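The 3D data volume with a time component mentioned above can be illustrated by discretizing the asynchronous event stream into time bins, a common event-camera representation. The grid sizes and bin width below are illustrative assumptions, not values from the patent.

```python
# Sketch: accumulate asynchronous events into a 3D (x, y, t) volume by
# discretizing time into bins. events: iterable of (x, y, t_us, polarity).

def to_volume(events, width, height, t_bins, t_max_us):
    """Returns a nested list volume[t][y][x] holding signed event counts."""
    vol = [[[0] * width for _ in range(height)] for _ in range(t_bins)]
    for x, y, t, p in events:
        b = min(int(t * t_bins / t_max_us), t_bins - 1)
        vol[b][y][x] += p
    return vol

events = [(1, 0, 100, +1), (1, 0, 900, +1), (0, 1, 1600, -1)]
vol = to_volume(events, width=2, height=2, t_bins=2, t_max_us=2000)
print(vol[0][0][1])  # two positive events at pixel (1, 0) in the first bin -> 2
print(vol[1][1][0])  # one negative event at pixel (0, 1) in the second bin -> -1
```

Because the time axis is kept, motion direction and duration (e.g., how fast a mouth corner moves) can be read directly from the volume rather than reconstructed from frame differences.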
Real-time or near real-time adjustment (algorithmic adjustment within 1 ms to 10 ms) can be achieved with emotion recognition and emotion analysis based on event-based vision sensing of the face and/or body. For example, a high-speed camera may operate at 100 frames/second, while an event-based sensor may operate at 1,000 frames/second (events/second) and more.
In an example, the predefined emotional pattern may be mapped to an emotional state of the human using an emotion model. Fig. 3 shows an example of a discrete emotion model. Two discrete emotion models for affective computing are shown. The left side (a) of Fig. 3 shows six basic emotions (anger, disgust, fear, happiness, sadness, and surprise) depicted as emoticons. The right side (b) of Fig. 3 shows a higher-level emotion wheel model that provides a finer differentiation between emotions. Discrete emotion models sort emotions into a limited set of classes. Two widely used models are Ekman's six basic emotions (anger, disgust, fear, happiness, sadness, surprise), see P. Ekman, Universals and cultural differences in facial expressions of emotion, Nebr. Symp. Motiv. 19 (1971) 207-283, and Plutchik's emotion wheel model, see R. Plutchik, Emotions and Life: Perspectives from Psychology, Biology, and Evolution, Am. Psychol. Assoc. (2003), both shown in Fig. 3.
The development of Ekman's basic emotion model is based on the assumption that human emotions are shared across groups and cultures, see Y. Wang, W. Song, W. Tao, et al., A systematic review on affective computing: emotion models, databases, and recent advances, Information Fusion, 2022. From the emoticons, differences in the facial expressions for different emotions can be identified.
Unlike Ekman's basic emotion model, Plutchik's emotion wheel model involves eight emotions (joy, trust, fear, surprise, sadness, anticipation, anger, and disgust) and the relationships between one emotion and another.
Fig. 4 shows an example of a dimensional emotion model. Continuous multidimensional models were introduced to describe fine-grained emotions. As shown in Fig. 4, the two most accepted models are the Pleasure-Arousal-Dominance (PAD) model (Fig. 4, left (A)) and the Valence-Arousal (V-A) model (Fig. 4, right (B)). The PAD model may represent a large portion of different emotions, and the V-A model is used to represent complex emotions.
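A dimensional model such as Valence-Arousal can be bridged to discrete emotion labels by nearest-prototype classification. The sketch below is an illustration under assumed prototype coordinates; the values are not taken from the model literature or the patent.

```python
# Sketch: map a point in the two-dimensional Valence-Arousal space onto a
# discrete emotion by nearest prototype. Prototype coordinates are
# illustrative assumptions in the range [-1, 1].
import math

PROTOTYPES = {          # (valence, arousal)
    "happy":   (0.8, 0.5),
    "angry":   (-0.6, 0.7),
    "sad":     (-0.7, -0.4),
    "relaxed": (0.6, -0.5),
}

def classify_va(valence, arousal):
    """Return the discrete emotion whose prototype is closest."""
    return min(PROTOTYPES,
               key=lambda e: math.dist((valence, arousal), PROTOTYPES[e]))

print(classify_va(0.7, 0.4))    # near the "happy" prototype
print(classify_va(-0.5, -0.5))  # near the "sad" prototype
```

This kind of mapping makes the continuous model usable by the table-driven service adjustments described later, which are keyed on discrete states.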
Examples may consider facial expressions and emotions. With different actions and appearances of our faces, humans communicate their emotions and feelings to others; see, for example, https://www.scienceofpeople.com/microexpressions/.
Fig. 5 depicts an example of a facial expression. Fig. 5 depicts facial expressions for a pair of females and males that are surprised 52 (at the top), sad 54 (in the middle), and happy 56 (at the bottom).
For the facial expression of surprise 52, the following facial actions (emotional patterns) help to express the feeling of surprise:
The eyebrows are raised and curved,
The skin under the eyebrows is stretched,
Horizontal wrinkles appear across the forehead,
The eyelids open, with the whites of the eyes showing above and below, and
The jaw drops and the teeth part, but the mouth is not tense or stretched.
For the facial expression of sadness 54, the following facial actions (emotional patterns) help to express the feeling of sadness:
The inner corners of the eyebrows are pulled inward and upward,
The skin under the eyebrows forms a triangle, with the inner corner up,
The corners of the mouth are pulled down,
The jaw comes up, and
The lower lip pouts upward.
For the facial expression of happiness 56, the following facial actions (emotional patterns) contribute to expressing the feeling of happiness:
The corners of the mouth are pulled back and up,
The mouth may or may not be open, exposing the teeth,
Wrinkles run from the outside of the nose to the outside of the lips,
The cheeks are raised,
The lower eyelids may show wrinkles or be tense, and
Crow's feet appear near the outside of the eyes.
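The facial action lists above can be encoded as predefined patterns and matched against a set of observed actions. The sketch below uses illustrative shorthand action names (they abbreviate the actions described in the text, not any standard coding scheme) and a simple overlap ratio as confidence.

```python
# Sketch: represent the facial action lists as sets and match an observed
# set of facial actions against predefined emotional patterns.

PATTERNS = {
    "surprise":  {"brows_raised", "forehead_wrinkles", "eyelids_open", "jaw_drop"},
    "sadness":   {"inner_brows_up", "mouth_corners_down", "lower_lip_up"},
    "happiness": {"mouth_corners_up", "cheeks_raised", "crows_feet"},
}

def best_match(observed_actions):
    """Return the emotion whose pattern overlaps most with the observed
    actions, with the overlap ratio as a confidence value."""
    def score(emotion):
        pattern = PATTERNS[emotion]
        return len(pattern & observed_actions) / len(pattern)
    best = max(PATTERNS, key=score)
    return best, score(best)

emotion, confidence = best_match({"mouth_corners_up", "cheeks_raised"})
print(emotion, round(confidence, 2))  # happiness 0.67
```

The overlap ratio corresponds to the confidence/ranking information shown later in the bar graph of Fig. 8.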
As part of the analyzing 14 step described above, examples may determine micro-actions in the optical event data to detect a predefined emotional pattern. Micro-actions may relate to one or more elements of the group of a human mouth, eyes, eyebrows, eyelids, lips, lower jaw, skin, cheeks and nose. As outlined above and shown by the facial expressions in Fig. 5, the analysis 14 may include analyzing the facial expressions of the human to detect emotional states. In other examples, the analysis 14 may additionally include analyzing a body posture or gesture of the human to detect an emotional state.
As mentioned above, facial expressions can be used as reliable features for analyzing human emotions, as humans tend to make similar facial expressions when they feel similar emotions. Thus, as the human face changes, the person's emotion can be analyzed according to typical facial expression patterns. The actions and changes on the face, in particular facial micro-expressions, can be captured in an efficient manner using an event camera. Compared to conventional RGB (red, green, blue) cameras, event cameras have a higher frame rate for capturing micro-expressions, which may last as little as 1/25 to 1/15 of a second.
Fig. 6 shows a block diagram of a method for detecting an emotional state of a human in an example. Fig. 6 shows a diagram of a process for facial emotion detection with an event camera. Fig. 6 shows a face 61 recorded using an event camera 62. The optical event data from the event sensor 62 is then analyzed, and events with facial feature points are extracted 63. In the face data 64, a comparison of an old event 64a with a new event 64b (the mouth starting to smile) is depicted. In step 65, facial actions are identified (an emotional pattern is detected). In the emotion analysis 66, it is determined that the mouth corners pull back and up, which is an emotional pattern indicating the emotional state "happy" 67. With EVS, sudden changes in expression, eyebrow movements, pulsating blood vessels, or sudden eye movements can be detected (people tend to look down when they lie and up when they recall).
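The comparison of an old event with a new event at the mouth corner (steps 64-66) can be sketched as tracking one feature point between two event frames and classifying its vertical motion. The coordinates and threshold below are illustrative; image coordinates are assumed, with y growing downward.

```python
# Sketch of the FIG. 6 comparison: classify the vertical motion of a mouth
# corner feature point between an old and a new event observation.

def mouth_corner_motion(old_xy, new_xy, threshold=1.0):
    dy = new_xy[1] - old_xy[1]    # image convention: y grows downward
    if dy < -threshold:
        return "up"               # corner moved up -> smile onset
    if dy > threshold:
        return "down"             # corner moved down -> sadness indicator
    return "still"

old_corner = (120.0, 200.0)
new_corner = (122.0, 195.0)       # moved 5 px up in image coordinates
print(mouth_corner_motion(old_corner, new_corner))  # up
```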
In other examples, event cameras or sensors may also be used for human emotion recognition based on body gestures. In addition to facial expressions, the posture of the body also indicates different emotions of a person. Human actions may also be recorded with the event camera to aid emotion recognition. Stress movements like restlessness, twitches, or cramps may be detected to indicate discomfort.
Regarding exemplary potential application areas, first, emotion recognition may be used as an auxiliary monitor for entertainment systems or interactive games. Second, a driving assistance monitor may also benefit from driver emotion recognition to avoid fatigued driving and to recognize extreme emotions (anger, sadness, etc.) or signs of illness (sadness, pain, etc.) in order to avoid traffic accidents and improve driving safety. Furthermore, human emotion recognition may facilitate human-machine interaction: if a robot (e.g., a service robot) can understand the emotion of its owner, it may take action accordingly, which may increase user satisfaction with the robot. In a general example, the adjustment 18 to the service includes one or more elements from the group of increasing or decreasing the agility of a game character, increasing or decreasing the audio volume, increasing or decreasing the speed of speech or audio content, changing the style of music, adjusting the room temperature, adjusting the brightness and/or color of lights, providing special effects in the service, adjusting the avatar of a human, or introducing others to the human.
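The adjustment types listed above lend themselves to a table-driven mapping from detected emotional state to concrete setting changes. The sketch below is illustrative; the states, setting names, and deltas are assumptions, not values from the patent.

```python
# Sketch: table-driven service adjustment keyed on the detected emotional
# state, covering some of the adjustment types listed in the text.

ADJUSTMENTS = {
    "bored": {"character_agility": +2, "audio_volume": +1},
    "angry": {"music_style": "calm", "light_brightness": -1},
    "sad":   {"speech_speed": -1, "music_style": "soothing"},
    "happy": {},   # no change needed
}

def adjust(settings, state):
    """Return a new settings dict with the adjustments for `state` applied;
    numeric values are treated as deltas, strings as replacements."""
    updated = dict(settings)
    for key, delta in ADJUSTMENTS.get(state, {}).items():
        if isinstance(delta, (int, float)):
            updated[key] = updated.get(key, 0) + delta
        else:
            updated[key] = delta
    return updated

settings = {"character_agility": 3, "audio_volume": 5}
print(adjust(settings, "bored"))  # agility raised to 5, volume to 6
```

A table of this shape is also a natural place to store the weighted emotion matrix mentioned below.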
If the entire face of a person moves, the optical flow of most events can be detected and subtracted from the scene. In this way, only the parts of the face that move differently (for example the eyebrows) are segmented and analyzed. In examples, emotion recognition may be improved by capturing 12 human facial micro-expressions. Examples may provide affordable and accurate analysis methods for emotion recognition. Examples may be implemented with simple devices (event sensors) and may not require high computational costs. Examples may be incorporated into many areas to improve user satisfaction.
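The optical-flow subtraction idea above can be sketched as follows: estimate the dominant (global) motion as the median per-event flow and keep only events whose flow deviates from it, such as an eyebrow moving differently from the whole head. The per-event flow vectors are assumed to be given; a real system would estimate them from the event stream.

```python
# Sketch: segment locally moving face parts by subtracting the dominant
# (median) optical flow. events_with_flow: list of ((x, y), (fx, fy)).

def median(values):
    s = sorted(values)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def segment_local_motion(events_with_flow, tol=1.0):
    """Return event positions whose flow differs from the median (global)
    flow by more than tol in either component."""
    gx = median([f[0] for _, f in events_with_flow])
    gy = median([f[1] for _, f in events_with_flow])
    return [xy for xy, (fx, fy) in events_with_flow
            if abs(fx - gx) > tol or abs(fy - gy) > tol]

events = [((i, 0), (2.0, 0.0)) for i in range(9)]  # head moving right
events.append(((5, 3), (2.0, -4.0)))               # eyebrow also moving up
print(segment_local_motion(events))  # only the eyebrow event remains
```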
Fig. 7 shows a block diagram of another example of a method for adjusting a service for a human based on an emotional state of the human. Fig. 7 shows a user 71 whose optical event data is captured and whose optical events are detected and analyzed 72. Based on the determined emotional state of the user, real-time parameter adjustment 73 is performed and a refreshed stream 74 is provided to the user 71 on the user device. Fig. 7 shows a control loop for parameter adjustment on a user device based on the emotional state of the user. Event cameras are used in the emotion monitoring loop to allow decisions and adjustments to be made in real time (or near real time). To provide a better user experience, changes or adjustments to the device settings should be imperceptible to the user. In an example, the adjustment may be faster than with a conventional imaging sensor that captures data frame by frame with an exposure time. With conventional data capture and transmission, delays may be incurred in refreshing the device according to the new settings. Event sensors, however, provide a data stream and can record data continuously, and the burden of transmitting data can be much lower compared to full-image transmission. Furthermore, the event camera may capture a 3D data volume containing temporal information, which may help to better predict the facial emotion of the user. The EVS may allow a refresh within a (nearly) imperceptible duration.
If users observe that the adjustment of the device (game, movie, music, robot) slows down or pauses, they may become annoyed by the discontinuity of the stream, which may decrease their satisfaction.
The following table shows parameter adjustments in examples.
The following table shows an example of parameter adjustment in a weighted emotion matrix.
In an example, the face region may be detected by detecting a blink. Additionally or alternatively, event-based face detection may be performed, for example, using an enhanced kernel correlation filter for event-based face detection.
Fig. 8 illustrates emotional state detection based on a predefined pattern. Fig. 8 shows the detected face at the top. After detecting the emotional pattern in the face, confidence or ranking information for the different emotional states of the user may be determined. This is shown in the bar graph, in which "happy" has the highest confidence/ranking. Another face is shown at the bottom of Fig. 8, where relevant keypoints are shown along the eyebrows, eyes, nose, and mouth. For example, a distance measurement between consecutive keypoints marking a mouth corner may indicate whether the mouth corner is moving up (smiling) or down (sad).
For example, using the rate of positive events in the relevant area, a level of happiness (confidence) can be obtained by evaluating the percentage of positive-polarity events in the mouth and eye regions. Determining the distances of key points in the face may simplify the detection of facial landmark deformation or facial pose alignment.
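The confidence measure just described can be sketched as the fraction of positive-polarity events falling inside the mouth and eye regions. The region boxes and event coordinates below are illustrative assumptions.

```python
# Sketch: happiness level as the positive-event fraction inside the mouth
# and eye regions, given events as (x, y, polarity) tuples.

def in_box(x, y, box):
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def happiness_level(events, regions):
    """Fraction of positive-polarity events among all events in regions."""
    relevant = [p for x, y, p in events
                if any(in_box(x, y, b) for b in regions)]
    if not relevant:
        return 0.0
    return sum(1 for p in relevant if p > 0) / len(relevant)

mouth = (40, 70, 60, 80)   # illustrative axis-aligned boxes (x0, y0, x1, y1)
eyes = (35, 30, 65, 40)
events = [(45, 75, +1), (50, 76, +1), (55, 74, -1), (38, 35, +1),
          (90, 90, -1)]    # last event lies outside both regions
print(happiness_level(events, [mouth, eyes]))  # 0.75
```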
Fig. 9 shows a block diagram of another example of a method of adjusting a service (game/movie or meta universe) for a human based on an emotional state of the human. The service may be a virtual reality movie or game, and the user 98 may wear a Virtual Reality (VR) device 99 (e.g., glasses) having a plurality of event-based sensors 99a, 99b, 99c, each having a field of view. There may be more than the three sensors shown in fig. 9. For example, one sensor may be positioned at a corner of the glasses and used to detect motion at the mouth area. Another sensor may be positioned on the frame near the ear and used to collect motion at the user's cheek. Another sensor may be positioned toward the forehead and yet another sensor may be positioned toward the eye.
Fig. 9 shows the activation of the service 91 at the top, after which the VR product 99 is started or added to the service/metaverse 92. This example method detects 93 boredom on the face of the user 98, consistent with the description above. In a game/movie scenario, the agility of the game character may be increased 94 or the volume of the music may be increased 94 based on the boredom level. The user 98 may then feel inspired and feel better 95. Steps 93, 94 and 95 run in a feedback loop for real-time updating or adjustment. In the case where the service is a metaverse, special effects may be provided on the avatar or new friends 96 may be introduced. The user 98 may then show interest and become happier 97. Steps 93, 96 and 97 also run in a feedback loop for real-time updating or adjustment.
Fig. 10 shows a block diagram of the example of Fig. 9 with additional information. In Fig. 10, it is further indicated that a game developer may collect live, real-time feedback on recently introduced features, and that the game may be personalized or customized, for example with background music adjustments or game scene adjustments (e.g., on the beach or in the mountains). For movie scenarios, multiple EVSs may be used to develop a system for gauging engagement levels (e.g., in terms of fear, laughter, boredom, etc.) to generate automatic movie ratings. Furthermore, in some examples, avatars in the metaverse may be optimized in real time, for example at a similar speech rate drawn from a per-user database. With the loop shown in Fig. 9, the avatar can be continuously refreshed, and smoother actions such as smiling or frowning can be rendered according to the user in the real world. The advantage in this example is real-time information collection, rather than the traditional sequential steps of data collection, processing and signaling. The event stream is continuous, so that all steps run in parallel and all data collection and parameter adjustment can be done in real time.
Fig. 11 shows a block diagram of another example of a method for adjusting a robotic service for a human based on an emotional state of the human. On the left, fig. 11 shows the robot 110 interacting with the user, who is first in a sad emotional state 111a but then feels encouraged and moves to a happier emotional state 111b. The corresponding flow chart is shown on the right. It starts with an interaction 112 between the user and the robot. In step 113, face localization using the EVS and facial emotion analysis are performed. The active emotion and system loop 114 operates as outlined above and monitors or supervises the emotion at 115. If the user is happy, a higher interaction speed and frequency 116a may be used. This may make the user happier the more the robot interacts with the user 116b. The feedback loop comprising steps 115, 116a and 116b runs for real-time updating. If the user is in a bad mood, soothing music may be played or the robot's speech may be slowed 117a. If the user feels a boost (positive feedback) 117b, the feedback loop closes by returning to step 115. If the user is still in a bad mood (negative feedback) 118a, a relaxing scent is released at 118b.
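The branching logic of steps 115 to 118b can be sketched as a simple decision function. The emotion labels and returned actions are illustrative placeholders, not terms defined by the disclosure:

```python
# Hedged sketch of the decision branches of fig. 11 (steps 115-118b).
from typing import Optional

def robot_response(emotion: str, previous_feedback: Optional[str] = None) -> str:
    """Choose a robot action from the monitored emotion (step 115).
    previous_feedback carries the outcome of the last attempt, if any."""
    if emotion == "happy":
        return "increase interaction speed and frequency"   # step 116a
    if emotion == "sad":
        if previous_feedback == "negative":
            return "release relaxing scent"                 # step 118b
        return "play soothing music and slow down speech"   # step 117a
    return "keep current settings"                          # remain at step 115
```

Calling the function again with the observed feedback reproduces the escalation from step 117a to step 118b.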
Fig. 12 shows a block diagram of the example method of fig. 11 with additional information. For EVS emotion analysis, since the sensor captures a 3D data volume containing a time component, the captured information can be used to better predict when an emotion occurs and to estimate its duration. As mentioned for EVS emotion sensing, the system can not only detect an emotion but also analyze emotional transitions, which is especially helpful for the robot when adjusting its settings to the user's emotion.
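Using the time component of the event volume, the onset and duration of an emotion can be estimated from a labeled time series, for example as below. The per-sample labels are assumed to come from a hypothetical upstream classifier; the timestamps and labels are illustrative:

```python
# Sketch: estimate emotion onset and duration from timestamped labels.
def emotion_intervals(samples):
    """samples: list of (timestamp, label) pairs, sorted by timestamp.
    Returns (label, start, duration) for each contiguous run of one label;
    the final run's duration is measured up to the last timestamp."""
    intervals = []
    start_t, current = None, None
    for t, label in samples:
        if label != current:
            if current is not None:
                intervals.append((current, start_t, t - start_t))
            start_t, current = t, label
    if current is not None:
        intervals.append((current, start_t, samples[-1][0] - start_t))
    return intervals

runs = emotion_intervals(
    [(0, "neutral"), (10, "neutral"), (20, "smile"), (50, "smile"), (80, "neutral")]
)
```

The transition points between runs are exactly the emotional transitions mentioned above, which the robot could use to decide when to change its settings.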
Some examples are summarized below:
(1) A method for adjusting a service consumed by a human includes capturing optical event data of the human using an event-based sensor, analyzing the optical event data to detect a predefined emotional pattern, determining information about an emotional state of the human based on the emotional pattern detected in the optical event data, and adjusting the service based on the information about the emotional state of the human.
(2) The method of (1), wherein the service is a game, a movie, a smart home service, or a metaverse.
(3) The method of (1) or (2), wherein the capturing comprises capturing optical event data from a plurality of event-based sensors.
(4) The method of (1) to (3), wherein the capturing comprises capturing optical event data in a three-dimensional data volume comprising a time component.
(5) The method of (1) to (4), wherein the optical event data comprises pixel-level polarity data regarding brightness or contrast variation.
(6) The method of (1) to (5), wherein the optical event data has a sampling rate of at least 500 frames per second.
(7) The method of (1) to (6), wherein the analysis comprises determining micro-actions in the optical event data to detect a predefined emotional pattern.
(8) The method of (7), wherein the micro-actions relate to one or more elements of the group of a human's mouth, eyes, eyebrows, eyelids, lips, lower jaw, skin, cheeks and nose.
(9) The method of (1) to (8), wherein the analyzing comprises analyzing a facial expression of the human to detect an emotional state.
(10) The method of (1) to (9), wherein the analysis comprises analyzing a body posture or gesture of the human to detect an emotional state.
(11) The method of (1) to (10), wherein the adjustment of the service comprises one or more elements from the group of increasing or decreasing the agility of a game character, increasing or decreasing the volume of audio, increasing or decreasing the speed of speech or audio content, changing the style of music, adjusting room temperature, adjusting the brightness and/or color of lights, providing special effects in the service, adjusting the human's avatar, or introducing other humans to the human.
(12) A computer program having a program code for performing a method according to any other example herein, when the computer program is executed in a computer, a processor or a programmable hardware component.
(13) An apparatus for adjusting a service consumed by a human comprising circuitry configured to perform one of the methods of (1) through (11).
(14) The apparatus of (13), further comprising a plurality of event-based optical sensors having different fields of view for the user.
(15) The apparatus of (13) or (14), wherein the service is a game, a movie, a smart home service, or a metaverse.
(16) The apparatus of (13) to (15), wherein the circuitry is configured to capture optical event data from a plurality of event-based sensors.
(17) The apparatus of (13) to (16), wherein the circuitry is configured to capture optical event data in a three-dimensional data volume comprising a time component.
(18) The apparatus of (13) to (17), wherein the optical event data comprises pixel-level polarity data regarding brightness or contrast variation.
(19) The apparatus of (13) to (18), wherein the optical event data has a sampling rate of at least 500 frames per second.
(20) The apparatus of (13) to (19), wherein the circuitry is configured to determine micro-actions in the optical event data to detect a predefined emotional pattern.
(21) The apparatus of (20), wherein the micro-actions relate to one or more elements of the group of a human's mouth, eyes, eyebrows, eyelids, lips, lower jaw, skin, cheeks and nose.
(22) The apparatus of (13) to (21), wherein the circuitry is configured to analyze a facial expression of the human to detect an emotional state.
(23) The apparatus of (13) to (22), wherein the circuitry is configured to analyze a body posture or gesture of the human to detect an emotional state.
(24) The apparatus of (13) to (23), wherein the circuitry is configured to adjust the service by one or more elements of the group of increasing or decreasing the agility of a game character, increasing or decreasing the volume of audio, increasing or decreasing the speed of speech or audio content, changing the style of music, adjusting room temperature, adjusting the brightness and/or color of lights, providing special effects in the service, adjusting the human's avatar, or introducing other humans to the human.
(25) A robot comprising the apparatus according to (13) to (24).
(26) A vehicle comprising the apparatus according to (13) to (24).
(27) An entertainment system comprising the apparatus according to (13) to (24).
(28) An intelligent home system comprising the apparatus according to (13) to (24).
Aspects and features described with respect to a particular one of the foregoing examples may also be combined with one or more other examples to replace the same or similar features in the other examples or to additionally introduce features into the other examples.
Examples may also be or relate to a (computer) program comprising a program code for performing one or more of the above methods when the program is executed in a computer, processor or other programmable hardware component. Thus, the steps, operations, or processes in the various methods described above may also be performed by a programmed computer, processor, or other programmable hardware component. Examples may also encompass program storage devices (such as digital data storage media) that are machine-readable, processor-readable, or computer-readable, and that encode and/or contain machine-executable, processor-executable, or computer-executable programs and instructions. For example, the program storage device may include or be a digital storage device, a magnetic storage medium (such as a magnetic disk and tape), a hard disk drive, or an optically readable digital data storage medium. Other examples may also include a computer, processor, control unit, (field) programmable logic array ((F) PLA), (field) programmable gate array ((F) PGA), graphics Processor Unit (GPU), application Specific Integrated Circuit (ASIC), integrated Circuit (IC), or system on a chip (SoC) system programmed to perform the steps in the methods described above.
It should be further understood that the disclosure of several steps, processes, operations, or functions disclosed in the specification or claims should not be interpreted as implying that such operations must be dependent on the order of description unless explicitly stated in a particular case or for technical reasons. Therefore, the foregoing description does not limit the execution of the steps or functions to a particular order. Furthermore, in other examples, a single step, function, procedure, or operation may include and/or be broken into multiple sub-steps, sub-functions, sub-procedures, or sub-operations.
If some aspects have been described with respect to a device or system, those aspects should also be understood as descriptions of corresponding methods. For example, a block, a device or a functional aspect of a device or system may correspond to a feature of a respective method, such as a method step. Thus, aspects described with respect to a method should also be understood as a description of a corresponding block, corresponding element, attribute, or functional feature of a corresponding apparatus or corresponding system.
The following claims are hereby incorporated into the detailed description, with each claim standing on its own as a separate example. It should also be noted that although a dependent claim refers in the claims to a particular combination with one or more other claims, other examples may also include a combination of the dependent claim with the subject matter of any other dependent or independent claim. Such combinations are expressly proposed herein, unless it is stated in an individual case that a particular combination is not intended. Furthermore, features of a claim may also be included in any other independent claim, even if that claim is not made directly dependent on the other independent claim.
Claims (18)
1. A method for adjusting a service consumed by a human, the method comprising:
capturing optical event data of the human using an event-based sensor;
analyzing the optical event data to detect a predefined emotional pattern;
determining information about an emotional state of the human based on the emotional pattern detected in the optical event data;
adjusting the service based on the information about the emotional state of the human.
2. The method of claim 1, wherein the service is a game, a movie, a smart home service, or a metaverse.
3. The method of claim 1, wherein the capturing comprises capturing optical event data from a plurality of event-based sensors.
4. The method of claim 1, wherein the capturing comprises capturing optical event data in a three-dimensional data volume comprising a time component.
5. The method of claim 1, wherein the optical event data comprises pixel level polarity data regarding brightness or contrast changes.
6. The method of claim 1, wherein the optical event data has a sampling rate of at least 500 frames per second.
7. The method of claim 1, wherein the analyzing comprises determining micro-actions in the optical event data to detect the predefined emotional pattern.
8. The method of claim 7, wherein the micro-actions relate to one or more elements of the group of the human's mouth, eyes, eyebrows, eyelids, lips, lower jaw, skin, cheeks, and nose.
9. The method of claim 1, wherein the analyzing comprises analyzing a facial expression of the human to detect the emotional state.
10. The method of claim 1, wherein the analyzing comprises analyzing a body posture or gesture of the human to detect the emotional state.
11. The method of claim 1, wherein the adjustment to the service comprises one or more elements from the group of:
increasing or decreasing the agility of a game character,
increasing or decreasing the volume of audio,
increasing or decreasing the speed of speech or audio content,
changing the style of music,
adjusting room temperature,
adjusting the brightness and/or color of lights,
providing a special effect in the service,
adjusting an avatar of the human, and
introducing other humans to the human.
12. A computer program having a program code for performing the method of claim 1 when the computer program is executed in a computer, a processor or a programmable hardware component.
13. An apparatus for adjusting a service consumed by a human, the apparatus comprising circuitry configured to perform the method of claim 1.
14. The apparatus of claim 13, further comprising a plurality of event-based optical sensors having different fields of view for a user.
15. A robot comprising the apparatus of claim 13.
16. A vehicle comprising the apparatus of claim 13.
17. An entertainment system comprising the apparatus of claim 13.
18. A smart home system comprising the apparatus of claim 13.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23163120 | 2023-03-21 | ||
| EP23163120.1 | 2023-03-21 | ||
| PCT/EP2024/057058 WO2024194211A1 (en) | 2023-03-21 | 2024-03-15 | A method, a computer program and an apparatus for adjusting a service consumed by a human, a robot, a vehicle, an entertainment system, and a smart home system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120917404A | 2025-11-07 |
Family
ID=85724691
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202480018565.4A (pending) | Methods, computer programs and devices for adjusting services consumed by humans, robots, vehicles, entertainment systems and smart home systems | 2023-03-21 | 2024-03-15 |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP4684266A1 (en) |
| CN (1) | CN120917404A (en) |
| WO (1) | WO2024194211A1 (en) |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA3030169A1 (en) * | 2016-07-21 | 2018-01-25 | Magic Leap, Inc. | Technique for controlling virtual image generation system using emotional states of user |
2024
- 2024-03-15: CN application CN202480018565.4 (status: pending)
- 2024-03-15: EP application EP24718325.4 (status: pending)
- 2024-03-15: WO application PCT/EP2024/057058 (status: ceased)
Also Published As
| Publication number | Publication date |
|---|---|
| EP4684266A1 (en) | 2026-01-28 |
| WO2024194211A1 (en) | 2024-09-26 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||