
CN112307816B - In-vehicle image acquisition method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112307816B
Authority
CN
China
Prior art keywords
vehicle
emotion
image
sound
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910687586.6A
Other languages
Chinese (zh)
Other versions
CN112307816A (en)
Inventor
孙浚凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd filed Critical Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201910687586.6A
Publication of CN112307816A
Application granted
Publication of CN112307816B
Legal status: Active (current)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/59 - Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)
  • Traffic Control Systems (AREA)

Abstract

The embodiments of the disclosure provide an in-vehicle image acquisition method and device, electronic equipment and a storage medium, relating to the technical field of vehicles. The method includes: acquiring at least one frame of in-vehicle image acquired by an image acquisition device and in-vehicle sound acquired by a sound pickup device; and judging whether a preset condition is met based on the at least one frame of in-vehicle image and the in-vehicle sound information, and if so, controlling the image acquisition device to acquire the current frame of in-vehicle image. The preset condition includes: the emotion of the in-vehicle person is a preset target emotion. With the method, device, electronic equipment and storage medium, the emotion of the in-vehicle person can be identified across multiple dimensions and features, improving the accuracy and reliability of emotion detection; images are acquired based on the occupants' emotion, improving the enjoyment and experience of driving.

Description

In-vehicle image acquisition method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of vehicles, and in particular to an in-vehicle image acquisition method and device, electronic equipment and a storage medium.
Background
With the development of automobile technology, photographing equipment can be arranged in a vehicle to photograph the in-vehicle scene and record in-vehicle image data; the captured image data is stored on media such as a hard disk, through which the in-vehicle scene can be monitored and traced. During driving, to improve the enjoyment and experience of the trip, it is desirable to photograph the in-vehicle scene and keep images when the occupants are in a happy mood or the like. However, current photographing equipment can only start shooting according to a start command input by the user and stop shooting according to a stop command input by the user; it cannot shoot based on the emotion of the people in the vehicle.
Disclosure of Invention
The present disclosure has been made in order to solve the above technical problems. The embodiment of the disclosure provides an in-vehicle image acquisition method and device, electronic equipment and storage medium.
According to an aspect of the embodiments of the present disclosure, there is provided an in-vehicle image acquisition method including: acquiring at least one frame of in-vehicle image acquired by an image acquisition device and in-vehicle sound acquired by a sound pickup device; judging whether a preset condition is met or not based on the at least one frame of in-vehicle image and the in-vehicle sound information, and if so, controlling an image acquisition device to acquire the current frame of in-vehicle image; wherein, the preset conditions include: the emotion of the personnel in the vehicle is a preset target emotion.
According to another aspect of the embodiments of the present disclosure, there is provided an in-vehicle image acquisition apparatus including: the information acquisition module is used for acquiring at least one frame of in-vehicle image acquired by the image acquisition device and in-vehicle sound acquired by the pickup device; the image acquisition module is used for judging whether a preset condition is met or not based on the at least one frame of in-vehicle image and the in-vehicle sound information, and if so, controlling the image acquisition device to acquire the current frame of in-vehicle image; wherein, the preset conditions include: the emotion of the personnel in the vehicle is a preset target emotion.
According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above-described method.
According to another aspect of the embodiments of the present disclosure, there is provided an electronic device including: a processor; a memory for storing the processor-executable instructions; the processor is used for executing the method.
According to the in-vehicle image acquisition method and device, electronic equipment and storage medium provided by the embodiments of the present disclosure, the emotion of the in-vehicle person can be identified across multiple dimensions, and images can be acquired based on that emotion, improving the enjoyment and experience of driving.
The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing embodiments thereof in more detail with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, not to limit the disclosure. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a flow chart of one embodiment of an in-vehicle image acquisition method of the present disclosure;
FIG. 2 is a flow chart of one embodiment of the present disclosure for determining whether a preset condition is met based on image and sound;
FIG. 3 is a flow chart of one embodiment of the present disclosure for identifying an emotion of a person in a vehicle based on an image;
FIG. 4 is a flow chart of one embodiment of the present disclosure for voice-based recognition of an in-vehicle human emotion;
FIG. 5 is a flow chart of one embodiment of the present disclosure for determining, based on the number of occupants, whether to perform the condition judgment;
FIG. 6 is a schematic structural view of one embodiment of an in-vehicle image acquisition device of the present disclosure;
FIG. 7 is a schematic diagram of the architecture of one embodiment of an image acquisition module of the present disclosure;
Fig. 8 is a block diagram of one embodiment of an electronic device of the present disclosure.
Detailed Description
Example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present disclosure are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.
It should also be understood that in embodiments of the present disclosure, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.
It should also be appreciated that any component, data, or structure referred to in the embodiments of the present disclosure may generally be understood as one or more unless the context explicitly limits it or clearly indicates otherwise.
In addition, the term "and/or" in this disclosure merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the associated objects before and after it are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.
Meanwhile, it should be understood that, for convenience of description, the sizes of the respective parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Embodiments of the present disclosure are applicable to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, or server, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks may be performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
Summary of the application
In the course of realizing the present disclosure, the inventor found that during driving, current photographing equipment can only start shooting according to a start command input by the user and stop shooting according to a stop command input by the user; it cannot shoot based on the emotion of the people in the vehicle, and therefore cannot photograph the in-vehicle scene and keep an image at the moment the occupants are in a happy mood.
The in-vehicle image acquisition method provided by the present disclosure judges whether a preset condition is met based on in-vehicle images and in-vehicle sound information; if so, it controls the image acquisition device to acquire the current frame of in-vehicle image, where the preset condition includes: the emotion of the in-vehicle person is a preset target emotion. The emotion of the in-vehicle person can thus be identified across multiple dimensions and features, improving the accuracy and reliability of emotion detection.
Exemplary method
FIG. 1 is a flowchart of one embodiment of an in-vehicle image acquisition method of the present disclosure, the method shown in FIG. 1 comprising the steps of: s101 and S102. The steps are described separately below.
S101, at least one frame of in-vehicle image acquired by the image acquisition device and in-vehicle sound acquired by the sound pickup device are acquired.
If the navigation destination is determined to be a preset target and the distance to the navigation destination is greater than a preset distance threshold, the user is prompted whether to enable the driving shooting function. The preset target may be a park, tourist area, restaurant, shopping center or the like, and the distance threshold may be 5 kilometers, 10 kilometers or the like. The prompt may be delivered by voice, or a prompt asking whether to enable snapshot pre-authorization may be displayed on the central control display screen. In one example, instead of prompting the user, a driving shooting function button is provided in the vehicle, and the user presses the button to enable the driving shooting function directly.
The image acquisition device can be a camera or the like arranged in the vehicle, and the sound pickup device can be a microphone array or the like of the in-vehicle audio system. Whether the driving shooting function is enabled is determined first; if so, the in-vehicle monitoring images acquired by the image acquisition device and the in-vehicle sound information acquired by the sound pickup device are obtained, and if not, no images or sounds are acquired.
S102, judging whether a preset condition is met or not based on at least one frame of in-vehicle image and in-vehicle sound information, and if so, controlling an image acquisition device to acquire the current frame of in-vehicle image. The preset conditions include that the emotion of the person in the vehicle is a preset target emotion, and the target emotion can be a happy emotion and the like.
Face recognition may be performed on the in-vehicle image and the number of in-vehicle persons determined from the result; when the number of in-vehicle persons is greater than 1, whether the preset condition is met is judged based on the at least one frame of in-vehicle image and the in-vehicle sound information. If only one person is in the vehicle, i.e., only the driver, there is no need to capture the driver's emotion: the driver needs to concentrate and remain emotionally stable while driving, and cannot converse with others. When several people are in the vehicle, they can communicate while the vehicle is running. When a topic of interest is discussed, or when the vehicle passes scenery and music or commentary is playing, happy emotions may appear among the occupants; collecting images at such moments records the good times and improves the enjoyment and experience of driving.
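As an illustration only, the control flow of steps S101 and S102 can be sketched in Python as follows; the camera, microphone, condition judgment and storage are abstracted as hypothetical callables, since the patent does not prescribe any particular API:

```python
import time

def run_driving_capture(get_frames, get_audio, condition_met, save_frame,
                        poll_interval_s: float = 0.5):
    """Sketch of S101/S102: poll the in-vehicle images and sound, and
    store the current frame whenever the preset condition is met."""
    while True:
        frames = get_frames()    # S101: at least one recent in-vehicle frame
        audio = get_audio()      # S101: recent in-vehicle sound window
        if condition_met(frames, audio):   # S102: preset condition check
            save_frame(frames[-1])         # capture the current-frame image
        time.sleep(poll_interval_s)
```

The 0.5-second polling interval is likewise an assumption; any scheduling that keeps the most recent frame and sound window available would serve.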
FIG. 2 is a flowchart of an embodiment of the present disclosure for determining whether a preset condition is satisfied based on image and sound. The method shown in FIG. 2 includes the steps S201 to S203, which are described separately below.
S201, identifying the emotion of the person in the vehicle based on at least one frame of image in the vehicle, and obtaining a first emotion identification result.
S202, identifying the emotion of the person in the vehicle based on the sound in the vehicle, and obtaining a second emotion identification result.
S203, judging whether the emotion of the person in the car is a target emotion according to the first emotion recognition result and the second emotion recognition result.
In one embodiment, there are various ways to determine whether the emotion of the person in the vehicle is the target emotion. The first emotion recognition result includes: a first emotion recognition confidence level corresponding to the target emotion; the second emotion recognition result includes: and a second emotion recognition confidence level corresponding to the target emotion. The first emotion recognition confidence and the second emotion recognition confidence may be probability values of the emotion being a target emotion.
If the first emotion recognition confidence and the second emotion recognition confidence are both larger than a preset first confidence threshold, the emotion of the person in the vehicle is determined to be the target emotion; or, if the sum of the product of the image recognition coefficient and the first emotion recognition confidence and the product of the voice recognition coefficient and the second emotion recognition confidence is larger than a second confidence threshold, the emotion of the person in the vehicle is determined to be the target emotion.
For example, suppose the first emotion recognition confidence is 70%, the second emotion recognition confidence is 80%, the first confidence threshold is 65%, and the second confidence threshold is 70%. Since the first emotion recognition confidence 70% and the second emotion recognition confidence 80% are both larger than the preset first confidence threshold 65%, the emotion of the person in the vehicle is determined to be the target emotion.
Alternatively, suppose the image recognition coefficient is set to 0.6 and the voice recognition coefficient to 0.4. The sum of the product of the image recognition coefficient 0.6 and the first emotion recognition confidence 70% and the product of the voice recognition coefficient 0.4 and the second emotion recognition confidence 80% is 0.74, which is larger than the second confidence threshold 70%, so the emotion of the person in the vehicle is again determined to be the target emotion.
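The two decision rules above can be expressed compactly; the following sketch assumes the confidences are given as floats in [0, 1] and uses the example thresholds and coefficients from the text:

```python
def is_target_emotion(image_conf: float, sound_conf: float,
                      first_threshold: float = 0.65,
                      second_threshold: float = 0.70,
                      image_coeff: float = 0.6,
                      sound_coeff: float = 0.4) -> bool:
    """Rule 1: both confidences exceed the first confidence threshold.
    Rule 2: the coefficient-weighted sum exceeds the second threshold."""
    if image_conf > first_threshold and sound_conf > first_threshold:
        return True
    return image_coeff * image_conf + sound_coeff * sound_conf > second_threshold

# Worked example from the text: 0.6 * 0.70 + 0.4 * 0.80 = 0.74 > 0.70
assert is_target_emotion(0.70, 0.80)
```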
In one embodiment, the preset conditions further include: the volume decibel value is larger than a preset decibel threshold. A volume decibel value is obtained from the in-vehicle sound information; if the emotion of the in-vehicle person is determined to be the target emotion, or if the emotion is the target emotion and the volume decibel value is greater than the preset decibel threshold, the image acquisition device is controlled to obtain and store the current frame of in-vehicle image.
A person's voice reflects emotion: when excited, for example when happy, a person speaks louder than usual. Checking that the volume exceeds the threshold in addition to judging the emotion of the in-vehicle person provides an extra cue about the in-vehicle atmosphere, enhancing the accuracy and stability of the judgment.
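The patent does not fix a decibel convention; as one possible reading, the volume check could be computed from 16-bit PCM samples in dBFS (0 dB at full scale, so values are negative), with the threshold chosen accordingly. Both the reference value and the example threshold below are assumptions:

```python
import numpy as np

def volume_db(samples: np.ndarray, full_scale: float = 32768.0) -> float:
    """RMS level of 16-bit PCM audio in dBFS; the full-scale reference and
    the dBFS convention are assumptions, not specified by the patent."""
    rms = np.sqrt(np.mean(samples.astype(np.float64) ** 2))
    return 20.0 * np.log10(max(rms, 1e-9) / full_scale)

def volume_condition_met(samples: np.ndarray, db_threshold: float = -20.0) -> bool:
    """Additional preset condition: volume decibel value above the threshold."""
    return volume_db(samples) > db_threshold
```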
FIG. 3 is a flowchart of one embodiment of the present disclosure for identifying an emotion of a person in a vehicle based on an image, the method shown in FIG. 3 comprising the steps of: s301 to S303. The steps are described separately below.
S301, face images of all persons in the in-vehicle monitoring image are obtained.
S302, determining emotion recognition confidence degrees of all the people according to face images of all the people.
S303, determining a first emotion recognition result based on the number of people and the emotion recognition confidence of each person.
There are various methods for identifying the emotion of the in-vehicle person based on the in-vehicle monitoring image to obtain the first emotion recognition result. At least one face image in the in-vehicle monitoring image is obtained and input into a trained first emotion recognition model, which outputs, for each face image, an image recognition confidence representing the probability that the corresponding emotion is the target emotion. An average image recognition confidence is then computed from the number of face images and the individual image recognition confidences, and taken as the first emotion recognition confidence.
For example, three face images are obtained from the in-vehicle monitoring image, emotion recognition feature information is extracted from them and input into the trained first emotion recognition model, and the three image recognition confidences representing the probability that the corresponding emotions are the target emotion are 60%, 77% and 69% respectively. The average of these three confidences, approximately 68.7%, is taken as the first emotion recognition confidence.
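Steps S302 and S303 then reduce to averaging the per-face confidences; a minimal sketch reproducing the worked example:

```python
def first_emotion_confidence(face_confidences: list[float]) -> float:
    """Average the per-face target-emotion confidences (S302-S303)."""
    if not face_confidences:
        return 0.0
    return sum(face_confidences) / len(face_confidences)

# Worked example from the text: (0.60 + 0.77 + 0.69) / 3 is about 0.687
print(round(first_emotion_confidence([0.60, 0.77, 0.69]), 3))  # 0.687
```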
The first emotion recognition model may be a neural network model, such as a CNN or an RNN. The model includes an input layer, intermediate layers and an output layer, where the output of each layer serves as the input of the next layer and the intermediate layers are fully connected layers.
In advance, in-vehicle images are processed with a preset face detection algorithm to detect face images, and the emotion corresponding to each face image is labeled; the emotion may be happiness, sadness, fear, anger, surprise, disgust or contempt, among others. A sample training set is generated from the face images and the labeled emotion information, and the neural network model is trained on this set to obtain the trained first emotion recognition model. At least one face image in the in-vehicle monitoring image is then obtained and input into the trained first emotion recognition model, which outputs the image recognition confidence of the target emotion, i.e., the probability that the emotion is the target emotion.
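The patent specifies the model only at this level of generality (a neural network with an input layer, fully connected intermediate layers and an output layer). A PyTorch sketch of one plausible instantiation follows; the 64x64 grayscale input, the layer sizes and the seven emotion classes are all assumptions made for illustration:

```python
import torch
import torch.nn as nn

class FaceEmotionNet(nn.Module):
    """Illustrative stand-in for the first emotion recognition model."""

    def __init__(self, num_emotions: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128), nn.ReLU(),  # 64x64 input -> 16x16 maps
            nn.Linear(128, num_emotions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Softmax gives per-emotion probabilities; the target-emotion entry
        # is the image recognition confidence used in the steps above.
        return torch.softmax(self.classifier(self.features(x)), dim=-1)
```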
FIG. 4 is a flow chart of one embodiment of the present disclosure for voice-based recognition of an emotion of a person in a vehicle, the method shown in FIG. 4 comprising the steps of: s401 and S402. The steps are described separately below.
S401, semantic content and intonation information in the in-car sound information are obtained.
S402, obtaining a second emotion recognition result according to the semantic content and the intonation information.
Semantic content and intonation information in the in-vehicle sound information can be obtained through speech recognition technology; the semantic content may be text, and the intonation information includes volume, speech rate, pitch and the like. Emotion recognition keywords and recognition intonation are extracted from the semantic content and the intonation information respectively. For example, the semantic content may be analyzed to find keywords that clearly indicate the user's emotion, which serve as emotion recognition keywords; utterances whose volume exceeds a maximum threshold or falls below a minimum threshold, or whose speech rate exceeds a set threshold, may be identified as recognition intonation.
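The extraction step can be sketched as follows; the keyword list and the volume and speech-rate thresholds are hypothetical placeholders, and a real system would take the transcript from a speech recognition engine:

```python
# Hypothetical keywords that clearly indicate a happy emotion.
HAPPY_KEYWORDS = {"great", "awesome", "wonderful", "haha"}

def extract_sound_features(transcript: str, volume: float, speech_rate: float,
                           vol_max: float = 0.8, vol_min: float = 0.1,
                           rate_max: float = 5.0):
    """Extract emotion recognition keywords and a recognition-intonation flag
    from semantic content (transcript) and intonation information."""
    keywords = [w for w in transcript.lower().split() if w in HAPPY_KEYWORDS]
    intonation_flag = (volume > vol_max or volume < vol_min
                       or speech_rate > rate_max)
    return keywords, intonation_flag
```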
The second emotion recognition model may likewise be a neural network model, such as a CNN or an RNN. The model includes an input layer, intermediate layers and an output layer, where the output of each layer serves as the input of the next layer and the intermediate layers are fully connected layers.
In advance, voice information is obtained, semantic content and intonation information are extracted from it, emotion recognition keywords and recognition intonation are extracted from the semantic content and intonation information, and the corresponding emotion is labeled; the emotion may be happiness, sadness, fear, anger, surprise, disgust or contempt, among others. A sample training set is generated from the emotion recognition keywords, the recognition intonation and the labeled emotion information, and the neural network model is trained on this set to obtain the trained second emotion recognition model. The in-vehicle sound information is then processed to obtain emotion recognition keywords and recognition intonation, which are input into the trained second emotion recognition model; the model outputs the second emotion recognition confidence, i.e., the probability that the emotion is the target emotion.
FIG. 5 is a flow chart of one embodiment of the present disclosure for determining, based on the number of occupants, whether to perform the condition judgment. The method shown in FIG. 5 includes the steps S501 to S503, which are described separately below.
S501, obtaining a first in-vehicle person count from pressure sensors arranged in the vehicle.
For example, a pressure sensor may be provided under each seat in the vehicle, and the first in-vehicle person count may be obtained based on the weight information acquired by the pressure sensors.
S502, obtaining a second in-vehicle person count from the at least one frame of in-vehicle image.
S503, if the first in-vehicle person count is judged to match the second in-vehicle person count, judging whether the preset condition is met based on the in-vehicle monitoring image and the in-vehicle sound information.
For example, by recognizing the in-vehicle image, the second in-vehicle person count is determined to be four. The pressure sensors provided under each seat indicate that four seats are occupied, and the four pressure readings all fall within the human weight interval (for example, [25, 150] kg), so the first in-vehicle person count is also determined to be four.
If the first in-vehicle person count equals the second in-vehicle person count, the accuracy of the face recognition is corroborated: no in-vehicle person has been missed by face recognition, which ensures that the happy moments of all occupants can be captured. Only after the two counts match is it judged whether the preset condition is met based on the in-vehicle monitoring image and the in-vehicle sound information.
Determining the first count with pressure sensors and the second count with face recognition, and performing emotion recognition only after the two detection results match, realizes two-way cross-checking of the occupant count; the count is therefore more accurate, which in turn improves the effectiveness of emotion detection.
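A minimal sketch of the two-way count check in S501 to S503, using the human weight interval from the example; the sensor readings are assumed to arrive as one weight per seat:

```python
HUMAN_WEIGHT_KG = (25.0, 150.0)  # human weight interval from the example

def seat_occupancy_count(seat_weights_kg: list[float]) -> int:
    """First in-vehicle person count: seats whose pressure reading falls
    inside the human weight interval (S501)."""
    lo, hi = HUMAN_WEIGHT_KG
    return sum(1 for w in seat_weights_kg if lo <= w <= hi)

def counts_match(seat_weights_kg: list[float], face_count: int) -> bool:
    """Proceed to the condition judgment only when the sensor-based count
    matches the face-recognition-based count (S503)."""
    return seat_occupancy_count(seat_weights_kg) == face_count

# Example: four occupied seats vs. four detected faces
assert counts_match([70.0, 55.0, 82.0, 30.0], 4)
```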
In one embodiment, the driving photographing function is enabled during navigation; if the time the vehicle has been switched off is less than a preset time threshold and the navigation destination has not changed, the driving photographing function is determined to remain in effect. The preset time threshold may be 2 hours or the like. If the preset condition is met, the image acquisition device is controlled to obtain and store the current frame of in-vehicle image based on a preset snapshot strategy; alternatively, multiple frames of in-vehicle images, for example 15 or 20 frames, may be obtained and stored at a preset interval such as 0.1 second.
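The interval snapshot strategy can be sketched with the camera and storage again abstracted as hypothetical callables; the 15-frame count and 0.1-second interval are the example values from the text:

```python
import time

def capture_burst(read_frame, save_frame, num_frames: int = 15,
                  interval_s: float = 0.1):
    """Snapshot strategy: store several frames at a preset interval."""
    for _ in range(num_frames):
        save_frame(read_frame())
        time.sleep(interval_s)
```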
If it is determined that the navigation destination has been reached, or that navigation has been canceled and the vehicle is stationary, the stored in-vehicle images are displayed and the user is prompted to process them; the reminder may be given by voice or the like while the stored images are shown on the in-vehicle display screen, where the user can choose to save and share them.
Exemplary apparatus
In one embodiment, as shown in fig. 6, the present disclosure provides an in-vehicle image acquisition apparatus comprising: an information acquisition module 601 and an image acquisition module 602. The information acquisition module 601 acquires at least one frame of in-vehicle image acquired by the image acquisition device and in-vehicle sound acquired by the sound pickup device. The image acquisition module 602 determines whether a preset condition is satisfied based on at least one frame of in-vehicle image and in-vehicle sound information, and if so, the image acquisition module 602 controls the image acquisition device to acquire the current frame of in-vehicle image, where the preset condition includes: the emotion of the personnel in the vehicle is a preset target emotion. The image acquisition module 602 determines the number of in-vehicle personnel, and when the number of in-vehicle personnel is greater than 1, determines whether a preset condition is satisfied based on at least one frame of in-vehicle image and in-vehicle sound information.
In one embodiment, as shown in fig. 7, the image acquisition module 602 includes: a first emotion recognition unit 6021, a second emotion recognition unit 6022, and a target emotion judgment unit 6023. The first emotion recognition unit 6021 recognizes the emotion of the person in the vehicle based on at least one frame of the in-vehicle image, and obtains a first emotion recognition result. The second emotion recognition unit 6022 recognizes the emotion of the in-vehicle person based on the in-vehicle sound, and obtains a second emotion recognition result. The target emotion determination unit 6023 determines whether the emotion of the person in the vehicle is a target emotion according to the first emotion recognition result and the second emotion recognition result.
The target emotion judging unit 6023 obtains a sound volume decibel value based on the in-vehicle voice information, and the preset conditions further include: the sound volume decibel value is larger than a preset decibel threshold value. The first emotion recognition unit 6021 obtains face images of each person in the in-vehicle monitoring image, determines emotion recognition confidence degrees of each person according to the face images of each person, and determines a first emotion recognition result based on the number of persons and the emotion recognition confidence degrees of each person. The second emotion recognition unit 6022 obtains semantic content and intonation information in the in-vehicle sound information, and obtains a second emotion recognition result based on the semantic content and the intonation information.
The target emotion judging unit 6023 obtains the first number of persons in the vehicle from the pressure sensor provided in the vehicle, obtains the second number of persons in the vehicle from at least one frame of the in-vehicle image, and judges whether or not a preset condition is satisfied based on the in-vehicle monitoring image and the in-vehicle sound information if it is judged that the first number of persons in the vehicle matches the second number of persons in the vehicle.
Fig. 8 is a block diagram of one embodiment of an electronic device of the present disclosure, as shown in fig. 8, the electronic device 81 including one or more processors 811 and memory 812.
The processor 811 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities and may control other components in the electronic device 81 to perform the desired functions.
Memory 812 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example: random Access Memory (RAM) and/or cache, etc. The nonvolatile memory may include, for example: read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer readable storage medium that can be executed by the processor 811 to implement the in-vehicle image acquisition methods and/or other desired functions of the various embodiments of the present disclosure described above. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.
In one example, the electronic device 81 may further include: input devices 813, output devices 814, and the like, interconnected by a bus system and/or other forms of connection mechanisms (not shown). In addition, the input device 813 may also include, for example, a keyboard, a mouse, and the like. The output device 814 may output various information to the outside. The output device 814 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, only some of the components of the electronic device 81 relevant to the present disclosure are shown in fig. 8 for simplicity, components such as buses, input/output interfaces, and the like being omitted. In addition, the electronic device 81 may include any other suitable components depending on the particular application.
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in an in-vehicle image acquisition method according to various embodiments of the present disclosure described in the "exemplary methods" section of the present description.
The computer program product may write program code for performing the operations of embodiments of the present disclosure in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform the steps in an in-vehicle image acquisition method according to various embodiments of the present disclosure described in the above "exemplary method" section of the present disclosure.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium may include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in connection with specific embodiments, but it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.
With the in-vehicle image acquisition method and device, electronic equipment and storage medium of the above embodiments, whether a preset condition is met is judged based on in-vehicle images and in-vehicle sound information; if so, the image acquisition device is controlled to acquire the current frame of in-vehicle image, the preset condition including that the emotion of the in-vehicle person is a preset target emotion. The emotion of the in-vehicle person can thus be identified across multiple dimensions and features, improving the accuracy and reliability of emotion detection; images are acquired based on the occupants' emotion, improving the enjoyment and experience of driving.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different manner from other embodiments, so that the same or similar parts between the embodiments are mutually referred to. For system embodiments, the description is relatively simple as it essentially corresponds to method embodiments, and reference should be made to the description of method embodiments for relevant points.
The block diagrams of the devices, apparatuses, equipment and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, the devices, apparatuses, equipment and systems may be connected, arranged and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words meaning "including but not limited to," and are used interchangeably therewith. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or," unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the apparatus, devices and methods of the present disclosure, components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered equivalent to the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects, and the like, will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (8)

1. An in-vehicle image acquisition method comprising:
Acquiring at least one frame of in-vehicle image acquired by an image acquisition device and in-vehicle sound acquired by a sound pickup device;
obtaining the number of people in a first vehicle according to a pressure sensor arranged in the vehicle;
obtaining a second number of people in the vehicle according to the at least one frame of in-vehicle image;
if the first number of people in the vehicle is judged to be matched with the second number of people in the vehicle, judging whether a preset condition is met or not based on the at least one frame of in-vehicle image and the in-vehicle sound, and if so, controlling an image acquisition device to acquire the current frame of in-vehicle image;
wherein, the preset conditions include: the emotion of the personnel in the vehicle is a preset target emotion;
the judging whether the preset condition is met based on the at least one frame of in-vehicle image and the in-vehicle sound comprises the following steps:
Identifying the emotion of the person in the vehicle based on the at least one frame of image in the vehicle, and obtaining a first emotion identification result;
Identifying the emotion of the person in the vehicle based on the sound in the vehicle to obtain a second emotion identification result, wherein the second emotion identification result is obtained according to semantic content and intonation information in the sound in the vehicle;
judging whether the emotion of the in-car personnel is the target emotion according to the first emotion recognition result and the second emotion recognition result.
2. The method of claim 1, the method further comprising:
Determining the number of personnel in the vehicle;
and when the number of the personnel in the vehicle is greater than 1, judging whether a preset condition is met or not based on the at least one frame of image in the vehicle and the sound in the vehicle.
3. The method of claim 2, further comprising:
obtaining a volume decibel value based on the in-vehicle sound;
the preset conditions further include: the sound volume decibel value is larger than a preset decibel threshold value.
4. The method of claim 1, wherein the identifying the emotion of the person in the vehicle based on the at least one frame of in-vehicle image, the obtaining a first emotion identification result comprising:
obtaining face images of all persons in the at least one frame of in-car image;
determining emotion recognition confidence degrees of all the people according to the face images of all the people;
And determining the first emotion recognition result based on the number of people and the emotion recognition confidence of each person.
5. The method of claim 1, wherein the identifying the emotion of the in-vehicle person based on the in-vehicle sound comprises:
acquiring semantic content and intonation information in the in-car sound;
and obtaining the second emotion recognition result according to the semantic content and the intonation information.
6. An in-vehicle image acquisition apparatus comprising:
The information acquisition module is used for acquiring at least one frame of in-vehicle image acquired by the image acquisition device and in-vehicle sound acquired by the pickup device;
The image acquisition module is used for acquiring the number of people in the first vehicle according to the pressure sensor arranged in the vehicle; obtaining a second number of people in the vehicle according to the at least one frame of in-vehicle image; if the first number of people in the vehicle is judged to be matched with the second number of people in the vehicle, judging whether a preset condition is met or not based on the at least one frame of in-vehicle image and the in-vehicle sound, and if so, controlling an image acquisition device to acquire the current frame of in-vehicle image; wherein, the preset conditions include: the emotion of the personnel in the vehicle is a preset target emotion;
The image acquisition module comprises a first emotion recognition unit, a second emotion recognition unit and a target emotion judgment unit;
the first emotion recognition unit is used for recognizing the emotion of the person in the vehicle based on the at least one frame of image in the vehicle to obtain a first emotion recognition result;
The second emotion recognition unit is configured to recognize the emotion of the in-vehicle person based on the in-vehicle sound and obtain a second emotion recognition result, where the second emotion recognition result is obtained according to semantic content and intonation information in the in-vehicle sound;
The target emotion judging unit is used for judging whether the emotion of the person in the vehicle is the target emotion according to the first emotion recognition result and the second emotion recognition result.
7. A computer readable storage medium storing a computer program for performing the method of any one of the preceding claims 1-5.
8. An electronic device, the electronic device comprising:
A processor;
a memory for storing the processor-executable instructions;
The processor being configured to read the executable instructions from the memory and execute the instructions to implement the method of any of the preceding claims 1-5.
CN201910687586.6A 2019-07-29 2019-07-29 In-vehicle image acquisition method and device, electronic equipment and storage medium Active CN112307816B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910687586.6A CN112307816B (en) 2019-07-29 2019-07-29 In-vehicle image acquisition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910687586.6A CN112307816B (en) 2019-07-29 2019-07-29 In-vehicle image acquisition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112307816A CN112307816A (en) 2021-02-02
CN112307816B (en) 2024-08-20

Family

ID=74328899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910687586.6A Active CN112307816B (en) 2019-07-29 2019-07-29 In-vehicle image acquisition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112307816B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113266975B (en) * 2021-04-20 2022-12-27 华人运通(江苏)技术有限公司 Vehicle-mounted refrigerator control method, device, equipment and storage medium
CN113043999B (en) * 2021-04-19 2022-09-13 安徽阿瑞特汽车电子科技有限公司 Wiper control method based on automobile data recorder, electronic equipment and computer storage medium
CN115209048A (en) * 2022-05-19 2022-10-18 广东逸动科技有限公司 Image data processing method and device, electronic equipment and storage medium
CN114821744B (en) * 2022-05-23 2025-07-08 阿里巴巴(中国)有限公司 Virtual character driving method, device and equipment based on expression recognition
CN117333913A (en) * 2022-06-24 2024-01-02 上海哔哩哔哩科技有限公司 Method and device for identifying emotion categories, storage medium and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014096632A (en) * 2012-11-07 2014-05-22 Denso Corp Imaging system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011119963A (en) * 2009-12-03 2011-06-16 Seiko Epson Corp Imaging apparatus and imaging method
CN105700682A (en) * 2016-01-08 2016-06-22 北京乐驾科技有限公司 Intelligent gender and emotion recognition detection system and method based on vision and voice
CN106373569B (en) * 2016-09-06 2019-12-20 北京地平线机器人技术研发有限公司 Voice interaction device and method
CN109690601B (en) * 2016-09-30 2022-11-18 本田技研工业株式会社 Information providing device and mobile body
CN106650633A (en) * 2016-11-29 2017-05-10 上海智臻智能网络科技股份有限公司 Driver emotion recognition method and device
KR101968723B1 (en) * 2017-10-18 2019-04-12 네이버 주식회사 Method and system for providing camera effect
CN109087670B (en) * 2018-08-30 2021-04-20 西安闻泰电子科技有限公司 Emotion analysis method, system, server and storage medium
CN109683709A (en) * 2018-12-17 2019-04-26 苏州思必驰信息科技有限公司 Man-machine interaction method and system based on Emotion identification
CN109660728B (en) * 2018-12-29 2021-01-08 维沃移动通信有限公司 Photographing method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014096632A (en) * 2012-11-07 2014-05-22 Denso Corp Imaging system

Also Published As

Publication number Publication date
CN112307816A (en) 2021-02-02

Similar Documents

Publication Publication Date Title
CN112307816B (en) In-vehicle image acquisition method and device, electronic equipment and storage medium
JP6977004B2 (en) In-vehicle devices, methods and programs for processing vocalizations
EP4139816B1 (en) Voice shortcut detection with speaker verification
US12046237B2 (en) Speech interaction method and apparatus, computer readable storage medium and electronic device
JP6466385B2 (en) Service providing apparatus, service providing method, and service providing program
JP7192222B2 (en) speech system
US12014738B2 (en) Arbitrating between multiple potentially-responsive electronic devices
US20240304201A1 (en) Audio-based processing method and apparatus
US9311930B2 (en) Audio based system and method for in-vehicle context classification
CN113126951B (en) Audio playing method and device, computer readable storage medium and electronic equipment
CN114038457B (en) Method, electronic device, storage medium, and program for voice wakeup
CN109302486B (en) Method and system for pushing music according to environment in vehicle
CN112541425B (en) Emotion detection method, emotion detection device, emotion detection medium and electronic equipment
JP2020091302A (en) Emotion estimation device, emotion estimation method, and program
JP6891601B2 (en) Robot control programs, robot devices, and robot control methods
CN116153311A (en) Audio processing method, device, vehicle and computer readable storage medium
Giri et al. Enhancing safety in vehicles using emotion recognition with artificial intelligence
JP6833147B2 (en) Information processing equipment, programs and information processing methods
JP2018180424A (en) Speech recognition apparatus and speech recognition method
CN114974245B (en) Speech separation method and device, electronic device and storage medium
KR20210063698A (en) Electronic device and method for controlling the same, and storage medium
CN110232911B (en) Singing following recognition method and device, storage medium and electronic equipment
CN114710733A (en) Voice playing method and device, computer readable storage medium and electronic equipment
WO2020087534A1 (en) Generating response in conversation
US20230290342A1 (en) Dialogue system and control method thereof

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant