Detailed Description
In order to more clearly illustrate the general inventive concept, a detailed description is given below by way of example with reference to the accompanying drawings.
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
An embodiment of the application provides a supermarket service system which, referring to Fig. 1, may include:
An image acquisition module 10, arranged in a distributed manner at first preset positions in the supermarket and configured to acquire an environment image of the supermarket, wherein the environment image comprises a personnel behavior image and a commodity state image; an audio acquisition module 20, arranged in a distributed manner at second preset positions in the supermarket and configured to acquire audio information in the supermarket; a data analysis module 30, connected to the image acquisition module 10 and the audio acquisition module 20 respectively and configured to generate a customer service strategy and a commodity management strategy based on the environment image and the audio information; and an execution module 40, connected to the data analysis module 30 and configured to execute the customer service strategy and the commodity management strategy.
The image acquisition module 10 may be a camera installed in the supermarket, a camera carried by a supermarket service robot, or the like. The supermarket environment image acquired by the image acquisition module 10 may include images of customers and images of commodities, specifically a behavior image of the customer and a commodity state image.
The behavior image may be acquired by a camera installed in the store, for example by capturing a video stream or images of a user while the user purchases goods and moves about in the supermarket. For example, when a user enters the store and starts purchasing commodities, a camera in the store captures images or video of customers entering its shooting area as behavior information of the user in the store. Taking commodity purchasing behavior as an example, acquisition of the user's behavior information starts when a first target action of the user is detected and ends when a second target action of the user is detected. Specifically, the first target action characterizes the user as starting to select a commodity, for example, the user stays in front of a certain commodity or a certain area for longer than a preset time period, or the user picks up a certain commodity. The second target action characterizes the user as pausing the selection, for example, the user takes the commodity away or puts it back on the shelf. In this embodiment, the behavior information of the user between the first target action and the second target action may be acquired. In this embodiment, movement information of the user in the supermarket may also be collected, for example, gait images during walking, position information in the supermarket, face information, and the like.
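The acquisition window between the first target action and the second target action can be sketched as follows. This is only a minimal illustration, assuming that an upstream detector has already labeled each frame with an action; the label names (`dwell`, `pick_up`, `take_away`, `put_back`) and the function `extract_segments` are hypothetical and do not appear in the application.

```python
from typing import List, Optional, Tuple

# Hypothetical per-frame action labels (assumption): the application only
# speaks of a "first target action" and a "second target action".
FIRST_ACTIONS = {"dwell", "pick_up"}          # user starts selecting a commodity
SECOND_ACTIONS = {"take_away", "put_back"}    # user pauses the selection


def extract_segments(actions: List[Optional[str]]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame index pairs.

    A segment opens at the first frame showing a first target action and
    closes at the next frame showing a second target action.
    """
    segments = []
    start = None
    for index, action in enumerate(actions):
        if start is None and action in FIRST_ACTIONS:
            start = index
        elif start is not None and action in SECOND_ACTIONS:
            segments.append((start, index))
            start = None
    return segments


# Toy sequence of per-frame labels; None means no target action in that frame.
frame_actions = [None, "dwell", None, "pick_up", None, "put_back",
                 None, "pick_up", "take_away"]
segments = extract_segments(frame_actions)
print(segments)
```

Frames falling inside each returned pair would be kept as the behavior information of that purchasing episode; a segment that is opened but never closed is discarded in this sketch.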
The commodity state image may be a real-time state image of a shelf captured by the camera in real time. As an alternative embodiment, a video stream containing the shelf is acquired, and shelf images are cropped from the video stream at preset time intervals according to preset coordinates to serve as the real-time state images. For example, a preset number of frames of the video stream may be captured at intervals of a preset duration, and the position of the shelf may be calibrated in the frames, for example by calibrating the upper-left and lower-right corner coordinates of the shelf in the image. The shelf image is then cropped using the calibrated coordinates and used as the real-time state image of the shelf. The real-time state image of the shelf may include an image of the state of the commodities on the shelf.
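The cropping step can be illustrated with a short sketch. It assumes, purely for illustration, that a frame is a nested list of pixel rows and that the preset interval is expressed as a frame count; the names `crop_shelf`, `shelf_snapshots` and `every_n` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Frame = List[List[int]]            # rows of pixel values (toy representation)
Box = Tuple[int, int, int, int]    # (x1, y1, x2, y2): upper-left, lower-right


def crop_shelf(frame: Frame, box: Box) -> Frame:
    """Cut the calibrated shelf rectangle out of one frame."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def shelf_snapshots(frames: Iterable[Frame], box: Box,
                    every_n: int) -> Iterator[Frame]:
    """Yield a cropped shelf image for every `every_n`-th frame."""
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            yield crop_shelf(frame, box)


# Five toy frames of 4 rows x 6 columns; each pixel encodes frame/row/column.
frames = [[[f * 100 + r * 10 + c for c in range(6)] for r in range(4)]
          for f in range(5)]
snapshots = list(shelf_snapshots(frames, box=(1, 1, 4, 3), every_n=2))
print(len(snapshots), snapshots[1])
```

With a real camera the frames would come from a video decoder and the interval would typically be derived from the frame rate and the preset duration.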
The audio acquisition module 20 may be a microphone installed near the image acquisition device, and may collect various audio information in the supermarket in real time. After obtaining the environment image acquired by the image acquisition module 10 and the audio information acquired by the audio acquisition module 20, the data analysis module 30 may analyze the behavior image of the customer and the commodity state image in the environment image and, in combination with the audio information, determine the service required by the customer and the management required by the commodities. It then generates a customer service strategy and a commodity management strategy, which are executed by the execution module 40. This reduces the labor cost of the supermarket and improves management and service efficiency.
As an exemplary embodiment, as shown in Fig. 2, the data analysis module 30 may include a behavior analysis unit 31. The behavior analysis unit 31 may construct a user portrait based on the behavior image and/or the audio information, and provide service information corresponding to the user based on a comparison of the user portrait with preset user portraits, the service information including at least one of commodity advertisements, supermarket discount information, commodity recommendation information and service recommendation information.
For example, after the environment image and the audio information are obtained, the environment image is analyzed to identify information such as the behavior information, face information, age and gender of a customer, and the information required by the customer is identified from the audio information. For example, ambient sound may be collected through multiple microphones, effective human speech may be extracted through a noise reduction algorithm, and speech recognition may be performed to obtain the current demand information of the customer. A current user portrait of the customer can therefore be constructed from the identified behavior information, face information, age, gender and other information together with the current demand information. The current user portrait is then compared with the pre-stored preset user portraits to find a preset user portrait whose similarity exceeds a preset similarity, and the service corresponding to that preset user portrait is provided, for example, corresponding commodity advertisements, supermarket discount information, commodity recommendation information, service recommendation information and the like. This improves the user experience and further enhances customers' familiarity with the supermarket.
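The comparison of a current user portrait with preset user portraits can be sketched as follows. The application does not specify the similarity measure; this sketch assumes that portraits are encoded as numeric feature vectors and compared by cosine similarity. The portrait names, vectors and function names are hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_portrait(current: List[float],
                   presets: Dict[str, List[float]],
                   min_similarity: float) -> Optional[str]:
    """Return the best preset portrait whose similarity exceeds the threshold."""
    best_name, best_score = None, min_similarity
    for name, vector in presets.items():
        score = cosine_similarity(current, vector)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical preset portraits (assumption: three encoded features each).
PRESET_PORTRAITS = {
    "young_snack_buyer": [1.0, 0.0, 1.0],
    "family_grocery_buyer": [0.0, 1.0, 1.0],
}
current_portrait = [0.9, 0.1, 0.8]
matched = match_portrait(current_portrait, PRESET_PORTRAITS, min_similarity=0.8)
print(matched)
```

The returned name would then be used to look up the advertisements, discounts or recommendations associated with that preset portrait; when no preset exceeds the threshold the function returns `None` and no targeted service is selected.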
As an exemplary embodiment, the data analysis module 30 may further include a commodity analysis unit 32 for identifying commodity states based on the commodity state images and counting commodities based on the commodity states, the commodity states including at least one of a commodity out-of-stock condition, commodity uniformity, and commodity name. Specifically, after the commodity state image is obtained, the real-time commodity state image is input into a pre-trained commodity state detection model to obtain commodity state information. The commodity state detection model may be a neural network model, for example, a convolutional neural network model, a deep convolutional neural network model, and the like, and may be pre-trained using training samples. Specifically, the model may be trained on collected data labeled with the out-of-stock condition, the uniformity and the name of the commodities. In this embodiment, the training process is described taking out-of-stock detection as an example:
The commodities and the out-of-stock positions in the training images are manually annotated, and the shelf model is trained in a hardware environment with a GPU so as to speed up model training. The network is trained using mini-batch stochastic gradient descent (SGD) with momentum. The number of image samples per batch (batch size) is set to 8, the momentum factor is set to a fixed value of 0.95, and the weight decay is 4 × 10⁻⁴. The initialization of the weights affects the convergence rate of model training; the initial learning rate is set to 2 × 10⁻⁴, and during training, data augmentation is performed using random cropping and horizontal flipping. The final size of the training images is 608 × 608. When the loss has essentially converged to a stable value below 0.1, the network model is considered to have reached the expected training effect, training is stopped, and the final model is obtained. In this embodiment, a Histogram of Oriented Gradients (HOG) feature is used, so that image features do not need to be analyzed manually; the feature is also insensitive to changes in image color and brightness and is therefore highly robust.
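For reference, a single parameter update of SGD with momentum and weight decay can be sketched as below, using the hyperparameters stated above (momentum 0.95, weight decay 4 × 10⁻⁴, learning rate 2 × 10⁻⁴). The application does not give the exact update rule; the sketch assumes one common formulation in which weight decay is added to the gradient as an L2 term. The function name `sgd_momentum_step` is hypothetical, and the gradient is assumed to be already averaged over a mini-batch of 8 images.

```python
from typing import List, Tuple

LEARNING_RATE = 2e-4   # initial learning rate from the description
MOMENTUM = 0.95        # fixed momentum factor from the description
WEIGHT_DECAY = 4e-4    # weight decay from the description


def sgd_momentum_step(weights: List[float],
                      grads: List[float],
                      velocity: List[float],
                      lr: float = LEARNING_RATE,
                      momentum: float = MOMENTUM,
                      weight_decay: float = WEIGHT_DECAY
                      ) -> Tuple[List[float], List[float]]:
    """One update (assumed formulation):

    g <- grad + weight_decay * w
    v <- momentum * v - lr * g
    w <- w + v
    """
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g_total = g + weight_decay * w       # L2 weight decay as a gradient term
        v_next = momentum * v - lr * g_total
        new_velocity.append(v_next)
        new_weights.append(w + v_next)
    return new_weights, new_velocity


# Toy example: one weight, gradient already averaged over the mini-batch.
weights, velocity = [1.0], [0.0]
weights, velocity = sgd_momentum_step(weights, [0.5], velocity)
print(weights, velocity)
```

In practice a deep learning framework's built-in optimizer would perform this update for all network parameters; the sketch only makes the role of each hyperparameter explicit.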
As an alternative embodiment, the training sample images may be labeled with the out-of-stock condition, the uniformity and the name of the commodities before training. After the real-time state of the commodities is obtained, automatic commodity counting is carried out based on it, so that manual checking and counting of the commodities one by one can be omitted, which saves labor cost and improves the efficiency of commodity counting.
In the above embodiment, the commodity analysis unit 32 identifies the commodity state. As an exemplary embodiment, the behavior analysis unit 31 is further configured to identify the emotion category of the user when purchasing a commodity based on the behavior image and the audio information, to evaluate the user's degree of preference for a certain commodity or a certain class of commodities based on the emotion category in combination with the commodity state, and to carry out real-time marketing interaction with the user according to the degree of preference in combination with a marketing database.
After the behavior image and the audio information of the user selecting a commodity are acquired, biometric information exhibited by the user while viewing and selecting the commodity can be extracted from them. The biometric information may be, for example, facial expression, demeanor, limb actions and the like, and it can reflect the user's attitude towards the current commodity, such as like, dislike or neutral. As an exemplary embodiment, the biometric information may be extracted using an extraction method corresponding to its type. For example, a face image or a limb action may be extracted with a target detection algorithm, such as a Fast R-CNN model, an SSD model or a YOLO model. The voice information may be denoised and then classified with a voice classification model to obtain the user audio information. The above methods for extracting biometric information are merely illustrative, and other methods for extracting biometric information from audio, video or images are also applicable in this embodiment.
The emotion category may be recognized with an emotion recognition model trained on sample data, with a large amount of biometric information used as training samples and the emotion category used as the model output. In this embodiment, the biometric information may be of several types; accordingly, the emotion recognition model may be a set of models in one-to-one correspondence with the types of biometric information, or a single model trained jointly on the various types of biometric information. The emotion recognition model may be a binary or multi-class convolutional neural network model. After the emotion recognition model is trained, the extracted biometric information is input into the model to obtain the confidence of each emotion category for that biometric information, and the emotion category whose confidence is greater than a preset value is taken as the output of the emotion recognition model. In embodiments of the present application, the emotion categories may be, for example, like and dislike. Multi-class results are also possible, for example, like, dislike, neutral, and so on.
The emotion category is identified from the emotion the user exhibits when selecting and viewing a certain commodity. Associating the emotion with the displayed commodity yields the user's current attitude towards that commodity. In this way, the commodities the user currently needs or is interested in, as well as those the user is not interested in, can be accurately known, and the user's demand can be predicted in real time so that targeted marketing can be carried out for the user's real-time demand.
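The selection of the output category by confidence can be sketched as follows. The sketch assumes that the emotion recognition model returns one confidence per category; the function name `select_emotion` and the numeric values are hypothetical.

```python
from typing import Dict, Optional


def select_emotion(confidences: Dict[str, float],
                   threshold: float) -> Optional[str]:
    """Return the most confident emotion category if it exceeds the preset value.

    Returns None when no category is confident enough, so that the caller
    can skip marketing actions based on an unreliable prediction.
    """
    label, score = max(confidences.items(), key=lambda item: item[1])
    return label if score > threshold else None


# Hypothetical model outputs for two observations.
confident = select_emotion({"like": 0.72, "dislike": 0.20, "neutral": 0.08}, 0.6)
uncertain = select_emotion({"like": 0.40, "dislike": 0.35, "neutral": 0.25}, 0.6)
print(confident, uncertain)
```

Returning no category for low-confidence predictions is a design choice of this sketch; the application only states that the category whose confidence exceeds the preset value is used as the output.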
As an exemplary embodiment, the emotion category may be recognized by fusing features from multiple aspects. Illustratively, facial features, voiceprint features and gesture features are extracted from the face image, the voice information and the limb actions, respectively; the facial features are input into a pre-trained facial emotion recognition model to obtain a facial emotion category; the voiceprint features are input into a pre-trained voice emotion recognition model to obtain a voice emotion category; the gesture features are input into a pre-trained gesture emotion recognition model to obtain a gesture emotion category; and emotion fusion is carried out on the facial emotion category, the voice emotion category and the gesture emotion category based on an emotion fusion model to obtain the user emotion category.
When the emotion categories are fused, the face pose, the voice clarity and the action amplitude may be determined from the face image, the voice information and the limb actions, respectively; the real-time fusion weights of the facial emotion category, the voice emotion category and the gesture emotion category in the emotion fusion model are determined based on the face pose, the voice clarity and the action amplitude; and emotion fusion is carried out on the facial emotion category, the voice emotion category and the gesture emotion category based on the real-time fusion weights and the emotion fusion model to obtain the user emotion category. Specifically, the weights may be determined by acquiring preset fusion weights corresponding to the facial emotion category, the voice emotion category and the gesture emotion category, respectively; comparing the face pose, the voice clarity and the action amplitude with a face reference pose, a voice reference clarity and an action reference amplitude, respectively; and adjusting the preset fusion weights based on the comparison results to obtain the real-time fusion weights, wherein the fusion weights sum to 1. That is, the actually extracted face pose, voice clarity and action amplitude are compared with the face reference pose, the voice reference clarity and the action reference amplitude in a preset state, respectively, to obtain the adjustment proportion of the corresponding preset fusion weight. For example, when the face captured in the current pose is only sixty percent as complete as in the face reference pose, the preset fusion weight of the facial emotion category is reduced by the corresponding proportion, and the preset fusion weights of the voice emotion category and the gesture emotion category may be adjusted accordingly.
In this embodiment, the adjustment may be performed according to a proportion, or may be performed according to another manner, so as to obtain an actual fusion weight of each emotion category.
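The proportional weight adjustment and the subsequent fusion can be sketched as follows. The application leaves the emotion fusion model unspecified; this sketch assumes a simple weighted sum of per-modality category scores, scales each preset weight by the ratio of the measured quality to its reference (capped at 1), and renormalizes so that the weights sum to 1. All names and numeric values are hypothetical.

```python
from typing import Dict


def realtime_weights(preset: Dict[str, float],
                     quality_ratio: Dict[str, float]) -> Dict[str, float]:
    """Scale each preset fusion weight by measured/reference quality, then renormalize."""
    scaled = {m: preset[m] * min(1.0, max(0.0, quality_ratio[m])) for m in preset}
    total = sum(scaled.values())
    if total == 0:
        return dict(preset)          # no usable signal: fall back to the presets
    return {m: value / total for m, value in scaled.items()}


def fuse_emotions(scores: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> str:
    """Weighted sum of per-modality category scores (assumed fusion model)."""
    fused: Dict[str, float] = {}
    for modality, categories in scores.items():
        for category, score in categories.items():
            fused[category] = fused.get(category, 0.0) + weights[modality] * score
    return max(fused, key=fused.get)


preset_weights = {"face": 0.5, "voice": 0.3, "gesture": 0.2}
# The face is only 60% as complete as in the reference pose; voice and gesture are fine.
ratios = {"face": 0.6, "voice": 1.0, "gesture": 1.0}
weights = realtime_weights(preset_weights, ratios)

modality_scores = {
    "face": {"like": 0.4, "dislike": 0.6},
    "voice": {"like": 0.9, "dislike": 0.1},
    "gesture": {"like": 0.7, "dislike": 0.3},
}
user_emotion = fuse_emotions(modality_scores, weights)
print(weights, user_emotion)
```

In this toy case the partially occluded face would vote for "dislike", but after its weight is reduced the clearer voice and gesture signals dominate the fused result.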
Emotion fusion is then carried out on the facial emotion category, the voice emotion category and the gesture emotion category based on the actual fusion weights and the emotion fusion model to obtain the user emotion category. In this exemplary embodiment, each emotion category is adjusted in real time during fusion based on the real-time fusion weights, which reduces the influence of the shooting environment on the prediction result, improves the accuracy of determining the user's real-time emotional state, and in turn improves the accuracy of real-time marketing strategies for the user.
As an exemplary embodiment, the commodity analysis unit 32 is further configured to determine the sales heat of each commodity and the sales heat of each shelf position based on the commodity state, and to determine the types of commodities on the current shelf and the placement positions of the various commodities by combining the sales heat of the commodities and the sales heat of the shelf positions, so as to manage the commodities more reasonably.
For example, after the commodity state is obtained, the sales heat of each commodity and the sales heat of each shelf position may be determined by counting the out-of-stock degree, the out-of-stock positions and the like of each commodity over a certain period, for example, a day, a week, or a longer or shorter period. A shelf position with high sales heat may be a position where out-of-stock conditions occur with higher probability, for example, the top of the shelf, the middle layer of the shelf, and so on. After the sales heat of each commodity is obtained, commodities whose sales heat is greater than a preset heat are selected as hot-selling commodities, and the commodities that pair with them are analyzed. For example, if milk is a hot-selling commodity, its paired commodities may be breakfast bread or breakfast biscuits; if wine is a hot-selling commodity, its paired commodities may be corresponding snacks, and so on. This serves to improve overall commodity sales, and the placement of the commodities is adjusted based on feedback from the identified purchase information of users (derived from the commodity state, such as out-of-stock conditions and out-of-stock positions), so that the placement meets customers' purchasing needs.
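The counting described above can be sketched as follows. The sketch assumes that each detected out-of-stock event is recorded as a (commodity, shelf position) pair over the statistics period and that the number of such events is used as a proxy for sales heat; the event data, position labels and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def sales_heat(events: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Count out-of-stock events per commodity and per shelf position."""
    events = list(events)
    commodity_heat = Counter(commodity for commodity, _ in events)
    position_heat = Counter(position for _, position in events)
    return commodity_heat, position_heat


def hot_commodities(heat: Counter, min_heat: int) -> List[str]:
    """Commodities whose heat is greater than the preset heat."""
    return sorted(name for name, count in heat.items() if count > min_heat)


# Hypothetical out-of-stock events observed during one week.
week_events = [
    ("milk", "middle"), ("milk", "top"), ("bread", "middle"),
    ("milk", "middle"), ("wine", "bottom"),
]
commodity_heat, position_heat = sales_heat(week_events)
hot = hot_commodities(commodity_heat, min_heat=2)
hottest_position = position_heat.most_common(1)[0][0]
print(hot, hottest_position)
```

The hot-selling commodities and their paired commodities could then be assigned to the shelf positions with the highest heat; that assignment step is not shown here.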
As an exemplary embodiment, the user's preferences may also be analyzed based on the environment image so that the commodities can be managed more reasonably according to those preferences. Illustratively, the data analysis module 30 further includes a user preference analysis unit 33, configured to extract the user's face information, gait information, position information and time information based on the behavior image, and to obtain shopping preference information of the user based on the face information, the gait information, the position information, the time information and the commodity state, where the shopping preference information includes at least one of the in-store shopping state, the shopping period and the commodity type preference of users with different user portraits.
Once the user portrait, the user shopping preference information, the emotion category of the user when shopping and the commodity state are obtained, the supermarket's stocking needs can be predicted based on the analysis results. For example, the shopping preference information and the commodity state are counted to obtain a statistical result, and the types of commodities to be replenished and the placement positions of the various commodities in a preset period are predicted based on the statistical result. As an exemplary embodiment, the user portraits, the user shopping preference information, the emotion categories of users when shopping, and the commodity states over a period of time may be counted, for example, the classification of user portraits, the shopping preferences of each class of user portrait, favorite commodities, shopping periods, the commodities purchased in each shopping period, and the like. Based on the statistical result, goods can be stocked or placed by time period for users with different user portraits. In this way, stocking and commodity placement can be better targeted at different customers, which improves the shopping experience of users and increases the sales performance of the supermarket. In this embodiment, targeted promotions and discounts can also be provided based on the user portraits and user shopping preferences predicted for a certain period of time.
This is illustrated with a specific example: shopping big data may be collected, such as the shopping periods of users with different user portraits, the commodities they purchase, their emotion categories when selecting commodities, and the sales heat of the shelf positions. From this data, the commodities that users with different user portraits like, dislike and purchase in different shopping periods, as well as the hot-selling shelf positions and the like, can be obtained, and the types of commodities to stock with priority and the placement positions of the commodities can be adjusted specifically according to the analysis of the shopping big data.
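The statistics by user portrait and shopping period can be sketched as follows. The sketch assumes that each purchase is recorded as a (portrait class, shopping period, commodity) triple and that the most frequently purchased commodities per group indicate what to replenish; the records and the function name `restock_plan` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Purchase = Tuple[str, str, str]   # (portrait class, shopping period, commodity)


def restock_plan(purchases: Iterable[Purchase],
                 top_n: int = 1) -> Dict[Tuple[str, str], List[str]]:
    """For each (portrait class, period), list the most purchased commodities."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for portrait, period, commodity in purchases:
        counts[(portrait, period)][commodity] += 1
    return {key: [name for name, _ in counter.most_common(top_n)]
            for key, counter in counts.items()}


# Hypothetical purchase records collected over a statistics period.
purchase_log = [
    ("office_worker", "morning", "milk"),
    ("office_worker", "morning", "bread"),
    ("office_worker", "morning", "milk"),
    ("retiree", "morning", "vegetables"),
    ("office_worker", "evening", "wine"),
]
plan = restock_plan(purchase_log)
print(plan)
```

A real prediction step would likely also weigh emotion categories and commodity states; the sketch covers only the counting that such a prediction would build on.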
The supermarket service system may also implement automatic store monitoring. Specifically, the data analysis module 30 is further configured to input the environment image into an early warning detection model to obtain an early warning detection result, where the early warning detection model is trained on image information labeled with early warning features, and the early warning features include fire early warning features or theft early warning features. When the early warning detection result indicates an early warning condition, the execution module 40 executes the early warning.
Taking the theft early warning feature as an example, the camera monitors the supermarket while algorithms such as human body recognition are used to determine whether anyone is active in the supermarket and whether the activity matches the preset theft features. When a match is found, services such as a voice prompt, a remote call or an abnormality alarm can be carried out.
As an exemplary embodiment, the image acquisition module 10 and the audio acquisition module 20 of the supermarket service system are integrated in a service robot, and the service robot further comprises a display device for displaying customer service content. Illustratively, the robot includes a screen, a microphone sensor, a plurality of cameras, a loudspeaker, a control unit and a battery. The screen is used to display the robot's expression, commodity information and other information. The microphone sensor is used to capture sounds in the environment for speech recognition. The cameras are used to monitor the environment, and the acquired images are used for image processing. The loudspeaker is used to play various audio. The control unit is the core of the robot; it is responsible for processing the data of the various sensors, exchanging data with a background server through a network, and driving the screen display.
As an exemplary embodiment, the data analysis module 30 may be included in the background server and generates the customer service strategy and the commodity management strategy from the environment image and the audio information.
In the foregoing embodiments of the present application, each embodiment is described with its own emphasis; for a portion that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described apparatus embodiments are merely exemplary. For example, the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection shown or discussed between components may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution provided in the present embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Matters not described in the present application may be implemented by adopting or referring to the prior art.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.