Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort are intended to be within the scope of the application.
To facilitate understanding of the hot music mining method provided by the present application, the system in which the method is used is described first. Fig. 1 is an architecture diagram of a hot music mining system according to an embodiment of the present application. As shown in fig. 1, the system includes a server 10.
In the application, the server 10 is used for executing the hot music mining method, which comprises the steps of determining a seed user from all users of a music platform, determining music to be mined based on operation behavior data of the seed user, constructing a target feature vector of the music to be mined based on the operation behavior data and basic information of the music to be mined, and inputting the target feature vector into a trained neural network model to obtain a prediction result of potential hot music.
It may be appreciated that the server 10 may further be provided with a neural network model for predicting potential hot music. The input of the neural network model is the target feature vector constructed from the seed user's operation behavior data on the music to be mined and the basic information of the music to be mined. The output of the neural network model may be a potential value corresponding to each piece of music to be mined; a piece of music to be mined whose potential value is greater than a preset potential value is potential hot music.
Further, the hot music mining system may also include a plurality of clients 20 that establish a communication connection with the server 10. The clients 20 may include fixed terminals such as a PC (Personal Computer) and mobile terminals such as a mobile phone, and are used for displaying the potential hot music.
The embodiment of the application discloses a hot music mining method, which improves the accuracy of hot music mining.
Referring to fig. 2, a flowchart of a hot music mining method provided by an embodiment of the present application, as shown in fig. 2, includes:
S101, determining a seed user in all users of a music platform, and constructing a first feature vector based on operation behavior data of the seed user;
The execution subject of this embodiment is the server, and the aim is to screen potential hot music in the music platform. In this step, the seed user is first determined among all users of the music platform. Because the seed user has a strong appreciation capability for, or influence on, music, the music related to the seed user has a greater chance of becoming potential hot music.
As a possible implementation, determining the seed user among all users of the music platform includes: determining current hot music in the music platform and the cold time period corresponding to the current hot music, and determining a user who had an operation behavior on the current hot music during the cold time period as the seed user. In this embodiment, a user who successfully predicted the current hot music, that is, a user with a prospective operation behavior, is regarded as a high-quality user with prior knowledge, and the high-quality user is taken as a seed user. The current hot music may be music whose current click rate is greater than a preset click rate. The cold time period of the current hot music is a period during which its click rate was less than the preset click rate, and a prospective operation behavior includes playing (whether complete or normal playing), commenting on, collecting or sharing the current hot music during that period. In a specific implementation, the current hot music in the music platform is analyzed, the corresponding cold time period is determined by tracing back in time, and the users who had an operation behavior on the current hot music during the cold time period are the high-quality users. As a preferred embodiment, the number and duration of the prospective operation behaviors may also be analyzed, and a user whose number of prospective operation behaviors is greater than a preset number, whose duration is greater than a preset duration, or who satisfies both conditions may be determined as the seed user.
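The cold-period screening described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `Operation` record, the `find_seed_users` function, the integer timestamps and the `min_count` parameter (standing in for the preset number of prospective operation behaviors) are all hypothetical names introduced here.

```python
from collections import Counter
from typing import NamedTuple


class Operation(NamedTuple):
    """One piece of operation behavior data (hypothetical record layout)."""
    user_id: str
    music_id: str
    behavior: str   # e.g. "play", "comment", "collect", "share"
    timestamp: int  # any monotonically increasing time unit


def find_seed_users(operations, cold_periods, min_count=1):
    """Return users with at least `min_count` operations on current hot
    music that fall inside that music's cold time period.

    `cold_periods` maps each current hot music id to its (start, end)
    cold time period; music absent from the mapping is not hot music.
    """
    counts = Counter()
    for op in operations:
        period = cold_periods.get(op.music_id)
        if period is None:
            continue  # not current hot music
        start, end = period
        if start <= op.timestamp < end:
            counts[op.user_id] += 1  # a prospective operation behavior
    return {user for user, n in counts.items() if n >= min_count}


ops = [
    Operation("u1", "m1", "play", 5),
    Operation("u1", "m1", "share", 8),
    Operation("u2", "m1", "play", 50),  # after m1 had already become hot
    Operation("u3", "m2", "play", 3),   # m2 is not current hot music
]
cold = {"m1": (0, 10)}
seed_users = find_seed_users(ops, cold, min_count=2)
```

Here only `u1` operated on `m1` during its cold time period, so only `u1` is kept. A duration-based condition could be added in the same loop if play durations were recorded.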
As another possible implementation, determining the seed user among all users of the music platform includes: determining high-quality music content in the music platform, and determining a user who has an operation behavior on the high-quality music content as the seed user. In this embodiment, user generated content (UGC) produced by ordinary users in the music platform is analyzed to screen out high-quality music content. The high-quality music content may be measured by, for example, the number of high-quality comments produced by a user or the number of high-quality singing videos or podcasts released. A user who produces high-quality music content, or who comments on, collects or analyzes such content, is regarded as a high-creativity user, and the high-creativity user is taken as a seed user.
As a further possible implementation, determining the seed user among all users of the music platform includes: determining a user whose influence is greater than a preset influence as the seed user, where the influence includes influence within the music platform and/or influence on other platforms. In this embodiment, the influence of all users in the music platform is analyzed, and users with larger influence are taken as seed users. In a specific implementation, the influence of a user may be represented by the number of followers, the number of works or the interaction data of the user, and may be analyzed within the music platform, on other platforms, or both.
Furthermore, the embodiment may further analyze the popularity, freshness, coldness, etc. of the song listening preference of the user, so as to screen out the seed users having the opportunity to predict the popular music, which is not limited in detail.
On the basis of this step, as a preferred implementation, after the seed user is determined among all users of the music platform, the method further includes: constructing user features of the seed users based on user information and music preference portraits of the seed users; selecting non-seed training users from the non-seed users, and constructing user features of the non-seed training users based on their user information and music preference portraits; training a classification model using the user features of the seed users and the user features of the non-seed training users; inputting the user information and music preference portraits of the non-seed users other than the non-seed training users into the trained classification model to predict extended seed users; and adding the extended seed users to the seed users.
In a specific implementation, user features are constructed from a user's user information and music preference portrait. The user information may include age, gender, region, school and the like. The music preference portrait describes the music the user prefers and may include the user's preferred music style, preferred singer type and the like, which is not specifically limited here. The classification model is trained with the user features of the seed users as positive samples and the user features of some non-seed users as negative samples; the classification model may be a logistic regression model or the like, which is not specifically limited here. The trained classification model is used to predict on the remaining non-seed users, and extended seed users, whose taste, creativity and influence are each relatively low but whose overall capability is strong, are added to the seed users. Furthermore, the attention relationship network and the sharing propagation network of the seed users may be analyzed, and the seed users may be expanded within these networks.
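As an illustration of the seed user expansion described above, the following sketch trains a small logistic regression classifier with plain gradient descent and uses it to predict extended seed users. It assumes the user features have already been encoded as numeric vectors; the two-dimensional toy features, the learning rate, the epoch count and the 0.5 decision threshold are hypothetical choices, and a library implementation could equally be used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(weights, bias, x):
    """Probability that a user with features `x` is a seed user."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy user features (assumed to be pre-encoded and scaled to [0, 1]).
seed_features = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.9]]          # positive samples
non_seed_training = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]      # negative samples
X = seed_features + non_seed_training
y = [1] * len(seed_features) + [0] * len(non_seed_training)

weights, bias = train_logistic_regression(X, y)

# Remaining non-seed users, scored by the trained classifier.
candidates = {"u7": [0.85, 0.9], "u8": [0.15, 0.1]}
expanded = [uid for uid, feats in candidates.items()
            if predict_proba(weights, bias, feats) > 0.5]
```

The users in `expanded` would then be added to the seed users before the network topology is constructed.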
Next, a first feature vector is constructed based on the operation behavior data of the seed user, where each piece of operation behavior data includes an operation behavior, the seed user who performed the operation behavior, and the music operated on. Constructing the first feature vector based on the operation behavior data of the seed user includes: constructing a network topology based on the operation behavior data of the seed user, where nodes in the network topology represent the seed users who performed the operation behaviors or the music operated on, and edges between nodes represent the operation behaviors; traversing the network topology according to a preset rule to obtain a music sequence to be mined; and performing feature representation on the music sequence to be mined to obtain the first feature vector. In a specific implementation, the network topology is constructed based on the seed users' operation behavior data on music. The nodes of the network topology are the seed users and the music operated on by the seed users, that is, the music to be mined, and the edges between nodes represent the seed users' operation behaviors on the music to be mined. The network topology may therefore be traversed with an algorithm such as DeepWalk or node2vec to determine the music sequence to be mined, which contains the identifiers of a plurality of pieces of music to be mined. Further, the word2vec algorithm is used to perform feature representation on the music sequence to be mined to obtain the first feature vector.
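The topology construction and traversal can be sketched as follows. This is a simplified, DeepWalk-style uniform random walk written with the standard library only; it is not the DeepWalk or node2vec reference implementation, and the tuple layout of the operation data, the function names and the walk parameters are assumptions. The embedding step with word2vec is not shown; the resulting sequences of music identifiers would be passed to a word2vec implementation as its training sentences.

```python
import random
from collections import defaultdict


def build_topology(operations):
    """Build an undirected user-music graph.

    Each operation is a (user_id, behavior, music_id) tuple. Nodes are
    ("user", id) or ("music", id); each edge stores the behaviors it represents.
    """
    graph = defaultdict(lambda: defaultdict(list))
    for user_id, behavior, music_id in operations:
        user_node, music_node = ("user", user_id), ("music", music_id)
        graph[user_node][music_node].append(behavior)
        graph[music_node][user_node].append(behavior)
    return graph


def random_walk(graph, start, length, rng):
    """Uniform random walk visiting `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = sorted(graph[walk[-1]])
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


def music_sequences(graph, walks_per_node, length, seed=0):
    """Walk from every music node and keep only the music ids of each walk."""
    rng = random.Random(seed)
    sequences = []
    for node in sorted(graph):
        if node[0] != "music":
            continue
        for _ in range(walks_per_node):
            walk = random_walk(graph, node, length, rng)
            sequences.append([name for kind, name in walk if kind == "music"])
    return sequences


ops = [
    ("u1", "play", "m1"),
    ("u1", "share", "m2"),
    ("u2", "collect", "m2"),
    ("u2", "comment", "m3"),
]
graph = build_topology(ops)
sequences = music_sequences(graph, walks_per_node=2, length=5)
```

Because the graph is bipartite, a walk of five nodes that starts at a music node alternates between music and users, so each sequence here contains three music identifiers.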
S102, determining music to be mined based on the operation behavior data of the seed user, and constructing a second feature vector based on the basic information of the music to be mined;
In this step, the music to be mined is determined based on the operation behavior data of the seed user, that is, music related to the seed user is determined as the music to be mined. The music to be mined may be determined from the music sequence to be mined obtained by traversing the network topology in the previous step. The basic information of the music to be mined may include a content knowledge graph, audio, lyrics text and the like, where the content knowledge graph may include the composer, singer, language, genre tags and the like of the musical piece. Each kind of basic information may be characterized with a suitable model, such as a convolutional neural network (CNN, Convolutional Neural Networks), a Transformer-based bidirectional encoder representation (BERT, Bidirectional Encoder Representations from Transformers), a graph neural network (GNN, Graph Neural Networks) and the like, to construct the second feature vector, which is not specifically limited here.
S103, splicing the first feature vector and the second feature vector to obtain a target feature vector of the music to be mined;
It should be noted that, constructing the first feature vector based on the operation behavior data of the seed user ensures that the music has a certain propagation degree and a detonation point, constructing the second feature vector based on the basic information of the music to be mined ensures that the music production quality is high enough, and the target feature vector obtained by splicing the first feature vector and the second feature vector can be used for evaluating whether the music is potential hot music.
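The splicing in S103 amounts to concatenating, for each piece of music to be mined, its behavior-based vector and its content-based vector. The sketch below assumes both vectors are stored as plain lists keyed by a music identifier; the function names and the choice to skip music missing from either mapping are hypothetical.

```python
def splice(first_vector, second_vector):
    """Concatenate the behavior-based and content-based vectors of one piece of music."""
    return list(first_vector) + list(second_vector)


def build_target_vectors(first_vectors, second_vectors):
    """Splice per music id; music missing from either mapping is skipped."""
    return {
        music_id: splice(first_vectors[music_id], second_vectors[music_id])
        for music_id in first_vectors
        if music_id in second_vectors
    }


first = {"m1": [0.1, 0.2], "m2": [0.3, 0.4]}      # from operation behavior data
second = {"m1": [0.9], "m2": [0.7], "m3": [0.5]}  # from basic information
target = build_target_vectors(first, second)
```

Stacking the values of `target` row by row gives a matrix in which each row corresponds to one piece of music to be mined.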
S104, inputting the target feature vector into a trained neural network model to obtain a prediction result of the potential hot music.
In implementations, a neural network model is used to predict potential hot music. In the step, a target feature vector constructed based on the operation behavior data of the seed user and the basic information of the music to be mined is input into a neural network model after training is completed, and a prediction result of the potential hot spot music is obtained.
As a possible implementation, this step may include: inputting the target feature vector into the trained neural network model to obtain a potential value corresponding to each piece of music to be mined, and determining the music to be mined whose potential value is greater than a preset potential value as potential hot music. In a specific implementation, each row of the target feature vector corresponds to one piece of music to be mined, and the neural network model predicts a potential value for each such piece of music; the pieces whose potential value exceeds the preset potential value are determined to be potential hot music.
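The final selection can be illustrated with a short sketch. It assumes the neural network model has already produced one potential value per piece of music to be mined; the dictionary layout, the function name and the example threshold of 0.7 are hypothetical.

```python
def select_potential_hot_music(potential_values, preset_potential_value):
    """Return ids whose potential value exceeds the preset value, highest first."""
    selected = [
        (music_id, value)
        for music_id, value in potential_values.items()
        if value > preset_potential_value
    ]
    selected.sort(key=lambda item: item[1], reverse=True)
    return [music_id for music_id, _ in selected]


# Potential values as they might be returned by the trained model.
predicted = {"m1": 0.91, "m2": 0.40, "m3": 0.75}
hot = select_potential_hot_music(predicted, preset_potential_value=0.7)
```

Sorting by potential value is an addition made here for display purposes; the method itself only requires the comparison with the preset potential value.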
According to the hot music mining method provided by the embodiment of the application, seed users are first selected in the music platform. Because seed users have a stronger appreciation capability for, or influence on, music, music to be mined that is determined from the seed users' operation behavior data is more likely to be potential hot music. A target feature vector is then constructed based on the seed users' operation behavior data on the music to be mined and the basic information of the music to be mined, and is input into a neural network model to obtain potential hot music. Constructing the target feature vector based on the operation behavior data of the seed users ensures that the music has a certain propagation degree and a detonation point, constructing it based on the basic information of the music to be mined ensures that the music production quality is high enough, and combining the two can improve the accuracy of hot music mining.
The embodiment of the application discloses another hot music mining method, which further describes and optimizes the technical scheme relative to the previous embodiment. Specifically:
Referring to fig. 3, a flowchart of another hot music mining method provided by an embodiment of the present application, as shown in fig. 3, includes:
S201, determining seed users in all users of a music platform;
S202, distributing corresponding weights for each type of operation behavior data, and constructing a network topology based on the operation behavior data of the seed user and the weights corresponding to each type of operation behavior data;
In this embodiment, a corresponding weight may be allocated to each type of operation behavior data. For example, the weight of complete playing is greater than that of normal playing, and the weight of sharing is greater than that of commenting. Specific weight allocation criteria may be set flexibly by those skilled in the art according to the actual situation. The network topology is constructed based on the operation behavior data of the seed user and the weight corresponding to each type of operation behavior data, so that each edge in the network topology has a corresponding weight.
S203, traversing the network topology according to the weight corresponding to each type of operation behavior data to obtain a music sequence to be mined, and carrying out feature representation on the music sequence to be mined to obtain a first feature vector;
In the step, traversing is carried out based on the weight of each edge in the network topology to obtain a music sequence to be mined, and feature representation is carried out on the music sequence to be mined so as to construct a first feature vector.
S204, determining music to be mined based on the music sequence to be mined, and constructing a second feature vector based on the basic information of the music to be mined;
S205, splicing the first feature vector and the second feature vector to obtain a target feature vector of the music to be mined;
S206, inputting the target feature vector into the trained neural network model to obtain a prediction result of the potential hot spot music.
Therefore, according to the embodiment, different weights are distributed to different types of operation behavior data, the weighted network topology can more accurately represent the operation behavior of the seed user, more accurate target feature vectors can be obtained by traversing the weighted network topology, and accordingly, the accuracy of the predicted potential hot spot music is higher.
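The weighted variant of this embodiment can be sketched as a random walk whose transition probabilities are proportional to edge weights. The concrete weight values, the behavior names and the decision to sum the weights of repeated operations on the same edge are assumptions made for illustration; they only respect the ordering given above (complete playing above normal playing, sharing above commenting).

```python
import random
from collections import defaultdict

# Hypothetical weights for each type of operation behavior.
BEHAVIOR_WEIGHTS = {"complete_play": 2.0, "play": 1.0, "share": 3.0, "comment": 1.5}


def build_weighted_topology(operations, behavior_weights):
    """User-music graph whose edge weight is the sum of the behavior weights."""
    graph = defaultdict(lambda: defaultdict(float))
    for user_id, behavior, music_id in operations:
        weight = behavior_weights[behavior]
        user_node, music_node = ("user", user_id), ("music", music_id)
        graph[user_node][music_node] += weight
        graph[music_node][user_node] += weight
    return graph


def weighted_walk(graph, start, length, rng):
    """Random walk in which each step picks a neighbour in proportion to edge weight."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(graph[current])
        if not neighbours:
            break
        weights = [graph[current][n] for n in neighbours]
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


ops = [
    ("u1", "share", "m1"),
    ("u1", "complete_play", "m1"),
    ("u1", "play", "m2"),
]
graph = build_weighted_topology(ops, BEHAVIOR_WEIGHTS)
walk = weighted_walk(graph, ("music", "m1"), 5, random.Random(0))
```

In this toy graph the edge between `u1` and `m1` has weight 5.0 and the edge between `u1` and `m2` has weight 1.0, so walks leaving `u1` reach `m1` about five times as often as `m2`.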
The present embodiment describes a training process of a neural network model, specifically:
Referring to fig. 4, a flowchart of a training method of a neural network model according to an embodiment of the present application, as shown in fig. 4, includes:
S301, determining seed users in all users of a music platform, and determining current hot music, current cold music and cold time periods corresponding to the current hot music in the music platform;
In this embodiment, the current hot music in the music platform is taken as positive samples, and a certain proportion of the current cold music in the music platform is randomly extracted as negative samples, to train the neural network model. The trained neural network model is used for predicting potential hot music.
S302, constructing training feature vectors corresponding to the current popular music based on the operation behavior data of the seed user on the current popular music and the basic information of the current popular music in the cold time period;
S303, constructing a training feature vector corresponding to the current cold music based on the operation behavior data of the seed user on the current cold music and the basic information of the current cold music;
S304, training a neural network model by taking the training feature vector corresponding to the current hot music as a positive sample and the training feature vector corresponding to the current cold music as a negative sample, wherein the trained neural network model is used for predicting potential hot music based on the target feature vector of the music to be mined.
In a specific implementation, the training feature vector corresponding to the current hot music, that is, the training feature vector of a positive sample, is constructed based on the seed users' operation behavior data on the current hot music during the cold time period and the basic information of the current hot music. The training feature vector corresponding to the current cold music, that is, the training feature vector of a negative sample, is constructed based on the seed users' operation behavior data on the current cold music and the basic information of the current cold music. The neural network model is trained with the training feature vectors of the positive and negative samples, so that the trained neural network model can predict whether the music to be mined corresponding to an input target feature vector is potential hot music.
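The assembly of the training samples can be sketched as follows. The sketch assumes the training feature vectors have already been constructed and are keyed by music identifier; the sampling ratio, the function names and the label encoding (1 for current hot music, 0 for current cold music) are hypothetical, and the neural network training itself is not shown.

```python
import random


def sample_cold_music(cold_music_ids, ratio, seed=0):
    """Randomly draw a proportion of the current cold music to serve as negatives."""
    ids = sorted(cold_music_ids)
    if not ids:
        return []
    k = max(1, int(len(ids) * ratio))
    return random.Random(seed).sample(ids, k)


def build_training_set(hot_vectors, cold_vectors):
    """Label hot-music vectors 1 (positive) and cold-music vectors 0 (negative)."""
    features, labels = [], []
    for vector in hot_vectors.values():
        features.append(vector)
        labels.append(1)
    for vector in cold_vectors.values():
        features.append(vector)
        labels.append(0)
    return features, labels


hot_vectors = {"h1": [0.9, 0.8, 0.7], "h2": [0.8, 0.9, 0.6]}
all_cold = {"c1": [0.1, 0.2, 0.3], "c2": [0.2, 0.1, 0.4],
            "c3": [0.3, 0.2, 0.1], "c4": [0.1, 0.1, 0.2]}

negative_ids = sample_cold_music(all_cold, ratio=0.5)
cold_vectors = {music_id: all_cold[music_id] for music_id in negative_ids}
features, labels = build_training_set(hot_vectors, cold_vectors)
```

The resulting `features` and `labels` lists can be fed to whichever neural network framework is used for training.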
The following describes a hot music mining apparatus according to an embodiment of the present application, and the hot music mining apparatus and the hot music mining method described above may be referred to each other.
Referring to fig. 5, a structure diagram of a hot music mining apparatus according to an embodiment of the present application, as shown in fig. 5, includes:
A first construction module 100, configured to determine a seed user among all users of the music platform, and construct a first feature vector based on operational behavior data of the seed user;
A second construction module 200, configured to determine music to be mined based on the operation behavior data of the seed user, and construct a second feature vector based on the basic information of the music to be mined;
The stitching module 300 is configured to stitch the first feature vector and the second feature vector to obtain a target feature vector of the music to be mined;
And the input module 400 is used for inputting the target feature vector into the trained neural network model to obtain a prediction result of the potential hot music.
According to the hot music mining apparatus provided by the embodiment of the application, seed users are first screened in the music platform. Because seed users have a stronger appreciation capability for, or influence on, music, music to be mined that is determined from the seed users' operation behavior data is more likely to be potential hot music. A target feature vector is then constructed based on the seed users' operation behavior data on the music to be mined and the basic information of the music to be mined, and is input into a neural network model to obtain potential hot music. Constructing the target feature vector based on the operation behavior data of the seed users ensures that the music has a certain propagation degree and a detonation point, constructing it based on the basic information of the music to be mined ensures that the music production quality is high enough, and combining the two can improve the accuracy of hot music mining.
On the basis of the foregoing embodiment, as a preferred implementation manner, the first construction module 100 is specifically a module for determining current hot music in a music platform and a cold time period corresponding to the current hot music, determining a user having an operation behavior on the current hot music in the cold time period as a seed user, and constructing a first feature vector based on operation behavior data of the seed user;
On the basis of the above embodiment, as a preferred implementation manner, the first construction module 100 is specifically a module for determining high-quality music content in a music platform, determining a user having an operation behavior on the high-quality music content as a seed user, and constructing a first feature vector based on operation behavior data of the seed user;
On the basis of the foregoing embodiment, as a preferred implementation manner, the first construction module 100 is specifically a module for determining a user with an influence greater than a preset influence in a music platform as a seed user, and constructing a first feature vector based on operation behavior data of the seed user, where the influence includes the influence in the music platform and/or the influence in other platforms.
On the basis of the above embodiment, as a preferred implementation manner, the apparatus further includes:
A third construction module, configured to construct user features of the seed user based on the user information and music preference portrait of the seed user;
A fourth construction module, configured to select a non-seed training user from among non-seed users, and construct user features of the non-seed training user based on the user information and music preference portrait of the non-seed training user;
A first training module, configured to train a classification model by using the user features of the seed users and the user features of the non-seed training users;
A prediction module, configured to input the user information and music preference portraits of the non-seed users other than the non-seed training users into the trained classification model to predict extended seed users;
An adding module, configured to add the extended seed users to the seed users.
On the basis of the above embodiment, as a preferred implementation manner, the first construction module 100 includes:
A determining unit for determining a seed user among all users of the music platform;
A construction unit, configured to construct a network topology based on operation behavior data of the seed users, where each piece of operation behavior data includes an operation behavior, the seed user who performed the operation behavior and the music operated on, nodes in the network topology represent the seed users who performed the operation behaviors or the music operated on, and edges between the nodes represent the operation behaviors;
A traversing unit, configured to traverse the network topology according to a preset rule to obtain a music sequence to be mined, and perform feature representation on the music sequence to be mined to obtain a first feature vector.
On the basis of the foregoing embodiment, as a preferred implementation manner, the second construction module 200 is specifically a module that determines music to be mined based on the sequence of music to be mined, and constructs a second feature vector based on the basic information of the music to be mined.
On the basis of the above embodiment, as a preferred implementation manner, the construction unit is specifically a unit for allocating a corresponding weight to each type of operation behavior data, and constructing a network topology based on the operation behavior data of the seed user and the weight corresponding to each type of operation behavior data;
Correspondingly, the traversing unit is specifically a unit for traversing the network topology according to the weight corresponding to each type of operation behavior data to obtain a music sequence to be mined, and carrying out feature representation on the music sequence to be mined to obtain a first feature vector.
On the basis of the foregoing embodiment, as a preferred implementation manner, the input module 400 is specifically a module that inputs the target feature vector into a trained neural network model to obtain a potential value corresponding to each piece of music to be mined, and determines the piece of music to be mined with the potential value greater than a preset potential value as potential hot music.
On the basis of the above embodiment, as a preferred implementation manner, the apparatus further includes:
A determining module, configured to determine seed users among all users of the music platform, and determine current hot music, current cold music and the cold time period corresponding to the current hot music in the music platform;
A fifth construction module, configured to construct a training feature vector corresponding to the current hot music based on the seed user's operation behavior data on the current hot music during the cold time period and the basic information of the current hot music;
A sixth construction module, configured to construct a training feature vector corresponding to the current cold music based on the seed user's operation behavior data on the current cold music and the basic information of the current cold music;
A second training module, configured to train the neural network model by taking the training feature vector corresponding to the current hot music as a positive sample and the training feature vector corresponding to the current cold music as a negative sample, where the trained neural network model is used for predicting potential hot music based on the target feature vector of the music to be mined.
The specific manner in which the modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments and is not repeated here.
The present application also provides an electronic device. Referring to fig. 6, a block diagram of an electronic device 60 provided in an embodiment of the present application, the electronic device 60 may include a processor 61 and a memory 62.
The processor 61 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 61 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 61 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the wake-up state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 61 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 61 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 62 may include one or more computer-readable storage media, which may be non-transitory. The memory 62 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 62 is at least used for storing a computer program 621 which, after being loaded and executed by the processor 61, can implement the relevant steps of the hot music mining method performed by the server side as disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 62 may also include an operating system 622, data 623 and the like, and the storage may be transient or permanent. The operating system 622 may include Windows, Unix, Linux and the like.
In some embodiments, the electronic device 60 may further include a display 63, an input-output interface 64, a communication interface 65, a sensor 66, a power supply 67, and a communication bus 68.
Of course, the structure shown in fig. 6 does not constitute a limitation on the electronic device in the embodiment of the present application. In practical applications, the electronic device may include more or fewer components than those shown in fig. 6, or may combine some of the components.
In another exemplary embodiment, a computer readable storage medium is also provided, comprising program instructions which, when executed by a processor, implement the steps of the hotspot music mining method performed by the electronic device of any of the embodiments described above.
In this description, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method. It should be noted that those skilled in the art can make various modifications and adaptations to the application without departing from its principles, and these modifications and adaptations are intended to be within the scope of the application as defined in the following claims.
It should also be noted that in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.