Background
Robotic Process Automation (RPA) is a new type of AI-based virtual process automation robot that simulates human operations on a computer and automatically executes process tasks according to rules.
Artificial Intelligence (AI) is a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. With the development of AI, various intelligent interactive robots have also developed rapidly. To support interaction between an intelligent interactive robot and its users, a knowledge base must be established to handle users' various questions, and a large amount of corpus data usually has to be processed before the knowledge base can be built.
At present, large volumes of corpora come from different sources, such as databases, log data, and front-end page data, so the corpus data is presented in diverse formats. In the prior art, corpus data in such diverse formats has to be cleaned, clustered, and otherwise processed by specialists and cannot be processed automatically, so both processing efficiency and processing accuracy are low.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present disclosure provide a method and an apparatus for processing corpus data in combination with RPA and AI, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a method for processing corpus data in combination with RPA and AI, including:
acquiring target corpus data in response to a trigger operation on a data upload option in an upload interactive interface;
determining, from a plurality of preset format templates, a format template corresponding to the target corpus data, where the preset format templates are determined according to format information of historical corpus data;
processing the target corpus data according to the processing parameters of the corresponding format template;
and displaying the processing result of the target corpus data.
In an optional embodiment, determining a format template corresponding to the target corpus data from a plurality of preset format templates includes:
determining format information of the target corpus data;
and determining a corresponding format template from a plurality of preset format templates according to the format information of the target corpus data.
In an optional embodiment, determining a format template corresponding to the target corpus data from a plurality of preset format templates includes:
receiving a trigger operation on a template selection option in the upload interactive interface;
and determining, in response to the trigger operation on the template selection option, the format template corresponding to the target corpus data.
In an optional embodiment, acquiring target corpus data in response to a trigger operation on a data upload option in an upload interactive interface includes:
displaying NLP corpus data in the upload interactive interface in response to the trigger operation on the data upload option;
and receiving, in the upload interactive interface, a selection operation on the target corpus data within the NLP corpus data, and acquiring the target corpus data.
In an optional embodiment, processing the target corpus data according to the processing parameters of the corresponding format template includes:
receiving a trigger operation on cleaning parameter configuration options in a cleaning interactive interface under the corresponding format template;
and cleaning the target corpus data in response to the trigger operation on the cleaning parameter configuration options.
In an optional embodiment, if the corresponding format template is a first format template, the cleaning parameter configuration options include any one or more of the following configuration options:
a data volume option, a data classification storage option, a data replacement option, and a data deletion option;
the data classification storage option includes: an option for the ratio of training data to test data and/or an option for the number of files in which the processed target corpus data is stored;
the data replacement option includes any one of the following replacement options:
a preset symbol replacement option, a preset number replacement option, a preset character replacement option, a telephone number replacement option, and a website address replacement option;
the data deletion option includes any one or more of the following deletion options:
a full deduplication option, a fuzzy deduplication option, a number deletion option, a non-Chinese text deletion option, a standard script deletion option, a fuzzy script deletion option, a word-count-limited text retention option, and a delimited text deletion option.
In an optional embodiment, if the corresponding format template is a second format template, the cleaning parameter configuration options further include any one or more of the following configuration options:
a data type retention option, a data type clearing option, an option for retaining key information in each target corpus, and an option for retaining key values on the front-end page.
In an optional embodiment, processing the target corpus data according to the processing parameters of the corresponding format template includes:
receiving a trigger operation on a clustering option in a clustering interactive interface under the corresponding format template;
clustering the target corpus data with a frequent pattern clustering algorithm in response to the trigger operation on the clustering option;
and outputting the clustering result of the target corpus data produced by the frequent pattern clustering algorithm.
In an optional embodiment, displaying the processing result of the target corpus data includes:
parsing the processing result to form at least one processing result to be displayed;
receiving, in a result interactive interface, a trigger operation on a display option for a processing result to be displayed;
and displaying the corresponding processing result in response to the trigger operation on the display option.
In an optional embodiment, the processing results to be displayed include: clustering results, data condition analysis results, and high-frequency corpus question results.
In a second aspect, an embodiment of the present disclosure provides an apparatus for processing corpus data in combination with RPA and AI, including:
a data acquisition module, configured to acquire target corpus data in response to a trigger operation on a data upload option in an upload interactive interface;
a data processing module, configured to determine, from a plurality of preset format templates, a format template corresponding to the target corpus data, and to process the target corpus data according to the processing parameters of the corresponding format template, where the preset format templates are determined according to format information of historical corpus data;
and a result display module, configured to display the processing result of the target corpus data.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory, a processor, and a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method according to any embodiment of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any embodiment of the first aspect.
The embodiments of the present disclosure provide a method and an apparatus for processing corpus data, an electronic device, and a storage medium. Target corpus data is acquired in response to a trigger operation on a data upload option in an upload interactive interface; a format template corresponding to the target corpus data is determined from a plurality of preset format templates, which are determined according to format information of historical corpus data; the target corpus data is processed according to the processing parameters of that format template; and the processing result of the target corpus data is displayed. Corpus data is thereby processed automatically, improving both processing efficiency and processing accuracy.
It should be understood that the content described in this section is not intended to identify key or essential features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Detailed Description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration only and are not intended to limit the scope of the present disclosure.
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art from the embodiments disclosed herein without creative effort shall fall within the protection scope of the present disclosure.
With the development of AI, various intelligent interactive robots have also developed rapidly. To support interaction between an intelligent interactive robot and its users, a knowledge base must be established to handle users' various questions, and a large amount of corpus data usually has to be processed before the knowledge base can be built.
At present, large volumes of corpora come from different sources, such as databases, log data, and front-end page data, so the corpus data is presented in diverse formats. In the prior art, corpus data in such diverse formats has to be cleaned, clustered, and otherwise processed by specialists and cannot be processed automatically, so both processing efficiency and processing accuracy are low.
In view of the above problems, the present disclosure provides a method and an apparatus for processing corpus data in combination with RPA and AI, an electronic device, and a storage medium.
RPA is a business process automation technology based on software robots and AI. Specific "robot software" simulates human operations on a computer and automatically executes process tasks according to rules. Corpus data processing is one application of RPA technology. Referring to Fig. 1, which is a schematic diagram of a network architecture on which the present disclosure is based, the network architecture may include a processing device 2 for corpus data combining RPA and AI, and terminals 1.
The corpus data processing device 2 is hardware or software that can interact with each terminal 1 over a network. It can execute the corpus data processing method described in the following embodiments and provide corpus data processing interfaces and services to the client running on each terminal 1.
When the corpus data processing device 2 is hardware, it may be a cloud server with computing capability. When it is software, it can be installed on an electronic device with computing capability, including but not limited to a laptop computer or a desktop computer.
A terminal 1 is a device, such as a smartphone, a tablet computer, or a desktop computer, that can communicate and exchange information with the corpus data processing device 2 over a network.
The embodiments of the present disclosure provide a method and an apparatus for processing corpus data in combination with RPA and AI, an electronic device, and a storage medium. Target corpus data is acquired in response to a trigger operation on a data upload option in an upload interactive interface; a format template corresponding to the target corpus data is determined from a plurality of preset format templates, which are determined according to format information of historical corpus data; the target corpus data is processed according to the processing parameters of that format template; and the processing result is displayed, so that corpus data is processed automatically and both processing efficiency and processing accuracy are improved.
In a first aspect, referring to Fig. 2, which is a schematic flowchart of a method for processing corpus data in combination with RPA and AI according to an embodiment of the present disclosure, the method includes the following steps:
Step 101: acquiring target corpus data in response to a trigger operation on a data upload option in an upload interactive interface.
Step 102: determining, from a plurality of preset format templates, a format template corresponding to the target corpus data, where the preset format templates are determined according to format information of historical corpus data.
Step 103: processing the target corpus data according to the processing parameters of the corresponding format template.
Step 104: displaying the processing result of the target corpus data.
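The four steps can be sketched as a small pipeline. This is a minimal illustration, not the disclosed implementation: the corpus is assumed to be a list of strings already acquired in step 101, each preset format template is modeled as a plain processing function, and the names run_pipeline and choose_template are hypothetical.

```python
from typing import Callable, Dict, List

Texts = List[str]

def run_pipeline(
    corpus: Texts,
    templates: Dict[str, Callable[[Texts], Texts]],
    choose_template: Callable[[Texts], str],
) -> Dict[str, object]:
    """Hypothetical skeleton of steps 102-104; step 101 (upload) has already
    produced `corpus`, the target corpus data."""
    # Step 102: determine which preset format template applies.
    name = choose_template(corpus)
    if name not in templates:
        raise KeyError(f"unknown format template: {name}")
    # Step 103: process the corpus with that template's processing function.
    processed = templates[name](corpus)
    # Step 104: package the result for the display layer.
    return {
        "template": name,
        "input_count": len(corpus),
        "output_count": len(processed),
        "data": processed,
    }


# Usage: one illustrative template that strips whitespace and removes exact duplicates.
templates = {"plain": lambda texts: sorted({t.strip() for t in texts})}
result = run_pipeline([" hi", "hi", "ok"], templates, lambda texts: "plain")
print(result["output_count"], result["data"])  # 2 ['hi', 'ok']
```

Passing the template choice in as a function keeps automatic detection and manual selection interchangeable, mirroring the two alternatives for step 102.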
The method provided in this example is executed by the corpus data processing device. By performing steps 101 to 104, the device processes corpus data automatically, improving both processing efficiency and processing accuracy.
In the following, the solution provided by the present disclosure will be further described:
First, as described in step 101, a user may trigger the data upload option in the upload interactive interface of the client on a terminal; the client sends the trigger operation to the processing device, which responds to it. In the upload interactive interface, the user can upload target corpus data for Natural Language Processing (NLP).
NLP is an important direction in computer science and AI. NLP research covers, but is not limited to, the following subfields: text classification, information extraction, automatic summarization, intelligent question answering, topic recommendation, machine translation, subject term recognition, knowledge base construction, deep text representation, named entity recognition, text generation, text analysis (lexical, syntactic, grammatical, etc.), and speech recognition and synthesis. Corpus data is an important resource for NLP: a knowledge base can be built from it and used for machine translation, intelligent question answering, and the like.
For ease of operation, in an optional implementation, the processing device may first display NLP corpus data in the upload interactive interface in response to the user's trigger operation on the data upload option, then receive the user's selection of target corpus data from the displayed NLP corpus data, and acquire the target corpus data according to that selection.
That is, when there is a large amount of data to be processed, the user may upload the NLP corpus data and then select the target corpus data to be processed by the processing device. Next, as shown in step 102, the processing device determines a format template corresponding to the target corpus data from a plurality of preset format templates.
Specifically, to handle data in multiple formats, this embodiment determines a format template corresponding to the target corpus data. Each format template contains processing parameters determined according to format information of historical corpus data; that is, the format templates are derived from experience.
In an alternative embodiment, the processing device may specifically adopt the following method when determining the format template of the target corpus data:
determining format information of the target corpus data, and determining a corresponding format template from the plurality of preset format templates according to that format information; and/or receiving a trigger operation on a template selection option in the upload interactive interface, and determining the format template corresponding to the target corpus data in response to that trigger operation.
That is, the processing device may determine the format template by analyzing the format information of the target corpus data itself; for example, it may determine the format template from the number of data columns in the target corpus data.
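Detecting a template from the number of columns can be sketched as follows. The template names and the column counts they map to are hypothetical placeholders; per the text, real templates would be derived from historical corpus formats. When the column counts are inconsistent or unrecognized, the sketch returns None, which a caller could treat as a cue to fall back to manual template selection.

```python
import csv
import io
from typing import Dict, Optional

# Hypothetical mapping from column count to preset template name.
PRESET_TEMPLATES: Dict[int, str] = {
    1: "first_format_template",   # assumed: one utterance per row
    3: "second_format_template",  # assumed: session id, speaker, utterance
}

def detect_template(raw_csv: str, presets: Dict[int, str] = PRESET_TEMPLATES) -> Optional[str]:
    """Return the template whose column count matches every row, or None."""
    rows = [row for row in csv.reader(io.StringIO(raw_csv)) if row]  # skip blank lines
    if not rows:
        return None
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        return None  # inconsistent column counts: let the user pick a template
    return presets.get(widths.pop())


print(detect_template("session,speaker,text\n1,user,hi\n"))  # second_format_template
```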
Alternatively, the processing device may determine the format template in response to the user's trigger operation on the template selection option in the upload interactive interface; for example, the user may download a format template in advance and then upload target corpus data prepared according to that template.
Then, as shown in steps 103 and 104, the processing device processes the target corpus data according to the processing parameters corresponding to the corresponding format template, and displays the processing result of the target corpus data.
Specifically, processing includes several kinds, such as cleaning, clustering, and analysis. Accordingly, when processing results are displayed, the result of any one of these processes can be displayed separately.
For cleaning, the processing device can receive the user's trigger operation on the cleaning parameter configuration options in the cleaning interactive interface under the format template corresponding to the target corpus data, and clean the target corpus data in response to that trigger operation.
Further, during cleaning, the processing device may determine a cleaning strategy for the target corpus data based on its format template and clean it accordingly. In this embodiment, when the format template corresponding to the target corpus data is the first format template, the target corpus data may be cleaned with different cleaning strategies based on one or more of the following cleaning parameter configuration options:
a data volume option, a data classification storage option, a data replacement option, and a data deletion option.
The data volume option determines which data in the target corpus data is retained during cleaning: only the first-line data, or all data.
All data means that all of the user's dialogue data in the target corpus data is retained; first-line data means that only the user's first dialogue sentence is retained. The first-line option keeps the user's first meaningful sentence: if the user's first sentence consists only of symbols or pictures, the next sentence is retained instead.
For cleaning based on the data volume option, when the target corpus data is very large, it is screened so that only the data most valuable for analysis, such as the first dialogue sentence, is retained. Conversely, when the data volume is small, all data can be retained so that it is fully used during analysis and an accurate analysis result is obtained.
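The first-line versus all-data behavior can be sketched as below. Assumptions not stated in the disclosure: each dialogue is given as the ordered list of the user's sentences, a sentence is "meaningful" if it contains at least one letter, digit, or CJK character, and pictures appear as placeholder tokens such as [image]. The function names are hypothetical.

```python
import re
from typing import List

# Assumption: pictures appear in exported dialogues as placeholders like "[image]".
PICTURE_PLACEHOLDER = re.compile(r"\[(?:image|picture)\]", re.IGNORECASE)

def has_meaning(sentence: str) -> bool:
    """True if, after removing picture placeholders, the sentence contains a
    letter, digit, or CJK character (i.e. it is not symbols only)."""
    stripped = PICTURE_PLACEHOLDER.sub("", sentence)
    return any(ch.isalnum() for ch in stripped)

def apply_data_volume(dialogues: List[List[str]], mode: str = "first_line") -> List[str]:
    """mode="all" keeps every user sentence; mode="first_line" keeps only the
    first meaningful user sentence of each dialogue."""
    if mode not in ("all", "first_line"):
        raise ValueError("mode must be 'all' or 'first_line'")
    kept: List[str] = []
    for user_sentences in dialogues:
        if mode == "all":
            kept.extend(user_sentences)
            continue
        for sentence in user_sentences:
            if has_meaning(sentence):
                kept.append(sentence)
                break  # skip symbol-only or picture-only openers, keep the next one
    return kept


dialogues = [["???", "[image]", "How do I reset my password?", "thanks"], ["Can I cancel?"]]
print(apply_data_volume(dialogues))  # ['How do I reset my password?', 'Can I cancel?']
```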
The data classification storage option determines how the target corpus data is partitioned during cleaning, so that the partitioned data can be used to train a corpus analysis model. That is, this option sets the ratio of training data to test data and/or the number of files in which the processed target corpus data is stored.
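A minimal sketch of the two storage settings follows. It assumes, since the disclosure does not say, that the file-count setting splits both the training part and the test part into that many files, and that a fixed random seed is acceptable for reproducibility; the names split_and_store and shard are hypothetical.

```python
import random
from typing import Dict, List

def shard(items: List[str], num_files: int) -> List[List[str]]:
    """Divide items round-robin into num_files lists, one list per output file."""
    return [items[i::num_files] for i in range(num_files)]

def split_and_store(
    texts: List[str], train_ratio: float = 0.8, num_files: int = 1, seed: int = 0
) -> Dict[str, List[List[str]]]:
    """Shuffle the cleaned corpus, split it into training and test data by
    train_ratio, and shard each part into num_files files (assumed semantics)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    shuffled = list(texts)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_ratio)
    return {
        "train": shard(shuffled[:cut], num_files),
        "test": shard(shuffled[cut:], num_files),
    }


out = split_and_store([f"q{i}" for i in range(10)], train_ratio=0.8, num_files=2)
print([len(f) for f in out["train"]], [len(f) for f in out["test"]])  # [4, 4] [1, 1]
```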
The data replacement option normalizes data such as emoticons, website addresses, numbers, and phone numbers in the target corpus data so that they do not distort the subsequent clustering. The data replacement option includes any one of the following replacement options: a preset symbol replacement option, a preset number replacement option, a preset text replacement option, a telephone number replacement option, and a website address replacement option. During cleaning, data to be replaced can be processed with regular-expression replacement, for example replacing "say" in the target corpus data with a blank.
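Regular-expression replacement of this kind can be sketched as follows. The placeholder tokens (<URL>, <PHONE>, <NUM>), the mainland-China mobile number pattern, and the choice to delete preset symbols rather than substitute them are all assumptions for illustration. URLs and phone numbers are replaced before the generic number rule so that their digits are not split into separate number tokens.

```python
import re
from typing import Dict, Optional

URL_RE = re.compile(r"(?:https?://|www\.)[A-Za-z0-9./?=&_%:#~+-]+")
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")  # assumed: 11-digit mainland-China mobile
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")

def normalize(text: str, symbols: str = "", words: Optional[Dict[str, str]] = None) -> str:
    """Delete preset symbols, apply preset text replacements, then replace
    URLs, phone numbers, and remaining numbers with placeholder tokens."""
    text = text.translate({ord(c): None for c in symbols})  # preset symbol removal
    for old, new in (words or {}).items():                   # preset text replacement
        text = text.replace(old, new)
    text = URL_RE.sub("<URL>", text)      # website addresses first
    text = PHONE_RE.sub("<PHONE>", text)  # then phone numbers
    return NUMBER_RE.sub("<NUM>", text)   # then any remaining numbers


print(normalize("Call 13812345678 or visit https://example.com/a?b=1 for 2 items!!", symbols="!"))
# Call <PHONE> or visit <URL> for <NUM> items
```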
The data deletion option removes duplicate and low-value data from the target corpus data. It includes any one or more of the following deletion options: a full deduplication option, a fuzzy deduplication option, a number deletion option, a non-Chinese text deletion option, a standard script deletion option, a fuzzy script deletion option, a word-count-limited text retention option, and a delimited text deletion option.
For example, the data deletion options may include a full deduplication option for deleting identical data in the corpus;
a fuzzy deduplication option (Simhash deduplication) for deleting highly similar data in the corpus, such as "Can a baby drink milk?" and "Can a baby drink milk?";
a number deletion option and/or a non-Chinese text deletion option for removing numbers or text containing no Chinese characters, so as to delete low-value data;
a standard script deletion option for deleting the standard scripted text that precedes the user's utterances in the corpus data; for example, the visitor's first sentence "<Customer clicks on customer service menu>" is standard scripted text and can be deleted;
a fuzzy script deletion option for deleting text that is not the user's own corpus; for example, any whole sentence containing certain words, such as "little tiger online", can be deleted;
a word-count-limited text retention option for retaining as data only questions within a limited word count (2-50 words); questions outside this range are generally not questions posed by the customer and have no corpus processing value;
a delimited text deletion option that allows configuring the deletion of text between two specified characters, for example deleting the content between a specified opening character and a specified closing character.
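The deletion options can be chained as a single filter. In the sketch below, several choices are assumptions rather than details from the disclosure: Simhash features are character bigrams hashed with MD5, near-duplicates are texts within Hamming distance 3, "word count" is measured in characters (appropriate for Chinese), the delimited deletion also removes the delimiters themselves, and the number and non-Chinese options are merged into one "contains no Chinese characters" test. The function names are hypothetical. The usage data is in Chinese because the non-Chinese filter needs it: "<客户点击了客服菜单>" is "<Customer clicks on customer service menu>" and "小老虎在线" is "little tiger online".

```python
import hashlib
import re
from typing import Iterable, List, Sequence, Tuple

CJK_RE = re.compile(r"[\u4e00-\u9fff]")  # basic CJK Unified Ideographs block

def simhash64(text: str) -> int:
    """64-bit Simhash over character bigrams (the feature choice is an assumption)."""
    grams = [text[i:i + 2] for i in range(max(len(text) - 1, 1))]
    weights = [0] * 64
    for gram in grams:
        h = int.from_bytes(hashlib.md5(gram.encode("utf-8")).digest()[:8], "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if weights[bit] > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def apply_deletions(
    texts: Iterable[str],
    standard_scripts: Sequence[str] = (),
    script_keywords: Sequence[str] = (),
    length_range: Tuple[int, int] = (2, 50),
    delimiters: Tuple[str, str] = ("", ""),
    max_distance: int = 3,
) -> List[str]:
    opening, closing = delimiters
    span_re = (re.compile(re.escape(opening) + ".*?" + re.escape(closing))
               if opening and closing else None)
    kept: List[str] = []
    seen = set()
    hashes: List[int] = []
    for text in texts:
        if span_re is not None:
            text = span_re.sub("", text)  # delimited text deletion (delimiters included)
        text = text.strip()
        if not CJK_RE.search(text):
            continue  # number / non-Chinese text deletion: no Chinese characters at all
        if text in standard_scripts:
            continue  # standard script deletion (exact match)
        if any(word in text for word in script_keywords):
            continue  # fuzzy script deletion: drop the whole sentence
        low, high = length_range
        if not low <= len(text) <= high:
            continue  # word-count-limited retention (counted in characters)
        if text in seen:
            continue  # full deduplication
        h = simhash64(text)
        if any(hamming(h, other) <= max_distance for other in hashes):
            continue  # fuzzy (Simhash) deduplication
        seen.add(text)
        hashes.append(h)
        kept.append(text)
    return kept


texts = ["宝宝可以喝牛奶吗", "宝宝可以喝牛奶吗", "12345", "hello",
         "<客户点击了客服菜单>", "小老虎在线为您服务", "好", "怎么修改密码《备注》"]
print(apply_deletions(texts, standard_scripts=["<客户点击了客服菜单>"],
                      script_keywords=["小老虎在线"], delimiters=("《", "》")))
```

The Simhash comparison here is a linear scan over kept hashes, which is adequate for a sketch; large corpora would normally index hashes by bit bands instead.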
In other optional embodiments, when the format template corresponding to the target corpus data is the second format template, the cleaning parameter configuration options further include any one or more of the following configuration options: a data type retention option, a data type clearing option, an option for retaining key information in each target corpus, and an option for retaining key values on the front-end page.
As mentioned above, format templates are determined based on different format information. Specifically, when the target corpus data includes both customer service names and user names, the data type retention option and/or the data type clearing option may be triggered to configure how customer service agents are distinguished from users, so that user data is retained or customer service data is removed.
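Distinguishing customer service from users can be sketched as a role filter. This assumes each row is a (speaker_name, utterance) pair and that the configured set of agent names identifies customer service; everyone else is treated as a user. The function name filter_by_role is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

def filter_by_role(
    rows: Iterable[Tuple[str, str]],
    agent_names: Set[str],
    keep: str = "user",
) -> List[Tuple[str, str]]:
    """keep="user" retains user data (equivalently, clears customer service data);
    keep="agent" retains customer service data. Speakers in agent_names are
    treated as customer service; all others as users (assumption)."""
    if keep not in ("user", "agent"):
        raise ValueError("keep must be 'user' or 'agent'")
    want_agent = keep == "agent"
    return [(speaker, text) for speaker, text in rows
            if (speaker in agent_names) == want_agent]


rows = [("Agent Li", "Hello, how can I help?"),
        ("visitor_01", "How do I reset my password?"),
        ("Agent Li", "Please open Settings.")]
print(filter_by_role(rows, {"Agent Li"}))  # [('visitor_01', 'How do I reset my password?')]
```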
When database data and log data follow a certain format, the option for retaining key information in each target corpus and/or the option for retaining key values on the front-end page may be triggered, so as to configure which data is retained (for example, the data after three spaces in a region) and to configure, on the front-end page, the key whose value is retained.
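Both retention rules can be sketched in a few lines. Assumptions: log lines put the utterance after the first run of three spaces, mirroring the example in the text, and front-end page records are JSON objects in which the configured key holds the value to keep. The helper names are hypothetical.

```python
import json
from typing import List, Optional

def keep_after_marker(line: str, marker: str = "   ") -> Optional[str]:
    """Keep only the text after the first occurrence of marker (three spaces
    by default); return None when the marker is absent."""
    _, found, tail = line.partition(marker)
    return tail.strip() if found else None

def keep_key_values(records: List[str], key: str) -> List[str]:
    """Keep the value of the configured key from each JSON record, skipping
    records that are not JSON objects or lack the key."""
    values: List[str] = []
    for raw in records:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and key in obj:
            values.append(str(obj[key]))
    return values


print(keep_after_marker("2023-01-01 10:00   How do I cancel?"))  # How do I cancel?
print(keep_key_values(['{"query": "refund", "uid": 1}', "not json", '{"uid": 2}'], "query"))  # ['refund']
```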
In this embodiment, a format template whose cleaning parameter configuration options include any one or more of the data volume option, the data classification storage option, the data replacement option, and the data deletion option is referred to as a first format template. A format template whose cleaning parameter configuration options include any one or more of those four options and further include one or more of the data type retention option, the data type clearing option, the option for retaining key information in each target corpus, and the option for retaining key values on the front-end page is referred to as a second format template.
The processing device receives the user's trigger operation on the cleaning parameter configuration options in the cleaning interactive interface under the corresponding format template, and cleans the target corpus data according to that trigger operation.
In other optional embodiments, after the target corpus data has been cleaned, it can be clustered. Specifically, the processing device receives the user's trigger operation on a clustering option in the clustering interactive interface under the corresponding format template; in response, it clusters the target corpus data with a frequent pattern clustering algorithm and outputs the resulting clustering result.
Specifically, the cleaned target corpus data can be clustered directly to find question clusters and thereby determine knowledge points. Various algorithms can be used to cluster the target corpus data, such as density-based clustering, k-means clustering, and HDBSCAN. Considering the time required for clustering, the amount of data to be clustered, and the accuracy of the resulting knowledge points, this embodiment may use frequent pattern clustering. A frequent pattern is a pattern that appears frequently in a data set; for example, a set of words such as (consultation, business) that frequently appears in dialogue data is a frequent itemset. In this embodiment, valuable words are found first, then frequent itemsets are found, and finally dialogues sharing the same itemset are grouped together, thereby achieving clustering.
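The three stages (valuable words, frequent itemsets, grouping) can be sketched with a small Apriori-style search. This is an illustrative sketch under assumptions the disclosure does not fix: dialogues are already tokenized into words, "valuable" words are approximated by removing stopwords, and each dialogue joins the largest (then most frequent) frequent itemset it contains. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

Itemset = FrozenSet[str]

def support(itemset: Itemset, docs: List[Itemset]) -> int:
    return sum(1 for doc in docs if itemset <= doc)

def frequent_itemsets(docs: List[Itemset], min_support: int, max_size: int = 3) -> Dict[Itemset, int]:
    """Apriori-style search: a k-word set is a candidate only if all of its
    (k-1)-word subsets are already frequent."""
    counts = Counter(word for doc in docs for word in doc)
    current = {frozenset([w]) for w, c in counts.items() if c >= min_support}
    result: Dict[Itemset, int] = {}
    size = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset, docs)
        size += 1
        if size > max_size:
            break
        words = sorted({w for itemset in current for w in itemset})
        current = {
            frozenset(c) for c in combinations(words, size)
            if all(frozenset(sub) in result for sub in combinations(c, size - 1))
            and support(frozenset(c), docs) >= min_support
        }
    return result

def fp_cluster(dialogues: List[Sequence[str]], min_support: int = 2,
               stopwords: FrozenSet[str] = frozenset()) -> Dict[Itemset, List[int]]:
    # Stage 1: keep "valuable" words (here simply: words that are not stopwords).
    docs = [frozenset(w for w in words if w not in stopwords) for words in dialogues]
    # Stage 2: find frequent itemsets.
    itemsets = frequent_itemsets(docs, min_support)
    # Stage 3: assign each dialogue to its largest, then most frequent, itemset.
    ranked = sorted(itemsets, key=lambda s: (-len(s), -itemsets[s], sorted(s)))
    clusters: Dict[Itemset, List[int]] = defaultdict(list)
    for idx, doc in enumerate(docs):
        for itemset in ranked:
            if itemset <= doc:
                clusters[itemset].append(idx)
                break  # dialogues matching no itemset stay unclustered
    return dict(clusters)


dialogues = [["consult", "business", "hours"], ["consult", "business", "fee"],
             ["consult", "business"], ["reset", "password"],
             ["reset", "password", "email"], ["weather"]]
for itemset, members in fp_cluster(dialogues).items():
    print(sorted(itemset), members)
# ['business', 'consult'] [0, 1, 2]
# ['password', 'reset'] [3, 4]
```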
Finally, as stated in step 104, the processing device further displays the processing result of the target corpus data.
After the target corpus data is processed, the corresponding processing result can be displayed; for example, after duplicate data in the target corpus data is removed, the deduplicated corpus can be displayed.
In practice, one or more kinds of processing may be performed on the target corpus. When displaying processing results, the processing result can be parsed to form at least one processing result to be displayed; a trigger operation on a display option for a processing result to be displayed is received in the result interactive interface; and the corresponding processing result is displayed in response to that trigger operation.
As described above, the displayed processing results include clustering results, data condition analysis results, and high-frequency corpus question results.
For cleaned data, the data volume and clustering status after cleaning can be presented directly, which facilitates reporting and analysis.
Fig. 3 shows a display interface for clustering results provided by the present application. As shown in Fig. 3, the first histogram shows the distribution of knowledge points in the cleaned result: 365 knowledge-point clusters contain 3-10 similar questions each, 25 clusters contain 10-20 similar questions each, and 4 clusters contain 20-50 similar questions each. The second histogram shows the number of sentences in each group: the 3-10 clusters contain 1530 sentences in total, the 10-20 clusters contain 317 sentences, and the 20-50 clusters contain 317 sentences, for a total of 1530 + 317 + 317 = 2164 sentences.
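The two histograms can be computed from a list of cluster sizes, as sketched below. The figure's bin edges overlap (3-10, 10-20, 20-50), so the sketch assumes half-open bins [low, high) and ignores clusters smaller than 3; both are assumptions, and size_histograms is a hypothetical name.

```python
from typing import Dict, List, Tuple

BINS: List[Tuple[int, int]] = [(3, 10), (10, 20), (20, 50)]  # assumed half-open [low, high)

def size_histograms(cluster_sizes: List[int],
                    bins: List[Tuple[int, int]] = BINS) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (clusters per bin, sentences per bin) for the two histograms."""
    clusters: Dict[str, int] = {}
    sentences: Dict[str, int] = {}
    for low, high in bins:
        label = f"{low}-{high}"
        in_bin = [size for size in cluster_sizes if low <= size < high]
        clusters[label] = len(in_bin)
        sentences[label] = sum(in_bin)
    return clusters, sentences


print(size_histograms([3, 5, 9, 10, 15, 20, 49, 2]))
# ({'3-10': 3, '10-20': 2, '20-50': 2}, {'3-10': 17, '10-20': 25, '20-50': 69})
```

As a consistency check on the figures, the cluster counts in Fig. 3 sum to 365 + 25 + 4 = 394, which matches the total number of clusters reported in Fig. 4.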
Fig. 4 shows a display interface for data condition analysis results provided by the present application. As shown in Fig. 4, the total amount of data analyzed is 10969, the amount after cleaning is 7643, the amount after deduplication is 5643, a total of 394 clusters were formed, and the total frequency of similar questions after clustering is 2164.
Fig. 5 shows a display interface for high-frequency corpus question results provided by the present application. As shown in Fig. 5, it lists the content and frequency of the high-frequency questions in the current processing run.
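The figures behind the data condition and high-frequency question views can be sketched as simple aggregations. The field names are hypothetical, "clustered_frequency" is assumed to mean the number of sentences assigned to any cluster, and questions are assumed to be normalized already so that identical questions compare equal.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence, Tuple

def data_condition(raw_total: int, cleaned: Sequence[str], deduplicated: Sequence[str],
                   clusters: Mapping[object, Sequence[int]]) -> Dict[str, int]:
    """Summary figures of the kind shown in the data condition analysis view."""
    return {
        "total": raw_total,
        "after_cleaning": len(cleaned),
        "after_deduplication": len(deduplicated),
        "clusters": len(clusters),
        "clustered_frequency": sum(len(members) for members in clusters.values()),
    }

def high_frequency_questions(questions: List[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Count identical (already normalized) questions and return the top_n as
    (question, frequency) pairs, most frequent first."""
    return Counter(q.strip() for q in questions if q.strip()).most_common(top_n)


print(high_frequency_questions(["How to refund", "How to refund", "Reset password", " "], top_n=1))
# [('How to refund', 2)]
```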
In addition, an analysis report can be generated according to a display result, algorithms such as cleaning and clustering can be explained, some pages are left, and the report content can be completed only by copying a data page and a histogram generated by the platform to a specified page.
To further improve automation, the trigger operations that a user performs on the uploading interactive interface, the cleaning interactive interface, the clustering interactive interface, and the like of the terminal can be completed by an RPA robot, which reduces manual operation and improves the degree of automation and the processing efficiency of corpus data processing.
The embodiment of the disclosure provides a corpus data processing method, which includes: acquiring target corpus data in response to a trigger operation on a data uploading option in an uploading interactive interface; determining a format template corresponding to the target corpus data from a plurality of preset format templates, the preset format templates being determined according to format information of historical corpus data; processing the target corpus data according to the processing parameters corresponding to that format template; and displaying the processing result of the target corpus data. The corpus data is thereby processed automatically, improving processing efficiency and accuracy.
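The RPA automation described above amounts to replaying the user's trigger operations in a fixed order. The sketch below models this with a hypothetical `RpaRobot` that calls one handler per interface option (for example upload, template selection, cleaning, clustering, display); real RPA tools drive the actual user interface, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]


@dataclass
class RpaRobot:
    """Hypothetical robot that replays interface trigger operations in order."""
    handlers: Dict[str, Handler]   # option name -> action the option triggers
    script: List[str]              # order in which a user would trigger options
    log: List[str] = field(default_factory=list)

    def run(self, state: dict) -> dict:
        for option in self.script:
            if option not in self.handlers:
                raise KeyError(f"no handler for option {option!r}")
            state = self.handlers[option](state)  # simulate triggering the option
            self.log.append(option)
        return state
```

Keeping the script as data makes it straightforward to reorder or extend the steps without changing the robot itself.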
Fig. 6 is a block diagram of an apparatus for processing corpus data combining RPA and AI according to an embodiment of the present disclosure, which corresponds to the corpus data processing method of the foregoing embodiments. For ease of illustration, only the portions relevant to the embodiments of the present disclosure are shown. Referring to Fig. 6, the apparatus for processing corpus data combining RPA and AI includes: a data acquisition module 10, a data processing module 20, and a result display module 30.
The data acquisition module 10 is configured to acquire target corpus data in response to a trigger operation on a data uploading option in the uploading interactive interface;
the data processing module 20 is configured to determine a format template corresponding to the target corpus data from a plurality of preset format templates, and process the target corpus data according to processing parameters corresponding to the corresponding format template; the preset format templates are determined according to format information of historical corpus data;
and the result display module 30 is configured to display the processing result of the target corpus data.
In an optional embodiment, the data processing module 20 is configured to determine format information of the target corpus data, and to determine the corresponding format template from the plurality of preset format templates according to that format information.
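A minimal sketch of matching target corpus data to a preset format template is shown below, assuming that the "format information" is the set of field names in a record and that each preset template is described by the field names seen in historical corpus data of that format. The template names, field names, Jaccard-similarity scoring, and the 0.5 threshold are all assumptions.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical preset templates, each described by the field names observed in
# historical corpus data of that format.
PRESET_TEMPLATES: Dict[str, FrozenSet[str]] = {
    "first_template": frozenset({"question"}),
    "second_template": frozenset({"page_url", "data_type", "payload"}),
}


def format_info(record: dict) -> FrozenSet[str]:
    """Format information is assumed here to be the record's field names."""
    return frozenset(record)


def match_template(record: dict, threshold: float = 0.5) -> Optional[str]:
    """Pick the preset template whose fields best overlap (Jaccard) the record's."""
    fields = format_info(record)
    best, best_score = None, 0.0
    for name, expected in PRESET_TEMPLATES.items():
        union = fields | expected
        score = len(fields & expected) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= threshold else None
```

Returning `None` when no template reaches the threshold leaves room for the manual template selection option described in the following embodiment.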
In an optional embodiment, the data processing module 20 is configured to receive a trigger operation on a template selection option in the uploading interactive interface, and to determine, in response to that trigger operation, the format template corresponding to the target corpus data.
In an optional embodiment, the data acquisition module 10 is configured to display NLP corpus data in the uploading interactive interface in response to a trigger operation on a data uploading option in that interface, and to acquire the target corpus data upon receiving, in the uploading interactive interface, a selection operation on the target corpus data among the NLP corpus data.
In an optional embodiment, the data processing module 20 is configured to receive a trigger operation on a cleaning parameter configuration option in the cleaning interactive interface under the corresponding format template, and to clean the target corpus data in response to that trigger operation.
In an optional embodiment, if the corresponding format template is the first format template, the cleaning parameter configuration options include any one or more of the following configuration options:
a data volume option, a data classification storage option, a data replacement option, and a data deletion option;
the data classification storage options include: an option for selecting the ratio of training data to test data, and/or an option for selecting the number of files in which the processed target corpus data is stored;
the data replacement options include any one of the following replacement options:
a preset symbol replacement option, a preset number replacement option, a preset character replacement option, a telephone number replacement option and a website address replacement option;
the data deletion options include any one or more of the following deletion options:
a full deduplication option, a fuzzy deduplication option, a numeric deletion option, a non-Chinese text deletion option, a standard script deletion option, a fuzzy script deletion option, a first limited-word-count text retention option, and a second limited-word-count text deletion option.
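The sketch below illustrates a subset of the cleaning options listed above for the first format template: website address, telephone number, and number replacement; non-Chinese text deletion; word-count limits; full and fuzzy deduplication; and the training/test split of the data classification storage option. The regular expressions, the placeholder tokens, the length limits, and the use of `difflib.SequenceMatcher` for fuzzy deduplication are assumptions; the standard and fuzzy script deletion options are not sketched.

```python
import random
import re
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical patterns; the disclosure does not define them.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")  # assumes mainland-China mobile numbers
DIGIT_RE = re.compile(r"\d+")
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def replace_entities(text: str) -> str:
    """Replacement options: website address, telephone number, other numbers."""
    text = URL_RE.sub("<URL>", text)      # before digits, so URLs are replaced whole
    text = PHONE_RE.sub("<PHONE>", text)  # before digits, so phones stay one token
    return DIGIT_RE.sub("<NUM>", text)


def clean_corpus(texts: List[str], min_chars: int = 2, max_chars: int = 50,
                 fuzzy_threshold: float = 0.9) -> List[str]:
    """Replacement, non-Chinese deletion, length limits, exact and fuzzy dedup."""
    kept: List[str] = []
    seen = set()
    for raw in texts:
        text = replace_entities(raw.strip())
        if not CJK_RE.search(text):                   # non-Chinese text deletion
            continue
        if not min_chars <= len(text) <= max_chars:   # word-count limits
            continue
        if text in seen:                              # full deduplication
            continue
        # Fuzzy deduplication: drop near-duplicates of any kept sentence.
        # Pairwise comparison is O(n^2) and only suitable for small corpora.
        if any(SequenceMatcher(None, text, k).ratio() >= fuzzy_threshold for k in kept):
            continue
        seen.add(text)
        kept.append(text)
    return kept


def split_train_test(texts: List[str], train_ratio: float = 0.8,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Data classification storage: shuffle, then split by the chosen ratio."""
    shuffled = list(texts)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```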
In an optional embodiment, if the corresponding format template is a second format template, the cleaning parameter configuration options further include any one or more of the following configuration options:
a data type retention option, a data type clearing option, an option for retaining key information in each target corpus, and an option for retaining key values in front-end pages.
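For the second format template, front-end page data is often stored as one JSON object per line. The sketch below assumes that layout and a hypothetical `data_type` field, and shows how the data type retention, data type clearing, and key value retention options could filter such records.

```python
import json
from typing import AbstractSet, Iterable, List, Optional


def extract_front_end_corpus(lines: Iterable[str],
                             keep_keys: AbstractSet[str],
                             keep_types: Optional[AbstractSet[str]] = None,
                             drop_types: AbstractSet[str] = frozenset()) -> List[dict]:
    """Filter JSON-lines front-end records by data type and keep only chosen keys.

    "data_type" is a hypothetical field name for the record's type.
    """
    corpus = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue                       # skip malformed rows
        if not isinstance(record, dict):
            continue                       # skip non-object JSON values
        dtype = record.get("data_type")
        if keep_types is not None and dtype not in keep_types:
            continue                       # data type retention option
        if dtype in drop_types:
            continue                       # data type clearing option
        kept = {k: v for k, v in record.items() if k in keep_keys}
        if kept:                           # key value retention option
            corpus.append(kept)
    return corpus
```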
In an optional embodiment, the data processing module 20 is configured to receive a trigger operation on a clustering option in the clustering interactive interface under the corresponding format template; to cluster the target corpus data using a frequent pattern clustering algorithm in response to that trigger operation; and to output the clustering result of the target corpus data obtained through the frequent pattern clustering algorithm.
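The disclosure does not specify which frequent pattern clustering algorithm is used. As a stand-in, the sketch below uses a simple frequent term-set approach: an Apriori-style pass finds term sets of up to two words that occur in at least `min_support` sentences, and each sentence is assigned to the most specific frequent term set it contains. Sentences are assumed to be pre-tokenized with whitespace (Chinese text would first need word segmentation), and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_term_sets(docs: List[FrozenSet[str]], min_support: int,
                       max_size: int = 2) -> Dict[FrozenSet[str], int]:
    """Apriori-style search for term sets contained in >= min_support docs."""
    frequent: Dict[FrozenSet[str], int] = {}
    candidates = {frozenset([t]) for d in docs for t in d}
    for size in range(1, max_size + 1):
        counts: Counter = Counter()
        for d in docs:
            for c in candidates:
                if c <= d:                 # candidate term set occurs in doc
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        if not level:
            break
        frequent.update(level)
        # Next candidates: unions of two frequent sets that grow by one term.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size + 1}
    return frequent


def fp_cluster(sentences: List[str], min_support: int = 2) -> Dict[FrozenSet[str], List[str]]:
    docs = [frozenset(s.split()) for s in sentences]
    frequent = frequent_term_sets(docs, min_support)
    clusters: Dict[FrozenSet[str], List[str]] = {}
    for sent, d in zip(sentences, docs):
        matches = [fs for fs in frequent if fs <= d]
        if not matches:
            continue  # no frequent pattern: sentence stays unclustered
        # Most specific pattern first (larger set), then higher support,
        # then the sorted terms as a deterministic tie-break.
        key = max(matches, key=lambda fs: (len(fs), frequent[fs], sorted(fs)))
        clusters.setdefault(key, []).append(sent)
    return clusters
```

Raising `min_support` keeps only more common patterns; sentences containing no frequent term set are left unclustered.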
In an optional embodiment, the result display module 30 is configured to analyze the processing result to form at least one processing result to be displayed; to receive, in a result interaction interface, a display option trigger operation for the processing result to be displayed; and to display the corresponding processing result to be displayed in response to that trigger operation.
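The display step can be viewed as a mapping from display options to renderers over the analyzed processing result. The sketch below assumes a result dictionary with hypothetical keys and three options corresponding to the clustering, data condition analysis, and high-frequency corpus problem results.

```python
from typing import Callable, Dict


def build_display_views(result: dict) -> Dict[str, Callable[[], str]]:
    """Map each display option to a renderer over the analyzed result."""
    return {
        "clustering": lambda: f"clusters: {result['cluster_count']}",
        "data_condition": lambda: (f"total: {result['total']}, "
                                   f"after cleaning: {result['after_cleaning']}"),
        "high_frequency": lambda: "; ".join(
            f"{q} ({n})" for q, n in result["top_questions"]),
    }


def on_display_option(views: Dict[str, Callable[[], str]], option: str) -> str:
    """Handle a display option trigger operation from the result interface."""
    if option not in views:
        raise KeyError(f"unknown display option: {option!r}")
    return views[option]()
```

Because each renderer is a callable, a view is only rendered when its option is actually triggered.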
In an optional embodiment, the processing result to be displayed includes: clustering results, data condition analysis results and high-frequency corpus problem results.
The embodiment of the disclosure provides a corpus data processing device, which acquires target corpus data in response to a trigger operation on a data uploading option in an uploading interactive interface; determines a format template corresponding to the target corpus data from a plurality of preset format templates, the preset format templates being determined according to format information of historical corpus data; processes the target corpus data according to the processing parameters corresponding to that format template; and displays the processing result of the target corpus data. The corpus data is thereby processed automatically, improving processing efficiency and accuracy.
The device provided in this embodiment may be used to implement the technical solutions of the above method embodiments; its implementation principles and technical effects are similar and are not described herein again.
Referring to Fig. 7, a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure is shown; the electronic device may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a tablet computer (PAD), a portable multimedia player (PMP), and a vehicle-mounted terminal (e.g., a car navigation terminal), and fixed terminals such as a digital TV and a desktop computer. The electronic device shown in Fig. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage means 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic apparatus are also stored. The processing apparatus 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
Generally, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 907 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909. The communication means 909 may allow the electronic device to perform wireless or wired communication with other devices to exchange data. While fig. 7 illustrates an electronic device having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing apparatus 901.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the first retrieving unit may also be described as "a unit for retrieving at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the features described above with features having similar functions disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.