
CN112231444A - Method, device and electronic device for processing corpus data combining RPA and AI - Google Patents

Publication number
CN112231444A
Authority
CN
China
Prior art keywords
data
option
corpus data
processing
target corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011126085.XA
Other languages
Chinese (zh)
Inventor
胡一川
刘金艳
胡景超
汪冠春
褚瑞
李玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Benying Network Technology Co Ltd
Beijing Laiye Network Technology Co Ltd
Original Assignee
Beijing Benying Network Technology Co Ltd
Beijing Laiye Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Benying Network Technology Co Ltd and Beijing Laiye Network Technology Co Ltd
Publication of CN112231444A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3335Syntactic pre-processing, e.g. stopword elimination, stemming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a method, device and electronic device for processing corpus data combining RPA and AI. Target corpus data is acquired in response to a trigger operation on a data upload option in an upload interactive interface; a format template corresponding to the target corpus data is determined from a plurality of preset format templates, each preset format template being determined according to format information of historical corpus data; the target corpus data is processed according to the processing parameters corresponding to that format template; and the processing result of the target corpus data is displayed. Corpus data is thereby processed automatically, which improves processing efficiency and accuracy.

Description

Processing method and device for corpus data combining RPA and AI and electronic equipment
Technical Field
The embodiment of the disclosure relates to the technical field of data processing, and in particular relates to a method and a device for processing corpus data by combining RPA and AI, and electronic equipment.
Background
Robotic Process Automation (RPA) is a new type of AI-based virtual process automation robot that simulates human operations on a computer and automatically executes process tasks according to rules.
Artificial Intelligence (AI) is a technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human Intelligence. With the development of artificial intelligence, various intelligent interactive robots are also developed rapidly. In order to satisfy the interaction between the intelligent interactive robot and the user, a knowledge base needs to be established to deal with the various problems of the user, and a large amount of linguistic data is often required to be processed before the knowledge base is established.
At present, large volumes of corpora come from different sources, such as databases, log data and front-end page data, so the corpus data is presented in many different formats. In the prior art, corpus data in such diverse formats can only be cleaned, clustered and otherwise processed manually by professionals; automatic processing cannot be achieved, so both processing efficiency and processing accuracy are low.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present disclosure provide a method and an apparatus for processing corpus data in combination with RPA and AI, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a method for processing corpus data in combination with an RPA and an AI, including:
responding to the triggering operation of a data uploading option in the uploading interactive interface, and acquiring target corpus data;
determining a format template corresponding to the target corpus data from a plurality of preset format templates; the preset format templates are determined according to format information of historical corpus data;
processing the target corpus data according to the processing parameters corresponding to the corresponding format template;
and displaying the processing result of the target corpus data.
In an optional embodiment, determining a format template corresponding to the target corpus data from a plurality of preset format templates includes:
determining format information of the target corpus data;
and determining a corresponding format template from a plurality of preset format templates according to the format information of the target corpus data.
In an optional embodiment, determining a format template corresponding to the target corpus data from a plurality of preset format templates includes:
receiving trigger operation of a template selection option in the uploading interactive interface;
and responding to the trigger operation of the template selection option, and determining a format template corresponding to the target corpus data.
In an optional embodiment, the obtaining target corpus data in response to a trigger operation of a data upload option in an upload interactive interface includes:
responding to the triggering operation of a data uploading option in the uploading interactive interface, and displaying NLP corpus data in the uploading interactive interface;
and receiving selection operation of target corpus data in the NLP corpus data in an uploading interactive interface, and acquiring the target corpus data.
In an optional embodiment, the processing the target corpus data according to the processing parameter corresponding to the corresponding format template includes:
receiving the trigger operation of the cleaning parameter configuration options in the cleaning interactive interface under the corresponding format template;
and responding to the trigger operation of the cleaning parameter configuration option, and cleaning the target corpus data.
In an optional embodiment, if the corresponding format template is the first format template, the cleaning parameter configuration options include any one or more of the following configuration options:
a data volume option, a data classification storage option, a data replacement option and a data deletion option;
the data classification storage options include: an option for the ratio of training data to test data and/or an option for the number of files in which the processed target corpus data is stored;
the data replacement options include any one of the following replacement options:
a preset symbol replacement option, a preset number replacement option, a preset character replacement option, a telephone number replacement option and a website address replacement option;
the data deletion options include any one or more of the following deletion options:
a full deduplication option, a fuzzy deduplication option, a number deletion option, a non-Chinese text deletion option, a standard script deletion option, a fuzzy script deletion option, a first limited word number text retention option, and a second limited word number text deletion option.
In an optional embodiment, if the corresponding format template is a second format template, the cleaning parameter configuration options further include any one or more of the following configuration options:
a data type retention option, a data type clearing option, an option to retain key information in each target corpus, and an option to retain key values in the front-end page.
In an optional embodiment, the processing the target corpus data according to the processing parameter corresponding to the corresponding format template includes:
receiving triggering operation of clustering options in the clustering interactive interface under the corresponding format template;
responding to the triggering operation of the clustering option, and clustering the target corpus data by adopting a frequent pattern clustering algorithm;
and outputting the clustering result of the target corpus data through the frequent pattern clustering algorithm.
In an optional embodiment, the displaying the processing result of the target corpus data includes:
analyzing the processing result to form at least one processing result to be displayed;
receiving a display option triggering operation of the processing result to be displayed in a result interaction interface;
and responding to the display option triggering operation, and displaying the corresponding processing result to be displayed.
In an optional embodiment, the processing result to be displayed includes: clustering results, data condition analysis results and high-frequency corpus problem results.
In a second aspect, the present disclosure provides a processing apparatus for corpus data in combination with RPA and AI, comprising:
the data acquisition module is used for responding to the triggering operation of the data uploading option in the uploading interactive interface and acquiring target corpus data;
the data processing module is used for determining a format template corresponding to the target corpus data from a plurality of preset format templates and processing the target corpus data according to processing parameters corresponding to the corresponding format template; the preset format templates are determined according to format information of historical corpus data;
and the result display module is used for displaying the processing result of the target corpus data.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory, a processor, and a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any implementation of the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer readable storage medium having a computer program stored thereon, the computer program being executed by a processor to implement the method of any implementation of the first aspect.
The embodiment of the disclosure provides a corpus data processing method, a corpus data processing device, electronic equipment and a storage medium, wherein target corpus data is acquired by responding to triggering operation of a data uploading option in an uploading interactive interface; determining a format template corresponding to the target corpus data from a plurality of preset format templates; the preset format templates are determined according to format information of historical corpus data; processing the target corpus data according to the processing parameters corresponding to the corresponding format template; and displaying the processing result of the target corpus data, thereby realizing automatic processing and improving the processing efficiency and the processing accuracy.
It should be understood that what is described in the foregoing disclosure section is not intended to limit key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a schematic diagram of a network architecture upon which the present disclosure is based;
FIG. 2 is a schematic flowchart of a method for processing corpus data in combination with RPA and AI according to an embodiment of the present disclosure;
FIG. 3 is a display interface of a clustering result provided by the present disclosure;
FIG. 4 is a display interface of data condition analysis results provided by the present disclosure;
FIG. 5 is a display interface of high-frequency corpus problem results provided by the present disclosure;
FIG. 6 is a block diagram illustrating the structure of a processing apparatus for corpus data in combination with RPA and AI according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an electronic device suitable for implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
With the development of artificial intelligence, various intelligent interactive robots are also developed rapidly. In order to satisfy the interaction between the intelligent interactive robot and the user, a knowledge base needs to be established to deal with the various problems of the user, and a large amount of linguistic data is often required to be processed before the knowledge base is established.
At present, large volumes of corpora come from different sources, such as databases, log data and front-end page data, so the corpus data is presented in many different formats. In the prior art, corpus data in such diverse formats can only be cleaned, clustered and otherwise processed manually by professionals; automatic processing cannot be achieved, so both processing efficiency and processing accuracy are low.
In view of the above problems, the present disclosure provides a method and an apparatus for processing corpus data in combination with RPA and AI, an electronic device, and a storage medium.
RPA is a business process automation technology based on software robots and Artificial Intelligence (AI). It simulates human operations on a computer through specific "robot software" and automatically executes process tasks according to rules. The processing of corpus data is one part of RPA technology. Referring to FIG. 1, FIG. 1 is a schematic diagram of a network architecture on which the present disclosure is based. As shown in FIG. 1, one such network architecture may include a processing device 2 for corpus data combining RPA and AI, and terminals 1.
The corpus data processing device 2 is hardware or software that can interact with each terminal 1 through a network. It can be used to execute the corpus data processing method described in the following examples and to provide a corpus data processing interface and service for the client carried on each terminal 1.
When the corpus data processing device 2 is hardware, it includes a cloud server with an arithmetic function. When the corpus data processing device 2 is software, it can be installed in an electronic device with computing function, wherein the electronic device includes, but is not limited to, a laptop portable computer, a desktop computer, and the like.
The terminal 1 is a device including a smart phone, a tablet computer, a desktop computer, and the like, which can communicate and exchange information with the processing device 2 of corpus data via a network.
The embodiment of the disclosure provides a processing method, a processing device, electronic equipment and a storage medium for corpus data combined with RPA and AI, and target corpus data is obtained by responding to the triggering operation of a data uploading option in an uploading interactive interface; determining a format template corresponding to the target corpus data from a plurality of preset format templates; the preset format templates are determined according to format information of historical corpus data; processing the target corpus data according to the processing parameters corresponding to the corresponding format template; and displaying the processing result of the target corpus data, thereby realizing the automatic processing of the corpus data and improving the processing efficiency and the processing accuracy.
In a first aspect, referring to fig. 2, fig. 2 is a schematic flowchart of a method for processing corpus data in combination with RPA and AI according to an embodiment of the present disclosure. The processing method of the corpus data combining the RPA and the AI, provided by the embodiment of the disclosure, comprises the following steps:
Step 101, responding to a trigger operation of a data uploading option in an uploading interactive interface, and acquiring target corpus data.
Step 102, determining a format template corresponding to the target corpus data from a plurality of preset format templates; the preset format templates are determined according to format information of the historical corpus data.
Step 103, processing the target corpus data according to the processing parameters corresponding to the corresponding format template.
Step 104, displaying the processing result of the target corpus data.
The execution main body of the processing method provided by the present example is the processing device for the corpus data, and the target corpus data is obtained by responding to the trigger operation of the data uploading option in the uploading interactive interface; determining a format template corresponding to the target corpus data from a plurality of preset format templates; the preset format templates are determined according to format information of historical corpus data; processing the target corpus data according to the processing parameters corresponding to the corresponding format template; and displaying the processing result of the target corpus data, thereby realizing the automatic processing of the corpus data and improving the processing efficiency and the processing accuracy.
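The four steps above can be sketched as a small pipeline. This is only an illustrative sketch: the `run_pipeline` function, the dictionary layout of a template and the `pick` callback are hypothetical names introduced here, not part of the disclosure.

```python
from typing import Any, Callable, Dict, List

Corpus = List[str]
# Hypothetical template layout: {"name": str, "steps": [callable, ...]}
Template = Dict[str, Any]


def run_pipeline(uploaded: Corpus,
                 templates: List[Template],
                 pick: Callable[[Corpus, List[Template]], Template]) -> Dict[str, Any]:
    # Step 101: the uploaded, non-empty rows become the target corpus data.
    target = [line for line in uploaded if line.strip()]
    # Step 102: choose one of the preset format templates.
    template = pick(target, templates)
    # Step 103: apply the processing steps attached to that template, in order.
    for step in template["steps"]:
        target = step(target)
    # Step 104: return what a result page could display.
    return {"template": template["name"], "rows": target}


# Illustrative template with two processing steps: trim, then drop exact repeats.
templates = [{
    "name": "first",
    "steps": [
        lambda rows: [r.strip() for r in rows],
        lambda rows: list(dict.fromkeys(rows)),
    ],
}]
result = run_pipeline([" hello ", "hello", "", "bye"], templates, lambda c, t: t[0])
```

Keeping the processing steps attached to the template mirrors the idea that each format template carries its own processing parameters.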
In the following, the solution provided by the present disclosure will be further described:
first, as described in step 101, a user may trigger an operation of a data upload option in an upload interactive interface of a client of a terminal, so that the client sends the trigger operation to a processing device for the processing device to respond. In the uploading interactive interface, a user can upload target corpus data for Natural Language Processing (NLP).
NLP is an important direction in the fields of computer science and AI, and the content of NLP research includes but is not limited to the following branch fields: text classification, information extraction, automatic summarization, intelligent question answering, topic recommendation, machine translation, subject word recognition, knowledge base construction, deep text representation, named entity recognition, text generation, text analysis (lexical, syntactic, grammatical, etc.), speech recognition and synthesis, and the like. The linguistic data is an important resource for NLP, and a knowledge base can be constructed by utilizing the linguistic data and is used for machine translation, intelligent question answering and the like.
In order to facilitate user operation, in an optional implementation manner, the processing device may first display NLP corpus data in the upload interactive interface in response to a user triggering operation through a data upload option in the upload interactive interface; and then, receiving the selection operation of the user on the target corpus data in the displayed NLP corpus data in the uploading interactive interface, and acquiring the target corpus data according to the selection operation.
That is, when the data to be processed is large, the user may upload the NLP corpus data, and then select the target corpus data through the selection operation, so as to be processed by the processing device. Then, as shown in step 102, the processing device needs to determine a format template corresponding to the target corpus data from a plurality of preset format templates.
Specifically, in order to process data in multiple data formats, in this embodiment, a format template corresponding to the target corpus data needs to be determined. The format templates include corresponding processing parameters determined according to format information of the historical corpus data, namely the format templates are obtained based on experience.
In an alternative embodiment, the processing device may specifically adopt the following method when determining the format template of the target corpus data:
The processing device determines the format information of the target corpus data and, according to that format information, determines the corresponding format template from the plurality of preset format templates; and/or it receives a trigger operation on the template selection option in the uploading interactive interface and, in response to that trigger operation, determines the format template corresponding to the target corpus data.
That is, the format template for the target corpus data may be determined by the processing device analyzing the format information of the target corpus data by itself, for example, the processing device may determine the format template by the number of data column entries included in the target corpus data.
In addition, the processing device may also determine the format template from the user's trigger operation on the template selection option in the uploading interactive interface; for example, the user may download a format template in advance and then upload the target corpus data based on that format template.
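The two ways of determining the format template can be illustrated together. The sketch below assumes, purely for illustration, that each preset template is identified by an expected number of data columns; the `PRESET_TEMPLATES` table and the `detect_template` function are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical presets: template name -> expected number of data columns.
PRESET_TEMPLATES: Dict[str, int] = {"first": 1, "second": 3}


def detect_template(rows: List[List[str]], user_choice: Optional[str] = None) -> str:
    # An explicit selection in the upload interface takes precedence.
    if user_choice is not None:
        if user_choice not in PRESET_TEMPLATES:
            raise KeyError(f"unknown template: {user_choice}")
        return user_choice
    if not rows:
        raise ValueError("no rows to inspect")
    # Otherwise infer the template from the most common column count.
    width, _ = Counter(len(row) for row in rows).most_common(1)[0]
    for name, columns in PRESET_TEMPLATES.items():
        if columns == width:
            return name
    raise ValueError(f"no preset template has {width} columns")
```

Using the most common column count rather than the first row's count makes the guess tolerant of an occasional malformed row.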
Then, as shown in steps 103 and 104, the processing device processes the target corpus data according to the processing parameters corresponding to the corresponding format template, and displays the processing result of the target corpus data.
Specifically, the processing includes several kinds, such as cleaning, clustering and analysis. Accordingly, when the processing result is displayed, the result corresponding to any one of these kinds of processing can be displayed separately.
When the processing is cleaning processing, the processing device can receive the triggering operation of cleaning parameter configuration options in a cleaning interaction interface of a user under a format template corresponding to the target corpus data; and responding to the trigger operation of the cleaning parameter configuration option, and cleaning the target corpus data.
Further, during the cleaning process, the processing device may determine a cleaning policy for the target corpus data based on the format template corresponding to the target corpus data, and perform corresponding cleaning. In this embodiment, when the format template corresponding to the target corpus data is the first format template, the target corpus data may be cleaned according to different cleaning strategies based on one or more of the following cleaning parameter configuration options:
The cleaning parameter configuration options include: a data volume option, a data classification storage option, a data replacement option and a data deletion option.
The data volume option determines which data in the target corpus data is retained during cleaning, that is, whether the first-line data or all data is retained.
All data means that all dialogue data in the user's target corpus data is retained, and first-line data means that the first sentence of dialogue data in the user's target corpus data is retained. The first-line data retains the first sentence of dialogue data that carries semantics; if the first sentence in the user's target corpus data consists only of symbols or pictures, the next sentence of dialogue data is retained instead.
For cleaning based on the data volume option, when the volume of target corpus data is very large, the data needs to be screened so that the data with the most analytical value, such as the first sentence of dialogue data, is retained; conversely, when the volume is small, all data can be retained so that it is fully used during analysis and an accurate analysis result is obtained.
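A minimal sketch of the data volume option follows. It assumes that an utterance "carries semantics" when it contains at least one letter, digit or Chinese character, and that pictures appear as a placeholder such as `[image]`; both conventions and the `apply_volume_option` name are assumptions made for illustration.

```python
import re
from typing import List

# Assumption: text with a letter, digit or CJK character carries semantics.
_HAS_TEXT = re.compile(r"[A-Za-z0-9\u4e00-\u9fff]")
# Assumption: pictures are represented by a placeholder token.
_IMAGE = re.compile(r"^\[(image|picture)\]$", re.IGNORECASE)


def apply_volume_option(dialogue: List[str], keep: str = "first") -> List[str]:
    if keep not in ("first", "all"):
        raise ValueError("keep must be 'first' or 'all'")
    if keep == "all":
        return list(dialogue)
    # keep == "first": skip symbol-only or picture-only utterances.
    for utterance in dialogue:
        text = utterance.strip()
        if _IMAGE.match(text):
            continue
        if _HAS_TEXT.search(text):
            return [utterance]
    return []


session = ["???", "[image]", "Can a baby drink milk?", "thanks"]
first_only = apply_volume_option(session, keep="first")
everything = apply_volume_option(session, keep="all")
```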
The data classification storage option determines how the target corpus data is partitioned during cleaning, so that the partitioned data can be used to train a corpus analysis model. That is, by selecting the data classification storage option, the ratio of training data to test data and/or the number of files in which the processed target corpus data is stored can be set.
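As a rough illustration of the data classification storage option, the sketch below splits a corpus by a training ratio and distributes rows over a chosen number of files. The function names, the fixed random seed and the round-robin sharding are assumptions, not details given in the disclosure.

```python
import random
from typing import List, Tuple


def split_corpus(rows: List[str], train_ratio: float = 0.8,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    if not 0.0 <= train_ratio <= 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def shard(rows: List[str], n_files: int) -> List[List[str]]:
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    # Round-robin assignment keeps the shards close to equal in size.
    return [rows[i::n_files] for i in range(n_files)]


corpus = [str(i) for i in range(10)]
train, test = split_corpus(corpus, train_ratio=0.8)
```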
The data replacement option is used to normalize data such as emoticons, website addresses, numbers and phone numbers in the target corpus data, so that such data does not affect subsequent clustering. That is, the data replacement options include any one of the following replacement options: a preset symbol replacement option, a preset number replacement option, a preset text replacement option, a telephone number replacement option and a website address replacement option. Further, during cleaning, regular-expression replacement may be used to process the data that needs to be replaced, for example replacing "say" in the target corpus data with a blank.
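The normalization described above can be sketched with Python's `re` module. The placeholder tokens (`<URL>`, `<PHONE>`, `<NUM>`), the assumption that phone numbers are 11-digit numbers starting with 1, and the emoji ranges are all illustrative choices rather than values taken from the disclosure.

```python
import re

# Order matters: URLs and phone numbers are replaced before bare numbers.
RULES = [
    (re.compile(r"https?://\S+"), "<URL>"),
    # Assumption: phone numbers are 11 digits starting with 1.
    (re.compile(r"(?<!\d)1\d{10}(?!\d)"), "<PHONE>"),
    (re.compile(r"\d+(?:\.\d+)?"), "<NUM>"),
    # Assumption: these ranges cover the emoticons of interest.
    (re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]"), ""),
]


def normalize(text: str) -> str:
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    # Collapse any whitespace left behind by removed symbols.
    return re.sub(r"\s+", " ", text).strip()


sample = "Call 13812345678 or see https://example.com/a?b=1 for 3 items \U0001F600"
cleaned = normalize(sample)
```

Replacing variable values with fixed placeholders means utterances that differ only in a number or link look identical to the later clustering step.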
For the data deletion option, the purpose is to remove duplicate or low-value data from the target corpus data. The data deletion options include any one or more of the following deletion options: a full deduplication option, a fuzzy deduplication option, a number deletion option, a non-Chinese text deletion option, a standard script deletion option, a fuzzy script deletion option, a first limited word number text retention option, and a second limited word number text deletion option.
For example, the data deletion options may include a full deduplication option for deleting identical data in the corpus;
a fuzzy deduplication option (Simhash deduplication) for deleting closely similar data in the corpus, such as two nearly identical phrasings of "Can a baby drink milk?";
a number deletion option and/or a non-Chinese text deletion option for removing numbers or text that contains no Chinese characters, with the aim of deleting low-value data;
a standard script deletion option for deleting standard scripts that precede the user's utterances in the corpus data; for example, a visitor's first sentence "<Customer clicks on customer service menu>" is a standard script and can be deleted;
a fuzzy script deletion option for deleting utterances that are not the user's own corpus; for example, several phrases, including "little tiger online", can be configured so that any sentence containing one of them is deleted in its entirety;
a first limited word number text retention option for retaining as data only questions whose length falls within the first limited word number (2 to 50 words); questions outside this range are generally not questions posed by the client and have no corpus processing value;
a second limited word number text deletion option that allows configuring the deletion of text between two specified characters.
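Several of these deletion options can be sketched in a few lines. The Simhash below is a simplified variant built on character bigrams and MD5, and the Hamming-distance threshold, the character-based length count and all function names are assumptions chosen for illustration; the disclosure only names Simhash deduplication without giving parameters.

```python
import hashlib
import re
from typing import List


def simhash(text: str, bits: int = 64) -> int:
    # Features are character bigrams of the lowercased, whitespace-free text.
    text = re.sub(r"\s+", "", text.lower())
    grams = [text[i:i + 2] for i in range(max(len(text) - 1, 1))]
    weights = [0] * bits
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for b in range(bits):
            weights[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if weights[b] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def exact_dedup(rows: List[str]) -> List[str]:
    # dict.fromkeys keeps the first occurrence and preserves order.
    return list(dict.fromkeys(rows))


def fuzzy_dedup(rows: List[str], max_distance: int = 3) -> List[str]:
    kept, signatures = [], []
    for row in rows:
        sig = simhash(row)
        # Keep a row only if it is far from every signature kept so far.
        if all(hamming(sig, s) > max_distance for s in signatures):
            kept.append(row)
            signatures.append(sig)
    return kept


def drop_non_chinese(rows: List[str]) -> List[str]:
    return [r for r in rows if re.search(r"[\u4e00-\u9fff]", r)]


def keep_by_length(rows: List[str], lo: int = 2, hi: int = 50) -> List[str]:
    # Assumption: length is counted in characters.
    return [r for r in rows if lo <= len(r) <= hi]
```

The pairwise comparison in `fuzzy_dedup` is quadratic in the number of kept rows, which is acceptable for a sketch but would need an index over signature segments for a large corpus.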
In addition, in other optional embodiments, when the format template corresponding to the target corpus data is the second format template, the cleaning parameter configuration options further include any one or more of the following configuration options: a data type retention option, a data type clearing option, an option for retaining key information in each target corpus, and an option for retaining key values in a front-end page.
As mentioned above, the format template is determined based on the format information of the data. Specifically, when the target corpus data includes a customer service name and a user name, the data type retention option and/or the data type clearing option may be triggered to configure how customer service is distinguished from the user, so that user data is retained or customer service data is removed.
When the database data and the log data have a certain format, the option for retaining key information in each target corpus and/or the option for retaining key values in the front-end page may be triggered, so as to configure which data is retained (for example, the data after three blank spaces in each record), or to configure on the front-end page the key whose corresponding value is retained.
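A minimal sketch of these second-template options follows. The record layouts are assumptions made for illustration: dialogue records are `(speaker, text)` pairs, log lines are space-separated, and structured records are dictionaries. All function names are hypothetical.

```python
def retain_user_utterances(records, agent_names):
    """Keep only utterances whose speaker is not a customer service agent."""
    return [text for speaker, text in records if speaker not in agent_names]

def retain_after_spaces(line, n_spaces=3):
    """Keep the part of a log line that follows its first n spaces."""
    parts = line.split(" ", n_spaces)
    return parts[n_spaces] if len(parts) > n_spaces else ""

def retain_key(record, key):
    """Keep only the value stored under the configured key."""
    return record.get(key)
```

For instance, with `n_spaces=3` a line such as `2020-03-31 10:00:01 INFO 怎么退货` is reduced to its message part.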
In this embodiment, a format template whose cleaning parameter configuration options include any one or more of a data volume option, a data classification storage option, a data replacement option and a data deletion option is referred to as the first format template; a format template whose cleaning parameter configuration options include any one or more of those options and further include one or more of a data type retention option, a data type clearing option, an option for retaining key information in each target corpus and an option for retaining key values in a front-end page is referred to as the second format template.
The processing device receives the user's triggering operation on the cleaning parameter configuration options in the cleaning interactive interface under the corresponding format template, so that the processing device can perform the corresponding cleaning processing on the target corpus data based on the triggering operation.
In other optional embodiments, after the cleaning of the target corpus data is completed, the data can be clustered. Specifically, a triggering operation by the user on a clustering option in the clustering interactive interface under the corresponding format template is received; in response to the triggering operation of the clustering option, the target corpus data is clustered using a frequent pattern clustering algorithm; and the clustering result of the target corpus data obtained through the frequent pattern clustering algorithm is output.
Specifically, the cleaned target corpus data can be directly clustered to find question clusters so as to determine knowledge points. Generally, various algorithms can be used to cluster the target corpus data, such as density-based clustering, k-means-based clustering, and hdbscan-based clustering. Considering factors such as the time required for clustering, the amount of data to be clustered, and the accuracy of the obtained knowledge points, frequent pattern clustering can be adopted in this embodiment. A frequent pattern is a pattern that appears frequently in the data set; for example, a set of words such as (consultation, business) that frequently appears together in the dialogue data is a frequent item set. In this embodiment, valuable words are found first, the frequent item sets are then found, and finally dialogues sharing the same item set are grouped together, thereby implementing clustering.
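The three steps described above (find valuable words, find frequent item sets, group dialogues by shared item set) can be sketched as follows. This is a simplified, Apriori-style illustration restricted to two-word item sets, not the specific algorithm of this embodiment; the function name, the `min_support` threshold, and the assumption that dialogues are already tokenized are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def frequent_pattern_clusters(dialogs, min_support=2, stopwords=frozenset()):
    """Group tokenized dialogs by the frequent word pairs they contain."""
    # Step 1: keep "valuable" words, i.e. non-stopwords found in enough dialogs.
    token_sets = [set(tokens) - stopwords for tokens in dialogs]
    word_counts = Counter(w for s in token_sets for w in s)
    frequent_words = {w for w, c in word_counts.items() if c >= min_support}

    # Step 2: count co-occurring pairs of frequent words (2-item frequent sets).
    pair_counts = Counter()
    for s in token_sets:
        for pair in combinations(sorted(s & frequent_words), 2):
            pair_counts[pair] += 1
    frequent_pairs = {p for p, c in pair_counts.items() if c >= min_support}

    # Step 3: dialogs sharing the same frequent pair fall into the same cluster.
    clusters = defaultdict(list)
    for idx, s in enumerate(token_sets):
        for pair in combinations(sorted(s & frequent_words), 2):
            if pair in frequent_pairs:
                clusters[pair].append(idx)
    return dict(clusters)
```

In this sketch a dialogue that contains several frequent pairs appears in several clusters; a production system would likely need a rule for assigning each dialogue to a single knowledge point.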
Finally, as stated in step 104, the processing device further displays the processing result of the target corpus data.
After the target corpus data is processed, the corresponding processing result can be displayed; for example, if the target corpus data has been deduplicated, the deduplicated corpus can be displayed.
In practical application, one or more processing operations may be performed on the target corpus data. When displaying the processing result, the processing result can be analyzed to form at least one processing result to be displayed; a display option triggering operation on the processing result to be displayed is received in a result interaction interface; and in response to the display option triggering operation, the corresponding processing result to be displayed is displayed.
As described previously, the processing results to be displayed include: clustering results, data condition analysis results and high-frequency corpus problem results.
For the cleaned data, the data volume and the clustering result after cleaning can be given directly, which is convenient for reporting and analysis.
Fig. 3 is a display interface of a clustering result provided by the present application. As shown in fig. 3, the first histogram shows the distribution of knowledge points in the cleaned result: 365 knowledge point clusters contain 3-10 similar questions, 25 clusters contain 10-20 similar questions, and 4 clusters contain 20-50 similar questions. The second histogram shows the frequency of sentence occurrences: the clusters with 3-10 similar questions contain 1530 sentences in total, the clusters with 10-20 similar questions contain 317 sentences, and the clusters with 20-50 similar questions contain 317 sentences, so the total number of sentences is 1530 + 317 + 317 = 2164.
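The two histograms of fig. 3 can be derived from a list of cluster sizes by bucketing, as the sketch below illustrates. The function name is hypothetical, and the handling of bucket boundaries is an assumption (here each bucket includes its lower edge and excludes its upper edge), since the text does not say which bucket a cluster of exactly 10 or 20 questions belongs to.

```python
def bucket_clusters(cluster_sizes, edges=(3, 10, 20, 50)):
    """Return {label: (number_of_clusters, number_of_sentences)} per size bucket.

    Buckets are half-open [lo, hi); sizes outside all buckets are ignored.
    """
    ranges = list(zip(edges, edges[1:]))
    buckets = {f"{lo}-{hi}": [0, 0] for lo, hi in ranges}
    for size in cluster_sizes:
        for lo, hi in ranges:
            if lo <= size < hi:
                buckets[f"{lo}-{hi}"][0] += 1      # first histogram: clusters
                buckets[f"{lo}-{hi}"][1] += size   # second histogram: sentences
                break
    return {label: tuple(counts) for label, counts in buckets.items()}
```

The first element of each tuple corresponds to the knowledge point distribution, and the second to the sentence frequency for the same bucket.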
Fig. 4 is a display interface of data condition analysis results provided by the present application. As shown in fig. 4, the total amount of data analyzed this time is 10969, the amount of data after cleaning is 7643, the amount of data after deduplication is 5643, the total number of clusters formed is 394, and the total frequency of similar questions after clustering is 2164.
Fig. 5 is a display interface of the result of the high-frequency corpus problem provided by the present application, as shown in fig. 5, which shows the content and frequency of the high-frequency query related to the current processing.
In addition, an analysis report can be generated from the displayed results. The report can explain algorithms such as cleaning and clustering and reserve some blank pages, so that the report content can be completed simply by copying the data page and the histograms generated by the platform to the designated pages.
To further improve automation, the triggering operations that the user performs on the uploading interactive interface, the cleaning interactive interface, the clustering interactive interface and the like of the terminal can instead be completed by an RPA robot, which reduces manual operation and improves the degree of automation and the processing efficiency of the corpus data. The embodiment of the disclosure provides a corpus data processing method, which includes: obtaining target corpus data in response to a triggering operation of a data uploading option in an uploading interactive interface; determining a format template corresponding to the target corpus data from a plurality of preset format templates, the preset format templates being determined according to format information of historical corpus data; processing the target corpus data according to the processing parameters corresponding to the corresponding format template; and displaying the processing result of the target corpus data, thereby realizing the automatic processing of the corpus data and improving the processing efficiency and the processing accuracy.
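The four steps of the method (acquire, determine the format template, process, display) can be sketched end to end as below. Everything here is a hypothetical illustration: the `FormatTemplate` class, the option names, and in particular the heuristic that lines of the form `speaker: text` indicate the second format template are assumptions, not details given in this disclosure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FormatTemplate:
    name: str
    options: tuple

# Hypothetical option names mirroring the two templates described in the text.
FIRST = FormatTemplate("first", ("data_volume", "classified_storage",
                                 "replacement", "deletion"))
SECOND = FormatTemplate("second", FIRST.options + (
    "retain_type", "clear_type", "retain_key_info", "retain_key_value"))

def determine_template(lines):
    """Assumed heuristic: 'speaker: text' lines imply the second template."""
    has_speaker = bool(lines) and all(":" in line for line in lines)
    return SECOND if has_speaker else FIRST

def process(lines, template):
    """Strip speaker prefixes for the second template, then deduplicate in order."""
    if template is SECOND:
        lines = [line.split(":", 1)[1].strip() for line in lines]
    return list(dict.fromkeys(line for line in lines if line))

def run(lines):
    """Determine the template, process the data, and return a displayable summary."""
    template = determine_template(lines)
    result = process(lines, template)
    return {"template": template.name, "before": len(lines),
            "after": len(result), "data": result}
```

An RPA robot would drive the same steps through the interactive interfaces rather than by calling functions directly; the sketch only shows the order of operations.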
Fig. 6 is a block diagram of a processing device for corpus data combining RPA and AI according to an embodiment of the present disclosure, which corresponds to the corpus data processing method in the foregoing embodiments. For ease of illustration, only the portions relevant to the embodiments of the present disclosure are shown. Referring to fig. 6, the processing device for corpus data combining RPA and AI includes: a data acquisition module 10, a data processing module 20 and a result display module 30.
The data acquisition module 10 is configured to respond to a trigger operation of a data uploading option in the uploading interactive interface, and acquire target corpus data;
the data processing module 20 is configured to determine a format template corresponding to the target corpus data from a plurality of preset format templates, and process the target corpus data according to processing parameters corresponding to the corresponding format template; the preset format templates are determined according to format information of historical corpus data;
and a result display module 30, configured to display a processing result of the target corpus data.
In an optional embodiment, the data processing module 20 is configured to determine format information of the target corpus data; and determining a corresponding format template from a plurality of preset format templates according to the format information of the target corpus data.
In an optional embodiment, the data processing module 20 is configured to receive a trigger operation of a template selection option in the upload interactive interface; and responding to the trigger operation of the template selection option, and determining a format template corresponding to the target corpus data.
In an optional embodiment, the data acquisition module 10 is configured to respond to a trigger operation of a data uploading option in an uploading interactive interface, and display NLP corpus data in the uploading interactive interface; and to receive a selection operation on target corpus data among the NLP corpus data in the uploading interactive interface, and acquire the target corpus data.
In an optional embodiment, the data processing module 20 is configured to receive a trigger operation of a cleaning parameter configuration option in the cleaning interaction interface under the corresponding format template; and responding to the trigger operation of the cleaning parameter configuration option, and cleaning the target corpus data.
In an optional embodiment, if the corresponding format template is the first format template, the cleaning parameter configuration options include any one or more of the following configuration options:
a data volume option, a data classification storage option, a data replacement option and a data deletion option;
the data classification storage options include: an option for the proportion of training data to test data and/or an option for the number of files in which the processed target corpus data is stored;
the data replacement options include any one of the following replacement options:
a preset symbol replacement option, a preset number replacement option, a preset character replacement option, a telephone number replacement option and a website address replacement option;
the data deletion options include any one or more of the following deletion options:
a full deduplication option, a fuzzy deduplication option, a numeric deletion option, a non-Chinese text deletion option, a standard script deletion option, a fuzzy script deletion option, a first limited word number text retention option, and a second limited word number text deletion option.
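Two of the options listed above, the telephone number and website address replacement options and the training/test proportion option, can be sketched as follows. The placeholder tokens, the mainland-China mobile number pattern, the default 80/20 split and all function names are assumptions made for illustration.

```python
import random
import re

URL = re.compile(r"https?://\S+")
# Assumed pattern: 11-digit mainland-China mobile number not embedded in a longer number.
PHONE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")

def replace_sensitive(text, phone_token="<PHONE>", url_token="<URL>"):
    """Replace website addresses and telephone numbers with placeholder tokens."""
    text = URL.sub(url_token, text)
    return PHONE.sub(phone_token, text)

def split_train_test(items, train_ratio=0.8, seed=0):
    """Shuffle reproducibly, then split according to the configured proportion."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]
```

Website addresses are replaced before telephone numbers so that digits inside a URL are not rewritten separately.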
In an optional embodiment, if the corresponding format template is the second format template, the cleaning parameter configuration options further include any one or more of the following configuration options:
a data type retention option, a data type clearing option, an option for retaining key information in each target corpus, and an option for retaining key values in a front-end page.
In an optional embodiment, the data processing module 20 is configured to receive a triggering operation of a clustering option in a clustering interactive interface under the corresponding format template; responding to the triggering operation of the clustering option, and clustering the target corpus data by adopting a frequent pattern clustering algorithm; and outputting the clustering result of the target corpus data through the frequent pattern clustering algorithm.
In an optional embodiment, the result display module 30 is configured to analyze the processing result to form at least one processing result to be displayed; receiving a display option triggering operation of the processing result to be displayed in a result interaction interface; and responding to the display option triggering operation, and displaying the corresponding processing result to be displayed.
In an optional embodiment, the processing result to be displayed includes: clustering results, data condition analysis results and high-frequency corpus problem results.
The embodiment of the disclosure provides a corpus data processing device, which obtains target corpus data by responding to a trigger operation of a data uploading option in an uploading interactive interface; determining a format template corresponding to the target corpus data from a plurality of preset format templates; the preset format templates are determined according to format information of historical corpus data; processing the target corpus data according to the processing parameters corresponding to the corresponding format template; and displaying the processing result of the target corpus data, thereby realizing the automatic processing of the corpus data and improving the processing efficiency and the processing accuracy.
The electronic device provided in this embodiment may be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.
Referring to fig. 7, a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure is shown; the electronic device may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (PAD), a Portable Multimedia Player (PMP) and an in-vehicle terminal (e.g., a car navigation terminal), and fixed terminals such as a digital TV and a desktop computer. The electronic device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage means 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic apparatus are also stored. The processing apparatus 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
Generally, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 907 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909. The communication means 909 may allow the electronic device to perform wireless or wired communication with other devices to exchange data. While fig. 7 illustrates an electronic device having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing apparatus 901.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not in some cases constitute a limitation of the unit itself; for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (13)

1. A method for processing corpus data in combination with RPA and AI, the method comprising:
responding to the triggering operation of a data uploading option in the uploading interactive interface, and acquiring target corpus data;
determining a format template corresponding to the target corpus data from a plurality of preset format templates; the preset format templates are determined according to format information of historical corpus data;
processing the target corpus data according to the processing parameters corresponding to the corresponding format template;
and displaying the processing result of the target corpus data.
2. The method according to claim 1, wherein the determining a format template corresponding to the target corpus data from a plurality of preset format templates includes:
determining format information of the target corpus data;
and determining a corresponding format template from a plurality of preset format templates according to the format information of the target corpus data.
3. The method according to claim 1, wherein determining a format template corresponding to the target corpus data from a plurality of preset format templates includes:
receiving trigger operation of a template selection option in the uploading interactive interface;
and responding to the trigger operation of the template selection option, and determining a format template corresponding to the target corpus data.
4. The method according to claim 1, wherein the obtaining target corpus data in response to the triggering operation of the data uploading option in the uploading interactive interface comprises:
responding to the triggering operation of a data uploading option in the uploading interactive interface, and displaying Natural Language Processing (NLP) corpus data in the uploading interactive interface;
and receiving selection operation of target corpus data in the NLP corpus data in an uploading interactive interface, and acquiring the target corpus data.
5. The method according to claim 1, wherein the processing the target corpus data according to the processing parameters corresponding to the corresponding format template includes:
receiving the trigger operation of the cleaning parameter configuration options in the cleaning interactive interface under the corresponding format template;
and responding to the trigger operation of the cleaning parameter configuration option, and cleaning the target corpus data.
6. The method of claim 5, wherein if the corresponding format template is the first format template, the cleaning parameter configuration options include any one or more of the following configuration options:
a data volume option, a data classification storage option, a data replacement option and a data deletion option;
the data classification storage options include: an option for the proportion of training data to test data and/or an option for the number of files in which the processed target corpus data is stored;
the data replacement options include any one of the following replacement options:
a preset symbol replacement option, a preset number replacement option, a preset character replacement option, a telephone number replacement option and a website address replacement option;
the data deletion options include any one or more of the following deletion options:
a full deduplication option, a fuzzy deduplication option, a numeric deletion option, a non-Chinese text deletion option, a standard script deletion option, a fuzzy script deletion option, a first limited word number text retention option, and a second limited word number text deletion option.
7. The method of claim 6, wherein if the corresponding format template is the second format template, the cleaning parameter configuration options further comprise any one or more of the following configuration options:
a data type retention option, a data type clearing option, an option for retaining key information in each target corpus, and an option for retaining key values in a front-end page.
8. The method according to claim 1, wherein the processing the target corpus data according to the processing parameters corresponding to the corresponding format template includes:
receiving triggering operation of clustering options in the clustering interactive interface under the corresponding format template;
responding to the triggering operation of the clustering option, and clustering the target corpus data by adopting a frequent pattern clustering algorithm;
and outputting the clustering result of the target corpus data through the frequent pattern clustering algorithm.
9. The method according to any one of claims 1 to 8, wherein the displaying of the processing result of the target corpus data comprises:
analyzing the processing result to form at least one processing result to be displayed;
receiving a display option triggering operation of the processing result to be displayed in a result interaction interface;
and responding to the display option triggering operation, and displaying the corresponding processing result to be displayed.
10. The method of claim 9, wherein the processing results to be displayed comprise: clustering results, data condition analysis results and high-frequency corpus problem results.
11. A processing apparatus for corpus data in combination with RPA and AI, comprising:
the data acquisition module is used for responding to the triggering operation of the data uploading option in the uploading interactive interface and acquiring target corpus data;
the data processing module is used for determining a format template corresponding to the target corpus data from a plurality of preset format templates and processing the target corpus data according to processing parameters corresponding to the corresponding format template; the preset format templates are determined according to format information of historical corpus data;
and the result display module is used for displaying the processing result of the target corpus data.
12. An electronic device, comprising:
a memory, a processor, and a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-10.
13. A computer-readable storage medium, on which a computer program is stored, which computer program is executable by a processor to implement the method according to any one of claims 1-10.
CN202011126085.XA 2020-03-31 2020-10-20 Method, device and electronic device for processing corpus data combining RPA and AI Pending CN112231444A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010246027 2020-03-31
CN2020102460274 2020-03-31

Publications (1)

Publication Number Publication Date
CN112231444A true CN112231444A (en) 2021-01-15

Family

ID=74118250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011126085.XA Pending CN112231444A (en) 2020-03-31 2020-10-20 Method, device and electronic device for processing corpus data combining RPA and AI

Country Status (1)

Country Link
CN (1) CN112231444A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010742A (en) * 2021-03-01 2021-06-22 歌尔微电子股份有限公司 Data processing method, device, equipment and medium
CN114564916A (en) * 2022-03-03 2022-05-31 山东新一代信息产业技术研究院有限公司 Method, device and medium for simplifying corpus addition and corpus tagging

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105404650A (en) * 2015-10-30 2016-03-16 中国石油集团东方地球物理勘探有限责任公司 GIS data processing method and apparatus
CN108062367A (en) * 2017-12-08 2018-05-22 平安科技(深圳)有限公司 The method for uploading and its terminal of a kind of data list
CN110442716A (en) * 2019-08-05 2019-11-12 腾讯科技(深圳)有限公司 Intelligent text data processing method and device calculate equipment, storage medium
CN110457302A (en) * 2019-07-31 2019-11-15 河南开合软件技术有限公司 A kind of structural data intelligence cleaning method
CN110765195A (en) * 2019-10-23 2020-02-07 北京锐安科技有限公司 Data analysis method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN109543058B (en) Method, electronic device, and computer-readable medium for detecting image
CN116501960B (en) Content retrieval method, device, equipment and medium
CN107526846B (en) Method, device, server and medium for generating and sorting channel sorting model
CN110569354B (en) Barrage emotion analysis method and device
CN107241260A (en) The method and apparatus of news push based on artificial intelligence
CN108228567B (en) Method and device for extracting short names of organizations
WO2020052061A1 (en) Method and device for processing information
CN107679217A (en) Association method for extracting content and device based on data mining
CN111339295A (en) Method, apparatus, electronic device and computer readable medium for presenting information
CN110569335A (en) triple verification method and device based on artificial intelligence and storage medium
CN112784591B (en) Data processing method and device, electronic equipment and storage medium
CN110750627A (en) Material retrieval method and device, electronic equipment and storage medium
CN113569018A (en) Question and answer pair mining method and device
CN112445959A (en) Retrieval method, retrieval device, computer-readable medium and electronic device
CN116402166B (en) Training method and device of prediction model, electronic equipment and storage medium
US20190122667A1 (en) Question Urgency in QA System with Visual Representation in Three Dimensional Space
CN113011169A (en) Conference summary processing method, device, equipment and medium
CN111026849A (en) Data processing method and device
CN117787290A (en) Drawing prompting method and device based on knowledge graph
CN112231444A (en) Method, device and electronic device for processing corpus data combining RPA and AI
CN115801980A (en) Video generation method and device
CN112287659A (en) Information generation method and device, electronic equipment and storage medium
CN111310465B (en) Parallel corpus acquisition method and device, electronic equipment and storage medium
CN109672706A (en) A kind of information recommendation method, device, server and storage medium
CN109472028B (en) Method and device for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 1902, 19th Floor, China Electronics Building, No. 3 Danling Road, Haidian District, Beijing

Applicant after: BEIJING LAIYE NETWORK TECHNOLOGY Co.,Ltd.

Applicant after: Laiye Technology (Beijing) Co.,Ltd.

Address before: 1902, 19 / F, China Electronics Building, 3 Danling Road, Haidian District, Beijing 100080

Applicant before: BEIJING LAIYE NETWORK TECHNOLOGY Co.,Ltd.

Country or region before: China

Applicant before: BEIJING BENYING NETWORK TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20210115