CN119167946B - Method and terminal for embedding exclusive AI interpretation for shared documents - Google Patents

Publication number
CN119167946B
Authority
CN
China
Prior art keywords
document
text content
information
dependency
interpretation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411644517.4A
Other languages
Chinese (zh)
Other versions
CN119167946A (en
Inventor
杨初炀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cotton Candy Technology Co ltd
Original Assignee
Guangzhou Cotton Candy Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Cotton Candy Technology Co ltd filed Critical Guangzhou Cotton Candy Technology Co ltd
Priority to CN202411644517.4A priority Critical patent/CN119167946B/en
Publication of CN119167946A publication Critical patent/CN119167946A/en
Application granted granted Critical
Publication of CN119167946B publication Critical patent/CN119167946B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a method and terminal for implanting exclusive AI interpretation for shared documents. The method comprises: receiving a document forwarded by a forwarder on a chat tool; extracting the text content of the document and calling an AI knowledge base model to analyze the text content to obtain key information; generating a summary of the text content with a generative summarization algorithm; identifying the topic of the text content using an LDA algorithm; performing dependency syntax analysis on the text content to extract the trunk information of the sentences in the text content; and integrating the key information, summary, topic and trunk information of the sentences to generate an interpretation result of the document, which is sent to the receiver. The invention enables the receiver to quickly obtain the core content and structure of the document without reading the full text, thereby improving the efficiency of document processing and information transmission.

Description

Method and terminal for implanting exclusive AI interpretation for shared documents
Technical Field
The invention relates to the technical field of document processing, and in particular to a method and a terminal for implanting exclusive AI interpretation for shared documents.
Background
In the current information age, documents are created and shared very frequently, especially through email, instant messaging, social media and other digital platforms. However, as the number and length of documents increase, a recipient who is forwarded a document may face information overload: it is difficult to quickly identify and extract key information from a large amount of text, which reduces the efficiency of interaction with the forwarder.
The technical scheme of application number CN202311218741.2 automatically classifies and stores text using a trained general language model, which improves document management efficiency, but it cannot rapidly identify and extract the key information of a document.
The technical scheme of application number CN202211085411.6 allows a document forwarding task to be set within an online document, so that a user can circulate the online document among multiple people without relying on a third-party management tool, which makes sending online documents more convenient. However, it does not automatically interpret the forwarded document, so the interaction efficiency between the receiver and the forwarder still suffers.
Disclosure of Invention
The invention provides a method and a terminal for implanting exclusive AI interpretation for a shared document, which are used for rapidly identifying and interpreting key information of a forwarded document and improving interaction efficiency of a receiver and a forwarder.
In order to solve the problems, the invention adopts the following technical scheme:
The invention provides a method for implanting exclusive AI interpretation for a shared document, which comprises the following steps:
receiving a document forwarded by a forwarder on a chat tool;
extracting text content of the document, and calling an AI knowledge base model to analyze the text content of the document to obtain key information;
generating a summary of the text content according to a generative summarization algorithm;
identifying the topic of the text content by using an LDA algorithm, performing dependency syntax analysis on the text content, and extracting trunk information of sentences in the text content;
and integrating the key information, the summary, the topic and the trunk information of the sentences to generate an interpretation result of the document, and sending the interpretation result to a receiver.
Further, after transmitting the interpretation result to the receiver, the method further includes:
acquiring questioning information of a receiver about the document;
vectorizing the questioning information to obtain a vector corresponding to the questioning information;
invoking the AI knowledge base model to analyze and process the vector corresponding to the questioning information, and screening out reply information whose similarity to the vector corresponding to the questioning information is greater than a preset similarity;
and sending the reply information to the receiver.
Further, after the reply information is sent to the receiver, the method further includes:
when the receiver is determined to be a target client in a client list, acquiring all interactive contents with the receiver after the receiver has finished asking questions;
analyzing all the interactive contents to determine interest preferences of the receiver;
and querying product information matched with the interest preferences, and sending the matched product information and all the interactive contents to a sender.
Preferably, querying the product information matching the interest preference comprises:
determining a user tag describing interest preferences of the recipient;
vectorizing the user tag to obtain a user tag vector;
extracting product tags describing product information from a knowledge base;
vectorizing the product tags to obtain product tag vectors;
calculating the cosine distance between the user tag vector and each product tag vector, and taking each product tag vector whose cosine distance to the user tag vector is greater than a preset cosine distance as a target product tag vector;
and determining the product information corresponding to the target product tag vector to obtain the product information matched with the interest preference.
Preferably, invoking an AI knowledge base model to analyze text content of the document to obtain key information, including:
invoking an AI knowledge base model to perform semantic analysis on the text content of the document to obtain semantic information of the text content;
inputting the semantic information and the text content into a Bayesian network layer of the AI knowledge base model for probability calculation to obtain a threshold probability of each node in the Bayesian network layer;
screening, from the plurality of nodes, target nodes whose threshold probability is greater than a preset threshold probability;
and determining the information corresponding to the target node to obtain key information.
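The screening step above can be illustrated with a small sketch. The patent does not specify the structure of the Bayesian network layer, so everything here is an assumption made for illustration: each candidate phrase is treated as a node with a prior probability of being key information, the evidence features (`in_title`, `high_tfidf`) and their likelihoods are hypothetical, and the features are assumed conditionally independent given the node state.

```python
from math import prod

def node_probability(prior, lik_if_key, lik_if_not, evidence):
    """P(node is key | evidence) by Bayes' rule.

    Assumes the evidence features are conditionally independent given
    whether the node is key information (an illustrative simplification).
    """
    p_key = prior * prod(
        lik_if_key[f] if present else 1 - lik_if_key[f]
        for f, present in evidence.items()
    )
    p_not = (1 - prior) * prod(
        lik_if_not[f] if present else 1 - lik_if_not[f]
        for f, present in evidence.items()
    )
    return p_key / (p_key + p_not)

def screen_nodes(probabilities, threshold):
    """Keep the nodes whose probability exceeds the preset threshold."""
    return [name for name, p in probabilities.items() if p > threshold]

# Hypothetical likelihoods P(feature present | key) and P(feature present | not key).
LIK_KEY = {"in_title": 0.6, "high_tfidf": 0.8}
LIK_NOT = {"in_title": 0.1, "high_tfidf": 0.3}

probabilities = {
    "carbon emissions": node_probability(
        0.2, LIK_KEY, LIK_NOT, {"in_title": True, "high_tfidf": True}
    ),
    "last Tuesday": node_probability(
        0.2, LIK_KEY, LIK_NOT, {"in_title": False, "high_tfidf": False}
    ),
}
key_information = screen_nodes(probabilities, threshold=0.5)
```

In this toy setting only the phrase that appears in the title and scores highly on the hypothetical TF-IDF feature passes the 0.5 threshold; a real network layer would be learned together with the rest of the model.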
Preferably, extracting text content of the document includes:
identifying structural features of the document;
determining a segmentation mode according to the structural characteristics of the document;
constructing a corresponding regular expression according to the segmentation mode, matching segmentation points of the document by using the regular expression, and segmenting the document into a plurality of document fragments according to the matched segmentation points;
screening the plurality of document fragments to obtain target document fragments;
and extracting the text content of the target document fragment.
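The segmentation steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the two structural features, the regular expressions in `SPLIT_PATTERNS` and the word-count filter in `select_fragments` are all assumptions chosen to keep the example short.

```python
import re

# Hypothetical mapping from a detected structural feature to a split pattern.
SPLIT_PATTERNS = {
    "numbered_headings": r"^(?=\d+\.\s)",  # split before lines such as "1. ..."
    "blank_lines": r"\n\s*\n",             # split on empty lines
}

def detect_structure(document):
    """Rough structural check: numbered headings, else plain paragraphs."""
    if re.search(r"^\d+\.\s", document, flags=re.MULTILINE):
        return "numbered_headings"
    return "blank_lines"

def split_document(document):
    """Split the document at the points matched by the chosen pattern."""
    pattern = SPLIT_PATTERNS[detect_structure(document)]
    parts = re.split(pattern, document, flags=re.MULTILINE)
    return [part.strip() for part in parts if part.strip()]

def select_fragments(fragments, min_words=3):
    """Screen fragments; here, keep those with at least min_words words."""
    return [f for f in fragments if len(f.split()) >= min_words]

document = (
    "1. Overview of the project goals\n"
    "2. Budget\n"
    "3. Timeline for the next two quarters\n"
)
fragments = split_document(document)
target_fragments = select_fragments(fragments)
```

Splitting on a zero-width lookahead keeps each heading attached to its own fragment. It relies on `re.split` accepting empty matches, which Python supports from version 3.7.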
Preferably, extracting text content of the document includes:
when the document is determined to contain the target text in the picture or PDF format, the text content of the target text is recognized and extracted by utilizing OCR technology.
Preferably, dependency syntax analysis is performed on the text content, and trunk information of sentences in the text content is extracted, including:
calling a dependency syntax analysis tool to perform dependency syntax analysis on the text content to obtain a dependency tree, wherein the dependency tree is a graphical structure for representing dependency relations among words in sentences, each node in the dependency tree represents a word, and a directed edge represents the dependency relations among words;
identifying a plurality of function words in the dependency tree, wherein the function words comprise conjunctions, prepositions and articles;
calculating the contribution degree of the directed edge corresponding to each function word to the trunk structure of the dependency tree, removing each function word whose contribution degree is smaller than a preset threshold together with its directed edge, and generating a target dependency tree;
determining the dependency relationship of each word in the text content according to the target dependency tree, wherein the dependency relationship comprises the dependency relationships between words;
and extracting trunk information of sentences in the text content according to the dependency relationship of each word in the text content.
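A minimal sketch of the pruning and trunk extraction described above is given below. The patent does not define how the contribution degree is computed, so this example assumes one possible measure: the share of the sentence's words that sit below the function word in the tree. The `Token` structure, the tag sets and the hand-written parse are likewise illustrative; in practice the tree would come from a dependency parsing tool.

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    word: str
    pos: str     # coarse part-of-speech tag
    head: int    # idx of the governing word, 0 for the root
    deprel: str  # label of the directed edge from head to this word

FUNCTION_POS = {"CCONJ", "SCONJ", "ADP", "DET"}  # conjunctions, prepositions, articles
TRUNK_RELS = {"nsubj": "subject", "root": "predicate", "obj": "object"}

def subtree_size(tokens, idx):
    """Number of tokens strictly below token idx in the dependency tree."""
    children = [t.idx for t in tokens if t.head == idx]
    return len(children) + sum(subtree_size(tokens, c) for c in children)

def prune_function_words(tokens, threshold=0.1):
    """Drop function words whose assumed contribution is below threshold."""
    kept = []
    for t in tokens:
        contribution = subtree_size(tokens, t.idx) / len(tokens)
        if t.pos in FUNCTION_POS and contribution < threshold:
            continue  # removes the word together with its incoming edge
        kept.append(t)
    return kept

def extract_trunk(tokens):
    """Map subject, predicate and object relations to their words."""
    return {TRUNK_RELS[t.deprel]: t.word for t in tokens if t.deprel in TRUNK_RELS}

# "The system sends the result to the receiver", parsed by hand.
sentence = [
    Token(1, "The", "DET", 2, "det"),
    Token(2, "system", "NOUN", 3, "nsubj"),
    Token(3, "sends", "VERB", 0, "root"),
    Token(4, "the", "DET", 5, "det"),
    Token(5, "result", "NOUN", 3, "obj"),
    Token(6, "to", "ADP", 8, "case"),
    Token(7, "the", "DET", 8, "det"),
    Token(8, "receiver", "NOUN", 3, "obl"),
]
target_tree = prune_function_words(sentence)
trunk = extract_trunk(target_tree)
```

In this parse every function word is a leaf, so all of them fall below the threshold and are removed. The sketch does not reattach the children of a removed word, which a fuller implementation would need to handle.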
Preferably, invoking an AI knowledge base model to analyze text content of the document to obtain key information, including:
performing word segmentation, stop-word removal and grammar annotation on the text content of the document to obtain target text content;
and inputting the target text content into an AI knowledge base model for analysis to obtain key information.
The invention provides a terminal, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the method for implanting exclusive AI interpretation for sharing documents.
Compared with the prior art, the technical scheme of the invention has at least the following advantages:
The method and terminal for implanting exclusive AI interpretation for shared documents provided by the invention use an AI knowledge base model to analyze the document and quickly identify its key information and key points. A generative summarization algorithm creates a concise summary so that the receiver can quickly grasp the subject matter of the document, and an LDA algorithm identifies the dominant topic of the document to deepen understanding of its content. Dependency syntax analysis is performed on the document at the same time to extract the trunk information of sentences, which clarifies sentence structure and core meaning. Finally, the key information, summary, topic and sentence trunks are integrated into a comprehensive interpretation result. The integrated information gives a multidimensional view of the document, helps the receiver understand the content more deeply, and lets the receiver quickly obtain the core content and structure of the document without reading the full text, thereby improving the efficiency of document processing and information transmission.
Drawings
FIG. 1 is a flow chart of an embodiment of a method for implanting exclusive AI interpretation for a shared document;
FIG. 2 is a block diagram of an embodiment of an apparatus for implanting exclusive AI interpretation for a shared document;
FIG. 3 is a block diagram of the internal structure of a terminal according to an embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the present invention, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present invention with reference to the accompanying drawings.
Some of the flows described in the specification, the claims and the foregoing figures include a plurality of operations that appear in a particular order. It should be clearly understood that these operations may be performed out of the order in which they appear herein, or in parallel. Sequence numbers such as S11 and S12 merely distinguish the operations from one another and do not represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that the terms "first" and "second" herein are used to distinguish different messages, devices, modules and the like; they do not represent a sequence, nor do they require that the "first" and the "second" be of different types.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by one of ordinary skill in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those of ordinary skill in the art that unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1, the present invention provides a method for implanting exclusive AI interpretation for a shared document, comprising:
S11, receiving a document forwarded by a forwarder on a chat tool;
S12, extracting text content of the document, and calling an AI knowledge base model to analyze the text content of the document to obtain key information;
S13, generating a summary of the text content according to a generative summarization algorithm;
S14, identifying the topic of the text content by using an LDA algorithm, performing dependency syntax analysis on the text content, and extracting trunk information of sentences in the text content;
S15, integrating the key information, the summary, the topic and the trunk information of the sentences to generate an interpretation result of the document, and sending the interpretation result to a receiver.
As described in step S11 above, the document forwarded by the forwarder is received through the chat tool, where the chat tool may be WeChat, WhatsApp or QQ, and the format of the document may include PDF, Word document, etc.
As described in step S12 above, plain text content may be extracted from documents in various formats using a document parsing tool, such as the TexSmart or DocAI intelligent document analysis engine.
Invoking an AI knowledge base model, such as a machine learning based classifier or keyword extraction algorithm, identifies key information and primary concepts from the text, which may include words or phrases reflecting the core topic of the document, primary views or arguments supporting the topic of the document, numerical data or charts supporting arguments, guidelines or suggestions for actions based on analysis and conclusion, terms and definitions of specific fields or topics, and so forth. The AI knowledge base model is a pre-trained neural network model and is used for analyzing text content and extracting key information. It should be noted that, regarding the training process of the AI knowledge base model, the present invention is not particularly limited.
As described in step S13, the generative summarization algorithm is an automatic text summarization technique. It can not only extract key sentences from the text content but also use a language model to generate new sentences, producing a fluent, concise and easy-to-understand summary that preserves the core meaning of the original text.
Specifically, features such as keywords, phrases and sentence structures are extracted from the text content and processed by a sequence-to-sequence (Seq2Seq) model that contains an encoder and a decoder. The encoder reads and understands the input text, and the decoder generates the summary, typically as a sequence of words or phrases. An attention mechanism is also introduced to help the model focus on the parts of the input features that are most relevant to generating the summary.
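To make the role of attention concrete, the following sketch computes attention weights for a single decoder step. The patent does not state which attention variant is used; dot-product attention is assumed here, and the two-dimensional state vectors are made-up values rather than outputs of a trained model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Dot-product attention for one decoder step.

    Returns the weight given to each encoder state and the context
    vector, i.e. the weighted sum of the encoder states.
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(len(decoder_state))
    ]
    return weights, context

# Hypothetical encoder states for three input tokens.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states)
```

The first and third encoder states align with the decoder state and receive equal, larger weights than the second, so the context vector passed to the decoder is dominated by those two input positions.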
In addition, the generated summary can be optimized, including grammar correction, removal of duplicate content, and ensuring that the summary is coherent and fluent.
As described in step S14 above, this step uses the LDA algorithm to perform topic modeling on the text content and identify its main topic or topics. At the same time, dependency syntax analysis is performed on the text to identify the dependency relationships between the words in each sentence, so as to extract the trunk information of the sentence, i.e. the subject, predicate, object, etc.
Dependency syntax analysis determines the dependencies between the words in a sentence. It reveals the grammatical structure of the sentence, including word-to-word relations such as subject, predicate and object.
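The topic-modeling half of step S14 can be sketched with a small LDA implementation. The patent names only "an LDA algorithm"; collapsed Gibbs sampling is one common way to fit LDA and is assumed here, as are the hyperparameters `alpha` and `beta`, the number of topics and the toy corpus.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns per-topic word distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # count of topic k in doc d
    topic_word = [[0] * V for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                     # words assigned to topic k
    assignments = []
    for d, doc in enumerate(docs):                     # random initial assignment
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wid = assignments[d][i], word_id[w]
                # Remove the current assignment before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    return [
        {w: (topic_word[t][word_id[w]] + beta) / (topic_total[t] + V * beta)
         for w in vocab}
        for t in range(num_topics)
    ]

# Toy corpus of pre-tokenized documents (illustrative only).
docs = [
    "climate warming emissions carbon climate".split(),
    "carbon emissions climate policy".split(),
    "watch battery sensor watch health".split(),
    "health sensor watch price".split(),
]
topics = lda_gibbs(docs, num_topics=2)
top_words = [sorted(t, key=t.get, reverse=True)[:3] for t in topics]
```

Each entry of `topics` is a probability distribution over the vocabulary, and the highest-probability words in `top_words` serve as a readable label for the topic. A production system would more likely rely on an existing topic-modeling library than on a hand-written sampler.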
As described in step S15, this embodiment integrates information of different dimensions, such as the key information, summary, topic and sentence trunks, into a coherent and comprehensive interpretation result, and sends it to the receiver through the chat tool, providing a fast and convenient way to convey information. The generated interpretation result is a high-level summary and explanation of the document content, enabling the recipient to quickly grasp the subject matter and structure of the document.
The method for implanting exclusive AI interpretation for the shared document provided by the invention uses an AI knowledge base model to analyze the document and quickly identify its key information and key points. A generative summarization algorithm creates a concise summary so that the receiver can quickly grasp the subject matter of the document, and an LDA algorithm identifies the dominant topic of the document to deepen understanding of its content. Dependency syntax analysis is performed on the document at the same time to extract the trunk information of sentences, which clarifies sentence structure and core meaning. Finally, the key information, summary, topic and sentence trunks are integrated into a comprehensive interpretation result. The integrated information gives a multidimensional view of the document, helps the receiver understand the content more deeply, and lets the receiver quickly obtain the core content and structure of the document without reading the full text, thereby improving the efficiency of document processing and information transmission.
In one embodiment, after transmitting the interpretation result to the receiver, the method further comprises:
acquiring questioning information of a receiver about the document;
vectorizing the questioning information to obtain a vector corresponding to the questioning information;
invoking the AI knowledge base model to analyze and process the vector corresponding to the questioning information, and screening out reply information whose similarity to the vector corresponding to the questioning information is greater than a preset similarity;
and sending the reply information to the receiver.
In this embodiment, when the recipient has a specific question about the document or needs more information, they can ask the question by text or voice in the dialog box of the chat tool. The system obtains the recipient's questioning information about the document and uses natural language processing techniques to convert it into a numeric vector. For example, the questioning information is converted by Word2Vec, GloVe or BERT into a format that the model can understand.
The AI knowledge base model receives the vector representation of the questioning information and searches the knowledge base for related information. By comparing the question vector with the answer vectors already in the knowledge base, it screens out the reply information whose similarity exceeds the preset threshold, so that replies are matched accurately, and sends the screened reply information back to the receiver to answer the question.
For example, suppose a recipient has a question about a scientific paper on climate change and asks: "What is the principal finding of this paper?" The question is converted into vector form using NLP techniques. The AI knowledge base model receives this vector and searches a vector database containing paper summaries, key points and conclusions. The model computes the similarity between the question vector and each reply vector in the database. The system selects the replies with a similarity above a preset threshold (e.g. 0.8), including the summary of the paper's main findings, and sends these screened replies to the recipient.
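The retrieval flow in this example can be sketched as follows. Several parts are assumptions made for illustration: `embed` is a toy bag-of-words stand-in for Word2Vec, GloVe or BERT, each stored reply is indexed by a hypothetical representative question, and the reply texts are placeholders rather than real paper content.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().replace("?", "").split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def find_replies(question, knowledge_base, threshold=0.8):
    """Return replies whose index vector exceeds the similarity threshold."""
    q = embed(question)
    scored = [
        (cosine_similarity(q, embed(key)), reply)
        for key, reply in knowledge_base.items()
    ]
    return [reply for score, reply in sorted(scored, reverse=True) if score > threshold]

# Hypothetical knowledge base: index text -> stored reply.
knowledge_base = {
    "what is the principal finding of this paper": "Placeholder summary of the main finding.",
    "who wrote this paper": "Placeholder author list.",
}
replies = find_replies("What is the principal finding of this paper?", knowledge_base)
```

Only the entry whose index vector is close enough to the question passes the 0.8 threshold; the unrelated entry shares just two words with the question and is dropped.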
The embodiment utilizes the advanced NLP technology to ensure that the questions are accurately understood and answered, and improves the interaction efficiency and effect of the receiver and the document content by providing timely and accurate replies. At the same time, the system may also provide personalized responses based on the recipient's particular questions.
In one embodiment, after the reply information is sent to the receiver, the method further includes:
when the receiver is determined to be a target client in a client list, acquiring all interactive contents with the receiver after the receiver has finished asking questions;
analyzing all the interactive contents to determine interest preferences of the receiver;
and querying product information matched with the interest preferences, and sending the matched product information and all the interactive contents to a sender.
When a recipient is identified as belonging to a target customer group in a pre-built customer list, after the recipient has finished all questions, the system obtains all records of interactions with the recipient, including questions, feedback, purchase history, and the like. The present embodiment may analyze the collected interactive content using text analysis and data mining techniques to determine the interest preferences of the recipient, such as concerns about a certain class of products or specific needs.
According to the interest preferences of the receiver, the system queries the product database for matching product information and sends it, together with all the interactive contents, to a sender (which may be a marketing team or a customer service representative). The sender can thus learn the customer's points of interest or purchase intent, follow up with the customer, and gain business insight.
For example, suppose a customer shows interest in a brand of smart watch and asks several times about the functions and price of the product in a customer service chat. The customer service representative confirms the customer's interest in the smart watch, and all interactive contents are collected after the customer has finished asking questions. NLP techniques are used to perform sentiment analysis and topic modeling on the customer's questions and feedback, which determines that the customer is particularly interested in the health monitoring function of the smart watch. Based on the analysis result, the system searches the product database for smart watches with a health monitoring function. It finds several smart watches that meet the customer's preferences and extracts detailed information about them, such as prices, feature descriptions and user ratings. The system sends the product information and all of the customer's interactive contents to the customer service representative, who can then provide more personalized service and recommendations, improving the transaction success rate.
The embodiment can increase sales opportunities and conversion rate by providing product information meeting customer demands, and enterprises can further understand interests and preferences of customers, thereby providing guidance for product development and marketing strategies. Meanwhile, enterprises can concentrate marketing resources on customer groups most likely to generate returns, and resource utilization efficiency is improved.
In one embodiment, querying product information that matches the interest preferences includes:
determining a user tag describing interest preferences of the recipient;
vectorizing the user tag to obtain a user tag vector;
extracting product tags describing product information from a knowledge base;
vectorizing the product tags to obtain product tag vectors;
calculating the cosine distance between the user tag vector and each product tag vector, and taking each product tag vector whose cosine distance to the user tag vector is greater than a preset cosine distance as a target product tag vector;
and determining the product information corresponding to the target product tag vector to obtain the product information matched with the interest preference.
This embodiment can determine user tags describing the interest preferences of the receiver, such as "technology enthusiast" or "healthy living", by analyzing the receiver's interactive content, purchase history, browsing behavior and so on. Each user tag is converted into vector form through Word2Vec to obtain a user tag vector that a computer can process and analyze. Product tags describing product information, such as "smart wearables" or "low-sugar food", are extracted from a pre-built knowledge base and likewise converted through Word2Vec into product tag vectors. The cosine distance between the user tag vector and each product tag vector is calculated to evaluate their similarity, and the product tag vectors that are closer to the user tag vector, i.e. the target product tag vectors, are screened out according to a preset cosine distance threshold. Finally, the corresponding product information is determined from the target product tag vectors; these are the products most likely to match the receiver's interest preferences.
For example, suppose a user frequently mentions content related to a healthy diet during an interaction. By analyzing the user's behavior, the system determines that the user tags are "healthy diet", "nutritional" and "organic food". These user tags are converted into user tag vectors through vectorization. The system extracts product tags such as "low calorie", "whole wheat" and "organic certified" from the knowledge base and vectorizes them as well to form product tag vectors. The cosine distance between the user tag vector and every product tag vector is calculated. If the cosine distance threshold is set to 0.7, the product tag vectors whose cosine distance is greater than the threshold are selected. Finally, the system determines the products that match the "healthy diet" user tag, such as "organic whole wheat bread".
According to the embodiment, the user requirements and the product characteristics can be precisely matched in a vector form, the possibility of purchasing is increased, the product recommendation highly related to the interest preference is provided for the user, and the user satisfaction is improved.
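The tag matching step can be illustrated with a minimal sketch. It assumes that the "cosine distance" of this embodiment is the cosine of the angle between two vectors (so that larger means closer), and it uses hand-written three-dimensional vectors in place of real Word2Vec embeddings; the names `cosine_similarity`, `match_products`, `USER_VEC` and `PRODUCT_VECS` are hypothetical and do not appear in the embodiment.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_products(user_vec, product_vecs, threshold=0.7):
    """Return (tag, score) pairs whose score exceeds the threshold, best first."""
    scored = [(tag, cosine_similarity(user_vec, vec)) for tag, vec in product_vecs.items()]
    hits = [(tag, score) for tag, score in scored if score > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical embeddings standing in for Word2Vec output.
USER_VEC = [0.9, 0.1, 0.3]  # e.g. the "healthy diet" user tag
PRODUCT_VECS = {
    "organic certified": [0.8, 0.2, 0.4],
    "whole wheat": [0.7, 0.0, 0.5],
    "smart wearable": [0.0, 1.0, 0.1],
}

matches = match_products(USER_VEC, PRODUCT_VECS)
for tag, score in matches:
    print(f"{tag}: {score:.3f}")
```

With the 0.7 threshold from the example, only the two food-related tags are selected; the unrelated tag scores far below the threshold and is discarded.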
In one embodiment, invoking an AI knowledge base model to analyze text content of the document to obtain key information includes:
invoking an AI knowledge base model to perform semantic analysis on the text content of the document to obtain semantic information of the text content;
inputting the semantic information and the text content into a Bayesian network layer of the AI knowledge base model for probability calculation to obtain a threshold probability of each node in the Bayesian network layer;
screening target nodes with the threshold probability larger than a preset threshold probability from a plurality of nodes based on the threshold probability of each node;
and determining the information corresponding to the target node to obtain key information.
In this embodiment, the text content of the document is analyzed in depth using the AI knowledge base model to understand the deep meaning and context of the text. After analysis, the AI knowledge base model extracts semantic information of the text content, including keywords, concepts, entity relationships, and the like.
The AI knowledge base model is preconfigured with a Bayesian network layer. A Bayesian network is a graphical model that represents the dependencies between variables. The extracted semantic information and the text content are input into the Bayesian network layer, and a probability calculation is performed for each node to obtain the threshold probability of each node. The nodes represent different concepts or attributes in the text, and the threshold probability of a node represents the likelihood that the information corresponding to that node is key information.
Based on the calculated threshold probabilities, the target nodes whose threshold probability is greater than the preset threshold probability are selected. These target nodes are regarded as the key information in the text, and the specific information corresponding to them is determined, thereby obtaining the key information of the document.
For example, assume that there is a research report on emerging technology trends. First, the AI knowledge base model performs semantic analysis on the report and identifies the technology trends and related concepts it discusses. The semantic information extracted by the model includes keywords such as "artificial intelligence", "machine learning" and "deep learning". The semantic information and the report content are input into the Bayesian network layer of the model for probability calculation. Assuming that the Bayesian network layer has multiple nodes, such as "technology maturity", "market potential" and "application area", the model calculates a threshold probability for each node. With the preset threshold probability set to 0.8, the nodes whose threshold probability is greater than 0.8 are selected, for example the "market potential" node. The information corresponding to the "market potential" node is then determined, for example the statement in the report that artificial intelligence has huge market potential in medical applications.
In this embodiment, semantic analysis provides a deeper understanding of the content and context of the document, and the probability calculation and target node screening of the Bayesian network layer allow key information to be extracted from the document accurately and efficiently. In addition, the Bayesian network can continuously update its probabilities as new information arrives, which improves the adaptability and accuracy of the model.
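The node screening step can be sketched as follows. This is a deliberately simplified illustration, not the Bayesian network layer itself: it assumes each node is an independent binary variable updated with Bayes' rule, and the priors and likelihoods are invented numbers. The names `posterior`, `select_key_nodes` and `NODES` are hypothetical.

```python
def posterior(prior, likelihood_if_key, likelihood_if_not_key):
    """Bayes' rule for one binary node: P(key | evidence)."""
    numerator = likelihood_if_key * prior
    evidence = numerator + likelihood_if_not_key * (1.0 - prior)
    return numerator / evidence if evidence > 0 else 0.0

def select_key_nodes(nodes, threshold=0.8):
    """Keep the nodes whose posterior probability exceeds the preset threshold."""
    selected = {}
    for name, (prior, l_key, l_not_key) in nodes.items():
        p = posterior(prior, l_key, l_not_key)
        if p > threshold:
            selected[name] = p
    return selected

# Hypothetical (prior, P(evidence | key), P(evidence | not key)) per node.
NODES = {
    "technology maturity": (0.5, 0.6, 0.4),
    "market potential": (0.5, 0.9, 0.1),
    "application area": (0.5, 0.7, 0.3),
}

key_nodes = select_key_nodes(NODES)
print(key_nodes)
```

Under these assumed numbers the posteriors are 0.6, 0.9 and 0.7, so only "market potential" passes the 0.8 threshold, mirroring the example in the text. A real Bayesian network would also propagate evidence along the edges between nodes.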
In one embodiment, extracting text content of the document includes:
identifying structural features of the document;
determining a segmentation mode according to the structural characteristics of the document;
constructing a corresponding regular expression according to the segmentation mode, matching segmentation points of the document by using the regular expression, and segmenting the document into a plurality of document fragments according to the matched segmentation points;
screening the plurality of document fragments to obtain target document fragments;
and extracting the text content of the target document fragment.
This embodiment provides an optimized document processing flow that extracts the text content of specific document fragments by identifying the structural features of the document and segmenting it with a regular expression.
Specifically, the structural features of the document are first identified: the layout and format of the document are analyzed, and structural elements such as titles, paragraphs, lists and tables are recognized. A suitable segmentation criterion, such as blank lines, specific separators or title levels, is then determined from these structural features. Finally, a corresponding regular expression is constructed according to the determined segmentation mode and used to identify the segmentation points in the document.
The regular expression is a text pattern matching mode and is used for operations such as character string searching, replacing, text verifying and the like. It consists of a series of characters that define a search pattern that can be used to check whether a string conforms to a certain format or to extract strings from text that conform to a particular pattern. In a document processing flow, regular expressions may be used to identify segmentation points in a document in order to segment the document into multiple segments. The cut points may be specific text patterns in the document, such as titles, specific separators, blank lines, etc.
The document is scanned with the regular expression to find all matching segmentation points, and it is split into a number of independent document fragments at those points. The fragments are then screened, and the target document fragments are retained according to specific conditions (for example, fragments containing specific keywords or topics). Text content is extracted only from the retained target document fragments for further processing or analysis. In this way, the text fragments with substantive meaning are selected accurately without analyzing all of the text content, which improves the efficiency with which the system analyzes the text.
For example, assume that there is a Word document containing several chapters, each beginning with a title such as "first chapter introduction". The structural features of the document are identified first: the title of each chapter has the form "chapter X" and is followed by a blank line. A segmentation mode is then determined, namely using the title of each chapter and the blank line after it as a segmentation point. A regular expression is constructed, such as chapter \s $, which matches lines beginning with chapter and ending with chapter, including the following blank lines. The entire document is scanned with the regular expression to identify the titles and blank lines of all chapters. The document is split into independent chapters at the identified segmentation points, the chapters containing specific keywords or information, such as chapters containing "market analysis", are selected, and the text content of the selected chapters is extracted for further analysis or summary generation.
By identifying the structural features of the document, this embodiment makes document processing more structured and systematic, and a segmentation mode based on those features improves the accuracy of document segmentation. Automated regular expression matching and segmentation reduce the need for manual work. In addition, screening specific document fragments focuses information extraction on actual requirements, which improves the efficiency of document processing and information extraction and saves time.
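The segmentation and screening flow can be sketched with Python's `re` module. The heading pattern below is a hypothetical English stand-in ("Chapter N ..." on its own line) and is not the regular expression of the embodiment; `HEADING`, `split_by_headings`, `filter_fragments` and the sample text are likewise assumptions made for illustration.

```python
import re

# Hypothetical heading pattern: a whole line such as "Chapter 2 Market Analysis".
HEADING = re.compile(r"^Chapter \d+.*$", re.MULTILINE)

def split_by_headings(text):
    """Split text into (heading, body) fragments at every matched heading line."""
    matches = list(HEADING.finditer(text))
    fragments = []
    for i, match in enumerate(matches):
        # A fragment runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fragments.append((match.group(0).strip(), text[match.end():end].strip()))
    return fragments

def filter_fragments(fragments, keyword):
    """Keep fragments whose heading or body contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [(h, b) for h, b in fragments if needle in (h + " " + b).lower()]

DOC = """Chapter 1 Introduction

This report covers the product line.

Chapter 2 Market Analysis

Demand grew in the last quarter.

Chapter 3 Outlook

Growth is expected to continue.
"""

fragments = split_by_headings(DOC)
targets = filter_fragments(fragments, "market analysis")
for heading, body in targets:
    print(heading, "->", body)
```

Only the text of the retained fragments is passed on, so later analysis steps never see the chapters that were screened out.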
In one embodiment, extracting text content of the document includes:
when the document is determined to contain the target text in the picture or PDF format, the text content of the target text is recognized and extracted by utilizing OCR technology.
When a document contains text in the form of pictures, or text embedded in a PDF file that cannot be copied directly, optical character recognition (OCR) technology may be used to recognize and extract the text content. OCR technology converts the text in a picture into an editable, searchable text format. Specifically, it is first determined that the pictures or PDFs in the document contain target text to be extracted. The pictures or PDF files are then processed with OCR technology to recognize their text content, and the recognized text is corrected and formatted to improve accuracy and readability.
This embodiment converts the text in pictures or PDFs into an editable format, which improves the accessibility of the document, and the automated OCR process reduces the time, labor cost and possible errors of entering the text manually.
In one embodiment, performing dependency syntax analysis on the text content, and extracting trunk information of sentences in the text content includes:
calling a dependency syntax analysis tool to perform dependency syntax analysis on the text content to obtain a dependency tree, wherein the dependency tree is a graphical structure for representing dependency relations among words in sentences, each node in the dependency tree represents a word, and a directed edge represents the dependency relations among words;
identifying a plurality of functional words in the dependency tree, wherein the functional words comprise conjunctions, prepositions and articles;
calculating the contribution degree of the directed edge corresponding to each functional word to the trunk structure of the dependency tree, removing the functional words with a contribution degree smaller than a preset threshold value and the corresponding directed edges, and generating a target dependency tree;
determining the dependency relationship of each word in the text content according to the target dependency tree, wherein the dependency relationship comprises the subordinate relationship between words;
and extracting trunk information of sentences in the text content according to the dependency relationship of each word in the text content.
In this embodiment, the text content may first be cleaned to remove irrelevant characters and punctuation, and the cleaned text is then processed with a dependency syntax analysis tool such as Stanford NLP, spaCy or NLTK to generate a dependency tree. In dependency syntax analysis, the dependency tree decomposes a sentence into a directed graph in which each node represents a word and each directed edge represents a dependency relationship between words.
Specifically, the dependency tree reflects the hierarchical structure of a sentence: each word is attached directly to another word according to its function or grammatical role in the sentence. Edges in the dependency tree are directional, indicating the direction of dependency between words. For example, the subject and the object of a verb depend on that verb. The dependency tree has a root node, typically the predicate verb of the sentence, around which the dependency structure of the entire sentence is built. Each node (word) is typically accompanied by a part-of-speech label, such as noun, verb or adjective. In addition, edges are typically accompanied by dependency labels that describe the specific type of dependency, such as "nsubj" (nominal subject) or "dobj" (direct object). A word may also have multiple dependents, which allows the model to capture more complex syntactic structures.
Functional words such as conjunctions ("and", "but"), prepositions ("through") and articles ("a", "this") are identified in the dependency tree. The contribution of the directed edge corresponding to each functional word to the trunk of the dependency tree is then evaluated, for example by analyzing the type of the edge, the frequency of the word, or its role in the sentence. Functional words whose contribution is below the preset contribution threshold are removed together with their directed edges, producing a more concise target dependency tree.
In one embodiment, several factors may be used when calculating the contribution of the directed edge corresponding to each functional word to the trunk structure of the dependency tree. Weights may be assigned to different types of dependencies; for example, a subject-predicate relationship may contribute more to the trunk structure than a modifier relationship. Different importance may be assigned according to part of speech; generally, nouns, verbs and adjectives contribute more to sentence meaning than adverbs and prepositions. The sentence components may be analyzed to determine the role of the functional word in the sentence; if the functional word is a core part of a sentence component, its contribution is higher. The frequency with which functional words appear in the document may also be counted; words that occur frequently may contribute more to the overall structure.
Specifically, the system may calculate, for each functional word, the length of the dependency path from the subject of the sentence to other words, and determine the contribution of the directed edge corresponding to that functional word to the trunk structure of the dependency tree according to the dependency path length. The contribution is inversely related to the dependency path length; that is, a functional word with a shorter path may contribute more to the trunk structure.
For example, assume that there is a dependency tree for the sentence "He walks in the park", whose original wording contains two functional words, rendered here as "in" and "inside".
The functional words "in" (preposition) and "inside" (preposition) are identified.
The dependency types of "in" and "inside" are analyzed; both may be "prep" (prepositional phrase).
A weight, for example 0.5, is assigned to the "prep" dependency type.
The path lengths and syntactic depths of "in" and "inside" are calculated, and "in" is found to be closer to the subject "he" while "inside" is farther from it.
Based on these factors, the contribution of "in" to the trunk structure is higher, and the contribution of "inside" is lower.
In this way, the functional words that contribute little to the trunk structure of the dependency tree can be determined accurately and considered for removal when the target dependency tree is generated, which helps simplify the sentence structure and highlight the core information.
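The path-length rule can be sketched on a small hand-written tree. The sketch assumes a tree stored as a word-to-head map, a contribution formula of type weight divided by (1 + path length), and a threshold of 0.15; the embodiment only states that the contribution is inversely related to the path length, so the formula, the threshold, the English parse and all function names here are hypothetical.

```python
def path_to_root(node, heads):
    """Nodes from `node` up to the root; the root's head is None."""
    path = [node]
    while heads[node] is not None:
        node = heads[node]
        path.append(node)
    return path

def path_length(a, b, heads):
    """Number of edges on the tree path between nodes a and b."""
    up_a = path_to_root(a, heads)
    depth_in_b = {n: i for i, n in enumerate(path_to_root(b, heads))}
    for i, n in enumerate(up_a):
        if n in depth_in_b:  # first common ancestor
            return i + depth_in_b[n]
    raise ValueError("nodes are not in the same tree")

def contribution(word, subject, heads, type_weight):
    """Assumed formula: larger weight and shorter path give a higher contribution."""
    return type_weight / (1 + path_length(subject, word, heads))

def prune(heads, functional, subject, weights, threshold):
    """Drop low-contribution functional words and reattach their dependents."""
    removed = {w for w in functional
               if contribution(w, subject, heads, weights[w]) < threshold}
    pruned = {}
    for node, head in heads.items():
        if node in removed:
            continue
        while head in removed:  # climb to the nearest kept ancestor
            head = heads[head]
        pruned[node] = head
    return pruned

# Hypothetical parse of "He walks in the park" (word -> head).
HEADS = {"walks": None, "He": "walks", "in": "walks", "park": "in", "the": "park"}
WEIGHTS = {"in": 0.5, "the": 0.5}

pruned = prune(HEADS, ["in", "the"], "He", WEIGHTS, threshold=0.15)
print(pruned)
```

Here "in" lies two edges from the subject (contribution about 0.17) and is kept, while "the" lies four edges away (contribution 0.1) and is removed, so the target tree keeps only "He", "walks", "in" and "park".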
Next, the target dependency tree is analyzed to determine the dependency relationship of each word in the text, including the subordinate relationship between words, and the trunk information of the sentence, typically the subject, predicate and object, is extracted based on these dependency relationships.
For example, assume that there is the sentence "He passed the examination by studying hard." The sentence is analyzed with the dependency syntax analysis tool to obtain a dependency tree, and functional words such as "by" (preposition) and an auxiliary word are identified in the dependency tree. The contribution of the directed edge corresponding to "by" to the trunk structure is calculated and found to have little effect on the trunk "he - studying hard - passed the examination". "By" and its directed edge are removed, generating the target dependency tree. The target dependency tree is analyzed to determine that "he" is the subject, "studying hard" is the predicate, and "passed the examination" is the object. The sentence trunk is then extracted, namely "he studied hard and passed the examination".
In this embodiment, dependency syntax analysis allows key information to be extracted more accurately. Calculating the contribution degree and comparing it with the preset threshold makes it possible to identify and remove the functional words with little influence, which simplifies the sentence structure and makes the trunk clearer. Quickly recognizing the trunk of each sentence also improves the efficiency of understanding the text content.
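Trunk extraction from a dependency tree can be sketched as a lookup of the root and its subject and object dependents. The parse below is hand-written under conventional English dependency labels and need not coincide with the analysis in the example above; the tuple layout, `extract_trunk` and `PARSE` are hypothetical.

```python
def extract_trunk(tokens):
    """tokens: list of (word, dependency_label, head_index); head_index -1 marks the root."""
    root = next(i for i, (_, _, head) in enumerate(tokens) if head == -1)
    # The subject and object are the dependents of the root with the matching labels.
    subject = next((w for w, label, head in tokens
                    if head == root and label == "nsubj"), None)
    obj = next((w for w, label, head in tokens
                if head == root and label in ("dobj", "obj")), None)
    return {"subject": subject, "predicate": tokens[root][0], "object": obj}

# Hypothetical parse of "He passed the examination by studying hard".
PARSE = [
    ("He", "nsubj", 1),
    ("passed", "root", -1),
    ("the", "det", 3),
    ("examination", "dobj", 1),
    ("by", "prep", 1),
    ("studying", "pcomp", 4),
    ("hard", "advmod", 5),
]

trunk = extract_trunk(PARSE)
print(trunk)
```

In practice the tuples would come from a dependency parser rather than being written by hand, and the function would run on the target dependency tree after low-contribution functional words have been removed.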
In one embodiment, invoking an AI knowledge base model to analyze text content of the document to obtain key information includes:
performing word segmentation, stop word removal and grammar annotation on the text content of the document to obtain target text content;
and inputting the target text content into an AI knowledge base model for analysis to obtain key information.
In this embodiment, the system performs word segmentation on the text content of the document, that is, it splits a continuous text string into meaningful words or phrases. Stop word removal deletes words that are common in the language (for example "and" or "is") and that usually carry no important information in text analysis. Grammar annotation identifies the parts of speech (nouns, verbs, adjectives and the like) and the sentence structures in the text.
Finally, the preprocessed target text content is input into the AI knowledge base model to obtain the key information, for example "market demand increase" and "consumer behavior change" in a Word document on market trend analysis.
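The three preprocessing steps can be sketched in a few lines. The sketch assumes English text, a regular-expression tokenizer, a tiny hand-written stop word set and a toy part-of-speech lexicon; a real system would use a trained segmenter and tagger, and the names `STOP_WORDS`, `POS_LEXICON` and `preprocess` are hypothetical.

```python
import re

# Hypothetical stop word list and toy part-of-speech lexicon.
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "of", "in", "to"}
POS_LEXICON = {
    "market": "NOUN", "demand": "NOUN", "consumer": "NOUN", "behavior": "NOUN",
    "increasing": "VERB", "changing": "VERB",
}

def preprocess(text):
    """Tokenize, drop stop words, then attach a part-of-speech tag to each token."""
    tokens = re.findall(r"[a-z]+", text.lower())          # word segmentation
    kept = [t for t in tokens if t not in STOP_WORDS]     # stop word removal
    return [(t, POS_LEXICON.get(t, "X")) for t in kept]   # grammar annotation

target = preprocess("Market demand is increasing and consumer behavior is changing.")
print(target)
```

Tokens missing from the toy lexicon receive the placeholder tag "X"; the resulting list is what would be passed to the AI knowledge base model as the target text content.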
Referring to fig. 2, an embodiment of the present invention further provides an apparatus for embedding exclusive AI interpretation for shared documents, including:
a receiving module 11, configured to receive a document forwarded by a forwarder on a chat tool;
an extracting module 12, configured to extract the text content of the document and call an AI knowledge base model to analyze the text content of the document to obtain key information;
a generating module 13, configured to generate a summary of the text content according to a generative summary algorithm;
a recognition module 14, configured to recognize a topic of the text content by using an LDA algorithm, perform dependency syntax analysis on the text content, and extract trunk information of sentences in the text content;
and an integration module 15, configured to integrate the key information, the summary, the topic and the trunk information of the sentences to generate an interpretation result of the document, and send the interpretation result to a recipient.
The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
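The role of the integration module can be sketched as a small data structure that collects the four outputs and renders them as one message. The class name, field names and message layout below are assumptions made for illustration; the embodiment does not prescribe a format for the interpretation result.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Interpretation:
    """Hypothetical container for the four outputs that are integrated."""
    key_information: List[str]
    summary: str
    topics: List[str]
    sentence_trunks: List[str]

    def render(self) -> str:
        """Format the interpretation result as plain text for the recipient."""
        parts = [
            "Key information: " + "; ".join(self.key_information),
            "Summary: " + self.summary,
            "Topics: " + ", ".join(self.topics),
            "Sentence trunks: " + "; ".join(self.sentence_trunks),
        ]
        return "\n".join(parts)

def integrate(key_information, summary, topics, sentence_trunks):
    """Combine the outputs of the extracting, generating and recognition modules."""
    return Interpretation(list(key_information), summary, list(topics), list(sentence_trunks))

result = integrate(
    ["market demand increase"],
    "Demand is rising.",
    ["market trends"],
    ["he passed the examination"],
)
message = result.render()
print(message)
```

Keeping the four parts as separate fields lets the sending step choose its own presentation, for example a chat card instead of plain text.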
The terminal provided by the invention comprises a memory and a processor, wherein the memory stores computer readable instructions which, when executed by the processor, cause the processor to execute the steps of the method for embedding exclusive AI interpretation for shared documents.
In an embodiment, referring to fig. 3, the terminal provided in an embodiment of the present application may be a computer device whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, a display screen and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs and a database. The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The database of the computer device stores data relevant to the method for embedding exclusive AI interpretation for shared documents. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements the method for embedding exclusive AI interpretation for shared documents described in the above embodiments.
In one embodiment, the present invention also provides a storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the above method for embedding exclusive AI interpretation for shared documents. The storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored in a storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or may be a random access memory (RAM).
As can be seen from the above embodiments, the present invention has the following advantages:
The method and the terminal for embedding exclusive AI interpretation for shared documents provided by the invention use an AI knowledge base model to analyze the document and quickly identify its key information and key points. A generative summary algorithm is applied to create a concise summary of the document, so that the recipient can quickly grasp its gist. An LDA algorithm identifies the dominant topic of the document, which enhances understanding of the document content. Dependency syntax analysis is also performed on the document to extract the trunk information of its sentences and further clarify the sentence structures and core meanings. Finally, the key information, the summary, the topic, the sentence trunks and other information of different dimensions are integrated into a comprehensive interpretation result of the document. The integrated information provides a multidimensional view of the document, helps the recipient understand the document content more deeply, and allows the recipient to quickly obtain the core content and structure of the document without reading the full text, thereby improving the efficiency of document processing and information transmission.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this description.
The foregoing examples illustrate only a few embodiments of the invention and are described in relative detail, but they should not thereby be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (9)

1.一种为分享文档植入专属AI解读的方法,其特征在于,包括:1. A method for embedding exclusive AI interpretation into a shared document, characterized by comprising: 接收转发者在聊天工具上转发的文档;Receive documents forwarded by the forwarder on the chat tool; 提取所述文档的文本内容,调用AI知识库模型对所述文档的文本内容进行分析,得到关键信息;Extract the text content of the document, call the AI knowledge base model to analyze the text content of the document, and obtain key information; 根据生成式摘要算法生成所述文本内容的摘要;Generate a summary of the text content according to a generative summary algorithm; 利用LDA算法识别出所述文本内容的主题,并对所述文本内容进行依存句法分析,提取得到所述文本内容中句子的主干信息;Using the LDA algorithm to identify the subject of the text content, and performing dependency syntactic analysis on the text content to extract the main information of the sentences in the text content; 对所述关键信息、摘要、主题及句子的主干信息进行整合后,生成对所述文档的解读结果,将所述解读结果发送给接收者;After integrating the key information, abstract, theme and main information of the sentence, generating an interpretation result of the document, and sending the interpretation result to a receiver; 所述对所述文本内容进行依存句法分析,提取得到所述文本内容中句子的主干信息,包括:The performing dependency syntactic analysis on the text content to extract the main information of the sentences in the text content includes: 调用依存句法分析工具对所述文本内容进行依存句法分析,得到依存树,所述依存树为用于表示句子中词与词之间依存关系的图形化结构,依存树中的每个节点代表一个词,有向边代表词与词之间的依存关系,识别所述依存树中的多个功能性词,所述功能性词包括连词、介词和冠词,计算每个所述功能性词对应的有向边对所述依存树的主干结构的贡献度,移除贡献度小于预设阈值的功能性词及对应的有向边,生成目标依存树,根据所述目标依存树,确定所述文本内容中每个词的依存关系,所述依存关系包括词与词之间的从属关系,根据所述文本内容中每个词的依存关系,提取所述文本内容中句子的主干信息;Calling a dependency syntactic analysis tool to perform dependency syntactic analysis on the text content to obtain a dependency tree, wherein the dependency tree is a graphical structure for representing dependency relationships between words in a sentence, wherein each node in the dependency tree represents a word, and directed edges represent dependency relationships between words, identifying multiple functional words in the dependency tree, wherein the functional words include conjunctions, prepositions, and 
articles, calculating the contribution of directed edges corresponding to each functional word to the trunk structure of the dependency tree, removing functional words and corresponding directed edges whose contribution is less than a preset threshold, generating a target dependency tree, determining the dependency relationship of each word in the text content according to the target dependency tree, wherein the dependency relationship includes a subordinate relationship between words, and extracting trunk information of sentences in the text content according to the dependency relationship of each word in the text content; 所述计算每个所述功能性词对应的有向边对所述依存树的主干结构的贡献度,包括:The calculating the contribution of the directed edge corresponding to each functional word to the trunk structure of the dependency tree includes: 计算每个所述功能性词在句子主语到其他词的依存路径长度,根据所述依存路径长度确定每个所述功能性词对应的有向边对所述依存树的主干结构的贡献度,所述贡献度与所述依存路径长度成反相关。The length of the dependency path from the subject of the sentence to other words of each functional word is calculated, and the contribution of the directed edge corresponding to each functional word to the trunk structure of the dependency tree is determined according to the dependency path length, and the contribution is inversely correlated with the dependency path length. 2.根据权利要求1所述的为分享文档植入专属AI解读的方法,其特征在于,将所述解读结果发送给接收者之后,还包括:2. 
The method for embedding exclusive AI interpretation into shared documents according to claim 1, characterized in that after sending the interpretation result to the recipient, it also includes: 获取接收者关于所述文档的提问信息;Obtaining the recipient's question information about the document; 对所述提问信息进行向量化处理,得到所述提问信息对应的向量;Performing vectorization processing on the question information to obtain a vector corresponding to the question information; 调用所述AI知识库模型对所述提问信息对应的向量进行分析处理,并筛选出与所述提问信息对应的向量大于预设相似度的答复信息;Calling the AI knowledge base model to analyze and process the vector corresponding to the question information, and screening out the answer information whose vector corresponding to the question information has a similarity greater than a preset degree; 将所述答复信息发送给接收者。The reply information is sent to the recipient. 3.根据权利要求2所述的为分享文档植入专属AI解读的方法,其特征在于,将所述答复信息发送给接收者之后,还包括:3. The method for embedding exclusive AI interpretation into a shared document according to claim 2, characterized in that after sending the reply information to the recipient, it also includes: 当确定所述接收者属于客户名单的目标客户时,在所述接收者结束所有提问后,获取与所述接收者的所有互动内容;When it is determined that the receiver is a target customer in the customer list, after the receiver finishes asking all questions, all interaction contents with the receiver are obtained; 对所述所有互动内容进行分析,确定所述接收者的兴趣偏好;Analyze all interactive contents to determine the interest preferences of the recipient; 查询与所述兴趣偏好相匹配的产品信息,将与兴趣偏好相匹配的产品信息及所有互动内容发送给发送者。Query product information that matches the interest preference, and send the product information and all interactive content that matches the interest preference to the sender. 4.根据权利要求3所述的为分享文档植入专属AI解读的方法,其特征在于,查询与所述兴趣偏好相匹配的产品信息,包括:4. 
The method for embedding exclusive AI interpretation into shared documents according to claim 3, characterized in that querying product information matching the interest preference comprises: 确定描述所述接收者的兴趣偏好的用户标签;determining a user tag describing the interest preferences of the recipient; 将所述用户标签进行向量化处理,得到用户标签向量;Vectorize the user label to obtain a user label vector; 从知识库中提取出描述产品信息的产品标签;Extract product tags describing product information from the knowledge base; 将所述产品标签进行向量化处理,得到产品标签向量;Vectorize the product label to obtain a product label vector; 计算所述用户标签向量与产品标签向量的余弦距离,将与所述用户标签向量的余弦距离大于预设余弦距离的产品标签向量作为目标产品标签向量;Calculate the cosine distance between the user label vector and the product label vector, and use the product label vector whose cosine distance with the user label vector is greater than a preset cosine distance as the target product label vector; 确定所述目标产品标签向量对应的产品信息,得到与所述兴趣偏好相匹配的产品信息。Determine the product information corresponding to the target product label vector, and obtain product information matching the interest preference. 5.根据权利要求1所述的为分享文档植入专属AI解读的方法,其特征在于,调用AI知识库模型对所述文档的文本内容进行分析,得到关键信息,包括:5. 
The method for embedding exclusive AI interpretation into shared documents according to claim 1, characterized in that the AI knowledge base model is called to analyze the text content of the document to obtain key information, including: 调用AI知识库模型对所述文档的文本内容进行语义分析,得到所述文本内容的语义信息;Calling the AI knowledge base model to perform semantic analysis on the text content of the document to obtain semantic information of the text content; 将所述语义信息和所述文本内容输入所述AI知识库模型的贝叶斯网络层上进行概率计算,得到贝叶斯网络层中的每个节点的阈值概率;Inputting the semantic information and the text content into the Bayesian network layer of the AI knowledge base model for probability calculation to obtain the threshold probability of each node in the Bayesian network layer; 基于每个节点的阈值概率,从多个节点中筛选出阈值概率大于预设阈值概率的目标节点;Based on the threshold probability of each node, a target node whose threshold probability is greater than a preset threshold probability is selected from multiple nodes; 确定所述目标节点对应的信息,得到关键信息。Determine the information corresponding to the target node and obtain key information. 6.根据权利要求1所述的为分享文档植入专属AI解读的方法,其特征在于,提取所述文档的文本内容,包括:6. The method for embedding exclusive AI interpretation into a shared document according to claim 1, characterized in that extracting the text content of the document comprises: 识别所述文档的结构特点;identifying structural features of the document; 根据所述文档的结构特点确定切分模式;Determine a segmentation mode according to the structural characteristics of the document; 根据所述切分模式构建相应的正则表达式,利用所述正则表达式对所述文档进行切分点匹配,按照匹配的切分点将所述文档分割成多个文档片段;Constructing a corresponding regular expression according to the segmentation pattern, using the regular expression to match segmentation points of the document, and dividing the document into multiple document segments according to the matched segmentation points; 对所述多个文档片段进行筛选后,得到目标文档片段;After screening the multiple document fragments, a target document fragment is obtained; 提取所述目标文档片段的文本内容。The text content of the target document fragment is extracted. 
7. The method for embedding exclusive AI interpretation into shared documents according to claim 1, characterized in that extracting the text content of the document comprises:
when it is determined that the document contains target text in image or PDF format, recognizing and extracting the text content of the target text using OCR technology.
8. The method for embedding exclusive AI interpretation into shared documents according to claim 1, characterized in that calling the AI knowledge base model to analyze the text content of the document to obtain key information comprises:
performing word segmentation, stop-word removal and grammatical annotation on the text content of the document, to obtain target text content;
inputting the target text content into the AI knowledge base model for analysis, to obtain the key information.
9. A terminal, characterized in that it comprises a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method for embedding exclusive AI interpretation into shared documents according to any one of claims 1 to 8.
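The preprocessing recited in claim 8 (word segmentation, stop-word removal, grammatical annotation) can be sketched as below. This is a simplified illustration for whitespace-delimited English text only. The stop-word list, the tiny tag lexicon standing in for a part-of-speech tagger, and all function names are hypothetical placeholders; the claim does not name any segmenter or tagger, and Chinese text would presumably need a dedicated word segmenter.

```python
import re

# Illustrative stop-word list (assumption, not taken from the patent).
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "and", "in", "for"}

# Toy lexicon standing in for a real part-of-speech tagger (assumption).
TAG_LEXICON = {
    "user": "NOUN",
    "document": "NOUN",
    "shares": "VERB",
    "new": "ADJ",
}


def tokenize(text):
    """Word segmentation: lowercase the text and keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def annotate(tokens):
    """Grammatical annotation: attach a tag, or "UNK" if the word is unknown."""
    return [(t, TAG_LEXICON.get(t, "UNK")) for t in tokens]


def preprocess(text):
    """Run the three steps in the order the claim lists them."""
    return annotate(remove_stop_words(tokenize(text)))


target_text_content = preprocess("The user shares a new document.")
```

The resulting list of `(token, tag)` pairs plays the role of the "target text content" that the claim then passes to the AI knowledge base model; how the model consumes it is not described in this section.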
CN202411644517.4A 2024-11-18 2024-11-18 Method and terminal for embedding exclusive AI interpretation for shared documents Active CN119167946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411644517.4A CN119167946B (en) 2024-11-18 2024-11-18 Method and terminal for embedding exclusive AI interpretation for shared documents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411644517.4A CN119167946B (en) 2024-11-18 2024-11-18 Method and terminal for embedding exclusive AI interpretation for shared documents

Publications (2)

Publication Number Publication Date
CN119167946A CN119167946A (en) 2024-12-20
CN119167946B true CN119167946B (en) 2025-03-25

Family

ID=93888480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411644517.4A Active CN119167946B (en) 2024-11-18 2024-11-18 Method and terminal for embedding exclusive AI interpretation for shared documents

Country Status (1)

Country Link
CN (1) CN119167946B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294391A (en) * 2015-05-20 2017-01-04 天脉聚源(北京)科技有限公司 A kind of method and system displayed file attributes in instant messenger
CN114428861A (en) * 2022-01-27 2022-05-03 陕西煤业股份有限公司 Enterprise policy intelligent reading method, system, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4647610B2 (en) * 2003-05-16 2011-03-09 グーグル インコーポレイテッド Networked chat and media sharing system and method
CN108038096A (en) * 2017-11-10 2018-05-15 平安科技(深圳)有限公司 Knowledge database documents method for quickly retrieving, application server computer readable storage medium storing program for executing
CN109271626B (en) * 2018-08-31 2023-09-26 北京工业大学 Text semantic analysis method

Also Published As

Publication number Publication date
CN119167946A (en) 2024-12-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant