
CN116644756A - Semantic Analysis Method Based on Knowledge Graph - Google Patents

Info

Publication number
CN116644756A
Authority
CN
China
Prior art keywords
information
semantic
tone
data
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310490219.3A
Other languages
Chinese (zh)
Inventor
瞿珂
万澎江
张少杰
于政
翟士丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Haizhi Technology Group Co ltd
Original Assignee
Beijing Haizhi Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Haizhi Technology Group Co ltd filed Critical Beijing Haizhi Technology Group Co ltd
Priority to CN202310490219.3A priority Critical patent/CN116644756A/en
Publication of CN116644756A publication Critical patent/CN116644756A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Acoustics & Sound (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of semantic analysis, and in particular to a semantic analysis method based on a knowledge graph. The method comprises: obtaining voice data, screening it to remove cluttered information so that only valid sentence patterns remain, and analyzing the valid sentence patterns to obtain text information and tone information; recognizing the text information and the tone information with a pre-trained semantic model and analyzing them against a standard lexicon of synonyms and colloquial words to obtain first semantic information; performing association analysis on the first semantic information in combination with a constructed knowledge graph to obtain second semantic information; and comparing the second semantic information with the first semantic information, outputting the second semantic information when the difference value of the comparison is lower than a set value, and prompting the user to re-enter the input when the difference value is higher. Comparing the two pieces of semantic information yields semantic information that better matches the user's intent and improves the accuracy with which information in the semantic data is parsed.

Description

Semantic Analysis Method Based on Knowledge Graph

Technical Field

The invention relates to the technical field of semantic analysis, and in particular to a semantic analysis method based on a knowledge graph.

Background

In recent years, the development of speech recognition technology has considerably raised the level of human-computer interaction, and semantic analysis, as a key part of understanding natural language, plays a decisive role in how intelligent that interaction is. From the perspective of natural language, however, most words are polysemous: besides its literal meaning, a word may carry other implicit meanings, so methods that recognize only keywords cannot accurately identify its actual meaning. In addition, when a sentence is colloquial, the predicate may not be found during semantic analysis, which makes accurate semantic analysis difficult.

A knowledge graph is a modern theory that achieves multidisciplinary integration by combining the theories and methods of applied mathematics, graphics, information visualization, information science, and other disciplines with methods such as bibliometric citation analysis and co-occurrence analysis, and by using visual graphs to display the core structure, development history, frontier fields, and overall knowledge architecture of a discipline. It presents complex knowledge domains through data mining, information processing, knowledge measurement, and graph drawing, reveals the dynamic development patterns of those domains, and provides a practical and valuable reference for disciplinary research.

Therefore, it is necessary to provide a semantic analysis method based on a knowledge graph to solve the above technical problems.

Summary of the Invention

To solve the above technical problems, the present invention provides a semantic analysis method based on a knowledge graph.

The specific steps of the method are as follows:

S1. Acquire voice data and screen it to remove cluttered information so that only valid sentence patterns remain; analyze the valid sentence patterns to obtain text information and tone information; and input the obtained text information and tone information into the speech model;

S2. Recognize the text information and tone information with the pre-trained semantic model, and analyze them against the standard lexicon of synonyms and colloquial words to obtain first semantic information;

S3. Perform association analysis on the first semantic information in combination with the constructed knowledge graph to obtain second semantic information;

S4. Compare the second semantic information with the first semantic information; when the difference value of the comparison is lower than the set value, determine the semantic analysis result from the second semantic information and output it; when the difference value is higher, prompt the user to re-enter the input.
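The decision in step S4 can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how the difference value is computed, so the sketch counts differing characters with Python's `difflib`, and the names `difference_value`, `decide`, and `set_value` are hypothetical.

```python
from difflib import SequenceMatcher


def difference_value(first: str, second: str) -> int:
    """Count characters inserted, deleted, or replaced between the two strings.

    Assumption: the difference value is a character count; the source only
    says the set value "may be" a number of characters.
    """
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    diff = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diff += max(i2 - i1, j2 - j1)
    return diff


def decide(first: str, second: str, set_value: int):
    """S4: output the second semantic information when the difference is below
    the set value; otherwise signal that the user should re-enter the input."""
    if difference_value(first, second) < set_value:
        return ("output", second)
    return ("reenter", None)
```

With a set value of 2 characters, a one-character difference leads to the second semantic information being output, while two unrelated strings lead to a re-entry prompt.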

Preferably, the specific method of constructing the knowledge graph is:

S31. Connect to various data sources in a visual manner, verify and manage the data sources, and build a basic knowledge base from them;

S32. Parse the data sources to generate a plurality of semantic vectors;

S33. Perform data fusion on the generated semantic vectors, and mine the associations among the fused semantic vectors to generate the relationships between semantic vectors;

S34. Form a semantic database from the fused semantic vectors and the relationships between them, store it on a server, and evaluate the quality of the semantic database to complete construction of the knowledge graph.
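Steps S33 and S34 can be pictured with a small sketch. The patent does not specify the fusion rule or how associations are mined, so the following assumes, purely for illustration, that vectors for the same entity are averaged and that two entities are related when their cosine similarity reaches a threshold. The functions `fuse`, `cosine`, and `mine_relations` and the threshold value are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def fuse(records):
    """S33 (fusion, assumed rule): average all vectors sharing an entity name."""
    grouped = defaultdict(list)
    for name, vector in records:
        grouped[name].append(vector)
    return {
        name: [sum(column) / len(vectors) for column in zip(*vectors)]
        for name, vectors in grouped.items()
    }


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mine_relations(vectors, threshold=0.8):
    """S33 (mining, assumed rule): relate entity pairs with similarity >= threshold."""
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            edges.append((a, b, score))
    return edges


# S34: the fused vectors plus the mined edges together form the semantic database.
records = [("claim", [1.0, 0.0]), ("claim", [0.0, 1.0]),
           ("patent", [1.0, 1.0]), ("weather", [-1.0, 1.0])]
fused = fuse(records)
database = {"vectors": fused, "relations": mine_relations(fused)}
```

In a real system the vectors would come from parsing the data sources in S32, and the quality evaluation in S34 would be an additional step that this sketch does not cover.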

Preferably, the data sources in step S31 include the standard lexicon of synonyms and colloquial words, basic information filled in by the user, the user's everyday behavior data, the user's chat messages, and web page data publicly available on the Internet.

Preferably, the steps of acquiring voice data, screening it to remove cluttered information so that only valid sentence patterns remain, and analyzing the valid sentence patterns to obtain the text information are:

S11. Extract sound feature quantities from the voice data;

S12. Match the sound feature quantities against the modeled sound data in a sound library to obtain sound data of matching similarity;

S13. Compare and match that sound data against the speech data stored in a text-speech library to obtain the text information.
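As a rough illustration of S12 and S13, the sketch below matches each feature vector to the closest entry in a sound library and then looks up the corresponding text. The patent does not state the similarity measure or the layout of either library; Euclidean distance, the two toy dictionaries, and the function names are assumptions made for this example only.

```python
import math


def best_match(feature, sound_library):
    """S12 (assumed measure): return the key of the closest stored vector."""
    return min(sound_library, key=lambda key: math.dist(feature, sound_library[key]))


def to_text(features, sound_library, text_library):
    """S13: map each matched sound entry to its text and join the results."""
    return "".join(text_library[best_match(f, sound_library)] for f in features)


# Hypothetical toy libraries: two modeled syllables and their characters.
sound_library = {"ni": [0.9, 0.1], "de": [0.2, 0.8]}
text_library = {"ni": "你", "de": "的"}
```

A real recognizer would return several homophone candidates per syllable rather than a single character, which is why the method later filters candidate combinations by tone and lexicon.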

Preferably, the text-speech library stores characters, the speech corresponding to the characters, and speech data for extended words and sentences of the characters.

Preferably, the steps of acquiring voice data, screening it to remove cluttered information so that only valid sentence patterns remain, and analyzing the valid sentence patterns to obtain the tone information are:

S14. Perform spectrum analysis on the voice data and extract tone phonemes from it;

S15. Match the tone of the voice data in a tone model according to the tone phonemes.
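One way to picture S15 is as a classifier over the pitch contour of a syllable. The sketch below assumes that the spectrum analysis of S14 has already produced a fundamental-frequency contour in Hz, and it stands in for the unspecified tone model with a simple shape rule for the four Mandarin tones. The function name, the rule, and the tolerance are hypothetical and not taken from the patent.

```python
def classify_tone(f0, flat_tolerance=0.05):
    """Classify a pitch contour (Hz values over one syllable) into tone 1 to 4.

    Assumed rule of thumb: level contour -> tone 1, rising -> tone 2,
    dipping (lowest point strictly inside the syllable) -> tone 3,
    falling -> tone 4.
    """
    start, end, low = f0[0], f0[-1], min(f0)
    mean = sum(f0) / len(f0)
    if max(f0) - low <= flat_tolerance * mean:
        return 1
    if low < start and low < end:
        return 3
    return 2 if end > start else 4
```

A production tone model would be trained on data and would also handle the neutral tone, which this rule does not attempt.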

Preferably, the steps of recognizing the text information and tone information with the pre-trained semantic model and analyzing them against the standard lexicon of synonyms and colloquial words to obtain the first semantic information are:

S21. Recognize the tone information with the speech model to obtain the tone order; the tone information can then be used to eliminate those combinations in the text information whose tone order differs too much from the recognized tone order;

S22. Segment the text information with the semantic model to obtain at least one word, and obtain the characteristics of each word;

S23. Determine from those characteristics the amount of information each word carries, and select at least one word carrying a large amount of information as a keyword;

S24. Take a window centered on each keyword and determine the keyword's context words;

S25. Match the context words against the standard lexicon of synonyms and colloquial words to obtain a matching result;

S26. Analyze the semantics according to the matching result to obtain the first semantic information.
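Steps S23 to S25 can be sketched as follows. The patent does not define how the amount of information is measured, so this example assumes a fixed weight per word class; the weight table, the window size, and all function names are hypothetical, and the word segmentation and tagging of S22 are taken as given input.

```python
# Hypothetical information weights per word class (assumption, not from the patent).
POS_WEIGHT = {"noun": 3, "verb": 2, "pronoun": 1, "particle": 0}


def select_keywords(tagged, top_k=1):
    """S23: return the indices of the top_k words with the highest weight."""
    ranked = sorted(
        range(len(tagged)),
        key=lambda i: POS_WEIGHT.get(tagged[i][1], 0),
        reverse=True,
    )
    return sorted(ranked[:top_k])


def context_window(tagged, index, size=2):
    """S24: return the words within `size` positions of the keyword, excluding it."""
    lo = max(0, index - size)
    hi = min(len(tagged), index + size + 1)
    return [tagged[i][0] for i in range(lo, hi) if i != index]


def match_lexicon(words, lexicon):
    """S25: keep the context words found in the lexicon, with their standard form."""
    return {word: lexicon[word] for word in words if word in lexicon}


tagged = [("你", "pronoun"), ("的", "particle"), ("权利", "noun"),
          ("要求", "noun"), ("什么", "pronoun")]
keyword_index = select_keywords(tagged)[0]
context = context_window(tagged, keyword_index, size=1)
```

Here the first noun is chosen as the keyword and its immediate neighbours become the context words; S26 would then interpret the lexicon matches, which is left out of this sketch.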

Compared with the related art, the semantic analysis method based on a knowledge graph provided by the present invention has the following beneficial effects:

1. The present invention recognizes the text information and tone information in the voice data with the pre-trained semantic model and analyzes them against the standard lexicon of synonyms and colloquial words to obtain first semantic information; performs association analysis on the first semantic information in combination with the constructed knowledge graph to obtain second semantic information; and compares the second semantic information with the first semantic information to obtain the final output. Comparing the two pieces of semantic information yields semantic information that better matches the user's intent and improves the accuracy with which information in the semantic data is parsed;

2. Before semantic analysis is performed, the present invention first constructs the knowledge graph: the data sources are parsed to generate a plurality of semantic vectors; the generated semantic vectors are fused and the associations among the fused vectors are mined to generate the relationships between semantic vectors; the fused semantic vectors and the relationships between them form a semantic database, which is stored on a server and evaluated for quality to complete construction of the knowledge graph. The relationships between semantic vectors make accurate semantic analysis possible.

Description of Drawings

Fig. 1 is a schematic flowchart of the semantic analysis method based on a knowledge graph provided by the present invention;

Fig. 2 is a schematic flowchart of the specific method of constructing the knowledge graph shown in Fig. 1;

Fig. 3 is a schematic flowchart of obtaining text information from voice data as shown in Fig. 1;

Fig. 4 is a schematic flowchart of obtaining tone information from voice data as shown in Fig. 1;

Fig. 5 is a schematic flowchart of obtaining first semantic information from text information as shown in Fig. 1.

Detailed Description

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations.

Accordingly, the following detailed description of the embodiments provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.

In the description of the embodiments of the present invention, it should be noted that terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer", where they appear, indicate orientations or positional relationships based on those shown in the drawings or on how the product of the invention is usually placed in use. They are used only to describe the invention conveniently and to simplify the description; they do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be construed as limiting the invention. In addition, terms such as "first", "second", and "third", where they appear, are used only to distinguish descriptions and should not be understood as indicating or implying relative importance.

Furthermore, terms such as "horizontal", "vertical", and "overhanging", where they appear, do not require a component to be absolutely horizontal or overhanging; it may be slightly inclined. For example, "horizontal" only means that the direction is more horizontal than "vertical"; it does not mean that the structure must be completely horizontal.

In the description of the embodiments of the present invention, it should also be noted that, unless otherwise expressly specified and limited, the terms "arranged", "installed", "connected", and "coupled", where they appear, should be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements. Persons of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific situation.

The specific implementation of the present invention is described in detail below with reference to specific embodiments.

The semantic analysis method based on a knowledge graph provided by the present invention comprises the following specific steps:

S1. Acquire voice data and screen it to remove cluttered information so that only valid sentence patterns remain; analyze the valid sentence patterns to obtain text information and tone information; and input the obtained text information and tone information into the speech model;

S2. Recognize the text information and tone information with the pre-trained semantic model, and analyze them against the standard lexicon of synonyms and colloquial words to obtain first semantic information;

S3. Perform association analysis on the first semantic information in combination with the constructed knowledge graph to obtain second semantic information;

S4. Compare the second semantic information with the first semantic information; when the difference value of the comparison is lower than the set value, determine the semantic analysis result from the second semantic information and output it; when the difference value is higher, prompt the user to re-enter the input.

It should be noted that voice data can generally be acquired through a sound acquisition device such as a microphone: the user speaks into the microphone so that it collects the voice data of the user's speech. Voice data can also be collected directly by an audio collection device or the like, for example by collecting an audio file directly.
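Collecting voice data from an audio file, followed by a very simple form of screening, might look like the sketch below. It uses Python's standard `wave` module and assumes a 16-bit mono WAV file; the energy-based frame filter is only one hypothetical way to discard clutter such as silence, and the frame length and threshold are illustrative values, not figures from the patent.

```python
import array
import sys
import wave


def read_pcm16_mono(path):
    """Read a 16-bit mono WAV file and return (sample_rate, list_of_samples)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return rate, samples.tolist()


def keep_voiced_frames(samples, frame_len=160, min_energy=1000.0):
    """Assumed screening rule: keep fixed-length frames whose mean energy is high enough."""
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy >= min_energy:
            kept.append(frame)
    return kept
```

The frames that survive the filter would then be passed on to feature extraction (S11) and spectrum analysis (S14).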

To explain further: analyzing the valid sentence patterns yields the characters corresponding to each audio segment; these characters are combined, the combinations are arranged in order, and the arranged result forms the text information, which can be analyzed to obtain the meaning of what the user said. After the text information and tone information are input into the semantic model, the tone information is used to eliminate combinations whose tone order differs too much from the recognized tone information; the standard lexicon of synonyms and colloquial words is then used to eliminate combinations in which adjacent characters cannot form words, yielding the corresponding first semantic information. This is illustrated by the following example:

After cluttered information is removed from the voice data, only the valid sentence pattern "ni de quan li yao qiu shi shen me" remains. The characters corresponding to each audio segment are then obtained and combined in order, giving several candidate combinations such as “你的劝离要求是什么” ("What is your request to persuade someone to leave?"), “你的全力要求是什么” ("What is your all-out request?"), and “你的权利要求是什么” ("What is your claim?"). Tone information is also obtained: the tone order of the valid sentence pattern is third tone, neutral tone, second tone, fourth tone, first tone, second tone, fourth tone, second tone, neutral tone, that is, nǐ de quán lì yāo qiú shì shén me. Combining this with the standard lexicon of synonyms and colloquial words gives the first semantic information “你的权利要求是什么” ("What is your claim?"). Analysis with the knowledge graph, which contains the user's behavior data, then gives the second semantic information “你的权利要求(书)是什么” ("What is your claim (document)?"): the user is asking about the claims of a patent, not about a claim of rights in the general legal sense. Comparing the second semantic information “你的权利要求(书)是什么” with the first semantic information “你的权利要求是什么” yields the difference value “(书)”. Because “(书)” is below the set value, which may be expressed as a number of characters, the second semantic information “你的权利要求(书)是什么” is output.
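The lexicon-based choice among the homophone candidates in this example can be sketched as follows. The patent only says that combinations which cannot form words are eliminated; the scoring rule used here (prefer the candidate containing the longest lexicon entry), the toy lexicon, and the function names are assumptions introduced for illustration.

```python
def longest_lexicon_hit(candidate, lexicon):
    """Length of the longest lexicon entry contained in the candidate (0 if none)."""
    return max((len(entry) for entry in lexicon if entry in candidate), default=0)


def pick_candidate(candidates, lexicon):
    """Assumed rule: choose the candidate containing the longest known lexicon entry."""
    return max(candidates, key=lambda c: longest_lexicon_hit(c, lexicon))


# Hypothetical toy lexicon; a real one would be the standard synonym/colloquial lexicon.
lexicon = {"权利要求", "要求", "全力", "权利", "什么"}
candidates = ["你的劝离要求是什么", "你的全力要求是什么", "你的权利要求是什么"]
first_semantic = pick_candidate(candidates, lexicon)
```

Under this toy lexicon the candidate containing the four-character entry “权利要求” wins, which corresponds to the first semantic information in the example.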

In an embodiment of the present invention, the specific method of constructing the knowledge graph is:

S31. Connect to various data sources in a visual manner, verify and manage the data sources, and build a basic knowledge base from them;

S32. Parse the data sources to generate a plurality of semantic vectors;

S33. Perform data fusion on the generated semantic vectors, and mine the associations among the fused semantic vectors to generate the relationships between semantic vectors;

S34. Form a semantic database from the fused semantic vectors and the relationships between them, store it on a server, and evaluate the quality of the semantic database to complete construction of the knowledge graph.

In an embodiment of the present invention, the data sources in step S31 include the standard lexicon of synonyms and colloquial words, basic information filled in by the user, the user's everyday behavior data, the user's chat messages, and web page data publicly available on the Internet.

It should be noted that the standard lexicon of synonyms and colloquial words needs no processing and is added directly to the knowledge graph system. Of the basic information filled in by the user, a few fields need further processing, while many can be used directly for modeling or added to the knowledge graph system. Behavior data and the user's chat messages need some simple processing to extract useful information such as "how long the user stays on a given page" and "the words the user habitually uses". Web page data publicly available on the Internet must be processed with information-extraction techniques.
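The "simple processing" of behavior data and chat messages could, for instance, look like the sketch below. The record layouts (page visits as `(page, enter, leave)` tuples with timestamps in seconds, chat messages as whitespace-separated words) and the function names are hypothetical; Chinese text would first need word segmentation, which is not shown.

```python
from collections import Counter


def dwell_seconds(visits):
    """Total time spent per page from (page, enter_ts, leave_ts) records."""
    totals = {}
    for page, enter, leave in visits:
        totals[page] = totals.get(page, 0) + (leave - enter)
    return totals


def habitual_words(messages, top_n=3):
    """Most frequent words across chat messages (assumes pre-segmented text)."""
    counts = Counter(word for message in messages for word in message.split())
    return [word for word, _ in counts.most_common(top_n)]
```

Features of this kind could then be attached to the user's node in the knowledge graph, which is how the worked example arrives at the patent-related reading of the user's question.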

In an embodiment of the present invention, the steps of acquiring voice data, screening it to remove cluttered information so that only valid sentence patterns remain, and analyzing the valid sentence patterns to obtain the text information are:

S11. Extract sound feature quantities from the voice data;

S12. Match the sound feature quantities against the modeled sound data in a sound library to obtain sound data of matching similarity;

S13. Compare and match that sound data against the speech data stored in a text-speech library to obtain the text information.

In an embodiment of the present invention, the text-speech library stores characters, the speech corresponding to the characters, and speech data for extended words and sentences of the characters.

In an embodiment of the present invention, the steps of acquiring voice data, screening it to remove cluttered information so that only valid sentence patterns remain, and analyzing the valid sentence patterns to obtain the tone information are:

S14. Perform spectrum analysis on the voice data and extract tone phonemes from it;

S15. Match the tone of the voice data in a tone model according to the tone phonemes.

在本发明的实施例中,根据预先训练好的语义模型对文本信息与音调信息进行识别,并通过标准的同义词与口语化词语词库,对其进行分析,以得到第一语义信息的步骤为:In the embodiment of the present invention, the text information and tone information are identified according to the pre-trained semantic model, and analyzed through the standard synonyms and colloquial word lexicon to obtain the first semantic information. :

S21、通过语音模型对音调信息进行识别,并得到音调顺序,可通过音调信息剔除文本信息中音调顺序与识别出的音调顺序音调信息差距过大的组合;S21. Identify the tone information through the voice model, and obtain the tone order, and eliminate the combination of the tone order in the text information and the recognized tone order tone information through the tone information;

由于语音数据中,不是所有人发音都为标准音节,因此,可设定一个差异值,当差异值过大时,将数据剔除,Since not everyone in the voice data is pronounced as a standard syllable, a difference value can be set, and when the difference value is too large, the data will be eliminated.

S22、通过语义模型对所述文本信息进行分词,得到至少一个词语;分别获取所述至少一个词语的特性;S22. Segment the text information through a semantic model to obtain at least one word; respectively acquire the characteristics of the at least one word;

词语的特性可以分为名词、动词.、形容词、数词、量词、代词、副词、介词、连词、助词、叹词以及拟声词的大类,并在大类中继续细分为多个小类,例如将名词可以分为人物名词、事物名词、时间名词、方位名词,关系名词等,通过将获取词语的特性,可更好的理解词语的含义;具体如下:你(人称代词)的(结构助词)权利(事物名词)要求(事物名词)什么(疑问代词)。The characteristics of words can be divided into nouns, verbs , adjectives, numerals, quantifiers, pronouns, adverbs, prepositions, conjunctions, auxiliary words, interjections and onomatopoeia, and continue to be subdivided into several small categories within the category. For example, nouns can be divided into person nouns, thing nouns, time nouns, location nouns, relative nouns, etc. By acquiring the characteristics of the words, the meaning of the words can be better understood; the details are as follows: your (personal pronoun) ( Structural particle) right (thing noun) requires (thing noun) what (interrogative pronoun).

S23. Determine the amount of information carried by each of the at least one word according to its characteristics, and select at least one word carrying a large amount of information as a keyword;

S24. Form a window centered on each keyword to determine the context words of that keyword;

S25. Match the context words against the standard lexicon of synonyms and colloquial words to obtain a matching result;

S26. Analyze the semantics according to the matching result to obtain the first semantic information.
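Steps S23 to S25 can be sketched under stated assumptions. The patent does not say how the amount of information is derived from a word's characteristics, how wide the window is, or how the lexicon is organized; here a hypothetical weight per characteristic, a window radius of one word, and a small colloquial-to-standard mapping are assumed. All names and values are illustrative.

```python
# Assumed information weight per word characteristic (higher means more informative).
INFO_WEIGHT = {
    "thing noun": 3,
    "verb": 2,
    "interrogative pronoun": 2,
    "personal pronoun": 1,
    "structural particle": 0,
}

# Assumed excerpt of a lexicon mapping colloquial words to standard synonyms.
COLLOQUIAL_TO_STANDARD = {"啥": "什么", "咋": "怎么"}


def select_keywords(tagged, min_weight=3):
    """S23: return the indices of words whose characteristic carries enough information."""
    return [i for i, (_, pos) in enumerate(tagged) if INFO_WEIGHT.get(pos, 0) >= min_weight]


def context_window(tagged, center, radius=1):
    """S24: return the words within `radius` positions of the keyword, excluding the keyword."""
    lo = max(0, center - radius)
    hi = min(len(tagged), center + radius + 1)
    return [tagged[j][0] for j in range(lo, hi) if j != center]


def normalize(words):
    """S25: replace colloquial words with their standard synonyms where the lexicon has one."""
    return [COLLOQUIAL_TO_STANDARD.get(word, word) for word in words]


# Usage: a colloquial variant of the example sentence, already segmented and tagged.
tagged = [
    ("你", "personal pronoun"),
    ("的", "structural particle"),
    ("权利", "thing noun"),
    ("要求", "thing noun"),
    ("啥", "interrogative pronoun"),
]
for index in select_keywords(tagged):
    print(tagged[index][0], normalize(context_window(tagged, index)))
```

The normalized context words would then feed the semantic analysis of step S26, which the patent leaves to the matching result and which this sketch does not attempt.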

The circuits and controls involved in the present invention are all prior art and are not described further here.

The above are merely embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (7)

1. A semantic analysis method based on a knowledge graph, characterized in that the specific steps are as follows:

S1. Acquire voice data and screen the voice data so as to eliminate cluttered information and retain only valid sentence patterns; analyze the valid sentence patterns to obtain text information and tone information; and input the obtained text information and tone information into the speech model;

S2. Recognize the text information and tone information with a pre-trained semantic model, and analyze them against a standard lexicon of synonyms and colloquial words to obtain first semantic information;

S3. Perform association analysis on the first semantic information in combination with the constructed knowledge graph to obtain second semantic information;

S4. Compare the second semantic information with the first semantic information; when the difference value of the comparison result is lower than a set value, determine the semantic analysis result according to the second semantic information and output it; when the difference value of the comparison result is higher, prompt the user to re-enter the input.

2. The semantic analysis method based on a knowledge graph according to claim 1, characterized in that the specific method of constructing the knowledge graph is:

S31. Connect to various data sources in a visual manner, verify and manage the data sources, and establish a basic knowledge base from the data sources;

S32. Parse the data sources to generate multiple semantic vectors;

S33. Perform data fusion on the generated semantic vectors, and mine association relationships among the fused semantic vectors to generate the relationships between semantic vectors;

S34. Form a semantic database from the fused semantic vectors and the relationships between them, store it on a server, and perform a quality assessment of the semantic database, thereby completing the construction of the knowledge graph.

3. The semantic analysis method based on a knowledge graph according to claim 2, characterized in that the data sources in step S31 include the standard lexicon of synonyms and colloquial words, basic information filled in by the user, the user's everyday behavior data, the user's chat information, and web page data published on the Internet.

4. The semantic analysis method based on a knowledge graph according to claim 1, characterized in that the step of acquiring voice data, screening the voice data so as to eliminate cluttered information and retain only valid sentence patterns, and analyzing the valid sentence patterns to obtain the text information comprises:

S11. Extract sound feature quantities from the voice data;

S12. Match the sound feature quantities against the modeled sound data in a sound library to obtain sound data of matching similarity;

S13. Compare and match the sound data against the voice data stored in a text-speech library to obtain the text information.

5. The semantic analysis method based on a knowledge graph according to claim 4, characterized in that the text-speech library stores text, the speech corresponding to the text, and speech data of extended words and sentences of the text.

6. The semantic analysis method based on a knowledge graph according to claim 1, characterized in that the step of acquiring voice data, screening the voice data so as to eliminate cluttered information and retain only valid sentence patterns, and analyzing the valid sentence patterns to obtain the tone information comprises:

S14. Perform spectral analysis on the voice data and extract tone phonemes from the voice data;

S15. Match the tone of the voice data in the tone model according to the tone phonemes.

7. The semantic analysis method based on a knowledge graph according to claim 1, characterized in that the step of recognizing the text information and tone information with a pre-trained semantic model, and analyzing them against the standard lexicon of synonyms and colloquial words to obtain the first semantic information, comprises:

S21. Recognize the tone information with the speech model to obtain a tone sequence; the tone information can then be used to eliminate candidate combinations in the text information whose tone sequence differs too much from the recognized tone sequence;

S22. Segment the text information into words with the semantic model to obtain at least one word, and obtain the characteristics of each of the at least one word;

S23. Determine the amount of information carried by each of the at least one word according to its characteristics, and select at least one word carrying a large amount of information as a keyword;

S24. Form a window centered on each keyword to determine the context words of that keyword;

S25. Match the context words against the standard lexicon of synonyms and colloquial words to obtain a matching result;

S26. Analyze the semantics according to the matching result to obtain the first semantic information.
CN202310490219.3A 2023-05-04 2023-05-04 Semantic Analysis Method Based on Knowledge Graph Pending CN116644756A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310490219.3A CN116644756A (en) 2023-05-04 2023-05-04 Semantic Analysis Method Based on Knowledge Graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310490219.3A CN116644756A (en) 2023-05-04 2023-05-04 Semantic Analysis Method Based on Knowledge Graph

Publications (1)

Publication Number Publication Date
CN116644756A (en) 2023-08-25

Family

ID=87622125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310490219.3A Pending CN116644756A (en) 2023-05-04 2023-05-04 Semantic Analysis Method Based on Knowledge Graph

Country Status (1)

Country Link
CN (1) CN116644756A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116822529A (en) * 2023-08-29 2023-09-29 国网信息通信产业集团有限公司 Knowledge element extraction method based on semantic generalization

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583907A (en) * 2020-04-15 2020-08-25 北京小米松果电子有限公司 Information processing method, device and storage medium
CN112071304A (en) * 2020-09-08 2020-12-11 深圳市天维大数据技术有限公司 Semantic analysis method and device
CN114742063A (en) * 2022-01-20 2022-07-12 洪芳华 Sentence pattern semantic analysis method for knowledge graph member
US20220230629A1 (en) * 2021-01-20 2022-07-21 Microsoft Technology Licensing, Llc Generation of optimized spoken language understanding model through joint training with integrated acoustic knowledge-speech module

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583907A (en) * 2020-04-15 2020-08-25 北京小米松果电子有限公司 Information processing method, device and storage medium
CN112071304A (en) * 2020-09-08 2020-12-11 深圳市天维大数据技术有限公司 Semantic analysis method and device
US20220230629A1 (en) * 2021-01-20 2022-07-21 Microsoft Technology Licensing, Llc Generation of optimized spoken language understanding model through joint training with integrated acoustic knowledge-speech module
CN114742063A (en) * 2022-01-20 2022-07-12 洪芳华 Sentence pattern semantic analysis method for knowledge graph member

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116822529A (en) * 2023-08-29 2023-09-29 国网信息通信产业集团有限公司 Knowledge element extraction method based on semantic generalization
CN116822529B (en) * 2023-08-29 2023-12-29 国网信息通信产业集团有限公司 Knowledge element extraction method based on semantic generalization

Similar Documents

Publication Publication Date Title
Tucker et al. The massive auditory lexical decision (MALD) database
WO2021000497A1 (en) Retrieval method and apparatus, and computer device and storage medium
US10672391B2 (en) Improving automatic speech recognition of multilingual named entities
Griol et al. Combining speech-based and linguistic classifiers to recognize emotion in user spoken utterances
JP6815899B2 (en) Output statement generator, output statement generator and output statement generator
CN107305768A (en) Easy wrongly written character calibration method in interactive voice
US7197457B2 (en) Method for statistical language modeling in speech recognition
Kumar et al. A knowledge graph based speech interface for question answering systems
Seljan et al. Combined automatic speech recognition and machine translation in business correspondence domain for english-croatian
JP5073024B2 (en) Spoken dialogue device
Savargiv et al. Text material design for fuzzy emotional speech corpus based on persian semantic and structure
Dyriv et al. The user's psychological state identification based on Big Data analysis for person's electronic diary
KR101097186B1 (en) System and method for synthesizing voice of multi-language
CN116644756A (en) Semantic Analysis Method Based on Knowledge Graph
KR20210071713A (en) Speech Skill Feedback System
Vadapalli et al. Learning continuous-valued word representations for phrase break prediction.
Ronzhin et al. Survey of russian speech recognition systems
Alsharhan et al. Robust automatic accent identification based on the acoustic evidence
KR101168312B1 (en) Apparatus and method for creating response using weighting
Declerck Towards a new ontology for sign languages
Sinha et al. Transforming interactions: mouse-based to voice-based interfaces
Harvey et al. Associating colours with emotions detected in social media tweets
Juan et al. Language modelling for a low-resource language in Sarawak, Malaysia
Ueberla Analyzing and improving statistical language models for speech recognition
Heydarova Compiling of Phonetic Database Structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination