CN111639240B - Cross-modal Hash retrieval method and system based on attention awareness mechanism - Google Patents
- Publication number
- CN111639240B (application CN202010408302.8A)
- Authority
- CN
- China
- Prior art keywords
- modal
- cross
- hash
- data
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9035—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a cross-modal hash retrieval method and system based on an attention-aware mechanism, comprising: performing feature extraction and attention feature extraction on the training set of a cross-modal dataset to obtain cross-modal features weighted by attention features; inputting the cross-modal features of the cross-modal data pairs into a hash learning model, and optimizing the hash learning model to minimize a loss function defined over the output cross-modal hash codes; and, according to the hash code of the query data obtained from the optimized hash learning model, screening, among the hash codes of data whose modality differs from that of the query data, the modal data that satisfy the retrieval request. The attention mechanism is applied to the cross-modal hash retrieval task, and a novel attention method, called the attention-aware mechanism, is proposed, which suppresses the noise and redundancy in the raw data while enhancing the regions of interest, improving the quality of the generated hash codes.
Description
Technical Field
The invention relates to the technical field of cross-modal hash retrieval, and in particular to a cross-modal hash retrieval method and system based on an attention-aware mechanism.
Background
The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.
With the explosive growth of multimedia data on the web, it has become necessary to retrieve texts or videos related to a given image, or to retrieve images or videos from text, that is, to use data in one modality to retrieve similar samples in another modality. At the same time, storing data efficiently and querying it quickly has become a challenge. In recent years, researchers have therefore proposed hash learning to address this problem: hash learning methods represent the original high-dimensional samples with simple, compact binary hash codes, which greatly compresses the data and facilitates storage and mutual retrieval.
Cross-modal retrieval aims to retrieve data in a different modality that matches the given data, for example using a text query to find the set of images in a database that match the textual description. Existing approaches can be divided into deep and non-deep models according to whether they incorporate deep learning. A traditional deep cross-modal hashing model usually consists of three steps: first, deep networks extract features from the different modalities; then, a fully connected network learns a hash function from those features under the supervision of a cross-entropy loss and a sample similarity matrix; finally, the hash function converts each sample into a hash code that is stored in the database.
Many cross-modal hash retrieval methods have been proposed. However, the inventors found that the prior art has at least the following problems. For retrieval tasks, real data often contain noise and redundancy. During feature extraction, the most useful visual information should be extracted while background information is ignored, because the background interferes with retrieval; yet in real data the informative categories cover only a small region and most of the area is background. Most current cross-modal retrieval methods ignore this issue and learn features directly from the raw data, so they may be misled by invalid or redundant information and produce low-quality hash codes. In addition, to improve retrieval accuracy, many high-performing deep cross-modal hashing models introduce networks with many more parameters, such as GANs (generative adversarial networks), which substantially increases training and retrieval time.
Summary of the Invention
To solve the above problems, the present invention proposes a cross-modal hash retrieval method and system based on an attention-aware mechanism. The attention mechanism is applied to the cross-modal hash retrieval task, and a novel attention method, the attention-aware mechanism, is proposed. Feature learning and hash-code learning are carried out simultaneously on a cross-modal dataset containing multiple modalities, and the attention-weighted feature representations are fed back into the hash learning model to guide hash-code generation, suppressing noise and redundancy in the raw data while enhancing the regions of interest and improving the quality of the generated hash codes.
To achieve the above object, the present invention adopts the following technical solutions:
In a first aspect, the present invention provides a cross-modal hash retrieval method based on an attention-aware mechanism, comprising:
performing feature extraction and attention feature extraction on the training set of a cross-modal dataset to obtain cross-modal features weighted by attention features;
inputting the cross-modal features of the cross-modal data pairs in the training set into a hash learning model, and optimizing the hash learning model to minimize a loss function defined over the output cross-modal hash codes; and
according to the hash code of the query data obtained from the optimized hash learning model, screening, among the hash codes of data in the cross-modal dataset whose modality differs from that of the query data, the modal data that satisfy the retrieval request.
In a second aspect, the present invention provides a cross-modal hash retrieval system based on an attention-aware mechanism, comprising:
a feature extraction module configured to perform feature extraction and attention feature extraction on the training set of a cross-modal dataset to obtain cross-modal features weighted by attention features;
a hash learning module configured to input the cross-modal features of the cross-modal data pairs in the training set into a hash learning model, and to optimize the hash learning model to minimize a loss function defined over the output cross-modal hash codes; and
a retrieval module configured to screen, according to the hash code of the query data obtained from the optimized hash learning model, among the hash codes of data in the cross-modal dataset whose modality differs from that of the query data, the modal data that satisfy the retrieval request.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when run by the processor, carry out the method of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium for storing computer instructions, wherein the computer instructions, when executed by a processor, carry out the method of the first aspect.
Compared with the prior art, the beneficial effects of the present invention are as follows:
In the present invention, the cross-modal dataset contains data of multiple modalities, and feature learning and hash-code learning are performed on these modalities simultaneously, improving the efficiency of hash-code generation.
The present invention proposes a novel attention method, the attention-aware mechanism, and applies it to the cross-modal hash retrieval task. Weighting the two different modalities not only highlights the key parts of the cross-modal data, such as the region of an image where an object is present or a particular word in a text input, but also suppresses the influence of redundant or invalid parts on retrieval, such as the image background or distractor words in the text. This effectively improves the quality of the generated hash codes and suits cross-modal retrieval tasks in a wide range of multi-modal data scenarios.
Brief Description of the Drawings
The accompanying drawings, which form a part of the present invention, provide a further understanding of the invention; the exemplary embodiments and their descriptions explain the invention and do not unduly limit it.
Figures 1(a)-(b) show image-modality data;
Figure 1(c) shows the ten most frequent words in the text annotations of the public dataset MIRFlickr-25K;
Figure 1(d) shows the text annotation of Figure 1(a);
Figure 2 is a flowchart of the cross-modal hash retrieval method based on an attention-aware mechanism provided in Embodiment 1 of the present invention;
Figure 3 is a flowchart of the image attention feature extraction provided in Embodiment 1 of the present invention;
Figure 4 is a flowchart of the text attention feature extraction provided in Embodiment 1 of the present invention;
Figure 5 is a structural diagram of the cross-modal hash retrieval system based on an attention-aware mechanism provided in Embodiment 1 of the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide a further explanation of the invention. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It should be noted that the terminology used herein is for the purpose of describing specific embodiments only and is not intended to limit the exemplary embodiments of the present invention. As used herein, unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well. It should further be understood that the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such process, method, product, or device.
The embodiments of the present invention and the features of the embodiments may be combined with each other without conflict.
Embodiment 1
A variety of cross-modal hash retrieval methods have been proposed, but real data contain noise and redundancy, and current retrieval methods that learn features directly from the raw data are misled by invalid or redundant information, producing low-quality hash codes. Taking the two modalities of images and text as an example, as shown in Figures 1(a)-1(b): for the image of Figure 1(a), the region containing the bee and the flowers should be highlighted while the background behind them is ignored, because the background interferes with retrieval. Similarly, for the image of Figure 1(b), whose labels (i.e., supervision information) are "animal", "flower", and "plant life", the most useful visual information may be the butterfly hovering over the flower. However, these informative categories cover only a small part of the whole image, and most of the image is background.
Figure 1(c) lists the ten most frequent words in the text annotations of the public dataset MIRFlickr-25K. Half of them ("explore", "canon", "bw", "nikon", and "2007") are invalid words with no direct relation to the image content. Figure 1(d) shows the text annotation of Figure 1(a), in which only the word "bees" is relevant to the retrieval task.
It follows that, unless the noise and redundancy in the raw data are suppressed, low-quality hash codes are easily generated, degrading the retrieval results.
In recent years the attention mechanism has been widely applied in computer vision and related fields, for example natural language processing, object detection, image recognition, and speech recognition, but it has rarely been used for cross-modal retrieval. Applied to image recognition, the traditional attention mechanism automatically finds the parts of an image that need attention: it learns to generate a mask of the same size as an image representation (the raw image, a feature map, and so on), and for regions of interest the corresponding positions of the mask have higher activation values. According to where it acts, an attention model can usually be classified as a spatial attention model or a channel attention model. A spatial attention model generates an attention value for each position in the feature map; mapped back to the original image, different positions thus influence the task to different degrees. A channel attention mechanism generates an attention value for each channel of the feature map and is more abstract.
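As a concrete illustration of the spatial-attention idea, the following minimal PyTorch sketch generates a mask and reweights a feature map; the class name, the 1x1-convolution scoring, and the tensor sizes are assumptions for illustration, not details taken from the patent.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Generate a spatial mask for a feature map and reweight it."""
    def __init__(self, in_channels):
        super().__init__()
        # a 1x1 convolution collapses the channels into one attention map
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feat):                    # feat: (N, C, H, W)
        mask = torch.sigmoid(self.score(feat))  # (N, 1, H, W), values in (0, 1)
        return feat * mask                      # regions of interest get higher weight

feat = torch.randn(2, 256, 13, 13)              # a Conv5-like feature map
weighted = SpatialAttention(256)(feat)          # same shape, spatially reweighted
```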
This embodiment incorporates the spatial attention mechanism, applies attention to the cross-modal hash retrieval task, and, building on the traditional attention mechanism, proposes a new attention method, called the attention-aware mechanism, for weighting the two different modalities.
That is, the cross-modal hash retrieval method based on the attention-aware mechanism of this embodiment suppresses the noise and redundancy in the raw data while enhancing the regions of interest, and then extracts an attention matrix, which markedly improves the quality of the generated hash codes; it can be used for cross-modal information retrieval in a wide range of multi-modal data scenarios. As shown in Figure 2, the method comprises the following steps:
S1: performing feature extraction and attention feature extraction on the training set of the cross-modal dataset to obtain cross-modal features weighted by attention features;
S2: inputting the cross-modal features of the cross-modal data pairs in the training set into the hash learning model, and optimizing the hash learning model to minimize a loss function defined over the output cross-modal hash codes;
S3: according to the hash code of the query data obtained from the optimized hash learning model, screening, among the hash codes of data in the cross-modal dataset whose modality differs from that of the query data, the modal data that satisfy the retrieval request.
In step S1, the cross-modal dataset contains data of multiple modalities. This embodiment takes image-modality and text-modality data as an example; it will be appreciated that the modality types can be extended to others, such as video and speech.
The cross-modal dataset is divided into a training set and a test set, and two parallel convolutional neural networks perform feature extraction and attention feature extraction simultaneously on the image-text pairs of the training set. Specifically: an initial attention matrix is obtained, the convolutional neural network is trained to minimize a loss function, and the improved attention matrix is output; the attention matrix is then multiplied element-wise with the feature matrix output by the convolutional neural network to obtain the cross-modal features weighted by attention features.
Image feature extraction and image attention feature extraction are performed on the images of the training set as follows:
S1-1: the image feature extraction process uses the convolutional neural network CNN_F as the backbone and outputs the image feature matrix at the fifth convolutional layer, Conv5.
S1-2: the image attention feature extraction process comprises: (1) introducing an attention layer between the fifth convolutional layer and the fully connected layers by modifying the residual network ResNet-50, as shown in Figure 3: the fully connected layer is replaced with a new convolutional layer Conv6 and a max-pooling layer. Conv6 is introduced to ensure that the final attention map has the same size as the image feature matrix output by the Conv5 layer in the image feature extraction process. The modified ResNet-50 extracts the initial attention matrix O, and the network is pre-trained with a cross-entropy loss function.
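A minimal PyTorch sketch of this modification follows; the 1x1 kernel for Conv6, the use of adaptive max pooling, and the 13x13 target size are assumptions made for illustration, since the text only states that Conv6 and max pooling replace the fully connected layer and that the output must match the Conv5 feature-map size.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class AttentionBackbone(nn.Module):
    """ResNet-50 with its fully connected head replaced by Conv6 + max pooling,
    so the output O carries one score map per category."""
    def __init__(self, num_classes):
        super().__init__()
        base = resnet50(weights=None)
        # keep everything up to the last residual stage (drop avgpool and fc)
        self.features = nn.Sequential(*list(base.children())[:-2])
        # Conv6: map the 2048 backbone channels to num_classes score maps
        self.conv6 = nn.Conv2d(2048, num_classes, kernel_size=1)
        # pool so the attention map matches the Conv5 feature-map size
        self.pool = nn.AdaptiveMaxPool2d(13)

    def forward(self, x):                  # x: (N, 3, H, W)
        o = self.conv6(self.features(x))   # (N, num_classes, h, w)
        return self.pool(o)                # initial attention matrix O

model = AttentionBackbone(num_classes=24)
O = model(torch.randn(2, 3, 448, 448))     # (2, 24, 13, 13)
```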
(2) refining the initial attention matrix:
O′_ir = sigmoid(max_k(O_ijk)),
where O′_ir is the attention weight of the r-th region of image I_i, and O_ijk is the value, at the same position, of the k-th of the N_c categories in the output O of the pre-trained network.
The final attention matrix is obtained from O′_i using a computable threshold μ_i, which is computed as follows:
The attention values of the different regions of the image are sorted in ascending order, and it is assumed that about p% (0 < p < 100) of the regions of an image are redundant while the remaining part (about 1 − p%) consists of regions of interest; μ_i is then set to the activation value at the position of the sorted O′_i corresponding to the p% quantile, where Nr = n × n denotes the number of regions.
(3) expanding the final attention matrix along the channel dimension to obtain a new weight matrix, which is then multiplied element-wise with the image feature matrix output by the Conv5 layer, yielding the image features weighted by image attention features.
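The following sketch combines steps (2) and (3); the exact quantile index and the binarization of the thresholded mask are assumptions of this illustration, since the corresponding formulas are not reproduced in the text above.

```python
import torch

def image_attention_mask(O, p=50.0):
    """O: (N, Nc, n, n) category score maps from the pre-trained attention network.
    Returns a (N, 1, n, n) mask: 1 for regions of interest, 0 for redundant regions."""
    O_prime = torch.sigmoid(O.max(dim=1, keepdim=True).values)  # max over the Nc categories
    flat = O_prime.flatten(1)                                   # (N, Nr) with Nr = n*n
    k = max(int(flat.size(1) * p / 100.0), 1)                   # p% quantile position
    mu = flat.sort(dim=1).values[:, k - 1].view(-1, 1, 1, 1)    # per-image threshold mu_i
    return (O_prime >= mu).float()

O = torch.randn(2, 24, 13, 13)                 # 24 categories, 13x13 regions
mask = image_attention_mask(O)                 # (2, 1, 13, 13)
conv5_feat = torch.randn(2, 256, 13, 13)       # Conv5 image feature matrix
weighted = conv5_feat * mask                   # broadcasting expands the mask over channels
```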
Text feature extraction and text attention feature extraction are performed on the texts of the training set as follows:
S1-3: the text feature extraction process uses two fully connected layers to obtain the text features.
S1-4: the text attention feature extraction process comprises: (1) introducing an attention layer before the first fully connected layer Fc1. A neural network without hidden layers, i.e., a two-layer non-linear classification network, learns the mapping W between each annotation of the input text representation and its corresponding class, as shown in Figure 4. W serves as the initial attention matrix, and a least-squares error loss guides the training of this classification network.
(2) refining the initial attention matrix:
W_ij is normalized with the SoftMax function, and the contribution of text y_i to the different classes is assumed to follow a distribution F_i(·):
F_i(l_j) = W′_ij,
where l_j is the label information corresponding to the j-th sample.
The information entropy E_i corresponding to each annotation is then computed, and the attention value is set to
W″_i = −E_i.
The final attention matrix is then obtained by thresholding W″.
Here v is a computable threshold, computed as follows:
The entries of the attention matrix W″_i are sorted in ascending order, and v is set to the value at the corresponding position of the sorted sequence, where Nt denotes the number of distinct labels in the text annotation set.
(3) multiplying the original text features by the text attention map to obtain the text features weighted by text attention features; the original text features use a BoW (bag-of-words) representation, although other forms such as Word2Vec are also possible.
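A corresponding sketch for the text branch, under the assumptions that E_i is the Shannon entropy of the normalized contribution distribution and that v cuts at a fixed quantile q; both details are assumptions of this illustration rather than formulas from the patent.

```python
import torch
import torch.nn.functional as F

def text_attention_weights(W, q=50.0, eps=1e-12):
    """W: (Nt, Nc) mapping from each of the Nt annotation words to the Nc classes.
    Returns a (Nt,) attention weight per annotation word."""
    P = F.softmax(W, dim=1)                   # F_i(l_j) = W'_ij
    E = -(P * (P + eps).log()).sum(dim=1)     # Shannon entropy of each word's distribution
    W2 = -E                                   # W''_i = -E_i: low entropy => informative word
    k = max(int(W2.numel() * q / 100.0), 1)
    v = W2.sort().values[k - 1]               # threshold v at the q% position
    return (W2 >= v).float()

W = torch.randn(1386, 24)                     # e.g. 1386 annotation words, 24 classes
att = text_attention_weights(W)               # (1386,)
bow = torch.rand(8, 1386)                     # a batch of BoW text vectors
weighted_text = bow * att                     # broadcast over the batch
```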
In step S2, the image features and text features are input into the hash learning network model, the sign function is applied to obtain the binary hash codes, and a global objective function is constructed with the goal of minimizing the loss function. In this objective,
n is the number of samples in the sample set; B_x is the binary hash code of the image modality and B_y the binary hash code of the text modality, and B = B_x = B_y = sign(γ(F + G)); W_x and W_y are the initial attention matrices of the image-modality and text-modality data; F_* = f_x(x_i, θ_x), where θ_x denotes the image-network parameters and F is the output of the image network; G_* = f_y(y_i, θ_y), where θ_y denotes the text-network parameters and G is the output of the text network; γ and η are hyper-parameters; and the similarity matrix S is defined such that, for two different samples i and j, S_ij is set to 1 if the labels of the two samples share at least one class, and to 0 otherwise.
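For illustration, the similarity matrix S can be computed directly from a multi-hot label matrix; this short sketch is an aid to the definition above, with variable names chosen here rather than taken from the patent.

```python
import torch

def similarity_matrix(labels):
    """labels: (n, Nc) multi-hot label matrix.
    S_ij = 1 iff samples i and j share at least one class."""
    L = labels.float()
    return ((L @ L.t()) > 0).float()

L = torch.randint(0, 2, (5, 24))   # 5 samples, 24 classes
S = similarity_matrix(L)           # (5, 5), symmetric
```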
In this embodiment, the first term of the global objective function is a negative log-likelihood loss and the second term a quantization loss. Since the similarity relation between samples is derived from the label information L, this embodiment proposes a third loss, a semantic-preservation loss function, to make fuller use of the sample supervision information.
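The objective formula itself is not reproduced in this text. A DCMH-style form consistent with the three terms described above would be as follows; the definition of Θ_ij and the exact shape of the semantic-preservation term L_sem are assumptions of this sketch.

```latex
\min_{B,\,\theta_x,\,\theta_y} J =
  -\sum_{i,j=1}^{n}\Bigl(S_{ij}\,\Theta_{ij} - \log\bigl(1 + e^{\Theta_{ij}}\bigr)\Bigr)
  + \gamma\Bigl(\lVert B - F\rVert_F^{2} + \lVert B - G\rVert_F^{2}\Bigr)
  + \eta\,\mathcal{L}_{\mathrm{sem}}(F, G, L),
\quad
\Theta_{ij} = \tfrac{1}{2}\,F_{*i}^{\top} G_{*j},
\quad
B = \operatorname{sign}\bigl(\gamma\,(F + G)\bigr).
```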
In step S2, the hash learning model is optimized with the goal of minimizing the loss function; the variables to be optimized are B, F, G, W_x, and W_y. This embodiment minimizes the loss function by iterative optimization, that is, only one variable is optimized at a time while the others remain fixed. The optimization strategy is as follows:
S2-1: fix the variables B, G, W_x, W_y and update the variable F:
For each sample point x_i, F_* is optimized by stochastic gradient descent; the chain rule is used to compute the gradient, and the parameters θ_x of the image network are updated by back-propagation.
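The gradient expression is likewise not reproduced here; under the DCMH-style objective sketched above it would take roughly the following form (an assumed reconstruction, not the patent's verbatim formula):

```latex
\frac{\partial J}{\partial F_{*i}} =
  \frac{1}{2}\sum_{j=1}^{n}\bigl(\sigma(\Theta_{ij}) - S_{ij}\bigr)\,G_{*j}
  + 2\gamma\,\bigl(F_{*i} - B_{*i}\bigr),
\qquad
\frac{\partial J}{\partial \theta_x}
  = \frac{\partial J}{\partial F_{*i}}\,
    \frac{\partial F_{*i}}{\partial \theta_x}.
```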
S2-2: fix the variables B, F, G, W_y and update the variable W_x:
W_x is updated by stochastic gradient descent.
S2-3: fix the variables B, F, W_x, W_y and update the variable G:
Similarly to the update of F, for each sample point y_j the gradient with respect to the variable G is computed first; the chain rule is then used to propagate it, and the parameters θ_y are updated.
S2-4: fix the variables B, F, G, W_x and update the variable W_y, again by stochastic gradient descent.
S2-5: fix the variables F, G, W_x, W_y and update the variable B, i.e.:
B = sign(V),
where V = γ(F + G).
In step S3, after the hash learning model has been optimized, the corresponding hash codes of all samples in the cross-modal dataset are computed with the optimized hash learning model.
When a retrieval task is performed, the query data are input into the model to obtain the corresponding hash code; among the hash codes of data in the cross-modal dataset whose modality differs from that of the query data, the N hash codes with the smallest Hamming distance are retrieved, thereby screening out the cross-modal data that satisfy the retrieval request.
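A minimal sketch of this Hamming-ranking step for ±1 hash codes; the database size, code length, and N are illustrative values.

```python
import torch

def hamming_topn(query_code, db_codes, n=10):
    """query_code: (K,) with entries in {-1, +1}; db_codes: (M, K) likewise.
    Returns the indices of the n database codes nearest in Hamming distance."""
    # for +/-1 codes, Hamming distance = (K - <q, b>) / 2
    dist = (db_codes.size(1) - db_codes @ query_code) / 2
    return dist.topk(n, largest=False).indices

db = torch.sign(torch.randn(1000, 64))   # e.g. 1000 text-modality codes of 64 bits
q = torch.sign(torch.randn(64))          # hash code of an image query
neighbors = hamming_topn(q, db, n=10)    # indices of the 10 nearest text samples
```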
Embodiment 2
As shown in Figure 5, this embodiment provides a cross-modal hash retrieval system based on an attention-aware mechanism, comprising:
a feature extraction module configured to perform feature extraction and attention feature extraction on the training set of a cross-modal dataset to obtain cross-modal features weighted by attention features;
a hash learning module configured to input the cross-modal features of the cross-modal data pairs in the training set into a hash learning model, and to optimize the hash learning model to minimize a loss function defined over the output cross-modal hash codes; and
a retrieval module configured to screen, according to the hash code of the query data obtained from the optimized hash learning model, among the hash codes of data in the cross-modal dataset whose modality differs from that of the query data, the modal data that satisfy the retrieval request.
It should be noted here that the above modules correspond to steps S1 to S3 of Embodiment 1, and the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to the disclosure of Embodiment 1. It should also be noted that, as part of the system, the above modules may be executed in a computer system such as a set of computer-executable instructions.
In this embodiment, the feature extraction module receives images and texts, and feature learning and hash-code learning are performed on the image data and text data simultaneously. The image feature extraction network includes an image attention feature extraction module, and the text feature extraction network includes a text attention feature extraction module. Finally, the attention-weighted features are input into the hash learning module to guide the generation of hash codes and improve their quality, making the system suitable for cross-modal retrieval tasks in a wide range of multi-modal data scenarios.
Further embodiments also provide:
An electronic device comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when run by the processor, carry out the method described in Embodiment 1. For brevity, details are not repeated here.
It should be understood that, in this embodiment, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory may include read-only memory and random-access memory and provides instructions and data to the processor; a portion of the memory may also include non-volatile random-access memory. For example, the memory may also store information on the device type.
A computer-readable storage medium for storing computer instructions, wherein the computer instructions, when executed by a processor, carry out the method described in Embodiment 1.
The method of Embodiment 1 may be embodied directly as being carried out by a hardware processor, or by a combination of hardware and software modules in a processor. The software modules may reside in random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media mature in the art. The storage medium is located in the memory, and the processor reads the information in the memory and carries out the steps of the above method in combination with its hardware. To avoid repetition, details are not described here.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in this embodiment can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
The above are only preferred embodiments of the present invention and are not intended to limit it; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the protection scope of the present invention. Those skilled in the art should understand that, on the basis of the technical solutions of the present invention, various modifications or variations that can be made without creative effort still fall within the protection scope of the present invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010408302.8A CN111639240B (en) | 2020-05-14 | 2020-05-14 | Cross-modal Hash retrieval method and system based on attention awareness mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010408302.8A CN111639240B (en) | 2020-05-14 | 2020-05-14 | Cross-modal Hash retrieval method and system based on attention awareness mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111639240A CN111639240A (en) | 2020-09-08 |
CN111639240B true CN111639240B (en) | 2021-04-09 |
Family
ID=72331952
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010408302.8A Active CN111639240B (en) | 2020-05-14 | 2020-05-14 | Cross-modal Hash retrieval method and system based on attention awareness mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111639240B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112199375B (en) * | 2020-09-30 | 2024-03-01 | 三维通信股份有限公司 | Cross-modal data processing method and device, storage medium and electronic device |
CN112364198B (en) * | 2020-11-17 | 2023-06-30 | 深圳大学 | A cross-modal hash retrieval method, terminal device and storage medium |
CN112329439B (en) * | 2020-11-18 | 2021-11-19 | 北京工商大学 | Food safety event detection method and system based on graph convolution neural network model |
CN112287159B (en) * | 2020-12-18 | 2021-04-09 | 北京世纪好未来教育科技有限公司 | Retrieval method, electronic device and computer readable medium |
CN112598067A (en) * | 2020-12-25 | 2021-04-02 | 中国联合网络通信集团有限公司 | Emotion classification method and device for event, electronic equipment and storage medium |
CN112817914A (en) * | 2021-01-21 | 2021-05-18 | 深圳大学 | Attention-based deep cross-modal Hash retrieval method and device and related equipment |
CN112734625B (en) * | 2021-01-29 | 2022-06-07 | 成都视海芯图微电子有限公司 | Hardware acceleration system and method based on 3D scene design |
CN112862727B (en) * | 2021-03-16 | 2023-06-23 | 上海壁仞智能科技有限公司 | Cross-modal image conversion method and device |
CN113095415B (en) * | 2021-04-15 | 2022-06-14 | 齐鲁工业大学 | A cross-modal hashing method and system based on multimodal attention mechanism |
CN113032614A (en) * | 2021-04-28 | 2021-06-25 | 泰康保险集团股份有限公司 | Cross-modal information retrieval method and device |
CN113220919B (en) * | 2021-05-17 | 2022-04-22 | 河海大学 | A cross-modal retrieval method and model for dam defect image text |
CN113343014A (en) * | 2021-05-25 | 2021-09-03 | 武汉理工大学 | Cross-modal image audio retrieval method based on deep heterogeneous correlation learning |
CN113239237B (en) * | 2021-07-13 | 2021-11-30 | 北京邮电大学 | Cross-media big data searching method and device |
CN114090801B (en) * | 2021-10-19 | 2024-07-19 | 山东师范大学 | Deep countering attention cross-modal hash retrieval method and system |
CN114817606B (en) * | 2022-03-07 | 2025-01-28 | 齐鲁工业大学(山东省科学院) | Image-text retrieval method and system based on cross-attention hashing network |
CN116776157B (en) * | 2023-08-17 | 2023-12-12 | 鹏城实验室 | Model learning method supporting modal increase and device thereof |
CN117194740B (en) * | 2023-11-08 | 2024-01-30 | 武汉大学 | Geographic information retrieval intent update method and system based on guided iterative feedback |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107885764A (en) * | 2017-09-21 | 2018-04-06 | 银江股份有限公司 | Based on the quick Hash vehicle retrieval method of multitask deep learning |
CN108170755A (en) * | 2017-12-22 | 2018-06-15 | 西安电子科技大学 | Cross-module state Hash search method based on triple depth network |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104346440B (en) * | 2014-10-10 | 2017-06-23 | 浙江大学 | A kind of across media hash indexing methods based on neutral net |
CN107562812B (en) * | 2017-08-11 | 2021-01-15 | 北京大学 | Cross-modal similarity learning method based on specific modal semantic space modeling |
US11062179B2 (en) * | 2017-11-02 | 2021-07-13 | Royal Bank Of Canada | Method and device for generative adversarial network training |
US10248664B1 (en) * | 2018-07-02 | 2019-04-02 | Inception Institute Of Artificial Intelligence | Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval |
US11556581B2 (en) * | 2018-09-04 | 2023-01-17 | Inception Institute of Artificial Intelligence, Ltd. | Sketch-based image retrieval techniques using generative domain migration hashing |
CN109992686A (en) * | 2019-02-24 | 2019-07-09 | 复旦大学 | Image-text retrieval system and method based on multi-angle self-attention mechanism |
CN109960732B (en) * | 2019-03-29 | 2023-04-18 | 广东石油化工学院 | Deep discrete hash cross-modal retrieval method and system based on robust supervision |
CN110222140B (en) * | 2019-04-22 | 2021-07-13 | 中国科学院信息工程研究所 | A cross-modal retrieval method based on adversarial learning and asymmetric hashing |
CN110472642B (en) * | 2019-08-19 | 2022-02-01 | 齐鲁工业大学 | Fine-grained image description method and system based on multi-level attention |
CN111125457A (en) * | 2019-12-13 | 2020-05-08 | 山东浪潮人工智能研究院有限公司 | A deep cross-modal hash retrieval method and device |
- 2020-05-14: application CN202010408302.8A (CN) filed; granted as patent CN111639240B, status active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107885764A (en) * | 2017-09-21 | 2018-04-06 | 银江股份有限公司 | Based on the quick Hash vehicle retrieval method of multitask deep learning |
CN108170755A (en) * | 2017-12-22 | 2018-06-15 | 西安电子科技大学 | Cross-module state Hash search method based on triple depth network |
Also Published As
Publication number | Publication date |
---|---|
CN111639240A (en) | 2020-09-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111639240B (en) | Cross-modal Hash retrieval method and system based on attention awareness mechanism | |
CN118277538B (en) | Legal intelligent question-answering method based on retrieval enhancement language model | |
CN111027595B (en) | Two-stage semantic word vector generation method | |
CN111191002B (en) | Neural code searching method and device based on hierarchical embedding | |
CN112560432A (en) | Text emotion analysis method based on graph attention network | |
WO2023160472A1 (en) | Model training method and related device | |
CN115203442B (en) | Cross-modal deep hash retrieval method, system and medium based on joint attention | |
CN110297931A (en) | A kind of image search method | |
CN112232087A (en) | An Aspect-Specific Sentiment Analysis Approach for Transformer-Based Multi-granularity Attention Models | |
CN107665248A (en) | File classification method and device based on deep learning mixed model | |
CN118312833A (en) | Hierarchical multi-label classification method and system for travel resources | |
CN114821050A (en) | Named image segmentation method based on transformer | |
CN118411572A (en) | Small sample image classification method and system based on multi-mode multi-level feature aggregation | |
CN115098646A (en) | A multi-level relationship analysis and mining method for graphic data | |
CN114780767A (en) | A large-scale image retrieval method and system based on deep convolutional neural network | |
CN113157919A (en) | Sentence text aspect level emotion classification method and system | |
CN115794871A (en) | Table question-answer processing method based on Tapas model and graph attention network | |
CN115422369A (en) | Knowledge graph completion method and device based on improved TextRank | |
CN115205640A (en) | A multi-level image-text fusion method and system for rumor detection | |
CN114881172A (en) | Software vulnerability automatic classification method based on weighted word vector and neural network | |
CN120011558A (en) | Text classification method based on pre-training language model fusion deep convolutional network | |
CN112784017B (en) | Archive cross-modal data feature fusion method based on main affinity expression | |
CN111930972B (en) | Method and system for cross-modal retrieval of multimedia data using tag level information | |
CN118484529A (en) | A contract risk detection method and device based on large language model | |
CN114925211A (en) | Fact verification method for tabular data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |