CN113792207A - Cross-modal retrieval method based on multi-level feature representation alignment - Google Patents
Cross-modal retrieval method based on multi-level feature representation alignment Download PDFInfo
- Publication number
- CN113792207A CN113792207A CN202111149240.4A CN202111149240A CN113792207A CN 113792207 A CN113792207 A CN 113792207A CN 202111149240 A CN202111149240 A CN 202111149240A CN 113792207 A CN113792207 A CN 113792207A
- Authority
- CN
- China
- Prior art keywords
- text
- image
- data
- target
- formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 63
- 230000006870 function Effects 0.000 claims abstract description 51
- 238000012549 training Methods 0.000 claims abstract description 47
- 238000013528 artificial neural network Methods 0.000 claims abstract description 27
- 230000000007 visual effect Effects 0.000 claims description 75
- 238000013507 mapping Methods 0.000 claims description 20
- 239000013598 vector Substances 0.000 claims description 19
- 230000007246 mechanism Effects 0.000 claims description 18
- 238000004364 calculation method Methods 0.000 claims description 16
- 239000000284 extract Substances 0.000 claims description 11
- 238000003062 neural network model Methods 0.000 claims description 11
- 230000004913 activation Effects 0.000 claims description 6
- 238000013527 convolutional neural network Methods 0.000 claims description 6
- 210000002569 neuron Anatomy 0.000 claims description 6
- NAWXUBYGYWOOIX-SFHVURJKSA-N (2s)-2-[[4-[2-(2,4-diaminoquinazolin-6-yl)ethyl]benzoyl]amino]-4-methylidenepentanedioic acid Chemical compound C1=CC2=NC(N)=NC(N)=C2C=C1CCC1=CC=C(C(=O)N[C@@H](CC(=C)C(O)=O)C(O)=O)C=C1 NAWXUBYGYWOOIX-SFHVURJKSA-N 0.000 claims description 5
- 238000004422 calculation algorithm Methods 0.000 claims description 4
- 230000000306 recurrent effect Effects 0.000 claims description 3
- 239000000758 substrate Substances 0.000 claims 2
- 238000005070 sampling Methods 0.000 claims 1
- 238000012360 testing method Methods 0.000 abstract description 7
- 238000003909 pattern recognition Methods 0.000 abstract description 2
- 238000010586 diagram Methods 0.000 description 13
- 238000012545 processing Methods 0.000 description 12
- 238000004891 communication Methods 0.000 description 10
- 238000005516 engineering process Methods 0.000 description 8
- 230000003993 interaction Effects 0.000 description 4
- 230000003287 optical effect Effects 0.000 description 4
- 230000005236 sound signal Effects 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 3
- 238000007726 management method Methods 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 230000001133 acceleration Effects 0.000 description 2
- 230000009471 action Effects 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000010219 correlation analysis Methods 0.000 description 2
- 238000013135 deep learning Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- FKOQWAUFKGFWLH-UHFFFAOYSA-M 3,6-bis[2-(1-methylpyridin-1-ium-4-yl)ethenyl]-9h-carbazole;diiodide Chemical compound [I-].[I-].C1=C[N+](C)=CC=C1C=CC1=CC=C(NC=2C3=CC(C=CC=4C=C[N+](C)=CC=4)=CC=2)C3=C1 FKOQWAUFKGFWLH-UHFFFAOYSA-M 0.000 description 1
- 230000006978 adaptation Effects 0.000 description 1
- 238000012098 association analyses Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000005065 mining Methods 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
本发明公开了一种基于多层次特征表示对齐的跨模态检索方法,涉及跨模态检索技术领域。本发明通过在跨模态细粒度精确对齐阶段,分别计算图像和文本两种不同模态数据之间的全局相似度、局部相似度和关系相似度,并融合得到图像‑文本综合相似度,在神经网络训练阶段,设计相应损失函数,挖掘跨模态结构约束信息,从多个角度约束和监督检索模型的参数学习,最后根据图像‑文本综合相似度获取测试查询样例的检索结果,从而通过引入图像和文本两种不同模态数据之间的细粒度关联关系,有效提高跨模态检索的准确率,在图文检索、模式识别等领域具有广泛的市场需求和应用前景。
The invention discloses a cross-modal retrieval method based on multi-level feature representation alignment, and relates to the technical field of cross-modal retrieval. The present invention calculates the global similarity, local similarity and relationship similarity between two different modal data of image and text respectively in the stage of cross-modal fine-grained and accurate alignment, and fuses to obtain the comprehensive similarity of image-text, in the In the neural network training phase, the corresponding loss function is designed, the cross-modal structure constraint information is mined, the parameter learning of the retrieval model is constrained and supervised from multiple perspectives, and finally the retrieval results of the test query samples are obtained according to the comprehensive similarity of the image-text, so as to pass Introducing the fine-grained correlation between two different modal data, image and text, effectively improves the accuracy of cross-modal retrieval, and has broad market demand and application prospects in image retrieval, pattern recognition and other fields.
Description
技术领域technical field
本发明涉及跨模态检索技术领域,特别涉及一种基于多层次特征表示对齐的跨模态检索方法。The invention relates to the technical field of cross-modality retrieval, in particular to a cross-modality retrieval method based on alignment of multi-level feature representations.
背景技术Background technique
随着移动互联网、社交网络等新一代互联网技术的快速发展,文本、图像、视频等多模态数据呈现爆炸式增长。跨模态检索技术旨在通过挖掘和利用不同模态数据之间的关联信息,实现不同模态数据之间的跨越检索,其核心是实现跨模态数据之间的相似度度量。近年来,跨模态检索技术已成为国内外研究热点,受到学术界和工业界的广泛关注,是跨模态智能的重要研究领域之一,也是信息检索领域未来发展的重要方向。With the rapid development of new-generation Internet technologies such as mobile Internet and social networks, multi-modal data such as text, images, and videos have exploded. The cross-modal retrieval technology aims to realize the cross-retrieval between different modal data by mining and utilizing the correlation information between the different modal data, and its core is to realize the similarity measurement between the cross-modal data. In recent years, cross-modal retrieval technology has become a research hotspot at home and abroad, and has received extensive attention from academia and industry. It is one of the important research fields of cross-modal intelligence and an important direction for the future development of information retrieval.
跨模态检索同时涉及多种模态的数据,这些数据之间存在“异构鸿沟”,即它们在高层语义上相互关联,但在底层特征上呈现异构性,因此需要检索算法能够深入挖掘不同模态数据之间的关联信息,实现一种模态数据到另一种模态数据的对齐。Cross-modal retrieval involves data from multiple modalities at the same time. There is a "heterogeneous gap" between these data, that is, they are related to each other in high-level semantics, but they are heterogeneous in underlying features, so the retrieval algorithm needs to be able to dig deeper. The association information between different modal data, to realize the alignment of one modal data to another.
目前,子空间学习方法是跨模态检索的主流方法,该类方法又可细分为基于传统统计相关性分析的检索模型和基于深度学习的检索模型。其中,基于传统统计相关性分析的跨模态检索方法通过线性映射矩阵将不同模态数据映射到子空间,最大化不同模态数据之间的相关性。基于深度学习的跨模态检索方法利用深度神经网络的特征抽取能力抽取不同模态数据的有效表示,同时利用神经网络的复杂非线性映射能力挖掘跨模态数据之间复杂关联特性。At present, the subspace learning method is the mainstream method of cross-modal retrieval, which can be subdivided into retrieval models based on traditional statistical correlation analysis and retrieval models based on deep learning. Among them, the cross-modal retrieval method based on traditional statistical correlation analysis maps different modal data to subspaces through a linear mapping matrix to maximize the correlation between different modal data. The deep learning-based cross-modal retrieval method uses the feature extraction ability of deep neural network to extract effective representations of different modal data, and uses the complex nonlinear mapping ability of neural network to mine complex correlation characteristics between cross-modal data.
在实现本发明的过程中,申请人发现现有技术存在以下技术问题:In the process of realizing the present invention, the applicant found that the prior art has the following technical problems:
现有技术提供的跨模态检索方法注重图像和文本的全局特征和局部特征的表示学习、关联分析和对齐,但缺乏视觉目标之间关系的推理和关系信息的对齐,且无法全面有效利用训练数据蕴含的结构约束信息监督模型进行训练,导致跨模态检索方法对图像和文本的跨模态检索精确度较低。The cross-modal retrieval methods provided by the prior art focus on the representation learning, association analysis and alignment of global and local features of images and texts, but lack reasoning about the relationship between visual objects and alignment of relationship information, and cannot fully and effectively utilize training. The structural constraint information contained in the data supervises the training of the model, resulting in low cross-modal retrieval accuracy of images and texts for cross-modal retrieval methods.
发明内容SUMMARY OF THE INVENTION
为了解决现有技术存在的上述问题,本发明提供了一种基于多层次特征表示对齐的跨模态检索方法,通过跨模态多层次表示关联,准确衡量图像和文本之间的相似度,有效提供检索准确率,从而解决现有跨模态检索方法表示不够精细、跨模态关联不够充分的技术问题,同时,利用跨模态结构约束信息监督检索模型的训练。本发明的技术方案如下:In order to solve the above problems existing in the prior art, the present invention provides a cross-modal retrieval method based on multi-level feature representation alignment. Provide retrieval accuracy to solve the technical problems of insufficient representation of existing cross-modal retrieval methods and insufficient cross-modal correlation, and at the same time, use cross-modal structural constraint information to supervise the training of retrieval models. The technical scheme of the present invention is as follows:
根据本发明实施例的一个方面,提供一种基于多层次特征表示对齐的跨模态检索方法,其特征在于,所述方法包括:According to an aspect of the embodiments of the present invention, a cross-modal retrieval method based on multi-level feature representation alignment is provided, wherein the method includes:
获取训练数据集,对于所述训练数据集中的每组数据对,所述数据对包括图像数据、文本数据,以及所述图像数据与所述文本数据共同对应的语义标签;Obtaining a training data set, for each group of data pairs in the training data set, the data pairs include image data, text data, and a semantic label corresponding to the image data and the text data in common;
对于所述训练数据集中的每组数据对,分别提取所述数据对中图像数据对应的图像全局特征、图像局部特征和图像关系特征,以及所述数据对中文本数据对应的文本全局特征、文本局部特征和文本关系特征;For each group of data pairs in the training data set, extract the image global features, image local features and image relationship features corresponding to the image data in the data pair, and text global features, textual features corresponding to the text data in the data pair, respectively. Local features and text relationship features;
对于所述训练数据集中任一图像数据与任一文本数据组成的目标数据对,根据所述目标数据对对应的图像全局特征和文本全局特征、所述目标数据对对应的图像局部特征和文本局部特征、所述目标数据对对应的图像关系特征和文本关系特征计算得到所述目标数据对对应的图像-文本综合相似度;For a target data pair composed of any image data and any text data in the training data set, according to the image global feature and text global feature corresponding to the target data pair, the image local feature and text local feature corresponding to the target data pair The image-text comprehensive similarity corresponding to the target data pair is obtained by calculating the corresponding image relationship feature and the text relationship feature of the feature, the target data pair;
基于各组目标数据对对应的图像-文本综合相似度,设计模态间结构约束损失函数和模态内结构约束损失函数,并采用所述模态间结构约束损失函数和所述模态内结构约束损失函数对模型进行训练。Based on the image-text comprehensive similarity corresponding to each set of target data pairs, an inter-modal structural constraint loss function and an intra-modal structural constraint loss function are designed, and the inter-modal structural constraint loss function and the intra-modal structural constraint loss function are used. The constrained loss function trains the model.
在一个优选的实施例中,所述对于所述训练数据集中的每组数据对,分别提取所述数据对中图像数据对应的图像全局特征、图像局部特征和图像关系特征,以及所述数据对中文本数据对应的文本全局特征、文本局部特征和文本关系特征的步骤,包括:In a preferred embodiment, for each group of data pairs in the training data set, image global features, image local features and image relationship features corresponding to the image data in the data pairs are extracted respectively, and the data pairs The steps of text global features, text local features and text relationship features corresponding to Chinese text data include:
对于所述训练数据集中的每组数据对,采用卷积神经网络CNN提取所述数据对所 对应图像数据的图像全局特征,然后采用视觉目标检测器检测所述图像数据包括的视觉 目标并提取每个视觉目标的图像局部特征,其中,M为所述图像数据包括的视 觉目标数量,为视觉目标的特征向量,再通过图像视觉关系编码网络提取各个视觉目 标之间的图像关系特征,其中,为视觉目标和视觉目标之间的图像关系特 征; For each group of data pairs in the training data set, the convolutional neural network CNN is used to extract the image global features of the image data corresponding to the data pairs , and then use a visual target detector to detect the visual targets included in the image data and extract the image local features of each visual target , where M is the number of visual objects included in the image data, for visual target feature vector, and then extract the image relationship features between each visual object through the image visual relationship coding network ,in, for visual target and visual target The image relationship features between;
对于所述训练数据集中的每组数据对,采用词嵌入模型将所述数据对所对应文本 数据中的每个词转换为词向量,其中,N为所述文本数据包括的词数量,然后将各个词 向量依次输入至递归神经网络,获得所述文本数据对应的文本全局特征,再将各个词向 量输入至前馈神经网络获得各个词对应的文本局部特征,同时将各个词向量 输入至文本关系编码网络提取各个词之间的文本关系特征,其中,为词和词 之间的文本关系特征。 For each set of data pairs in the training data set, a word embedding model is used to convert each word in the text data corresponding to the data pair into a word vector , where N is the number of words included in the text data, and then input each word vector into the recurrent neural network in turn to obtain the text global features corresponding to the text data , and then input each word vector into the feedforward neural network to obtain the local text features corresponding to each word , and input each word vector into the text relation coding network to extract the text relation features between each word ,in, for word and word textual relationship between.
在一个优选的实施例中,所述对于所述训练数据集中任一图像数据与任一文本数据组成的目标数据对,根据所述目标数据对对应的图像全局特征和文本全局特征、所述目标数据对对应的图像局部特征和文本局部特征、所述目标数据对对应的图像关系特征和文本关系特征计算得到所述目标数据对对应的图像-文本综合相似度的步骤,包括:In a preferred embodiment, for the target data pair composed of any image data and any text data in the training data set, according to the image global feature and text global feature corresponding to the target data pair, the target The step of calculating the image-text comprehensive similarity corresponding to the target data pair by calculating the corresponding image local feature and text local feature, the target data pair corresponding image relationship feature and text relationship feature, including:
对于所述训练数据集中任一图像数据与任一文本数据组成的目标数据对,基于所 述目标数据对中图像数据对应的图像全局特征和文本数据对应的文本全局特征的 余弦距离,计算得到所述目标数据对对应的图像-文本全局相似度;其中,图像-文本 全局相似度的计算公式如公式(1): For a target data pair composed of any image data and any text data in the training data set, based on the image global feature corresponding to the image data in the target data pair Text global features corresponding to text data The cosine distance is calculated to obtain the image-text global similarity corresponding to the target data pair ; where, image-text global similarity The calculation formula is as formula (1):
) 公式(1) ) Formula 1)
采用文本引导注意力机制计算所述目标数据对中图像数据包括的每个视觉目标 的权重,将各个视觉目标的图像局部特征进行对应权重加权后,经前馈神经网络映射获 得新的图像局部表示,然后采用视觉引导注意力机制计算所述目标数据对中文本数据 包括的每个词的权重,将各个词的文本局部特征进行对应权重加权后,经前馈神经网络 映射得到新的文本局部表示,根据各个图像局部表示和各个文本局部表示计算所 有视觉目标和词的余弦相似度,并以其均值计算得到所述目标数据对对应的图像-文本局 部相似度;其中,图像-文本局部相似度的计算公式如公式(2),M为视觉目标数 量,N为词数量: The text-guided attention mechanism is used to calculate the weight of each visual target included in the image data in the target data pair, and the image local features of each visual target are calculated. After the corresponding weights are weighted, a new local representation of the image is obtained through the feedforward neural network mapping , and then the visual guidance attention mechanism is used to calculate the weight of each word included in the text data in the target data pair, and the text local features of each word are calculated. After the corresponding weights are weighted, the new local representation of the text is obtained through the feedforward neural network mapping. , according to the local representation of each image and individual text local representations Calculate the cosine similarity of all visual targets and words, and calculate the corresponding image-text local similarity of the target data pair with its mean ; where, the image-text local similarity The calculation formula is as formula (2), M is the number of visual objects, and N is the number of words:
公式(2) Formula (2)
根据所述目标数据对中各个图像关系特征和各个文本关系特征的余弦相似度均 值,计算得到所述目标数据对对应的图像-文本关系相似度;其中,图像-文本关系相 似度的计算公式如公式(3),P表示图像数据和文本数据的关系个数: According to the average cosine similarity of each image relationship feature and each text relationship feature in the target data pair, the image-text relationship similarity corresponding to the target data pair is calculated and obtained ; where, the image-text relationship similarity The calculation formula of , such as formula (3), P represents the number of relationships between image data and text data:
公式(3) Formula (3)
根据所述目标数据对对应的图像-文本全局相似度、图像-文本局部相似度、图像-文本关系相似度计算得到所述目标数据对对应的图像-文本综合相似 度;其中,图像-文本综合相似度的计算公式如公式(4): According to the target data pair corresponding image-text global similarity , image-text local similarity , image-text relationship similarity Calculate the image-text comprehensive similarity corresponding to the target data pair ; Among them, the image-text comprehensive similarity The calculation formula is as formula (4):
公式(4)。 Formula (4).
在一个优选的实施例中,所述模态间结构约束损失函数的计算公式如公式(5),其 中,B为样本数,为模型超参数,为匹配的目标数据对,和为非匹配 的目标数据对: In a preferred embodiment, the calculation formula of the structural constraint loss function between modes is as formula (5), where B is the number of samples, are model hyperparameters, is the matching target data pair, and For non-matching target data pairs:
公式(5) Formula (5)
所述模态内结构约束损失函数的计算公式如公式(6),其中,为图像三 元组,相比于,与具有更多共同语义标签,为文本三元组,相比于,与 具有更多共同语义标签:The calculation formula of the structural constraint loss function within the mode is as formula (6), wherein, is an image triple, compared to , and have more common semantic labels, is a text triple, compared to , and Has more common semantic labels:
公式(6)。 Formula (6).
在一个优选的实施例中,所述采用所述模态间结构约束损失函数和所述模态内结构约束损失函数对神经网络模型进行训练的步骤,包括:In a preferred embodiment, the step of using the inter-modal structural constraint loss function and the intra-modal structural constraint loss function to train the neural network model includes:
从所述训练数据集中随机采样获得匹配的目标数据对、非匹配的目标数据对、图像三元组、文本三元组,分别根据所述模态间结构约束损失函数计算模态间结构约束损失函数值,根据所述模态内结构约束损失函数计算模态内结构约束损失函数值,并按公式(7)进行融合,利用反向传播算法优化网络参数:The matching target data pairs, non-matching target data pairs, image triples, and text triples are randomly sampled from the training data set, and the inter-modal structural constraint loss is calculated according to the inter-modal structural constraint loss function respectively. function value, according to the structural constraint loss function in the mode, calculate the loss function value of the structure constraint in the mode, and fuse it according to formula (7), and use the back propagation algorithm to optimize the network parameters:
公式(7) Formula (7)
其中是超参数。 in are hyperparameters.
在一个优选的实施例中,所述通过图像视觉关系编码网络提取各个视觉目标之间 的图像关系特征的步骤,包括: In a preferred embodiment, the image relationship feature between each visual object is extracted through an image visual relationship coding network steps, including:
经图像视觉目标检测器获得图像中视觉目标和视觉目标的特征,,以及两 个目标联合区域的特征,采用公式(8)对上述各个特征进行融合,计算得到各个关系特 征: The visual target in the image is obtained by the image visual target detector and visual target Characteristics , , and the features of the two target joint regions , using formula (8) to fuse the above features, and calculate each relationship feature:
公式(8) Formula (8)
其中[]为向量拼接操作,为神经元激活函数,为模型参数。 Where [] is the vector concatenation operation, is the neuron activation function, are model parameters.
在一个优选的实施例中,所述将各个词向量输入至文本关系编码网络提取各个词 之间的文本关系特征的步骤,包括: In a preferred embodiment, the inputting each word vector into a text relational coding network extracts textual relational features between each word steps, including:
在文本关系编码网络中,采用公式(9)计算词和词之间的文本关系特征: In the text relational coding network, formula (9) is used to calculate the word and word textual relational features between :
公式(9) Formula (9)
其中,表示神经元激活函数,为模型参数。 in, represents the neuron activation function, are model parameters.
在一个优选的实施例中,所述采用文本引导注意力机制计算所述目标数据对中图 像数据包括的每个视觉目标的权重,将各个视觉目标的图像局部特征进行对应权重加 权后,经前馈神经网络映射获得新的图像局部表示的步骤,包括: In a preferred embodiment, the weight of each visual target included in the image data in the target data pair is calculated by using a text-guided attention mechanism, and the image local features of each visual target are combined After the corresponding weights are weighted, a new local representation of the image is obtained through the feedforward neural network mapping steps, including:
采用文本引导注意力机制,通过公式(10)计算图像中每个视觉目标的权重:Using the text-guided attention mechanism, the weight of each visual object in the image is calculated by formula (10):
公式(10) Formula (10)
其中,、为模型参数; in, , are model parameters;
通过公式(11)对每个视觉目标进行加权,并经过前馈神经网络映射获得新的图像 局部表示: Each visual object is weighted by formula (11), and a new image local representation is obtained through feedforward neural network mapping :
公式(11) Formula (11)
其中,为模型参数。 in, are model parameters.
在一个优选的实施例中,所述采用视觉引导注意力机制计算所述目标数据对中文 本数据包括的每个词的权重,将各个词的文本局部特征进行对应权重加权后,经前馈神 经网络映射得到新的文本局部表示的步骤,包括: In a preferred embodiment, the visual guidance attention mechanism is used to calculate the weight of each word included in the text data in the target data pair, and the local text features of each word After the corresponding weights are weighted, the new local representation of the text is obtained through the feedforward neural network mapping. steps, including:
采用视觉引导注意力机制,通过公式(12)计算文本中每个词的权重:Using a vision-guided attention mechanism, the weight of each word in the text is calculated by formula (12):
公式(12) Formula (12)
其中,、为模型参数; in, , are model parameters;
通过公式(13)对各个词的文本局部特征进行对应权重加权,并经过前馈神经网 络映射获得新的文本局部表示: By formula (13), the text local features of each word are The corresponding weights are weighted, and a new local representation of the text is obtained through feedforward neural network mapping :
公式(13) Formula (13)
其中,为模型参数。 in, are model parameters.
在一个优选的实施例中,所述训练数据集通过Wikipedia、MS COCO、Pascal Voc获取。In a preferred embodiment, the training data set is obtained through Wikipedia, MS COCO, and Pascal Voc.
与现有技术相比,本发明提供的一种基于多层次特征表示对齐的跨模态检索方法具有以下优点:Compared with the prior art, the cross-modal retrieval method based on the alignment of multi-level feature representations provided by the present invention has the following advantages:
本发明提供的一种基于多层次特征表示对齐的跨模态检索方法,通过在跨模态细粒度精确对齐阶段,分别计算图像和文本两种不同模态数据之间的全局相似度、局部相似度和关系相似度,并融合得到图像-文本综合相似度,在网络训练阶段,设计相应损失函数,挖掘跨模态结构约束信息,从多个角度约束和监督检索模型的参数学习,最后根据图像-文本综合相似度获取测试查询样例的检索结果,从而通过引入图像和文本两种不同模态数据之间的细粒度关联关系,有效提高跨模态检索的准确率,在图文检索、模式识别等领域具有广泛的市场需求和应用前景。The present invention provides a cross-modal retrieval method based on multi-level feature representation alignment. In the cross-modal fine-grained and precise alignment stage, the global similarity and local similarity between two different modal data of image and text are calculated respectively. In the network training stage, the corresponding loss function is designed, the cross-modal structural constraint information is mined, the parameter learning of the retrieval model is constrained and supervised from multiple perspectives, and finally, according to the image - Text comprehensive similarity to obtain the retrieval results of test query samples, so as to effectively improve the accuracy of cross-modal retrieval by introducing fine-grained correlation between image and text two different modal data, in image and text retrieval, mode Identification and other fields have a wide range of market demand and application prospects.
附图说明Description of drawings
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本发明的实施例,并于说明书一起用于解释本发明的原理。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description serve to explain the principles of the invention.
图1是本发明一个实施例提供的一种实施环境的示意图。FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present invention.
图2是根据一示例性实施例示出的一种基于多层次特征表示对齐的跨模态检索方法的方法流程图。Fig. 2 is a method flowchart of a cross-modal retrieval method based on alignment of multi-level feature representations according to an exemplary embodiment.
图3是本发明实施例示出的一种模态间结构约束损失示意图。FIG. 3 is a schematic diagram of a structural constraint loss between modes according to an embodiment of the present invention.
图4是本发明实施例示出的一种模态内结构约束损失示意图。FIG. 4 is a schematic diagram of an intra-modal structural constraint loss according to an embodiment of the present invention.
图5是本发明实施例进行文本检索图像的一种结果示意图。FIG. 5 is a schematic diagram of a result of text retrieval of images according to an embodiment of the present invention.
图6是根据一示例性实施例示出的一种用于实现基于多层次特征表示对齐的跨模态检索方法的装置框图。Fig. 6 is a block diagram of an apparatus for implementing a cross-modal retrieval method based on alignment of multi-level feature representations, according to an exemplary embodiment.
图7是根据一示例性实施例示出的一种用于实现基于多层次特征表示对齐的跨模态检索方法的装置框图。Fig. 7 is a block diagram of an apparatus for implementing a cross-modal retrieval method based on alignment of multi-level feature representations according to an exemplary embodiment.
具体实施方式Detailed ways
为了使本发明的目的、技术方案和优点更加清楚,以下结合具体实施例(但不限于所举实施例)与附图详细描述本发明,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本发明保护的范围。In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be described in detail below with reference to specific embodiments (but not limited to the illustrated embodiments) and the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention. , not all examples. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
本发明实施例可以适用于多种场景,其涉及的实施环境可以包括单个服务器的输入输出场景,或者终端与服务器的互动场景。当实施环境为单个服务器的输入输出场景时,图像数据和文本数据的获取和存储主体均为服务器;当实施环境为终端与服务器的互动场景,此时,实施例所涉及的实施环境的示意图可以如图1所示。在图1所示实施环境的示意图中,该实施环境包括终端101和服务器102。The embodiments of the present invention may be applicable to various scenarios, and the implementation environment involved may include an input and output scenario of a single server, or an interaction scenario between a terminal and a server. When the implementation environment is an input and output scene of a single server, the acquisition and storage of image data and text data are both servers; when the implementation environment is an interaction scene between a terminal and a server, at this time, the schematic diagram of the implementation environment involved in the embodiment can be As shown in Figure 1. In the schematic diagram of the implementation environment shown in FIG. 1 , the implementation environment includes a terminal 101 and a
终端101是运行有至少一个客户端的电子设备,客户端是某个应用程序的客户端,又称APP(Application,应用程序)。终端101可以是智能手机、平板电脑等。The terminal 101 is an electronic device running at least one client, and the client is a client of an application program, also known as APP (Application, application program). The terminal 101 may be a smartphone, a tablet computer, or the like.
终端101和服务器102之间通过无线或有线网络连接。终端101用于向服务器102发送数据,或,终端用于接收服务器102发送的数据。在一种可能的实施方式中,终端101可以向服务器102发送图像数据或文本数据中的至少一种。The terminal 101 and the
服务器102用于接收终端101发送的数据,或,服务器102用于向终端101发送数据。其中,服务器102可以对终端101发送的数据进行分析处理,从而从数据库中匹配出相似度最高的图像数据和文本数据并发送至终端101。The
图2是根据一示例性实施例示出的一种基于多层次特征表示对齐的跨模态检索方法的方法流程图,如图2所示,该一种基于多层次特征表示对齐的跨模态检索方法,其特征在于,所述方法包括:Fig. 2 is a method flowchart of a cross-modal retrieval method based on alignment of multi-level feature representations according to an exemplary embodiment. As shown in Fig. 2, the cross-modal retrieval method based on alignment of multi-level feature representations A method, characterized in that the method comprises:
步骤100:获取训练数据集,对于所述训练数据集中的每组数据对,所述数据对包括图像数据、文本数据,以及所述图像数据与所述文本数据共同对应的语义标签。Step 100: Acquire a training data set. For each data pair in the training data set, the data pair includes image data, text data, and a semantic label corresponding to the image data and the text data.
需要说明的是,文本数据可以为任意语种对应的文本内容,比如英文、中文、日文、德文等;图像数据可以为任意色彩对应的图像内容,比如彩色图像、灰度图像等。It should be noted that the text data can be text content corresponding to any language, such as English, Chinese, Japanese, German, etc.; the image data can be image content corresponding to any color, such as color images, grayscale images, etc.
步骤200:对于所述训练数据集中的每组数据对,分别提取所述数据对中图像数据对应的图像全局特征、图像局部特征和图像关系特征,以及所述数据对中文本数据对应的文本全局特征、文本局部特征和文本关系特征。Step 200: For each group of data pairs in the training data set, extract the image global feature, image local feature and image relationship feature corresponding to the image data in the data pair, and the text global feature corresponding to the text data in the data pair. features, text local features, and text relation features.
在一个优选的实施例中,步骤200具体包括:In a preferred embodiment, step 200 specifically includes:
步骤210:对于所述训练数据集中的每组数据对,采用卷积神经网络CNN提取所述 数据对所对应图像数据的图像全局特征,然后采用视觉目标检测器检测所述图像数据包 括的视觉目标并提取每个视觉目标的图像局部特征,其中,M为所述图像数据 包括的视觉目标数量,为视觉目标的特征向量,再通过图像视觉关系编码网络提取各 个视觉目标之间的图像关系特征,其中,为视觉目标和视觉目标之间的图 像关系特征。 Step 210: For each group of data pairs in the training data set, use a convolutional neural network CNN to extract the global image features of the image data corresponding to the data pairs , and then use a visual target detector to detect the visual targets included in the image data and extract the image local features of each visual target , where M is the number of visual objects included in the image data, for visual target feature vector, and then extract the image relationship features between each visual object through the image visual relationship coding network ,in, for visual target and visual target The image relationship features between them.
步骤220:对于所述训练数据集中的每组数据对,采用词嵌入模型将所述数据对所 对应文本数据中的每个词转换为词向量,其中,N为所述文本数据包括的词数量,然后 将各个词向量依次输入至递归神经网络,获得所述文本数据对应的文本全局特征,再将 各个词向量输入至前馈神经网络获得各个词对应的文本局部特征,同时将各 个词向量输入至文本关系编码网络提取各个词之间的文本关系特征,其中,为 词和词之间的文本关系特征。 Step 220: For each group of data pairs in the training data set, use a word embedding model to convert each word in the text data corresponding to the data pair into a word vector , where N is the number of words included in the text data, and then input each word vector into the recurrent neural network in turn to obtain the text global features corresponding to the text data , and then input each word vector into the feedforward neural network to obtain the local text features corresponding to each word , and input each word vector into the text relation coding network to extract the text relation features between each word ,in, for word and word textual relationship between.
通过上述步骤200的实施,可实现跨模态多层次精细化表示。Through the implementation of the above-mentioned
步骤300:对于所述训练数据集中任一图像数据与任一文本数据组成的目标数据对,根据所述目标数据对对应的图像全局特征和文本全局特征、所述目标数据对对应的图像局部特征和文本局部特征、所述目标数据对对应的图像关系特征和文本关系特征计算得到所述目标数据对对应的图像-文本综合相似度。Step 300: For the target data pair composed of any image data and any text data in the training data set, according to the image global feature and text global feature corresponding to the target data pair, and the image local feature corresponding to the target data pair. The image-text comprehensive similarity corresponding to the target data pair is obtained by calculating with the local text feature, the image relationship feature and the text relationship feature corresponding to the target data pair.
在一个优选的实施例中,步骤300具体包括:In a preferred embodiment, step 300 specifically includes:
步骤310:对于所述训练数据集中任一图像数据与任一文本数据组成的目标数据 对,基于所述目标数据对中图像数据对应的图像全局特征和文本数据对应的文本全局 特征的余弦距离,计算得到所述目标数据对对应的图像-文本全局相似度。 Step 310: For a target data pair composed of any image data and any text data in the training data set, based on the image global feature corresponding to the image data in the target data pair Text global features corresponding to text data The cosine distance is calculated to obtain the image-text global similarity corresponding to the target data pair .
其中,图像-文本全局相似度的计算公式如公式(1): Among them, the image-text global similarity The calculation formula is as formula (1):
) 公式(1) ) Formula 1)
步骤320:采用文本引导注意力机制计算所述目标数据对中图像数据包括的每个 视觉目标的权重,将各个视觉目标的图像局部特征进行对应权重加权后,经前馈神经网 络映射获得新的图像局部表示,然后采用视觉引导注意力机制计算所述目标数据对中 文本数据包括的每个词的权重,将各个词的文本局部特征进行对应权重加权后,经前馈 神经网络映射得到新的文本局部表示,根据各个图像局部表示和各个文本局部表示计算所有视觉目标和词的余弦相似度,并以其均值计算得到所述目标数据对对应的图 像-文本局部相似度。 Step 320: Calculate the weight of each visual target included in the image data in the target data pair by using the text-guided attention mechanism, and combine the image local features of each visual target. After the corresponding weights are weighted, a new local representation of the image is obtained through the feedforward neural network mapping , and then the visual guidance attention mechanism is used to calculate the weight of each word included in the text data in the target data pair, and the text local features of each word are calculated. After the corresponding weights are weighted, the new local representation of the text is obtained through the feedforward neural network mapping. , according to the local representation of each image and individual text local representations Calculate the cosine similarity of all visual targets and words, and calculate the corresponding image-text local similarity of the target data pair with its mean .
其中,图像-文本局部相似度的计算公式如公式(2),M为视觉目标数量,N为 词数量: Among them, the image-text local similarity The calculation formula of is as formula (2), M is the number of visual objects, and N is the number of words:
公式(2) Formula (2)
步骤330:根据所述目标数据对中各个图像关系特征和各个文本关系特征的余弦 相似度均值,计算得到所述目标数据对对应的图像-文本关系相似度。其中,图像-文 本关系相似度的计算公式如公式(3),P表示图像数据和文本数据的关系个数: Step 330: According to the average cosine similarity of each image relationship feature and each text relationship feature in the target data pair, calculate the image-text relationship similarity corresponding to the target data pair. . Among them, the image-text relationship similarity The calculation formula of , such as formula (3), P represents the number of relationships between image data and text data:
公式(3) Formula (3)
步骤340:根据所述目标数据对对应的图像-文本全局相似度、图像-文本局 部相似度、图像-文本关系相似度计算得到所述目标数据对对应的图像-文本综合相 似度。 Step 340: According to the target data, the corresponding image-text global similarity , image-text local similarity , the image-text relationship similarity is calculated to obtain the corresponding image-text comprehensive similarity of the target data pair .
其中,图像-文本综合相似度的计算公式如公式(4): Among them, the image-text comprehensive similarity The calculation formula is as formula (4):
公式(4) Formula (4)
通过上述步骤300的实施,可实现跨模态细粒度精确对齐。Through the implementation of the
步骤400:基于各组目标数据对对应的图像-文本综合相似度,设计模态间结构约束损失函数和模态内结构约束损失函数,并采用所述模态间结构约束损失函数和所述模态内结构约束损失函数对神经网络模型进行训练。Step 400: Design the inter-modal structural constraint loss function and the intra-modal structural constraint loss function based on the image-text comprehensive similarity corresponding to each set of target data pairs, and use the inter-modal structural constraint loss function and the modality constraint loss function. The neural network model is trained using the in-state structure-constrained loss function.
在一个优选的实施例中,所述模态间结构约束损失函数的计算公式如公式(5),其 中,B为样本数,为模型超参数,为匹配的目标数据对,和为非匹配 的目标数据对: In a preferred embodiment, the calculation formula of the structural constraint loss function between modes is as formula (5), where B is the number of samples, are model hyperparameters, is the matching target data pair, and For non-matching target data pairs:
公式(5) Formula (5)
所述模态内结构约束损失函数的计算公式如公式(6),其中,为图像三 元组,相比于,与具有更多共同语义标签,为文本三元组,相比于,与 具有更多共同语义标签:The calculation formula of the structural constraint loss function within the mode is as formula (6), wherein, is an image triple, compared to , and have more common semantic labels, is a text triple, compared to , and Has more common semantic labels:
公式(6) Formula (6)
其中,图3是本发明实施例示出的一种模态间结构约束损失示意图。3 is a schematic diagram of a structural constraint loss between modes according to an embodiment of the present invention.
在一个优选的实施例中,所述采用所述模态间结构约束损失函数和所述模态内结构约束损失函数对神经网络模型进行训练的步骤,包括:In a preferred embodiment, the step of using the inter-modal structural constraint loss function and the intra-modal structural constraint loss function to train the neural network model includes:
从所述训练数据集中随机采样获得匹配的目标数据对、非匹配的目标数据对、图像三元组、文本三元组,分别根据所述模态间结构约束损失函数计算模态间结构约束损失函数值,根据所述模态内结构约束损失函数计算模态内结构约束损失函数值,并按公式(7)进行融合,利用反向传播算法优化网络参数:The matching target data pairs, non-matching target data pairs, image triples, and text triples are randomly sampled from the training data set, and the inter-modal structural constraint loss is calculated according to the inter-modal structural constraint loss function respectively. function value, according to the structural constraint loss function in the mode, calculate the loss function value of the structure constraint in the mode, and fuse it according to formula (7), and use the back propagation algorithm to optimize the network parameters:
公式(7) Formula (7)
其中是超参数。 in are hyperparameters.
其中,图4是本发明实施例示出的一种模态内结构约束损失示意图。Among them, FIG. 4 is a schematic diagram of an intra-modal structural constraint loss shown in an embodiment of the present invention.
通过上述步骤400的实施,可实现利用跨模态结构约束信息监督检索模型的训练,从而使网络训练朝着拉升匹配的目标数据对之间相似度,降低非匹配的目标数据对之间相似度的方向进行,同时使训练后的网络能够学习到更具判别力的图像和文本表示。Through the implementation of the
在一个优选的实施例中,所述通过图像视觉关系编码网络提取各个视觉目标之间 的图像关系特征的步骤,包括: In a preferred embodiment, the image relationship feature between each visual object is extracted through an image visual relationship coding network steps, including:
经图像视觉目标检测器获得图像中视觉目标和视觉目标的特征,,以及两 个目标联合区域的特征,采用公式(8)对上述各个特征进行融合,计算得到各个关系特 征: The visual target in the image is obtained by the image visual target detector and visual target Characteristics , , and the features of the two target joint regions , using formula (8) to fuse the above features, and calculate each relationship feature:
公式(8) Formula (8)
其中[]为向量拼接操作,为神经元激活函数,为模型参数。 Where [] is the vector concatenation operation, is the neuron activation function, are model parameters.
在一个优选的实施例中,所述将各个词向量输入至文本关系编码网络提取各个词 之间的文本关系特征的步骤,包括: In a preferred embodiment, the inputting each word vector into a text relational coding network extracts textual relational features between each word steps, including:
在文本关系编码网络中,采用公式(9)计算词和词之间的文本关系特征: In the text relational coding network, formula (9) is used to calculate the word and word textual relational features between :
公式(9) Formula (9)
其中,表示神经元激活函数,为模型参数。 in, represents the neuron activation function, are model parameters.
在一个优选的实施例中,所述采用文本引导注意力机制计算所述目标数据对中图 像数据包括的每个视觉目标的权重,将各个视觉目标的图像局部特征进行对应权重加 权后,经前馈神经网络映射获得新的图像局部表示的步骤,包括: In a preferred embodiment, the weight of each visual target included in the image data in the target data pair is calculated by using a text-guided attention mechanism, and the image local features of each visual target are combined After the corresponding weights are weighted, a new local representation of the image is obtained through the feedforward neural network mapping steps, including:
采用文本引导注意力机制,通过公式(10)计算图像中每个视觉目标的权重:Using the text-guided attention mechanism, the weight of each visual object in the image is calculated by formula (10):
公式(10) Formula (10)
其中,、为模型参数; in, , are model parameters;
通过公式(11)对每个视觉目标进行加权,并经过前馈神经网络映射获得新的图像 局部表示: Each visual object is weighted by formula (11), and a new image local representation is obtained through feedforward neural network mapping :
公式(11) Formula (11)
其中,为模型参数。 in, are model parameters.
在一个优选的实施例中,所述采用视觉引导注意力机制计算所述目标数据对中文 本数据包括的每个词的权重,将各个词的文本局部特征进行对应权重加权后,经前馈神 经网络映射得到新的文本局部表示的步骤,包括: In a preferred embodiment, the visual guidance attention mechanism is used to calculate the weight of each word included in the text data in the target data pair, and the local text features of each word After the corresponding weights are weighted, the new local representation of the text is obtained through the feedforward neural network mapping. steps, including:
采用视觉引导注意力机制,通过公式(12)计算文本中每个词的权重:Using a vision-guided attention mechanism, the weight of each word in the text is calculated by formula (12):
公式(12) Formula (12)
其中,、为模型参数; in, , are model parameters;
通过公式(13)对各个词的文本局部特征进行对应权重加权,并经过前馈神经网 络映射获得新的文本局部表示: By formula (13), the text local features of each word are The corresponding weights are weighted, and a new local representation of the text is obtained through feedforward neural network mapping :
公式(13) Formula (13)
其中,为模型参数。 in, are model parameters.
在一个优选的实施例中,所述训练数据集通过Wikipedia、MS COCO、Pascal Voc获取。In a preferred embodiment, the training data set is obtained through Wikipedia, MS COCO, and Pascal Voc.
需要说明的是,当采用上述步骤100-400实现神经网络模型的训练后,不同模态的数据经过神经网络模型计算就能准确输出二者之间的相似度。使用测试数据集中的任意一种模态类型作为查询模态,以另一种模态类型作为目标模态,将查询模态的每个数据作为查询样例,检索目标模态中的数据,按照公式(4)所示图像-文本综合相似度计算公式,计算查询样例和查询目标的相似性。在一种可能的实施方式中,神经网络模型可以将相似性最高的目标模态数据作为匹配数据进行输出,或,神经网络模型将各个神经网络模型相似性按照从大到小排序,得到预设数量的目标模态数据的相关结果列表,从而实现不同模态数据间的跨模态检索作业。It should be noted that after the above steps 100-400 are used to implement the training of the neural network model, the similarity between the data of different modes can be accurately outputted through the calculation of the neural network model. Use any modality type in the test data set as the query modality, use another modality type as the target modality, take each data of the query modality as a query sample, retrieve the data in the target modality, and follow The image-text comprehensive similarity calculation formula shown in formula (4) calculates the similarity between the query sample and the query target. In a possible implementation, the neural network model may output the target modal data with the highest similarity as matching data, or the neural network model may sort the similarities of each neural network model in descending order to obtain a preset A list of relevant results for a large number of target modal data, so as to realize cross-modal retrieval operations between different modal data.
本实施例采用了MS COCO跨模态数据集进行实验,该数据集由文献(T. Lin, etal. Microsoft COCO: Common objects in context, ECCV 2014, pp.740-755.)首次提出,已成为跨模态检索领域最常用的实验数据集之一。该数据集中的每张图片均带有5个文本标注,其中82783张图片及其文本标注作为训练样本集,在剩余样本中随机挑选5000张图片及其文本标注作为测试样本集。为了更好地说明本发明实施例提供的基于多层次特征表示对齐的跨模态检索方法的有益效果,将本发明提供的基于多层次特征表示对齐的跨模态检索方法与以下3种现有跨模态检索方法进行实验测试比对:This example uses the MS COCO cross-modal dataset for experiments, which was first proposed by the literature (T. Lin, etal. Microsoft COCO: Common objects in context, ECCV 2014, pp.740-755.), and has become a One of the most commonly used experimental datasets in the field of cross-modal retrieval. Each image in this dataset has 5 text annotations, of which 82,783 images and their text annotations are used as the training sample set, and 5,000 images and their text annotations are randomly selected from the remaining samples as the test sample set. In order to better illustrate the beneficial effects of the cross-modal retrieval method based on alignment of multi-level feature representations provided by the embodiments of the present invention, the cross-modal retrieval method based on alignment of multi-level feature representations provided by the present invention is compared with the following three existing Experimental test comparison of cross-modal retrieval methods:
现有方法一:文献(I. Vendrov, R. Kiros, S. Fidler, and R. Urtasun,Order-embeddings ofimages and language, ICLR, 2016.)中记载的Order-embedding方法。Existing method 1: The Order-embedding method described in the literature (I. Vendrov, R. Kiros, S. Fidler, and R. Urtasun, Order-embeddings of images and language, ICLR, 2016.).
现有方法二:文献(F. Faghri, D. Fleet, R. Kiros, and S. Fidler, VSE++:Improved visualsemantic embeddings with hard negatives, BMVC, 2018.)中记载的VSE++方法。Existing method 2: The VSE++ method described in the literature (F. Faghri, D. Fleet, R. Kiros, and S. Fidler, VSE++: Improved visualsemantic embeddings with hard negatives, BMVC, 2018.).
现有方法三:文献(J. Yu, W. Zhang, Y. Lu, Z. Qin, et al. Reasoning onthe relation: Enhancing visual representation for visual question answeringand cross-modal retrieval, IEEE Transactions on Multimedia, 22(12):3196-3209,2020.)中记载的c-VRANet方法。Existing method three: literature (J. Yu, W. Zhang, Y. Lu, Z. Qin, et al. Reasoning on the relation: Enhancing visual representation for visual question answering and cross-modal retrieval, IEEE Transactions on Multimedia, 22(12) ): 3196-3209, 2020.) The c-VRANet method described in.
实验采用跨模态检索领域常用的R@n指标来评测跨模态检索的准确率,该指标表示检索返回的n个样例中正确样例的百分比,该指标越高表示检索的结果越好,本实验中n分别取1,5,10。In the experiment, the R@n index commonly used in the field of cross-modal retrieval is used to evaluate the accuracy of cross-modal retrieval. This index represents the percentage of correct samples among the n samples returned by the retrieval. , in this experiment, n is taken as 1, 5, and 10, respectively.
表一Table I
通过表一示出的数据可知,与现有跨模态检索方法相比,本发明提供的一种基于多层次特征表示对齐的跨模态检索方法在图像数据检索文本数据,以及文本数据检索图像数据两大任务上的检索准确率均有明显提升,从而充分证明了本发明提出的图像文本全局-局部-关系多层次特征表示精细化对齐的有效性。为了便于理解,还示出采用本发明实施例进行文本检索图像的结果示意图,如图5所示,其中,第一列为检索用文本,第二列为数据集给定的匹配图像,第三列到第七列为相似度前五的对应检索结果。It can be seen from the data shown in Table 1 that, compared with the existing cross-modal retrieval method, the cross-modal retrieval method based on the alignment of multi-level feature representation provided by the present invention retrieves text data from image data, and retrieves image data from text data. The retrieval accuracy rates of the two major tasks of the data are significantly improved, which fully proves the effectiveness of the refined alignment of the global-local-relational multi-level feature representation of the image text proposed by the present invention. For ease of understanding, a schematic diagram of the results of text retrieval images using an embodiment of the present invention is also shown, as shown in FIG. 5 , in which the first column is the text for retrieval, the second column is the matching image given by the dataset, and the third column is the text for retrieval. Columns to the seventh column are the corresponding search results of the top five similarities.
下面的实验结果表明,与现有方法相比,本发明基于多层次特征表示对齐的跨模态检索方法,可以取得更高的检索准确率。The following experimental results show that, compared with the existing methods, the cross-modal retrieval method based on the alignment of multi-level feature representations of the present invention can achieve higher retrieval accuracy.
综上所述,本发明提供的一种基于多层次特征表示对齐的跨模态检索方法,通过在跨模态细粒度精确对齐阶段,分别计算图像和文本两种不同模态数据之间的全局相似度、局部相似度和关系相似度,并融合得到图像-文本综合相似度,在网络训练阶段,设计相应损失函数,挖掘跨模态结构约束信息,从多个角度约束和监督检索模型的参数学习,最后根据图像-文本综合相似度获取测试查询样例的检索结果,从而通过引入图像和文本两种不同模态数据之间的细粒度关联关系,有效提高跨模态检索的准确率,在图文检索、模式识别等领域具有广泛的市场需求和应用前景。To sum up, the present invention provides a cross-modal retrieval method based on multi-level feature representation alignment. In the cross-modal fine-grained and precise alignment stage, the global image and text data between two different modal data are calculated respectively. Similarity, local similarity and relational similarity are combined to obtain the comprehensive image-text similarity. In the network training stage, the corresponding loss function is designed to mine the cross-modal structural constraint information, and the parameters of the retrieval model are constrained and supervised from multiple perspectives. Finally, the retrieval results of the test query samples are obtained according to the comprehensive image-text similarity, so as to effectively improve the accuracy of cross-modal retrieval by introducing the fine-grained correlation between the two different modal data of image and text. Graphic retrieval, pattern recognition and other fields have a wide range of market demand and application prospects.
图6是根据一示例性实施例示出的一种用于实现基于多层次特征表示对齐的跨模态检索方法的装置框图。例如,装置600可以是移动电话,计算机,数字广播终端,消息收发设备,游戏控制台,平板设备,医疗设备,健身设备,个人数字助理等。Fig. 6 is a block diagram of an apparatus for implementing a cross-modal retrieval method based on alignment of multi-level feature representations, according to an exemplary embodiment. For example,
参照图6,装置600可以包括以下一个或多个组件:处理组件602,存储器604,电源组件606,多媒体组件608,音频组件610,输入/输出(I/ O)的接口612,传感器组件614,以及通信组件616。6, the
处理组件602通常控制装置600的整体操作,诸如与显示,电话呼叫,数据通信,相机操作和记录操作相关联的操作。处理组件602可以包括一个或多个处理器620来执行指令,以完成上述的方法的全部或部分步骤。此外,处理组件602可以包括一个或多个模块,便于处理组件602和其他组件之间的交互。例如,处理组件602可以包括多媒体模块,以方便多媒体组件608和处理组件602之间的交互。The
存储器604被配置为存储各种类型的数据以支持在装置600的操作。这些数据的示例包括用于在装置600上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。存储器604可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
电源组件606为装置600的各种组件提供电力。电源组件606可以包括电源管理系统,一个或多个电源,及其他与为装置600生成、管理和分配电力相关联的组件。
多媒体组件608包括在装置600和目标用户之间的提供一个输出接口的屏幕。在一些实施例中,屏幕可以包括液晶显示器(LCD)和触摸面板(TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自目标用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与触摸或滑动操作相关的持续时间和压力。在一些实施例中,多媒体组件608包括一个前置摄像头和/或后置摄像头。当装置600处于操作模式,如拍摄模式或视频模式时,前置摄像头和/或后置摄像头可以接收外部的多媒体数据。每个前置摄像头和后置摄像头可以是一个固定的光学透镜系统或具有焦距和光学变焦能力。
音频组件610被配置为输出和/或输入音频信号。例如,音频组件610包括一个麦克风(MIC),当装置600处于操作模式,如呼叫模式、记录模式和语音识别模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器604或经由通信组件616发送。在一些实施例中,音频组件610还包括一个扬声器,用于输出音频信号。
I/ O接口612为处理组件602和外围接口模块之间提供接口,上述外围接口模块可以是键盘,点击轮,按钮等。这些按钮可包括但不限于:主页按钮、音量按钮、启动按钮和锁定按钮。I/
传感器组件614包括一个或多个传感器,用于为装置600提供各个方面的状态评估。例如,传感器组件614可以检测到装置600的打开/关闭状态,组件的相对定位,例如组件为装置600的显示器和小键盘,传感器组件614还可以检测装置600或装置600一个组件的位置改变,目标用户与装置600接触的存在或不存在,装置600方位或加速/减速和装置600的温度变化。传感器组件614可以包括接近传感器,被配置用来在没有任何的物理接触时检测附近物体的存在。传感器组件614还可以包括光传感器,如CMOS或CCD图像传感器,用于在成像应用中使用。在一些实施例中,该传感器组件614还可以包括加速度传感器,陀螺仪传感器,磁传感器,压力传感器或温度传感器。
通信组件616被配置为便于装置600和其他设备之间有线或无线方式的通信。装置600可以接入基于通信标准的无线网络,如WiFi,2G或3G,或它们的组合。在一个示例性实施例中,通信组件616经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,通信组件616还包括近场通信(NFC)模块,以促进短程通信。例如,在NFC模块可基于射频识别(RFID)技术,红外数据协会(IrDA)技术,超宽带(UWB)技术,蓝牙(BT)技术和其他技术来实现。
在示例性实施例中,装置600可以被一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述方法。In an exemplary embodiment,
在示例性实施例中,还提供了一种包括指令的非临时性计算机可读存储介质,例如包括指令的存储器604,上述指令可由装置600的处理器620执行以完成上述方法。例如,非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as a
一种非临时性计算机可读存储介质,当存储介质中的指令由装置600的处理器执行时,使得装置600能够执行一种基于多层次特征表示对齐的跨模态检索方法,该方法包括:A non-transitory computer-readable storage medium, when the instructions in the storage medium are executed by the processor of the
获取训练数据集,对于所述训练数据集中的每组数据对,所述数据对包括图像数据、文本数据,以及所述图像数据与所述文本数据共同对应的语义标签;Obtaining a training data set, for each group of data pairs in the training data set, the data pairs include image data, text data, and a semantic label corresponding to the image data and the text data in common;
对于所述训练数据集中的每组数据对,分别提取所述数据对中图像数据对应的图像全局特征、图像局部特征和图像关系特征,以及所述数据对中文本数据对应的文本全局特征、文本局部特征和文本关系特征;For each group of data pairs in the training data set, extract the image global features, image local features and image relationship features corresponding to the image data in the data pair, and text global features, textual features corresponding to the text data in the data pair, respectively. Local features and text relationship features;
对于所述训练数据集中任一图像数据与任一文本数据组成的目标数据对,根据所述目标数据对对应的图像全局特征和文本全局特征、所述目标数据对对应的图像局部特征和文本局部特征、所述目标数据对对应的图像关系特征和文本关系特征计算得到所述目标数据对对应的图像-文本综合相似度;For a target data pair composed of any image data and any text data in the training data set, according to the image global feature and text global feature corresponding to the target data pair, the image local feature and text local feature corresponding to the target data pair The image-text comprehensive similarity corresponding to the target data pair is obtained by calculating the corresponding image relationship feature and the text relationship feature of the feature, the target data pair;
基于各组目标数据对对应的图像-文本综合相似度,设计模态间结构约束损失函数和模态内结构约束损失函数,并采用所述模态间结构约束损失函数和所述模态内结构约束损失函数对神经网络模型进行训练。Based on the image-text comprehensive similarity corresponding to each set of target data pairs, an inter-modal structural constraint loss function and an intra-modal structural constraint loss function are designed, and the inter-modal structural constraint loss function and the intra-modal structural constraint loss function are used. The constrained loss function trains the neural network model.
图7是根据一示例性实施例示出的一种用于实现基于多层次特征表示对齐的跨模态检索方法的装置框图。例如,装置700可以被提供为一服务器。参照图7,装置700包括处理组件722,其进一步包括一个或多个处理器,以及由存储器732所代表的存储器资源,用于存储可由处理部件722执行的指令,例如应用程序。存储器732中存储的应用程序可以包括一个或一个以上的每一个对应于一组指令的模块。此外,处理组件722被配置为执行指令,以执行上述启动页面生成方法。Fig. 7 is a block diagram of an apparatus for implementing a cross-modal retrieval method based on alignment of multi-level feature representations according to an exemplary embodiment. For example, the
装置700还可以包括一个电源组件726被配置为执行装置700的电源管理,一个有线或无线网络接口750被配置为将装置700连接到网络,和一个输入输出(I/O)接口758。装置700可以操作基于存储在存储器732的操作系统,例如Windows ServerTM,Mac OS XTM,UnixTM, LinuxTM,FreeBSDTM或类似。
虽然,前文已经用一般性说明、具体实施方式及试验,对本发明做了详尽的描述,但在本发明基础上,可以对之进行修改或改进,这对本领域技术人员而言是显而易见的。因此,在不偏离本发明精神的基础上所做的这些修改或改进,均属于本发明要求保护的范围。Although the present invention has been described in detail above with general description, specific embodiments and tests, it is obvious to those skilled in the art that modifications or improvements can be made on the basis of the present invention. Therefore, these modifications or improvements made without departing from the spirit of the present invention fall within the scope of the claimed protection of the present invention.
本领域技术人员在考虑说明书及实践这里的发明的后,将容易想到本发明的其它实施方案。本发明旨在涵盖本发明的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本发明的一般性原理并包括本发明未公开的本技术领域中的公知常识或惯用技术手段。应当理解的是,本发明并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。Other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention herein. The present invention is intended to cover any variations, uses or adaptations of the present invention which follow the general principles of the present invention and include common knowledge or conventional techniques in the technical field not disclosed by the present invention . It should be understood that the present invention is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111149240.4A CN113792207B (en) | 2021-09-29 | 2021-09-29 | Cross-modal retrieval method based on multi-level feature representation alignment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111149240.4A CN113792207B (en) | 2021-09-29 | 2021-09-29 | Cross-modal retrieval method based on multi-level feature representation alignment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113792207A true CN113792207A (en) | 2021-12-14 |
CN113792207B CN113792207B (en) | 2023-11-17 |
Family
ID=78877521
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111149240.4A Active CN113792207B (en) | 2021-09-29 | 2021-09-29 | Cross-modal retrieval method based on multi-level feature representation alignment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113792207B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114239730A (en) * | 2021-12-20 | 2022-03-25 | 华侨大学 | A Cross-modal Retrieval Method Based on Neighbor Ranking Relation |
CN114550302A (en) * | 2022-02-25 | 2022-05-27 | 北京京东尚科信息技术有限公司 | Method and device for generating action sequence and method and device for training correlation model |
CN114880441A (en) * | 2022-07-06 | 2022-08-09 | 北京百度网讯科技有限公司 | Visual content generation method, device, system, equipment and medium |
CN115129917A (en) * | 2022-06-06 | 2022-09-30 | 武汉大学 | optical-SAR remote sensing image cross-modal retrieval method based on modal common features |
CN115712740A (en) * | 2023-01-10 | 2023-02-24 | 苏州大学 | Method and system for multi-modal implication enhanced image text retrieval |
US20230162490A1 (en) * | 2021-11-19 | 2023-05-25 | Salesforce.Com, Inc. | Systems and methods for vision-language distribution alignment |
CN115827954B (en) * | 2023-02-23 | 2023-06-06 | 中国传媒大学 | Dynamically weighted cross-modal fusion network retrieval method, system, and electronic device |
CN116402063A (en) * | 2023-06-09 | 2023-07-07 | 华南师范大学 | Multimodal satire recognition method, device, equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110122A (en) * | 2018-06-22 | 2019-08-09 | 北京交通大学 | Image based on multilayer semanteme depth hash algorithm-text cross-module state retrieval |
CN110490946A (en) * | 2019-07-15 | 2019-11-22 | 同济大学 | Text generation image method based on cross-module state similarity and generation confrontation network |
CN111026894A (en) * | 2019-12-12 | 2020-04-17 | 清华大学 | A Cross-modal Image Text Retrieval Method Based on Credibility Adaptive Matching Network |
CN112148916A (en) * | 2020-09-28 | 2020-12-29 | 华中科技大学 | Cross-modal retrieval method, device, equipment and medium based on supervision |
CN112784092A (en) * | 2021-01-28 | 2021-05-11 | 电子科技大学 | Cross-modal image text retrieval method of hybrid fusion model |
CN113157974A (en) * | 2021-03-24 | 2021-07-23 | 西安维塑智能科技有限公司 | Pedestrian retrieval method based on character expression |
-
2021
- 2021-09-29 CN CN202111149240.4A patent/CN113792207B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110122A (en) * | 2018-06-22 | 2019-08-09 | 北京交通大学 | Image based on multilayer semanteme depth hash algorithm-text cross-module state retrieval |
CN110490946A (en) * | 2019-07-15 | 2019-11-22 | 同济大学 | Text generation image method based on cross-module state similarity and generation confrontation network |
CN111026894A (en) * | 2019-12-12 | 2020-04-17 | 清华大学 | A Cross-modal Image Text Retrieval Method Based on Credibility Adaptive Matching Network |
CN112148916A (en) * | 2020-09-28 | 2020-12-29 | 华中科技大学 | Cross-modal retrieval method, device, equipment and medium based on supervision |
CN112784092A (en) * | 2021-01-28 | 2021-05-11 | 电子科技大学 | Cross-modal image text retrieval method of hybrid fusion model |
CN113157974A (en) * | 2021-03-24 | 2021-07-23 | 西安维塑智能科技有限公司 | Pedestrian retrieval method based on character expression |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230162490A1 (en) * | 2021-11-19 | 2023-05-25 | Salesforce.Com, Inc. | Systems and methods for vision-language distribution alignment |
US12112523B2 (en) * | 2021-11-19 | 2024-10-08 | Salesforce, Inc. | Systems and methods for vision-language distribution alignment |
CN114239730A (en) * | 2021-12-20 | 2022-03-25 | 华侨大学 | A Cross-modal Retrieval Method Based on Neighbor Ranking Relation |
CN114550302A (en) * | 2022-02-25 | 2022-05-27 | 北京京东尚科信息技术有限公司 | Method and device for generating action sequence and method and device for training correlation model |
CN115129917A (en) * | 2022-06-06 | 2022-09-30 | 武汉大学 | optical-SAR remote sensing image cross-modal retrieval method based on modal common features |
CN115129917B (en) * | 2022-06-06 | 2024-04-09 | 武汉大学 | optical-SAR remote sensing image cross-modal retrieval method based on modal common characteristics |
CN114880441A (en) * | 2022-07-06 | 2022-08-09 | 北京百度网讯科技有限公司 | Visual content generation method, device, system, equipment and medium |
CN115712740A (en) * | 2023-01-10 | 2023-02-24 | 苏州大学 | Method and system for multi-modal implication enhanced image text retrieval |
CN115827954B (en) * | 2023-02-23 | 2023-06-06 | 中国传媒大学 | Dynamically weighted cross-modal fusion network retrieval method, system, and electronic device |
CN116402063A (en) * | 2023-06-09 | 2023-07-07 | 华南师范大学 | Multimodal satire recognition method, device, equipment and storage medium |
CN116402063B (en) * | 2023-06-09 | 2023-08-15 | 华南师范大学 | Multimodal satire recognition method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113792207B (en) | 2023-11-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11120078B2 (en) | Method and device for video processing, electronic device, and storage medium | |
CN113792207B (en) | Cross-modal retrieval method based on multi-level feature representation alignment | |
TWI754855B (en) | Method and device, electronic equipment for face image recognition and storage medium thereof | |
CN107491541B (en) | Text classification method and device | |
CN111259148B (en) | Information processing method, device and storage medium | |
TWI766286B (en) | Image processing method and image processing device, electronic device and computer-readable storage medium | |
CN110008401B (en) | Keyword extraction method, keyword extraction device, and computer-readable storage medium | |
WO2022011892A1 (en) | Network training method and apparatus, target detection method and apparatus, and electronic device | |
CN109145213B (en) | Method and device for query recommendation based on historical information | |
CN111368541B (en) | Named entity identification method and device | |
CN109800325A (en) | Video recommendation method, device and computer readable storage medium | |
CN111931844B (en) | Image processing method and device, electronic equipment and storage medium | |
CN110781305A (en) | Text classification method and device based on classification model and model training method | |
CN113515942A (en) | Text processing method and device, computer equipment and storage medium | |
CN111259967A (en) | Image classification and neural network training method, device, equipment and storage medium | |
KR20210091076A (en) | Method and apparatus for processing video, electronic device, medium and computer program | |
CN111611490A (en) | Resource searching method, device, equipment and storage medium | |
CN114332503A (en) | Object re-identification method and device, electronic device and storage medium | |
CN113705210B (en) | A method and device for generating article outline and a device for generating article outline | |
CN116166843B (en) | Text video cross-modal retrieval method and device based on fine granularity perception | |
CN112926310B (en) | Keyword extraction method and device | |
CN115294327A (en) | A small target detection method, device and storage medium based on knowledge graph | |
WO2024179519A1 (en) | Semantic recognition method and apparatus | |
CN113869063A (en) | Data recommendation method and device, electronic equipment and storage medium | |
CN111538998B (en) | Text encryption method and device, electronic equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | ||
CP03 | Change of name, title or address |
Address after: 314000 No. 899, guangqiong Road, Nanhu District, Jiaxing City, Zhejiang Province Patentee after: Jiaxing University Country or region after: China Address before: No. 899 Guangqiong Road, Nanhu District, Jiaxing City, Zhejiang Province Patentee before: JIAXING University Country or region before: China |