
CN118230052A - Cervical panoramic image few-sample classification method based on visual guidance and language prompt


Info

Publication number
CN118230052A
Authority
CN
China
Prior art keywords
image
instance
text
cervical
features
Prior art date
Legal status
Pending
Application number
CN202410423610.6A
Other languages
Chinese (zh)
Inventor
郭璐瑶
王小玉
俞越
Current Assignee
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Harbin University of Science and Technology
Priority to CN202410423610.6A
Publication of CN118230052A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20092 Interactive image processing based on input by user
    • G06T 2207/20104 Interactive definition of region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to panoramic image classification technology for cervical pathology and aims to provide a multimodal integrated diagnosis solution that fills the current gap in applying vision-language technology to cervical histopathology diagnosis and improves diagnostic efficiency. Existing vision-language models are usually trained on large-scale image-text pairs and therefore have strong representation, generalization, and transfer capabilities. This approach, however, faces challenges in cervical histopathology: because the data are highly private, annotation is costly, and expert experience is hard to replicate, large-scale image-text pairs are difficult to construct. To address these problems, the present invention proposes a few-shot classification method for cervical panoramic images based on visual guidance and language prompts. Experimental results show that the method can effectively identify cervical histopathological features and provide both the pathological basis for diagnosis and the diagnostic result. The invention has broad application prospects in cervical histopathology diagnosis.

Description

A few-shot classification method for cervical panoramic images based on visual guidance and language prompts

Technical Field

The present invention relates to a few-shot classification method for cervical panoramic images based on visual guidance and language prompts.

Background Art

In recent years, the high incidence of cervical cancer has become a social problem threatening women's lives. Worldwide, about 527,600 women are newly diagnosed with cervical cancer each year and nearly 265,000 die from the disease; in China, about 75,000 new cases are found annually, roughly one seventh of all new cases worldwide, and 35,000 women die from the disease each year. However, cervical histopathologists are limited in number, and their rich clinical experience and professional knowledge are difficult to replicate and disseminate, so the gap between supply and demand is significant. Cervical histopathology is regarded as the "gold standard" for diagnosing cervical cancer and related lesions; it is crucial for identifying early lesions, determining their nature, guiding individualized treatment planning, and evaluating prognosis. Cervical histopathology research therefore has great practical significance and long-term value for improving diagnostic accuracy, reducing misdiagnosis and missed diagnosis, and improving overall quality of life.

At present, computational cervical histopathology relies mainly on deep-learning image processing to detect, segment, and classify lesion regions in pathological slides. Although such techniques improve diagnostic efficiency to some extent, the limitations of purely image-based feature extraction and of dataset size make it difficult to approach the understanding and description of complex lesion patterns and subtle pathological changes that pathologists acquire through clinical experience. In contrast, multimodal learning based on language prompts combines visual information with natural-language descriptions, captures complex pathological features more comprehensively, and helps compensate for the shortcomings of image-only analysis. Although vision-language models have been explored in other medical imaging fields and show potential, their application to cervical histopathology is still in its infancy and has not been widely adopted. On the one hand, cervical histopathological diagnosis involves a high barrier of specialist knowledge, and the pathological characteristics of different disease types vary markedly, so existing general-purpose models are difficult to adapt directly to cervix-specific diagnostic needs; on the other hand, because of data sensitivity and privacy requirements, public cervical histopathology datasets are extremely scarce, and the varied slide-preparation styles and high annotation costs pose serious challenges to large-scale data-driven learning in this domain.

In view of the above, the present invention proposes a few-shot classification method for cervical panoramic images based on visual guidance and language prompts. By combining visual prior information with textual descriptions, the method copes effectively with practical problems such as the lack of finely annotated data and differences in slide-preparation style, achieves efficient learning and accurate recognition of cervical pathological features, improves the accuracy and efficiency of cervical histopathological diagnosis, and provides a more scientific and accurate basis for early detection, accurate staging, personalized treatment decisions, and prognosis evaluation of cervical cancer.

Summary of the Invention

The present invention addresses the two core challenges in cervical histopathological diagnosis, namely the lack of large-scale finely annotated data and the marked differences in slide-preparation style, and aims to fill the current gap in applying vision-language technology to this field. Its design follows the pathologist's diagnostic workflow and the needs of clinical practice. To this end, we propose a few-shot classification method for cervical panoramic images based on visual guidance and language prompts, which integrates the strengths of visual information and language description to handle diagnostic tasks with limited annotated data, to recognize cervical pathological features accurately across preparation styles, and thereby to improve the accuracy and efficiency of cervical cancer diagnosis to the standard required in clinical practice.

The above objectives are mainly achieved through the following technical solution:

S1. Generation of a visual prior guidance map from the cervical tissue panoramic image:

Preprocessing stage: a low-magnification (2.5x) cervical panoramic image is first obtained and subjected to stain normalization, including conversion from RGB to Lab color space, channel normalization, and Z-score standardization, to reduce color differences between images; the tissue boundary is then determined by binarization and contour detection.

Boundary distance calculation stage: each grade of cervical histopathology follows a specific spatial distribution. During diagnosis, pathologists first examine the progression of lesions within the cervical epithelium and then the depth of invasion, so the region of interest for cervical histopathological grading extends from the edge of the tissue section toward its center. Based on this, the boundary distance is computed as prior knowledge, as follows:

According to the morphology of cervical sections, the epithelium usually wraps around the stroma and is mostly strip-shaped. To give the epithelial part a larger boundary weight, the centroid of the tissue block is obtained first: from the outer contour obtained in the preprocessing stage, the area A enclosed by contour C and the first-order moments M_10 and M_01 are computed;

The centroid (cX, cY) is then computed from the contour area and the first-order moments;

Next, from the tissue contour C_i, the four vertices P_1(x_1, y_1), P_2(x_2, y_2), P_3(x_3, y_3), P_4(x_4, y_4) of its minimum-area bounding rectangle R are found, expressed as {P_1, P_2, P_3, P_4} = minAreaRect(C_i);

The equations of the two lines l_1, l_2 that pass through the centroid and are parallel to the sides of the bounding rectangle are then computed, and the distances dist_1, dist_2 from every pixel inside the contour to these lines are obtained;

Finally, the ratio between the distance of an interior pixel and the distance of the corresponding contour point is taken as the distance representation of that pixel, so that pixels closer to the contour have a representation closer to 1, while pixels outside the contour are assigned -1. This yields image-level distance representations in the two directions, denoted I_dist1 and I_dist2.

ROI hard-partition stage: cervical epithelial and glandular epithelial cells are more densely arranged than stromal cells and have relatively high chromatin density, so epithelial regions usually appear darker in histopathological staining. During diagnosis, pathologists pay particular attention to the arrangement of the basal layer, the parabasal layer, and the glandular epithelium, and the characteristic dark staining of these regions can serve as an important visual prior to improve diagnostic efficiency. The visual-prior region of interest (ROI) is extracted as follows:

First, the red (R) and blue (B) channels of the cervical tissue image block are each binarized with the Otsu (OTSU) global thresholding method;

Then, according to the computed thresholds, the pixel values of each channel are compared with the corresponding threshold: pixels above the threshold are set to 1 (chromatin-dense regions) and pixels at or below the threshold are set to 0 (background), completing the binary segmentation of the red and blue channels; the binary images are denoted R_bin and B_bin;

Next, the gradient-enhanced image G(x, y) of the grayscale image I_grey (after blurring, denoising, and other preprocessing) is computed: the horizontal and vertical gradients are obtained with the Sobel operator and combined into the gradient-magnitude image M(x, y);

The watershed algorithm is applied to the gradient image to locate regions of abrupt gradient change; by building a topological map and setting a threshold on the value range, segmentation regions are obtained and converted into a binary mask;

Finally, the region of interest is delineated by combining the three binary images with logical operations, giving the hard partition ROI(i, j) of the region of interest (ROI) of the cervical tissue section;

Visual prior guidance map generation stage: the image-level boundary distance representations I_dist1, I_dist2 from the boundary distance calculation stage and the hard partition ROI(i, j) from the ROI hard-partition stage are combined by weighted calculation to obtain the visual prior guidance map I_activate;

S2. Instance-level feature extraction from cervical tissue images:

Given that the Vision Transformer (ViT) image encoder effectively captures contextual dependencies between adjacent patches, and in order to match the fine-grained diagnostic scenario of cervical histopathology, the instance-level feature extraction stage of the present invention focuses on how lesions progress across the different layers of cervical tissue during differentiation and on their spatial distribution sequence. Because cervical tissue blocks have irregular shapes, the traditional coarse horizontal/vertical tiling may not preserve the regional coherence and progression of lesion areas. The present invention therefore proposes a grid-sampling method based on visual priors: a PE module is introduced, and the visual prior guidance map I_activate is used to generate the visual-prior position-encoding feature F_position. This yields a more refined, tissue-structure-oriented partition of the cervical tissue region and a biologically meaningful image sequence that reflects changes across tissue layers, replacing the fixed-step sliding partition used in previous multi-instance learning tasks, so that the model can trace how lesions evolve between tissue layers and fully capture the spatial coherence of lesion regions;

Next, the pretrained CLIP image encoder (ViT) extracts the cervical tissue slice image features F_image_patch at 40x magnification; the visual-prior position-encoding feature F_position is concatenated with F_image_patch, and a fully connected layer (FC) yields the instance-level feature F_image_instance of the cervical tissue image, expressed as:

F_position = LN(MHSA(I_activate W_PE))   (1)

F_image_instance = Concat(F_image_patch, F_position) W_feat   (2)
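As an illustration only, the following PyTorch sketch shows one plausible reading of equations (1) and (2): a linear projection W_PE of the guidance-map patches, multi-head self-attention, layer normalization, and concatenation with the CLIP patch features followed by a fully connected layer. The module names, feature dimensions, and the way I_activate patches are flattened are assumptions, not details fixed by the text.

    import torch
    import torch.nn as nn

    class VisualPriorPositionEncoder(nn.Module):
        """Sketch of equations (1)-(2): F_position = LN(MHSA(I_activate W_PE)),
        F_image_instance = FC(concat(F_image_patch, F_position))."""
        def __init__(self, patch_pixels=256, dim=512, heads=8):
            super().__init__()
            self.w_pe = nn.Linear(patch_pixels, dim)            # W_PE projection
            self.mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.ln = nn.LayerNorm(dim)
            self.w_feat = nn.Linear(2 * dim, dim)               # FC after concatenation

        def forward(self, activate_patches, image_patch_feats):
            # activate_patches: (B, N, patch_pixels) flattened guidance-map patches
            # image_patch_feats: (B, N, dim) CLIP ViT features of the 40x tiles
            x = self.w_pe(activate_patches)
            attn_out, _ = self.mhsa(x, x, x)
            f_position = self.ln(attn_out)                      # equation (1)
            fused = torch.cat([image_patch_feats, f_position], dim=-1)
            return self.w_feat(fused)                           # equation (2): F_image_instance

    # toy check with random tensors
    enc = VisualPriorPositionEncoder()
    f_inst = enc(torch.rand(2, 16, 256), torch.rand(2, 16, 512))
    print(f_inst.shape)  # torch.Size([2, 16, 512])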

S3. Key instance feature extraction from cervical tissue images:

When processing cervical tissue slice images at 40x microscopic magnification, the amount of data is huge and matching every patch against text is computationally expensive. At the same time, pathological changes in cervical tissue are continuous, and adjacent patches from the same site have highly similar features, so skipping some patches has only a limited effect on grading. The present invention therefore applies an attention mechanism to the instance-level features F_image_instance extracted in S2, which contain spatial information, and uses Multi-Head Self-Attention to mine the spatial and contextual correlations between instances and to produce attention scores. These scores are used to select the most representative key features F_TopK_instance and to reduce redundant computation, as follows:

The instance-level features F_image_instance are first layer-normalized (Layer Normalization); a multi-head self-attention module (MHSA) then computes attention scores α_image_instance, which are combined with F_image_instance to obtain the visually guided weighted instance features F_image_instance′. A pooling layer (MaxPooling) and a fully connected layer (FC) then yield the instance class probabilities P_image_instance, and the top K instance-level features with the highest scores are selected as the key instance features F_TopK_instance, expressed as:

α_image_instance = MHSA(LN(F_image_instance))   (3)

F_image_instance′ = α_image_instance F_image_instance   (4)

P_image_instance = σ(Pool(F_image_instance′) W_image_instance)   (5)
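A minimal PyTorch sketch of equations (3)-(5) and the top-K selection follows. The pooling operator, the use of a sigmoid for σ, and the elementwise application of the attention output to the features are assumptions where the text leaves details open.

    import torch
    import torch.nn as nn

    class KeyInstanceSelector(nn.Module):
        """Sketch of equations (3)-(5): attention-weight the instances, score them, keep the top K."""
        def __init__(self, dim=512, heads=8, num_classes=4, k=64):
            super().__init__()
            self.ln = nn.LayerNorm(dim)
            self.mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.pool = nn.MaxPool1d(kernel_size=2)              # Pool(.), assumed 1-D max pooling
            self.fc = nn.Linear(dim // 2, num_classes)           # W_image_instance
            self.k = k

        def forward(self, f_instance):
            # f_instance: (B, N, dim) instance-level features F_image_instance from S2
            x = self.ln(f_instance)
            alpha, _ = self.mhsa(x, x, x)                        # equation (3)
            f_weighted = alpha * f_instance                      # equation (4), elementwise product assumed
            p = torch.sigmoid(self.fc(self.pool(f_weighted)))    # equation (5)
            scores = p.max(dim=-1).values                        # per-instance confidence
            idx = scores.topk(min(self.k, scores.size(1)), dim=1).indices
            f_topk = torch.gather(f_weighted, 1,
                                  idx.unsqueeze(-1).expand(-1, -1, f_weighted.size(-1)))
            return f_topk, p                                     # F_TopK_instance and P_image_instance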

The visually guided weighted instance features F_image_instance′ are then randomly masked to form the feature sequence F_mask. A position decoder PD is designed to decode the features F_image_instance′ containing spatial and positional information; the foreground and background of the visual guidance map I_activate are used as supervision for reconstructing the masked positions, after which the visual guidance map is updated to I_activate′, improving the semantic consistency of the spatial distribution and the generalization ability. The distribution loss L_position is computed and the map is iteratively updated, expressed as:

(F′, F_p′) = LN(MHSA(F_mask)) W_PD   (8)

S4. Constructing instance-level text prompt descriptions for cervical tissue images:

To achieve image-text semantic alignment and improve the generalization of the classification task, the present invention uses ChatGPT-4 to generate a series of instance-level text descriptions of where each grade of cervical pathological lesion occurs and of its normal and abnormal features. These descriptions are converted into text features by the pretrained CLIP text encoder, and their similarity with the key instance features F_TopK_instance extracted in S3 is computed to establish the association between instance-level images and text;

S5. Aligning instance-level image features with text features:

To reduce the computational cost of training the CLIP model for instance-level histopathological image-text alignment, the present invention adopts a lightweight training strategy: the weights of the pretrained CLIP image and text encoders are frozen, and training focuses on optimizing learnable image and text tokens. The image tokens T_image_token are obtained by passing the extracted key instance features through a lightweight network (LightNet) composed of two multilayer perceptron (MLP) layers; the text tokens T_text_token are learned per instance to capture the language description specific to each pathological type, so that more appropriate text descriptions are learned. The image tokens T_image_token and the text tokens T_text_token are therefore concatenated into a learnable parameter sequence T_token, whose length can be chosen freely, e.g. L tokens as an empirical setting, denoted T_token = {t_1, t_2, …, t_L}, with corresponding instance-level labels Y = {y_1, y_2, …, y_L}. The cross-entropy loss measures the difference between predictions and labels, and the back-propagated error updates these tokens:

T_token = Concat(LightNet(F_TopK_instance), T_text_token)   (10)

where p_i is the probability that the model predicts the i-th token as its corresponding class label, that is,

p_i = P(t_i | F_TopK_instance, T_token);
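The lightweight prompt-tuning strategy can be sketched as below. The frozen CLIP encoders are omitted; only the learnable parts are shown (LightNet, the text tokens, and a classification head). The head that turns each token into the probability p_i is not specified in the text and is assumed here to be a linear layer, so this is a sketch under stated assumptions rather than the literal training procedure.

    import torch
    import torch.nn as nn

    class LightNet(nn.Module):
        """Two-layer MLP that maps key instance features to image tokens."""
        def __init__(self, dim=512):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

        def forward(self, x):
            return self.mlp(x)

    class PromptTokens(nn.Module):
        """Learnable sequence T_token = concat(LightNet(F_TopK_instance), T_text_token), equation (10)."""
        def __init__(self, dim=512, num_text_tokens=8, num_classes=4):
            super().__init__()
            self.lightnet = LightNet(dim)
            self.text_tokens = nn.Parameter(torch.randn(num_text_tokens, dim) * 0.02)
            self.head = nn.Linear(dim, num_classes)   # assumed head producing p_i

        def forward(self, f_topk):                     # f_topk: (B, K, dim)
            image_tokens = self.lightnet(f_topk)
            text_tokens = self.text_tokens.unsqueeze(0).expand(f_topk.size(0), -1, -1)
            t_token = torch.cat([image_tokens, text_tokens], dim=1)
            return self.head(t_token)                  # per-token class logits

    # one training step: only the tokens and LightNet receive gradients
    model = PromptTokens()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    logits = model(torch.rand(2, 16, 512))             # (B, L, num_classes)
    labels = torch.randint(0, 4, (2, logits.size(1)))  # instance-level labels Y
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, 4), labels.reshape(-1))
    loss.backward()
    optimizer.step()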

Next, the pretrained text encoder produces the text features, and their cosine similarity with the key instance features of the cervical tissue obtained in S3 is computed; the objective function is:

L_contrastive = (L_text2image + L_image2text) / 2   (15)

where I_i is the embedding vector of the i-th text description and T_j is the embedding vector of the j-th slice image; the text-to-image contrastive loss is denoted L_text2image, the image-to-text contrastive loss is denoted L_image2text, and the overall similarity contrastive loss is denoted L_contrastive;
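For equation (15), a standard CLIP-style symmetric contrastive loss over the cosine similarities can be written as follows; the temperature value and the assumption that matched image-text pairs lie on the diagonal of the similarity matrix are illustrative choices.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(image_emb, text_emb, temperature=0.07):
        """Symmetric image-text contrastive loss, L = (L_text2image + L_image2text) / 2."""
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = image_emb @ text_emb.t() / temperature   # cosine similarities
        targets = torch.arange(image_emb.size(0), device=image_emb.device)
        loss_i2t = F.cross_entropy(logits, targets)       # L_image2text
        loss_t2i = F.cross_entropy(logits.t(), targets)   # L_text2image
        return (loss_i2t + loss_t2i) / 2                  # equation (15)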

S6. Instance-level feature aggregation:

The similarity scores between the instance-level image features and the text features are used to perform a weighted aggregation of the key instance features extracted in S3 into the bag-level feature F_Bag, so that lesion features of different layers and different types are effectively integrated and expressed over the whole cervical panoramic image;

S7. Building a lesion prompt library at the whole-slide image (WSI) level:

Using the cervical histopathological lesion illustrations and feature descriptions of CIN1 to CIN3 lesions accumulated in cervical histopathology atlases, an intelligent lesion-description prompt library containing a standardized, intelligent body of expert knowledge is created: the ChatGPT-4 model re-describes the information in the atlas to narrow the gap between medical expertise and deep-learning semantic understanding; the pretrained CLIP image and text encoders extract the corresponding image features F_key and text features F_value; the image-feature/text-feature pairs form multiple key-value pairs, denoted F_key_value;

S8. Retrieving text features with bag-level image features:

To further reduce the computational cost of training the CLIP model for aligning histopathological lesion features with text semantics, the image-text matching task is converted into a retrieval task in a training-free way: the similarity score A_Bag_key between the bag-level feature F_Bag from S6 and the image features F_key from S7 is computed, and the key-value pairs F_key_value are aggregated with F_value by label propagation to form the prediction F_Bag′, realizing knowledge retrieval:

A_Bag_key = exp(-β(1 - F_Bag F_key^T))   (16)

F_Bag′ = A_Bag_key F_value   (17)
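Equations (16)-(17) can be transcribed directly as a training-free retrieval step. In the sketch below, F_key and F_value are the stacked image and text features of the prompt library, β is a sharpness hyperparameter, and the L2 normalization of the features (so that F_Bag F_key^T is a cosine similarity) is an assumption.

    import torch

    def retrieve_from_prompt_library(f_bag, f_key, f_value, beta=5.5):
        """F_Bag': label-propagated aggregation of the library text features.
        f_bag: (B, D) bag-level features; f_key: (M, D); f_value: (M, D_text)."""
        f_bag = torch.nn.functional.normalize(f_bag, dim=-1)
        f_key = torch.nn.functional.normalize(f_key, dim=-1)
        a_bag_key = torch.exp(-beta * (1.0 - f_bag @ f_key.t()))   # equation (16)
        return a_bag_key @ f_value                                  # equation (17)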

S9. Lesion diagnosis decision:

The prediction F_Bag′ obtained in S8 and the original bag feature F_Bag are passed through the lightweight network LightNet composed of two MLP layers (with weights W_c) to obtain the predicted features; the two are combined by a weighted residual connection to produce the final prediction logits_Bag, which incorporates both the instance-level data information and the few-shot knowledge.

Finally, the grade range of the lesion is determined from this class prediction, and the corresponding description from the lesion prompt library built in S7 serves as the final diagnostic basis, helping doctors reach more accurate conclusions about cervical cancer and its lesions.

Effects of the Invention

The present invention focuses on cervical histopathological diagnosis and, facing the two challenges of insufficient large-scale finely annotated data and diverse slide-preparation styles, proposes a few-shot classification method for cervical panoramic images based on visual guidance and language prompts. Starting from low-magnification cervical panoramic images, color normalization removes color differences between images and contour detection determines the tissue boundary. On this basis, the boundary distance is computed as visual prior knowledge, using the principles and morphology of cervical histopathology, to locate the key diagnostic regions. In the ROI hard-partition stage, the OTSU threshold method and gradient enhancement, combined with the watershed algorithm, effectively segment the regions of densely packed epithelial and glandular epithelial cells, forming the visual-prior region of interest (ROI). The invention then fuses the results of the boundary distance calculation and the ROI hard partition into a visual prior guidance map that guides the refined partitioning of the tissue image sequence. A pretrained Vision Transformer (ViT) extracts instance-level features; in particular, the introduction of visual-prior position-encoding features overcomes the limitations of traditional methods in locating lesion regions and better captures how lesions evolve across tissue layers and how they are distributed spatially. The invention also uses ChatGPT-4 to generate instance-level text descriptions of cervical lesions and aligns image and text with the CLIP model, strengthening the model's semantic understanding and generalization. A lightweight training strategy that focuses on mining and selecting key instance features effectively reduces the complexity and computational cost of large-scale data processing. Finally, the similarity between instance-level image features and text features drives a weighted aggregation of key features; a whole-slide-level lesion prompt library is built from histopathology atlases, and a training-free retrieval scheme performs image-text matching, greatly improving diagnostic efficiency. The entire decision process fuses instance-level data information with few-shot knowledge, ensuring accurate and reliable graded diagnosis of cervical lesions.
The significance of the invention is that, by integrating visual and language information, it breaks through the limitations of traditional single-modality image analysis, effectively improves the recognition accuracy and diagnostic efficiency for cervical cancer and other lesions, and provides support for clinical practice. Experiments show that the method can be trained quickly and effectively on a small amount of cervical histopathology data and, while giving accurate lesion-grading results, also provides the diagnostic basis and a prediction of the lesion region, matching the needs of clinical diagnostic scenarios.

Brief Description of the Drawings

Figure 1: technical roadmap of the few-shot classification method for cervical panoramic images based on visual guidance and language prompts;

Figure 2: supplementary structural diagram of the few-shot classification method for cervical panoramic images based on visual guidance and language prompts;

Figure 3: effect of visual prior guidance;

Figures 4 to 9: diagnostic grading results;

Detailed Description of the Embodiments

To make the purpose, technical solution, and advantages of the embodiments of the present invention clearer, the technical solution in the embodiments of the present invention is described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

As shown in Figures 1 and 2, the few-shot classification method for cervical panoramic images based on visual guidance and language prompts provided herein comprises the following stages:

S1. Visual prior guidance map generation stage for the cervical tissue panoramic image:

Step 1. Preprocessing of the cervical panoramic image:

Step 101. Obtain and scale a thumbnail of the cervical panoramic image;

Step 102. Perform stain normalization: convert to the Lab color space, apply normalization and Z-score standardization, then convert back to the RGB color space;

Step 103. Image binarization and outer contour detection: binarize the image, extract the outer contour through edge detection and connected-component labeling, and segment and resize the slice images to a uniform size accordingly;

Step 2. Calculation of the cervical tissue image boundary distance:

Step 201. Compute the centroid of the tissue block: determine the centroid position from the contour area and the first-order moments;

Step 202. Determine the pixel distance representation: compute, for each pixel inside the contour, the ratio of its distance to the lines through the centroid, obtaining image-level boundary distance representations in two directions;

Step 3. Hard partition of the ROI of the cervical tissue image:

Step 301. Channel binarization: binarize the red and blue channels with the OTSU thresholding method;

Step 302. Gradient enhancement and segmentation: compute the gradient image with the Sobel operator and apply the watershed algorithm to segment the regions of abrupt gradient change, forming the hard-partition mask of the ROI;

Step 4. Visual prior guidance map generation: combine the boundary distance representations from Step 2 with the ROI hard-partition mask from Step 3, and generate the visual prior guidance map through weighting and logical operations;

S2. Instance-level feature extraction stage for cervical tissue images:

Step 5. Grid sampling and visual prior position encoding: align the visual prior guidance map with the cervical tissue panoramic image and the high-magnification tissue slices, upsample and slice with bilinear interpolation, and build the visual-prior-guided position encoding with a position encoder;

Step 6. Image feature extraction: use the pretrained CLIP image encoder (ViT) to extract image features of the tissue slices at 40x magnification;

Step 7. Feature fusion: concatenate the visual-prior position-encoding features with the image features and generate instance-level features through a fully connected layer (FC);

S3. Key instance feature extraction stage for cervical tissue:

Step 8. Instance-level feature processing: layer-normalize the instance-level features, compute attention scores with the multi-head self-attention mechanism (MHSA), and generate visually guided weighted instance features;

Step 9. Key feature selection: obtain the instance class probabilities with a pooling layer and a fully connected layer, and select the top K instance-level features as key instance features according to the probability values;

S4. Stage of constructing instance-level text prompt descriptions of cervical tissue:

Step 10. Text description generation: use ChatGPT-4 to generate instance-level textual prompts describing the sites of cervical histopathological lesions and their normal and abnormal features;

Step 11. Text feature conversion: convert these text descriptions into text features with the pretrained CLIP text encoder;

S5. Instance-level image-text alignment stage:

Step 12. Feature alignment and optimization: compute the similarity between the key instance features and the corresponding text features, and update the model parameters with a lightweight training strategy to optimize the image-text alignment;

S6. Instance-level feature aggregation stage:

Step 13. Weighted feature aggregation: aggregate the key instance features with weights given by their similarity scores to the text features, forming bag-level features that reflect the overall lesion characteristics;

S7. Construction stage of the whole-slide-image-level lesion prompt library:

Step 14. Lesion library construction: collect cervical histopathological lesion images of grades CIN1 to CIN3 and their feature descriptions, and normalize the descriptions with ChatGPT-4; extract the image features and text features of the lesion examples separately and build the lesion prompt library in the form of key-value pairs;

S8. Retrieving text features with bag-level image features:

Step 15. Image-text retrieval: compute the cosine similarity between the bag-level features and the image features in the lesion prompt library, and obtain the prediction through label-propagation aggregation;

S9. Lesion diagnosis decision:

Step 16. Prediction and decision: combine the prediction with the original bag features, generate the final prediction features through the lightweight network, and make the diagnostic decision on the lesion grade range by combining the prediction with the information in the lesion prompt library.

In the visual prior guidance map generation stage, the embodiment of the present invention generates the visual prior guidance map through preprocessing, boundary distance calculation, and ROI hard partition to guide subsequent processing. The pretrained CLIP image encoder (ViT) then extracts high-precision image features, which are fused with the visual-prior position encoding, and key instance features are selected with a multi-head attention mechanism. ChatGPT-4 generates instance-level text descriptions corresponding to the lesion features, and the text is converted by the CLIP text encoder into text features that can be compared with the image features, enabling fine-grained image-text alignment. The key instance features and text features are then weightedly aggregated into bag-level features that comprehensively reflect the state of the cervical lesions. On this basis, a cervical lesion image feature library covering grades CIN1 to CIN3 is created from histopathology atlases; the bag-level features are compared against the library, and predictions are obtained through label-propagation aggregation. Finally, the lesion prediction is combined with the instance-level prediction to reach the diagnostic decision. The whole pipeline aims to achieve accurate recognition and standardized description of cervical pathology images with the help of AI models, thereby assisting pathological diagnosis.

The embodiments of the present invention are described in detail below:

In this embodiment, 400 private cervical panoramic images are used to train the algorithm of the present invention, and 30 cervical panoramic images of different lesion grades are used to verify the pathological diagnosis performance of the algorithm.

As shown in Figures 1 and 2, the model training procedure comprises the following parts:

S1. Generation of the visual prior guidance map from the cervical tissue panoramic image, implemented as follows:

Step 1. Preprocessing of the cervical tissue panoramic image at low magnification:

Step 101. Use the OpenSlide tool to obtain a thumbnail of the cervical panoramic image at 2.5x magnification;

Step 102. Perform stain normalization on the cervical panoramic image training set:

First convert from RGB to the Lab color space:

L = 0.2126 × R + 0.7152 × G + 0.0722 × B   (19)

a = 0.5 × (R - G) + 128   (20)

b = 0.5 × (R - B) + 128   (21)

where R, G, and B are the red, green, and blue components of each pixel of the RGB image;

The Lab channels are then normalized and standardized with the Z-score, i.e. the mean of each channel is subtracted from its values and the result is divided by the standard deviation; the means and standard deviations of the Lab channels can be obtained as statistics computed over the training dataset:

where L_i is the lightness value of the i-th pixel of the Lab image and N is the total number of pixels; the means and standard deviations of channels a and b are computed similarly;

The normalized Lab image is converted back to the RGB color space to obtain the normalized RGB image:

R = L + 1.402 × (b - 128)   (24)

G = L - 0.3441 × (a - 128) - 0.7141 × (b - 128)   (25)

B = L + 1.772 × (a - 128)   (26)
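A NumPy sketch of Step 102 that follows equations (19)-(26) literally is given below. The text states only that each channel is z-scored with training-set statistics; to return to a displayable RGB range this sketch additionally rescales to a reference mean and standard deviation (a Reinhard-style mapping), which is an added assumption.

    import numpy as np

    def rgb_to_lab(img):                      # equations (19)-(21), as stated above
        r, g, b = img[..., 0], img[..., 1], img[..., 2]
        L = 0.2126 * r + 0.7152 * g + 0.0722 * b
        a = 0.5 * (r - g) + 128
        bb = 0.5 * (r - b) + 128
        return np.stack([L, a, bb], axis=-1)

    def lab_to_rgb(lab):                      # equations (24)-(26), as stated above
        L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
        r = L + 1.402 * (b - 128)
        g = L - 0.3441 * (a - 128) - 0.7141 * (b - 128)
        bl = L + 1.772 * (a - 128)
        return np.clip(np.stack([r, g, bl], axis=-1), 0, 255).astype(np.uint8)

    def stain_normalize(img, ref_mean, ref_std):
        """Z-score each Lab channel, then map to the reference statistics (assumption)."""
        lab = rgb_to_lab(img.astype(np.float64))
        mean = lab.reshape(-1, 3).mean(axis=0)        # per-channel mean of this image
        std = lab.reshape(-1, 3).std(axis=0) + 1e-8   # per-channel standard deviation
        z = (lab - mean) / std                        # Z-score standardization
        lab_norm = z * ref_std + ref_mean             # rescale to the training-set reference
        return lab_to_rgb(lab_norm)

Here ref_mean and ref_std would be the per-channel Lab statistics accumulated over the training set, as described above.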

Step 103. Perform image binarization and outer contour detection:

After obtaining the stain-normalized image, the image is binarized with the OpenCV library, and outer contour detection is performed. This method is based on edge detection and connected-component labeling: by finding the connected components in the image and extracting their outer contours, the outer boundary of the object is detected accurately;

The image is then cropped according to the largest connected component within the detected outer contour C, and the cropped images are rescaled to a uniform size, e.g. 1024×1024;
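Step 103 can be sketched with OpenCV as below; the use of an Otsu threshold for the binarization and the assumption that the tissue is darker than the slide background are implementation choices, not requirements of the text.

    import cv2

    def crop_tissue(img_bgr, out_size=1024):
        """Binarize, find the largest outer contour, crop to it, and resize."""
        gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        largest = max(contours, key=cv2.contourArea)       # largest connected tissue region
        x, y, w, h = cv2.boundingRect(largest)
        crop = img_bgr[y:y + h, x:x + w]
        resized = cv2.resize(crop, (out_size, out_size), interpolation=cv2.INTER_AREA)
        return resized, largest                             # the contour is reused in Step 2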

Step 2. Compute the boundary distance of the tissue block in the cervical tissue image;

Step 201. Compute the centroid of the tissue block, determining its position from the contour area and the first-order moments:

From the outer contour of the tissue slice obtained in the first step, the area A enclosed by contour C and the first-order moments M_10 and M_01 are computed as follows:

A_i = contourArea(C_i)   (27)

where C_i denotes the i-th contour and x and y are the horizontal and vertical pixel coordinates;

The centroid (cX, cY) is then computed from the contour area and the first-order moments:

Step 202. Determine the pixel distance representation: by computing, for each pixel inside the contour, the ratio of its distance to the lines through the centroid, image-level boundary distance representations in two directions are obtained:

From the tissue contour C_i, the four vertices P_1(x_1, y_1), P_2(x_2, y_2), P_3(x_3, y_3), P_4(x_4, y_4) of its minimum-area bounding rectangle R are found, expressed as:

{P_1, P_2, P_3, P_4} = minAreaRect(C_i)   (32)

The equations of the lines l_1, l_2 through the centroid and parallel to the sides of the bounding rectangle are then computed, and the distances dist_1, dist_2 from each pixel inside the contour to these lines are obtained as follows:

where (x_p, y_p) is a pixel inside the contour C_i;

The ratio between the distance of an interior pixel and the distance of the corresponding contour point is taken as the distance representation of that pixel, so that pixels closer to the contour have a value closer to 1, while pixels outside the contour are assigned -1, yielding image-level distance representations in the two directions, denoted I_dist1 and I_dist2.
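One plausible implementation of Steps 201-202 is sketched below with OpenCV and NumPy. The normalization ("ratio of the interior-pixel distance to the distance of the contour point") is read here as dividing by the maximum distance reached on the contour along the same direction; that reading, and the use of cv2.moments for the area and first-order moments, are assumptions.

    import cv2
    import numpy as np

    def boundary_distance_maps(contour, shape):
        """Return I_dist1, I_dist2: in-contour distances to the two centroid lines,
        normalized so contour-adjacent pixels approach 1; outside pixels are -1."""
        m = cv2.moments(contour)                           # gives m00 (area), m10, m01
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # centroid (cX, cY)
        _, _, angle = cv2.minAreaRect(contour)             # orientation of the bounding rectangle
        theta = np.deg2rad(angle)
        dirs = [np.array([np.cos(theta), np.sin(theta)]),  # unit vectors parallel to the
                np.array([-np.sin(theta), np.cos(theta)])] # two rectangle sides

        mask = np.zeros(shape, np.uint8)
        cv2.drawContours(mask, [contour], -1, 1, thickness=-1)   # 1 inside the contour
        ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
        rel = np.stack([xs - cx, ys - cy], axis=-1).astype(np.float64)
        pts = contour.reshape(-1, 2).astype(np.float64) - np.array([cx, cy])

        maps = []
        for d in dirs:
            normal = np.array([-d[1], d[0]])
            dist = np.abs(rel @ normal)                           # distance to the centroid line
            max_contour_dist = np.abs(pts @ normal).max() + 1e-8  # farthest contour point
            maps.append(np.where(mask == 1, dist / max_contour_dist, -1.0))
        return maps[0], maps[1]                                   # I_dist1, I_dist2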

步骤3、宫颈组织图像组织块视觉先验感兴趣区域(ROI)硬划分:Step 3: Hard segmentation of the visual prior region of interest (ROI) of the cervical tissue image tissue block:

步骤301、通道二值化:Step 301: Channel binarization:

首先利用大津阈值(OTSU)全局阈值法分别对宫颈组织图像组织块中的红色(R)和蓝色(B)通道进行二值化处理,对于任一像素点p的红色通道灰度值Rp和蓝色通道灰度值Bp,计算各自通道的最佳阈值TR和TB,使前景(染色质密集区)和背景之间的类间方差最大,计算公式如下:First, the red (R) and blue (B) channels in the tissue block of the cervical tissue image are binarized using the Otsu threshold (OTSU) global threshold method. For the red channel grayscale value Rp and blue channel grayscale value Bp of any pixel point p, the optimal thresholds TR and TB of each channel are calculated to maximize the inter-class variance between the foreground (chromatin dense area) and the background. The calculation formula is as follows:

其中ω0和ω1分别是背景和前景的像素占比,μ0和μ1分别是背景和前景的平均灰度值;Where ω 0 and ω 1 are the pixel proportions of the background and foreground, respectively, and μ 0 and μ 1 are the average grayscale values of the background and foreground, respectively;

接着根据计算得到的阈值,将每个通道的像素值与其阈值比较,大于阈值的设为1(视为染色质密集区),小于等于阈值的设为0(视为背景),完成红色和蓝色通道的二值化分割,二值化图像分别记为Rbin,BbinThen, according to the calculated threshold, the pixel value of each channel is compared with its threshold. The pixel value greater than the threshold is set to 1 (considered as chromatin-dense area), and the pixel value less than or equal to the threshold is set to 0 (considered as background). The binary segmentation of the red and blue channels is completed, and the binary images are recorded as R bin and B bin respectively;

步骤302、梯度增强与分割:Step 302: Gradient enhancement and segmentation:

计算经过模糊去噪等预处理后的灰度图像Igrey的梯度增强图像G(x,y),采用Sobel算子求得水平和垂直方向梯度再求得梯度幅值图像M(x,y),公式如下:Calculate the gradient enhanced image G(x,y) of the grayscale image I grey after fuzzy denoising and other preprocessing, use the Sobel operator to obtain the horizontal and vertical gradients and then obtain the gradient magnitude image M(x,y), the formula is as follows:

应用分水岭变化算法对梯度图像进行处理,寻找图像中梯度突变的区域,通过构建拓扑图,设定阈值分割值域,获得分割区域,并转化为二值掩膜:The watershed change algorithm is applied to process the gradient image to find the area where the gradient changes suddenly in the image. By constructing a topological map and setting the threshold segmentation range, the segmentation area is obtained and converted into a binary mask:

Maskw=Fliter(Watered(Igrey,M(x,y))) (36)Mask w = Filter(Watered(I grey ,M(x,y))) (36)

最后综合划定感兴趣区域,将上述三种二值化图像进行逻辑运算,得到宫颈组织切片感兴趣区域(ROI)的硬划分ROI(i,j):Finally, the region of interest is comprehensively delineated, and the above three binary images are logically operated to obtain the hard segmentation ROI (i, j) of the region of interest (ROI) of the cervical tissue section:

MaskC(i,j)=Rbin(i,j)∨Bbin(i,j) (37)Mask C (i,j)=R bin (i,j)∨B bin (i,j) (37)

ROI(i,j)=MaskC(i,j)∧MaskW(i,j) (38)ROI(i,j)=Mask C (i,j)∧Mask W (i,j) (38)
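
The gradient-enhancement and mask-combination steps (formulas (36)-(38)) can be sketched as follows, assuming scikit-image's watershed and a mean-gradient filtering rule for the Filter(·) step; the threshold value and the filtering rule are assumptions, not the patented implementation.

```python
import cv2
import numpy as np
from skimage.segmentation import watershed

def roi_hard_mask(gray, r_bin, b_bin, grad_thresh=0.3):
    """Sketch of Step 302 and formulas (36)-(38): Sobel gradient magnitude,
    watershed regions filtered by mean gradient, then logical combination."""
    # Sobel gradients and gradient-magnitude image M(x, y).
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    mag /= (mag.max() + 1e-6)

    # Watershed over the gradient image; keep high-gradient regions (assumed
    # realisation of the Filter step).
    labels = watershed(mag)
    mask_w = np.zeros_like(gray, dtype=np.uint8)
    for lab in np.unique(labels):
        region = labels == lab
        if mag[region].mean() > grad_thresh:
            mask_w[region] = 1

    mask_c = np.logical_or(r_bin, b_bin)   # (37)  Mask_C = R_bin OR B_bin
    roi = np.logical_and(mask_c, mask_w)   # (38)  ROI = Mask_C AND Mask_W
    return roi.astype(np.uint8)
```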

步骤4、宫颈组织图像组织块视觉先验引导图生成:Step 4: Generate visual prior guidance map of cervical tissue image tissue block:

综合上述宫颈组织图像组织块边界距离计算阶段与组织块视觉先验感兴趣区域(ROI)硬划分阶段两阶段得到的图像级边界距离表示Idist1,Idist2和ROI硬划分ROI(i,j),通过加权计算得到视觉先验引导图Iactivate,公式如下:Combining the image-level boundary distance representations I dist1 , I dist2 and ROI hard segmentation ROI(i,j) obtained in the above two stages of the cervical tissue image tissue block boundary distance calculation stage and the tissue block visual prior region of interest (ROI) hard segmentation stage, the visual prior guidance map I activate is obtained through weighted calculation, and the formula is as follows:

其中,α,β,γ分别是值在0~1之间的超参数,softmax将向量映射为一个概率分布,确保每个元素取值在0~1之间;Among them, α, β, and γ are hyperparameters with values between 0 and 1, and softmax maps the vector to a probability distribution, ensuring that every element lies between 0 and 1;
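
Since the weighted formula for I activate is shown only as an image in the filing, the sketch below assumes a pixel-wise weighted sum of I dist1 , I dist2 and ROI(i,j) followed by a softmax normalisation; the combination rule and the default weights are assumptions.

```python
import numpy as np

def visual_prior_guidance(i_dist1, i_dist2, roi, alpha=0.4, beta=0.4, gamma=0.2):
    """Sketch of Step 4: weighted combination of the two distance maps and
    the hard ROI mask, mapped to (0, 1) with a softmax over all pixels."""
    combined = alpha * i_dist1 + beta * i_dist2 + gamma * roi
    flat = combined.reshape(-1)
    flat = np.exp(flat - flat.max())     # numerically stable softmax
    flat = flat / flat.sum()
    return flat.reshape(combined.shape)  # I_activate
```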

S2、宫颈组织图像实例级特征提取,具体步骤如下:S2. Instance-level feature extraction of cervical tissue images. The specific steps are as follows:

步骤5、网格采样与视觉先验位置编码:Step 5: Grid sampling and visual prior position encoding:

首先,将视觉先验引导图Iactivate与宫颈组织切片对齐;分别在5x倍率下获取宫颈组织全景图像和40x倍率下获取宫颈组织切片集;再将步骤4中得到的视觉先验引导图Iactivate利用双线性插值法将权重上采样,与5x倍率下的宫颈组织全景图像对齐,本发明用PyTorch中的工具类Upsample实现,其原理如下:First, the visual prior guidance map I activate is aligned with the cervical tissue slices: a panoramic image of the cervical tissue is acquired at 5x magnification and a set of cervical tissue patches at 40x magnification; the visual prior guidance map I activate obtained in Step 4 is then upsampled by bilinear interpolation and aligned with the 5x panoramic image. In this embodiment this is implemented with the PyTorch utility class Upsample, whose principle is as follows:

其中Q11(x1,y1),Q12(x1,y2),Q21(x2,y1),Q22(x2,y2)为原图像相邻四点;Among them, Q 11 (x 1 ,y 1 ),Q 12 (x 1 ,y 2 ),Q 21 (x 2 ,y 1 ),Q 22 (x 2 ,y 2 ) are four adjacent points of the original image;

将5x倍率下的宫颈组织全景图像与插值后的权重图一并切片,并与40x倍率下的组织切片对齐,再次将切片进行上采样,使权重与切片形成映射关系,构建先验视觉引导切片集Iactivate_patch;The 5x panoramic image of the cervical tissue and the interpolated weight map are tiled into patches together and aligned with the 40x tissue patches; the patches are upsampled once more and a mapping between weights and patches is established, constructing the prior visual guidance patch set I activate_patch ;
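
A minimal PyTorch sketch of the bilinear alignment in Step 5 is given below; it uses torch.nn.functional.interpolate (the functional form of the Upsample class mentioned above). The tiling into I activate_patch is only indicated in the usage comment and its exact rule is an assumption.

```python
import torch
import torch.nn.functional as F

def align_guidance_to_wsi(i_activate, target_hw):
    """Sketch of the bilinear upsampling used in Step 5 to align the guidance
    map with the 5x panoramic image; implements the Q11..Q22 interpolation."""
    x = torch.as_tensor(i_activate, dtype=torch.float32)[None, None]  # 1x1xHxW
    up = F.interpolate(x, size=target_hw, mode="bilinear", align_corners=False)
    return up[0, 0]

# Hypothetical usage: tile the upsampled weight map so that each 40x patch
# inherits one guidance weight, giving the patch set I_activate_patch.
```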

再通过位置编码器构建视觉先验引导位置编码Fposition,该编码器由全连接层(FC)、多头注意力层(MHSA)、层归一化(LN)构建,其计算过程表示如下:Then, the visual prior-guided position encoding F position is constructed through the position encoder. The encoder is constructed by a fully connected layer (FC), a multi-head attention layer (MHSA), and a layer normalization (LN). The calculation process is shown as follows:

Fposition=LN(MHSA(Iactivate_patchWPE)) (42)F position = LN(MHSA(I activate_patch W PE ))
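
A possible realisation of the position encoder of formula (42) is sketched below; the embedding dimension, head count and the placement of the FC projection W PE are assumed values.

```python
import torch
import torch.nn as nn

class PositionEncoder(nn.Module):
    """Sketch of formula (42): FC projection W_PE, multi-head self-attention
    and layer normalisation producing F_position."""
    def __init__(self, in_dim, dim=512, heads=8):
        super().__init__()
        self.w_pe = nn.Linear(in_dim, dim)                 # FC / W_PE
        self.mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln = nn.LayerNorm(dim)

    def forward(self, i_activate_patch):                   # (B, N, in_dim)
        x = self.w_pe(i_activate_patch)
        attn_out, _ = self.mhsa(x, x, x)
        return self.ln(attn_out)                           # F_position
```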

步骤6、图像特征提取:用预训练后的CLIP图像编码器(ViT)提取40x放大倍率下的宫颈组织切片图像特征Fimage_patchStep 6, image feature extraction: Use the pre-trained CLIP image encoder (ViT) to extract the image features F image_patch of cervical tissue slices at 40x magnification;

步骤7、将生成的视觉先验位置编码特征Fposition与宫颈组织切片图像特征Fimage_patch进行拼接操作,再通过全连接层(FC)得到宫颈组织图像实例级特征Fimage_instance,公式表示如下:Step 7: Concatenate the generated visual prior position encoding feature F position with the cervical tissue slice image feature F image_patch , and then obtain the cervical tissue image instance-level feature F image_instance through a fully connected layer (FC). The formula is as follows:

Fimage_instance=Concat(Fimage_patch,Fposition)Wfeat (43)F image_instance = Concat(F image_patch ,F position )W feat (43)
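
The fusion of formula (43) can be sketched as follows, with `clip_image_encoder` standing in for the frozen pre-trained CLIP image encoder (ViT); the encoder interface and the feature dimensions are assumptions.

```python
import torch
import torch.nn as nn

class InstanceFeatureFusion(nn.Module):
    """Sketch of Steps 6-7 / formula (43): frozen CLIP image features are
    concatenated with F_position and projected by the FC weight W_feat."""
    def __init__(self, clip_image_encoder, img_dim=512, pos_dim=512, out_dim=512):
        super().__init__()
        self.encoder = clip_image_encoder          # assumed to be an nn.Module
        for p in self.encoder.parameters():
            p.requires_grad_(False)                 # frozen weights
        self.w_feat = nn.Linear(img_dim + pos_dim, out_dim)

    def forward(self, patches, f_position):
        with torch.no_grad():
            f_image_patch = self.encoder(patches)   # (N, img_dim)
        fused = torch.cat([f_image_patch, f_position], dim=-1)
        return self.w_feat(fused)                   # F_image_instance
```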

S3、宫颈组织图像关键实例特征提取:S3. Key instance feature extraction of cervical tissue images:

步骤8、实例级特征处理:将宫颈组织图像实例级特征Fimage_instance进行层归一化(Layer Normalization)操作,然后引入多头注意力模块(MHSA)计算注意力分数αimage_instance,并与实例级特征Fimage_instance进行加权,得到视觉引导的加权实例特征Fimage_instance′,公式如下:Step 8, instance-level feature processing: apply layer normalization to the instance-level feature F image_instance of the cervical tissue image, then introduce a multi-head self-attention module (MHSA) to compute the attention score α image_instance , which weights the instance-level feature F image_instance to obtain the visually guided weighted instance feature F image_instance ′; the formulas are as follows:

αimage_instance=MHSA(LN(Fimage_instance)) (44)α image_instance =MHSA(LN(F image_instance )) (44)

Fimage_instance′=αimage_instanceFimage_instance (45)F image_instance ′=α image_instance F image_instance (45)

步骤9、关键特征筛选:视觉引导的加权实例特征Fimage_instance′分别通过池化层(MaxPooling)与全连接层(FC)得到实例的类别概率Pimage_instance,根据该分值选择前K个实例级特征作为关键实例特征FTOPK_instance,公式表示如下:Step 9, key feature screening: The visually guided weighted instance feature F image_instance ′ is respectively passed through the pooling layer (MaxPooling) and the fully connected layer (FC) to obtain the instance category probability P image_instance . According to the score, the top K instance-level features are selected as the key instance features F TOPK_instance . The formula is as follows:

Pimage_instance=σ(Pool(Fimage_instance′)Wimage_instance) (46)P image_instance =σ(Pool(F image_instance ′)W image_instance ) (46)
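
Formulas (44)-(46) and the top-K selection can be sketched as below; the class head, the handling of the MaxPooling step and the value of K are assumptions.

```python
import torch
import torch.nn as nn

class KeyInstanceSelector(nn.Module):
    """Sketch of Steps 8-9: LayerNorm + MHSA attention scores weight the
    instances, an FC head gives per-instance class probabilities, and the
    top-K instances are kept as key instance features."""
    def __init__(self, dim=512, heads=8, num_classes=2, k=16):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fc = nn.Linear(dim, num_classes)       # W_image_instance
        self.k = k

    def forward(self, f_instance):                  # (B, N, dim)
        x = self.ln(f_instance)
        alpha, _ = self.mhsa(x, x, x)               # (44) attention scores
        f_weighted = alpha * f_instance             # (45) F_image_instance'
        # (46) instance class probability; the MaxPooling step is folded into
        # this per-instance scoring for brevity.
        p = torch.sigmoid(self.fc(f_weighted)).amax(dim=-1)   # (B, N)
        idx = torch.topk(p, k=min(self.k, p.shape[1]), dim=1).indices
        f_topk = torch.gather(
            f_weighted, 1,
            idx.unsqueeze(-1).expand(-1, -1, f_weighted.shape[-1]))
        return f_topk, p                            # F_TOPK_instance, P_image_instance
```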

再将视觉引导的加权实例特征Fimage_instance′随机遮盖形成特征序列Fmask,通过设计位置解码器PD,将包含空间及位置信息的特征Fimage_instance′解码,利用视觉引导图Iactivate前景和背景信息作为监督信息,进行掩膜位置重构,进而更新视觉引导图Iactivate′,提升空间分布语义一致性和泛化能力,通过计算分布损失Lposition,进而不断更新迭代,公式表示如下:Then, the visually guided weighted instance feature F image_instance ′ is randomly masked to form a feature sequence F mask . By designing a position decoder PD, the feature F image_instance ′ containing spatial and position information is decoded. The foreground and background information of the visual guidance map I activate is used as supervision information to reconstruct the mask position, and then the visual guidance map I activate ′ is updated to improve the spatial distribution semantic consistency and generalization ability. The distribution loss L position is calculated and then continuously updated and iterated. The formula is as follows:

(F′,Fp′)=LN(MHSA(Fmask))WPD (49)(F′,F p ′)=LN(MHSA(F mask ))W PD (49)

S4、构建宫颈组织图像实例级文本提示描述:S4. Constructing instance-level textual description of cervical tissue images:

步骤10、文本描述生成:利用ChatGPT4针对宫颈组织病理病变分级发生部位及其正/异常特征生成系列文本提示信息,提示ChatGPT4按细胞类型、细胞形态描述、是否有异常这三类信息描述并整合为实例级文本描述信息提示,示例如下:Step 10, text description generation: ChatGPT4 is used to generate a series of text prompts covering the sites where graded cervical histopathological lesions occur and their normal/abnormal characteristics; ChatGPT4 is prompted to describe three kinds of information, namely cell type, cell morphology and whether an abnormality is present, and to integrate them into instance-level text description prompts, for example:

“([Cervical epithelial basal layer cells],[Columnar or cuboidal, tightly attached to the basement membrane, with nuclei that are elongated and relatively uniform in size, arranged in a regular pattern along the basement membrane],[Normal])”,“([Intermediate and superficial layers of cervical epithelium],[Intermediate layer cells begin to flatten and pack densely, without significant thickening, exhibiting uneven chromatin distribution, irregular nuclear membranes, darker staining.],[Abnormal])”,“([Cervical glandular epithelium],[Columnar cells form the glandular structures, with nuclei positioned at their bases, displaying disordered arrangement, large and irregular nuclei, increased cytoplasm, uneven chromatin distribution, and darkly stained nuclei.],[Abnormal])”,“([Cervical stroma],[Located beneath the epithelial layer is the connective tissue, which contains sparse smooth muscle cells arranged loosely. The cells exhibit uniformly sized nuclei with light staining, contributing to an overall lighter hue.],[Unknown])”,……;

步骤11、文本特征转化:将文本描述经过预训练的CLIP文本编码器转化为文本特征,与S3阶段提取的宫颈组织关键实例特征FTOPK_instance计算余弦相似度,以建立实例级图像和文本之间的关联;Step 11: Text feature conversion: The text description is converted into text features through the pre-trained CLIP text encoder, and the cosine similarity is calculated with the key instance feature F TOPK_instance of cervical tissue extracted in the S3 stage to establish the association between instance-level images and texts;

S5、实例级图像特征与文本特征对齐:S5. Alignment of instance-level image features and text features:

步骤12、特征对齐与优化:Step 12: Feature alignment and optimization:

本实施例将预训练好的CLIP图像编码器与文本编码器冻结权重,设置10个Token作为经验设定值,该Token由文本Token Ttext_token与关键实例特征FTOPK_instance经两层MLP得到的图像Token拼接得到,记为Ttoken={t1,t2,…,t10},对应的实例级标签记为Y={y1,y2,…,y10},通过交叉熵损失(Cross-Entropy Loss)来衡量预测结果和标签间差异,并使用反向传播误差来更新这些Token,公式如下:In this embodiment the weights of the pre-trained CLIP image encoder and text encoder are frozen, and 10 tokens are used as an empirically set value. These tokens are obtained by concatenating the text tokens T text_token with the image tokens produced by passing the key instance features F TOPK_instance through a two-layer MLP, denoted T token = {t 1 , t 2 , …, t 10 }, with corresponding instance-level labels Y = {y 1 , y 2 , …, y 10 }. Cross-entropy loss is used to measure the difference between predictions and labels, and the back-propagated error is used to update these tokens; the formulas are as follows:

Ttoken=Concat(LightNet(FTOPK_instance),Ttext_token) (51)T token = Concat(LightNet(F TOPK_instance ),T text_token ) (51)

其中,pi是模型对于第i个Token预测为对应类别标签的概率,即Among them, pi is the probability that the model predicts the corresponding category label for the i-th Token, that is,

pi=P(ti|FTOPK_instance,Ttoken);p i =P(t i |F TOPK_instance ,T token );
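
A sketch of the learnable-token setup of formula (51) and the cross-entropy update is given below; the classification head used to score each token and the token dimensions are assumptions (the embodiment above uses 10 tokens in total).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptTokenLearner(nn.Module):
    """Sketch of Step 12 / formula (51): a two-layer MLP (LightNet) turns
    F_TOPK_instance into image tokens, which are concatenated with learnable
    text tokens; cross-entropy over token-level labels updates the tokens."""
    def __init__(self, feat_dim=512, token_dim=512, n_text_tokens=4, num_classes=2):
        super().__init__()
        self.lightnet = nn.Sequential(               # two-layer MLP
            nn.Linear(feat_dim, token_dim), nn.ReLU(), nn.Linear(token_dim, token_dim))
        self.text_tokens = nn.Parameter(torch.randn(n_text_tokens, token_dim) * 0.02)
        self.head = nn.Linear(token_dim, num_classes)  # assumed scoring head

    def forward(self, f_topk_instance, labels):       # (K, feat_dim), (K + n_text,)
        image_tokens = self.lightnet(f_topk_instance)
        t_token = torch.cat([image_tokens, self.text_tokens], dim=0)   # (51)
        logits = self.head(t_token)
        return F.cross_entropy(logits, labels), t_token
```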

根据步骤11,其目标函数构建如下:According to step 11, its objective function is constructed as follows:

Lcontrastive=(Ltext2image+Limage2text)/2 (56)L contrastive = (L text2image + L image2text )/2 (56)

其中,Ii是第i个文本描述的嵌入向量,Tj是第j个切片图像嵌入向量,文本到图像对比损失表示为Ltext2image,图像到文本对比损失表示为Limage2text,整个相似度对比损失表示为LcontrastiveWhere Ii is the embedding vector of the i-th text description, Tj is the embedding vector of the j-th slice image, the text-to-image contrast loss is denoted as Ltext2image , the image-to-text contrast loss is denoted as Limage2text , and the overall similarity contrast loss is denoted as Lcontrastive ;
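
The symmetric contrastive objective of formula (56) corresponds to a CLIP-style InfoNCE loss; a sketch is given below, where the temperature value is an assumed hyper-parameter.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Sketch of formula (56): symmetric contrastive loss over L2-normalised
    instance image embeddings and text embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature   # cosine similarities
    targets = torch.arange(logits.shape[0], device=logits.device)
    l_text2image = F.cross_entropy(logits, targets)
    l_image2text = F.cross_entropy(logits.t(), targets)
    return (l_text2image + l_image2text) / 2          # L_contrastive
```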

S6、实例级特征聚合:S6. Instance-level feature aggregation:

步骤13、特征加权聚合:通过计算实例级图像特征与文本特征之间的相似度分数,对S3中提取的关键实例特征进行加权聚合,形成包级特征FBagStep 13: Weighted feature aggregation: By calculating the similarity scores between instance-level image features and text features, the key instance features extracted from S3 are weightedly aggregated to form bag-level features F Bag ;
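
A sketch of the similarity-weighted aggregation in Step 13 follows; the softmax weighting over the per-instance similarity scores is an assumption, since the aggregation formula is not reproduced above.

```python
import torch
import torch.nn.functional as F

def aggregate_bag_feature(f_topk_instance, text_features):
    """Sketch of Step 13: key instance features are weighted by their cosine
    similarity to the matched text features and summed into F_Bag."""
    img = F.normalize(f_topk_instance, dim=-1)        # (K, D)
    txt = F.normalize(text_features, dim=-1)          # (K, D) or (1, D), broadcastable
    sim = (img * txt).sum(dim=-1)                     # per-instance similarity score
    weights = torch.softmax(sim, dim=0)               # assumed weighting rule
    return (weights.unsqueeze(-1) * f_topk_instance).sum(dim=0)   # F_Bag
```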

S7、构建基于全切片图像(WSI)级别的病变提示库:S7. Construct a lesion prompt library based on the whole slice image (WSI) level:

步骤14、病变库建立:通过系统性采集宫颈组织病理学图谱中的宫颈组织病理分级病变特征描述及图例,将其视为权威的专家知识储备,再进一步运用先进的人工智能模型ChatGPT4对这些专业知识进行深度整合与梳理,旨在将医学专家的传统主观性描述方式转换为标准化、智能化的描述体系,从而有效弥合知识传递过程中的差距,示例如下:Step 14: Establishment of lesion library: By systematically collecting the descriptions and legends of cervical histopathology lesions in the cervical histopathology atlas, we regard them as authoritative expert knowledge reserves, and further use the advanced artificial intelligence model ChatGPT4 to deeply integrate and sort out these professional knowledge, aiming to transform the traditional subjective description method of medical experts into a standardized and intelligent description system, thereby effectively bridging the gap in the knowledge transfer process. The examples are as follows:

“([CIN1],[Slight nuclear enlargement, mild chromatin condensation, and slightly enlarged nucleoli. Cells show minor disarray but maintain a single-layer structure, mainly affecting the lower third of the cervical epithelium,…],[Abnormal])”,“([CIN2],[The lesion primarily involves the intermediate to deep layers of the cervical epithelium, where despite the basal layer structure typically remaining relatively intact, cells immediately adjacent to the basal layer exhibit pronounced abnormalities, manifesting as enlarged nuclei, increased nuclear staining intensity, heightened nucleo-cytoplasmic ratio…],[Abnormal])”,“([CIN3],[The lesion involves the deep layers and potentially the entire epithelium of the cervix, accompanied by significant increases in cellular atypia. Although the basal cell layer structure may be locally preserved, there are marked changes suggestive of malignant potential in adjacent regions as well as in intermediate to deep layers of cells. This is characterized by notably enlarged and irregularly shaped nuclei, with intense and dense nuclear staining, severe imbalance in the nucleus-to-cytoplasm ratio, prominently abnormal nucleolar structures, and can even manifest as multinucleation or bizarre nuclei…],[Abnormal])”,……;

再利用预训练的CLIP图像编码器和文本编码器提取对应的图像特征Fkey和文本特征Fvalue,构建图像特征与文本特征对,形成多个键值对(Key-Value),记为Fkey_value;Then the pre-trained CLIP image encoder and text encoder are used to extract the corresponding image features F key and text features F value ; image-text feature pairs are constructed to form multiple key-value pairs, denoted F key_value ;

S8、包级图像特征检索文本特征:S8. Package-level image feature retrieval text feature:

步骤15、图文检索:将S6中的包级特征FBag与S7中的图像特征Fkey计算相似度分数ABag_key,利用键值对Fkey_value,通过标签传播的方式与Fvalue进行聚合形成预测FBag′实现知识的检索,公式如下:Step 15, image and text retrieval: Calculate the similarity score A Bag_key between the bag-level feature F Bag in S6 and the image feature F key in S7, and use the key-value pair F key_value to aggregate with F value through label propagation to form the prediction F Bag ′ to realize knowledge retrieval. The formula is as follows:

ABag_key=exp(-β(1-FBagFkey T)) (57)A Bag_key = exp(-β(1-F Bag F key T )) (57)

FBag′=ABag_keyFvalue (58)F Bag ′=A Bag_key F value (58)
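
Formulas (57)-(58) translate directly into the retrieval sketch below; the L2 normalisation of the features and the value of β are assumptions.

```python
import torch
import torch.nn.functional as F

def retrieve_from_prompt_library(f_bag, f_key, f_value, beta=5.0):
    """Sketch of formulas (57)-(58): the affinity between the bag feature and
    the library image keys propagates the stored values to form F_Bag'."""
    f_bag = F.normalize(f_bag, dim=-1)                # (1, D)
    f_key = F.normalize(f_key, dim=-1)                # (M, D) library keys
    a_bag_key = torch.exp(-beta * (1.0 - f_bag @ f_key.t()))   # (57)
    return a_bag_key @ f_value                        # (58) F_Bag'
```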

S9、病变诊断决策:S9. Lesion diagnosis decision:

步骤16、预测与决策:根据S8阶段得到的预测FBag′和原始包特征FBag通过由两层MLP构成的轻量级网络LightNet得到预测特征,LightNet权重记为Wc,两者进行残差连接加权求和获得最终的预测结果logitsBag,该结果同时获得了实例级数据信息和Few-shot知识,预测结果计算公式如下:Step 16, prediction and decision: According to the prediction F Bag ′ obtained in the S8 stage and the original bag feature F Bag , the prediction feature is obtained through the lightweight network LightNet composed of two layers of MLP. The LightNet weight is recorded as W c . The two are weighted summed by residual connection to obtain the final prediction result logits Bag . This result obtains both instance-level data information and Few-shot knowledge. The prediction result calculation formula is as follows:

根据该类别预测判断病变所属的等级范围,并结合S7构建的病变提示库对应描述信息作为最终诊断依据。The level range of the lesion is determined based on the category prediction, and the corresponding description information of the lesion prompt library constructed by S7 is used as the final diagnosis basis.
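
A sketch of the Step 16 decision head is shown below; it assumes that F Bag ′ already lies in the class-logit space (e.g. F value stored as class indicator vectors) and uses an assumed mixing weight for the residual weighted sum, since the exact formula is shown only as an image in the filing.

```python
import torch
import torch.nn as nn

class LesionDecision(nn.Module):
    """Sketch of Step 16: a two-layer MLP (LightNet, weights W_c) over the
    original bag feature F_Bag, combined with the retrieved prediction F_Bag'
    through a residual weighted sum to give logits_Bag."""
    def __init__(self, dim=512, num_classes=4, mix=0.5):
        super().__init__()
        self.lightnet = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, num_classes))
        self.mix = mix                                 # assumed mixing weight

    def forward(self, f_bag, f_bag_retrieved):
        logits_data = self.lightnet(f_bag)             # instance-level data information
        logits_fewshot = f_bag_retrieved               # Few-shot knowledge from the library
        return self.mix * logits_data + (1 - self.mix) * logits_fewshot   # logits_Bag
```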

最终使用3张不同病变等级的WSI演示算法诊断结果,实现效果如图4至图9所示,清晰可见视觉引导先验对于病变识别起到了一定的指导作用。将视觉与文本模型相结合应用于宫颈组织病理的精细化诊断中,该方法成功契合了临床实践的需求,不仅能精准提供诊断结果,还能同时揭示支撑诊断结论的详细依据。Finally, the algorithm diagnosis results were demonstrated using three WSIs of different lesion levels, and the results are shown in Figures 4 to 9. It is clear that the visual guidance prior plays a certain guiding role in lesion identification. The combination of visual and text models is applied to the refined diagnosis of cervical tissue pathology. This method successfully meets the needs of clinical practice, not only accurately providing diagnostic results, but also revealing the detailed basis supporting the diagnostic conclusions.

本发明还可有其它多种实施例,在不背离本发明精神及其实质的情况下,本领域技术人员当可根据本发明做出各种相应的改变和变形,但这些相应的改变和变形都应属于本发明范围。The present invention may also have other various embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art may make various corresponding changes and modifications based on the present invention, but these corresponding changes and modifications should all fall within the scope of the present invention.

Claims (6)

1.基于视觉引导及语言提示的宫颈全景图像少样本分类方法,其特征在于,包含视觉先验引导、实例级特征提取、关键实例筛选、文本描述构建及图像文本对齐、病变等级判定等多个阶段:1. A few-sample classification method for panoramic cervical images based on visual guidance and language prompts, characterized by including multiple stages such as visual prior guidance, instance-level feature extraction, key instance screening, text description construction and image-text alignment, and lesion grade determination: 所述方法包含步骤:The method comprises the steps of: S1、宫颈组织全景图像视觉先验引导图生成阶段:S1. Cervical tissue panoramic image visual priori guidance map generation stage: 步骤1、宫颈全景图像预处理:获取并缩放宫颈全景图像缩略图,进行染色归一化、图像二值化与外轮廓检测;Step 1: Preprocessing of cervical panoramic images: obtaining and scaling cervical panoramic image thumbnails, performing dye normalization, image binarization, and outer contour detection; 步骤2、宫颈组织图像边界距离计算:计算组织块质心,确定像素距离表示;Step 2: Calculation of the boundary distance of the cervical tissue image: Calculate the centroid of the tissue block and determine the pixel distance representation; 步骤3、宫颈组织图像ROI硬划分:利用OTSU阈值法对红蓝通道进行二值化处理,通过Sobel算子计算梯度图像,应用分水岭变化算法分割出梯度突变区域,形成ROI的硬划分掩膜;Step 3: Hard segmentation of ROI of cervical tissue image: Binarization of red and blue channels is performed using the OTSU threshold method, gradient image is calculated using the Sobel operator, and the watershed change algorithm is applied to segment the gradient mutation area to form a hard segmentation mask of ROI; 步骤4、视觉先验引导图生成:结合步骤2得到的边界距离表示与步骤3得到的ROI硬化分掩膜,通过加权及逻辑运算生成视觉先验引导图;Step 4: Generate a visual priori guidance map: Combine the boundary distance representation obtained in step 2 with the ROI hardening sub-mask obtained in step 3, and generate a visual priori guidance map through weighted and logical operations; S2、宫颈组织图像实例级特征提取阶段:S2, cervical tissue image instance-level feature extraction stage: 步骤5、网格采样与视觉先验位置编码:将视觉先验引导图与宫颈组织全景图像及高倍率组织切片对齐,利用双线性插值进行上采样和切片操作,通过位置编码器构建视觉先验引导位置编码;Step 5, grid sampling and visual prior position encoding: align the visual prior guidance map with the panoramic image of cervical tissue and high-magnification tissue slices, use bilinear interpolation to perform upsampling and slicing operations, and construct the visual prior guidance position encoding through the position encoder; 步骤6、图像特征提取:使用预训练的CLIP图像编码器(ViT)模型提取40x放大倍率下组织切片的图像特征;Step 6: Image feature extraction: Use the pre-trained CLIP image encoder (ViT) model to extract image features of tissue sections at 40x magnification; 步骤7、特征融合:将视觉先验位置编码特征与图像特征进行拼接,并通过全连接层(FC)生成实例级特征;Step 7: Feature fusion: concatenate the visual prior position encoding features with the image features, and generate instance-level features through a fully connected layer (FC); S3、宫颈组织关键实例特征提取阶段:S3, Cervical tissue key instance feature extraction stage: 步骤8、实例级特征处理:对实例级特征进行层归一化,并通过多头注意力机制(MHSA)计算注意力得分,生成视觉引导的加权实例特征;Step 8: Instance-level feature processing: perform layer normalization on instance-level features, calculate attention scores through the multi-head attention mechanism (MHSA), and generate visually guided weighted instance features; 步骤9、关键特征筛选:利用池化层与全连接层得到实例的类别概率,并根据概率值选取前K个实例级特征作为关键实例特征;Step 9: Key feature screening: Use the pooling layer and the fully connected layer to obtain the category probability of the instance, and select the top K instance-level features as key instance features based on the probability value; S4、构建宫颈组织实例级文本提示描述阶段:S4: Constructing instance-level text prompt description of cervical tissue: 步骤10、文本描述生成:使用ChatGPT4生成针对宫颈组织病理病变部位和正异常特征的实例级文本描述信息提示;Step 10: Text description generation: Use ChatGPT4 to generate 
instance-level text description information prompts for cervical tissue pathological lesion sites and positive abnormal features; 步骤11、文本特征转化:将文本描述通过预训练的CLIP文本编码器转化为文本特征;Step 11: Text feature conversion: convert the text description into text features through the pre-trained CLIP text encoder; S5、实例级图像文本对对齐阶段:S5, instance-level image-text pair alignment stage: 步骤12、特征对齐与优化:将关键实例特征与相应的文本特征计算相似度,并通过轻量化训练策略更新模型参数,优化图文对齐效果;Step 12: Feature alignment and optimization: Calculate the similarity between key instance features and corresponding text features, and update model parameters through lightweight training strategy to optimize image-text alignment effect; S6、实例级特征聚合阶段:S6, instance-level feature aggregation stage: 步骤13、特征加权聚合:根据关键实例特征与文本特征之间的相似度分数进行加权聚合,形成反映整体病变特征的包级特征;Step 13: Weighted feature aggregation: perform weighted aggregation based on the similarity scores between key instance features and text features to form a package-level feature that reflects the overall lesion features; S7、全切片图像级别的病变提示库构建阶段:S7. Construction of the lesion prompt library at the whole-slice image level: 步骤14、病变库建立:收集CIN1至CIN3各级别的宫颈组织病理病变图像及其特征描述,利用ChatGPT4对描述进行规范化处理;分别提取病变示例的图像特征和文本特征,构建键值对形式的病变提示库;Step 14: Establishment of lesion library: Collect images of cervical tissue pathological lesions of various levels from CIN1 to CIN3 and their feature descriptions, and use ChatGPT4 to normalize the descriptions; extract image features and text features of lesion examples respectively, and build a lesion prompt library in the form of key-value pairs; S8、包级图像特征检索文本特征:S8. Package-level image feature retrieval text feature: 步骤15、图文检索:使用包级特征与病变提示库中的图像特征计算余弦相似度,通过标签传播聚合得到预测结果;Step 15, image and text retrieval: use the package-level features and the image features in the lesion prompt library to calculate the cosine similarity, and obtain the prediction results through label propagation aggregation; S9、病变诊断决策:S9. Lesion diagnosis decision: 步骤16、预测与决策:将预测结果与原始包特征结合,通过轻量级网络生成最终预测特征,结合预测结果与病变提示库信息作出病变等级范围的诊断决策。Step 16: Prediction and decision: Combine the prediction results with the original package features, generate the final prediction features through a lightweight network, and combine the prediction results with the lesion prompt library information to make a diagnostic decision on the lesion grade range. 2.如权利要求1所述的基于视觉引导及语言提示的宫颈全景图像少样本分类方法,其特征在于,S2中步骤4至步骤7所述的宫颈组织全景图像视觉先验引导图生成方法,该方法针对宫颈组织特定结构和病变特征,认为上皮组织深染区域的分布信息具备天然的先验性质,具体步骤如下:2. The method for classifying a small number of cervical panoramic images based on visual guidance and language prompts as claimed in claim 1, characterized in that the method for generating a visual priori guidance map of a panoramic image of cervical tissue described in steps 4 to 7 in S2, the method is based on the specific structure and pathological characteristics of cervical tissue, and it is considered that the distribution information of the dark-stained area of epithelial tissue has a natural priori property, and the specific steps are as follows: 步骤4、低放大倍率下的宫颈组织全景图像预处理:Step 4: Preprocessing of panoramic images of cervical tissue at low magnification: 步骤401、获取2.5x放大倍率下的宫颈全景图像的缩略图;Step 401, obtaining a thumbnail of a panoramic image of the cervix at a magnification of 2.5x; 步骤402、在宫颈全景图像训练集中进行染色归一化:Step 402: Perform dye normalization on the cervical panoramic image training set: 首先进行RGB到Lab颜色空间的转换,公式如下:First, convert RGB to Lab color space. 
The formula is as follows: L=0.2126×R+0.7152×G+0.0722×BL=0.2126×R+0.7152×G+0.0722×B a=0.5×(R-G)+128a=0.5×(R-G)+128 b=0.5×(R-B)+128b=0.5×(R-B)+128 这里的R、G、B分别表示RGB图像中每个像素点的红、绿、蓝分量;Here R, G, and B represent the red, green, and blue components of each pixel in the RGB image respectively; 再进行Lab通道归一化,再用Z-sorce标准化,即将每个通道的数值减去均值再除以标准差,Lab通道的均值和标准差可以通过训练数据集计算统计量得到,公式如下:Then perform Lab channel normalization and then standardize with Z-sorce, that is, subtract the mean from the value of each channel and divide it by the standard deviation. The mean and standard deviation of the Lab channel can be obtained by calculating the statistics of the training data set. The formula is as follows: 其中,Li表示Lab图像中第i个像素点的明度值,N表示像素点的总数,类似地可以算出通道a和通道b的均值和标准差;Where Li represents the brightness value of the i-th pixel in the Lab image, and N represents the total number of pixels. Similarly, the mean and standard deviation of channel a and channel b can be calculated; 将归一化后的Lab图像转换回RGB颜色空间,得到归一化后的RGB图像,公式如下:Convert the normalized Lab image back to the RGB color space to get the normalized RGB image. The formula is as follows: R=L+1.402×(b-128)R=L+1.402×(b-128) G=L-0.3441×(a-128)-0.7141×(b-128)G = L - 0.3441 × (a-128) - 0.7141 × (b-128) B=L+1.772×(a-128)B=L+1.772×(a-128) 步骤403、进行图像二值化与外轮廓检测:Step 403: perform image binarization and outer contour detection: 得到染色归一化的图像后,再对图像进行二值化,然后进行外轮廓检测,该方法基于图像边缘检测和连通域标记技术,通过寻找图像中的连通域并提取器外轮廓线,实现对物体外部轮廓的准确检测;After obtaining the dyed normalized image, the image is binarized and then the outer contour is detected. This method is based on image edge detection and connected domain marking technology. It accurately detects the outer contour of the object by finding the connected domain in the image and extracting the outer contour line. 接着根据检测到的外轮廓线C中的最大连通域对图像进行切分,再将切分后的图片缩放为一致大小,如1024×1024大小;Then the image is segmented according to the largest connected domain in the detected outer contour line C, and the segmented images are scaled to a consistent size, such as 1024×1024; 步骤5、计算宫颈组织图像组织块边界距离;Step 5, calculating the boundary distance of the tissue block of the cervical tissue image; 步骤501、计算组织块质心,依据轮廓面积和一阶矩确定质心位置:Step 501, calculate the centroid of the tissue block, and determine the centroid position based on the contour area and the first-order moment: 据第一步得到的组织切片外轮廓线通过计算轮廓C围成的面积A和一阶矩M10、M01,再根据轮廓面积和一阶矩计算质心(cX,cY),公式如下:According to the outer contour of the tissue slice obtained in the first step, the area A and the first-order moments M 10 and M 01 enclosed by the contour C are calculated, and then the centroid (cX, cY) is calculated based on the contour area and the first-order moment. 
The formula is as follows: 其中,Ci表示第i个轮廓,x和y分别是像素的横纵坐标;Among them, Ci represents the i-th contour, x and y are the horizontal and vertical coordinates of the pixel respectively; 步骤502、确定像素距离表示,通过计算轮廓内每个像素点到质心所在直线的距离比例,得到两个方向上的图像级边界距离表示:Step 502: Determine the pixel distance representation, by calculating the distance ratio from each pixel point in the contour to the straight line where the centroid is located, and obtain the image-level boundary distance representation in two directions: 根据组织轮廓Ci,找到轮廓的最小外接矩形R四个顶点P1(x1,y1),P2(x2,y2),P3(x3,y3),P4(x4,y4),表示为:According to the tissue contour C i , find the four vertices P 1 (x 1 ,y 1 ), P 2 (x 2 ,y 2 ), P 3 (x 3 ,y 3 ), P 4 (x 4 ,y 4 ) of the minimum circumscribed rectangle R of the contour, expressed as: {P1,P2,P3,P4}=min Area Rect(Ci){P 1 ,P 2 ,P 3 ,P 4 } = min Area Rect(C i ) 再分别计算得到过质心的平行于外接四边形边的直线方程,并求轮廓内每个像素点到直线的距离dist1,dist2,(xp,yp)为轮廓Ci内的像素点,公式如下:Then calculate the equations of the lines passing through the centroid and parallel to the sides of the circumscribed quadrilateral, and find the distances from each pixel point in the contour to the line, dist 1 , dist 2 , (x p , y p ) is a pixel point in the contour Ci , and the formula is as follows: 计算轮廓内像素距离值与轮廓上点距离的比例值作为像素点的距离表示,这样越接近轮廓的像素点的距离表示越接近1,将轮廓外的像素点距离表示记为-1,这样分别形成两个方向上的图像级距离表示,记为与/> The ratio of the distance value of the pixel inside the contour to the distance of the point on the contour is calculated as the distance representation of the pixel point. In this way, the distance representation of the pixel point closer to the contour is closer to 1, and the distance representation of the pixel point outside the contour is recorded as -1. In this way, the image-level distance representation in two directions is formed, which is recorded as With/> 步骤6、宫颈组织图像组织块视觉先验感兴趣区域(ROI)硬划分:Step 6: Visual prior region of interest (ROI) hard segmentation of cervical tissue image tissue block: 步骤601、通道二值化:Step 601: Channel binarization: 首先利用大津阈值(OTSU)全局阈值法分别对宫颈组织图像组织块中的红色(R)和蓝色(B)通道进行二值化处理,对于任一像素点p的红色通道灰度值Rp和蓝色通道灰度值Bp,计算各自通道的最佳阈值TR和TB,使前景(染色质密集区)和背景之间的类间方差最大,计算公式如下:First, the red (R) and blue (B) channels in the tissue block of the cervical tissue image are binarized using the Otsu threshold (OTSU) global threshold method. For the red channel grayscale value Rp and blue channel grayscale value Bp of any pixel point p, the optimal thresholds TR and TB of each channel are calculated to maximize the inter-class variance between the foreground (chromatin dense area) and the background. The calculation formula is as follows: 其中ω0和ω1分别是背景和前景的像素占比,μ0和μ1分别是背景和前景的平均灰度值;Where ω 0 and ω 1 are the pixel proportions of the background and foreground, respectively, and μ 0 and μ 1 are the average grayscale values of the background and foreground, respectively; 接着根据计算得到的阈值,将每个通道的像素值与其阈值比较,大于阈值的设为1(视为染色质密集区),小于等于阈值的设为0(视为背景),完成红色和蓝色通道的二值化分割,二值化图像分别记为Rbin,BbinThen, according to the calculated threshold, the pixel value of each channel is compared with its threshold. The pixel value greater than the threshold is set to 1 (considered as chromatin-dense area), and the pixel value less than or equal to the threshold is set to 0 (considered as background). 
The binary segmentation of the red and blue channels is completed, and the binary images are recorded as R bin and B bin respectively; 步骤602、梯度增强与分割:Step 602: Gradient enhancement and segmentation: 计算经过模糊去噪等预处理后的灰度图像Igrey的梯度增强图像G(x,y),采用Sobel算子求得水平和垂直方向梯度再求得梯度幅值图像M(x,y),公式如下:Calculate the gradient enhanced image G(x,y) of the grayscale image I grey after fuzzy denoising and other preprocessing, use the Sobel operator to obtain the horizontal and vertical gradients and then obtain the gradient magnitude image M(x,y), the formula is as follows: 应用分水岭变化算法对梯度图像进行处理,寻找图像中梯度突变的区域,通过构建拓扑图,设定阈值分割值域,获得分割区域,并转化为二值掩膜:The watershed change algorithm is applied to process the gradient image to find the area where the gradient changes suddenly in the image. By constructing a topological map and setting the threshold segmentation range, the segmentation area is obtained and converted into a binary mask: Maskw=Fliter(Watered(Igrey,M(x,y)))Mask w = Filter(Watered(I grey ,M(x,y))) 最后综合划定感兴趣区域,将上述三种二值化图像进行逻辑运算,得到宫颈组织切片感兴趣区域(ROI)的硬划分ROI(i,j):Finally, the region of interest is comprehensively delineated, and the above three binary images are logically operated to obtain the hard segmentation ROI (i, j) of the region of interest (ROI) of the cervical tissue section: MaskC(i,j)=Rbin(i,j)∨Bbin(i,j)Mask C (i,j)=R bin (i,j)∨B bin (i,j) ROI(i,j)=MaskC(i,j)∧MaskW(i,j)ROI(i,j)=Mask C (i,j)∧Mask W (i,j) 步骤7、宫颈组织图像组织块视觉先验引导图生成:Step 7: Generate visual prior guidance map of cervical tissue image tissue block: 综合上述宫颈组织图像组织块边界距离计算阶段与组织块视觉先验感兴趣区域(ROI)硬划分阶段两阶段得到的图像级边界距离表示和ROI硬划分ROI(i,j),通过加权计算得到视觉先验引导图Iactivate,公式如下:The image-level boundary distance representation obtained by combining the above two stages of cervical tissue image tissue block boundary distance calculation stage and tissue block visual prior region of interest (ROI) hard segmentation stage is And ROI hard division ROI(i,j), through weighted calculation to obtain the visual prior guidance map I activate , the formula is as follows: 其中,α,β,γ分别是值在0~1之间超参数,softmax将向量映射为一个概率分布,确保每个元素取值在0~1之间。Among them, α, β, and γ are hyperparameters with values between 0 and 1. Softmax maps the vector into a probability distribution to ensure that each element takes a value between 0 and 1. 3.如权利要求1所述的基于视觉引导及语言提示的宫颈全景图像少样本分类方法,其特征在于,步骤S2中所述的视觉引导网格采样器技术,这是针对宫颈组织不规则形状特征和病变区域连贯性的划分策略,使得模型能够更深入地挖掘病变在组织不同层次间的演变规律和空间联系,具体步骤如下:3. 
The method for classifying cervical panoramic images with a small number of samples based on visual guidance and language prompts as claimed in claim 1, characterized in that the visually guided grid sampler technology described in step S2 is a division strategy for the irregular shape characteristics of cervical tissue and the coherence of the lesion area, so that the model can more deeply explore the evolution law and spatial connection of the lesion between different levels of the tissue, and the specific steps are as follows: 步骤5、网格采样与视觉先验位置编码:Step 5: Grid sampling and visual prior position encoding: 首先,将视觉先验引导图Iactivate与宫颈组织切片对齐;分别在5x倍率下获取宫颈组织全景图像和40x倍率下获取宫颈组织切片集;再将步骤2中得到的视觉先验引导图Iactivate利用双线性插值法,将权重进行上采样与5x倍率下的宫颈组织全景图像对齐,其原理如下:First, the visual prior guidance map I activate is aligned with the cervical tissue slices; a panoramic image of cervical tissue is obtained at a magnification of 5x and a set of cervical tissue slices is obtained at a magnification of 40x respectively; then the visual prior guidance map I activate obtained in step 2 is upsampled by bilinear interpolation and aligned with the panoramic image of cervical tissue at a magnification of 5x. The principle is as follows: 其中Q11(x1,y1),Q12(x1,y2),Q21(x2,y1),Q22(x2,y2)为原图像相邻四点;Among them, Q 11 (x 1 ,y 1 ),Q 12 (x 1 ,y 2 ),Q 21 (x 2 ,y 1 ),Q 22 (x 2 ,y 2 ) are four adjacent points of the original image; 将5x倍率下的宫颈组织全景图像与插值后的权重图切片,并与40x倍率下的组织切片对齐,再次将切片进行上采样,将权重与切片形成映射关系,构建先验视觉引导切片集Iactivate_patchThe panoramic image of cervical tissue at 5x magnification is sliced with the interpolated weight map, and aligned with the tissue slice at 40x magnification, and the slice is upsampled again, and the weight and slice are mapped to construct the prior visual guidance slice set I activate_patch ; 再通过位置编码器构建视觉先验引导位置编码Fposition,该编码器由全连接层(FC)、多头注意力层(MHSA)、层归一化(LN)构建,其计算过程表示如下:Then, the visual prior-guided position encoding F position is constructed through the position encoder. The encoder is constructed by a fully connected layer (FC), a multi-head attention layer (MHSA), and a layer normalization (LN). The calculation process is shown as follows: Fposition=LN(MHSA(Iactivate_pathWPE))F position = LN(MHSA(I activate_path W PE )) 步骤6、图像特征提取:用预训练后的CLIP图像编码器(ViT)提取40x放大倍率下的宫颈组织切片图像特征Fimage_patchStep 6, image feature extraction: Use the pre-trained CLIP image encoder (ViT) to extract the image features F image_patch of cervical tissue slices at 40x magnification; 步骤7、将生成的视觉先验位置编码特征Fposition与宫颈组织切片图像特征Fimage_patch进行拼接操作,再通过全连接层(FC)得到宫颈组织图像实例级特征Fimage_instance,公式表示如下:Step 7: Concatenate the generated visual prior position encoding feature F position with the cervical tissue slice image feature F image_patch , and then obtain the cervical tissue image instance-level feature F image_instance through a fully connected layer (FC). The formula is as follows: Fimage_instance=Concat(Fimage_patch,Fposition)Wfeat。 F image_instance = Concat(F image_patch ,F position )W feat. 4.如权利要求1所述的基于视觉引导及语言提示的宫颈全景图像少样本分类方法,其特征在于,S3中步骤8至步骤9所述的关键实例特征提取方法,具体步骤如下:4. 
The method for classifying cervical panoramic images with a small number of samples based on visual guidance and language prompts according to claim 1, characterized in that the key instance feature extraction method described in steps 8 to 9 in S3 comprises the following specific steps: 步骤8、实例级特征处理:将宫颈组织图像实例级特征Fimage_instance进行层归一化(LayerNormalization)操作,然后引入多头注意力模块(MHSA)计算注意力分数αimage_instance并与实例级特征Fimage_instance进行加和得到视觉引导的加权实例特征Fimage_instance′,公式如下:Step 8, instance-level feature processing: perform layer normalization on the instance-level feature F image_instance of the cervical tissue image, and then introduce the multi-head attention module (MHSA) to calculate the attention score α image_instance and add it to the instance-level feature F image_instance to obtain the visually guided weighted instance feature F image_instance ′, the formula is as follows: αimage_instance=MHSA(LN(Fimage_instance))α image_instance =MHSA(LN(F image_instance )) Fimage_instance′=αimage_instanceFimage_instance F image_instance ′=α image_instance F image_instance 步骤9、关键特征筛选:视觉引导的加权实例特征Fimage_instance′分别通过池化层(MaxPooling)与全连接层(FC)得到实例的类别概率Pimage_instance,根据该分值选择前K个实例级特征作为关键实例特征FTOPK_instance,公式表示如下:Step 9, key feature screening: The visually guided weighted instance feature F image_instance ′ is respectively passed through the pooling layer (MaxPooling) and the fully connected layer (FC) to obtain the instance category probability P image_instance . According to the score, the top K instance-level features are selected as the key instance features F TOPK_instance . The formula is as follows: Pimage_instance=σ(Pool(Fimage_instance′)Wimage_instance)P image_instance =σ(Pool(F image_instance ′)W image_instance ) 再将视觉引导的加权实例特征Fimage_instance′随机遮盖形成特征序列Fmask,通过设计位置解码器PD,将包含空间及位置信息的特征Fimage_instance′解码,利用视觉引导图Iactivate前景和背景信息作为监督信息,进行掩膜位置重构,进而更新视觉引导图Iactivate′,提升空间分布语义一致性和泛化能力,通过计算分布损失Lposition,进而不断更新迭代,公式表示如下:Then, the visually guided weighted instance feature F image_instance ′ is randomly masked to form a feature sequence F mask . By designing a position decoder PD, the feature F image_instance ′ containing spatial and position information is decoded. The foreground and background information of the visual guidance map I activate is used as supervision information to reconstruct the mask position, and then the visual guidance map I activate ′ is updated to improve the spatial distribution semantic consistency and generalization ability. The distribution loss L position is calculated and then continuously updated and iterated. The formula is as follows: Fmask=[f1 m,f2 m,...,fn m]F mask = [f 1 m ,f 2 m ,...,f n m ] (F′,Fp′)=LN(MHSA(Fmask))WPD (F′, F p ′)=LN(MHSA(F mask ))W PD 5.如权利要求1所述的基于视觉引导及语言提示的宫颈全景图像少样本分类方法,其特征在于,S4至S5中步骤10至步骤12所述的实例级文本提示学习方法,具体步骤总结如下:5. The method for classifying cervical panoramic images with a small number of samples based on visual guidance and language prompts according to claim 1, characterized in that the instance-level text prompt learning method described in steps 10 to 12 in S4 to S5 is summarized as follows: 步骤10、文本描述生成:利用ChatGPT4针对宫颈组织病理病变分级发生部位及其正异常特征生成系列文本提示信息,提示ChatGPT4按细胞类型、细胞形态描述、是否有异常这三类信息描述并整合为实例级文本描述信息提示;Step 10, text description generation: ChatGPT4 is used to generate a series of text prompt information for the location of cervical tissue pathological lesions and their positive and abnormal characteristics. 
ChatGPT4 is used to describe the three types of information, namely, cell type, cell morphology description, and whether there is an abnormality, and integrate them into instance-level text description information prompts; 步骤11、文本特征转化:将文本描述经过预训练的CLIP文本编码器转化为文本特征,与S3阶段提取的宫颈组织关键实例特征FTOPK_instance计算余弦相似度,以建立实例级图像和文本之间的关联;Step 11: Text feature conversion: The text description is converted into text features through the pre-trained CLIP text encoder, and the cosine similarity is calculated with the key instance feature F TOPK_instance of cervical tissue extracted in the S3 stage to establish the association between instance-level images and texts; 步骤12、特征对齐与优化:Step 12: Feature alignment and optimization: 为了降低CLIP模型在实例级组织病理图像与文本语义对齐任务中训练的计算量级,本发明设计轻量化的训练策略:将预训练好的CLIP图像编码器与文本编码器冻结权重,训练重点在于优化图像和文本的可学习参数Token;图像Token Timage_token通过将提取到的关键实例特征经过两层多层感知机(MLP)构成的轻量网络(LightNet)得到,文本TokenTtext_token设置目标在于针对每个实例学习,捕获不同病理类型的特定语言描述信息,以学习到更合适的文本描述;因此,将图像Timage_token与文本Ttext_token拼接形成可学习参数序列Ttoken,其数量可以自行设计,例如设置包含L个Token作为经验设定值,记为Ttoken={t1,t2,…,tL},对应的实例级标签记为Y={y1,y2,…,yL}),通过交叉熵损失(Cross-Entropy Loss)来衡量预测结果和标签间差异,并使用反向传播误差来更新这些Token,公式如下:In order to reduce the computational level of CLIP model training in the task of semantic alignment of instance-level histopathological images and texts, the present invention designs a lightweight training strategy: the weights of the pre-trained CLIP image encoder and text encoder are frozen, and the training focuses on optimizing the learnable parameter Tokens of the image and text; the image Token T image_token is obtained by passing the extracted key instance features through a lightweight network (LightNet) composed of two layers of multi-layer perceptrons (MLPs), and the text Token T text_token is set to learn for each instance and capture the specific language description information of different pathological types to learn a more appropriate text description; therefore, the image T image_token and the text T text_token are concatenated to form a learnable parameter sequence T token , the number of which can be designed by oneself, for example, L Tokens are set as empirical setting values, denoted as T token ={t 1 , t 2 ,…, t L }, and the corresponding instance-level label is denoted as Y ={y 1 , y 2 ,…, y L }), and the cross-entropy loss (Cross-Entropy Loss) is used to measure the difference between the prediction results and the labels, and the back propagation error is used to update these tokens. The formula is as follows: Ttoken=Concat(LightNet(FTOPK_instance),Ttext_token)T token = Concat(LightNet(F TOPK_instance ),T text_token ) 其中,pi是模型对于第i个Token预测为对应类别标签的概率,即Among them, pi is the probability that the model predicts the corresponding category label for the i-th Token, that is, pi=P(ti|FTOPK_instance,Ttoken);p i =P(t i |F TOPK_instance ,T token ); 接着,利用预训练的文本编码器得到文本特征,与S3得到的宫颈组织关键实例特征计算余弦相似度,目标函数公式如下:Next, the pre-trained text encoder is used to obtain text features, and the cosine similarity is calculated with the key instance features of cervical tissue obtained in S3. 
The objective function formula is as follows: Lcontrastive=(Ltext2image+Limage2text)/2L contrastive = (L text2image + L image2text )/2 其中,Ii是第i个文本描述的嵌入向量,Tj是第j个切片图像嵌入向量,文本到图像对比损失表示为Ltext2image,图像到文本对比损失表示为Limage2text,整个相似度对比损失表示为Lcontrastive。 where Ii is the embedding vector of the i-th text description, Tj is the embedding vector of the j-th slice image, the text-to-image contrastive loss is denoted as Ltext2image , the image-to-text contrastive loss is denoted as Limage2text , and the overall similarity contrastive loss is denoted as Lcontrastive. 6.如权利要求1所述的基于视觉引导及语言提示的宫颈全景图像少样本分类方法,其特征在于,S6至S9中步骤13至步骤16所述的包级文本提示学习方法,具体步骤总结如下:6. The method for classifying cervical panoramic images with a small number of samples based on visual guidance and language prompts according to claim 1, characterized in that the package-level text prompt learning method described in steps 13 to 16 in S6 to S9 is summarized as follows: 步骤13、特征加权聚合:通过计算实例级图像特征与文本特征之间的相似度分数,对S3中提取的关键实例特征进行加权聚合,形成包级特征FBagStep 13: Weighted feature aggregation: By calculating the similarity scores between instance-level image features and text features, the key instance features extracted from S3 are weightedly aggregated to form bag-level features F Bag ; 步骤14、病变库建立:通过系统性采集宫颈组织病理学图谱中的宫颈组织病理分级病变特征描述及图例,将其视为权威的专家知识储备,再进一步运用先进的人工智能模型ChatGPT4对这些专业知识进行深度整合与梳理,旨在将医学专家的传统主观性描述方式转换为标准化、智能化的描述体系,从而有效弥合知识传递过程中的差距,再利用预训练的CLIP图像编码器和文本编码器提取对应的图像特征Fkey和文本特征Fvalue:构建图像特征与文本特征对形成多个键值对(Key-Value),记为Fkey_valueStep 14, lesion library establishment: by systematically collecting the descriptions and legends of cervical histopathology graded lesions in the cervical histopathology atlas, we regard them as authoritative expert knowledge reserves, and further use the advanced artificial intelligence model ChatGPT4 to deeply integrate and sort out these professional knowledge, aiming to convert the traditional subjective description method of medical experts into a standardized and intelligent description system, so as to effectively bridge the gap in the knowledge transfer process, and then use the pre-trained CLIP image encoder and text encoder to extract the corresponding image features F key and text features F value : construct image features and text features to form multiple key-value pairs (Key-Value), recorded as F key_value ; 步骤15、图文检索:将包级特征FBag与图像特征Fkey计算相似度分数ABag_key,利用键值对Fkey_value,通过标签传播的方式与Fvalue进行聚合形成预测FBag′实现知识的检索,公式如下:Step 15, image and text retrieval: Calculate the similarity score A Bag_key between the bag-level feature F Bag and the image feature F key , and use the key-value pair F key_value to aggregate with F value through label propagation to form the prediction F Bag ′ to realize knowledge retrieval. The formula is as follows: ABag_key=exp(-β(1-FBagFkey T))A Bag_key = exp(-β(1-F Bag F key T )) FBag′=ABag_keyFvalue F Bag ′=A Bag_key F value 步骤16、预测与决策:根据得到的预测FBag′和原始包特征FBag通过由两层MLP构成的轻量级网络LightNet得到预测特征,LightNet权重记为Wc,两者进行残差连接加权求和获得最终的预测结果logitsBag,该结果同时获得了实例级数据信息和Few-shot知识,预测结果计算公式如下:Step 16, prediction and decision: According to the obtained prediction F Bag ′ and the original bag feature F Bag , the prediction feature is obtained through the lightweight network LightNet composed of two layers of MLP. The LightNet weight is recorded as W c . The two are weighted summed by residual connection to obtain the final prediction result logits Bag . This result obtains both instance-level data information and Few-shot knowledge. 
The prediction result calculation formula is as follows: 根据该类别预测判断病变所属的等级范围,并结合病变提示库对应描述信息作为最终诊断依据。The grade range of the lesion is determined based on the category prediction, and combined with the corresponding description information in the lesion prompt library as the basis for the final diagnosis.
CN202410423610.6A 2024-04-09 2024-04-09 Cervical panoramic image few-sample classification method based on visual guidance and language prompt Pending CN118230052A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410423610.6A CN118230052A (en) 2024-04-09 2024-04-09 Cervical panoramic image few-sample classification method based on visual guidance and language prompt

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410423610.6A CN118230052A (en) 2024-04-09 2024-04-09 Cervical panoramic image few-sample classification method based on visual guidance and language prompt

Publications (1)

Publication Number Publication Date
CN118230052A true CN118230052A (en) 2024-06-21

Family

ID=91503668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410423610.6A Pending CN118230052A (en) 2024-04-09 2024-04-09 Cervical panoramic image few-sample classification method based on visual guidance and language prompt

Country Status (1)

Country Link
CN (1) CN118230052A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118398079A (en) * 2024-06-25 2024-07-26 中国人民解放军军事科学院军事医学研究院 Computer device, method and application for predicting amino acid mutation effect or carrying out design modification on protein
CN118691899A (en) * 2024-06-28 2024-09-24 南京邮电大学 A zero-shot image classification method and system based on prompt guidance


Similar Documents

Publication Publication Date Title
Ding et al. Multi-scale fully convolutional network for gland segmentation using three-class classification
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
Gertych et al. Machine learning approaches to analyze histological images of tissues from radical prostatectomies
CN112070772A (en) Blood leukocyte image segmentation method based on UNet + + and ResNet
Wan et al. Robust nuclei segmentation in histopathology using ASPPU-Net and boundary refinement
CN110852316B (en) Image tampering detection and positioning method adopting convolution network with dense structure
Pan et al. Mitosis detection techniques in H&E stained breast cancer pathological images: A comprehensive review
CN118230052A (en) Cervical panoramic image few-sample classification method based on visual guidance and language prompt
CN111860406A (en) A classification method of blood cell microscopic images based on neural network of regional confusion mechanism
Pan et al. Cell detection in pathology and microscopy images with multi-scale fully convolutional neural networks
CN112990214A (en) Medical image feature recognition prediction model
Nofallah et al. Machine learning techniques for mitoses classification
CN113902669B (en) Method and system for reading urine exfoliated cell liquid-based smear
Razavi et al. MiNuGAN: Dual segmentation of mitoses and nuclei using conditional GANs on multi-center breast H&E images
CN109635726A (en) A kind of landslide identification method based on the symmetrical multiple dimensioned pond of depth network integration
CN117036288A (en) Tumor subtype diagnosis method for full-slice pathological image
Li et al. VBNet: An end-to-end 3D neural network for vessel bifurcation point detection in mesoscopic brain images
KR20240012738A (en) Cluster analysis system and method of artificial intelligence classification for cell nuclei of prostate cancer tissue
Khoshdeli et al. Deep learning models delineates multiple nuclear phenotypes in h&e stained histology sections
Inamdar et al. A novel attention-based model for semantic segmentation of prostate glands using histopathological images
Sun et al. Detection of breast tumour tissue regions in histopathological images using convolutional neural networks
CN113012167A (en) Combined segmentation method for cell nucleus and cytoplasm
Yancey Deep Feature Fusion for Mitosis Counting
CN114842206B (en) Remote sensing image semantic segmentation system and method based on double-layer global convolution
CN117292217A (en) Skin typing data augmentation method and system based on countermeasure generation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination