CN117806601A - Code text processing method, code supplementing method and computing device - Google Patents
- Publication number
- CN117806601A (application CN202311613794.4A)
- Authority
- CN
- China
- Prior art keywords
- code
- target
- information
- text
- metadata
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/31—Programming languages or programming paradigms
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Document Processing Apparatus (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
Description
Technical Field
The embodiments of this specification relate to the field of deep learning, and in particular to a code text processing method, a code supplementing method, and a computing device.
Background
With the development of deep learning, code generation models obtained through targeted training are used to generate code. They help developers write code text, improve development efficiency, and reduce development costs.
Currently, code generation models rely on reference code text when processing code text. Based on the reference code, the model generates the target code text corresponding to the code text to be processed, thereby completing code generation, code supplementation, or code rewriting.
However, reference code is found by text similarity, which cannot guarantee that the reference code is valid. For example, the code text to be processed may contain a variable "ABC", and a project file of another project may also contain a variable "ABC". The two are highly similar as text, but their definitions, logic, and much else differ. Such invalid reference code cannot guide the code generation model toward accurate code generation. As a result, the generated target code text, and code text processing as a whole, lack accuracy. A highly accurate code text processing method is therefore needed.
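The problem can be made concrete with a small Python sketch. It is an illustration under assumed inputs, not part of the described method. It scores two hypothetical definitions of a variable named ABC from different projects with difflib's character-level similarity, then contrasts that with a lookup keyed by project and name. The snippets, the project names, and the `find_reference` helper are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical definitions of a variable "ABC" in two different projects.
TARGET_SNIPPET = "ABC = load_user_config()"
OTHER_PROJECT_SNIPPET = "ABC = load_user_cache()"

# Character-level similarity is high even though the two variables mean different things.
similarity = SequenceMatcher(None, TARGET_SNIPPET, OTHER_PROJECT_SNIPPET).ratio()

# Hypothetical index keyed by (project, name): same-named elements from
# different projects can no longer be confused with each other.
REFERENCE_INDEX = {
    ("project_a", "ABC"): TARGET_SNIPPET,
    ("project_b", "ABC"): OTHER_PROJECT_SNIPPET,
}


def find_reference(project: str, name: str):
    """Return the definition of `name` inside `project`, or None if it is unknown."""
    return REFERENCE_INDEX.get((project, name))


if __name__ == "__main__":
    print(f"text similarity: {similarity:.2f}")
    print(find_reference("project_a", "ABC"))
```

A similarity-based retriever would rank the other project's definition as a strong match. A keyed lookup only returns the definition that belongs to the project being edited.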
Summary of the Invention
In view of this, the embodiments of this specification provide a code text processing method. One or more embodiments of this specification also relate to a code supplementing method, a code text processing apparatus, a code supplementing apparatus, a computing device, a computer-readable storage medium, and a computer program, which address the technical defects of the prior art.
One embodiment of this specification provides a code text processing method, including:
obtaining the code text to be processed of a target project;
parsing the code text to be processed to obtain first code metadata of each code element, where each code element has corresponding element information;
determining target query information according to the element information of each code element;
searching a target database for second code metadata corresponding to the target query information, where the target database records correspondences between reference query information and reference code metadata, and the reference query information is constructed from the element information of each code element in the project files of the target project; and
generating target code text from the first code metadata and the second code metadata using a code generation model, where the code generation model is obtained by training a text processing model on sample code text.
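The steps above can be sketched in Python for a Python target project. This is a minimal illustration that rests on several assumptions not found in the source:

- element information is reduced to (module name, element name) pairs;
- the target database is an in-memory dict;
- only `from module import name` statements produce target query information;
- the code to be processed is complete enough for `ast.parse`;
- no code generation model is called, so the sketch stops at assembling the model input.

All function names are hypothetical.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeMetadata:
    """Hypothetical metadata record for one top-level code element."""
    kind: str    # "function" or "class"
    name: str
    lineno: int
    source: str  # source text of the element


def collect_definitions(code: str) -> list:
    """Parse code text and return metadata for its top-level functions and classes."""
    records = []
    for node in ast.parse(code).body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            segment = ast.get_source_segment(code, node) or ""
            records.append(CodeMetadata(kind, node.name, node.lineno, segment))
    return records


def build_target_database(project_files: dict) -> dict:
    """Offline step: map reference query info (module, name) to reference code metadata."""
    return {
        (module, meta.name): meta
        for module, code in project_files.items()
        for meta in collect_definitions(code)
    }


def extract_query_info(pending_code: str) -> list:
    """Derive target query info from `from module import name` statements."""
    queries = []
    for node in ast.walk(ast.parse(pending_code)):
        if isinstance(node, ast.ImportFrom) and node.module:
            queries.extend((node.module, alias.name) for alias in node.names)
    return queries


def build_model_input(pending_code: str, database: dict) -> str:
    """Combine first metadata (local elements) with second metadata (looked-up references)."""
    first_metadata = collect_definitions(pending_code)
    second_metadata = [database[q] for q in extract_query_info(pending_code) if q in database]
    lines = ["# Local elements: " + ", ".join(f"{m.kind} {m.name}" for m in first_metadata)]
    lines.append("# Referenced definitions:")
    lines.extend(m.source for m in second_metadata)
    lines.extend(["# Code to process:", pending_code])
    return "\n".join(lines)


if __name__ == "__main__":
    project = {
        "utils": "def helper(x):\n    return x * 2\n",
        "models": "class User:\n    pass\n",
    }
    pending = "from utils import helper\n\ndef run():\n    return helper(3)\n"
    print(build_model_input(pending, build_target_database(project)))
```

The lookup is keyed by module and element name rather than by text similarity. As a result, the unrelated `User` class never enters the model input.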
In one embodiment of this specification, the method proceeds as follows:

- The code text to be processed of the target project is obtained and parsed to obtain the first code metadata of each code element, each code element having corresponding element information.
- Target query information is determined from that element information.
- Second code metadata corresponding to the target query information is looked up in the target database. The database records correspondences between reference query information and reference code metadata, and the reference query information is constructed from the element information of each code element in the project files of the target project.
- Target code text is generated from the first and second code metadata using a code generation model, which is obtained by training a text processing model on sample code text.

The project files of the target project are parsed in advance. Reference query information built from the element information of each code element is then stored in the target database together with the corresponding reference code metadata. During code text processing, parsing the code text to be processed yields the first code metadata of each code element, from which the target query information is determined. Querying the target database through these explicit code reference relationships returns valid second code metadata. This metadata serves as reference code that guides the code generation model to generate highly accurate target code text, improving the accuracy of code text processing.
Brief Description of the Drawings
Figure 1 is a flowchart of a code text processing method provided by an embodiment of this specification;
Figure 2 is a flowchart of constructing the target database in a code text processing method provided by an embodiment of this specification;
Figure 3 is a schematic diagram of an abstract syntax tree in a code text processing method provided by an embodiment of this specification;
Figure 4 is a schematic diagram of a first reference dictionary and a second reference dictionary in a code text processing method provided by an embodiment of this specification;
Figure 5 is a flowchart of updating a code text sequence in a code text processing method provided by an embodiment of this specification;
Figure 6 is a flowchart of real-time analysis of code text in a code text processing method provided by an embodiment of this specification;
Figure 7 is a flowchart of parsing code text in a code text processing method provided by an embodiment of this specification;
Figure 8 is a front-end schematic diagram of a code text processing method provided by an embodiment of this specification;
Figure 9 is a flowchart of a code supplementing method provided by an embodiment of this specification;
Figure 10 is a processing flowchart of a code text processing method applied to an integrated development environment, provided by an embodiment of this specification;
Figure 11 is a schematic structural diagram of a code text processing apparatus provided by an embodiment of this specification;
Figure 12 is a schematic structural diagram of a code supplementing apparatus provided by an embodiment of this specification;
Figure 13 is a structural block diagram of a computing device provided by an embodiment of this specification.
Detailed Description of Embodiments
In the following description, numerous specific details are set forth to provide a thorough understanding of this specification. However, this specification can be implemented in many ways other than those described here. Those skilled in the art can make similar generalizations without departing from its substance, so this specification is not limited by the specific implementations disclosed below.
The terms used in one or more embodiments of this specification describe particular embodiments only and are not intended to limit them. The singular forms "a", "said", and "the" used in one or more embodiments of this specification and in the appended claims also include the plural forms unless the context clearly indicates otherwise. The term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
Although the terms first, second, and so on may be used in one or more embodiments of this specification to describe various information, the information is not limited by these terms. They serve only to distinguish information of the same type from each other. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be called "second", and similarly "second" may be called "first". Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
In addition, any user information (including but not limited to user device information and personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in one or more embodiments of this specification has been authorized by the user or fully authorized by all parties. The collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and a corresponding entry point is provided for users to grant or refuse authorization.
In one or more embodiments of this specification, a large model is a deep learning model with large-scale parameters, usually hundreds of millions, tens of billions, hundreds of billions, trillions, or even more than ten trillion parameters. A large model is also called a foundation model. It is pre-trained on a large-scale unlabeled corpus to produce a pre-trained model with more than 100 million parameters. Such a model adapts to a wide range of downstream tasks and generalizes well; examples include large language models (LLM) and multi-modal pre-training models.
In practice, a pre-trained large model can be applied to different tasks after fine-tuning with only a small number of samples. Large models are widely used in fields such as natural language processing (NLP) and computer vision:

- computer vision tasks such as visual question answering (VQA), image captioning (IC), and image generation;
- natural language processing tasks such as text-based sentiment classification, text summarization, and machine translation.

Their main application scenarios include digital assistants, intelligent robots, search, online education, office software, e-commerce, and intelligent design.
First, the terms used in one or more embodiments of this specification are explained.
Code generation: a technique in which a text processing model learns from massive amounts of sample code text through deep learning and then generates code text tailored to developers' needs.
Model hallucination: a problem in which a model, limited by its capability, confuses reference knowledge from different sources and consequently generates low-accuracy code text. It is analogous to a person misreading a homograph.
Abstract syntax tree (AST): a tree representation of the syntactic structure of a code file. Each node of the tree represents one code element of the code file, and the tree as a whole represents the syntactic structure of the entire code file (text). In the hierarchy module 1-object A-function a-variable x, for example, the module, object, function, and variable are code elements, and the AST represents this hierarchy. The AST concept does not depend on any particular programming language. The AST also records the position of the code element corresponding to each node.
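The hierarchy and position information an AST carries can be illustrated with Python's standard `ast` module. The sketch below lists module-class-function-variable paths with their line numbers. The module label "module1" and the helper name `element_paths` are hypothetical; parsers for other languages would use their own grammars.

```python
import ast

SOURCE = """\
class A:
    def a(self):
        x = 1
        return x
"""


def element_paths(source: str, module_name: str = "module1") -> list:
    """Walk the AST and record class/function/variable paths with line numbers."""
    paths = []

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                path = f"{prefix}/class {child.name}"
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                path = f"{prefix}/function {child.name}"
            elif isinstance(child, ast.Name) and isinstance(child.ctx, ast.Store):
                path = f"{prefix}/variable {child.id}"  # only assignments, not reads
            else:
                visit(child, prefix)  # descend through other nodes without a new level
                continue
            paths.append(f"{path} (line {child.lineno})")
            visit(child, path)

    visit(ast.parse(source), module_name)
    return paths


if __name__ == "__main__":
    for line in element_paths(SOURCE):
        print(line)
```

For the sample source this prints the three paths `module1/class A`, `module1/class A/function a` and `module1/class A/function a/variable x`, together with lines 1, 2 and 3 respectively.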
Symbol table: stores the symbols in the source code text, such as variables, labels, functions, and types, together with related information, and records where each symbol appears in the program.
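Python exposes the symbol tables its compiler builds through the standard `symtable` module. The sketch below lists the symbols recorded for each scope. The helper `symbol_summary` and the sample source are hypothetical. The sketch shows Python's own symbol table, which is not necessarily the structure the embodiments use.

```python
import symtable

SOURCE = """\
LIMIT = 10

def scale(value):
    factor = 2
    return value * factor * LIMIT
"""


def symbol_summary(source: str) -> dict:
    """Map each scope name to the sorted names of the symbols recorded in it."""
    summary = {}

    def visit(table):
        summary[table.get_name()] = sorted(sym.get_name() for sym in table.get_symbols())
        for child in table.get_children():
            visit(child)

    visit(symtable.symtable(source, "<example>", "exec"))
    return summary


if __name__ == "__main__":
    print(symbol_summary(SOURCE))
    # Per-symbol details are also recorded, e.g. parameter vs. implicit global.
    scale_table = symtable.symtable(SOURCE, "<example>", "exec").get_children()[0]
    print(scale_table.lookup("value").is_parameter(), scale_table.lookup("LIMIT").is_global())
```

The module-level table is named "top". The function `scale` gets its own child table, in which `LIMIT` is marked as a global reference.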
Java package: a namespace mechanism used to organize related classes and interfaces and avoid class-name conflicts.
Java imported types: in Java, to use a class or interface from another package, you import it with an import statement.
Java class definition: classes are the basic building blocks of Java and define the data and behavior of objects. A class definition includes the class name, member variables, constructors, and access modifiers.
Java class inheritance: Java supports single inheritance of classes. A class can directly extend only one other class and inherits its properties and behavior, which improves code reuse.
Java interface implementation: interfaces are another way to provide functionality in Java. A class declares that it implements an interface with the implements keyword.
Java class properties: also called fields or member variables. Instance fields represent the state of an object, while static fields are shared among all objects of the class.
Java class methods: methods defined in a class that describe the operations or behaviors its objects can perform. They are invoked on objects of the class (static methods can also be called on the class itself).
Python module: a file containing related Python objects such as functions, classes, and variables. Modules can be imported into a program and used through the import statement.
Python import types: in Python, you can import a whole module or import a specific object (such as a function, class, or variable) from a module. This is done with the import or from ... import statement.
Python function definition: a function is a group of statements packaged together that can be reused as many times as needed. In Python, functions are defined with the def keyword.
Python global variables: variables defined at module level, outside any function. Any function in the module can read them; assigning to one inside a function requires a global declaration.
Python class: a data type that can have attributes and methods. It is the foundation of object-oriented programming.
Python class definition: a class definition includes the class name, attributes, methods, and special methods, and begins with the class keyword.
Python class attributes: variables that belong to a class rather than to a specific object. Class attributes are shared among all objects of the class.
Python class methods: methods that belong to a class rather than to a specific object, declared with the @classmethod decorator. They are mainly used for class-level operations.
Python class inheritance: in Python, a class can inherit from an existing class and obtain all of its attributes and methods. The parent class is listed in parentheses in the class definition, for example class Child(Parent):.
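The Python element kinds above map directly onto node types in the standard `ast` module. The sketch below shows how element information could be collected. The helper `python_elements` and the sample source are hypothetical, and only top-level statements and direct class bodies are inspected.

```python
import ast

SOURCE = '''\
import os
from typing import List

DEFAULT_NAME = "demo"

def greet(name: str) -> str:
    return f"hello {name}"

class Base:
    pass

class Widget(Base):
    count = 0

    @classmethod
    def create(cls):
        cls.count += 1
        return cls()
'''


def python_elements(source: str) -> dict:
    """Group top-level Python code elements by kind (imports, functions, classes, ...)."""
    found = {"imports": [], "functions": [], "globals": [], "classes": [],
             "bases": [], "class_attributes": [], "class_methods": []}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            found["imports"] += [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            found["imports"] += [f"{node.module}.{alias.name}" for alias in node.names]
        elif isinstance(node, ast.FunctionDef):
            found["functions"].append(node.name)
        elif isinstance(node, ast.Assign):  # module-level assignment -> global variable
            found["globals"] += [t.id for t in node.targets if isinstance(t, ast.Name)]
        elif isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
            # Inheritance: base classes appear in parentheses after the class name.
            found["bases"] += [f"{node.name}({ast.unparse(b)})" for b in node.bases]
            for item in node.body:
                if isinstance(item, ast.Assign):  # assignment in class body -> class attribute
                    found["class_attributes"] += [
                        f"{node.name}.{t.id}" for t in item.targets if isinstance(t, ast.Name)]
                elif isinstance(item, ast.FunctionDef) and any(
                        isinstance(d, ast.Name) and d.id == "classmethod"
                        for d in item.decorator_list):
                    found["class_methods"].append(f"{node.name}.{item.name}")
    return found


if __name__ == "__main__":
    for kind, names in python_elements(SOURCE).items():
        print(kind, names)
```

`ast.unparse` requires Python 3.9 or later. Nested functions, instance attributes assigned in `__init__`, and annotated assignments are deliberately ignored to keep the sketch short.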
JavaScript module: in JavaScript, a module is a self-contained collection of functionality. Its contents can be exported with the export keyword (in CommonJS, by assigning to module.exports) for use in other files.
JavaScript import types: previously exported module contents can be imported with the import keyword.
JavaScript function definition: in JavaScript, functions are a special kind of object and are commonly declared with the function keyword.
JavaScript global variables/constants: variables and constants defined outside any function that can be accessed throughout the program.
JavaScript class: a template for creating objects that describes the properties and behavior an object should have.
JavaScript class definition: a class definition includes the class name, properties, methods, and special methods, and begins with the class keyword.
JavaScript class attributes: variables that belong to a class rather than to a specific object. Class attributes are shared among all objects.
JavaScript class methods: methods that belong to a class rather than to a specific object. They are mainly used for class-level operations.
JavaScript class inheritance: in JavaScript, a class can inherit from an existing class and obtain all of its properties and methods. This is done with the extends keyword in the class definition.
JavaScript object: a collection of associated values (usually variables and functions) whose contents can be accessed with dot notation or square-bracket notation.
JavaScript object definition: an object definition is enclosed in curly braces {} and contains a series of key-value pairs, with property names as keys and property values as values.
JavaScript object properties: the state of an object, accessed with dot notation or square-bracket notation.
JavaScript object methods: the behaviors of an object, called with dot notation or square-bracket notation.
Integrated development environment (IDE), also called an integrated development platform: a software application suite that combines programming functions with other development aids. It usually includes a code editor, a compiler/interpreter, a code parser, a debugger, a build automation system, a version control system, and deployment tools.
Code metadata (of source code): information that characterizes how the data in code text is organized, its data fields, and their relationships. It supports functions such as indicating storage locations, historical data, resource lookup, and file records.
Abstract syntax tree parser (AST parser): a special code parser that extracts code elements from the code text of a code file and builds an abstract syntax tree (AST). The AST parser converts the code file into a data structure that is easy to process and analyze, making the code easier for developers to understand.
Transformer model (deep self-attention model): a deep learning architecture based on the attention mechanism, used to process sequence data such as natural language.
Bidirectional Encoder Representations from Transformers (BERT): a Transformer-based model that uses a bidirectional Transformer encoder and is trained on large-scale unlabeled text. Its strong performance has made it a standard baseline for many NLP tasks.
Large language model (LLM): a deep learning model trained on a large-scale corpus for natural language processing tasks. Such models generally consist of multi-layer neural networks. The input is a text sequence, and the output is text produced by performing a specific natural language processing task on that sequence. Pre-training means the model has been trained on, and has learned to process, large amounts of language data before being applied to a specific task. Pre-training lets models capture more complex linguistic and semantic regularities, so they perform well on a variety of natural language processing tasks and need less task-specific data.
Resource file: a collection of files containing the non-executable data or code a program needs at run time. Resources may be images, audio, video, or other multimedia data, or text, configuration information, or other non-media data. Resource files make it easy for a program to load and access the external data it needs. They also avoid hard-coding large amounts of data into the program code, which reduces program size and improves portability. Typically, a program packages external resource data into one or more resource files and loads and decompresses that data on demand at run time.
Binary file: a computer file containing data or program instructions encoded in binary form. Such files typically store graphics, audio, video, and other non-text data, and cannot be viewed or edited with a simple text editor.
Log file: a file or collection of files that records system operation events, mainly used to track and monitor how a system runs. It can include system startup and shutdown events, error messages, warnings, and debugging information. Log files are usually plain text and can be viewed and analyzed with a simple text editor.
Path file: a file that records file or directory paths. It is usually used to share path information between computer systems or to save paths the user accesses frequently. Path files can take different formats, such as CSV, XML, or plain text, depending on their purpose and requirements.
JsonRpc protocol: a stateless, lightweight remote procedure call (RPC) protocol that uses JSON for data exchange. It is one way of implementing a service-oriented architecture (SOA) and supports cross-language, cross-platform interaction between applications. Its core concept is the request/response model: the client sends a request to the server, and the server processes it and returns a response. Both requests and responses are JSON objects containing fields such as the method name, parameters, and id. JsonRpc is simple, efficient, and easy to implement across many programming languages and platforms, and custom methods and parameters are easy to add.
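The following is a minimal request/response round trip in the JSON-RPC 2.0 shape, using only Python's `json` module. The method name `completeCode` and its parameter are hypothetical. Transport (for example a WebSocket connection), batch requests, and notifications without an id are left out.

```python
import json


def make_request(method: str, params: dict, request_id: int) -> str:
    """Serialize a JSON-RPC 2.0 request object."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": request_id})


def handle_request(raw: str, handlers: dict) -> str:
    """Dispatch a request to a handler and serialize the JSON-RPC 2.0 response."""
    request = json.loads(raw)
    handler = handlers.get(request.get("method"))
    if handler is None:
        # -32601 is the code the JSON-RPC 2.0 specification reserves for "Method not found".
        body = {"error": {"code": -32601, "message": "Method not found"}}
    else:
        body = {"result": handler(**request.get("params", {}))}
    return json.dumps({"jsonrpc": "2.0", **body, "id": request.get("id")})


if __name__ == "__main__":
    # Hypothetical server-side handler that a code completion service might expose.
    handlers = {"completeCode": lambda prefix: prefix + "ress"}
    print(handle_request(make_request("completeCode", {"prefix": "add"}, 1), handlers))
```

The response echoes the request's id. This lets a client match responses to requests when several are in flight over one connection.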
WebSocket connection: a communication protocol that establishes a full-duplex, persistent connection between client and server, enabling real-time two-way communication. Either side can send data at any time without waiting for a response from the other.
n-gram method: a language modeling method used to estimate the probability distribution of a given text sequence. An n-gram model assumes that each word depends only on the n-1 words preceding it and not on any later words, so it ignores longer-range dependencies in the text.
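Under the n-gram assumption, the probability of a sequence is approximated as a product of terms P(w_i | w_{i-n+1}, ..., w_{i-1}), each estimated from counts. The sketch below is a plain maximum-likelihood version without smoothing, so any unseen n-gram drives the sequence probability to zero. The toy corpus is made up for illustration.

```python
from collections import Counter


def ngram_model(tokens: list, n: int) -> dict:
    """Maximum-likelihood estimates of P(last word | previous n-1 words), no smoothing."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    gram_counts = Counter(grams)
    context_counts = Counter(gram[:-1] for gram in grams)
    return {gram: count / context_counts[gram[:-1]] for gram, count in gram_counts.items()}


def sequence_probability(model: dict, tokens: list, n: int) -> float:
    """Multiply the conditional probabilities of every n-gram in the sequence."""
    probability = 1.0
    for i in range(len(tokens) - n + 1):
        probability *= model.get(tuple(tokens[i:i + n]), 0.0)
    return probability


if __name__ == "__main__":
    corpus = "the cat sat on the mat".split()
    bigrams = ngram_model(corpus, 2)
    print(bigrams[("the", "cat")])                                  # 0.5
    print(sequence_probability(bigrams, ["the", "cat", "sat"], 2))  # 0.5
```

"the" is followed once by "cat" and once by "mat", so P(cat | the) = 0.5. "cat" is always followed by "sat", so the sequence "the cat sat" also scores 0.5.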
Currently, reference code is found by text similarity, which cannot guarantee that it is valid. Introducing such invalid reference code causes model hallucination in the code generation model and fails to guide it toward accurate code generation. As a result, the generated target code text, and code text processing as a whole, lack accuracy.
To address these problems, this specification provides a code text processing method. It also relates to a code supplementing method, a code text processing apparatus, a code supplementing apparatus, a computing device, a computer-readable storage medium, and a computer program, each of which is described in detail in the following embodiments.
参见图1,图1示出了本说明书一个实施例提供的一种代码文本处理方法的流程图,包括如下具体步骤:Referring to Figure 1, Figure 1 shows a flow chart of a code text processing method provided by an embodiment of this specification, including the following specific steps:
步骤102:获取目标项目的待处理代码文本。Step 102: Obtain the code text to be processed of the target project.
本说明书实施例应用于具有代码文本处理功能的应用、网站或者小程序,可以为该应用、网站或者小程序的客户端,也可以为该应用、网站或者小程序的服务端,例如,集成开发环境通过接口的形式调用该方法,又例如,直接提供代码开发的应用,在该应用的服务端上实现该方法。应用于代码文本处理任务场景中,包括但不限于:代码文本生成任务场景、代码文本补充任务场景和代码文本改写任务场景。The embodiments of this specification are applied to applications, websites or small programs with code text processing functions, and can be the client of the application, website or small program, or the server of the application, website or small program, for example, integrated development The environment calls this method in the form of an interface. For example, it directly provides an application developed with code and implements this method on the server side of the application. Applied to code text processing task scenarios, including but not limited to: code text generation task scenarios, code text supplementation task scenarios, and code text rewriting task scenarios.
目标项目为需要执行代码文本处理的代码文本所在的开发项目,例如,应用程序,接口,网站,插件和数据库管理工具等。The target project is the development project where the code text that needs to be processed is located, for example, applications, interfaces, websites, plug-ins, database management tools, etc.
The code text to be processed of the target project is the code text in the target project on which code text processing is to be performed. It may be code text to be generated, code text to be supplemented, or code text to be rewritten, which is not limited here. The code text to be processed is written in a specific programming language, including but not limited to Java, Python, JavaScript, and TypeScript. For example, a piece of code text to be supplemented, written in JavaScript, is: "var demo={name:"John",age:25,add".
The code text to be processed of the target project may be obtained in several ways, which are not limited here:

- By receiving code text of the target project uploaded from the front end. For example, a user uploads the code file of a plug-in that contains code text to be rewritten.
- By recognizing code text of the target project entered at the front end. For example, the method recognizes the code text to be supplemented that a user is typing while developing a plug-in in the front-end interface of an IDE.
- By retrieving the code text of the target project from a database. For example, an index is used to fetch the code file of a plug-in, containing code text to be rewritten, from a code database.
For example, a code generation interface is deployed in a JavaScript IDE. After starting the interface in the IDE, the user begins writing the code text of a plug-in. While the user is writing a JavaScript object, the code generation interface recognizes the current code text to be supplemented as: "var demo={name:"John",age:25,add".
Obtaining the code text to be processed of the target project provides the code text basis from which the first code metadata is subsequently parsed.
Step 104: Parse the code text to be processed to obtain first code metadata of each code element, where each code element has corresponding element information.
Code elements are the constituent units of code text. Code text contains code elements at different levels, and together they form specific structural data that represents the syntactic structure of the code text. Such structural data includes, but is not limited to, abstract syntax trees and symbol tables.

Code elements are defined differently in different programming languages and have different syntactic structures:

- **Java.** A Java class (a parent code element) includes child code elements such as the package definition, imported types, class definition, inherited class, implemented interfaces, class attributes, and class methods.
- **Python.** A Python module (a parent code element) includes child code elements such as imported types, function definitions, and global variables. A Python class includes imported types, the class definition, class attributes, class methods, and inherited classes.
- **JavaScript.** A JavaScript module includes child code elements such as imported types, function definitions, and global variables/constants. A JavaScript class includes imported types, the class definition, class attributes, class methods, and inherited classes. A JavaScript object includes imported types, the object definition, object attributes, and object methods.

For ease of description, the embodiments of this specification use code elements at three levels: object, function, and variable.
Code metadata is information that characterizes the organization, data fields, and relationships of the data in code text. It supports functions such as indicating storage locations, historical data, resource lookup, and file records. The code metadata of a code element describes that element's organization, data fields, and relationships. The first code metadata is the code metadata of each code element in the code text to be processed.

For example, the code text to be supplemented "var demo={name:"John",age:25,add" contains an object element ("demo") and a function element ("add"). The first code metadata of the object element is "name":"demo","signature":"var demo","full_name":"project.path.demo","fields". The first code metadata of the function element is "methods":{"add":{"method_name":"add","signature":"add:function()"}}.
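The following is a minimal sketch of how object-level code metadata in these examples might be assembled as nested, JSON-compatible data. It assumes the key names that appear in this section's examples (name, signature, full_name, fields, methods). The helper `object_metadata` is hypothetical. It fills in the detailed form of `fields` shown in the step 108 example, rather than the bare "fields" key of the incomplete input.

```python
import json


def object_metadata(name: str, full_name: str, fields: dict, methods: list) -> dict:
    """Hypothetical helper: assemble object-level code metadata using the key
    names from the examples (name, signature, full_name, fields, methods)."""

    def field_signature(field: str, value) -> str:
        # Strings are quoted as in "name:'John'"; other values are not, as in "age:25".
        return f"{field}:'{value}'" if isinstance(value, str) else f"{field}:{value}"

    return {
        "name": name,
        "signature": f"var {name}",
        "full_name": full_name,
        "fields": {
            field: {
                "field_name": field,
                "field_value": str(value),
                "signature": field_signature(field, value),
            }
            for field, value in fields.items()
        },
        "methods": {
            method: {"method_name": method, "signature": f"{method}:function()"}
            for method in methods
        },
    }


meta = object_metadata("demo", "project.path.demo", {"name": "John", "age": 25}, ["add"])
print(json.dumps(meta))
```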
The element information of a code element is its attribute information, including but not limited to its name, location, and type. For example, for the code element "demo":

- The name is demo.
- The location information is a start row/column of [0,1], an end row/column of [0,8], and a scope such as inside a method body, inside a class, a method definition, or a class definition.
- The type is JavaScript object.
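Element information and the parent/child hierarchy can be held in a small record type. The sketch below is illustrative only. The class name `CodeElement`, its fields, and the `path` helper are hypothetical, chosen to mirror the name, location, scope, and type attributes listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CodeElement:
    """Hypothetical record for one code element and its element information."""
    name: str
    element_type: str                 # e.g. "JavaScript object", "function"
    start: Tuple[int, int]            # (row, column) where the element starts
    end: Tuple[int, int]              # (row, column) where the element ends
    scope: str                        # e.g. "class definition", "method body"
    # Tree links are excluded from repr/eq so printing and comparison do not walk the tree.
    parent: Optional["CodeElement"] = field(default=None, repr=False, compare=False)
    children: List["CodeElement"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child: "CodeElement") -> "CodeElement":
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Element names from the outermost ancestor down to this element."""
        names: List[str] = []
        node: Optional["CodeElement"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Location of "demo" as given in the text; the position of "add" is illustrative.
demo = CodeElement("demo", "JavaScript object", (0, 1), (0, 8), "module")
add = demo.add_child(CodeElement("add", "function", (0, 29), (0, 31), "object definition"))
print(add.path())  # ['demo', 'add']
```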
Specifically, to obtain the first code metadata of each code element, the method performs syntactic structure analysis on the code text to be processed. This step is implemented by a code parser, for example, a Parser for Java and JavaScript, or the Pygments parser for Python. Syntactic structure analysis can be based on specific structural data, including but not limited to abstract syntax trees and symbol tables. The abstract-syntax-tree approach is described with reference to FIG. 3 below.
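For Python source, the standard-library `ast` module can stand in for the parser step. This is a swapped-in technique. Pygments, named above, is primarily a lexer and highlighter, and this sketch does not claim to reproduce the embodiment's parser. The function name `parse_metadata` and the output keys are assumptions. The keys are modeled on the examples above, plus a hypothetical `kind` and `location`.

```python
import ast


def parse_metadata(source: str, module_path: str) -> dict:
    """Collect metadata for the top-level classes, functions and global
    variables of a Python module by walking its abstract syntax tree."""
    tree = ast.parse(source)
    elements = {}

    def func_meta(fn: ast.FunctionDef) -> dict:
        params = ", ".join(arg.arg for arg in fn.args.args)
        return {"method_name": fn.name, "signature": f"def {fn.name}({params})"}

    for node in tree.body:
        # (start line, start column, end line, end column); end_* needs Python 3.8+
        location = [node.lineno, node.col_offset, node.end_lineno, node.end_col_offset]
        if isinstance(node, ast.ClassDef):
            elements[node.name] = {
                "kind": "class",
                "signature": f"class {node.name}",
                "full_name": f"{module_path}.{node.name}",
                "bases": [ast.unparse(base) for base in node.bases],  # Python 3.9+
                "methods": {item.name: func_meta(item) for item in node.body
                            if isinstance(item, ast.FunctionDef)},
                "location": location,
            }
        elif isinstance(node, ast.FunctionDef):
            elements[node.name] = {"kind": "function", **func_meta(node),
                                   "full_name": f"{module_path}.{node.name}",
                                   "location": location}
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):  # global variable
                    elements[target.id] = {"kind": "variable",
                                           "full_name": f"{module_path}.{target.id}",
                                           "location": location}
    return elements


src = '''
class Demo(Base):
    def __init__(self):
        self.name = "John"

    def add(self, x):
        return x

def helper():
    pass

LIMIT = 10
'''
meta = parse_metadata(src, "project.path")
print(meta["Demo"]["methods"]["add"]["signature"])  # def add(self, x)
```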
For example, a Parser performs syntactic structure analysis on the code text to be supplemented, "var demo={name:"John",age:25,add", using abstract syntax tree structural data. It obtains the first code metadata of the object element ("demo") as "name":"demo","signature":"var demo","full_name":"project.path.demo","fields". It obtains the first code metadata of the function element ("add") as "methods":{"add":{"method_name":"add","signature":"add:function()"}}.
Parsing the code text to be processed according to its own structure yields the first code metadata of each code element. This lays the code-element foundation for subsequently determining the target query information, and provides code text support for subsequently generating the target code text.
Step 106: Determine target query information according to the element information of each code element.
The target query information is distinguishing query information used to look up code metadata. Query information is constructed from the structural information among code elements, namely their hierarchical relationships, such as module-object-function-variable.

For example, many objects in the code files of the target project may contain a function named "add". Because the function element "add" is a child of the object element "demo", the target query information is determined as demo+add. This "parent code element name + child code element name" form is distinguishing across all code files of the target project.

In the embodiments of this specification, the storage path of a code element is distinguishing, so it can be used as the query information:

- **Java.** The storage path "Java package element name + Java class element name" is used as query information.
- **Python.** The storage path "Python module path + Python module element name", or "Python module path + Python module element name + Python class element name", is used.
- **JavaScript.** The storage path is one of the following: "JavaScript module path + JavaScript module element name", "JavaScript module path + JavaScript module element name + JavaScript class element name", or "JavaScript module path + JavaScript module element name + JavaScript object element name".

The concrete format is, for example, "/workspace/project/path$demo".
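The sketch below turns a storage path plus the parent-to-child chain of element names into a lookup key. Only the single-level format "/workspace/project/path$demo" comes from the text. Using "$" between every further level (for example, demo then add) is an assumption, as is the function name `build_query_key`.

```python
from typing import Sequence


def build_query_key(module_path: str, element_names: Sequence[str], sep: str = "$") -> str:
    """Hypothetical helper: storage path followed by the parent-to-child chain
    of element names. Using `sep` between every level is an assumption
    extrapolated from the single-level example "/workspace/project/path$demo"."""
    if not element_names:
        raise ValueError("at least one element name is required")
    return module_path + sep + sep.join(element_names)


print(build_query_key("/workspace/project/path", ["demo"]))         # /workspace/project/path$demo
print(build_query_key("/workspace/project/path", ["demo", "add"]))  # /workspace/project/path$demo$add
```

Because the parent name is part of the key, an "add" function inside a different object yields a different key. This is the distinguishing property the text relies on.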
Specifically, the target query information is determined according to the element information of the target code elements. A target code element is a code unit, among the code elements of the code text to be processed, that is used to look up the second code metadata. The target code elements are structurally related. The method determines the structural information among the target code elements from their element information, and then determines the target query information from that structural information.

For example, suppose the code text to be processed contains three code elements: an object element ("demo"), a function element ("add"), and a function element ("mul"). If the element to be supplemented is "mul", the target code elements are "demo" and "mul". The structural information is object element ("demo") + function element ("mul"), and the function element ("add") is not a target code element.

The target query information may be obtained by using the element information of the target code elements to query a pre-recorded information table of query information. For example, a pre-recorded reference dictionary records correspondences between element information and query information, and the method queries it with the element information of the target code elements to obtain the corresponding target query information. Alternatively, the element information of the target code elements may be used directly as the target query information, which is not limited here. The reference dictionary is described with reference to FIG. 4 below.
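The sketch below shows one way to select the target code elements and produce the target query information, either by reference dictionary lookup or by using element information directly. It assumes element names are unique within the code text, and represents the hierarchy as a hypothetical name-to-parent mapping. The "+" joining in the fallback follows the demo+add notation above.

```python
from typing import Dict, List, Optional, Tuple


def target_chain(parents: Dict[str, Optional[str]], focus: str) -> List[str]:
    """Return the focus element and its ancestors, outermost first.
    `parents` maps each element name to its parent name (None at top level);
    siblings such as "add" are never included."""
    chain: List[str] = []
    node: Optional[str] = focus
    while node is not None:
        chain.append(node)
        node = parents[node]
    return list(reversed(chain))


def resolve_query(chain: List[str], reference_dict: Dict[Tuple[str, ...], str]) -> str:
    """Look the chain up in a reference dictionary; otherwise fall back to
    using the element information (the names) directly."""
    return reference_dict.get(tuple(chain), "+".join(chain))


parents = {"demo": None, "add": "demo", "mul": "demo"}
chain = target_chain(parents, "mul")                                     # ['demo', 'mul']
reference_dict = {("demo", "mul"): "/workspace/project/path$demo$mul"}  # hypothetical entry
print(resolve_query(chain, reference_dict))            # /workspace/project/path$demo$mul
print(resolve_query(["demo", "add"], reference_dict))  # demo+add
```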
For example, the method queries the reference dictionary, which pre-records correspondences between element information and query information, with the element names of the target code elements (the object element "demo" and the function element "add"). This yields the target query information: module path + module name + demo + add.
Determining the target query information from the element information of each code element yields an explicit code reference relationship. This provides the basis for subsequently querying the second code metadata.
Step 108: Look up, in a target database, second code metadata corresponding to the target query information. The target database records correspondences between reference query information and reference code metadata, and the reference query information is constructed from the element information of each code element in the project files of the target project.
The target database is a pre-built database of the code metadata of each code element in the project files of the target project. It also records the correspondences between reference query information and reference code metadata, which serve as the basis for querying code metadata. The target database may be built after code text processing is started, or in advance before it is started. The reference query information and reference code metadata serve code metadata queries in the form of key-value pairs.
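The target database only needs key-value semantics: reference query information as the key and reference code metadata as the value. The text does not name a storage engine. As one possible backing store (an assumption), the sketch below uses Python's built-in `sqlite3` with JSON-encoded values. The class name `MetadataStore` and the table schema are hypothetical.

```python
import json
import sqlite3
from typing import Optional


class MetadataStore:
    """Hypothetical key-value store: reference query information -> reference code metadata."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS code_metadata ("
            "query_key TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
        )

    def put(self, query_key: str, metadata: dict) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO code_metadata (query_key, metadata) VALUES (?, ?)",
                (query_key, json.dumps(metadata)),
            )

    def get(self, query_key: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT metadata FROM code_metadata WHERE query_key = ?", (query_key,)
        ).fetchone()
        return json.loads(row[0]) if row else None


store = MetadataStore()
store.put("/workspace/project/path$demo$add", {"method_name": "add", "signature": "add:function()"})
print(store.get("/workspace/project/path$demo$add"))
```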
The reference query information is the distinguishing query information recorded in the target database for querying code metadata; see the description of the target query information in step 106. The reference code metadata is the code metadata of each code element in the project files of the target project. The second code metadata is the code metadata recorded in the target database that corresponds to the target query information. For both, see the description of the first code metadata in step 104.
It should be noted that the target database is obtained in advance. The project files of the target project are parsed to obtain the reference code metadata of each code element, and that metadata is stored under reference query information constructed from the element information of each code element. The project files that participate in parsing may be all or only some of the project files of the target project. Correspondingly, the reference code metadata may cover all or only some of the code elements, and can be updated dynamically according to the actual query scenario. For example, the reference code metadata can be written into a queue that is updated as queries occur.
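The text leaves the queue-based dynamic update unspecified. One plausible reading, sketched here as an assumption, is a bounded least-recently-used queue. It loads metadata on demand and evicts entries that have not been queried recently. `MetadataQueue` and the `loader` callback are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Optional


class MetadataQueue:
    """Bounded LRU queue of reference code metadata (an assumed interpretation)."""

    def __init__(self, capacity: int, loader: Callable[[str], Optional[dict]]):
        self.capacity = capacity
        self.loader = loader
        self.items: "OrderedDict[str, dict]" = OrderedDict()

    def get(self, key: str) -> Optional[dict]:
        if key in self.items:
            self.items.move_to_end(key)         # mark as most recently used
            return self.items[key]
        value = self.loader(key)                # miss: load on demand
        if value is not None:
            self.items[key] = value
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict least recently used
        return value


calls = []


def load(key: str) -> dict:
    # Stand-in for re-parsing a project file (hypothetical).
    calls.append(key)
    return {"key": key}


queue = MetadataQueue(capacity=2, loader=load)
for key in ["a", "b", "a", "c"]:
    queue.get(key)
print(list(queue.items))  # ['a', 'c']
```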
Specifically, the second code metadata corresponding to the target query information is looked up based on the correspondences between reference query information and reference code metadata recorded in the target database. The key-value pairs are built from these correspondences: the target query information serves as the key, and the corresponding value is the second code metadata.
For example, the method uses the key-value pairs in the target database with the target query information "module path + module name + demo + add" as the key. It finds the following second code metadata as the corresponding value: {"name":"demo","signature":"var demo","full_name":"project.path.demo","fields":{"name":{"field_name":"name","field_value":"John","signature":"name:'John'"},"age":{"field_name":"age","field_value":"25","signature":"age:25"}},"methods":{"add":{"method_name":"add","signature":"add:function()"}}}.
In this step, the method looks up the second code metadata corresponding to the target query information in the target database. The database records the correspondences between reference query information and reference code metadata, and the reference query information is built from the element information of the code elements in the target project's project files. This explicit code reference relationship between query information and code metadata is used to query the target database and obtain valid second code metadata. That metadata provides reference code for the code generation model to generate the target code file.
Step 110: Generate target code text with a code generation model based on the first code metadata and the second code metadata, where the code generation model is obtained by training a text processing model on sample code text.
The code generation model is a deep learning model capable of generating code text. It is obtained by training a text processing model on sample code text, that is, by fine-tuning the text processing model for code text processing tasks. The code generation model may be a Transformer model, a BERT model, or a large language model, which is not limited here.
The target code text is the code text obtained by performing code text processing on the code text to be processed, that is, the processing result. It may be generated code text, supplemented code text, or rewritten code text, which is not limited here. The target code text is written in a specific programming language, which may be the same as or different from that of the code text to be processed.

For example, the code text to be supplemented, written in JavaScript, is "var demo={name:"John",age:25,add". Performing code supplementation yields the supplemented JavaScript code text "var demo={name:"John",age:25,add:function(){console.log(this.name);},};demo.add". Performing code rewriting yields the rewritten Python code text "class demo: def __init__(self): self.name = "John" self.age = 25 def add(self): print(self.name) demo = demo() demo.add()".
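The rewritten Python code text in the example above is flattened onto a single line. Laid out with the indentation Python requires, the same statements read as follows. The layout is a reconstruction; the statements are unchanged.

```python
class demo:
    def __init__(self):
        self.name = "John"
        self.age = 25

    def add(self):
        print(self.name)


demo = demo()  # rebinds the name "demo" from the class to an instance, as in the example
demo.add()     # prints: John
```

Note that `demo = demo()` rebinds the class name to an instance, so the class can no longer be instantiated by that name afterward. This mirrors the example rather than recommending the pattern.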
Specifically, the code generation model performs code text processing on the first code metadata, using the second code metadata as reference code, to generate the target code text.
Optionally, before step 110, the method further includes the following specific step:
Preprocess the code text to be processed to obtain context information, where the preprocessing is context information extraction;
Correspondingly, step 110 includes the following specific step:
Input the context information, the first code metadata, and the second code metadata into the code generation model. Based on the context information and the second code metadata, perform code text processing on the first code metadata to generate the target code text.
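The text does not specify how the three inputs are combined. A minimal sketch, assuming the model accepts a single text prompt, serializes both metadata records as JSON and places them ahead of the context. The section headers and the function name `build_prompt` are hypothetical, and the call to the model itself is omitted.

```python
import json


def build_prompt(context: str, first_meta: dict, second_meta: dict) -> str:
    """Hypothetical prompt layout: reference metadata first, then the metadata of
    the code being processed, then the raw context ending at the cursor."""
    return "\n".join([
        "# Reference code metadata (from the target database):",
        json.dumps(second_meta, ensure_ascii=False),
        "# Code metadata of the text being processed:",
        json.dumps(first_meta, ensure_ascii=False),
        "# Context:",
        context,
        "# Continue the code after the context:",
    ])


first = {"methods": {"add": {"method_name": "add", "signature": "add:function()"}}}
second = {"name": "demo", "signature": "var demo", "full_name": "project.path.demo"}
prompt = build_prompt('var demo={name:"John",age:25,add', first, second)
# `prompt` would then be sent to the code generation model (call not shown).
```

Placing the retrieved reference metadata before the incomplete code means the model reads the valid definitions before it reaches the point where it must continue.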
The context information is the surrounding code text in the code text to be processed. It reflects information such as coding habits, variable definitions, and parameter values.
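Context extraction can be as simple as cutting windows of lines around the cursor and collecting declared names. The window sizes, the regular expression for JavaScript `var`/`let`/`const` declarations, and the function names are illustrative assumptions, not the embodiment's preprocessing.

```python
import re
from typing import List, Tuple

# JavaScript declarations introduced with var/let/const (illustrative pattern).
DECLARATION = re.compile(r"\b(?:var|let|const)\s+([A-Za-z_$][\w$]*)")


def extract_context(code: str, cursor: int, before: int = 20, after: int = 5) -> Tuple[str, str]:
    """Return up to `before` lines ending at the cursor and up to `after` lines from it."""
    prefix_lines = code[:cursor].split("\n")[-before:]
    suffix_lines = code[cursor:].split("\n")[:after]
    return "\n".join(prefix_lines), "\n".join(suffix_lines)


def declared_names(code: str) -> List[str]:
    """Names declared in the code, one ingredient of the context information."""
    return DECLARATION.findall(code)


source = 'const limit = 10;\nvar demo={name:"John",age:25,add'
prefix, suffix = extract_context(source, len(source))
print(declared_names(prefix))  # ['limit', 'demo']
```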
For example, the code text to be supplemented, "var demo={name:"John",age:25,add", is preprocessed to obtain context information such as coding habits, variable definitions, and parameter values. The following inputs go into a large language model capable of code text processing:

- the context information;
- the first code metadata "methods":{"add":{"method_name":"add","signature":"add:function()"}};
- the second code metadata {"name":"demo","signature":"var demo","full_name":"project.path.demo","fields":{"name":{"field_name":"name","field_value":"John","signature":"name:'John'"},"age":{"field_name":"age","field_value":"25","signature":"age:25"}},"methods":{"add":{"method_name":"add","signature":"add:function()"}}}.

Based on the context information and the second code metadata, the large language model performs code supplementation on the first code metadata. It generates the supplemented code text: "var demo={name:"John",age:25,add:function(){console.log(this.name);},};demo.add".
In the embodiments of this specification, the method proceeds as follows:

1. Obtain the code text to be processed of the target project.
2. Parse the code text to be processed to obtain the first code metadata of each code element, where each code element has corresponding element information.
3. Determine target query information according to the element information of each code element.
4. Look up, in a target database, second code metadata corresponding to the target query information. The target database records correspondences between reference query information and reference code metadata, and the reference query information is constructed from the element information of each code element in the project files of the target project.
5. Generate target code text with a code generation model based on the first code metadata and the second code metadata, where the code generation model is obtained by training a text processing model on sample code text.

The project files of the target project are parsed in advance. Reference query information is constructed from the element information of each code element and stored in the target database together with the corresponding reference code metadata. During code text processing, the code text to be processed is parsed to obtain the first code metadata of each code element, and the target query information is determined from the element information of each code element.

This explicit code reference relationship is used to query the target database and obtain valid second code metadata. That metadata serves as reference code to guide the code generation model in generating highly accurate target code text, thereby improving the accuracy of code text processing.
In an optional embodiment of this specification, before step 108, the method further includes the following specific steps:
Obtain the project files of the target project;
Parse the project files to obtain the reference code metadata of each code element;
Construct reference query information according to the element information of each code element;
Construct the target database based on the correspondences between the reference query information and the reference code metadata.
The project files of the target project are project code files, that is, files related to the development project. They include but are not limited to source code files, resource files, binary files, log files, and path files.
Specifically, the project files are parsed by performing syntactic structure analysis on their code text to obtain the reference code metadata of each code element. This step is implemented by a code parser, for example, a Parser for Java and JavaScript, or the Pygments parser for Python. Syntactic structure analysis can be based on specific structural data, including but not limited to abstract syntax trees and symbol tables.
Specifically, the structural information among the code elements is determined according to their element information, and the reference query information is constructed based on that structural information.
Specifically, key-value pairs are constructed from the correspondences between reference query information and reference code metadata. The reference code metadata is stored according to these key-value pairs to obtain the target database.
Optionally, after the project files of the target project are obtained, the following step is further included:
Invalid project files are filtered out of the project files.
Invalid project files are project files that cannot be parsed or cannot be used for code text processing, such as overly long files, binary files, log files, and path files.
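The filtering step could look like the following sketch. The size limit, the excluded suffixes, and the NUL-byte test for binary content are all assumed heuristics, because the text gives no concrete criteria. "Path files" are not handled, since the text does not define them.

```python
from pathlib import PurePath

# Hypothetical thresholds and suffixes; the text does not specify concrete values.
MAX_BYTES = 1_000_000
EXCLUDED_SUFFIXES = {".log", ".bin", ".exe", ".dll", ".so", ".class", ".png", ".jpg"}


def is_valid_project_file(name: str, content: bytes, max_bytes: int = MAX_BYTES) -> bool:
    """Heuristically decide whether a project file is worth parsing."""
    if PurePath(name).suffix.lower() in EXCLUDED_SUFFIXES:
        return False  # log files and known binary formats
    if len(content) > max_bytes:
        return False  # overly long file
    if b"\x00" in content[:8192]:
        return False  # NUL bytes usually indicate binary content
    try:
        content.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not decodable as UTF-8 text
    return True


def filter_project_files(files: dict[str, bytes]) -> dict[str, bytes]:
    """Keep only the files that pass is_valid_project_file."""
    return {name: data for name, data in files.items() if is_valid_project_file(name, data)}
```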
It should be noted that the embodiments of this specification can run automatically once it is determined, during code development, that code text processing is needed. Correspondingly, obtaining the project files of the target project specifically means obtaining them in response to a code text processing request. For example, after starting the target project, the developer decides to start the code text processing service process. A WebSocket connection to the remote code text processing server is then established based on the JSON-RPC protocol. Once the connection is established, the plug-in sends an initialization code text processing request to the local code text processing process. On receiving this request, the local process begins obtaining the project files of the target project by indexing them.
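The initialization request travels as a JSON-RPC 2.0 message. The sketch below only builds and parses the message bodies, following the JSON-RPC 2.0 fields (jsonrpc, id, method, params for requests; result or error for responses). The WebSocket transport is omitted. The method name codeText/initialize and its params are hypothetical.

```python
import itertools
import json

_next_id = itertools.count(1)  # request ids let responses be matched to requests


def make_request(method: str, params: dict) -> tuple[int, str]:
    """Serialize a JSON-RPC 2.0 request; returns (id, message text)."""
    request_id = next(_next_id)
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return request_id, json.dumps(message)


def parse_response(raw: str, expected_id: int):
    """Return the result of a JSON-RPC 2.0 response, raising on errors."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0" or message.get("id") != expected_id:
        raise ValueError("not a response to the expected request")
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return message["result"]


# Usage (hypothetical method name and params):
# rid, text = make_request("codeText/initialize", {"projectRoot": "/workspace/demo"})
```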
FIG. 2 shows a flowchart of constructing the target database in a code text processing method provided by an embodiment of this specification. As shown in FIG. 2:
The project files of the target project are obtained, and invalid project files are filtered out. The parser for the corresponding programming language is called to parse the project files and obtain the reference code metadata of each code element. Reference query information is constructed from the element information of each code element. Finally, the target database is constructed based on the correspondence between the reference query information and the reference code metadata.
For example, in response to a code text processing request, the project files of a plug-in are obtained. Overly long files, binary files, log files, and path files are filtered out.

The Parser then parses the project files using the structured data of the abstract syntax tree. This yields the reference code metadata of each code element (JavaScript module, JavaScript import type, JavaScript function definition, …, JavaScript object method).

According to the element name of each code element, the structural information between the code elements is determined:
(JavaScript module: JavaScript import type, JavaScript function definition, and JavaScript global variable/constant);
(JavaScript class: JavaScript import type, JavaScript class definition, JavaScript class attribute, JavaScript class method, and JavaScript inherited class);
(JavaScript object: JavaScript import type, JavaScript object definition, JavaScript object attribute, and JavaScript object method).

Based on this structural information, the reference query information is constructed:
(JavaScript module path + JavaScript module element name);
(JavaScript module path + JavaScript module element name + JavaScript class element name);
(JavaScript module path + JavaScript module element name + JavaScript object element name).

Key-value pairs are then built from the correspondence between the reference query information and the reference code metadata. The reference code metadata is stored according to these pairs to obtain the target database.
The project files of the target project are obtained and parsed to obtain the reference code metadata of each code element. Reference query information is constructed from the element information of each code element, and the target database is constructed from the correspondence between the reference query information and the reference code metadata. This correspondence, between query information determined from element information and code metadata, forms an explicit code reference relationship. Building the target database on it lays the foundation for determining the corresponding code metadata in subsequent code text processing.
In an optional embodiment of this specification, parsing the project files to obtain the reference code metadata of each code element includes the following steps:
Syntactic structure analysis is performed on the code in the project files to obtain the project syntax tree of the project files;
The project syntax tree is parsed to obtain the reference code metadata of each code element.
The project syntax tree is the abstract syntax tree of a project file. It is specific structured data describing the syntactic structure of the source code text in the project file. Its tree structure describes the dependencies between the code elements in the project file and their positions within the file.
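The following sketch shows how a syntax tree exposes both nesting (dependencies) and positions. It again uses Python's ast module as an assumed stand-in. Each class, function, and parameter is recorded together with its enclosing element and its (line, column) span. The example source mirrors the shape of FIG. 3 (one container holding two functions, each with variables a and b), not its exact code.

```python
import ast

NAMED_SCOPES = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def element_tree(source: str):
    """Return (name, parent_name, start, end) for classes, functions and parameters.

    Parent links follow the nesting in the AST; start/end are (line, column) pairs.
    """
    records = []

    def visit(node, parent_name):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, NAMED_SCOPES):
                name = child.name
            elif isinstance(child, ast.arg):
                name = child.arg  # parameters play the role of variable elements
            else:
                visit(child, parent_name)  # unnamed node: keep searching beneath it
                continue
            records.append((name, parent_name,
                            (child.lineno, child.col_offset),
                            (child.end_lineno, child.end_col_offset)))
            visit(child, name)

    visit(ast.parse(source), None)
    return records


# Illustrative source with the same shape as FIG. 3 (not the figure's own code).
SOURCE = (
    "class Demo:\n"
    "    def add(self, a, b):\n"
    "        return a + b\n"
    "    def mul(self, a, b):\n"
    "        return a * b\n"
)
```

Calling element_tree(SOURCE) lists Demo first, then add with its parameters, then mul with its parameters. The two a entries differ only by parent, which is exactly what the later query keys must disambiguate.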
FIG. 3 shows a schematic diagram of an abstract syntax tree in a code text processing method provided by an embodiment of this specification. As shown in FIG. 3:
In the project file, the code text is “var demo={name:"John",age:25,add:function(){console.log(this.name);},};demo.add”. The abstract syntax tree is constructed as follows: node 1 connects to nodes 2 and 3, node 2 connects to nodes 4 and 5, and node 3 connects to nodes 6 and 7. Parsing this abstract syntax tree yields the following elements:
object element (node 1) - demo;
function element (node 2) - add;
function element (node 3) - mul;
variable element (node 4) - a;
variable element (node 5) - b;
variable element (node 6) - a;
variable element (node 7) - b.
From these, the reference code metadata of each code element is obtained.
For example, the Parser analyzes the syntactic structure of the code in a plug-in's project files to obtain the project syntax tree. It then parses the project syntax tree to obtain the reference code metadata of each code element (JavaScript module, JavaScript import type, JavaScript function definition, …, JavaScript object method).
Syntactic structure analysis is performed on the code in the project files to obtain the project syntax tree, which is then parsed to obtain the reference code metadata of each code element. The abstract syntax tree enables high-accuracy analysis of the code's syntactic structure and yields accurate reference code metadata for each code element. This lays the data foundation for the explicit code reference relationship used later, namely the correspondence between query information (determined from element information) and code metadata.
In an optional embodiment of this specification, constructing the reference query information according to the element information of each code element includes the following steps:
The structural information between the code elements is determined according to the element information of each code element;
Based on the structural information between the code elements, the storage path of the reference code metadata of each code element is constructed and used as the reference query information.
The structural information between code elements is their syntactic structure information, specifically the hierarchical syntactic structure among them. It is expressed as a hierarchy of parent-type code elements and child-type code elements. Different programming languages have different syntactic structures, for example:

In Java, the parent code element Java class includes child code elements such as the Java package definition, Java import type, Java class definition, Java inherited class, Java implemented interface, Java class attribute, and Java class method.

In Python, the parent code element Python module includes child code elements such as the Python import type, Python function definition, and Python global variable. The parent code element Python class includes child code elements such as the Python import type, Python class definition, Python class attribute, Python class method, and Python inherited class.

In JavaScript, the parent code element JavaScript module includes child code elements such as the JavaScript import type, JavaScript function definition, and JavaScript global variable/constant. The parent code element JavaScript class includes child code elements such as the JavaScript import type, JavaScript class definition, JavaScript class attribute, JavaScript class method, and JavaScript inherited class. The parent code element JavaScript object includes child code elements such as the JavaScript import type, JavaScript object definition, JavaScript object attribute, and JavaScript object method.
The storage path of the reference code metadata is the storage path of the parent-type code element at the time the project files are imported. It is obtained together with the project files of the target project; for example, a module code element has a corresponding module path. The structural information between code elements is appended to this path to form the reference query information, for example module path + module name + object name + function name + variable name. Consider two variable elements whose module name, object name, function name, and variable name are all identical: their reference query information still differs because their storage paths differ, so each element remains uniquely identifiable.
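The key construction can be sketched as plain string concatenation, producing one key per nesting level. The "+" separator and the sample paths are assumptions. Because the storage path and every enclosing name are part of the key, the two variables named a (one under add, one under mul) receive different keys.

```python
def query_keys(module_path: str, chain: list[str]) -> list[str]:
    """Build one reference key per nesting level by prefixing the storage path.

    chain is [module_name, object_name, function_name, variable_name, ...].
    """
    return ["+".join([module_path, *chain[:depth]]) for depth in range(1, len(chain) + 1)]


# Illustrative usage: same-named variables under different functions.
add_a = query_keys("src/utils", ["demo.js", "demo", "add", "a"])
mul_a = query_keys("src/utils", ["demo.js", "demo", "mul", "a"])
```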
Specifically, the structural information is determined by assigning each child-type code element to its corresponding parent-type code element according to the element information. Further, the assignment can take into account both the element information and the programming language of each code element.
For example, each child-type code element is assigned to its parent-type code element according to its element name and programming language (JavaScript). This gives the following structural information:
(JavaScript module: JavaScript import type, JavaScript function definition, and JavaScript global variable/constant);
(JavaScript class: JavaScript import type, JavaScript class definition, JavaScript class attribute, JavaScript class method, and JavaScript inherited class);
(JavaScript object: JavaScript import type, JavaScript object definition, JavaScript object attribute, and JavaScript object method).
The storage path of the parent-type code element is then concatenated with this structural information to obtain the reference query information:
(JavaScript module path + JavaScript module element name);
(JavaScript module path + JavaScript module element name + JavaScript class element name);
(JavaScript module path + JavaScript module element name + JavaScript object element name).
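The parent/child assignment might be sketched as follows. The allowed child kinds for each parent kind are transcribed from the JavaScript hierarchy listed above. The kind labels (module, function, global, method, and so on) and the element dict layout are hypothetical. Children whose kind is not permitted under their parent's kind are left out of the structure.

```python
from collections import defaultdict

# Parent kind -> child kinds allowed beneath it, following the JavaScript
# hierarchy described above. The kind labels themselves are illustrative.
JS_HIERARCHY = {
    "module": {"import", "function", "global"},
    "class": {"import", "class_def", "property", "method", "extends"},
    "object": {"import", "object_def", "property", "method"},
}


def attach_children(elements: list[dict], hierarchy: dict = JS_HIERARCHY) -> dict[int, list[str]]:
    """Group child element names under their parent element's id.

    Each element is a dict with 'id', 'kind', 'name' and 'parent' (a parent id or None).
    """
    by_id = {e["id"]: e for e in elements}
    structure = defaultdict(list)
    for element in elements:
        parent = by_id.get(element["parent"])
        if parent is None:
            continue  # top-level element, nothing to attach to
        if element["kind"] in hierarchy.get(parent["kind"], set()):
            structure[parent["id"]].append(element["name"])
    return dict(structure)
```

Keying the result by parent id rather than parent name keeps two same-named parents apart.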
The structural information between the code elements is determined from their element information. From that structural information, the storage path of each code element's reference code metadata is constructed and used as the reference query information. Combining structural information with storage paths produces uniquely identifying reference query information and an accurate correspondence between query information and code metadata. This explicit code reference relationship lays the foundation for constructing the target database.
In an optional embodiment of this specification, the following step is further included before step 104:
Target position information of the code element to be processed in the code text to be processed is obtained.
Correspondingly, step 104 includes the following steps:
Syntactic structure analysis is performed on the code in the code text to be processed to obtain its basic syntax tree;
The basic syntax tree is parsed to obtain the first code metadata of each code element.
Correspondingly, step 106 includes the following steps:
Based on the target position information, the code elements recorded in the basic syntax tree are traversed to determine the target code element;
The target query information is determined based on the element information of the target code element.
The code element to be processed is the code element on which code text processing is to be performed. Its target position information is its position in the code text to be processed, including but not limited to the line number, column number, and range. This position may be a selected position, such as the current cursor position, or the position of the code element currently being written; no limitation is imposed here.
The basic syntax tree is the abstract syntax tree of the code text to be processed. It is specific structured data describing the syntactic structure of that text. Its tree structure describes the dependencies between the code elements in the code text to be processed and their positions within it.
The target code element is the code unit, among the code elements of the code text to be processed, that is used to look up the second code metadata. It is found by traversing the basic syntax tree using the target position information.

Take the abstract syntax tree in FIG. 3 as an example, with the following target position information: start row/column [0,1]; end row/column [0,5]; range: inside the method body. From the position information recorded for the code elements in the basic syntax tree, the code element at the target position is determined to be node 5. Node 5 is connected to node 2, and node 2 is connected to node 1. The code elements of nodes 1, 2, and 5 are therefore the target code elements: the object element ("demo"), the function element ("add"), and the variable element ("b").
Specifically, the target code element is determined by traversing the position information recorded for each code element in the basic syntax tree and matching it against the target position information.
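Below is a minimal sketch of the position-based traversal, assuming each tree node carries a (line, column) start and end. The search descends from the root into whichever child span contains the target position. The nodes visited on the way are the target code elements, as with nodes 1, 2, and 5 in the FIG. 3 example. The Node class and the sample spans are illustrative and not taken from the figure.

```python
from dataclasses import dataclass, field

Pos = tuple[int, int]  # (line, column); tuples compare lexicographically


@dataclass
class Node:
    node_id: int
    kind: str
    name: str
    start: Pos
    end: Pos
    children: list = field(default_factory=list)

    def contains(self, pos: Pos) -> bool:
        return self.start <= pos <= self.end


def locate(root: Node, pos: Pos) -> list:
    """Descend from root to the innermost node whose span contains pos.

    Returns the chain of nodes on that path (root first), or [] if pos is outside root.
    """
    if not root.contains(pos):
        return []
    chain = [root]
    while True:
        inner = next((c for c in chain[-1].children if c.contains(pos)), None)
        if inner is None:
            return chain
        chain.append(inner)


# Illustrative tree shaped like FIG. 3 (spans are made up for the example).
tree = Node(1, "object", "demo", (0, 0), (5, 1), [
    Node(2, "function", "add", (1, 2), (2, 30), [
        Node(4, "variable", "a", (1, 10), (1, 11)),
        Node(5, "variable", "b", (1, 13), (1, 14)),
    ]),
    Node(3, "function", "mul", (3, 2), (4, 30), [
        Node(6, "variable", "a", (3, 10), (3, 11)),
        Node(7, "variable", "b", (3, 13), (3, 14)),
    ]),
])
```

Because sibling spans do not overlap, the first containing child is the only one, so the descent never backtracks.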
Specifically, the target query information is determined by using the element information of the target code element to query an information table in which query information has been recorded in advance.
For example, consider the code text to be completed: “var demo={name:"John",age:25,add”. The target position information of the code element to be completed ("add") is obtained: start row/column [0,1]; end row/column [0,2]; range: inside the method body.

The Parser parses the syntactic structure of this code text and produces the following first code metadata:
object element ("demo"): "name":"demo","signature":"var demo","full_name":"project.path.demo","fields";
function element ("add"): "methods":{"add":{"method_name":"add","signature":"add:function()"}}.

Based on the target position information, the position information of each code element recorded in the basic syntax tree is traversed. The target code elements are determined to be the object element ("demo") and the function element ("add"). Their element names are used to query the reference dictionary in which query information has been recorded in advance, giving the target query information: module path + module name + demo + add.
In this embodiment, the target position information of the code element to be processed is obtained. The code text to be processed is syntactically analyzed to obtain its basic syntax tree, which is parsed to obtain the first code metadata of each code element. The code elements recorded in the basic syntax tree are traversed based on the target position information to determine the target code element, and the target query information is determined from that element's element information. Traversing the parsed tree by target position pinpoints the target code element and its corresponding target query information. This provides an accurate basis for the subsequent query of the second code metadata.
In an optional embodiment of this specification, the following step is further included before the target query information is determined based on the element information of the target code element:
A first reference dictionary is queried with the element identifier of the target code element to obtain its element information. The first reference dictionary is updated while the basic syntax tree is traversed, and it records the correspondence between the element identifier and the element information of each code element.
Correspondingly, determining the target query information based on the element information of the target code element includes the following step:
A second reference dictionary is queried with the element information of the target code element to obtain its storage path, which serves as the target query information. The second reference dictionary records the correspondence between the element information and the storage path of each code element.
The element identifier is the identifier of the code element's node in the basic syntax tree. For example, if the target position information identifies node 5, the element identifier is node 5.
The first reference dictionary is built in advance and records the correspondence between the element identifier and the element information of each code element. It is a dynamic information table that is continuously updated while the basic syntax tree is traversed. It may also record the position information of each node (element identifier). For example, for node 1 the first reference dictionary records: object element - demo; position information: start row/column [0,1], end row/column [0,8]; range, such as inside a method body, inside a class, a method definition, or a class definition.
The second reference dictionary is built in advance and records the correspondence between the element information and the storage path of each code element. The storage path serves as query information for determining the target query information. Because element information and storage paths are recorded as key-value pairs, the storage path can be looked up directly from the element information. For example, for variable element a, the second reference dictionary records:
element information: object element - demo + function element - add + variable element - a;
storage path: module path + module name + demo + add + a.
FIG. 4 shows a schematic diagram of the first reference dictionary and the second reference dictionary in a code text processing method provided by an embodiment of this specification. As shown in FIG. 4:
For the abstract syntax tree of the code text in FIG. 3, the first reference dictionary is updated while the tree is traversed. It records:
Node 1: object element - demo; position information: start row/column [0,1], end row/column [0,8]; range: object definition.
Node 2: function element - add; position information: start row/column [0,2], end row/column [0,4]; range: inside the module.
Node 3: function element - mul; position information: start row/column [0,5], end row/column [0,7]; range: inside the module.
Node 4: variable element - a; position information: start row/column [0,2], end row/column [0,3]; range: method definition.
Node 5: variable element - b; position information: start row/column [0,2], end row/column [0,3]; range: method definition.
Node 6: variable element - a; position information: start row/column [0,5], end row/column [0,6]; range: method definition.
Node 7: variable element - b; position information: start row/column [0,5], end row/column [0,6]; range: method definition.
The second reference dictionary records the following key-value pairs:
(element information: object element - demo; storage path: module path + module name + demo);
(element information: object element - demo + function element - add; storage path: module path + module name + demo + add);
(element information: object element - demo + function element - mul; storage path: module path + module name + demo + mul);
(element information: object element - demo + function element - add + variable element - a; storage path: module path + module name + demo + add + a);
(element information: object element - demo + function element - add + variable element - b; storage path: module path + module name + demo + add + b);
(element information: object element - demo + function element - mul + variable element - a; storage path: module path + module name + demo + mul + a);
(element information: object element - demo + function element - mul + variable element - b; storage path: module path + module name + demo + mul + b).
The second reference dictionary shows how code elements with the same name are told apart. For example, variable element a at node 4 and variable element a at node 6 have explicit, distinct element information: object element - demo + function element - add + variable element - a, versus object element - demo + function element - mul + variable element - a. This guarantees the accuracy of the target query information and yields valid second code metadata. It avoids introducing invalid reference code that would cause the code generation model to hallucinate, and so achieves high-accuracy code text processing. In this approach, each code element's metadata is retrieved through a full path that uniquely identifies it.
It should be noted that the first reference dictionary is updated during the traversal. When duplicate element information is encountered, the duplicates are merged, and the entry with the most recently updated position information is used to determine the target query information.
For example, the first reference dictionary Refmap is queried with the element identifiers of the target code elements, node 1 and node 2. This returns the element names of the target code elements: the object element ("demo") and the function element ("add"). The second reference dictionary importRefmap is then queried with these element names, returning the storage path of the target code elements as the target query information: module path + module name + demo + add.
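The two-step lookup can be sketched with plain dictionaries. The names ref_map and import_ref_map stand in for Refmap and importRefmap. The entry layout is an assumption modeled on FIG. 4: (kind, name) pairs form the key of the second dictionary, and "+"-joined path strings are its values.

```python
# First reference dictionary (stands in for Refmap): node id -> element info,
# filled in while the basic syntax tree is traversed.
ref_map = {
    1: {"kind": "object", "name": "demo", "start": (0, 1), "end": (0, 8)},
    2: {"kind": "function", "name": "add", "start": (0, 2), "end": (0, 4)},
    3: {"kind": "function", "name": "mul", "start": (0, 5), "end": (0, 7)},
}

# Second reference dictionary (stands in for importRefmap): element-info chain -> storage path.
import_ref_map = {
    (("object", "demo"),): "module_path+module_name+demo",
    (("object", "demo"), ("function", "add")): "module_path+module_name+demo+add",
    (("object", "demo"), ("function", "mul")): "module_path+module_name+demo+mul",
}


def resolve_query(node_ids, refs=ref_map, imports=import_ref_map):
    """Map target node ids to their storage path (the target query information).

    Returns None if any id is unknown or the element chain has no recorded path.
    """
    chain = []
    for node_id in node_ids:
        info = refs.get(node_id)  # step 1: element identifier -> element information
        if info is None:
            return None
        chain.append((info["kind"], info["name"]))
    return imports.get(tuple(chain))  # step 2: element information -> storage path
```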
In this embodiment, the first reference dictionary is queried with the element identifier of the target code element to obtain its element information. This dictionary is updated while the basic syntax tree is traversed and records the correspondence between each code element's identifier and element information. The second reference dictionary is then queried with that element information to obtain the target code element's storage path as the target query information; it records the correspondence between each code element's element information and storage path. Together, the two lookups resolve each element to a path that uniquely identifies it. This ensures that valid second code metadata is obtained accurately, avoids introducing invalid reference code that would cause the code generation model to hallucinate, and achieves high-accuracy code text processing.
在本说明书一种可选实施例中,在基于目标代码元素的元素标识,查询第一引用词典之后,还包括如下具体步骤:In an optional embodiment of the present specification, after searching the first reference dictionary based on the element identifier of the target code element, the following specific steps are also included:
在查询到目标代码元素的元素信息的情况下,利用目标位置信息,更新第一引用词典中目标代码元素的位置信息。When the element information of the target code element is queried, the target location information is used to update the location information of the target code element in the first reference dictionary.
第一引用词典是在遍历基础语法树的过程中更新的,因此在目标位置信息为当前编写的代码元素的位置信息的情况下,随着编写过程的进行,在第一引用词典中记录有目标代码元素的位置信息的情况下,需要利用目标位置信息对位置信息进行动态更新,保证第一引用词典中记录信息的准确性。例如,对于节点1,第一引用词典中已经记录有节点1:对象元素-demo;位置信息:起始行列号[0,1];终止行列号[0,8];范围:对象体内部,则目标位置信息为第9行,更新后的节点1的元素信息为:对象元素-demo;位置信息:起始行列号[0,1];终止行列号[0,9];范围:对象体内部。The first reference dictionary is updated while the basic syntax tree is traversed. The target position information may be the position of the code element currently being written. If, as writing proceeds, the first reference dictionary already records position information for the target code element, that position information must be updated dynamically with the target position information. This keeps the information in the first reference dictionary accurate. For example, suppose the first reference dictionary already records node 1 as: object element-demo; position information: start row/column [0,1], end row/column [0,8]; scope: inside the object body. If the target position information is line 9, the updated element information of node 1 is: object element-demo; position information: start row/column [0,1], end row/column [0,9]; scope: inside the object body.
示例性地,在查询到目标代码元素(对象元素("demo")和函数元素("add"))的元素名称的情况下,利用目标位置信息(第9行),更新第一引用词典中目标代码元素的位置信息:起始行列号[0,1];终止行列号[0,9];范围:对象体内部。For example, once the element names of the target code elements (the object element "demo" and the function element "add") are found, the target position information (line 9) is used to update their position information in the first reference dictionary to: start row/column [0,1]; end row/column [0,9]; scope: inside the object body.
在查询到目标代码元素的元素信息的情况下,利用目标位置信息,更新第一引用词典中目标代码元素的位置信息。保证了第一引用词典记录的元素信息和位置信息的准确性,保证了后续查询的准确度。When the element information of the target code element is queried, the target position information is used to update the position information of the target code element in the first reference dictionary. This ensures the accuracy of the element information and position information recorded in the first reference dictionary, and ensures the accuracy of subsequent queries.
在本说明书一种可选实施例中,该方法还包括如下具体步骤:In an optional embodiment of this specification, the method also includes the following specific steps:
在未查询到目标代码元素的元素信息的情况下,将目标代码元素的元素标识、元素信息和位置信息记录至第一引用词典中。If the element information of the target code element is not found, the element identification, element information and position information of the target code element are recorded in the first reference dictionary.
第一引用词典是在遍历基础语法树的过程中更新的,因此在目标位置信息为当前编写的代码元素的位置信息的情况下,随着编写过程的进行,在第一引用词典中未记录有目标代码元素的位置信息的情况下,表明正在编写的代码元素为新的代码元素,需要将目标代码元素的元素标识、元素信息和位置信息记录至第一引用词典中,保证第一引用词典中记录信息的完整性。例如,对于节点1,第一引用词典中未经记录有节点7,将元素标识(节点7)、元素信息(变量元素-b)和位置信息(起始行列号[0,5];终止行列号[0,6];范围:方法定义)记录至第一引用词典中。The first reference dictionary is updated during the process of traversing the basic syntax tree. Therefore, when the target position information is the position information of the code element currently being written, as the writing process proceeds, if the position information of the target code element is not recorded in the first reference dictionary, it indicates that the code element being written is a new code element, and the element identifier, element information and position information of the target code element need to be recorded in the first reference dictionary to ensure the integrity of the information recorded in the first reference dictionary. For example, for node 1, node 7 is not recorded in the first reference dictionary, and the element identifier (node 7), element information (variable element-b) and position information (starting row and column number [0,5]; ending row and column number [0,6]; range: method definition) are recorded in the first reference dictionary.
示例性地,在未查询到目标代码元素(变量元素(b))的元素名称的情况下,将元素标识(节点7)、元素信息(变量元素-b)和位置信息(起始行列号[0,5];终止行列号[0,6];范围:方法定义)记录至第一引用词典中。For example, when the element name of the target code element (the variable element "b") is not found, the following are recorded in the first reference dictionary: the element identifier (node 7), the element information (variable element-b), and the position information (start row/column [0,5]; end row/column [0,6]; scope: method definition).
在未查询到目标代码元素的元素信息的情况下,将目标代码元素的元素标识、元素信息和位置信息记录至第一引用词典中。保证了第一引用词典记录的元素标识、元素信息和位置信息的完整性,保证了后续查询的准确度。If the element information of the target code element is not found, the element identification, element information and location information of the target code element are recorded in the first reference dictionary, thereby ensuring the integrity of the element identification, element information and location information recorded in the first reference dictionary and the accuracy of subsequent queries.
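The update-or-insert behaviour of the first reference dictionary described above can be sketched as follows. The sketch is hedged. The Entry fields and the function record_element are illustrative names, not taken from the source. Following the example, only the end position of an existing entry is refreshed.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    info: str     # element information, e.g. "object element-demo"
    start: tuple  # (row, column) where the element starts
    end: tuple    # (row, column) where the element ends
    scope: str    # e.g. "inside the object body"


def record_element(ref_map: dict, node_id: str, info: str,
                   start: tuple, end: tuple, scope: str) -> str:
    """Update a known element's position, or insert a newly written element."""
    entry = ref_map.get(node_id)
    if entry is not None:
        # Element information found: refresh its position with the target
        # position information (only the end moves in the example).
        entry.end = end
        return "updated"
    # Not found: the element is new, so record its identifier, info and position.
    ref_map[node_id] = Entry(info, start, end, scope)
    return "inserted"


ref_map = {"node1": Entry("object element-demo", (0, 1), (0, 8), "inside the object body")}
print(record_element(ref_map, "node1", "object element-demo", (0, 1), (0, 9), "inside the object body"))
print(record_element(ref_map, "node7", "variable element-b", (0, 5), (0, 6), "method definition"))
```

Because the dictionary is keyed by element identifier, a repeated element overwrites its own entry rather than creating a duplicate. This matches the merging of duplicate element information described earlier.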
在本说明书一种可选实施例中,第二代码元数据为至少一个;In an optional embodiment of this specification, there is at least one piece of second code metadata;
对应地,步骤110包括如下具体步骤:Correspondingly, step 110 includes the following specific steps:
基于至少一个第二代码元数据对应的目标代码元素的位置信息,确定至少一个第二代码元数据的权重;Determine the weight of at least one second code metadata based on the position information of the target code element corresponding to the at least one second code metadata;
基于各第二代码元数据的权重,将各第二代码元数据放入代码文本序列;Based on the weight of each second code metadata, each second code metadata is placed into a code text sequence;
将第一代码元数据和代码文本序列输入代码生成模型,生成目标代码文本。The first code metadata and the code text sequence are input into the code generation model to generate target code text.
由于代码生成模型是基于样本代码文本对文本处理模型训练得到,文本生成模型可以处理的文本长度是有限的,在获取大量第二代码元数据的情况下,不能全部输入代码生成模型,需要进行选择。第一引用词典中记录的位置信息是动态更新的,前后可能编写了相似代码元素,都可以作为第二代码元数据,优先选择新编写的代码元数据,让其有更高权重。The code generation model is obtained by training a text processing model on sample code text, so the length of text it can process is limited. When a large amount of second code metadata is obtained, not all of it can be input into the model, and a selection must be made. The position information recorded in the first reference dictionary is updated dynamically. Similar code elements may have been written both earlier and later, and all of them can serve as second code metadata. The more recently written code metadata is preferred and given a higher weight.
第二代码元数据的权重为文本序列中的排序权重,权重越高越优先放入文本序列。例如,对于第二代码元数据:"methods":{"add":{"method_name":"add","signature":"add:function()"}}和"methods":{"mul":{"method_name":"mul","signature":"mul:function()"}},前者的权重高于后者,则代码文本序列为"methods":{"add":{"method_name":"add","signature":"add:function()"}},分隔符,"methods":{"mul":{"method_name":"mul","signature":"mul:function()"}}。The weight of a piece of second code metadata is its ordering weight in the text sequence: the higher the weight, the earlier it is placed in the sequence. Consider two pieces of second code metadata, "methods":{"add":{"method_name":"add","signature":"add:function()"}} and "methods":{"mul":{"method_name":"mul","signature":"mul:function()"}}. If the former has the higher weight, the code text sequence is "methods":{"add":{"method_name":"add","signature":"add:function()"}}, delimiter, "methods":{"mul":{"method_name":"mul","signature":"mul:function()"}}.
在本说明书一种可选实施例中,在基于至少一个第二代码元数据对应的目标代码元素的位置信息,确定至少一个第二代码元数据的权重之前,还包括如下具体步骤:In an optional embodiment of this specification, before determining the weight of at least one second code metadata based on the position information of the target code element corresponding to the at least one second code metadata, the following specific steps are further included:
获取待处理代码文本中待处理代码元素的目标位置信息;Obtain the target position information of the code element to be processed in the code text to be processed;
对应地,基于至少一个第二代码元数据对应的目标代码元素的位置信息,确定至少一个第二代码元数据的权重,包括如下具体步骤:Correspondingly, determining the weight of at least one second code metadata based on the position information of the target code element corresponding to the at least one second code metadata includes the following specific steps:
基于目标位置信息和至少一个第二代码元数据对应的目标代码元素的位置信息,确定目标代码元素与待处理代码元素之间的位置距离;Determine the positional distance between the target code element and the code element to be processed based on the target position information and the position information of the target code element corresponding to the at least one second code metadata;
基于目标代码元素与待处理代码元素之间的位置距离,确定至少一个第二代码元数据的权重。A weight of at least one second code metadata is determined based on a positional distance between the target code element and the code element to be processed.
可选地,在代码文本序列的长度达到预设阈值的情况下,停止执行步骤108。Optionally, when the length of the code text sequence reaches a preset threshold, step 108 is stopped.
另外,如果第二代码元数据在高速读写介质中时,权重添加固定数值。而且,根据代码语言和不同的范围,设定不同权重。In addition, if the second code metadata is in a high-speed read-write medium, a fixed value is added to the weight. Moreover, different weights are set according to the code language and different scopes.
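The distance-based weighting, the fixed bonus for cached metadata and the length threshold can be combined in one short sketch. Several details are assumptions, not taken from the text: the weight form 1/distance (inferred from the worked example later in this section), the delimiter token "<SEP>", the bonus value, and the use of character count as a stand-in for the model's token limit.

```python
from dataclasses import dataclass

# Hypothetical delimiter token; the text only says "delimiter".
DELIMITER = "<SEP>"


@dataclass
class Candidate:
    text: str             # serialized second code metadata
    line: int             # line of the corresponding target code element
    cached: bool = False  # True if read from a high-speed medium (memory/cache)


def weight(c: Candidate, target_line: int, cache_bonus: float) -> float:
    distance = max(abs(target_line - c.line), 1)  # guard against division by zero
    w = 1.0 / distance                            # nearer elements weigh more
    if c.cached:
        w += cache_bonus                          # fixed bonus for cached metadata
    return w


def build_sequence(cands, target_line, max_chars, cache_bonus=0.0):
    ranked = sorted(cands, key=lambda c: weight(c, target_line, cache_bonus), reverse=True)
    parts, length = [], 0
    for c in ranked:
        extra = len(c.text) + (len(DELIMITER) if parts else 0)
        if length + extra > max_chars:
            break  # preset threshold reached: stop adding metadata
        parts.append(c.text)
        length += extra
    return DELIMITER.join(parts)


add = Candidate('"methods":{"add":{"method_name":"add"}}', line=9)
mul = Candidate('"methods":{"mul":{"method_name":"mul"}}', line=5)
print(build_sequence([mul, add], target_line=14, max_chars=1000))
```

With the cursor on line 14, "add" (distance 5, weight 1/5) is placed before "mul" (distance 9, weight 1/9). This reproduces the ordering in the worked example.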
可选地,权重还可以通过预设算法计算得到,预设算法包括但不限于:Optionally, the weight can also be calculated through a preset algorithm, which includes but is not limited to:
方法1:Method 1:
在获取目标项目的项目文件时候,构建项目文件中的代码引用序列,采用n-gram方式建立索引。在获得第二代码元数据并进行权重时,获取目标位置信息前一定范围的代码引用序列信息,并通过n-gram算法预测得到第二代码元数据,将第二代码元数据的代码引用序列信息中权重确定为权重。When the project files of the target project are obtained, the code reference sequences in those files are constructed and indexed using n-grams. When second code metadata is obtained and weighted, the code reference sequence information within a certain range before the target position is retrieved, and the n-gram algorithm is used to predict the second code metadata. The weight that the second code metadata carries in this code reference sequence information is then taken as its weight.
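A minimal n-gram weighting along the lines of Method 1 might look like the sketch below. It assumes a "code reference sequence" is a list of referenced element names. It also assumes the weight is the conditional frequency of a candidate following the preceding n-1 references. The text does not define the index format or the exact weight, so both are illustrative.

```python
from collections import Counter, defaultdict


def build_index(sequences, n=2):
    """Count which reference follows each (n-1)-reference context."""
    index = defaultdict(Counter)
    for seq in sequences:
        for i in range(n - 1, len(seq)):
            context = tuple(seq[i - n + 1:i])
            index[context][seq[i]] += 1
    return index


def ngram_weight(index, preceding, candidate, n=2):
    """Estimated probability that `candidate` is the next reference."""
    context = tuple(preceding[-(n - 1):]) if n > 1 else ()
    counts = index.get(context)  # .get avoids creating empty entries
    if not counts:
        return 0.0
    return counts[candidate] / sum(counts.values())


# Reference sequence collected from project files (illustrative data).
index = build_index([["demo", "add", "demo", "add", "demo", "mul"]])
print(ngram_weight(index, ["demo"], "add"), ngram_weight(index, ["demo"], "mul"))
```

In this toy index, "add" follows "demo" twice and "mul" once. "add" would therefore rank ahead of "mul".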
方法2:Method 2:
通过构建样本数据,根据样本代码元数据的位置信息、范围信息、语法结构等多种特征维度训练深度学习模型,得到排序模型,用于对代码元数据进行排序。在将各第二代码元数据放入代码文本序列前,将获取的所有代码元数据的位置信息、范围信息、语法结构等多种特征维度的信息输入到排序模型中,为每个代码元数据输出概率值,根据概率值高低确定权重,将各第二代码元数据放入代码文本序列。By constructing sample data, a deep learning model is trained based on multiple feature dimensions such as location information, range information, and grammatical structure of sample code metadata to obtain a sorting model for sorting code metadata. Before putting each second code metadata into the code text sequence, the location information, range information, grammatical structure, and other feature dimensions of all the acquired code metadata are input into the sorting model, and a probability value is output for each code metadata. The weight is determined according to the probability value, and each second code metadata is put into the code text sequence.
示例性地,基于2个第二代码元数据对应的目标代码元素(函数元素("add")和函数元素("mul"))的位置信息(第9行和第5行)和目标位置信息(第14行),2个目标代码元素的距离为:5和9,基于2个目标代码元素的距离,确定2个第二代码元数据的权重为:1/5和1/9,基于2个第二代码元数据的权重(1/5大于1/9),将2个第二代码元数据放入代码文本序列:"methods":{"add":{"method_name":"add","signature":"add:function()"}},分隔符,"methods":{"mul":{"method_name":"mul","signature":"mul:function()"}},将第一代码元数据和代码文本序列输入大语言模型,生成补充代码文本:“var demo={name:"John",age:25,add:function(){console.log(this.name);},};demo.add”。For example, the two pieces of second code metadata correspond to the function elements "add" and "mul", located on lines 9 and 5, and the target position information is line 14. The distances of the two target code elements are therefore 5 and 9, which gives the two pieces of second code metadata weights of 1/5 and 1/9. Because 1/5 is greater than 1/9, they are placed into the code text sequence as "methods":{"add":{"method_name":"add","signature":"add:function()"}}, delimiter, "methods":{"mul":{"method_name":"mul","signature":"mul:function()"}}. The first code metadata and the code text sequence are then input into the large language model, which generates the supplementary code text "var demo={name:"John",age:25,add:function(){console.log(this.name);},};demo.add".
基于至少一个第二代码元数据对应的目标代码元素的位置信息,确定至少一个第二代码元数据的权重;基于各第二代码元数据的权重,将各第二代码元数据放入代码文本序列;将第一代码元数据和代码文本序列输入代码生成模型,生成目标代码文本。通过位置信息,实现了对第二代码元数据的合理排序,构建了代码文本序列,在满足了代码生成模型的文本序列限定的情况下,优先处理更可能被引用的第二代码元数据,提升了代码文本处理的准确度。The weight of each piece of second code metadata is determined from the position information of its corresponding target code element. The second code metadata is placed into the code text sequence according to these weights. The first code metadata and the code text sequence are then input into the code generation model to generate the target code text. Ordering the second code metadata by position yields a code text sequence that stays within the model's sequence-length limit while putting the metadata most likely to be referenced first. This improves the accuracy of code text processing.
在本说明书一种可选实施例中,在步骤106之前,还包括如下具体步骤:In an optional embodiment of the present specification, before step 106, the following specific steps are also included:
识别待处理代码文本的代码语言;Identify the code language of the code text to be processed;
对应地,步骤106包括如下具体步骤:Correspondingly, step 106 includes the following specific steps:
根据代码语言和各代码元素的元素信息,确定代码语言下的目标查询信息;According to the code language and the element information of each code element, determine the target query information under the code language;
从代码语言的目标数据库中,查找目标查询信息对应的第二代码元数据。Search the second code metadata corresponding to the target query information from the target database of the code language.
不同的代码语言,对应有不同的元素信息,进而确定不同的查询信息,具体参见步骤104中对查询信息的说明。同时,不同的代码语言的代码元数据也不同,因而,需要根据代码语言和各代码元素的元素信息,确定代码语言下的目标查询信息,并从代码语言的目标数据库中,查找目标查询信息对应的第二代码元数据。Different code languages have different element information and therefore yield different query information; see the description of query information in step 104. Code metadata also differs between code languages. The target query information for a code language must therefore be determined from the code language and the element information of each code element. The second code metadata corresponding to that query information is then looked up in the target database for that code language.
示例性地,待补充代码文本为:“var demo={name:"John",age:25,add”,识别待补充代码文本的代码语言为JavaScript,根据代码语言JavaScript和各代码元素中目标代码元素(对象元素("demo")和函数元素("add"))的元素名称,查询预先记录有JavaScript元素信息和查询信息之间的对应关系的引用词典,得到目标查询信息:模块路径+模块名+demo+add,基于JavaScript数据库中记录的参考查询信息和参考代码元数据的对应关系的键值对,以目标查询信息“模块路径+模块名+demo+add”为键,查找对应值的第二代码元数据:{"name":"demo","signature":"var demo","full_name":"project.path.demo","fields":{"name":{"field_name":"name","field_value":"John","signature":"name:'John'"},"age":{"field_name":"age","field_value":"25","signature":"age:25"}},"methods":{"add":{"method_name":"add","signature":"add:function()"}}}。For example, the code text to be supplemented is "var demo={name:"John",age:25,add", and its code language is identified as JavaScript. A reference dictionary pre-records the correspondence between JavaScript element information and query information. It is queried with the element names of the target code elements (the object element "demo" and the function element "add"), which yields the target query information module path+module name+demo+add. The JavaScript database stores key-value pairs mapping reference query information to reference code metadata. Using "module path+module name+demo+add" as the key, the second code metadata stored as its value is retrieved: {"name":"demo","signature":"var demo","full_name":"project.path.demo","fields":{"name":{"field_name":"name","field_value":"John","signature":"name:'John'"},"age":{"field_name":"age","field_value":"25","signature":"age:25"}},"methods":{"add":{"method_name":"add","signature":"add:function()"}}}.
识别待处理代码文本的代码语言;根据代码语言和各代码元素的元素信息,确定代码语言下的目标查询信息;从代码语言的目标数据库中,查找目标查询信息对应的第二代码元数据。通过识别代码语言,对应保证了确定的目标查询信息的准确性和查询得到第二代码元数据的有效性,保证了代码文本处理的准确度。The code language of the code text to be processed is identified. The target query information for that language is determined from the code language and the element information of each code element. The second code metadata corresponding to the target query information is then looked up in that language's target database. Identifying the code language ensures that the target query information is accurate and that the retrieved second code metadata is valid, which in turn ensures accurate code text processing.
图5示出了本说明书一个实施例提供的一种代码文本处理方法中代码文本序列的更新流程图,如图5所示:Figure 5 shows a flowchart of updating the code text sequence in a code text processing method provided by an embodiment of this specification, as shown in Figure 5:
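A per-language lookup can be sketched as below. The extension-based language detection, the EXT_TO_LANG table, and the in-memory DATABASES dict are all hypothetical stand-ins. The text does not say how the language is identified or how the target database is stored. For simplicity the query key uses the same "+"-joined format for every language.

```python
# Hypothetical mapping from file extension to code language (an assumption).
EXT_TO_LANG = {"js": "javascript", "ts": "typescript", "py": "python"}

# Hypothetical per-language target databases: query information -> second code metadata.
DATABASES = {
    "javascript": {
        "module_path+module_name+demo+add": {
            "name": "demo",
            "full_name": "project.path.demo",
            "methods": {"add": {"method_name": "add", "signature": "add:function()"}},
        },
    },
}


def detect_language(filename: str):
    """Guess the code language from the file extension."""
    if "." not in filename:
        return None
    return EXT_TO_LANG.get(filename.rsplit(".", 1)[1].lower())


def lookup(filename: str, names: list, module_path: str, module_name: str):
    language = detect_language(filename)
    if language is None:
        return None
    # Same key format for all languages here; real query formats may differ per language.
    query = "+".join([module_path, module_name, *names])
    # Search only the database of the detected language.
    return DATABASES.get(language, {}).get(query)


print(lookup("demo.js", ["demo", "add"], "module_path", "module_name"))
```

Keeping one database per language means a JavaScript query can never return, for example, Python metadata that happens to share a path.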
在开发者打开目标项目中的项目文件、关闭目标项目中的项目文件或者对目标项目中的项目文件进行修改的情况下,需要对应更新代码元数据。更新的代码元数据可能为近期需要使用的代码元数据,将更新的代码元数据加入动态文件队列(位于内存或者缓存等高速读写介质),提升后续代码文本处理的效率。When a developer opens, closes, or modifies a project file in the target project, the code metadata must be updated accordingly. The updated code metadata is likely to be needed soon. It is therefore added to a dynamic file queue held in a high-speed read/write medium such as memory or cache, which speeds up subsequent code text processing.
具体地:打开项目文件/关闭项目文件/修改项目文件;请求本地文件变更接口;对变更后的项目文件进行解析,获得各代码元素的代码元数据;根据代码语言将代码元数据加入动态文件队列;判断队列长度是否超出阈值;若否,直接结束;若是,按照先入先出顺序,移除队列中的代码元数据,结束。Specifically: open the project file/close the project file/modify the project file; request the local file change interface; parse the changed project file to obtain the code metadata of each code element; add the code metadata to the dynamic file queue according to the code language ; Determine whether the queue length exceeds the threshold; if not, end directly; if so, remove the code metadata in the queue in first-in, first-out order and end.
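The queue maintenance in the Figure 5 flow can be sketched as a per-language first-in-first-out queue with a length threshold. The class and method names are hypothetical, and the file-change interface and parsing are left out. An explicit eviction loop mirrors the flowchart's "check length, then remove oldest" steps. Python's `deque(maxlen=...)` would give the same behaviour automatically.

```python
from collections import deque


class DynamicFileQueue:
    """Per-language FIFO queue of recently changed code metadata (a sketch)."""

    def __init__(self, max_len: int):
        self.max_len = max_len
        self.queues = {}  # code language -> deque of code metadata

    def on_file_changed(self, language: str, metadata_items) -> None:
        # Called after a project file is opened, closed or modified and re-parsed.
        queue = self.queues.setdefault(language, deque())
        queue.extend(metadata_items)
        # If the queue length exceeds the threshold, remove the oldest entries first.
        while len(queue) > self.max_len:
            queue.popleft()


q = DynamicFileQueue(max_len=3)
q.on_file_changed("javascript", ["demo", "add"])
q.on_file_changed("javascript", ["mul", "sub"])
print(list(q.queues["javascript"]))  # ['add', 'mul', 'sub']
```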
图6示出了本说明书一个实施例提供的一种代码文本处理方法中代码文本的实时分析流程图,如图6所示:Figure 6 shows a flow chart of real-time analysis of code text in a code text processing method provided by an embodiment of this specification, as shown in Figure 6:
在开发者编写代码文本的过程中,会实时调用代码文本处理进程中的代码生成接口,并将正在编写的目标项目的待处理代码文本传入进程中,实现以下步骤:During the process of developers writing code text, the code generation interface in the code text processing process will be called in real time, and the pending code text of the target project being written will be passed into the process to implement the following steps:
首先,获取目标项目的待处理代码文本;请求本地代码生成接口。First, obtain the code text to be processed of the target project; request the local code generation interface.
接着,对待处理代码文本进行解析,获得各代码元素的第一代码元数据,并根据各代码元素的元素信息,确定目标查询信息;从目标数据库中,查找目标查询信息对应的第二代码元数据。Next, the code text to be processed is parsed to obtain the first code metadata of each code element, and the target query information is determined according to the element information of each code element; and the second code metadata corresponding to the target query information is searched from the target database.
同时,对待处理代码文本进行预处理,获得代码上下文信息。At the same time, the code text to be processed is preprocessed to obtain code context information.
最后,基于第一代码元数据、第二代码元数据和代码上下文信息,调用代码生成模型,生成目标代码文本。Finally, based on the first code metadata, the second code metadata and the code context information, the code generation model is called to generate the target code text.
图7示出了本说明书一个实施例提供的一种代码文本处理方法中代码文本的解析流程图,如图7所示:Figure 7 shows a flow chart of parsing code text in a code text processing method provided by an embodiment of this specification, as shown in Figure 7:
图6中代码解析的一个可行实施例如下:A possible implementation of the code parsing in Figure 6 is as follows:
获取目标项目的待处理代码文本和待处理代码文本中待处理代码元素的目标位置信息;对待处理代码文本中的代码进行语法结构分析,得到待处理代码文本的基础语法树;解析基础语法树,获得各代码元素的第一代码元数据;基于目标位置信息,对基础语法树中记录的各代码元素进行遍历,确定目标代码元素;基于目标代码元素的元素标识,查询第一引用词典,并记录各代码元素的元素标识和元素信息的对应关系;判断是否查询到目标代码元素的元素信息;若否,将目标代码元素的元素标识、元素信息和位置信息记录至第一引用词典中;若是,利用目标位置信息,更新第一引用词典中目标代码元素的位置信息;基于所述目标代码元素的元素信息,查询第二引用词典,获得目标代码元素的存储路径作为目标查询信息,并记录各代码元素的元素信息和存储路径的对应关系;从目标数据库中,查找所述目标查询信息对应的第二代码元数据,基于第二代码元数据对应的目标代码元素的位置信息,确定第二代码元数据的权重;基于各第二代码元数据的权重,将各第二代码元数据放入代码文本序列。Obtain the code text to be processed of the target project and the target position information of the code element to be processed in that text; analyze the syntax structure of the code to obtain the basic syntax tree of the code text to be processed; parse the basic syntax tree to obtain the first code metadata of each code element; based on the target position information, traverse the code elements recorded in the basic syntax tree to determine the target code element; based on the element identifier of the target code element, query the first reference dictionary, and record the correspondence between the element identifier and the element information of each code element; determine whether the element information of the target code element was found; if not, record the element identifier, element information and position information of the target code element in the first reference dictionary; if so, update the position information of the target code element in the first reference dictionary with the target position information; based on the element information of the target code element, query the second reference dictionary to obtain the storage path of the target code element as the target query information, and record the correspondence between the element information and the storage path of each code element; look up the second code metadata corresponding to the target query information in the target database, and determine the weight of the second code metadata based on the position information of the corresponding target code element; based on the weight of each piece of second code metadata, place it into the code text sequence.
图8示出了本说明书一个实施例提供的一种代码文本处理方法的前端示意图,如图8所示:Figure 8 shows a front-end schematic diagram of a code text processing method provided by an embodiment of this specification, as shown in Figure 8:
在集成开发环境的前端界面上,包括代码解析控件、代码生成接口(启动或者关闭)、目标项目的项目文件目录和代码文本编写区域。The front-end interface of the integrated development environment includes code parsing controls, code generation interfaces (start or shut down), the project file directory of the target project, and the code text writing area.
目标项目的项目文件目录按照层级结构为:项目工程-项目文件1-模块1.1和模块1.2;项目工程-项目文件2。The project file directory of the target project is organized hierarchically as: project - project file 1 - module 1.1 and module 1.2; project - project file 2.
如上方图所示:代码文本编写区域已经编写的待补充代码文本为:“var demo={name:"John",age:25,add”,当前光标位置停留在第2行。As shown in the upper part of the figure, the code text to be supplemented that has already been written in the code editing area is "var demo={name:"John",age:25,add", and the cursor is currently on line 2.
如下方图所示:通过执行上述说明书实施例,得到补充代码文本:“var demo={name:"John",age:25,add:function(){console.log(this.name);},};demo.add”。利用补充代码文本对待补充代码文本进行补充。As shown in the lower part of the figure, executing the above embodiments yields the supplementary code text "var demo={name:"John",age:25,add:function(){console.log(this.name);},};demo.add", which is used to complete the code text to be supplemented.
参见图9,图9示出了本说明书一个实施例提供的一种代码补充方法的流程图,该方法应用于云侧设备,包括如下具体步骤:Referring to Figure 9, Figure 9 shows a flow chart of a code supplement method provided by an embodiment of this specification. The method is applied to cloud-side devices and includes the following specific steps:
步骤902:接收端侧设备输入的目标项目的待补充代码文本。Step 902: Receive the code text to be supplemented of the target project input by the end-side device.
步骤904:对待补充代码文本进行解析,获得各代码元素的第一代码元数据,其中,各代码元素具有对应的元素信息。Step 904: Parse the code text to be supplemented to obtain first code metadata of each code element, wherein each code element has corresponding element information.
步骤906:根据各代码元素的元素信息,确定目标查询信息。Step 906: Determine target query information according to the element information of each code element.
步骤908:从目标数据库中,查找目标查询信息对应的第二代码元数据,其中,目标数据库中记录有参考查询信息和参考代码元数据的对应关系,参考查询信息基于目标项目的项目文件中各代码元素的元素信息构建。Step 908: Find the second code metadata corresponding to the target query information from the target database. The target database records the corresponding relationship between the reference query information and the reference code metadata. The reference query information is based on each item in the project file of the target project. Element information construction of code elements.
步骤910:基于第一代码元数据和第二代码元数据,利用代码生成模型,生成补充代码文本,其中,代码生成模型基于样本代码文本对文本处理模型训练得到。Step 910: Based on the first code metadata and the second code metadata, a code generation model is used to generate supplementary code text, wherein the code generation model is obtained by training a text processing model based on sample code text.
步骤912:将补充代码文本发送至端侧设备,以使端侧设备利用补充代码文本对待补充代码文本进行补充。Step 912: Send the supplementary code text to the end-side device, so that the end-side device supplements the code text to be supplemented with the supplementary code text.
本说明书实施例应用于具有代码文本处理功能的网页、应用程序或者小程序的服务端所在的网络云设备,为一种虚拟设备,该云侧设备上部署具有代码文本处理功能的代码生成功能模型。端侧设备为用户登录的具有代码文本处理功能的网页、应用程序或者小程序的客户端所在的终端,为一种实体设备。云侧设备和端侧设备通过网络传输信道连接,进行数据传输。云侧设备的算力性能和存储性能高于端侧设备。This embodiment is applied to the network cloud device that hosts the server of a web page, application, or mini program with code text processing functionality. This cloud-side device is a virtual device, and a code generation model with code text processing functionality is deployed on it. The end-side device is a physical device: the terminal hosting the client of that web page, application, or mini program, which the user logs in to. The cloud-side device and the end-side device are connected through a network transmission channel for data transmission. The cloud-side device has higher computing and storage performance than the end-side device.
本说明书实施例与上述图1说明书实施例出于同一发明构思,步骤904-步骤910的具体方式参见图1说明书实施例中步骤104-步骤110的内容,在此不再赘述。The embodiment of this description is based on the same inventive concept as the above-mentioned embodiment of FIG. 1. For the specific methods of steps 904 to 910, please refer to the content of steps 104 to 110 in the embodiment of FIG. 1, which will not be described again here.
本说明书实施例中,接收端侧设备输入的目标项目的待补充代码文本;对待补充代码文本进行解析,获得各代码元素的第一代码元数据,其中,各代码元素具有对应的元素信息;根据各代码元素的元素信息,确定目标查询信息;从目标数据库中,查找目标查询信息对应的第二代码元数据,其中,目标数据库中记录有参考查询信息和参考代码元数据的对应关系,参考查询信息基于目标项目的项目文件中各代码元素的元素信息构建;基于第一代码元数据和第二代码元数据,利用代码生成模型,生成补充代码文本,其中,代码生成模型基于样本代码文本对文本处理模型训练得到;将补充代码文本发送至端侧设备,以使端侧设备利用补充代码文本对待补充代码文本进行补充。通过预先解析目标项目的项目文件,基于各代码元素的元素信息构建出参考查询信息,与参考代码元数据对应存储在目标数据库中,在进行代码文本处理过程中,通过解析待处理代码文本,获得各代码元素的第一代码元数据,进而根据各代码元素的元素信息,确定目标查询信息,利用这种明确的代码引用关系,查询目标数据库,得到有效的第二代码元数据,作为参考代码来引导代码生成模型,生成高准确度的补充代码文本,提升了代码补充的准确度,同时,在高计算性能和高存储性能的云侧设备上实现,提升了代码补充的效率和准确度。In this embodiment, the method proceeds as follows:
- The code text to be supplemented of the target project, input by the end-side device, is received.
- That code text is parsed to obtain the first code metadata of each code element, each of which has corresponding element information.
- Target query information is determined from the element information of each code element.
- The second code metadata corresponding to the target query information is looked up in the target database. This database records the correspondence between reference query information and reference code metadata, and the reference query information is built from the element information of each code element in the project files of the target project.
- Based on the first and second code metadata, the code generation model generates supplementary code text. The model is obtained by training a text processing model on sample code text.
- The supplementary code text is sent to the end-side device, which uses it to complete the code text to be supplemented.

The project files of the target project are parsed in advance. Reference query information is built from the element information of each code element and stored in the target database together with the corresponding reference code metadata. During code text processing, the code text to be processed is parsed to obtain the first code metadata of each code element, and the target query information is determined from the element information of each code element. This explicit code reference relationship is used to query the target database for valid second code metadata. That metadata serves as reference code that guides the code generation model toward highly accurate supplementary code text, improving the accuracy of code completion. Running on cloud-side devices with high computing and storage performance further improves the efficiency and accuracy of code completion.
在本说明书一种可选实施例中,在步骤912之后,还包括如下具体步骤:In an optional embodiment of the present specification, after step 912, the following specific steps are also included:
接收端侧设备发送的补充反馈信息,其中,补充反馈信息为针对补充代码文本进行反馈的信息;Receive supplementary feedback information sent by the end-side device, where the supplementary feedback information is information for feedback on the supplementary code text;
基于补充反馈信息,调整代码生成模型的参数。Based on the additional feedback information, the parameters of the code generation model are adjusted.
补充反馈信息为针对补充代码文本进行反馈的信息,包括但不限于:代码语法错误、代码命名错误和代码补充不完整等。Supplementary feedback information is feedback on supplementary code text, including but not limited to: code syntax errors, code naming errors, incomplete code supplements, etc.
示例性地,接收端侧设备针对补充代码文本进行反馈的补充反馈信息:存在代码语法错误,存在代码命名错误和存在代码补充不完整,基于补充反馈信息,调整大语言模型的参数。For example, supplementary feedback on the supplementary code text is received from the end-side device. It reports a code syntax error, a code naming error, and incomplete code completion, and the parameters of the large language model are adjusted based on this feedback.
本说明书实施例中,通过交互的方式,完成了对代码生成模型的参数的反馈调整,针对性地提升了代码补充的效果。In the embodiment of this specification, the feedback adjustment of the parameters of the code generation model is completed through interaction, thereby improving the effect of code supplementation in a targeted manner.
下述结合附图10,以本说明书提供的代码文本处理方法在集成开发环境的应用为例,对所述代码文本处理方法进行进一步说明。其中,图10示出了本说明书一个实施例提供的一种应用于集成开发环境的代码文本处理方法的处理过程流程图,包括如下具体步骤:The code text processing method is further described below with reference to Figure 10, using its application in an integrated development environment as an example. Figure 10 shows a flowchart of the processing of a code text processing method applied to an integrated development environment, provided by an embodiment of this specification. It includes the following specific steps:
Step 1002: When the IDE is started, obtain the project files of the target project.
Step 1004: Filter out invalid project files.
Step 1006: Perform syntax structure analysis on the code in the project files to obtain a project syntax tree, and parse the project syntax tree to obtain the reference code metadata of each code element.
Step 1008: Determine the structural relationships among the code elements from their element information, and, based on these relationships, construct the storage path of each code element's reference code metadata as the reference query information.
Step 1010: Build the target database from the correspondence between the reference query information and the reference code metadata.
Step 1012: Obtain the code text to be processed and the target position information of the code element to be processed at the current cursor position.
Step 1014: Perform syntax structure analysis on the code in the code text to be processed to obtain its basic syntax tree, and parse the basic syntax tree to obtain the first code metadata of each code element.
Step 1016: Based on the target position information, traverse the code elements recorded in the basic syntax tree to determine the target code element, and query the first reference dictionary with the element identifier of the target code element to obtain its element information. If the element information is found, update the position information of the target code element in the first reference dictionary with the target position information; if it is not found, record the element identifier, element information, and position information of the target code element in the first reference dictionary.
Step 1018: Query the second reference dictionary with the element information of the target code element to obtain its storage path as the target query information.
Step 1020: Search the target database for the second code metadata corresponding to the target query information, and merge duplicate second code metadata based on the position information of the corresponding target code elements.
Step 1022: Determine the weight of each piece of second code metadata from the position information of its corresponding target code element, and place the second code metadata into a code text sequence according to these weights.
Step 1024: Input the first code metadata and the code text sequence into the code generation model to generate the target code text.
Step 1026: Supplement the code text to be processed with the target code text, and render the result on the front-end interface of the IDE.
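Steps 1024 and 1026 leave the model interface and the prompt layout unspecified. The sketch below is only an illustration under stated assumptions: a hypothetical `generate(prompt) -> str` callable stands in for the code generation model, the weighted second code metadata and the first code metadata are rendered as `# ref:` and `# local:` comment lines (an assumed layout, not the patent's format) ahead of the code before the cursor, and the returned text is spliced in at the cursor's character offset.

```python
from typing import Callable, List


def build_prompt(first_metadata: List[str], sequence: List[str], prefix: str) -> str:
    """Assumed layout: weighted references, then local metadata, then code before the cursor."""
    lines = [f"# ref: {m}" for m in sequence]            # second code metadata, highest weight first
    lines += [f"# local: {m}" for m in first_metadata]   # first code metadata from the current file
    return "\n".join(lines + [prefix])


def complete_at_cursor(source: str, offset: int, first_metadata: List[str],
                       sequence: List[str], generate: Callable[[str], str]) -> str:
    """Generate target code text and insert it into `source` at character `offset`."""
    prompt = build_prompt(first_metadata, sequence, source[:offset])
    completion = generate(prompt)                        # hypothetical model call
    return source[:offset] + completion + source[offset:]


def fake_model(prompt: str) -> str:
    # Stand-in for the model: completes the call only if the reference reached the prompt.
    return "area()" if "# ref: def area(self, scale=1)" in prompt else ""


src = "c = Circle()\nc.\n"
cursor = src.index("c.\n") + 2  # cursor right after "c."
print(complete_at_cursor(src, cursor, ["c = Circle()"], ["def area(self, scale=1)"], fake_model))
```

Passing the model as a callable keeps the retrieval and splicing logic testable without a real model behind it.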
By parsing the entire target project and analyzing the reference relationships among code elements, highly distinguishable query information is constructed and the code metadata actually in use is retrieved from the target database. This improves the recall rate and hit rate of code references and yields valid second code metadata, which serves as reference code to guide the code generation model. This addresses the problem of model hallucination, produces highly accurate target code text, and improves the accuracy of code text processing.
Corresponding to the above method embodiments, this specification further provides embodiments of a code text processing apparatus. FIG. 11 is a schematic structural diagram of a code text processing apparatus according to an embodiment of this specification. As shown in FIG. 11, the apparatus includes:
an acquisition module 1102, configured to obtain the code text to be processed of the target project;
a first parsing module 1104, configured to parse the code text to be processed to obtain the first code metadata of each code element, where each code element has corresponding element information;
a first determination module 1106, configured to determine target query information based on the element information of each code element;
a first search module 1108, configured to search the target database for the second code metadata corresponding to the target query information, where the target database records the correspondence between reference query information and reference code metadata, and the reference query information is constructed from the element information of the code elements in the project files of the target project; and
a first generation module 1110, configured to generate the target code text with a code generation model based on the first code metadata and the second code metadata, where the code generation model is obtained by training a text processing model on sample code text.
Optionally, the apparatus further includes a building module configured to: obtain the project files of the target project; parse the project files to obtain the reference code metadata of each code element; construct reference query information from the element information of each code element; and build the target database from the correspondence between the reference query information and the reference code metadata.
Optionally, the building module is further configured to: perform syntax structure analysis on the code in the project files to obtain a project syntax tree; and parse the project syntax tree to obtain the reference code metadata of each code element.
Optionally, the building module is further configured to: determine the structural relationships among the code elements from their element information; and, based on these relationships, construct the storage path of each code element's reference code metadata as the reference query information.
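The building module's indexing pass (Steps 1002 to 1010) can be sketched as follows. This is a minimal illustration, assuming Python project files and using the standard-library `ast` module (Python 3.9+ for `ast.unparse`) as a stand-in for whatever language-specific parser an implementation would use. The `CodeMetadata` record, the use of a definition's signature as its metadata, the dotted storage path built from the module path plus enclosing scopes (standing in for the "structural relationships" among code elements), and the rule that non-`.py` or unparsable files count as invalid are all hypothetical choices, not details from the specification.

```python
import ast
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CodeMetadata:
    """Hypothetical stand-in for 'reference code metadata'."""
    name: str
    kind: str        # "class" or "function"
    signature: str   # e.g. "def area(self, scale=1)"
    lineno: int


DEFINITIONS = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def _signature(node) -> str:
    if isinstance(node, ast.ClassDef):
        bases = ", ".join(ast.unparse(b) for b in node.bases)
        return f"class {node.name}({bases})" if bases else f"class {node.name}"
    prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    return f"{prefix} {node.name}({ast.unparse(node.args)})"


def index_module(module_path: str, source: str) -> Dict[str, CodeMetadata]:
    """Map storage paths such as 'pkg.mod.Class.method' (reference query info) to metadata."""
    entries: Dict[str, CodeMetadata] = {}

    def visit(node: ast.AST, scope: List[str]) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, DEFINITIONS):
                path = ".".join([module_path, *scope, child.name])
                kind = "class" if isinstance(child, ast.ClassDef) else "function"
                entries[path] = CodeMetadata(child.name, kind, _signature(child), child.lineno)
                visit(child, [*scope, child.name])  # nested definitions extend the path
            else:
                visit(child, scope)

    visit(ast.parse(source), [])
    return entries


def build_target_database(files: Dict[str, str]) -> Dict[str, CodeMetadata]:
    """files maps a relative path to its source text; returns the 'target database' as a dict."""
    database: Dict[str, CodeMetadata] = {}
    for filename, source in files.items():
        if not filename.endswith(".py"):  # assumed filter for invalid project files
            continue
        try:
            database.update(index_module(filename[:-3].replace("/", "."), source))
        except SyntaxError:               # unparsable files are treated as invalid too
            continue
    return database
```

For example, a file `shapes/geometry.py` that defines `class Circle` with a method `area(self, scale=1)` yields the keys `shapes.geometry.Circle` and `shapes.geometry.Circle.area`. A real deployment would persist this mapping rather than keep it in memory.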
Optionally, the apparatus further includes a position information acquisition module configured to obtain the target position information of the code element to be processed in the code text to be processed. Correspondingly, the first parsing module 1104 is further configured to perform syntax structure analysis on the code in the code text to be processed to obtain its basic syntax tree, and to parse the basic syntax tree to obtain the first code metadata of each code element. The first determination module 1106 is further configured to traverse the code elements recorded in the basic syntax tree based on the target position information to determine the target code element, and to determine the target query information from the element information of the target code element.
Optionally, the apparatus further includes a query module configured to query a first reference dictionary with the element identifier of the target code element to obtain its element information. The first reference dictionary is updated while the basic syntax tree is traversed and records the correspondence between the element identifier and the element information of each code element. Correspondingly, the first determination module 1106 is further configured to query a second reference dictionary with the element information of the target code element to obtain its storage path as the target query information, where the second reference dictionary records the correspondence between the element information and the storage path of each code element.
Optionally, the apparatus further includes an update module configured to, when the element information of the target code element is found, update the position information of the target code element in the first reference dictionary with the target position information.
Optionally, the apparatus further includes a dictionary recording module configured to, when the element information of the target code element is not found, record the element identifier, element information, and position information of the target code element in the first reference dictionary.
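The cursor-driven lookup of Steps 1016 and 1018, together with the query, update, and dictionary recording modules, can be illustrated with a deliberately narrow resolver. It rests on several assumptions that the specification does not state: the source is Python that still parses (a real IDE would need an error-tolerant parser for half-typed code); the cursor is a 1-based line and 0-based column; a variable name is the element identifier; the class a variable was constructed from (`x = Circle()`) is its element information; and `from module import Name` statements populate the second reference dictionary. The function names are hypothetical.

```python
import ast
from typing import Dict, Optional, Tuple


def build_second_reference(tree: ast.Module) -> Dict[str, str]:
    """Second reference dictionary: element info (imported name) -> storage path."""
    refs: Dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                refs[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    return refs


def resolve_target_query(source: str, line: int, col: int,
                         first_ref: Optional[Dict[str, dict]] = None
                         ) -> Optional[Tuple[str, Optional[str]]]:
    """Return (element identifier, target query info) for the name under the cursor."""
    if first_ref is None:
        first_ref = {}  # first reference dictionary: identifier -> {"info": ..., "pos": ...}
    tree = ast.parse(source)
    target = None
    for node in ast.walk(tree):
        # `x = Circle(...)` records element `x` with element info `Circle`.
        if (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Name)):
            for t in node.targets:
                if isinstance(t, ast.Name):
                    first_ref[t.id] = {"info": node.value.func.id,
                                       "pos": (t.lineno, t.col_offset)}
        # The Name whose source span covers the cursor is the target code element.
        if (isinstance(node, ast.Name) and node.lineno == line
                and node.col_offset <= col <= node.end_col_offset):
            target = node
    if target is None:
        return None
    entry = first_ref.get(target.id)
    if entry is not None:
        entry["pos"] = (line, col)  # found: refresh its position
    else:                           # not found: record identifier, info, and position
        entry = first_ref[target.id] = {"info": target.id, "pos": (line, col)}
    return target.id, build_second_reference(tree).get(entry["info"])
```

With `from shapes.geometry import Circle`, `c = Circle()`, and `c.area()` on lines 1 to 3, a cursor just after `c` on line 3 resolves `c` to element info `Circle` and then to the storage path `shapes.geometry.Circle`. That storage path is what would be looked up in the target database.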
Optionally, there is at least one piece of second code metadata. Correspondingly, the first generation module 1110 is further configured to: determine the weight of each piece of second code metadata from the position information of its corresponding target code element; place the second code metadata into a code text sequence according to these weights; and input the first code metadata and the code text sequence into the code generation model to generate the target code text.
Optionally, the apparatus further includes a position information acquisition module configured to obtain the target position information of the code element to be processed in the code text to be processed. Correspondingly, the first generation module 1110 is further configured to: determine the positional distance between each target code element and the code element to be processed from the target position information and the position information of the target code elements corresponding to the second code metadata; and determine the weight of each piece of second code metadata from that distance.
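The specification states that weights depend on the positional distance to the code element being edited (Steps 1020 and 1022), but it gives no formula. The sketch below makes these assumptions: line distance serves as the positional distance; duplicate hits are merged by keeping, for each storage path, the reference closest to the cursor; the weight is a hypothetical 1 / (1 + distance); and the code text sequence is filled in descending weight order until a hypothetical character budget would be exceeded.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    path: str      # target query info (storage path) that produced the hit
    metadata: str  # second code metadata, e.g. a signature
    line: int      # line of the target code element that referenced it


def weight(hit: Hit, cursor_line: int) -> float:
    """Assumed weighting: the closer to the cursor, the larger the weight."""
    return 1.0 / (1 + abs(hit.line - cursor_line))


def build_code_text_sequence(hits: List[Hit], cursor_line: int, budget: int = 2000) -> List[str]:
    # Merge duplicates: keep one hit per storage path, the one nearest the cursor.
    nearest: Dict[str, Hit] = {}
    for h in hits:
        best = nearest.get(h.path)
        if best is None or weight(h, cursor_line) > weight(best, cursor_line):
            nearest[h.path] = h
    # Order by descending weight (sorted() is stable, so ties keep first-seen order).
    ranked = sorted(nearest.values(), key=lambda h: weight(h, cursor_line), reverse=True)
    # Fill the sequence until the next item would exceed the character budget.
    sequence: List[str] = []
    used = 0
    for h in ranked:
        if used + len(h.metadata) > budget:
            break
        sequence.append(h.metadata)
        used += len(h.metadata)
    return sequence
```

A budget is included because the model's context window is finite. Ordering by weight means the least relevant references are the ones dropped when space runs out.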
Optionally, the apparatus further includes a code language identification module configured to identify the programming language of the code text to be processed. Correspondingly, the first determination module 1106 is further configured to determine the target query information for that language from the language and the element information of each code element, and to search the target database for that language for the second code metadata corresponding to the target query information.
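The specification does not say how the code language is identified. A minimal sketch, assuming identification by file extension and one target database per language (the extension table and the database layout are hypothetical):

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Hypothetical extension table; a real system might also inspect file content.
EXTENSION_TO_LANGUAGE = {".py": "python", ".java": "java", ".ts": "typescript", ".go": "go"}


def identify_language(filename: str) -> Optional[str]:
    """Identify the code language from the file extension (case-insensitive)."""
    return EXTENSION_TO_LANGUAGE.get(PurePosixPath(filename).suffix.lower())


def find_second_metadata(databases: Dict[str, Dict[str, str]], filename: str,
                         query: str) -> Optional[str]:
    """Look up the target query info only in the target database for the file's language."""
    language = identify_language(filename)
    if language is None:
        return None
    return databases.get(language, {}).get(query)
```

Keeping one database per language prevents a query such as `shapes.geometry.Circle` from a Python file from matching a Java class that has the same dotted name.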
In this embodiment, the project files of the target project are parsed in advance, reference query information is constructed from the element information of each code element, and the reference query information is stored in the target database together with the corresponding reference code metadata. During code text processing, the code text to be processed is parsed to obtain the first code metadata of each code element, and the target query information is determined from the element information of each code element. These explicit code reference relationships are used to query the target database for valid second code metadata. The second code metadata serves as reference code that guides the code generation model toward highly accurate target code text, improving the accuracy of code text processing.
The above is a schematic solution of a code text processing apparatus in this embodiment. The technical solution of the apparatus shares the same concept as that of the code text processing method above. For details not described in the apparatus solution, refer to the description of the code text processing method.
Corresponding to the above method embodiments, this specification further provides embodiments of a code supplementing apparatus. FIG. 12 is a schematic structural diagram of a code supplementing apparatus according to an embodiment of this specification. As shown in FIG. 12, the apparatus is applied to a cloud-side device and includes:
a receiving module 1202, configured to receive, from an end-side device, the code text to be supplemented of the target project;
a second parsing module 1204, configured to parse the code text to be supplemented to obtain the first code metadata of each code element, where each code element has corresponding element information;
a second determination module 1206, configured to determine target query information based on the element information of each code element;
a second search module 1208, configured to search the target database for the second code metadata corresponding to the target query information, where the target database records the correspondence between reference query information and reference code metadata, and the reference query information is constructed from the element information of the code elements in the project files of the target project;
a second generation module 1210, configured to generate supplementary code text with a code generation model based on the first code metadata and the second code metadata, where the code generation model is obtained by training a text processing model on sample code text; and
a sending module 1212, configured to send the supplementary code text to the end-side device, so that the end-side device supplements the code text to be supplemented with the supplementary code text.
In this embodiment, the project files of the target project are parsed in advance, reference query information is constructed from the element information of each code element, and the reference query information is stored in the target database together with the corresponding reference code metadata. During code text processing, the code text to be processed is parsed to obtain the first code metadata of each code element, and the target query information is determined from the element information of each code element. These explicit code reference relationships are used to query the target database for valid second code metadata. The second code metadata serves as reference code that guides the code generation model toward highly accurate supplementary code text, improving the accuracy of code supplementation. In addition, running the method on a cloud-side device with high computing and storage performance improves both the efficiency and the accuracy of code supplementation.
The above is a schematic solution of a code supplementing apparatus in this embodiment. The technical solution of the apparatus shares the same concept as that of the code supplementing method above. For details not described in the apparatus solution, refer to the description of the code supplementing method.
FIG. 13 is a structural block diagram of a computing device according to an embodiment of this specification. Components of the computing device 1300 include, but are not limited to, a memory 1310 and a processor 1320. The processor 1320 is connected to the memory 1310 through a bus 1330, and a database 1350 is used to store data.
The computing device 1300 further includes an access device 1340 that enables the computing device 1300 to communicate over one or more networks 1360. Examples of these networks include a Public Switched Telephone Network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 1340 may include one or more network interfaces of any type, wired or wireless (for example, a network interface controller (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, or a near field communication (NFC) interface.
In an embodiment of this specification, the above components of the computing device 1300 and other components not shown in FIG. 13 may also be connected to one another, for example through a bus. It should be understood that the block diagram shown in FIG. 13 is for illustration only and does not limit the scope of this specification. Those skilled in the art may add or replace components as needed.
The computing device 1300 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, personal digital assistant, laptop computer, notebook computer, or netbook), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smart watch or smart glasses), another type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC). The computing device 1300 may also be a mobile or stationary server.
The processor 1320 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the code text processing method or the code supplementing method described above.
The above is a schematic solution of a computing device in this embodiment. The technical solution of the computing device shares the same concept as those of the code text processing method and the code supplementing method above. For details not described in the computing device solution, refer to the description of the code text processing method or the code supplementing method.
An embodiment of this specification further provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the code text processing method or the code supplementing method described above.
The above is a schematic solution of a computer-readable storage medium in this embodiment. The technical solution of the storage medium shares the same concept as those of the code text processing method and the code supplementing method above. For details not described in the storage medium solution, refer to the description of the code text processing method or the code supplementing method.
An embodiment of this specification further provides a computer program that, when executed in a computer, causes the computer to perform the steps of the code text processing method or the code supplementing method described above.
The above is a schematic solution of a computer program in this embodiment. The technical solution of the computer program shares the same concept as those of the code text processing method and the code supplementing method above. For details not described in the computer program solution, refer to the description of the code text processing method or the code supplementing method.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and the like. The content covered by the computer-readable medium may be expanded or restricted as required by patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals or telecommunication signals.
For ease of description, each of the foregoing method embodiments is expressed as a series of combined actions. However, those skilled in the art should understand that the embodiments of this specification are not limited by the described order of actions, because some steps may be performed in other orders or simultaneously. Furthermore, the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the embodiments of this specification.
In the above embodiments, each embodiment is described with its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of this specification disclosed above are intended only to help explain this specification. The optional embodiments do not describe every detail exhaustively, nor do they limit the invention to the specific implementations described. Many modifications and variations are possible in light of the content of these embodiments. These embodiments were selected and described in detail to better explain the principles and practical applications of the embodiments of this specification, so that those skilled in the art can understand and use this specification well. This specification is limited only by the claims and their full scope and equivalents.
Claims (15)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311613794.4A CN117806601A (en) | 2023-11-28 | 2023-11-28 | Code text processing method, code supplementing method and computing device |
| PCT/CN2024/122293 WO2025112888A1 (en) | 2023-11-28 | 2024-09-29 | Code text processing method, code supplementing method, and computing device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311613794.4A CN117806601A (en) | 2023-11-28 | 2023-11-28 | Code text processing method, code supplementing method and computing device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117806601A true CN117806601A (en) | 2024-04-02 |
Family ID: 90430872
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311613794.4A Pending CN117806601A (en) | 2023-11-28 | 2023-11-28 | Code text processing method, code supplementing method and computing device |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN117806601A (en) |
| WO (1) | WO2025112888A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2025112888A1 (en) * | 2023-11-28 | 2025-06-05 | 阿里巴巴(中国)有限公司 | Code text processing method, code supplementing method, and computing device |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9928040B2 (en) * | 2013-11-12 | 2018-03-27 | Microsoft Technology Licensing, Llc | Source code generation, completion, checking, correction |
| CN116406459A (en) * | 2020-11-02 | 2023-07-07 | 华为云计算技术有限公司 | Code processing method, device, equipment and medium |
| CN116069808A (en) * | 2023-03-03 | 2023-05-05 | 中国工商银行股份有限公司 | Method, device and electronic device for determining dependency information of database storage process |
| CN116719520B (en) * | 2023-08-07 | 2023-11-17 | 支付宝(杭州)信息技术有限公司 | Code generation method and device |
| CN117806601A (en) * | 2023-11-28 | 2024-04-02 | 杭州阿里云飞天信息技术有限公司 | Code text processing method, code supplementing method and computing device |
- 2023-11-28: CN application CN202311613794.4A, publication CN117806601A (active, pending)
- 2024-09-29: PCT application PCT/CN2024/122293, publication WO2025112888A1 (active, pending)
Also Published As
| Publication number | Publication date |
|---|---|
| WO2025112888A1 (en) | 2025-06-05 |
Legal Events
| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |