CN114186974B - A multi-model fusion development task association method, device, equipment and medium - Google Patents

Info

Publication number
CN114186974B
Authority
CN
China
Prior art keywords
task
report
model
association
development
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111542359.8A
Other languages
Chinese (zh)
Other versions
CN114186974A (en
Inventor
张洋
蔡孟栾
王涛
王怀民
吴逸文
陈婷婷
邬小军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202111542359.8A
Publication of CN114186974A
Application granted
Publication of CN114186974B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/101 Collaborative creation, e.g. joint development of products or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a multi-model fusion development task association method. An active open source project set is constructed in a collaborative development community according to preset indicators; in the active open source project set, development task report data of all projects is collected using an API to construct an alternative task report database; in the alternative task report database, URL link information in all task reports is extracted using a regular expression to generate a task report data set; a query task data set and a candidate task data set are constructed in the task report data set, and a similarity score between the query task and each candidate task is computed; the similarity scores between the query task and each candidate task are weighted and summed to obtain a final similarity score between task reports, a development task association model based on multi-model fusion is constructed according to the final similarity score, and a task report association tool is generated.

Description

Development task association method, device, equipment and medium for multi-model fusion
Technical Field
The application relates to the field of software development, in particular to a multi-model fusion development task association method, device, equipment and medium.
Background
Social coding was first proposed by the open source community GitHub. It aims to provide a developer-friendly software development environment that helps developers connect, collaborate and develop efficiently. Social coding has greatly improved code reuse and the efficiency of resolving development tasks. Developers can report and discuss tasks on their own initiative, so task reports, an important class of software development knowledge, are often submitted by different developers at different times. In practice, two task reports often contain related information, and developers can link related task reports through URL links during task discussion. Within a software project, finding and associating related task reports gives developers more resources and information for solving the target task, thereby improving task resolution efficiency.
Currently, in collaborative development communities such as GitHub, associating task reports relies primarily on links added manually by developers. In practice, however, this linking process takes a lot of time and labor. Especially in large-scale software projects, developers may need to search large amounts of historical task data and locate related tasks through their textual descriptions, and such manual association depends mainly on the experience and knowledge of individual developers. Therefore, how to automate development task association is a technical problem to be solved.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The invention mainly aims to provide a multi-model fusion development task association method, device, equipment and medium, and aims to solve the technical problem that the prior art cannot realize automatic development task association.
In order to achieve the above object, the present invention provides a development task association method for multi-model fusion, the method comprising:
Constructing an active open source project set in the collaborative development community according to a preset index;
Collecting development task report data of all projects by using an API in the active open source project set to construct an alternative task report database;
Extracting URL link information in all task reports by using regular expressions in the alternative task report database to generate a task report data set;
Constructing a query task data set and a candidate task data set in the task report data set, and using a structural data analysis model, a text semantic representation model and a historical relevance model, respectively, to compute similarity scores between the query task and each candidate task;
and carrying out weighted summation on the similarity scores between the query task and each candidate task, obtaining a final similarity score between each task report, and constructing a development task association model based on multi-model fusion according to the final similarity score to generate a task report association tool.
Optionally, the step of constructing the active open source project set in the collaborative development community according to the preset index includes:
In the collaborative development community GitHub, collecting basic information data of projects using an API, and screening popular open source projects according to the Star, Fork, Delete and Creation time indicators;
and constructing the active open source project set from the screened popular open source projects.
Optionally, the step of collecting development task report data of all projects by using an API in the active open source project set to construct an alternative task report database includes:
Collecting task report data of all projects in the active open source project set using the Issue API and the Pull Request (PR) API of GitHub, wherein the collected data includes task ID, task processing state, submitter, task title, task description, task comments, submission time, category, label, milestone and the like;
And constructing an alternative task report database according to the collected report data.
Optionally, the step of extracting URL link information in all task reports by using regular expressions in the alternative task report database to generate a task report data set includes:
extracting URL link information in all task reports by using a regular expression in the alternative task report database;
Checking the URL link information in the task reports using the Cross-referenced API of GitHub, screening out the actual task report association links, and constructing an association information reference library according to the task report association links;
and removing the task report data which does not contain the link information from the associated information reference library to form a final task report data set.
Optionally, before the step of constructing a query task data set and a candidate task data set in the task report data set and computing the similarity scores between the query task and each candidate task using the structural data analysis model, the text semantic representation model and the historical relevance model, respectively, the method further includes:
Extracting text data in each task report data in the task report data set, wherein the text data comprises a task report title, a description and a comment;
deleting stop words, numbers, punctuation marks and other non-alphabetic characters in the text data;
the remaining words are converted to root form using the Snowball Stemmer technique in NLTK to reduce feature dimensions and unify similar words into a common representation to obtain pre-processed task report data.
Optionally, the step of constructing a query task data set and a candidate task data set in the task report data set, and computing the similarity scores between the query task and each candidate task using a structural data analysis model, a text semantic representation model and a historical relevance model, respectively, includes:
Selecting the most recent 40% of samples as the query task data set according to the creation time of the task reports in the task report data set, and taking all task report data as the candidate task data set;
Calculating a structural information similarity score Score_S between the query task and each candidate task using the structural data analysis model;
Calculating a textual information similarity score Score_T between the query task and each candidate task using the text semantic representation model;
Calculating a historical information similarity score Score_H between the query task and each candidate task using the historical relevance model.
Optionally, the step of weighting and summing the similarity scores between the query task and each candidate task and obtaining a final similarity score between each task report, and constructing a development task association model based on multi-model fusion according to the similarity scores to generate a task report association tool includes:
Weighting and summing the similarity scores between the query task and each candidate task to obtain a final similarity score, and constructing a development task association model based on multi-model fusion according to the final similarity score;
evaluating the model by using the Top-k recall rate evaluation index and the task report data set;
and selecting an optimal sub-model weight combination according to the evaluation result to form a task report association tool.
In addition, in order to achieve the above object, the present invention also proposes a multi-model fusion development task association apparatus, the apparatus comprising:
The project construction module is used for constructing an active open source project set in the collaborative development community according to preset indexes;
The data construction module is used for collecting development task report data of all projects in the active open source project set by using an API so as to construct an alternative task report database;
the link acquisition module is used for extracting URL link information in all task reports by using a regular expression in the alternative task report database so as to generate a task report data set;
The task calculation module is used for constructing a query task data set and a candidate task data set in the task report data set, and for using a structural data analysis model, a text semantic representation model and a historical relevance model, respectively, to compute similarity scores between the query task and each candidate task;
and the tool generation module is used for carrying out weighted summation on the similarity scores between the query task and each candidate task, obtaining a final similarity score between each task report, and constructing a development task association model based on multi-model fusion according to the final similarity score so as to generate a task report association tool.
In addition, in order to achieve the aim, the invention also provides a computer device, which comprises a memory, a processor and a multi-model fusion development task association program which is stored on the memory and can run on the processor, wherein the multi-model fusion development task association program is configured to realize the multi-model fusion development task association method.
In addition, in order to achieve the above object, the present invention also proposes a medium on which a multi-model fusion development task association program is stored, which, when executed by a processor, implements the steps of the multi-model fusion development task association method described above.
According to the invention, an active open source project set is constructed in a collaborative development community according to preset indicators; development task report data of all projects in the active open source project set is collected using an API to construct an alternative task report database; URL link information in all task reports in the alternative task report database is extracted using a regular expression to generate a task report data set; a query task data set and a candidate task data set are constructed in the task report data set, and a structural data analysis model, a text semantic representation model and a historical relevance model are used, respectively, to compute similarity scores between the query task and each candidate task; the similarity scores are weighted and summed to obtain a final similarity score between task reports; and a development task association model based on multi-model fusion is constructed according to the final similarity score to generate a task report association tool. By combining deep learning techniques, task reports related to a new task can be recommended; by weighting and summing the similarity scores, the optimal weights are selected and the task report association tool is finally constructed, thereby realizing automatic development task association.
Drawings
FIG. 1 is a schematic structural diagram of a multi-model fusion development task association device in the hardware runtime environment according to an embodiment of the present invention;
FIG. 2 is a flowchart of a development task association method for multi-model fusion according to a first embodiment of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a multi-model fusion development task association device in the hardware runtime environment according to an embodiment of the present invention.
As shown in FIG. 1, the multi-model fusion development task association device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the structure shown in FIG. 1 does not constitute a limitation of a multi-model fusion development task association apparatus, and may include more or fewer components than illustrated, or may combine certain components, or may be a different arrangement of components.
As shown in FIG. 1, the memory 1005, as a storage medium, may include an operating system, a data storage module, a network communication module, a user interface module, and a multi-model fusion development task association program.
In the multi-model fusion development task association device shown in FIG. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 may be arranged inside the device; the device invokes, through the processor 1001, the multi-model fusion development task association program stored in the memory 1005 and executes the multi-model fusion development task association method provided by the embodiments of the invention.
The embodiment of the invention provides a multi-model fusion development task association method, and referring to fig. 2, fig. 2 is a flow diagram of a first embodiment of the multi-model fusion development task association method.
In this embodiment, the development task association method for multi-model fusion includes the following steps:
Step S10: constructing an active open source project set in the collaborative development community according to preset indicators.
In a specific implementation, the GitHub API is used to collect popular open source projects that have at least 10 Stars, have been forked at least once, have not been deleted, are not themselves forks, and were created after 2010 and before 2021. From these, active open source projects are screened out: projects that contain at least 100 Issues or Pull Requests, have at least 3 code contributors, and have had development activity such as code commits, development task processing, contribution merging or comment submission in the last 3 months.
Further, the step of constructing the active open source project set in the collaborative development community according to the preset indicators comprises: collecting basic information data of projects in the collaborative development community GitHub using an API, and screening popular open source projects according to the Star, Fork, Delete and Creation time indicators; and constructing the active open source project set from the screened popular open source projects.
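The screening criteria of step S10 can be sketched as two predicates. This is only an illustration under stated assumptions: the `Project` fields are hypothetical names rather than GitHub API fields, "after 2010 and before 2021" is read as creation dates from 2010-01-01 up to but excluding 2021-01-01, "at least 100 Issues or Pull Requests" is read as a combined count, and "the last 3 months" is approximated as 90 days.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Project:
    # Hypothetical record; field names are illustrative, not GitHub API fields.
    stars: int
    forks: int
    deleted: bool
    is_fork: bool
    created: date
    task_reports: int              # Issues plus Pull Requests (assumed combined)
    contributors: int
    days_since_last_activity: int


def is_popular(p: Project) -> bool:
    """Popularity screen: Star, Fork, Delete and Creation time indicators."""
    return (
        p.stars >= 10
        and p.forks >= 1
        and not p.deleted
        and not p.is_fork
        and date(2010, 1, 1) <= p.created < date(2021, 1, 1)  # assumed bounds
    )


def is_active(p: Project) -> bool:
    """Activity screen applied on top of the popularity screen."""
    return (
        is_popular(p)
        and p.task_reports >= 100
        and p.contributors >= 3
        and p.days_since_last_activity <= 90  # assumed reading of "3 months"
    )
```

The active open source project set is then simply the projects for which `is_active` returns true.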
Step S20: collecting development task report data of all projects in the active open source project set using an API to construct an alternative task report database.
In a specific implementation, based on the active open source project set, the Issue report data of all projects is collected using the Issue API of GitHub; the collected data includes task ID, task processing state (open, closed), submitter, task title, task description, task comments, submission time, category (0 represents Issue), label, milestone and the like. The PR report data of all projects is collected using the Pull Request (PR) API of GitHub; the collected data includes task ID, task processing state (open, closed), submitter, task title, task description, task comments, submission time, category (1 represents PR), label, milestone and the like. The alternative task report database is constructed by combining the collected Issue report data and PR report data of all projects.
Further, the step of collecting development task report data of all projects in the active open source project set using an API to construct an alternative task report database comprises: collecting task report data of all projects in the active open source project set using the Issue API and the Pull Request (PR) API of GitHub, wherein the collected data includes task ID, task processing state, submitter, task title, task description, task comments, submission time, category, label, milestone and the like; and constructing the alternative task report database according to the collected report data.
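The record layout described in step S20 could be modeled as follows. This is a minimal sketch: the `TaskReport` class, its field names and the `build_database` helper are hypothetical, and the input is assumed to be already-fetched dictionaries whose keys match the field names (no GitHub API call is shown).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ISSUE, PR = 0, 1  # category codes as given in the description


@dataclass
class TaskReport:
    task_id: int
    state: str                 # "open" or "closed"
    submitter: str
    title: str
    description: str
    comments: List[str]
    created_at: str            # submission time, e.g. an ISO 8601 string
    category: int              # ISSUE or PR
    labels: List[str] = field(default_factory=list)
    milestone: Optional[str] = None


def build_database(issues, pull_requests) -> Dict[Tuple[int, int], TaskReport]:
    """Merge Issue and PR records into one database keyed by (category, task_id)."""
    database = {}
    for category, records in ((ISSUE, issues), (PR, pull_requests)):
        for record in records:
            report = TaskReport(category=category, **record)
            database[(category, report.task_id)] = report
    return database
```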
Step S30: extracting URL link information in all task reports using a regular expression in the alternative task report database to generate a task report data set.
In a specific implementation, for the alternative task report database, URL link information in all task reports is extracted using a regular expression (github.com/[a-zA-Z0-9-]/issues|pull/[0-9]+), the link information in the task reports is checked using the Cross-referenced API of GitHub, and the actual task report association links are screened out to construct a task association information reference library. In this work only intra-project links are considered; task links across projects are not considered for now. Task report data that contains no link information is then removed according to the task association information reference library to form the final task report data set.
Further, the step of extracting URL link information in all task reports using a regular expression in the alternative task report database to generate a task report data set comprises: extracting URL link information in all task reports using a regular expression in the alternative task report database; checking the URL link information in the task reports using the Cross-referenced API of GitHub, screening out the actual task report association links, and constructing an association information reference library according to the task report association links; and removing task report data that contains no link information according to the association information reference library to form the final task report data set.
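The link extraction in step S30 can be illustrated with a short sketch. The pattern below is an assumption written for this example, not the expression from the description: it captures owner, repository, link kind and number so that cross-project links can be filtered out. The verification against GitHub's cross-reference data is not shown.

```python
import re

# Assumed pattern for illustration: owner / repository / (issues|pull) / number.
LINK_RE = re.compile(
    r"github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)/(issues|pull)/([0-9]+)"
)


def extract_links(text, owner, repo):
    """Return the set of (kind, number) links in `text` that point inside owner/repo."""
    links = set()
    for match in LINK_RE.finditer(text):
        link_owner, link_repo, kind, number = match.groups()
        # Only intra-project links are kept; cross-project links are ignored.
        if (link_owner.lower(), link_repo.lower()) == (owner.lower(), repo.lower()):
            links.add((kind, int(number)))
    return links
```

A task report for which `extract_links` returns an empty set would be dropped when the final task report data set is formed.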
Step S40: constructing a query task data set and a candidate task data set in the task report data set, and using a structural data analysis model, a text semantic representation model and a historical relevance model, respectively, to compute similarity scores between the query task and each candidate task.
Further, before the step of computing the similarity scores between the query task and each candidate task using the structural data analysis model, the text semantic representation model and the historical relevance model, respectively, the method further comprises: extracting the text data in each task report in the task report data set, including the title, description and comments of the task report; deleting stop words, numbers, punctuation marks and other non-alphabetic characters from the text data; and converting the remaining words into their root forms using the Snowball Stemmer in NLTK, so as to reduce feature dimensions and unify similar words into one common representation, thereby obtaining the preprocessed task report data.
In a specific implementation, the Snowball Stemmer in NLTK is used to convert the remaining words into their root forms, which reduces feature dimensions and unifies similar words into a common representation.
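The preprocessing steps above can be sketched as follows. The tiny stop word set is a placeholder chosen for this example (a full list such as NLTK's stop word corpus could be substituted), and the `stem` parameter is a hypothetical hook; when it is omitted, the sketch falls back to NLTK's `SnowballStemmer("english")`, which requires NLTK to be installed.

```python
import re

# Placeholder stop word list for illustration only.
STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on",
     "for", "it", "this", "that", "with"}
)


def preprocess(text, stem=None, stop_words=STOP_WORDS):
    """Strip non-alphabetic characters, drop stop words, then stem each word."""
    if stem is None:
        # Imported lazily so a caller-supplied stemmer works without NLTK.
        from nltk.stem.snowball import SnowballStemmer
        stem = SnowballStemmer("english").stem
    # Numbers, punctuation and other non-alphabetic characters become spaces.
    tokens = re.sub(r"[^A-Za-z]+", " ", text).lower().split()
    return [stem(token) for token in tokens if token not in stop_words]
```

The title, description and comments of a task report would each be passed through `preprocess` before text features are extracted.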
Further, the step of constructing a query task data set and a candidate task data set in the task report data set, and computing the similarity scores between the query task and each candidate task using a structural data analysis model, a text semantic representation model and a historical relevance model, respectively, comprises: selecting the most recent 40% of samples as the query task data set according to the creation time of the task reports in the task report data set, and taking all task report data as the candidate task data set; calculating the structural information similarity score Score_S between the query task and each candidate task using the structural data analysis model; calculating the textual information similarity score Score_T between the query task and each candidate task using the text semantic representation model; and calculating the historical information similarity score Score_H between the query task and each candidate task using the historical relevance model.
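The construction of the query and candidate sets can be sketched as below. The function names and the `created_at` key are assumptions for illustration; reports are taken to be dictionaries with a sortable creation time, and each query is only compared with candidates created earlier than it.

```python
def split_queries(reports, query_fraction=0.4):
    """Return (queries, candidates): the newest fraction of reports and all reports."""
    ordered = sorted(reports, key=lambda r: r["created_at"])
    n_query = int(len(ordered) * query_fraction)
    queries = ordered[len(ordered) - n_query:]
    return queries, ordered


def candidates_for(query, candidates):
    """Keep only candidates created strictly before the query task."""
    return [c for c in candidates if c["created_at"] < query["created_at"]]
```

In this sketch the split is applied per project, so `reports` would hold the task reports of a single project.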
In a specific implementation, for each project, the most recent 40% of samples are selected as the query task data set according to the creation time of the task reports, and all task report data forms the candidate task data set. The structured data of the task reports is extracted and analyzed, and the structural data analysis model is used to calculate the structural information similarity score Score_S between the query task and each candidate task (the candidate's creation time must be earlier than that of the query task). The structured data variables of a task report include the task processing state state (Boolean, "0" stands for open, "1" stands for closed), the submitter submitter (text), the category type (Boolean, "0" stands for Issue, "1" stands for PR), the label label (text), the milestone milestone (text), the description complexity complexity (numerical, the total number of words in the task report title and description), the comment count comment (numerical, the number of comments), and so on. For a text variable X, all values of X in the task report data are collected and de-duplicated to obtain N distinct values, which are then encoded in order with the natural numbers {1, ..., N}, establishing a one-to-one mapping between numbers and text values. For a task report with multiple labels, only the first label is used. A feature vector {state, submitter, type, label, milestone, complexity, comment} characterizing its structural information is then constructed for each task report. Given the structural information feature vectors V_{s1} and V_{s2} of two task reports, their structural information similarity score Score_S is calculated using cosine similarity:

$$Score_S = \frac{V_{s1} \cdot V_{s2}}{\lVert V_{s1} \rVert \, \lVert V_{s2} \rVert}$$
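The encoding of text variables and the cosine comparison of structural vectors can be sketched as follows. The helper names and dictionary keys are assumptions for illustration, and encoding a missing label or milestone as 0 is a choice made for this sketch, since the description does not say how missing values are handled.

```python
import math


def build_encoder(values):
    """Map each distinct text value to a natural number 1..N in first-seen order."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def structural_vector(report, encoders):
    """Build {state, submitter, type, label, milestone, complexity, comment}."""
    labels = report["labels"]
    first_label = labels[0] if labels else None  # only the first label is used
    return [
        report["state"],                                     # 0 = open, 1 = closed
        encoders["submitter"].get(report["submitter"], 0),
        report["type"],                                      # 0 = Issue, 1 = PR
        encoders["label"].get(first_label, 0),               # 0 = missing (assumed)
        encoders["milestone"].get(report["milestone"], 0),   # 0 = missing (assumed)
        report["complexity"],
        report["comment"],
    ]
```

Score_S for a query and a candidate is then `cosine_similarity(structural_vector(q, enc), structural_vector(c, enc))`. Note that unscaled numeric fields such as complexity can dominate the cosine; whether any normalization is applied is not stated in the description.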
The text data of the task reports is extracted and analyzed, and the text semantic representation model is used to calculate the textual information similarity score Score_T between the query task and each candidate task (the candidate's creation time must be earlier than that of the query task). Specifically, based on the preprocessed task report text data (task title, task description and task comments), the BERT text representation model is used: the BertClient interface in the bert_serving.client library is called to extract features for each sentence in the task report, and the feature vector dimension threshold can be set to 100, 200, 500, 1000, etc. Given the textual information feature vectors V_{t1} and V_{t2} of two task reports, their textual information similarity score Score_T is calculated using cosine similarity:

$$Score_T = \frac{V_{t1} \cdot V_{t2}}{\lVert V_{t1} \rVert \, \lVert V_{t2} \rVert}$$
According to the submitter information of each task report, all task reports that the submitter has historically submitted or commented on are extracted, and the historical relevance model is used to calculate the historical information similarity score Score_H between the query task and each candidate task (the candidate's creation time must be earlier than that of the query task). Specifically, the IDs of the task reports in which the submitter has historically participated (by submitting or commenting) are extracted from the task report data set and arranged in reverse order of submission time, forming a feature vector {id_1, id_2, …, id_n} that characterizes the submitter's historical information for each task report. The feature vector dimension threshold may be set to 100, 200, 500, 1000, etc.; if the submitter has no historical participation, or the number of entries is below the dimension threshold, the feature vector is padded with "0". Given the historical information feature vectors V_{h1} and V_{h2} of two task reports, their historical information similarity score Score_H is calculated using cosine similarity:

$$Score_H = \frac{V_{h1} \cdot V_{h2}}{\lVert V_{h1} \rVert \, \lVert V_{h2} \rVert}$$
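The fixed-length history vector and its comparison can be sketched as follows. The function names are hypothetical, and truncating to the first `dim` IDs when a submitter has more history than the threshold is an assumption; the description only specifies zero padding for shorter histories.

```python
import math


def history_vector(task_ids_newest_first, dim=100):
    """Pad (or, by assumption, truncate) a submitter's task ID history to `dim` entries."""
    ids = list(task_ids_newest_first)[:dim]
    return ids + [0] * (dim - len(ids))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Returning 0.0 when either submitter has no history is a choice made here to avoid division by zero; Score_H for a pair of reports is `cosine_similarity(history_vector(h1), history_vector(h2))`.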
Step S50: the similarity scores between the query task and each candidate task are weighted and summed to obtain a final similarity score between each pair of task reports, and a multi-model fusion development task association model is constructed according to the final similarity scores to generate a task report association tool.
Further, the step of weighting and summing the similarity scores between the query task and each candidate task to obtain a final similarity score between each pair of task reports, and constructing a multi-model fusion development task association model according to the final similarity scores to generate a task report association tool, comprises: weighting and summing the similarity scores between the query task and each candidate task to obtain the final similarity score, and constructing the multi-model fusion development task association model according to the final similarity score; evaluating the model using the Top-k recall evaluation metric and the task report data set; and selecting the optimal combination of sub-model weights according to the evaluation results to form the task report association tool.
In a specific implementation, the three sub-model similarity scores obtained above are weighted and summed to construct the multi-model fusion development task association model, the model is evaluated with several evaluation metrics, and the optimal combination of sub-model weights is selected to form the final task report association tool. The specific steps are as follows. The three similarity scores obtained in step S5 are weighted and summed with weights A, B and C respectively, giving the final similarity Score between each pair of task reports:
$$\mathrm{Score} = A \cdot \mathrm{Score}_S + B \cdot \mathrm{Score}_T + C \cdot \mathrm{Score}_H$$
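As an illustration of the weighted sum, the sketch below fuses the three sub-model scores for each candidate and ranks the candidates by the fused score. The function names, candidate IDs, scores and weights are all made up for the example and are not taken from the patent.

```python
from typing import Dict, List, Tuple

SubScores = Tuple[float, float, float]  # (Score_S, Score_T, Score_H)


def fuse(scores: SubScores, weights: Tuple[float, float, float]) -> float:
    """Weighted sum A*Score_S + B*Score_T + C*Score_H."""
    a, b, c = weights
    score_s, score_t, score_h = scores
    return a * score_s + b * score_t + c * score_h


def rank_candidates(sub_scores: Dict[int, SubScores],
                    weights: Tuple[float, float, float]) -> List[int]:
    """Return candidate IDs sorted by fused score, best first."""
    fused = {cid: fuse(s, weights) for cid, s in sub_scores.items()}
    return sorted(fused, key=lambda cid: fused[cid], reverse=True)


# Hypothetical sub-model scores of three candidates for one query task.
sub_scores = {
    101: (0.9, 0.2, 0.1),
    102: (0.1, 0.8, 0.6),
    103: (0.3, 0.3, 0.3),
}
ranking = rank_candidates(sub_scores, weights=(0.2, 0.5, 0.3))  # [102, 101, 103]
```

With these weights the fused scores are approximately 0.31, 0.60 and 0.30, so candidate 102 is recommended first.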
The model is evaluated using the Top-k recall (R@k) evaluation metric and the task association information benchmark library, where R@k checks whether the top-k recommendations are correct. R@k is calculated for each task report i to be queried, and k may take values from 1 to 10 in actual evaluation.
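The text here does not reproduce the exact R@k formula, so the following sketch assumes one common hit-based definition: a query counts as correct at k if at least one of its top-k recommended candidates is a truly associated task report, and the per-query values are averaged. The function names and the toy rankings are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[int], linked: Set[int], k: int) -> float:
    """1.0 if any of the top-k candidates is a true association, else 0.0."""
    return 1.0 if any(c in linked for c in ranked[:k]) else 0.0


def mean_recall_at_k(rankings: Dict[int, List[int]],
                     ground_truth: Dict[int, Set[int]],
                     k: int) -> float:
    """Average the per-query value over queries that have known associations."""
    queries = [q for q in rankings if ground_truth.get(q)]
    if not queries:
        return 0.0
    hits = sum(recall_at_k(rankings[q], ground_truth[q], k) for q in queries)
    return hits / len(queries)


# Toy data: ranked candidate IDs per query and the true associated reports.
rankings = {1: [5, 7, 9], 2: [8, 6, 4]}
ground_truth = {1: {7}, 2: {4}}
r_at_1 = mean_recall_at_k(rankings, ground_truth, 1)  # 0.0
r_at_2 = mean_recall_at_k(rankings, ground_truth, 2)  # 0.5
r_at_3 = mean_recall_at_k(rankings, ground_truth, 3)  # 1.0
```

Under this definition R@k never decreases as k grows, which makes it natural to report it at several cut-offs.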
It can be understood that different values are assigned to the three scoring weights A, B and C (which sum to 1). For each weight combination, the task report association results of the queries in all projects are evaluated with the model evaluation metric described above, the averages of R@1, R@5 and R@10 over all projects are calculated, and these three averages are summed to form the final evaluation index. The optimal combination of sub-model weights is selected according to the final evaluation index, and the three sub-models are combined on this basis to form the final task report association tool.
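The weight selection can be sketched as a simple grid search over weight triples that sum to 1. The grid step of 0.1 is an assumption, since the text does not state how the weight combinations are enumerated, and the toy `objective` below merely stands in for the sum of the averaged R@1, R@5 and R@10 values that a real evaluation would return.

```python
from typing import Callable, Iterator, Tuple

Weights = Tuple[float, float, float]


def weight_grid(steps: int = 10) -> Iterator[Weights]:
    """Yield all (A, B, C) on a regular grid with A + B + C = 1."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j  # the third weight is fixed by the other two
            yield (i / steps, j / steps, k / steps)


def select_weights(objective: Callable[[Weights], float],
                   steps: int = 10) -> Tuple[Weights, float]:
    """Return the grid point with the highest objective value."""
    best = max(weight_grid(steps), key=objective)
    return best, objective(best)


def objective(w: Weights) -> float:
    # Toy stand-in for the final evaluation index; it peaks at (0.2, 0.5, 0.3).
    target = (0.2, 0.5, 0.3)
    return -sum((x - t) ** 2 for x, t in zip(w, target))


best_weights, best_value = select_weights(objective)
```

Enumerating integer grid indices and dividing once keeps the three weights summing to 1 up to floating-point rounding, with no accumulated drift.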
The method constructs an active open source project set in a collaborative development community according to preset indexes; collects development task report data of all projects in the active open source project set through an API to construct an alternative task report database; extracts URL link information from all task reports in the alternative task report database with regular expressions to generate a task report data set; constructs a query task data set and a candidate task data set from the task report data set, and uses a structural data analysis model, a text semantic representation model and a historical relevance model respectively to calculate similarity scores between the query task and each candidate task; and weights and sums these similarity scores to obtain a final similarity score between each pair of task reports, from which a multi-model fusion development task association model is constructed to generate a task report association tool. Task reports related to a new task are thereby recommended with the aid of deep learning technology, the optimal weights are obtained through weighted summation of the similarity scores, and the resulting task report association tool achieves automatic association of development tasks.
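The regular-expression extraction of URL links mentioned above could look roughly like the following. The pattern is an assumption based on the usual form of GitHub issue and pull request URLs, not the expression used in the patent, and the owner and repository names in the sample text are invented.

```python
import re
from typing import List, Tuple

# Assumed URL shape: https://github.com/<owner>/<repo>/issues/<n> or .../pull/<n>
ISSUE_LINK = re.compile(
    r"https?://github\.com/([\w.-]+)/([\w.-]+)/(issues|pull)/(\d+)"
)


def extract_links(text: str) -> List[Tuple[str, str, str, int]]:
    """Return (owner, repo, kind, number) for every task report link in text."""
    return [
        (owner, repo, kind, int(number))
        for owner, repo, kind, number in ISSUE_LINK.findall(text)
    ]


comment = (
    "Duplicate of https://github.com/octo/demo/issues/12, "
    "fixed by https://github.com/octo/demo/pull/34."
)
links = extract_links(comment)
```

Links found this way are only candidates; the method then verifies them against the platform before treating them as actual associations.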
In addition, the embodiment of the invention also provides a medium, wherein the medium is stored with a multi-model fusion development task association program, and the multi-model fusion development task association program realizes the steps of the multi-model fusion development task association method when being executed by a processor.
The embodiments or specific implementation manners of the multi-model fusion development task association device of the present invention may refer to the above method embodiments, and are not described herein again.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The serial numbers of the foregoing embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. read-only memory/random-access memory, magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing describes only preferred embodiments of the present invention and does not thereby limit the scope of the invention. Any equivalent structure or equivalent process made on the basis of the contents disclosed herein, whether applied directly or indirectly in other related technical fields, is likewise covered by the scope of protection of the present invention.

Claims (9)

1. A multi-model fusion development task association method, the method comprising:
constructing an active open source project set in a collaborative development community according to preset indexes;
collecting, by using an API, development task report data of all projects in the active open source project set to construct an alternative task report database;
extracting, by using regular expressions, URL link information in all task reports in the alternative task report database to generate a task report data set;
constructing a query task data set and a candidate task data set in the task report data set, and using a structural data analysis model, a text semantic representation model and a historical relevance model respectively to calculate similarity scores between the query task and each candidate task;
weighting and summing the similarity scores between the query task and each candidate task to obtain a final similarity score between each pair of task reports, and constructing a multi-model fusion development task association model according to the final similarity scores to generate a task report association tool;
wherein the step of constructing a query task data set and a candidate task data set in the task report data set, and using a structural data analysis model, a text semantic representation model and a historical relevance model respectively to calculate similarity scores between the query task and each candidate task, comprises:
selecting, according to the creation time of the task reports in the task report data set, the latest 40% of samples as the query task data set, and taking the task report data as the candidate task data set;
calculating a structural information similarity Score_S between the query task and each candidate task using the structural data analysis model;
calculating a text information similarity Score_T between the query task and each candidate task using the text semantic representation model;
calculating a historical information similarity Score_H between the query task and each candidate task using the historical relevance model.
2. The method of claim 1, wherein the step of constructing the active open source project set in the collaborative development community according to the preset indexes comprises:
in the collaborative development community GitHub, collecting basic information data of projects by using an API, and screening popular open source projects according to the Star, Fork, delete and creation time indexes;
constructing the active open source project set from the screened popular open source projects.
3. The method of claim 1, wherein the step of collecting, by using an API, development task report data of all projects in the active open source project set to construct an alternative task report database comprises:
collecting task report data of all projects in the active open source project set by using the Issue API and the Pull Request API of GitHub, wherein the collected data includes task ID, task processing state, submitter, task title, task description, task comments, submission time, category, label, milestone and the like;
constructing the alternative task report database from the collected task report data.
4. The method of claim 1, wherein the step of extracting, by using regular expressions, URL link information in all task reports in the alternative task report database to generate a task report data set comprises:
extracting URL link information in all task reports in the alternative task report database by using a regular expression;
checking the URL link information in the task reports by using the cross-referenced API of GitHub, screening out the actual task report association links, and constructing an association information reference library from the task report association links;
removing, according to the association information reference library, the task report data that does not contain link information to form the final task report data set.
5. The method of claim 1, wherein the step of constructing a query task data set and a candidate task data set in the task report data set, and using a structural data analysis model, a text semantic representation model and a historical relevance model respectively to calculate similarity scores between the query task and each candidate task, further comprises:
extracting the text data of each task report in the task report data set, wherein the text data comprises the task report title, description and comments;
deleting stop words, numbers, punctuation marks and other non-alphabetic characters from the text data;
converting the remaining words to their stem form using the Snowball Stemmer in NLTK, so as to reduce the feature dimension and unify similar words into a common representation, thereby obtaining the preprocessed task report data.
6. The method of claim 1, wherein the step of weighting and summing the similarity scores between the query task and each candidate task to obtain a final similarity score between each pair of task reports, and constructing a multi-model fusion development task association model according to the final similarity scores to generate a task report association tool, comprises:
weighting and summing the similarity scores between the query task and each candidate task to obtain the final similarity score, and constructing the multi-model fusion development task association model according to the final similarity score;
evaluating the model by using the Top-k recall evaluation metric and the task report data set;
selecting the optimal combination of sub-model weights according to the evaluation results to form the task report association tool.
7. A multi-model fusion development task association device for implementing the multi-model fusion development task association method of any one of claims 1 to 6, the device comprising:
a project construction module, used for constructing an active open source project set in a collaborative development community according to preset indexes;
a data construction module, used for collecting, by using an API, development task report data of all projects in the active open source project set to construct an alternative task report database;
a link acquisition module, used for extracting, by using a regular expression, URL link information in all task reports in the alternative task report database to generate a task report data set;
a task calculation module, used for constructing a query task data set and a candidate task data set in the task report data set, and using a structural data analysis model, a text semantic representation model and a historical relevance model respectively to calculate similarity scores between the query task and each candidate task;
a tool generation module, used for weighting and summing the similarity scores between the query task and each candidate task to obtain a final similarity score between each pair of task reports, and constructing a multi-model fusion development task association model according to the final similarity scores to generate a task report association tool.
8. Multi-model fusion development task association equipment, comprising a memory, a processor, and a multi-model fusion development task association program stored in the memory and executable on the processor, the multi-model fusion development task association program being configured to implement the steps of the multi-model fusion development task association method of any one of claims 1 to 6.
9. A medium, wherein a multi-model fusion development task association program is stored on the medium, and the multi-model fusion development task association program, when executed by a processor, implements the steps of the multi-model fusion development task association method of any one of claims 1 to 6.
CN202111542359.8A 2021-12-13 2021-12-13 A multi-model fusion development task association method, device, equipment and medium Active CN114186974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111542359.8A CN114186974B (en) 2021-12-13 2021-12-13 A multi-model fusion development task association method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN114186974A 2022-03-15
CN114186974B 2024-12-06

Family

ID=80605332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111542359.8A Active CN114186974B (en) 2021-12-13 2021-12-13 A multi-model fusion development task association method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114186974B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant