CN118570030A

CN118570030A - Homework generation method and system based on multi-source data fusion

Info

Publication number: CN118570030A
Application number: CN202411000195.XA
Authority: CN
Inventors: 李冰冰; 宋潇婧; 章红红; 李晨希; 郑娟; 庞飞天; 程桃莉
Original assignee: Anhui Qichu Education Technology Co ltd; Anhui Education Press
Current assignee: Anhui Qichu Education Technology Co ltd; Anhui Education Press
Priority date: 2024-07-24
Filing date: 2024-07-24
Publication date: 2024-08-30
Anticipated expiration: 2044-07-24
Also published as: CN118570030B

Abstract

The present invention provides a homework generation method and system based on multi-source data fusion, and relates to the field of educational technology. The method acquires multi-source heterogeneous data and processes it to obtain multi-source data that fully covers students' learning situations; extracts semantic information, decomposes it to obtain semantic representation vectors, and fuses these vectors to obtain multi-source learning data; constructs student portraits and identifies each student's knowledge weaknesses and learning interests; trains a homework guidance model; traverses, chapter by chapter, the online selected content uploaded by the student end and identifies keyword data of the selected content; generates a knowledge point architecture tree for the corresponding resource database and filters out a resource sub-dataset of the knowledge point architecture tree; and pushes resource data filtered according to the keyword data from the resource sub-dataset to generate personalized homework suited to each student. By fusing multi-source heterogeneous data, the present invention accurately identifies students' knowledge weaknesses and learning interests and generates personalized learning homework to improve learning effects.

Description

Homework generation method and system based on multi-source data fusion

Technical Field

The present invention relates to the field of educational technology, and in particular to a homework generation method and system based on multi-source data fusion.

Background Art

With the continuous development of information technology and the growing abundance of educational resources, the field of education is undergoing a profound change. The traditional education model relies mainly on teachers' experience and the content of teaching materials. Homework is usually designed uniformly and lacks personalization and targeting, which to some extent limits improvements in students' learning outcomes and teaching quality. Against this background, using big data, artificial intelligence and data fusion technology to generate personalized homework content has gradually become a research hotspot in the field of educational technology.

Traditional homework generation is limited by a lack of personalization, low efficiency and delayed feedback. First, traditional homework is assigned mainly on the basis of textbooks and teachers' experience and is difficult to adjust to each student's learning level and needs, so the difficulty of the homework does not match some students' learning status, which affects learning outcomes. Second, teachers must spend a great deal of time and energy designing homework and checking and giving feedback on the results, which is particularly difficult in large classes. Moreover, after completing homework, students must wait for the teacher to mark it and give feedback, and this delay may cause them to miss the best time for correcting errors.

With the development of educational informatization, online learning platforms and learning management systems are becoming increasingly widespread. Online learning platforms can record detailed learning behavior data, such as video viewing time, question-answering performance and course completion, providing data support for personalized education. Learning management systems, in turn, make it convenient to manage course content, student records and learning activity data. These systems provide the infrastructure for collecting and analyzing students' learning behaviors.

However, although current online learning platforms and learning management systems can record students' learning status and manage courses online, the data covering students' learning progress and learning status come from a wide range of sources, including but not limited to online learning platforms, classroom performance, test scores and teacher evaluations. How to fuse these heterogeneous data effectively is the key to achieving personalized education. In particular, how to provide homework content suited to each student's learning path based on that student's characteristics and needs is the key to effectively improving learning efficiency and outcomes.

Summary of the Invention

In view of the above problems, the present invention proposes a homework generation method and system based on multi-source data fusion, which aims to accurately identify students' knowledge weaknesses and learning interests by fusing multi-source heterogeneous data, and to generate personalized learning homework to improve learning effects.

The present invention is implemented by the following technical solutions:

In a first aspect, the present invention provides a homework generation method based on multi-source data fusion, the method comprising the following steps:

acquiring multi-source heterogeneous data for data fusion, and standardizing the multi-source heterogeneous data to obtain multi-source data that fully covers students' learning situations;

converting the obtained multi-source data into text data, extracting semantic information from the multi-source data according to a semantic knowledge base, projecting it into a shared semantic space to generate a word vector matrix, decomposing the word vector matrix to obtain semantic representation vectors, and fusing the semantic representation vectors to obtain fused multi-source learning data;

constructing student portraits from the fused multi-source learning data, and identifying each student's knowledge weaknesses and learning interests;

training a homework guidance model with the semantic representation vectors and the student portraits, including knowledge weaknesses and learning interests, as inputs and personalized homework recommendations as labels;

traversing, chapter by chapter, the online selected content uploaded by the student end, identifying keyword data of the selected content, generating a knowledge point architecture tree for the corresponding resource database, and filtering out a resource sub-dataset of the knowledge point architecture tree;

based on the homework guidance model and the knowledge weaknesses and learning interests corresponding to the student portrait, pushing resource data filtered according to the keyword data from the resource sub-dataset to generate personalized homework suited to each student.

Through the above steps, the present invention can effectively use multi-source data fusion to generate homework that meets students' individual needs, helping students to better master knowledge points and to improve their learning interest and outcomes.
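The final step of the method (filtering the resource sub-dataset by keyword data, then tailoring the result to the student portrait) can be illustrated with a minimal Python sketch. This is not the patented implementation: the trained homework guidance model is replaced here by a hand-written score, and the function name `generate_homework`, the `tags` field and the weighting (weak points count double) are all assumptions made for illustration.

```python
def generate_homework(resources, keywords, weak_points, interests, size=3):
    """Filter resources by keyword, then rank by portrait overlap."""
    keyword_set = set(keywords)
    # Keep only resources that match the keywords of the selected content.
    candidates = [r for r in resources if set(r["tags"]) & keyword_set]

    def score(resource):
        tags = set(resource["tags"])
        # Assumed scoring rule standing in for the trained model:
        # each weak point counts double, each interest counts once.
        return 2 * len(tags & set(weak_points)) + len(tags & set(interests))

    ranked = sorted(candidates, key=score, reverse=True)
    return [r["id"] for r in ranked[:size]]


# Hypothetical resource sub-dataset and student portrait.
resources = [
    {"id": "q1", "tags": ["fractions", "word problem"]},
    {"id": "q2", "tags": ["fractions", "geometry"]},
    {"id": "q3", "tags": ["decimals"]},
    {"id": "q4", "tags": ["fractions"]},
]
homework = generate_homework(
    resources,
    keywords=["fractions"],
    weak_points=["word problem"],
    interests=["geometry"],
    size=2,
)
```

In this example the exercise touching the student's weak point is ranked first and the one touching an interest second; a real system would obtain the ranking from the trained model rather than from a fixed formula.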

As a further solution of the present invention, the acquired multi-source heterogeneous data includes but is not limited to:

Students' classroom performance data (such as attendance rate and class participation).

Students' test score data (such as midterm and final exams).

Students' homework completion data (such as homework completion time and accuracy rate).

Students' online learning data (such as online course viewing records and online test results).

Students' learning behavior data (such as learning time distribution and learning path).

As a further solution of the present invention, standardizing the multi-source heterogeneous data further includes interpolation and bias correction of the multi-source heterogeneous data, wherein inverse distance weighted interpolation is used to interpolate the multi-source heterogeneous data onto a grid of a specified resolution, and historical data is used to correct the bias of the current data. During interpolation, the estimated value at the interpolation point is calculated as:

$$\hat{Z}(x_0) = \frac{\sum_{i=1}^{N} Z_i \, d_i^{-p}}{\sum_{i=1}^{N} d_i^{-p}}$$

where $\hat{Z}(x_0)$ is the estimated value at the interpolation point $x_0$, $N$ is the total number of known data points involved in the interpolation, $i$ is an index from 1 to $N$ that marks each known data point, $Z_i$ is the value of the $i$-th known point, $d_i$ is the distance between the interpolation point and the $i$-th known point, and $p$ is the weight exponent.

The estimated value $\hat{Z}(x_0)$ at the interpolation point is thus a weighted average of the known point values $Z_i$, where each weight $d_i^{-p}$ is the $p$-th power of the reciprocal of the distance between the known point and the interpolation point. In the formula, the numerator $\sum_{i=1}^{N} Z_i \, d_i^{-p}$ is the weighted sum of all known point values $Z_i$ according to their weights $d_i^{-p}$, the denominator $\sum_{i=1}^{N} d_i^{-p}$ is the sum of the weights of all known points, and dividing the numerator by the denominator gives the estimated value $\hat{Z}(x_0)$ at the interpolation point $x_0$.
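The inverse distance weighted estimate described above can be sketched in a few lines of Python. The function name `idw_estimate`, the data layout (coordinate tuples paired with values) and the handling of a zero distance are assumptions made for this illustration; the specification does not prescribe them.

```python
import math


def idw_estimate(known_points, target, p=2.0):
    """Inverse distance weighted estimate at `target`.

    known_points: list of (coordinate, value) pairs; coordinates are tuples.
    p: weight exponent.
    """
    numerator = 0.0
    denominator = 0.0
    for coord, value in known_points:
        d = math.dist(coord, target)
        if d == 0.0:
            # Assumed convention: if the target coincides with a known
            # point, return that value and avoid dividing by zero.
            return value
        weight = d ** -p          # reciprocal of the distance, to the power p
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("at least one known point is required")
    return numerator / denominator


# Hypothetical example: quiz scores are known for weeks 1, 2 and 5,
# and the missing score for week 3 is estimated.
scores = [((1.0,), 80.0), ((2.0,), 90.0), ((5.0,), 85.0)]
estimate = idw_estimate(scores, (3.0,), p=2.0)
```

With these inputs the distances are 2, 1 and 2, so the nearest point (week 2) receives four times the weight of the other two.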

As a further solution of the present invention, extracting semantic information from the multi-source data according to the semantic knowledge base includes: selecting and constructing, based on the existing WordNet, DBpedia and ConceptNet knowledge bases, a semantic knowledge base covering the domain of the multi-source heterogeneous data; applying Stanford NER to the standardized multi-source data to identify named entities, and matching the identified entities with entries in the semantic knowledge base; performing word sense disambiguation on the identified entities; extracting the semantic relationships between entities and concepts from the semantic knowledge base; and constructing a semantic graph containing the entities and relationships. Through these steps, semantic information can be systematically extracted from the multi-source data to generate an accurate and clearly structured semantic graph. Each step has a clear manner of execution, and the innovativeness and legality of the method are guaranteed by the claims.
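The flow from entity recognition to a semantic graph can be sketched with toy data. The sketch below does not call Stanford NER, WordNet, DBpedia or ConceptNet: a dictionary lookup stands in for the NER tagger, a simple cue-word overlap stands in for word sense disambiguation, and the miniature knowledge base (`KB_SENSES`, `KB_RELATIONS`) is entirely hypothetical.

```python
from collections import defaultdict

# Hypothetical miniature knowledge base: surface form -> candidate senses,
# each sense having an id and a set of context cue words.
KB_SENSES = {
    "function": [
        {"id": "math.function", "cues": {"graph", "domain", "equation"}},
        {"id": "cs.function", "cues": {"code", "call", "program"}},
    ],
    "quadratic equation": [
        {"id": "math.quadratic_equation", "cues": {"root", "equation"}},
    ],
}

# Hypothetical relation triples between concepts: (head, relation, tail).
KB_RELATIONS = [
    ("math.quadratic_equation", "related_to", "math.function"),
    ("cs.function", "part_of", "cs.program"),
]


def find_entities(text):
    """Dictionary lookup standing in for a real NER tagger."""
    lowered = text.lower()
    return [form for form in KB_SENSES if form in lowered]


def disambiguate(form, text):
    """Pick the sense whose cue words overlap most with the text."""
    words = set(text.lower().split())
    best = max(KB_SENSES[form], key=lambda s: len(s["cues"] & words))
    return best["id"]


def build_semantic_graph(text):
    """Return {concept: [(relation, concept), ...]} for concepts in text."""
    concepts = {disambiguate(form, text) for form in find_entities(text)}
    graph = defaultdict(list)
    for head, relation, tail in KB_RELATIONS:
        if head in concepts and tail in concepts:
            graph[head].append((relation, tail))
    for concept in concepts:
        graph.setdefault(concept, [])   # keep isolated concepts as nodes
    return dict(graph)


graph = build_semantic_graph(
    "Xiao Ming drew the graph of a function to solve a quadratic equation"
)
```

Here the word "function" resolves to the mathematical sense because the surrounding text contains "graph" and "equation", and the resulting graph links the quadratic equation concept to it.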

As a further solution of the present invention, projecting into the shared semantic space to generate the word vector matrix and decomposing the word vector matrix to obtain the semantic representation vectors includes:

inputting the text data from which semantic information has been extracted into a pre-trained word vector model to generate the corresponding word vectors;

mapping the word vectors into a shared semantic space, and using adversarial training to align word vectors from different sources;

combining the aligned word vectors to form a complete word vector matrix;

centering the word vector matrix by subtracting the mean of each column, and applying singular value decomposition to decompose the word vector matrix into three matrices $U$, $\Sigma$ and $V^T$:

$$A = U \Sigma V^T$$

where $A$ is the $m \times n$ word vector matrix; $U$ is an $m \times m$ orthogonal matrix whose column vectors are the left singular vectors of $A$; $\Sigma$ is an $m \times n$ diagonal matrix whose diagonal elements are the singular values of $A$; and $V^T$ is an $n \times n$ orthogonal matrix whose row vectors are the transposed right singular vectors of $A$;

selecting the first $k$ singular values and their corresponding singular vectors to obtain low-dimensional semantic representation vectors; after the first $k$ singular values are selected, the decomposition is expressed as:

$$A_k = U_k \Sigma_k V_k^T$$

where $U_k$ is the first $k$ columns of $U$, of size $m \times k$; $\Sigma_k$ is the $k \times k$ diagonal matrix of the first $k$ singular values of $\Sigma$; and $V_k^T$ is the first $k$ rows of $V^T$, of size $k \times n$. The first $k$ columns of the decomposed $U$ matrix are used as the low-dimensional semantic representation vectors of the words.
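The centering and truncated singular value decomposition can be sketched with NumPy as follows. The function name `low_dim_representations` and the small example matrix are assumptions for illustration; the sketch uses the reduced form of `numpy.linalg.svd`, which yields the same first $k$ singular triplets as the full decomposition.

```python
import numpy as np


def low_dim_representations(word_vectors, k):
    """Center the columns, apply SVD and keep the first k columns of U."""
    A = np.asarray(word_vectors, dtype=float)
    A_centered = A - A.mean(axis=0)          # subtract each column's mean
    # full_matrices=False returns the reduced factors, which is sufficient
    # because only the first k singular triplets are kept.
    U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    A_k = U_k @ np.diag(s_k) @ Vt_k          # rank-k approximation
    return U_k, A_k, A_centered


# Hypothetical word vector matrix: 4 words, 3 dimensions each.
A = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.1],
     [3.0, 6.0, 8.9],
     [4.0, 8.0, 12.0]]
U_k, A_k, A_centered = low_dim_representations(A, k=2)
```

Each row of `U_k` is then the two-dimensional representation of one word. In this example the second column is exactly twice the first, so the centered matrix has rank 2 and the rank-2 approximation reproduces it up to floating-point error.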

The word vector model is a pre-trained Word2Vec model obtained by training Word2Vec on a corpus. Adversarial training further uses a purpose-designed adversarial training framework comprising a generator and a discriminator: the generator maps the word vectors into the shared semantic space, and the discriminator evaluates whether word vectors come from the same data source. By repeatedly training the generator and the discriminator, the word vectors from different sources are made consistent in the shared semantic space.

Through the above steps, the word vectors in the multi-source data are aligned to a shared semantic space, and low-dimensional semantic representation vectors are generated by singular value decomposition. This process ensures that data from different sources can be compared and analyzed in a unified semantic space, providing an efficient and accurate semantic representation.

As a further solution of the present invention, constructing the student portrait includes constructing a student portrait containing multi-dimensional features of knowledge level, learning habits and interests: the fused multi-source learning data is decomposed into multi-dimensional feature subsets, and a knowledge point mapping matrix $M = [m_{ij}]$ is established, where $m_{ij}$ indicates the degree to which the $i$-th learning activity involves the $j$-th knowledge point; knowledge point mastery is then calculated to obtain each student's mastery of each knowledge point as a mastery matrix $K$, computed from the decomposed feature subset $F$;

weakness identification: a mastery threshold $\theta$ is set, and the knowledge points whose mastery is below the threshold are identified:

$$W = \{\, j \mid K_j < \theta \,\}$$

where $W$ is the student's set of knowledge weaknesses and $K_j$ is the student's mastery of the $j$-th knowledge point; K-means is used to cluster the learning interest feature subset to identify different interest points, and the clustering results are labeled to identify the meaning of each interest point;

integrating the above calculation results to generate a detailed portrait report for each student.
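The portrait construction can be sketched in plain Python under stated assumptions. The mastery calculation below is an assumed formula, a weighted average of activity scores over the mapping matrix, and is not taken from the specification; the K-means routine is a basic textbook version with fixed starting centroids, and all data values are hypothetical.

```python
def mastery_from_activities(scores, mapping):
    """Assumed formula: K_j = sum_i(f_i * m_ij) / sum_i(m_ij)."""
    mastery = []
    for j in range(len(mapping[0])):
        total_weight = sum(row[j] for row in mapping)
        weighted = sum(f * row[j] for f, row in zip(scores, mapping))
        mastery.append(weighted / total_weight if total_weight else 0.0)
    return mastery


def weak_points(mastery, theta):
    """W = {j | K_j < theta}."""
    return {j for j, k in enumerate(mastery) if k < theta}


def kmeans(points, centroids, iterations=10):
    """Plain K-means on tuples, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: scores of 3 learning activities, and a 3 x 3 mapping
# matrix (rows: activities, columns: knowledge points).
activity_scores = [0.9, 0.5, 0.4]
mapping = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.5],
           [0.0, 0.0, 1.0]]
mastery = mastery_from_activities(activity_scores, mapping)

# Hypothetical interest features: share of time on videos vs. exercises.
interest_points = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
centroids, clusters = kmeans(interest_points, [(1.0, 0.0), (0.0, 1.0)])

portrait = {
    "mastery": mastery,
    "weak_points": sorted(weak_points(mastery, theta=0.6)),
    "interest_centroids": centroids,
}
```

With a threshold of 0.6, knowledge points 1 and 2 fall into the weakness set, and the four interest records split into two clusters that would then be labeled by hand, as the text describes.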

As a further solution of the present invention, training the homework guidance model includes pairing the generated semantic representation vectors with the corresponding semantic labels to form a training dataset, in which the homework semantic representation vectors serve as inputs and the semantic information corresponding to the knowledge weaknesses and learning interests serves as labels; a Transformer model is selected and trained on the generated training dataset to obtain the homework guidance model, and the trained model is used to generate personalized homework guidance.

As a further solution of the present invention, generating the knowledge point architecture tree includes: preliminarily establishing the hierarchical structure of the knowledge point architecture tree from the chapter content and the extracted keywords; determining the relationships between knowledge points through semantic analysis and association rule mining; removing redundant nodes and optimizing the tree structure; matching the knowledge points with the resources in the resource database; and filtering out the corresponding resource sub-dataset according to the resulting knowledge point architecture tree.
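A simplified sketch of the tree construction and resource filtering is given below. It covers only the hierarchy, the removal of redundant (duplicate) nodes and the resource matching; the semantic analysis and association rule mining steps are omitted. The nested-dictionary layout, the function names and the `tags` field are assumptions for illustration.

```python
def build_tree(chapters):
    """Build {chapter: {section: [keywords]}} and drop duplicate keywords."""
    tree = {}
    seen = set()
    for chapter, sections in chapters.items():
        tree[chapter] = {}
        for section, keywords in sections.items():
            unique = []
            for kw in keywords:
                key = kw.strip().lower()
                if key not in seen:      # redundant node: already in the tree
                    seen.add(key)
                    unique.append(key)
            if unique:
                tree[chapter][section] = unique
    return tree


def tree_keywords(tree):
    """Collect every knowledge point stored in the tree."""
    return {
        kw
        for sections in tree.values()
        for kws in sections.values()
        for kw in kws
    }


def resource_subset(tree, resources):
    """Keep resources tagged with at least one knowledge point in the tree."""
    wanted = tree_keywords(tree)
    return [r for r in resources if wanted & set(r["tags"])]


# Hypothetical chapter keywords and resource database entries.
chapters = {
    "Chapter 2: Quadratic functions": {
        "2.1": ["vertex", "Axis of symmetry"],
        "2.2": ["roots", "vertex "],
    }
}
resources = [
    {"id": "ex-01", "tags": ["vertex"]},
    {"id": "ex-02", "tags": ["probability"]},
    {"id": "ex-03", "tags": ["roots", "discriminant"]},
]
tree = build_tree(chapters)
subset = resource_subset(tree, resources)
```

The duplicate "vertex" keyword in section 2.2 is dropped as a redundant node, and only the resources tagged with a knowledge point from the tree remain in the sub-dataset.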

In a second aspect, the present invention further provides a homework generation system based on multi-source data fusion, the system comprising:

Multi-source data acquisition module: used to acquire multi-source heterogeneous data and standardize the acquired data to generate multi-source data that fully covers students' learning situations;

Semantic information extraction module: used to convert the standardized multi-source data into text data, extract the semantic information in the multi-source data according to the semantic knowledge base, and project it into the shared semantic space to generate a word vector matrix;

Word vector decomposition and fusion module: used to decompose the word vector matrix to obtain semantic representation vectors, and to fuse these vectors to generate fused multi-source learning data;

Student portrait construction module: used to construct student portraits based on the fused multi-source learning data and identify each student's knowledge weaknesses and learning interests;

Homework guidance model training module: used to train the homework guidance model with the semantic representation vectors as input and the corresponding semantic information as labels, in combination with the student portraits and their knowledge weaknesses and learning interests;

Online content recognition module: used to traverse, chapter by chapter, the online selected content uploaded by the student end and identify the keyword data of the selected content;

Architecture tree generation module: used to generate the knowledge point architecture tree of the corresponding resource database according to the identified keyword data, and to filter out the resource sub-dataset of the knowledge point architecture tree;

Homework generation module: used to filter resource data from the resource sub-dataset based on the homework guidance model and the knowledge weaknesses and learning interests corresponding to the student portrait, and to generate personalized homework suited to each student.

As a further solution of the present invention, the homework generation system further includes a data push module for pushing the generated personalized homework to the student end, ensuring that each student receives homework that matches his or her learning needs and interests.

Through the cooperation of the above modules, the homework generation system of the present invention can efficiently and accurately generate personalized learning homework, helping students to better master knowledge and improve learning effects.

The present invention further includes a computer device comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the homework generation method based on multi-source data fusion.

The present invention further includes a computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the homework generation method based on multi-source data fusion.

Compared with the prior art, the homework generation method and system based on multi-source data fusion provided by the present invention have the following beneficial effects:

1. By acquiring and fusing multi-source heterogeneous data, such as academic results, classroom performance, homework grades and online learning records, the invention comprehensively covers students' learning situations and helps to evaluate their learning status accurately; once the differences between data sources are eliminated, multi-source data that fully covers students' learning situations is obtained.

2. Semantic information extraction and multi-source data fusion are realized. Semantic information in the multi-source data is extracted using the semantic knowledge base and projected into a shared semantic space to generate a word vector matrix, and semantic representation vectors are generated through decomposition and fusion, ensuring that the important information in the data is fully used.

3. Personalized student portraits and a homework guidance model can be built. Based on the fused multi-source learning data, a detailed student portrait is built to identify each student's knowledge weaknesses and learning interests, providing the basis for personalized learning; with the semantic representation vectors as input, combined with the student portraits and their knowledge weaknesses and learning interests, an accurate homework guidance model is trained so that the generated homework targets students' specific needs and interests.

4. Intelligent recognition of online content and screening of resource data are realized, so that homework is generated in a targeted manner. By traversing, chapter by chapter, the online selected content uploaded by the student end, identifying the keyword data of the selected content and generating the corresponding knowledge point architecture tree, the homework content is kept highly relevant to what students are actually learning. Based on the generated knowledge point architecture tree and the student portrait, the most suitable resource data is selected from the resource sub-dataset so that the homework students receive best matches their learning needs. Through the homework guidance model and the student portrait, personalized homework suited to each student is generated, helping students to study in a more targeted way and improving learning effects.

In summary, through comprehensive data acquisition, intelligent semantic analysis, precise homework guidance and efficient personalized homework generation, the homework generation method and system based on multi-source data fusion of the present invention significantly improve the accuracy and personalization of homework generation and promote improvements in students' learning outcomes and in education quality.

These and other aspects of the present invention will be more readily understood from the following description of the embodiments. It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the present invention.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present invention or in the related art more clearly, the drawings required for describing the exemplary embodiments or the related art are briefly introduced below. The drawings provide a further understanding of the present invention and constitute a part of the specification; together with the embodiments of the present invention, they serve to explain the present invention and do not limit it. In the drawings:

FIG. 1 is a flowchart of a homework generation method based on multi-source data fusion according to an embodiment of the present invention.

FIG. 2 is a flowchart of decomposing to obtain semantic representation vectors in the homework generation method based on multi-source data fusion according to an embodiment of the present invention.

FIG. 3 is a flowchart of constructing a student portrait in the homework generation method based on multi-source data fusion according to an embodiment of the present invention.

Detailed Description

To make the purpose, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.

Some of the processes described in the specification, the claims and the above drawings contain multiple operations that appear in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear here or executed in parallel. Operation numbers such as 101 and 102 are used only to distinguish the operations from one another and do not themselves represent any execution order. In addition, these processes may include more or fewer operations, and the operations may be executed sequentially or in parallel. It should be noted that terms such as "first" and "second" herein are used to distinguish different messages, devices, modules and the like; they do not represent an order, nor do they require that the "first" and "second" items be of different types.

The technical solutions in the exemplary embodiments of the present invention are described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described exemplary embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

To provide homework content suited to each student's learning path according to that student's characteristics and needs, the present invention provides a homework generation method and system based on multi-source data fusion, which accurately identifies students' knowledge weaknesses and learning interests by fusing multi-source heterogeneous data and generates personalized learning homework to improve learning effects.

The technical solutions of the present invention are further described below with reference to specific embodiments:

Referring to FIG. 1, FIG. 1 is a flowchart of a homework generation method based on multi-source data fusion provided by the present invention. A homework generation method based on multi-source data fusion provided in one embodiment of the present invention comprises the following steps:

Step S10: Acquire multi-source heterogeneous data for data fusion, and standardize the multi-source heterogeneous data to obtain multi-source data that fully covers students' learning situations.

In this step, the acquired multi-source heterogeneous data includes but is not limited to:

For example, when personalized homework is generated by the multi-source data fusion method of this embodiment, the following multi-source heterogeneous data of a student (Xiao Ming) is acquired for a selected semester:

1. Classroom performance data:

Attendance rate: 95%;

Class participation: high;

2. Test score data:

Midterm exam score: 85 points;

Final exam score: 88 points;

3. Homework completion data:

Average homework completion time: 40 minutes;

Average accuracy rate: 90%;

4. Online learning data:

Online course viewing record: completion rate 80%;

Online test result: 85%;

5. Learning behavior data:

Learning time distribution: 7 pm to 9 pm every day;

Learning path: review class notes first, then complete homework, and finally take the online test.

In this embodiment, standardizing the multi-source heterogeneous data further includes interpolation and bias correction: inverse distance weighted (IDW) interpolation is used to interpolate the multi-source heterogeneous data onto a grid of a specified resolution, and historical data is used to correct the bias of the current data. During interpolation, the estimated value at an interpolation point is calculated as:

$$\hat{z}(x_0) = \frac{\sum_{i=1}^{n} z_i / d_i^{p}}{\sum_{i=1}^{n} 1 / d_i^{p}}$$

where $\hat{z}(x_0)$ is the estimated value at the interpolation point $x_0$; $n$ is the total number of known data points used in the interpolation; $i$ is an index running from 1 to $n$ that labels each known data point; $z_i$ is the value at the $i$-th known point; $d_i$ is the distance between the interpolation point and the $i$-th known point; and $p$ is the weight exponent. The estimate $\hat{z}(x_0)$ is a weighted average of the known values $z_i$, where each weight $w_i = 1/d_i^{p}$ is the reciprocal of the distance between the known point and the interpolation point raised to the power $p$. The numerator is the sum of all known values weighted by $w_i$, the denominator is the sum of the weights of all known points, and dividing the numerator by the denominator gives the estimate at $x_0$.
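The inverse distance weighted estimate described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of the invention: the function name `idw_estimate`, the coordinate tuples, and the default exponent of 2 are assumptions, and returning the known value when the query coincides with a known point is one common convention for avoiding division by zero.

```python
import math

def idw_estimate(known_points, query, power=2.0):
    """Inverse distance weighted estimate at `query`.

    known_points: list of (coordinates, value) pairs (hypothetical layout).
    power: the weight exponent p in w_i = 1 / d_i ** p.
    """
    numerator = 0.0    # sum of w_i * z_i
    denominator = 0.0  # sum of w_i
    for coords, value in known_points:
        d = math.dist(coords, query)
        if d == 0.0:
            # The query coincides with a known point: return its value directly.
            return value
        w = 1.0 / d ** power
        numerator += w * value
        denominator += w
    if denominator == 0.0:
        raise ValueError("at least one known point is required")
    return numerator / denominator

# Two known points equally far from the query contribute equally.
points = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint_estimate = idw_estimate(points, (1.0, 0.0))
```

A larger exponent makes nearby points dominate the estimate, while a smaller one moves it toward the plain average of the known values.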

Step S20: Convert the obtained multi-source data into text data, extract the semantic information in the multi-source data according to a semantic knowledge base, project it into a shared semantic space to generate a word vector matrix, decompose the word vector matrix to obtain semantic representation vectors, and fuse the semantic representation vectors to obtain fused multi-source learning data.

In this step, to extract semantic information from the multi-source data, a semantic knowledge base covering the domains of the multi-source heterogeneous data is selected and constructed on the basis of the existing WordNet, DBpedia, and ConceptNet knowledge bases. Stanford NER is applied to the standardized multi-source data to recognize named entities; the recognized entities are matched against entries in the semantic knowledge base and disambiguated by word sense; the semantic relations between entities and concepts are extracted from the knowledge base; and a semantic graph containing the entities and relations is constructed.

Through these steps, semantic information can be extracted systematically from the multi-source data to generate an accurate and clearly structured semantic graph. Each step has a clearly defined manner of execution, and the novelty and legal standing of the method are secured by the claims.

For example, when the multi-source data is converted into text data, the data collected above can be converted into the following structured text representation: Xiao Ming's attendance rate is 95%. His class participation is high. His midterm exam score is 85 points and his final exam score is 88 points. His average homework completion time is 40 minutes, with an accuracy of 90%. He has completed 80% of the online courses, and his online test result is 85%. Xiao Ming studies from 7 pm to 9 pm every day, and his learning path is to review class notes first, then complete homework, and finally take the online test.
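The conversion from a structured record to text can be illustrated with a small template function. This is only a sketch: the field names in the `xiao_ming` dictionary and the sentence templates are hypothetical, chosen to reproduce the example text above, and the source does not specify how the conversion is implemented.

```python
def record_to_text(name, record):
    """Render one student's multi-source record as plain sentences."""
    sentences = [
        f"{name}'s attendance rate is {record['attendance_pct']}%.",
        f"Class participation is {record['class_participation']}.",
        f"The midterm exam score is {record['midterm_score']} points and "
        f"the final exam score is {record['final_score']} points.",
        f"The average homework completion time is {record['homework_minutes']} "
        f"minutes, with an accuracy of {record['homework_accuracy_pct']}%.",
        f"{record['course_completion_pct']}% of the online courses are completed "
        f"and the online test result is {record['online_test_pct']}%.",
        f"Study time is {record['study_time']}.",
        "The learning path is: " + ", then ".join(record["learning_path"]) + ".",
    ]
    return " ".join(sentences)

# Values taken from the Xiao Ming example; the key names are assumptions.
xiao_ming = {
    "attendance_pct": 95,
    "class_participation": "high",
    "midterm_score": 85,
    "final_score": 88,
    "homework_minutes": 40,
    "homework_accuracy_pct": 90,
    "course_completion_pct": 80,
    "online_test_pct": 85,
    "study_time": "7 pm to 9 pm every day",
    "learning_path": ["review class notes", "complete homework", "take the online test"],
}
text = record_to_text("Xiao Ming", xiao_ming)
```

Keeping one sentence per data source makes it straightforward to trace each extracted entity back to the source it came from.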

In this embodiment, as shown in FIG. 2, projecting into the shared semantic space to generate the word vector matrix and decomposing the word vector matrix to obtain the semantic representation vectors comprises:

Step S201: Input the text data from which semantic information has been extracted into a pre-trained word vector model to generate the corresponding word vectors;

Step S202: Map the word vectors into a shared semantic space and use adversarial training to align the word vectors from different sources;

Step S203: Combine the aligned word vectors into a complete word vector matrix;

Step S204: Center the word vector matrix by subtracting the mean of each column, then apply singular value decomposition to factor the word vector matrix into three matrices $U$, $\Sigma$, and $V^{T}$:

$$X = U \Sigma V^{T}$$

where $X$ is the $m \times n$ word vector matrix; $U$ is an $m \times m$ orthogonal matrix whose columns are the left singular vectors of $X$; $\Sigma$ is an $m \times n$ diagonal matrix whose diagonal entries are the singular values of $X$; and $V^{T}$ is an $n \times n$ orthogonal matrix whose rows are the transposed right singular vectors of $X$;

Step S205: Select the first $k$ singular values and their corresponding singular vectors to obtain low-dimensional semantic representation vectors. After the first $k$ singular values are selected, the decomposition becomes:

$$X_k = U_k \Sigma_k V_k^{T}$$

where $U_k$ is the first $k$ columns of $U$, of size $m \times k$; $\Sigma_k$ is the $k \times k$ diagonal matrix holding the first $k$ singular values of $\Sigma$; and $V_k^{T}$ is the first $k$ rows of $V^{T}$, of size $k \times n$;

Step S206: Use the first $k$ columns of the decomposed matrix $U$ as the low-dimensional semantic representation vectors of the words.
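Steps S204 to S206 can be sketched with NumPy as follows. The sketch assumes that each row of the matrix is one word and each column one embedding dimension, and it uses the reduced ("thin") SVD, whose leading $k$ columns of $U$ coincide with those of the full decomposition; the function name `truncated_svd` and the sample matrix are illustrative only.

```python
import numpy as np

def truncated_svd(X, k):
    """Center the columns of X and keep the top-k singular triplets."""
    X = np.asarray(X, dtype=float)
    # Step S204: subtract the mean of each column, then decompose.
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Step S205: keep the first k singular values and vectors.
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Step S206: the rows of U_k serve as low-dimensional word representations.
    return U_k, s_k, Vt_k

# Hypothetical 5-word, 3-dimensional word vector matrix.
X = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 3.0],
    [4.0, 0.0, 1.0],
    [2.0, 2.0, 2.0],
    [1.0, 0.0, 5.0],
])
U_k, s_k, Vt_k = truncated_svd(X, k=2)
```

`np.linalg.svd` returns the singular values in descending order, so slicing the first `k` entries selects the largest ones.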

Through the above steps, Xiao Ming's multi-source data is projected into a shared semantic space and low-dimensional semantic representation vectors are generated. These vectors can then be used to generate personalized homework whose content accurately reflects Xiao Ming's learning situation and needs. Aligning the word vectors of the multi-source data in the shared semantic space and generating low-dimensional semantic representation vectors by singular value decomposition allows data from different sources to be compared and analyzed in a unified semantic space, providing an efficient and accurate semantic representation.

Step S30: Construct student portraits from the fused multi-source learning data and identify each student's knowledge weaknesses and learning interests.

In this step, as shown in FIG. 3, constructing a student portrait comprises the following steps:

Step S301: Construct a student portrait containing multi-dimensional features such as knowledge level, learning habits, and interests; decompose the fused multi-source learning data $D$ into multi-dimensional feature subsets; and establish a knowledge point mapping matrix $M = (m_{ij})$, where $m_{ij}$ denotes the degree to which the $i$-th learning activity involves the $j$-th knowledge point;

Step S302: Calculate knowledge point mastery, that is, each student's mastery of each knowledge point:

$$P = F M$$

where $P$ is the matrix of students' mastery of each knowledge point, $F$ is the feature subset obtained from the decomposition, and $M$ is the knowledge point mapping matrix;

Step S303: Identify weaknesses by setting a mastery threshold $\theta$ and identifying the knowledge points whose mastery is below the threshold:

$$W = \{\, j \mid P_j < \theta \,\}$$

where $W$ is the set of the student's knowledge weaknesses and $P_j$ is the student's mastery of the $j$-th knowledge point;

Step S304: Cluster the learning interest feature subset using K-means to identify distinct interest points, then label the clustering results to determine the meaning of each interest point;

Step S305: Integrate the above results to generate a detailed portrait report for each student.
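Steps S302 and S303 can be sketched as below. The sketch assumes that mastery is obtained by multiplying a student's activity feature vector by the knowledge point mapping matrix, which is one reading consistent with the weighted sum used in the worked example of this embodiment; the function names and the small feature and mapping values are hypothetical, while the mastery values 0.85, 0.75, 0.65 and the threshold 0.70 come from the Xiao Ming example.

```python
def mastery(features, mapping):
    """Mastery per knowledge point as a weighted sum of activity features.

    features: one score per learning activity (hypothetical values below).
    mapping:  mapping[i][j] is how strongly activity i involves knowledge point j.
    """
    n_points = len(mapping[0])
    return [
        sum(f * row[j] for f, row in zip(features, mapping))
        for j in range(n_points)
    ]

def weak_points(names, scores, threshold):
    """Return the knowledge points whose mastery is below the threshold."""
    return [name for name, score in zip(names, scores) if score < threshold]

# Hypothetical two-activity, two-knowledge-point illustration.
scores = mastery([0.9, 0.8], [[0.5, 0.2], [0.5, 0.6]])

# Mastery values and threshold from the Xiao Ming example.
xiao_ming_weak = weak_points(["A", "B", "C"], [0.85, 0.75, 0.65], 0.70)
```

With the example values, only knowledge point C falls below the threshold, which matches the portrait report in this embodiment.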

For example, when a knowledge point mapping matrix is established to represent the degree to which different learning activities involve different knowledge points, assume the following knowledge points:

Knowledge point A (algebra);

Knowledge point B (geometry);

Knowledge point C (calculus).

The knowledge point mapping matrix is expressed as follows:

Student Xiao Ming's mastery of each knowledge point is then calculated as:

Mastery = classroom performance × 0.8 + test score × 0.9 + homework completion × 0.85 + online learning × 0.8.

Suppose Xiao Ming's results are: knowledge point A, 0.85; knowledge point B, 0.75; knowledge point C, 0.65. With the mastery threshold set to 0.70, the knowledge points below the threshold are identified, which here is knowledge point C (0.65 < 0.70). K-means clustering is then performed on the learning interest features (class participation and online course viewing records), giving the following clusters:

Cluster 1: high participation and high completion rate;

Cluster 2: moderate participation and moderate completion rate.

Assume Xiao Ming is assigned to cluster 1, meaning that he is interested in content with high participation and a high completion rate. Integrating the results, the detailed portrait report generated for Xiao Ming is:

Knowledge level: good (but weak in knowledge point C);

Learning habits: high attendance, study time concentrated in the evening, and a well-organized learning path;

Interests: content with high participation and a high completion rate.
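The clustering of learning interest features can be illustrated with a plain implementation of Lloyd's K-means algorithm. This is a sketch under stated assumptions: the numeric encoding of participation and completion rate, the four sample students, and the choice of initial centroids are all hypothetical, and a production system would more likely call an existing library implementation.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def kmeans(points, initial_centroids, iterations=20):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged
        centroids = updated
    return assign(points, centroids), centroids

# Hypothetical (participation, course completion rate) pairs, scaled to [0, 1].
students = [(0.9, 0.8), (0.95, 0.85), (0.5, 0.4), (0.6, 0.5)]
labels, centroids = kmeans(students, initial_centroids=[students[0], students[2]])
```

Here the first two students fall into the high-participation cluster and the last two into the moderate one; labeling the clusters with a meaning, as step S304 requires, remains a separate manual or rule-based step.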

Through the above process, the present invention can generate personalized homework for Xiao Ming that both consolidates his weak knowledge points and stimulates his interest in learning, improving learning outcomes while making the learning process more engaging and efficient.

Step S40: Train a homework guidance model with the semantic representation vectors and the student portraits (including knowledge weaknesses and learning interests) as input and personalized homework recommendations as labels.

In this step, when the homework guidance model is trained, the generated semantic representation vectors are paired with the corresponding semantic labels to form a training dataset, with the homework semantic representation vectors as input and the semantic information corresponding to the knowledge weaknesses and learning interests as labels. A Transformer model is selected and trained on the generated training dataset to obtain the homework guidance model, and the trained model is used to generate personalized homework guidance.

Step S50: Traverse the online selected content uploaded from the student end chapter by chapter, identify the keyword data of the selected content, generate a knowledge point architecture tree corresponding to the materials database, and filter out the resource sub-dataset of the knowledge point architecture tree.

In this step, when the knowledge point architecture tree is generated, its hierarchy is first established from the chapter content and the extracted keywords. The relations between knowledge points are then determined by semantic analysis and association rule mining, redundant nodes are removed, and the tree structure is optimized. The knowledge points are matched with the resources in the materials database, and the resource sub-dataset corresponding to the resulting knowledge point architecture tree is filtered out.

Step S60: Based on the homework guidance model and the knowledge weaknesses and learning interests in the student portrait, push resource data filtered by the keyword data from the resource sub-dataset to generate personalized homework suited to each student.

For example, in traversing the online selected content uploaded from the student end chapter by chapter, generating the knowledge point architecture tree, and filtering the resource sub-dataset, suppose that in a certain chapter Xiao Ming uploads online selected content on "calculus" covering definite integrals, differentiation, and the Newton-Leibniz formula. Natural language processing extracts the following keywords:

(1) Definite integral;

(2) Differentiation;

(3) Newton-Leibniz formula.

When the knowledge point architecture tree is generated, its hierarchy is first established from the chapter content and the extracted keywords:

Calculus
├── Definite integral
├── Differentiation
└── Newton-Leibniz formula

Semantic analysis and association rule mining are then used to determine the relations between the knowledge points, giving:

Definite integration and differentiation are closely related and are linked by the Newton-Leibniz formula.

The optimized knowledge point architecture tree is then:

Calculus
├── Definite integral
│   └── Newton-Leibniz formula
└── Differentiation
    └── Newton-Leibniz formula
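The construction and optimization of the knowledge point architecture tree can be sketched with a dictionary that maps each node to its children. The helper names `build_tree`, `attach_bridge`, and `render` are hypothetical, and the single "bridge" relation stands in for whatever relations semantic analysis and association rule mining would actually produce.

```python
def build_tree(root, keywords):
    """Initial hierarchy: every extracted keyword sits directly under the root."""
    return {root: list(keywords)}

def attach_bridge(tree, root, bridge, parents):
    """Move a linking concept from under the root to under each related parent."""
    tree[root] = [k for k in tree[root] if k != bridge]
    for parent in parents:
        tree.setdefault(parent, []).append(bridge)
    return tree

def render(tree, node, prefix=""):
    """Return the subtree below `node` as box-drawing text lines."""
    lines = []
    children = tree.get(node, [])
    for index, child in enumerate(children):
        last = index == len(children) - 1
        lines.append(prefix + ("└── " if last else "├── ") + child)
        lines.extend(render(tree, child, prefix + ("    " if last else "│   ")))
    return lines

tree = build_tree(
    "Calculus",
    ["Definite integral", "Differentiation", "Newton-Leibniz formula"],
)
# Assumed mined relation: the formula links the two other knowledge points.
attach_bridge(tree, "Calculus", "Newton-Leibniz formula",
              ["Definite integral", "Differentiation"])
lines = ["Calculus"] + render(tree, "Calculus")
```

Storing the tree as a parent-to-children mapping lets the same knowledge point appear under several parents, as the Newton-Leibniz formula does in the optimized tree.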

The knowledge points are then matched with the resources in the materials database, with the following results:

(1) Definite integral: matched to exercise sets, explanatory videos, and similar resources.

(2) Differentiation: matched to related online courses, exercises, and similar resources.

(3) Newton-Leibniz formula: matched to detailed theoretical explanations, application examples, and similar resources.

Finally, the resource sub-dataset corresponding to the resulting knowledge point architecture tree is filtered out:

├── Definite integral
│   ├── Exercise set
│   └── Explanatory video
├── Differentiation
│   ├── Online course
│   └── Exercises
└── Newton-Leibniz formula
    ├── Theoretical explanation
    └── Application examples

Finally, according to Xiao Ming's student portrait:

Knowledge weakness: knowledge point C (which includes definite integrals and differentiation);

Learning interest: high-participation content.

Resources are pushed on the basis of the homework guidance model and the student portrait to generate personalized homework, with resources suited to Xiao Ming selected from the resource sub-dataset. For his knowledge weaknesses (definite integrals and differentiation), the exercise set and explanatory video on definite integrals and the online course and exercises on differentiation can be pushed. To reflect his learning interests, more interactive resources can be chosen, such as engaging explanatory videos and interactive online courses. The resulting personalized homework is as follows:

1. Definite integral:

▪ Exercise set: complete questions 1-10 in Chapter 3.

▪ Explanatory video: watch the video "Applications of Definite Integrals" and answer the related questions.

2. Differentiation:

▪ Online course: complete Section 2, "Basic Concepts of Differentiation", and take the online quiz.

▪ Exercises: complete exercises 1-15 in Chapter 4.

3. Newton-Leibniz formula:

▪ Theoretical explanation: read the lecture notes "The Newton-Leibniz Formula and Its Applications".

▪ Application examples: complete the related application problems and submit a report.
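The final selection step can be sketched as a filter over the resource sub-dataset followed by a stable sort that puts interactive resources first. This illustrates only the rule-based part: the `Resource` fields and the `interactive` flag are hypothetical, the sketch covers the weak knowledge points only, and the trained homework guidance model is not modeled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    topic: str         # knowledge point the resource belongs to
    kind: str          # resource type, e.g. "Exercise set"
    interactive: bool  # hypothetical flag used to reflect learning interests

def select_homework(resources, weak_topics, prefer_interactive=True):
    """Keep resources on weak knowledge points, interactive ones first."""
    chosen = [r for r in resources if r.topic in weak_topics]
    if prefer_interactive:
        # sort() is stable, so the original order is kept within each group.
        chosen.sort(key=lambda r: not r.interactive)
    return chosen

sub_dataset = [
    Resource("Definite integral", "Exercise set", False),
    Resource("Definite integral", "Explanatory video", True),
    Resource("Differentiation", "Online course", True),
    Resource("Differentiation", "Exercises", False),
    Resource("Newton-Leibniz formula", "Theoretical explanation", False),
    Resource("Newton-Leibniz formula", "Application examples", False),
]
homework = select_homework(sub_dataset, {"Definite integral", "Differentiation"})
kinds = [r.kind for r in homework]
```

Resources on the Newton-Leibniz formula could be added through the tree relation that links it to both weak knowledge points.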

Through the above steps, the present invention generates detailed, personalized homework guidance for Xiao Ming, so that he can study his weak points effectively while remaining highly interested in learning. This approach makes learning more targeted and effective and strengthens students' motivation.

Compared with existing online learning platforms and learning management systems, the homework generation method based on multi-source data fusion of the present invention acquires and fuses multi-source heterogeneous data, such as academic results, classroom performance, homework results, and online learning records, to cover each student's learning situation comprehensively, which helps assess the student's learning state accurately. After the differences between data sources are eliminated, multi-source data that comprehensively covers the student's learning situation is obtained. A semantic knowledge base is used to extract the semantic information in the multi-source data, which is projected into a shared semantic space to generate a word vector matrix; semantic representation vectors are then generated by decomposition and fusion, so that the important information in the data is fully used. A detailed student portrait is constructed from the fused multi-source learning data to identify each student's knowledge weaknesses and learning interests, providing the basis for personalized learning. With the semantic representation vectors as input, combined with the student portrait and its knowledge weaknesses and learning interests, an accurate homework guidance model is trained so that the generated homework targets each student's specific needs and interests.

The method also traverses the online selected content uploaded from the student end chapter by chapter, identifies the keyword data of the selected content, and generates the corresponding knowledge point architecture tree, keeping the homework content closely related to what the student is actually studying. Based on the generated knowledge point architecture tree and the student portrait, the most suitable resource data is filtered from the resource sub-dataset, so that the homework each student receives matches his or her learning needs. Through the homework guidance model and the student portrait, personalized homework suited to each student is generated, helping students learn in a more targeted way and improving learning outcomes.

It should be understood that although the steps are described above in a certain order, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be performed in other orders. Moreover, some steps of this embodiment may comprise multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, the present invention provides a homework generation system based on multi-source data fusion for executing the above homework generation method, the system comprising:

A student portrait construction module, configured to construct student portraits from the fused multi-source learning data and identify each student's knowledge weaknesses and learning interests;

A homework generation module, configured to filter resource data from the resource sub-dataset based on the homework guidance model and the knowledge weaknesses and learning interests in the student portrait, and to generate personalized homework suited to each student;

A data push module, configured to push the generated personalized homework to the student end so that each student receives homework matching his or her learning needs and interests.

Through the cooperation of the above modules, the homework generation system based on multi-source data fusion of the present invention can generate personalized homework efficiently and accurately, helping students master knowledge and improving learning outcomes.

In this embodiment, the system executes the steps of the homework generation method based on multi-source data fusion described above, so its operation is not described again in detail here.

In one embodiment, the present invention further provides a computer device comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the steps of the homework generation method based on multi-source data fusion.

In one embodiment, the present invention further provides a computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the steps of the homework generation method based on multi-source data fusion.

A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program, expressed as computer instructions, that instructs the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media in the embodiments provided in this application may include at least one of non-volatile and volatile memory.

Non-volatile memory may include read-only memory, magnetic tape, floppy disk, flash memory, optical storage, and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take various forms, such as static random access memory or dynamic random access memory.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A homework generation method based on multi-source data fusion, characterized in that the method comprises the following steps:

acquiring multi-source heterogeneous data for data fusion, and standardizing the multi-source heterogeneous data to obtain multi-source data that comprehensively covers students' learning situations;

converting the obtained multi-source data into text data, extracting the semantic information in the multi-source data according to a semantic knowledge base, projecting it into a shared semantic space to generate a word vector matrix, decomposing the word vector matrix to obtain semantic representation vectors, and fusing the semantic representation vectors to obtain fused multi-source learning data;

constructing student portraits from the fused multi-source learning data to identify each student's knowledge weaknesses and learning interests;

training a homework guidance model with the semantic representation vectors and the student portraits, including knowledge weaknesses and learning interests, as input and personalized homework recommendations as labels;

traversing the online selected content uploaded from the student end chapter by chapter, identifying the keyword data of the selected content, generating a knowledge point architecture tree corresponding to the materials database, and filtering out the resource sub-dataset of the knowledge point architecture tree;

based on the homework guidance model and the knowledge weaknesses and learning interests in the student portrait, pushing resource data filtered by the keyword data from the resource sub-dataset to generate personalized homework suited to each student.

2. The homework generation method based on multi-source data fusion according to claim 1, characterized in that the acquired multi-source heterogeneous data includes students' classroom performance data, test score data, homework completion data, online learning data, and learning behavior data.

3. The homework generation method based on multi-source data fusion according to claim 2, characterized in that standardizing the multi-source heterogeneous data further includes interpolation and bias correction of the multi-source heterogeneous data, wherein inverse distance weighted interpolation is used to interpolate the multi-source heterogeneous data onto a grid of a specified resolution, and historical data is used to correct the bias of the current data; during interpolation, the estimated value at the interpolation point is calculated as:

$$\hat{z}(x_0) = \frac{\sum_{i=1}^{n} z_i / d_i^{p}}{\sum_{i=1}^{n} 1 / d_i^{p}}$$

where $\hat{z}(x_0)$ is the estimated value at the interpolation point $x_0$, $n$ is the total number of known data points used in the interpolation, $i$ is an index running from 1 to $n$ that labels each known data point, $z_i$ is the value at the $i$-th known point, $d_i$ is the distance between the interpolation point and the $i$-th known point, and $p$ is the weight exponent.

4. The homework generation method based on multi-source data fusion according to claim 3, characterized in that the estimated value $\hat{z}(x_0)$ at the interpolation point $x_0$ is a weighted average of the known point values $z_i$, each weight $w_i = 1/d_i^{p}$ being the reciprocal of the distance between the known point and the interpolation point raised to the power $p$; in the formula for $\hat{z}(x_0)$, the numerator $\sum_{i=1}^{n} z_i / d_i^{p}$ is the sum of all known point values $z_i$ weighted by $w_i$, the denominator $\sum_{i=1}^{n} 1 / d_i^{p}$ is the sum of the weights of all known points, and the estimated value $\hat{z}(x_0)$ at the interpolation point $x_0$ is the numerator divided by the denominator.

5. The method for generating a job based on multi-source data fusion as described in claim 4 is characterized in that, when extracting semantic information from multi-source data according to a semantic knowledge base, a semantic knowledge base covering the field of multi-source heterogeneous data is selected and constructed based on the existing WordNet, DBpedia and ConceptNet knowledge bases; Stanford NER is used to identify named entities on the standardized multi-source data, and the identified entities are matched with entries in the semantic knowledge base, word sense disambiguation is performed on the identified entities, and semantic relationships between entities and concepts are extracted from the semantic knowledge base, and a semantic graph containing entities and relationships is constructed and generated.

6. The homework generation method based on multi-source data fusion according to claim 5, characterized in that projecting into the shared semantic space to generate a word vector matrix, and decomposing the word vector matrix to obtain semantic representation vectors, comprises:

Input the text data after extracting semantic information into the pre-trained word vector model to generate the corresponding word vector;

Map word vectors to a shared semantic space and use adversarial training to align word vectors from different sources;

The aligned word vectors are combined to form a complete word vector matrix;

Center the word vector matrix by subtracting the mean of each column, and apply singular value decomposition to decompose the centered matrix into three matrices U, Σ, and V^T; select the first k singular values and their corresponding singular vectors to obtain low-dimensional semantic representation vectors;

Use the first k columns in the decomposed U matrix as the low-dimensional semantic representation vector of the word.
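The centering and decomposition steps of claim 6 can be sketched with NumPy as follows. The function name `low_dim_word_vectors` and the random example matrix are assumptions made for illustration; in the claimed method the rows would be the aligned word vectors rather than random numbers.

```python
import numpy as np


def low_dim_word_vectors(word_matrix, k):
    """Center a word vector matrix and keep the first k columns of U.

    word_matrix: array of shape (num_words, dim), one word vector per row.
    k: number of singular values / vectors to keep.
    Returns (U_k, S_k): the low-dimensional vectors and the kept singular values.
    """
    X = np.asarray(word_matrix, dtype=float)
    if not 1 <= k <= min(X.shape):
        raise ValueError("k must be between 1 and min(num_words, dim)")
    # Subtract the mean of each column.
    X_centered = X - X.mean(axis=0, keepdims=True)
    # Thin SVD: X_centered = U @ diag(S) @ Vt, singular values in descending order.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return U[:, :k], S[:k]


# Hypothetical stand-in for an aligned word vector matrix: 6 words, 4 dimensions.
rng = np.random.default_rng(0)
example_matrix = rng.normal(size=(6, 4))
vectors, singular_values = low_dim_word_vectors(example_matrix, k=2)
```

Each row of `vectors` is the k-dimensional representation of one word. Some pipelines scale the columns of U by the kept singular values instead; the claim as written uses the columns of U directly, so the sketch does the same.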

7. The homework generation method based on multi-source data fusion according to claim 6, characterized in that the word vector model is a pre-trained Word2Vec model obtained by training on a corpus using Word2Vec; the adversarial training uses an adversarial training framework comprising a generator and a discriminator, wherein the generator maps the word vectors into the shared semantic space, and the discriminator evaluates whether the word vectors come from the same data source.

8. The method for generating homework based on multi-source data fusion according to claim 7, characterized in that when constructing a student portrait, it includes:

Construct a student portrait that includes multi-dimensional features such as knowledge level, learning habits, and interests and hobbies, decompose the fused multi-source learning data into multi-dimensional feature subsets, and establish a knowledge point mapping matrix;

Calculate each student's mastery of each knowledge point;

Set a mastery threshold and identify knowledge points whose mastery is below the threshold;

Use K-means to cluster the learning interest feature subsets, identify different points of interest, annotate the clustering results, and identify the meaning of each point of interest;

Integrate the above calculation results to generate a detailed portrait report for each student.
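The mastery and threshold steps of claim 8 can be sketched as follows. The claims do not give a mastery formula, so the sketch assumes that mastery of a knowledge point is the fraction of correctly answered questions mapped to that point; the function names, the dictionary form of the knowledge point mapping, and the threshold of 0.6 are all hypothetical.

```python
def knowledge_mastery(responses, question_to_points):
    """Compute per-knowledge-point mastery for one student.

    responses: dict mapping question id -> True/False (answered correctly).
    question_to_points: dict mapping question id -> list of knowledge points,
        a sparse stand-in for the knowledge point mapping matrix.
    Returns a dict mapping knowledge point -> fraction of correct answers.
    """
    totals = {}
    correct = {}
    for question_id, is_correct in responses.items():
        for point in question_to_points.get(question_id, []):
            totals[point] = totals.get(point, 0) + 1
            correct[point] = correct.get(point, 0) + (1 if is_correct else 0)
    return {point: correct[point] / totals[point] for point in totals}


def weak_points(mastery, threshold=0.6):
    """Return the knowledge points whose mastery is below the threshold."""
    return sorted(point for point, value in mastery.items() if value < threshold)


# Hypothetical data for one student.
mapping = {
    "q1": ["fractions"],
    "q2": ["fractions", "decimals"],
    "q3": ["decimals"],
    "q4": ["geometry"],
}
answers = {"q1": True, "q2": False, "q3": False, "q4": True}
mastery = knowledge_mastery(answers, mapping)
weaknesses = weak_points(mastery, threshold=0.6)
```

The interest clustering step is not shown; it would take the learning interest feature subset as input to a K-means implementation and label the resulting clusters.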

9. The homework generation method based on multi-source data fusion according to claim 8, characterized in that, when training the homework guidance model, the generated semantic representation vectors are paired with the corresponding semantic labels to form a training data set, wherein the semantic representation vectors of the homework are used as input and the semantic information corresponding to the knowledge weaknesses and learning interest points is used as the label; a Transformer model is selected and trained on the generated training data set to obtain the homework guidance model, and the trained model is used to generate personalized homework guidance.

10. A homework generation system based on multi-source data fusion, characterized in that it is used to execute the homework generation method based on multi-source data fusion according to any one of claims 1 to 9, and the homework generation system based on multi-source data fusion comprises:

Multi-source data acquisition module: used to acquire multi-source heterogeneous data and perform standardization on the acquired multi-source heterogeneous data to generate multi-source data that fully covers students' learning situations;

Semantic information extraction module: used to convert the standardized multi-source data into text data, extract the semantic information in the multi-source data according to the semantic knowledge base, and project it into the shared semantic space to generate a word vector matrix;

Word vector decomposition and fusion module: used to decompose the word vector matrix to obtain semantic representation vectors, and fuse these vectors to generate fused multi-source learning data;

Student portrait construction module: used to construct student portraits based on the fused multi-source learning data and identify each student's knowledge weaknesses and learning interests;

Homework guidance model training module: used to train the homework guidance model by taking the semantic representation vector as input and the corresponding semantic information as label, combined with the student portrait, their knowledge weaknesses and learning interests;

Online content recognition module: used to traverse the online selected content uploaded by the student end chapter by chapter and identify the keyword data of the selected content;

Architecture tree generation module: used to generate a knowledge point architecture tree of the corresponding database according to the identified keyword data, and filter out the resource sub-dataset of the knowledge point architecture tree;

Homework generation module: used to filter resource data from the resource sub-dataset based on the homework guidance model and the knowledge weaknesses and learning interests corresponding to the student portrait, and generate personalized homework that suits each student.
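The interplay of the last three modules can be illustrated with a small sketch. Everything here is hypothetical: the nested dictionary used as the resource database, the exact-match keyword lookup, and the ranking that prefers resources matching a learning interest are assumptions for illustration only, and a simple rule stands in for the trained homework guidance model.

```python
def build_knowledge_tree(keywords, catalog):
    """Keep only the chapters and knowledge points that match the keywords.

    catalog: dict chapter -> dict knowledge point -> list of resource dicts
        (a hypothetical layout for the resource database).
    Returns the matching sub-tree, which doubles as the resource sub-dataset.
    """
    wanted = {keyword.lower() for keyword in keywords}
    tree = {}
    for chapter, points in catalog.items():
        matched = {p: res for p, res in points.items() if p.lower() in wanted}
        if matched:
            tree[chapter] = matched
    return tree


def generate_homework(tree, weak_points, interests, limit=3):
    """Pick resources for weak knowledge points, preferring matching interests."""
    weak = set(weak_points)
    liked = set(interests)
    candidates = []
    for points in tree.values():
        for point, resources in points.items():
            if point not in weak:
                continue
            for resource in resources:
                interest_match = 1 if resource["topic"] in liked else 0
                # Negative score so that sorting puts interest matches first.
                candidates.append((-interest_match, resource["id"]))
    candidates.sort()
    return [resource_id for _, resource_id in candidates[:limit]]


# Hypothetical resource database and student portrait.
catalog = {
    "ch1": {
        "fractions": [{"id": "r1", "topic": "sports"}, {"id": "r2", "topic": "music"}],
        "geometry": [{"id": "r3", "topic": "music"}],
    },
    "ch2": {"decimals": [{"id": "r4", "topic": "music"}]},
}
tree = build_knowledge_tree(["Fractions", "decimals"], catalog)
homework = generate_homework(tree, ["fractions", "decimals"], ["music"], limit=2)
```

In the claimed system the selection would be driven by the homework guidance model rather than by a fixed ranking rule.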