CN112417002B

CN112417002B - Information literacy data mining method and system applied to education informatization

Info

Publication number: CN112417002B
Application number: CN202011306182.7A
Authority: CN
Inventors: 吴砥; 朱莎; 徐建; 吴晨
Original assignee: Central China Normal University
Current assignee: Central China Normal University
Priority date: 2020-11-19
Filing date: 2020-11-19
Publication date: 2023-04-07
Anticipated expiration: 2040-11-19
Also published as: CN112417002A

Abstract

The invention discloses an information literacy data mining method, system, computer device and storage medium applied to education informatization. The method comprises: collecting the information literacy test data input by test subjects, together with indicator structure data and indicator parameter data; constructing an information literacy model generation program, which receives the indicator structure data and the indicator parameter data and generates a Bayesian-network-based information literacy model from them; and constructing an information literacy data mining program, which receives the information literacy model and the test data and outputs information literacy test results. The invention can accurately express the dependency relationships among complex data in computer language and has strong interpretability, thereby improving the accuracy of data mining and the value of the mined data.

Description

Information literacy data mining method and system applied to educational informatization

Technical Field

The present invention belongs to the technical field of data mining and, more specifically, relates to an information literacy data mining method and system applied to educational informatization.

Background Art

With the rapid development of informatization, education has become closely integrated with computer technology and has evolved into a wide variety of forms. Unlike earlier data analysis that relied on manual experience, modern information literacy data mining uses computer technology to extract valuable information from massive volumes of test data and to visualize and automatically analyze measurement data. It is a specific application of data mining technology to educational informatization, with a wide range of application scenarios and high practical value. However, existing information literacy data mining methods use simple, inflexible mining models that cannot accurately express the dependency relationships among complex data in computer language and offer limited interpretability. As a result, the mined data cannot accurately and objectively reflect the real situation, and its value is low.

Summary of the Invention

In response to at least one defect or improvement need in the prior art, the present invention provides an information literacy data mining method, system, computer device and storage medium applied to educational informatization, which can accurately express the dependency relationships among complex data in computer language and has strong interpretability, thereby improving the accuracy and value of the mined data.

To achieve the above purpose, according to a first aspect of the present invention, there is provided an information literacy data mining method applied to educational informatization, comprising:

Collecting information literacy test data input by test subjects, and storing the test data as a score data matrix table;

Collecting indicator structure data input by experts, and storing the indicator structure data in JSON format;

Collecting indicator parameter data input by experts, and storing the indicator parameter data in JSON format;

Constructing an information literacy model generation program, which receives the indicator structure data and the indicator parameter data and generates a Bayesian-network-based information literacy model from them;

Constructing an information literacy data mining program, which receives the information literacy model and the score data matrix table and outputs information literacy test results.

Preferably, the information literacy test data includes a plurality of test items, and each test item has a plurality of levels;

The score data matrix table is an n×m matrix, where n is the total number of test subjects and m is the total number of test items. Each row of the table contains one test subject's data for all test items, and each column contains all test subjects' data for one test item.

Preferably, the indicator structure data includes the test items, the indicators represented by each test item, and the dependency relationships between indicators, and storing the indicator structure data in JSON format comprises:

using a test item or an indicator as a key of the JSON object, and using the indicators represented by the test item, or the indicators on which the indicator depends, as the value of the JSON object.

Preferably, the indicator parameters include prior probability parameters and conditional probability parameters. A prior probability parameter is the probability of an indicator without dependencies being at each level, and a conditional probability parameter is the probability of an indicator with dependencies being at each level. Storing the indicator parameter data in JSON format comprises:

storing each indicator without dependencies as a key of a prior probability parameter object, and storing its probability values for each level as the corresponding value of the prior probability parameter object;

storing each indicator or test item with dependencies as a key of a conditional probability parameter object, whose value is a list of dependent objects; the key of each dependent object is a depended-on indicator, and its value is the probability values of that indicator at each level.

Preferably, generating the Bayesian-network-based information literacy model comprises:

constructing, from the indicator structure data and the indicator parameter data, a Bayesian-network-based information literacy model using tail-to-tail connections, and representing the Bayesian network with a linked list, where each node of the Bayesian network represents a test item or an indicator and stores, in a multidimensional array, the indicator parameter data corresponding to that indicator or test item.

Preferably, representing the Bayesian network with a linked list comprises:

constructing a node array using a linked list, where each node in the array has three fields: the first field stores the name of the test item or indicator represented by the node, the second field stores a pointer to the multidimensional array holding the node's indicator parameter data, and the third field stores a pointer to the objects on which the node depends.

Preferably, outputting the information literacy test results comprises:

inputting the Bayesian information literacy model and the test data, calculating the probability of each level of the test subject's information literacy according to Bayes' formula, and selecting the level with the highest probability as the test subject's final result.

According to a second aspect of the present invention, there is provided an information literacy data mining system applied to educational informatization, comprising:

a test data collection module, configured to collect information literacy test data input by test subjects and store the test data as a score data matrix table;

an indicator structure data collection module, configured to collect indicator structure data input by experts and store it in JSON format;

an indicator parameter data collection module, configured to collect indicator parameter data input by experts and store it in JSON format;

a model building module, configured to construct an information literacy model generation program, which receives the indicator structure data and the indicator parameter data and generates a Bayesian-network-based information literacy model from them;

a data mining module, configured to construct an information literacy data mining program, which receives the information literacy model and the score data matrix table and outputs information literacy test results.

According to a third aspect of the present invention, there is provided a computer device comprising a processor and a memory, wherein the processor reads executable program code stored in the memory and runs the corresponding program to implement any one of the above methods.

According to a fourth aspect of the present invention, there is provided a computer-readable storage medium storing a computer program that, when executed by a processor, implements any one of the above methods.

In general, compared with the prior art, the present invention has the following beneficial effects: the information literacy model is generated from indicator structure data and indicator parameter data, and a Bayesian network represents the indicator structure and indicator parameters within the model. This allows the dependency relationships among complex data to be accurately expressed in computer language with strong interpretability, thereby improving the accuracy of data mining and the value of the mined data.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the principle of the information literacy data mining method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an information literacy model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a node array according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a linked list (adjacency list) according to an embodiment of the present invention.

Detailed Description

To make the purpose, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein only explain the present invention and are not intended to limit it. In addition, the technical features of the embodiments described below can be combined with each other as long as they do not conflict.

The information literacy data mining method applied to educational informatization in the embodiment of the present invention uses computer technology to extract valuable information from massive test data and to visualize and automatically analyze measurement data; it is a specific application of data mining technology to educational informatization. The method can be implemented on a server or a computer terminal.

An information literacy data mining method applied to educational informatization according to an embodiment of the present invention includes steps S1 to S5.

S1: Collect the information literacy test data input by the test subjects, and store the test data as a score data matrix table.

Preferably, the information literacy test data includes multiple test items, each with multiple levels, for example ranging from A to Z. The test data of multiple test subjects are aggregated into the score data matrix table, an n×m matrix in which n is the total number of test subjects and m is the total number of test items; each row holds one test subject's data for all test items, and each column holds all test subjects' data for one test item.

For example, test item 1 is single-choice question 1, a question about EXCEL that reflects both the subject's information skills and information application abilities. This test item has 2 levels: A for a correct answer and B otherwise. Test item 2 is scale question 2, which asks about the subject's mobile phone use and reflects the subject's information habits; its 4 options correspond to 4 descending levels A, B, C and D.

The following table is an example of a score data matrix table.

Test subject | Test item 1 | Test item 2
1 | A | B
2 | B | B
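As a minimal sketch of step S1, the n×m score data matrix can be assembled as below. The input layout (a subject id mapped to that subject's level on each test item) and the function name build_score_matrix are hypothetical assumptions, not taken from the patent.

```python
# Minimal sketch of step S1: build the n x m score data matrix.
# The input layout (subject id -> {test item name: level}) is a hypothetical assumption.


def build_score_matrix(responses, items):
    """Return (subject_ids, matrix) with matrix[i][j] = level of subject i on items[j]."""
    subject_ids = sorted(responses)  # one row per test subject, in a stable order
    matrix = [[responses[sid][item] for item in items] for sid in subject_ids]
    return subject_ids, matrix


responses = {
    1: {"Test item 1": "A", "Test item 2": "B"},
    2: {"Test item 1": "B", "Test item 2": "B"},
}
items = ["Test item 1", "Test item 2"]  # one column per test item
subjects, matrix = build_score_matrix(responses, items)
print(subjects, matrix)  # [1, 2] [['A', 'B'], ['B', 'B']]
```

This reproduces the example table: n = 2 rows (test subjects) and m = 2 columns (test items).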

S2: Collect the indicator structure data input by experts, and store it in JSON format.

Indicator structure data is the data that defines the test items and the indicator structure system.

Preferably, the indicator structure data includes the test items, the indicators represented by each test item, and the dependency relationships between indicators. Storing the indicator structure data in JSON format comprises using a test item or an indicator as a key of the JSON object (dependencies), and using the indicators represented by the test item, or the indicators on which the indicator depends, as the value.

The following is a specific example:

For example, if single-choice question 1 reflects the subject's information skills, it is recorded as:

"Single-choice question 1": ["Information skills", "Information perception"]

Since the information skills indicator is in turn a secondary indicator of information knowledge and skills, this can be expressed as:

"Information skills": ["Information knowledge and skills"]
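A minimal Python sketch of loading such an indicator structure document follows. Wrapping the key-value pairs in a top-level "dependencies" field is an assumption based on the object name used above, and load_structure is a hypothetical helper name.

```python
import json

# Hypothetical indicator-structure document: a test item maps to the indicators it
# reflects, and an indicator maps to the indicators it depends on.
STRUCTURE_JSON = """
{
  "dependencies": {
    "Single-choice question 1": ["Information skills", "Information perception"],
    "Information skills": ["Information knowledge and skills"]
  }
}
"""


def load_structure(text):
    """Parse the JSON and return {node name: [names it depends on]}."""
    deps = json.loads(text)["dependencies"]
    # Sanity check: every value must be a list of strings.
    for key, values in deps.items():
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"bad entry for {key!r}")
    return deps


deps = load_structure(STRUCTURE_JSON)
print(deps["Information skills"])  # ['Information knowledge and skills']
```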

Preferably, indicator structure data may be collected from multiple experts. When their judgments of the indicator structure are inconsistent, a predefined consensus method is applied to obtain a single set of indicator structure data. For example, the dependency relationship between "test item 2" and "indicator 3" is retained if half or more of the experts support it, and discarded otherwise.
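The majority rule in this example can be sketched as follows. The function name and the input format (one dict per expert, in the key-to-list layout shown above) are hypothetical; the patent leaves the consensus method open.

```python
from collections import Counter


def consensus_structure(expert_structures):
    """Keep a (node -> depended-on node) edge only if at least half of the experts list it."""
    n_experts = len(expert_structures)
    votes = Counter()
    for structure in expert_structures:
        for node, targets in structure.items():
            for target in set(targets):  # count each expert at most once per edge
                votes[(node, target)] += 1
    merged = {}
    for (node, target), count in sorted(votes.items()):
        if 2 * count >= n_experts:  # "half or more than half" of the experts
            merged.setdefault(node, []).append(target)
    return merged


experts = [
    {"Test item 2": ["Indicator 3"]},
    {"Test item 2": ["Indicator 3", "Indicator 1"]},
    {"Test item 2": []},
]
print(consensus_structure(experts))  # {'Test item 2': ['Indicator 3']}
```

Two of three experts support the "Test item 2" to "Indicator 3" edge, so it is kept; the edge to "Indicator 1" has only one vote and is dropped.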

S3: Collect the indicator parameter data input by experts, and store it in JSON format.

Indicator parameter data defines the probability of an indicator or test item being at each level.

Preferably, the indicator parameters include prior probability parameters and conditional probability parameters. A prior probability parameter is the probability of an indicator without dependencies being at each level, and a conditional probability parameter is the probability of an indicator with dependencies being at each level. Storing the indicator parameter data in JSON format comprises the following steps:

Store each indicator without dependencies as a key of the prior probability parameter object (prior_probabilities), and store its probability values for each level as the corresponding value;

Store each indicator or test item with dependencies as a key of the conditional probability parameter object (conditional_probabilities). The value is a list of dependent objects; the key of each dependent object is a depended-on indicator, and its value is the probability values of that indicator at each level.
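The sketch below shows one possible shape for these two objects and a check that every distribution sums to 1. The object names prior_probabilities and conditional_probabilities come from the text; the inner "given"/"probabilities" fields, the helper check_distributions, and the numbers are hypothetical assumptions for illustration only.

```python
import json

# Hypothetical parameter document; the "given"/"probabilities" layout inside each
# conditional entry and all numbers are assumptions, not taken from the patent.
PARAMS_JSON = """
{
  "prior_probabilities": {
    "Information knowledge and skills": {"A": 0.3, "B": 0.4, "C": 0.2, "D": 0.1}
  },
  "conditional_probabilities": {
    "Single-choice question 1": [
      {"given": {"Information skills": "A", "Information application": "B"},
       "probabilities": {"A": 0.8, "B": 0.2}}
    ]
  }
}
"""


def check_distributions(params, tol=1e-9):
    """Verify that every probability distribution over levels sums to 1."""
    for name, dist in params["prior_probabilities"].items():
        if abs(sum(dist.values()) - 1.0) > tol:
            raise ValueError(f"prior for {name!r} does not sum to 1")
    for name, rows in params["conditional_probabilities"].items():
        for row in rows:
            if abs(sum(row["probabilities"].values()) - 1.0) > tol:
                raise ValueError(f"conditional row for {name!r} does not sum to 1")
    return True


params = json.loads(PARAMS_JSON)
print(check_distributions(params))  # True
```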

The following is a specific example of prior probability parameters, where A, B, C and D denote the different levels of an indicator:

When the opinions of multiple experts are inconsistent, a predefined consensus method is applied to obtain a single set of indicator parameter data. For example, with averaging, if different experts judge the probability of indicator 4 being at level A as 0.15, 0.25, 0.2, 0.1 and 0.5, the consensus value is their mean, 0.24.
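The averaging rule is a one-liner; the helper name below is hypothetical. Because the mean of several probability distributions is itself a distribution, averaging each level separately keeps the per-indicator probabilities summing to 1 when every expert's judgments do.

```python
def average_opinions(values):
    """Consensus by arithmetic mean of the experts' probability judgments."""
    return sum(values) / len(values)


judgments = [0.15, 0.25, 0.2, 0.1, 0.5]  # P(indicator 4 = A) from five experts
print(round(average_opinions(judgments), 2))  # 0.24
```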

The following is a specific example of a conditional probability parameter.

For example, if the experts judge that the probability of answering single-choice question 1 correctly is 0.8 when the subject's information skills level is A and information application level is B, this can be expressed as:

S4: Construct an information literacy model generation program, which receives the indicator structure data and the indicator parameter data and generates a Bayesian-network-based information literacy model from them.

According to the indicator structure data and the indicator parameter data, the test items and indicators form a directed Bayesian network of one-way dependencies. Each node of the network represents a test item or indicator; nodes are connected tail-to-tail, and the arrows represent dependency relationships, as shown in FIG. 2, which gives a specific example of a Bayesian-network-based information literacy model.

The Bayesian information literacy model generation program takes the indicator structure data and the indicator parameter data as input. It first represents the Bayesian network structure as a computer data structure: the graph is stored as an adjacency list (linked representation), as shown in FIG. 3. The information in the indicator structure data is extracted, and a node array is built with a linked list. Each element of the node array is a node structure with three fields: the first stores the name of the test item or indicator represented by the node, the second stores a pointer to the multidimensional array holding the node's indicator parameter data, and the third stores a dependency pointer to the objects on which the node depends. The dependency pointer points to a dependency structure whose first field records the position of the depended-on node in the array (its offset from the array head) and whose second field points to the remaining dependency structures.

The key-value pairs in the indicator structure data are processed one by one. If a key is not yet a node in the node array, a node structure is appended to the array. Its dependencies are then read: if a dependency is already in the array, a dependency structure is added directly after the key's node structure; otherwise, a node structure for the dependency is first added to the array, and then the dependency structure is added.
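The construction procedure above can be sketched in Python as follows. Python has no raw pointers, so object references stand in for the pointer fields and a list index stands in for the offset from the array head; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DependencyStruct:
    offset: int  # field 1: position of the depended-on node in the node array
    rest: Optional["DependencyStruct"] = None  # field 2: remaining dependency structures


@dataclass
class NodeStruct:
    name: str  # field 1: test item or indicator name
    params: Optional[list] = None  # field 2: stands in for the pointer to the parameter array
    deps: Optional[DependencyStruct] = None  # field 3: head of the dependency list


def build_adjacency_list(structure):
    """Build the node array from {key: [depended-on names]} pairs, as described for S4."""
    nodes: List[NodeStruct] = []
    offsets = {}  # name -> offset, used to check whether a node is already in the array

    def ensure_node(name):
        if name not in offsets:  # not yet in the array: append a new node structure
            offsets[name] = len(nodes)
            nodes.append(NodeStruct(name))
        return offsets[name]

    for key, targets in structure.items():
        k = ensure_node(key)
        for target in targets:
            dep = DependencyStruct(ensure_node(target))
            if nodes[k].deps is None:
                nodes[k].deps = dep
            else:  # walk to the tail so dependencies keep their listed order
                tail = nodes[k].deps
                while tail.rest is not None:
                    tail = tail.rest
                tail.rest = dep
    return nodes


def dependency_names(nodes, i):
    """Follow node i's dependency list and return the names it points to."""
    names, dep = [], nodes[i].deps
    while dep is not None:
        names.append(nodes[dep.offset].name)
        dep = dep.rest
    return names


structure = {
    "Single-choice question 1": ["Information skills", "Information perception"],
    "Information skills": ["Information knowledge and skills"],
}
nodes = build_adjacency_list(structure)
print([n.name for n in nodes])
print(dependency_names(nodes, 0))  # ['Information skills', 'Information perception']
```

A dict from name to offset is added here only to make the "already in the array?" check constant-time; a linear scan of the array would follow the text more literally.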

FIG. 4 is a schematic diagram showing the Bayesian network of FIG. 2 represented by an adjacency list.

The indicator parameter data is then extracted and a multidimensional array is constructed for each node; prior parameters and conditional (dependent) parameters are processed in the same way. The key identifies the node, and the array values are taken from the list value. The array dimensions comprise the dependency dimensions and the value dimension, so the number of dimensions is the number of dependencies plus 1. Each dependency dimension has a length equal to the number of levels of the node depended on, and the value dimension has a length equal to the number of levels of the node itself. Levels A, B, C, ... are converted to coordinates 0, 1, 2, ..., and each probability value is stored at the corresponding position of the multidimensional array M. For example, the conditional probability given in step S3 (single-choice question 1 answered correctly, i.e., at level A, with probability 0.8 when information skills are at level A and information application is at level B) is represented as M[0,1,0] = 0.8. Finally, the pointer to M is placed in the array pointer field of the corresponding node structure.
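A minimal sketch of this array construction, using nested Python lists as the multidimensional array. The helper names are hypothetical, and the level counts of the two depended-on indicators (4 each) are assumptions made only for the example.

```python
def zeros(shape):
    """Nested-list stand-in for a multidimensional array of the given shape, filled with 0.0."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(shape[1:]) for _ in range(shape[0])]


def level_index(level):
    """Convert a level letter to a coordinate: 'A' -> 0, 'B' -> 1, ..."""
    return ord(level) - ord("A")


def build_parameter_array(dep_level_counts, own_level_count, rows):
    """Array with one dimension per dependency plus a final value dimension.

    rows: (levels of the depended-on nodes, {own level: probability}) pairs.
    An empty dep_level_counts gives a 1-D array, i.e. a prior distribution.
    """
    M = zeros(list(dep_level_counts) + [own_level_count])
    for dep_levels, probs in rows:
        for own_level, p in probs.items():
            coords = [level_index(lvl) for lvl in dep_levels] + [level_index(own_level)]
            cell = M
            for c in coords[:-1]:  # descend through the dependency dimensions
                cell = cell[c]
            cell[coords[-1]] = p
    return M


# Single-choice question 1 depends on "Information skills" and "Information application"
# (4 levels each, assumed) and itself has 2 levels (A = correct, B = incorrect).
M = build_parameter_array([4, 4], 2, [(("A", "B"), {"A": 0.8, "B": 0.2})])
print(M[0][1][0])  # 0.8, i.e. M[0,1,0]

prior = build_parameter_array([], 4, [((), {"A": 0.3, "B": 0.4, "C": 0.2, "D": 0.1})])
print(prior)  # [0.3, 0.4, 0.2, 0.1]
```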

The resulting adjacency list is output as the Bayesian information literacy data mining model.

S5: Construct an information literacy data mining program, which receives the information literacy model and the score data matrix table and outputs information literacy labels.

The data mining program takes the Bayesian information literacy model and the score table as input and calculates each test subject's information literacy probabilities according to Bayes' formula:

First, the formula is expanded according to the Bayesian network. For example, in the Bayesian network of FIG. 2, e and d denote test item 1 and test item 2, respectively, and a, c and b denote indicator 1, indicator 2 and indicator 3. Given the observed test-item levels d = A and e = C, we want the probability that the test subject is at level A on indicator 1, i.e., P(a = A | d = A, e = C). The formula is as follows:

Since b and c can take any level, they are denoted X and Y. The formula thereby reduces to a sum, over all levels X and Y, of products of each node's probability given the levels of the nodes it depends on.
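For the query above, the reduction can be written as follows. This is the general marginalization identity and does not depend on the exact edges of FIG. 2, where pa(v) denotes the nodes on which v depends. The denominator is the same for every level of a, so choosing the most probable level only requires comparing the numerator sums.

```latex
P(a=A \mid d=A, e=C)
  = \frac{P(a=A, d=A, e=C)}{P(d=A, e=C)}
  = \frac{\sum_{X}\sum_{Y} P(a=A, b=X, c=Y, d=A, e=C)}
         {\sum_{Z}\sum_{X}\sum_{Y} P(a=Z, b=X, c=Y, d=A, e=C)},
\qquad
P(a, b, c, d, e) = \prod_{v \in \{a, b, c, d, e\}} P\bigl(v \mid \mathrm{pa}(v)\bigr)
```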

Second, the required values are looked up in the Bayesian network according to the formula. In the example above, X and Y denote any level of b and c; P(a = A) is a prior probability stored in the indicator 1 node, P(b | a) is stored in the node for b, and likewise P(e | a, c) is stored in the test item 1 node. For example, the value of P(e = C | a = A, c = B) is at position [0,1,2] of the multidimensional array M_e of the test item 1 node.

The probability values are extracted from the relevant nodes to obtain the value for one case, i.e., one combination of X and Y, for example:

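$$P(a=A, b=A, c=A, d=A, e=C) = \prod_{v \in \{a, b, c, d, e\}} P\bigl(v \mid \mathrm{pa}(v)\bigr),$$

where pa(v) is the set of nodes on which v depends and every factor is evaluated at the levels on the left-hand side.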

The values for all cases are then summed.

This yields the (unnormalized) probability of level A for the indicator 1 node. The probabilities of the other levels of indicator 1 are calculated in the same way, and the level with the highest probability is selected as the test subject's final result.
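To make the procedure concrete, the following Python sketch runs exact inference by enumeration on a hypothetical five-node network. The edges (b and c depend on a, d on b, e on a and c) and every probability value are invented for illustration; they are chosen only to match the lookups mentioned in the text (P(a), P(b | a), P(e | a, c)) and are not read from FIG. 2. Dictionaries keyed by level tuples stand in for the multidimensional arrays.

```python
from itertools import product

LEVELS = {"a": "AB", "b": "AB", "c": "AB", "d": "AB", "e": "ABC"}
PARENTS = {"a": [], "b": ["a"], "c": ["a"], "d": ["b"], "e": ["a", "c"]}
# CPT[node][levels of its parents] = {own level: probability}; all numbers are made up.
CPT = {
    "a": {(): {"A": 0.6, "B": 0.4}},
    "b": {("A",): {"A": 0.7, "B": 0.3}, ("B",): {"A": 0.2, "B": 0.8}},
    "c": {("A",): {"A": 0.8, "B": 0.2}, ("B",): {"A": 0.3, "B": 0.7}},
    "d": {("A",): {"A": 0.9, "B": 0.1}, ("B",): {"A": 0.4, "B": 0.6}},
    "e": {
        ("A", "A"): {"A": 0.6, "B": 0.3, "C": 0.1},
        ("A", "B"): {"A": 0.3, "B": 0.4, "C": 0.3},
        ("B", "A"): {"A": 0.3, "B": 0.4, "C": 0.3},
        ("B", "B"): {"A": 0.1, "B": 0.3, "C": 0.6},
    },
}


def joint(assign):
    """Product of P(node | parents) over all nodes (Bayesian-network factorization)."""
    p = 1.0
    for node, parents in PARENTS.items():
        key = tuple(assign[q] for q in parents)
        p *= CPT[node][key][assign[node]]
    return p


def posterior(query, evidence):
    """P(query = level | evidence) for every level, by summing over the hidden nodes."""
    hidden = [v for v in PARENTS if v != query and v not in evidence]
    scores = {}
    for q_level in LEVELS[query]:
        total = 0.0
        for combo in product(*(LEVELS[h] for h in hidden)):  # X, Y range over all levels
            assign = {**evidence, query: q_level, **dict(zip(hidden, combo))}
            total += joint(assign)
        scores[q_level] = total  # unnormalized: P(query = q_level, evidence)
    norm = sum(scores.values())  # P(evidence), identical for every query level
    return {lvl: s / norm for lvl, s in scores.items()}


post = posterior("a", {"d": "A", "e": "C"})
best = max(post, key=post.get)
print(post, best)
```

With these made-up numbers the summed products are 0.063 for a = A and 0.102 for a = B, so B is selected. Normalizing only rescales the scores; the selected level is the same as when comparing the raw sums.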

Finally, the subject number column of the input score data matrix table is extracted, a column recording the model's judgment for each subject is added, and the modified table is output as the final result, as follows.

Test subject | Score
1 | A
2 | B
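The final output step can be sketched as below; append_results and its arguments are hypothetical names, and the per-subject results would come from the highest-probability selection in step S5.

```python
def append_results(subject_ids, results_by_subject, column="Score"):
    """Keep the subject id column and add one column with each subject's model result."""
    header = ["Test subject", column]
    rows = [[sid, results_by_subject[sid]] for sid in subject_ids]
    return [header] + rows


table = append_results([1, 2], {1: "A", 2: "B"})
for row in table:
    print(" | ".join(str(cell) for cell in row))
```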

An information literacy data mining system applied to educational informatization according to an embodiment of the present invention comprises:

a test data collection module, configured to collect information literacy test data input by test subjects and store it as a score data matrix table;

an indicator structure data collection module, configured to collect indicator structure data input by experts and store it in JSON format;

an indicator parameter data collection module, configured to collect indicator parameter data input by experts and store it in JSON format;

a model building module, configured to construct an information literacy model generation program, which receives the indicator structure data and the indicator parameter data and generates a Bayesian-network-based information literacy model from them;

a data mining module, configured to construct an information literacy data mining program, which receives the information literacy model and the score data matrix table and outputs information literacy labels.

The implementation principle and technical effects of the system are similar to those of the above method and are not repeated here.

An embodiment of the present invention further provides a computer device including a processor and a memory, wherein the processor reads executable program code stored in the memory and runs the corresponding program to implement any of the above information literacy data mining methods.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the technical solution of any of the above information literacy data mining method embodiments. Its implementation principle and technical effects are similar to those of the above method and are not repeated here.

It should be noted that in any of the above embodiments, the steps of the method are not necessarily executed in the order of their sequence numbers; unless the execution logic requires a particular order, they may be executed in any other feasible order.

Those skilled in the art will readily understand that the above descriptions are merely preferred embodiments of the present invention and are not intended to limit it. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall fall within its protection scope.

Claims

1. An information literacy data mining method applied to education informatization, characterized by comprising the following steps:

collecting information literacy test data input by test subjects, and storing the test data as a score data matrix table;

acquiring indicator structure data input by experts, and storing the indicator structure data in JSON format;

acquiring indicator parameter data input by experts, and storing the indicator parameter data in JSON format;

constructing an information literacy model generation program, wherein the information literacy model generation program receives the index structure data and the index parameter data and generates a Bayesian network-based information literacy model from them;

constructing an information literacy data mining program, wherein the information literacy data mining program receives the information literacy model and the score data matrix table and outputs an information literacy test result;

the index structure data are data defining the test items and the index structure system; the index parameter data are data defining the probability values of each level of the indexes or test items;

generating the Bayesian network-based information literacy model comprises:

according to the index structure data and the index parameter data, constructing a Bayesian network-based information literacy model in a tail-to-tail connection mode and representing the Bayesian network with a linked list, wherein the nodes of the Bayesian network represent test items or indexes, and each node uses a multidimensional array to store the index parameter data corresponding to its index or test item;

representing the Bayesian network with the linked list comprises:

constructing node arrays with a linked list, each node array comprising three fields: the first field stores the name of the test item or index represented by the node, the second field stores a pointer to the multidimensional array holding the index parameter data corresponding to the node, and the third field stores a pointer to the objects on which the node depends;

the index structure data comprise the test items, the index represented by each test item, and the dependency relations between indexes, and storing the index structure data in JSON format comprises:

taking each test item or index as a key of a JSON object, and taking the index represented by the test item, or the index on which the index depends, as the value of that key.
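The patent does not publish its JSON schema or its node layout. The following Python sketch shows one possible reading of the index structure format and the three-field node of claim 1, assuming a hypothetical two-level index hierarchy; the item and index names (`awareness`, `skill`, `literacy`) and the names `Node` and `build_network` are invented for illustration. Python object references stand in for the pointers held in the second and third fields.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Hypothetical index structure data: each test item or index (key) maps to the
# index it represents or depends on (value). The root index "literacy" depends
# on nothing, so it never appears as a key.
STRUCTURE_JSON = """
{
  "item1": "awareness",
  "item2": "awareness",
  "item3": "skill",
  "awareness": "literacy",
  "skill": "literacy"
}
"""


@dataclass(eq=False)  # eq=False: nodes compare by identity, like pointers
class Node:
    name: str                                            # field 1: test item or index name
    params: Optional[list] = None                        # field 2: reference to the parameter array
    parents: List["Node"] = field(default_factory=list)  # field 3: references to depended-on nodes


def build_network(structure: Dict[str, Union[str, List[str]]]) -> Dict[str, Node]:
    """Hypothetical helper: one Node per name, each child linked to its parent node(s)."""
    nodes: Dict[str, Node] = {}

    def get(name: str) -> Node:
        if name not in nodes:
            nodes[name] = Node(name)
        return nodes[name]

    for child, parent in structure.items():
        # Accept a single dependency (string) or several (list of strings).
        parent_names = [parent] if isinstance(parent, str) else parent
        for parent_name in parent_names:
            get(child).parents.append(get(parent_name))
    return nodes


nodes = build_network(json.loads(STRUCTURE_JSON))
```

Because `item1` and `item2` both reference the same `awareness` node, that index forms a tail-to-tail (common-parent) connection. The `params` field is left empty here; it would be filled when the index parameter data are loaded.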

2. The information literacy data mining method applied to educational informatization according to claim 1, wherein the information literacy test data comprise a plurality of test items, each test item comprising a plurality of levels;

the score data matrix table is an n × m matrix, where n is the total number of testees and m is the total number of test items; each row of the score data matrix table holds one testee's test data for all test items, and each column holds all testees' test data for one test item.
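A minimal sketch of the n × m score data matrix table of claim 2, assuming (hypothetically) that raw responses arrive as a mapping from testee ID to the level achieved on each test item, with levels coded as integers. The IDs, item names, level coding and the function `build_score_matrix` are invented for illustration.

```python
from typing import Dict, List

# Hypothetical raw input: testee ID -> {test item -> achieved level}.
responses: Dict[str, Dict[str, int]] = {
    "s01": {"item1": 2, "item2": 1, "item3": 0},
    "s02": {"item1": 1, "item2": 1, "item3": 2},
}
items: List[str] = ["item1", "item2", "item3"]  # fixes the column order (m = 3)


def build_score_matrix(responses: Dict[str, Dict[str, int]], items: List[str]) -> List[List[int]]:
    """Return an n x m matrix: one row per testee, one column per test item."""
    return [[answers[item] for item in items] for answers in responses.values()]


matrix = build_score_matrix(responses, items)
n, m = len(matrix), len(matrix[0])
row_s01 = matrix[0]                                        # all test items for one testee
col_item3 = [row[items.index("item3")] for row in matrix]  # all testees for one test item
```

Keeping the item order in a separate list makes the column meaning explicit, so the same matrix can be matched against the test items named in the index structure data.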

3. The information literacy data mining method applied to educational informatization according to claim 2, wherein the index parameter data comprise a prior probability parameter and a conditional probability parameter, the prior probability parameter being the probability value of each level of an index without dependencies, and the conditional probability parameter being the probability value of each level of an index with dependencies; storing the index parameter data in JSON format comprises:

storing each index without dependencies as a key of a prior probability parameter object, with the probability values of that index at each level as the value of the prior probability parameter object;

storing each index or test item with dependencies as a key of a conditional probability parameter object, the value of the conditional probability parameter object being a list of dependency objects, wherein the keys of the dependency objects are the indexes depended upon and the values of the dependency objects are the probability values for all levels of those indexes.
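Claim 3 leaves the exact shape of the dependency objects open. The sketch below assumes one possible reading: under each dependent test item, the i-th object in the list gives the item's level distribution when the index it depends on is at level i, which yields the multidimensional (here two-dimensional) array that claim 1 stores in each node. The `prior`/`conditional` key names, the function `load_parameters` and all probability values are hypothetical.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical index parameter data for three levels (0, 1, 2).
PARAMS_JSON = """
{
  "prior": {"literacy": [0.3, 0.5, 0.2]},
  "conditional": {
    "item1": [
      {"literacy": [0.7, 0.2, 0.1]},
      {"literacy": [0.2, 0.6, 0.2]},
      {"literacy": [0.1, 0.2, 0.7]}
    ]
  }
}
"""


def load_parameters(text: str):
    """Parse priors and turn each conditional list into a 2-D table.

    Assumed reading: rows[i] is {parent: [P(child = k | parent = i) for each k]}.
    """
    data = json.loads(text)
    priors: Dict[str, List[float]] = {k: list(v) for k, v in data["prior"].items()}
    cpts: Dict[str, Tuple[str, List[List[float]]]] = {}
    for child, rows in data["conditional"].items():
        parent = next(iter(rows[0]))                 # the index depended upon
        table = [list(obj[parent]) for obj in rows]  # table[i][k]
        cpts[child] = (parent, table)
    # Every distribution (a prior, or one row of a conditional table) must sum to 1.
    dists = list(priors.values()) + [row for _, table in cpts.values() for row in table]
    for dist in dists:
        if abs(sum(dist) - 1.0) > 1e-9:
            raise ValueError(f"probabilities do not sum to 1: {dist}")
    return priors, cpts


priors, cpts = load_parameters(PARAMS_JSON)
```

The sum-to-one check is an added safeguard, not something the claim requires; it catches expert-entered parameters that cannot form a valid probability distribution.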

4. The information literacy data mining method applied to educational informatization according to claim 3, wherein outputting the information literacy test result comprises:

inputting the Bayesian network-based information literacy model and the test data, calculating the probability of each level of the testee's information literacy according to Bayes' formula, and selecting the level with the highest probability as the testee's final result.
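The computation in claim 4 can be illustrated under a simplifying assumption: a single latent information literacy index L whose test items depend only on it (a tail-to-tail structure, so the items are conditionally independent given L). Bayes' formula then gives P(L = l | x_1, ..., x_m) ∝ P(L = l) · ∏_j P(X_j = x_j | L = l), normalized over all levels l, and the predicted level is the one with the highest posterior. The probabilities, item names and the functions `posterior` and `classify` are hypothetical; a deeper index hierarchy would additionally require summing over the levels of the intermediate indexes.

```python
from typing import Dict, List, Tuple

# Hypothetical model: one latent index "literacy" (levels 0..2) and test items
# that each depend only on it (tail-to-tail: literacy -> item_j).
PRIOR: List[float] = [0.3, 0.5, 0.2]
CPT: Dict[str, List[List[float]]] = {
    # CPT[item][l][k] = P(item = k | literacy = l)
    "item1": [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]],
    "item2": [[0.6, 0.3, 0.1], [0.3, 0.5, 0.2], [0.1, 0.3, 0.6]],
}


def posterior(observed: Dict[str, int]) -> List[float]:
    """P(literacy = l | observed item levels) via Bayes' rule."""
    joint = []
    for level, p in enumerate(PRIOR):
        for item, k in observed.items():
            p *= CPT[item][level][k]  # multiply in each item's likelihood
        joint.append(p)
    evidence = sum(joint)             # normalizing constant P(observations)
    return [p / evidence for p in joint]


def classify(observed: Dict[str, int]) -> Tuple[int, List[float]]:
    """Return the most probable literacy level and the full posterior."""
    post = posterior(observed)
    return max(range(len(post)), key=post.__getitem__), post


level, post = classify({"item1": 2, "item2": 2})
```

Applying `classify` to each row of the score data matrix table would give one predicted level per testee.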

5. An information literacy data mining system applied to educational informatization, comprising:

a test data acquisition module, configured to acquire information literacy test data input by a testee and store the test data as a score data matrix table;

an index structure data acquisition module, configured to acquire index structure data input by experts and store the index structure data in JSON format;

an index parameter data acquisition module, configured to acquire index parameter data input by experts and store the index parameter data in JSON format;

a model building module, configured to construct an information literacy model generation program, wherein the information literacy model generation program receives the index structure data and the index parameter data and generates a Bayesian network-based information literacy model from them;

a data mining module, configured to construct an information literacy data mining program, wherein the information literacy data mining program receives the information literacy model and the score data matrix table and outputs an information literacy test result.

6. A computer device comprising a processor and a memory, wherein the processor reads executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the method according to any one of claims 1 to 4.

7. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 4.