CN119067130A - A cloud education platform management system based on big data - Google Patents
- Publication number: CN119067130A
- Application number: CN202411179145.2A
- Authority
- CN
- China
- Prior art keywords: data, students, semantic, management system, cloud
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
- G06F40/16—Automatic learning of transformation rules, e.g. from examples
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a cloud education platform management system based on big data, relating to the field of cloud education platform management. The system comprises a cloud server platform communicatively connected with a data acquisition module, a data processing module, a data analysis module, and a personalized coaching module. An information base in the cloud server stores basic information of teachers and students, education resource information, voice recordings of teacher-student interaction during teaching, and the like. The data acquisition module collects the voice of students and teachers as they interact; the data processing module converts this voice information into text; the data analysis module extracts semantic features from the text to recover the complete logic of a student's answers and performs semantic analysis; and the personalized coaching module generates a personalized coaching scheme from the semantic analysis results and displays it on the cloud education platform for teachers and students to access.
Description
Technical Field
The invention relates to the field of cloud education, in particular to a cloud education platform management system based on big data.
Background
The cloud education platform is an online education service platform based on cloud computing and network technology. It provides comprehensive education services covering online course management, teaching resource sharing, remote learning, examinations, and more. By connecting students, teachers, and educational institutions through the Internet, it offers flexible learning modes and tools that break the limitations of time and space, so students can learn online anytime, anywhere.
In the cloud education platform, interaction between teachers and students is a key link. A current problem is that semantic extraction during teacher-student interaction is not accurate enough, so teachers cannot fully grasp students' reasoning flaws and knowledge blind spots, and therefore cannot formulate personalized coaching schemes.
Disclosure of Invention
The invention aims to provide a cloud education platform management system based on big data that addresses the shortcomings identified in the Background section.
In order to achieve the above purpose, the cloud education platform management system based on big data comprises a cloud server platform, wherein the cloud server platform is in communication connection with a data acquisition module, a data processing module and a data analysis module.
The data acquisition module is used for acquiring voice information when students interact with teachers to form audio files;
The data processing module is used for processing the audio file, converting the audio file into a text form and forming text information;
The data analysis module is used for extracting semantic features of the text information to obtain a semantic expression result. From this result, a teacher can analyze logic errors, gaps in reasoning, and the like in the text, which yields personalized teaching and coaching suggestions. The module can also supply teachers with corresponding improvement strategies and teaching resources, matched to the logic defects and knowledge blind spots in students' answers, so as to meet students' personalized needs.
The personalized coaching module is used for generating a personalized coaching scheme according to the semantic expression result.
In a preferred embodiment, the process of voice information acquisition when the students interact with the teacher is that in video teaching, a real-time voice communication function is started, voice interaction information of the teacher and the students is recorded, and the voice interaction information is stored on a cloud server in the form of an audio file.
In a preferred embodiment, the data processing module processes the audio file and forms text information as follows: the audio file is segmented into blocks, the data processing module recognizes the audio block by block, and the recognition results are stored on the cloud server as text. Large audio files can be processed in smaller blocks, which improves processing efficiency and avoids timeouts or excessive resource consumption caused by overly long audio.
The audio file is segmented by volume: a minimum volume threshold is set, and whenever the volume falls below this threshold the audio is split, continuing until the audio ends and yielding a sequence of audio file blocks.
If the audio is processed as multiple blocks, the per-block recognition results must finally be combined; they can be ordered and merged by timestamp or block order to obtain the complete recognized text.
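The merging step can be sketched as a small helper. The block representation as (start time, text) pairs is an illustrative assumption, not taken from the patent:

```python
def merge_block_transcripts(blocks):
    """Combine per-block recognition results into one transcript.

    `blocks` is a list of (start_time, text) pairs, one per audio
    block; sorting by start time restores the original order even
    if blocks were recognized out of order or in parallel.
    """
    ordered = sorted(blocks, key=lambda b: b[0])
    return " ".join(text for _, text in ordered)
```

Because the merge keys on timestamps, blocks recognized concurrently can still be reassembled deterministically.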
In a preferred embodiment, the process of extracting semantic features from text information by the data analysis module is as follows:
1. Collect interactive question-answering voice data of teachers and students in class, and construct an interactive question-answering voice data set;
2. Train an attention-based semantic extraction model on the interactive question-answering voice data set to obtain a trained semantic extraction model;
3. Feed the interactive question-answering voice data to be extracted into the trained semantic extraction model to obtain the semantic extraction result.
In a preferred embodiment, the semantic extraction model with attention includes an embedding layer, a semantic coding layer, and a decoding layer;
the embedding layer is a trained word embedding model that converts the input text information into word embedding vectors for output;
the semantic coding layer consists of a plurality of encoders and provides the hidden features within each encoder;
The decoding layer decodes the hidden features of the different encoders through an attention mechanism and predicts the semantic extraction result.
In a preferred embodiment, the training method of the word embedding model comprises the following steps:
1. Obtain a number of interactive question-answering voice data sets from teacher-student interaction in teaching; each data set comprises historical interactive question-answering voice data and the corresponding real interactive question-answering data;
2. Preprocess the interactive question-answering voice data set: convert the audio files into text files; remove special characters, punctuation marks, and numbers; split the text into words; and remove prepositions, forming preprocessed text data;
3. Build a vocabulary containing all words that appear in the preprocessed text data;
4. Construct context sets from the vocabulary: take each word of the vocabulary as a center and select the 3 words nearest to it as its set, forming a plurality of context sets;
5. Train the word embedding model with the constructed context set data as input; the word embedding vectors are obtained after training completes.
In a preferred embodiment, the encoding process of the semantic encoder layer is:
The word embedding vectors are fed into the encoder for encoding. In the first time step, the encoder computes a first hidden feature from the first word embedding vector; in the next time step it updates the hidden feature with the second word embedding vector, and so on. Here a time step is each discrete point, or step, at which the model processes one element of the sequence data.
In a preferred embodiment, the decoding process of the decoding layer is:
step one, the decoder's initial hidden feature is computed from the encoder's final hidden feature;
step two, at each time step, the decoder predicts the output of the current time step from its current hidden feature and the previously generated output; the result may be a word or a character;
step three, the decoder performs a weighted aggregation of the corresponding coding layer hidden features to obtain a weighted aggregate vector;
step four, the current hidden feature is obtained from the weighted aggregate vector and the previous hidden feature, and the word for the next time step is generated;
By repeating steps two through four, the decoder continues generating output until a complete semantic expression result is produced.
In a preferred embodiment, the personalized coaching module assembles the complete question-answering logic of the teacher-student interaction from the complete semantic expression result. Based on this logic, the teacher analyzes the logic defects and knowledge blind spots in a student's answers and formulates a personalized coaching scheme for the student. The scheme is displayed on the cloud education platform, where teachers and students can log in to access it; it comprises question answering, knowledge point strengthening exercises, reference material recommendation, learning planning, and coaching feedback and evaluation.
In the technical scheme, the invention has the technical effects and advantages that:
1. Through text conversion and feature extraction of voice, a teacher obtains the answers students gave during class discussion or question-and-answer sessions, can analyze and evaluate those answers in detail, and thus gains an accurate picture of each student's knowledge mastery, level of understanding, and way of thinking.
2. Personalized coaching and teaching adjustment: from the logic errors and gaps in reasoning in students' answers, teachers can give targeted explanation and guidance that help students digest and understand the material, and can formulate personalized coaching schemes matched to different needs and levels. Teachers can also adjust their teaching strategies promptly according to students' performance and feedback, delivering more effective teaching.
3. Through interactive teaching, teachers gain a better sense of students' interest in the course and can design more engaging and targeted content based on that information, mobilizing students' enthusiasm, initiative, and participation.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.
FIG. 1 is a block diagram of a system according to the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In embodiment 1, referring to fig. 1, the cloud education platform management system based on big data in this embodiment includes a cloud server platform, and the cloud server platform is communicatively connected with a data acquisition module, a data processing module, and a data analysis module.
The data acquisition module is used for acquiring voice information when students interact with teachers. Audio and video recording is enabled before video teaching begins, so that the sound and images of the whole teaching process are captured. After recording completes, the voice recordings are uploaded to the cloud teaching platform server, which transcodes the voice files into formats usable for subsequent processing, such as MP3 or WAV.
The data processing module is used for processing the voice information from student-teacher interaction, converting it into text form. Concretely, the audio file is segmented into blocks, the data processing module performs speech recognition block by block, and the recognition results are stored on the cloud server as text. Audio files can be processed in smaller blocks, which improves processing efficiency and avoids timeouts or excessive resource consumption caused by overly long audio. If the audio is processed as multiple blocks, the per-block recognition results must finally be combined; they can be ordered and merged by timestamp or block order to obtain the complete recognized text.
The segmentation is done by volume: a volume threshold is set, and whenever the volume falls below the threshold the audio is split into a new block, continuing until the audio ends and yielding a sequence of audio file blocks.
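A minimal sketch of the volume-threshold segmentation, operating on raw amplitude values; the function name, the per-sample comparison, and the `min_gap` parameter are assumptions (a production system would typically compare RMS energy over short frames rather than individual samples):

```python
def split_on_silence(samples, threshold, min_gap=1):
    """Split an audio signal into blocks wherever the volume
    (absolute amplitude) stays below `threshold` for at least
    `min_gap` consecutive samples."""
    blocks, current, quiet_run = [], [], 0
    for s in samples:
        if abs(s) < threshold:
            quiet_run += 1
            # close the current block once enough silence accumulates
            if quiet_run >= min_gap and current:
                blocks.append(current)
                current = []
            continue
        quiet_run = 0
        current.append(s)
    if current:  # flush the final block when the audio ends
        blocks.append(current)
    return blocks
```

Each returned block can then be sent to the recognizer independently, matching the block-by-block processing described above.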
The data analysis module is used for extracting semantic features from the text information to recover the complete logic of the student's answers and analyze it, forming a logic analysis result. By analyzing logic errors, gaps in reasoning, and the like in students' answers, the module provides personalized teaching and coaching suggestions for teachers. It can also supply corresponding improvement strategies and teaching resources, matched to the logic defects and knowledge blind spots in students' answers, so as to meet students' personalized needs.
The process of extracting semantic features of the text information by the data analysis module comprises the following steps:
1. Collect interactive question-answering voice data of teachers and students in class, and construct an interactive question-answering voice data set;
2. Train an attention-based semantic extraction model on the interactive question-answering voice data set to obtain a trained semantic extraction model;
3. Feed the interactive question-answering voice data to be extracted into the trained semantic extraction model to obtain the semantic extraction result.
Attention mechanisms are an important technique in deep learning that improves a model's ability to process and understand input data; their introduction lets the model focus on the parts of the input sequence relevant to the current task. Specifically, the attention mechanism helps the model decide which parts of the input data are relevant and deserve more attention. By computing attention weights, the model dynamically distributes attention over positions or elements of the input sequence, and can thus weight its focus according to the input context, extracting and using the most relevant information.
The semantic extraction model with attention comprises an embedding layer, a semantic coding layer and a decoding layer;
The embedding layer is a trained word embedding model that converts the input text information into word embedding vectors for output.
The function of the embedding layer is to map the input discrete symbols (e.g. words, characters) into a continuous numerical representation, i.e. to convert them into vector form. The purpose of this is to enable the model to better process the text data, capturing the semantics and relevance between words.
The input to the embedding layer may be discrete identifiers (e.g., indices of words) each of which is mapped to a fixed length vector representation after the embedding operation. In this way, the text data can be represented as a matrix of embedded vectors, where each row corresponds to a position or word in the input sequence.
In the attention mechanism model, this vector matrix transformed by the embedding layer will be used as input to the coding layer. The coding layer is responsible for further processing and extracting features from the input data for final tasks or predictions.
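The embedding-layer lookup can be shown in a few lines; the vocabulary size, embedding dimension, and random matrix values here are placeholders, not trained weights:

```python
import numpy as np

# Hypothetical vocabulary of 4 tokens, each embedded in 3 dimensions.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(4, 3))

def embed(token_ids, matrix):
    """Map each discrete token index to its row of the embedding
    matrix, producing a (sequence_length, dim) matrix where each
    row corresponds to one position in the input sequence."""
    return matrix[np.array(token_ids)]
```

For example, `embed([1, 3], embedding_matrix)` yields a 2x3 matrix, one embedding vector per input position, which is exactly the form the coding layer consumes.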
The semantic coding layer consists of a plurality of encoders and is responsible for converting the input sequence into a series of hidden-feature representations. The coding layer may take different structures; here a recurrent neural network is used. The attention mechanism plays a key role at this layer: by weighting the coding layer's outputs, the most relevant information is extracted and used.
The decoding layer is used for outputting hidden features in different encoders through decoding by an attention mechanism, and predicting to obtain a semantic extraction result.
The function of the decoding layer is to combine the output of the encoding layer with the previous context information to generate the final output sequence.
The training method of the word embedding model comprises the following steps:
First, a number of interactive question-answering voice data sets are obtained from teacher-student interaction in teaching; each data set comprises historical interactive question-answering voice data and the corresponding real interactive question-answering data.
Second, the interactive question-answering voice data set is preprocessed: the audio files are converted into text files; special characters, punctuation marks, and numbers are removed; the text is split into words; and prepositions are removed, forming preprocessed text data.
Third, a vocabulary containing all words appearing in the text data is built from the preprocessed text.
Fourth, context sets are constructed from the vocabulary: each word of the vocabulary is taken as a center, and the 3 words nearest to it are selected as its set, forming a plurality of context sets.
In constructing the context set data, a fixed context set size, i.e. the number of context words around each center word, must be chosen. The size can be set according to task requirements; window sizes of 3 to 10 are a common choice.
For example, a procedure for constructing context aggregate data will be described:
Suppose the sentence is "I did not understand this concept", tokenized as "I", "did not", "understand", "this concept".
Each token can serve as the center word of a set, with its context words drawn from the tokens around it. With the set size fixed at 3, the following context set data is obtained:
For the center word "I", the context set: ["did not", "understand", "this concept"]
For the center word "understand", the context set: ["I", "did not", "this concept"]
For the center word "this concept", the context set: ["I", "did not", "understand"]
In this example the set size is 3, so each center word is paired with its three nearest context words. Different center words yield different context sets, and the word order within each set preserves the relative positions in the original sentence.
The constructed context set data can then be used as training samples for the word embedding model. By learning the co-occurrence and semantic relatedness of words within each set, the model produces word vector representations that capture the semantic and contextual relationships between words.
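The context-set construction walked through above can be sketched as follows; the function name and the tie-breaking choice (pick the nearest neighbours, then keep them in sentence order) are assumptions chosen to reproduce the worked example:

```python
def build_context_sets(tokens, window=3):
    """For each token, pick the `window` positions nearest to it
    (excluding the token itself) and keep them in original sentence
    order, yielding (center_word, context_words) pairs."""
    sets = []
    for i, word in enumerate(tokens):
        # nearest positions by absolute distance from the center
        nearest = sorted((j for j in range(len(tokens)) if j != i),
                         key=lambda j: abs(j - i))[:window]
        # restore sentence order inside the context set
        context = [tokens[j] for j in sorted(nearest)]
        sets.append((word, context))
    return sets
```

Running this on the example sentence reproduces the three context sets listed above, and the resulting pairs can be fed directly to word embedding training.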
The coding process of the semantic coder layer is as follows:
First, at the embedding layer, each word is mapped to a low-dimensional continuous vector of a fixed size; for example, every word might be mapped to a 5-dimensional vector.
Next, with the output of the embedded layer as input, the encoder will traverse each word in the input sequence and calculate the hidden feature of the current time step at each time step. For example, in a first time step, the encoder may calculate a first hidden feature from the embedded vector of a first word and then update the hidden feature with the embedded vector of a second word in the next time step.
When the encoder has processed the entire input sequence, the concatenation of the forward and backward hidden features at the last time step can be used as the output representation of the coding layer. This output vector can feed subsequent classification tasks and helps the model understand the context of the whole text.
In an encoder model, a time step is each discrete point, or step, at which the model processes one element of the sequence data. Specifically, for an input sequence of length T, the encoder processes each element in turn and updates its hidden feature at every time step. The hidden feature at each time step carries the information input up to that point, helping the model capture context and correlations in the sequence.
The concept of time steps is illustrated with a simple example. Assume the sentence "I do not understand this concept," tokenized as "I", "do not", "understand", "this concept", where each token occupies one time step. The sentence is processed with the encoder model as follows:
Time step 1: the input is "I"; the encoder computes the hidden feature of the first time step.
Time step 2: the input is "do not"; the encoder computes the current hidden feature from the current input and the previous time step's hidden feature.
Time step 3: the input is "understand"; the encoder again updates the hidden feature from the current input and the previous hidden feature.
Time step 4: the input is "this concept"; the encoder computes the hidden feature of the final time step.
Through the above process, the encoder model processes each word in the sentence step by step and updates the hidden feature at each time step. Thus, the model can use the information in the hidden features to understand the semantics and context of the sentence after processing the complete input sequence.
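The step-by-step hidden-feature updates above correspond to a plain recurrent update; the tanh nonlinearity and the weight shapes here are illustrative assumptions, not the patent's exact formulation:

```python
import numpy as np

def rnn_encode(embeddings, W_h, W_x):
    """Process a sequence of word embeddings one time step at a
    time: at each step the new hidden feature is computed from the
    previous hidden feature and the current input embedding,
    h_t = tanh(W_h @ h_{t-1} + W_x @ x_t). Returns the hidden
    feature of every time step."""
    h = np.zeros(W_h.shape[0])      # initial hidden feature
    states = []
    for x in embeddings:            # one embedding vector per time step
        h = np.tanh(W_h @ h + W_x @ x)
        states.append(h)
    return states
```

The list of per-step hidden features is exactly what the attention mechanism in the decoding layer scores and aggregates.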
The decoding process of the decoding layer is as follows:
1. First, the decoder's hidden feature must be initialized. This initial hidden feature is typically computed from the encoder's final hidden feature.
2. Next, the output sequence is generated one time step at a time. At each time step, the decoder predicts the output of the current step from its current hidden feature and the previously generated partial sequence. The prediction may be a word, a character, or another kind of identifier.
3. To produce the prediction for the current time step, the decoder uses the attention mechanism to compute a weighted aggregation of the coding layer's outputs. The decoder's current hidden feature serves as the query vector, and attention weights are computed over the coding layer's outputs; these weights measure how relevant each position of the coding layer output is to the current hidden feature.
4. Using the computed attention weights, the decoder forms the weighted aggregation of the coding layer outputs: multiplying the weights with the coding layer output vectors yields a weighted aggregate vector containing the coding layer information most relevant to the current time step.
5. The decoder then updates its own hidden feature from the current time step's weighted aggregate vector and the previous hidden feature, so that the coding layer's information and context are incorporated when generating the current output.
Steps 2 through 5 are repeated until a complete output sequence is generated or a specific stopping condition is met. The complete semantic expression result is then output.
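Steps 3 and 4 (scoring, softmax weights, weighted aggregation) can be sketched as one function; dot-product scoring is an assumption, since the text does not fix the scoring form:

```python
import numpy as np

def attention_step(query, encoder_states):
    """One attention step of the decoding loop: score every encoder
    hidden feature against the decoder's current hidden feature
    (the query), turn the scores into softmax weights, and return
    the weighted aggregate (context) vector."""
    H = np.stack(encoder_states)             # (T, dim)
    scores = H @ query                       # relevance of each position
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()                 # attention weights, sum to 1
    return weights @ H                       # weighted aggregate vector
```

The returned vector is what step 5 combines with the previous hidden feature to produce the decoder's next state.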
The personalized coaching module assembles the complete question-answering logic of the teacher-student interaction from the complete semantic expression result. Based on this logic, the teacher analyzes the logic defects and knowledge blind spots in a student's answers and formulates a personalized coaching scheme for that student. The scheme is displayed in the cloud education platform's personalized coaching module, where teachers and students can log in to access it. It comprises question answering, knowledge point strengthening exercises, reference material recommendation, learning planning, and coaching feedback and evaluation, and the teacher can further adjust and optimize the scheme according to the student's learning situation.
Question answering: students receive detailed solutions and explanations for their individual questions. For example, a student confused by a mathematics problem can be given a clear explanation, worked steps, and examples that resolve the confusion.
Knowledge-point reinforcement exercises: according to the student's performance and understanding in the interactive question-and-answer, the personalized coaching scheme recommends exercises targeting specific knowledge points. For example, if a student has not firmly mastered a point of English grammar, the scheme can provide relevant exercises and answers to consolidate that knowledge point.
Reference material recommendation: the personalized coaching scheme also recommends suitable reference materials, such as textbook sections, online documents, learning websites, or video resources, according to the student's learning needs and characteristics. For example, a student interested in a particular historical event can be given relevant reading material and website links for deeper study.
Learning planning: the personalized coaching scheme generates a personalized learning plan according to the student's learning progress and goals. For example, a student who has fallen behind in a subject can receive a detailed learning plan, including daily tasks and a schedule, to help the student catch up and achieve the learning goals.
Coaching feedback and assessment: the personalized coaching scheme also provides the teacher with feedback and assessment on the student's learning. For example, it can record the student's answer accuracy, response time, and other information from the interactive question-and-answer and present them to the teacher, so the teacher can track and evaluate the student's learning situation.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (9)
1. The cloud education platform management system based on big data comprises a cloud server platform and is characterized in that the cloud server platform is in communication connection with a data acquisition module, a data processing module, a data analysis module and a personalized coaching module;
The data acquisition module is used for acquiring voice information when students interact with teachers to form audio files;
the data processing module is used for processing the audio file and generating corresponding text information;
the data analysis module is used for extracting semantic features of the text information and obtaining a semantic expression result;
The personalized coaching module is used for generating a personalized coaching scheme according to the semantic expression result.
2. The cloud education platform management system based on big data as claimed in claim 1, wherein the process of collecting the voice information when students interact with the teacher is as follows: during video teaching, a real-time voice communication function is enabled, the voice of the teacher-student interaction is recorded, and the voice information is stored on the cloud server in the form of an audio file.
3. The cloud education platform management system based on big data as set forth in claim 2, wherein the audio file is segmented according to volume: a minimum volume threshold is set and, starting from the first audio in the file, the audio is split to form a file block whenever the volume value falls below the minimum volume threshold; this is repeated until the end of the audio, segmenting the audio file into a plurality of audio file blocks;
the data processing module performs recognition processing on the audio file blocks block by block to form the text information.
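The volume-threshold segmentation of claim 3 can be sketched as follows. This is an illustrative assumption: the patent does not define how "volume" is measured, so frame-based RMS volume is used here:

```python
import numpy as np

def segment_by_volume(samples, frame_len, min_volume):
    """Split an audio signal into blocks wherever the frame volume
    drops below a minimum threshold (a sketch of the claimed
    segmentation; RMS volume per fixed-length frame is an assumption,
    not specified in the patent)."""
    blocks, current = [], []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        volume = np.sqrt(np.mean(frame.astype(float) ** 2))  # RMS volume
        if volume < min_volume and current:
            blocks.append(np.concatenate(current))  # close the current block
            current = []
        elif volume >= min_volume:
            current.append(frame)
    if current:  # flush the final block at the end of the audio
        blocks.append(np.concatenate(current))
    return blocks
```

Each returned block would then be passed to the speech-recognition step block by block to produce text.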
4. The cloud education platform management system based on big data as claimed in claim 3, wherein the data analysis module performs the semantic feature extraction of the text information by:
Constructing an interactive voice data set by using the text information;
Training a semantic extraction model with attention based on the interactive voice data set to obtain a trained semantic extraction model;
and inputting the interactive voice data set to be extracted into a semantic extraction model to obtain a semantic expression result.
5. The cloud education platform management system based on big data as claimed in claim 4, wherein the semantic extraction model with attention comprises an embedding layer, a semantic coding layer and a decoding layer;
the embedding layer is a trained word embedding model that converts the input text information into word embedding vectors for output;
the semantic coding layer consists of a plurality of encoders and is used to acquire the hidden features in each encoder;
the decoding layer decodes the hidden features from the different encoders through an attention mechanism and predicts the semantic extraction result.
6. The cloud education platform management system based on big data as claimed in claim 5, wherein the training method of the word embedding model comprises the following steps:
the method comprises the steps of obtaining an interactive voice data set when a teacher interacts with students in teaching, wherein the interactive voice data set comprises a plurality of historical interactive voice data;
Preprocessing an interactive voice data set, including converting an audio file into a text file, removing special characters, punctuation marks and numbers, splitting the text into words and removing prepositions, and forming preprocessed text data;
Constructing a vocabulary containing all words appearing in the text data based on the preprocessed text data;
Constructing context sets based on the vocabulary, taking each word of the vocabulary as the center and selecting the 3 words within its surrounding range as a set, forming a plurality of context sets;
and training a word embedding model by using the constructed context aggregate data as input, and obtaining a word embedding vector after training is completed.
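The preprocessing and context-set construction of claim 6 can be sketched as follows. The stop-word list and the symmetric 3-word window are assumptions; the claim says only "3 words in the range" and "removing prepositions":

```python
import re
from collections import defaultdict

# Assumed stop-word list standing in for "prepositions" in the claim
STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "to"}

def preprocess(text):
    """Remove special characters, punctuation and digits, split the
    text into words, and drop stop words (per the claimed steps)."""
    text = re.sub(r"[^A-Za-z\s]", " ", text.lower())
    return [w for w in text.split() if w not in STOPWORDS]

def build_context_sets(words, window=3):
    """Center on each word and collect its neighbours within the
    window as a context set (a symmetric window of 3 words per side
    is an assumption; the patent does not say which side)."""
    contexts = defaultdict(list)
    for i, w in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        contexts[w].append(left + right)
    return contexts
```

The resulting context sets would then serve as training input for a word2vec-style embedding model.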
7. The cloud education platform management system based on big data as claimed in claim 5, wherein the encoding process of the semantic coding layer is as follows:
the word embedding vectors are input into the encoder for encoding; at the first time step the encoder computes a first hidden feature from the first word embedding vector, and at the next time step it updates the hidden feature using the second word embedding vector, where a time step is the step corresponding to each point in time at which the encoder processes the encoding.
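The step-by-step encoding of claim 7 can be sketched as a simple recurrence. The tanh cell and weight shapes are illustrative assumptions; the patent does not specify the encoder architecture:

```python
import numpy as np

def encode_sequence(embeddings, W_h, W_x, b):
    """Compute a hidden feature from the first word embedding, then
    update it at each subsequent time step with the next embedding
    (a simple tanh recurrence; the exact recurrent cell is an
    assumption, not taken from the patent)."""
    d = W_h.shape[0]
    h = np.zeros(d)                          # initial hidden feature
    hidden_states = []
    for x in embeddings:                     # one word embedding per time step
        h = np.tanh(W_h @ h + W_x @ x + b)   # update the hidden feature
        hidden_states.append(h)
    return np.stack(hidden_states)           # (T, d) hidden features
```

The final row of the returned array is the encoder's final hidden feature, which claim 8 uses to initialize the decoder.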
8. The cloud education platform management system based on big data as claimed in claim 5, wherein the decoding process of the decoding layer is:
step one, computing the initial hidden feature of the decoder from the final hidden feature of the encoder;
step two, at each time step, the decoder predicts the output of the current time step from the current hidden feature and the output generated at the previous time step, the output being a word or a character;
step three, the decoder performs a weighted aggregation of the corresponding coding-layer hidden features to obtain a weighted aggregation vector;
step four, obtaining the hidden feature of the current time step from the weighted aggregation vector and the hidden feature of the previous time step, and continuing to generate the output of the next time step;
by repeatedly executing steps two through four, the decoder continues to generate output until a complete semantic representation result is produced.
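The decoding loop of claim 8 can be sketched as follows. The end-of-sequence token, the length cap, and the `step_fn` interface are all assumptions used to make the loop concrete; the patent specifies only the loop structure, not these details:

```python
import numpy as np

EOS = 0  # assumed end-of-sequence token id (the stopping condition)

def decode(encoder_states, step_fn, max_len=50):
    """Greedy decoding loop: initialise the decoder hidden feature from
    the encoder's final hidden feature (step one), then repeat
    predict -> attend -> update (steps two through four, bundled in
    step_fn) until an end token or the length limit is reached."""
    h = encoder_states[-1]            # step one: init from final encoder state
    prev_token, outputs = None, []
    for _ in range(max_len):
        token, h = step_fn(h, prev_token, encoder_states)  # steps two-four
        if token == EOS:              # stopping condition met
            break
        outputs.append(token)
        prev_token = token
    return outputs                    # the complete semantic representation
```

In a full implementation, `step_fn` would combine the prediction of step two with the attention-weighted aggregation and hidden-feature update of steps three and four.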
9. The cloud education platform management system based on big data as claimed in claim 8, wherein the teacher analyzes the logical defects and knowledge blind spots in the students' answers according to the complete semantic expression results, and thereby formulates personalized coaching schemes for the students; the personalized coaching schemes are displayed in the personalized coaching module of the cloud education platform, and both teachers and students can log in to the cloud education platform to access the coaching schemes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411179145.2A CN119067130A (en) | 2024-08-27 | 2024-08-27 | A cloud education platform management system based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN119067130A true CN119067130A (en) | 2024-12-03 |
Family
ID=93646471
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411179145.2A Pending CN119067130A (en) | 2024-08-27 | 2024-08-27 | A cloud education platform management system based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119067130A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019075826A1 (en) * | 2017-10-20 | 2019-04-25 | 深圳市鹰硕技术有限公司 | Internet teaching platform-based accompanying teaching system |
CN110033659A (en) * | 2019-04-26 | 2019-07-19 | 北京大米科技有限公司 | A kind of remote teaching interactive approach, server, terminal and system |
CN118053331A (en) * | 2024-01-24 | 2024-05-17 | 云南金课教育科技有限公司 | Method, medium and system for on-line teaching artificial intelligence coaching |
Non-Patent Citations (1)
Title |
---|
SHI WANLI; ZHANG YUHUI: "Design of a Smart Education Platform Based on Big Data Analysis Technology", Modern Electronics Technique (现代电子技术), no. 09, 1 May 2020 (2020-05-01) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113656570B (en) | Visual question-answering method and device based on deep learning model, medium and equipment | |
CN110795543B (en) | Unstructured data extraction method, device and storage medium based on deep learning | |
CN108536679B (en) | Named entity recognition method, device, equipment and computer readable storage medium | |
CN112016767B (en) | Dynamic planning method and device for learning route | |
CN111339302A (en) | Method and device for training element classification model | |
CN114254208A (en) | Identification method of weak knowledge points, learning path planning method and device | |
CN115935969A (en) | Heterogeneous data feature extraction method based on multi-mode information fusion | |
CN114218379A (en) | An Attribution Method for Unanswerable Questions for Intelligent Question Answering | |
CN110189238A (en) | Method, apparatus, medium and the electronic equipment of assisted learning | |
CN113421593A (en) | Voice evaluation method and device, computer equipment and storage medium | |
CN117252259A (en) | Deep learning-based natural language understanding method and AI teaching aid system | |
Yunina | Artificial intelligence tools in foreign language teaching in higher education institutions | |
CN119067130A (en) | A cloud education platform management system based on big data | |
CN117315684A (en) | Automatic question answering method based on large-scale multi-modal model | |
CN117235347A (en) | A youth algorithm code-assisted learning system and method based on a large language model | |
Yang | Machine learning for English teaching: a novel evaluation method | |
Lee et al. | Design of a Foreign Language Conversation Learning System Using Machine Learning | |
CN117932044B (en) | Automatic dialogue generation method and system for psychological counseling assistant based on AI | |
CN118132741B (en) | Agent-based autonomous learning method and system | |
CN116151242B (en) | Intelligent problem recommendation method, system and storage medium for programming learning scene | |
CN118377885B (en) | Calligraphy teaching auxiliary system and method based on language chain and dialogue generation model | |
KR102706984B1 (en) | English learning device using language model | |
CN118170945B (en) | Post-class problem generation method and device for community video courses | |
Sridharan et al. | Machine learning/deep learning use cases in education | |
CN117540108B (en) | Intelligent recommendation answering system based on examination point data distributed summary |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||