1 Introduction

Research on and development of dashboards that visualize the results of Learning Analytics (LA), based on large-scale educational log data accumulated in e-learning environments, have become popular. Generally speaking, most current LA dashboards aim at feeding the analytic results back to the instructors rather than directly to the learners. For learners to become aware of deficiencies in their learning progress and to regulate their learning strategies by themselves, monitoring their own learning processes and behaviors is essential [1]. Thus, making information about past studying activities salient for the learner, such as what the learner has been doing and what others in the same class have been doing, can help learners reflect on their own study and learn from each other [2]. Our prior research designed a learning analytics dashboard that supports metacognition to improve self-regulated learning in online environments through the collection, analysis, and visualization of learning log data [3]. A very early prototype of the proposed dashboard was introduced in [4]. This paper focuses on the details of our latest efforts on visualizing studying activities extracted from the operational logs of digital teaching materials to support self-monitoring, as part of the in-progress development of the learning dashboard.

Because learners are expected to focus on the learning process rather than only on the outcome in self-monitoring [5], the learning dashboard is designed to provide students with summarized and visualized studying activities organized by the teaching materials (i.e., slide pages) used in the classes. The visualization includes an overview summarizing the activities on all the slide pages in a class and a view of the detailed activities on a single page. Both views provide side-by-side comparisons between the overall situation of all the students in the class and one’s own activities, in accordance with the point of self-evaluation [6]. This paper mainly introduces a realization of the above-mentioned overview and detailed view from the operational event logs of a digital teaching material delivery system named “BookRoll” [7], operated by Kyushu University in Japan. The expected impact of the visualizations is to stimulate students’ curiosity about the differences between their own study activities and those of others by providing intuitive contrasts, in order to motivate them to use other tools to find more details and to reflect on their studies.

2 UI Designs for Visualizing Study Activities

We currently focus on two kinds of data: the page-viewing activities within the slide of a class, which include the slide reading path and page viewing durations, and the learner-generated content, which includes highlight markers and memo annotations created on the slide pages by students. As the accumulated operation event logs on the e-book system from different courses and students are stored in the same database table, the event records need to be filtered according to the classes, slides, and students’ IDs. To get the reading paths, we extract the sequences of page-navigation-related events, with the high-frequency page changes filtered out. From such sequences, we can calculate the time spent on each page by each student and the “from-to” links between the pages. The overall state of all the students in a class before, during, or after the class time can be summarized from all the students’ results, given the start time and end time of the class.

2.1 Graph for Reading Path Overview

We designed a graph to visualize the slide reading path and durations, in which the nodes stand for the pages and are arranged on a circle, and the links stand for the “from-to” relations between the nodes in the reading path. As illustrated in Fig. 1, the color intensity of a node with a page number indicates the reading duration on the page; the thickness of a link shows how many times the same page transition occurred; and the color of a link shows its direction (light gray for going to the next page, dark gray for going to the previous page, mint for jumping forward, and orange for jumping backward). The accessories, which are smaller circles attached to a page node, appear if there is learner-generated content on the page, including highlight markers (circle with the letter H) and memo annotations (circle with the letter M), with their color intensities representing the total numbers created on the page. A pair of such graphs for the teaching material is presented to the learners so that they can compare their own studying activities with the overall situation of the whole class. In the graph of the class overview, the color intensity of a node stands for the average reading time spent on the page over all the students. When a page node in the graph is clicked, the node and the “from-to” links related to the page are highlighted, while all the other nodes and links are shown with much higher transparency. As illustrated in Fig. 2, two different colors are used to distinguish the links going out from the selected node from those coming into the node.

Fig. 1. Reading path graphs on a 27-page slide of a 70-student class (left) and one of the students in the class (right) during class time

Fig. 2. Reading path graphs highlighting a selected page and the links related to the page after clicking the node of Page No. 11 in Fig. 1

2.2 Learner’s Activities on a Slide Page

When a page node is clicked, the reading time in seconds (the class average in the graph of the class overview) and the total numbers of learning behaviors are listed. As shown in Fig. 3, all the highlight markers and the symbols of all the memo annotations are overlaid on the preview of the selected slide page in a side-by-side comparison between the whole class and the learner’s own content. Such a view helps the user find, at a glance, the parts that other students are interested in but that the user has probably ignored.

Fig. 3. Detailed view of the studying activities and learner-created content overlapped on the preview of the slide page (No. 11 as selected in Fig. 2)

2.3 Time Range Selection

Asking the users to input a starting and ending date and time as the range of data to be visualized is not user-friendly, as they would have to recall the specific time they spent studying with a certain teaching material, for example, the class time or the time for preview/review. It is also unnecessary, as a time specified precisely to minutes or seconds is not meaningful. Moreover, class days and class times are usually the time ranges of most interest. Therefore, we designed a two-step operation to select a time range with predefined, easy-to-understand options. The first step is the date selection. The users can either select from a list of all the class days or select another day from the calendar. The second step is the time slot selection. If the users have chosen a class day in the first step, they can choose from the options of Before, During, or After the class. Otherwise, they may choose Morning, Afternoon, Evening, or Night.
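The two-step selection can be sketched as a small function that turns a chosen day and slot into a concrete time range. This is only an illustration: the function name `resolve_time_range`, the `DAY_SLOTS` table, and in particular the hour boundaries of Morning, Afternoon, Evening, and Night are assumptions, since the text does not define them.

```python
from datetime import date, datetime, time, timedelta

# Assumed hour boundaries for non-class days (not specified in the text).
DAY_SLOTS = {
    "Night": (0, 6),
    "Morning": (6, 12),
    "Afternoon": (12, 18),
    "Evening": (18, 24),
}


def resolve_time_range(day, slot, class_period=None):
    """Return (start, end) datetimes for a selected day and time slot.

    class_period is a (start_time, end_time) pair when the selected day
    is a class day, and None otherwise.
    """
    midnight = datetime.combine(day, time.min)
    next_midnight = midnight + timedelta(days=1)
    if class_period is not None:
        class_start = datetime.combine(day, class_period[0])
        class_end = datetime.combine(day, class_period[1])
        ranges = {
            "Before": (midnight, class_start),
            "During": (class_start, class_end),
            "After": (class_end, next_midnight),
        }
        return ranges[slot]
    start_hour, end_hour = DAY_SLOTS[slot]
    return (midnight + timedelta(hours=start_hour),
            midnight + timedelta(hours=end_hour))
```

Restricting the input to a day plus a named slot keeps the request coarse, which matches the argument that minute-level precision is not needed.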

3 Data Processing

3.1 Data Sources

The raw data used to visualize the study activities is mainly from the database of BookRoll. More specifically, we mainly use the event stream from the log data table to generate the reading paths and other page-navigation-related information, and records from the content tables (e.g., tables of highlight markers and memo annotations) to visualize the learner-generated content. We also access the database of the LMS to get the class dates and times of a certain course.

The event stream consists of a series of data records, each of which describes an operational log event. The fields of each record mainly include the user’s (student’s) id, the teaching material’s id, the page number in the teaching material, the type code of the operation, additional descriptions that depend on the event type, and the date and time when the event was logged. Table 1 shows all of the operation types and their counts from the event logs generated by a single class of 70 students on a single slide of 27 pages. The highlighted operation types are those related to page viewing and navigation. There are also operation types that never appeared in this data example, such as MEMO_JUMP. From the content tables, we found that the same class created 242 highlight markers and 203 memo annotations on the same slide. The data include the activities both during and outside class time.

Table 1. Example of the operation types and their counts from the event logs generated by a class of 70 students on a 27-page slide, with the types related to page navigation highlighted

3.2 Raw Data Preprocess

As all the event logs generated by all the students on the teaching materials of different courses using BookRoll are accumulated in the same table, we first query the data rows according to the id of the teaching material, select the records of interest according to the type codes (e.g., only page-navigation-related events for reading path generation), and then group them by student id.
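A minimal sketch of this preprocessing step is shown below, assuming each log record is a dictionary. The field names (`material_id`, `operation`, `user_id`) and the operation names other than OPEN and CLOSE are hypothetical and are not taken from the BookRoll schema.

```python
from collections import defaultdict

# Hypothetical set of page-navigation operation names; only OPEN and CLOSE
# are named in the text, the others are placeholders.
NAVIGATION_OPERATIONS = frozenset({"OPEN", "CLOSE", "NEXT", "PREV", "PAGE_JUMP"})


def preprocess(records, material_id, operations=NAVIGATION_OPERATIONS):
    """Keep the records of one teaching material and of the wanted
    operation types, then group them by student id."""
    grouped = defaultdict(list)
    for record in records:
        if record["material_id"] != material_id:
            continue
        if record["operation"] not in operations:
            continue
        grouped[record["user_id"]].append(record)
    return dict(grouped)
```

In a real deployment the first two filters would more likely be pushed into the database query; the in-memory loop here only makes the three steps explicit.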

3.3 Reading Path Generation

With the preprocessed data, that is, the event log records grouped by student id, we can generate the reading paths for visualization by the following steps.

Page Navigation Event Log Filtering

The events of each student are sorted by time from earliest to latest, and then the events that have too short a time interval (intuitively, less than 0.5 s) since the previous event, or that have the same page number as the previous event, are filtered out to make the data more meaningful. This means, for example, that if a user clicked the “Next Page” button many times in quick succession, the behavior is treated as a single “jump” from the page just before the series of rapid operations to the page after them. Such filtering helps us recover, from the surface of the log data, student behaviors that are closer to their real intentions.
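The filtering rule can be sketched as follows. The text leaves some room for interpretation, so this sketch makes one assumption explicit: to reproduce the “jump” example, an event is dropped when the next event follows it within the threshold (i.e., the page was shown for less than 0.5 s), which keeps the page where the rapid series ends. The `Event` type and function name are hypothetical.

```python
from dataclasses import dataclass

MIN_DWELL_SECONDS = 0.5  # threshold quoted in the text


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds; hypothetical representation of the log time
    page: int


def filter_navigation_events(events, min_dwell=MIN_DWELL_SECONDS):
    """Drop rapid page flips and repeated pages from one student's events."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    kept = []
    for index, event in enumerate(ordered):
        is_last = index == len(ordered) - 1
        # Assumption: a page left again within min_dwell seconds was only
        # passed through, so its event is removed.
        if not is_last and ordered[index + 1].timestamp - event.timestamp < min_dwell:
            continue
        # An event on the same page as the previously kept one adds no move.
        if kept and kept[-1].page == event.page:
            continue
        kept.append(event)
    return kept
```

With this reading, pressing “Next Page” four times in quick succession from page 1 yields a single move from page 1 to page 5.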

Page Navigation Sequence Constructing

The sorted and filtered event log records of a single student are divided into several segments according to the pairs of OPEN and CLOSE events. From each segment, we construct a page navigation sequence, which contains a start time from the OPEN event, an end time from the CLOSE event, and an ordered list of items. Each item holds the page number from the corresponding event log record and the time spent on the page in seconds, calculated from the timestamps of the event and of the next event.
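A minimal sketch of the segmentation is given below. It assumes that the events are already sorted and filtered and are given as `(timestamp, operation, page)` tuples; the tuple layout, the dictionary keys, and the handling of unmatched events are illustrative choices, not the authors’ data format.

```python
def build_sequences(events):
    """Split one student's events into OPEN..CLOSE segments and compute
    the time spent on each visited page.

    events: time-sorted list of (timestamp_seconds, operation, page).
    """
    sequences = []
    current = None  # visits of the segment being built, or None
    for timestamp, operation, page in events:
        if operation == "OPEN":
            current = [(timestamp, page)]
        elif current is None:
            continue  # assumption: events outside an OPEN..CLOSE pair are ignored
        elif operation == "CLOSE":
            visits = current + [(timestamp, None)]
            items = [
                {"page": visited_page, "seconds": next_time - visit_time}
                for (visit_time, visited_page), (next_time, _) in zip(visits, visits[1:])
            ]
            sequences.append({"start": current[0][0], "end": timestamp, "items": items})
            current = None
        else:
            current.append((timestamp, page))
    return sequences
```

A segment whose OPEN is never followed by a CLOSE is silently dropped in this sketch; a production version would need a rule for such sessions.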

Creating Data for Nodes and Links in Reading Paths

By going through each page navigation sequence, we sum the total time spent on each page for the nodes in the reading path graph and collect the links between different pages from each pair of neighboring items in the sequence. The number of links with the same source page and target page is then counted. After that, additional data, such as the numbers of highlight markers and memo annotations, are appended to the nodes (i.e., pages). The set of nodes and links of a reading path is then stored together with the start time and end time.
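The aggregation into nodes and links can be sketched as below, assuming each sequence item is a dictionary with `page` and `seconds` keys (a hypothetical layout). Appending the counts of markers and memos is omitted for brevity.

```python
from collections import Counter


def build_reading_path(items):
    """Aggregate one page navigation sequence into node and link data.

    Returns (seconds_per_page, link_counts), where link_counts maps
    (source_page, target_page) to the number of such moves.
    """
    seconds_per_page = Counter()
    link_counts = Counter()
    for item in items:
        seconds_per_page[item["page"]] += item["seconds"]
    for current, following in zip(items, items[1:]):
        if current["page"] != following["page"]:
            link_counts[(current["page"], following["page"])] += 1
    return dict(seconds_per_page), dict(link_counts)
```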

3.4 Class Overview Generation

Once the reading paths of all the students in a class have been generated for a teaching material, the data for visualizing the class overview graph can be collected and generated. As discussed in Sect. 2.3, the user selects a time range when requesting the visualization. Thus, all the reading paths whose time ranges intersect with the requested range are retrieved. In the next step, by going through all of these reading paths across the class, the data of all the nodes and links are merged by page number to generate the nodes and links of the class overview. The total time spent on a certain page is replaced by its average over the number of students in the class.
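The merge can be sketched as follows, assuming each stored reading path is a dictionary with `start`, `end`, `seconds` (per page), and `links` (per page pair) entries. These keys and the function name are hypothetical.

```python
from collections import Counter


def build_class_overview(reading_paths, class_size, range_start, range_end):
    """Merge the reading paths that intersect the requested time range.

    Returns (average_seconds_per_page, link_counts). The average is taken
    over the number of students in the class, as described in the text.
    """
    total_seconds = Counter()
    link_counts = Counter()
    for path in reading_paths:
        # Skip paths that lie entirely outside the requested range.
        if path["end"] < range_start or path["start"] > range_end:
            continue
        total_seconds.update(path["seconds"])
        link_counts.update(path["links"])
    average_seconds = {page: total / class_size for page, total in total_seconds.items()}
    return average_seconds, dict(link_counts)
```

Dividing by the class size rather than by the number of students who opened the page means that students who never viewed a page pull its average down, which is consistent with the description above.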

4 Visualization Techniques

The visualization of the data generated above can be realized in different approaches. In our prototype implementation, which will be introduced in Sect. 5, we developed a web-based visualization module using JavaScript on the basis of D3.js [8] (Data-Driven Documents, a widely used JavaScript library for dynamic and interactive data visualizations in web browsers). This section will discuss several technical issues in our experience of visualizing the reading path graphs.

4.1 Reading Path Graph Automatic Generation

The reading path graph has to adapt to different learning materials that have different numbers of pages. Thus, it should be able to calculate the size and position of each node according to the number of pages and the size of its container in a web page, and then render the nodes and the links between them automatically. For a quick implementation of our reading path graph, we applied a technique that combines the APIs for drawing donut charts and force-directed graphs.

In the first step, an invisible donut chart is generated, in which the slices are equally divided and their number equals the number of pages. The purpose of this donut chart is only to obtain the center coordinates of the arcs, which can be used as the centers of the nodes without additional code for the calculation. The radius of each node can be calculated as, for example, 1/4 of the arc’s length, and the positions and sizes of the accessories of each node can then be derived with a simple calculation. In the next step, a force-directed graph is automatically generated by binding the data of the nodes and links. By disabling the animation of the force-directed graph and fixing the final coordinates of each node at the corresponding arc’s center derived in the first step, our reading path graph can be generated with only a few lines of source code.
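The geometry that the invisible donut chart provides can also be written out directly. The sketch below is not the D3.js-based implementation; it only illustrates the resulting layout, assuming a single ring radius, a hypothetical `margin` parameter, and page 1 placed just after the 12 o’clock position.

```python
import math


def layout_nodes(page_count, width, height, margin=0.1):
    """Place one node per page at the middle of equally sized arcs."""
    center_x, center_y = width / 2, height / 2
    ring_radius = min(width, height) / 2 * (1 - margin)
    arc_length = 2 * math.pi * ring_radius / page_count
    node_radius = arc_length / 4  # the 1/4 ratio suggested in the text
    nodes = []
    for index in range(page_count):
        # Mid-angle of the slice, measured clockwise from 12 o'clock
        # (screen coordinates, y axis pointing down).
        angle = 2 * math.pi * (index + 0.5) / page_count - math.pi / 2
        nodes.append({
            "page": index + 1,
            "x": center_x + ring_radius * math.cos(angle),
            "y": center_y + ring_radius * math.sin(angle),
            "r": node_radius,
        })
    return nodes
```

Because the node radius is a quarter of the arc length, neighboring nodes never overlap regardless of the number of pages, although they become very small for long slides.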

4.2 Visual Variable Settings

To provide effective visualization, we need to set up the visual variables of the reading path graph generated in Sect. 4.1, such as the colors of the nodes and links and the line width of the links. Although it is possible to define such variables manually with the APIs provided by D3.js, we try to provide several integrated programming interfaces that determine these variables with easy-to-understand settings according to the characteristics of the data. As one of the main purposes is to provide a comparison between the student’s own learning activities and the overall class situation, the two graphs must use the same settings of visual variables to make sure the comparison is meaningful. Our strategy is to set up the class overview graph first and then use the generated settings to render the student’s graph.

Node Color Scales

As introduced in Sect. 2, the intensity of the color that fills a node indicates the time spent on the page. By setting up the colors for the minimum and maximum values of the time over all the nodes, we can generate a linear color scale with the API of D3.js and apply it to each node in the graph. When the color scale generated from the data of the class overview is applied to the student’s view, some values may fall outside the color scale’s range, as the nodes in the class overview use the average values. In this case, we simply use the minimum’s color for the values smaller than the minimum and the maximum’s color for the values larger than the maximum. The accessories of the nodes, which visualize the numbers of highlight markers and memo annotations, use the same strategy to set up their color scales.

Link Width Scales

We use different stroke widths of the links to indicate the numbers of identical moves between the pages. As the number of moves to the next or previous page is overwhelmingly larger than the number of the others (i.e., “jumps” between pages that are not adjacent), if we generated a linear scale for the stroke width from the minimum and maximum numbers over all of the links, most of the “jump” links would have the smallest stroke width. However, the jumps are usually of more interest, as they usually provide more information than the common “next” and “previous” moves. Therefore, we ignore the “next” and “previous” links and generate a linear scale with the minimum and maximum numbers over all of the “jump” links, in order to make the link width more meaningful. Similar to the node color scales, we use the minimum’s width for the links that have smaller numbers and the maximum’s width for the links that have larger numbers. Thus, all the “next” and “previous” links in the class overview graph usually have the largest width.
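A minimal sketch of this width scale is shown below. The pixel widths, the function names, and the handling of degenerate cases (no jump links, or all jump links having the same count) are assumptions for illustration.

```python
def is_jump(source_page, target_page):
    """A link is a jump when the two pages are not adjacent."""
    return abs(target_page - source_page) > 1


def make_width_scale(link_counts, min_width=1.0, max_width=8.0):
    """Build a clamped linear stroke-width scale from the jump links only.

    link_counts maps (source_page, target_page) to a count.
    """
    jump_counts = [
        count for (source, target), count in link_counts.items()
        if is_jump(source, target)
    ]
    low, high = (min(jump_counts), max(jump_counts)) if jump_counts else (0, 0)

    def scale(count):
        if high == low:
            # Degenerate range (assumption): anything above it gets the widest stroke.
            return max_width if count > low else min_width
        ratio = max(0.0, min(1.0, (count - low) / (high - low)))
        return min_width + ratio * (max_width - min_width)

    return scale
```

Because “next” and “previous” counts usually exceed the largest jump count, they are clamped to `max_width`, as noted above.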

Link Colors

If we use the same color for all the links, the reading path graphs will look messy, especially when the number of pages is large. This makes the visualization less effective, as the user cannot draw useful information, such as the important links and the pages jumped from or to, from the messy graph. As clarified above in the link width settings, “jump” links are considered more important than “next” and “previous” links; thus, we can set different colors and transparencies on them as follows to make the links of interest stand out.

  • “Next” links are considered the least important, as moving to the next page is the most common and necessary action when browsing the teaching material. Thus, we can apply the least distinct color (e.g., light gray) and the highest transparency.

  • “Previous” links can be more important than “next” links, as there must be some reason for the students to return to the page they have just viewed. Thus, we can apply a more distinct color (e.g., dark gray) and lower transparency than for the “next” links.

  • “Jump” links are much fewer and more informative, so we can apply distinct colors to them with the least transparency. As jumping forward and backward can have different meanings, we can apply different colors to distinguish the directions.
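The rules in the list above can be sketched as a simple lookup. The color names follow the description in Sect. 2.1 (light gray, dark gray, mint, orange), but the hexadecimal values and the opacity levels are hypothetical.

```python
# (stroke color, opacity) per link type; all concrete values are assumptions.
LINK_STYLES = {
    "next": ("#D3D3D3", 0.3),           # light gray, most transparent
    "previous": ("#696969", 0.6),       # dark gray, less transparent
    "jump_forward": ("#98FF98", 0.9),   # mint
    "jump_backward": ("#FFA500", 0.9),  # orange
}


def classify_link(source_page, target_page):
    """Classify a link by the direction and distance of the page move."""
    step = target_page - source_page
    if step == 1:
        return "next"
    if step == -1:
        return "previous"
    return "jump_forward" if step > 1 else "jump_backward"


def style_link(source_page, target_page):
    """Return the (color, opacity) pair for a link."""
    return LINK_STYLES[classify_link(source_page, target_page)]
```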

Highlights for the Selected Page

When a node (i.e., page) in the reading path graph is selected, all the other nodes and the links not related to the selected node are displayed with high transparency to make the selected node and the links from/to it stand out. In addition, we can modify the stroke colors of the links to distinguish the links going out from the selected node from those coming into the node. Taking Fig. 2 as an example, we use cerulean for the links from the selected node, as well as for the outline of the selected node. For the links towards the selected node, magenta is used as a contrast, and also for the outlines of the nodes at the other end of these links.
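The selection behavior can be sketched as a function that assigns a style to every link given the selected page. The hexadecimal values chosen for cerulean and magenta and the faded opacity are assumptions.

```python
OUTGOING_COLOR = "#007BA7"  # cerulean (assumed hex value)
INCOMING_COLOR = "#FF00FF"  # magenta (assumed hex value)
FADED_OPACITY = 0.1         # hypothetical opacity for unrelated links


def style_links_for_selection(links, selected_page):
    """Map each (source, target) link to its style after a page is selected.

    A stroke of None means the link keeps its original color.
    """
    styles = {}
    for source, target in links:
        if source == selected_page:
            styles[(source, target)] = {"stroke": OUTGOING_COLOR, "opacity": 1.0}
        elif target == selected_page:
            styles[(source, target)] = {"stroke": INCOMING_COLOR, "opacity": 1.0}
        else:
            styles[(source, target)] = {"stroke": None, "opacity": FADED_OPACITY}
    return styles
```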

5 Prototype Development

The initial prototype implementing the proposed visualization was developed with some small dumps of event logs on a very limited number of teaching materials from the database. In this stage, we mainly experimented with the data processing flows and visualization techniques to make sure they generate the expected graphs correctly and effectively. We developed a data processing module in Python3 and a web-based visualization module in JavaScript on the basis of D3.js. In the next stage, these two modules are wrapped into our learning dashboard prototype as plugins of the LMS operated by Kyushu University for educational technology research. The students, who are the users of the LMS, will be able to access the learning dashboard from the course page of the LMS and browse the reading path graphs of their own and of the whole class for each BookRoll slide in detail. In this ongoing stage, we mainly work on optimizing the data processing workflows to handle real-time requests for the time range to be visualized and the detailed information about the users and courses passed by the web sessions. After completing the current stage, we will prepare the prototype for formative evaluation experiments.

5.1 Data Processing Module

The main task of this module is to extract the necessary data from the database according to the visualization requests and to process the raw data into data that the visualization module can use to render the graphs. The data sources to be processed by this module mainly include the following:

  • Information passed by the web sessions: such as the identities of the users, courses, and teaching materials, the requested range.

  • Information from the LMS: mainly the time periods of the classes of each course and the list of students registered to the courses.

  • Raw data from the database of BookRoll: operational event logs, and learner-generated content, as clarified in Sect. 3.1.

Considering the balance between processing load and storage cost, we use offline processing for the preprocessing and for the data expected to be used frequently, while using online processing for the less frequently used or incidentally requested data. The offline processes can be conducted regularly, for example, late at night every day. Typically, the offline processes include:

  • Preprocessing: as introduced in Sect. 3.2, select and group the data needed from the operational event log data table in the database of BookRoll.

  • Page navigation sequence construction: as introduced in Sect. 3.3, form the time-ordered sequences of page numbers and time durations for each student over each teaching material.

  • Reading path generation for the class time: as the time periods of the classes are known and reviewing these time periods is expected to be frequent, the data for visualization can be prepared in advance. Therefore, the nodes and links for the student’s view, as well as for the class overview during the class time, can be generated and stored following the steps introduced in Sects. 3.3 and 3.4. When the user requests to view the graphs of the class time, the stored results are used directly instead of being processed in real time.

On the other hand, the online processes mainly deal with generating reading paths for the requests other than the class time. In this case, the data of the nodes and links will be created after the user has chosen a range of time to view the graphs.
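The split between offline and online processing can be sketched as a small store that serves precomputed results when they exist and falls back to on-demand computation otherwise. The class name, its methods, and the shape of the `compute` callback are hypothetical and only illustrate the idea.

```python
class ReadingPathStore:
    """Serve reading path data from offline results when available."""

    def __init__(self, compute):
        # compute: hypothetical function (student_id, material_id, time_range)
        # -> graph data (nodes and links), as produced in Sects. 3.3 and 3.4.
        self._compute = compute
        self._precomputed = {}

    def precompute(self, student_id, material_id, time_range):
        """Offline step, e.g. run nightly for the known class periods."""
        key = (student_id, material_id, time_range)
        self._precomputed[key] = self._compute(student_id, material_id, time_range)

    def get(self, student_id, material_id, time_range):
        """Online step: reuse stored results, otherwise compute on request."""
        key = (student_id, material_id, time_range)
        if key in self._precomputed:
            return self._precomputed[key]
        return self._compute(student_id, material_id, time_range)
```

In this sketch, on-demand results are deliberately not stored, which reflects the stated trade-off between processing load and storage cost for rarely requested ranges.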

5.2 Web-Based Visualization Module

To deploy the visualization module in the web framework of the learning dashboard prototype, its web container is also developed with Flask, a micro web framework written in Python3. The visualization functions are developed with JavaScript based on D3.js and wrapped up in an independent module as a .js file. This module provides programming interfaces for binding data to generate the reading path graphs and learner-generated content overviews and for setting up visual variables, including sizes and colors. The module outputs the graphs in SVG (Scalable Vector Graphics) format to be displayed in the web container. The module also handles and delivers some of the interactive events. For example, all the mouse-click events on the graphics of nodes, links, accessories, and symbols of learner-generated content are raised to the upper layers (e.g., the web container of the graph). This also makes the module extensible, so that it can potentially work with other functions of the learning dashboard as well as with other plugins of the LMS.

6 Discussions and Future Work

While implementing the proposed visualization of studying activities for the learning dashboard, we conducted preliminary evaluations through interviews with several graduate students and teachers in our university, which yielded positive opinions as well as useful comments and suggestions. We also encountered several issues that are worth discussing for further improvements and future studies.

The first issue is how many visual variables should be used to provide useful information to the users. In other words, should we use as many visual variables as possible to provide as much information as possible in order to achieve effective visualization? The answer is often negative. Although we managed to distinguish the more important links in the reading path graph by introducing more colors, when we tried to add more variables, for example, using the color intensity of a node’s outline to indicate its in/out degrees, the graph became difficult to understand. This raises the question of which visual variables should be applied to which kinds of information in the graphs to achieve effective visualization for the students. In our opinion, the reading path graph should make the important messages about the differences in learning activities or styles distinct and easy to find, so as to arouse the student’s curiosity. In our future user experiments, we will try different combinations of visual variables and identify the effective designs by checking how fast the users can distinguish the important messages and by asking for their preferences.

The second issue is that when the teaching material has too many pages (e.g., more than 50), the reading path graph becomes a mess because there are too many links, while the nodes become too small to see or operate. Enlarging the graph may solve the problem of the nodes being too small, but the user will lose the side-by-side comparison view of the two graphs, which makes the visualization less effective. One possible solution is to group the pages by the internal structure of the teaching material, such as the sections in a slide, and display each group as a collective node, in order to reduce the number of nodes on the graph. When the user clicks a collective node, it is unfolded to show more details. Such a solution visualizes the relations between sections instead of pages. We will develop the functions to support collective nodes in our future prototypes and test their effectiveness in formative evaluations.

The third issue is the possible loss of some important information on a reading path when it is visualized with our proposed design. As our reading path graph mainly focuses on the importance of the pages by displaying the reading time of a page and the degrees of its relations with other pages, the order and timing in which the pages were read are not reflected directly. For example, when a page was read more than once in a reading path, only the total reading time spent on it can be displayed, and we cannot tell whether the student spent more time when reading the page for the first time or on later visits. In our opinion, the information on reading order and timing should be presented using other forms of visualization with a time axis. We argue, however, that such time-axis-based visualizations are not good at giving an overview of the relations between pages, and that one form of visualization cannot replace another without loss of function. We may develop different views of the reading path for different functions in our future prototypes and combine them in the learning dashboard. We can also test and compare their effectiveness in user experiments.

There can be more issues to be discovered in our future development of the visualizations of study activities for the learning dashboard, and the discussions on them will lead to more refined details of our proposed graphs and better designs of user experiments for formative evaluation. In the future, we are going to explore visualizations of other types of data and forms to enhance the functions of the learning dashboard to support self-regulated learning, including self-monitoring, knowledge monitoring, planning, and regulation, for the students.