The latest 2022 overview of Knowledge Tracing: A survey on DLKT

A survey on deep learning based knowledge tracing

The paper appears in the journal Knowledge-Based Systems in October 2022 (original link).
In the following article, I will use DLKT as shorthand for deep learning based knowledge tracing.

Summary

This is a review article that evaluates existing DLKT models. It provides:

  1. Fine-grained classification of technical methods proposed by mainstream DLKT models
  2. Detailed analysis of KT technology
  3. Analysis of technical solutions and main improvements of each model of DLKT
  4. Possible research areas of DLKT in the future

1. Introduction

From online learning → intelligent tutoring systems → KT. KT not only allows learners to better understand their own learning, but also allows platforms and teachers to better understand learners and even intervene. The paper then introduces mainstream KT, starting from BKT (its concept, variants, and advantages and disadvantages) and moving on to the advantages of DKT; because DKT suffers from poor interpretability, long-term dependency issues, and a limited set of learning features, variants and updates keep appearing. Previous reviews did not delve into the unique contributions and improvement directions of each DLKT model. The contributions of this survey are as follows:

  1. Proposes a clear taxonomy and compares the architecture, design, and knowledge representation of each model
  2. Provides an in-depth study and summary on four datasets, comparing the performance of DLKT models
  3. Provides insights and discusses important issues and limitations for future research

2. Review of KT

Problem definition

The intelligent education system mainly has three parts: students, exercises (topics), and knowledge concepts (skills).

Knowledge Tracing (KT) definition: given a student's history sequence $s = \{X_0, \ldots, X_t\} = \{(e_0, a_0), \ldots, (e_t, a_t)\}$, predict the next interaction $X_{t+1}$ (precisely, predict $a_{t+1}$).
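To make the problem setting concrete, here is a minimal sketch of the input and output of a KT model (names such as `Interaction` and `exercise_id` are my own illustration, not from the paper):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Interaction:
    exercise_id: int   # e_t: which exercise the student attempted
    correct: int       # a_t: 1 if answered correctly, 0 otherwise

# A student's history s = {(e_0, a_0), ..., (e_t, a_t)}
history: List[Interaction] = [
    Interaction(exercise_id=12, correct=1),
    Interaction(exercise_id=7,  correct=0),
    Interaction(exercise_id=12, correct=1),
]

next_exercise_id = 5  # e_{t+1}: the exercise about to be attempted
# A KT model consumes `history` plus `next_exercise_id` and outputs the
# predicted probability that a_{t+1} = 1 (a correct answer).
```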

Common paradigm

Different models and methods share almost the same training objective: the negative log-likelihood. The goal of KT is therefore to minimize, over the historical sequence, the negative log-likelihood of the student's practice process from time 1 to $t$:

$$\mathcal{L} = -\sum_{t}\left(a_t \log \hat{a}_t + (1 - a_t)\log(1 - \hat{a}_t)\right)$$

where $a_t$ and $\hat{a}_t$ represent the actual and predicted responses, respectively.
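As a quick illustration, here is a minimal sketch of this negative log-likelihood over a sequence of predictions (plain NumPy; the function name is my own):

```python
import numpy as np

def kt_nll(a_true: np.ndarray, a_pred: np.ndarray) -> float:
    """Negative log-likelihood (binary cross-entropy) over a response sequence.

    a_true: actual responses a_t in {0, 1}
    a_pred: predicted probabilities a_hat_t in (0, 1)
    """
    eps = 1e-7  # avoid log(0)
    a_pred = np.clip(a_pred, eps, 1 - eps)
    return float(-np.sum(a_true * np.log(a_pred) + (1 - a_true) * np.log(1 - a_pred)))

# Example: three interactions
print(kt_nll(np.array([1, 0, 1]), np.array([0.8, 0.3, 0.6])))
```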

3. Classification of DLKT models

These DLKT models are summarized in Table 1. The specific techniques are divided into four categories: DKT and its variants, memory-network-based, attention-mechanism-based, and graph-structure-based models.

[Table 1 in the paper: taxonomy of mainstream DLKT models]

DKT and its variants

DKT (2015) uses a sequence model (RNN, LSTM, or GRU) as its base model, converting each interaction $X_t$ into an input vector via one-hot encoding.
[Figures in the paper: DKT model architecture and its update equations]
The limitations of DKT are: 1. $h_t$ represents only a single overall knowledge state; 2. it cannot model the connections between concepts; 3. all exercises are treated as equally important. The various extensions of DKT are listed in the table and will not be described in detail here.
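A minimal sketch of the DKT idea in PyTorch (dimensions and names are illustrative, not the paper's reference implementation): the one-hot interaction vector feeds an LSTM whose hidden state $h_t$ is projected to a per-exercise probability of a correct answer.

```python
import torch
import torch.nn as nn

class SimpleDKT(nn.Module):
    def __init__(self, num_exercises: int, hidden_size: int = 64):
        super().__init__()
        # Each interaction (e_t, a_t) is one-hot encoded into 2 * num_exercises dims:
        # index e_t if answered incorrectly, index num_exercises + e_t if correctly.
        self.lstm = nn.LSTM(input_size=2 * num_exercises,
                            hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, num_exercises)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, 2 * num_exercises) one-hot interaction vectors
        h, _ = self.lstm(x)                 # h_t: overall knowledge state at each step
        return torch.sigmoid(self.out(h))   # predicted P(correct) for every exercise

# Example: 1 student, 3 interactions, 10 exercises
model = SimpleDKT(num_exercises=10)
x = torch.zeros(1, 3, 20)
x[0, 0, 12] = 1.0  # answered exercise 2 correctly (index 10 + 2)
x[0, 1, 7] = 1.0   # answered exercise 7 incorrectly
x[0, 2, 15] = 1.0  # answered exercise 5 correctly
print(model(x).shape)  # torch.Size([1, 3, 10])
```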

KT based on memory network

These models add an external memory structure to track complex concepts. The most classic is DKVMN (2017). Its core is a key matrix that stores skill representations and a value matrix that stores the student's mastery state for each skill. Specifically: compute the attention weights between the exercise and the skills → compute the student's mastery of the exercise → combine the exercise difficulty with the mastery information → predict.
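A minimal sketch of the DKVMN-style read operation (dimensions and names are my own illustration): attention over the key matrix weights a read from the value matrix, which is then combined with the exercise information to predict correctness.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryRead(nn.Module):
    """DKVMN-style read: key matrix -> attention -> weighted read of value matrix."""
    def __init__(self, num_slots: int = 20, key_dim: int = 32, value_dim: int = 32):
        super().__init__()
        self.key_memory = nn.Parameter(torch.randn(num_slots, key_dim))  # skill representations
        self.predict = nn.Linear(value_dim + key_dim, 1)

    def forward(self, exercise_emb: torch.Tensor, value_memory: torch.Tensor) -> torch.Tensor:
        # exercise_emb: (batch, key_dim); value_memory: (batch, num_slots, value_dim)
        w = F.softmax(exercise_emb @ self.key_memory.T, dim=-1)    # attention over skills
        read = torch.einsum('bs,bsv->bv', w, value_memory)         # mastery relevant to this exercise
        features = torch.cat([read, exercise_emb], dim=-1)         # mastery + exercise information
        return torch.sigmoid(self.predict(features)).squeeze(-1)   # P(correct)

reader = MemoryRead()
p = reader(torch.randn(4, 32), torch.randn(4, 20, 32))
print(p.shape)  # torch.Size([4])
```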

SKVMN uses a modified LSTM (Hop-LSTM) for sequential modeling; see these two papers for details.

KT based on attention mechanism

Because DKT lacks interpretability, these models incorporate interpretability directly into the model structure. Their common point is that the attention mechanism learns a weight for each exercise in the interaction history, indicating that exercise's importance for the current prediction.

The most classic is SAKT, which applies the Transformer model to KT for the first time. Other variants are shown in the table; for details, see their respective papers.
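A minimal sketch of the SAKT idea (a single scaled dot-product attention step; names and dimensions are illustrative): past interactions serve as keys/values and the next exercise serves as the query, so the attention weights expose which past exercises the prediction relies on.

```python
import torch
import torch.nn.functional as F

def attention_predict(query_ex: torch.Tensor, past_inter: torch.Tensor):
    """query_ex: (batch, d) embedding of the next exercise e_{t+1};
    past_inter: (batch, t, d) embeddings of the past interactions (e_i, a_i)."""
    d = query_ex.shape[-1]
    scores = torch.einsum('bd,btd->bt', query_ex, past_inter) / d ** 0.5
    weights = F.softmax(scores, dim=-1)           # importance of each past exercise
    context = torch.einsum('bt,btd->bd', weights, past_inter)
    return context, weights                       # weights can be inspected for interpretability

context, w = attention_predict(torch.randn(2, 16), torch.randn(2, 5, 16))
print(context.shape, w.shape)  # torch.Size([2, 16]) torch.Size([2, 5])
```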

KT based on graph structure

Because various relational patterns exist in KT (for example, between exercises and concepts), some studies use graph representation learning to capture these relations.

The most classic is the GKT model; other variants are also listed in the table, see the respective papers for details.
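A minimal sketch of the graph-based idea (not the exact GKT update rule; the adjacency matrix and names are my own illustration): the knowledge state of a concept is updated by aggregating the states of its neighbors in the concept graph.

```python
import torch

def propagate_states(states: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
    """One step of neighbor aggregation over the concept graph.

    states: (num_concepts, d) current knowledge state per concept
    adjacency: (num_concepts, num_concepts) concept relation graph (row-normalized)
    """
    neighbor_info = adjacency @ states          # gather information from related concepts
    return 0.5 * states + 0.5 * neighbor_info   # simple mix of self state and neighbor states

# Example: practicing a concept also influences its neighboring concepts
adj = torch.tensor([[0.0, 1.0, 0.0],
                    [0.5, 0.0, 0.5],
                    [0.0, 1.0, 0.0]])
states = torch.randn(3, 8)
print(propagate_states(states, adj).shape)  # torch.Size([3, 8])
```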

4. Comparison and analysis of DLKT models

Datasets

There are six commonly used datasets in KT: A09, A12, A15, ASSISTChall, Statics2011, and the Simulated-5 synthetic dataset; the specific differences are shown in Table 2.
[Table 2 in the paper: statistics of the commonly used KT datasets]

Evaluation metric

AUC (Area Under the ROC Curve): the larger the value, the better the predictive ability.
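For reference, AUC over predicted correctness probabilities can be computed directly with scikit-learn (a minimal illustration, not from the paper):

```python
from sklearn.metrics import roc_auc_score

a_true = [1, 0, 1, 1, 0]            # actual responses a_t
a_pred = [0.9, 0.4, 0.7, 0.6, 0.2]  # predicted probabilities for a_t = 1
print(roc_auc_score(a_true, a_pred))  # 1.0: all correct answers are ranked above incorrect ones
```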

Experimental Results and Discussion

[Table in the paper: AUC comparison of the DLKT models on the benchmark datasets]

The mainstream datasets used by each model differ, and many new structures (e.g., Bi-CLKT) have had a positive effect, improving on the limitations and problems of existing models.

5. Conclusions and future prospects

In the current environment of educational big data, the demand for teaching students according to their aptitude, together with the power of deep learning, has gradually made KT an important technology. This paper reviews previous work, divides DLKT models into four categories, and describes in detail the advantages and disadvantages of each model and the corresponding improvements for the first three problems. Among them, the long-term dependency problem is addressed by the self-attention mechanism, while interpretability remains a challenge for deep learning. The three main approaches to the lack-of-features problem (feature embedding, constrained loss functions, and new structures) each have their own advantages and disadvantages.

In addition, there are still several challenges:

  • The limitation of binary responses: subjective questions cannot be handled
  • It is difficult to introduce new learning features; users are required to extract, model, and provide the data
  • Improving the model's ability to identify the knowledge structure, and expanding the concept-relation graph into a full knowledge graph

The content of the article ends here. Personally, I feel this paper does not offer many innovative points (for a Zone-1 (Q1) journal, it seems a bit thin). Dividing DLKT into 4 categories is something I believe most researchers have already done tacitly. Compared with previous Chinese-language reviews, this article introduces the various models more completely and in more detail, but offers few extensions: only the most classic works are cited. The advantage is that beginners can understand it and get started faster. In addition, the comparison and analysis of the models reaches no precise quantitative or qualitative conclusions, there is no authoritative framework for comparison, and the content is relatively diffuse. Of course, opinions vary from person to person.
At the end of the article, knowledge tracing is again positioned as a key technology in intelligent tutoring systems. In the current educational big data environment, there are still many issues worth researching (improving predictive performance, better explainability, and addressing practical problems).

Origin blog.csdn.net/weixin_44546100/article/details/127758975