A survey on deep learning based knowledge tracing
The paper will be included in the journal Knowledge-Based Systems in October 2022 (see the original link).
Below, I abbreviate deep learning based knowledge tracing as DLKT.
Summary
This is a review article that evaluates previously proposed DLKT models. It covers:
- Fine-grained classification of technical methods proposed by mainstream DLKT models
- Detailed analysis of KT technology
- Analysis of technical solutions and main improvements of each model of DLKT
- Possible research areas of DLKT in the future
1. Introduction
The introduction moves from online learning to intelligent tutoring systems to KT. KT not only helps learners understand their own learning better, but also lets platforms and teachers understand it and intervene when needed. The paper then introduces mainstream KT, starting with BKT (its concept, variants, advantages, and disadvantages) and moving on to the advantages of DKT. Because DKT suffers from poor interpretability, a long-term dependency problem, and few learning features, variants and updates keep appearing. Previous reviews did not analyze in depth the unique contributions and improvement directions of each DLKT model. The paper's contributions are as follows:
- Proposes a clear taxonomy and compares the architecture, design, and knowledge representation of each model
- Studies and summarizes the models in depth on four datasets, comparing the performance of DLKT models
- Provides insights and discusses important issues and limitations for future research
2. Review of KT
Problem definition
The intelligent education system mainly has three parts: students, exercises (topics), and knowledge concepts (skills).
Knowledge tracing (KT) definition: given a student's history sequence $s=\{X_0,...,X_t\}=\{(e_0,a_0),...,(e_t,a_t)\}$, where each interaction $X_i$ pairs an exercise $e_i$ with the student's response $a_i$, predict the next interaction $X_{t+1}$ (more precisely, predict $a_{t+1}$).
Common paradigm
Different models and methods use almost the same training objective, the negative log-likelihood. The goal of KT is therefore to minimize the negative log-likelihood of the student's observed responses from time 1 to t, given the historical sequence, where $a_t$ and $\hat a_t$ denote the actual and predicted values, respectively.
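As a concrete illustration of this objective, the sketch below computes the negative log-likelihood for binary responses in its usual binary cross-entropy form, $L = -\sum_t \left(a_t \log \hat a_t + (1-a_t)\log(1-\hat a_t)\right)$. That the survey uses exactly this form is an assumption on my part; the function name `kt_nll` and the clipping constant `eps` are illustrative and do not come from the paper.

```python
import math

def kt_nll(actual, predicted, eps=1e-12):
    """Sum of binary negative log-likelihoods over a response sequence.

    actual:    list of 0/1 responses a_t
    predicted: list of predicted probabilities of a correct response
    eps:       hypothetical clipping constant that avoids log(0)
    """
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for a, p in zip(actual, predicted):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total -= a * math.log(p) + (1 - a) * math.log(1.0 - p)
    return total

# A confident correct prediction costs less than a confident wrong one.
good = kt_nll([1], [0.9])
bad = kt_nll([1], [0.1])
```

A lower value means the predicted probabilities agree better with the observed responses.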
3. Classification of DLKT models
These DLKT models are summarized in Table 1. By technique, they fall into four categories: DKT and its variants, memory-network-based models, attention-mechanism-based models, and graph-structure-based models.
DKT and its variants
DKT (2015) uses a sequence model (RNN, LSTM, or GRU) as its base model and converts each interaction $X_t$ into an input vector through one-hot encoding.
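To make the encoding step concrete, here is a minimal sketch of one convention that is common in the DKT literature: with M distinct exercises (or skills), the interaction $(e_t, a_t)$ becomes a vector of length 2M with a single 1 at index $e_t + a_t \cdot M$. The index layout and the name `encode_interaction` are assumptions for illustration; these notes do not state which layout the survey uses.

```python
def encode_interaction(exercise_id, correct, num_exercises):
    """One-hot encode an interaction (e_t, a_t) as a vector of length 2*M."""
    if not 0 <= exercise_id < num_exercises:
        raise ValueError("exercise_id out of range")
    if correct not in (0, 1):
        raise ValueError("correct must be 0 or 1")
    vec = [0] * (2 * num_exercises)
    # Assumed layout: the first M slots mark incorrect answers,
    # the last M slots mark correct answers.
    vec[exercise_id + correct * num_exercises] = 1
    return vec

# Two interactions over M = 3 exercises: exercise 0 answered correctly,
# then exercise 2 answered incorrectly.
history = [(0, 1), (2, 0)]
inputs = [encode_interaction(e, a, num_exercises=3) for e, a in history]
```

The resulting vectors would then be fed to the sequence model one time step at a time.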
The limitations of DKT are: 1. the hidden state $h_t$ represents the student's overall state; 2. it cannot model the connections between concepts; 3. it treats all exercises as equally important. The various extensions of DKT are shown in the table and are not described in detail here.
KT based on memory network
These models add an external memory structure to track complex concepts. The most classic is DKVMN (2017). Its core idea is a key matrix that stores the skill representations and a value matrix that stores the student's mastery of each skill. Prediction proceeds as follows: compute the attention weights between the exercise and the skills → compute the student's mastery of the exercise → combine the exercise difficulty with the mastery information → predict.
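The four prediction steps can be sketched in plain Python as follows. This is a simplified illustration, not the DKVMN reference implementation: the parameter names (`W_f`, `b_f`, `w_p`, `b_p`), the dimensions, and the random initial values are all hypothetical, and the memory write (update) step that follows each response is omitted.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dkvmn_read(k_t, key_memory, value_memory, W_f, b_f, w_p, b_p):
    # Step 1: attention weight between the exercise embedding and each skill slot.
    w = softmax([dot(key, k_t) for key in key_memory])
    # Step 2: mastery of this exercise = attention-weighted sum of the value slots.
    dim_v = len(value_memory[0])
    r_t = [sum(w[i] * value_memory[i][j] for i in range(len(w)))
           for j in range(dim_v)]
    # Step 3: combine mastery with the exercise embedding
    # (the embedding is assumed to carry the difficulty information).
    joint = r_t + k_t  # list concatenation
    f_t = [math.tanh(dot(row, joint) + b) for row, b in zip(W_f, b_f)]
    # Step 4: probability of a correct answer.
    p_t = 1.0 / (1.0 + math.exp(-(dot(w_p, f_t) + b_p)))
    return w, p_t

# Hypothetical sizes and untrained random parameters, for shape-checking only.
rng = random.Random(0)
num_slots, dim, hidden = 4, 3, 5
key_memory = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_slots)]
value_memory = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_slots)]
k_t = [rng.gauss(0, 1) for _ in range(dim)]
W_f = [[rng.gauss(0, 0.1) for _ in range(2 * dim)] for _ in range(hidden)]
b_f = [0.0] * hidden
w_p = [rng.gauss(0, 0.1) for _ in range(hidden)]
weights, p_correct = dkvmn_read(k_t, key_memory, value_memory, W_f, b_f, w_p, 0.0)
```

Because the attention weights sum to 1, they can be read as a soft assignment of the exercise to the latent skills.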
SKVMN uses a modified LSTM (Hop-LSTM) for sequential modeling; see the two papers for details.
KT based on attention mechanism
Because DKT lacks interpretability, these models build interpretability directly into the model structure. What they have in common is that an attention mechanism learns a weight for each exercise in the interaction history, indicating how important that exercise is for the prediction.
The most classic is SAKT, which applies the Transformer model to KT for the first time. Other variants are shown in the table; see their respective papers for details.
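The weighting idea can be illustrated with scaled dot-product attention, the building block of Transformer models. In this minimal sketch the query stands for the exercise to be predicted, and the keys and values stand for past interactions. The embeddings are made-up numbers, and a real model such as SAKT adds learned projections, positional information, and feed-forward layers that are not shown here.

```python
import math

def attention_weights(query, keys):
    """Softmax over scaled dot products between one query and all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return the attention weights and the weighted sum of the values."""
    w = attention_weights(query, keys)
    dim = len(values[0])
    context = [sum(w[i] * values[i][j] for i in range(len(values)))
               for j in range(dim)]
    return w, context

# Hypothetical embeddings: past interactions 0 and 2 resemble the query,
# interaction 1 does not.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[1.0], [0.0], [1.0]]
w, context = attend(query, keys, values)
```

The weights `w` are what makes the prediction inspectable: a larger weight means that past interaction contributed more to the prediction.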
KT based on graph structure
Because various relational patterns exist in KT, some studies use graph representation learning to capture these relations.
The most classic is the GKT model. Other variants are also shown in the table; see the respective papers for details.
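To give a rough idea of what learning on a concept graph involves, the sketch below runs one round of mean aggregation: each concept's state is averaged with the states of its neighbors. This is a generic message-passing step, not GKT's actual update rule, which uses learned functions; the concept names, states, and edges are invented for illustration.

```python
def propagate(states, neighbors):
    """One round of mean aggregation over a concept graph.

    states:    dict mapping concept name -> state vector (list of floats)
    neighbors: dict mapping concept name -> list of adjacent concept names
    """
    new_states = {}
    for concept, state in states.items():
        group = [state] + [states[n] for n in neighbors.get(concept, [])]
        dim = len(state)
        new_states[concept] = [sum(v[j] for v in group) / len(group)
                               for j in range(dim)]
    return new_states

# Hypothetical one-dimensional mastery states and a single edge.
states = {"addition": [1.0], "multiplication": [0.0], "geometry": [0.5]}
neighbors = {"addition": ["multiplication"], "multiplication": ["addition"]}
updated = propagate(states, neighbors)
```

After one round, evidence about "addition" has flowed to "multiplication", while the isolated "geometry" node keeps its state.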
4. Comparison and analysis of DLKT models
Data sets
Six data sets are commonly used in KT: A09, A12, A15, ASSISTChall, Statics2011, and the simulated data set Simulated-5. Their specific differences are shown in Table 2.
Evaluation metric
AUC (area under the ROC curve): the larger the value, the better the model's predictive ability.
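For readers new to the metric, AUC can be read as the probability that a randomly chosen correct response receives a higher predicted score than a randomly chosen incorrect one. The sketch below computes it directly from that pairwise definition. It is quadratic in the number of samples, so it is meant for illustration rather than for large data sets; the function name `auc` is my own.

```python
def auc(labels, scores):
    """AUC from its pairwise definition; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Three of the four (positive, negative) pairs are ranked correctly.
example = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

A value of 0.5 corresponds to random guessing and 1.0 to a perfect ranking.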
Experimental Results and Discussion
Each model performs differently across the mainstream data sets. Many new structures (for example, Bi-CLKT) have had a positive effect and have also mitigated the limitations and problems of existing models.
5. Conclusions and future prospects
Against the current background of big data in education, the demand for personalized teaching and the power of deep learning have gradually made KT an important technology. The paper reviews previous models, divides DLKT models into four categories, and describes in detail the advantages and disadvantages of each model, along with the corresponding improvements for the three problems of DKT (interpretability, long-term dependency, and lack of learning features). Of these, the dependency problem is addressed by the self-attention mechanism, while interpretability remains a challenge for deep learning in general. The three main approaches to the lack-of-features problem (embeddings, constrained loss functions, and new structures) each have their own advantages and disadvantages.
In addition, there are still several challenges:
- Responses are limited to binary (correct or incorrect) outcomes, so subjective questions cannot be handled
- New learning features are difficult to introduce, because users must extract and model them and provide the data
- The model's ability to identify the knowledge structure needs to improve, for example by expanding the knowledge-point connection graph into a knowledge graph
That concludes the content of the article. Personally, I feel it does not offer many innovations (for a paper in a Zone 1 journal, it seems a bit thin). It divides DLKT into four categories, a division I believe most researchers had already adopted tacitly. Compared with the earlier Chinese review, this article introduces the various models more completely and in more detail, but it adds few extensions and cites only the most classic cases. The advantage is that beginners can understand the field and get started faster. As for the comparison and analysis of the models, it reaches no exact quantitative or qualitative conclusions, offers no authoritative framework for comparison, and its content is relatively scattered. Of course, opinions vary from person to person.
At the end, the article also presents knowledge tracing as a key technology in intelligent tutoring systems. In the current educational big data environment, many issues remain worth studying (improving predictive performance, improving explainability, and solving practical deployment problems).