Detailed analysis and precautions on the meaning of CiteSpace keyword time zone map

[Figure: keyword time zone map drawn with CiteSpace]

One of the highlights of CiteSpace is that it brings the time dimension into the drawing of knowledge maps: the time span is first divided into slices, and the slices are then merged for analysis, as in the keyword time zone map. Some articles call this map a topic evolution map, which is not appropriate. The map essentially presents the evolution of keywords, not the evolution of topics.

Topic evolution should describe the relationships between topics, as in the scientific topic evolution diagram made with TE software, or the strategic coordinates (which classify topic types) made per time interval with ST software, as shown in the figure below.
[Figure: topic evolution diagram and strategic coordinates]

This article mainly explains the keyword time zone map drawn by CiteSpace, that is, how the time zone map of keywords is generated. Other advanced maps will be discussed later.

The data set in the picture above covers 1998-2018.

Circles

Each circle in the figure represents a keyword, placed at the year in which it first appeared in the analyzed data set [Note: its first appearance in this data set, not in all literature on the topic].

Once a keyword appears, it is fixed at the year of its first occurrence. Even if the keyword appears in later papers, it is not drawn again in later years; it is shown only at its earliest year.

If the keyword appears again in subsequent years, its frequency is increased by 1 at the position where it first appeared, once for every later occurrence. This explains why the keywords "data management" and "university library" have such large circles in 1998, even though very little literature was published that year: every later occurrence of "data management" and "university library" in the keywords of subsequent papers is accumulated at 1998.
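The placement rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as explained in this article, not CiteSpace's actual implementation: the function name `time_zone_nodes`, the `(year, keywords)` input format, and the sample papers are all assumptions made for the example, and the threshold clipping that CiteSpace applies to frequencies is ignored.

```python
from collections import Counter

def time_zone_nodes(papers):
    """Map each keyword to (first_year, total_frequency).

    papers: iterable of (year, [keywords]) tuples (hypothetical format).
    Every occurrence, whatever its year, is counted at the first year.
    """
    first_year = {}
    freq = Counter()
    for year, keywords in papers:
        for kw in set(keywords):  # count a keyword once per paper
            first_year[kw] = min(year, first_year.get(kw, year))
            freq[kw] += 1
    return {kw: (first_year[kw], freq[kw]) for kw in freq}

# Made-up sample data, for illustration only.
papers = [
    (1998, ["data management", "university library"]),
    (2008, ["data management", "scientific data management"]),
    (2012, ["data management", "university library"]),
]
nodes = time_zone_nodes(papers)
for kw, (year, count) in sorted(nodes.items()):
    print(year, count, kw)
```

In this toy data set "data management" is drawn at 1998 with a frequency of 3, although two of its three occurrences come from 2008 and 2012.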

Is this method reasonable?

If a keyword appears once in 1998, does not appear in the following years, and then appears 80 times in 2012, the software still attributes the keyword, with all 81 occurrences, to 1998. In such a case the result is clearly unreasonable. Of course, this is an extreme case, and it is unlikely to occur in practice.

This map only shows when the keywords of the target field first appeared, together with an overall view of the research hot spots (hot spots are indicated by keyword frequency, but the frequency counted by CiteSpace is the frequency after threshold clipping, not the total frequency; see the post: Detailed analysis and precautions on the meaning of CiteSpace keyword co-occurrence map).

This map cannot show roughly which years these hot spots (keywords) are concentrated in. If you need the average year of each research hot spot, CiteSpace currently cannot help; you need to draw the map with COOC or VOSviewer. COOC can also draw time zone maps; see below for details.
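To make the difference between "first year" and "average year" concrete, here is a minimal sketch that computes both for the extreme case described earlier (one occurrence in 1998, 80 occurrences in 2012). It assumes that "average year" means the mean publication year of the papers containing the keyword; the function name `keyword_years`, the input format, and the keyword "example keyword" are hypothetical, and this is not how COOC or VOSviewer are implemented internally.

```python
from collections import defaultdict

def keyword_years(papers):
    """Map each keyword to (first_year, average_year).

    papers: iterable of (year, [keywords]) tuples (hypothetical format).
    average_year is the mean publication year of the papers that
    contain the keyword (an assumption made for this sketch).
    """
    years = defaultdict(list)
    for year, keywords in papers:
        for kw in set(keywords):
            years[kw].append(year)
    return {kw: (min(ys), sum(ys) / len(ys)) for kw, ys in years.items()}

# One paper in 1998 and 80 papers in 2012 for the same keyword.
papers = [(1998, ["example keyword"])] + [(2012, ["example keyword"])] * 80
first, average = keyword_years(papers)["example keyword"]
print(first, round(average, 2))
```

The first year is 1998, while the average year is close to 2012, which is why a time zone map alone can misplace a hot spot in time.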

Lines

Circles represent keywords, and lines represent connections between keywords. However, the existence of lines in this figure is of little significance and is not the focus of our analysis.

The lines here are the co-occurrence relationships between keywords.

For example, if "data management" (first appearing in 1998) and "scientific data management" (first appearing in 2008) both appear in a paper published in 2008, then there is a connection between "data management" and "scientific data management", and the line runs from 1998 to 2008.

A connecting line indicates that two keywords appear in the same article or articles.
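The way lines arise from co-occurrence can be sketched as follows. This is a simplified illustration under the same assumptions as the description above, not CiteSpace's code: the function name `cooccurrence_links` and the `(year, keywords)` input format are hypothetical, and link weights and pruning are left out.

```python
from itertools import combinations

def cooccurrence_links(papers):
    """Map each co-occurring keyword pair to the first years of its two ends.

    papers: list of (year, [keywords]) tuples (hypothetical format).
    A pair is linked if both keywords appear in at least one paper.
    """
    first_year = {}
    for year, keywords in papers:
        for kw in keywords:
            first_year[kw] = min(year, first_year.get(kw, year))

    links = {}
    for _, keywords in papers:
        # sorted() gives each pair a stable (a, b) order
        for a, b in combinations(sorted(set(keywords)), 2):
            links[(a, b)] = (first_year[a], first_year[b])
    return links

# Made-up sample data, for illustration only.
papers = [
    (1998, ["data management"]),
    (2008, ["data management", "scientific data management"]),
]
links = cooccurrence_links(papers)
print(links)
```

Here the single 2008 paper produces one line whose ends sit at 1998 and 2008, matching the example in the text.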

Summary:

Each time slice in the time zone map shows all keywords that are new in that slice. If a new keyword appears in the same article as an earlier keyword, the two are connected with a line, the frequency of the earlier keyword increases by 1, and its circle grows. This is how the map is generated. The map does reflect changes in research paths as a whole, but to reflect path changes more comprehensively you need to combine it with keyword-weighted time zone maps, year-by-year attention changes, year-by-year growth rate changes, and time-weighted research hot spot changes. We can also count keyword trends year by year to reflect changes in research hot spots, as in the keyword evolution chart drawn by SE software.

Problem 1

One problem with the time zone map drawn by CiteSpace is that each time interval cannot display too many keywords, otherwise the map becomes very messy. For example, the map at the beginning of this article looks quite good, but the number of keywords displayed in each time interval is limited (PS: this picture has been taken by many people and used for their own publicity). In particular, the newest keywords cannot be displayed because their frequency is relatively low, which makes it impossible for us to mine the latest research frontier.

As mentioned above, the lines in the time zone map are of little significance and are not the focus of our analysis. We can therefore use the time zone map function of COOC instead. Although the time zone map made by COOC is not as good-looking as the one made by CiteSpace, it is better in terms of the number of keywords shown per year and the display of the newest keywords, as shown in the figure below.

[Figure: keyword time zone map drawn with COOC]

This graph is also a keyword time zone graph, but it can comprehensively reflect more keywords and the latest keywords, not just those high-frequency keywords.

Problem 2

There is also a very serious problem that many CiteSpace novices, and even veterans, are not aware of, and it has caused errors in many published papers: drawing maps directly from uncleaned data leads to errors in the first-appearance year of keywords.

With the spread of online-first publishing, many of the latest papers lack year information, and CiteSpace assigns papers with a missing year to 1900 by default, which causes the error.
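Before drawing any map, it is worth checking how many records lack a usable year. The sketch below is a hypothetical pre-check, not part of CiteSpace or COOC: it assumes each record has already been parsed into a dict with `title` and `year` fields, and the plausibility bounds `min_year` and `max_year` are arbitrary values chosen for the example.

```python
def split_by_year(records, min_year=1950, max_year=2100):
    """Separate records with a plausible year from those without one.

    records: list of dicts with a "year" field (hypothetical format).
    Empty, missing, non-numeric, or out-of-range years (such as a
    default of 1900) are reported so they can be fixed by hand.
    """
    ok, missing = [], []
    for rec in records:
        year = str(rec.get("year") or "").strip()
        if year.isdigit() and min_year <= int(year) <= max_year:
            ok.append(rec)
        else:
            missing.append(rec)
    return ok, missing

# Made-up sample records, for illustration only.
records = [
    {"title": "A", "year": "2021"},
    {"title": "B", "year": ""},    # online-first record without a year
    {"title": "C"},                # year field absent
    {"title": "D", "year": 1900},  # suspicious default value
]
ok, missing = split_by_year(records)
print([r["title"] for r in missing])
```

Records reported as missing should have their year filled in from the publication information before the data is loaded into the mapping software.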

Next, let's first understand the online-first publishing model. The impact of online-first publication on bibliometrics should not be underestimated!

When can a paper be published online first?

Online-first papers are recognized as officially published papers. After review by the editorial department and by the electronic journal "Chinese Academic Journals (CD Version)", a paper can be published online in advance on China National Knowledge Infrastructure (CNKI). An example is shown below:

[Figure: example of an online-first paper on CNKI]

What are the benefits of online-first publication?

The publication time is not limited by the schedule of the paper journal, and the publishing capacity is no longer constrained by traditional paper journals. This facilitates the rapid dissemination and use of research results.

What is the impact of online first publication on bibliometrics?

【1】Duplication problem

Sometimes the same article in CNKI has both an "online first" record and a formally published record, which leads to double counting in bibliometric analysis, and existing software cannot remove the duplicates.

【2】Time problem

The bibliographic records of online-first papers carry no year, which leads to errors in bibliometric analysis. Existing software, except COOC, cannot solve this.

When doing bibliometric analysis, you must pay attention to the two issues above, otherwise serious errors will occur. For example, because online-first records lack a year, CiteSpace defaults documents first published online in 2022 to 1900, and VOSviewer does not take this issue into account when doing time-based keyword analysis. In addition, neither of these two programs can remove duplicates.

Many bibliometric articles (including published ones) fail to pay attention to these two points, and their authors do not realize that the analysis they are doing is actually wrong.

Solutions to the above two problems:

(1) Use the latest version of COOC software to remove duplicates.

(2) Use the latest version of COOC software to extract and add time.
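For readers who want to see what the deduplication step amounts to, here is a minimal sketch. It is not COOC's method, which this article does not describe: it simply assumes that two records are duplicates when their titles match after normalizing whitespace and case, and it keeps the copy that carries a year. The function names and the `title` and `year` fields are hypothetical, and matching on title alone may merge distinct papers that share a title.

```python
import re

def normalize_title(title):
    """Collapse whitespace and ignore case so near-identical titles match."""
    return re.sub(r"\s+", " ", title).strip().casefold()

def deduplicate(records):
    """Keep one record per title, preferring the copy that has a year.

    records: list of dicts with "title" and "year" (hypothetical format).
    """
    best = {}
    for rec in records:
        key = normalize_title(rec["title"])
        kept = best.get(key)
        if kept is None or (not kept.get("year") and rec.get("year")):
            best[key] = rec
    return list(best.values())

# Made-up sample records, for illustration only.
records = [
    {"title": "Research Data  Management", "year": ""},      # online-first copy
    {"title": "research data management", "year": "2022"},   # formal copy
    {"title": "Another Paper", "year": "2021"},
]
unique = deduplicate(records)
print(len(unique), unique[0]["year"])
```

Of the two copies of the first paper, only the one with a year is kept, so the record count drops from 3 to 2.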

Last and most important:

For the five major issues in the bibliometric data preprocessing stage, see the post: Detailed analysis and precautions on the meaning of CiteSpace keyword co-occurrence map

The software used for bibliometrics in the future should be COOC+CiteSpace or COOC+VOSviewer. If you want to make a better-looking network map, you also need to combine it with NSS software. If you have some text data but want to use CiteSpace, VOSviewer and other software to draw graphs, then you need to combine it with TM text mining software.

Origin blog.csdn.net/qq_39974284/article/details/126493153