[Exquisite] NLP (Natural Language Processing) Learning Route (Knowledge System)

Currently, the powerful dialogue, question-answering, and text-generation capabilities of large-scale pre-trained language models have pushed the research and application of natural language processing (NLP) into a new wave of enthusiasm. NLP is a cutting-edge field at the intersection of computer science, artificial intelligence, and linguistics. Its scope of application and research is very wide, yet I personally have not found a particularly good, detailed, and systematic overview document.

This article briefly summarizes an NLP learning route organized around the main subfields of natural language processing and the main tasks they contain; it can also be read as a knowledge system. My subsequent technical articles will focus mainly on NLP and will generally follow this route, recording the relevant basic knowledge, methods, technologies, tools, and practical cases. This article is therefore not only a learning route but also my personal study plan for a long time to come.

If you are also interested in NLP, follow "Jiumozhai" for more exciting content: WeChat public account [Jiumozhai], Tencent Cloud Developer Community, CSDN, and my personal blog (www.jiumoz.com).

The fishbone diagram below is my personal NLP learning route; in a sense, it can be understood as a knowledge system. This article will try to briefly describe these basic concepts with examples.

In fact, there are many articles on the "NLP knowledge system" or "NLP learning route"; a quick online search turns up plenty. However, much of that content is developed from an academic perspective and does not fit well with my own understanding and plans, so I summarized the simple structure below. Of course, this is not a fully complete system, and the assignment of many subtasks to categories is not especially rigorous.

[Figure: fishbone diagram of the NLP learning route / knowledge system]

Brief overview

NLP concept

NLP, short for natural language processing, is a discipline that studies effective communication between humans and computers using natural language. It integrates theories and methods from linguistics, computer science, mathematics, and other disciplines. An NLP system typically first receives information expressed by humans in natural language, then translates and converts it, often with probability-based algorithms, and finally outputs results that the computer can understand and act on. The two core tasks of NLP are natural language understanding (NLU) and natural language generation (NLG). Natural language understanding enables computers to understand and interpret human language, while natural language generation converts data in non-linguistic formats into human language, achieving the goal of human-computer communication.

NLP development history

The development of natural language processing (NLP) can be traced back to the 1950s and has gone through multiple stages of technological innovation. The following is an outline of its development:

  1. 1950s-1960s: Rule-oriented period

    During this period, NLP mainly adopted rule-based approaches, implementing text analysis and processing through manually written rules and grammars. Early landmark systems include Shannon's maze-solving mouse (1950) and Weizenbaum's ELIZA dialogue system (mid-1960s), both of which relied on hand-crafted rules and pattern matching to produce their output.

  2. 1970s-1980s: Statistical analysis and corpus period

    With the development of computer technology, researchers began to use large-scale text corpora and statistical analysis methods, the so-called "learning from data". During this period, corpus-based and statistical learning methods gradually gained ground, laying the groundwork for important techniques such as hidden Markov models and, later, maximum entropy models and conditional random fields.

  3. 1990s: Knowledge-based and mixed methods period

    During this period, researchers combined rule-based and statistics-based approaches into hybrid methods. In addition, machine learning methods such as artificial neural networks and support vector machines gained traction. At the same time, the construction and sharing of annotated corpora, such as the Penn Treebank and WordNet, became an important trend in NLP.

  4. 2000s: The era of deep learning

    With the rise of deep learning technology, NLP entered a new period of development. Deep learning can automatically learn features and patterns and solve multiple tasks in an end-to-end manner, such as text classification, sentiment analysis, machine translation, and question answering. Important deep learning architectures include the convolutional neural network (CNN), the long short-term memory network (LSTM), and, from 2017, the Transformer.

  5. 2010s to present: Pre-training and contextual understanding period

    During this period, researchers found that pre-trained models could significantly improve performance on NLP tasks. These models are usually pre-trained on large-scale unsupervised corpora and then fine-tuned for specific tasks. In addition, contextual understanding has become an important research direction, out of which important models and techniques such as BERT and GPT have emerged.

Main subfields of NLP

This section only briefly records the concept of each subtask; for more detail, look out for the dedicated follow-up post on each task.

Text preprocessing

  1. Text Cleaning

    Text cleaning refers to processing original text to remove noise and irrelevant information. Common cleaning operations include removing HTML tags, special characters, punctuation marks, extra spaces, etc. The purpose of text cleaning is to provide cleaner and more standardized data for subsequent processing.

    Example: Suppose we have the following raw text:

    <div class="intro">Natural language processing (NLP) is a field of artificial intelligence which</div> focuses on enabling computers to understand and interpret human language.
    

    We can perform the following text cleaning operations:

    • Remove HTML tags: Natural language processing (NLP) is a field of artificial intelligence which focuses on enabling computers to understand and interpret human language.
    • Delete the content inside the brackets: Natural language processing is a field of artificial intelligence which focuses on enabling computers to understand and interpret human language.
    • Convert uppercase letters to lowercase: natural language processing is a field of artificial intelligence which focuses on enabling computers to understand and interpret human language.
    • Remove the period at the end of the sentence: natural language processing is a field of artificial intelligence which focuses on enabling computers to understand and interpret human language
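
    As a minimal sketch of these steps (assuming only Python's built-in re module; the exact rules would of course vary by corpus), the cleaning might look like this:

    import re

    raw = ('<div class="intro">Natural language processing (NLP) is a field of '
           'artificial intelligence which</div> focuses on enabling computers '
           'to understand and interpret human language.')

    text = re.sub(r'<[^>]+>', '', raw)        # remove HTML tags
    text = re.sub(r'\([^)]*\)', '', text)     # delete content inside brackets
    text = text.lower()                        # convert to lowercase
    text = text.rstrip('.')                    # remove the period at the end
    text = re.sub(r'\s+', ' ', text).strip()   # collapse extra spaces
    print(text)
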
  2. Tokenization

    Tokenization refers to the process of splitting text into discrete words or tokens. Tokenization (word segmentation) is one of the most important steps in NLP, as it provides the basic units for subsequent processing.

    Example: Suppose we have the following raw text:

    I love natural language processing.
    

    We can perform the following word segmentation operations:

    • Split text into words: ["I", "love", "natural", "language", "processing"]
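
    A quick sketch with NLTK, one common choice (the punkt tokenizer data is downloaded once; note that, unlike the simple split above, the period is kept as its own token):

    import nltk
    nltk.download('punkt', quiet=True)  # one-time download of tokenizer data
    from nltk.tokenize import word_tokenize

    tokens = word_tokenize("I love natural language processing.")
    print(tokens)  # ['I', 'love', 'natural', 'language', 'processing', '.']
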
  3. Stopword Removal

    Stop word removal refers to removing common words that carry little practical meaning or occur very frequently, such as prepositions, articles, and conjunctions. These words are often uninformative for many tasks and are therefore removed.

    Example: Suppose we have the following raw text:

    I love natural language processing.
    

    We can perform the following operations to remove stop words:

    • Remove stop words: ["love", "natural", "language", "processing"]

    Common stop word lists can be obtained through open source NLP tool libraries (such as NLTK).
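
    A minimal sketch using NLTK's English stop word list (the stopwords corpus is downloaded once):

    import nltk
    nltk.download('stopwords', quiet=True)  # one-time download of the stop word list
    from nltk.corpus import stopwords

    stop_words = set(stopwords.words('english'))
    tokens = ["I", "love", "natural", "language", "processing"]
    filtered = [t for t in tokens if t.lower() not in stop_words]
    print(filtered)  # ['love', 'natural', 'language', 'processing']
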

  4. Removing Low-Frequency Words

    Removing low-frequency words refers to removing words that appear only rarely in the entire corpus, which reduces noise and the dimensionality of the data. Generally, a frequency threshold is set, and words that appear fewer times than the threshold are removed.

    Example: Suppose our corpus consists of the following three sentences:

    I love natural language processing. I love machine learning. Natural language is fun.
    

    If we set the threshold to 2, the words that appear at least twice (I, love, natural, language) are retained, while the words that appear only once (processing, machine, learning, is, fun) are removed.
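
    A minimal sketch using Python's built-in collections.Counter (the tiny corpus above, lowercased and with punctuation stripped for simplicity):

    from collections import Counter

    corpus = ["i love natural language processing",
              "i love machine learning",
              "natural language is fun"]
    counts = Counter(word for sentence in corpus for word in sentence.split())

    threshold = 2
    kept = [word for word, count in counts.items() if count >= threshold]
    print(kept)  # ['i', 'love', 'natural', 'language']
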

  5. Building Vocabulary

    Building a vocabulary assigns a unique index to every word in the text for subsequent processing and representation. A vocabulary is generally built by scanning the entire corpus and associating each word with a unique integer identifier.

    Example: Suppose we have the following raw text:

    I love natural language processing.
    

    We can build a dictionary as follows:

    • Vocabulary built: {"I": 0, "love": 1, "natural": 2, "language": 3, "processing": 4}
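
    A tiny sketch in plain Python (dict.fromkeys removes duplicate tokens while preserving their order):

    tokens = ["I", "love", "natural", "language", "processing"]
    vocab = {word: index for index, word in enumerate(dict.fromkeys(tokens))}
    print(vocab)  # {'I': 0, 'love': 1, 'natural': 2, 'language': 3, 'processing': 4}
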
  6. Word vector representation (Word Vector)

    Word vector representation is the process of converting the words in a text into numerical vectors. Common methods include one-hot encoding, the bag-of-words model, term frequency-inverse document frequency (TF-IDF), and word embeddings.

    Example: Suppose we have the following raw text:

    I love natural language processing.
    

    We can perform the following word vectorization operations:

    • Word vector representation (one-hot): [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]

    Specifically, one-hot encoding sets the position of each word to 1 and all other positions to 0; the bag-of-words model records the number of occurrences of each word; TF-IDF weighs a word's frequency within a document against its frequency across the entire corpus; and word embeddings map words into a vector space in which similar words lie close together.
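
    A short sketch of the counting-based representations with scikit-learn, one common choice (the two-document corpus here is made up for illustration):

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    docs = ["I love natural language processing.",
            "I love machine learning."]

    bow = CountVectorizer()              # bag-of-words: raw occurrence counts
    print(bow.fit_transform(docs).toarray())
    print(bow.get_feature_names_out())   # default settings lowercase and drop 1-char tokens

    tfidf = TfidfVectorizer()            # TF-IDF: counts weighted by corpus rarity
    print(tfidf.fit_transform(docs).toarray())
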

Lexical analysis

  1. Named Entity Recognition

    Named entity recognition is a text processing task that is used to identify named entities with special meaning in text, such as names of people, places, organizations, etc. This helps us extract key information from text and understand the relationships between entities.

    Example: Suppose we have the following text:

    Apple Inc. was founded in 1976; its founders were Steve Jobs, Steve Wozniak, and Ronald Wayne.
    

    In named entity recognition, we can identify the following categories of named entities in the text:

    • Organization: Apple Inc.
    • Person: Steve Jobs, Steve Wozniak, Ronald Wayne

    Through named entity recognition, we can identify important entity information in text.
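
    A short sketch with spaCy, one common NER library (it assumes the small English model has been installed once via python -m spacy download en_core_web_sm):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Apple Inc. was founded in 1976 by Steve Jobs, "
              "Steve Wozniak and Ronald Wayne.")

    for ent in doc.ents:
        # prints each recognized entity with its label,
        # e.g. 'Apple Inc.' ORG, '1976' DATE, 'Steve Jobs' PERSON
        print(ent.text, ent.label_)
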

  2. Stemming and Lemmatization

    Stemming and lemmatization are processes that convert words into their stems or base forms, eliminating the impact of different word forms on text analysis. Stemming is a relatively crude method that simply strips a word down to its base part, while lemmatization also takes factors such as context and part of speech into account.

    Example: Suppose we have the following text:

    Cats are running in the park, and they love to play with mice.
    

    In stemming and lemmatization, we can process words in the text as follows:

    • Stemming results: cat, are, run, in, the, park, and, they, love, to, play, with, mice
    • Lemmatization results: cat, be, run, in, the, park, and, they, love, to, play, with, mouse

    Through stemming and lemmatization, we can unify words of different morphological forms into their basic forms, reducing noise and redundancy in text.
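
    A minimal sketch with NLTK (the wordnet data for the lemmatizer is downloaded once):

    import nltk
    nltk.download('wordnet', quiet=True)  # one-time download for the lemmatizer
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    print(stemmer.stem("running"))                   # 'run'   (affix stripped)
    print(lemmatizer.lemmatize("mice"))              # 'mouse' (dictionary base form)
    print(lemmatizer.lemmatize("running", pos="v"))  # 'run'   (needs the verb POS hint)
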

  3. Part-of-Speech Tagging

    Part-of-speech tagging is to assign a part-of-speech tag to each word in the text, which is used to indicate the grammatical role of the word in the sentence. The annotated parts of speech include nouns, verbs, adjectives, etc., which play an important role in grammatical analysis and other NLP tasks.

    Example: Suppose we have the following text:

    Cats are running in the park.
    

    Part-of-speech tagging can assign the following tags to each word in a sentence:

    • Noun: Cats, park
    • Verb: are, running
    • Preposition: in
    • Article: the

    Through part-of-speech tagging, you can better understand the grammatical structure of the sentence and the role that words play in the sentence.
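
    A quick sketch with NLTK's off-the-shelf tagger (the tokenizer and tagger data are downloaded once; the tags are Penn Treebank-style):

    import nltk
    nltk.download('punkt', quiet=True)
    nltk.download('averaged_perceptron_tagger', quiet=True)

    tokens = nltk.word_tokenize("Cats are running in the park.")
    print(nltk.pos_tag(tokens))
    # [('Cats', 'NNS'), ('are', 'VBP'), ('running', 'VBG'),
    #  ('in', 'IN'), ('the', 'DT'), ('park', 'NN'), ('.', '.')]
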

Syntax analysis

  1. Syntax tree parsing

    Syntax tree parsing (constituency parsing) is the process of analyzing a sentence into a tree structure in which internal nodes represent phrases, leaves represent words, and edges represent the grammatical relationships between them. We can build a syntax tree by splitting the sentence step by step until we reach the smallest phrases and individual words.

    For example, consider the sentence: "The cat is sitting on the mat." Through syntax tree parsing, we can generate the following constituency tree:

                      S
                    /   \
                  NP     VP
                 /  \   /  \
              The  cat is   VP
                           /  \
                     sitting   PP
                              /  \
                             on   NP
                                 /  \
                               the  mat
    

    In the syntax tree, the topmost node S is the root node of the entire sentence; internal nodes such as NP (noun phrase), VP (verb phrase), and PP (prepositional phrase) represent phrases; and the individual words are the leaves. For example, "on the mat" forms a prepositional phrase that attaches to the verb phrase headed by "sitting". Through syntax tree parsing, we can clearly see the hierarchical relationships and structure among the words.
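
    As a small sketch (assuming the NLTK library), we can build and display this constituency tree programmatically:

    from nltk import Tree

    # bracketed string using Penn Treebank-style labels
    t = Tree.fromstring(
        "(S (NP (DT The) (NN cat))"
        " (VP (VBZ is) (VP (VBG sitting) (PP (IN on) (NP (DT the) (NN mat))))))")
    t.pretty_print()  # draws the tree as ASCII art
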

  2. Dependency analysis

    Dependency analysis is the process of describing the dependencies between the words in a sentence. Each word is viewed as a node, and edges represent dependencies between words, that is, one word modifies or is subordinate to another.

    Taking the sentence "The cat is sitting on the mat." as an example, dependency analysis yields (in the Universal Dependencies style) the following dependency tree:

                  sitting
                 /   |   \
              cat    is   mat
               |           |
              The         on
    

    In this dependency tree, each word is a node and each edge is a dependency relation: the verb "sitting" is the root of the sentence; "cat" depends on "sitting" as its subject (nsubj); "is" depends on "sitting" as an auxiliary (aux); "mat" depends on "sitting" as an oblique modifier (obl), with the preposition "on" attached to "mat" as a case marker (case); and the article "The" depends on "cat" as its determiner (det).

    Through dependency analysis, we can better understand the modification and subordination relationships between the words in a sentence, which helps us interpret and understand its grammatical structure.
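
    A brief sketch with spaCy, one common dependency parser (assumes the en_core_web_sm model is installed; note that spaCy's label set differs slightly from Universal Dependencies, e.g. it uses prep/pobj for prepositions):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The cat is sitting on the mat.")

    for token in doc:
        # each token points to its syntactic head with a dependency label
        print(f"{token.text:10} {token.dep_:8} head={token.head.text}")
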

Semantic analysis

  1. Text Clustering

    Text clustering is the process of dividing a set of texts into different clusters so that texts within the same cluster share similar characteristics or topics. The goal of text clustering is to discover latent structures or relationships hidden in text data without requiring prior labels. Commonly used methods include hierarchical clustering, k-means clustering, and spectral clustering.

    Suppose we have a set of news articles covering different topics such as sports, technology, politics, etc. We can use text clustering algorithms, such as k-means clustering, to cluster these articles. By calculating the similarity between articles, articles with similar topics are grouped into one category. For example, all sports articles are grouped into one cluster, technology articles are grouped into another cluster, and so on.
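
    A toy sketch with scikit-learn (the four made-up snippets stand in for real articles; with so little text the clusters are only suggestive):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    docs = ["the team won the football match",
            "a new smartphone chip was announced",
            "the football striker scored two goals",
            "the smartphone maker released a faster chip"]

    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(labels)  # ideally groups the two sports texts and the two tech texts
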

  2. Text Classification

    Text classification is the task of automatically assigning unseen text to predefined categories. The goal is to train a classifier that learns the relationship between text features and categories and accurately classifies new texts. Commonly used methods include Naive Bayes, support vector machines (SVM), and deep learning models such as convolutional neural networks (CNN) and recurrent neural networks (RNN).

    Suppose we want to classify a set of movie reviews into positive, negative, and neutral reviews. We can use text classification algorithms such as Naive Bayes classifier based on machine learning. By learning from the annotated training data, the classifier is able to classify comments into appropriate categories based on their characteristics. Then, for unlabeled comments, we can use this classifier to classify them and determine their evaluation type.
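
    A minimal sketch of a Naive Bayes classifier with scikit-learn (the four labeled reviews are made up; real training sets are far larger):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    train_texts = ["great movie, loved it", "terrible plot, boring",
                   "wonderful acting", "waste of time"]
    train_labels = ["positive", "negative", "positive", "negative"]

    clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
    clf.fit(train_texts, train_labels)
    print(clf.predict(["what a wonderful film"]))  # likely ['positive']
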

  3. Topic Modeling

    Topic modeling is a method for discovering hidden topics in text data. It performs statistical analysis of the vocabulary in a document collection to infer the word distribution of each topic and the probability that each article belongs to each topic. Commonly used topic models include Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA).

    Let's say we have a set of news articles and we want to understand the topics in these articles. By applying a topic model (such as LDA), we can discover the distribution of words for each topic and the probability of each article belonging to each topic. In this way, we can judge whether each topic is relevant to a certain field or topic based on its characteristics.
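
    A small sketch of LDA with scikit-learn (the four made-up documents are only illustrative; real topic models need much more text):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = ["the election and the new government policy",
            "the team won the championship game",
            "parliament passed the budget vote",
            "the coach praised the players after the match"]

    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    words = vec.get_feature_names_out()
    for i, topic in enumerate(lda.components_):
        top = [words[j] for j in topic.argsort()[-4:]]  # top words per topic
        print(f"topic {i}: {top}")  # one topic should lean political, one sports
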

  4. Sentiment Analysis

    Sentiment analysis is the task of analyzing the emotional tendency of a text, that is, determining whether the text expresses a positive, negative, or neutral sentiment. The goal of sentiment analysis is to identify and quantify sentiment polarity in text. Commonly used methods include rule-based methods, machine learning methods (such as Naive Bayes and support vector machines), and deep learning models (such as recurrent neural networks and the Transformer).

    Suppose we have a set of user reviews on social media and we want to understand users’ emotional tendencies towards a certain product. Through sentiment analysis, we can analyze the frequency of emotional words in comments and determine the emotional tendency of the comments based on the polarity of the emotional words. For example, if the frequency of positive emotion words in a review is high, we can judge that the review is a positive emotion.
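
    A quick lexicon-based sketch using NLTK's VADER analyzer (the lexicon is downloaded once; scores shown are indicative):

    import nltk
    nltk.download('vader_lexicon', quiet=True)  # one-time download of the VADER lexicon
    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()
    print(sia.polarity_scores("I absolutely love this product!"))
    # e.g. {'neg': 0.0, 'neu': ..., 'pos': ..., 'compound': 0.7} -> positive overall
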

  5. Word Sense Disambiguation

    Word sense disambiguation refers to the task of determining the exact meaning of a word in a given context. Since many words have multiple meanings, inferring the correct sense from context is important for understanding text. Commonly used methods include dictionary-based, context-based, and semantic-similarity-based methods.

    Suppose we have the sentence: "He sat by the bank and watched the water flow." The word "bank" here may refer to a financial institution or to the side of a river. Through word sense disambiguation, contextual clues such as "watched the water flow" let us determine that "bank" here means the riverbank.
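
    A minimal sketch of the dictionary-based approach using NLTK's Lesk algorithm (wordnet and punkt data downloaded once; the chosen sense depends on word overlap with each dictionary definition):

    import nltk
    nltk.download('punkt', quiet=True)
    nltk.download('wordnet', quiet=True)
    from nltk.wsd import lesk

    sentence = "He sat by the bank and watched the water flow."
    sense = lesk(nltk.word_tokenize(sentence), "bank", pos="n")
    print(sense, "-", sense.definition())
    # e.g. Synset('bank.n.01') - sloping land (especially beside a body of water)
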

Information extraction

Information extraction plays an important role in the construction and maintenance of knowledge graphs.

Knowledge graph is a way to store and represent knowledge in a graph structure, using nodes and edges to represent entities and relationships between entities.

Information extraction can help automatically extract structured knowledge from text and fill it into the knowledge graph.

  1. Entity extraction

    Entity extraction refers to identifying and extracting named entities of a specific type or category from given text. Named entities can be people, places, organizations, dates, times, currencies, products, etc. The goal of the entity extraction task is to locate and label these entities in text.

    Suppose we have the text of a news report: "Google is headquartered in Silicon Valley, California, and was founded in 1998." For the entity extraction task, our goal is to identify two entities in the text: Google (an organization) and Silicon Valley, California (a location).

  2. Relation extraction

    Relation extraction refers to extracting relationships or interactions between different entities from text. These relationships can be predefined or customized for specific contexts and tasks. The goal of the relationship extraction task is to identify and capture the relationships between entities and represent them in a structured form.

    Continuing with the same news text: "Google is headquartered in Silicon Valley, California, and was founded in 1998." For the relation extraction task, our goal is to identify the headquarters-location relation between Google and Silicon Valley (located_in).

  3. Event extraction

    Event extraction refers to extracting information describing events or actions from text. It involves identifying the event trigger (trigger word) in the text as well as the participants, time, place and other elements related to the event. The goal of the event extraction task is to analyze and summarize events to gain an understanding of the event content in the text.

    Suppose we have this sentence: "John went to a restaurant yesterday and ordered a pizza." For the event extraction task, our goal is to identify the trigger word "went" and extract the related participants (John, a restaurant), the time (yesterday), and the action (going to a restaurant).

Machine translation

  1. Transfer learning

    Transfer learning refers to a machine learning method that applies knowledge or experience learned on one task to another related task. In machine translation, transfer learning can improve the quality and effect of translation by training a model on a source language-target language translation task and transferring the learned knowledge to translation tasks of other language pairs.

    Suppose we have trained a neural network-based machine translation model on the English-French translation task and achieved good results. Now we want to achieve good performance on the English-German translation task. Through transfer learning, we can use the trained model as the initial model to fine-tune the English-German translation task to take advantage of existing knowledge and experience.

  2. Evaluation methods

    Evaluation methods measure the quality of a machine translation system's output. They usually include automatic evaluation and human evaluation. Automatic evaluation uses metrics computed by algorithms to measure the gap between the translation and reference translations; in human evaluation, human evaluators score the translations on criteria such as semantic accuracy and fluency.

    Commonly used automatic evaluation metrics include BLEU (bilingual evaluation understudy, based on n-gram overlap with reference translations), METEOR (which also considers stemming, synonyms, and word alignment), and TER (translation edit rate). Human evaluation usually relies on professional translators conducting reviews or on questionnaires collecting user feedback.
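
    A small sketch of automatic evaluation with NLTK's BLEU implementation (the reference and candidate here are made up; smoothing avoids zero scores on short sentences):

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = [["the", "cat", "is", "on", "the", "mat"]]
    candidate = ["the", "cat", "sat", "on", "the", "mat"]

    smooth = SmoothingFunction().method1
    print(sentence_bleu(reference, candidate, smoothing_function=smooth))
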

  3. Neural machine translation

    Neural machine translation (NMT) is a method of implementing machine translation with neural network models. It takes source-language sentences as input and directly generates target-language sentences as output, without the intermediate representations required by traditional rule-based or feature-based methods.

    In neural machine translation, an encoder-decoder structure is usually used, where the encoder encodes the source language sentence into a fixed-length vector representation, and the decoder generates the target language sentence based on this vector. Through massive parallel processing and end-to-end training, neural machine translation has achieved good translation results on some language pairs.
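
    Purely as an illustrative sketch of using a pre-trained NMT model (assuming the Hugging Face transformers library; the pipeline downloads a default pre-trained model on first use):

    from transformers import pipeline

    translator = pipeline("translation_en_to_fr")  # loads a default pre-trained model
    result = translator("The cat is sitting on the mat.")
    print(result[0]["translation_text"])  # e.g. "Le chat est assis sur le tapis."
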

  4. Statistical machine translation

    Statistical machine translation (SMT) is a machine translation method based on probability and statistical modeling. It uses large-scale bilingual corpora and statistical models to establish the mapping relationship between the source language and the target language.

    In statistical machine translation, common models include phrase-based models and syntax-based models. The phrase-based model divides the input sentence into several phrases, and then translates and reorganizes it; while the syntax-based model uses structural information such as syntax trees for translation. Statistical machine translation has been the mainstream method in the field of machine translation for the past few decades, but has been gradually replaced by neural machine translation in recent years.

Question answering systems

  1. Retrieval-based Q&A

    Retrieval-based Q&A quickly searches a pre-built knowledge base or text corpus for an answer matching the user's question and returns it to the user. Specific retrieval algorithms and query expressions are typically used to match questions to answers.

    Suppose there is a question: "Where is the capital of China?" For this question, we can search in the pre-constructed geographical knowledge base and return the correct answer "Beijing" by searching for the corresponding entity of "Capital of China".

  2. Generative Q&A

    Generative Q&A uses natural language processing technology to generate an answer from knowledge bases and other data sources based on the user's question and return it to the user. It usually requires deep learning and optimization across multiple modules, such as natural language understanding, language generation, and entity recognition.

    Suppose there is a question: "Have you met Cristiano Ronaldo?" For this question, we need to identify the entity "Cristiano Ronaldo" in the question and obtain relevant information from multiple data sources to finally generate the answer "Yes, he is a professional football player who has played for many big clubs such as Real Madrid and Manchester United."

  3. Knowledge graph

    Knowledge graph refers to a knowledge base that graphically represents various entities, their attributes, relationships, and other semantic information. It is one of the core technologies for large-scale semantic web applications and is commonly used in natural language processing, semantic search and other fields.

    Suppose there is a question: "What is the historical origin of the marathon?" For this question, we can search for the entity "marathon" from the knowledge graph, obtain the attribute and relationship information of the entity, and then answer the question.

  4. Dialogue system

    A dialogue system is a system that converses with users in natural language and provides corresponding services and solutions based on the information and context the user provides. It usually requires technologies spanning natural language understanding, dialogue management, and natural language generation.

    Suppose a user needs to query travel information. The dialogue system can determine the user's needs by interacting with the user. For example, through intent recognition, it can determine the destination, time and accommodation information the user wants to query, and then return the corresponding travel plan and booking service.

Text generation

  1. Machine creation

    Machine creation refers to the use of machine learning and natural language processing technology to enable machines to generate various forms of text works, such as poetry, novels, music, etc. By learning a large amount of text data and using language models and creation algorithms, machines can generate independently created text content.

    Suppose we have a machine creation model that, after training, can generate classical-style poetry. When the user provides the topic "Autumn Night Moon", the system might generate lines such as: "The autumn moon hangs in the night sky while a cold wind strips the poplar twigs; in the still of the night my thoughts drift far, a bright crescent moon keeping me company as I roam."

  2. Text rewriting

    Text rewriting refers to the use of natural language processing technology to modify and rewrite existing text to achieve better expression, improve grammar, or simplify complex sentence structures. The rewritten text retains the main message of the original text but is more readable and accurate.

    Suppose we have an original sentence: "Digital technology is profoundly changing people's lives and has a huge impact on all walks of life." Text rewriting can turn this into a tighter version: "Digital technology is reshaping everyday life and has a major impact on every industry."

  3. Text summarization

    Text summarization refers to automatically extracting or generating a few sentences from a long document to summarize the main content of the document. Text summarization usually needs to take into account the key information, important events, entities, etc. of the document and generate concise and accurate summary content.

    Suppose we have a long news article titled "Scientists Discover New Treatment for Cancer." Text summarization technology can generate a summary such as: "Scientists have recently discovered a new method of treating cancer based on gene-editing technology, which is expected to achieve important breakthroughs in clinical application."

  4. Language models

    Language models use statistical and machine learning methods to model the probability distribution of natural language sequences. It predicts the next word or phrase given a context and evaluates the generated text against existing language rules and training data. Language models are widely used in tasks such as text generation, machine translation, and speech recognition.

    Suppose we have a text generation system based on a language model. When the user enters the first half of a sentence, "Today's weather is very", the language model can predict and generate the next word or phrase, such as "sunny", completing the sentence: "Today's weather is very sunny."
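
    An illustrative sketch with a pre-trained language model (assuming the Hugging Face transformers library; GPT-2 is downloaded on first use, and the continuation will vary from run to run):

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    out = generator("Today's weather is very", max_new_tokens=5)
    print(out[0]["generated_text"])  # e.g. "Today's weather is very sunny and warm ..."
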

Text content understanding

  1. Discourse comprehension

    Discourse comprehension refers to the overall understanding and analysis of a complete text, including its structure, theme, the relationships between paragraphs, and the meaning of the context. It needs to take the text's contextual information into account so that the meaning and purpose of the article can be better grasped.

    Let's say we have an article describing someone's travel experience. Through discourse comprehension, we can understand the structure of the article: the opening introduces the background of the trip, the middle narrates the specific travel experiences, and the ending summarizes the feelings and impressions of the trip.

  2. Logical reasoning

    Logical reasoning refers to deriving new conclusions or judgments based on existing information and logical rules. In text content understanding, logical reasoning can help us infer implicit information from text, infer the author's point of view, or determine the cause-and-effect relationship of an event.

    Suppose we have a text: "Xiao Ming likes to eat apples. He went to the supermarket today." Through logical reasoning, we can draw the conclusion:“小明今天去超市的目的可能是买苹果。”

  3. Commonsense reasoning

    Common sense reasoning is the process of reasoning based on people's common sense and experience in the real world. In text content understanding, common sense reasoning can help us understand implicit information, fill gaps in the text, and reason and understand events and phenomena in the text based on common sense.

    Suppose we have a text: "He opened the refrigerator and took out a carton of milk." Through common sense reasoning, we can conclude:“牛奶应该需要冷藏保存,所以它被放在冰箱里。”

Follow-up article ideas

The following is the blueprint for my follow-up articles in the NLP field; please keep an eye out for them!!!

  • Basic skills required

    This section previews what the subsequent articles assume; they are written not for complete novices but for readers who already have some understanding.

    Mainly in two aspects:

    The first is programming fundamentals. You need a basic grasp of Python and some familiarity with its syntax; otherwise you will struggle to follow the code structure or even to install packages.

    Then come data structures and algorithms. Be familiar with common data structures, such as lists and dictionaries, and understand common algorithms, such as searching and sorting.

  • Rough plan

    The subsequent learning process will be case-driven; essentially every article should include hands-on practice.

    For example, for word segmentation, you may need to do some basic learning of the common packages and then implement them for a concrete application scenario; I will not exhaustively explain every usage detail, such as importing files or storing the segmented output.

    Then you may practice by combining it with web or GUI development, for example writing a simple web interface where you upload a piece of text and get back the segmentation result, so that people who cannot program can use it directly; of course, the focus will remain on the technology and the implementation process!

The above is the entire NLP knowledge system I have compiled. Please look forward to the follow-up articles!!!


Thanks for reading!

Original link: [Exquisite] NLP (Natural Language Processing) Learning Route (Knowledge System)

Welcome to follow the blogger’s personal mini program!
