Introduction to Natural Language Processing (NLP)


This course is Part 1 of a 4-part series on NLP 101:

  1. Introduction to Natural Language Processing (Today’s Tutorial)
  2. Introduction to the Bag-of-Words Model
  3. Word2Vec: Research on Embeddings in Natural Language Processing
  4. Comparison of Bag-of-Words and Word2Vec

This blog briefly introduces the history of natural language processing. Even a short survey shows that research in this field began long ago: researchers leveraged the foundations linguistics had laid in understanding human language, and those foundations pointed NLP in the right direction.

However, technical limitations became the biggest obstacle, and for a time research in this field almost stagnated. But technology moves in only one direction: forward. Its development eventually gave NLP researchers sufficient computing power and broadened many horizons.

We are now at the stage where language models help create virtual assistants that can converse, help complete tasks, and more. Imagine that the world has reached a point where a blind person can ask a virtual assistant to describe an image, and it can do so flawlessly.
This advancement comes at the cost of steep computing requirements and, most importantly, access to large amounts of data. Language is one domain where augmentation techniques like those we apply to images do not help at all. Subsequent research directions have therefore focused on reducing these huge requirements in some way.

Even so, the growth of NLP over the years is commendable. The concepts are both clever and intuitive. The next blog in this series will focus on modern NLP concepts in more detail.

1. The NLP conundrum and its current state

We see with our eyes and divide the objects we see into different groups. Applying mathematical formulas at work, and even communicating, requires the brain to process information. All of these tasks are completed in less than a second. The ultimate goal of artificial intelligence has long been to rebuild the brain, but it is currently subject to limitations such as computing power and data.

It is extremely difficult to build a machine that can do multiple tasks at once. So the problem has been split into subfields, chiefly Computer Vision (CV) and Natural Language Processing (NLP).

We have become proficient at modeling image data. Images have basic patterns visible to the naked eye, and at their core, images are matrices. In particular, advances have come from convolutional neural networks (CNNs), which can recognize patterns in those matrices.

But what happens when you enter the world of natural language processing (NLP)? How do we make computers understand the logic behind language: semantics, grammar, and so on? Since the core of an image is a matrix, convolution filters can easily detect its features. This is not the case for language. At best, CV techniques can teach a model to recognize letters from images. That requires at least 26 labels for training, and overall it is a very bad approach because it simply does not capture the essence of language. So how do we solve the mystery of language?

We are currently in the era of language models, such as GPT-3 (Generative Pre-trained Transformer 3) and BERT (Bidirectional Encoder Representations from Transformers). These models can converse with us with near-perfect syntax and semantics.

But where does it all begin?
Let’s take a brief look at natural language processing through history.

2. The beginning of natural language processing

Language as a science is included in the subject of linguistics. Natural language processing thus becomes a subset of linguistics itself.
Humans created language as a medium of communication to share information more efficiently. We are smart enough to create complex paradigms that serve as the basis of language. Language has undergone extensive changes throughout history, but the essence of sharing information through it remains intact.

When hearing the word apple, an image of a fresh red oval fruit comes to mind. We can instantly associate words with images in our minds. What we see, what we touch, what we feel, a complex nervous system responds to these stimuli, and the brain helps categorize these sensations into fixed words.

But the processor here is a computer, which only knows 0s and 1s. Our rules and examples do not apply to computers. So how do you explain something as complex as language to a computer?

Linguistics itself is the scientific study of human language. This means that it requires a thorough, systematic, objective and accurate examination of all aspects of the language. Many of the foundations of natural language processing have direct links to linguistics.

In the early 20th century, Ferdinand de Saussure, the father of modern linguistics, described language as a system: language should not be seen as a chaotic collection of facts but as an edifice in which all elements are interconnected. A sound in language represents a concept whose meaning changes with context. Within this system, elements can be related to one another to identify context through cause-and-effect relationships.

In 1950, Alan Turing published his famous paper "Computing Machinery and Intelligence," which introduced what is now known as the Turing Test, or "The Imitation Game," because the test was designed to observe whether machines can imitate humans. The paper opened with the question, "Can machines think?" The big question that arises here is whether imitation equals the ability to think for oneself. The test measures a computer program's ability to pass as a human in a real-time conversation with an independent human judge.
Most notably, CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) pops up from time to time while browsing the internet.

In 1957, Noam Chomsky's "Syntactic Structures" took a rule-based approach and still managed to revolutionize the world of natural language processing. However, this approach had problems of its own, not least its computational complexity. A few inventions followed, but the staggering problems posed by computational complexity seemed to prevent any significant progress.

So what happens as researchers slowly gain enough computing power?

3. Computing power gradually improves - natural language processing finds a foothold

Once reliance on complex hard-coded rules was reduced, early machine learning algorithms such as decision trees could achieve superior results.

The rise of statistical computing in the 1980s reached the field of natural language processing as well. At their core, these models simply assign weights to input features. This means the input, rather than a complex hand-built paradigm, determines the decisions the model makes.

One of the simplest examples of statistics-based NLP is the n-gram model, which uses the Markov assumption (the current state depends only on the previous state). Here, the idea is to interpret a word in its context.
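The Markov idea above can be sketched with a tiny bigram model. This is a minimal toy, assuming a two-sentence corpus invented for illustration; real n-gram models are trained on large corpora and use smoothing.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count bigram transitions: P(next | current) under the Markov assumption."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return counts

def next_word_prob(counts, cur, nxt):
    """Maximum-likelihood estimate of P(nxt | cur)."""
    total = sum(counts[cur].values())
    return counts[cur][nxt] / total if total else 0.0

corpus = ["the cat sat on the mat", "the cat ate the fish"]
counts = train_bigram(corpus)
print(next_word_prob(counts, "the", "cat"))  # 2 of the 4 words after "the" -> 0.5
```

The current word alone determines the distribution over the next word; no earlier history is consulted, which is exactly the Markov property.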

One of the most successful concepts driving the field of natural language processing forward is the Recurrent Neural Network (RNN). The idea behind RNNs is clever but extremely simple. A recurrent cell takes an input x1 and produces an output y1 along with a hidden state h1, which carries the information from x1.
The input to the RNN is a sequence of tokens representing words. The operation is repeated for every input, so information from previous states is always retained. Of course, RNNs are not perfect and have since been superseded by more powerful architectures such as the LSTM and the GRU.

These architectures use the same general idea as the RNN but introduce additional gating mechanisms. An LSTM (long short-term memory) cell has three pathways, or gates: the input, output, and forget gates. The LSTM attempts to solve the long-term dependency problem, relating an input to items far back in the sequence. However, the LSTM brings complexity issues of its own. The Gated Recurrent Unit (GRU) addresses this by reducing the number of gates and thereby the LSTM's complexity.

Let’s take a moment to appreciate that these algorithms emerged in the late 1990s and early 2000s, when computing power was still an issue. Now let's look at what has become achievable with sheer computing power.

4. Computing power is addressed – the rise of natural language processing

Let’s first understand how computers can understand language. A computer can build a matrix whose rows are words and whose columns are the contexts in which those words are evaluated.
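A word-context matrix of this kind can be built directly from co-occurrence counts. This is a toy sketch on two invented sentences, using each word's immediate neighbors (a window of 1) as its context:

```python
import numpy as np

sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Rows are words, columns are context words seen within a +/-1 window.
M = np.zeros((len(vocab), len(vocab)), dtype=int)
for s in sentences:
    for i, w in enumerate(s):
        for j in (i - 1, i + 1):
            if 0 <= j < len(s):
                M[idx[w], idx[s[j]]] += 1

print(vocab)          # ['cat', 'dog', 'sat', 'the']
print(M[idx["the"]])  # "the" co-occurs with "cat" and "dog", never with itself
```

Notice that the rows for "cat" and "dog" come out identical here: both appear between "the" and "sat," so by their contexts they look alike, which is the intuition behind distributional representations.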

The idea is to "represent" each word in a limited N-dimensional space; the model understands each word through its weight along each of the N dimensions. This representation-learning approach first appeared in 2003, and it has been used widely in natural language processing since the 2010s.

In 2013, the word2vec series of papers was published. It uses representation learning (embeddings), expressing each word as a vector in an N-dimensional space.

Given a good enough input corpus and proper training, words that appear in similar contexts end up close together when the space is visualized. How well a word's meaning is captured depends on the quality of the data and on how often the word occurs alongside similar neighbors.
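"Close together" is usually measured with cosine similarity between embedding vectors. The 3-dimensional vectors below are invented purely for illustration; real word2vec embeddings are learned from a corpus and typically have 100-300 dimensions:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: 1 means same direction, 0 means unrelated."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy embeddings, hand-picked so that "king" and "queen"
# point in similar directions while "apple" points elsewhere.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

print(cosine(emb["king"], emb["queen"]))  # close to 1: similar contexts
print(cosine(emb["king"], emb["apple"]))  # much smaller: dissimilar contexts
```

In a trained model the same comparison surfaces genuine semantic neighbors, which is what makes embeddings such a useful building block.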

This concept once again opened up the world of natural language processing, and to this day embeddings play a huge role in virtually all subsequent research. A famous spiritual successor to Word2Vec is the FastText series of papers, which introduced the concept of subwords to further enhance the model's capabilities.

The concept of attention allows the model to focus on the relevance of each input word to each output word. In 2017, the famous Transformer architecture emerged, built on a variant of attention called self-attention.
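The core of self-attention is scaled dot-product attention, sketched below in NumPy with random untrained weights and toy dimensions. Each token's output is a relevance-weighted mix of every token's value vector:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention: query-key scores decide how much
    each token attends to every other token."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len) relevance scores
    weights = softmax(scores, axis=-1)       # each row is a distribution over tokens
    return weights @ V, weights

rng = np.random.default_rng(2)
seq_len, d_model = 4, 8                      # toy sizes for illustration
X = rng.normal(size=(seq_len, d_model))      # stand-in for 4 token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape)                # (4, 8): one mixed vector per token
print(weights.sum(axis=-1))     # each row of attention weights sums to 1
```

Unlike the RNN sketched earlier, nothing here is sequential: every token attends to every other token in one matrix multiplication, which is what lets Transformers parallelize so well.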

Transformers have produced models powerful enough to pass the Turing test with ease. That alone is a testament to the progress made in teaching computers to understand language. Recently, GPT-3 caused a huge stir when demonstrations of task-tuned GPT-3 models appeared on the internet. These models can converse convincingly with anyone, which also makes them an interesting topic, since fine-tuning them for different tasks can lead to very interesting results.

Let’s see how well transformers master the language (Figure 6).

Given a few starting tokens, GPT Neo 1.3B, EleutherAI’s GPT-3 replication model, produces a short paragraph as output that closely respects syntactic and semantic rules.

At one point, natural language processing was deemed too expensive and its research nearly ground to a halt for lack of computing power and access to data. Now there are models that can keep talking to us without our ever suspecting we are conversing with a non-human.

If you're wondering what the 1.3B in GPT Neo's name stands for, it is the number of parameters in the model: 1.3 billion. That alone speaks volumes about how computationally complex today's state-of-the-art (SOTA) language models are.

Origin blog.csdn.net/qq_40985985/article/details/125842711