Advanced deep learning - reading notes
- Image Processing
1.1 Style Transfer
● How an image is represented: the texture representation captures style; the feature maps capture content
● How to weigh content against style: trade off a content loss and a style loss with weighting coefficients
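As a sketch of how content and style are weighed, the total style-transfer objective is a weighted sum of a content term (feature-map distance) and a style term (Gram-matrix distance). The function names and the α/β defaults below are illustrative, not from the notes:

```python
import numpy as np

def gram_matrix(fmap):
    """Texture (style) representation: channel-wise correlations of a feature map.
    fmap has shape (channels, height, width)."""
    c, h, w = fmap.shape
    f = fmap.reshape(c, h * w)
    return f @ f.T / (h * w)

def total_loss(content_feat, style_feat, gen_feat, alpha=1.0, beta=1e3):
    """Weighted sum of content loss (feature-map MSE) and style loss (Gram MSE)."""
    content_loss = np.mean((gen_feat - content_feat) ** 2)
    style_loss = np.mean((gram_matrix(gen_feat) - gram_matrix(style_feat)) ** 2)
    return alpha * content_loss + beta * style_loss
```

Raising β relative to α pushes the generated image toward the style texture at the expense of content fidelity.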
1.2 Image Retrieval
● Content-based image retrieval: retrieval based on the color, texture, and category of images
● Hash-based image retrieval architecture
● Image feature representation: hand-crafted features → CNN-based features
● Hash code learning methods: a. the concept of hash coding (mapping high-dimensional representations to low-dimensional codes); b. its advantages (less memory, faster retrieval); c. its two phases (the learning phase and the coding phase)
● Deep supervised hash coding: image feature extraction layers + a hash code learning layer
● Deep supervised hash coding for multi-label image retrieval: a tuple of images is fed through CONV and FC layers to produce feature vectors, a hash coding layer learns to generate each image's hash code, and a multi-level contrastive loss guides the Hamming distance between codes to match semantic similarity
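The notes above compare hash codes by Hamming distance; a minimal sketch of Hamming-distance ranking over binary codes (helper names are my own):

```python
import numpy as np

def hamming_distance(code_a, code_b):
    """Number of differing bits between two binary hash codes (0/1 arrays)."""
    return int(np.sum(code_a != code_b))

def rank_by_hamming(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query code."""
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming_distance(query_code, db_codes[i]))
```

Because the codes are short binary vectors, this comparison is cheap, which is the speed/memory advantage the notes mention.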
1.3 Caption Generation
● What is image caption generation: input an image, output a textual description of it
● Image caption generation - the simplest encoder-decoder version: a CNN encoder extracts features, an RNN decoder generates the description
● Image caption generation - MS Captivator: detect words → generate sentences → re-rank sentences
● Image caption generation - attention-based model: a. context vectors are obtained from CONV features; b. an LSTM generates words from the contexts
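The attention step above (contexts from CONV features, words from an LSTM) can be sketched as a softmax-weighted sum over feature locations; this toy NumPy version assumes a dot-product alignment score:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_context(features, query):
    """features: (num_locations, dim) CONV features; query: (dim,) decoder state.
    Returns the attention-weighted context vector fed to the LSTM."""
    scores = features @ query       # dot-product alignment score per location
    weights = softmax(scores)       # attention distribution over locations
    return weights @ features       # weighted sum = context vector
```

At each decoding step the LSTM state plays the role of `query`, so the context shifts toward the image regions relevant to the next word.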
- Natural Language Processing
2.1 Technical Overview
● NLP overview: basic NLP techniques → core NLP techniques → NLP+
● The concept of word vectors: a medium for turning natural language into symbols a machine can understand
● Applications of word vectors: computing similarity, serving as input to neural networks, representing sentences/documents
● Word vector learning models - neural network language model: can estimate the probability of a natural-language string
● Word vector learning models - CBOW and Skip-gram: a. CBOW: takes a word's context as input and predicts the word itself; b. Skip-gram: takes one word as input and predicts its context
● Word vector learning models - hierarchical softmax: an output-layer optimization strategy that computes output probabilities via a Huffman tree
● Word vector learning models - negative sampling: maximize the probability of positive samples while minimizing the probability of negative samples
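The negative-sampling objective described above can be written out directly; a minimal sketch for one (center, context) pair, with illustrative names and a plain loop instead of vectorized sampling:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(center, positive, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair:
    maximize sigma(v_c . v_pos) and sigma(-v_c . v_neg) for each noise word."""
    loss = -np.log(sigmoid(center @ positive))       # pull the true pair together
    for neg in negatives:                            # push sampled noise words away
        loss -= np.log(sigmoid(-(center @ neg)))
    return float(loss)
```

Minimizing this loss is exactly the "maximize positive, minimize negative" trade-off in the note: a well-aligned true pair with poorly-aligned noise words yields a loss near zero.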
2.2 Sentiment Analysis
● Sentiment analysis and artificial intelligence
● Sentiment analysis technology system: sentiment knowledge base construction → sentiment classification models → applied sentiment analysis research
● Sentiment in word vectors: words with similar syntax and semantics lie close together in the vector space
● Sentiment-aware word vector models: word vector models that introduce sentence-level sentiment information as supervision
● Document-level sentiment classification: judging the overall sentiment polarity of an entire document (word → sentence → document)
● Sentence-level sentiment classification: judging the sentiment polarity of a single sentence (commonly CNN, RNN, Recursive-NN, BERT)
● Aspect-level sentiment classification: judging sentiment polarity toward attributes of the thing described (fine-grained sentiment analysis); two kinds of methods (segmented representation vs. holistic representation)
2.3 Machine Reading
● What is machine reading: a. AI reads information automatically in place of humans and answers questions about it; b. it is the "crown jewel" of NLP, involving complex technologies such as semantic understanding
● Difficulties and challenges of machine reading: semantic reasoning is hard, semantic association is hard, semantic representation is hard
● Machine reading dataset - MCTest
● Machine reading dataset - CNN / Daily Mail
● Machine reading dataset - SQuAD
● Machine reading dataset - Quasar-T
● Machine reading model (BiDAF, Bi-Directional Attention Flow for Machine Comprehension): input a question and an article; output, for each word in the article, the probability that it is the beginning or the end of the answer
● Main steps of machine reading: text representation, semantic matching, reasoning, answer recommendation
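Turning BiDAF-style start/end probabilities into an answer is commonly done by picking the span maximizing p_start[i] · p_end[j]; a sketch (the `max_len` constraint is a common but assumed heuristic, not stated in the notes):

```python
def best_span(p_start, p_end, max_len=15):
    """Pick the answer span (i, j) maximizing p_start[i] * p_end[j],
    subject to i <= j and a maximum span length."""
    n = len(p_start)
    best, best_score = (0, 0), -1.0
    for i in range(n):
        for j in range(i, min(i + max_len, n)):
            score = p_start[i] * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best
```

The i ≤ j constraint simply forbids spans that end before they start.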
2.4 QA
● What is a question answering system: a. considered the original form of the Turing Test; b. the basic form of the next generation of search engines
● Question answering based on knowledge graphs
● Knowledge-graph question answering - deep learning methods: three key issues (representing the question, representing the semantics of the answer, and the semantic association between question and answer)
● Deep representation of text and knowledge: a. word vectorization; b. sentence (text) vectorization; c. knowledge (facts, propositions) vectorization
● Knowledge-graph-based QA model: determine the topic entity → generate candidate answer entities → compute the question representation → compute the answer representation → calculate a score
● Question answering based on reasoning: inferring unknown knowledge from known knowledge
● Attentive Reader: bidirectional LSTMs are used to model both the document and the query
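The final scoring step above (match the question representation against each candidate answer representation) can be sketched with a plain dot product, a stand-in for whatever learned scoring function the model actually uses:

```python
import numpy as np

def score_answers(question_vec, answer_vecs):
    """Score each candidate answer by the dot product between the question
    representation and each answer representation; higher means more related.
    Returns (index of best candidate, all scores)."""
    scores = [float(question_vec @ a) for a in answer_vecs]
    return int(np.argmax(scores)), scores
```

In a trained system both vectors come from the representation layers of the model, so the dot product reflects learned semantic association.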
- Multimodal Fusion
3.1 Multimodal Sentiment Classification
● What is multimodal data: information composed of multiple modalities such as text, speech, images, video, and other resource forms
● What is multimodal sentiment analysis: information from a single modality is often incomplete or ambiguous, so multimodal data lets the single-modality sources supplement one another from different angles
● Traditional multimodal fusion: combine multiple individual learners through ensemble learning; each individual learner handles a single view, e.g. separate classifiers for text, images, and speech; the individual learners can be SVMs, decision trees, NNs, or other learning algorithms
● When is ensemble learning effective: the individual learners should be "good and diverse", i.e. each reasonably accurate and different from one another
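A minimal sketch of this traditional fusion idea: combine the per-modality predictions by majority vote (pure Python; learner outputs are assumed to be discrete labels):

```python
from collections import Counter

def majority_vote(predictions):
    """Late fusion by majority vote over the labels emitted by the
    individual learners (e.g. text, image, and speech classifiers)."""
    return Counter(predictions).most_common(1)[0][0]
```

The vote only helps when the learners are "good and diverse": learners that always agree add nothing, and inaccurate ones just average out to noise.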
● Deep-learning-based multimodal sentiment classification: a. two key points of fusion-based multimodal classification models (how to classify sentiment effectively within a single modality, and how to combine the results of multiple single-modality classifiers); b. train the image classifier using transfer learning
● Early vs. late fusion: a. early fusion learns the semantic relations between different modalities during feature extraction; b. late fusion extracts features independently per modality and fuses them afterwards
● What is an autoencoder (AutoEncoder): a. a feedforward neural network whose goal is to make the output match the input as closely as possible; b. trained with backpropagation, unsupervised, used for dimensionality reduction or feature extraction
● The principle of autoencoders: a. consists of an encoder and a decoder; b. multi-layer encoders/decoders perform better; c. training an AutoEncoder minimizes an objective measuring the difference between input and output
● What is a sparse autoencoder: a. a Sparse AutoEncoder constrains the intermediate representation to be sparse in order to learn more useful features; b. adding L1 regularization to an AutoEncoder yields a Sparse AutoEncoder
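The L1-regularized objective in point b can be written out directly; the ReLU hidden layer and the λ default below are illustrative assumptions:

```python
import numpy as np

def sparse_ae_loss(x, W_enc, W_dec, lam=0.1):
    """Sparse autoencoder objective: reconstruction MSE plus an L1 penalty
    on the hidden code, which pushes most hidden units toward zero."""
    h = np.maximum(0.0, W_enc @ x)        # ReLU hidden code
    x_hat = W_dec @ h                     # reconstruction of the input
    recon = float(np.mean((x - x_hat) ** 2))
    sparsity = float(lam * np.sum(np.abs(h)))
    return recon + sparsity
```

With λ = 0 this reduces to a plain autoencoder; increasing λ trades reconstruction quality for sparser (and often more interpretable) codes.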
3.2 Cross-Modal Retrieval
● What is multimodal retrieval: retrieval across modalities, e.g. searching images by text + searching text by images
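A common way to realize such cross-modal search is to embed both modalities into a shared space and rank by cosine similarity; a sketch under that assumption (the embeddings themselves are taken as given):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, candidate_vecs):
    """Rank candidates from the other modality by cosine similarity to the
    query, assuming both were embedded into the same shared space."""
    sims = [cosine(query_vec, v) for v in candidate_vecs]
    return sorted(range(len(candidate_vecs)), key=lambda i: -sims[i])
```

The models below (Bimodal DBN, the correspondence autoencoders) are different ways of learning that shared embedding space.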
● Bimodal DBN
● Correspondence AutoEncoder: composed of two single-modality autoencoders, each responsible for learning its corresponding modality
● Correspondence Cross-Modal AutoEncoder: the left and right parts are cross-modal autoencoders, so the learned representation of each modality also takes the other modality into account (e.g. the image representation considers the text modality)
● Correspondence Full-Modal AutoEncoder: each side takes a single-modality input and reconstructs both the image and the text outputs; a synthesis of the Correspondence AutoEncoder and the Correspondence Cross-Modal AutoEncoder
● What makes a good multimodal neural network
3.3 NER
● Mixed image-text NER
- Application and Practice
4.1 Optimization
● What is optimization
● Applications of optimization in deep learning
● Problems and solutions
● Introduction to various optimization methods
● Comparison in applications
4.2 Hyperparameter Tuning
● Hyperparameter tuning techniques
● Grid search (Grid Search)
● Optimal solutions
● Underfitting and overfitting
● Preventing overfitting
● Advanced tuning techniques
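The grid search listed above can be sketched in a few lines: enumerate every combination of hyperparameter values and keep the best by validation score (the `evaluate` callback is a placeholder for whatever metric is used):

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination of hyperparameter values in param_grid
    (a dict of name -> list of candidates) and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)          # e.g. validation-set accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The cost grows multiplicatively with each added hyperparameter, which is why the notes follow up with more advanced tuning techniques.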
4.3 Course Practice