National University of Science and Technology--Multimedia Analysis and Understanding--2022 Exam Recall

This course has an open-book exam, but screenshots and copies of the exam paper are not allowed, so the questions below are reconstructed from memory (which should be reasonably accurate). All answers are for reference only.

1. Discuss what multimedia is. What applications and challenges exist?

Reference answer:
(1). Multimedia is content that combines different forms, such as text, audio, images, animation, video, and interactive content. Equivalently, multimedia is a general term for the various information carriers processed by a computer, including text, audio, graphics, video, and interactive content.

(2). Multimedia analysis and understanding are widely used in industries such as security, education, communication, and entertainment. Specific applications include image retrieval, content recommendation, visual surveillance, personalized video, social media, and video websites.

(3). The main challenges are as follows:

  • How to represent data in different media and modalities; the data is often massive, high-dimensional, unstructured, and intrinsically complex.
  • How to understand multimedia data and bridge the semantic gap.
  • How to mine the interrelationships between multimedia data, i.e., their synergy and complementarity.
  • How to meet users' diverse information needs and handle user preferences and personalization well.

2. Explain the basic principle and solution procedure of backpropagation, analyze at least two typical problems of the BP algorithm, and give the corresponding solutions.

Reference answer:
(1). The basic principle of backpropagation: the error at the output layer is used to estimate the error of the layer before it, and that estimate is in turn used for the layer before that, so the error is propagated backward layer by layer until error estimates are obtained for all layers. Gradient descent is then applied, using these layer-by-layer error estimates, to adjust all of the network's weights.
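
As a minimal sketch (not part of the original answer, with all names and sizes chosen for illustration), the following NumPy snippet shows the forward pass, the layer-by-layer backward error propagation, and the gradient-descent weight update for a tiny two-layer network with a squared-error loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 4))            # toy inputs
y = rng.normal(size=(16, 1))            # toy targets
W1 = rng.normal(size=(4, 8))            # input -> hidden weights
W2 = rng.normal(size=(8, 1))            # hidden -> output weights
lr = 0.1

for _ in range(100):
    # forward pass
    h = sigmoid(X @ W1)                 # hidden activations
    y_hat = h @ W2                      # linear output
    err = y_hat - y                     # output-layer error

    # backward pass: propagate the error layer by layer
    grad_W2 = h.T @ err                 # gradient w.r.t. W2
    err_h = (err @ W2.T) * h * (1 - h)  # error estimate for the hidden layer
    grad_W1 = X.T @ err_h               # gradient w.r.t. W1

    # gradient descent update using the layer-by-layer error estimates
    W1 -= lr * grad_W1 / len(X)
    W2 -= lr * grad_W2 / len(X)
```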

(2). Typical problems that may be encountered and the corresponding solutions are as follows:

  • Overfitting; solutions include:
    a. Apply data augmentation to expand the training set.
    b. Stop training early when appropriate.
    c. Adopt techniques such as Dropout and DropPath.
    d. Add regularization constraints, such as the L2 (ridge) and L1 (Lasso) penalties.
  • Vanishing and exploding gradients; solutions include:
    a. Use ReLU and other activation functions without saturating regions.
    b. Use Batch Normalization to keep activations out of the saturation region.
    c. Set a gradient clipping threshold to prevent gradients from growing too large (a minimal sketch follows this list).
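
A minimal gradient-clipping sketch, assuming PyTorch; the tiny linear model, toy data, and the threshold max_norm=1.0 are placeholders for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import clip_grad_norm_

model = nn.Linear(10, 1)                                  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 10), torch.randn(32, 1)            # toy batch
loss = F.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
clip_grad_norm_(model.parameters(), max_norm=1.0)         # rescale if global norm > 1.0
optimizer.step()
```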

3. Briefly describe the core idea and application scenarios of pre-trained models, and give the basic ideas of three typical pre-training tasks (self-supervised learning tasks).

Reference answer:
(1). A pre-trained model is trained in advance on large-scale data so as to provide better feature representations and a base model for downstream tasks; it is then used as an initialization and fine-tuned on a small supervised dataset for the specific task. In recent years, with the continuous development of self-supervised learning, large models can be trained on massive unlabeled data, fully learn the general knowledge contained in the data, and provide generic feature representations for downstream tasks.

(2). Examples of typical application scenarios:

  • Language pre-trained models. Language models such as GPT, BERT, and ERNIE have greatly improved the performance of downstream tasks in NLP.
  • Vision pre-trained models. For example, visual models pre-trained on ImageNet-1K and ImageNet-21K, or visual models trained with methods such as MoCo, SimCLR, MaskFeat, MAE, and BEiT, have improved the performance of downstream tasks in CV.
  • Multimodal pre-trained models. Models such as CLIP, ViLBERT, Oscar, and ViLT have all improved the performance of multimodal downstream tasks.

(3). Basic ideas of some typical self-supervised tasks:

  • Masked Language Modeling (MLM): predict the masked words in an input sentence, so that the model learns contextual semantic relations.
  • Next Sentence Prediction (NSP): determine whether two sentences are adjacent in the original text.
  • Contrastive learning: pull samples of the same pair (or class) closer together and push samples of different pairs (or classes) farther apart (see the sketch after this list).
  • Image-Text Matching (ITM): determine whether the current input image-text pair matches.
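
A minimal contrastive-learning sketch (not from the original answer), assuming PyTorch: an InfoNCE-style loss of the kind used in SimCLR/MoCo-like pre-training, with toy embedding sizes:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1[i] and z2[i] are two augmented views of the same sample (a positive
    pair); every other row in the batch acts as a negative."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # pairwise cosine similarities
    targets = torch.arange(z1.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)   # toy embeddings
print(info_nce(z1, z2))
```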

4. Briefly describe the main research content in the field of image semantic understanding; choose a typical method for one type of semantic understanding task, briefly describe its basic pipeline, and analyze its problems and the related solutions.

Reference answer:
(1). Image semantic understanding studies what objects and instances exist in an image and what relationships hold between them, with the goal of having the machine automatically "understand" the external environment as a human does. Essentially, it learns the mapping between low-level features and high-level semantics.

(2). The basic tasks of image semantic understanding include:

  • Image classification: predict one class label for each image.
  • Image annotation: predict multiple semantic labels for each image.
  • Object detection: predict a class label and a tight bounding box for each object in an image.
  • Semantic segmentation: predict a semantic label for each pixel.
  • Image captioning: describe an image in natural language.

(3). A classic object detection algorithm is as follows:

  • YOLO, whose steps are as follows:
    a. Divide the input image into a grid of cells, and lay anchors of different sizes and aspect ratios in each cell.
    b. Feed the image into a backbone network for feature extraction.
    c. Decode the feature map: predict the anchor offsets, objectness confidence, and class probabilities.
    d. Filter the predicted bounding boxes by confidence and apply non-maximum suppression (NMS); a minimal NMS sketch follows this list.
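
A minimal NMS sketch (not part of the original answer), assuming boxes are given as (x1, y1, x2, y2) corners with one confidence score per box:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    order = scores.argsort()[::-1]              # box indices sorted by confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))                     # keep the most confident remaining box
        # IoU of box i with every other remaining box
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou < iou_thresh]     # drop boxes that overlap too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))                       # -> [0, 2]
```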

(4). Problems:

  • It handles large variations in object scale poorly.
  • Solution: add more detection heads at different scales, use an FPN (feature pyramid network), etc.

5. Briefly describe the basic principles of the SVD and SVD++ collaborative filtering recommendation methods and list their basic formulas; compare their advantages and disadvantages, and discuss related improvements. (15 marks)

Reference answer:
(1). SVD
The ratings of all users for all items can be represented as a sparse matrix $R$. The SVD-based recommendation method decomposes the matrix $R$ (requiring the matrix elements to be non-negative) as follows:

$$R_{U\times I} = P_{U\times K} Q_{K\times I}$$

The known entries of $R$ are then used to train $P$ and $Q$ so that their product best fits the known ratings. Specifically, the predicted rating of user $u$ for item $i$ is:

$$\hat{r}_{ui} = p_u^T q_i$$

Let $e_{ui} = r_{ui} - \hat{r}_{ui}$; the total squared error is:

$$\mathrm{SSE} = \sum e_{ui}^2$$

SSE is then used as the loss to train the model.
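
A minimal training sketch (not part of the original answer): fitting $P$ and $Q$ to the known ratings by stochastic gradient descent on the squared error, with biases and regularization omitted; the toy ratings and hyper-parameters are illustrative:

```python
import numpy as np

def train_svd(ratings, num_users, num_items, k=4, lr=0.05, epochs=500):
    """ratings: list of (u, i, r_ui) triples for the known entries of R."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(num_users, k))
    Q = rng.normal(scale=0.1, size=(num_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - P[u] @ Q[i]          # e_ui = r_ui - p_u^T q_i
            p_old = P[u].copy()
            P[u] += lr * e * Q[i]        # gradient step on the squared error
            Q[i] += lr * e * p_old
    return P, Q

P, Q = train_svd([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)], num_users=2, num_items=2)
print(P @ Q.T)                           # predicted rating matrix
```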

(2). SVD++
SVD++ is an improved SVD method. It mainly enhances the model's use of implicit user-item interaction information (such as the user's browsing history), and can be expressed by the following formula:

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\Big(p_u + |I_u|^{-\frac{1}{2}}\sum_{j\in I_u} y_j\Big)$$

where $I_u$ denotes the set of all items that user $u$ has interacted with, $\mu$ is the global mean rating, and $b_u$, $b_i$ are the user and item bias terms. The main difference from SVD is that SVD++ introduces additional implicit user-item interaction information, which makes the model generalize better and allows recommendations to be made even without rating information.
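
A minimal sketch (not part of the original answer) of the SVD++ prediction rule, showing how the implicit-feedback term is added to the plain SVD dot product; all inputs are toy placeholders:

```python
import numpy as np

def predict_svdpp(mu, b_u, b_i, p_u, q_i, Y, I_u):
    """mu: global mean rating; b_u, b_i: user/item biases; p_u, q_i: latent
    factors; Y: item implicit-feedback factors; I_u: items user u interacted with."""
    implicit = Y[I_u].sum(axis=0) / np.sqrt(len(I_u))   # |I_u|^(-1/2) * sum_j y_j
    return mu + b_u + b_i + q_i @ (p_u + implicit)

k = 8
Y = np.random.default_rng(0).normal(scale=0.1, size=(100, k))
print(predict_svdpp(3.5, 0.1, -0.2,
                    np.zeros(k), np.ones(k) * 0.1, Y, I_u=[2, 5, 7]))
```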

(3). Advantages and disadvantages

  • The SVD recommendation algorithm is relatively simple and computationally efficient, but its training objective is simple, which makes it prone to overfitting; it also ignores implicit user-item interaction information, so its predictions are not accurate enough.
  • SVD++ is more flexible: it takes implicit user-item interaction information into account, performs better, generalizes more strongly, and can make recommendations even without rating information. However, the computation is more complex and training is slower.

(4). Improvements

  • Add bias terms and regularization terms to the SVD recommendation method to improve the model's flexibility and prevent overfitting.

6. Briefly describe the basic principles of PageRank and HITS, compare their advantages and disadvantages, and explain possible ways to improve them. (15 marks)

Reference answer:
(1). PageRank:
The basic idea of PageRank: if a web page is linked to by many other web pages, it is generally recognized and trusted, so its PageRank value is higher and it ranks higher; and the higher the PageRank values of the pages linking to it, the more important the page is and the higher its own PageRank value.
The basic formula of PageRank is:

$$r(p) = \alpha \sum_{q:(q,p)\in E} \frac{r(q)}{w(q)} + (1-\alpha)\frac{1}{N}$$

$r(p)$: the PageRank value of web page $p$
$q$: a web page that links to $p$, i.e., a backlink of $p$; $E$ denotes the set of hyperlinks
$w(q)$: the number of forward (outgoing) links of $q$
$N$: the total number of web pages in the network
$\alpha$: the damping factor
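
A minimal PageRank sketch (not part of the original answer): power iteration on a small directed graph given as adjacency lists, assuming every page appears as a key and has at least one outgoing link:

```python
import numpy as np

def pagerank(links, alpha=0.85, iters=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(links)
    n = len(pages)
    idx = {p: i for i, p in enumerate(pages)}
    r = np.full(n, 1.0 / n)                       # initial ranks
    for _ in range(iters):
        new_r = np.full(n, (1 - alpha) / n)       # teleport term (1 - alpha) / N
        for q, outs in links.items():
            for p in outs:
                new_r[idx[p]] += alpha * r[idx[q]] / len(outs)   # alpha * r(q) / w(q)
        r = new_r
    return dict(zip(pages, r))

print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
```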

(2). HITS: Hyperlink-Induced Topic Search
The basic principle: a good "Authority" page is pointed to by many good "Hub" pages, and a good "Hub" page points to many good "Authority" pages.
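
A minimal HITS sketch (not part of the original answer): iterating the hub and authority updates on an adjacency matrix, with L2 normalization after each step:

```python
import numpy as np

def hits(adj, iters=50):
    """adj[i, j] = 1 if page i links to page j."""
    n = adj.shape[0]
    hub = np.ones(n)
    auth = np.ones(n)
    for _ in range(iters):
        auth = adj.T @ hub                  # good authorities are pointed to by good hubs
        hub = adj @ auth                    # good hubs point to good authorities
        auth /= np.linalg.norm(auth)
        hub /= np.linalg.norm(hub)
    return hub, auth

adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [1, 0, 0]], dtype=float)
print(hits(adj))
```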

(3). Advantages and disadvantages

  • PageRank:
    Pros: a static, query-independent, global algorithm that can be computed offline and is relatively hard to cheat.
    Cons: topic-independent; old pages tend to rank higher than new ones.
  • HITS:
    Pros: computed online per query, localized to a topic subgraph, and has achieved good results in NLP and social networks.
    Cons: the computation is more expensive, and it is easier to cheat.

(4). Improvements
To address the shortcoming that PageRank is topic-independent, a topic-sensitive PageRank algorithm can be used: the importance scores of pages are pre-computed offline, and multiple importance scores are computed for each page, i.e., one importance score per topic.


7. Briefly describe three or more methods that can prevent data leakage during deep learning model training, introduce their basic principles, and analyze and compare them.

Reference answer:

  • Gradient-compression-based methods
    Use layer-wise pruning to remove parameter gradients with small absolute values and reduce the redundant information in the gradients, or quantize the gradients to increase the difficulty of attacks.
  • Differential-privacy-based methods
    Use DP-SGD in the local training process: clip per-example gradients and add calibrated noise (a minimal sketch follows this list).
  • Data-transformation-based methods
    Expand the original dataset through data augmentation (e.g., AutoML-searched augmentation policies), which perturbs the model's gradients and protects data privacy without affecting the model's convergence.
  • Cryptography-based methods
    For example, homomorphic encryption can be used to encrypt the gradients and thus protect data privacy.
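
A minimal sketch (not part of the original answer) of the DP-SGD idea on a toy linear-regression objective: clip each per-example gradient to a norm bound, average, and add Gaussian noise before the update; all hyper-parameters are illustrative and no formal privacy accounting is shown:

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_std=0.5, rng=np.random.default_rng(0)):
    grads = []
    for x_i, y_i in zip(X, y):
        g = 2 * (w @ x_i - y_i) * x_i                            # per-example gradient
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))     # clip to norm bound C
        grads.append(g)
    g_mean = np.mean(grads, axis=0)
    # Gaussian noise scaled to the clipping bound and batch size
    g_noisy = g_mean + rng.normal(0, noise_std * clip / len(X), size=w.shape)
    return w - lr * g_noisy

rng = np.random.default_rng(1)
X, y = rng.normal(size=(16, 3)), rng.normal(size=16)
w = np.zeros(3)
for _ in range(50):
    w = dp_sgd_step(w, X, y)
print(w)
```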
