Paper Study|A review of the development of generative cross-modal steganography

Preface: This article surveys work on generative cross-modal steganography from the past five years.

Related reading: Review of the development of generative text steganography

Unlike text steganography, cross-modal steganography must account for the correlation between different modalities. Common cross-modal scenarios include Image-to-Text (e.g., image captioning), Text-to-Speech (e.g., voice assistants), and Text-to-Image (e.g., drawing from a text prompt). The following introduces related work on deep-learning-based generative cross-modal steganography.

[1]- Text information hiding based on image description (Journal of Beijing University of Posts and Telecommunications, 2018) BUPT, Xue et al.


  • The main idea: adopt a CNN+LSTM image-captioning framework and modify the search procedure based on Beam Search. A 16-bit header is first prepended to the ciphertext to indicate its length, and then three hiding algorithms are designed for different receiver sharing scenarios:
    • Sentence-based hiding algorithm (SSH): using Beam Search, after all words have been generated, the 2^n candidate sentences are assigned fixed-length codes, and the secret information is embedded through the choice of the final sentence;
    • Word-based hiding algorithm (WWH): when the beam width is 1, Beam Search degenerates into greedy search. At each time step the candidate word set is fixed at size 2: if the ciphertext bit is 1, the word with the higher probability is selected; if the bit is 0, the word with the lower probability is selected.
    • Hiding algorithm based on a hash function (HH): each word corresponds to 1 bit of secret information via the formula below, so the secret can be extracted from the text alone.
      v(w, key) = md5(w + key) mod 2
  • Dataset: Flickr8k
  • Evaluation indicators: embedding capacity: bpw; semantic relevance: BLEU-N
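The HH formula above can be sketched in a few lines. `word_bit` is the paper's v(w, key) mapping; the sender-side selection in `choose_word` (picking the most probable candidate whose hash bit matches the secret bit) is an assumed implementation detail, not spelled out in the paper.

```python
import hashlib

def word_bit(word: str, key: str) -> int:
    """v(w, key) = md5(w + key) mod 2: map a word to one secret bit."""
    digest = hashlib.md5((word + key).encode("utf-8")).hexdigest()
    return int(digest, 16) % 2

def choose_word(candidates: list[str], secret_bit: int, key: str) -> str:
    """Sender side (assumed): among candidates ranked by model probability,
    pick the first (most probable) word whose hash bit equals the secret bit."""
    for w in candidates:
        if word_bit(w, key) == secret_bit:
            return w
    return candidates[0]  # hypothetical fallback when no candidate carries the bit

def extract_bits(caption: str, key: str) -> list[int]:
    """Receiver side: recover one bit per word from the generated text alone."""
    return [word_bit(w, key) for w in caption.split()]
```

Because extraction only needs the shared key and the text, the receiver does not need the image or the model, which is the selling point of the HH variant.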

[2]- Rits: real-time interactive text steganography based on automatic dialogue model (ICCCS, 2018) Tsinghua University, Yang et al.


Although this paper is not about a cross-modal task, it points out that generated steganographic text should be cognitively imperceptible, i.e., its semantics should be consistent with the semantics of the context. This view applies equally to cross-modal text steganography.

  • The main idea: For dialogue scenarios, use RNN + reinforcement learning, and use fixed-length coding based on complete binary trees to embed secret information.
  • Dataset: Dialogue Dataset Negotiator
  • Evaluation indicator: efficiency (generation time)
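The fixed-length coding over a complete binary tree mentioned above amounts to this: with 2^k candidate words as leaves, the next k secret bits index exactly one leaf, and the receiver inverts the index. A minimal sketch, assuming the two ends share the same ordered candidate list at each step:

```python
import math

def embed_fixed_length(candidates: list[str], secret_bits: list[int]):
    """With |candidates| = 2^k leaves of a complete binary tree, the next
    k secret bits select one candidate. Returns (word, remaining bits)."""
    k = int(math.log2(len(candidates)))
    index = int("".join(map(str, secret_bits[:k])), 2)
    return candidates[index], secret_bits[k:]

def extract_fixed_length(chosen: str, candidates: list[str]) -> list[int]:
    """Receiver side: recover the k bits from the chosen word's leaf index."""
    k = int(math.log2(len(candidates)))
    return [int(b) for b in format(candidates.index(chosen), f"0{k}b")]
```

Fixed-length coding keeps embedding and extraction O(1) per step, which is what makes the scheme suitable for real-time interactive dialogue.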

[3]- Steganographic visual story with mutual-perceived joint attention (EURASIP, 2021) Shanghai University, Guo et al.


  • Main idea: This article proposes that the variance of the probability distribution must be within a certain range to ensure cognitive imperceptibility, and designs an adaptive information embedding and extraction method for candidate word sets.
  • Dataset: VIST
  • Evaluation indicators: visual imperceptibility: Perplexity; cognitive imperceptibility: BLEU & METEOR

[4]- ICStega: Image Captioning-based Semantically Controllable Linguistic Steganography (SPL, 2023) USTC, Wang et al.


  • Main idea: this article mainly proposes a method for constructing the candidate word set under semantic control.
  • Dataset: MS COCO
  • Evaluation indicators: embedding capacity: bpw; visual imperceptibility: Perplexity; security: anti-steganalysis capability (TS-FCN); cognitive imperceptibility: BLEU & METEOR; diversity: LSA & Self-CIDEr
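A semantically controlled candidate set can be illustrated as a filter applied before bit embedding. This is a hypothetical sketch: the per-word topic labels and the top-k cutoff are assumptions standing in for the paper's actual semantic-control mechanism.

```python
def semantic_candidates(ranked_words: list[tuple[str, str]],
                        allowed_topics: set[str], k: int = 4) -> list[str]:
    """Hypothetical sketch: keep only candidate words whose (assumed,
    precomputed) topic label matches the controlled semantics, then take
    the top-k by model probability as the embedding candidate set."""
    filtered = [w for w, topic in ranked_words if topic in allowed_topics]
    return filtered[:k]
```

The bit-embedding step (e.g., fixed-length coding over the filtered set) then operates only on semantically admissible words, which is what keeps the caption's meaning controllable.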

[5]- Cross-Modal Text Steganography Against Synonym Substitution-Based Text Attack (SPL, 2023) Fudan University, Peng et al.


  • Main idea: to resist synonym-substitution attacks, a lossy steganography scheme is adopted: a DNN encodes the secret information into the text, and a paired decoding network recovers it.
  • Dataset: MS COCO
  • Evaluation indicators: statistical imperceptibility: KL divergence; resistance to steganalysis: LS-CNN & R-BiC & SeSy & BERT-FT
  • Open source code: https://github.com/hunanpolly/Cross-Modal-Steganography

[6]- Cover Reproducible Steganography via Deep Generative Models (TDSC, 2022) USTC, Chen et al.


  • Application scenarios: Text-to-Speech; Text-to-Image

[7]- Distribution-Preserving Steganography Based on Text-to-Speech Generative Models (TDSC, 2022) USTC, Chen et al.


Origin blog.csdn.net/qq_36332660/article/details/132625269