Practical Machine Learning Notes (6): Feature Engineering

1. Feature Engineering

  • Machine learning algorithms prefer well-defined, fixed-length inputs and outputs

  • Feature engineering (FE) was the key to ML methods before deep learning (DL)

    • in a computer vision task, people would try various FE methods and then train an SVM model
  • DL trains deep neural networks to extract features automatically, whereas many earlier ML methods rely on FE to extract them

    • the learned features are relevant to the task

2. Tabular data features

  • int/float: use directly, or bin into n unique int values (data conversion)

  • categorical data: one-hot encoding

    • map rare categories into “unknown”
  • Date-time: expand into a feature list (time transformation), such as

    • [year,month,day,day_of_year,week_of_year,…]
  • Feature combination: Cartesian product of two feature groups (data combination)

    • [cat, dog] × [male, female] -->
    • [(cat, male), (cat, female), (dog, male), (dog, female)]
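
The four tabular transformations above can be sketched with the Python standard library alone. This is a minimal illustration, not a reference implementation: the helper names (bin_value, build_vocab, one_hot, date_features, cross), the equal-width binning rule, and the min_count threshold for "rare" categories are all assumptions made for the example.

```python
from collections import Counter
from datetime import date
from itertools import product


def bin_value(x, lo, hi, n):
    """Map a number in [lo, hi] to one of n equal-width bin indices (0..n-1)."""
    if hi <= lo or n < 1:
        raise ValueError("need hi > lo and n >= 1")
    idx = int((x - lo) / (hi - lo) * n)
    return min(max(idx, 0), n - 1)  # clamp out-of-range values and x == hi


def build_vocab(values, min_count):
    """Keep categories seen at least min_count times; the rest share 'unknown'."""
    counts = Counter(values)
    kept = sorted(c for c, k in counts.items() if k >= min_count)
    return kept + ["unknown"]


def one_hot(value, vocab):
    """One-hot encode value against vocab, falling back to the 'unknown' slot."""
    target = value if value in vocab else "unknown"
    return [1 if v == target else 0 for v in vocab]


def date_features(d):
    """Expand a date into [year, month, day, day_of_year, week_of_year (ISO)]."""
    return [d.year, d.month, d.day, d.timetuple().tm_yday, d.isocalendar()[1]]


def cross(a, b):
    """Cartesian product of two categorical feature groups."""
    return list(product(a, b))


pets = ["cat", "dog", "cat", "dog", "cat", "emu"]
vocab = build_vocab(pets, min_count=2)       # ['cat', 'dog', 'unknown']
rare = one_hot("emu", vocab)                 # [0, 0, 1]
age_bin = bin_value(37, lo=0, hi=100, n=4)   # 1
feats = date_features(date(2022, 3, 13))     # [2022, 3, 13, 72, 10]
pairs = cross(["cat", "dog"], ["male", "female"])
```

Each crossed pair in pairs can then be treated as a single new category and one-hot encoded in the same way as the original columns.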

3. Text features (text data)

  • Represent text as token features (convert text to token)

    • Bag of words (BoW) model

      • limitations: needs careful vocabulary design, loses context
    • Word embeddings (e.g. Word2vec)

      • vectorizing words such that similar words are placed close together
      • trained by predicting a target word from its context words
  • Pre-trained language models (e.g. BERT, GPT-3): pre-trained deep neural networks used to extract features

    • giant transformer models
    • trained on large amounts of unannotated data
    • fine-tuned for downstream tasks
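
To make the first two ideas concrete, here is a small sketch in plain Python. It shows a bag-of-words vector over a fixed vocabulary, and the (context words, target word) pairs that a CBOW-style Word2vec model would be trained on. The whitespace tokenizer, the function names, and the window size are assumptions for illustration; no actual embedding is trained here.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real tokenizers are more careful."""
    return text.lower().split()


def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs; other words are dropped."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocab]


def cbow_pairs(tokens, window):
    """Build (context_words, target_word) pairs from a token sequence."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


vocab = ["bites", "dog", "man"]
a = bag_of_words("dog bites man", vocab)   # [1, 1, 1]
b = bag_of_words("man bites dog", vocab)   # [1, 1, 1]
training = cbow_pairs(tokenize("the cat sat"), window=1)
# [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
```

The two sentences get identical vectors even though they mean different things, which is the "loses context" limitation of BoW. Any word outside vocab is silently ignored, which is why the vocabulary needs careful design.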

4. Image/video features (picture/video data)

  • traditionally, features were extracted from images with hand-crafted descriptors such as SIFT (manual extraction)
  • now pre-trained deep neural networks are commonly used instead
    • ResNet: trained on ImageNet (image classification)
    • I3D: trained on Kinetics (action classification)
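
As a rough illustration of what "hand-crafted" means, the sketch below computes a histogram of gradient orientations over a tiny grayscale image. This is not SIFT itself: it is a much simplified stand-in for the kind of gradient statistic that SIFT descriptors are built from, and the function name, the 8-bin layout, and the toy images are assumptions for the example.

```python
import math


def orientation_histogram(img, n_bins=8):
    """Histogram of gradient orientations, weighted by gradient magnitude.

    img is a 2D list of grayscale intensities. Border pixels are skipped so
    that central differences are always defined.
    """
    hist = [0.0] * n_bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal central difference
            gy = img[y + 1][x] - img[y - 1][x]  # vertical central difference
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat region, no orientation to record
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * n_bins) % n_bins
            hist[b] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist


vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]
hv = orientation_histogram(vertical_edge)    # all weight in bin 0
hh = orientation_histogram(horizontal_edge)  # all weight in bin 2
```

The fixed-length histogram is the feature vector: images with different edge directions land in different bins, and a classical model such as an SVM can be trained on these vectors. A pre-trained network replaces this manual design with features learned from data.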

5. Summary

  • Features matter
  • Features are either hand-crafted or learned by (pre-trained) deep neural networks

Reposted from: blog.csdn.net/jerry_liufeng/article/details/123455027