Feature Engineering: Tabular Data Features & Multimodal Features

Enterprise Development · 2023-04-10 05:32:43

1. Introduction

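In the era of classical machine learning, a great deal of time had to be spent on feature design, and good input data could make training far more effective. With deep learning, neural networks can extract features automatically, freeing us from hand-crafting them. (That is the theory; in practice feature selection is still necessary. Real-world applications often have an enormous number of features, tens of millions or even hundreds of millions, and feeding all of them into a network would make the parameter count huge and the model very difficult to train.)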
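A rough count shows why too many features becomes a problem. If each feature gets its own row in an embedding table, the table alone holds (number of features) × (embedding dimension) weights. The sketch below assumes a hypothetical embedding dimension of 16 and float32 (4-byte) storage; both are illustrative choices, not values from the original post.

```python
def embedding_params(num_features: int, embedding_dim: int) -> int:
    """Trainable weights in an embedding table with one row per feature."""
    return num_features * embedding_dim


def params_in_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Memory footprint in GB (10**9 bytes), assuming 4-byte float32 weights by default."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Feature counts from the range mentioned in the text; dim=16 is a hypothetical choice.
    for n in (10_000_000, 100_000_000):
        p = embedding_params(n, 16)
        print(f"{n:,} features x 16 dims = {p:,} params, about {params_in_gb(p):.1f} GB")
```

Under these assumptions, 100 million features already need 1.6 billion embedding weights (about 6.4 GB in float32), before any of the network's other layers are counted.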

2. Tabular Data Features

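Numerical data (e.g., age): use the values directly, or discretize them into n bins. (IDs such as user ID are stored as numbers but are usually treated as categorical features rather than as magnitudes.)
Categorical data: one-hot or multi-hot encoding
Time data: timestamps, typically decomposed into components such as hour of day or day of week
Feature crosses: new features formed by combining two or more existing features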
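A minimal sketch of the four kinds of tabular features listed above, using only the Python standard library. The bin range, vocabularies, and bucket count are hypothetical. Hashing a feature cross into a fixed number of buckets (here with `zlib.crc32`, chosen because it is deterministic across runs, unlike Python's built-in `hash` for strings) is one common way to keep the crossed vocabulary bounded; it is not necessarily what the original author used.

```python
import zlib
from datetime import datetime


def equal_width_bin(value: float, low: float, high: float, n_bins: int) -> int:
    """Discretize a continuous value into one of n_bins equal-width buckets over [low, high]."""
    if value <= low:
        return 0
    width = (high - low) / n_bins
    # Values at or above `high` are clamped into the last bucket.
    return min(int((value - low) // width), n_bins - 1)


def one_hot(category: str, vocab: list[str]) -> list[int]:
    """One-hot vector: a single 1 at the category's position in vocab."""
    return [1 if c == category else 0 for c in vocab]


def multi_hot(categories: set[str], vocab: list[str]) -> list[int]:
    """Multi-hot vector: a 1 for every category the sample has (e.g. a user's interest tags)."""
    return [1 if c in categories else 0 for c in vocab]


def time_features(ts: datetime) -> dict[str, int]:
    """Decompose a timestamp into components a model can use directly."""
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday == 0
        "month": ts.month,
        "is_weekend": int(ts.weekday() >= 5),
    }


def cross(a: str, b: str, n_buckets: int) -> int:
    """Feature cross: join two categorical values and hash the pair into n_buckets buckets."""
    return zlib.crc32(f"{a}|{b}".encode("utf-8")) % n_buckets


if __name__ == "__main__":
    print(equal_width_bin(35, 0, 100, 10))                               # 3
    print(one_hot("red", ["red", "green", "blue"]))                      # [1, 0, 0]
    print(multi_hot({"sports", "music"}, ["sports", "music", "games"]))  # [1, 1, 0]
    print(time_features(datetime(2023, 4, 10, 5, 32)))
    # {'hour': 5, 'weekday': 0, 'month': 4, 'is_weekend': 0}
    print(cross("male", "beijing", 1000))  # some bucket id in [0, 1000)
```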

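Note: these features are usually passed through an embedding layer first to obtain embedding vectors, which are then fed into the neural network.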
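The embedding step can be pictured as a table lookup: each categorical ID selects one row of a weight matrix, which is equivalent to multiplying its one-hot vector by that matrix. The toy class below is a hypothetical illustration; in a real framework such as PyTorch this role is played by `torch.nn.Embedding`, whose rows are learned by backpropagation, whereas here they are only randomly initialized. Averaging the rows of a multi-hot feature is one common pooling choice (similar in spirit to `mode="mean"` in `torch.nn.EmbeddingBag`).

```python
import random


class EmbeddingTable:
    """Toy embedding layer: one row of floats per categorical ID (randomly initialized, not trained)."""

    def __init__(self, vocab: list[str], dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.index = {token: i for i, token in enumerate(vocab)}
        self.weights = [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in vocab]

    def lookup(self, token: str) -> list[float]:
        """Dense vector for a single ID (same result as one-hot vector times the weight matrix)."""
        return self.weights[self.index[token]]

    def pooled(self, tokens: list[str]) -> list[float]:
        """Mean of the vectors of a multi-hot feature, giving one fixed-size vector."""
        vectors = [self.lookup(t) for t in tokens]
        return [sum(col) / len(vectors) for col in zip(*vectors)]


if __name__ == "__main__":
    table = EmbeddingTable(["sports", "music", "games"], dim=4)
    print(table.lookup("music"))              # a 4-dimensional dense vector
    print(table.pooled(["sports", "music"]))  # element-wise mean of two rows, still 4-dimensional
```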

3. Multimodal Features

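Multimodal features are fed into separate modality encoders, which extract their representations. These encoders are general-purpose architectures taken from other fields, such as ViT for images and BERT for text.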
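The "one encoder per modality" idea can be sketched structurally: each modality goes through its own encoder, and the resulting vectors are combined before the downstream network. The two encoders below are hypothetical stand-ins (a hashed bag-of-words for BERT and simple intensity statistics for ViT) so that the example runs without any model weights; concatenation is shown as one simple fusion option among several.

```python
import zlib


def text_encoder(text: str, dim: int = 8) -> list[float]:
    """Stand-in for a text encoder such as BERT: hashed bag-of-words, normalized to sum to 1."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def image_encoder(pixels: list[list[float]]) -> list[float]:
    """Stand-in for an image encoder such as ViT: global mean, min and max pixel intensity."""
    flat = [p for row in pixels for p in row]
    return [sum(flat) / len(flat), min(flat), max(flat)]


def fuse(*modality_vectors: list[float]) -> list[float]:
    """Fusion by concatenation: the joint representation passed to the downstream network."""
    return [x for vec in modality_vectors for x in vec]


if __name__ == "__main__":
    t = text_encoder("red running shoes")
    i = image_encoder([[0.0, 0.5], [1.0, 0.5]])
    joint = fuse(t, i)
    print(len(t), len(i), len(joint))  # 8 3 11
```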

Image/Video Features

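For image and video data, the usual approach is to use a pretrained neural network to obtain a vector representation. For example, take a model pretrained on ImageNet, feed our image samples through it (fine-tuning it on our own data if needed), and use the output of the last hidden layer as the feature vector for each image.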
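The key point is that the feature vector is taken from the last hidden layer rather than from the classifier's output logits. In practice this is usually done with a library such as torchvision and real ImageNet weights; to keep the example self-contained and runnable offline, the sketch below uses a tiny network with made-up weights (`TinyPretrainedNet` is hypothetical) and a 3-number input standing in for an image.

```python
def relu(xs: list[float]) -> list[float]:
    return [max(0.0, x) for x in xs]


def dense(x: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """Fully connected layer: `weights` has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class TinyPretrainedNet:
    """Hypothetical 'pretrained' classifier: input(3) -> hidden(2) -> logits(2).

    The weights are invented for illustration; a real model would load ImageNet-trained weights.
    """

    W1 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
    b1 = [0.0, 0.0]
    W2 = [[1.0, -1.0], [-1.0, 1.0]]
    b2 = [0.0, 0.0]

    def hidden(self, x: list[float]) -> list[float]:
        """Activations of the last hidden layer: the part we keep as image features."""
        return relu(dense(x, self.W1, self.b1))

    def forward(self, x: list[float]) -> list[float]:
        """Full pass down to class logits (the original pretraining task, discarded here)."""
        return dense(self.hidden(x), self.W2, self.b2)


def extract_features(model: TinyPretrainedNet, samples: list[list[float]]) -> list[list[float]]:
    """Run each sample through the frozen model and keep the last hidden layer's output."""
    return [model.hidden(x) for x in samples]


if __name__ == "__main__":
    net = TinyPretrainedNet()
    print(extract_features(net, [[1.0, 2.0, 0.0], [0.0, 0.0, 2.0]]))  # [[1.0, 1.5], [0.0, 1.0]]
```

The extracted vectors can then be used as input features for a downstream model, exactly like the embedding vectors of tabular features.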

4. Summary

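For text, image, and video data, pretrained models can supply the information we need.
For tabular data, on the other hand, features have to be engineered by hand so that the model can learn from more dimensions of information.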


Reposted from blog.csdn.net/qq_42018521/article/details/129284046
