Big data: classification of data content

structured data

  1. Structured data can be represented and stored in a relational database and expressed logically as two-dimensional tables.

  2. Structured data: two-dimensional table (relational)

  3. Structured data: structure first, data second

  4. Data is organized in rows. Each row holds the information for one entity, and every row has the same attributes.
    Examples include data in a MySQL database and CSV files.

  5. It can be represented with uniform data types or structures, such as numbers and symbols.

  6. It can be expressed logically as a two-dimensional table made up of attributes (columns) and tuples (rows). For example, in a transcript table, "score" is an attribute, and a row that records a score of 90 is a tuple.

  7. Structured data is stored and arranged very regularly, which makes operations such as querying and modifying it much easier.

  8. Mature analysis tools exist for structured data
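
The points above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `transcript` table, its columns, and the sample rows are hypothetical and only serve to show "structure first, data second": the schema is declared before any row is inserted, and every row carries the same attributes.

```python
import sqlite3

# Structure first: declare the schema of a hypothetical transcript table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcript (student TEXT, course TEXT, score INTEGER)")

# Data second: every row (tuple) has the same three attributes.
rows = [("Alice", "Math", 90), ("Bob", "Math", 85), ("Alice", "Physics", 78)]
conn.executemany("INSERT INTO transcript VALUES (?, ?, ?)", rows)

# The regular layout makes querying straightforward.
high_scores = conn.execute(
    "SELECT student, score FROM transcript WHERE score >= ? ORDER BY score DESC",
    (85,),
).fetchall()

# Modification is equally direct.
conn.execute(
    "UPDATE transcript SET score = 80 WHERE student = ? AND course = ?",
    ("Alice", "Physics"),
)
updated = conn.execute(
    "SELECT score FROM transcript WHERE student = ? AND course = ?",
    ("Alice", "Physics"),
).fetchone()[0]

print(high_scores)  # [('Alice', 90), ('Bob', 85)]
print(updated)      # 80
```

Here `score` plays the role of an attribute, and `("Alice", "Math", 90)` is one tuple.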

unstructured data

  1. Unstructured data, as the name suggests, is data without a fixed structure.

    It includes office documents in all formats, plain text, pictures, various reports, images, and audio/video. XML and HTML are sometimes listed here as well, although they are more commonly treated as semi-structured data (see the semi-structured data section). Data of this type is generally stored directly as a whole, usually in binary format.

  2. The information has no predefined data model and is not organized in a predefined way.
    Compared with data in traditional databases or tagged files, it is harder to interpret because it is irregular and ambiguous.

  3. Typical human-generated unstructured data includes

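	Text files: word-processing documents, spreadsheets, presentations, email, logs
	Social media: data from platforms such as Sina Weibo, WeChat, QQ, Facebook, Twitter, and LinkedIn
	Websites: YouTube, Instagram, photo-sharing sites
	Mobile data: text messages, location, etc.
	Communications: chat, instant messaging, phone recordings, collaboration software, etc.
	Media: MP3, digital photos, audio files, video files
	Business applications: MS Office documents, productivity applications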
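  4. Typical machine-generated unstructured data includes
	Satellite imagery: weather data, terrain, military activity
	Scientific data: oil and gas exploration, space exploration, seismic imagery, atmospheric data
	Digital surveillance: surveillance photos and video
	Sensor data: traffic, weather, and oceanographic sensors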
  5. Mature analytical tools exist for structured data, but tools for mining unstructured data are still at an early stage of development.

  6. There is far more unstructured data than structured data.
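
As a rough illustration of storing unstructured data "as a whole" in binary form, the sketch below keeps a file's bytes in a single BLOB column using Python's built-in `sqlite3` module. The `files` table, the file name, and the payload bytes are all hypothetical stand-ins; real systems often use a file system or object store instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The database knows only a name and an opaque blob, not what the bytes mean.
conn.execute("CREATE TABLE files (name TEXT PRIMARY KEY, content BLOB)")

# Stand-in bytes for a real image file (hypothetical content).
payload = bytes([0x89, 0x50, 0x4E, 0x47]) + b"...opaque image bytes..."
conn.execute("INSERT INTO files VALUES (?, ?)", ("photo.png", payload))

# The content comes back exactly as stored; it cannot be queried by its
# internal fields, because it has none that the database understands.
stored = conn.execute(
    "SELECT content FROM files WHERE name = ?", ("photo.png",)
).fetchone()[0]

print(stored == payload)  # True
```

The database can look the file up by name, but any analysis of the content itself has to happen outside the table model.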

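With the development of network technology, and especially the rapid growth of the Internet and related technologies, the volume of unstructured data keeps increasing.

As a result, the limitations of relational databases, which are designed mainly to manage structured data, have become increasingly apparent. Database technology has accordingly entered a "post-relational database era", moving toward unstructured databases built for network applications.

Over the past few years, the big data industry has focused mostly on how to process massive, multi-source, heterogeneous data and extract value from it, and the vast majority of that data was structured.

Today, unstructured data accounts for a growing share of data across industries: imaging data in healthcare, teaching documents in education, audio and video material in media, archived video in law enforcement, and so on. More and more organizations need to store massive amounts of unstructured data over the long term, and business requirements for collecting, managing, and using that data are becoming increasingly diverse.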

semi-structured data

  1. Semi-structured data lies between fully structured data (such as data in relational or object-oriented databases) and completely unstructured data (such as sound and image files).
    Examples include HTML documents, JSON, XML, and data in some NoSQL databases.

  2. Semi-structured data is a form of structured data that does not conform to the data model of a relational database or other tabular form, but that contains tags or other markers to separate semantic elements and to express the hierarchy of records and fields. It is therefore also called a self-describing structure.
    Examples include log files, XML documents, JSON documents, and email.

  3. Entities of the same class can have different attributes even when they are grouped together, and the order of the attributes does not matter. In other words, structure and content are typically mixed together with no clear separation.

  4. Semi-structured data: trees, graphs

  5. Semi-structured data: data first, then structure
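
A minimal sketch of these properties, using Python's built-in `json` module: two hypothetical "person" records belong to the same class but carry different attributes, and their structure is discovered by walking the data as a tree rather than being declared in advance. The records and the `key_paths` helper are illustrative assumptions, not part of any library.

```python
import json

# Two records of the same class with different attributes (hypothetical data).
raw = """
[
  {"name": "Alice", "email": "alice@example.com"},
  {"name": "Bob", "phones": ["555-0100", "555-0101"], "address": {"city": "Beijing"}}
]
"""

people = json.loads(raw)


def key_paths(node, prefix=""):
    """Walk a nested JSON value as a tree and collect its dotted key paths."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            paths |= key_paths(item, prefix)
    return paths


# Data first, then structure: the "schema" of each record is read off the data.
structures = [sorted(key_paths(person)) for person in people]
for person, structure in zip(people, structures):
    print(person["name"], structure)
# Alice ['email', 'name']
# Bob ['address', 'address.city', 'name', 'phones']
```

The keys act as the self-describing tags: each record announces its own fields, so no shared table definition is required.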

Origin blog.csdn.net/ThinkPet/article/details/132121408