Do you understand "dark data"?

Dark data, or "dust data", is the redundant, often forgotten data that companies and organizations collect in the course of their activities but never use. It is typically unstructured, unlabeled, and unanalyzed, and it often just sits on networks and servers, taking up valuable space. So how does dark data accumulate, and how should it be used?

Dark data accumulates from many sources. It may include user activity logs, customer conversations and email records, server monitoring logs, video files, and machine and sensor data generated by the Internet of Things. It may also include data that can no longer be accessed because it is stored on obsolete devices.

There are three main types of dark data:

  • The first is traditional text-based data. This may include emails, logs, and documents.
  • The second type is non-traditional data. This includes unlabeled audio and video files and still images. This type of dark data cannot be analyzed with traditional techniques and requires artificial intelligence, such as computer vision, pattern recognition, and facial recognition. For example, video analysis software can now scan images and videos and tag specific elements such as cats, birthday cakes, and chairs. The tagged images can then be searched for specific features, and the frequency and location of each appearance can be recorded, converting the dark data into a usable form.
  • The third type is deep web data. This is information on the deep web that search engines cannot reach. Most of it is private and controlled by governments or private institutions. It includes medical records, legal records, financial information, and organization-specific databases curated by academics, government agencies, and local communities.
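The tagging step described for non-traditional data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the detections list is made-up sample data standing in for the output of a real computer-vision model, and the function names build_tag_index and tag_frequencies are hypothetical, not part of any tool mentioned in this article.

```python
from collections import Counter, defaultdict

# Hypothetical detector output: (file name, timestamp in seconds, tag).
# In practice these rows would come from a computer-vision model.
detections = [
    ("party.mp4", 3.0, "birthday cake"),
    ("party.mp4", 7.5, "chair"),
    ("party.mp4", 12.0, "birthday cake"),
    ("pets.mp4", 1.0, "cat"),
    ("pets.mp4", 4.0, "cat"),
    ("pets.mp4", 9.0, "chair"),
]


def build_tag_index(rows):
    """Map each tag to the list of (file, timestamp) places where it appears."""
    index = defaultdict(list)
    for file_name, timestamp, tag in rows:
        index[tag].append((file_name, timestamp))
    return index


def tag_frequencies(index):
    """Count how many times each tag was seen across all files."""
    return Counter({tag: len(places) for tag, places in index.items()})


index = build_tag_index(detections)
frequencies = tag_frequencies(index)

# Search the formerly "dark" video by tag: where do cats appear?
print(index["cat"])
print(frequencies.most_common())
```

Once tags are stored this way, the videos become searchable by content, and both the frequency and the location of each element are on record.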

Keeping dark data can expose an organization to hidden risks. Stored data may contain sensitive information the company is unaware of, including proprietary information and the personal information of employees and customers. When an organization does not know what data it holds, it is difficult to protect that data. Storing so much data also drives up costs. Companies may also breach data compliance laws and regulations, which require stronger protection for certain types of data, and not knowing what data they hold can increase the cost of compliance monitoring.
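A first step toward knowing what sensitive information is sitting in stored text is a simple pattern scan. The sketch below is a minimal illustration, not a real compliance tool: the two regular expressions are deliberately simple assumptions, the sample documents are invented, and the names scan_text and scan_documents are hypothetical.

```python
import re

# Deliberately simple, illustrative patterns (assumptions for this sketch).
# Real detection of personal data needs far more care than this.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_like_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def scan_text(text):
    """Return {pattern name: number of matches} for the patterns that hit."""
    hits = {}
    for name, pattern in PATTERNS.items():
        count = len(pattern.findall(text))
        if count:
            hits[name] = count
    return hits


def scan_documents(documents):
    """Scan a {document name: text} mapping; keep only documents with hits."""
    report = {}
    for name, text in documents.items():
        hits = scan_text(text)
        if hits:
            report[name] = hits
    return report


# Invented sample data standing in for forgotten files on a server.
documents = {
    "support_log.txt": "Customer alice@example.com wrote in; cc bob@example.org.",
    "server.log": "GET /index.html 200 12ms",
    "old_export.csv": "name,card\nCarol,4111 1111 1111 1111\n",
}

report = scan_documents(documents)
print(report)
```

A report like this only flags candidates for review, but it gives an organization a starting list of which forgotten files deserve stronger protection.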

On the other hand, dark data may prove to be a valuable asset. It can hold information that is not available in any other format. Deep learning and artificial intelligence are beginning to give companies new hope of extracting and monetizing this data. New data extraction tools include DeepDive and Snorkel, both developed at Stanford University, and Dark Vision, a technology demonstration application that uses IBM Watson technology to extract dark data from video.

Springwise recently covered a facial recognition system that can capture emotions on the spot, as well as a system that collects foot-traffic data for retailers and makes it easily accessible. Such innovations can make big data easier to use, thereby helping to reduce the amount of "dark data".

This article is reproduced from Snow Beast Software.

Origin blog.csdn.net/u014674420/article/details/112308610