A New Vision of the Internet, Inspired by an American TV Drama


Speaking from a technical person's point of view, let me lay out a few of these ideas.

Source of the idea, part one: deleting duplicate data to free up hard disk space

I don't know about you, but my computer's hard disk never seems to have enough free space. Hard drives keep getting bigger, yet they still can't keep up with growing storage needs. Is there any way to save some space? My answer was deduplication, so I fired up Everything, that powerful file search tool (where "file" includes folders and everything else you can imagine, if you look at it from a Linux angle), and hunted for a duplicate-data removal tool already on my computer. After searching for a long time without success, I fell back on 360's duplicate-data deletion tool. Personally, I think 360's tool for finding and deleting duplicate data is decent, but it lacks customization: for example, I wanted to set the file types and size thresholds for the search, and there was no way to do it. Still, I made do with it.

However, deduplication that works by simply deleting files is clearly not something I can accept, so I kept digging for information. It turns out that deduplication technology today is mainly used by cloud server vendors, data security vendors, and data storage vendors; NAS vendors, of course, may be involved as well.

Cloud server vendors can be split into two parts. One is the cloud network disk, hereafter just "network disk", since you really can't see any trace of a cloud in it; the other is the cloud server itself.

Let me talk about network disks first. To save storage costs, network-disk vendors are bound to think about deduplication and redundant data storage. Likewise, we often send large files by email, and sometimes a file we have never uploaded before (say, an operating system image of several GB) finishes uploading in an instant. That "instant upload", together with the mailbox's file transfer station feature, shows that mailboxes also do redundant data detection and deletion under the hood.
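Here is a minimal sketch of how that "instant upload" can work, assuming the server keeps an index of content hashes of files it already stores. The `stored_index` dict and `upload` function are hypothetical illustrations, not any vendor's real API:

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks so multi-GB files never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical server-side index: digest -> where the bytes already live.
stored_index: dict[str, str] = {}

def upload(path: str) -> str:
    digest = file_sha256(path)
    if digest in stored_index:
        # "Instant upload": the bytes already exist on the server,
        # so only a new reference to them needs to be recorded.
        return stored_index[digest]
    stored_index[digest] = f"blob/{digest}"  # pretend the bytes were transferred
    return stored_index[digest]
```

The client only ever sends a digest first; the multi-GB transfer happens just once, no matter how many users "upload" the same OS image.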

As for cloud servers, that brings us to virtualization. As we all know, cloud servers are carved out of high-end physical machines through virtualization technologies such as KVM and Xen. Even if `cat /proc/cpuinfo` or some other probe tells us that the VPS we bought is highly configured, its actual performance may be utterly miserable; of course, the vendor may do load balancing, which makes it all acceptable. The point here is that virtualizing so many servers naturally produces a large amount of redundant data, and how to store that redundant data is naturally the technical question cloud server vendors care about most.
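To make that redundancy concrete: two VM images built from the same base OS share most of their blocks, so block-level deduplication stores each unique block only once. A hypothetical sketch with fixed 4 KB blocks (not any vendor's actual implementation):

```python
import hashlib

BLOCK = 4096  # fixed-size 4 KB blocks

def store_image(path: str, block_store: dict[str, bytes]) -> list[str]:
    """Split an image into blocks, storing each unique block only once.

    Returns the "recipe" (ordered digest list) needed to rebuild the image.
    """
    recipe = []
    with open(path, "rb") as f:
        while block := f.read(BLOCK):
            digest = hashlib.sha256(block).hexdigest()
            block_store.setdefault(digest, block)  # dedup happens here
            recipe.append(digest)
    return recipe

blocks: dict[str, bytes] = {}
recipe_a = store_image("vm_a.img", blocks)  # hypothetical image paths
recipe_b = store_image("vm_b.img", blocks)  # mostly reuses blocks from vm_a
```

Storing the second image adds almost nothing new to `block_store`; only its recipe grows.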

Current deduplication is mostly file-based, though block-based and byte-based approaches have also been studied. Those two may require a specific file system format; FUSE, for instance, comes up a lot. Either way, I still couldn't find an easy-to-use, high-performance deduplication tool for personal use. The ideal situation would be:

  1. There should be no duplicate data; at the very least, the duplicate-data problem should not appear on a PC.
  2. In theory, every data check should quickly chunk and index the computer's hard drive, building a database at the bottom layer of the drive.
  3. Every time a new file comes in, it should be quickly looked up in the index database; if duplicate data arrives, don't ask whether to replace it, just create a new soft link directly (see the sketch after this list).
  4. Partitions should no longer be a hard limit: only one partition at the bottom layer, while the surface keeps the visual experience of multiple partitions, since most people can't adapt all at once. This should also be good for the life of an SSD.
  5. With database indexing and fast search, everything should be quick. All operations should be smooth and imperceptible, with hardly any drain on resources.
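As a very rough approximation of points 2 and 3 (scan, index by content hash, replace duplicates with soft links), here is a toy sketch. It rebuilds its index on every run instead of maintaining a persistent database hooked into file creation, so treat it as an illustration of the idea, not the always-on system described above:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def dedup_with_symlinks(root: str, min_size: int = 1 << 20) -> None:
    """Replace duplicate files under `root` with soft links to one kept copy.

    `min_size` is the size threshold the 360 tool wouldn't let me set:
    files smaller than it are left alone.
    """
    seen: dict[str, Path] = {}  # digest -> the copy we keep
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.is_symlink():
            continue
        if path.stat().st_size < min_size:
            continue
        digest = sha256_of(path)
        if digest in seen:
            path.unlink()                            # drop the duplicate bytes...
            path.symlink_to(seen[digest].resolve())  # ...keep a soft link instead
        else:
            seen[digest] = path
```

A real tool would also have to handle files being edited after linking, which is exactly why serious implementations live at the file system layer rather than in a script.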

Source of the idea, part two: the compression technology in the American drama "Silicon Valley"

As a technical person, it's natural to watch science fiction, and of course natural to watch tech shows brag. So I dug out "Silicon Valley", that "bad film" I had collected many years ago, and finally watched it, downloading the latest few seasons through special channels along the way. I won't go into the show's various thought-provoking details; let me just talk about the one thing that runs through the entire series: data compression. Watching to the end brings us back to the new Internet mentioned at the beginning of this post, a new Internet built on powerful compression technology.

Compression technology is, in fact, something I'm personally interested in, and it touches on a lot of areas. When I first joined Zhihu, compressed sensing had just emerged, and things like sparse storage made my heart race every time I saw them. On Zhihu, whenever compression technology comes up, entropy gets mentioned.
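Entropy is the key fact here: Shannon entropy sets the average number of bits per symbol below which no lossless compressor can go. A quick sketch of measuring it for a byte string:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: the floor any lossless compressor can approach."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

print(shannon_entropy(b"aaaaaaaa"))        # -0.0: one symbol, nothing to encode
print(shannon_entropy(bytes(range(256))))  # 8.0: uniform bytes, incompressible
```

This is also why the show's universal "middle-out" compressor is fiction: real data that is already at 8 bits per byte of entropy has nothing left to squeeze losslessly.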

This time around, I again skimmed through Wikipedia's list of compression algorithms (both lossy and lossless).

Another reason for this interest is that my work involves storing and transmitting small-to-medium-sized data (under 1 GB or so; at least I consider that small-to-medium, "big data" being a relative term). Using plain binary triples, i.e. sparse storage, can greatly reduce file size, and with some level of lossy compression the volume can shrink much further. The first open-source code I came across used more or less this kind of leveled lossy compression: the lossy part mainly shows up as storing a float matrix as an integer matrix plus a single floating-point scale, with each integer using fewer bits, e.g. int8.
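A minimal numpy sketch of both tricks, under the assumption that the triples are (row, column, value) and the quantization is symmetric int8 (the open-source code I saw may differ in details):

```python
import numpy as np

# --- Triplet (COO) sparse storage: keep only the non-zeros ---
dense = np.zeros((1000, 1000), dtype=np.float32)  # ~4 MB, almost all zeros
dense[3, 7], dense[500, 2] = 1.5, -0.25
rows, cols = np.nonzero(dense)
vals = dense[rows, cols]          # three short parallel arrays replace 4 MB

# --- Leveled lossy compression: int8 matrix + one float scale ---
m = np.random.randn(1000, 1000).astype(np.float32)
scale = np.abs(m).max() / 127.0            # the single float we keep alongside
q = np.round(m / scale).astype(np.int8)    # 1 byte per entry instead of 4
restored = q.astype(np.float32) * scale    # lossy: small rounding error remains
```

The sparse trick is lossless but only pays off when most entries are zero; the int8 trick always cuts the size to a quarter but trades away precision.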

Imagining the digital universe's impact on the real universe

The "IDC latest research report: 2020" digital universe "" mentioned in 2020, the amount of information in the digital universe reached 40 trillion GB, really scary. So I thought, if not as strong as the "Silicon Valley" in that super Niubi Hong Hong compression algorithm, if not revolutionary deduplication technology, resources and objective truth of the universe will continue to be the digital universe collapse.

In the 2020s, with the arrival and commercialization of 5G, the rise of red-hot live video, and the explosive growth of online shopping and online transactions, data is expanding faster and faster. It's conceivable that the earth's resources will keep being depleted. In such a situation, can artificial intelligence produce a good compression model, or a good deduplication model?


Source: www.cnblogs.com/liq07lzucn/p/12057344.html