Cedar Rock data storage solution for unstructured data

In traditional banking and insurance, staffed counters, credit applications, insurance claims and other services record transaction information in databases, but they also tend to produce large amounts of unstructured data: ID photos, scanned copies of paper documents, scanned evidentiary documents, scene photos, and so on. Financial industry regulations require these files to be stored for long periods to support supervisory audits and to avoid potential legal risks.

With the rapid development of Internet finance, competition in the financial industry is heating up. More and more financial companies want financial technology to help them reduce customer acquisition and customer service costs and to improve office efficiency and risk assessment efficiency. To this end, major financial institutions are racing to implement fintech projects, such as: intelligent counters, which reduce the cost of opening branch outlets; paperless counters, which improve counter work and service efficiency; mobile claims apps, which make claims faster for users; intelligent credit review, which improves risk assessment efficiency and reduces labor costs; and cloud-based, containerized infrastructure, which improves the utilization and management of basic resources.

Behind these new financial technologies lies a vast amount of unstructured data: images, documents, audio and video. The number of files and the volume of data are growing explosively, which brings many new challenges to the original storage architecture.

Challenges brought by massive unstructured data

For business departments, access performance for massive numbers of small files is critical because it directly affects the end-user experience. The counter system or credit system of a provincial branch of a joint-stock bank can add hundreds of millions of files every year. Such a large number of small files is a major challenge for file storage, and many banks are already considering how to centralize their files.

With the rollout of VTM (a virtual remote banking system) and online "double recording" systems, which keep audio and video records, storage capacity requirements are growing rapidly. For example, the double-recording data of a bancassurance business or an insurance company can grow by hundreds of TB in half a year. Whether storage can provide high throughput and guarantee read and write performance for audio and video files is an important concern.

Most financial institutions have adopted distributed databases and big data technologies to achieve unified online storage and querying of historical data. Unstructured data may reach the PB or even EB scale. Storing and managing data uniformly at this scale, querying historical data in real time, and supporting later big data analysis all place higher demands on intelligent storage management.

The current trend is toward IaaS and private clouds, with compute and storage resources moved to the cloud. Distributed databases have already been moved to the cloud, so cloud resources for structured data can be allocated on demand and scaled elastically. Cloud storage for unstructured data, however, still lacks a good solution. Especially with the addition of audio and video data, the storage footprint keeps growing while the value per unit of this data is low, so reducing the unit cost of storage also deserves attention.

To address the storage performance and scalability bottlenecks caused by the massive bills, certificates, contracts and other documents that accumulate in critical banking and insurance systems (such as cash, credit, underwriting and claims), unstructured data storage technology in the financial industry has gone through four stages:

NAS storage stage

In the early days, when the financial industry had few files and modest capacity needs, financial customers commonly placed image data on external NAS devices. As the number of files grew massively, the number of files and the capacity that a single NAS can manage became a bottleneck. In real projects, we have seen users with tens of millions of files whose access latency reached the order of seconds, which directly affects the service experience of end users. Adding more NAS devices raises storage management complexity and scatters the data of a single application across different devices. In large enterprises, IT staff spend a lot of time on change approval processes for operations and maintenance, must constantly guard against the operational risk that such frequent changes bring, and cannot really focus on creating value for the business.

ECM stage

As the number of files grew, financial institutions began to introduce ECM (enterprise content management) systems. An ECM system manages multiple external NAS devices in a unified way, allows NAS devices to be added dynamically, and presents a unified namespace externally, so it can manage far more files than a single NAS. ECM systems also support storing and retrieving file attributes, which enables document retrieval across business systems and meets document management needs.

However, because ECM interface protocols are non-standard, applications require specialized development and are costly to adapt, so ECM is mainly used in the imaging systems for financial counter, credit and post-transaction supervision business. More importantly, ECM requires high investment: storing one hundred TB of data can cost up to several million, which makes it unsuitable for low-value-density data such as audio and video, and its maintenance costs are also very high.

Distributed database stage

With the rise of big data technology and MPP distributed databases in the financial sector, the industry tried to use these technologies to solve the problem of unstructured data storage. They did deliver a major breakthrough in performance and scalability for massive numbers of small files, and a distributed database can store and retrieve file metadata in a unified way, which meets content management needs.

But a distributed database architecture is designed for structured storage, and it has many limitations as a replacement for file storage. First, the MPP distributed database architecture makes it difficult to implement some advanced features of traditional storage, such as erasure coding (similar to a distributed RAID) and file deduplication. This leads to high storage costs and makes it unsuitable for low-value-density data such as audio and video. Second, limited by the SQL interface, it cannot provide basic functions of traditional NAS storage such as permission management for directories and subdirectories, quota management, and directory snapshot and rollback, so it lacks mechanisms for data security and data reliability. In addition, SQL and NoSQL are poorly standardized as file storage interfaces and are complex for business users. Some financial institutions tried this approach, but it did not become a mainstream form of large-scale deployment.
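The cost argument around erasure coding can be made concrete with a little arithmetic. The sketch below compares the raw capacity needed to hold the same usable data under replication and under erasure coding. The 100 TB figure, the three-copy layout and the 4+2 layout are illustrative assumptions, not figures from any particular product.

```python
def raw_capacity_tb(usable_tb, data_units, redundancy_units):
    """Raw capacity needed to store usable_tb with the given layout.

    Replication with n copies is modelled as 1 data unit plus n - 1
    redundancy units. Erasure coding k+m is k data units plus m parity units.
    """
    return usable_tb * (data_units + redundancy_units) / data_units


usable = 100  # TB of audio and video data (illustrative assumption)

# Three full copies: survives the loss of any two copies.
three_copies = raw_capacity_tb(usable, data_units=1, redundancy_units=2)

# 4+2 erasure coding: also survives the loss of any two units.
ec_4_2 = raw_capacity_tb(usable, data_units=4, redundancy_units=2)

print(three_copies, ec_4_2)
```

Under these assumptions both layouts tolerate two failures, but the erasure-coded layout needs half the raw capacity, which is why the lack of erasure coding matters for bulky, low-value-density data.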

Object storage stage

The Internet industry, on the other hand, has seen the rapid development of mobile Internet and smartphones in recent years, and new applications such as WeChat, live streaming and short video have brought far more unstructured data than the financial industry has. Because of the large data volumes and file counts, Internet companies needed cost-effective storage, and about ten years ago they began to adopt distributed architectures based on x86 servers to solve mass data storage. The technologies that emerged include GoogleFS, Amazon S3, Ali FastDFS and other file storage solutions, many of them accessed over the HTTP protocol. Owing to Amazon's influence in the public cloud, AWS S3 object storage has become the de facto standard in the Internet industry, and the public clouds of Ali, Tencent and Huawei all currently use object storage compatible with the S3 protocol.

Technically, object storage combines x86 servers with distributed storage software to build a unified storage pool from the servers' local disks, and it scales to large storage clusters at the PB or even EB level. Hardware and software are decoupled, so hardware can be phased out and replaced dynamically without the lengthy data migration that a NAS equipment refresh requires. The interface is simplified, and a single namespace can manage hundreds of times as many files as a NAS. Access goes through an SDK over the HTTP protocol, so applications reach storage directly without mounting it in the operating system, which suits cloud and containerized applications as well as mobile apps. The protocol is standardized, which meets the need for standardized infrastructure and is compatible with public clouds, making it easy to migrate application systems seamlessly between public and private clouds.
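To illustrate what a simplified, single namespace means in practice, here is a toy in-memory model of one bucket. Keys live in a flat map, and "directories" are only a listing convention built from a prefix and a delimiter, in the style of S3-compatible listing. The class name, method names and keys are hypothetical and do not represent any real SDK.

```python
class ToyObjectStore:
    """In-memory stand-in for one bucket: a flat map from key to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list_objects(self, prefix="", delimiter="/"):
        """Return (keys, common_prefixes) found directly under prefix."""
        keys, prefixes = [], set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter in rest:
                # Deeper keys are rolled up into one pseudo-directory entry.
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(prefixes)


store = ToyObjectStore()
store.put("claims/2019/0001/id_front.jpg", b"image bytes")
store.put("claims/2019/0001/scene.jpg", b"image bytes")
store.put("claims/2019/0002/contract.pdf", b"pdf bytes")
store.put("claims/readme.txt", b"text bytes")

top_keys, top_prefixes = store.list_objects(prefix="claims/")
case_keys, case_prefixes = store.list_objects(prefix="claims/2019/0001/")
print(top_keys, top_prefixes)
print(case_keys, case_prefixes)
```

Because there is no directory tree to maintain, the store only has to look up keys, which is one reason a single namespace can hold far more objects than a NAS file system.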

Beyond these basic object storage characteristics, Cedar Rock distributed object storage software focuses on the financial sector and helps financial customers build on-premises private cloud storage resource pools. It also productizes Internet object storage technology in depth and adds more features:
FTP / file interface compatibility, so that legacy applications in the financial industry can transition smoothly to object storage;
file metadata and metadata retrieval, replacing ECM capabilities to meet enterprise content management needs;
directory snapshots and snapshot policies with fast rollback across multiple file versions, which make unstructured data backup-free and solve the bandwidth and slow-retrieval problems of tape library backup;
multi-data-center disaster recovery and an active-active (AA) data center mode, so that business systems read and write at the nearest site;
both replication and erasure coding (similar to a distributed RAID) in a single environment, balancing the storage cost of audio and video against the performance needs of core financial business systems;
automated hot and cold data tiering, which reduces the storage cost of cold historical data while still meeting business performance requirements.
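As one way to picture the hot and cold tiering feature, the sketch below assigns each object to a tier based on how long ago it was last accessed. The 180-day threshold, the tier names and the object keys are hypothetical. The article does not describe the product's actual tiering policy, so this is only a minimal illustration of the idea.

```python
from datetime import datetime, timedelta

# Hypothetical policy: objects untouched for 180 days move to the cold tier.
COLD_AFTER = timedelta(days=180)


def choose_tier(last_access, now, cold_after=COLD_AFTER):
    """Return "cold" if the object has been idle for cold_after or longer."""
    return "cold" if now - last_access >= cold_after else "hot"


now = datetime(2020, 1, 1)
last_access_times = {
    "credit/2019/app_001.pdf": datetime(2019, 12, 20),
    "double_record/2018/rec_001.mp4": datetime(2018, 6, 1),
}

plan = {key: choose_tier(ts, now) for key, ts in last_access_times.items()}
print(plan)
```

A policy of this shape keeps recently used business files on faster media and moves old recordings to cheaper capacity, which matches the stated goal of lowering the cost of cold historical data.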

In summary, as financial technology is introduced, unstructured data comes in more types and grows faster, and it needs unified management and use. IT managers in the financial sector need to choose a suitable, forward-looking storage solution based on their information needs. In the future, combining big data analysis and artificial intelligence will make it possible to mine the value of massive financial unstructured data and help the financial sector flourish.

Origin blog.51cto.com/14636092/2461296