A Summary of 12 Commonly Used Big Data Tools

To meet the core needs of enterprises, big data tools have been adopted rapidly. In the decade since big data emerged as a concept and a business strategy, thousands of tools have appeared to perform a variety of tasks and processes. The vendors of these tools promise to save businesses time and money, and to uncover insights that improve profits. Clearly, the market for big data analysis tools is growing.

  Many of the earliest big data analysis tools, like the Hadoop software framework, began as open source projects, but commercial entities have sprung up to provide new tools or commercial support and development for these open source products.

  Choosing among these tools is a challenge, especially since many big data tools serve only a single purpose while companies need big data for many different tasks, so an enterprise's analysis toolbox can quickly become overfull. Based on the recommendations of expert consultants in this field, the following list presents a series of big data analysis tools, grouped into the main categories below.

  The main big data tools

  As mentioned above, big data tools tend to serve a single purpose, and there are many ways to use big data, so it helps to go category by category and look at the tools in each.

  || Big data tools: data storage and management

  Big data starts with data storage, which means starting with the Hadoop big data framework. Hadoop is an open source software framework developed by the Apache Foundation for distributed storage of very large data sets on clusters of computers.
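Hadoop pairs that distributed storage with the MapReduce processing model. A minimal sketch of the map/shuffle/reduce idea in plain Python (an illustration of the concept only, not Hadoop's actual API) is the classic word count:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data tools", "big data storage"]
counts = reduce_phase(shuffle(map_phase(docs)))
```

In a real cluster, the map and reduce phases each run in parallel across many machines; the framework handles the shuffle and fault tolerance.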

  Obviously, storing the needed volumes of information is essential for big data. But more importantly, there must be a way to pull all of that data together into some structure for management and analysis that can produce insight. Big data storage and management is therefore the real foundation: without it, the analysis platforms above it cannot work. In some cases, these solutions also include employee training.

  The main big data tools in this area are:

  1. Cloudera

  Cloudera is essentially Hadoop with some extra services added, services that businesses will need because big data is not a simple exercise. Cloudera's services team not only helps companies build big data clusters, it also helps train employees to get at their data more effectively.

  2. MongoDB

  MongoDB is the most popular big data database, well suited to managing unstructured data or data that changes frequently.
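MongoDB's flexibility comes from its document model: records are JSON-like documents that need not share a schema. A minimal sketch of that model using plain Python dicts (no live database here; against a real server you would use a driver such as pymongo) looks like this:

```python
# Documents in one MongoDB-style collection need not share a schema;
# plain dicts model this (a real app would use a driver like pymongo).
collection = [
    {"name": "sensor-1", "readings": [20.1, 20.4]},
    {"name": "user-42", "email": "a@example.com", "tags": ["beta"]},
]

def find(collection, query):
    # Minimal analogue of MongoDB's find(): match documents whose
    # fields equal every key/value pair in the query
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in query.items())]

matches = find(collection, {"name": "sensor-1"})
```

The point is that the two documents have different fields yet live in the same collection, which is exactly the property that makes document stores a fit for frequently changing data.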

  3. Talend

  As a company offering a wide range of solutions, Talend builds its products around an integration platform that combines big data, cloud computing, application and real-time data integration, master data management, and data preparation.

  Talend Data Quality includes big data integration and governance capabilities.

  || Big data tools: data cleanup

  Before a business can really process its big data for insights, that data first needs to be cleaned up and converted into a retrievable form. Big data sets tend to be unstructured and unorganized, hence the need for some sort of cleanup or conversion.

  In this day and age, cleaning up data is more necessary than ever, because data can come from anywhere: the mobile web, the Internet of Things, social media. Not all of this data will be "clean" enough to generate insights from, so a good data cleaning tool can make all the difference. In fact, over the next few years, effective data cleanup will be what separates truly good big data systems from merely acceptable ones.
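The kinds of cleanup described here, removing duplicates, blank fields, and inconsistent formatting, can be sketched in a few lines of Python (the field names and records are hypothetical):

```python
raw = [
    {"name": " Alice ", "city": "NYC"},
    {"name": "alice", "city": "NYC"},   # duplicate after normalization
    {"name": "", "city": "LA"},         # blank name: drop the row
    {"name": "Bob", "city": "LA"},
]

def clean(rows):
    seen, out = set(), []
    for row in rows:
        # Normalize: trim whitespace; lowercase only for comparison
        name = row["name"].strip()
        if not name:                     # drop rows with blank fields
            continue
        key = (name.lower(), row["city"])
        if key in seen:                  # drop duplicates
            continue
        seen.add(key)
        out.append({"name": name, "city": row["city"]})
    return out

cleaned = clean(raw)
```

Tools such as OpenRefine automate exactly these transforms, interactively and at much larger scale.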

  4. OpenRefine

  OpenRefine is an easy-to-use open source tool that cleans up messy data by removing duplicates, blank fields, and other errors. Being open source, it also has a large community that can help.

  5. DataCleaner

  Like OpenRefine, DataCleaner converts semi-structured data sets into clean, readable data sets that visualization tools can consume. The company also offers data warehousing and data management services.

  6. Microsoft Excel

  Excel can import data from various data sources. Its copy/paste operations are particularly useful for manual data entry and replication. It can remove duplicates, find and replace, spell check, and apply many formulas to transform data. But it quickly runs into trouble and is not suitable for large data sets.



  || Big data tools: data mining

  Once the data has been cleaned and is ready for inspection, you can begin searching it through the data mining process. This is where the actual discovery, decision-making, and forecasting happen.

  Data mining is in many ways the real core of the big data workflow. Data mining solutions are often very complex, yet they strive to offer an interesting, user-friendly interface, which is easier said than done. The other challenge these tools face is that they do require professionals to develop the queries: a data mining tool is no better than the professional using it.

  7. RapidMiner

  RapidMiner is an easy-to-use tool for predictive analytics, with a very user-friendly visual interface, which means companies can build and run analytics without writing code.
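The kind of predictive model such tools automate can be illustrated with a tiny least-squares linear fit in pure Python. This is a stand-in for what a visual tool like RapidMiner assembles for you, not its actual engine, and the sales figures are hypothetical:

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

# Hypothetical monthly sales, used only for illustration
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # predict month 5
```

Real predictive analytics adds model selection, validation, and many more algorithms, but the core idea is the same: fit a model to history, then extrapolate.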

  8. IBM SPSS Modeler

  IBM SPSS Modeler is a suite of enterprise-class advanced analytics products for data mining. IBM's services and consulting are second to none.

  9. Teradata

  Teradata provides end-to-end solutions for data warehousing, big data and analytics, and marketing applications. All of this means a company can truly become a data-driven business, with business services, consulting, training, and support included.

  Like many current big data tools, RapidMiner's offering also includes a cloud computing solution.

  || Big data tools: data visualization

  Data visualization is the way enterprise data is presented in a readable format. It is how a business views its charts and graphs and puts its data into perspective.

  Data visualization is as much an art as a science. As big data companies accumulate more data scientists and senior managers, it becomes important to offer broader visualization services to their employees. Every member of the team, from sales representatives to support staff to middle management, needs to be able to understand it, so the focus is usability. However, easy-to-read visuals and deep feature sets are sometimes at odds, which has become a major challenge for data visualization tools.
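At its simplest, visualization is just rendering numbers in a form the eye can compare at a glance. A toy text bar chart (with hypothetical regional sales figures) shows the idea that tools like Tableau elaborate on:

```python
def bar_chart(data, width=20):
    # Scale each value to a bar of '#' characters relative to the max
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:>6} | {bar} {value}")
    return "\n".join(lines)

sales = {"North": 120, "South": 60, "East": 90}  # hypothetical figures
chart = bar_chart(sales)
print(chart)
```

Even this crude rendering makes the ranking of regions obvious in a way a table of numbers does not, which is the whole argument for visualization tools.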

  10. Tableau

  As one of the leaders in this field, Tableau offers data visualization tools focused on business intelligence, letting users create a variety of maps, charts, and graphs without programming. Tableau has five products in total, including a free version called Tableau Public that potential customers can try.

  11. Silk

  Silk is a simpler version of Tableau. It enables organizations to visualize data as maps and charts without any programming, and it will even attempt to visualize the data automatically when it is first loaded. It also makes publishing results online easy.

  12. Chartio

  Chartio uses its own visual query language, so with just a few clicks you can create powerful dashboards without having to know SQL or other modeling languages. What sets it apart is that it connects directly to databases, so no data warehouse is required.
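That approach, querying the operational database directly instead of first copying data into a warehouse, can be sketched with Python's built-in sqlite3 module (the table and query are hypothetical; an in-memory database stands in for a production one):

```python
import sqlite3

# In-memory database stands in for a production DB queried directly
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 100.0), ("west", 50.0), ("east", 25.0)])

# A dashboard tile is ultimately just an aggregate query like this one
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

The trade-off is freshness versus load: direct queries always see live data but put analytic load on the production database, which is exactly what a warehouse exists to absorb.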

  || IBM Watson Analytics

  IBM Watson Analytics is an intelligent data science assistant that combines machine learning (ML) and artificial intelligence (AI), providing a broad set of data science capabilities and guidance for business analysts and data scientists.

  || The three layers of big data tools

  Ritesh Ramesh, chief technology officer of PricewaterhouseCoopers' data and analytics practice, said that by sophistication and market strategy, big data tools break down into three layers.

  First layer: the largest one is the set of open source tools. Every company starts out this way, like Cloudera and Hortonworks. Beyond basic infrastructure, servers, and storage, the value added here is small, and most cloud computing vendors have commoditized this layer.

  Second layer: most of these vendors have realized that to grow their market share they must build proprietary applications on top of the open source tools to differentiate themselves from other vendors. For example, Cloudera built something like a data science platform on top of the Hadoop core.

  Third layer: these are vertically specialized applications. Most of these companies partner with firms such as PricewaterhouseCoopers, Cognizant, or the systems integrator Accenture. This is where the real value lies, and it is a very effective competitive strategy for big data tool vendors.

  Ramesh said that beyond the basic functions, tools are needed in three areas. The first is data preparation tools. "Data wrangling tools are important for customer data quality and performance — analytics toolkits that can handle 50 million rows of data to find insights," he said.

  He said the industry's leading vendors here include Trifacta, Paxata, and Talend.

  The second category is data management applications, for example how companies define their metadata. "A lot of people are dumping a lot of garbage data into the lake, and not many tools on the market work effectively on a data lake. Since most of this work is done by IT staff, they are more interested in getting data into the lake than in putting a governance structure around it," Ramesh said.
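What a data catalog adds on top of a raw data lake can be sketched as a small registry that records an owner and tags for each dataset (all names here are hypothetical):

```python
catalog = {}

def register(name, owner, tags):
    # Record governance metadata alongside each dataset in the lake
    catalog[name] = {"owner": owner, "tags": set(tags)}

def search(tag):
    # Answer "which datasets are tagged X?" from the catalog,
    # instead of trawling the lake's files themselves
    return sorted(n for n, meta in catalog.items() if tag in meta["tags"])

register("clickstream_raw", "web-team", ["pii", "raw"])
register("sales_curated", "finance", ["curated"])
hits = search("pii")
```

Commercial cataloging tools layer lineage, quality scores, and automated discovery onto this same basic idea.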

  The top industry vendors in this field are the data cataloging tools from Waterline Data, Tamr, and Collibra.

  The third category is the frequently demanded security applications. Ramesh said, "People want a single product with a security access layer covering columns, rows, and objects. They want security products that support differentiated user access to data objects."
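Column-level access control of the kind described here can be sketched as a filter that strips fields a role is not entitled to see (the roles, policy, and field names are hypothetical):

```python
# Which columns each role may see (hypothetical policy)
POLICY = {
    "analyst": {"region", "amount"},
    "admin": {"region", "amount", "customer_ssn"},
}

def redact(rows, role):
    # Return each row with only the columns the role may read
    allowed = POLICY[role]
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]

rows = [{"region": "east", "amount": 100.0, "customer_ssn": "123-45-6789"}]
visible = redact(rows, "analyst")
```

A production system would enforce this in the database or query layer rather than in application code, but the policy model — role, object, allowed columns — is the same.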


Origin blog.csdn.net/yyu000001/article/details/90548035