Eight big data best practices every project should know

Big data, if not applied properly, can easily spiral out of control and consume corporate resources and budgets. This article introduces some best practices to help avoid the confusion big data can cause.

  Big data can provide excellent insight for its users, but it can also overwhelm enterprises. Companies make their decisions based on the data they collect. The main problem enterprises face is that big data tends to be treated as a technical solution gathered by technical professionals, when best practice is to drive it from their business processes.

  Due to the explosive growth of data sources and input devices, more data is collected than ever. According to an IBM survey, most US companies store upwards of 100TB of data, while US government departments and enterprises lose as much as $3.1 trillion each year to bad data.

  However, companies build data warehouses or data lakes and fill them with their data, much of which is never used or only partially used. Enterprise data lakes rapidly accumulate into pools overflowing with information.

  The most basic problem is that much of the data is only partially processed, or was gathered on a flawed foundation: the data was not collected properly, or the means of collection was never properly defined. Enterprises must clearly link big data to the business.

  This is a small problem for the regular, everyday, small-scale data used in commercial databases. Big data, by contrast, requires enterprises to process huge volumes of information, and because of that sheer size, both the potential benefit and the potential confusion are much greater. Getting it "right" therefore becomes more important.

  So what does "right" mean in big data?

  The fact is that the concept of "big data best practices" is still evolving, because data analytics itself is a rapidly developing field. Companies nonetheless need to compete with the best strategy available to them. The best practices proposed here can help companies avoid being inundated with useless data, or drowning in their own data lakes.

  (1) Define big data business goals

  The industry has a bad habit of being distracted by shiny new things such as Hadoop clusters. Before a company starts to take advantage of big data analytics, understanding business needs and goals should be its first step, and its most important one. Business users must be clear about the results they want.

  This is where company management must take the lead, with technology following. If management has no clear business goals, the right data will not be collected or created. Many organizations collect every piece of data they can and then weed out what they do not need. This creates a great deal of unnecessary work, so companies should instead identify the information they actually need rather than collecting everything.
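A minimal Python sketch of that "collect only what you need" idea: instead of ingesting every field and discarding the excess later, define the fields the business goal calls for and keep only those at ingestion time. All field names here are purely illustrative assumptions.

```python
# Sketch: project each raw record down to the fields the business goal
# actually requires. Field names are illustrative assumptions only.

REQUIRED_FIELDS = {"customer_id", "purchase_date", "amount"}

def project_record(raw_record: dict) -> dict:
    """Keep only the fields the defined business goal calls for."""
    return {k: v for k, v in raw_record.items() if k in REQUIRED_FIELDS}

raw = {
    "customer_id": "C-1001",
    "purchase_date": "2019-05-20",
    "amount": 42.50,
    "browser_fingerprint": "ab3f9c",   # collected "just in case" -- dropped
    "referrer_chain": ["a", "b", "c"], # no goal defined for it -- dropped
}
clean = project_record(raw)  # only the three required fields remain
```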

  (2) Evaluate and develop strategic partnerships

  Big data projects should not be done by the IT department in isolation. They must involve the data owners, which will be business units, as well as big data technology suppliers or consulting bodies; these vendors can bring an outside perspective and vision to the organization and assess its current situation.

  While developing the strategy, keep running checks to ensure that the data being collected is necessary for the business and will provide the required insight, just as cooks check their work throughout the cooking process. Do not check only after everything has been collected: if the data went wrong somewhere along the way, the checks that should have been running from the start must be redone, along with the work they would have caught.

  By working with those who benefit from the project, the business ensures that everyone involved succeeds together.

  (3) Determine what you have and what you need in big data

  A large amount of data is not the same as good, usable data. The correct data may be mixed in there somewhere, but it is up to the enterprise to find it. The more randomly data is collected, the more disorganized it tends to be, arriving in many different forms.

  It is also equally important to determine what the company does not have. Once the data required for the project has been collected, it becomes possible to determine what is missing; everything must be ready before work begins.

  Companies do not always know in advance which data fields will be required, so make sure the software is flexible enough to adjust during implementation. This follows directly from the principle of determining what is required and what is already on hand.

  The bottom line is that companies must test the data and test the results. An enterprise may be surprised to find it is not getting the answers it needs. It is best to find that out before the project is fully underway.
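A hedged sketch of "test the data before trusting the answers": check each incoming record against the fields and types the project requires, and report what is wrong before analysis begins. The schema and field names are illustrative assumptions, not a real validation framework.

```python
# Sketch: validate records against the fields the project requires.
# REQUIRED maps an assumed field name to its expected Python type.

REQUIRED = {"customer_id": str, "amount": float}

def validate(record: dict) -> list:
    """Return a list of problems found in one record (empty = clean)."""
    problems = []
    for field, expected_type in REQUIRED.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"bad type for {field}: {type(record[field]).__name__}")
    return problems

records = [
    {"customer_id": "C-1", "amount": 9.99},   # clean
    {"customer_id": "C-2"},                   # amount missing
    {"customer_id": 3, "amount": "free"},     # both fields mistyped
]
report = [validate(r) for r in records]
```

Running checks like this early surfaces collection problems while they are still cheap to fix, rather than after the data lake is full.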

  (4) Maintain ongoing communication and evaluation

  Effective collaboration requires continuous communication between stakeholders and the IT department. Business objectives may change as the project progresses, and if that happens, the changes must be conveyed to the IT department. The company may need to stop collecting one form of data and start collecting another; it does not want unneeded collection to continue.

  Draw a clear roadmap with milestones for the expected or desired results. For a 12-month project, check in every three months. This gives the company a chance to review progress and change course.

  (5) Start slowly with big data, then respond quickly

  The first big data project an enterprise undertakes should not be too ambitious. Start with a proof of concept or a pilot project that is relatively small and easy to manage.

  Choose one business-process area the company wants to improve, one where a mistake or a serious error will not have much impact. In addition, if a problem does not need a big data solution, do not force one onto it.

  Companies should also use agile techniques and iterative implementation methods. Agile is a way of operating, not something limited to development. What does agile development look like? For example: write a small amount of code and test it in a variety of ways, then add more, then test thoroughly again; rinse and repeat. This method can be applied to any process, not just programming.

  Agile and iterative implementation techniques deliver quick solutions in short increments based on current needs, rather than an all-at-once waterfall approach.
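The "write a little, test, add more, test again" loop described above can be sketched with a tiny function grown in increments; the function and its behavior are illustrative assumptions, and the point is only that each increment's tests stay in the suite.

```python
# Illustrative sketch of iterative development: a small parser grown
# in three increments, each merged only after its tests passed.

def parse_amount(text: str) -> float:
    """Increment 1 parsed a plain number; increment 2 added a leading
    currency symbol; increment 3 added thousands separators."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    return float(cleaned)

# The tests written at each increment remain in the suite: rinse, repeat.
assert parse_amount("12.5") == 12.5          # increment 1
assert parse_amount("$12.5") == 12.5         # increment 2
assert parse_amount("$1,250.00") == 1250.0   # increment 3
```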

  (6) Assess the technical requirements of big data

  According to research firm IDC, the vast majority of data is unstructured, possibly as much as 90%. But companies still need to look at where the data comes from to determine the best storage for it. They can choose among various SQL and NoSQL databases, or use both.
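The contrast between the two storage shapes can be sketched with only the Python standard library: `sqlite3` stands in for a structured SQL store, and `json` merely illustrates the schemaless, document-style shape a NoSQL store would hold (a real deployment would use an actual document database). The event record is an illustrative assumption.

```python
import json
import sqlite3

# One event, stored two ways.
event = {"user": "u42", "action": "click", "extras": {"page": "/home"}}

# Structured (SQL): columns declared up front; nested fields need modeling.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user TEXT, action TEXT)")
con.execute("INSERT INTO events VALUES (?, ?)", (event["user"], event["action"]))
row = con.execute("SELECT user, action FROM events").fetchone()

# Semi-structured (NoSQL-style): the whole document, nesting and all, as-is.
doc = json.dumps(event)
restored = json.loads(doc)
```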

  Companies also need to evaluate whether they require real-time insight or after-the-fact analysis. Real-time processing may call for Apache Spark, while batch processing can use Hadoop. There are also geographically distributed databases for data spread across multiple locations, which may suit companies with multiple sites and data centers.
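The batch-versus-real-time distinction can be shown in miniature with plain Python; this is deliberately not Hadoop or Spark code, just the two processing shapes side by side on assumed toy data.

```python
def batch_total(readings):
    """Batch style (Hadoop-like): wait for the full dataset, then compute."""
    data = list(readings)          # all data must land before work starts
    return sum(data)

def streaming_totals(readings):
    """Streaming style (Spark-like): emit a running answer per event."""
    total = 0
    for r in readings:             # insight is available after each record
        total += r
        yield total

events = [3, 1, 4, 1, 5]
final = batch_total(events)                # one answer at the end
running = list(streaming_totals(events))   # answers as the data arrives
```

Whether the business needs `final` tomorrow morning or `running` within seconds is exactly the question that decides between the two platform families.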

  In addition, companies need to look at the specific analytics each database offers to see whether they apply. IBM acquired Netezza, a maker of high-performance analytics appliances; Teradata and Greenplum embed SAS accelerators; Oracle's Exadata systems support analysis in the R language; and PostgreSQL has its own special analytic programming syntax. Companies therefore need to see how well each of these tools can meet their needs.

  (7) Align big data with the cloud

  Companies must be careful when using cloud computing, because it is billed by the amount consumed, and big data means processing large amounts of data. However, cloud computing has many advantages: a public cloud can be provisioned and scaled very quickly with little or no lead time, and services such as Amazon EMR and Google BigQuery allow rapid prototyping.

  The first advantage is using the cloud to build prototype environments quickly. With a subset of the data and the tools offered by cloud providers such as Microsoft and Amazon, companies can stand up development and test environments within a few hours and use them as a testing platform. Then, once the company has developed a solid working model, it can move the work back to the on-premises data center.

  Another advantage of cloud computing is that much of the data companies collect likely already resides there. In that case, there is no reason to transfer it to an on-premises data center. Many databases and big data applications support data sources both in the cloud and on-premises, so if a company collects its data in the cloud, it should keep it there.



  (8) Manage big data talent, and stay on top of compliance and access issues

  Big data is an emerging field, not one like Python or Java programming that can simply be self-taught. A study by the McKinsey Global Institute projected that by 2018 there would be a shortage of 140,000 to 190,000 people with the necessary expertise, in addition to a shortage of 1.5 million managers and analysts able to make decisions based on analysis results.

  First of all, it must be clear who should have access to the data, and how much access. Data privacy is a major issue today, especially in Europe, where the stringent General Data Protection Regulation (GDPR) is being implemented and will place strict limits on how companies use data.

  Be sure to clarify all data privacy issues and who has access to sensitive data. Corporate governance should also address other questions, such as what happens on staff turnover. Determine what data (if any) can go into the public cloud, what data must stay in the on-premises data center, and who controls what.
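A hypothetical sketch of "who has access to sensitive data": redact fields a role is not cleared for before handing a record over. The roles, field names, and policy are illustrative assumptions, not a GDPR-compliance implementation.

```python
# Sketch: role-based redaction of sensitive fields. SENSITIVE and
# CLEARED_ROLES are assumed policy, chosen for illustration only.

SENSITIVE = {"email", "ssn"}
CLEARED_ROLES = {"compliance_officer"}

def view_record(record: dict, role: str) -> dict:
    """Return the record with sensitive fields redacted for uncleared roles."""
    if role in CLEARED_ROLES:
        return dict(record)
    return {k: ("<redacted>" if k in SENSITIVE else v)
            for k, v in record.items()}

rec = {"customer_id": "C-7", "email": "a@b.com", "ssn": "123-45-6789"}
analyst_view = view_record(rec, "analyst")            # sensitive fields hidden
officer_view = view_record(rec, "compliance_officer") # full record
```

Centralizing even a simple policy like this makes the governance questions above ("who controls what") answerable in one place.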

  Finally, although some universities are establishing and expanding data science courses, these courses are not standardized, and each curriculum differs slightly in the key skills it covers. So a company that recruits someone with a master's degree in data science may find the new hire does not understand the tools the business uses or the industry it is in. Then again, given the shortage of skills, companies may need to hire such candidates anyway and train them in their own vertical.

Origin blog.csdn.net/yyu000001/article/details/90547924