Do you know which languages, tools, and frameworks are used in big data?

To understand the current and future state of big data, we interviewed 31 IT executives from 28 organizations. We asked them, "What are the most popular languages, tools, and frameworks you use for data extraction, analysis, and reporting?" Their answers are summarized below.

Python, Spark, Kafka

Driven by big data and AI/ML (machine learning), the Scala and Python languages and Apache Spark are increasingly popular.

We see migration away from OLAP data warehouses, with Python used for machine learning development when less structure is needed. Writing ML models in Python is very convenient for developers, and Python's extensions and libraries provide strong support.
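
For example, a minimal sketch of how little code a Python ML model needs, using scikit-learn; the file name and column names below are assumptions for illustration:

```python
# A minimal sketch: training a model in a few lines with scikit-learn.
# The CSV path and the "label" column are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("events.csv")                      # data exported from the warehouse
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```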

Kafka is used for streaming extraction, with development done in R and Python; Java is common as well. SQL will not disappear: it may not be big data's best friend, but its openness lets more people access the data, and Gartner has moved SQL-on-Hadoop out of the trough of disillusionment.
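
A minimal sketch of streaming extraction from Kafka in Python, here using the kafka-python client; the topic name and broker address are assumptions:

```python
# A sketch of consuming a Kafka topic for streaming extraction.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                              # hypothetical topic
    bootstrap_servers="localhost:9092",    # hypothetical broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    record = message.value                 # one extracted record, ready for analysis
    print(record)
```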

We see a lot of data warehouse activity around emerging technologies such as Hadoop, Spark, and Kafka, and many people are also very interested in Redshift, Snowflake, and BigQuery.

In the machine learning stack, TensorFlow adds a powerful tool that increases people's confidence in their models and flattens the learning curve.

The third is Kubernetes, which has also gathered many fans and is gradually expanding its user base.

Other open source tools such as Spark, R, and Python are also widely used, and platform integration is one reason these open source tools get adopted.

In a big data workflow, a new node can be introduced with code or scripts developed in Python, R, or Spark. When executed, the node becomes part of the workflow's execution pipeline.

Not long ago, R was dominant, especially for data science models. Now the real innovation is around Python, because Python is supported by so many tools and libraries.

Then people began exploring Spark and Kafka. Spark handles large data volumes at breakneck speed. Kafka is the messaging system used to transfer data into Spark, while R is ideal for analyzing historical data, feeding real-time data into models, and grouping data so that real-time applications and models can run.
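
As a sketch of that Kafka-to-Spark hand-off, here is a minimal PySpark Structured Streaming job; the broker address and topic name are assumptions, and the job requires the spark-sql-kafka connector package:

```python
# A sketch of the Kafka -> Spark pattern with PySpark Structured Streaming.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-spark").getOrCreate()

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
    .option("subscribe", "sensor-events")                 # hypothetical topic
    .load()
)

# Kafka delivers the payload as bytes; cast it to a string before analysis.
events = stream.select(col("value").cast("string").alias("payload"))

# Write to the console to keep the sketch self-contained.
query = events.writeStream.format("console").start()
query.awaitTermination()
```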

Some common tools and frameworks also include in-memory relational databases such as VoltDB, along with Spark, Storm, Flink, Kafka, and various NoSQL databases.

We offer a LINQ-style API for all types of CRUD data operations, callable from a variety of languages such as C#, Go, Java, JavaScript, Python, Ruby, PHP, Scala, and Swift. Because the database was designed for high performance (predictably low latency), we built it primarily for programmatic data access rather than statement-based access, so it does not yet support SQL.
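
To illustrate the difference between programmatic CRUD access and SQL statements, here is a purely hypothetical Python sketch; the lowlatencydb module and every call on it are invented for illustration and are not this vendor's actual API:

```python
# Hypothetical sketch of programmatic (non-SQL) key-based CRUD access.
# The "lowlatencydb" module and its methods are invented for illustration.
import lowlatencydb

db = lowlatencydb.connect("db.example.com", port=7000)
users = db.collection("users")

users.put("user:42", {"name": "Ada", "plan": "pro"})   # create / update by key
record = users.get("user:42")                          # read by key
users.delete("user:42")                                # delete by key
```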

As customers want to analyze the workloads they are running, we are adding SQL support and will support exporting data to back-end data warehouses and data lakes for analysis. For data extraction, pipeline tools such as Kafka and Kinesis are increasingly becoming customers' defaults and gaining attention.

We regard SQL as the primary protocol through which companies of all sizes use our data platform. For cluster deployment and management, we see Docker and Kubernetes adoption growing rapidly. For data extraction, Apache Kafka is used by many of our users, and we recently achieved certification in the Confluent Kafka Connector Partner Program. For better processing and analysis, we often pair Apache Spark with Apache Ignite, used as in-memory data storage.

Apache Kafka has in fact become a standard; it can extract large volumes of near-real-time data (sensor data in particular) and stream it to analytics systems. For maximum analytical performance, in-database machine learning and advanced analytics are becoming a very important way for large organizations to deliver predictive analysis.

For visual reporting, the market offers a wide variety of data visualization tools: from Tableau to Looker, from Microsoft Power BI to IBM Cognos to MicroStrategy, and so on. Business analysts have never had so many choices for visualizing data in reports. They want to be sure that their underlying data analysis platform has the scale and performance to let them fully and accurately extract maximum insight from the data within seconds or minutes.

We use a variety of data extraction and indexing tools; Apache Kafka and Apache NiFi are currently the most common.

We use Hadoop YARN and HBase/HDFS as the data persistence layer, and then rely on open source projects such as Apache Zeppelin, Spark/Spark Streaming, Storm, scikit-learn, and Elasticsearch for data processing, predictive modeling, analysis, and deep learning. We can also use Talend, Pentaho, Tableau, and other excellent commercial software and tools.
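
As a sketch of the Elasticsearch piece of such a stack, here is a minimal index-and-search example with the official Python client (8.x-style API); the index name and fields are assumptions:

```python
# A sketch of indexing and searching documents with elasticsearch-py.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # hypothetical local cluster

es.index(index="app-logs", document={"level": "ERROR", "msg": "disk full"})
es.indices.refresh(index="app-logs")          # make the document searchable immediately

hits = es.search(index="app-logs", query={"match": {"level": "ERROR"}})
for hit in hits["hits"]["hits"]:
    print(hit["_source"])
```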

TensorFlow, Tableau, Power BI

1) We use Amazon Athena (built on Apache Presto) for log analysis; see the sketch after this list.

2) We use Mode Analytics for data visualization and reporting.

3) We use TensorFlow to analyze traffic patterns.
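
As a sketch of the Athena workflow in item 1, here is a hedged example using boto3; the database name, table, query, and S3 output bucket are all assumptions:

```python
# A sketch of log analysis with Amazon Athena via boto3.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

run = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "weblogs"},               # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

query_id = run["QueryExecutionId"]
while True:                                                      # poll until the query finishes
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```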

Looking at data science and deep learning frameworks from the ML angle, TensorFlow, PyTorch, Keras, and Caffe have driven great innovation in building models for large-scale data applications and ML.
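
As a small illustration of why these frameworks lower the barrier, here is a minimal binary classifier in Keras/TensorFlow; the random data and layer sizes are stand-ins, not anything from the survey:

```python
# A minimal Keras/TensorFlow sketch of a binary classifier.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                       # 20 stand-in features
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.rand(1000, 20).astype("float32")         # stand-in feature matrix
y = np.random.randint(0, 2, size=(1000,))              # stand-in labels
model.fit(X, y, epochs=3, batch_size=32)
```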

On the BI side, use cases are scaling to ever more data analysts, and Tableau, Power BI, MicroStrategy, TIBCO, and Qlik are all trying to expand the number and roles of users putting data in front of dashboards.

As technical teams move away from MapReduce, we see Spark, Java, and Python becoming increasingly popular. Kafka is used to extract data, while Arcadia Data, Tableau, Qlik, and Power BI handle visualization and report generation.

Many projects use multiple languages and multiple analysis tools. We see plenty of scenarios for SQL and for data science languages such as Python and R, but classic programming languages such as Java and C# also have a role to play. For data science, TensorFlow is the top toolkit, followed by self-service BI tools such as Tableau, Power BI, and QlikView.

Other

The world is going open source. More people are turning to streaming data, driven by the demand for real-time answers.

Of course, this depends on the particular project. We have seen a variety of mechanisms used for extraction, text enrichment, document classification, SciByte, bulk data, smart labeling tools, and in-depth data research, as well as personalized recommendations, opinion and sentiment analysis, and other rich big data use cases.

Whether customers are deciding what to use from a browser or looking to build their own tools, SQL is still the language of big data, and it works well on top of Hadoop and other databases.
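
A sketch of SQL-on-Hadoop from Python, here via PyHive against HiveServer2 (one of several possible clients); the host, credentials, and table are assumptions:

```python
# A sketch of running plain SQL on Hadoop through HiveServer2 with PyHive.
from pyhive import hive

conn = hive.connect(
    host="hadoop-edge.example.com",   # hypothetical edge node
    port=10000,
    username="analyst",
)
cursor = conn.cursor()
cursor.execute("SELECT country, COUNT(*) FROM page_views GROUP BY country")  # hypothetical table
for country, views in cursor.fetchall():
    print(country, views)
```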

OData is not so new, but people keep finding uses for it, and some people use GraphQL to dynamically query and retrieve data between server and client.
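
A minimal sketch of such a dynamic GraphQL query from Python, assuming a hypothetical endpoint and schema:

```python
# A sketch of sending a GraphQL query over HTTP with requests.
import requests

query = """
query ($id: ID!) {
  user(id: $id) {          # hypothetical schema
    name
    orders { total }
  }
}
"""

resp = requests.post(
    "https://api.example.com/graphql",                 # hypothetical endpoint
    json={"query": query, "variables": {"id": "42"}},
)
print(resp.json()["data"])
```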

In server-side programming there are many new technologies: MongoDB does well, and Redis is used for caching. AWS S3 as back-end data storage for Elasticsearch is very useful, and of course there are well-established techniques and design patterns.
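
A sketch of the common cache-aside pattern with Redis, via the redis-py client; the load_user_from_db helper is a stand-in for a real database lookup:

```python
# A sketch of cache-aside caching with Redis.
import json
import redis

r = redis.Redis(host="localhost", port=6379)   # hypothetical local Redis

def load_user_from_db(user_id):
    # Placeholder for a real database lookup (e.g. MongoDB).
    return {"id": user_id, "name": "Ada"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)               # cache hit
    user = load_user_from_db(user_id)           # cache miss: fall back to the database
    r.set(key, json.dumps(user), ex=300)        # cache for five minutes
    return user

print(get_user(42))
```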

R and Python users will stick with what they know, and big data systems support them through their many APIs. From a data extraction standpoint, you want to offer as many ways as possible to handle data input and output and to support as many tools as possible; no single tool has critical mass. Catering to talent means meeting demand for both developer tools and API support.

Large companies want people to keep using the same data science and BI tools. Because they have a variety of tools and thousands of users, they standardize on tooling and integrate it with various back-end data sources and production acceleration, including data integration, acceleration, a data catalog, and semantic definitions. The data catalog sits at the center of the platform, concentrating security, integration, and acceleration in a central layer that can be opened to all tools and data sources used together.

The big data world is quickly expanding into every development environment, including on-premises, cloud, and so on. We see constant change in languages, data formats, and execution engines. The core value of a big data platform is to let customers rise above these different tools and standards: using the drag-and-drop or code environments we provide, often without writing any code by hand, they can build easily repeatable data pipelines as part of the framework and deploy them at scale regardless of the technology, platform, or language used.
