How to debug Streaming SQL on EMR

1 Introduction

Starting with version EMR-3.21.0, EMR officially released the Spark Streaming SQL feature, which lets users process streaming data with Spark SQL. After two release iterations, many users have reported that verifying the correctness of results while developing and debugging streaming jobs in SQL is too cumbersome. At present, users must finish developing the job against a real data stream and then check the output storage system to see whether the results are correct. Some storage systems, such as Kafka, make the data inconvenient to inspect. The main debugging pain points are:

  • SQL results cannot be viewed directly in the console. Traditionally, you have to look at the output in the storage system.
  • Data keeps changing: both input and output data change constantly, so it is hard to see the results of each batch.
  • Execution metrics for each batch are hard to view. Traditionally, you have to look them up in the logs.

In addition, a debugging tool could also support some advanced features, such as:

  • Mock data sources that simulate a real data source.
  • Data sampling.

This article introduces the Streaming SQL debugging tools provided by EMR.


Origin yq.aliyun.com/articles/719714