Suning Big Data Interview Questions

1. Which Hadoop version do you use? The CDH version number is 5.3.6 and the Hadoop version is 2.72. Remember that the Hadoop version number and the CDH version number are not the same.
2. Does Flume collect data from a single node or from multiple nodes? Is your Flume the official framework or a customized one? Does the official framework have any problems?
        Flume collects data from multiple nodes. We use the official framework, configured according to the official website. The official Flume can lose data, but our data volume is not large, so we do not consider the data-loss problem.
3. If data backs up while Flume is collecting, how should it be handled?
4. What are the main Flume sources used in development?
    1. Avro Source: listens on an Avro port and receives events from external Avro client streams. Combined with the Avro Sink of another Flume agent, it can build multi-tier flows, fan-out flows, and fan-in flows. It can also receive log data sent through the Avro client that Flume provides.
    2. Spooling Directory Source: lets you ingest data by placing files into a "spooling" directory. The source watches the directory and parses new files as they appear. The event parsing logic is pluggable. Once a file has been read completely into the channel, it is renamed or, optionally, deleted. Note that a file placed in the spooling directory must not be modified afterwards; if it is, Flume reports an error. File names must also be unique; if a file with a duplicate name is placed in the directory, Flume reports an error.
    3. NetCat Source: listens on a specified port and converts each line of received data into one event.
    4. HTTP Source: accepts Flume events through HTTP GET and POST requests; GET should be used only for testing. The source needs a pluggable "handler" that converts the request into events. The handler must implement the HTTPSourceHandler interface; it takes an HttpServletRequest object and returns a list of Flume events. All events obtained from one HTTP request are committed to the channel in a single transaction, which improves efficiency on channels such as the file channel. If the handler throws an exception, the source returns HTTP status 400. If the channel is full and no more events can be added to it, the source returns HTTP status 503, meaning temporarily unavailable.
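
The status codes described for the HTTP Source can be modelled with a small sketch. This is a toy Python model, not Flume code: `handle_request`, `json_handler`, and the bounded `queue.Queue` standing in for the channel are hypothetical names chosen for illustration, and the JSON array format is an assumption.

```python
import json
import queue

def json_handler(body):
    """Hypothetical handler: turn a JSON array into a list of event dicts."""
    events = json.loads(body)              # raises ValueError on malformed input
    if not isinstance(events, list):
        raise ValueError("expected a JSON array of events")
    return events

def handle_request(body, channel, handler=json_handler):
    """Return the HTTP status that a source behaving as described would send.

    Assumes `channel` is a queue.Queue created with maxsize > 0.
    """
    try:
        events = handler(body)
    except Exception:
        return 400                         # handler failed: bad request
    # All events of one request are added together or not at all,
    # mimicking a single channel transaction.
    free_slots = channel.maxsize - channel.qsize()
    if len(events) > free_slots:
        return 503                         # channel full: temporarily unavailable
    for event in events:
        channel.put_nowait(event)
    return 200

channel = queue.Queue(maxsize=2)           # toy stand-in for a Flume channel
print(handle_request('[{"body": "hello"}]', channel))
print(handle_request('not json', channel))
```

A client that receives 503 would normally back off and retry later, since the condition is temporary.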
5. What do you use for scheduled tasks, and what do you do if a Hive task fails?
     Oozie can be used to schedule and monitor tasks. If a task fails, Oozie detects the failure and can restart the task.
6. Introduce your project's parameters and architecture. Be familiar with your own project, including some of its parameters. For example, give a brief overview of the project's architecture and workflow, describe the development cycle, and explain what problems came up during development. (Remember: large companies ask about specific knowledge points, while small companies go straight to the project in the interview.)

7. Project workflow: be very familiar with your own project and its parameters.
8. Briefly outline the shuffle process.
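
As a rough illustration of what the shuffle does between the map and reduce phases, the sketch below partitions map output by key, sorts each partition by key, and groups the values for each key. It is an in-memory Python toy built around a hypothetical `shuffle` function; real MapReduce additionally buffers and spills to disk, merges spill files, and copies partitions across the network to the reducers.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def shuffle(map_output, num_reducers):
    """Partition (key, value) pairs, then sort and group them per reducer."""
    partitions = defaultdict(list)
    for key, value in map_output:
        # Toy partitioner (an assumption): the sum of character codes keeps
        # the example deterministic; Hadoop uses the key's hash code instead.
        partitions[sum(map(ord, key)) % num_reducers].append((key, value))
    result = {}
    for pid, pairs in partitions.items():
        pairs.sort(key=itemgetter(0))      # sort by key within the partition
        result[pid] = [(k, [v for _, v in grp])
                       for k, grp in groupby(pairs, key=itemgetter(0))]
    return result

def word_count(lines, num_reducers=2):
    """Map: emit (word, 1). Shuffle. Reduce: sum the grouped values."""
    map_output = [(word, 1) for line in lines for word in line.split()]
    shuffled = shuffle(map_output, num_reducers)
    return {k: sum(vs) for part in shuffled.values() for k, vs in part}

print(word_count(["a b a", "b c"]))
```

Each reducer sees its keys in sorted order with all values for a key already collected, which is the guarantee the reduce function relies on.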
9. Which parameter settings do you use in Hive?
     For example, in real development the settings are chosen according to the actual cluster configuration and the size of the job, such as the job's memory size and number of CPU cores. This is part of Hive development optimization.
10. What does exec mean? What are the common Hadoop file formats?
      For example: SequenceFile, RCFile, Avro, and so on.
11. Using Flume and Kafka together: how do you combine them, and how do you intercept events?
12. Quick sort, merge sort? Recursion?
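
For reference, here is a minimal recursive sketch of both sorts in Python. These are textbook versions that return new lists, written for readability rather than performance; interviewers often ask for the in-place quicksort variant instead.

```python
def quick_sort(items):
    """Recursive quicksort: partition around a pivot, then sort each side."""
    if len(items) <= 1:                    # base case ends the recursion
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)

def merge_sort(items):
    """Recursive merge sort: sort each half, then merge the sorted halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:            # <= keeps equal elements stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])                # at most one of these is non-empty
    merged.extend(right[j:])
    return merged

print(quick_sort([5, 2, 9, 1, 5, 6]))
print(merge_sort([5, 2, 9, 1, 5, 6]))
```

Merge sort runs in O(n log n) time in every case, while quicksort averages O(n log n) but can degrade to O(n^2) with unlucky pivots.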
13. If the Kafka producer's data rate is much greater than the consumer's capacity to consume, what do you do? Is there a good way to solve it?
  This is a trap question. In actual development you can only scale out the Kafka cluster to increase throughput; there is no other good optimization.
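
To reason about how much extra capacity is needed, a back-of-the-envelope calculation can help. The sketch below is an assumption-laden Python model, not a Kafka API: the function names and the rates are made up for illustration, and it assumes that consumption scales linearly with the number of consumers in a group.

```python
import math

def lag_growth_per_second(produce_rate, consumers, per_consumer_rate):
    """Messages per second by which the backlog grows (0 if consumers keep up)."""
    return max(0, produce_rate - consumers * per_consumer_rate)

def consumers_needed(produce_rate, per_consumer_rate):
    """Smallest number of consumers whose combined rate covers the producers."""
    return math.ceil(produce_rate / per_consumer_rate)

# Hypothetical numbers: 50,000 msg/s produced, 8,000 msg/s per consumer.
print(lag_growth_per_second(50_000, 4, 8_000))
print(consumers_needed(50_000, 8_000))
```

Because each partition is read by at most one consumer within a consumer group, the topic needs at least as many partitions as the computed number of consumers for the extra consumers to do any work.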

14. What is the difference between Spark and Storm? Spark is not fully real-time computing.

15. Compared with MapReduce, where does Spark's speed advantage mainly come from?

Origin www.cnblogs.com/wqbin/p/11031272.html