Event Hub: A Beginner's Guide

Event Hub (Azure Event Hubs)

The goal of this article is to explain, in the plainest language possible, what the distributed big-data streaming platform Event Hub is, to take you from "never heard of it" to understanding its key concepts, and to give you a first look at its APIs for sending and receiving data.

Definition: what is Event Hub, and why does it exist?

Event Hub is a product in Microsoft's Azure cloud: a distributed big-data streaming platform, offered as a PaaS. Event Hub:

  • Supports large-scale, real-time streaming data
  • Can handle millions of events per second
  • Is easy to use, as a fully managed service
  • Is available in more than 54 Azure regions

What does "large-scale, real-time streaming data" mean? (Big Data Streaming)

Many applications need to collect and analyze data throughout their operation: a website collecting user behavior data, for example, or an IoT system collecting real-time data from all of its connected devices. The data is produced by many different endpoints, and it is produced all the time. Such data is streaming data: real-time, like water, flowing in a steady stream from one place to the next.

Why is the "big data" mean? Because these data may be sent from thousands of clients, and high frequency issued to bring together in one place for processing, forming a large-scale data, it is big data.

How much big data can Event Hub handle?

It can handle on the order of a million events per second. An "event" here is simply the piece of data you send or receive. - https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-about , 2019.8

 

What is a managed service? (Managed Service)

Event Hub is a fully managed service. What does that mean? If you do not use a hosted service like Event Hub, you have to run the big-data streaming platform yourself. Apache Kafka, for example, is also a big-data streaming platform, but it is not fully managed: developers have to build and operate the streaming infrastructure themselves, which means buying and configuring a virtual machine cluster, installing and operating Kafka, managing storage, and keeping up with updates, packages, and versions, or else paying another platform's managed service to handle those steps. Event Hub, by contrast, is fully managed: developers only need to create an Event Hub, and then they can send and receive data at scale while Event Hub takes care of everything in between and provides a stable service. Developers give up some control over the middle of the pipeline, but they can focus more on their own business logic.

 

In a big-data streaming pipeline there are clients that produce data and servers that receive and analyze it. Event Hub sits between the clients and the servers as a buffer.

Why do we need a buffer? Because without one, the producers and consumers depend directly on each other (dependency) and become tightly coupled, which is a problem once the data volume gets large. A buffer also provides flow control in the data pipeline.

 

A typical use of Event Hub is to collect telemetry data produced at the edge, including data collected 1) from applications and web clients, and 2) from gateways and remote devices (such as shared bicycles scattered around a city).

How it works

In essence, Event Hub is a temporary place to put the data.

When an endpoint produces data and sends it to an Event Hub, the Event Hub collects the data and writes it down into its internal storage. We can then read the data back out of the Event Hub and do whatever we want with it: collect it, visualize it, analyze it, and so on.

 

Event Hub is like a notebook: we write into it from front to back, and we also read it from front to back.

read ->  ||||||||||||||||||||||||||||||||||||||||||||||||||  <- write

 

Just as notes written in a notebook can be read again and again, data written into an Event Hub can also be read many times. Reading data does not delete it from the Event Hub.

However, data in an Event Hub is not kept forever. After data arrives, it is retained for a limited period (the retention period, which can be set between 1 and 7 days). It is as if the notebook's older pages were torn out once they pass a certain age.

That is the basic principle of Event Hub. In practice there are a few more detailed concepts that developers need to understand when using it, the most important being Partition and Consumer Group.

 

What is a Partition?

Data arriving at an Event Hub is not actually all written to the same place; it is written across several Partitions. Inside, an Event Hub is not a single notebook but several notebooks, each recording messages.

A Partition is something like a segment of the Event Hub's storage space. When data arrives at the Event Hub, each message is assigned to a Partition, by default in round-robin fashion. In other words, the first message is written into the first notebook, the second message into the second notebook, and so on.

Notebook A: 1 4

Notebook B: 2 5

Notebook C: 3 <- 6

  

The purpose of splitting the data into multiple partitions like this is to allow it to be received (read) in parallel.

 

Let's stay with the notebook analogy. Suppose that at the beginning you have an Event Hub with two notebooks recording data, and you hire one person (one instance of the reading application) to read from both notebooks (the application can open two threads to read them simultaneously). Since there aren't many messages at first, one person's brain can keep up.

Later, as your business grows, the incoming data stream grows too, and one person's head is no longer enough (one CPU cannot read fast enough). At that point you hire a second person (start another application instance), so each of the two people can take one notebook and read from it.

 

But what if the business keeps growing and even two people can't keep up? You would then want more notebooks so that you can hire more people. However, the number of partitions cannot be changed after an Event Hub is created, so the only option is to create a new Event Hub with more Partitions to meet the requirement.

  

Of course, more Partitions is not automatically better: each Partition needs its own Receiver to read it, which means more CPU and more socket connections. So think carefully before increasing the number of Partitions, and don't waste resources casually.

 

How many Partitions can you have?

An Event Hub allows 2 to 32 Partitions, set when the Event Hub is created. At present the number cannot be modified after creation (you can only create a new Event Hub), so estimate the number of concurrent readers sensibly at creation time.

 

What is a Consumer Group?

So what is a Consumer Group? Conceptually, a Consumer Group is a view (View) of the Event Hub used for reading. Under a Consumer Group we can save the state of the read: which position, or offset, in the stream the reading application has reached. That way, if an application's connection drops for some reason and it reconnects, it conveniently knows where to resume reading.

 

A single Event Hub can have multiple Consumer Groups (up to 20), so that different applications can read through different Consumer Groups. For example, suppose an Event Hub collects status data from all the shared bicycles in a city. When analyzing that data, one application monitors the current distribution of bicycles so that staff can be sent to rebalance them, while another analyzes users' usage patterns and riding routes. The two applications have different purposes and read at different frequencies, so each can use its own Consumer Group and read without disturbing the other.

 

How to use it

(*This article is an introduction, not a tutorial, so it assumes you have already created an Event Hub Namespace and an Event Hub in Azure and obtained the Event Hub connection string.)

 

So how do you use the Event Hub APIs to send and receive events?

Sending: the Sender API

There are two ways to send data to an Event Hub. The first is a REST API over HTTPS: you put the authorization information in the request headers and POST the data to be sent to the corresponding URL.*

(*Details here: https://docs.microsoft.com/en-us/rest/api/eventhub/event-hubs-runtime-rest .)
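As a rough sketch only: the sasToken, {namespace} and {eventhub} values below are placeholders, how to generate the SAS token is not covered here, and the content type and the expected 201 Created response follow the REST documentation linked above. A send over HTTPS could look something like this:

// Rough sketch of the REST-based send (assumes a pre-generated SAS token).
// {namespace} and {eventhub} are placeholders for your own namespace and Event Hub name.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class RestSendSketch
{
    static async Task Main()
    {
        var sasToken = "<your SAS token>"; // "SharedAccessSignature sr=...&sig=...&se=...&skn=..."
        var url = "https://{namespace}.servicebus.windows.net/{eventhub}/messages";

        using (var httpClient = new HttpClient())
        {
            httpClient.DefaultRequestHeaders.TryAddWithoutValidation("Authorization", sasToken);

            var content = new StringContent("{\"temperature\": 25}", Encoding.UTF8);
            content.Headers.ContentType =
                MediaTypeHeaderValue.Parse("application/atom+xml;type=entry;charset=utf-8");

            var response = await httpClient.PostAsync(url, content); // POST the event body
            Console.WriteLine(response.StatusCode);                  // expect 201 Created on success
        }
    }
}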

 

The second, recommended, way to send is the EventHubClient API. Behind it is the more efficient AMQP protocol, but we don't need to understand how AMQP is implemented here; we simply use the interface Microsoft provides.

Below, the sending interface is illustrated in C#:

The API is in the NuGet package Microsoft.Azure.EventHubs.

1. Create an EventData object and put the message to be sent into it.

 var eventData = new EventData(byteArray); // byteArray holds the message bytes, e.g. Encoding.UTF8.GetBytes(...)

 

2. Create an EventHubClient, providing the authorization information (the connection string) so that it connects to the specified Event Hub.

 EventHubClient eventHubClient = EventHubClient.CreateFromConnectionString(connectionString);

 

3. Call the EventHubClient's send API to send the data:

 await eventHubClient.SendAsync(eventData);
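Putting the three steps together, a minimal self-contained sketch could look like the following. The connection string is a placeholder (it must identify your Event Hub, i.e. include the EntityPath), and note that SendAsync is awaited:

// Minimal send sketch using the Microsoft.Azure.EventHubs NuGet package.
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;

class SendSketch
{
    static async Task Main()
    {
        var connectionString = "<your Event Hub connection string>"; // placeholder

        // 1. Put the message bytes into an EventData object.
        var eventData = new EventData(Encoding.UTF8.GetBytes("hello event hub"));

        // 2. Create a client connected to the Event Hub.
        var eventHubClient = EventHubClient.CreateFromConnectionString(connectionString);

        // 3. Send and wait for completion.
        await eventHubClient.SendAsync(eventData);

        await eventHubClient.CloseAsync();
        Console.WriteLine("Sent.");
    }
}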

 

Receiving: the Receiver API

The interfaces for getting data out of an Event Hub are all based on the AMQP protocol (there is no HTTPS-based receive API).

Here again there are two ways to receive data: 1) use the EventHubClient's PartitionReceiver; 2) use the EventProcessorHost (EPH) API. Each is introduced below (see also the API overview):

https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-dotnet-standard-api-overview

1) Using PartitionReceiver

This is a reading API provided by Event Hub; it reads data from a specified Partition.

The API is in the NuGet package Microsoft.Azure.EventHubs.

1. As when sending data, create an EventHubClient connected to the Event Hub. This is the same EventHubClient that was used for sending.

EventHubClient eventHubClient = EventHubClient.CreateFromConnectionString(connectionString);

 

2. Use the EventHubClient's CreateReceiver() API to create a PartitionReceiver:

PartitionReceiver partitionReceiver = eventHubClient.CreateReceiver("$Default", "0", DateTime.Now); // from the default consumer group, read all data on partition "0" that arrives after the current time
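If you have created an additional consumer group (say, a hypothetical one named "analytics") and want to read through it instead of the default, pass its name as the first argument (the exact overload may differ slightly between package versions):

var analyticsReceiver = eventHubClient.CreateReceiver("analytics", "0", DateTime.Now); // "analytics" is a hypothetical consumer group created beforehand in the portal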


 

3. Call the PartitionReceiver's receive API to receive messages:

var eventDatas = await partitionReceiver.ReceiveAsync(10); // receive up to 10 events in one call

 

4. Parse the data:

foreach (var eventData in eventDatas)
{
    var message = Encoding.UTF8.GetString(eventData.Body.Array);
    // your data-processing logic goes here
}

 

Note that one PartitionReceiver can only read from one partition. The usual practice is therefore to create one PartitionReceiver for each partition of the Event Hub, one by one.

We can use the EventHubClient's GetRuntimeInformationAsync() API to get runtime information, which tells us all the partition IDs, so that we can create a PartitionReceiver for each one:

var runtimeInformation = await eventHubClient.GetRuntimeInformationAsync();

foreach (var partitionId in runtimeInformation.PartitionIds)
{
    var receiver = eventHubClient.CreateReceiver(PartitionReceiver.DefaultConsumerGroupName, partitionId, DateTime.Now);
}
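Putting these steps together, a minimal sketch of reading from every partition might look like the following. The connection string is a placeholder, it performs only a single receive per partition rather than a continuous loop, and depending on your version of the Microsoft.Azure.EventHubs package the start-position argument may need to be an EventPosition rather than a DateTime:

// Minimal read sketch using PartitionReceiver from the Microsoft.Azure.EventHubs package.
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;

class ReceiveSketch
{
    static async Task Main()
    {
        var connectionString = "<your Event Hub connection string>"; // placeholder
        var eventHubClient = EventHubClient.CreateFromConnectionString(connectionString);

        // Find out which partitions exist.
        var runtimeInformation = await eventHubClient.GetRuntimeInformationAsync();

        foreach (var partitionId in runtimeInformation.PartitionIds)
        {
            // One receiver per partition, reading from the default consumer group,
            // starting from events that arrive after now.
            var receiver = eventHubClient.CreateReceiver(
                PartitionReceiver.DefaultConsumerGroupName, partitionId, DateTime.Now);

            var eventDatas = await receiver.ReceiveAsync(10); // up to 10 events; null if none arrived in time
            if (eventDatas != null)
            {
                foreach (var eventData in eventDatas)
                {
                    // Body is an ArraySegment<byte>, so decode using its offset and count.
                    var message = Encoding.UTF8.GetString(
                        eventData.Body.Array, eventData.Body.Offset, eventData.Body.Count);
                    Console.WriteLine($"Partition {partitionId}: {message}");
                }
            }

            await receiver.CloseAsync();
        }

        await eventHubClient.CloseAsync();
    }
}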

 

2) Event Processor Host

A PartitionReceiver reads directly from a specified partition and is easy to use. But we often need to read from multiple partitions, and in scenarios with very large data volumes we also need scalability. That means we have to manage "creating a PartitionReceiver for each partition", manage "how to distribute the partitions across the several application instances that will read them", and also consider "if an application instance crashes, how do we restart and resume from the previous reading position". Writing all of that by hand gets very complicated.

Is there a more automatic, less laborious way?

Of course there is! If you don't want to hand-write the per-partition reading, offset tracking, and scaling yourself, you can use the EventProcessorHost (EPH) to handle it.

Essentially, EPH provides two things: 1. EPH automatically creates an EventProcessor (corresponding to a Receiver) for each partition of the Event Hub, distributes these processors evenly across the existing application instances, and monitors those instances in real time; 2. EPH automatically saves the reading progress. This way, no matter how many instances you have, and whether they stay healthy or one of them crashes, reading continues normally, giving you high availability.

Simply put: you have a pile of notebooks whose data all needs to be read. With PartitionReceiver, you hire several people to read the notebooks (you create several instances of your app), and you have to decide yourself who reads which notebook, worrying about the allocation on your own. With EPH, it is as if you hire a manager: based on how many people you have (how many app instances you start), the manager automatically assigns each person their work. If someone takes leave (an app instance crashes), the manager automatically reassigns that person's work to the others. Meanwhile, the manager records how far each notebook has been read, so if reading is interrupted (for example, a connection drops), it can later resume from where it left off.

The API is in the NuGet package Microsoft.Azure.EventHubs.Processor.**

 

First, implement the IEventProcessor interface, which consists of four methods:

CloseAsync(), OpenAsync(), ProcessErrorAsync(), ProcessEventsAsync()

 

// requires: using System; using System.Collections.Generic; using System.Threading.Tasks;
//           using Microsoft.Azure.EventHubs; using Microsoft.Azure.EventHubs.Processor;
public class YourEventProcessor : IEventProcessor
{
    public Task CloseAsync(PartitionContext context, CloseReason reason)
    {
        // your implementation when the processor is closed
        return Task.CompletedTask;
    }

    public Task OpenAsync(PartitionContext context)
    {
        // your implementation when the processor is opened
        return Task.CompletedTask;
    }

    public Task ProcessErrorAsync(PartitionContext context, Exception error)
    {
        // your implementation for handling errors
        return Task.CompletedTask;
    }

    public Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> eventDatas)
    {
        // your implementation for processing event data
        if (eventDatas != null)
        {
            foreach (var eventData in eventDatas)
            {
                // process data here
            }
        }
        return context.CheckpointAsync(); // save the offset (reading progress)
    }
}

In the code above, the actual data-processing logic goes in ProcessEventsAsync(), and the final "return context.CheckpointAsync();" saves the reading progress.

Then, create the EventProcessorHost in the main program:

var yourEventProcessorHost = new EventProcessorHost(

eventHubPath,

consumerGroupName,

eventHubConnectionString,

storageConnectionString,

containerName);

Here, eventHubPath, consumerGroupName and eventHubConnectionString are the authentication information for the Event Hub you created, while storageConnectionString and containerName are the authentication information for an Azure Storage Account, which is used to save the reading progress and also needs to be created in advance (how to create it is not covered here)*.

Next, connect the EPH with the EventProcessor you just wrote by registering it:

await yourEventProcessorHost.RegisterEventProcessorAsync<YourEventProcessor>();

Finally, when the main program exits, unregister the EventProcessor:

await yourEventProcessorHost.UnregisterEventProcessorAsync();

This way, even if you have multiple application instances, EPH manages them for you and processes the data the way you intended.
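Putting it all together, a minimal host program might look like the sketch below. Every value is a placeholder for your own Event Hub and Storage Account details, and Console.ReadLine() simply keeps the host running until you press ENTER:

// Minimal EPH host sketch using the Microsoft.Azure.EventHubs.Processor package.
using System;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.EventHubs.Processor;

class EphHostSketch
{
    static async Task Main()
    {
        var eventHubPath = "myeventhub"; // the Event Hub (entity) name; placeholder
        var consumerGroupName = PartitionReceiver.DefaultConsumerGroupName; // "$Default"
        var eventHubConnectionString = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<key name>;SharedAccessKey=<key>";
        var storageConnectionString = "DefaultEndpointsProtocol=https;AccountName=<storage account>;AccountKey=<key>;EndpointSuffix=core.windows.net";
        var containerName = "eph-checkpoints"; // a blob container EPH uses for leases and checkpoints

        var yourEventProcessorHost = new EventProcessorHost(
            eventHubPath, consumerGroupName, eventHubConnectionString, storageConnectionString, containerName);

        // Start pumping events into YourEventProcessor (EPH creates one processor per partition).
        await yourEventProcessorHost.RegisterEventProcessorAsync<YourEventProcessor>();

        Console.WriteLine("Receiving. Press ENTER to stop.");
        Console.ReadLine();

        // Stop cleanly so the partition leases are released for other instances.
        await yourEventProcessorHost.UnregisterEventProcessorAsync();
    }
}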

 

*You can also choose other storage for the checkpoints by implementing ICheckpointManager, which is not covered here.

**The EPH interface is currently being redesigned, so it may change significantly after this article is published; please treat the official documentation as the up-to-date reference.

 

Languages and API packages supported by Event Hub:

Programming guide: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-programming-guide

 

Summary

In essence, Event Hub is a variation on the message queue, a messaging/log product for the big-data era: a gateway for big-data stream analytics that provides a dedicated buffer for the data to be analyzed, load balancing, and reliable, large-scale sending and receiving of data, all as an easy-to-operate platform service. I hope this introduction to Event Hub's definition, principles, and usage gives you some inspiration. Wherever it is unclear or imprecise, please do correct me :)

 

Special thanks to:

Xin Okami

Jun

Adam

 


Origin www.cnblogs.com/mysunnytime/p/11634815.html