Azure Data Explorer log analysis solution

        Many people use ES (Elasticsearch) for day-to-day log analysis, but Microsoft Azure does not offer a native, hosted ES PaaS service. Azure Data Explorer, however, is a service well suited to log analysis; Azure's native Log Analytics service is itself built on Azure Data Explorer. The two differ in how flexible they are about data sources: Log Analytics has better compatibility with Azure's own services as native data sources, while for those who want the flexibility to bring their own data sources I recommend Azure Data Explorer. It lets you carry over an existing ELK technology stack smoothly because it integrates with Logstash, so logs from all kinds of data sources can be collected into Azure Data Explorer for query and analysis. In addition, Azure Data Explorer is a hosted PaaS service, so users do not need to care about the underlying infrastructure, and it can scale out smoothly to meet performance requirements.

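        This article uses the Nginx access log as an example to show how to implement log collection and analysis with Filebeat + Logstash + Azure Data Explorer. The architecture is as follows: Filebeat collects the Nginx access log and forwards it to Logstash, which transforms the events and sends them to Azure Data Explorer.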

 

        The Nginx used in this article is version 1.16.1. Starting with version 1.11.8, the Nginx access log can be written natively in JSON format (the escape=json parameter of log_format). Following the configuration below, define the access log format in a new file, nginx-log-json.conf:

log_format json escape=json '{ '
 '"remote_ip": "$remote_addr", '
 '"user_name": "$remote_user", '
 '"time": "$time_iso8601", '
 '"method": "$request_method", '
 '"nginxhostname": "$host", '
 '"url": "$request_uri", '
 '"http_protocol": "$server_protocol", '
 '"response_code": "$status", '
 '"bytes": "$body_bytes_sent", '
 '"referrer": "$http_referer", '
 '"user_agent": "$http_user_agent" '
'}';
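
        To sanity-check the format, one line of the resulting log can be parsed with any JSON library. The following is a minimal Python sketch; the sample line, the helper name parse_access_line and the EXPECTED_KEYS set are hypothetical and only mirror the field names defined above.

```python
import json

# Hypothetical access-log line in the shape produced by the log_format above.
SAMPLE_LINE = (
    '{ "remote_ip": "203.0.113.7", "user_name": "", '
    '"time": "2019-11-30T10:15:00+08:00", "method": "GET", '
    '"nginxhostname": "example.com", "url": "/index.html", '
    '"http_protocol": "HTTP/1.1", "response_code": "200", '
    '"bytes": "612", "referrer": "", "user_agent": "curl/7.58.0" }'
)

# Field names taken from the log_format definition.
EXPECTED_KEYS = {
    "remote_ip", "user_name", "time", "method", "nginxhostname", "url",
    "http_protocol", "response_code", "bytes", "referrer", "user_agent",
}


def parse_access_line(line: str) -> dict:
    """Parse one JSON access-log line and check that no field is missing."""
    event = json.loads(line)
    missing = EXPECTED_KEYS - set(event)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return event


event = parse_access_line(SAMPLE_LINE)
# Every value is a string because the log_format quotes each variable.
print(event["response_code"], type(event["response_code"]).__name__)
```

        Note that response_code and bytes arrive as strings, because every variable in the log_format is wrapped in double quotes.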

        Then reference this definition from nginx.conf: in the http section, include the file above and set the format and output path of the access log.

http {

        # Other Config Setting

        ##
        # Logging Settings
        ##

        include nginx-log-json.conf;
        access_log /var/log/nginx/access.json json;
        error_log /var/log/nginx/error.log;

        # Other Config Setting
}

        Next, configure Filebeat to collect the access log from the Nginx log path and send it to Logstash as input for transformation. The Filebeat configuration file is shown below. Set the input path to the access log path configured in Nginx, and set Logstash as the output. In this example Logstash and Nginx run on the same host, so the Logstash address is localhost; if Logstash is deployed separately in a real deployment, change the address accordingly.

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/nginx/*.json
    #- c:\programdata\elasticsearch\logs\*

  tags: ["nginx", "json"]

  json:
    keys_under_root: true
    add_error_key: true
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  #multiline.match: after


#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================

#setup.template.settings:
#  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging


#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
#setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#============================= Elastic Cloud ==================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
#  hosts: ["localhost:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors =====================================

# Configure processors to enhance or manipulate events generated by the beat.

#processors:
#  - add_host_metadata: ~
#  - add_cloud_metadata: ~

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== X-Pack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

#================================= Migration ==================================
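
        The json options set on the Filebeat input above (keys_under_root and add_error_key) can be pictured roughly as follows. This is a simplified Python model for illustration, not Filebeat's actual implementation; the function name decode_line and the sample lines are hypothetical.

```python
import json


def decode_line(line: str, tags: list) -> dict:
    """Simplified model of a Filebeat log input with json.keys_under_root
    and json.add_error_key enabled."""
    try:
        decoded = json.loads(line)
        if not isinstance(decoded, dict):
            raise ValueError("line is not a JSON object")
        # keys_under_root: decoded keys sit at the top level of the event
        # instead of being nested under a "json" key.
        event = dict(decoded)
    except ValueError as exc:
        # add_error_key: record the decoding problem on the event.
        event = {"error": {"type": "json", "message": str(exc)}}
    # Tags from the input configuration are attached to every event.
    event["tags"] = list(tags)
    return event


good = decode_line('{"method": "GET", "url": "/"}', ["nginx", "json"])
bad = decode_line("not a json line", ["nginx", "json"])
print(good["method"], bad["error"]["type"])
```

        The "nginx" tag attached here is what the Logstash filter later uses to decide which events to enrich.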

        Logstash transforms the log events and outputs the result to Azure Data Explorer. Refer to the following example Logstash configuration file. The input section reads the logs sent by Filebeat. The filter section uses grok to extract the HTTP version number from the http_protocol field of the Nginx access log, and uses the geoip and useragent plugins to enrich the remote_ip and user_agent fields of the original Nginx log. The output section defines Azure Data Explorer as the receiver; fill in ingest_url, app_id, app_key, app_tenant, database, table, and mapping according to the resources created in Azure Data Explorer. Logstash does not include the Azure Data Explorer plugin by default; to install it, refer to the following documentation: https://docs.microsoft.com/en-us/azure/data-explorer/ingest-data-logstash

input {
    beats {
        port => "5044"
        codec => json
    }
}
# The filter section is optional. It only enriches events that Filebeat
# tagged with "nginx".
filter {
    if "nginx" in [tags] {
        # nginx doesn't log the http version, only the protocol.
        # i.e. HTTP/1.1, HTTP/2
        grok {
            match => {
                "[http_protocol]" => "HTTP/%{NUMBER:[http_version]}"
            }
        }
        geoip {
            source => "[remote_ip]"
            target => "[geoip]"
        }
        useragent {
            source => "[user_agent]"
            target => "user_agent_info"
        }

    }
}
output {
    kusto {
            path => "/tmp/kusto/%{+YYYY-MM-dd-HH-mm-ss}.txt"
            ingest_url => "https://ingest-xxx.westus2.kusto.windows.net/"
            app_id => "xxx"  # Azure AD application (client) ID
            app_key => "xxx"  # Azure AD application key (secret)
            app_tenant => "xxx"  # Azure AD tenant ID
            database => "nginx"  # database name defined in ADX
            table => "nginxlogs"  # table name defined in ADX
            mapping => "basicmsg"  # ingestion mapping defined on the table in ADX
    }
}
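
        The grok step in the filter above can be approximated with an ordinary regular expression. The Python sketch below is only a rough stand-in for the grok pattern, assuming the values Nginx writes for $server_protocol (for example HTTP/1.1 or HTTP/2.0); the function name add_http_version is hypothetical.

```python
import re

# Rough stand-in for the grok pattern "HTTP/%{NUMBER:[http_version]}".
# grok's NUMBER pattern accepts a few more forms (such as a leading sign);
# this regex only covers plain version numbers like 1.1 or 2.0.
HTTP_VERSION = re.compile(r"HTTP/(?P<http_version>\d+(?:\.\d+)?)")


def add_http_version(event: dict) -> dict:
    """Return a copy of the event with http_version added when it can be extracted."""
    enriched = dict(event)
    match = HTTP_VERSION.search(event.get("http_protocol", ""))
    if match:
        enriched["http_version"] = match.group("http_version")
    else:
        # Logstash's grok filter tags events it cannot match.
        enriched["tags"] = list(event.get("tags", [])) + ["_grokparsefailure"]
    return enriched


print(add_http_version({"http_protocol": "HTTP/1.1"})["http_version"])
```

        The extracted value stays a string, which matches the http_version: string column used for the table in ADX.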

        The process of creating the Azure Data Explorer cluster and database is not repeated here; please consult the documentation. This section focuses on how to create the table and mapping referenced above. The table is where the Nginx access log is finally stored in ADX, so its schema must define the type of each field. The mapping defines how the JSON log fields coming from Logstash map to the columns of the log table in ADX.

// Create the table

.create table nginxlogs (remote_ip: string, username: string, accesstime: datetime, method: string, response_code: int, url: string, http_protocol: string, http_version: string, bodybyte: int, referrer: string, user_agent_info: dynamic, geoip: dynamic)

// Create the mapping (the paths must match the field names emitted by the Nginx log_format, e.g. user_name)

.create table nginxlogs ingestion json mapping 'basicmsg' '[{"column":"remote_ip","path":"$.remote_ip"},{"column":"username","path":"$.user_name"},{"column":"accesstime","path":"$.time"},{"column":"method","path":"$.method"},{"column":"response_code","path":"$.response_code"},{"column":"url","path":"$.url"},{"column":"http_protocol","path":"$.http_protocol"},{"column":"http_version","path":"$.http_version"},{"column":"bodybyte","path":"$.bytes"},{"column":"referrer","path":"$.referrer"},{"column":"user_agent_info","path":"$.user_agent_info"},{"column":"geoip","path":"$.geoip"}]'
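
        Because a mapping path that matches no field in the incoming JSON would typically leave the column empty rather than raise an error, it can help to compare the mapping against a sample event before ingesting. The Python sketch below assumes simple top-level paths of the form $.field; the helper name unmatched_paths and the two-column mapping and event are hypothetical, for illustration only.

```python
import json


def unmatched_paths(mapping_json: str, event: dict) -> list:
    """Return the mapping paths ("$.field") whose field is absent from the event."""
    missing = []
    for entry in json.loads(mapping_json):
        path = entry["path"]
        # Only top-level paths such as "$.remote_ip" are handled here.
        field = path[2:] if path.startswith("$.") else path
        if field not in event:
            missing.append(path)
    return missing


# Hypothetical two-column mapping and event.
mapping = (
    '[{"column":"remote_ip","path":"$.remote_ip"},'
    '{"column":"accesstime","path":"$.timestamp"}]'
)
event = {"remote_ip": "203.0.113.7", "time": "2019-11-30T10:15:00+08:00"}

print(unmatched_paths(mapping, event))  # ['$.timestamp']
```

        An empty list means every mapped column has a corresponding field in the sample event.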

        With the configuration complete, use the following simple KQL query to check the logs that ADX has received:

nginxlogs
| sort by accesstime desc | take 10

 

        At this point the log collection pipeline into the ADX log analysis engine is up and running, and you can explore the data freely with the KQL query language that ADX provides. That is all for today; in a later blog post I will give a few examples of simple KQL queries.

 


Origin www.cnblogs.com/wekang/p/11961723.html