Log Service new feature: data processing is now online! (Closed beta)

Overview

The data processing feature of Log Service is now online! It is currently in private beta, and you are welcome to try it out.

Prerequisites

  • Regions: Beijing, Shanghai, United Kingdom
  • Currently in private beta. To apply, submit a support ticket, or join DingTalk group 11775223 and @ Tang Zhekai.

Problems solved

Industry pain points in data processing

In the data analytics industry, roughly 80% of the effort is spent on structuring data. Data processing needs and pain points arise at every stage: data access, analysis, delivery, and integration with other systems.

  • Data from different sources mixes many formats and is hard to extract with simple rules. For example, logs from switches, servers, containers, and application modules are collected via files, stdout, syslog, the network, and so on, and arrive as a mixture of log formats; a field such as message then needs per-case pattern matching before its content can be extracted.
  • Even within a single scenario, fields are dynamic and unpredictable. For example, in Nginx logs the QueryString, HttpCookie, or HttpBody fields call for automatic key-value (KV) or regular-expression extraction (see the sketch after this list).
  • Some sources produce dynamic JSON (e.g., CVE data, O365 audit logs) whose fields need to be computed, extracted, and merged dynamically, or even split into multiple logs during processing.
  • Some regular logs contain sensitive information (e.g., keys, mobile phone numbers, internal database connection strings) that is hard to filter out or mask during extraction.
  • Data enrichment needs to draw on simple mapping tables, CSV files, OSS files, RDS tables, and external APIs.
  • Large volumes of data need aggregation (e.g., rolling up many network-connection records every 5 minutes into a few records per machine).
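
To make the extraction and masking points concrete, here is a minimal sketch in plain Python (not the Log Service processing language) showing KV extraction from a query string, masking of a phone number, and dropping a sensitive field; the field names and patterns are illustrative assumptions.

```python
import re

# Illustrative log event with a raw Nginx-style query string and a sensitive field.
event = {
    "request_uri": "/login?user=alice&phone=13800001234&utm_source=mail",
    "db_conn": "mysql://admin:s3cret@10.0.0.5:3306/orders",
}

# KV extraction: pull key=value pairs out of the query string into top-level fields.
for key, value in re.findall(r"[?&]([^=&]+)=([^&]*)", event["request_uri"]):
    event[key] = value

# Masking: keep only the first 3 and last 4 digits of the mobile number.
event["phone"] = re.sub(r"(\d{3})\d{4}(\d{4})", r"\1****\2", event.get("phone", ""))

# Filtering: drop fields that should never leave the pipeline, e.g. connection strings.
event.pop("db_conn", None)

print(event)
```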

Data processing pain points in Log Service

Before the data processing feature went online, we found that Log Service users hit the following pain points at each stage:

1. Data Access

  • A single source mixes multiple formats and is hard to extract and dispatch:

    • Logs from switches, servers, containers, and application modules are collected via files, stdout, syslog, the network, and so on, and arrive as a mixture of formats. Basic fields such as time, log level, and IP can be partially extracted (e.g., by logtail), but the main message field holds much of the valuable information and, because formats are mixed, cannot be extracted at import time.
    • Routing data to the desired targets, or ingesting it in a specific format, is hard to achieve.
  • Some content has a complex format and is hard to extract:

    • For example, in Nginx logs the QueryString, HttpCookie, and even HttpBody fields vary greatly in content and are highly complex, so a one-shot regular expression at extraction time is hard to get right.
    • Some JSON is complex and deeply nested.
  • Some regular logs contain sensitive information (e.g., keys, mobile phone numbers, internal database connection strings) that is hard to filter out or mask during extraction.
  • Some JSON logs actually carry several logs' worth of information and need to be split into multiple logs during processing, which is not possible at import time (see the sketch after this list).
  • Workarounds such as transforming with the SDK before uploading, or inserting a Logstash conversion step into the pipeline, make the setup more complicated and slow down data collection.
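
As an illustration of the JSON-splitting pain point above, here is a minimal sketch in plain Python showing how one batched event can be expanded into several logs; the field names and structure are made-up assumptions, not a Log Service format.

```python
import json

# An ingested event whose "records" field actually carries several logs' worth of data
# (similar in spirit to O365 audit batches); the structure here is an invented example.
raw = {
    "source": "audit-gateway",
    "records": json.dumps([
        {"user": "alice", "action": "login"},
        {"user": "bob", "action": "delete"},
    ]),
}

def split_event(event):
    """Expand one batched event into one event per inner record."""
    common = {k: v for k, v in event.items() if k != "records"}
    for record in json.loads(event["records"]):
        yield {**common, **record}

for log in split_event(raw):
    print(log)
```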

2. Analysis

Once data is ingested, users generally process it with SQL, which has the following pain points (a preprocessing sketch follows the list):

  • Routine data processing written in SQL becomes long and complex, hard to write, fragile, and hard to maintain:

    • Even routine field cleanup, regex extraction, and enrichment are hard to express
    • A slight change in the source logs causes execution errors
    • Long SQL statements are hard to understand, modify, and maintain
  • Performance is slow on large data volumes or complex SQL, and queries easily time out:

    • Fields computed inside SQL lose the benefit of the index
    • Group-by over metrics data spanning many logs or a long time range is time-consuming
  • Fields longer than 2 KB are not supported in computation:

    • SQL indexes at most 2 KB of a single field and ignores the rest, yet long fields are common
  • Other advanced functionality is missing, so advanced processing rules cannot be implemented:

    • Mixed formats, dynamic fields, and the like cannot be handled with SQL
    • Splitting logs, or computing specific logic (such as UserAgent or SQL pattern parsing), cannot be done
    • Custom aggregation computations are not supported
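
The underlying issue is that all of this parsing happens at query time inside SQL. A minimal Python sketch of the alternative, structuring the free-form message once before it is written, is shown below; the message layout and field names are assumptions for illustration.

```python
import re

# One-time structuring of the free-form "message" field at write time, so query-time
# SQL can filter and group on plain indexed fields instead of re-parsing text.
# The message layout below is an assumed example, not a fixed Log Service format.
MSG = re.compile(r"(?P<level>\w+) +latency=(?P<latency_ms>\d+)ms +path=(?P<path>\S+)")

def structure(event):
    match = MSG.search(event.get("message", ""))
    if match:
        event.update(match.groupdict())
        event["latency_ms"] = int(event["latency_ms"])
    return event

print(structure({"message": "WARN latency=1307ms path=/api/orders"}))
# After this step, a query can stay as simple as:
#   SELECT path, avg(latency_ms) FROM log GROUP BY path
```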

3. Delivery

  • Shipping to OSS, MaxCompute, and so on does not support filtering by content or converting the format.

4. Integrating with external systems

  • You can import logs into other systems (such as DataWorks or Function Compute), structure them there, and write them back into Log Service, but the programming, configuration, and debugging required along the way takes considerable effort.

Main supported scenarios

Scenario 1 - Data normalization (one-to-one)

image

Scenario 2 - Data dispatch (one-to-many)

image

Scenario 3 - Multi-source consolidation (many-to-one)

image

Scenario 4 - General data processing

More than 150 built-in functions are provided, so the main processing tasks can be completed without writing code, while custom functions (UDFs) give the flexibility to cover a wide variety of scenarios (see the sketch after the list below):

image

  • Filter: remove specific logs
  • Split: turn one log into multiple logs
  • Transform: convert fields and their contents
  • Enrich: join against external resources to enrich field information
  • Rollup (coming soon): aggregate along specific dimensions to reduce log volume
  • Custom operations (coming soon): customize the operations above, e.g., SQL pattern parsing or custom aggregation
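
For a rough feel of these operation types, here is a compact sketch in plain Python (not the built-in function DSL) that filters, splits, transforms, and enriches a single event; the field names and the lookup table are illustrative assumptions.

```python
# Enrichment table: map an IP prefix to a region (an assumed example).
REGION_BY_IP_PREFIX = {"10.1.": "cn-beijing", "10.2.": "cn-shanghai"}

def process(event):
    # Filter: drop debug noise entirely.
    if event.get("level") == "DEBUG":
        return []
    # Split: one event per item in a comma-separated field.
    events = [dict(event, host=h) for h in event.get("hosts", "").split(",") if h]
    for e in events:
        # Transform: normalize the level field.
        e["level"] = e["level"].lower()
        # Enrich: look up the region for the event's IP prefix.
        e["region"] = next(
            (r for p, r in REGION_BY_IP_PREFIX.items() if e.get("ip", "").startswith(p)),
            "unknown",
        )
    return events

print(process({"level": "WARN", "hosts": "web-1,web-2", "ip": "10.2.3.4"}))
```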

Advantages

  1. Faster and easier access: ingest through various channels such as logtail in the simplest way possible, into a non-indexed logstore with short-term storage.
  2. Faster and more flexible query and analysis: complete complicated processing with out-of-the-box rules in a simple syntax, then run fast index-based analysis on the processed data; no more long analysis SQL that is inefficient and hard to tune.
  3. More business scenarios: with enrichment and custom processing, you can further tap the value of the data and build more advanced applications.
  4. More flexible delivery and ecosystem integration: rules that match the needs of downstream systems can be configured more easily.
  5. A hosted, one-stop data processing solution: maintenance-free, with automatic scaling.

Other common questions

Costs

  • The compute and network resources consumed by the processing service itself are currently free, but reads from the source logstore and writes to the target logstore are billed as usual under standard Log Service pricing.
  • Depending on your situation, you can disable the index on the source logstore and set a shorter retention period (see the sketch below).
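
For example, with the aliyun-log-python-sdk the index on the source logstore can be removed and its retention shortened roughly as follows; the endpoint, credentials, and names are placeholders, and the exact method signatures should be confirmed against the current SDK documentation.

```python
# A minimal sketch assuming the aliyun-log-python-sdk (pip install aliyun-log-python-sdk);
# placeholders and signatures should be checked against the current SDK docs.
from aliyun.log import LogClient

client = LogClient("cn-beijing.log.aliyuncs.com", "<access_key_id>", "<access_key_secret>")

# Disable the (billed) index on the raw source logstore; queries go to the processed target.
client.delete_index("my-project", "raw-source")

# Keep the raw data only briefly; the processed target logstore keeps the long retention.
client.update_logstore("my-project", "raw-source", ttl=7)
```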

References

You are welcome to scan the QR code to join the official DingTalk group (11775223) for real-time updates and direct support from Alibaba Cloud engineers:
image


Source: yq.aliyun.com/articles/704935