[Go] Designing a CSV-like data logging component

Original link: https://blog.thinkeridea.com/201907/go/csv_like_data_logs.html

Our business records a large amount of log data every day, and this data matters: it is the main basis for revenue settlement and one of the main data sources available to the data analysis department. For logs this important, written at such high frequency, we need a high-performance and safe logging component that guarantees the format integrity of every line. So we designed a CSV-like log-joining component; the code is available here: Datalog.

It is a list-based component that guarantees the integrity of each log line and joins fields efficiently. It supports arbitrary column and row separators, and it also supports array fields, which satisfy one-to-many logging needs without recording multiple log lines for one record. It returns []byte data, which makes it easy for other components to write the result to a log file or to the network.

Instructions for use

API list

  • NewRecord(len int) Record: create a fixed-length Record
  • NewRecordPool(len int) *sync.Pool: create a pool of fixed-length Records
  • ToBytes(sep, newline string) []byte: join the Record with sep and append newline at the end
  • ArrayJoin(sep string) string: join the Record with sep; the result is the value of an array field
  • ArrayFieldJoin(fieldSep, arraySep string) string: join the Record with fieldSep; the result is one element of an array
  • Clean(): empty all elements of the Record; when using sync.Pool, call it before putting the Record back into the Pool to avoid memory leaks
  • UnsafeToBytes(sep, newline string) []byte: join the Record with sep and append newline at the end; uses in-place string replacement, which mutates the strings the fields reference
  • UnsafeArrayFieldJoin(fieldSep, arraySep string) string: join the Record with fieldSep as one element of an array; uses in-place string replacement, which mutates the strings the fields reference

The underlying type is Record []string; the string slice serves as one log line or one array field. It should be fixed-length, because data logs are formatted and each column has its own meaning. Create one with NewRecord(len int) Record or NewRecordPool(len int) *sync.Pool. I recommend creating a pool with NewRecordPool when the program initializes; getting a Record from the pool while the program runs is more efficient than allocating one per log line. But every time a Record is returned to the Pool, Clean must be called to empty it, otherwise the strings it references cannot be collected, which leaks memory.
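For orientation, here is a minimal sketch of what the Record type and the two constructors plausibly look like, based only on the API list above; the real implementation lives in the linked Datalog repository and may differ:

package datalog

import "sync"

// Record is the underlying type: a fixed-length []string in which each
// index is one column of a log line (or one field of an array element).
type Record []string

// NewRecord creates a fixed-length Record.
func NewRecord(n int) Record { return make(Record, n) }

// NewRecordPool creates a sync.Pool that hands out fixed-length Records,
// so a running program reuses them instead of allocating one per log line.
func NewRecordPool(n int) *sync.Pool {
    return &sync.Pool{
        New: func() interface{} {
            return make(Record, n)
        },
    }
}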

Practice

We need every column of a log line to keep its meaning, so we create a fixed-length Record. But how do we keep each column's data consistent? Go's enumerated constants guarantee this well. For example, we define constants for the log columns:

const (
    LogVersion = "v1.0.0"
)
const (
    LogVer = iota
    LogTime
    LogUid
    LogUserName
    LogFriends

    LogFieldNumber
)

LogFieldNumber is the number of log columns, i.e. the length of the Record. We then use NewRecordPool to create the pool and use the constant names as subscripts when recording, so we need not worry about columns getting scrambled through miscounting or carelessness.

var w bytes.Buffer // a log writer stand-in
var pool = datalog.NewRecordPool(LogFieldNumber) // create a record pool

func main() {
  r := pool.Get().(datalog.Record)
  r[LogVer] = LogVersion
  r[LogTime] = time.Now().Format("2006-01-02 15:04:05")
  // check whether the user data exists
  //if user != nil {
    r[LogUid] = "Uid"
    r[LogUserName] = "UserName"
  //}

  // join one line of log data
  data := r.Join(datalog.FieldSep, datalog.NewLine)
  r.Clean() // empty the Record
  pool.Put(r) // put it back into the pool

  // write it to the log
  if _, err := w.Write(data); err != nil {
    panic(err)
  }

  // print the log data
  fmt.Println("'" + w.String() + "'")
}

After running, the program outputs:

Since the separator characters are invisible, the output below uses , in place of the field separator, ;\n in place of the newline, / in place of the array field separator, and - in place of the array separator.

'v1.0.0,2019-07-18 11:39:09,Uid,UserName,;\n'

Even though we did not record the LogFriends column, it still holds its placeholder in the log line. And if user is nil, LogUid and LogUserName need no special handling: no data is written to them, yet they still occupy their positions, so the log cannot become misaligned.

Using a pool makes good use of memory and avoids excessive allocation. Each field of the Record is a string, and a simple assignment costs little: it points at the string's existing data, with no additional allocation; for details see the earlier article on string pitfalls and optimization suggestions.
Record.Join connects a line of log data efficiently, making it easy for us to write the log file quickly afterwards; the design notes section below explains Join's design in detail.

Logs containing arrays

Sometimes a column does not record a single value. For example, LogFriends above records the friends related to the current record, which may be a set of data. Datalog provides some simple helper functions that can be combined to achieve this; see the following example:

// define the columns of the LogFriends array elements
const (
    LogFriendUid = iota
    LogFriendUserName

    LogFriendFieldNumber
)

var w bytes.Buffer // a log writer stand-in
var pool = datalog.NewRecordPool(LogFieldNumber) // pool for whole log lines
var frPool = datalog.NewRecordPool(LogFriendFieldNumber) // pool for LogFriends array elements

func main() {
  // at runtime
  r := pool.Get().(datalog.Record)
  r[LogVer] = LogVersion
  r[LogTime] = time.Now().Format("2006-01-02 15:04:05")
  // check whether the user data exists
  //if user != nil {
    r[LogUid] = "Uid"
    r[LogUserName] = "UserName"
  //}

  // join an array field; its length is not fixed
  r[LogFriends] = GetLogFriends(rand.Intn(3))
  // join one line of log data
  data := r.Join(datalog.FieldSep, datalog.NewLine)
  r.Clean() // empty the Record
  pool.Put(r) // put it back into the pool

  // write it to the log
  if _, err := w.Write(data); err != nil {
    panic(err)
  }

  // print the log data
  fmt.Println("'" + w.String() + "'")
}

// a function that joins the LogFriends array
func GetLogFriends(friendNum int) string {
    // create a Record sized to the array; the element count is usually not
    // fixed, but the whole result becomes a single field of the line, so the
    // variable length does not break the data
    fs := datalog.NewRecord(friendNum)
    // get one instance from the pool; it is reused for every element
    fr := frPool.Get().(datalog.Record)
    for i := 0; i < friendNum; i++ {
        // fr.Clean() // if not every field is assigned, clear the record
        // before or after each use so it can safely be reused
        fr[LogFriendUid] = "FUid"
        fr[LogFriendUserName] = "FUserName"

        // join the fields of one element into one array unit
        fs[i] = fr.ArrayFieldJoin(datalog.ArrayFieldSep, datalog.ArraySep)
    }
    fr.Clean() // empty the Record
    frPool.Put(fr) // put it back into the pool

    // join the array units; the string returned is one column of the log line
    return fs.ArrayJoin(datalog.ArraySep)
}

After running, the program outputs:

Since the separator characters are invisible, the output below uses , in place of the field separator, ;\n in place of the newline, / in place of the array field separator, and - in place of the array separator.

'v1.0.0,2019-07-18 11:39:09,Uid,UserName,FUid/FUserName-FUid/FUserName;\n'

When parsing, an array can be parsed out of a single field, which greatly increases the flexibility of the data log. Excessive nesting is not recommended, since data logs should be clear and concise, but one level of nesting can be useful in special scenarios.
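As an illustration of parsing, here is a hypothetical decoder for the example line above. It uses the printable stand-in separators from the output; a real decoder would use datalog's invisible characters:

package main

import (
    "fmt"
    "strings"
)

// stand-in separators from the example output above
const (
    fieldSep      = ","
    arraySep      = "-"
    arrayFieldSep = "/"
)

func main() {
    line := "v1.0.0,2019-07-18 11:39:09,Uid,UserName,FUid/FUserName-FUid/FUserName"

    fields := strings.Split(line, fieldSep) // columns, indexed LogVer..LogFriends
    fmt.Println("uid:", fields[2])          // fields[LogUid]

    // the LogFriends column (index 4) is itself an array: split it into
    // elements, then split each element into its own columns
    for _, friend := range strings.Split(fields[4], arraySep) {
        cols := strings.Split(friend, arrayFieldSep)
        fmt.Println("friend:", cols[0], cols[1]) // LogFriendUid, LogFriendUserName
    }
}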

Best Practices

When joining, ToBytes and ArrayFieldJoin replace any separator that appears inside field data (with a space, per the code below). datalog therefore defines four separators that are invisible characters and rarely appear in real data, but when the data does contain them the replacement is necessary to avoid breaking the log structure.

Although the component supports arbitrary separators, to avoid damaging data we should choose rare, invisible single-byte characters as delimiters. The newline is special, because most log-reading components use \n as the line separator; if \n is extremely rare in the data it can be used directly. datalog defines \x03\n as its line break, which stays compatible with general log-reading components: with a little extra work they can parse the log correctly.
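As a sketch of that "little extra work": a bufio.Scanner split function that treats \x03\n, rather than a bare \n, as the line terminator. The \x01 field separator below is only an assumed stand-in for whichever invisible character you configure:

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "strings"
)

const newLine = "\x03\n" // the line terminator datalog defines

// scanDatalogLines is a bufio.SplitFunc that splits on "\x03\n" instead of '\n'.
func scanDatalogLines(data []byte, atEOF bool) (int, []byte, error) {
    if i := bytes.Index(data, []byte(newLine)); i >= 0 {
        return i + len(newLine), data[:i], nil // one full line, terminator stripped
    }
    if atEOF && len(data) > 0 {
        return len(data), data, nil // trailing line without a terminator
    }
    return 0, nil, nil // need more data
}

func main() {
    log := "a\x01b" + newLine + "c\x01d" + newLine
    s := bufio.NewScanner(strings.NewReader(log))
    s.Split(scanDatalogLines)
    for s.Scan() {
        fmt.Printf("%q\n", strings.Split(s.Text(), "\x01"))
    }
}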

UnsafeToBytes and UnsafeArrayFieldJoin perform better, and as their names suggest they are unsafe: they use exbytes.Replace to replace separators in place, which mutates the data the original strings point to. Unless our log data contains a great many separators that need replacing, I do not recommend them, because they only improve performance when replacements actually occur.

I use UnsafeToBytes and UnsafeArrayFieldJoin heavily in my services. I always write the log at the very end of a request, when I can guarantee that none of the referenced data will be used again, so I need not worry that the in-place replacement silently changes data somewhere else. That may be a workable practice, but I still do not recommend them.

Design notes

datalog does not offer many constraining features; it contains only a practice pattern and a set of helper tools, and before using it we need to understand that practice.

It helps us create a fixed-length log line, or a sync.Pool of them; we combine it with enumerated constants to record the data; and it joins each column of the log into the required format.

The helper methods it provides have been proven in real projects and take many details into account, with high performance as the core design goal; using it greatly reduces the cost of developing related components. The rest of this section analyzes its parts one by one.

The piece most worth mentioning, I think, is its Join method, which saves two memory allocations compared with strings.Join. Let's begin the analysis with it.

// Join joins the Record with sep and appends suffix at the end.
// It is similar to strings.Join, but avoids the extra allocation caused by
// appending a suffix (usually a line terminator) after joining.
// It returns the []byte the caller needs directly, saving a type conversion
// and the allocation-related performance cost that would come with it.
func (l Record) Join(sep, suffix string) []byte {
    if len(l) == 0 {
        return []byte(suffix)
    }

    n := len(sep) * (len(l) - 1)
    for i := 0; i < len(l); i++ {
        n += len(l[i])
    }

    n += len(suffix)
    b := make([]byte, n)
    bp := copy(b, l[0])
    for i := 1; i < len(l); i++ {
        bp += copy(b[bp:], sep)
        bp += copy(b[bp:], l[i])
    }
    copy(b[bp:], suffix)
    return b
}

Log-writing components usually take []byte as input, so Join returns []byte directly rather than a string as strings.Join does; converting that string to []byte at the end would cost an extra allocation. We also need more than joining the row's columns with separators: a line terminator must be appended, and the suffix parameter handles it, so we do not have to concatenate the line separator after Join returns, which saves yet another allocation.

This is precisely the essence of datalog's design: it does not simply reuse standard library methods, but designs methods that fit the scenario better, achieving higher performance and a better experience.
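To make the two saved allocations concrete, here is a benchmark sketch (not from the original article). join is a free-function copy of the Record.Join shown above; the strings.Join variant pays for the join itself, the + newline concatenation, and the []byte conversion:

package datalog_test

import (
    "strings"
    "testing"
)

var row = []string{"v1.0.0", "2019-07-18 11:39:09", "Uid", "UserName", ""}

// free-function copy of the Record.Join shown above
func join(l []string, sep, suffix string) []byte {
    if len(l) == 0 {
        return []byte(suffix)
    }

    n := len(sep) * (len(l) - 1)
    for i := 0; i < len(l); i++ {
        n += len(l[i])
    }
    n += len(suffix)

    b := make([]byte, n)
    bp := copy(b, l[0])
    for i := 1; i < len(l); i++ {
        bp += copy(b[bp:], sep)
        bp += copy(b[bp:], l[i])
    }
    copy(b[bp:], suffix)
    return b
}

// one allocation: columns, separators and suffix go into a single []byte
func BenchmarkRecordJoin(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = join(row, "\x01", "\x03\n")
    }
}

// three allocations: the join itself, the + concatenation, the []byte conversion
func BenchmarkStringsJoin(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = []byte(strings.Join(row, "\x01") + "\x03\n")
    }
}

Running go test -bench . -benchmem on such a sketch should show the allocation difference directly.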

// ToBytes joins the Record with sep and appends the newline terminator.
// Note: it replaces any sep or newline found inside a field (with a space).
func (l Record) ToBytes(sep, newline string) []byte {
   for i := len(l) - 1; i >= 0; i-- {
      // check for the special characters up front so the replacement can be skipped
      if strings.Index(l[i], sep) < 0 && strings.Index(l[i], newline) < 0 {
         continue
      }

      b := []byte(l[i]) // this copies the field, so the in-place replace below cannot touch the referenced string
      b = exbytes.Replace(b, exstrings.UnsafeToBytes(sep), []byte{' '}, -1)
      b = exbytes.Replace(b, exstrings.UnsafeToBytes(newline), []byte{' '}, -1)
      l[i] = exbytes.ToString(b)
   }

   return l.Join(sep, newline)
}

ToBytes is the main entry point and the most frequently used function of the component. Before joining, it replaces separators that appear inside each field. It first checks whether a field contains any separator at all, so the replacement can be skipped; if it does, []byte(l[i]) copies that column's data, and exbytes.Replace then performs its high-performance in-place replacement. Because the input is a freshly allocated copy, the in-place replacement cannot affect other data.
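A hypothetical demonstration of that behavior, using printable separators so the replacement is visible; the import path is a placeholder, use the path of the Datalog repository linked above:

package main

import (
    "fmt"

    "path/to/datalog" // placeholder import path for the Datalog repository
)

func main() {
    r := datalog.NewRecord(3)
    r[0] = "a,b"  // contains the field separator, so it is copied and rewritten
    r[1] = "c"
    r[2] = "d\ne" // contains the newline
    // embedded separators become spaces before joining:
    fmt.Printf("%q\n", r.ToBytes(",", "\n")) // "a b,c,d e\n"
}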

Afterwards it joins the columns with the Join method introduced above. Done with the standard library, this would be []byte(strings.Join([]string(l), sep) + newline), which multiplies the memory allocations; the component's clever design sidesteps those extra costs and improves performance.

// UnsafeToBytes joins the Record with sep and appends the newline terminator.
// Note: it replaces any sep or newline found inside a field (with a space),
// and the replacement happens in place, so every string referencing that data
// is modified. Understand this effect, or the results will surprise you; in
// exchange it greatly reduces allocations and improves performance.
// I use it heavily in projects: I always log at the very end of a request,
// after which the referenced strings are never read again.
func (l Record) UnsafeToBytes(sep, newline string) []byte {
   for i := len(l) - 1; i >= 0; i-- {
      b := exstrings.UnsafeToBytes(l[i])
      b = exbytes.Replace(b, exstrings.UnsafeToBytes(sep), []byte{' '}, -1)
      b = exbytes.Replace(b, exstrings.UnsafeToBytes(newline), []byte{' '}, -1)
      l[i] = exbytes.ToString(b)
   }

   return l.Join(sep, newline)
}

UnsafeToBytes is similar to ToBytes but skips the separator pre-check, because exbytes.Replace already performs that check. It uses exstrings.UnsafeToBytes to convert a string to []byte without copying the data, which is very efficient, but it does not work on string literals. In practice log data is assigned at runtime; even if some fields unfortunately are string literals, there is little to worry about as long as rare special characters are used as separators, since our literals seldom contain them, and when exbytes.Replace performs no replacement the operation is safe.
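For context, this is one common way such a zero-copy conversion was implemented in Go of that era; a sketch only, and the real exstrings.UnsafeToBytes may differ. It also shows why mutating the converted bytes reaches back into the original string:

package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

// unsafeToBytes reinterprets a string's backing memory as a []byte
// without copying; mutating the result mutates the string too.
func unsafeToBytes(s string) []byte {
    sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
    var b []byte
    bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
    bh.Data = sh.Data
    bh.Len = sh.Len
    bh.Cap = sh.Len
    return b
}

func main() {
    s := string([]byte("hello")) // heap-allocated, so writable in practice
    b := unsafeToBytes(s)
    b[0] = 'H'
    fmt.Println(s) // prints "Hello": the string changed with the bytes
}

A plain string literal would live in read-only memory, and writing through the converted slice would crash the program; that is exactly the danger the article warns about.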

// Clean empties every element of the Record. When using sync.Pool, call it
// before putting the Record back into the Pool, to avoid memory leaks.
// It costs very little: it only assigns the empty string to each field, and
// the empty string needs no extra allocation.
func (l Record) Clean() {
   for i := len(l) - 1; i >= 0; i-- {
      l[i] = ""
   }
}

The Clean method is the simplest: it just sets every column to the empty string. The empty string is a special value handled at compile time; all empty strings point to the same memory, so there is no extra overhead and it can be used freely.

// ArrayJoin joins the Record with sep; the result is the value of an array field.
func (l Record) ArrayJoin(sep string) string {
   return exstrings.Join(l, sep)
}

// ArrayFieldJoin joins the Record with fieldSep; the result is one element of an array.
// Note: it replaces any fieldSep or arraySep found inside a field (with a space).
func (l Record) ArrayFieldJoin(fieldSep, arraySep string) string {
   for i := len(l) - 1; i >= 0; i-- {
      // check for the special characters up front so the replacement can be skipped
      if strings.Index(l[i], fieldSep) < 0 && strings.Index(l[i], arraySep) < 0 {
         continue
      }

      b := []byte(l[i]) // this copies the field, so the in-place replace below cannot touch the referenced string
      b = exbytes.Replace(b, exstrings.UnsafeToBytes(fieldSep), []byte{' '}, -1)
      b = exbytes.Replace(b, exstrings.UnsafeToBytes(arraySep), []byte{' '}, -1)
      l[i] = exbytes.ToString(b)
   }

   return exstrings.Join(l, fieldSep)
}

ArrayFieldJoin replaces separators inside each field before joining the element, then joins the strings with exstrings.Join. exstrings.Join is an improved strings.Join that performs only one memory allocation, saving one compared with strings.Join; interested readers can look at its source.
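The single-allocation idea looks roughly like this sketch: pre-compute the total length, reserve it once, then write the parts. The real exstrings.Join may differ in detail:

package main

import (
    "fmt"
    "strings"
)

// joinOnce joins elems with sep in exactly one allocation: Grow reserves
// the full result up front, so the WriteString calls never reallocate.
func joinOnce(elems []string, sep string) string {
    if len(elems) == 0 {
        return ""
    }

    n := len(sep) * (len(elems) - 1)
    for _, e := range elems {
        n += len(e)
    }

    var b strings.Builder
    b.Grow(n)
    b.WriteString(elems[0])
    for _, e := range elems[1:] {
        b.WriteString(sep)
        b.WriteString(e)
    }
    return b.String()
}

func main() {
    fmt.Println(joinOnce([]string{"FUid", "FUserName"}, "/")) // FUid/FUserName
}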

Summary

datalog provides a practice pattern and some helper tools that let us record data logs quickly and stay focused on the data itself. The parts that determine performance can be handed over to datalog, which guarantees it.

Later on I plan to provide an efficient log-reading component for reading these logs during data analysis; it will be more efficient and convenient than general file reading, with optimizations targeted at log analysis. Stay tuned.

Reprint:

Author: Qi Yin (thinkeridea)

This link: https://blog.thinkeridea.com/201907/go/csv_like_data_logs.html

Disclaimer: Unless otherwise noted, all articles on this blog are licensed under the CC BY 4.0 CN agreement. Please credit the source when reprinting!
