GNU coreutils-4.5.1 split.c code analysis

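split is an interesting tool: it breaks one long file into pieces, either by line count or by byte count.
awk also has a split function, which breaks a string into the elements of an array, for example
split("3-22-1981", dt, "-")
Someone once said that code can be read the way Lisp is written: (operator, operand 1, operand 2).
Here split is just like an operator, haha!
When reading code, I first skim it from top to bottom, then go back to main and work through it slowly.
The early part is mostly filler, so I skip it for now and go straight to the substance: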
switch (split_type)
    {
    case type_digits:
    case type_lines:
      lines_split (num, buf, in_blk_size);
      break;

    case type_bytes:
      bytes_split (num, buf, in_blk_size);
      break;

    case type_byteslines:
      line_bytes_split (num);
      break;

    default:
      abort ();
    }
All I really want to know is how splitting by lines works, so I concentrate on lines_split:
do
    {
      n_read = stdread (buf, bufsize);
      if (n_read < 0)
        error (EXIT_FAILURE, errno, "%s", infile);
      bp = bp_out = buf;
      eob = bp + n_read;
      *eob = '\n';
      for (;;)
        {
          while (*bp++ != '\n')
            ;   /* this semicolon takes most of the time */
          if (bp > eob)
            {
              if (eob != bp_out) /* do not write 0 bytes! */
                {
                  cwrite (new_file_flag, bp_out, eob - bp_out);
                  new_file_flag = 0;
                }
              break;
            }
          else
            if (++n >= nlines)
              {
                cwrite (new_file_flag, bp_out, bp - bp_out);
                bp_out = bp;
                new_file_flag = 1;
                n = 0;
              }
        }
    }
while (n_read == bufsize);
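This code is long and nests its loops.
The outer loop reads the data, one buffer at a time.
bp points to the start of the buffer
eob points just past the last byte read
*eob = '\n'
for (;;)
{
  if (bp > eob)   At this point I wondered: *eob was just set to '\n', so how can bp ever end up greater than eob?

}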
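I could not figure it out, so I added print statements. Guess what they showed:
if (bp > eob)
prints last: it runs once the scan has gone past the end of the buffer, and writes out whatever is left over
else
if (++n >= nlines) prints each time enough lines have been collected, that is, once for every completed output file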
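Looking at it again, the bp > eob puzzle seems to come down to the sentinel: *eob = '\n' stores a newline one byte past the data, so the inner while always stops, and because of the post-increment bp ends up one past whichever newline it found. If that newline was the sentinel, bp is eob + 1. Below is a minimal sketch of just that scan. count_lines_with_sentinel is a made-up helper for illustration and is not part of split.c; it assumes the caller's buffer has room for n_read + 1 bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of split.c: count the real newlines in
   buf[0..n_read) with the same sentinel scan that lines_split uses.
   Assumption: buf has room for at least n_read + 1 bytes.  */
size_t
count_lines_with_sentinel (char *buf, size_t n_read)
{
  char *bp = buf;
  char *eob = buf + n_read;   /* one past the last byte of data */
  size_t lines = 0;

  *eob = '\n';                /* sentinel: the scan below must stop here */
  for (;;)
    {
      while (*bp++ != '\n')
        ;                     /* bp ends one past the newline it found */
      if (bp > eob)           /* that newline was the sentinel itself */
        break;
      lines++;                /* a real newline inside the data */
    }
  return lines;
}
```

For the six bytes "a\nbb\nc" this returns 2: the scan stops on two real newlines, then on the sentinel at eob, which leaves bp at eob + 1 and takes the break.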
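In short, the author counts lines by looking for '\n', using a three-level nested loop. I get the general idea without understanding it deeply, but I will count it as read for now; the first step is just to get familiar with it.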
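To watch the three loops work together without touching any files, here is a rough in-memory sketch of the same logic. Everything in it is my own stand-in rather than code from split.c: split_lines_in_memory, emit and DEMO_BUFSIZE are hypothetical names, the memcpy replaces the read call, and instead of calling cwrite it only records how many bytes each output piece would receive.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_BUFSIZE 8   /* hypothetical tiny block size so refills show up */

/* Stand-in for cwrite: start a new piece when new_file is set, then add
   nbytes to the current piece.  Sizes beyond max_pieces are not stored.  */
static void
emit (int new_file, size_t nbytes, size_t *sizes, size_t max_pieces,
      size_t *pieces)
{
  if (new_file)
    {
      if (*pieces < max_pieces)
        sizes[*pieces] = 0;
      ++*pieces;
    }
  if (*pieces >= 1 && *pieces <= max_pieces)
    sizes[*pieces - 1] += nbytes;
}

/* Hypothetical in-memory version of the lines_split loop.  Returns the
   number of output pieces and stores their byte sizes in sizes[].  */
size_t
split_lines_in_memory (const char *data, size_t len, int nlines,
                       size_t *sizes, size_t max_pieces)
{
  char buf[DEMO_BUFSIZE + 1];        /* +1 leaves room for the sentinel */
  size_t pieces = 0, off = 0, n_read;
  int new_file_flag = 1, n = 0;

  do
    {
      /* Stand-in for the read call: copy the next block of input.  */
      n_read = len - off < DEMO_BUFSIZE ? len - off : DEMO_BUFSIZE;
      memcpy (buf, data + off, n_read);
      off += n_read;

      char *bp = buf, *bp_out = buf, *eob = buf + n_read;
      *eob = '\n';
      for (;;)
        {
          while (*bp++ != '\n')
            ;
          if (bp > eob)
            {
              /* Buffer exhausted: flush the unfinished tail, if any,
                 and keep the same piece open for the next block.  */
              if (eob != bp_out)
                {
                  emit (new_file_flag, eob - bp_out, sizes, max_pieces,
                        &pieces);
                  new_file_flag = 0;
                }
              break;
            }
          else if (++n >= nlines)
            {
              /* Enough lines collected: this piece is complete.  */
              emit (new_file_flag, bp - bp_out, sizes, max_pieces, &pieces);
              bp_out = bp;
              new_file_flag = 1;
              n = 0;
            }
        }
    }
  while (n_read == DEMO_BUFSIZE);
  return pieces;
}
```

With the 10-byte input "a\nb\nc\nd\ne\n" and nlines = 2 this gives three pieces of 4, 4 and 2 bytes. With "aaaaaaaaaa\nb\n" and nlines = 1 the first line is longer than the buffer, so the bp > eob branch writes its first 8 bytes, clears new_file_flag, and the next block adds the remaining 3 bytes to the same piece, giving pieces of 11 and 2 bytes.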
