1. Error message

While testing the Pager module, I enabled the PAGERTRACE logging macro. After that, running pager1.test failed with the following error:

E:\devc++\sqlite3\tclsqlite3\Debug\tclsqlite3.exe: expected boolean value but got "OPEN"
    while executing
"if { [lindex $r 0] } { error $res }"
    (procedure "testfixture" line 15)
    invoked from within
"testfixture $::code2_chan $tcl"
    (procedure "code2" line 1)
    invoked from within
"code2 { sqlite3 db2 test.db }"
    (procedure "do_multiclient_test" line 25)
    invoked from within
"do_multiclient_test tn {
  # Create and populate a database table using connection [db]. Check
  # that connections [db2] and [db3] can see the sche..."
    (file "pager1.test" line 82)

The trace says that while do_multiclient_test was executing code2 { sqlite3 db2 test.db }, it called testfixture $::code2_chan $tcl, which failed at if { [lindex $r 0] } { error $res }. The value of lindex $r 0 should have been a boolean, but it was "OPEN", so Tcl raised an error.
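The failure can be reproduced in isolation. The following is a minimal sketch, assuming a shortened reply string in which the file name is a simplified stand-in for the real path:

```tcl
# Reply as accumulated by testfixture when a trace line precedes the result.
# The file name is a simplified stand-in for the real path.
set r "OPEN 18086624 test.db0 {}"

# [lindex $r 0] is "OPEN", which Tcl cannot interpret as a boolean.
set rc [catch { if { [lindex $r 0] } { error [lindex $r 1] } } msg]
puts "rc=$rc msg=$msg"
```

The catch returns a non-zero code and msg holds the same "expected boolean value" text seen at the top of the trace.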

2. Code analysis

First, look at the implementation of do_multiclient_test, which is meant to run a test case across multiple processes.

proc do_multiclient_test {varname script} {
  foreach code [list {
    if {[info exists ::G(valgrind)]} { db close ; continue }
    set ::code2_chan [launch_testfixture]
    set ::code3_chan [launch_testfixture]
    proc code2 {tcl} { testfixture $::code2_chan $tcl }
    proc code3 {tcl} { testfixture $::code3_chan $tcl }
    set tn 1
  } {
    proc code2 {tcl} { uplevel #0 $tcl }
    proc code3 {tcl} { uplevel #0 $tcl }
    set tn 2
  }] {
    faultsim_delete_and_reopen

    proc code1 {tcl} { uplevel #0 $tcl }
  
    eval $code
    code2 { sqlite3 db2 test.db }
    code3 { sqlite3 db3 test.db }
    ……
  }
}

The foreach loop walks over a list of two code blocks. For each element it runs the loop body, including eval $code. The first pass evaluates this block:

    if {[info exists ::G(valgrind)]} { db close ; continue }
    set ::code2_chan [launch_testfixture]
    set ::code3_chan [launch_testfixture]
    proc code2 {tcl} { testfixture $::code2_chan $tcl }
    proc code3 {tcl} { testfixture $::code3_chan $tcl }
    set tn 1

So code2 { sqlite3 db2 test.db } actually executes

testfixture $::code2_chan { sqlite3 db2 test.db }
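The pattern of redefining a helper procedure on each pass of a foreach loop can be shown on its own. This is a small sketch, not the SQLite test code: the first code2 variant is a hypothetical stand-in that only tags the script instead of sending it to a child process.

```tcl
# Each list element is a script that (re)defines the helper proc code2.
set results {}
foreach code [list {
  # Stand-in for the multi-process variant that calls testfixture.
  proc code2 {tcl} { return "remote: $tcl" }
} {
  # In-process variant: evaluate the script at global level.
  proc code2 {tcl} { uplevel #0 $tcl }
}] {
  eval $code
  lappend results [code2 { expr {1 + 2} }]
}
puts $results
```

The same call to code2 is dispatched differently on each pass, which is how one test body gets exercised both through child processes and inside a single process.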

The variable $::code2_chan holds the value returned by launch_testfixture, so let's look at that procedure next:

proc launch_testfixture {{prg ""}} {
  write_main_loop
  if {$prg eq ""} { set prg [info nameofexec] }
  if {$prg eq ""} { set prg testfixture }
  if {[file tail $prg]==$prg} { set prg [file join . $prg] }
  set chan [open "|$prg tf_main.tcl" r+]
  fconfigure $chan -buffering line
  set rc [catch { 
    testfixture $chan "sqlite3_test_control_pending_byte $::sqlite_pending_byte"
  }]
  if {$rc} {
    testfixture $chan "set ::sqlite_pending_byte $::sqlite_pending_byte"
  }
  return $chan
}

Here write_main_loop creates a script file named tf_main.tcl, and set chan [open "|$prg tf_main.tcl" r+] opens a pipe to a child process that runs the tf_main.tcl script. On my machine, "|$prg tf_main.tcl" expands to E:/devc++/sqlite3/tclsqlite3/Debug/tclsqlite3.exe tf_main.tcl. The procedure returns the pipe $chan. The code2 procedure then passes $chan and sqlite3 db2 test.db to testfixture. So what does testfixture actually do? Here is the code:
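To illustrate the pipe mechanism on its own, here is a minimal sketch that launches a child interpreter and exchanges one line with it. The script name echo_child.tcl and its upper-casing behavior are assumptions made for this example; they are not part of the SQLite test suite.

```tcl
# Write a tiny child script (hypothetical name) into the current directory.
set path [file join [pwd] echo_child.tcl]
set f [open $path w]
puts $f {
  while {[gets stdin line] >= 0} {
    puts [string toupper $line]
    flush stdout
  }
}
close $f

# Open a read/write pipe to a child interpreter running that script.
set chan [open |[list [info nameofexecutable] $path] r+]
fconfigure $chan -buffering line

# Data written to the channel reaches the child's stdin.
puts $chan "hello"
# Data read from the channel is whatever the child wrote to stdout.
set reply [gets $chan]

close $chan
file delete $path
puts $reply
```

Everything the child writes to stdout travels back over the same channel, whether or not it was meant as a reply.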

proc testfixture {chan cmd args} {
  if {[llength $args] == 0} {
    fconfigure $chan -blocking 1
    puts $chan $cmd
    puts $chan OVER

    set r ""
    while { 1 } {
      set line [gets $chan]
      if { $line == "OVER" } { 
        set res [lindex $r 1]
        if { [lindex $r 0] } { error $res }
        return $res
      }
      if {[eof $chan]} {
        return "ERROR: Child process hung up"
      }
      append r $line  
    }
    return $r
  } else {
    …….
  }
}

These two lines send sqlite3 db2 test.db through the pipe to the child process:

   puts $chan $cmd
   puts $chan OVER

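When the child process finishes, it sends the result back to the main process, which reads it with gets $chan. Each line received is appended to the end of r by append r $line. To track down the cause, I printed line and r as they came off the pipe: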

puts "line $line"
puts "r $r"

The first two calls, made while launch_testfixture ran, did not fail. When sqlite3 db2 test.db was executed, however, the pipe returned this data:

line OPEN 18086624 E:\devc++\tcl\tt\test\testdir\test.db
line 0 {}
line OVER
r OPEN 18086624 E:\devc++\tcl\tt\test\testdir\test.db0 {}

Without the log output enabled in the SQLite source there is no error, and the correct output is:

line 0 {}
line OVER
r 0 {}
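The way the stray line corrupts r can be simulated without a child process. This sketch feeds the accumulation step a fixed list of lines; the path is shortened for illustration.

```tcl
# Lines as they arrive from the pipe (path shortened for illustration).
set lines [list "OPEN 18086624 test.db" "0 {}" "OVER"]

set r ""
foreach line $lines {
  if {$line eq "OVER"} break
  # gets strips the newline, so consecutive lines are glued together.
  append r $line
}
puts "r = $r"
puts "first element = [lindex $r 0]"
```

Because the trace line and the result are joined with no separator, the first list element of r is OPEN rather than the return code 0.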

Next, look at the code the child process runs, in tf_main.tcl:

    set script ""
    while {![eof stdin]} {
      flush stdout
      set line [gets stdin]
      if { $line == "OVER" } {
        set rc [catch {eval $script} result]
        puts [list $rc $result]
        puts OVER
        flush stdout
        set script ""
      } else {
        append script $line
        append script "\n"
      }
    }

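Here the child process at the other end of the pipe reads data (the script) from stdin. Once the script is complete, it runs it with set rc [catch {eval $script} result] and then sends the result back to the main process with puts [list $rc $result]. When puts is not given a channel, it writes to stdout by default. While sqlite3 db2 test.db executes, the SQLite source calls sqlite3PagerOpen(), which contains this logging statement: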

 PAGERTRACE(("OPEN %d %s\n", FILEHANDLEID(pPager->fd),pPager->zFilename));

This statement also prints to stdout, so the main process ends up receiving

OPEN 18086624 E:\devc++\tcl\tt\test\testdir\test.db

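In other words, the main process expected only the result of the script run by the child, but it also received trace output printed by the C source code. That is the root cause.

3. Fix

Once the root cause is known, the fix is easy. The most thorough approach is to have the main process and the child process agree on a slightly more elaborate communication protocol.

Short of defining a new protocol, the simplest fix is to change append r $line to set r $line in the testfixture procedure, so that r holds only the last line received before OVER. This assumes the result list fits on a single line.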
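One possible shape for such a protocol is sketched below. It is only an illustration under stated assumptions: the "RESULT " prefix and the parse_reply procedure are hypothetical, and the sketch assumes the child would send its reply as a single line of the form RESULT followed by the return code and result. It would still be confused by a trace line that happens to begin with the same prefix.

```tcl
# Hypothetical framing: the child prefixes its reply with "RESULT " and the
# parent ignores every line that lacks the prefix.
proc parse_reply {lines} {
  set r ""
  foreach line $lines {
    if {$line eq "OVER"} {
      lassign $r rc res
      if {$rc} { error $res }
      return $res
    }
    if {[string match "RESULT *" $line]} {
      # Drop the 7-character prefix and keep the {rc result} list.
      set r [string range $line 7 end]
    }
  }
  error "child process hung up"
}

# The trace line is skipped because it lacks the prefix.
set lines [list "OPEN 18086624 test.db" "RESULT 0 ok" "OVER"]
puts [parse_reply $lines]
```

The design choice here is to mark the lines that belong to the protocol rather than trying to recognize and discard everything else.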
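The effect of the one-line change can be checked with a simulation of the read loop. This is a sketch rather than the real testfixture procedure: read_reply is a hypothetical name, and it takes a list of lines instead of reading from a channel.

```tcl
# Simulated read loop with the one-line change applied.
proc read_reply {lines} {
  set r ""
  foreach line $lines {
    if {$line eq "OVER"} {
      set res [lindex $r 1]
      if {[lindex $r 0]} { error $res }
      return $res
    }
    # Changed from: append r $line
    set r $line
  }
  return "ERROR: Child process hung up"
}

# Same input that previously failed (path shortened for illustration).
set lines [list "OPEN 18086624 test.db" "0 {}" "OVER"]
set res [read_reply $lines]
puts "result: '$res'"
```

The trace line is overwritten by the result line that follows it, so the return code is read as 0 and the empty result is returned without error.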
