I. Why I ran into this problem

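In short, a logging module may need to format a user-supplied string before printing it. That string can contain anything. If it includes format directives and is not handled specially, unexpected things can happen. A stray "%s" is the milder case: if the address it reads from is invalid, the process crashes and dumps core on the spot. "%n" is much nastier, because it quietly writes into whatever address happens to be in the corresponding argument slot. This note covers only one narrow question: when a "%n" touches an invalid address and the program crashes, how do you confirm that the crash really comes from processing that %n?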

II. Test code

tsecer@harry: cat printf.format.cpp
#include "stdio.h"

int main(int argc, const char * argv[])
{
return printf("%s %n tsecer add a suffix to mark this position\n", argv[0], 0);
}
tsecer@harry: g++ -g printf.format.cpp
tsecer@harry: gdb ./a.out
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from …….
(gdb) r
Starting program: ……

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7241611 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.tl2.3.x86_64 libgcc-4.8.5-4.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64
(gdb) bt
#0 0x00007ffff7241611 in vfprintf () from /lib64/libc.so.6
#1 0x00007ffff72488c9 in printf () from /lib64/libc.so.6
#2 0x000000000040061d in main (argc=1, argv=0x7fffffffe568) at printf.format.cpp:5
(gdb)

III. Using the disassembly to find the format character currently being processed

1. The C library source

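In glibc's vfprintf, the current position in the format string is held in the pointer f, so the question becomes where f is kept in vfprintf's stack frame. The excerpt below is from glibc-2.11, while the crashing machine runs glibc-2.17 (see the debuginfo hint above); the loop structure it shows is consistent with the disassembly of the installed library.
glibc-2.11/stdio-common/vfprintf.c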
/* The function itself. */
int
vfprintf (FILE *s, const CHAR_T *format, va_list ap)
{
……
#ifdef COMPILE_WPRINTF
/* Find the first format specifier. */
f = lead_str_end = __find_specwc ((const UCHAR_T *) format);
#else
/* Find the first format specifier. */
f = lead_str_end = __find_specmb ((const UCHAR_T *) format);
#endif
……

/* Look for next format specifier. */
#ifdef COMPILE_WPRINTF
f = __find_specwc ((end_of_spec = ++f));
#else
f = __find_specmb ((end_of_spec = ++f));
#endif
……
}

2. The corresponding disassembly

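The disassembly contains two conspicuous calls to strchrnul. The first is set up with mov $0x25,%esi, and 0x25 is the character '%', so __find_specmb has effectively been compiled into strchrnul(format, '%'). How do the two calls map to the two in the source? Just before the second call there is an obvious increment:
0x00007ffff723e955 <+2005>: lea 0x1(%rax),%r12
This is the ++f in the source line
f = __find_specmb ((end_of_spec = ++f));
(printf takes the narrow-character path, so this build uses the __find_specmb branch rather than __find_specwc). Right after that call,
0x00007ffff723e96f <+2031>: mov %rax,-0x4c0(%rbp)
stores the return value in rax to -0x4c0(%rbp), so -0x4c0(%rbp) is where f lives on the stack. The rest of the listing agrees: at <+2012> the incremented pointer is also written to -0x4c0(%rbp) before the call, and after the first call (<+153>, <+160>) the result is stored to both -0x4d8(%rbp) and -0x4c0(%rbp), matching f = lead_str_end = __find_specmb ((const UCHAR_T *) format);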
0x00007ffff723e1e5 <+101>: mov (%r15),%rax
0x00007ffff723e1e8 <+104>: mov $0x25,%esi
0x00007ffff723e1ed <+109>: mov %r14,%rdi
0x00007ffff723e1f0 <+112>: mov %rax,-0x460(%rbp)
0x00007ffff723e1f7 <+119>: mov 0x8(%r15),%rax
0x00007ffff723e1fb <+123>: mov %rax,-0x458(%rbp)
0x00007ffff723e202 <+130>: mov 0x10(%r15),%rax
0x00007ffff723e206 <+134>: mov %rax,-0x450(%rbp)
0x00007ffff723e20d <+141>: callq 0x7ffff7287210 <strchrnul>
0x00007ffff723e212 <+146>: and $0x8000,%r12d
0x00007ffff723e219 <+153>: mov %rax,-0x4d8(%rbp)
0x00007ffff723e220 <+160>: mov %rax,-0x4c0(%rbp)
0x00007ffff723e227 <+167>: movl $0x0,-0x4cc(%rbp)
……
0x00007ffff723e94e <+1998>: mov %r9d,-0x4c8(%rbp)
0x00007ffff723e955 <+2005>: lea 0x1(%rax),%r12
0x00007ffff723e959 <+2009>: mov %r12,%rdi
0x00007ffff723e95c <+2012>: mov %r12,-0x4c0(%rbp)
0x00007ffff723e963 <+2019>: callq 0x7ffff7287210 <strchrnul>
0x00007ffff723e968 <+2024>: mov 0xd8(%rbx),%rcx
0x00007ffff723e96f <+2031>: mov %rax,-0x4c0(%rbp)
0x00007ffff723e976 <+2038>: sub %r12,%rax
……

3. Verifying the result

(gdb) p *(char **) ($rbp-0x4c0)
$1 = 0x4006b4 "n tsecer add a suffix to mark this position\n"
(gdb)

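So reading f from its stack slot tells you which format directive printf was processing when it crashed: here f points at the n of "%n", which confirms that the fault comes from the %n conversion.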