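I. Background
Devices returned through after-sales service fail to boot. Analysis shows that the failure is caused by a corrupted partition.
II. Analysis
1. The symptoms vary: some phones hang on the boot animation, some reboot repeatedly at the first boot frame, and some do not power on at all.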
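2. For devices stuck at the first frame (either at the lk logo or at the kernel logo), capture the UART (serial) log for analysis. For devices stuck at the boot animation, an offline log can be captured (on MTK platforms the GAT tool can capture it even when the device does not boot).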
Case 1:
The device shows "optimizing app N", reaches the last app, then stays on that screen and never finishes booting.
Kernel log:
<14>[ 1095.195963]<0> (0)[1:init]init: computing context for service '/system/bin/mediaserver'
<13>[ 1095.196665]<0> (0)[1:init]init: starting 'media'
<7>[ 1095.197245]<0> (0)[1:init][1:init] fork [1177:init]
<14>[ 1095.197655]<0> (0)[1:init]init: PropSet [init.svc.media:running] Start>>
<14>[ 1095.198039]<0> (0)[1:init]init: PropSet [init.svc.media:running] Done
<6>[ 1095.223132]<0> (0)[55:cfinteractive][Power/cpufreq] @_mt_cpufreq_set_cur_freq():1804, cur_khz = 221000, target_khz = 988000, dds = 0x130000, sel = 8
<6>[ 1095.223273]<0> (0)[55:cfinteractive][Power/cpufreq] @_mt_cpufreq_set_locked(): Vproc = 1137mv, freq = 988000 KHz
<2>[ 1095.353104]<0> (0)[78:hps_main]Boot slave CPU
<4>[ 1095.353127]<0> (0)[78:hps_main][Power/hotplug] mt_smp_boot_secondary, cpu: 1
<4>[ 1095.353386]<0>-(1)[0:swapper/1]CPU1: Booted secondary processor
<4>[ 1095.353403]<0>-(1)[0:swapper/1][Power/hotplug] platform_secondary_init, cpu: 1
<4>[ 1095.353713]<0> (0)[78:hps_main][wdk]bind kicker thread[148] to cpu[1]
<4>[ 1095.353795]<0> (0)[78:hps_main][WDK]cpu 1 plug on kick wdt
<4>[ 1095.354016]<0> (0)[78:hps_main][HPS] (0020)(2)action end(100)(105)(0)(0) (4)(4)(4)(4)(1) (60)(34)(0) (0)(0)(0) (1)(105)(35)(0)(105)
<7>[ 1095.533688]<0> (1)[1177:mediaserver][1177:mediaserver] exit
<7>[ 1095.534077]<0>-(1)[1177:mediaserver][1177:mediaserver] sig 17 to [1:init] stat=S
<14>[ 1095.534412]<0> (0)[1:init]init: waitpid returned pid 1177, status = 00000100
<13>[ 1095.534441]<0> (0)[1:init]init: process 'media', pid 1177 exited
<13>[ 1095.534466]<0> (0)[1:init]init: process 'media' killing any children in process group
<14>[ 1095.534493]<0> (0)[1:init]init: PropSet [init.svc.media:restarting] Start>>
<14>[ 1095.534645]<0> (0)[1:init]init: PropSet [init.svc.media:restarting] Done
<6>[ 1095.552840]<0> (1)[55:cfinteractive][Power/cpufreq] @_mt_cpufreq_set_cur_freq()
Main log:
01-01 17:56:35.416 896 959 I ServiceManager: Waiting for service media.audio_policy...
01-01 17:56:35.465 896 896 I ServiceManager: Waiting for service media.audio_flinger...
01-01 17:56:35.862 441 474 I ServiceManager: Waiting for service media.audio_flinger...
01-01 17:56:36.416 896 959 I ServiceManager: Waiting for service media.audio_policy...
01-01 17:56:36.465 896 896 I ServiceManager: Waiting for service media.audio_flinger...
01-01 17:56:36.863 441 474 I ServiceManager: Waiting for service media.audio_flinger...
01-01 17:56:37.417 896 959 I ServiceManager: Waiting for service media.audio_policy...
01-01 17:56:37.466 896 896 I ServiceManager: Waiting for service media.audio_flinger...
01-01 17:56:37.864 441 474 I ServiceManager: Waiting for service media.audio_flinger...
01-01 17:56:38.418 896 959 I ServiceManager: Waiting for service media.audio_policy...
01-01 17:56:38.467 896 896 I ServiceManager: Waiting for service media.audio_flinger...
01-01 17:56:38.865 441 474 I ServiceManager: Waiting for service media.audio_flinger...
01-01 17:56:39.069 308 308 I thermal_repeater: oh, queryMdThermalInfo (0)Bad file number
01-01 17:56:39.077 308 308 I thermal_repeater: [recvMdThermalInfo] ret=5, strLen=127, ERROR
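The restart loop in the kernel log above can also be confirmed mechanically on a long capture. The following is a minimal sketch, not part of the original analysis: the helper name count_service_exits is hypothetical, and it assumes init reports exits in the exact form seen above ("init: process 'media', pid 1177 exited").

```c
#include <stdio.h>
#include <string.h>

/*
 * Count how many times init reports that the named service exited.
 * `log` is the whole kernel log as one NUL-terminated string.
 * Assumed message format (taken from the log above):
 *   init: process 'media', pid 1177 exited
 * Returns the number of matches, or -1 if the service name is too long.
 */
int count_service_exits(const char *log, const char *service)
{
    char needle[128];
    int count = 0;
    const char *p = log;

    /* Build the fixed prefix that precedes the pid. */
    int n = snprintf(needle, sizeof(needle),
                     "init: process '%s', pid ", service);
    if (n < 0 || (size_t)n >= sizeof(needle))
        return -1;

    while ((p = strstr(p, needle)) != NULL) {
        int pid;
        int consumed = 0;

        p += n;
        /* Require "<pid> exited" directly after the prefix. */
        if (sscanf(p, "%d exited%n", &pid, &consumed) == 1 && consumed > 0)
            count++;
    }
    return count;
}
```

A count that keeps growing while ServiceManager is still waiting for media.audio_flinger points to a crash loop rather than a slow start.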
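Analysis conclusion:
1. The kernel log and the main log show that the mediaserver service never comes up; it keeps restarting.
2. Reading back the system image from the faulty phone and comparing it with the system image of a normal build shows that libmediaplayerservice.so is the only file that differs, so this library is presumably corrupted. Why it became corrupted has not been determined.
Screenshot of the comparison:
3. We cannot take the analysis any further on our side. At this point the eMMC vendor should be asked to help determine why the eMMC behaves this way.
4. Look up the eMMC ID in the preloader log: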
[PLFM] Power key boot!
mt_get_dram_type() 0x3
[EMI] LPDDR3
[Check]mt_get_mdl_number 0x0
[EMI] eMMC/NAND ID = 90,1,4A,48,38,47,31,65,5,7,28,F7,10,9B,13,F1
[EMI] MDL number = 0
[EMI] emi_set eMMC/NAND ID = 90,1,4A,48,38,47,31,65,5,0,0,0,0,0,0,0
Start REXTDN SW calibration...
The preloader code that prints these lines is:
/* Report the detected DRAM type. */
switch (DDR_type)
{
    case TYPE_mDDR:
        print("[EMI] DDR1\r\n");
        break;
    case TYPE_LPDDR2:
        print("[EMI] LPDDR2\r\n");
        break;
    case TYPE_LPDDR3:
        print("[EMI] LPDDR3\r\n");
        break;
    case TYPE_PCDDR3:
        print("[EMI] PCDDR3\r\n");
        break;
    default:
        print("[EMI] unknown dram type:%d\r\n", mt_get_dram_type());
        break;
}

/* Index of the matching entry in the memory device list (MDL). */
index = mt_get_mdl_number();
print("[Check]mt_get_mdl_number 0x%x\n", index);

/* Print the 16-byte eMMC/NAND ID held in id[]. */
print("[EMI] eMMC/NAND ID = %x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x\r\n",
      id[0], id[1], id[2], id[3], id[4], id[5], id[6], id[7],
      id[8], id[9], id[10], id[11], id[12], id[13], id[14], id[15]);

/* Bail out if no valid entry was found. */
if (index < 0 || index >= num_of_emi_records)
{
    print("[EMI] setting failed 0x%x\r\n", index);
    return;
}

print("[EMI] MDL number = %d\r\n", index);
emi_set = &emi_settings[index];

/* Print the ID stored in the selected emi_settings entry. */
print("[EMI] emi_set eMMC/NAND ID = %x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x\r\n",
      emi_set->ID[0], emi_set->ID[1], emi_set->ID[2], emi_set->ID[3],
      emi_set->ID[4], emi_set->ID[5], emi_set->ID[6], emi_set->ID[7],
      emi_set->ID[8], emi_set->ID[9], emi_set->ID[10], emi_set->ID[11],
      emi_set->ID[12], emi_set->ID[13], emi_set->ID[14], emi_set->ID[15]);
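Use the eMMC ID to identify the eMMC vendor:
[EMI] emi_set eMMC/NAND ID = 90,1,4A,48,38,47,31,65,5,0,0,0,0,0,0,0
Path of the eMMC spreadsheet:
bootable/bootloader/preloader/tools/emigen/MT6735/MemoryDeviceList_MT6735M.xls
Comparing this ID against the eMMC ID column of the spreadsheet identifies the part as Hynix H9TQ64A8GTMCUR_KUM, so we now know which vendor to contact.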
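When the spreadsheet is not at hand, the ID line itself gives a first hint. The sketch below parses the hex bytes from the log line and extracts the printable characters of a byte range. Both helper names are hypothetical, and the byte positions rest on an assumption: that the ID bytes follow the eMMC CID layout (byte 0 manufacturer ID, bytes 3 to 8 product name). The spreadsheet lookup remains the authoritative step.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the comma-separated hex bytes that follow '=' in a preloader line
 * such as "[EMI] eMMC/NAND ID = 90,1,4A,48,...".
 * Returns the number of bytes stored in id[] (at most max),
 * or -1 if the line contains no '='.
 */
int parse_emi_id(const char *line, unsigned char *id, int max)
{
    const char *p = strchr(line, '=');
    int n = 0;

    if (p == NULL)
        return -1;
    p++;

    while (n < max) {
        char *end;
        unsigned long v = strtoul(p, &end, 16);

        if (end == p || v > 0xFF)
            break;              /* not a hex byte: stop parsing */
        id[n++] = (unsigned char)v;
        if (*end != ',')
            break;              /* no further bytes on this line */
        p = end + 1;
    }
    return n;
}

/*
 * Copy the printable ASCII characters of id[first..last] into out.
 * Non-printable bytes are skipped; out is always NUL-terminated
 * (out_size must be at least 1).
 */
void id_ascii(const unsigned char *id, int first, int last,
              char *out, size_t out_size)
{
    size_t k = 0;
    int i;

    for (i = first; i <= last && k + 1 < out_size; i++) {
        if (isprint(id[i]))
            out[k++] = (char)id[i];
    }
    out[k] = '\0';
}
```

For the line above, parse_emi_id yields 16 bytes starting with 0x90, and id_ascii(id, 3, 8, ...) yields "H8G1e". Under the stated layout assumption that is the product-name field, which fits the Hynix part found in the spreadsheet.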
Note:
How do you check which eMMC parts a project is configured for? Look in:
bootable/bootloader/preloader/custom/$(Project)/inc/custom_MemoryDevice.h
#define BOARD_ID MT6735_EVB
#define CS_PART_NUMBER[0] H9TQ64A8GTMCUR_KUM
#define CS_PART_NUMBER[1] KMQ7X000SA_B315
#define CS_PART_NUMBER[2] TYD0GH121661RA
#define CS_PART_NUMBER[3] KMFN10012M_B214
#define CS_PART_NUMBER[4] KMFNX0012M_B214
#define CS_PART_NUMBER[5] H9TQ64A8GTCCUR_KUM
Case 2:
1. The device reboots repeatedly at the first frame.
2. The UART log is as follows:
[ 4.663933]<0>.(0)[158:init]fs_mgr: eMMC: can NOT find FAT partition via name mapping, part=emmc@intsd.
[ 4.665245]<0>.(0)[158:init]fs_mgr: [xiaolei] : blk device name /dev/block/platform/mtk-msdc.0/by-name/system
[ 4.666616]<0>.(0)[158:init]fs_mgr: __mount: target='/system, encryptable=0
[ 4.667519]fs_mgr: __mount: 'nvdata' partition exists!
[ 4.670190]<2>.(2)[133:mmcqd/0][BLOCK_TAG] mmcqd:133 Workload < 1%, duty 877538, period 2340152078, req_cnt=4
[ 4.671431]<2>.(2)[133:mmcqd/0][BLOCK_TAG] mmcqd:133 Read Diversity=5090296 sectors offset, req_cnt=4, break_cnt=0, tract_cnt=0, bit_cnt=0
[ 4.672981]<2>.(2)[133:mmcqd/0][BLOCK_TAG] vmstat (FP:1668)(FD:0)(ND:0)(WB:0)(NW:0)
[ 4.678994]<2>.(2)[158:init]fs_mgr: [xiaolei] : blk device name /dev/block/platform/mtk-msdc.0/by-name/userdata
[ 4.680895]<2>.(2)[158:init]EXT4-fs (mmcblk0p26): Ignoring removed nomblk_io_submit option
[ 4.690987]EXT4-fs error (device mmcblk0p26): ext4_mb_generate_buddy:755: group 15, 1970 clusters in bitmap, 1971 in gd
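1. The first UART capture showed that boot fails during the kernel stage, but UART logging for the kernel stage was not enabled at that point.
2. After flashing an lk image with UART logging enabled, the log shows that the userdata partition is corrupted and therefore cannot be mounted. The cause of the corruption is unknown.
3. Because many devices show this problem, a factory reset was first performed on one of them. It then booted normally, which confirms that the problem is tied to the userdata partition.
4. The way forward is the same as in Case 1: ask the eMMC vendor for help.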
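Because many devices are affected, it can help to pull the key fields out of the EXT4-fs error line automatically when triaging a batch of UART logs. The sketch below assumes the exact message format shown in the log above; the struct and function names are hypothetical. In this message, "clusters in bitmap" is the free-cluster count computed from the block bitmap and "in gd" is the count recorded in the group descriptor, so any difference means the on-disk metadata is inconsistent.

```c
#include <stdio.h>
#include <string.h>

/* Fields of an ext4_mb_generate_buddy mismatch report. */
struct ext4_bitmap_mismatch {
    char device[32];      /* e.g. "mmcblk0p26" */
    unsigned group;       /* block group number */
    unsigned in_bitmap;   /* free clusters counted from the bitmap */
    unsigned in_gd;       /* free clusters stored in the group descriptor */
};

/*
 * Parse one log line. Returns 1 and fills *out if the line contains an
 * ext4_mb_generate_buddy mismatch report, otherwise returns 0.
 */
int parse_ext4_mismatch(const char *line, struct ext4_bitmap_mismatch *out)
{
    unsigned src_line;    /* kernel source line number, not kept */
    const char *p = strstr(line, "EXT4-fs error (device ");

    if (p == NULL)
        return 0;

    if (sscanf(p,
               "EXT4-fs error (device %31[^)]): ext4_mb_generate_buddy:%u: "
               "group %u, %u clusters in bitmap, %u in gd",
               out->device, &src_line,
               &out->group, &out->in_bitmap, &out->in_gd) != 5)
        return 0;

    return 1;
}
```

Applied to the last line of the UART log above, this reports device mmcblk0p26, group 15, with 1970 free clusters in the bitmap against 1971 in the group descriptor. Mapping mmcblk0p26 to userdata relies on the fs_mgr line just before it in the same log.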