Greenplum error: Do not have enough valid segments to start the array

A Greenplum cluster fails to start with the error "Do not have enough valid segments to start the array."

Background:

After the cluster was set up, a few settings needed tuning:

Set work_mem to 64MB

Check the current setting

gpconfig -s work_mem

Values on all segments are consistent
GUC          : work_mem
Master  value: 32MB
Segment value: 32MB
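A minimal sketch of how the output above could be checked in a script. It assumes the output layout shown in this post ("Master  value:" and "Segment value:" lines); the function name parse_gpconfig_show is made up for illustration and is not part of Greenplum.

```shell
#!/usr/bin/env bash
# Read `gpconfig -s <name>` output on stdin and print the two values on one line.
# Assumes the "Master  value: X" / "Segment value: Y" layout shown in this post.
parse_gpconfig_show() {
    awk -F': *' '
        /^Master +value/  { m = $2 }   # value reported for the master
        /^Segment +value/ { s = $2 }   # value reported for the segments
        END { printf "master=%s segment=%s\n", m, s }
    '
}
```

Possible usage: gpconfig -s work_mem | parse_gpconfig_show, run before and after a change to compare the two values.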

Change the setting

gpconfig -c work_mem  -v 64M  

Restart the cluster to load the new configuration

Reload the configuration files postgresql.conf and pg_hba.conf

gpstop -u   

The restart failed as follows.

Check the error log:

/home/gpadmin/gpAdminLogs/gpstart_20180904.log 

[INFO]:-----------------------------------------------------
[INFO]:-   Successful segment starts                                            = 0
[WARNING]:-Failed segment starts                                                = 32   <<<<<<<<
[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
[INFO]:-----------------------------------------------------
[INFO]:-Successfully started 0 of 32 segment instances <<<<<<<<
[INFO]:-----------------------------------------------------
[WARNING]:-Segment instance startup failures reported
[WARNING]:-Failed start 32 of 32 segment instances <<<<<<<
[WARNING]:-Review /home/gpadmin/gpAdminLogs/gpstart_20180904.log    
[INFO]:-----------------------------------------------------
[INFO]:-Commencing parallel segment instance shutdown, please wait...
[ERROR]:-gpstart error: Do not have enough valid segments to start the array.
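The gpstart log is long, and only a few lines carry the failure. A small sketch for filtering it, assuming the bracketed severity tags seen in the excerpt above; gpstart_problems is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print only the WARNING, ERROR, and CRITICAL lines of a gpstart log.
# Reads the file(s) given as arguments, or stdin when none are given.
gpstart_problems() {
    grep -hE '\[(WARNING|ERROR|CRITICAL)\]' "$@"
}
```

Possible usage: gpstart_problems /home/gpadmin/gpAdminLogs/gpstart_20180904.log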

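Solution:

Searching online for the error message showed that it is a very generic error: parameter values that are too large, host failures, and configuration mistakes can all produce it.

Following that hint, I edited the configuration on the master node, commented out the setting I had changed, and restarted the cluster again. It still would not start, and reported the following: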

20180904:18:53:20:108168 gpstart:cndh1322-6-15:gpadmin-[INFO]:-Starting Master instance in admin mode
20180904:19:03:21:108168 gpstart:cndh1322-6-15:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20180904:19:03:21:108168 gpstart:cndh1322-6-15:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /usr/local/gpdata/gpmaster/gpseg-1 -l /usr/local/gpdata/gpmaster/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 5432 --gp_dbid=
1 --gp_num_contents_in_cluster=0 --silent-mode=true -i -M master --gp_contentid=-1 -x 34 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start............................................................................................................................................................
.....................................................................................................................................................................................................
.....................................................................................................................................................................................................
..................................................... stopped waiting

The master startup log contains the following error:

more /usr/local/gpdata/gpmaster/gpseg-1/pg_log/startup.log 

2018-09-04 11:04:07.898274 GMT,,,p127931,th1064769408,,,,0,,,seg-1,,,,,"FATAL","22023","invalid value for parameter ""work_mem"": ""64M""",,"Valid units for this parameter are ""kB"", ""MB"", and "
"GB"".",,,,,,"set_config_option","guc.c",4874,

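The error above shows that an invalid parameter value caused the failure.
The command should have been gpconfig -c work_mem -v 64MB, not 64M. The server treats 64M as an invalid value, so the cluster cannot start. But the setting on the master node had already been commented out during troubleshooting, so why did the cluster still refuse to start? Logging in to one of the segment hosts showed that the segment configuration files had been modified as well, so the segment processes could not come up.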
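One way to guard against this mistake is to check the value before handing it to gpconfig. The following is only a sketch: it accepts an integer followed by one of the units named in the FATAL message (kB, MB, GB), which is deliberately stricter than the server and rejects other forms the server may accept. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if $1 is an integer followed by kB, MB, or GB
# (the units listed in the FATAL message from startup.log).
valid_mem_value() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(kB|MB|GB)$'
}

# Hypothetical wrapper: call gpconfig only when the value passes the check.
# $1 = parameter name, $2 = new value.
safe_set_mem() {
    local name=$1 value=$2
    if ! valid_mem_value "$value"; then
        echo "refusing to set $name: '$value' needs a kB, MB, or GB suffix" >&2
        return 1
    fi
    gpconfig -c "$name" -v "$value"
}
```

With this in place, safe_set_mem work_mem 64M stops with a message instead of writing the bad value, while safe_set_mem work_mem 64MB goes on to run gpconfig.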

Final fix:

Start the cluster quickly in maintenance mode:

gpstart  -a -m 

Correct the parameter:

gpconfig -c work_mem  -v 64MB       

Start the cluster; it now starts normally:

gpstart 

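Lessons learned:
1: A parameter changed with gpconfig is written to the configuration file of every node in the cluster;
2: gpconfig is only loosely coupled to the cluster, so invalid input is written to the configuration as well;
3: Before changing a parameter, query its current value and model the new value on the existing one;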
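Because the value ends up in each node's postgresql.conf, it can help to read back what a given file actually contains. A rough sketch follows, assuming entries of the form name = value or name=value on their own line (the exact format gpconfig writes is an assumption here); conf_value is a hypothetical helper. When a parameter appears more than once, the last uncommented entry is printed, since the last setting in the file is the one that takes effect.

```shell
#!/usr/bin/env bash
# Print the last uncommented value assigned to a parameter in a
# postgresql.conf-style file. $1 = parameter name, $2 = path to the file.
# Assumes "name = value" entries; lines starting with # are ignored.
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*'\{0,1\}\([^'#[:space:]]*\).*/\1/p" "$2" | tail -n 1
}
```

Possible usage on the master, whose data directory appears in the gpstart log above: conf_value work_mem /usr/local/gpdata/gpmaster/gpseg-1/postgresql.conf. The same check can be repeated in each segment's data directory.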


Reposted from blog.51cto.com/michaelkang/2170508