Fixing very slow SSH logins and slow sudo user switching

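Note: this article is rough and not very detailed. Its main purpose is to share an approach to the problem and a way of thinking about troubleshooting.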

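Scenario: all of our company's machines are managed centrally through jumpserver. One day we noticed that connecting to one of the web servers was very slow, taking about 13 seconds, and that switching to root with sudo was also very slow.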

Problem analysis:

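1. First, rule out sudo itself. The usual approach is to run the hostname command to get the machine's hostname, then edit /etc/hosts and add a mapping from the local address to that hostname, which makes sudo faster. For the problem described here, however, that does not help.
2. SSH is the first place the slowness shows up, and SSH can be slow for many reasons, so the investigation centers on SSH.
There are plenty of fixes online, such as disabling sshd's reverse DNS lookups with UseDNS no, or setting GSSAPIAuthentication no.
Both settings are perfectly reasonable SSH optimizations, but first you should find out where problems appear across the whole SSH connection process, using debug mode:
ssh -v -p <port> <user>@<host-ip>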

OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013 #Phase 1: the two sides agree on the protocol version and the SSH version
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for
debug1: Connecting to 192.168.1.155 [192.168.1.155] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/identity type -1
debug1: identity file /root/.ssh/identity-cert type -1
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH

debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug1: SSH2_MSG_KEXINIT sent #Phase 2: the two sides agree on the supported encryption algorithms, message digest (MAC) algorithms, host key, and so on.
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
The authenticity of host '192.168.1.155 (192.168.1.155)' can't be established.
RSA key fingerprint is d4:58:f1:dc:d7:d4:fd:e0:2a:c3:dd:fd:79:51:2e:91.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.155' (RSA) to the list of known hosts.
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
debug1: Next authentication method: gssapi-keyex
debug1: No valid Key exchange context
debug1: Next authentication method: gssapi-with-mic
debug1: Unspecified GSS failure. Minor code may provide more information
Cannot determine realm for numeric host address

debug1: Unspecified GSS failure. Minor code may provide more information
Cannot determine realm for numeric host address

debug1: Unspecified GSS failure. Minor code may provide more information

debug1: Unspecified GSS failure. Minor code may provide more information
Cannot determine realm for numeric host address

debug1: Next authentication method: publickey #public keys are tried before password
debug1: Trying private key: /root/.ssh/identity
debug1: Trying private key: /root/.ssh/id_rsa
debug1: Trying private key: /root/.ssh/id_dsa
debug1: Trying private key: /root/.ssh/id_ecdsa
debug1: Next authentication method: password #Phase 3: the authentication process

#Phase 4: after successful authentication, a new session is created, environment variables are set, and finally a shell is started.
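Points 1 and 2 above both come down to name resolution: sudo resolves the machine's own hostname, and sshd with UseDNS enabled does a reverse lookup of the connecting client's IP. Before touching any config, it can help to time those lookups on the slow host. The following is a minimal Python 3 sketch, assuming Python is available there; the function names time_lookup and check_name_resolution are hypothetical helpers, not part of the original troubleshooting. In this article's case these lookups were not the root cause, so a fast result here is itself useful information.

```python
import socket
import sys
import time


def time_lookup(func, *args):
    """Run one resolver call; return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        func(*args)
        ok = True
    except OSError:  # socket.gaierror and socket.herror are both OSError subclasses
        ok = False
    return ok, time.monotonic() - start


def check_name_resolution(client_ip):
    """Time the lookups that sudo and sshd (UseDNS) commonly depend on."""
    hostname = socket.gethostname()
    return {
        # forward lookup of this machine's own hostname (what sudo needs)
        "forward " + hostname: time_lookup(socket.getaddrinfo, hostname, None),
        # reverse lookup of the client's IP (what UseDNS yes triggers in sshd)
        "reverse " + client_ip: time_lookup(socket.gethostbyaddr, client_ip),
    }


if __name__ == "__main__":
    ip = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    for label, (ok, secs) in check_name_resolution(ip).items():
        # a lookup taking whole seconds suggests a DNS or /etc/hosts problem
        print(f"{label}: ok={ok} took {secs:.3f}s")
```

Run it on the server with the jump host's IP as the argument, e.g. `python3 dns_check.py <client-ip>` (the script name is just an example).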

So how do we analyze the debug output?

1. Look for errors
For example: debug1: Next authentication method: gssapi-with-mic
debug1: Unspecified GSS failure. Minor code may provide more information

Fix: vi /etc/ssh/ssh_config and set
GSSAPIAuthentication no

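Analysis: GSSAPI (Generic Security Services Application Programming Interface) is a generic API for network security services; Kerberos 5 is the mechanism most commonly used underneath it. It wraps the different client/server security mechanisms behind one interface, hiding their differences and making programming easier. However, it causes trouble when the target machine has no name resolution: the "Cannot determine realm for numeric host address" lines above appear because we connected by IP address. Tracing with strace showed that after verifying the host key, ssh moves on to gssapi-with-mic authentication and first contacts the DNS server before doing anything else. That is why GSSAPI authentication is usually turned off.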

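2. See where the debug output gets stuck
When an SSH connection hangs at some step, all we know is that it hangs, not which step it is stuck on. Debug mode shows exactly where it stops; copy the debug line it is stuck on and search for it. You know the drill...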

For example:

debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY (stays stuck here)
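Fix: in most cases this comes down to the MTU; you can lower it, for example by setting MTU=1200 in the network interface configuration file.
Background: https://www.jianshu.com/p/3181b53053dd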
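Reading the -v output by eye works when the hang is obvious, but with a delay of a few seconds it helps to know exactly which line ssh was waiting on. Below is a minimal Python 3.7+ sketch, assuming it runs on the client: it starts ssh -v, timestamps every line ssh writes to stderr (where the debug output goes), and reports the line followed by the longest pause. The helper names timestamp_stderr and longest_pause are hypothetical; BatchMode=yes is used so that ssh fails instead of waiting at a password prompt, and the remote command `true` just makes the session exit immediately.

```python
import subprocess
import sys
import time


def timestamp_stderr(cmd):
    """Run cmd and return [(seconds_since_start, line), ...] for each stderr line."""
    start = time.monotonic()
    events = []
    proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE, text=True)
    for line in proc.stderr:  # lines arrive as ssh prints them
        events.append((time.monotonic() - start, line.rstrip("\n")))
    proc.wait()
    return events


def longest_pause(events):
    """Return (gap_seconds, line_before_gap) for the biggest gap between consecutive lines."""
    best = (0.0, None)
    for (t0, line), (t1, _) in zip(events, events[1:]):
        if t1 - t0 > best[0]:
            best = (t1 - t0, line)
    return best


if __name__ == "__main__":
    # usage: python3 ssh_stall.py -p <port> <user>@<host-ip>
    cmd = ["ssh", "-v", "-o", "BatchMode=yes"] + sys.argv[1:] + ["true"]
    gap, line = longest_pause(timestamp_stderr(cmd))
    print(f"longest pause {gap:.2f}s after: {line}")
```

If the longest pause follows a gssapi-with-mic line, that points at the GSSAPI issue above; if it follows "expecting SSH2_MSG_KEX_ECDH_REPLY", that points at the MTU case.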

------------------------------------------ I'm the happy divider ------------------------------------------

However! None of these turned out to be the problem.

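Checking with systemctl status as well turned up an error: Failed to abandon session scope: Transport endpoint is not connected. Searching for it led to a discussion on GitHub explaining that systemd-logind (the login manager that sessions are registered with over D-Bus) had died, which indirectly made both ssh and sudo very slow. Having narrowed it down to systemd-logind, running systemctl status systemd-logind showed that it had failed to start because of an error. Killing all of its processes, starting it again, and reconnecting over ssh solved the problem.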
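To catch this kind of failure without reading through systemctl status by hand, a small Python 3.7+ sketch can ask systemd directly for systemd-logind's state. It assumes a systemd host and relies on `systemctl show <unit> --property=ActiveState,SubState`, which prints one KEY=value line per property; the helper names parse_systemctl_show and logind_state are hypothetical. It only reports the state; restarting the service is still done by hand as described above.

```python
import subprocess


def parse_systemctl_show(output):
    """Turn `systemctl show` KEY=value lines into a dict (values may contain '=')."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def logind_state():
    """Return (ActiveState, SubState) of systemd-logind, e.g. ('active', 'running')."""
    out = subprocess.run(
        ["systemctl", "show", "systemd-logind", "--property=ActiveState,SubState"],
        capture_output=True, text=True, check=True,
    ).stdout
    props = parse_systemctl_show(out)
    return props.get("ActiveState"), props.get("SubState")


if __name__ == "__main__":
    active, sub = logind_state()
    if active != "active":
        # a dead systemd-logind is what slowed down ssh and sudo in the case above
        print(f"systemd-logind is {active}/{sub}; check `systemctl status systemd-logind`")
    else:
        print(f"systemd-logind looks healthy: {active}/{sub}")
```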

We can also capture packets with Wireshark during the SSH connection to pinpoint problems;
I will cover that in later articles.

                                                          by__________ 阿威


Reposted from blog.51cto.com/13173364/2174770