Ceph daily operation and maintenance management
Cluster monitoring and management
Overall cluster status
[root@cephnode01 ~]# ceph -s
cluster:
id: 8230a918-a0de-4784-9ab8-cd2a2b8671d0
health: HEALTH_WARN
application not enabled on 1 pool(s)
services:
mon: 3 daemons, quorum cephnode01,cephnode02,cephnode03 (age 27h)
mgr: cephnode01(active, since 53m), standbys: cephnode03, cephnode02
osd: 4 osds: 4 up (since 27h), 4 in (since 19h)
rgw: 1 daemon active (cephnode01)
data:
pools: 6 pools, 96 pgs
objects: 235 objects, 3.6 KiB
usage: 4.0 GiB used, 56 GiB / 60 GiB avail
pgs: 96 active+clean
id: the cluster ID.
health: overall cluster health. The HEALTH_WARN here signals a problem — in this case, an application has not been enabled on one pool.
mon: Monitor status.
osd: OSD status.
mgr: Manager status.
mds: MDS status.
pools: number of pools and PGs.
objects: number of stored objects.
usage: nominal storage usage.
pgs: PG states.
~]$ ceph -w
~]$ ceph health detail
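For scripting, the health token can be pulled out of a captured `ceph -s` or `ceph health detail` report with standard tools. A minimal sketch — the `health_of` helper name and the sample text are our own, not part of Ceph:

```shell
# health_of: hypothetical helper that extracts the HEALTH_* token
# from `ceph -s`-style output fed on stdin.
health_of() {
  awk '/health:/ {print $2}'
}

sample='    health: HEALTH_WARN'
echo "$sample" | health_of    # prints HEALTH_WARN
```

On a live cluster, `ceph health` alone prints the same token directly.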
Common health states
Cluster health states: HEALTH_OK, HEALTH_WARN, HEALTH_ERR
[root@ceph2 ~]# ceph health detail
HEALTH_OK
[root@ceph2 ~]# ceph -s
cluster:
id: 35a91e48-8244-4e96-a7ee-980ab989d20d
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph2,ceph3,ceph4
mgr: ceph4(active), standbys: ceph2, ceph3
mds: cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
osd: 9 osds: 9 up, 9 in; 32 remapped pgs
rbd-mirror: 1 daemon active
data:
pools: 14 pools, 536 pgs
objects: 220 objects, 240 MB
usage: 1764 MB used, 133 GB / 134 GB avail
pgs: 508 active+clean
28 active+clean+remapped
ceph -w shows the same summary, but stays attached and streams cluster status updates as they occur.
Cluster flags
noup: when an OSD starts, it normally marks itself up on the MON; with this flag set, OSDs are not automatically marked up.
nodown: when an OSD stops, the MON normally marks it down; with this flag set, stopped OSDs are not marked down. Setting noup and nodown together guards against network flapping.
noout: with this flag set, the MON will not mark any OSD out of the CRUSH map. Set it during OSD maintenance to keep CRUSH from rebalancing data while OSDs are stopped, and clear it once the OSDs restart.
noin: with this flag set, data is not automatically assigned to newly started OSDs (they are not marked in).
norecover: disables all cluster recovery operations; can be set during maintenance and downtime.
nobackfill: disables data backfill.
noscrub: disables scrubbing. Scrubbing a PG briefly impacts OSD operations, and on a low-bandwidth cluster a slow OSD may be marked down during a scrub; this flag prevents that.
nodeep-scrub: disables deep scrubbing.
norebalance: disables data rebalancing; useful during cluster maintenance or downtime.
pause: the cluster stops serving reads and writes, but OSD self-checks are unaffected.
full: marks the cluster full; all writes are rejected, but reads still succeed.
Operating on cluster flags
Flags apply to the whole cluster; they cannot be set for a single OSD.
Set the noout flag:
[root@ceph2 ~]# ceph osd set noout
noout is set
[root@ceph2 ~]# ceph -s
cluster:
id: 35a91e48-8244-4e96-a7ee-980ab989d20d
health: HEALTH_WARN
noout flag(s) set
services:
mon: 3 daemons, quorum ceph2,ceph3,ceph4
mgr: ceph4(active), standbys: ceph2, ceph3
mds: cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
osd: 9 osds: 9 up, 9 in; 32 remapped pgs
flags noout
rbd-mirror: 1 daemon active
data:
pools: 14 pools, 536 pgs
objects: 220 objects, 240 MB
usage: 1764 MB used, 133 GB / 134 GB avail
pgs: 508 active+clean
28 active+clean+remapped
io:
client: 409 B/s rd, 0 op/s rd, 0 op/s wr
Clear the noout flag:
[root@ceph2 ~]# ceph osd unset noout
noout is unset
[root@ceph2 ~]# ceph -s
cluster:
id: 35a91e48-8244-4e96-a7ee-980ab989d20d
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph2,ceph3,ceph4
mgr: ceph4(active), standbys: ceph2, ceph3
mds: cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
osd: 9 osds: 9 up, 9 in; 32 remapped pgs
rbd-mirror: 1 daemon active
data:
pools: 14 pools, 536 pgs
objects: 220 objects, 240 MB
usage: 1764 MB used, 133 GB / 134 GB avail
pgs: 508 active+clean
28 active+clean+remapped
io:
client: 2558 B/s rd, 0 B/s wr, 2 op/s rd, 0 op/s wr
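The set/unset pair above is often wrapped in a small script that brackets a maintenance window. A dry-run sketch — it only echoes the commands so the plan can be reviewed first; the function name and the choice of flags (noout plus norebalance) are our own:

```shell
# Dry-run maintenance wrapper: prints the flag commands that would
# bracket an OSD maintenance window. Swap `echo` for direct execution
# on a live cluster.
maintenance_plan() {
  local f
  for f in noout norebalance; do echo "ceph osd set $f"; done
  echo "# ... perform maintenance here ..."
  for f in noout norebalance; do echo "ceph osd unset $f"; done
}
maintenance_plan
```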
Set the full flag:
[root@ceph2 ~]# ceph osd set full
full is set
[root@ceph2 ~]# ceph -s
cluster:
id: 35a91e48-8244-4e96-a7ee-980ab989d20d
health: HEALTH_WARN
full flag(s) set
services:
mon: 3 daemons, quorum ceph2,ceph3,ceph4
mgr: ceph4(active), standbys: ceph2, ceph3
mds: cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
osd: 9 osds: 9 up, 9 in; 32 remapped pgs
flags full
rbd-mirror: 1 daemon active
data:
pools: 14 pools, 536 pgs
objects: 220 objects, 240 MB
usage: 1768 MB used, 133 GB / 134 GB avail
pgs: 508 active+clean
28 active+clean+remapped
io:
client: 2558 B/s rd, 0 B/s wr, 2 op/s rd, 0 op/s wr
Try writing a file into a pool as an object with rados put:
[root@ceph2 ~]# rados -p ssdpool put testfull /etc/ceph/ceph.conf
2019-03-27 21:59:14.250208 7f6500913e40 0 client.65175.objecter FULL, paused modify 0x55d690a412b0 tid 0
Clear the full flag:
[root@ceph2 ~]# ceph osd unset full
full is unset
[root@ceph2 ~]# ceph -s
cluster:
id: 35a91e48-8244-4e96-a7ee-980ab989d20d
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph2,ceph3,ceph4
mgr: ceph4(active), standbys: ceph2, ceph3
mds: cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
osd: 9 osds: 9 up, 9 in; 32 remapped pgs
rbd-mirror: 1 daemon active
data:
pools: 14 pools, 536 pgs
objects: 220 objects, 240 MB
usage: 1765 MB used, 133 GB / 134 GB avail
pgs: 508 active+clean
28 active+clean+remapped
io:
client: 409 B/s rd, 0 op/s rd, 0 op/s wr
[root@ceph2 ~]# rados -p ssdpool put testfull /etc/ceph/ceph.conf
[root@ceph2 ~]# rados -p ssdpool ls
testfull
test
PG status
To view PG status, use the following two commands; the dump variant shows much more detail:
~]$ ceph pg dump
~]$ ceph pg stat
PG states
Creating: the PG is being created; typically seen when a pool is created or a pool's PG count is changed.
Active: the PG is active and can serve reads and writes normally.
Clean: every object in the PG has the required number of replicas.
Down: the PG is offline.
Replay: after an OSD failure, the PG is waiting for clients to replay their operations.
Splitting: the PG is being split, usually after a pool's PG count has been increased; existing PGs are split and some of their objects move to the new PGs.
Scrubbing: the PG is being checked for inconsistencies.
Degraded: some objects in the PG do not yet have the required number of replicas.
Inconsistent: the PG's replicas are inconsistent; ceph pg repair can be used to fix the inconsistency.
Peering: the process, driven by the primary OSD, of bringing all OSDs that hold replicas of the PG into agreement on the state of every object and its metadata. Only after peering completes does the primary OSD accept client writes.
Repair: the PG is being checked, and any inconsistencies found will be repaired.
Recovering: the PG is migrating or synchronizing objects and their replicas, typically the rebalancing that follows an OSD going down.
Backfill: when a new OSD joins the cluster, CRUSH assigns it part of the cluster's existing PGs; copying data to it is called backfill.
Backfill-wait: the PG is waiting for backfill to begin.
Incomplete: the PG log is missing data for a critical interval; this happens when an OSD holding required PG information is unavailable.
Stale: the PG is in an unknown state; the monitors have not received an update for it since the PG map changed. Seen at cluster startup, before peering finishes.
Remapped: when a PG's acting set changes, data migrates from the old acting set to the new one. The new primary OSD needs time before it can serve requests, so the old primary keeps serving until the migration completes; during this period the PG shows as remapped.
Managing stuck PGs
If a PG stays in one of the following states longer than mon_pg_stuck_threshold (default 300 s), the MON marks it as stuck:
inactive: the PG has a peering problem
unclean: the PG hit a problem during failure recovery
stale: no OSD is reporting for the PG; likely all of its OSDs are down and out
undersized: the PG does not have enough OSDs to hold its required number of replicas
By default Ceph recovers automatically, but if automatic recovery fails the cluster stays in HEALTH_WARN or HEALTH_ERR.
If every OSD of a PG is down and out, the PG is marked stale. To resolve this, one of those OSDs must come back with a usable copy of the PG; otherwise the PG remains unavailable.
Ceph can declare an OSD or a PG lost, which amounts to accepting data loss.
Note that an OSD cannot run without its journal; if the journal is lost, the OSD stops.
Operating on stuck PGs
List the PGs stuck in a state:
[root@ceph2 ceph]# ceph pg dump_stuck
ok
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
17.5 stale+peering [0,2] 0 [0,2] 0
17.4 stale+peering [2,0] 2 [2,0] 2
17.3 stale+peering [2,0] 2 [2,0] 2
17.2 stale+peering [2,0] 2 [2,0] 2
17.1 stale+peering [0,2] 0 [0,2] 0
17.0 stale+peering [2,0] 2 [2,0] 2
17.1f stale+peering [2,0] 2 [2,0] 2
17.1e stale+peering [0,2] 0 [0,2] 0
17.1d stale+peering [2,0] 2 [2,0] 2
17.1c stale+peering [0,2] 0 [0,2] 0
[root@ceph2 ceph]# ceph osd blocked-by
osd num_blocked
0 19
2 13
List the OSDs blocking PGs from peering:
ceph osd blocked-by
Check the status of a specific PG:
ceph pg dump | grep <pgid>
Declare a PG lost:
ceph pg <pgid> mark_unfound_lost revert|delete
Declare an OSD lost (the OSD must be down and out):
ceph osd lost <osdid> --yes-i-really-mean-it
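Because both loss declarations are destructive, it can help to build and inspect the exact command before running it. A hedged dry-run sketch — the helper names are our own; the command shapes follow the two lines above:

```shell
# Print (rather than run) the loss-declaration commands, so the
# pgid / osd id can be double-checked before execution.
mark_unfound_cmd() {
  local pgid="$1" action="$2"   # action: revert or delete
  echo "ceph pg ${pgid} mark_unfound_lost ${action}"
}
lost_osd_cmd() {
  echo "ceph osd lost $1 --yes-i-really-mean-it"
}

mark_unfound_cmd 17.5 revert
lost_osd_cmd 2
```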
Pool status
~]$ ceph osd pool stats
~]$ ceph osd lspools
Restricting pool configuration changes
Main options
Prevent pools from being deleted:
osd_pool_default_flag_nodelete
Prevent a pool's pg_num and pgp_num from being modified:
osd_pool_default_flag_nopgchange
Prevent a pool's size and min_size from being modified:
osd_pool_default_flag_nosizechange
Trying it out
[root@ceph2 ~]# ceph daemon osd.0 config show|grep osd_pool_default_flag
"osd_pool_default_flag_hashpspool": "true",
"osd_pool_default_flag_nodelete": "false",
"osd_pool_default_flag_nopgchange": "false",
"osd_pool_default_flag_nosizechange": "false",
"osd_pool_default_flags": "0",
[root@ceph2 ~]# ceph tell osd.* injectargs --osd_pool_default_flag_nodelete true
[root@ceph2 ~]# ceph daemon osd.0 config show|grep osd_pool_default_flag
"osd_pool_default_flag_hashpspool": "true",
"osd_pool_default_flag_nodelete": "true",
"osd_pool_default_flag_nopgchange": "false",
"osd_pool_default_flag_nosizechange": "false",
"osd_pool_default_flags": "0",
[root@ceph2 ~]# ceph osd pool delete ssdpool ssdpool yes-i-really-really-mean-it
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool ssdpool. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it. # deletion refused
Set it back to false:
[root@ceph2 ~]# ceph tell osd.* injectargs --osd_pool_default_flag_nodelete false
[root@ceph2 ~]# ceph daemon osd.0 config show|grep osd_pool_default_flag
"osd_pool_default_flag_hashpspool": "true",
"osd_pool_default_flag_nodelete": "true", #依然显示为ture
"osd_pool_default_flag_nopgchange": "false",
"osd_pool_default_flag_nosizechange": "false",
"osd_pool_default_flags": "0"
Modifying via the configuration file
On ceph1, set in ceph.conf:
osd_pool_default_flag_nodelete false
[root@ceph1 ~]# ansible all -m copy -a 'src=/etc/ceph/ceph.conf dest=/etc/ceph/ceph.conf owner=ceph group=ceph mode=0644'
[root@ceph1 ~]# ansible mons -m shell -a ' systemctl restart ceph-mon.target'
[root@ceph1 ~]# ansible mons -m shell -a ' systemctl restart ceph-osd.target'
[root@ceph2 ~]# ceph daemon osd.0 config show|grep osd_pool_default_flag
"osd_pool_default_flag_hashpspool": "true",
"osd_pool_default_flag_nodelete": "false",
"osd_pool_default_flag_nopgchange": "false",
"osd_pool_default_flag_nosizechange": "false",
"osd_pool_default_flags": "0",
Delete ssdpool
[root@ceph2 ~]# ceph osd pool delete ssdpool ssdpool --yes-i-really-really-mean-it
Deleted successfully.
OSD status
~]$ ceph osd stat
~]$ ceph osd status
~]$ ceph osd dump
~]$ ceph osd tree
~]$ ceph osd df
Monitor status and quorum
~]$ ceph mon stat
~]$ ceph mon dump
~]$ ceph quorum_status
Cluster space usage
~]$ ceph df
~]$ ceph df detail
Cluster configuration management (temporary and cluster-wide)
Sometimes you need to change a daemon's configuration without restarting the service, or only temporarily. The tell and daemon subcommands cover this.
1. View the running configuration
Command format:
# ceph daemon {daemon-type}.{id} config show
Example:
# ceph daemon osd.0 config show
2. The tell subcommand
tell suits cluster-wide changes: an asterisk matches every daemon of a role, so one command can configure the whole cluster. If an abnormal node fails to apply the setting, the error only appears in the command-line output and is easy to miss.
Command format:
# ceph tell {daemon-type}.{daemon id or *} injectargs --{name}={value} [--{name}={value}]
Example:
# ceph tell osd.0 injectargs --debug-osd 20 --debug-ms 1
- daemon-type: the type of daemon to operate on, such as osd, mon, or mds.
- daemon id: the daemon's name — typically 0, 1, … for OSDs, or the name shown by ceph -s for MONs; * means every daemon of that type.
- injectargs: injects the parameters; at least one must follow, and several can be given at once.
3. The daemon subcommand
daemon applies settings one daemon at a time, which gives better feedback; it must be run on the host where the target daemon lives.
Command format:
# ceph daemon {daemon-type}.{id} config set {name}={value}
Example:
# ceph daemon mon.ceph-monitor-1 config set mon_allow_pool_delete false
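Since daemon only talks to one daemon at a time, applying a value to several daemons means looping. A dry-run sketch that prints one `config set` line per OSD id — the helper name and the id list are our own:

```shell
# Print a `ceph daemon ... config set` command for each OSD id given,
# as would be run on the host(s) where those daemons live.
set_osds() {
  local name="$1" value="$2" id
  shift 2
  for id in "$@"; do
    echo "ceph daemon osd.${id} config set ${name} ${value}"
  done
}
set_osds debug_osd 20 0 1 2
```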
Cluster service operations
Commands include start, restart, and status.
1. Start all daemons
# systemctl start ceph.target
2. Start daemons by type
# systemctl start ceph-mgr.target
# systemctl start ceph-osd@id
# systemctl start ceph-mon.target
# systemctl start ceph-mds.target
# systemctl start ceph-radosgw.target
Adding and deleting OSDs
Adding an OSD
1. Zap (format) the disk
ceph-volume lvm zap /dev/sd<id>
2. From the ceph-deploy working directory, /my-cluster, create the OSD
ceph-deploy osd create --data /dev/sd<id> $hostname
Deleting an OSD
1. Set the OSD's CRUSH weight to 0
ceph osd crush reweight osd.<ID> 0.0
2. Stop the OSD process
systemctl stop ceph-osd@<ID>
3. Mark the OSD out
ceph osd out <ID>
4. Purge the OSD and its data immediately
ceph osd purge osd.<ID> --yes-i-really-mean-it
5. Unmount the disk
umount /var/lib/ceph/osd/ceph-?
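The removal steps above can be captured in a dry-run plan that prints each command for review before anything destructive runs. The function name is our own; the commands mirror the five steps:

```shell
# Print the OSD removal sequence for one id, in the order given above.
remove_osd_plan() {
  local id="$1"
  echo "ceph osd crush reweight osd.${id} 0.0"
  echo "systemctl stop ceph-osd@${id}"
  echo "ceph osd out ${id}"
  echo "ceph osd purge osd.${id} --yes-i-really-mean-it"
  echo "umount /var/lib/ceph/osd/ceph-${id}"
}
remove_osd_plan 3
```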
PG expansion
ceph osd pool set {pool-name} pg_num 128
ceph osd pool set {pool-name} pgp_num 128
Notes:
1. When expanding, choose a size close to (ideally equal to) a power of two.
2. When changing a pool's PG count, change its PGP count at the same time. PGP exists to manage PG placement and should match the PG count: if you increase pg_num, you must also increase pgp_num to the same value before the cluster can rebalance properly.
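Note 1's "nearest power of two" can be computed mechanically; a small sketch (the helper name is our own):

```shell
# Round a desired PG count up to the next power of two, per note 1.
next_pow2() {
  local n="$1" p=1
  while [ "$p" -lt "$n" ]; do p=$((p * 2)); done
  echo "$p"
}
next_pow2 100   # prints 128
```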
Pool operations
List pools
ceph osd lspools
Create a pool
Command format:
# ceph osd pool create {pool-name} {pg-num} [{pgp-num}]
Example:
# ceph osd pool create rbd 32 32
Set pool quotas
Command format:
# ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]
Example:
# ceph osd pool set-quota rbd max_objects 10000
Delete a pool
ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
Rename a pool
ceph osd pool rename {current-pool-name} {new-pool-name}
View pool statistics
rados df
Create a pool snapshot
ceph osd pool mksnap {pool-name} {snap-name}
Delete a pool snapshot
ceph osd pool rmsnap {pool-name} {snap-name}
Get a pool option value
ceph osd pool get {pool-name} {key}
Set a pool option value
ceph osd pool set {pool-name} {key} {value}
size: number of object replicas in the pool (see setting the object replica count); replicated pools only.
min_size: minimum number of replicas required for I/O (see setting the object replica count); replicated pools only.
pg_num: effective number of PGs used when computing data placement; can only be raised above the current PG count.
pgp_num: effective number of PGPs used when computing data placement; must be less than or equal to the pool's PG count.
hashpspool: set or unset the HASHPSPOOL flag on the pool.
target_max_bytes: Ceph starts flushing or evicting objects when the max_bytes threshold is reached.
target_max_objects: Ceph starts flushing or evicting objects when the max_objects threshold is reached.
scrub_min_interval: minimum interval in seconds between scrubs while load is low; 0 means use osd_scrub_min_interval from the configuration file.
scrub_max_interval: maximum interval in seconds between scrubs, regardless of cluster load; 0 means use osd_scrub_max_interval from the configuration file.
deep_scrub_interval: interval in seconds between deep scrubs; 0 means use osd_deep_scrub_interval from the configuration file.
Get the number of object replicas
ceph osd dump | grep 'replicated size'
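To get just the number rather than the whole line, the replica count can be extracted from a captured dump line. A sketch against a sample line — the sample text and helper name are our own, and real `ceph osd dump` pool lines carry more fields:

```shell
# Extract the replica count from a `ceph osd dump`-style pool line.
replica_size() {
  sed -n 's/.*replicated size \([0-9][0-9]*\).*/\1/p'
}
echo "pool 1 'rbd' replicated size 3 min_size 2 crush_rule 0" | replica_size   # prints 3
```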
User Management
Ceph stores data as objects within pools. A Ceph user must have access rights to a pool to read or write its data, and must have administrative capabilities to run Ceph management commands.
View user information
List all users:
# ceph auth list
Get a user's key and capability details:
# ceph auth get client.admin
If you only need a user's key, use the print-key subcommand:
# ceph auth print-key client.admin
Add user
# ceph auth add client.john mon 'allow r' osd 'allow rw pool=liverpool'
# ceph auth get-or-create client.paul mon 'allow r' osd 'allow rw pool=liverpool'
# ceph auth get-or-create client.george mon 'allow r' osd 'allow rw pool=liverpool' -o george.keyring
# ceph auth get-or-create-key client.ringo mon 'allow r' osd 'allow rw pool=liverpool' -o ringo.key
Modify user capabilities
# ceph auth caps client.john mon 'allow r' osd 'allow rw pool=liverpool'
# ceph auth caps client.paul mon 'allow rw' osd 'allow rwx pool=liverpool'
# ceph auth caps client.brian-manager mon 'allow *' osd 'allow *'
# ceph auth caps client.ringo mon ' ' osd ' '
Delete a user
# ceph auth del {TYPE}.{ID}
Here {TYPE} is one of client, osd, mon, or mds, and {ID} is the user name or the daemon's ID.
Adding and deleting monitors
A cluster can run with a single monitor, but at least three are recommended for production. Ceph uses a variant of the Paxos algorithm to reach consensus on the various maps and other information vital to the cluster. An odd number of monitors is recommended, though not mandatory. A majority of the monitors must be running and able to communicate with each other — 1 of 1, 2 of 2, 2 of 3, 3 of 4, and so on. For an initial deployment, three monitors are recommended; when scaling up later, add them two at a time.
Add a monitor
# ceph-deploy mon create $hostname
Note: run ceph-deploy from the directory configured during installation, /my-cluster.
Delete a monitor
# ceph-deploy mon destroy $hostname
Note: make sure the remaining monitors can still reach quorum after one is removed. If they cannot, add a monitor before deleting this one.
Ceph Troubleshooting
nearfull osd(s) or pool(s) nearfull
This means some OSDs' usage has crossed a threshold; the MONs monitor the space usage of the cluster's OSDs. Raising the two thresholds below will silence the WARN, but in practice that does not solve the underlying problem; instead, analyze the data distribution across the OSDs to find the cause.
Threshold settings in the configuration:
"mon_osd_full_ratio": "0.95",
"mon_osd_nearfull_ratio": "0.85"
Automatic handling:
ceph osd reweight-by-utilization
ceph osd reweight-by-pg 105 cephfs_data   # 105 = overload threshold (%), cephfs_data = pool name
Manual handling:
ceph osd reweight osd.2 0.8
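The thresholds above are ratios of used to total space; a small check that classifies a usage figure the same way (pure-arithmetic sketch — on a live cluster the numbers would come from `ceph osd df`):

```shell
# Classify used/total against the default nearfull (0.85) and
# full (0.95) ratios.
usage_state() {
  awk -v used="$1" -v total="$2" 'BEGIN {
    r = used / total
    if (r >= 0.95)      print "full"
    else if (r >= 0.85) print "nearfull"
    else                print "ok"
  }'
}
usage_state 90 100    # prints nearfull
```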
Global handling (balancer module):
ceph mgr module ls
ceph mgr module enable balancer
ceph balancer on
ceph balancer mode crush-compat
ceph config-key set mgr/balancer/max_misplaced 0.01
PG fault states
PG states overview
A PG may be in the following states at different points in its life cycle:
Creating
When a pool is created, you specify its number of PGs; while Ceph creates them, their status is creating.
Peering
Peering establishes agreement among the OSDs that hold a PG's replicas on the state of every object in the PG and its metadata.
Active
Once Ceph has completed peering, the PG becomes active, meaning the data in the primary PG and its replicas is intact and available for I/O.
Clean
A clean PG means the primary and replica OSDs have peered successfully and no PG has strayed from its intended placement; Ceph has also replicated every object in the PG the required number of times.
Degraded
When a PG has fewer replicas than required, it enters the degraded state. For example:
While a client writes an object to the primary OSD, the primary is responsible for writing the replica copies. Until the replica OSDs report the copies as created, the PG stays degraded. Likewise, if an OSD goes down, every PG on it is marked degraded.
When Ceph cannot find one or more objects that should be in a PG, the PG is also marked degraded; the missing objects cannot be read or written, but clients can still access the other objects in the PG.
Recovering
When an OSD goes down, the PGs inside it fall behind their replicas. After the OSD comes back up, its contents must be brought up to date; while this happens, the PG's state is recovering.
Backfilling
When a new OSD joins the cluster, CRUSH reassigns some of the cluster's existing PGs to it; while data is copied to the new OSD, those PGs are backfilling.
Remapped
When the acting set responsible for a PG changes, the PG's data must migrate from the old acting set to the new one. This takes time, and during it the PG is marked remapped.
Stale
By default, an OSD daemon reports its PG and related state to the monitors every half second. If the primary OSD of a PG's acting set fails to report, or other monitors have reported that OSD down, the PG is marked stale.
OSD status
Each OSD has two independent state pairs: in/out marks whether the OSD is in the cluster, and up/down marks whether its daemon is running. The two pairs are not mutually exclusive — an OSD that is in may be either up or down.
OSD is in and up
The normal state: the OSD is in the cluster and functioning properly.
OSD is in and down
The OSD is still in the cluster, but its daemon is unhealthy. By default, after 300 seconds it is kicked out of the cluster and moves to the out and down state, and the PGs on it migrate to other OSDs.
OSD is out and up
Usually seen when a new OSD is added: the daemon runs normally, but the OSD has not yet joined the cluster.
OSD is out and down
The OSD is not in the cluster and its daemon is not running; CRUSH does not assign PGs to it.
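The four combinations can be condensed into a tiny lookup, handy in health-check scripts; the wording of each summary is our own paraphrase of the states above:

```shell
# Map an (in/out, up/down) pair to the summary described above.
osd_state_summary() {
  case "$1,$2" in
    in,up)    echo "normal: in the cluster and running" ;;
    in,down)  echo "in the cluster but daemon unhealthy; marked out after ~300s" ;;
    out,up)   echo "daemon running but not yet in the cluster (new OSD)" ;;
    out,down) echo "not in the cluster and daemon not running" ;;
    *)        echo "unknown" ;;
  esac
}
osd_state_summary in down
```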
Ceph references
Reference 1: https://my.oschina.net/diluga/blog/1501203#h4_6 (precautions and guidance for maintaining Ceph in production; no commands, mostly architectural concepts)
Reference 2: http://www.zphj1987.com/tag/server/ (a Ceph engineer's blog)
Reference 3: https://www.cnblogs.com/hukey/p/11899710.html