ceph

1 Overview

Ceph is an open-source, highly scalable distributed storage system designed to provide high-performance, highly reliable storage with no single point of failure.

1.1 The three storage types a Ceph cluster provides

  • Object storage (RADOS Gateway, compatible with the S3/Swift APIs)

  • Block storage (RBD, RADOS Block Device, suited to VMs/containers)

  • File system (CephFS, a POSIX-compatible distributed file system).

Ceph is an object-based storage system: every data stream to be managed (a file, for example) is split into one or more fixed-size objects, and those objects are the atomic unit for all reads and writes.

    The underlying storage service for these objects is a cluster of hosts known as the RADOS cluster, the Reliable, Autonomic, Distributed Object Store.

    Using the cluster directly through the librados API has a fairly steep learning curve. The Ceph project is well aware of this, so it also provides three higher-level abstractions on top of it: RADOSGW, RBD and CephFS.

    RadosGw, RBD and CephFS are all clients of the RADOS storage service. Each of them wraps the RADOS interface (librados) from a different angle, so each suits a different use case:
        RadosGw:
            An Internet-facing object storage service exposed through a RESTful API. Every file is stored as an object, and those files can be of arbitrary size.
            These are not the same as Ceph's internal objects (fixed-size chunks that exist only inside the cluster and are shuffled around by Ceph's internal wire protocol).
            Note that RadosGw depends on librados and is accessed over HTTP/HTTPS.
            
        RBD:
            Presents storage space from the Ceph cluster as individual block devices. Each one behaves like a traditional (hard) disk: you can format it, partition it and mount it.
            Note that RBD also depends on librados, and kernel-mode access requires the Linux rbd kernel module.

        CephFS:
            This is, unsurprisingly, Ceph's file system. It can be used much like NFS, but its performance ceiling is far harder to hit than NFS's.
            Note that CephFS does not sit on top of librados; they are separate components. CephFS also sees less adoption than RadosGw and RBD.
            
	Recommended reading:
        Official Ceph documentation:
               https://docs.ceph.com/en/latest/
        Chinese community documentation:
               http://docs.ceph.org.cn

    Tips:
        (1) CRUSH is the algorithm Ceph uses internally to route object data to its storage location. It has no central node, i.e. no central metadata server.
        (2) Whichever client you use to store data - librados, RadosGw, RBD or CephFS - the data ultimately lands in the RADOS cluster. Note that one or more storage pools (pool) sit between these clients and the RADOS cluster, and each client type gets its own pool(s), as the sketch below illustrates.
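    Ceph records which client type a pool serves by tagging the pool with an "application". A minimal sketch of inspecting and setting that tag (the pool name is just a placeholder):

        ceph osd pool application get <pool_name>          # shows whether the pool is tagged rbd, rgw or cephfs
        ceph osd pool application enable <pool_name> rbd   # tag a pool by hand ("rbd pool init" does this for RBD pools)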

1.2 How data flows into a Ceph cluster

Storing data in a Ceph cluster roughly works like this:
	(1) The RADOS cluster's objects have a fixed size that a large file will not fit into, so a large file is split into multiple data objects before it is stored;
	(2) when a data object is written to a pool, its name is first hashed to map the write onto one of that pool's PGs;
	(3) CRUSH (the algorithm that performs object placement) then selects enough OSDs for that PG according to the pool's replica count and type; within a PG the chosen OSDs play primary and replica roles, and the replica count is usually set to 3 (the sketch after this list shows how to query this mapping for a given object name).
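	You can ask the cluster how a given object name would be placed without writing anything. A minimal sketch (pool and object names are placeholders):

		ceph osd map <pool_name> <object_name>   # prints the PG the name hashes to and the up/acting OSD set CRUSH picked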



1.3 Ceph architecture


From the bottom up, the Ceph system can be divided into four layers.

RADOS, the base storage layer (the Reliable, Autonomic, Distributed Object Store)

RADOS is Ceph's lowest-level module: an effectively unbounded object store that breaks files into many objects (fragments) and spreads them across disks, greatly improving data durability. It consists mainly of two components, OSD and Monitor, both of which can be deployed across many servers; this is where Ceph's distribution and scalability come from.

LIBRADOS, the client library

Librados is the way applications interact with RADOS; it exposes Ceph's services to upper layers as an API, so RBD, RGW and CephFS above it all go through Librados. Bindings are currently available for PHP, Ruby, Java, Python, Go, C and C++, allowing client applications to be developed directly against RADOS rather than against the whole Ceph stack.
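The rados command-line tool is a thin wrapper around librados and is a handy way to poke at the RADOS layer directly, below RBD/RGW/CephFS. A minimal sketch, assuming a pool named testpool already exists:

	rados -p testpool put obj01 /etc/hosts    # store a file as a RADOS object
	rados -p testpool ls                      # list the objects in the pool
	rados -p testpool get obj01 /tmp/obj01    # read the object back out
	rados -p testpool rm obj01                # remove it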

High-level interfaces, made up of three parts

Object storage interface: RGW (RADOS Gateway)

A gateway interface: an object storage system built on Librados that exposes S3- and Swift-compatible RESTful APIs.

Block storage interface: RBD (RADOS Block Device)

Provides a block-device interface on top of Librados, used mainly by hosts/VMs.

File storage interface: CephFS (Ceph File System)

The Ceph file system: a POSIX-compliant file system that stores user data in the Ceph storage cluster, exposed as a distributed file-system interface built on Librados.

Application layer

The various applications built on the high-level interfaces or directly on the Librados library, plus clients such as hosts and VMs.

1.4 Ceph core components

Ceph is an object-based storage system: each data stream to be managed (a file, for example) is split into one or more fixed-size objects (4 MiB by default), and those objects are the atomic unit for data reads and writes.

OSD daemon

<font color="#9bbb59">是负责物理存储的进程,一般配置成和磁盘一一对应,一块磁盘启动一个OSD进程。主要功能是</font>

<font color="#9bbb59">存储数据、复制数据、平衡数据、恢复数据,以及与其它OSD间进行心跳检查,负责响应客户端</font>

<font color="#9bbb59">请求返回具体数据的进程等。通常至少需要3个OSD来实现冗余和高可用性。</font>

PG (placement group)

<font color="#9bbb59">PG 是一个虚拟的概念而已,物理上不真实存在。它在数据寻址时类似于数据库中的索引:Ceph</font>

<font color="#9bbb59">先将每个对象数据通过HASH算法固定映射到一个 PG 中,然后将 PG 通过 CRUSH 算法映射到</font>

<font color="#9bbb59">OSD</font>

Pool

<font color="#9bbb59">Pool 是存储对象的逻辑分区,它起到 namespace 的作用。每个 Pool 包含一定数量(可配置)的</font>

<font color="#9bbb59">PG。Pool 可以做故障隔离域,根据不同的用户场景统一进行隔离。</font>

Pools support two ways of protecting stored data

  • <font color="#9bbb59">多副本(replicated):类似 raid1,一个对象数据默认保存 3 个副本,放在不同的 OSD</font>
  • <font color="#9bbb59"> 纠删码(Erasure Code):类似 raid5,对 CPU 消耗稍大,但是节约磁盘空间,对象数据保存只有 1 个副本。由于Ceph部分功能不支持纠删码池,此类型存储池使用不多</font>

How Pool, PG and OSD relate

  • A Pool contains many PGs
  • A PG holds a bunch of objects, and each object belongs to exactly one PG
  • A PG has a primary and replicas, and its copies are spread across different OSDs (for replicated pools); the sketch below shows how to inspect this
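To see this mapping on a live cluster, a minimal sketch (pool name and PG id are placeholders):

	ceph pg ls-by-pool <pool_name>   # every PG in the pool, with its acting OSD set
	ceph pg map <pg_id>              # e.g. 2.1f: shows which OSDs that one PG currently maps to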

Monitor (daemon: ceph-mon)

Stores metadata about the OSDs. It maintains the maps that describe cluster state (the Cluster Map: OSD Map, Monitor Map, PG Map and CRUSH Map), keeps the various views of cluster health, and manages client authentication and authorization. A Ceph cluster normally needs 3 or 5 (an odd number of) Monitor nodes for redundancy and high availability; the monitors keep each other in sync via the Paxos protocol.

Manager (daemon: ceph-mgr)

Responsible for tracking runtime metrics and the current state of the Ceph cluster, including storage utilization, performance counters and system load, and for exposing additional monitoring and management interfaces to external systems such as zabbix, prometheus and cephmetrics. A Ceph cluster normally runs at least 2 mgr daemons for high availability: one is active and the others are standbys, with failover coordinated through the monitors.

MDS (Metadata Server, daemon: ceph-mds)

The metadata service that CephFS depends on. It stores file-system metadata and manages the directory structure. Object storage and block storage do not need a metadata service, so if you do not use CephFS you do not need to install it.

		
Tips:
	(1) If the storage path handed to Ceph is a bare physical disk (a "raw device", i.e. one that has not been formatted with a file system), Ceph can manage it directly; it simply uses the BlueStore storage engine to do so.
	(2) A typical Ceph cluster runs OSDs, Monitors, Managers and MDSs as its basic components, with the MDSs being optional.
	(3) RBD does not need its own daemon to provide service: it is based on the librbd module, which exposes an API for clients to use.
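On a cephadm-managed cluster (like the one used in the rest of this note) you can check which of these daemons are actually running; a minimal sketch:

	ceph -s                            # summary: mon/mgr/osd counts (and mds, if CephFS is deployed)
	ceph orch ps --daemon-type mgr     # per-daemon view from the cephadm orchestrator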


### Component relationships in short

1. Base layer: the RADOS cluster (OSD + MON) provides the core storage capability.
    
2. Management layer: MGR provides monitoring and management interfaces; MDS/RGW serve the upper-layer applications.
    
3. Client interfaces: RBD (block), CephFS (file) and RGW (object) interact with RADOS through librados.
    

---

### Common misconceptions, clarified

1. What MON runs on:
    
    - MON stores its metadata in LevelDB/RocksDB rather than in plain POSIX files. Cluster-state synchronization relies on the Paxos protocol.
        
2. Where MDS sits:
    
    - MDS is CephFS's metadata service; it is not part of the RADOS core layer and sits at the same level as RBD/RGW.
        
3. PGs are virtual:
    
    - A PG is a logical concept used to control the granularity of data placement; it does not correspond directly to any physical storage unit.

2 Deploying a Ceph cluster

See [[ceph集群部署]]

3 Storage pool management

Pool types
– Replicated pools: replicated
Data is stored with the specified number of replicas, 3 by default.
– Erasure-coded pools: erasure
Save disk space compared with replicated pools, somewhat like RAID 5.
In practice, replicated pools are the recommended choice.

3.1 Creating, viewing and modifying pools

1. List the pools
[root@ceph141 ~]# ceph osd pool ls
.mgr
cmy
[root@ceph141 ~]# 

	2. List the pools with their IDs
[root@ceph141 ~]# ceph osd lspools
1 .mgr
2 cmy
[root@ceph141 ~]# 

	3. Show detailed pool information
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 63 lfor 0/0/61 flags hashpspool stripe_width 0 read_balance_score 1.97

[root@ceph141 ~]# 


	4. Create pools (replicated is the default)
[root@ceph141 ~]# ceph osd pool create xixi replicated  # create a replicated pool; if no type is given, replicated is the default.
pool 'xixi' created
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool create haha erasure   # create an erasure-coded pool
pool 'haha' created
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 63 lfor 0/0/61 flags hashpspool stripe_width 0 read_balance_score 1.97
pool 3 'xixi' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 83 lfor 0/0/81 flags hashpspool stripe_width 0 read_balance_score 2.25
pool 4 'haha' erasure profile default size 4 min_size 3 crush_rule 1 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 88 flags hashpspool stripe_width 8192

[root@ceph141 ~]# 



	5. Modify a pool parameter
[root@ceph141 ~]# ceph osd pool set xixi size 2
set pool 3 size to 2
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 63 lfor 0/0/61 flags hashpspool stripe_width 0 read_balance_score 1.97
pool 3 'xixi' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 95 lfor 0/0/81 flags hashpspool stripe_width 0 read_balance_score 2.25
pool 4 'haha' erasure profile default size 4 min_size 3 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 92 lfor 0/0/90 flags hashpspool stripe_width 8192

[root@ceph141 ~]# 


	6. Rename a pool
[root@ceph141 ~]# ceph osd pool rename xixi hehe
pool 'xixi' renamed to 'hehe'

	7. Show pool utilization statistics
[root@ceph141 ~]# rados df
POOL_NAME           USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS       RD  WR_OPS       WR  USED COMPR  UNDER COMPR
.mgr             1.3 MiB        2       0       6                   0        0         0     428  736 KiB     201  1.9 MiB         0 B          0 B
cephfs_data      9.9 GiB     8120       0   24360                   0        0         0     328  1.1 GiB   37861  3.3 GiB         0 B          0 B
cephfs_metadata  545 MiB       68       0     204                   0        0         0    2328  146 MiB   30039  196 MiB         0 B          0 B
cmy       12 KiB        1       0       3                   0        0         0       0      0 B       2    2 KiB         0 B          0 B

total_objects    8191
total_used       11 GiB
total_avail      5.3 TiB
total_space      5.3 TiB


	8. Get I/O information for a specific pool or for all pools
[root@ceph141 ~]# ceph osd pool stats
pool .mgr id 1
  nothing is going on

pool cmy id 2
  nothing is going on

pool hehe id 3
  nothing is going on

pool haha id 4
  nothing is going on

[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool stats cmy
pool cmy id 2
  nothing is going on

[root@ceph141 ~]# 

	9. Get a specific pool parameter
[root@ceph141 ~]# ceph osd pool get hehe size
size: 2
[root@ceph141 ~]# 


	10. Get the size-related settings from the OSD map
[root@ceph141 ~]# ceph osd dump | grep 'replicated size'
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 63 lfor 0/0/61 flags hashpspool stripe_width 0 read_balance_score 1.97
pool 3 'hehe' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 95 lfor 0/0/81 flags hashpspool stripe_width 0 read_balance_score 2.25
[root@ceph141 ~]# 

3.2 Deleting pools!!!

  • The two mechanisms that protect pools from deletion
    1. Overview
    Once a pool is deleted, all of its data is deleted with it and cannot be recovered.

For safety, Ceph therefore protects pools with two mechanisms: "nodelete" and "mon_allow_pool_delete"
– nodelete:
Once a pool carries this flag it cannot be deleted. The default is false, meaning deletion is allowed.
– mon_allow_pool_delete:
Tells all mon daemons that pools may be deleted. The default is false, meaning pools may not be deleted.
In production, for safety, it is recommended to set the pool's nodelete attribute to "true" and leave mon_allow_pool_delete at false.

Either of 'nodelete' and 'mon_allow_pool_delete' can veto a deletion on its own; a pool can only be removed when both allow it. That is Ceph's pool protection mechanism.
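Put together, the recommended production posture described above looks roughly like this (the pool name is a placeholder; ceph config set requires a release with the centralized config database):

	ceph osd pool set <pool_name> nodelete true          # per-pool: refuse deletion of this pool
	ceph config set mon mon_allow_pool_delete false      # cluster-wide: mons refuse pool deletion (this is the default)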

2. Verifying mon_allow_pool_delete
[root@ceph141 ~]# ceph osd pool ls
.mgr
cmy
hehe
haha
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get hehe nodelete
nodelete: false
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph tell mon.* injectargs --mon_allow_pool_delete=true
mon.ceph141: {}
mon.ceph141: mon_allow_pool_delete = '' 
mon.ceph142: {}
mon.ceph142: mon_allow_pool_delete = '' 
mon.ceph143: {}
mon.ceph143: mon_allow_pool_delete = '' 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool delete hehe hehe --yes-i-really-really-mean-it
pool 'hehe' removed
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls
.mgr
cmy
haha
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph tell mon.* injectargs --mon_allow_pool_delete=false
mon.ceph141: {}
mon.ceph141: mon_allow_pool_delete = '' 
mon.ceph142: {}
mon.ceph142: mon_allow_pool_delete = '' 
mon.ceph143: {}
mon.ceph143: mon_allow_pool_delete = '' 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get haha nodelete
nodelete: false
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool delete haha haha --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[root@ceph141 ~]# 


	3. nodelete example
[root@ceph141 ~]# ceph osd pool get haha nodelete
nodelete: false
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool set haha nodelete true
set pool 4 nodelete to true
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get haha nodelete
nodelete: true
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph tell mon.* injectargs --mon_allow_pool_delete=true
mon.ceph141: {}
mon.ceph141: mon_allow_pool_delete = '' 
mon.ceph142: {}
mon.ceph142: mon_allow_pool_delete = '' 
mon.ceph143: {}
mon.ceph143: mon_allow_pool_delete = '' 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool delete haha haha --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must unset nodelete flag for the pool first
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool set haha nodelete false
set pool 4 nodelete to false
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool delete haha haha --yes-i-really-really-mean-it
pool 'haha' removed
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls
.mgr
cmy
[root@ceph141 ~]# 

3.3 Configuring pg, pgp, size and min_size

1. The officially recommended way to size the PG count

    number of OSDs * 100
    ----------------------------    ---> PG count
    pool size (replica count)


Suppose you have 9 disks; then:

    9 * 100
    -------    ----> 300 PGs
       3


The raw result is 300, but the official advice is to use a power of two; the closest one to 300 is 256, so a reasonable PG count for this cluster is 256.
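The same arithmetic in the bc one-liner style used later in this note:

	echo "9 * 100 / 3" | bc    # 300
	# nearest power of two -> 256, so pg_num 256 is a sensible choice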
 


	2. While autoscale_mode is on, any change to pg_num is automatically pushed back to the autoscaler's default of 32
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 63 lfor 0/0/61 flags hashpspool stripe_width 0 read_balance_score 1.97

[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool set cmy pg_num 2
set pool 2 pg_num to 2
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool set cmy pgp_num 2
set pool 2 pgp_num to 2
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 29 pgp_num 27 pg_num_target 2 pgp_num_target 2 pg_num_pending 28 autoscale_mode on last_change 121 lfor 0/121/121 flags hashpspool stripe_width 0 read_balance_score 2.17

[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 144 lfor 0/139/142 flags hashpspool stripe_width 0 read_balance_score 1.97

[root@ceph141 ~]# 


	3. Turn autoscale_mode off, then change pg_num and pgp_num again
[root@ceph141 ~]# ceph osd pool set cmy pg_autoscale_mode off
set pool 2 pg_autoscale_mode to off
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode off last_change 149 lfor 0/139/142 flags hashpspool stripe_width 0 read_balance_score 1.97

[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool set cmy pgp_num 2
set pool 2 pgp_num to 2
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool set cmy pg_num 2
set pool 2 pg_num to 2
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 30 pg_num_target 2 pgp_num_target 2 autoscale_mode off last_change 154 lfor 0/139/142 flags hashpspool stripe_width 0 read_balance_score 2.25

[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 pg_num_target 2 pgp_num_target 2 autoscale_mode off last_change 239 lfor 0/239/237 flags hashpspool stripe_width 0 read_balance_score 2.25

[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 autoscale_mode off last_change 316 lfor 0/316/314 flags hashpspool stripe_width 0 read_balance_score 4.46

[root@ceph141 ~]# 


	4. Specify pg_num and pgp_num when creating a pool
[root@ceph141 ~]# ceph osd pool create haha 128 128 --autoscale_mode off
pool 'haha' created
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 2 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 autoscale_mode off last_change 316 lfor 0/316/314 flags hashpspool stripe_width 0 read_balance_score 4.46
pool 5 'xixi' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 pg_num_target 32 pgp_num_target 32 autoscale_mode on last_change 324 flags hashpspool stripe_width 0 read_balance_score 9.09
pool 6 'haha' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode off last_change 323 flags hashpspool stripe_width 0 read_balance_score 2.11

[root@ceph141 ~]# 

	

	5. Summary
pg:
	A pool can have many PGs and data is distributed across them; pg_num and pgp_num should be kept equal.
	
size:
	How many copies of the data to store; for replicated pools the default is 3 if not specified.
	
min_size:
	The minimum number of replicas that must be available for I/O. With 3 replicas, a min_size of 1 means the pool tolerates 2 nodes being down, whereas a min_size of 2 means only 1 node may be down.
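A minimal sketch of tuning these on an existing pool (the pool name is a placeholder):

	ceph osd pool set <pool_name> size 3        # how many copies to keep
	ceph osd pool set <pool_name> min_size 2    # refuse I/O once fewer than 2 copies are available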

3.4 Resource quotas on an RBD pool

	
- Hands-on with RBD pool resource quotas
	1. Pool quota overview
Ceph officially supports capping a pool in two ways: by number of stored objects and by total bytes stored.

Reference:
	https://docs.ceph.com/en/latest/rados/operations/pools/#setting-pool-quotas
	
	
	
	2. Create the pool
[root@ceph141 ~]# ceph osd pool create linux97 32 32 --size 3 --autoscale_mode off
pool 'linux97' created
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls  detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 22 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 8 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode off last_change 246 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 3.38
pool 9 'linux97' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode off last_change 250 flags hashpspool stripe_width 0 read_balance_score 2.25

[root@ceph141 ~]# 



	3. View the pool's quota settings
[root@ceph141 ~]# ceph osd pool get-quota linux97
quotas for pool 'linux97':
  max objects: N/A
  max bytes  : N/A
[root@ceph141 ~]# 
 
 
	4. Cap the pool at a maximum of 30,000 objects
[root@ceph141 ~]# ceph osd pool set-quota linux97  max_objects 30000
set-quota max_objects = 30000 for pool linux97
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get-quota linux97
quotas for pool 'linux97':
  max objects: 30k objects  (current num objects: 0 objects)
  max bytes  : N/A
[root@ceph141 ~]# 


	5. Cap the pool at a maximum of 10 MiB of data
[root@ceph141 ~]# echo 10*1024*1024| bc
10485760
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool set-quota linux97  max_bytes 10485760
set-quota max_bytes = 10485760 for pool linux97
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get-quota linux97
quotas for pool 'linux97':
  max objects: 30k objects  (current num objects: 0 objects)
  max bytes  : 10 MiB  (current num bytes: 0 bytes)
[root@ceph141 ~]# 


	6. Verify the storage cap
[root@ceph141 ~]# ll /etc/hosts
-rw-r--r-- 1 root root 283 May 16 09:24 /etc/hosts
[root@ceph141 ~]# 
[root@ceph141 ~]# ll install-docker.sh 
-rwxr-xr-x 1 root root 3513 Nov  2  2024 install-docker.sh*
[root@ceph141 ~]# 
[root@ceph141 ~]# ll cmy-autoinstall-docker-docker-compose.tar.gz 
-rw-r--r-- 1 root root 84289496 Nov  2  2024 cmy-autoinstall-docker-docker-compose.tar.gz
[root@ceph141 ~]# 
[root@ceph141 ~]# ll /etc/passwd
-rw-r--r-- 1 root root 2571 May 16 10:46 /etc/passwd
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# ll -h cmy-autoinstall-docker-docker-compose.tar.gz 
-rw-r--r-- 1 root root 81M Nov  2  2024 cmy-autoinstall-docker-docker-compose.tar.gz
[root@ceph141 ~]# 
[root@ceph141 ~]# rados put file01 /etc/hosts -p linux97
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get-quota linux97
quotas for pool 'linux97':
  max objects: 30k objects  (current num objects: 1 objects)  # 1 object so far
  max bytes  : 10 MiB  (current num bytes: 283 bytes)  # 283 bytes so far
[root@ceph141 ~]# 
[root@ceph141 ~]# rados put file02 install-docker.sh  -p linux97
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get-quota linux97
quotas for pool 'linux97':
  max objects: 30k objects  (current num objects: 2 objects)  # 2 objects so far
  max bytes  : 10 MiB  (current num bytes: 3796 bytes)  # 3796 bytes so far
[root@ceph141 ~]# 
[root@ceph141 ~]# rados put file03 cmy-autoinstall-docker-docker-compose.tar.gz  -p linux97
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get-quota linux97
quotas for pool 'linux97':
  max objects: 30k objects  (current num objects: 3 objects)
  max bytes  : 10 MiB  (current num bytes: 84293292 bytes)  # note: usage has blown past the 10 MiB limit and is now 80 MB+; the quota was checked before the upload, when usage was still under 10 MiB, so this write was allowed!
[root@ceph141 ~]# 
[root@ceph141 ~]# echo  84293292/1024/1024 | bc  # roughly 80 MB, far beyond the 10 MiB quota
80
[root@ceph141 ~]# 
[root@ceph141 ~]# rados put file04  /etc/passwd  -p linux97  # this upload never succeeds: the pool is already over its 10 MiB quota (currently close to 80+ MB)
^C


	7. Clear the quotas
[root@ceph141 /]#  ceph osd pool set-quota linux97 max_objects 0
set-quota max_objects = 0 for pool linux97
[root@ceph141 /]#
[root@ceph141 /]# ceph osd pool get-quota linux97
quotas for pool 'linux97':
  max objects: N/A
  max bytes  : 10 MiB  (current num bytes: 84293292 bytes)
[root@ceph141 /]# 
[root@ceph141 /]# ceph osd pool set-quota linux97  max_bytes 0 
set-quota max_bytes = 0 for pool linux97
[root@ceph141 /]# 
[root@ceph141 /]# ceph osd pool get-quota linux97
quotas for pool 'linux97':
  max objects: N/A
  max bytes  : N/A
[root@ceph141 /]# 
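The oversized test objects written above still occupy space after the quota is lifted; a minimal cleanup sketch:

	rados -p linux97 ls            # list the file01/file02/file03 test objects
	rados -p linux97 rm file03     # drop the ~80 MB object
	rados df                       # confirm the pool usage drops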

4 RBD block storage

4.1 Basic operations: create, view, modify, delete

1. Create the pool
[root@ceph141 ~]# ceph osd pool create cmy 8 8  replicated --autoscale_mode off --size 3 
pool 'cmy' created
[root@ceph141 ~]# 


	2. Declare the pool as an rbd application type
[root@ceph141 ~]# ceph -s
  cluster:
    id:     11e66474-0e02-11f0-82d6-4dcae3d59070
    health: HEALTH_WARN
            1 pool(s) do not have an application enabled
 
  services:
    mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 17h)
    mgr: ceph141.mbakds(active, since 17h), standbys: ceph142.qgifwo
    osd: 9 osds: 9 up (since 17h), 9 in (since 17h)
 
  data:
    pools:   2 pools, 9 pgs
    objects: 2 objects, 449 KiB
    usage:   652 MiB used, 5.3 TiB / 5.3 TiB avail
    pgs:     9 active+clean
 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 8 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode off last_change 346 flags hashpspool stripe_width 0 read_balance_score 3.38

[root@ceph141 ~]# 
[root@ceph141 ~]# rbd pool init cmy
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail | grep application
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 9.09
pool 8 'cmy' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode off last_change 349 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 3.38
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph -s
  cluster:
    id:     11e66474-0e02-11f0-82d6-4dcae3d59070
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 17h)
    mgr: ceph141.mbakds(active, since 17h), standbys: ceph142.qgifwo
    osd: 9 osds: 9 up (since 17h), 9 in (since 17h)
 
  data:
    pools:   2 pools, 9 pgs
    objects: 3 objects, 449 KiB
    usage:   652 MiB used, 5.3 TiB / 5.3 TiB avail
    pgs:     9 active+clean
 
[root@ceph141 ~]# 


	3. Create a block device
[root@ceph141 ~]# rbd create -s 2G cmy/linux97
[root@ceph141 ~]# 


	4. List the block devices
[root@ceph141 ~]# rbd ls -l cmy
NAME     SIZE   PARENT  FMT  PROT  LOCK
linux97  2 GiB            2            
[root@ceph141 ~]# 


	5. View a block device's details
[root@ceph141 ~]# rbd info cmy/linux97
rbd image 'linux97':
	size 2 GiB in 512 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: d5074cd0daea
	block_name_prefix: rbd_data.d5074cd0daea
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features: 
	flags: 
	create_timestamp: Fri May 16 15:46:30 2025
	access_timestamp: Fri May 16 15:46:30 2025
	modify_timestamp: Fri May 16 15:46:30 2025
[root@ceph141 ~]# 
	
	
	6. Grow a block device
[root@ceph141 ~]# rbd resize -s 4G cmy/linux97
Resizing image: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd ls -l cmy
NAME     SIZE   PARENT  FMT  PROT  LOCK
linux97  4 GiB            2            
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd info cmy/linux97
rbd image 'linux97':
	size 4 GiB in 1024 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: d5074cd0daea
	block_name_prefix: rbd_data.d5074cd0daea
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features: 
	flags: 
	create_timestamp: Fri May 16 15:46:30 2025
	access_timestamp: Fri May 16 15:46:30 2025
	modify_timestamp: Fri May 16 15:46:30 2025
[root@ceph141 ~]# 


	7. Shrink a block device
[root@ceph141 ~]# rbd resize -s 1G cmy/linux97 --allow-shrink
Resizing image: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd ls -l cmy
NAME     SIZE   PARENT  FMT  PROT  LOCK
linux97  1 GiB            2            
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd info cmy/linux97
rbd image 'linux97':
	size 1 GiB in 256 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: d5074cd0daea
	block_name_prefix: rbd_data.d5074cd0daea
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features: 
	flags: 
	create_timestamp: Fri May 16 15:46:30 2025
	access_timestamp: Fri May 16 15:46:30 2025
	modify_timestamp: Fri May 16 15:46:30 2025
[root@ceph141 ~]# 


	8. Rename a block device
[root@ceph141 ~]# rbd ls -l cmy
NAME     SIZE   PARENT  FMT  PROT  LOCK
linux97  1 GiB            2            
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd rename -p cmy linux97 LINUX97
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd ls -l cmy
NAME     SIZE   PARENT  FMT  PROT  LOCK
LINUX97  1 GiB            2            
[root@ceph141 ~]# 


	9. Delete a block device
[root@ceph141 ~]# rbd rm cmy/LINUX97
Removing image: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd ls -l cmy
[root@ceph141 ~]# 

4.2 Using the block device from a client, and resizing it live

2.1 Install the ceph-common package on the client
[root@prometheus-server31 ~]# apt -y install ceph-common
[root@prometheus-server31 ~]# ceph --version
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
[root@prometheus-server31 ~]# 

		2.2 Copy the Ceph cluster authentication files
[root@ceph141 ~]# scp /etc/ceph/{ceph.conf,ceph.client.admin.keyring} 10.168.10.31:/etc/ceph


		2.3 Map the rbd device on the client
[root@prometheus-server31 ~]# rbd map cmy/prometheus-server
/dev/rbd0
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# fdisk -l /dev/rbd0
Disk /dev/rbd0: 4 GiB, 4294967296 bytes, 8388608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
[root@prometheus-server31 ~]#  

		2.4 Format the device with ext4 on the client
mkfs.ext4 /dev/rbd0        # for an xfs file system: mkfs.xfs /dev/rbd1

mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done                            
Creating filesystem with 1048576 4k blocks and 262144 inodes
Filesystem UUID: 96976f6e-fef9-437a-9d4f-6c1490ee0426
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done 

[root@prometheus-server31 ~]# 

		2.5 Mount the block device
[root@prometheus-server31 ~]# mount /dev/rbd0 /mnt/
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# df -h | grep mnt
/dev/rbd0                          3.9G   24K  3.7G   1% /mnt
[root@prometheus-server31 ~]# 

		2.6 Try writing some data
[root@prometheus-server31 ~]# cp /etc/os-release /mnt/
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /mnt/
total 28
drwxr-xr-x  3 root root  4096 May 16 15:56 ./
drwxr-xr-x 21 root root  4096 May 12 09:01 ../
drwx------  2 root root 16384 May 16 15:55 lost+found/
-rw-r--r--  1 root root   427 May 16 15:56 os-release
[root@prometheus-server31 ~]# 


1. Grow the device on the Ceph side
[root@ceph141 ~]# rbd info cmy/prometheus-server
rbd image 'prometheus-server':
	size 4 GiB in 1024 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: d6f5a501cf29
	block_name_prefix: rbd_data.d6f5a501cf29
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features: 
	flags: 
	create_timestamp: Tue Apr  1 11:37:28 2025
	access_timestamp: Tue Apr  1 11:37:28 2025
	modify_timestamp: Tue Apr  1 11:37:28 2025
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd resize -s 40G cmy/prometheus-server
Resizing image: 100% complete...done.
[root@ceph141 ~]# 



# grow the file system on the client (if the device is already mounted)
resize2fs /dev/rbd0  # ext4
xfs_growfs /mount/point  # XFS

4.3 RBD snapshots

Backing up and restoring data with snapshots

RBD snapshots can be used to back data up and to restore it.

The client environment is prepared as follows:
[root@prometheus-server31 ~]# ll /opt/
total 16
drwxr-xr-x  2 root root   69 May 16 16:51 ./
drwxr-xr-x 21 root root 4096 May 12 09:01 ../
-rw-------  1 root root  319 May 16 15:58 00-installer-config.yaml
-rw-r--r--  1 root root  657 May 16 16:51 fstab
-rw-r--r--  1 root root  427 May 16 16:51 os-release
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# df -h | grep opt
/dev/rbd1                           20G  177M   20G   1% /opt
[root@prometheus-server31 ~]#
[root@prometheus-server31 ~]# rbd showmapped  # show the block devices mapped on this host
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
1   cmy             node-exporter      -     /dev/rbd1
[root@prometheus-server31 ~]# 


	2. Create a snapshot (syntax 1)
[root@ceph141 ~]# rbd snap create -p cmy --image node-exporter --snap xixi
Creating snap: 100% complete...done.
[root@ceph141 ~]# 


	3. Modify the data again
[root@prometheus-server31 ~]# cp /etc/shadow /opt/
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# cp /etc/hostname /opt/
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /opt/
total 24
drwxr-xr-x  2 root root   99 May 16 16:54 ./
drwxr-xr-x 21 root root 4096 May 12 09:01 ../
-rw-------  1 root root  319 May 16 15:58 00-installer-config.yaml
-rw-r--r--  1 root root  657 May 16 16:51 fstab
-rw-r--r--  1 root root   20 May 16 16:54 hostname
-rw-r--r--  1 root root  427 May 16 16:51 os-release
-rw-r-----  1 root root 1473 May 16 16:54 shadow
[root@prometheus-server31 ~]# 


	4. Create another snapshot (syntax 2)
[root@ceph141 ~]# rbd snap create cmy/node-exporter@haha
Creating snap: 100% complete...done.
[root@ceph141 ~]# 


	5. View the snapshot information
SNAPID  NAME  SIZE    PROTECTED  TIMESTAMP               
     3  xixi  20 GiB             Fri May 16 16:53:44 2025
     4  haha  20 GiB             Fri May 16 16:55:29 2025
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls -p cmy --image node-exporter 
SNAPID  NAME  SIZE    PROTECTED  TIMESTAMP               
     3  xixi  20 GiB             Fri May 16 16:53:44 2025
     4  haha  20 GiB             Fri May 16 16:55:29 2025
[root@ceph141 ~]# 



	6. Wreck the data from the client
[root@prometheus-server31 ~]# ll /opt/
total 24
drwxr-xr-x  2 root root   99 May 16 16:54 ./
drwxr-xr-x 21 root root 4096 May 12 09:01 ../
-rw-------  1 root root  319 May 16 15:58 00-installer-config.yaml
-rw-r--r--  1 root root  657 May 16 16:51 fstab
-rw-r--r--  1 root root   20 May 16 16:54 hostname
-rw-r--r--  1 root root  427 May 16 16:51 os-release
-rw-r-----  1 root root 1473 May 16 16:54 shadow
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rm -f /opt/*
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /opt/
total 4
drwxr-xr-x  2 root root    6 May 16 16:56 ./
drwxr-xr-x 21 root root 4096 May 12 09:01 ../
[root@prometheus-server31 ~]# 


	7. Before restoring, the client must unmount and unmap the block device
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
1   cmy             node-exporter      -     /dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# umount /opt
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd unmap /dev/rbd1 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ls /dev/rbd1
ls: cannot access '/dev/rbd1': No such file or directory
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ls /dev/rbd0
/dev/rbd0
[root@prometheus-server31 ~]# 



	8. Roll the data back on the Ceph side
[root@ceph141 ~]# rbd snap rollback cmy/node-exporter@xixi
Rolling back to snapshot: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/node-exporter
SNAPID  NAME  SIZE    PROTECTED  TIMESTAMP               
     3  xixi  20 GiB             Fri May 16 16:53:44 2025
     4  haha  20 GiB             Fri May 16 16:55:29 2025
[root@ceph141 ~]# 



	9. Re-map and re-mount on the client to verify the data is restored
[root@prometheus-server31 ~]# rbd map cmy/node-exporter
/dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
1   cmy             node-exporter      -     /dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ls /dev/rbd1
/dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# mount /dev/rbd1  /opt/
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /opt/
total 16
drwxr-xr-x  2 root root   69 May 16 16:51 ./
drwxr-xr-x 21 root root 4096 May 12 09:01 ../
-rw-------  1 root root  319 May 16 15:58 00-installer-config.yaml
-rw-r--r--  1 root root  657 May 16 16:51 fstab
-rw-r--r--  1 root root  427 May 16 16:51 os-release
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# 


	10. Unmount and unmap again on the client
[root@prometheus-server31 ~]# umount /opt 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
1   cmy             node-exporter      -     /dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd unmap /dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# 


	11. Roll back again on the Ceph side
[root@ceph141 ~]# rbd snap rollback -p cmy --image node-exporter --snap haha
Rolling back to snapshot: 100% complete...done.
[root@ceph141 ~]# 


	12. Verify again on the client
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd map cmy/node-exporter
/dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# mount /dev/rbd1  /opt/
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /opt/
total 24
drwxr-xr-x  2 root root   99 May 16 16:54 ./
drwxr-xr-x 21 root root 4096 May 12 09:01 ../
-rw-------  1 root root  319 May 16 15:58 00-installer-config.yaml
-rw-r--r--  1 root root  657 May 16 16:51 fstab
-rw-r--r--  1 root root   20 May 16 16:54 hostname
-rw-r--r--  1 root root  427 May 16 16:51 os-release
-rw-r-----  1 root root 1473 May 16 16:54 shadow
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
1   cmy             node-exporter      -     /dev/rbd1
[root@prometheus-server31 ~]# 



Hands-on: deleting snapshots, protecting snapshots, and restoring data from snapshot clones

1. Unprotected snapshots can be deleted
[root@ceph141 ~]# rbd snap ls cmy/prometheus-server
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap create -p cmy --image prometheus-server --snap xixi
Creating snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/prometheus-server
SNAPID  NAME  SIZE    PROTECTED  TIMESTAMP               
     5  xixi  40 GiB             Fri May 16 17:05:56 2025
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap rm cmy/prometheus-server@xixi
Removing snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/prometheus-server
[root@ceph141 ~]# 
[root@ceph141 ~]# 


	2. Protect a snapshot
[root@ceph141 ~]# rbd snap ls cmy/node-exporter
SNAPID  NAME  SIZE    PROTECTED  TIMESTAMP               
     3  xixi  20 GiB             Fri May 16 16:53:44 2025
     4  haha  20 GiB             Fri May 16 16:55:29 2025
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap protect cmy/node-exporter@xixi
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/node-exporter
SNAPID  NAME  SIZE    PROTECTED  TIMESTAMP               
     3  xixi  20 GiB  yes        Fri May 16 16:53:44 2025
     4  haha  20 GiB             Fri May 16 16:55:29 2025
[root@ceph141 ~]# 



	3. A protected snapshot cannot be deleted
[root@ceph141 ~]# rbd snap ls cmy/node-exporter
SNAPID  NAME  SIZE    PROTECTED  TIMESTAMP               
     3  xixi  20 GiB  yes        Fri May 16 16:53:44 2025
     4  haha  20 GiB             Fri May 16 16:55:29 2025
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap rm cmy/node-exporter@xixi  # a protected snapshot cannot be removed
Removing snap: 0% complete...failed.
rbd: snapshot 'xixi' is protected from removal.
2025-05-16T17:07:18.389+0800 7f94d65e3640 -1 librbd::Operations: snapshot is protected

[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap rm cmy/node-exporter@haha  # an unprotected snapshot can be removed
Removing snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/node-exporter
SNAPID  NAME  SIZE    PROTECTED  TIMESTAMP               
     3  xixi  20 GiB  yes        Fri May 16 16:53:44 2025
[root@ceph141 ~]# 

	
	4. Clone the snapshot
[root@ceph141 ~]# rbd clone cmy/node-exporter@xixi cmy/child-xixi-001
[root@ceph141 ~]# 


	5. Check whether the snapshot has child images
[root@ceph141 ~]# rbd clone cmy/node-exporter@xixi cmy/child-xixi-001
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/node-exporter
SNAPID  NAME  SIZE    PROTECTED  TIMESTAMP               
     3  xixi  20 GiB  yes        Fri May 16 16:53:44 2025
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd children cmy/node-exporter@xixi
cmy/child-xixi-001
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd ls -p cmy -l
NAME                SIZE    PARENT                        FMT  PROT  LOCK
child-xixi-001      20 GiB  cmy/node-exporter@xixi    2            
node-exporter       20 GiB                                  2            
node-exporter@xixi  20 GiB                                  2  yes       
prometheus-server   40 GiB                                  2            
[root@ceph141 ~]# 


	6. Hands-on: restore data from a cloned child image (faster than rolling back to the snapshot)
Rolling a block device back to a snapshot means overwriting the device's current contents with the data in the snapshot, and the time a rollback takes grows with the size of the image.

Cloning from a snapshot is faster than rolling an image back to it; cloning is the preferred way to return to a pre-existing state.

		6.1 Destroy the data first
[root@prometheus-server31 ~]# ll /opt/
total 24
drwxr-xr-x  2 root root   99 May 16 16:54 ./
drwxr-xr-x 21 root root 4096 May 12 09:01 ../
-rw-------  1 root root  319 May 16 15:58 00-installer-config.yaml
-rw-r--r--  1 root root  657 May 16 16:51 fstab
-rw-r--r--  1 root root   20 May 16 16:54 hostname
-rw-r--r--  1 root root  427 May 16 16:51 os-release
-rw-r-----  1 root root 1473 May 16 16:54 shadow
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rm -f /opt/*
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /opt/
total 4
drwxr-xr-x  2 root root    6 May 16 17:16 ./
drwxr-xr-x 21 root root 4096 May 12 09:01 ../
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
1   cmy             node-exporter      -     /dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# umount /opt 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd unmap /dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# 


		6.2 Recover using the cloned snapshot
[root@prometheus-server31 ~]# rbd map cmy/child-xixi-001  # this image was cloned from the parent snapshot, so it needs essentially no recovery time (faster than a rollback!)
/dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# mount /dev/rbd1 /opt/

[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /opt/
total 16
drwxr-xr-x  2 root root   69 May 16 16:51 ./
drwxr-xr-x 21 root root 4096 May 12 09:01 ../
-rw-------  1 root root  319 May 16 15:58 00-installer-config.yaml
-rw-r--r--  1 root root  657 May 16 16:51 fstab
-rw-r--r--  1 root root  427 May 16 16:51 os-release
[root@prometheus-server31 ~]# 

Hands-on: unprotecting snapshots and making child images independent

	1. A protected snapshot cannot be removed
[root@ceph141 ~]# rbd ls -p cmy -l
NAME                SIZE    PARENT                        FMT  PROT  LOCK
child-xixi-001      20 GiB  cmy/node-exporter@xixi    2        excl
node-exporter       20 GiB                                  2            
node-exporter@xixi  20 GiB                                  2  yes       
prometheus-server   40 GiB                                  2            
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap rm cmy/node-exporter@xixi 
Removing snap: 0% complete...failed.
rbd: snapshot 'xixi' is protected from removal.
2025-05-16T17:21:22.070+0800 7f885a7fc640 -1 librbd::Operations: snapshot is protected

[root@ceph141 ~]# 

	2. A protected snapshot that has child images cannot be unprotected
[root@ceph141 ~]# rbd children cmy/node-exporter@xixi
cmy/child-xixi-001
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd info cmy/node-exporter@xixi
rbd image 'node-exporter':
	size 20 GiB in 5120 objects
	order 22 (4 MiB objects)
	snapshot_count: 1
	id: d6745e6f7503
	block_name_prefix: rbd_data.d6745e6f7503
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features: 
	flags: 
	create_timestamp: Fri May 16 15:57:14 2025
	access_timestamp: Fri May 16 15:57:14 2025
	modify_timestamp: Fri May 16 15:57:14 2025
	protected: True
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap unprotect cmy/node-exporter@xixi
2025-05-16T17:22:18.485+0800 7f41dd430640 -1 librbd::SnapshotUnprotectRequest: cannot unprotect: at least 1 child(ren) [d76af70ef658] in pool 'cmy'

2025-05-16T17:22:18.489+0800 7f41ddc31640 -1 librbd::SnapshotUnprotectRequest: encountered error: (16) Device or resource busy

2025-05-16T17:22:18.489+0800 7f41ddc31640 -1 librbd::SnapshotUnprotectRequest: 0x55e0603b22a0 should_complete_error: ret_val=-16

2025-05-16T17:22:18.493+0800 7f41dd430640 -1 librbd::SnapshotUnprotectRequest: 0x55e0603b22a0 should_complete_error: ret_val=-16

rbd: unprotecting snap failed: (16) Device or resource busy
[root@ceph141 ~]# 



	
	3. Use flatten to break the link between parent and child image (in plain terms, the child copies the data it still depends on from the parent and becomes independent, so it no longer has anything to do with the parent)
[root@ceph141 ~]# rbd children cmy/node-exporter@xixi
cmy/child-xixi-001
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd flatten cmy/child-xixi-001  # copying the data can take quite a while if the parent image is large
Image flatten: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd children cmy/node-exporter@xixi
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd ls cmy -l
NAME                SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001      20 GiB            2            
node-exporter       20 GiB            2            
node-exporter@xixi  20 GiB            2  yes       
prometheus-server   40 GiB            2            
[root@ceph141 ~]# 

	4. Unprotect the snapshot
[root@ceph141 ~]# rbd snap unprotect cmy/node-exporter@xixi
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/node-exporter
SNAPID  NAME  SIZE    PROTECTED  TIMESTAMP               
     3  xixi  20 GiB             Fri May 16 16:53:44 2025
[root@ceph141 ~]# 


	5. Once unprotected, the snapshot can be removed
[root@ceph141 ~]# rbd ls cmy -l
NAME                SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001      20 GiB            2        excl
node-exporter       20 GiB            2            
node-exporter@xixi  20 GiB            2            
prometheus-server   40 GiB            2            
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/node-exporter
SNAPID  NAME  SIZE    PROTECTED  TIMESTAMP               
     3  xixi  20 GiB             Fri May 16 16:53:44 2025
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap rm cmy/node-exporter@xixi
Removing snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/node-exporter
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd ls cmy -l
NAME               SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001     20 GiB            2            
node-exporter      20 GiB            2            
prometheus-server  40 GiB            2            
[root@ceph141 ~]# 

Limiting the number of snapshots on an RBD image

1. Create snapshots up to a set limit
		1.1 Before adding a limit
[root@ceph141 ~]# rbd snap ls cmy/cmy
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd info cmy/cmy | egrep "snapshot_count|snapshot_limit"
	snapshot_count: 0 
[root@ceph141 ~]# 


		1.2 Add a snapshot limit
[root@ceph141 ~]# rbd snap limit set cmy/cmy  --limit 5
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd info cmy/cmy | egrep "snapshot_count|snapshot_limit"
	snapshot_count: 0
	snapshot_limit: 5
[root@ceph141 ~]# 



	2. Create snapshots to test the limit
[root@ceph141 ~]# rbd snap create cmy/cmy@linux97-001
Creating snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap create cmy/cmy@linux97-002
Creating snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap create cmy/cmy@linux97-003
Creating snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap create cmy/cmy@linux97-004
Creating snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap create cmy/cmy@linux97-005
Creating snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap create cmy/cmy@linux97-006
Creating snap: 10% complete...failed.
rbd: failed to create snapshot: (122) Disk quota exceeded
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd info cmy/cmy | egrep "snapshot_count|snapshot_limit"
	snapshot_count: 5
	snapshot_limit: 5
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/cmy
SNAPID  NAME         SIZE    PROTECTED  TIMESTAMP               
     6  linux97-001  20 GiB             Mon May 19 09:01:15 2025
     7  linux97-002  20 GiB             Mon May 19 09:01:17 2025
     8  linux97-003  20 GiB             Mon May 19 09:01:19 2025
     9  linux97-004  20 GiB             Mon May 19 09:01:21 2025
    10  linux97-005  20 GiB             Mon May 19 09:01:23 2025
[root@ceph141 ~]# 


	3. Clear the snapshot limit
[root@ceph141 ~]# rbd snap limit clear  cmy/cmy
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd info cmy/cmy | egrep "snapshot_count|snapshot_limit"
	snapshot_count: 5
[root@ceph141 ~]# 

	
	4. Test again: snapshots can be created once more
[root@ceph141 ~]# rbd snap create cmy/cmy@linux97-006
Creating snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap create cmy/cmy@linux97-007
Creating snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap create cmy/cmy@linux97-008
Creating snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd info cmy/cmy | egrep "snapshot_count|snapshot_limit"
	snapshot_count: 8
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/cmy
SNAPID  NAME         SIZE    PROTECTED  TIMESTAMP               
     6  linux97-001  20 GiB             Mon May 19 09:01:15 2025
     7  linux97-002  20 GiB             Mon May 19 09:01:17 2025
     8  linux97-003  20 GiB             Mon May 19 09:01:19 2025
     9  linux97-004  20 GiB             Mon May 19 09:01:21 2025
    10  linux97-005  20 GiB             Mon May 19 09:01:23 2025
    12  linux97-006  20 GiB             Mon May 19 09:02:48 2025
    13  linux97-007  20 GiB             Mon May 19 09:02:50 2025
    14  linux97-008  20 GiB             Mon May 19 09:02:52 2025
[root@ceph141 ~]# 



	5. Delete all the snapshots
[root@ceph141 ~]# rbd snap rm cmy/cmy@linux97-001
Removing snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap rm cmy/cmy --snap linux97-002
Removing snap: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/cmy
SNAPID  NAME         SIZE    PROTECTED  TIMESTAMP               
     8  linux97-003  20 GiB             Mon May 19 09:01:19 2025
     9  linux97-004  20 GiB             Mon May 19 09:01:21 2025
    10  linux97-005  20 GiB             Mon May 19 09:01:23 2025
    12  linux97-006  20 GiB             Mon May 19 09:02:48 2025
    13  linux97-007  20 GiB             Mon May 19 09:02:50 2025
    14  linux97-008  20 GiB             Mon May 19 09:02:52 2025
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap purge cmy/cmy  # remove every snapshot of the image
Removing all snapshots: 100% complete...done.
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd snap ls cmy/cmy
[root@ceph141 ~]# 

4.4 The trash mechanism for Ceph block devices

	1. The test block devices in the pool
[root@ceph141 ~]# rbd ls cmy -l
NAME               SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001     20 GiB            2        excl
cmy      20 GiB            2            
prometheus-server  40 GiB            2        excl
[root@ceph141 ~]# 



	2. View the trash list for the pool
[root@ceph141 ~]# rbd trash ls -p cmy 
[root@ceph141 ~]#  

	
	3. Move a block device to the trash to simulate deletion
[root@ceph141 ~]# rbd ls cmy -l
NAME               SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001     20 GiB            2        excl
cmy      20 GiB            2            
prometheus-server  40 GiB            2        excl
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd trash move cmy/cmy
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd trash ls -p cmy -l
ID            NAME           SOURCE  DELETED_AT                STATUS                               PARENT
d6745e6f7503  cmy  USER    Mon May 19 08:54:10 2025  expired at Mon May 19 08:54:10 2025        
[root@ceph141 ~]# 



	4. List the pool again (the block device is gone)
[root@ceph141 ~]# rbd ls cmy -l
NAME               SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001     20 GiB            2        excl
prometheus-server  40 GiB            2        excl
[root@ceph141 ~]# 

	
	5. Restore the block device
[root@ceph141 ~]# rbd trash restore -p cmy --image cmy --image-id d6745e6f7503
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd trash ls -p cmy -l
[root@ceph141 ~]# 


	6. Verify the restore succeeded
[root@ceph141 ~]# rbd ls cmy -l
NAME               SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001     20 GiB            2        excl
cmy      20 GiB            2            
prometheus-server  40 GiB            2        excl
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd info cmy/cmy
rbd image 'cmy':
	size 20 GiB in 5120 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: d6745e6f7503
	block_name_prefix: rbd_data.d6745e6f7503
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features: 
	flags: 
	create_timestamp: Fri May 16 15:57:14 2025
	access_timestamp: Fri May 16 17:24:06 2025
	modify_timestamp: Fri May 16 15:57:14 2025
[root@ceph141 ~]# 
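If the image really is meant to go away, it can be deleted from the trash instead of restored. A minimal sketch, assuming the image id shown by "rbd trash ls -l" (check rbd help trash rm for the exact argument spec on your release):

	rbd trash mv cmy/cmy                 # move the image to the trash again
	rbd trash rm -p cmy <image-id>       # permanently delete that one image from the trash
	rbd trash purge cmy                  # or empty the pool's trash entirely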


4.5 Two ways to detach an rbd device

1. View the local rbd mappings
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
1   cmy             child-xixi-001     -     /dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# 


	2. View the local mounts
[root@prometheus-server31 ~]# df -h | grep rbd
/dev/rbd0                           40G  184M   38G   1% /mnt
/dev/rbd1                           20G  177M   20G   1% /opt
[root@prometheus-server31 ~]# 

	3. Unmount the mount point
[root@prometheus-server31 ~]# umount /opt 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# df -h | grep rbd
/dev/rbd0                           40G  184M   38G   1% /mnt
[root@prometheus-server31 ~]# 

	4. Remove the mapping by pool/image (recommended)
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
1   cmy             child-xixi-001     -     /dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd unmap -p cmy --image child-xixi-001
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
[root@prometheus-server31 ~]# 

	5. The other way to detach
[root@prometheus-server31 ~]# umount /mnt 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# df -h | grep rbd
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd unmap cmy/prometheus-server
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
[root@prometheus-server31 ~]# 

4.6 Mapping and mounting rbd at boot

1. Write a boot-time script
[root@prometheus-server31 ~]# cat /etc/rc.local 
#!/bin/bash

rbd map cmy/prometheus-server
rbd map cmy/child-xixi-001
mount /dev/rbd0 /mnt
mount /dev/rbd1 /opt
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# chmod +x /etc/rc.local
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /etc/rc.local
-rwxr-xr-x 1 root root 111 Apr  1 16:33 /etc/rc.local*
[root@prometheus-server31 ~]# 



	2. Reboot the server
[root@prometheus-server31 ~]# reboot 

	
	3. Verify
[root@prometheus-server31 ~]# df -h | grep rbd
/dev/rbd0                           40G  184M   38G   1% /mnt
/dev/rbd1                           20G  177M   20G   1% /opt
[root@prometheus-server31 ~]# 
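As an alternative to rc.local, the ceph-common package ships an rbdmap helper that maps the images listed in /etc/ceph/rbdmap at boot and mounts the matching fstab entries. A minimal sketch, assuming the default admin keyring (see man rbdmap for details):

	# /etc/ceph/rbdmap -- one image per line
	cmy/prometheus-server    id=admin,keyring=/etc/ceph/ceph.client.admin.keyring

	# /etc/fstab -- mount via the persistent /dev/rbd/<pool>/<image> path
	/dev/rbd/cmy/prometheus-server  /mnt  ext4  noauto,nofail,_netdev  0 0

	systemctl enable --now rbdmap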





- Case study: multiple nodes must not use the same block device at the same time
	1. 'excl' in the LOCK column marks a device that is currently in use
[root@ceph141 ~]# rbd ls cmy -l
NAME               SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001     20 GiB            2        excl
node-exporter      20 GiB            2            
prometheus-server  40 GiB            2        excl
[root@ceph141 ~]# 
[root@ceph141 ~]# 


	2. Unmap on the first client
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
1   cmy             child-xixi-001     -     /dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# df -h | grep rbd
/dev/rbd0                           40G  184M   38G   1% /mnt
/dev/rbd1                           20G  177M   20G   1% /opt
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# umount /opt 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd unmap /dev/rbd1 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# df -h | grep rbd
/dev/rbd0                           40G  184M   38G   1% /mnt
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
[root@prometheus-server31 ~]# 

	3. Check the Ceph side again
[root@ceph141 ~]# rbd ls cmy -l
NAME               SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001     20 GiB            2            
node-exporter      20 GiB            2            
prometheus-server  40 GiB            2        excl
[root@ceph141 ~]# 


	4. Map the device from the clients
		4.1 Terminal 1: map the device
[root@prometheus-server32 ~]# apt -y install ceph-common


		4.2 Copy the authentication files
[root@ceph141 ~]# scp /etc/ceph/ceph{.client.admin.keyring,.conf} 10.168.10.32:/etc/ceph

		4.3 Map the device
[root@elk93 ~]# rbd map cmy/child-xixi-001
/dev/rbd0
[root@elk93 ~]# 
[root@elk93 ~]# rbd showmapped 
id  pool       namespace  image           snap  device   
0   cmy             child-xixi-001  -     /dev/rbd0
[root@elk93 ~]# 


		4.4 Write test data
[root@elk93 ~]# mount /dev/rbd0 /mnt/
[root@elk93 ~]# ll /mnt/
total 12
drwxr-xr-x  2 root root   56 Apr  1 11:52 ./
drwxr-xr-x 22 root root 4096 Mar 13 11:57 ../
-rw-------  1 root root  319 Apr  1 11:52 00-installer-config.yaml
-rw-r--r--  1 root root  427 Apr  1 11:52 os-release
[root@elk93 ~]# 
[root@elk93 ~]# cp /etc/hostname /mnt/
[root@elk93 ~]# 
[root@elk93 ~]# ll /mnt/
total 16
drwxr-xr-x  2 root root   72 Apr  1 16:58 ./
drwxr-xr-x 22 root root 4096 Mar 13 11:57 ../
-rw-------  1 root root  319 Apr  1 11:52 00-installer-config.yaml
-rw-r--r--  1 root root    6 Apr  1 16:58 hostname
-rw-r--r--  1 root root  427 Apr  1 11:52 os-release
[root@elk93 ~]# 


		4.5 服务端查看块设备挂载情况 
[root@ceph141 ~]# rbd ls cmy -l
NAME               SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001     20 GiB            2        excl
node-exporter      20 GiB            2            
prometheus-server  40 GiB            2        excl
[root@ceph141 ~]# 



		4.6 终端2继续挂载一个带有excl锁标记的设备
[root@prometheus-server31 ~]# rbd map cmy/child-xixi-001
/dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd showmapped 
id  pool       namespace  image              snap  device   
0   cmy             prometheus-server  -     /dev/rbd0
1   cmy             child-xixi-001     -     /dev/rbd1
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# df -h | grep rbd
/dev/rbd0                           40G  184M   38G   1% /mnt
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# mount /dev/rbd1 /opt/
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /opt/
total 16
drwxr-xr-x  2 root root   72 Apr  1 16:58 ./
drwxr-xr-x 21 root root 4096 Apr  1 11:46 ../
-rw-------  1 root root  319 Apr  1 11:52 00-installer-config.yaml
-rw-r--r--  1 root root    6 Apr  1 16:58 hostname
-rw-r--r--  1 root root  427 Apr  1 11:52 os-release
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# cp /etc/fstab /opt/
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /opt/
total 20
drwxr-xr-x  2 root root   85 Apr  1 17:00 ./
drwxr-xr-x 21 root root 4096 Apr  1 11:46 ../
-rw-------  1 root root  319 Apr  1 11:52 00-installer-config.yaml
-rw-r--r--  1 root root  657 Apr  1 17:00 fstab
-rw-r--r--  1 root root    6 Apr  1 16:58 hostname
-rw-r--r--  1 root root  427 Apr  1 11:52 os-release
[root@prometheus-server31 ~]# 



		4.7 再次切回终端1发现数据没有任何变化【此时数据已经开始冲突了,因此生产环境中,不要让2个主机使用同一个镜像的情况!】
[root@elk93 ~]# ll /mnt/
total 16
drwxr-xr-x  2 root root   72 Apr  1 16:58 ./
drwxr-xr-x 22 root root 4096 Mar 13 11:57 ../
-rw-------  1 root root  319 Apr  1 11:52 00-installer-config.yaml
-rw-r--r--  1 root root    6 Apr  1 16:58 hostname
-rw-r--r--  1 root root  427 Apr  1 11:52 os-release
[root@elk93 ~]# 
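
温馨提示:
	排查这类镜像被多端占用的问题时,可以在服务端查看锁和watcher信息,下面是几条示意命令(假设镜像为cmy/child-xixi-001),仅供参考:

rbd info cmy/child-xixi-001 | grep features    # 确认镜像是否启用exclusive-lock特性
rbd lock list cmy/child-xixi-001               # 查看当前锁的持有者
rbd status cmy/child-xixi-001                  # 查看正在访问该镜像的客户端(watcher)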

5 ceph集群用户管理

5.1 基本操作


	3.三种方式自定义普通用户
参考链接:
	https://docs.ceph.com/en/nautilus/rados/operations/user-management/#add-a-user
	
		3.1 "ceph auth add" 创建用户
[root@ceph141 ~]# ceph auth add client.linux97 mon 'allow r' osd 'allow rwx pool=cmy'
added key for client.linux97
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.linux97
[client.linux97]
	key = AQBnjSpo29TvGBAAsolv7LmXYuNqz+BRe/qVMQ==
	caps mon = "allow r"
	caps osd = "allow rwx pool=cmy"
[root@ceph141 ~]# 




		3.2 "ceph auth get-or-create"创建用户
[root@ceph141 ~]# ceph auth get client.cmy  # 查看用户不存在
Error ENOENT: failed to find client.cmy in keyring
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get-or-create client.cmy mon 'allow r' osd 'allow rwx'  # 如果用户不存在则直接创建并返回认证信息
	key = AQDdo+xn9Z45NRAAPWt+OW/ad2Sn3B9bM+hJIQ==
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.cmy 
[client.cmy]
	key = AQDdo+xn9Z45NRAAPWt+OW/ad2Sn3B9bM+hJIQ==
	caps mon = "allow r"
	caps osd = "allow rwx"
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get-or-create client.cmy mon 'allow rwx' osd 'allow r' 
Error EINVAL: key for client.cmy exists but cap mon does not match
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get-or-create client.cmy mon 'allow r' osd 'allow rwx'   # 如果用户已存在,直接获取用户不会创建,早期版本会报错,但19.2.1不会报错
[client.cmy]
	key = AQDdo+xn9Z45NRAAPWt+OW/ad2Sn3B9bM+hJIQ==
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.cmy  # 可以看到,用户的caps并没有被修改,get-or-create不会更新已存在用户的权限
[client.cmy]
	key = AQDdo+xn9Z45NRAAPWt+OW/ad2Sn3B9bM+hJIQ==
	caps mon = "allow r"
	caps osd = "allow rwx"
[root@ceph141 ~]# 


		3.3 "ceph auth get-or-create-key"创建用户
[root@ceph141 ~]# ceph auth get client.k8s  # 注意,用户是不存在的
Error ENOENT: failed to find client.k8s in keyring
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get-or-create-key client.k8s mon 'allow r' osd 'allow rwx'  # 创建用户并返回KEY
AQBkQrxlR6aVGBAAerMOjQ5Nah/HYafJu+aTsg==
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.k8s  # 再次查看用户信息
[client.k8s]
	key = AQBkQrxlR6aVGBAAerMOjQ5Nah/HYafJu+aTsg==
	caps mon = "allow r"
	caps osd = "allow rwx"
exported keyring for client.k8s
[root@ceph141 ~]# 





		2.1 查看指定用户
[root@ceph141 ~]# ceph auth get client.admin
[client.admin]
	key = AQDOliZo7KuoJhAAAq2ECnRHaG9E9zUCq05paA==
	caps mds = "allow *"
	caps mgr = "allow *"
	caps mon = "allow *"
	caps osd = "allow *"
[root@ceph141 ~]# 



		2.2 查看所有用户
[root@ceph141 ~]# ceph auth list  # 和"ceph auth ls"等效

	4 "ceph auth print-key"打印已经存在用户的KEY
[root@ceph141 ~]# ceph auth get client.jasonyin2020
Error ENOENT: failed to find client.jasonyin2020 in keyring
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.cmy
[client.cmy]
	key = AQDdo+xn9Z45NRAAPWt+OW/ad2Sn3B9bM+hJIQ==
	caps mon = "allow r"
	caps osd = "allow rwx"
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth print-key client.cmy;echo  # 如果用户存在则打印该用户对应的KEY信息。
AQDdo+xn9Z45NRAAPWt+OW/ad2Sn3B9bM+hJIQ==
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth print-key client.jasonyin2020  # 如果用户不存在则报错
Error ENOENT: don't have client.jasonyin2020
[root@ceph141 ~]# 

	5.修改用户权限,直接覆盖权限
参考链接:
	https://docs.ceph.com/en/nautilus/rados/operations/user-management/#modify-user-capabilities
	

[root@ceph141 ~]# ceph auth get client.cmy
[client.cmy]
	key = AQDdo+xn9Z45NRAAPWt+OW/ad2Sn3B9bM+hJIQ==
	caps mon = "allow r"
	caps osd = "allow rwx"
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth caps client.cmy  mon 'allow rx' osd 'allow r pool=cmy'
[client.cmy]
	key = AQDdo+xn9Z45NRAAPWt+OW/ad2Sn3B9bM+hJIQ==
	caps mon = "allow rx"
	caps osd = "allow r pool=cmy"
updated caps for client.cmy
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.cmy
[client.cmy]
	key = AQDdo+xn9Z45NRAAPWt+OW/ad2Sn3B9bM+hJIQ==
	caps mon = "allow rx"
	caps osd = "allow r pool=cmy"
[root@ceph141 ~]# 


	6.删除用户
[root@ceph141 ~]# ceph auth get client.cmy
[client.cmy]
	key = AQDdo+xn9Z45NRAAPWt+OW/ad2Sn3B9bM+hJIQ==
	caps mon = "allow rx"
	caps osd = "allow r pool=cmy"
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth del client.cmy  # 删除名为"cmy"的普通用户(client)。
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.cmy
Error ENOENT: failed to find client.cmy in keyring
[root@ceph141 ~]# 

5.2 ceph用户的备份和恢复

	1.创建测试用户
[root@ceph141 ~]# ceph auth add client.cmy mon 'allow rwx' osd 'allow r pool=cmy-rbd'
added key for client.cmy
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.cmy
[client.cmy]
	key = AQBpq+xneZxCLhAAoPsxA/063t2Iy/qcw2zdcw==
	caps mon = "allow rwx"
	caps osd = "allow r pool=cmy-rbd"
[root@ceph141 ~]# 



	2.三种方式导出用户到文件,用于模拟备份
		2.1 先创建一个600的权限文件,然后再导入内容【官方推荐】
[root@ceph141 ~]# ceph-authtool --create-keyring ceph.client.cmy.keyring  # 说白了,只是创建了一个普通文件。
creating ceph.client.cmy.keyring
[root@ceph141 ~]# 
[root@ceph141 ~]# ll ceph.client.cmy.keyring 
-rw------- 1 root root 0 Feb  2 09:28 ceph.client.cmy.keyring
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.cmy -o ceph.client.cmy.keyring  # 将内容导出到指定文件
exported keyring for client.cmy
[root@ceph141 ~]# 
[root@ceph141 ~]# cat ceph.client.cmy.keyring
[client.cmy]
	key = AQBpq+xneZxCLhAAoPsxA/063t2Iy/qcw2zdcw==
	caps mon = "allow rwx"
	caps osd = "allow r pool=cmy-rbd"
[root@ceph141 ~]# 

		2.2 直接导出到文件,但是文件的权限是644
[root@ceph141 ~]# ceph auth export client.cmy -o cmy.keyring  # 也可以使用这种方式导入用户信息到文件
export auth(key=AQDtRLxl0V3wFRAA8Cz4Vaeey+k049B761iRZA==)
[root@ceph141 ~]# 
[root@ceph141 ~]# ll cmy.keyring 
-rw-r--r-- 1 root root 137 Apr  2 11:15 cmy.keyring
[root@ceph141 ~]# 
[root@ceph141 ~]# cat cmy.keyring
[client.cmy]
	key = AQBpq+xneZxCLhAAoPsxA/063t2Iy/qcw2zdcw==
	caps mon = "allow rwx"
	caps osd = "allow r pool=cmy-rbd"
[root@ceph141 ~]# 

		2.3 直接重定向到文件,权限默认为644
[root@ceph141 ~]# ceph auth get client.cmy > myuser.keyring
[root@ceph141 ~]# 
[root@ceph141 ~]# ll myuser.keyring
-rw-r--r-- 1 root root 137 Apr  2 11:16 myuser.keyring
[root@ceph141 ~]# 
[root@ceph141 ~]# cat myuser.keyring
[client.cmy]
	key = AQBpq+xneZxCLhAAoPsxA/063t2Iy/qcw2zdcw==
	caps mon = "allow rwx"
	caps osd = "allow r pool=cmy-rbd"
[root@ceph141 ~]# 
 
 
	3.删除用户
[root@ceph141 ~]# ceph auth get client.cmy 
[client.cmy]
	key = AQDtRLxl0V3wFRAA8Cz4Vaeey+k049B761iRZA==
	caps mon = "allow rwx"
	caps osd = "allow r pool=cmy-rbd"
exported keyring for client.cmy
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth del client.cmy 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.cmy 
Error ENOENT: failed to find client.cmy in keyring
[root@ceph141 ~]# 


	4.导入用户,用于模拟恢复
[root@ceph141 ~]# cat ceph.client.cmy.keyring 
[client.cmy]
	key = AQDtRLxl0V3wFRAA8Cz4Vaeey+k049B761iRZA==
	caps mon = "allow rwx"
	caps osd = "allow r pool=cmy-rbd"
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.cmy 
Error ENOENT: failed to find client.cmy in keyring
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth import -i ceph.client.cmy.keyring 
imported keyring
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.cmy 
[client.cmy]
	key = AQDtRLxl0V3wFRAA8Cz4Vaeey+k049B761iRZA==
	caps mon = "allow rwx"
	caps osd = "allow r pool=cmy-rbd"
exported keyring for client.cmy
[root@ceph141 ~]# 
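
温馨提示:
	也可以写一个小脚本把所有client用户的keyring批量导出备份,下面是一个示意脚本(假设备份目录为/backup/ceph-keyrings,在管理节点执行),仅供参考:

#!/bin/bash
# 批量备份所有client用户的keyring
BACKUP_DIR=/backup/ceph-keyrings
mkdir -p "${BACKUP_DIR}"
for user in $(ceph auth ls 2>/dev/null | grep '^client\.'); do
    ceph auth get "${user}" -o "${BACKUP_DIR}/ceph.${user}.keyring"
done
ls -l "${BACKUP_DIR}"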

导出授权文件并验证用户权限

1.创建用户
[root@ceph141 ~]# ceph auth get-or-create client.k3s mon 'allow r'  osd 'allow * pool=cmy'
[client.k3s]
	key = AQDvrOxn2rd5EBAATTk4WdDFbFw3ecT/RRfiTQ==
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.k3s
[client.k3s]
	key = AQDvrOxn2rd5EBAATTk4WdDFbFw3ecT/RRfiTQ==
	caps mon = "allow r"
	caps osd = "allow * pool=cmy"
[root@ceph141 ~]# 


	2.导出用户授权文件,钥匙环(keyring)
[root@ceph141 ~]# ceph auth export client.k3s -o ceph.client.k3s.keyring
[root@ceph141 ~]# 
[root@ceph141 ~]# cat ceph.client.k3s.keyring
[client.k3s]
	key = AQDvrOxn2rd5EBAATTk4WdDFbFw3ecT/RRfiTQ==
	caps mon = "allow r"
	caps osd = "allow * pool=cmy"
[root@ceph141 ~]# 

	3.拷贝授权文件前,观察客户端是否有查看集群的权限
[root@prometheus-server31 ~]# rm -f /etc/ceph/ceph.c*
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /etc/ceph/
total 20
drwxr-xr-x   2 root root  4096 Apr  2 11:21 ./
drwxr-xr-x 132 root root 12288 Apr  2 06:39 ../
-rw-r--r--   1 root root    92 Dec 18 22:48 rbdmap
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]#  ceph -s
Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')
[root@prometheus-server31 ~]# 


	4.将授权文件拷贝到客户端
[root@ceph141 ~]# scp ceph.client.k3s.keyring /etc/ceph/ceph.conf  10.168.10.31:/etc/ceph


	5.验证权限
[root@prometheus-server31 ~]# ceph -s --user k3s
  cluster:
    id:     11e66474-0e02-11f0-82d6-4dcae3d59070
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 41h)
    mgr: ceph141.mbakds(active, since 41h), standbys: ceph142.qgifwo
    osd: 9 osds: 9 up (since 41h), 9 in (since 41h)
 
  data:
    pools:   2 pools, 9 pgs
    objects: 298 objects, 643 MiB
    usage:   2.8 GiB used, 5.3 TiB / 5.3 TiB avail
    pgs:     9 active+clean
 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd -p cmy ls  -l --user k3s
NAME               SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001     20 GiB            2        excl
node-exporter      20 GiB            2            
prometheus-server  40 GiB            2        excl
rbd: --user is deprecated, use --id
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd -p cmy ls  -l --id k3s
NAME               SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001     20 GiB            2        excl
node-exporter      20 GiB            2            
prometheus-server  40 GiB            2        excl
[root@prometheus-server31 ~]# 



	6.服务端创建rbd块设备
[root@ceph141 ~]# ceph osd pool ls
.mgr
cmy
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool create linux97 16 16 --autoscale_mode off --size 3
pool 'linux97' created
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls
.mgr
cmy
linux97
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool application enable linux97 rbd  # 声明存储池的类型
enabled application 'rbd' on pool 'linux97'
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd create -s 2G linux97/xixi
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd create -s 4G linux97/haha
[root@ceph141 ~]# 
[root@ceph141 ~]# rbd ls -l linux97
NAME  SIZE   PARENT  FMT  PROT  LOCK
haha  4 GiB            2            
xixi  2 GiB            2            
[root@ceph141 ~]# 


	7.客户端验证 
[root@prometheus-server31 ~]# cat /etc/ceph/ceph.client.k3s.keyring 
[client.k3s]
	key = AQDvrOxn2rd5EBAATTk4WdDFbFw3ecT/RRfiTQ==
	caps mon = "allow r"
	caps osd = "allow * pool=cmy"
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd -p linux97 ls  -l --id k3s
2025-05-19T10:17:34.003+0800 7f1310d344c0 -1 librbd::api::Image: list_images: error listing v1 images: (1) Operation not permitted

rbd: listing images failed: (1) Operation not permitted
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# rbd -p cmy ls  -l --id k3s
NAME               SIZE    PARENT  FMT  PROT  LOCK
child-xixi-001     20 GiB            2        excl
node-exporter      20 GiB            2            
prometheus-server  40 GiB            2        excl
[root@prometheus-server31 ~]# 

	
	
	
	
- 授权文件加载顺序总结:
	1 如果使用"--user k3s"或者"--id k3s"指定用户,则默认去找以下文件,找不到就报错:
- /etc/ceph/ceph.client.k3s.keyring
- /etc/ceph/ceph.keyring
- /etc/ceph/keyring
- /etc/ceph/keyring.bin


	2 如果不使用"--user"或者"--id"选项,咱们可以理解为默认为"--user admin"
- /etc/ceph/ceph.client.admin.keyring
- /etc/ceph/ceph.keyring
- /etc/ceph/keyring
- /etc/ceph/keyring.bin
		

	3 认证文件不能随意命名。
需要遵循上述两条所列的命名规范,否则ceph无法识别该用户的keyring文件
	

	4 客户端在连接ceph集群时,仅需要读取keyring文件中的KEY值。
其他caps字段会被忽略。也就是说,keyring文件中只保留key值依旧是有效的。 
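
下面用一个示意例子验证第4条(假设客户端仍使用上文的client.k3s用户及其key),仅供参考:

# 只保留key、去掉caps字段的keyring依旧可以完成认证
cat > /etc/ceph/ceph.client.k3s.keyring <<'EOF'
[client.k3s]
	key = AQDvrOxn2rd5EBAATTk4WdDFbFw3ecT/RRfiTQ==
EOF
ceph -s --id k3s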

6 ceph_fs

  1. CephFS概述
    RBD解决了远程磁盘挂载的问题,但无法做到多个主机同时共享一个磁盘,如果有一份数据很多客户端都要读写该怎么办呢?这时CephFS作为文件系统解决方案就派上用场了。

CephFS是POSIX兼容的文件系统,它直接使用Ceph存储集群来存储数据。Ceph文件系统与Ceph块设备、提供S3和Swift API的Ceph对象存储以及原生库(librados)的实现机制稍有不同。

CephFS支持内核模块或者fuse两种方式访问,如果宿主机没有安装ceph内核模块,则可以考虑使用fuse方式访问。可以通过"modinfo ceph"来检查当前宿主机是否有ceph相关内核模块。

2.CephFS架构原理

CephFS需要至少运行一个元数据服务器(MDS)守护进程(ceph-mds),此进程管理与CephFS上存储文件相关的元数据信息。

MDS虽然称为元数据服务,但它本身并不持久化存储元数据,元数据最终保存在RADOS的元数据存储池中;MDS负责在内存中缓存、管理这些元数据,为客户端提供文件系统的访问入口。

客户端在访问文件接口时,首先连接到MDS,由MDS在内存里维护元数据的索引信息,从而间接找到应该去哪个数据节点读取数据。这一点和HDFS文件系统的NameNode机制类似。

3.CephFS和NFS对比   

#面试题
<font color="#9bbb59">相较于NFS来说,它主要有以下特点优势:</font>
<font color="#9bbb59"> – 1.底层具备数据冗余功能,底层的RADOS提供了基本的数据冗余能力,因此不存在NFS那样的单点故障因素;</font>
<font color="#9bbb59"> – 2.底层RADOS系统由N个存储节点组成,所以数据的存储可以分散I/O,吞吐量较高;</font>
<font color="#9bbb59"> – 3.底层RADOS系统由N个存储节点组成,所以ceph提供的扩展性相当高;</font>

6.1 cephFS的一主一从架构部署

推荐阅读:
	https://docs.ceph.com/en/reef/cephfs/createfs/


	1.创建两个存储池分别用于存储mds的元数据和数据
[root@ceph141 ~]# ceph -s
  cluster:
    id:     11e66474-0e02-11f0-82d6-4dcae3d59070
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 41h)
    mgr: ceph141.mbakds(active, since 41h), standbys: ceph142.qgifwo
    osd: 9 osds: 9 up (since 41h), 9 in (since 41h)
 
  data:
    pools:   3 pools, 25 pgs
    objects: 307 objects, 644 MiB
    usage:   2.8 GiB used, 5.3 TiB / 5.3 TiB avail
    pgs:     25 active+clean
 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool create cephfs_data
pool 'cephfs_data' created
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool create cephfs_metadata
pool 'cephfs_metadata' created
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail | grep  ceph
pool 12 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 431 lfor 0/0/429 flags hashpspool stripe_width 0 read_balance_score 2.25
pool 13 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 431 lfor 0/0/429 flags hashpspool stripe_width 0 read_balance_score 3.09
[root@ceph141 ~]# 


	2.创建一个文件系统,名称为"cmy-cephfs"
[root@ceph141 ~]# ceph fs new cmy-cephfs cephfs_metadata cephfs_data
  Pool 'cephfs_data' (id '12') has pg autoscale mode 'on' but is not marked as bulk.
  Consider setting the flag by running
    # ceph osd pool set cephfs_data bulk true
new fs with metadata pool 13 and data pool 12
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail | grep  ceph
pool 12 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 435 lfor 0/0/429 flags hashpspool stripe_width 0 application cephfs read_balance_score 2.25
pool 13 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 435 lfor 0/0/429 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs read_balance_score 3.09
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get cephfs_data bulk 
bulk: false
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool set cephfs_data bulk true  # 标记'cephfs_data'存储池为大容量
set pool 12 bulk to true
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get cephfs_data bulk 
bulk: true
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail | grep  ceph  # 注意观察pg的数量
pool 12 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode on last_change 444 lfor 0/0/442 flags hashpspool,bulk stripe_width 0 application cephfs read_balance_score 1.55
pool 13 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 435 lfor 0/0/429 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs read_balance_score 3.09
[root@ceph141 ~]# 


	3.查看创建的文件系统
[root@ceph141 ~]# ceph fs ls
name: cmy-cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph mds stat
cmy-cephfs:0
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph -s
  cluster:
    id:     11e66474-0e02-11f0-82d6-4dcae3d59070
    health: HEALTH_ERR
            1 filesystem is offline
            1 filesystem is online with fewer MDS than max_mds
 
  services:
    mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 41h)
    mgr: ceph141.mbakds(active, since 41h), standbys: ceph142.qgifwo
    mds: 0/0 daemons up
    osd: 9 osds: 9 up (since 41h), 9 in (since 41h)
 
  data:
    volumes: 1/1 healthy
    pools:   5 pools, 313 pgs
    objects: 307 objects, 645 MiB
    usage:   2.9 GiB used, 5.3 TiB / 5.3 TiB avail
    pgs:     313 active+clean
 
[root@ceph141 ~]# 


	4.应用mds的文件系统
[root@ceph141 ~]# ceph orch apply mds cmy-cephfs
Scheduled mds.cmy-cephfs update...
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph fs ls
name: cmy-cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph mds stat
cmy-cephfs:1 {0=cmy-cephfs.ceph142.pmzglk=up:active} 1 up:standby
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph -s
  cluster:
    id:     11e66474-0e02-11f0-82d6-4dcae3d59070
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 41h)
    mgr: ceph141.mbakds(active, since 41h), standbys: ceph142.qgifwo
    mds: 1/1 daemons up, 1 standby
    osd: 9 osds: 9 up (since 41h), 9 in (since 42h)
 
  data:
    volumes: 1/1 healthy
    pools:   5 pools, 313 pgs
    objects: 329 objects, 645 MiB
    usage:   2.9 GiB used, 5.3 TiB / 5.3 TiB avail
    pgs:     313 active+clean
 
  io:
    client:   5.8 KiB/s rd, 0 B/s wr, 5 op/s rd, 0 op/s wr
 
[root@ceph141 ~]# 


	5.查看cephFS集群的详细信息
[root@ceph141 ~]# ceph fs status cmy-cephfs   # 不难发现目前活跃提供服务是ceph142,备用的是ceph143
cmy-cephfs - 0 clients
================
RANK  STATE                 MDS                   ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cmy-cephfs.ceph142.pmzglk  Reqs:    0 /s    10     13     12      0   
      POOL         TYPE     USED  AVAIL  
cephfs_metadata  metadata  96.0k  1730G  
  cephfs_data      data       0   1730G  
          STANDBY MDS            
cmy-cephfs.ceph143.scwesv  
MDS version: ceph version 19.2.1 (58a7fab8be0a062d730ad7da874972fd3fba59fb) squid (stable)
[root@ceph141 ~]# 

6.2 客户端使用

借助内核ceph模块挂载

关于secretfile选项说明如下:
secretfile选项在新版内核的ceph模块中已不被支持,但ceph-fuse目前仍然支持该选项。

如果你出现报错: 'ceph: Unknown parameter 'secretfile'',就说明当前内核的ceph模块已不再支持secretfile功能啦~。

目前在ceph 19.2.2配套的实验环境中,是无法使用secretfile的。

但是早期版本是支持secretfile功能的,毕竟在ceph 14.2.22版本中是可以正常使用的。

使用'modinfo ceph'可以确认当前内核是否提供ceph模块;如果内核版本低于4.x,则需要考虑使用fuse方式挂载。

1.管理节点创建用户并导出钥匙环和key文件
		1.1 创建用户并授权
[root@ceph141 ~]# ceph auth add  client.cephfs mon 'allow r' mds 'allow rw' osd 'allow rwx'
added key for client.cephfs
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph auth get client.cephfs
[client.cephfs]
	key = AQBopypoi7WlERAAD7D+OyN/5dWx0pVU6cUu1Q==
	caps mds = "allow rw"
	caps mon = "allow r"
	caps osd = "allow rwx"
[root@ceph141 ~]# 



		1.2 导出认证信息
[root@ceph141 ~]# ceph auth  print-key  client.cephfs > cephfs.key
[root@ceph141 ~]# 
[root@ceph141 ~]# cat cephfs.key ; echo
AQBopypoi7WlERAAD7D+OyN/5dWx0pVU6cUu1Q==
[root@ceph141 ~]# 



	2.基于KEY进行挂载,无需拷贝秘钥文件!
	 mkdir /data
 
 mount -t ceph 10.168.10.141:6789,10.168.10.142:6789,10.168.10.143:6789:/ /data -o name=cephfs,secret=AQBopypoi7WlERAAD7D+OyN/5dWx0pVU6cUu1Q==
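
温馨提示:
	如果集群中同时存在多个文件系统,可以通过fs=选项指定要挂载的文件系统名称,下面是一个示意写法(假设文件系统为cmy-cephfs,key与上文的client.cephfs用户一致),仅供参考:

mount -t ceph 10.168.10.141:6789,10.168.10.142:6789,10.168.10.143:6789:/ /data \
      -o name=cephfs,secret=AQBopypoi7WlERAAD7D+OyN/5dWx0pVU6cUu1Q==,fs=cmy-cephfs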

基于用户空间fuse方式访问

1.FUSE概述

对于某些操作系统来说,它没有提供对应的ceph内核模块,我们还需要使用CephFS的话,可以通过FUSE方式来实现。

FUSE英文全称为:"Filesystem in Userspace",工作在用户空间,这意味着ceph-fuse挂载的性能不如内核(ceph)驱动程序挂载,但它们更容易管理和升级。

用于非特权用户能够无需操作内核而创建文件系统,但需要单独安装"ceph-fuse"程序包,ceph-fuse可以作为CephFS内核驱动程序的替代品来挂载CephFS文件系统。

ceph-fuse一般用于较低的Linux 4.X- 内核。

2.安装ceph-fuse程序包
[root@node-exporter43 ~]# apt -y install ceph-fuse 


	3.创建挂载点
[root@node-exporter43 ~]# mkdir -pv /cmy/cephfs
mkdir: created directory '/cmy'
mkdir: created directory '/cmy/cephfs'
[root@node-exporter43 ~]# 


	4.拷贝认证文件
[root@ceph141 ~]# cat ceph.client.cephfs.keyring 
[client.cephfs]
	key = AQBopypoi7WlERAAD7D+OyN/5dWx0pVU6cUu1Q==
	caps mds = "allow rw"
	caps mon = "allow r"
	caps osd = "allow rwx"
[root@ceph141 ~]# 
[root@ceph141 ~]# scp ceph.client.cephfs.keyring  10.168.10.43:/tmp


	5.使用ceph-fuse工具挂载cephFS
[root@node-exporter43 ~]# ceph-fuse -n client.cephfs -m 10.168.10.141:6789,10.168.10.142:6789,10.168.10.143:6789 /cmy/cephfs/ -c /tmp/ceph.client.cephfs.keyring
2025-05-19T14:38:17.176+0800 7f6022acd3c0 -1 init, newargv = 0x55602fd91f60 newargc=13
ceph-fuse[2392]: starting ceph client

2025-05-19T14:38:17.176+0800 7f6022acd3c0 -1 init, args.argv = 0x55602fd920b0 args.argc=4

ceph-fuse[2392]: starting fuse
[root@node-exporter43 ~]# 
[root@node-exporter43 ~]# df -h | grep cmy
ceph-fuse                          1.7T     0  1.7T   0% /cmy/cephfs
[root@node-exporter43 ~]# 
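
温馨提示:
	ceph-fuse挂载点的卸载方式示意如下,仅供参考:

fusermount -u /cmy/cephfs    # 或者直接 umount /cmy/cephfs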

cephFS两种方式开机自动挂载实战

1.基于rc.local脚本方式开机自动挂载【推荐,就算执行失败,并不会导致系统无法启动~】
		1.1 修改启动脚本
[root@prometheus-server31 ~]# cat /etc/rc.local 
#!/bin/bash

install -d /data

mount -t ceph 10.168.10.141:6789,10.168.10.142:6789,10.168.10.143:6789:/ /data -o name=cephfs,secret=AQBopypoi7WlERAAD7D+OyN/5dWx0pVU6cUu1Q==

[root@prometheus-server31 ~]# 


		1.2 重启服务器 
[root@prometheus-server31 ~]# reboot 

		1.3 测试验证
[root@prometheus-server31 ~]# df -h | grep  data
10.168.10.141:6789,10.168.10.142:6789,10.168.10.143:6789:/  1.7T     0  1.7T   0% /data
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# ll /data/
total 7
drwxr-xr-x  2 root root    4 May 19 14:38 ./
drwxr-xr-x 22 root root 4096 May 19 14:48 ../
-rw-r--r--  1 root root  657 May 19 11:44 fstab
-rw-r--r--  1 root root   16 May 19 14:38 hostname
-rw-r--r--  1 root root  226 May 19 12:07 hosts
-rw-r--r--  1 root root  427 May 19 11:44 os-release
[root@prometheus-server31 ~]# 



	2.基于fstab文件进行开机自动挂载(危险!!!,执行失败,可能导致系统无法启动,需要使用'救援模式'!)
		2.1 修改fstab的配置文件
[root@prometheus-server32 ~]# tail -1 /etc/fstab 
10.168.10.141:6789,10.168.10.142:6789,10.168.10.143:6789:/ /data  ceph name=cephfs,secret=AQBopypoi7WlERAAD7D+OyN/5dWx0pVU6cUu1Q==,noatime,_netdev    0       2
[root@prometheus-server32 ~]# 
 
		2.2 重启系统 
[root@prometheus-server32 ~]# reboot 


		2.3 测试验证
[root@prometheus-server32 ~]# df -h | grep data
10.168.10.141:6789,10.168.10.142:6789,10.168.10.143:6789:/  1.7T     0  1.7T   0% /data
[root@prometheus-server32 ~]# 
[root@prometheus-server32 ~]# ll /data/
total 7
drwxr-xr-x  2 root root    4 May 19 14:38 ./
drwxr-xr-x 22 root root 4096 May 19 14:47 ../
-rw-r--r--  1 root root  657 May 19 11:44 fstab
-rw-r--r--  1 root root   16 May 19 14:38 hostname
-rw-r--r--  1 root root  226 May 19 12:07 hosts
-rw-r--r--  1 root root  427 May 19 11:44 os-release
[root@prometheus-server32 ~]# 

cephFS删除操作

1. **先失效**:让文件系统停止服务 (`fs fail`)
    ceph fs fail cmy-cephfs
2. **再删除逻辑定义**:移除文件系统记录 (`fs rm`)
    ceph fs rm cmy-cephfs --yes-i-really-mean-it
3. **最后清理物理存储**:删除底层存储池 (`pool delete`)
 ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
 ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it

 确保:
    - 没有客户端正在访问
    - 已备份重要数据
    - 在维护窗口期操作
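
温馨提示:
	如果删除存储池时被mon的安全策略拒绝,通常需要先临时放开mon_allow_pool_delete开关。下面是按照上述步骤整理的一个示意操作(假设文件系统和存储池名称与上文一致),仅供参考:

ceph config set mon mon_allow_pool_delete true    # 临时允许删除存储池
ceph fs fail cmy-cephfs
ceph fs rm cmy-cephfs --yes-i-really-mean-it
ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
ceph config set mon mon_allow_pool_delete false   # 操作完成后建议改回默认值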

7 ceph集群的维护命令

7.1 集群状态检查命令

  1. 检查集群整体状态

    ceph -s  # 或 ceph status
    
  2. 查看集群健康详情

    ceph health detail
    
  3. 查看OSD状态

    ceph osd stat
    ceph osd tree
    
  4. 查看MON状态

    ceph mon stat
    ceph mon dump
    
  5. 查看PG状态

    ceph pg stat
    ceph pg dump
    

7.2 OSD维护命令

  1. 停止OSD

    systemctl stop ceph-osd@<osd_id>
    
  2. 启动OSD

    systemctl start ceph-osd@<osd_id>
    
  3. 将OSD移出集群

    ceph osd out <osd_id>
    
  4. 将OSD重新加入集群

    ceph osd in <osd_id>
    
  5. 删除OSD

    ceph osd crush remove osd.<osd_id>
    ceph auth del osd.<osd_id>
    ceph osd rm <osd_id>
    
  6. 查看OSD使用情况

    ceph osd df
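
温馨提示:
	对于cephadm部署的集群,也可以使用编排器一步完成OSD的下线与清理,下面是一个示意用法(假设要移除osd.11),仅供参考:

ceph orch osd rm 11 --zap     # 等待数据迁移完成后自动清理对应磁盘
ceph orch osd rm status       # 查看移除进度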
    

7.3 Pool操作命令

  1. 列出所有pool

    ceph osd lspools
    
  2. 创建pool

    ceph osd pool create <pool_name> <pg_num> <pgp_num>
    
  3. 删除pool

    ceph osd pool delete <pool_name> <pool_name> --yes-i-really-really-mean-it
    
  4. 设置pool配额

    ceph osd pool set-quota <pool_name> max_objects|max_bytes <val>
    

7.4 数据平衡与恢复

  1. 手动触发数据重平衡

    ceph osd reweight-by-utilization
    
  2. 查看数据恢复状态

    ceph -w  # 监控集群事件
    
  3. 限制恢复/回填速度

    ceph osd set norecover
    ceph osd set nobackfill
    ceph osd unset norecover
    ceph osd unset nobackfill
    

7.5 集群配置管理

  1. 获取配置参数

    ceph config get <daemon_type>.<id> <config_param>
    
  2. 设置配置参数

    ceph config set <daemon_type>.<id> <config_param> <value>
    
  3. 显示所有配置

    ceph config dump
    

7.6 升级与维护模式

  1. 进入维护模式

    ceph osd set noout
    ceph osd set norebalance
    ceph osd set norecover
    
  2. 退出维护模式

    ceph osd unset noout
    ceph osd unset norebalance
    ceph osd unset norecover
    
  3. 检查版本

    ceph versions
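
可以把第1、2步的维护模式开关封装成一个小脚本,下面是一个示意写法(假设保存为ceph-maint.sh),仅供参考:

#!/bin/bash
# 用法: ./ceph-maint.sh enter|exit
set -e
FLAGS="noout norebalance norecover"
case "$1" in
  enter) for f in ${FLAGS}; do ceph osd set   "${f}"; done ;;
  exit)  for f in ${FLAGS}; do ceph osd unset "${f}"; done ;;
  *) echo "Usage: $0 enter|exit"; exit 1 ;;
esac
ceph health detail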
    
7.7 cephadm编排器(orch)常用命令

	1.查看集群已部署的服务列表
[root@ceph141 ~]# ceph orch ls



	2.查看ceph集群的守护进程
[root@ceph141 ~]# ceph orch  ps


	3.查看指定节点的守护进程
[root@ceph141 ~]# ceph orch  ps ceph143


	4.重启指定节点守护进程服务
[root@ceph141 ~]# ceph orch daemon restart node-exporter.ceph143 


	5.查看主机有哪些设备列表
[root@ceph141 ~]# ceph orch device ls



	6.查看集群有哪些主机列表
[root@ceph141 ~]# ceph orch host ls
	7.报告配置的后端及其状态
[root@ceph141 ~]# ceph orch status


	8.检查服务版本与可用和目标容器

[root@ceph141 ~]# ceph orch upgrade check quay.io/ceph/ceph:v19


	9.查看指定服务的信息

[root@ceph141 ~]# ceph orch ps --service_name alertmanager


8 Prometheus监控ceph集群实战

	1.查看ceph集群自带的监控组件守护进程
[root@ceph141 ~]# ceph orch ps --service_name prometheus
NAME                HOST     PORTS   STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
prometheus.ceph141  ceph141  *:9095  running (7h)     9m ago   3d     125M        -  2.51.0   1d3b7f56885b  41fbac965052  
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch ps --service_name grafana
NAME             HOST     PORTS   STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
grafana.ceph141  ceph141  *:3000  running (7h)     9m ago   3d     164M        -  10.4.0   c8b91775d855  815590f0f32d  
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch ps --service_name alertmanager
NAME                  HOST     PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
alertmanager.ceph141  ceph141  *:9093,9094  running (7h)     9m ago   3d    32.5M        -  0.25.0   c8568f914cd2  4d6ff3e91442  
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch ps --service_name node-exporter 
NAME                   HOST     PORTS   STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
node-exporter.ceph141  ceph141  *:9100  running (7h)     9m ago   3d    19.1M        -  1.7.0    72c9c2088986  a481816d812d  
node-exporter.ceph142  ceph142  *:9100  running (4h)     9m ago   3d    18.3M        -  1.7.0    72c9c2088986  ed336cd5c446  
node-exporter.ceph143  ceph143  *:9100  running (5m)     5m ago   3d    2803k        -  1.7.0    72c9c2088986  ee1eb531d10f  
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch ps --service_name ceph-exporter 
NAME                   HOST     PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
ceph-exporter.ceph141  ceph141         running (7h)     9m ago   3d    25.4M        -  19.2.2   4892a7ef541b  27d6f69779fa  
ceph-exporter.ceph142  ceph142         running (4h)     9m ago   3d    12.7M        -  19.2.2   4892a7ef541b  c546792381a6  
ceph-exporter.ceph143  ceph143         running (7h)     5m ago   3d    31.8M        -  19.2.2   4892a7ef541b  6b55796c0016  
[root@ceph141 ~]# 


温馨提示:
	不难发现,有grafana,alertmanager,ceph-exporter,Prometheus等组件默认都是安装好的,说白了,无需手动安装。
	
	所以,基于cephadm方式部署的环境,可以直接使用Prometheus监控。若使用的ceph-deploy方式部署,则需要手动配置各组件。


	2.查看Prometheus的WEbUI
http://10.168.10.141:9095/targets?search=

	3.查看grafana的WebUI
https://10.168.10.141:3000/

	4.查看node-exporter
http://10.168.10.141:9100/metrics

	5.查看alertmanger
http://10.168.10.141:9093/#/status


	6.自实现Prometheus监控参考链接
推荐阅读:
	https://github.com/digitalocean/ceph_exporter
	https://github.com/blemmenes/radosgw_usage_exporter
	
	7.查看ceph集群的mgr模块列表
[root@ceph141 ~]# ceph mgr module ls 


	8.启用模块(可选操作)
[root@ceph141 ~]# ceph mgr module enable zabbix
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph mgr module ls 
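
温馨提示:
	如果使用外部自建的Prometheus,也可以启用mgr的prometheus模块来暴露集群指标(默认监听9283端口),示意如下,仅供参考:

ceph mgr module enable prometheus
curl -s http://10.168.10.141:9283/metrics | head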
     

9 RGW

9.1 部署

1 部署之前查看集群状态
[root@ceph141 ~]# ceph -s
  cluster:
    id:     48fcf2bc-31f6-11f0-8833-3507f15d877f
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 21h)
    mgr: ceph141.rzrqkk(active, since 28s), standbys: ceph142.rngppx
    mds: 1/1 daemons up, 1 standby
    osd: 9 osds: 9 up (since 21h), 9 in (since 3d)



	2 创建一个服务
[root@ceph141 ~]#  ceph orch apply rgw cmy
Scheduled rgw.cmy update...
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch ps | grep rgw
rgw.cmy.ceph141.izhdud       ceph141  *:80              running (44s)    37s ago  44s    50.4M        -  19.2.2   4892a7ef541b  40fa01bf0263  
rgw.cmy.ceph142.zrxqhd       ceph142  *:80              running (46s)    38s ago  46s    49.1M        -  19.2.2   4892a7ef541b  1ce771d74a30  
[root@ceph141 ~]# 


	3 部署rgw组件【可跳过,如果没有启动对应的daemon进程再尝试执行该命令】
[root@ceph141 ~]# ceph orch daemon add  rgw  cmy ceph142
Deployed rgw.cmy.ceph142.tmtpzs on host 'ceph142'
[root@ceph141 ~]# 

	4 检查rgw组件是否部署成功
[root@ceph141 ~]# ceph -s
  cluster:
    id:     48fcf2bc-31f6-11f0-8833-3507f15d877f
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 21h)
    mgr: ceph141.rzrqkk(active, since 2m), standbys: ceph142.rngppx
    mds: 1/1 daemons up, 1 standby
    osd: 9 osds: 9 up (since 21h), 9 in (since 3d)
    rgw: 2 daemons active (2 hosts, 1 zones)  # 观察你的集群是否有rgw组件
 
  data:
    volumes: 1/1 healthy
    pools:   9 pools, 457 pgs
    objects: 3.88k objects, 459 MiB
    usage:   2.7 GiB used, 5.3 TiB / 5.3 TiB avail
    pgs:     457 active+clean
	
	5 查看rgw默认创建的存储池信息
[root@ceph141 ~]# ceph osd pool ls
...
.rgw.root
default.rgw.log
default.rgw.control
default.rgw.meta
[root@ceph141 ~]# 


	6 查看rgw类型的服务信息及所在的主机列表
[root@ceph141 ~]# ceph  orch ls rgw  rgw.cmy --export
service_type: rgw
service_id: cmy
service_name: rgw.cmy
placement:
  count: 2
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch ps --service_name rgw.cmy
NAME                            HOST     PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
rgw.cmy.ceph141.afbnhc  ceph141  *:80   running (3m)     3m ago   3m    46.2M        -  19.2.2   4892a7ef541b  bbe03450c347
rgw.cmy.ceph142.aduybb  ceph142  *:80   running (3m)     3m ago   3m    48.6M        -  19.2.2   4892a7ef541b  2d9ea2c20350

[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch ps --daemon_type rgw
NAME                            HOST     PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
rgw.cmy.ceph141.izhdud  ceph141  *:80   running (3m)     3m ago   3m    50.4M        -  19.2.2   4892a7ef541b  40fa01bf0263  
rgw.cmy.ceph142.zrxqhd  ceph142  *:80   running (3m)     3m ago   3m    49.1M        -  19.2.2   4892a7ef541b  1ce771d74a30  
[root@ceph141 ~]# 


	7 访问对象存储的WebUI
http://10.168.10.141/
http://10.168.10.142/
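
温馨提示:
	ceph orch apply rgw也支持自定义部署节点和端口,下面是一个示意写法(假设服务名仍为cmy、端口改为8080),仅供参考:

ceph orch apply rgw cmy --placement="2 ceph141 ceph142" --port=8080
ceph orch ps --service_name rgw.cmy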

9.2 使用

s3cmd工具初始化配置

	1.安装s3cmd工具包 
[root@ceph141 ~]# echo 10.168.10.142 www.cmy.cn >> /etc/hosts  # 解析的ip地址只要是rgw服务器即可
[root@ceph141 ~]# 
[root@ceph141 ~]# apt -y install s3cmd


	2 创建rgw账号
[root@ceph141 ~]#  radosgw-admin user create --uid "cmy" --display-name "陈梦元"
{
    "user_id": "cmy",
    "display_name": "陈梦元",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "cmy",
            "access_key": "V3T5UTHGSER67S29IEO3",
            "secret_key": "OZiCeA0Qb3dRE2ChPbd47UmpUCuv2arVoyDacNVZ"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}


	
	3 运行s3cmd的运行环境,生成"/root/.s3cfg"配置文件
[root@ceph141 ~]# ll /root/.s3cfg
ls: cannot access '/root/.s3cfg': No such file or directory
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd --configure 

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: 5D7TCMGET1K5WHSI2GN6  # rgw账号的access_key
Secret Key: fGrXxkK4rBglEbQp1SJwr9JFbLxkIyAZO2kyLclz   # rgw账号的secret_key
Default Region [US]:  # 直接回车即可

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: www.cmy.cn  # 用于访问rgw的地址

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: www.cmy.cn/%(bucket)  # 设置DNS解析风格

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:  # 文件不加密,直接回车即可 
Path to GPG program [/usr/bin/gpg]:  # 指定自定义的gpg程序路径,直接回车即可

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: No  # 你的rgw是否是https,如果不是设置为No

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:   # 代理服务器的地址,我并没有配置代理服务器,因此直接回车即可

New settings:  # 注意,下面的信息是上面咱们填写时一个总的预览信息
  Access Key: 5D7TCMGET1K5WHSI2GN6
  Secret Key: fGrXxkK4rBglEbQp1SJwr9JFbLxkIyAZO2kyLclz
  Default Region: US
  S3 Endpoint: www.cmy.cn
  DNS-style bucket+hostname:port template for accessing a bucket: www.cmy.cn/%(bucket)
  Encryption password: 
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name: 
  HTTP Proxy server port: 0


Test access with supplied credentials? [Y/n] Y  # 如果确认上述信息没问题的话,则输入字母Y即可。
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y  # 是否保存配置,我们输入y,默认是不保存配置的。
Configuration saved to '/root/.s3cfg'
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# ll /root/.s3cfg
-rw------- 1 root root 2269 Aug 23 09:59 /root/.s3cfg
[root@ceph141 ~]# 

上传测试视频案例模拟抖音,快手

1. 创建buckets
[root@ceph141 ~]# s3cmd mb s3://cmy-bucket
Bucket 's3://cmy-bucket/' created

	2.查看buckets
[root@ceph141 ~]# s3cmd ls
2025-05-20 01:11  s3://cmy-bucket
[root@ceph141 ~]# 
[root@ceph141 ~]# radosgw-admin buckets list 
[
    "cmy-bucket"
]
[root@ceph141 ~]# 


	3.使用s3cmd上传数据到buckets
		3.1 准备静态文件【可以是视频,图片,音乐等文件】

		3.2 使用s3cmd上传文件到桶
s3cmd put 2.jpg s3://cmy-bucket
s3cmd put 7.jpg s3://cmy-bucket

		3.3 查看rgw服务端的桶数据
[root@ceph141 ~]# s3cmd ls s3://cmy-bucket
2025-05-20 01:36        74316  s3://cmy-bucket/2.jpg
2025-05-20 01:36       172190  s3://cmy-bucket/7.jpg
	4.使用s3cmd下载数据
[root@ceph141 ~]# s3cmd get  s3://cmy-bucket/2.jpg /tmp
download: 's3://cmy-bucket/2.jpg' -> '/tmp/2.jpg'  [1 of 1]
 74316 of 74316   100% in    0s     6.26 MB/s  done
[root@ceph141 ~]# ll /tmp | grep jpg
-rw-r--r--  1 root root 74316 May 20 01:36 2.jpg

	5.授权策略
[root@ceph141 ~]# cat cmy-anonymous-access-policy.json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["*"]},
    "Action": "s3:GetObject",
    "Resource": [
      "arn:aws:s3:::cmy-bucket/*"
    ]
  }]
}
	6.添加授权策略 

[root@ceph141 ~]# s3cmd setpolicy cmy-anonymous-access-policy.json s3://cmy-bucket

	7.访问测试 
http://10.168.10.142/cmy-bucket/2.jpg
http://10.168.10.142/cmy-bucket/7.jpg

	8.删除策略
s3cmd delpolicy  s3://cmy-bucket

	9.再次访问测试(无权限)
http://10.168.10.142/cmy-bucket/2.jpg
http://10.168.10.142/cmy-bucket/7.jpg
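
温馨提示:
	如果不想对匿名用户开放整个桶,也可以为单个对象生成带签名的临时访问URL,示意如下(假设有效期为3600秒),仅供参考:

s3cmd signurl s3://cmy-bucket/2.jpg +3600    # 生成1小时内有效的签名URL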

9.3 基础操作

查看buckets的列表

[root@ceph141 ~]# s3cmd ls
2025-05-20 01:11  s3://cmy-bucket

查看buckets的数据

[root@ceph141 ~]# s3cmd la
                          DIR  s3://cmy-bucket/xixi/
2025-05-20 01:36        74316  s3://cmy-bucket/2.jpg
2025-05-20 01:36       172190  s3://cmy-bucket/7.jpg

上传多个文件到buckets

链接文件默认会被跳过上传
[root@ceph141 ~]# s3cmd put /etc/os-release /etc/fstab /etc/passwd s3://cmy-bucket/xixi/

下载文件

[root@ceph141 ~]# s3cmd get s3://cmy-bucket/xixi/fstab /tmp/xixi

删除对象

[root@ceph141 ~]# s3cmd del s3://cmy-bucket/xixi/passwd
delete: 's3://cmy-bucket/xixi/passwd'

s3cmd del s3://cmy-bucket/xixi/ -r
delete: 's3://cmy-bucket/xixi/fstab'
delete: 's3://cmy-bucket/xixi/passwd'

上传目录的所有文件到bucket

 s3cmd sync /cmy/softwares/docker/ s3://cmy-bucket

查看某个bucket的大小

[root@ceph141 ~]# s3cmd du s3://cmy-bucket -H
 249M      14 objects s3://cmy-bucket/

获取bucket或者objects信息

 s3cmd info s3://cmy-bucket

拷贝数据

[root@ceph141 ~]# s3cmd cp s3://cmy-bucket/xixi/ s3://cmy

移动objects对象
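
s3cmd mv的用法与cp类似,下面是一个示意写法(假设把xixi目录下的fstab移动到cmy桶),仅供参考:

s3cmd mv s3://cmy-bucket/xixi/fstab s3://cmy/fstab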


查看正在上传的文件列表

 s3cmd multipart s3://cmy-bucket

递归删除bucket

s3cmd rb s3://cmy-bucket -r

修改对象的元数据

[root@ceph141 ~]# s3cmd ls -H s3://cmy
2025-05-20 02:10    37M  s3://cmy/containerd
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd modify s3://cmy/containerd
modify: 's3://cmy/containerd'  [1 of 1]
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd ls -H s3://cmy
2025-05-20 02:11    37M  s3://cmy/containerd

9.4 buckets权限策略

s3:PutObjectAcl
   上传对象。
   
s3:GetObject
   下载对象。
   
s3:CreateBucket
   创建buckets。 

s3:ListBucket
   查看buckets。
   
s3:DeleteObject
   删除对象。
   
s3:DeleteBucket
   删除buckets。
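
下面给出一个示意的策略文件,组合使用s3:ListBucket和s3:GetObject,允许匿名用户列出并下载cmy-bucket中的对象(注意ListBucket作用于桶本身的ARN,GetObject作用于对象ARN),仅供参考:

cat > cmy-list-get-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["*"]},
    "Action": ["s3:ListBucket", "s3:GetObject"],
    "Resource": ["arn:aws:s3:::cmy-bucket", "arn:aws:s3:::cmy-bucket/*"]
  }]
}
EOF
s3cmd setpolicy cmy-list-get-policy.json s3://cmy-bucket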

9.5 Python操作对象存储服务实战

参考链接:
		https://docs.ceph.com/en/squid/radosgw/s3/python/
		
		
	1.安装python环境
		1.1 安装pip工具包
[root@prometheus-server31 ~]# apt -y install python3-pip

		1.2 安装boto包
[root@prometheus-server31 ~]# pip install boto==2.49.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

	2.编写python程序
[root@prometheus-server31 ~]# cat rgw-cmy.py 
import boto
import boto.s3.connection

# 你账号的access_key和secret_key需要自行修改
access_key = 'V3T5UTHGSER67S29IEO3'
secret_key = 'OZiCeA0Qb3dRE2ChPbd47UmpUCuv2arVoyDacNVZ'

# 连接rgw
conn = boto.connect_s3(
        aws_access_key_id = access_key,
        aws_secret_access_key = secret_key,
        host = '10.168.10.142',
        is_secure=False,
        calling_format = boto.s3.connection.OrdinaryCallingFormat(),
        )

# 创建bucket
bucket = conn.create_bucket('cmy-rgw')

# 查看bucket列表
for bucket in conn.get_all_buckets():
        print("{name}\t{created}".format(
                name = bucket.name,
                created = bucket.creation_date,
        ))


# 查看bucket内容
for key in bucket.list():
        print("{name}\t{size}\t{modified}".format(
                name = key.name,
                size = key.size,
                modified = key.last_modified,
        ))

# 创建一个对象
key = bucket.new_key('blog.txt')
key.set_contents_from_string('https://www.cnblogs.com/cmy\n')

# 生成对象下载的URL
hello_key = bucket.get_key('blog.txt')
hello_url = hello_key.generate_url(0, query_auth=False, force_http=True)
print(hello_url)



	3.运行程序 
[root@prometheus-server31 ~]# python3 rgw-cmy.py 
cmy	2025-05-20T02:10:32.657Z
cmy-rgw	2025-05-20T02:45:47.121Z
http://10.168.10.142/cmy-rgw/blog.txt
[root@prometheus-server31 ~]#
 

	4.访问测试,发现无法访问!
[root@prometheus-server31 ~]# curl http://10.168.10.142/cmy-rgw/blog.txt;echo
<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message></Message><BucketName>cmy-rgw</BucketName><RequestId>tx0000090f801ec50aef41b-0067ee3899-64531-default</RequestId><HostId>64531-default-default</HostId></Error>
[root@prometheus-server31 ~]#



[root@prometheus-server31 ~]# python3 rgw-cmy.py  # 再次测试发现输出内容和之前有变化,原因是已经有bucket。
cmy	2025-05-20T02:10:32.657Z
cmy-rgw	2025-05-20T02:45:47.121Z
blog.txt	36	2025-05-20T02:45:47.231Z
http://10.168.10.142/cmy-rgw/blog.txt
[root@prometheus-server31 ~]# 


	4.使用s3cmd命令访问测试
[root@ceph141 /]#  s3cmd get s3://cmy-rgw/blog.txt
download: 's3://cmy-rgw/blog.txt' -> './blog.txt'  [1 of 1]
 36 of 36   100% in    0s   847.12 B/s  done
[root@ceph141 /]# 
[root@ceph141 /]# cat blog.txt 
https://www.cnblogs.com/cmy
[root@ceph141 /]# 


	5.创建访问策略
		5.1 编写策略配置文件
[root@ceph141 ~]# cat cmy-anonymous-access-policy2.json 
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["*"]},
    "Action": "s3:GetObject",
    "Resource": [
      "arn:aws:s3:::cmy-rgw/*"
    ]
  }]
}
[root@ceph141 ~]# 

		5.2 应用策略
[root@ceph141 ~]# s3cmd info s3://cmy-rgw
s3://cmy-rgw/ (bucket):
   Location:  default
   Payer:     BucketOwner
   Expiration Rule: none
   Policy:    none
   CORS:      none
   ACL:       尹正杰: FULL_CONTROL
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd setpolicy cmy-anonymous-access-policy2.json  s3://cmy-rgw
s3://cmy-rgw/: Policy updated
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd info s3://cmy-rgw
s3://cmy-rgw/ (bucket):
   Location:  default
   Payer:     BucketOwner
   Expiration Rule: none
   Policy:    {
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["*"]},
    "Action": "s3:GetObject",
    "Resource": [
      "arn:aws:s3:::cmy-rgw/*"
    ]
  }]
}

   CORS:      none
   ACL:       尹正杰: FULL_CONTROL
[root@ceph141 ~]# 


	6.再次访问测试 
[root@prometheus-server31 ~]# curl http://10.168.10.142/cmy-rgw/blog.txt
https://www.cnblogs.com/cmy

9.6 swift工具操作对象存储网关rgw

Swift API接口概述

1.1 什么是swift
swift的用户账号对应radosgw中的subuser(子用户),它隶属于某个事先存在的user(用户账号)。

Swift API的上下文中,存储桶以container表示,而非S3中的bucket,但二者在功能上相似,都是对象数据的容器。

Python swiftclient是一个用于和swift API交互的python客户端程序,它包含了Python API(swift模块)和一个swift命令。

swift命令可以通过Swift API完成容器和对象数据的管理操作。

swift实现的基本逻辑

  • 1.创建专属的用户名和子用户授权;
  • 2.安装专属的客户端命令和python模块;
  • 3.配置专属的认证配置文件;
  • 4.综合测试swift的资源对象管理;

swift命令行配置实战

	2.1 创建账号
[root@ceph141 ~]# radosgw-admin user create --uid "cmy" --display-name "尹正杰"
{
    "user_id": "cmy",
    "display_name": "尹正杰",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "cmy",
            "access_key": "TTI9MIAW9UG68M2IO27N",
            "secret_key": "4dxAfAMyRYaFNBytVK9zFDDcHVX7efHDLacm45IE"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}

[root@ceph141 ~]# 


	
			2.2 查看账号列表
[root@ceph141 ~]# radosgw-admin user list
[
    "dashboard",
    "cmy",
    "jasonyin"
]
[root@ceph141 ~]# 


			
			2.3 基于现有用户创建子用户
[root@ceph141 ~]#  radosgw-admin subuser create --uid=cmy --subuser=cmy:swift --access=full
{
    "user_id": "cmy",
    "display_name": "尹正杰",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [
        {
            "id": "cmy:swift",
            "permissions": "full-control"
        }
    ],
    "keys": [
        {
            "user": "cmy",
            "access_key": "Q2PK0VUWSVX91GVYEKPT",
            "secret_key": "wisUifNE5NMmi7qmnjcqx92e3RGtg7Kmx4FIrFix"
        }
    ],
    "swift_keys": [
        {
            "user": "cmy:swift",
            "secret_key": "Md5FS0XB5sCZOsvRXgFaUpciqXOSWc9Jgk3hD99B"
        }
    ],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}
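
温馨提示:
	如果需要重置子用户的swift秘钥,可以使用下面的示意命令重新生成(假设子用户为cmy:swift),仅供参考:

radosgw-admin key create --subuser=cmy:swift --key-type=swift --gen-secret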




		2.5 安装工具包
[root@prometheus-server31 ~]# apt -y install python-setuptools python3-pip

[root@prometheus-server31 ~]# pip install python-swiftclient==4.6.0



		2.6 创建并查看存储桶
[root@ceph141 ~]# swift -A http://10.168.10.142/auth -U cmy:swift -K "Md5FS0XB5sCZOsvRXgFaUpciqXOSWc9Jgk3hD99B" post cmy-swift

[root@ceph141 ~]# swift -A http://10.168.10.142/auth -U cmy:swift -K "Md5FS0XB5sCZOsvRXgFaUpciqXOSWc9Jgk3hD99B" list -l
           3         1354 2025-05-20 03:33:17 default-placement cmy-swift
           3         1354


[root@ceph141 ~]# s3cmd ls  # 很明显,swift的api和s3的API貌似并不兼容
2025-05-20 02:10  s3://cmy
2025-05-20 02:45  s3://cmy-rgw
温馨提示:
	从结果上来看,s3cmd和swift两种客户端并不互通,比如存储桶的展示和管理方式都不一样。

相关参数说明:
	-A 
		指定认证的URL
		
	-U 
		指定子用户的名称
		
	-K 
		指定KEY信息
	
		2.7 配置环境变量【便于操作】
cat .swift
export ST_AUTH=http://10.168.10.142/auth
export ST_USER=cmy:swift
export ST_KEY=Md5FS0XB5sCZOsvRXgFaUpciqXOSWc9Jgk3hD99B
[root@prometheus-server31 ~]# source  .swift 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# swift list
cmy-swift
[root@prometheus-server31 ~]# 



		2.8 上传文件到存储桶
[root@prometheus-server31 ~]# swift list cmy-swift
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# swift upload cmy-swift /etc/os-release  /etc/hosts /etc/fstab 
etc/os-release
etc/hosts
etc/fstab
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# swift list cmy-swift
etc/fstab
etc/hosts
etc/os-release
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# swift list cmy-swift -l
         657 2025-05-20 03:29:09                     None etc/fstab
         226 2025-05-20 03:29:09                     None etc/hosts
         427 2025-05-20 03:29:09                     None etc/os-release
        1310
[root@prometheus-server31 ~]# 

swift基础操作

查看存储桶列表
swift list


创建存储桶
swift post <container-name>
swift post my-container

删除存储桶
swift delete <container-name>
swift delete my-container


上传文件或目录
- 上传文件:
  swift upload <container-name> <file-path>
  swift upload my-container /path/to/file.txt
- 上传目录:
  swift upload <container-name> <directory-path>
  swift upload my-container /path/to/directory


下载文件或目录
- 下载文件:
  swift download <container-name> <object-name>
  swift download my-container file.txt
- 下载目录:
  swift download <container-name> --prefix=<prefix>
  swift download my-container --prefix=directory/


删除文件或目录
- 删除文件:
   
  swift delete <container-name> <object-name>
  swift delete my-container file.txt
- 删除目录:
   
  swift delete <container-name> --prefix=<prefix>
  swift delete my-container --prefix=directory/
**其他注意事项**
- 在生产环境中,建议配置 HTTPS 和访问控制策略,以确保数据的安全性。
- 如果需要与 OpenStack 集成,需要在 OpenStack 控制节点上配置 Swift 的 endpoint,并确保 Ceph 的 RADOS Gateway 服务正常运行。
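
另外,swift stat可以查看账户、容器和对象级别的统计信息,示意如下(假设已经source了上文的.swift环境变量),仅供参考:

swift stat                      # 账户级信息(容器数、对象数、总字节数)
swift stat cmy-swift            # 容器级信息
swift stat cmy-swift etc/fstab  # 对象级信息(大小、etag、时间戳)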

10 crush

10.1 CRUSH Map概述

CRUSH英文全称为"Controlled Replication Under Scalable Hashing",是Ceph的核心设计之一,它本质上是ceph存储集群使用的一种数据分发算法,类似于OpenStack Swift对象存储所使用的一致性哈希数据分布算法。

CRUSH算法通过接收多维参数,经过一定的计算确定客户端对象数据的存储位置,以此解决数据动态分发的问题。因此ceph客户端无需经过传统查表的方式来获取数据的索引,再根据索引去读取数据,只需通过CRUSH算法计算后直接和对应的OSD交互进行数据读写。这样,ceph就避免了查表这种传统中心化架构存在的单点故障、性能瓶颈以及不易扩展的缺陷。这也是ceph相较于其他分布式存储系统具有高扩展性、高可用和高性能特点的主要原因。

ceph中的寻址至少要经历以下三次映射:
– 1.File和object映射:
将文件数据切分为多个object数据块,便于数据的并行化处理。
– 2.Object和PG映射:
将文件数据切分后的每一个Object通过简单的Hash算法归到一个PG中。
– 3.PG和OSD映射:
将PG映射到主机实际的OSD数据磁盘上。

CRUSH算法提供了配置变更和数据动态再平衡等关键特性,而CRUSH算法存储数据对象的过程可通过CRUSH Map控制并进行自定义修改。CRUSH Map是对ceph集群物理拓扑结构、副本策略以及故障域等信息的抽象配置段,借助于CRUSH Map可以将数据伪随机地分布到集群的各个OSD上。

OSD出现异常的时候,为了避免故障风暴,往往会实现一个所谓的故障域。

10.2 自定义crush规则实战案例

环境准备

每个节点添加一块硬盘并加入

[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         7.68745  root default                               
-3         2.56248      host ceph141                           
 0    hdd  0.29300          osd.0         up   1.00000  1.00000
 1    hdd  0.48830          osd.1         up   1.00000  1.00000
 2    hdd  1.00000          osd.2         up   1.00000  1.00000
 9    hdd  0.78119          osd.9         up   1.00000  1.00000
-5         2.56248      host ceph142                           
 3    hdd  0.29300          osd.3         up   1.00000  1.00000
 4    hdd  0.48830          osd.4         up   1.00000  1.00000
 5    hdd  1.00000          osd.5         up   1.00000  1.00000
10    hdd  0.78119          osd.10        up   1.00000  1.00000
-7         2.56248      host ceph143                           
 6    hdd  0.29300          osd.6         up   1.00000  1.00000
 7    hdd  0.48830          osd.7         up   1.00000  1.00000
 8    hdd  1.00000          osd.8         up   1.00000  1.00000
11    hdd  0.78119          osd.11        up   1.00000  1.00000

从monitor节点上获取CRUSH map

ceph osd getcrushmap -o cmy-hdd.file

反编译为可读的文本文件
apt -y install ceph-base
crushtool -d cmy-hdd.file -o cmy-hdd-ssd.file 

修改规则并应用

cat cmy-hdd-ssd.file
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host ceph141 {
        id -3           # do not change unnecessarily
        id -4 class hdd         # do not change unnecessarily
        # weight 2.56248
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 0.29300
        item osd.1 weight 0.48830
        item osd.2 weight 1.00000
}
host ceph142 {
        id -5           # do not change unnecessarily
        id -6 class hdd         # do not change unnecessarily
        # weight 2.56248
        alg straw2
        hash 0  # rjenkins1
        item osd.3 weight 0.29300
        item osd.4 weight 0.48830
        item osd.5 weight 1.00000
}
host ceph143 {
        id -7           # do not change unnecessarily
        id -8 class hdd         # do not change unnecessarily
        # weight 2.56248
        alg straw2
        hash 0  # rjenkins1
        item osd.6 weight 0.29300
        item osd.7 weight 0.48830
        item osd.8 weight 1.00000
}
root default {
        id -1           # do not change unnecessarily
        id -2 class hdd         # do not change unnecessarily
        # weight 7.68745
        alg straw2
        hash 0  # rjenkins1
        item ceph141 weight 2.56248
        item ceph142 weight 2.56248
        item ceph143 weight 2.56248
}

host ceph141-ssd {
        id -13          # do not change unnecessarily
        id -14 class hdd                # do not change unnecessarily
        # weight 2.56248
        alg straw2
        hash 0  # rjenkins1
        item osd.9 weight 0.78119
}
host ceph142-ssd {
        id -15          # do not change unnecessarily
        id -16 class hdd                # do not change unnecessarily
        # weight 2.56248
        alg straw2
        hash 0  # rjenkins1
        item osd.10 weight 0.78119
}
host ceph143-ssd {
        id -17          # do not change unnecessarily
        id -18 class hdd                # do not change unnecessarily
        # weight 2.56248
        alg straw2
        hash 0  # rjenkins1
        item osd.11 weight 0.78119
}

root linux97 {
        id -11          # do not change unnecessarily
        id -12 class hdd                # do not change unnecessarily
        # weight 7.68745
        alg straw2
        hash 0  # rjenkins1
        item ceph141-ssd weight 2.56248
        item ceph142-ssd weight 2.56248
        item ceph143-ssd weight 2.56248
}
# rules
rule replicated_rule {
        id 0
        type replicated
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

rule cmy {
        id 99
        type replicated
        step take linux97
        step chooseleaf firstn 0 type host
        step emit
}

应用配置文件
[root@ceph141 ~]# crushtool -c cmy-hdd-ssd.file -o cmy-hdd-ssd.crushmap
[root@ceph141 ~]# 
[root@ceph141 ~]# file cmy-hdd-ssd.crushmap
cmy-hdd-ssd.crushmap: data
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd setcrushmap -i cmy-hdd-ssd.crushmap
46
[root@ceph141 ~]# 

验证crush规则

查看OSD信息
[root@ceph141 ~]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-11         7.68741  root linux97                                   
-13         2.56247      host ceph141-ssd                           
  9    hdd  0.78119          osd.9             up   1.00000  1.00000
-15         2.56247      host ceph142-ssd                           
 10    hdd  0.78119          osd.10            up   1.00000  1.00000
-17         2.56247      host ceph143-ssd                           
 11    hdd  0.78119          osd.11            up   1.00000  1.00000
 -1         7.68741  root default                                   
 -3         2.56247      host ceph141                               
  0    hdd  0.29300          osd.0             up   1.00000  1.00000
  1    hdd  0.48830          osd.1             up   1.00000  1.00000
  2    hdd  1.00000          osd.2             up   1.00000  1.00000
 -5         2.56247      host ceph142                               
  3    hdd  0.29300          osd.3             up   1.00000  1.00000
  4    hdd  0.48830          osd.4             up   1.00000  1.00000
  5    hdd  1.00000          osd.5             up   1.00000  1.00000
 -7         2.56247      host ceph143                               
  6    hdd  0.29300          osd.6             up   1.00000  1.00000
  7    hdd  0.48830          osd.7             up   1.00000  1.00000
  8    hdd  1.00000          osd.8             up   1.00000  1.00000
[root@ceph141 ~]# 
创建存储池

 ceph osd pool create pool-cmy 8 8 replicated cmy --autoscale-mode=off

查看存储池对应的规则id(注意观察pool的crush_rule字段对应的ID)
ceph osd pool ls detail | grep pool-cmy
pool 12 'pool-cmy' replicated size 3 min_size 2 crush_rule 99 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode off last_change 732 flags hashpspool stripe_width 0 read_balance_score 1.13


查看存储池的PG底层对应的OSD范围为[9-11,共计3块磁盘]

[root@ceph141 ~]# ceph pg ls-by-pool pool-cmy | awk '{print $1,$2,$16 }'
PG OBJECTS ACTING
12.0 0 [9,10,11]p9
12.1 0 [10,9,11]p10
12.2 0 [10,9,11]p10
12.3 1 [9,10,11]p9
12.4 0 [9,10,11]p9
12.5 0 [10,9,11]p10
12.6 0 [11,9,10]p11
12.7 0 [11,9,10]p11

* NOTE: depending

写入测试数据
rados put myhosts /etc/hosts -p pool-cmy

查看对象对应的OSD映射关系
[root@ceph141 ~]# ceph osd map pool-cmy myhosts
osdmap e778 pool 'pool-cmy' (12) object 'myhosts' -> pg 12.5a39779b (12.3) -> up ([9,10,11], p9) acting ([9,10,11], p9)
[root@ceph141 ~]#
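
温馨提示:
	在把新的CRUSH map应用到集群之前,也可以先用crushtool离线模拟映射结果,确认规则行为符合预期,示意如下(假设规则id为99、副本数为3),仅供参考:

crushtool -i cmy-hdd-ssd.crushmap --test --rule 99 --num-rep 3 --show-mappings | head
crushtool -i cmy-hdd-ssd.crushmap --test --rule 99 --num-rep 3 --show-statistics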

