Ceph: too many PGs per OSD
Before setting the PG count, you need to know three things.
1. Number of OSDs
ceph osd ls
Sample Output:
0
1
2
Here the total number of OSDs is three.
2. Number of Pools
ceph osd pool ls
or rados lspools
Sample Output:
rbd
images
vms
volumes
backups
Here the total number of pools is five.
3. Replication Count
ceph osd dump | grep repli
Sample Output:
pool 0 'rbd' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 38 flags hashpspool stripe_width 0
pool 1 'images' replicated size 2 min_size 2 crush_ruleset 1 object_hash rjenkins pg_num 30 pgp_num 30 last_change 40 flags hashpspool stripe_width 0
pool 2 'vms' replicated size 2 min_size 2 crush_ruleset 1 object_hash rjenkins pg_num 30 pgp_num 30 last_change 42 flags hashpspool stripe_width 0
pool 3 'volumes' replicated size 2 min_size 2 crush_ruleset 1 object_hash rjenkins pg_num 30 pgp_num 30 last_change 36 flags hashpspool stripe_width 0
pool 4 'backups' replicated size 2 min_size 2 crush_ruleset 1 object_hash rjenkins pg_num 30 pgp_num 30 last_change 44 flags hashpspool stripe_width 0
Each pool has a replication count ("replicated size") of two.
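The three inputs above can also be pulled out of the command output with a small script. This is a minimal sketch, not part of any Ceph tooling: the helper names count_lines and max_replica_size are hypothetical, and the parsing assumes the "replicated size N" layout shown in the sample output.

```shell
#!/bin/sh
# Hypothetical helper: count the non-empty lines on stdin.
# Works for `ceph osd ls` (one OSD id per line) and
# `ceph osd pool ls` (one pool name per line).
count_lines() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Hypothetical helper: read `ceph osd dump` text on stdin and print the
# largest "replicated size" value found on the pool lines.
max_replica_size() {
    awk '
        $1 == "pool" {
            for (i = 1; i + 2 <= NF; i++)
                if ($i == "replicated" && $(i + 1) == "size" && $(i + 2) + 0 > max)
                    max = $(i + 2) + 0
        }
        END { print max + 0 }
    '
}
```

On a live cluster this could be used as `ceph osd ls | count_lines`, `ceph osd pool ls | count_lines` and `ceph osd dump | max_replica_size`.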
Now let's get into the calculations.
Calculations:
Total PGs Calculation:
Total PGs = (Total_number_of_OSD * 100) / max_replication_count
This result must be rounded up to the nearest power of 2.
Example:
Number of OSDs: 3
Replication count: 2
Total PGs = (3 * 100) / 2 = 150. The next power of 2 above 150 is 256.
So the maximum recommended number of PGs is 256.
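The same arithmetic can be scripted. This is a minimal sketch of the formula above; the function names next_pow2 and total_pgs are hypothetical, and the division is rounded up before the power-of-2 step.

```shell
#!/bin/sh
# Round a positive integer up to the next power of 2.
# A value that is already a power of 2 is returned unchanged.
next_pow2() {
    n=$1
    p=1
    while [ "$p" -lt "$n" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

# Total PGs = (OSDs * 100) / replication count, rounded up to a power of 2.
# Usage: total_pgs <osd-count> <max-replication-count>
total_pgs() {
    osds=$1
    replicas=$2
    raw=$(( (osds * 100 + replicas - 1) / replicas ))   # ceiling division
    next_pow2 "$raw"
}
```

For the example cluster, `total_pgs 3 2` prints 256.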
You can also set the PG count for each pool.
PGs per pool calculation:
PGs per pool = ((Total_number_of_OSD * 100) / max_replication_count) / pool_count
This result must be rounded up to the nearest power of 2.
Example:
Number of OSDs: 3
Replication count: 2
Number of pools: 5
PGs per pool = ((3 * 100) / 2) / 5 = 150 / 5 = 30. The next power of 2 above 30 is 32.
So the number of PGs per pool is 32.
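As a sanity check, the per-pool figure can be turned back into an estimated number of PG replicas per OSD, which is the value the "too many PGs per OSD" warning is about. This is a rough sketch under stated assumptions: every pool gets the same pg_num and the same replication count, and the names pgs_per_pool and pgs_per_osd are hypothetical.

```shell
#!/bin/sh
# Round a positive integer up to the next power of 2.
next_pow2() {
    n=$1
    p=1
    while [ "$p" -lt "$n" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

# PGs per pool = ((OSDs * 100) / replicas) / pools, rounded up to a power of 2.
# Usage: pgs_per_pool <osd-count> <replication-count> <pool-count>
pgs_per_pool() {
    osds=$1
    replicas=$2
    pools=$3
    div=$((replicas * pools))
    raw=$(( (osds * 100 + div - 1) / div ))   # ceiling division
    next_pow2 "$raw"
}

# Estimated PG replicas per OSD = pools * pg_num * replicas / OSDs
# (integer division; assumes identical settings on every pool).
# Usage: pgs_per_osd <pool-count> <pg-num> <replication-count> <osd-count>
pgs_per_osd() {
    echo $(( $1 * $2 * $3 / $4 ))
}
```

For the example, `pgs_per_pool 3 2 5` prints 32 and `pgs_per_osd 5 32 2 3` prints 106, which is close to the target of about 100 PGs per OSD that the formula is built around.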
Power of 2 Table:
2^0 1
2^1 2
2^2 4
2^3 8
2^4 16
2^5 32
2^6 64
2^7 128
2^8 256
2^9 512
2^10 1024
Useful Commands
ceph osd pool create <pool-name> <pg-number> <pgp-number> - To create a new pool
ceph osd pool get <pool-name> pg_num - To get number of PG in a pool
ceph osd pool get <pool-name> pgp_num - To get number of PGP in a pool
ceph osd pool set <pool-name> pg_num <number> - To increase number of PG in a pool
ceph osd pool set <pool-name> pgp_num <number> - To increase number of PGP in a pool
* Usually pg_num and pgp_num are set to the same value.
How I fixed it in 12.2.4 Luminous:
The warning "too many PGs per OSD (380 > max 200)" can lead to many blocked requests.
First you need to set:
[global]
mon_max_pg_per_osd = 800 # < depends on your amount of PGs
osd max pg per osd hard ratio = 10 # < default is 2, try to set at least 5. It will be
mon allow pool delete = true # without it you can't remove a pool
Then restart all MONs and OSDs, one by one.
Check the value:
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph2.asok config get mon_max_pg_per_osd
ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok config get osd_max_pg_per_osd_hard_ratio
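To see what the two settings mean together: the warning fires when the PGs per OSD exceed mon_max_pg_per_osd, and an OSD stops accepting new PGs once it serves more than mon_max_pg_per_osd multiplied by the hard ratio. The sketch below only does that arithmetic; the function names hard_pg_limit and pg_warning are hypothetical, and the ratio is assumed to be a whole number because shell arithmetic is integer-only.

```shell
#!/bin/sh
# Hard per-OSD limit = mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio.
# Usage: hard_pg_limit <mon_max_pg_per_osd> <hard_ratio>
hard_pg_limit() {
    echo $(( $1 * $2 ))
}

# Print the warning text if the PGs per OSD exceed the configured maximum,
# otherwise print "ok".
# Usage: pg_warning <pgs-per-osd> <mon_max_pg_per_osd>
pg_warning() {
    if [ "$1" -gt "$2" ]; then
        echo "too many PGs per OSD ($1 > max $2)"
    else
        echo "ok"
    fi
}
```

With the Luminous defaults quoted above, `hard_pg_limit 200 2` prints 400; with the values from the config snippet, `hard_pg_limit 800 10` prints 8000, and `pg_warning 380 800` prints ok.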
Now look here:
rados lspools
ceph osd pool get .users.email pg_num
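In my case the default pg_num was 128 or something like that (my cluster is 4 years old and has been through a lot of upgrades and changes). In Luminous pg_num can only be increased in place, so to reduce it you create a smaller pool, copy the data over, and swap the names.
Be careful, the steps below delete the original pool: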
ceph osd pool create .users.email.new 8
rados cppool .users.email .users.email.new
ceph osd pool delete .users.email .users.email --yes-i-really-really-mean-it
ceph osd pool rename .users.email.new .users.email
ceph osd pool application enable .users.email rgw
If that isn't enough, find another pool you can shrink the same way.
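One way to pick the next candidate is to list the pools by pg_num, largest first. This is a minimal sketch that parses `ceph osd dump` text in the format shown in the sample output earlier; the function name pools_by_pg_num is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph osd dump` text on stdin and print
# "<pg_num> <pool-name>" for every pool, sorted by pg_num in descending order.
pools_by_pg_num() {
    awk '
        $1 == "pool" {
            name = $3
            gsub(/\047/, "", name)          # strip the single quotes around the name
            for (i = 1; i < NF; i++)
                if ($i == "pg_num") print $(i + 1), name
        }
    ' | sort -rn
}
```

On a live cluster this could be run as `ceph osd dump | pools_by_pg_num`. Whether a large pool can actually be shrunk still depends on how much data it holds.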