The Pit of the Century: systemd & symlink & ProtectHome

I was happily deploying a Ceph test environment when starting the OSD produced the following errors:

Sep 28 09:32:09 ceph-n1 ceph-osd-prestart.sh[17684]: /usr/lib/ceph/ceph-osd-prestart.sh: line 55: [: too many arguments
Sep 28 09:32:09 ceph-n1 ceph-osd[17724]: 2016-09-28 09:32:09.662028 7f92866a7800 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (13) Permission denied

The script was failing, so I opened ceph-osd-prestart.sh and looked at line 55:

20 data="/var/lib/ceph/osd/${cluster:-ceph}-$id"

53 # ensure ownership is correct
54 owner="`stat -c %U $data/.`"
55 if [ $owner != 'ceph' -a $owner != 'root' ]; then     
56     echo "ceph-osd data dir $data is not owned by 'ceph' or 'root'"
57     echo "you must 'chown -R ceph:ceph ...' or similar to fix ownership"
58     exit 1
59 fi

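Combined with the error message, my guess was that owner was not picking up the owner of the $data directory. So I started debugging the script, adding a few echo statements to check whether the stat command and the $data variable were behaving. Along the way I picked up a bit of knowledge about "@" in unit names and the "%" specifiers in unit files.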
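As an aside, the "[: too many arguments" message on line 55 is consistent with $owner being empty: the variable is unquoted, so an empty value disappears from the test expression altogether. The sketch below illustrates this by counting the words the test would receive, and then shows a quoted variant that copes with an empty owner. The function names count_test_words and owner_ok are made up for illustration and are not part of the Ceph script.

```shell
#!/bin/sh
# Count the words that "[" would receive when $owner is left unquoted.
count_test_words() {
    owner=$1
    # Deliberately unquoted, mirroring line 55 of ceph-osd-prestart.sh.
    set -- $owner != 'ceph' -a $owner != 'root'
    echo "$#"
}

# Hypothetical quoted variant of the ownership check (not the upstream code).
# Succeeds only when the owner is ceph or root; an empty owner is rejected
# cleanly instead of breaking the test expression.
owner_ok() {
    [ "$1" = 'ceph' ] || [ "$1" = 'root' ]
}

count_test_words ceph   # 7 words: a well-formed expression
count_test_words ""     # 5 words: both operands have vanished
```

With quoting, a failed stat would have led to the script's own "is not owned by 'ceph' or 'root'" message, which still points at the data directory rather than at the shell syntax.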

[root@ceph-n1 osd]# systemctl reset-failed ceph-osd@0
[root@ceph-n1 osd]# systemctl start ceph-osd@0
[root@ceph-n1 osd]# journalctl -xe

While debugging, I ran the commands above a hundred times~~, hundred times~~, times~~, times~~....

And then half a day was gone.

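The conclusion: $data was set correctly, the stat command was fine, and running that same stat command by hand worked, so the script itself could basically be ruled out. However, every command in ceph-osd-prestart.sh that touched $data failed, so I turned my attention to the $data directory.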

[root@ceph-n1 osd]# ll /var/lib/ceph/osd/
total 0
lrwxrwxrwx 1 root root 15 Sep 28 09:30 ceph-0 -> /home/ceph/osd0

It turned out that $data points to a symlink. Since running the same commands by hand worked fine, I began to suspect that systemd had some problem with symlinks.

Searching Google for "systemd symlink" mostly returned results about systemctl enable; I found nothing related to ExecStartPre.

https://bugzilla.redhat.com/show_bug.cgi?id=955379#c14
Lennart Poettering 2013-05-06 12:16:16 EDT
"systemctl enable" is about enabling vendor supplied unit files. It will only create and remove symlinks in /etc/ and /run/, that's all it does. So right now it's a pretty safe tool: it will create/override/remove the modifiable configuration via symlinks and strictly leave vendor supplied static data untouched, since it is stored in real files. However, if we suddenly allow enabling of symlinks, then this clear separation goes away.

This gets particularly nasty for disabling things, because that removes all symlinks to the destination file, and how should it know when to stop precisely?

So, yeah, I am pretty sure we shouldn't allow "enabling of symlinks".

What we should support however is enabling of unit files that are outside of the usualy search paths, via specifiying full absolute paths. i.e. "systemctl enable /var/lib/foo/bar.service" should link it to /etc/systemd/system/bar.service and do everything listed in [Install]. Now, I originally implemented things to work like that, but this might got broken one time...

Andrew, so if you'd call "systemctl enable" directly on the original unit file, instead of via a symlink, then everything should be fine for you, right?

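Lennart Poettering is the author of systemd. The gist of his reply: "systemctl enable" only creates and removes symlinks to unit files under /etc and /run, nothing more. For manageability and safety, the unit file being linked must be a real file, not a symlink. In addition, to make systemctl enable more flexible, it should accept an absolute path as its argument, so that unit files outside the default search paths are supported.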

All very interesting, and it is about symlinks, but it had nothing to do with my problem. Just as I was about to give up, I noticed two settings in ceph-osd@.service:

ProtectHome=true
ProtectSystem=full

With my limited English, my gut told me the case was about to crack. I immediately looked up the official documentation, which says:

ProtectHome=
   Takes a boolean argument or "read-only". If true, the directories
   /home, /root and /run/user are made inaccessible and empty for
   processes invoked by this unit. If set to "read-only", the three
   directories are made read-only instead. It is recommended to
   enable this setting for all long-running services (in particular
   network-facing ones), to ensure they cannot get access to private
   user data, unless the services actually require access to the
   user's private data. Note however that processes retaining the
   CAP_SYS_ADMIN capability can undo the effect of this setting.
   This setting is hence particularly useful for daemons which have
   this capability removed, for example with CapabilityBoundingSet=.
   Defaults to off.

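ProtectHome can be set to true, false, or read-only. When true, /home, /root, and /run/user are inaccessible to the service and appear empty. When read-only, the three directories are read-only for the service. When false, the service can access them normally. The default is false. To keep services away from users' private data, enabling this option is recommended for all long-running services.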
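A small pre-flight check along these lines would have saved me half a day. This is only a sketch: it assumes readlink -f is available (as in GNU coreutils), it covers only the three directories named in the documentation quoted above, and the helper names is_protecthome_path and check_data_dir are my own inventions.

```shell
#!/bin/sh
# Return 0 if an already-resolved absolute path lies under one of the
# directories that the quoted documentation lists for ProtectHome=.
is_protecthome_path() {
    case "$1" in
        /home|/home/*|/root|/root/*|/run/user|/run/user/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Resolve symlinks first, then classify the real location.
# Returns 0 if the path is unaffected, 1 if ProtectHome=true would hide it,
# and 2 if the path cannot be resolved at all.
check_data_dir() {
    resolved=$(readlink -f -- "$1") || return 2
    if is_protecthome_path "$resolved"; then
        echo "WARNING: $1 resolves to $resolved, hidden by ProtectHome=true"
        return 1
    fi
    echo "OK: $1 resolves to $resolved"
}
```

Running check_data_dir /var/lib/ceph/osd/ceph-0 on my node would have resolved the symlink to /home/ceph/osd0 and printed the warning.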

ProtectSystem=
   Takes a boolean argument or "full". If true, mounts the /usr and
   /boot directories read-only for processes invoked by this unit.
   If set to "full", the /etc directory is mounted read-only, too.
   This setting ensures that any modification of the vendor-supplied
   operating system (and optionally its configuration) is prohibited
   for the service. It is recommended to enable this setting for all
   long-running services, unless they are involved with system
   updates or need to modify the operating system in other ways.
   Note however that processes retaining the CAP_SYS_ADMIN
   capability can undo the effect of this setting. This setting is
   hence particularly useful for daemons which have this capability
   removed, for example with CapabilityBoundingSet=. Defaults to
   off.

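ProtectSystem can be set to true, false, or full. When true, /usr and /boot are mounted read-only. When full, /usr, /boot, and /etc are mounted read-only. When false, the service can access these directories normally. The option keeps the service from modifying system directories, and enabling it is recommended for all long-running services.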
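For quick reference, the mapping from setting to read-only directories can be written down as a tiny lookup. This sketch follows only the documentation text quoted above, so newer systemd releases may behave differently, and protectsystem_ro_dirs is a hypothetical name of mine.

```shell
#!/bin/sh
# Print the directories mounted read-only for a given ProtectSystem= value,
# according to the documentation excerpt quoted above.
protectsystem_ro_dirs() {
    case "$1" in
        true|yes|on|1)  echo "/usr /boot" ;;
        full)           echo "/usr /boot /etc" ;;
        false|no|off|0) echo "" ;;
        *)              echo "unknown value: $1" >&2; return 1 ;;
    esac
}

protectsystem_ro_dirs full   # the value used in ceph-osd@.service
```

Note that /var is not on either list, which is why a data directory that really lives under /var/lib/ceph is unaffected by ProtectSystem=full.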

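Case closed. Because ceph-osd@.service enables ProtectHome, processes started by the unit cannot access /home/ceph/osd0, so the /var/lib/ceph/osd/ceph-0 symlink is effectively dangling inside the service, and ceph-osd fails to start. The same commands run by hand from a root shell are not subject to this restriction, which is why they worked.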

There are two fixes:

  1. Turn off the ProtectHome option
  2. Move /home/ceph/osd0 out of /home

To follow the official recommendation, I chose the second.
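The second fix can be sketched roughly as follows. The function name relocate_link_target and the destination /srv/ceph/osd0 in the usage note are assumptions for illustration, not what I literally typed; the sketch also assumes readlink -f and ln -n are available, as on a typical Linux system.

```shell
#!/bin/sh
# Move the directory a symlink points at to a new location, then repoint the link.
# Arguments: $1 = symlink (e.g. /var/lib/ceph/osd/ceph-0), $2 = new target directory.
relocate_link_target() {
    link=$1
    new_target=$2
    if [ ! -L "$link" ]; then
        echo "$link is not a symlink" >&2
        return 1
    fi
    if [ -e "$new_target" ]; then
        echo "$new_target already exists" >&2
        return 1
    fi
    # Resolve the current target before moving it.
    old_target=$(readlink -f -- "$link") || return 1
    mv -- "$old_target" "$new_target" || return 1
    # -n replaces the link itself rather than descending into its target.
    ln -sfn -- "$new_target" "$link"
}
```

With the OSD stopped, something like relocate_link_target /var/lib/ceph/osd/ceph-0 /srv/ceph/osd0 followed by systemctl start ceph-osd@0 should do it, provided the parent directory of the new location exists and ownership of the data is still ceph or root afterwards.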

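Lessons learned:

  1. Good English is what really counts!! If only I had noticed the ProtectHome option sooner.....
  2. Good English is what really counts!! If only I had noticed the ProtectHome option sooner.....
  3. Good English is what really counts!! If only I had noticed the ProtectHome option sooner.....
  4. When googling, if two pages of results turn up nothing useful, you are definitely searching in the wrong direction.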