Zhusl's Site

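Hiding the tab bar and address bar in Firefox

With the pandemic still not improving, my company has kept everyone working from home. At home I have a 13-inch laptop and an external monitor, so I use the laptop screen for monitoring pages and the external monitor for work. With several pages open on the 13-inch display, though, the content feels cramped: the screen is small to begin with, and the browser's tab bar and address bar take up a lot of space. So I looked online for a way to hide the parts of Firefox I do not need: userChrome.css. This post records the setup.

Setup

In the address bar, enter about:config and choose "Accept the Risk and Continue".

Set toolkit.legacyUserProfileCustomizations.stylesheets to true.

In the address bar, enter about:support -> open the Firefox profile folder (on macOS: /Users/xxx/Library/Application Support/Firefox/Profiles/jho4709i.default-release) -> create a chrome directory -> create userChrome.css inside it.

Edit userChrome.css (vim on the command line is enough) with the following content: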
/*
Firefox Hide Header / Navigator / Top / Tabs / Address / Toolbox

Step 1:
about:config > toolkit.legacyUserProfileCustomizations.stylesheets > true

Step 2:
about:support > Click on "Profile Folder" -> "Open Folder"
Create folder "chrome" here, and put this file in (as "userChrome.css").

More:
Tutorial: How to create and live-debug userChrome.css : FirefoxCSS
https://www.reddit.com/r/FirefoxCSS/comments/73dvty/tutorial_how_to_create_and_livedebug_userchromecss/
*/
@-moz-document url("chrome://browser/content/browser.xul"),
url("chrome://browser/content/browser.xhtml") {
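    /*
    Hide the top bar.
    The 15px in the first rule is the height of the strip at the top that triggers the reveal.
    To hide only the address bar and keep the tab bar, use #nav-bar instead; some tuning may then be needed.
    */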
    #navigator-toolbox {
        max-height: 15px !important;
        overflow: hidden !important;
        z-index: 1000 !important;
        background: black !important;
        opacity: 0 !important;
        margin-bottom: -15px !important;
        transition: all .2s !important;
    }

    #navigator-toolbox:hover {
        max-height: none !important;
        opacity: 1 !important;
        margin-bottom: 0 !important;
    }

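    /*
    When the window is maximized, #titlebar gains an extra padding-top: 8px; I could not find where it is set.
    It shifts the content area up by 8px, so compensate for it here.
    While hovering at the top there is still an 8px margin between the content area and the top bar.
    It is normally invisible, so it is left alone for now.
    */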
    html[sizemode="maximized"] #browser{
        margin-top: 8px !important;
    }
}
Restart the browser for the change to take effect.
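The manual file steps (open the profile folder, create the chrome directory, write userChrome.css) can also be scripted. The following is a minimal sketch, not part of the original setup: the function name install_user_chrome is hypothetical, and the demonstration writes into a throwaway temporary directory that stands in for a real profile folder.

```python
import tempfile
from pathlib import Path


def install_user_chrome(profile_dir: str, css_text: str) -> Path:
    """Write css_text to <profile_dir>/chrome/userChrome.css and return that path."""
    profile = Path(profile_dir).expanduser()
    if not profile.is_dir():
        raise FileNotFoundError(f"profile folder not found: {profile}")
    chrome_dir = profile / "chrome"
    chrome_dir.mkdir(exist_ok=True)  # the "create a chrome directory" step
    target = chrome_dir / "userChrome.css"
    target.write_text(css_text, encoding="utf-8")
    return target


# Demonstration only: a temporary directory plays the role of the profile folder.
demo_profile = tempfile.mkdtemp()
written = install_user_chrome(demo_profile, "/* userChrome.css content goes here */\n")
print(written)
```

For real use, pass the profile path shown on about:support and the full stylesheet text instead of the placeholder comment.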

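The tab bar and address bar now hide automatically and reappear when the mouse moves to the top edge of the window. Perfect.

Configuration reference

One small problem in use: on macOS the three window buttons in the top-left corner become hard to click, but keyboard shortcuts can be used instead:
- Close: command+shift+w
- Hide: command+m
- Full screen: command+shift+f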

Permalink: https://zhusl.com/post/firefox-bar-hide.html

Configuring iTerm2 on macOS

iTerm2 download

https://iterm2.com/downloads.html

Install oh-my-zsh

1. Online install:

sh -c "$(curl -fsSL https://raw.github.com/robbyrussell/oh-my-zsh/master/tools/install.sh)"

2. Offline install:

Download: https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh

Install: sh install.sh

Install Powerline
sudo easy_install pip
pip install powerline-status
Install the Meslo font library
# clone
git clone https://github.com/powerline/fonts.git --depth=1
# install
cd fonts
./install.sh
# clean-up a bit
cd ..
rm -rf fonts
Configure iTerm2

(iTerm2>Preferences>Profiles>Text>Change Font)

Use the Solarized color scheme

Install the agnoster (oh-my-zsh) theme
vi .zshrc # set ZSH_THEME="agnoster"
Plugin configuration
# vim .zshrc
# plugin configuration
plugins=(git history history-substring-search node npm wd web-search last-working-dir zsh-autosuggestions)
Command prompt prefix
# vim .oh-my-zsh/themes/agnoster.zsh-theme
#
### Prompt components
# Each component will draw itself, and hide itself if no information needs to be shown

# Context: user@hostname (who am I and where am I)
prompt_context() {
  if [[ "$USER" != "$DEFAULT_USER" || -n "$SSH_CLIENT" ]]; then
    prompt_segment black default "👍%(!.%{%F{yellow}%}.)%n@%m"
  fi
}
Change the default shell
chsh -s /bin/zsh
# show the current shell
echo $SHELL
Upload and download (lrzsz)

Install:
macOS: brew install lrzsz
Linux: yum install lrzsz # or: apt-get install lrzsz

Two shell scripts are used, iterm2-send-zmodem.sh and iterm2-recv-zmodem.sh; save both to /usr/local/bin.

iterm2-send-zmodem.sh
#!/bin/bash
# Author: Matt Mastracci (matthew@mastracci.com)
# AppleScript from http://stackoverflow.com/questions/4309087/cancel-button-on-osascript-in-a-bash-script
# licensed under cc-wiki with attribution required
# Remainder of script public domain

osascript -e 'tell application "iTerm2" to version' > /dev/null 2>&1 && NAME=iTerm2 || NAME=iTerm
if [[ $NAME = "iTerm" ]]; then
    FILE=$(osascript -e 'tell application "iTerm" to activate' -e 'tell application "iTerm" to set thefile to choose file with prompt "Choose a file to send"' -e "do shell script (\"echo \"&(quoted form of POSIX path of thefile as Unicode text)&\"\")")
else
    FILE=$(osascript -e 'tell application "iTerm2" to activate' -e 'tell application "iTerm2" to set thefile to choose file with prompt "Choose a file to send"' -e "do shell script (\"echo \"&(quoted form of POSIX path of thefile as Unicode text)&\"\")")
fi
if [[ $FILE = "" ]]; then
    echo Cancelled.
    # Send ZModem cancel
    echo -e \\x18\\x18\\x18\\x18\\x18
    sleep 1
    echo
    echo \# Cancelled transfer
else
    /usr/local/bin/sz "$FILE" --escape --binary --bufsize 4096
    sleep 1
    echo
    echo \# Received "$FILE"
fi
iterm2-recv-zmodem.sh
#!/bin/bash
# Author: Matt Mastracci (matthew@mastracci.com)
# AppleScript from http://stackoverflow.com/questions/4309087/cancel-button-on-osascript-in-a-bash-script
# licensed under cc-wiki with attribution required
# Remainder of script public domain

osascript -e 'tell application "iTerm2" to version' > /dev/null 2>&1 && NAME=iTerm2 || NAME=iTerm
if [[ $NAME = "iTerm" ]]; then
    FILE=$(osascript -e 'tell application "iTerm" to activate' -e 'tell application "iTerm" to set thefile to choose folder with prompt "Choose a folder to place received files in"' -e "do shell script (\"echo \"&(quoted form of POSIX path of thefile as Unicode text)&\"\")")
else
    FILE=$(osascript -e 'tell application "iTerm2" to activate' -e 'tell application "iTerm2" to set thefile to choose folder with prompt "Choose a folder to place received files in"' -e "do shell script (\"echo \"&(quoted form of POSIX path of thefile as Unicode text)&\"\")")
fi

if [[ $FILE = "" ]]; then
    echo Cancelled.
    # Send ZModem cancel
    echo -e \\x18\\x18\\x18\\x18\\x18
    sleep 1
    echo
    echo \# Cancelled transfer
else
    cd "$FILE"
    /usr/local/bin/rz --rename --escape --binary --bufsize 4096
    sleep 1
    echo
    echo
    echo \# Sent \-\> $FILE
fi
# Create the two files under /usr/local/bin
cd /usr/local/bin
wget https://raw.githubusercontent.com/RobberPhex/iterm2-zmodem/master/iterm2-recv-zmodem.sh
wget https://raw.githubusercontent.com/RobberPhex/iterm2-zmodem/master/iterm2-send-zmodem.sh

# Make both files executable
chmod 755 /usr/local/bin/iterm2-*
Settings: Triggers

Open iTerm2's Preferences -> Profiles -> Default -> Advanced, click the Edit button under Triggers, and add the following configuration.

Add two triggers, setting Regular expression, Action, Parameters and Instant as follows:

Regular expression: rz waiting to receive.\*\*B0100
Action: Run Silent Coprocess
Parameters: /usr/local/bin/iterm2-send-zmodem.sh
Instant: checked

Regular expression: \*\*B00000000000000
Action: Run Silent Coprocess
Parameters: /usr/local/bin/iterm2-recv-zmodem.sh
Instant: checked

Upload and download, part 2 (trzsz)

Note: lrzsz is so widely used that the company has blocked it.

Reference: https://trzsz.github.io/cn/iterm2

GitHub: https://github.com/trzsz/trzsz

Install:

macOS: brew update && brew install trzsz

Linux: pip install trzsz

Configuration:

Path of the trzsz command:
which trzsz-iterm2
/usr/local/bin/trzsz-iterm2
Configure iTerm2: open iTerm -> Preferences... -> Profiles -> (select a Profile on the left) -> Advanced -> Triggers -> Edit -> [+], and configure as follows:

Name	Value	Notes
Regular Expression	:(:TRZSZ:TRANSFER:[SR]:\d+\.\d+\.\d+:\d+)	no leading or trailing spaces
Action	Run Silent Coprocess...
Parameters	/usr/local/bin/trzsz-iterm2 \1	no leading or trailing spaces
Enabled	✅	checked

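Do not check "Use interpolated strings for parameters" at the bottom.

Replace /usr/local/bin/trzsz-iterm2 with the real absolute path of trzsz-iterm2.

Triggers are independent per Profile, so every Profile you use has to be configured.

The trigger fields accept multi-line input but display only one line, so take care not to paste in an extra newline.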
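To see what the trigger's regular expression captures as \1, it can be tried against a sample line. This is only an illustrative sketch: iTerm2 evaluates triggers with its own regex engine rather than Python's re module, and the sample line below is a made-up string shaped to fit the pattern, not real trzsz output.

```python
import re

# Regular expression copied from the trigger configuration.
TRIGGER_RE = re.compile(r":(:TRZSZ:TRANSFER:[SR]:\d+\.\d+\.\d+:\d+)")


def trigger_argument(terminal_line: str):
    """Return the text that would be passed as the first capture group, or None."""
    match = TRIGGER_RE.search(terminal_line)
    return match.group(1) if match else None


# Hypothetical terminal lines, for illustration only.
sample_line = "::TRZSZ:TRANSFER:S:1.0.0:1234567"
captured = trigger_argument(sample_line)
not_captured = trigger_argument("plain shell output")
print(captured)
```

The capture group starts at the second colon, which is why the Parameters field passes \1 rather than the whole match.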

Open iTerm2 -> Preferences... -> General -> Magic and check Enable Python API.

Setting the ITERM2_COOKIE environment variable makes startup faster. Open iTerm2 -> Preferences... -> Advanced, filter for COOKIE, and choose Yes.

Progress bar configuration

Text progress bar

Upgrade iTerm2 to Build 3.5.20220327-nightly or later.

Add the -p text option to the trigger's Parameters:
/usr/local/bin/trzsz-iterm2 -p text \1
Replace /usr/local/bin/trzsz-iterm2 with the real absolute path of trzsz-iterm2.

zenity progress bar

Install zenity:
brew install ncruces/tap/zenity

If the installation fails on an M1 Mac, try building it with go:
brew install go
go install 'github.com/ncruces/zenity/cmd/zenity@latest'
sudo cp ~/go/bin/zenity /usr/local/bin/zenity

Running ls -l /usr/local/bin/zenity should show the zenity executable or a symlink to it. If it does not, create a symlink:
sudo ln -sv $(which zenity) /usr/local/bin/zenity

Default save path

Use this if you want files downloaded to a fixed directory automatically instead of being asked in a dialog every time.

For example, to download files automatically to /Users/xxxxx/Downloads:

With the text progress bar, change /usr/local/bin/trzsz-iterm2 -p text \1 to:
/usr/local/bin/trzsz-iterm2 -p text -d '/Users/xxxxx/Downloads' \1

With the zenity progress bar, change /usr/local/bin/trzsz-iterm2 \1 to:
/usr/local/bin/trzsz-iterm2 -p zenity -d '/Users/xxxxx/Downloads' \1
Replace /usr/local/bin/trzsz-iterm2 with the real absolute path of trzsz-iterm2.

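History problems (zsh)

If command history is not being written to the file on Linux, add the following configuration
# vi  .zshrc
# add this line
precmd () { eval "$PROMPT_COMMAND" }
Session duplication

With the following configuration in place, a duplicated session no longer asks for the password again.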
# vim .ssh/config

Host *
ServerAliveInterval 30
ControlMaster auto
ControlPath ~/.ssh/master-%r@%h:%p
ControlPersist yes
Permalink: https://zhusl.com/post/mac_iterm2_env.html


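Ceph: manually replacing an OSD's journal partition

Applies to version: 10.2.11

For newer versions that use BlueStore the procedure should be similar, except that the journal becomes the WAL.

Purpose

In day-to-day Ceph use, one SSD usually serves several HDDs, so multiple journal partitions have to be created on the SSD. When an OSD fails, you can simply remove it, create a new OSD (with the ceph-disk command) and add it to the cluster. However, new journal partitions are always created after the existing ones on the SSD. When the SSD is small, or after OSDs have been replaced many times, can a journal partition previously used by an old OSD be reused by a newly added one? With this question in mind I went through the relevant material. This post records manually replacing the journal partition of an existing OSD with a specific partition on the SSD. The procedure is as follows:

Journal disk partitions

The partition table on the SSD that holds the journal partitions looks like this (GPT partitions require the parted command):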
[root@node1 ceph-4]# parted /dev/sdb
GNU Parted 3.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print                                                            
Model: ATA INTEL SSDSC2KB96 (scsi)
Disk /dev/sdb: 960GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name          Flags
 1      1049kB  21.0GB  21.0GB               ceph journal
 2      21.0GB  41.9GB  21.0GB               ceph journal
 3      41.9GB  61.9GB  20.0GB               ceph journal
 4      61.9GB  82.9GB  21.0GB               ceph journal
ceph-disk list output

Use ceph-disk to see how the disks on the current machine are used:

Notes:

osd.2: (data: sde1, journal: sdb1)

osd.3: (data: sdf1, journal: sdb2)
[root@node2 ~]# ceph-disk list
/dev/dm-0 other, xfs, mounted on /
/dev/dm-1 other, swap
/dev/sda :
 /dev/sda2 other, LVM2_member
 /dev/sda1 other, xfs, mounted on /boot
/dev/sdb :
 /dev/sdb1 ceph journal, for /dev/sde1
 /dev/sdb2 ceph journal, for /dev/sdf1
/dev/sdc other, unknown
/dev/sdd other, unknown
/dev/sde :
 /dev/sde1 ceph data, active, cluster ceph, osd.2, journal /dev/sdb1
/dev/sdf :
 /dev/sdf1 ceph data, active, cluster ceph, osd.3, journal /dev/sdb2
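Now move the journal of osd.3 above to a new partition:

Create the new journal partition

Use parted to create the partition on sdb by hand; only the partition name needs to be specified: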
parted /dev/sdb
(parted) print                                                            
Model: ATA INTEL SSDSC2KB96 (scsi)
Disk /dev/sdb: 960GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 
Number  Start   End     Size    File system  Name          Flags
 1      1049kB  21.0GB  21.0GB               ceph journal
 2      21.0GB  41.9GB  21.0GB               ceph journal

(parted) mkpart 'ceph journal'  41.9GB   62.9GB                            
(parted) print
Model: ATA INTEL SSDSC2KB96 (scsi)
Disk /dev/sdb: 960GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name          Flags
 1      1049kB  21.0GB  21.0GB               ceph journal
 2      21.0GB  41.9GB  21.0GB               ceph journal
 3      41.9GB  62.9GB  21.0GB               ceph journal
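This step (mkpart 'ceph journal' 41.9GB 62.9GB) creates a 21 GB journal partition on sdb, named the way Ceph names journal partitions by default.

Get the journal's UUID

Look up the partition's UUID and note it down (for GPT partitions the symlinks are in the by-partuuid directory):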
[root@node2 ~]# ll /dev/disk/by-partuuid/ -l
total 0
lrwxrwxrwx 1 root root 10 Jul 18 17:35 1da8badf-097c-4c22-a1da-f07bd3bf5699 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Jul 18 17:35 32b1f441-8b0e-4524-824a-498bd3bf5660 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Jun 13 11:27 77001e8c-ae4a-4790-89c5-d93fec7ff815 -> ../../sdf1
lrwxrwxrwx 1 root root 10 Jul 18 17:35 ac83c123-815c-4a79-8aeb-e957c8f6703e -> ../../sdb1
lrwxrwxrwx 1 root root 10 Jun 13 11:27 e55ef960-bb58-462b-9a8b-438395af0836 -> ../../sde1
The UUID of the partition just created is: 1da8badf-097c-4c22-a1da-f07bd3bf5699
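Reading the UUID off the listing by eye can be replaced by a small lookup over the symlinks. A minimal sketch follows; the function name partuuid_for is hypothetical, and the demonstration builds a temporary directory with symlinks copied from the listing so that it does not depend on real devices.

```python
import os
import tempfile
from typing import Optional


def partuuid_for(device_name: str, base_dir: str = "/dev/disk/by-partuuid") -> Optional[str]:
    """Return the PARTUUID whose symlink points at device_name (for example 'sdb3')."""
    for entry in sorted(os.listdir(base_dir)):
        link = os.path.join(base_dir, entry)
        if not os.path.islink(link):
            continue
        # Links look like '../../sdb3'; only the final component matters here.
        if os.path.basename(os.readlink(link)) == device_name:
            return entry
    return None


# Demonstration with a stand-in directory mirroring part of the listing.
demo_dir = tempfile.mkdtemp()
os.symlink("../../sdb3", os.path.join(demo_dir, "1da8badf-097c-4c22-a1da-f07bd3bf5699"))
os.symlink("../../sdb2", os.path.join(demo_dir, "32b1f441-8b0e-4524-824a-498bd3bf5660"))
found = partuuid_for("sdb3", demo_dir)
print(found)
```

On the real host the second argument can be omitted so that /dev/disk/by-partuuid is scanned.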

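Change the journal partition's type code

Ceph assigns specific type codes to data partitions and journal partitions so that OSDs can be mounted automatically and journal symlinks created automatically:

For automatic OSD mounting, see: http://www.zphj1987.com/2018/03/23/parted-may-start-your-osd/

Partition	Code
journal	45b0969e-9b03-4f30-b4c6-b4b80ceff106
osd	4fbd7e29-9d25-41b8-afd0-062c0ceff05d

Set the type code on the partition just created as well, using sgdisk, which can assign the corresponding type code to a partition directly:

Format: /usr/sbin/sgdisk --change-name={partition number}:'ceph journal' --typecode={partition number}:45b0969e-9b03-4f30-b4c6-b4b80ceff106 -- /dev/sd{x}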
/usr/sbin/sgdisk  --change-name=3:'ceph journal' --typecode=3:45b0969e-9b03-4f30-b4c6-b4b80ceff106  -- /dev/sdb
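The command format can be captured in a small helper that fills in the partition number and device. This is a sketch under stated assumptions: the function name sgdisk_journal_args is hypothetical, it only builds the argument list (nothing is executed), and the type code is the journal code quoted in the table.

```python
from typing import List

# Journal type code as given in the table above.
JOURNAL_TYPE_CODE = "45b0969e-9b03-4f30-b4c6-b4b80ceff106"


def sgdisk_journal_args(partition_number: int, device: str) -> List[str]:
    """Build the sgdisk argument list that names and types a journal partition."""
    if partition_number < 1:
        raise ValueError("partition numbers start at 1")
    return [
        "/usr/sbin/sgdisk",
        f"--change-name={partition_number}:ceph journal",
        f"--typecode={partition_number}:{JOURNAL_TYPE_CODE}",
        "--",
        device,
    ]


# The call matching the example in this post: partition 3 on /dev/sdb.
demo_args = sgdisk_journal_args(3, "/dev/sdb")
print(demo_args)
```

Passing the list to subprocess.run avoids shell quoting issues with the space in 'ceph journal'; that step is left out here on purpose, since it modifies a partition table.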
Stop the OSD

Stop the OSD, then flush its journal to the data store (the flush has to run while the OSD daemon is stopped):
systemctl stop ceph-osd@3
ceph-osd -i 3 --flush-journal
Update the symlink

Point the journal symlink at the new partition:
cd /var/lib/ceph/osd/ceph-3/

mv journal journal-bak
ln -s /dev/disk/by-partuuid/1da8badf-097c-4c22-a1da-f07bd3bf5699 /var/lib/ceph/osd/ceph-3/journal
echo 1da8badf-097c-4c22-a1da-f07bd3bf5699  > journal_uuid
Fix partition ownership

Change the ownership of the partition:
chown ceph:ceph  /dev/sdb3
chown ceph:ceph journal
Initialize

Initialize the journal partition:
ceph-osd -i 3 --mkjournal
2019-07-18 16:42:04.205380 7fc57ff00ac0 -1 journal check: ondisk fsid acf3fc4d-0e69-4a4e-a031-e2024437f445 doesn't match expected 332e849b-2e86-401d-bfeb-85746a16dcff, invalid (someone else's?) journal
2019-07-18 16:42:04.209956 7fc57ff00ac0 -1 created new journal /var/lib/ceph/osd/ceph-3/journal for object store /var/lib/ceph/osd/ceph-3
Start the OSD

Start the OSD again:
systemctl start ceph-osd@3
Check disk status

View the Ceph disk information:
[root@node2 ceph-3]# ceph-disk list
/dev/dm-0 other, xfs, mounted on /
/dev/dm-1 other, swap
/dev/sda :
 /dev/sda2 other, LVM2_member
 /dev/sda1 other, xfs, mounted on /boot
/dev/sdb :
 /dev/sdb3 other, journal, for /dev/sdf1
 /dev/sdb1 ceph journal, for /dev/sde1
 /dev/sdb2 ceph journal
/dev/sde :
 /dev/sde1 ceph data, active, cluster ceph, osd.2, journal /dev/sdb1
/dev/sdf :
 /dev/sdf1 ceph data, active, cluster ceph, osd.3, journal /dev/sdb3
The journal has now been replaced successfully. With this procedure the location of a journal partition can be adjusted to suit actual needs.

Permalink: https://zhusl.com/post/ceph-change-journal.html


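Quickly getting directory size and file counts on CephFS

On Linux, directory statistics are usually gathered with du. For very large directories, or directories holding a huge number of files, du takes a long time, and on a distributed file system such as CephFS it takes even longer. Today I came across a command that returns directory details on CephFS quickly; recording it here. Example:
getfattr -d -m 'ceph.dir.*' /mnt/cephfs
getfattr -d -m 'ceph.dir.*' /mnt/cephfs/dir1

# cd /mnt/cephfs/dir1
# getfattr -d -m 'ceph.dir.*' .
# file: .
ceph.dir.entries="4"         # entries directly under this directory
ceph.dir.files="2"            # 2 files in this directory (regular files, links and so on; apparently everything except directories)
ceph.dir.rbytes="23867859016"         # counted recursively, the directory uses 23867859016 bytes in total
ceph.dir.rctime="1554285880.09201328081"
ceph.dir.rentries="42385"
ceph.dir.rfiles="41781"        # counted recursively, 41781 files in total
ceph.dir.rsubdirs="604"       # counted recursively, 604 subdirectories in total
ceph.dir.subdirs="4"          # subdirectories directly under this directory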
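The attribute dump is easy to post-process when the numbers are needed in a script. The sketch below parses text in the name="value" shape shown above and converts the recursive byte count to GiB; the function names are hypothetical and the sample text reuses the values from this post.

```python
def parse_getfattr(output: str) -> dict:
    """Turn lines of the form name="value" into a dict, skipping comment lines."""
    attrs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            attrs[name] = value.strip().strip('"')
    return attrs


def summarize(attrs: dict) -> dict:
    """Pick out the recursive totals and express the size in GiB."""
    return {
        "files": int(attrs["ceph.dir.rfiles"]),
        "subdirs": int(attrs["ceph.dir.rsubdirs"]),
        "gib": int(attrs["ceph.dir.rbytes"]) / 2**30,
    }


sample_output = '''# file: .
ceph.dir.rbytes="23867859016"
ceph.dir.rfiles="41781"
ceph.dir.rsubdirs="604"
'''
summary = summarize(parse_getfattr(sample_output))
print(summary)
```

In practice the input would be the captured standard output of the getfattr command rather than a literal string.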
Permalink: https://zhusl.com/post/cephfs-get-dir.html

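Building a machine-learning environment with Kubernetes and arena on Ubuntu 16.04

Environment:

After several days of testing, I have summarized the rough procedure for building a GPU machine-learning training environment on Kubernetes: Ubuntu 16.04 as the operating system, Kubernetes installed with GPU support added, and arena, Alibaba's open-source tool, used to submit training jobs.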

ubuntu:16.04

kubernetes:1.10.4

cuda:9.2

cudnn:7.2.1.38

nvidia-driver:390.77

docker-ce:18.03

nvidia-docker2

helm:2.8.2

Download links for all required packages:

https://st.zhusl.com/univer/go-1.10.tgz

https://st.zhusl.com/univer/NVIDIA-Linux-x86_64-390.77.run

https://st.zhusl.com/univer/cuda_9.2.148_396.37_linux.run

https://st.zhusl.com/univer/cuda_9.2.148.1_linux.run

https://st.zhusl.com/univer/cudnn-9.2-linux-x64-v7.2.1.38.tgz

https://st.zhusl.com/univer/helm

https://st.zhusl.com/univer/k8s.1-10-4.tar.gz

Node:

id	IP	Spec	GPUs
1	10.10.0.51	56C128G	4 x Tesla V100

Installation:

Base system environment

Download the packages:

Download the required installation files:

cuda_9.2.148.1_linux.run

cuda_9.2.148_396.37_linux.run

cudnn-9.2-linux-x64-v7.2.1.38.tgz

NVIDIA-Linux-x86_64-390.77.run

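Upgrade packages and the kernel:

Original kernel: 4.4.0-116-generic

Install the base packages (kernel after the upgrade: 4.4.0-134-generic):
apt-get install dkms build-essential linux-headers-generic
Blacklist the nouveau driver (the NVIDIA graphics driver that ships with the system):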
vi /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
update-initramfs -u
Reboot

Install the NVIDIA driver:
chmod +x NVIDIA-Linux-x86_64-390.77.run
 ./NVIDIA-Linux-x86_64-390.77.run
Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel (no)

Check the GPU information (nvidia-smi -pm 1 enables persistence mode): nvidia-smi

Install CUDA (9.2):
chmod +x cuda_9.2.148_396.37_linux.run
./cuda_9.2.148_396.37_linux.run
(accept n y y n)
Install the patch:
chmod +x cuda_9.2.148.1_linux.run
./cuda_9.2.148.1_linux.run
Add environment variables (/etc/profile):
export PATH="/usr/local/cuda-9.2/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-9.2/lib64:$LD_LIBRARY_PATH"
export CUDA_HOME="/usr/local/cuda"
Apply the environment variables:
source /etc/profile
Install cuDNN:
# tar -xzvf cudnn-9.2-linux-x64-v7.2.1.38.tgz
# sudo cp cuda/include/cudnn.h /usr/local/cuda/include
# sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
# sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Install docker-ce:
apt-get -y install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-get  install docker-ce=18.03.1~ce-0~ubuntu
Install nvidia-docker2:
# The first two commands clean up data from older installs and can be skipped
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
apt-get purge -y nvidia-docker
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey |   sudo apt-key add - 

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list |   sudo tee /etc/apt/sources.list.d/nvidia-docker.list
apt-get  update
# Without a proxy this is very slow; the packages can be downloaded directly and installed afterwards
apt-get install  nvidia-docker2=2.0.3+docker18.03.1-1 nvidia-container-runtime=2.0.0+docker18.03.1-1
Edit docker's service unit file:
vim /lib/systemd/system/docker.service 
.....
ExecStart=/usr/bin/dockerd --default-runtime=nvidia --log-level error --log-opt max-size=50m --log-opt max-file=5
......
Restart the docker service:
systemctl  daemon-reload
systemctl  restart docker
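Install kubernetes (1.10.4)

(Omitted)

Adjust the kubelet parameters to support GPUs
Enable GPU support in Kubernetes; for the Kubernetes GPU documentation see https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
Add --feature-gates="Accelerators=true" to the kubelet startup options

Install the NVIDIA plugin so that Kubernetes can discover GPU resources (use the version matching Kubernetes)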
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.10/nvidia-device-plugin.yml
Install arena (see the official documentation)

Install the dependency helm (2.8.2)
wget https://st.zhusl.com/univer/helm
chmod +x helm
mv helm /usr/local/bin/

helm init --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.8.2 --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
# If the list command runs without errors, the installation succeeded
helm list
Create the helm chart directory
mkdir /charts
git clone https://github.com/AliyunContainerService/arena.git
cp -r arena/charts/* /charts
Install TFJob support (kubeflow)
#Install TFJob Controller
kubectl create -f arena/kubernetes-artifacts/jobmon/jobmon-role.yaml
kubectl create -f arena/kubernetes-artifacts/tf-operator/tf-operator.yaml

#Install Dashboard (optional)
kubectl create -f arena/kubernetes-artifacts/dashboard/dashboard.yaml

#Install MPIJob Controller
kubectl create -f arena/kubernetes-artifacts/mpi-operator/mpi-operator.yaml
Install arena

https://github.com/kubeflow/arena

Go must be installed first
wget  https://st.zhusl.com/univer/go-1.10.tgz
tar zxf  go-1.10.tgz ; mv go /usr/local/
Configure environment variables:
vi /etc/profile
export GOPATH=/var/lib/go
export GOROOT=/usr/local/go
export PATH=/usr/local/go/bin:$PATH
# apply the environment variables
source /etc/profile
Build arena
mkdir -p $GOPATH/src/github.com/kubeflow
cd $GOPATH/src/github.com/kubeflow
git clone https://github.com/AliyunContainerService/arena.git
cd arena
make
Add arena to PATH
vi /etc/profile
export PATH=/var/lib/go/src/github.com/kubeflow/arena/bin:$PATH
Test by viewing node information
arena top node
Example of running a piece of TensorFlow code (see the official documentation for detailed usage):
# official example
arena submit tf \
             --name=tf-git \
             --gpus=1 \
             --image=zhusl/tensorflow:1.5.0-devel-gpu \
             --syncMode=git \
             --syncSource=https://github.com/cheyang/tensorflow-sample-code.git \
             "python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 100"
Permalink: https://zhusl.com/post/2018-09-01-ubuntu-gpu-install.html


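DevOps Is Not That Far Away

Reposted from the Caicloud WeChat official account: "DevOps Is Not That Far Away"

DevOps was first proposed in 2009, but at first it remained only a concept. The vision was appealing, yet putting it into practice was very hard. With the rise of microservices, containers and related technologies in recent years, enterprises need DevOps more urgently and can implement it more easily, so DevOps has gradually gained acceptance and attention.

What is DevOps

DevOps is not simply Dev + Ops, a misreading many people take from the abbreviation. Managing the software lifecycle involves more than the dev and ops stages, and more than developers and operations staff. The lifecycle covers project initiation, design, development, testing, operations and other stages, and involves everyone on the project: project managers, product managers, architects, developers, testers, operations engineers and so on. DevOps should run through this entire lifecycle.

Some people's first reaction to DevOps is CI/CD, and they assume that building a CI/CD pipeline amounts to doing DevOps. CI/CD is undoubtedly a very important part of DevOps process management; without it DevOps cannot even begin. A CI/CD pipeline standardizes and automates the three core stages of the lifecycle (development, testing and deployment), delivering continuous integration and continuous deployment and raising development efficiency and iteration speed.

Others think DevOps means stitching many tools together to achieve automation. DevOps does stress making full use of tools to automate, but what matters more is connecting the whole toolchain through a well-defined process so that teams collaborate more efficiently.

DevOps is therefore a software development and delivery process that emphasizes communication and collaboration among product management, software development and operations professionals. It has three elements: culture, process and tools. Putting all three into practice is what it takes to truly practice DevOps.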

DevOps is a software development and delivery process that emphasizes communication and collaboration between product management, software development, operations professionals and close alignment with business objectives. – Wikipedia

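Why DevOps is needed

As technology advances rapidly, expectations of software services keep rising. Software has to iterate quickly to keep up with changing market demand, and how fast features ship determines, to some extent, market opportunity and market share. Yet this rapid iteration and release inevitably sharpens the long-standing conflict between development and operations. Microservices keep replacing traditional monolithic applications. Where there used to be one piece of software to manage, after the move to microservices there may be dozens or even hundreds of applications to manage at once, each with its own lifecycle and each depending on other services. Traditional lifecycle management can no longer cope with this complexity; DevOps is needed to achieve agile development.

Since DevOps helps manage the software lifecycle efficiently and raises iteration speed and productivity, how should an enterprise roll it out? There are two main strategies:

Bottom-up

Start by forming one DevOps team to try things out, accumulate experience and evaluate the results. Once this team has built up a sound set of practices and verified in practice that they work for the enterprise, roll them out on a small scale within the department, keep refining them, and eventually spread them across the whole enterprise. This is the recommended, lower-cost approach and is more likely to succeed.

Top-down

Senior management must understand the meaning and importance of DevOps, have the resolve to push it through the whole company, and already have a DevOps plan suited to the company; only then can it be carried out from the top down. Because this approach has many prerequisites and carries higher risk and cost, it is not recommended.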

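How to practice DevOps

Build a DevOps team

Form two-pizza teams. The two-pizza team was proposed by Amazon CEO Jeff Bezos: keep a team small enough that two pizzas can feed it; otherwise excessive communication overhead will seriously slow the project down.

Use Sprints to control the pace of iteration; two weeks is usually a suitable cycle. If a Sprint is too long, people are not very motivated early on, most of the work is pushed to the end of the Sprint, and fast feedback and fast delivery never take hold. If a Sprint is too short, many features do not get enough time for development and testing, and software quality cannot be guaranteed. At the start of a Sprint, make a plan and break down the tasks. During the Sprint, use Scrum meetings to keep giving feedback, evaluating and adjusting, and deal promptly with any risk that could delay the project. After the Sprint, hold a Retrospective to review and summarize: keep what worked and fix what did not.

The whole team holds a Scrum meeting every day in which everyone briefly reports what they have done and how far along they are, so that all members know what the others are working on. Any risk to the schedule is raised immediately and a solution is discussed. Every member should work toward becoming a full-stack engineer and improve their overall skills. A DevOps team is small, so each person covers a broad range of work and needs some understanding of the other areas touched by daily work. This has two clear advantages: team members back each other up, which limits the impact of one person's sudden absence on the whole project; and most work can be finished without depending on others, which reduces communication overhead and raises efficiency.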

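Build a DevOps culture

A spirit of sharing

The whole team, the department, even the company becomes a community of shared interest that shares success and shares responsibility. Retrospective meetings are a place to share experience of both success and failure, and regular knowledge-sharing sessions help everyone grow together, so the team's overall strength keeps increasing.

Strengthen communication and collaboration

As demands on software keep rising, a system or a piece of software can no longer be built by one person; people in many different roles have to build it together. Communication and collaboration among them are therefore essential and often decide whether the software succeeds or fails.

Automation

Automate everything that can be automated, minimize repetitive work, and free people to do more creative things.

Build automated processes

Automated software lifecycle management can be divided into three main stages:

CI (Continuous Integration)

Triggered by a Webhook or a timer, software is built automatically from source code into a releasable package or Docker image. The flow usually includes code checkout, unit tests, integration tests, static code analysis, code coverage checks, and building and pushing the package or Docker image. The result of each step can be strictly controlled at this stage to guarantee the quality of the software that is built. The CI stage emphasizes making the status and result of every step visible, for example: which stage the pipeline has reached, whether that stage succeeded or failed, the execution log, the test report, and the results of static code analysis and code coverage checks.

CD (Continuous Delivery)

After deployment and testing in a series of environments, a qualified version of the software is finally released to production, where it delivers its real value. The environments usually include a system integration test environment, a user acceptance test environment, a performance test environment and production; depending on circumstances they may go by other names, such as Staging or pre-production. The main challenges in the CD stage are: how to manage the configuration for each environment; which release strategy to adopt so that releases are stable and the online service is not interrupted; how to roll back when a release fails; and how microservices register and discover one another.

CO (Continuous Operation)

Log collection, monitoring and alerting, auto-scaling, health checks and similar platform features keep the online environment running safely and reliably and meeting the SLA (Service Level Agreement) we have set. Compared with CI and CD, CO is a relatively new concept: by monitoring the online system, it aims for services that govern and heal themselves. It needs a mature platform behind it, so operations staff need to move toward the SRE role and finally escape the fate of being the scapegoat.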

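How to evaluate the effect of DevOps

The goal of DevOps is to realize the value of software by defining its development and delivery process, so its effect can be measured with the following indicators:

Deployment frequency: from once every few months or weeks down to once every few days or even every few hours.

Deployment success rate: while pursuing fast deployment, deployment quality must also be guaranteed. The gains from ninety-nine successful releases may not make up for the loss caused by one failed release.

Issue fix time: how long it takes to resolve a defect reported by users.

Business delivery time: how long it takes to satisfy a requirement raised by the market or by users.

Automation rate: the share of all work that has been automated. Automation not only raises efficiency but also minimizes the losses caused by human error.

Failure recovery time: how long it takes to respond and recover when the system has a problem.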
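Several of these indicators are simple ratios or averages over deployment records. The sketch below shows one way to compute three of them; the Deployment record type, its fields and the sample data are all hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Deployment:
    succeeded: bool
    recovery_minutes: float = 0.0  # time to restore service after a failed deployment


def deployment_frequency(deployments: List[Deployment], window_days: int) -> float:
    """Deployments per day over the observation window."""
    return len(deployments) / window_days


def success_rate(deployments: List[Deployment]) -> float:
    """Fraction of deployments that succeeded."""
    return sum(d.succeeded for d in deployments) / len(deployments)


def mean_recovery_minutes(deployments: List[Deployment]) -> float:
    """Average recovery time across failed deployments (0.0 if none failed)."""
    failed = [d.recovery_minutes for d in deployments if not d.succeeded]
    return sum(failed) / len(failed) if failed else 0.0


# Made-up sample: four deployments in an eight-day window, one of them failed.
history = [
    Deployment(True),
    Deployment(True),
    Deployment(False, recovery_minutes=30.0),
    Deployment(True),
]
frequency = deployment_frequency(history, window_days=8)
rate = success_rate(history)
recovery = mean_recovery_minutes(history)
print(frequency, rate, recovery)
```

Tracking the same numbers over successive Sprints shows whether the practice is actually improving.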

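When DevOps meets Docker

The DevOps toolchain is very rich; Jenkins alone offers more than 1000 plugins, which covers nearly every need. Among all these tools, Docker, hugely popular in recent years, deserves a mention. Docker's slogan is "Build, Ship, Run", and the three lifecycle stages it addresses line up with the goals of CI/CD in DevOps. Docker helps DevOps solve the following problems:

Environments: Docker provides environments quickly and cheaply. They use few resources, and the resources can be released as soon as the environment is no longer needed. DevOps no longer has to worry about build and test environments.

Environment consistency: a Docker image contains everything the software needs to run, and the same image can start identical containers anytime and anywhere. This removes the trouble caused by inconsistent environments in the DevOps process.

Deliverables: Docker provides a simple, standard, unified deliverable. With the Docker image as the only deliverable, it moves quickly and conveniently between pipeline stages and avoids the variety of deliverables caused by differences in language, framework and conventions.

Container ecosystem: beyond Docker's own strengths, Docker has brought a large container ecosystem with it. With the container engines, orchestration, image registries, monitoring and other tools the ecosystem offers, DevOps becomes much easier to practice.

If DevOps sounded wonderful but very far away a few years ago, now is the best time to practice it. On one hand, fast-changing markets force a move to agile development; on the other, the rapid development of container technology has cleared away the obstacles to DevOps. So, in the spirit of lean thinking and riding the container wave, DevOps is not that far away.

Permalink: https://zhusl.com/post/devops-movement.html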

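Using Ceph as a StorageClass in Kubernetes

Install ceph-common

Install the ceph-common package on every Kubernetes node, including the master, and keep its version consistent with the Ceph cluster being connected to (tested with Ceph 10.2.0 and k8s 1.9.1).

yum install -y ceph-common

Generate the Ceph secret

Use the ceph.client.admin.keyring file provided by your Ceph administrator, which we placed in /etc/ceph, to generate the secret (for this test the Ceph cluster's /etc/ceph directory was simply copied over).

grep key /etc/ceph/ceph.client.admin.keyring |awk '{printf "%s", $NF}'|base64

The key must be base64-encoded.
This yields the encoded key: QVFDWDA2aFo5TG5TQnhBQVl1b0lUL2V3YlRSaEtwVEhPWkxvUlE9PQ==, which is used later.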
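The same extraction and encoding can be done without the grep/awk/base64 pipeline. The following is a minimal sketch, assuming the usual keyring layout with a "key = ..." line; the function name is hypothetical, and the sample key is the one that appears commented out in the ceph-secret.yaml later in this post.

```python
import base64


def secret_value_from_keyring(keyring_text: str) -> str:
    """Find the 'key = ...' line and return the key base64-encoded, without a trailing newline."""
    for line in keyring_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "key":
            key = fields[-1]
            return base64.b64encode(key.encode("ascii")).decode("ascii")
    raise ValueError("no 'key' line found in keyring")


# Sample keyring text in the assumed layout.
sample_keyring = """[client.admin]
\tkey = AQDusT1aFpiKFxAAHotuwTobQTThOznK2iXR6g==
"""
encoded = secret_value_from_keyring(sample_keyring)
print(encoded)
```

Like the awk printf in the shell version, this encodes only the key itself, so no newline character ends up inside the secret.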

Create the Ceph secret

Create a ceph-secret.yaml file with the following content:

apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
  namespace: comall
type: "kubernetes.io/rbd"  
data:
#  key: AQDusT1aFpiKFxAAHotuwTobQTThOznK2iXR6g==
  key: QVFEdXNUMWFGcGlLRnhBQUhvdHV3VG9iUVRUaE96bksyaVhSNmc9PQo=

Create the StorageClass

Create a ceph-class.yaml file with the following content. For the format, see the official documentation: https://kubernetes.io/docs/concepts/storage/storage-classes/#ceph-rbd

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: ceph-rbd
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.90.24.234:6789,10.90.24.129:6789,10.90.24.141:6789
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: comall
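  pool: ceshi # the default is the rbd pool; in production, create a dedicated pool for isolation
  userId: admin
  userSecretName: ceph-secret
# The following set image and file system parameters used when the rbd image is created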
#  fsType: xfs
#  imageFormat: "2"
#  imageFeatures: "layering"

Create the PVC

Create a ceph-pvc.yaml file with the following content:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-pvc2
  namespace: comall
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  storageClassName: ceph-rbd

Create the pod

Create a pod that uses the PVC just created, with the following content:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ceph-nginx
  namespace: comall
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: ceph-nginx
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
          volumeMounts:
            - name: ceph-rbd-volume
              mountPath: "/usr/share/nginx/html"
      volumes:
      - name: ceph-rbd-volume
        persistentVolumeClaim:
          claimName: ceph-pvc2

Check the mount inside the pod:

[root@test-server-4 ceph]# kubectl  get pods -n comall
NAME                          READY     STATUS    RESTARTS   AGE
ceph-nginx-6fd689788f-4stcv   1/1       Running   0          28m
my-nginx-8c56b8777-jjldx      1/1       Running   0          5d
whoami-687bd5fd4-bdrf2        1/1       Running   0          5d
whoami-687bd5fd4-hgknm        1/1       Running   0          5d
whoami-687bd5fd4-r7hj9        1/1       Running   0          5d
zk1-774f7f9bf6-f65bv          1/1       Running   0          1d
zk2-5578b64d4-kp8qs           1/1       Running   0          1d
zk3-78976db5b6-vbksv          1/1       Running   0          1d

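The directory /usr/share/nginx/html is the mounted RBD storage from Ceph. It was automatically formatted as ext4 here; the file system can be specified when creating the StorageClass: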

[root@test-server-4 ceph]# kubectl -n comall exec -it ceph-nginx-6fd689788f-4stcv  sh
/ # df -Th
Filesystem           Type            Size      Used Available Use% Mounted on
overlay              overlay       926.6G     26.5G    900.0G   3% /
tmpfs                tmpfs          64.0M         0     64.0M   0% /dev
tmpfs                tmpfs          15.6G         0     15.6G   0% /sys/fs/cgroup
/dev/mapper/centos-root
                     xfs           926.6G     26.5G    900.0G   3% /dev/termination-log
/dev/mapper/centos-root
                     xfs           926.6G     26.5G    900.0G   3% /etc/resolv.conf
/dev/mapper/centos-root
                     xfs           926.6G     26.5G    900.0G   3% /etc/hostname
/dev/mapper/centos-root
                     xfs           926.6G     26.5G    900.0G   3% /etc/hosts
shm                  tmpfs          64.0M         0     64.0M   0% /dev/shm
/dev/rbd3            ext4            1.9G      6.0M      1.8G   0% /usr/share/nginx/html
tmpfs                tmpfs          15.6G     12.0K     15.6G   0% /var/run/secrets/kubernetes.io/serviceaccount
tmpfs                tmpfs          64.0M         0     64.0M   0% /proc/kcore
tmpfs                tmpfs          64.0M         0     64.0M   0% /proc/timer_list
tmpfs                tmpfs          64.0M         0     64.0M   0% /proc/timer_stats
tmpfs                tmpfs          64.0M         0     64.0M   0% /proc/sched_debug
tmpfs                tmpfs          15.6G         0     15.6G   0% /proc/scsi
tmpfs                tmpfs          15.6G         0     15.6G   0% /sys/firmware

Permalink: https://zhusl.com/post/storageclass-CEPH-rbd.html

Installation guide (1.9 reference)

kubernetes 1.9.4

Deploy kube-apiserver, kube-controller-manager and kube-scheduler locally from binaries

Environment

This setup uses a single master and two nodes. The master acts as both master and node; the other machines are plain nodes.

kubernetes-2: 10.90.26.2   Master
kubernetes-3: 10.90.26.3   Node
kubernetes-4: 10.90.26.4   Node

Initialize the environment

hostnamectl --static set-hostname hostname

kubernetes-2: 10.90.26.2
kubernetes-3: 10.90.26.3
kubernetes-4: 10.90.26.4

# Edit /etc/hosts so the nodes can reach each other by hostname

vi /etc/hosts

10.90.26.2 kubernetes-2
10.90.26.3 kubernetes-3
10.90.26.4 kubernetes-4
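An /etc/hosts entry puts the IP address first and the hostname after it. When the node list grows, the lines can be generated rather than typed; the sketch below does that for the three machines in this setup. The function name hosts_lines is hypothetical, and the addresses are validated with the standard ipaddress module.

```python
import ipaddress
from typing import Dict, List


def hosts_lines(nodes: Dict[str, str]) -> List[str]:
    """Return '/etc/hosts'-style lines ('<ip> <hostname>') for a hostname -> IP mapping."""
    lines = []
    for hostname, ip in nodes.items():
        ipaddress.ip_address(ip)  # raises ValueError for a malformed address
        lines.append(f"{ip} {hostname}")
    return lines


# Hostnames and addresses taken from the environment description in this post.
cluster = {
    "kubernetes-2": "10.90.26.2",
    "kubernetes-3": "10.90.26.3",
    "kubernetes-4": "10.90.26.4",
}
generated = hosts_lines(cluster)
print("\n".join(generated))
```

The printed lines can be appended to /etc/hosts on each machine.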

Create certificates

CloudFlare's PKI toolkit cfssl is used here to generate the Certificate Authority (CA) certificate and key files.

Install cfssl

mkdir -p /opt/local/cfssl

cd /opt/local/cfssl

wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
mv cfssl_linux-amd64 cfssl

wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
mv cfssljson_linux-amd64 cfssljson

wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
mv cfssl-certinfo_linux-amd64 cfssl-certinfo

chmod +x *

Create the CA certificate configuration

mkdir /opt/ssl

cd /opt/ssl

# config.json

vi  config.json


{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
            "signing",
            "key encipherment",
            "server auth",
            "client auth"
        ],
        "expiry": "87600h"
      }
    }
  }
}

# csr.json

vi csr.json

{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}

Generate the CA certificate and private key


cd /opt/ssl/

/opt/local/cfssl/cfssl gencert -initca csr.json | /opt/local/cfssl/cfssljson -bare ca


[root@kubernetes-64 ssl]# ls -lt
总用量 20
-rw-r--r-- 1 root root 1005 7月   3 17:26 ca.csr
-rw------- 1 root root 1675 7月   3 17:26 ca-key.pem
-rw-r--r-- 1 root root 1363 7月   3 17:26 ca.pem
-rw-r--r-- 1 root root  210 7月   3 17:24 csr.json
-rw-r--r-- 1 root root  292 7月   3 17:23 config.json

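Distribute the certificates

# Create the certificate directory
mkdir -p /etc/kubernetes/ssl

# Copy all the files into the directory
cp *.pem /etc/kubernetes/ssl
cp ca.csr /etc/kubernetes/ssl

# The files must be copied to every k8s machine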

scp *.pem 10.90.26.3:/etc/kubernetes/ssl/
scp *.csr 10.90.26.3:/etc/kubernetes/ssl/

scp *.pem 10.90.26.4:/etc/kubernetes/ssl/
scp *.csr 10.90.26.4:/etc/kubernetes/ssl/

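Install docker

Install docker-ce on all servers beforehand. According to the official 1.9 documentation, the Docker versions currently supported by k8s go up to 1.11.2, 1.12.6, 1.13.1 and 17.03.1.

# Add the yum repository

# Install yum-config-manager

yum -y install yum-utils

# Add the repository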
yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo


# 更新 repo
yum makecache

# 查看yum 版本

yum list docker-ce.x86_64  --showduplicates |sort -r



# 安装指定版本 docker-ce 17.03 被 docker-ce-selinux 依赖, 不能直接yum 安装 docker-ce-selinux

wget https://download.docker.com/linux/centos/7/x86_64/stable/Packages/docker-ce-selinux-17.03.1.ce-1.el7.centos.noarch.rpm


rpm -ivh docker-ce-selinux-17.03.1.ce-1.el7.centos.noarch.rpm


yum -y install docker-ce-17.03.1.ce


# Check the installation

docker version
Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:21:36 2017
 OS/Arch:      linux/amd64

Change the docker configuration

# Add the configuration

vi /etc/systemd/system/docker.service



[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target docker-storage-setup.service
Wants=docker-storage-setup.service

[Service]
Type=notify
Environment=GOTRACEBACK=crash
ExecReload=/bin/kill -s HUP $MAINPID
Delegate=yes
KillMode=process
ExecStart=/usr/bin/dockerd \
          $DOCKER_OPTS \
          $DOCKER_STORAGE_OPTIONS \
          $DOCKER_NETWORK_OPTIONS \
          $DOCKER_DNS_OPTIONS \
          $INSECURE_REGISTRY
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=1min
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

# Reload the configuration and start docker
systemctl daemon-reload
systemctl start docker
systemctl enable docker

# If it fails, use the following to locate the problem
journalctl -f -t docker
journalctl -u docker

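etcd cluster

etcd is the most important component of a k8s cluster: if etcd goes down, the cluster goes down.

Install etcd

Official releases: https://github.com/coreos/etcd/releases

# Download the binaries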

wget https://github.com/coreos/etcd/releases/download/v3.2.14/etcd-v3.2.14-linux-amd64.tar.gz

tar zxvf etcd-v3.2.14-linux-amd64.tar.gz

cd etcd-v3.2.14-linux-amd64

mv etcd  etcdctl /usr/bin/

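Create the etcd certificate

The etcd certificate here covers three nodes by default. If more etcd nodes may be added later, reserve a few extra IPs in the hosts list now, so that new nodes pass verification without the certificate having to be reissued.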

cd /opt/ssl/

vi etcd-csr.json

{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "10.90.26.2",
    "10.90.26.3",
    "10.90.26.4"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}

# Generate the etcd certificate and key

/opt/local/cfssl/cfssl gencert -ca=/opt/ssl/ca.pem \
  -ca-key=/opt/ssl/ca-key.pem \
  -config=/opt/ssl/config.json \
  -profile=kubernetes etcd-csr.json | /opt/local/cfssl/cfssljson -bare etcd

# Check the generated files

[root@kubernetes-2 ssl]# ls etcd*
etcd.csr  etcd-csr.json  etcd-key.pem  etcd.pem



# Copy to the etcd servers

# etcd-1 
cp etcd*.pem /etc/kubernetes/ssl/

# etcd-2
scp etcd*.pem 10.90.26.3:/etc/kubernetes/ssl/

# etcd-3
scp etcd*.pem 10.90.26.4:/etc/kubernetes/ssl/



# If etcd runs as a non-root user, it is denied permission to read the key

chmod 644 /etc/kubernetes/ssl/etcd-key.pem

Modify the etcd configuration

Because etcd is the most important component, point --data-dir at a separate path.

# Create the etcd data directory and set its ownership

useradd etcd

mkdir -p /opt/etcd

chown -R etcd:etcd /opt/etcd

# etcd-1


vi /etc/systemd/system/etcd.service


[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/opt/etcd/
User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/usr/bin/etcd \
  --name=etcd1 \
  --cert-file=/etc/kubernetes/ssl/etcd.pem \
  --key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/etcd.pem \
  --peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://10.90.26.2:2380 \
  --listen-peer-urls=https://10.90.26.2:2380 \
  --listen-client-urls=https://10.90.26.2:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://10.90.26.2:2379 \
  --initial-cluster-token=k8s-etcd-cluster \
  --initial-cluster=etcd1=https://10.90.26.2:2380,etcd2=https://10.90.26.3:2380,etcd3=https://10.90.26.4:2380 \
  --initial-cluster-state=new \
  --data-dir=/opt/etcd/
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

# etcd-2


vi /etc/systemd/system/etcd.service


[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/opt/etcd/
User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/usr/bin/etcd \
  --name=etcd2 \
  --cert-file=/etc/kubernetes/ssl/etcd.pem \
  --key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/etcd.pem \
  --peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://10.90.26.3:2380 \
  --listen-peer-urls=https://10.90.26.3:2380 \
  --listen-client-urls=https://10.90.26.3:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://10.90.26.3:2379 \
  --initial-cluster-token=k8s-etcd-cluster \
  --initial-cluster=etcd1=https://10.90.26.2:2380,etcd2=https://10.90.26.3:2380,etcd3=https://10.90.26.4:2380 \
  --initial-cluster-state=new \
  --data-dir=/opt/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

# etcd-3


vi /etc/systemd/system/etcd.service


[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/opt/etcd/
User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/usr/bin/etcd \
  --name=etcd3 \
  --cert-file=/etc/kubernetes/ssl/etcd.pem \
  --key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/etcd.pem \
  --peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://10.90.26.4:2380 \
  --listen-peer-urls=https://10.90.26.4:2380 \
  --listen-client-urls=https://10.90.26.4:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://10.90.26.4:2379 \
  --initial-cluster-token=k8s-etcd-cluster \
  --initial-cluster=etcd1=https://10.90.26.2:2380,etcd2=https://10.90.26.3:2380,etcd3=https://10.90.26.4:2380 \
  --initial-cluster-state=new \
  --data-dir=/opt/etcd/
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
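
The three unit files differ only in --name and the node IP, while --initial-cluster is the same on every node. As a small sketch (initial_cluster is a hypothetical helper; 2380 is the peer port used in the units above), that value can be built from name=ip pairs so that it is typed only once:

```shell
# Build the --initial-cluster value from "name=ip" pairs.
# initial_cluster is a hypothetical helper; 2380 is the etcd peer port.
initial_cluster() (
  out=""
  for pair in "$@"; do
    # Append "name=https://ip:2380", comma-separated after the first entry
    out="${out:+$out,}${pair%%=*}=https://${pair#*=}:2380"
  done
  printf '%s\n' "$out"
)

initial_cluster etcd1=10.90.26.2 etcd2=10.90.26.3 etcd3=10.90.26.4
```

The printed string can be pasted after --initial-cluster= in each node's etcd.service.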

Start etcd

Start the etcd service on every node

systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
systemctl status etcd

# If it fails, use the following to locate the problem
journalctl -f -t etcd
journalctl -u etcd

Verify the etcd cluster status

Check the etcd cluster health:

etcdctl --endpoints=https://10.90.26.2:2379,https://10.90.26.3:2379,https://10.90.26.4:2379\
        --cert-file=/etc/kubernetes/ssl/etcd.pem \
        --ca-file=/etc/kubernetes/ssl/ca.pem \
        --key-file=/etc/kubernetes/ssl/etcd-key.pem \
        cluster-health

member 35eefb8e7cc93b53 is healthy: got healthy result from https://10.90.26.2:2379
member 4576ff5ed626a66b is healthy: got healthy result from https://10.90.26.3:2379
member bf3bd651ec832339 is healthy: got healthy result from https://10.90.26.4:2379
cluster is healthy

List the etcd cluster members:

etcdctl --endpoints=https://10.90.26.2:2379,https://10.90.26.3:2379,https://10.90.26.4:2379\
        --cert-file=/etc/kubernetes/ssl/etcd.pem \
        --ca-file=/etc/kubernetes/ssl/ca.pem \
        --key-file=/etc/kubernetes/ssl/etcd-key.pem \
        member list


35eefb8e7cc93b53: name=etcd3 peerURLs=https://10.90.26.4:2380 clientURLs=https://10.90.26.4:2379 isLeader=false
4576ff5ed626a66b: name=etcd1 peerURLs=https://10.90.26.2:2380 clientURLs=https://10.90.26.2:2379 isLeader=true
bf3bd651ec832339: name=etcd2 peerURLs=https://10.90.26.3:2380 clientURLs=https://10.90.26.3:2379 isLeader=false

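Configure the Kubernetes cluster

Install kubectl on every machine from which the cluster will be operated.

Master and Node

The Master runs three components: kube-apiserver, kube-scheduler, and kube-controller-manager. kube-scheduler decides which node each pod is assigned to; put simply, it schedules resources. kube-controller-manager runs the control loops of the deployment controller, replication controller, endpoints controller, namespace controller, serviceaccounts controller, and so on, and interacts with kube-apiserver.

Install the components

# Download the release binaries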

cd /tmp

wget https://dl.k8s.io/v1.9.4/kubernetes-server-linux-amd64.tar.gz

tar -xzvf kubernetes-server-linux-amd64.tar.gz

cd kubernetes

cp -r server/bin/{kube-apiserver,kube-controller-manager,kube-scheduler,kubectl} /usr/local/bin/


scp server/bin/{kube-apiserver,kube-controller-manager,kube-scheduler,kubectl,kube-proxy,kubelet} 10.90.26.3:/usr/local/bin/


scp server/bin/{kube-proxy,kubelet} 10.90.26.4:/usr/local/bin/

Create the admin certificate

kubectl talks to kube-apiserver over the secure port, so a TLS certificate and key are needed for that connection.

cd /opt/ssl/

vi admin-csr.json


{
  "CN": "admin",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "system:masters",
      "OU": "System"
    }
  ]
}

# Generate the admin certificate and private key
cd /opt/ssl/

/opt/local/cfssl/cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem \
  -ca-key=/etc/kubernetes/ssl/ca-key.pem \
  -config=/opt/ssl/config.json \
  -profile=kubernetes admin-csr.json | /opt/local/cfssl/cfssljson -bare admin


# Check the generated files

[root@kubernetes-2 ssl]# ls admin*
admin.csr  admin-csr.json  admin-key.pem  admin.pem

cp admin*.pem /etc/kubernetes/ssl/

Configure the kubectl kubeconfig file

The generated configuration is stored in the /root/.kube directory.

# Configure the kubernetes cluster

kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=https://10.90.26.2:6443


# Configure the client credentials

kubectl config set-credentials admin \
  --client-certificate=/etc/kubernetes/ssl/admin.pem \
  --embed-certs=true \
  --client-key=/etc/kubernetes/ssl/admin-key.pem
  


kubectl config set-context kubernetes \
  --cluster=kubernetes \
  --user=admin


kubectl config use-context kubernetes

Create the kubernetes certificate

cd /opt/ssl

vi kubernetes-csr.json

{
  "CN": "kubernetes",
  "hosts": [
    "127.0.0.1",
    "10.90.26.2",
    "10.90.26.3",
    "10.90.26.4",
    "10.254.0.1",
    "kubernetes",
    "kubernetes.default",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster",
    "kubernetes.default.svc.cluster.local"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}


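## The hosts field contains 127.0.0.1 (localhost) plus the node IPs 10.90.26.2, 10.90.26.3 and 10.90.26.4; with several Masters, every Master IP must be listed. 10.254.0.1 is the IP of the kubernetes Service, normally the first IP of the service network (--service-cluster-ip-range). Once the cluster is up, kubectl get svc shows it.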
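
The Service IP that has to appear in the certificate can be derived from the service range instead of being worked out by hand. The sketch below is only an illustration (first_service_ip is a hypothetical helper, and it assumes an IPv4 CIDR with plain decimal octets): it takes the network base of the CIDR and adds one.

```shell
# Print the first usable IP of an IPv4 CIDR (network base + 1).
# first_service_ip is a hypothetical helper; octets must be plain decimal.
first_service_ip() (
  ip=${1%/*}
  bits=${1#*/}
  IFS=.
  set -- $ip                      # split the address into four octets
  n=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  size=$(( 1 << (32 - $bits) ))   # number of addresses in the range
  first=$(( n - (n % size) + 1 )) # align to the network base, then add 1
  echo "$(( (first >> 24) & 255 )).$(( (first >> 16) & 255 )).$(( (first >> 8) & 255 )).$(( first & 255 ))"
)

first_service_ip 10.254.0.0/18
```

For the range used in this guide, 10.254.0.0/18, this yields 10.254.0.1, the value listed in kubernetes-csr.json.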

Generate the kubernetes certificate and private key

/opt/local/cfssl/cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem \
  -ca-key=/etc/kubernetes/ssl/ca-key.pem \
  -config=/opt/ssl/config.json \
  -profile=kubernetes kubernetes-csr.json | /opt/local/cfssl/cfssljson -bare kubernetes

# Check the generated files

[root@kubernetes-2 ssl]# ls -lt kubernetes*
-rw-r--r-- 1 root root 1261 Nov 16 15:12 kubernetes.csr
-rw------- 1 root root 1679 Nov 16 15:12 kubernetes-key.pem
-rw-r--r-- 1 root root 1635 Nov 16 15:12 kubernetes.pem
-rw-r--r-- 1 root root  475 Nov 16 15:12 kubernetes-csr.json


# Copy to the certificate directory
cp kubernetes*.pem /etc/kubernetes/ssl/

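Configure kube-apiserver

On first start, kubelet sends a TLS Bootstrapping request to kube-apiserver. kube-apiserver checks whether the token in the request matches the token it was configured with; if it does, a certificate and key are generated for the kubelet automatically.

# Generate the token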

[root@kubernetes-2 ssl]# head -c 16 /dev/urandom | od -An -t x | tr -d ' '
df3b158fbdc425ae2ac70bbef0688921


# Create the token.csv file

cd /opt/ssl

vi token.csv

df3b158fbdc425ae2ac70bbef0688921,kubelet-bootstrap,10001,"system:kubelet-bootstrap"
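
The token and the token.csv line can also be produced in one step. This is a sketch, not the author's script: bootstrap_token_line is a hypothetical helper, and the user, uid and group are simply the values from the line above.

```shell
# Generate a random 32-hex-character token and print a token.csv line.
# bootstrap_token_line is a hypothetical helper; the user name, uid and
# group are the values used in the token.csv shown above.
bootstrap_token_line() (
  # 16 random bytes printed as hex, with spaces and newlines removed
  token=$(head -c 16 /dev/urandom | od -An -t x1 | tr -d ' \n')
  printf '%s,kubelet-bootstrap,10001,"system:kubelet-bootstrap"\n' "$token"
)

bootstrap_token_line
```

Redirecting the output to /opt/ssl/token.csv gives the file created above; the same token must later be passed to kubectl config set-credentials for bootstrap.kubeconfig.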


# Copy

cp token.csv /etc/kubernetes/

# Generate the advanced audit policy file

cd /etc/kubernetes


cat >> audit-policy.yaml <
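
The heredoc above is cut off in the source, so the original policy body is not recoverable here. As an assumption only, a minimal policy that records every request at the Metadata level could be written as follows (write_audit_policy is a hypothetical helper; audit.k8s.io/v1beta1 is the audit API version available in 1.9):

```shell
# Write a minimal audit policy to the given path.
# ASSUMED content, not the author's original file: every request is
# logged at the Metadata level.
write_audit_policy() {
  cat > "$1" <<'EOF'
apiVersion: audit.k8s.io/v1beta1
kind: Policy
rules:
- level: Metadata
EOF
}

# Example: write_audit_policy /etc/kubernetes/audit-policy.yaml
```

The path passed in should match the --audit-policy-file flag given to kube-apiserver.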



Create the kube-apiserver.service file

# Custom systemd service files normally live under /etc/systemd/system/
# Set the addresses to each machine's own local IP

vi /etc/systemd/system/kube-apiserver.service

[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
User=root
ExecStart=/usr/local/bin/kube-apiserver \
  --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota,NodeRestriction \
  --advertise-address=10.90.26.4 \
  --allow-privileged=true \
  --apiserver-count=3 \
  --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
  --audit-log-maxage=30 \
  --audit-log-maxbackup=3 \
  --audit-log-maxsize=100 \
  --audit-log-path=/var/log/kubernetes/audit.log \
  --authorization-mode=Node,RBAC \
  --bind-address=0.0.0.0 \
  --secure-port=6443 \
  --client-ca-file=/etc/kubernetes/ssl/ca.pem \
  --enable-swagger-ui=true \
  --etcd-cafile=/etc/kubernetes/ssl/ca.pem \
  --etcd-certfile=/etc/kubernetes/ssl/etcd.pem \
  --etcd-keyfile=/etc/kubernetes/ssl/etcd-key.pem \
  --etcd-servers=https://10.90.26.2:2379,https://10.90.26.3:2379,https://10.90.26.4:2379 \
  --event-ttl=1h \
  --kubelet-https=true \
  --insecure-bind-address=127.0.0.1 \
  --insecure-port=8080 \
  --service-account-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --service-cluster-ip-range=10.254.0.0/18 \
  --service-node-port-range=30000-32000 \
  --tls-cert-file=/etc/kubernetes/ssl/kubernetes.pem \
  --tls-private-key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
  --enable-bootstrap-token-auth \
  --token-auth-file=/etc/kubernetes/token.csv \
  --v=1
Restart=on-failure
RestartSec=5
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target



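# Since k8s 1.8, --authorization-mode=Node must be added
# Since k8s 1.8, --admission-control=NodeRestriction must be added
# Since k8s 1.8, --audit-policy-file=/etc/kubernetes/audit-policy.yaml must be added

# Note --service-node-port-range=30000-32000
# This is the port range used when mapping ports to the outside. Randomly assigned ports come from this range, and explicitly requested ports must fall inside it as well.


Start kube-apiserver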

systemctl daemon-reload
systemctl enable kube-apiserver
systemctl start kube-apiserver
systemctl status kube-apiserver



Configure kube-controller-manager


The --cluster-signing-cert-file and --cluster-signing-key-file flags will be removed.


# Create the kube-controller-manager.service file

vi /etc/systemd/system/kube-controller-manager.service


[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-controller-manager \
  --address=0.0.0.0 \
  --master=http://127.0.0.1:8080 \
  --allocate-node-cidrs=true \
  --service-cluster-ip-range=10.254.0.0/18 \
  --cluster-cidr=10.254.64.0/18 \
  --cluster-name=kubernetes \
  --cluster-signing-cert-file=/etc/kubernetes/ssl/ca.pem \
  --cluster-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --service-account-private-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --root-ca-file=/etc/kubernetes/ssl/ca.pem \
  --leader-elect=true \
  --v=1
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target




Start kube-controller-manager

systemctl daemon-reload
systemctl enable kube-controller-manager
systemctl start kube-controller-manager
systemctl status kube-controller-manager


Configure kube-scheduler

# Create the kube-scheduler.service file

vi /etc/systemd/system/kube-scheduler.service


[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-scheduler \
  --address=0.0.0.0 \
  --master=http://127.0.0.1:8080 \
  --leader-elect=true \
  --v=1
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target



Start kube-scheduler

systemctl daemon-reload
systemctl enable kube-scheduler
systemctl start kube-scheduler
systemctl status kube-scheduler



Verify the Master node

[root@kubernetes-2 ~]# kubectl get componentstatuses
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok                   
scheduler            Healthy   ok                   
etcd-2               Healthy   {"health": "true"}   
etcd-0               Healthy   {"health": "true"}   
etcd-1               Healthy   {"health": "true"} 







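Configure kubelet


When kubelet starts, it sends a TLS bootstrapping request to kube-apiserver. The kubelet-bootstrap user from the bootstrap token file must first be bound to the system:node-bootstrapper role; only then does kubelet have permission to create certificate signing requests (certificatesigningrequests).



# Create the role binding first
# The user is the one configured in token.csv on the master
# This only needs to be done once

kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap



Create the kubelet kubeconfig file

# Configure the cluster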

kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=https://10.90.26.4:6443 \
  --kubeconfig=bootstrap.kubeconfig

# Configure the client credentials

kubectl config set-credentials kubelet-bootstrap \
  --token=df3b158fbdc425ae2ac70bbef0688921 \
  --kubeconfig=bootstrap.kubeconfig


# Configure the context

kubectl config set-context default \
  --cluster=kubernetes \
  --user=kubelet-bootstrap \
  --kubeconfig=bootstrap.kubeconfig
  
  
# Set the default context
kubectl config use-context default --kubeconfig=bootstrap.kubeconfig

# Move the generated bootstrap.kubeconfig into place and copy it to the other nodes

mv bootstrap.kubeconfig /etc/kubernetes/

scp /etc/kubernetes/bootstrap.kubeconfig 10.90.26.3:/etc/kubernetes/

scp /etc/kubernetes/bootstrap.kubeconfig 10.90.26.4:/etc/kubernetes/



Create the kubelet.service file

# Create the kubelet directory

> Set the node-specific values below (hostname/IP) to those of the node itself

mkdir /var/lib/kubelet

vi /etc/systemd/system/kubelet.service


[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/local/bin/kubelet \
  --cgroup-driver=cgroupfs \
  --hostname-override=kubernetes-2 \
  --pod-infra-container-image=mirrorgooglecontainers/pause-amd64:3.0 \
  --experimental-bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig \
  --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
  --cert-dir=/etc/kubernetes/ssl \
  --cluster_dns=10.254.0.2 \
  --cluster_domain=cluster.local. \
  --hairpin-mode promiscuous-bridge \
  --allow-privileged=true \
  --fail-swap-on=false \
  --serialize-image-pulls=false \
  --logtostderr=true \
  --max-pods=512 \
  --v=1

[Install]
WantedBy=multi-user.target



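# In the configuration above:
kubernetes-2    the local hostname
10.254.0.2       the pre-allocated DNS address
cluster.local.   the domain of the kubernetes cluster
mirrorgooglecontainers/pause-amd64:3.0  the base image for pods, i.e. gcr's gcr.io/google_containers/pause-amd64:3.0. Pulling it once and pushing it to your own registry makes it faster to fetch.


Start kubelet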


systemctl daemon-reload
systemctl enable kubelet
systemctl start kubelet
systemctl status kubelet



# If it fails, use the following to locate the problem
journalctl -f -t kubelet
journalctl -u kubelet



Configure TLS authentication

# List the CSR names

[root@kubernetes-2 ~]# kubectl get csr
NAME                                                   AGE       REQUESTOR           CONDITION
node-csr-Pu4QYp3NAwlC6o8AG8iwdCl52CiqhjiSyrso3335JTs   1m        kubelet-bootstrap   Pending
node-csr-poycCHd7B8YPxc12EBgI3Rwe0wnDJah5uIGvQHzghVY   2m        kubelet-bootstrap   Pending


# Approve the requests

kubectl get csr | grep Pending | awk '{print $1}' | xargs kubectl certificate approve
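
The grep in the pipeline above matches "Pending" anywhere on a line. A slightly stricter variant, sketched here under the assumption that the output has the four columns shown above, matches only the CONDITION column and skips the header (pending_csrs is a hypothetical helper):

```shell
# Print the names of CSRs whose last column (CONDITION) is exactly "Pending".
# Reads `kubectl get csr` output on stdin; pending_csrs is a hypothetical helper.
pending_csrs() {
  # NR > 1 skips the header row; $NF is the last whitespace-separated field
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Usage (assumes kubectl is configured as described above):
#   kubectl get csr | pending_csrs | xargs kubectl certificate approve
```

Already approved requests, whose condition reads Approved,Issued, are left alone.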



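Verify the nodes

[root@kubernetes-2 ~]# kubectl get nodes
NAME            STATUS    ROLES     AGE       VERSION
kubernetes-2    Ready     <none>    12s       v1.9.4
kubernetes      Ready     <none>    17s       v1.9.4


# On success, the configuration file and keys are generated automatically

# Configuration file

ls /etc/kubernetes/kubelet.kubeconfig   
/etc/kubernetes/kubelet.kubeconfig


# Key files. Note: if the CSR has been deleted, delete the files below and restart the kubelet service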

ls /etc/kubernetes/ssl/kubelet*
/etc/kubernetes/ssl/kubelet-client.crt  /etc/kubernetes/ssl/kubelet.crt
/etc/kubernetes/ssl/kubelet-client.key  /etc/kubernetes/ssl/kubelet.key



Configure kube-proxy

Create the kube-proxy certificate

# cfssl is not installed on the node side,
# so go back to the master to generate the certificate and copy it over

[root@kubernetes-2 ~]# cd /opt/ssl


vi kube-proxy-csr.json

{
  "CN": "system:kube-proxy",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}



Generate the kube-proxy certificate and private key

/opt/local/cfssl/cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem \
  -ca-key=/etc/kubernetes/ssl/ca-key.pem \
  -config=/opt/ssl/config.json \
  -profile=kubernetes  kube-proxy-csr.json | /opt/local/cfssl/cfssljson -bare kube-proxy
  
# Check the generated files
ls kube-proxy*
kube-proxy.csr  kube-proxy-csr.json  kube-proxy-key.pem  kube-proxy.pem

# Copy to the certificate directory

cp kube-proxy* /etc/kubernetes/ssl/

scp kube-proxy* 10.90.26.3:/etc/kubernetes/ssl/

scp kube-proxy* 10.90.26.4:/etc/kubernetes/ssl/


Create the kube-proxy kubeconfig file

# Configure the cluster

kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=https://10.90.26.2:6443 \
  --kubeconfig=kube-proxy.kubeconfig


# Configure the client credentials

kubectl config set-credentials kube-proxy \
  --client-certificate=/etc/kubernetes/ssl/kube-proxy.pem \
  --client-key=/etc/kubernetes/ssl/kube-proxy-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-proxy.kubeconfig
  
  
# Configure the context

kubectl config set-context default \
  --cluster=kubernetes \
  --user=kube-proxy \
  --kubeconfig=kube-proxy.kubeconfig



# Set the default context
kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig

# Copy to the nodes that need it

scp kube-proxy.kubeconfig 10.90.26.3:/etc/kubernetes/

scp kube-proxy.kubeconfig 10.90.26.4:/etc/kubernetes/


Create the kube-proxy.service file

# Create the kube-proxy directory

mkdir -p /var/lib/kube-proxy


vi /etc/systemd/system/kube-proxy.service

[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
WorkingDirectory=/var/lib/kube-proxy
ExecStart=/usr/local/bin/kube-proxy \
  --bind-address=10.90.26.2 \
  --hostname-override=kubernetes-2 \
  --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig \
  --logtostderr=true \
  --v=1
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target



Start kube-proxy


systemctl daemon-reload
systemctl enable kube-proxy
systemctl start kube-proxy
systemctl status kube-proxy



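# If it fails, use the following to locate the problem
journalctl -f -t kube-proxy
journalctl -u kube-proxy



At this point the Master and the Master-and-Node machines are installed. Before real use, continue with the next step and install a network plugin; only then can applications be deployed.

For a plain node, follow the master configuration but set up only the kubelet and kube-proxy services.

Follow-up steps

Install the CNI plugins:
Download cni-v0.6.0.tgz from the CNI plugins release page. The archive contains many plugins; copy the following ones into the bin directory

Plugins used by flannel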
bridge
flannel
host-local
loopback
portmap

Modify the kubelet configuration and add the CNI flags:

[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/root/local/bin/kubelet \
  --address=192.168.112.156 \
  --cgroup-driver=systemd \
  --hostname-override=192.168.112.156 \
  --pod-infra-container-image=mirrorgooglecontainers/pause-amd64:3.0 \
  --experimental-bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig \
  --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
  --cert-dir=/etc/kubernetes/ssl \
  --cluster_dns=10.254.0.2 \
  --network-plugin=cni \
  --cni-conf-dir=/etc/cni/net.d \
  --cni-bin-dir=/usr/local/bin \
  --cluster_domain=cluster.local. \
  --hairpin-mode promiscuous-bridge \
  --allow-privileged=true \
  --fail-swap-on=false \
  --serialize-image-pulls=false \
  --logtostderr=true \
  --max-pods=512 \
  --v=3

[Install]
WantedBy=multi-user.target



Restart kubelet. The node may now show NotReady, because there is no CNI configuration file yet.

 mkdir -p /etc/cni/net.d/
 vim /etc/cni/net.d/cni-default.conf
 {
        "name": "mynet",
        "type": "bridge",
        "bridge": "mynet0",
        "isDefaultGateway": true,
        "ipMasq": true,
        "hairpinMode": true,
        "ipam": {
                "type": "host-local",
                "subnet": "10.254.0.0/18"
        }
 }
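
Note that the subnet in this file, 10.254.0.0/18, is the same range passed to --service-cluster-ip-range earlier, while kube-controller-manager was given --cluster-cidr=10.254.64.0/18. If the bridge subnet is meant to be the pod network, it may be worth checking that the two ranges do not overlap. The sketch below is one way to do that (cidr_range and cidr_overlap are hypothetical helpers; IPv4 with plain decimal octets is assumed):

```shell
# Print the first and last address of an IPv4 CIDR as integers.
# cidr_range and cidr_overlap are hypothetical helpers for illustration.
cidr_range() (
  ip=${1%/*}
  bits=${1#*/}
  IFS=.
  set -- $ip                       # split the address into four octets
  n=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  size=$(( 1 << (32 - $bits) ))    # number of addresses in the range
  start=$(( n - (n % size) ))      # align to the network base
  echo "$start $(( start + size - 1 ))"
)

# Print "overlap" if the two CIDRs share any address, else "disjoint".
cidr_overlap() (
  set -- $(cidr_range "$1") $(cidr_range "$2")
  # $1/$2 are start/end of the first range, $3/$4 of the second
  if [ "$1" -le "$4" ] && [ "$3" -le "$2" ]; then
    echo overlap
  else
    echo disjoint
  fi
)

cidr_overlap 10.254.0.0/18 10.254.64.0/18
```

With the values from this guide, the service range and the --cluster-cidr range are disjoint, whereas the subnet in the CNI file above overlaps the service range.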


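Then restart kubelet and check its status; the node becomes Ready.

Install the network plugin


Install a network plugin (flannel, calico)
Run it as a DaemonSet


https://github.com/coreos/flannel/tree/master/Documentation/k8s-manifests
# Edit the net-conf.json configuration.
# docker needs a proxy configured in order to pull the images.


Install the kube-dns add-on

Use the official yaml file

Install the dashboard add-on

Use the official yaml file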

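High-availability options


keepalived + haproxy
This approach relies on a VIP, and only one apiserver actually serves traffic at a time. Install keepalived and haproxy on the master nodes and configure the proxy. Every node then connects to the API at <virtual IP>:6443.
With this approach, change the apiserver address in the kubelet and kube-proxy kubeconfig files above to https://vip:6443

nginx reverse proxy
This approach uses nginx as a reverse proxy. The master nodes need no special configuration: simply install n master nodes. Install nginx on every node, version 1.9 or later with the stream module enabled for TCP reverse proxying, and have every node connect to the API at 127.0.0.1:6443.
With this approach, change the apiserver address in the kubelet and kube-proxy kubeconfig files above to https://127.0.0.1:6443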
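
For the nginx variant, the per-node configuration amounts to a stream block that listens on 127.0.0.1:6443 and forwards to the masters. The sketch below only prints such a block; nginx_apiserver_stream and the upstream name kube_apiserver are names made up for this example, and the master IPs are the ones used in this guide.

```shell
# Print an nginx stream block that listens on 127.0.0.1:6443 and balances
# TCP connections across the given master IPs.
# nginx_apiserver_stream and the upstream name kube_apiserver are
# hypothetical names chosen for this sketch.
nginx_apiserver_stream() {
  echo "stream {"
  echo "    upstream kube_apiserver {"
  for ip in "$@"; do
    echo "        server ${ip}:6443;"
  done
  echo "    }"
  echo "    server {"
  echo "        listen 127.0.0.1:6443;"
  echo "        proxy_pass kube_apiserver;"
  echo "    }"
  echo "}"
}

nginx_apiserver_stream 10.90.26.2 10.90.26.3
```

The stream block belongs at the top level of nginx.conf, next to the http block rather than inside it.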

Permalink: https://zhusl.com/post/kubernetes-1-9-install-templ.html



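Installing Kubernetes 1.9 with kubeadm


Until now I had always installed kubernetes from binaries, first by installing each service component by hand and later with automated ansible deployment. The binary approach leaves plenty of room for customization and makes it easy to build a highly available cluster. kubeadm is the simplest installation method and comes from the project itself, but it does not yet offer a highly available setup, and its images are all hosted by Google. Given the network restrictions in China there were too many obstacles, so I had never looked into it. Today I finally tried it by hand. The troublesome parts are obtaining the latest rpm packages and the Google images; I used github + dockerhub and a server abroad for a rather tedious setup. In summary: the initial preparation is a hassle, but once the first install is done, later builds really are simple and fast.





Getting the required rpm packages

Download the rpm package sources on a server outside China and build the rpms there. The source is at https://github.com/kubernetes/release (the version installed by yum is rather old; Aliyun kubernetes repo: https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64)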

root@ip-172-31-21-37:~# git clone https://github.com/kubernetes/release.git
root@ip-172-31-21-37:~# cd release/
root@ip-172-31-21-37:~/release# ls
anago     build        changelog-update    debian  docs              gcb     Gopkg.lock  lib      prin           README-CI.md   README.md  rpm              toolbox
branchff  BUILD.bazel  code-of-conduct.md  defs    find_green_build  gcbmgr  Gopkg.toml  LICENSE  push-build.sh  README-gcb.md  relnotes   script-template  WORKSPACE
root@ip-172-31-21-37:~/release# cd rpm/
root@ip-172-31-21-37:~/release/rpm# ls
10-kubeadm-post-1.8.conf  10-kubeadm-pre-1.8.conf  docker-build.sh  Dockerfile  entry.sh  kubelet.service  kubelet.spec  output
root@ip-172-31-21-37:~/release/rpm# ./docker-build.sh 



When the build finishes, the rpm packages are generated in the output directory under the current directory

root@ip-172-31-21-37:~/release/rpm# cd output/
root@ip-172-31-21-37:~/release/rpm/output# ls
aarch64  armhfp  ppc64le  s390x  x86_64 


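For a CentOS install, just package up the x86_64 directory and download it.

Getting the Google images

To obtain the official images, use dockerhub automated builds as a relay:
Create a repository on github to hold the Dockerfiles that dockerhub needs; each one only has to specify FROM.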

FROM  gcr.io/google_containers/etcd-amd64:3.1.10



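Image list (the image versions can be read from the /etc/kubernetes/manifests directory after running kubeadm init):
kube-proxy is not listed there for now, but its version normally matches the others, so the same version number can be used.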

gcr.io/google_containers/etcd-amd64:3.1.10
gcr.io/google_containers/kube-apiserver-amd64:v1.9.2
gcr.io/google_containers/kube-controller-manager-amd64:v1.9.2
gcr.io/google_containers/kube-scheduler-amd64:v1.9.2


gcr.io/google_containers/kube-proxy-amd64:v1.9.2


Other dependency images; for the version numbers see the official site (https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/):

gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.7
gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7 
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7 
gcr.io/google_containers/pause-amd64:3.0


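Once the dockerhub automated builds finish, pull the images through a registry mirror (accelerator) and re-tag them.

Images built automatically on the hub: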

zhusl/etcd-amd64:3.1.10
zhusl/kube-apiserver-amd64:v1.9.2
zhusl/kube-controller-manager-amd64:v1.9.2
zhusl/kube-scheduler-amd64:v1.9.2
zhusl/kube-proxy-amd64:v1.9.2
zhusl/k8s-dns-sidecar-amd64:1.14.7
zhusl/k8s-dns-kube-dns-amd64:1.14.7 
zhusl/k8s-dns-dnsmasq-nanny-amd64:1.14.7 
zhusl/pause-amd64:3.0


Images after re-tagging locally:

gcr.io/google_containers/etcd-amd64:3.1.10
gcr.io/google_containers/kube-apiserver-amd64:v1.9.2
gcr.io/google_containers/kube-controller-manager-amd64:v1.9.2
gcr.io/google_containers/kube-scheduler-amd64:v1.9.2
gcr.io/google_containers/kube-proxy-amd64:v1.9.2
gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.7
gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7 
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7 
gcr.io/google_containers/pause-amd64:3.0
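
The pull and re-tag step is the same for every image, so it can be scripted. The following is a dry-run sketch that only prints the docker commands (retag_cmds is a hypothetical helper; zhusl is the Docker Hub namespace from the list above):

```shell
# Print the docker commands that pull each image from a Docker Hub namespace
# and re-tag it under gcr.io/google_containers. Dry run: nothing is executed.
# retag_cmds is a hypothetical helper used only for illustration.
retag_cmds() {
  ns=$1
  shift
  for img in "$@"; do
    echo "docker pull ${ns}/${img}"
    echo "docker tag ${ns}/${img} gcr.io/google_containers/${img}"
  done
}

retag_cmds zhusl etcd-amd64:3.1.10 pause-amd64:3.0
```

Piping the output to sh runs the commands; listing all nine images from above as arguments covers the whole set.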


System environment initialization

hostnamectl set-hostname kube.master
# Stop and disable the firewall
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld
# Turn off swap
swapoff -a 
sed -i 's/.*swap.*/#&/' /etc/fstab
# Disable SELinux
setenforce  0 
sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/sysconfig/selinux 
sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config 
sed -i "s/^SELINUX=permissive/SELINUX=disabled/g" /etc/sysconfig/selinux 
sed -i "s/^SELINUX=permissive/SELINUX=disabled/g" /etc/selinux/config 

getenforce
# Add a DNS server
echo "nameserver 114.114.114.114" >> /etc/resolv.conf
# Set the kernel parameters
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl -p /etc/sysctl.d/k8s.conf

# If sysctl -p reports the following errors:
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory

# Fix: load the br_netfilter module, then run sysctl -p again
modprobe br_netfilter
ls /proc/sys/net/bridge


Install the rpm packages downloaded earlier:

[root@kube ~]# cd x86_64
[root@kube x86_64]# ls
kubeadm-1.9.0-0.x86_64.rpm  kubectl-1.9.0-0.x86_64.rpm  kubelet-1.9.0-0.x86_64.rpm  kubernetes-cni-0.6.0-0.x86_64.rpm  repodata
[root@kube x86_64]# rpm -ivh *.rpm


docker also has to be installed by hand. I installed 17.12.0-ce, whose default Cgroup Driver is cgroupfs, while kubelet defaults to systemd, so the kubelet configuration must be changed:

[root@kube ~]# cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf 
......
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
......
ExecStart=/usr/bin/kubelet         
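
Before editing the drop-in, it helps to confirm which driver docker actually uses. A small sketch, assuming docker info prints a "Cgroup Driver:" line as it does for the version used here (docker_cgroup_driver is a hypothetical helper):

```shell
# Extract the cgroup driver from `docker info` output read on stdin.
# docker_cgroup_driver is a hypothetical helper used only for illustration.
docker_cgroup_driver() {
  # Print only the value after "Cgroup Driver:", ignoring leading spaces
  sed -n 's/^ *Cgroup Driver: *//p'
}

# Usage (assumes docker is installed and running):
#   docker info 2>/dev/null | docker_cgroup_driver
```

The value printed should match the --cgroup-driver value in KUBELET_CGROUP_ARGS; after changing the drop-in, run systemctl daemon-reload and restart kubelet.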


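Start the installation

Run kubeadm init --pod-network-cidr 172.23.0.0/16
> If it reports connection failures, configure an HTTP proxy. I used a self-hosted SOCKS5 proxy; the docker service needs the proxy as well, and local IPs must be excluded from it. In short, it is tedious. --pod-network-cidr must be specified; otherwise, once the installation finishes, the flannel network plugin fails at startup with 'node "kube.master" pod cidr not assigned'


Postscript: I had prepared only four of the service images before starting. The proxy was probably needed only because the image set was incomplete, since most of the traffic during installation goes to Google's image registry.

Appendix: adding a proxy to the docker service: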


[root@kube ~]# cat /etc/systemd/system/docker.service 
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.io

[Service]
Environment="PATH=/root/local/bin:/bin:/sbin:/usr/bin:/usr/sbin"
Environment=HTTP_PROXY=http://172.31.8.194:1080/
Environment=HTTPS_PROXY=http://172.31.8.194:1080/
ExecStart=/root/local/bin/dockerd --log-level=error 
ExecStartPost=/sbin/iptables -I FORWARD -s 0.0.0.0/0 -j ACCEPT
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
RestartSec=5
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target




Output of a successful installation:

  [root@kube ~]# kubeadm  init --pod-network-cidr 172.23.0.0/16
  [init] Using Kubernetes version: v1.9.2
  [init] Using Authorization modes: [Node RBAC]
  [preflight] Running pre-flight checks.
     [WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 17.12.0-ce. Max validated version: 17.03
     [WARNING FileExisting-crictl]: crictl not found in system path
     [WARNING HTTPProxyCIDR]: connection to "10.96.0.0/12" uses proxy "http://172.31.8.194:1080/". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
     [WARNING HTTPProxyCIDR]: connection to "172.23.0.0/16" uses proxy "http://172.31.8.194:1080/". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
  [preflight] Starting the kubelet service
  [certificates] Generated ca certificate and key.
  [certificates] Generated apiserver certificate and key.
  [certificates] apiserver serving cert is signed for DNS names [kube.master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.90.24.159]
  [certificates] Generated apiserver-kubelet-client certificate and key.
  [certificates] Generated sa key and public key.
  [certificates] Generated front-proxy-ca certificate and key.
  [certificates] Generated front-proxy-client certificate and key.
  [certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
  [kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
  [kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
  [kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
  [kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
  [controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
  [controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
  [controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
  [etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
  [init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
  [init] This might take a minute or longer if the control plane images have to be pulled.
  [apiclient] All control plane components are healthy after 32.502960 seconds
  [uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
  [markmaster] Will mark node kube.master as master by adding a label and a taint
  [markmaster] Master kube.master tainted and labelled with key/value: node-role.kubernetes.io/master=""
  [bootstraptoken] Using token: 77a732.41806f2bfca667e9
  [bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
  [bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
  [bootstraptoken] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
  [bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
  [addons] Applied essential addon: kube-dns
  [addons] Applied essential addon: kube-proxy
  
  Your Kubernetes master has initialized successfully!
  
  To start using your cluster, you need to run the following as a regular user:
  
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
  
  You should now deploy a pod network to the cluster.
  Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
    https://kubernetes.io/docs/concepts/cluster-administration/addons/
  
  You can now join any number of machines by running the following on each node
  as root:
  
    kubeadm join --token 77a732.41806f2bfca667e9 10.90.24.159:6443 --discovery-token-ca-cert-hash sha256:87e3ce011aa73f6e38d81a512b5f0e46847b2a7d6f6bfda9403af513608fa2e9


Create the kubectl configuration as prompted

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config


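Install the network plugin

kubeadm init does not install a network plugin by default, so you have to install one yourself. The familiar choices are flannel and Calico; flannel is used here: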

kubectl create -f   https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
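One thing worth checking: the cluster above was initialised with --pod-network-cidr 172.23.0.0/16, while the stock kube-flannel.yml is written for 10.244.0.0/16 in its net-conf.json. The following is a minimal sketch, assuming that default is what the downloaded manifest contains, that rewrites the network before the manifest is applied. patch_flannel_cidr is a hypothetical helper name, not part of flannel or kubectl.

```shell
#!/bin/sh
# patch_flannel_cidr (hypothetical helper): read a flannel manifest on stdin and
# replace the assumed default pod network 10.244.0.0/16 with the CIDR given as $1.
patch_flannel_cidr() {
    sed "s#10\.244\.0\.0/16#$1#g"
}

# Intended use (needs network access and a working kubectl, so it is left commented out):
# curl -fsSL https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml \
#   | patch_flannel_cidr 172.23.0.0/16 > kube-flannel.yml
# kubectl create -f kube-flannel.yml
```

If the two networks already match in your setup, the helper is a no-op and the manifest can be applied as shown above.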


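Install the dashboard

I have installed the dashboard from binaries many times before, so here I simply reused my earlier YAML file. The official manifest works too, but its images have to be relayed through Docker Hub in the same way as described earlier.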

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml


Final result

[root@kube ~]# kubectl  get pods --all-namespaces
NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE
kube-system   etcd-kube.master                        1/1       Running   2          1h
kube-system   kube-apiserver-kube.master              1/1       Running   1          1h
kube-system   kube-controller-manager-kube.master     1/1       Running   0          1h
kube-system   kube-dns-6f4fd4bdf-9j8j6                3/3       Running   3          1h
kube-system   kube-flannel-ds-pn8jl                   1/1       Running   1          1h
kube-system   kube-proxy-rtgtv                        1/1       Running   2          1h
kube-system   kube-scheduler-kube.master              1/1       Running   0          1h
kube-system   kubernetes-dashboard-5b9649685d-n2k9k   1/1       Running   2          1h


Permalink: https://zhusl.com/post/kubeadm-install-1-9.html


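eiblog configuration walkthrough


Installation

1. Eiblog provides prebuilt archives for several platforms; pick the version and platform you need from the Eiblog releases page. You can also download one from the command line: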

$ curl -L https://github.com/eiblog/eiblog/releases/download/v1.0.0/eiblog-v1.0.0.`uname -s | tr '[A-Z]' '[a-z]'`-amd64.tar.gz > eiblog-v1.0.0.`uname -s | tr '[A-Z]' '[a-z]'`-amd64.tar.gz


2. If you happen to be a Gopher too, you will probably want to do it yourself. Clone the source:

$ git clone https://github.com/eiblog/eiblog.git


then compile the binary from source and run it.

3. If you are also comfortable with Docker, you can install it from an image:

$ docker pull registry.cn-hangzhou.aliyuncs.com/deepzz/eiblog:v1.2.0


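Note that the image does not ship the configuration under the conf folder, because that content is too site-specific. You therefore have to mount the conf directory into the container, as described later.

Local testing

These tests use the binary package. Once the executable is downloaded, local testing can begin. It requires two services: MongoDB (required) and Elasticsearch 2.4.1 (optional; without it the search feature is unavailable).

By default Eiblog connects to the hostnames mongodb and elasticsearch, so both must be listed in /etc/hosts. For example, if your MongoDB is at 172.42.0.1:27017 and Elasticsearch is at 192.168.99.100:9200: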

$ sudo vi /etc/hosts

# Append these two lines at the end
172.42.0.1      mongodb
192.168.99.100  elasticsearch
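To confirm that both names are mapped before starting eiblog, something like the sketch below can help. lookup_host is a hypothetical helper written for this post; it only parses a hosts-format file and does not query DNS (on a real system, getent hosts mongodb gives the resolver's view).

```shell
#!/bin/sh
# lookup_host (hypothetical helper): print the first address mapped to a name
# in a hosts-format file (default /etc/hosts), ignoring comments.
lookup_host() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$file"
}

# Example calls; each prints the mapped address, or nothing if the name is missing:
# lookup_host mongodb
# lookup_host elasticsearch
```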


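Let us first set up the two services.

Setting up MongoDB

On a Mac, MongoDB can be installed with brew install mongo; for other platforms, consult the relevant documentation.

Setting up Elasticsearch

Elasticsearch is somewhat more involved to set up, and running it under Docker is recommended. Note that the analyzers bundled with ES handle Chinese word segmentation poorly, so the elasticsearch-analysis-ik analyzer is used here. For more detail, see its GitHub page or the post on implementing in-site blog search.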


Pull the image: docker pull elasticsearch:2.4.1.

Add the environment variable ES_JAVA_OPTS: "-Xms512m -Xmx512m", unless you want your server to run out of memory.

Map the relevant directories:


   $PWD/conf/es/config:/usr/share/elasticsearch/config
   $PWD/conf/es/plugins:/usr/share/elasticsearch/plugins


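The author has already prepared the necessary ES configuration files; map these directories to the conf directory under eiblog. For more, see the docker-compose.yml file.

In summary, the docker command for running ES is: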

$ docker run -d --name eisearch \
    -p 9200:9200 \
    -e ES_JAVA_OPTS="-Xms512m -Xmx512m" \
    -v $PWD/conf/es/config:/usr/share/elasticsearch/config \
    -v $PWD/conf/es/plugins:/usr/share/elasticsearch/plugins \
    elasticsearch:2.4.1


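After that, run ./eiblog and your eiblog instance is up.

The blog home page is at 127.0.0.1:9000 and the admin login at 127.0.0.1:9000/admin/login. The account and password are the username and password values in eiblog/conf/app.yml; the initial values are deepz and deepzz.


Note that conf/app.yml holds the author's own settings. Some actions (such as commenting) may end up on the author's blog, so please avoid them where possible. Thanks.


Preparing for deployment

If you have tried the blog and still insist on setting it up, congratulations: you have found a blog system you will not want to replace. The following steps describe the deployment process in more detail.

Only Docker-based deployment is described here. If you need another method, use this one as a reference.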

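Prerequisites

A few things need to be in place. If you already have them, skip this part.


A server.
A domain name; servers in mainland China require ICP filing.
A valid certificate. Enabling autocert requests and renews certificates automatically; alternatively, apply for a one-year certificate from Qiniu or qcloud.
Qiniu CDN. The blog is designed to work only with Qiniu CDN; this provider should not disappoint you.
Disqus. This is the blog's comment system. You need a way around the Great Firewall to register an account, and the detailed setup would fill another post. In short, you need the shortname and the public key.
Google Analytics. Traffic statistics and analysis.
Superfeedr. Speeds up RSS subscriptions.
Twitter. Ideally you have a Twitter account.


Puzzled by so many requirements? This blog system was originally designed for one person and is exactly what the author wanted for himself. The article is not written to win as many users as possible, but to tell those who pursue perfection: this is the blog system you need.

Preparing files

The author is a bit obsessive, so some files live at fixed paths; please bear with that. Assuming your CDN domain is st.example.com, make sure the following files exist on your CDN at these paths: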




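| File | URL | Description |
| :-- | :-- | :-- |
| favicon.ico | st.example.com/static/img/favicon.ico | The file name on the CDN is static/img/favicon.ico. You can also copy favicon.ico into the static folder, after which it is reachable at example.com/favicon.ico as well. Docker users may need to rebuild the image. |
| bg04.jpg | st.example.com/static/img/bg04.jpg | Large background image on the left of the home page; to rename it, edit views/st_blog.css. |
| avatar.jpg | st.example.com/static/img/avatar.jpg | Avatar. |
| blank.gif | st.example.com/static/img/blank.gif | Blank image (download). |
| default_avatar.png | st.example.com/static/img/default_avatar.png | Default Disqus avatar (download). |
| disqus.js | st.example.com/static/js/disqus_xxx.js | Disqus script. Download your own copy from https://short_name.disqus.com/embed.js, upload it to Qiniu, and update app.yml. |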





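Note: for the files marked as downloads above, copy the link and download from it directly, because the author has hotlink protection enabled. Also:

  1. Whenever you modify app.yml (for example, to change the CDN domain or update the avatar) and are unsure whether staticversion needs to be raised, it is safest to increase it by 1.

  2. Whenever you manually edit a file in views whose name starts with st_, raise staticversion in app.yml by one version.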
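Bumping the version by hand is easy to forget, so it can be scripted. The sketch below assumes that app.yml holds the setting on a single line of the form staticversion: N with N an integer; that layout is an assumption, as the file itself is not shown here. bump_staticversion is a hypothetical helper name.

```shell
#!/bin/sh
# bump_staticversion (hypothetical helper): add 1 to the integer on the
# "staticversion:" line of the given YAML file, rewriting the file in place.
# Assumes a line of the form "staticversion: <integer>".
bump_staticversion() {
    file=$1
    tmp=$(mktemp) || return 1
    awk '
        /^[ \t]*staticversion:[ \t]*[0-9]+/ {
            match($0, /[0-9]+/)                    # locate the current number
            n = substr($0, RSTART, RLENGTH) + 1    # increment it
            $0 = substr($0, 1, RSTART - 1) n substr($0, RSTART + RLENGTH)
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"     # keep the original file permissions
    status=$?
    rm -f "$tmp"
    return $status
}

# Example: bump_staticversion conf/app.yml
```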


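Configuration

If you have made it this far, you are about 60% of the way there. It is not too late to give up.

Every file under eiblog/conf is explained here, so be prepared.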

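├── app.yml                         # blog configuration file
├── blackip.yml                     # blog IP blacklist
├── es                              # Elasticsearch configuration
│   ├── config                      # configuration files
│   │   ├── analysis                # synonyms
│   │   ├── elasticsearch.yml       # main settings
│   │   ├── logging.yml             # logging settings
│   │   └── scripts                 # scripts folder
│   └── plugins                     # plugins folder
│       └── ik1.10.1                # ik analyzer
├── nginx                           # nginx configuration
│   ├── domain                      # per-domain configuration; nginx reads the .conf files in this folder
│   │   └── eiblog.conf
│   ├── ip.blacklist                # nginx IP blacklist
│   └── nginx.conf                  # nginx configuration; replace the stock nginx config with it
├── scts                            # Certificate Transparency (SCT) files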
│   ├── ecc
│   │   ├── aviator.sct
│   │   └── digicert.sct
│   └── rsa
│       ├── aviator.sct
│       └── digicert.sct
├── ssl                             # certificate files; see eiblog.conf for how to generate them
│   ├── dhparams.pem
│   ├── domain.rsa.key
│   ├── domain.rsa.pem
│   ├── full_chained.pem
│   └── session_ticket.key
└── tpl                             # template files
    ├── crossdomainTpl.xml
    ├── feedTpl.xml
    ├── opensearchTpl.xml
    ├── robotsTpl.xml
    └── sitemapTpl.xml





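| Name | Description |
| :-- | :-- |
| app.yml | The configuration file for the whole program. Every option is already documented inside it, so it is not repeated here. |
| blackip.yml | The blog's built-in IP filter, for use when Nginx is not deployed. |
| es | Elasticsearch, a very powerful distributed search engine; GitHub uses it. The configuration rarely needs changes, but es/analysis/synonym.txt holds the synonyms, and you can add more by following the existing entries. scripts is the ES scripts folder. |
| nginx | The system uses nginx as a proxy (a blog will hardly occupy a whole server on its own). Replace the stock nginx configuration with nginx.conf. The blog's own configuration file is domain/eiblog.conf; it may be renamed as long as the name matches *.conf. eiblog.conf is where most of the subtlety lies, and you may or may not want to understand every line of it. Note that this configuration requires the latest nginx and OpenSSL 1.0.2j; for details see Jerry Qu's post on the complete Nginx configuration of his blog. |
| scts | Holds the CT files. |
| ssl | Holds everything related to certificates. |
| tpl | Templates; no changes needed. |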




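Deployment

docker

Make sure you have completed all of the steps above and tested successfully on your local machine, and that MongoDB and Elasticsearch are installed and running on the server.

First, upload the locally tested conf folder to the server; storing it under /data/eiblog is recommended.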

$ tree /data/eiblog -L 1

├── conf


Then pull the image onto the server.

# Pull the Eiblog image
$ docker pull registry.cn-hangzhou.aliyuncs.com/deepzz/eiblog


Finally, run docker run, and hopefully it succeeds.

$ docker run -d --name eiblog --restart=always \
    --add-host disqus.com:23.235.33.134 \
    --add-host mongodb:172.42.0.1 \
    --add-host elasticsearch:192.168.99.100 \
    -p 9000:9000 \
    -e GODEBUG=netdns=cgo \
    -v /data/eiblog/logdata:/eiblog/logdata \
    -v /data/eiblog/conf:/eiblog/conf \
    registry.cn-hangzhou.aliyuncs.com/deepzz/eiblog


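This assumes that both MongoDB and Elasticsearch are deployed with Docker, in containers named eidb and eisearch.

nginx + docker

Deploying with Nginx and Docker is the approach the author recommends. Docker Compose is used here to manage the whole blog system.

Make sure Nginx, docker, and docker-compose are installed. For Nginx, be sure to follow Jerry Qu's complete Nginx configuration post.

First, upload the locally tested conf folder and the docker-compose.yml file to the server. Storing conf under /data/eiblog is recommended; put docker-compose.yml wherever is convenient.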

$ tree /data/eiblog -L 1

├── conf

$ ls ~/

docker-compose.yml


Then run:

$ cd ~
$ docker-compose up -d


Wait a moment, and it will be running.
Permalink: https://zhusl.com/post/eiblog-install-0.html


Ceph reference tables


OSD flags

| Flag | Description | Use Cases |
| :-- | :-- | :-- |
| noin | Prevents OSDs from being treated as in the cluster. | Commonly used with noout to address flapping OSDs. |
| noout | Prevents OSDs from being treated as out of the cluster. | If the mon osd report timeout is exceeded and an OSD has not reported to the monitor, the OSD will get marked out. If this happens erroneously, you can set noout to prevent the OSD(s) from getting marked out while you troubleshoot the issue. By default the MON marks a down OSD out after 300 seconds (mon_osd_down_out_interval), and data migration starts as soon as it is out, so setting this flag while handling a failure avoids that migration. |
| noup | Prevents OSDs from being treated as up and running. | Commonly used with nodown to address flapping OSDs. |
| nodown | Prevents OSDs from being treated as down. | Networking issues may interrupt Ceph ‘heartbeat’ processes, and an OSD may be up but still get marked down. You can set nodown to prevent OSDs from getting marked down while troubleshooting the issue. This helps when other OSDs report a daemon as down although its process is still running, provided you are sure the process is healthy. |
| full | Makes a cluster appear to have reached its full_ratio, and thereby prevents write operations. | If a cluster is reaching its full_ratio, you can pre-emptively set the cluster to full and expand capacity. NOTE: Setting the cluster to full will prevent write operations. (Whether this works as described still needs to be tested in practice.) |
| pause | Ceph will stop processing read and write operations, but will not affect OSD in, out, up or down statuses. | If you need to troubleshoot a running Ceph cluster without clients reading and writing data, you can set the cluster to pause to prevent client operations. |
| nobackfill | Ceph will prevent new backfill operations. | If you need to take an OSD or node down temporarily, (e.g., upgrading daemons), you can set nobackfill so that Ceph will not backfill while the OSD(s) is down. |
| norebalance | Ceph will prevent new rebalancing operations. | Usually set together with nobackfill and norecover. When you operate on the cluster (taking down an OSD or a whole node) and do not want data recovery or migration to happen meanwhile, set this flag, and remember to unset it afterwards. |
| norecover | Ceph will prevent new recovery operations. | If you need to replace an OSD disk and don’t want the PGs to recover to another OSD while you are hotswapping disks, you can set norecover to prevent the other OSDs from copying a new set of PGs to other OSDs. |
| noscrub | Ceph will prevent new scrubbing operations. | If you want to prevent scrubbing (e.g., to reduce overhead during high loads, recovery, backfilling, rebalancing, etc.), you can set noscrub and/or nodeep-scrub to prevent the cluster from scrubbing OSDs. |
| nodeep-scrub | Ceph will prevent new deep scrubbing operations. | During cluster recovery, scrubbing can hurt recovery performance; set this together with noscrub to stop scrubbing. Setting it is generally not recommended. |
| notieragent | Ceph will disable the process that is looking for cold/dirty objects to flush and evict. | If you want to stop the tier agent process from finding cold objects to flush to the backing storage tier, you may set notieragent. |
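As a sketch of how the flags above are typically combined for planned maintenance, the helper below prints the ceph osd set and ceph osd unset commands for noout, nobackfill, norecover and norebalance. It deliberately prints rather than executes; maintenance_flag_cmds is a hypothetical name, and which flags you actually need depends on the operation.

```shell
#!/bin/sh
# maintenance_flag_cmds (hypothetical helper): print, without running them,
# the "ceph osd set|unset" commands for the flags usually combined during
# planned maintenance. $1 must be "set" or "unset".
maintenance_flag_cmds() {
    action=$1
    case $action in
        set|unset) ;;
        *) echo "usage: maintenance_flag_cmds set|unset" >&2; return 1 ;;
    esac
    for flag in noout nobackfill norecover norebalance; do
        echo "ceph osd $action $flag"
    done
}

maintenance_flag_cmds set      # before taking the OSD or node down
maintenance_flag_cmds unset    # once it is back up
```

Piping the output to sh on an admin node would run the commands; reviewing them first is the point of keeping the helper print-only.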




Auth capabilities (caps)

| Capability | Description |
| :-- | :-- |
| allow | Precedes access settings for a daemon. |
| r | Gives the user read access. Required with monitors to retrieve the CRUSH map. |
| w | Gives the user write access to objects. |
| x | Gives the user the capability to call class methods (i.e., both read and write) and to conduct auth operations on monitors. |
| class-read | Gives the user the capability to call class read methods. Subset of x. |
| class-write | Gives the user the capability to call class write methods. Subset of x. |
| * | Gives the user read, write and execute permissions for a particular daemon/pool, and the ability to execute admin commands. |
| profile osd | Gives a user permissions to connect as an OSD to other OSDs or monitors. Conferred on OSDs to enable OSDs to handle replication heartbeat traffic and status reporting. |
| profile bootstrap-osd | Gives a user permissions to bootstrap an OSD. Conferred on deployment tools such as ceph-disk, ceph-deploy, etc. so that they have permissions to add keys, etc. when bootstrapping an OSD. |
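To illustrate how these capabilities are written on the command line, the sketch below prints a ceph auth get-or-create command that gives a client read access to the monitors and read/write access to one pool. client_caps_cmd and the names demo and demo-pool are placeholders invented for the example; the command is printed, not executed.

```shell
#!/bin/sh
# client_caps_cmd (hypothetical helper): print the "ceph auth get-or-create"
# command granting a client read access to the monitors and read/write access
# to a single pool. $1 is the client name, $2 the pool name.
client_caps_cmd() {
    client=$1
    pool=$2
    printf "ceph auth get-or-create client.%s mon 'allow r' osd 'allow rw pool=%s'\n" "$client" "$pool"
}

# "demo" and "demo-pool" are placeholder names.
client_caps_cmd demo demo-pool
```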




PG states

This part is rather long, but the explanations are thorough; reading it from start to finish is recommended.




State
Description





Creating
When you create a pool, it will create the number of placement groups you specified. Ceph will echo creating when it is creating one or more placement groups. Once they are created, the OSDs that are part of a placement group’s Acting Set will peer. Once peering is complete, the placement group status should be active+clean, which means a Ceph client can begin writing to the placement group.

In plain terms, creating a PG amounts to creating directories on the OSDs. As an example of an Acting Set, ceph pg map 1.0 prints osdmap e9395 pg 1.0 (1.0) -> up [27,4,10] acting [27,4,10]: the three replicas of pg 1.0 live on osd.27, osd.4 and osd.10, and those three OSDs communicate about pg 1.0 until their state is consistent.



Peering
When Ceph is Peering a placement group, Ceph is bringing the OSDs that store the replicas of the placement group into agreement about the state of the objects and metadata in the placement group. When Ceph completes peering, this means that the OSDs that store the placement group agree about the current state of the placement group. However, completion of the peering process does NOT mean that each replica has the latest contents.

Authoritative History: Ceph will NOT acknowledge a write operation to a client, until all OSDs of the acting set persist the write operation. This practice ensures that at least one member of the acting set will have a record of every acknowledged write operation since the last successful peering operation.

With an accurate record of each acknowledged write operation, Ceph can construct and disseminate a new authoritative history of the placement group: a complete, and fully ordered set of operations that, if performed, would bring an OSD’s copy of a placement group up to date.



Active
Once Ceph completes the peering process, a placement group may become active. The active state means that the data in the placement group is generally available in the primary placement group and the replicas for read and write operations.



Clean
When a placement group is in the clean state, the primary OSD and the replica OSDs have successfully peered and there are no stray replicas for the placement group. Ceph replicated all objects in the placement group the correct number of times.



Degraded
When a client writes an object to the primary OSD, the primary OSD is responsible for writing the replicas to the replica OSDs. After the primary OSD writes the object to storage, the placement group will remain in a degraded state until the primary OSD has received an acknowledgement from the replica OSDs that Ceph created the replica objects successfully.

The reason a placement group can be active+degraded is that an OSD may be active even though it doesn’t hold all of the objects yet. If an OSD goes down, Ceph marks each placement group assigned to the OSD as degraded. The OSDs must peer again when the OSD comes back online. However, a client can still write a new object to a degraded placement group if it is active.

If an OSD is down and the degraded condition persists, Ceph may mark the down OSD as out of the cluster and remap the data from the down OSD to another OSD. The time between being marked down and being marked out is controlled by mon osd down out interval, which is set to 300 seconds by default.

A placement group can also be degraded, because Ceph cannot find one or more objects that Ceph thinks should be in the placement group. While you cannot read or write to unfound objects, you can still access all of the other objects in the degraded placement group.

Let’s say there are 9 OSDs with size = 3 (three copies of objects). If OSD number 9 goes down, the PGs assigned to OSD 9 go in a degraded state. If OSD 9 doesn’t recover, it goes out of the cluster and the cluster rebalances. In that scenario, the PGs are degraded and then recover to an active state.



Recovering
Ceph was designed for fault-tolerance at a scale where hardware and software problems are ongoing. When an OSD goes down, its contents may fall behind the current state of other replicas in the placement groups. When the OSD is back up, the contents of the placement groups must be updated to reflect the current state. During that time period, the OSD may reflect a recovering state.

Recovery isn’t always trivial, because a hardware failure might cause a cascading failure of multiple OSDs. For example, a network switch for a rack or cabinet may fail, which can cause the OSDs of a number of host machines to fall behind the current state of the cluster. Each one of the OSDs must recover once the fault is resolved.

Ceph provides a number of settings to balance the resource contention between new service requests and the need to recover data objects and restore the placement groups to the current state. The osd recovery delay start setting allows an OSD to restart, re-peer and even process some replay requests before starting the recovery process. The osd recovery threads setting limits the number of threads for the recovery process (1 thread by default). The osd recovery thread timeout sets a thread timeout, because multiple OSDs may fail, restart and re-peer at staggered rates. The osd recovery max active setting limits the number of recovery requests an OSD will entertain simultaneously to prevent the OSD from failing to serve. The osd recovery max chunk setting limits the size of the recovered data chunks to prevent network congestion.



Backfilling
When a new OSD joins the cluster, CRUSH will reassign placement groups from OSDs in the cluster to the newly added OSD. Forcing the new OSD to accept the reassigned placement groups immediately can put excessive load on the new OSD. Back filling the OSD with the placement groups allows this process to begin in the background. Once backfilling is complete, the new OSD will begin serving requests when it is ready.

During the backfill operations, you may see one of several states: backfill_wait indicates that a backfill operation is pending, but isn’t underway yet; backfill indicates that a backfill operation is underway; and, backfill_too_full indicates that a backfill operation was requested, but couldn’t be completed due to insufficient storage capacity. When a placement group can’t be backfilled, it may be considered incomplete.

Ceph provides a number of settings to manage the load spike associated with reassigning placement groups to an OSD (especially a new OSD). By default, osd_max_backfills sets the maximum number of concurrent backfills to or from an OSD to 10. The osd backfill full ratio enables an OSD to refuse a backfill request if the OSD is approaching its full ratio (85%, by default). If an OSD refuses a backfill request, the osd backfill retry interval enables an OSD to retry the request (after 10 seconds, by default). OSDs can also set osd backfill scan min and osd backfill scan max to manage scan intervals (64 and 512, by default).



Remapped
When the Acting Set that services a placement group changes, the data migrates from the old acting set to the new acting set. It may take some time for a new primary OSD to service requests. So it may ask the old primary to continue to service requests until the placement group migration is complete. Once data migration completes, the mapping uses the primary OSD of the new acting set.



Stale
While Ceph uses heartbeats to ensure that hosts and daemons are running, the ceph-osd daemons may also get into a stuck state where they aren’t reporting statistics in a timely manner (e.g., a temporary network fault). By default, OSD daemons report their placement group, up thru, boot and failure statistics every half second (i.e., 0.5), which is more frequent than the heartbeat thresholds. If the Primary OSD of a placement group’s acting set fails to report to the monitor or if other OSDs have reported the primary OSD down, the monitors will mark the placement group stale.

When you start your cluster, it is common to see the stale state until the peering process completes. After your cluster has been running for a while, seeing placement groups in the stale state indicates that the primary OSD for those placement groups is down or not reporting placement group statistics to the monitor.



Misplaced
There are some temporary backfilling scenarios where a PG gets mapped temporarily to an OSD. When that temporary situation should no longer be the case, the PGs might still reside in the temporary location and not in the proper location. In which case, they are said to be misplaced. That’s because the correct number of extra copies actually exist, but one or more copies is in the wrong place.

Let’s say there are 3 OSDs: 0,1,2 and all PGs map to some permutation of those three. If you add another OSD (OSD 3), some PGs will now map to OSD 3 instead of one of the others. However, until OSD 3 is backfilled, the PG will have a temporary mapping allowing it to continue to serve I/O from the old mapping. During that time, the PG is misplaced (because it has a temporary mapping) but not degraded (since there are 3 copies).

Example:
pg 1.5: up=acting: [0,1,2]
pg 1.5: up: [0,3,1] acting: [0,1,2]
Here, [0,1,2] is a temporary mapping, so the up set is not equal to the acting set and the PG is misplaced but not degraded since [0,1,2] is still three copies.
pg 1.5: up=acting: [0,3,1]
OSD 3 is now backfilled and the temporary mapping is removed, not degraded and not misplaced.



Incomplete
A PG goes into an incomplete state when there is incomplete content and peering fails, i.e., when there are no complete OSDs which are current enough to perform recovery.

Let’s say [1,2,3] is an acting OSD set and it switches to [1,4,3], then osd.1 will request a temporary acting set of [1,2,3] while backfilling 4. During this time, if 1,2,3 all go down, osd.4 will be the only one left which might not have fully backfilled. At this time, the PG will go incomplete indicating that there are no complete OSDs which are current enough to perform recovery.

Alternately, if osd.4 is not involved and the acting set is simply [1,2,3] when 1,2,3 go down, the PG would likely go stale indicating that the mons have not heard anything on that PG since the acting set changed. The reason being there are no OSDs left to notify the new OSDs.
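PG states are reported joined with +, as in active+clean or active+degraded above, so scripts that react to them need to test for a single component. A minimal sketch follows; pg_state_has is a hypothetical helper, not a Ceph command.

```shell
#!/bin/sh
# pg_state_has (hypothetical helper): succeed if the "+"-joined PG state in $1
# contains the single state named in $2. Wrapping both sides in "+" ensures
# that only whole components match (backfill does not match backfill_wait).
pg_state_has() {
    case "+$1+" in
        *"+$2+"*) return 0 ;;
        *)        return 1 ;;
    esac
}

state="active+remapped+backfill_wait"
if pg_state_has "$state" degraded; then
    echo "$state: degraded"
else
    echo "$state: not degraded"
fi
```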




RBD formats (image features)

This setting only applies to format 2 images.

| Feature | Bit | Description |
| :-- | :--: | :-- |
| Layering | 1 | Layering enables you to use cloning. |
| Striping v2 | 2 | Striping spreads data across multiple objects. Striping helps with parallelism for sequential read/write workloads. |
| Exclusive locking | 4 | When enabled, it requires a client to get a lock on an object before making a write. |
| Object map | 8 | Block devices are thin provisioned, meaning they only store data that actually exists. Object map support helps track which objects actually exist (have data stored on a drive). Enabling object map support speeds up I/O operations for cloning, or importing and exporting a sparsely populated image. |
| Fast-diff | 16 | Fast-diff support depends on object map support and exclusive lock support. It adds another property to the object map, which makes it much faster to generate diffs between snapshots of an image and to calculate the actual data usage of a snapshot. |
| Deep-flatten | 32 | Deep-flatten makes rbd flatten work on all the snapshots of an image, in addition to the image itself. Without it, snapshots of an image will still rely on the parent, so the parent will not be delete-able until the snapshots are deleted. Deep-flatten makes a parent independent of its clones, even if they have snapshots. |
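The bit values in the table are combined by adding them (a bitwise OR) into a single number. As a small sketch, the hypothetical helper rbd_feature_mask below computes that number from feature names; the names follow the spelling the rbd tool uses (layering, striping, exclusive-lock, object-map, fast-diff, deep-flatten).

```shell
#!/bin/sh
# rbd_feature_mask (hypothetical helper): OR together the bit values from the
# table above for the feature names passed as arguments.
rbd_feature_mask() {
    mask=0
    for feature in "$@"; do
        case $feature in
            layering)       bit=1 ;;
            striping)       bit=2 ;;
            exclusive-lock) bit=4 ;;
            object-map)     bit=8 ;;
            fast-diff)      bit=16 ;;
            deep-flatten)   bit=32 ;;
            *) echo "unknown feature: $feature" >&2; return 1 ;;
        esac
        mask=$((mask | bit))
    done
    echo "$mask"
}

rbd_feature_mask layering exclusive-lock object-map fast-diff deep-flatten   # prints 61
```

The result, 61, is every feature except striping, which is the kind of value that goes into a numeric feature setting.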




MON and OSD hardware reference

Minimum configuration (from Red Hat)


For OSDs:

| Criteria | Minimum Recommended |
| :--: | :-- |
| Processor | 1x AMD64 and Intel 64 |
| RAM | 2 GB of RAM per daemon |
| Volume Storage | 1x storage drive per daemon |
| Journal | 1x SSD partition per daemon (optional) |
| Network | 2x 1GB Ethernet NICs |

For MONs:

| Criteria | Minimum Recommended |
| :--: | :-- |
| Processor | 1x AMD64 and Intel 64 |
| RAM | 1 GB of RAM per daemon |
| Disk Space | 10 GB per daemon |
| Network | 2x 1GB Ethernet NICs |


High-end configuration (from Intel)



Common Ceph benchmarking tools

| Tool Name | Testing Scenario | Command line / GUI | OS Support | Popularity | Reference |
| :-- | :-- | :-- | :-- | :-- | :-- |
| FIO (Flexible I/O Tester) | Mainly block-level storage, e.g. SAN, DAS | Command line | Linux / Windows | High | fio github |
| IOmeter | Mainly block-level storage, e.g. SAN, DAS | GUI / Command line | Linux / Windows | High | Iometer and IOzone |
| iozone | File-level storage, e.g. NAS | GUI / Command line | Linux / Windows | High | IOzone Filesystem Benchmark |
| dd | File-level storage, e.g. NAS | Command line | Linux / Windows | High | dd over NFS testing |
| rados bench | Ceph RADOS | Command line | Linux only | Normal | Benchmark a Ceph Storage Cluster |
| cosbench | Cloud object storage service | GUI / Command line | Linux / Windows | High | COSBench - Cloud Object Storage Benchmark |




Reposted from: http://www.xuxiaopang.com/2016/11/11/doc-ceph-table/
Permalink: https://zhusl.com/post/ceph-all-tag.html


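ceph-jewel installation guide


Nodes


mon nodes: hp-server-6, hp-server-7, hp-server-8

osd nodes: hp-server-1, hp-server-2, hp-server-3, hp-server-4, hp-server-5, hp-server-6, hp-server-7, hp-server-8

Installing Ceph

Preparation

Set the hostname on every host, for example WH-hp-server-1.

Configure hosts: vim /etc/hosts
  172.16.10.1 WH-hp-server-1

From the deployment node, set up passwordless SSH login to the other nodes as the comall user.

Install an NTP server, and disable SELinux and the firewall.


Change the Ceph repository on the deployment node

CentOS 7 repository

(This Ceph repository is provided by Alibaba Cloud and only supports CentOS 7.)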

vim /etc/yum.repos.d/ceph.repo 
[ceph] 
name=ceph 
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/x86_64/ 
gpgcheck=0 
priority=1 
[ceph-noarch] 
name=cephnoarch 
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/noarch/ 
gpgcheck=0 
priority=1 
[ceph-source] 
name=Ceph source packages 
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/SRPMS 
enabled=1 
gpgcheck=1 
type=rpm-md 
gpgkey=https://download.ceph.com/keys/release.asc 
priority=1 


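CentOS 6 repository

(Installing on CentOS 6 is strongly discouraged.)
rpm -ivh http://ceph.com/rpm-hammer/el6/noarch/ceph-release-1-1.el7.noarch.rpm
(The installation failed in practice and one parameter had to be changed. Read the ceph-deploy code and, guided by the error message, edit
/usr/lib/python2.6/site-packages/ceph_deploy/install.py)

Replace {ceph-stable-release} with the name of the latest major stable Ceph release (for example firefly), and replace {distro} with your Linux distribution (el6 for CentOS 6, el7 for CentOS 7, rhel6 for Red Hat 6.5, rhel7 for Red Hat 7, fc19 for Fedora 19, fc20 for Fedora 20). Finally, save the result to /etc/yum.repos.d/ceph.repo.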

[ceph-noarch]
name=Ceph noarch packages
baseurl=http://download.ceph.com/rpm-{ceph-stable-release}/{distro}/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
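Filling in the two placeholders by hand is error-prone, so the stanza can also be generated. The sketch below substitutes a release name and a distro into the template above; render_ceph_repo is a hypothetical helper, and jewel with el7 is only an example pairing.

```shell
#!/bin/sh
# render_ceph_repo (hypothetical helper): print the ceph-noarch repo stanza
# with the release name ($1) and distro ($2) filled in.
render_ceph_repo() {
    release=$1
    distro=$2
    cat <<EOF
[ceph-noarch]
name=Ceph noarch packages
baseurl=http://download.ceph.com/rpm-${release}/${distro}/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
EOF
}

render_ceph_repo jewel el7
# To install it: render_ceph_repo jewel el7 | sudo tee /etc/yum.repos.d/ceph.repo
```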


Install ceph-deploy on the deployment node

(This step only needs to run on the deployment node; uninstall epel-release and ceph-release afterwards.)

sudo yum update && sudo yum install ceph-deploy


On every node

sudo yum install yum-plugin-priorities


Create the mon node:

sudo  ceph-deploy new WH-hp-server-7


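Install Ceph on the other nodes. (A slow download also causes an error. You can download the packages first and install them on every node before running this. Note that every node must have the ceph and epel yum repositories configured before installing; once the installation is done, remove the two repositories and then run the command below.)

sudo  ceph-deploy install  WH-hp-server-1 WH-hp-server-2  WH-hp-server-4 WH-hp-server-5 WH-hp-server-6 WH-hp-server-7 WH-hp-server-8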


Generate the Ceph monitor keys.

sudo  ceph-deploy mon create-initial


Copy the keys to the other nodes (add --overwrite-conf if an error is reported)

sudo ceph-deploy --overwrite-conf admin node1 node2 node3

sudo  ceph-deploy admin  WH-hp-server-2  WH-hp-server-4 WH-hp-server-5 WH-hp-server-6 WH-hp-server-7 WH-hp-server-8


On every node

sudo chmod  +r /etc/ceph/ceph.client.admin.keyring


Adding OSD nodes to the cluster

CentOS 7

No manual preparation of the disk is needed.
List the disks on WH-hp-server-8

sudo  ceph-deploy  disk   list    WH-hp-server-8



Zap (wipe) the disk on WH-hp-server-8

sudo  ceph-deploy  disk   zap    WH-hp-server-8:sdb



Create and add the OSD (create = prepare, then activate)

sudo  ceph-deploy  osd   create    WH-hp-server-8:sdb



centos6.5

centos采用手动创建分区的方式增加osd，对新添加硬盘进行分区例如sdc，分成sdc1 sdc2，sdc1不做任何操作，sdc2格式化为xfs（mkfs.xfs   -f  -i size=2048  /dev/sdc2），修改/etc/fstab 挂载参数 （/dev/sdc2     /test      xfs inode64,noatime    0  0）,执行mount -a
初始化

ceph-deploy  -overwrite-conf  osd prepare centos4:/test:/dev/sdc1


Activate:

ceph-deploy   --overwrite-conf  osd activate centos4:/test:/dev/sdc1


Remove an OSD from the cluster

List the nodes in the cluster:

ceph osd tree



Mark the faulty OSD out:

ceph osd out osd.0


Remove the OSD from the CRUSH map:

ceph osd crush remove osd.0


Delete the OSD's authentication key:

ceph auth del osd.0


Delete the OSD:

sudo ceph osd   rm 0


Or use the ceph-deploy command: ceph-deploy osd prepare host:osd-directory
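
The removal steps above always run in the same order, so they can be collected into a dry-run helper. This is a minimal sketch: osd_removal_plan is a hypothetical name, and the function only prints the commands from this post without executing anything.

```shell
# Print, in order, the commands used above to remove one OSD.
# osd_removal_plan is a hypothetical helper; nothing is executed here.
# Usage: osd_removal_plan <osd-id>
osd_removal_plan() {
    id="$1"
    printf 'ceph osd out osd.%s\n' "$id"
    printf 'ceph osd crush remove osd.%s\n' "$id"
    printf 'ceph auth del osd.%s\n' "$id"
    printf 'ceph osd rm %s\n' "$id"
}

# Example: show the plan for osd.0.
osd_removal_plan 0
```

Printing first makes it easy to check the OSD id against `ceph osd tree` before running the commands by hand.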

Add a monitor node

sudo ceph-deploy  mon  add   WH-hp-server-6    # or: mon create



Note: after adding a new monitor node, add the new mon node to the configuration file (/etc/ceph/ceph.conf) on every OSD host, then restart the OSDs.


Ceph pool operations

Create a pool

sudo ceph osd pool create  cloudstack  256 256


List pools

sudo ceph osd lspools


Delete a pool (the pool name must be given twice)

sudo ceph osd pool delete cloudstack   cloudstack  --yes-i-really-really-mean-it


Ceph block device operations

Create a block device

sudo rbd create  ceph_nfs --size 1024000
Permalink: https://zhusl.com/post/ceph-10-0-2-install-2.html


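How to manage your own knowledge.md


Information → Knowledge → Wisdom → Thought

Collecting material only increases the storage used on your hard drive and the height of your bookshelf. It does not turn into knowledge by itself. Material becomes knowledge only when it is organized, studied, understood, absorbed and allowed to settle. Managing knowledge produces wisdom, and the fusion of wisdom produces thought. To become a person with ideas of your own, first learn how to manage your knowledge.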



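1. Build your own knowledge management system

Personal knowledge accumulation goes through several stages:


Accumulation: in this stage we acquire fragmentary or systematic knowledge in various ways. If that knowledge is not recorded and goes unused for a long time, it is lost, so learners should accumulate knowledge with a purpose. Commonly used tools include note-taking software (Evernote and the like) and a personal document system.
Sharing: the real value of knowledge lies in sharing and using it. It is not enough to have knowledge; more people need to know that you have knowledge in a given area. The more people know what you know, the greater your value. Knowledge is not used up: what you share does not disappear because you shared it, and in exchange you gain more knowledge from others.
Exchange: knowledge management is not only about accumulating and sharing. Its essence lies in exchanging ideas with people and widening the influence of your knowledge. Discussing with others deepens your own understanding, and information and knowledge are merged and improved through continued exchange.
To use and manage your knowledge better, learn to build your own knowledge management system. Organize the scattered information in your head; a wiki lets you connect scattered pieces of knowledge as linked nodes in a network. This turns your tacit knowledge into explicit knowledge that can be shared. In making knowledge explicit, you process and distill what you already know, which ultimately stimulates your tacit knowledge and leads to personal innovation.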


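2. Share knowledge

Free sharing is the spirit of the internet. Sharing knowledge and techniques from fields I know well, and helping others by doing so, is what I find most honorable and most enjoyable. The process not only organizes my own knowledge but also outputs it explicitly so that more people can benefit.

3. Accumulate technical experience

The IT industry has a huge number of technologies, and learning them means accumulating a lot of experience, which is easily forgotten if not used often. So we should record the problems met in study and work, the steps taken to solve them, and what was learned along the way. This lets us solve the same problem quickly the next time, and it helps others locate and solve problems quickly.
Permalink: https://zhusl.com/post/how-to-mamage-knowledge.html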


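The complete nginx configuration


This blog's Nginx configuration: the complete edition

This post lists the blog's complete nginx configuration.

Changelog


2017.04.07: updated Nginx to 1.13.4;


Install dependencies

My VPS runs CentOS 7.3.

First install the dependency libraries and the tools needed for compiling: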

sudo yum install gcc gcc-c++ make pcre-devel zlib-devel unzip git


Get the required components

nginx-ct

The nginx-ct module enables Certificate Transparency. Fetch the source directly from GitHub:

wget -O nginx-ct.zip -c https://github.com/grahamedgecombe/nginx-ct/archive/v1.3.2.zip
unzip nginx-ct.zip


ngx_brotli

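This site supports Brotli, the compression format developed by Google. With a built-in dictionary derived from analyzing a large number of web pages, it achieves a higher compression ratio with almost no impact on compression and decompression speed.

The following preparation is needed for Nginx to support Brotli, and it only has to be done once. First install libbrotli: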

sudo yum  install autoconf libtool automake

git clone https://github.com/bagder/libbrotli
cd libbrotli

# If you see "error: C source seen but 'CC' is undefined", add AC_PROG_CC at the end of configure.ac
./autogen.sh

./configure
make
sudo make install

cd  ../


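By default libbrotli is installed as /usr/local/lib/libbrotlienc.so.1. If Nginx later fails to start because it cannot find this file, symlink it into /lib or /usr/lib. If problems remain, see the article linked from the original post for solutions.

Next, fetch the ngx_brotli source: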

git clone https://github.com/google/ngx_brotli.git
cd ngx_brotli

git submodule update --init

cd ../


Cloudflare patches

This site mainly uses Cloudflare's ChaCha20/Poly1305 for OpenSSL patch and its Dynamic TLS Records for Nginx patch. First fetch the patch files:

git clone https://github.com/cloudflare/sslconfig.git


OpenSSL

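The OpenSSL library shipped with the system is often not recent enough, so I recommend pointing the Nginx build at an OpenSSL source directory instead of using the system version. This gives you more control.

This site currently uses OpenSSL 1.0.2k: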

wget -O openssl.tar.gz -c https://github.com/openssl/openssl/archive/OpenSSL_1_0_2k.tar.gz
tar zxf openssl.tar.gz
mv openssl-OpenSSL_1_0_2k/ openssl


Apply the ChaCha20/Poly1305 patch:

cd openssl
patch -p1 < ../sslconfig/patches/openssl__chacha20_poly1305_draft_and_rfc_ossl102j.patch 

cd ../


Compile and install Nginx

Now fetch the Nginx source and apply the Dynamic TLS Records patch:

wget -c https://nginx.org/download/nginx-1.13.4.tar.gz
tar zxf nginx-1.13.4.tar.gz

cd nginx-1.13.4/
patch -p1 < ../sslconfig/patches/nginx__1.11.5_dynamic_tls_records.patch

cd ../


Compile and install:

cd nginx-1.13.4/
./configure --add-module=../ngx_brotli --add-module=../nginx-ct-1.3.2 --with-openssl=../openssl --with-http_v2_module --with-http_ssl_module --with-http_gzip_static_module

make
sudo make install


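Besides http_v2 and http_ssl, the two modules required for HTTP/2, I also enabled http_gzip_static. Which modules to enable depends on your own needs. (Note: since Nginx 1.11.5 the ipv6 module is built in, so the --with-ipv6 option has been removed.)

The steps above install Nginx into /usr/local/nginx/. To change the path, specify it when running configure.

Management script and autostart

To make the Nginx service easier to manage, create a management script: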

sudo vim /etc/init.d/nginx


Enter the following content:

#! /bin/sh

### BEGIN INIT INFO
# Provides:          nginx
# Required-Start:    $all
# Required-Stop:     $all
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: starts the nginx web server
# Description:       starts nginx using start-stop-daemon
### END INIT INFO

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/local/nginx/sbin/nginx
NAME=nginx
DESC=nginx

test -x $DAEMON || exit 0

# Include nginx defaults if available
if [ -f /etc/default/nginx ] ; then
  . /etc/default/nginx
fi

set -e

. /lib/lsb/init-functions

case "$1" in
  start)
    echo -n "Starting $DESC: "
    start-stop-daemon --start --quiet --pidfile /usr/local/nginx/logs/$NAME.pid \
        --exec $DAEMON -- $DAEMON_OPTS || true
    echo "$NAME."
    ;;
  stop)
    echo -n "Stopping $DESC: "
    start-stop-daemon --stop --quiet --pidfile /usr/local/nginx/logs/$NAME.pid \
        --exec $DAEMON || true
    echo "$NAME."
    ;;
  restart|force-reload)
    echo -n "Restarting $DESC: "
    start-stop-daemon --stop --quiet --pidfile \
        /usr/local/nginx/logs/$NAME.pid --exec $DAEMON || true
    sleep 1
    start-stop-daemon --start --quiet --pidfile \
        /usr/local/nginx/logs/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS || true
    echo "$NAME."
    ;;
  reload)
    echo -n "Reloading $DESC configuration: "
    start-stop-daemon --stop --signal HUP --quiet --pidfile /usr/local/nginx/logs/$NAME.pid \
        --exec $DAEMON || true
    echo "$NAME."
    ;;
  status)
    status_of_proc -p /usr/local/nginx/logs/$NAME.pid "$DAEMON" nginx && exit 0 || exit $?
    ;;
  *)
    N=/etc/init.d/$NAME
    echo "Usage: $N {start|stop|restart|reload|force-reload|status}" >&2
    exit 1
    ;;
esac

exit 0


Add execute permission:

sudo chmod a+x /etc/init.d/nginx


Nginx can now be managed with the following commands:

sudo service nginx start|stop|restart|reload


To start Nginx automatically at boot, run:

sudo update-rc.d -f nginx defaults


Nginx global configuration

Nginx is now installed. Next adjust its global configuration: open /usr/local/nginx/conf/nginx.conf and add or change the following:

http {
    include            mime.types;
    default_type       application/octet-stream;

    charset            UTF-8;

    sendfile           on;
    tcp_nopush         on;
    tcp_nodelay        on;

    keepalive_timeout  60;

    #... ...#

    gzip               on;
    gzip_vary          on;

    gzip_comp_level    6;
    gzip_buffers       16 8k;

    gzip_min_length    1000;
    gzip_proxied       any;
    gzip_disable       "msie6";

    gzip_http_version  1.0;

    gzip_types         text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript application/javascript image/svg+xml;

    # If the ngx_brotli module was added at compile time, add the brotli settings below
    brotli             on;
    brotli_comp_level  6;
    brotli_types       text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript application/javascript image/svg+xml;

    #... ...#

    include            /home/jerry/www/nginx_conf/*.conf;
}


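The final include loads the configuration files under my home directory, so creating and editing site configurations no longer needs sudo.

For browsers to reach a site over HTTP/2, HTTPS must be deployed first, and HTTPS requires a valid certificate. This blog currently uses a RapidSSL single-domain certificate bought from NameCheap. I have also obtained a free Let’s Encrypt certificate as a backup. For personal use, a free Let’s Encrypt certificate is generally enough and saves some money.

To request a Let’s Encrypt certificate, I recommend Neilpang/acme.sh, a small command-line tool with no dependencies, or see my article "Let’s Encrypt, a free and easy-to-use HTTPS certificate".

Note: Let’s Encrypt fixed its compatibility problem on Windows XP on March 26, 2016, and this site switched to a Let’s Encrypt certificate right away.

Web site configuration

Below is the complete site configuration for this blog: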

server {
    listen               443 ssl http2 fastopen=3 reuseport;

    # If you use Cloudflare's HTTP/2 + SPDY patch, remember to add spdy
    # listen               443 ssl http2 spdy fastopen=3 reuseport;

    server_name          www.imququ.com imququ.com;
    server_tokens        off;

    include              /home/jerry/www/nginx_conf/ip.blacklist;

    # https://imququ.com/post/certificate-transparency.html#toc-2
    ssl_ct               on;
    ssl_ct_static_scts   /home/jerry/www/scts;

    # Intermediate certificate + site certificate
    ssl_certificate      /home/jerry/www/ssl/chained.pem;

    # The private key used when creating the CSR file
    ssl_certificate_key  /home/jerry/www/ssl/domain.key;

    # openssl dhparam -out dhparams.pem 2048
    # https://weakdh.org/sysadmin.html
    ssl_dhparam          /home/jerry/www/ssl/dhparams.pem;

    # https://github.com/cloudflare/sslconfig/blob/master/conf
    ssl_ciphers                EECDH+CHACHA20:EECDH+CHACHA20-draft:EECDH+AES128:RSA+AES128:EECDH+AES256:RSA+AES256:EECDH+3DES:RSA+3DES:!MD5;

    # With dual RSA + ECDSA certificates enabled, the cipher suites can follow this configuration:
    # ssl_ciphers              EECDH+CHACHA20:EECDH+CHACHA20-draft:EECDH+ECDSA+AES128:EECDH+aRSA+AES128:RSA+AES128:EECDH+ECDSA+AES256:EECDH+aRSA+AES256:RSA+AES256:EECDH+ECDSA+3DES:EECDH+aRSA+3DES:RSA+3DES:!MD5;

    ssl_prefer_server_ciphers  on;

    ssl_protocols              TLSv1 TLSv1.1 TLSv1.2;

    ssl_session_cache          shared:SSL:50m;
    ssl_session_timeout        1d;

    ssl_session_tickets        on;

    # openssl rand 48 > session_ticket.key
    # On a single-server deployment, ssl_session_ticket_key can be left unset
    ssl_session_ticket_key     /home/jerry/www/ssl/session_ticket.key;

    ssl_stapling               on;
    ssl_stapling_verify        on;

    # Root certificate + intermediate certificate
    # https://imququ.com/post/why-can-not-turn-on-ocsp-stapling.html
    ssl_trusted_certificate    /home/jerry/www/ssl/full_chained.pem;

    resolver                   114.114.114.114 valid=300s;
    resolver_timeout           10s;

    access_log                 /home/jerry/www/nginx_log/imququ_com.log;

    if ($request_method !~ ^(GET|HEAD|POST|OPTIONS)$ ) {
        return           444;
    }

    if ($host != 'imququ.com' ) {
        rewrite          ^/(.*)$  https://imququ.com/$1 permanent;
    }

    location ~* (robots\.txt|favicon\.ico|crossdomain\.xml|google4c90d18e696bdcf8\.html|BingSiteAuth\.xml)$ {
        root             /home/jerry/www/imququ.com/www/static;
        expires          1d;
    }

    location ^~ /static/uploads/ {
        root             /home/jerry/www/imququ.com/www;
        add_header       Access-Control-Allow-Origin *;

        set              $expires_time max;

        valid_referers   blocked none server_names *.qgy18.com *.inoreader.com feedly.com *.feedly.com www.udpwork.com theoldreader.com digg.com *.feiworks.com *.newszeit.com r.mail.qq.com yuedu.163.com *.w3ctech.com;
        if ($invalid_referer) {
            set          $expires_time -1;
            return       403;
        }

        expires          $expires_time;
    }

    location ^~ /static/ {
        root             /home/jerry/www/imququ.com/www;
        add_header       Access-Control-Allow-Origin *;      
        expires          max;
    }

    location ^~ /admin/ {
        proxy_http_version       1.1;

        add_header               Strict-Transport-Security "max-age=31536000; includeSubDomains; preload";

        # DENY forbids framing of the page entirely, which may cause some breakage. If that happens, change it to SAMEORIGIN
        # https://imququ.com/post/web-security-and-response-header.html#toc-1
        add_header               X-Frame-Options DENY;

        add_header               X-Content-Type-Options nosniff;

        proxy_set_header         X-Via            QingDao.Aliyun;
        proxy_set_header         Connection       "";
        proxy_set_header         Host             imququ.com;
        proxy_set_header         X-Real_IP        $remote_addr;
        proxy_set_header         X-Forwarded-For  $proxy_add_x_forwarded_for;

        proxy_pass               http://127.0.0.1:9095;
    }
    
    location / {
        proxy_http_version       1.1;

        add_header               Strict-Transport-Security "max-age=31536000; includeSubDomains; preload";
        add_header               X-Frame-Options deny;
        add_header               X-Content-Type-Options nosniff;
        add_header               Content-Security-Policy "default-src 'none'; script-src 'unsafe-inline' 'unsafe-eval' blob: https:; img-src data: https: http://ip.qgy18.com; style-src 'unsafe-inline' https:; child-src https:; connect-src 'self' https://translate.googleapis.com; frame-src https://disqus.com https://www.slideshare.net";
        add_header               Public-Key-Pins 'pin-sha256="YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg="; pin-sha256="aef6IF2UF6jNEwA2pNmP7kpgT6NFSdt7Tqf5HzaIGWI="; max-age=2592000; includeSubDomains';
        add_header               Cache-Control no-cache;

        proxy_ignore_headers     Set-Cookie;

        proxy_hide_header        Vary;
        proxy_hide_header        X-Powered-By;

        proxy_set_header         X-Via            QingDao.Aliyun;
        proxy_set_header         Connection       "";
        proxy_set_header         Host             imququ.com;
        proxy_set_header         X-Real_IP        $remote_addr;
        proxy_set_header         X-Forwarded-For  $proxy_add_x_forwarded_for;

        proxy_pass               http://127.0.0.1:9095;
    }
}

server {
    server_name       www.imququ.com imququ.com;
    server_tokens     off;

    access_log        /dev/null;

    if ($request_method !~ ^(GET|HEAD|POST)$ ) {
        return        444;
    }

    location ^~ /.well-known/acme-challenge/ {
        alias         /home/jerry/www/challenges/;
        try_files     $uri =404;
    }

    location / {
        rewrite       ^/(.*)$ https://imququ.com/$1 permanent;
    }
}


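Some key points of the configuration above are covered in these articles:


Nginx configuration: performance;
Nginx configuration: security;
TLS handshake optimization in detail;
Some experience with enabling HTTPS (part 1);
Some experience with enabling HTTPS (part 2);
All about Certificate Transparency;
An introduction to HTTP Public Key Pinning;
Starting from being unable to enable OCSP Stapling;


Automatic log rotation

In the previous section, the site configuration used access_log to set where the access log is stored. Once Nginx starts, it keeps appending access records to that file. If the site has heavy traffic, it is best to split the log by size or by time interval, which makes later management and troubleshooting easier.

This site does not get much traffic, but I still use logrotate to split the access log by day.

Most Linux distributions ship with logrotate, so all that is needed is a new configuration file, for example: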

sudo vim /etc/logrotate.d/nginx

/home/jerry/www/nginx_log/*.log {
    su root root
    daily
    rotate 5
    missingok
    notifempty
    sharedscripts
    dateext
    postrotate
        if [ -f /usr/local/nginx/logs/nginx.pid ]; then
            kill -USR1 `cat /usr/local/nginx/logs/nginx.pid`
        fi
    endscript
}


The meaning of each directive can be found in the manual. Once the file is in place, run logrotate by hand to check that it works:

sudo /usr/sbin/logrotate -f /etc/logrotate.d/nginx


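If everything is fine, the Nginx access log will from now on be split by day automatically, with the date as the file suffix, which is clear at a glance.

On my CentOS 7.3, the rotation jobs in /etc/logrotate.d/ are run once a day by /etc/cron.daily/logrotate. /etc/crontab shows that the cron.daily jobs run at 6:25 every day, and that is when logrotate splits the log.

To split the log exactly at midnight, either change when cron.daily runs, or put your logrotate configuration file outside /etc/logrotate.d/ and add a crontab rule by hand.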
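
The second option (a configuration file outside /etc/logrotate.d/ plus a manual crontab rule) could look roughly like the sketch below. The helper name midnight_logrotate_entry and the path /home/jerry/www/logrotate/nginx.conf are assumptions made for illustration, not paths from the original setup.

```shell
# Print a crontab entry that runs logrotate at 00:00 every day.
# midnight_logrotate_entry is a hypothetical helper name.
# Usage: midnight_logrotate_entry <logrotate-config-path>
midnight_logrotate_entry() {
    printf '0 0 * * * /usr/sbin/logrotate %s\n' "$1"
}

# Example with an assumed config location outside /etc/logrotate.d/.
midnight_logrotate_entry /home/jerry/www/logrotate/nginx.conf
```

Because the logrotate configuration above uses su root root and signals the Nginx master process, the printed line would most likely belong in root's crontab (sudo crontab -e).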

Original article: https://imququ.com/post/my-nginx-conf.html.
Permalink: https://zhusl.com/post/http-2-0.html


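Adding TLS 1.3 support to nginx


A few months ago, while upgrading the Nginx used by this blog, I also added support for TLS 1.3. This post lists the detailed steps and caveats. For an introduction to TLS 1.3, see the CloudFlare article "An overview of TLS 1.3 and Q&A". Note that Chrome and Firefox currently support TLS 1.3 draft 18, so do not use it in production yet.



Install dependencies

My VPS runs Ubuntu 16.04.3 LTS. If you use another distribution, adjust the package-management commands accordingly.

First install the dependency libraries and the tools needed for compiling: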

sudo apt-get install build-essential libpcre3 libpcre3-dev zlib1g-dev unzip git


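Get the required components

nginx-ct and ngx_brotli are unrelated to the topic of this post, but both are commonly used Nginx components, so they are recorded here as well.

nginx-ct

The nginx-ct module enables Certificate Transparency. Fetch the source directly from GitHub: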

wget -O nginx-ct.zip -c https://github.com/grahamedgecombe/nginx-ct/archive/v1.3.2.zip
unzip nginx-ct.zip


ngx_brotli

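This site supports Brotli, the compression format developed by Google. With a built-in dictionary derived from analyzing a large number of web pages, it achieves a higher compression ratio with almost no impact on compression and decompression speed.

The following preparation is needed for Nginx to support Brotli, and it only has to be done once. First install libbrotli: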

sudo apt-get install autoconf libtool automake

git clone https://github.com/bagder/libbrotli
cd libbrotli

# If you see "error: C source seen but 'CC' is undefined", add AC_PROG_CC at the end of configure.ac
./autogen.sh

./configure
make
sudo make install

cd  ../


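By default libbrotli is installed as /usr/local/lib/libbrotlienc.so.1. If Nginx later fails to start because it cannot find this file, symlink it into /lib or /usr/lib. If problems remain, see the article linked from the original post for solutions.

Next, fetch the ngx_brotli source: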

git clone https://github.com/google/ngx_brotli.git
cd ngx_brotli

git submodule update --init

cd ../


OpenSSL

TLS 1.3 support requires the draft-18 branch of OpenSSL 1.1.1:

git clone -b tls1.3-draft-18 --single-branch https://github.com/openssl/openssl.git openssl


Compile and install Nginx

Now fetch the Nginx source, then compile and install it:

wget -c https://nginx.org/download/nginx-1.13.3.tar.gz
tar zxf nginx-1.13.3.tar.gz

cd nginx-1.13.3/

./configure --add-module=../ngx_brotli --add-module=../nginx-ct-1.3.2 --with-openssl=../openssl --with-openssl-opt='enable-tls1_3 enable-weak-ssl-ciphers' --with-http_v2_module --with-http_ssl_module --with-http_gzip_static_module

make
sudo make install


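enable-tls1_3 is the key option that makes OpenSSL support TLS 1.3. enable-weak-ssl-ciphers keeps OpenSSL's support for insecure cipher suites such as 3DES, and is only needed if you intend to keep supporting IE8.

Besides http_v2 and http_ssl, the two modules required for HTTP/2, I also enabled http_gzip_static. Which modules to enable depends on your own needs.

The steps above install Nginx into /usr/local/nginx/. To change the path, specify it when running configure.

Web site configuration

In the Nginx site configuration, the following two parameters need to be changed: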

ssl_protocols              TLSv1 TLSv1.1 TLSv1.2 TLSv1.3; # add TLSv1.3
ssl_ciphers                TLS13-AES-256-GCM-SHA384:TLS13-CHACHA20-POLY1305-SHA256:TLS13-AES-128-GCM-SHA256:TLS13-AES-128-CCM-8-SHA256:TLS13-AES-128-CCM-SHA256:EECDH+CHACHA20:EECDH+CHACHA20-draft:EECDH+ECDSA+AES128:EECDH+aRSA+AES128:RSA+AES128:EECDH+ECDSA+AES256:EECDH+aRSA+AES256:RSA+AES256:EECDH+ECDSA+3DES:EECDH+aRSA+3DES:RSA+3DES:!MD5;


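The suites whose names contain TLS13 are the cipher suites newly added for TLS 1.3; put them at the front. If you do not intend to keep supporting IE8, the suites containing 3DES can be removed.

For this blog's complete Nginx configuration, see the link in the original post.

Verify TLS 1.3 support

The latest Chrome and Firefox both support TLS 1.3, but it has to be enabled manually:


Chrome: in chrome://flags/, change Maximum TLS version enabled to TLS 1.3 (in Chrome 62, set TLS 1.3 to Enabled (Draft); thanks to @TsuranSonoda for pointing this out);
Firefox: in about:config, change security.tls.version.max to 4;


Qualys SSL Labs’s SSL Server Test, which this blog has recommended many times, can also check whether a server supports TLS 1.3. It is very convenient and I continue to recommend it.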
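
The same check can also be attempted from a terminal. The sketch below is not part of the original post. It assumes an openssl client that itself supports TLS 1.3 and speaks a version compatible with the server's draft, and negotiated_protocol is a hypothetical helper that only parses the "New, ..., Cipher is ..." summary line printed by openssl s_client.

```shell
# Extract the negotiated protocol from `openssl s_client` output on stdin.
# It reads the summary line of the form "New, TLSv1.3, Cipher is ...".
# negotiated_protocol is a hypothetical helper name.
negotiated_protocol() {
    sed -n 's/^New, \([^,]*\), Cipher is .*/\1/p' | head -n 1
}

# Usage (example.com is a placeholder host; -tls1_3 needs a TLS 1.3 capable client):
#   openssl s_client -connect example.com:443 -tls1_3 </dev/null 2>/dev/null | negotiated_protocol
```

A handshake that fails prints "(NONE)" in that line, so anything other than TLSv1.3 suggests the server or the client is not negotiating TLS 1.3.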

Original article: https://imququ.com/post/enable-tls-1-3.html.
Permalink: https://zhusl.com/post/eiblog-enable-tls-1-3.html


Configuring a proxy for Docker to speed up pulling official images


First

create a systemd drop-in directory for the docker service:

mkdir /etc/systemd/system/docker.service.d


Now create a file called /etc/systemd/system/docker.service.d/http-proxy.conf that adds the HTTP_PROXY environment variable:

[Service]
Environment="HTTP_PROXY=http://proxy.example.com:80/"


If you have internal Docker registries that you need to contact without proxying you can specify them via the NO_PROXY environment variable:

Environment="HTTP_PROXY=http://proxy.example.com:80/"
Environment="NO_PROXY=localhost,127.0.0.0/8,docker-registry.somecorporation.com"
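
The drop-in file described above can also be generated by a script. The following is a minimal sketch: write_docker_proxy_dropin is a hypothetical helper name, and the example writes into a temporary directory rather than /etc/systemd/system/docker.service.d so that it can be tried without root.

```shell
# Write a Docker systemd drop-in that sets HTTP_PROXY and, optionally, NO_PROXY.
# write_docker_proxy_dropin is a hypothetical helper name.
# Usage: write_docker_proxy_dropin <dir> <proxy-url> [no-proxy-list]
write_docker_proxy_dropin() {
    dir="$1"
    proxy="$2"
    no_proxy="${3:-}"
    mkdir -p "$dir"
    {
        printf '[Service]\n'
        printf 'Environment="HTTP_PROXY=%s"\n' "$proxy"
        if [ -n "$no_proxy" ]; then
            printf 'Environment="NO_PROXY=%s"\n' "$no_proxy"
        fi
    } > "$dir/http-proxy.conf"
}

# Dry run into a scratch directory, then show the result.
scratch=$(mktemp -d)
write_docker_proxy_dropin "$scratch" "http://proxy.example.com:80/" "localhost,127.0.0.0/8"
cat "$scratch/http-proxy.conf"
```

Pointing the first argument at the real drop-in directory (with sudo) would produce the file that the daemon-reload step picks up.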


Flush changes

$ sudo systemctl daemon-reload


Verify that the configuration has been loaded:

$ sudo systemctl show docker --property Environment
Environment=HTTP_PROXY=http://proxy.example.com:80/


Restart Docker

$ sudo systemctl restart docker

Permalink: https://zhusl.com/post/docker-proxy.html


How to install Docker on CentOS 6


Install the EPEL repository

wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-6.repo


Install via yum

yum install docker-io


Start the service

service docker start


Modify the startup configuration

Allow the private registry over HTTP and change the docker0 address:

vim /etc/init.d/docker
Modify the start() function:
$exec -d  --insecure-registry harbor.product.co-mall  --bip=172.18.0.1/16   $other_args &>> $logfile &



Permalink: https://zhusl.com/post/docker-centos6.html


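Online Docker install script for China


Official installation method

curl -sSL https://get.docker.com/ | sh


Installation method optimized for China

To make it easier for users in China to install different versions of Docker, we provide install scripts optimized for the Chinese network environment. They use package repositories located in China (thanks to USTC).



Usage
Substitute the Docker version you need into the script URL below: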

curl -sSL https://github.com/gitlawr/install-docker/blob/1.0/.sh?raw=true | sh


Or:

wget -qO- https://github.com/gitlawr/install-docker/blob/1.0/.sh?raw=true | sh


Supported Docker versions

Note: this varies slightly by Linux distribution. For example, Ubuntu 16.04 is not compatible with docker-1.10.3.


1.10.3
1.11.2
1.12.1
1.12.2
1.12.3
1.12.4
1.12.5
1.12.6
1.13.0
1.13.1
17.03.0
17.03.1
17.04.0


Notes

The scripts were tested on Ubuntu Xenial, CentOS 7 and Debian Jessie.
Permalink: https://zhusl.com/post/docker-sh.html

File	Address	Description
favicon.ico	st.example.com/static/img/favicon.ico	The file name in the CDN is `static/img/favicon.ico`. You can also copy favicon.ico into the static folder, in which case it is also reachable at example.com/favicon.ico. Docker users may need to rebuild the image.
bg04.jpg	st.example.com/static/img/bg04.jpg	The large background image on the left of the home page. To rename it, edit views/st_blog.css.
avatar.jpg	st.example.com/static/img/avatar.jpg	Avatar
blank.gif	st.example.com/static/img/blank.gif	Blank image (download)
default_avatar.png	st.example.com/static/img/default_avatar.png	Default disqus image (download)
disqus.js	st.example.com/static/js/disqus_xxx.js	The disqus file. You can download your own copy from https://short_name.disqus.com/embed.js and upload it to Qiniu. Then update the configuration file app.yml.

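Name	Description
app.yml	The configuration file for the whole program. It already documents every option, so they are not repeated here.
blackip.yml	If you are not using `Nginx`, the blog has a built-in `ip` filtering system.
es	elasticsearch, a very powerful distributed search engine; `github` uses it. Its configuration rarely needs changes, but `es/analysis/synonym.txt` holds synonyms, and you can add more by following the existing entries. scripts is the es scripts folder.
nginx	The system uses `nginx` as a proxy (a blog is unlikely to have a server all to itself). Replace the original `nginx` configuration with `nginx.conf`. The blog's configuration file is `domain/eiblog.conf`; you may rename it as long as the name matches `*.conf`. `eiblog.conf` is the file with the most to learn from. Maybe you want to understand every line, maybe…. Note that this configuration requires updating nginx to the latest version and openssl to 1.0.2j. For details, see Jerry Qu's post "This blog's Nginx configuration: the complete edition".
scts	Stores the ct files.
ssl	Holds everything related to certificates.
tpl	Templates; no changes needed.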

Flag	Description	Use Cases
noin	Prevents OSDs from being treated as in the cluster.	Commonly used with noout to address flapping OSDs.
noout	Prevents OSDs from being treated as out of the cluster.	If the mon osd report timeout is exceeded and an OSD has not reported to the monitor, the OSD will get marked out. If this happens erroneously, you can set noout to prevent the OSD(s) from getting marked out while you troubleshoot the issue. After 300 seconds (mon_osd_down_out_interval) the monitor automatically marks a down OSD as out, and data migration starts as soon as it is out, so setting this flag while handling a fault is recommended.
noup	Prevents OSDs from being treated as up and running.	Commonly used with nodown to address flapping OSDs.
nodown	Prevents OSDs from being treated as down.	Networking issues may interrupt Ceph ‘heartbeat’ processes, and an OSD may be up but still get marked down. You can set nodown to prevent OSDs from getting marked down while troubleshooting the issue.
full	Makes a cluster appear to have reached its full_ratio, and thereby prevents write operations.	If a cluster is reaching its full_ratio, you can pre-emptively set the cluster to full and expand capacity. NOTE: Setting the cluster to full will prevent write operations. (Whether this works as intended needs to be tested in practice.)
pause	Ceph will stop processing read and write operations, but will not affect OSD in, out, up or down statuses.	If you need to troubleshoot a running Ceph cluster without clients reading and writing data, you can set the cluster to pause to prevent client operations.
nobackfill	Ceph will prevent new backfill operations.	If you need to take an OSD or node down temporarily, (e.g., upgrading daemons), you can set nobackfill so that Ceph will not backfill while the OSD(s) is down.
norebalance	Ceph will prevent new rebalancing operations.	Usually set together with nobackfill above and norecover below. When operating on the cluster (taking down an OSD or a whole node), set this flag if you do not want data recovery or migration to happen during the operation, and remember to unset it afterwards.
norecover	Ceph will prevent new recovery operations.	If you need to replace an OSD disk and don’t want the PGs to recover to another OSD while you are hotswapping disks, you can set norecover to prevent the other OSDs from copying a new set of PGs to other OSDs.
noscrub	Ceph will prevent new scrubbing operations.	If you want to prevent scrubbing (e.g., to reduce overhead during high loads, recovery, backfilling, rebalancing, etc.), you can set noscrub and/or nodeep-scrub to prevent the cluster from scrubbing OSDs.
nodeep-scrub	Ceph will prevent new deep scrubbing operations.	During cluster recovery, scrub operations can sometimes hurt recovery performance. Set this together with noscrub above to stop scrubbing. Generally not recommended.
notieragent	Ceph will disable the process that is looking for cold/dirty objects to flush and evict.	If you want to stop the tier agent process from finding cold objects to flush to the backing storage tier, you may set notieragent.
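
As a rough illustration of how the flags in this table are combined for planned maintenance, the sketch below prints the set and unset commands for noout, nobackfill, norebalance and norecover. maintenance_flags is a hypothetical helper name, the choice of these four flags follows the notes in the table, and nothing is executed.

```shell
# Print the commands that set (or unset) the flags the table suggests for maintenance.
# maintenance_flags is a hypothetical helper; it only prints, it does not run ceph.
# Usage: maintenance_flags set|unset
maintenance_flags() {
    action="$1"
    for flag in noout nobackfill norebalance norecover; do
        printf 'ceph osd %s %s\n' "$action" "$flag"
    done
}

maintenance_flags set
# ... perform the maintenance, then restore normal behaviour:
maintenance_flags unset
```

Forgetting the unset step leaves the cluster unable to recover or rebalance, which is why the table stresses unsetting the flags once the work is done.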

capabilities	Description
allow	Precedes access settings for a daemon.
r	Gives the user read access. Required with monitors to retrieve the CRUSH map.
w	Gives the user write access to objects.
x	Gives the user the capability to call class methods (i.e., both read and write) and to conduct auth operations on monitors.
class-read	Gives the user the capability to call class read methods. Subset of x.
class-write	Gives the user the capability to call class write methods. Subset of x.
*	Gives the user read, write and execute permissions for a particular daemon/pool, and the ability to execute admin commands.
profile osd	Gives a user permissions to connect as an OSD to other OSDs or monitors. Conferred on OSDs to enable OSDs to handle replication heartbeat traffic and status reporting.
profile bootstrap-osd	Gives a user permissions to bootstrap an OSD. Conferred on deployment tools such as ceph-disk, ceph-deploy, etc. so that they have permissions to add keys, etc. when bootstrapping an OSD.

State	Description
Creating	When you create a pool, it will create the number of placement groups you specified. Ceph will echo creating when it is creating one or more placement groups. Once they are created, the OSDs that are part of a placement group’s Acting Set will peer. Once peering is complete, the placement group status should be active+clean, which means a Ceph client can begin writing to the placement group. 当创建一个池的时候，Ceph会创建一些PG(通俗点说就是在OSD上建目录)，处于创建中的PG就被标记为creating，当创建完之后，那些处于Acting集合(ceph pg map 1.0 osdmap e9395 pg 1.0 (1.0) -> up [27,4,10] acting [27,4,10]，对于pg 1.0 它的三副本会分布在osd.27,osd.4,osd.10上，那么这三个OSD上的pg 1.0就会发生沟通，确保状态一致)的PG就会进行peer，当peering完成后，也就是这个PG的三副本状态一致后，这个PG就会变成active+clean状态，也就意味着客户端可以进行写入操作了。
Peering	When Ceph is Peering a placement group, Ceph is bringing the OSDs that store the replicas of the placement group into agreement about the state of the objects and metadata in the placement group. When Ceph completes peering, this means that the OSDs that store the placement group agree about the current state of the placement group. However, completion of the peering process does NOT mean that each replica has the latest contents.peer.过程实际上就是让三个保存同一个PG副本的OSD对保存在各自OSD上的对象状态和元数据进行协商的过程，但是呢peer完成并不意味着每个副本都保存着最新的数据。Authoritative History Ceph will NOT acknowledge a write operation to a client, until all OSDs of the acting set persist the write operation. This practice ensures that at least one member of the acting set will have a record of every acknowledged write operation since the last successful peering operation.With an accurate record of each acknowledged write operation, Ceph can construct and disseminate a new authoritative history of the placement group—a complete, and fully ordered set of operations that, if performed, would bring an OSD’s copy of a placement group up to date.直到OSD的副本都完成写操作，Ceph才会通知客户端写操作完成。这确保了Acting集合中至少有一个副本，自最后一次成功的peer后。剩下的不好翻译因为没怎么理解。
Active	Once Ceph completes the peering process, a placement group may become active. The active state means that the data in the placement group is generally available in the primary placement group and the replicas for read and write operations.当PG完成了Peer之后，就会成为active状态，这个状态意味着主从OSD的该PG都可以提供读写了。
Clean	When a placement group is in the clean state, the primary OSD and the replica OSDs have successfully peered and there are no stray replicas for the placement group. Ceph replicated all objects in the placement group the correct number of times.这个状态的意思就是主从OSD已经成功peer并且没有滞后的副本。PG的正常副本数满足集群副本数。
Degraded	When a client writes an object to the primary OSD, the primary OSD is responsible for writing the replicas to the replica OSDs. After the primary OSD writes the object to storage, the placement group will remain in a degraded state until the primary OSD has received an acknowledgement from the replica OSDs that Ceph created the replica objects successfully。当客户端向一个主OSD写入一个对象时，主OSD负责向从OSD写剩下的副本，在主OSD写完后，在从OSD向主OSD发送ack之前，这个PG均会处于降级状态。The reason a placement group can be active+degraded is that an OSD may be active even though it doesn’t hold all of the objects yet. If an OSD goes down, Ceph marks each placement group assigned to the OSD as degraded. The OSDs must peer again when the OSD comes back online. However, a client can still write a new object to a degraded placement group if it is active.而PG处于active+degraded状态是因为一个OSD处于active状态但是这个OSD上的PG并没有保存所有的对象。当一个OSDdown了，Ceph会将这个OSD上的PG都标记为降级。当这个挂掉的OSD重新上线之后，OSD们必须重新peer。然后，客户端还是可以向一个active+degraded的PG写入的。If an OSD is down and the degraded condition persists, Ceph may mark the down OSD as out of the cluster and remap the data from the down OSD to another OSD. The time between being marked down and being marked out is controlled by mon osd down out interval, which is set to 300 seconds by default.当OSDdown掉五分钟后，集群会自动将这个OSD标为out,然后将缺少的PGremap到其他OSD上进行恢复以保证副本充足，这个五分钟的配置项是mon osd down out interval，默认值为300s。A placement group can also be degraded, because Ceph cannot find one or more objects that Ceph thinks should be in the placement group. While you cannot read or write to unfound objects, you can still access all of the other objects in the degraded placement group.PG如果丢了对象，Ceph也会将其标记为降级。你可以继续访问没丢的对象，但是不能读写已经丢失的对象了。Let’s say there are 9 OSDs with size = 3 (three copies of objects). If OSD number 9 goes down, the PGs assigned to OSD 9 go in a degraded state. If OSD 9 doesn’t recover, it goes out of the cluster and the cluster rebalances. 
In that scenario, the PGs are degraded and then recover to an active state.假设有9个OSD，三副本，然后osd.8挂了，在osd.8上的PG都会被标记为降级，如果osd.8不再加回到集群那么集群就会自动恢复出那个OSD上的数据，在这个场景中，PG是降级的然后恢复完后就会变成active状态。
Recovering	Ceph was designed for fault-tolerance at a scale where hardware and software problems are ongoing. When an OSD goes down, its contents may fall behind the current state of other replicas in the placement groups. When the OSD is back up, the contents of the placement groups must be updated to reflect the current state. During that time period, the OSD may reflect a recovering state.Ceph设计之初就考虑到了容错性，比如软硬件的错误。当一个OSD挂了，它所包含的副本内容将会落后于其他副本，当这个OSD起来之后，这个OSD的数据将会更新到当前最新的状态。这段时间，这个OSD上的PG就会被标记为recover。Recovery isn’t always trivial, because a hardware failure might cause a cascading failure of multiple OSDs. For example, a network switch for a rack or cabinet may fail, which can cause the OSDs of a number of host machines to fall behind the current state of the cluster. Each one of the OSDs must recover once the fault is resolved.而recover是不容忽视的，因为有时候一个小的硬件故障可能会导致多个OSD发生一连串的问题。比如，如果一个机架或者机柜的路由挂了，会导致一大批OSD数据滞后，每个OSD在故障解决重新上线后都需要进行recover。Ceph provides a number of settings to balance the resource contention between new service requests and the need to recover data objects and restore the placement groups to the current state. The osd recovery delay start setting allows an OSD to restart, re-peer and even process some replay requests before starting the recovery process. The osd recovery threads setting limits the number of threads for the recovery process (1 thread by default). The osd recovery thread timeout sets a thread timeout, because multiple OSDs may fail, restart and re-peer at staggered rates. The osd recovery max active setting limits the number of recovery requests an OSD will entertain simultaneously to prevent the OSD from failing to serve . The osd recovery max chunk setting limits the size of the recovered data chunks to prevent network congestion.Ceph提供了一些配置项，用来解决客户端请求和数据恢复的请求优先级问题，这些配置参考上面加粗的字体吧。
Backfilling	When a new OSD joins the cluster, CRUSH will reassign placement groups from OSDs in the cluster to the newly added OSD. Forcing the new OSD to accept the reassigned placement groups immediately can put excessive load on the new OSD. Back filling the OSD with the placement groups allows this process to begin in the background. Once backfilling is complete, the new OSD will begin serving requests when it is ready. During the backfill operations, you may see one of several states: backfill_wait indicates that a backfill operation is pending, but isn’t underway yet; backfill indicates that a backfill operation is underway; and, backfill_too_full indicates that a backfill operation was requested, but couldn’t be completed due to insufficient storage capacity. When a placement group can’t be backfilled, it may be considered incomplete. Ceph provides a number of settings to manage the load spike associated with reassigning placement groups to an OSD (especially a new OSD). By default, osd_max_backfills sets the maximum number of concurrent backfills to or from an OSD to 10. The osd backfill full ratio enables an OSD to refuse a backfill request if the OSD is approaching its full ratio (85%, by default). If an OSD refuses a backfill request, the osd backfill retry interval enables an OSD to retry the request (after 10 seconds, by default). OSDs can also set osd backfill scan min and osd backfill scan max to manage scan intervals (64 and 512, by default).当一个新的OSD加入到集群后，CRUSH会重新规划PG将其他OSD上的部分PG迁移到这个新增的PG上。如果强制要求新OSD接受所有的PG迁入要求会极大的增加该OSD的负载。回填这个OSD允许进程在后端执行。一旦回填完成后，新的OSD将会承接IO请求。在回填过程中，你可能会看到如下状态：backfill_wait: 表明回填动作被挂起，并没有执行。backfill：表明回填动作正在执行。backfill_too_full：表明当OSD收到回填请求时，由于OSD已经满了不能再回填PG了。 imcomplete: 当一个PG不能被回填时，这个PG会被认为是不完整的。同样，Ceph提供了一系列的参数来限制回填动作，包括osd_max_backfills：OSD最大回填PG数。osd_backfill_full_ratio：当OSD容量达到默认的85%是拒绝回填请求。osd_backfill_retry_interval:字面意思。
Remapped	When the acting set that services a placement group changes, the data migrates from the old acting set to the new acting set. It may take some time before the new primary OSD can service requests, so it may ask the old primary to continue servicing requests until the placement group migration is complete. Once data migration completes, the mapping uses the primary OSD of the new acting set.
Stale	While Ceph uses heartbeats to ensure that hosts and daemons are running, the ceph-osd daemons may also get into a stuck state where they aren't reporting statistics in a timely manner (e.g., because of a temporary network fault). By default, OSD daemons report their placement group, up thru, boot and failure statistics every half second (i.e., 0.5), which is more frequent than the heartbeat thresholds. If the primary OSD of a placement group's acting set fails to report to the monitor, or if other OSDs have reported the primary OSD down, the monitors mark the placement group stale. When you start your cluster, it is common to see the stale state until the peering process completes. After your cluster has been running for a while, seeing placement groups in the stale state indicates that the primary OSD for those placement groups is down or is not reporting placement group statistics to the monitor.
Misplaced	There are some temporary backfilling scenarios where a PG gets mapped temporarily to an OSD. When that temporary situation should no longer be the case, the PGs might still reside in the temporary location and not in the proper location. In that case, they are said to be misplaced: the correct number of copies actually exists, but one or more copies are in the wrong place. Let's say there are 3 OSDs (0, 1, 2) and all PGs map to some permutation of those three. If you add another OSD (OSD 3), some PGs will now map to OSD 3 instead of one of the others. However, until OSD 3 is backfilled, the PG will have a temporary mapping that allows it to continue to serve I/O from the old mapping. During that time, the PG is misplaced (because it has a temporary mapping) but not degraded (since there are 3 copies). Example: initially, pg 1.5: up=acting: [0,1,2]. After OSD 3 is added, pg 1.5: up: [0,3,1] acting: [0,1,2]. Here, [0,1,2] is a temporary mapping, so the up set is not equal to the acting set and the PG is misplaced but not degraded, since [0,1,2] is still three copies. Finally, pg 1.5: up=acting: [0,3,1]. OSD 3 is now backfilled and the temporary mapping is removed, so the PG is neither degraded nor misplaced.
Incomplete	A PG goes into an incomplete state when there is incomplete content and peering fails, i.e., when there are no complete OSDs that are current enough to perform recovery. Let's say [1,2,3] is an acting OSD set and it switches to [1,4,3]; osd.1 will then request a temporary acting set of [1,2,3] while backfilling 4. During this time, if 1, 2 and 3 all go down, osd.4 will be the only one left, and it might not have been fully backfilled. At this point the PG goes incomplete, indicating that there are no complete OSDs that are current enough to perform recovery. Alternatively, if osd.4 is not involved and the acting set is simply [1,2,3] when 1, 2 and 3 go down, the PG would likely go stale, indicating that the mons have not heard anything on that PG since the acting set changed, because there are no OSDs left to notify the new OSDs.
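
The pg 1.5 example in the Misplaced row can be replayed in a few lines of Python. This is only a rough sketch under simplifying assumptions: the `PGMapping` record and the `classify` helper are hypothetical names, and the two rules (misplaced when the up set differs from the acting set, degraded when the acting set holds fewer OSDs than the pool size) are a simplification of the description above, not Ceph's actual per-object accounting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PGMapping:
    """Hypothetical record of one placement group's up and acting sets."""
    pgid: str
    up: List[int]
    acting: List[int]


def classify(pg: PGMapping, pool_size: int = 3) -> List[str]:
    """Return simplified state flags for a PG.

    Assumptions (illustrative only, not Ceph's real logic):
    - misplaced: the up set differs from the acting set,
      i.e. a temporary mapping is still in effect
    - degraded: the acting set holds fewer than pool_size OSDs
    """
    flags = []
    if pg.up != pg.acting:
        flags.append("misplaced")
    if len(pg.acting) < pool_size:
        flags.append("degraded")
    return flags


# The three stages of the pg 1.5 example.
before = PGMapping("1.5", up=[0, 1, 2], acting=[0, 1, 2])
during = PGMapping("1.5", up=[0, 3, 1], acting=[0, 1, 2])
after = PGMapping("1.5", up=[0, 3, 1], acting=[0, 3, 1])

for label, pg in (("before", before), ("during", during), ("after", after)):
    print(label, classify(pg))
```

Only the middle stage is flagged misplaced, and none of the stages is flagged degraded, because the acting set always contains three OSDs.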

Feature	Bit	Description
Layering	1	Layering enables you to use cloning.
Striping v2	2	Striping spreads data across multiple objects. Striping helps with parallelism for sequential read/write workloads.
Exclusive locking	4	When enabled, it requires a client to get a lock on an object before making a write.
Object map	8	Block devices are thin provisioned—meaning, they only store data that actually exists. Object map support helps track which objects actually exist (have data stored on a drive). Enabling object map support speeds up I/O operations for cloning, or importing and exporting a sparsely populated image.
Fast-diff	16	Fast-diff support depends on object map support and exclusive lock support. It adds another property to the object map, which makes it much faster to generate diffs between snapshots of an image and to calculate the actual data usage of a snapshot.
Deep-flatten	32	Deep-flatten makes rbd flatten work on all the snapshots of an image, in addition to the image itself. Without it, snapshots of an image will still rely on the parent, so the parent will not be delete-able until the snapshots are deleted. Deep-flatten makes a parent independent of its clones, even if they have snapshots.
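
Because each feature in the table is a single bit, a set of enabled features can be stored as one integer. The sketch below decodes such a mask and checks the one dependency the table states (fast-diff needs object map and exclusive lock). The helper names `decode_features` and `missing_dependencies` are hypothetical, and the lowercase feature spellings are an assumption chosen for illustration.

```python
from typing import Dict, List

# Bit values copied from the table above; the lowercase spellings are assumed.
FEATURE_BITS: Dict[int, str] = {
    1: "layering",
    2: "striping",
    4: "exclusive-lock",
    8: "object-map",
    16: "fast-diff",
    32: "deep-flatten",
}

# Only the dependency stated in the table: fast-diff needs
# object map (8) and exclusive lock (4).
DEPENDENCIES: Dict[int, int] = {16: 8 | 4}


def decode_features(mask: int) -> List[str]:
    """List the feature names whose bits are set in mask, lowest bit first."""
    return [name for bit, name in sorted(FEATURE_BITS.items()) if mask & bit]


def missing_dependencies(mask: int) -> List[str]:
    """List features that enabled features require but that are not set."""
    missing = 0
    for bit, required in DEPENDENCIES.items():
        if mask & bit:
            missing |= required & ~mask
    return decode_features(missing)


example_mask = 1 + 4 + 8 + 16 + 32  # 61: every feature except striping
print(decode_features(example_mask))
print(missing_dependencies(16))  # fast-diff alone
```

With the example mask of 61, every feature except striping is decoded, while a mask of 16 (fast-diff alone) reports exclusive lock and object map as missing.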

Tool Name	Testing Scenario	Command line / GUI	OS Support	Popularity	Reference
FIO (Flexible I/O Tester)	Mainly block-level storage, e.g. SAN, DAS	Command line	Linux / Windows	High	fio github
IOmeter	Mainly block-level storage, e.g. SAN, DAS	GUI / Command line	Linux / Windows	High	Iometer and IOzone
iozone	File-level storage, e.g. NAS	GUI / Command line	Linux / Windows	High	IOzone Filesystem Benchmark
dd	File-level storage, e.g. NAS	Command line	Linux / Windows	High	dd over NFS testing
rados bench	Ceph RADOS	Command line	Linux only	Medium	Benchmark a Ceph Storage Cluster
cosbench	Cloud Object Storage Service	GUI / Command line	Linux / Windows	High	COSBench - Cloud Object Storage Benchmark

Zhusl's 小站

Hiding the tab bar and address bar in Firefox

Configuration steps

Restart the browser to apply the changes

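Configuring iTerm2 on macOS

iTerm2 download link

Installing oh-my-zsh

Installing Powerline

Installing the Meslo fonts

Configuring iTerm2

Using the Solarized color scheme

Installing the agnoster (oh-my-zsh) theme

Changing the default shell

Upload and download (lrzsz)

Upload and download 2 (trzsz)

Installation:

Configuration:

Progress bar configuration

Text progress bar

zenity progress bar

Default save path

History issue (zsh)

Duplicating a session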

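Ceph: manually replacing an OSD's journal partition

Purpose

Journal disk partition layout

ceph-disk list output

Creating the new journal partition

Getting the journal UUID

Changing the journal partition type code

Stopping the OSD

Updating the symlink

Changing the partition permissions

Initialization

Starting the OSD

Checking the disk status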

Quickly getting directory sizes and file counts on CephFS

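Ubuntu 16.04: building a machine learning environment with Kubernetes and arena

Environment information:

Nodes:

Starting the installation:

Base system environment

Upgrade packages and the system kernel:

Install base packages (kernel after upgrade: 4.4.0-134-generic):

Blacklist the nouveau driver (the NVIDIA graphics driver bundled with the system):

Reboot

Install the NVIDIA driver:

Install CUDA (9.2):

Install Kubernetes (1.10.4)

Install arena (see the official documentation)

Install arena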

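DevOps is not that far away

What is DevOps

Why DevOps is needed

Bottom-up

Top-down

How to practice DevOps

Building a DevOps team

Building a DevOps culture

Setting up automated workflows

How to evaluate the results of DevOps

When DevOps meets Docker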