The ZooKeeper server starts from the main method of QuorumPeerMain:
public static void main(String[] args) {
QuorumPeerMain main = new QuorumPeerMain();
try {
main.initializeAndRun(args);
// ... exception handling omitted ...
The main logic lives in initializeAndRun:
protected void initializeAndRun(String[] args)
throws ConfigException, IOException, AdminServerException
{
QuorumPeerConfig config = new QuorumPeerConfig();
if (args.length == 1) {
config.parse(args[0]);
}
// Start and schedule the purge task
DatadirCleanupManager purgeMgr = new DatadirCleanupManager(config
.getDataDir(), config.getDataLogDir(), config
.getSnapRetainCount(), config.getPurgeInterval());
purgeMgr.start();
if (args.length == 1 && config.isDistributed()) {
// cluster mode
runFromConfig(config);
} else {
LOG.warn("Either no config or no quorum defined in config, running "
+ " in standalone mode");
// there is only server in the quorum -- run as standalone
// standalone mode
ZooKeeperServerMain.main(args);
}
}
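The configuration is usually loaded from a zoo.cfg file, in which case args.length == 1 and args[0] is the path to that file. QuorumPeerConfig stores the parsed settings through its parse(String path) method. See the article "ZooKeeper configuration explained in detail" for the individual settings.
If more than one server node is configured (config.isDistributed() returns true), the server starts in cluster mode; otherwise it starts in standalone mode. Standalone startup is analyzed first.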
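To make the mode decision above concrete, here is a minimal sketch that counts the server.N entries of a zoo.cfg-style text and picks a mode. The class ModeSketch and its methods are hypothetical and not part of ZooKeeper, and the real QuorumPeerConfig.isDistributed() check looks at more than an entry count, so treat this as a simplification.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of ZooKeeper: decides the startup mode
// from the number of server.N entries in a zoo.cfg-style config text.
final class ModeSketch {
    static int countServers(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        int servers = 0;
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                servers++;
            }
        }
        return servers;
    }

    // Assumption for this sketch: more than one server entry means cluster mode.
    static String chooseMode(String cfgText) {
        return countServers(cfgText) > 1 ? "cluster" : "standalone";
    }
}
```

A config with no server.N lines, or with a single one, resolves to "standalone" here, which matches the fallback branch that calls ZooKeeperServerMain.main(args).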
Standalone server startup
Main handler: ZooKeeperServerMain.runFromConfig
public void runFromConfig(ServerConfig config) throws IOException, AdminServerException {
LOG.info("Starting server");
FileTxnSnapLog txnLog = null;
try {
txnLog = new FileTxnSnapLog(config.dataLogDir, config.dataDir);
final ZooKeeperServer zkServer = new ZooKeeperServer(txnLog,
config.tickTime, config.minSessionTimeout, config.maxSessionTimeout, null);
txnLog.setServerStats(zkServer.serverStats());
// Registers shutdown handler which will be used to know the
// server error or shutdown state changes.
final CountDownLatch shutdownLatch = new CountDownLatch(1);
zkServer.registerServerShutdownHandler(
new ZooKeeperServerShutdownHandler(shutdownLatch));
// Start Admin server
adminServer = AdminServerFactory.createAdminServer();
adminServer.setZooKeeperServer(zkServer);
adminServer.start();
boolean needStartZKServer = true;
if (config.getClientPortAddress() != null) {
cnxnFactory = ServerCnxnFactory.createFactory();
cnxnFactory.configure(config.getClientPortAddress(), config.getMaxClientCnxns(), false);
cnxnFactory.startup(zkServer);
// zkServer has been started. So we don't need to start it again in secureCnxnFactory.
needStartZKServer = false;
}
if (config.getSecureClientPortAddress() != null) {
secureCnxnFactory = ServerCnxnFactory.createFactory();
secureCnxnFactory.configure(config.getSecureClientPortAddress(), config.getMaxClientCnxns(), true);
secureCnxnFactory.startup(zkServer, needStartZKServer);
}
containerManager = new ContainerManager(zkServer.getZKDatabase(), zkServer.firstProcessor,
Integer.getInteger("znode.container.checkIntervalMs", (int) TimeUnit.MINUTES.toMillis(1)),
Integer.getInteger("znode.container.maxPerMinute", 10000)
);
containerManager.start();
// Watch status of ZooKeeper server. It will do a graceful shutdown
// if the server is not running or hits an internal error.
shutdownLatch.await();
shutdown();
if (cnxnFactory != null) {
cnxnFactory.join();
}
if (secureCnxnFactory != null) {
secureCnxnFactory.join();
}
if (zkServer.canShutdown()) {
zkServer.shutdown(true);
}
} catch (InterruptedException e) {
// warn, but generally this is ok
LOG.warn("Server interrupted", e);
} finally {
if (txnLog != null) {
txnLog.close();
}
}
}
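Initialization steps:
1. Create the data manager FileTxnSnapLog
FileTxnSnapLog is the layer between the upper-level ZooKeeper server and the underlying data storage. It provides the interface for working with transaction logs and snapshots, and is used at server startup to restore local data. See "ZooKeeper source code analysis (6): data and storage".
2. Create the server instance ZooKeeperServer
ZooKeeperServer is the standalone server implementation.
Its constructor is: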
public ZooKeeperServer(FileTxnSnapLog txnLogFactory, int tickTime,
int minSessionTimeout, int maxSessionTimeout, ZKDatabase zkDb) {
serverStats = new ServerStats(this);
this.txnLogFactory = txnLogFactory;
this.txnLogFactory.setServerStats(this.serverStats);
this.zkDb = zkDb;
this.tickTime = tickTime;
// defaults to 2 * tickTime when not configured
setMinSessionTimeout(minSessionTimeout);
// defaults to 20 * tickTime when not configured
setMaxSessionTimeout(maxSessionTimeout);
listener = new ZooKeeperServerListenerImpl(this);
ServerStats collects the server's runtime statistics and holds the most basic runtime information. ZooKeeperServer implements the ServerStats.Provider interface.
public class ServerStats {
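// number of response packets sent to clients since the server started or the statistics were last reset
private long packetsSent;
// number of request packets received from clients since the server started or the statistics were last reset
private long packetsReceived;
// maximum request processing latency since the server started or the statistics were last reset
private long maxLatency;
// minimum request processing latency since the server started or the statistics were last reset
private long minLatency = Long.MAX_VALUE;
// total request processing latency since the server started or the statistics were last reset
private long totalLatency = 0;
// total number of requests processed since the server started or the statistics were last reset
private long count = 0;
// number of warnings raised because a transaction log fsync took longer than the threshold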
private AtomicLong fsyncThresholdExceedCount = new AtomicLong(0);
private final Provider provider;
public interface Provider {
public long getOutstandingRequests();
public long getLastProcessedZxid();
public String getState();
public int getNumAliveConnections();
public long getDataDirSize();
public long getLogDirSize();
}
public ServerStats(Provider provider) {
this.provider = provider;
}
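The constructor sets the server's tickTime and the session timeout limits, then initializes ZooKeeperServerListenerImpl, which listens for the death of certain critical threads and, when one dies, changes the ZooKeeperServer state to State.ERROR.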
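The defaults mentioned in the constructor comments can be sketched as follows. SessionTimeoutDefaults is a hypothetical class written for this article; the use of -1 as the "not configured" marker and the clamping of a client-requested timeout into the [min, max] range are assumptions about how the limits are applied, not a copy of the ZooKeeper source.

```java
// Hypothetical helper illustrating the session timeout limits.
final class SessionTimeoutDefaults {
    // Assumption: -1 means the value was not set in the configuration.
    static int minSessionTimeout(int configured, int tickTime) {
        return configured == -1 ? tickTime * 2 : configured;
    }

    static int maxSessionTimeout(int configured, int tickTime) {
        return configured == -1 ? tickTime * 20 : configured;
    }

    // Assumption: a timeout requested by a client is clamped into [min, max].
    static int negotiate(int requested, int min, int max) {
        return Math.max(min, Math.min(max, requested));
    }
}
```

With tickTime = 2000 and neither limit configured, the limits come out as 4000 ms and 40000 ms, so a client asking for 1000 ms would be given 4000 ms under this sketch.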
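3. Start the AdminServer
AdminServer mainly receives requests to run commands. The default implementation is JettyAdminServer, which listens on port 8080 by default and can be reached at http://<hostname>:8080/commands/<commandname>.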
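As an illustration of the URL pattern above, the following sketch builds a command URL and fetches it over plain HTTP. AdminCommandClient is a hypothetical class; the host, the port and the command name passed in are up to the caller, and the response is returned as raw text without any parsing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the AdminServer URL pattern
// http://<hostname>:<port>/commands/<commandname>.
final class AdminCommandClient {
    static URI commandUri(String host, int port, String command) {
        return URI.create("http://" + host + ":" + port + "/commands/" + command);
    }

    // Performs a GET and returns the response body as text.
    static String fetch(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }
}
```

Against a locally running server, a call such as AdminCommandClient.fetch(AdminCommandClient.commandUri("localhost", 8080, "ruok")) would be one way to probe it, assuming the default port has not been changed.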
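4. Initialize and start ServerCnxnFactory
ServerCnxnFactory creates the IO threads and worker threads that talk to clients. It has two implementations, NIOServerCnxnFactory and NettyServerCnxnFactory, with NIOServerCnxnFactory as the default. It handles connections with non-blocking, event-driven NIO, and the number of threads it creates depends on the hardware. On a 32-core machine, for example, the defaults are 1 thread that accepts new connections, 1 connection expiry thread, 4 selector threads for IO reads and writes, and 64 worker threads that process the IO data.
It is configured as follows: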
public void configure(InetSocketAddress addr, int maxcc, boolean secure) throws IOException {
maxClientCnxns = maxcc;
sessionlessCnxnTimeout = Integer.getInteger(
ZOOKEEPER_NIO_SESSIONLESS_CNXN_TIMEOUT, 10000);
// We also use the sessionlessCnxnTimeout as expiring interval for
// cnxnExpiryQueue. These don't need to be the same, but the expiring
// interval passed into the ExpiryQueue() constructor below should be
// less than or equal to the timeout.
cnxnExpiryQueue =
new ExpiryQueue<NIOServerCnxn>(sessionlessCnxnTimeout);
expirerThread = new ConnectionExpirerThread();
int numCores = Runtime.getRuntime().availableProcessors();
// 32 cores sweet spot seems to be 4 selector threads
numSelectorThreads = Integer.getInteger(
ZOOKEEPER_NIO_NUM_SELECTOR_THREADS,
Math.max((int) Math.sqrt((float) numCores/2), 1));
if (numSelectorThreads < 1) {
throw new IOException("numSelectorThreads must be at least 1");
}
numWorkerThreads = Integer.getInteger(
ZOOKEEPER_NIO_NUM_WORKER_THREADS, 2 * numCores);
workerShutdownTimeoutMS = Long.getLong(
ZOOKEEPER_NIO_SHUTDOWN_TIMEOUT, 5000);
for(int i=0; i<numSelectorThreads; ++i) {
selectorThreads.add(new SelectorThread(i));
}
this.ss = ServerSocketChannel.open();
ss.socket().setReuseAddress(true);
LOG.info("binding to port " + addr);
ss.socket().bind(addr);
ss.configureBlocking(false);
acceptThread = new AcceptThread(ss, addr, selectorThreads);
}
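The thread counts quoted for a 32-core machine follow directly from the two expressions in configure. The sketch below isolates them so they can be evaluated for any core count; NioThreadSizing is a hypothetical class, and it ignores the system properties that can override both values.

```java
// Hypothetical helper reproducing the default sizing formulas from configure().
final class NioThreadSizing {
    // max(floor(sqrt(numCores / 2)), 1)
    static int selectorThreads(int numCores) {
        return Math.max((int) Math.sqrt((float) numCores / 2), 1);
    }

    // 2 * numCores
    static int workerThreads(int numCores) {
        return 2 * numCores;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
                + " selectors=" + selectorThreads(cores)
                + " workers=" + workerThreads(cores));
    }
}
```

For 32 cores this gives 4 selector threads and 64 worker threads; for a single core the selector count is floored to 1.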
The startup flow is:
public void startup(ZooKeeperServer zks, boolean startServer)
throws IOException, InterruptedException {
start();
setZooKeeperServer(zks);
if (startServer) {
zks.startdata();
zks.startup();
}
}
public void start() {
stopped = false;
if (workerPool == null) {
workerPool = new WorkerService(
"NIOWorker", numWorkerThreads, false);
}
for(SelectorThread thread : selectorThreads) {
if (thread.getState() == Thread.State.NEW) {
thread.start();
}
}
// ensure thread is started once and only once
if (acceptThread.getState() == Thread.State.NEW) {
acceptThread.start();
}
if (expirerThread.getState() == Thread.State.NEW) {
expirerThread.start();
}
}
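As shown, this starts expirerThread, acceptThread, the selectorThreads and workerPool.
acceptThread
is a single thread. It accepts new client connections and hands each new client socket to one selectorThread by putting it on that thread's acceptedQueue and waking its selector so that the pending select call returns immediately.
selectorThread
does the following:
1. Takes client connections off acceptedQueue and wraps each one in a NIOServerCnxn, which represents the communication with one client and holds the methods for reading and writing data.
2. Repeatedly finds client sockets with pending read/write events and calls handleIO, which hands the IO work to workerPool. The request is wrapped in an IOWorkRequest and finally in a ScheduledWorkRequest, that is, a task. Event selection on the channel is suspended while the data is being processed, and the interest ops are registered again once processing finishes. In addition, the connection is activated through touchCnxn both before and after the data is processed, which recomputes its expiry time.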
private void handleIO(SelectionKey key) {
IOWorkRequest workRequest = new IOWorkRequest(this, key);
NIOServerCnxn cnxn = (NIOServerCnxn) key.attachment();
// Stop selecting this key while processing on its
// connection
cnxn.disableSelectable();
key.interestOps(0);
touchCnxn(cnxn);
workerPool.schedule(workRequest);
}
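expirerThread
is the thread that checks for expired client connections. It repeatedly takes from ExpiryQueue<NIOServerCnxn> cnxnExpiryQueue the NIOServerCnxn instances whose expiry time is earlier than the current time and cleans up those connections.
ExpiryQueue maintains two maps:
ConcurrentHashMap<E, Long> elemMap:
the key is a connection and the value is that connection's current expiry time
ConcurrentHashMap<Long, Set<E>> expiryMap:
the key is an upcoming expiry time and the value is the batch of connections whose expiry time equals that key
The next expiry time is computed as:
lastExpirationTime = currentTime + sessionTimeout
newExpirationTime = (lastExpirationTime / expirationInterval + 1) * expirationInterval;
Here expirationInterval is the interval at which expirerThread runs its timeout check. The formula guarantees that a connection is always checked at the first interval boundary after its timeout elapses.
Each time a connection is activated, its entries in both maps are updated and its next expiry time is recomputed. At each scheduled check, expirerThread only has to take the batch of expired connections from ExpiryQueue.expiryMap and clean them up.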
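The bucketing scheme described above can be sketched with two plain maps. ExpiryBuckets is a hypothetical, single-threaded simplification written for this article: it uses HashMap instead of ConcurrentHashMap, takes the current time as a parameter, and is not the ExpiryQueue source.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, non-thread-safe sketch of interval-bucketed expiry.
final class ExpiryBuckets<E> {
    private final Map<E, Long> elemMap = new HashMap<>();        // element -> expiry time
    private final Map<Long, Set<E>> expiryMap = new HashMap<>(); // expiry time -> bucket
    private final long expirationInterval;

    ExpiryBuckets(long expirationInterval) {
        this.expirationInterval = expirationInterval;
    }

    // Rounds a time up to the next interval boundary (always strictly later).
    long roundToNextInterval(long time) {
        return (time / expirationInterval + 1) * expirationInterval;
    }

    // Called on every activation: moves elem into the bucket for its new expiry time.
    long update(E elem, long now, long timeout) {
        long newExpiry = roundToNextInterval(now + timeout);
        Long prev = elemMap.put(elem, newExpiry);
        if (prev != null && prev != newExpiry) {
            Set<E> oldBucket = expiryMap.get(prev);
            if (oldBucket != null) {
                oldBucket.remove(elem);
            }
        }
        expiryMap.computeIfAbsent(newExpiry, k -> new HashSet<>()).add(elem);
        return newExpiry;
    }

    // Removes and returns the bucket that expires exactly at bucketTime.
    Set<E> poll(long bucketTime) {
        Set<E> expired = expiryMap.remove(bucketTime);
        if (expired == null) {
            return Collections.emptySet();
        }
        for (E elem : expired) {
            elemMap.remove(elem);
        }
        return expired;
    }
}
```

With an interval of 1000 ms, a connection touched at 2300 with a 3000 ms timeout lands in the 6000 bucket; touching it again at 3500 moves it to the 7000 bucket, so the check at 6000 no longer sees it.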
5. Restore local data from the local snapshot files and transaction log files
public void startdata()
throws IOException, InterruptedException {
//check to see if zkDb is not null
if (zkDb == null) {
zkDb = new ZKDatabase(this.txnLogFactory);
}
if (!zkDb.isInitialized()) {
loadData();
}
}
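In short, the dataTree is restored from the most recent usable snapshot, and the later changes are then replayed from the corresponding transaction logs. See "ZooKeeper source code analysis (6): data and storage".
6. Initialize and start ZooKeeperServer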
public synchronized void startup() {
if (sessionTracker == null) {
createSessionTracker();
}
startSessionTracker();
setupRequestProcessors();
registerJMX();
setState(State.RUNNING);
notifyAll();
}
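1. Create the session tracker SessionTrackerImpl, which manages sessions on the server side. If a client does not activate its session within the session timeout, the expired session is cleaned up.
2. Initialize the ZooKeeper request processor chain
PrepRequestProcessor -> SyncRequestProcessor -> FinalRequestProcessor
That is, data arrives through NIOServerCnxn and, for a read IO event, ZooKeeperServer.processPacket is called and the request is handed to the processor chain. See "ZooKeeper source code analysis (7): initialization of the server request processor chain".
3. Register the JMX service
4. Change the server state to State.RUNNING, notify the other blocked threads and release the lock. Standalone startup is now complete.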
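The three-stage chain from step 2 is a chain of responsibility in which each processor does its part and passes the request to the next one. The sketch below shows only that linking order. ProcessorChainSketch and its Stage class are hypothetical: requests are plain strings, every stage runs synchronously, and none of the real preparation, logging or response logic is modeled.

```java
import java.util.List;

// Hypothetical sketch of how the standalone processor chain is linked.
final class ProcessorChainSketch {
    interface RequestProcessor {
        void processRequest(String request);
    }

    // A stage records that it saw the request, then forwards it.
    static final class Stage implements RequestProcessor {
        private final String name;
        private final RequestProcessor next;
        private final List<String> trace;

        Stage(String name, RequestProcessor next, List<String> trace) {
            this.name = name;
            this.next = next;
            this.trace = trace;
        }

        @Override
        public void processRequest(String request) {
            trace.add(name);
            if (next != null) {
                next.processRequest(request);
            }
        }
    }

    // The chain is built back to front so each stage knows its successor.
    static RequestProcessor build(List<String> trace) {
        RequestProcessor finalProcessor = new Stage("FinalRequestProcessor", null, trace);
        RequestProcessor syncProcessor = new Stage("SyncRequestProcessor", finalProcessor, trace);
        return new Stage("PrepRequestProcessor", syncProcessor, trace);
    }
}
```

Building the chain from the last stage backwards is what lets the first processor be the single entry point for every incoming request.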
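Cluster server startup
Cluster mode implements the ZAB protocol, so compared with standalone mode it adds leader election, communication between the leader and the other servers in the cluster, and the initiation of transaction requests.
Main handler: QuorumPeerMain.runFromConfig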
public void runFromConfig(QuorumPeerConfig config)
throws IOException, AdminServerException
{
try {
ManagedUtil.registerLog4jMBeans();
} catch (JMException e) {
LOG.warn("Unable to register log4j JMX control", e);
}
LOG.info("Starting quorum peer");
try {
ServerCnxnFactory cnxnFactory = null;
ServerCnxnFactory secureCnxnFactory = null;
if (config.getClientPortAddress() != null) {
cnxnFactory = ServerCnxnFactory.createFactory();
cnxnFactory.configure(config.getClientPortAddress(),
config.getMaxClientCnxns(),
false);
}
if (config.getSecureClientPortAddress() != null) {
secureCnxnFactory = ServerCnxnFactory.createFactory();
secureCnxnFactory.configure(config.getSecureClientPortAddress(),
config.getMaxClientCnxns(),
true);
}
quorumPeer = getQuorumPeer();
quorumPeer.setTxnFactory(new FileTxnSnapLog(
config.getDataLogDir(),
config.getDataDir()));
quorumPeer.enableLocalSessions(config.areLocalSessionsEnabled());
quorumPeer.enableLocalSessionsUpgrading(
config.isLocalSessionsUpgradingEnabled());
//quorumPeer.setQuorumPeers(config.getAllMembers());
quorumPeer.setElectionType(config.getElectionAlg());
quorumPeer.setMyid(config.getServerId());
quorumPeer.setTickTime(config.getTickTime());
quorumPeer.setMinSessionTimeout(config.getMinSessionTimeout());
quorumPeer.setMaxSessionTimeout(config.getMaxSessionTimeout());
quorumPeer.setInitLimit(config.getInitLimit());
quorumPeer.setSyncLimit(config.getSyncLimit());
quorumPeer.setConfigFileName(config.getConfigFilename());
quorumPeer.setZKDatabase(new ZKDatabase(quorumPeer.getTxnFactory()));
quorumPeer.setQuorumVerifier(config.getQuorumVerifier(), false);
if (config.getLastSeenQuorumVerifier()!=null) {
quorumPeer.setLastSeenQuorumVerifier(config.getLastSeenQuorumVerifier(), false);
}
quorumPeer.initConfigInZKDatabase();
quorumPeer.setCnxnFactory(cnxnFactory);
quorumPeer.setSecureCnxnFactory(secureCnxnFactory);
quorumPeer.setLearnerType(config.getPeerType());
quorumPeer.setSyncEnabled(config.getSyncEnabled());
quorumPeer.setQuorumListenOnAllIPs(config.getQuorumListenOnAllIPs());
// sets quorum sasl authentication configurations
quorumPeer.setQuorumSaslEnabled(config.quorumEnableSasl);
if(quorumPeer.isQuorumSaslAuthEnabled()){
quorumPeer.setQuorumServerSaslRequired(config.quorumServerRequireSasl);
quorumPeer.setQuorumLearnerSaslRequired(config.quorumLearnerRequireSasl);
quorumPeer.setQuorumServicePrincipal(config.quorumServicePrincipal);
quorumPeer.setQuorumServerLoginContext(config.quorumServerLoginContext);
quorumPeer.setQuorumLearnerLoginContext(config.quorumLearnerLoginContext);
}
quorumPeer.setQuorumCnxnThreadsSize(config.quorumCnxnThreadsSize);
quorumPeer.initialize();
quorumPeer.start();
quorumPeer.join();
} catch (InterruptedException e) {
// warn, but generally this is ok
LOG.warn("Quorum Peer interrupted", e);
}
}
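Initialization mainly includes:
- Create and initialize ServerCnxnFactory, as in standalone mode
- Create the data manager FileTxnSnapLog
- Create the QuorumPeer instance
QuorumPeer exists only in cluster mode and hosts the ZooKeeperServer. While running, it keeps checking the state of the current server instance and starts a leader election when needed.
- Create the in-memory database ZKDatabase
- Start quorumPeer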
public synchronized void start() {
if (!getView().containsKey(myid)) {
throw new RuntimeException("My id " + myid + " not in the peer list");
}
loadDataBase();
startServerCnxnFactory();
try {
adminServer.start();
} catch (AdminServerException e) {
LOG.warn("Problem starting AdminServer", e);
System.out.println(e);
}
startLeaderElection();
super.start();
}
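The main steps are:
1. Restore local data and obtain this server's latest lastProcessedZxid, currentEpoch and so on. See "ZooKeeper source code analysis (6): data and storage".
2. Start the ServerCnxnFactory main thread, as in standalone mode
3. Start the leader election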
synchronized public void startLeaderElection() {
try {
if (getPeerState() == ServerState.LOOKING) {
currentVote = new Vote(myid, getLastLoggedZxid(), getCurrentEpoch());
}
} catch(IOException e) {
RuntimeException re = new RuntimeException(e.getMessage());
re.setStackTrace(e.getStackTrace());
throw re;
}
this.electionAlg = createElectionAlgorithm(electionType);
}
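If the peer is in the ServerState.LOOKING state, it builds its own vote, currentVote. From version 3.4 onward electionAlg defaults to 3, which selects FastLeaderElection.
After startup, the execution flow is: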
public void run() {
// ... JMX registration omitted ...
try {
/*
* Main loop
*/
while (running) {
switch (getPeerState()) {
case LOOKING:
LOG.info("LOOKING");
if (Boolean.getBoolean("readonlymode.enabled")) {
LOG.info("Attempting to start ReadOnlyZooKeeperServer");
// Create read-only server but don't start it immediately
final ReadOnlyZooKeeperServer roZk =
new ReadOnlyZooKeeperServer(logFactory, this, this.zkDb);
// Instead of starting roZk immediately, wait some grace
// period before we decide we're partitioned.
//
// Thread is used here because otherwise it would require
// changes in each of election strategy classes which is
// unnecessary code coupling.
Thread roZkMgr = new Thread() {
public void run() {
try {
// lower-bound grace period to 2 secs
sleep(Math.max(2000, tickTime));
if (ServerState.LOOKING.equals(getPeerState())) {
roZk.startup();
}
} catch (InterruptedException e) {
LOG.info("Interrupted while attempting to start ReadOnlyZooKeeperServer, not started");
} catch (Exception e) {
LOG.error("FAILED to start ReadOnlyZooKeeperServer", e);
}
}
};
try {
roZkMgr.start();
reconfigFlagClear();
if (shuttingDownLE) {
shuttingDownLE = false;
startLeaderElection();
}
setCurrentVote(makeLEStrategy().lookForLeader());
} catch (Exception e) {
LOG.warn("Unexpected exception", e);
setPeerState(ServerState.LOOKING);
} finally {
// If the thread is in the grace period, interrupt
// to come out of waiting.
roZkMgr.interrupt();
roZk.shutdown();
}
} else {
try {
reconfigFlagClear();
if (shuttingDownLE) {
shuttingDownLE = false;
startLeaderElection();
}
setCurrentVote(makeLEStrategy().lookForLeader());
} catch (Exception e) {
LOG.warn("Unexpected exception", e);
setPeerState(ServerState.LOOKING);
}
}
break;
case OBSERVING:
try {
LOG.info("OBSERVING");
setObserver(makeObserver(logFactory));
observer.observeLeader();
} catch (Exception e) {
LOG.warn("Unexpected exception",e );
} finally {
observer.shutdown();
setObserver(null);
updateServerState();
}
break;
case FOLLOWING:
try {
LOG.info("FOLLOWING");
setFollower(makeFollower(logFactory));
follower.followLeader();
} catch (Exception e) {
LOG.warn("Unexpected exception",e);
} finally {
follower.shutdown();
// reaching here means the follower thread ran into a problem; reset to LOOKING and start a new round of election
setFollower(null);
updateServerState();
}
break;
case LEADING:
LOG.info("LEADING");
try {
setLeader(makeLeader(logFactory));
leader.lead();
// reaching here means the leader thread ran into a problem; reset to LOOKING and start a new round of election
setLeader(null);
} catch (Exception e) {
LOG.warn("Unexpected exception",e);
} finally {
if (leader != null) {
leader.shutdown("Forcing shutdown");
setLeader(null);
}
updateServerState();
}
break;
}
start_fle = Time.currentElapsedTime();
}
} finally {
// ... cleanup omitted ...
}
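As shown, in the LOOKING state the peer runs an election and records the resulting vote with setCurrentVote(makeLEStrategy().lookForLeader()). The default election method is FastLeaderElection.lookForLeader, which ultimately determines the state of the current server. A leader performs its leader-specific initialization and runs leader.lead(); a follower likewise runs its own flow, follower.followLeader(). See "ZooKeeper source code analysis (4): election flow and server startup handling" for the details.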
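The shape of that main loop, LOOKING followed by a role and then back to LOOKING when the role ends, can be modeled as a small state machine. PeerLoopSketch is a hypothetical simplification: the election outcome is passed in by the caller instead of being computed, and observers, read-only mode and reconfiguration are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the QuorumPeer main loop state transitions.
final class PeerLoopSketch {
    enum ServerState { LOOKING, FOLLOWING, LEADING }

    // Stand-in for lookForLeader(): the elected leader id is supplied by the caller.
    static ServerState roleAfterElection(long myId, long electedLeaderId) {
        return myId == electedLeaderId ? ServerState.LEADING : ServerState.FOLLOWING;
    }

    // Each entry of electedLeaderIds is the outcome of one election round.
    // When a role ends (for example because the leader was lost), the peer
    // falls back to LOOKING and the next round starts.
    static List<ServerState> run(long myId, long[] electedLeaderIds) {
        List<ServerState> visited = new ArrayList<>();
        ServerState state = ServerState.LOOKING;
        for (long leaderId : electedLeaderIds) {
            visited.add(state);                        // LOOKING
            state = roleAfterElection(myId, leaderId);
            visited.add(state);                        // FOLLOWING or LEADING
            state = ServerState.LOOKING;               // role loop exited
        }
        return visited;
    }
}
```

For server 1, two rounds that elect server 2 and then server 1 yield the sequence LOOKING, FOLLOWING, LOOKING, LEADING.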
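This completes the analysis of server startup.
Thanks for reading. I'm Monica23334 || Monica2333, and I have set myself the goal of publishing one original article a week. Follow me and see whether I manage to keep it up.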