Elasticsearch 2.3.3 Java client search: source code analysis

Questions

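  1. If only one node of the ES cluster is configured, can the client automatically discover all the nodes in the cluster? How does it discover them? For example, a single node is configured as follows:

    [Figure: single-node configuration]
  2. How does the ES client achieve load balancing?
  3. After an ES node goes down, how does the ES client remove that node?
  4. The ES client has two node-detection modes; how do they differ?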

Core classes

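  • TransportClient: the ES client's public API class
  • TransportClientNodesService: maintains the list of nodes
  • ScheduledNodeSampler: periodically refreshes the set of healthy nodes
  • NettyTransport: performs the data transfer
  • NodeSampler: the node-sniffing class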

Client initialization

Initialization code

Settings.Builder builder = Settings.settingsBuilder()                                   
                                   .put("cluster.name", clusterName)
                                   .put("client.transport.sniff", true);
Settings settings = builder.build(); 
TransportClient client = TransportClient.builder().settings(settings).build(); 
for (TransportAddress transportAddress : transportAddresses) {
      client.addTransportAddress(transportAddress);
}
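  1. ES uses the builder pattern to assemble the basic configuration parameters;
  2. build() constructs the client; this step covers constructing the client, initializing the ThreadPool, constructing TransportClientNodesService, starting the scheduled task, and choosing the sniffing type;
  3. The available cluster addresses are added; in my case only one node of the cluster is configured.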

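Building the client

Calling the build API

[Figure: build code]

A brief note on dependency injection: Guice is Google's open-source dependency injection framework for Java (worth a look if you are interested; it is not the focus here). For details, see:

Google Guice Started
Guice blog post 1
Guice blog post 2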

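Initializing TransportClientNodesService

In the previous figure, modules.createInjector instantiates TransportClientNodesService, which is then injected into TransportClient. Most of the APIs in TransportClient are delegated to TransportClientNodesService:

[Figure: TransportClient code]

Guice performs the injection through annotations

[Figure: Guice annotation injection code]

In the figure above, the cluster name, the thread pool, and so on are injected. The key part is the following code, which selects the type of node sampler: SniffNodesSampler, which sniffs all the nodes in the same cluster, or SimpleNodeSampler, which only looks at the nodes listed in the configuration.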

if (this.settings.getAsBoolean("client.transport.sniff", false)) {
    this.nodesSampler = new SniffNodesSampler();
} else {
    this.nodesSampler = new SimpleNodeSampler();
}

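Characteristics
SniffNodesSampler: the client actively discovers the other nodes in the cluster and creates full connections to them (what "fully connect" means is explained later);
SimpleNodeSampler: pings every node in listedNodes; the difference is that all the connections created here are light connections.
TransportClientNodesService maintains three data structures that store nodes:

// nodes that are added to be discovered
private volatile List<DiscoveryNode> listedNodes = Collections.emptyList();   // (1)
private volatile List<DiscoveryNode> nodes = Collections.emptyList();         // (2)
private volatile List<DiscoveryNode> filteredNodes = Collections.emptyList(); // (3)
  1. the nodes explicitly added through configuration;
  2. the nodes that take part in serving requests;
  3. the nodes that have been filtered out and cannot handle requests.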
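
To make the role of the volatile lists more concrete, here is a minimal sketch of the copy-and-swap idea: writers build a new immutable list under a lock and publish it through the volatile field, so readers always see a consistent snapshot without locking. NodeRegistrySketch, addListedNode and the use of plain String addresses are hypothetical names chosen for illustration; this is not the actual TransportClientNodesService code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a node list published as an immutable snapshot.
final class NodeRegistrySketch {
    private final Object mutex = new Object();
    // Readers only ever see a complete, unmodifiable list.
    private volatile List<String> listedNodes = Collections.emptyList();

    // Adds an address unless it is already listed, then swaps in a new list.
    void addListedNode(String address) {
        synchronized (mutex) {
            if (listedNodes.contains(address)) {
                return; // already listed, nothing to do
            }
            List<String> copy = new ArrayList<>(listedNodes);
            copy.add(address);
            listedNodes = Collections.unmodifiableList(copy);
        }
    }

    // Lock-free read of the current snapshot.
    List<String> listedNodes() {
        return listedNodes;
    }
}
```

A snapshot taken before a later addListedNode call keeps its old contents, which is why a request that is already iterating over the list is not disturbed by a concurrent update.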

How the client achieves load balancing

[Figure: load-balancing code]

As the figure above shows, every execute call takes a node from the nodes list, chosen by simple round-robin. The core code is:

private final AtomicInteger randomNodeGenerator = new AtomicInteger();
......
private int getNodeNumber() {
    int index = randomNodeGenerator.incrementAndGet();  
    if (index < 0) {
        index = 0;
        randomNodeGenerator.set(0);
    }
    return index;
}
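
The following self-contained sketch shows how such a counter can be turned into a node choice by taking the index modulo the list size. RoundRobinSelector and pick are hypothetical names, and nodes are represented as plain strings; only the counter logic mirrors getNodeNumber() above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of round-robin node selection.
final class RoundRobinSelector {
    private final AtomicInteger counter = new AtomicInteger();

    // Same idea as getNodeNumber(): increment, and restart at 0 on int overflow.
    int nextIndex() {
        int index = counter.incrementAndGet();
        if (index < 0) {
            index = 0;
            counter.set(0);
        }
        return index;
    }

    // Picks the next node by taking the counter modulo the list size.
    <T> T pick(List<T> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalStateException("no nodes available");
        }
        return nodes.get(nextIndex() % nodes.size());
    }
}
```

Because the index is reset before it can go negative, the modulo result is always a valid list position.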

The data is then written through a Netty channel. The core code is:

public void sendRequest(final DiscoveryNode node, final long requestId, final String action,
        final TransportRequest request, TransportRequestOptions options)
        throws IOException, TransportException {
    Channel targetChannel = nodeChannel(node, options); // (1)
    if (compress) {
        options = TransportRequestOptions.builder(options).withCompress(true).build();
    }
    byte status = 0;
    status = TransportStatus.setRequest(status);
    ReleasableBytesStreamOutput bStream = new ReleasableBytesStreamOutput(bigArrays);
    boolean addedReleaseListener = false;
    try {
        bStream.skip(NettyHeader.HEADER_SIZE);
        StreamOutput stream = bStream;
        // only compress if asked, and, the request is not bytes, since then only
        // the header part is compressed, and the "body" can't be extracted as compressed
        if (options.compress() && (!(request instanceof BytesTransportRequest))) {
            status = TransportStatus.setCompress(status);
            stream = CompressorFactory.defaultCompressor().streamOutput(stream);
        }
        // we pick the smallest of the 2, to support both backward and forward compatibility
        // note, this is the only place we need to do this, since from here on, we use the serialized version
        // as the version to use also when the node receiving this request will send the response with
        Version version = Version.smallest(this.version, node.version());
        stream.setVersion(version);
        stream.writeString(action);
        ReleasablePagedBytesReference bytes;
        ChannelBuffer buffer;
        // it might be nice to somehow generalize this optimization, maybe a smart "paged" bytes output
        // that create paged channel buffers, but its tricky to know when to do it (where this option is
        // more explicit).
        if (request instanceof BytesTransportRequest) {
            BytesTransportRequest bRequest = (BytesTransportRequest) request;
            assert node.version().equals(bRequest.version());
            bRequest.writeThin(stream);
            stream.close();
            bytes = bStream.bytes();
            ChannelBuffer headerBuffer = bytes.toChannelBuffer();
            ChannelBuffer contentBuffer = bRequest.bytes().toChannelBuffer();
            buffer = ChannelBuffers.wrappedBuffer(NettyUtils.DEFAULT_GATHERING, headerBuffer, contentBuffer);
        } else {
            request.writeTo(stream);
            stream.close();
            bytes = bStream.bytes();
            buffer = bytes.toChannelBuffer();
        }
        NettyHeader.writeHeader(buffer, requestId, status, version);
        ChannelFuture future = targetChannel.write(buffer); // (2)
        ReleaseChannelFutureListener listener = new ReleaseChannelFutureListener(bytes);
        future.addListener(listener);
        addedReleaseListener = true;
        transportServiceAdapter.onRequestSent(node, requestId, action, request, options);
    } finally {
        if (!addedReleaseListener) {
            Releasables.close(bStream.bytes());
        }
    }
}

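The most important lines are the ones marked 1 and 2:

  • 1 obtains a connection;
  • 2 writes the data through that connection.

This raises two new questions:

  1. When is the nodes list populated?
  2. When are the connections created?

When the nodes list is populated

The core is the call to doSample; the code is: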

protected void doSample() {
    // the nodes we are going to ping include the core listed nodes that were added
    // and the last round of discovered nodes
    Set<DiscoveryNode> nodesToPing = Sets.newHashSet();
    for (DiscoveryNode node : listedNodes) {
        nodesToPing.add(node);
    }
    for (DiscoveryNode node : nodes) {
        nodesToPing.add(node);
    }
    final CountDownLatch latch = new CountDownLatch(nodesToPing.size());
    final ConcurrentMap<DiscoveryNode, ClusterStateResponse> clusterStateResponses = ConcurrentCollections.newConcurrentMap();
    for (final DiscoveryNode listedNode : nodesToPing) {
        threadPool.executor(ThreadPool.Names.MANAGEMENT).execute(new Runnable() {
            @Override
            public void run() {
                try {
                    if (!transportService.nodeConnected(listedNode)) {
                        try {
                            // if its one of the actual nodes we will talk to, not to listed nodes, fully connect
                            if (nodes.contains(listedNode)) {
                                logger.trace("connecting to cluster node [{}]", listedNode);
                                transportService.connectToNode(listedNode);
                            } else {
                                // its a listed node, light connect to it...
                                logger.trace("connecting to listed node (light) [{}]", listedNode);
                                transportService.connectToNodeLight(listedNode);
                            }
                        } catch (Exception e) {
                            logger.debug("failed to connect to node [{}], ignoring...", e, listedNode);
                            latch.countDown();
                            return;
                        }
                    }
                    // The key step: right after initialization there may be only the one configured node,
                    // so a cluster state request ("cluster:monitor/state") is sent to that address.
                    transportService.sendRequest(listedNode, ClusterStateAction.NAME,
                            headers.applyTo(Requests.clusterStateRequest().clear().nodes(true).local(true)),
                            TransportRequestOptions.builder().withType(TransportRequestOptions.Type.STATE).withTimeout(pingTimeout).build(),
                            new BaseTransportResponseHandler<ClusterStateResponse>() {
                                @Override
                                public ClusterStateResponse newInstance() {
                                    return new ClusterStateResponse();
                                }

                                @Override
                                public String executor() {
                                    return ThreadPool.Names.SAME;
                                }

                                @Override
                                public void handleResponse(ClusterStateResponse response) {
                                    /* The callback receives information about all the nodes in the cluster, similar to:
                                       {
                                         "version" : 27,
                                         "state_uuid" : "YSI9d_HiQJ-FFAtGFCVOlw",
                                         "master_node" : "TXHHx-XRQaiXAxtP1EzXMw",
                                         "blocks" : { },
                                         "nodes" : {
                                           "poxubF0LTVue84GMrZ7rwA" : {
                                             "name" : "node1", "transport_address" : "1.1.1.1:8888",
                                             "attributes" : { "data" : "false", "master" : "true" }
                                           },
                                           "9Cz8m3GkTza7vgmpf3L65Q" : {
                                             "name" : "node2", "transport_address" : "1.1.1.2:8889",
                                             "attributes" : { "master" : "false" }
                                           }
                                         },
                                         "metadata" : { "cluster_uuid" : "_na_", "templates" : { }, "indices" : { } },
                                         "routing_table" : { "indices" : { } },
                                         "routing_nodes" : {
                                           "unassigned" : [ ],
                                           "nodes" : { "lZqD-WExRu-gaSUiCXaJcg" : [ ], "hR6PbFrgQVSY0MHajNDmgA" : [ ] }
                                         }
                                       } */
                                    clusterStateResponses.put(listedNode, response);
                                    latch.countDown();
                                }

                                @Override
                                public void handleException(TransportException e) {
                                    logger.info("failed to get local cluster state for {}, disconnecting...", e, listedNode);
                                    transportService.disconnectFromNode(listedNode);
                                    latch.countDown();
                                }
                            });
                } catch (Throwable e) {
                    logger.info("failed to get local cluster state info for {}, disconnecting...", e, listedNode);
                    transportService.disconnectFromNode(listedNode);
                    latch.countDown();
                }
            }
        });
    }
    try {
        latch.await();
    } catch (InterruptedException e) {
        return;
    }
    HashSet<DiscoveryNode> newNodes = new HashSet<>();
    HashSet<DiscoveryNode> newFilteredNodes = new HashSet<>();
    for (Map.Entry<DiscoveryNode, ClusterStateResponse> entry : clusterStateResponses.entrySet()) {
        if (!ignoreClusterName && !clusterName.equals(entry.getValue().getClusterName())) {
            logger.warn("node {} not part of the cluster {}, ignoring...",
                    entry.getValue().getState().nodes().localNode(), clusterName);
            newFilteredNodes.add(entry.getKey());
            continue;
        }
        // next, collect all the data nodes and store them in nodes
        for (ObjectCursor<DiscoveryNode> cursor : entry.getValue().getState().nodes().dataNodes().values()) {
            newNodes.add(cursor.value);
        }
    }
    nodes = validateNewNodes(newNodes);
    filteredNodes = Collections.unmodifiableList(new ArrayList<>(newFilteredNodes));
}
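
Stripped of the Elasticsearch types, doSample follows a fan-out and wait pattern: one task per node, a CountDownLatch to wait for every answer, and a merge of whatever came back. The sketch below illustrates that pattern only. SniffSketch, sample and the clusterState function are hypothetical stand-ins (nodes are plain strings, and the function simulates the cluster state request), so it should be read as an approximation rather than the client's implementation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch of the fan-out / latch / merge pattern used by doSample.
final class SniffSketch {
    // clusterState simulates the state request: it returns the nodes known to the
    // node that was asked, or throws if that node cannot be reached.
    static Set<String> sample(Set<String> nodesToPing, Function<String, Set<String>> clusterState) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch latch = new CountDownLatch(nodesToPing.size());
        ConcurrentMap<String, Set<String>> responses = new ConcurrentHashMap<>();
        try {
            for (String node : nodesToPing) {
                pool.execute(() -> {
                    try {
                        responses.put(node, clusterState.apply(node));
                    } catch (RuntimeException e) {
                        // an unreachable node contributes nothing to this round
                    } finally {
                        latch.countDown(); // always count down, success or failure
                    }
                });
            }
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Collections.emptySet();
        } finally {
            pool.shutdown();
        }
        // merge the node lists reported by every node that answered
        Set<String> newNodes = new HashSet<>();
        for (Set<String> reported : responses.values()) {
            newNodes.addAll(reported);
        }
        return newNodes;
    }
}
```

This also suggests an answer to the third question at the top: a node that fails to answer is simply absent from the merged result, so it drops out of the list used for requests after the next sampling round.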

It is called at two points:

  • client.addTransportAddress(transportAddress);
  • ScheduledNodeSampler, which by default sends a request to every node once every 5s.
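
The two trigger points can be pictured as "sample once right away, then keep sampling on a timer". The sketch below expresses that with a plain ScheduledExecutorService. PeriodicSampler is a hypothetical class, and the real client schedules its sampler through its own ThreadPool, so treat this as an illustration of the timing only.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: run a sampling task immediately and then at a fixed delay.
final class PeriodicSampler implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    PeriodicSampler(Runnable sample, long intervalMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                sample.run();
            } catch (RuntimeException e) {
                // swallow the failure so that later rounds are still scheduled
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Stops further sampling rounds.
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With an interval of 5000 ms this approximates the 5s default mentioned above; catching the exception matters because scheduleWithFixedDelay stops rescheduling a task that throws.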

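When the connections are created

They are also created during the doSample call, ultimately by NettyTransport.

[Figure: connection-creation code]

Here we can see that a light connection is created if light is set; otherwise a full connection (fully connect) is created, which consists of:

  • recovery: used for data recovery; 2 channels by default;
  • bulk: used for bulk requests; 3 channels by default;
  • med/reg: typical searches and single-document indexing; 6 channels by default;
  • high: for example, sending cluster state (the state channels in the code); 1 channel by default;
  • ping: pings between nodes; 1 channel by default.

The corresponding code is: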

public void start() {    
    List<Channel> newAllChannels = new ArrayList<>();    
    newAllChannels.addAll(Arrays.asList(recovery));    
    newAllChannels.addAll(Arrays.asList(bulk));    
    newAllChannels.addAll(Arrays.asList(reg));    
    newAllChannels.addAll(Arrays.asList(state));    
    newAllChannels.addAll(Arrays.asList(ping));    
    this.allChannels = Collections.unmodifiableList(newAllChannels);
}
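
As a rough model of the difference between the two connection kinds, the sketch below groups channels by request type and rotates within each group. It assumes the default counts listed above (2, 3, 6, 1, 1) and, as far as I understand the 2.x code, that a light connection reuses a single channel for every request type. NodeChannelsSketch, its Type enum and the string channel names are all hypothetical.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of per-type channel groups for one node.
final class NodeChannelsSketch {
    enum Type { RECOVERY, BULK, REG, STATE, PING }

    private final Map<Type, String[]> channels = new EnumMap<>(Type.class);
    private final Map<Type, AtomicInteger> counters = new EnumMap<>(Type.class);

    private NodeChannelsSketch() {
    }

    // Full connection: a separate group of channels per request type.
    static NodeChannelsSketch full(String node) {
        NodeChannelsSketch nc = new NodeChannelsSketch();
        nc.open(node, Type.RECOVERY, 2);
        nc.open(node, Type.BULK, 3);
        nc.open(node, Type.REG, 6);
        nc.open(node, Type.STATE, 1);
        nc.open(node, Type.PING, 1);
        return nc;
    }

    // Light connection (assumption): one channel shared by every request type.
    static NodeChannelsSketch light(String node) {
        NodeChannelsSketch nc = new NodeChannelsSketch();
        String[] shared = { node + "#light-0" };
        for (Type type : Type.values()) {
            nc.channels.put(type, shared);
            nc.counters.put(type, new AtomicInteger());
        }
        return nc;
    }

    private void open(String node, Type type, int count) {
        String[] group = new String[count];
        for (int i = 0; i < count; i++) {
            group[i] = node + "#" + type + "-" + i;
        }
        channels.put(type, group);
        counters.put(type, new AtomicInteger());
    }

    // Rotates through the channels of the requested type.
    String channel(Type type) {
        String[] group = channels.get(type);
        return group[Math.floorMod(counters.get(type).incrementAndGet(), group.length)];
    }

    // Number of distinct channels held for this node.
    int distinctChannelCount() {
        Set<String> all = new HashSet<>();
        for (String[] group : channels.values()) {
            all.addAll(Arrays.asList(group));
        }
        return all.size();
    }
}
```

Under these assumptions a full connection holds 13 channels per node while a light one holds a single channel, which is why light connections suit nodes that are only probed for cluster state.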

END
