Apache Kafka is a distributed streaming platform.
Basic Kafka setup:
Step 1: Download and extract Kafka
Download Kafka from a mirror and extract it:
wget http://mirror.bit.edu.cn/apache/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
tar zxvf kafka_2.11-0.10.2.0.tgz
cd kafka_2.11-0.10.2.0
Step 2: Start the servers
Kafka uses ZooKeeper, so if you don't already have a ZooKeeper server you need to start one first. You can use the convenience script packaged with Kafka to get a quick single-node ZooKeeper instance:
bin/zookeeper-server-start.sh config/zookeeper.properties
Now you can start the Kafka server:
bin/kafka-server-start.sh config/server.properties
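If you would rather not keep a separate terminal open for each server, both start scripts also take a -daemon flag to run them in the background (optional; the rest of this guide works either way):
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties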
Step 3: Create a topic
Let's create a topic named test with a single partition and only one replica:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
We can now list the topic we just created:
[kason@kason kafka_2.11-0.10.2.0]$ bin/kafka-topics.sh --list --zookeeper localhost:2181
test
Step 4: Send some messages (producer)
Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message.
For example, send a message from CentOS:
[kason@kason kafka_2.11-0.10.2.0]$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Hello
Step 5: Receive the messages (consumer)
Kafka also has a command line consumer that will dump out messages to standard output.
[kason@kason kafka_2.11-0.10.2.0]$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Hello
That was a single-broker cluster; next we expand it to multiple brokers.
Step 6: Set up a multi-broker cluster
So far we have been running against a single broker, but that's no fun. For Kafka, a single broker is just a cluster of size one, so nothing much changes other than starting a few more broker instances. But just to get a feel for it, let's expand our cluster to three nodes (still all on our local machine). In short, we set up three broker nodes on the local machine to form a Kafka cluster.
First we need a configuration file for each additional broker; the simplest way is to copy the existing one:
cd /home/kason/kafka/kafka_2.11-0.10.2.0/config/
su
cp server.properties server-1.properties
cp server.properties server-2.properties
Now edit the newly created server-1.properties and server-2.properties files and set the following properties:
server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dir=/tmp/kafka-logs-1
server-2.properties:
broker.id=2
listeners=PLAINTEXT://:9094
log.dir=/tmp/kafka-logs-2
broker.id is the unique and permanent name of each node in the cluster. Because all three brokers run on the same machine, we also give each node its own listener port and log directory so they don't clash.
With everything in place, start the two additional brokers:
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
Now create a new topic with one partition and a replication factor of three:
[kason@kason kafka_2.11-0.10.2.0]$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
Created topic "my-replicated-topic".
But how do we know which broker is doing what? Run the describe topics command:
[kason@kason kafka_2.11-0.10.2.0]$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
[kason@kason kafka_2.11-0.10.2.0]$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:
Topic: test Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Here is what this output means. The first line gives a summary of all the partitions; each additional line gives information about one partition. Since this topic has only one partition, there is only one such line.
leader: the node responsible for all reads and writes for the given partition; each node will be the leader for a randomly selected portion of the partitions.
replicas: the list of nodes that replicate this partition's log, whether or not they are the leader or even currently alive.
isr: the set of "in-sync" replicas, i.e. the subset of replicas that are currently alive and caught up to the leader.
Send messages to this broker cluster with the producer:
[kason@kason kafka_2.11-0.10.2.0]$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
my test message 1
hello world
hello kafka
Receive the messages with the consumer:
[kason@kason kafka_2.11-0.10.2.0]$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
my test message 1
hello world
hello kafka
Now that we have a cluster, let's test its fault tolerance. From the describe output above we know the leader is broker 1, so we can look up its process id and kill it by hand, as sketched below.
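A minimal sketch of that test (the exact pid is whatever ps reports for broker 1 on your machine; after the kill, describe should show a new leader and broker 1 dropped from the Isr list):
ps aux | grep server-1.properties
kill -9 <pid>
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic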
Step 7: Use Kafka Connect to import/export data
Writing data from the console and writing it back to the console is a convenient place to start, but you'll probably want to use data from other sources or export data from Kafka to other systems. For many systems, instead of writing custom integration code you can use Kafka Connect to import or export data
Kafka Connect is a tool included with Kafka that imports and exports data to Kafka. It is an extensible tool that runs connectors, which implement the custom logic for interacting with an external system. In this quickstart we'll see how to run Kafka Connect with simple connectors that import data from a file to a Kafka topic and export data from a Kafka topic to a file.
First, create a seed file by hand and write a couple of lines into it; test.txt should live in the Kafka directory:
echo -e "foo\nbar" > test.txt
Next we start two connectors running in standalone mode (that is, in a single local process), passing three configuration files as parameters. The first is the configuration for the Kafka Connect process itself, containing common settings such as the Kafka brokers to connect to and the serialization format for data. Each of the remaining files specifies a connector to create; these files include a unique connector name among other settings:
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
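Assuming the shipped connector configs are left unchanged (by default the file source reads test.txt into the topic connect-test, and the file sink writes that topic back out to test.sink.txt; check your .properties files if you edited them), you can verify both ends once the connectors are running:
cat test.sink.txt
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning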
Kafka startup screenshot (not reproduced here).
Spark Streaming
Spark Streaming Code
package com.scala.action.streaming

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Created by kason_zhang on 4/11/2017.
  */
object MyKafkaSparkStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MyKafkaStreamingDemo").setMaster("local[3]")
    // 5-second micro-batches
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receiver-based stream: connect to ZooKeeper at 10.64.24.78:2181 with
    // consumer group "StreamKafkaGroupId", reading topic "spark" with one receiver thread
    val topicLines = KafkaUtils.createStream(ssc, "10.64.24.78:2181",
      "StreamKafkaGroupId", Map("spark" -> 1))

    // Each element is a (key, message) pair; keep the message, split it into words, and print
    topicLines.map(_._2).flatMap(str => str.split(" ")).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
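For reference, KafkaUtils.createStream is the receiver-based API from the spark-streaming-kafka-0-8 module. Assuming Spark 2.1.0 on Scala 2.11 (the versions here are an assumption; match them to your own build), the sbt dependencies would look roughly like:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming"           % "2.1.0",
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.1.0"
)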
A few things to note here.
Because I did not set listeners=PLAINTEXT://your.host.name:9092 in the CentOS Kafka server.properties, the broker falls back to the default listener, which advertises the CentOS host name. My Spark Streaming program was developed on Windows, which cannot resolve that host name, so I had to add the entry 10.64.24.78 kason to the hosts file on the C drive so the name can be resolved.
You also need to open ZooKeeper's port 2181 and your Kafka port on the CentOS firewall, for example as sketched below.
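With firewalld on CentOS 7 (an assumption; adjust accordingly if your machine uses iptables or another firewall, and substitute your actual Kafka port):
firewall-cmd --zone=public --add-port=2181/tcp --permanent
firewall-cmd --zone=public --add-port=9092/tcp --permanent
firewall-cmd --reload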
The result in IDEA is shown in the screenshot (not reproduced here).