Plugin name: flink-connector-redis
Plugin repository: https://github.com/jeff-zou/flink-connector-redis.git
Mirror (for users who cannot access GitHub): https://gitee.com/jeff-zou/flink-connector-redis.git
Project introduction

A secondary development based on bahir-flink. Compared with bahir, the changes are:

1. Added Table/SQL API support
2. Added dimension (lookup) table queries
3. Added a lookup cache (incremental and full)
4. Added whole-row storage, for dimension-table joins over multiple fields
5. Added rate limiting, for debugging Flink SQL online
6. Added support for recent Flink versions (1.12, 1.13, 1.14+)
7. Unified or added expiry policies, sink parallelism, and other features

Because bahir was built against an older Flink API, the changes are substantial. Development drew on the stream-computing products of both Tencent Cloud and Alibaba Cloud, taking the strengths of each and adding richer functionality.
The supported features map to the following Redis commands:

Insert | Dimension table lookup |
---|---|
set | get |
hset | hget |
rpush, lpush | |
incrBy, decrBy, hincrBy, zincrby | |
sadd, zadd, pfadd(hyperloglog) | |
publish | |
zrem, srem | |
del, hdel | |
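As a quick sketch of how a counter command might be declared (an assumption based on the set/hset column layout described below; host and port are placeholders, `click_counter` and its fields are hypothetical names):

```sql
-- each incoming row would increment the Redis value at key `uid` by `click_cnt`
create table click_counter(uid VARCHAR, click_cnt BIGINT) with (
  'connector' = 'redis',
  'host' = '127.0.0.1',
  'port' = '6379',
  'redis-mode' = 'single',
  'command' = 'incrBy'
);
```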
Usage:

Run `mvn package -DskipTests` to build, then drop the generated flink-connector-redis-1.1.0.jar into the Flink lib directory; no other setup is needed.
The project depends on jedis 3.7.1; if your Flink environment does not already provide jedis, use flink-connector-redis-1.1.0-jar-with-dependencies.jar instead.
To reference it directly in a development project:

```xml
<dependency>
    <groupId>io.github.jeff-zou</groupId>
    <artifactId>flink-connector-redis</artifactId>
    <version>1.1.0</version>
</dependency>
```
Usage notes:

value.data.structure = column (default)

No primary key is needed to map the Redis key; the key is determined by the field order in the DDL, e.g.:

create table sink_redis(username VARCHAR, passport VARCHAR) with ('command'='set')
Here username is the key and passport the value.

create table sink_redis(name VARCHAR, subject VARCHAR, score VARCHAR) with ('command'='hset')
Here name is the key of the hash, subject the field, and score the value.

value.data.structure = row

The whole row is stored in the value, separated by '\01':

create table sink_redis(username VARCHAR, passport VARCHAR) with ('command'='set')
Here username is the key and username\01passport the value.

create table sink_redis(name VARCHAR, subject VARCHAR, score VARCHAR) with ('command'='hset')
Here name is the key of the hash, subject the field, and name\01subject\01score the value.
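These DDL snippets are abbreviated; a complete definition also carries the connector options listed below. A minimal sketch (host, port and the table name are placeholders):

```sql
create table sink_redis(username VARCHAR, passport VARCHAR) with (
  'connector' = 'redis',
  'host' = '127.0.0.1',
  'port' = '6379',
  'redis-mode' = 'single',
  'command' = 'set',
  'value.data.structure' = 'row'  -- the value becomes username\01passport
);
```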
WITH parameters:

Field | Default | Type | Description |
---|---|---|---|
connector | (none) | String | `redis` |
host | (none) | String | Redis IP |
port | 6379 | Integer | Redis port |
password | null | String | null when not set |
database | 0 | Integer | db0 is used by default |
maxTotal | 2 | Integer | Maximum number of connections |
maxIdle | 2 | Integer | Maximum number of idle connections |
minIdle | 1 | Integer | Minimum number of idle connections |
timeout | 2000 | Integer | Connection timeout in ms |
cluster-nodes | (none) | String | Cluster IPs and ports; required when redis-mode is cluster, e.g. 10.11.80.147:7000,10.11.80.147:7001,10.11.80.147:8000 |
command | (none) | String | One of the Redis commands listed above |
redis-mode | (none) | String | Mode: single or cluster |
lookup.cache.max-rows | -1 | Integer | Lookup cache size; reduces repeated lookups of the same key |
lookup.cache.ttl | -1 | Integer | Lookup cache TTL in seconds; the cache is only enabled when neither max-rows nor ttl is -1 |
lookup.max-retries | 1 | Integer | Retries on lookup failure |
lookup.cache.load-all | false | Boolean | Load the cache in full: when the command is hget, all entries of the Redis hash are fetched and cached, which mitigates cache penetration |
sink.max-retries | 1 | Integer | Retries on write failure |
sink.parallelism | (none) | Integer | Sink parallelism |
value.data.structure | column | String | column: the value comes from a single field (e.g. for set, the key is the first DDL field and the value the second); row: the whole row is stored in the value, separated by '\01' |
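A minimal lookup-table sketch with the cache turned on (host and port are placeholders; note that neither lookup.cache.max-rows nor lookup.cache.ttl may stay at -1, or the cache remains disabled):

```sql
create table dim_table(name VARCHAR, subject VARCHAR, score VARCHAR) with (
  'connector' = 'redis',
  'host' = '127.0.0.1',
  'port' = '6379',
  'redis-mode' = 'single',
  'command' = 'hget',
  'lookup.cache.max-rows' = '1000',
  'lookup.cache.ttl' = '60',         -- seconds
  'lookup.cache.load-all' = 'true'   -- cache the whole Redis hash, against cache penetration
);
```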
Parameters for limiting sink resource usage when debugging SQL online:

Field | Default | Type | Description |
---|---|---|---|
sink.limit | false | Boolean | Whether to enable sink rate limiting |
sink.limit.max-num | 10000 | Integer | Maximum number of records each slot in a taskmanager may write |
sink.limit.interval | 100 | String | Write interval per slot in a taskmanager, in milliseconds |
sink.limit.max-online | 30 * 60 * 1000L | Long | Maximum online time per slot in a taskmanager, in milliseconds |
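A sketch of a sink throttled for online debugging (all numbers are illustrative; host and port are placeholders):

```sql
create table debug_sink(username VARCHAR, passport VARCHAR) with (
  'connector' = 'redis',
  'host' = '127.0.0.1',
  'port' = '6379',
  'redis-mode' = 'single',
  'command' = 'set',
  'sink.limit' = 'true',
  'sink.limit.max-num' = '1000',     -- max records written per slot
  'sink.limit.interval' = '100',     -- ms between writes per slot
  'sink.limit.max-online' = '600000' -- max online time per slot, ms
);
```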
Additional connection parameters when the cluster type is sentinel:

Field | Default | Type | Description |
---|---|---|---|
master.name | (none) | String | Name of the master |
sentinels.info | (none) | String | |
sentinels.password | (none) | String | |
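A connection sketch for sentinel mode. The README does not spell out the redis-mode value or the sentinels.info format, so both are assumptions here; all addresses and names are placeholders:

```sql
create table sink_redis(username VARCHAR, passport VARCHAR) with (
  'connector' = 'redis',
  'redis-mode' = 'sentinel',                                   -- assumed mode value
  'master.name' = 'mymaster',
  'sentinels.info' = '10.11.80.147:26379,10.11.80.148:26379',  -- assumed host:port list format
  'sentinels.password' = '******',
  'command' = 'set'
);
```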
Data type conversion

flink type | redis row converter |
---|---|
CHAR | String |
VARCHAR | String |
String | String |
BOOLEAN | to String: String.valueOf(boolean val); to boolean: Boolean.valueOf(String str) |
BINARY | to String: Base64.getEncoder().encodeToString(byte[]); to byte[]: Base64.getDecoder().decode(String str) |
VARBINARY | to String: Base64.getEncoder().encodeToString(byte[]); to byte[]: Base64.getDecoder().decode(String str) |
DECIMAL | to String: BigDecimal.toString(); to DecimalData: DecimalData.fromBigDecimal(new BigDecimal(String str), int precision, int scale) |
TINYINT | to String: String.valueOf(byte val); to byte: Byte.valueOf(String str) |
SMALLINT | to String: String.valueOf(short val); to short: Short.valueOf(String str) |
INTEGER | to String: String.valueOf(int val); to int: Integer.valueOf(String str) |
DATE | stored as the number of days since the epoch (int); a date displays as e.g. 2022-01-01 |
TIME | stored as the number of milliseconds since midnight (int); a time displays as e.g. 04:04:01.023 |
BIGINT | to String: String.valueOf(long val); to long: Long.valueOf(String str) |
FLOAT | to String: String.valueOf(float val); to float: Float.valueOf(String str) |
DOUBLE | to String: String.valueOf(double val); to double: Double.valueOf(String str) |
TIMESTAMP | stored as the epoch milliseconds (long); restored via TimestampData.fromEpochMillis(Long.valueOf(String str)) |
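To illustrate the conversions above (a sketch; host and port are placeholders): with a set command the second DDL field becomes the Redis value, so a TIMESTAMP column lands in Redis as its epoch-millisecond string:

```sql
create table sink_redis(uid VARCHAR, login_time TIMESTAMP(3)) with (
  'connector' = 'redis',
  'host' = '127.0.0.1',
  'port' = '6379',
  'redis-mode' = 'single',
  'command' = 'set'
);
-- per the table above, login_time is stored as e.g. "1640995200000"
-- and read back via TimestampData.fromEpochMillis(Long.valueOf(str))
```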
Usage examples:

- Dimension table lookup:

```sql
create table sink_redis(name varchar, level varchar, age varchar) with ('connector'='redis', 'host'='10.11.80.147', 'port'='7001', 'redis-mode'='single', 'password'='******', 'command'='hset');

-- first insert data into redis, equivalent to the redis command: hset 3 3 100 --
insert into sink_redis select * from (values ('3', '3', '100'));

create table dim_table (name varchar, level varchar, age varchar) with ('connector'='redis', 'host'='10.11.80.147', 'port'='7001', 'redis-mode'='single', 'password'='*****', 'command'='hget', 'maxIdle'='2', 'minIdle'='1', 'lookup.cache.max-rows'='10', 'lookup.cache.ttl'='10', 'lookup.max-retries'='3');

-- generate data up to 10 as the source --
-- one of the rows will be username = 3, level = 3, which joins the data inserted above --
create table source_table (username varchar, level varchar, proctime as procTime()) with ('connector'='datagen', 'rows-per-second'='1', 'fields.username.kind'='sequence', 'fields.username.start'='1', 'fields.username.end'='10', 'fields.level.kind'='sequence', 'fields.level.start'='1', 'fields.level.end'='10');

create table sink_table(username varchar, level varchar, age varchar) with ('connector'='print');

insert into
  sink_table
select
  s.username,
  s.level,
  d.age
from
  source_table s
  left join dim_table for system_time as of s.proctime as d on
  d.name = s.username
  and d.level = s.level;

-- the row with username = 3 joins the value stored in redis; the output is: 3,3,100 --
```
- Multi-field dimension table join

Dimension tables often have multiple fields. This example shows how to write multiple fields with 'value.data.structure'='row' and join against them.

```sql
-- create the sink table
create table sink_redis(uid VARCHAR, score double, score2 double)
with (
  'connector' = 'redis',
  'host' = '10.11.69.176',
  'port' = '6379',
  'redis-mode' = 'single',
  'password' = '****',
  'command' = 'SET',
  'value.data.structure' = 'row'); -- 'value.data.structure'='row': the whole row is stored in the value, separated by '\01'

-- write test data; score and score2 are the two dimensions to be looked up later
insert into sink_redis select * from (values ('1', 10.3, 10.1));
-- in redis the value reads: "1\x0110.3\x0110.1" --
-- writing finished --

-- create join table --
create table join_table with ('command'='get', 'value.data.structure'='row') like sink_redis;

-- create result table --
create table result_table(uid VARCHAR, username VARCHAR, score double, score2 double) with ('connector'='print');

-- create source table --
create table source_table(uid VARCHAR, username VARCHAR, proc_time as procTime()) with ('connector'='datagen', 'fields.uid.kind'='sequence', 'fields.uid.start'='1', 'fields.uid.end'='2');

-- join the dimension table to fetch multiple fields from it --
insert into
  result_table
select
  s.uid,
  s.username,
  j.score,  -- from the dimension table
  j.score2  -- from the dimension table
from
  source_table as s
  join join_table for system_time as of s.proc_time as j on
  j.uid = s.uid;
```

result:

2> +I[2, 1e0fe885a2990edd7f13dd0b81f923713182d5c559b21eff6bda3960cba8df27c69a3c0f26466efaface8976a2e16d9f68b3, null, null]
1> +I[1, 30182e00eca2bff6e00a2d5331e8857a087792918c4379155b635a3cf42a53a1b8f3be7feb00b0c63c556641423be5537476, 10.3, 10.1]
- DataStream usage

Example code path: src/test/java/org.apache.flink.streaming.connectors.redis.datastream.DataStreamTest.java

hset example, equivalent to the redis command: hset tom math 152

```java
Configuration configuration = new Configuration();
configuration.setString(REDIS_MODE, REDIS_CLUSTER);
configuration.setString(REDIS_COMMAND, RedisCommand.HSET.name());

RedisSinkMapper redisMapper = (RedisSinkMapper) RedisHandlerServices
        .findRedisHandler(RedisMapperHandler.class, configuration.toMap())
        .createRedisMapper(configuration);

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// RowData string fields use Flink's internal StringData, not java.lang.String
GenericRowData genericRowData = new GenericRowData(3);
genericRowData.setField(0, StringData.fromString("tom"));
genericRowData.setField(1, StringData.fromString("math"));
genericRowData.setField(2, StringData.fromString("152"));
DataStream<GenericRowData> dataStream = env.fromElements(genericRowData, genericRowData);

RedisCacheOptions redisCacheOptions = new RedisCacheOptions.Builder().setCacheMaxSize(100).setCacheTTL(10L).build();
FlinkJedisConfigBase conf = getLocalRedisClusterConfig();
RedisSinkFunction redisSinkFunction = new RedisSinkFunction<>(conf, redisMapper, redisCacheOptions);

dataStream.addSink(redisSinkFunction).setParallelism(1);
env.execute("RedisSinkTest");
```
- redis-cluster write example

Example code path: src/test/java/org.apache.flink.streaming.connectors.redis.table.SQLTest.java

set example, equivalent to the redis command: set test test11

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings environmentSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, environmentSettings);

String ddl = "create table sink_redis(username VARCHAR, passport VARCHAR) with ( 'connector'='redis', " +
        "'cluster-nodes'='10.11.80.147:7000,10.11.80.147:7001','redis-mode'='cluster','password'='******','command'='set')";
tEnv.executeSql(ddl);

String sql = "insert into sink_redis select * from (values ('test', 'test11'))";
TableResult tableResult = tEnv.executeSql(sql);
tableResult.getJobClient().get()
        .getJobExecutionResult()
        .get();
```
Development and test environment

ide: IntelliJ IDEA
code format: google-java-format + Save Actions
code check: CheckStyle
flink 1.12/1.13/1.14+
jdk 1.8, jedis 3.7.1

For Flink 1.12 support, switch to the flink-1.12 branch and use:

```xml
<dependency>
    <groupId>io.github.jeff-zou</groupId>
    <artifactId>flink-connector-redis</artifactId>
    <version>1.1.1-1.12</version>
</dependency>
```