Flink dimension-table (lookup) queries against Redis: flink-connector-redis

Connector name: flink-connector-redis

Repository: https://github.com/jeff-zou/flink-connector-redis.git
Mirror (if GitHub is unreachable): https://gitee.com/jeff-zou/flink-connector-redis.git

Project overview

Developed on top of bahir-flink. Compared with bahir, the changes are:

1. Added the Table/SQL API
2. Added dimension-table (lookup) query support
3. Added a lookup cache (incremental and full)
4. Added whole-row storage, for lookup joins over multi-field dimension tables
5. Added rate limiting, for online debugging of Flink SQL
6. Added support for recent Flink versions (1.12, 1.13, 1.14+)
7. Unified or added expiration policies, sink parallelism, and other features

Because bahir was built against an old version of the Flink interfaces, the changes are substantial. Development took the streaming products of Tencent Cloud and Alibaba Cloud as references, combining the strengths of both and adding richer functionality.

Supported Redis commands, by operation:

| Insert | Dimension-table query |
| --- | --- |
| set | get |
| hset | hget |
| rpush, lpush | |
| incrBy, decrBy, hincrBy, zincrby | |
| sadd, zadd, pfadd (HyperLogLog) | |
| publish | |
| zrem, srem | |
| del, hdel | |

Usage:

Run mvn package -DskipTests to build, then put the resulting flink-connector-redis-1.1.0.jar into Flink's lib directory; no other setup is needed.

The project depends on Jedis 3.7.1. If your Flink environment does not include Jedis, use flink-connector-redis-1.1.0-jar-with-dependencies.jar instead.


Or reference it directly as a dependency in your project:

<dependency>
    <groupId>io.github.jeff-zou</groupId>
    <artifactId>flink-connector-redis</artifactId>
    <version>1.1.0</version>
</dependency>

Notes on use:

value.data.structure = column (default)

No primary key is needed to map the Redis key; the field order in the DDL determines it. For example:

create table sink_redis(username VARCHAR, passport VARCHAR)  with ('command'='set') 
Here username is the key and passport is the value.

create table sink_redis(name VARCHAR, subject VARCHAR, score VARCHAR)  with ('command'='hset') 
Here name is the key of the hash, subject is the field, and score is the value.

value.data.structure = row

The whole row is stored in the value, with fields separated by '\01':

create table sink_redis(username VARCHAR, passport VARCHAR)  with ('command'='set') 
Here username is the key and username\01passport is the value.

create table sink_redis(name VARCHAR, subject VARCHAR, score VARCHAR)  with ('command'='hset') 
Here name is the key of the hash, subject is the field, and name\01subject\01score is the value.
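As an illustration of this row encoding (the class below is a hypothetical sketch, not one of the connector's actual serializer classes), a whole row amounts to the DDL fields joined with the '\01' (0x01) character:

```java
import java.util.List;

// Illustrative sketch of 'value.data.structure'='row' (not the connector's
// actual serializer): all DDL fields are joined into one Redis value,
// separated by the '\01' (0x01) character.
public class RowValueSketch {

    static final String DELIMITER = "\u0001";

    // Encode a whole row into the Redis value.
    static String encode(List<String> fields) {
        return String.join(DELIMITER, fields);
    }

    // Split a Redis value back into the row's fields.
    static String[] decode(String value) {
        return value.split(DELIMITER, -1);
    }

    public static void main(String[] args) {
        // Matches the 'set' example above: key = username, value = username\01passport
        String value = encode(List.of("user1", "pass1"));
        System.out.println(value.replace(DELIMITER, "\\01")); // prints user1\01pass1
    }
}
```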

Parameters for the with clause:

| Field | Default | Type | Description |
| --- | --- | --- | --- |
| connector | (none) | String | redis |
| host | (none) | String | Redis IP |
| port | 6379 | Integer | Redis port |
| password | null | String | null if not set |
| database | 0 | Integer | db0 is used by default |
| maxTotal | 2 | Integer | maximum number of connections |
| maxIdle | 2 | Integer | maximum number of idle connections |
| minIdle | 1 | Integer | minimum number of idle connections |
| timeout | 2000 | Integer | connection timeout in ms |
| cluster-nodes | (none) | String | cluster IPs and ports; required when redis-mode is cluster, e.g. 10.11.80.147:7000,10.11.80.147:7001,10.11.80.147:8000 |
| command | (none) | String | one of the Redis commands listed above |
| redis-mode | (none) | String | mode: single or cluster |
| lookup.cache.max-rows | -1 | Integer | lookup cache size; reduces repeated lookups of the same Redis key |
| lookup.cache.ttl | -1 | Integer | lookup cache TTL in seconds; the cache is only enabled when neither max-rows nor ttl is -1 |
| lookup.max-retries | 1 | Integer | number of retries on lookup failure |
| lookup.cache.load-all | false | Boolean | enable full caching: when the command is hget, all elements of the Redis hash are loaded into the cache, which mitigates cache penetration |
| sink.max-retries | 1 | Integer | number of retries on write failure |
| sink.parallelism | (none) | Integer | sink parallelism |
| value.data.structure | column | String | column: the value comes from a single field (e.g. for set, the key is the first field in the DDL and the value the second); row: the whole row is stored in the value, separated by '\01' |
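The lookup cache controlled by lookup.cache.max-rows and lookup.cache.ttl behaves roughly like an LRU map with per-entry expiry. The class below is an illustrative sketch of those semantics, not the connector's internal implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (not the connector's internal class) of the lookup cache:
// at most maxRows entries are kept with LRU eviction, and entries older than
// ttlMillis are treated as misses, so Redis would be queried again.
public class LookupCacheSketch<K, V> {

    private static final class Timed<T> {
        final T value;
        final long writeTime;
        Timed(T value, long writeTime) { this.value = value; this.writeTime = writeTime; }
    }

    private final long ttlMillis;
    private final LinkedHashMap<K, Timed<V>> cache;

    public LookupCacheSketch(final int maxRows, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder=true turns the LinkedHashMap into an LRU structure;
        // removeEldestEntry evicts once the size exceeds maxRows.
        this.cache = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxRows;
            }
        };
    }

    public void put(K key, V value) {
        cache.put(key, new Timed<>(value, System.currentTimeMillis()));
    }

    /** Returns null on a miss or when the entry has expired. */
    public V get(K key) {
        Timed<V> t = cache.get(key);
        if (t == null) {
            return null;
        }
        if (System.currentTimeMillis() - t.writeTime > ttlMillis) {
            cache.remove(key); // expired: drop it and fall back to Redis
            return null;
        }
        return t.value;
    }

    public int size() {
        return cache.size();
    }
}
```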
Parameters for limiting sink resource usage when debugging SQL online:

| Field | Default | Type | Description |
| --- | --- | --- | --- |
| sink.limit | false | Boolean | whether to enable sink limiting |
| sink.limit.max-num | 10000 | Integer | maximum number of records each slot in a TaskManager may write |
| sink.limit.interval | 100 | String | interval in milliseconds between writes for each slot in a TaskManager |
| sink.limit.max-online | 30 * 60 * 1000L | Long | maximum time in milliseconds each slot in a TaskManager stays online |
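The sink.limit.* options can be pictured as a per-slot guard around each write. The class below is a hypothetical sketch of those semantics, not the connector's actual code:

```java
// Illustrative sketch of the sink.limit.* semantics (not the connector's
// actual classes): each slot stops writing after maxNum records, pauses
// intervalMillis between writes, and stops after maxOnlineMillis online.
public class SinkLimitSketch {

    private final int maxNum;
    private final long intervalMillis;
    private final long maxOnlineMillis;
    private final long startMillis = System.currentTimeMillis();
    private int written = 0;

    public SinkLimitSketch(int maxNum, long intervalMillis, long maxOnlineMillis) {
        this.maxNum = maxNum;
        this.intervalMillis = intervalMillis;
        this.maxOnlineMillis = maxOnlineMillis;
    }

    /** Returns true if the record was written, false if a limit was hit. */
    public boolean tryWrite(String record) {
        if (written >= maxNum) {
            return false; // sink.limit.max-num reached
        }
        if (System.currentTimeMillis() - startMillis > maxOnlineMillis) {
            return false; // sink.limit.max-online exceeded
        }
        try {
            Thread.sleep(intervalMillis); // sink.limit.interval between writes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        written++;
        // a real sink would issue the Redis command here
        return true;
    }
}
```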

Additional connection parameters when the cluster type is sentinel:

| Field | Default | Type | Description |
| --- | --- | --- | --- |
| master.name | (none) | String | master name |
| sentinels.info | (none) | String | |
| sentinels.password | (none) | String | |

Data type conversion

| Flink type | To Redis (as String) | From Redis |
| --- | --- | --- |
| CHAR | String | String |
| VARCHAR | String | String |
| STRING | String | String |
| BOOLEAN | String.valueOf(boolean val) | Boolean.valueOf(String str) |
| BINARY | Base64.getEncoder().encodeToString(byte[] val) | Base64.getDecoder().decode(String str) |
| VARBINARY | Base64.getEncoder().encodeToString(byte[] val) | Base64.getDecoder().decode(String str) |
| DECIMAL | BigDecimal.toString() | DecimalData.fromBigDecimal(new BigDecimal(String str), int precision, int scale) |
| TINYINT | String.valueOf(byte val) | Byte.valueOf(String str) |
| SMALLINT | String.valueOf(short val) | Short.valueOf(String str) |
| INTEGER | String.valueOf(int val) | Integer.valueOf(String str) |
| DATE | the days since the epoch, as int | shown as 2022-01-01 |
| TIME | the milliseconds since midnight, as int | shown as 04:04:01.023 |
| BIGINT | String.valueOf(long val) | Long.valueOf(String str) |
| FLOAT | String.valueOf(float val) | Float.valueOf(String str) |
| DOUBLE | String.valueOf(double val) | Double.valueOf(String str) |
| TIMESTAMP | the milliseconds since the epoch, as long | TimestampData.fromEpochMillis(Long.valueOf(String str)) |
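Every value is stored in Redis as a string, so each type round-trips through a string conversion. The sketch below illustrates the BINARY/VARBINARY (Base64) and DECIMAL cases with plain JDK calls; the method names are illustrative, not the connector's API:

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Base64;

// Sketch of the conversions in the table above: binary data is stored as
// Base64 text, decimals as their plain string form. Method names here are
// illustrative only.
public class TypeConversionSketch {

    // BINARY / VARBINARY -> Redis string
    static String binaryToRedis(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Redis string -> BINARY / VARBINARY
    static byte[] redisToBinary(String value) {
        return Base64.getDecoder().decode(value);
    }

    // DECIMAL -> Redis string (BigDecimal.toString)
    static String decimalToRedis(BigDecimal value) {
        return value.toString();
    }

    // Redis string -> DECIMAL
    static BigDecimal redisToDecimal(String value) {
        return new BigDecimal(value);
    }

    public static void main(String[] args) {
        byte[] raw = {1, 2, 3};
        String stored = binaryToRedis(raw); // "AQID"
        System.out.println(Arrays.equals(raw, redisToBinary(stored))); // prints true
        System.out.println(redisToDecimal(decimalToRedis(new BigDecimal("10.30")))); // prints 10.30
    }
}
```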

Examples:

  • Dimension-table lookup:
create table sink_redis(name varchar, level varchar, age varchar) with ( 'connector'='redis', 'host'='10.11.80.147','port'='7001', 'redis-mode'='single','password'='******','command'='hset');

-- First insert data into Redis, equivalent to the Redis command: hset 3 3 100 --
insert into sink_redis select * from (values ('3', '3', '100'));
                
create table dim_table (name varchar, level varchar, age varchar) with ('connector'='redis', 'host'='10.11.80.147','port'='7001', 'redis-mode'='single', 'password'='*****','command'='hget', 'maxIdle'='2', 'minIdle'='1', 'lookup.cache.max-rows'='10', 'lookup.cache.ttl'='10', 'lookup.max-retries'='3');
    
-- Generate a sequence of values from 1 to 10 as the source --
-- One of the rows will be username = 3, level = 3, which joins the row inserted above -- 
create table source_table (username varchar, level varchar, proctime as procTime()) with ('connector'='datagen',  'rows-per-second'='1',  'fields.username.kind'='sequence',  'fields.username.start'='1',  'fields.username.end'='10', 'fields.level.kind'='sequence',  'fields.level.start'='1',  'fields.level.end'='10');

create table sink_table(username varchar, level varchar,age varchar) with ('connector'='print');

insert into
    sink_table
select
    s.username,
    s.level,
    d.age
from
    source_table s
left join dim_table for system_time as of s.proctime as d on
    d.name = s.username
    and d.level = s.level;
-- The row with username = 3 joins the value in Redis; the output is: 3,3,100
  • Multi-field dimension-table lookup

Dimension tables often have multiple fields. This example shows how to write multiple fields with 'value.data.structure'='row' and retrieve them in a lookup join.

-- create the table
create table sink_redis(uid VARCHAR,score double,score2 double )
with ( 'connector' = 'redis',
            'host' = '10.11.69.176',
            'port' = '6379',
            'redis-mode' = 'single',
            'password' = '****',
            'command' = 'SET',
            'value.data.structure' = 'row');  -- 'value.data.structure'='row': the whole row is stored in the value, separated by '\01'
-- Write test data; score and score2 are the two dimension fields to be looked up
insert into sink_redis select * from (values ('1', 10.3, 10.1));

-- In Redis, the value is: "1\x0110.3\x0110.1" --
-- writes done --

-- create join table --
create table join_table with ('command'='get', 'value.data.structure'='row') like sink_redis

-- create result table --
create table result_table(uid VARCHAR, username VARCHAR, score double, score2 double) with ('connector'='print')

-- create source table --
create table source_table(uid VARCHAR, username VARCHAR, proc_time as procTime()) with ('connector'='datagen', 'fields.uid.kind'='sequence', 'fields.uid.start'='1', 'fields.uid.end'='2')

-- Look up the dimension table in the join, retrieving several of its fields --
insert
    into
    result_table
select
    s.uid,
    s.username,
    j.score, -- from the dimension table
    j.score2 -- from the dimension table
from
    source_table as s
join join_table for system_time as of s.proc_time as j on
    j.uid = s.uid
    
result:
2> +I[2, 1e0fe885a2990edd7f13dd0b81f923713182d5c559b21eff6bda3960cba8df27c69a3c0f26466efaface8976a2e16d9f68b3, null, null]
1> +I[1, 30182e00eca2bff6e00a2d5331e8857a087792918c4379155b635a3cf42a53a1b8f3be7feb00b0c63c556641423be5537476, 10.3, 10.1]
  • Using the DataStream API

    Example code path: src/test/java/org.apache.flink.streaming.connectors.redis.datastream.DataStreamTest.java

    hset example, equivalent to the Redis command: hset tom math 150

Configuration configuration = new Configuration();
configuration.setString(REDIS_MODE, REDIS_CLUSTER);
configuration.setString(REDIS_COMMAND, RedisCommand.HSET.name());

RedisSinkMapper redisMapper = (RedisSinkMapper)RedisHandlerServices
.findRedisHandler(RedisMapperHandler.class, configuration.toMap())
.createRedisMapper(configuration);

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

GenericRowData genericRowData = new GenericRowData(3);
genericRowData.setField(0, "tom");
genericRowData.setField(1, "math");
genericRowData.setField(2, "150");
DataStream<GenericRowData> dataStream = env.fromElements(genericRowData, genericRowData);

RedisCacheOptions redisCacheOptions = new RedisCacheOptions.Builder().setCacheMaxSize(100).setCacheTTL(10L).build();
FlinkJedisConfigBase conf = getLocalRedisClusterConfig();
RedisSinkFunction redisSinkFunction = new RedisSinkFunction<>(conf, redisMapper, redisCacheOptions);

dataStream.addSink(redisSinkFunction).setParallelism(1);
env.execute("RedisSinkTest");
  • redis-cluster write example

    Example code path: src/test/java/org.apache.flink.streaming.connectors.redis.table.SQLTest.java

    set example, equivalent to the Redis command: set test test11

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings environmentSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, environmentSettings);

String ddl = "create table sink_redis(username VARCHAR, passport VARCHAR) with ( 'connector'='redis', " +
              "'cluster-nodes'='10.11.80.147:7000,10.11.80.147:7001','redis-mode'='cluster','password'='******','command'='set')" ;

tEnv.executeSql(ddl);
String sql = " insert into sink_redis select * from (values ('test', 'test11'))";
TableResult tableResult = tEnv.executeSql(sql);
tableResult.getJobClient().get()
.getJobExecutionResult()
.get();

Development and test environment

IDE: IntelliJ IDEA

Code format: google-java-format + Save Actions

Code check: CheckStyle

Flink 1.12/1.13/1.14+

JDK 1.8, Jedis 3.7.1

For Flink 1.12 support, switch to the flink-1.12 branch:

<dependency>
    <groupId>io.github.jeff-zou</groupId>
    <artifactId>flink-connector-redis</artifactId>
    <version>1.1.1-1.12</version>
</dependency>