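Setting up Flink
The setup is fairly simple: it needs only three configuration files, each applied with kubectl create -f.
The Flink UI is then reachable at the cluster IP of jobmanager-service on port 8081.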
jobmanager-service.yaml
```yaml
apiVersion: v1
```
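The listing above shows only the first line of the Service manifest. As a rough sketch of what a matching Service could look like, the fragment below exposes the same four ports that jobmanager-deployment.yaml declares and selects the jobmanager pods by their `app: flink` and `component: jobmanager` labels. The name `jobmanager-service` is taken from the prose; everything else is an assumption, not the author's file.

```yaml
# Hypothetical reconstruction, not the original file.
apiVersion: v1
kind: Service
metadata:
  name: jobmanager-service   # assumed name, taken from the prose
spec:
  selector:                  # must match the jobmanager pod labels
    app: flink
    component: jobmanager
  ports:
  - name: rpc
    port: 6123
  - name: blob
    port: 6124
  - name: query
    port: 6125
  - name: ui
    port: 8081
```

With no `type` set, Kubernetes creates a ClusterIP Service, which is what provides the cluster IP used for the UI and for JOB_MANAGER_RPC_ADDRESS.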
jobmanager-deployment.yaml
```yaml
# extensions/v1beta1 Deployments were removed in Kubernetes 1.16;
# newer clusters need apps/v1 plus a spec.selector.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: flink:latest
        args:
        - jobmanager
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob
        - containerPort: 6125
          name: query
        - containerPort: 8081
          name: ui
        env:
        - name: JOB_MANAGER_RPC_ADDRESS
          value: # set to the cluster IP of jobmanager-service (see kubectl get services)
```
taskmanager-deployment.yaml
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
      - name: taskmanager
        image: flink:latest
        args:
        - taskmanager
        ports:
        - containerPort: 6121
          name: data
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query
        env:
        - name: JOB_MANAGER_RPC_ADDRESS
          value: # set to the cluster IP of jobmanager-service (see kubectl get services)
```
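Looking up the cluster IP by hand works, but the value changes if the Service is ever recreated. Assuming the cluster runs a DNS add-on (kube-dns or CoreDNS) and the Service is named `jobmanager-service` in the same namespace (both are assumptions, not stated in the original), the env entry in both Deployments could reference the Service by name instead:

```yaml
env:
- name: JOB_MANAGER_RPC_ADDRESS
  # Assumed Service name; cluster DNS resolves it to the Service's cluster IP.
  value: jobmanager-service
```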
- Close ResourceManager connection 85654ea551c6c887217cb9ed765e3977.
- The TaskManager then tries to reconnect to the ResourceManager.
- Could not resolve ResourceManager address akka.tcp://flink@10.110.22.162:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@10.110.22.162:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify".
- After the retries keep failing: org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
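The 300000 ms in the last message corresponds to Flink's TaskManager registration timeout. As a sketch, assuming a Flink version that supports the `taskmanager.registration.timeout` option (available from around Flink 1.5), the limit could be adjusted in `flink-conf.yaml`. The value below is only illustrative, and a longer timeout does not help if the ResourceManager address itself is unreachable.

```yaml
# flink-conf.yaml (illustrative value, not a recommendation)
# How long a TaskManager keeps trying to register before it terminates itself.
taskmanager.registration.timeout: 10 min
```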
After the timeout, a TaskManager registered at the SlotManager (resourcemanager.slotmanager) again, and the job went through RESTARTING:
```
2018-08-07 05:43:13,867 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Closing TaskExecutor connection 2689b4542574aeffb6c1568a130908c2 because: The heartbeat of TaskManager with id 2689b4542574aeffb6c1568a130908c2 timed out.
2018-08-07 05:43:13,867 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Unregister TaskManager 59c8e7cbbbbaaff3569e663f6e03da05 from the SlotManager.
2018-08-07 05:43:17,431 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Registering TaskManager 2689b4542574aeffb6c1568a130908c2 under ca97f0a26b6d08cdb8522855c124d21c at the SlotManager.
2018-08-07 05:43:18,538 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Monitor run (fe766311552623989de02fda30f32127) switched from state RESTARTING to CREATED.
2018-08-07 05:43:18,538 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Monitor run (fe766311552623989de02fda30f32127) switched from state CREATED to RUNNING.
```
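The first log line reports a heartbeat timeout between the ResourceManager and a TaskManager. Flink exposes the heartbeat timing through `heartbeat.interval` and `heartbeat.timeout`, both in milliseconds. A sketch of how they might be set in `flink-conf.yaml`, with illustrative values that are assumptions rather than settings taken from this cluster:

```yaml
# flink-conf.yaml (illustrative values)
heartbeat.interval: 10000   # ms between heartbeat requests
heartbeat.timeout: 60000    # ms without a heartbeat before the peer is considered lost
```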
The job could no longer recover by restarting, because the restart strategy prevented any further restart:
```
2018-08-08 12:09:28,239 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Monitor run (2df26ee10d34f33c88b852c9e2c3fff2) because the restart strate
```
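Whether a failed job is restarted at all depends on the configured restart strategy. A minimal sketch of a fixed-delay strategy in `flink-conf.yaml` follows; the attempt count and delay are illustrative assumptions, not the values used in this deployment.

```yaml
# flink-conf.yaml (illustrative values)
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 10   # give up after this many restarts
restart-strategy.fixed-delay.delay: 30 s    # wait between restart attempts
```

Once the configured attempts are used up, Flink stops restarting the job and it fails permanently.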