docker-flink | Apache Flink docker image | Continuous Deployment library
kandi X-RAY | docker-flink Summary
Apache Flink docker image
Community Discussions
Trending Discussions on docker-flink
QUESTION
I am running Flink inside ECS, installed from docker-flink. I have enabled externalized checkpoints to AWS S3 by setting state.checkpoints.dir to an S3 path in flink-conf.yaml.
Now, according to the Flink documentation here, if we want to resume from a checkpoint after a failure we have to run bin/flink run -s :checkpointMetaDataPath [:runArgs], but I start my job with FLINK_HOME/bin/standalone-job.sh start-foreground. So I am not able to figure out how my Flink job would resume from an externalized checkpoint in case of failure.
Do we really need some config option for resuming from a checkpoint? Can't the JobManager, as part of the restart strategy, automatically read the last offsets from the state store? I am new to Flink.
ANSWER
Answered 2020-Apr-06 at 08:58
The referred Dockerfile alone won't start a Flink job. It will only start a Flink session cluster which is able to execute Flink jobs. The next step is to use bin/flink run to submit a job. Once a job that has enabled checkpointing via StreamExecutionEnvironment.enableCheckpointing is submitted and running, it will create checkpoints at the configured location.
If you have retained checkpoints enabled, then you can cancel the job and resume it from a checkpoint via bin/flink run -s ...
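For context, here is a minimal sketch of enabling checkpointing with retained (externalized) checkpoints in the job code on a Flink release of that era; the checkpoint interval, the trivial fromElements source, and the job name are placeholders, not part of the original discussion:

import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 60 seconds; the data goes to the location set via
        // state.checkpoints.dir (e.g. the S3 path configured in flink-conf.yaml).
        env.enableCheckpointing(60_000);

        // Keep the latest checkpoint when the job is cancelled, so it can later be
        // used with `bin/flink run -s <checkpointMetaDataPath>`.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Placeholder pipeline; replace with the real sources, operators and sinks.
        env.fromElements(1, 2, 3).print();

        env.execute("checkpointed-job");
    }
}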
In case you are running a per-job cluster where the image already contains the user code jars, then you can resume from a savepoint by starting the image with --fromSavepoint as a command line argument. Note that the savepoint path needs to be accessible from the container running the job manager.
In order to resume from a checkpoint when using standalone-job.sh, you have to pass the checkpoint path to that script as well, e.g. via the --fromSavepoint argument mentioned above.
QUESTION
I wanted to use Google Cloud Storage to write (sink) elements of a DataStream from my streaming job using StreamingFileSink.
To do that, I used the Google Cloud Storage connector for Hadoop as an implementation of org.apache.hadoop.fs.FileSystem, and HadoopFileSystem as the implementation of org.apache.flink.core.fs.FileSystem that wraps the Hadoop FileSystem class for Flink.
I included the following dependencies in my gradle file:
compile("com.google.cloud.bigdataoss:gcs-connector:1.9.4-hadoop2")
compile("org.apache.flink:flink-connector-filesystem_2.11:1.6.0")
provided("org.apache.flink:flink-shaded-hadoop2:1.6.0")
Now, from what I understand looking at the sources [1] [2] [3], Flink dynamically loads the implementations of FileSystemFactory at runtime (via java.util.ServiceLoader) and also loads the HadoopFsFactory at runtime (via reflection, if it finds Hadoop on the classpath), which it then uses to create instances of FileSystem.
The issue I faced was that the default RecoverableWriter of the Hadoop compatibility package only supports the hdfs file scheme (I use gs) and hence throws an error at runtime.
So I extended HadoopFileSystem (I called it GCSFileSystem) and overrode FileSystem#createRecoverableWriter() to return a custom implementation of RecoverableWriter, which then handles the details of recovery, etc. I also created a corresponding FileSystemFactory class (the class is annotated with @AutoService and thus should be discoverable by the ServiceLoader).
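For illustration, a minimal sketch of what such a factory might look like against the Flink 1.6-era interfaces; the class names (GCSFileSystem, GCSFileSystemFactory) follow the question, the Hadoop-side wiring is an assumption, and the custom RecoverableWriter itself is omitted:

import java.io.IOException;
import java.net.URI;

import com.google.auto.service.AutoService;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.FileSystemFactory;
import org.apache.flink.core.fs.RecoverableWriter;
import org.apache.flink.runtime.fs.hdfs.HadoopFileSystem;

// Wraps the Hadoop GCS connector but replaces the hdfs-only RecoverableWriter.
class GCSFileSystem extends HadoopFileSystem {
    GCSFileSystem(org.apache.hadoop.fs.FileSystem hadoopGcsFs) {
        super(hadoopGcsFs);
    }

    @Override
    public RecoverableWriter createRecoverableWriter() throws IOException {
        // Return the custom gs-aware RecoverableWriter here (omitted in this sketch);
        // the default HadoopRecoverableWriter rejects anything but hdfs://.
        throw new UnsupportedOperationException("plug in the custom GCS RecoverableWriter");
    }
}

// Registered for java.util.ServiceLoader via @AutoService so Flink can discover it.
@AutoService(FileSystemFactory.class)
public class GCSFileSystemFactory implements FileSystemFactory {

    @Override
    public String getScheme() {
        return "gs";
    }

    @Override
    public void configure(Configuration config) {
        // Flink configuration could be forwarded to the Hadoop configuration here.
    }

    @Override
    public FileSystem create(URI fsUri) throws IOException {
        // The GCS connector is resolved through the normal Hadoop FileSystem machinery,
        // assuming fs.gs.impl etc. are present in the Hadoop configuration.
        org.apache.hadoop.fs.FileSystem hadoopFs = org.apache.hadoop.fs.FileSystem.get(
                fsUri, new org.apache.hadoop.conf.Configuration());
        return new GCSFileSystem(hadoopFs);
    }
}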
The setup works well locally and on a local docker cluster (actually the GCS connector throws an error due to lack of authorization, but that's fine since it means that the FileSystem is loaded and running), but it fails when I deploy it to a docker cluster running on Google Compute Engine.
On GCE, the default HadoopFileSystem gets loaded and throws the exception because the scheme is gs and not hdfs, but my assumption is that it should have loaded my implementation of the factory, and thus this error shouldn't have arisen.
I am on Flink v1.6.0, running as a long-running session cluster on Docker using docker-flink.
ANSWER
Answered 2018-Sep-22 at 06:26
The answer is in the last line of the OP!
I was running on a long-living session cluster, and by the time my job.jar was executed the FileSystem initialization had already been done and the factories were already loaded, so no initialization calls were made when I added my job.
The solution? There are a few ways depending on how you deploy your job:
- Standalone: add the jar containing the FileSystem implementation to the lib/ directory.
- Cluster (manual): add the jar containing the FileSystem implementation to the lib/ directory of your zip or image or whatever.
- Cluster (docker, long-living): create a custom container image and add the jar to the lib/ directory of that image.
- Cluster (docker, per-job session): create a custom container image and add all the jars (containing the FileSystem implementation, your job, etc.) to the lib/ directory; read more about per-job sessions here.
QUESTION
I'm trying to run a simple Apache Beam pipeline on a Flink 1.5.2 docker image. When I run the main class to deploy the pipeline, I get a weird 404 error. The pipeline runs fine on Google Cloud Dataflow.
I run the main with the parameters --runner=FlinkRunner, --flinkMaster=localhost:8081, and I can see the Flink dashboard on http://localhost:8081. The deploy fails with:
Unrecognised token 'failure': was expecting ('true', 'false' or 'null')
and it appears to have been trying to access localhost:8081/blobserver/port based on the debug output. I can confirm this path returns a 404 when I do a GET request to it.
I get a similar problem when I try to deploy the job as a fat jar from the web UI: a RestException in JarPlanHandler.
I've tried versions 1.6.0 and 1.5.x - specifically I'm using https://github.com/apache/flink/blob/master/flink-contrib/docker-flink/docker-compose.yml like this:
FLINK_DOCKER_IMAGE_NAME=flink:1.5.0 docker-compose up
What am I doing wrong?
ANSWER
Answered 2018-Sep-14 at 12:36
Please downgrade your Flink to 1.5.0 and everything should work. In its REST API you will find blobserver/port.
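For reference, a minimal sketch of how a Beam pipeline is typically pointed at a Flink cluster with the parameters from this question; the class name and the trivial Create transform are placeholders, not taken from the question:

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class BeamOnFlinkExample {
    public static void main(String[] args) {
        // Parses --runner=FlinkRunner --flinkMaster=localhost:8081 from the command line.
        FlinkPipelineOptions options = PipelineOptionsFactory
                .fromArgs(args)
                .withValidation()
                .as(FlinkPipelineOptions.class);
        options.setRunner(FlinkRunner.class);

        Pipeline pipeline = Pipeline.create(options);
        // Placeholder transform; replace with the real pipeline.
        pipeline.apply(Create.of("hello", "beam", "on", "flink"));
        pipeline.run().waitUntilFinish();
    }
}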
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported