flink | Apache Flink is an open source project | SQL Database library

by apache | Java | Version: release-1.17.1 | License: Apache-2.0

kandi X-RAY | flink Summary


flink is a Java library typically used in Telecommunications, Media, Entertainment, Database, SQL Database, Spark, and Hadoop applications. flink has a build file available, a Permissive License, and medium support. However, flink has 2490 bugs and 21 vulnerabilities. You can download it from GitHub or Maven.

Apache Flink is an open source project of The Apache Software Foundation (ASF). The Apache Flink project originated from the Stratosphere research project.

            kandi-support Support

              flink has a medium active ecosystem.
              It has 21401 star(s) with 12120 fork(s). There are 945 watchers for this library.
              It had no major release in the last 6 months.
              flink has no issues reported. There are 1023 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of flink is release-1.17.1

            kandi-Quality Quality

              flink has 2490 bugs (137 blocker, 120 critical, 1174 major, 1059 minor) and 38707 code smells.

            kandi-Security Security

              flink has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              flink code analysis shows 21 unresolved vulnerabilities (15 blocker, 5 critical, 0 major, 1 minor).
              There are 1446 security hotspots that need review.

            kandi-License License

              flink is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

              flink releases are not available. You will need to build from source code and install.
              Deployable package is available in Maven.
              Build file is available. You can build the component from source.
              Installation instructions are not available. Examples and code snippets are available.
              flink saves you 4,780,464 person hours of effort in developing the same functionality from scratch.
              It has 1,703,645 lines of code, 119,488 functions and 14,240 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed flink and discovered the below as its top functions. This is intended to give you an instant insight into the functionality flink implements, and to help you decide if it suits your requirements.
            • Initializes handlers.
            • Starts the application.
            • Generates a select logical plan.
            • Registers an alias from the given node.
            • Performs phase 1.
            • Performs the actual operation.
            • Returns the converter for the given data type.
            • Runs the loop.
            • Loads a resource.
            • Builds an execution graph.

            flink Key Features

            No Key Features are available at this moment for flink.

            flink Examples and Code Snippets

            Building Apache Flink from Source
            maven | Lines of Code: 3 | License: No License
            git clone https://github.com/apache/flink.git
            cd flink
            ./mvnw clean package -DskipTests # this will take up to 10 minutes
            
              
            Capitalizes the words in the Flink topic .
            java | Lines of Code: 20 | License: Permissive (MIT License)
            public static void capitalize() throws Exception {
                    String inputTopic = "flink_input";
                    String outputTopic = "flink_output";
                    String consumerGroup = "baeldung";
                    String address = "localhost:9092";
            
                    StreamExecutionE  
            Consume SSE from Flink endpoint .
            java | Lines of Code: 14 | License: Permissive (MIT License)
            @Async
                public void consumeSSEFromFluxEndpoint() {
                    ParameterizedTypeReference<ServerSentEvent<String>> type = new ParameterizedTypeReference<ServerSentEvent<String>>() {
                    };

                    Flux<ServerSentEvent<String>> eventStream = client.get()
                        .uri("/stream-flux")
                        .accept(Me  

            Community Discussions

            QUESTION

            Flink capped MapState
            Asked 2022-Apr-08 at 09:03
            Background

            We want to keep in a Flink operator's state the last n unique ids. When the (n+1)th unique id arrives, we want to keep it and drop the oldest unique id in the state. This is in order to avoid an ever-growing state.

            We already have a TTL (expiration time) mechanism in place. The size limit is another restriction we're looking to put in place.

            Not every element holds a unique id.

            Question

            Does Flink provide an API that limits the number of elements in the state?

            Things tried
            1. Using MapState with a StateTtlConfig-generated TTL/expiration mechanism.
            2. Using a window, which limited the number of processed elements but not the number of elements in state.
            ...

            ANSWER

            Answered 2022-Apr-07 at 14:30

            I don't think Flink has a state type that would support this out of the box. The closest thing I can think of is to use ListState. With ListState you can append elements as you would to a regular list.

            For your use case, you would read the state, call .get() to obtain an iterable you can iterate over, remove the item you'd like to drop, and then push the state back.

            From a performance perspective, the iteration may not be ideal but on the other hand, it would not be significant in comparison to accessing state from disk (in case you're using RocksDB as a state backend) which incurs a heavy cost due to serialization and deserialization.
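
            To make the ListState approach concrete, here is a minimal sketch, not taken from the answer above, of a KeyedProcessFunction that keeps only the most recent n ids per key; the class name, types and the cap value are hypothetical:

            import org.apache.flink.api.common.state.ListState;
            import org.apache.flink.api.common.state.ListStateDescriptor;
            import org.apache.flink.configuration.Configuration;
            import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
            import org.apache.flink.util.Collector;

            import java.util.ArrayList;
            import java.util.List;

            // Keeps only the most recent MAX_IDS unique ids per key, dropping the oldest.
            public class CappedIdFunction extends KeyedProcessFunction<String, String, String> {

                private static final int MAX_IDS = 100; // hypothetical cap "n"
                private transient ListState<String> recentIds;

                @Override
                public void open(Configuration parameters) {
                    recentIds = getRuntimeContext().getListState(
                            new ListStateDescriptor<>("recent-ids", String.class));
                }

                @Override
                public void processElement(String id, Context ctx, Collector<String> out) throws Exception {
                    // Materialize the current state, append the new id, trim to the cap.
                    List<String> ids = new ArrayList<>();
                    for (String existing : recentIds.get()) {
                        ids.add(existing);
                    }
                    ids.add(id);
                    while (ids.size() > MAX_IDS) {
                        ids.remove(0); // drop the oldest id
                    }
                    recentIds.update(ids); // push the trimmed list back into state
                    out.collect(id);
                }
            }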

            Source https://stackoverflow.com/questions/71782690

            QUESTION

            Migrating from FlinkKafkaConsumer to KafkaSource, no windows executed
            Asked 2021-Nov-30 at 05:29

            I am a Kafka and Flink beginner. I have implemented FlinkKafkaConsumer to consume messages from a Kafka topic. The only custom setting other than "group" and "topic" is (ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest") to enable re-reading the same messages several times. It works out of the box for consuming and logic. Now that FlinkKafkaConsumer is deprecated, I wanted to change to its successor, KafkaSource.

            Initializing KafkaSource with the same parameters as I do FlinkKafkaConsumer produces a read of the topic as expected; I can verify this by printing the stream. Deserialization and timestamps seem to work fine. However, no windows are executed, and as such no results are produced.

            I assume some default setting(s) in KafkaSource are different from those of FlinkKafkaConsumer, but I have no idea what they might be.

            KafkaSource - Not working

            ...

            ANSWER

            Answered 2021-Nov-24 at 18:39

            Update: The answer is that the KafkaSource behaves differently than FlinkKafkaConsumer in the case where the number of Kafka partitions is smaller than the parallelism of Flink's kafka source operator. See https://stackoverflow.com/a/70101290/2000823 for details.

            Original answer:

            The problem is almost certainly something related to the timestamps and watermarks.

            To verify that timestamps and watermarks are the problem, you could do a quick experiment where you replace the 3-hour-long event time sliding windows with short processing time tumbling windows.

            In general it is preferred (but not required) to have the KafkaSource do the watermarking. Using forMonotonousTimestamps in a watermark generator applied after the source, as you are doing now, is a risky move. This will only work correctly if the timestamps in all of the partitions being consumed by each parallel instance of the source are processed in order. If more than one Kafka partition is assigned to any of the KafkaSource tasks, this isn't going to happen. On the other hand, if you supply the forMonotonousTimestamps watermarking strategy in the fromSource call (rather than noWatermarks), then all that will be required is that the timestamps be in order on a per-partition basis, which I imagine is the case.

            As troubling as that is, it's probably not enough to explain why the windows don't produce any results. Another possible root cause is that the test data set doesn't include any events with timestamps after the first window, so that window never closes.

            Do you have a sink? If not, that would explain things.

            You can use the Flink dashboard to help debug this. Look to see if the watermarks are advancing in the window tasks. Turn on checkpointing, and then look to see how much state the window task has -- it should have some non-zero amount of state.
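
            To illustrate the recommended setup, here is a minimal sketch of supplying forMonotonousTimestamps in the fromSource call so that watermarking happens per Kafka partition inside the source; the broker address, topic and group id are hypothetical:

            import org.apache.flink.api.common.eventtime.WatermarkStrategy;
            import org.apache.flink.api.common.serialization.SimpleStringSchema;
            import org.apache.flink.connector.kafka.source.KafkaSource;
            import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
            import org.apache.flink.streaming.api.datastream.DataStream;
            import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

            import java.time.Duration;

            public class KafkaSourceWatermarkExample {
                public static void main(String[] args) throws Exception {
                    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

                    KafkaSource<String> source = KafkaSource.<String>builder()
                            .setBootstrapServers("localhost:9092")   // hypothetical broker
                            .setTopics("my-topic")                   // hypothetical topic
                            .setGroupId("my-group")
                            .setStartingOffsets(OffsetsInitializer.earliest())
                            .setValueOnlyDeserializer(new SimpleStringSchema())
                            .build();

                    // Passing the watermark strategy here (instead of noWatermarks plus a
                    // downstream generator) lets the source watermark each partition separately.
                    DataStream<String> stream = env.fromSource(
                            source,
                            WatermarkStrategy.<String>forMonotonousTimestamps()
                                    .withIdleness(Duration.ofMinutes(1)), // guard against idle partitions
                            "kafka-source");

                    stream.print();
                    env.execute("KafkaSource watermark example");
                }
            }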

            Source https://stackoverflow.com/questions/69765972

            QUESTION

            OSS supported by Google Cloud Dataproc
            Asked 2021-Oct-01 at 17:18

            When I go to https://cloud.google.com/dataproc, I see this ...

            "Dataproc is a fully managed and highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks."

            But gcloud dataproc jobs submit doesn't list all of them. It lists only 8 (hadoop, hive, pig, presto, pyspark, spark, spark-r, spark-sql). Any idea why?

            ...

            ANSWER

            Answered 2021-Oct-01 at 17:18

            Some OSS components are offered as Dataproc Optional Components. Not all of them have a job submit API; some (e.g., Anaconda, Jupyter) don't need one, and some (e.g., Flink, Druid) might add one in the future.

            Some other OSS components are offered as libraries, e.g., GCS connector, BigQuery connector, Apache Parquet.

            Source https://stackoverflow.com/questions/69408310

            QUESTION

            Questions for reading data from JDBC source in DataStream Flink
            Asked 2021-Aug-11 at 07:39

            I'm starting a new Flink application to allow my company to perform lots of reporting. We have an existing legacy system with most of the data we need held in SQL Server databases. We will need to consume data from these databases initially, before starting to consume more data from newly deployed Kafka streams.

            I've spent a lot of time reading the Flink book and web pages but I have some simple questions and assumptions I hope you can help with so I can progress.

            Firstly, I am wanting to use the DataStream API so we can consume both historic data and realtime data. I do not think I want to use the DataSet API, but I also don't see the point in using the SQL/Table APIs as I would prefer to write my functions in Java classes. I need to maintain my own state, and it seems DataStream keyed functions are the way to go.

            Now that I am trying to actually write code against our production databases, I need to be able to read in "streams" of data with SQL queries - there does not appear to be a JDBC source connector, so I think I have to make the JDBC call myself and then possibly create a DataSource using env.fromElements(). Obviously this is a "bounded" data set, but how else am I meant to get historic data loaded in? In the future I want to include a Kafka stream as well, which will only have a few weeks' worth of data, so I imagine I will sometimes need to merge data from a SQL Server/Snowflake database with a live stream from a Kafka stream. What is the best practice for this, as I don't see examples discussing it?

            With retrieving data from a JDBC source, I have also seen some examples using a StreamingTableEnvironment - am I meant to use this somehow instead to query data from a JDBC connection into my DataStream functions etc? Again, I want to write my functions in Java not some Flink SQL. Is it best practice to use a StreamingTableEnvironment to query JDBC data if I'm only using the DataStream API?

            ...

            ANSWER

            Answered 2021-Aug-11 at 07:39

            The following approaches can be used to read from the database and create a datastream:

            1. You can use a RichParallelSourceFunction where you run a custom query against your database and produce a datastream from the results. The SQL query can be issued through a JDBC driver inside your extension of the RichParallelSourceFunction class (a minimal sketch follows this answer).

            2. Using the Table/DataStream API - it is possible to query a database by creating a JDBC catalog and then transforming it into a stream.

            3. An alternative to this, a more expensive solution perhaps - you can use the Flink CDC connectors, which provide source connectors for Apache Flink that ingest changes from different databases using change data capture (CDC).

            Then you can add Kafka as a source and get a datastream.

            So, briefly, your pipeline could look as follows: with both sources transformed into datastreams, you can join these streams using, for example, a CoProcessFunction, which also gives you the possibility to maintain state and use it in your business logic. Finally, sink your output to a database, to Kafka, or even to AWS S3 buckets using a sink function.
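
            As a rough illustration of approach 1, here is a minimal, non-parallel sketch of a JDBC-backed source; the class name, column access and connection URL are hypothetical, and a RichParallelSourceFunction would additionally need to split the query across its subtasks:

            import org.apache.flink.configuration.Configuration;
            import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

            import java.sql.Connection;
            import java.sql.DriverManager;
            import java.sql.PreparedStatement;
            import java.sql.ResultSet;

            // Emits one String per row of a (bounded) JDBC query result.
            public class JdbcQuerySource extends RichSourceFunction<String> {

                private final String jdbcUrl; // e.g. a SQL Server or Snowflake JDBC URL (hypothetical)
                private final String query;
                private volatile boolean running = true;
                private transient Connection connection;

                public JdbcQuerySource(String jdbcUrl, String query) {
                    this.jdbcUrl = jdbcUrl;
                    this.query = query;
                }

                @Override
                public void open(Configuration parameters) throws Exception {
                    connection = DriverManager.getConnection(jdbcUrl);
                }

                @Override
                public void run(SourceContext<String> ctx) throws Exception {
                    try (PreparedStatement stmt = connection.prepareStatement(query);
                         ResultSet rs = stmt.executeQuery()) {
                        while (running && rs.next()) {
                            ctx.collect(rs.getString(1)); // emit one column; adapt to your row type
                        }
                    }
                }

                @Override
                public void cancel() {
                    running = false;
                }

                @Override
                public void close() throws Exception {
                    if (connection != null) {
                        connection.close();
                    }
                }
            }

            You would attach it with something like env.addSource(new JdbcQuerySource(url, "SELECT id FROM my_table")) and then join the resulting stream with the Kafka stream as described above.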

            Source https://stackoverflow.com/questions/68726957

            QUESTION

            how flink interacts with MySQL for the temporal join with mysql
            Asked 2021-Aug-07 at 14:24

            I am reading at

            https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/joins/#lookup-join,

            It uses MySQL as a lookup table in the temporal table join, as follows:

            ...

            ANSWER

            Answered 2021-Aug-07 at 14:24

            You'll find some relevant details in the docs for the Table / JDBC connector: https://ci.apache.org/projects/flink/flink-docs-stable/docs/connectors/table/jdbc/#features. See especially the section describing the Lookup Cache, which says

            JDBC connector can be used in temporal join as a lookup source (aka. dimension table). Currently, only sync lookup mode is supported.

            By default, lookup cache is not enabled. You can enable it by setting both lookup.cache.max-rows and lookup.cache.ttl.

            The lookup cache is used to improve performance of temporal join the JDBC connector. By default, lookup cache is not enabled, so all the requests are sent to external database. When lookup cache is enabled, each process (i.e. TaskManager) will hold a cache. Flink will lookup the cache first, and only send requests to external database when cache missing, and update cache with the rows returned. The oldest rows in cache will be expired when the cache hit to the max cached rows lookup.cache.max-rows or when the row exceeds the max time to live lookup.cache.ttl. The cached rows might not be the latest, users can tune lookup.cache.ttl to a smaller value to have a better fresh data, but this may increase the number of requests send to database. So this is a balance between throughput and correctness.
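
            For illustration, a hypothetical MySQL-backed dimension table with the lookup cache enabled could be declared from Java like this; the table name, columns and connection URL are made up, and the two lookup.cache.* options are the ones described in the quoted documentation:

            import org.apache.flink.table.api.EnvironmentSettings;
            import org.apache.flink.table.api.TableEnvironment;

            public class LookupCacheExample {
                public static void main(String[] args) {
                    TableEnvironment tableEnv = TableEnvironment.create(
                            EnvironmentSettings.newInstance().inStreamingMode().build());

                    // Dimension table backed by MySQL via the JDBC connector; setting both
                    // lookup.cache.max-rows and lookup.cache.ttl turns on the lookup cache.
                    tableEnv.executeSql(
                            "CREATE TABLE customers (" +
                            "  id INT," +
                            "  name STRING," +
                            "  PRIMARY KEY (id) NOT ENFORCED" +
                            ") WITH (" +
                            "  'connector' = 'jdbc'," +
                            "  'url' = 'jdbc:mysql://localhost:3306/mydb'," +
                            "  'table-name' = 'customers'," +
                            "  'lookup.cache.max-rows' = '5000'," +
                            "  'lookup.cache.ttl' = '10min'" +
                            ")");

                    // A lookup join against this table then uses FOR SYSTEM_TIME AS OF on the
                    // probe side's processing-time attribute, as shown in the linked docs.
                }
            }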

            Source https://stackoverflow.com/questions/68689647

            QUESTION

            What's the difference between temporal table function and versioned table
            Asked 2021-Aug-02 at 04:34

            In Flink 1.12, Flink introduced a new concept called the versioned table. It is very similar to the temporal table function, but I am kind of confused between these two concepts.

            The temporal table function only supports append-only tables (please correct me if I am wrong), such as:

            ...

            ANSWER

            Answered 2021-Aug-02 at 04:34

            You can view the Temporal Table Function as a previous version of versioned tables of sorts (this is also stated in the legacy-features part of the Flink docs).

            In your first example, you create an append-only table which will hold multiple rows for the same currency (i.e. EURO), which makes currency ineligible to be a primary key.

            Can you call an append-only table a versioned table? The definition of Versioned Tables in the docs says:

            Flink SQL can define versioned tables over any dynamic table with a PRIMARY KEY constraint and time attribute.

            This implies that an append-only table cannot serve as a versioned table since it can't, as we said, uphold the primary key constraint.

            But, since the append-only table in your example does hold the relevant information to become a versioned table, containing inserts as well as updates and deletes, we can turn it into one with a deduplication query, as you posted above.
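
            For reference, the deduplication pattern the answer alludes to looks roughly like this when issued from Java; the table and column names are hypothetical, and currency_rates is assumed to be an already registered append-only table with an update_time time attribute:

            import org.apache.flink.table.api.EnvironmentSettings;
            import org.apache.flink.table.api.TableEnvironment;

            public class VersionedViewExample {
                public static void main(String[] args) {
                    TableEnvironment tableEnv = TableEnvironment.create(
                            EnvironmentSettings.newInstance().inStreamingMode().build());

                    // Derive a versioned view from an append-only table by keeping only the
                    // latest row per currency (the deduplication pattern from the Flink docs).
                    tableEnv.executeSql(
                            "CREATE VIEW versioned_rates AS " +
                            "SELECT currency, rate, update_time " +
                            "FROM (" +
                            "  SELECT *, ROW_NUMBER() OVER (" +
                            "    PARTITION BY currency ORDER BY update_time DESC) AS rownum" +
                            "  FROM currency_rates" +
                            ") WHERE rownum = 1");
                }
            }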

            Source https://stackoverflow.com/questions/68607727

            QUESTION

            Apache Flink delay processing of certain events
            Asked 2021-Jul-29 at 16:16

            I have a requirement to delay processing of some of the events.

            eg. I have three events (published on Kafka):

            • A (id: 1, retryAt: now)
            • B (id: 2, retryAt: 10 minutes later)
            • C (id: 3, retryAt: now)

            I need to process record A and C immediately while record B needs to be processed Ten minutes later. Is this something feasible to achieve in Apache Flink?

            From what I have researched so far, it seems "Triggers" might help achieve this in Flink, but I have not been able to implement it correctly yet.

            I looked through Kafka documentation too, but it doesn't look feasible there.

            ...

            ANSWER

            Answered 2021-Jul-29 at 16:16

            Triggers are for windows, but windowing doesn't seem appropriate for your use case.

            A better solution would be to use Timers with a KeyedProcessFunction. Depending on whether you want to wait for 10 minutes of processing time, or 10 minutes of event time, you'll choose processing time timers or event time timers.

            You'll also need to use Flink state to store the events that need to be processed later.

            You'll find the documentation for process functions here. There are some additional examples in the Flink training, here and here.

            FWIW, Flink's Stateful Functions API might be a better fit for what you're doing, in which case you would use delayed messages.
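
            A minimal sketch of that approach with processing-time timers, assuming the stream is keyed by the event id and that Event is a hypothetical POJO exposing a retryAt timestamp in epoch milliseconds:

            import org.apache.flink.api.common.state.ValueState;
            import org.apache.flink.api.common.state.ValueStateDescriptor;
            import org.apache.flink.configuration.Configuration;
            import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
            import org.apache.flink.util.Collector;

            public class DelayedProcessor extends KeyedProcessFunction<Long, DelayedProcessor.Event, DelayedProcessor.Event> {

                // Hypothetical event type carrying an id and a retryAt timestamp (epoch millis).
                public static class Event {
                    public long id;
                    public long retryAt;
                }

                private transient ValueState<Event> pending;

                @Override
                public void open(Configuration parameters) {
                    pending = getRuntimeContext().getState(
                            new ValueStateDescriptor<>("pending-event", Event.class));
                }

                @Override
                public void processElement(Event event, Context ctx, Collector<Event> out) throws Exception {
                    long now = ctx.timerService().currentProcessingTime();
                    if (event.retryAt <= now) {
                        out.collect(event); // process immediately (events A and C)
                    } else {
                        pending.update(event); // park the event in keyed state (event B)
                        ctx.timerService().registerProcessingTimeTimer(event.retryAt);
                    }
                }

                @Override
                public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) throws Exception {
                    Event event = pending.value();
                    if (event != null) {
                        out.collect(event); // emit the delayed event once the timer fires
                        pending.clear();
                    }
                }
            }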

            Source https://stackoverflow.com/questions/68578746

            QUESTION

            Flink, Kafka and JDBC sink
            Asked 2021-Jul-01 at 13:09

            I have a Flink 1.11 job that consumes messages from a Kafka topic, keys them, filters them (keyBy followed by a custom ProcessFunction), and saves them into the db via JDBC sink (as described here: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/jdbc.html)

            The Kafka consumer is initialized with these options:

            ...

            ANSWER

            Answered 2021-Jul-01 at 13:09

            There are 3 options that I can see:

            1. Try out the JDBC 1.13 connector with your Flink version. There is a good chance it might just work (basic wiring is sketched below).
            2. If that doesn't work immediately, check if you can backport it to 1.11. There shouldn't be too many changes.
            3. Write your own 2-phase-commit sink, either by extending TwoPhaseCommitSinkFunction or implementing your own SinkFunction with CheckpointedFunction and CheckpointListener. Basically, you create a new transaction after a successful checkpoint and commit it with notifyCheckpointComplete.
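
            As a starting point for option 1, here is a minimal sketch of wiring the JDBC connector's basic (at-least-once) JdbcSink; the table, columns, record type and connection URL are hypothetical, and the exactly-once/2-phase-commit variants discussed above are not shown:

            import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
            import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
            import org.apache.flink.connector.jdbc.JdbcSink;
            import org.apache.flink.streaming.api.datastream.DataStream;

            public class JdbcSinkWiring {

                // Hypothetical record type written by the sink.
                public static class Record {
                    private final long id;
                    private final String value;
                    public Record(long id, String value) { this.id = id; this.value = value; }
                    public long getId() { return id; }
                    public String getValue() { return value; }
                }

                public static void attachSink(DataStream<Record> stream) {
                    stream.addSink(JdbcSink.sink(
                            "INSERT INTO records (id, value) VALUES (?, ?)", // hypothetical table
                            (statement, record) -> {
                                statement.setLong(1, record.getId());
                                statement.setString(2, record.getValue());
                            },
                            JdbcExecutionOptions.builder()
                                    .withBatchSize(100)
                                    .withBatchIntervalMs(200)
                                    .build(),
                            new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                                    .withUrl("jdbc:postgresql://localhost:5432/mydb") // hypothetical URL
                                    .withDriverName("org.postgresql.Driver")
                                    .build()));
                }
            }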

            Source https://stackoverflow.com/questions/68146383

            QUESTION

            javadoc of the SingleOutputStreamOperator#returns(TypeHint<T> typeHint) method
            Asked 2021-Jun-30 at 09:16

            I am reading the source code of SingleOutputStreamOperator#returns, its javadoc is:

            ...

            ANSWER

            Answered 2021-Jun-30 at 08:55

            When the docs say NonInferrableReturnType, it means that we can use the type variable T, or any other letter that you prefer. So you can create a MapFunction that returns a T. But then you have to use .returns(TypeInformation.of(String.class)), for example, if your goal is to return a String.
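
            A small sketch of what that looks like in practice; the class and method names here are made up for illustration:

            import org.apache.flink.api.common.functions.MapFunction;
            import org.apache.flink.api.common.typeinfo.TypeInformation;
            import org.apache.flink.streaming.api.datastream.DataStream;
            import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

            public class ReturnsExample {

                // Generic mapper: Flink cannot infer T from the erased signature,
                // so the caller has to supply it via returns(...).
                public static <T> DataStream<T> mapTo(DataStream<String> input,
                                                      MapFunction<String, T> mapper,
                                                      TypeInformation<T> typeInfo) {
                    return input.map(mapper).returns(typeInfo);
                }

                public static void main(String[] args) throws Exception {
                    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
                    DataStream<String> lines = env.fromElements("a", "b", "c");

                    // Here T is String, so we pass TypeInformation.of(String.class).
                    DataStream<String> upper =
                            mapTo(lines, String::toUpperCase, TypeInformation.of(String.class));

                    upper.print();
                    env.execute("returns() example");
                }
            }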

            Source https://stackoverflow.com/questions/68187973

            QUESTION

            Flink pipeline without a data sink with checkpointing on
            Asked 2021-Jun-09 at 16:43

            I am researching building a Flink pipeline without a data sink, i.e. my pipeline ends when it makes a successful API call to a datastore.

            In that case, if we don't use a sink operator, how will checkpointing work?

            Checkpointing is based on the concept of a pre-checkpoint epoch (all events that are persisted in state or emitted into sinks) and a post-checkpoint epoch. Is having a sink required for a Flink pipeline?

            ...

            ANSWER

            Answered 2021-Jun-09 at 16:43

            Yes, sinks are required as part of Flink's execution model:

            DataStream programs in Flink are regular programs that implement transformations on data streams (e.g., filtering, updating state, defining windows, aggregating). The data streams are initially created from various sources (e.g., message queues, socket streams, files). Results are returned via sinks, which may for example write the data to files, or to standard output (for example the command line terminal)

            One could argue that the call to your datastore is the actual sink implementation that you could use. You could define your own sink and execute the datastore call there.

            I am not keen on the details of your datastore, but one could assume that you are serializing these events and sending them to the datastore in some way. In that case, you could flow all your elements to the sink operator, and store each of these elements in some ListState which you can continuously offload and send. This way, if your application needs to be upgraded, in flight records will not be lost and will be recovered and sent once the job has restored.
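
            A minimal sketch of such a sink; the client class and endpoint below are hypothetical placeholders for whatever API your datastore actually provides:

            import org.apache.flink.configuration.Configuration;
            import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

            public class DatastoreApiSink extends RichSinkFunction<String> {

                private final String endpoint;
                private transient DatastoreClient client;

                public DatastoreApiSink(String endpoint) {
                    this.endpoint = endpoint;
                }

                @Override
                public void open(Configuration parameters) {
                    client = new DatastoreClient(endpoint); // one client per parallel subtask
                }

                @Override
                public void invoke(String value, Context context) throws Exception {
                    client.store(value); // the API call that previously "ended" the pipeline
                }

                @Override
                public void close() {
                    if (client != null) {
                        client.close();
                    }
                }

                // Hypothetical stand-in for the real datastore client.
                public static class DatastoreClient {
                    DatastoreClient(String endpoint) { /* connect to the datastore */ }
                    void store(String value) { /* send the record */ }
                    void close() { /* release resources */ }
                }
            }

            You would then end the pipeline with stream.addSink(new DatastoreApiSink(endpointUrl)), which keeps the sink visible to Flink's checkpointing and recovery machinery.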

            Source https://stackoverflow.com/questions/67894229

            Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            A change introduced in Apache Flink 1.11.0 (and released in 1.11.1 and 1.11.2 as well) allows attackers to read any file on the local filesystem of the JobManager through the REST interface of the JobManager process. Access is restricted to files accessible by the JobManager process. All users should upgrade to Flink 1.11.3 or 1.12.0 if their Flink instance(s) are exposed. The issue was fixed in commit b561010b0ee741543c3953306037f00d7a9f0801 from apache/flink:master.
            Apache Flink 1.5.1 introduced a REST handler that allows you to write an uploaded file to an arbitrary location on the local file system, through a maliciously modified HTTP HEADER. The files can be written to any location accessible by Flink 1.5.1. All users should upgrade to Flink 1.11.3 or 1.12.0 if their Flink instance(s) are exposed. The issue was fixed in commit a5264a6f41524afe8ceadf1d8ddc8c80f323ebc4 from apache/flink:master.
            A vulnerability in Apache Flink (1.1.0 to 1.1.5, 1.2.0 to 1.2.1, 1.3.0 to 1.3.3, 1.4.0 to 1.4.2, 1.5.0 to 1.5.6, 1.6.0 to 1.6.4, 1.7.0 to 1.7.2, 1.8.0 to 1.8.3, 1.9.0 to 1.9.2, 1.10.0) where, when running a process with an enabled JMXReporter, with a port configured via metrics.reporter.<reporter_name>.port, an attacker with local access to the machine and JMX port can execute a man-in-the-middle attack using a specially crafted request to rebind the JMXRMI registry to one under the attacker's control. This compromises any connection established to the process via JMX, allowing extraction of credentials and any other transferred data.

            Install flink

            You can download it from GitHub or Maven.
            You can use flink like any standard Java library. Please include the jar files in your classpath. You can also use any IDE to run and debug the flink component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.
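
            For illustration, a typical Maven dependency block for writing DataStream jobs against the version shown on this page might look like the following; pick artifacts to match the Flink APIs you actually use:

            <dependencies>
                <!-- Core DataStream API -->
                <dependency>
                    <groupId>org.apache.flink</groupId>
                    <artifactId>flink-streaming-java</artifactId>
                    <version>1.17.1</version>
                </dependency>
                <!-- Needed to run and debug jobs from the IDE -->
                <dependency>
                    <groupId>org.apache.flink</groupId>
                    <artifactId>flink-clients</artifactId>
                    <version>1.17.1</version>
                </dependency>
            </dependencies>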

            Support

            Don't hesitate to ask! Contact the developers and community on the mailing lists if you need any help. Open an issue if you find a bug in Flink.
            CLONE
          • HTTPS

            https://github.com/apache/flink.git

          • CLI

            gh repo clone apache/flink

          • SSH

            git@github.com:apache/flink.git
