rocksdb | A library that provides an embeddable, persistent key-value store for fast storage | Database library
kandi X-RAY | rocksdb Summary
rocksdb Key Features
rocksdb Examples and Code Snippets
Faust supports Kafka with version >= 0.10.

.. _getting-help:

Getting Help
============

.. _slack-channel:

Slack
-----

For discussions about the usage, development, and future of Faust, please join the `fauststream`_ Slack.

* https://fauststream.slack.com
* Sign-up: https://join.slack.com/t/fauststream/shared_invite/enQtNDEzMTIyMTUyNzU2LTIyMjNjY2M2YzA2OWFhMDlmMzVkODk3YTBlYThlYmZiNTUwZDJlYWZiZTdkN2Q4ZGU4NWM4YWMyNTM5MGQ5OTg

Resources
=========

.. _bug-tracker:

Bug tracker
-----------

If you have any suggestions, bug reports, or annoyances please report them to our issue tracker at https://github.com/robinhood/faust/issues/

.. _license:

License
=======

This software is licensed under the `New BSD License`. See the ``LICENSE`` file in the top distribution directory for the full license text.

Contributing
============

Development of `Faust` happens at GitHub: https://github.com/robinhood/faust

You're highly encouraged to participate in the development of `Faust`. Be sure to also read the `Contributing to Faust`_ section in the documentation.

.. _`Contributing to Faust`: http://faust.readthedocs.io/en/latest/contributing.html

Code of Conduct
===============

Everyone interacting in the project's code bases, issue trackers, chat rooms, and mailing lists is expected to follow the Faust Code of Conduct.

As contributors and maintainers of these projects, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.

We are committed to making participation in these projects a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, or nationality.

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery
* Personal attacks
* Trolling or insulting/derogatory comments
* Public or private harassment
* Publishing others' private information, such as physical or electronic addresses, without explicit permission
* Other unethical or unprofessional conduct.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. By adopting this Code of Conduct, project maintainers commit themselves to fairly and consistently applying these principles to every aspect of managing this project. Project maintainers who do not follow or enforce the Code of Conduct may be permanently removed from the project team.

This code of conduct applies both within project spaces and in public spaces when an individual is representing the project or its community.

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers. This Code of Conduct is adapted from the Contributor Covenant, version 1.2.0, available at http://contributor-covenant.org/version/1/2/0/.

.. _`introduction`: http://faust.readthedocs.io/en/latest/introduction.html
.. _`quickstart`: http://faust.readthedocs.io/en/latest/playbooks/quickstart.html
.. _`User Guide`: http://faust.readthedocs.io/en/latest/userguide/index.html

.. |build-status| image:: https://secure.travis-ci.org/robinhood/faust.png?branch=master
    :alt: Build status
    :target: https://travis-ci.org/robinhood/faust

.. |coverage| image:: https://codecov.io/github/robinhood/faust/coverage.svg?branch=master
    :target: https://codecov.io/github/robinhood/faust?branch=master

.. |license| image:: https://img.shields.io/pypi/l/faust.svg
    :alt: BSD License
    :target: https://opensource.org/licenses/BSD-3-Clause

.. |wheel| image:: https://img.shields.io/pypi/wheel/faust.svg
    :alt: faust can be installed via wheel
    :target: http://pypi.org/project/faust/

.. |pyversion| image:: https://img.shields.io/pypi/pyversions/faust.svg
    :alt: Supported Python versions.
    :target: http://pypi.org/project/faust/

.. |pyimp| image:: https://img.shields.io/pypi/implementation/faust.svg
    :alt: Supported Python implementations.
    :target: http://pypi.org/project/faust/
Trending Discussions on rocksdb
QUESTION
Is it okay to hold large state in RocksDB when using Kafka Streams? We are planning to use RocksDB as an event store to hold billions of events for an indefinite amount of time.
ANSWER
Answered 2022-Apr-03 at 20:15
The main limitation would be disk space, so sure, it can be done, but if the app crashes for any reason, you might be waiting for a while for the app to rebuild its state.
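If that rebuild window is a concern, standby replicas can keep a warm copy of the store on another instance. A minimal sketch of the relevant Kafka Streams settings (the application id and the standby-replica count are illustrative assumptions, not part of the answer above):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class EventStoreConfig {
    public static Properties config() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eventstore-app"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // One standby replica per state store: failover restores from a
        // near-current local copy instead of replaying the whole changelog,
        // at the cost of extra disk and network.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        return props;
    }
}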
QUESTION
It's my first Kafka program.
From a kafka_2.13-3.1.0 instance, I created a Kafka topic poids_garmin_brut and filled it with this CSV:
kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic poids_garmin_brut
kafka-console-producer.sh --broker-list localhost:9092 --topic poids_garmin_brut < "Poids(1).csv"
Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
" 14 Fév. 2022",
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
" 12 Fév. 2022",
06:17,72.2 kg,0.0 kg,22.8,25.3 %,29.7 kg,3.6 kg,54.5 %,
[...]
And at any time now, before or after running the program I'll show, its content can be displayed by a kafka-console-consumer command:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic poids_garmin_brut --from-beginning
Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
" 14 Fév. 2022",
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
" 12 Fév. 2022",
06:17,72.2 kg,0.0 kg,22.8,25.3 %,29.7 kg,3.6 kg,54.5 %,
" 11 Fév. 2022",
05:54,72.2 kg,0.1 kg,22.8,25.6 %,29.7 kg,3.5 kg,54.3 %,
" 10 Fév. 2022",
06:14,72.3 kg,0.0 kg,22.8,25.9 %,29.7 kg,3.5 kg,54.1 %,
" 9 Fév. 2022",
06:06,72.3 kg,0.5 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 8 Fév. 2022",
07:14,71.8 kg,0.7 kg,22.7,26.3 %,29.6 kg,3.5 kg,53.8 %,
Here is the Java program, based on the org.apache.kafka:kafka-streams:3.1.0 dependency, extracting this topic as a stream:
package extracteur.garmin;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.slf4j.*;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import java.util.Properties;
@SpringBootApplication
public class Kafka {
/** Logger. */
private static final Logger LOGGER = LoggerFactory.getLogger(Kafka.class);
public static void main(String[] args) {
LOGGER.info("L'extracteur de données Garmin démarre...");
/* The input CSV file data looks like this:
Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
" 14 Fév. 2022",
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
*/
// Create a stream with no key; the value is a string.
StreamsBuilder builder = new StreamsBuilder();
KStream<Void, String> stream = builder.stream("poids_garmin_brut");
// This is a Kafka foreach, not a Java lambda. It is lazy.
stream.foreach((key, value) -> {
LOGGER.info(value);
});
KafkaStreams streams = new KafkaStreams(builder.build(), config());
streams.start();
// Close the Kafka stream when the JVM stops, by having the following called:
streams.close();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
/**
* Startup properties.
* @return configuration properties.
*/
private static Properties config() {
Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "dev1");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.Void().getClass());
config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
return config;
}
}
But, while the logs don't seem to report any error during execution, my program doesn't enter the stream.foreach, and therefore displays no content from that topic.
(In this log I removed the dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088- part of the [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088-StreamThread-1] you should read inside, for SO message length and readability. And org.apache.kafka becomes o.a.k.)
/usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -XX:TieredStopAtLevel=1 -noverify -Dspring.output.ansi.enabled=always -Dcom.sun.management.jmxremote -Dspring.jmx.enabled=true -Dspring.liveBeansView.mbeanDomain -Dspring.application.admin.enabled=true -javaagent:/opt/idea-IU-212.5284.40/lib/idea_rt.jar=41397:/opt/idea-IU-212.5284.40/bin -Dfile.encoding=UTF-8 -classpath /home/lebihan/dev/Java/garmin/target/classes:/home/lebihan/.m2/repository/org/slf4j/slf4j-api/1.7.33/slf4j-api-1.7.33.jar:/home/lebihan/.m2/repository/org/slf4j/log4j-over-slf4j/1.7.33/log4j-over-slf4j-1.7.33.jar:/home/lebihan/.m2/repository/ch/qos/logback/logback-classic/1.2.10/logback-classic-1.2.10.jar:/home/lebihan/.m2/repository/ch/qos/logback/logback-core/1.2.10/logback-core-1.2.10.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-web/2.6.3/spring-boot-starter-web-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter/2.6.3/spring-boot-starter-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot/2.6.3/spring-boot-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-autoconfigure/2.6.3/spring-boot-autoconfigure-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-logging/2.6.3/spring-boot-starter-logging-2.6.3.jar:/home/lebihan/.m2/repository/org/apache/logging/log4j/log4j-to-slf4j/2.17.1/log4j-to-slf4j-2.17.1.jar:/home/lebihan/.m2/repository/org/apache/logging/log4j/log4j-api/2.17.1/log4j-api-2.17.1.jar:/home/lebihan/.m2/repository/org/slf4j/jul-to-slf4j/1.7.33/jul-to-slf4j-1.7.33.jar:/home/lebihan/.m2/repository/jakarta/annotation/jakarta.annotation-api/1.3.5/jakarta.annotation-api-1.3.5.jar:/home/lebihan/.m2/repository/org/yaml/snakeyaml/1.29/snakeyaml-1.29.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-json/2.6.3/spring-boot-starter-json-2.6.3.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/datatype/jackson-datatype-jdk8/2.13.1/jackson-datatype-jdk8-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/datatype/jackson-datatype-jsr310/2.13.1/jackson-datatype-jsr310-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/module/jackson-module-parameter-names/2.13.1/jackson-module-parameter-names-2.13.1.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-tomcat/2.6.3/spring-boot-starter-tomcat-2.6.3.jar:/home/lebihan/.m2/repository/org/apache/tomcat/embed/tomcat-embed-core/9.0.56/tomcat-embed-core-9.0.56.jar:/home/lebihan/.m2/repository/org/apache/tomcat/embed/tomcat-embed-el/9.0.56/tomcat-embed-el-9.0.56.jar:/home/lebihan/.m2/repository/org/apache/tomcat/embed/tomcat-embed-websocket/9.0.56/tomcat-embed-websocket-9.0.56.jar:/home/lebihan/.m2/repository/org/springframework/spring-web/5.3.15/spring-web-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-beans/5.3.15/spring-beans-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-webmvc/5.3.15/spring-webmvc-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-aop/5.3.15/spring-aop-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-context/5.3.15/spring-context-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-expression/5.3.15/spring-expression-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-core/5.3.15/spring-core-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-jcl/5.3.15/spring-jcl-5.3.15.jar:/home/lebihan/.m2/repository/org/apache/kafka/kafka-streams/3.1.0/kafka-streams-3.1.0.jar:/home/lebihan/.m2/repository/org/apache/kafka/kafka-clients/3.0.0/kafka-clients-3.0.0.jar:/home/lebihan/.m2/repository/com/github/luben/zstd-jni/1.5.0-2/zstd-jni-1.5.0-2.jar:/home/lebihan/.m2/repository/org/lz4/lz4-java/1.7.1/lz4-java-1.7.1.jar:/home/lebihan/.m2/repository/org/xerial/snappy/snappy-java/1.1.8.1/snappy-java-1.1.8.1.jar:/home/lebihan/.m2/repository/org/rocksdb/rocksdbjni/6.22.1.1/rocksdbjni-6.22.1.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.13.1/jackson-annotations-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.13.1/jackson-databind-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.13.1/jackson-core-2.13.1.jar extracteur.garmin.Kafka
07:57:49.720 [main] INFO extracteur.garmin.Kafka - L'extracteur de données Garmin démarre...
07:57:49.747 [main] INFO o.a.k.streams.StreamsConfig - StreamsConfig values:
acceptable.recovery.lag = 10000
application.id = dev1
application.server =
bootstrap.servers = [localhost:9092]
buffered.records.per.partition = 1000
built.in.metrics.version = latest
cache.max.bytes.buffering = 10485760
client.id =
commit.interval.ms = 30000
connections.max.idle.ms = 540000
default.deserialization.exception.handler = class o.a.k.streams.errors.LogAndFailExceptionHandler
default.key.serde = class o.a.k.common.serialization.Serdes$VoidSerde
default.list.key.serde.inner = null
default.list.key.serde.type = null
default.list.value.serde.inner = null
default.list.value.serde.type = null
default.production.exception.handler = class o.a.k.streams.errors.DefaultProductionExceptionHandler
default.timestamp.extractor = class o.a.k.streams.processor.FailOnInvalidTimestamp
default.value.serde = class o.a.k.common.serialization.Serdes$StringSerde
max.task.idle.ms = 0
max.warmup.replicas = 2
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
num.standby.replicas = 0
num.stream.threads = 1
poll.ms = 100
probing.rebalance.interval.ms = 600000
processing.guarantee = at_least_once
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
replication.factor = -1
request.timeout.ms = 40000
retries = 0
retry.backoff.ms = 100
rocksdb.config.setter = null
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
state.cleanup.delay.ms = 600000
state.dir = /tmp/kafka-streams
task.timeout.ms = 300000
topology.optimization = none
upgrade.from = null
window.size.ms = null
windowed.inner.class.serde = null
windowstore.changelog.additional.retention.ms = 86400000
07:57:49.760 [main] INFO o.a.k.clients.admin.AdminClientConfig - AdminClientConfig values:
bootstrap.servers = [localhost:9092]
client.dns.lookup = use_all_dns_ips
client.id = admin
connections.max.idle.ms = 300000
default.api.timeout.ms = 60000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
07:57:49.790 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.790 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.790 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269788
07:57:49.793 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] Kafka Streams version: 3.1.0
07:57:49.793 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] Kafka Streams commit ID: 37edeed0777bacb3
07:57:49.800 [main] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Creating restore consumer client
07:57:49.802 [main] INFO o.a.k.clients.consumer.ConsumerConfig - ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = none
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = StreamThread-1-restore-consumer
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = null
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = false
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class o.a.k.clients.consumer.RangeAssignor, class o.a.k.clients.consumer.CooperativeStickyAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 45000
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
07:57:49.816 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.816 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.816 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269816
07:57:49.818 [main] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Creating thread producer client
07:57:49.820 [main] INFO o.a.k.clients.producer.ProducerConfig - ProducerConfig values:
acks = -1
batch.size = 16384
bootstrap.servers = [localhost:9092]
buffer.memory = 33554432
client.dns.lookup = use_all_dns_ips
client.id = StreamThread-1-producer
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = true
interceptor.classes = []
key.serializer = class o.a.k.common.serialization.ByteArraySerializer
linger.ms = 100
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metadata.max.idle.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class o.a.k.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class o.a.k.common.serialization.ByteArraySerializer
07:57:49.828 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.828 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.828 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269828
07:57:49.830 [main] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Creating consumer client
07:57:49.831 [main] INFO o.a.k.clients.consumer.ConsumerConfig - ConsumerConfig values:
allow.auto.create.topics = false
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = StreamThread-1-consumer
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = dev1
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = false
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [o.a.k.streams.processor.internals.StreamsPartitionAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 45000
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
replication.factor = -1
windowstore.changelog.additional.retention.ms = 86400000
07:57:49.836 [main] INFO o.a.k.streams.processor.internals.assignment.AssignorConfiguration - stream-thread [StreamThread-1-consumer] Cooperative rebalancing protocol is enabled now
07:57:49.840 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.840 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.840 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269840
07:57:49.844 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] State transition from CREATED to REBALANCING
07:57:49.845 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Starting
07:57:49.845 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] State transition from CREATED to STARTING
07:57:49.845 [StreamThread-1] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Subscribed to topic(s): poids_garmin_brut
07:57:49.845 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] State transition from REBALANCING to PENDING_SHUTDOWN
07:57:49.846 [kafka-streams-close-thread] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Informed to shut down
07:57:49.846 [kafka-streams-close-thread] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] State transition from STARTING to PENDING_SHUTDOWN
07:57:49.919 [kafka-producer-network-thread | StreamThread-1-producer] INFO o.a.k.clients.Metadata - [Producer clientId=StreamThread-1-producer] Cluster ID: QKJGs4glRAy7besZxXNCrg
07:57:49.920 [StreamThread-1] INFO o.a.k.clients.Metadata - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Cluster ID: QKJGs4glRAy7besZxXNCrg
07:57:49.921 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Discovered group coordinator debian:9092 (id: 2147483647 rack: null)
07:57:49.922 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] (Re-)joining group
07:57:49.929 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Request joining group due to: need to re-join with the given member-id
07:57:49.929 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] (Re-)joining group
07:57:49.930 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Successfully joined group with generation Generation{generationId=3, memberId='StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c', protocol='stream'}
07:57:49.936 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] All members participating in this rebalance:
d1c8ce47-6fbf-41b7-b8aa-e3d094703088: [StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c].
07:57:49.938 [StreamThread-1] INFO o.a.k.streams.processor.internals.assignment.HighAvailabilityTaskAssignor - Decided on assignment: {d1c8ce47-6fbf-41b7-b8aa-e3d094703088=[activeTasks: ([0_0]) standbyTasks: ([]) prevActiveTasks: ([]) prevStandbyTasks: ([]) changelogOffsetTotalsByTask: ([]) taskLagTotals: ([]) capacity: 1 assigned: 1]} with no followup probing rebalance.
07:57:49.938 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] Assigned tasks [0_0] including stateful [] to clients as:
d1c8ce47-6fbf-41b7-b8aa-e3d094703088=[activeTasks: ([0_0]) standbyTasks: ([])].
07:57:49.939 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] Client d1c8ce47-6fbf-41b7-b8aa-e3d094703088 per-consumer assignment:
prev owned active {}
prev owned standby {StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c=[]}
assigned active {StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c=[0_0]}
revoking active {}
assigned standby {}
07:57:49.939 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] Finished stable assignment of tasks, no followup rebalances required.
07:57:49.939 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Finished assignment for group at generation 3: {StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c=Assignment(partitions=[poids_garmin_brut-0], userDataSize=52)}
07:57:49.943 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Successfully synced group in generation Generation{generationId=3, memberId='StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c', protocol='stream'}
07:57:49.943 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Updating assignment with
Assigned partitions: [poids_garmin_brut-0]
Current owned partitions: []
Added partitions (assigned - owned): [poids_garmin_brut-0]
Revoked partitions (owned - assigned): []
07:57:49.943 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Notifying assignor about the new Assignment(partitions=[poids_garmin_brut-0], userDataSize=52)
07:57:49.944 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] No followup rebalance was requested, resetting the rebalance schedule.
07:57:49.944 [StreamThread-1] INFO o.a.k.streams.processor.internals.TaskManager - stream-thread [StreamThread-1] Handle new assignment with:
New active tasks: [0_0]
New standby tasks: []
Existing active tasks: []
Existing standby tasks: []
07:57:49.950 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Adding newly assigned partitions: poids_garmin_brut-0
07:57:49.953 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Found no committed offset for partition poids_garmin_brut-0
07:57:49.954 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Shutting down
[...]
Process finished with exit code 0
What am I doing wrong?
I'm running my Kafka instance and its Java program locally, on the same PC.
I've tried the 3.1.0 and 2.8.1 versions of Kafka, and removed any traces of Spring from the Java program, without success.
I believe I'm facing a configuration problem.
ANSWER
Answered 2022-Feb-15 at 14:36
The following should work.
LOGGER.info("L'extracteur de données Garmin démarre...");
/* The input CSV file data looks like this:
Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
" 14 Fév. 2022",
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
*/
// Create a stream with no key; the value is a string.
StreamsBuilder builder = new StreamsBuilder();
builder.stream("poids_garmin_brut")
.foreach((k, v) -> {
LOGGER.info(v.toString());
});
KafkaStreams streams = new KafkaStreams(builder.build(), config());
streams.start();
// Close the Kafka stream when the JVM stops (via the shutdown hook below), not here:
//streams.close();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
OUTPUT
2022-02-15 20:05:54 INFO ConsumerCoordinator:291 - [Consumer clientId=dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1-consumer, groupId=dev1] Adding newly assigned partitions: poids_garmin_brut-0
2022-02-15 20:05:54 INFO StreamThread:229 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] State transition from STARTING to PARTITIONS_ASSIGNED
2022-02-15 20:05:54 INFO ConsumerCoordinator:844 - [Consumer clientId=dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1-consumer, groupId=dev1] Setting offset for partition poids_garmin_brut-0 to the committed offset FetchPosition{offset=21, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[LAPTOP-J1JBHQUR:9092 (id: 0 rack: null)], epoch=0}}
2022-02-15 20:05:54 INFO StreamTask:240 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] task [0_0] Initialized
2022-02-15 20:05:54 INFO StreamTask:265 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] task [0_0] Restored and ready to run
2022-02-15 20:05:54 INFO StreamThread:882 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] Restoration took 30 ms for all tasks [0_0]
2022-02-15 20:05:54 INFO StreamThread:229 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2022-02-15 20:05:54 INFO KafkaStreams:332 - stream-client [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b] State transition from REBALANCING to RUNNING
2022-02-15 20:05:54 INFO KafkaConsumer:2254 - [Consumer clientId=dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1-consumer, groupId=dev1] Requesting the log end offset for poids_garmin_brut-0 in order to compute lag
2022-02-15 20:06:03 INFO Main:33 - Test22
2022-02-15 20:06:06 INFO Main:33 - Test23
QUESTION
I have a job running on Flink 1.14.3 (Java 11) that uses rocksdb as the state backend. The problem is that the job requires an amount of memory pretty similar to the overall state size.
Indeed, to make it stable (and capable of taking snapshots), this is what I'm using:
- 4 TMs with 30 GB of RAM and 7 CPUs
- Everything is run on top of Kubernetes on AWS using nodes with 32 GB of RAM and locally attached SSD disks (M5ad instances for what it's worth)
I have these settings in place:
state.backend: rocksdb
state.backend.incremental: 'true'
state.backend.rocksdb.localdir: /opt/flink/rocksdb <-- SSD volume (see below)
state.backend.rocksdb.memory.managed: 'true'
state.backend.rocksdb.predefined-options: FLASH_SSD_OPTIMIZED
taskmanager.memory.managed.fraction: '0.9'
taskmanager.memory.framework.off-heap.size: 512mb
taskmanager.numberOfTaskSlots: '4' (parallelism: 16)
Also this:
- name: rocksdb-volume
volume:
emptyDir:
sizeLimit: 100Gi
name: rocksdb-volume
volumeMount:
mountPath: /opt/flink/rocksdb
Which provides plenty of disk space for each task manager. With those settings, the job runs smoothly and in particular there is a relatively big memory margin. The problem is that memory consumption slowly increases, and also that with less memory margin snapshots fail. I have tried reducing the number of taskmanagers, but I need 4. Same with the amount of RAM: I have tried giving e.g. 16 GB instead of 30 GB, but the problem is the same. Another setting that has worked for us is using 8 TMs each with 16 GB of RAM, but again, this leads to the same amount of memory overall as the current settings. Even with that amount of memory, I can see that memory keeps growing and will probably lead to a bad end...
Also, the latest snapshot took around 120 GB, so as you can see I am using an amount of RAM similar to the size of the total state, which defeats the whole purpose of using a disk-based state backend (rocksdb) plus local SSDs.
Is there an effective way of limiting the memory that rocksdb takes (to that available on the running pods)? Nothing I have found/tried out so far has worked. Theoretically, the images I am using have jemalloc in place for memory allocation, which should avoid memory fragmentation issues observed with malloc in the past.
UPDATE 1: Attached please find memory evolution with taskmanager.memory.managed.fraction equal to 0.25 and 0.1. Apparently, the job continues to require all the available memory in the long run.
UPDATE 2: Following David's suggestion I've tried lowering the value of taskmanager.memory.managed.fraction as well as the total amount of memory. In particular, it seems that the job can run smoothly with 8 GB per TM if the managed fraction is set to 0.2. However, if set to 0.9 the job fails to start (due to lack of memory) unless 30 GB per TM are given. The following screenshot displays the memory evolution with the managed fraction set to 0.2 and the TM memory set to 8 GB. At around 13.50h a significant amount of memory was freed as a result of taking a snapshot (which worked well). Overall the job looks pretty stable now...
ANSWER
Answered 2022-Feb-04 at 18:54
RocksDB is designed to use all of the memory you give it access to -- so if it can fit all of your state in memory, it will. And given that you've increased taskmanager.memory.managed.fraction from 0.4 to 0.9, it's not surprising that your overall memory usage approaches its limit over time.
If you give RocksDB rather less memory, it should cope. Have you tried that?
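Concretely, that means lowering the managed-memory fraction so RocksDB spills cold state to the local SSD instead of growing in RAM. A minimal flink-conf sketch along the lines of the asker's own settings (the 0.25 value is an illustrative assumption, not prescribed by the answer):

state.backend: rocksdb
state.backend.incremental: 'true'
# Keep RocksDB inside Flink's managed-memory budget...
state.backend.rocksdb.memory.managed: 'true'
# ...and make that budget smaller (was 0.9), so RocksDB spills cold
# state to the local SSD volume instead of holding it all in RAM.
taskmanager.memory.managed.fraction: '0.25'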
QUESTION
What is the difference between using RocksDB to store operator state checkpoints and using RocksDB as a cache (instead of a cache like Redis) in a Flink job? I have a requirement to store data processed from a Flink job in a cache for 24 hours and to perform some computations in the streaming job based on that data. The data has to be removed after 24 hours. Can RocksDB be used for this purpose?
ANSWER
Answered 2022-Jan-30 at 10:25
The role that RocksDB plays in Flink is not really a checkpoint store or a cache. A checkpoint store must be reliable, and capable of surviving failures; Flink does not rely on RocksDB to survive failures. During checkpointing Flink copies the state in RocksDB to a distributed file system. During recovery, a new RocksDB instance will be created from the latest checkpoint. Caches, on the other hand, are a nice-to-have storage layer that can transparently fall back to some ground truth storage in the case of a cache miss. This comes closer to describing how the RocksDB state backend fits into Flink, except that Flink's state backends are essential components, rather than nice-to-haves. If the state for a running job can't be found in RocksDB, it doesn't exist.
Setting that aside, yes, you can store data in RocksDB for 24 hours and then remove it (or have it removed). You can explicitly remove it by using a Timer with a KeyedProcessFunction, and then clear an entry when the Timer fires. Or you can use the State TTL mechanism to have Flink clear state for you automatically.
You don't have to use Flink with RocksDB. The fully in-memory heap-based state backend is a higher performance alternative that offers the same exactly-once fault-tolerance guarantees, but it doesn't spill to disk like RocksDB, so you are more limited in how much state can be managed.
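As a minimal sketch of the timer-based expiry the answer describes (the class name and the String key/value types here are illustrative assumptions, not from the answer):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ExpireAfter24h extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<String> cached;

    @Override
    public void open(Configuration parameters) {
        cached = getRuntimeContext().getState(
                new ValueStateDescriptor<>("cached", String.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        if (cached.value() == null) {
            // First sighting of this key: schedule its removal in 24 hours.
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + 24 * 60 * 60 * 1000L);
        }
        cached.update(value);
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        cached.clear(); // drop the entry once it is 24 hours old
    }
}

The State TTL mechanism mentioned in the answer achieves the same effect declaratively, without managing timers by hand.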
QUESTION
I have a Flink (v1.13.3) application with an unbounded stream (using Kafka). One of my streams is very busy, and the busy value (which I can see in the UI) increases over time. When I have just started the Flink application,
sum by(task_name) (flink_taskmanager_job_task_busyTimeMsPerSecond{job="Flink", task_name="MyProcessFunction"})
returns 300-450 ms. After five or more hours the same query returns 5-7 seconds.
This function is quite simple, and it just uses rocksdb for the state backend:
public class MyObj implements Serializable
{
private Set<String> distinctValues;
public MyObj()
{
this.distinctValues = new HashSet<>();
}
public Set<String> getDistinctValues() {
return distinctValues;
}
public void setDistinctValues(Set<String> values) {
this.distinctValues = values;
}
}
public class MyProcessFunction extends KeyedProcessFunction
{
private transient ValueState<MyObj> state;
@Override
public void open(Configuration parameters)
{
ValueStateDescriptor<MyObj> stateDescriptor = new ValueStateDescriptor<>("MyObj",
TypeInformation.of(MyObj.class));
state = getRuntimeContext().getState(stateDescriptor);
}
@Override
public void processElement(KafkaRecord value, Context ctx, Collector out) throws Exception
{
MyObj stateValue = state.value();
if (stateValue == null)
{
stateValue = new MyObj();
ctx.timerService().registerProcessingTimeTimer(value.getTimestamp() + 10 * 60 * 1000L); // 10 minutes
}
stateValue.getDistinctValues().add(value.getValue());
if (stateValue.getDistinctValues().size() >= 20)
{
state.clear();
}
else
{
state.update(stateValue);
}
}
@Override
public void onTimer(long timestamp, OnTimerContext ctx, Collector out)
{
state.clear();
}
}
NOTE: Before implementing ValueState, I was just using ListState. But with ListState, flink_taskmanager_job_task_busyTimeMsPerSecond returns 25-30 seconds:
public class MyProcessFunction extends KeyedProcessFunction
{
private transient ListState<String> listState;
@Override
public void open(Configuration parameters)
{
ListStateDescriptor<String> listStateDescriptor = new ListStateDescriptor<>("myobj", TypeInformation.of(String.class));
listState = getRuntimeContext().getListState(listStateDescriptor);
}
@Override
public void processElement(KafkaRecord value, Context ctx, Collector out) throws Exception
{
List<String> values = IteratorUtils.toList(listState.get().iterator());
if (CollectionUtils.isEmpty(values))
{
ctx.timerService().registerProcessingTimeTimer(value.getTimestamp() + 10 * 60 * 1000L); // 10 minutes
}
if (!values.contains(value.getValue()))
{
values.add(value.getValue());
listState.update(values);
}
if (values.size() >= 20)
{
...
}
}
@Override
public void onTimer(long timestamp, OnTimerContext ctx, Collector out)
{
listState.clear();
}
}
ANSWER
Answered 2022-Jan-12 at 09:16
Some slowdown is to be expected once RocksDB reaches the point where the working state no longer fits in memory. However, in this case you should be able to dramatically improve performance by switching from ValueState to MapState.
Currently you are deserializing and reserializing the entire hashSet for every record. As these hashSets grow over time, performance degrades.
The RocksDB state backend has an optimized implementation of MapState. Each individual key/value entry in the map is stored as a separate RocksDB object, so you can look up, insert, and update entries without having to do serde on the rest of the map.
ListState is also optimized for RocksDB (it can be appended to without deserializing the list). In general it's best to avoid storing collections in ValueState when using RocksDB, and to use ListState or MapState instead wherever possible.
Since the heap-based state backend keeps its working state as objects on the heap, it doesn't have the same issues.
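A minimal sketch of that switch (the String key/record types stand in for the asker's KafkaRecord, and the Boolean map value is a placeholder, since only the map keys matter; the separate counter is our workaround for MapState having no size() method):

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class DistinctValuesFunction extends KeyedProcessFunction<String, String, String> {

    private transient MapState<String, Boolean> distinctValues;
    private transient ValueState<Integer> count; // MapState has no size()

    @Override
    public void open(Configuration parameters) {
        distinctValues = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("distinctValues", String.class, Boolean.class));
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Integer.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        if (!distinctValues.contains(value)) {
            // Point write: only this entry is (de)serialized, not the whole set.
            distinctValues.put(value, Boolean.TRUE);
            int n = count.value() == null ? 1 : count.value() + 1;
            if (n >= 20) {
                distinctValues.clear();
                count.clear();
            } else {
                count.update(n);
            }
        }
    }
}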
QUESTION
Say I have a simple Flink job with 2 keyed states, State1 and State2.
The job is configured with the rocksdb backend. Each of the states holds 10 GB of data.
If I update the code so that one of the states is no longer used (state descriptor deleted, and related code removed), for example State1 is deleted:
the next time Flink triggers a checkpoint, or I trigger a savepoint manually, will the checkpoint/savepoint still hold the data of State1 or not?
ANSWER
Answered 2021-Dec-28 at 09:39
If you are using RocksDB with incremental checkpoints, then state for the obsolete state descriptor will remain in checkpoints until it is compacted away (but it can be ignored). With any full snapshot, nothing of State1 will remain.
With RocksDB, expired state is eventually removed by a RocksDB compaction filter. Until then, if StateTtlConfig.StateVisibility.NeverReturnExpired is set, the state backend returns null in place of expired values.
QUESTION
I'm trying to run Python Faust from Docker.
Based on this documentation: https://faust.readthedocs.io/en/latest/userguide/installation.html
I created a simple Dockerfile:
FROM python:3
ADD ./app/app.py /
RUN pip3 install --upgrade pip
RUN pip install -U faust
RUN pip install "faust[rocksdb]"
RUN pip install "faust[rocksdb,uvloop,fast,redis]"
CMD ["python", "./app.py"]
When I build the Docker image, I receive an error at the 5th step (Step 5/7 : RUN pip install "faust[rocksdb]"):
---> Running in 1e42a5e50cbe
Requirement already satisfied: faust[rocksdb] in /usr/local/lib/python3.10/site-packages (1.10.4)
Requirement already satisfied: terminaltables<4.0,>=3.1 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (3.1.10)
Requirement already satisfied: click<8.0,>=6.7 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (7.1.2)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (1.7.2)
Requirement already satisfied: aiohttp-cors<2.0,>=0.7 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (0.7.0)
Requirement already satisfied: mypy-extensions in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (0.4.3)
Requirement already satisfied: colorclass<3.0,>=2.2 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (2.2.2)
Requirement already satisfied: opentracing<2.0.0,>=1.3.0 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (1.3.0)
Requirement already satisfied: mode<4.4,>=4.3.2 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (4.3.2)
Requirement already satisfied: venusian<2.0,>=1.1 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (1.2.0)
Requirement already satisfied: aiohttp<4.0,>=3.5.2 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (3.8.1)
Requirement already satisfied: robinhood-aiokafka<1.2,>=1.1.6 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (1.1.6)
Requirement already satisfied: croniter>=0.3.16 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (1.1.0)
Collecting python-rocksdb>=0.6.7
  Downloading python-rocksdb-0.7.0.tar.gz (219 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (1.2.0)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (21.2.0)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (1.2.0)
Requirement already satisfied: charset-normalizer<3.0,>=2.0 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (2.0.9)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (5.2.0)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (4.0.2)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.10/site-packages (from croniter>=0.3.16->faust[rocksdb]) (2.8.2)
Requirement already satisfied: colorlog>=2.9.0 in /usr/local/lib/python3.10/site-packages (from mode<4.4,>=4.3.2->faust[rocksdb]) (6.6.0)
Requirement already satisfied: setuptools>=25 in /usr/local/lib/python3.10/site-packages (from python-rocksdb>=0.6.7->faust[rocksdb]) (57.5.0)
Requirement already satisfied: kafka-python<1.5,>=1.4.6 in /usr/local/lib/python3.10/site-packages (from robinhood-aiokafka<1.2,>=1.1.6->faust[rocksdb]) (1.4.7)
Requirement already satisfied: idna>=2.0 in /usr/local/lib/python3.10/site-packages (from yarl<2.0,>=1.0->faust[rocksdb]) (3.3)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/site-packages (from python-dateutil->croniter>=0.3.16->faust[rocksdb]) (1.16.0)
And an ERROR PART:
Building wheels for collected packages: python-rocksdb
  Building wheel for python-rocksdb (setup.py): started
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-b8y7g4hs/python-rocksdb_b1c08993fd134ac4bc59e6f5d18bcd91/setup.py'"'"'; __file__='"'"'/tmp/pip-install-b8y7g4hs/python-rocksdb_b1c08993fd134ac4bc59e6f5d18bcd91/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-9_o4ek6z
   cwd: /tmp/pip-install-b8y7g4hs/python-rocksdb_b1c08993fd134ac4bc59e6f5d18bcd91/
  Complete output (64 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.10
  creating build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/interfaces.py -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/errors.py -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/merge_operators.py -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/__init__.py -> build/lib.linux-x86_64-3.10/rocksdb
  creating build/lib.linux-x86_64-3.10/rocksdb/tests
  copying rocksdb/tests/test_memtable.py -> build/lib.linux-x86_64-3.10/rocksdb/tests
  copying rocksdb/tests/test_db.py -> build/lib.linux-x86_64-3.10/rocksdb/tests
  copying rocksdb/tests/__init__.py -> build/lib.linux-x86_64-3.10/rocksdb/tests
  copying rocksdb/tests/test_options.py -> build/lib.linux-x86_64-3.10/rocksdb/tests
  running egg_info
  writing python_rocksdb.egg-info/PKG-INFO
  writing dependency_links to python_rocksdb.egg-info/dependency_links.txt
  writing requirements to python_rocksdb.egg-info/requires.txt
  writing top-level names to python_rocksdb.egg-info/top_level.txt
  reading manifest file 'python_rocksdb.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'python_rocksdb.egg-info/SOURCES.txt'
  copying rocksdb/_rocksdb.cpp -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/_rocksdb.pyx -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/backup.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/cache.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/comparator.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/db.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/env.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/filter_policy.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/iterator.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/logger.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/memtablerep.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/merge_operator.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/options.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/slice.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/slice_transform.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/snapshot.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/status.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/std_memory.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/table_factory.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/universal_compaction.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  creating build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/comparator_wrapper.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/filter_policy_wrapper.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/memtable_factories.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/merge_operator_wrapper.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/slice_transform_wrapper.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/utils.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/write_batch_iter_helper.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  running build_ext
  cythoning rocksdb/_rocksdb.pyx to rocksdb/_rocksdb.cpp
  /tmp/pip-install-b8y7g4hs/python-rocksdb_b1c08993fd134ac4bc59e6f5d18bcd91/.eggs/Cython-0.29.26-py3.10-linux-x86_64.egg/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-b8y7g4hs/python-rocksdb_b1c08993fd134ac4bc59e6f5d18bcd91/rocksdb/_rocksdb.pyx
    tree = Parsing.p_module(s, pxd, full_module_name)
  building 'rocksdb._rocksdb' extension
  creating build/temp.linux-x86_64-3.10
  creating build/temp.linux-x86_64-3.10/rocksdb
  gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.10 -c rocksdb/_rocksdb.cpp -o build/temp.linux-x86_64-3.10/rocksdb/_rocksdb.o -std=c++11 -O3 -Wall -Wextra -Wconversion -fno-strict-aliasing -fno-rtti
  rocksdb/_rocksdb.cpp:705:10: fatal error: rocksdb/slice.h: No such file or directory
    705 | #include "rocksdb/slice.h"
        |          ^~~~~~~~~~~~~~~~~
  compilation terminated.
  error: command '/usr/bin/gcc' failed with exit code 1
  ----------------------------------------
  Building wheel for python-rocksdb (setup.py): finished with status 'error'
ERROR: Failed building wheel for python-rocksdb
Can anyone help me to move on with this? I'd like to use Faust from Docker on Kubernetes.
ANSWER
Answered 2021-Dec-27 at 23:37
Read the error message, where it is clearly stated you are missing a header file:
fatal error: rocksdb/slice.h: No such file or directory
  705 | #include "rocksdb/slice.h"
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
Accordingly, you'll need to build and install RocksDB. This is separate from the installation of faust[rocksdb] with pip. That simply installs python-rocksdb, the Python interface to the underlying libraries.
There is even a (third-party) RocksDB Docker image based on Python 3.7 Slim. You could use that image directly, or borrow some of the steps from its Dockerfile; a sketch of the idea follows.
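As a rough, unofficial sketch of those steps (the base image, the RocksDB tag v6.11.4, and the package list are assumptions to adapt, not the contents of that third-party image):

# Sketch only: build the RocksDB shared library, then install the Python bindings.
FROM python:3.7-slim

# Toolchain plus the compression libraries RocksDB links against
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential git \
        libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev libgflags-dev \
    && rm -rf /var/lib/apt/lists/*

# Build and install RocksDB; headers land under /usr/local/include/rocksdb,
# which is exactly what the failing compile above could not find.
RUN git clone --depth 1 --branch v6.11.4 https://github.com/facebook/rocksdb.git /tmp/rocksdb \
    && make -C /tmp/rocksdb shared_lib \
    && make -C /tmp/rocksdb install-shared \
    && ldconfig \
    && rm -rf /tmp/rocksdb

# With the headers and shared library in place, the extension can compile
RUN pip install --no-cache-dir "faust[rocksdb]"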
QUESTION
Let's say I have a process function like this one (with the RocksDB state backend):
public class Test extends KeyedProcessFunction<...>
{
    private transient ValueState<Integer> ...;
    ...

    @Override
    public void open(Configuration parameters) throws Exception
    {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.minutes(10))
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            .cleanupInRocksdbCompactFilter(1000)
            .build();
        ValueStateDescriptor<Integer> testDescr = new ValueStateDescriptor<>(
            "test",
            TypeInformation.of(Integer.class)
        );
        testDescr.enableTimeToLive(ttlConfig);
        ...
    }
}
kafkaSource.keyBy(object -> object.getKey()).process(new Test())...;
Assume that this is an unbounded streaming application. Let's say I have seen the key "orange" only once (or just assume the process function is called once for the key "orange"), and assume that no event with the key "orange" will ever arrive again. In that case, will the state for the key "orange" stay in RocksDB forever?
ANSWER
Answered 2021-Dec-15 at 21:16
The state for the inactive key "orange" will be removed from RocksDB during the first RocksDB compaction that occurs after 10 minutes have elapsed since the state for that key was created (because the TTL configuration builder was configured with a 10-minute timeout). Until then the state will linger in RocksDB, but because you have configured StateVisibility.NeverReturnExpired, Flink will pretend it is not there should you try to access it.
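For reference, here is the question's TTL setup again, annotated; the one detail worth spelling out is the argument to cleanupInRocksdbCompactFilter (queryTimeAfterNumEntries), which controls how many state entries the RocksDB compaction filter processes before it re-reads the current timestamp, trading cleanup accuracy against compaction speed:

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

// The same TTL configuration as above, with comments.
StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.minutes(10))                               // expire 10 minutes after creation/last write
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)  // writes reset the clock; reads do not
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        .cleanupInRocksdbCompactFilter(1000)                        // drop expired entries during compaction
        .build();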
QUESTION
I have two questions related to the high availability of a StateFun application running on Kubernetes.
Here are the details of my setup:
- Using StateFun v3.1.0
- Checkpoints are stored on HDFS (state.checkpoint-storage: filesystem)
- Checkpointing mode is EXACTLY_ONCE
- State backend is rocksdb, and incremental checkpointing is enabled
1- I tried both ZooKeeper and Kubernetes HA settings; the result is the same (the log below is from a ZooKeeper HA environment). When I kill the jobmanager pod, minikube starts another pod, and this new pod fails when it tries to load the last checkpoint:
...
2021-12-11 14:25:26,426 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job myStatefunApp (00000000000000000000000000000000).
2021-12-11 14:25:26,443 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=2147483647, backoffTimeMS=1000) for myStatefunApp (00000000000000000000000000000000).
2021-12-11 14:25:26,516 INFO org.apache.flink.runtime.util.ZooKeeperUtils [] - Initialized DefaultCompletedCheckpointStore in 'ZooKeeperStateHandleStore{namespace='statefun_zk_recovery/my-statefun-app/checkpoints/00000000000000000000000000000000'}' with /checkpoints/00000000000000000000000000000000.
2021-12-11 14:25:26,599 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job myStatefunApp (00000000000000000000000000000000).
2021-12-11 14:25:26,599 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
2021-12-11 14:25:26,617 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 1 ms
2021-12-11 14:25:26,626 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using job/cluster config to configure application-defined state backend: EmbeddedRocksDBStateBackend{, localRocksDbDirectories=null, enableIncrementalCheckpointing=TRUE, numberOfTransferThreads=1, writeBatchSize=2097152}
2021-12-11 14:25:26,627 INFO org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend [] - Using predefined options: DEFAULT.
2021-12-11 14:25:26,627 INFO org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend [] - Using application-defined options factory: DefaultConfigurableOptionsFactory{configuredOptions={state.backend.rocksdb.thread.num=1}}.
2021-12-11 14:25:26,627 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using application-defined state backend: EmbeddedRocksDBStateBackend{, localRocksDbDirectories=null, enableIncrementalCheckpointing=TRUE, numberOfTransferThreads=1, writeBatchSize=2097152}
2021-12-11 14:25:26,631 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Checkpoint storage is set to 'filesystem': (checkpoints "hdfs://hdfs-namenode:8020/tmp/statefun_checkpoints/myStatefunApp")
2021-12-11 14:25:26,712 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Recovering checkpoints from ZooKeeperStateHandleStore{namespace='statefun_zk_recovery/my-statefun-app/checkpoints/00000000000000000000000000000000'}.
2021-12-11 14:25:26,724 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Found 1 checkpoints in ZooKeeperStateHandleStore{namespace='statefun_zk_recovery/my-statefun-app/checkpoints/00000000000000000000000000000000'}.
2021-12-11 14:25:26,725 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to fetch 1 checkpoints from storage.
2021-12-11 14:25:26,725 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to retrieve checkpoint 2.
2021-12-11 14:25:26,931 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job 00000000000000000000000000000000 from Checkpoint 2 @ 1639232587220 for 00000000000000000000000000000000 located at hdfs://hdfs-namenode:8020/tmp/statefun_checkpoints/myStatefunApp/00000000000000000000000000000000/chk-2.
2021-12-11 14:25:27,012 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.util.FlinkException: JobMaster for job 00000000000000000000000000000000 failed.
at org.apache.flink.runtime.dispatcher.Dispatcher.jobMasterFailed(Dispatcher.java:873) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.dispatcher.Dispatcher.jobManagerRunnerFailed(Dispatcher.java:459) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.dispatcher.Dispatcher.handleJobManagerRunnerResult(Dispatcher.java:436) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$runJob$3(Dispatcher.java:415) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at java.util.concurrent.CompletableFuture.uniHandle(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$Completion.run(Unknown Source) ~[?:?]
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.12-1.13.2.jar:1.13.2]
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-dist_2.12-1.13.2.jar:1.13.2]
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.12-1.13.2.jar:1.13.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.12-1.13.2.jar:1.13.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.13.2.jar:1.13.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.actor.Actor.aroundReceive(Actor.scala:517) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.actor.Actor.aroundReceive$(Actor.scala:515) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.12-1.13.2.jar:1.13.2]
Caused by: org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:97) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: java.util.concurrent.CompletionException: java.lang.IllegalStateException: There is no operator for the state 2edd7b5dafb2c271440b25f6da5f4532
at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: java.lang.IllegalStateException: There is no operator for the state 2edd7b5dafb2c271440b25f6da5f4532
at org.apache.flink.runtime.checkpoint.StateAssignmentOperation.checkStateMappingCompleteness(StateAssignmentOperation.java:712) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.checkpoint.StateAssignmentOperation.assignStates(StateAssignmentOperation.java:100) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreLatestCheckpointedStateInternal(CheckpointCoordinator.java:1562) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreInitialCheckpointIfPresent(CheckpointCoordinator.java:1476) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.createAndRestoreExecutionGraph(DefaultExecutionGraphFactory.java:134) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:342) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) ~[?:?]
2021-12-11 14:25:27,017 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting StatefulFunctionsClusterEntryPoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally..
2021-12-11 14:25:27,021 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Shutting down rest endpoint.
2021-12-11 14:25:27,025 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124
2021-12-11 14:25:27,034 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Removing cache directory /tmp/flink-web-6c2dafc9-bb7d-489a-9e2d-cf78e3f19b67/flink-web-ui
2021-12-11 14:25:27,035 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Stopping DefaultLeaderElectionService.
2021-12-11 14:25:27,035 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver [] - Closing ZooKeeperLeaderElectionDriver{leaderPath='/leader/rest_server_lock'}
2021-12-11 14:25:27,036 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Shut down complete.
2021-12-11 14:25:27,036 INFO org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponent [] - Closing components.
2021-12-11 14:25:27,037 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Stopping DefaultLeaderRetrievalService.
2021-12-11 14:25:27,037 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver [] - Closing ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/dispatcher_lock'}.
2021-12-11 14:25:27,037 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Stopping DefaultLeaderRetrievalService.
2021-12-11 14:25:27,037 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver [] - Closing ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/resource_manager_lock'}.
2021-12-11 14:25:27,038 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Stopping DefaultLeaderElectionService.
2021-12-11 14:25:27,038 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver [] - Closing ZooKeeperLeaderElectionDriver{leaderPath='/leader/dispatcher_lock'}
2021-12-11 14:25:27,039 INFO org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - Stopping JobDispatcherLeaderProcess.
2021-12-11 14:25:27,040 INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Closing the slot manager.
2021-12-11 14:25:27,040 INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Suspending the slot manager.
2021-12-11 14:25:27,041 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Stopping DefaultLeaderElectionService.
2021-12-11 14:25:27,041 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver [] - Closing ZooKeeperLeaderElectionDriver{leaderPath='/leader/resource_manager_lock'}
I believe this is caused by not being able to specify IDs for Flink operators (as told here) when using StateFun. It was working fine in the beginning: the operators were assigned some random IDs and checkpointing went just fine. After the restart, the operators are assigned different random IDs, so when the jobmanager (the StateFun master in this case) tries to load the state "2edd7b5dafb2c271440b25f6da5f4532", it fails to find the operator it was originally assigned to.
Can someone confirm whether my reasoning is correct and/or give me directions for making my StateFun app work with high availability?
Another interesting thing to note: after several restarts of the jobmanager pod with the above exception, it sometimes gets past the "Restoring job 00000000000000000000000000000000 from Checkpoint ..." line somehow (?), logging "No master state to restore" (link), which leaves me unsure whether it really recovered or simply started over, discarding the state from the last successful checkpoint. What might be causing this? Is it really recovering from the checkpoint successfully?
2- For Kubernetes deployments, the StateFun deployment documentation (link) uses the Deployment resource type for the jobmanager component. On the other hand, the Flink deployment documentation (Standalone / Kubernetes section) (link) uses the Job type for the jobmanager in a highly available setup (the jobmanager-application-ha.yaml file).
Basically, since Kubernetes will restart the pod on failures, either Job or Deployment can be used. The thing is, when we try to stop the job with a savepoint and the Deployment type is used, Kubernetes restarts the pod regardless of successful savepoint creation and a success exit status (0).
Are we not supposed to stop StateFun apps with a savepoint when running on Kubernetes? I am aware of the related bug (link), but although cancel-with-savepoint seems to be deprecated, I can still perform one. Are we supposed to just delete the Deployment, as told in the "High availability data clean up" section (link)?
UPDATE for the first question: I turned on debug logging and captured a session with the exception followed by a successful startup. The following is from the unsuccessful one:
...
2021-12-11 21:55:14,001 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash '32d5ca33c915e65563a5c7f4d62703ad' for node 'router (my-ingress-1-in)-5' {id: 5, parallelism: 1, user function: }
2021-12-11 21:55:14,001 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash '33b86fe798648d648b237ddfc986200d' for node 'router (my-ingress-2-in)-4' {id: 4, parallelism: 1, user function: }
2021-12-11 21:55:14,001 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash 'bd4c3fa1570bbcf606f2dabddd61ed7f' for node 'router (my-ingress-3-in)-6' {id: 6, parallelism: 1, user function: }
and this is from the successful one:
2021-12-11 21:55:34,543 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash 'a1448ecf31ac98d2215c38bfd119abe0' for node 'router (my-ingress-3-in)-5' {id: 5, parallelism: 1, user function: }
2021-12-11 21:55:34,543 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash '05037ff96baea131d9cf1390846efd98' for node 'router (my-ingress-1-in)-4' {id: 4, parallelism: 1, user function: }
2021-12-11 21:55:34,543 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash '2edd7b5dafb2c271440b25f6da5f4532' for node 'router (my-ingress-2-in)-6' {id: 6, parallelism: 1, user function: }
It seems that the hashes are generated differently between the two runs.
ANSWER
Answered 2021-Dec-15 at 16:51
In StateFun <= 3.2, routers do not have manually specified UIDs. While Flink's internal UID generation is deterministic, the way StateFun generates the underlying stream graph may not be in some cases. This is a bug. I've opened a PR to fix this in a backwards-compatible way[1].
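For context, explicit UIDs are the general Flink mechanism at play here. A minimal DataStream sketch of pinning state to a stable operator ID (not something StateFun <= 3.2 exposes for its routers) would look roughly like this:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Sketch of explicit operator UIDs in the plain DataStream API. With a
// stable uid(), checkpointed state can be matched back to its operator
// on restore even if the surrounding job graph changes.
public class UidExample
{
    public static void main(String[] args) throws Exception
    {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")
           .map(String::toUpperCase)
           .uid("upper-map")   // stable ID used to match state on restore
           .name("upper map")  // display name only; not used for state matching
           .print();

        env.execute("uid-example");
    }
}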
QUESTION
I am using Flink v1.13.2.
Many of the process functions use registerProcessingTimeTimer to clear state:
public class ProcessA ...
{
    @Override
    public void processElement(Object value, Context ctx, Collector<...> out) throws Exception
    {
        if (...)
        {
            ctx.timerService().registerProcessingTimeTimer(value.getTimestampMs() + 23232);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector out)
    {
        state.clear();
    }
}
And many of the process functions use StateTtlConfig:
public class ProcessB extends ...
{
    @Override
    public void open(Configuration parameters)
    {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.minutes(15))
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            .build();
        ValueStateDescriptor descriptor = ...
        descriptor.enableTimeToLive(ttlConfig);
    }

    @Override
    public void processElement(...) throws Exception
    {
    }
}
And I am using RocksDB for state management.
Questions:
- Where are the timers created by timerService stored? (In RocksDB or in task memory?)
- Where is the state time-to-live created by the StateTtlConfig stored?
- Is anything saved into memory when I use timerService or the state TTL?
- If I have millions of keys, which approach should I prefer?
- Can creating millions of keys lead to an out-of-memory exception when I use timerService?
- Can creating millions of keys lead to an out-of-memory exception when I use the state TTL?
ANSWER
Answered 2021-Dec-03 at 12:46
Where will the timers created by timerService be stored? (In RocksDB or in task memory?)
By default, in RocksDB. You also have the option of keeping your timers on the heap, but unless they are few in number this is a bad idea, because checkpointing heap-based timers blocks the main stream-processing thread, and they add stress to the garbage collector.
Where will the state time-to-live created by the StateTtlConfig be stored?
This adds a long (the last-modification timestamp) to each item of state, stored in the state backend, so in RocksDB.
Is anything saved into memory when I use timerService or the state TTL?
Not if you are using RocksDB for both state and timers.
If I have millions of keys, which approach should I prefer?
Keep your timers in RocksDB.
Can creating millions of keys lead to an out-of-memory exception when I use timerService or the state TTL?
It is always possible to have out-of-memory exceptions with RocksDB, irrespective of what you are storing in it; the native library is not always well behaved about living within the memory it has been allocated. But it shouldn't grow in an unbounded way, and these choices you make about timers and state TTL shouldn't make any difference.
Improvements were made in Flink 1.14 (by upgrading to a newer version of RocksDB), but some problems are still being seen. In the worst case you might need to set the actual process memory limit in the OS to something larger than what you tell Flink it can use.
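To make the "keep your timers in RocksDB" recommendation concrete, a sketch against the Flink 1.13-era API might look like the following (ROCKSDB is already the default priority-queue type for the RocksDB backend, so this mostly spells out the knob):

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Sketch: keep both keyed state and timers in RocksDB.
public class RocksDbTimers
{
    public static void main(String[] args) throws Exception
    {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend(true); // incremental checkpoints
        // ROCKSDB stores timers in the state backend; HEAP would keep them on the JVM heap.
        backend.setPriorityQueueStateType(EmbeddedRocksDBStateBackend.PriorityQueueStateType.ROCKSDB);
        env.setStateBackend(backend);

        // The equivalent flink-conf.yaml setting:
        //   state.backend.rocksdb.timer-service.factory: ROCKSDB
    }
}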
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install rocksdb
Support