HiBench | HiBench is a big data benchmark suite
kandi X-RAY | HiBench Summary
HiBench is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilization. It contains a set of Hadoop, Spark and streaming workloads, including Sort, WordCount, TeraSort, Repartition, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight and enhanced DFSIO. It also contains several streaming workloads for Spark Streaming, Flink, Storm and Gearpump.
Top functions reviewed by kandi - BETA
- Main method for testing
- Generate a local bitmask command file
- Configure the job1 stage1
- Configure stage 5
- The main method
- Configure the phase 2
- Run Saxpy
- Multiply a block vector
- Multiply a block vector by a vector
- Make a block of encoded data from an output file
- Load query node info
- Deserialize fields
- Demonstrates how to run the Mahout algorithm
- Main method to submit a Phoenix job
- Generate a page words and titles
- Reduce keys and values
- The main entry point
- Compute the dot product of two matrices
- Demonstrates how to submit a Phoenix job
- Run a map job
- Calculate min block vector
- Performs a bit-OR operation on a block vector
- Main entry point
- Main method
- Submit a map job
- Create a record writer
HiBench Key Features
HiBench Examples and Code Snippets
Community Discussions
Trending Discussions on HiBench
QUESTION
I am running some experiments to test the fault tolerance capabilities of Apache Flink. I am currently using the HiBench framework with the WordCount micro benchmark implemented for Flink.
I noticed that if I kill a TaskManager during an execution, the state of the Flink operators is recovered after the automatic redeploy, but many (all?) tuples sent from the benchmark to Kafka are missed (stored in Kafka but never received by Flink).
It seems that after the recovery, the FlinkKafkaConsumer (the benchmark uses FlinkKafkaConsumer08), instead of resuming from the last offset read before the failure, starts reading from the latest available offset, losing all the events sent while it was down.
Any suggestion?
Thanks!
ANSWER
Answered 2018-Apr-09 at 14:46
The problem was with the HiBench framework itself and the Flink version it used.
I had to update the version of Flink in the benchmark in order to use the "setStartFromGroupOffsets()" method in the Kafka consumer.
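A minimal sketch of what the fix looks like on the consumer side. This is an illustration, not HiBench's actual source: the topic name, group id, broker and ZooKeeper addresses are placeholders, and the checkpoint interval is arbitrary. The key points are that checkpointing must be enabled so Kafka offsets become part of Flink's recoverable state, and that `setStartFromGroupOffsets()` (available only in newer Flink releases, hence the version upgrade) controls where a fresh start resumes reading.

```java
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaResumeSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpointing makes the consumer's Kafka offsets part of Flink's
        // fault-tolerant state, so a TaskManager failure resumes from the
        // last checkpointed offset instead of losing in-flight events.
        env.enableCheckpointing(5000); // interval is a placeholder

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.connect", "localhost:2181"); // needed by the 0.8 consumer
        props.setProperty("group.id", "hibench-wordcount");       // placeholder group id

        FlinkKafkaConsumer08<String> consumer = new FlinkKafkaConsumer08<>(
                "wordcount-topic",            // placeholder topic
                new SimpleStringSchema(),
                props);

        // On a start without a checkpoint, resume from the committed group
        // offsets rather than jumping to the latest record. When restoring
        // from a checkpoint, the checkpointed offsets take precedence.
        consumer.setStartFromGroupOffsets();

        DataStream<String> lines = env.addSource(consumer);
        lines.print();

        env.execute("kafka-resume-sketch");
    }
}
```

Running this requires a Flink cluster with the Kafka 0.8 connector on the classpath and a reachable Kafka broker, so it is a configuration sketch rather than a standalone program.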
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install HiBench
Run HadoopBench
Run SparkBench
Run StreamingBench (Spark streaming, Flink, Storm, Gearpump)