spark-introduction | presentation given at the IBM Data Science | Machine Learning library
kandi X-RAY | spark-introduction Summary
The presentation given at the IBM Data Science Connect Meeting titled Introduction to Apache Spark
Community Discussions
Trending Discussions on spark-introduction
QUESTION
I'm reading an article on Apache Spark and I came across the following sentence:
"Hadoop as a big data processing technology has been around for 10 years and has proven to be the solution of choice for processing large data sets. MapReduce is a great solution for one-pass computations, but not very efficient for use cases that require multi-pass computations and algorithms." (Full article)
Searching the web mostly turns up results about the difference between one-pass and multi-pass compilers (see, for instance, this SO question).
However, I'm not sure whether that answer also applies to data processing. Can somebody explain to me what one-pass computation and multi-pass computation are, and why the latter is better and is therefore used in Spark?
ANSWER
Answered 2019-Oct-16 at 08:11
A one-pass computation reads the dataset once and produces its result from that single scan. A multi-pass computation has to go over the same dataset several times, as iterative algorithms do. With MapReduce, each pass is typically a separate job that reads its input from disk and writes its output back to disk, which is why multi-pass workloads are inefficient there. Apache Spark lets you read the data from disk once and cache it in memory, and then run the multiple passes against the cached copy. Those passes are fast because the data is already in the memory of the machines, so Spark does not need to read it from disk again, which saves a lot of I/O time. So multi-pass computation is not better in itself; Spark is simply better suited to it. Spark is commonly described as an in-memory processing framework, meaning that the data being computed on can be kept in memory between operations.
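To make the difference concrete, here is a minimal sketch in plain Python. It does not use Spark's API. `DiskDataset`, `one_pass_mean`, `iterative_mean_uncached` and `iterative_mean_cached` are hypothetical names invented for this illustration, and the "disk" is just a list with a counter that records how many full scans were made. The iterative functions approximate the mean by gradient descent only because that is a simple algorithm that needs many passes.

```python
class DiskDataset:
    """Hypothetical stand-in for a dataset stored on disk; counts full scans."""

    def __init__(self, values):
        self._values = list(values)
        self.scans = 0

    def read(self):
        # Every call models one complete read of the dataset from disk.
        self.scans += 1
        return list(self._values)


def one_pass_mean(dataset):
    # One-pass computation: the result comes out of a single scan.
    total = 0.0
    count = 0
    for x in dataset.read():
        total += x
        count += 1
    return total / count


def iterative_mean_uncached(dataset, steps=10, lr=0.5):
    # Multi-pass computation where every iteration rescans the "disk",
    # roughly how a chain of separate MapReduce jobs would behave.
    estimate = 0.0
    for _ in range(steps):
        data = dataset.read()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= lr * gradient
    return estimate


def iterative_mean_cached(dataset, steps=10, lr=0.5):
    # Same multi-pass computation, but the data is read once and kept in
    # memory, which is the idea behind caching a dataset in Spark.
    cached = dataset.read()
    estimate = 0.0
    for _ in range(steps):
        gradient = sum(estimate - x for x in cached) / len(cached)
        estimate -= lr * gradient
    return estimate


if __name__ == "__main__":
    uncached = DiskDataset([1, 2, 3, 4, 5])
    cached = DiskDataset([1, 2, 3, 4, 5])
    iterative_mean_uncached(uncached)
    iterative_mean_cached(cached)
    print("scans without caching:", uncached.scans)
    print("scans with caching:", cached.scans)
```

Both iterative functions compute the same estimate. The only difference is how often the underlying data is read, and that read count is the cost that in-memory caching is meant to avoid.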
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported