spark-on-k8s-operator | Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes
kandi X-RAY | spark-on-k8s-operator Summary
Customization of Spark pods, e.g., mounting arbitrary volumes and setting pod affinity, is implemented using a Kubernetes Mutating Admission Webhook, which became beta in Kubernetes 1.9. The mutating admission webhook is disabled by default if you install the operator using the Helm chart. Check out the Quick Start Guide on how to enable the webhook.
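If you install via the Helm chart, enabling the webhook is typically just a matter of setting a chart value at install time. As a minimal sketch (the value name here is assumed and may differ between chart versions, so verify against the Quick Start Guide):

# install the operator with the mutating admission webhook enabled
helm install my-release spark-operator/spark-operator \
  --namespace spark-operator --create-namespace \
  --set webhook.enable=true   # assumed value name; check the chart's values.yaml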
Community Discussions
Trending Discussions on spark-on-k8s-operator
QUESTION
I am trying to install ANY working version of spark-operator https://console.cloud.google.com/gcr/images/spark-operator/GLOBAL/spark-operator?tag=v1beta2-1.3.1-3.1.1 on my local Kubernetes cluster. However, the spark-operator pod is stuck in ImagePullBackOff, trying to pull an image tag that does not exist.
Commands:
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install v1beta2-1.2.1-3.0.0 spark-operator/spark-operator --namespace spark-operator6 --create-namespace
kubectl get pods -n spark-operator6
NAME                                                  READY   STATUS             RESTARTS   AGE
v1beta2-1.2.1-3.0.0-spark-operator-67577fd4d4-m9zmw   0/1     ImagePullBackOff   0          6s
kubectl describe pod v1beta2-1.2.1-3.0.0-spark-operator-67577fd4d4-m9zmw
Image: gcr.io/spark-operator/spark-operator:v1beta2-1.3.1-3.1.1 - a different tag than the one I installed!
Failed to pull image "gcr.io/spark-operator/spark-operator:v1beta2-1.3.1-3.1.1": rpc error: code = Unknown desc = Error response from daemon: manifest for gcr.io/spark-operator/spark-operator:v1beta2-1.3.1-3.1.1 not found: manifest unknown: Failed to fetch "v1beta2-1.3.1-3.1.1" from request "/v2/spark-operator/spark-operator/manifests/v1beta2-1.3.1-3.1.1"
Now this just seems like an incorrect version. I tried to override it via the helm install flag --operatorVersion, but that is completely ignored.
I took a look at its template via helm template v1beta2-1.2.1-3.0.0 spark-operator/spark-operator --namespace spark-operator6 --set operatorVersion=v1beta2-1.2.1-3.0.0 > template.yaml
and all the app.kubernetes.io/instance labels were on the correct version, but ALL the app.kubernetes.io/version labels were showing the non-existent v1beta2-1.3.1-3.1.1. I corrected it and tried to install with the corrected values via
helm install -f template.yaml v1beta2-1.2.1-3.0.0 spark-operator/spark-operator --namespace spark-operator6
but this was completely ignored again.
I feel like I am missing something very basic here, helm install cannot possibly be that broken. Any help is much appreciated, thanks.
ANSWER
Answered 2022-Jan-22 at 15:58
Try this one:
helm upgrade -i my-release spark-operator/spark-operator --namespace spark-operator --set image.tag=v1beta2-1.3.2-3.1.1 --set image.repository=ghcr.io/googlecloudplatform/spark-operator
The key things here are image.tag and image.repository.
I am assuming you have a namespace for the spark-operator; please adjust the namespace according to your requirements.
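A quick way to double-check which repository and tag the chart defaults to, before overriding them, is to inspect the chart's default values (assuming Helm 3):

# print the chart's default values and look at the image section
helm show values spark-operator/spark-operator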
QUESTION
I'm using the Google spark-operator and some Calico network policies to protect the namespaces.
The Spark driver pods need to be able to communicate with the kubernetes service in the default namespace to speak with the api-server.
This is what I get:
ANSWER
Answered 2021-May-04 at 15:08
So... In the end...
Network policies don't work on services that don't target pods, which is the case for this particular kubernetes service sitting quietly in the default namespace. It's a special service that always points to the api-server.
The solution is to retrieve the api-server's real IP and allow egress to it.
To find this IP you can use this command:
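The answer's exact command is not reproduced here; one standard way to get the api-server endpoint IP(s) is to look at the Endpoints object behind the kubernetes service, for example:

# list the real endpoint IP(s) that the kubernetes service in default points to
kubectl get endpoints kubernetes -n default -o wide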
QUESTION
I have a K8s cluster up and running, on VMs inside VMWare Workstation, as of now. I'm trying to deploy a Spark application natively using the official documentation from here. However, I also landed on this article which made it clearer, I felt.
Now, earlier my setup was running inside nested VMs, basically my machine is on Win10 and I had an Ubuntu VM inside which I had 3 more VMs running for the cluster (not the best idea, I know).
When I tried to run my setup by following the article mentioned, I first created a service account inside the cluster called spark, then created a clusterrolebinding called spark-role, gave edit as the clusterrole, and assigned it to the spark service account so that the Spark driver pod has sufficient permissions.
I then try to run the example SparkPi job using this command line:
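The exact command line is not shown above; for reference, a typical SparkPi submission against a Kubernetes master looks roughly like this (API-server address, container image, and jar path are placeholders, not the asker's values):

bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar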
ANSWER
Answered 2020-May-04 at 15:20
Make sure the Kubernetes version that you are deploying is compatible with the Spark version that you are using.
Apache Spark uses the Kubernetes client library to communicate with the Kubernetes cluster.
As of today, the latest stable Spark version is 2.4.5, which includes Kubernetes client version 4.6.3.
Checking the compatibility matrix of the Kubernetes client (here), the supported Kubernetes versions go all the way up to v1.17.0.
Based on my personal experience, Apache Spark 2.4.5 works well with Kubernetes v1.15.3; I have had problems with more recent versions.
When an unsupported Kubernetes version is used, you get logs like the ones you are describing:
QUESTION
Is it possible to recover automatically from an exception thrown during query execution?
Context: I'm developing a Spark application that reads data from a Kafka topic, processes the data, and outputs to S3. However, after running for a couple of days in production, the Spark application faces some network hiccups from S3 that cause an exception to be thrown and stop the application. It's also worth mentioning that this application runs on Kubernetes using GCP's Spark k8s Operator.
From what I've seen so far, these exceptions are minor and a simple restart of the application solves the issue. Can we handle those exceptions and restart the structured streaming query automatically?
Here's an example of a thrown exception:
ANSWER
Answered 2020-May-08 at 14:55
No, there is no reliable way to do this. (And "no" is also an answer.)
Logic for checking exceptions generally runs via try/catch on the driver.
Unexpected situations at the executor level are already handled by the Spark framework itself for Structured Streaming; if the error is non-recoverable, the app/job simply crashes after signalling the error(s) back to the driver, unless you code try/catch within the various foreachXXX constructs.
That said, it is not clear that a micro-batch would be recoverable with such an approach; as far as I can see, part of the micro-batch is very likely lost. It is hard to test, though.
Given that Spark handles these situations internally in ways you cannot hook into, there is no supported place to insert a restart loop or try/catch into the program's execution. Likewise, broadcast variables are an issue, although some claim to have techniques around this. Either way, it is not in the spirit of the framework.
So, a good question, as I have wondered about this myself in the past.
QUESTION
I am following this tutorial to run the SparkPi application using a kubectl command, from here: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/quick-start-guide.md#running-the-examples
When I submit kubectl apply -f spark-pi.yaml and check the logs using kubectl logs spark-pi-driver -f, I see this exception.
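For reference, a minimal spark-pi.yaml in the spirit of the quick start guide examples looks roughly like this (image, Spark version, and jar path are illustrative and may not match the asker's setup):

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"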
ANSWER
Answered 2020-Mar-20 at 03:00
As pointed out by @Andreas, ${SPARK_HOME}/jars doesn't contain jackson-datatype-jsr310.
You can try to modify spark-docker/Dockerfile and see how it works:
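The original Dockerfile snippet is not reproduced here; as a sketch, pulling the missing module into the image could look something like this (the 2.6.7 version is an assumption and should match the Jackson version already present in ${SPARK_HOME}/jars):

# spark-docker/Dockerfile (sketch): add the missing Jackson JSR-310 module to Spark's classpath
ADD https://repo1.maven.org/maven2/com/fasterxml/jackson/datatype/jackson-datatype-jsr310/2.6.7/jackson-datatype-jsr310-2.6.7.jar ${SPARK_HOME}/jars/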
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install spark-on-k8s-operator
Get started quickly with the Kubernetes Operator for Apache Spark using the Quick Start Guide. If you are running the Kubernetes Operator for Apache Spark on Google Kubernetes Engine and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the GCP guide. For more information, check the Design, API Specification and detailed User Guide.
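As a minimal sketch of the Helm-based install from the Quick Start Guide (release name and namespace are placeholders; adjust to your environment):

# add the chart repository and install the operator into its own namespace
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace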