spark-on-k8s-operator | Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes
kandi X-RAY | spark-on-k8s-operator Summary
Customization of Spark pods, e.g., mounting arbitrary volumes and setting pod affinity, is implemented using a Kubernetes Mutating Admission Webhook, which became beta in Kubernetes 1.9. The mutating admission webhook is disabled by default if you install the operator using the Helm chart. Check out the Quick Start Guide on how to enable the webhook.
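If you install via the Helm chart, enabling the webhook is typically just a matter of setting a chart value at install time. As a minimal sketch (the value name here is assumed and may differ between chart versions, so verify against the Quick Start Guide):

# install the operator with the mutating admission webhook enabled
helm install my-release spark-operator/spark-operator \
  --namespace spark-operator --create-namespace \
  --set webhook.enable=true   # assumed value name; check the chart's values.yaml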
Community Discussions
Trending Discussions on spark-on-k8s-operator
QUESTION
I am trying to install ANY working version of spark-operator https://console.cloud.google.com/gcr/images/spark-operator/GLOBAL/spark-operator?tag=v1beta2-1.3.1-3.1.1 on my local Kubernetes cluster. However, the spark-operator pod is stuck in ImagePullBackOff, trying to pull an image tag that does not exist.
Commands:
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install v1beta2-1.2.1-3.0.0 spark-operator/spark-operator --namespace spark-operator6 --create-namespace
kubectl get pods -n spark-operator6
NAME                                                  READY   STATUS             RESTARTS   AGE
v1beta2-1.2.1-3.0.0-spark-operator-67577fd4d4-m9zmw   0/1     ImagePullBackOff   0          6s
kubectl describe pod v1beta2-1.2.1-3.0.0-spark-operator-67577fd4d4-m9zmw
Image: gcr.io/spark-operator/spark-operator:v1beta2-1.3.1-3.1.1 - a different tag than the one I installed!
Failed to pull image "gcr.io/spark-operator/spark-operator:v1beta2-1.3.1-3.1.1": rpc error: code = Unknown desc = Error response from daemon: manifest for gcr.io/spark-operator/spark-operator:v1beta2-1.3.1-3.1.1 not found: manifest unknown: Failed to fetch "v1beta2-1.3.1-3.1.1" from request "/v2/spark-operator/spark-operator/manifests/v1beta2-1.3.1-3.1.1"
Now this just seems like an incorrect version. I tried to override it via the helm install flag --operatorVersion, but that is completely ignored.
I took a look at its template via helm template v1beta2-1.2.1-3.0.0 spark-operator/spark-operator --namespace spark-operator6 --set operatorVersion=v1beta2-1.2.1-3.0.0 > template.yaml
and all the app.kubernetes.io/instance labels were on the correct version, but ALL the app.kubernetes.io/version labels were showing the non-existent v1beta2-1.3.1-3.1.1. I corrected it and tried to install with the corrected values via
helm install -f template.yaml v1beta2-1.2.1-3.0.0 spark-operator/spark-operator --namespace spark-operator6
but this was completely ignored again.
I feel like I am missing something very basic here, helm install cannot possibly be that broken. Any help is much appreciated, thanks.
ANSWER
Answered 2022-Jan-22 at 15:58
Try this one:
helm upgrade -i my-release spark-operator/spark-operator --namespace spark-operator --set image.tag=v1beta2-1.3.2-3.1.1 --set image.repository=ghcr.io/googlecloudplatform/spark-operator
The key things here are image.tag and image.repository.
I am assuming you have a namespace for the spark-operator; please adjust the namespace according to your requirements.
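A quick way to double-check which repository and tag the chart defaults to, before overriding them, is to inspect the chart's default values (assuming Helm 3):

# print the chart's default values and look at the image section
helm show values spark-operator/spark-operator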
QUESTION
I'm using the Google spark-operator and some Calico network policies to protect the namespaces.
The Spark driver pods need to be able to communicate with the kubernetes service in the default namespace to speak with the api-server.
This is what I get:
ANSWER
Answered 2021-May-04 at 15:08
So... In the end...
Network policies don't work on services that don't target pods, which is the case for this particular kubernetes service sitting quietly in the default namespace. It's a special service that always points to the api-server.
The solution is to retrieve the api-server's real IP and allow egress to it.
To find this IP you can use this command:
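The answer's exact command is not reproduced here; one standard way to get the api-server endpoint IP(s) is to look at the Endpoints object behind the kubernetes service, for example:

# list the real endpoint IP(s) that the kubernetes service in default points to
kubectl get endpoints kubernetes -n default -o wide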
QUESTION
I have a K8s cluster up and running, on VMs inside VMWare Workstation, as of now. I'm trying to deploy a Spark application natively using the official documentation from here. However, I also landed on this article which made it clearer, I felt.
Now, earlier my setup was running inside nested VMs, basically my machine is on Win10 and I had an Ubuntu VM inside which I had 3 more VMs running for the cluster (not the best idea, I know).
When I tried to run my setup by following the article mentioned, I first created a service account inside the cluster called spark, then created a clusterrolebinding called spark-role, gave edit as the clusterrole, and assigned it to the spark service account so that the Spark driver pod has sufficient permissions.
I then try to run the example SparkPi job using this command line:
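The exact command line is not shown above; for reference, a typical SparkPi submission against a Kubernetes master looks roughly like this (API-server address, container image, and jar path are placeholders, not the asker's values):

bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar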
ANSWER
Answered 2020-May-04 at 15:20
Make sure the Kubernetes version that you are deploying is compatible with the Spark version that you are using.
Apache Spark uses the Kubernetes client library to communicate with the Kubernetes cluster.
As of today, the latest stable Spark version is 2.4.5, which includes Kubernetes client version 4.6.3.
Checking the compatibility matrix of the Kubernetes client (here), the supported Kubernetes versions go all the way up to v1.17.0.
Based on my personal experience, Apache Spark 2.4.5 works well with Kubernetes v1.15.3; I have had problems with more recent versions.
When an unsupported Kubernetes version is used, you get logs like the ones you are describing:
QUESTION
Is it possible to recover automatically from an exception thrown during query execution?
Context: I'm developing a Spark application that reads data from a Kafka topic, processes the data, and outputs to S3. However, after running for a couple of days in production, the Spark application faces some network hiccups from S3 that cause an exception to be thrown and stop the application. It's also worth mentioning that this application runs on Kubernetes using GCP's Spark k8s Operator.
From what I've seen so far, these exceptions are minor and a simple restart of the application solves the issue. Can we handle those exceptions and restart the structured streaming query automatically?
Here's an example of a thrown exception:
ANSWER
Answered 2020-May-08 at 14:55
No, there is no reliable way to do this. (And "no" is also an answer.)
Logic for checking exceptions generally runs via try/catch on the driver.
Unexpected situations at the executor level are already handled by the Spark framework itself for Structured Streaming; if the error is non-recoverable, the app/job simply crashes after signalling the error(s) back to the driver, unless you code try/catch within the various foreachXXX constructs.
That said, it is not clear that a micro-batch would be recoverable with such an approach; as far as I can see, part of the micro-batch is very likely lost. It is hard to test, though.
Given that Spark handles these situations internally in ways you cannot hook into, there is no supported place to insert a restart loop or try/catch into the program's execution. Likewise, broadcast variables are an issue, although some claim to have techniques around this. Either way, it is not in the spirit of the framework.
So, a good question, as I have wondered about this myself in the past.
QUESTION
I am following this tutorial to run the SparkPi application using a kubectl command, from here: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/quick-start-guide.md#running-the-examples
When I submit kubectl apply -f spark-pi.yaml and check the logs using kubectl logs spark-pi-driver -f, I see this exception.
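For reference, a minimal spark-pi.yaml in the spirit of the quick start guide examples looks roughly like this (image, Spark version, and jar path are illustrative and may not match the asker's setup):

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"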
ANSWER
Answered 2020-Mar-20 at 03:00
As pointed out by @Andreas, ${SPARK_HOME}/jars doesn't contain jackson-datatype-jsr310.
You can try to modify spark-docker/Dockerfile and see how it works:
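The original Dockerfile snippet is not reproduced here; as a sketch, pulling the missing module into the image could look something like this (the 2.6.7 version is an assumption and should match the Jackson version already present in ${SPARK_HOME}/jars):

# spark-docker/Dockerfile (sketch): add the missing Jackson JSR-310 module to Spark's classpath
ADD https://repo1.maven.org/maven2/com/fasterxml/jackson/datatype/jackson-datatype-jsr310/2.6.7/jackson-datatype-jsr310-2.6.7.jar ${SPARK_HOME}/jars/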
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install spark-on-k8s-operator
Get started quickly with the Kubernetes Operator for Apache Spark using the Quick Start Guide. If you are running the Kubernetes Operator for Apache Spark on Google Kubernetes Engine and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the GCP guide. For more information, check the Design, API Specification and detailed User Guide.
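As a minimal sketch of the Helm-based install from the Quick Start Guide (release name and namespace are placeholders; adjust to your environment):

# add the chart repository and install the operator into its own namespace
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace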