spark-bigquery | Google BigQuery support for Spark, SQL, and DataFrames
kandi X-RAY | spark-bigquery Summary
THIS PROJECT IS IN MAINTENANCE MODE DUE TO THE FACT THAT IT’S NOT WIDELY USED WITHIN SPOTIFY. WE’LL PROVIDE BEST EFFORT SUPPORT FOR ISSUES AND PULL REQUESTS BUT DO EXPECT DELAY IN RESPONSES.
Trending Discussions on spark-bigquery
QUESTION
I am submitting a Spark job to Dataproc this way:
gcloud dataproc jobs submit spark --cluster=$CLUSTER --region=$REGION --properties spark.jars.packages=com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.19.1, spark.submit.deployMode=cluster --class path.to.my.main.class --jars=path.to.jars -- "-p" "some_arg" "-z" "some_other_arg"
But I am getting this error:
ERROR: (gcloud.dataproc.jobs.submit.spark) unrecognized arguments: spark.submit.deployMode=cluster
Any idea why? Thank you in advance for your help.
It works fine this way (without the cluster mode):
gcloud dataproc jobs submit spark --cluster=$CLUSTER --region=$REGION --properties spark.jars.packages=com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.19.1 --class path.to.my.main.class --jars=path.to.jars -- "-p" "some_arg" "-z" "some_other_arg"
...ANSWER
Answered 2021-Apr-30 at 15:10
It seems you have a space between the first property and the second. Either remove it or surround both of them with quotes.
Another option is to replace the comma delimiter itself, as in the sketch below.
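As a hedged sketch of both fixes (the package coordinate is taken from the question; the ^#^ prefix is gcloud's flag-delimiter escaping, documented under gcloud topic escaping):

--properties="spark.jars.packages=com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.19.1,spark.submit.deployMode=cluster"

--properties=^#^spark.jars.packages=com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.19.1#spark.submit.deployMode=cluster

In the first form the quotes and the removed space keep both properties inside a single flag value; the second form changes the delimiter to #, which is also useful when a property value itself contains commas (for example, several packages listed in spark.jars.packages).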
QUESTION
Here's what I have:
...ANSWER
Answered 2021-Jan-21 at 20:28
This is described in the documentation for gcloud dataproc workflow-templates add-job pyspark:
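For reference, the documented synopsis of that command is roughly the following (the flag list is abbreviated, so treat it as a sketch rather than the full reference):

gcloud dataproc workflow-templates add-job pyspark PY_FILE --step-id=STEP_ID --workflow-template=TEMPLATE [--py-files=...] [--jars=...] [-- JOB_ARGS ...]

Note that arguments for the job itself go after the bare -- separator, as with gcloud dataproc jobs submit.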
QUESTION
Here is the Spark session I am creating. I include the latest Spark BigQuery connector jar for Dataproc 1.5.
...ANSWER
Answered 2021-Feb-10 at 23:39
I think the description in SPARK-21752 is relevant: by this time the application is already launched and you cannot change its classpath. Please try running pyspark --jars gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar instead (and then you can skip the .config() part).
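For reference, a minimal sketch of what this might look like once the jar is supplied at launch time (the public table name is a placeholder, not from the question):

# Launched with: pyspark --jars gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar
from pyspark.sql import SparkSession

# No .config("spark.jars", ...) is needed here: per SPARK-21752, the JVM
# classpath is fixed once the application has started, so the jar has to
# come from the launch command instead.
spark = SparkSession.builder.appName("bq-example").getOrCreate()

df = (spark.read.format("bigquery")
      .option("table", "bigquery-public-data.samples.shakespeare")
      .load())
df.show(5)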
QUESTION
I am trying to store a date datatype column in BigQuery via Spark.
...ANSWER
Answered 2021-Feb-01 at 14:30
The resolution is to use ORC as the intermediateFormat. With Avro as the intermediate format it does not work, and we can't use the default Parquet format because we have an array data type in our table, where BigQuery creates an intermediate structure as explained here: Save Array in BigQuery using Java.
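A minimal sketch of such a write, assuming the connector's documented intermediateFormat and temporaryGcsBucket options (the table, bucket, and schema below are illustrative placeholders):

import datetime
from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, DateType, StringType, StructField, StructType

spark = SparkSession.builder.appName("bq-date-write").getOrCreate()

# A DATE column plus an array column -- the combination discussed above.
schema = StructType([
    StructField("event_date", DateType()),
    StructField("tags", ArrayType(StringType())),
])
df = spark.createDataFrame([(datetime.date(2021, 2, 1), ["a", "b"])], schema)

(df.write.format("bigquery")
    .option("table", "my_dataset.my_table")          # placeholder
    .option("temporaryGcsBucket", "my-temp-bucket")  # placeholder
    .option("intermediateFormat", "orc")             # per the answer, avro/parquet fail here
    .mode("append")
    .save())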
QUESTION
I'm trying to connect a BigQuery dataset to Databricks and run a script using PySpark.
What I've done so far:
I added the BigQuery JSON API key file to Databricks in DBFS for connection access.
Then I added spark-bigquery-latest.jar to the cluster libraries and ran my script.
When I ran this script, I didn't face any errors.
...ANSWER
Answered 2020-Dec-15 at 08:56
Can you avoid using queries and just use the table option?
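A minimal sketch of the suggested approach, reading the table directly rather than submitting a query (the project, dataset, and column names are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-table-read").getOrCreate()

# Read the BigQuery table itself; the connector pushes down column pruning
# and simple filters, so a separate SQL query is often unnecessary.
df = (spark.read.format("bigquery")
      .option("table", "my-project.my_dataset.my_table")
      .load())
df.select("col_a").where("col_b > 100").show()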
QUESTION
I have a Spark job that runs in Dataproc, and I want to load the results into BigQuery. I know that I have to add the spark-bigquery connector to save the data to BigQuery.
...ANSWER
Answered 2020-Nov-18 at 16:32
Use a build.sbt file along the following lines for building the fat jar.
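The original file is not reproduced above; as a rough, hypothetical sketch (plugin version, artifact versions, and project names are all assumptions), a fat-jar build with sbt-assembly could look like:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt
name := "my-spark-job"
version := "0.1"
scalaVersion := "2.12.10"

libraryDependencies ++= Seq(
  // "provided": Dataproc already ships Spark, so keep it out of the fat jar.
  "org.apache.spark" %% "spark-sql" % "2.4.7" % "provided",
  "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.18.0"
)

// Resolve duplicate files when merging dependency jars into one.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", _*) => MergeStrategy.discard
  case _                        => MergeStrategy.first
}

Running sbt assembly then produces the fat jar under target/scala-2.12/.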
QUESTION
I'm writing into Bigtable using the JavaHBaseContext bulkPut API. This works fine with the Spark and Scala versions below:
...ANSWER
Answered 2020-Nov-18 at 01:49
It seems the exception has to do with the dependency org.apache.hbase:hbase-spark:2.0.2.3.1.0.0-78:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.spark.HBaseConnectionCache$
  at org.apache.hadoop.hbase.spark.HBaseContext.org$apache$hadoop$hbase$spark$HBaseContext$$hbaseForeachPartition(HBaseContext.scala:488)
  at org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkPut$1.apply(HBaseContext.scala:225)
  at org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkPut$1.apply(HBaseContext.scala:225)
From the Maven page, we can see it is built with Scala 2.11, which might explain why it doesn't work on Dataproc 1.5, which comes with Scala 2.12.
I think you can try Dataproc 1.4, which comes with Spark 2.4 and Scala 2.11.12, and update your app's dependencies accordingly.
QUESTION
I created a Dataproc cluster and was trying to submit my local job for testing.
...ANSWER
Answered 2020-Nov-13 at 20:52
The Dataproc preview image contains Spark 3 with Scala 2.12. The connector jar you referred to is based on Scala 2.11. Please change the URL to gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar.
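For example, the matching jar can be supplied at submit time (the job file name is a placeholder; the cluster and region variables are assumptions):

gcloud dataproc jobs submit pyspark my_job.py --cluster=$CLUSTER --region=$REGION --jars=gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar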
QUESTION
I am able to read data from a BigQuery table via the Spark BigQuery connector locally, but when I deploy this to Google Cloud and run it via Dataproc, I get the exception below. As the logs show, it is able to identify the schema of the table; after that it waits 8-10 minutes and then throws the exception. Can someone help with this?
...ANSWER
Answered 2020-Nov-06 at 05:52
For others: here is the BigQuery dependency I used, and it's working fine now.
QUESTION
I'm getting the issue below with the Spark BigQuery connector in a Dataproc cluster with the following configuration: Image 1.5.21-debian10, Spark version 2.4.7, Scala version 2.12.10.
This works fine locally but fails when I deploy it to the Dataproc cluster. Can someone suggest some pointers for this issue?
...ANSWER
Answered 2020-Nov-05 at 22:09
Can you please replace the Spark BigQuery connector with the shaded one?
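For context, an assumption based on the connector's published artifacts rather than anything stated in the answer: the "shaded" build is the spark-bigquery-with-dependencies artifact (for example com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:&lt;version&gt;, also published as gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar), which relocates its bundled Guava and gRPC copies so they cannot clash with the versions already present on the Dataproc image.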
Community Discussions and Code Snippets contain sources from the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported