metastore | Store and restore metadata from a filesystem | Cloud Storage library
kandi X-RAY | metastore Summary
Store and restore metadata from a filesystem.
Community Discussions
Trending Discussions on metastore
QUESTION
We are trying to migrate a pyspark script, which creates and drops tables in Hive along with data transformations, from on-premise to the GCP platform.
Hive is replaced by BigQuery. In this case, the Hive reads and writes are converted to BigQuery reads and writes using the spark-bigquery-connector.
However, the problem lies with creating and dropping BigQuery tables via Spark SQL, because Spark SQL will by default run the CREATE and DROP queries against Hive (backed by the Hive metastore), not against BigQuery.
I wanted to check whether there is a plan to incorporate DDL statement support into the spark-bigquery-connector as well.
Also, from an architecture perspective, is it possible to base the metastore for Spark SQL on BigQuery, so that any CREATE or DROP statement can be run against BigQuery from Spark?
ANSWER
Answered 2022-Apr-01 at 04:00
I don't think Spark SQL will support BigQuery as a metastore, nor will the BQ connector support BQ DDL. On Dataproc, Dataproc Metastore (DPMS) is the recommended solution for the Hive and Spark SQL metastore.
In particular, for an on-prem to Dataproc migration, it is more straightforward to migrate to DPMS; see this doc.
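For context, a rough sketch of how the reads and writes themselves go through the connector today (the project, dataset, table and bucket names are illustrative); any CREATE or DROP issued via spark.sql() would still be routed to the Hive metastore rather than BigQuery:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-read-write").getOrCreate()

# Read a BigQuery table through the spark-bigquery-connector (names are illustrative).
df = (spark.read.format("bigquery")
      .option("table", "my_project.my_dataset.source_table")
      .load())

transformed = df.filter("some_column IS NOT NULL")

# Write the result back to BigQuery; the indirect write path needs a temporary GCS bucket.
(transformed.write.format("bigquery")
 .option("table", "my_project.my_dataset.target_table")
 .option("temporaryGcsBucket", "my-temp-bucket")
 .mode("overwrite")
 .save())
```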
QUESTION
Attempting to read a view which was created on AWS Athena (based on a Glue table that points to a parquet file in S3) using pyspark over a Databricks cluster throws the following error for an unknown reason:
ANSWER
Answered 2022-Mar-30 at 15:27
I was able to come up with a Python script to fix the problem. It turns out that this exception occurs because Athena and Presto store a view's metadata in a format that is different from what Databricks Runtime and Spark expect. You'll need to re-create your views through Spark.
Python script example, with an example execution:
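A minimal sketch of the approach, assuming a view named example_view over a table example_table (both names are illustrative): drop the Athena-created view and re-create it through Spark so its metadata is stored in the format Spark expects.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

view_name = "example_view"       # illustrative: the Athena/Presto-created view
source_table = "example_table"   # illustrative: the Glue table behind the view

# Drop the view created by Athena/Presto and re-create it through Spark,
# so its metadata is written in the format Databricks Runtime and Spark expect.
spark.sql(f"DROP VIEW IF EXISTS {view_name}")
spark.sql(f"CREATE VIEW {view_name} AS SELECT * FROM {source_table}")

spark.sql(f"SELECT * FROM {view_name}").show(5)
```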
QUESTION
I'm trying to create a local Spark environment on Windows 11 with Python.
I am using Python 3.9 and Spark version 3.2.1.
I have set my environment variables to:
ANSWER
Answered 2022-Mar-16 at 09:29
Not sure if this would be the fix, but neither of the links you posted for hadoop.dll and winutils.exe is for the version of Spark you're using (3.2.1).
I use 3.2.1 on Windows as well and always use this link to download the files and add them to my Spark bin: https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.1/bin
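As a rough sketch of wiring this up from Python, assuming hadoop.dll and winutils.exe were extracted to C:\hadoop\bin (the path is illustrative):

```python
import os
from pyspark.sql import SparkSession

# Illustrative path: the folder whose bin\ subdirectory holds winutils.exe and hadoop.dll.
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["PATH"] = r"C:\hadoop\bin;" + os.environ["PATH"]

spark = (SparkSession.builder
         .master("local[*]")
         .appName("windows-local-test")
         .getOrCreate())

print(spark.range(5).count())
```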
QUESTION
I'm using a dockerized version of the Confluent Platform v 7.0.1:
ANSWER
Answered 2022-Feb-18 at 22:37
You may be hitting issues because you are running an old version of ksqlDB's quickstart (0.7.1) with Confluent Platform 7.0.1.
If you check out a quickstart like this one, things may work better: https://ksqldb.io/quickstart-platform.html
I looked for an updated version of that data generator and didn't find it quickly. If you are looking for more info about structured data, give https://docs.ksqldb.io/en/latest/how-to-guides/query-structured-data/ a read.
QUESTION
HIVE has a metastore, and HIVESERVER2 listens for SQL requests; with the help of the metastore, the query is executed and the result is passed back. The Thrift framework is actually customised as HIVESERVER2. In this way, HIVE acts as a service. Via a programming language, we can use HIVE as a database.
The relationship between Spark-SQL and HIVE is as follows:
Spark-SQL just utilises the HIVE setup (HDFS file system, HIVE Metastore, Hiveserver2). When we invoke sbin/start-thriftserver.sh (present in the Spark installation), we are supposed to give HiveServer2's port number and hostname. Then, via Spark's beeline, we can actually create, drop and manipulate tables in HIVE. The API can be either Spark-SQL or HIVE QL. If we create or drop a table, it will be clearly visible if we log into HIVE and check (say via HIVE beeline or the HIVE CLI). To put it in other words, changes made via Spark can be seen in HIVE tables.
My understanding is that Spark does not have its own metastore setup like HIVE does. Spark just utilises the HIVE setup, and the SQL execution simply happens via the Spark SQL API.
Is my understanding correct here?
Then I am a little confused about the usage of bin/spark-sql (which is also present in the Spark installation). The documentation says that via this SQL shell, we can create tables as we do above (via the Thrift Server/Beeline). Now my question is: how is the metadata information maintained by Spark then?
Or, like the first approach, can we make the spark-sql CLI communicate with HIVE (to be specific: HIVE's hiveserver2)? If yes, how can we do that?
Thanks in advance!
ANSWER
Answered 2022-Mar-11 at 13:53
Regarding "My understanding is that Spark does not have its own meta store setup like HIVE": Spark will start a Derby metastore on its own if a Hive metastore is not provided.
Regarding "can we make spark-sql CLI to communicate to HIVE": start an external metastore process and add a hive-site.xml file to $SPARK_CONF_DIR with hive.metastore.uris, or use SET SQL statements for the same.
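As a minimal sketch from code (rather than hive-site.xml), the same hive.metastore.uris value can be passed through Spark's spark.hadoop. config prefix; the thrift host and port below are illustrative:

```python
from pyspark.sql import SparkSession

# thrift://metastore-host:9083 is an illustrative URI for an external Hive metastore;
# the equivalent hive.metastore.uris entry can instead live in hive-site.xml under $SPARK_CONF_DIR.
spark = (SparkSession.builder
         .appName("external-metastore")
         .config("spark.hadoop.hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SHOW DATABASES").show()
```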
QUESTION
I use Spark to write data from a Hive table to Kinetica using this jar: kinetica-spark-7.0.6.1-jar-with-dependencies.jar. However, when I run spark-submit, the logger from the jar prints the JDBC connection string with its credentials, as follows:
ANSWER
Answered 2022-Mar-08 at 14:38
In my configuration, I use the following to log the LoaderParams statements at WARN and everything else from the Kinetica Spark connector at INFO:
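As a rough approximation of that configuration from PySpark, logger levels can be set through the JVM gateway on Spark builds that ship log4j 1.x, given an existing SparkSession named spark; the logger names below are assumptions about the connector's package layout, not confirmed:

```python
# Assumption: the connector logs under com.kinetica.spark, and LoaderParams under
# com.kinetica.spark.LoaderParams; adjust to the logger names seen in the actual output.
log4j = spark._jvm.org.apache.log4j
log4j.LogManager.getLogger("com.kinetica.spark").setLevel(log4j.Level.INFO)
log4j.LogManager.getLogger("com.kinetica.spark.LoaderParams").setLevel(log4j.Level.WARN)
```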
QUESTION
I'm unable to connect to Snowflake via a dockerized PySpark container. I do not find the Snowflake documentation, nor the PySpark documentation, to be helpful at this point in time.
I'm using the following configuration, installed as can be seen in the Dockerfile below:
- python 3.7.12
- pyspark 3.1.1
- Hadoop 3.2
- jre-1.8.0-openjdk
- snowflake-jdbc-3.13.15.jar
- spark-snowflake_2.12-2.10.0-spark_3.1.jar
- snowflake-connector-python 2.7.4
ANSWER
Answered 2022-Mar-01 at 20:58
Instead of --jars, try --packages=net.snowflake:snowflake-jdbc:3.13.14,net.snowflake:spark-snowflake_2.11:2.9.3-spark_2.4
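A sketch of the same idea expressed in code rather than on the spark-submit command line, using spark.jars.packages; the coordinates are copied from the answer, and for PySpark 3.1 a Scala 2.12 / Spark 3.1 build of spark-snowflake may be needed instead:

```python
from pyspark.sql import SparkSession

# Resolve the Snowflake connector from Maven coordinates instead of pointing --jars at local files.
spark = (SparkSession.builder
         .appName("snowflake-connect")
         .config("spark.jars.packages",
                 "net.snowflake:snowflake-jdbc:3.13.14,"
                 "net.snowflake:spark-snowflake_2.11:2.9.3-spark_2.4")
         .getOrCreate())
```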
QUESTION
I'm running Spark 3.1.1 on an AWS emr-6.3.0 cluster with the following Hive/metastore configurations:
ANSWER
Answered 2022-Feb-18 at 03:49
I ended up figuring this out myself.
You can save yourself a lot of pain and misunderstanding by grasping the distinction between querying a Delta Lake external table (via Glue) and querying a Delta Lake table directly; see: https://docs.delta.io/latest/delta-batch.html#read-a-table
In order to query the Delta Lake table directly, without having to go through the external table, simply change the table reference in your Spark SQL query to the following format:
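Per the linked Delta documentation, the direct reference is path-based, using delta.`<path>`; a minimal sketch, given an existing SparkSession named spark (the S3 path is illustrative):

```python
# Query the Delta table by its storage path instead of through the Glue/Hive external table.
df = spark.sql("SELECT * FROM delta.`s3://my-bucket/path/to/delta-table`")
df.show(5)
```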
QUESTION
I'd like to connect to Delta using JDBC and would like to run the Spark Thrift Server (STS) in local mode to kick the tyres.
I start STS using the following command:
ANSWER
Answered 2022-Jan-08 at 06:42
Once you copy the io.delta:delta-core_2.12:1.0.0 JAR file to $SPARK_HOME/lib and restart, this error goes away.
QUESTION
I'm learning Spark SQL. When I use spark-sql to uncache a table that was previously cached, I can still query the cached table after submitting the UNCACHE command. Why does this happen?
Spark version 3.2.0 (pre-built for Apache Hadoop 2.7)
Hadoop version 2.7.7
Hive metastore 2.3.9
Linux Info
ANSWER
Answered 2021-Dec-17 at 02:19
UNCACHE TABLE removes the entries and associated data from the in-memory and/or on-disk cache for a given table or view; it does not drop the table. So you can still query it.
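A small sketch illustrating the behaviour, given an existing SparkSession named spark (the table name is illustrative):

```python
# UNCACHE TABLE only evicts the cached data; it does not drop the table,
# so the final SELECT still succeeds, just reading from the source again.
spark.sql("CACHE TABLE example_table")
spark.sql("SELECT COUNT(*) FROM example_table").show()   # served from the cache
spark.sql("UNCACHE TABLE example_table")
spark.sql("SELECT COUNT(*) FROM example_table").show()   # still works, no longer cached
```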
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported