
metastore | Flexible Metadata, Data and Configuration information store

by pentaho | Java | Version: Current | License: No License


kandi X-RAY | metastore Summary

metastore is a Java library typically used in Utilities applications. metastore has no bugs and no reported vulnerabilities, it has a build file available, and it has low support. You can download it from GitHub or GitLab.
This project contains a flexible metadata, data and configuration information store. Anyone can use it, but it was designed for use within the Pentaho software stack. The meta-model is simple and very generic. The top-level entry is always a namespace; non-Pentaho companies can use their own namespace to keep their information separate from everyone else's. The next level in the meta-model is an Element Type. A very generic name was chosen on purpose to reflect the fact that you can store just about anything; the element type is, at this point in time, nothing more than a simple placeholder with an ID, a name and a description. Finally, each element type can have a series of Elements. Each element has an ID and a set of key/value pairs (called "id" and "value") as child attributes, and every attribute can have children of its own. An element also carries security information: an owner and a set of owner permissions describing who has which (CRUD) permission to use the element.
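
To make that hierarchy concrete, here is a minimal usage sketch in Java. It assumes the org.pentaho.metastore API (IMetaStore, MemoryMetaStore, and the new*/create* factory methods) as the author understands it; the namespace, element type, and attribute names are purely illustrative, and the exact signatures should be verified against the project sources.

import org.pentaho.metastore.api.IMetaStore;
import org.pentaho.metastore.api.IMetaStoreElement;
import org.pentaho.metastore.api.IMetaStoreElementType;
import org.pentaho.metastore.api.exceptions.MetaStoreException;
import org.pentaho.metastore.stores.memory.MemoryMetaStore;

public class MetaStoreModelSketch {
  public static void main(String[] args) throws MetaStoreException {
    // In-memory store used here purely for illustration.
    IMetaStore metaStore = new MemoryMetaStore();

    // Top level: a namespace keeps one party's metadata separate from everyone else's.
    String namespace = "acme"; // hypothetical namespace
    if (!metaStore.namespaceExists(namespace)) {
      metaStore.createNamespace(namespace);
    }

    // Next level: an element type is a generic placeholder with a name and a description.
    IMetaStoreElementType connectionType = metaStore.newElementType(namespace);
    connectionType.setName("Database connection"); // hypothetical element type
    connectionType.setDescription("Connection details for a database");
    metaStore.createElementType(namespace, connectionType);

    // Finally: an element holds key/value attributes, which can themselves be nested.
    IMetaStoreElement element = metaStore.newElement();
    element.setName("warehouse");
    element.addChild(metaStore.newAttribute("hostname", "db.example.com"));
    element.addChild(metaStore.newAttribute("port", "5432"));
    metaStore.createElement(namespace, connectionType, element);
  }
}

The same namespace / element type / element shape is meant to apply regardless of which backing store implementation is used.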

Support

  • metastore has a low active ecosystem.
  • It has 19 stars, 77 forks, and 70 watchers.
  • It had no major release in the last 12 months.
  • There is 1 open issue and 0 closed issues. There are 2 open pull requests and 0 closed pull requests.
  • It has a neutral sentiment in the developer community.
  • The latest version of metastore is current.

Quality

  • metastore has 0 bugs and 0 code smells.

Security

  • Neither metastore nor its dependent libraries have any reported vulnerabilities.
  • metastore code analysis shows 0 unresolved vulnerabilities.
  • There are 0 security hotspots that need review.

License

  • metastore does not have a standard license declared.
  • Check the repository for any license declaration and review the terms closely.
  • Without a license, all rights are reserved, and you cannot use the library in your applications.

Reuse

  • metastore releases are not available. You will need to build from source code and install.
  • A build file is available, so you can build the component from source.
  • Installation instructions, examples and code snippets are available.
Top functions reviewed by kandi - BETA

kandi has reviewed metastore and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality metastore implements and to help you decide whether it suits your requirements. A hedged usage sketch follows the list.

  • Saves attributes in a meta store.
  • Deletes an element type.
  • Loads an attribute.
  • Appends the security element.
  • Deletes a folder and all its subdirectories.
  • Saves the metadata to a stream result file.
  • Deletes a child attribute with the given id.
  • Executes a locked operation.
  • Registers an element type with the given namespace.
  • Gets the text node value.
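
As a companion to that list, the sketch below shows how the load and delete operations might be driven through the public API, continuing the example from the summary above. The method names (getElementTypeByName, getElements, getChild, deleteElement, deleteElementType) are assumptions about the IMetaStore interface rather than verified signatures, and "Database connection" is the same hypothetical type name used earlier.

import java.util.List;

import org.pentaho.metastore.api.IMetaStore;
import org.pentaho.metastore.api.IMetaStoreAttribute;
import org.pentaho.metastore.api.IMetaStoreElement;
import org.pentaho.metastore.api.IMetaStoreElementType;
import org.pentaho.metastore.api.exceptions.MetaStoreException;

public class MetaStoreCleanupSketch {
  // Loads every element of a named type, reads one attribute, then deletes
  // the elements and finally the element type itself.
  static void cleanUp(IMetaStore metaStore, String namespace) throws MetaStoreException {
    IMetaStoreElementType type = metaStore.getElementTypeByName(namespace, "Database connection");
    if (type == null) {
      return; // nothing to clean up
    }
    List<IMetaStoreElement> elements = metaStore.getElements(namespace, type);
    for (IMetaStoreElement element : elements) {
      IMetaStoreAttribute hostname = element.getChild("hostname"); // nested attributes are read the same way
      if (hostname != null) {
        System.out.println(element.getName() + " -> " + hostname.getValue());
      }
      metaStore.deleteElement(namespace, type, element.getId());
    }
    // An element type can only be removed once its elements are gone.
    metaStore.deleteElementType(namespace, type);
  }
}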


                      metastore Key Features

                      Flexible Metadata, Data and Configuration information store

                      metastore Examples and Code Snippets


                      How to build

                      $ mvn clean install
                      

                      Not able to query AWS Glue/Athena views in Databricks Runtime ['java.lang.IllegalArgumentException: Can not create a Path from an empty string;']

import boto3
import time


def execute_blocking_athena_query(query: str, athenaOutputPath, aws_region):
    # Run an Athena query and poll until it finishes, raising on failure.
    athena = boto3.client("athena", region_name=aws_region)
    res = athena.start_query_execution(QueryString=query, ResultConfiguration={
        'OutputLocation': athenaOutputPath})
    execution_id = res["QueryExecutionId"]
    while True:
        res = athena.get_query_execution(QueryExecutionId=execution_id)
        state = res["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return
        if state in ["FAILED", "CANCELLED"]:
            raise Exception(res["QueryExecution"]["Status"]["StateChangeReason"])
        time.sleep(1)


def create_cross_platform_view(db: str, table: str, query: str, spark_session, athenaOutputPath, aws_region):
    # Create the view once via Athena (Presto) and once via Spark, then merge
    # the Presto view text into the Spark view's Glue definition so that both
    # engines can read it.
    glue = boto3.client("glue", region_name=aws_region)
    glue.delete_table(DatabaseName=db, Name=table)
    create_view_sql = f"create view {db}.{table} as {query}"
    execute_blocking_athena_query(create_view_sql, athenaOutputPath, aws_region)
    presto_schema = glue.get_table(DatabaseName=db, Name=table)["Table"][
        "ViewOriginalText"
    ]
    glue.delete_table(DatabaseName=db, Name=table)

    spark_session.sql(create_view_sql).show()
    spark_view = glue.get_table(DatabaseName=db, Name=table)["Table"]
    for key in [
        "DatabaseName",
        "CreateTime",
        "UpdateTime",
        "CreatedBy",
        "IsRegisteredWithLakeFormation",
        "CatalogId",
    ]:
        if key in spark_view:
            del spark_view[key]
    spark_view["ViewOriginalText"] = presto_schema
    spark_view["Parameters"]["presto_view"] = "true"
    spark_view = glue.update_table(DatabaseName=db, TableInput=spark_view)


create_cross_platform_view("<YOUR DB NAME>", "<YOUR VIEW NAME>", "<YOUR VIEW SQL QUERY>", <SPARK_SESSION_OBJECT>, "<S3 BUCKET FOR OUTPUT>", "<YOUR-ATHENA-SERVICE-AWS-REGION>")

                      How to Set Log Level for Third Party Jar in Spark

                      log4j.logger.com.kinetica.spark=INFO
                      log4j.logger.com.kinetica.spark.LoaderParams=WARN
                      

                      Snowflake Pyspark: Failed to find data source: snowflake

                      docker run --interactive --tty \
                                      --volume /src:/src \
                                      --volume /data/:/root/data \
                                      --volume /jars:/jars \
                                      reports bash '-c' "cp -r /jars /opt/spark-3.1.1-bin-hadoop3.2/jars && cd /home && export PYTHONIOENCODING=utf8 && spark-submit \
                                      /src/reports.py \
                                      --jars net.snowflake:/jars/snowflake-jdbc-3.13.14.jar,net.snowflake:/jars/spark-snowflake_2.12-2.10.0-spark_3.1.jar \
                                      --partitions-output "4" \
                                      1> >(sed $'s,.*,\e[32m&\e[m,' >&2)" || true
                      
                      docker run --interactive --tty \
                                      --volume /src:/src \
                                      --volume /data/:/root/data \
                                      --volume /jars:/jars \
                                      reports bash '-c' "cp -r /jars /opt/spark-3.1.1-bin-hadoop3.2/jars && cd /home && export PYTHONIOENCODING=utf8 && spark-submit \
                                      --jars /jars/snowflake-jdbc-3.13.14.jar,/jars/spark-snowflake_2.12-2.10.0-spark_3.1.jar \
                                      /src/reports.py \
                                      --partitions-output "4" \
                                      1> >(sed $'s,.*,\e[32m&\e[m,' >&2)" || true
                      

                      Spark SQL queries against Delta Lake Tables using Symlink Format Manifest

                      delta.`<table-path>`
                      
                      spark.sql("""select * from delta.`s3://<bucket>/<key>/<table-name>/` limit 10""")
                      

Spark application syncing with Hive metastore - "There is no primary group for UGI spark" error

                      System.setProperty("HADOOP_USER_NAME", "root")
                      

                      How do you implement SASTokenProvider for per-container SAS token access?

                      %scala
                      package com.foo
                      
                      import org.apache.hadoop.fs.FileSystem
                      import org.apache.spark.sql.catalyst.DefinedByConstructorParams
                      
                      import scala.util.Try
                      
                      import scala.language.implicitConversions
                      import scala.language.reflectiveCalls
                      
                      trait DBUtilsApi {
                          type SecretUtils
                          type SecretMetadata
                          type SecretScope
                          val secrets: SecretUtils
                      }
                      
                      object ReflectiveDBUtils extends DBUtilsApi {
                          
                          private lazy val dbutils: DBUtils =
                              Class.forName("com.databricks.service.DBUtils$").getField("MODULE$").get().asInstanceOf[DBUtils]
                      
                          override lazy val secrets: SecretUtils = dbutils.secrets
                      
                          type DBUtils = AnyRef {
                              val secrets: SecretUtils
                          }
                      
                          type SecretUtils = AnyRef {
                              def get(scope: String, key: String): String
                              def getBytes(scope: String, key: String): Array[Byte]
                              def list(scope: String): Seq[SecretMetadata]
                              def listScopes(): Seq[SecretScope]
                          }
                      
                          type SecretMetadata = DefinedByConstructorParams { val key: String }
                      
                          type SecretScope = DefinedByConstructorParams { val name: String }
                      }
                      
                      class VaultTokenProvider extends org.apache.hadoop.fs.azurebfs.extensions.SASTokenProvider {
                        def getSASToken(accountName: String,fileSystem: String,path: String,operation: String): String = {
                          return ReflectiveDBUtils.secrets.get("scope", "SECRET")
                        }
                        def initialize(configuration: org.apache.hadoop.conf.Configuration, accountName: String): Unit = {    
                        }
                      }
                      
                      spark.conf.set("fs.azure.account.auth.type.bidbtests.dfs.core.windows.net", "SAS")
                      spark.conf.set("fs.azure.sas.token.provider.type.bidbtests.dfs.core.windows.net", "com.foo.VaultTokenProvider")
                      

                      How to delete data physically with Presto/Trino?

                      CREATE SCHEMA hive.xyz WITH (location = 'abfs://...');
                      CREATE TABLE hive.xyz.test AS SELECT (...);
                      
                      DELETE FROM hive.xyz.test WHERE TRUE;
                      
                      -- Data ARE physically deleted
                      
                      
                      CREATE SCHEMA hive.xyz;
                      CREATE TABLE hive.xyz.test 
                          WITH (external_location = 'abfs://...') 
                          AS SELECT (...);
                      
                      DELETE FROM hive.xyz.test WHERE TRUE;
                      
                      -- Data ARE NOT physically deleted.
                      

                      Spark Java append data to Hive table

                      df.registerTempTable("sample.temptable")
                      
                      sqlContext.sql("CREATE TABLE IF NOT EXISTS sample.test_table as select * from sample.temptable")
                      
                      sqlContext.sql("CREATE TABLE IF NOT EXISTS sample.test_table")
                      
                      sqlContext.sql("insert into table sample.test_table select * from sample.temptable")
                      
                      sqlContext.sql("DROP TABLE IF EXISTS sample.temptable")
                      

                      Apache Spark: broadcast join behaviour: filtering of joined tables and temp tables

                      df = df1.join(F.broadcast(df2),df1.some_col == df2.some_col, "left")
                      
                      == Physical Plan ==
                      *(1) BroadcastHashJoin [key#122], [key#111], Inner, BuildLeft, false
                      :- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#168]
                      :  +- LocalTableScan [key#122, df_a_column#123]
                      +- *(1) LocalTableScan [key#111, value#112]
                      
                      == Physical Plan ==
                      *(1) BroadcastHashJoin [key#122], [key#111], Inner, BuildRight, false
                      :- *(1) LocalTableScan [key#122, df_a_column#123]
                      +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#152]
                         +- LocalTableScan [key#111, value#112]
                      
                      == Physical Plan ==
                      *(1) BroadcastHashJoin [key#122], [key#111], Inner, BuildRight, false
                      :- *(1) LocalTableScan [key#122, df_a_column#123]
                      +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#184]
                         +- LocalTableScan [key#111, value#112]
                      

                      Performance of spark while reading from hive vs parquet

                      spark.read.option("basePath", "s3a://....").parquet("s3a://..../date_col=2021-06-20")
                      


Community Discussions

Trending Discussions on metastore
  • Bigquery as metastore for Dataproc
  • Not able to query AWS Glue/Athena views in Databricks Runtime ['java.lang.IllegalArgumentException: Can not create a Path from an empty string;']
  • Unable to run pyspark on local windows environment: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativei
  • Confluent Platform - how to properly use ksql-datagen?
  • Spark-SQL plug in on HIVE
  • How to Set Log Level for Third Party Jar in Spark
  • Snowflake Pyspark: Failed to find data source: snowflake
  • Spark SQL queries against Delta Lake Tables using Symlink Format Manifest
  • How to run Spark SQL Thrift Server in local mode and connect to Delta using JDBC
  • Why Uncache table in spark-sql not working?

QUESTION

Bigquery as metastore for Dataproc

Asked 2022-Apr-01 at 04:00

We are trying to migrate a PySpark script, which creates and drops tables in Hive as part of its data transformations, from on-premises to the GCP platform.

Hive is replaced by BigQuery. In this case, the Hive reads and writes are converted to BigQuery reads and writes using the spark-bigquery-connector.

However, the problem lies with creating and dropping BigQuery tables via Spark SQL, because Spark SQL will by default run the CREATE and DROP queries against Hive, backed by the Hive metastore, not against BigQuery.

I wanted to check whether there is a plan to incorporate DDL statement support into the spark-bigquery-connector.

Also, from an architecture perspective, is it possible to base the metastore for Spark SQL on BigQuery, so that any CREATE or DROP statement can be run on BigQuery from Spark?

ANSWER

Answered 2022-Apr-01 at 04:00

I don't think Spark SQL will support BigQuery as a metastore, nor will the BQ connector support BQ DDL. On Dataproc, Dataproc Metastore (DPMS) is the recommended solution for the Hive and Spark SQL metastore.

In particular, for on-prem to Dataproc migration, it is more straightforward to migrate to DPMS; see this doc.

Source: https://stackoverflow.com/questions/71676161

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

                      Vulnerabilities

                      No vulnerabilities reported

Install metastore

metastore uses the Maven framework. To build it, run the command shown in "How to build" above (mvn clean install). Optionally you can specify -Drelease to trigger obfuscation and/or uglification (as needed), and -Dmaven.test.skip=true to skip the tests (even though you shouldn't). The build result will be a Pentaho package located in target.

Prerequisites:
  • Maven, version 3+
  • Java JDK 11
  • This settings.xml in your ~/.m2 directory
  • Don't use IntelliJ's built-in Maven; make it use the same one you use from the command line (Project Preferences -> Build, Execution, Deployment -> Build Tools -> Maven -> Maven home directory).

Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask questions on the Stack Overflow community page.
