
hive | managing large datasets residing in distributed storage using SQL

by apache | Java | Version: Current | License: Apache-2.0



kandi X-RAY | hive Summary

hive is a Java library typically used in Big Data, Spark, and Hadoop applications. hive has no reported bugs or vulnerabilities, has a build file available, has a Permissive License, and has medium support. You can download it from GitHub.

Support

  • hive has a medium-activity ecosystem.
  • It has 4,217 stars, 3,998 forks, and 329 watchers.
  • It has had no major release in the last 12 months.
  • hive has no reported issues. There are 89 open pull requests and 0 closed pull requests.
  • It has a neutral sentiment in the developer community.
  • The latest version of hive is current.

Quality

  • hive has no bugs reported.

Security

  • hive has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

  • hive is licensed under the Apache-2.0 License. This license is Permissive.
  • Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

  • hive releases are not available. You will need to build from source code and install.
  • Build file is available. You can build the component from source.
  • Installation instructions, examples and code snippets are available.
Top functions reviewed by kandi - BETA

kandi has reviewed hive and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality hive implements, and to help you decide if it suits your requirements.

  • Implement the fastSubtraction.
  • Analyze a create-table statement.
  • Determine if a map join can be performed.
  • Compare lazy objects.
  • Retrieve aggregate statistics for a table.
  • Read encoded columns.
  • Given a UDF, find a new UDF node equivalent to it.
  • Read a field.
  • Perform a shared-work optimization.
  • Set the map work.

hive Key Features

Hive includes changes to the MetaStore schema. If you are upgrading from an earlier version of Hive, it is imperative that you upgrade the MetaStore schema by running the appropriate schema upgrade scripts located in the scripts/metastore/upgrade directory.

We have provided upgrade scripts for MySQL, PostgreSQL, Oracle, Microsoft SQL Server, and Derby databases. If you are using a different database for your MetaStore, you will need to provide your own upgrade script.
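
For example, on a MySQL-backed MetaStore the upgrade amounts to sourcing the matching script against the MetaStore database. A minimal sketch, assuming a database named metastore and a 2.3.0-to-3.0.0 upgrade (substitute your own database name and version pair):

-- Run inside the MySQL client, connected to the MetaStore database
USE metastore;
SOURCE scripts/metastore/upgrade/mysql/upgrade-2.3.0-to-3.0.0.mysql.sql;

The schematool utility shipped with Hive can also drive the same upgrade automatically.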

General Info

http://hive.apache.org/

Requirements

| Hive Version  | Java Version  |
| ------------- |:-------------:|
| Hive 1.0      | Java 6        |
| Hive 1.1      | Java 6        |
| Hive 1.2      | Java 7        |
| Hive 2.x      | Java 7        |
| Hive 3.x      | Java 8        |
| Hive 4.x      | Java 8        |


Hadoop

Hive 1.x runs on Hadoop 1.x and 2.x; Hive 2.x requires Hadoop 2.x; Hive 3.x and later require Hadoop 3.x.

How to duplicate row based on int column

with your_table as(--Demo data, use your table instead of this CTE
select stack (3, --number of tuples
'paul',34,1,
'emma', 0,3,
'greg', 0,5
) as (name,impressions,sampling_rate)
)

select t.*
  from your_table t --use your table here
       lateral view explode(split(space(t.sampling_rate-1),' '))e 
Result:

name     impressions   sampling_rate
------------------------------------
paul        34              1
emma         0              3
emma         0              3
emma         0              3
greg         0              5
greg         0              5
greg         0              5
greg         0              5
greg         0              5

Getting java.lang.ClassNotFoundException when I try to do spark-submit, referred other similar queries online but couldn't get it to work

<project>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <!-- The scala-maven-plugin compiles src/main/scala, so the main class
           actually ends up in the jar that spark-submit loads -->
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>4.5.2</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

SQL: Extract from messy JSON nested field with backslashes

regexp_replace(obj,'\\\\"','"') 
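
Once the backslash-escaped quotes have been normalized, standard JSON functions can be applied to the result. A minimal sketch, assuming a table my_table with a string column obj containing a field at $.user.name (both names are assumptions):

SELECT get_json_object(regexp_replace(obj, '\\\\"', '"'), '$.user.name') AS user_name
FROM my_table;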

Hive Explode the Array of Struct key: value:

with sample_table as (--This is your data example
select '11111' USER_ID,
array(named_struct('key','client_status','value','ACTIVE'),named_struct('key','name','value','Jane Doe')) DETAIL_DATA
)

-- Approach 1: explode the key/value structs with inline() and pivot back with max(case ...)
SELECT max(case when e.key='name' then e.value end) as name, 
       max(case when e.key='client_status' then e.value end) as status
FROM sample_table
lateral view inline(DETAIL_DATA) e as key, value
group by USER_ID
Result:

    name    status  
------------------------
Jane Doe    ACTIVE
-- Approach 2: positional access (relies on a fixed array order)
SELECT detail_data[0].value as client_status,
       detail_data[1].value as name
 from sample_table 
-- Approach 3: CASE on the key at each position, so the order of the two entries does not matter
SELECT case when DETAIL_DATA[0].key='name' then DETAIL_DATA[0].value else DETAIL_DATA[1].value end as name, 
       case when DETAIL_DATA[0].key='client_status' then DETAIL_DATA[0].value else  DETAIL_DATA[1].value end as status
FROM sample_table

SQL: JSON Extract from nested object

SELECT
    id,JSON_EXTRACT( obj, '$.products[0].price.currency') first_product_currency
FROM my_table;
SELECT
    id,JSON_EXTRACT( obj, '$.products[*].price.currency') multiple_currencies
FROM my_table;
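
JSON_EXTRACT as written above is the MySQL-style function; in Hive itself the closest built-in is get_json_object, which accepts a similar JSONPath subset. A sketch, reusing the assumed table and column names from above:

SELECT id,
       get_json_object(obj, '$.products[0].price.currency') AS first_product_currency
FROM my_table;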

Flutter dart export hive saved data to file to retrieve later

// Temp-directory helpers from dart:io:
Directory.systemTemp;                       // the system temp directory
Directory.systemTemp.createTemp('my_app');  // create a unique subdirectory inside it
import 'dart:convert'; // for jsonEncode
import 'dart:io';      // for File and Directory
import 'package:hive/hive.dart';

part 'product.g.dart';

@HiveType(typeId: 0)
class Product extends HiveObject{
  @HiveField(0)
  String itemName;

  @HiveField(1)
  String barCode;

  @HiveField(2)
  String bcType;

  Product(this.itemName, this.barCode, this.bcType);

  /// This function will automatically be used by the [jsonEncode()] function internally
  Map<String, dynamic> toJson() => {
    'itemName': this.itemName,
    'barCode': this.barCode,
    'bcType': this.bcType,
  };
}
Future<File?> _createBackupFile() async {
  /// This example uses the OS temp directory
  File backupFile = File('${Directory.systemTemp.path}/backup_barcode.json');

  try {
    /// barcodeBox is the [Box] object from the Hive package, usually exposed inside a [ValueListenableBuilder] or via [Hive.box()]
    backupFile = await backupFile.writeAsString(jsonEncode(barcodeBox.values.toList()));

    return backupFile;
  } catch (e) {
    // Writing the backup failed; fall through and return null
    return null;
  }
}

Hive: Query executing for hours

-- Note: CREATE INDEX was removed in Hive 3.0, so this only applies to Hive 2.x and earlier
create index idx_TABLE2 on table DB_MYDB.TABLE2 (SDNT_ID,CLSS_CD,BRNCH_CD,SECT_CD,GRP_CD,GRP_NM) AS 'COMPACT' WITH DEFERRED REBUILD;

create index idx_TABLE3 on table DB_MYDB.TABLE3 (SDNT_ID,CLSS_CD,BRNCH_CD,SECT_CD,GRP_CD,GRP_NM) AS 'COMPACT' WITH DEFERRED REBUILD;
-----------------------
set hive.exec.reducers.bytes.per.reducer=67108864; --example only, check your current settings 
                                                   --and reduce accordingly to get twice more reducers on Reducer 2 vertex
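
Two related knobs that also influence the reducer count (the values below are illustrative assumptions, not recommendations):

set hive.exec.reducers.max=1009;  -- upper bound on the number of reducers Hive will allocate
set mapreduce.job.reduces=200;    -- pin an explicit reducer count, bypassing Hive's estimate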

Drop a hive table named "union"

-- union is a reserved keyword in HiveQL, so the identifier must be quoted with backticks
DROP TABLE IF EXISTS `union`;

Issue in reading records from hive bucket

-- Read only buckets 2 and 3 of a table bucketed into 4 buckets on loan_id
Select * from collection tablesample(bucket 2 out of 4 on loan_id)
UNION ALL
Select * from collection tablesample(bucket 3 out of 4 on loan_id)
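
TABLESAMPLE(BUCKET x OUT OF y ON col) only prunes data when the table is actually bucketed into y buckets on that column; otherwise the whole table is scanned and sampled on the fly. A sketch of a matching table definition (the column types are assumptions):

CREATE TABLE collection (loan_id BIGINT, amount DECIMAL(10,2))
CLUSTERED BY (loan_id) INTO 4 BUCKETS
STORED AS ORC;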

query spark dataframe on max column value

scala> val max_version = df.groupBy().agg(max("project_version").as("version")).as[Double].collect.head
max_version: Double = 2.1

scala> val df_foo = Seq((2.0,20210105,187234),(2.0,20210110,188356),(2.1,20210201,188820)).toDF("project_version","dt","count")
df_foo: org.apache.spark.sql.DataFrame = [project_version: double, dt: int ... 1 more field]

scala> val max_version = df_foo.groupBy().agg(max("project_version").as("version")).as[Double].collect.head
max_version: Double = 2.1

scala> val df_foo_latest = df_foo.filter($"project_version" === max_version).count()
df_foo_latest: Long = 1

scala> val df_foo_latest = df_foo.filter($"project_version" === max_version)
df_foo_latest: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [project_version: double, dt: int ... 1 more field]

scala> df_foo_latest.count
res1: Long = 1

scala> df_foo_latest.show(false)
+---------------+--------+------+
|project_version|dt      |count |
+---------------+--------+------+
|2.1            |20210201|188820|
+---------------+--------+------+
scala> val max_version = df_foo.groupBy().max("project_version")
max_version: org.apache.spark.sql.DataFrame = [max(project_version): double]

scala> val max_version = df_foo.groupBy().agg(max("project_version").as("project_version"))

scala> val df_foo_latest = df_foo.join(max_version,Seq("project_version"),"inner")


scala> df_foo_latest.show(false)
+---------------+--------+------+
|project_version|dt      |count |
+---------------+--------+------+
|2.1            |20210201|188820|
+---------------+--------+------+
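
The same "keep only the rows at the maximum value" pattern can also be expressed directly in Spark SQL or HiveQL by joining against the aggregate (a sketch; the table and column names are assumptions):

SELECT t.*
FROM project_table t
JOIN (SELECT max(project_version) AS max_version FROM project_table) m
  ON t.project_version = m.max_version;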

Community Discussions

Trending Discussions on hive
  • How to duplicate row based on int column
  • Getting java.lang.ClassNotFoundException when I try to do spark-submit, referred other similar queries online but couldn't get it to work
  • SQL: Extract from messy JSON nested field with backslashes
  • Hive Explode the Array of Struct key: value:
  • SQL: JSON Extract from nested object
  • Flutter dart export hive saved data to file to retrieve later
  • Hive: Query executing for hours
  • Drop a hive table named "union"
  • Issue in reading records from hive bucket
  • query spark dataframe on max column value

QUESTION

How to duplicate row based on int column

Asked 2021-Jun-15 at 22:07

If I have a table like this in Hive:

name     impressions   sampling_rate
------------------------------------
paul        34              1
emma         0              3
greg         0              5

How can I duplicate each row in a select statement by the sampling_rate column so that it would look like this:

name     impressions   sampling_rate
------------------------------------
paul        34              1
emma         0              3
emma         0              3
emma         0              3
greg         0              5
greg         0              5
greg         0              5
greg         0              5
greg         0              5

ANSWER

Answered 2021-Jun-15 at 22:07

Using space() you can produce a string of spaces with length = sampling_rate - 1, split it, and explode the result with a lateral view; this duplicates each row sampling_rate times.

Demo:

with your_table as(--Demo data, use your table instead of this CTE
select stack (3, --number of tuples
'paul',34,1,
'emma', 0,3,
'greg', 0,5
) as (name,impressions,sampling_rate)
)

select t.*
  from your_table t --use your table here
       lateral view explode(split(space(t.sampling_rate-1),' '))e 

Result:

name     impressions   sampling_rate
------------------------------------
paul        34              1
emma         0              3
emma         0              3
emma         0              3
greg         0              5
greg         0              5
greg         0              5
greg         0              5
greg         0              5

Source https://stackoverflow.com/questions/67993016
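
An equivalent trick, shown here as a sketch not taken from the linked answer, builds the throwaway array with repeat() and a comma delimiter instead of space():

select t.*
  from your_table t
       lateral view explode(split(repeat(',', t.sampling_rate - 1), ',')) e

For sampling_rate = 3, repeat(',', 2) produces ",,", which split() turns into a three-element array, yielding three copies of the row.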

Community Discussions and Code Snippets include sources from the Stack Exchange Network.

Vulnerabilities

No vulnerabilities reported

Install hive

Installation Instructions and a quick tutorial: https://cwiki.apache.org/confluence/display/Hive/GettingStarted
A longer tutorial that covers more features of HiveQL: https://cwiki.apache.org/confluence/display/Hive/Tutorial
The HiveQL Language Manual: https://cwiki.apache.org/confluence/display/Hive/LanguageManual

Support

For new features, suggestions, and bug reports, create an issue on GitHub. If you have questions, check for and ask them on the Stack Overflow community page.

© 2022 Open Weaver Inc.