kandi X-RAY | parquet-tools Summary
Command line tools for the parquet project
Top functions reviewed by kandi - BETA
- Starts the SLF4J bridge handler
- Prints usage information for given command
- Merges options
- Read Parquet meta data
- Flush columns
- Prints the details of a column
- Prints a string
- Displays the Parquet file
- Converts a binary value to a string
- Dump the Parquet schema information
- The main method
- Pretty print values in the given print writer
- Read Parquet file
- Read and print the values from the command line
- Creates a converter for a field
parquet-tools Key Features
parquet-tools Examples and Code Snippets
Trending Discussions on parquet-tools
I'm using parquetjs to create Parquet files and push them to Google Cloud Storage. The problem is that BigQuery cannot read the data from the file, but when I use parquet-tools everything looks healthy.
Answered 2021-Nov-29 at 15:07
Pass useDataPageV2: false as an option to the parquetjs writer.
I am new to Snowflake, but my company has been using it successfully.
Parquet files are currently being written with an existing Avro Schema, using Java parquet-avro v1.10.1.
I have been updating the dependencies in order to use latest Avro, and part of that bumped Parquet to 1.11.0.
The Avro schema is unchanged. However, when using Snowflake's COPY INTO command, I receive a LOAD FAILED with the error "Error parsing the parquet file: Logical type Null can not be applied to group node" and no other details :(
The problem is that there are no null columns in the files.
I've cut the Avro schema down, and found that the presence of a MAP type in the Avro schema is causing the issue.
The field is...
Answered 2020-Jun-22 at 09:19
Logical type Null can not be applied to group node
Looking up the error above, it appears that a version of Apache Arrow's parquet libraries is being used to read the file.
However, looking closer, the real problem lies in the use of legacy types within the Avro based Parquet Writer implementation (the following assumes Java was used to write the files).
The logicalTypes schema metadata introduced in Parquet defines many types, including a singular MAP type. Historically, the older convertedTypes schema field supported the use of MAP_KEY_VALUE for legacy readers. New writers that use logicalTypes (1.11.0+) should not be using the legacy map type anymore, but the work hasn't yet been done to update the Avro-to-Parquet schema conversion to drop the MAP_KEY_VALUE types entirely.
As a result, the schema field for MAP_KEY_VALUE gets written out with an UNKNOWN value of logicalType, which trips up Arrow's implementation, since it only understands logicalType values such as MAP and LIST.
Consider logging this as a bug against the Apache Parquet project, asking them to update their Avro writers to stop nesting the legacy MAP_KEY_VALUE type when transforming an Avro schema into a Parquet one. Ideally this would have been done as part of PARQUET-1410.
Unfortunately this behaviour is hard-coded, and there are no configuration options for map types that would help produce a file Apache Arrow (and, by extension, Snowflake) can read. You'll need to use an older version of the writer until a proper fix is released by the Apache Parquet developers.
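To make the failure mode concrete, here is a hedged Java sketch of the kind of parquet-avro writer code that is affected (the record name, field names, and output path are invented for illustration). Built against parquet-avro 1.10.x the resulting file loads fine; built against 1.11.0 the map field picks up the problematic legacy annotation described above.

    import java.util.Collections;

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.generic.GenericRecordBuilder;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    public class MapFieldRepro {
        public static void main(String[] args) throws Exception {
            // Hypothetical Avro schema: the presence of the map field is what
            // triggers the MAP_KEY_VALUE / "Logical type Null" issue.
            Schema schema = SchemaBuilder.record("Example").fields()
                    .name("id").type().stringType().noDefault()
                    .name("attributes").type().map().values().stringType().noDefault()
                    .endRecord();

            GenericRecord record = new GenericRecordBuilder(schema)
                    .set("id", "row-1")
                    .set("attributes", Collections.singletonMap("k", "v"))
                    .build();

            // Standard parquet-avro writer; the way the map's key_value group is
            // annotated is hard-coded in the library, not configurable here.
            try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                    .<GenericRecord>builder(new Path("example.parquet"))
                    .withSchema(schema)
                    .build()) {
                writer.write(record);
            }
        }
    }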
I have an S3 bucket full of .gz.parquet files. I want to make them accessible in Athena. In order to do this, I am creating a table in Athena that points at the S3 bucket:...
Answered 2020-Jun-17 at 20:16
You can use an AWS Glue Crawler to automatically derive the schema from your Parquet files.
Defining AWS Glue Crawlers: https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html
I am using BigQuery to query an external data source (also known as a federated table), where the source data is a hive-partitioned parquet table stored in google cloud storage. I used this guide to define the table.
My first query to test this table looks like the following...
Answered 2020-Apr-13 at 15:53
Note that the schema of the external table is inferred from the last file, sorted lexicographically by file name, among all files that match the table's source URI. Is there any chance that this particular Parquet file has a different schema than the one you described, e.g., an INT32 column with a DATE logical type for the "visitor_partition" field, which BigQuery would infer as a DATE type?
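If you want to check which schema a given file actually carries (for example, the lexicographically last file under your table's source URI), a small hedged Java sketch using the parquet-hadoop classes that parquet-tools itself builds on might look like this (the path is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.hadoop.ParquetFileReader;
    import org.apache.parquet.hadoop.util.HadoopInputFile;
    import org.apache.parquet.schema.MessageType;

    public class PrintSchema {
        public static void main(String[] args) throws Exception {
            // Path to the file BigQuery would use for schema inference.
            Path path = new Path(args.length > 0 ? args[0] : "part-00000.parquet");
            try (ParquetFileReader reader = ParquetFileReader.open(
                    HadoopInputFile.fromPath(path, new Configuration()))) {
                MessageType schema = reader.getFooter().getFileMetaData().getSchema();
                System.out.println(schema);  // shows physical and logical types per column
            }
        }
    }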
I am running a Spark job that writes to Parquet. I want to enable dictionary encoding for the files written. When I check the files, I see they are 'plain dictionary'. However, I do not see any stats for these columns.
Let me know if I am missing anything...
Answered 2020-Mar-27 at 23:18
Got the answer: the parquet-tools version I was using was 1.6. Upgrading to 1.10 solved the issue.
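If you would rather inspect the encodings and statistics without depending on a particular parquet-tools release, a hedged Java sketch reading them straight from the footer with parquet-hadoop 1.10+ could look like this (the file path is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.hadoop.ParquetFileReader;
    import org.apache.parquet.hadoop.metadata.BlockMetaData;
    import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
    import org.apache.parquet.hadoop.util.HadoopInputFile;

    public class PrintColumnStats {
        public static void main(String[] args) throws Exception {
            try (ParquetFileReader reader = ParquetFileReader.open(
                    HadoopInputFile.fromPath(new Path(args[0]), new Configuration()))) {
                // One line per (row group, column chunk): encodings plus min/max/null-count stats.
                for (BlockMetaData block : reader.getFooter().getBlocks()) {
                    for (ColumnChunkMetaData column : block.getColumns()) {
                        System.out.println(column.getPath()
                                + " encodings=" + column.getEncodings()
                                + " stats=" + column.getStatistics());
                    }
                }
            }
        }
    }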
I'd like to convert an INT96 value such as ACIE4NxJAAAKhSUA into a readable timestamp format like 2020-03-02 14:34:22, or however it would normally be interpreted... I mostly use Python, so I'm looking to build a function that does this conversion. If there's another function that can do the reverse, even better.
Background: I'm using parquet-tools to convert a raw Parquet file (with snappy compression) to raw JSON via this command:...
Answered 2020-Mar-16 at 19:01
parquet-tools will not be able to change the format type from INT96 to INT64. What you are observing in the JSON output is a string representation of the timestamp stored as an INT96 TimestampType. You will need Spark to rewrite this Parquet file with the timestamp as an INT64 TimestampType; then the JSON output will produce a timestamp in the format you desire.
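As a side note, it is possible to decode one of those base64 strings by hand to sanity-check a value. The question asks for Python, but to stay consistent with the rest of this page here is a hedged Java sketch; it assumes the JSON value is the raw 12-byte INT96 encoded in base64, laid out as 8 little-endian bytes of nanoseconds-of-day followed by a 4-byte little-endian Julian day.

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.time.Instant;
    import java.util.Base64;

    public class Int96ToTimestamp {
        // Julian day number of the Unix epoch, 1970-01-01.
        private static final long JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588L;

        static Instant fromInt96(byte[] bytes) {
            ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
            long nanosOfDay = buf.getLong();             // first 8 bytes: nanoseconds within the day
            long julianDay = buf.getInt() & 0xFFFFFFFFL; // last 4 bytes: Julian day number
            long epochSeconds = (julianDay - JULIAN_DAY_OF_UNIX_EPOCH) * 86_400L;
            return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
        }

        public static void main(String[] args) {
            // Example value from the question; pass a different base64 string as the first argument.
            String encoded = args.length > 0 ? args[0] : "ACIE4NxJAAAKhSUA";
            System.out.println(fromInt96(Base64.getDecoder().decode(encoded)));
        }
    }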
You will need to set a specific config in Spark -
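The name of that setting is cut off above; a likely candidate is spark.sql.parquet.outputTimestampType, which controls whether Spark writes Parquet timestamps as INT96 or as INT64 (TIMESTAMP_MICROS / TIMESTAMP_MILLIS). A minimal Java/Spark sketch under that assumption, with placeholder input and output paths:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class RewriteInt96AsInt64 {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("rewrite-int96-timestamps")
                    // Assumed setting: write timestamps as INT64 micros instead of INT96.
                    .config("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")
                    .getOrCreate();

            Dataset<Row> df = spark.read().parquet("input.parquet");       // placeholder path
            df.write().mode("overwrite").parquet("output-int64.parquet");  // placeholder path

            spark.stop();
        }
    }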
No vulnerabilities reported
You can use parquet-tools like any standard Java library. Please include the jar files in your classpath. You can also use any IDE to run and debug the parquet-tools component as you would with any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.
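As a hedged example of calling it from your own code rather than from the shell, recent parquet-tools jars expose a CLI entry point that can be invoked directly (the class name below assumes a 1.9+ jar; older releases used a different package name, and the file path is a placeholder):

    public class ParquetToolsFromJava {
        public static void main(String[] args) throws Exception {
            // Equivalent to running "parquet-tools meta /path/to/file.parquet" on the command line.
            org.apache.parquet.tools.Main.main(new String[] {"meta", "/path/to/file.parquet"});
        }
    }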