parquet-avro-protobuf | Convert Protobuf to Parquet using parquet | Serialization library
kandi X-RAY | parquet-avro-protobuf Summary
Example: Convert Protobuf to Parquet using parquet-avro and avro-protobuf
Top functions reviewed by kandi - BETA
- Runs the example
- This method writes protobuf data to Avro file
- Write protobuf file
- Returns the alphanumeric for the given ordinal
Community Discussions
Trending Discussions on parquet-avro-protobuf
QUESTION
I'm very new to TFX, but have an apparently-working ML Pipeline which is to be used via BulkInferrer. That seems to produce output exclusively in Protobuf format, but since I'm running bulk inference I want to pipe the results to a database instead. (DB output seems like it should be the default for bulk inference, since both Bulk Inference & DB access take advantage of parallelization... but Protobuf is a per-record, serialized format.)
I assume I could use something like Parquet-Avro-Protobuf to do the conversion (though that's in Java and the rest of the pipeline's in Python), or I could write something myself to consume all the protobuf messages one-by-one, convert them into JSON, deserialize the JSON into a list of dicts, and load the dict into a Pandas DataFrame, or store it as a bunch of key-value pairs which I treat like a single-use DB... but that sounds like a lot of work and pain involving parallelization and optimization for a very common use case. The top-level Protobuf message definition is Tensorflow's PredictionLog.
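The manual route described above (messages to JSON, JSON to a list of dicts, dicts to a DataFrame) can be sketched roughly as follows. This is a minimal illustration, not a recommended pipeline: it assumes each message has already been rendered to a JSON string (for example with `google.protobuf.json_format.MessageToJson`), and the sample records are hypothetical and far simpler than a real `PredictionLog`. The helper names `flatten` and `json_messages_to_rows` are made up for this sketch.

```python
import json
from typing import Any, Dict, Iterable, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into a single level, joining keys with dots."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            # Lists and scalars are kept as-is; only dicts are expanded.
            flat[name] = value
    return flat


def json_messages_to_rows(json_messages: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse one JSON document per message and flatten each into a row dict."""
    return [flatten(json.loads(text)) for text in json_messages]


# Hypothetical input: stand-ins for JSON-rendered protobuf messages.
sample = [
    '{"id": 1, "prediction": {"label": "cat", "score": 0.9}}',
    '{"id": 2, "prediction": {"label": "dog", "score": 0.7}}',
]
rows = json_messages_to_rows(sample)
```

The resulting list of flat dicts is the shape that `pandas.DataFrame(rows)` accepts, with one column per dotted key.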
This must be a common use case, because TensorFlowModelAnalytics functions like this one consume Pandas DataFrames. I'd rather be able to write directly to a DB (preferably Google BigQuery), or a Parquet file (since Parquet / Spark seems to parallelize better than Pandas), and again, those seem like they should be common use cases, but I haven't found any examples. Maybe I'm using the wrong search terms?
I also looked at the PredictExtractor, since "extracting predictions" sounds close to what I want... but the official documentation appears silent on how that class is supposed to be used. I thought TFTransformOutput sounded like a promising verb, but instead it's a noun.
I'm clearly missing something fundamental here. Is there a reason no one wants to store BulkInferrer results in a database? Is there a configuration option that allows me to write the results to a DB? Maybe I want to add a ParquetIO or BigQueryIO instance to the TFX pipeline? (TFX docs say it uses Beam "under the hood" but that doesn't say much about how I should use them together.) But the syntax in those documents looks sufficiently different from my TFX code that I'm not sure if they're compatible?
Help?
ANSWER
Answered 2021-Jan-31 at 12:24
(Copied from the related issue for greater visibility)
After some digging, here is an alternative approach, which assumes no knowledge of the feature_spec beforehand. Do the following:
- Set the BulkInferrer to write to output_examples rather than inference_result by adding an output_example_spec to the component construction.
- Add a StatisticsGen and a SchemaGen component in the main pipeline right after the BulkInferrer to generate a schema for the aforementioned output_examples.
- Use the artifacts from SchemaGen and BulkInferrer to read the TFRecords and do whatever is necessary.
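For the last step, TensorFlow's own readers are the usual way to load TFRecords. As a rough illustration of what those files contain, here is a standard-library sketch of the TFRecord framing: each record is a little-endian uint64 length, a uint32 masked CRC of the length, the payload, and a uint32 masked CRC of the payload. The sketch skips CRC verification, and it assumes the files are gzip-compressed, which should be checked against the actual artifact. The function names are hypothetical.

```python
import gzip
import struct
from typing import BinaryIO, Iterator


def read_tfrecords(stream: BinaryIO) -> Iterator[bytes]:
    """Yield raw record payloads from a TFRecord stream (CRCs are read but not verified)."""
    while True:
        header = stream.read(12)  # uint64 length + uint32 masked CRC of the length
        if not header:
            return  # clean end of stream
        if len(header) < 12:
            raise ValueError("truncated TFRecord header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # uint32 masked CRC of the payload
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated TFRecord payload")
        yield payload


def read_gzipped_tfrecord_file(path: str) -> Iterator[bytes]:
    """Assumption: the file at `path` is a gzip-compressed TFRecord file."""
    with gzip.open(path, "rb") as stream:
        yield from read_tfrecords(stream)
```

Each payload is a serialized protobuf message. If the records are `tf.train.Example` messages, they still need to be parsed (for instance with `tf.train.Example.FromString`) before the schema from SchemaGen can be applied.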
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install parquet-avro-protobuf
You can use parquet-avro-protobuf like any standard Java library. Include the jar files in your classpath. You can run and debug the parquet-avro-protobuf component in any IDE, as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.