cobrix | A COBOL parser and Mainframe/EBCDIC data source for Apache Spark | Parser library
kandi X-RAY | cobrix Summary
The data source supports the following reader options (descriptions as given in the Cobrix documentation):

- Allows loading data from multiple unrelated paths on the same filesystem.
- Specifies the number of bytes to skip at the beginning or at the end of each file.
- Specifies the number of bytes to skip at the beginning of each record before applying copybook fields to data, or at the end of each record after applying them.
- Copybook truncation. Historically, the COBOL parser ignores the first 6 characters of each line and all characters after column 72. When this option is false, no truncation is performed. By default each line starts with a 6-character comment; the exact number of characters can be tuned. Likewise, all characters after the 72nd one of each line are ignored by default, and the exact cut-off can be tuned.
- Specifies if and how string fields should be trimmed. Available options: both (default), none, left, right.
- Specifies a code page for EBCDIC encoding. Currently supported values: common (default), common_extended, cp037, cp037_extended, cp875. The *_extended code pages support non-printable characters that convert to ASCII codes below 32. A user-provided class for a custom code page to Unicode conversion can also be specified.
- Specifies a charset to use to decode ASCII data. The value can be any charset supported by java.nio.charset: US-ASCII (default), UTF-8, ISO-8859-1, etc.
- Specifies whether UTF-16 encoded strings (National / PIC N format) are big-endian (default).
- Specifies a floating-point format. Available options: IBM (default), IEEE754, IBM_little_endian, IEEE754_little_endian.
- Variable-size OCCURS. If false (default), fields that have OCCURS 0 TO 100 TIMES DEPENDING ON clauses always have the same size, corresponding to the maximum array size (e.g. 100 in this example). If set to true, the size of each such field shrinks to the actual number of elements.
- .option("occurs_mapping", "{\"FIELD\": {\"X\": 1}}"). If specified as a JSON string, allows string DEPENDING ON fields with a corresponding mapping.
- If true, values that contain only 0x0 for DISPLAY strings and numbers will be considered nulls instead of empty strings.
- When collapse_root (default), the root-level record is removed from the Spark schema. When keep_original, the root-level GROUP is kept in the Spark schema.
- If true, all GROUP FILLERs are dropped from the output schema; if false (default), such fields are retained. Non-GROUP FILLERs are dropped by default (true) and retained only if the option is set to false.
- Specifies groups to also be added to the schema as string fields. When this option is specified, the reader adds one extra data field after each matching group containing the string data for the group.
- Generates autoincremental 'File_Id' and 'Record_Id' fields. This is used for processing record-order-dependent data.
- Generates a column containing the input file name for each record (similar to the Spark SQL input_file_name() function). The column name is specified by the value of the option. This option only works for variable record length files; for fixed record length files use input_file_name().
- If specified, each primitive field will be accompanied by a debug field containing raw bytes from the source file. Possible values: none (default), hex, binary. The legacy value true is supported and will generate debug fields in HEX.

Record format related options:

| Option example | Description |
| --- | --- |
| .option("record_format", "F") | Record format from the spec. One of F (fixed length, default), FB (fixed block), V (variable length, RDW), VB (variable block, BDW+RDW), D (ASCII text). |
| .option("record_length", "100") | Overrides the length of the record (in bytes). Normally the size is derived from the copybook, but explicitly specifying the record size can be helpful for debugging fixed record length files. |
| .option("block_length", "500") | Specifies the block length for FB records. It should be a multiple of record_length. Cannot be used together with records_per_block. |
| .option("records_per_block", "5") | Specifies the number of records per block for FB records. Cannot be used together with block_length. |
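As a quick orientation, here is a minimal sketch of how a few of these options are combined in a Spark job. The copybook and data paths are placeholders, and the exact combination of options (copybook, record_format, string_trimming_policy) is an assumption to adapt, not a canonical recipe.

```scala
import org.apache.spark.sql.SparkSession

object CobrixReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CobrixReadSketch").getOrCreate()

    // Read a fixed record length EBCDIC file using a copybook as the schema.
    val df = spark.read
      .format("cobol")
      .option("copybook", "/path/to/copybook.cpy")       // placeholder path
      .option("record_format", "F")                      // fixed-length records (see table above)
      .option("string_trimming_policy", "both")          // trim both ends of string fields
      .load("/path/to/ebcdic/data")                      // placeholder path

    df.printSchema()
    df.show(20, truncate = false)
  }
}
```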
Community Discussions
Trending Discussions on cobrix
QUESTION
I am first trying to build a simple dataframe from a mainframe source with cobrix, to find out how it deals with EBCDIC files.
My input looks like this (hex): 313030100C3230301A0C. If I quickly open it with Notepad++, I see the attached raw_data screenshot.
I use these options to read my data and turn it into a dataframe.
I have tried all the supported EBCDIC encoding values without success. I also tried to change S9(3). to 999. or 9(3). in the .cobol file, but it does not change anything.
My output does not look like what I was expecting.
It works fine with "classic" ASCII encoding and without COMP-3. Can you help me find out why my df does not look as expected?
Many thanks!
...ANSWER
Answered 2021-Feb-19 at 14:58
Your input data has already been converted from EBCDIC to ASCII.
The first S9(3) characters, '100', should be hex 'F1F0F0' in EBCDIC.
However your file was transferred, the transfer converted all bytes to ASCII, thereby corrupting the COMP-3 values, which are not valid EBCDIC characters.
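The answer's point can be checked directly on the JVM: the EBCDIC encoding of the digits '100' differs from the ASCII bytes visible in the question's hex dump. A minimal sketch, assuming the JDK's Cp037 (IBM EBCDIC US/Canada) charset is available:

```scala
import java.nio.charset.Charset

object EbcdicVsAscii {
  def main(args: Array[String]): Unit = {
    val ebcdic = Charset.forName("Cp037")   // IBM EBCDIC (US/Canada)
    val ascii  = Charset.forName("US-ASCII")

    def hex(bytes: Array[Byte]): String = bytes.map(b => f"${b & 0xFF}%02X").mkString

    // '100' as text: F1F0F0 in EBCDIC, 313030 in ASCII (what the question's dump shows).
    println(hex("100".getBytes(ebcdic)))  // F1F0F0
    println(hex("100".getBytes(ascii)))   // 313030
  }
}
```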
QUESTION
I have an EBCDIC file in HDFS. I want to load the data into a Spark dataframe, process it, and write the results as ORC files. I found that there is an open source solution, cobrix, that allows reading data from EBCDIC files, but the developer must provide a copybook file, which is a schema definition. A few lines of my EBCDIC file are shown in the attached image. I want to work out the copybook format for this EBCDIC file; essentially I want to read vin, whose length is 17, vin_data, whose length is 3, and finally vin_val, whose length is 100.
...ANSWER
Answered 2020-Sep-21 at 10:46
"How to define a copybook file of EBCDIC data?"
You don't.
A copybook may be used as a record definition (i.e. how the data is stored); it has nothing to do with the encoding of the data that may be stored in it.
This leaves the question "How do I define the record structure?"
You'd need the number of fields, their lengths and types (it is likely not only USAGE DISPLAY), and then just define them with some fancy names. Ideally you get the original record definition from the COBOL program that writes the file, put it into a copybook if it isn't in one yet, and use that.
Your link has samples that actually show what a copybook looks like. If you struggle with the definition, please edit your question with the copybook you've defined and we may be able to help.
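Purely as an illustration of what such a record definition might look like for the three fields mentioned (17, 3 and 100 bytes), here is a sketch that passes an inline copybook to Cobrix. The group and field names and the PIC X clauses are assumptions (use numeric pictures such as 9(3) if a field is actually numeric), and copybook_contents is used here to supply the copybook as a string instead of a file path:

```scala
import org.apache.spark.sql.SparkSession

object CopybookSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CopybookSketch").getOrCreate()

    // Hypothetical layout: a level-01 record with three elementary fields.
    val copybook =
      """       01  VIN-RECORD.
        |           05  VIN       PIC X(17).
        |           05  VIN-DATA  PIC X(3).
        |           05  VIN-VAL   PIC X(100).
        |""".stripMargin

    val df = spark.read
      .format("cobol")
      .option("copybook_contents", copybook)   // inline copybook instead of a file
      .load("hdfs:///path/to/ebcdic/data")     // placeholder path

    df.printSchema()
  }
}
```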
QUESTION
I am planning to use a Scala object in PySpark. Below is the code in Scala:
...ANSWER
Answered 2020-Apr-11 at 15:38
You forgot to declare the object in Scala so the Python part can find it. Something like this:
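The answer's original snippet is not reproduced above; the sketch below only illustrates the idea with hypothetical names. A top-level Scala object compiled into a jar that is attached to the cluster can be reached from PySpark through the JVM gateway (spark._jvm), so wrapping the read logic in an object is one way to expose it:

```scala
package com.example.cobolbridge  // hypothetical package name

import org.apache.spark.sql.{DataFrame, SparkSession}

// From PySpark the object would be reachable roughly like this (illustrative only):
//   jdf = spark._jvm.com.example.cobolbridge.CobolReader.read(
//       spark._jsparkSession, "/path/copybook.cpy", "/path/data")
//   df = DataFrame(jdf, spark)   # wrap the returned Java DataFrame for use in Python
object CobolReader {
  def read(spark: SparkSession, copybookPath: String, dataPath: String): DataFrame =
    spark.read
      .format("cobol")
      .option("copybook", copybookPath)
      .load(dataPath)
}
```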
QUESTION
Does anyone know how to integrate cobrix into Azure Databricks (PySpark) for processing a mainframe file that has COMP-3 columns (Python 3)?
Please see the link below for the detailed issue: https://github.com/AbsaOSS/cobrix/issues/236#issue-550885564
...ANSWER
Answered 2020-Jan-30 at 05:24
To make third-party or locally-built code available to notebooks and jobs running on your clusters, you can install a library. Libraries can be written in Python, Java, Scala, and R. You can upload Java, Scala, and Python libraries and point to external packages in PyPI, Maven, and CRAN repositories.
Steps to install third-party libraries:
Step 1: Create a Databricks cluster.
Step 2: Select the cluster created.
Step 3: Select Libraries => Install New => Library Source = "Maven" => Coordinates => Search Packages => Select Maven Central => search for the required package (example: spark-cobol, cobol-parser, scodec) => select the required version => Install.
For more details, refer "Azure Databricks - libraries" and "Cobrix: A Mainframe Data Source for Spark SQL and Streaming".
Hope this helps. Do let us know if you have any further queries.
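For step 3, the Cobrix packages are published under the za.co.absa.cobrix group (for example the spark-cobol artifact built for your cluster's Scala version). Once the library is attached, a short Scala notebook cell such as the sketch below (paths are placeholders) can confirm the install; COMP-3 fields declared in the copybook are decoded into numeric columns by the library itself:

```scala
// Databricks Scala notebook cell: `spark` is predefined on the cluster.
val df = spark.read
  .format("cobol")
  .option("copybook", "/dbfs/FileStore/copybooks/account.cpy")  // hypothetical DBFS path
  .load("/dbfs/FileStore/data/account_data.dat")                // hypothetical DBFS path

df.printSchema()   // COMP-3 (packed decimal) fields appear as numeric columns
df.show(5, truncate = false)
```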
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install cobrix
SparkTypesApp is an example of very simple mainframe file processing. It is a fixed record length raw data file with a corresponding copybook. The copybook contains examples of various numeric data types Cobrix supports.
SparkCobolApp is an example of a Spark job for handling multisegment variable record length mainframe files; a minimal variable-length read sketch follows this list.
SparkCodecApp is an example usage of a custom record header parser. This application reads a variable record length file having non-standard RDW headers. In this example the RDW header is 5 bytes instead of 4.
SparkCobolHierarchical is an example of processing an EBCDIC multisegment file extracted from a hierarchical database.
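As a companion to the SparkCobolApp and SparkCodecApp descriptions above, here is a minimal sketch of reading a variable record length (RDW) file. record_format comes from the option table earlier on this page; generate_record_id is assumed to be the option name for the File_Id/Record_Id feature mentioned there, and the paths are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object VariableLengthReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("VariableLengthReadSketch").getOrCreate()

    val df = spark.read
      .format("cobol")
      .option("copybook", "/path/to/copybook.cpy")   // placeholder path
      .option("record_format", "V")                  // variable length records with RDW headers
      .option("generate_record_id", "true")          // add File_Id / Record_Id columns
      .load("/path/to/variable_length_data")         // placeholder path

    df.printSchema()
  }
}
```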
Spark 2.2.1
Driver memory: 4GB
Driver cores: 4
Executor memory: 4GB
Cores per executor: 1