reair | tools for replicating tables | Storage library

 by airbnb | Java | Version: Current | License: Apache-2.0

kandi X-RAY | reair Summary

reair is a Java library typically used in Storage applications. It has no reported vulnerabilities, a permissive Apache-2.0 license, a build file available, and low support activity. However, static analysis flags 96 bugs. You can download it from GitHub.

The replication features in ReAir are useful for the following use cases:
• Warehouse migration: when migrating a Hive warehouse, ReAir can copy existing data over to the new warehouse. Because ReAir copies both data and metadata, datasets are ready to query as soon as the copy completes.
• Warehouse isolation: many organizations start out with a single Hive warehouse, but often want better isolation between production and ad hoc workloads. Two isolated Hive warehouses accommodate this need well, and with two warehouses there is a need to replicate evolving datasets. ReAir can replicate data from one warehouse to another and propagate updates incrementally as they occur.
• Disaster recovery: ReAir can replicate datasets to a hot-standby warehouse for fast failover.
To accommodate these use cases, ReAir includes both batch and incremental replication tools. Batch replication executes a one-time copy of a list of tables. Incremental replication is a long-running process that copies objects as they are created or changed on the source warehouse.
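The incremental replication flow described above can be sketched as follows. This is a simplified Python illustration, not ReAir's actual implementation (ReAir is Java); the audit-log entries, destination store, and checkpoint structure here are assumptions made for the sketch:

```python
# Minimal sketch of an incremental replication loop: entries appended to an
# audit log on the source are applied to the destination in ID order, and a
# checkpoint records the last applied ID so the process can resume after a
# restart. Names and structures are illustrative, not ReAir's actual API.

def replicate_incrementally(audit_log, destination, checkpoint):
    """Apply all audit-log entries newer than the checkpoint."""
    for entry in audit_log:
        if entry["id"] <= checkpoint["last_id"]:
            continue  # already replicated before a restart
        # Copy the changed object (ReAir copies both data and metadata).
        destination[entry["table"]] = entry["data"]
        # Persist progress so a killed process resumes where it left off.
        checkpoint["last_id"] = entry["id"]
    return checkpoint["last_id"]

audit_log = [
    {"id": 1, "table": "events", "data": "v1"},
    {"id": 2, "table": "users", "data": "v1"},
    {"id": 3, "table": "events", "data": "v2"},
]
destination = {}
checkpoint = {"last_id": 1}  # entry 1 was replicated before a restart

replicate_incrementally(audit_log, destination, checkpoint)
print(destination)            # {'users': 'v1', 'events': 'v2'}
print(checkpoint["last_id"])  # 3
```

The checkpoint is what makes the real process safe to kill and restart, as described in the installation notes below.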

            Support

              reair has a low active ecosystem.
              It has 261 star(s) with 95 fork(s). There are 42 watchers for this library.
              It had no major release in the last 6 months.
              There are 5 open issues and 15 have been closed. On average, issues are closed in 118 days. There are 6 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of reair is current.

            Quality

              reair has 96 bugs (19 blocker, 2 critical, 62 major, 13 minor) and 1847 code smells.

            Security

              reair has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              reair code analysis shows 0 unresolved vulnerabilities.
              There are 9 security hotspots that need review.

            License

              reair is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              reair releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              Installation instructions, examples and code snippets are available.
              reair saves you 14429 person hours of effort in developing the same functionality from scratch.
              It has 28874 lines of code, 2235 functions and 174 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed reair and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality reair implements and to help you decide if it suits your requirements.
            • Main entry point for the copy
            • Retrieves a single partition
            • Returns the ThreadLocal metastore client
            • Copy a table
            • Returns true if the field with the specified field ID is set
            • Run the replication task
            • Run a single MR copy job
            • Run a batch replication
            • Runs a commit change job
            • Returns true if the field is set
            • Resets this record so that it can be reused
            • Compares this job info with the specified value
            • Insert a query entry into the audit log
            • Fetch the replication job data from the thrift server
            • Runs the copy partition task
            • Launches the audit log entry
            • Retrieves a job information from the database
            • Splits the input splits into chunks
            • Get the input splits
            • Run the rename task
            • Perform a mapping operation
            • Main entry point
            • Rename the destination
            • Sets the field value
            • Returns a string representation of this TReplicationJob
            • Ordered by id

            reair Key Features

            No Key Features are available at this moment for reair.

            reair Examples and Code Snippets

            No Code Snippets are available at this moment for reair.

            Community Discussions

            QUESTION

            Sync files on HDFS that are the same size but differ in contents
            Asked 2019-May-08 at 08:21

            I am trying to sync files from one Hadoop cluster to another using DistCp and Airbnb's ReAir utility, but neither works as expected.

            If a file is the same size on the source and the destination, both tools fail to update it even when the contents have changed (the checksums also differ), unless the overwrite option is used.

            I need to keep around 30 TB of data in sync, so loading the complete dataset every time is not feasible.

            Could anyone please suggest how I can bring the two datasets into sync when the file sizes are the same (but the source contents have changed) and the checksums differ?

            ...

            ANSWER

            Answered 2018-Jan-24 at 04:17

            The way DistCp handles syncing between files that are the same size but have different contents is by comparing their so-called FileChecksum. The FileChecksum was first introduced in HADOOP-3981, mostly for the purpose of being used in DistCp. Unfortunately, this has the known shortcoming of being incompatible between different storage implementations, and even between HDFS instances that have different internal block/chunk settings. Specifically, the FileChecksum bakes in the structure of having, for example, 512 bytes per chunk and 128 MB per block.
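The layout dependence can be illustrated with a toy model of a composite checksum. This is a simplified Python sketch, not HDFS's actual FileChecksum algorithm (which involves per-chunk CRCs under the MD5 layers), but it shows why identical bytes can yield different checksums under different block settings:

```python
import hashlib

def block_checksum(data: bytes, block_size: int) -> str:
    """Toy FileChecksum: MD5 over the MD5s of fixed-size blocks.

    Like HDFS's MD5-of-MD5s style checksum, the result depends on the
    block layout, not just on the bytes of the file.
    """
    digest = hashlib.md5()
    for offset in range(0, len(data), block_size):
        digest.update(hashlib.md5(data[offset:offset + block_size]).digest())
    return digest.hexdigest()

data = b"x" * 1024

# Same bytes always give the same plain, whole-file MD5 ...
assert hashlib.md5(data).hexdigest() == hashlib.md5(data).hexdigest()

# ... but different block sizes give different composite checksums, so two
# stores with different internal block settings disagree on identical files.
assert block_checksum(data, 128) != block_checksum(data, 256)
```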

            Since GCS doesn't have the same notions of "chunks" or "blocks", there's no way for it to have any similar definition of a FileChecksum. The same is also true of all other object stores commonly used with Hadoop; the DistCp documentation appendix discusses this fact under "DistCp and Object Stores".

            That said, there's a neat trick that can be done to define a nice standardized representation of a composite CRC for HDFS files that is mostly in-place compatible with existing HDFS deployments; I've filed HDFS-13056 with a proof of concept to try to get this added upstream, after which it should be possible to make it work out-of-the-box against GCS, since GCS also supports file-level CRC32C.
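The appeal of a CRC here is that, unlike MD5-of-MD5s, a file-level CRC can be made independent of the block layout. Python's `zlib.crc32` makes the idea easy to see because it accepts a running value; this is only a conceptual sketch (HDFS-13056 combines independently computed chunk CRCs mathematically rather than streaming them):

```python
import zlib

def chunked_crc32(data: bytes, block_size: int) -> int:
    """CRC32 computed block by block, carrying the running value forward."""
    crc = 0
    for offset in range(0, len(data), block_size):
        crc = zlib.crc32(data[offset:offset + block_size], crc)
    return crc

data = b"some file contents replicated between warehouses" * 100

whole = zlib.crc32(data)
# Any block layout yields the same file-level CRC, which is why a
# composite CRC can be compared across stores with different block sizes.
assert chunked_crc32(data, 512) == whole
assert chunked_crc32(data, 4096) == whole
```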

            Source https://stackoverflow.com/questions/48289719

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install reair

          • If the MySQL tables for incremental replication were not set up while setting up the audit log, create the state tables for incremental replication on the desired MySQL instance by running the create table commands listed here.
          • Read through and fill out the configuration from the template. You might want to deploy the file to a widely accessible location.
          • Switch to the repo directory and build the JAR. You can skip the unit tests if no changes have been made (via the '-x test' flag). Once the build finishes, the JAR to run the incremental replication process can be found under main/build/libs/airbnb-reair-main-1.0.0-all.jar.
          • To start replicating, set options to point to the appropriate logging configuration and kick off the replication launcher using the hadoop jar command on the destination cluster. An example log4j.properties file is provided here. Be sure to specify the configuration file that was filled out in the prior step. As with batch replication, you may need to run the process as a different user. If you use the recommended log4j.properties file that is shipped with the tool, messages at the INFO level will be printed to stderr, while more detailed messages at the DEBUG level will be recorded to a log file in the current working directory.
          • Verify that entries are replicated properly by creating a test table on the source warehouse and checking that it appears on the destination warehouse.

            When the incremental replication process is launched for the first time, it starts replicating entries after the highest-numbered ID in the audit log. Because the process periodically checkpoints its progress to the DB, it can be killed and will resume from where it left off when restarted. To override this behavior, see the additional options section. For production deployment, an external process should monitor and restart the replication process if it exits: the process exits if the number of consecutive failures while making RPCs or DB queries exceeds the configured number of retries.
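The split logging behavior described above (INFO to stderr, DEBUG detail to a file) could come from a log4j configuration along these lines. This is a hypothetical sketch, not the log4j.properties file shipped with ReAir; the appender names and log file name are assumptions:

```properties
# Everything flows from the root logger; appender thresholds do the filtering.
log4j.rootLogger=DEBUG, console, file

# Console appender: INFO and above to stderr.
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Target=System.err
log4j.appender.console.Threshold=INFO
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %p [%t] %c{2}: %m%n

# File appender: DEBUG detail to a log file in the current working directory.
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=replication.log
log4j.appender.file.Threshold=DEBUG
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d %p [%t] %c: %m%n
```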

            Support

          • Blog Post
          • FAQ
          • Known Issues
          • Large HDFS Directory Copy
            Find more information at:

            CLONE
          • HTTPS: https://github.com/airbnb/reair.git
          • CLI: gh repo clone airbnb/reair
          • SSH: git@github.com:airbnb/reair.git



            Consider Popular Storage Libraries
          • localForage by localForage
          • seaweedfs by chrislusf
          • Cloudreve by cloudreve
          • store.js by marcuswestin
          • go-ipfs by ipfs

            Try Top Libraries by airbnb
          • javascript by airbnb (JavaScript)
          • lottie-android by airbnb (Java)
          • lottie-web by airbnb (JavaScript)
          • lottie-ios by airbnb (Swift)
          • visx by airbnb (TypeScript)