OpenRefine | open source power tool for working with messy data

 by OpenRefine | Java | Version: 3.7.2 | License: BSD-3-Clause

kandi X-RAY | OpenRefine Summary

OpenRefine is a Java library typically used in Data Science applications. It has no reported bugs or vulnerabilities, a build file is available, it has a permissive license, and it has medium support. You can download it from GitHub or Maven.

OpenRefine is a Java-based power tool that allows you to load data, understand it, clean it up, reconcile it, and augment it with data coming from the web. All from a web browser and the comfort and privacy of your own computer.
            Support

              OpenRefine has a medium active ecosystem.
              It has 9524 star(s) with 1837 fork(s). There are 476 watchers for this library.
              It had no major release in the last 12 months.
              There are 584 open issues and 2253 have been closed. On average issues are closed in 407 days. There are 21 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of OpenRefine is 3.7.2

            Quality

              OpenRefine has 0 bugs and 0 code smells.

            Security

              OpenRefine has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              OpenRefine code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              OpenRefine is licensed under the BSD-3-Clause License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              OpenRefine releases are available to install and integrate.
              Deployable package is available in Maven.
              Build file is available. You can build the component from source.
              Installation instructions are available. Examples and code snippets are not available.
              OpenRefine saves you 87305 person hours of effort in developing the same functionality from scratch.
              It has 100859 lines of code, 5388 functions and 1368 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed OpenRefine and identified the functions below as its top functions. The list is intended to give you a quick view of the functionality OpenRefine implements and to help you decide whether it suits your requirements.
            • Parse a numeric token.
            • Retrieves data from a post request.
            • Returns the next token.
            • Encode the main loop.
            • Parse a factor.
            • Gets the insert SQL.
            • Gets the create SQL.
            • Export rows.
            • Retrieves the data directory.
            • Generate a serializable log event.

            OpenRefine Key Features

            No Key Features are available at this moment for OpenRefine.

            OpenRefine Examples and Code Snippets

            How to use two filters in Haskell?
            Lines of Code: 17 | License: Strong Copyleft (CC BY-SA 4.0)
            main :: IO ()
            main = do
              let x1 = allSubseqs2 [6,3,1,5,2,7,8,1]
              print $ filter' ((==) (maximum (map' length x1)) . length) x1
            
            longSubseqs values = do
              let x1 = allSubseqs2 values
              filter' ((==) (maximum (map' 
            java calling method fails as method undefined
            Java | Lines of Code: 43 | License: Strong Copyleft (CC BY-SA 4.0)
            public class Test 
            {
                public static void main(String[] args)
            {
            System.out.println(countFileRecords());
            }
            
            package com;
            
            import java.io.FileInputStream;
            import java.io.FileNotFoundException;
            import java.util.Scann
            How to add MongoDB dependency to Java
            Java | Lines of Code: 91 | License: Strong Copyleft (CC BY-SA 4.0)
            
            
                4.0.0
            
                com.example
                mongodb-javafx-demo
                1.0-SNAPSHOT
                mongo
            
                
                    UTF-8
                    18
                
            
                
                    
                        org.openjfx
                        javafx-controls
                        ${javafx.version}
                    
                    
                    
            rmarkdown beamer reduce font size for references
            Lines of Code: 26 | License: Strong Copyleft (CC BY-SA 4.0)
            ---
            title: "minimal reproducible example"
            author: "User"
            date: "April 2022"
            output: 
              beamer_presentation:
                keep_tex: true
            bibliography: test.bib
            header-includes:
              - \AtBeginEnvironment{CSLReferences}{\tiny}
            ---
            
            ## Main question
            
            
            
            H
            Why does H2 alias duplicate inserted rows?
            Java | Lines of Code: 9 | License: Strong Copyleft (CC BY-SA 4.0)
            String url = conn.getMetaData().getURL();
            if (url.equals("jdbc:columnlist:connection")) {
                SimpleResultSet rs = new SimpleResultSet();
                // With some connection options "id" should be used instead
                rs.addColumn("ID", Types.BIGINT, 
            Best suited data structure for prefix matching search
            Lines of Code: 130 | License: Strong Copyleft (CC BY-SA 4.0)
            class TrieNode {
                constructor(data=null) {
                    this.children = {}; // Dictionary, 
                    this.data = data; // Non-null when this node represents the end of a valid word
                }
                addWord(word, data) {
                    let node = this; // t
            populating an ArrayList of unknown size- error: class expected nextInt
            Java | Lines of Code: 23 | License: Strong Copyleft (CC BY-SA 4.0)
            import java.util.Scanner;    
            import java.util.ArrayList;    
            
            public class MyClass {
              public static void main(String args[]) {
                
                Scanner sc = new Scanner(System.in);    //Telling the scanner class to accept input from keyboard
            
              
            Creating a serializable fixed size char array in F#
            Lines of Code: 93 | License: Strong Copyleft (CC BY-SA 4.0)
            #nowarn "9"
            
            open System
            open System.Runtime.InteropServices
            open BenchmarkDotNet.Attributes
            open BenchmarkDotNet.Running
            open Microsoft.FSharp.NativeInterop
            
            type ShortEventDataRec =
                {
                    Timestamp: DateTime
                    Event:     by
            gridView builder with dynamic Filter (search)
            Lines of Code: 221 | License: Strong Copyleft (CC BY-SA 4.0)
            class HomeScreen00 extends StatefulWidget {
              @override
              _HomeScreen00State createState() => _HomeScreen00State();
            }
            
            class _HomeScreen00State extends State {
              List myIds = [];
              List myServiceNames = [];
              List myImagesUrl = [];
              bo
            export 'string_extension.dart';
            export 'list_extension.dart';
            
            abstract class Extensions {}
            

            Community Discussions

            QUESTION

            Storing RDF to Triple Store as input: Conversion from CSV to RDF
            Asked 2022-Feb-05 at 16:46

            I am using a triple store called Apache Jena Fuseki to store RDF as input, but my data is in CSV format. I researched a lot but did not find a direct way to convert CSV to RDF. There is tarql, a command-line tool that can do the job, but I need a Python script that converts my CSV to RDF directly.

            I have used tools such as OpenRefine and tarql, but I need a Python script for this job. I have read that owlready2 can also be used to convert CSV to RDF, but when I visited the official site I found that it uses an OWL file for this work.

            Thanks!

            ...

            ANSWER

            Answered 2022-Feb-05 at 16:46

            CSVW - CSV on the Web - is a W3C Recommendation for this. There is a python implementation.

            Or you can run "tarql" from python by forking a subprocess.
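For small, flat files, a direct conversion can also be sketched with the Python standard library alone. The block below is an illustration, not CSVW or tarql: it assumes one column holds a row identifier, and it invents the function names `csv_to_ntriples` and `escape_literal` and the base IRI `http://example.org/` purely for the example. Each row becomes a subject, each remaining column a predicate, and each non-empty cell a plain string literal in N-Triples syntax.

```python
import csv
import io
from urllib.parse import quote


def escape_literal(text):
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, base_iri, id_column):
    """Emit one triple per non-empty cell, using id_column to build the subject IRI.

    base_iri and the column-name-to-predicate mapping are assumptions of this
    sketch; a real vocabulary would normally be mapped explicitly.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<%s%s>" % (base_iri, quote(row[id_column]))
        for column, cell in row.items():
            if column == id_column or not cell:
                continue  # the id is the subject; empty cells produce no triple
            predicate = "<%s%s>" % (base_iri, quote(column))
            lines.append('%s %s "%s" .' % (subject, predicate, escape_literal(cell)))
    return "\n".join(lines)
```

The resulting text can be saved with an `.nt` extension and loaded into a triple store that accepts N-Triples. Typed literals, language tags, and links between rows are deliberately left out.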

            Source https://stackoverflow.com/questions/70997605

            QUESTION

            OpenRefine sample extension not building
            Asked 2021-Dec-30 at 15:25

            I'd like to write my own OpenRefine extension

            Before starting any implementation, I just want to build the sample extension from OpenRefine just to get me started.

            However, I'm getting the Maven error

            ...

            ANSWER

            Answered 2021-Dec-30 at 15:25

            OK, I think the sample project has the wrong version in its pom.xml. It should be:

            Source https://stackoverflow.com/questions/70529687

            QUESTION

            Does OpenRefine support Python3?
            Asked 2021-Dec-29 at 22:31

            I have my own Python library that I would like to use in OpenRefine as described here

            However, it seems that all the Python code in OpenRefine goes through Jython which supports only Python 2

            Is there a way to run Python3 code in OpenRefine?

            cheers

            ...

            ANSWER

            Answered 2021-Dec-29 at 06:36

            Short answer: NO. OpenRefine uses Jython, which is currently based on Python 2.7, and there are no immediate or short-term plans to move to 3.x versions.

            BUT.

            There is a trick to do this, as long as you have Python 3 installed on your machine. Python 2 allows you to execute a command-line script or tool and collect the result.

            This simple Python 2 script will do that:
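The answer's own script is not reproduced on this page. As a minimal sketch of the idea, assuming the `subprocess` module is usable from the Jython interpreter and using a hypothetical helper name `run_external`, the calling side could look like this. The code is written to run unchanged under Python 2.7 and Python 3.

```python
import subprocess


def run_external(interpreter, script_path, argument):
    """Run `interpreter script_path argument` and return its standard output as text.

    interpreter would be the path to a Python 3 executable; script_path is the
    Python 3 script that does the real work. Both are supplied by the caller.
    """
    process = subprocess.Popen(
        [interpreter, script_path, argument],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = process.communicate()
    if process.returncode != 0:
        # Surface the external script's error message instead of returning junk.
        raise RuntimeError(err.decode("utf-8", "replace"))
    return out.decode("utf-8").strip()
```

In an OpenRefine Python/Jython expression, this would end with something like `return run_external("python3", "/path/to/script.py", value)`, where the script path is a placeholder. Starting one process per cell is slow, so this suits small projects better than large ones.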

            Source https://stackoverflow.com/questions/70515655

            QUESTION

            OpenRefine: swapping order strings within a column of values
            Asked 2021-Nov-14 at 08:43

            I have a column of values with a range of dates formatted as DD Month YYYY, but I want this to read Month DD YYYY. So, for example, "14 October 2021" should be "October 14 2021". Is there a simple way to do this in OpenRefine?

            Thank you!

            ...

            ANSWER

            Answered 2021-Oct-14 at 21:36

            From a Google search, it looks like OpenRefine can evaluate Python expressions through Jython, so you could try a Python expression.
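One way to do the swap in Python is to split the value on whitespace and reorder the parts. This is a sketch under the assumption that every cell holds exactly three space-separated parts in the order day, month, year; the function name `swap_day_month` is made up for the example, and anything that does not fit the pattern is returned unchanged.

```python
def swap_day_month(value):
    """Rewrite 'DD Month YYYY' as 'Month DD YYYY'; leave other values untouched."""
    parts = value.split()
    if len(parts) != 3:
        return value  # not in the expected three-part shape
    day, month, year = parts
    return " ".join([month, day, year])
```

For example, `swap_day_month("14 October 2021")` gives `"October 14 2021"`. In an OpenRefine Python/Jython expression, the body of the function would be used directly, with `value` being the cell and `return` producing the new cell content.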

            Source https://stackoverflow.com/questions/69576075

            QUESTION

            Extract text using GREL in OpenRefine
            Asked 2021-Sep-03 at 20:25

            I'm trying to add a column based on a column in OpenRefine using GREL.

            I need to extract all the text after the second space in the scientific name.

            Here are two examples of the original cell data ---> what I want to extract:

            Amandinea punctata (Hoffm.) Coppins & Scheid. ---> (Hoffm.) Coppins & Scheid.
            Agonimia tristicula (Nyl.) Zahlbr. ---> (Nyl.) Zahlbr.

            ...

            ANSWER

            Answered 2021-Aug-31 at 14:58

            One solution: partition on what appears to be a good separator, " (", take the right-hand part, and add the missing "(" back at the beginning.
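The same partition idea can be sketched in Python, next to a literal reading of the question (everything after the second space). Both function names are invented for this example, and the code is an illustration in Python rather than the GREL expression from the answer.

```python
def authority_by_partition(value):
    """Split at the first ' (' and put the '(' back in front of the right-hand part."""
    head, separator, tail = value.partition(" (")
    return "(" + tail if separator else ""


def text_after_second_space(value):
    """Return everything after the second space, or '' if there are fewer than two."""
    parts = value.split(" ", 2)
    return parts[2] if len(parts) == 3 else ""
```

On the two sample names both functions return the same text. They differ when the authority does not start with a parenthesis: the partition version returns an empty string, while the second-space version still returns the remainder.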

            Source https://stackoverflow.com/questions/68997950

            QUESTION

            How to get csv with header from xml
            Asked 2021-Jul-01 at 13:35

            I have a tei listPerson

            ...

            ANSWER

            Answered 2021-Jul-01 at 13:35

            With XSLT 2 or 3, I usually prefer to use xsl:value-of with its separator attribute to construct the lines of CSV, e.g.

            Source https://stackoverflow.com/questions/68209714

            QUESTION

            OpenRefine: How can I offset values? (preceding row to the following row)
            Asked 2021-Jun-03 at 09:22

            Let's suppose I have this list in OpenRefine:

            • A
            • B
            • C

            Is there a way to move (offset values) B to A like the following?

            • A B
            • B C
            ...

            ANSWER

            Answered 2021-May-31 at 08:08

            With the cross() function, and v3.5 of OpenRefine (currently in beta) you can access previous or following rows by not supplying the field name. You can achieve the same by creating an index column in v3.4.

            So, you can do cells.ColumnName.value +" "+ cross(row.index + 1, "", "")[0].cells.ColumnName.value to get the value of the next row appending the value of that cell in the current row, with a space.

            Note that this takes the value from the row with the next higher index, which is not necessarily the row that follows in the display if you use sorting.

            Regards, Antoine
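Outside OpenRefine, the effect of that expression can be sketched in a few lines of Python. The helper name `join_with_next` is hypothetical, and the list stands in for one column read in index order. The last row has no successor, so this sketch leaves it unchanged.

```python
def join_with_next(values, separator=" "):
    """Append to each value the value of the following row, by index."""
    result = []
    for index, current in enumerate(values):
        if index + 1 < len(values):
            result.append(current + separator + values[index + 1])
        else:
            result.append(current)  # last row: nothing follows it
    return result
```

With the list from the question, `join_with_next(["A", "B", "C"])` returns `["A B", "B C", "C"]`.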

            Source https://stackoverflow.com/questions/67764565

            QUESTION

            OpenRefine: How to create a unique row for each input in a column (delimited by commas)
            Asked 2021-Jun-01 at 22:15

            I have a bunch of product data to clean prior to entry into a database that looks like this:

            COL A    | COL B      | COL C... "N"
            Option 1 | A, B, C, D | Option 1 attribute
            Option 2 | C, D, F    | Option 2 attribute
            Option 3 | D, J, Z    | Option 3 attribute

            And I'd like for it to look like this with a unique row for every unique product option:

            COL A    | COL B | COL C... "N"
            Option 1 | A     | Option 1 attribute
            Option 1 | B     | Option 1 attribute
            Option 1 | C     | Option 1 attribute
            Option 1 | D     | Option 1 attribute
            Option 2 | C     | Option 2 attribute
            Option 2 | D     | Option 2 attribute
            Option 2 | F     | Option 2 attribute
            Option 3 | D     | Option 3 attribute
            Option 3 | J     | Option 3 attribute
            Option 3 | Z     | Option 3 attribute

            I understand how I could do this with a python script, but I am already using OpenRefine, and I am hoping not to involve a whole new process to my data flow.

            Is there an easy way to do this in OpenRefine? I am having a hard time finding a method or extensions for something like this.

            Thanks!

            EDIT

            @magdmartin How can you fill down blank cells using delineated values from the first cell?

            COL A    | COL B   | COL C... "N"
            Option 1 | A,B,C,D | Option 1 attribute
            Option 1 |         | Option 1 attribute
            Option 1 |         | Option 1 attribute
            Option 1 |         | Option 1 attribute
            Option 2 | C,D,F   | Option 2 attribute
            Option 2 |         | Option 2 attribute
            Option 2 |         | Option 2 attribute
            Option 3 | D,J,Z   | Option 3 attribute
            Option 3 |         | Option 3 attribute
            Option 3 |         | Option 3 attribute

            Turned into

            COL A    | COL B | COL C... "N"
            Option 1 | A     | Option 1 attribute
            Option 1 | B     | Option 1 attribute
            Option 1 | C     | Option 1 attribute
            Option 1 | D     | Option 1 attribute
            Option 2 | C     | Option 2 attribute
            Option 2 | D     | Option 2 attribute
            Option 2 | F     | Option 2 attribute
            Option 3 | D     | Option 3 attribute
            Option 3 | J     | Option 3 attribute
            Option 3 | Z     | Option 3 attribute

            Thanks!

            ...

            ANSWER

            Answered 2021-May-26 at 02:31

            I recorded a video walking through each of the options described below: https://youtu.be/3194zXoJtqI

            For this project, you will need to use two OpenRefine functions

            If you have a lot of columns, you can use All > Transform to speed up the process with the following expression: row.record.cells[columnName].value[0]. The trick here is to fill down Col A last, so that record mode is preserved while you fill down the other columns.
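To make the target shape concrete, the combined effect of splitting the comma-separated column and filling the other columns down can be sketched in Python. This is an illustration of the transformation only, not of OpenRefine's internals; the function name `split_rows` and the dictionary-per-row representation are assumptions of the example.

```python
def split_rows(rows, column, separator=","):
    """Produce one output row per separated value in `column`.

    All other columns are copied into every output row, which corresponds to
    filling them down after the split.
    """
    result = []
    for row in rows:
        for part in row[column].split(separator):
            new_row = dict(row)          # copy so the input rows stay untouched
            new_row[column] = part.strip()
            result.append(new_row)
    return result
```

Applied to the row `Option 2 | C, D, F | Option 2 attribute`, the function yields three rows whose COL B values are `C`, `D`, and `F`, each carrying the same COL A and COL C values.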

            Source https://stackoverflow.com/questions/67626449

            QUESTION

            Regex to delete all caps letters and following comma
            Asked 2021-Mar-23 at 22:09

            I have a csv of names like so Smith, SMITH, John, JOHN and I'm trying to use regex in OpenRefine to remove the names in all caps.

            replace(value, /^[A-Z]$/, '') does nothing and replace(value, /[A-Z]/, '') gets rid of all names with any capital letters and leaves a trail of stray commas.

            I need to delete the all caps names and any commas that may follow as well. I'm not interested in preserving the list by making all names lower case or capitalizing the first letter of each name. Any name in all caps must be deleted.

            ...

            ANSWER

            Answered 2021-Mar-23 at 22:09
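The two attempts in the question fail for different reasons: `/^[A-Z]$/` is anchored to the whole cell and only matches a cell that consists of exactly one capital letter, while `/[A-Z]/` matches every capital letter on its own. One possible approach, sketched here in Python rather than taken from the original answer, is to match whole words made only of capitals together with an optional trailing comma and whitespace. The name `drop_all_caps_names` is invented for the example, and the pattern assumes plain ASCII names of at least two letters, so single-letter initials are kept.

```python
import re

# A whole word of two or more capitals, plus an optional comma and trailing spaces.
ALL_CAPS_NAME = re.compile(r"\b[A-Z]{2,}\b,?\s*")


def drop_all_caps_names(value):
    """Remove all-caps names and the comma that follows each of them."""
    cleaned = ALL_CAPS_NAME.sub("", value)
    # Removing the last name can leave a dangling ', ' at the end of the cell.
    return cleaned.rstrip(", ")
```

For the sample cell, `drop_all_caps_names("Smith, SMITH, John, JOHN")` returns `"Smith, John"`.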

            QUESTION

            Pattern Matching in OpenRefine JSON
            Asked 2021-Feb-25 at 16:23

            I love OpenRefine and how easy it is to use. I have just been looking into the Extract / Apply feature, which would be really useful for what I use OpenRefine for. I was hoping that it would be able to use wildcards to match a pattern in the Apply section.

            So in the example below, I have a new column called Cluster and in there there are items which will be

            ...

            ANSWER

            Answered 2021-Feb-25 at 16:23

            First of all, I need to warn you that the Extract Operations / Apply Operations facility is not fully developed and has a number of limitations if used on anything other than the original data.

            Anything that ends up being recorded as a mass-edit is unlikely to be useful for replaying on different data. For this use case, I'd suggest using something like the replace function with a regex pattern as the string to be replaced, so something like:

            Source https://stackoverflow.com/questions/66369559

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install OpenRefine

            OpenRefine Releases

            Support

            User Manual, FAQ, Official Website and tutorial videos