stringdist | String distance functions for R

 by   markvanderloo R Version: Current License: No License

kandi X-RAY | stringdist Summary

stringdist is an R library. It has no reported bugs or vulnerabilities, and it has low support. You can download it from GitHub.

The package offers the following main functions:

            kandi-support Support

              stringdist has a low active ecosystem.
              It has 292 star(s) with 36 fork(s). There are 14 watchers for this library.
              It had no major release in the last 6 months.
              There are 20 open issues and 68 have been closed. On average issues are closed in 352 days. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of stringdist is current.

            kandi-Quality Quality

              stringdist has 0 bugs and 0 code smells.

            kandi-Security Security

              stringdist has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              stringdist code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              stringdist does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            kandi-Reuse Reuse

              stringdist releases are not available. You will need to build from source code and install.
              Installation instructions are not available. Examples and code snippets are available.

            stringdist Key Features

            No Key Features are available at this moment for stringdist.

            stringdist Examples and Code Snippets

            No Code Snippets are available at this moment for stringdist.

            Community Discussions

            QUESTION

            Return multiple possible matches when fuzzy joining two dataframes or vectors in R if they share a word in common
            Asked 2022-Mar-15 at 18:03

            Is there a way of joining two dataframes so that a row in the first dataframe is joined with every row in the second dataframe that shares a word with it?

            For example:

            ...

            ANSWER

            Answered 2022-Mar-15 at 18:03
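
No answer text is included for this question. As a minimal sketch of one possible approach (not taken from the original thread), the base R code below builds every pair of rows with a cross join and keeps the pairs that share at least one whitespace-separated word. The data frames `df1` and `df2`, the column names `name1` and `name2`, and the helper `words` are hypothetical.

```r
# Hypothetical example data; the question's data is not shown.
df1 <- data.frame(name1 = c("red apple", "blue car"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(name2 = c("green apple", "car park", "yellow bus"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: split a string into lowercase words.
words <- function(s) strsplit(tolower(s), "\\s+")[[1]]

# Cross join: merge() with by = NULL returns the Cartesian product.
pairs <- merge(df1, df2, by = NULL)

# Keep only the pairs whose two strings have at least one word in common.
shares_word <- unname(mapply(
  function(a, b) length(intersect(words(a), words(b))) > 0,
  pairs$name1, pairs$name2
))
joined <- pairs[shares_word, ]
print(joined)
```

The cross join grows with the product of the two row counts, so this sketch is only practical for small inputs.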

            QUESTION

            Matching strings with abbreviations; fuzzy matching
            Asked 2022-Mar-07 at 17:10

            I am having trouble matching character strings. Most of the difficulty centers on abbreviations.

            I have two character vectors. I am trying to match words in vector A (typos) to the closest match in vector B.

            ...

            ANSWER

            Answered 2022-Mar-07 at 17:10

            Maybe agrep is what the question is asking for.

            Source https://stackoverflow.com/questions/71383882
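
As a minimal sketch of what using `agrep` could look like here, the code below looks up each typo in a reference vector. The vectors `typos` and `reference` are hypothetical, and `max.distance = 2` is an illustrative choice. Note that `agrep` performs approximate substring matching, so it may return more than one candidate per typo.

```r
# Hypothetical example data: typos (vector A) and reference words (vector B).
typos <- c("aple", "bananna", "ornge")
reference <- c("apple", "banana", "orange", "grape")

# For each typo, return every reference word that agrep() considers an
# approximate match, allowing up to two edits in total.
matches <- lapply(typos, function(p) {
  agrep(p, reference, max.distance = 2, value = TRUE)
})
names(matches) <- typos
print(matches)
```

If exactly one match per typo is needed, the candidates would still have to be ranked, for example by edit distance.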

            QUESTION

            Convert to matrix but keep one diagonal to NULL in R
            Asked 2022-Mar-02 at 09:16

            I have a huge dataset that looks like this. To save some memory, I want to calculate the pairwise distances but leave the upper triangle of the matrix as NULL.

            ...

            ANSWER

            Answered 2022-Mar-01 at 10:37

            I think you may need to use sparse matrices. Package Matrix has such a possibility. You can learn more about sparse matrices at: Sparse matrix

            Source https://stackoverflow.com/questions/71305971
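
The answer above points to sparse matrices. As a base R alternative (a sketch, not the answer's code), a `dist` object stores only the lower triangle of a symmetric distance matrix. The example uses base R's `adist` as a stand-in for a stringdist distance, and the vector `x` is hypothetical. To my understanding, `stringdist::stringdistmatrix` also returns a `dist` object when called with a single vector, which would avoid building the full matrix first.

```r
# Hypothetical example strings; the question's data is not shown.
x <- c("kitten", "sitting", "mitten", "fitting")

# adist() returns the full symmetric matrix of Levenshtein distances.
full <- adist(x)
dimnames(full) <- list(x, x)

# as.dist() keeps only the lower triangle: n * (n - 1) / 2 values.
lower <- as.dist(full)
print(lower)

# Alternatively, blank out the upper triangle and the diagonal with NA.
# This still allocates the full n x n matrix, so it saves no memory.
half <- full
half[upper.tri(half, diag = TRUE)] <- NA
print(half)
```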

            QUESTION

            Match two columns based on string distance in R
            Asked 2022-Mar-01 at 12:45

            I have two very large dataframes containing names of people. The two dataframes report different information on these people (i.e. df1 reports data on health status and df2 on socio-economic status). A subset of people appears in both dataframes. This is the sample I am interested in. I would need to create a new dataframe which includes only those people appearing in both datasets. There are, however, small differences in the names, mostly due to typos.

            My data is as follows:

            ...

            ANSWER

            Answered 2022-Mar-01 at 12:45
            library(tidyverse)
            library(fuzzyjoin)
            
            df1  <- tibble(
              name = c("Joe Smith", "Michael Fagin"),
              smoker = c("yes", "yes")
            )
            
            df2 <- tibble(
              name = c("Joe Smit", "Michael Fegin"),
              occupation = c("post doc", "IT consultant")
            )
            
            df1 %>%
              # max 3 chars different
              stringdist_inner_join(df2, max_dist = 3)
            #> Joining by: "name"
            #> # A tibble: 2 × 4
            #>   name.x        smoker name.y        occupation   
            #>   <chr>         <chr>  <chr>         <chr>        
            #> 1 Joe Smith     yes    Joe Smit      post doc     
            #> 2 Michael Fagin yes    Michael Fegin IT consultant
            

            Source https://stackoverflow.com/questions/71307580

            QUESTION

            finding and matching next address, but want to drop if string is too close
            Asked 2022-Feb-12 at 00:16

            I have a somewhat messy address database that tracks moves in a given order in long format. I want to add columns that match each address to the next address, but I want to skip (drop) the entry if the next address is too close.

            The process I have so far mirrors this one:

            ...

            ANSWER

            Answered 2022-Feb-12 at 00:16

            This looks at the next address and drops it if it is close. It uses agrepl, which can be fine-tuned with its costs and max.distance arguments.

            Source https://stackoverflow.com/questions/71086935
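
The answer's code is not reproduced above. A minimal sketch of the idea, assuming a single person's addresses in move order, could look like the following. The vector `addresses` and the threshold `max.distance = 0.2` are hypothetical choices.

```r
# Hypothetical address sequence for one person, in move order.
addresses <- c("12 Main St", "12 Main Street", "45 Oak Ave",
               "45 Oak Avenue", "9 Elm Rd")

# Keep an address only if it is not an approximate match of the most
# recently kept address.
keep <- logical(length(addresses))
keep[1] <- TRUE
last_kept <- addresses[1]
for (i in seq_along(addresses)[-1]) {
  # max.distance below 1 is read as a fraction of the pattern length.
  too_close <- agrepl(last_kept, addresses[i], max.distance = 0.2)
  if (!too_close) {
    keep[i] <- TRUE
    last_kept <- addresses[i]
  }
}

# Pair each kept address with the next kept address.
kept <- addresses[keep]
moves <- data.frame(address = kept,
                    next_address = c(kept[-1], NA_character_),
                    stringsAsFactors = FALSE)
print(moves)
```

Because `agrepl` matches the pattern against substrings, "12 Main St" counts as close to "12 Main Street" even though the full strings differ by several characters.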

            QUESTION

            External dendrogram does not keep the same formation when using it for cluster_rows in complexheatmap
            Asked 2022-Feb-04 at 11:21

            I am trying to create a heatmap with an external dendrogram using the ComplexHeatmap library.

            ...

            ANSWER

            Answered 2022-Feb-04 at 11:21

            The problem is that after all the transformations:

            Source https://stackoverflow.com/questions/70956152

            QUESTION

            Get nearest n matching strings
            Asked 2022-Jan-06 at 09:37

            Hi, I am trying to match a string against strings in a different dataframe and get the nearest n matches based on a score.

            Example: each value in the string_2 column (df_2) needs to be matched against string_1 (df_1) to get the nearest 3 matches within each ID group.

            ...

            ANSWER

            Answered 2022-Jan-06 at 09:37
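            library(dplyr)   # %>%, group_by, mutate, slice_min
            library(tidyr)   # pivot_wider

            # Rank candidates within each string_2 by Jaro-Winkler distance,
            # keep the three nearest, and spread them into columns.
             merge(df_1, df_2, by = 'ID') %>%
               group_by(string_2) %>%
               mutate(dist = (stringdist::stringdist(string_2,string_1, 'jw')) %>%
                        rank(ties.method = 'last')) %>%
               slice_min(dist, n = 3) %>%
               pivot_wider(names_from = dist, names_prefix = 'nearest_str_match_', 
                           values_from = string_1)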
            
            # A tibble: 7 x 5
            # Groups:   string_2 [7]
                 ID string_2    nearest_str_match_1 nearest_str_match_2 nearest_str_match_3
                                                                  
            1   104 Addidas     Addidas             Nike                Puma               
            2   100 Jack Daniel Jack Daniel         JackDan             Jac                
            3   100 Mark        JackDan             Jack Daniel         Jac                
            4   103 Mark 2      Mark                Duke                Allan              
            5   104 Nike        Nike                Addidas             Puma               
            6   104 Reebok      Nike                Puma Nike           Addidas            
            7   103 Steve       Duke                Dukes               Allan   
            

            Source https://stackoverflow.com/questions/70604078

            QUESTION

            Ignoring the case for maxDist in stringdist::extract
            Asked 2021-Nov-04 at 13:00

            I am using the stringdist package in R.

            For several options:

            ...

            ANSWER

            Answered 2021-Nov-04 at 13:00

            You can use tolower and write your pattern in lowercase to ignore case:

            Source https://stackoverflow.com/questions/69839087
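
A minimal sketch of the lowercasing idea is shown below. It uses base R's `adist` as a stand-in for a stringdist distance so that it runs without extra packages. The strings, the pattern, and the `max_dist` threshold are hypothetical.

```r
# Hypothetical strings that differ from the pattern only by case.
x <- c("Hello World", "HELLO WORLD", "goodbye")
pattern <- "hello world"

# Case-sensitive distances count every case difference as an edit.
raw <- adist(x, pattern)[, 1]
print(raw)

# Lowercasing both sides first makes the comparison case-insensitive.
d <- adist(tolower(x), tolower(pattern))[, 1]
print(d)

# Apply a maximum-distance threshold to the case-insensitive distances.
max_dist <- 1
within <- d <= max_dist
print(x[within])
```

The same preprocessing (calling `tolower` on both the input and the pattern) can be applied before passing the strings to a distance function that has no case-insensitive option.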

            QUESTION

            Finding matches for multiple words with stringdist
            Asked 2021-Nov-03 at 13:58

            I have test data as follows. I am trying to find (near) matches for a vector of words, using stringdist as the actual database is large:

            ...

            ANSWER

            Answered 2021-Nov-03 at 13:58

            Get the index of matches, then update all rows that match:

            Source https://stackoverflow.com/questions/69825488
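
The answer's code is not reproduced above. As a sketch of "get the index of matches, then update all rows that match", the code below uses base R's `adist` as a stand-in. The data frame `df`, the vector `targets`, and the threshold `max_dist` are hypothetical. The stringdist package's own `amatch` function, which takes a `maxDist` argument, covers a similar lookup.

```r
# Hypothetical data: a column of free-text words and a vector of targets.
df <- data.frame(word = c("aple", "banana", "cherry", "bananna", "kiwi"),
                 stringsAsFactors = FALSE)
targets <- c("apple", "banana")

# Distance from every row to every target (rows x targets).
d <- adist(df$word, targets)

# Index of the rows whose closest target is within the allowed distance.
max_dist <- 1
idx <- which(apply(d, 1, min) <= max_dist)

# Update all matching rows in one step; other rows stay NA.
df$match <- NA_character_
df$match[idx] <- targets[apply(d[idx, , drop = FALSE], 1, which.min)]
print(df)
```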

            QUESTION

            Identify which pairs of rows are most similar (string distance) in a data.frame
            Asked 2021-Oct-01 at 17:40

            Let's say I have the following data.frame

            ...

            ANSWER

            Answered 2021-Oct-01 at 17:40

            Since you want to match each chat with all the others, the complexity of the algorithm will obviously be high.

            However, you can remove the id of the chats that already have a match from the competitors, so that each step take a little shorter than the previous one.

            As much as I hate for loops in R, I couldn't find a purrr solution so here we go:

            Source https://stackoverflow.com/questions/69394227
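
The answer's loop is not reproduced above. For comparison, a vectorized sketch that finds the closest other row for each row is shown below. It is a different approach from the one the answer describes (it does not remove already matched ids), it uses base R's `adist` as a stand-in for a stringdist distance, and the data frame `chats` is hypothetical.

```r
# Hypothetical data frame of chat messages.
chats <- data.frame(
  id = 1:4,
  text = c("hello there", "hello their", "good morning", "good mornin"),
  stringsAsFactors = FALSE
)

# Full pairwise Levenshtein distance matrix.
d <- adist(chats$text)

# Ignore self-comparisons.
diag(d) <- NA

# For each row, find the position of the closest other row.
nearest <- apply(d, 1, which.min)

# Record the id of that row and the distance to it.
chats$closest_id <- chats$id[nearest]
chats$distance <- d[cbind(seq_len(nrow(d)), nearest)]
print(chats)
```

The distance matrix has one entry per pair of rows, so memory use grows quadratically with the number of rows.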

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install stringdist

            You can download it from GitHub.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/markvanderloo/stringdist.git

          • CLI

            gh repo clone markvanderloo/stringdist

          • sshUrl

            git@github.com:markvanderloo/stringdist.git
