stringdist | String distance functions for R

by markvanderloo R Version: Current License: No License


kandi X-RAY | stringdist Summary

stringdist is an R library. It has no reported bugs or vulnerabilities, and it has low support. You can download it from GitHub.

The package offers the following main functions:.

Support

stringdist has a low active ecosystem.

It has 292 star(s) with 36 fork(s). There are 14 watchers for this library.

It had no major release in the last 6 months.

There are 20 open issues and 68 have been closed. On average issues are closed in 352 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of stringdist is current.

Quality

stringdist has 0 bugs and 0 code smells.

Security

stringdist has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

stringdist code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

stringdist does not have a standard license declared.

Check the repository for any license declaration and review the terms closely.

Without a license, all rights are reserved, and you cannot use the library in your applications.

Reuse

stringdist releases are not available. You will need to build from source code and install.

Installation instructions are not available. Examples and code snippets are available.

Community Discussions

Trending Discussions on stringdist

Return multiple possible matches when fuzzy joining two dataframes or vectors in R if they share a word in common

Matching strings with abbreviations; fuzzy matching

Convert to matrix but keep one diagonal to NULL in R

Match two columns based on string distance in R

finding and matching next address, but want to drop if string is too close

External dendrogram does not keep the same formation when using it for cluster_rows in complexheatmap

Get nearest n matching strings

Ignoring the case for maxDist in stringdist::extract

Finding matches for multiple words with stringdist

Identify which pairs of rows are most similar (string distance) in a data.frame

QUESTION

Return multiple possible matches when fuzzy joining two dataframes or vectors in R if they share a word in common

Asked 2022-Mar-15 at 18:03

Is there a way of joining two dataframes so that a row in the first dataframe is joined with every row in the second dataframe with which it shares a word?

For example:

...

ANSWER

Answered 2022-Mar-15 at 18:03

With fuzzy_join:
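The answer's own code is not reproduced here. As a rough sketch of the idea only, the example below passes a custom match function to fuzzyjoin::fuzzy_join so that two rows are joined whenever their strings share at least one word. The helper share_word and the sample data frames are hypothetical and are not taken from the original answer.

```r
library(fuzzyjoin)

# Hypothetical helper: TRUE when two strings share at least one word
# (case-insensitive, split on whitespace). Vectorised with mapply because
# fuzzy_join passes two equal-length vectors to the match function.
share_word <- function(a, b) {
  mapply(function(s, t) {
    words_s <- strsplit(tolower(s), "\\s+")[[1]]
    words_t <- strsplit(tolower(t), "\\s+")[[1]]
    length(intersect(words_s, words_t)) > 0
  }, a, b, USE.NAMES = FALSE)
}

# Made-up sample data for illustration
df1 <- data.frame(name = c("red apple", "green pear"))
df2 <- data.frame(name = c("apple pie", "pear tart", "apple juice"))

# Each row of df1 is paired with every row of df2 that shares a word
joined <- fuzzy_join(df1, df2, by = "name",
                     match_fun = share_word, mode = "inner")
print(joined)
```

With mode = "inner", rows that share no word with any row of the other data frame are dropped; mode = "left" would keep them with missing values instead.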

Source https://stackoverflow.com/questions/71486862

QUESTION

Matching strings with abbreviations; fuzzy matching

Asked 2022-Mar-07 at 17:10

I am having trouble matching character strings. Most of the difficulty centers on abbreviations.

I have two character vectors. I am trying to match words in vector A (typos) to the closest match in vector B.

...

ANSWER

Answered 2022-Mar-07 at 17:10

Maybe agrep is what the question is asking for.
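A minimal sketch of how base R's agrep could be applied to this kind of problem is shown below. The vectors typos and dictionary are made-up stand-ins for vectors A and B, and max.distance = 1 is an arbitrary threshold chosen for illustration.

```r
# Made-up stand-ins for vector A (typos) and vector B (reference words)
typos <- c("aple", "bananna")
dictionary <- c("apple", "banana", "cherry")

# For each typo, return every dictionary entry that approximately
# contains it, allowing at most one edit (insertion, deletion or
# substitution). agrep() does approximate *substring* matching.
matches <- lapply(typos, function(p) {
  agrep(p, dictionary, max.distance = 1, value = TRUE)
})
names(matches) <- typos

print(matches)
```

Because agrep matches against substrings, heavily abbreviated entries may still need a dedicated lookup table or a different distance measure.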

Source https://stackoverflow.com/questions/71383882

QUESTION

Convert to matrix but keep one diagonal to NULL in R

Asked 2022-Mar-02 at 09:16

I have a huge dataset that looks like this. To save some memory, I want to calculate the pairwise distances but leave the upper diagonal of the matrix as NULL.

...

ANSWER

Answered 2022-Mar-01 at 10:37

I think you may need to use sparse matrices, which the Matrix package supports.
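A different route from the sparse matrix suggested in the answer, sketched here on the assumption that the stringdist package is installed: when stringdistmatrix is called with a single vector, it returns an object of class dist, which stores each pairwise distance once rather than a full square matrix. The sample vector x is made up for illustration.

```r
library(stringdist)

# Made-up sample strings
x <- c("kitten", "sitting", "mitten", "fitting")

# With only one vector supplied, the result is a 'dist' object that holds
# just the choose(n, 2) pairwise distances (one triangle of the matrix).
d <- stringdistmatrix(x, method = "lv")

class(d)    # "dist"
length(d)   # number of stored distances

# Expand to a full square matrix only when it is really needed
full <- as.matrix(d)
```

A dist object can be passed directly to functions such as hclust, so the full matrix often never needs to be materialised.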

Source https://stackoverflow.com/questions/71305971

QUESTION

Match two columns based on string distance in R

Asked 2022-Mar-01 at 12:45

I have two very large dataframes containing names of people. The two dataframes report different information on these people (i.e. df1 reports data on health status and df2 on socio-economic status). A subset of people appears in both dataframes. This is the sample I am interested in. I would need to create a new dataframe which includes only those people appearing in both datasets. There are, however, small differences in the names, mostly due to typos.

My data is as follows:

...

ANSWER

Answered 2022-Mar-01 at 12:45

library(tidyverse)
library(fuzzyjoin)

df1  <- tibble(
  name = c("Joe Smith", "Michael Fagin"),
  smoker = c("yes", "yes")
)

df2 <- tibble(
  name = c("Joe Smit", "Michael Fegin"),
  occupation = c("post doc", "IT consultant")
)

df1 %>%
  # max 3 chars different
  stringdist_inner_join(df2, max_dist = 3)
#> Joining by: "name"
#> # A tibble: 2 × 4
#>   name.x        smoker name.y        occupation   
#>   <chr>         <chr>  <chr>         <chr>        
#> 1 Joe Smith     yes    Joe Smit      post doc     
#> 2 Michael Fagin yes    Michael Fegin IT consultant

Source https://stackoverflow.com/questions/71307580

QUESTION

finding and matching next address, but want to drop if string is too close

Asked 2022-Feb-12 at 00:16

I have a somewhat messy address database that tracks moves in a given order in long format. I want to add columns to match each address to the next one, but I want to skip or drop the entry if the next address is too close.

The process I have so far mirrors this one:

...

ANSWER

Answered 2022-Feb-12 at 00:16

This looks at the next address and drops it if it is close. It uses agrepl, which can also be fine-tuned with the costs and max.distance arguments.
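The answer's code is not shown here, so the following is only a sketch of the general idea using base R's agrepl. The addresses vector and the 0.3 threshold are hypothetical, and a real data set would likely need grouping by person or order first.

```r
# Made-up addresses in move order
addresses <- c("12 Main Street", "12 Main St", "48 Oak Avenue", "7 Pine Road")

# Shift by one position so each address sits next to its successor
next_address <- c(addresses[-1], NA)

# Flag rows whose next address approximately matches the current one.
# max.distance = 0.3 allows edits up to 30% of the pattern length.
too_close <- vapply(seq_along(addresses), function(i) {
  !is.na(next_address[i]) &&
    agrepl(addresses[i], next_address[i], max.distance = 0.3)
}, logical(1))

moves <- data.frame(address = addresses, next_address = next_address)

# Keep only the moves whose next address is a genuinely different place
moves_kept <- moves[!too_close, ]
print(moves_kept)
```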

Source https://stackoverflow.com/questions/71086935

QUESTION

External dendrogram does not keep the same formation when using it for cluster_rows in complexheatmap

Asked 2022-Feb-04 at 11:21

I am trying to create a heatmap with an external dendrogram using the ComplexHeatmap library.

...

ANSWER

Answered 2022-Feb-04 at 11:21

The problem is that after all the transformations:

Source https://stackoverflow.com/questions/70956152

QUESTION

Get nearest n matching strings

Asked 2022-Jan-06 at 09:37

Hi, I am trying to match a string against the strings in a different dataframe and get the nearest n matches based on score.

Example: for each value in the string_2 column (df_2), I need to match against string_1 (df_1) and get the nearest 3 matches within each ID group.

...

ANSWER

Answered 2022-Jan-06 at 09:37

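library(dplyr)   # group_by(), mutate(), slice_min(), %>%
library(tidyr)   # pivot_wider()

# df_1 and df_2 are the data frames from the question
merge(df_1, df_2, by = 'ID') %>%
  group_by(string_2) %>%
  # rank the 'jw' distances so the closest candidate gets rank 1
  mutate(dist = (stringdist::stringdist(string_2, string_1, 'jw')) %>%
           rank(ties = 'last')) %>%
  slice_min(dist, n = 3) %>%
  pivot_wider(names_from = dist, names_prefix = 'nearest_str_match_',
              values_from = string_1)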

# A tibble: 7 x 5
# Groups:   string_2 [7]
     ID string_2    nearest_str_match_1 nearest_str_match_2 nearest_str_match_3
                                                      
1   104 Addidas     Addidas             Nike                Puma               
2   100 Jack Daniel Jack Daniel         JackDan             Jac                
3   100 Mark        JackDan             Jack Daniel         Jac                
4   103 Mark 2      Mark                Duke                Allan              
5   104 Nike        Nike                Addidas             Puma               
6   104 Reebok      Nike                Puma Nike           Addidas            
7   103 Steve       Duke                Dukes               Allan

Source https://stackoverflow.com/questions/70604078

QUESTION

Ignoring the case for maxDist in stringdist::extract

Asked 2021-Nov-04 at 13:00

I am using the stringdist package in R.

For several options:

...

ANSWER

Answered 2021-Nov-04 at 13:00

You can use tolower and write your pattern in lowercase to ignore case:
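A minimal sketch of that suggestion follows. It assumes a stringdist version that provides extract, and the sample text and pattern are made up. The call is namespaced because tidyr also exports a function named extract.

```r
library(stringdist)

# Made-up sample text and search pattern
texts <- c("The QUICK brown fox")
pattern <- "Quick"

# Lower-case both sides so that maxDist only counts real differences,
# not differences in letter case
res <- stringdist::extract(tolower(texts), tolower(pattern), maxDist = 1)

# For comparison: the case-sensitive call finds no window within distance 1
res_case <- stringdist::extract(texts, "quick", maxDist = 1)

print(res)
print(res_case)
```

Note that the extracted match comes back in lower case, because the matching is done on the lower-cased copy of the text.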

Source https://stackoverflow.com/questions/69839087

QUESTION

Finding matches for multiple words with stringdist

Asked 2021-Nov-03 at 13:58

I have test data as follows. I am trying to find (near) matches for a vector of words, using stringdist as the actual database is large:

...

ANSWER

Answered 2021-Nov-03 at 13:58

Get the index of matches, then update all rows that match:
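The original code for this answer is not included here. One way the described approach might look, assuming stringdist::amatch is used to obtain the indices, is sketched below. The data frame df, the column flagged, and the words vector are hypothetical.

```r
library(stringdist)

# Made-up data: a table of names and a vector of (possibly misspelled) words
df <- data.frame(name = c("apple", "banana", "cherry"), flagged = FALSE)
words <- c("aple", "cherri", "kiwi")

# amatch() returns, for each word, the row index of the closest name
# within maxDist edits, or NA when nothing is close enough
idx <- amatch(words, df$name, maxDist = 1)

# Update every row that was matched by at least one word
df$flagged[idx[!is.na(idx)]] <- TRUE

print(idx)
print(df)
```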

Source https://stackoverflow.com/questions/69825488

QUESTION

Identify which pairs of rows are most similar (string distance) in a data.frame

Asked 2021-Oct-01 at 17:40

Let's say I have the following data.frame

...

ANSWER

Answered 2021-Oct-01 at 17:40

Since you want to match each chat with all the others, the complexity of the algorithm will obviously be high.

However, you can remove the ids of the chats that already have a match from the list of candidates, so that each step takes a little less time than the previous one.

As much as I hate for loops in R, I couldn't find a purrr solution so here we go:
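The loop from the answer is not reproduced here. As a simpler alternative for small inputs, and not the author's method, the sketch below computes the full distance matrix with stringdistmatrix and picks the nearest other row for each row. The chats data frame is made up, and the full matrix grows quadratically with the number of rows.

```r
library(stringdist)

# Made-up sample data
chats <- data.frame(
  id = 1:4,
  text = c("hello there", "hello their", "good morning", "good mourning")
)

# Full pairwise Levenshtein distance matrix (rows and columns = chats)
d <- stringdistmatrix(chats$text, chats$text, method = "lv")

# Exclude self-matches by making the diagonal infinitely far away
diag(d) <- Inf

# For each row, the id of the most similar other chat and its distance
chats$closest_id <- chats$id[apply(d, 1, which.min)]
chats$closest_dist <- apply(d, 1, min)

print(chats)
```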

Source https://stackoverflow.com/questions/69394227

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install stringdist

You can download it from GitHub.

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for existing answers or ask on the Stack Overflow community page.
