stringdist | String distance functions for R
kandi X-RAY | stringdist Summary
The package offers the following main functions:
Community Discussions
Trending Discussions on stringdist
QUESTION
Is there a way of joining two dataframes so that a row in the first dataframe is joined with every row in the second dataframe that shares a word with it?
For example:
...ANSWER
Answered 2022-Mar-15 at 18:03
With fuzzy_join:
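The answer's code is not reproduced on this page. As a minimal sketch, assuming the fuzzyjoin package and two hypothetical data frames whose text columns are named text1 and text2, a custom match_fun can flag pairs of strings that share at least one word:

```r
library(fuzzyjoin)

# Hypothetical example data; the column names are assumptions.
df1 <- data.frame(text1 = c("red apple", "green pear"), stringsAsFactors = FALSE)
df2 <- data.frame(text2 = c("apple pie", "pear tart", "plum jam"), stringsAsFactors = FALSE)

# Vectorized predicate: TRUE where the two strings share at least one word.
share_word <- function(x, y) {
  mapply(function(a, b) {
    words_a <- strsplit(a, "\\s+")[[1]]
    words_b <- strsplit(b, "\\s+")[[1]]
    length(intersect(words_a, words_b)) > 0
  }, x, y, USE.NAMES = FALSE)
}

# Keep every (df1 row, df2 row) pair for which share_word() is TRUE.
joined <- fuzzy_join(df1, df2,
                     by = c("text1" = "text2"),
                     match_fun = share_word,
                     mode = "inner")
print(joined)
```

Because every row of the first frame is compared with every row of the second, the cost grows with the product of the two row counts, which may matter for large inputs.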
QUESTION
I am having trouble matching character strings. Most of the difficulty centers on abbreviations.
I have two character vectors. I am trying to match words in vector A (typos) to the closest match in vector B.
...ANSWER
Answered 2022-Mar-07 at 17:10
Maybe agrep is what the question is asking for.
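A rough sketch of how base R's agrep might be applied here, using hypothetical vectors A and B since the question's data is not shown. Note that agrep searches for approximate matches within each string and returns every candidate inside the tolerance, not a single closest match:

```r
# Hypothetical vectors: A holds misspelled words, B holds reference spellings.
A <- c("aple", "bannana")
B <- c("apple", "banana", "cherry")

# agrep() takes one pattern at a time, so loop over A.
# max.distance = 1 is an assumed tolerance of at most one edit.
candidates <- lapply(A, function(p) agrep(p, B, max.distance = 1, value = TRUE))
names(candidates) <- A
print(candidates)
```

If exactly one best match per word is needed, a distance-based approach such as amatch from stringdist may be a better fit than a tolerance filter.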
QUESTION
I have a huge dataset that looks like this. To save memory, I want to calculate the pairwise distances but leave the upper triangle of the matrix as NULL.
...ANSWER
Answered 2022-Mar-01 at 10:37
I think you may need to use sparse matrices. The Matrix package supports them. You can learn more about sparse matrices at: Sparse matrix
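As an alternative to the sparse-matrix suggestion, and only under the assumption that the distances in question are string distances: stringdistmatrix from the stringdist package returns a dist object when called with a single vector, and a dist object stores only the lower triangle. A small sketch with hypothetical data:

```r
library(stringdist)

# Hypothetical input; the question's real data is not shown.
x <- c("apple", "aple", "banana", "bananna")

# With a single argument, stringdistmatrix() returns a 'dist' object,
# which keeps one value per unordered pair instead of a full square matrix.
d <- stringdistmatrix(x, method = "osa")
print(class(d))
print(length(d))  # number of stored pairs: choose(length(x), 2)

# Expand to a full matrix only when it is really needed.
full <- as.matrix(d)
```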
QUESTION
I have two very large dataframes containing names of people. The two dataframes report different information on these people (i.e. df1 reports data on health status and df2 on socio-economic status). A subset of people appears in both dataframes. This is the sample I am interested in. I would need to create a new dataframe which includes only those people appearing in both datasets. There are, however, small differences in the names, mostly due to typos.
My data is as follows:
...ANSWER
Answered 2022-Mar-01 at 12:45
library(tidyverse)
library(fuzzyjoin)

df1 <- tibble(
  name = c("Joe Smith", "Michael Fagin"),
  smoker = c("yes", "yes")
)
df2 <- tibble(
  name = c("Joe Smit", "Michael Fegin"),
  occupation = c("post doc", "IT consultant")
)

df1 %>%
  # max 3 chars different
  stringdist_inner_join(df2, max_dist = 3)
#> Joining by: "name"
#> # A tibble: 2 × 4
#>   name.x        smoker name.y        occupation
#>   <chr>         <chr>  <chr>         <chr>
#> 1 Joe Smith     yes    Joe Smit      post doc
#> 2 Michael Fagin yes    Michael Fegin IT consultant
QUESTION
I have a somewhat messy address database that tracks moves in a given order, in long format. I want to add columns that match each address to the next one, but I want to skip or drop the entry if the next address is too close.
The process I have so far mirrors this one:
...ANSWER
Answered 2022-Feb-12 at 00:16
This looks at the next address and drops it if it is close. It uses agrepl, which can also be fine-tuned with cost and max.distance.
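The answer's code is not shown on this page. One way the idea could look, as a sketch with hypothetical addresses and an assumed tolerance of two edits, is to compare each address with the one before it using base R's agrepl:

```r
# Hypothetical addresses, listed in move order.
addresses <- c("12 Main Street", "12 Main Stret", "48 Oak Avenue", "7 Pine Road")

keep <- rep(TRUE, length(addresses))
for (i in seq_along(addresses)[-1]) {
  # Drop an address when it approximately matches the previous one.
  # max.distance = 2 (at most two edits) is an assumption to tune per dataset.
  if (agrepl(addresses[i - 1], addresses[i], max.distance = 2)) {
    keep[i] <- FALSE
  }
}

print(addresses[keep])
```

In a real long-format table the same comparison would presumably be done within each person or household group, which this sketch leaves out.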
QUESTION
I am trying to create a heatmap with an external dendrogram using the ComplexHeatmap library.
ANSWER
Answered 2022-Feb-04 at 11:21
The problem is that after all the transformations:
QUESTION
Hi, I am trying to match strings in one dataframe against strings in another dataframe and get the nearest n matches based on a score.
Example: each value in the string_2 column (df_2) needs to be matched against string_1 (df_1), returning the nearest 3 matches within each ID group.
...ANSWER
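Answered 2022-Jan-06 at 09:37
library(dplyr)   # group_by, mutate, slice_min, %>%
library(tidyr)   # pivot_wider

merge(df_1, df_2, by = 'ID') %>%
  group_by(string_2) %>%
  # rank candidates by Jaro-Winkler distance within each string_2
  mutate(dist = (stringdist::stringdist(string_2, string_1, 'jw')) %>%
           rank(ties.method = 'last')) %>%
  slice_min(dist, n = 3) %>%
  pivot_wider(names_from = dist, names_prefix = 'nearest_str_match_',
              values_from = string_1)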
# A tibble: 7 x 5
# Groups: string_2 [7]
ID string_2 nearest_str_match_1 nearest_str_match_2 nearest_str_match_3
1 104 Addidas Addidas Nike Puma
2 100 Jack Daniel Jack Daniel JackDan Jac
3 100 Mark JackDan Jack Daniel Jac
4 103 Mark 2 Mark Duke Allan
5 104 Nike Nike Addidas Puma
6 104 Reebok Nike Puma Nike Addidas
7 103 Steve Duke Dukes Allan
QUESTION
I am using the stringdist package in R.
For several options:
...ANSWER
Answered 2021-Nov-04 at 13:00
You can use tolower and write your pattern in lowercase to ignore case:
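A small illustration of that idea, using hypothetical strings since the original snippet is not shown on this page. Distances in stringdist are case-sensitive, so lowercasing both sides first removes case differences from the count:

```r
library(stringdist)

# Hypothetical strings that differ only in letter case.
a <- "New York"
b <- "new york"

# Case-sensitive comparison: "N" vs "n" and "Y" vs "y" each count as an edit.
d_case <- stringdist(a, b, method = "osa")

# Lowercase both sides first to ignore case.
d_nocase <- stringdist(tolower(a), tolower(b), method = "osa")

print(c(case_sensitive = d_case, case_insensitive = d_nocase))
```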
QUESTION
I have test data as follows. I am trying to find (near) matches for a vector of words, using stringdist as the actual database is large:
ANSWER
Answered 2021-Nov-03 at 13:58
Get the index of matches, then update all rows that match:
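The answer's code is missing from this page. A sketch of the two steps, assuming the stringdist function amatch is used for the lookup and using a hypothetical data frame and target vector:

```r
library(stringdist)

# Hypothetical data; the column and vector names are assumptions.
df <- data.frame(word = c("aple", "banana", "chery", "kiwi"),
                 stringsAsFactors = FALSE)
targets <- c("apple", "cherry")

# Step 1: index of the closest target within one edit, NA when none qualifies.
idx <- amatch(df$word, targets, method = "osa", maxDist = 1)

# Step 2: update only the rows that found a match.
hit <- !is.na(idx)
df$word[hit] <- targets[idx[hit]]

print(df)
```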
QUESTION
Let's say I have the following data.frame
...ANSWER
Answered 2021-Oct-01 at 17:40
Since you want to match each chat with all the others, the complexity of the algorithm will obviously be high.
However, you can remove the ids of chats that already have a match from the pool of candidates, so that each step takes a little less time than the previous one.
As much as I hate for loops in R, I couldn't find a purrr solution, so here we go:
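The original loop is not included on this page. The following is only a sketch of the strategy described, with a hypothetical chats data frame, an assumed distance threshold, and stringdist used for the comparison; matched ids are removed from the candidate pool as the loop proceeds:

```r
library(stringdist)

# Hypothetical data; column names and the threshold are assumptions.
chats <- data.frame(
  id = 1:5,
  text = c("hello there", "helo there", "good morning", "good mornin", "unrelated"),
  stringsAsFactors = FALSE
)
max_dist <- 2

remaining <- chats$id
pairs <- data.frame(id = integer(0), match_id = integer(0))

for (i in chats$id) {
  if (!(i %in% remaining)) next            # already matched earlier
  candidates <- setdiff(remaining, i)
  if (length(candidates) == 0) break       # nothing left to compare with
  d <- stringdist(chats$text[i], chats$text[candidates], method = "osa")
  best <- which.min(d)
  if (d[best] <= max_dist) {
    pairs <- rbind(pairs, data.frame(id = i, match_id = candidates[best]))
    remaining <- setdiff(remaining, c(i, candidates[best]))  # shrink the pool
  } else {
    remaining <- setdiff(remaining, i)     # no match for this chat
  }
}

print(pairs)
```

The worst case is still quadratic in the number of chats; shrinking the pool only reduces the work when many chats find a match early.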
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install stringdist