edit-distance | Levenshtein edit distance in Rust | Data Manipulation library

by febeling Rust Version: Current License: Apache-2.0

kandi X-RAY | edit-distance Summary

edit-distance is a Rust library typically used in Utilities and Data Manipulation applications. It has no reported bugs or vulnerabilities, a permissive license, and low support. You can download it from GitHub.

Levenshtein edit distance in Rust

Support

edit-distance has a low active ecosystem.

It has 30 star(s) with 13 fork(s). There are 3 watchers for this library.

It had no major release in the last 6 months.

There are 0 open issues, and 1 has been closed. On average, issues are closed in 424 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of edit-distance is current.

Quality

edit-distance has 0 bugs and 0 code smells.

Security

edit-distance has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

edit-distance code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

edit-distance is licensed under the Apache-2.0 License. This license is Permissive.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

edit-distance releases are not available. You will need to build from source code and install.

Installation instructions, examples and code snippets are available.

Community Discussions

Trending Discussions on edit-distance

Coq Program Fixpoint vs equations as far as best way to get reduction lemmas?

How to normalize Levenshtein distance between 0 to 1

ghc error: hidden package, but it's actually exposed

Levenshtein distance with substitution, deletion and insertion count

Why does indexing a string inside of a recursive call yield a different result?

An algorithm for computing the edit-distance between two words

stack build on macOS

Speeding up Levenshtein distance calculation in python with global variables

QUESTION

Coq Program Fixpoint vs equations as far as best way to get reduction lemmas?

Asked 2022-Mar-24 at 21:42

I am trying to prove that particular implementations of how to calculate the edit distance between two strings are correct and yield identical results. I went with the most natural way to define edit distance recursively as a single function (see below). This caused Coq to complain that it couldn't determine the decreasing argument. After some searching, it seems that using the Program Fixpoint mechanism and providing a measure function is one way around this problem. However, this led to the next problem: the tactic simpl no longer works as expected. I found this question, which has a similar problem, but I am getting stuck because I don't understand the role the Fix_sub function plays in the code Coq generates for my edit distance function, which looks more complicated than the simple example in that question.

Questions:

1. For a function like edit distance, could the Equations package be easier to use than Program Fixpoint (get reduction lemmas automatically)? The previous question on this front is from 2016, so I am curious whether best practices have evolved since then.
2. I came across this Coq program involving edit_distance that uses an inductively defined prop instead of a function. Maybe this is me still trying to wrap my head around the Curry-Howard correspondence, but why is Coq willing to accept the inductive proposition definition for edit_distance without termination/measure complaints, but not the function-driven approach? Does this mean there is an angle using a creatively defined inductive type, containing both strings wrapped as a pair together with a number, that could be passed to edit_distance and that Coq would more easily accept as structural recursion?

Is there an easier way using Program Fixpoint to get reductions?

...

ANSWER

Answered 2022-Mar-24 at 21:12

There is a common trick to this kind of recursion over two arguments, which is to write two nested functions, each recursing over one of the two arguments.

This can also be understood from the perspective of dynamic programming, where the edit distance is computed by traversing a matrix. More generally, the edit distance function edit xs ys can be viewed as a matrix of nat with rows indexed by xs and columns indexed by ys. The outer recursion iterates over rows xs, and for each of those rows, when xs = x :: xs', the inner recursion iterates over its columns ys to generate the entries of that row from another row with a smaller index xs'.
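The matrix view can be sketched in Rust, the language of the edit-distance library itself. This is an iterative illustration rather than the Coq definition from the answer: it walks prefixes of xs instead of recursing on suffixes, and the name edit_distance_rows is made up for this sketch. The outer loop visits one row per character of xs, and the inner loop fills that row from the previous one.

```rust
/// Levenshtein distance computed one matrix row at a time.
/// `edit_distance_rows` is an illustrative name, not part of any library.
pub fn edit_distance_rows(xs: &str, ys: &str) -> usize {
    let xs: Vec<char> = xs.chars().collect();
    let ys: Vec<char> = ys.chars().collect();

    // Row for the empty prefix of xs: turning "" into ys[..j] takes j insertions.
    let mut prev: Vec<usize> = (0..=ys.len()).collect();

    // Outer loop: one row per character of xs.
    for (i, &x) in xs.iter().enumerate() {
        let mut row = Vec::with_capacity(ys.len() + 1);
        // First column: turning xs[..=i] into "" takes i + 1 deletions.
        row.push(i + 1);

        // Inner loop: each entry depends on the previous row and the cell to its left.
        for (j, &y) in ys.iter().enumerate() {
            let substitute = prev[j] + usize::from(x != y);
            let delete = prev[j + 1] + 1;
            let insert = row[j] + 1;
            row.push(substitute.min(delete).min(insert));
        }
        prev = row;
    }

    // Bottom-right cell of the matrix.
    prev[ys.len()]
}
```

Only the previous row is kept, so memory use is proportional to the length of ys rather than to the full matrix.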

Source https://stackoverflow.com/questions/71608107

QUESTION

How to normalize Levenshtein distance between 0 to 1

Asked 2020-Sep-29 at 20:54

I have to normalize the Levenshtein distance to the range 0 to 1. I see different variations floating around on SO.

I am thinking to adopt the following approach:

Given two strings, s1 and s2:
len = max(s1.length(), s2.length());
normalized_distance = float(len - levenshteinDistance(s1, s2)) / float(len);

Then the highest score 1.0 means an exact match and 0.0 means no match.

But I see variations here: two whole texts similarity using levenshtein distance where 1- distance(a,b)/max(a.length, b.length)

Difference in normalization of Levenshtein (edit) distance?

Explanation of normalized edit distance formula

Is there a canonical code implementation in Java? I know org.apache.commons.text only implements LevenshteinDistance and not a normalized LevenshteinDistance.

https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/similarity/LevenshteinDistance.html

...

ANSWER

Answered 2020-Sep-29 at 06:11

Your first answer begins with "The effects of both variants should be nearly the same". The reason a normalized LevenshteinDistance doesn't exist is that nobody has seen fit to implement it. Besides, it seems rather trivial once you have the Levenshtein distance:
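As an illustration of how little is needed, here is a minimal Rust sketch (not the Java snippet the answer refers to). The helper levenshtein and the function normalized_similarity are names invented for this example, and treating two empty strings as a perfect match is an assumption. Note that (len - d) / len and 1 - d / len are the same quantity, so the formula in the question and the 1 - distance/max form quoted there agree.

```rust
/// Plain Levenshtein distance over Unicode scalar values (single-row DP).
/// Illustrative helper, not a library function.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut row: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        // `diag` holds the previous row's value up-left of the current cell.
        let mut diag = row[0];
        row[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let up = row[j + 1];
            row[j + 1] = (diag + usize::from(ca != cb))
                .min(up + 1)
                .min(row[j] + 1);
            diag = up;
        }
    }
    row[b.len()]
}

/// Similarity in [0, 1]: 1.0 for identical strings, 0.0 when nothing matches.
pub fn normalized_similarity(a: &str, b: &str) -> f64 {
    let len = a.chars().count().max(b.chars().count());
    if len == 0 {
        // Assumption: two empty strings count as an exact match.
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / len as f64
}
```

Dividing by the longer length keeps the result in range because the Levenshtein distance never exceeds the length of the longer string.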

Source https://stackoverflow.com/questions/64113621

QUESTION

ghc error: hidden package, but it's actually exposed

Asked 2020-Aug-05 at 18:57

I'm trying to use the package Parsec. When I run ghc Main.hs I get the error message:

...

ANSWER

Answered 2020-Aug-05 at 18:57

This looks like an issue with global vs. local installs. Oh, and there it is in your ghc-pkg list output. You've got a multi-user ghc install and a single-user list of packages you've installed. Things work when you run ghc as a superuser because the superuser doesn't see your local (per-user) installs.

This is going to cause problems unless you use a tool to manage your environment for you. Both cabal and stack can handle this fine. I prefer cabal because it doesn't need coaxing to work with your preinstalled GHC, but this is a matter that has caused religious wars in the past. I won't argue against stack if you have a good resource for using it instead.

Source https://stackoverflow.com/questions/63267436

QUESTION

Levenshtein distance with substitution, deletion and insertion count

Asked 2020-May-13 at 21:24

There's a great blog post here https://davedelong.com/blog/2015/12/01/edit-distance-and-edit-steps/ on Levenshtein distance. I'm trying to implement this to also include counts of subs, dels and ins when returning the Levenshtein distance. Just running a smell check on my algorithm.

...

ANSWER

Answered 2020-May-13 at 21:23

The problem was that Python passes objects by reference, so I should have been copying the lists into the variables rather than assigning direct references to them.
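For comparison, here is a sketch in Rust of one way to return the counts the question asks for. Each matrix cell holds a small Copy struct, so extending a neighbouring cell copies it instead of sharing it. EditCounts and edit_counts are names invented for this example. Ties are broken in favour of the diagonal move, so the counts describe one optimal edit script, not every possible one.

```rust
/// Distance plus per-operation counts for one optimal edit script.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EditCounts {
    pub distance: usize,
    pub substitutions: usize,
    pub deletions: usize,
    pub insertions: usize,
}

/// Counts the edits that turn `source` into `target` (illustrative sketch).
pub fn edit_counts(source: &str, target: &str) -> EditCounts {
    let s: Vec<char> = source.chars().collect();
    let t: Vec<char> = target.chars().collect();

    // Row 0: building target[..j] from "" takes j insertions.
    let mut prev: Vec<EditCounts> = (0..=t.len())
        .map(|j| EditCounts { distance: j, substitutions: 0, deletions: 0, insertions: j })
        .collect();

    for (i, &cs) in s.iter().enumerate() {
        let mut row = Vec::with_capacity(t.len() + 1);
        // Column 0: reducing source[..=i] to "" takes i + 1 deletions.
        row.push(EditCounts { distance: i + 1, substitutions: 0, deletions: i + 1, insertions: 0 });

        for (j, &ct) in t.iter().enumerate() {
            let (diag, up, left) = (prev[j], prev[j + 1], row[j]);
            // Diagonal move: free on a match, otherwise one substitution.
            let mut best = if cs == ct {
                diag
            } else {
                EditCounts {
                    distance: diag.distance + 1,
                    substitutions: diag.substitutions + 1,
                    ..diag
                }
            };
            // Move down from the row above: delete a character of `source`.
            if up.distance + 1 < best.distance {
                best = EditCounts { distance: up.distance + 1, deletions: up.deletions + 1, ..up };
            }
            // Move right from the cell to the left: insert a character of `target`.
            if left.distance + 1 < best.distance {
                best = EditCounts { distance: left.distance + 1, insertions: left.insertions + 1, ..left };
            }
            row.push(best);
        }
        prev = row;
    }
    prev[t.len()]
}
```

In every cell the distance equals the sum of the three counts, which gives a quick sanity check on the result.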

Source https://stackoverflow.com/questions/61784300

QUESTION

Why does indexing a string inside of a recursive call yield a different result?

Asked 2020-May-10 at 10:30

In my naive implementation of edit-distance finder, I have to check whether the last characters of two strings match:

...

ANSWER

Answered 2020-May-10 at 10:30

The operators have different precedence from what you expect. In const auto delt = a[$ - 1] == b[$ - 1] ? 0 : 1; there is no ambiguity, but in editDistance(a[0 .. $ - 1], b[0 .. $ - 1]) + a[$ - 1] == b[$ - 1] ? 0 : 1, there is (seemingly).

Simplifying:

Source https://stackoverflow.com/questions/61707979

QUESTION

An algorithm for computing the edit-distance between two words

Asked 2020-Apr-24 at 02:10

I am trying to write Python code that takes a word as an input (e.g. book), and outputs the most similar word with similarity score.

I have tried different off-the-shelf edit-distance algorithms like cosine, Levenshtein and others, but these cannot tell the degree of difference between, for example, (book, bouk) and (book, bo0k). I am looking for an algorithm that can give different scores for these two examples. I am thinking about using fastText or BPE; however, they use cosine distance.

Is there any algorithm that can solve this?

...

ANSWER

Answered 2020-Apr-22 at 12:25

The problem is that both "bo0k" and "bouk" are one character different from "book", and no other metric will give you a way to distinguish between them.

What you will need to do is change the scoring: instead of counting a different character as an edit distance of 1, you could give it a higher score if it belongs to a different character class (i.e., a digit instead of a letter). That way you will get a different score for your examples.

You might have to adapt the other scores as well, though, so that replacement / insertion / deletion are still consistent.
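A minimal Rust sketch of the scoring change described above. The weights are assumptions chosen for illustration (1 for a substitution within the same character class, 2 across classes, 1 for each insertion or deletion), and the names CharClass, substitution_cost and weighted_distance are made up for this example.

```rust
/// Coarse character classes used to weight substitutions (illustrative).
#[derive(PartialEq)]
enum CharClass {
    Letter,
    Digit,
    Other,
}

fn class_of(c: char) -> CharClass {
    if c.is_alphabetic() {
        CharClass::Letter
    } else if c.is_numeric() {
        CharClass::Digit
    } else {
        CharClass::Other
    }
}

/// Assumed weights: 0 for a match, 1 within a class, 2 across classes.
fn substitution_cost(a: char, b: char) -> usize {
    if a == b {
        0
    } else if class_of(a) == class_of(b) {
        1
    } else {
        2
    }
}

/// Levenshtein-style distance with class-aware substitution costs.
pub fn weighted_distance(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut row: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        // `diag` holds the previous row's value up-left of the current cell.
        let mut diag = row[0];
        row[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let up = row[j + 1];
            row[j + 1] = (diag + substitution_cost(ca, cb))
                .min(up + 1) // deletion
                .min(row[j] + 1); // insertion
            diag = up;
        }
    }
    row[b.len()]
}
```

With these weights, weighted_distance("book", "bouk") is 1 and weighted_distance("book", "bo0k") is 2. A cross-class substitution costs the same as a deletion followed by an insertion, so raising its weight above 2 would have no further effect unless the insertion and deletion costs are raised as well.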

Source https://stackoverflow.com/questions/61364975

QUESTION

stack build on macOS

Asked 2020-Apr-04 at 17:27

I am new to haskell. I have the simplest of simple programs.

...

ANSWER

Answered 2020-Apr-04 at 17:27

I just found this known bug: https://github.com/commercialhaskell/stack/issues/4373

That is exactly what I'm seeing.

The workaround required is to update a settings file that is buried deep under a newly generated ~/.stack directory https://github.com/commercialhaskell/stack/issues/4373#issuecomment-432726112

Those instructions are incomplete so I added a comment to that bug to clarify. That settings location: ~/.stack/programs/x86_64-osx/ghc-8.8.3/lib/ghc-8.8.3/settings

And this works (note that stack test is a combination of stack build and stack test):

Source https://stackoverflow.com/questions/61023053

QUESTION

Speeding up Levenshtein distance calculation in python with global variables

Asked 2020-Jan-05 at 09:19

Hi I'm using python for a project in bioinformatics.

I have a function that uses the Needleman-Wunsch algorithm to calculate the edit distance between a query and a read from our next-generation sequencing platform (both are strings over the alphabet 'ACGT'). My script works fine, but it takes a long time to run because the function is called more than 100 million times in total.

In the function I use a 2-dimensional list of size MxN, where M is the length of the query and N is the length of the read. Every time the function is called, this 2D list has to be recreated in memory before it can be filled with the calculation. I was wondering if I could speed up the process by creating the 2D list as a global variable and then passing a handle to this list as an argument to the function. This way the memory would only have to be allocated once by the operating system.

I hope I made my question clear. How much time does requesting the memory for a list from the operating system take? Is it significant?

Edit: some sample code as requested:

The function goes through the 2D-Array and fills it with numbers:

...

ANSWER

Answered 2020-Jan-04 at 22:46

This is my take on the performance impact by having the list enclosed locally in the function rather than "globally".

Edit: as pointed out by @DanD in the comments, I had earlier written (and then deleted) an explanation in terms of the more traditional model of stacks and heaps, which is not entirely accurate for Python. The Python Virtual Machine (PVM) only uses a private heap to allocate its objects, but the PVM itself has been implemented as a stack. Python then uses reference counters (among other things) to keep track of the objects and whether they should be discarded. When you use your first example, the list object gets pushed onto the stack again and again and again. The previous list object gets its reference counter decreased, and is then removed when the reference counter reaches 0. This is a good amount of overhead. Your second example creates the list object once, keeps the reference counter satisfied, and then the PVM can use that object each time you make your call.

So: instead of recreating the list object for each call and generating new references, the performance is gained by having only 1 list object created with the same references.
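The same idea of allocating once and reusing the buffer can be sketched in Rust. This is an illustration only: DistanceScratch is a made-up name, the scoring is plain unit-cost edit distance rather than a general Needleman-Wunsch scheme, and whether the saving is significant for a given workload would need measuring.

```rust
/// Reusable scratch space for repeated distance computations.
/// `DistanceScratch` is an illustrative name, not an existing API.
pub struct DistanceScratch {
    cells: Vec<usize>, // flattened (m + 1) x (n + 1) matrix
}

impl DistanceScratch {
    pub fn new() -> Self {
        Self { cells: Vec::new() }
    }

    /// Unit-cost edit distance between two byte strings, reusing `cells`.
    pub fn distance(&mut self, query: &[u8], read: &[u8]) -> usize {
        let (m, n) = (query.len(), read.len());
        let width = n + 1;

        // `clear` keeps the capacity, so `resize` only allocates when this
        // call needs a larger matrix than any earlier call did.
        self.cells.clear();
        self.cells.resize((m + 1) * width, 0);

        // First row: j insertions.
        for j in 0..=n {
            self.cells[j] = j;
        }
        for i in 1..=m {
            // First column: i deletions.
            self.cells[i * width] = i;
            for j in 1..=n {
                let mismatch = usize::from(query[i - 1] != read[j - 1]);
                let sub = self.cells[(i - 1) * width + (j - 1)] + mismatch;
                let del = self.cells[(i - 1) * width + j] + 1;
                let ins = self.cells[i * width + (j - 1)] + 1;
                self.cells[i * width + j] = sub.min(del).min(ins);
            }
        }
        self.cells[m * width + n]
    }
}
```

A caller would create one DistanceScratch up front and call distance on it for every query/read pair, instead of building a fresh matrix inside the function each time.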

Here is a small example, which is your first and second example in a nutshell:

Source https://stackoverflow.com/questions/59593094

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install edit-distance

Add edit-distance to the [dependencies] section of your Cargo.toml, then re-run cargo build. That fetches the dependencies and builds the code.

Support

1. Fork it!
2. Create your feature branch: git checkout -b my-new-feature
3. Develop your changes (see details above)
4. Commit your changes: git commit -am 'Add some feature'
5. Push to the branch: git push origin my-new-feature
6. Submit a pull request :D
