SuffixTree | Optimized implementation of suffix tree | Computer Vision library
kandi X-RAY | SuffixTree Summary
Optimized implementation of suffix tree in python using Ukkonen's algorithm.
Top functions reviewed by kandi - BETA
- Extends the suffix tree.
- Traverses the tree.
- Determines if the current node is up to the current node.
- Initializes the tree.
- Recursively finds all matches.
- Creates a new node.
- Checks if the sub_string is empty.
- Gets the attribute of the node.
- Implements the equality operator.
- Checks if the node has no effect.
Trending Discussions on SuffixTree
QUESTION
I am trying to make the end variable in my SuffixNode class update automatically. In the following code, I create a SuffixNode and assign endIndex.end as the SuffixNode's end value. Then I update endIndex.end to 2. However, when I print self.root.end after updating endIndex.end, the end value stored in the SuffixNode still shows 1 rather than the updated 2.
Can anyone suggest how I should modify the code so that, when I update endIndex.end, the end value stored in the SuffixNode also updates automatically?
Thank you
Below is the code
class EndIndex:
    def __init__(self, endIndexValue):
        self.end = endIndexValue
...
ANSWER
Answered 2020-May-13 at 09:15
You don't need to create endIndex in your code at all, and that's the only change you need to make. So, your SuffixTree should look like this:
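Separately from that answer, the behavior in the question follows from Python's assignment semantics: `node.end = endIndex.end` copies the current integer into the node, so later changes to `endIndex.end` are invisible to it. A common way to get the automatically updating leaf end that Ukkonen's algorithm relies on is to have every leaf hold a reference to one shared, mutable end object and read through it. The sketch below assumes that approach; `SharedEnd`, `Node`, and their fields are hypothetical names, not the SuffixTree library's API.

```python
class SharedEnd:
    """Hypothetical mutable holder for the global leaf end used in Ukkonen's algorithm."""

    def __init__(self, value):
        self.value = value


class Node:
    """Hypothetical suffix-tree node whose end is either a fixed int or a SharedEnd."""

    def __init__(self, start, end):
        self.start = start
        self._end = end  # int for internal nodes, SharedEnd for leaves

    @property
    def end(self):
        # Read through the shared object so updates made elsewhere are visible here.
        if isinstance(self._end, SharedEnd):
            return self._end.value
        return self._end


global_end = SharedEnd(1)
leaf = Node(0, global_end)  # stores a reference to the shared object, not a copy of the int
internal = Node(0, 1)       # stores a plain int that never moves

global_end.value = 2        # one assignment extends every leaf at once
print(leaf.end)      # 2
print(internal.end)  # 1
```

Storing the object rather than its current value is what makes the update visible. Internal nodes created by a split get a plain integer, so later phases do not move their end.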
QUESTION
Introduction
I have a favorite algorithm that I made quite some time ago, which I keep writing and rewriting in new programming languages, platforms, etc. as a sort of benchmark. Although my main programming language is C#, I quite literally copy-pasted the code, changed the syntax slightly, built it in Java, and found that it ran 1000x faster.
The Code
There is quite a bit of code, but I'm only going to present this snippet, which seems to be the main issue:
...
ANSWER
Answered 2018-Aug-12 at 16:22
Issue Origin
After a glorious battle that lasted two days and three nights (and amazing ideas and thoughts from the comments), I've finally managed to fix this issue!
I'd like to post an answer for anybody running into similar issues, where string.Substring(i, j) is not an acceptable way to get a substring of a string: either the string is so large that you can't afford the copy made by string.Substring(i, j) (it has to make a copy because C# strings are immutable; there is no way around it), or string.Substring(i, j) is called a huge number of times over the same string (as in my nested for loops), giving the garbage collector a hard time. In my case, it was both!
Attempts
I've tried many suggested things, such as StringBuilder, Streams, unmanaged memory allocation using IntPtr and Marshal within an unsafe {} block, and even creating an IEnumerable that uses yield return to return the characters by reference within the given positions. All of these attempts ultimately failed because some form of joining of the data had to be done, as there was no easy way for me to traverse my tree character by character without jeopardizing performance. If only there were a way to span multiple memory addresses within an array at once, as you could in C++ with some pointer arithmetic... except there is!
(credits to @Ivan Stoev's comment)
The Solution
The solution was to use System.ReadOnlySpan (it couldn't be System.Span, because strings are immutable), which, among other things, lets us read a sub-range of an existing array or string without creating a copy.
This piece of the code posted:
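Python has the same copy-versus-view trade-off: slicing a `str` or `bytes` object creates a new object, while slicing a `memoryview` over a `bytes` object only creates another view of the same buffer. The sketch below is a loose Python analogy to `ReadOnlySpan`, not the C# answer's code; it assumes the text can be stored as bytes (for example ASCII), and `label_equals` is a hypothetical helper rather than part of the SuffixTree library.

```python
def label_equals(view: memoryview, start: int, end: int, pattern: bytes) -> bool:
    """Compare the edge label text[start:end] with pattern without copying the text."""
    # view[start:end] is a new memoryview over the same buffer, not a copy of the bytes.
    return view[start:end] == pattern


text = b"banana$"
view = memoryview(text)  # read-only view over the bytes; no copy is made

edge = view[1:4]
print(edge.readonly)                      # True: bytes is immutable, like a C# string
print(label_equals(view, 1, 4, b"ana"))   # True
print(label_equals(view, 2, 5, b"ana"))   # False, that label is "nan"
print(bytes(edge))                        # b'ana' (converting back is an explicit copy)
```

Python's `str` has no zero-copy slice, which is why this sketch works on bytes; indexing with explicit start/end positions is the other common way to avoid slicing copies.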
QUESTION
I'm working with suffix trees. As far as I can tell, I have Ukkonen's algorithm running correctly to build a generalised suffix tree from an arbitrary number of strings. I'm now trying to implement a find_longest_common_substring() method to do exactly that. For this to work, I understand that I need to find the deepest shared edge (with depth in terms of characters, rather than edges) between all strings in the tree, and I've been struggling for a few days to get the traversal right.
Right now I have the following in C++. I'll spare you all my code, but for context, I'm keeping the edges of each node in an unordered_map called outgoing_edges, and each edge has a vector of ints, recorded_strings, containing integers identifying the added strings. The child field of an edge is the node it is going to, and l and r identify its left and rightmost indices, respectively. Finally, current_string_number is the current number of strings in the tree.
ANSWER
Answered 2017-Aug-19 at 20:40
Your handling of deepest_shared_edge is wrong. First, the allocation you do at the start of the function is a memory leak, since you never free the memory. Secondly, the result of the recursive call is ignored, so whatever deepest edge it finds is lost (although you update the depth, you don't keep track of the deepest edge).
To fix this, you should either pass deepest_shared_edge as a reference parameter (like you do for longest), or you can initialize it to nullptr, then check the return from your recursive call for nullptr and update it appropriately.
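The answer's central point, that each recursive call must hand its best result back to the caller, can be illustrated in Python. This is a simplified sketch under stated assumptions, not the asker's C++ code: it builds a naive generalized suffix trie in O(n²) time instead of using Ukkonen's algorithm, stores one character per edge, and uses hypothetical names (`TrieNode`, `build_trie`, `deepest_shared`). Each call returns the deepest node in its subtree that is reached by every string, which corresponds to the answer's second option of checking and using the return value.

```python
class TrieNode:
    """Hypothetical node of a naive generalized suffix trie (one character per edge)."""

    def __init__(self):
        self.children = {}       # character -> TrieNode
        self.string_ids = set()  # ids of the input strings whose suffixes pass through here


def build_trie(strings):
    """Insert every suffix of every string: O(n^2) nodes, acceptable for a sketch."""
    root = TrieNode()
    for sid, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, TrieNode())
                node.string_ids.add(sid)
    return root


def deepest_shared(node, depth, label, num_strings):
    """Return (depth, label) of the deepest node in this subtree shared by all strings."""
    best = (depth, label) if len(node.string_ids) == num_strings else (0, "")
    for ch, child in node.children.items():
        # A child's ids are a subset of its parent's, so a child missing some string
        # cannot have a qualifying descendant and can be skipped.
        if len(child.string_ids) < num_strings:
            continue
        candidate = deepest_shared(child, depth + 1, label + ch, num_strings)
        if candidate[0] > best[0]:
            best = candidate  # keep what the recursive call found instead of discarding it
    return best


def longest_common_substring(strings):
    root = build_trie(strings)
    return deepest_shared(root, 0, "", len(strings))[1]


print(longest_common_substring(["banana", "ananas"]))  # anana
```

Because the result travels back up through return values, no heap allocation or out-parameter is needed, which also sidesteps the leak the answer points out.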
QUESTION
I am much more familiar with C# than C++, so I must ask for advice on this issue. I had to rewrite some pieces of code in C++ and then, surprisingly, ran into performance issues.
I've narrowed the problem down to these snippets:
C#
...
ANSWER
Answered 2017-Aug-17 at 23:40
In C#, a System.String includes its Length, so you can get the length in constant time. In C++, a std::string also includes its size, so it is also available in constant time.
However, you aren’t using C++ std::string (which you should be, for a good translation of the algorithm); you’re using a C-style null-terminated char array. That char* literally means “pointer to char”, and just tells you where the first character of the string is. The strlen function looks at each char from the one pointed to forward, until it finds a null character '\0' (not to be confused with a null pointer); this is expensive, and you do it in each iteration of your loop in insertSuffix. That probably accounts for at least a reasonable fraction of your slowdown.
When doing C++, if you find yourself working with raw pointers (any type involving a *), you should always wonder if there’s a simpler way. Sometimes the answer is “no”, but often it’s “yes” (and that’s getting more common as the language evolves). For example, consider your struct node and node* root. Both use node pointers, but in both cases you should have used node directly because there is no need to have that indirection (in the case of node, some amount of indirection is necessary so you don’t have each node containing another node ad infinitum, but that’s provided by the std::unordered_map).
A couple of other tips:
- In C++ you often don’t want to do any work in the body of a constructor, but instead use initialization lists.
- When you don’t want to copy something you pass as a parameter, you should make the parameter a reference; instead of changing insertSuffix to take a std::string as the first parameter, make it take std::string const&; similarly, contains should take a std::string const&. Better yet, since insertSuffix can see the text member, it doesn’t need to take that first parameter at all and can just use from.
- C++ supports a foreach-like construct, which you should probably prefer to a standard for loop when iterating over a string’s characters.
- If you’re using the newest not-technically-finalized-but-close-enough version of C++, C++17, you should use std::string_view instead of std::string whenever you just want a look at a string, and don’t need to change it or keep a reference to it around. This would be useful for contains, and since you want to make a local copy in the text member, even for the constructor; it would not be useful in the text member itself, because the object being viewed might be temporary. Lifetime can sometimes be tricky in C++, though, and until you get the hang of it you might just want to use std::string to be on the safe side.
- Since node isn’t useful outside of the concept of suffixTree, it should probably be inside it, like in the C# version. As a deviation from the C# version, you might want to make the type node and the data members root and text into private instead of public members.
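For readers porting between languages, the structure these tips describe (a node type nested inside the tree class, children held in a hash map, an insert routine that reads the stored text from a start index instead of receiving the string again, and a contains query) might look like this in Python. This is a hypothetical sketch, not the code from the question: names are adapted to Python style, the tree is a naive suffix trie rather than a Ukkonen suffix tree, and the leading underscores mark members as private only by convention.

```python
class SuffixTrie:
    """Naive suffix trie; a hypothetical Python analogue of the C++ suffixTree discussed above."""

    class _Node:
        # Nested because nodes are meaningless outside the trie; __slots__ keeps them small.
        __slots__ = ("children",)

        def __init__(self):
            self.children = {}  # char -> _Node, the role std::unordered_map plays in C++

    def __init__(self, text):
        self._text = text  # Python strings are immutable, so keeping a reference is safe
        self._root = SuffixTrie._Node()
        for start in range(len(self._text)):
            self._insert_suffix(start)

    def _insert_suffix(self, start):
        # Reads self._text from `start` instead of taking the string as a parameter,
        # and walks by index rather than slicing, so no per-suffix copy is made.
        node = self._root
        for i in range(start, len(self._text)):  # len() is O(1) for Python str
            node = node.children.setdefault(self._text[i], SuffixTrie._Node())

    def contains(self, pattern):
        """Return True if pattern occurs as a substring of the stored text."""
        node = self._root
        for ch in pattern:  # for-each over characters, as the answer recommends for C++
            node = node.children.get(ch)
            if node is None:
                return False
        return True


trie = SuffixTrie("banana")
print(trie.contains("nan"))  # True
print(trie.contains("nab"))  # False
```

Walking by index from `start` is the Python counterpart of the advice to avoid copying the string for every suffix; in Python the cost of computing the length is not the issue, since `len()` is constant time.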
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install SuffixTree
You can use SuffixTree like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.