warplda | Cache-efficient implementation of Latent Dirichlet Allocation | Search Engine library

by thu-ml | C++ | Version: Current | License: MIT

kandi X-RAY | warplda Summary

warplda is a C++ library typically used in Database, Search Engine, and Tensorflow applications. warplda has no reported bugs or vulnerabilities, has a permissive license, and has low support. You can download it from GitHub.

WarpLDA is a cache-efficient implementation of Latent Dirichlet Allocation, which samples each token in O(1).
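To make the O(1)-per-token claim concrete, below is a minimal C++ sketch of Walker's alias method, a standard way to draw from a fixed discrete distribution in constant time per draw. This only illustrates the kind of constant-time sampling step that Metropolis-Hastings-based LDA samplers build on; it is not WarpLDA's actual code, and the AliasTable class and its names are hypothetical.

// Walker's alias method: O(n) setup, O(1) per sample.
// Assumes a non-empty weight vector with positive entries.
#include <cstddef>
#include <random>
#include <vector>

class AliasTable {
 public:
  explicit AliasTable(const std::vector<double>& weights) {
    const std::size_t n = weights.size();
    prob_.resize(n);
    alias_.resize(n);
    double sum = 0.0;
    for (double w : weights) sum += w;
    // Rescale so the average bucket weight is exactly 1.
    std::vector<double> scaled(n);
    std::vector<std::size_t> small, large;
    for (std::size_t i = 0; i < n; ++i) {
      scaled[i] = weights[i] * static_cast<double>(n) / sum;
      (scaled[i] < 1.0 ? small : large).push_back(i);
    }
    // Pair each under-full bucket with an over-full one so that
    // every bucket holds at most two outcomes.
    while (!small.empty() && !large.empty()) {
      const std::size_t s = small.back(); small.pop_back();
      const std::size_t l = large.back(); large.pop_back();
      prob_[s] = scaled[s];
      alias_[s] = l;
      scaled[l] -= 1.0 - scaled[s];
      (scaled[l] < 1.0 ? small : large).push_back(l);
    }
    // Remaining buckets are (numerically) exactly full.
    for (std::size_t i : small) { prob_[i] = 1.0; alias_[i] = i; }
    for (std::size_t i : large) { prob_[i] = 1.0; alias_[i] = i; }
  }

  // O(1): pick a bucket uniformly, then flip one biased coin.
  template <typename Rng>
  std::size_t Sample(Rng& rng) const {
    std::uniform_int_distribution<std::size_t> bucket(0, prob_.size() - 1);
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    const std::size_t i = bucket(rng);
    return coin(rng) < prob_[i] ? i : alias_[i];
  }

 private:
  std::vector<double> prob_;
  std::vector<std::size_t> alias_;
};

// Usage: AliasTable t({0.1, 0.2, 0.7}); std::mt19937 rng(42); t.Sample(rng);

Building the table for a K-dimensional distribution costs O(K), but every subsequent draw is O(1); amortizing cheap draws like this over many tokens is what makes constant time per token achievable at all.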

            kandi-support Support

warplda has a low-activity ecosystem.
It has 158 stars and 56 forks. There are 10 watchers for this library.
It had no major release in the last 6 months.
There are 4 open issues and 5 closed issues. On average, issues are closed in 72 days. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of warplda is current.

            kandi-Quality Quality

              warplda has no bugs reported.

            kandi-Security Security

              warplda has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            kandi-License License

              warplda is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

              warplda releases are not available. You will need to build from source code and install.
              Installation instructions, examples and code snippets are available.


            warplda Key Features

            No Key Features are available at this moment for warplda.

            warplda Examples and Code Snippets

            No Code Snippets are available at this moment for warplda.

            Community Discussions

            QUESTION

            Can text2vec and topicmodels generate similar topics with suitable parameter settings for LDA?
            Asked 2017-Nov-30 at 10:40

I was wondering how the results of different packages, and hence algorithms, differ, and whether parameters could be set in a way that produces similar topics. I had a look at the packages text2vec and topicmodels in particular.

I used the code below to compare 10 topics (see the code section for terms) generated with these packages. I could not manage to generate sets of topics with similar meaning. E.g., topic 10 from text2vec has something to do with "police", while none of the topics produced by topicmodels refers to "police" or similar terms. Further, I could not identify a counterpart of topic 5 produced by topicmodels, which has something to do with "life-love-family-war", in the topics produced by text2vec.

I am a beginner with LDA, hence my understanding may sound naive to experienced programmers. However, intuitively, one would assume that it should be possible to produce sets of topics with similar meaning to prove the validity/robustness of results. Of course, not necessarily the exact same set of terms, but term lists addressing similar topics.

Maybe the issue is simply that my human interpretation of these term lists is not good enough to capture similarities, but maybe there are some parameter settings that would increase the similarity for human interpretation. Can someone guide me on how to set parameters to achieve this, or otherwise provide explanations or hints on suitable resources to improve my understanding of the matter?

Here are some issues that might be relevant:

• I know that text2vec does not use standard Gibbs sampling but WarpLDA, which is already an algorithmic difference from topicmodels. If my understanding is correct, the priors alpha and delta used in topicmodels are set as doc_topic_prior and topic_word_prior in text2vec, respectively.
• Furthermore, in postprocessing, text2vec allows the adjustment of lambda for sorting the terms of topics based on their frequency. I have not yet understood how terms are sorted in topicmodels; is that comparable to setting lambda = 1? (I have tried different lambdas between 0 and 1 without getting similar topics.)
• Another issue is that it seems difficult to produce a fully reproducible example even when setting a seed (see, e.g., this question). This is not directly my question but might make it more difficult to respond.

Sorry for the lengthy question, and thanks in advance for any help or suggestions.

Update 2: I have moved the content of my first update into an answer that is based on a more complete analysis.

Update: Following the helpful comment of text2vec package creator Dmitriy Selivanov, I can confirm that setting lambda = 1 increases the similarity between the term lists produced by the two packages.

Furthermore, I had a closer look at the differences between the term lists produced by both packages via a quick check of length(setdiff()) and length(intersect()) across topics (see the code below). This rough check shows that text2vec discards several terms per topic, probably via a probability threshold for the individual topics; topicmodels keeps all terms for all topics. This explains part of the differences in meaning that can be derived (by a human) from the term lists.

As mentioned above, generating a reproducible example seems difficult, so I have not adapted all of the data examples in the code below. Since the run time is short, anybody can check it on their own system.

            ...

            ANSWER

            Answered 2017-Nov-30 at 10:38

After having updated my question with some comparison results, I was still interested in more detail. Therefore, I ran LDA models on the complete movie_review data set included in text2vec (5000 docs). To produce halfway realistic results, I also introduced some gentle pre-processing and stopword removal. (Sorry for the long code example below.)

My conclusion is that some of the "good" topics (from a subjective standpoint) produced by the two packages are comparable to a certain extent (especially the last three topics in the example below are not really good and were difficult to compare). However, similar topics between the two packages produced different (subjective) associations for each topic. Hence, the standard Gibbs sampling and the WarpLDA algorithm seem to capture similar topical areas for the given data, but with different "moods" expressed in the topics.

I would see the main reason for the differences in the fact that the WarpLDA algorithm seems to discard terms and introduce NA values in the beta matrix (term-topic distribution). See the example below. Hence, its faster convergence seems to be achieved by sacrificing completeness.

            I do not want to judge which topics are subjectively "better" and leave this to your own judgement.

One important limitation of this analysis is that I have not (yet) checked the results for an optimal number of topics; I only used k = 10. Hence, the comparability of the topics might increase for an optimal k; in any case, the quality will improve, and thereby maybe the "mood". (The optimal k might again differ between the algorithms, depending on the measure used to find k.)

            Source https://stackoverflow.com/questions/46788242

            QUESTION

            Link to static library by cmake
            Asked 2017-Sep-13 at 16:20

I have a C++ project using the libnuma library. Because I don't have permission to install libnuma system-wide, I have to install it in a user folder: /home/khangtg/opt. This folder contains 2 main folders:

            • Folder include contains: numacompat1.h, numa.h, numaif.h
            • Folder lib contains: libnuma.a, libnuma.la, libnuma.so, libnuma.so.1, libnuma.so.1.0.0

Now, I have a .cpp file that includes the libnuma headers:

            ...

            ANSWER

            Answered 2017-Sep-13 at 16:20

You want to set link_directories to include the directory containing the libraries; this tells the linker where to look for them. More can be found in the CMake docs.

It should probably look something like this:
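A minimal sketch of such a CMakeLists.txt, assuming the directory layout from the question and a hypothetical target name myapp (adapt the paths and names to your project):

# Hypothetical CMakeLists.txt sketch, not the original answer's exact code.
cmake_minimum_required(VERSION 2.8.12)
project(myapp CXX)

# Where the user-local libnuma headers and libraries were installed.
include_directories(/home/khangtg/opt/include)  # numa.h, numaif.h, numacompat1.h
link_directories(/home/khangtg/opt/lib)         # libnuma.a / libnuma.so

add_executable(myapp main.cpp)      # main.cpp is a placeholder source file
target_link_libraries(myapp numa)   # resolves to -lnuma in the directory above

Note that link_directories only affects targets created after the call, which is why it sits above add_executable here.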

            Source https://stackoverflow.com/questions/46201935

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install warplda

Download some data, and split it into training and testing sets.
GCC (>= 4.8.5)
CMake (>= 2.8.12)
git
libnuma (CentOS: yum install libnuma-devel; Ubuntu: apt-get install libnuma-dev)

            Support

For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            Find more information at:

            CLONE
          • HTTPS

            https://github.com/thu-ml/warplda.git

          • CLI

            gh repo clone thu-ml/warplda

          • sshUrl

            git@github.com:thu-ml/warplda.git
