hlda | implements hierarchical latent Dirichlet allocation | Topic Modeling library
kandi X-RAY | hlda Summary
This code implements hierarchical LDA with a fixed-depth tree and a stick-breaking prior on the depth weights. An infinite-depth tree can be approximated by setting the depth to be very high. The code requires the GSL package to be installed. The input format of the data is the same as in the lda-c package; each line contains: [# of unique terms] [term #] : [count] ... The settings file controls various parameters of the model; several settings files are included in this directory. I hope that this code is useful to you, but please note that it is unsupported. Do not email me (david
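As an illustration of the input format described above, here is a minimal Python sketch that reads one document line into a term-to-count mapping. The function name parse_ldac_line is hypothetical and is not part of hlda or lda-c; the sketch assumes integer term indices and tolerates optional spaces around the colon.

```python
import re
from typing import Dict


def parse_ldac_line(line: str) -> Dict[int, int]:
    """Parse '[# of unique terms] [term #]:[count] ...' into {term: count}.

    Hypothetical helper for illustration; not part of hlda or lda-c.
    """
    # Collapse optional whitespace around ':' so 'term : count' also parses.
    normalized = re.sub(r"\s*:\s*", ":", line.strip())
    fields = normalized.split()
    if not fields:
        raise ValueError("empty document line")

    n_unique = int(fields[0])
    counts: Dict[int, int] = {}
    for pair in fields[1:]:
        term, count = pair.split(":")
        counts[int(term)] = int(count)

    # The leading number should match the number of distinct terms listed.
    if len(counts) != n_unique:
        raise ValueError(
            f"line declares {n_unique} unique terms but lists {len(counts)}"
        )
    return counts


if __name__ == "__main__":
    print(parse_ldac_line("3 0:2 5:1 9:4"))
```

The check against the declared number of unique terms is a design choice of this sketch: it catches truncated or malformed lines early rather than passing bad counts to the sampler.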
Community Discussions
Trending Discussions on hlda
QUESTION
I'm using the MALLET topic-modeling tool and am having difficulty getting stable results (the topics I get do not seem very logical).
I worked through your tutorial and this one: https://programminghistorian.org/en/lessons/topic-modeling-and-mallet#getting-your-own-texts-into-mallet and I have some questions:
- Are there best practices for getting the model to work, apart from the optimize command (what is a good number for that)? What is a good number for the iterations command?
- I import my data with the import-dir command; that directory contains my files. Does it matter whether those files contain text with newlines or just one very long line?
- I read about the hLDA model. When I tried to run it, the only output was state.txt, which is not very clear. I expected output like that of the topic-modeling model (topic_keys.txt, doc_topics.txt). How can I get those files?
- When should I use hLDA rather than the topic-modeling model?
Thanks a lot for your help!
...ANSWER
Answered 2019-Apr-12 at 13:27
Some references for good practices in topic modeling are The Care and Feeding of Topic Models, with Jordan Boyd-Graber and Dave Newman, and Applied Topic Modeling, with Jordan Boyd-Graber and Yuening Hu.
For hyperparameter optimization, --optimize-interval 20 --optimize-burn-in 50 should be fine; it doesn't seem to be very sensitive to specific values. Convergence for Gibbs sampling is hard to assess; the default 1000 iterations should be interpreted as "a number large enough that it's probably OK" rather than a specific value.
If you are reading individual documents from files in a directory, lines don't matter. If documents are longer than about 1000 tokens before stopword removal, consider breaking them into smaller segments.
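The segmentation advice above can be sketched in Python as follows. This is only an illustration under stated assumptions: tokens are approximated by whitespace splitting (MALLET's own tokenizer may count differently), and the name split_into_segments and the 1000-token default are chosen here to mirror the figure in the answer.

```python
from typing import List


def split_into_segments(text: str, max_tokens: int = 1000) -> List[str]:
    """Break a long document into consecutive segments of at most max_tokens.

    Hypothetical helper; tokens are approximated by whitespace splitting.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = text.split()
    return [
        " ".join(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


if __name__ == "__main__":
    long_document = " ".join(f"word{i}" for i in range(2500))
    segments = split_into_segments(long_document)
    print([len(segment.split()) for segment in segments])
```

Each segment would then be saved as its own file (or its own line) before import, so that the model treats it as a separate document.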
hLDA is only included because people seem to want it; I don't recommend it for any purpose.
QUESTION
In the past few days, I have started using Mallet. I am specifically interested in running a hierarchical topic model, like HLDA or HPAM. When importing the sample data files and running them using the cc.mallet.topics.tui.HierarchicalLDATUI class, I get results with no problems.
When running the same on the Wikipedia article on WW2, after importing I get the following error:
...ANSWER
Answered 2018-Jan-25 at 22:19
It took a while, but I found the answer to the problem, and it seems too simple.
HLDATUI considers files as documents, which means if there is only one file there are not enough documents and the program crashes. That means one has to import more than one file.
The solution in my case is to write a program that splits the .xml file I want to run HLDATUI on into multiple smaller files, which can then be imported and analyzed.
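A minimal sketch of such a splitting program is shown below. It is not the answerer's actual program: it assumes the article has already been reduced to plain text with blank lines between paragraphs (it does not parse XML), and the function name, output file naming, and paragraphs_per_file default are all hypothetical.

```python
import re
from pathlib import Path
from typing import List


def split_file_into_documents(
    source: Path, out_dir: Path, paragraphs_per_file: int = 5
) -> List[Path]:
    """Write groups of paragraphs from one file into several smaller files.

    Hypothetical helper; treats the input as plain text, not as XML.
    """
    text = source.read_text(encoding="utf-8")
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    starts = range(0, len(paragraphs), paragraphs_per_file)
    for index, start in enumerate(starts):
        chunk = paragraphs[start:start + paragraphs_per_file]
        path = out_dir / f"{source.stem}_{index:04d}.txt"
        path.write_text("\n\n".join(chunk), encoding="utf-8")
        written.append(path)
    return written
```

The resulting directory contains one file per chunk, so a directory-based import sees several documents instead of one.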
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported