Wikiproject | Scripts for parsing and analysing locations on Wikipedia | Wiki library
kandi X-RAY | Wikiproject Summary
Scripts for parsing and analysing locations on Wikipedia
Top functions reviewed by kandi - BETA
- Scans the given langxml file.
- Counts the number of places in the list.
- Handles the end of an element.
- Calculates the verbose relation.
- Compares two lists.
- Creates an author network.
- Loads the gazetteers list.
- Merges two files.
- Lists the differences between two dictionaries.
- Updates the dictionary with the template information.
Community Discussions
Trending Discussions on Wikiproject
QUESTION
I am trying to recreate this list:
https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States_by_GDP
with a Wikidata SPARQL query.
I can find states by population with this query
Additionally, the fields:
- population (P1082)
- GDP (P2131)
- And some extra ones, like unemployment (P1198)
are covered by WikiProject Economics, though only at the country level.
That said, seeing the "List of states and territories of the United States by GDP" article makes me think at least P2131 may be available at the state level.
I have tried the following query.
ANSWER
Answered 2021-May-18 at 13:14
Because of a Wikidata internal convention, I had to upload the GDP data to the items about the states' economies, which are linked through property P8744.
E.g., for the State of Maine you'll find the data in economy of Maine.
This is the correct query for obtaining what you want (test):
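The answerer's query text is not preserved on this page. As a rough sketch of the approach described (state item, then P8744 to its economy item, then P2131 on that item), a query could be built and sent from Python as shown below. The class item Q35657 (U.S. state), the function names, and the User-Agent string are assumptions made for illustration, not part of the original answer.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_state_gdp_query(state_class="Q35657"):
    """Build a SPARQL query that reads GDP from each state's economy item.

    state_class is an assumption: Q35657 is taken to be the Wikidata
    class for U.S. states. P8744 and P2131 come from the answer above.
    """
    return f"""
SELECT ?state ?stateLabel ?economy ?gdp WHERE {{
  ?state wdt:P31 wd:{state_class} ;
         wdt:P8744 ?economy .
  ?economy wdt:P2131 ?gdp .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?gdp)
""".strip()


def run_query(query):
    """Send the query to the endpoint and return the result bindings."""
    url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    # The User-Agent value is a placeholder; use your own contact details.
    request = Request(url, headers={"User-Agent": "example-script/0.1"})
    with urlopen(request, timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Calling `run_query(build_state_gdp_query())` needs network access. An economy item may carry several P2131 values for different years, so expect more than one row per state.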
QUESTION
For each concept in my dataset I have stored the corresponding Wikipedia categories. For example, consider the following 5 concepts and their corresponding Wikipedia categories.
- hypertriglyceridemia:
['Category:Lipid metabolism disorders', 'Category:Medical conditions related to obesity']
- enzyme inhibitor:
['Category:Enzyme inhibitors', 'Category:Medicinal chemistry', 'Category:Metabolism']
- bypass surgery:
['Category:Surgery stubs', 'Category:Surgical procedures and techniques']
- perth:
['Category:1829 establishments in Australia', 'Category:Australian capital cities', 'Category:Metropolitan areas of Australia', 'Category:Perth, Western Australia', 'Category:Populated places established in 1829']
- climate:
['Category:Climate', 'Category:Climatology', 'Category:Meteorological concepts']
As you can see, the first three concepts belong to the medical domain, whereas the remaining two are not medical terms.
More precisely, I want to divide my concepts into medical and non-medical. However, it is very difficult to divide the concepts using the categories alone. For example, even though the two concepts enzyme inhibitor and bypass surgery are in the medical domain, their categories are very different from each other.
Therefore, I would like to know if there is a way to obtain the parent category of the categories (for example, the categories of enzyme inhibitor and bypass surgery belong to the medical parent category).
I am currently using pymediawiki and pywikibot. However, I am not restricted to those two libraries and am happy to have solutions using other libraries as well.
EDIT
As suggested by @IlmariKaronen, I am also using the categories of categories, and the results I got are as follows (the small font near each category is the categories of that category).
However, I still could not find a way to use these category details to decide whether a given term is medical or non-medical.
Moreover, as pointed out by @IlmariKaronen, using WikiProject details could be useful. However, the Medicine WikiProject does not seem to cover all the medical terms, so we also need to check other WikiProjects.
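One possible way to use the categories-of-categories idea is to walk upward through parent categories for a bounded number of steps and check whether a chosen ancestor is reached. The sketch below assumes the parent links have already been collected into a dictionary; the function name, the depth limit, and the category names in the toy graph are all hypothetical and are not real Wikipedia data.

```python
from collections import deque


def reaches_ancestor(category, target, parents, max_depth=3):
    """Return True if `target` is reachable from `category` by following
    parent links at most `max_depth` steps (breadth-first search).

    `parents` maps a category title to a list of its parent categories.
    The `seen` set guards against cycles in the category graph.
    """
    seen = {category}
    queue = deque([(category, 0)])
    while queue:
        current, depth = queue.popleft()
        if current == target:
            return True
        if depth == max_depth:
            continue  # do not climb any further from this node
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return False


# Hypothetical toy graph, invented purely for illustration.
TOY_PARENTS = {
    "Category:Example leaf": ["Category:Example parent"],
    "Category:Example parent": ["Category:Example root"],
    "Category:Other leaf": ["Category:Other root"],
}
```

A small depth limit matters here because Wikipedia's category graph is loosely connected, so an unbounded walk tends to reach almost any ancestor eventually.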
EDIT:
My current code for extracting categories from Wikipedia concepts is as follows. This can be done using pywikibot or pymediawiki.
Using the library pymediawiki:
import mediawiki as pw
...
ANSWER
Answered 2019-Feb-16 at 00:56
The question is a little unclear to me; this does not seem like a straightforward problem to solve and may require an NLP model. Also, the words "concept" and "category" are used interchangeably. As I understand it, concepts such as enzyme inhibitor, bypass surgery and hypertriglyceridemia need to be grouped together as medical and the rest as non-medical. This problem will require more data than just the category names. A corpus is required to train an LDA model (for instance), where the entire text is fed to the algorithm and it returns the most likely topics for each of the concepts.
https://www.analyticsvidhya.com/blog/2018/10/stepwise-guide-topic-modeling-latent-semantic-analysis/
QUESTION
I recently found that Wikipedia has WikiProjects that are categorised by discipline (https://en.wikipedia.org/wiki/Category:WikiProjects_by_discipline). As shown in the link, it has 34 disciplines.
I would like to know if it is possible to get all the Wikipedia articles related to each of these disciplines.
For example, consider WikiProject Computer science. Is it possible to get all the computer science related Wikipedia articles using the WikiProject Computer science category? If so, are there any data dumps related to it, or is there any other way to obtain this data?
I am currently using Python (i.e. pywikibot and pymediawiki). However, I am happy to receive answers in other languages as well.
I am happy to provide more details if needed.
ANSWER
Answered 2019-Feb-17 at 09:43
You can use API:Categorymembers to get the list of subcategories and pages. Set the "cmtype" parameter to "subcat" to get subcategories, and "cmnamespace" to "0" to get articles.
You can also get the list from the database (category hierarchy information is in the categorylinks table and article information is in the page table).
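The API:Categorymembers approach described above could look roughly like this in Python, using only the standard library. The function names and the User-Agent string are assumptions for illustration; the parameter names (`list=categorymembers`, `cmtitle`, `cmtype`, `cmnamespace`, `cmlimit`) are those of the MediaWiki Action API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def categorymembers_params(category, subcats=False, limit=500):
    """Build query parameters for list=categorymembers.

    subcats=True asks for subcategories (cmtype=subcat);
    otherwise only main-namespace pages are requested (cmnamespace=0).
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": str(limit),
        "format": "json",
    }
    if subcats:
        params["cmtype"] = "subcat"
    else:
        params["cmnamespace"] = "0"
    return params


def fetch_category_members(category, subcats=False):
    """Return all member titles, following the API's continuation tokens."""
    params = categorymembers_params(category, subcats)
    titles = []
    while True:
        # The User-Agent value is a placeholder; use your own contact details.
        request = Request(API_URL + "?" + urlencode(params),
                          headers={"User-Agent": "example-script/0.1"})
        with urlopen(request, timeout=30) as response:
            data = json.load(response)
        titles.extend(m["title"] for m in data["query"]["categorymembers"])
        if "continue" not in data:
            return titles
        # Merge the continuation values into the next request.
        params.update(data["continue"])
```

For example, `fetch_category_members("Category:WikiProject Computer science", subcats=True)` would list that category's subcategories, assuming the category title is spelled that way on English Wikipedia. The call needs network access.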
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install Wikiproject
You can use Wikiproject like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.