by fxsjy Python Version: Current License: MIT
jieba ("to stutter"): Chinese word segmentation
QUESTION
Problem extracting NER subject + verb with spaCy and Matcher
Asked 2021-Apr-26 at 17:44
I work on an NLP project and I have to use spaCy and the spaCy Matcher to extract all named entities that are nsubj (subjects) together with the verb each relates to: the governing verb of my named-entity nsubj. Example:
Georges and his friends live in Mexico City
"Hello !", says Mary
I'll need to extract "Georges" and "live" in the first sentence and "Mary" and "says" in the second one, but I don't know how many words will sit between my named entity and the verb it relates to. So I decided to explore the spaCy Matcher more, and I'm struggling to write a Matcher pattern that extracts my two words. When the named-entity subject comes before the verb, I get good results, but I don't know how to write a pattern that matches a named-entity subject coming after the verb it correlates to. I could also, according to the guidelines, do this task with "regular spaCy", but I don't know how. The problem with the Matcher is that I can't control the type of dependency between the named entity and the verb, so I can't grab the right verb. I'm new to spaCy; I've always worked with NLTK or jieba (for Chinese). I don't even know how to tokenize a text into sentences with spaCy, but I chose to split the whole text into sentences to avoid bad matches across two sentences. Here is my code:
import spacy
from nltk import sent_tokenize
from spacy.matcher import Matcher

nlp = spacy.load('fr_core_news_md')
matcher = Matcher(nlp.vocab)

def get_entities_verbs():
    try:
        # subject before verb
        pattern_subj_verb = [
            {'ENT_TYPE': 'PER', 'DEP': 'nsubj'},
            {'POS': {'NOT_IN': ['VERB']}, 'DEP': {'NOT_IN': ['nsubj']}, 'OP': '*'},
            {'POS': 'VERB'},
        ]
        # subject after verb
        # this pattern is not good
        matcher.add('ent-verb', [pattern_subj_verb])
        for sent in sent_tokenize(open('Le_Ventre_de_Paris-short.txt').read()):
            sent = nlp(sent)
            matches = matcher(sent)
            for match_id, start, end in matches:
                span = sent[start:end]
                print(span)
    except Exception as error:
        print(error)

def main():
    get_entities_verbs()

if __name__ == '__main__':
    main()
Even though it's French, I can assure you that I get good results:
Florent regardait
Lacaille reparut
Florent baissait
Claude regardait
Florent resta
Florent, soulagé
Claude s’était arrêté
Claude en riait
Saget est matinale, dit
Florent allait
Murillo peignait
Florent accablé
Claude entra
Claude l’appelait
Florent regardait
Florent but son verre de punch ; il le sentit
Alexandre, dit
Florent levait
Claude était ravi
Claude et Florent revinrent
Claude, les mains dans les poches, sifflant
I have some wrong results, but 90% is good. I just need to grab the first and last word of each line to get my named-entity/verb pair. So my question is: how do I extract a named entity that is a subject together with the verb it correlates to, using the Matcher, or simply how do I do that with spaCy (without the Matcher)? There are too many factors to take into account. Do you have a method that gets the best results possible, even if 100% is not achievable? I need a pattern matching a governing VERB plus an NER subject that comes after it, starting from this pattern:
pattern = [
    {
        "RIGHT_ID": "person",
        "RIGHT_ATTRS": {"ENT_TYPE": "PERSON", "DEP": "nsubj"},
    },
    {
        "LEFT_ID": "person",
        "REL_OP": "<",
        "RIGHT_ID": "verb",
        "RIGHT_ATTRS": {"POS": "VERB"},
    }
]
All credit to polm23 for this pattern
ANSWER
Answered 2021-Apr-26 at 05:05
This is a perfect use case for the Dependency Matcher. It also makes things easier if you merge entities into single tokens before running it. This code should do what you need:
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")
# merge entities to simplify this
nlp.add_pipe("merge_entities")

pattern = [
    {
        "RIGHT_ID": "person",
        "RIGHT_ATTRS": {"ENT_TYPE": "PERSON", "DEP": "nsubj"},
    },
    {
        "LEFT_ID": "person",
        "REL_OP": "<",
        "RIGHT_ID": "verb",
        "RIGHT_ATTRS": {"POS": "VERB"},
    }
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("PERVERB", [pattern])

texts = [
    "John Smith and some other guy live there",
    '"Hello!", says Mary.',
]

for text in texts:
    doc = nlp(text)
    matches = matcher(doc)
    for match in matches:
        match_id, (start, end) = match
        # note order here is defined by the pattern, so the nsubj will be first
        print(doc[start], "::", doc[end])
    print()
Check out the docs for the DependencyMatcher.
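The question also asks how to do this with plain spaCy, without the Matcher. Here is a minimal sketch of that approach (my addition, not part of the accepted answer; it reuses the same merged-entity pipeline and the English model purely for illustration): since the dependency tree links each nsubj token directly to its head, this works whether the subject comes before or after the verb.
import spacy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("merge_entities")

doc = nlp('"Hello!", says Mary. John Smith and some other guy live there.')

for token in doc:
    # a PERSON entity acting as nominal subject; its syntactic head is the governing verb
    if token.ent_type_ == "PERSON" and token.dep_ == "nsubj" and token.head.pos_ == "VERB":
        print(token.text, "::", token.head.text)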
QUESTION
Docker on AWS Elastic Beanstalk: no error in local docker build, but spaCy NLP hangs forever on the server
Asked 2020-Jul-08 at 09:45
Like the title says: I have tested my Docker image on my local machine using docker build -t container-name and everything worked fine without any errors. Once I uploaded it to Elastic Beanstalk via the EB CLI, it fails. I have figured out that it fails at the one part where I run spaCy's Chinese NLP. Everything else works fine, but there seem to be no errors in the logs, nor anything unusual that I can tell, to help me understand how to debug this.
I have tried every possibility and looked through the web to no avail. There was one time when the full logs from EB showed 'memoryerror', which I cannot recreate under any circumstance, but that is all the clue I have. Here are the logs:
> ---------------------------------------- /var/log/eb-docker/containers/eb-current-app/eb-29f07434a6e4-stdouterr.log
> ----------------------------------------
> 172.31.43.156 - - [07/Jul/2020 17:24:29] "POST /food_autocomplete HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:30] "POST /food_autocomplete HTTP/1.1" 200 -
> 172.31.4.206 - - [07/Jul/2020 17:24:32] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:32] "GET / HTTP/1.1" 200 -
> 172.31.27.83 - - [07/Jul/2020 17:24:32] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:34] "POST /food_nutrient_modal HTTP/1.1" 200 -
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator CountVectorizer from version
> 0.21.2 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator TfidfTransformer from
> version 0.21.2 when using version 0.23.1. This might lead to breaking
> code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator LogisticRegression from
> version 0.21.2 when using version 0.23.1. This might lead to breaking
> code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator Pipeline from version 0.21.2
> when using version 0.23.1. This might lead to breaking code or invalid
> results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator CountVectorizer from version
> 0.21.2 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator TfidfTransformer from
> version 0.21.2 when using version 0.23.1. This might lead to breaking
> code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator LogisticRegression from
> version 0.21.2 when using version 0.23.1. This might lead to breaking
> code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator Pipeline from version 0.21.2
> when using version 0.23.1. This might lead to breaking code or invalid
> results. Use at your own risk. warnings.warn(
> 172.31.43.156 - - [07/Jul/2020 17:24:36] "POST /smart_suggestions HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:38] "POST /food_nutrient_modal HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:39] "POST /addfood HTTP/1.1" 204 -
> 172.31.43.156 - - [07/Jul/2020 17:24:40] "POST /food_table_generate HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:42] "POST /personal_submit HTTP/1.1" 204 -
> /usr/local/lib/python3.8/site-packages/spacy/util.py:271: UserWarning:
> [W031] Model 'en_Reported_outcome_NLP' (0.0.0) requires spaCy v2.2 and
> is incompatible with the current spaCy version (2.3.0). This may lead
> to unexpected results or runtime errors. To resolve this, download a
> newer compatible model or retrain your custom model with the current
> spaCy version. For more details and available updates, run: python -m
> spacy validate warnings.warn(warn_msg)
> 172.31.4.206 - - [07/Jul/2020 17:24:47] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:47] "GET / HTTP/1.1" 200 -
> 172.31.27.83 - - [07/Jul/2020 17:24:47] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:47] "POST /food_nlp_onthefly HTTP/1.1" 200 - Building prefix dict from the default dictionary ...
> Dumping model to file cache /tmp/jieba.cache Loading model cost 0.990
> seconds. Prefix dict has been built successfully. * Serving Flask app
> "base" (lazy loading) * Environment: development * Debug mode: off
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator CountVectorizer from version
> 0.21.2 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator TfidfTransformer from
> version 0.21.2 when using version 0.23.1. This might lead to breaking
> code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/utils/deprecation.py:143:
> FutureWarning: The sklearn.linear_model.logistic module is deprecated
> in version 0.22 and will be removed in version 0.24. The corresponding
> classes / functions should instead be imported from
> sklearn.linear_model. Anything that cannot be imported from
> sklearn.linear_model is now part of the private API.
> warnings.warn(message, FutureWarning)
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator LogisticRegression from
> version 0.21.2 when using version 0.23.1. This might lead to breaking
> code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator Pipeline from version 0.21.2
> when using version 0.23.1. This might lead to breaking code or invalid
> results. Use at your own risk. warnings.warn( * Running on
> http://0.0.0.0:5000/ (Press CTRL+C to quit)
> 172.31.4.206 - - [07/Jul/2020 17:23:47] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:23:47] "GET / HTTP/1.1" 200 -
> 172.31.27.83 - - [07/Jul/2020 17:23:47] "GET / HTTP/1.1" 200 -
> 172.31.4.206 - - [07/Jul/2020 17:24:02] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:02] "GET / HTTP/1.1" 200 -
> 172.31.27.83 - - [07/Jul/2020 17:24:02] "GET / HTTP/1.1" 200 -
> 172.31.4.206 - - [07/Jul/2020 17:24:17] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:17] "GET / HTTP/1.1" 200 -
> 172.31.27.83 - - [07/Jul/2020 17:24:17] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:21] "GET /personal HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:21] "POST /usertimezone HTTP/1.1" 204 -
> 172.31.43.156 - - [07/Jul/2020 17:24:22] "GET /static/img/favicon.png HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:29] "POST /food_autocomplete HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:30] "POST /food_autocomplete HTTP/1.1" 200 -
> 172.31.4.206 - - [07/Jul/2020 17:24:32] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:32] "GET / HTTP/1.1" 200 -
> 172.31.27.83 - - [07/Jul/2020 17:24:32] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:34] "POST /food_nutrient_modal HTTP/1.1" 200 -
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator CountVectorizer from version
> 0.21.2 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator TfidfTransformer from
> version 0.21.2 when using version 0.23.1. This might lead to breaking
> code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator LogisticRegression from
> version 0.21.2 when using version 0.23.1. This might lead to breaking
> code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator Pipeline from version 0.21.2
> when using version 0.23.1. This might lead to breaking code or invalid
> results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator CountVectorizer from version
> 0.21.2 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator TfidfTransformer from
> version 0.21.2 when using version 0.23.1. This might lead to breaking
> code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator LogisticRegression from
> version 0.21.2 when using version 0.23.1. This might lead to breaking
> code or invalid results. Use at your own risk. warnings.warn(
> /usr/local/lib/python3.8/site-packages/sklearn/base.py:329:
> UserWarning: Trying to unpickle estimator Pipeline from version 0.21.2
> when using version 0.23.1. This might lead to breaking code or invalid
> results. Use at your own risk. warnings.warn(
> 172.31.43.156 - - [07/Jul/2020 17:24:36] "POST /smart_suggestions HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:38] "POST /food_nutrient_modal HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:39] "POST /addfood HTTP/1.1" 204 -
> 172.31.43.156 - - [07/Jul/2020 17:24:40] "POST /food_table_generate HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:42] "POST /personal_submit HTTP/1.1" 204 -
> /usr/local/lib/python3.8/site-packages/spacy/util.py:271: UserWarning:
> [W031] Model 'en_Reported_outcome_NLP' (0.0.0) requires spaCy v2.2 and
> is incompatible with the current spaCy version (2.3.0). This may lead
> to unexpected results or runtime errors. To resolve this, download a
> newer compatible model or retrain your custom model with the current
> spaCy version. For more details and available updates, run: python -m
> spacy validate warnings.warn(warn_msg)
> 172.31.4.206 - - [07/Jul/2020 17:24:47] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:47] "GET / HTTP/1.1" 200 -
> 172.31.27.83 - - [07/Jul/2020 17:24:47] "GET / HTTP/1.1" 200 -
> 172.31.43.156 - - [07/Jul/2020 17:24:47] "POST /food_nlp_onthefly HTTP/1.1" 200 - Building prefix dict from the default dictionary ...
> Dumping model to file cache /tmp/jieba.cache Loading model cost 0.990
> seconds. Prefix dict has been built successfully.
>
>
> ---------------------------------------- /var/log/docker-events.log
> ---------------------------------------- 2020-07-07T17:08:43.222801211Z image pull python:3.8.2-buster
> name=agitated_hugle) 2020-07-07T17:09:46.255907751Z container create
> 5e3269e6db83eb7da60b4f011e0038ee847a83a16200482885ccdfc1ee53f4b3
> (image=sha256:938ea8d987340203ade469ba4cbf79ea7af3626bc521c6e4667959b700877892,
> name=interesting_vaughan) 2020-07-07T17:09:46.256131915Z container
> attach
> 5e3269e6db83eb7da60b4f011e0038ee847a83a16200482885ccdfc1ee53f4b3
> (image=sha256:938ea8d987340203ade469ba4cbf79ea7af3626bc521c6e4667959b700877892,
> name=interesting_vaughan) 2020-07-07T17:09:46.295742501Z network
> connect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=5e3269e6db83eb7da60b4f011e0038ee847a83a16200482885ccdfc1ee53f4b3,
> name=bridge, type=bridge) 2020-07-07T17:09:46.734903605Z container
> start 5e3269e6db83eb7da60b4f011e0038ee847a83a16200482885ccdfc1ee53f4b3
> (image=sha256:938ea8d987340203ade469ba4cbf79ea7af3626bc521c6e4667959b700877892,
> name=interesting_vaughan) 2020-07-07T17:09:49.122315926Z container die
> 5e3269e6db83eb7da60b4f011e0038ee847a83a16200482885ccdfc1ee53f4b3
> (exitCode=0,
> image=sha256:938ea8d987340203ade469ba4cbf79ea7af3626bc521c6e4667959b700877892,
> name=interesting_vaughan) 2020-07-07T17:09:49.171805355Z network
> disconnect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=5e3269e6db83eb7da60b4f011e0038ee847a83a16200482885ccdfc1ee53f4b3,
> name=bridge, type=bridge) 2020-07-07T17:09:49.813444357Z container
> destroy
> 5e3269e6db83eb7da60b4f011e0038ee847a83a16200482885ccdfc1ee53f4b3
> (image=sha256:938ea8d987340203ade469ba4cbf79ea7af3626bc521c6e4667959b700877892,
> name=interesting_vaughan) 2020-07-07T17:09:49.831035003Z container
> create
> 55647e02051fdd853af507483370a89e29e916ef618c0f8dff83b88054d157ed
> (image=sha256:8c7a43e1019f42bc7153ec0b4d142740110250db72ae1475d44d3fa9d16e771d,
> name=friendly_goldstine) 2020-07-07T17:09:49.831391340Z container
> attach
> 55647e02051fdd853af507483370a89e29e916ef618c0f8dff83b88054d157ed
> (image=sha256:8c7a43e1019f42bc7153ec0b4d142740110250db72ae1475d44d3fa9d16e771d,
> name=friendly_goldstine) 2020-07-07T17:09:49.884733812Z network
> connect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=55647e02051fdd853af507483370a89e29e916ef618c0f8dff83b88054d157ed,
> name=bridge, type=bridge) 2020-07-07T17:09:50.290165549Z container
> start 55647e02051fdd853af507483370a89e29e916ef618c0f8dff83b88054d157ed
> (image=sha256:8c7a43e1019f42bc7153ec0b4d142740110250db72ae1475d44d3fa9d16e771d,
> name=friendly_goldstine) 2020-07-07T17:09:53.138031459Z container die
> 55647e02051fdd853af507483370a89e29e916ef618c0f8dff83b88054d157ed
> (exitCode=0,
> image=sha256:8c7a43e1019f42bc7153ec0b4d142740110250db72ae1475d44d3fa9d16e771d,
> name=friendly_goldstine) 2020-07-07T17:09:53.195435239Z network
> disconnect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=55647e02051fdd853af507483370a89e29e916ef618c0f8dff83b88054d157ed,
> name=bridge, type=bridge) 2020-07-07T17:09:53.294312475Z container
> destroy
> 55647e02051fdd853af507483370a89e29e916ef618c0f8dff83b88054d157ed
> (image=sha256:8c7a43e1019f42bc7153ec0b4d142740110250db72ae1475d44d3fa9d16e771d,
> name=friendly_goldstine) 2020-07-07T17:09:53.313349292Z container
> create
> c3c43c4db5b583beb1fbe2161bd73b7a6ea4411b58366f4f043db51b6e2412b0
> (image=sha256:f973e6783f1cb4b492fdf837be75718d5cd396707432a9dbebab795c7a8f1c29,
> name=crazy_cartwright) 2020-07-07T17:09:53.313571816Z container attach
> c3c43c4db5b583beb1fbe2161bd73b7a6ea4411b58366f4f043db51b6e2412b0
> (image=sha256:f973e6783f1cb4b492fdf837be75718d5cd396707432a9dbebab795c7a8f1c29,
> name=crazy_cartwright) 2020-07-07T17:09:53.362840973Z network connect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=c3c43c4db5b583beb1fbe2161bd73b7a6ea4411b58366f4f043db51b6e2412b0,
> name=bridge, type=bridge) 2020-07-07T17:09:53.774896296Z container
> start c3c43c4db5b583beb1fbe2161bd73b7a6ea4411b58366f4f043db51b6e2412b0
> (image=sha256:f973e6783f1cb4b492fdf837be75718d5cd396707432a9dbebab795c7a8f1c29,
> name=crazy_cartwright) 2020-07-07T17:10:30.678618014Z container die
> c3c43c4db5b583beb1fbe2161bd73b7a6ea4411b58366f4f043db51b6e2412b0
> (exitCode=0,
> image=sha256:f973e6783f1cb4b492fdf837be75718d5cd396707432a9dbebab795c7a8f1c29,
> name=crazy_cartwright) 2020-07-07T17:10:30.740972059Z network
> disconnect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=c3c43c4db5b583beb1fbe2161bd73b7a6ea4411b58366f4f043db51b6e2412b0,
> name=bridge, type=bridge) 2020-07-07T17:10:31.134171606Z container
> destroy
> c3c43c4db5b583beb1fbe2161bd73b7a6ea4411b58366f4f043db51b6e2412b0
> (image=sha256:f973e6783f1cb4b492fdf837be75718d5cd396707432a9dbebab795c7a8f1c29,
> name=crazy_cartwright) 2020-07-07T17:10:33.387455304Z container create
> d7445fa259c0281ef92909bae68e0ef640122c3c8aadc8523ccd0ab3168cb439
> (image=sha256:bbe93b4133f847fb8b7217d07fea76d7954db093dced90575f5904437b41bf3e,
> name=angry_hermann) 2020-07-07T17:10:33.476870016Z container destroy
> d7445fa259c0281ef92909bae68e0ef640122c3c8aadc8523ccd0ab3168cb439
> (image=sha256:bbe93b4133f847fb8b7217d07fea76d7954db093dced90575f5904437b41bf3e,
> name=angry_hermann) 2020-07-07T17:10:33.494062931Z container create
> bc23ff300c23cff0b00d12faab49a790a21b9e3bdc8879039afd236163de3cca
> (image=sha256:e4dac3e5b2864c76b1be5c0a9931a23c29d30896eb3deae344dce336ddbb11e5,
> name=admiring_driscoll) 2020-07-07T17:10:33.573500812Z container
> destroy
> bc23ff300c23cff0b00d12faab49a790a21b9e3bdc8879039afd236163de3cca
> (image=sha256:e4dac3e5b2864c76b1be5c0a9931a23c29d30896eb3deae344dce336ddbb11e5,
> name=admiring_driscoll) 2020-07-07T17:10:33.592409924Z image tag
> sha256:a01459d6ad267dabc254bb7b7cd75af0179b81cfdf6762d9bfa13691b2f2d536
> (name=aws_beanstalk/staging-app:latest) 2020-07-07T17:10:36.173605128Z
> container create
> 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> (image=a01459d6ad26, name=goofy_easley) 2020-07-07T17:10:36.223247839Z
> network connect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e,
> name=bridge, type=bridge) 2020-07-07T17:10:36.637664357Z container
> start 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> (image=a01459d6ad26, name=goofy_easley) 2020-07-07T17:10:41.964896576Z
> image tag
> sha256:a01459d6ad267dabc254bb7b7cd75af0179b81cfdf6762d9bfa13691b2f2d536
> (name=aws_beanstalk/current-app:latest) 2020-07-07T17:10:42.012166449Z
> image untag
> sha256:a01459d6ad267dabc254bb7b7cd75af0179b81cfdf6762d9bfa13691b2f2d536
> (name=sha256:a01459d6ad267dabc254bb7b7cd75af0179b81cfdf6762d9bfa13691b2f2d536)
> 2020-07-07T17:10:42.740687237Z image tag
> sha256:a01459d6ad267dabc254bb7b7cd75af0179b81cfdf6762d9bfa13691b2f2d536
> (name=aws_beanstalk/current-app:latest) 2020-07-07T17:10:42.836810391Z
> image tag
> sha256:4f7cd4269fa9900fe43f5c0db2267926ee972cac6cec74a92b9136e49f8b3489
> (name=python:3.8.2-buster) 2020-07-07T17:16:40.255804571Z container
> die 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> (exitCode=137, image=a01459d6ad26, name=goofy_easley)
> 2020-07-07T17:16:40.376674654Z network disconnect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e,
> name=bridge, type=bridge) 2020-07-07T17:16:43.031775619Z network
> connect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e,
> name=bridge, type=bridge) 2020-07-07T17:16:43.789437781Z container
> start 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> (image=a01459d6ad26, name=goofy_easley) 2020-07-07T17:17:07.998859712Z
> container die
> 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> (exitCode=137, image=a01459d6ad26, name=goofy_easley)
> 2020-07-07T17:17:08.142891259Z network disconnect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e,
> name=bridge, type=bridge) 2020-07-07T17:17:10.700159940Z network
> connect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e,
> name=bridge, type=bridge) 2020-07-07T17:17:11.346017877Z container
> start 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> (image=a01459d6ad26, name=goofy_easley) 2020-07-07T17:23:30.234526457Z
> image tag
> sha256:a01459d6ad267dabc254bb7b7cd75af0179b81cfdf6762d9bfa13691b2f2d536
> (name=aws_beanstalk/current-app:latest) 2020-07-07T17:23:30.281059141Z
> image tag
> sha256:4f7cd4269fa9900fe43f5c0db2267926ee972cac6cec74a92b9136e49f8b3489
> (name=python:3.8.2-buster) 2020-07-07T17:23:32.677604215Z image pull
> python:3.8.2-buster (name=python) 2020-07-07T17:23:33.765347130Z image
> tag
> sha256:a01459d6ad267dabc254bb7b7cd75af0179b81cfdf6762d9bfa13691b2f2d536
> (name=aws_beanstalk/staging-app:latest) 2020-07-07T17:23:34.638892876Z
> image tag
> sha256:a01459d6ad267dabc254bb7b7cd75af0179b81cfdf6762d9bfa13691b2f2d536
> (name=aws_beanstalk/staging-app:latest) 2020-07-07T17:23:34.721945923Z
> container create
> 29f07434a6e45cb922cdde01f615fadb13962602050e7330821ca7efe4234e97
> (image=a01459d6ad26, name=pensive_taussig)
> 2020-07-07T17:23:34.777140048Z network connect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=29f07434a6e45cb922cdde01f615fadb13962602050e7330821ca7efe4234e97,
> name=bridge, type=bridge) 2020-07-07T17:23:35.225632098Z container
> start 29f07434a6e45cb922cdde01f615fadb13962602050e7330821ca7efe4234e97
> (image=a01459d6ad26, name=pensive_taussig)
> 2020-07-07T17:23:40.730637840Z container kill
> 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> (image=a01459d6ad26, name=goofy_easley, signal=15)
> 2020-07-07T17:23:50.749346961Z container kill
> 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> (image=a01459d6ad26, name=goofy_easley, signal=9)
> 2020-07-07T17:23:51.007134732Z container die
> 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> (exitCode=137, image=a01459d6ad26, name=goofy_easley)
> 2020-07-07T17:23:51.064061323Z network disconnect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e,
> name=bridge, type=bridge) 2020-07-07T17:23:51.079435605Z container
> stop 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> (image=a01459d6ad26, name=goofy_easley) 2020-07-07T17:23:51.254045082Z
> container destroy
> 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> (image=a01459d6ad26, name=goofy_easley) 2020-07-07T17:23:51.303433852Z
> image tag
> sha256:a01459d6ad267dabc254bb7b7cd75af0179b81cfdf6762d9bfa13691b2f2d536
> (name=aws_beanstalk/current-app:latest) 2020-07-07T17:23:51.350189423Z
> image untag
> sha256:a01459d6ad267dabc254bb7b7cd75af0179b81cfdf6762d9bfa13691b2f2d536
> (name=sha256:a01459d6ad267dabc254bb7b7cd75af0179b81cfdf6762d9bfa13691b2f2d536)
> 2020-07-07T17:25:13.302167787Z container die
> 29f07434a6e45cb922cdde01f615fadb13962602050e7330821ca7efe4234e97
> (exitCode=137, image=a01459d6ad26, name=pensive_taussig)
> 2020-07-07T17:25:13.478841157Z network disconnect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=29f07434a6e45cb922cdde01f615fadb13962602050e7330821ca7efe4234e97,
> name=bridge, type=bridge) 2020-07-07T17:25:16.050466851Z network
> connect
> 0313ee36f6c330fb7f0435db1997443faa3aec1cf0aa895a68bce628065eece5
> (container=29f07434a6e45cb922cdde01f615fadb13962602050e7330821ca7efe4234e97,
> name=bridge, type=bridge) 2020-07-07T17:25:16.806548828Z container
> start 29f07434a6e45cb922cdde01f615fadb13962602050e7330821ca7efe4234e97
> (image=a01459d6ad26, name=pensive_taussig)
>
>
> ---------------------------------------- /var/log/docker
> ---------------------------------------- Jul 7 17:08:13 ip-172-31-2-67 docker: time="2020-07-07
> type="*events.TaskDelete" Jul 7 17:17:07 ip-172-31-2-67 docker:
> time="2020-07-07T17:17:07.984841127Z" level=info msg="ignoring event"
> module=libcontainerd namespace=moby topic=/tasks/delete
> type="*events.TaskDelete" Jul 7 17:23:50 ip-172-31-2-67 docker:
> time="2020-07-07T17:23:50.730877626Z" level=info msg="Container
> 65f83c0b3683725ef5c6e633ef3302f8ee853e4edd5478c900a2a27086421a7e
> failed to exit within 10 seconds of signal 15 - using the force" Jul
> 7 17:23:51 ip-172-31-2-67 docker:
> time="2020-07-07T17:23:51.007035781Z" level=info msg="ignoring event"
> module=libcontainerd namespace=moby topic=/tasks/delete
> type="*events.TaskDelete" Jul 7 17:25:13 ip-172-31-2-67 docker:
> time="2020-07-07T17:25:13.295457935Z" level=info msg="ignoring event"
> module=libcontainerd namespace=moby topic=/tasks/delete
> type="*events.TaskDelete"
>
>
>
> ---------------------------------------- /var/log/nginx/error.log
> ---------------------------------------- 2020/07/07 17:16:38 [error] 5596#0: *144 upstream prematurely closed connection while reading
> response header from upstream, client: 172.31.43.156, server: ,
> request: "POST /food_nlp_onthefly HTTP/1.1", upstream:
> "http://172.17.0.2:5000/food_nlp_onthefly", host:
> "healthbook.humango.co", referrer:
> "https://healthbook.humango.co/personal" 2020/07/07 17:16:39 [error]
> 5596#0: *131 upstream prematurely closed connection while reading
> response header from upstream, client: 172.31.43.156, server: ,
> request: "POST /food_nlp_onthefly HTTP/1.1", upstream:
> "http://172.17.0.2:5000/food_nlp_onthefly", host:
> "healthbook.humango.co", referrer:
> "https://healthbook.humango.co/personal" 2020/07/07 17:16:47 [error]
> 6480#0: *1 connect() failed (111: Connection refused) while connecting
> to upstream, client: 172.31.4.206, server: , request: "GET /
> HTTP/1.1", upstream: "http://172.17.0.2:5000/", host: "172.31.2.67"
> 2020/07/07 17:16:47 [error] 6480#0: *3 connect() failed (111:
> Connection refused) while connecting to upstream, client:
> 172.31.43.156, server: , request: "GET / HTTP/1.1", upstream: "http://172.17.0.2:5000/", host: "172.31.2.67" 2020/07/07 17:16:47
> [error] 6480#0: *5 connect() failed (111: Connection refused) while
> connecting to upstream, client: 172.31.27.83, server: , request: "GET
> / HTTP/1.1", upstream: "http://172.17.0.2:5000/", host: "172.31.2.67"
> 2020/07/07 17:17:07 [error] 6480#0: *7 upstream prematurely closed
> connection while reading response header from upstream, client:
> 172.31.43.156, server: , request: "POST /food_table_generate HTTP/1.1", upstream: "http://172.17.0.2:5000/food_table_generate",
> host: "healthbook.humango.co", referrer:
> "https://healthbook.humango.co/personal"
ANSWER
Answered 2020-Jul-08 at 09:45
Just for anyone who somehow has the same problem:
The issue for me was that it worked on my local machine but not on AWS EB, without any errors. The cause was the memory error mentioned above: I was using a free-tier instance, so my memory limit was 1 GB, and AWS EB crashes once you exceed that limit.
There are two ways to fix it that are quite obvious, but were not obvious to me in the first place:
I did the latter and the problem was solved.
Some useful commands to help you debug:
eb health
to check on the memory and CPU usage of your AWS EB environment
docker stats container-name
this lets you check the memory usage of your Docker container on your local machine; a small sketch for checking from inside the running app follows below. I hope this helps. I was quite hopeless since I couldn't find any clues (there was no error log).
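If you want to confirm from inside the container which step pushes memory past the 1 GB free-tier limit, a minimal sketch like this may help (my addition, not from the original answer; it assumes psutil is installed, and zh_core_web_sm is a stand-in for whatever pipeline the app actually loads):
import psutil

def rss_mb():
    # resident memory of the current process, in megabytes
    return psutil.Process().memory_info().rss / 1024 / 1024

print("baseline:", rss_mb())

import spacy  # imported here on purpose, to measure the cost of the import itself
print("after importing spacy:", rss_mb())

nlp = spacy.load("zh_core_web_sm")  # hypothetical model name; use the one your app loads
print("after loading the Chinese pipeline:", rss_mb())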
Cheers!
QUESTION
Chinese segmentation selection in model loading in Spacy 2.4 release
Asked 2020-Jun-30 at 09:17
For Chinese model loading, how can I load the full model while still being able to set the pkuseg and jieba settings?
from spacy.lang.zh import Chinese

nlp = Chinese()

# Disable jieba through tokenizer config options
cfg = {"use_jieba": False}
nlp = Chinese(meta={"tokenizer": {"config": cfg}})
The nlp created by Chinese() doesn't have any other components besides the segmentation model; it can only load the segmenter. If instead I do this to get the nlp object:
nlp = spacy.load('zh_core_web_sm')
This loads all the models. However, how can I control the pkuseg or jieba parameters in this load function?
ANSWER
Answered 2020-Jun-30 at 09:17
You don't want to modify the segmentation setup in the loaded model.
It's technically possible to switch the loaded model from pkuseg to jieba, but if you do that, the model components will perform terribly because they've only been trained on pkuseg segmentation.
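If the goal is just to influence pkuseg's behaviour rather than replace it, one adjustment that does not swap out the segmenter is adding entries to the pkuseg user dictionary. A minimal sketch (my addition, not part of the original answer; it assumes a spaCy v2.3-style Chinese pipeline whose tokenizer exposes pkuseg_update_user_dict, which is why the hasattr guard checks for it first):
import spacy

nlp = spacy.load("zh_core_web_sm")

# Teach pkuseg extra words without replacing the segmenter the model was trained with.
# The method name is an assumption about this spaCy version; verify with dir(nlp.tokenizer).
if hasattr(nlp.tokenizer, "pkuseg_update_user_dict"):
    nlp.tokenizer.pkuseg_update_user_dict(["自然语言处理"])

doc = nlp("我喜欢自然语言处理")
print([token.text for token in doc])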
QUESTION
How to implement parallel processing on a huge dataframe
Asked 2020-May-04 at 23:53
I have one huge dataframe, "all_in_one":
all_in_one.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8271066 entries, 0 to 8271065
Data columns (total 3 columns):
label int64
text object
type int64
dtypes: int64(2), object(1)
memory usage: 189.3+ MB
all_in_one.sample(2)
import jieba
import re

def jieba_cut(text):
    text_cut = list(filter(lambda x: re.match("\w", x), jieba.cut(text)))
    return text_cut
%%time
all_in_one['seg_text'] = all_in_one.apply(lambda x:jieba_cut(x['text']),axis = 1)
CPU times: user 1h 18min 14s, sys: 55.3 s, total: 1h 19min 10s
Wall time: 1h 19min 10s
This process consumed more than an hour in total. I want to execute the word segmentation on the dataframe in parallel and reduce the running time. Please share any suggestions.
EDIT:
Amazingly, this is what I got when I used dask to implement the function above:
all_in_one_dd = dd.from_pandas(all_in_one, npartitions=10)
%%time
all_in_one_dd.head()
CPU times: user 4min 10s, sys: 2.98 s, total: 4min 13s
Wall time: 4min 13s
ANSWER
Answered 2019-Jul-04 at 03:35
If you're working with pandas and want some form of parallel processing, I'd suggest you use dask. It's a Python package that has the same API as pandas dataframes, so in your example, if you have a CSV file called file.csv, you can do something like:
You'll have to do some setup for a dask Client and tell it how many workers you want and how many cores to use.
import re

import dask.dataframe as dd
from dask.distributed import Client
import jieba

def jieba_cut(text):
    text_cut = list(filter(lambda x: re.match("\w", x), jieba.cut(text)))
    return text_cut

client = Client()  # by default, it creates the same no. of workers as cores on your local machine
all_in_one = dd.read_csv('file.csv')  # This has almost the same kwargs as a pandas.read_csv
all_in_one = all_in_one.apply(jieba_cut)  # This will create a process map
all_in_one = all_in_one.compute()  # This will execute all the processes
A fun thing is that you can actually access a dashboard to see all the processes run by dask (I think by default it's at localhost:8787).
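If you'd rather stay in plain pandas, a rough multiprocessing alternative is sketched below (my addition, not from the original answer; it assumes the text lives in a column named 'text' and is read from the hypothetical file.csv). Each worker pays the one-off cost of loading jieba's dictionary, so the speed-up is most visible on large frames like the 8-million-row one in the question.
import multiprocessing as mp
import re

import jieba
import numpy as np
import pandas as pd

def jieba_cut(text):
    # keep only tokens that start with a word character
    return [tok for tok in jieba.cut(text) if re.match(r"\w", tok)]

def cut_chunk(chunk):
    # each worker process segments its own slice of the dataframe
    return chunk["text"].apply(jieba_cut)

if __name__ == "__main__":
    all_in_one = pd.read_csv("file.csv")
    chunks = np.array_split(all_in_one, mp.cpu_count())
    with mp.Pool(mp.cpu_count()) as pool:
        all_in_one["seg_text"] = pd.concat(pool.map(cut_chunk, chunks))
    print(all_in_one[["text", "seg_text"]].head())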
QUESTION
Why does a Python program give different results in a Linux shell and in a Jenkins job?
Asked 2020-Mar-09 at 16:43
I have a Python program which should run in a Jenkins job, but I got the error below:
Started by user admin
Building in workspace /var/lib/jenkins/workspace/automatic_test
[automatic_test] $ /bin/sh -xe /tmp/jenkins257763233971180370.sh
+ cd /ext/data/nlu_test/ifly/job123
+ python3 bleu.py Zh ref_zh.txt translation_zh.txt
Traceback (most recent call last):
File "bleu.py", line 5, in <module>
import jieba
ImportError: No module named 'jieba'
Build step 'Execute shell' marked build as failure
Finished: FAILURE
Yet when I run the same commands in a Linux shell, it runs normally, as below. Why?
[jenkins@localhost ~]$ cd /ext/data/nlu_test/ifly/job123
[jenkins@localhost job123]$ ls
bleu.py input.orig.txt input.trans.txt input.txt ref_zh.txt splitText.py translation_zh.txt
[jenkins@localhost job123]$ python3 bleu.py Zh ref_zh.txt translation_zh.txt
W0310 00:06:37.430938 295363 init.cc:157] AVX is available, Please re-compile on local machine
Paddle enabled successfully......
reference 1: 3615
candidate: 3493
score: 31.8288254543782
[jenkins@localhost job123]$
And I have already installed the Python package jieba, as shown below.
[root@localhost ~]# pip3 install jieba
Requirement already satisfied: jieba in /usr/local/lib/python3.7/site-packages (0.42.1)
[root@localhost ~]# pip install jieba
Requirement already satisfied: jieba in /usr/local/lib/python3.7/site-packages (0.42.1)
ANSWER
Answered 2020-Mar-09 at 16:39
You can use the absolute path of the Python interpreter to execute the script in Jenkins.
Example: /usr/bin/python3 bleu.py Zh ref_zh.txt translation_zh.txt
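To see which interpreter (and which site-packages) the Jenkins "Execute shell" step actually resolves, a small diagnostic script can be dropped into the job. This is my addition rather than part of the original answer, and diagnose_env.py is a hypothetical file name:
# diagnose_env.py -- run from the Jenkins shell step as: python3 diagnose_env.py
import sys

print("interpreter:", sys.executable)  # which python3 the Jenkins job actually resolves
print("search path:", sys.path)

try:
    import jieba
    print("jieba found at:", jieba.__file__)
except ImportError:
    print("jieba is NOT importable from this interpreter")
If the printed interpreter is not the one whose site-packages contains jieba (/usr/local/lib/python3.7/site-packages in the question), that mismatch explains the ImportError.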
QUESTION
Calculate the tangent for each point of a curve in Python with matplotlib
Asked 2019-Oct-23 at 04:28
I made a curve from a series of points. I want to calculate the gradient of the jieba curve.
import numpy as np
import matplotlib.pyplot as plt

plt.loglog(jieba_ranks, jieba_counts, linestyle='-', label='jieba')
plt.loglog([1, jieba_counts[0]], [jieba_counts[0], 1], color='r', linestyle='--', label='zipf\'s law a =1')
plt.legend(loc="best")
plt.title("Zipf plot for jieba")
plt.xlabel("Frequency rank of token")
plt.ylabel("Absolute frequency of token")
plt.grid(True, axis="y", ls="-", which="both")

slope_Y = np.gradient(np.log(jieba_counts), np.log(jieba_ranks))
fig1, ax1 = plt.subplots()
ax1.plot(np.log(jieba_ranks), slope_Y)
But the resulting gradient curve doesn't describe the relationship between the Zipf line and the jieba curve. Maybe I need to calculate the distance of each point between the Zipf line and the jieba curve.
ANSWER
Answered 2019-Oct-22 at 14:02
numpy makes gradient available; this function would probably be useful for solving your problem.
If you add data/code to the question I can try and suggest something more sensible!
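For completeness, here is a tiny self-contained example of np.gradient on ideal Zipf-like data (my addition; ranks and counts are synthetic stand-ins for jieba_ranks and jieba_counts). Comparing the resulting slope against the constant -1 line is one way to quantify how far the jieba distribution deviates from Zipf's law.
import numpy as np

ranks = np.arange(1, 1001)
counts = 3000.0 / ranks  # an ideal Zipf curve with exponent 1

# slope of the curve in log-log space; for ideal Zipf data this is close to -1 everywhere
slope = np.gradient(np.log(counts), np.log(ranks))
print(slope[:5])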
QUESTION
skmultilearn classifier predictions always return 0
Asked 2019-Oct-16 at 08:41
I'm pretty new to skmultilearn, and I am now using it for multi-label classification of Chinese documents. The training dataset is quite small (around 200 sentences), and I set 6 classes in total. Even when I use a sentence that is IN the training dataset, I only get [0,0,0,0,0,0] as the prediction result. Can I get some help with this? Thanks!
My code:
# Import BinaryRelevance from skmultilearn
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.svm import SVC
from scipy import sparse
import jieba
import codecs
import numpy as np
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)

Q_list = []
L_list = []

# Read Sentence file
with codecs.open('multi-label-Q.txt', encoding='utf-8') as infile:
    for line in infile:
        Q_list.append(line[:-2])
    infile.close()

# Read Label file
with open('multi-label-L.txt') as infile:
    for line in infile:
        tmp_l = line[:-1].split(',')
        L_list.append(tmp_l)
    infile.close()

L_list = np.array(L_list)
L_Question_list = []

# Preprocess for Chinese sentences
for line in Q_list:
    seg_list = jieba.lcut(line, cut_all=False)
    q_addSpace = ''
    for w in seg_list:
        q_addSpace = q_addSpace + w + ' '
    L_Question_list.append(q_addSpace[:-1])

cv = CountVectorizer()
cv_fit = cv.fit_transform(L_Question_list)
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(cv_fit)

M = sparse.lil_matrix((len(L_list), 6), dtype=int)
for i, row in enumerate(L_list):
    count = 0
    for col in row:
        M[i, count] = col
        count += 1

# Setup the classifier
clf = BinaryRelevance(classifier=SVC())
# Train
clf.fit(tfidf, M)

# A sentence in train dataset
x_test = '偏头痛多发于什么年龄层?'
# Preprocess for Chinese sentence
seg_list = jieba.lcut(x_test, cut_all=False)
q_addSpace = ''
for w in seg_list:
    q_addSpace = q_addSpace + w + ' '
X_test = [q_addSpace]
cv_fit2 = cv.transform(X_test)
tfidf2 = transformer.transform(cv_fit2)
# Predict
pred = clf.predict(tfidf2)
print(pred.todense())
ANSWER
Answered 2019-Oct-16 at 08:41
Now I've got it: the reason is that I had too much single-labelled data.
I used some higher-value data and got the correct result.
So, the answer is: polish the dataset.
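A quick way to check whether the label matrix really is dominated by single-labelled (or all-zero) rows is to inspect the M built in the question (my addition, just a diagnostic sketch). If almost every row carries at most one positive label, BinaryRelevance with an SVC tends to learn the majority (negative) class for each label, which produces exactly the all-zero predictions seen above.
import numpy as np

M_dense = np.asarray(M.todense())
print("positives per class:", M_dense.sum(axis=0))             # how often each of the 6 labels occurs
print("labels per sample:", np.bincount(M_dense.sum(axis=1)))  # how many samples carry 0, 1, 2, ... labels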
QUESTION
Python 3 cannot find a module
Asked 2019-Sep-11 at 11:36
I am unable to install a module called 'jieba' in Python 3 which is running in Jupyter Notebook 6.0.0. I keep getting ModuleNotFoundError: No module named 'jieba' after trying these methods:
1. import jieba
2. pip3 install jieba
Can anyone help? Thank you.
ANSWER
Answered 2019-Sep-11 at 11:36
pip3 in the terminal is almost certainly installing your package into a different Python installation.
Rather than have you hunt around for the right installation, you can use Python and Jupyter themselves to ensure you are using the correct Python binary. This relies on three tricks:
You can execute the pip command-line tool as a module by running python -m pip .... This uses the pip module installed for the python command, so you don't have to verify what Python installation the pip3 command is tied to.
You can get the path of the current Python interpreter with the sys.executable attribute.
You can execute shell commands from Jupyter notebooks by prefixing the shell command with !, and you can insert values generated with Python code with {expression}.
You can combine these to run pip (to install packages or run other commands) against the current Python installation, from Jupyter itself, by adding this into a cell:
import sys
!{sys.executable} -m pip <pip command line options>
To install your jieba package, that makes:
import sys
!{sys.executable} -m pip install jieba
If you are using Anaconda, then you could also install the conda package for jieba; the package does not require any platform-specific dependencies or compilation, but it may be more convenient for you or necessary to install other packages that do have such requirements and have pre-compiled conda packages.
In that case, tell the conda command about your Python executable:
import sys
!conda install --yes --prefix {sys.prefix} <package name>
QUESTION
ModuleNotFoundError: No module named 'jieba'
Asked 2019-Aug-31 at 03:53
When I run my code in PyCharm, it works well. However, when I use "python [my_code_file_name].py" to run the code in the Windows shell, the system says that no module was found. Could anyone help me solve this? Thanks.
The project interpreter path is:
C:\Users\Administrator\AppData\Local\Programs\Python\Python37-32\python.exe
When I searched for solutions, I tried adding this to my code:
import sys
sys.path.append("C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python37-32\\python.exe")
but it is still not working.
Besides, my code runs well in PyCharm, but when I run "python [my_code_file_name].py" in the Windows shell, it shows the message below:
Traceback (most recent call last):
File "main.py", line 4, in <module>
import jieba.analyse
ModuleNotFoundError: No module named 'jieba'
And my project does not use a virtual environment; the interpreter is the distribution I downloaded from the official Python website and installed manually on my computer.
ANSWER
Answered 2019-Aug-31 at 03:45
Are you using the same Python that's being used by your project interpreter? Try:
C:\Users\Administrator\AppData\Local\Programs\Python\Python37-32\python.exe [my_code_file_name].py
Or you could modify your system variables to prefer executables in C:\Users\Administrator\AppData\Local\Programs\Python\Python37-32, so that when your system searches for python it will hit that directory first. Is that a virtual environment? There may be a way to activate that environment so that you don't have to type the full path of the executable or modify system variables yourself.
Your attempt to modify sys.path isn't correct; sys.path entries should point to a directory containing Python modules, not to a Python executable. You should have done this:
sys.path.append("C:\\path\\to\\the\\module\\that\\you\\want")
where C:\path\to\the\module\that\you\want\jieba would be the directory containing the jieba module (i.e. you point your path to the parent directory). Your custom Python path suggests you may have several versions of Python installed on the system, so simply modifying your sys.path probably won't work if your default python is a different version from your project's python. I wouldn't recommend doing that. Ideally, activate the environment if possible, or invoke whichever executable you want directly.
QUESTION
POS tagging and NER for Chinese Text with Spacy
Asked 2019-Aug-14 at 02:20
I am getting empty tuples for the entities and no results for pos_.
from spacy.lang.zh import Chinese
nlp = Chinese()
doc = nlp(u"蘋果公司正考量用一億元買下英國的新創公司")
doc.ents
# returns (), i.e. empty tuple
for word in doc:
print(word.text, word.pos_)
''' returns
蘋果
公司
正
考量
用
一
億元
買
下
英國
的
新創
公司
'''
I am new to NLP. I want to know: what is the correct way to do this?
ANSWER
Answered 2019-Aug-12 at 07:05
Unfortunately, spaCy does not have a pretrained Chinese model yet (see here), which means you have to use the default Chinese() model, which only performs tokenization and no POS tagging or entity recognition.
There is definitely some work in progress around Chinese for spaCy though; check the issues here.
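As a side note beyond the original 2019 answer: more recent spaCy releases do ship pretrained Chinese pipelines, so if upgrading is an option, something along these lines gives both entities and POS tags (a minimal sketch, assuming the zh_core_web_sm package has been downloaded with python -m spacy download zh_core_web_sm; the model is trained on simplified Chinese, so results on traditional-character text may be rough):
import spacy

nlp = spacy.load("zh_core_web_sm")
doc = nlp("蘋果公司正考量用一億元買下英國的新創公司")

# named entities detected by the pretrained pipeline
for ent in doc.ents:
    print(ent.text, ent.label_)

# POS tag per token
for token in doc:
    print(token.text, token.pos_)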
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
SSH: git@github.com:fxsjy/jieba.git