kandi has reviewed universal-data-tool and identified the following as its top functions. The list is intended to give you a quick overview of the functionality universal-data-tool implements and to help you decide whether it suits your requirements.
Collaborate with others in real time, no sign up!
Usable on the web or as a Windows, Mac, or Linux desktop application
Configure your project with an easy-to-use GUI
Easily create courses to train your labelers
Download/upload as easy-to-use CSV (sample.udt.csv) or JSON (sample.udt.json)
Support for Images, Videos, PDFs, Text, Audio Transcription and many other formats
Can be easily integrated into a React application
Annotate images or videos with classifications, tags, bounding boxes, polygons and points
Fast Automatic Smart Pixel Segmentation using WebWorkers and WebAssembly
Import data from Google Drive, YouTube, CSV, Clipboard and more
Annotate NLP datasets with Named Entity Recognition (NER), classification and Part of Speech (PoS) tagging
Easily load into pandas or use with fast.ai
Runs with Docker: docker run -p 3000:3000 universaldatatool/universaldatatool
Runs with Singularity: singularity run universaldatatool/universaldatatool
QUESTION
How can I do this split process in Python?
Asked 2021-Dec-30 at 14:06
I'm trying to label the data in a table. The labeling has to work so that the position index restarts for each row, while each column gets its own Enum class.
So far I have only managed to produce this representation using the same Enum class for every column.
A solution that handles each column separately as a list would also be acceptable. What would be the best way to solve this?
import pandas as pd
from enum import Enum

df = pd.DataFrame({'first': ['product and other', 'product2 and other', 'price'],
                   'second': ['product and prices', 'price2', 'product3 and price']})
df

class Tipos(Enum):
    B = 1
    I = 2
    L = 3

for index, row in df.iterrows():
    sentencas = row.values
    for sentenca in sentencas:
        for pos, palavra in enumerate(sentenca.split()):
            print(f"{palavra} {Tipos(pos+1).name}")
Results:
                first              second
0   product and other  product and prices
1  product2 and other              price2
2               price  product3 and price
product B
and I
other L
product B
and I
prices L
product2 B
and I
other L
price2 B
price B
product3 B
and I
price L
Desired Results:
Word Ent
0 product B_first
1 and I_first
2 other L_first
3 product B_second
4 and I_second
5 prices L_second
6 product2 B_first
7 and I_first
8 other L_first
9 price2 B_second
10 price B_first
11 product3 B_second
12 and I_second
13 price L_second
# Within a cell the sequence is B_first, I_first, L_first, L_first, ...; when the column changes it becomes B_second, I_second, L_second, ...
ANSWER
Answered 2021-Dec-30 at 13:57
Instead of using Enum, you can use a dict mapping. You can avoid loops if you flatten your dataframe:
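# Assumed mapping (not shown in the answer as captured here): 0-based position -> tag
Tipos = {0: 'B', 1: 'I', 2: 'L'}
# Flatten to one word per row, ordered by original row and then by column
out = df.unstack().str.split().explode().sort_index(level=1).to_frame('Word')
# Number the words within each (column, row) cell, map to a tag, append the column name
out['Ent'] = out.groupby(level=[0, 1]).cumcount().map(Tipos) \
    + '_' + out.index.get_level_values(0)
out = out.reset_index(drop=True)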
Output:
>>> out
Word Ent
0 product B_first
1 and I_first
2 other L_first
3 product B_second
4 and I_second
5 prices L_second
6 product2 B_first
7 and I_first
8 other L_first
9 price2 B_second
10 price B_first
11 product3 B_second
12 and I_second
13 price L_second
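For comparison, the same table can also be built with explicit loops, which some readers may find easier to follow than the flattened version. This is only a minimal sketch, not part of the original answer: the names TAGS and label_words are hypothetical, and falling back to L for any word past the third is an assumption based on the comment in the question.

```python
import pandas as pd

# Hypothetical 0-based position -> tag mapping (mirrors B=1, I=2, L=3 from the question).
TAGS = {0: "B", 1: "I", 2: "L"}


def label_words(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per word, tagged with its position tag and source column."""
    rows = []
    for _, row in df.iterrows():
        # row.items() yields (column name, cell text) pairs in column order.
        for column, sentence in row.items():
            for pos, word in enumerate(sentence.split()):
                # Assumption: words beyond the mapped positions reuse the last tag, L.
                tag = TAGS.get(pos, "L")
                rows.append({"Word": word, "Ent": f"{tag}_{column}"})
    return pd.DataFrame(rows)


df = pd.DataFrame({
    "first": ["product and other", "product2 and other", "price"],
    "second": ["product and prices", "price2", "product3 and price"],
})
out = label_words(df)
print(out)
```

The loop version is slower on large frames than the flattened approach, but it makes the row-then-column ordering explicit.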
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
No vulnerabilities reported