Explore all Data Labeling open source software, libraries, packages, source code, cloud functions and APIs.

Popular New Releases in Data Labeling

    label-studio: Release 1.4.1
    cvat: v1.7.0
    universal-data-tool: v0.14.26
    semantic-segmentation-editor: 1.6.0
    scalabel: Pre-release for 0.3.0

Top Authors in Data Labeling

    1. 3 Libraries, 1435
    2. 3 Libraries, 8237
    3. 2 Libraries, 9
    4. 2 Libraries, 9
    5. 2 Libraries, 3369
    6. 2 Libraries, 46
    7. 2 Libraries, 5
    8. 2 Libraries, 50
    9. 2 Libraries, 19
    10. 2 Libraries, 5

Trending Kits in Data Labeling

11 best Go Data Labelling

Go is a general-purpose language developed by Google. It can be used to build server-side applications, APIs, and web services, and it is also used in machine learning and data science projects. This article lists a few of the best Golang data labelling libraries. The Go vector space models package is built on top of gonum and provides implementations of some commonly used natural language processing (NLP) algorithms, such as word2vec and doc2vec. With these libraries, you can convert texts into vectors that can then be used as features in classification and regression models, for example to solve text classification problems. A few of the most popular open source libraries for developers are:

    Parca: continuous profiling for analysis of CPU and memory usage over time, down to the line number, to save infrastructure cost, improve performance, and increase reliability.
    Etable: provides a DataTable / DataFrame structure in Go (golang), similar to pandas and xarray in Python and to Apache Arrow Table, using etensor n-dimensional columns aligned by a common outermost row dimension.

The following is a comprehensive list of the best open source libraries for Go data labelling:

Trending Discussions on Data Labeling

    How can I do this split process in Python?
    Replacing a character with a space and dividing the string into two words in R
    Azure ML FileDataset registers, but cannot be accessed for Data Labeling project

QUESTION

How can I do this split process in Python?

Asked 2021-Dec-30 at 14:06

I'm trying to label the data in a table. The word position should restart in every row, but each column should get its own Enum class.

So far I have only managed to build this representation with the same Enum class for every column.

A solution that handles each column separately as a list would also be acceptable. What would be the best way to do this?

import pandas as pd
from enum import Enum


df = pd.DataFrame({'first': ['product and other', 'product2 and other', 'price'], 'second':['product and prices', 'price2', 'product3 and price']})
df

class Tipos(Enum):
    B = 1
    I = 2
    L = 3

for index, row in df.iterrows():
    sentencas = row.values
    for sentenca in sentencas:
        for pos, palavra in enumerate(sentenca.split()):
            print(f"{palavra} {Tipos(pos+1).name}")

Results:

                first              second
0   product and other  product and prices
1  product2 and other              price2
2               price  product3 and price

product B
and I
other L
product B
and I
prices L
product2 B
and I
other L
price2 B
price B
product3 B
and I
price L

Desired Results:

        Word Ent
0    product B_first
1        and I_first
2      other L_first
3    product B_second
4        and I_second
5     prices L_second
6   product2 B_first
7        and I_first
8      other L_first
9     price2 B_second
10     price B_first
11  product3 B_second
12       and I_second
13     price L_second

# Here the sequence within a cell is B_first, I_first, L_first, L_first, ... and when the column changes the tags become B_second, I_second, L_second, ...

ANSWER

Answered 2021-Dec-30 at 13:57

Instead of using Enum you can use a dict mapping. You can avoid loops if you flatten your dataframe:

# Dict mapping (inferred from the output below): word position in the cell -> tag
Tipos = {0: 'B', 1: 'I', 2: 'L'}

out = df.unstack().str.split().explode().sort_index(level=1).to_frame('Word')
out['Ent'] = out.groupby(level=[0, 1]).cumcount().map(Tipos) \
                 + '_' + out.index.get_level_values(0)
out = out.reset_index(drop=True)

Output:

>>> out
        Word       Ent
0    product   B_first
1        and   I_first
2      other   L_first
3    product  B_second
4        and  I_second
5     prices  L_second
6   product2   B_first
7        and   I_first
8      other   L_first
9     price2  B_second
10     price   B_first
11  product3  B_second
12       and  I_second
13     price  L_second
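
The question also mentions that handling each column separately as a list would be acceptable. A minimal sketch of that variant follows, using only the standard library. The names `columns`, `TAGS`, and `label_words` are hypothetical, the dict of lists stands in for the question's DataFrame, and reusing "L" for any word beyond the third is an assumption based on the comment in the desired results.

```python
# Hypothetical stand-in for the question's DataFrame: column name -> list of cells.
columns = {
    "first": ["product and other", "product2 and other", "price"],
    "second": ["product and prices", "price2", "product3 and price"],
}

# Position of a word inside its cell -> tag prefix (letters from the question's Tipos enum).
TAGS = {0: "B", 1: "I", 2: "L"}


def label_words(columns):
    """Return (word, tag) pairs, walking row by row and, within a row, column by column."""
    n_rows = len(next(iter(columns.values())))
    labelled = []
    for row in range(n_rows):
        for name, cells in columns.items():
            for pos, word in enumerate(cells[row].split()):
                # Assumption: positions past the last known tag keep using "L".
                prefix = TAGS.get(pos, "L")
                labelled.append((word, f"{prefix}_{name}"))
    return labelled


pairs = label_words(columns)
for word, tag in pairs:
    print(word, tag)
```

The loop version is easier to adapt when each column needs its own tag set, while the flattened pandas version above avoids explicit Python loops.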

Source https://stackoverflow.com/questions/70532286

Community Discussions contain sources that include Stack Exchange Network

  • © 2022 Open Weaver Inc.