Fine-tune BERT to generate sentence embedding for cosine similarity
QUESTION
number of matches for keywords in specified categories
Asked 2022-Apr-14 at 13:32
For a large-scale text analysis problem, I have a data frame containing words that fall into different categories, and a data frame containing a column with strings and (empty) counting columns for each category. I now want to take each individual string, check which of the defined words appear, and count them within the appropriate category.
As a simplified example, given the two data frames below, I want to count how many of each animal type appear in the text cell.
df_texts <- tibble(
  text = c("the ape and the fox", "the tortoise and the hare", "the owl and the the
grasshopper"),
  mammals = NA,
  reptiles = NA,
  birds = NA,
  insects = NA
)
df_animals <- tibble(
  animals = c("ape", "fox", "tortoise", "hare", "owl", "grasshopper"),
  type = c("mammal", "mammal", "reptile", "mammal", "bird", "insect")
)
So my desired result would be:
df_result <- tibble(
  text = c("the ape and the fox", "the tortoise and the hare", "the owl and the the
grasshopper"),
  mammals = c(2, 1, 0),
  reptiles = c(0, 1, 0),
  birds = c(0, 0, 1),
  insects = c(0, 0, 1)
)
Is there a straightforward way to achieve this keyword-matching-and-counting that would be applicable to a much larger dataset?
Thanks in advance!
ANSWER
Answered 2022-Apr-14 at 13:32
Here's a way to do it in the tidyverse. First check whether the strings in df_texts$text contain each animal, then sum the matches by text and type.
library(tidyverse)

# One logical column per animal: does the text contain that word?
cbind(df_texts[, 1], sapply(df_animals$animals, grepl, df_texts$text)) %>%
  pivot_longer(-text, names_to = "animals") %>%
  left_join(df_animals, by = "animals") %>%  # attach each animal's type
  group_by(text, type) %>%
  summarise(sum = sum(value)) %>%            # matches per text and type
  pivot_wider(id_cols = text, names_from = type, values_from = sum)
  text                                  bird insect mammal reptile
  <chr>                                <int>  <int>  <int>   <int>
1 "the ape and the fox"                    0      0      2       0
2 "the owl and the the \n grasshopper"     1      0      0       0
3 "the tortoise and the hare"              0      0      1       1
grepl only reports whether a word is present. To count several occurrences of the same word within one text, use str_count instead:
# One count column per animal: how often does the word occur in the text?
cbind(df_texts[, 1], t(sapply(df_texts$text, str_count, df_animals$animals, USE.NAMES = FALSE))) %>%
  setNames(c("text", df_animals$animals)) %>%
  pivot_longer(-text, names_to = "animals") %>%
  left_join(df_animals, by = "animals") %>%  # attach each animal's type
  group_by(text, type) %>%
  summarise(sum = sum(value)) %>%            # occurrences per text and type
  pivot_wider(id_cols = text, names_from = type, values_from = sum)
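Since the question asks about much larger datasets, here is a minimal base R sketch of a token-lookup variant. It is not part of the original answer: the helper name count_by_type is hypothetical, and it assumes keywords are single lowercase words that should match whole tokens only (so "ape" would not match "grape", unlike grepl or str_count). It splits each text into words once and looks them up with match(), which likely avoids one pattern scan per keyword per text.

```r
# Hypothetical helper (not from the original answer): count keyword hits per
# category by tokenising each text once and looking the tokens up.
count_by_type <- function(texts, words, types) {
  # Split each text into lowercase word tokens on any run of non-letters
  tokens <- strsplit(tolower(texts), "[^a-z]+")
  lvls <- unique(types)
  counts <- vapply(tokens, function(tok) {
    # Map each token to its category; non-keywords become NA and are dropped
    hit <- types[match(tok, words)]
    as.integer(table(factor(hit, levels = lvls)))
  }, integer(length(lvls)))
  # One row per text, one column per category
  counts <- matrix(counts, nrow = length(texts), ncol = length(lvls), byrow = TRUE)
  colnames(counts) <- lvls
  data.frame(text = texts, counts, stringsAsFactors = FALSE)
}

# Example data mirroring the question (plain vectors instead of tibbles)
texts <- c("the ape and the fox", "the tortoise and the hare",
           "the owl and the the grasshopper")
animals <- c("ape", "fox", "tortoise", "hare", "owl", "grasshopper")
type <- c("mammal", "mammal", "reptile", "mammal", "bird", "insect")

result <- count_by_type(texts, animals, type)
print(result)
```

Whether whole-word matching is acceptable depends on the data: multi-word keywords or inflected forms would still need the pattern-based approach.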
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
No vulnerabilities reported