kandi has reviewed bert-cosine-sim and identified the functions below as its top functions. This is intended to give you quick insight into the functionality bert-cosine-sim implements and to help you decide whether it suits your requirements.
Fine-tune BERT to generate sentence embeddings for cosine similarity
Model and Fine-tuning
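import torch
import torch.nn as nn
from pytorch_pretrained_bert.modeling import BertModel, BertPreTrainedModel


class BertPairSim(BertPreTrainedModel):
    """BERT encoder plus a tanh projection head that produces sentence embeddings."""

    def __init__(self, config, emb_size=1024):
        super(BertPairSim, self).__init__(config)
        self.emb_size = emb_size
        self.bert = BertModel(config)
        # Project the pooled [CLS] representation to emb_size dimensions.
        self.emb = nn.Linear(config.hidden_size, emb_size)
        self.activation = nn.Tanh()
        # Row-wise cosine similarity between two batches of embeddings.
        self.cos_fn = torch.nn.CosineSimilarity(dim=1, eps=1e-6)
        self.apply(self.init_bert_weights)

    def calcSim(self, emb1, emb2):
        return self.cos_fn(emb1, emb2)

    def forward(self, input_ids, attention_mask):
        # token_type_ids=None: every token is treated as belonging to segment A.
        _, pooled_output = self.bert(input_ids, None, attention_mask,
                                     output_all_encoded_layers=False)
        emb = self.activation(self.emb(pooled_output))
        return emb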
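The forward pass takes BERT's pooled output (derived from the [CLS] token), maps it through a linear layer to emb_size dimensions, and applies tanh. calcSim then compares two such embeddings with cosine similarity.

The plain-Python sketch below mirrors those two steps for a single sentence pair, so the arithmetic is visible without PyTorch. Several parts are illustrative assumptions and do not come from the repository:
- the toy sizes, the random weights, and the pooled vectors;
- the mean-squared-error objective, which regresses the cosine score onto a gold similarity label (the snippet above does not show how training computes its loss);
- the eps handling, which clamps each norm separately and may differ slightly from the exact formula in a given PyTorch version.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def project_and_tanh(pooled: Sequence[float],
                     weight: Sequence[Sequence[float]],
                     bias: Sequence[float]) -> Vector:
    """Mirror of activation(emb(pooled_output)) for one example.

    weight has shape (emb_size, hidden_size), as in nn.Linear, so output
    component j is tanh(sum_i weight[j][i] * pooled[i] + bias[j]).
    """
    return [math.tanh(sum(w * x for w, x in zip(row, pooled)) + b)
            for row, b in zip(weight, bias)]


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-6) -> float:
    """Cosine similarity; each norm is clamped to eps to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = max(math.sqrt(sum(a * a for a in u)), eps)
    norm_v = max(math.sqrt(sum(b * b for b in v)), eps)
    return dot / (norm_u * norm_v)


def mse(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Assumed training objective: mean squared error against gold scores."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred)


if __name__ == "__main__":
    rng = random.Random(0)
    hidden_size, emb_size = 4, 3  # toy sizes; the snippet's default emb_size is 1024
    weight = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(emb_size)]
    bias = [0.0] * emb_size
    # Hypothetical pooled outputs for the two sentences of one pair.
    e1 = project_and_tanh([0.2, -0.1, 0.5, 0.3], weight, bias)
    e2 = project_and_tanh([0.1, -0.2, 0.4, 0.3], weight, bias)
    sim = cosine(e1, e2)
    print(f"cosine = {sim:.4f}, loss vs gold 1.0 = {mse([sim], [1.0]):.4f}")
```

Because tanh bounds every component to [-1, 1], the embeddings stay on a comparable scale, and cosine similarity ignores their overall length anyway.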
QUESTION
Number of matches for keywords in specified categories
Asked 2022-Apr-14 at 13:32
For a large-scale text analysis problem, I have a data frame containing words that fall into different categories, and a data frame containing a column with strings and (empty) counting columns for each category. I now want to take each individual string, check which of the defined words appear, and count them within the appropriate category.
As a simplified example, given the two data frames below, I want to count how many animals of each type appear in each text cell.
df_texts <- tibble(
text=c("the ape and the fox", "the tortoise and the hare", "the owl and the the
grasshopper"),
mammals=NA,
reptiles=NA,
birds=NA,
insects=NA
)
df_animals <- tibble(animals=c("ape", "fox", "tortoise", "hare", "owl", "grasshopper"),
type=c("mammal", "mammal", "reptile", "mammal", "bird", "insect"))
So my desired result would be:
df_result <- tibble(
text=c("the ape and the fox", "the tortoise and the hare", "the owl and the the
grasshopper"),
mammals=c(2,1,0),
reptiles=c(0,1,0),
birds=c(0,0,1),
insects=c(0,0,1)
)
Is there a straightforward way to achieve this keyword-matching-and-counting that would be applicable to a much larger dataset?
Thanks in advance!
ANSWER
Answered 2022-Apr-14 at 13:32
Here's a way to do it in the tidyverse. First check whether each string in df_texts$text contains each animal, then count the matches and sum them by text and type.
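library(tidyverse)

# One logical column per animal: does the animal occur in each text?
cbind(df_texts[, 1], sapply(df_animals$animals, grepl, df_texts$text)) %>%
  # Long format: one row per (text, animal) pair
  pivot_longer(-text, names_to = "animals") %>%
  # Attach each animal's type
  left_join(df_animals, by = "animals") %>%
  group_by(text, type) %>%
  # TRUE counts as 1, so this is the number of matching animals per type
  summarise(sum = sum(value)) %>%
  pivot_wider(id_cols = text, names_from = type, values_from = sum)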
text bird insect mammal reptile
<chr> <int> <int> <int> <int>
1 "the ape and the fox" 0 0 2 0
2 "the owl and the the \n grasshopper" 1 1 0 0
3 "the tortoise and the hare" 0 0 1 1
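For readers working in Python, here is a minimal sketch of the same presence-based idea. It assumes plain lists and dicts instead of data frames, and count_by_type_presence is a hypothetical helper name. Like grepl, each keyword contributes at most 1 per text. The sketch uses a literal substring test where grepl uses a regular expression, but the two behave the same for plain alphabetic words such as these. On the question's data it should reproduce the desired result.

```python
from typing import Dict, List


def count_by_type_presence(texts: List[str],
                           keyword_type: Dict[str, str]) -> List[Dict[str, int]]:
    """For each text, count how many distinct keywords of each type occur in it."""
    types = sorted(set(keyword_type.values()))
    rows = []
    for text in texts:
        counts = dict.fromkeys(types, 0)
        for word, kind in keyword_type.items():
            # Like grepl(): a substring test, so each keyword adds at most 1.
            if word in text:
                counts[kind] += 1
        rows.append(counts)
    return rows


texts = ["the ape and the fox", "the tortoise and the hare",
         "the owl and the the\n grasshopper"]
animals = {"ape": "mammal", "fox": "mammal", "tortoise": "reptile",
           "hare": "mammal", "owl": "bird", "grasshopper": "insect"}
for text, counts in zip(texts, count_by_type_presence(texts, animals)):
    print(repr(text), counts)
```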
To count multiple occurrences of the same word within a text, use stringr's str_count instead of grepl:
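# str_count() returns the number of matches per text, not just presence
cbind(df_texts[, 1], t(sapply(df_texts$text, str_count, df_animals$animals, USE.NAMES = FALSE))) %>%
  # The transposed matrix has no column names, so name them after the animals
  setNames(c("text", df_animals$animals)) %>%
  pivot_longer(-text, names_to = "animals") %>%
  left_join(df_animals, by = "animals") %>%
  group_by(text, type) %>%
  summarise(sum = sum(value)) %>%
  pivot_wider(id_cols = text, names_from = type, values_from = sum)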
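A Python sketch of the str_count variant, again assuming lists and dicts rather than data frames. Both count_by_type and the whole_words flag are hypothetical additions that do not appear in the R answer.

The sketch counts every non-overlapping match of each keyword and sums the counts by type. Two differences from str_count are worth noting:
- str_count treats the pattern as a regular expression, while this sketch escapes each keyword so it matches literally. For plain words the results are identical.
- Substring matching also counts "ape" inside "grape". The optional \b word-boundary anchors avoid this.

```python
import re
from typing import Dict, List


def count_by_type(texts: List[str], keyword_type: Dict[str, str],
                  whole_words: bool = False) -> List[Dict[str, int]]:
    """For each text, count every occurrence of each keyword and sum by type."""
    types = sorted(set(keyword_type.values()))
    patterns = {}
    for word in keyword_type:
        body = re.escape(word)  # match the keyword literally
        # Optional \b anchors stop "ape" from matching inside "grapes".
        patterns[word] = re.compile(rf"\b{body}\b" if whole_words else body)
    rows = []
    for text in texts:
        counts = dict.fromkeys(types, 0)
        for word, kind in keyword_type.items():
            # Like str_count(): number of non-overlapping matches in the text.
            counts[kind] += len(patterns[word].findall(text))
        rows.append(counts)
    return rows


texts = ["the ape and the ape", "grapes and a fox"]
animals = {"ape": "mammal", "fox": "mammal"}
print(count_by_type(texts, animals))                    # substring matches
print(count_by_type(texts, animals, whole_words=True))  # whole words only
```

This scans each text once per keyword. For very large keyword lists, tokenizing each text once and looking the tokens up in a dict may scale better, though that only handles single-word keywords.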
Community Discussions and Code Snippets include content sourced from the Stack Exchange Network.
No vulnerabilities reported