kandi has reviewed bert-cosine-sim and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality bert-cosine-sim implements, and help you decide whether it suits your requirements.
Fine-tune BERT to generate sentence embeddings for cosine similarity
Model and Fine-tuning
```python
class BertPairSim(BertPreTrainedModel):
    def __init__(self, config, emb_size=1024):
        super(BertPairSim, self).__init__(config)
        self.emb_size = emb_size
        self.bert = BertModel(config)
        # Project the pooled BERT output down to the target embedding size.
        self.emb = nn.Linear(config.hidden_size, emb_size)
        self.activation = nn.Tanh()
        self.cos_fn = torch.nn.CosineSimilarity(dim=1, eps=1e-6)
        self.apply(self.init_bert_weights)

    def calcSim(self, emb1, emb2):
        return self.cos_fn(emb1, emb2)

    def forward(self, input_ids, attention_mask):
        # pooled_output is the [CLS] representation after BERT's pooler layer.
        _, pooled_output = self.bert(input_ids, None, attention_mask,
                                     output_all_encoded_layers=False)
        emb = self.activation(self.emb(pooled_output))
        return emb
```
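The forward pass returns only the sentence embedding; similarity between two embeddings is computed separately via calcSim. The cosine step itself is straightforward. As a minimal standalone sketch (plain Python, no torch; the function name is ours, and the eps floor mirrors what torch.nn.CosineSimilarity uses to avoid division by zero):

```python
import math

def cosine_similarity(a, b, eps=1e-6):
    """Cosine similarity of two equal-length vectors.

    Roughly mirrors torch.nn.CosineSimilarity, clamping each norm
    to at least `eps` so zero vectors do not divide by zero.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (max(norm_a, eps) * max(norm_b, eps))

# Parallel vectors score ~1.0, orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # → 1.0 (up to float rounding)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

In the model above this is applied along dim=1, i.e. per row of a batch of embeddings; the scalar version here shows the same computation for a single pair.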
number of matches for keywords in specified categories
Asked 2022-Apr-14 at 13:32
For a large scale text analysis problem, I have a data frame containing words that fall into different categories, and a data frame containing a column with strings and (empty) counting columns for each category. I now want to take each individual string, check which of the defined words appear, and count them within the appropriate category.
As a simplified example, given the two data frames below, I want to count how many of each animal type appear in the text cell.
```r
df_texts <- tibble(
  text = c("the ape and the fox",
           "the tortoise and the hare",
           "the owl and the the grasshopper"),
  mammals = NA, reptiles = NA, birds = NA, insects = NA
)

df_animals <- tibble(
  animals = c("ape", "fox", "tortoise", "hare", "owl", "grasshopper"),
  type = c("mammal", "mammal", "reptile", "mammal", "bird", "insect")
)
```
So my desired result would be:
```r
df_result <- tibble(
  text = c("the ape and the fox",
           "the tortoise and the hare",
           "the owl and the the grasshopper"),
  mammals = c(2, 1, 0),
  reptiles = c(0, 1, 0),
  birds = c(0, 0, 1),
  insects = c(0, 0, 1)
)
```
Is there a straightforward way to achieve this keyword-matching-and-counting that would be applicable to a much larger dataset?
Thanks in advance!
ANSWER
Answered 2022-Apr-14 at 13:32
Here's a way to do it in the tidyverse. First check whether the strings in df_texts$text contain each animal, then count the matches and sum them by text and type.
```r
library(tidyverse)

cbind(df_texts[, 1], sapply(df_animals$animals, grepl, df_texts$text)) %>%
  pivot_longer(-text, names_to = "animals") %>%
  left_join(df_animals) %>%
  group_by(text, type) %>%
  summarise(sum = sum(value)) %>%
  pivot_wider(id_cols = text, names_from = type, values_from = sum)
```

```
  text                               bird insect mammal reptile
  <chr>                             <int>  <int>  <int>   <int>
1 the ape and the fox                   0      0      2       0
2 the owl and the the grasshopper       1      0      0       0
3 the tortoise and the hare             0      0      1       1
```
To account for multiple occurrences of a keyword within the same text, use str_count instead of grepl:
```r
cbind(df_texts[, 1],
      t(sapply(df_texts$text, str_count, df_animals$animals, USE.NAMES = F))) %>%
  setNames(c("text", df_animals$animals)) %>%
  pivot_longer(-text, names_to = "animals") %>%
  left_join(df_animals) %>%
  group_by(text, type) %>%
  summarise(sum = sum(value)) %>%
  pivot_wider(id_cols = text, names_from = type, values_from = sum)
```