proxyC | R package for large-scale similarity/distance computation | Machine Learning library
kandi X-RAY | proxyC Summary
proxyC computes proximity between rows or columns of large matrices efficiently in C++. It is optimized for large sparse matrices using the Armadillo and Intel TBB libraries. Among its built-in similarity/distance measures, correlation, cosine similarity and Euclidean distance are particularly fast to compute. The code was originally written for quanteda to compute similarity/distance between documents or features in large corpora, but was separated into a stand-alone package to make it available for broader data science purposes.
proxyC Examples and Code Snippets
install.packages("proxyC")
require(Matrix)
## Loading required package: Matrix
require(microbenchmark)
## Loading required package: microbenchmark
require(RcppParallel)
## Loading required package: RcppParallel
require(ggplot2)
## Loading required p
bm2 <- microbenchmark(
"proxyC all" = proxyC::simil(sm1k, margin = 2, method = "cosine"),
"proxyC min_simil" = proxyC::simil(sm1k, margin = 2, method = "cosine", min_simil = 0.9),
times = 10
)
autoplot(bm2)
## Coordinate system already
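The benchmark above compares a full cosine computation with one restricted by `min_simil = 0.9`. As a rough illustration of what such a threshold does, here is a minimal sketch in plain Python. It is not proxyC's C++ implementation; the function name `cosine_columns` and the dict-of-columns layout are assumptions made only for this example.

```python
import math

def cosine_columns(columns, min_simil=None):
    """Cosine similarity between every pair of sparse columns.

    `columns` maps a column name to a dict {row_index: value} that stores
    only the non-zero entries. Pairs whose similarity falls below
    `min_simil` are not recorded (hypothetical analogue of the threshold).
    """
    # Euclidean norm of each column, computed once.
    norms = {name: math.sqrt(sum(v * v for v in col.values()))
             for name, col in columns.items()}
    names = list(columns)
    result = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            col_a, col_b = columns[a], columns[b]
            # Iterate over the shorter column; only shared rows contribute.
            if len(col_a) > len(col_b):
                col_a, col_b = col_b, col_a
            dot = sum(v * col_b.get(r, 0.0) for r, v in col_a.items())
            denom = norms[a] * norms[b]
            sim = dot / denom if denom else 0.0
            if min_simil is None or sim >= min_simil:
                result[(a, b)] = sim
    return result

# Toy data: "x" and "y" are parallel, "z" shares no rows with either.
cols = {
    "x": {0: 1.0, 1: 2.0},
    "y": {0: 2.0, 1: 4.0},
    "z": {2: 3.0},
}
all_pairs = cosine_columns(cols)               # every pair is recorded
kept = cosine_columns(cols, min_simil=0.9)     # only the ("x", "y") pair
```

With a threshold, fewer pairs have to be stored, so the result stays sparse. Whether that also reduces run time depends on the implementation.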
bm1 <- microbenchmark(
"proxy 1k" = proxy::simil(dm1k, method = "cosine"),
"proxyC 1k" = proxyC::simil(sm1k, margin = 2, method = "cosine"),
"proxy 10k" = proxy::simil(dm10k, method = "cosine"),
"proxyC 10k" = proxyC::simil(sm10k,
Community Discussions
Trending Discussions on proxyC
QUESTION
I have been working with some R packages that calculate (cosine) (sparse) similarity matrices from sparse binary matrices, e.g. proxyC.
As I am now starting (and learning) to use Python as well, and I was told it might even be faster, I would like to try and run the same calculations there.
I found this interesting post:
What's the fastest way in Python to calculate cosine similarity given sparse matrix data?
which describes a few methods.
I did try some of them out after writing out a small test matrix myself by hand.
Now I would like to try on 'real' data.
And that's where I encounter a problem I currently cannot solve.
My data come in TSV files that associate objects (IDs) with comma-separated lists of features (FPs). For example:
...
ANSWER
Answered 2021-Apr-19 at 15:21
import pandas as pd
df = pd.DataFrame({'ID':[1,2,3], 'FP':["A,B,C","A,D","C,D,F"]})
>>> df
   ID     FP
0   1  A,B,C
1   2    A,D
2   3  C,D,F
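The rest of the answer is not included in this excerpt. As an illustration only, and not the original answer, here is a minimal sketch of how ID/feature records like those above could be turned into binary feature sets and compared by cosine similarity, using only the Python standard library. The names `records`, `features` and `binary_cosine` are assumptions made for this example.

```python
import math
from itertools import combinations

# Same toy data as the data frame above: (ID, comma-separated features).
records = [(1, "A,B,C"), (2, "A,D"), (3, "C,D,F")]

# Each ID maps to its set of features, i.e. the non-zero columns
# of that object's row in a binary matrix.
features = {obj_id: set(fp.split(",")) for obj_id, fp in records}

def binary_cosine(a, b):
    """Cosine similarity of two binary vectors given as feature sets."""
    if not a or not b:
        return 0.0
    # For 0/1 vectors the dot product is the size of the intersection
    # and each squared norm is the size of the set.
    return len(a & b) / math.sqrt(len(a) * len(b))

similarities = {
    (i, j): binary_cosine(features[i], features[j])
    for i, j in combinations(sorted(features), 2)
}

for pair, sim in similarities.items():
    print(pair, round(sim, 3))
```

For real data with many objects, a sparse-matrix library would likely be a better fit than this pairwise loop, which grows quadratically with the number of IDs.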
QUESTION
I have the following function
...
ANSWER
Answered 2017-Nov-12 at 19:49
A C function
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported