linalg | single header, public domain | Short vector math library
kandi X-RAY | linalg Summary
linalg.h is a single header, public domain, short vector math library for C++. It is inspired by the syntax of popular shading and compute languages and is intended to serve as a lightweight alternative to projects such as GLM, Boost.QVM or Eigen in domains such as computer graphics, computational geometry, and physical simulation.
Community Discussions
Trending Discussions on linalg
QUESTION
I just got my new MacBook Pro with the M1 Max chip and am setting up Python. I've tried several combinations of settings to test speed, and now I'm quite confused. First, my questions:
- Why does Python run natively on the M1 Max greatly (~100%) slower than on my old MacBook Pro 2016 with an Intel i5?
- On the M1 Max, why is there no significant speed difference between the native run (via miniforge) and the run via Rosetta (via anaconda), which is supposed to be ~20% slower?
- On the M1 Max with a native run, why is there no significant speed difference between conda-installed NumPy and TensorFlow-installed NumPy, which is supposed to be faster?
- On the M1 Max, why are runs in the PyCharm IDE consistently ~20% slower than runs from the terminal? This doesn't happen on my old Intel Mac.
Evidence supporting my questions is as follows:
Here are the settings I've tried:
1. Python installed by:
- Miniforge-arm64, so that Python runs natively on the M1 Max chip. (Checked in Activity Monitor: the Kind of the python process is Apple.)
- Anaconda, in which case Python runs via Rosetta. (Checked in Activity Monitor: the Kind of the python process is Intel.)
2. NumPy installed by:
- conda install numpy: NumPy from the original conda-forge channel, or pre-installed with Anaconda.
- Apple-TensorFlow: with Python installed by miniforge, I install TensorFlow directly, and NumPy is installed alongside it. NumPy installed this way is said to be optimized for Apple M1 and to be faster.
ANSWER
Answered 2021-Dec-06 at 05:53
Since the benchmark is running linear algebra routines, what is likely being tested here are the BLAS implementations. A default Anaconda distribution for the osx-64 platform comes with Intel's MKL implementation; the osx-arm64 platform only has the generic Netlib BLAS and OpenBLAS as implementation options.
For me (macOS with an Intel i9), I get the following benchmark results:

BLAS Implementation    Mean Timing (s)
mkl                    0.95932
blis                   1.72059
openblas               2.17023
netlib                 5.72782
So, I suspect the old MBP had MKL installed, and the M1 system is installing either Netlib or OpenBLAS. Maybe try figuring out whether Netlib or OpenBLAS is faster on the M1, and keep the faster one.
Specifying a BLAS Implementation
Here are the specific environments I tested:
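The environment specifications themselves are not reproduced here. As a quick, generic check (standard NumPy functionality, not part of the original answer), you can confirm which BLAS a given environment's NumPy build links against:

import numpy as np

# Prints the BLAS/LAPACK libraries this NumPy build was compiled against,
# e.g. MKL, OpenBLAS, or the reference Netlib implementation.
np.show_config()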
QUESTION
Suppose I have a list that stores many 2D points. Some positions in this list store the same point; treat the indices of two positions that store the same point as an index pair. I want to find all such pairs in the list and return them as index pairs. The list may contain points repeated more than twice, but only the first match needs to be treated as a pair.
For example, in the list below, I have 9 points in total and there are 5 positions containing repeated points. The indices 0, 3, and 7 store the same point ([1, 1]), and the indices 1 and 6 store the same point ([2, 3]).
ANSWER
Answered 2022-Mar-14 at 23:44
You can use a dictionary to keep track of the indices for each point. Then, you can iterate over the items in the dictionary, printing out the indices corresponding to points that appear more than once. The runtime of this procedure is linear, rather than quadratic, in the number of points in A:
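The answer's code is not included above; a minimal sketch of the described dictionary approach (the sample list A below is reconstructed from the question's description, not the asker's data) could look like this:

from collections import defaultdict

A = [[1, 1], [2, 3], [5, 4], [1, 1], [3, 4], [4, 2], [2, 3], [1, 1], [6, 6]]

# Map each point (as a hashable tuple) to every index where it occurs.
indices = defaultdict(list)
for i, point in enumerate(A):
    indices[tuple(point)].append(i)

# Keep only the first matching pair for points that appear more than once.
pairs = [idx[:2] for idx in indices.values() if len(idx) > 1]
print(pairs)  # [[0, 3], [1, 6]]

Each point is visited once, so the runtime is linear in len(A).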
QUESTION
I am having trouble when switching a model from some local dummy data to using a TF dataset.
Sorry for the long model code, I have tried to shorten it as much as possible.
The following works fine:
...
ANSWER
Answered 2022-Mar-10 at 08:57
You will have to explicitly set the shapes of the tensors coming from tf.py_function. Using None will allow variable input lengths. The BERT output dimension (384,) is, however, necessary:
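The answer's snippet itself is not shown; a minimal sketch of the idea (the embed stand-in below is an assumption, not the asker's model) might look like:

import tensorflow as tf

# Hypothetical stand-in for a BERT-style encoder returning a 384-dim vector.
def embed(text):
    return tf.random.uniform((384,))

def map_fn(text, label):
    emb = tf.py_function(embed, inp=[text], Tout=tf.float32)
    # tf.py_function outputs have unknown static shape, so set it explicitly.
    emb.set_shape((384,))  # the fixed BERT output dimension
    label.set_shape(())    # scalar label
    return emb, label

ds = tf.data.Dataset.from_tensor_slices((["a", "b"], [0, 1])).map(map_fn)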
QUESTION
a = np.array([0, 0, 1000])
b = np.array([[0.1, 0.5, 0.4], [0.2, 0, 0.8], [0.1, 0.2, 0.7]])
c = np.array([[0, 0.5, 0.5], [0.3, 0, 0.7], [0.1, 0.4, 0.5]])
d = np.array([[0.3, 0.5, 0.2], [0.4, 0.3, 0.3], [0.1, 0.3, 0.6]])
I have this vector (a) that I need to multiply by many 2D arrays (b, c, d). How can I write a loop that takes a, multiplies it by b, then uses the result to multiply by c, and then uses that result to multiply by d?
I am currently doing this:

result = np.dot(a, b)
result2 = np.dot(result, c)
result3 = np.dot(result2, d)

But I need to be able to loop through them all, as I have many vectors and matrices.
Edit: np.linalg.multi_dot works for this case, but I need to get the output after each array, like this:
...
ANSWER
Answered 2022-Mar-02 at 09:56
You can use the multi_dot function:
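The answer's snippet is not reproduced above. Since the asker also needs each intermediate product, a plain loop (a sketch, not the answerer's exact code) keeps all of them:

import numpy as np

a = np.array([0, 0, 1000])
b = np.array([[0.1, 0.5, 0.4], [0.2, 0, 0.8], [0.1, 0.2, 0.7]])
c = np.array([[0, 0.5, 0.5], [0.3, 0, 0.7], [0.1, 0.4, 0.5]])
d = np.array([[0.3, 0.5, 0.2], [0.4, 0.3, 0.3], [0.1, 0.3, 0.6]])

# Chain the products, recording the running result after each matrix.
results = []
current = a
for m in [b, c, d]:
    current = current @ m
    results.append(current)

# results[0], results[1], results[2] match result, result2, result3 above.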
QUESTION
I'm inverting very large matrices on an HPC system. Obviously, this has high RAM requirements. To avoid out-of-memory errors, as a temporary solution I've just been requesting a lot of memory (terabytes). How might I predict the required memory from the input matrix size for a matrix inversion using numpy.linalg.inv, so I can run HPC jobs more efficiently?
...
ANSWER
Answered 2022-Jan-17 at 21:04
TL;DR: up to O(32 n²) bytes for an (n, n) input matrix of type float64.

numpy.linalg.inv calls _umath_linalg.inv internally without performing any copy or creating any additional big temporary arrays. This internal function itself calls LAPACK functions. As far as I understand, NumPy's wrapping layer is responsible for allocating the output array, and the C code allocates one temporary array (see: here). No other array allocations appear to be performed by NumPy for this operation. There are several LAPACK implementations, so it is not possible to know in general how much memory a LAPACK call requests. However, AFAIK, almost all LAPACK implementations do not allocate behind your back: the caller has to do it (especially for sgesv/dgesv, which is used here). Assuming the (n, n) input matrix is of type float64 and FORTRAN integers are 4 bytes wide (which should be the case on most platforms, especially on Windows), the actual memory required (taken by the input matrix, the output matrix, and the temporaries together) is 8 n² + 8 n² + (8 n² + 8 n² + 4 n) bytes, which is equal to (32 n + 4) n bytes, or simply O(32 n²) bytes. Note that the temporary buffer is a maximum size and may not be fully written, which means the OS can physically map (i.e. reserve in physical RAM) only a fraction of the allocated space. This is what happens on my (Linux) machine with OpenBLAS: only 24 n² bytes appear to actually be physically mapped. For float32 matrices, it is half the space.
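As a small sketch of the answer's formula (the constants come directly from the accounting above):

def inv_memory_bytes(n, itemsize=8):
    # Upper bound for numpy.linalg.inv on an (n, n) matrix.
    # itemsize is 8 for float64, 4 for float32. Counts the input matrix,
    # the output matrix, LAPACK's temporary copies, and the pivot indices.
    input_m = itemsize * n * n
    output_m = itemsize * n * n
    temporaries = 2 * itemsize * n * n + 4 * n  # work arrays + int32 pivots
    return input_m + output_m + temporaries

print(inv_memory_bytes(100_000) / 1e9, "GB")  # ~320 GB for float64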
QUESTION
I have a set of data values for a scalar 3D function, arranged as inputs x, y, z in an array of shape (n, 3) and the function values f(x, y, z) in an array of shape (n,).
EDIT: For instance, consider the following simple function
...
ANSWER
Answered 2021-Nov-16 at 17:10
All you need is to reshape F[:, 3] (just the f(x, y, z) values) into a grid; it is hard to be more precise without sample data. If the data is not sorted, you need to sort it first:
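As an illustration (the grid, the sample function, and the (n, 4) layout of F below are assumptions for the sketch, since the asker's data isn't shown):

import numpy as np

# Sample f(x, y, z) = x + 10y + 100z on a regular grid, flatten it into
# (n, 4) rows of (x, y, z, f), shuffle, then recover the grid by sorting.
nx, ny, nz = 4, 5, 6
x, y, z = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz), indexing="ij")
f = x + 10 * y + 100 * z
F = np.column_stack([x.ravel(), y.ravel(), z.ravel(), f.ravel()]).astype(float)
np.random.shuffle(F)  # pretend the rows arrive unsorted

# Sort by x, then y, then z (np.lexsort uses its last key as the primary one).
order = np.lexsort((F[:, 2], F[:, 1], F[:, 0]))
f_grid = F[order][:, 3].reshape(nx, ny, nz)
assert np.array_equal(f_grid, f)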
QUESTION
I want to test and compare Numpy matrix multiplication and Eigen decomposition performance with Intel MKL and without Intel MKL.
I have installed MKL using pip install mkl (Windows 10 64-bit, Python 3.8).
I then used examples from here for matmul and eigen decompositions.
How do I now enable and disable MKL in order to check numpy performance with MKL and without it?
Reference code:
...ANSWER
Answered 2021-Nov-25 at 12:30
You can use different environments to compare NumPy with and without MKL. In each environment, install the needed packages (NumPy with MKL, or without) using the package installer. Then, in those environments, run your program to compare the performance.
NumPy doesn't depend on any other Python packages; however, it does depend on an accelerated linear algebra library, typically Intel MKL or OpenBLAS.
The NumPy wheels on PyPI, which is what pip installs, are built with OpenBLAS.
In the conda defaults channel, NumPy is built against Intel MKL. MKL is a separate package that will be installed in the users' environment when they install NumPy.
When a user installs NumPy from conda-forge, that BLAS package gets installed together with the actual library. Usually this is OpenBLAS, but it can also be MKL (from the defaults channel), or even BLIS or reference BLAS.
Please refer to this link to learn more about installing NumPy.
You can create two different environments to compare NumPy performance with MKL and without it. In the first environment, install the stand-alone NumPy (that is, NumPy without MKL), and in the second environment, install the one with MKL. To create an environment using NumPy without MKL:
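The environment-creation commands themselves are not reproduced above. Once both environments exist, a small timing script (the sizes below are arbitrary assumptions, not from the original answer) can be run in each to compare the two builds:

import time
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((2000, 2000))

t0 = time.perf_counter()
_ = a @ a  # matrix multiplication, dispatched to the installed BLAS
print("matmul:", time.perf_counter() - t0, "s")

t0 = time.perf_counter()
_ = np.linalg.eigh(a + a.T)  # symmetric eigendecomposition via LAPACK
print("eigh:", time.perf_counter() - t0, "s")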
QUESTION
I implemented a function (angle_between) to calculate the angle between two vectors. It makes use of needle-like triangles and is based on Miscalculating Area and Angles of a Needle-like Triangle and this related question.
The function appears to work fine most of the time, except for one weird case where I don't understand what is happening:
...
ANSWER
Answered 2021-Oct-05 at 20:34
"I just tried the case of setting vectorB as a multiple of vectorA and - interestingly - it sometimes produces nan, sometimes 0 and sometimes it fails and produces a small angle of magnitude 1e-8 ... any ideas why?"
Yea, and I think that's what your question boils down to. Here is the formula from the Berkeley paper due to Kahan that you've been using:

angle = 2 · atan( sqrt( (((a − b) + c) · μ) / ((a + (b + c)) · ((a − c) + b)) ) ), with μ = c − (a − b),

assuming that a ≥ b, a ≥ c (only then is the formula valid) and b + c ≈ a.
If we ignore μ for a second and look at everything else under the square root, it must all be positive, since a is the longest side. And μ is c − (a − b), which is 0 ± a small error. If that error is zero you get zero, which is, by the way, the correct result. If the error is negative the square root gives you nan, and if the error is positive you get a small angle.
Notice that the same argument works when b + c − a is nonzero but smaller than the error.
QUESTION
I'm doing some calculations in Julia and noticed they run significantly slower (about 25 times!) than the numpy counterpart.
Then I realized Julia is only using 8 threads out of the 96 CPU threads (48 physical cores) on my PC, while numpy seems to have no problem utilizing well over 70 threads.
Running Julia with the $ julia --threads 96 argument does not make any difference, even though julia> Threads.nthreads() returns 96.
Furthermore, a little disappointment from the result is that I suspect Julia, even using all 96 threads, still might not match numpy's speed.
Here is the Julia code. I simply measure the time with julia> @time calc_fid(mat_a, mat_b), which gave me 90 seconds on average.
ANSWER
Answered 2021-Jul-09 at 07:19
Your Julia example code isn't actually using Julia's multithreading capabilities.
In Julia, if you want multithreading, you generally need to enable it explicitly. Functions like mean or cov will not automatically be multithreaded just by virtue of starting Julia with multiple threads; if you want these functions to multithread, you will need to either write multithreaded versions of them, or else use a package that has already done so.
The only thing in your Julia example code that will multithread at all as currently written is the linear algebra, because that falls back to BLAS (as indeed numpy likely does), which has its own multithreading system entirely separate from Julia's (BLAS is written in Fortran). That is probably not using all 96 threads either, though, and in your case is almost certainly defaulting to 8. The number of threads used by BLAS in Julia can be checked with BLAS.get_num_threads() from the LinearAlgebra standard library.
QUESTION
I have several points on the unit sphere that are distributed according to the algorithm described in https://www.cmu.edu/biolphys/deserno/pdf/sphere_equi.pdf (and implemented in the code below). At each of these points, I have a value that, in my particular case, represents 1 minus a small error. The errors are in [0, 0.1], if this is important, so my values are in [0.9, 1].
Sadly, computing the errors is a costly process and I cannot do this for as many points as I want. Still, I want my plots to look like I am plotting something "continuous". So I want to fit an interpolation function to my data, to be able to sample as many points as I want.
After a little bit of research I found scipy.interpolate.SmoothSphereBivariateSpline which seems to do exactly what I want. But I cannot make it work properly.
Question: what can I use to interpolate (spline, linear interpolation, anything would be fine for the moment) my data on the unit sphere? An answer can be either "you misused scipy.interpolate, here is the correct way to do this" or "this other function is better suited to your problem".
Sample code that should be executable with numpy and scipy installed:
ANSWER
Answered 2021-Jul-02 at 16:35
You can use Cartesian coordinates instead of spherical coordinates. The default norm parameter ('euclidean') used by Rbf is sufficient.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.