num_cpus | Get the number of CPUs in Rust
kandi X-RAY | num_cpus Summary
Count the number of CPUs on the current machine.
num_cpus Key Features
num_cpus Examples and Code Snippets
def set_logical_cpu_devices(self, num_cpus, prefix=""):
    """Set virtual CPU devices in context.

    If virtual CPU devices are already configured at context initialization
    by tf.config.set_logical_device_configuration(), this method should not
    be called.
    """
Community Discussions
Trending Discussions on num_cpus
QUESTION
I have a large array arr1 of shape (k, n), where both k and n are of the order 1e7. Each row contains only a few hundred non-zero elements and is sparse.
For each row of arr1 I need to do an element-wise multiplication with arr2, of shape (1, n).
Currently I perform this multiplication using the multiply method of scipy.sparse.csc_matrix, and the multiplication is performed as part of a function that I am minimising, which means it is evaluated thousands of times and causes a large computational load. More importantly, I've found that this function runs on a single core.
I've instead tried to parallelise this calculation by splitting the array into sub-arrays along k. Much to my dismay, the parallelised version runs even slower. So far I've tried implementations in Dask, Ray, and multiprocessing. Below are the implementations I've been using, on a machine with ~500 GB RAM and 56 CPUs.
I don't understand why the parallel versions run so slowly. This is my first time parallelising my own code, so any assistance is greatly appreciated.
Setting up data (for reproducibility)...
ANSWER
Answered 2021-Nov-18 at 18:44
If I'm understanding your implementations correctly, you haven't actually partitioned the arrays in any of these cases. So all you've done is run the exact same workflow on a different thread, and the "parallel" execution time is the original runtime plus the overhead of setting up the distributed job scheduler and passing everything to the second thread.
If you want to see any total-time improvement, you'll have to actually rewrite your code to operate on subsets of the data.
In the Dask case, use dask.array.from_array to split the array into multiple chunks, then rewrite your workflow to use dask.array operations rather than numpy ones. Alternatively, partition the data yourself and run your function on subsets of the array using Dask distributed's client.map (see the quickstart).
None of these approaches will be easy, and each carries an overhead (in actual compute, network usage, and memory, as well as a real investment of your time), but if total runtime is important then it'll be worth it. See the Dask best practices documentation for more background.
Update:
After your iteration with dask.array, your implementation is now faster than the single-threaded wall time, and yes, the additional CPU time is overhead. For your first time trying this, getting it to be faster than numpy/scipy (which, by the way, are already heavily optimized and are likely parallelizing under the hood in the single-threaded approach) is a huge win, so pat yourself on the back. Getting this to be even faster is a legitimate challenge that is far outside the scope of this question. Welcome to parallelism!
Additional reading:
QUESTION
So I'm trying to get processor groupings by socket. I've got a POC in C++ that does it; the output is as follows.
...ANSWER
Answered 2021-Oct-23 at 20:10I only have a single core with 8 logical processors, so I can't verify I got everything right for your situation, but try this:
QUESTION
[Question posted by a user on YugabyteDB Community Slack]
I deployed a cluster of 3 masters + 3 tservers across 3 different regions, and I am trying to set a preferred region.
Even though each master is in a different region, I had to use https://docs.yugabyte.com/latest/admin/yb-admin/#modify-placement-info so that it appeared in the config. I then used https://docs.yugabyte.com/latest/admin/yb-admin/#set-preferred-zones to set the preferred zone.
But the tablets are not rebalancing the leaders. Is there anything else to do?
sample yb-tserver.conf
...ANSWER
Answered 2021-Aug-13 at 11:36
You also need the --placement_cloud=cloud and --placement_zone=rack1 arguments on your yb-tserver processes (to match what you passed in the modify_placement_info step), not just the --placement_region gflag.
Otherwise, the create table step doesn't find matching TServers to meet the desired placement for the table.
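As a sketch, the full flag set on each tserver might look like the following; the region value here is a placeholder, and all three values must match whatever was passed to modify_placement_info:

```shell
# Hypothetical yb-tserver invocation; cloud/region/zone values are placeholders
yb-tserver \
  --placement_cloud=cloud \
  --placement_region=region1 \
  --placement_zone=rack1 \
  ...
```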
QUESTION
I am trying to perform IO operations on multiple directories using Ray, Modin (with the Ray backend), and Python. The file writes pause, the memory and disk usage do not change at all, and the program is blocked.
Setup
I have a Ray actor set up like this:
...ANSWER
Answered 2021-Sep-27 at 13:27
For any future readers: modin.DataFrame.to_csv() pauses for unknown reasons, but modin.DataFrame.to_pickle() doesn't with the same logic.
There is also a significant performance increase in read/write times when the data is stored as .pkl files.
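Since Modin mirrors the pandas API, the pickle round-trip can be sketched with plain pandas so the example stands alone; the temporary file path is arbitrary.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# to_pickle writes the binary pickle format: no CSV parsing/formatting cost
path = os.path.join(tempfile.mkdtemp(), "example.pkl")
df.to_pickle(path)

restored = pd.read_pickle(path)  # round-trips dtypes exactly, unlike CSV
```

With modin.pandas imported as pd instead, the same two calls apply.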
QUESTION
I'm clearly doing something wrong with multiprocessing, but I'm not sure what. I'd expect to see some speed-up on this task, but the time spent running this test function in the forked process is two orders of magnitude more than the time it takes in the main process. This is a non-trivial task, so I don't think it's a case where the workload was too small to benefit from multiprocessing, as in this question and basically all the other SO questions about multiprocessing. And I know there is overhead from starting new processes, but my function returns the time spent doing the actual computation, which I would think takes place after that forking overhead is done.
I've looked at a bunch of docs and examples, and tried versions of this code with map, map_async, apply, and apply_async instead of imap_unordered, but I get comparable results in all cases. I'm thoroughly mystified at this point... any help understanding what I'm doing wrong would be great, as would a revised code snippet that provides an example of how to obtain a performance boost by parallelizing this task. Thanks!
ANSWER
Answered 2021-Sep-23 at 16:07
I believe it's likely that your numpy is already taking advantage of your multicore architecture in the single-process model. For example, from here:
But many architectures now have a BLAS that also takes advantage of a multicore machine. If your numpy/scipy is compiled using one of these, then dot() will be computed in parallel (if this is faster) without you doing anything. Similarly for other matrix operations, like inversion, singular value decomposition, determinant, and so on.
You can check for that:
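One way to check is numpy's built-in config dump, which reports the BLAS/LAPACK libraries it was compiled against; names like "openblas" or "mkl" indicate a BLAS that can use multiple cores on its own.

```python
import numpy as np

# Prints the build configuration, including the linked BLAS/LAPACK libraries.
np.show_config()
```

If a multithreaded BLAS shows up there, environment variables such as OMP_NUM_THREADS typically control how many cores it uses.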
QUESTION
I am trying to create a vSphere VM with a dynamic number of disks, and I am not sure how to iterate over the list of data disks from the resource block.
Here is the code I have:
variable.tf
...ANSWER
Answered 2021-Sep-11 at 02:16You can use dynamic blocks:
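As a sketch (the resource and variable names here are hypothetical, not taken from the question), a dynamic block expands one disk block per element of a list variable:

```hcl
variable "data_disk_sizes" {
  type    = list(number)
  default = [50, 100]
}

resource "vsphere_virtual_machine" "vm" {
  # ...other required arguments omitted...

  dynamic "disk" {
    for_each = var.data_disk_sizes
    content {
      label       = "disk${disk.key + 1}"
      size        = disk.value
      unit_number = disk.key + 1   # unit 0 is typically the OS disk
    }
  }
}
```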
QUESTION
I've been trying to script a clone operation in vSphere using Terraform. I am able to perform this clone manually using the vSphere UI.
When running terraform plan, the execution fails with this error:
...ANSWER
Answered 2021-Sep-03 at 09:08
After some digging, the issue was coming from vSphere privileges. The VM to clone was hosted on a machine to which I didn't have access and which wasn't shown in the Hosts list in vCenter.
This access was not required when manually cloning the VM from the UI, but it was required when using the API.
Thus, the solution was to edit my user privileges and select "Propagate to children" at the Datacenter level to ensure I had access to all the hosts and clusters.
No specific privileges needed to be added, just the basic ones applied at the correct level.
QUESTION
In my application, a method runs quickly once started but continuously degrades in performance as it nears completion. This seems to happen regardless of the amount of work (the number of iterations of a function each thread has to perform). Near the end it slows to an incredibly slow pace compared to earlier (worth noting, this is not just a result of fewer threads remaining incomplete; each individual thread seems to slow down).
I cannot figure out why this occurs, so I'm asking. What am I doing wrong?
An overview of CPU usage: a slideshow of the problem.
Worth noting that CPU temperature remains low throughout.
This stage varies with how much work is set; more work produces a better appearance, with all threads constantly near 100%. Still, at this moment everything appears good.
Here we see the continued performance of earlier,
Here we see it start to degrade. I do not know why this occurs.
After some period of chaos, most of the threads have finished their work and the remaining threads continue. At this point, although they appear to be at 100%, they actually perform their remaining workload very slowly. I cannot understand why this occurs.
Printing progress
I have written a multi-threaded random_search (documentation link) function for optimization. Most of the complexity in this function comes from printing and passing data between threads; this supports outputs showing progress like:
ANSWER
Answered 2021-Jul-28 at 09:09
Some basic debugging (aka println! everywhere) shows that your performance problem is not related to the multithreading at all. It just happens randomly, and when there are 24 threads doing their job, the fact that one is randomly stalling is not noticeable; but when only one or two threads are left, they stand out as slow.
But where is this performance bottleneck? Well, you state it yourself in the code: in binary_buffer you say:
QUESTION
I am new to Terraform and I am trying to create a VM with it. I want to clone a VM and don't want to give it an IP. The VM gets created in vSphere, but Terraform keeps waiting for an IP. Is there any way I can stop Terraform from waiting for one?
Below is my code:
...ANSWER
Answered 2021-Jul-19 at 10:37
I was able to fix the issue by changing the parameter below.
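The parameter itself is not shown above. A commonly cited setting for this symptom in the vSphere provider is the guest-network wait timeout; treat this fragment as an assumption about the fix, not as the answerer's confirmed code:

```hcl
resource "vsphere_virtual_machine" "vm" {
  # ...

  # Assumed fix: stop waiting for a routable guest IP after the clone.
  wait_for_guest_net_timeout = 0
}
```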
QUESTION
I am interested in physical cores, not logical cores.
I am aware of https://crates.io/crates/num_cpus, but I want to get the number of cores using cpuid. I am mostly interested in a solution that works on Ubuntu, but cross-platform solutions are welcome.
...ANSWER
Answered 2021-May-05 at 08:35
I see mainly two ways for you to do this.
You could use the higher-level library cpuid. With this, it's as simple as cpuid::identify().unwrap().num_cores (of course, please do proper error handling). But since you know about the library num_cpus and still ask this question, I assume you don't want to use an external library.
The second way is to do it all on your own. But this method is mostly unrelated to Rust, as the main difficulty lies in understanding the CPUID instruction and what it returns. This is explained in this Q&A, for example. It's not trivial, so I won't repeat it here.
The only Rust-specific thing is how to actually execute that instruction in Rust. One way is to use core::arch::x86_64::__cpuid_count. It's an unsafe function that returns the raw result (four registers). After calling it, you have to extract the information you want via bit shifting and masking, as described in the Q&A linked above. Consult core::arch for other architectures or more CPUID-related functions.
But again, doing this manually is not trivial, is error-prone, and is apparently hard to make work truly cross-CPU. So I would strongly recommend using a library like num_cpus in any real code.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install num_cpus
Rust is installed and managed by the rustup tool. Rust has a 6-week rapid release process and supports a great number of platforms, so there are many builds of Rust available at any time. Please refer to rust-lang.org for more information.