xeon | module loader for bash scripts with require style | Script Programming library
kandi X-RAY | xeon Summary
xeon is a tiny Node.js-based tool that simplifies the creation of modular, reusable bash scripts, large or small, for personal use or sysadmin tasks.
Top functions reviewed by kandi - BETA
- Resolve a path to a file.
- Read the configuration file.
- Look up a node_modules directory.
- Merge multiple data together.
- Parse a header string.
- Read a file.
- Get the data from the graph.
- Get all the patches from the database.
- Process a bundle.
- Read the package.json file.
xeon Key Features
xeon Examples and Code Snippets
Community Discussions
Trending Discussions on xeon
QUESTION
After command sudo service mongod start && sudo service mongod status
ANSWER
Answered 2021-Nov-23 at 02:13
Signal "ILL" means illegal instruction.
MongoDB 5.0 requires Advanced Vector Extensions (AVX), and the Xeon E5540 does not have them.
For a list of processors that support AVX, see https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX
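Besides the Wikipedia list, you can check directly on the machine. A minimal sketch (Linux only): parse the "flags" line of /proc/cpuinfo and look for the avx flag.

```python
# Quick check (Linux): does this CPU advertise AVX?
# Parses the "flags" line of /proc/cpuinfo text.

def has_avx(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in cpuinfo text lists avx."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Split on the colon, then match the exact token "avx".
            if "avx" in line.split(":", 1)[1].split():
                return True
    return False

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX supported:", has_avx(f.read()))
    except FileNotFoundError:
        print("No /proc/cpuinfo on this system")
```

On a Xeon E5540 (Nehalem generation) the flag is absent, which is why MongoDB 5.0 dies with SIGILL there.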
QUESTION
We have a site that has been working pretty well for the past 2 years. But we are actually seeing random peaks in the database load that make the site very slow for a few seconds.
These peaks only appear from a certain load on the server and are impossible to predict. More users = more peaks. Everything runs very smoothly outside of those peaks (page load is < 300 ms). CPU and RAM are not impacted by those peaks.
Spikes are especially visible in DB connections, where they can go from 100 connections to 1,000 connections for 2 or 3 seconds, then back to normal.
We have nothing in the PHP log, nothing in the slow query log (long_query_time = 0.1).
Server : Debian / MariaDB 10.3.31, Apache 2.4.38, PHP 7.3.31 All tables are InnoDB with primary keys. Connection by socket. Codeigniter 4.1.7. Redis cache.
What we have already tried:
Reboot the server / restart MySQL.
Slow query log with long_query_time = 0 for 24 h, then pt-query-digest on the result. Everything is OK.
General log for 3 h during heavy traffic, then pt-query-digest on the result. Everything is OK.
EXPLAIN on each request in the logs. Everything looks fine.
We no longer know where to look to find the source of the problem.
Additional info:
Environment : VMware virtual machine | CPU : 16x Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz | RAM : 31.39 GiB | Disks : SSDs via SAN network
SHOW VARIABLES: https://pastebin.com/fx99mrdt
SHOW GLOBAL STATUS: https://pastebin.com/NY1PKqpp
SHOW ENGINE INNODB STATUS: https://pastebin.com/bNcKKTYN
MySQLTuner: https://pastebin.com/8gx9Qp1j
EDIT 1:
...ANSWER
Answered 2022-Mar-06 at 18:16
"intersect" is less efficient than a composite index.
Have an index with these 4 columns, in any order:
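The specific columns were omitted above. As a generic illustration of the advice (hypothetical table and column names, with sqlite3 standing in for MariaDB), one composite index over the filtered columns replaces the several single-column indexes that the optimizer would otherwise have to "intersect":

```python
# Composite-index sketch: table t and columns a, b, c, d are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c INT, d INT, payload TEXT)")

# One composite index covering all four filtered columns.
conn.execute("CREATE INDEX idx_t_abcd ON t (a, b, c, d)")

# The planner can now satisfy all four equality predicates with one index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT payload FROM t "
    "WHERE a = 1 AND b = 2 AND c = 3 AND d = 4"
).fetchall()
print(plan)  # the plan should mention idx_t_abcd
```

In MariaDB the equivalent is `ALTER TABLE t ADD INDEX idx_t_abcd (a, b, c, d)`; for pure equality predicates the column order inside the index does not matter, as the answer notes.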
QUESTION
I am trying to optimize some code for speed, and it's spending a lot of time doing memcpys. I decided to write a simple test program to measure memcpy on its own to see how fast my memory transfers are, and they seem very slow to me. I am wondering what might cause this. Here is my test code:
...ANSWER
Answered 2022-Mar-11 at 19:06
Saturating RAM is not as simple as it seems.
First of all, at first glance here is the apparent throughput we can compute from the provided numbers:
- Fill: 1 / 0.4263950000 = 2.34 GB/s (1 GB is read);
- Memcpy: 2 / 0.6350150000 = 3.15 GB/s (1 GB is read and 1 GB is written).
The thing is that the pages allocated by malloc are not mapped in physical memory on Linux systems. Indeed, malloc reserves some space in virtual memory, but the pages are only mapped in physical memory when a first touch is performed, causing expensive page faults. AFAIK, the only way to speed up this process is to use multiple cores or to prefill the buffers and reuse them later.
Additionally, due to architectural limitations (i.e. latency), one core of a Xeon processor cannot saturate the RAM. Again, the only way to fix that is to use multiple cores.
If you try to use multiple cores, then the result provided by the benchmark will be surprising, since clock does not measure the wall-clock time but the CPU time (which is the sum of the time spent in all threads). You need to use another function. In C, you can use gettimeofday (which is not perfect, as it is not monotonic) but it is certainly good enough for your benchmark (related post: How can I measure CPU time and wall clock time on both Linux/Windows?). In C++, you should use std::steady_clock (which is monotonic, as opposed to std::system_clock).
In addition, the write-allocate cache policy on x86-64 platforms forces cache lines to be read when they are written. This means that to write 1 GB, you actually need to read 1 GB! That being said, x86-64 processors provide non-temporal store instructions that do not cause this issue (assuming your array is aligned properly and big enough). Compilers can use them, but GCC and Clang generally do not. memcpy is already optimized to use non-temporal stores on most machines. For more information, please read How do non temporal instructions work?.
Finally, you can parallelize the benchmark easily using OpenMP with simple #pragma omp parallel for directives on loops. Note that OpenMP also provides a user-friendly function for computing the wall-clock time correctly: omp_get_wtime. For the memcpy, the best approach is certainly to write a loop doing memcpy in (relatively big) chunks in parallel.
For more information about this subject, I advise you to read the famous document What Every Programmer Should Know About Memory. Since the document is a bit old, you can check the updated information about it here. The document also describes additional important things to understand why you may still not succeed in saturating the RAM with the above information. One critical topic is NUMA.
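The CPU-time-vs-wall-clock pitfall from this answer is easy to demonstrate with Python's standard library: time.process_time() sums CPU time over all threads (like C's clock()), while time.perf_counter() is a monotonic wall clock (like std::steady_clock).

```python
# CPU time vs wall-clock time: the clock() pitfall, in Python terms.
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) for one call to fn()."""
    w0, c0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - w0, time.process_time() - c0

# Sleeping consumes wall time but almost no CPU time, so the two clocks
# disagree -- the same reason a CPU clock misleads parallel benchmarks,
# where CPU time is *larger* than wall time because it sums all threads.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

In a multithreaded benchmark the mismatch goes the other way: a perfectly parallel 1-second run on 8 cores reports roughly 8 seconds of CPU time, which is why the answer insists on a wall clock such as omp_get_wtime.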
QUESTION
I recently upgraded my OS from Debian 9 to Debian 11. I have a bunch of servers running a simulation; one subset produces a certain result and another subset produces a different result. This did not happen with Debian 9. I have produced a minimal failing example:
...ANSWER
Answered 2022-Feb-28 at 13:18
It's not a bug. Floating-point arithmetic has rounding errors. For single arithmetic operations (+ - * / sqrt) the results should be the same, but for floating-point functions you can't really expect that.
In this case it seems the compiler itself produced the results at compile time. The processor you use is unlikely to make a difference. And we don’t know whether the new version is more or less precise than the old one.
QUESTION
I'm comparing the single-thread performance of the matrix-matrix products in TensorFlow 2 and NumPy. I compare separately for single precision (float32) and double precision (float64). I find that the NumPy performance is almost equivalent to the Intel MKL C++ implementation (used as a benchmark for matrix multiplication) for both single and double precision (SGEMM and DGEMM). But in TensorFlow, only the single-precision (float32) performance is equivalent to the MKL, and the double-precision (float64) performance is significantly slower. Why is TensorFlow slower with double-precision data?
Sample Scripts:
I consider the following instance to reproduce my observation. Consider the matrix multiplication:
C = AB where A and B are of size 3000x3000
The TensorFlow2 and NumPy code are given below:
Tensorflow2 code
...ANSWER
Answered 2022-Jan-20 at 06:54
Assuming that you are using an Intel® AVX-512 instruction-supported processor, try installing the Intel® Optimization for TensorFlow wheel via pip, specifically built for AVX-512. These packages are available as *.whl files on the Intel® website for specific Python versions, or can be installed using the following command for Python versions 3.7, 3.8, and 3.9 (Linux only).
QUESTION
I am trying to import data into MySQL from a JSON file.
...ANSWER
Answered 2021-Dec-06 at 07:43
I changed save() to insert() and the speed has increased. Now the whole JSON file (1107 lines) is imported in 40 seconds. Are there faster ways to load ready-made data from JSON into the database? What if there are 100 thousand lines, or a million? Is it normal practice to wait a few hours?
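No, hours should not be necessary: the usual next step after "one INSERT per row" is batching many rows into one statement and one transaction. A minimal sketch (sqlite3 standing in for MySQL; the items schema and the fake JSON rows are hypothetical):

```python
# Batched loading: one transaction + executemany in chunks,
# instead of one round-trip per row.
import sqlite3

# Stand-in for data parsed from the JSON file.
rows = [{"id": i, "name": f"item{i}"} for i in range(1107)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

BATCH = 500
with conn:  # a single transaction around all the batches
    for i in range(0, len(rows), BATCH):
        chunk = [(r["id"], r["name"]) for r in rows[i:i + BATCH]]
        conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", chunk)

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)
```

For hundreds of thousands or millions of rows, MySQL's multi-row INSERT syntax or LOAD DATA INFILE is the usual answer; either should finish in seconds-to-minutes, not hours.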
QUESTION
So I have a WebBrowser element in WPF and filled it with some HTML and CSS. The WebBrowser element shows nearly everything I put in the CSS. But the only things that don't work in the WebBrowser element, while working fine in a normal HTML file, are things like hover effects and nth-child. The other CSS works and shows correctly in the WebBrowser.
Here is my HTML code:
...ANSWER
Answered 2021-Nov-22 at 15:05
The WebBrowser control in WPF internally uses a native ActiveX control that is kind of dated.
If you want a modern browser experience that supports the latest HTML and CSS in your app, you should switch to using the WebView2 control.
QUESTION
I've got a function func that may cost ~50 s when running on a single core. Now I want to run it many times on a server with 192 cores. But when I increase the number of worker processes to, say, 180, the performance of each core slows down; the worst CPU takes ~100 s to compute func.
Can someone help me, please?
Here is the pseudo code
...ANSWER
Answered 2021-Nov-17 at 17:18
You are measuring the time it takes each worker to perform func() and observe a performance decrease for a single process when going from 10 to 180 parallel processes.
This looks quite normal to me:
- Intel cores use hyper-threading, so you actually have 96 physical cores (in more detail, a hyper-threaded core adds only 20-30% performance). It means that 168 of your processes need to share 84 hyper-threaded cores, while 12 processes get 12 full cores.
- The CPU speed is limited by thermal throttling (https://en.wikipedia.org/wiki/Thermal_design_power), and of course there is much more thermal headroom when running 10 processes vs 180.
- Your tasks are obviously competing for memory. They make a total of over 5 TB of memory allocations and your machine has much less than that. The last mile in garbage collection is always the most expensive one, so if your garbage collectors are squeezed and competing for memory, the performance is uneven, with surprisingly long garbage-collection times.
Looking at this data, I would recommend you try:
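The concrete recommendations were truncated above, but the first bullet suggests one obvious adjustment: size the pool by physical cores rather than logical CPUs. A hedged sketch (the halving assumes two hyper-threads per core, as on this Xeon):

```python
# Derive a worker count from physical rather than logical cores,
# since hyper-threaded siblings add only ~20-30% each.
import os

def recommended_workers(logical_cpus: int, hyperthreading: bool = True) -> int:
    """Roughly one worker per physical core, never fewer than 1."""
    physical = logical_cpus // 2 if hyperthreading else logical_cpus
    return max(1, physical)

# On the 192-logical-CPU server from the question this suggests ~96 workers.
print(recommended_workers(192))               # 96
print(recommended_workers(os.cpu_count() or 1))
```

Capping the pool near the physical-core count avoids oversubscribing the hyper-threaded siblings; the memory-pressure bullet may force the count even lower if each worker allocates heavily.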
QUESTION
The code I work on has a substantial amount of floating-point arithmetic in it. We have test cases that record the output for given inputs and verify that we don't change the results too much. It was suggested that I enable -march=native to improve performance. However, with that enabled we get test failures because the results have changed. Do the instructions enabled by -march=native, through access to more modern hardware, reduce the amount of floating-point error? Increase it? Or a bit of both? Fused multiply-add should reduce floating-point error, but is that typical of instructions added over time? Or have some instructions been added that, while more efficient, are less accurate?
The platform I am targeting is x86_64 Linux. The processor information according to /proc/cpuinfo is:
ANSWER
Answered 2021-Nov-15 at 09:40
-march=native means -march=$MY_HARDWARE. We have no idea what hardware you have. For you, that would be -march=skylake-avx512 (Skylake-SP). The results could be reproduced by specifying your hardware architecture explicitly.
It's quite possible that the errors will decrease with more modern instructions, specifically Fused-Multiply-and-Add (FMA). This is the operation a*b+c, but rounded once instead of twice. That saves one rounding error.
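The "rounded once instead of twice" effect can be demonstrated in Python by comparing ordinary double arithmetic against exact rational arithmetic (which stands in here for a fused multiply-add, rounding only once at the end); the constants below are chosen so the two-rounding result loses the answer entirely:

```python
# Double rounding vs fused evaluation of a*b + c.
from fractions import Fraction

a = 1.0 + 2.0**-27          # exactly representable in double precision
c = -(1.0 + 2.0**-26)       # also exact

# Two roundings: a*a = 1 + 2**-26 + 2**-54 rounds to 1 + 2**-26,
# so the 2**-54 tail is gone before the addition even happens.
two_roundings = a * a + c

# Exact rational arithmetic mimics FMA: multiply and add exactly,
# round once at the very end.
fused = float(Fraction(a) * Fraction(a) + Fraction(c))

print(two_roundings, fused)  # two_roundings is 0.0; fused is 2**-54
```

This is exactly why enabling FMA via -march=native changes (and usually improves) results: the intermediate product is never rounded, so tests that compare against bit-exact recorded outputs will fail even though accuracy went up.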
QUESTION
I'm trying to download the map of Mexico with save_graphml, to avoid repeated querying and the long response times of graph_from_place, but I've already left this code running for almost six hours and absolutely nothing happens.
ANSWER
Answered 2021-Oct-14 at 20:09
I've already left this code running for almost six hours and absolutely nothing happens.
A lot has been happening! Don't believe me? You ran ox.config(log_console=True), so look at your terminal and watch what's happening while it runs. You'll see a line like "2021-10-14 13:05:39 Requesting data within polygon from API in 1827 request(s)"... so you are making 1,827 requests to the Overpass server, and the server is asking you to pause for rate limiting between many of those requests.
I know that due to the stipulated area the time is long, but what I wanted to know is whether there is an alternative to this procedure, a way to optimize it so that the creation of the map is a little faster, or another way to load maps for routing with OSMnx and NetworkX without querying servers.
Yes. This answer provides more details. There are tradeoffs between 1) model precision, 2) area size, and 3) memory/speed. For faster modeling, you can load the network data from a .osm XML file instead of having to make numerous calls to the Overpass API. I'd also recommend using a custom_filter as described in the linked answer. OSMnx by default divides your query area into 50 km x 50 km pieces, then queries Overpass for each piece one at a time so as not to exceed the server's per-query memory limits. You can configure this max_query_area_size parameter, as well as the server memory allocation, if you prefer to use OSMnx's API-querying functions rather than its from-file functionality.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network