cpupower | Gnome-Shell Extension for intel-pstate driver | Calendar library
kandi X-RAY | cpupower Summary
Gnome-Shell Extension for intel-pstate driver
Community Discussions
Trending Discussions on cpupower
QUESTION
I have some doubts about using OptaPlanner. OptaPlanner supports the following score calculation types: Drools score calculation and Constraint Streams score calculation. Both support incremental score calculation. One doubt about incremental score calculation:
Demo:
...
ANSWER
Answered 2020-Oct-14 at 08:33
The previous negative score will be deleted because addHardConstraintMatch() is doing some black magic: it registers a rule unmatch listener to undo that negative addition when the rule no longer matches. scoreDRL is incremental, so only the delta of the score change is recalculated.
PS: Take a look at ConstraintStreams too, they are also incremental :)
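The undo-on-unmatch idea described in this answer can be sketched in a few lines. This is a toy model only, not the OptaPlanner or Drools API: the class IncrementalScore, its method add_hard_constraint_match, and the penalty value are hypothetical names and numbers chosen for illustration.

```python
from typing import Callable


class IncrementalScore:
    """Toy score holder; NOT the OptaPlanner API, just an illustration."""

    def __init__(self) -> None:
        self.hard = 0

    def add_hard_constraint_match(self, weight: int) -> Callable[[], None]:
        """Apply a penalty now and return an undo callback for the unmatch."""
        self.hard += weight
        undone = False

        def undo() -> None:
            nonlocal undone
            if not undone:  # make the undo idempotent
                self.hard -= weight
                undone = True

        return undo


score = IncrementalScore()
# A rule matches: hypothetical penalty of 10 for an overloaded computer.
undo_overload = score.add_hard_constraint_match(-10)
after_match = score.hard
# A move removes the overload, so the rule unmatches and the delta is reverted.
undo_overload()
after_unmatch = score.hard
```

Only the one affected match is touched on each change, which is the sense in which the calculation is incremental: the rest of the score is never recomputed.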
QUESTION
I've been looking at the Cloud Balancing demo of OptaPlanner with Drools implementation.
There are 2 rules for this demo (there are actually 4, but 3 of those rules share the same logic) --
...
ANSWER
Answered 2020-Feb-19 at 12:04
https://docs.optaplanner.org/latest/optaplanner-docs/html_single/#localSearchOverview:
Local Search needs to start from an initialized solution, therefore it’s usually required to configure a Construction Heuristic phase before it.
By default, Local Search expects that all entities are initialized and is not allowed to uninitialize them when making changes on the solution. The answer to your last question is that the Construction Heuristic phase assigns a computer to each process even though it actually worsens the score. The Local Search will improve the score in the next phase.
Thanks to this default behavior you don't have to write rules that penalize processes that haven't been assigned any computer.
For completeness, note that you can make planning variables nullable but that is an advanced topic. Nullable planning variables are usually only useful for over-constrained planning.
QUESTION
I have an Intel CPU with 4 HT cores (8 logical CPUs) and I built two simple processes.
The first one:
...
ANSWER
Answered 2019-Sep-29 at 01:30
Both are compiled with gcc without special options. (I.e. with the default of -O0: no-optimization debug mode, keeping variables in memory instead of registers.)
Unlike a normal program, the version with int i,j loop counters bottlenecks completely on store-forwarding latency, not front-end throughput or back-end execution resources or any shared resource.
This is why you never want to do real benchmarking with -O0 debug mode: the bottlenecks are different than with normal optimization (-O2 at least, preferably -O3 -march=native).
On Intel Sandybridge-family (including @uneven_mark's Kaby Lake CPU), store-forwarding latency is lower if the reload doesn't try to run right away after the store, but instead runs a couple cycles later. The questions "Adding a redundant assignment speeds up code when compiled without optimization" and "Loop with function call faster than an empty loop" both demonstrate this effect in un-optimized compiler output.
Having another hyperthread competing for front-end bandwidth apparently makes this happen some of the time.
Or maybe the static partitioning of the store buffer speeds up store-forwarding? Might be interesting to try a minimally-invasive loop running on the other core, like this:
QUESTION
I am using Red Hat Decision Manager 7.3 and trying to get the OptaCloud sample working. Submitting the problem to the solver throws the following error:
...
ANSWER
Answered 2019-Sep-03 at 12:18
The correct body looks like this:
QUESTION
Note: exhaustive system details are given at the end of the question.
I am trying to get my development machine to have a very stable CPU frequency so that I can get precise benchmarks of some linear algebra codes - however, it still displays significant frequency fluctuations.
I have set the scaling governor to performance mode:
ANSWER
Answered 2019-Aug-25 at 23:19
One case not mentioned in your post is Intel's turbo boost. You can disable it by writing 1 to /sys/devices/system/cpu/intel_pstate/no_turbo. This setting is also available in BIOS, but I'm not sure if the effects are 100% equivalent.
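As a minimal sketch of the step above, the following helper writes to the no_turbo file named in the answer. The function names set_no_turbo and turbo_disabled are hypothetical; the sketch assumes the intel_pstate driver is active (otherwise the file does not exist) and that the process has root privileges when the real path is used.

```python
from pathlib import Path

# Path taken from the answer; present only when intel_pstate is the active driver.
NO_TURBO = Path("/sys/devices/system/cpu/intel_pstate/no_turbo")


def set_no_turbo(disabled: bool, path: Path = NO_TURBO) -> None:
    """Write 1 to disable turbo boost, or 0 to allow it again."""
    path.write_text("1\n" if disabled else "0\n")


def turbo_disabled(path: Path = NO_TURBO) -> bool:
    """Return True when the file currently holds 1."""
    return path.read_text().strip() == "1"
```

Calling set_no_turbo(True) before a benchmark run and set_no_turbo(False) afterwards keeps the change scoped to the measurement. The path parameter exists so the helpers can be exercised against an ordinary file without touching the system.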
QUESTION
Is it necessary to use the intel_pstate driver to enable Intel Turbo Boost technology? I have a processor using the acpi-cpufreq driver, when I execute
...
ANSWER
Answered 2019-Apr-12 at 10:54
Turbo boost doesn't require software intervention but it can be disabled (by the BIOS/UEFI or by the OS).
When disabled it is not reported by the cpuid instruction.
You can check if TB is enabled by executing the command:
QUESTION
I'm developing (NASM + GCC targeting ELF64) a PoC that uses a spectre gadget that measures the time to access a set of cache lines (FLUSH+RELOAD).
How can I make a reliable spectre gadget?
I believe I understand the theory behind the FLUSH+RELOAD technique; however, in practice, despite some noise, I'm unable to produce a working PoC.
Since I'm using the Timestamp counter and the loads are very regular I use this script to disable the prefetchers, the turbo boost and to fix/stabilize the CPU frequency:
...
ANSWER
Answered 2019-Mar-26 at 12:05
The buffer is allocated from the bss section and so when the program is loaded, the OS will map all of the buffer cache lines to the same CoW physical page. After flushing all of the lines, only the accesses to the first 64 lines in the virtual address space miss in all cache levels (1) because all (2) later accesses are to the same 4K page. That's why the latencies of the first 64 accesses fall in the range of the main memory latency and the latencies of all later accesses are equal to the L1 hit latency (3) when GAP is zero.
When GAP is 1, every other line of the same physical page is accessed and so the number of main memory accesses (L3 misses) is 32 (half of 64). That is, the first 32 latencies will be in the range of the main memory latency and all later latencies will be L1 hits. Similarly, when GAP is 63, all accesses are to the same line. Therefore, only the first access will miss all caches.
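The three cases above follow one arithmetic pattern, sketched here. The sketch assumes 64-byte lines, a 4K page, that every virtual page of the buffer maps to the same physical page, and that consecutive accesses are GAP + 1 lines apart (inferred from the description: GAP 0 touches every line, GAP 1 every other line, GAP 63 the same line). The function name expected_memory_misses is hypothetical.

```python
from math import gcd

LINE_SIZE = 64                            # bytes per cache line
PAGE_SIZE = 4096                          # bytes per 4K page
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE   # 64 lines in the shared physical page


def expected_memory_misses(gap: int) -> int:
    """Count the distinct physical lines touched when stepping gap + 1 lines.

    Line offsets within the page repeat with period 64 / gcd(64, stride),
    so only that many leading accesses can miss all cache levels.
    """
    stride = gap + 1
    return LINES_PER_PAGE // gcd(LINES_PER_PAGE, stride)


misses_by_gap = {gap: expected_memory_misses(gap) for gap in (0, 1, 63)}
```

For GAP values of 0, 1, and 63 this gives 64, 32, and 1, matching the counts stated in the answer.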
The solution is to change mov eax, [rdi] in flush_all to mov dword [rdi], 0 to ensure that the buffer is allocated in unique physical pages. (The lfence instructions in flush_all can be removed because the Intel manual states that clflush cannot be reordered with writes (4).) This guarantees that, after initializing and flushing all lines, all accesses will miss all cache levels (but not the TLB, see: Does clflush also remove TLB entries?).
You can refer to "Why are the user-mode L1 store miss events only counted when there is a store initialization loop?" for another example where CoW pages can be deceiving.
I suggested in the previous version of this answer to remove the call to flush_all and use a GAP value of 63. With these changes, all of the access latencies appeared to be very high and I incorrectly concluded that all of the accesses were missing all cache levels. As I said above, with a GAP value of 63, all of the accesses go to the same cache line, which is actually resident in the L1 cache. However, the reason that all of the latencies were high is that every access was to a different virtual page and the TLB didn't have mappings for any of these virtual pages (to the same physical page): with the call to flush_all removed, none of the virtual pages had been touched before. So the measured latencies represent the TLB miss latency, even though the line being accessed is in the L1 cache.
I also incorrectly claimed in the previous version of this answer that there is an L3 prefetching logic that cannot be disabled through MSR 0x1A4. If a particular prefetcher is turned off by setting its flag in MSR 0x1A4, then it does fully get switched off. Also there are no data prefetchers other than the ones documented by Intel.
Footnotes:
(1) If you don't disable the DCU IP prefetcher, it will actually prefetch back all the lines into the L1 after flushing them, so all accesses will still hit in the L1.
(2) In rare cases, the execution of interrupt handlers or scheduling other threads on the same core may cause some of the lines to be evicted from the L1 and potentially other levels of the cache hierarchy.
(3) Remember that you need to subtract the overhead of the rdtscp instructions. Note that the measurement method you used actually doesn't enable you to reliably distinguish between an L1 hit and an L2 hit. See: Memory latency measurement with time stamp counter.
(4) The Intel manual doesn't seem to specify whether clflush is ordered with reads, but it appears to me that it is.
QUESTION
I am profiling an application for execution time on an x86-64 processor running Linux. Before starting to benchmark the application, I want to make sure that dynamic frequency scaling and idle states are disabled.
Check on Frequency scaling
...
ANSWER
Answered 2019-Jan-30 at 22:18
The beauty of open source software is that you can always go and check :)
cpupower monitor uses different monitors; the mperf monitor defines this array:
QUESTION
I am trying to use CPU frequency scaling to set the CPU frequency. On my system, only the powersave and performance scaling governors are supported. Another document explained that intel_pstate is enabled by default and supports only the powersave and performance governors, and that the solution is to disable intel_pstate. So I tried to disable it as below
...
ANSWER
Answered 2018-Oct-03 at 16:28
Sorry to post this as an answer but I don't have the reputation to post as a comment :/
I had the same problem when trying to disable the intel_pstate driver on my Intel Core i7. I managed to disable it, but acpi-cpufreq was not loading properly; the problem was that SpeedStep was disabled. SpeedStep allows the frequency to be changed by software on these microprocessors; with it disabled, the frequency can only be changed by the hardware. You can access this option through the BIOS settings. I hope that helps!
QUESTION
I would like to capture the output of turbostat in a variable. turbostat runs every 5 seconds by default, so the man page recommends adding sleep 1 at the end to capture a single shot. However, when I do this I cannot seem to pipe the result, and it writes the output to the console unless I pipe everything to null.
How can I capture the output?
I tried this but to no avail.
...
ANSWER
Answered 2018-Jun-18 at 20:52
According to the man page, when turbostat is given a command to run, it sends its own output to stderr, not stdout. This is presumably so that the statistics can be distinguished from the output of the command.
So you need to redirect stderr to stdout in order to process it with pcregrep. You're doing this with the output of pcregrep itself, not the output of turbostat.
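The same redirection can be sketched with Python's subprocess module, which merges stderr into stdout much like 2>&1 in a shell. The helper name capture_with_stderr is hypothetical, and the demo uses a stand-in child process that writes to stderr instead of turbostat itself, since turbostat usually needs root and specific hardware.

```python
import subprocess
import sys
from typing import Sequence


def capture_with_stderr(cmd: Sequence[str]) -> str:
    """Run cmd and return stdout and stderr merged into one string."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1 in a shell
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in child that, like turbostat, reports on stderr rather than stdout.
demo_cmd = [sys.executable, "-c", "import sys; sys.stderr.write('stats line\\n')"]
captured = capture_with_stderr(demo_cmd)
```

On a real system the command list would presumably be something like ["turbostat", "sleep", "1"], following the single-shot form mentioned in the question, and the returned string could then be filtered with the re module in place of pcregrep.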
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported