Phenome | Scripts and libraries for managing the SGN Phenome database | Database library
kandi X-RAY | Phenome Summary
Scripts and libraries for managing the SGN Phenome database
Community Discussions
Trending Discussions on Phenome
QUESTION
I find an interesting phenomenon:
...
ANSWER
Answered 2019-Nov-19 at 05:21
TL;DR: Sandybridge-family store-forwarding has lower latency if the reload doesn't try to happen "right away". Adding useless code can speed up a debug-mode loop because loop-carried latency bottlenecks in -O0 anti-optimized code almost always involve store/reload of some C variables.
Other examples: hyperthreading, calling an empty function, accessing vars through pointers.
None of this is relevant for optimized code. Bottlenecks on store-forwarding latency can occasionally happen, but adding useless complications to your code won't speed it up.
You're benchmarking a debug build, which is basically useless.
But obviously there is a real reason for the debug build of one version running slower than the debug build of the other version. (Assuming you measured correctly and it wasn't just CPU frequency variation (turbo / power-saving) leading to a difference in wall-clock time.)
If you want to get into the details of x86 performance analysis, we can try to explain why the asm performs the way it does in the first place, and why the asm from an extra C statement (which with -O0 compiles to extra asm instructions) could make it faster overall. This will tell us something about asm performance effects, but nothing useful about optimizing C.
You haven't shown the whole inner loop, only some of the loop body, but gcc -O0 is pretty predictable. Every C statement is compiled separately from all the others, with all C variables spilled / reloaded between the blocks for each statement. This lets you change variables with a debugger while single-stepping, or even jump to a different line in the function, and have the code still work. The performance cost of compiling this way is catastrophic. For example, your loop has no side-effects (none of the results are used) so the entire triple-nested loop can and would compile to zero instructions in a real build, running infinitely faster. Or more realistically, running 1 cycle per iteration instead of ~6 even without optimizing away or doing major transformations.
The bottleneck is probably the loop-carried dependency on k, with a store/reload and an add to increment. Store-forwarding latency is typically around 5 cycles on most CPUs. And thus your inner loop is limited to running once per ~6 cycles, the latency of memory-destination add.
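As a rough illustration of the store/reload bottleneck described above, the following C++ sketch forces a counter through memory with volatile, which approximates what gcc -O0 does to every variable. The function names (count_through_memory, count_in_register, ns_per_iteration) are hypothetical and do not come from the original question. Timings are wall-clock and depend on the CPU and on frequency scaling, so no specific numbers are claimed here.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: counts to n through a volatile variable.
// volatile forces a real load and a real store on every iteration, even in
// an optimized build, so the loop-carried dependency on k goes through
// store-forwarding, much like a loop counter in gcc -O0 output.
std::uint64_t count_through_memory(std::uint64_t n) {
    volatile std::uint64_t k = 0;
    while (k < n) {
        k = k + 1;  // reload k, add 1, store k back to memory
    }
    return k;
}

// Hypothetical helper: same loop with an ordinary local variable.
// An optimizing compiler may keep k in a register or remove the loop
// entirely, which is why tuning source code for -O0 tells you little.
std::uint64_t count_in_register(std::uint64_t n) {
    std::uint64_t k = 0;
    while (k < n) {
        ++k;
    }
    return k;
}

// Returns the average wall-clock nanoseconds per iteration of fn(n).
// This is a rough measurement, not a cycle-accurate one.
template <typename Fn>
double ns_per_iteration(Fn fn, std::uint64_t n) {
    const auto start = std::chrono::steady_clock::now();
    const std::uint64_t result = fn(n);
    const auto stop = std::chrono::steady_clock::now();
    (void)result;  // the value is not needed, only the elapsed time
    return std::chrono::duration<double, std::nano>(stop - start).count() /
           static_cast<double>(n);
}
```

Comparing ns_per_iteration(count_through_memory, 100000000) with ns_per_iteration(count_in_register, 100000000) in an optimized build should give a rough sense of how much the memory round trip costs on a given machine.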
If you're on an Intel CPU, store/reload latency can actually be lower (better) when the reload can't try to execute right away. Having more independent loads/stores in between the dependent pair may explain it in your case. See Loop with function call faster than an empty loop.
So with more work in the loop, that addl $1, -12(%rbp), which can sustain one per 6 cycle throughput when run back-to-back, might instead only create a bottleneck of one iteration per 4 or 5 cycles.
This effect apparently happens on Sandybridge and Haswell (not just Skylake), according to measurements from a 2013 blog post, so yes, this is the most likely explanation on your Broadwell i5-5257U, too. It appears that this effect happens on all Intel Sandybridge-family CPUs.
Without more info on your test hardware, compiler version (or asm source for the inner loop), and absolute and/or relative performance numbers for both versions, this is my best low-effort guess at an explanation. Benchmarking / profiling gcc -O0 on my Skylake system isn't interesting enough to actually try it myself. Next time, include timing numbers.
The latency of the stores/reloads for all the work that isn't part of the loop-carried dependency chain doesn't matter, only the throughput. The store queue in modern out-of-order CPUs does effectively provide memory renaming, eliminating write-after-write and write-after-read hazards from reusing the same stack memory for p being written and then read and written somewhere else. (See https://en.wikipedia.org/wiki/Memory_disambiguation#Avoiding_WAR_and_WAW_dependencies for more about memory hazards specifically, and this Q&A for more about latency vs. throughput and reusing the same register / register renaming)
Multiple iterations of the inner loop can be in flight at once, because the memory-order buffer keeps track of which store each load needs to take data from, without requiring a previous store to the same location to commit to L1D and get out of the store queue. (See Intel's optimization manual and Agner Fog's microarch PDF for more about CPU microarchitecture internals.)
Does this mean adding useless statements will speed up real programs? (with optimization enabled)
In general, no, it doesn't. Compilers keep loop variables in registers for the innermost loops. And useless statements will actually optimize away with optimization enabled.
Tuning your source for gcc -O0 is useless. Measure with -O3, or whatever options the default build scripts for your project use.
Also, this store-forwarding speedup is specific to Intel Sandybridge-family, and you won't see it on other microarchitectures like Ryzen, unless they also have a similar store-forwarding latency effect.
Store-forwarding latency can be a problem in real (optimized) compiler output, especially if you didn't use link-time-optimization (LTO) to let tiny functions inline, especially functions that pass or return anything by reference (so it has to go through memory instead of registers). Mitigating the problem may require hacks like volatile if you really want to just work around it on Intel CPUs and maybe make things worse on some other CPUs. See discussion in comments.
QUESTION
I have a table in which one column links to another page. The table changes dynamically, so I need to create the link in a JavaScript function.
I am trying to create a link (a href="") that calls another view with parameters. I need the createLink call below in my href, but when I insert it, the parameter part is ignored.
...
ANSWER
Answered 2019-Jul-15 at 20:41
Basically, I created the URL as a string and then inserted it into the a href to solve the quote problem.
linkpagedr = '${createLink(action:'showGraph')}?inputprscode='+ phenocatsel+'&inprscat='+phenocatsel+'&inprsstudy='+phenomesel;
linkpage = '<a href="' + linkpagedr + '">link</a>';
But I'm not sure if this is an efficient way. If anybody knows a better way to implement this, please let me know.
QUESTION
I'm writing a library for genetic algorithms/neuroevolution. Currently, the program uses polymorphism to allow multiple types of genomes. So my code looks like this:
...
ANSWER
Answered 2019-May-06 at 16:42
Templates might suit your case better, but they have other implications.
For example, your list cannot be heterogeneous in which genome type it contains. It must be all of the same type. If you need heterogeneity, then you'll have to implement some sort of type erasure.
Here's an example that looks like your example but with static polymorphism:
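The answer's original snippet is not preserved on this page. As a stand-in, here is a minimal sketch of static polymorphism with templates. All names (BitGenome, RealGenome, Population, fitness, best_fitness) are hypothetical and chosen only for illustration; they are not taken from the asker's library.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical genome types. They share an interface by convention only:
// there is no common base class and there are no virtual functions.
struct BitGenome {
    std::vector<bool> bits;
    // Fitness here is simply the number of set bits (illustrative only).
    double fitness() const {
        double ones = 0.0;
        for (bool b : bits) {
            ones += b ? 1.0 : 0.0;
        }
        return ones;
    }
};

struct RealGenome {
    std::vector<double> weights;
    // Fitness here is simply the sum of the weights (illustrative only).
    double fitness() const {
        double sum = 0.0;
        for (double w : weights) {
            sum += w;
        }
        return sum;
    }
};

// The genome type is a template parameter, so every call to fitness() is
// resolved at compile time. The trade-off is that one Population can hold
// only one genome type.
template <typename Genome>
class Population {
public:
    void add(Genome g) { members_.push_back(std::move(g)); }

    std::size_t size() const { return members_.size(); }

    // Returns the highest fitness in the population, or 0.0 if it is empty.
    double best_fitness() const {
        if (members_.empty()) {
            return 0.0;
        }
        double best = members_.front().fitness();
        for (const Genome& g : members_) {
            const double f = g.fitness();
            if (f > best) {
                best = f;
            }
        }
        return best;
    }

private:
    std::vector<Genome> members_;
};
```

Population<BitGenome> and Population<RealGenome> are unrelated types, which is the homogeneity limitation mentioned above; storing both kinds of genome in one container would need a base class or some form of type erasure.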
QUESTION
So I have one page which contains a form so that the user can submit data to the database and a second page which contains specific fields from that record by using the assigned pk value.
My question is whether I can use the submit button to post the data to the database and then immediately load up a URL with the pk value from that submitted record to use some of the fields? Or would this need to be done in 2 steps? i.e. 1. insert data into DB, then 2. load the URL with the relevant pk..
urls.py:
ANSWER
Answered 2018-May-17 at 13:33
You should override the get_success_url method of your UserView class.
QUESTION
I am trying to import specific items from a database into a web page using Django Framework. I have managed to do this when I pull all the records but not when I try to pick individual records by their primary key.
Here is the html to display the page and database fields:
...
ANSWER
Answered 2018-May-11 at 13:04
I don't understand why you're doing datum = data._meta.get_fields() in the view. That gets the class-level field objects, which you don't want at all. Don't do that; pass the object directly to the template.
You will also need to remove the loop in the template and pass the object directly as item.
As a further optimisation, you can use get_object_or_404 rather than catching the DoesNotExist exception and raising a 404 manually.
QUESTION
I've searched but I just can't figure out why my global array variable isn't getting updated in the global scope when I call this function. I've commented places in the code where I can print() the results I want but can't make it available outside the functions. I know it's something simple I'm not understanding about closures and completion handlers but I need someone else to have a look. Thanks.
...
ANSWER
Answered 2018-Mar-01 at 10:11
The code in the function doURLsessions is asynchronous: it doesn't block other code from executing, and since it is a network request it takes some time to finish. In your case print(arrFromURL) is called before doURLsessions has finished, so arrFromURL is still empty.
QUESTION
I am trying to access values of System.Collections.Generic.IList which is declared outside Alea.Gpu.Default.For.
ANSWER
Answered 2017-May-10 at 09:32
Currently, AleaGPU only works with arrays. Lists usually require dynamic memory allocation (for example, when adding an element), which is not efficient on a GPU.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported