false-sharing | Demo project to demonstrate the effects | Application Framework library
kandi X-RAY | false-sharing Summary
This is a small demo project for the accompanying blog post [False Sharing]. It contains a microbenchmark that demonstrates the effects of false sharing and its mitigation with @Contended.
Community Discussions
Trending Discussions on false-sharing
QUESTION
Assuming a std::vector<T> would be read/write accessed by several cores concurrently, with it allocated on a per-core basis (i.e. vector index 0 used by CPU core 0 only, index 1 by core 1, and so on), then to avoid false sharing, the underlying T would have to be declared alignas(64), or otherwise padded to a standard x86 cache line size (i.e. 64 bytes). But what if the vector's T is std::unique_ptr<U>? Does the same still hold and make sense, i.e. is each vector item, in this case a std::unique_ptr<U>, required to be 64 bytes in size?
ANSWER
Answered 2021-May-29 at 09:15
If you want to be able to modify the pointer, then yes, you should ensure the pointers themselves are aligned. However, if the pointers are never changed while your parallel code is running, then they don't have to be (it's fine to share read-only data among threads even if it shares cache lines). But then you have to make sure U is correctly aligned.
Note: don't assume a 64-byte cache line size; use std::hardware_destructive_interference_size instead.
QUESTION
The following static_assert passes in both gcc and clang trunk.
...ANSWER
Answered 2020-Oct-02 at 12:05Forcing padding where it's not needed would be bad design. Users can always pad if they have nothing useful to put in the rest of the cache line.
You probably want it in the same cache line as the data it's protecting if it's usually lightly contended; there's only one cache line to bounce around, instead of a 2nd cache miss when accessing the shared data after acquiring the lock. This is probably common with fine-grained locking, where many objects have their own std::mutex, and it makes it more beneficial to keep the mutex small.
(Heavily contended could create false sharing between readers trying to acquire the lock vs. the lock owner writing to the shared data after gaining ownership of the lock. Flipping the cache line to "shared", or invalidating, before the lock owner has a chance to write, would indeed slow things down).
Or the space in the rest of the line could be some very-rarely-used thing that needs to exist somewhere in the program, but maybe only used for error handling so its performance doesn't matter. If it couldn't share a line with a mutex, it would have to be taking up space somewhere else. (Maybe in some page of "cold" data, so this isn't a great example).
It's probably unlikely that you'd want to malloc or new a mutex itself, although one could be part of a class you dynamically allocate. Allocator overhead is a real thing, e.g. using 16 bytes of memory before the allocation for bookkeeping space. (Large allocations with glibc's malloc/new are often page-aligned + 16 bytes, making them misaligned wrt. all wider boundaries.) Dynamic-allocator bookkeeping is a very good thing for a mutex to be sharing space with: it's probably not read or written by anything while the mutex is in use.
Non-lock-free std::atomic objects typically use an array of locks (maybe just simple spinlocks, but they could be std::mutex). If the latter, you don't expect two adjacent mutexes to be used simultaneously, so it's good to pack them all together.
Also, increasing its size would be a very clunky way to try to ensure no false sharing. An implementation that wanted to make sure a std::mutex had a cache line to itself would want to declare it with alignas(64) to make sure its alignof() was that. That would force padding to make sizeof(mutex) a multiple of alignof (in this case equal to it).
But note that std::hardware_destructive_interference_size should be 128 on some modern x86-64, because of adjacent-line hardware prefetch in Intel's L2 caches, if you're going to fix a size for it. Prefetch-induced interference is a weaker destructive effect than sharing the same cache line, though, and 128 bytes is too much space to waste.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install false-sharing
You can use false-sharing like any standard Java library: include the jar files in your classpath, and you can run and debug the false-sharing component from any IDE as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, refer to maven.apache.org; for Gradle installation, refer to gradle.org.