chrono | High-performance C++ library for multiphysics and multibody
kandi X-RAY | chrono Summary
[BSD license] Distributed under a permissive BSD license, Chrono is an open-source multi-physics package used to model and simulate:
- dynamics of large systems of connected rigid bodies governed by differential-algebraic equations (DAE)
- dynamics of deformable bodies governed by partial differential equations (PDE)
- granular dynamics using either a non-smooth contact formulation resulting in differential variational inequality (DVI) problems or a smooth contact formulation resulting in DAEs
- fluid-solid interaction problems whose dynamics is governed by coupled DAEs and PDEs
- first-order dynamic systems governed by ordinary differential equations (ODE)
Chrono provides a mature and stable code base that continues to be augmented with new features and modules. The core functionality of Chrono provides support for the modeling, simulation, and visualization of rigid and flexible
Community Discussions
Trending Discussions on chrono
QUESTION
I noticed unexpected behaviour and narrowed it down to conversion from one duration to another.
...ANSWER
Answered 2021-Jun-11 at 16:35
On your platform, system_clock::time_point is a type alias for time_point<system_clock, nanoseconds>. It turns out that this detail is important.
When running the float version of this test, the very first step is to convert b - a to duration<float> to make the call to count_intervals. The type of b - a is nanoseconds. The value of b - a is 30'000'000'000ns.
It takes 25 bits of precision to store the integer 30'000'000'000. IEEE float has 24 bits of precision.
The first step in converting 30'000'000'000ns to duration<float> is to store the number 30'000'000'000 into the common_type of system_clock::rep and float, which is float. It can only do this approximately. It actually stores 30'000'001'024, which is the best representable value under the "round to nearest, ties to even" policy. It then divides this approximation by 1'000'000'000 (stored in a float), and gets approximately 30. The exact value it gets is 0x1.e00002p+4, or 30.000001...
From here on out, the computation is slightly off from exact.
You can see this for yourself by setting cout to hexfloat mode, which will output floating point numbers exactly:
QUESTION
I have a minimally reproducible sample which is as follows -
...ANSWER
Answered 2021-Jun-11 at 14:46
The non-OpenMP vectorizer is defeating your benchmark with loop inversion.
Make your function __attribute__((noinline, noclone)) to stop GCC from inlining it into the repeat loop. For cases like this, with functions large enough that call/ret overhead is minor and where constant propagation isn't important, this is a pretty good way to make sure that the compiler doesn't hoist work out of the loop.
And in future, check the asm, and/or make sure the benchmark time scales linearly with the iteration count. E.g. increasing 500 up to 1000 should give the same average time in a benchmark that's working properly, but it won't with -O3. (Although it's surprisingly close here, so that smell test doesn't definitively detect the problem!)
After adding the missing #pragma omp simd to the code, yeah I can reproduce this. On i7-6700k Skylake (3.9GHz with DDR4-2666) with GCC 10.2 -O3 (without -march=native or -fopenmp), I get 18266, but with -O3 -fopenmp I get avg time 39772.
With the OpenMP vectorized version, if I look at top while it runs, memory usage (RSS) is steady at 771 MiB. (As expected: init code faults in the two inputs, and the first iteration of the timed region writes to result, triggering page-faults for it, too.)
But with the "normal" vectorizer (not OpenMP), I see the memory usage climb from ~500 MiB until it exits just as it reaches the max 770 MiB.
So it looks like gcc -O3 performed some kind of loop inversion after inlining and defeated the memory-bandwidth-intensive aspect of your benchmark loop, only touching each array element once.
The asm shows the evidence: GCC 9.3 -O3 on Godbolt doesn't vectorize, and it leaves an empty inner loop instead of repeating the work.
QUESTION
This is my first post here and I am not that experienced, so please excuse my ignorance.
I am building a Monte Carlo simulation in C++ for my PhD and I need help optimizing its run time and performance. The simulation volume is a 3D cube repeated along each coordinate, and inside every cube magnetic particles are generated in clusters. Then, in the central cube, protons are created in a loop and move; at each step they calculate (among other things) the total magnetic field they feel from all the particles.
At the moment I define everything inside the main function, and because I need the positions of the particles for my calculations (I calculate the distance between the particles during their placement and also during the proton movement), I store them in dynamic arrays. I haven't used any classes or functions yet. This makes my simulations really slow because I eventually have to use millions of particles and thousands of protons. Even with hundreds it takes days. I also use a lot of for and while loops and read from and write to .dat files.
I really need your help. I have spent weeks trying to optimize my code and my project is behind schedule. Do you have any suggestions? I need the arrays to store the positions of the particles. Do you think classes or functions would be more efficient? Any advice in general is helpful. Sorry if that was too long, but I am desperate...
Ok, I edited my original post and I share my full script. I hope this will give you some insight regarding my simulation. Thank you.
Additionally I add the two input files
...ANSWER
Answered 2021-Jun-10 at 13:17
I tackled the problem in several steps. First, I made the run reproducible:
QUESTION
I am trying to install hpctoolkit using Spack. To do that, I executed:
ANSWER
Answered 2021-Jun-10 at 11:42
To fix this error, you should specify the path to g++. In my case, here is the updated content of my compilers.yaml file:
QUESTION
I am trying to install hpctoolkit using Spack. To do that, I executed:
ANSWER
Answered 2021-Jun-09 at 12:34
As you can see in the error, compiler 'gcc@10.2.0' does not support compiling C++ programs.
To display the compilers, use the command:
QUESTION
#include <chrono>
#include <cstdlib>
#include <iostream>
using namespace std;
int main()
{
    const unsigned int m = 200;
    const unsigned int n = 200;
    srand(static_cast<unsigned int>(static_cast<std::chrono::duration<double>>(std::chrono::high_resolution_clock::now().time_since_epoch()).count()));
    double** matrixa;
    double** matrixb;
    double** matrixc;
    matrixa = new double* [m];
    matrixb = new double* [m];
    matrixc = new double* [m];
    unsigned int max = static_cast<unsigned int>(1u << 31);
    for (unsigned int i = 0; i < m; i++)
        matrixa[i] = new double[n];
    for (unsigned int i = 0; i < m; i++)
        matrixb[i] = new double[n];
    for (unsigned int i = 0; i < m; i++)
        matrixc[i] = new double[n];
    for (unsigned int i = 0; i < m; i++)
        for (unsigned int j = 0; j < n; j++)
            matrixa[i][j] = static_cast<double>(static_cast<double>(rand()) / max * 10);
    for (unsigned int i = 0; i < m; i++)
        for (unsigned int j = 0; j < n; j++)
            matrixb[i][j] = static_cast<double>(static_cast<double>(rand()) / max * 10);
    auto start = std::chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < m; i++)
        for (unsigned int j = 0; j < n; j++)
            for (unsigned int k = 0; k < m; k++)
                for (unsigned int l = 0; l < m; l++)
                    matrixc[i][j] += matrixa[k][l] * matrixb[l][k];
    auto stop = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> time_diff = stop - start;
    cout << "Program execution time: " << time_diff.count() << " seconds." << endl;
    for (unsigned int i = 0; i < m; i++)
        delete[] matrixa[i];
    for (unsigned int i = 0; i < m; i++)
        delete[] matrixb[i];
    for (unsigned int i = 0; i < m; i++)
        delete[] matrixc[i];
    delete[] matrixa;
    delete[] matrixb;
    delete[] matrixc;
    return 0;
}
...ANSWER
Answered 2021-Jun-10 at 03:50
Firstly, your matrix multiplication algorithm is more complex than the standard one (or it is simply wrong). You can refer to Wikipedia for a typical algorithm:
Input: matrices A and B
Let C be a new matrix of the appropriate size
For i from 1 to n:
    For j from 1 to p:
        Let sum = 0
        For k from 1 to m:
            Set sum ← sum + A_ik × B_kj
        Set C_ij ← sum
Return C
There is also a critical bug in your code: you haven't initialized the result matrix.
So the fixed code may look like this:
QUESTION
I found you can create a vector of different types of structs using an enum. When filtering the vector on a common field, such as id, the compiler doesn't know the type while iterating:
ANSWER
Answered 2021-Jun-09 at 17:51
Those are not a common field; they are completely unrelated fields. The fact that they share a name is, as far as the Rust compiler is concerned, an insignificant coincidence. You need to use pattern matching to get the field from either case.
QUESTION
I have a function that works under C++14 using the date.h library but I'm converting my program to use C++20 and it's no longer working. What am I doing wrong, please?
My C++14/date.h code is as follows:
...ANSWER
Answered 2021-Jun-09 at 14:45
There's a bug in the spec that is in the process of being fixed, and VS2019 faithfully reproduced the spec. Wrap your format string in string{}, or give it a trailing s literal to turn it into a string, and this will work around the bug.
QUESTION
I am trying to organise my code into a more OOP style with each class taking on its own header and cpp file. This is my tree in the package folder in the workspace
...ANSWER
Answered 2021-Jun-09 at 02:37
In essence, you don't need to include either header in the other. This fixes the circular includes.
In ROS_Topic_Thread.h, friend class Master_Thread; already forward declares Master_Thread, so you don't need the Master_Thread.h header.
In Master_Thread.h, you can also forward declare class ROS_Topic_Thread; before the declaration of the Master_Thread class, since you only use references to ROS_Topic_Thread in the transfer_to_master_buffer method.
QUESTION
I have written code to read the same Parquet file using C++ and using Python. Reading the file takes much less time in Python than in C++, even though C++ code generally executes faster than Python. I have attached the code here -
...ANSWER
Answered 2021-Jun-06 at 06:53
It is likely that the Python module is bound to functions compiled in a language such as C++, or written using Cython. The implementation of the Python module may thus have better performance, depending on how it reads from the file or processes data.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported