java-performance | small examples to test and explain simple Java | Runtime Environment library
kandi X-RAY | java-performance Summary
Set of small examples to test and explain simple Java performance tips.
Top functions reviewed by kandi - BETA
- Start the VisualVM
- Display runnable execution time
- Waits for enter
- Sets up this benchmark
- Do memoize function
- Memoize a function
- Returns the default configuration
- Return the value for one config
- Without initial capacity
- Reverse the original list
- Parameterized message size
- Starts the benchmark
- Test whether a string is equal to an empty string
- Test to see if something is empty
- Parameterized message handling
- Split the input string
- Benchmark for parallel streams
- Measure the pattern split
- Main entry point
- Sleep for exponential IO
- Start with initial capacity
- Setup the default configuration
- Performs the setup
- Perform setup
- Small helper method for calculating the index
- This benchmark is used to add random lines to the buffer
java-performance Key Features
java-performance Examples and Code Snippets
Community Discussions
Trending Discussions on java-performance
QUESTION
I followed this guide (http://tutorials.jenkov.com/java-performance/jmh.html) and opened a new project with the class MyBenchmark, which looks like this:
...ANSWER
Answered 2021-Mar-17 at 17:41
You need to build an executable JAR.
See e.g. How can I create an executable JAR with dependencies using Maven? for information on how to do this with Maven.
You can use the Maven Assembly or Maven Shade plugin.
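For reference, a minimal JMH benchmark class along the lines of the Jenkov tutorial's MyBenchmark might look like the sketch below (the method body is illustrative, not the asker's actual code):

import org.openjdk.jmh.annotations.Benchmark;

public class MyBenchmark {

    @Benchmark
    public void testMethod() {
        // Code to measure goes here; JMH invokes this method repeatedly
        // and reports throughput or average time per invocation.
        int a = 1;
        int b = 2;
        int sum = a + b;
    }
}

Once packaged as an executable JAR with the Shade or Assembly plugin, it is typically run with java -jar target/benchmarks.jar.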
QUESTION
There are some articles available online that mention some of the cons of using Streams over old-style loops:
- https://blog.jooq.org/2015/12/08/3-reasons-why-you-shouldnt-replace-your-for-loops-by-stream-foreach/
- https://jaxenter.com/java-performance-tutorial-how-fast-are-the-java-8-streams-118830.html
But is there any impact from the GC perspective? I assume (is that correct?) that every stream call creates some short-lived objects underneath. If a code fragment that uses streams is called frequently by the underlying system, could it eventually cause performance issues from the GC perspective, or put extra pressure on the GC? Or is the impact minimal and ignorable most of the time?
Are there any articles covering this more in detail?
...ANSWER
Answered 2020-Jan-15 at 04:53
To be fair, it's complicated to give an answer when Holger has already linked the main idea in his answer; still, I will try.
Extra pressure on the GC - maybe. Extra time for a GC cycle to execute - most probably not. Ignorable? I'd say totally. In the end, what you care about from a GC is that it takes little time to reclaim lots of space, preferably with super tiny stop-the-world events.
Let's talk about the potential overhead in the GC's two main phases: mark and evacuation/relocation (Shenandoah/ZGC). First the mark phase, where the GC finds out what is garbage (by actually identifying what is alive). If objects that were created by the Stream internals are not reachable, they will never be scanned (zero overhead here), and if they are reachable, scanning them will be extremely fast. The other side of the story is: when you create an object and the GC might touch it while it's running in the mark phase, the slow path of a LoadBarrier (in the case of Shenandoah) will be active. This will add, I assume, some tens of ns to the total time of that particular phase of the GC, as well as some space in the SATB queues. Aleksey Shipilev said in one talk that he tried to measure the overhead of executing a single barrier and could not, so he measured three, and the time was in the region of tens of ns. I don't know the exact details of ZGC, but a LoadBarrier is in place there too.
The main point is that this mark phase is done in a concurrent fashion, while the application is running, so your application will still run perfectly fine. And even if some GC code is triggered to do some specific work (a Load Barrier), it will be extremely fast and completely transparent to you.
The second phase is "compaction", or making space for future allocations. What a GC does is move live objects from the regions with the most garbage (Shenandoah, for sure) to regions that are empty. But only live objects. So if a certain region has 100 objects and only 1 is alive, only that 1 will be moved, and then the entire region is marked as free. So potentially, if the Stream implementation generated only garbage (i.e., nothing still alive), it is a "free lunch" for the GC; it will not even know it existed.
The nice part here is that this phase is also done concurrently. To keep that concurrency going, the GC needs to know how much was allocated from the start to the end of a GC cycle; this amount is the minimum "extra" space you need on top of the Java process for the GC to be happy.
So overall, you are looking at a super tiny impact; if any at all.
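To make the allocation question concrete, here is a minimal sketch (names are mine) of the kind of short-lived objects a stream call creates compared with a plain loop; the pipeline and sink objects it allocates typically die young, which is the cheap case for the collectors discussed above:

import java.util.List;

public class StreamAllocationDemo {

    // Each call builds a short-lived pipeline: a Stream instance plus
    // operation/sink objects. If they stay unreachable after the call,
    // the mark phase never scans them and relocation treats the region
    // holding them as free space.
    static long sumWithStream(List<Integer> values) {
        return values.stream()
                     .mapToInt(Integer::intValue)
                     .sum();
    }

    // The plain loop allocates nothing beyond the iterator.
    static long sumWithLoop(List<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3, 4, 5);
        System.out.println(sumWithStream(values) + " " + sumWithLoop(values));
    }
}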
QUESTION
Consider the following two snippets of code on an array of length 2:
...ANSWER
Answered 2019-Nov-30 at 10:29
The loop presented likely falls under the "non-counted" category of loops, i.e., loops for which the iteration count can be determined neither at compile time nor at run time. Not only because of @Andreas's argument about the array size, but also because of the randomly conditional break (which was in your benchmark when I wrote this post).
State-of-the-art compilers do not aggressively optimize them, since unrolling non-counted loops often involves also duplicating a loop's exit condition, which only improves run-time performance if subsequent compiler optimizations can optimize the unrolled code. See this 2017 paper for details, where the authors make proposals for how to unroll such loops too.
From this it follows that your assumption that you did a sort of "manual unrolling" of the loop does not hold. You're considering it a basic loop-unrolling technique to transform an iteration over an array with a conditional break into an &&-chained boolean expression. I'd consider this a rather special case and would be surprised to find a hot-spot optimizer do such a complex refactoring on the fly. Here they discuss what it actually might do; perhaps this reference is interesting.
The following would reflect the mechanics of contemporary unrolling more closely, though it is perhaps still nowhere near what the unrolled machine code would look like:
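The answer's snippet was not captured on this page; as an illustration of the transformation under discussion (a sketch, not the original code), a length-2 loop with a conditional break versus its &&-chained form might look like:

// Loop form: the conditional early exit makes this a "non-counted" loop
// from the compiler's point of view.
static boolean allZeroLoop(int[] a) {
    for (int i = 0; i < a.length; i++) {
        if (a[i] != 0) {
            return false;   // conditional early exit
        }
    }
    return true;
}

// Hand-"unrolled" form: only valid for length-2 arrays; this is the
// &&-chained boolean expression the asker treated as manual unrolling.
static boolean allZeroUnrolled(int[] a) {
    return a[0] == 0 && a[1] == 0;
}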
QUESTION
Hi, I saw an online answer for counting the distinct prime factors of a number, and it looked non-optimal. So I tried to improve it, but in a simple benchmark my variant is much slower than the original.
The algorithm counts the distinct prime factors of a number. The original uses a HashSet to collect the factors, then uses size() to get their count. My "improved" version uses an int counter, and breaks the while loops up into if/while to avoid unnecessary calls.
Update: tl/dr (see accepted answer for details)
The original code had a performance bug calling Math.sqrt unnecessarily that the compiler fixed:
...ANSWER
Answered 2019-Jul-02 at 14:18
First, the Math.sqrt(n) call in this loop:
for (int i = 3; i <= Math.sqrt(n); i += 2) { if (n % i == 0) {...
should be hoisted out of the loop, since it is re-evaluated on every iteration.
Secondly, you can rewrite parts of the code differently; for example, this:
while (n % 2 == 0) {
    current++;
    n /= 2;
}
can be changed to:
if (n % 2 == 0) {
    current++;
    n /= 2;
}
Essentially, you should avoid unnecessary conditions and instructions inside loops. In your method findNumberWithNPrimeFactors, the complexity of the algorithm is the complexity of each loop multiplied by the iteration count; every test or assignment you add inside the loop adds another +1 to (Complexity(findNumberWithNPrimeFactors) X iteration count).
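As an illustration of the Math.sqrt hoisting described above, here is a sketch (the method structure is mine, not the original poster's code):

// Count distinct prime factors; the loop bound is computed once per
// change of n instead of calling Math.sqrt(n) on every iteration.
static int countDistinctPrimeFactors(int n) {
    int count = 0;
    if (n % 2 == 0) {
        count++;
        while (n % 2 == 0) {
            n /= 2;
        }
    }
    int limit = (int) Math.sqrt(n);      // hoisted out of the loop condition
    for (int i = 3; i <= limit; i += 2) {
        if (n % i == 0) {
            count++;
            while (n % i == 0) {
                n /= i;
            }
            limit = (int) Math.sqrt(n);  // n shrank, so recompute once
        }
    }
    if (n > 1) {
        count++;                         // whatever remains is itself prime
    }
    return count;
}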
QUESTION
We all know streams allow us to execute forEach in parallel, e.g. (actual code not important):
...ANSWER
Answered 2018-Oct-04 at 01:08
Not exactly, but Java 7 added the ForkJoinPool, which was specifically meant to execute decomposed subtasks (parts of a larger task) in parallel. This could easily be applied to a Collection.
Java 5 also added the ThreadPoolExecutor, which isn't specifically for running decomposed subtasks, but it can still be used for that with a little more work.
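A sketch of the Java 7 approach the answer describes (the class name and threshold are illustrative): decompose the work over a collection into range subtasks and run them on a ForkJoinPool:

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ProcessRange extends RecursiveAction {
    private static final int THRESHOLD = 2;   // max elements handled per leaf task
    private final List<String> items;
    private final int from, to;

    ProcessRange(List<String> items, int from, int to) {
        this.items = items;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            for (int i = from; i < to; i++) {   // small enough: process directly
                System.out.println(items.get(i) + " on " + Thread.currentThread().getName());
            }
        } else {
            int mid = (from + to) / 2;          // too big: split and fork both halves
            invokeAll(new ProcessRange(items, from, mid),
                      new ProcessRange(items, mid, to));
        }
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e", "f");
        new ForkJoinPool().invoke(new ProcessRange(items, 0, items.size()));
    }
}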
QUESTION
We have server APIs to support clients running on ten million devices. Normally a client calls the server once a day; that is about 116 clients seen per second. Each client (each with a unique ID) may make several API calls concurrently, and the server then needs to sequence the API calls from the same client, because those calls update the same document in the MongoDB database (for example, the last-seen time and other embedded documents).
Therefore, I need to create a synchronization mechanism based on the client's unique ID. After some research, I found the String pool appealing and easy to implement. But someone commented that locking on the String pool may conflict with other libraries/modules that also use it, and that the String pool should therefore never be used for synchronization. Is that statement true? Or should I implement my own "String pool" with a WeakHashMap, as mentioned in the link below?
Good explanation of String Pool implementation in Java: http://java-performance.info/string-intern-in-java-6-7-8/
Article stating String Pool should not be use for synchronization: http://www.journaldev.com/1061/thread-safety-in-java
==================================
Thanks to BeeOnRope's suggestion, I will use Guava's Interner to implement the solution. This way, a client that doesn't send multiple requests at the same time will not be blocked. In addition, it guarantees that only one API request from a given client is processed at a time. By the way, we need to use a wrapper class, since it's a bad idea to lock on a String object, as explained by BeeOnRope and the link he provided in his answer.
...ANSWER
Answered 2017-Jul-24 at 19:50
Well, if your strings are unique enough (e.g., generated via a cryptographic hash1), synchronizing on client IDs will probably work, as long as you call String.intern() on them first. Since the IDs are unique, you aren't likely to run into conflicts with other modules, unless you happen to pass your IDs to them and they follow the bad practice of locking on them.
That said, it is probably a bad idea. In addition to the small chance of one day running into unnecessary contention if someone else locks on the same String instance, the main problem is that you have to intern() all your String objects, and this often suffers from poor performance because of the native implementation of the string intern table, its fixed size, etc. If you really need to lock based only on a String, you are better off using Guava's Interners.newWeakInterner() interner implementation, which is likely to perform much better. Wrap your string in another class to avoid clashing on the built-in String lock. More details on that approach are in this answer.
Besides that, there is often another natural object to lock on, such as a lock in a session object, etc.
This is quite similar to this question which has more fleshed out answers.
1 ... or, at a minimum, have enough bits to make collisions unlikely enough, and provided your client IDs aren't part of your attack surface.
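A sketch of the wrapper-plus-interner approach described above (class and method names are mine): because the lock object is a private wrapper type, no other module can ever synchronize on the same monitor by accident:

import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

public class ClientLocks {

    // Private wrapper type: callers can never collide with a String's
    // built-in lock, because nothing outside this class sees instances.
    private static final class ClientLock {
        final String clientId;
        ClientLock(String clientId) { this.clientId = clientId; }
        @Override public boolean equals(Object o) {
            return o instanceof ClientLock && ((ClientLock) o).clientId.equals(clientId);
        }
        @Override public int hashCode() { return clientId.hashCode(); }
    }

    // Weak interner: lock objects become reclaimable once no request holds them.
    private static final Interner<ClientLock> LOCKS = Interners.newWeakInterner();

    public static void runExclusively(String clientId, Runnable apiCall) {
        // intern() returns one canonical instance per equal ID, so all
        // concurrent calls for the same client serialize on one monitor.
        synchronized (LOCKS.intern(new ClientLock(clientId))) {
            apiCall.run();
        }
    }
}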
QUESTION
I'm trying to gather some profiling data for my app, and I'm using the perf tool and Flame Graphs for that.
I'm following the instructions in this slideshare: https://www.slideshare.net/brendangregg/java-performance-analysis-on-linux-with-flame-graphs
Below are the commands that I'm running:
...ANSWER
Answered 2017-Apr-30 at 02:51
The "Failed to open /tmp/perf-9931.map" message is not about incorrect debuginfo; it is about profiling code that was generated by the JIT (Java usually generates machine code from class files with the JIT) while no perf-compatible profiling agent was running.
http://www.brendangregg.com/perf.html#JIT_Symbols recommends "Java can do this with perf-map-agent", i.e., use https://github.com/jvm-profiling-tools/perf-map-agent, which will generate map files for perf:
Architecture
Linux perf tools will expect symbols for code executed from unknown memory regions at /tmp/perf-<pid>.map. This allows runtimes that generate code on the fly to supply dynamic symbol mappings to be used with the perf suite of tools.
perf-map-agent is an agent that will generate such a mapping file for Java applications. It consists of a Java agent written in C and a small Java bootstrap application which attaches the agent to a running Java process.
When the agent is attached it instructs the JVM to report code blobs generated by the JVM at runtime for various purposes. Most importantly, this includes JIT-compiled methods but also various dynamically-generated infrastructure parts like the dynamically created interpreter, adaptors, and jump tables for virtual dispatch (see vtable and itable entries). The agent creates a /tmp/perf-<pid>.map file which it fills with one line per code blob, mapping a memory location to a code blob name.
The Java application takes the PID of a Java process as an argument and an arbitrary number of additional arguments which it passes to the agent. It then attaches to the target process and instructs it to load the agent library.
And in https://www.slideshare.net/brendangregg/java-performance-analysis-on-linux-with-flame-graphs Gregg used a special hacked build of OpenJDK - slide 36: "-XX:+PreserveFramePointer • I hacked OpenJDK x86_64 to support frame pointers".
And from slide 41 Gregg talks about the /tmp/perf-*.map files:
Fixing Symbols
• For JIT'd code, Linux perf already looks for an externally provided symbol file, /tmp/perf-PID.map, and warns if it doesn't exist
• This file can be created by a Java agent
QUESTION
I use Java primarily for writing pet projects, which are idle most of the time. After being idle for hours or days, the response time increases to seconds (up to 10 s), then slowly decreases back to 200-300 ms.
As far as I understand, this happens because of JIT deoptimization (optimized code becomes marked as a zombie, removed and later compiled again).
Is there any way to forbid the JVM from deoptimizing code unless the code cache is full? Java 9's AOT looks like the best solution for this case, but I still haven't managed to make it work.
UPD: And as always, the right solution is the obvious one. Looks like the problem was actually caused by swap. Despite 12 GB of ram, 6 of which were free, about 100 MB of every JVM's memory was swapped to HDD after a while.
Nevertheless, @apangin's answer can be useful for someone else who runs into the same situation, so I'm leaving this question here. Thanks all!
...ANSWER
Answered 2017-Apr-18 at 08:03
-XX:-UseCodeCacheFlushing disables sweeping compiled methods altogether.
Though this is the answer to the given question, I highly doubt this will resolve your original problem.
When an application is idle, the NMethod sweeper is also idle. It is also unlikely that JIT compilation is so slow that it takes tens of seconds before the hot code is (re-)compiled. Flushed file caches, stale network connections, etc. are more likely reasons for such slowdowns.
QUESTION
I'm currently trying to build some code execution optimization for a contest, and was looking at the ObjectPool pattern to favor object reuse instead of new object instantiation.
I've put together a small project (with a single test class) to investigate some of the things I see and don't understand.
What I'm doing:
- compare the creation of very simple objects for 5 000 000 iterations using both the new() and Pool.get() operations
- vary three axes, running all tests with and without:
- a "warmup" that runs the loop once before taking the measurements
- assigning the newly created object to a local variable and using it for some computation
- using fixed vs. random parameters as arguments
The results I have are:
Figures are for new instantiation vs. object pool, for 5 000 000 iterations:
without_warmup_without_new_object_use_with_random_parameters: 417 vs 457
without_warmup_without_new_object_use_with_fixed_parameters: 11 vs 84
without_warmup_with_new_object_use_with_random_parameters: 515 vs 493
without_warmup_with_new_object_use_with_fixed_parameters: 64 vs 90
with_warmup_without_new_object_use_with_random_parameters: 284 vs 419
with_warmup_without_new_object_use_with_fixed_parameters: 8 vs 55
with_warmup_with_new_object_use_with_random_parameters: 410 vs 397
with_warmup_with_new_object_use_with_fixed_parameters: 69 vs 82
What I notice from that:
- Using fixed parameters has a huge impact when instantiating a new object without reusing it. My guess was that the compiler was doing some kind of optimization and, finding no side effects, would remove the object instantiation altogether, but comparing the performance with an empty loop shows that something still happens
- Using fixed parameters has a significant (though less pronounced) impact on the speed of new Object(), making it faster than the object-pool version in some cases
- The object pool is faster in the "real life" scenarios (i.e., reusing the new objects and using somewhat random params), but not in most of the other cases, which also hints at a compiler optimization.
What I'm looking for here is to understand these results, and get pointers to docs / books that I could read to get a good knowledge of what happens behind the scenes in these cases.
Thanks!
...ANSWER
Answered 2017-Jan-28 at 14:45
As mentioned in the comment by Mike Nakis, the difference between your tests with random parameters and those with fixed parameters is entirely due to the expense of generating the random numbers. A fairer test might be to generate a 10-million-entry array of random integers (one for each parameter needed to initialise a Point) before entering the loop, and compare that to a 10-million-entry array of numbers picked by you (i.e., 1 and 2). That way you are comparing like for like, without including the expense of random number generation in your test results.
The reason why your pool is performing worse than initialising new objects each time (at least in terms of execution time) is that the object you are storing in your pool is relatively trivial and takes next to no time to initialise. As such, the conditional statement that you are evaluating:
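The answer's snippet was not preserved here; as a sketch of the kind of pool being discussed (names are mine), the conditional on every get() is the cost that a trivially cheap constructor can undercut:

import java.util.ArrayDeque;
import java.util.Deque;

public class PointPool {

    public static final class Point {
        public int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    private final Deque<Point> free = new ArrayDeque<>();

    public Point get(int x, int y) {
        Point p = free.poll();        // the conditional path: pool hit or miss
        if (p == null) {
            return new Point(x, y);   // miss: fall back to plain allocation
        }
        p.x = x;                      // hit: reinitialize the recycled object
        p.y = y;
        return p;
    }

    public void release(Point p) {
        free.push(p);
    }
}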
QUESTION
The user uploads a huge file consisting of 1 million words. I parse the file and put each line of the file into a LinkedHashMap.
I need O(1) access and removal by key. Also, I need to preserve the access order, iterate from any position and sort.
The memory consumption is huge. I enabled the String deduplication feature that appeared in Java 8, but it turns out that the LinkedHashMap consumes most of the memory.
I found that a LinkedHashMap.Entry consumes 40 bytes, even though there are only 2 extra pointers - one for the next entry and one for the previous entry. I thought a pointer should be 64 or 32 bits. But if I divide 409,405,320 (bytes) by 6,823,422 (entry count), I get 60 bytes per entry.
I think I don't need the previous pointer; the next pointer should be enough to keep the order. Why does LinkedHashMap consume so much memory? How can I reduce the memory consumption?
ANSWER
Answered 2017-Jan-05 at 03:22
How to reduce memory consumption?
1) Add the -XX:+UseCompressedOops flag to your JVM startup.
2) Implement your own version of LinkedHashMap, optimized for your needs: i.e., use a primitive int as the key instead of Integer, remove the "previous" pointer if you don't need it, etc. Note that copying the OpenJDK source might be impossible unless you are willing to release your modified hash map implementation under the GPLv2 license, because OpenJDK is GPLv2. However, you can copy and modify the LinkedHashMap implementation from the Android Open Source Project, because it is Apache licensed.
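As a sketch of option 2 (names are mine, not OpenJDK code): java.util.LinkedHashMap.Entry carries an object header, hash, boxed key, value, bucket next pointer, and before/after order links; a slimmed-down entry can drop the boxed key and the backward link:

// Singly-linked, primitive-keyed entry: saves the boxed Integer key and
// the "before" reference compared with LinkedHashMap.Entry.
final class SlimEntry<V> {
    final int key;        // primitive key instead of a boxed Integer
    V value;
    SlimEntry<V> next;    // hash-bucket chain
    SlimEntry<V> after;   // single insertion-order link; no backward pointer

    SlimEntry(int key, V value) {
        this.key = key;
        this.value = value;
    }
}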
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install java-performance
You can use java-performance like any standard Java library. Please include the jar files in your classpath. You can also use any IDE to run and debug the java-performance component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org; for Gradle installation, please refer to gradle.org.