bloat | analyze code size via nm/objdump output | Code Analyzer library
kandi X-RAY | bloat Summary
Generate webtreemap-compatible JSON summaries of binary size.
Top functions reviewed by kandi
- Parse a C++ compiler name.
- Given a list of symbols, return a dictionary.
- Parse an nm output file.
- Serialize a tree structure.
- Parse the objdump output.
- Format a list of sections.
- Clean up suffixes.
- Convert a symbol type to a human-readable string.
- Format a number of bytes.
- Dump the sections of the objdump output.
Community Discussions
Trending Discussions on bloat
QUESTION
If I run git fetch origin and then git checkout on a series of consecutive commits, I get a relatively small repo directory. But if I run git fetch origin and then git checkout FETCH_HEAD on the same series of commits, the directory is relatively bloated. Specifically, there seem to be a bunch of large packfiles.
The behavior appears the same whether the commits are all in place at the time of the first fetch or whether they are committed immediately before each fetch.
The following examples use a public repo, so you can reproduce the behavior.
Why is the directory size of example 2 so much larger?
Example 1 (small):
...ANSWER
Answered 2022-Mar-25 at 19:08
Because each fetch produces its own packfile, and one packfile is more efficient than multiple packfiles. A lot more efficient. How?
First, the checkouts are a red herring. They don't affect the size of the .git/ directory.
Second, in the first example only the first git fetch origin does anything. The rest will fetch nothing (unless something changed on origin).
Compression works by finding common long sequences within the data and reducing them to very short sequences. If "long block of legal mumbo jumbo" appears dozens of times, it can be replaced with a few bytes, but the original long string must still be stored. If there's a single packfile, it only has to be stored once. If there are multiple packfiles, it has to be stored multiple times. You are, effectively, storing the whole history of changes up to that point in each packfile.
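Git's packfiles use delta compression rather than plain zlib, but the effect is easy to mimic. The sketch below (hypothetical data, Python's zlib standing in for Git's compressor) shows that compressing four near-identical snapshots as one stream is far cheaper than compressing each one separately, which is essentially the single-packfile vs. many-packfiles difference:

```python
import random
import zlib

random.seed(0)
# Hypothetical file history: every snapshot repeats the same large,
# hard-to-compress base content plus a tiny change at the end.
base = bytes(random.getrandbits(8) for _ in range(20_000))
snapshots = [base + b"change %d\n" % i for i in range(4)]

one_packfile = len(zlib.compress(b"".join(snapshots)))          # fetched in one go
many_packfiles = sum(len(zlib.compress(s)) for s in snapshots)  # one per fetch

# The joint stream stores the shared content roughly once; the separate
# streams each store it again, so their total comes out several times larger.
print(one_packfile, many_packfiles)
```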
We can see in the example below that the first packfile is 113M, the second is 161M, the third is 177M, and the final fetch is 209M. The size of the final packfile is roughly equal to the size of the single garbage-collected packfile.
Why do multiple fetches result in multiple packfiles? git fetch is very efficient. It will only fetch objects you do not already have. Sending individual object files is inefficient, so a smart Git server will send them as a single packfile.
When you do a single git fetch on a fresh repository, Git asks the server for every object. The remote sends it a packfile of every object.
When you do git fetch ABC and then git fetch DEF, Git tells the server "I already have everything up to ABC, give me all the objects up to DEF", so the server makes a new packfile of everything from ABC to DEF and sends it.
Eventually your repository will do an automatic garbage collection and repack these into a single packfile.
We can reduce the examples. I'm going to use Rails to illustrate because it has clearly defined tags to fetch.
QUESTION
What is the simplest vertical spacer in Outlook (and supported everywhere else)?
I have two elements, one on top of the other, both with display:block. I would like to space them without wrapping either in a table. I want a spacer that I can reuse anywhere with only simple adjustments to its height. I need to be able to set it to specific pixel heights. I also need to be able to override the height with a media query to make it responsive.
ANSWER
Answered 2022-Feb-23 at 13:02
For an application-specific spacer you could use:
QUESTION
We have a microservice which consumes (subscribes to) messages from 50+ RabbitMQ queues.
Messages are produced for these queues in two places:
- When the application process encounters short-delay business logic (like sending emails or notifying another service), it sends the message directly to the exchange (which in turn routes it to the queue).
- When we encounter long/delayed-execution business logic, we have a messages table which holds entries for messages that have to be executed after some time. A cron worker runs every 10 minutes, scans the messages table, and pushes those messages to RabbitMQ.
Let's say the messages table has 10,000 messages which will be queued in the next cron run:
- 9:00 AM - The cron worker runs and queues 10,000 messages to the RabbitMQ queue.
- Subscribers listening to the queue start consuming the messages, but due to some issue in the system or a slow third-party response, each message takes 1 minute to complete.
- 9:10 AM - The cron worker runs again, sees that 9,000+ messages have still not completed while the time window has passed, and so pushes 9,000+ duplicate messages to the queue.
Note: the subscribers which consume the messages are idempotent, so duplicate processing is not itself a problem.
Design idea I had in mind (not the best logic): I can have 4 statuses (RequiresQueuing, Queued, Completed, Failed).
- Whenever a message is inserted, I can set the status to RequiresQueuing.
- When the cron worker picks the message up and pushes it successfully to the queue, I can set it to Queued.
- When a subscriber completes it, it marks the status as Completed / Failed.
There is an issue with the above logic: let's say RabbitMQ somehow goes down, or in some cases we have to purge the queue for maintenance. Now the messages which are marked as Queued are in the wrong state, because they have to be identified again and their status changed manually.
Let's say I have a RabbitMQ queue named events. This events queue has 5 subscribers; each subscriber gets 1 message from the queue and posts the event via a REST API to another microservice (event-aggregator). Each API call usually takes 50 ms.
Use case:
- Due to high load, the number of events produced becomes 3x.
- The microservice (event-aggregator) which accepts the events also becomes slow in processing; its response time increases from 50 ms to 1 minute.
- The cron worker follows the design mentioned above and keeps queuing messages every run. Now the queue is becoming too large, but I also cannot increase the number of subscribers, because the dependent microservice (event-aggregator) is lagging.
Now the question is: if I keep sending messages to the events queue, it just bloats the queue.
https://www.rabbitmq.com/memory.html - While reading this page, I found out that RabbitMQ won't even accept connections once it reaches the high memory watermark fraction (default 40%). Of course this can be changed, but that requires manual intervention.
So if the queue length increases it affects RabbitMQ's memory, which is the reason I thought of throttling at the producer level.
Questions:
- How can I throttle my cron worker to skip a particular run, or somehow inspect the queue and recognize that it is already heavily loaded so that it doesn't push the messages?
- How can I handle the use cases described above? Is there a design which solves my problem? Has anyone faced the same issue?
Thanks in advance.
Answer: check the accepted answer's comments for throttling using queueCount.
...ANSWER
Answered 2022-Feb-21 at 04:45
You can combine QoS (quality of service) and manual ACKs to get around this problem. Your exact scenario is documented in https://www.rabbitmq.com/tutorials/tutorial-two-python.html. That example is for Python; you can refer to the other language examples as well.
Let's say you have 1 publisher and 5 worker scripts, all reading from the same queue, and each worker script takes 1 minute to process a message. You can set QoS at the channel level. If you set it to 1, each worker script will be allocated only 1 message at a time, so we are processing 5 messages at once. No new messages will be delivered until one of the 5 worker scripts sends a manual ACK.
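The tutorial's actual code is not reproduced here; the following is a minimal pika sketch of the idea, assuming a local broker, a queue named events (from the question), and a hypothetical process() handler:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="events", durable=True)

# Deliver at most one unacknowledged message to this worker at a time.
channel.basic_qos(prefetch_count=1)

def on_message(ch, method, properties, body):
    process(body)  # hypothetical handler; may take ~1 minute
    # The manual ACK releases this worker for the next message.
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="events", on_message_callback=on_message)
channel.start_consuming()
```

With prefetch_count=1, the broker holds back the backlog and hands each worker exactly one message at a time, regardless of how many messages the cron job has published.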
If you want to increase the throughput of message processing, you can increase the number of worker nodes.
The idea of updating tables based on message status is not a good option: avoiding DB polling is one of the main reasons systems use queues in the first place, and it would cause a scaling issue. At some point you have to update the tables, and you would bottleneck because of locking and isolation levels.
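For the throttling question specifically, the cron worker can ask the broker how deep the queue already is before publishing. A hedged sketch, again with pika, a hypothetical MAX_BACKLOG threshold, and a hypothetical publish_pending_messages() helper:

```python
import pika

MAX_BACKLOG = 5000  # hypothetical threshold; tune to your consumers' throughput

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# passive=True only inspects the queue (it raises if the queue does not exist).
state = channel.queue_declare(queue="events", passive=True)
backlog = state.method.message_count  # ready (undelivered) messages

if backlog > MAX_BACKLOG:
    # Skip this cron run; the previous batch is still being worked off.
    print(f"Skipping publish: {backlog} messages still queued")
else:
    publish_pending_messages(channel)  # hypothetical: read the messages table and publish
```

Note that message_count only covers messages not yet delivered to a consumer, so it is a lower bound on the real backlog, but it is usually enough to decide whether to skip a run.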
QUESTION
I need to filter a dataset according to multiple, mutually exclusive conditions. The xor operator seems useful for this case, but it feels a bit awkward to use in the dplyr::filter function. The other logical operators (|, &, ==, etc.) allow me to chain the comparisons, but I have not found a way to do that with xor. Here are the two approaches I could think of:
ANSWER
Answered 2022-Feb-16 at 14:58
Use the explicit conversion of booleans to integers and just keep the rows where the vectorized sum of the 3 logical checks you're doing is 1.
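The question and answer are about R's dplyr, but the same trick reads naturally in pandas; here is a rough analogue with hypothetical data and conditions:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 5, 10, 20], "y": [0, 1, 0, 1]})  # hypothetical data

# Three possibly-overlapping logical checks (hypothetical):
c1 = df["x"] > 8
c2 = df["y"] == 1
c3 = df["x"] < 3

# Keep only the rows where exactly one of the conditions holds.
exactly_one = (c1.astype(int) + c2.astype(int) + c3.astype(int)) == 1
filtered = df[exactly_one]
print(filtered)
```

Summing the boolean checks and comparing the total to 1 generalizes to any number of conditions, which is what makes it less awkward than chaining xor.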
QUESTION
I am not really getting any smarter from these error messages.
...ANSWER
Answered 2022-Feb-16 at 08:42
{1,2,3} can be multiple things, and make_shared has no way of knowing what it is at the time the parameter pack is expanded.
If you don't want to state the long std::initializer_list<int>{1,2,3} explicitly, the easiest solutions would be:
a. shortening the type's name: using ints = std::initializer_list<int>;
b. wrapping the call:
QUESTION
I'm using the Pundit gem for my authorization classes, where each controller action is checked against the model policy to see if the action is allowed for the user.
These methods sometimes become quite bloated and unreadable, because I'm checking quite a lot of things for some objects.
Now I'm thinking of refactoring those methods and placing every "validation" in its own method:
Previous:
...ANSWER
Answered 2022-Feb-11 at 18:42
What you can do is chain && operators. As soon as one is false, Ruby will not evaluate the others (and the update method will return false).
QUESTION
For such a function, clang (and sometimes gcc, in certain contexts that I cannot reproduce minimally) seems to generate bloated code when the -mavx2 switch is on.
ANSWER
Answered 2022-Jan-13 at 17:32
It's auto-vectorizing as well as unrolling, which is a performance win for large arrays (or would be if clang had less overhead), at least on Intel CPUs where popcnt is 1/clock, so 64 bits per clock. (AMD Zen has 3 or 4/clock popcnt, so with add instructions taking an equal amount of the 4 scalar-integer ALU ports, it could sustain 2/clock uint64_t popcnt+load and add.) https://uops.info/
But vpshufb is also 1/clock on Intel (or 2/clock on Ice Lake), and if it's the bottleneck that's 128 bits of popcount work per clock (doing table lookups for the low 4 bits of each of 32 bytes). But it's certainly not going to be that good, with all the extra shuffling it's doing inside the loop. :/
This vectorization loses on Zen1 where the SIMD ALUs are only 256 bits wide, but should be a significant win on Intel, and maybe a win on Zen2 and later.
But it looks like clang widens to 32-bit counts inside the inner loop with vpsadbw, so it's not as good as it could be. 1024x uint64_t is 256 __m256i vectors of input data, and clang is unrolling by 4, so the max count in any one element is only 64, which can't overflow.
Clang is unrolling a surprising amount, given how much work it does. The vextracti128 and vpackusdw don't make much sense to me; IDK why it would do that inside the loop. The simple way to vectorize without overflow risk is just vpsadbw -> vpaddq or vpaddd, and it's already using vpsadbw for horizontal byte sums within 8-byte chunks. (A better way is to defer that until just before the byte elements could overflow, so do a few vpaddb first, like in How to count character occurrences using SIMD, although the byte counters are only incremented by 0 or 1 there, rather than 0..8.)
See Counting 1 bits (population count) on large data using AVX-512 or AVX-2, especially Wojciech Muła's big-array popcnt functions: https://github.com/WojciechMula/sse-popcount/ - clang is using the same strategy as popcnt_AVX2_lookup, but with a much less efficient way to accumulate the results across iterations.
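For readers unfamiliar with the lookup-table strategy behind popcnt_AVX2_lookup, here is a scalar Python sketch of the nibble-LUT idea; the AVX2 version performs 32 of these 4-bit lookups per vpshufb and accumulates the byte counts with vpsadbw, but the table itself is the same:

```python
# Popcount of every possible 4-bit value: the 16-entry table that vpshufb indexes.
LUT = [bin(v).count("1") for v in range(16)]

def popcount_buffer(data: bytes) -> int:
    """Scalar equivalent of the nibble lookup-table popcount."""
    total = 0
    for b in data:
        total += LUT[b & 0x0F] + LUT[b >> 4]  # low nibble + high nibble
    return total

assert popcount_buffer(bytes([0xFF, 0x01, 0x00])) == 9
```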
QUESTION
I am developing a C++ application where the program runs endlessly, allocating and freeing millions of strings (char*) over time, and RAM usage is a serious consideration. RAM usage keeps getting higher and higher over time. I think the problem is heap fragmentation, and I really need to find a solution.
You can see in the image that after millions of allocations and frees, the usage just keeps increasing. And given the way I am testing it, I know for a fact that the data it stores is not increasing. I can guess that you will ask, "How are you sure of that?", "How are you sure it's not just a memory leak?" Well.
This test ran much longer. I call malloc_trim(0) whenever possible in my program, and it seems the application can finally return the unused memory to the OS; usage drops almost to zero (the actual data size my program currently holds). This implies the problem is not a memory leak. But I can't rely on this behavior: the allocation and freeing pattern of my program is random, so what if it never releases the memory?
- I said in the title that memory pools are a bad idea for this project. Of course I don't know that for certain, but the strings I am allocating can be anything between 30 and 4000 bytes, which makes many optimizations and clever ideas much harder. Memory pools are one of them.
- I am using GCC 11 / G++ 11 as the compiler, so if some old versions have bad allocators, I shouldn't have that problem.
- How am I getting memory usage? The Python psutil module: proc.memory_full_info()[0], which gives me RSS.
- Of course, you don't know the details of my program, so it's a valid question whether this is really heap fragmentation. What I can say is that I keep up-to-date information about how many allocations and frees took place, and I know the element counts of every container in my program. But if you still have ideas about the causes of the problem, I am open to suggestions.
- I can't just allocate, say, 4096 bytes for every string so that it becomes easier to optimize. That's the opposite of what I am trying to do.
So my question is: what do programmers do (what should I do) in an application where millions of allocs and frees of different sizes take place over time, making memory pools hard to use efficiently? I can't change what the program does; I can only change implementation details.
Bounty edit: When trying to use memory pools, isn't it possible to make multiple of them, to the point that there is a pool for every possible byte count? For example, my strings can be anywhere between 30 and 4000 bytes, so couldn't somebody make 4000 - 30 + 1 = 3971 memory pools, one for each possible allocation size in the program? Isn't this applicable? All pools could start small (so as not to lose much memory) and then grow, balancing performance and memory. I am not trying to make use of a memory pool's ability to reserve big spaces beforehand; I am just trying to effectively reuse freed space, given the frequent allocs and frees.
Last edit: It turns out that the memory growth appearing in the graphs was actually from an HTTP request queue in my program. I failed to see that the hundreds of thousands of tests I ran bloated this queue (something like a webhook). The reasonable explanation of figure 2 is that I finally got DDoS-banned from the server (or couldn't open a connection anymore for some reason), the queue emptied, and the RAM issue resolved itself. So, anyone reading this question later: consider every possibility. It would never have crossed my mind that it was something like this. Not a memory leak, but an implementation detail. Still, I think @Hajo Kirchhoff deserves the bounty; his answer was really enlightening.
...ANSWER
Answered 2022-Jan-09 at 12:25
If everything really is/works as you say it does and there is no bug you have not yet found, then try this:
malloc and other memory allocators usually work in chunks of 16 bytes anyway, even if the actual requested size is smaller. So you only need 4000/16 - 30/16 ≈ 250 different memory pools.
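The arithmetic is easy to check: rounding every request up to a 16-byte boundary collapses the 3971 possible sizes into roughly 250 distinct pool sizes. A small sketch of that bucketing (the constants simply restate the answer's assumptions):

```python
CHUNK = 16                     # allocator granularity assumed by the answer
MIN_SIZE, MAX_SIZE = 30, 4000  # string sizes from the question

def bucket_for(size: int) -> int:
    """Round a request up to the next 16-byte multiple, i.e. its pool's block size."""
    return ((size + CHUNK - 1) // CHUNK) * CHUNK

buckets = {bucket_for(s) for s in range(MIN_SIZE, MAX_SIZE + 1)}
print(len(buckets))            # 249 distinct pool sizes instead of 3971
```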
QUESTION
With Python, I wanted to format a string of hex characters:
- spaces between each byte (easy enough): 2f2f -> 2f 2f
- line breaks at a specified max byte width (not hard): 2f 2f 2f 2f 2f 2f 2f 2f\n
- address ranges for each line (doable): 0x7f8-0x808: 2f 2f 2f 2f 2f 2f 2f 2f\n
- replace large ranges of sequential 00 bytes with: ... trimmed 35 x 00 bytes [0x7 - 0x2a] ...
... it was at this point that I knew I was doing some bad coding. The function got bloated and hard to follow. Too many features piled up in a non-intuitive way.
Example output:
...ANSWER
Answered 2021-Dec-23 at 11:16
I would suggest not starting a "trimmed 00 bytes" series in the middle of an output line, but only applying this compacting when it covers complete output lines containing only zeroes.
This means that you will still see non-compacted zeroes in a line that also contains non-zeroes, but in my opinion this results in a cleaner output format. For instance, if a line ends with just two 00 bytes, it really does not help to replace that last part of the line with the longer "trimmed 2 x 00 bytes" message. By only replacing complete 00-lines with this message, and compressing multiple such lines into one message, the output format seems cleaner.
To produce that output format, I would use the power of regular expressions:
- to identify a block of bytes to be output on one line: either a line with at least one non-zero byte, or a range of zero bytes which either runs to the end of the input or is a multiple of the "byte width" argument;
- to insert spaces into a line of bytes.
All this can be done through iterations in one expression:
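The answer's single-regex implementation is not reproduced here; as a rough sketch of the behaviour it describes (compacting only complete all-zero lines, with simplified inclusive address ranges, both my own assumptions), something like this could work:

```python
import re

def format_hex(hexstr: str, width: int = 8, base: int = 0x0) -> str:
    """Sketch: width bytes per output line, whole all-zero lines collapsed into one message."""
    byte_pairs = re.findall(r"[0-9a-fA-F]{2}", hexstr)
    lines = [byte_pairs[i:i + width] for i in range(0, len(byte_pairs), width)]

    out, i = [], 0
    while i < len(lines):
        # Collect a run of output lines that consist entirely of "00" bytes.
        j = i
        while j < len(lines) and all(b == "00" for b in lines[j]):
            j += 1
        if j > i:
            count = sum(len(line) for line in lines[i:j])
            start = base + i * width
            out.append(f"... trimmed {count} x 00 bytes [{start:#x} - {start + count - 1:#x}] ...")
        else:
            start = base + i * width
            out.append(f"{start:#x}-{start + len(lines[i]) - 1:#x}: " + " ".join(lines[i]))
            j = i + 1
        i = j
    return "\n".join(out)

print(format_hex("2f2f" + "00" * 35 + "2f2f", width=8, base=0x7F8))
```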
QUESTION
I have been following this tutorial for creating a variadic structure, which is nearly identical to another tutorial on creating a rudimentary tuple from scratch. Unfortunately, when I analyze the variadic structure it seems very inefficient. The size of the structure seems bloated, as in the struct's size does not seem to match its member layout. It doesn't seem like byte alignment is the issue, since actual tuples do not suffer from this effect, so I was wondering how they get around it, or what I am doing wrong in my struct.
Below is the code I have been using to test the variadic struct:
...ANSWER
Answered 2021-Dec-11 at 20:12
Even an empty class needs space to store itself, so the minimum size of a class is 1. As your no-argument DataStructure class is empty and is a member, it takes up space and causes the rest of the members to take more space to allow for alignment. Making the base non-empty fixes the issue:
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install bloat
- Check out a copy of webtreemap in a webtreemap subdirectory: git clone git://github.com/martine/webtreemap.git
- Build your binary with the -g flag to get symbols.
- Run ./bloat.py --help and generate nm.out as instructed there.
- Example command line: ./bloat.py --strip-prefix=/path/to/src syms > bloat.json