o2 | 2D Game Engine with visual WYSIWYG editor | Game Engine library

by zenkovich | C | Version: Current | License: MIT

kandi X-RAY | o2 Summary

o2 is a C library typically used in Gaming, Game Engine, Unity, WebGL applications. o2 has no bugs, it has no vulnerabilities, it has a Permissive License and it has low support. You can download it from GitHub.

o2 is an open-source technology for easily making 2D games and applications for mobile and PC platforms using C++ and Lua, with a very flexible editor. Its main features are performance, usability and effective development. The project is a work in progress.

Support

o2 has a low active ecosystem.

It has 136 star(s) with 10 fork(s). There are 12 watchers for this library.

It had no major release in the last 6 months.

There are 0 open issues and 28 have been closed. On average issues are closed in 1014 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of o2 is current.

Quality

o2 has 0 bugs and 0 code smells.

Security

o2 has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

o2 code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

o2 is licensed under the MIT License. This license is Permissive.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

o2 releases are not available. You will need to build from source code and install.

It has 22059 lines of code, 340 functions and 1311 files.

It has high code complexity. Code complexity directly impacts maintainability of the code.

o2 Key Features

No Key Features are available at this moment for o2.

o2 Examples and Code Snippets

Copies the canvas drawing-state properties from the o1 object to the o2 object.

javascript

Lines of Code : 19

License : No License

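// Copy drawing-state properties from context o1 to context o2.
function copyState(o1, o2) {
    o2.fillStyle     = o1.fillStyle;
    o2.lineCap       = o1.lineCap;
    o2.lineJoin      = o1.lineJoin;
    o2.lineWidth     = o1.lineWidth;
    o2.miterLimit    = o1.miterLimit;
    o2.shadowBlur    = o1.shadowBlur;
    // ... the remaining property copies are truncated in the source listing
}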

Community Discussions

Trending Discussions on o2

Why can compiler not optimize out unused static std::string?

package io/fs is not in GOROOT while building the go project

Is it legal to use an unexpanded parameter pack as the type of a template template parameter's non-type template parameter?

Why is the XOR swap optimized into a normal swap using the MOV instruction?

Wrong result of multiplication: Undefined behavior or compiler bug?

no warning for missing ctor initializer list?

Configuring compilers on Mac M1 (Big Sur, Monterey) for Rcpp and other tools

Bubble sort slower with -O3 than -O2 with GCC

Generate ARM thumb-2 assembly code from android app for Cortex M3 architecture

Assembly why is "lea eax, [eax + eax*const]; shl eax, eax, const;" combined faster than "imul eax, eax, const" according to gcc -O2?

QUESTION

Why can compiler not optimize out unused static std::string?

Asked 2022-Mar-18 at 06:44

If I compile this code with GCC or Clang and enable -O2 optimizations, I still get some global object initialization. Is it even possible for any code to reach these variables?

...

ANSWER

Answered 2022-Mar-18 at 06:44

Compiling that code with the short string optimization (SSO) may be equivalent to taking the address of one of std::string's member variables. The constructor has to analyze the string length at compile time and decide whether the string fits into the internal storage of the std::string object or whether memory has to be allocated dynamically. The compiler then has to discover that the string is never read before the allocation code can be optimized out.
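To make the SSO idea concrete, here is a minimal sketch that checks whether a std::string keeps its characters inside the object itself. The helper name stores_inline is hypothetical and is not part of the question or the answer, and the inline capacity is implementation-specific, so treat this as an illustration only.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns true when the character buffer of `s`
// lies inside the std::string object itself, i.e. SSO is in use.
// The threshold at which this flips depends on the standard library.
bool stores_inline(const std::string& s) {
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto obj_end   = obj_begin + sizeof(std::string);
    const auto data      = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj_begin && data < obj_end;
}
```

On common implementations a short literal such as "hi" is expected to be stored inline, while a 1000-character string has to live in dynamically allocated memory.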

The lack of optimization in this case might be an optimizer flaw limited to simple outlying examples such as this one:

Source https://stackoverflow.com/questions/71445432

QUESTION

package io/fs is not in GOROOT while building the go project

Asked 2022-Mar-14 at 19:15

I don't have much experience in Go, but I have been tasked with running a Go project :)

So I need to build the Go project and then execute it.

Below is the error I get when I build the Go project. It seems that some dependency (package io/fs) is missing.

...

ANSWER

Answered 2021-Aug-12 at 05:56

This package requires Go 1.16 or later. Please upgrade your Go version or use an appropriate Docker builder.

Source https://stackoverflow.com/questions/68752103

QUESTION

Is it legal to use an unexpanded parameter pack as the type of a template template parameter's non-type template parameter?

Asked 2022-Mar-13 at 12:00

gcc and clang disagree about whether the following code should compile:

...

ANSWER

Answered 2022-Mar-13 at 12:00

It is not valid because a type parameter pack cannot be expanded in its own parameter clause.

As from [temp.param]/17:

If a template-parameter is a type-parameter with an ellipsis prior to its optional identifier or is a parameter-declaration that declares a pack ([dcl.fct]), then the template-parameter is a template parameter pack. A template parameter pack that is a parameter-declaration whose type contains one or more unexpanded packs is a pack expansion. ... A template parameter pack that is a pack expansion shall not expand a template parameter pack declared in the same template-parameter-list.

So consider the following invalid example:

Source https://stackoverflow.com/questions/71453755

QUESTION

Why is the XOR swap optimized into a normal swap using the MOV instruction?

Asked 2022-Mar-08 at 10:00

While testing things in Compiler Explorer, I tried out the following overflow-free function for calculating the average of two unsigned 32-bit integers:

...

ANSWER

Answered 2022-Mar-08 at 10:00

Clang does the same thing. Probably for compiler-construction and CPU architecture reasons:

Disentangling that logic into just a swap may allow better optimization in some cases; definitely something it makes sense for a compiler to do early so it can follow values through the swap.
Xor-swap is total garbage for swapping registers, the only advantage being that it doesn't need a temporary. But xchg reg,reg already does that better.

I'm not surprised that GCC's optimizer recognizes the xor-swap pattern and disentangles it to follow the original values. In general, this makes constant-propagation and value-range optimizations possible through swaps, especially for cases where the swap wasn't conditional on the values of the vars being swapped. This pattern-recognition probably happens soon after transforming the program logic to GIMPLE (SSA) representation, so at that point it will forget that the original source ever used an xor swap, and not think about emitting asm that way.
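For reference, the two swap idioms being compared can be sketched as follows. The function names are hypothetical and this is not the code from the question; it only shows that both forms produce the same result, which is what allows a compiler to treat the xor pattern as an ordinary swap.

```cpp
#include <cstdint>

// Classic xor-swap: needs no temporary, but each step depends on the
// previous one. It also zeroes the value if a and b are the same object.
void xor_swap(std::uint32_t& a, std::uint32_t& b) {
    a ^= b;
    b ^= a;
    a ^= b;
}

// Plain swap through a temporary, the form the optimizer reasons about.
void tmp_swap(std::uint32_t& a, std::uint32_t& b) {
    const std::uint32_t t = a;
    a = b;
    b = t;
}
```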

Hopefully sometimes that lets it then optimize down to only a single mov, or two movs, depending on register allocation for the surrounding code (e.g. if one of the vars can move to a new register, instead of having to end up back in the original locations). And whether both variables are actually used later, or only one. Or if it can fully disentangle an unconditional swap, maybe no mov instructions.

But worst case, three mov instructions needing a temporary register is still better, unless it's running out of registers. I'd guess GCC is not smart enough to use xchg reg,reg instead of spilling something else or saving/restoring another tmp reg, so there might be corner cases where this optimization actually hurts.

(Apparently GCC -Os does have a peephole optimization to use xchg reg,reg instead of 3x mov: PR 92549 was fixed for GCC10. It looks for that quite late, during RTL -> assembly. And yes, it works here: turning your xor-swap into an xchg: https://godbolt.org/z/zs969xh47)

xor-swap has worse latency and defeats mov-elimination

From the question: "With no memory reads, and the same number of instructions, I don't see any bad impacts and feels odd that it be changed. Clearly there is something I did not think through though, but what is it?"

Instruction count is only a rough proxy for one of three things that are relevant for perf analysis: front-end uops, latency, and back-end execution ports. (And machine-code size in bytes: x86 machine-code instructions are variable-length.)

It's the same size in machine-code bytes, and same number of front-end uops, but the critical-path latency is worse: 3 cycles from input a to output a for xor-swap, and 2 from input b to output a, for example.

MOV-swap has at worst 1-cycle and 2-cycle latencies from inputs to outputs, or less with mov-elimination. (Which can also avoid using back-end execution ports, especially relevant for CPUs like IvyBridge and Tiger Lake with a front-end wider than the number of integer ALU ports. And Ice Lake, except Intel disabled mov-elimination on it as an erratum workaround; not sure if it's re-enabled for Tiger Lake or not.)

Also related:

Why is XCHG reg, reg a 3 micro-op instruction on modern Intel architectures? - and those 3 uops can't benefit from mov-elimination. But on modern AMD xchg reg,reg is only 2 uops.

If you're going to branch, just duplicate the averaging code

GCC's real missed optimization here (even with -O3) is that tail-duplication results in about the same static code size, just a couple extra bytes since these are mostly 2-byte instructions. The big win is that the a path then becomes the same length as the other, instead of twice as long to first do a swap and then run the same 3 uops for averaging.
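The idea of duplicating the averaging code on both sides of the branch can be sketched like this. It assumes the function in question computes smaller + (larger - smaller) / 2, which is an assumption because the original code is not reproduced here; the name average is likewise hypothetical.

```cpp
#include <cstdint>

// Overflow-free average of two unsigned 32-bit integers.
// The averaging expression is written once per branch, so neither
// path needs a swap before it can start computing.
std::uint32_t average(std::uint32_t a, std::uint32_t b) {
    if (a >= b) {
        return b + (a - b) / 2;
    }
    return a + (b - a) / 2;
}
```

Both paths are the same length, and the result rounds toward the smaller input when the difference is odd.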

update: GCC will do this for you with -ftracer (https://godbolt.org/z/es7a3bEPv), optimizing away the swap. (That's only enabled manually or as part of -fprofile-use, not at -O3, so it's probably not a good idea to use all the time without PGO, potentially bloating machine code in cold functions / code-paths.)

Doing it manually in the source (Godbolt):

Source https://stackoverflow.com/questions/71382441

QUESTION

Wrong result of multiplication: Undefined behavior or compiler bug?

Asked 2022-Feb-18 at 23:52

Background
While debugging a problem in a numerical library, I was able to pinpoint the first place where the numbers started to become incorrect. However, the C++ code itself seemed correct. So I looked at the assembly produced by Visual Studio's C++ compiler and started suspecting a compiler bug.
Code
I was able to reproduce the behavior in a strongly simplified, isolated version of the code:

sourceB.cpp:
...

ANSWER

Answered 2022-Feb-18 at 23:52

Even though nobody posted an answer, from the comment section I could conclude that:

Nobody found any undefined behavior in the bug repro code.

At least some of you were able to reproduce the undesired behavior.

So I filed a bug report against Visual Studio 2019.

The Microsoft team confirmed the problem.

However, unfortunately it seems like Visual Studio 2019 will not receive a bug fix because Visual Studio 2022 seemingly does not have the bug. Apparently, the most recent version not having that particular bug is good enough for Microsoft's quality standards.

I find this disappointing because I think that the correctness of a compiler is essential and Visual Studio 2022 has just been released with new features and therefore probably contains new bugs. So there is no real "stable version" (one is cutting edge, the other one doesn't get bug fixes). But I guess we have to live with that or choose a different, more stable compiler.

Source https://stackoverflow.com/questions/70823697

QUESTION

no warning for missing ctor initializer list?

Asked 2022-Feb-12 at 22:38

This code is missing a constructor initializer list:

...

ANSWER

Answered 2022-Feb-10 at 13:48

You could add -Weffc++ to catch it (inspired by Scott Meyers' book "Effective C++"). Strangely enough, it does not refer to any other -W option (and neither does clang++).

The option is however considered, by some, a bit outdated by now, but in this case, it's finding a real problem.
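As an illustration of the kind of code this option targets, consider the sketch below. Both classes are hypothetical and are not the code from the question. With g++ -Weffc++, the first constructor is expected to produce a warning that count_ should be initialized in the member initialization list, while the second should compile without that warning.

```cpp
// Hypothetical example: the constructor has no initializer list,
// so count_ is left uninitialized.
class CounterNoInit {
public:
    CounterNoInit() {}
    int count() const { return count_; }
private:
    int count_;
};

// Same class with the member set in the constructor initializer list.
class Counter {
public:
    Counter() : count_(0) {}
    int count() const { return count_; }
private:
    int count_;
};
```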

Source https://stackoverflow.com/questions/71065942

QUESTION

Configuring compilers on Mac M1 (Big Sur, Monterey) for Rcpp and other tools

Asked 2022-Feb-10 at 21:07

I'm trying to use packages that require Rcpp in R on my M1 Mac, which I was never able to get up and running after purchasing this computer. I updated it to Monterey in the hope that this would fix some installation issues but it hasn't. I tried running the Rcpp check from this page but I get the following error:

...

ANSWER

Answered 2022-Feb-10 at 21:07

Background
Currently (2022-02-05), CRAN builds R binaries for Apple silicon using Apple clang (from Command Line Tools for Xcode 12.4) and an experimental build of gfortran.

If you obtain R from CRAN (i.e., here), then you need to replicate CRAN's compiler setup on your system before building R packages that contain C/C++/Fortran code from their sources (and before using Rcpp, etc.). This requirement ensures that your package builds are compatible with R itself.

A further complication is the fact that Apple clang doesn't support OpenMP, so you need to do even more work to compile programs that make use of multithreading. You could circumvent the issue by building R itself and all R packages from sources with LLVM clang, which does support OpenMP, but this approach is onerous and "for experts only". There is another approach that has been tested by a few people, including Simon Urbanek, the maintainer of R for macOS. It is experimental and also "for experts only", but seems to work on my machine and is simpler than trying to build R yourself.
Instructions for obtaining a working toolchain
Warning: These instructions come with no warranty and could break at any time. They assume some level of familiarity with C/C++/Fortran program compilation, Makefile syntax, and Unix shells. As usual, sudo at your own risk.

I will try to address compilers and OpenMP support at the same time. I am going to assume that you are starting from nothing. Feel free to skip steps you've already taken, though you might find a fresh start helpful.

I've tested these instructions on a machine running Big Sur, and at least one person has tested them on a machine running Monterey. I would be glad to hear from others.

Download an R binary from CRAN here and install. Be sure to select the binary built for Apple silicon.

Run

Source https://stackoverflow.com/questions/70638118

QUESTION

Bubble sort slower with -O3 than -O2 with GCC

Asked 2022-Jan-21 at 02:41

I made a bubble sort implementation in C, and was testing its performance when I noticed that the -O3 flag made it run even slower than no flags at all! Meanwhile -O2 was making it run a lot faster as expected.

Without optimisations:
...

ANSWER

Answered 2021-Oct-27 at 19:53

It looks like GCC's naïveté about store-forwarding stalls is hurting its auto-vectorization strategy here. See also Store forwarding by example for some practical benchmarks on Intel with hardware performance counters, and What are the costs of failed store-to-load forwarding on x86? Also Agner Fog's x86 optimization guides.

(gcc -O3 enables -ftree-vectorize and a few other options not included by -O2, e.g. if-conversion to branchless cmov, which is another way -O3 can hurt with data patterns GCC didn't expect. By comparison, Clang enables auto-vectorization even at -O2, although some of its optimizations are still only on at -O3.)

It's doing 64-bit loads (and branching to store or not) on pairs of ints. This means, if we swapped the last iteration, this load comes half from that store, half from fresh memory, so we get a store-forwarding stall after every swap. But bubble sort often has long chains of swapping every iteration as an element bubbles far, so this is really bad.

(Bubble sort is bad in general, especially if implemented naively without keeping the previous iteration's second element around in a register. It can be interesting to analyze the asm details of exactly why it sucks, so it is fair enough for wanting to try.)

Anyway, this is pretty clearly an anti-optimization you should report on GCC Bugzilla with the "missed-optimization" keyword. Scalar loads are cheap, and store-forwarding stalls are costly. (Can modern x86 implementations store-forward from more than one prior store? no, nor can microarchitectures other than in-order Atom efficiently load when it partially overlaps with one previous store, and partially from data that has to come from the L1d cache.)

Even better would be to keep buf[x+1] in a register and use it as buf[x] in the next iteration, avoiding a store and load. (Like good hand-written asm bubble sort examples, a few of which exist on Stack Overflow.)
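A minimal sketch of that idea follows, written as a plain function rather than as the question's code (which is not reproduced here). The name bubble_sort and the variable carried are assumptions for illustration. Note that this version writes buf[x] on every iteration, which differs from a version that stores only when a swap happens.

```cpp
#include <cstddef>

// Bubble sort that carries the larger element of each pair in a local
// variable, so buf[x + 1] is loaded once and reused as buf[x] next time.
void bubble_sort(int* buf, std::size_t n) {
    for (std::size_t end = n; end > 1; --end) {
        int carried = buf[0];                  // current candidate for the largest value
        for (std::size_t x = 0; x + 1 < end; ++x) {
            const int next = buf[x + 1];       // the only load in the inner loop
            if (next < carried) {
                buf[x] = next;                 // smaller value settles, carried keeps bubbling
            } else {
                buf[x] = carried;
                carried = next;                // next becomes the new candidate
            }
        }
        buf[end - 1] = carried;                // largest value of this pass lands at the end
    }
}
```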

If it wasn't for the store-forwarding stalls (which AFAIK GCC doesn't know about in its cost model), this strategy might be about break-even. SSE 4.1 for a branchless pmind / pmaxd comparator might be interesting, but that would mean always storing and the C source doesn't do that.

If this strategy of double-width load had any merit, it would be better implemented with pure integer on a 64-bit machine like x86-64, where you can operate on just the low 32 bits with garbage (or valuable data) in the upper half. E.g.,

Source https://stackoverflow.com/questions/69503317

QUESTION

Generate ARM thumb-2 assembly code from android app for Cortex M3 architecture

Asked 2021-Dec-16 at 16:58

I want to build an Android app which will be an interface to convert C++ into assembly code for ARM Cortex M3 architecture.

I'm not an android java developer, and I do mainly arduino projects with C/C++. So I need your help to point me in good directions about how to build an android app with java in Android Studio or similar, which will be able to convert from C++ source code to ASM code M3 Cortex.

I did some research and found that I need to use ARM NONE EABI GCC compiler to generate ASM code from C++, simple like these command line instructions:
...

ANSWER

Answered 2021-Dec-16 at 16:58

One solution is to run the following commands in the Termux app (more details here):

pkg install proot

pkg install proot-distro

proot-distro install debian

proot-distro login debian

After that you should be logged into a Debian environment, and you can install almost any ARM package available in the Debian repositories.

For example you should be able to install this Cortex compiler:

Source https://stackoverflow.com/questions/70233126

QUESTION

Assembly why is "lea eax, [eax + eax*const]; shl eax, eax, const;" combined faster than "imul eax, eax, const" according to gcc -O2?

Asked 2021-Dec-13 at 10:27

I'm using godbolt to get assembly of the following program:

...

ANSWER

Answered 2021-Dec-13 at 06:33

You can see the cost of instructions on most mainstream architectures here and there. Based on that, and assuming you use for example an Intel Skylake processor, you can see that one 32-bit imul instruction can be computed per cycle but with a latency of 3 cycles. In the optimized code, 2 lea instructions (which are very cheap) can be executed per cycle with a 1-cycle latency. The same applies to the sal instruction (2 per cycle and 1 cycle of latency).

This means that the optimized version can be executed with only 2 cycles of latency while the first one takes 3 cycles of latency (not taking into account load/store instructions, which are the same). Moreover, the second version can be better pipelined since the two instructions can be executed for two different input data in parallel thanks to superscalar out-of-order execution. Note that two loads can be executed in parallel too, although only one store can be executed per cycle. This means that the execution is bounded by the throughput of store instructions. Overall, only 1 value can be computed per cycle. AFAIK, recent Intel Icelake processors can do two stores in parallel, like new AMD Ryzen processors. The second one is expected to be as fast or possibly faster on the chosen use-case (Intel Skylake processors). It should be significantly faster on very recent x86-64 processors.

Note that the lea instruction is very fast because the multiply-add is done on a dedicated CPU unit (hard-wired shifters) and it only supports some specific constants for the multiplication (supported factors are 1, 2, 4 and 8, which means that lea can be used to multiply an integer by the constants 2, 3, 4, 5, 8 and 9). This is why lea is faster than imul/mul.
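The decomposition can be sketched in source form. The constant 40 and the function names below are assumptions chosen for illustration, since the question's program is not reproduced here. A multiplication by 40 can be split into a multiply by 5 (expressible as one lea) followed by a shift by 3.

```cpp
#include <cstdint>

// x * 40 written as (x + x*4) << 3.
// The first step has the shape of lea eax, [eax + eax*4],
// the second the shape of a shift left by 3.
std::uint32_t times40_shift_add(std::uint32_t x) {
    const std::uint32_t x5 = x + x * 4;
    return x5 << 3;
}

// Straightforward form, which a compiler may lower to imul or to the
// lea/shift pair depending on its cost model.
std::uint32_t times40_mul(std::uint32_t x) {
    return x * 40u;
}
```

Because unsigned arithmetic wraps modulo 2^32, both functions agree for every input, including values where the product overflows.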
UPDATE (v2):
I can reproduce the slower execution with -O2 using GCC 11.2 (on Linux with a i5-9600KF processor).

The main source of slowdown is the higher number of micro-operations (uops) to be executed in the -O2 version, most likely combined with the saturation of some execution ports due to poor micro-operation scheduling.

Here is the assembly of the loop with -Os:

Source https://stackoverflow.com/questions/70316686

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities
No vulnerabilities reported

Install o2
You can download it from GitHub.

Support
For any new features, suggestions and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.