cost-model | Cross-cloud cost allocation models for Kubernetes workloads | GCP library
kandi X-RAY | cost-model Summary
Kubecost models give teams visibility into current and historical Kubernetes spend and resource allocation. These models provide cost transparency in Kubernetes environments that support multiple applications, teams, departments, etc.
Community Discussions
Trending Discussions on cost-model
QUESTION
ANSWER
Answered 2022-Jan-11 at 07:45

You can at least process 2 elements at a time by loading the lower and upper half registers separately. Unrolling i by two may give a small edge...

The __restrict keyword, if applicable, allows the five constant coefficients X1[0..4], X2[0..4] to be preloaded. If X1 or X2 partially aliases the output, it's better to let the compiler know (by using the same array). This way, as the complete function is unrolled, the compiler will not reload any element unnecessarily.
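A minimal sketch of the unroll-by-two and __restrict advice, assuming a hypothetical 5-tap filter shape (the function name, signature, and padding assumptions are illustrative, not taken from the question):

```cpp
// Illustrative only: __restrict promises the compiler that `out` does not
// alias `in`, `X1`, or `X2`, so the five coefficients of each array can be
// loaded once and kept in registers across the unrolled loop.
// Assumes n is even and `in` is padded so in[0..n+3] is readable.
void filter(double* __restrict out, const double* __restrict in,
            const double* __restrict X1, const double* __restrict X2, int n)
{
    for (int i = 0; i < n; i += 2) {   // unrolled by two, as suggested
        out[i]     = X1[0]*in[i]   + X1[1]*in[i+1] + X1[2]*in[i+2]
                   + X1[3]*in[i+3] + X1[4]*in[i+4];
        out[i + 1] = X2[0]*in[i+1] + X2[1]*in[i+2] + X2[2]*in[i+3]
                   + X2[3]*in[i+4] + X2[4]*in[i+5];
    }
}
```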
QUESTION
I am trying to vectorize this for loop. After compiling with the -Rpass flag, I am getting the following remark for it:
...

ANSWER
Answered 2021-Jan-12 at 17:10

It's hard to answer without more details about your types, but in general, starting a loop incurs some cost, and vectorising also has costs (such as moving data to/from SIMD registers and ensuring proper alignment of the data).

I'm guessing the compiler is telling you that the estimated cost of vectorisation is higher than the cost of simply running the 8 iterations scalar, so it isn't doing it.

Try to increase the number of iterations, or help the compiler compute the alignment, for example (see the sketch below).
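One hedged way to give the compiler an alignment guarantee is the GCC/Clang builtin; the function name and the 32-byte figure here are illustrative assumptions:

```cpp
// Illustrative only: promise the compiler that `a` is 32-byte aligned
// (suitable for AVX), so the vectoriser can drop runtime alignment
// checks and prologue peeling. __builtin_assume_aligned is a GCC/Clang
// builtin that returns the same pointer with the alignment attached.
void scale(float* a, int n) {
    float* p = static_cast<float*>(__builtin_assume_aligned(a, 32));
    for (int i = 0; i < n; ++i)
        p[i] *= 2.0f;   // candidate loop for vectorisation
}
```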
Typically, unless the array's element type exactly matches the alignment the SIMD vector requires, accessing an array from an "unknown" offset (what you've called someOuterVariable) prevents the compiler from emitting efficient vectorised code.
EDIT: About the "interleaving" question, it's hard to guess without knowing your tool. But in general, interleaving usually means mixing two streams of computation so that the CPU's execution units all stay busy. For example, if your CPU has two ALUs, the program can keep both busy by advancing two independent computations at once.
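A minimal sketch of the idea, assuming a simple summation workload (the names and the workload are illustrative, not from the question):

```cpp
// Serial version: one long dependency chain; each add waits on the
// previous one, so only one ALU does useful work per cycle.
double sum_serial(const double* a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += a[i];              // every iteration depends on s
    return s;
}

// Interleaved version: two accumulators advance independently, so two
// ALUs (or one pipelined ALU) can overlap them; combine at the end.
double sum_interleaved(const double* a, int n) {
    double s0 = 0.0, s1 = 0.0;
    for (int i = 0; i + 1 < n; i += 2) {
        s0 += a[i];             // stream 1
        s1 += a[i + 1];         // stream 2
    }
    if (n & 1) s0 += a[n - 1];  // odd-length tail
    return s0 + s1;
}
```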
QUESTION
Having code of this nature:
...

ANSWER
Answered 2020-Nov-12 at 20:46

vfmaddXXXsd and pd instructions are "cheap" (single uop, 2/clock throughput), even cheaper than shuffles (1/clock throughput on Intel CPUs) or gather-loads. https://uops.info/. Load operations are also 2/clock, so lots of scalar loads (especially from the same cache line) are quite cheap, and notice how 3 of them can fold into memory source operands for FMAs.
Worst case, packing 4 (x2) totally non-contiguous inputs and then manually scattering the outputs is definitely not worth it vs. just using scalar loads and scalar FMAs (especially when that allows memory source operands for the FMAs).
Your case is far from the worst case; you have 3 contiguous elements from 1 input. If you know you can safely load 4 elements without risk of touching an unmapped page, that takes care of that input. (And you can always use maskload). But the other vector is still non-contiguous and may be a showstopper for speedups.
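As an aside, a masked load of the 3 contiguous doubles might look like this AVX sketch (the load3 helper is my naming, not from the answer):

```cpp
#include <immintrin.h>

// Illustrative only: load 3 contiguous doubles without reading past the
// array. With _mm256_maskload_pd, the inactive 4th lane reads as zero
// and the corresponding memory is not touched, so no risk of faulting
// on an unmapped page.
static inline __m256d load3(const double* p) {
    const __m256i mask = _mm256_setr_epi64x(-1, -1, -1, 0);  // lanes 0-2 active
    return _mm256_maskload_pd(p, mask);
}
```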
It's usually not worth it if it would take more total instructions (actually uops) to do it via shuffling than plain scalar. And/or if shuffle throughput would be a worse bottleneck than anything in the scalar version.
(vgatherdpd counts as many instructions for this, being multi-uop and doing 1 cache access per load. Also, you'd have to load constant vectors of indices instead of hard-coding offsets into addressing modes.

Also, gathers are quite slow on AMD CPUs, even Zen 2. We don't have scatter at all until AVX-512, and those are slow even on Ice Lake. Your case doesn't need scatters, though, just a horizontal sum, which will involve more shuffles and vaddpd / sd. So even with a maskload + gather for inputs, having 3 products in separate vector elements is not particularly convenient for you.)
A little bit of SIMD (not a whole array, just a few operations) can be helpful, but this doesn't look like one of the cases where it's a significant win. Maybe there's something worth doing, like replacing 2 loads with a load + a shuffle. Or maybe shortening the latency chain for y[5] by summing the 3 products before adding to the output, instead of a chain of 3 FMAs. That might even be numerically better, in cases where the accumulator holds a large number; adding multiple small numbers to a big total loses precision. Of course that would cost 1 mul, 2 FMAs, and 1 add.
QUESTION
I am making a program that benchmarks a lot of generated schedules for a particular algorithm, but it is taking a long time, mostly because each schedule has to be compiled. I was wondering if there are any ways to speed up this process.

For example, using AOT compilation or generators; but I don't think it is possible to give a generator different schedules after it has been created (e.g. by passing the schedule as an input parameter)?

Or are there any compiler flags that can give a significant speed-up?

However, I also saw that the autoscheduler uses a cost model to predict the execution time of a schedule, which would solve my problem. But I cannot figure out whether, or how, this cost model can be used in my own program, and whether it only works for schedules the autoscheduler generated or for every schedule.
...

ANSWER
Answered 2020-May-31 at 17:31

Unfortunately there's no great answer. The bulk of the compile time is in Halide lowering and in LLVM, which must be done separately for every schedule, so just reusing a Generator won't help you. You can use Func::specialize on a boolean input param to switch between schedules at runtime, but that doesn't save you much compile time relative to compiling the options separately.
The cost model in the autoscheduler is specific to its representation of the subspace of Halide schedules that it explores, and wouldn't work on arbitrary Halide schedules.
There's one trick that might help: If your algorithm is long and complicated, and you know where some of the compute_roots should be (e.g. the last thing before a conv layer), then you can break your algorithm into multiple pieces and independently search over schedules for each. Compiling smaller algorithms is moderately faster, but more importantly this will make the overall search more efficient in terms of the number of samples it needs to take.
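A hedged sketch of the Func::specialize approach mentioned above; the pipeline, names, and tile sizes are illustrative, not taken from the question:

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    Param<bool> use_tiled("use_tiled");  // runtime switch between schedules
    ImageParam in(Float(32), 2);

    Var x("x"), y("y"), xo, yo, xi, yi;
    Func blur("blur");
    blur(x, y) = (in(x, y) + in(x + 1, y) + in(x, y + 1)) / 3.0f;

    // Two schedules in one compiled pipeline; the boolean param selects
    // between them at runtime. Note that lowering/LLVM still processes
    // both specializations, matching the caveat above about compile time.
    blur.specialize(use_tiled)
        .tile(x, y, xo, yo, xi, yi, 64, 64)
        .vectorize(xi, 8)
        .parallel(yo);
    blur.vectorize(x, 8);  // default (non-tiled) schedule

    blur.compile_to_static_library("blur", {in, use_tiled}, "blur");
    return 0;
}
```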
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported