ctypeslib | Generate python ctypes classes from C headers | Compiler library
kandi X-RAY | ctypeslib Summary
Quick usage guide in the docs/ folder.
Top functions reviewed by kandi - BETA
- Handle variable type
- Prints a comment
- Finds a library with the given function
- Generate code
- Parse MACRO_DEFINITION
- Get a unique name for a cursor
- Clean string literal
- Handles literal parsing
- Translate code into python code
- Parse options
- Parse record declaration
- Handle function definition
- Handles function declaration
- Prints an enumeration
- Generate the body of a struct
- Prints a macro
- Generate code for a function
- Processes a struct
- Handles POINTER declarations
- Processes a structure head
- Handles variable declaration
- Parse parameter declaration
- Return the value of the given cursor
- Parse a typedef declaration
- Handles array types
- Handle typedef
ctypeslib Key Features
ctypeslib Examples and Code Snippets
Community Discussions
Trending Discussions on ctypeslib
QUESTION
I am a newbie at coding. My system is Windows 10, 64-bit. I'm using Code::Blocks to write C code, and Python 3.10. This C function adds the value 0.99 to all elements of a matrix, and I want to call it from Python 3. My C file is:
cab2.c
...ANSWER
Answered 2022-Apr-11 at 21:01
Requirements: with a C function, 0.99 is to be added to all elements of a matrix. The matrix is to come from Python.
There are a couple of problems:
- the argument s in the C function seems to be the intended Python matrix, but the parameter is not used at all in the C function
- the type for s is wrong; assuming a contiguous array, it should just be float *
- c_bes has no declared return type; you could use void, since the matrix elements can be manipulated in-place
- if you would prefer dynamically allocated memory on the C side after all, you should use the correct sizes: if you want to work with a float array, you should use sizeof(float), not sizeof(int)
- if dynamic memory is used, you should also free it after use
- maybe use better naming to make the code more readable, e.g. something like matrix, cols, rows, etc.
- with the compiler options -Wall -Wextra you get helpful warnings about possible problems in the code; use these
- = 0.9 assigns a fixed value, but you actually want to add something to the matrix element, so use the addition assignment += 0.99f; the f at the end indicates that you want to work with a float constant and not a double constant
- in Python, the types of function arguments can be described using argtypes
- with dtype the desired type of the matrix elements can be specified
If you apply the points just listed to your example code, it might look something like this:
cab2.h:
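The answer's cab2.h/cab2.c listings are not preserved on this page. As a rough sketch of the Python side only, using a pure-Python stand-in for the C function (the name c_bes comes from the discussion above; everything else is hypothetical):

```python
import ctypes
import numpy as np

# Pure-Python stand-in for a C function with the (assumed) signature:
#   void c_bes(float *s, int rows, int cols);
# The real function would live in a compiled cab2 shared library.
def c_bes(s, rows, cols):
    for i in range(rows * cols):
        s[i] += ctypes.c_float(0.99).value  # += 0.99f, element-wise, in place

matrix = np.zeros((2, 3), dtype=np.float32)  # dtype must match float *
ptr = matrix.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
c_bes(ptr, matrix.shape[0], matrix.shape[1])
# matrix is now ~0.99 everywhere: writes through the pointer hit the
# numpy buffer directly, so no return value is needed.
```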
QUESTION
I am currently trying to figure a way to return a mutidimensional array (of doubles) from a shared library in C to python and make it an np.array. My current approach looks like this:
shared library ("utils.c")
...ANSWER
Answered 2022-Feb-21 at 10:17
First of all, a scoped C array allocated on the stack (like in somefunction) must never be returned by a function. The space on the stack will be reused by other functions, like those of CPython for example. The returned array must be allocated on the heap instead.
Moreover, writing a function working with NumPy arrays using ctypes is pretty cumbersome. As you found out, you need to pass the full shape as a parameter. But the thing is, you also need to pass the strides for each dimension of each input array as parameters of the function, since they may not be contiguous in memory (for example, np.transpose changes this). That being said, we can assume that the input arrays are contiguous for the sake of performance and sanity. This can be enforced with np.ascontiguousarray. The pointers of the views a and b can be extracted using numpy.ctypeslib.as_ctypes, but thankfully ctypes can do that automatically. Furthermore, the returned array is currently a C pointer and not a NumPy array, so you need to create a NumPy array with the right shape and strides from it using numpy.ctypeslib.as_array. Because the resulting shape is not known to the caller, you need to retrieve it from the callee function using several integer pointers (one per dimension). In the end, this results in pretty big, ugly, highly bug-prone code (which will often silently crash if anything goes wrong, not to mention possible memory leaks if you do not pay attention to that). You can use Cython to do most of this work for you.
Assuming you do not want to use Cython or you cannot, here is an example code with ctypes:
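The answer's actual ctypes example is not preserved here. A minimal runnable sketch of the numpy.ctypeslib.as_array step it describes, with a ctypes buffer standing in for the heap array the C function would return (the shape values here are hypothetical; in the real code they would come back through int* out-parameters):

```python
import ctypes
import numpy as np
from numpy.ctypeslib import as_array

# Stand-in for the heap buffer a C function would malloc and return.
rows, cols = 2, 4
buf = (ctypes.c_double * (rows * cols))(*range(rows * cols))
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double))

# Rebuild a NumPy array with the right shape from the bare C pointer.
# This is a view over the C memory, not a copy.
result = as_array(ptr, shape=(rows, cols))
```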
QUESTION
I am trying to use ctypes to interface with c code. I am trying to pass obj by reference to c and I am using ndpointer function from numpy to pass to the function.
My C code:
...ANSWER
Answered 2022-Feb-17 at 00:00
Your C function takes int * arguments, but your np.array is using 64-bit integers. In C, int is 4 bytes (32-bit). Try np.int32.
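A quick runnable illustration of the dtype mismatch (np.int32 has the 4-byte element size a typical C int expects, while the 64-bit integers are twice as wide):

```python
import numpy as np

a64 = np.array([1, 2, 3], dtype=np.int64)  # 8 bytes per element: too wide for int *
a32 = a64.astype(np.int32)                 # 4 bytes per element: matches a typical C int
```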
QUESTION
I want to write a function which takes an index array lefts of shape (N_ROWS,) and creates a matrix out of shape (N_ROWS, N_COLS) such that out[i, j] = 1 if and only if j >= lefts[i]. A simple example of doing this in a loop is here:
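The question's loop code is not preserved here; a hypothetical reconstruction matching the description might look like:

```python
import numpy as np

def fill_left_mask(lefts, n_cols):
    # out[i, j] = 1 wherever j >= lefts[i], row by row.
    out = np.zeros((lefts.shape[0], n_cols), dtype=np.int64)
    for i in range(lefts.shape[0]):
        out[i, lefts[i]:] = 1
    return out

mask = fill_left_mask(np.array([0, 2, 3]), 4)
```

A vectorized NumPy equivalent is (np.arange(n_cols) >= lefts[:, None]).astype(np.int64), which avoids the Python-level loop entirely.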
ANSWER
Answered 2021-Dec-09 at 23:52
Numba currently uses llvmlite to compile the code efficiently to a binary (after the Python code has been translated to an LLVM intermediate representation). The code is optimized the way C++ code would be when using Clang with the flags -O3 and -march=native. This last parameter is very important, as it enables LLVM to use wider SIMD instructions on relatively recent x86-64 processors: AVX and AVX2 (and possibly AVX-512 on very recent Intel processors). Otherwise, by default, Clang and GCC use only the SSE/SSE2 instructions (because of backward compatibility).
Another difference comes from the comparison between GCC and the LLVM code from Numba. Clang/LLVM tends to aggressively unroll loops, while GCC often doesn't. This has a significant performance impact on the resulting program. In fact, you can see this in the generated assembly code from Clang:
With Clang (128 items per loop):
QUESTION
I'm trying to speed up my heavy calculations on large numpy arrays by applying the tutorial here to my use case. In principle, I have an input array and a result array and want to share them across many processes, in which data are read from the input array, tweaked, and then written to the output array. I don't think I need locks, since the array indices for reading and writing will be unique for each process.
Here is my test example, based on the linked tutorial:
...ANSWER
Answered 2021-Sep-08 at 11:14
So after many failures and countless tries, I came up with this, which seems to work:
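The asker's final code is not preserved here. The core pattern they describe (shared input/output arrays viewed as NumPy arrays, no locks because each worker's index range is disjoint) can be sketched like this; the worker dispatch via mp.Process is omitted, and the doubling operation is a placeholder:

```python
import multiprocessing as mp
import numpy as np

# Shared buffers; lock=False is safe when each worker writes to a
# disjoint index range, as described in the question.
n = 8
shared_in = mp.Array('d', range(n), lock=False)
shared_out = mp.Array('d', n, lock=False)

# Wrap the shared buffers as NumPy arrays -- these are views, not copies,
# so writes through them are visible to every process.
arr_in = np.frombuffer(shared_in, dtype=np.float64)
arr_out = np.frombuffer(shared_out, dtype=np.float64)

# What one worker would do on its own slice (placeholder computation):
arr_out[:] = arr_in * 2
```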
QUESTION
I cannot figure out how to use numpy.ndpointer as a field in a Python ctypes structure. Here is some basic code showing how I can do this without using ndpointer, but I would like to know whether it's possible to use ndpointer as well!
...ANSWER
Answered 2021-Jul-29 at 20:30
Listing [Python.Docs]: ctypes - A foreign function library for Python.
According to [NumPy]: numpy.ctypeslib.ndpointer(dtype=None, ndim=None, shape=None, flags=None) (emphasis is mine):
An ndpointer instance is used to describe an ndarray in restypes and argtypes specifications.
argtypes and restype only have meaning in the context of a function (call), and that is when ndpointer should be used (exemplified in the URL).
As a workaround, you can have an ndpointer structure member, but you have to initialize it in the same manner as the ctypes.POINTER one (in MyStruct_Working). However, bear in mind that this will come with some limitations (you won't have indexing) afterwards.
My suggestion is to stick with ctypes.POINTER(ctypes.c_double).
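A minimal sketch of the recommended approach: a plain ctypes.POINTER(ctypes.c_double) field pointed at a NumPy buffer (the struct layout here is hypothetical):

```python
import ctypes
import numpy as np

# ndpointer belongs in argtypes/restype, not in _fields_; a structure
# member should be a plain ctypes pointer, as the answer recommends.
class MyStruct(ctypes.Structure):
    _fields_ = [("n", ctypes.c_int),
                ("data", ctypes.POINTER(ctypes.c_double))]

arr = np.arange(4, dtype=np.float64)
s = MyStruct(arr.size,
             arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double)))
# s.data indexes straight into the numpy buffer (keep arr alive while
# the struct is in use, since the struct does not own the memory).
```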
QUESTION
I have a function in C which returns an array of data and its length.
Everything compiles and works fine and I can invoke the function.
...ANSWER
Answered 2021-Jul-25 at 07:51
I finally found the way:
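The asker's solution is not preserved here. One common way to return an array plus its length, sketched with a pure-Python stand-in for the C function (the signature double *get_data(int *length) is hypothetical):

```python
import ctypes
from numpy.ctypeslib import as_array

# Stand-in for a heap buffer the C side would allocate and return.
backing = (ctypes.c_double * 3)(1.0, 2.0, 3.0)

def get_data(length_ptr):
    # The C function would write the length through the int * out-parameter
    # and return the data pointer.
    length_ptr.contents.value = len(backing)
    return ctypes.cast(backing, ctypes.POINTER(ctypes.c_double))

length = ctypes.c_int()
ptr = get_data(ctypes.pointer(length))
data = as_array(ptr, shape=(length.value,))  # NumPy view over the buffer
```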
QUESTION
I currently encounter huge overhead because of NumPy's transpose function. I found that this function virtually always runs single-threaded, no matter how large the transposed matrix/array is. I probably need to avoid this huge time cost.
To my understanding, other functions like np.dot or vector increments run in parallel if the numpy array is large enough. Some element-wise operations seem to be better parallelized in the numexpr package, but numexpr probably can't handle transpose.
I wish to learn the best way to resolve this problem. To state the problem in detail:
- Sometimes NumPy runs transpose ultrafast (like B = A.T) because the transposed tensor is not actually used in a calculation, or is dumped, and there is no need to really transpose the data at this stage. Calling B[:] = A.T really does transpose the data.
- I think a parallelized transpose function should be the resolution. The problem is how to implement it.
- I hope the solution does not require packages other than NumPy. A ctypes binding is acceptable. And I hope the code is not too difficult to use nor too complicated.
- Tensor transpose is a plus. Though techniques to transpose a matrix could also be utilized in a specific tensor transpose problem, I think it could be difficult to write a universal API for tensor transpose. I actually also need to handle tensor transpose, but handling tensors could complicate this Stack Overflow question.
- And is there a possibility that a parallelized transpose will be implemented in the future, or does such a plan exist? Then there would be no need to implement transpose myself ;)
Thanks in advance for any suggestions!
Current workarounds
Handling a model transpose problem (the size of A is ~763 MB) on my personal computer running Linux with 4 cores available (400% CPU in total).
ANSWER
Answered 2021-May-08 at 14:57
Computing transpositions efficiently is hard. This primitive is not compute-bound but memory-bound. This is especially true for big matrices stored in RAM (and not in CPU caches).
NumPy uses a view-based approach, which is great when only a slice of the array is needed, but quite slow when the computation is done eagerly (e.g. when a copy is performed). The way NumPy is implemented results in a lot of cache misses, strongly decreasing performance when a copy is performed in this case.
I found that this function virtually always runs single-threaded, no matter how large the transposed matrix/array is.
This is clearly dependent on the NumPy implementation used. AFAIK, some optimized packages like Intel's are more efficient and take advantage of multithreading more often.
I think a parallelized transpose function should be the resolution. The problem is how to implement it.
Yes and no. It may not necessarily be faster to use more threads. At least not much faster, and not on all platforms. The algorithm used is far more important than using multiple threads.
On modern desktop x86-64 processors, each core can be bounded by its cache hierarchy. But this limit is quite high. Thus, two cores are often enough to nearly saturate the RAM throughput. For example, on my (4-core) machine, a sequential copy can reach 20.4 GiB/s (NumPy succeeds in reaching this limit), while my (practical) memory throughput is close to 35 GiB/s. Copying A takes 72 ms, while the naive NumPy transposition of A takes 700 ms. Even using all my cores, a parallel implementation would not be faster than 175 ms, while the optimal theoretical time is 42 ms. Actually, a naive parallel implementation would be much slower than 175 ms because of cache misses and the saturation of my L3 cache.
Naive transposition implementations do not write/read data contiguously. The memory access pattern is strided and most cache lines are wasted. Because of this, data are read/written multiple times from memory on huge matrices (typically 8 times on current x86-64 platforms using double precision). A tile-based transposition algorithm is an efficient way to prevent this issue. It also strongly reduces cache misses. Ideally, hierarchical tiles or the cache-oblivious Z-tiling pattern should be used, although this is hard to implement.
Here is a Numba-based implementation based on the previous information:
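The answer's Numba code is not preserved here. This pure-NumPy sketch only illustrates the tile-based access pattern the answer describes, not the compiled implementation (tile size 32 is an arbitrary choice for illustration):

```python
import numpy as np

def tiled_transpose(a, tile=32):
    # Transpose by copying one cache-friendly tile at a time, so both
    # reads and writes stay within a small working set; slices clamp
    # automatically at the matrix edges.
    rows, cols = a.shape
    out = np.empty((cols, rows), dtype=a.dtype)
    for i in range(0, rows, tile):
        for j in range(0, cols, tile):
            out[j:j + tile, i:i + tile] = a[i:i + tile, j:j + tile].T
    return out

a = np.arange(100 * 70, dtype=np.float64).reshape(100, 70)
b = tiled_transpose(a)
```

In the real Numba version, the tile loops would be jitted (and the outer loop parallelized with prange), which removes the Python-level loop overhead this sketch still has.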
QUESTION
I'm writing a callback function in Python:
...ANSWER
Answered 2021-Apr-30 at 01:37
Listing [Python.Docs]: ctypes - A foreign function library for Python and [NumPy]: C-Types Foreign Function Interface (numpy.ctypeslib). The latter states about the following 2 items (emphasis is mine):
as_array
Create a numpy array from a ctypes array or POINTER.
ndpointer
An ndpointer instance is used to describe an ndarray in restypes and argtypes specifications. This approach is more flexible than using, for example, POINTER(c_double), since several restrictions can be specified, which are verified upon calling the ctypes function.
imageBuffer_ptr is not a "ctypes array or POINTER" (literally: it's not a ctypes.POINTER instance but a ctypes.c_void_p one; there's a subtle difference between the two).
The C layer (which in our case lies in the middle) doesn't know about NumPy arrays and so on, so from its PoV, that argument is simply an unsigned short* (it carries no information about shape, ...). So, on its other side (in the Python callback), you have to reconstruct the array from the plain C pointer. That can be achieved in 2 ways, like I did in the example below:
callback_ct: Use the good old C way: modify the callback's signature to use ct.POINTER(ct.c_ushort). That is the most straightforward option, but the drawback is that it loses all the extra checks when called, meaning that it accepts any pointer, not just an ndarray one
callback_np: Manually recreate the array from the argument (using a series of conversions). It's more advanced (some might even consider it a bit hackish)
code00.py:
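The code00.py listing is not preserved here. A self-contained sketch of the callback_ct idea: the callback takes a plain unsigned short * plus the dimensions and rebuilds the array with numpy.ctypeslib.as_array (names and shapes here are hypothetical, and the "C side" invocation is simulated from Python):

```python
import ctypes as ct
import numpy as np
from numpy.ctypeslib import as_array

ROWS, COLS = 2, 3

def py_callback(buffer_ptr, rows, cols):
    # Reconstruct a NumPy view from the bare C pointer and the dimensions.
    img = as_array(buffer_ptr, shape=(rows, cols))
    return int(img.sum())

# Signature the C layer would expect: int cb(unsigned short *, int, int)
CallbackType = ct.CFUNCTYPE(ct.c_int, ct.POINTER(ct.c_ushort),
                            ct.c_int, ct.c_int)
callback = CallbackType(py_callback)

# Simulate the C side invoking the callback with a raw buffer 0..5.
data = (ct.c_ushort * (ROWS * COLS))(*range(ROWS * COLS))
total = callback(ct.cast(data, ct.POINTER(ct.c_ushort)), ROWS, COLS)
```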
QUESTION
I implemented a CUDA matrix multiplication solely in C, which runs successfully. Now I am trying to shift the matrix initialization to numpy and use Python's ctypes library to execute the C code. It seems like the array behind the pointer does not contain the multiplied values. I am not quite sure where the problem lies, but already in the CUDA code, even after calling the kernel method and loading the values back from device to host, the values are still zeroes.
The CUDA code:
...ANSWER
Answered 2021-Apr-17 at 00:02
I can't compile your code as is, but the problem is that np.shape returns (rows, columns), or the equivalent (height, width), not (width, height):
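A quick runnable check of the shape order:

```python
import numpy as np

a = np.zeros((3, 5))   # 3 rows (height), 5 columns (width)
rows, cols = a.shape   # shape comes back as (rows, cols), not (width, height)
```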
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install ctypeslib
Install libclang-11-dev to get libclang.so (maybe)
OR create a link to libclang-11.so.1 named libclang.so
OR hardcode a call to clang.cindex.Config.load_library_file('libclang-10.so.1') in your code
See the LLVM Clang instructions at http://apt.llvm.org/ or use your distribution's packages.