turbodbc | Python module to access relational databases | Database library
kandi X-RAY | turbodbc Summary
Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. Its primary target audience is data scientists who use databases for which no efficient native Python drivers are available. For maximum compatibility, turbodbc complies with the Python Database API Specification 2.0 (PEP 249). For maximum performance, turbodbc offers built-in NumPy and Apache Arrow support and internally relies on batched data transfer instead of the single-record communication used by other popular ODBC modules. Turbodbc is free to use (MIT license), open source (GitHub), works with Python 3.8+, and is available for Linux, macOS, and Windows. Turbodbc is routinely tested with MySQL, PostgreSQL, EXASOL, and MSSQL, but probably also works with your database.
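A minimal usage sketch, assuming an ODBC data source named "my_database" and a placeholder table, showing the PEP 249 interface alongside the NumPy and Arrow result fetchers:

```python
from turbodbc import connect

# Minimal sketch; the DSN name, table, and column names are placeholders.
connection = connect(dsn="my_database")
cursor = connection.cursor()

cursor.execute("SELECT a, b FROM my_table")
rows = cursor.fetchall()           # standard PEP 249 access, row by row

cursor.execute("SELECT a, b FROM my_table")
batches = cursor.fetchallnumpy()   # OrderedDict of NumPy masked arrays

cursor.execute("SELECT a, b FROM my_table")
table = cursor.fetchallarrow()     # Apache Arrow table

connection.close()
```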
Community Discussions
Trending Discussions on turbodbc
QUESTION
ANSWER
Answered 2021-Jan-11 at 20:49
Boost is not installed. You can try this:
QUESTION
I'm using Cloudera Hive ODBC driver in my code and I'm trying to containerize the app. Below is my Dockerfile,
...ANSWER
Answered 2020-Dec-11 at 04:38
As suggested by @DavidMaze, I managed to create a successful Dockerfile, which is shown below.
QUESTION
I have a dask dataframe which has 220 partitions and 7 columns. I have imported this file from a bcp file and completed some wrangling in dask. I then want to write this whole file to mssql using turbodbc. I connect to the DB as follows:
...ANSWER
Answered 2020-Oct-06 at 14:20
I needed to convert to a masked array by changing:
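The exact change is not shown above, but a minimal sketch of the idea, with a hypothetical DSN and table, looks roughly like this: turbodbc's executemanycolumns expects NumPy MaskedArrays so that NULLs are represented by the mask rather than by NaN.

```python
import numpy as np
from turbodbc import connect

# Hypothetical data standing in for one dask partition's column values.
values = np.array([1.5, 2.0, np.nan, 4.2])

# Wrap the plain ndarray in a MaskedArray; masked entries become SQL NULLs.
masked = np.ma.MaskedArray(values, mask=np.isnan(values))

connection = connect(dsn="MSSQL")  # assumed DSN name
cursor = connection.cursor()
cursor.executemanycolumns(
    "INSERT INTO my_table (value) VALUES (?)",  # hypothetical target table
    [masked],
)
connection.commit()
```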
QUESTION
I am trying to test my Dataflow pipeline on the DataflowRunner. My code always gets stuck at 1 hr 1 min and says: "The Dataflow appears to be stuck." When digging through the stack trace in the Dataflow Stackdriver logs, I come across the error "Failed to install packages: failed to install workflow: exit status 1". I saw other Stack Overflow messages saying that this can be caused when pip packages are not compatible. This is causing my worker startup to always fail.
This is my current setup.py. Can someone please help me understand what I am missing? The job id is 2018-02-09_08_22_34-6196858167817670597.
setup.py
...ANSWER
Answered 2018-Feb-24 at 20:05
So I have figured out that workflow is not a PyPI package in this case, but actually the name of the .tar file that is created by Dataflow and contains the source code. Dataflow will compress your source code and create a workflow.tar file in your staging environment, then it will try to run pip install workflow.tar. If any issues come up from this install, it will fail to install the packages onto the workers.
My issue was resolved by a few things: 1) I added six==1.10.0 to my requires, as I found from the question "Workflow failed. Causes: (35af2d4d3e5569e4): The Dataflow appears to be stuck" that there is an issue with the latest version of six. 2) I realized that sqlalchemy-vertica and sqlalchemy are out of sync and have issues with their dependency versions, so I removed my need for both and found a different Vertica client.
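As a hedged illustration only, a stripped-down setup.py reflecting the fix described above might look like the following; the package name, version, and the remaining dependencies are placeholders.

```python
import setuptools

setuptools.setup(
    name="my-dataflow-pipeline",     # placeholder package name
    version="0.0.1",
    packages=setuptools.find_packages(),
    install_requires=[
        "six==1.10.0",               # pinned per the fix described above
        "apache-beam[gcp]",          # plus whatever else the pipeline needs
    ],
)
```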
QUESTION
I wanted to install SQLAlchemy for Python 3 for working with databases.
I searched for the package using pip3 search SQLAlchemy, but I didn't find SQLAlchemy in the results.
Why doesn't SQLAlchemy show up in the output below, when the package is available on PyPI?
https://pypi.org/project/SQLAlchemy/
SQLAlchemy 1.3.15
...ANSWER
Answered 2020-Apr-01 at 18:38
$ pip search sqlalchemy | wc -l
100
pip search prints at most 100 matching packages, and plenty of PyPI projects contain "sqlalchemy" in their name or description, so SQLAlchemy itself can be pushed out of the truncated result list even though it is on PyPI.
QUESTION
I'm currently trying to tune the performance of a few of my scripts, and it seems that the bottleneck is always the actual insert into the DB (MSSQL) with the pandas to_sql function.
One factor that plays into this is MSSQL's limit of 2100 parameters per statement.
I establish my connection with sqlalchemy (with the mssql + pyodbc flavour):
...ANSWER
Answered 2019-Dec-22 at 13:13
If you are using the most recent version of pyodbc with ODBC Driver 17 for SQL Server and fast_executemany=True in your SQLAlchemy create_engine call, then you should be using method=None (the default) in your to_sql call. That will allow pyodbc to use an ODBC parameter array and give you the best performance under that setup. You will not hit the SQL Server stored procedure limit of 2100 parameters (unless your DataFrame has ~2100 columns). The only limit you would face would be if your Python process does not have sufficient memory available to build the entire parameter array before sending it to SQL Server.
The method='multi' option for to_sql is only applicable to pyodbc when using an ODBC driver that does not support parameter arrays (e.g., FreeTDS ODBC). In that case fast_executemany=True will not help and may actually cause errors.
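A sketch of that recommended setup; the server, credentials, and table name are placeholders, and ODBC Driver 17 for SQL Server is assumed to be installed.

```python
import pandas as pd
import sqlalchemy as sa

# Placeholder connection details; ODBC Driver 17 for SQL Server is assumed.
engine = sa.create_engine(
    "mssql+pyodbc://user:password@my_server/my_db"
    "?driver=ODBC+Driver+17+for+SQL+Server",
    fast_executemany=True,
)

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# method=None (the default) lets pyodbc send the whole frame as one
# ODBC parameter array instead of building a giant multi-values INSERT.
df.to_sql("my_table", engine, index=False, if_exists="append", method=None)
```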
QUESTION
I have a package that allows the user to use any one of 4 packages they want to connect to a database. It works great but I'm unhappy with the way I'm importing things.
I could simply import all the packages, but I don't want to do that in case the specific user doesn't ever need to use turbodbc, for example:
ANSWER
Answered 2018-Oct-16 at 00:10
You can put imports in places other than the beginning of the file. "Re-importing" something doesn't actually do anything, so it's not computationally expensive to import x frequently:
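A small sketch of that pattern, with hypothetical backend names and a hypothetical helper function, deferring each import until the corresponding backend is actually requested:

```python
def get_connection(backend, **kwargs):
    """Return a DB connection, importing the driver only when needed."""
    if backend == "turbodbc":
        import turbodbc              # cheap after the first time: cached in sys.modules
        return turbodbc.connect(**kwargs)
    if backend == "pyodbc":
        import pyodbc
        return pyodbc.connect(**kwargs)
    raise ValueError(f"unsupported backend: {backend!r}")
```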
QUESTION
I am trying to identify whether two values held in different numpy OrderedDict objects are the same.
Both dictionaries were created by using the fetchallnumpy() option in turbodbc and consist of two keys. The first key is an id field; the second key is a string value of variable length. I want to see whether the string value in the first set of dictionary items is present in the second set of dictionary items.
It's probably worth noting that both dictionary objects are holding approximately 60 million values under each key.
I've tried several things so far:
- np.isin(dict1[str_col], dict2[str_col]) as a function, but this was extremely slow, presumably because the string values are stored as dtype object.
- Converting both dictionary objects to numpy arrays with an explicit string type as np.asarray(dict1[str_col], dtype='S500') and then trying to use the isin and in1d functions, at which point the system runs out of RAM. I have swapped out 'S500' for dtype=np.string_ but still get a MemoryError (ar=np.concatenate((ar1,ar2))) whilst performing the isin function.
- A for loop: [r in dict2[str_col] for r in dict1[str_col]]. Again this was extremely slow.
My aim is to have a relatively quick way of testing the two string columns without running out of memory.
Additional bits: in the long run I'll be running more than one check, as I'm trying to identify new values and values that have changed.
Dictionary A = Current Data ['ID': [int,int,int]] Dictionary B = Historic Data ['record':[str,str,str]]
So the bits I'm interested in are:
- A != B (current record is different to historic record)
- A not present in B (New record added to the database)
- B not present in A (Records need to be redacted)
For the last two elements, the quickest way I've found so far has been to pass the id columns to a function that uses np.isin(arr1, arr2). It takes on average 15 seconds to compare the data.
...ANSWER
Answered 2019-Apr-24 at 15:37
You can use np.searchsorted for faster searches:
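A minimal sketch of the searchsorted-based membership test; in practice the two arrays would be dict1[str_col] and dict2[str_col] from fetchallnumpy(), and the small stand-in arrays here are illustrative only.

```python
import numpy as np

# Small stand-in arrays; real data would come from fetchallnumpy().
a = np.array([b"alpha", b"beta", b"gamma"])
b = np.array([b"beta", b"delta", b"alpha"])

b_sorted = np.sort(b)                 # searchsorted requires a sorted array
idx = np.searchsorted(b_sorted, a)
idx[idx == len(b_sorted)] = 0         # clamp indices that fell off the end
present = b_sorted[idx] == a          # True where a value of `a` also occurs in `b`
print(present)                        # [ True  True False]
```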
QUESTION
I'm trying to create a temporary table in Microsoft SQL Server, then insert data into it, then return the data to Python as a dataframe, preferably.
Here is my connection, which works fine (password hidden).
...ANSWER
Answered 2018-Aug-13 at 02:53
I had to use [enmax].[smccarth].[#retaildeals] instead of just #retaildeals.
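A hedged sketch of that approach, assuming a turbodbc connection with a placeholder DSN and placeholder column names; the fully qualified name in the final SELECT is the one quoted in the answer above.

```python
import turbodbc

connection = turbodbc.connect(dsn="MSSQL")  # placeholder DSN
cursor = connection.cursor()

cursor.execute("CREATE TABLE #retaildeals (deal_id INT, amount FLOAT)")
cursor.execute("INSERT INTO #retaildeals VALUES (?, ?)", [1, 9.99])

# Referencing the temp table by its fully qualified name was the key change.
cursor.execute("SELECT * FROM [enmax].[smccarth].[#retaildeals]")
rows = cursor.fetchall()
```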
QUESTION
I'm trying to generate and insert many (>1,000,000) rows into a Microsoft Access database. For the generation I use numpy functions, so I am trying to access the database with Python. I started with pyodbc:
...ANSWER
Answered 2018-Apr-24 at 16:27
The pyodbc fast_executemany feature uses an ODBC mechanism called "parameter arrays". Not all ODBC drivers support parameter arrays, and apparently the Microsoft Access ODBC driver is one that doesn't. As mentioned in the pyodbc wiki:
Note that this feature ... is currently only recommended for applications running on Windows that use Microsoft's ODBC Driver for SQL Server.
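Since the Access driver cannot use parameter arrays, the usual fallback is a plain executemany with fast_executemany left off; a sketch with a placeholder file path and table name:

```python
import pyodbc

# Placeholder DSN-less connection string for the Access ODBC driver.
conn = pyodbc.connect(
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\path\to\database.accdb;"
)
cursor = conn.cursor()
# cursor.fast_executemany = True   # not supported by the Access driver, so leave it off
rows = [(i, float(i) * 1.5) for i in range(1000)]
cursor.executemany("INSERT INTO my_table (id, val) VALUES (?, ?)", rows)
conn.commit()
```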
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.