gcsfs | Google Cloud Storage filesystem for PyFilesystem2 | Cloud Storage library
kandi X-RAY | gcsfs Summary
Google Cloud Storage filesystem for PyFilesystem2
Top functions reviewed by kandi - BETA
- Opens a binary file
- Get the info for a resource
- Create an Info object from a blob
- Return information about a directory
- Creates a proxy for a temporary file
- Seek to the specified position
- Convert path to key
- Return an iterator over a directory
- Scans a directory
- Get the URL for the given path
- Read n characters from the file
gcsfs Key Features
gcsfs Examples and Code Snippets
Community Discussions
Trending Discussions on gcsfs
QUESTION
I am trying to migrate from Google Cloud Composer composer-1.16.4-airflow-1.10.15 to composer-2.0.1-airflow-2.1.4. However, we are running into difficulties with the libraries: each time I upload them, the scheduler stops working.
Here is my requirements.txt
...ANSWER
Answered 2022-Mar-27 at 07:04
We have found out what was happening. The root cause was the performance of the workers. To work properly, Composer expects the scanning of the DAGs to take less than 15% of the CPU resources. If it exceeds this limit, it fails to schedule or update the DAGs. We switched to larger workers and that solved the problem.
QUESTION
I want to read a big CSV file with pyarrow. All my columns are float64, but pyarrow seems to be inferring int64.
How do I specify a dtype for all columns?
...ANSWER
Answered 2022-Mar-18 at 23:48
Pyarrow's dataset module reads CSV files in chunks (the default is 1MB, I think) and processes those chunks in parallel. This makes column inference a bit tricky, and it handles this by using the first chunk to infer data types. So the error you are getting is very common when the first chunk of the file has a column that looks integral but later chunks contain decimal values.
If you know the column names in advance then you can specify the data types of the columns:
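The answer's own snippet is not reproduced on this page; the following is a minimal sketch of pinning column types with pyarrow, assuming hypothetical column names a, b, and c (all float64, per the question):

```python
# Minimal sketch: the file name and column names are placeholder assumptions.
import pyarrow as pa
import pyarrow.csv as csv
import pyarrow.dataset as ds

convert_options = csv.ConvertOptions(
    column_types={"a": pa.float64(), "b": pa.float64(), "c": pa.float64()}
)

# Either read the file directly with the CSV reader ...
table = csv.read_csv("big.csv", convert_options=convert_options)

# ... or, when going through the dataset module, pass the options via the format,
# so every chunk is converted with the declared types instead of inferred ones.
dataset = ds.dataset(
    "big.csv",
    format=ds.CsvFileFormat(convert_options=convert_options),
)
table = dataset.to_table()
```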
QUESTION
The build fails when I try to update the code and re-deploy the Google Cloud Function.
Deploy Script:
...ANSWER
Answered 2022-Jan-07 at 15:01
The release of setuptools 60.3.0 caused an AttributeError because of a bug; setuptools 60.3.1 is now available. You can refer to the GitHub link here.
For more information, you can refer to the Stack Overflow answer: "If you run into this pip error in a Cloud Function, you might consider updating pip in the requirements.txt, but if you are in such an unstable Cloud Function the better workaround seems to be to create a new Cloud Function and copy everything into it. The pip error probably just shows that the source script, in this case the requirements.txt, cannot be run since the source code is no longer fully embedded, or has lost some of its embedding, in Google Storage. Alternatively, give that Cloud Function a second chance: edit it, go to the Source tab, click the Source code dropdown, choose Inline Editor, and add main.py and requirements.txt manually (Runtime: Python)."
QUESTION
I am trying to connect from a Google Cloud Function (Python runtime) to an external MySQL database that is not hosted on Google Cloud.
My "requirements.txt":
...ANSWER
Answered 2022-Jan-14 at 22:55
If the database is on a VM in your VPC, you can create a VPC connector and attach it to your Cloud Function to access it.
If it is deployed elsewhere:
- Either the database has a public IP and Cloud Functions can reach it directly.
- Or the database has a private IP, and you need to create a VPN between your VPC and the private foreign network that hosts the database, and again add a serverless VPC connector to the Cloud Function so it can use your VPC and the VPN to reach the database.
In either case, once the network path exists, the function connects to MySQL like any other client; a sketch follows below.
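A minimal sketch of such a connection using PyMySQL; the host, credentials, and database name below are placeholder assumptions, not values from the question:

```python
# Minimal sketch: assumes PyMySQL is listed in requirements.txt; the host,
# credentials, and database name are placeholders for your own settings.
import pymysql


def query_db(request):
    # Open the connection inside the function body so it is created per request.
    connection = pymysql.connect(
        host="203.0.113.10",   # public IP, or private IP reachable via the VPC connector
        user="app_user",
        password="change-me",
        database="app_db",
        connect_timeout=10,
    )
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            row = cursor.fetchone()
        return str(row)
    finally:
        connection.close()
```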
QUESTION
We are reading a YAML file with the code below in Python, but it gives me [1 rows x 30 columns] while I want 2 rows: one row for my_table_01 and another for my_table_02 (sample data is given below the code).
...ANSWER
Answered 2021-Dec-29 at 12:59
json_normalize expects a list of dicts, not a single nested dict, if it is to create multiple rows. You therefore need to 'unpack' your nested dict into a list of dicts, for example by taking the values() of config_queries:
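The answer's own snippet is not reproduced here; the following is a minimal sketch of that unpacking, with a hypothetical YAML structure standing in for the one described in the question:

```python
# Minimal sketch: the YAML content and key names are hypothetical stand-ins
# for the structure described in the question.
import pandas as pd
import yaml

raw = """
config_queries:
  my_table_01:
    table_name: my_table_01
    schedule: daily
  my_table_02:
    table_name: my_table_02
    schedule: hourly
"""

config = yaml.safe_load(raw)
config_queries = config["config_queries"]

# json_normalize needs a list of dicts to produce one row per dict,
# so unpack the nested dict by taking its values().
df = pd.json_normalize(list(config_queries.values()))
print(df)  # two rows: one for my_table_01, one for my_table_02
```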
QUESTION
I was wondering if anyone can help. I'm trying to take a CSV from a GCP bucket, load it into a dataframe, and then output the file to another bucket in the project. With this approach my DAG runs, but I am not getting any output in my designated bucket, and the DAG takes ages to run. Any insight on this issue?
...ANSWER
Answered 2021-Nov-18 at 16:19
Not sure if I understand this correctly, but you seem to be nesting the PythonOperator creation inside the make_csv callable, which looks like an infinite loop as far as I can see. Maybe try moving it outside of the function and see what happens?
QUESTION
I've been using gcsfs in my Cloud Functions for a while now without issue. Suddenly, it has stopped working for newly deployed functions and is throwing an error:
RuntimeError: This class is not fork-safe
(full traceback attached in photo)
I'm guessing it's due to one of the dependencies of the gcsfs package. In any case, I've updated gcsfs to the current version in the requirements.txt and that has not helped.
The error can be reproduced by defining a cloud function as follows (Python 3.7):
main.py:
...ANSWER
Answered 2021-Oct-15 at 07:48
This change is related to the Python 3.7 buildpacks rollout. As a result of the move to gunicorn and its worker model, the global scope and function scope can be executed in separate processes. This issue can be fixed by moving the GCSFileSystem initialization into the function body.
You need to put fs = gcsfs.GCSFileSystem(project='project-name-1234') inside the entrypoint try_gcsfs. Your code should look like this:
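The corrected code is not reproduced on this page; below is a minimal sketch of the fix, keeping the entrypoint name try_gcsfs and the project name from the answer, with the bucket path as a placeholder:

```python
# Minimal sketch of the suggested fix: GCSFileSystem is created inside the
# entrypoint instead of at module level, so each worker process builds its own
# instance. The project name comes from the answer; the bucket is a placeholder.
import gcsfs


def try_gcsfs(request):
    # Initialize inside the function body to avoid the fork-safety error.
    fs = gcsfs.GCSFileSystem(project='project-name-1234')

    # Example use: list a bucket (bucket name is a placeholder).
    return str(fs.ls('my-bucket'))
```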
QUESTION
I am training a Calibrated Classifier on Google Cloud Scheduler every day, which takes about 5 minutes to run. My Python script receives the latest data (from that day) and concatenates it to the original data; the model is then trained and the pickled files are saved to Cloud Storage. The issue I am facing now is that if it takes more than 5 minutes (which it will at some point), it gives an upstream request timeout error.
I imagine that this is because of the growing time the model takes to train. I can think of one solution: train the model only on the new data and update the weights of the original model in the pickled file. However, I am not sure if that is possible.
Below is my function that runs on the scheduler:
...ANSWER
Answered 2021-Aug-19 at 10:58
I couldn't find a way to update the weights in a pickle file, and eventually settled on increasing the timeout parameter in Cloud Run to more than the training time, which fixed the issue for the time being.
QUESTION
I was working with pandas 1.2.2; after upgrading to 1.3.1 I get the following error when I read a CSV file. I didn't have any problem before the upgrade.
Here is the encoding of the file:
...ANSWER
Answered 2021-Aug-02 at 08:33
According to the exception and the pandas version, the problem could be that you have non-Unicode character(s) in your file, which were suppressed before v1.3. See this bug report comment.
Also, pandas introduced the encoding_errors parameter (encoding_errors: str, optional, default "strict") in version 1.3 to explicitly handle encoding errors, so you should check your file for incorrect characters.
In any case, if you want the behavior prior to v1.3, you can use replace (or ignore if it suits your case better):
QUESTION
I have successfully run my model in GCP in Vertex AI but when I try to source batch predictions, it hangs.
When I run the model in my local environment, it is done in seconds. The model does take 8 minutes to calculate on GCP.
My model code is here:
...ANSWER
Answered 2021-May-28 at 14:08
The simple answer to this appears to be that the file literally has to be saved as "model.pkl". I assumed that the name before the extension could vary, but no.
I am still struggling to get a prediction generated, but it now returns the failure within 15 minutes or so.
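As an illustration of that naming requirement, here is a minimal sketch of pickling a model and uploading it to Cloud Storage under the expected name; the bucket and directory names are placeholder assumptions:

```python
# Minimal sketch: bucket and directory names are placeholders; the only hard
# requirement discussed above is that the artifact be named exactly "model.pkl".
import pickle

from google.cloud import storage


def upload_model(model, bucket_name, artifact_dir):
    local_path = "model.pkl"  # the exact name the prebuilt container expects
    with open(local_path, "wb") as f:
        pickle.dump(model, f)

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(f"{artifact_dir}/model.pkl")
    blob.upload_from_filename(local_path)
```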
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install gcsfs
You can use gcsfs like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.