flyio | Input Output Files in R from Cloud or Local | Cloud Storage library
kandi X-RAY | flyio Summary
flyio provides a common interface for working with data in cloud storage or local storage directly from R. It currently supports AWS S3 and Google Cloud Storage, thanks to the API wrappers provided by cloudyr. flyio also supports reading and writing tables, rasters, shapefiles and R objects between memory and the data source. For global usage, the data source, authentication keys and bucket can be set in the machine's environment variables so that they do not have to be entered every time.
flyio Examples and Code Snippets
# Setting the data source
flyio_set_datasource("gcs")
# Verify if the data source is set
flyio_get_datasource()
# Authenticate the default data source and set bucket
flyio_auth("key.json")
flyio_set_bucket("atlanhq-flyio")
# Authenticate S3 also
f
# Install the stable version from CRAN:
install.packages("flyio")
# Install the latest dev version from GitHub:
install.packages("devtools")
devtools::install_github("atlanhq/flyio")
# Load the library
library(flyio)
Community Discussions
Trending Discussions on Cloud Storage
QUESTION
I have some static content that will be downloaded by a large number of concurrent users. I am using a Google Cloud Storage bucket to serve that content.
I am worried about poor performance, from limited bandwidth or file read speed, when many users download at the same time. Would it be better to use more than one bucket behind a load balancer to serve the same content, or would there not be much difference?
ANSWER
Answered 2022-Mar-28 at 17:23
I have not benchmarked using multiple buckets, but I do not think there will be any benefit. The downside is increased complexity in your deployments.
Cloud Storage is already very fast and can handle global access. I do not believe a single load balancer would be able to overload a storage bucket. There are exceptions such as object name hotspots (sequential object names), but this would also affect your multiple bucket strategy.
You can also configure dual-region storage buckets, which are primarily used for replicating data. Selecting a bucket location will have more of an impact.
The key to fast performance for the client is two-fold: network performance and locality.
For network performance, ensure data travels from the bucket to the user over Google's premium tier network. This reduces the unpredictability of the Internet.
To improve locality, bring the data closer to the client. This means using Google's CDN, which caches bucket data around the world at points-of-presence that are closer to the client.
Read speed will be determined by the client's network speed (Internet connection) and TCP/IP stack configuration. Cloud Storage is many orders of magnitude faster.
For best performance:
- Create a multi-region bucket.
- Add Cloud CDN to your load balancer to cache bucket objects.
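The answer mentions object name hotspots caused by sequential object names. One common way to avoid them is to put a short hash in front of each name so that consecutive uploads do not share a prefix. The following is a minimal sketch of that idea, not something taken from the answer; the function name `spread_object_name` and the six-character prefix length are assumptions chosen for illustration.

```python
import hashlib


def spread_object_name(name: str, prefix_len: int = 6) -> str:
    """Return the object name prefixed with a short hash of itself.

    Hypothetical helper: the hash prefix breaks up sequential names
    such as frame-00001.png, frame-00002.png, and so on.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"


# Sequential names that would otherwise share the same leading characters.
names = [f"frame-{i:05d}.png" for i in range(3)]
spread = [spread_object_name(n) for n in names]
```

Because the prefix is derived from the name itself, the stored name can always be recomputed from the original one, so no lookup table is needed.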
QUESTION
I am using NiFi to send data from a Pub/Sub queue to Cloud Storage. I'm using the ConsumeGCPubSub processor to fetch data from the queue and the PutGCSObject processor to write it to Cloud Storage. But the PutGCSObject processor is writing duplicate data to Cloud Storage.
I also see that the duplicated objects have the same MD5 hash in their Cloud Storage records. What could be causing this and how can I fix it?
I double-checked:
- The Pub/Sub messages are not duplicated.
- When I send 30 pieces of data, exactly 30 pieces arrive in NiFi.
- I checked whether my Cloud Storage bucket contained different data, but it did not.
- When I examine it, the number of items coming from the queue and the number exiting the PutGCSObject processor as success are the same, but I see that the data is written over and over again. When I looked into NiFi Data Provenance, I found multiple entries with the same FlowFile UUID.
ANSWER
Answered 2022-Mar-25 at 09:30
The success relationship of the processor should have been set to terminate.
QUESTION
These questions relate to a Snowflake account with the organization/ORGADMIN role enabled.
1. Is it possible to detach a Snowflake account from a Snowflake organization?
a. If yes, will the removed account become a standalone (separate contract) account? How does the billing work?
b. Will the account URL change after detachment?
c. What is the procedure to achieve the above?
2. In an organization, are the background services charged on each account?
3. Can we clone a database across accounts within an organization?
4. What happens to the other accounts within an organization when the primary account is deleted?
5. Can we get a cost comparison table between 2 standalone accounts and an organization with 2 accounts?
6. After detachment, can the account type/region/cloud provider be changed?
I have asked similar questions to Snowflake support through the support ticket system, but would like to get answers from the community too.
P.S. If I get an answer from Snowflake, I will post it here!
ANSWER
Answered 2021-Nov-27 at 19:19
1. Yes, but you'd need to contact Snowflake support to do that.
a) Not sure what you mean by standalone. All accounts are technically standalone. If you mean from a Snowflake contract perspective: if you want the account to be on a separate contract, you can do that. If you don't, it can remain on the same contract.
b) See (a).
c) If you are using the URL that uses the org and account name, then yes, the URL will change. If you are using the URL that uses an account locator and the deployment/region, then no. It won't and can't change.
d) Call Support.
2. Background services are always related to an account. An organization is just a way of grouping accounts.
3. No, but you can replicate data from one account to another account in the org. Cloning can only ever be done within a single account.
4. What happens to what? If this was related to cloning, then I don't think the question is valid. A replication would cease to replicate.
5. There are no costs for an organization, so the per-credit and per-TB costs are the same as you'd see on any account.
6. No, you can't move an account around. If you wanted to move platform or region, you'd need to create a new account, move your objects to the new account and then remove your original account.
QUESTION
Due to the nature of our business, we basically need to disclose where in the globe the files uploaded by our users are located.
In other words, we need the exact address where the data storage that keeps these files is located.
We're using Google Firebase's Cloud Storage and, even though they mention which city each location option refers to, we are unable to check the exact address.
The bucket that corresponds to our Google Cloud Storage is currently configured as: us (multiple regions in United States), which I suppose makes it even worse to pinpoint where the data resides. But that is an easy fix: we can simply start from scratch selecting a specific region as our storage location.
The main issue, however, is that, even if selecting a specific location, we can't really know the address where those files will be stored.
Has anyone ever come across something like this?
I tried getting support in my project's Google Cloud Platform, but apparently I need to purchase it. And I'm afraid that they won't be able to help me.
In case someone has contacted their support and got this answer, please let me know.
ANSWER
Answered 2021-Oct-28 at 22:14
When you store data in GCS, even in a regional bucket, Google does not make any guarantee which zone(s) within the region the data is stored in, nor is this visible. Different zones in a region can be at a different street address, so street-address level location data is unavailable, even if you get the datacenter addresses by finding the datacenters on Google Maps (you could start here).
QUESTION
I have 10 requests per second of data I want to save that looks like the entry below. I need to save this data after a CloudRun function completes. (My infrastructure is on google-cloud-platform). The data will be used as a data set for machine learning.
ANSWER
Answered 2021-Sep-24 at 20:07
I can propose two patterns; in both cases you need to store the messages:
- Either use Pub/Sub to stack the messages, then use Dataflow to read from Pub/Sub and sink to Cloud Storage. Or use an on-demand service (Cloud Run, for example) to pull your Pub/Sub subscription and write a file with all the messages read (you can trigger Cloud Run with Cloud Scheduler, every hour for example).
- Or store the messages in BigQuery, and then regularly export query results to GCS (again with Cloud Scheduler + Cloud Functions/Run). This is my preferred solution because one day you may have to process your messages differently, or get metrics and perform analytics on them.
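The "write a file with all the messages read" step in the first pattern can be sketched as follows. This is only an illustration under stated assumptions: the helper name `build_batch`, the `batches/YYYY/MM/DD/HHMMSS.jsonl` object naming scheme and the newline-delimited JSON layout are all hypothetical, and the actual Pub/Sub pull and Cloud Storage upload are deliberately left out.

```python
import json
from datetime import datetime, timezone


def build_batch(messages, now=None):
    """Turn a list of dict messages into (object_name, file_body).

    Hypothetical helper: one JSON document per line, named after the
    time the batch was written so that hourly runs never collide.
    """
    now = now or datetime.now(timezone.utc)
    object_name = now.strftime("batches/%Y/%m/%d/%H%M%S.jsonl")
    if not messages:
        return object_name, ""
    body = "\n".join(json.dumps(m, sort_keys=True) for m in messages) + "\n"
    return object_name, body


# Example with a fixed timestamp so the object name is predictable.
example_name, example_body = build_batch(
    [{"id": 1, "label": "a"}, {"id": 2, "label": "b"}],
    now=datetime(2021, 9, 24, 20, 0, 0, tzinfo=timezone.utc),
)
```

Newline-delimited JSON is chosen here because each line can be parsed on its own, which tends to suit downstream machine learning data loaders; other layouts would work equally well.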
QUESTION
I have stored thousands of very high-resolution images in GCP Cloud Storage. I want to serve these images in an iOS/Android app and on a website. I don't want to serve the high-resolution version all the time, and wondered whether I have to create duplicate images in different resolutions, which seems very inefficient. The perfect solution would be that I can append a parameter like ?size=100 to the image URL. Is something like that natively possible with GCP Cloud Storage?
I don't find anything in the documentation from cloud storage: https://cloud.google.com/storage/docs. Several other resources link to deprecated solutions: https://medium.com/google-cloud/uploading-resizing-and-serving-images-with-google-cloud-platform-ca9631a2c556
What is the best solution to implement such functionality?
ANSWER
Answered 2021-Jul-12 at 03:54
John Hanley is correct. Cloud Storage does not currently offer imaging services, though a feature request already exists. I highly suggest that you "+1" and "star" this issue to increase its chance of being prioritized in development.
You are right that this use case is common. Image API is a Legacy App Engine API. It's no longer a recommended solution because Legacy App Engine APIs are only available in older runtimes that have limited support. GCP would advise developers to use Client Libraries instead, but since your requested feature is not yet available, you'll have to use third-party imaging libraries.
In this case, developers commonly use Cloud Functions with a Cloud Storage trigger, resizing each image and creating duplicates in different resolutions. While you may find the solution inefficient, unfortunately there's not much choice but to process those images until the requested feature becomes publicly available.
One good thing though is that Cloud Functions supports multiple runtimes, so you can write code in any supported language and pick libraries you're comfortable using. If you're using the Node runtime, feel free to check this sample that automatically creates a thumbnail when an image is uploaded to Cloud Storage.
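The resize-on-upload approach described above needs two small decisions for every uploaded image: what dimensions each variant should have, and where each variant should be stored. The sketch below covers only those two steps and assumes a naming scheme of `<name>_<size>.<ext>`; both helper names are hypothetical, and the actual pixel resizing (with an imaging library) and the Cloud Storage trigger wiring are not shown.

```python
from pathlib import PurePosixPath


def scaled_size(width, height, max_side):
    """Scale (width, height) so the longest side is at most max_side.

    The aspect ratio is preserved and images are never enlarged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def variant_name(object_name, max_side):
    """Derive the object name for a resized variant (assumed scheme)."""
    path = PurePosixPath(object_name)
    return str(path.with_name(f"{path.stem}_{max_side}{path.suffix}"))


# Plan two variants for a 4000x3000 upload.
plan = {
    variant_name("photos/cat.jpg", side): scaled_size(4000, 3000, side)
    for side in (100, 800)
}
```

With a predictable naming scheme like this, a client can request the size it needs by building the variant name itself, which approximates the `?size=100` behaviour asked for in the question.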
QUESTION
I was wondering how storage solutions like S3 or Google Drive check whether their storage platform is being abused for the storage of malicious content.
e.g. if someone uploads a password protected zip file to their servers, I don't see a way on how they can verify it. For unencrypted files, I can understand some sort of file parser could work. But if someone uploads a password protected file, the only way to see/verify the contents is try to brute force your way into it (ignoring the moral obligations for the organisation to not do that).
So, how do these companies/solutions verify the kind of data that is being uploaded on their platforms?
ANSWER
Answered 2021-May-04 at 15:10
There isn't a technical solution, only a legal one. They say: "We are only a service provider, not a content provider. We aren't responsible for the illegal use of our services".
YouTube took the same stance when you were able to upload copyrighted content without any issue with Google (though not with the copyright owner). That has since changed and YouTube now performs checks, but the legal principle was the same.
QUESTION
I want to find a free cloud storage service with a free API that could help me back up some files automatically.
I want to write a script (for example in Python) to upload files automatically.
I investigated OneDrive and Google Drive. The OneDrive API is not free. The Google Drive API is free, but it needs interactive human authorization before the API can be used.
For now I'm simply using the SMTP email protocol to send files as email attachments, but there's a maximum file size limitation, which will fail me in the future as my file size grows.
Are there any other recommendations?
ANSWER
Answered 2021-Apr-27 at 03:27
I believe your goal is as follows.
- You want to upload a file using Drive API with the service account.
- You want to achieve your goal using python.
First, in your situation, how about using google-api-python-client? In this answer, I would like to explain the following flow and a sample script using google-api-python-client.
Usage:
1. Create a service account.
Please create the service account and download its JSON key file.
2. Install google-api-python-client.
In order to use the sample script, please install google-api-python-client.
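The answer's own sample script is not included in this excerpt. As a stand-in, here is a minimal sketch of what an upload with a service account and google-api-python-client could look like. It is an assumption-laden illustration rather than the answerer's script: the helper names `build_file_metadata` and `upload_file`, the `drive.file` scope choice and the optional `folder_id` parameter are all hypothetical. The Google libraries are imported inside `upload_file` so that the metadata helper can be used without them installed.

```python
import os

# Assumed scope: access limited to files this application creates.
DRIVE_SCOPE = "https://www.googleapis.com/auth/drive.file"


def build_file_metadata(local_path, folder_id=None):
    """Build the Drive file metadata for an upload (hypothetical helper)."""
    metadata = {"name": os.path.basename(local_path)}
    if folder_id:
        # Place the file inside an existing Drive folder.
        metadata["parents"] = [folder_id]
    return metadata


def upload_file(local_path, key_path, folder_id=None):
    """Upload local_path to Drive using a service account JSON key."""
    from google.oauth2 import service_account
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload

    credentials = service_account.Credentials.from_service_account_file(
        key_path, scopes=[DRIVE_SCOPE]
    )
    service = build("drive", "v3", credentials=credentials)
    media = MediaFileUpload(local_path, resumable=True)
    created = service.files().create(
        body=build_file_metadata(local_path, folder_id),
        media_body=media,
        fields="id",
    ).execute()
    return created["id"]
```

Files uploaded this way belong to the service account rather than to a personal Drive, so if the backups need to be visible in your own account, one option is to share a folder with the service account's email address and pass that folder's ID as `folder_id`.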
QUESTION
As an intro, I'm developing an app with Flutter that has an audio section.
I would like to address two subjects.
For the moment the audio is stored in the cloud, more specifically in Firebase. The main problem is that the pricing is not very forgiving once the bandwidth threshold is exceeded. Also, as I discovered, each song is downloaded completely when trying to play it. Therefore, it doesn't matter whether I want to play 10 seconds or 1 minute of a song; the same traffic is generated. I'm using the just_audio package as the audio library and I'm wondering if there is a way to integrate a stream-based solution with buffering.
As I've seen in the debug logs, an HTTP request is sent every time a song (from the cloud) is requested to play. My concern is that I can't use just_audio for streaming. Is there a cloud solution that offers a good compromise between price and bandwidth, even if the song is downloaded entirely each time the play action is requested? I'm considering developing an offline mode for the audio section, so that each song could be played from local storage. Even so, it must be a user option, not a default feature.
ANSWER
Answered 2021-Apr-02 at 16:00
I found two solutions for cloud storage that offer a good price for data transfer. The first option starts from 5$/mo, with 1TB included and 0.01$/GB for extra traffic: https://www.digitalocean.com/. The second one starts from 9EUR/mo, with 1TB included, and the cost for any extra traffic is 0.5EUR/TB.
just_audio has support for HLS and MPEG-DASH. Therefore, for the server side, a good solution is nginx with the rtmp module.
Credits to: https://docs.peer5.com/guides/setting-up-hls-live-streaming-server-using-nginx/
The setup is pretty much straight forward:
QUESTION
I'm using @google-cloud/storage package and generating signed url to upload file like this:
ANSWER
Answered 2021-Apr-02 at 13:30
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported