geocube | Tool to convert geopandas vector data into rasterized xarray data | Map library
kandi X-RAY | geocube Summary
Tool to convert geopandas vector data into rasterized xarray data.
Top functions reviewed by kandi - BETA
- Create a geocube dataset
- Get the gridded data from a grouped dataframe
- Get a dataset from a list of measurements
- Get grid coordinates from a dataframe
- Format a pandas data series
- Get measurement attributes
- Update time attributes
- Get the logger
- Rasterize an image
- Return numpy dtype based on fill value
- Remove missing data
- Check if data values are numeric
- Create geobox from vector data
- Load vector data
- Create a GeoBox from rio coordinates
- Make geocube from vector data
- Write coordinates to netCDF file
- Rasterize a set of points
- Perform rasterization on a grid
- Register subcommands
geocube Key Features
geocube Examples and Code Snippets
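For illustration, a minimal sketch of typical geocube usage; the file name, column name, and resolution below are placeholder assumptions, not examples from the project itself:

    import geopandas as gpd
    from geocube.api.core import make_geocube

    # Load vector data with a numeric attribute to rasterize
    gdf = gpd.read_file("zones.shp")

    # Rasterize the attribute onto a regular grid; resolution is given in
    # the units of the data's CRS (negative y, positive x by convention)
    cube = make_geocube(
        vector_data=gdf,
        measurements=["population"],
        resolution=(-0.001, 0.001),
    )
    cube["population"].rio.to_raster("population.tif")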
Community Discussions
Trending Discussions on geocube
QUESTION
I have a raster file and a shapefile containing polygons. For each polygon, I want to compute the mean value of all raster cells within that polygon.
I had been doing this with rasterstats.zonal_stats, which works perfectly but is very slow for large rasters.
Recently I came across rioxarray's example of instead clipping the raster to the polygon and then computing the mean.
From my experiments, the clipping approach is a lot faster than zonal_stats and the results are the same.
I am therefore wondering: is there anything besides the time difference that would lead to a preference for one method over the other? And why is the clipping so much faster than zonal_stats?
Below are the output and timing of the two methods, and a snippet of the code. The full code can be found here.
It would be great to get insights on this :)
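The original snippet is not reproduced on this page; below is a minimal sketch of the two approaches being compared, with placeholder file names:

    import geopandas as gpd
    import rioxarray
    from rasterstats import zonal_stats

    polygons = gpd.read_file("polygons.shp")  # placeholder path
    raster = rioxarray.open_rasterio("raster.tif", masked=True)

    # Approach 1: zonal statistics via rasterstats
    means_zs = [s["mean"] for s in zonal_stats(polygons, "raster.tif", stats=["mean"])]

    # Approach 2: clip the raster to each polygon, then take the mean
    means_clip = [
        raster.rio.clip([geom], polygons.crs).mean().item()
        for geom in polygons.geometry
    ]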
ANSWER
Answered 2021-Aug-17 at 14:56
In the spirit of self-learning, I realized I made a mistake here, and I think it is beneficial to report it.
It all depends on the order in which you run the functions. Since I loaded the raster data with rioxarray.open_rasterio, the data is not loaded into memory. Thereafter, both clip and zonal_stats still have to load it into memory, and when one of the two is called before the other, the latter benefits from the fact that the data is already loaded.
You can also choose to instead call ds.load() before calling the functions. In my tests this didn't change the total computation time.
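A minimal illustration of the lazy-loading point (the file name is a placeholder):

    import rioxarray

    # open_rasterio is lazy: pixel values are only read from disk when
    # first accessed, so whichever method runs first pays the I/O cost.
    ds = rioxarray.open_rasterio("raster.tif", masked=True)

    # Optionally pull everything into memory up front so timing
    # comparisons between the two methods start from the same state.
    ds = ds.load()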
QUESTION
I have to calculate the value of S, given by the formula: S = (25400/CN) − 254
The CN value I have to choose depends on the amc_active condition, viz. 1, 2 or 3: if the amc_active condition in the 1st row (index 0) is 1, I have to choose the CN value from column cn1, i.e. 47; and if amc_active is 3, I have to choose the CN value 95 from column cn3 in the 4th row, and so on.
ANSWER
Answered 2021-Jun-14 at 09:15
You can use numpy's fancy indexing:
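A minimal sketch of that idea, using made-up data shaped like the question's table:

    import numpy as np
    import pandas as pd

    # Hypothetical frame: one CN column per AMC condition, plus the
    # active condition (1, 2 or 3) for each row
    df = pd.DataFrame({
        "cn1": [47, 60, 35, 80],
        "cn2": [67, 78, 55, 90],
        "cn3": [83, 89, 74, 95],
        "amc_active": [1, 2, 1, 3],
    })

    # Fancy indexing: for each row, pick the CN column whose number
    # matches that row's amc_active value
    cn = df[["cn1", "cn2", "cn3"]].to_numpy()[
        np.arange(len(df)), df["amc_active"].to_numpy() - 1
    ]
    df["S"] = 25400 / cn - 254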
QUESTION
I have a "seed" GeoDataFrame (GDF) (RED) which contains a 0.5 arc-minute global grid ((180*2)*(360*2) = 259200). Each cell contains an absolute population estimate. In addition, I have a "leech" GDF (GREEN) with roughly 8250 adjoining non-regular shapes of various sizes (watersheds).
I wrote a script to allocate the population estimates to the geometries in the leech GDF based on the overlapping area between grid cells (seed GDF) and the geometries in the leech GDF. The script works perfectly fine for my sample data (see below). However, once I run it on my actual data, it is very slow. I ran it overnight and the next morning only 27% of the calculations had been performed. I will have to run this script many times and waiting for two days each time, is simply not an option.
After doing a bit of literature research, I already replaced my for loops with for index, row in df.iterrows() (or is this the same as a "conventional" Python for loop?), but it didn't bring the performance improvement I had hoped for.
Any suggestions on how I can speed up my code? In twelve hours, my script processed only ~30000 of ~200000 rows.
My expected output is the column leech_df['leeched_values'].
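For reference, the allocation described above can also be expressed without a row-wise loop via gpd.overlay; this is only a sketch with made-up frames and assumed column names:

    import geopandas as gpd
    from shapely.geometry import box

    # Tiny stand-ins for the real data (names are assumptions)
    seed_gdf = gpd.GeoDataFrame(
        {"population": [100.0, 200.0]},
        geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    )
    leech_gdf = gpd.GeoDataFrame({"leech_id": [1]}, geometry=[box(0.5, 0, 1.5, 1)])

    # Intersect once, then weight each cell's population by the fraction
    # of the cell's area that falls inside each watershed
    seed_gdf["cell_area"] = seed_gdf.geometry.area
    inter = gpd.overlay(seed_gdf, leech_gdf, how="intersection")
    inter["alloc"] = inter["population"] * inter.geometry.area / inter["cell_area"]
    leech_gdf["leeched_values"] = leech_gdf["leech_id"].map(
        inter.groupby("leech_id")["alloc"].sum()
    )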
ANSWER
Answered 2020-Feb-27 at 18:33
It might be worthwhile to profile your code in detail to get precise insight into where your bottleneck is.
Below are some suggestions to improve your script's performance:
- Avoid list.append(1) to count occurrences; use collections.Counter instead.
- Avoid pandas.DataFrame.iterrows; use pandas.DataFrame.itertuples instead (see the sketch after this list).
- Avoid extra assignments that are not needed; use pandas.DataFrame.fillna instead.
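A small sketch of the first two suggestions, with made-up data:

    import pandas as pd
    from collections import Counter

    df = pd.DataFrame({"basin": ["a", "a", "b"], "value": [1.0, 2.0, 3.0]})

    # Counting occurrences: one pass with Counter instead of appending
    # to a list inside a loop and counting afterwards
    counts = Counter(df["basin"])

    # Row iteration: itertuples yields lightweight namedtuples and is
    # typically much faster than iterrows, which builds a Series per row
    total = 0.0
    for row in df.itertuples(index=False):
        total += row.value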
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install geocube
You can use geocube like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
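For example, a typical installation in a fresh virtual environment (assuming a POSIX shell):

    python -m venv .venv
    source .venv/bin/activate
    python -m pip install --upgrade pip setuptools wheel
    python -m pip install geocube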