rawes | Low level elasticsearch driver for Python | HTTP library
kandi X-RAY | rawes Summary
Low level elasticsearch driver for Python
Top functions reviewed by kandi - BETA
- HTTP HEAD operation
- Make a PUT request
- HTTP GET operation
- Build a path
- Make a HTTP request
- Get a connection to the graph
- Attempt to resurrect a connection
- Send a request to the server
- Sends an execute request
- Write this object to the given output protocol (oprot)
- Read the result from the server
- Read the response from the given iprot
- Process an execute command
- Execute the request
- Write the result to the operation
- Read the object from the iprot
- Return a connection object from a given URL
- Decode a URL
rawes Key Features
rawes Examples and Code Snippets
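The snippet below is an editorial sketch rather than code taken from the rawes repository. It assumes rawes's documented Elastic client: constructed from a host URL, with get/put/post/delete/head methods that mirror the HTTP operations listed above. Adjust names to the version you have installed.

import rawes

# Connect to a local Elasticsearch node (assumed default URL).
es = rawes.Elastic('http://localhost:9200')

# Index a document with an HTTP PUT; the path maps directly to the Elasticsearch URL.
es.put('blog/post/1', data={
    'title': 'rawes example',
    'tags': ['python', 'elasticsearch'],
})

# Fetch it back with an HTTP GET; rawes returns the decoded JSON response as a dict.
doc = es.get('blog/post/1')
print(doc['_source']['title'])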
Community Discussions
Trending Discussions on rawes
QUESTION
I am getting back user usage data from the Google Admin Report User Usage API via the Python SDK on Databricks. The data size is around 100,000 records per day, which I pull nightly via a batch process. The API returns a max page size of 1000, so I call it repeatedly, 1000 records at a time, to get the data I need for the day. This is working fine.
My ultimate aim is to store the data in its raw format in a data lake (Azure Gen2, but irrelevant to this question). Later on, I will transform the data using Databricks into an aggregated reporting model and put PowerBI on top of it to track Google App usage over time.
As a C# programmer, I am new to Python and Spark: my current approach is to request the first page of 1000 records from the api and then write it to the datalake directly as a JSON file, then get the next pageset and write that too. The folder structure would be something like "\raw\googleuser\YYYY\MM\DD\data1.json".
I would like to keep data in its rawest form possible in the raw zone and not apply too many transformations. The 2nd process can extract the fields I need, tag it with metadata and write it back as Parquet, ready for consumption by a function. This is why I am thinking of writing it as JSON.
This means that the 2nd process needs to read the JSON into a dataframe where I can transform it and write it as Parquet (this part is also straightforward).
Because I am using the Google API I am not working with JSON - it returns dict objects (with complex nesting). I can extract it as a JSON string using json.dumps(), but I cannot figure out how to write a STRING directly to my datalake. Once I get it into a dataframe I can easily write it in any format; however, it seems like a performance overhead to convert it from JSON into a dataframe and then essentially back to JSON just to write it.
Here are the things I have tried and the results:
- Build up a list of pyspark.sql.Rows and, at the end of all the paging (100k rows), use spark.createDataFrame(rows) to turn it into a dataframe. Once it is a dataframe then I can save it as a JSON file. This works, but seems inefficient.
- Use json.dumps() on the response to get a string of 1000 records as JSON. I am able to write it to the Databricks File System using this code:
with open("/dbfs/tmp/googleuserusagejsonoutput-{0}.json".format(keyDateFilter), 'w') as f:
    f.write(json.dumps(response))
However, I then have to move it to my Azure data lake with:
dbutils.fs.cp("/tmp/test_dbfs1.txt", datalake_path + dbfs_path + "xyz.json")
Then I get the next 1000 records and keep doing this. I cannot seem to use the open() method directly against the data lake store (Azure abfss driver), or this would be a decent solution. It seems fragile and strange to dump it locally first and then move it.
- Same as option 1, but dump the dataframe to the datalake every 1000 records and overwrite it (so that memory does not hold more than 1000 records at a time).
- Ignore the rule of dumping raw JSON. Massage the data into the simplest format I want and get rid of all the extra data I don't need. This would result in a much smaller footprint, and then option 1 or 3 above would be followed. (This is the second question: the principle of saving all data from the API in its raw format means that, as requirements change over time, I always have the historical data in the data lake and can just change the transformation routines to extract different metrics out of it. Hence I am reluctant to drop any data at this stage.)
Any advice appreciated please...
...ANSWER
Answered 2019-Apr-11 at 15:46
Mount the lake to your Databricks environment so you can just save it to the lake as if it was a normal folder:
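The code block from the original answer is not reproduced here. As a rough sketch of the idea, assuming an ADLS Gen2 account accessed with a service principal (container, storage account, secret scope and tenant id below are placeholders), mounting looks roughly like this; once mounted, the /dbfs/mnt/... path can be written with ordinary Python file I/O, so each page of the API response goes straight to the lake as a JSON string.

# Hypothetical names: replace the container, storage account, secret scope/keys and tenant id.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("scope", "client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("scope", "client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://raw@<storageaccount>.dfs.core.windows.net/",
    mount_point="/mnt/raw",
    extra_configs=configs,
)

# With the mount in place, write the JSON string for each page directly,
# no intermediate copy from /dbfs/tmp needed.
import json
path = "/dbfs/mnt/raw/googleuser/2019/04/11/data1.json"  # placeholder date/partition
with open(path, "w") as f:
    f.write(json.dumps(response))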
QUESTION
Where are mouse cursor movement acceleration and scroll wheel acceleration implemented in Mac OS X?
On the API level, Core Graphics / Quartz Event Services provides the CGEvent type.
On the application side, there are many relevant and interesting comments at this Chrome change review, and extracted from there this comment:
...ANSWER
Answered 2017-May-26 at 10:26
Good question - and a big one at that. I should note that I don't have full information - as you've discovered, a lot of this is closed source. As far as I've been able to tell, these are the important points:
- The mouse settings (tracking speed, scrolling speed, etc.) you see in System Preferences for a generic, non-Apple mouse are all handled inside WindowServer - where the CGEvents originate. There is some massaging of HID reports going on in IOHIDPointing etc., but from what I've seen this is mostly for maintaining compatibility with quirky devices.
- Update: (see discussion in comments) It looks like WindowServer probably passes the acceleration parameters down to the kernel driver by setting properties on its IORegistry entry.
- I believe the momentum scrolling for Apple touch devices (trackpads, Magic Mouse) may actually at least partially be implemented in their respective closed source kernel drivers, such as AppleMultitouchTrackpadHIDEventDriver.
- The source code for the lower level userspace side of the HID stack is available in the IOKitUser source bundle. This includes the IOHIDManager and so on.
- The journey from device to CGEvent is roughly: Device -> IOHIDDevice -> IOHIDInterface -> IOHIDEventDriver -> [IOHIDevice] -> IOHIDSystem -> IOHIDUserClient -> IOKit kernel-user communications mechanism (uses Mach Ports internally) -> HID parts of IOKitUser -> WindowServer (Core Graphics).
- You can bypass this path from a userspace process by directly connecting to an IOHIDDevice via an IOHIDLibUserClient.
The IOKitUser source might answer some of your questions in more detail. Or if you're trying to do something specific, open a new question for that.
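Not part of the original answer: if you want to observe the post-acceleration values that actually reach userspace at the end of that path, one option is a listen-only CGEvent tap. This is a rough sketch assuming the pyobjc Quartz bindings (Python, to match the rest of this page); the constants and field names are the standard CoreGraphics ones.

import Quartz

def log_event(proxy, event_type, event, refcon):
    # Scroll and mouse deltas seen here are the values after WindowServer
    # has applied acceleration / momentum.
    if event_type == Quartz.kCGEventScrollWheel:
        delta = Quartz.CGEventGetIntegerValueField(
            event, Quartz.kCGScrollWheelEventPointDeltaAxis1)
        continuous = Quartz.CGEventGetIntegerValueField(
            event, Quartz.kCGScrollWheelEventIsContinuous)
        print("scroll delta=%d continuous=%d" % (delta, continuous))
    elif event_type == Quartz.kCGEventMouseMoved:
        dx = Quartz.CGEventGetIntegerValueField(event, Quartz.kCGMouseEventDeltaX)
        dy = Quartz.CGEventGetIntegerValueField(event, Quartz.kCGMouseEventDeltaY)
        print("mouse delta=(%d, %d)" % (dx, dy))
    return event  # a listen-only tap does not modify the event stream

mask = (Quartz.CGEventMaskBit(Quartz.kCGEventScrollWheel) |
        Quartz.CGEventMaskBit(Quartz.kCGEventMouseMoved))
tap = Quartz.CGEventTapCreate(
    Quartz.kCGSessionEventTap,        # after acceleration has been applied
    Quartz.kCGHeadInsertEventTap,
    Quartz.kCGEventTapOptionListenOnly,
    mask,
    log_event,
    None)

source = Quartz.CFMachPortCreateRunLoopSource(None, tap, 0)
Quartz.CFRunLoopAddSource(Quartz.CFRunLoopGetCurrent(), source,
                          Quartz.kCFRunLoopCommonModes)
Quartz.CGEventTapEnable(tap, True)
Quartz.CFRunLoopRun()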
QUESTION
I would like to do an index mapping through NEST, but I want to give a raw Elasticsearch request directly:
...ANSWER
Answered 2017-Feb-05 at 23:45
Yes, with the low level client in Elasticsearch.Net that is also exposed on the high level client in NEST through the .LowLevel property. You just need to remove the HTTP verb and URI, as these are part of the method call on the client.
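As an aside relating this back to the library this page documents: rawes takes the same raw-request approach from Python. A minimal sketch, assuming rawes's Elastic client and a mapping body appropriate for your Elasticsearch version (the field names below are placeholders, and rawes sends the body verbatim):

import rawes

es = rawes.Elastic('http://localhost:9200')

# Create an index with an explicit mapping by sending the raw request body as a dict.
es.put('myindex', data={
    'mappings': {
        'properties': {
            'title':   {'type': 'text'},
            'created': {'type': 'date'},
        }
    }
})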
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install rawes
You can use rawes like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
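For reference, a typical setup might look like the following, assuming rawes is published on PyPI under that name (otherwise install it from its GitHub repository); commands are for a POSIX shell:

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install rawes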