webhdfs | C library and FUSE file-system
kandi X-RAY | webhdfs Summary
C library and FUSE file-system that allows access to HDFS (HDFS-2631)
Community Discussions
Trending Discussions on webhdfs
QUESTION
In the HDFSCLI docs it says that the client can be configured to connect to multiple hosts by adding URLs separated with a semicolon ;
(https://hdfscli.readthedocs.io/en/latest/quickstart.html#configuration).
I use the Kerberos client, and this is my code:
from hdfs.ext.kerberos import KerberosClient
hdfs_client = KerberosClient('http://host01:50070;http://host02:50070')
And when I try to makedir, for example, I get the following error:
requests.exceptions.InvalidURL: Failed to parse: http://host01:50070;http://host02:50070/webhdfs/v1/path/to/create
ANSWER
Answered 2021-May-19 at 13:15
Apparently the version of the hdfs package I had installed was old: the code didn't work with version 2.0.8, but it did work with version 2.5.7.
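A minimal sketch of the working setup, assuming the hdfs package has been upgraded (the host names and target path are the placeholders from the question):
# pip install --upgrade hdfs   (semicolon-separated NameNode URLs need a recent release, e.g. 2.5.x)
from hdfs.ext.kerberos import KerberosClient

# Both NameNode URLs go into a single string; the client falls back between them.
hdfs_client = KerberosClient('http://host01:50070;http://host02:50070')
hdfs_client.makedirs('/path/to/create')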
QUESTION
Hi, I am new to Logstash and I have managed to read data from TCP and write it to HDFS. That part is done, but now I want to write the data to 4 different HDFS folders.
Here is the sample code
...ANSWER
Answered 2021-Apr-24 at 05:30
It is possible; you will need to use some mutate filters and some conditionals.
First you need to get the value of the minute from the @timestamp of the event and add it to a new field. You can use the [@metadata] object for this, which can be used for filtering but will not be present in the output event.
QUESTION
I am communicating with HDFS using curl. The procedure to interact with HDFS via WebHDFS has two steps, and I receive a URL from the first curl command:
...ANSWER
Answered 2021-Apr-22 at 14:52
You get a \r (carriage return) back in $destination. You can remove it with tr -d '\r'.
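A hedged sketch of the same two-step WebHDFS CREATE flow in Python (the NameNode host, port and file paths are placeholders); here the requests library parses the Location header itself, so no stray carriage return ends up in the URL:
import requests

base = "http://namenode:50070/webhdfs/v1/tmp/example.txt"

# Step 1: ask the NameNode where to write; it answers with a 307 redirect.
step1 = requests.put(f"{base}?op=CREATE&overwrite=true", allow_redirects=False)
destination = step1.headers["Location"]

# Step 2: send the file content to the DataNode URL we were given.
with open("example.txt", "rb") as f:
    step2 = requests.put(destination, data=f)
step2.raise_for_status()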
QUESTION
So I have this file on HDFS but apparently HDFS can't find it and I don't know why.
The piece of code I have is:
...ANSWER
Answered 2021-Apr-05 at 13:37
The getSchema() method that works is:
QUESTION
I have a JSONObject, like the output in this link:
https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#GETFILESTATUS
I would like to get the pathSuffix (file names) and the modificationTime (dates) values in a JSON array, like this:
ANSWER
Answered 2021-Mar-02 at 22:40
JSON does not support a time type; that is the reason for the error. What you need to do is change the value into a type JSON can use. That might be a string that represents the time (choose the formatting yourself, so you can be sure that when you read it out again you have consistent data), or, more simply, you can just keep the long value.
Here you can see what JSON can use: https://www.json.org/json-en.html
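As a small illustration of that idea, here is a Python sketch assuming a LISTSTATUS-style response shaped like the linked docs (the sample entries are made up); the epoch-millisecond long is rendered as an ISO string, though keeping the long itself would work just as well:
import json
from datetime import datetime, timezone

# Made-up response in the FileStatus format described in the WebHDFS docs linked above.
response = {"FileStatuses": {"FileStatus": [
    {"pathSuffix": "a.patch", "modificationTime": 1320171722771},
    {"pathSuffix": "bar", "modificationTime": 1320895981256},
]}}

result = [
    {
        "pathSuffix": status["pathSuffix"],
        # Convert the epoch-millisecond long into a string JSON can hold.
        "modificationTime": datetime.fromtimestamp(
            status["modificationTime"] / 1000, tz=timezone.utc
        ).isoformat(),
    }
    for status in response["FileStatuses"]["FileStatus"]
]
print(json.dumps(result, indent=2))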
QUESTION
I created a Java function to open a file in HDFS. The function uses only the HDFS API; I do not use any Hadoop dependencies in my code. My function worked well:
...ANSWER
Answered 2021-Feb-24 at 15:36
You can use the exact same logic as the first solution, but this time use a StringBuilder to get the full response, which you then need to parse using a JSON library.
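The answer is about Java (a StringBuilder plus a JSON library); purely as an illustration of the same read-the-whole-response-then-parse pattern, here is a rough Python sketch against a placeholder WebHDFS endpoint, assuming the file being opened contains JSON:
import json
from urllib.request import urlopen

# Placeholder NameNode host/port and file path.
url = "http://namenode:50070/webhdfs/v1/user/me/data.json?op=OPEN"
with urlopen(url) as resp:
    body = resp.read().decode("utf-8")  # accumulate the full response first (the StringBuilder step)
data = json.loads(body)                 # then hand the complete text to a JSON parser
print(data)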
QUESTION
I have installed a Hadoop 3.1.0 cluster on 4 Linux machines: hadoop1 (master), hadoop2, hadoop3, and hadoop4.
I ran start-dfs.sh and start-yarn.sh, and with jps I saw only namenodes and datanodes running. The secondary namenodes, nodemanagers and resourcemanagers failed to start. I tried a few solutions and this is where I got. How do I configure and start the secondary namenodes, nodemanagers and resourcemanagers?
About the secondary namenodes, the logs say
...ANSWER
Answered 2021-Feb-22 at 08:50
I had JDK 15.0.2 installed and it had some sort of problem with Hadoop 3.1.0. Later I installed JDK 8 and changed JAVA_HOME, and it all went fine!
About the secondary namenode, I had hadoop1:9000 for both fs.defaultFS and dfs.namenode.secondary.http-address, which created a conflict. I changed the secondary address to port 9001 and it all went fine!
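A hedged sketch of what the fixed properties might look like (these are the standard Hadoop property names; the hostnames are the ones from the question):
<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop1:9000</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hadoop1:9001</value>
</property>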
QUESTION
I am trying to do some experiments/tests on checkpointing for learning purposes, but I am getting limited options to see how the internals work. I am trying to read from a socket.
...ANSWER
Answered 2021-Jan-25 at 08:22
You are getting the error "This query does not support recovering from checkpoint location" because a socket readStream is not a replayable source and hence does not allow any use of checkpointing. You need to make sure not to use the option checkpointLocation at all in your writeStream.
Typically, you differentiate between the local file system and an HDFS location by using either file:///path/to/dir or hdfs:///path/to/dir.
Make sure that your application user has all the rights to read and write these locations. Also, you may have changed the code base, in which case the application cannot recover from the checkpoint files. You can read about the allowed and disallowed changes in a Structured Streaming job in the Structured Streaming Programming Guide, under Recovery Semantics after Changes in a Streaming Query.
In order to make Spark aware of your HDFS, you need to include two Hadoop configuration files on Spark's classpath:
- hdfs-site.xml, which provides default behaviors for the HDFS client; and
- core-site.xml, which sets the default file system name.
Usually they are stored in "/etc/hadoop/conf". To make these files visible to Spark, set HADOOP_CONF_DIR in $SPARK_HOME/spark-env.sh to a location containing the configuration files.
[Source: the book "Spark: The Definitive Guide"]
"Do we need to provide checkpoint in every
df.writeStream
options, i.e. We can also pass inspark.sparkContext.setCheckpointDir(checkpointLocation)
right?"
Theoretically, you could set the checkpoint location centrally for all queries in your SQLContext, but it is highly recommended to set a unique checkpoint location for every single stream. The Databricks blog on Structured Streaming in Production says:
"This checkpoint location preserves all of the essential information that uniquely identifies a query. Hence, each query must have a different checkpoint location, and multiple queries should never have the same location.
"As a best practice, we recommend that you always specify the checkpointLocation option."
QUESTION
I downloaded Hue from https://github.com/cloudera/hue via git clone and compiled it successfully. However, it shows some connection errors, which means it failed to connect to my local files or the HDFS file system.
Error info from the command line:
[11/Jan/2021 03:58:33 +0000] cluster INFO Resource Manager not available, trying another RM: YARN RM returned a failed response: HTTPConnectionPool(host='localhost', port=8088): Max retries exceeded with url: /ws/v1/cluster/apps?finalStatus=UNDEFINED&limit=1000&user.name=hue&user=jyy&startedTimeBegin=1609761513000&doAs=jyy (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',)).
on localhost:8080:
Could not connect to any of [('127.0.0.1', 10000)] (code THRIFTTRANSPORT): TTransportException("Could not connect to any of [('127.0.0.1', 10000)]",)
The dashboard, schedulers, and documents all reported errors. For example, HDFS reported the following error:
Cannot access: /user/jyy. The HDFS REST service is not available. HTTPConnectionPool(host='localhost', port=50070): Max retries exceeded with url: /webhdfs/v1/user/jyy?op=GETFILESTATUS&user.name=hue&doas=jyy (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',))
Could anyone figure out what the problem is? Thanks in advance!
...ANSWER
Answered 2021-Jan-11 at 13:40
If your Hadoop services are not on localhost, you will need to update the Hue configuration to point to them.
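For example, here is a hedged sketch of a hue.ini excerpt; the section and property names below are assumptions based on a typical Hue install, and the hostnames/ports are placeholders to replace with your own:
[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      fs_defaultfs=hdfs://namenode-host:8020
      webhdfs_url=http://namenode-host:50070/webhdfs/v1
  [[yarn_clusters]]
    [[[default]]]
      resourcemanager_host=resourcemanager-host
      resourcemanager_api_url=http://resourcemanager-host:8088
[beeswax]
  hive_server_host=hiveserver2-host
  hive_server_port=10000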
QUESTION
I want to connect to and access Azure Data Lake Gen1 storage using only an Azure AD username and password.
I have a service account that has access to the Azure Data Lake Gen1 storage. I am able to connect and download files using those credentials via Microsoft Azure Storage Explorer. I am also able to connect in SSIS via the ADLS Connection Manager and the Azure Data Lake Store File System Task.
Now I need to create a console application to connect and perform certain operations (list files and folders, and download files).
Searching on Google, all the results suggest using an Azure AD application (client id, tenant id, etc.). Unfortunately, I do not have that option.
It looks like the SSIS ADLS Connection Manager uses some kind of WebHDFS connection that supports an Azure AD username and password, but I am not able to implement something similar in C#.
As usual, running tight on deadlines. Any help is appreciated.
...ANSWER
Answered 2020-Oct-06 at 05:33
Azure Data Lake Storage Gen1 uses Azure Active Directory for authentication. Before authoring an application that works with Data Lake Storage Gen1, you must decide how to authenticate your application with Azure Active Directory (Azure AD).
The linked reference below shows how end-user and service-to-service authentication mechanisms are supported for Data Lake Storage Gen1.
Reference: Authentication with Azure Data Lake Storage Gen1 using Azure Active Directory
Check out Service-to-service authentication with Azure Data Lake Storage Gen1 using .NET SDK to learn how to use the .NET SDK for service-to-service authentication with Azure Data Lake Storage Gen1.
For end-user authentication with Data Lake Storage Gen1 using .NET SDK, see End-user authentication with Data Lake Storage Gen1 using .NET SDK.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported