lakeFS | Data version control for your data lake | Cloud Storage library
kandi X-RAY | lakeFS Summary
lakeFS is an open source tool that transforms your object storage into a Git-like repository. It enables you to manage your data lake the way you manage your code. With lakeFS you can build repeatable, atomic and versioned data lake operations - from complex ETL jobs to data science and analytics. lakeFS supports AWS S3, Azure Blob Storage and Google Cloud Storage as its underlying storage service. It is API compatible with S3, and works seamlessly with all modern data frameworks such as Spark, Hive, AWS Athena, Presto, etc. For more information see the official lakeFS documentation.
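Because lakeFS is S3 API compatible, any S3 client can read versioned data through its gateway. Below is a minimal, hypothetical sketch using Python's boto3 against a local lakeFS instance; the endpoint, credentials, repository name (example-repo), branch (main) and object key are all placeholders for your own setup.

import boto3

# Point an ordinary S3 client at the lakeFS S3 gateway (placeholder endpoint
# and credentials -- substitute your own lakeFS installation's values).
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8000",
    aws_access_key_id="LAKEFS_ACCESS_KEY_ID",
    aws_secret_access_key="LAKEFS_SECRET_ACCESS_KEY",
)

# lakeFS exposes each repository as a bucket; the first path segment of the
# key selects the branch (or any other ref), so this reads from 'main'.
obj = s3.get_object(Bucket="example-repo", Key="main/collections/file.csv")
print(obj["Body"].read())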
lakeFS Examples and Code Snippets
@Controller
public class SwaggerController {
    private final JsonSerializer jsonSerializer;
    private final SwaggerResourcesProvider swaggerResources;

    @Autowired
    // Truncated in the source; the constructor is completed here from the declared fields.
    public SwaggerController(JsonSerializer jsonSerializer,
                             SwaggerResourcesProvider swaggerResources) {
        this.jsonSerializer = jsonSerializer;
        this.swaggerResources = swaggerResources;
    }
}
Community Discussions
Trending Discussions on lakeFS
QUESTION
Do I need a garbage collector in lakeFS when I delete an object from a branch via the API (using the appropriate method, of course)? Do I understand correctly that the garbage collector is only needed for objects that are deleted by a commit, and that those objects are soft-deleted (by the commit)? And if I use the delete API method, is the object hard-deleted, so that I don't need to invoke the garbage collector?
...ANSWER
Answered 2021-Dec-14 at 07:58
lakeFS manages versions of your data, so deletions only affect successive versions. The object itself remains and can be accessed through an older version.
Garbage collection removes the underlying files. Once a file is gone, its key is still visible in older versions, but if you try to access the file itself you will receive HTTP status code 410 Gone.
For full information, please see the Garbage collection docs.
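For illustration, a hedged sketch of the delete path discussed above, calling the lakeFS REST API's delete-object endpoint directly with Python's requests; the server address, credentials, repository, branch and object path are placeholders:

import requests

LAKEFS_API = "http://localhost:8000/api/v1"           # placeholder server
AUTH = ("LAKEFS_ACCESS_KEY_ID", "LAKEFS_SECRET_KEY")  # placeholder credentials

# Remove 'collections/file.csv' from the head of the 'main' branch.
# As explained above, this only affects subsequent versions; the underlying
# file stays in the object store until garbage collection removes it.
resp = requests.delete(
    f"{LAKEFS_API}/repositories/example-repo/branches/main/objects",
    params={"path": "collections/file.csv"},
    auth=AUTH,
)
resp.raise_for_status()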
QUESTION
{
  "Id": "Policy1590051531320",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1590051522178",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:GetBucketVersioning",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::lakefs", "arn:aws:s3:::lakefs/backend.txt/*"],
      "Principal": {"AWS": ["arn:aws:iam::REDACTED:user/uing"]}
    }
  ]
}
...ANSWER
Answered 2021-Oct-12 at 02:52
You can't have those spaces before the { at the beginning. It should be:
QUESTION
How do I find and hard-delete objects older than n days in lakeFS? Later it'll be a scheduled job.
...ANSWER
Answered 2021-Nov-28 at 10:34
To do that you should use the Garbage Collection (GC) feature in lakeFS.
Note: this feature cleans objects from the underlying storage only after they are deleted from your branches in lakeFS.
You will need to:
Define GC rules to set your desired retention period.
From the lakeFS UI, go to the repository you would like to hard-delete objects from -> Settings -> Retention, and define the GC rule for each branch under the repository. For example -
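A plausible rules document in the JSON format described by the lakeFS garbage collection docs; the branch names and retention periods below are illustrative only:

{
  "default_retention_days": 21,
  "branches": [
    {"branch_id": "main", "retention_days": 28},
    {"branch_id": "dev", "retention_days": 7}
  ]
}

With rules like these, an object deleted from dev becomes eligible for hard deletion after 7 days, while deletions on any other branch fall back to the 21-day default.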
QUESTION
I'm reading the documentation about lakeFS and right now I don't clearly understand what a merge, or even a merge conflict, is in terms of lakeFS.
Let's say I use Apache Hudi for ACID support over a single table. I'd like to introduce multi-table ACID support, and for this purpose I would like to use lakeFS together with Hudi.
If I understand everything correctly, lakeFS is a data-agnostic solution and knows nothing about the data itself. lakeFS only establishes boundaries (version control) and somehow moderates concurrent access to the data.
So the reasonable question is: if lakeFS is data agnostic, how does it support the merge operation? What does a merge itself mean in terms of lakeFS? And is it possible to have a merge conflict there?
...ANSWER
Answered 2021-Oct-04 at 16:59
You do understand everything correctly. You can see on the branching model page that lakeFS is currently data agnostic and relies simply on the hierarchical directory structure. A conflict occurs when two branches update the same file. This behavior fits most data engineers' CI/CD use cases.
If you are working with Delta Lake and have made changes to the same table from two different branches, there will still be a conflict, because both branches changed the log file. To resolve the conflict you would need to forgo one of the change sets. Admittedly this is not the best user experience, and it is currently being worked on. You can read more about it in the roadmap documentation.
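To make this concrete, here is a hedged sketch of requesting a merge through the lakeFS REST API with Python's requests and detecting a conflict; the server address, credentials, repository and branch names are placeholders:

import requests

LAKEFS_API = "http://localhost:8000/api/v1"           # placeholder server
AUTH = ("LAKEFS_ACCESS_KEY_ID", "LAKEFS_SECRET_KEY")  # placeholder credentials

# Ask lakeFS to merge the 'experiment' branch into 'main'.
resp = requests.post(
    f"{LAKEFS_API}/repositories/example-repo/refs/experiment/merge/main",
    auth=AUTH,
    json={"message": "merge experiment into main"},
)
if resp.status_code == 409:
    # Both branches changed the same path since diverging -- a merge conflict.
    print("conflict:", resp.text)
else:
    resp.raise_for_status()
    print("merged:", resp.json())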
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install lakeFS
Ensure you have Docker & Docker Compose installed on your computer.
Run the following command: curl https://compose.lakefs.io | docker-compose -f - up
Open http://127.0.0.1:8000/setup in your web browser to set up an initial admin user, used to log in and send API requests.
On Windows, ensure you have Docker installed.
Run the following command in PowerShell: Invoke-WebRequest https://compose.lakefs.io | Select-Object -ExpandProperty Content | docker-compose -f - up
Open http://127.0.0.1:8000/setup in your web browser to set up an initial admin user, used to log in and send API requests.
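Once the admin user exists, you can exercise the API. Below is a hypothetical first step, creating a repository against the quickstart instance with Python's requests; the credentials come from the setup page, and the repository name and storage namespace are placeholders (the Docker quickstart uses the local:// scheme; use s3://... on AWS, and so on):

import requests

LAKEFS_API = "http://127.0.0.1:8000/api/v1"
AUTH = ("LAKEFS_ACCESS_KEY_ID", "LAKEFS_SECRET_KEY")  # from the setup page

# Create a first repository on the quickstart's local block adapter.
resp = requests.post(
    f"{LAKEFS_API}/repositories",
    auth=AUTH,
    json={
        "name": "example-repo",
        "storage_namespace": "local://example-repo",
        "default_branch": "main",
    },
)
resp.raise_for_status()
print(resp.json())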