bfg-repo-cleaner | Removes large or troublesome blobs like git-filter-branch does, but faster
kandi X-RAY | bfg-repo-cleaner Summary
BFG Repo-Cleaner
Community Discussions
Trending Discussions on bfg-repo-cleaner
QUESTION
I have a Bitbucket repository that needs to be migrated to GitHub. It uses git lfs for large files and has a few corrupt LFS objects, which prevents pushing to GitHub (now the origin in the command below). The push always fails with the error: Your push referenced at least X unknown Git LFS objects.
I'm now wondering if I can either ignore the pre-receive hook or drop/delete the LFS objects in question altogether. I don't care much about these files; they are quite old and by now not even in the repo anymore.
...ANSWER
Answered 2022-Jan-20 at 12:55
I retried the approach with BFG Repo-Cleaner and it worked nicely this time. I also pushed directly to the new repo (GitHub), which has the advantage that the original repository is not touched by the rewrite at all.
The following measures allowed me to migrate the repo from Bitbucket to GitHub while deleting the corrupt files from its history.
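The specific measures are elided; a minimal sketch of such a migration, assuming the corrupt LFS objects' Git IDs are collected in a hypothetical file bad-object-ids.txt and using placeholder URLs:

# mirror-clone the original Bitbucket repo (bare copy of all refs)
git clone --mirror https://bitbucket.org/example/repo.git
# strip the blobs whose Git object IDs are listed in bad-object-ids.txt
java -jar bfg.jar --strip-blobs-with-ids bad-object-ids.txt repo.git
cd repo.git
# expire the rewritten history and repack, as the BFG usage notes recommend
git reflog expire --expire=now --all && git gc --prune=now --aggressive
# push the cleaned mirror straight to GitHub; Bitbucket stays untouched
git push --mirror https://github.com/example/repo.git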
QUESTION
We are trying to shrink our git repository to under 500MB due to deployment issues.
To achieve that, we have created a new branch where we have moved all old images, videos and fonts to AWS S3.
I can easily get the list of files with git diff --name-only --diff-filter=D master -- public/assets/.
Now, I have tried to run BFG-repo-cleaner 1.14.0 on each file. But I have 400 files and it is taking ages to delete each file separately (still running as I'm writing this).
git diff --name-only --diff-filter=D master -- public/assets/ | xargs -i basename '{}' | xargs -i bfg --delete-files '{}'
Since each file is distinct, I cannot really use a glob pattern, as suggested in Delete multiple files from multiple branch using bfg repo cleaner.
I tried to separate each file with a comma but that resulted in BFG-repo-cleaner telling me:
BFG aborting: No refs to update - no dirty commits found??
Is there a way to provide multiple files to BFG-repo-cleaner without a glob pattern?
PS. The command I tried with multiple files is: git diff --name-only --diff-filter=D master -- public/assets/ | xargs -i basename '{}' | sed -z 's/\n/,/g;s/,$/\n/' | xargs -i bfg --delete-files '{}' && git reflog expire --expire=now --all && git gc --prune=now --aggressive
PPS. The bfg command is on my PATH as a simple bash script containing java -jar /tools/BFG-repo-cleaner/bfg-1.14.0.jar "$@"
ANSWER
Answered 2021-Dec-17 at 00:26
"But I have 400 files and it is taking ages to delete each file separately"
That is why the tool to use (Python-based) is newren/git-filter-repo (see its installation instructions).
That way, you can feed the tool a file with the list of files in it:
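The snippet itself is elided; a sketch of how git filter-repo consumes such a path list in one pass (files-to-delete.txt is a hypothetical name):

# collect the deleted asset paths, one per line
git diff --name-only --diff-filter=D master -- public/assets/ > files-to-delete.txt
# rewrite the whole history once, dropping every listed path
git filter-repo --invert-paths --paths-from-file files-to-delete.txt

Unlike BFG's --delete-files, which matches bare file names, git filter-repo matches full paths, so the basename step from the original pipeline is unnecessary.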
QUESTION
I am cleaning a git repository's pull requests (PR). There was one PR created to merge a branch B, which was later considered deprecated and removed before being merged. As a result, branch B was deleted, and this PR is not showing up in Bitbucket's pull request list. However, if I use git show-ref, this PR is in the ref list as well as in the remote repository history. Is there a way to clear this PR in the remote repository?
ANSWER
Answered 2020-Nov-05 at 04:11
Make sure you have deleted the branch and not just the PR. If the ref is showing up in your local repo, you can run git fetch --prune to remove any refs on local that are not on your remote. You might also want to run git gc as a follow-up to remove orphaned objects.
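Put together, the suggested cleanup is just two standard Git commands:

# drop local remote-tracking refs whose branch no longer exists on the remote
git fetch --prune
# garbage-collect objects that are no longer reachable from any ref
git gc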
QUESTION
I'm creating a CLI program in Kotlin (Java). I want to bind the main function/class to an individual command, such as program.
However, from what I have found online, it seems like the only way to run a Java program is with the java command, for example java -jar program.jar args or java -cp "..." Program args. These are very inconvenient for users to type every time, which I experienced when I used BFG, a command-line repository cleaner tool written in Java.
I could use an alias, but there is no standard way to add aliases to a system when users install my CLI program. For example, most people use Bash, so I would have to install the alias in .bashrc or .profile, but others might use zsh or csh, which don't read .profile.
I could also wrap it with a native program, but then I would need to write the wrapper in a native language just to redirect the commands, at which point I might as well rewrite the entire thing in that language.
In Node.js, developers can simply specify their command in their package.json, and everyone who installs the package through npm i -g can use the command. What is the simplest alternative to this in JVM languages?
ANSWER
Answered 2020-Jul-06 at 16:33
If I were you, I'd organise things the following way:
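The answer's layout is elided; one common way to get a plain program command on the JVM (a sketch, not necessarily the answerer's exact setup) is Gradle's application plugin, which generates launcher scripts:

// build.gradle.kts - hypothetical minimal configuration
plugins {
    kotlin("jvm") version "1.9.24"
    application
}
application {
    mainClass.set("MainKt")     // assumes a top-level main() in Main.kt
    applicationName = "program" // the command users will type
}

Running ./gradlew installDist then produces build/install/program/bin/program (plus a .bat variant for Windows) that users can add to their PATH.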
QUESTION
Note: there's a similar question How to keep commit hashs not change when use git filter-repo rewrite the history but the answers focus on how Git cannot do that. In this question, I'd like to explore whether, in theory, it's possible to write a custom script to keep the commit hashes.
Git filter-branch and BFG Repo-Cleaner are two popular tools to remove large files and other things from a repo's history. They lead to different commit SHAs / hashes, which is how Git works, as it "fingerprints" the contents of the commit, its parents, etc.
However, we're in a situation where the unfortunate large-file commit happened a while ago, and we have all sorts of references to newer commits, e.g. in GitHub issues ("see commit f0ec467") and other external systems. If we used filter-branch or BFG, lots of things would break.
So I came here to ask whether there's some dirty, low-level trick to keep commit IDs / SHA-1s even for rewritten commits. I imagine that for a bad commit that we want to rewrite, a custom script would create a new Git object but "hardcode" the same old SHA-1, skipping its calculation. The newer commits (its children / descendants) should continue working, I think (?!).
If this couldn't work, I'd like to understand why. Does Git check that hashes agree with the actual contents regularly? Does it do so only during some operations, like gc, push, or pull?
(I know this is very thin ice; I'm just technically exploring our options before we accept that we'll have a large binary in our repo forever, with all the implications, like much larger backups forever, full clones taking longer, etc.)
UPDATE: There's now an accepted answer, but at the same time, no answer mentions git replace, which might be the solution to this? I've done some basic experiments but am not sure yet.
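For the record, a basic git replace experiment looks like this (a sketch of the idea the update hints at, not a confirmed solution; note that replacement refs stay local unless shared explicitly):

# after building a cleaned variant of the bad commit (e.g. with git commit-tree):
git replace <bad-commit-sha> <cleaned-commit-sha>
# replacements live under refs/replace/ and are not pushed or fetched by default
git push origin 'refs/replace/*'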
ANSWER
Answered 2020-Oct-05 at 09:06I included a link as a comment, but in fact, breaking SHA-1 doesn't help very much.
The problem is that Gits exchange objects by comparing object hash IDs. These are currently SHA-1 (see the other question and its answer for some future possibilities). If you manage to break SHA-1, and produce a new input object that generates the same hash ID, you could:
- rip the old object out of your Git's object database, then
- insert the new object into your Git's database
and from then on, your Git would see only the new object, instead of the old one. But when you connect your Git to some other Git, and your Git says to that other Git: "I have object a123456..., would you like it?", the other Git might just answer: "No thanks, I already have that one." They have the old one, of course. So you've made your Git incompatible with their Git, but gained nothing from this.
If the other Git doesn't have the object in question, well, then you're OK! They will ask for your copy and you can hand that over.
Commit and tag objects have room in them for somewhat-arbitrary (not completely arbitrary) user data. This is where you would put your perturbable data for breaking SHA-1. Tree objects are less friendly, but as long as you can do what you need to with commit and tag objects, you can probably bypass this.
As for where to get the compute power, well, the price of a large group of Raspberry Pi computers is coming down....
Edit: I forgot to address this question:
Does Git check that hashes agree with the actual contents regularly?
Yes. In fact, it does this check every time it extracts an object by its hash ID. Remember that the bulk of most repositories is the object database, which is a simple key-value store. The key is the hash ID and the data stored under that key represent the object. Git uses the key to do the lookup, then verifies that the stored data hash to that key, to make sure the stored data were not corrupted by a disk or memory error.
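That verification can also be triggered explicitly:

# extracting any object re-hashes the stored data and compares it to the key
git cat-file -p <object-hash>
# fsck walks the entire object database and reports any mismatch
git fsck --full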
QUESTION
A Bitbucket git repo has a size limit of 2GB, and now I have one repo (let's call it the bigsize repo) that is already dangerously close to that limit, due to a lot of binary files (files with extensions of dll and msm). It's so close to the limit that I'm scared any further commit involving binary files will tip the size over 2GB, and hence the commit will fail.
Now, what is the best way to reduce the bigsize repo's size?
I'm thinking about using the LFS feature, but there is a 1GB limit on the LFS space, which I'm afraid will not be sufficient for this repo (as the majority of the repo size comes from the binary files that I would want to store in LFS).
So I'm thinking about just removing all the binary files (I don't mind losing them from source control, as I have them on my local drive) from the repo and its history. What is the best way to do this, considering the current size of my bigsize repo?
The attack plan that I have:
- Make sure that for all the branches on the bigsize repo (yes, I have more than one branch on this gigantic repo), I've removed all of the binary files (by submitting a commit that specifies *.dll in .gitignore and uses the git rm -r --cached command). This is needed because "By default the BFG doesn't modify the contents of your latest commit on your master (or 'HEAD') branch, even though it will clean all the commits before it."
- Then use BFG's --delete-files command to "rewrite the history" so that the repo size will be reduced.
Does the approach work for a repo that is very close to 2GB? I'm afraid that at step 1, when I use git rm, it will add to the history and push the repo size over 2GB, and hence fail.
Important details:
- I'm the sole author of the repository
- I have multiple active branches now. Throughout the history, multiple branches have been merged into the main branch
- I don't use the repo for discussions or code reviews, or even tags. I just use it as a single branch, with occasional branching and merging
ANSWER
Answered 2020-Jun-04 at 06:00
git filter-branch and BFG are obsolete.
With Git 2.22 or more, use git filter-repo:
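The answer's command is elided; a hedged sketch with git filter-repo, assuming a fresh mirror clone and using the question's extensions as illustrative globs (URL is a placeholder):

# filter-repo insists on running in a fresh clone
git clone --mirror https://bitbucket.org/example/bigsize.git
cd bigsize.git
# delete every .dll and .msm file from all commits on all branches
git filter-repo --invert-paths --path-glob '*.dll' --path-glob '*.msm'
# filter-repo removes the origin remote as a safety measure; re-add and force-push
git remote add origin https://bitbucket.org/example/bigsize.git
git push --mirror --force origin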
QUESTION
I'm having problems fetching one specific branch from my remote repo.
If I do git branch -a, the output is:
...ANSWER
Answered 2020-May-04 at 16:57
The git fetch command takes zero, one, or two-or-more arguments:
- git fetch: call up the default remote (usually origin) and fetch everything
- git fetch remote: call up the named remote. Usually you must use origin here.
- git fetch remote branch1 ... branchN: call up the named remote, and when it lists its branches, pick out only the specific named branches.
You're trying to use the last of these three forms but making two separate mistakes:
- You must provide the name of the remote, in this case, origin.
- The origin/* names in your Git are the result of your Git renaming their Git's branch names so they don't conflict with your own branch names.
Hence what you want is:
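The command itself is elided, but given the two corrections it presumably has this shape (branch-name stands in for the actual branch):

# name the remote, and give the branch without the origin/ prefix
git fetch origin branch-name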
QUESTION
I'm trying to solve a repository size limit issue on a repository hosted on Bitbucket, where I have reached the 2GB repository size limit and can no longer push new commits.
At one point in time, this repository became very big due to the nature of the files in it (Many textures and sound files that were needed as part of the Unity project this repo is for), and our solution at the time was to create a submodule to place all assets into while keeping the core project in the main repo.
This worked for a while, but now I have reached the limit again on the main repo, and digging into it (thanks to the script provided in this answer: How to find out which files take up the most space in git repo?), I found out that the reason for this was that all of the files that we had moved to the submodule still have their history intact in the main repo.
I then found out about The BFG (https://rtyley.github.io/bfg-repo-cleaner/), which I could use to rewrite the history of my repo and remove the large entries in it by using bfg --delete-folders, then running git gc as instructed in the Usage segment of The BFG's page.
Running git count-objects -v afterwards seems to suggest that this worked, with my repo dropping from a 1.9GB size-pack to under 300MB. And running the script linked above that finds the heaviest entries in the repo no longer finds any of the problem files that were moved to the submodule.
My problem is that despite all that, when I try to push the cleaned repo back to Bitbucket, it fails under the same claim that the repo had exceeded the 2GB size limit and can no longer accept any pushes. If it can't accept the push that's supposed to clean it up, then how exactly am I expected to do that? Any ideas what I'm missing?
Thanks
...ANSWER
Answered 2020-Jan-05 at 03:24
You might delete all its branches on Bitbucket and then push the new branches... or you might delete the repo as a whole on Bitbucket, recreate it from scratch, and push the new branches.
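A sketch of the second option, with a placeholder URL (the cleaned local repo is pushed into a freshly recreated remote):

# after recreating an empty repo on Bitbucket, point the cleaned clone at it
git remote set-url origin https://bitbucket.org/example/repo.git
# push all branches and tags of the rewritten history
git push --all origin
git push --tags origin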
Community Discussions, Code Snippets contain sources that include Stack Exchange Network