bfg-repo-cleaner | Removes large or troublesome blobs like git-filter-branch does, but faster
kandi X-RAY | bfg-repo-cleaner Summary
BFG Repo-Cleaner
Community Discussions
Trending Discussions on bfg-repo-cleaner
QUESTION
I have a Bitbucket repository that needs to be migrated to GitHub. It uses git lfs for large files and has a few corrupt LFS objects, which prevents pushing to GitHub (now the origin in the command below). The push always fails with the error: Your push referenced at least X unknown Git LFS objects.
I'm now wondering if I can either ignore the pre-receive hook or drop/delete the LFS objects in question altogether. I don't care much about these files; they are quite old and by now not even in the repo anymore.
...ANSWER
Answered 2022-Jan-20 at 12:55
I retried the approach with BFG Repo-Cleaner and it worked nicely this time. I also pushed directly to the new repo (GitHub), which has the advantage that the original repository is not touched by the rewrite at all.
The following measures allowed me to migrate the repo from Bitbucket to GitHub while deleting the corrupt files from its history.
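The specific measures are elided; a minimal sketch of such a migration, assuming the corrupt LFS objects' Git IDs are collected in a hypothetical file bad-object-ids.txt and using placeholder URLs:

# mirror-clone the original Bitbucket repo (bare copy of all refs)
git clone --mirror https://bitbucket.org/example/repo.git
# strip the blobs whose Git object IDs are listed in bad-object-ids.txt
java -jar bfg.jar --strip-blobs-with-ids bad-object-ids.txt repo.git
cd repo.git
# expire the rewritten history and repack, as the BFG usage notes recommend
git reflog expire --expire=now --all && git gc --prune=now --aggressive
# push the cleaned mirror straight to GitHub; Bitbucket stays untouched
git push --mirror https://github.com/example/repo.git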
QUESTION
We are trying to shrink our git repository to under 500MB due to deployment issues.
To achieve that, we have created a new branch where we have moved all old images, videos and fonts to AWS S3.
I can easily get the list of files with git diff --name-only --diff-filter=D master -- public/assets/.
Now, I have tried to run BFG-repo-cleaner 1.14.0 on each file. But I have 400 files and it is taking ages to delete each file separately (still running as I'm writing this).
git diff --name-only --diff-filter=D master -- public/assets/ | xargs -i basename '{}' | xargs -i bfg --delete-files '{}'
Since each file is distinct, I cannot really use a glob pattern, as suggested in Delete multiple files from multiple branch using bfg repo cleaner.
I tried to separate each file with a comma but that resulted in BFG-repo-cleaner telling me:
BFG aborting: No refs to update - no dirty commits found??
Is there a way to provide multiple files to BFG-repo-cleaner without a glob pattern?
PS. The command I tried with multiple files is: git diff --name-only --diff-filter=D master -- public/assets/ | xargs -i basename '{}' | sed -z 's/\n/,/g;s/,$/\n/' | xargs -i bfg --delete-files '{}' && git reflog expire --expire=now --all && git gc --prune=now --aggressive
PPS. The bfg command is on my PATH as a simple bash script containing java -jar /tools/BFG-repo-cleaner/bfg-1.14.0.jar "$@"
ANSWER
Answered 2021-Dec-17 at 00:26
"But I have 400 files and it is taking ages to delete each file separately"
That is why the tool to use (Python-based) is newren/git-filter-repo (see its installation instructions).
That way, you can feed the tool a file with the list of files in it:
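The snippet itself is elided; a sketch of how git filter-repo consumes such a path list in one pass (files-to-delete.txt is a hypothetical name):

# collect the deleted asset paths, one per line
git diff --name-only --diff-filter=D master -- public/assets/ > files-to-delete.txt
# rewrite the whole history once, dropping every listed path
git filter-repo --invert-paths --paths-from-file files-to-delete.txt

Unlike BFG's --delete-files, which matches bare file names, git filter-repo matches full paths, so the basename step from the original pipeline is unnecessary.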
QUESTION
I am cleaning a git repository's pull requests (PR). There was one PR created to merge a branch B, which was later considered deprecated and removed before being merged. As a result, branch B was deleted, and this PR is not showing up in Bitbucket's pull request list. However, if I use git show-ref, this PR is in the ref list as well as in the remote repository history. Is there a way to clear this PR in the remote repository?
ANSWER
Answered 2020-Nov-05 at 04:11
Make sure you have deleted the branch and not just the PR. If the ref is showing up in your local repo, you can run git fetch --prune to remove any refs on local that are not on your remote. You might also want to run git gc as a follow-up to remove orphaned objects.
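Put together, the suggested cleanup is just two standard Git commands:

# drop local remote-tracking refs whose branch no longer exists on the remote
git fetch --prune
# garbage-collect objects that are no longer reachable from any ref
git gc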
QUESTION
I'm creating a CLI program in Kotlin (Java). I want to bind the main function/class to an individual command, such as program.
However, from what I have found online, it seems like the only way to run a Java program is with the java command, for example java -jar program.jar args or java -cp "..." Program args. These are very inconvenient for users to type every time, which I experienced when I used BFG, a command-line repository cleaner tool written in Java.
I could use an alias, but there is no standard way to add aliases to a system when users install my CLI program. For example, most people use Bash, so I would have to install the alias in .bashrc or .profile, but others might use zsh or csh, which don't read .profile.
I could also wrap it with a native program, but then I would need to write the wrapper in a native language just to redirect the commands, at which point I might as well rewrite the entire thing in that language.
In Node.js, developers can simply specify their command in their package.json, and everyone who installs the package through npm i -g can use the command. What is the simplest alternative to this in JVM languages?
ANSWER
Answered 2020-Jul-06 at 16:33
If I were you, I'd organise things the following way:
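The answer's layout is elided; one common way to get a plain program command on the JVM (a sketch, not necessarily the answerer's exact setup) is Gradle's application plugin, which generates launcher scripts:

// build.gradle.kts - hypothetical minimal configuration
plugins {
    kotlin("jvm") version "1.9.24"
    application
}
application {
    mainClass.set("MainKt")     // assumes a top-level main() in Main.kt
    applicationName = "program" // the command users will type
}

Running ./gradlew installDist then produces build/install/program/bin/program (plus a .bat variant for Windows) that users can add to their PATH.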
QUESTION
Note: there's a similar question How to keep commit hashs not change when use git filter-repo rewrite the history but the answers focus on how Git cannot do that. In this question, I'd like to explore whether, in theory, it's possible to write a custom script to keep the commit hashes.
Git filter-branch and BFG Repo-Cleaner are two popular tools to remove large files and other things from a repo's history. They lead to different commit SHAs / hashes, which is how Git works, as it "fingerprints" the contents of the commit, its parents, etc.
However, we're in a situation where the unfortunate large-file commit happened a while ago, and we have all sorts of references to newer commits, e.g. in GitHub issues ("see commit f0ec467") and other external systems. If we used filter-branch or BFG, lots of things would break.
So I came here to ask whether there's some dirty, low-level trick to keep commit IDs / SHA-1s even for rewritten commits. I imagine that for a bad commit that we want to rewrite, a custom script would create a new Git object but "hardcode" the same old SHA-1, skipping its calculation. The newer commits (its children / descendants) should continue working, I think (?!).
If this couldn't work, I'd like to understand why. Does Git check that hashes agree with the actual contents regularly? Does it do so only during some operations, like gc, push, or pull?
(I know this is very thin ice; I'm just technically exploring our options before we accept that we'll have a large binary in our repo forever, with all the implications, like much larger backups forever, full clones taking longer, etc.)
UPDATE: There's now an accepted answer, but at the same time, no answer mentions git replace, which might be the solution to this? I've done some basic experiments but am not sure yet.
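For the record, a basic git replace experiment looks like this (a sketch of the idea the update hints at, not a confirmed solution; note that replacement refs stay local unless shared explicitly):

# after building a cleaned variant of the bad commit (e.g. with git commit-tree):
git replace <bad-commit-sha> <cleaned-commit-sha>
# replacements live under refs/replace/ and are not pushed or fetched by default
git push origin 'refs/replace/*'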
ANSWER
Answered 2020-Oct-05 at 09:06I included a link as a comment, but in fact, breaking SHA-1 doesn't help very much.
The problem is that Gits exchange objects by comparing object hash IDs. These are currently SHA-1 (see the other question and its answer for some future possibilities). If you manage to break SHA-1, and produce a new input object that generates the same hash ID, you could:
- rip the old object out of your Git's object database, then
- insert the new object into your Git's database
and from then on, your Git would see only the new object, instead of the old one. But when you connect your Git to some other Git, and your Git says to that other Git: "I have object a123456..., would you like it?", the other Git might just answer: "No thanks, I already have that one." They have the old one, of course. So you've made your Git incompatible with their Git, but gained nothing from this.
If the other Git doesn't have the object in question, well, then you're OK! They will ask for your copy and you can hand that over.
Commit and tag objects have room in them for somewhat-arbitrary (not completely arbitrary) user data. This is where you would put your perturbable data for breaking SHA-1. Tree objects are less friendly, but as long as you can do what you need to with commit and tag objects, you can probably bypass this.
As for where to get the compute power, well, the price of a large group of Raspberry Pi computers is coming down....
Edit: I forgot to address this question:
Does Git check that hashes agree with the actual contents regularly?
Yes. In fact, it does this check every time it extracts an object by its hash ID. Remember that the bulk of most repositories is the object database, which is a simple key-value store. The key is the hash ID and the data stored under that key represent the object. Git uses the key to do the lookup, then verifies that the stored data hash to that key, to make sure the stored data were not corrupted by a disk or memory error.
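That verification can also be triggered explicitly:

# extracting any object re-hashes the stored data and compares it to the key
git cat-file -p <object-hash>
# fsck walks the entire object database and reports any mismatch
git fsck --full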
QUESTION
A Bitbucket git repo has a size limit of 2GB, and now I have one repo (let's call it the bigsize repo) that is already dangerously close to that limit, due to a lot of binary files (files with extensions of dll and msm). It's so close to the limit that I'm scared any further commit involving binary files will tip the size over 2GB, and hence the commit will fail.
Now, what is the best way to reduce the bigsize repo's size?
I'm thinking about using the LFS feature, but there is a 1GB limit on the LFS space, which I'm afraid will not be sufficient for this repo (as the majority of the repo size comes from the binary files that I would want to store in LFS).
So I'm thinking about just removing all the binary files (I don't mind losing them from source control, as I have them on my local drive) from the repo and its history. What is the best way to do this, considering the current size of my bigsize repo?
The attack plan that I have:
- Make sure that for all the branches on the bigsize repo (yes, I have more than one branch on this gigantic repo), I've removed all of the binary files (by submitting a commit that specifies *.dll in .gitignore and uses the git rm -r --cached command). This is needed because "By default the BFG doesn't modify the contents of your latest commit on your master (or 'HEAD') branch, even though it will clean all the commits before it."
- Then use BFG's --delete-files command to "rewrite the history" so that the repo size will be reduced.
Does the approach work for a repo that is very close to 2GB? I'm afraid that at step 1, when I use git rm, it will add to the history and push the repo size over 2GB, and hence fail.
Important details:
- I'm the sole author of the repository
- I have multiple active branches now. Throughout the history, multiple branches have been merged into the main branch
- I don't use the repo for discussions or code reviews, or even tags. I just use it as a single branch, with occasional branching and merging
ANSWER
Answered 2020-Jun-04 at 06:00
git filter-branch and BFG are obsolete.
With Git 2.22 or more, use git filter-repo:
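The answer's command is elided; a hedged sketch with git filter-repo, assuming a fresh mirror clone and using the question's extensions as illustrative globs (URL is a placeholder):

# filter-repo insists on running in a fresh clone
git clone --mirror https://bitbucket.org/example/bigsize.git
cd bigsize.git
# delete every .dll and .msm file from all commits on all branches
git filter-repo --invert-paths --path-glob '*.dll' --path-glob '*.msm'
# filter-repo removes the origin remote as a safety measure; re-add and force-push
git remote add origin https://bitbucket.org/example/bigsize.git
git push --mirror --force origin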
QUESTION
I'm having problems fetching one specific branch from my remote repo.
If I do git branch -a, the output is:
...ANSWER
Answered 2020-May-04 at 16:57
The git fetch command takes zero, one, or two-or-more arguments:
- git fetch: call up the default remote (usually origin) and fetch everything
- git fetch remote: call up the named remote. Usually you must use origin here.
- git fetch remote branch1 ... branchN: call up the named remote, and when it lists its branches, pick out only the specific named branches.
You're trying to use the last of these three forms but making two separate mistakes:
- You must provide the name of the remote, in this case, origin.
- The origin/* names in your Git are the result of your Git renaming their Git's branch names so they don't conflict with your own branch names.
Hence what you want is:
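The command itself is elided, but given the two corrections it presumably has this shape (branch-name stands in for the actual branch):

# name the remote, and give the branch without the origin/ prefix
git fetch origin branch-name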
QUESTION
I'm trying to solve a repository size limit issue on a repository hosted on Bitbucket, where I have reached the 2GB repository size limit and can no longer push new commits.
At one point in time, this repository became very big due to the nature of the files in it (Many textures and sound files that were needed as part of the Unity project this repo is for), and our solution at the time was to create a submodule to place all assets into while keeping the core project in the main repo.
This worked for a while, but now I have reached the limit again on the main repo, and digging into it (thanks to the script provided in this answer: How to find out which files take up the most space in git repo?), I found out that the reason for this was that all of the files that we had moved to the submodule still have their history intact in the main repo.
I then found out about The BFG (https://rtyley.github.io/bfg-repo-cleaner/), which I could use to rewrite the history of my repo and remove the large entries in it by using bfg --delete-folders, then running git gc as instructed in the Usage segment of The BFG's page.
Running git count-objects -v afterwards seems to suggest that this worked, with my repo dropping from a 1.9GB size-pack to under 300MB. And running the script linked above that finds the heaviest entries in the repo no longer finds any of the problem files that were moved to the submodule.
My problem is that despite all that, when I try to push the cleaned repo back to Bitbucket, it fails under the same claim that the repo had exceeded the 2GB size limit and can no longer accept any pushes. If it can't accept the push that's supposed to clean it up, then how exactly am I expected to do that? Any ideas what I'm missing?
Thanks
...ANSWER
Answered 2020-Jan-05 at 03:24
You might delete all its branches on Bitbucket and then push the new branches... or you might delete the repo as a whole on Bitbucket, recreate it from scratch, and push the new branches.
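A sketch of the second option, with a placeholder URL (the cleaned local repo is pushed into a freshly recreated remote):

# after recreating an empty repo on Bitbucket, point the cleaned clone at it
git remote set-url origin https://bitbucket.org/example/repo.git
# push all branches and tags of the rewritten history
git push --all origin
git push --tags origin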
Community Discussions, Code Snippets contain sources that include Stack Exchange Network