gpt-2 | Language Models are Unsupervised Multitask Learners

by openai · Python · Version: Current · License: Non-SPDX

kandi X-RAY | gpt-2 Summary

gpt-2 is a Python library. It has no reported bugs or vulnerabilities, a build file is available, and it has medium support. However, gpt-2 has a Non-SPDX license. You can download it from GitHub.

Status: Archive (code is provided as-is, no updates expected).

            kandi-support Support

              gpt-2 has a medium active ecosystem.
              It has 19332 star(s) with 4951 fork(s). There are 628 watchers for this library.
              It had no major release in the last 6 months.
There are 119 open issues and 130 closed issues. On average, issues are closed in 18 days. There are 34 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of gpt-2 is current.

            kandi-Quality Quality

              gpt-2 has 0 bugs and 0 code smells.

            kandi-Security Security

              gpt-2 has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              gpt-2 code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              gpt-2 has a Non-SPDX License.
A Non-SPDX license may be an open-source license that is not SPDX-compliant, or a non-open-source license; review it closely before use.

            kandi-Reuse Reuse

              gpt-2 releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              Installation instructions are not available. Examples and code snippets are available.
              gpt-2 saves you 173 person hours of effort in developing the same functionality from scratch.
              It has 428 lines of code, 36 functions and 6 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

kandi has reviewed gpt-2 and discovered the functions below as its top functions. This is intended to give you an instant insight into the functionality gpt-2 implements and help you decide whether it suits your requirements; a usage sketch follows the list.
            • Constructs an interaction model
            • Decode a sequence of tokens
            • Encode text using bpe
            • Return the bpe of the given token
            • Randomly sample a model
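For orientation, here is a hedged sketch of how the encode/decode functions above are typically used, assuming the repository's src/encoder.py layout and that a model such as "124M" has already been fetched with download_model.py; exact signatures may differ between revisions of the repo.

import sys
sys.path.append("src")  # make the repo's src/ modules importable (run from the repo root)
import encoder          # gpt-2/src/encoder.py

enc = encoder.get_encoder("124M", "models")  # loads encoder.json and vocab.bpe
tokens = enc.encode("Hello, world!")         # text -> BPE token ids
print(tokens)
print(enc.decode(tokens))                    # BPE token ids -> text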

            gpt-2 Key Features

            No Key Features are available at this moment for gpt-2.

            gpt-2 Examples and Code Snippets

GPT-2 PyTorch Implementation, Usage, How to train?
Python · Lines of Code: 72 · License: Permissive (Apache-2.0)
            $ python -m gpt2 train --train_corpus           build/corpus.train.txt \
                                   --eval_corpus            build/corpus.test.txt \
                                   --vocab_path             build/vocab.txt \
                                   --save_checkpoin  
GPT-2 PyTorch Implementation, Usage, Generate sentences!
Python · Lines of Code: 26 · License: Permissive (Apache-2.0)
            $ python -m gpt2 generate --vocab_path      build/vocab.txt \
                                      --model_path      model.pth \
                                      --seq_len         64 \
                                      --nucleus_prob    0.8
            
            usage: gpt2 generate [-h] --vocab_  
Belgian GPT-2, Usage
Python · Lines of Code: 23 · License: Permissive (MIT)
            import torch
            from transformers import GPT2Tokenizer, GPT2LMHeadModel
            
            # Load pretrained model and tokenizer
            model = GPT2LMHeadModel.from_pretrained("antoiloui/belgpt2")
            tokenizer = GPT2Tokenizer.from_pretrained("antoiloui/belgpt2")
            
            # Generate a samp  
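The snippet above is cut off by the page. As a hedged continuation (my own sketch, not the original elided code), generation with the standard transformers API would look roughly like this; the sampling settings are illustrative:

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Continuation sketch: reload the same checkpoint and generate a sample
model = GPT2LMHeadModel.from_pretrained("antoiloui/belgpt2")
tokenizer = GPT2Tokenizer.from_pretrained("antoiloui/belgpt2")

input_ids = tokenizer.encode("Aujourd'hui,", return_tensors="pt")
output = model.generate(input_ids, do_sample=True, max_length=50, top_p=0.95,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))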

            Community Discussions

            QUESTION

            Solving "CUDA out of memory" when fine-tuning GPT-2 (HuggingFace)
            Asked 2022-Apr-03 at 09:45

I get a recurring CUDA out of memory error when using the HuggingFace Transformers library to fine-tune a GPT-2 model and can't seem to solve it, despite my 6 GB of GPU capacity, which I thought should be enough for fine-tuning on texts. The error reads as follows:

            ...

            ANSWER

            Answered 2022-Apr-03 at 09:45
1. If the memory problems persist, you could opt for DistilGPT2, which has 33% fewer parameters (and a forward pass that is roughly twice as fast). Particularly for a small GPU memory like 6 GB of VRAM, it could be a solution/alternative to your problem.
2. At the same time, it depends on how you preprocess the data. The model can receive a maximum of N tokens (for example 512 or 768) depending on the model you choose. I recently trained a named entity recognition model with a maximum length of 768 tokens. However, when I manually set the dimension of the padded tokens in my PyTorch DataLoader() to a large number, I also got OOM errors (even on a 3090 with 24 GB of VRAM). When I reduced the dimension of the tokens to a much smaller one (for example 512 instead of 768), training started to work and I did not run out of memory.

TL;DR: Reducing the number of tokens in the preprocessing phase, regardless of the max capacity of the network, can also help to solve your memory problem. Note that reducing the number of tokens to process in a sequence is different from the dimension of a token.
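As an illustration of point 2 (my own sketch, not from the original answer; MAX_LEN and the example texts are placeholders), the Hugging Face tokenizer can cap the sequence length at preprocessing time:

from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default

MAX_LEN = 512  # illustrative cap, below the model's 1024-token context window
encodings = tokenizer(
    ["some training text", "another training text"],  # placeholder corpus
    truncation=True,
    max_length=MAX_LEN,
    padding="max_length",
    return_tensors="pt",
)
print(encodings["input_ids"].shape)  # (batch, MAX_LEN) rather than longer sequences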

            Source https://stackoverflow.com/questions/70606666

            QUESTION

How to save checkpoints for the transformer GPT-2 to continue training?
            Asked 2022-Feb-22 at 19:10

I am retraining the GPT-2 language model, and am following this blog:

            https://towardsdatascience.com/train-gpt-2-in-your-own-language-fc6ad4d60171

Here, they have trained a network on GPT-2, and I am trying to recreate the same. However, my dataset is too large (250 MB), so I want to continue training in intervals. In other words, I want to checkpoint the model training. Any help, or a piece of code that I can use to checkpoint and continue training, would help a great deal. Thank you.

            ...

            ANSWER

            Answered 2022-Feb-22 at 19:10
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# model_checkpoint is the directory the fine-tuned model is written to
# (and later reloaded from); model, tokenizer, train_set and dev_set are
# assumed to be defined as in the linked blog post.
training_args = TrainingArguments(
    output_dir=model_checkpoint,
    # other hyper-params
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_set,
    eval_dataset=dev_set,
    tokenizer=tokenizer
)

trainer.train()
# Save the model to model_checkpoint
trainer.save_model()

def prepare_model(tokenizer, model_name_path):
    # Reload the saved weights and resize the embedding matrix to the tokenizer size
    model = AutoModelForCausalLM.from_pretrained(model_name_path)
    model.resize_token_embeddings(len(tokenizer))
    return model

# Assume tokenizer is defined; you can simply pass the saved model directory path.
model = prepare_model(tokenizer, model_checkpoint)
            
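To resume an interrupted run, the Trainer can also pick up from the checkpoints it periodically writes under output_dir (checkpoint frequency is controlled by save_steps / save_strategy in TrainingArguments):

# Resume from the most recent checkpoint the Trainer wrote under output_dir;
# a specific checkpoint path can be passed instead of True.
trainer.train(resume_from_checkpoint=True)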

            Source https://stackoverflow.com/questions/71215965

            QUESTION

            How to use GPU for this python file
            Asked 2022-Jan-05 at 07:19

I have this Python file where I am trying to train a GPT-2 model from scratch. I want to use a GPU for faster training, but I am unable to do so. Help will be much appreciated.

            My python code is as follows.

PS: I am running this code on AWS SageMaker, so I want to use its GPU acceleration.

I have used this link for reference.

            ...

            ANSWER

            Answered 2022-Jan-05 at 07:19

            You need to activate GPU runtime while hosting the notebook session in AWS SageMaker. The code will automatically take care of utilizing GPU resources.

Looking at the link you shared, it doesn't have any custom configuration to manually specify GPU resources.

            If it's handled automatically by the framework which you're using to train the network, then in an active GPU session it will automatically allocate GPU resources while training.
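As a quick sanity check (a generic PyTorch/transformers sketch, not taken from the linked code), you can confirm that a GPU is visible and move the model and batches onto it explicitly:

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)  # should print "cuda" on a GPU-backed SageMaker instance

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)

batch = tokenizer("Training on the GPU", return_tensors="pt").to(device)
outputs = model(**batch, labels=batch["input_ids"])  # forward pass runs on the device
print(outputs.loss.item())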

            Source https://stackoverflow.com/questions/70588756

            QUESTION

            "ValueError: You have to specify either input_ids or inputs_embeds" when training AutoModelWithLMHead Model (GPT-2)
            Asked 2022-Jan-04 at 14:08

I want to fine-tune the AutoModelWithLMHead model from this repository, which is a German GPT-2 model. I have followed the tutorials for pre-processing and fine-tuning. I have preprocessed a bunch of text passages for the fine-tuning, but when beginning training, I receive the following error:

            ...

            ANSWER

            Answered 2022-Jan-04 at 14:08

I didn't find a concrete answer to this question, but I found a workaround. For anyone looking for examples of how to fine-tune the GPT models from HuggingFace, you may have a look at this repo. They list a couple of examples of how to fine-tune different Transformer models, complemented by documented code examples. I used the run_clm.py script and it achieved what I wanted.
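For context, that ValueError is typically raised when the batches reaching the model contain no input_ids at all, for example because the raw text columns were never tokenized. Below is a minimal sketch (my own, not from the answer) of a causal-LM forward pass with explicit input_ids and labels; the "gpt2" checkpoint is used only as an illustration in place of the German model:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")       # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer("Ein kurzer Beispieltext.", return_tensors="pt")
# For causal-LM fine-tuning the labels are the input ids themselves; if a batch
# reaches the model without input_ids, it raises the ValueError quoted above.
outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["input_ids"])
print(outputs.loss)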

            Source https://stackoverflow.com/questions/70577285

            QUESTION

            pad_token_id not working in hugging face transformers
            Asked 2021-Oct-11 at 15:09

I want to download the GPT-2 model and tokeniser. For open-ended generation, HuggingFace sets the padding token ID to be equal to the end-of-sentence token ID, so I configured it manually using:

            ...

            ANSWER

            Answered 2021-Oct-11 at 13:25

Your code does not throw any error for me. I would try re-installing the most recent version of transformers, if that is a viable solution for you.
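For reference, the usual way to pad with GPT-2 is to reuse the end-of-sequence token as the padding token; a generic sketch using standard transformers calls (not the asker's exact code):

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# GPT-2 has no pad token, so reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = model.config.eos_token_id

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=20, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))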

            Source https://stackoverflow.com/questions/69480199

            QUESTION

            Spacy-Transformers: Access GPT-2?
            Asked 2021-Aug-28 at 05:16

            I'm using Spacy-Transformers to build some NLP models.

            The Spacy-Transformers docs say:

            spacy-transformers

            spaCy pipelines for pretrained BERT, XLNet and GPT-2

            The sample code on that page shows:

            ...

            ANSWER

            Answered 2021-Aug-28 at 05:16

The en_core_web_trf pipeline uses a specific Transformers model, but you can specify arbitrary ones using the TransformerModel wrapper class from spacy-transformers. See the docs for that. An example config:
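The example config itself was not captured on this page. As a rough Python-level substitute (my own sketch, assuming spaCy v3 with spacy-transformers installed; whether a bare name override is sufficient depends on the spacy-transformers version):

import spacy

nlp = spacy.blank("en")
# The "transformer" factory is registered by spacy-transformers; overriding the
# wrapped model's name is intended to swap in GPT-2 instead of the default checkpoint.
nlp.add_pipe("transformer", config={"model": {"name": "gpt2"}})
nlp.initialize()  # downloads/loads the Hugging Face weights

doc = nlp("spaCy pipelines can expose GPT-2 representations.")
print(doc._.trf_data)  # transformer output aligned to the spaCy tokens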

            Source https://stackoverflow.com/questions/68946827

            QUESTION

How to get the content of a string inside a request response?
            Asked 2021-Jun-18 at 16:36

I was coding a web app based on GPT-2, but it was not good, so I decided to switch to the official OpenAI GPT-3. So I made this request:

            ...

            ANSWER

            Answered 2021-Jun-18 at 16:36

Use dict indexing by key, and list indexing by index.
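For illustration (my own sketch; the payload below is shaped like an OpenAI completions response, which is an assumption since the original request is not shown):

# Illustrative payload shaped like an OpenAI completions response
data = {"choices": [{"text": "Hello from GPT-3."}]}

text = data["choices"][0]["text"]  # dict indexing by key, then list indexing by index
print(text)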

            Source https://stackoverflow.com/questions/68038662

            QUESTION

            Fine-tuning GPT-2/3 on new data
            Asked 2021-May-30 at 12:09

            I'm trying to wrap my head around training OpenAI's language models on new data sets. Is there anyone here with experience in that regard? My idea is to feed either GPT-2 or 3 (I do not have API access to 3 though) with a textbook, train it on it and be able to "discuss" the content of the book with the language model afterwards. I don't think I'd have to change any of the hyperparameters, I just need more data in the model.

            Is it possible??

            Thanks a lot for any (also conceptual) help!

            ...

            ANSWER

            Answered 2021-May-28 at 08:46

You can definitely retrain GPT-2. Are you only looking to train it for language generation purposes, or do you have a specific downstream task you would like to adapt GPT-2 to?

Both these tasks are possible and not too difficult. If you want to train the model for language generation, i.e. have it generate text on a particular topic, you can train the model exactly as it was trained during the pre-training phase. This means training it on a next-token prediction task with a cross-entropy loss function. As long as you have a dataset and decent compute power, this is not too hard to implement.

            When you say, 'discuss' the content of the book, it seems to me that you are looking for a dialogue model/chatbot. Chatbots are trained in a different way and if you are indeed looking for a dialogue model, you can look at DialoGPT and other models. They can be trained to become task-oriented dialog agents.
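A minimal sketch of that next-token-prediction objective with transformers and PyTorch (my own sketch; the corpus, sequence length, and learning rate are placeholders):

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.train()

texts = ["first passage from the textbook ...", "second passage ..."]  # placeholder corpus
enc = tokenizer(texts, truncation=True, max_length=256, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
    # Setting labels equal to input_ids gives the next-token cross-entropy loss
    outputs = model(input_ids=enc["input_ids"],
                    attention_mask=enc["attention_mask"],
                    labels=enc["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(epoch, outputs.loss.item())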

            Source https://stackoverflow.com/questions/67735561

            QUESTION

            Flask app serving GPT2 on Google Cloud Run not persisting downloaded files?
            Asked 2021-Mar-30 at 16:27

I have a Flask app running on Google Cloud Run, which needs to download a large model (GPT-2 from huggingface). This takes a while to download, so I am trying to set things up so that it only downloads on deployment and is then served for subsequent visits. That is, I have the following code in a script that is imported by my main Flask app, app.py:

            ...

            ANSWER

            Answered 2021-Mar-30 at 16:27

            Data written to the filesystem does not persist when the container instance is stopped.

            Cloud Run lifetime is the time between an HTTP Request and the HTTP response. Overlapped requests extend this lifetime. Once the final HTTP response is sent your container can be stopped.

            Cloud Run instances can run on different hardware (clusters). One instance will not have the same temporary data as another instance. Instances can be moved. Your strategy of downloading a large file and saving it to the in-memory file system will not work consistently.

            Filesystem access

            Also note that the file system is in-memory, which means you need to have additional memory to store files.
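One common workaround, offered here as my own suggestion rather than part of the original answer, is to download the weights while the container image is built, so every instance starts with the files already present. A sketch of a helper script that a Dockerfile RUN step could execute (the script name and paths are illustrative):

# prefetch_model.py -- hypothetical helper executed by a Dockerfile RUN step,
# so the weights are baked into the image instead of fetched per instance.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

CACHE_DIR = "/app/model_cache"  # illustrative path inside the image

GPT2TokenizerFast.from_pretrained("gpt2", cache_dir=CACHE_DIR)
GPT2LMHeadModel.from_pretrained("gpt2", cache_dir=CACHE_DIR)
print("model cached in", CACHE_DIR)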

            Source https://stackoverflow.com/questions/66873983

            QUESTION

            What do the logits and probabilities from RobertaForSequenceClassification represent?
            Asked 2020-Dec-10 at 23:53

            Being new to the "Natural Language Processing" scene, I am experimentally learning and have implemented the following segment of code:

            ...

            ANSWER

            Answered 2020-Dec-10 at 23:53

You have initialized a RobertaForSequenceClassification model that by default (in the case of roberta-base and roberta-large, which have no trained output layers for sequence classification) tries to classify whether a sequence belongs to one class or another. I used the expression "belongs to one class or another" because these classes have no meaning yet. The output layer is untrained and requires fine-tuning to give these classes a meaning. Class 0 could be X and Class 1 could be Y, or the other way around. For example, the tutorial for fine-tuning a sequence classification model on the IMDb review dataset defines negative reviews as Class 0 and positive reviews as Class 1 (link).

            You can check the number of supported classes with:
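The code that followed was not captured on this page; a minimal sketch of inspecting the configured label count with standard transformers attributes:

from transformers import RobertaForSequenceClassification

model = RobertaForSequenceClassification.from_pretrained("roberta-base")
print(model.config.num_labels)  # 2 by default for the untrained classification head
print(model.config.id2label)    # e.g. {0: 'LABEL_0', 1: 'LABEL_1'}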

            Source https://stackoverflow.com/questions/65221079

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install gpt-2

            You can download it from GitHub.
            You can use gpt-2 like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            Find more information at:

            CLONE
          • HTTPS

            https://github.com/openai/gpt-2.git

          • CLI

            gh repo clone openai/gpt-2

          • sshUrl

            git@github.com:openai/gpt-2.git
