kandi background


by ganesh

Paraphrasing means rewriting a text in different words and with different sentence structures without changing its concept or meaning. A paraphrase is a restatement of the original content, and the tool that produces it is called a sentence rephraser, or paraphraser.

What is a good paraphrase? Almost all conditioned text generation models are validated on two factors: (1) whether the generated text conveys the same meaning as the original context (adequacy), and (2) whether the text is fluent, grammatically correct English (fluency). Neural machine translation outputs, for instance, are tested for adequacy and fluency. A good paraphrase, however, should be adequate and fluent while differing as much as possible from the original in its surface lexical form. Under this definition, the three key metrics that measure the quality of a paraphrase are:
  • Adequacy (Is the meaning preserved adequately?)
  • Fluency (Is the paraphrase fluent English?)
  • Diversity (Lexical / Phrasal / Syntactical) (How much has the paraphrase changed the original sentence?)
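
Of the three metrics, diversity is the easiest to approximate without a trained model. The sketch below is one rough way to score lexical diversity, using the Jaccard distance between the word sets of the original and the paraphrase. The function names are hypothetical and this is not necessarily how any particular paraphraser ranks its candidates. Adequacy and fluency are not covered here, since they usually need a trained model to judge.

```python
import re

def tokens(text: str) -> set:
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def lexical_diversity(original: str, paraphrase: str) -> float:
    """Jaccard distance between token sets (an illustrative proxy only).

    0.0 means both sentences use exactly the same words,
    1.0 means they share no words at all.
    """
    a, b = tokens(original), tokens(paraphrase)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

original = "Can you recommend some upscale restaurants in New York?"
candidates = [
    "can you recommend some upscale restaurants in New York?",
    "can you recommend some high end restaurants in New York?",
    "list some excellent restaurants to visit in New York city?",
]

# Score each candidate against the original and print them side by side.
scores = {c: lexical_diversity(original, c) for c in candidates}
for candidate, score in scores.items():
    print(f"{score:.2f}  {candidate}")
```

A candidate that only echoes the input scores 0.0, and candidates that swap more words score higher. A high score alone says nothing about whether the meaning was preserved.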
The aim of a paraphraser is to create paraphrases that are fluent and preserve the meaning of the original. A paraphraser has several applications:
  • Data Augmentation: Paraphrasing helps create training data for Natural Language Understanding (NLU) models. Generating equivalent paraphrases of a phrase or sentence builds up a text corpus that can be used to train more robust models for conversational engines.
  • Summarization: Paraphrasing helps create summaries of a large text corpus that capture its crux.
  • Sentence Rephrasing: Paraphrasing helps generate sentences with a similar context for a given phrase or sentence. These rephrased sentences can be used to create plagiarism-free content for articles, blogs, etc.
[Figure: a typical process flow for creating training data by data augmentation using a paraphraser]

The Paraphrase Generator kit helps to paraphrase a given phrase in different ways. This kit uses the Parrot paraphraser's pre-built model to demonstrate how a paraphraser works. The Parrot_Paraphraser library aims to support text augmentation for building good NLU models by creating paraphrases. The library depends on NLP and text-embedding packages such as sentence-transformers and transformers to create sentence-level text embeddings. The paraphraser model is trained on the following datasets:
  • MSRP Paraphrase
  • Google PAWS
  • ParaNMT
  • Quora question pairs
  • SNIPS commands
  • MSRP Frames
A sample input/output of the Paraphraser module is displayed below:
    Input Phrase: Can you recommend some upscale restaurants in New York?
    Paraphraser Output:
    list some excellent restaurants to visit in New York city?
    what upscale restaurants do you recommend in New York?
    i want to try some upscale restaurants in New York?
    recommend some upscale restaurants in New York?
    can you recommend some high end restaurants in New York?
    can you recommend some upscale restaurants in New York?
    can you recommend some upscale restaurants in New York?
    Input Phrase: What are the famous places we should not miss in Russia
    Paraphraser Output:
    what should we not miss when visiting Russia?
    recommend some of the best places to visit in Russia?
    list some of the best places to visit in Russia?
    can you list the top places to visit in Russia?
    show the places that we should not miss in Russia?
    list some famous places which we should not miss in Russia?
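
The first sample above includes candidates that simply echo the input, so paraphraser output is usually filtered before it is used as training data. The sketch below shows one possible way to do that for the data augmentation use case. The `augment` and `normalize` helpers, the `stub_paraphraser` function, and the `find_restaurant` intent label are all hypothetical names introduced for illustration. The stub only replays a few lines from the sample above, and in practice it would be replaced by a call to whatever paraphraser is in use.

```python
def normalize(text: str) -> str:
    """Case-fold, collapse whitespace, and drop trailing punctuation for comparison."""
    return " ".join(text.lower().split()).rstrip("?.! ")

def augment(seeds, paraphrase_fn):
    """Return (text, intent) rows: each seed phrase plus its distinct paraphrases.

    Candidates that match the seed or an earlier candidate after
    normalization are skipped, so echoes and duplicates are dropped.
    """
    rows = []
    for phrase, intent in seeds:
        seen = {normalize(phrase)}
        rows.append((phrase, intent))
        for candidate in paraphrase_fn(phrase):
            key = normalize(candidate)
            if key not in seen:
                seen.add(key)
                rows.append((candidate, intent))
    return rows

def stub_paraphraser(phrase):
    """Hypothetical stand-in for a real paraphraser; replays the sample output."""
    return [
        "what upscale restaurants do you recommend in New York?",
        "recommend some upscale restaurants in New York?",
        "can you recommend some high end restaurants in New York?",
        "can you recommend some upscale restaurants in New York?",
        "can you recommend some upscale restaurants in New York?",
    ]

# One labeled seed phrase; the intent label is an assumption for this example.
seeds = [("Can you recommend some upscale restaurants in New York?", "find_restaurant")]
rows = augment(seeds, stub_paraphraser)
for text, intent in rows:
    print(intent, "|", text)
```

Passing the paraphraser in as a callable keeps the filtering logic independent of any specific model, so the same helper works with a stub in tests and a real model in production.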

    Solution Source, Deployment Instructions are not available for this kit.


For Windows users: When you run the kit_installer batch file, Microsoft Defender might display a prompt. In that case, click 'More info' and then click 'Run anyway'. The batch file installs all the libraries and the environment needed to run the solution. Once the batch file has finished running, the prompt shows whether the kit was installed successfully.