Hierarchical-Attention-Networks | Tensorflow Implementation of Hierarchical Attention Networks | Natural Language Processing library

by SSinyu | Language: Python | Version: Current | License: No License

kandi X-RAY | Hierarchical-Attention-Networks Summary

Hierarchical-Attention-Networks is a Python library typically used in Artificial Intelligence, Natural Language Processing, Deep Learning, PyTorch, and TensorFlow applications. Hierarchical-Attention-Networks has no bugs and no vulnerabilities, and it has low support. However, no build file is available for it. You can download it from GitHub.

PyTorch/TensorFlow implementation of Hierarchical Attention Networks for Document Classification. The model has a hierarchical structure that mirrors the hierarchical structure of documents, and consists of a word-level encoder/attention layer and a sentence-level encoder/attention layer.
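
The two-level structure described above can be illustrated with a minimal NumPy sketch. It is not taken from the repository: the encoders (BiGRU in the original model) are omitted, the parameters are random, and the names `attention_pool`, `word_params`, and `sent_params` are hypothetical. It only shows attention pooling applied first over the words of each sentence and then over the resulting sentence vectors.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(h, W, b, u):
    """Collapse a sequence h of shape (steps, dim) into one vector of shape (dim,)."""
    scores = np.tanh(h @ W + b) @ u   # one score per step
    alpha = softmax(scores)           # attention weights, sum to 1
    return alpha @ h                  # weighted average of the steps

rng = np.random.default_rng(0)
dim = 8

# Hypothetical document: 3 sentences x 5 words, each word already a dim-vector
# (standing in for the output of a word-level encoder, which is omitted here).
doc = rng.normal(size=(3, 5, dim))

# Separate, randomly initialised parameters for the word and sentence levels.
word_params = (rng.normal(size=(dim, dim)), np.zeros(dim), rng.normal(size=dim))
sent_params = (rng.normal(size=(dim, dim)), np.zeros(dim), rng.normal(size=dim))

# Word level: one vector per sentence. Sentence level: one vector per document.
sentence_vecs = np.stack([attention_pool(s, *word_params) for s in doc])
doc_vec = attention_pool(sentence_vecs, *sent_params)

print(sentence_vecs.shape, doc_vec.shape)  # (3, 8) (8,)
```

In the full model a sentence-level encoder would sit between the two pooling steps, and the document vector would feed a classifier.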

            Support

              Hierarchical-Attention-Networks has a low active ecosystem.
              It has 10 star(s) with 2 fork(s). There are no watchers for this library.
              It had no major release in the last 6 months.
              Hierarchical-Attention-Networks has no issues reported. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of Hierarchical-Attention-Networks is current.

            Quality

              Hierarchical-Attention-Networks has 0 bugs and 24 code smells.

            Security

              Hierarchical-Attention-Networks has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              Hierarchical-Attention-Networks code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              Hierarchical-Attention-Networks does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            Reuse

              Hierarchical-Attention-Networks releases are not available. You will need to build from source code and install.
              Hierarchical-Attention-Networks has no build file. You will need to create the build yourself to build the component from source.
              It has 662 lines of code, 35 functions and 10 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed Hierarchical-Attention-Networks and identified the functions below as its top functions. The list is intended to give you a quick overview of the functionality Hierarchical-Attention-Networks implements and to help you decide whether it suits your requirements.
            • Train the HAN
            • Get a batch of data
            • Sort a tensor
            • Make a one-hot encoded matrix
            • Decrement lr
            • Convert a sentence layer to vector attention
            • Implements BiGRU
            • Compute the length of a sequence
            • Attention layer
            • Compute the accuracy of the network
            • Load the HAN model
            • Print a progress bar
            • Prepare text
            • Cleans a string
            • Return data loader
            • Convert a sentence representation of a sentence
            • Create directory

            Hierarchical-Attention-Networks Key Features

            No Key Features are available at this moment for Hierarchical-Attention-Networks.

            Hierarchical-Attention-Networks Examples and Code Snippets

            No Code Snippets are available at this moment for Hierarchical-Attention-Networks.

            Community Discussions

            QUESTION

            Should RNN attention weights over variable length sequences be re-normalized to "mask" the effects of zero-padding?
            Asked 2019-Apr-15 at 21:29

            To be clear, I am referring to "self-attention" of the type described in Hierarchical Attention Networks for Document Classification and implemented many places, for example: here. I am not referring to the seq2seq type of attention used in encoder-decoder models (i.e. Bahdanau), although my question might apply to that as well... I am just not as familiar with it.

            Self-attention basically just computes a weighted average of RNN hidden states (a generalization of mean-pooling, i.e. un-weighted average). When there are variable length sequences in the same batch, they will typically be zero-padded to the length of the longest sequence in the batch (if using dynamic RNN). When the attention weights are computed for each sequence, the final step is a softmax, so the attention weights sum to 1.

            However, in every attention implementation I have seen, there is no care taken to mask out, or otherwise cancel, the effects of the zero-padding on the attention weights. This seems wrong to me, but I fear maybe I am missing something since nobody else seems bothered by this.

            For example, consider a sequence of length 2, zero-padded to length 5. Ultimately this leads to the attention weights being computed as the softmax of a similarly 0-padded vector, e.g.:

            weights = softmax([0.1, 0.2, 0, 0, 0]) = [0.20, 0.23, 0.19, 0.19, 0.19]

            and because exp(0)=1, the zero-padding in effect "waters down" the attention weights. This can be easily fixed, after the softmax operation, by multiplying the weights with a binary mask, i.e.

            mask = [1, 1, 0, 0, 0]

            and then re-normalizing the weights to sum to 1. Which would result in:

            weights = [0.48, 0.52, 0, 0, 0]
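
            The arithmetic above can be checked with a short NumPy sketch. The variable names are illustrative only and are not taken from any of the implementations mentioned in the question.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Scores for a length-2 sequence that has been zero-padded to length 5.
scores = np.array([0.1, 0.2, 0.0, 0.0, 0.0])
mask = np.array([1.0, 1.0, 0.0, 0.0, 0.0])

# Plain softmax: the three padded positions each receive weight exp(0) / sum.
padded_weights = softmax(scores)

# Zero out the padded positions, then re-normalize so the weights sum to 1 again.
masked = padded_weights * mask
masked_weights = masked / masked.sum()

print(np.round(masked_weights, 2).tolist())  # [0.48, 0.52, 0.0, 0.0, 0.0]
```

            The padded weights printed by this sketch agree with the figures quoted above up to rounding in the second decimal place.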

            When I do this, I almost always see a performance boost (in the accuracy of my models - I am doing document classification/regression). So why does nobody do this?

            For a while I considered that maybe all that matters is the relative values of the attention weights (i.e., ratios), since the gradient doesn't pass through the zero-padding anyway. But then why would we use softmax at all, as opposed to just exp(.), if normalization doesn't matter? (plus, that wouldn't explain the performance boost...)

            ...

            ANSWER

            Answered 2018-Apr-17 at 21:25

            Great question! I believe your concern is valid and zero attention scores for the padded encoder outputs do affect the attention. However, there are a few aspects that you have to keep in mind:

            • There are different score functions; the one in tf-rnn-attention uses a simple linear + tanh + linear transformation. But even this score function can learn to output negative scores. If you look at the code and imagine the inputs consist of zeros, vector v is not necessarily zero because of the bias, and the dot product with u_omega can push it further towards low negative numbers (in other words, a plain simple NN with a non-linearity can make both positive and negative predictions). Low negative scores don't water down the high scores in softmax.

            • Due to the bucketing technique, the sequences within a bucket usually have roughly the same length, so it's unlikely that half of an input sequence is padded with zeros. Of course, this doesn't fix anything; it just means that in real applications the negative effect of the padding is naturally limited.

            • You mentioned it in the end, but I'd like to stress it too: the final attended output is the weighted sum of encoder outputs, i.e. relative values actually matter. Take your own example and compute the weighted sum in this case:

              • the first one is 0.2 * o1 + 0.23 * o2 (the rest is zero)
              • the second one is 0.48 * o1 + 0.52 * o2 (the rest is zero too)


              Yes, the magnitude of the second vector is about two times bigger, and that isn't a critical issue, because it then goes to the linear layer. But the relative attention on o2 is just 7% higher than it would have been with masking.

              What this means is that even if the attention weights won't do a good job in learning to ignore zero outputs, the end effect on the output vector is still good enough for the decoder to take the right outputs into account, in this case to concentrate on o2.
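
              The comparison of the two weighted sums can be reproduced with a small NumPy sketch. It assumes, as the bullets above do, that the encoder outputs at padded positions are all zeros; the encoder outputs themselves are random placeholders and every name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Hypothetical encoder outputs: two real steps (o1, o2) followed by
# three all-zero padded steps.
outputs = np.vstack([rng.normal(size=(2, dim)), np.zeros((3, dim))])

scores = np.array([0.1, 0.2, 0.0, 0.0, 0.0])
mask = np.array([1.0, 1.0, 0.0, 0.0, 0.0])

# Attention weights without and with masking plus re-normalization.
padded_w = np.exp(scores) / np.exp(scores).sum()
masked_w = padded_w * mask / (padded_w * mask).sum()

attended_padded = padded_w @ outputs   # weighted sum with padded weights
attended_masked = masked_w @ outputs   # weighted sum with re-normalized weights

# Re-normalization divides every kept weight by the same constant, so the
# two attended vectors differ only by this scale factor.
scale = 1.0 / (padded_w * mask).sum()
print(round(scale, 2))                                         # 2.29
print(np.allclose(attended_masked, scale * attended_padded))   # True
```

              Computed without rounding, the two sums point in exactly the same direction and differ only in magnitude; the few-percent gap in the rounded figures above is a rounding effect.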

            Hope this convinces you that re-normalization isn't that critical, though it will probably speed up learning if actually applied.
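
            A different way to get the same weights, not the one described in the question, is to mask the scores before the softmax instead of re-normalizing afterwards. The sketch below is an illustration under that assumption: padded positions get a score of negative infinity, so they contribute exp(-inf) = 0 to the softmax. This is the limiting case of the point made above that low negative scores do not water down the high ones.

```python
import numpy as np

scores = np.array([0.1, 0.2, 0.0, 0.0, 0.0])
mask = np.array([True, True, False, False, False])

# Replace the scores at padded positions with -inf before the softmax.
masked_scores = np.where(mask, scores, -np.inf)

# Numerically stable softmax: shift by the largest real score.
shifted = masked_scores - masked_scores[mask].max()
weights = np.exp(shifted) / np.exp(shifted).sum()

print(np.round(weights, 2).tolist())  # [0.48, 0.52, 0.0, 0.0, 0.0]
```

            The result matches the weights obtained by masking after the softmax and re-normalizing.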

            Source https://stackoverflow.com/questions/49522673

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install Hierarchical-Attention-Networks

            You can download it from GitHub.
            You can use Hierarchical-Attention-Networks like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

            For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, search for or ask them on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/SSinyu/Hierarchical-Attention-Networks.git

          • CLI

            gh repo clone SSinyu/Hierarchical-Attention-Networks

          • sshUrl

            git@github.com:SSinyu/Hierarchical-Attention-Networks.git


            Consider Popular Natural Language Processing Libraries

            transformers

            by huggingface

            funNLP

            by fighting41love

            bert

            by google-research

            jieba

            by fxsjy

            Python

            by geekcomputers

            Try Top Libraries by SSinyu

            RED-CNN

            by SSinyu (Python)

            RED_CNN

            by SSinyu (Python)

            WGAN-VGG

            by SSinyu (Python)

            WGAN_VGG

            by SSinyu (Python)

            CycleGAN-CT-Denoising

            by SSinyu (Python)