voxpopuli | A large-scale multilingual speech corpus for representation learning | Speech library

by facebookresearch | Language: Python | Version: Current | License: Non-SPDX

kandi X-RAY | voxpopuli Summary

voxpopuli is a Python library typically used in artificial intelligence and speech applications. It has no reported bugs or vulnerabilities and a build file is available, but it has low support activity and a Non-SPDX license. You can download it from GitHub.

A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation. voxpopuli provides:
- 400k hours of unlabelled speech data for 23 languages
- 1.8k hours of transcribed speech data for 16 languages
- 17.3k hours of speech-to-speech interpretation data for 15x15 directions

The raw data is collected from 2009-2020 [European Parliament event recordings]. We acknowledge the European Parliament for creating and sharing these materials.

Unlabelled and transcribed data:

| Language | Code | Unlabelled hours (v1/v2) | Transcribed hours | Transcribed speakers | Transcribed tokens | LM tokens |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| English | en | 4.5k/24.1k | 543 | 1313 | 4.8M | 60.1M |
| German | de | 4.5k/23.2k | 282 | 531 | 2.3M | 50.0M |
| French | fr | 4.5k/22.8k | 211 | 534 | 2.1M | 58.6M |
| Spanish | es | 4.4k/21.4k | 166 | 305 | 1.6M | 57.4M |
| Polish | pl | 4.5k/21.2k | 111 | 282 | 802k | 13.6M |
| Italian | it | 4.6k/21.9k | 91 | 306 | 757k | 52.1M |
| Romanian | ro | 4.5k/17.9k | 89 | 164 | 739k | 10.3M |
| Hungarian | hu |
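As a quick sanity check on the figures above, the following Python sketch sums the transcribed hours for the seven languages whose rows are complete in this excerpt and compares the result with the stated 1.8k-hour total. The helper `parse_hours` and the `ROWS` layout are illustrative assumptions and are not part of voxpopuli.

```python
# Figures copied from the seven complete rows of the table above.
# Tuple layout: (unlabelled v1, unlabelled v2, transcribed hours).
ROWS = {
    "en": ("4.5k", "24.1k", 543),
    "de": ("4.5k", "23.2k", 282),
    "fr": ("4.5k", "22.8k", 211),
    "es": ("4.4k", "21.4k", 166),
    "pl": ("4.5k", "21.2k", 111),
    "it": ("4.6k", "21.9k", 91),
    "ro": ("4.5k", "17.9k", 89),
}


def parse_hours(text: str) -> float:
    """Convert a string such as '4.5k' or '543' into hours (hypothetical helper)."""
    text = text.strip().lower()
    if text.endswith("k"):
        return float(text[:-1]) * 1000
    return float(text)


# Sum the transcribed hours and the v2 unlabelled hours over the listed rows.
transcribed_total = sum(row[2] for row in ROWS.values())
unlabelled_v2_total = sum(parse_hours(row[1]) for row in ROWS.values())

# Compare against the rounded 1.8k-hour total quoted in the summary.
share_of_transcribed = transcribed_total / parse_hours("1.8k")

print(f"Transcribed hours in these 7 languages: {transcribed_total}")
print(f"Share of the stated 1.8k transcribed hours: {share_of_transcribed:.0%}")
print(f"Unlabelled v2 hours in these 7 languages: {unlabelled_v2_total:.0f}")
```

Because the 1.8k total is itself rounded, the computed share is only approximate.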

            Support

              voxpopuli has a low active ecosystem.
              It has 407 stars and 35 forks. There are 19 watchers for this library.
              It had no major release in the last 6 months.
              There are 9 open issues and 10 closed issues. On average, issues are closed in 7 days. There is 1 open pull request and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of voxpopuli is current.

            Quality

              voxpopuli has no bugs reported.

            Security

              voxpopuli has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            License

              voxpopuli has a Non-SPDX License.
              Non-SPDX licenses can be open source with a non SPDX compliant license, or non open source licenses, and you need to review them closely before use.

            Reuse

              voxpopuli releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              Installation instructions are not available. Examples and code snippets are available.

            Top functions reviewed by kandi - BETA

            kandi has reviewed voxpopuli and identified the functions below as its top functions. This is intended to give you quick insight into the functionality voxpopuli implements and to help you decide whether it suits your requirements.
            • Downloads subtitles
            • Get metadata for given subset
            • Perform multiprocessing
            • Parses src_id string
            • Split audio
            • Get a list of segmented audio segments
            • Get pyannote segments
            • Load tracks from a pkl file
            • Download the audios data
            • Load an annotation file
            • Get all audio data
            • Get a list of all audio files for lang
            • Return a list of all the years of lang
            • Return a list of all sessions for a given language
            • Process an audio session
            • Process the word alignment file
            • Process text
            • Parse arguments
            • Load normalized text
            • Return a set of session IDs that are used in alignment
            • Get all the audio files for a language
            • Wrapper for multiprocessing
            • Download the ASR dataset
            • Check if the audio file exists
            • Cut a session
            • Run the cut
            • Launches the segmentation on the given sessions

            Community Discussions

            Trending Discussions on voxpopuli

            QUESTION

            Wrong array with while
            Asked 2019-Mar-18 at 22:13

            I'm sorry for this question; I'm sure it's a noob question.

            But... I'm not able to manage this, and I'm sorry for asking for your help.

            I have this script:

            ...

            ANSWER

            Answered 2019-Mar-18 at 21:58

            You want to use fetch() in while loops, not fetchAll(). fetchAll() pulls all the rows at once, so it cannot drive a row-by-row loop.

            Source https://stackoverflow.com/questions/55230585
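The asker's script is elided, and fetch() and fetchAll() belong to the asker's database API rather than to voxpopuli. As a rough analogue only, the Python sketch below uses the standard-library `sqlite3` module, where `fetchone()` plays the role of fetch() and `fetchall()` the role of fetchAll(). The table and column names are invented for illustration.

```python
import sqlite3

# In-memory database with a hypothetical table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])

# Row by row: fetchone() returns the next row, or None when no rows remain,
# so it can serve as the condition of a while loop.
cursor = conn.execute("SELECT id, name FROM items ORDER BY id")
names_one_by_one = []
row = cursor.fetchone()
while row is not None:
    names_one_by_one.append(row[1])
    row = cursor.fetchone()

# All at once: fetchall() returns every remaining row as a list,
# so a second call on the same cursor yields an empty list.
cursor = conn.execute("SELECT id, name FROM items ORDER BY id")
all_rows = cursor.fetchall()
second_call = cursor.fetchall()

conn.close()
```

The second `fetchall()` call returning an empty list is the behaviour that makes an "all rows" call unsuitable as a loop condition.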

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install voxpopuli

            You can download it from GitHub.
            You can use voxpopuli like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
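The steps described above (clone the repository, create a virtual environment, update pip, setuptools, and wheel, then install) can be sketched in Python as follows. This is a minimal sketch, not the project's documented procedure: the helper names are hypothetical, and the final command assumes the repository's build file supports a standard `pip install` of the cloned directory.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

REPO_URL = "https://github.com/facebookresearch/voxpopuli.git"


def build_install_commands(dest: Path, venv_dir: Path) -> List[List[str]]:
    """Return the commands for a clone-and-install workflow (hypothetical helper)."""
    # The interpreter inside a virtual environment lives under 'bin' on
    # POSIX systems and under 'Scripts' on Windows.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    venv_python = str(venv_dir / bin_dir / "python")
    return [
        ["git", "clone", REPO_URL, str(dest)],
        [sys.executable, "-m", "venv", str(venv_dir)],
        [venv_python, "-m", "pip", "install", "--upgrade", "pip", "setuptools", "wheel"],
        # Assumption: the build file allows installing straight from the clone.
        [venv_python, "-m", "pip", "install", str(dest)],
    ]


def install_from_source(dest: Path, venv_dir: Path) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in build_install_commands(dest, venv_dir):
        subprocess.run(command, check=True)
```

Calling `install_from_source(Path("voxpopuli"), Path(".venv"))` would run the four commands in sequence; it requires git and network access, so nothing is executed when the block is merely imported.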

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for answers or ask on the Stack Overflow community page.

            CLONE

            • HTTPS: https://github.com/facebookresearch/voxpopuli.git
            • CLI: gh repo clone facebookresearch/voxpopuli
            • SSH: git@github.com:facebookresearch/voxpopuli.git


            Explore Related Topics

            Consider Popular Speech Libraries

            • DeepSpeech by mozilla
            • kaldi by kaldi-asr
            • zeal by zealdocs
            • leon by leon-ai

            Try Top Libraries by facebookresearch

            • segment-anything by facebookresearch (Jupyter Notebook)
            • fairseq by facebookresearch (Python)
            • Detectron by facebookresearch (Python)
            • detectron2 by facebookresearch (Python)
            • fastText by facebookresearch (HTML)