Automatic-CS | explore deep learning solutions for suggesting answers

by ricardorei | Python | Version: Current | License: No License

kandi X-RAY | Automatic-CS Summary

Automatic-CS is a Python library. It has no reported bugs or vulnerabilities, a build file is available, and it has low support. You can download it from GitHub.

This project is related to my thesis. Its goal is to explore deep learning solutions for suggesting answers to human customer support agents.

Support

Automatic-CS has a low-activity ecosystem.

It has 0 stars and 0 forks. There are 3 watchers for this library.

It had no major release in the last 6 months.

Automatic-CS has no issues reported. There is 1 open pull request and 0 closed requests.

It has a neutral sentiment in the developer community.

The latest version of Automatic-CS is current.

Quality

Automatic-CS has no bugs reported.

Security

Automatic-CS has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

Automatic-CS does not have a standard license declared.

Check the repository for any license declaration and review the terms closely.

Without a license, all rights are reserved, and you cannot use the library in your applications.

Reuse

Automatic-CS releases are not available. You will need to build from source code and install.

A build file is available. You can build the component from source.

Installation instructions are not available. Examples and code snippets are available.

Automatic-CS Key Features

No Key Features are available at this moment for Automatic-CS.

Automatic-CS Examples and Code Snippets

No Code Snippets are available at this moment for Automatic-CS.

Community Discussions

No Community Discussions are available at this moment for Automatic-CS. Refer to the Stack Overflow page for discussions.

Vulnerabilities

No vulnerabilities reported

Install Automatic-CS

You can download it from GitHub.
You can use Automatic-CS like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.

Support

The Twitter Customer Support Corpus from Kaggle is a large, modern corpus of tweets and replies that is intended to encourage research in natural language understanding and conversational models applied to the customer support scenario. With almost 3M tweets from 20 major brands such as Apple, Amazon, Xbox, Playstation, Spotify, and so forth, this is the largest publicly available real customer support corpus and a great fit for our study.

With that said, all the preprocessing steps described next were inspired by the work of Hardalov et al. (2018). Since the type of support provided varies from company to company, we selected only Apple support tweets, because Apple is the company with the most tweets in the original corpus. We then excluded every Apple support answer that redirected customers to other support channels, as well as disclaimers saying that Apple only offers support in English. After this data selection step we ended up with 49k Apple support answers. To build context/answer pairs, we searched the original corpus for the previous tweets that originated those answers. The context was defined by concatenating all the previous tweets up to a maximum of 150 tokens. The following preprocessing steps were applied to each document: lower-casing, splitting the text into tokens (using the NLTK Twitter tokenizer), anonymizing ids, and replacing links with the URL token.

After this preprocessing we ended up with 49007 pairs, which we split into train/validation/test sets: all the pairs in which the answer was given in the last 5 days of the corpus were used for validation or test, and the remaining pairs were used for training. This led to a corpus with 45844 pairs for training, 1581 for validation and another 1581 for testing. Negative samples were created by pairing each context with a randomly selected answer that had less than 0.85 cosine similarity to the original answer in a TF-IDF feature space.
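The negative sampling step can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual code: the function names (`tfidf_vectors`, `cosine`, `sample_negatives`), the smoothed IDF weighting, the whitespace tokenization, and the `max_tries` retry limit are all assumptions made for the example; only the 0.85 threshold comes from the description.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: token -> weight) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (assumption: the exact weighting used in the project is not stated).
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}
    return [{tok: tf * idf[tok] for tok, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sample_negatives(pairs, threshold=0.85, max_tries=50, seed=0):
    """pairs: list of (context, answer) strings.

    Returns (context, answer, label) triples: every original pair with label 1,
    plus one randomly drawn answer with label 0 whose TF-IDF cosine similarity
    to the true answer is below `threshold`.
    """
    rng = random.Random(seed)
    answers = [answer for _, answer in pairs]
    # Assumption: answers are already preprocessed, so whitespace splitting is enough.
    vectors = tfidf_vectors([answer.split() for answer in answers])
    labelled = []
    for i, (context, answer) in enumerate(pairs):
        labelled.append((context, answer, 1))
        for _ in range(max_tries):
            j = rng.randrange(len(pairs))
            # Reject candidates too similar to the true answer: they may be valid replies.
            if j != i and cosine(vectors[i], vectors[j]) < threshold:
                labelled.append((context, answers[j], 0))
                break
    return labelled
```

If no sufficiently dissimilar answer is found within `max_tries` draws, the sketch simply emits no negative for that context.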
The reason we selected only answers with less than 0.85 cosine similarity is that, in our domain, many answers are similar, and blindly creating negative pairs would produce valid pairs with negative labels. Also, from the validation and test sets, we created a ranking set composed of series with a customer email and 10 possible answers including the ground truth. This is later used to test the models in a ranking task, as in Lowe et al., 2015. Finally, with all the unique answers from our training set, we created 1000 clusters in a TF-IDF feature space using a K-Means++ algorithm. Then, for each cluster, we selected the document closest to the centroid and created a list of 1000 possible template answers to be used later by retrieval-based models. We use a value of K equal to 1000 because this guarantees good answer coverage while still allowing retrieval-based models to compute the best candidate answer to a given question in a few seconds.
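The template selection step can be sketched as follows. This is a small plain-Python stand-in for illustration only: the function names (`sq_dist`, `kmeans_pp_init`, `assign`, `kmeans`, `select_templates`) and the fixed iteration count are assumptions, and the vectors are passed in as dense lists rather than computed with TF-IDF. At the scale described above (K = 1000 over all unique training answers), a library implementation such as scikit-learn's `KMeans` with `init="k-means++"` would be the more practical choice.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two dense vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: draw each new centroid with probability proportional
    to its squared distance from the nearest centroid chosen so far."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:  # fewer distinct points than k
            centroids.append(list(rng.choice(points)))
        else:
            centroids.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, n_iter=20, seed=0):
    """Lloyd's algorithm with k-means++ seeding. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


def select_templates(answers, vectors, k, seed=0):
    """For each non-empty cluster, return the answer closest to its centroid."""
    centroids, labels = kmeans(vectors, k, seed=seed)
    templates = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = min(members, key=lambda i: sq_dist(vectors[i], centroid))
            templates.append(answers[best])
    return templates
```

Restricting the search to cluster members, rather than to all documents, keeps each template tied to the cluster it represents.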
