phylogenomic_dataset_construction

 by rafelafrance | Python | Version: Current | License: None declared

kandi X-RAY | phylogenomic_dataset_construction Summary

phylogenomic_dataset_construction is a Python library. It has no reported bugs or vulnerabilities, includes a build file, and has low support. You can download it from GitHub.

            Support

              phylogenomic_dataset_construction has a low active ecosystem.
              It has 1 star, 0 forks, and 1 watcher.
              It had no major release in the last 6 months.
              There are 3 open issues and 1 closed issue. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of phylogenomic_dataset_construction is current.

            Quality

              phylogenomic_dataset_construction has no bugs reported.

            Security

              phylogenomic_dataset_construction has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            License

              phylogenomic_dataset_construction does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            Reuse

              phylogenomic_dataset_construction releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              Installation instructions, examples and code snippets are available.

            phylogenomic_dataset_construction Key Features

            No Key Features are available at this moment for phylogenomic_dataset_construction.

            phylogenomic_dataset_construction Examples and Code Snippets

            No Code Snippets are available at this moment for phylogenomic_dataset_construction.

            Community Discussions

            No Community Discussions are available at this moment for phylogenomic_dataset_construction. Refer to the Stack Overflow page for discussions.

            Vulnerabilities

            No vulnerabilities reported

            Install phylogenomic_dataset_construction

            Make sure that raxml, fasttree, phyx (pxclsq, pxcat, pxs2phy, pxs2nex), TreeShrink, mafft, and pasta are properly installed, that their executables are on the PATH, and that the executables are named exactly raxml, fasttree, pxclsq, pxcat, pxs2phy, pxs2nex, mafft, and pasta.

            Align each cluster, trim the alignment, and infer a tree. Make sure you have raxml 8 or later installed so that it can read fasta files; with an earlier version of raxml you will get the error message "Problem reading number of species and sites". Clusters with fewer than 1000 sequences are aligned with mafft (--genafpair --maxiterate 1000), trimmed to a minimal column occupancy of 0.1, and their trees inferred with raxml. Larger clusters are aligned with pasta, trimmed to a minimal column occupancy of 0.01, and their trees inferred with fasttree. The output tree files are named clusterID.raxml.tre, or clusterID.fasttree.tre for clusters with 1000 or more sequences.

            Visualize some of the trees and alignments. Tips with branch lengths of 0.4 or more are pretty much junk, and tips that are much longer than nearby tips are probably assembly artifacts. Trim these spurious tips with TreeShrink, which writes the trimmed tips to a .txt file and the trimmed trees to .tt files. Test different quantiles to see which one fits your data best: TreeShrink uses 0.05 by default, but this may be too high for some datasets. In particular, when the outgroups have long branches they can get cut, so check the output trees and .txt files to see which branches are being removed and choose a quantile value suited to your data (although a single quantile value will not work for all trees). An alternative is to trim tips using relative and absolute length cutoffs: for example, trim tips that are longer than a relative length cutoff and more than 10 times longer than their sister.
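            The size-based dispatch described above (mafft/raxml for small clusters, pasta/fasttree for large ones) can be sketched as a small helper. This is an illustration only, not the repository's actual code: the function name is hypothetical, and the real scripts build full command lines for each tool; only the mafft flags, occupancy values, and output naming come from the text above.

```python
# Sketch of the <1000 / >=1000 sequence split used to pick the
# aligner, trimming occupancy, and tree-inference tool per cluster.
# Hypothetical helper; the repository's scripts do the real work.

def cluster_pipeline(num_seqs, cluster_id):
    """Return (aligner_cmd, min_occupancy, tree_tool, output_tree)."""
    if num_seqs < 1000:
        aligner = ["mafft", "--genafpair", "--maxiterate", "1000"]
        occupancy = 0.1      # minimal column occupancy for trimming
        tree_tool = "raxml"
    else:
        aligner = ["pasta"]
        occupancy = 0.01
        tree_tool = "fasttree"
    return aligner, occupancy, tree_tool, f"{cluster_id}.{tree_tool}.tre"
```

            For example, a 250-sequence cluster is routed to mafft and raxml with 0.1 occupancy, while a 5000-sequence cluster goes to pasta and fasttree with 0.01 occupancy.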
            Also trim tips that are longer than an absolute cutoff. The output trees again end with ".tt". Keep input and output trees in the same directory:

            python trim_tips.py input_tree_dir tree_file_ending relative_cutoff absolute_cutoff

            Mask both monophyletic and (optionally) paraphyletic tips that belong to the same taxon, keeping the tip with the most unambiguous characters in the trimmed alignment. Keep input and output trees in the same directory. For phylogenomic data sets derived from annotated genomes, I would mask only monophyletic tips and keep the sequence with the shortest terminal branch length.

            Cut deep paralogs. If you are interested in building a phylogeny, use a lower (more stringent) long_internal_branch_cutoff; if you are interested in homologs, use a higher (more relaxed) cutoff to avoid splitting them. This works very well with CDS and is less effective with amino acid sequences, since for CDS the branch lengths are mostly determined by synonymous distance and are more consistent than for amino acids. Make sure that the indir and outdir are different directories.

            Write fasta files from the trees. The input tree files should end with .subtree. Repeat the alignment, tree estimation, trimming, masking, and cutting of deep paralogs; a set of more stringent cutoffs can be used in the second round. After the final round, write fasta files from the tree files ending with .subtree and estimate the final homolog trees. Alternatively, you can calculate the synonymous distance and use it to guide cutting; however, since we are only trying to get well-aligned clusters for tree inference, the choice of length cutoffs here can be somewhat arbitrary. From here a number of further analyses can be done with the homologs, such as examining gene tree discordance, or back-translating the peptide alignments to codons with pal2nal to investigate signatures of natural selection.
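            The relative-plus-absolute trimming rule can be sketched as a per-tip decision. This is illustrative only: trim_tips.py in the repository operates on whole newick trees, and the function name here is hypothetical; the "more than 10 times longer than its sister" factor is taken from the description above.

```python
# Per-tip sketch of the relative/absolute length trimming rule.
# Hypothetical helper, not the repository's trim_tips.py.

def should_trim(tip_len, sister_len, relative_cutoff, absolute_cutoff):
    """Trim a tip if it exceeds the absolute cutoff, or if it exceeds
    the relative cutoff while being more than 10 times longer than
    its sister branch."""
    if tip_len > absolute_cutoff:
        return True
    return tip_len > relative_cutoff and tip_len > 10 * sister_len
```

            With a relative cutoff of 0.2 and an absolute cutoff of 0.8, a 0.5-long tip whose sister is 0.01 is trimmed, while the same tip with a 0.2-long sister is kept.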

            Support

            For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            Find more information at:

            CLONE
          • HTTPS

            https://github.com/rafelafrance/phylogenomic_dataset_construction.git

          • CLI

            gh repo clone rafelafrance/phylogenomic_dataset_construction

          • SSH

            git@github.com:rafelafrance/phylogenomic_dataset_construction.git
