F4 | Assessing confidence in introgression among four populations | Genomics library

by mmatschiner | Python | Version: Current | License: No License

kandi X-RAY | F4 Summary

F4 is a Python library typically used in Artificial Intelligence and Genomics applications. F4 has no bugs, no vulnerabilities, and low support. However, no build file is available for F4. You can download it from GitHub.

Assessing confidence in introgression among four populations. F4 calculates the f4-statistic from allele frequencies of four populations and uses coalescent simulations to test whether this value could be the result of incomplete lineage sorting.

The f4-statistic was introduced by Reich et al. (2009) and is a powerful measure for distinguishing introgression from incomplete lineage sorting, based on allele frequencies of four populations. With populations A, B, C, and D, and the assumed population topology (A,B),(C,D), the f4-statistic is calculated as the product of the allele frequency difference between A and B and the allele frequency difference between C and D. Thus, if at a particular SNP the frequency of the base "G" is 0.4 in population A, 1.0 in population B, 0.2 in population C, and 0.0 in population D, the f4-statistic for this SNP is (0.4 - 1.0) x (0.2 - 0.0) = -0.6 x 0.2 = -0.12. With more than one SNP, the f4-statistic of the whole data set is simply the mean of the f4 values of all individual SNPs.

What's interesting about this measure is that under incomplete lineage sorting alone, the allele frequency differences between A and B should be independent of those between C and D, and the f4-statistic should thus be zero. If, however, there is introgression between the two pairs of populations (e.g. A introgressed into C), this leads to non-zero f4 values.

The tricky part is not to calculate the f4-statistic, but to assess support for it being different from zero, and thus for introgression. One commonly used way to do this is block jackknifing, as implemented in the program fourpop, which comes as part of the Treemix package (Pickrell & Pritchard 2012). The data set is chopped into blocks of a particular size, and the f4-statistic is calculated individually for each of these blocks. By then comparing the overall f4 value to the standard error of the f4 values taken from the jackknife blocks, a z-score is calculated and serves as a measure of support.
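The calculation and the block jackknife described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of F4 or fourpop: the function names are made up for this example, and the standard error uses the usual delete-one-block jackknife formula for equally sized blocks, which is an assumption about the details.

```python
import math
from statistics import fmean


def f4_per_snp(freqs_a, freqs_b, freqs_c, freqs_d):
    """Per-SNP f4 values for the topology (A,B),(C,D): (a - b) * (c - d)."""
    if not (len(freqs_a) == len(freqs_b) == len(freqs_c) == len(freqs_d)):
        raise ValueError("all four populations need one frequency per SNP")
    return [(a - b) * (c - d)
            for a, b, c, d in zip(freqs_a, freqs_b, freqs_c, freqs_d)]


def f4_statistic(freqs_a, freqs_b, freqs_c, freqs_d):
    """f4 of the whole data set: the mean of the per-SNP f4 values."""
    return fmean(f4_per_snp(freqs_a, freqs_b, freqs_c, freqs_d))


def block_jackknife_z(per_snp_f4, block_size):
    """z-score of the overall f4 from a delete-one-block jackknife.

    SNPs are assumed to be in genomic order; a trailing incomplete
    block is dropped (a simplification made for this sketch).
    """
    if block_size < 1:
        raise ValueError("block_size must be positive")
    n_blocks = len(per_snp_f4) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    values = list(per_snp_f4)[: n_blocks * block_size]
    total = sum(values)
    overall = total / len(values)
    # f4 estimate with each block left out in turn.
    leave_one_out = []
    for i in range(n_blocks):
        block_sum = sum(values[i * block_size:(i + 1) * block_size])
        leave_one_out.append((total - block_sum) / (len(values) - block_size))
    loo_mean = fmean(leave_one_out)
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - loo_mean) ** 2 for x in leave_one_out)
    se = math.sqrt(variance)
    # A standard error of zero gives no usable z-score.
    return overall / se if se > 0 else math.nan


# The single-SNP example from the text: approximately -0.12.
print(f4_per_snp([0.4], [1.0], [0.2], [0.0])[0])
```

Calling `block_jackknife_z` with several different values of `block_size` on the same per-SNP values is a quick way to see how strongly the z-score depends on that choice.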
There are a few problems with this approach. One is that SNP data sets, e.g. those obtained by RAD sequencing, often contain linked groups of SNPs that can confound the standard error of jackknife block f4 values. Further, the use of z-scores as support assumes that the underlying data is normally distributed, but f4 values of jackknife blocks often are not. This is because a large proportion of these values are exactly zero if the block from which they are taken does not show any evidence of either introgression or incomplete lineage sorting (or worse, if monomorphic SNPs were not removed from the data set, which they should be for this analysis). Another issue is that the chosen jackknife block size can (and often does) influence the standard error and thus the z-score supporting introgression.

The approach chosen by F4 is therefore to run simulations to assess support for introgression. After all, we're not so much interested in whether the observed f4-statistic is different from zero, but in whether or not it could be produced by incomplete lineage sorting alone, without any introgression. F4 uses the coalescent software fastsimcoal2 (Excoffier et al. 2013) to produce SNP data sets that resemble the actual SNP data set, with migration rates set to zero and therefore strictly without introgression. All simulated data sets have the same number of individuals and SNPs, and are masked to include the same amount of missing data per population and SNP as the original data set. Further, a burn-in phase is used to automatically adjust simulation parameters (effective population size and relative divergence times) so that the resulting SNP variation matches the observed variation, both between all populations and between the two pairs of populations. Thus, the simulated data sets should be equivalent to the original data set, but have been produced completely without introgression.
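How F4 applies the missing-data mask internally is not described here. As a rough sketch of the idea only, the following hypothetical helper takes the simulated alleles of one population and hides, for each SNP, as many randomly chosen alleles as were missing at that SNP in the observed data. The data layout (one list of 0/1 alleles per SNP) and all names are assumptions made for this illustration.

```python
import random


def mask_like_observed(simulated, observed_missing_counts, rng=None):
    """Return a copy of `simulated` with alleles hidden to match the observed gaps.

    simulated: list of SNPs, each a list of 0/1 alleles for ONE population.
    observed_missing_counts: number of missing alleles per SNP in the
        observed data for the same population.
    Missing alleles are represented as None.
    """
    if len(simulated) != len(observed_missing_counts):
        raise ValueError("need one missing-data count per SNP")
    rng = rng or random.Random()
    masked = []
    for row, n_missing in zip(simulated, observed_missing_counts):
        if not 0 <= n_missing <= len(row):
            raise ValueError("missing count out of range for this SNP")
        # Pick which allele positions to hide at this SNP.
        drop = set(rng.sample(range(len(row)), n_missing))
        masked.append([None if i in drop else allele
                       for i, allele in enumerate(row)])
    return masked


def allele_frequency(row):
    """Frequency of allele 1 among the non-missing alleles, or None if all are missing."""
    called = [allele for allele in row if allele is not None]
    return sum(called) / len(called) if called else None
```

After masking, allele frequencies are computed from the non-missing alleles only, so the simulated data carry the same sampling noise from missing data as the observed data.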
Finally, F4 calculates the f4-statistic for each simulated data set and reports the proportion of these that are more extreme than the observed value. If the number of simulations is large enough, this proportion can be taken as the probability of obtaining the observed f4 value (or a more extreme one) without introgression. Thus, if this probability is small enough (e.g. < 0.05), it supports introgression between the two pairs of populations. It does not, however, indicate the direction of introgression.

F4 is written in Python 3, so this version of Python must be installed. It also uses a number of Python packages, most of which are likely to be installed on your machine already if you have Python. Only the two packages numpy and scipy may require additional installation. Importantly, F4 uses fastsimcoal2 for coalescent simulations, so this program needs to be installed and needs to be executable as fsc252 for F4 to run.
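Two small pieces of the above can be sketched in Python: checking that an executable named fsc252 is on the PATH, and turning a list of simulated f4 values into the proportion that is at least as extreme as the observed value. Both functions are hypothetical helpers written for this illustration, not part of F4. In particular, "more extreme" is taken here to mean an absolute value at least as large as the observed one, which is an assumption; F4 may define it differently.

```python
import shutil


def fastsimcoal_available(executable="fsc252"):
    """True if an executable with this name can be found on the PATH."""
    return shutil.which(executable) is not None


def empirical_p_value(observed_f4, simulated_f4):
    """Proportion of simulated f4 values at least as extreme as the observed one.

    'Extreme' is measured by absolute value (two-sided), an assumption
    made for this sketch.
    """
    simulated = list(simulated_f4)
    if not simulated:
        raise ValueError("need at least one simulated f4 value")
    n_extreme = sum(1 for value in simulated
                    if abs(value) >= abs(observed_f4))
    return n_extreme / len(simulated)


if not fastsimcoal_available():
    print("fsc252 was not found on the PATH")
```

With only a handful of simulations the proportion is coarse: with 100 simulated data sets, for example, the smallest non-zero value it can take is 0.01.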

            kandi-support Support

              F4 has a low active ecosystem.
              It has 18 star(s) with 4 fork(s). There are 4 watchers for this library.
              It had no major release in the last 6 months.
              There are 2 open issues and 0 have been closed. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of F4 is current.

            kandi-Quality Quality

              F4 has 0 bugs and 0 code smells.

            kandi-Security Security

              F4 has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              F4 code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              F4 does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            kandi-Reuse Reuse

              F4 releases are not available. You will need to build from source code and install.
              F4 has no build file. You will need to create the build yourself to build the component from source.
              Installation instructions are not available. Examples and code snippets are available.

            F4 Key Features

            No Key Features are available at this moment for F4.

            F4 Examples and Code Snippets

            No Code Snippets are available at this moment for F4.

            Community Discussions

            QUESTION

            How to groupby average in multicondition with Python Pandas
            Asked 2022-Apr-16 at 12:16

            The objective is to calculate the average of a subset of columns based on multiple conditions in a multiindex dataframe.

            The first condition is to get the average group by the first level of the multiindex.

            The second condition is to get the average based on the dict_ref value below

            ...

            ANSWER

            Answered 2022-Apr-16 at 12:16

            You can flip dict_ref so each item in the value array becomes the key, perform a replacement, and group by the new ch:

            Source https://stackoverflow.com/questions/71892140

            QUESTION

            How to add columns in dataframe under my specific condition?
            Asked 2022-Apr-12 at 11:23

            I have a dataframe df1 with columns id1 and id2:

            ...

            ANSWER

            Answered 2022-Apr-12 at 09:00

            You could stack df2 and then perform a double merge on df1 to get the features for the relevant id combinations:

            Source https://stackoverflow.com/questions/71839280

            QUESTION

            Faulty benchmark, puzzling assembly
            Asked 2022-Mar-28 at 07:40

            Assembly novice here. I've written a benchmark to measure the floating-point performance of a machine in computing a transposed matrix-tensor product.

            Given my machine with 32GiB RAM (bandwidth ~37GiB/s) and Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz (Turbo 4.0GHz) processor, I estimate the maximum performance (with pipelining and data in registers) to be 6 cores x 4.0GHz = 24GFLOP/s. However, when I run my benchmark, I am measuring 127GFLOP/s, which is obviously a wrong measurement.

            Note: in order to measure the FP performance, I am measuring the op-count: n*n*n*n*6 (n^3 for matrix-matrix multiplication, performed on n slices of complex data-points i.e. assuming 6 FLOPs for 1 complex-complex multiplication) and dividing it by the average time taken for each run.

            Code snippet in main function:

            ...

            ANSWER

            Answered 2022-Mar-25 at 19:33

            1 FP operation per core clock cycle would be pathetic for a modern superscalar CPU. Your Skylake-derived CPU can actually do 2x 4-wide SIMD double-precision FMA operations per core per clock, and each FMA counts as two FLOPs, so theoretical max = 16 double-precision FLOPs per core clock, so 24 * 16 = 384 GFLOP/S. (Using vectors of 4 doubles, i.e. 256-bit wide AVX). See FLOPS per cycle for sandy-bridge and haswell SSE2/AVX/AVX2

            There is a function call inside the timed region, callq 403c0b <_Z12do_timed_runRKmRd+0x1eb> (as well as the __kmpc_end_serialized_parallel stuff).

            There's no symbol associated with that call target, so I guess you didn't compile with debug info enabled. (That's separate from optimization level, e.g. gcc -g -O3 -march=native -fopenmp should run the same asm, just have more debug metadata.) Even a function invented by OpenMP should have a symbol name associated at some point.

            As far as benchmark validity, a good litmus test is whether it scales reasonably with problem size. Unless you exceed L3 cache size or not with a smaller or larger problem, the time should change in some reasonable way. If not, then you'd worry about it optimizing away, or clock speed warm-up effects (Idiomatic way of performance evaluation? for that and more, like page-faults.)

            1. Why are there non-conditional jumps in code (at 403ad3, 403b53, 403d78 and 403d8f)?

            Once you're already in an if block, you unconditionally know the else block should not run, so you jmp over it instead of jcc (even if FLAGS were still set so you didn't have to test the condition again). Or you put one or the other block out-of-line (like at the end of the function, or before the entry point) and jcc to it, then it jmps back to after the other side. That allows the fast path to be contiguous with no taken branches.

            1. Why are there 3 retq instances in the same function with only one return path (at 403c0a, 403ca4 and 403d26)?

            Duplicate ret comes from "tail duplication" optimization, where multiple paths of execution that all return can just get their own ret instead of jumping to a ret. (And copies of any cleanup necessary, like restoring regs and stack pointer.)

            Source https://stackoverflow.com/questions/71618068

            QUESTION

            How to avoid overlapping of labels in scatter plot
            Asked 2022-Mar-23 at 09:28

            My Dataframe looks like this:

            ...

            ANSWER

            Answered 2022-Mar-23 at 09:28

            There may be several approaches, create a data frame for the annotation, group by column value and list the indexes. Set annotations in the created data frame. In this data example, more strings overlap, so we change the offset values only for the indices we do not want to overlap.

            Source https://stackoverflow.com/questions/71583780

            QUESTION

            How to copy list of files in parallel to list of destinations? Python
            Asked 2022-Feb-22 at 07:59

            I have src_list and dst_list, two lists of the same length.

            src_list contains paths to existing files.

            dst_list contains paths to maybe existing files to maybe overwrite (not folders!).

            src_list[i] should correspond to dst_list[i].

            I want to copy every src_list[i] to dst_list[i], in (multiprocessing, not threading) parallel.

            I want the call to be blocking, meaning all processes should be joined before moving on.

            Examples of lists:

            ...

            ANSWER

            Answered 2022-Feb-21 at 08:56
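            from concurrent.futures import ProcessPoolExecutor
            from shutil import copyfile

            # src_list and dst_list are the two equal-length path lists from the question.
            # Leaving the with-block waits for all worker processes, so the call is blocking.
            with ProcessPoolExecutor() as executor:
                # list() consumes the results so that any copy error is raised here.
                list(executor.map(copyfile, src_list, dst_list))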
            

            Source https://stackoverflow.com/questions/71203376

            QUESTION

            How to concatenate the column by column name in pandas?
            Asked 2022-Feb-21 at 02:01

            Is there an efficient way to concatenate Pandas columns by column name without using a loop?

            My current method is very slow.

            input :

            ...

            ANSWER

            Answered 2022-Feb-18 at 10:29

            You could rework your dictionary to form groups and use groupby+agg(list):

            Source https://stackoverflow.com/questions/71171917

            QUESTION

            R: How do I reduce the ranges of genes into a single vector?
            Asked 2022-Feb-19 at 23:15

            I created var_nt dataframe by subsetting tx_df columns based on row variant - "J3", "J10", "J11", "J13". Then, I converted the var_nt dataframe to a GRanges object (varnt_grange) using the makeGRangesFromDataFrame function.

            Now, I want to write a for loop to collapse the varnt_grange into a single vector.

            ...

            ANSWER

            Answered 2022-Feb-19 at 23:15

            On each iteration, you are inadvertently rewriting over the same object, repeatedly. Instead, you can iterate over the values in gene.list$entrez using lapply

            Source https://stackoverflow.com/questions/71189445

            QUESTION

            How to select a part of a range, which is output of a formula
            Asked 2022-Feb-01 at 14:00

            I seem to be really bad at writing readable titles :-)

            Here's the situation:
            I have a list of names and values, and I would like to calculate the sum of the values, corresponding to every name:

            ...

            ANSWER

            Answered 2022-Feb-01 at 14:00

            I'm unsure if you want two columns or not, but with ms365:

            Formula in D2:

            Source https://stackoverflow.com/questions/70941367

            QUESTION

            How to create a single column from multiple?
            Asked 2022-Jan-30 at 22:10

            I have df1:

            ...

            ANSWER

            Answered 2022-Jan-30 at 21:02

            If the values are "NULL", then we can select the columns of interest, convert to long format with pivot_longer and filter out the "NULL" elements

            Source https://stackoverflow.com/questions/70918854

            QUESTION

            std::move versus copy elision
            Asked 2022-Jan-18 at 10:25

            The following code compiles without warnings in Visual Studio 2019 msvc x64:

            ...

            ANSWER

            Answered 2022-Jan-18 at 08:13

            Does this mean that I should have written:

            Source https://stackoverflow.com/questions/70752203

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install F4

            You can download it from GitHub.
            You can use F4 like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

            For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, check and ask them on the Stack Overflow community page.

            CLONE
          • HTTPS

            https://github.com/mmatschiner/F4.git

          • CLI

            gh repo clone mmatschiner/F4

          • sshUrl

            git@github.com:mmatschiner/F4.git

            Explore Related Topics

            Consider Popular Genomics Libraries

            Try Top Libraries by mmatschiner

            tutorials

            by mmatschiner (HTML)

            snapp_prep

            by mmatschiner (Ruby)

            kumara

            by mmatschiner (Python)

            Fitchi

            by mmatschiner (Python)

            panama

            by mmatschiner (Ruby)