rumble-tools | Open source tools, libraries, and datasets related | Genomics library

by RumbleDiscovery | Go | Version: Current | License: MIT

kandi X-RAY | rumble-tools Summary

rumble-tools is a Go library typically used in Artificial Intelligence and Genomics applications. It has no reported bugs, no reported vulnerabilities, a permissive license, and low support. You can download it from GitHub.
Open source tools, libraries, and datasets related to the Rumble Network Discovery product and associated research

Support

  • rumble-tools has a low active ecosystem.
  • It has 87 star(s) with 16 fork(s). There are 16 watchers for this library.
  • It had no major release in the last 12 months.
  • There is 1 open issue and 1 closed issue. There is 1 open pull request and there are 0 closed pull requests.
  • It has a neutral sentiment in the developer community.
  • The latest version of rumble-tools is current.

Quality

  • rumble-tools has no bugs reported.

Security

  • rumble-tools has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

  • rumble-tools is licensed under the MIT License. This license is Permissive.
  • Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

  • rumble-tools releases are not available. You will need to build from source code and install.
Top functions reviewed by kandi - BETA

kandi has reviewed rumble-tools and identified the functions below as its top functions. This is intended to give you a quick overview of the functionality rumble-tools implements and to help you decide whether it suits your requirements.

  • handleReflect handles reflection requests.
  • SMBExtractFieldsFromSecurityBlob extracts fields from a security blob.
  • doHunt runs a job to the given dst.
  • SMB2ParseNegotiateContext is used to parse the SMB2 response.
  • doMonitor runs a monitor on the given destination directory.
  • SMBReadFrame is like SMBReadFrame but returns a byte slice.
  • SMB2ExtractFieldsFromNegotiateReply extracts the fields from the reply message.
  • probe is used to negotiate a new protocol.
  • SMB2NegotiateProtocolRequest builds the SMB2NegotiateProtocolRequest message.
  • BatchPorts splits a string into a list of ports.

rumble-tools Key Features

Open source tools, libraries, and datasets related to the Rumble Network Discovery product and associated research

Community Discussions

Trending Discussions on Genomics
  • search for regex match between two files using python
  • Is there a way to permute inside using to variables in bash?
  • BigQuery Regex to extract string between two substrings
  • how to stop letter repeating itself python
  • Split multiallelic to biallelic in vcf by plink 1.9 and its variant name
  • Delete specific letter in a FASTA sequence
  • How to get the words within the first single quote in r using regex?
  • Does Apache Spark 3 support GPU usage for Spark RDDs?
  • Aggregating and summing columns across 1500 files by matching IDs in R (or bash)
  • Usage of compression IO functions in apache arrow

QUESTION

search for regex match between two files using python

Asked 2022-Apr-09 at 00:49

I'm working with two text files that look like this. File 1:

#   See ftp://ftp.ncbi.nlm.nih.gov/genomes/README_assembly_summary.txt for a description of the columns in this file.
# assembly_accession    bioproject  biosample   wgs_master  refseq_category taxid   species_taxid   organism_name   infraspecific_name  isolate version_status  assembly_level  release_type    genome_rep  seq_rel_date    asm_name    submitter   gbrs_paired_asm paired_asm_comp ftp_path    excluded_from_refseq    relation_to_type_material   asm_not_live_date
GCF_000739415.1 PRJNA224116 SAMN02732406        na  837 837 Porphyromonas gingivalis    strain=HG66     latest  Chromosome  Major   Full    2014/08/14  ASM73941v1  University of Louisville    GCA_000739415.1 identical   https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/739/415/GCF_000739415.1_ASM73941v1         na
GCF_001263815.1 PRJNA224116 SAMN03366764        na  837 837 Porphyromonas gingivalis    strain=A7436        latest  Complete Genome Major   Full    2015/08/11  ASM126381v1 University of Florida   GCA_001263815.1 identical   https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/263/815/GCF_001263815.1_ASM126381v1            na
GCF_001297745.1 PRJNA224116 SAMD00040429    BCBV00000000.1  na  837 837 Porphyromonas gingivalis    strain=Ando     latest  Scaffold    Major   Full    2015/09/17  ASM129774v1 Lab. of Plant Genomics and Genetics, Department of Plant Genome Research, Kazusa DNA Research Institute GCA_001297745.1 identical   https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/297/745/GCF_001297745.1_ASM129774v1            an
...

File 2:

#   See ftp://ftp.ncbi.nlm.nih.gov/genomes/README_assembly_summary.txt for a description of the columns in this file.
# assembly_accession    bioproject  biosample   wgs_master  refseq_category taxid   species_taxid   organism_name   infraspecific_name  isolate version_status  assembly_level  release_type    genome_rep  seq_rel_date    asm_name    submitter   gbrs_paired_asm paired_asm_comp ftp_path    excluded_from_refseq    relation_to_type_material   asm_not_live_date
GCA_000739415.1 PRJNA245225 SAMN02732406        na  837 837 Porphyromonas gingivalis    strain=HG66     latest  Chromosome  Major   Full    2014/08/14  ASM73941v1  University of Louisville    GCF_000739415.1 identical   https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/739/415/GCA_000739415.1_ASM73941v1         na
GCA_001263815.1 PRJNA276132 SAMN03366764        na  837 837 Porphyromonas gingivalis    strain=A7436        latest  Complete Genome Major   Full    2015/08/11  ASM126381v1 University of Florida   GCF_001263815.1 identical   https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/263/815/GCA_001263815.1_ASM126381v1            na

So, I want to search for a specific pattern using regex. For example, file 1 has this pattern:

GCF_000739415.1

and file 2 this one:

GCA_000739415.1

The difference is the third character: F versus A. However, sometimes the numbers differ too. In the samples above, the files differ in the third row of data. Both files contain many patterns like the previous one, but there are some differences. My goal is to find the patterns that exist in only one file and not in the other. For example, GCF_001297745.1 appears in the third row of file 1, but file 2 has no corresponding entry, which would be GCA_001297745.1.

I'm working on some Python code:

import re

# PART 1: Open and read text file
# File 1 holds the GCF_ (RefSeq) accessions, file 2 the GCA_ (GenBank) ones
with open("assembly_summary_refseq.txt", 'r') as f_1:
    contents_1 = f_1.readlines()
with open("assembly_summary_genbank.txt", 'r') as f_2:
    contents_2 = f_2.readlines()

# PART 2: Search for IDs
matches_1 = re.findall(r"GCF_[0-9]*\.[0-9]", str(contents_1))
matches_2 = re.findall(r"GCA_[0-9]*\.[0-9]", str(contents_2))

# PART 3: Match between files
# Pseudocode
for line in matches_1:
    if matches_1 == matches_2:
        print("PATTERN THAT ONLY EXIST IN ONE FILE")

Part 3 is meant to be a for loop that goes through the matches from both files and prints the patterns that exist in only one file and not in the other. Any idea how to write this for loop?

ANSWER

Answered 2022-Apr-09 at 00:49

Perhaps you are after this?

import re

given_example = "GCA_000739415.1 PRJNA245225 SAMN02732406        na  837 837 Porphyromonas gingivalis    strain=HG66     latest  Chromosome  Major   Full    2014/08/14  ASM73941v1  University of Louisville    GCF_000739415.1 identical   https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/739/415/GCA_000739415.1_ASM73941v1         an"
altered_example = "GCA_000739415.1 GCTEST_000739415.1"

# GC[A or F]_[one or more digits].[one or more digits]
regex = r"GC[AF]_\d+\.\d+"

matches_1 = re.findall(regex, given_example)
matches_2 = re.findall(regex, altered_example)

# Iteration for intersection
for match in matches_1:
    if match in matches_2:
        print(f"{match} is in both files")

Prints

GCA_000739415.1 is in both files
GCA_000739415.1 is in both files

But I would recommend:

# The preferred method for intersection, where order is not important
matches = list(set(matches_1) & set(matches_2))

Which saves as:

['GCA_000739415.1']

Note that the regex matches strings of the form GC[A or F]_[one or more digits].[one or more digits]. Let me know if this is not what you are after.
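
One detail worth checking in a pattern like this is the dot: unescaped, `.` matches any character, while `\.` matches only a literal dot. The sketch below compares the two on a made-up string; the malformed second token is purely hypothetical and does not come from the files in the question.

```python
import re

# "." matches any character; "\." matches only a literal dot
loose = re.compile(r"GC[AF]_\d+.\d+")
strict = re.compile(r"GC[AF]_\d+\.\d+")

# The second token is a made-up, malformed accession used only for illustration
sample = "GCA_000739415.1 GCF_000739415x1"

loose_hits = loose.findall(sample)
strict_hits = strict.findall(sample)

print(loose_hits)   # ['GCA_000739415.1', 'GCF_000739415x1']
print(strict_hits)  # ['GCA_000739415.1']
```

On well-formed accessions both patterns return the same matches, so the escaped form only matters when the input may contain stray tokens.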

Regex demo here


Edit

I believe you are after the symmetric difference of the sets for files 1 and 2, which is a fancy way of saying "things in A or B, but not in both".

This can be done with iteration:

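# Iteration
# A set has no duplicates and is unordered
sym_dif = set()
for match in matches_1:
    if match not in matches_2:
        sym_dif.add(match)
# Repeat in the other direction, otherwise this is only a one-way difference
for match in matches_2:
    if match not in matches_1:
        sym_dif.add(match)
>>> list(sym_dif)
['GCF_001297745.1', 'GCA_001297745.1']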

I think your mistakes were not using a set (you shouldn't have any duplicates) and comparing with matches_1 == matches_2. The lists won't be equal. Instead, check whether each match is missing from the other set.

Or use set notation, which is the preferred method:

>>> list(set(matches_1).symmetric_difference(set(matches_2)))
['GCF_001297745.1', 'GCA_001297745.1']
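
Because the accessions in the two files differ in their prefix (GCF_ versus GCA_), comparing the raw strings would flag every entry as unique. One possible way around that, sketched here under the assumption that two entries correspond when the digits between the underscore and the dot are equal (versions are ignored), is to compare only that numeric part. The helper name `accession_numbers` and the shortened sample lines are hypothetical stand-ins for the real files.

```python
import re

# GCA_ or GCF_ prefix, digits, a literal dot, then version digits,
# anchored at the start of the line (the first column of the summary file)
ACCESSION = re.compile(r"^(GC[AF])_(\d+)\.(\d+)")

def accession_numbers(lines):
    """Return the set of numeric parts (e.g. '000739415') of the
    accession that starts each non-comment line."""
    numbers = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        m = ACCESSION.match(line)
        if m:
            numbers.add(m.group(2))
    return numbers

# Hypothetical, shortened stand-ins for the two summary files
refseq_lines = [
    "# assembly_accession\tbioproject",
    "GCF_000739415.1\tPRJNA224116",
    "GCF_001263815.1\tPRJNA224116",
    "GCF_001297745.1\tPRJNA224116",
]
genbank_lines = [
    "# assembly_accession\tbioproject",
    "GCA_000739415.1\tPRJNA245225",
    "GCA_001263815.1\tPRJNA276132",
]

refseq_ids = accession_numbers(refseq_lines)
genbank_ids = accession_numbers(genbank_lines)

# Plain set differences keep track of which file each leftover came from
only_in_refseq = sorted(refseq_ids - genbank_ids)
only_in_genbank = sorted(genbank_ids - refseq_ids)

print(only_in_refseq)   # ['001297745']
print(only_in_genbank)  # []
```

With real files, the two lists could be replaced by the result of `readlines()` on each file. Keeping two one-way differences instead of a single symmetric difference makes it possible to report which file is missing each entry.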

Source https://stackoverflow.com/questions/71789818

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install rumble-tools

You can download it from GitHub.

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for answers or ask on the Stack Overflow community page.

  • © 2022 Open Weaver Inc.