Software
MyMpn is an online resource devoted to studying the human pathogen Mycoplasma pneumoniae, a minimal bacterium causing lower respiratory tract infections.
Nextflow is a pipeline orchestration tool that provides a domain-specific language (DSL) designed to simplify writing complex distributed computational workflows in a portable and reproducible manner. It enables seamless parallelization and deployment of any existing application with minimal development and maintenance overhead, regardless of the programming language the application was originally written in.
overlap is a program that computes the overlap between two sets of genomic features. More precisely, it takes two GFF files of genomic features as input and, for each feature in the first set, reports whether it is overlapped by a feature in the second set. This is the basic mode; more detailed information about the overlaps can also be retrieved.
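The basic mode can be illustrated with a short Python sketch. This is not overlap's own code: the function names are hypothetical, only the seqname, start and end columns of each GFF line are used, strand is ignored, and two features are assumed to overlap when they share at least one base under GFF's 1-based, inclusive coordinates. Sorting the second set and keeping a running maximum of end coordinates lets each query be answered with one binary search.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate


def parse_gff(lines):
    """Yield (seqname, start, end) from GFF lines; coordinates are 1-based, inclusive."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[3]), int(fields[4])


def build_index(features):
    """Per sequence: sorted start coordinates and the running maximum of end coordinates."""
    by_seq = defaultdict(list)
    for seq, start, end in features:
        by_seq[seq].append((start, end))
    index = {}
    for seq, intervals in by_seq.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_ends = list(accumulate((e for _, e in intervals), max))
        index[seq] = (starts, max_ends)
    return index


def is_overlapped(index, seq, start, end):
    """True if some indexed feature on `seq` shares at least one base with [start, end]."""
    if seq not in index:
        return False
    starts, max_ends = index[seq]
    i = bisect_right(starts, end)  # number of features starting at or before `end`
    # Among those features, one reaches `start` iff the largest end does.
    return i > 0 and max_ends[i - 1] >= start


def overlap(first_lines, second_lines):
    """For each feature of the first set, report whether the second set overlaps it."""
    index = build_index(parse_gff(second_lines))
    return [(feature, is_overlapped(index, *feature)) for feature in parse_gff(first_lines)]
```

Each result pairs a `(seqname, start, end)` tuple from the first file with a boolean, mirroring the "is it overlapped or not" answer of the basic mode.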
PATRONUS (from "PATtern Recognition by Optimized Numerical Universal Scoring") is a program designed to compute, very quickly, the exact probability of observing a given number of occurrences of a simple motif (that is, a contiguous word without gaps) in a sequence. It is intended for the analysis of very long biological sequences, such as chromosomes or the whole genomes of complex organisms. The probability is computed from the order-m Markovian statistics of the sequence, that is, the observed number of occurrences of every submotif of length m + 1 in the sequence. Contrary to common belief, computing such a probability for a generic motif is computationally demanding, mainly because occurrences of a motif can overlap one another in non-trivial ways.
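PATRONUS's exact probability computation is not reproduced here. The Python sketch below only illustrates the two ingredients the description mentions, under hypothetical function names: the order-m statistics (counts of all words of length m + 1) and the self-overlap structure of a motif, i.e. the shifts p at which a copy of the motif can start inside a previous copy. Motifs with self-overlaps produce clustered, non-independent occurrences, which is one reason the exact distribution of the count is hard to compute.

```python
from collections import Counter


def markov_counts(seq, m):
    """Order-m statistics: occurrence counts of every word of length m + 1 in `seq`."""
    k = m + 1
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def count_occurrences(seq, motif):
    """Count occurrences of a gap-free motif, allowing occurrences to overlap."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def self_overlap_shifts(motif):
    """Shifts p (1 <= p < len(motif)) at which the motif overlaps itself,
    i.e. its suffix starting at p equals its prefix of the same length."""
    n = len(motif)
    return [p for p in range(1, n) if motif[p:] == motif[:n - p]]
```

For example, `self_overlap_shifts("AAA")` returns `[1, 2]` and `count_occurrences("AAAAA", "AAA")` returns 3, since three occurrences fit in five bases; `ACGT` has no self-overlap shifts, so two of its occurrences can never share a position.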
PhylomeDB is a public database for complete collections of gene phylogenies (phylomes). It allows users to interactively explore the evolutionary history of genes by visualizing phylogenetic trees and multiple sequence alignments. Moreover, PhylomeDB provides genome-wide orthology and paralogy predictions based on the analysis of these phylogenetic trees.
The Plant Resistance Genes database (PRGdb; http://prgdb.org) is a comprehensive resource on resistance genes (R-genes), a major class of genes in plant genomes that confer disease resistance against pathogens.
project is a program that projects genomic features onto their sequences. For any questions, please contact Sarah Djebali (sarah dot djebali at crg dot es).
Various R scripts for the exploratory analysis of data derived from biological sequences.
SeAMotE (Sequence Analysis of Motifs Enrichment) allows fast and accurate large-scale de novo motif discovery in nucleic acid sequences.
SECISaln will predict a SECIS element in the query sequence, split it into its constituent parts and align these against a precompiled database of eukaryotic SECIS elements.
In this web server we provide public access to two new computational methods for selenoprotein identification and analysis. SECISearch3 replaces its predecessor, SECISearch, as a tool for predicting eukaryotic SECIS elements. Seblastian is a new method for selenoprotein gene detection that uses SECISearch3 and then predicts the selenoprotein sequences encoded upstream of the SECIS elements. Seblastian can both identify known selenoproteins and predict new ones. This project is the result of a collaboration with Vadim Gladyshev's lab at Harvard.
Selenoprofiles is a homology-based gene-finding tool suitable for selenoprotein prediction in large nucleotide databases, such as genomes. Selenoproteins are a group of proteins that contain selenocysteine (Sec), a rare amino acid inserted co-translationally into the protein chain. Sec is encoded by UGA, which is normally a stop codon; in selenoproteins, UGA is recoded to Sec in the presence of specific signals on selenoprotein gene transcripts. Because of this dual role of the UGA codon, selenoprotein prediction and annotation are difficult tasks that are left mostly to manual analysis, since there are no reliable “gold standard” programs for this purpose. Here we present selenoprofiles, a homology-based in silico tool that scans genomes for members of the known selenoprotein families. The pipeline has features that make it suitable for selenoprotein prediction, and it is shown to correctly predict selenoproteins that are poorly annotated in Ensembl. Selenoprofiles is a Python pipeline that internally runs psitblastn, exonerate, genewise and SECISearch.
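The dual role of UGA is what makes naive translation fail on selenoprotein genes. The Python sketch below is not part of Selenoprofiles, which relies on homology searches rather than on this logic; it only shows the recoding itself. In DNA, UGA is written TGA, and the hypothetical `translate_with_sec` function reads every internal in-frame TGA as Sec (one-letter code U) instead of a stop. It assumes the reading frame is already known and, unlike a real cell, does not check for a SECIS element or any other recoding signal, so it would also "rescue" genuine premature stops. Deciding which TGA codons are truly recoded is precisely the hard part that homology-based tools address.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_with_sec(cds):
    """Translate an in-frame DNA coding sequence, reading every internal TGA
    as selenocysteine (U) rather than a stop.

    Returns (protein, sec_positions), where sec_positions are 0-based residue
    indices of the Sec residues. Translation ends at the first TAA/TAG, or at a
    TGA that is the final codon (treated as the real stop)."""
    cds = cds.upper()
    protein, sec_positions = [], []
    n_codons = len(cds) // 3
    for i in range(n_codons):
        codon = cds[3 * i:3 * i + 3]
        aa = CODON_TABLE.get(codon, "X")  # X for codons with ambiguous bases
        if aa == "*":
            if codon == "TGA" and i < n_codons - 1:
                aa = "U"  # internal UGA recoded to selenocysteine
                sec_positions.append(len(protein))
            else:
                break  # a genuine stop codon ends translation
        protein.append(aa)
    return "".join(protein), sec_positions
```

For instance, `translate_with_sec("ATGTGAGGCTAA")` gives `("MUG", [1])`, whereas a standard translation would stop after the initial methionine.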
sgp2 is a program to predict genes by comparing anonymous genomic sequences from different species. It combines tblastx, a sequence similarity search program, with geneid, an ab initio gene prediction program.
SJcount is a utility for fast quantification of splice junctions (SJs). It is an annotation-agnostic, offset-aware version of bam2ssj.
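Annotation-agnostic junction counting typically derives junctions directly from the alignments rather than from a gene model. The Python sketch below illustrates that idea using the SAM CIGAR `N` operation (a skipped reference region, i.e. an intron in spliced alignments). It is not SJcount's implementation: it does not read BAM files (alignments are assumed to be given as hypothetical `(chrom, pos, cigar)` tuples, with `pos` the 1-based SAM POS field), it ignores strand, and it does not model whatever "offset-aware" refers to in SJcount.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def junctions_from_cigar(pos, cigar):
    """Return the introns (start, end), 1-based inclusive, implied by the N
    operations of one alignment whose leftmost reference position is `pos`."""
    ref = pos
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


def count_junctions(alignments):
    """Count how many alignments support each (chrom, intron_start, intron_end)."""
    counts = Counter()
    for chrom, pos, cigar in alignments:
        for start, end in junctions_from_cigar(pos, cigar):
            counts[(chrom, start, end)] += 1
    return counts
```

For a read at position 100 with CIGAR `50M200N50M`, the sketch reports the intron `(150, 349)`: the read covers bases 100 to 149, skips 150 to 349, and resumes at 350.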