Bioinformatics Unit
Overview
The Bioinformatics Unit provides researchers at the CRG, the PRBB, and external organizations with consultation services, planning of NGS and other genomic experiments, NGS data processing, analysis and management, software and database development, bioinformatics training, and access to high-performance computing resources at the CRG.
The Unit works in synergy with the Genomics Unit and the Biomolecular Screening & Protein Technologies Unit (BMS-PT) to support users of high-throughput sequencing technologies from experiment planning to the timely delivery of reliable results.
For more information, please visit the Unit website at http://biocore.crg.eu or take the virtual tour.
Latest Updates
The Bioinformatics Unit gave the online ELIXIR course “Containers and Workflow Pipelines for reproducible and automated data analysis” (Oct 28, 2020)
The course was organized and supported by the VIB Bioinformatics Core and was attended by 25 people. The course materials are available on GitHub and on the course webpage.
The CRG Covid Viral Beacon relies on the SARS-CoV-2 genomic data processed by the Bioinformatics Unit using its MasterOfPores pipeline (Sep 8, 2020)
The EGA team has released the CRG Covid Viral Beacon, a tool for exploring SARS-CoV-2 variability at the genomic, amino acid, and motif levels. The Bioinformatics Unit used its MasterOfPores pipeline to analyze all publicly available cDNA Nanopore sequencing data used by the Beacon.
Two papers published with Harris Onywera, a former intern in the Bioinformatics Unit under the CRG-Novartis-Africa Mobility Programme (July 30, 2020)
Dr. Harris Onywera was an intern in the Unit for 6 months in 2016. Two collaborative papers were recently published in Frontiers in Medicine and BMC Microbiology. His experience at the CRG was recently featured in a University of Cape Town press release.
The Unit participated in the mass COVID-19 testing run by the CRG (June 23, 2020)
The ORFEU program was launched in April 2020 in response to the COVID-19 outbreak. The Unit developed the web interface for scanning and registering sample tubes and supported the flow of sample and PCR-result data to the centralized database used by the hospitals.
Website with uniformly analyzed coronavirus and SARS-CoV-2 Nanopore direct RNA sequencing data (Apr 10, 2020)
The data are available at http://covid.crg.eu and were analyzed using the MasterOfPores pipeline developed by the Bioinformatics Unit in collaboration with the CRG group of Eva Novoa. The resource is listed among the TransBioNet COVID-19 research efforts and was highlighted in Medical Xpress and CETEM News.
The paper on MasterOfPores is published (March 17, 2020)
The paper describing MasterOfPores has been published in Frontiers in Genetics. MasterOfPores is a publicly available, parallel and scalable workflow for the analysis of Oxford Nanopore direct RNA sequencing datasets. It has been developed and supported by the Bioinformatics Unit in collaboration with the CRG group of Eva Novoa.
Unit members delivered an RNA-seq data analysis course at the Pasteur Institute, Tunis (Feb 13, 2020)
On 10-13 February 2020, we organized and delivered an RNA-seq data analysis course as part of the EU-funded collaborative PHINDaccess project. The course took place at the Institut Pasteur in Tunis.
Julia Ponomarenko CV
2002 PhD in Biology at the Institute of Cytology and Genetics, Novosibirsk, Russia.
2002 - 2004 Project Scientist, San Diego Supercomputer Center, University of California San Diego, USA.
2004 - 2008 Senior Research Scientist, San Diego Supercomputer Center, University of California San Diego, USA.
2008 - 2015 Project Investigator (NIH/NIGMS, NIAID), San Diego Supercomputer Center, University of California San Diego, USA.
2015 - Head of the Bioinformatics Unit, Centre for Genomic Regulation, Barcelona, Spain.
How to access
All services and equipment offered by the Bioinformatics Unit can be requested through the AGENDO web platform at https://crg.agendoscience.com/
Services
- Consultation on bioinformatics methods and resources, experimental design and budgeting, grant proposal development, bioinformatics and statistical data analysis, and use of the high-performance computing resources at the CRG.
- Bioinformatics training (in person and via internal and external courses).
Genomics
- Reference-based and de novo assembly of eukaryotic and prokaryotic genomes.
- Genome re-sequencing and quality assessment of genome assemblies.
- ChIP-seq (TFs, histone modifications): peak calling, differential binding analysis among sample groups, peak annotation.
- Whole-exome and whole-genome analysis: variant calling, CNV detection.
- Identification and annotation of DNA structural variants for common and rare human diseases: individual and family analysis, cancer driver gene mutations.
- Comparative genomics.
- Genome functional annotation: ab initio gene prediction, annotation of genes, transcripts, DNA motifs, promoters, and other DNA regulatory elements.
- Analysis of 5C, Hi-C, ATAC-seq, and other high-throughput data.
Transcriptomics
- Reference-based and de novo assembly of eukaryotic and prokaryotic transcriptomes.
- Transcriptome functional annotation: ab initio gene prediction, annotation of genes, transcripts, DNA motifs, promoters, and other regulatory elements.
- Variant calling from transcriptome sequencing data.
- Analysis of commercial and custom microarrays: differentially expressed genes, group comparison.
- RNA-seq for mRNA: discovery of new transcripts, differentially expressed genes/transcripts.
- Functional analysis of differentially expressed genes/transcripts: Gene Ontology terms, DNA motifs, and pathways enrichment analysis.
- RNA-seq for small and non-coding RNA: differential expression, discovery of new microRNAs, microRNA target prediction.
- Analysis of OpenArray real-time PCR and other high-throughput experimental data.
- Identification of batch effects and visualization of data and results: hierarchical clustering, heatmaps, dendrograms, volcano plots, and principal component analysis of the overall (dis)similarity among experiments.
- RNA-target-based sequencing: RIP-seq, iCLIP, CLIP-seq, and others.
- Data submission to GEO, ArrayExpress and other public data repositories.
Metagenomics
- Analysis of amplicon (16S rRNA genes), whole genome and transcriptome shotgun sequencing data.
- Identification of microbial communities, taxonomic diversity, and abundances at the genus, family, order, class, and phylum levels.
- Conservation and abundance of bacterial functional gene modules and biochemical pathways.
- Estimation of microbial diversity and sequence coverage.
- ORF prediction and functional annotation.
- Comparative analysis of samples: microbial profiles, Gene Ontology terms, metabolic and pathway analyses.
Proteomics
- Protein functional annotation and prediction.
- Analysis of the effects of SNPs and other variants on protein structure and function.
- Multiple sequence alignment.
- Ortholog and paralog assignment.
- Phylogenetic analysis and tree construction.
- Protein structure comparison and 3D homology modeling.
- Protein-protein and protein-ligand 3D docking.
- B- and T-cell epitope prediction.
Databases, Websites and Software
- Databases: relational and NoSQL.
- Websites for data submission, search, and analysis.
- Web tools.
- LIMSs (Laboratory Information Management Systems) for managing laboratory operations, data flow, and communication with users and external collaborators.
- External software evaluation and benchmarking.
- Software development: bioinformatics scripts; data processing and analysis pipelines; integrative bioinformatics web applications; customized genome browsers.
- External and internal data integration solutions.
In addition to fee-based services, we support fully collaborative grant-funded investigations. This includes preliminary data analysis, planning the grant budget and the experiments to be performed by CRG Core Facilities, writing the grant, data analysis and biological inference, custom software development, and co-authored dissemination of the grant results.
For more information and service fees, please visit the Unit website at http://biocore.crg.eu
For specifics on procedures and deliverables, please contact the Unit at BioinformaticsUnit@crg.eu
To request a service or to propose a collaborative project, please contact the Unit head at julia.ponomarenko@crg.eu
Equipment
- Linux cluster with 3,020 compute cores across more than 200 compute nodes and servers.
- Univa Grid Engine batch queuing system.
- 87 compute nodes, each with 2 Intel Xeon E5-2680 processors (8 cores, 20 MB cache, 2.70 GHz) and 128 GB of memory.
- 56 compute nodes, each with 2 Intel Xeon E5530 processors (4 cores, 2.40 GHz) and 48 GB of memory.
- High-memory compute node with 2 Intel Xeon E5-2699 v4 processors (22 cores, 2.20 GHz) and 1 TB of memory.
- High-memory compute node with 8 Intel Xeon E7450 processors (6 cores, 2.40 GHz) and 512 GB of memory.
- High-memory compute node with 4 Intel Xeon E7540 processors (6 cores, 2.00 GHz) and 256 GB of memory.
- 60 heterogeneous compute nodes with different hardware specifications.
- 4.5 PB of storage (EMC Isilon, DDN, and Nexsan).
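As a rough consistency check, the itemized node groups above can be tallied as nodes x processors per node x cores per processor. This is only a sketch. It assumes the advertised 3,020 cores are physical cores (no hyper-threads) and that they cover every node listed. The `NodeGroup` class and `listed_cores` helper are illustrative names, not part of any CRG tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """A group of identical compute nodes, as itemized in the Equipment list (illustrative)."""
    name: str
    nodes: int
    sockets_per_node: int
    cores_per_socket: int

    @property
    def cores(self) -> int:
        # Physical cores only; hyper-threads are assumed not to be counted.
        return self.nodes * self.sockets_per_node * self.cores_per_socket


# The five homogeneous groups itemized above; the 60 heterogeneous nodes are not itemized.
GROUPS = [
    NodeGroup("Xeon E5-2680, 128 GB", 87, 2, 8),
    NodeGroup("Xeon E5530, 48 GB", 56, 2, 4),
    NodeGroup("Xeon E5-2699 v4, 1 TB", 1, 2, 22),
    NodeGroup("Xeon E7450, 512 GB", 1, 8, 6),
    NodeGroup("Xeon E7540, 256 GB", 1, 4, 6),
]

ADVERTISED_TOTAL_CORES = 3020  # figure quoted for the whole cluster


def listed_cores(groups: list[NodeGroup]) -> int:
    """Sum the physical cores of the itemized node groups."""
    return sum(g.cores for g in groups)


if __name__ == "__main__":
    for g in GROUPS:
        print(f"{g.name:<24} {g.nodes:>3} nodes {g.cores:>5} cores")
    total = listed_cores(GROUPS)
    print(f"Itemized total: {total} cores")
    # Under the stated assumption, the remainder would sit on the heterogeneous nodes.
    print(f"Not itemized: {ADVERTISED_TOTAL_CORES - total} cores")
```

Under that assumption, the five itemized groups account for 1,956 cores. That would leave about 1,064 cores on the 60 heterogeneous nodes. The real split may differ if the 3,020 figure counts hyper-threads or includes servers that are not listed.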