Complete Cluster course - September 2024
This course will consolidate material presented in the beginner cluster course and expand on the concepts to be aware of when trying to optimize use of the cluster.
The main message of the course is to embrace the parallelism available within the cluster: pipelines should be built from many small, independent pieces spread across the cluster rather than large, monolithic, long-running jobs that run on a single node. The course will show why this matters and how to achieve it.
Topics that are going to be addressed:
- Video tour of the data centre
- What is a cluster
- Logging in
- Queuing / the scheduler
- What resources are available on the CRG cluster
- Simple batch scripts - directives
- Troubleshooting - what happened to my jobs?
- Interactive sessions
- Supercomputers, Beowulf clusters, horizontal vs. vertical scaling
- Hardware considerations
- Multithreaded jobs, parallelism, Amdahl's Law
- Job arrays
- Job dependencies
- Building a pipeline
- Storage issues, treemap
- Job stats, resource estimation
- Scaling analysis
What NOT to expect:
Specific bioinformatics methods or pipeline builders (Nextflow, Snakemake, etc.)
Instructors and teachers: Emyr James, Head of SIT, and co-trainers: Rodny Hernandez, Gabriel Gonzalez, Luis Exposito, Clemente Borges
Dates: 25, 26 and 27 September 2024, 10:30-13:30h
Level: Intermediate-advanced. You need a minimum level of Linux experience OR to have taken the Linux Terminal for beginners course.
Location: Bioinformatics room, CRG Training Centre
Maximum number of participants: 16
Registration deadline: 11th September 2024. Extended deadline: 16th September 2024
Registration HERE
For any information, please send an email to CRG Training and Academic office (TAO): training@crg.eu