CAC Workshops

Highly qualified personnel (HQP) are a key component of academic, private, and public sector success. The Centre for Advanced Computing offers a wide variety of workshops and training opportunities to advance and maintain your HQP’s skills. Our user support and cognitive development specialists deliver training at any experience level, from introductory Linux to advanced parallel programming optimization. Workshops on hot topics such as cloud computing, Apache Spark, and data analytics are also available.

Looking for something outside of our standard curriculum? We will work with your requirements to meet your team’s training needs.

Data and Pipelines Workshops:

Data Understanding: 2 hours
This is an introductory workshop to Data Analytics. It starts by introducing the Data Analytics pipeline and its processes. Then, it discusses the different statistical and visualization approaches for conducting Exploratory and Descriptive Analytics on data to answer the question of “What happened in the past?”.

Text Mining: 4 hours
Text mining is the process of extracting meaning, patterns, and trends from unstructured textual data. Massive amounts of unstructured text are prevalent today, yet traditional machine learning algorithms handle only numerical or categorical data. Existing data analytics platforms provide special components to facilitate the analysis of textual data. This workshop introduces the topic of text mining and provides a tour, with hands-on exercises and demonstrations, of four text mining tools, each of which supports an interesting and diverse set of features.

Data Preparation: 4 hours
The Data Preparation workshop covers the different approaches for preparing data. These include data cleaning, missing-value handling, outlier detection and handling, feature transformation, and the art of feature engineering, which is considered one of the most vital operations in the Data Analytics process.

Machine Learning Workshops:

Unsupervised Learning: 3 hours
This workshop introduces Unsupervised Learning approaches for clustering data to find hidden groups within the data. Algorithms discussed in this workshop are K-Means, K-Medoids, Fuzzy C-Means, Hierarchical Clustering, and Self-Organizing Maps. The workshop also introduces a set of statistical methods for evaluating the existence of groups in a data set, evaluating the algorithms’ performance, and assessing a cluster’s stability.

Introduction to Spark: 4 hours
Apache Spark is one of the most popular projects in the Hadoop ecosystem. This workshop provides an overview of the Spark environment, its model, and its core data abstractions. It introduces you to the Spark SQL API and the Spark Machine Learning Library API. Two practical application examples will be presented in the context of the IBM Watson Data Platform.

Supervised Learning: 5 hours
This workshop introduces Predictive Analytics to answer the question of “What will happen?”. It discusses when and how to use the different predictive Machine Learning algorithms. The workshop covers (1) Supervised Learning algorithms for Classification and Regression, such as KNN, Decision Trees, Random Forest, Naïve Bayes, Support Vector Machines, Neural Networks, and Logistic and Linear Regression, and (2) Ensemble Learning techniques such as Bagging, Boosting, Gradient Boosting, and Stacking. The workshop also introduces a set of statistical evaluation methods to compare the performance of different algorithms.

Cloud Computing Workshops:

Cloud Computing: 2 hours
The cloud computing workshop gives a brief introduction to the main concepts of cloud computing. The workshop then introduces the IBM Cloud and goes over the steps needed to start working in this environment including creating an account, understanding the dashboard and creating and deploying an application in the IBM Cloud.

Programming Language Workshops:

Introduction to Modern Fortran: 7 hours
The Fortran programming language was one of the first “high level” languages. A great number of important technical computing packages are written and maintained in Fortran. It is specifically geared toward numerical computing and the development of scientific and engineering applications. Due to its structural simplicity, it also naturally supports efficient execution and thus is very suitable for “high-performance” computing. This is an opportunity to familiarize yourself with the basic concepts and features of Fortran 90, such as modules, memory allocation, array operations, and routine overloading.

Analysis Pipelines with Python: 7 hours
Python is one of the most versatile programming languages and sees widespread use across modern computing. This tutorial focuses on Python for high-performance computing applications, and includes topics on performance optimization, parallel programming, and pipelining. The second part focuses on using Python to (easily) write and scale massively parallel data analysis pipelines across a cluster.

Introduction to Julia: 7 hours
Julia is a high-level, high-performance dynamic programming language for numerical computing. It provides a compiler, parallel execution, numerical accuracy, and a mathematical function library. It aims to combine the simplicity and accessibility of environments such as R and Python with the execution speed and efficiency of programming languages such as Fortran or C++. This workshop offers a starting point for further exploration of the language, and enables users with little or no programming background to write simple but functional programs.

System Tools and Data Workshops:

Data Science with R: 7 hours
This is an introductory R course that focuses on teaching the basics of using R to perform research, specifically: write and run R code using RStudio, analyze data with the dplyr and purrr packages, make publication-quality plots with ggplot2, and write reports using R Markdown.

Databases and SQL: 4 hours
A relational database is a common way to store and manipulate information, especially in business and corporate environments. Databases include powerful tools for search and analysis, and can handle large, complex data sets. This lesson is focused on teaching the basics of using, manipulating, and creating databases using SQLite as a teaching tool.

The UNIX Shell: 4 hours
This class serves as an introduction to Linux, the UNIX-like operating system that runs on almost all high-performance computing systems. It is intended for users who have little or no experience with UNIX or Linux. The focus is on the common bash shell. We cover material that helps the user develop an understanding of the Linux command-line environment, which is necessary for successful use of UNIX-like systems.

Automation and Make: 4 hours
Make is a tool for managing the building of software and databases. It uses a simple syntax to describe a “dependence graph”, which allows the automatic execution of a sequence of commands that rebuild all files of interest. It is included in virtually every Unix system, and it is no exaggeration to call it one of the most useful Unix tools. This course provides a practical hands-on introduction.

Version Control with Git: 4 hours
Version control is a method of intelligently managing code for any project, enabling programmers to collaborate, keep track of changes, track down bugs, and maintain multiple versions/backups of their software. During this tutorial, students learn the basics of version control using Git, as well as how to host and collaborate on coding projects with online services like GitHub.

High Performance Computing Workshops:

Introduction to High-Performance Computing: 7 hours
This workshop is an introduction to using high-performance computing systems effectively. We obviously can’t cover every case or give an exhaustive course on parallel programming in just 7 hours of teaching time. Instead, this workshop is intended to give students a good introduction and overview of the tools available and how to use them effectively. By the end of this workshop, students will know how to use the UNIX command line to operate a computer, connect to a cluster, write simple shell scripts, submit and manage jobs on a cluster using a scheduler, transfer files, and use software through environment modules.

Introductory / intermediate parallel programming with MPI: 2 days
The Message Passing Interface (MPI) is the standard method for writing programs for execution on a cluster. Our workshop provides an introduction that allows users to write basic parallel applications, or to turn existing serial programs into parallel ones. An extension covers more advanced topics such as parallel I/O and user-defined data types.

Thread Programming with the POSIX thread library: 7 hours
The POSIX thread library is the most commonly used thread library for the Unix platform. It enables the design of flexible multi-threaded applications that can make full use of the multi-core or multi-processor structure of the underlying hardware. Virtually any system application in a Unix-based OS makes use of it. This introductory course explains the basic usage of the library and enables the writing of simple multi-threaded applications that run in parallel on a shared-memory system.

High-Performance Computing with R: 7 hours
The R programming language has become a standard tool for data science, statistics, and bioinformatics. This course focuses on making your R code as fast as possible, including topics on performance optimization and parallelization. There is a major emphasis on newer additions to the R ecosystem, in particular the “tidyverse” set of packages.

Parallel Computing Concepts with Chapel: 7 hours
Our Chapel lesson is an in-depth overview of parallel programming, using Chapel as a teaching tool. Students will leave the workshop able to write performant code and to parallelize a program across a set of compute nodes. Topics covered include basic Chapel language syntax, an overview of parallel programming concepts, writing shared-memory parallel programs, and writing distributed-memory parallel programs (to be executed across several nodes in a cluster).

Shared-memory programming with OpenMP: 7 hours
The OpenMP compiler directives were designed to provide a simple alternative to explicit thread programming. This enables those new to parallel programming to turn serial applications into parallel ones that can make use of the multicore architecture of modern computers. This affords a way to program for shared-memory machines ranging from desktops to large-memory SMP systems.

Prices are available upon request.