Optimization & Vectorization

Universiteit Utrecht - Information and Computing Sciences

academic year 2019/20 – 1st block

title image title image title image

Navigation

News

Lectures & Slides

Examination & Grading

Course Overview

Schedule

Literature & Links

Newsback to navigation

Recent news


November 6:


November 4:


October 31:


October 28:

  • Slides for lecture 13 ("Snippets") now available.

October 24:

  • Slides for lecture 12 ("Cache Oblivious") have been added.
  • Sample project for tomorrow's multi-threading experiments: click.


Older news is still available here.



Course Overview back to navigation

bunny logo image

Course: INFOMOV is a practical course on optimization: the art of improving software performance, without affecting functionality. We apply high level and low level optimizations, in a structured manner. Especially for the low level optimizations, we must intimately understand the hardware platform (CPU, GPU, memory, caches) and modify our code to use it efficiently.

Vectorization: Modern processors achieve their performance levels using parallel execution. This happens on the thread level, but also on the instruction level. Being able to produce efficient vectorized code is an important factor in achieving peak performance.

GPGPU: Graphics processors employ a streaming code execution model, taking vectorization to extremes, both in the programming model and the underlying architecture. Leveraging GPU processing power is an important option when optimizing existing code.

Context: Optimization is a vital skill for game engine developers, but also applies to other fields.

Lecturer: Jacco Bikker (j.bikker@uu.nl or bikker.j@gmail.com)

Comms: join us on Slack: UUGMT (UU students only).

Lectures:

  • Mondays, 10:00h - 11:45h, room BBG-209
    Monday Working Lecture PART 1: 09:00 - 09:45
    Monday Working Lecture PART 2: 12:00 - 12:45
  • Wednesdays, 10:00h - 11:45h, room BBG-214
    Wednesday Working Lecture PART 1: 09:00 - 09:45
    Wednesday Working Lecture PART 2: 12:00 - 12:45

The course will be taught in English. Warning: A decent level of C/C++ is expected. Expect a significantly higher workload if you are a C++ novice.


Files back to navigation

Downloads

Fundamentals:
Cross-platform version of the C/C++ template used in the course, 2019 edition (v2).
OpenCL template for the GPGPU lectures (updated 2019 edition).
OpenCL template for those that prefer C# (2018 edition, may get an update).
CUDA template (10.1).
Practice / demonstration materials:
balls - tmpl_2019.zip - profiling practice material (see accompanying tutorial).
dotcloud - tmp2019.zip - pratice material for low-level optimization (see accompanying tutorial).
gravity_soa - tmp2019.zip - practice material for the SIMD lectures.
voronoicpu - tmp2019.zip - example material for the first GPGPU lecture.

OptmzdSummaries™:
#0: Introduction and profiling
#1: Low level optimization
#2: Caching
#3: SIMD
#4: GPGPU using C# and OpenCL (also get: C# OpenCL template and conversion steps)

Additional resources will be made available during the course.


Lecture Slides & Recommended Readingsback to navigation

Below is a list of all lectures with a very brief summary of the topics, slides downloads, and recommended readings to prepare for the lecture.
This list is tentative.

Lecture 01
Mon Sep 9

Topic: Introduction  This lecture serves as an introduction to the course. And: Profiling With or without knowledge of optimization, it proves hard to 'guess' application performance bottlenecks. Profiling is a vital first (and often repeated) step in a structured approach to optimization.

Suggested readings:

Designing for Performance, Scalability & Reliability: StarCraft II's Approach

Slides:
Lecture 1 - Introduction

 
Lecture 02
Wed Sep 11

Topic: Low Level Optimization  In this lecture, we explore various low level factors that determine application performance.

Suggested readings:

Michael Karbo, Inside the CPU (Chapter 30 from PC Architecture)

Slides:
Lecture 2 - Low Level


 
Lecture 03
Mon Sep 16

Topic: Caching (1)   Considering the huge latencies involved in fetching data from RAM, caches play a crucial role in 'feeding the beast'. We explore various cache architectures and investigate implications in software development.

Suggested readings:

What Every Programmer Should Know About Memory

Slides:
Lecture 3 - Caching (1)


 
Lecture 04
Mon Sep 23

Topic: Caching (2)  Continuation of the topic of the previous lecture.

Suggested readings:

Game Programming Patterns - Data Locality

Slides:
Lecture 4 - Caching (2)

 
Lecture 05
Wed Sep 25

Topic: SIMD (1)  With CPU clock speeds reaching practical limits, parallelism becomes the main source of further advances in hardware performance. In this lecture, Intel's approach to SIMD programming (SSE) is introduced.


Slides:
Lecture 5 - SIMD (1)


Lecture 06
Mon Sep 30

Topic: SIMD (2)  Building on the concepts of lecture 6, we investigate advanced SIMD topics such as gather / scatter and masking.

Suggested readings:

Looking for 4x speedups? SSE to the rescue! (warning: by Intel)
OptmzdSummary #3

Slides:
Lecture 6 - SIMD (2)

 
Lecture 07
Wed Oct 2

Topic:  Data-Oriented Design  Where Object-Oriented Design focuses on ease of development and maintainability, Data-Oriented Design focuses on data layout that is optimal for performance. Targeting the computer rather than the developer has significant impact on software architecture, but also on performance.

Suggested readings:

Data-Oriented Design (Or Why You Might Be Shooting Yourself in the Foot With OOP)


Slides:
Lecture 7 - Data Oriented


Lecture 08
Mon Oct 7 

Topic: GPGPU (1)  For certain problems, a streaming processor is a good (and powerful) alternative to the CPU. In this lecture, we briefly explore GPU architecture and the concept of GPGPU.

Slides:
Lecture 8 - GPGPU (1)



 Lecture 09
Wed Oct 9

Topic: GPGPU (2)  Building on the previous lecture, we investigate the implementation of a Verlet fluid simulator on the GPU.

Slides:
Lecture 9 - GPGPU (2)


 
Lecture 10
Mon Oct 14

Topic: GPGPU (3)  Transferring optimization concepts to dthe GPU: profiling, high-level, low-level, caches, other architecture considerations. GPGPU-specific algorithms for common problems.

Suggested readings:

A Survey of General-Purpose Computation on Graphics Hardware

Slides:
Lecture 10 - GPGPU (3)
Roland's Crowded Slides



Lecture 11
Wed Oct 16

Topic: Fixed Point Math  Floating point calculations can often be done with integer arithmetic, and there are good reasons for doing so. In this lecture, the 'lost art' of fixed point arithmetic is introduced.

Suggested readings:

The Neglected Art of Fixed Point Arithmetic

Slides:
Lecture 11 - Fixed Point
Sandy's CMAKE slides

 
Lecture 12
  Wed Oct 23

Topic: Cache-Oblivious  Writing code that runs efficiently, regardless of cache architecture details.

Slides:
Lecture 12 - Cache Oblivious

 
Lecture 13
Mon Oct 28

Topic: Snippets  Various examples of optimizations that worked and didn't work. Also: multi-threading.


Slides:
Lecture 13 - Snippets


Lecture 14
Wed Oct 30

Topic: Process & Grand Recap Final lecture, recap of the structured optimization process, recap of various concepts, exam preparation.


Slides:
Will be made available after the lecture.


 

Course Schedule back to navigation

Period 1 Schedule (tentative)

Week Date Lecture / Exams Practicum Deadlines
Notes
36

No lecture
 


37
Mon Sep 9
09:00-10:45  
Lecture 1:
Introduction
  First practicum: profiling tutorial.
Document, project files.
Wed Sep 11
09:00-10:45
Lecture 2:
Low-level

P1 will be introduced.
38
Mon Sep 16
09:00-10:45
Lecture 3:
Caching (1)


Wed Sep 18
09:00-10:45
NO LECTURE


39
Mon Sep 23
09:00-10:45
Lecture 4:
Caching (2)
Tue Sep 24: Deadline Assignment 1
Click here for P1 details
Wed Sep 25, 23:59:
Extended Deadline (-1 pt)
Wed Sep 25
09:00-10:45
Lecture 5:
SIMD (1)

P2 will be introduced.
40
Mon Sep 30
09:00-10:45
Lecture 6:
SIMD (2)


Wed Oct 2
09:00-10:45
Lecture 7:
Data-Oriented Design
 

41
Mon Oct 7
09:00-10:45
Lecture 8:
GPGPU (1)


Wed Oct 9
09:00-10:45
Lecture 9:
GPGPU (2)
Thu Oct 10: Deadline Assignment 2
Click here for P2 details.
 Fri Oct 11, 23:59:
Extended Deadline (-1 pt)
42
Mon Oct 14
09:00-10:45
Lecture 10:
GPGPU (3)
12:00 - 12:45 : Guest Lecture -
Roland Geraerts, "Crowd Simulation".
 P3 will be introduced.
Wed Oct 16
09:00-10:45
Lecture 11:
Fixed Point Math
12:00 - 12:45 : Guest Lecture -
Sandy Carter, "Using CMAKE".

43
Mon Oct 21
09:00-10:45
NO LECTURE
 
Wed Oct 23
09:00-10:45
Lecture 12:
Cache-oblivious
 
44
Mon Oct 28
09:00-10:45
Lecture 13:
Snippets + Multithreading

 
Wed Oct 30
09:00-10:45
Lecture 14:
Grand Recap, Exam Practice
Thu Oct 31, 23:59:
Deadline
Final Assignment
Fri Nov 1, 23:59:
Extended Deadline (-1 pt)

45


EXAM: Tue Nov 5


Assignments back to navigation


All assignments can be done alone or in teams of two students. For teams: it is not mandatory to do all four assignments with the same partner; switching is allowed.


Assignment P1 - LOW LEVEL

For this assignment you will apply low-level optimization to a graphical application. Full details are in the formal assignment description.

Deadline: Tuesday September 24, 23:59. Extended deadline (1pt penalty): Wednesday September 25, 23:59.


Assignment P2 - CACHING

For this assignment, you will build a three-level set associative cache simulator. Full details are in the formal assignment description.

Deadline: Thursday October 10, 23:59. Extended deadline (1pt penalty): Friday October 11, 23:59.


Assignment P3 - NOT A DRILL

In this final assignment, you will apply all techniques to speed up real-world software. Full details are in the formal assignment description.

Resources:

Additional options will be presented by Roland Geraerts on Monday.


Deadline: Thursday October 31, 23:59. Extended deadline (1pt penalty): Friday November 1, 23:59.

 

Exam & Grading back to navigation

GRADING

Programming assignments: Your practical grade P is based on two programming assignments P1, P2 (25% each) and one final assignment P3 (50%).

Exam: Your exam / theory grade T is based on a single final exam.

Final grade: Your final grade is (3P + T) / 4. You must score at least 4.0 (before rounding) for the exam to pass this course.

RETAKES AND REQUIREMENTS

Retake: To qualify for a retake, the final grade must be at least 4 (before rounding). You may repair your final grade by redoing one of the four assignments, or the exam. Exact terms will be discussed individually.

 

Literature & Links back to navigation

Overview of literature used during this course:

Links verified on September 5.


Previous editions

 

News Archive back to navigation

Old posts


October 19:


October 17:


October 14:


October 11:

  • Results for the second assignment (P2) can be downloaded here.
  • Some preliminary details for P3 are now available for those that cannot wait.


October 9:


October 7:


October 4:


October 2:


October 1:

  • Two practical guest appearances added to the schedule.

September 30:


September 26:


September 23:


September 16:


September 13:

  • Reference implementation for "Galaxy": download.


September 11:


September 10:


September 9:

  • Slides for lecture 1 now available.
  • Wednesday Sep 11 lecture starts at 10:00!


September 5:

  • Added materials for first lecture / tutorial.

August 21:

  • Website for 2019/2020.