Simply put, SSE is vector processing. One way to use it is to declare a vector type in GCC:
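A minimal sketch of what such a declaration can look like, using GCC's `vector_size` attribute. The type name `v4sf` and the helper `madd4` are hypothetical names chosen for illustration; whether the compiler actually emits SSE instructions depends on the target and flags (for example `-msse` on 32-bit x86).

```c
/* Four packed single-precision floats: 16 bytes, the width of one SSE register.
 * The name v4sf is an illustrative choice, not something GCC requires. */
typedef float v4sf __attribute__((vector_size(16)));

/* Element-wise a * b + c across all four lanes.
 * GCC applies the ordinary arithmetic operators lane by lane on vector types. */
static v4sf madd4(v4sf a, v4sf b, v4sf c)
{
    return a * b + c;
}
```

Vectors can be initialised with a brace list such as `v4sf a = {1.0f, 2.0f, 3.0f, 4.0f};`, and recent GCC versions allow individual lanes to be read with subscripts like `a[0]`.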
The 13 Motifs of Parallel Programming
List of Dwarfs (7 dwarfs, later expanded into 13 motifs):
Kowarschik and Weiss (2003) An Overview of Cache Optimization Techniques and Cache-Aware Numerical Algorithms (LNCS 2625)
Data access techniques for optimizing code on modern processors:
Lam et al (1991) The Cache Performance and Optimization of Blocked Algorithms (ASPLOS)
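Blocking (tiling) is a well-known way to make a program faster by exploiting the properties of the memory hierarchy: the computation is reorganized to work on sub-blocks small enough to stay in cache, so that data is reused before it is evicted.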
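As an illustration, here is a rough sketch of blocked matrix multiplication, the standard example of the technique. The tile size `BLOCK`, the helper `min_sz`, and the function name `matmul_blocked` are all assumptions made for this sketch; a good tile size depends on the cache of the machine at hand and has to be tuned.

```c
#include <stddef.h>

/* Hypothetical tile edge; in practice tuned so that the tiles in use fit in cache. */
#define BLOCK 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C += A * B for n x n row-major matrices, computed tile by tile.
 * The three outer loops pick a tile; the three inner loops do an
 * ordinary multiply restricted to that tile, so its data is reused
 * while it is still in cache. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t i_end = min_sz(ii + BLOCK, n);
                size_t k_end = min_sz(kk + BLOCK, n);
                size_t j_end = min_sz(jj + BLOCK, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

The caller is expected to zero `C` first if a plain product is wanted. The result is the same as the naive triple loop; only the order in which memory is touched changes.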
Gude et al (2008) NOX: Towards an Operating System for Networks (CCR)
NOX, a network operating system, proposes an “OS” layer on top of network hardware. Like the OS in a computer, it abstracts network functions into an API, and users write “applications” on top of that API to control the network. The idea is to make network management easier by...