Saturday, February 28, 2015

Maxwell for the masses (GM206)

As you probably already know, the mainstream version of the Maxwell GPU has been released in the form of GM206, and the graphics card bearing the chip is the GTX-960. The card seems to be pretty efficient and a significant improvement over Kepler, especially in compute applications, which is the aspect I'm particularly interested in. There has been some controversy, of course, due to its narrow 128-bit memory bus, which limits peak memory bandwidth to 112GB/sec. However, the larger cache should help alleviate this bottleneck.
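The 112GB/sec figure follows directly from the bus width: each transfer moves 128 bits (16 bytes), so peak bandwidth is the bus width in bytes times the effective per-pin data rate. A minimal sketch, assuming a 7 Gbps effective GDDR5 data rate (the rate that the 112GB/sec figure implies); the function name is my own:

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s.

    bus_width_bits: memory bus width in bits (128 for the GTX-960)
    data_rate_gbps: effective data rate per pin in Gbit/s (assumed 7.0 here)
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_gbps


print(peak_bandwidth_gb_s(128, 7.0))  # 112.0
```

By the same formula, a 256-bit bus at the same data rate would give 224GB/sec, which is why the narrow bus drew criticism.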

The Zotac GTX-960 AMP! edition

To give you a taste of Maxwell's compute capabilities, I present the results of experimenting with the OpenCL NBody example (16384 bodies) from the NVIDIA SDK 4.2 (the last one with OpenCL support). The GTX-960 delivers well above 1 TeraFLOP, which is impressive. I also ran the example on 3 more GPUs. All results are depicted in the chart that follows.
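For reference, n-body benchmarks usually convert the measured time into GFLOPS by counting all N² pairwise interactions per iteration and multiplying by a fixed FLOP cost per interaction. The sketch below assumes 20 FLOPs per interaction, the convention used by NVIDIA's CUDA n-body sample; I'm assuming the OpenCL example reports performance the same way, and the function name is hypothetical:

```python
def nbody_gflops(num_bodies, iterations, elapsed_s, flops_per_interaction=20):
    """Convert an all-pairs n-body timing into GFLOPS.

    Assumes every body interacts with every body (N*N interactions per
    iteration) at a fixed cost of flops_per_interaction (20 assumed here).
    """
    interactions = num_bodies * num_bodies * iterations
    return interactions * flops_per_interaction / elapsed_s / 1e9


# With 16384 bodies, one iteration costs about 5.37e9 FLOPs under this
# convention, so 1 TFLOPS corresponds to roughly 5.4 ms per iteration.
flops_per_iteration = 16384 * 16384 * 20
print(flops_per_iteration / 1e12)  # seconds per iteration at 1 TFLOPS
```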


The red bars represent measured performance in GFLOPS and the green ones the efficiency, i.e. the ratio of measured to peak GFLOPS.
The Maxwell architecture seems to address many of its predecessor's compute-efficiency issues. However, there are two drawbacks: first, the low memory bandwidth mentioned above; second, the rather low double precision performance, which is now set at 1/32 of the single precision rate.
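Both the efficiency metric and the double precision ratio can be expressed with the usual peak-throughput formula, assuming every CUDA core retires one fused multiply-add (2 FLOPs) per clock. In the sketch below the GTX-960's 1024 CUDA cores are real, but the 1.0 GHz clock is a placeholder (the actual clock of a factory-overclocked card varies with GPU Boost) and the 1024 GFLOPS "measurement" is hypothetical, not a value from the chart:

```python
def peak_sp_gflops(cuda_cores, clock_ghz):
    # One FMA (2 FLOPs) per core per clock cycle
    return 2 * cuda_cores * clock_ghz


def peak_dp_gflops(sp_gflops, dp_ratio=1 / 32):
    # GM206 runs double precision at 1/32 of the single precision rate
    return sp_gflops * dp_ratio


def efficiency(measured_gflops, peak_gflops):
    # Ratio of measured to theoretical peak performance (the green bars)
    return measured_gflops / peak_gflops


sp = peak_sp_gflops(1024, 1.0)  # placeholder 1.0 GHz clock -> 2048.0 GFLOPS
dp = peak_dp_gflops(sp)         # -> 64.0 GFLOPS
eff = efficiency(1024.0, sp)    # hypothetical measurement -> 0.5
print(sp, dp, eff)
```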
One last observation is the quite good performance of the AMD GPU, even though the example application was developed by NVIDIA and it is reasonable to assume that it is optimized for NVIDIA's own GPUs. This could be one of the main reasons they stopped supporting OpenCL.

Friday, February 13, 2015

Raspberry Pi 2 is here!


Well, it's here! The Raspberry Pi 2 looks very similar to its predecessor, the Raspberry Pi B+, except for two things. First, the rather old ARM11 core is replaced by not one but four Cortex-A7 cores (900MHz). The Cortex-A7 is an upgrade by itself, as benchmarks have shown that it is 1.5-3 times faster than the old CPU core, and four CPU cores make for a decent upgrade within the same power envelope and at the same price ($35). Second, the new Pi features double the amount of RAM, which now reaches 1GB.
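Multiplying the 1.5-3x per-core gain by four cores only gives an upper bound (6-12x) for perfectly parallel work. Amdahl's law makes the dependence on the parallel share of the work explicit; here is a small sketch in which the parallel fraction is an assumed input, not a measured value:

```python
def estimated_speedup(per_core_speedup, cores, parallel_fraction):
    """Amdahl's law, scaled by a per-core speedup over the old ARM11 core.

    parallel_fraction is the share of the work that runs in parallel (0..1).
    """
    serial = 1.0 - parallel_fraction
    return per_core_speedup / (serial + parallel_fraction / cores)


for per_core in (1.5, 3.0):
    fully_parallel = estimated_speedup(per_core, 4, 1.0)
    single_threaded = estimated_speedup(per_core, 4, 0.0)
    print(per_core, fully_parallel, single_threaded)
```

A single-threaded program corresponds to `parallel_fraction = 0`, so it only sees the per-core gain.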
To summarize, it is a great upgrade over the old Pi. I would say that it is the most affordable quad-core computer for applying parallel programming paradigms, e.g. OpenMP.
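In C, a sum over an array can be spread across the four cores with `#pragma omp parallel for schedule(static) reduction(+:total)` (compiled with `gcc -fopenmp`). To illustrate what that does, the Python sketch below mimics it sequentially: the static schedule splits the iterations into four contiguous, near-equal chunks, each "thread" computes a private partial sum, and the reduction combines them. Nothing here actually runs in parallel, and the way leftover iterations are distributed is an assumption, since OpenMP leaves the exact split of `schedule(static)` to the implementation:

```python
def static_chunks(n_iters, n_threads):
    """Split range(n_iters) into n_threads contiguous, near-equal chunks.

    Assumption: the first n_iters % n_threads chunks get one extra iteration.
    """
    base, extra = divmod(n_iters, n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def simulated_parallel_sum(values, n_threads=4):
    # Each "thread" sums its own chunk into a private partial result...
    partials = [sum(values[i] for i in chunk)
                for chunk in static_chunks(len(values), n_threads)]
    # ...and the reduction step combines the partial results.
    return sum(partials)


print([list(c) for c in static_chunks(10, 4)])  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
print(simulated_parallel_sum(list(range(100))))  # 4950
```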
You can compare this nbench output to the nbench results of the original Raspberry Pi. Keep in mind that nbench is a single-threaded benchmark, so it exercises only one of the four cores.



BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :           453.9  :      11.64  :       3.82
STRING SORT         :          36.298  :      16.22  :       2.51
BITFIELD            :      1.1028e+08  :      18.92  :       3.95
FP EMULATION        :          82.381  :      39.53  :       9.12
FOURIER             :          4877.8  :       5.55  :       3.12
ASSIGNMENT          :          7.1713  :      27.29  :       7.08
IDEA                :          1364.7  :      20.87  :       6.20
HUFFMAN             :           663.8  :      18.41  :       5.88
NEURAL NET          :          5.7769  :       9.28  :       3.90
LU DECOMPOSITION    :          224.96  :      11.65  :       8.42
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 20.419
FLOATING-POINT INDEX: 8.434
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU ARMv7 Processor rev 5 (v7l)
L2 Cache            : 
OS                  : Linux 3.18.5-v7+
C compiler          : gcc-4.7
libc                : /lib/arm-linux-gnueabihf/libgcc_s.so.1
MEMORY INDEX        : 4.125
INTEGER INDEX       : 5.970
FLOATING-POINT INDEX: 4.678
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
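The summary indices at the bottom are consistent with geometric means of the per-test indices. The groupings below are inferred from the numbers themselves rather than taken from the nbench source, but they reproduce every printed summary value to within rounding:

```python
from statistics import geometric_mean

# Per-test indices copied from the Raspberry Pi 2 run above
old_index = {  # relative to Pentium 90
    "NUMERIC SORT": 11.64, "STRING SORT": 16.22, "BITFIELD": 18.92,
    "FP EMULATION": 39.53, "FOURIER": 5.55, "ASSIGNMENT": 27.29,
    "IDEA": 20.87, "HUFFMAN": 18.41, "NEURAL NET": 9.28,
    "LU DECOMPOSITION": 11.65,
}
new_index = {  # relative to AMD K6/233
    "NUMERIC SORT": 3.82, "STRING SORT": 2.51, "BITFIELD": 3.95,
    "FP EMULATION": 9.12, "FOURIER": 3.12, "ASSIGNMENT": 7.08,
    "IDEA": 6.20, "HUFFMAN": 5.88, "NEURAL NET": 3.90,
    "LU DECOMPOSITION": 8.42,
}

FP_TESTS = ["FOURIER", "NEURAL NET", "LU DECOMPOSITION"]
# Original BYTEMARK: every non-FP test counts towards the integer index
OLD_INT_TESTS = [t for t in old_index if t not in FP_TESTS]
# Linux indices: the integer tests are split into memory and integer groups
MEMORY_TESTS = ["STRING SORT", "BITFIELD", "ASSIGNMENT"]
NEW_INT_TESTS = ["NUMERIC SORT", "FP EMULATION", "IDEA", "HUFFMAN"]


def summary_index(scores, tests):
    # Geometric mean of the selected per-test indices
    return geometric_mean([scores[t] for t in tests])


checks = [
    ("Original INTEGER INDEX", old_index, OLD_INT_TESTS, 20.419),
    ("Original FLOATING-POINT INDEX", old_index, FP_TESTS, 8.434),
    ("Linux MEMORY INDEX", new_index, MEMORY_TESTS, 4.125),
    ("Linux INTEGER INDEX", new_index, NEW_INT_TESTS, 5.970),
    ("Linux FLOATING-POINT INDEX", new_index, FP_TESTS, 4.678),
]
for label, scores, tests, printed in checks:
    print(f"{label:30s} computed {summary_index(scores, tests):7.3f}  printed {printed}")
```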