ProFit OpenCL


Post by taranu on Nov 27, 2017 17:49:02 GMT 8

It occurred to me that there isn't much information anywhere about how to use OpenCL with ProFit, so I thought I'd give a few tips for the Linux users out there and maybe Aaron can help the Macheads.

First off, OpenCL is a framework for general-purpose computing on graphics cards (GPUs) and other parallel devices, including multi-core CPUs. Most modern graphics cards are designed to render video in parallel and can be hundreds of times faster than CPUs for certain kinds of problems. With ProFit, you may be able to get a speedup of anywhere from 1-10x using your GPU, leaving your CPU free to do something else.

To build ProFit with OpenCL support, you need the OpenCL headers, at least one OpenCL implementation and appropriate device drivers. Depending on your distribution (I use Fedora), this means installing packages like:

opencl and opencl-devel

beignet (for Intel on-board graphics chips)
pocl (a general purpose framework that supports multiple devices)
mesa-libOpenCL (a somewhat out-of-date implementation called Clover)

There are also vendor-specific packages, including:

intel-opencl
rocm-opencl
xorg-x11-drv-nvidia-cuda

To make things even more complicated, you may need drivers for your specific card. For example, I installed the Intel SDK for OpenCL for my onboard GPU, as well as Nvidia's CUDA toolkit and drivers. I tried an older AMD card and found that AMD's Linux support is, uh, not great. There is an open source package for AMD called ROCm (Radeon Open Compute) which I did eventually manage to get working, but it requires a custom kernel and will likely take a day of your life or more.

If all goes well, when you build ProFit you should see these two messages:

- Found OpenCL headers
- Found OpenCL libs

... and you should get some return value from profitOpenCLEnvInfo(). For more useful output, run profitGetOpenCLEnvs(). On my desktop, I get this output:


    name env_i                                  env_name version dev_i                                        dev_name supports_double supports_single
1 opencl     1               Portable Computing Language     2.0     1 pthread-Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz            TRUE            TRUE
2 opencl     2 Experimental OpenCL 2.1 CPU Only Platform     2.1     1         Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz            TRUE            TRUE
3 opencl     3                               NVIDIA CUDA     1.2     1                            GeForce GTX 1060 3GB            TRUE            TRUE
4 opencl     4                           Intel(R) OpenCL     1.2     1                            Intel(R) HD Graphics           FALSE            TRUE
5 opencl     4                           Intel(R) OpenCL     1.2     2         Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz            TRUE            TRUE

I have four different environments, including POCL, CUDA and a couple of Intel OpenCL implementations (beignet and Intel SDK). Miraculously, all of them work, but this isn't always the case - on my laptop, POCL crashed constantly.
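If you only care about devices that can run in double precision, the table can be filtered like any other data frame. Here is a minimal sketch with the table above retyped by hand so that it runs without ProFit; in a real session, `envs` would come from profitGetOpenCLEnvs() instead.

```r
# Device table retyped from the output above; in a real session use:
#   envs <- profitGetOpenCLEnvs()
envs <- data.frame(
  env_i = c(1, 2, 3, 4, 4),
  env_name = c("Portable Computing Language",
               "Experimental OpenCL 2.1 CPU Only Platform",
               "NVIDIA CUDA",
               "Intel(R) OpenCL",
               "Intel(R) OpenCL"),
  dev_i = c(1, 1, 1, 1, 2),
  dev_name = c("pthread-Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz",
               "Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz",
               "GeForce GTX 1060 3GB",
               "Intel(R) HD Graphics",
               "Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz"),
  supports_double = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  supports_single = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Devices that can run in double precision
double_ok <- envs[envs$supports_double, ]

# Devices limited to single precision (here, the Intel on-board GPU)
single_only <- envs[!envs$supports_double & envs$supports_single, ]

print(double_ok[, c("env_i", "dev_i", "dev_name")])
print(single_only[, c("env_i", "dev_i", "dev_name")])
```

The single-precision-only device is the same one that shows NA in the double column of the benchmark output further down.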

To check if all of these environments can be initialized properly, run:

profitBenchmarkResultStripPointers(profitGetOpenCLEnvs(make.envs = TRUE))

Yes, it's a bit of a mouthful, but this should work without crashing and print the same table as above along with some pointers to memory addresses. These are OpenCL environments which you can pass to profitMakeModel, profitConvolve, etc.
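For a single device, a rough sketch of handing an environment to profitMakeModel might look like the following. It assumes that ProFit was built with OpenCL support, that profitOpenCLEnv() takes platform and device indices matching env_i and dev_i in the table, and that profitMakeModel() accepts the result through its openclenv argument. The wrapper name make_model_opencl and the Sersic parameters are made up for illustration, so check ?profitOpenCLEnv and ?profitMakeModel before relying on it.

```r
# Sketch only: hypothetical wrapper around ProFit's OpenCL environment calls.
make_model_opencl <- function(plat_idx = 1, dev_idx = 1, use_double = FALSE) {
  # Bail out quietly if ProFit is missing or was built without OpenCL
  if (!requireNamespace("ProFit", quietly = TRUE) ||
      !ProFit::profitHasOpenCL()) {
    message("ProFit with OpenCL support is not available; skipping")
    return(NULL)
  }

  # Illustrative single-Sersic model; these values are not from the post
  modellist <- list(
    sersic = list(xcen = 100, ycen = 100, mag = 15, re = 14,
                  nser = 3, ang = 46, axrat = 0.4, box = 0)
  )

  # Assumed to correspond to env_i / dev_i in the environment table
  env <- ProFit::profitOpenCLEnv(plat_idx = plat_idx, dev_idx = dev_idx,
                                 use_double = use_double)

  ProFit::profitMakeModel(modellist = modellist, dim = c(200, 200),
                          openclenv = env)
}

# Single precision, so it also works on devices without double support
model <- make_model_opencl(plat_idx = 1, dev_idx = 1, use_double = FALSE)
```

If the chosen environment is one of the flaky ones, expect this to take R down with it rather than fail gracefully, so try it in a fresh session first.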

Lastly, to check that you can actually make galaxies with ProFit, try running the example code in ?profitBenchmark. Here's my output for the last set of images:

    name                                  env_name version                                        dev_name tinms.mean_single tinms.mean_double
1  brute                                      <NA>      NA                                            <NA>                NA               159
2 opencl               Portable Computing Language     2.0 pthread-Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz                80               109
3 opencl Experimental OpenCL 2.1 CPU Only Platform     2.1         Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz                36                55
4 opencl                               NVIDIA CUDA     1.2                            GeForce GTX 1060 3GB                16                29
5 opencl                           Intel(R) OpenCL     1.2                            Intel(R) HD Graphics                25                NA
6 opencl                           Intel(R) OpenCL     1.2         Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz                35                53
    name                                  env_name version                                        dev_name tinms.mean_single tinms.mean_double
1  brute                                      <NA>      NA                                            <NA>                NA               812
2 opencl               Portable Computing Language     2.0 pthread-Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz               618               644
3 opencl Experimental OpenCL 2.1 CPU Only Platform     2.1         Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz               202               343
4 opencl                               NVIDIA CUDA     1.2                            GeForce GTX 1060 3GB                27                46
5 opencl                           Intel(R) OpenCL     1.2                            Intel(R) HD Graphics               470                NA
6 opencl                           Intel(R) OpenCL     1.2         Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz               184               379
7    fft                                      <NA>      NA                                            <NA>                NA               164
[1] "Diff. FFTconv range: -2.3161e-23 7.2378e-24"
[1] "Rel. diff. FFTconv range: -1.0485e-11 7.5754e-12"
[1] "Diff. FFTWconv range: -1.6544e-23 6.6174e-24"
[1] "Rel. diff. FFTWconv range: -1.5217e-11 1.0311e-11"
[1] "Bruteconv 7.950e+02 ms, FFTconv 3.560e+02 ms, FFTWconv 2.040e+02 ms"
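To turn timings like these into speedups, divide the brute-force time by each of the other times. Here is a short sketch using the double-precision column (tinms.mean_double) of the second table above, retyped by hand; the short row labels are my own shorthand rather than anything ProFit prints.

```r
# Double-precision timings in ms from the second table above.
# The Intel HD Graphics row is omitted because it has no double support (NA).
tinms_double <- c(
  brute                  = 812,
  pocl_cpu               = 644,
  intel_experimental_cpu = 343,
  nvidia_gtx1060         = 46,
  intel_opencl_cpu       = 379,
  fft                    = 164
)

# Speedup relative to the brute-force row (brute itself comes out as 1)
speedup <- tinms_double[["brute"]] / tinms_double

# Fastest first, rounded for readability
print(round(sort(speedup, decreasing = TRUE), 1))
```

The same division works for the single-precision column, but the brute row has no single-precision time there, so you would have to pick a different baseline.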

If you do manage to get OpenCL running on a powerful graphics card, please post your benchmark results here for comparison. I'd be curious to see how quickly it can run in practice.
