Post by taranu on Nov 27, 2017 17:49:02 GMT 8
It occurred to me that there isn't much information anywhere about how to use OpenCL with ProFit, so I thought I'd give a few tips for the Linux users out there and maybe Aaron can help the Macheads.
First off, OpenCL is a framework that is designed to allow general-purpose computing on graphics cards (GPUs). Most modern graphics cards are specially designed to render video in parallel and can be hundreds of times faster than CPUs for certain kinds of problems. With ProFit, you may be able to get a speedup of anywhere from 1-10x using your GPU, leaving your CPU free to do something else.
To build ProFit with OpenCL support, you need OpenCL headers, implementations and appropriate device drivers. Depending on your distribution (I use Fedora), this means installing packages like:
opencl and opencl-devel
beignet (for Intel on-board graphics chips)
pocl (a general purpose framework that supports multiple devices)
mesa-libOpenCL (a somewhat out-of-date implementation called Clover)
There are also vendor-specific packages, including:
intel-opencl
rocm-opencl
xorg-x11-drv-nvidia-cuda
To make things even more complicated, you may need drivers for your specific card. For example, I installed the Intel SDK for OpenCL for my onboard GPU, as well as Nvidia's CUDA toolkit and drivers. I tried an older AMD card and found that AMD's Linux support is, uh, not great. There is an open source package for AMD called ROCm (Radeon Open Compute) which I did eventually manage to get working, but it requires a custom kernel and will likely take a day of your life or more.
If all goes well, when you build ProFit you should see these two messages:
- Found OpenCL headers
- Found OpenCL libs
... and you should get some return value from profitOpenCLEnvInfo(). For more useful output, run profitGetOpenCLEnvs(). On my desktop, I get this output:
   name    env_i  env_name                                   version  dev_i  dev_name                                         supports_double  supports_single
1  opencl  1      Portable Computing Language                2.0      1      pthread-Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz  TRUE             TRUE
2  opencl  2      Experimental OpenCL 2.1 CPU Only Platform  2.1      1      Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz          TRUE             TRUE
3  opencl  3      NVIDIA CUDA                                1.2      1      GeForce GTX 1060 3GB                             TRUE             TRUE
4  opencl  4      Intel(R) OpenCL                            1.2      1      Intel(R) HD Graphics                             FALSE            TRUE
5  opencl  4      Intel(R) OpenCL                            1.2      2      Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz          TRUE             TRUE
I have four different environments, including POCL, CUDA and a couple of Intel OpenCL implementations (beignet and Intel SDK). Miraculously, all of them work, but this isn't always the case - on my laptop, POCL crashed constantly.
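If you want a quick sanity check from a fresh R session, something like the following may help. It is only a sketch: it assumes the ProFit package is installed, and it uses profitHasOpenCL() to ask whether the build picked up OpenCL before listing the environments. The variable opencl_ok is just a name made up for this example.

```r
# Sketch: check whether the installed ProFit build can see any OpenCL devices.
# Assumes ProFit is installed; if it is not, nothing from ProFit is called.
if (requireNamespace("ProFit", quietly = TRUE) && ProFit::profitHasOpenCL()) {
  # Same call as in the post: one row per platform/device pair.
  envs <- ProFit::profitGetOpenCLEnvs()
  print(envs)
  opencl_ok <- TRUE
} else {
  message("ProFit is missing or was built without OpenCL support")
  opencl_ok <- FALSE
}
```

If opencl_ok ends up FALSE even though the packages are installed, the build messages above are the first thing to recheck.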
To check if all of these environments can be initialized properly, run:
profitBenchmarkResultStripPointers(profitGetOpenCLEnvs(make.envs = TRUE))
Yes, it's a bit of a mouthful, but this should work without crashing and print the same table as above along with some pointers to memory addresses. These are OpenCL environments which you can pass to profitMakeModel, profitConvolve, etc.
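As a rough illustration of passing an environment to profitMakeModel, here is a minimal sketch. The Sersic parameter values are arbitrary, and the assumption that plat_idx and dev_idx in profitOpenCLEnv() line up with env_i and dev_i in the table is mine, so check it against your own output before relying on it.

```r
# One Sersic component; all parameter values are arbitrary illustration values.
model <- list(
  sersic = list(
    xcen = 50, ycen = 50, mag = 15, re = 10,
    nser = 4, ang = 30, axrat = 0.5, box = 0
  )
)

# Only run when ProFit is installed and was built with OpenCL support.
if (requireNamespace("ProFit", quietly = TRUE) && ProFit::profitHasOpenCL()) {
  # Assumption: plat_idx and dev_idx correspond to env_i and dev_i in the table.
  # use_double = FALSE asks for single precision, which every device listed supports.
  env <- ProFit::profitOpenCLEnv(plat_idx = 1, dev_idx = 1, use_double = FALSE)

  img_ocl <- ProFit::profitMakeModel(model, dim = c(100, 100), openclenv = env)
  img_cpu <- ProFit::profitMakeModel(model, dim = c(100, 100))  # CPU reference

  # Largest pixel difference between the OpenCL and CPU images.
  print(max(abs(img_ocl$z - img_cpu$z)))
}
```

Comparing against the CPU image is a cheap way to catch a device that initializes fine but returns garbage. Devices with supports_double = FALSE (the Intel HD Graphics above) can only be used with use_double = FALSE.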
Lastly, to check that you can actually make galaxies and ProFit, try running the example code in ?profitBenchmark. Here's my output for the last set of images:
   name    env_name                                   version  dev_name                                         tinms.mean_single  tinms.mean_double
1  brute   <NA>                                       NA       <NA>                                             NA                 159
2  opencl  Portable Computing Language                2.0      pthread-Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz  80                 109
3  opencl  Experimental OpenCL 2.1 CPU Only Platform  2.1      Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz          36                 55
4  opencl  NVIDIA CUDA                                1.2      GeForce GTX 1060 3GB                             16                 29
5  opencl  Intel(R) OpenCL                            1.2      Intel(R) HD Graphics                             25                 NA
6  opencl  Intel(R) OpenCL                            1.2      Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz          35                 53

   name    env_name                                   version  dev_name                                         tinms.mean_single  tinms.mean_double
1  brute   <NA>                                       NA       <NA>                                             NA                 812
2  opencl  Portable Computing Language                2.0      pthread-Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz  618                644
3  opencl  Experimental OpenCL 2.1 CPU Only Platform  2.1      Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz          202                343
4  opencl  NVIDIA CUDA                                1.2      GeForce GTX 1060 3GB                             27                 46
5  opencl  Intel(R) OpenCL                            1.2      Intel(R) HD Graphics                             470                NA
6  opencl  Intel(R) OpenCL                            1.2      Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz          184                379
7  fft     <NA>                                       NA       <NA>                                             NA                 164
[1] "Diff. FFTconv range: -2.3161e-23 7.2378e-24"
[1] "Rel. diff. FFTconv range: -1.0485e-11 7.5754e-12"
[1] "Diff. FFTWconv range: -1.6544e-23 6.6174e-24"
[1] "Rel. diff. FFTWconv range: -1.5217e-11 1.0311e-11"
[1] "Bruteconv 7.950e+02 ms, FFTconv 3.560e+02 ms, FFTWconv 2.040e+02 ms"
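To put those timings in terms of speedup, here is a small sketch that divides the brute-force time by each of the other double-precision times. The numbers are copied from the tinms.mean_double columns of the two tables; the vector names (first_ms, second_ms) and the short device labels are made up for this example.

```r
# Double-precision mean times in ms, copied from the two benchmark tables.
first_ms <- c(brute = 159, pocl = 109, intel_exp_cpu = 55,
              cuda_gtx1060 = 29, intel_cpu = 53)
second_ms <- c(brute = 812, pocl = 644, intel_exp_cpu = 343,
               cuda_gtx1060 = 46, intel_cpu = 379, fft = 164)

# Speedup relative to the brute-force row, rounded to one decimal place.
speedup <- function(ms) round(ms[["brute"]] / ms, 1)

print(speedup(first_ms))
print(speedup(second_ms))
```

For the GTX 1060 this works out to roughly 5.5x in the first table and 17.7x in the second, both relative to the brute-force double-precision time on the same machine.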
If you do manage to get OpenCL running on a powerful graphics card, please post your benchmark results here for comparison. I'd be curious to see how quickly it can run in practice.