Thursday, April 30, 2020

Render Buddha fractal on decommissioned P106-90

I think I have found my answer for how to get into parallel data computing on the great Pascal architecture within my small budget of 30 USD. The Zotac P106-90 is out of the mining game for now, so I could buy one and adapt it for GPU computation. This card looks like a Zotac GTX 1060, but it has no video outputs at all. My first test was on Windows 10 with the latest CUDA 10.2, and it produced a great Buddha fractal for me. Rendering a 48 MB Buddhabrot image takes 6 minutes.
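The Buddhabrot algorithm behind the render is simple to state: sample random points c, run the Mandelbrot iteration, and for every point that escapes, replay its trajectory and increment each histogram cell it passes through. Here is a minimal CPU sketch in plain Java — my own illustration under those assumptions, not code from JCudaFractals; the class name and constants are hypothetical:

```java
import java.util.Random;

public class BuddhabrotSketch {
    static final int SIZE = 256;       // histogram resolution (hypothetical)
    static final int MAX_ITER = 500;   // escape-iteration limit (hypothetical)

    static int[][] render(int samples, long seed) {
        int[][] histogram = new int[SIZE][SIZE];
        Random rnd = new Random(seed);
        double[] trajX = new double[MAX_ITER];
        double[] trajY = new double[MAX_ITER];
        for (int s = 0; s < samples; s++) {
            // sample a random c in the [-2, 2] x [-2, 2] square
            double cRe = rnd.nextDouble() * 4 - 2;
            double cIm = rnd.nextDouble() * 4 - 2;
            double zRe = 0, zIm = 0;
            int n = 0;
            // standard Mandelbrot iteration z = z^2 + c, recording the orbit
            while (n < MAX_ITER && zRe * zRe + zIm * zIm <= 4) {
                double t = zRe * zRe - zIm * zIm + cRe;
                zIm = 2 * zRe * zIm + cIm;
                zRe = t;
                trajX[n] = zRe;
                trajY[n] = zIm;
                n++;
            }
            if (n < MAX_ITER) {  // escaped: replay the orbit into the histogram
                for (int i = 0; i < n; i++) {
                    int px = (int) ((trajX[i] + 2) / 4 * SIZE);
                    int py = (int) ((trajY[i] + 2) / 4 * SIZE);
                    if (px >= 0 && px < SIZE && py >= 0 && py < SIZE) {
                        histogram[py][px]++;  // bounds guard avoids bad writes
                    }
                }
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        int[][] h = render(100_000, 42L);
        long total = 0;
        for (int[] row : h) for (int v : row) total += v;
        System.out.println("total trajectory hits: " + total);
    }
}
```

On the GPU the outer sample loop is what gets parallelized: each CUDA thread handles its own batch of random points and accumulates into a shared histogram.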

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.2, V10.2.89



I used JCudaFractals, which is written in Java with the JCuda library.

mvn compile
mvn exec:java -Dexec.mainClass="net.marvk.jcudafractals.Main"


Sometimes I get CUDA_ERROR_ILLEGAL_ADDRESS errors:

jcuda.CudaException: CUDA_ERROR_ILLEGAL_ADDRESS
    at jcuda.driver.JCudaDriver.checkResult (JCudaDriver.java:359)
    at jcuda.driver.JCudaDriver.cuCtxSynchronize (JCudaDriver.java:2139)
    at net.marvk.jcudafractals.fractal.Buddhabrot2.buddhabrot (Buddhabrot2.java:95)
    at net.marvk.jcudafractals.controller.Controller.lambda$new$0 (Controller.java:25)
    at java.util.stream.IntPipeline$1$1.accept (IntPipeline.java:180)
    at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining (Streams.java:104)
    at java.util.Spliterator$OfInt.forEachRemaining (Spliterator.java:699)
    at java.util.stream.AbstractPipeline.copyInto (AbstractPipeline.java:484)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto (AbstractPipeline.java:474)
    at java.util.stream.AbstractPipeline.evaluate (AbstractPipeline.java:550)
    at java.util.stream.AbstractPipeline.evaluateToArrayNode (AbstractPipeline.java:260)
    at java.util.stream.ReferencePipeline.toArray (ReferencePipeline.java:517)
    at net.marvk.jcudafractals.controller.Controller.<init> (Controller.java:26)
    at net.marvk.jcudafractals.Main.main (Main.java:10)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:566)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
    at java.lang.Thread.run (Thread.java:834)
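CUDA_ERROR_ILLEGAL_ADDRESS usually means some thread read or wrote outside a device buffer. Because cuCtxSynchronize reports errors left over from earlier asynchronous kernel launches, the stack trace points at the sync call rather than the faulting kernel. The standard defense is an index guard at the top of the kernel; here is that pattern sketched in plain Java, simulating the thread grid sequentially (the method name and numbers are my own illustration, not JCudaFractals code):

```java
public class GuardSketch {
    // Grids are usually launched with more threads than elements (rounded up
    // to a whole number of blocks), so the last block has surplus threads.
    // Without the guard, those threads would write past the end of the buffer.
    static void guardedFill(int[] buffer, int blockDim, int gridDim) {
        for (int blockIdx = 0; blockIdx < gridDim; blockIdx++) {
            for (int threadIdx = 0; threadIdx < blockDim; threadIdx++) {
                int i = blockIdx * blockDim + threadIdx;  // global thread index
                if (i >= buffer.length) continue;         // the guard
                buffer[i] = i;
            }
        }
    }

    public static void main(String[] args) {
        int[] buf = new int[10];
        guardedFill(buf, 4, 3);  // 12 simulated threads, only 10 slots
        System.out.println("last element: " + buf[9]);
    }
}
```

In JCuda it also helps to call JCudaDriver.setExceptionsEnabled(true) and synchronize right after each launch, so the error surfaces near the kernel that caused it.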

Overall, this cheap GPU handles computation just as expected.

C:\Users\Eugene>"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"
Thu Apr 30 22:49:50 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 441.22       Driver Version: 441.22       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  P106-090            TCC  | 00000000:02:00.0 Off |                  N/A |
| 47%   49C    P0    55W /  75W |    674MiB /  3012MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1524      C   ...v7\win\64bit\Core_22.fah\FahCore_22.exe   289MiB |
|    0      2892      C   D:\CUDA-Z-0.10.251-32bit.exe                  73MiB |
|    0      3360      C   ...gram Files\Java\jdk-11.0.7\bin\java.exe   270MiB |
+-----------------------------------------------------------------------------+

I also checked the performance numbers in the CUDA-Z program. They show that transfers between host and device are slow, but device-to-device copies within the card's own memory are very fast.

CUDA-Z Report
=============
Version: 0.10.251 32 bit http://cuda-z.sf.net/
OS Version: Windows AMD64 6.2.9200
Driver Version: 441.22 (TCC)
Driver Dll Version: 10.20 (26.21.14.4122)
Runtime Dll Version: 6.50
Core Information
----------------
 Name: P106-090
 Compute Capability: 6.1
 Clock Rate: 1531 MHz
 PCI Location: 0:2:0
 Multiprocessors: 5
 Threads Per Multiproc.: 2048
 Warp Size: 32
 Regs Per Block: 65536
 Threads Per Block: 1024
 Threads Dimensions: 1024 x 1024 x 64
 Grid Dimensions: 2147483647 x 65535 x 65535
 Watchdog Enabled: No
 Integrated GPU: No
 Concurrent Kernels: Yes
 Compute Mode: Default
 Stream Priorities: Yes
Memory Information
------------------
 Total Global: 3012.12 MiB
 Bus Width: 192 bits
 Clock Rate: 4004 MHz
 Error Correction: No
 L2 Cache Size: 48 KiB
 Shared Per Block: 48 KiB
 Pitch: 2048 MiB
 Total Constant: 64 KiB
 Texture Alignment: 512 B
 Texture 1D Size: 131072
 Texture 2D Size: 131072 x 65536
 Texture 3D Size: 16384 x 16384 x 16384
 GPU Overlap: Yes
 Map Host Memory: Yes
 Unified Addressing: No
 Async Engine: Yes, Bidirectional
Performance Information
-----------------------
Memory Copy
 Host Pinned to Device: 738.929 MiB/s
 Host Pageable to Device: 712.643 MiB/s
 Device to Host Pinned: 776.933 MiB/s
 Device to Host Pageable: 724.519 MiB/s
 Device to Device: 35.1506 GiB/s
GPU Core Performance
 Single-precision Float: 2149.6 Gflop/s
 Double-precision Float: 36.9897 Gflop/s
 64-bit Integer: 92.5176 Giop/s
 32-bit Integer: 743.671 Giop/s
 24-bit Integer: 558.109 Giop/s
Generated: Thu Apr 30 23:18:43 2020
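Some back-of-the-envelope arithmetic on those CUDA-Z numbers (my own estimate, not output from the tool): theoretical memory bandwidth is the effective memory clock times the bus width, and a device-to-device copy both reads and writes each byte, so the reported copy rate should be doubled before comparing against it:

```java
public class BandwidthCheck {
    // theoretical bandwidth = effective clock * bus width
    static double theoreticalGBs(double clockMHz, int busBits) {
        return clockMHz * 1e6 * (busBits / 8.0) / 1e9;
    }

    // a copy reads and writes every byte, so traffic is twice the copy rate;
    // also convert GiB/s to GB/s
    static double copyTrafficGBs(double copyGiBs) {
        return 2 * copyGiBs * 1.073741824;
    }

    public static void main(String[] args) {
        // "Clock Rate: 4004 MHz", "Bus Width: 192 bits" from the report
        System.out.printf("theoretical: %.1f GB/s%n", theoreticalGBs(4004, 192));
        // "Device to Device: 35.1506 GiB/s" from the report
        System.out.printf("D2D traffic: %.1f GB/s%n", copyTrafficGBs(35.1506));
    }
}
```

That works out to roughly 96 GB/s theoretical versus about 75 GB/s measured, which is a reasonable real-world fraction. The ~740 MiB/s host transfers are about what a narrow PCIe 1.x link delivers, and mining-oriented cards are often limited this way, though I have not verified the link width on this board.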

