I think I found an answer to how to get into parallel data computing on the great Pascal architecture with my small budget of 30 USD. The Zotac P106-90 is out of the mining game now, so I could buy one and adapt it for GPU computation. This card looks like a Zotac GTX 1060, but it has no video outputs at all. My first test was on Windows 10: the latest CUDA 10.2 produced a great Buddhabrot fractal for me. Rendering the 48 MB fractal image takes 6 minutes.
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.2, V10.2.89
I used JCudaFractals, written in Java with the JCuda library.
mvn compile
mvn exec:java -Dexec.mainClass="net.marvk.jcudafractals.Main"
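For context, a Buddhabrot differs from the plain Mandelbrot set: instead of coloring a pixel by its own escape count, every escaping orbit's visited points are accumulated into a histogram, which is what makes it so expensive and so GPU-friendly. A minimal CPU sketch of that inner loop in plain Java (my own illustration, not code from the JCudaFractals source):

```java
import java.util.Arrays;

public class BuddhabrotSketch {
    // Iterate z -> z^2 + c; if the orbit escapes (|z|^2 > 4) within maxIter
    // steps, return the visited points; return null if c never escapes
    // (i.e. c is inside the Mandelbrot set and contributes nothing).
    static double[][] escapingOrbit(double cRe, double cIm, int maxIter) {
        double[][] orbit = new double[maxIter][2];
        double zRe = 0, zIm = 0;
        for (int i = 0; i < maxIter; i++) {
            double re = zRe * zRe - zIm * zIm + cRe;
            double im = 2 * zRe * zIm + cIm;
            zRe = re;
            zIm = im;
            orbit[i][0] = zRe;
            orbit[i][1] = zIm;
            if (zRe * zRe + zIm * zIm > 4.0) {
                return Arrays.copyOf(orbit, i + 1); // escaped: keep the orbit
            }
        }
        return null; // never escaped: inside the set, discarded
    }

    public static void main(String[] args) {
        // c = 2 escapes almost immediately; c = 0 never escapes.
        double[][] orbit = escapingOrbit(2.0, 0.0, 100);
        System.out.println("orbit length: " + orbit.length);
        System.out.println("c = 0 escapes: " + (escapingOrbit(0.0, 0.0, 100) != null));
    }
}
```

The full renderer samples millions of random `c` values, binning each returned orbit point into the image histogram — that embarrassingly parallel sampling is what the CUDA kernel does per thread.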
Sometimes I got CUDA_ERROR_ILLEGAL_ADDRESS errors:
jcuda.CudaException: CUDA_ERROR_ILLEGAL_ADDRESS
at jcuda.driver.JCudaDriver.checkResult (JCudaDriver.java:359)
at jcuda.driver.JCudaDriver.cuCtxSynchronize (JCudaDriver.java:2139)
at net.marvk.jcudafractals.fractal.Buddhabrot2.buddhabrot (Buddhabrot2.java:95)
at net.marvk.jcudafractals.controller.Controller.lambda$new$0 (Controller.java:25)
at java.util.stream.IntPipeline$1$1.accept (IntPipeline.java:180)
at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining (Streams.java:104)
at java.util.Spliterator$OfInt.forEachRemaining (Spliterator.java:699)
at java.util.stream.AbstractPipeline.copyInto (AbstractPipeline.java:484)
at java.util.stream.AbstractPipeline.wrapAndCopyInto (AbstractPipeline.java:474)
at java.util.stream.AbstractPipeline.evaluate (AbstractPipeline.java:550)
at java.util.stream.AbstractPipeline.evaluateToArrayNode (AbstractPipeline.java:260)
at java.util.stream.ReferencePipeline.toArray (ReferencePipeline.java:517)
at net.marvk.jcudafractals.controller.Controller.<init> (Controller.java:26)
at net.marvk.jcudafractals.Main.main (Main.java:10)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:566)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:834)
Overall, this cheap GPU performs calculations as expected.
C:\Users\Eugene>"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"
Thu Apr 30 22:49:50 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 441.22 Driver Version: 441.22 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 P106-090 TCC | 00000000:02:00.0 Off | N/A |
| 47% 49C P0 55W / 75W | 674MiB / 3012MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1524 C ...v7\win\64bit\Core_22.fah\FahCore_22.exe 289MiB |
| 0 2892 C D:\CUDA-Z-0.10.251-32bit.exe 73MiB |
| 0 3360 C ...gram Files\Java\jdk-11.0.7\bin\java.exe 270MiB |
+-----------------------------------------------------------------------------+
I also checked the performance parameters in the CUDA-Z program, which show that transferring data between host and device is slow, but copying data within the device itself is very fast.
CUDA-Z Report
=============
Version: 0.10.251 32 bit http://cuda-z.sf.net/
OS Version: Windows AMD64 6.2.9200
Driver Version: 441.22 (TCC)
Driver Dll Version: 10.20 (26.21.14.4122)
Runtime Dll Version: 6.50
Core Information
----------------
Name: P106-090
Compute Capability: 6.1
Clock Rate: 1531 MHz
PCI Location: 0:2:0
Multiprocessors: 5
Threads Per Multiproc.: 2048
Warp Size: 32
Regs Per Block: 65536
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 2147483647 x 65535 x 65535
Watchdog Enabled: No
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Stream Priorities: Yes
Memory Information
------------------
Total Global: 3012.12 MiB
Bus Width: 192 bits
Clock Rate: 4004 MHz
Error Correction: No
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 131072
Texture 2D Size: 131072 x 65536
Texture 3D Size: 16384 x 16384 x 16384
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: No
Async Engine: Yes, Bidirectional
Performance Information
-----------------------
Memory Copy
Host Pinned to Device: 738.929 MiB/s
Host Pageable to Device: 712.643 MiB/s
Device to Host Pinned: 776.933 MiB/s
Device to Host Pageable: 724.519 MiB/s
Device to Device: 35.1506 GiB/s
GPU Core Performance
Single-precision Float: 2149.6 Gflop/s
Double-precision Float: 36.9897 Gflop/s
64-bit Integer: 92.5176 Giop/s
32-bit Integer: 743.671 Giop/s
24-bit Integer: 558.109 Giop/s
Generated: Thu Apr 30 23:18:43 2020
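The transfer numbers above are the giveaway: mining cards like the P106-90 typically run on a narrow PCIe link, so host-device copies crawl at ~740 MiB/s while on-card copies reach ~35 GiB/s. Some rough arithmetic for the 48 MB result image, using the rates from the report (a back-of-envelope estimate, not a measurement):

```java
public class TransferEstimate {
    public static void main(String[] args) {
        double imageMiB = 48.0;                      // rendered Buddhabrot image
        double hostToDeviceMiBs = 738.9;             // pinned host -> device (CUDA-Z)
        double deviceToDeviceMiBs = 35.15 * 1024.0;  // on-card copy (CUDA-Z)

        double busSeconds = imageMiB / hostToDeviceMiBs;      // over the PCIe link
        double onCardSeconds = imageMiB / deviceToDeviceMiBs; // within the GPU

        // The bus copy costs tens of milliseconds -- negligible next to a
        // 6-minute render. The slow link only hurts workloads that shuttle
        // data back and forth constantly, not one bulk download of a result.
        System.out.printf("PCIe copy: %.1f ms, on-card copy: %.2f ms%n",
                busSeconds * 1000, onCardSeconds * 1000);
    }
}
```

So for compute-heavy, transfer-light jobs like fractal rendering, the crippled bus of a mining card barely matters.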