I think I found an answer to how to get into parallel data computing on the great Pascal architecture with my small budget of 30 USD. The Zotac P106-90 is out of the mining game now, so I could buy one and adapt it for GPU computation. This card looks like a Zotac GTX 1060, but it has no video outputs at all. My first test was on Windows 10: the latest CUDA 10.2 produced a great Buddhabrot fractal for me. Rendering the 48 MB fractal image takes 6 minutes.
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.2, V10.2.89
I used JCudaFractals, written in Java with the JCuda library.
mvn compile
mvn exec:java -Dexec.mainClass="net.marvk.jcudafractals.Main"
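For context, a Buddhabrot differs from the plain Mandelbrot set: instead of coloring a pixel by its own escape count, every escaping orbit's visited points are accumulated into a histogram, which is what makes it so expensive and so GPU-friendly. A minimal CPU sketch of that inner loop in plain Java (my own illustration, not code from the JCudaFractals source):

```java
import java.util.Arrays;

public class BuddhabrotSketch {
    // Iterate z -> z^2 + c; if the orbit escapes (|z|^2 > 4) within maxIter
    // steps, return the visited points; return null if c never escapes
    // (i.e. c is inside the Mandelbrot set and contributes nothing).
    static double[][] escapingOrbit(double cRe, double cIm, int maxIter) {
        double[][] orbit = new double[maxIter][2];
        double zRe = 0, zIm = 0;
        for (int i = 0; i < maxIter; i++) {
            double re = zRe * zRe - zIm * zIm + cRe;
            double im = 2 * zRe * zIm + cIm;
            zRe = re;
            zIm = im;
            orbit[i][0] = zRe;
            orbit[i][1] = zIm;
            if (zRe * zRe + zIm * zIm > 4.0) {
                return Arrays.copyOf(orbit, i + 1); // escaped: keep the orbit
            }
        }
        return null; // never escaped: inside the set, discarded
    }

    public static void main(String[] args) {
        // c = 2 escapes almost immediately; c = 0 never escapes.
        double[][] orbit = escapingOrbit(2.0, 0.0, 100);
        System.out.println("orbit length: " + orbit.length);
        System.out.println("c = 0 escapes: " + (escapingOrbit(0.0, 0.0, 100) != null));
    }
}
```

The full renderer samples millions of random `c` values, binning each returned orbit point into the image histogram — that embarrassingly parallel sampling is what the CUDA kernel does per thread.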
Sometimes I got CUDA_ERROR_ILLEGAL_ADDRESS errors:
jcuda.CudaException: CUDA_ERROR_ILLEGAL_ADDRESS
at jcuda.driver.JCudaDriver.checkResult (JCudaDriver.java:359)
at jcuda.driver.JCudaDriver.cuCtxSynchronize (JCudaDriver.java:2139)
at net.marvk.jcudafractals.fractal.Buddhabrot2.buddhabrot (Buddhabrot2.java:95)
at net.marvk.jcudafractals.controller.Controller.lambda$new$0 (Controller.java:25)
at java.util.stream.IntPipeline$1$1.accept (IntPipeline.java:180)
at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining (Streams.java:104)
at java.util.Spliterator$OfInt.forEachRemaining (Spliterator.java:699)
at java.util.stream.AbstractPipeline.copyInto (AbstractPipeline.java:484)
at java.util.stream.AbstractPipeline.wrapAndCopyInto (AbstractPipeline.java:474)
at java.util.stream.AbstractPipeline.evaluate (AbstractPipeline.java:550)
at java.util.stream.AbstractPipeline.evaluateToArrayNode (AbstractPipeline.java:260)
at java.util.stream.ReferencePipeline.toArray (ReferencePipeline.java:517)
at net.marvk.jcudafractals.controller.Controller.<init> (Controller.java:26)
at net.marvk.jcudafractals.Main.main (Main.java:10)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:566)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:834)
Overall, this cheap GPU performs calculations as expected.
C:\Users\Eugene>"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"
Thu Apr 30 22:49:50 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 441.22 Driver Version: 441.22 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 P106-090 TCC | 00000000:02:00.0 Off | N/A |
| 47% 49C P0 55W / 75W | 674MiB / 3012MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1524 C ...v7\win\64bit\Core_22.fah\FahCore_22.exe 289MiB |
| 0 2892 C D:\CUDA-Z-0.10.251-32bit.exe 73MiB |
| 0 3360 C ...gram Files\Java\jdk-11.0.7\bin\java.exe 270MiB |
+-----------------------------------------------------------------------------+
I also checked the performance parameters in the CUDA-Z program, which show that transferring data between host and device is slow, but copying data within the device itself is very fast.
CUDA-Z Report
=============
Version: 0.10.251 32 bit http://cuda-z.sf.net/
OS Version: Windows AMD64 6.2.9200
Driver Version: 441.22 (TCC)
Driver Dll Version: 10.20 (26.21.14.4122)
Runtime Dll Version: 6.50
Core Information
----------------
Name: P106-090
Compute Capability: 6.1
Clock Rate: 1531 MHz
PCI Location: 0:2:0
Multiprocessors: 5
Threads Per Multiproc.: 2048
Warp Size: 32
Regs Per Block: 65536
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 2147483647 x 65535 x 65535
Watchdog Enabled: No
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Stream Priorities: Yes
Memory Information
------------------
Total Global: 3012.12 MiB
Bus Width: 192 bits
Clock Rate: 4004 MHz
Error Correction: No
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 131072
Texture 2D Size: 131072 x 65536
Texture 3D Size: 16384 x 16384 x 16384
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: No
Async Engine: Yes, Bidirectional
Performance Information
-----------------------
Memory Copy
Host Pinned to Device: 738.929 MiB/s
Host Pageable to Device: 712.643 MiB/s
Device to Host Pinned: 776.933 MiB/s
Device to Host Pageable: 724.519 MiB/s
Device to Device: 35.1506 GiB/s
GPU Core Performance
Single-precision Float: 2149.6 Gflop/s
Double-precision Float: 36.9897 Gflop/s
64-bit Integer: 92.5176 Giop/s
32-bit Integer: 743.671 Giop/s
24-bit Integer: 558.109 Giop/s
Generated: Thu Apr 30 23:18:43 2020
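The transfer numbers above are the giveaway: mining cards like the P106-90 typically run on a narrow PCIe link, so host-device copies crawl at ~740 MiB/s while on-card copies reach ~35 GiB/s. Some rough arithmetic for the 48 MB result image, using the rates from the report (a back-of-envelope estimate, not a measurement):

```java
public class TransferEstimate {
    public static void main(String[] args) {
        double imageMiB = 48.0;                      // rendered Buddhabrot image
        double hostToDeviceMiBs = 738.9;             // pinned host -> device (CUDA-Z)
        double deviceToDeviceMiBs = 35.15 * 1024.0;  // on-card copy (CUDA-Z)

        double busSeconds = imageMiB / hostToDeviceMiBs;      // over the PCIe link
        double onCardSeconds = imageMiB / deviceToDeviceMiBs; // within the GPU

        // The bus copy costs tens of milliseconds -- negligible next to a
        // 6-minute render. The slow link only hurts workloads that shuttle
        // data back and forth constantly, not one bulk download of a result.
        System.out.printf("PCIe copy: %.1f ms, on-card copy: %.2f ms%n",
                busSeconds * 1000, onCardSeconds * 1000);
    }
}
```

So for compute-heavy, transfer-light jobs like fractal rendering, the crippled bus of a mining card barely matters.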