- do follow best practices from nividia:0
- Critical best practices:
- find those loops
- memory coalesce
with GPU:
[java] Score 2134 0.0000s 0.0000s 0.0570s 0.0084s 17.9790s
[java] Grow 11238 0.0000s 0.0000s 0.1950s 0.0040s 45.2970s
[java] Scoring-Cuda 2124 0.0010s 0.0000s 0.0020s 0.0010s 2.1580s
without GPU (64 bit):
[java] Score 2134 0.0000s 0.0000s 0.4000s 0.0488s 104.1230s
[java] Grow 11238 0.0000s 0.0000s 0.1980s 0.0032s 36.0170s
without GPU (32 bit):
[java] Score 9607 0.0000s 0.0000s 0.2150s 0.0343s 329.9720s
[java] Prune 57599 0.0000s 0.0000s 0.0910s 0.0005s 30.7650s
[java] Grow 57642 0.0000s 0.0000s 0.7900s 0.0059s 339.4380s
Quite amazing, isn't it? Acoustic scoring time reduced from 104s -> 17s. Also, note the GPU only took 2s.
edit: If you must know how much was shipped to GPU: that's 15 million loops per 0.1s of audio