A GPU has more processing elements per compute unit. Intel recommends a workgroup size of 64-128. Often 128 is the minimum needed to get good performance on a GPU. On NVIDIA Fermi, the workgroup size must be at least 192 for full utilization of the cores. If a kernel uses a lot of registers or local memory, it may be necessary (or optimal) to use smaller workgroup sizes.

The RT cores are used for just a portion of the rendering (intersecting rays with scene geometry); the CUDA cores are still used for shading and all other calculations. Keeping in mind that this is the very first hardware generation that supports ray tracing, I'm pretty sure the next generations will have better performance.
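Returning to the workgroup-size guidance above: the point about registers can be made concrete with a rough sketch. The helper names, the candidate sizes, the "at least two resident workgroups" rule, and the hardware limits passed in below are all assumptions for illustration, not values from any vendor tool; real limits should be queried from the device.

```python
def max_resident_work_items(regs_per_item, regs_per_cu, hw_max_items):
    """Upper bound on work-items resident on one compute unit.

    Hypothetical model: only the register file and a hardware cap
    on resident work-items are considered (local memory is ignored).
    """
    if regs_per_item <= 0:
        raise ValueError("regs_per_item must be positive")
    return min(hw_max_items, regs_per_cu // regs_per_item)


def pick_workgroup_size(regs_per_item, regs_per_cu, hw_max_items,
                        candidates=(256, 192, 128, 64), min_groups=2):
    """Largest candidate size that still lets `min_groups` workgroups
    be resident at once, or None if no candidate fits.

    The min_groups rule is an illustrative heuristic, not a vendor rule.
    """
    limit = max_resident_work_items(regs_per_item, regs_per_cu, hw_max_items)
    for size in sorted(candidates, reverse=True):
        if size * min_groups <= limit:
            return size
    return None


if __name__ == "__main__":
    # Assumed limits: 32768 registers and 1536 resident work-items
    # per compute unit. Substitute values reported by your device.
    for regs in (20, 100, 300):
        print(regs, pick_workgroup_size(regs, 32768, 1536))
```

With these assumed limits, a light kernel can keep a large workgroup, while a register-heavy kernel is pushed toward a smaller one, or cannot keep two groups resident at all.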
May 09, 2016 · Each Pascal streaming multiprocessor (SM) houses 64 FP32 CUDA cores, half as many as a Maxwell SM. Within each Pascal SM there are two 32-CUDA-core partitions, two dispatch units, and a brand new, smarter scheduler, in addition to an instruction buffer that is twice the size of Maxwell's per CUDA core.
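The per-SM figures above turn into a total core count by simple multiplication. A small sketch follows; the function name and the SM count of 56 used in the example are assumptions for illustration (the excerpt gives only the per-SM numbers), and the Maxwell value of 128 is derived from the "half" statement above.

```python
PASCAL_CORES_PER_SM = 64                         # from the text above
MAXWELL_CORES_PER_SM = 2 * PASCAL_CORES_PER_SM   # "half that of a Maxwell SM"
CORES_PER_PARTITION = 32                         # from the text above


def total_fp32_cores(sm_count, cores_per_sm):
    """Total FP32 CUDA cores for a chip with `sm_count` SMs."""
    return sm_count * cores_per_sm


if __name__ == "__main__":
    # Partitions per Pascal SM: 64 cores / 32 cores per partition.
    print(PASCAL_CORES_PER_SM // CORES_PER_PARTITION)
    # Assumed SM count, for illustration only.
    print(total_fp32_cores(56, PASCAL_CORES_PER_SM))
```

Halving the cores per SM does not by itself mean a smaller chip: the total depends on how many SMs the part ships with.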