Profiling CUDA Benchmarks