Profiling CUDA Benchmarks for Performance Analysis on Modern GPUs
High-Performance Computing (HPC) systems are becoming increasingly heterogeneous as we enter the exascale era. Performance and power analysis of applications on hardware components such as Graphics Processing Units (GPUs) is therefore of great interest and importance to the computing industry. In this research, we evaluate the performance characteristics of two state-of-the-art NVIDIA GPU architectures: Volta and Turing. We select four CUDA applications for profiling: Gaussian Elimination, Lower-Upper Decomposition, Stream Cluster, and Jacobi. We study how effectively these applications use the hardware by collecting and analyzing profiling data from six performance counters, laying the groundwork for future analysis under power and energy constraints. Such analysis is a precursor to identifying performance and scalability bottlenecks, improving GPU occupancy and utilization, and reducing the power footprint of HPC applications.