Scientific applications are critical for solving complex problems in many areas of research, and often require a large amount of computing resources in terms of both runtime and memory. Massively parallel supercomputers with ever-increasing computing power are being built to meet the needs of large-scale scientific applications. With the advent of the petascale era, there is a widening gap between the computing power of supercomputers and the parallel scalability of many applications. To take full advantage of the massive parallelism of supercomputers, it is indispensable to improve the scalability of large-scale scientific applications through performance analysis and optimization. This thesis is motivated by cell-based AMR (Adaptive Mesh Refinement) cosmology simulations, in particular the Adaptive Refinement Tree (ART) application. Performance analysis is performed to identify its scaling bottleneck, a performance emulator is designed for efficient evaluation of different load balancing schemes, and topology mapping strategies are explored for performance improvements. More importantly, the exploration of topology mapping mechanisms leads to a generic methodology for network- and multicore-aware topology mapping, and a set of efficient mapping algorithms for popular topologies. These have been implemented in a topology mapping library, TOPOMap, which can be used to support MPI topology functions.

Ph.D. in Computer Science, July 2013