Fortran Programming with the MPI Shared-Memory Model
[Note] Last updated on October 17, 2016; the content may be out of date, so use it with caution.
Preface
For parallel communication both across and within nodes, the usual approach is hybrid MPI+OpenMP programming, but that style is rather complex. The MPI 3.0 standard introduced remote memory access (RMA) among processes on the same node, so data no longer has to be moved with send/receive: communication overhead goes down, a single unified communication model can be used, and the programming itself becomes easier.
This post draws on Intel's introduction to the SHM model, An Introduction to MPI-3 Shared Memory Programming,
as well as the intermediate-to-advanced MPI tutorial given by Pavan Balaji and Torsten Hoefler at ISC'16, Next Generation MPI Programming: Advanced MPI-2 and New Features in MPI-3 at ISC’16.
Those tutorials are all written in C, and porting them to Fortran involves a few pitfalls, so I also consulted a Fortran-related question on Stack Overflow: MPI Fortran code: how to share data on node via openMP?
Function definitions
1. MPI_Comm_split_type
Split the world communicator into groups that span the same host/node.
The split_type argument must be set to MPI_COMM_TYPE_SHARED.
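A minimal sketch of the call (variable names such as shmcomm are my own):

```fortran
program split_example
  use mpi
  implicit none
  integer :: ierr, shmcomm, world_rank, shm_rank

  call MPI_Init(ierr)
  ! Put all processes that can share memory (i.e. on the same node)
  ! into a common sub-communicator shmcomm.
  call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                           MPI_INFO_NULL, shmcomm, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)
  call MPI_Comm_rank(shmcomm, shm_rank, ierr)
  print '(A,I4,A,I4)', 'world rank', world_rank, ' -> node-local rank', shm_rank
  call MPI_Finalize(ierr)
end program split_example
```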
2. MPI_Group_translate_ranks
This function is important for determining the relative numbering of the same processes in two different groups. For instance, if one knows the ranks of certain processes in the group of MPI_COMM_WORLD, one might want to know their ranks in a subset of that group.
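A sketch of how this can be used to map every node-local rank back to its MPI_COMM_WORLD rank, assuming the shmcomm communicator from step 1:

```fortran
integer :: world_group, shm_group, shm_size, i, ierr
integer, allocatable :: shm_ranks(:), world_ranks(:)

call MPI_Comm_group(MPI_COMM_WORLD, world_group, ierr)
call MPI_Comm_group(shmcomm, shm_group, ierr)
call MPI_Comm_size(shmcomm, shm_size, ierr)

allocate(shm_ranks(shm_size), world_ranks(shm_size))
shm_ranks = [(i, i = 0, shm_size - 1)]      ! all ranks of shm_group
! world_ranks(i+1) receives the MPI_COMM_WORLD rank of node-local rank i
call MPI_Group_translate_ranks(shm_group, shm_size, shm_ranks, &
                               world_group, world_ranks, ierr)
```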
3. MPI_Win_allocate_shared
Window That Allocates Shared Memory
MPI_WIN_ALLOCATE_SHARED allocates a chunk of shared memory in each process.
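A sketch of allocating n doubles per process on the node-local communicator shmcomm, using the TYPE(C_PTR) overload discussed in the example section below (the MPI standard requires this overload in the mpi module whenever the compiler supports C_PTR):

```fortran
use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer

integer, parameter :: n = 1000              ! doubles per process
integer :: win, ierr
integer(kind=MPI_ADDRESS_KIND) :: wsize
type(c_ptr) :: baseptr
real(8), pointer :: a(:)

! Every process contributes a segment of n doubles; together the
! segments form one shared window on the node.
wsize = int(n, MPI_ADDRESS_KIND) * 8        ! segment size in bytes
call MPI_Win_allocate_shared(wsize, 8, MPI_INFO_NULL, shmcomm, &
                             baseptr, win, ierr)
! Attach a Fortran array pointer to my own segment.
call c_f_pointer(baseptr, a, [n])
```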
4. MPI_Win_shared_query
This function queries the process-local address for remote memory segments created with MPI_WIN_ALLOCATE_SHARED.
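A sketch that maps the segment of node-local rank 0 into the calling process's address space, with win and the C-binding declarations as in the previous snippet:

```fortran
integer :: disp_unit, ierr
integer(kind=MPI_ADDRESS_KIND) :: segsize
type(c_ptr) :: cptr
real(8), pointer :: a0(:)

! Where does rank 0's segment live in *my* address space?
call MPI_Win_shared_query(win, 0, segsize, disp_unit, cptr, ierr)
! segsize is in bytes, so segsize/disp_unit is the element count.
call c_f_pointer(cptr, a0, [int(segsize) / disp_unit])
! a0 can now be read and written like ordinary local memory.
```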
5. MPI-3 RMA access epoch
Several combinations of synchronization calls are available to open and close an epoch:
MPI_Win_lock / MPI_Win_unlock + MPI_Win_sync
MPI_Win_lock_all / MPI_Win_unlock_all + MPI_Win_sync
MPI_Win_fence / MPI_Win_fence
After steps 1-4 have been carried out, use the functions from 5 to explicitly delimit the code regions that perform remote memory access; a sketch of the lock_all variant follows below. According to the references above, the overhead of the three combinations increases in the order listed.
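A sketch of the lock_all variant, reusing the names from the snippets above; note the two MPI_Win_sync calls around the barrier, one to publish this process's writes and one to pick up the others':

```fortran
call MPI_Win_lock_all(MPI_MODE_NOCHECK, win, ierr)

a = real(shm_rank, 8)             ! write my own segment via the pointer
call MPI_Win_sync(win, ierr)      ! make my writes visible on the node
call MPI_Barrier(shmcomm, ierr)   ! wait until every process has written
call MPI_Win_sync(win, ierr)      ! pick up the other processes' writes
! ... it is now safe to read the neighbours' segments ...

call MPI_Win_unlock_all(win, ierr)
```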
Example
The SHM-model functions involve a good deal of pointer manipulation, and in my opinion the C-binding flavor of the interfaces is the better choice:
declare the pointers as type(C_PTR) rather than INTEGER(KIND=MPI_ADDRESS_KIND), and then call C_F_POINTER to associate the C pointer with a Fortran pointer.
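A minimal end-to-end sketch combining steps 1-5, in which each process fills its own segment and then reads its left neighbour's through the shared window (untested; variable names are mine, and it assumes an MPI-3 library whose mpi module provides the TYPE(C_PTR) overloads):

```fortran
program shm_example
  use mpi
  use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer
  implicit none

  integer, parameter :: n = 10                 ! doubles per process
  integer :: ierr, shmcomm, shmrank, shmsize
  integer :: win, disp_unit, left
  integer(kind=MPI_ADDRESS_KIND) :: wsize
  type(c_ptr) :: cptr
  real(8), pointer :: mine(:), theirs(:)

  call MPI_Init(ierr)

  ! 1. split MPI_COMM_WORLD into node-local communicators
  call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                           MPI_INFO_NULL, shmcomm, ierr)
  call MPI_Comm_rank(shmcomm, shmrank, ierr)
  call MPI_Comm_size(shmcomm, shmsize, ierr)

  ! 3. every process contributes n doubles to one shared window
  wsize = int(n, MPI_ADDRESS_KIND) * 8
  call MPI_Win_allocate_shared(wsize, 8, MPI_INFO_NULL, shmcomm, &
                               cptr, win, ierr)
  call c_f_pointer(cptr, mine, [n])

  ! 5. access epoch: write my segment, sync, then read a neighbour's
  call MPI_Win_lock_all(MPI_MODE_NOCHECK, win, ierr)
  mine = real(shmrank, 8)
  call MPI_Win_sync(win, ierr)
  call MPI_Barrier(shmcomm, ierr)
  call MPI_Win_sync(win, ierr)

  ! 4. locate the left neighbour's segment in my address space
  left = mod(shmrank - 1 + shmsize, shmsize)
  call MPI_Win_shared_query(win, left, wsize, disp_unit, cptr, ierr)
  call c_f_pointer(cptr, theirs, [n])
  print '(A,I4,A,F6.1)', 'rank', shmrank, ': neighbour wrote ', theirs(1)

  call MPI_Win_unlock_all(win, ierr)
  call MPI_Win_free(win, ierr)
  call MPI_Finalize(ierr)
end program shm_example
```

Build it with the MPI compiler wrapper (e.g. mpifort shm_example.f90) and launch several ranks on a single node.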
Author: Izumiko
Last updated: 2016-10-17