

        Enhancement on parallel unstructured overset grid method for complex aerospace engineering applications

Tianhang XIAO, Haolin ZHI, Shuanghou DENG, Zhaolin CHEN, Xinying LI

        Chinese Journal of Aeronautics, 2023, Issue 1

        a College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

        b China Special Vehicle Research Institute, Jingmen 448035, China

KEYWORDS: Alternating digital tree; Implicit hole-cutting; Overset grid assembly; Parallel computing; Wall distance

Abstract In the present study, an efficient overset grid method based on parallel implicit hole-cutting is proposed for simulating unsteady flows in aerospace engineering involving multiple bodies in relative movement. In view of the degraded computational efficiency and robustness of conventional overset grid assembly, several innovative techniques are developed within the overset grid assembly process, viz., a bookkeeping alternating digital tree method to speed up the donor-cell searching, a fast parallel advancing front algorithm to accelerate the wall-distance calculation, and a message-passing strategy with efficient information communication and low storage expenditure on distributed computational architectures. The contribution of the developed techniques is evidenced by comparison with existing alternatives in terms of computing efficiency. Subsequently, the overset grid method is embedded into an in-house programmed URANS solver to examine its capability in predicting the flow fields of complex applications such as a helicopter, store separation and component deployment. Results show that the developed overset grid methodology is, in practice, able to resolve the aerodynamic characteristics of complex aerospace engineering applications with a high-fidelity flow topology and accuracy.

        1. Introduction

High-fidelity unsteady flow simulation of very complex aerospace configurations has always been challenging for the Computational Fluid Dynamics (CFD) approach, particularly for problems involving multiple bodies in relative motion. Flow simulation of complex configurations has routinely been performed using overset grids, sometimes called Chimera grids, which allow modelling a multi-component system with an optimum body-fitted grid for each component. The overset grid was first proposed by Steger1 to simplify the generation of structured grids on complex geometries and was then extended to unstructured grids and unsteady flow simulations.2-9 Compared with a single grid topology, using overset grids requires an additional grid assembly procedure in order to categorize the mesh elements into hole cells, interpolated cells and computational cells. As denoted by archived investigations, the grid assembly can be challenging for engineering applications with large-scale meshes partitioned and distributed among a number of separate processors in a distributed parallel environment. The memory requirement and execution time, which rise considerably in large-scale simulations, have always been a bottleneck for the overset grid technique. Therefore, a novel overset grid assembly methodology which is efficient, robust and memory-saving is required.

Within the overset grid assembly, one of the most time-consuming steps is the hole-cutting, which is generally implemented in either an explicit or an implicit way. The basic idea of the explicit method10-12 is to first define the inter-grid boundaries and interpolated elements (nodes or cells) using geometric information to deactivate the grid elements inside solid bodies. Subsequently, a stand-alone donor-cell searching process has to be completed to establish the interpolation relationships for those interpolated elements. However, the limitation of this method is that it requires a certain amount of user input and expertise, which strongly affects the hole-cutting performance. The alternative Implicit Hole-Cutting method13-17 (hereafter referred to as IHC) is regarded as a cell-selection process, where only the optimal cells located in a multiply overlapped region are used for the computation. A notable advantage of IHC is that the optimal cells are automatically selected by a one-by-one comparison with the proper ones found based on certain criteria (e.g., the cell size or the distance to the solid wall), which makes the IHC method automatic, robust and user-friendly.

However, IHC still suffers from its assembly efficiency, since the one-by-one comparison between overlapping regions for hole-cutting requires massive donor-cell searching and thus incurs a heavy computational task. Considering M sub-grids each containing N grid cells, the computational expense of the donor-cell searching can be of the order of O((M−1)×N²), which is extremely time-consuming for large-scale simulations. To overcome this dilemma, several donor-cell searching algorithms have been developed, such as the inverse map method,18 the stencil walk method,19 and digital-tree based methods,20 for the sake of accelerating the searching task. Among them, the Alternating Digital Tree (ADT) method, which employs a binary spatial data structure to perform a hierarchical search, is currently regarded as the most popular and applicable one in view of its inherent accuracy and robustness.21 Looking into the procedure of the ADT method, however, a notable drawback is that the searching task always starts from the root of the entire ADT, which degrades the searching efficiency when the depth of the ADT becomes large, particularly for large-scale meshes. Moreover, the irregular partition boundaries of grid subzones lead to an imbalanced tree data structure, which further degrades the superiority of the ADT method. Hence, the first aim of the present study is to propose a novel bookkeeping ADT method to overcome the above shortcomings and to enhance the efficiency of hole-cutting.

Moving to the searching logic of the IHC method, the minimum wall distance is always employed as an indicator to define the optimal cells; consequently, the wall distance information must be prepared beforehand with an extra effort, which may also be time-consuming for a large-scale mesh system. Additionally, the above circumstance becomes even worse for unsteady flows with relative motion or deformation, where the self-wall distances of all the movable sub-grids have to be recalculated and updated at each time step. Recently, several efficient strategies for wall-distance calculation have been proposed, such as the k-d tree based method,22 the inverse map based method23 and differential equation based methods.24-25 Although the aforementioned methods have superior efficiency over the direct exhaustive search, the wall-distance calculation task is still a very heavy burden which cannot be accepted in the overset grid assembly for large-scale problems with multi-body movements.23 In this context, another focus of this paper is to develop a parallel advancing front method that is robust and sufficiently efficient to calculate the wall distance for large-scale partitioned meshes.

Another issue that should be considered when using the overset grid method is its feasible parallelization to deal with large-scale meshes. The overset grid assembly process should be fulfilled in a distributed parallel environment, i.e., using the Message Passing Interface (MPI) technique, to alleviate the massive memory consumption of the necessary topological and geometrical information for millions or billions of cells. However, parallel overset grid assembly is a difficult task for large-scale partitioned grids because of the enormous logical operations and the limitation on memory consumption. It is also not easy to improve the parallel efficiency of the hole-cutting processes due to the message communication between processors. In addition, the load balance of the overset grid assembly can be strongly different from that of the flow solver, while the grid partitioning is mostly solver-based. It is hard to scalably and efficiently distribute the job between processors because the work requirements for hole-cutting and donor search are unknown beforehand, particularly when the load changes dynamically as the grids move. For the sake of improving the efficiency and robustness, several parallel hole-cutting algorithms have been developed. For example, Sitaraman26 and Roget et al.27 developed a parallel implicit hole-cutting software, PUNDIT, whose efficiency was further enhanced by a load rebalancing algorithm. Chang et al.28 proposed strategies to speed up the donor search task by maintaining the necessary information of the global meshes in each processor, which can simplify the overset grid assembly process and reduce communications between processors, but would occupy massive memory storage, particularly for unstructured grids, making it impracticable for large-scale meshes. Subsequently, the memory storage problem was alleviated in their further work17 by employing a coarser background grid. Besides, Zhang and Owens,29 Cai,30 Liang,31 Martin,32 Zagaris33 and Li34 et al. also developed their own parallel strategies for either explicit or implicit hole-cutting methods. Since there is a great variation in the hole-cutting methods, the corresponding parallel strategies also differ from each other, and further exploration is still required for the sake of efficiency, automation and robustness.

Based on this comprehensive survey of the existing technology for overset grid assembly, the goal of the present study is to develop an efficient, automatic and robust overset grid method in a distributed-memory parallel environment for complex flow simulation. The developed overset grid assembly based on IHC works in a fully automatic manner. To achieve the desired efficiency and robustness, several techniques are proposed within the assembly process, viz., (A) a novel bookkeeping ADT method to efficiently search the donor cells; (B) a faster and easier wall-distance calculation method using a parallel advancing front; (C) a low-memory-expenditure message-passing strategy for parallel overset grid assembly using local data only, i.e., on a per-process basis, avoiding massive global data storage and communication. To examine the capability of the proposed overset grid method, the parallel overset grid assembly module is then embedded into an in-house developed Unsteady Reynolds-Averaged Navier-Stokes (URANS) solver and demonstrated on several complex applications in aerospace engineering.

        2. Parallel overset grid method

The proposed parallel overset grid method is based on a domain decomposition in a distributed-memory parallel environment. It generates separate sub-grids for each individual geometrical component, such as the wing, flap, fuselage and tail, for the sake of simplifying the meshing while allowing relative movement and guaranteeing the grid quality. One or more Cartesian/hybrid unstructured off-body sub-grids can also be prepared as background grids in view of the complexity and resolution requirements. The generated sub-grids are hierarchically organized into LAYERs,16 each of which consists of several sub-grids overlapping each other. Each LAYER (for instance, LAYER n) can only be embedded into its immediate lower LAYER, i.e., LAYER n−1. The number of LAYERs is arbitrarily defined to obtain a good overall grid system which provides a high resolution near the bodies and gradually becomes coarser towards the far field.

For parallel computation, each sub-grid is partitioned into subzones by the METIS or ParMETIS35 approach, and the subzones are processed on distributed-memory computer architectures. The goal of the parallel overset grid assembly is to classify the grid elements (cells or nodes) into active, nonactive and interpolated ones and to establish interpolation stencils between interpolated and donor cells among the distributed grid zones. The procedure of the present overset grid method is mainly divided into three steps.

Step 1. Preparatory work: this step includes the generation and hierarchical organization of sub-grids for the multi-component system, the domain decomposition onto different processors, as well as the self-wall distance calculation of each sub-grid and the ADT preparation for the donor-cell searching job.

Step 2. Implicit hole-cutting: this step excludes the grid elements intersecting with or lying within solid wall surfaces from the computation and classifies the remaining grid elements into active and nonactive types (detailed information is given in Section 2.1). The massive intersection checks and donor-cell searches in the implicit hole-cutting method require an accurate and efficient searching strategy. In this paper, a bookkeeping ADT algorithm is used to accelerate the hole-cutting task (Section 2.2).

Step 3. Inter-grid boundary optimization: the final step defines and optimizes the inter-grid boundary between overlapped sub-grids and establishes the interpolation stencils (Section 2.3). In the present study, the inter-grid boundary optimization is performed by a uniform algorithm for either cell-centered or vertex-centered schemes, enabling the present overset grid method to adapt to both types of solver.

The present overset grid method is implemented in a distributed-memory parallel environment, where the message-passing strategy should be refined to improve efficiency (Section 2.4). Wall distances need to be calculated on each separate sub-grid and updated to the whole domain for implicit hole-cutting and turbulence simulation, respectively. The efficiency of the wall distance computation is also of critical importance, particularly for unsteady problems. A parallel advancing front method is therefore developed in the present paper (Section 2.5).

        2.1. Implicit hole-cutting

For the present overset grid assembly method, a fully automatic IHC method is proposed to exclude the mesh vertices or cells which are non-active during the computation. For the sake of robustness, a pre-cutting procedure is first performed to blank the grid cells overlapping with or inside solid bodies, since grid cells of more than one sub-grid may overlap within a single solid body and could cause logical messes in the subsequent processes if not excluded beforehand. The detailed procedure of the pre-cutting is illustrated by means of a two-element high-lift airfoil, as shown in Fig. 1.

Step 1. The information of the solid wall surface elements is collected from all subzones and stored in each process. Then, the grid cells that overlap with solid wall surfaces are identified by an intersection check (Fig. 1(a)). A geometrically precise intersection check is actually not necessary here; it is sufficient to identify the potential intersecting cells by an overlap check between the bounding boxes of each grid cell and wall surface element, which avoids a large amount of unnecessary geometrical calculation. Fast search techniques, e.g., the ADT method, can be used to accelerate the intersection check task.

Step 2. Identify and activate the grid cells outside the solid bodies by a color painting algorithm in each grid subzone. The color painting starts from an arbitrary outer grid cell adjacent to the physical or partition boundaries and paints the whole region based on the grid topology until it reaches the hole profile of intersecting cells, as shown in Fig. 1(b) (a minimal code sketch of this painting step is given after Step 3).

Step 3. The remaining unmarked cells (if any) in a sub-grid are only those inside the solid bodies, and they can be identified by a single loop over all grid cells. Then, the grid cells that intersect with or lie inside a solid body are blanked. By doing so, only the grid cells outside the solid bodies are retained as potential computational cells, as shown in Fig. 1(c).

The above pre-cutting process can be easily parallelized with high efficiency, as most of the work is fulfilled locally with no need for communication except for the surface data gathering at the beginning.
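To make the color painting of Step 2 concrete, a minimal Python sketch is given below. It is not the authors' code: the cell-adjacency list, the set of intersecting cells and the seed cell are hypothetical toy inputs.

from collections import deque

def paint_outside_cells(n_cells, neighbors, intersecting, seed):
    # Flood-fill from an outer seed cell, stopping at the intersection front.
    # Returns True for every cell reachable from the seed without crossing a
    # wall-intersecting cell, i.e., the cells lying outside the solid body.
    outside = [False] * n_cells
    queue = deque([seed])
    outside[seed] = True
    while queue:
        c = queue.popleft()
        for nb in neighbors[c]:
            if not outside[nb] and nb not in intersecting:
                outside[nb] = True
                queue.append(nb)
    return outside

# 1D chain of 6 cells; cell 3 intersects the wall, so cells 4-5 stay unpainted
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print(paint_outside_cells(6, adjacency, {3}, seed=0))
# -> [True, True, True, False, False, False]

The remaining unmarked cells of Step 3 are then exactly those with a False flag that do not belong to the intersecting set.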

        Fig. 1 Procedure of pre-cutting.

After the pre-cutting, the hole-cutting is subsequently performed based on the classification of grid elements, so that the optimal ones are selected and retained for the flow computation. Generally, the grid elements in proximity to the solid body have a higher resolution for resolving the near-field flow structures, and thus they are expected to be retained for the computation. For the overset grid assembly, the wall distance is used as the indicator to activate or blank grid elements. The minimum self-wall distance from each node to the body surfaces in the same sub-grid is measured first. For a body-fitted sub-grid, the real wall distances of the nodes to the self-wall boundaries are computed; for the off-body sub-grids serving as background grids, the wall distance of each node is prescribed according to the LAYER level as (nmax − n)Δd, where n is the LAYER level number, nmax the maximum number of LAYERs, and Δd a constant distance value decided beforehand by the user.

With all grid elements outside the solid bodies being initially active after the pre-cutting, the grid nodes (except for the nodes inside solid bodies) are first categorized into active and nonactive ones by the following two steps:

        Step 1. Search the donor cell for each node lying in the overlapping regions via searching algorithms. In the present method, a novel bookkeeping ADT method is used.

Step 2. Compare the self-wall distances of the node and its donor cell. As illustrated in Fig. 2(a), if the self-wall distance of the node (e.g., P1) is smaller than that of its donor cell, the node is defined as an active or computational node; otherwise it is defined as a nonactive node (e.g., P2) and excluded from the computation.

        Fig. 2 Activate/deactivate grid nodes via minimum self-wall distance (active cells in solid lines, interpolated cells colored in blue and non-active cells in dashed lines).

By the nodal activity, all cells outside the solid bodies are classified into three groups: active cells with all nodes active, nonactive cells with all nodes nonactive, and interpolated cells that have both active and nonactive nodes. As shown in Fig. 2(b), with this hole-cutting performed in the overlapping regions, the finer grid elements close to the body are retained and the coarser ones far from the wall are excluded from the computation, thus providing an appropriate resolution near the wall surfaces.
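The activation test and the resulting cell classification can be sketched as follows; this is a hedged Python illustration, with the per-node self-wall distances and donor-cell wall distances assumed precomputed (hypothetical toy inputs).

def classify_cells(cells, node_wall_dist, donor_wall_dist):
    # cells: list of node-index tuples; *_wall_dist: per-node distances.
    # A node stays active if its self-wall distance is smaller than that of
    # its donor cell (or if it has no donor, i.e., it lies outside the overlap).
    active = {n: donor_wall_dist.get(n) is None or
                 node_wall_dist[n] < donor_wall_dist[n]
              for n in node_wall_dist}
    groups = {"active": [], "nonactive": [], "interpolated": []}
    for cell in cells:
        flags = [active[n] for n in cell]
        if all(flags):
            groups["active"].append(cell)
        elif not any(flags):
            groups["nonactive"].append(cell)
        else:
            groups["interpolated"].append(cell)
    return groups

self_dist = {0: 0.1, 1: 0.2, 2: 1.5, 3: 1.6}
donor_dist = {0: 0.8, 1: 0.9, 2: 0.4, 3: 0.5}
print(classify_cells([(0, 1), (1, 2), (2, 3)], self_dist, donor_dist))
# -> cell (0,1) active, cell (1,2) interpolated, cell (2,3) nonactive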

2.2. Bookkeeping ADT technique

During the IHC process, the intersection checks and donor-cell searches within the solid hole profiling and grid element classification dominate the time consumption, which calls for a fast searching strategy. In this paper, a novel bookkeeping ADT method is developed and employed to perform the searching tasks in an efficient, accurate and robust manner. For completeness, the fundamentals of the ADT method are briefly explained first.

It is then possible to efficiently search for grid elements inside a given search region by checking for overlap with the region at each tree node. For example, to find the grid cell in a subzone intersecting with or containing a given query element Q bounded by x_Q^min and x_Q^max (x_Q^min = x_Q^max = x_Q if the query element is a grid node), the search starts from the root of the ADT and recursively moves down its sub-trees, as diagramed in Fig. 3(c). At each tree node, if overlap exists, i.e.,

x^min ≤ x_Q^max and x^max ≥ x_Q^min componentwise, where x^min and x^max bound the grid cell associated with that tree node,

then the grid cell associated with this tree node is regarded as potentially intersecting with or containing the query element Q, and the search moves on to its children. Otherwise, the current tree node and its child nodes are ignored. Once these potential grid cells have been identified, a small number of additional geometrical checks are performed to identify the actual grid cell (if it exists) that intersects with or contains the query element. Obviously, the ADT algorithm is accurate and robust, and its unique data structure and hierarchical search significantly reduce the number of geometrical checks and thus speed up the search process.
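A compact sketch of this prune-and-descend search is given below. It is a simplification rather than a faithful ADT (a true alternating digital tree alternates splitting planes of a bounded hypercube); here a binary tree simply alternates the comparison coordinate per level, and each subtree caches the union of its bounding boxes so that non-overlapping subtrees can be skipped, reproducing the pruning logic described above.

class Node:
    def __init__(self, cell_id, lo, hi):
        self.cell_id, self.lo, self.hi = cell_id, lo, hi
        self.sub_lo, self.sub_hi = list(lo), list(hi)   # subtree union bbox
        self.left = self.right = None

def insert(node, cell_id, lo, hi, depth=0):
    if node is None:
        return Node(cell_id, lo, hi)
    for d in range(len(lo)):                    # grow the subtree union bbox
        node.sub_lo[d] = min(node.sub_lo[d], lo[d])
        node.sub_hi[d] = max(node.sub_hi[d], hi[d])
    d = depth % len(lo)                          # alternate split direction
    side = 'left' if lo[d] < node.lo[d] else 'right'
    setattr(node, side, insert(getattr(node, side), cell_id, lo, hi, depth + 1))
    return node

def search(node, q, hits):
    # Collect cells whose bbox contains query point q; prune by subtree bbox.
    if node is None or any(q[d] < node.sub_lo[d] or q[d] > node.sub_hi[d]
                           for d in range(len(q))):
        return
    if all(node.lo[d] <= q[d] <= node.hi[d] for d in range(len(q))):
        hits.append(node.cell_id)                # candidate for geometric check
    search(node.left, q, hits)
    search(node.right, q, hits)

root = None
boxes = [([0, 0], [1, 1]), ([2, 0], [3, 1]), ([0.5, 0.5], [1.5, 1.5])]
for i, (lo, hi) in enumerate(boxes):
    root = insert(root, i, lo, hi)
hits = []
search(root, [0.7, 0.7], hits)
print(hits)   # -> [0, 2]: only these bounding boxes need the exact check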

However, the efficiency of the basic ADT method can be crippled in the case of large-scale partitioned grids, because the large number of grid cells in each subzone dramatically increases the depth of the ADT and thus results in an obvious average increase of the overlap and geometrical checks for potential tree nodes. Moreover, the irregular partition boundaries of grid subzones lead to an imbalanced tree data structure, which further degrades the superiority of the ADT method.

To solve this problem, a bookkeeping ADT is proposed. Instead of organizing all the grid cells in a given region into a single ADT, the bookkeeping ADT method creates a series of sub-ADTs, each associated with a subregion, and the addresses of the sub-ADTs are registered in a bookkeeping matrix. As illustrated in Figs. 3(d) and (e), the creation and organization of the sub-ADTs can be summarized as follows:

        Fig. 3 Schematic diagrams of basic ADT method and proposed bookkeeping ADT technique.

As seen from Fig. 3(e), to keep integrity, the grid cells located over the boundaries of the subregions are inevitably organized into more than one sub-ADT. But this causes no efficiency problem and consumes little additional memory, since only the index numbers of the grid cells are stored in the ADTs and such boundary cells take a very small proportion of a large-scale mesh.

By using the present bookkeeping ADT method, the donor-cell search for a given query grid point P = x_P starts by assigning the search task directly to a specific sub-ADT, i.e., sub-ADT(i, j, k), where i, j, k are the integer coordinates of the query point with respect to the minimum corner x_min of the logical cube, i.e.,

(i, j, k) = floor((x_P − x_min) / h), taken componentwise, with h the edge length of a subregion.

Then the actual donor cell (if it exists) is identified by performing the search task in this sub-ADT following the same process as the original ADT method described above. This process is demonstrated in Fig. 3(f).
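The bookkeeping lookup can be sketched as follows; this is hedged Python, with each "sub-ADT" stubbed as a plain list of bounding boxes for brevity, and the class name and lattice handling being illustrative assumptions.

class BookkeepingADT:
    def __init__(self, xmin, xmax, nd):
        self.xmin, self.nd = xmin, nd
        self.h = [(xmax[d] - xmin[d]) / nd for d in range(3)]
        self.book = {}                      # (i, j, k) -> sub-ADT (here a list)

    def key(self, x):
        # int-coordinates of a point, clamped to the logical cube
        return tuple(min(self.nd - 1, max(0, int((x[d] - self.xmin[d]) / self.h[d])))
                     for d in range(3))

    def register(self, cell_id, lo, hi):
        # a cell spanning several subregions is registered in each of them
        klo, khi = self.key(lo), self.key(hi)
        for i in range(klo[0], khi[0] + 1):
            for j in range(klo[1], khi[1] + 1):
                for k in range(klo[2], khi[2] + 1):
                    self.book.setdefault((i, j, k), []).append((cell_id, lo, hi))

    def query(self, x):
        # only the single sub-ADT addressed by the int-coordinates is searched
        return [c for c, lo, hi in self.book.get(self.key(x), [])
                if all(lo[d] <= x[d] <= hi[d] for d in range(3))]

bk = BookkeepingADT([0, 0, 0], [10, 10, 10], nd=5)
bk.register(7, [1.5, 1.5, 1.5], [2.5, 2.5, 2.5])
print(bk.query([2.0, 2.0, 2.0]))   # -> [7]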

Obviously, the benefits expected from the developed bookkeeping ADT method are twofold: (A) using multiple sub-ADTs instead of a single ADT to organize a given group of grid cells significantly reduces the depth of each sub-ADT, and the tree structure is more balanced for most of the nonempty sub-ADTs, as each sub-ADT is connected to an isotropic cubic space; this greatly alleviates the burden of the overlap and geometrical checks for potential tree nodes. (B) The bookkeeping organization of the sub-ADTs enables the donor-cell search to quickly locate the query point within a specific small subregion and thus dramatically narrows the search space.

It should be mentioned that two strategies for establishing the bookkeeping ADTs (or a single ADT in the basic method) are considered in the present method. The first one establishes ADTs for all grid cells of each subzone, namely bookkeeping ADT for whole. This strategy is straightforward with good load balance, and each grid cell is processed only once; however, the cells outside the overlapping region are also included, which may increase the search workload, particularly when only a small number of cells overlap with other sub-grids. The second one locally establishes ADTs for each subzone in the overlapping region between each pair of overlapping sub-grids, namely bookkeeping ADT for local, in order to avoid cells located outside the overlapping region being unnecessarily organized into ADTs and hence to narrow the search range. However, this strategy may consume more time to construct the ADTs, as the cells in multiply overlapping regions are processed multiple times and organized into multiple ADTs. The performance of these two ADT establishing strategies is tested and compared in Sections 3 and 4.

        2.3. Inter-grid boundary optimization and interpolation stencils

        Fig. 4 Illustration of inter-grid boundary redefinition.

The inter-grid boundary between the sub-grids for inter-grid communication has been identified by the above implicit hole-cutting procedure. However, these overlapping layers are not spatially sufficient for a higher-order flux computation. As shown in Fig. 4(a), for the interface ij between control volume i and control volume j, where i is an active node/cell and j is an interpolated node/cell determined by the initial inter-grid boundary defining step, only first-order accuracy of the flux computation can be obtained, as the flow gradient cannot be reconstructed because some neighbors of control volume j are non-active and excluded from the flow computation. Therefore, an optimization of the inter-grid boundary, by which the non-active neighbors of volume j are activated as new interpolated nodes/cells, is needed to recover the high-order accuracy of the flux computation for volume i, as demonstrated in Fig. 4(a). An optimized redefining algorithm, which adds one or a few more layers of nodes/cells at each inter-grid boundary by advancing the inter-grid boundary into its non-active region, is implemented for each sub-grid to recover the accuracy, as illustrated in Fig. 4(b). Two layers of interpolated nodes/cells are enough for the present second-order accuracy, but higher-order schemes may need more layers. The algorithm is summarized in the pseudo program presented in Algorithm 1, which can be applied to both cell-centered and cell-vertex schemes and to higher-order spatial discretizations. Since the donor cells have been identified for all the nodes in the overlapping region during the aforementioned implicit hole-cutting procedure, the optimization of the inter-grid boundary needs no additional donor-cell search and can thus be performed at very low cost by this advancing algorithm.

The final task of the present overset method is to establish the interpolation stencils which are responsible for transferring the flow properties among the different sub-grids. The interpolation stencils are uniform for both vertex-centered and cell-centered schemes in the present method. As presented in Fig. 5, an interpolation stencil of each sub-grid contains the information of an acceptor and its donor control volumes. For a vertex-centered finite volume discretization, the acceptor is defined by an interpolated node and the donors are the nodes of its donor cell, while for a cell-centered scheme, an interpolated cell is regarded as the acceptor and its donor cell together with the neighbor cells are the donor control volumes. In the case of multiple sub-grids, there may be more than one candidate donor cell for interpolation in the overlapping regions; herein the active one with the smallest cell volume is chosen as the optimum donor cell. In addition, the interpolation stencil also records the IDs of the sub-grid and processor of the donor control volumes, which facilitates the flow data transmission during parallel message communication. The second-order accurate Laplace interpolation method is used here to transfer flow information from the donor control volumes to an acceptor, where the Laplace Interpolation Coefficient (LIC) relative to the acceptor coordinate for each donor is computed and stored in the interpolation stencil beforehand.
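The exact Laplace Interpolation Coefficients are not reproduced here; as a hedged sketch, the snippet below instead derives donor weights from the equivalent second-order consistency conditions, namely that the weights reproduce constant and linear fields, solved as a minimum-norm linear system.

import numpy as np

def second_order_weights(donors, acceptor):
    # Constraints: sum(w) = 1 and sum(w * (x_donor - x_acceptor)) = 0,
    # i.e., the interpolation is exact for constant and linear fields.
    d = np.asarray(donors, dtype=float) - np.asarray(acceptor, dtype=float)
    A = np.vstack([np.ones(len(d)), d.T])
    b = np.zeros(A.shape[0])
    b[0] = 1.0
    w, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimum-norm solution
    return w

donors = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
w = second_order_weights(donors, acceptor=(0.25, 0.25, 0.25))
print(np.round(w, 3))   # -> [0.25 0.25 0.25 0.25], and the weights sum to 1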

        Fig. 5 Information for each interpolation stencil of each subgrid.

        Algorithm 1. Pseudo program for inter-grid boundary optimization

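The original pseudo program of Algorithm 1 survives only as unreadable image fragments in this copy; the Python sketch below reconstructs its behaviour as described in the text above, i.e., advancing the inter-grid boundary by turning non-active neighbors of interpolated elements into new interpolated elements, one layer per pass. The names and data layout are illustrative assumptions, not the authors' code.

def optimize_intergrid_boundary(status, neighbors, n_layers=2):
    # status: element -> 'active' | 'nonactive' | 'interpolated'
    front = [e for e, s in status.items() if s == "interpolated"]
    for _ in range(n_layers - 1):                 # the first layer already exists
        new_front = []
        for e in front:
            for nb in neighbors[e]:
                if status[nb] == "nonactive":
                    status[nb] = "interpolated"   # donor already known from IHC
                    new_front.append(nb)
        front = new_front
    return status

# 1D chain: element 2 is the initial interpolated layer, 3-5 are nonactive
status = {0: "active", 1: "active", 2: "interpolated",
          3: "nonactive", 4: "nonactive", 5: "nonactive"}
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(optimize_intergrid_boundary(status, nbrs, n_layers=2))
# element 3 becomes interpolated, giving the two layers needed for second order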

        2.4. Message-passing strategy for parallel grid assembly

In this work, the MPI technique35 is utilized on distributed-memory computer architectures to deal with large-scale grids efficiently. The procedures of both the flow solution and the overset grid assembly require massive communication between processors, which has an adverse effect on the simulation efficiency. Therefore, the message-passing strategy should be refined as much as possible to improve efficiency.

As aforementioned, a domain decomposition technique based on METIS35 is employed to partition each initial sub-grid into several parts with approximately equal numbers of cells, which are then distributed to the processors. Owing to that, load balance is usually achieved in the procedure of the flow solution, e.g., computing fluxes and solving linear equations. However, the domain decomposition poses a challenge to the overset grid method, particularly when multi-body relative motion is involved. The communication maps and data structures are dynamic with the relative positions between domains, incurring serious load imbalance during hole-cutting. As illustrated in Fig. 6(a), each domain of the m sub-grids is partitioned over n processors. Any processor i contains m subzones (G1_i, G2_i, ..., Gm_i) which require assembly with those of other processors. Therefore, the number of donor-cell searching operations on each processor can be up to O((m−1)×m×n).

In the donor-cell searching step, the information of grid cells needs to be communicated between processors. A simple and concise way is to gather up the information and broadcast it to all the processors as documented,17 consuming two communication operations with the penalty of a huge memory cost. The improved strategy employed here is shown in Fig. 6(b). During one communication, messages containing the query point coordinates and their wall distances of any subzone Gx_i are sent from processor i to another processor j. The message-receiving processor j finds the potential donor cells containing the received query points from all its sub-grids except Gx_j of the same domain, and then sends the indices of the donor cells back to processor i. Note that each processor sends and receives messages simultaneously, so all the subzones of the same domain (e.g., Gx) on the various processors complete the hole-cutting procedure within one communication, which can potentially improve load balance. Though the number of communication operations is of O(2mn), the data size of a single message decreases to a rather small amount, so that the message-passing strategy has no significant adverse effect on efficiency while it actually alleviates the memory expenditure.
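A hedged mpi4py sketch of this exchange pattern follows; it is not the authors' implementation, and the query payload and the stubbed donor search are hypothetical placeholders.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# hypothetical local data: query points owned by this rank's subzones
my_queries = [{"xyz": (rank + 0.5, 0.0, 0.0), "wall_dist": 0.1 * (rank + 1)}]

# round 1: scatter the small query messages, one list per destination rank
outbound = [my_queries if dest != rank else [] for dest in range(size)]
inbound = comm.alltoall(outbound)

def find_donor(q):
    # placeholder for the sub-ADT search on the receiving rank
    return {"donor_rank": rank, "donor_cell": 42, "query": q}

replies = [[find_donor(q) for q in qs] for qs in inbound]

# round 2: send the donor candidates back to the requesting ranks
answers = comm.alltoall(replies)
print(f"rank {rank} received {sum(len(a) for a in answers)} donor replies")

Every rank sends and receives in the same collective call, so one round trip in each direction completes the exchange for a whole domain, with no global gather of grid data.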

        Fig. 6 Parallel strategy for overset grid assembly.

One more thing that should be mentioned is the load imbalance related particularly to the donor-cell search task. The distribution of the query nodes in the overlapping regions can be quite imbalanced among grid subzones, since each sub-grid is usually partitioned to maintain an equivalent number of cells per partition. A large subzone overlapping small subzones can have an undesirably large number of query nodes, while a subzone without any overlap gets no query nodes at all. Load re-balance algorithms, such as the method designed by Roget and Sitaraman,27 can be used to alleviate this imbalance. However, load rebalance is not implemented in the present study and will be accomplished in future work.

        2.5. Parallel advancing front method for wall distance calculation on overset grid

The efficiency of the wall distance calculation is of critical importance for the overset grid assembly, particularly for dynamic problems involving deforming and moving boundaries where the wall distance has to be updated at each time step. A parallel advancing front method for efficient and robust wall distance calculation is developed for the purpose of enhancing the overset grid assembly performance. The procedure of this method consists of two parts: self-wall distance computation on each separate sub-grid for implicit hole-cutting, and wall distance updating to the whole domain on the overset grids for turbulence simulation.

        2.5.1. Self-wall distance of sub-grid

The advancing front method, which uses a color painting technique to calculate the wall distance based on grid topology information, is regarded as a fast strategy. Unlike the alternative methods (e.g., k-d tree based searching methods,22 inverse map based searching methods23 and differential equation based methods24-25), the advancing front method is free from mapping establishment, massive searching and iterative solution, which makes it theoretically efficient, although it relies heavily on uniform grids with similar cell sizes. Another issue that should be fully considered is using the advancing front method on distributed-memory parallel architectures, where the initial grid is divided into a certain number of subzones (e.g., by METIS35) and distributed to the processors to achieve load balance and high parallel efficiency as well as low memory consumption. However, this parallelization brings difficulties to the wall distance computation using the advancing front method, for the partial topology of the grid cells on each processor fails to enable the front to advance properly.

        Fig. 7 Flowchart of parallel advancing front method.

The flowchart of the parallel advancing front method is summarized in Fig. 7, where the innovative idea to solve the above issues is a "two fronts and one smoothing" strategy, highlighted in brown. Firstly, the first front wave, forming adjacent to the partial wall faces, propagates throughout the partitioned grid on each processor, obtaining an initial wall distance field to be revised. Secondly, the minimum wall distances at the nodes neighboring the partition boundary (namely the communication interface) are revised by message communication with their neighbors on adjacent processors, and then the grid cells containing the revised nodes are collected as the second front wave to advance throughout the partitioned region again. Finally, a smoothing operation, which regards the nodes with incorrect wall distance as the third front source, is implemented to recover the accuracy. These nodes can be identified by calculating the distance between each node and the nearest wall node of its neighbors; the pseudo program for this smoothing operation is elaborated in Algorithm 2. The unstructured sub-grid 2 in Fig. 1 with 32 partitioned subzones illustrates the procedure of the parallel advancing method in Fig. 8. It can be clearly seen that only a few subzones adjacent to solid wall faces derive their exact minimum wall distances in Fig. 8(a), whereas the results are improved and the accuracy is recovered after advancing the front from the communication interface in Fig. 8(b) and the smoothing operation in Fig. 8(c). Therefore, the proposed parallel advancing front method is able to derive accurate wall distances for partitioned unstructured grids.
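The first front wave on a single partition can be sketched as follows; this is hedged Python with toy mesh containers, using a priority queue so that nearer front nodes are expanded first. Note that the distance is always measured straight to the current nearest wall node, not accumulated along edges.

import heapq, math

def advancing_front_wall_distance(coords, neighbors, wall_nodes):
    # Propagate (distance to nearest wall node, wall node id) over the mesh.
    dist = {n: math.inf for n in coords}
    nearest = {}
    heap = []
    for w in wall_nodes:
        dist[w], nearest[w] = 0.0, w
        heapq.heappush(heap, (0.0, w))
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist[n]:
            continue                         # stale heap entry
        for nb in neighbors[n]:
            w = nearest[n]                   # candidate wall node for neighbor
            cand = math.dist(coords[nb], coords[w])
            if cand < dist[nb]:
                dist[nb], nearest[nb] = cand, w
                heapq.heappush(heap, (cand, nb))
    return dist, nearest

coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (2.0, 1.0)}
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(advancing_front_wall_distance(coords, nbrs, wall_nodes=[0])[0])
# -> exact Euclidean distances to wall node 0 for this chain of nodes

The second front from the communication interface and the final smoothing pass (Algorithm 2) reuse the same propagation loop, only with different seed sets.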

        Fig. 8 Illustration of wall distance computation procedure via parallel advancing front method.

        Algorithm 2. Pseudo program for smoothing operation

! The partial grid of each processor is classified as
! grid-flag Di = 0 - unhandled / 1 - handled
if grid-flag Di = 0
    return
end if
for all nodes in the partitioned grid, do
    for all nearest wall nodes of its neighbors, do
        calculate the distance between the two nodes
        if the distance is smaller
            set its minimum wall distance as this distance
            set its nearest wall node as this nearest wall node
        end if
    end
    if this node is revised
        add all of its neighbor nodes to FrontNode-list
    end if
end
while FrontNode-list is not empty
    for all nodes in the FrontNode-list, do
        for all nearest wall nodes of its neighbors, do
            calculate the distance between the two nodes
            if the distance is smaller
                set its minimum wall distance as this distance
                set its nearest wall node as the nearest wall node
            end if
        end
        if this node is revised
            add its neighbor nodes to TempFrontNode-list
        end if
    end
    set FrontNode-list as TempFrontNode-list
    empty TempFrontNode-list
end while

        2.5.2. Updating for overset grid system

Once the minimum self-wall distances for each sub-grid have been computed, the overset grid assembly can be fulfilled as described in Section 2.1. The remaining task is to re-compute the exact wall distances for the whole overset grid system. The above parallel advancing front method is of a modular nature and is thus modified with little effort to extend its capability to the overset grid system. An extra advancing front procedure is subsequently implemented, simply replacing the wall boundary with the inter-grid boundary as the first front source in Fig. 7. The wall distances of the nodes on the boundary are compared with those evaluated between themselves and the nearest wall nodes of their donor cells in the other grid systems. If the re-computed distance is smaller, the wall distance information of this node on the inter-grid boundary is updated. The cells adjacent to the inter-grid boundary are regarded as the first front advancing into the grid region, and meanwhile the "advancing from communication interface" and "smoothing" operations can be directly reused to further recover the accuracy according to the flowchart in Fig. 7. The recovered accuracy of the minimum wall distance after updating for the overset grid system can be seen in Fig. 9.

        3. Feasibility and efficiency evaluation

In Section 2, the developed parallel overset grid methodology with implicit hole-cutting is introduced in detail, and several techniques for improving the computational efficiency are highlighted. To examine the feasibility and efficiency of the above method, particularly to evaluate the acceleration of the minimal wall distance calculation and the bookkeeping ADT technique, a four-sphere overlapping system is employed as the test case, as shown in Fig. 10(a). The domain around each sphere is meshed with an unstructured grid of about 10 million cells and organized in one LAYER. The background Cartesian mesh contains around 10 million cells, resulting in an overlapping grid system with a total of about 50 million cells. The overset grid assembly test was performed on a high-performance cluster with 2.6 GHz AMD EPYC 7H12 CPUs (64 cores on each CPU). The assembled overset grids of this case are shown in Figs. 10(b) and (c), where the accurate inter-grid boundaries clearly demonstrate the feasibility of the current overset grid method for very large-scale meshes.

During the test, the minimum wall distance fields for each individual sub-grid, as shown in Fig. 11(a), were computed first via the parallel advancing front method. Note that the wall distance contours in the background grid are given a constant value and those in the overlapping zones are incorrect at this stage. The re-computed wall distance contours after establishing the overset grid system are presented in Fig. 11(b), where the whole region derives the exact minimum wall distances. To test the minimal wall distance calculation efficiency of the parallel overset grid, the speedup ratio relative to the exhaustive search (generally regarded as an indicator of computational efficiency) is compared with other archived alternative methods,23 e.g., the sphere marching method, the octree method and the Eikonal equation method, as illustrated in Fig. 12.

        Fig. 9 Minimum wall distance field before and after updating for overset grid system.

Fig. 10 Overset grid system for the test case with four spheres.

        Fig. 11 Minimum wall distance contours.

Clearly, the parallel advancing front method is more efficient than the other methods in wall distance calculation for a varying number of processors. The present method achieves a speedup ratio of about three orders of magnitude, though a decreasing tendency of the parallel efficiency is noted due to load imbalance and the time expenditure of message communication. Fig. 12(b) plots the time consumption of the self-wall distance calculation and the updating for all sub-grids, where a similar parallel acceleration characteristic can be found.

        Fig. 12 Efficiency performance of wall distance computation.

As aforementioned, the proposed bookkeeping ADT technique is, in principle, able to accelerate the donor-cell searching during implicit hole-cutting. The performance of both ADT establishing strategies, i.e., bookkeeping ADT for local and bookkeeping ADT for whole, was tested with various numbers of processors by varying the division number ND. Following the technical routine in Section 2.2, the overlapping region or the whole region of each subzone in Fig. 10(a) is divided into ND^3 subregions (ND varies from 1 to 20 for the local ADT strategy and from 1 to 60 for the whole one), and subsequently the elements contained in each subregion are organized into its corresponding sub-ADT. As noted, ND only influences the time expenditure of the ADT establishment, the pre-cutting (blanking cells in solid walls) and the donor-cell searching processes. Fig. 13 presents the results related to the performance of the two bookkeeping ADT strategies. As shown in Fig. 13(a), compared with the basic ADT method (ND = 1), the two bookkeeping ADT strategies both significantly accelerate the hole-cutting process. For each group of tests with the same number of processors, the speedup ratio increases with the division number ND, but the growth gradually slows down for larger ND. The reason lies in the fact that it takes more time to establish the increasing number of sub-ADTs, as demonstrated in Fig. 13(b), where the task durations for ADT establishment, the pre-cutting process and the donor searching with 256 processors are compared. The task durations also indicate that, although the time consumption of ADT establishment increases with larger ND, the positive effect of the bookkeeping ADT technique is more evident in pre-cutting and donor-cell searching, particularly for the strategy of bookkeeping ADT for whole. Since bookkeeping ADT for whole involves all the cells of each sub-grid, the elements in each sub-ADT are more numerous than those of the local strategy; in principle it needs more time to accomplish the searching process when ND is small, while increasing ND can effectively improve the efficiency. It should be noted from Fig. 13(b) that the total times of the overset assembly by the two ADT strategies are nearly the same when ND increases to each strategy's own optimum. The reason is that, with an optimally large ND, the sub-ADT regions of both strategies are of almost the same size and contain roughly the same number of grid cells, and therefore the donor-cell search processes of the two strategies consume about the same time. When comparing the speedup ratios between the tests of the local ADT strategy with various processors in Fig. 13(a), one can note that, with the same division number (ND ≥ 5), the performance of the bookkeeping ADT degrades with increasing processors. This is mainly due to the fact that each partitioned subzone contains rather few grid cells when a large number of processors is used, so that more cells are repeated in different sub-ADTs and the benefit of the bookkeeping ADT technique, which is originally aimed at reducing the number of cells registered in each sub-ADT, diminishes.

Fig. 13 Overset grid assembly performance of the bookkeeping ADT technique at various division numbers ND in each direction.

        Fig. 14 ROBIN configuration.

Table 1 Task duration for hole-cutting based on bookkeeping ADT for local and for whole with various numbers of processors.

The CPU times of the individual tasks of the implicit hole-cutting routine, i.e., establishing the ADTs, pre-cutting, searching donor cells and defining the inter-grid boundary, are summarized in Table 1, where a comparison is also presented between the two strategies, bookkeeping ADT for local and bookkeeping ADT for whole, with ND being 10 and 40, respectively. For this particular four-sphere case, as the domains of all five sub-grids mostly overlap, most cells of both the background sub-grid and the four sphere sub-grids are located in the multiply overlapping region, and hence it took more time to construct the ADTs by the local strategy than by the whole one. Owing to the large portion of overlapping region in this case, ND = 10 is not optimal for the local ADT strategy in small-scale parallelization, and thus it is more time-consuming in the donor-cell search task than the method of bookkeeping ADT for whole. With larger ND, or with a large number of processors where the sub-grids are partitioned into smaller subzones, both strategies reach approximately the same performance.

In view of the parallel efficiency, it can be seen that the task durations of all the sub-tasks of the hole-cutting are dramatically reduced when using more processors. However, the scalability remains far from ideal, particularly for the donor-cell search. This is mainly attributed to the large load imbalance caused by the very uneven distribution of query points, while a load rebalance algorithm has not been implemented in the present study. Though the disadvantage of load imbalance emerges, the parallel overset grid method is feasible for improving the efficiency, especially when equipped with the bookkeeping ADT technique and the parallel advancing front wall-distance calculation method. Nevertheless, a more consistent accelerating performance can be expected if load balance is achieved by a rebalance algorithm for the hole-cutting.

Regarding the choice of the division number ND, a value of 5-15 is suggested for the local ADT strategy, while it is better to be larger than 40 for the ADT for whole strategy. Herein, ND is set to 10 and 40 by default for the two strategies, respectively, in the remaining cases of this paper.

        4. Applications in aerospace engineering

The efficiency of the implicit hole-cutting was evaluated and demonstrated in Section 3. As for the initial motivation of the present study, the developed overset grid methodology should be capable of solving complex problems in aerospace engineering with high efficiency, accuracy and robustness. The unsteady flows associated with several complex aerospace applications, including helicopters, store separation and component deployment, are therefore selected and simulated in this section to further examine the solver capability.

An in-house-developed URANS solver is adopted to perform the simulations on the overset grids. This solver solves the unsteady Reynolds-averaged Navier-Stokes equations closed with the k-ω Shear Stress Transport (SST) turbulence model on unstructured grids using the finite volume method in a fully implicit manner. The convection and diffusion terms are calculated by the upwind Roe flux difference splitting scheme and the central difference scheme, respectively. Second-order spatial accuracy is achieved by a weighted least-squares linear or Green-Gauss gradient reconstruction, with Venkatakrishnan's limiter36 applied to prevent oscillations near shock waves. For the viscous flux computation, the velocity and temperature gradients at the interface are obtained by averaging the values of its adjacent control volumes with an additional correction to avoid odd-even decoupling. The discretized linear system is solved by either an iterative Lower-Upper Symmetric Gauss-Seidel (LU-SGS) or a Krylov subspace type Generalized Minimal Residual (GMRES) algorithm. Time-accurate computation for unsteady problems is achieved by a dual time stepping marching solution. More details can be found in Ref. 16.
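The dual time stepping formulation is left implicit in this summary; a standard second-order (BDF2) dual-time form consistent with the description reads, as a sketch (the paper's exact discretization may differ):

∂W/∂τ + (3W^(n+1) − 4W^n + W^(n−1)) / (2Δt) + R(W^(n+1)) = 0

where W is the vector of conserved variables, R the spatial residual, Δt the physical time step, and τ the pseudo-time that is marched to convergence within each physical time step.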

        4.1. ROBIN model

The first validation case used herein is the NASA generic ROtor-Body INteraction (ROBIN) helicopter model,37 which can be regarded as a multi-body system with large relative motions. The fuselage and each blade are meshed with 2.3 and 0.8 million hexahedral cells, respectively. The background structured mesh contains around 16.5 million cells, as seen in Fig. 14(a), resulting in a total number of grid cells of around 22.0 million. In the wind tunnel test of the ROBIN model,37 the rotor spun at 2000 r/min with an advance ratio μ = 0.15. The cyclic blade pitch angle θ for each rotor is defined as a function of the azimuthal angle ψ in the standard first-harmonic form

θ(ψ) = θ0 + θ1c cos ψ + θ1s sin ψ

        where θ0=12.8°, θ1c=-2.2° and θ1s=-2.0°.
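As a quick worked check of this pitch law (assuming the first-harmonic form written above; angles in degrees):

import math

def blade_pitch(psi_deg, th0=12.8, th1c=-2.2, th1s=-2.0):
    psi = math.radians(psi_deg)
    return th0 + th1c * math.cos(psi) + th1s * math.sin(psi)

for psi in (0, 90, 180, 270):
    print(psi, round(blade_pitch(psi), 2))
# -> 10.6 at 0°, 10.8 at 90°, 15.0 at 180°, 14.8 at 270°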

The overset grid assembly and flow simulation were performed with 32 processors, and both strategies of bookkeeping ADT establishment were tested and compared. Fig. 14(b) illustrates the overset grid system after the overset grid assembly, where the areas bounded by the black lines are the inter-grid boundaries, indicating that the overset grid system was well assembled. The task durations of the hole-cutting process at one time step are presented in Table 2, and the total times spent using various numbers of processors are shown in Table 3. In this case, since the background mesh occupies most of the grid cells (16.5 million out of 22 million in total), the sub-ADTs in the background sub-grid established by the strategy of bookkeeping ADT for whole are still very large, and therefore it took much more time in the donor-cell searching compared with the ADT for local method. Compared with the basic ADT method, the bookkeeping ADT methods, with either the local or the whole strategy, took significantly less time to complete the hole-cutting process for all the test cases with various numbers of processors, demonstrating the obvious superiority of the bookkeeping ADT method over the basic one.

Fig. 15(a) plots the vortex rings shedding from the rotating system, contoured using the iso-surface of the Q-criterion at 500 and rendered by the pressure coefficient. The thrust coefficients of the ROBIN model as well as its components, i.e., the rotor and the fuselage, are plotted in Fig. 15(b). As seen, the force variation behaves with a periodic tendency after two rotating cycles, which indicates good convergence, and the mean thrust coefficient is in good agreement with the experimental data (underestimated by 5.5% by the CFD). To further validate the accuracy of the overset grid solver, comparisons of the time-averaged and instantaneous pressure distributions along the longitudinal center line of the fuselage between the computed results and the experimental data are presented in Figs. 15(c) and 15(d).

        4.2. Store separation trajectory prediction

Accurately predicting the separation movement of an external store is crucial, particularly in terms of defining safe operating envelopes. The wing-store configuration38 is employed as the second validation case to further examine the accuracy of the parallel overset grid method. Note that the effort of modelling the trajectory onset of separation also initiated the development of a six-degree-of-freedom (6-DOF) relative body-motion capability in the solver. The generic pylon/store geometric configuration and the global coordinate system are shown in Fig. 16(a); the benchmark wind tunnel tests for this case, conducted at the Arnold Engineering Development Center, are described in the work of Rolland39 and Lijewski and Suhs.40 The scenario in Table 4 is selected for the current simulation. The resulting overset grid system (with a total of 7.4 million grid cells) is presented in Fig. 16(b), and the CPU time consumption of the hole-cutting by these ADT strategies is shown in Table 5. The total times of hole-cutting with various numbers of processors are also presented in Table 6 for comparison. The two bookkeeping ADT establishing strategies show a very similar performance in hole-cutting, though the ADT for local method took a bit more time to establish the ADTs; again, a competitive edge of the bookkeeping ADT method over the basic ADT method can be observed in terms of efficiency for all the parallel test cases. The whole hole-cutting process by the bookkeeping ADT method consumed about 19 s on 32 processors for the grids with 7.4 million cells. Compared with the PUNDIT method tested by Roget et al.,27 the present parallel overset grid method shows a nearly equivalent performance, even though the present method involves a much heavier donor-cell search task than the PUNDIT method for the sake of more robustness and precision in dynamic unsteady computations. Two trajectory parameters, the Center of Gravity (CG) location and the CG velocity, for the total store separation process of 1.0 s are predicted and compared with the experimental data39 in Fig. 17. It is apparent that the global location and velocity of the CG show good agreement with the experimental data.39 Additionally, the pressure contours of the separation event are depicted in Fig. 18.

        Table 2 Task duration of hole-cutting for ROBIN case (32 processors).

Table 3 Total time of hole-cutting for ROBIN case with various numbers of processors.

        Fig. 16 Grid system for wing-store configuration.

        Table 4 Parameters of wing-store separation.

        4.3. Landing gears/cabin doors deployment

The structural design of landing gears heavily relies on the aerodynamic loading, especially when the landing gears are deploying and retracting. Herein, the first application is to predict the complicated unsteady flow around an aircraft with its cabin doors and landing gears performing deployment. The aircraft configuration consists of three cabin doors and their landing gears, two aft fins and one fuselage. For this case, nine unstructured sub-grids were generated around the components, with 16.8 million grid cells in total, as presented in Table 7. The cabin doors start to deflect after a fully converged steady-state solution is derived, and the landing gears start deploying once all the doors reach their prescribed positions; the motion setup can be found in Table 7. The freestream Mach number is set to 0.5, with an angle of attack α = 3.0° and an angle of sideslip β = −3.5° also considered.

The overset grid system during deployment is illustrated in Fig. 19, indicating that the complex overset grids are properly assembled. The duration of each hole-cutting task with 64 processors and the total consumed CPU time of the hole-cutting with various numbers of processors are presented in Table 8 and Table 9, respectively, where the results obtained by the basic ADT method are also given for comparison. As can be seen, the proposed bookkeeping ADT method is at least three times faster than the basic method in performing the hole-cutting in all the parallel test cases. Fig. 20 shows the surface pressure contours of each component and the streamlines during the whole deploying process. The results demonstrate the robustness and efficiency of the present parallel overset grid method for handling large-scale overlapping meshes with complex geometries.

        Table 5 Task duration of hole-cutting for wing-store case (32 processors).

Table 6 Total time of hole-cutting for wing-store case with various numbers of processors.

        Fig. 17 Temporal location and velocity variation of the center of gravity vs experimental data.

        Fig. 18 Store separation events colored by pressure coefficients.

        Table 7 Grid information and rotation parameters of each component.

        Fig. 19 Overset grid system illustration during deploying.

        Table 8 Task duration of hole-cutting for deploying case (64 processors).

Table 9 Total time of hole-cutting for deploying case with various numbers of processors.

        Fig. 20 Surface pressure contour and streamlines during deploying.

        4.4. Weapons released from geometrically complex fighter

The second demonstration case simulates the separation of two weapons released from a fighter with complex geometries. The fighter configuration, as shown in Fig. 21(a), is equipped with one drop tank and three missiles under each wing. The four sub-grids generated around the drop tank and the missiles are organized into one LAYER. The total overlapping sub-grids consist of about 17.6 million cells and 6.67 million nodes. The overset grid system at the initial time on multiple processors is presented in Fig. 21(b), and the hole-cutting task durations with 64 processors and the CPU time consumption with various numbers of processors are shown in Table 10 and Table 11, respectively. The data indicate that the hole-cutting of such a complex overset grid system can also be significantly accelerated by the proposed bookkeeping ADT method when compared with the basic one.

During the computation, the Mach number of the incoming free stream is 0.8, resulting in a chord-length Reynolds number of 5.5×10^6. Identically to the case in Section 4.2, both missiles under the wing are released in 6-DOF motion with a time step size Δt = 0.0005 s once the steady-state solution is fully converged. The mass of the inboard weapon is 420 kg while that of the outboard one is 152 kg. The CG displacements and velocities of each store during the 0.65 s separation time are predicted and presented in Fig. 22, from which different variations of the CG location in the X and Y directions can be observed between the inboard and outboard weapons. Due to its larger windward area and non-streamlined blunt nose, the inboard weapon suffers a bigger drag than the outboard one, and thus it moves faster and further downstream, i.e., in the X-direction; the pressure difference on the left and right sides of the outboard weapon, caused by the aerodynamic interaction in the region near the wingtip, makes the spanwise CG behavior of the outboard weapon differ from that of the inboard one. The hole-cutting results on a slice of the overset grid system and the flow fields of the separation events are presented in Fig. 23.

        Fig. 21 Configuration of a fighter carrying stores.

        Table 10 Task duration of hole-cutting for weapons-releasing case (64 processors).

        Table 11 Total time of hole-cutting for weapons-releasing case at various numbers of processors.

        Fig. 22 Center of gravity location and velocity for each store.

        4.5. High-speed helicopter

The final application presented in this paper is a high-speed helicopter configuration in forward flight. As shown in Fig. 24 (a), the full-scale model consists of a fuselage, a pair of coaxial contra-rotating rotors and a pushing propeller, and carries four stores. The fuselage was constructed in detail with a radar, a monitor, pylons and tails. These extremely complex geometries pose a great challenge to both the overset grid method and the flow solver. The cyclic pitch angle θ of the upper and lower rotors is defined as

        θ = θ0 + θ1c cos ψ + θ1s sin ψ

        Fig. 23 Overset grid system together with surface pressure coefficients and streamlines for separation events.

        Fig. 24 High-speed helicopter configuration.

        Fig. 25 Detailed views of the resulting overset grid system for 18 sub-grids around high-speed helicopter configuration.

        Table 12 Task duration of hole-cutting for high-speed helicopter case (64 processors).

        Table 13 Total time of hole-cutting for high-speed helicopter case at various numbers of processors.

        Fig. 26 Surface pressure coefficient contours during one quarter of main rotor rotation period.

where ψ is the blade azimuth angle, and θ0=6.0°, θ1c=-2.2° and θ1s=-2.0° in this case. The control phase Γ and the differential lateral cyclic pitch θds are set to 0°. The upper rotor of the coaxial contra-rotating system rotates counterclockwise at 300 r/min while the lower one rotates clockwise at the same speed. The 3.6 m diameter pushing propeller has eight twisted blades and rotates at 1200 r/min. According to the geometries and motions described above, 18 sub-grids were generated and organized into 2 LAYERs: a structured background grid in LAYER 1, and the 17 remaining sub-grids for the fuselage, rotor blades, hubs, stores and pusher propeller in LAYER 2, which are embedded in LAYER 1. The sub-grids comprise 39.2 million cells and 29.5 million nodes in total, and the resulting overset grid system is presented in Fig. 24 (b). Detailed views of the overset grids near the rotor hub, the pylon/store and the tail propeller are presented in Fig. 25, which indicates that the grid cells inside solid bodies are properly blanked and the hole-cutting algorithm works well for such an extremely complicated configuration. Table 12 presents the task duration of the large-scale 18-sub-grid hole-cutting process, and Table 13 shows the total hole-cutting time with different numbers of processors. As can be seen, accelerating rates similar to those of the previous cases were obtained by the proposed bookkeeping ADT method when compared with the basic ADT method. In this case, the helicopter model is in straight forward flight with an advance ratio of μ=0.3728. Fig. 26 presents the surface pressure coefficient contours during one quarter of the rotation period of the coaxial main rotors. The results of the overset grid assembly and flow computation in this case demonstrate the strong capability of the present method for complex aerospace engineering problems.
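For illustration, the pitch law above can be evaluated as in the following sketch, which also encodes the opposite senses of rotation of the coaxial pair; the function names and the sign convention of the azimuth are assumptions, not taken from the solver.

```cpp
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// First-harmonic blade pitch law with the values of this case.
double bladePitchDeg(double psiDeg) {
    const double theta0  =  6.0;   // collective pitch [deg]
    const double theta1c = -2.2;   // lateral cyclic pitch [deg]
    const double theta1s = -2.0;   // longitudinal cyclic pitch [deg]
    const double psi = psiDeg * kPi / 180.0;  // azimuth [rad]
    return theta0 + theta1c * std::cos(psi) + theta1s * std::sin(psi);
}

// Blade azimuth versus time for the coaxial pair: 300 r/min = 1800 deg/s,
// with opposite senses of rotation (sign convention assumed).
double azimuthUpperDeg(double t) { return  1800.0 * t; }
double azimuthLowerDeg(double t) { return -1800.0 * t; }
```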

        5. Conclusions and future work

A parallel implicit hole-cutting overset grid method was proposed in this paper by combining an unstructured overset grid with several efficiency-enhancement strategies, i.e., a fast wall-distance calculation algorithm, a bookkeeping ADT technique and a low-memory, efficient message-passing strategy, for simulating complex unsteady flows in aerospace applications.

The unstructured overset grid technique, with sub-grids hierarchically organized into LAYERs, allows different types of meshes to be overlapped and embedded, so that the quality and resolution of each mesh can be guaranteed independently. An efficient implicit hole-cutting and inter-grid boundary definition procedure was designed, allowing fully automatic implementation for either cell-centered or cell-vertex schemes. To accelerate the implicit hole-cutting process, a parallel advancing front method was developed to rapidly derive the wall-distance field, and a bookkeeping ADT technique was proposed to speed up the donor-cell searching. Test results indicate that the present parallel advancing front method achieves a speedup of two to three orders of magnitude over the archived wall-distance calculation methods, while the bookkeeping ADT performs the donor search several times faster than the basic ADT method. The strategy of locally establishing a bookkeeping ADT for each reciprocally overlapping region is recommended, since the test cases indicate that it is less case-dependent and more efficient in donor searching than building a single ADT for the whole grid, although it takes slightly more time to establish the ADTs. In addition, a message-passing strategy was presented to implement the parallel hole-cutting task with a low memory cost and satisfying efficiency. As demonstrated, the time expenditure of overset grid assembly is significantly reduced by these accelerating techniques.
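As a rough sketch of the front-propagation idea behind the wall-distance acceleration, the following generic Dijkstra-style pass advances a front of tentative distances outward from the wall nodes over a node-adjacency graph; it approximates wall distance by the shortest edge-path length and is not the paper's exact advancing front algorithm (the input format and names are assumptions).

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[n] lists (neighbor, edge length) pairs of node n; wallNodes are the
// nodes lying on solid walls, whose distance is zero by definition.
std::vector<double> wallDistance(
    const std::vector<std::vector<std::pair<int, double>>>& adj,
    const std::vector<int>& wallNodes) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> d(adj.size(), inf);
    using Item = std::pair<double, int>;  // (tentative distance, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> front;
    for (int w : wallNodes) { d[w] = 0.0; front.push({0.0, w}); }
    while (!front.empty()) {
        auto [dist, n] = front.top();
        front.pop();
        if (dist > d[n]) continue;      // skip stale queue entries
        for (auto [m, len] : adj[n]) {  // relax neighbors: the front
            if (d[n] + len < d[m]) {    // advances outward from the walls
                d[m] = d[n] + len;
                front.push({d[m], m});
            }
        }
    }
    return d;
}
```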

The parallel overset grid method was successfully applied to several complicated unsteady aerodynamic problems in aerospace engineering, i.e., landing gear and cabin door deployment, dual weapons separating from a fighter, and a high-speed helicopter in straight forward flight, demonstrating its capability to simulate extremely complex configurations with multiple bodies undergoing large relative movement. The results show the robustness of this method for complicated unsteady problems and suggest that it is an efficient way to deal with unsteady dynamic problems.

Parallel scalability is still insufficient in the present overset grid method, mainly due to the inherent load imbalance of the donor-cell searching task caused by the inhomogeneous distribution of query nodes in the overlapping regions among partitioned subzones. Future work will implement load-rebalancing algorithms in the present method to enhance its scalability on massive numbers of processors.

        Declaration of Competing Interest

        The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

        Acknowledgements

The present work is supported by the National Natural Science Foundation of China (Nos. 11672133 and 12002161), and the Open Foundations of EDL Laboratory, China (No. EDL19092111). Support from the National Science Foundation of Shaanxi Province, China (No. 2021JQ-078), the Fundamental Research Fund of Zhuhai, China (No. ZH22017003210011PWC) and the Aeronautical Science Foundation of China (No. F2021110) is acknowledged as well.
