An efficient large-scale mesh deformation method based on MPI/OpenMP hybrid parallel radial basis function interpolation

2020-07-02 03:05:26 Zhong ZHAO, Rong MA, Lei HE, Xinghua CHANG, Laiping ZHANG

CHINESE JOURNAL OF AERONAUTICS, 2020, Issue 5

Zhong ZHAO, Rong MA, Lei HE, Xinghua CHANG, Laiping ZHANG*

a State Key Laboratory of Aerodynamics, China Aerodynamics Research and Development Center, Mianyang 621000, China

b Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China

KEYWORDS Mesh deformation; Moving mesh generation; MPI/OpenMP hybrid parallel computing; Parallel radial basis function interpolation; Unstructured hybrid grid

Abstract An efficient MPI/OpenMP hybrid parallel Radial Basis Function (RBF) strategy for both continuous and discontinuous large-scale mesh deformation is proposed to reduce computational cost and memory consumption. Unlike conventional parallel methods, in which all processors use the same surface displacements and perform the same operations, the present method employs a different set of surface points and a different influence radius for each volume point, together with an efficient geometric search strategy. The deformed surface points, also called Control Points (CPs), are stored on each processor. The displacement of each spatial point is interpolated from only the 20-50 nearest control points, and the local influence radius is set to 5-20 times the maximum displacement of those control points. To shorten the search for the nearest control point cloud, an Alternating Digital Tree (ADT) algorithm for complex 3D geometry is designed based on an iterative bisection technique. In addition, an MPI/OpenMP hybrid parallel approach is developed to reduce the memory cost on each High-Performance Computing (HPC) node in large-scale applications. Three 3D cases, including the ONERA-M6 wing and a commercial transport airplane standard model with up to 2.5 billion hybrid elements, are used to test the present mesh deformation method. Robustness and high parallel efficiency are demonstrated by a wing deflection case with a maximum bending angle of 45° and a parallel efficiency of more than 80% on 1024 MPI processors. In addition, the applicability to both continuous and discontinuous surface deformation is verified by interpolating the projection displacements of surface points, which point in opposite directions, to the spatial points.

1. Introduction

Mesh deformation is one of the most important issues in Computational Fluid Dynamics (CFD), arising in simulations of unsteady multi-body separation, fluid-structure coupling, geometry optimization, morphing aircraft design, bio-fluid studies, and so on. In CFD, a computational mesh is first generated to discretize the flow field into a large number of elements, which serve both to discretize the equations and to carry the data. For unsteady problems with moving boundaries, once the geometry to which the mesh conforms is deformed, the mesh must be deformed as well. The capability and efficiency of the mesh deformation operation are crucial for the whole procedure of unsteady simulation. For example, during multi-body separation or shape optimization, a series of mesh deformations can seriously extend the simulation time, even though the flow field simulation runs in parallel. Moreover, the development of CFD for air vehicle design is driven by the capability of modern High-Performance Computing (HPC) systems, which have made significant progress in recent years. Therefore, efficient parallel mesh deformation techniques should be developed to meet the needs of CFD simulations and to keep pace with the development of HPC hardware.

To achieve this goal, several mesh deformation algorithms have been developed by CFD researchers in recent decades, besides those used in the Finite Element Method (FEM), such as the nonlinear elasticity analogy method [1,2] and the method based on solving the Laplacian equation [3]. Generally speaking, the mesh deformation strategies widely used in CFD applications can be categorized into two classes: the spring analogy method and algebraic interpolation.

The spring analogy approach was first proposed by Batina [4]. It is the earliest moving mesh method and has been widely used for many years. In this method, each mesh point is assumed to be connected to its neighbor points by a network of springs, in which the stiffness is inversely proportional to the edge length. Static equilibrium is maintained at every mesh node by the springs attached to it. Once the balance is broken by external forces (morphing and/or movement of the physical boundary), the mesh points move, shrinking or stretching the edges, to rebalance the spring system. If two points are close to each other, the spring force repels them, and vice versa. As a result, a large matrix system of static equilibrium equations needs to be solved for spring relaxation, which makes the method extremely expensive computationally. Another disadvantage is that this approach is only suitable for small deformations. Large deformations lead to negative-volume elements due to the intersection of mesh edges, particularly for anisotropic elements in the boundary layer. To overcome this shortcoming, torsional spring techniques were introduced that consider not only the edge springs but also non-linear torsion [5], which adds coding complexity because of the complicated connectivity data structures. Zhang et al. [6] proposed a hybrid method that couples the spring analogy with a re-meshing approach to solve large deformation problems such as multi-body separation. Although CFD researchers have made great efforts to improve the performance (for example, Liu et al. [7] developed a Delaunay graph mapping approach on background grids to improve the efficiency), the spring analogy methods still suffer from low efficiency and poor robustness for large deformations.

The second type of popular mesh deformation technology is based on algebraic interpolation, such as Trans-Finite Interpolation (TFI) [8-10] and Radial Basis Function (RBF) based interpolation [11-14]. In these methods, the displacement of the interior mesh nodes is interpolated from the morphing boundary. TFI was originally used to generate static structured meshes [10], and has been extended to aeroelastic and optimization problems in recent years [8,9]. An improved method using exponential blending functions was further proposed to improve the quality and robustness of TFI [9], and it brought benefits in both efficiency and robustness for complex configurations. However, applications of TFI are limited to multi-block structured meshes because the interpolation depends on the mesh topology.

Another widely used interpolation approach is the RBF-based method, which was first used for mesh deformation by Boer et al. [11]. This method has attracted widespread attention due to its advantages regarding efficiency, preservation of mesh quality, and independence from mesh connectivity information [12-14]. Through RBF interpolation, the displacements or the new coordinates of the interior mesh points are obtained directly by interpolating the movement of the boundary vertices to the field points. The main difference from TFI is that RBF interpolation depends only on the mesh coordinates and not on connectivity, so it can be used for arbitrary element types in both structured and unstructured meshes. Additionally, since the coefficient matrix can be prepared and stored in pre-processing, no extra computation except matrix-vector multiplication is required during an unsteady simulation. Consequently, the RBF method is very efficient for mesh deformation in unsteady simulations. However, the memory consumption of this method is very high due to the large sparse basis function coefficient matrix, which causes difficulties in three-dimensional complex cases with hundreds of millions of elements. To solve this problem, Rendall and Allen proposed a greedy approach to reduce the matrix dimension [13,14] by selecting a subset of the moving surface points as control points, based on displacement error minimization with a correction process, which significantly reduces the memory cost. To improve efficiency further, Kedward et al. [15] developed a new greedy approach in which the correction stage is removed by capturing global and local motions at multiple scales. Furthermore, Wang et al. [16] improved the RBF interpolation efficiency by dividing the deformation into a series of sub-steps and adjusting the radius according to the maximum error. The RBF method has been further improved and applied by other researchers. For example, Wang et al. [17] combined the RBF approach with the Delaunay graph mapping method, similar to the previous work in Ref. 7, and Michler used it for aircraft control surface deflection [18].

Although the adaptability of the RBF method to complex configurations has been improved by data reduction and other techniques, its efficiency and practicability should be improved further, especially for large-scale simulations with billions of elements. As mentioned above, CFD solvers usually run in a parallel environment; however, the mesh deformation module is usually implemented serially, which makes it the bottleneck of the whole simulation. Some researchers have therefore focused on developing parallel RBF interpolation, especially with data reduction schemes, to improve mesh deformation efficiency [19,20]. In these parallel strategies, each processor stores the same global or reduced set of selected points and performs the same matrix-vector multiplications, which is essentially the same as the serial version. This type of parallel RBF strategy with data reduction still needs further improvement. As mentioned in Ref. 10, although the number of active control points is substantially reduced, the number of selected points is almost proportional to the total number of points in the mesh. Hence, for a large-scale mesh with billions of elements, the number of selected surface points increases significantly.

Fig. 1 Discontinuous surface deflecting comparison using greedy-based RBF interpolation with different numbers of selected points (red).

More importantly, for problems with discontinuous surface movement, the greedy approach may select points in inappropriate positions, which results in inaccurate deformation on and near the discontinuous surface. Different from the cases described in Ref. 21, discontinuous interface cases may include surface deflection around a geometric entity edge (as shown in Fig. 1), the projection of inserted adaptive or high-order points in different directions on the upper and lower surfaces near the trailing edge of a thin wing, and so on. Fig. 1 compares discontinuous surface folding (rotation around a line) using RBF interpolation based on data reduction with 100 and 500 points selected as control points. For this type of surface motion, the surface folding is discontinuous, and inaccurate surface and volume deformation can be observed near the discontinuous region if only a small number of surface points are selected (Fig. 1(a)). To guarantee surface accuracy, the number of reference points must be substantially increased, and the selected points should cluster in the folding region (Fig. 1(b)).

This work aims to develop a fully parallel RBF approach for mesh deformation in three-dimensional applications with billions of unstructured elements. The paper is organized as follows. In Section 2, the original RBF interpolation formulation is briefly introduced. In Section 3, the parallel RBF strategy is presented in detail, including the MPI/OpenMP hybrid parallel mode, the nearest point selection, and the determination of the local influence radius. In Section 4, the robustness and efficiency of the present method are demonstrated by three cases, including the wing deflection of a transport airplane with up to 2.514 billion elements on 3072 CPU cores. Finally, concluding remarks are given in the last section.

2. RBF interpolation formulation

First, the original Radial Basis Function (RBF) interpolation approach is briefly introduced. Mesh deformation is achieved by interpolating boundary displacements, such as surface point movements, to the interior points. The formulation of the RBF interpolation [11-17] can be written as

where f(x) is the displacement of an interior point at location x, and x_i is the position of the ith point on the moving surfaces; these points are called control points (CPs). α_i is the coefficient determined by the interpolation conditions, which require exact recovery of the original function value at each surface point x_i. The interpolation basis function φ (Wendland's C2 function [22,23] is used in this work) is defined as

Fig. 2 Parallel RBF strategy.

where R is the influence radius of the control points on the moving boundary. The influence radius strictly limits the influence of the surface deformation to a support region: the interpolated displacement decreases to zero for volume points that lie outside the influence region. In general, the influence radius is an important parameter for robustness.
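For reference, the interpolant and the basis function described above can be written out as follows. This is a sketch in the standard form for compactly supported RBFs with Wendland's C2 function; the symbol ξ for the scaled distance is an assumption, and no equation numbers are assigned because the original numbering cannot be confirmed from the text.

```latex
% RBF interpolant over the N control points x_i
f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i \, \varphi(\xi_i),
\qquad
\xi_i = \frac{\lVert \mathbf{x} - \mathbf{x}_i \rVert}{R}

% Wendland's C2 basis function: compact support, zero beyond the radius R
\varphi(\xi) =
\begin{cases}
(1 - \xi)^4 \, (4\xi + 1), & 0 \le \xi < 1, \\
0, & \xi \ge 1
\end{cases}
```

With this form, φ(0) = 1 at a control point and φ falls smoothly to zero at distance R, which is the compact support property stated in the text.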

The coefficients α_i are determined from the surface compatibility condition. Applying Eq. (1) to the solid wall, with s denoting the surface, the displacement of a surface point f(x_s) is

where

and

In the equations above, ΔX_s, ΔY_s, ΔZ_s and M_s are all known quantities on the deformed solid wall, and A_x, A_y and A_z can be solved from Eq. (5). Then the displacements of the interior nodes are calculated by

where the subscript v denotes volume points (or interior points) and M_v is defined similarly to Eq. (6).
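The procedure of this section can be sketched in a few lines of Python. The sketch assumes the standard Wendland C2 form, builds M_s from the control points, solves one linear system per coordinate direction for A_x, A_y and A_z, and then evaluates the interior displacements one row of M_v at a time. The function names (`wendland_c2`, `solve_dense`, `rbf_deform`) and the plain Gaussian elimination are illustrative choices, not the authors' implementation.

```python
import math

def wendland_c2(dist, radius):
    """Wendland C2 basis: (1 - xi)^4 (4 xi + 1) for xi < 1, zero otherwise."""
    xi = dist / radius
    return (1.0 - xi) ** 4 * (4.0 * xi + 1.0) if xi < 1.0 else 0.0

def solve_dense(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def rbf_deform(surface_pts, surface_disp, volume_pts, radius):
    """Interpolate surface displacements to volume points with one global system."""
    # M_s[i][j] = phi(|x_si - x_sj|): the surface coefficient matrix.
    m_s = [[wendland_c2(math.dist(p, q), radius) for q in surface_pts]
           for p in surface_pts]
    # One coefficient vector per coordinate direction (A_x, A_y, A_z).
    coeffs = [solve_dense(m_s, [d[axis] for d in surface_disp])
              for axis in range(3)]
    result = []
    for v in volume_pts:
        # One row of M_v: phi(|x_v - x_sj|) for every control point j.
        row = [wendland_c2(math.dist(v, q), radius) for q in surface_pts]
        result.append(tuple(sum(w * a for w, a in zip(row, coeffs[axis]))
                            for axis in range(3)))
    return result
```

Evaluating `rbf_deform` at a control point returns that point's prescribed displacement (up to round-off), and a volume point farther than `radius` from every control point does not move. The dense matrix has one row and column per control point, which is the memory cost the following sections set out to avoid.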

3. Parallel RBF strategy

3.1. Basic idea

In the original RBF method, x_i (i = 1, 2, ···, N) is a point set including all of the moving surface points, and N is the number of surface points. The efficiency of the RBF method may become very poor for large-scale meshes because of the large number of control points. To overcome this drawback, Rendall and Allen [13] proposed two improved techniques.

The first one is based on data reduction, in which a reduced subset of surface points is selected by estimating the function error. The selection of control points starts from a random initial point, and new points are then added to the point list one by one by a greedy algorithm to minimize the error estimate. This method can greatly reduce the number of surface control points for general mesh deformation problems. However, the drawbacks analyzed in the introduction make it hard to apply to discontinuous surface deformation problems, where the selected subset of points may grow substantially and cluster in the discontinuous region.

The other one is based on the Partition of Unity (PoU) idea [23-26]. To meet the demand for efficient fluid-structure coupling simulations, a pointwise form of the PoU approach was developed for surface grid interpolation. In this approach, the RBF interpolation is applied to each aerodynamic surface grid point separately, so that each aerodynamic point depends only on a reduced set of N nearest structural surface points found by a tree-type search method. Although a large number of matrices must be inverted in each deformation time step, the inversions are efficient and the memory consumption is small because of the small matrix dimension. To smooth the interpolation, each aerodynamic point, after being interpolated from the structural surface, is smoothed using M weighted nearest aerodynamic points.

In this work, to handle large-scale mesh deformation, a task-based parallel RBF approach is proposed, in which a modified pointwise PoU-based RBF interpolation method is applied on each processor. A small RBF interpolation system is solved for each volume point separately, using only a specified small number of nearest surface points as the control points. Unlike the fluid-structure PoU version, the smoothing operation is removed to suit large-scale parallel computing, because the search for the M nearest volume points is hard to accomplish when each processor deals with a completely different mesh partition. In fact, the supporting point clouds of different volume points partially overlap and change smoothly, even in regions of discontinuous surface deformation. The examples in Section 4 demonstrate that the mesh quality is maintained even if the spatial smoothing procedure is omitted. In the following subsections, the parallel strategy is introduced first, and then the entire procedure is presented.
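A minimal sketch of the pointwise idea follows, under stated assumptions: the nearest control points are found by a brute-force sort (a stand-in for a tree search), the influence radius is simply passed in as a parameter, and all function names are hypothetical. Each call solves its own small system whose dimension equals the number of selected control points.

```python
import math

def wendland_c2(dist, radius):
    """Wendland C2 basis: (1 - xi)^4 (4 xi + 1) for xi < 1, zero otherwise."""
    xi = dist / radius
    return (1.0 - xi) ** 4 * (4.0 * xi + 1.0) if xi < 1.0 else 0.0

def solve_dense(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def deform_point(vol_pt, ctrl_pts, ctrl_disp, n_near, radius):
    """Displace one volume point using only its n_near nearest control points."""
    # Brute-force nearest-neighbour selection (assumption: replaces a tree search).
    order = sorted(range(len(ctrl_pts)),
                   key=lambda i: math.dist(vol_pt, ctrl_pts[i]))[:n_near]
    pts = [ctrl_pts[i] for i in order]
    disp = [ctrl_disp[i] for i in order]
    # Small local system: n_near x n_near instead of N x N.
    m_local = [[wendland_c2(math.dist(p, q), radius) for q in pts] for p in pts]
    row = [wendland_c2(math.dist(vol_pt, q), radius) for q in pts]
    return tuple(
        sum(w * a for w, a in zip(row, solve_dense(m_local, [d[ax] for d in disp])))
        for ax in range(3))
```

Because every volume point is handled independently, the loop over volume points can be distributed across processors without any communication once the control points are available locally.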

3.2. Parallel strategy

The most important issue in solving the RBF system is obtaining the inverse of the coefficient matrix M_s. One feasible way is to invert it using a math library such as PETSc. However, this is difficult for engineering applications on large-scale meshes with up to billions of elements because of the huge memory consumption. Another way is based on task-parallel computing, in which every processor performs the same PoU-based RBF interpolation on its own partition. The latter way is chosen in this work.

Fig. 2 illustrates the parallel architecture and the running procedure of our parallel RBF strategy. First, the global computational mesh is divided into sub-zones using the open-source code METIS [27], and the sub-zones are assigned to different processors (the number of zones on a CPU processor depends on the hybrid parallel strategy, which is discussed in detail later). After that, the control point data in each zone, including coordinates and displacements, are gathered to the server processor by MPI communication. Then the server processor (MPI rank 0) sends the collected global control point data on the wall back to each processor efficiently through a binary tree-type communication structure.
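The communication pattern can be illustrated without MPI. The sketch below lists, round by round, which rank sends to which in a binomial-tree broadcast from rank 0. The text only states that a binary tree-type structure is used, so the exact pairing is an assumption, and `tree_broadcast_rounds` is a hypothetical helper rather than part of any MPI library.

```python
def tree_broadcast_rounds(n_procs, root=0):
    """Return, per round, the (sender, receiver) pairs of a binomial-tree broadcast."""
    rounds = []
    have_data = 1                      # number of ranks holding the data so far
    while have_data < n_procs:
        pairs = []
        for offset in range(have_data):
            target = offset + have_data
            if target < n_procs:
                # Ranks are counted relative to the root and wrapped around.
                pairs.append(((offset + root) % n_procs,
                              (target + root) % n_procs))
        rounds.append(pairs)
        have_data *= 2                 # every holder has informed one new rank
    return rounds

if __name__ == "__main__":
    for step, pairs in enumerate(tree_broadcast_rounds(8)):
        print(step, pairs)
```

The number of ranks holding the data doubles each round, so P processors are reached in ceil(log2 P) rounds (10 rounds for 1024 processors), compared with P - 1 sequential sends from the server.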

In each zone, as in the serial RBF version, the task-based parallel RBF strategy is executed independently on each processor, and the displacements of the volume points are interpolated by the RBF approach one by one. However, if the original serial RBF were executed directly, the problem of expensive memory and computing cost would arise again even on an HPC system, since all of the global control points would still be used. In fact, the mesh quality and deformation are mainly determined by the nearest supporting point cloud. In the current work, local control points, instead of all control points, are therefore chosen as the RBF control points for each volume point that needs to be deformed.

3.3. Selection of nearest surface control points

In the present parallel RBF approach, the efficiency of selecting the nearest surface control points greatly influences the total deformation time. If the number of global control points is N, the complexity of a direct search for the nearest points of each volume point is O(N), which would significantly reduce the efficiency of the RBF method. An efficient alternative is the Alternating Digital Tree (ADT) algorithm for 3D geometric searching, which reduces the computational complexity to O(log N) [28].

The pseudo code in Algorithm 1 describes the search procedure for the nearest CP subset using the ADT data structure. The basic idea of this algorithm is to reduce or enlarge the search scope iteratively by bisection along each 3D coordinate axis until a suitable number of nearest control points is found. Generally, 10-100 points are adequate for local RBF interpolation, as the test cases in Section 4 show. To improve the ADT search efficiency, a maximum node number, N_max, is set to limit the bisection depth in the ADT, which means that the search is interrupted once the number of ADT nodes reaches N_max. L_node is set to 0.1 times the geometry size.

Algorithm 1 Local nearest control point searching by ADT algorithm.
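The bisection idea behind the search can be sketched as follows. This is not the authors' pseudo code: the box query is done by brute force where an ADT would be used, the box is a cube rather than being bisected per axis, and the names `n_min`, `n_max`, `half_width` and `max_iter` are hypothetical.

```python
def nearest_by_box_bisection(query, points, n_min, n_max, half_width, max_iter=50):
    """Bisect the half-width of a cube centred on `query` until it contains
    between n_min and n_max points; return the indices of those points.
    `half_width` must be positive."""
    def in_box(width):
        # Brute-force box test; an ADT would answer this query in O(log N).
        return [i for i, p in enumerate(points)
                if all(abs(p[a] - query[a]) <= width for a in range(3))]

    low, high = 0.0, half_width
    # Enlarge the box until it holds at least n_min points (or all of them).
    while len(in_box(high)) < min(n_min, len(points)):
        low, high = high, 2.0 * high
    found = in_box(high)
    for _ in range(max_iter):          # cap on the bisection depth
        if len(found) <= n_max:
            break
        mid = 0.5 * (low + high)
        trial = in_box(mid)
        if len(trial) < n_min:
            low = mid                  # box too small: raise the lower bound
        else:
            high, found = mid, trial   # still enough points: keep shrinking
    return found
```

The returned set is the content of a box, not an exact Euclidean nearest-neighbour list, which is acceptable here because the interpolation only needs a compact cloud of nearby control points.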

3.4. Influence radius definition

In the original RBF formulation, the influence radius R in Eq. (3) is set to a single global value for all points that need to be deformed. In the current parallel RBF approach, however, because each volume point has its own list of surface control points, R is defined as 5-20 times the maximum displacement of the local control point subset.
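The local definition of R amounts to one line of arithmetic per volume point. The sketch below assumes that "maximum displacement" means the largest displacement magnitude in the local subset; the function name and the default factor are illustrative.

```python
import math

def local_influence_radius(local_disp, factor=10.0):
    """Influence radius: `factor` times the largest displacement magnitude
    among the control points selected for one volume point."""
    return factor * max(math.hypot(*d) for d in local_disp)

if __name__ == "__main__":
    subset = [(0.0, 0.0, 0.1), (0.3, 0.4, 0.0)]   # hypothetical displacements
    print(local_influence_radius(subset, factor=5.0))
    print(local_influence_radius(subset, factor=20.0))
```

A volume point whose nearest control points barely move therefore gets a small radius, so regions far from the deforming part of the surface stay essentially fixed.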

The influence of the deformation is limited to the nearby region, whether the surface deformation is continuous or discontinuous, which makes the local influence radius R reasonable for local parallel RBF interpolation. The following test cases demonstrate the feasibility of the present approach.

3.5. MPI/OpenMP hybrid parallel mode

In the proposed parallel RBF interpolation algorithm, the global surface control points are gathered to the server processor and then broadcast efficiently to the other processors. The drawback of this procedure is the high memory cost for complex simulations with hundreds of millions of grid points, since the massive set of global control points is stored on each processor. To overcome this shortcoming, an MPI/OpenMP hybrid parallel mode is designed for multi-core HPC systems.

Fig. 3 illustrates our MPI/OpenMP hybrid parallel communication model, which is designed for the TianHe-II system at Sun Yat-sen University and an in-house HPC system at the China Aerodynamics Research and Development Center. The TianHe-II system consists of 16,000 nodes containing 32,000 Intel Xeon E5-2692 CPUs, with 24 cores and 64 GB of memory in each node. The in-house HPC system is equipped with FT-1500A CPUs, with 16 cores and 32 GB of memory in each node.

In each HPC computational node, the memory is shared by all CPU cores within the node, and data are exchanged with other nodes through the Message Passing Interface (MPI). In accordance with this architecture and communication model, after each mesh partition zone has been assigned to a CPU core, the global control points, collected from the other zones by MPI communication, are stored in each HPC node as only one or a few copies. Then the displacements of the volume points in all zones are interpolated simultaneously, using OpenMP to parallelize the zone loop over the threads in each HPC node, as shown in Algorithm 2.

An unstructured mesh of the JAXA standard model with 52 million elements from the 3rd AIAA High Lift Prediction Workshop is used to test the memory cost of the MPI/OpenMP hybrid parallel mode. Table 1 compares the memory consumption of the surface mesh structure, including both point locations and topological connectivity, for different numbers of MPI processors and OpenMP threads. The number of surface data copies decreases as the number of MPI processors decreases, which also reduces the total memory consumption.
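The memory argument reduces to counting copies: one copy of the surface data per MPI process, shared by that process's threads. The sketch below makes this explicit. The 100 MB surface size is a hypothetical placeholder, not a value from Table 1, and `surface_copies_memory` is an illustrative name.

```python
def surface_copies_memory(n_cores, threads_per_process, surface_mb):
    """Return (number of MPI processes, total MB held in control-point copies),
    assuming one copy per MPI process shared by its OpenMP threads."""
    if n_cores % threads_per_process:
        raise ValueError("threads_per_process must divide n_cores")
    n_mpi = n_cores // threads_per_process
    return n_mpi, n_mpi * surface_mb

if __name__ == "__main__":
    surface_mb = 100.0                 # hypothetical size of one surface copy
    for threads in (1, 12, 24):        # one 24-core node
        print(threads, surface_copies_memory(24, threads, surface_mb))
```

On a 24-core node, pure MPI (one thread per process) holds 24 copies, while one MPI process with 24 threads holds a single copy, which is the trend the comparison in the text describes.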

Fig. 3 MPI/OpenMP hybrid parallel mode.

Table 1 Memory cost of different numbers of MPI processors and OpenMP threads.

Fig. 4 Grid partition interfaces and wing positions of M6 wing.

Algorithm 2 OpenMP parallel computing in each node.
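The zone loop of the hybrid mode can be mimicked in Python with a thread pool standing in for OpenMP threads. This is an analogy only: in a compiled code the loop over zones would carry an OpenMP `parallel for` directive, and Python threads illustrate the shared-memory access pattern rather than the speed-up. `deform_zones` and the toy kernel `shift_zone` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def deform_zones(zones, control_points, deform_zone, n_threads=4):
    """Deform every zone of one node in parallel, sharing one control-point copy."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Every task reads the same `control_points` object; nothing is copied.
        futures = [pool.submit(deform_zone, zone, control_points) for zone in zones]
        # Results are collected in zone order, independent of completion order.
        return [f.result() for f in futures]

def shift_zone(zone, control_points):
    """Toy kernel (hypothetical): shift every point by one shared displacement."""
    dx, dy, dz = control_points["disp"]
    return [(x + dx, y + dy, z + dz) for x, y, z in zone]

if __name__ == "__main__":
    zones = [[(0.0, 0.0, 0.0)], [(1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]]
    shared = {"disp": (0.0, 0.0, 1.0)}
    print(deform_zones(zones, shared, shift_zone, n_threads=2))
```

The point of the pattern is that the control points are read-only during interpolation, so the threads need no locking and one copy per node is sufficient.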

4. Examples and applications

To validate the capability of the present method for moving boundary problems, several typical 3D configurations, including a case with up to 2.5 billion elements, are used to test the MPI/OpenMP parallel RBF approach in a parallel environment with up to 3072 cores.

4.1. ONERA-M6 wing deflection

The ONERA-M6 wing deflection is a typical case for testing mesh deformation capability in fluid-structure interaction and configuration optimization. It is selected here to test the basic characteristics of the current mesh deformation method. The initial static mesh contains 0.47 million elements, including 0.17 million tetrahedrons, 0.29 million prisms and 30 thousand pyramids. The height of the first layer is 8×10⁻⁵. The computational mesh is divided into 7 partitions and is deformed in parallel on the corresponding MPI processors.

Fig. 4 shows the grid partition interfaces and wing positions at different deflection steps. The bending deflection is defined as

Fig. 5 Effect of number of CPs on mesh quality, with maximum flapping angle of 20°.

Fig. 6 Effect of used number of control points on minimum skewness.

where Δβ is the rotation angle of each step, L_chord is the chord length of the wingspan, and z is the spanwise position. The wing deformation is defined as a rotation of all surface points around the wing root at z = 0.

The face skewness angle is chosen to measure the quality of the deformed mesh. It is defined as the complementary angle between the face normal and the vector from the face center to the neighbor cell center. The skewness angle varies from 0° to 90°, representing the worst and the best quality, respectively. The mesh is unsuitable for CFD simulation if negative skewness angles are found.
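The definition above translates directly into a short function. The sketch assumes that the face normal is oriented toward the neighbor cell, so that a well-shaped face gives an angle near 90° and an inverted cell gives a negative angle; `face_skewness_deg` is an illustrative name.

```python
import math

def face_skewness_deg(face_normal, face_center, neighbor_center):
    """Complementary angle (degrees) between the face normal and the vector
    from the face center to the neighbouring cell center."""
    to_cell = [c - f for c, f in zip(neighbor_center, face_center)]
    dot = sum(n * d for n, d in zip(face_normal, to_cell))
    norm = math.hypot(*face_normal) * math.hypot(*to_cell)
    cos_angle = max(-1.0, min(1.0, dot / norm))   # guard against round-off
    return 90.0 - math.degrees(math.acos(cos_angle))

if __name__ == "__main__":
    normal, center = (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)
    print(face_skewness_deg(normal, center, (0.0, 0.0, 1.0)))   # aligned: best
    print(face_skewness_deg(normal, center, (1.0, 0.0, 1.0)))   # tilted
    print(face_skewness_deg(normal, center, (0.0, 0.0, -1.0)))  # inverted cell
```

A cell center that lies on the wrong side of the face makes the angle between the two vectors exceed 90°, which is how negative skewness values signal tangled elements.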

Fig. 7 Time consumption of each mesh deforming step.

Different from the original RBF method, only a few nearest CPs rather than the global CPs are selected in our improved method. The best way to test the impact of the local selection strategy on the mesh quality would be to compare the mesh skewness obtained with local nearest CPs and with global CPs. However, the comparison with global CPs is hard to carry out because of its expensive resource consumption. Instead, we compared the deformed mesh quality for different numbers of selected local CPs. As shown in Fig. 5, up to 200 CPs have been used to deform the M6 wing by a 20° deflection according to Eq. (9). When fewer than 30 CPs are adopted, the deformed mesh is invalid. When more than 30 CPs are used, the mesh quality is guaranteed and converges to the original (undeformed) mesh quality, which indicates that the improved RBF method using local nearest CPs is practical for mesh deformation.

Fig. 6 compares the mesh quality for different numbers of control points, where the RBF influence radius is set to 50 times the maximum displacement of the local CPs. The minimum skewness is negative unless more than 30 nearest control points are used, and the minimum angle increases with the number of CPs.

Fig. 7 shows the time consumption of each wing deflection step for different numbers of CPs. The more control points are used, the more time is needed, since the dimension of the matrix M_s equals the number of CPs. The time consumption of each deformation step is kept below 5 s with a minimum skewness angle of about 3°, which means excellent mesh quality.

Fig. 8 shows local views of the slice mesh at different wing deflection positions, from 0° to a maximum of 45°. The statistics of the minimum skewness angle shown in Fig. 8(d) demonstrate that the mesh remains usable for deflections of less than 45°, which is entirely adequate for fluid-structure interaction or optimization simulations.

Fig. 8 Slice mesh and mesh quality statistics at different wing deflection positions.

Fig. 9 Front and back view of surface deformation of CHN-T1 model.

Fig. 10 Mesh quality statistics with different deformation angles.

4.2. Continuous deformation of CHN-T1 model

Aeroelastic deformation has been studied for a long time, especially for large transport airplanes, because of their huge wingspan. CHN-T1 is a standard model designed to verify the reliability of CFD in China [29]. Here we choose this model to demonstrate the adaptability of the parallel RBF interpolation to the deformation of complex configurations. A viscous mesh with 13.6 million hybrid elements is generated, including 4.4 million tetrahedrons, 9.1 million prisms and 80 thousand pyramids.

As shown in Fig. 9, the wing bending deflection is applied to both the main wings and the horizontal tails, following the law of Eq. (10). An artificial wing bending with a maximum of 15° is specified. At the wing tip, a displacement of about 27% of the wingspan is imposed. This large deformation is divided into three sub-steps of 5° each.

Fig. 11 Time consumption of parallel RBF deformation.

Fig. 12 Parallel speed-up ratio and efficiency.

Parallel RBF interpolation is tested with different numbers of cores, from 64 to 1024. For this case, 90 local nearest CPs are selected, and the influence radius R is set to 5 times the maximum displacement of the local control point subset.

Fig. 10 shows that the mesh quality at 5° and 10° bending is almost the same as that of the initial static mesh. Although the minimum skewness angle drops to about 0.9° at the 15° position, the mesh is still usable for CFD simulation, since our CFD solver runs normally as long as the minimum angle is larger than 0.01°. Fig. 11 gives the wall-clock time statistics, which show that the time is reduced to less than 18 s when 1024 processors are used for the deformation. Fig. 12 shows that this parallel algorithm has good speed-up performance, and the parallel efficiency of more than 80% demonstrates its excellent scalability.
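Speed-up and efficiency figures of this kind are usually computed relative to the smallest run. The sketch below assumes that convention (64 cores as the baseline), which the text does not state explicitly. The 240 s baseline time is a hypothetical placeholder chosen for illustration; only the 18 s bound at 1024 cores comes from the text.

```python
def relative_scaling(base_cores, base_time, cores, time):
    """Speed-up and parallel efficiency of a run relative to a baseline run."""
    speedup = base_time / time
    ideal = cores / base_cores         # speed-up expected from perfect scaling
    return speedup, speedup / ideal

if __name__ == "__main__":
    # Hypothetical 64-core time; 18 s is the upper bound quoted for 1024 cores.
    speedup, efficiency = relative_scaling(64, 240.0, 1024, 18.0)
    print(f"speed-up {speedup:.2f}, efficiency {efficiency:.1%}")
```

With a 16-fold increase in cores, an efficiency above 80% corresponds to a speed-up of more than 12.8 over the baseline.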

Fig. 15 Local view of slice mesh at maximum deformation.

Fig. 13 shows front and back local views of the mesh after the 15° deformation near the wingtip, the horizontal tails, and the vertical tail. The mesh orthogonality and smoothness are maintained even after this large deformation. The mesh views and the quality statistics show that the robustness of the current algorithm is preserved even when viscous prism cells with large aspect ratios are used for high Reynolds number flows.

Fig. 16 Time consumption in each deformation step.

Furthermore, the main wing is deformed by imposing a sinusoidal disturbance, as shown in Fig. 14. The mesh is deformed using 1024 processors, with 30 local nearest control points and an influence radius of 20 times the local maximum displacement. Fig. 15 shows the local slice mesh at the maximum deformation. Fig. 16 displays the wall-clock time of the deformation, which is divided into 7 steps; the volume point deformation takes only about 2.4 s. Fig. 17 compares the mesh skewness before and after deformation. Although the mesh quality decreases with deformation, the minimum skewness is still greater than the tolerance of the solver (more than 0.01°).

Fig. 17 Comparison of mesh quality before and after deformation.

4.3. Discontinuous deformation of CHN-T1 model

The above two cases demonstrate the ability of the current method for continuous deformation. In addition, discontinuous surface motion needs to be considered in certain cases. For instance, near the trailing edge of a thin wing, newly inserted adaptive points should be projected onto the CAD model: the points inserted on the upper side should be projected onto the upper surface, while those on the lower side should be projected in the opposite direction. In this case, the efficiency of the original RBF method may deteriorate seriously because the global control points are used, as mentioned in the introduction. The weakness becomes obvious when a point selection approach based on error estimation is used to reduce the surface control points, since the number of selected reference points must be substantially increased and the selected points should cluster in the discontinuous region.

This case is designed to test the mesh deformation ability of our parallel RBF method for discontinuous surface displacement. In the Adaptive Mesh Refinement (AMR) procedure, a new point is inserted between the two endpoints of a mesh edge, and the newly inserted points on the surface are then projected onto the CAD entity to preserve the geometry, which leads to mesh intersection in the boundary layer region (Fig. 18(a)). To untangle the interior mesh, the displacement of the newly inserted surface points is interpolated to the spatial field points by the present parallel RBF strategy. In practice, we employ this global refinement approach to generate a large-scale mesh from an initial small-scale mesh generated by available serial grid generation software. In Fig. 18(b), the blue and red regions indicate small and large projection displacements, respectively. In addition, the displacement directions near the trailing edge of the wings are opposite on the upper and lower surfaces, so the displacement field is discontinuous.

Table 2 Mesh deformation time consumption and quality of different mesh.

Unlike the large deformations in the above cases, the projection distance of solid wall points during mesh refinement is relatively small, so only 20 CPs are selected and the influence radius is set to 20 times the maximum projection distance.
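A minimal sketch of this parameter choice is given below, assuming a Wendland C2 basis function and a brute-force nearest-neighbour search. Both are stand-ins chosen for brevity: this section does not state the basis function, and the paper uses an ADT search rather than a full sort. The names `wendland_c2` and `local_rbf_displacement` are hypothetical.

```python
import numpy as np

def wendland_c2(xi):
    """Compactly supported Wendland C2 basis (assumed); zero for xi >= 1."""
    xi = np.clip(xi, 0.0, 1.0)
    return (1.0 - xi) ** 4 * (4.0 * xi + 1.0)

def local_rbf_displacement(point, cps, cp_disp, n_nearest=20, radius_factor=20.0):
    """Interpolate the displacement of one volume point from its nearest CPs.

    point   : (3,) coordinates of the volume point
    cps     : (n, 3) control point coordinates
    cp_disp : (n, 3) control point displacements
    """
    # Influence radius: a multiple of the largest control point displacement.
    radius = radius_factor * np.linalg.norm(cp_disp, axis=1).max()
    if radius == 0.0:
        return np.zeros(3)  # nothing moves

    # Brute-force search for the nearest CPs (stand-in for the ADT search).
    dist = np.linalg.norm(cps - point, axis=1)
    idx = np.argsort(dist)[:n_nearest]
    local_cps, local_disp = cps[idx], cp_disp[idx]

    # Small dense system Phi w = d, one right-hand side per coordinate.
    pair = np.linalg.norm(local_cps[:, None, :] - local_cps[None, :, :], axis=2)
    weights = np.linalg.solve(wendland_c2(pair / radius), local_disp)

    # Evaluate the local interpolant at the volume point.
    return wendland_c2(dist[idx] / radius) @ weights
```

Because each volume point solves its own 20 x 20 system, the cost per point is independent of the total number of surface points. Volume points farther than the influence radius from all of their selected CPs receive zero displacement.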

Table 2 shows the mesh quality and the CPU time of two sets of large-scale mesh deformation tests on hybrid meshes of 0.347 billion and 2.514 billion elements, composed of prisms, tetrahedra and pyramids. The cases are tested using 384 and 3072 CPU cores on the TianHe-II cluster, respectively. The comparison shows that the global minimum face angle is positive, which indicates that the mesh quality is well maintained. Due to the limitation of computational resources, only 3072 cores are used to deform the massive mesh with 2.514 billion elements, corresponding to about 0.82 million cells per core. The CPU time is 179 s and 821 s for the deformation of the two meshes, respectively. Fig. 19 is an enlarged view near the wall in the mesh with 2.514 billion hybrid elements. It is clear that after projecting the newly inserted surface points and deforming the volume points, both the smoothness and the orthogonality of the mesh in the boundary layer are preserved.
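The per-core load quoted above follows directly from the element and core counts; the corresponding figure for the smaller mesh is computed the same way and is not stated in the text:

```latex
\[
\frac{2.514\times10^{9}\ \text{cells}}{3072\ \text{cores}} \approx 8.18\times10^{5}\ \text{cells per core},
\qquad
\frac{0.347\times10^{9}\ \text{cells}}{384\ \text{cores}} \approx 9.04\times10^{5}\ \text{cells per core}.
\]
```

The two tests therefore place a comparable number of cells on each core.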

Fig. 18 Discontinuous displacement of surface points on the CHN-T1 model: edge intersection caused by wall point projection (left) and contours of wall point projection distance (right).

Fig. 19 Near wall mesh after surface projection and volume deformation.

5. Conclusions

An efficient task-based parallel radial basis function interpolation method for large-scale mesh deformation is presented in this work. Since it depends only on the point coordinates and does not need the mesh topology connectivity, it can be used for arbitrary element types and for both structured and unstructured meshes. Furthermore, the algorithm is parallelized based on task decomposition, so it is simple to implement, and the parallel version is easy to build from an available serial code.

To deal with large-scale meshes of up to billions of elements, MPI/OpenMP hybrid parallel computing is developed to reduce memory consumption by storing only one or a few copies of the whole set of surface points. In addition, using a different set of surface points and a different influence radius in the RBF approach for each volume point significantly improves the efficiency without compromising the quality of the moving mesh. Fifty nearest control points are enough to guarantee the moving mesh quality. Of course, the fast ADT algorithm is necessary to accelerate the searching procedure.
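The memory argument can be made concrete with a rough estimate. The numbers below are hypothetical and are not taken from the test cases: 10 million surface points, 24 cores per node, and 6 double-precision values per point (coordinates plus displacement). The sketch compares one copy of the surface data per core (pure MPI) with one copy per node (hybrid, with OpenMP threads sharing it).

```python
def control_point_memory_gb(n_points, copies_per_node, doubles_per_point=6):
    """Memory per node in GB for replicated control point data.

    doubles_per_point=6 assumes xyz coordinates plus xyz displacement,
    stored as 8-byte floats (an assumption for illustration).
    """
    return n_points * doubles_per_point * 8 * copies_per_node / 1e9

n_cp = 10_000_000      # hypothetical number of surface control points
cores_per_node = 24    # hypothetical node size

# Pure MPI: every core is a process holding its own copy.
pure_mpi = control_point_memory_gb(n_cp, copies_per_node=cores_per_node)
# Hybrid: one MPI process per node, threads share a single copy.
hybrid = control_point_memory_gb(n_cp, copies_per_node=1)

print(f"pure MPI: {pure_mpi:.2f} GB per node, hybrid: {hybrid:.2f} GB per node")
```

Under these assumptions the replicated surface data shrinks by a factor equal to the number of cores per node, which is why the saving grows with the surface point count.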

The robustness and high parallel efficiency are shown through three test cases with viscous meshes. For tens of millions of elements, the deformation can be completed in tens of seconds, which makes this method practical for engineering applications. The large-scale case of a transport airplane model with 2.514 billion elements is tested with 3072 CPU cores, which demonstrates the excellent mesh deforming ability and parallel scalability for complex applications. Additionally, this method is suitable for both continuous and discontinuous surface displacement cases.

Acknowledgments

This study was partially supported by the National Key Research and Development Program of China (No. 2016YFB0200701) and the National Natural Science Foundation of China (Nos. 11532016 and 91530325).
