Hongbo Sun and Ling Ma
(State Key Laboratory of Deep-Sea Manned Vehicles, China Ship Scientific Research Center, Wuxi 214033, Jiangsu, China)
Abstract: In this paper, a new algorithm combining the features of bi-directional evolutionary structural optimization (BESO) and reinforcement learning (RL) is proposed for continuum structural topology optimization (STO). In contrast to conventional approaches, which generate only a single quasi-optimal solution, the goal of the combined method is to provide designers with multiple quasi-optimal solutions, in the spirit of generative design. Two key components were adopted. First, besides sensitivity, a value function updated by Monte-Carlo reinforcement learning was utilized to measure the importance of each element, which kept the solving process convergent and steered it closer to the optimum. Second, an ε-greedy policy added a random perturbation to the main search direction so as to extend the search capability. Finally, the quality and diversity of solutions could be guaranteed by controlling the value of compliance as well as the Intersection-over-Union (IoU). Results of several 2D and 3D compliance minimization problems, including a geometrically nonlinear case, show that the combined method is capable of generating a group of good and distinct solutions that satisfy various possible requirements in engineering design within acceptable computation cost.
Keywords: structural topology optimization; bi-directional evolutionary structural optimization; reinforcement learning; first-visit Monte-Carlo method; ε-greedy policy; generative design
Structural topology optimization (STO) focuses on determining the optimal layout of material inside a given design domain. As an optimization problem, it has one or more objective functions, such as compliance, weight, and frequency, subject to constraints such as volume, stress, and displacement. The material distribution is described by a density variable with value 0 or 1 at each point in the design domain. Because solving the problem directly is difficult, the design domain is typically discretized into a large number of finite elements, so the nodal or element-wise densities can serve as the solution. Moreover, the densities are often relaxed into continuous variables varying from 0 to 1, so that gradients become available.
As a continuous density-based method, solid isotropic material with penalization (SIMP), originally developed by Bendsøe[1], Zhou and Rozvany[2-3], is the most popular methodology for STO; it can be solved efficiently by gradient-based optimization algorithms such as optimality criteria (OC) and the method of moving asymptotes (MMA).
Instead of using continuous density variables, bi-directional evolutionary structural optimization (BESO), developed by Young et al.[4] and Huang and Xie[5], directly removes elements with low sensitivity and adds elements with high sensitivity step by step. Although using gradients to update discrete variables introduces some error, BESO is comparable to SIMP in solution quality and computational efficiency for a large number of problems. Furthermore, its discrete formulation makes BESO more convenient to combine with other algorithms.
Due to the material penalization, neither SIMP[6] nor BESO[7] can guarantee convergence to a global optimum. A possible way to overcome this difficulty is to introduce global search techniques, such as heuristic algorithms like Genetic Algorithms (GA)[8-9], Ant Colonies[10-11], and Particle Swarms[12]. But even global search techniques cannot guarantee convergence to the global optimum for STO problems, and some non-gradient approaches show extremely low computational efficiency[13] because of the astronomical number of combinations of design variables. Thus, combining the advantages of gradient information and global search strategies is a pragmatic compromise for obtaining an optimized solution. For example, GESO[8] is a successful combination of GA and ESO, in which the sensitivities calculated by ESO keep the solving process effective and convergent, while GA operators improve the global search ability. Furthermore, Zuo et al.[9] combined GA with BESO to help recover important elements, an improvement over GESO.
Over the past few years, considerable attention has been paid to solving STO problems with machine learning algorithms. For problems whose sensitivity information is difficult or even impossible to calculate, Aulig and Olhofer[14-15] utilized machine learning algorithms such as linear regression, support vector machines, and neural networks to replace the calculation of sensitivities. As for deep learning, convolutional neural networks (CNN) and generative adversarial networks (GAN) have been used to generate the final structure with negligible computational time from intermediate results or even initial conditions[16-19]. However, such end-to-end learning can only handle problems similar to the training samples, which limits its scope of applicability.
Reinforcement learning (RL) is another branch of machine learning, which learns how to maximize reward in an environment through constant trial and error. In contrast to supervised learning, reinforcement learning may be less efficient, but it has the potential to explore unseen problems and achieve results beyond the history of human experience. Unfortunately, reinforcement learning has not yet been used to solve STO problems directly. The reason may lie in the enormous number of design variables. After all, reinforcement learning has only recently been used to solve combinatorial optimization problems such as the traveling salesman problem with up to 100 nodes and the 0-1 knapsack problem with up to 200 nodes[20], in which the number of combinations of variables is far smaller than in topology optimization, so RL needs to proceed with the help of prior knowledge.
Generative design is the process of exploring variants of a design, rather than the single design of traditional workflows[21]. Simple exploration methods vary geometrical parameters[22] or problem definitions[23], but their search ability is limited. More complex exploration methods use deep neural networks such as CNN and GAN[14-18,24]. The search scope is indeed enlarged in this way, but the drawbacks are the excessive computing resources required to prepare the training data, and that some methods lack the mechanical analysis needed to guarantee engineering performance.
In this paper, a method combining BESO with RL for generative design is proposed. Each element was considered as an agent whose value function was updated by Monte-Carlo reinforcement learning based on its sensitivity, which made it possible to use previous experience to achieve better results. Also, an ε-greedy policy was used in the procedure of adding and deleting elements to introduce uncertainty, so that the search ability could be improved. Finally, a group of different quasi-optimal solutions with acceptable compliance were generated for designers.
The remainder of this paper is organized as follows. Section 2 introduces the relevant reinforcement learning techniques. Section 3 describes the proposed method combining BESO with RL in detail. The results of several well-known 2D and 3D minimum compliance problems are given as validation in Section 4. Finally, Section 5 draws the conclusions.
Reinforcement learning has three key elements: state $s$, action $a$, and reward $r$. The process of learning can be described as the interaction between the agent and the environment. The agent chooses action $a_t$ based on the current state $s_t$; the environment then responds to this action and provides the agent with the reward $r_{t+1}$ as well as the next state $s_{t+1}$. Finally, the policy π, which consists of a series of actions, is learned to maximize the cumulative future reward. Sometimes the agent would rather sacrifice short-term reward to gain more long-term reward. Thus, reinforcement learning is well suited to sequential decision problems.
When selecting actions, there are two kinds of methods: policy-based and value-based. The former determines the policy directly, while the latter determines the policy implicitly through a value function; the value-based method is used in this paper because it conveniently expresses the importance of elements.
In this paper, the first-visit Monte-Carlo method, which has been widely studied since the 1940s[25], is used to evaluate the value function $V$. The information in episodes under a policy π can be represented as a sequence:
$$s_1, a_1, r_2, s_2, a_2, \ldots, s_t, a_t, r_{t+1}, \ldots, s_k \sim \pi$$
$G_t$ is the return of the state $s_t$, calculated as the total discounted reward that follows the first occurrence of $s_t$:

$$G_t = r_{t+1} + \gamma G_{t+1} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{T-t-1} r_T \qquad (1)$$
where $T$ denotes the terminal time, and the discount factor γ lies in the range of 0 to 1. Then the value function $V(s)$ of a state $s$ is calculated as the expected future return:
$$V(s) = \mathrm{E}\left[\, G_t \mid s_t = s \,\right] \qquad (2)$$
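For concreteness, the first-visit Monte-Carlo estimate of Eqs. (1)-(2) can be sketched in a few lines of Python; encoding an episode as a list of (state, reward received on leaving that state) pairs is an assumption made here for brevity.

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=0.9):
    """Estimate V(s) as the mean return following the first visit to s (Eqs. (1)-(2))."""
    returns = defaultdict(list)
    for episode in episodes:                  # episode: list of (state, reward) pairs
        # discounted returns accumulated backwards: G_t = r_{t+1} + gamma * G_{t+1}
        G = [0.0] * (len(episode) + 1)
        for t in reversed(range(len(episode))):
            G[t] = episode[t][1] + gamma * G[t + 1]
        seen = set()
        for t, (s, _) in enumerate(episode):  # first-visit: record each state once
            if s not in seen:
                seen.add(s)
                returns[s].append(G[t])
    return {s: sum(g) / len(g) for s, g in returns.items()}

# Example: value estimates from two short episodes over states "A" and "B"
V = first_visit_mc([[("A", 1.0), ("B", 0.0)], [("A", 0.5)]], gamma=0.9)
```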
Two other methods for computing the value function, dynamic programming and temporal-difference learning, are not suitable for STO because they require abundant iterations.
The exploration-exploitation dilemma is a fundamental problem in RL. Exploitation means making the best decision based on current information, while exploration means gathering more information, which helps make the best overall decision at the expense of efficiency.
The ε-greedy policy is a common exploration method, mostly used for discrete action spaces in RL, and is defined as[25]

$$\pi(a \mid s) = \begin{cases} 1 - \varepsilon + \dfrac{\varepsilon}{|A(s)|}, & a = a^{*} \\[4pt] \dfrac{\varepsilon}{|A(s)|}, & a \neq a^{*} \end{cases} \qquad (3)$$

where $a^{*}$ denotes the best action for maximizing the value function, and $|A(s)|$ is the total number of actions available in state $s$. By choosing an appropriate value of ε, the balance between exploration and exploitation can be well controlled. In general, a decay schedule for ε is added to reach convergence.
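A minimal sketch of action selection under Eq. (3), together with a generic linear decay schedule (the cubic schedule actually used in this paper appears later in Eq. (14)):

```python
import random

def epsilon_greedy(action_values, epsilon):
    """Sample an action under Eq. (3): exploring uniformly with probability
    epsilon gives a* a total probability of 1 - epsilon + epsilon/|A(s)|."""
    if random.random() < epsilon:
        return random.randrange(len(action_values))              # explore
    return max(range(len(action_values)), key=lambda i: action_values[i])  # exploit a*

def decayed_epsilon(eps0, t, t_max):
    """A simple linear decay schedule so exploration fades out over iterations."""
    return max(0.0, eps0 * (1.0 - t / t_max))
```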
In this section, a new method combining BESO with RL for the continuum topology optimization problem of minimum compliance is proposed and described. The flowchart of the proposed method is shown in Fig. 1.
Fig.1 Flowchart of the combined method
The BESO method was utilized in the first episode to provide prior knowledge, creating a good benchmark for evaluation and a stable value function, so as to avoid the inefficiency of searching from scratch. During the subsequent episodes, the problem was solved repeatedly, the value functions were updated simultaneously, and elements were removed and added based on their objective functions by the ε-greedy policy. The final structure of each episode was kept in the quasi-optimal solution set only if it satisfied the two metrics on compliance and diversity. In the rest of this section, the procedure is explained in detail.
As the most typical topology optimization problem, minimum compliance STO under a volume constraint is defined in the density-based setting as

$$\begin{aligned} \min \quad & C = \frac{1}{2}\,\mathbf{F}^{\mathrm{T}}\mathbf{U} \\ \text{s.t.} \quad & V^{*} - \sum_{e=1}^{N} V_e x_e = 0, \qquad x_e \in \{x_{\min},\, 1\} \end{aligned} \qquad (4)$$

where $\mathbf{F}$ and $\mathbf{U}$ are the global force and displacement vectors, $V_e$ and $x_e$ are the volume and density of element $e$, and $V^{*}$ is the prescribed total volume.
In BESO, the sensitivity of a solid element equals the change in the compliance of the structure when that element is removed[27], and the sensitivities of void elements are zero, as in[7]

$$\alpha_e = \begin{cases} \dfrac{1}{2}\,\mathbf{u}_e^{\mathrm{T}}\mathbf{K}_e\mathbf{u}_e, & x_e = 1 \\[4pt] 0, & x_e = x_{\min} \end{cases} \qquad (5)$$

where $\mathbf{u}_e$ is the nodal displacement vector of the $e$th element, and $\mathbf{K}_e$ is the element stiffness matrix.
The filter scheme is achieved by averaging the elemental sensitivities:

$$\alpha_e = \frac{\sum_{j=1}^{N} H_{ej}\,\alpha_j}{\sum_{j=1}^{N} H_{ej}} \qquad (6)$$

where the weight factor $H_{ej}$ is calculated by

$$H_{ej} = r_{\min} - r_{ej}, \qquad \{\, j \in N \mid r_{ej} \le r_{\min} \,\} \qquad (7)$$

in which $r_{ej}$ is the distance between the centers of the $e$th and $j$th elements, and $r_{\min}$ is the filter radius.
Furthermore, the elemental sensitivities are averaged with their values from the previous iteration[28]:

$$\alpha_e \leftarrow \frac{\alpha_e^{\,t} + \alpha_e^{\,t-1}}{2} \qquad (8)$$

where $t$ denotes the current iteration number. This is a simple but effective way to keep the solving process stable and to ease convergence.
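A compact sketch of the filtering and history averaging of Eqs. (6)-(8); the dense element-to-element distance matrix is an implementation shortcut suitable only for small meshes.

```python
import numpy as np

def filter_sensitivities(alpha, centers, rmin):
    """Distance-weighted averaging of elemental sensitivities (Eqs. (6)-(7)).

    alpha   : (n,) raw elemental sensitivities
    centers : (n, 2) element-centre coordinates (use (n, 3) for 3D)
    rmin    : filter radius
    """
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    H = np.maximum(rmin - d, 0.0)        # H_ej = rmin - r_ej inside the radius, else 0
    return (H @ alpha) / H.sum(axis=1)

def stabilize(alpha_t, alpha_prev):
    """Average with the previous iteration's sensitivities (Eq. (8))."""
    return 0.5 * (alpha_t + alpha_prev)
```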
The STO problem is a 0-1 integer programming problem, and its solving process is a sequential decision process that can be represented by a policy π, so the policy π can be acquired by RL. It should be noted, however, that if the whole structure is viewed as a single agent, the number of combinations of actions exceeds $10^{85}$ even for a simple problem such as designing a cantilever beam discretized into 288 elements with a 50% volume fraction constraint[13]. Because of this complexity, elements were treated as individual agents.
The features of the reinforcement learning algorithm are defined below. The state was defined as the current iteration number in each episode; the states were sequential, and each state appeared once and only once during an episode. Moreover, the volume fraction of the structure was fixed for each state. The action was the addition or removal of an element, and the reward was characterized by the updated density of elements:
(9)
(10)
where $C$ denotes the compliance of the final structure and $C_B$ is the compliance of the final structure generated by BESO, used as a benchmark. In this way, the policy of a final structure with lower compliance had more influence on the value function, and vice versa, so the search was guided toward the optimum.
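A minimal sketch of this idea is given below. The specific reward scaling $(C_B/C)\,\Delta x_e$, which lets better-than-benchmark episodes weigh more, and the incremental-mean value update are illustrative assumptions, not the paper's exact Eqs. (9)-(10).

```python
import numpy as np

def update_value_function(V, n_episodes, dx_history, C, C_B, gamma=0.9):
    """Hypothetical per-element Monte-Carlo value update after one episode.

    dx_history : per-iteration (n,) arrays of element density changes
    C, C_B     : final compliance of this episode / of the BESO benchmark
    """
    G = np.zeros_like(V)
    for dx in reversed(dx_history):      # discounted return, accumulated backwards
        G = (C_B / C) * dx + gamma * G
    return V + (G - V) / n_episodes      # incremental mean over episodes
```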
During the solving process of BESO, the changes in earlier iterations had a crucial impact on the final structure; once the volume fraction declined close to the minimum, the structure varied little, because few elements were removed or added in each iteration. Hence, exploring in the later iterations seemed valueless and would prolong convergence. As a result, updating of the value function was stopped after 5 iterations at constant volume fraction, so as to balance exploration and efficiency. The remaining work was accomplished by BESO until convergence.
As the criterion by which the elements are sorted, the objective function $F$ is the weighted sum of the value function $V$ and the current sensitivity α. The numerical distributions of $V$ and α, taken from several intermediate results of a case, are shown in Fig. 2(a) and 2(b), respectively. In order to make $V$ and α equal in order of magnitude, a linear transformation similar to min-max normalization was applied to α according to the distribution of $V$.
The elements were divided into two classes based on the ranking of α. The proportion of elements in the higher class was set as

$$P_{\text{higher}} = \mathrm{Vol}^{*}/2 \qquad (11)$$
where $\mathrm{Vol}^{*}$ is the volume fraction constraint. Elements of the higher class could not be removed, considering that the whole structure might otherwise collapse suddenly. For the lower class, it should be guaranteed that the values of α are roughly equal to the values of $V$ (Fig. 2(c)), so α was transformed as
$$\tilde{\alpha}_e = \frac{\alpha_e - \alpha_{\min}}{\alpha_{\max} - \alpha_{\min}} \left( V_{\max} - V_{\min} \right) + V_{\min} \qquad (12)$$
Then the objective function $F$ is calculated as

$$F_e = \tilde{\alpha}_e + \omega V_e \qquad (13)$$
where ω is a weight that controls the search scope so as to maintain convergence. ω has a strong influence on the ranking of the elements, as shown by the jagged curve in Fig. 2(a). Generally, ω is a small value from 0.005 to 0.01 for common simple 2D cases, while a bigger ω suits 3D cases with more design variables and a lower volume fraction constraint.
Fig.2 Distribution of value functions and sensitivities of several intermediate results of a case
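The class split, rescaling, and weighted sum might be sketched as follows, assuming the min-max style mapping of Eq. (12) and the weighted-sum form of Eq. (13) reconstructed above; marking protected elements with an infinite objective is an implementation convenience.

```python
import numpy as np

def objective_function(alpha, V, vol_frac, omega):
    """Objective F per element: protect the top Vol*/2 of elements (Eq. (11)),
    rescale alpha of the rest onto the range of V (Eq. (12)), then add omega*V."""
    n_high = int(len(alpha) * vol_frac / 2)      # P_higher = Vol*/2
    order = np.argsort(alpha)[::-1]              # rank elements by sensitivity
    lower = order[n_high:]                       # only these may be removed
    a = alpha[lower]
    span = np.ptp(a)
    a_scaled = (a - a.min()) / span if span > 0 else np.zeros_like(a)
    a_scaled = a_scaled * np.ptp(V[lower]) + V[lower].min()   # Eq. (12), assumed form
    F = np.full(len(alpha), np.inf)              # higher class: never removed
    F[lower] = a_scaled + omega * V[lower]       # Eq. (13), assumed form
    return F
```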
The ε-greedy policy was used to decide whether an element should be added or removed. In BESO, elements are deleted entirely according to their sensitivities, like a greedy algorithm. Under the ε-greedy policy, after the elements were sorted by their objective functions and divided into the higher and lower classes, the bottom (1−ε) fraction of the elements to be removed in an iteration were deleted in the same way as in BESO, while the remaining ε fraction were deleted randomly from the rest of the lower class.
To enhance the effectiveness of the random search, a 'search window' as shown in Fig. 3 was used to change the scope of removal. For 2D problems, the search window has two parameters, win_x and win_y, representing the extent of the adjacent elements in the x and y directions to be deleted together when an element is removed. A larger win_x or win_y is recommended on a denser mesh. For 3D problems, a three-dimensional search window is needed, with an additional parameter win_z.
Fig.3 Search windows of different sizes
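A sketch of one removal step under this scheme; flat element indexing on a regular nx-by-ny grid and an infinite objective for protected elements are implementation assumptions.

```python
import numpy as np

def select_removals(F, n_remove, eps, rng):
    """Remove the bottom (1 - eps) fraction greedily by F, the rest at random
    from the remaining lower-class elements (infinite F marks protected ones)."""
    removable = np.flatnonzero(np.isfinite(F))
    order = removable[np.argsort(F[removable])]      # lowest objective first
    n_greedy = int(round((1.0 - eps) * n_remove))
    greedy = order[:n_greedy]
    rand = rng.choice(order[n_greedy:], size=n_remove - n_greedy, replace=False)
    return np.concatenate([greedy, rand])

def search_window(idx, nx, ny, winx, winy):
    """Expand one removed element (flat index on an nx-by-ny grid) to its
    winx-by-winy neighbourhood, clipped at the domain boundary."""
    ix, iy = idx % nx, idx // nx
    xs = np.clip(np.arange(ix - winx // 2, ix + winx // 2 + 1), 0, nx - 1)
    ys = np.clip(np.arange(iy - winy // 2, iy + winy // 2 + 1), 0, ny - 1)
    return np.unique((ys[:, None] * nx + xs[None, :]).ravel())
```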
Like the penalization mechanism of GA operators[9], ε declined gradually in each iteration to obtain a convergent solution, following the schedule

$$\varepsilon \leftarrow \varepsilon_0 - \varepsilon_0 \left( \frac{t}{t_0} \right)^{3} \qquad (14)$$

where $\varepsilon_0$ denotes the initial ε, and $t_0$ is the number of iterations at which the volume fraction first reaches the minimum.
The convergence criterion used in the combined method is the same as in BESO[28]. Within one episode, the update of the topology stops only when the volume constraint $\mathrm{Vol}^{*}$ is reached and the following criterion is satisfied:

$$\frac{\left| \sum_{i=1}^{I} \left( C_{t-i+1} - C_{t-I-i+1} \right) \right|}{\sum_{i=1}^{I} C_{t-i+1}} \le \tau \qquad (15)$$

where $I$ is an integer, set to 5 in this study, and the convergence tolerance τ is a specified small value.
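Eq. (15) can be checked directly from the compliance history, e.g.:

```python
def has_converged(C_hist, I=5, tau=1e-3):
    """Eq. (15): the net change of compliance between the last I iterations
    and the I before them, relative to recent compliance, falls below tau."""
    if len(C_hist) < 2 * I:
        return False
    num = abs(sum(C_hist[-i] - C_hist[-I - i] for i in range(1, I + 1)))
    den = sum(C_hist[-i] for i in range(1, I + 1))
    return num / den <= tau
```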
To get a group of quasi-optimal solutions with low compliance as well as enough novelty, two metrics were used to evaluate the performance of the final structure in each episode:
First, the relative difference in compliance between the final structure and the benchmark was considered:

$$\Delta C = \frac{C - C_B}{C_B} \qquad (16)$$

Only results with $\Delta C$ less than 0.2 were accepted in this study.
The other metric, Intersection-over-Union (IoU), a common evaluation metric for image similarity in the field of object detection, was calculated as the ratio of the intersection to the union of two rectangles, the candidate bound CB and the ground-truth bound GB:

$$\mathrm{IoU} = \frac{\operatorname{area}(\mathrm{CB} \cap \mathrm{GB})}{\operatorname{area}(\mathrm{CB} \cup \mathrm{GB})} \qquad (17)$$

In this study, the threshold was set to 0.9. IoU is also available in 3D, where 'area' in Eq. (17) is replaced by 'volume'.
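Both acceptance tests, Eq. (16) with the 0.2 bound and Eq. (17) with the 0.9 bound, reduce to a few lines when the layouts are stored as boolean density arrays (an assumption of this sketch):

```python
import numpy as np

def accept_design(x, x_ref, C, C_B, dC_max=0.2, iou_max=0.9):
    """Keep a design only if its compliance is close to the benchmark (Eq. (16))
    and it differs enough from the benchmark layout (Eq. (17)).

    x, x_ref : boolean density arrays (2D or 3D) of candidate and benchmark
    """
    dC = (C - C_B) / C_B
    iou = np.logical_and(x, x_ref).sum() / np.logical_or(x, x_ref).sum()
    return dC < dC_max and iou < iou_max
```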
The first example considers the stiffness optimization of a cantilever beam subjected to a concentrated force applied at the center of the free end, as shown in Fig. 4. The length and height of the design domain were 80 mm and 50 mm, respectively, and it was discretized into 80×50 four-node quadrilateral elements. Material properties were Young's modulus E = 1 MPa and Poisson's ratio ν = 0.3. The objective volume fraction was set to 50% of the design domain. The BESO parameters were as follows: the evolution volume ratio ER = 0.04, the maximum volume addition ratio, a filter radius r_min = 4 mm, the minimum design variable x_min = 0.001, the convergence criterion τ = 0.1%, and the penalty exponent p = 3.0. The RL parameters were: the discount factor γ = 0.9, ε₀ = 0.1, weight ω = 0.005, and win_x = win_y = 3. Note that symmetry between the upper and lower halves of the beam was enforced when removing and adding elements.
The final topology of BESO is shown in Fig. 5(a), and eight quasi-optimal solutions of the combined method are displayed in Fig. 5(b)-5(i). The compliance, IoU, and iteration counts of the final result of each episode are given in Table 1. It can be seen that the quasi-optimal solutions differ from the BESO result to various extents. Some are highly similar, as in Fig. 5(f), and some change considerably, as in Fig. 5(e) and 5(h), but their compliance values are all within the acceptable range. Moreover, the structure in Fig. 5(f) even has lower compliance than the reference structure, which means the combined method indeed contributes to searching for global optima. Furthermore, 14 episodes were needed to obtain eight applicable solutions, and the differences in the number of iterations across episodes were not significant. All in all, the quasi-optimal solutions were generated within acceptable computation cost.
Fig. 4 Dimensions of the design domain of a cantilever beam with loading and boundary conditions
Fig. 5 Final topologies of a cantilever beam by BESO method (a) and by combined method (b)-(k)
Table 1 Information of designs of a cantilever beam (unsatisfactory metrics are marked with strikethrough)
The evolutionary histories of the compliance are presented in Fig. 6. As can be observed, the compliance does not increase monotonically with the number of iterations, and the evolution curve is usually not smooth. There are one or more peaks in each curve, and the amplitude and iteration at which the peaks occur vary across episodes, representing different search directions.
Fig. 6 Evolutionary histories of the compliance for the cantilever
It is worth noting that although increasing ω can help in finding better results, it hampers diversity at the same time. For example, if ω is set to a relatively high value such as 0.5, most solutions have lower compliance than the benchmark but also extremely high IoU, as shown in Table 2. This means the solutions lack novelty and fail to provide designers with more information.
Table 2 Information of designs of a cantilever beam (ω = 0.5; unsatisfactory metrics are marked with strikethrough)
The second example demonstrates a design problem for an L-shaped beam, as shown in Fig. 7, loaded at the center of the rightmost free end by F = −1 N. The beam was discretized into 1600 elements, and the volume fraction was 40%. Material properties were E = 1 MPa and ν = 0.3. BESO parameters were set as ER = 0.03, r_min = 2 mm, x_min = 0.001, τ = 0.1%, and p = 3.0. RL parameters were γ = 0.9, ε₀ = 0.1, ω = 0.005, and win_x = win_y = 1.
Fig. 7 Design domain of an L-shaped beam with dimensions, boundary, and loading conditions
The final topologies obtained by BESO and the combined approach are shown in Fig. 8, with the corresponding details given in Table 3. All eight results appear satisfactory, and half of them have lower compliance than the benchmark generated by the BESO method. It is noteworthy that an additional bar inside the structure, as shown in Fig. 8(i), helps decrease the compliance. It is also notable that, compared with a simpler design such as the example in Section 4.1, a more complicated structure satisfies the IoU requirement more easily.
Table 3 Information of designs of an L-shaped beam
The proposed approach extends to three dimensions straightforwardly. An optimization problem for a 3D cantilever beam is shown in Fig. 9. The design domain was divided 50×20×10 and modeled with 10 000 eight-node cubic elements in total. The remaining volume fraction was only 10% of the design domain. Material with E = 1 MPa and ν = 0.3 was used. BESO parameters were ER = 0.03, r_min = 3 mm, x_min = 0.001, τ = 1%, and p = 3.0. The RL parameters were generally similar to those of the 2D problems, with γ = 0.9, a lower ε₀ = 0.05, a higher ω = 0.01, and win_x = win_y = win_z = 1 for the 3D search window. Symmetry along the y-axis and z-axis was enforced.
Fig. 8 Final topologies of the L-shaped beam by BESO method (a) and by combined method (b)-(k)
Fig. 9 Dimensions of the design domain with loading and boundary conditions for a 3D cantilever beam
The optimized designs are shown in Fig. 10, and the corresponding information is in Table 4. As can be seen, the topologies in this 3D case vary to a greater degree than in the 2D cases, even though more conservative parameters were chosen. The IoU of the solutions for the three-dimensional cantilever beam is much lower than in two dimensions, so the diversity of solutions is easy to guarantee. However, divergence is more likely to occur in this case, for example in the 9th episode, which is why a lower ε₀ and a higher ω were selected to alleviate the problem.
Fig.10 Final topologies of the 3D cantilever beam by BESO method (a) and by combined method (b)-(k)
Table 5 shows the effect of the main parameters ω and ε₀. Six pairs of parameters were chosen. In each case, the combined method was run for at most 21 episodes and terminated early once 8 acceptable structures were obtained. The data in Table 5 were averaged over 10 runs of the procedure. It can be seen that decreasing ε₀ significantly reduces undesirable results, but the search scope is limited at the same time, so a balance between convergence and search ability should be kept by choosing an appropriate ε₀. As ω changes from 0 to 0.1 (the value function is not considered at all when ω = 0), the number of solutions with an unacceptable C reduces gradually, which means the value function does help guarantee the quality of the solutions; however, a too-large ω is not recommended, due to the decline in diversity.
Table 4 Information of designs of a 3D cantilever beam (unsatisfactory metrics are marked with strikethrough)
Table 5 Influence of main parameters on the performance of final designs
The combined method extends conveniently to geometrically nonlinear structures. In the nonlinear analysis, complementary work under force control with the Newton-Raphson method was used. The difference is that the complementary work $W_C$ replaces compliance as the final measure:
$$W_C = \lim_{n \to \infty} \frac{1}{2} \sum_{i=1}^{n} \left( \mathbf{U}_i + \mathbf{U}_{i-1} \right)^{\mathrm{T}} \Delta\mathbf{F} \qquad (18)$$

in which $\mathbf{U}$ is the displacement vector, $\Delta\mathbf{F}$ is determined by the load step used in the Newton-Raphson method, $i$ is the increment index of the applied load, and $n$ is the total number of increments.
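Given the converged displacement vectors of each load step, the trapezoidal sum of Eq. (18) is straightforward; equal load increments are assumed in this sketch.

```python
import numpy as np

def complementary_work(U_steps, dF):
    """Trapezoidal evaluation of Eq. (18) over Newton-Raphson load increments.

    U_steps : displacement vectors U_0 (zero), U_1, ..., U_n at each load step
    dF      : the constant load increment vector
    """
    return sum(0.5 * (U_steps[i] + U_steps[i - 1]) @ dF
               for i in range(1, len(U_steps)))
```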
The stiffness optimization of a 2D cantilever beam is demonstrated with geometrical nonlinearity taken into account. The structure, with dimensions of 60 mm in length, 40 mm in width, and 10 mm in thickness, is shown in Fig. 11. The available material was limited to 50% of the design domain. The material is linear elastic with E = 1 GPa and ν = 0.3. BESO parameters are ER = 0.03, r_min = 3 mm, x_min = 0.001, τ = 1%, and p = 3.0. RL parameters are generally similar to those of the other 2D problems, with γ = 0.9, a lower ε₀ = 0.05, a higher ω = 0.01, and a search window with win_x = win_y = 1. The symmetry constraint was no longer enforced.
Fig. 11 Dimensions of the design domain of a nonlinear cantilever beam with loading and boundary conditions
To judge which design was better, the complementary work $W_C$ was calculated by nonlinear FEM. The topologies and results are given in Table 6 and Fig. 12. Compared with the linear FEM results in Section 4.1, most of the structures became less symmetric, caused not only by the randomness of element removal but also by the nonlinearity. As a result, the diversity of solutions is easier to guarantee. In addition, the values of complementary work are still acceptable and not influenced much by the asymmetry. Since nonlinear analysis carries a higher computational cost, more running time is needed, so optimizing large, complex 3D nonlinear structures remains challenging.
Fig. 12 Final topologies of the nonlinear cantilever beam by BESO method (a) and by combined method (b)-(k)
Table 6 Information of designs of a nonlinear cantilever beam (unsatisfactory metrics are marked with strikethrough)
This paper presents a new approach for structural topology optimization combining the features of the BESO method and reinforcement learning. During the process, the roles of BESO are to provide prior experience for RL and to assist in reaching a convergent result in the final stage of each episode. As an auxiliary method to improve search ability, the ε-greedy policy was utilized to perturb the search direction by deleting a fraction of elements randomly. Meanwhile, the objective function, a weighted sum of the value function and the sensitivity, helped obtain convergent solutions with low compliance. The combined method was demonstrated on four 2D and 3D compliance minimization problems, including one geometrically nonlinear problem. By tuning parameters such as the weight ω, ε, and the size of the search window, a group of quasi-optimal solutions with acceptable engineering performance and novelty could be generated.
In contrast to traditional density-based approaches such as SIMP and BESO, which generate only one deterministic solution and cannot guarantee optimality, the proposed approach is able to create several different solutions with acceptable compliance. In theory, the structure with the smallest compliance should be chosen as the optimum, but in practice the one that best fits the practical problem takes priority. Thus, although some solutions proposed by this study are not better than that of BESO in compliance, they can serve as valuable alternatives for decision making in engineering.
In the combined method, the mechanical analysis restarts for each new episode, so the computing time of subsequent episodes is almost the same as that of the first. There is potential to decrease this cost by reusing similar information from previous episodes. Another improvement lies in the exploration method: after all, the ε-greedy policy is only a basic method, and many more powerful exploration approaches exist in RL.
Data availability
The corresponding code can be found at https:// github.com/Nick0095/RL_BESO.