
Application of Random Forest Regressions on Stellar Parameters of A-type Stars and Feature Extraction*


Research in Astronomy and Astrophysics, 2022, Issue 2

Shu-Xin Chen, Wei-Min Sun, and Ying He

1 Qiqihar University, Qiqihar 161006, China

2 Key Lab of In-fiber Integrated Optics, Ministry Education of China, Harbin Engineering University, Harbin 150009, China; sunweimin@hrbeu.edu.cn

3 Department of Computer Science and Technology, Tianjin Ren’ai College, Tianjin 301636, China

Received 2020 September 16; revised 2021 November 28; accepted 2021 November 29; published 2022 February 2

Abstract Measuring the stellar parameters of A-type stars is more difficult than for FGK stars because their spectra contain few features and because effective temperature (Teff) and surface gravity (log g) are degenerate. Modeling the relationship between fundamental stellar parameters and spectral features through machine learning is possible because we can exploit large data sets rather than relying on a sparse set of known features. Once the model is successfully trained, it offers an efficient way to predict Teff and log g for A-type stars, especially when flux calibration or extinction introduces large uncertainty in the continuum. In this paper, A-type stars are selected from LAMOST DR7 with a signal-to-noise ratio greater than 50 and Teff between 7000 and 10,000 K. We apply the Random Forest (RF) algorithm, one of the most widely used machine learning algorithms, to establish regression relationships between the flux at all wavelengths and the corresponding stellar parameters, Teff and log g, respectively. The trained RF model not only regresses the stellar parameters but also ranks the wavelengths by their sensitivity to those parameters. According to the rankings, we define line indices by merging adjacent wavelengths. The objectively defined line indices in this work amend the Lick indices and include some weak lines. We use the Support Vector Regression algorithm with our newly defined line indices to measure temperature and gravity, and use stars in common with Simbad to evaluate our results. In addition, the Gaia Hertzsprung-Russell diagram is used to check the accuracy of Teff and log g.

Key words: methods: data analysis – surveys – stars: early-type – stars: abundances

1. Introduction

The A-type stars encompass a bewildering array of stellar types, and the many horizontal-branch stars found in the A-type region of the Hertzsprung-Russell (HR) diagram hint at their evolutionary states. The fundamental stellar atmospheric parameters (Teff and log g) are the basis for astrophysical studies of A-type stars, and these parameters are often estimated from the strong Balmer lines. For low-resolution spectra, the line index is an effective way to extract spectral features and has been widely used in astronomical research. Cenarro (2001) used a line index to calculate the Ca II flux and measured stellar atmospheric parameters to determine the effective temperature. Covey et al. (2007) wrote IDL programs that use the Hammer line indices to classify stellar spectra automatically. Yi et al. (2014) added features extracted from the spectra with the Random Forest (RF) algorithm to Covey's program as new feature indices, applied them to the spectral classification of M dwarfs, and showed that the improved feature indices perform better in that classification. Inspired by the work of Yi et al. (2014), we apply RF to A-type stars to define new spectral line indices that represent features in low-resolution spectra; these line indices are defined specifically to be sensitive to the stellar parameters of A-type stars.

Among all line index systems, the Lick system is one of the most widely used and has been applied in many fields of spectral analysis. The line indices for A-type stars released by LAMOST were calculated following the definitions of the Lick system, which includes most of the prominent absorption lines. Hou et al. (2014) described in detail the lines of A-type stars in low-resolution spectra. The advantage of Lick indices is that errors in flux calibration and radial velocity measurement can be ignored and that noise has little effect on the indices. Tan et al. (2013) used line indices as training features for sky survey data when measuring stellar atmospheric parameters, and obtained their best regression model with linear regression. Wang et al. (2014) combined Lick line indices with partial least-squares regression to measure the atmospheric parameters. The results of the partial least-squares regression model are consistent with the parameters released by the Sloan Stellar Parameter Pipeline (SSPP), and the method also reduces the computational complexity and speeds up training. Pan et al. (2015) pointed out that spectral lines differ in their sensitivity to the effective temperature of main-sequence stars, and used line indices as input to Support Vector Machines (SVM) to classify stars.

However, the Lick system contains only strong lines, which are not sufficient to parameterize A-type stars correctly. We are therefore motivated to estimate Teff and log g accurately for A-type stars and to identify relatively weak features that are sensitive to the stellar parameters. To find such additional features, we use the decision-tree-based RF algorithm to extract features beyond the Balmer lines, calcium H and K, and so on. RF is a regression method that has been used in several astronomical studies. For example, Bai et al. (2019) applied RF to regress stellar effective temperatures for the second Gaia data release with a precision of about 191 K, based on a combination of stars from four spectroscopic surveys.

In this work, we use the full-wavelength A-type spectra released in LAMOST DR7 as input to the RF algorithm to establish a regression model for the stellar parameters. We then rank the wavelengths by their sensitivity to the parameters and identify the most sensitive lines. We define line indices for these lines and compare them with the Lick indices. Using the newly defined indices, we employ Support Vector Regression (SVR) to estimate the stellar parameters of A-type stars. The temperatures and gravities from our method agree with those from LAMOST. Cross-matching with Simbad yields around 200 common stars with published parameters, and we compare our parameters with the published values for these stars. In addition, we calculate the absolute magnitudes of the stars from Gaia parallaxes and use the HR diagram to check our results.

The article is organized as follows. In Section 2, we introduce the LAMOST data we used. In Section 3, we present the application of RF regression to derive Teff, log g and [Fe/H] of A-type stars from full spectra, and the definition of specific line indices for determining the parameters of A-type stars. Section 4 introduces the application of SVR to estimate stellar parameters from our defined indices and presents HR and Keil diagrams to check the computed parameters. Section 5 summarizes the work.

2. Data

2.1. LAMOST Released Spectra of A-type Stars

The published LAMOST DR7 catalog includes 599,762 A-type star spectra, which were obtained during the pilot survey and seven years of regular survey. The A-type star catalog is available in two formats, FITS and CSV. The full spectra, ranging from 3700 to 8800 Å, are used as input to the RF algorithm in the first run. The class assigned to each star by the LAMOST analysis pipeline contains both the spectral type and the luminosity class. We also compare our defined index system with the line indices published in the LAMOST LRS Line-Index Catalog of A-Type Stars. The comparison includes kp12, Halpha12, and Hgamma12, which are the indices of the Ca II K, Hα, and Hγ lines. Teff and log g are taken from the LAMOST LRS Stellar Parameter Catalog of A, F, G, and K Stars, which includes parameters for 114,208 A-type spectra. Cross-matching with Gaia EDR3, we obtained 108,581 stars with good parallaxes. We also remove spectra classified as A-type but with temperatures lower than 7000 K. An example, “spec-55859-f5907_sp15-081.fits”, is shown in Figure 1; its effective temperature is 6833 K and its class is A9V. We thus selected A-type stellar data with temperatures from 7000 to 12,500 K and S/N greater than 50.

Figure 1. A pipeline-classified A9V-type star, “spec-55859-f5907_sp15-081.fits”, whose Teff is 6833.16 K.

Figure 2. OOB (out-of-bag) error versus the number of decision trees in the random forest.

Figure 3. Distribution of the three physical parameters Teff (effective temperature), log g (surface gravity), and [Fe/H] (chemical abundance) of the A-type stellar spectra published by LAMOST.

Figure 4. Checking Teff and log g on both the HR diagram (left panel) and the Keil diagram (right panel). Red dots in both panels represent A-type stars whose parameters were estimated from line indices.

2.2. Removing Contamination of Negative Index Values

To obtain a robust relationship between stellar atmospheric parameters and spectral features for A-type stars, we need a clean sample that is free of emission lines produced by stellar disks or by mass exchange between binary components. We checked the line indices of A-type stars released by LAMOST and removed spectra with negative index values.
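The cut on negative index values can be sketched as a simple mask over a table of indices. This is only an illustration: the function name `keep_non_negative` and the index values are hypothetical and are not taken from the LAMOST catalog.

```python
import numpy as np

def keep_non_negative(index_table):
    """Return a boolean mask selecting spectra whose line indices are all >= 0."""
    index_table = np.asarray(index_table, dtype=float)
    # NaN compares as False, so spectra with missing indices are dropped too.
    return np.all(index_table >= 0.0, axis=1)

# Hypothetical index values (in Angstrom) for four spectra and three indices.
indices = np.array([
    [2.1, 8.4, 7.9],
    [1.7, -3.2, 6.5],   # negative index: possible emission in the line
    [0.0, 9.1, 8.8],
    [3.3, 7.6, -0.4],
])
mask = keep_non_negative(indices)
clean = indices[mask]
```

Rows with any negative index are rejected as a whole, so one contaminated line is enough to drop a spectrum from the training sample.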

3. Random Forest Prediction Analysis

The random forest (RF) algorithm is an ensemble learning method in machine learning that combines many supervised prediction models. It handles high-dimensional data sets well and can accommodate thousands of input variables. The model can output the importance of each variable, which provides a basis for selecting variables from the data set. Each decision tree depends on its own random vector, and these vectors are independent and identically distributed. The most important variables can then be identified, which reduces the dimensionality. Finally, the results of the individual trees are aggregated, which improves the accuracy of the prediction model. RFs can also maintain accuracy when a large fraction of the data is missing.

3.1. Random Sampling in the Whole Dataset

From the total A-type data set of around 80 thousand spectra described in Section 2.1, we draw random samples to train the model. Section 3.3 introduces how an RF measures the distance between data points and thereby realizes the regression. When no separate validation set is available, the out-of-bag prediction error can be calculated instead: each tree predicts the values of the sample points that were not used to build it, and the out-of-bag error is obtained by comparing these predictions with the true values.

3.2. Normalization

Before establishing the RF model, we remove the pseudo-continuum of each spectrum so that only the spectral lines remain. We fit each spectrum with a ninth-order polynomial, remove points lying more than 3σ from the fitted curve, and repeat the fit iteratively four times. Each spectrum is then rectified by dividing the observed spectrum by the pseudo-continuum.
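A minimal sketch of this rectification step, assuming symmetric 3σ clipping and a fit-then-clip loop run four times, could look as follows. The function name `normalize_spectrum` and the synthetic spectrum are illustrative assumptions; the text does not specify the exact clipping scheme or implementation.

```python
import numpy as np

def normalize_spectrum(wave, flux, order=9, n_iter=4, clip_sigma=3.0):
    """Rectify a spectrum with an iteratively clipped polynomial pseudo-continuum."""
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    keep = np.ones(flux.shape, dtype=bool)
    for _ in range(n_iter):
        # Polynomial.fit rescales the wavelengths internally, which keeps a
        # ninth-order fit numerically stable.
        poly = np.polynomial.Polynomial.fit(wave[keep], flux[keep], order)
        continuum = poly(wave)
        residual = flux - continuum
        sigma = np.std(residual[keep])
        # Drop points lying more than clip_sigma * sigma from the fitted curve.
        keep &= np.abs(residual) <= clip_sigma * sigma
    return flux / continuum, continuum

# Synthetic test spectrum: sloped continuum with one Gaussian absorption line.
wave = np.linspace(3700.0, 8800.0, 2000)
true_continuum = 1000.0 + 0.2 * (wave - 3700.0)
line = 1.0 - 0.6 * np.exp(-0.5 * ((wave - 4861.0) / 10.0) ** 2)
normalized, fitted = normalize_spectrum(wave, true_continuum * line)
```

After rectification the continuum sits near unity and the absorption line keeps its fractional depth, which is the property the RF input relies on.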

3.3. Random Forest Algorithm

All random vectors in the RF are independent and identically distributed. A random forest randomizes both the column variables and the row observations of the data set to generate multiple classification or regression trees, and finally aggregates the results of these trees. Compared with neural networks, RFs reduce computation and improve prediction accuracy. Moreover, the algorithm is not sensitive to multicollinearity and is sufficiently robust to handle missing and unbalanced data.

For prediction and regression, the RF algorithm draws N randomly selected sample units from the original data to grow each decision or regression tree, and at each node selects m < M variables at random as candidates for the split, where M is the total number of variables. The number of candidate variables is the same at every node. The full-wavelength spectra are the input to the RF, and the results of the individual decision or regression trees are combined to generate the predicted values. During training, multiple decision trees are generated, and each tree produces its own prediction for the input data. The number of decision trees is a key parameter of the RF algorithm: the more trees, the better the regression results but the longer the computation time. In this work, we used 3800 decision trees, the same as the number of input spectral data points. The remaining parameters were left at their default values.

Figure 2 shows the out-of-bag (OOB) error as a function of the number of decision trees in the RF. The OOB error is an unbiased estimate of the generalization error and approximates the result of k-fold cross-validation without the additional computation that cross-validation requires. About 500 trees are enough to realize the regression. The difference for each split is less than 1. The mean of squared residuals is 4926.627 and the Var value is 96.57, which comply with the requirements of Section 3.
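The RF regression and its OOB diagnostics can be illustrated with scikit-learn's `RandomForestRegressor`. This is a sketch on synthetic data, not the authors' pipeline: the toy spectra, the number of trees, the choice `max_features=1/3`, and the use of impurity-based importances are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for rectified spectra: 300 spectra with 40 flux points each.
n_spectra, n_pixels = 300, 40
teff = rng.uniform(7000.0, 10000.0, size=n_spectra)
flux = rng.normal(1.0, 0.02, size=(n_spectra, n_pixels))
# Make pixel 10 an absorption feature that weakens as Teff rises (invented trend).
flux[:, 10] -= 0.5 * (10000.0 - teff) / 3000.0

forest = RandomForestRegressor(
    n_estimators=200,     # number of trees, kept small for the example
    max_features=1 / 3,   # m < M candidate variables at each split
    oob_score=True,       # keep out-of-bag predictions for every sample
    random_state=0,
)
forest.fit(flux, teff)

# OOB analogue of the mean of squared residuals, and the OOB R^2 score.
oob_mse = float(np.mean((teff - forest.oob_prediction_) ** 2))
oob_r2 = float(forest.oob_score_)
# Flux points ranked from most to least important.
ranking = np.argsort(forest.feature_importances_)[::-1]
```

Because every tree is scored only on the samples left out of its bootstrap draw, `oob_mse` and `oob_r2` estimate the generalization error without a separate validation set.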

We rank the wavelengths by their importance to each parameter and then identify the spectral lines on which the first 30 feature points for Teff, log g and [Fe/H] fall by searching the line table of Moore et al. (1966). The details are listed in Table 1, which gives only the main elements contributing to the spectral lines at low resolution. To keep the table concise, features that fall on the same absorption line are placed in the same entry. The first column lists the feature ID, the second the name of the line on which the feature points fall, the third the vacuum wavelength of each spectral line, and the fourth the importance of the corresponding feature as determined by the RF algorithm.
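One way to turn a ranked list of flux points into line-level entries is to keep the top-ranked points and merge those that are adjacent on the wavelength grid. The sketch below assumes an importance array is already available; the function name `group_top_features`, the `max_gap` parameter, and the toy numbers are hypothetical.

```python
import numpy as np

def group_top_features(wave, importances, n_top=30, max_gap=1):
    """Merge adjacent top-ranked flux points into (start, end, importance) features."""
    wave = np.asarray(wave, dtype=float)
    importances = np.asarray(importances, dtype=float)
    # Indices of the n_top most important points, sorted by position.
    top = np.sort(np.argsort(importances)[::-1][:n_top])
    runs = []
    start = prev = top[0]
    for idx in top[1:]:
        if idx - prev > max_gap:      # gap found: close the current run
            runs.append((start, prev))
            start = idx
        prev = idx
    runs.append((start, prev))
    features = [
        (float(wave[a]), float(wave[b]), float(importances[a:b + 1].sum()))
        for a, b in runs
    ]
    # Strongest merged feature first.
    return sorted(features, key=lambda item: item[2], reverse=True)

# Toy example: three adjacent points near 3933 A and one isolated point.
wave = np.arange(3900.0, 3950.0, 1.0)
importances = np.zeros(wave.size)
importances[[33, 34, 35]] = [0.3, 0.2, 0.1]
importances[5] = 0.05
features = group_top_features(wave, importances, n_top=4)
```

Each merged entry can then be matched against a line list to name the absorption line it falls on.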

Table 1 Identification of Elements Sensitive to Parameters Based on the Location of the First 30 Feature Points

Table 2 List of Three Newly Defined Line Indices for Parameter Regression

Table 3 Effective Temperature Teff as Predicted by Random Forest Algorithm with Three New Indices as Input

As listed in Table 1, we group adjacent wavelengths into spectral features. To obtain the lines most sensitive to the three parameters, we consider the top one or two features for each parameter. We then define the three most important features: Ca II K at 3933 Å, a blended feature of Co I, Mn I, Cr I, and V I lines ranging from 4109 to 4112 Å, and Sr II at 4077 Å. The detailed definitions, including the feature name, index bandpass, and two sidebands, are listed in Table 2.
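An index defined by a bandpass and two sidebands is commonly evaluated in the Lick style, as an equivalent width measured against a straight-line pseudo-continuum drawn through the sideband means. The sketch below assumes that convention. The band limits are hypothetical placeholders, not the values of Table 2, and `line_index` is an illustrative name.

```python
import numpy as np

def band_mean(wave, flux, band):
    """Mean wavelength and mean flux inside a (lo, hi) sideband."""
    sel = (wave >= band[0]) & (wave <= band[1])
    return wave[sel].mean(), flux[sel].mean()

def line_index(wave, flux, index_band, blue_band, red_band):
    """Equivalent-width style index in the wavelength unit of `wave`."""
    wb, fb = band_mean(wave, flux, blue_band)
    wr, fr = band_mean(wave, flux, red_band)
    sel = (wave >= index_band[0]) & (wave <= index_band[1])
    w, f = wave[sel], flux[sel]
    # Straight-line pseudo-continuum through the two sideband means.
    continuum = fb + (fr - fb) * (w - wb) / (wr - wb)
    depth = 1.0 - f / continuum
    # Trapezoidal integration of the fractional depth over the bandpass.
    return float(np.sum(0.5 * (depth[1:] + depth[:-1]) * np.diff(w)))

# Toy spectrum: flat continuum with a rectangular absorption line of depth 0.5.
wave = np.arange(3900.0, 3970.0, 0.5)
flux = np.ones_like(wave)
flux[(wave >= 3930.0) & (wave <= 3936.0)] = 0.5
# Hypothetical band limits in Angstrom, for illustration only.
bands = dict(index_band=(3927.0, 3940.0),
             blue_band=(3905.0, 3920.0),
             red_band=(3945.0, 3960.0))
ew = line_index(wave, flux, **bands)
flat_ew = line_index(wave, np.ones_like(wave), **bands)
```

Because the index is a ratio to the local pseudo-continuum, a smooth multiplicative error in flux calibration or reddening largely cancels, which is the property the text attributes to line indices.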

3.4. Random Forest with Newly Defined Line Indices

In the RF algorithm, each tree grows to its maximum extent without pruning. Training on features that perform better in regression analysis can improve the learned model. In this step, we built an RF temperature model using the Ca II K, blended Co I Mn I Cr I V I, and Sr II indices as input rather than the full spectra. The predicted effective temperatures are shown in Table 3.

4. Verification with SVR Algorithm

SVR is a widely used regression algorithm that controls the overall error and is less affected by outliers than algorithms such as linear regression. SVR builds a hyperplane in an N-dimensional vector space and aims to keep the data points within a margin around that hyperplane. We applied the SVR algorithm from the scikit-learn (sklearn) software package with the newly defined line indices as input. Compared with the LAMOST stellar parameter catalog, the precision is 123 K for Teff, 0.32 dex for log g, and 0.28 dex for [Fe/H].
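A regression of this kind can be sketched with scikit-learn's `SVR` on three index-like inputs. Everything specific here is an assumption for illustration: the synthetic indices and their trends with Teff are invented, and the RBF kernel and the values of `C` and `epsilon` are hypothetical choices, since the text does not report the settings used.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Synthetic stand-ins for three line indices; the trends are not physical.
n_stars = 400
teff = rng.uniform(7000.0, 10000.0, size=n_stars)
t = (teff - 7000.0) / 3000.0
indices = np.column_stack([
    6.0 * (1.0 - t) + rng.normal(0.0, 0.10, n_stars),
    1.0 - 0.5 * t + rng.normal(0.0, 0.05, n_stars),
    0.8 * t + rng.normal(0.0, 0.05, n_stars),
])

x_train, x_test, y_train, y_test = train_test_split(
    indices, teff, test_size=0.25, random_state=0
)

# Scale the inputs, then fit an epsilon-insensitive SVR with an RBF kernel.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1000.0, epsilon=10.0))
model.fit(x_train, y_train)

predicted = model.predict(x_test)
# Scatter of the residuals on held-out stars, in kelvin.
scatter = float(np.std(predicted - y_test))
```

The `epsilon` parameter sets the width of the tube around the regression surface inside which residuals are not penalized, which is the margin described in the text.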

4.1. Verification with Gaia Data

We cross-match our sample with Gaia using Topcat to obtain the parallaxes of these A-type stars and then calculate their absolute magnitudes. We plot the stars on both the HR and Keil diagrams, shown in Figure 4, to verify the regression results.
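The conversion from parallax to absolute magnitude follows the standard distance modulus, M = m + 5 log10(ϖ/mas) − 10 − A. The sketch below assumes parallaxes in milliarcseconds and simply inverts the parallax to get distance; the extinction term A defaults to zero because the text does not state whether a correction was applied, and the function name is illustrative.

```python
import numpy as np

def absolute_magnitude(apparent_mag, parallax_mas, extinction=0.0):
    """M = m + 5 log10(parallax / mas) - 10 - A, for positive parallaxes."""
    apparent_mag = np.asarray(apparent_mag, dtype=float)
    parallax_mas = np.asarray(parallax_mas, dtype=float)
    if np.any(parallax_mas <= 0):
        raise ValueError("parallax must be positive")
    return apparent_mag + 5.0 * np.log10(parallax_mas) - 10.0 - extinction

# Two illustrative stars: one at 100 pc (10 mas) and one at 10 pc (100 mas).
g_mag = np.array([10.0, 12.0])
parallax = np.array([10.0, 100.0])
abs_mag = absolute_magnitude(g_mag, parallax)
```

At 10 pc the absolute and apparent magnitudes coincide, which gives a quick sanity check on the sign conventions.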

4.2. HR Diagram of A-type Stars

Rotation displaces a star in the HR diagram (Figure 4), and a rotating star generally appears above the main sequence. Consider a star seen in the equatorial plane. If it were possible to increase this star’s rotational velocity, we would see it move to the right and down, that is, toward cooler temperatures and lower luminosities. A star seen pole-on, on the other hand, would move generally upward in the HR diagram, toward higher luminosities. Neither of these paths is necessarily parallel to the main sequence, so a rapidly rotating main-sequence star tends to lie above the main sequence whatever its orientation. Subtle differential effects have been detected in the spectra and photometry of rapidly rotating A-type and early F-type stars, even those seen pole-on.

5. Discussion

Because a line index is not seriously affected by noise, it is a good feature representation of stellar spectra, especially those with low S/N. In this work, we redefine a line index system using the RF algorithm. We apply the system to LAMOST DR7 and obtain precisions of 123 K for Teff, 0.32 dex for log g, and 0.28 dex for [Fe/H] relative to the LAMOST stellar parameter catalog. The indices are verified with SVR, and the results are checked against Gaia data. The results show that RFs are a very useful tool for feature extraction from high-dimensional data. For unbalanced data sets, RFs provide an effective way to balance the errors. By using our newly defined line index system to predict the stellar parameters of A-type stars, we can avoid the effects of interstellar extinction and parameter degeneracy.

Acknowledgments

We are very grateful to the anonymous referee for many useful comments and suggestions. This work was funded by the Joint Research Fund in Astronomy (Grant No. U2031142) under cooperative agreement between the National Natural Science Foundation of China (NSFC) and Chinese Academy of Sciences (CAS); the National Science Foundation for Young Scientists of China (Grant No. 11803013) and Technology Innovation Center of Agricultural Multi-Dimensional Sensor Information Perception, Heilongjiang Province.

This research uses data obtained through the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), which is funded by the National Astronomical Observatories, Chinese Academy of Sciences.

