Abstract
Variation in wood properties among geographical origins and tree species has an important influence on end-use applications. This study investigated the feasibility of classifying wood origin and species using visible and near infrared spectroscopy combined with chemometric methods. The influence of geographical origin on tree species identification was also analyzed. A total of 530 samples from 2 origins and 5 tree species were collected for analysis. The raw reflectance spectra were preprocessed by a spectral transformation technique, and nonlinear discrimination models were built by support vector machine (SVM) using various spectral forms. Three algorithms, namely grid search (GS), genetic algorithm (GA), and particle swarm optimization (PSO), were each applied to optimize the parameters of the SVM models. Regardless of spectral form and optimization technique, the prediction accuracy was lower than that of the calibration set for wood origin and tree species identification. Except for reflectance spectra, a prediction accuracy of 100 percent was obtained for origin discrimination by SVM in combination with each of the three algorithms. However, SVM in combination with reflectance spectra and the GS technique achieved the best prediction accuracy (93.18%) for tree species identification. These results demonstrated that visible and near infrared spectroscopy combined with chemometric techniques can be used for geographical origin and tree species determination.
Wood properties (e.g., physical, chemical, anatomical, and mechanical parameters) are essential for forest cultivation and resource use (Vega et al. 2021, Villa et al. 2021). However, these properties are influenced by many factors, two of which are particularly important. The first is genetics, that is, differences in genotype that produce different phenotypes or tree species (Hamilton and Potts 2007). The second is the environment, including factors such as origin and climate (Cordier et al. 2021). Because wood properties differ, different tree species are put to different uses.
However, to maximize economic benefits, rare or expensive wood may be substituted with cheap or illegally sourced wood of similar appearance. For example, in the musical instrument industry, the best raw material for guitar soundboards and fingerboards is rosewood from Brazil (Phillips 2009), which some companies may replace with North American rosewood. Likewise, in the furniture industry, classical Chinese furniture made of yellow rosewood from Hainan is expensive and rare because the wood comes from one of the most endangered trees in China (Huang et al. 2018), whereas rosewood products from other locations are relatively cheap. However, it is difficult to distinguish these two types of wood, even for a perceptive or experienced observer. Therefore, a fast, accurate, and nondestructive technology is needed to classify tree species and geographical origin for large numbers of samples.
Visible and near infrared (Vis-NIR) spectroscopy has been widely used for qualitative and quantitative analysis in many applications (van Kollenburg et al. 2021, Kapoor et al. 2022, Paltseva et al. 2022). Vis-NIR spectra include a visible region, which covers wavelengths from 380 nm to 780 nm, and a near infrared region, which comprises short-wavelength (780 to 1,100 nm) and long-wavelength (1,100 to 2,526 nm) near infrared regions (Scotter 2005). Sample information can be extracted from Vis-NIR spectra with appropriate chemometric techniques. Thus, the selection and optimization of chemometric methods are essential for spectral analysis. For the qualitative analysis of spectral data, it is not necessary to measure the concentration of samples (Lu 2007). Therefore, the prediction results are highly affected by spectral quality and chemometric methods. Some spectral preprocessing approaches are able to remove noise or reduce irrelevant variables. These techniques include multiplicative scatter correction, standard normal variate, and the successive projections algorithm, among others. The performance of a single method or a combination of methods has been analyzed in many studies (Galvão et al. 2007, Cai et al. 2010, Jiang et al. 2019). Another crucial part of chemometric analysis is building the calibration model, for which both linear and nonlinear modelling techniques are available.
In the forestry field, the traditional techniques for discriminating geographical origin and tree species are laboratory detection methods or sensory assessment by trained personnel, which have the disadvantages of being time-consuming, labor-intensive, and destructive. Populus davidiana, Tilia tuan Szyszyl., Ulmus pumila L., Acer mono Maxim., and Larix gmelinii have been widely planted in ecological public welfare forests and have high value in terms of ecological, economic, and social benefits. For instance, the stems of Ulmus pumila L. are used as materials in furniture and vehicles; additionally, it is a medicinal plant whose bark, leaves, and winged fruits could play an important role in treating anxiety and promoting diuresis (Wang et al. 2020). Similarly, Tilia tuan Szyszyl. is a medicinal and honey plant that is valued for reducing the risk of heart disease, high blood pressure, constipation, and insomnia (Wu and Zhang 2022). This study aimed to investigate the feasibility of origin and tree species classification using Vis-NIR spectra in combination with chemometric methods. A spectral transformation algorithm was employed to analyze the performance of various spectral data. A support vector machine (SVM) was used to classify wood geographical origin and tree species. Three algorithms (i.e., grid search [GS], genetic algorithm [GA], and particle swarm optimization [PSO]) were each adopted to optimize the parameters of the SVM models.
Materials and Methods
Wood sample origin information
To better analyze the performance of Vis-NIR spectra in combination with chemometric methods for wood origin and tree species identification, 354 samples of 4 species were collected from the Jinsha forest of Heilongjiang Province (131°08′ to 131°21′E, 45°44′ to 45°53′N), and 176 samples of 2 species were collected from the Jingshan forest of Jilin Province (126°30′ to 127°16′E, 42°06′ to 42°48′N), China (Fig. 1). To account for the influence of origin on tree species discrimination, samples of the same species (Chinese white poplar) and the same age class were harvested from both regions. The first location (Heilongjiang Province) has a temperate continental monsoon climate and dark brown soil, with a total area of 2,765 hm². Average annual relative humidity is between 65 and 75 percent (Jiang 2014). The second location (Jilin Province) has an East Asian monsoon climate and an average elevation of 560 m (Tan 2012). The main soil types are black soil and loess soil, with a soil thickness of 35 to 50 cm (Liu et al. 2012).
Samples preparation and spectral measurement
Twenty-seven standard trees, comprising eight Acer mono Maxim., four Populus davidiana, eight Ulmus pumila L., and seven Tilia tuan Szyszyl., were randomly harvested from the first location. Five Populus davidiana and five Larix gmelinii were collected from the second location at the same time. These six groups were labeled A, B, C, D, E, and F, respectively. Five-centimeter disks were cut from each tree, and 530 wood samples measuring 2 by 2 by 2 cm were prepared from the two regions for spectral measurement (Table 1).
The reflectance spectra of the wood samples were measured on the cross-section using a spectrometer with a wavelength range from 350 to 2,500 nm (LabSpec Pro, Analytical Spectral Devices, Boulder, Colorado, USA). The spectral resolution was 3 nm at 700 nm and 10 nm at 1,400 and 2,100 nm. After the spectrometer had been preheated for half an hour, two spectra were measured for each sample and their average was treated as the raw spectrum. A white panel, consisting primarily of polytetrafluoroethylene (PTFE), was used to recalibrate the instrument after every three wood samples (Davari et al. 2022).
Chemometric analysis
Spectral preprocessing.
The qualitative analysis of wood origin and tree species is mainly affected by spectral quality and the discrimination model. In terms of spectral data, setting aside the influence of environmental factors and instrumental background, the form in which the spectra are expressed is essential for the classification results. For this reason, the raw reflectance spectra (R) were transformed into reciprocal (1/R) and logarithmic [log(1/R)] reflectance by spectral transformation technology to reduce the nonlinear relationship and multiplicative light-scattering effects (Zhuang et al. 2017). The spectral transformations were determined element-wise by the following equations:
$$X^{(1/R)}_{mn} = \frac{1}{X_{mn}} \tag{1}$$
$$X^{\log(1/R)}_{mn} = \log\left(\frac{1}{X_{mn}}\right) \tag{2}$$
where $X_{mn}$ represents the raw reflectance spectral matrix, and m and n are the number of samples and wavelength variables, respectively. The transformations were implemented in Matlab R2010b (MathWorks, Natick, Massachusetts, USA).
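As an illustration of the two transformations, the following is a minimal Python sketch (the study itself used Matlab). The function names are hypothetical, the reflectance values are made up, and the base-10 logarithm is an assumption, since the base is not stated in the text.

```python
import math

def reciprocal(spectrum):
    """Return 1/R for one reflectance spectrum (list of positive floats)."""
    return [1.0 / r for r in spectrum]

def log_reciprocal(spectrum):
    """Return log10(1/R) for one spectrum; base 10 is an assumption."""
    return [math.log10(1.0 / r) for r in spectrum]

def transform_matrix(X, fn):
    """Apply a per-spectrum transform to every row of an m-by-n matrix."""
    return [fn(row) for row in X]

# Toy 2-by-3 reflectance matrix (made-up values, not data from the study):
# rows are samples (m = 2), columns are wavelength variables (n = 3).
X = [[0.5, 0.25, 0.1],
     [0.8, 0.4, 0.2]]

X_inv = transform_matrix(X, reciprocal)       # 1/R form
X_log = transform_matrix(X, log_reciprocal)   # log(1/R) form
```

Both transforms are applied element by element, so the matrix keeps its m-by-n shape and only the scale of each band changes.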
Development of classification models.—
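The SVM was employed to establish classification models. The basic idea of SVM is to determine the optimal hyperplane that separates the data samples with the largest margin. The kernel function is used to map spectral data onto the feature space. Spectral data can be fitted using the following equation:
$$f(x) = w^{T} g(x) + b \tag{3}$$
where $(x_i, y_i)$ is the spectral data of the training set, $i = 1, 2, \ldots, n$, $y_i \in \{-1, +1\}$ is the class label, n is the number of samples, w is the weight vector, g(x) is the nonlinear mapping function, and b is a threshold value. According to the structural risk minimization principle, the optimal separating hyperplane should satisfy
$$\min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^{2} \tag{4}$$
$$\text{s.t.} \quad y_i\left(w^{T} g(x_i) + b\right) \geq 1, \quad i = 1, 2, \ldots, n \tag{5}$$
The optimization problem is transformed into a classification problem by adding the relaxation factor (Yang 2021). It can be expressed as follows:
$$\min_{w,\,b,\,\xi} \ \frac{1}{2}\lVert w \rVert^{2} + c\sum_{i=1}^{n}\xi_i \quad \text{s.t.} \quad y_i\left(w^{T} g(x_i) + b\right) \geq 1 - \xi_i, \quad \xi_i \geq 0 \tag{6}$$
Here, c is the penalty factor and $\xi_i$ is the relaxation factor ($\xi_i \geq 0$). According to the Lagrange algorithm, Equation 6 can be transformed into the following equation:
$$\max_{\alpha} \ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n}\alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq c \tag{7}$$
where $\alpha_i$ are the Lagrange multipliers and $K(x_i, x_j) = g(x_i)^{T} g(x_j)$ represents the kernel function. There are four types of kernel function: linear, radial basis function (RBF), sigmoid, and polynomial. The RBF kernel was applied in this study because of its good prediction ability. The SVM was implemented in Matlab R2010b.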
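To make the role of the kernel parameter g concrete, here is a minimal Python sketch of the RBF kernel in its common form K(xi, xj) = exp(-g·||xi − xj||²). That this exact form matches the toolbox used in the study is an assumption, and the function names and toy vectors are hypothetical.

```python
import math

def rbf_kernel(x_i, x_j, g):
    """RBF kernel: exp(-g * squared Euclidean distance between x_i and x_j)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-g * sq_dist)

def gram_matrix(X, g):
    """Kernel (Gram) matrix K[i][j] = K(x_i, x_j) for all pairs of rows in X."""
    return [[rbf_kernel(a, b, g) for b in X] for a in X]

# Toy spectra with two wavelength variables each (made-up values).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]

K_small_g = gram_matrix(X, g=0.01)  # small g: all entries stay close to 1
K_large_g = gram_matrix(X, g=10.0)  # large g: off-diagonal entries near 0
```

With a small g the kernel changes slowly with distance, so samples look similar to one another and the decision boundary is smooth; with a large g each sample mostly resembles only itself, which tends toward overfitting.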
The optimization of models.—
SVM in combination with a kernel function can solve nonlinear identification problems. However, parameter optimization is essential for achieving good prediction accuracy (Tian et al. 2012). Three algorithms, grid search (GS), genetic algorithm (GA), and particle swarm optimization (PSO), were compared for optimizing the penalty factor (c) and the kernel function parameter (g) in the discrimination of wood origin and tree species based on various spectral data.
As for the GS method, the grid was established from $2^{-8}$ to $2^{8}$ with the moving step of $2^{0.5}$. The fitness extreme value can be updated according to the grid parameters until the best c and g are achieved. For the GA and PSO techniques, initial sizepop and maxgen are 20 and 100, respectively. Five-fold cross-validation was performed for the SVM model optimized by these three algorithms. The model with the largest accuracy rate of cross-validation was used to predict after five runs in this study.
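The grid described above can be sketched in Python as follows. This is a minimal illustration, not the study's Matlab code: the helper names are hypothetical, the fold split is a plain shuffled split, and `toy_cv_accuracy` is a made-up stand-in for the real step of training an SVM on each fold and returning the mean validation accuracy.

```python
import math
import random

def power_of_two_grid(lo=-8.0, hi=8.0, step=0.5):
    """Return [2**lo, 2**(lo + step), ..., 2**hi]."""
    n_steps = int(round((hi - lo) / step))
    return [2.0 ** (lo + k * step) for k in range(n_steps + 1)]

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search(cv_accuracy, grid_c, grid_g):
    """Evaluate every (c, g) pair and return (best_accuracy, best_c, best_g).

    cv_accuracy(c, g) is supplied by the caller; ties keep the first pair found.
    """
    best = (float("-inf"), None, None)
    for c in grid_c:
        for g in grid_g:
            acc = cv_accuracy(c, g)
            if acc > best[0]:
                best = (acc, c, g)
    return best

def toy_cv_accuracy(c, g):
    """Made-up score surface peaking at c = 2**2, g = 2**-3 (illustration only)."""
    return -abs(math.log2(c) - 2.0) - abs(math.log2(g) + 3.0)

grid = power_of_two_grid()            # 33 values per parameter
folds = k_fold_indices(100, k=5)      # 100 is an arbitrary sample count
best_acc, best_c, best_g = grid_search(toy_cv_accuracy, grid, grid)
```

Stepping the exponent by 0.5 gives 33 candidate values for each parameter, so an exhaustive search evaluates 33 × 33 = 1,089 (c, g) pairs, each requiring five model fits under five-fold cross-validation.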
Overview of classification analysis.—
Using the spectral transformation technology, the raw reflectance spectra (R) of each tree species were converted to reciprocal (1/R) and logarithmic [Log(1/R)] reflectance. SVM models were then applied to classify origins and tree species based on these three types of wood spectra. The GS, GA, and PSO methods were employed to optimize the parameters of the SVM model and the RBF kernel function. The optimal classification model was selected according to the cross-validation accuracy rate. The category labels of samples are shown in Table 2.
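For readers unfamiliar with PSO, the following is a minimal Python sketch of a basic particle swarm search over (log2 c, log2 g). Only sizepop = 20 and maxgen = 100 come from the text; the inertia weight w, the acceleration constants c1 and c2, the search bounds, and the toy fitness function are all assumptions, and in practice the fitness would be the cross-validation accuracy of an SVM.

```python
import random

def pso_maximize(fitness, bounds, sizepop=20, maxgen=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; returns (best_position, best_fitness, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(sizepop)]
    vel = [[0.0] * dim for _ in range(sizepop)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_fit = [fitness(p) for p in pos]
    top = max(range(sizepop), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[top][:], pbest_fit[top]
    history = [gbest_fit]                       # best fitness per generation
    for _ in range(maxgen):
        for i in range(sizepop):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history

def toy_fitness(p):
    """Made-up smooth surface peaking at log2(c) = 2, log2(g) = -3."""
    return -((p[0] - 2.0) ** 2 + (p[1] + 3.0) ** 2)

best_pos, best_fit, history = pso_maximize(toy_fitness, [(-8.0, 8.0), (-8.0, 8.0)])
best_c, best_g = 2.0 ** best_pos[0], 2.0 ** best_pos[1]
```

The `history` list records the best fitness after each generation, which is the kind of fitness-versus-iteration curve typically plotted for GA and PSO runs. Unlike grid search, the swarm evaluates at most sizepop × maxgen candidate pairs and is not restricted to grid points.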
Results and Discussion
Wood spectral analysis
The average spectrum of each spectral form obtained by spectral transformation is shown in Figure 2. For a given spectral form, the spectral curves of all tree samples follow a similar pattern. In addition, the characteristic wavelength variables, namely crests and troughs, are at almost the same positions. Except for reflectance spectra (R), good spectral consistency is achieved according to spectral distance in the range of 750 to 1,750 nm. However, large differences among trees are observed in the visible region. This is mainly because pigment absorption is located in the visible region (Moberg et al. 2002) and there is a color difference between heartwood and sapwood. In addition, regardless of spectral form, the spectra of Larix gmelinii (softwood) differ from those of the other trees (hardwoods). For example, the maximum peak value in the reflectance spectra was generated by L. gmelinii wood. This may be due to the differences in physicochemical and anatomical properties between hardwood and softwood.
It can be seen in Figure 2a and c that the six trees showed obvious peaks in the R and Log(1/R) spectra. In the Log(1/R) spectra, the absorption peak at 1,448 nm was associated with the first overtone of O-H stretching in lignin, and the peak at 1,940 nm was related to the O-H of water (Bassett et al. 1963). In the R spectra, the feature peaks at 1,212 nm, 1,447 nm, 1,929 nm, and 2,200 nm were attributed to the C-H and O-H bands of cellulose, lignin, and water (Mitsui et al. 2008). However, it is difficult to classify geographical origin and tree species by visual inspection of these spectra because of spectral overlap and broad bandwidths. Therefore, the SVM method was employed to build discrimination models.
Geographical origin classification models
The penalty factor (c) and the kernel function parameter (g) are essential for the SVM model because they determine the number of support vectors and the behavior of the kernel. To select the best hyperparameters of the SVM models, the GS, GA, and PSO algorithms were each employed to optimize these parameters of the geographical origin classification models based on the various spectral data. The parameter optimization processes for Log(1/R) spectra are shown in Figure 3.
As can be seen in Figure 3, for the GS method with Log(1/R) spectra, the accuracy increases gradually with increasing c until it reaches its maximum value. For the GA method, the fitness value increases rapidly before 41 evolution iterations. In contrast, for the PSO method, the fitness value tends to stabilize at approximately 100 evolution iterations. The parameter optimization results for each algorithm are shown in Table 3.
As seen in Table 3, the optimal penalty factor differed among the algorithms for all three spectral forms. Regardless of the algorithm, a small g value was obtained for all three spectral forms. This indicated that the RBF kernel was operating close to its linear limit, that is, with relatively smooth decision boundaries, which is one of the major factors affecting the models in this study. The performance of SVM models based on the optimal hyperparameters for geographical origin classification is shown in Figure 4.
For a given spectral form, the performance of origin classification was the same with all three algorithms (GS, GA, and PSO); the prediction results of SVM models optimized by the GS algorithm with various spectral data are shown in Table 4. As can be seen, the SVM model demonstrated good performance, with an accuracy rate of 100 percent for the calibration set (data not shown) and the prediction set, except for R spectra. With R spectra, two samples from Heilongjiang Province (location 2) in the prediction set were misclassified as samples from Jilin Province (location 1), and the same accuracy rate of 98.48 percent was obtained with all three algorithms. This indicated that Log(1/R) and 1/R spectra performed better than R spectra in the discrimination of geographical origin.
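The reported accuracy rates can be checked with simple arithmetic. The sketch below is a hedged illustration: the size of the prediction set is not stated in the text, so it is inferred here as the sample count for which two misclassifications round to 98.48 percent, and the function name is hypothetical.

```python
def accuracy_percent(n_correct, n_total):
    """Accuracy rate in percent, rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)

# Which prediction-set sizes give 98.48 percent with exactly two errors?
# The upper bound of 530 is the total number of samples in the study.
candidates = [n for n in range(3, 531)
              if accuracy_percent(n - 2, n) == 98.48]

# Assumption: the prediction set has the single size found above.
n_prediction = candidates[0]
origin_r_accuracy = accuracy_percent(n_prediction - 2, n_prediction)
```

Under this assumption the prediction set would contain 132 samples, and the 93.18 percent reported for tree species prediction would then correspond to 123 correctly classified samples out of 132.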
Tree species discrimination models
To better compare the ability of spectral transformation and these algorithms to discriminate tree species, and to take the impact of geographical origin on tree species classification into account, the same tree species was collected from both locations in this study. The classification results of tree species with various spectral forms and algorithms are shown in Table 4.
As the table shows, for R spectra, the calibration set achieved the same classification results with all three algorithms, with an accuracy rate of 98.49 percent, and misjudgments occurred in only three tree species (A, B, and C). However, different accuracies were obtained for the prediction set. The GS method performed better than the GA and PSO techniques. This may be because the number of misjudgments of tree species B was smaller than with the other two methods. In addition, the distribution of misjudgments of tree species B differed among the three algorithms. For example, with the GA method, 2 of the 20 trees of species B in the prediction set were misclassified as each of tree species A, C, and D. With the PSO technique, however, 1 of the 20 trees was misjudged as each of A and C, and 3 trees were misclassified as D. Additionally, for the GA method, although the distribution of misjudgments of tree species D differed from that of the other two methods, the number of misjudgments was the same. The best discrimination results for R spectra were obtained with the GS method, with a prediction accuracy rate of 93.18 percent.
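The misjudgment counts for tree species B can be laid out as one row of a confusion matrix. The sketch below is illustrative only: it assumes that "respectively" means 2 trees were assigned to each of A, C, and D under GA (and 1, 1, and 3 under PSO), and the function name is hypothetical.

```python
def per_class_recall(row, true_label):
    """Percent of samples of one true class that were assigned to that class."""
    total = sum(row.values())
    return 100.0 * row.get(true_label, 0) / total

# Confusion-matrix rows for true class B (20 prediction samples), R spectra.
# Keys are predicted labels; counts follow the reading stated above.
ga_row_b = {"A": 2, "B": 14, "C": 2, "D": 2}
pso_row_b = {"A": 1, "B": 15, "C": 1, "D": 3}

ga_recall_b = per_class_recall(ga_row_b, "B")    # share of B kept as B by GA
pso_recall_b = per_class_recall(pso_row_b, "B")  # share of B kept as B by PSO
```

Read this way, 14 of 20 trees of species B are recognized with GA and 15 of 20 with PSO, which shows why a per-class view is more informative than overall accuracy when errors concentrate in one species.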
For 1/R spectra, different results were generated by the three algorithms. Additionally, compared with R spectra, the accuracy rate decreased for both the calibration and prediction sets, except for the calibration set of the PSO method. The best accuracy rate for 1/R spectra was achieved with the GS technique. For Log(1/R) spectra, the performance of the calibration set improved, except for the model optimized by the GA method. However, the prediction accuracy of GA was the same as that of the PSO technique because the number of misjudgments was the same. The best results for the prediction set were achieved by the GS method, with an accuracy rate of 93.18 percent. Additionally, regardless of algorithm and spectral form, an accuracy rate of 100 percent was obtained for the tree species from the second location. This indicated that, for these algorithms, tree species from the second location were discriminated better than those from the first location. Compared with other studies (Korpela et al. 2011, Lang et al. 2017, Zulfa et al. 2020), although an accuracy rate of 100 percent was achieved for the classification of tree species from the second location, there were a large number of misjudgments for species A, B, and C from the first location. This may be due to the single scale of the tree species spectra or to spectral noise. Therefore, spectral data at different scales and spectral denoising techniques should be analyzed in future work.
Conclusions
The combination of Vis-NIR spectroscopy and chemometric techniques has been shown to be a suitable method for the determination of geographical origin and tree species. For origin classification, except for reflectance spectra, a prediction accuracy of 100 percent was achieved with all three algorithms. However, reflectance spectra together with the GS technique demonstrated the best performance for tree species discrimination. Furthermore, these spectral forms and algorithms performed well in tree species identification for the second location, according to the discrimination results for Populus davidiana wood. Therefore, an appropriate combination of these algorithms can be an alternative for geographical origin and tree species classification. Although good results were achieved with Vis-NIR spectra and chemometric methods, a limitation of this study is that the models can be used only for the determination of these five tree species (Populus davidiana, Tilia tuan Szyszyl., Ulmus pumila L., Acer mono Maxim., and Larix gmelinii). Therefore, the spectral data of other tree species, especially tree species listed by the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES), should be added in future studies.
Contributor Notes
The authors are, respectively, Lecturer (yingli@nefu.edu.cn) and Professor (xhk1512@163.com [corresponding author]), College of Energy and Transportation Engineering, Inner Mongolia Agric. Univ., Hohhot 010018, China; Professor, Forest Products Development Center, School of Forestry and Wildlife Sci., Auburn Univ., Auburn, Alabama (brianvia@auburn.edu); and Professor, College of Engineering and Technol., Northeast Forestry Univ., Harbin 150040, China (yaoxiangli@nefu.edu.cn). This paper was received for publication in February 2022. Article no. FPJ-D-22-00011R1.