Abstract
Due to the multidimensional complexity and redundancy between wavelengths in the visible and near infrared (Vis-NIR) region, the speed and accuracy of data analysis can be affected. This study investigated the feasibility of simplifying high-dimensional data through spectral transformation and local correlation maximization (LCM). These two methods were applied to predict the air-dry density of Ulmus pumila wood. The reflectance spectra (Refl.) were subjected to reciprocal (1/Refl.) and logarithmic (log(Refl.) and log(1/Refl.)) transformations to improve the spectral signal for prediction. LCM was used to select spectrally sensitive regions that were important in the prediction of density. A local correlation coefficient (r) criterion was applied such that wavelengths with r ≥ 0.75 (between wavelength and density) were retained, and partial least squares and support vector machine (SVM) were then employed as the prediction methods. Likewise, 2D correlation spectroscopy plots were used to further reduce the data matrix by removing redundant wavelengths. The results showed that (1) although the sensitive regions for density differed among the transformations, the wavelengths with r ≥ 0.80 were mainly located in the same Vis and NIR spectral regions. Additionally, models developed from the sensitive region performed better than models developed from the less-sensitive region. (2) The SVM model was optimized by a genetic algorithm based on the log(1/Refl.) spectra of the sensitive region. In conclusion, spectral transformation combined with the sensitive region gave better density estimation results (R² = 0.909, root mean square error of calibration = 0.014) than when less-sensitive wavelengths were used in the data matrix.
Wood density is a critical indicator of wood quality and variation in density can have a profound effect on end use applications (Prasetyo et al. 2018). Additionally, tissue density correlates with the morphological, physiological, and mechanical properties of wood as a result of strong intercorrelations with microfibril angle, stiffness, and strength (Wu et al. 2009, Li and Jiang 2013, Dahlen et al. 2018). The tree can begin to produce tissue of variable density as early as 3 months old, while genetics can be used to fine-tune tree density for improved product performance (Gonçalves et al. 2019). Although the traditional determination of density through gravimetric means has the best accuracy, it has been shown that visible and near infrared (Vis-NIR) spectroscopy can provide good in situ estimation of within-ring density without having to destroy the sample for conventional testing (Giroud et al. 2015).
Vis-NIR can also reduce the cost and time of traditional density-measurement methods. Vis-NIR spectroscopy is a simple, fast, and nondestructive method that has been widely applied to agriculture, petrochemical industries, food safety, and life sciences (Verbeek et al. 2014, Li et al. 2017, Siriphollakul et al. 2017, Zhuang et al. 2017). Many studies have demonstrated that Vis-NIR technology can be used for the analysis of components, species and habitat identification, and detection of wood preservation or modification in the field of forestry (Yang et al. 2012; Dou et al. 2016; Li et al. 2016, 2018a; Kurata 2017). The qualitative and quantitative relationship between spectra and wood properties can be obtained using chemometric methods. However, spectral data analysis can be complex and slow because of the multidimensional nature and redundancy of the spectral data. Reducing the spectra to only those wavelengths necessary for prediction could speed up the process; however, a reduction in the data matrix content could also lower prediction accuracy. Maintaining predictability with less spectral information is therefore a key challenge when using Vis-NIR spectroscopy.
In the field of agriculture, Sun and Cheng (2010) found that leaf chlorophyll was highly correlated with the second-derivative spectra at wavelengths of 700, 670, 600, 500, 490, 440, and 410 nm. Guo et al. (2015) predicted paddy soil available nitrogen (AN) content using sensitive wavelengths (694, 2,058, and 2,189 nm) obtained through 16 kinds of mathematical transformations and achieved good results (best R = 0.748). These studies indicate that, in the agricultural sector, the spectrally sensitive region of a particular trait can be used for calibration and that simplified spectral data can still give good prediction accuracy. However, to our knowledge, no other studies have used these transformation techniques to predict wood density, and we believe this research would be useful to industry and academic entities.
In this study, Ulmus pumila wood samples were used for spectra collection and air-dry density determination. Different spectral transformations were first applied to the wood spectra to improve the sensitivity of the prediction model. Local correlation maximization (LCM) methods (Zhang et al. 2017) were developed for selecting the spectral sensitive region most correlated to air-dry density. The chosen sensitive region was used for model establishment with linear (partial least squares, PLS) and nonlinear methods (support vector machine, SVM), respectively.
Materials and Methods
Sample preparation
Ulmus pumila L. is one of the major commercial tree species in northeastern China (Xu et al. 2000). Eight Ulmus pumila trees were harvested from the location 126°30′–127°16′E, 42°06′–42°48′N, Jilin Province, China. Five-centimeter disks were cut from each tree at 1-m intervals along the stem, and a total of 69 disks were prepared for spectra collection and model calibration. The disks were air-dried in an environmentally controlled laboratory (temperature: 20°C ± 2°C; relative humidity: 65% ± 3%). Additionally, to reduce the roughness of the sample surface, the cross-sections of all disks were polished using an electric plane.
Vis-NIR spectra collection and air-dry density measurement
Cell spacing, anatomy position–frequency, and annual ring width influence wood density, and these variables are more obvious when viewed in the cross-section. Therefore, the Vis-NIR spectra were collected from a cross-section of each sample using a LabSpec Pro FR/A114260 (Analytical Spectral Devices, Inc., Boulder, Colorado). Additionally, the traditional fiber-optic probe was replaced with a glare probe to obtain more spectral information. Before spectra collection, the spectrometer was calibrated with a commercial white plate made of polytetrafluoroethylene. Each sample was scanned three times and the average spectrum was regarded as the original spectrum. The air-dry density of the samples was measured at 12 percent moisture content (Standardization Administration of China 2009).
Spectral data analysis
Spectral transformations
Even though the whole process of spectra collection was conducted in a controlled environment, uncontrollable sources of error, such as spectrometer signal variation and environmental influences, can still affect data quality and the corresponding accuracy and precision (Via et al. 2005). Therefore, the reflectance spectra (Refl.) were subjected to reciprocal (1/Refl.) and logarithmic (log(Refl.) and log(1/Refl.)) transformations to eliminate multiplicative effects (He et al. 2006). The spectral transformations were implemented in Matlab R2014b (MathWorks, Natick, Massachusetts).
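The three transformations can be illustrated with a short sketch. The authors worked in Matlab; the Python below is only an illustration, and the function name `transform_spectrum`, the dictionary keys, and the choice of a base-10 logarithm are assumptions, because the text does not state the logarithm base.

```python
import math


def transform_spectrum(refl):
    """Return the four spectral forms for one reflectance spectrum.

    `refl` is a sequence of reflectance values, one per wavelength.
    Every value must be positive so that 1/Refl. and the logarithms exist.
    """
    if any(r <= 0 for r in refl):
        raise ValueError("reflectance values must be positive")
    return {
        "Refl.": list(refl),
        "1/Refl.": [1.0 / r for r in refl],
        # Base-10 logarithm is an assumption; the source does not give the base.
        "log(Refl.)": [math.log10(r) for r in refl],
        "log(1/Refl.)": [math.log10(1.0 / r) for r in refl],
    }


# Hypothetical two-wavelength spectrum, used only to show the call.
forms = transform_spectrum([0.5, 0.25])
```

Whatever the base, log(1/Refl.) is the negative of log(Refl.), which is why the two forms give correlation curves that mirror each other.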
Selection of sensitive regions to air-dry density variation
To simplify the modeling of multidimensional spectral data, the reflectance spectra (Refl.) were subjected to reciprocal and logarithmic transformations. The LCM was then used to select the wavelengths that were important in the prediction of air-dry density. A region was deemed statistically important if the local correlation coefficient (r) between the spectra and density was greater than or equal to 0.75 (r ≥ 0.75). The LCM was also implemented in Matlab R2014b (MathWorks).
In this study, 69 samples were randomly divided into a calibration set (50 samples) and a prediction set (19 samples). Let X(m,n) be the matrix of the Vis-NIR spectral data, xij the spectral value of sample i at wavelength j, and yi the air-dry density value of sample i, where m is the number of samples (m = 50 for the calibration set) and n is the number of wavelength variables in the Vis-NIR spectra (n = 2,151). The LCM was used to analyze the correlation between xij and yi, and local correlation coefficients (r) were obtained for each spectral transformation. A strong correlation corresponds to a high r (Mo 2015). The local correlation coefficient at wavelength j is computed as follows:
$$r_j = \frac{\sum_{i=1}^{m} (x_{ij} - \bar{x}_j)(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_{ij} - \bar{x}_j)^2 \, \sum_{i=1}^{m} (y_i - \bar{y})^2}} \qquad (1)$$
where x̄j and ȳ represent the mean of the Vis-NIR spectral data at wavelength j and the mean air-dry density value, respectively, which are given by
$$\bar{x}_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij} \qquad (2)$$
$$\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i \qquad (3)$$
where i = 1, 2, …, m and j = 1, 2, …, n; m and n have the same meanings as in Eq. (1).
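The wavelength selection step can be sketched as follows, assuming that r in Eq. (1) is the ordinary Pearson correlation coefficient computed column by column. The helper names (`pearson_r`, `local_correlations`, `sensitive_indices`) are hypothetical, and applying the threshold to the absolute value of r is also an assumption, made here because some transformations give negative correlations with density. This is an illustrative Python sketch, not the authors' Matlab implementation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    m = len(x)
    x_mean = sum(x) / m
    y_mean = sum(y) / m
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant input: r is undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def local_correlations(X, y):
    """Return r_j for every wavelength column j of X (rows are samples)."""
    n = len(X[0])
    return [pearson_r([row[j] for row in X], y) for j in range(n)]


def sensitive_indices(r_values, threshold=0.75):
    """Indices of wavelengths whose |r| meets the threshold (assumed rule)."""
    return [j for j, r in enumerate(r_values) if abs(r) >= threshold]


# Hypothetical data: 4 samples by 3 wavelengths, with made-up densities.
X = [[1, 5, 3], [2, 4, 3], [3, 3, 3], [4, 1, 3]]
y = [1.0, 2.0, 3.0, 4.0]
selected = sensitive_indices(local_correlations(X, y))
```

In this toy example the first two columns are retained and the constant third column is dropped; with real spectra the same call would be repeated for each of the four spectral forms.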
Vis-NIR model calibration and evaluation
Vis-NIR technology is an indirect, nondestructive method in which models for predicting sample properties are obtained through advanced chemometric procedures. In this study, the performance of a linear method (PLS regression) and a nonlinear method (SVM) was compared. PLS is a useful linear method for modeling multidimensional spectral data. It simplifies spectral data and selects variables by relating the spectra to the dependent variable yi. PLS was implemented in The Unscrambler V10.4 (CAMO Software AS, Oslo, Norway). PLS does inflate the error of the peaks in the loading vector; however, transformations such as those explored here have been shown to minimize this error (Via et al. 2014).
The SVM was first proposed to solve classification problems. In recent years, it has been shown that SVM is a powerful method for classification and regression in agriculture, life sciences, and other fields (Hajikhodaverdikhan et al. 2018, Oguntunde et al. 2018, Zhi et al. 2018). SVM was used for Vis-NIR model calibration because it handles small sample sets and high-dimensional spaces effectively. Grid search (GS) and a genetic algorithm (GA) were employed to optimize the cost parameter c and the radial basis function (RBF) kernel parameter gamma. The SVM was implemented in Matlab R2014b (MathWorks). It is hypothesized that SVM may be superior to the more traditional methods that are commonly used (Kurata 2017), and testing this hypothesis is a subject of this research.
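The grid search idea can be sketched generically. The Python below is an illustration only: `score` is a hypothetical callback that would return a validation error (for example, a cross-validated RMSE from an RBF-kernel SVM) for a given pair of c and gamma, and the base-2 logarithmic grid is an assumption, since the search ranges used in the study are not stated.

```python
import itertools


def grid_search(score, c_exponents, gamma_exponents):
    """Exhaustively evaluate `score` on a base-2 grid of (c, gamma).

    Returns (best_error, best_c, best_gamma), where best_error is the
    smallest value returned by the hypothetical `score(c, gamma)` callback.
    """
    best = None
    for ce, ge in itertools.product(c_exponents, gamma_exponents):
        c, gamma = 2.0 ** ce, 2.0 ** ge
        err = score(c, gamma)
        if best is None or err < best[0]:
            best = (err, c, gamma)
    if best is None:
        raise ValueError("the parameter grid is empty")
    return best
```

Because only the listed grid points are evaluated, an optimum lying between points or outside the chosen ranges cannot be found, which is one reason a heuristic search such as a GA may reach a better parameter pair.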
After the sensitive regions that correlated with air-dry density were selected using the LCM algorithm, the PLS and SVM methods were applied to establish the Vis-NIR models. Additionally, the performance of the bands with higher local correlation coefficients (r ≥ 0.80) within the sensitive region and of the less-sensitive region (0.70 < r < 0.75) was compared. The performance of the calibration and prediction models was evaluated based on the determination coefficient (R²), root mean square error (RMSE), standard error of estimation (SEE), mean absolute percentage error (MAPE), and residual predictive deviation (RPD). Lower RMSE, SEE, and MAPE values and higher R² and RPD values indicate higher model accuracy (Yan et al. 2013). The criteria were calculated according to Eqs. (4) to (8), respectively.
where SD is standard deviation of prediction set.
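A minimal sketch of the five criteria is given below. The definitions are commonly used forms and are assumptions here, because they may differ in detail from Eqs. (4) to (8): R² as one minus the ratio of residual to total sum of squares, SEE as the bias-corrected standard deviation of the residuals, MAPE in percent, and RPD as SD divided by RMSE. The function name `evaluate` is hypothetical.

```python
import math


def evaluate(measured, predicted):
    """Return R2, RMSE, SEE, MAPE (percent), and RPD for one data set."""
    n = len(measured)
    mean_y = sum(measured) / n
    residuals = [y - p for y, p in zip(measured, predicted)]
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    rmse = math.sqrt(ss_res / n)
    # Assumed SEE definition: standard deviation of residuals about their mean.
    bias = sum(residuals) / n
    see = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    mape = 100.0 / n * sum(abs(e) / abs(y) for e, y in zip(residuals, measured))
    sd = math.sqrt(ss_tot / (n - 1))  # sample SD of the measured values
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "SEE": see,
        "MAPE": mape,
        "RPD": sd / rmse if rmse > 0 else math.inf,
    }


# Made-up values, used only to show the call.
stats = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9])
```

The same function would be called once on the calibration set and once on the prediction set, so that calibration and prediction errors can be reported separately.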
Results and Discussion
Statistical characteristics of wood air-dry density
The statistical characteristics of the samples are given in Table 1. The air-dry density ranged from 0.909 to 1.128 g/cm³, with an average value of 1.062 g/cm³. The standard deviations (SD) of the calibration and prediction sets were similar and smaller than 0.05. Additionally, the distribution showed low negative skewness and positive kurtosis, indicating that the density values were not widely scattered.
Correlations between wavelength variables
As shown in Figures 1A through 1D, regardless of the spectral transformation, a relatively high correlation was obtained within each of two regions (i.e., near 500 to 1,875 nm and 1,800 to 2,500 nm). The Vis-NIR spectral region therefore showed high redundancy, and selecting only the highly sensitive variables related to the property of interest is needed to simplify the high-dimensional spectral data matrix.
Sensitive region of air-dry density analysis
There was high redundancy of wavelength variables (Fig. 1), so the LCM was performed for selecting only those sensitive regions of air-dry density based on local correlation coefficients between different spectra and air-dry density. The correlation coefficients and frequency statistics are shown in Figure 2 and Table 2, respectively.
It was observed that the correlation levels between wavelengths and density depended on the type of spectral transformation performed. This suggests that different transformation methods have the potential to yield better calibration models. For the reflectance spectra (Refl.), there was a negative relationship between the Refl. spectra and air-dry density in the wavelength ranges of 593 to 813 and 1,036 to 1,148 nm, while the other bands exhibited a positive correlation.
For the 1/Refl. spectral data, there were negative correlations in the Vis-NIR spectral region (350 to 2,500 nm). Compared with the Refl. spectral data, the local correlation coefficients in the NIR spectral region of 1,897 to 2,476 nm were significant—they showed values up to 0.70.
Compared with the Refl. spectral data, the log(Refl.) spectra improved the correlation in the Vis spectral region of 350 to 401 nm; within this region, the correlation coefficient at 367 nm increased from 0.72 to 0.78, and the others all improved to >0.80. Additionally, the correlations in the wavelength range of 1,890 to 2,438 nm were improved. Compared with the 1/Refl. spectral data, the r values at 1,922 nm, 1,932 nm, and 1,933 nm were improved to >0.80. Because of the properties of the logarithmic function, the distributions of r for the log(1/Refl.) and log(Refl.) spectral data were symmetrical about the wavelength axis.
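The symmetry can be checked in a few lines, assuming that r is the Pearson coefficient of Eq. (1). The symbol u_ij, denoting the log(Refl.) value of sample i at wavelength j, is introduced here only for illustration. Because log(1/Refl.) = −log(Refl.) for any logarithm base, every value and its column mean change sign, so the numerator of r changes sign while the denominator does not:

$$\log\!\left(\frac{1}{\mathrm{Refl.}}\right) = -u_{ij}, \qquad \frac{1}{m}\sum_{i=1}^{m}\left(-u_{ij}\right) = -\bar{u}_j$$

$$r_j^{\log(1/\mathrm{Refl.})} = \frac{\sum_{i=1}^{m} \left(-u_{ij} + \bar{u}_j\right)(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} \left(-u_{ij} + \bar{u}_j\right)^2 \, \sum_{i=1}^{m} (y_i - \bar{y})^2}} = -\,r_j^{\log(\mathrm{Refl.})}$$

The two curves therefore have the same magnitude at every wavelength and differ only in sign, so a threshold applied to the magnitude of r selects the same wavelengths for both forms.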
Figure 3 shows the distribution of the sensitive wavelengths of air-dry density (r ≥ 0.75). Although the sensitive regions were different for the Refl., 1/Refl., log(Refl.), and log(1/Refl.) spectral data, the wavelengths with r ≥ 0.80 were located in the Vis spectral region of 350 to 391 nm and the NIR spectral region of 1,932 to 1,933 nm. The Vis spectral range was assigned to polycyclic aromatic hydrocarbons and their derivatives (Workman and Weyer 2007). As for the NIR spectral region, there was a narrow band associated with density variance. However, this band was quite close to the absorption peak of lignin and cellulose associated with 1,900 nm (Üner et al. 2011). Additionally, it is associated with the combination of O–H deformation and stretching vibration (Schwanninger et al. 2011, Wójciak et al. 2014).
Establishment of Vis-NIR calibration models
To better analyze the effect of important spectral regions on the prediction of air-dry density, linear (PLS) and nonlinear (SVM) methods were employed. The results of the PLS model are shown in Table 3.
Regardless of the transformation, models that used the sensitive region (r ≥ 0.75) to predict air-dry density performed better than models that used the less-sensitive region (Table 3). However, for the 1/Refl. spectral data, the accuracy of the model developed from the less-sensitive region (R² = 0.677) was comparable to that of the model developed from the highly correlated band of the sensitive region (R² = 0.675; Table 3). This may be because the number of variables selected for the 0.70 < r < 0.75 model (no. = 347) was larger than for the others.
For the linear models, the best performance was obtained from the log(1/Refl.) spectral data transformation on the sensitive region, with calibration R² and root mean square error of calibration values of 0.870 and 0.017, respectively (Table 3). The next best model was obtained when the log(Refl.) spectral transformation was used. Compared with the models using the less-sensitive region, the R² of the models that used the highly correlated region was increased by 33.03 and 11.54 percent, respectively (Table 3). This demonstrated that the log(1/Refl.) spectral transformation coupled with the sensitive region of air-dry density not only reduced the dimension of the spectral data matrix but also gave a model with a good fit. This indicates that the quality of the information was perhaps improved while the amount of information needed for collection or analysis was reduced.
The models developed only from spectra in the sensitive region performed well, so this same reduced data matrix was used to build nonlinear models using SVM. The GS and GA were used to optimize the cost parameter c and the RBF kernel parameter gamma. The results of the SVM models are shown in Table 4.
For the different methods of transformation, the performance of the SVM models optimized by GA was superior to that of the GS-optimized SVM and PLS models; the R² values were all >0.75 (Table 4). Although the SVM is a nonlinear modeling method, the results of the SVM optimized by GS were inferior to those of the PLS models, except for the log(1/Refl.) spectra. This could be because GS is a nonheuristic algorithm that searches for the cost parameter c and RBF kernel parameter gamma only within a limited region. It should be noted that a reasonable selection of parameters is critical in modeling. Among the SVM models, the combination of log(1/Refl.) spectra and the GA method achieved the best performance; the cost parameter c and RBF kernel parameter gamma were 99.653 and 0.001, respectively (Fig. 4). In comparison with the GS-optimized SVM model and the PLS model, the R² was increased by 0.44 and 4.48 percent, respectively.
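The reported improvement over PLS can be checked with a short calculation. It assumes that the percentages are relative increases in the calibration determination coefficient, that the GA-optimized SVM value is the 0.909 quoted in the abstract, and that the PLS value is the 0.870 reported for the log(1/Refl.) sensitive-region model:

$$\frac{0.909 - 0.870}{0.870} \times 100\% \approx 4.48\%$$

Under these assumptions the result agrees with the 4.48 percent stated in the text, which suggests the increases are expressed relative to the reference model rather than as absolute differences.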
The relationship between the measured air-dry density values and the values predicted by the GA-optimized SVM model using log(1/Refl.) spectra is shown in Figure 5. The SVM model optimized by GA obtained good predictive performance, laying the basis for air-dry density estimation.
A review of the current literature shows that few studies have focused on comparing different transformations of spectra prior to modeling wood properties, with the most common spectral data matrix being the reflectance spectra or log(1/R) spectra (Ramirez et al. 2015, Inagaki et al. 2018, Li et al. 2018b). In the field of agriculture, Sun et al. (2018) applied five spectral transformations, including reflectance spectra, reciprocal, reciprocal logarithm, first-order, and second-order differential spectra, to predict the soil organic carbon (SOC) of samples from a coal mining area. They found that the spectral reflectance gave the best results when predicting SOC, which differed from the results of this study, suggesting that there is no universal transformation that works for all scenarios. This difference may be due to the different properties of soil and wood. In addition, different denoising methods, such as Savitzky–Golay (SG), multiplicative scatter correction (MSC), and the combination of SG and MSC, were applied before spectral transformation in their study.
Vis-NIR spectral data often contain redundant information that makes modeling more difficult. To address this problem, the LCM was used to analyze which wavelength regions were sensitive to variation in air-dry density. The performance of models developed from the sensitive region was better than that of models developed from the less-sensitive region, regardless of the spectral transformation. This is because specific absorption bands can be assigned to various wood chemical constituents such as lignin, cellulose, and hemicellulose (Cheng et al. 2018), and the spectrally sensitive regions simply covary with this underlying chemistry. Additionally, the prediction accuracy of the SVM model optimized by GA was >0.85 and superior to the results of Schimleck et al. (2018), which demonstrated the feasibility of simplifying the multidimensional spectral data while keeping a good-fitting model based on LCM and a GA-optimized SVM model.
Conclusions
This study investigated the effects of various combinations of spectral transformations and the LCM algorithm on wood air-dry density estimation based on Vis-NIR spectroscopy. The correlation between density and spectra varied with the method of spectral transformation. The LCM algorithm selected the most statistically sensitive region of the spectra in relation to air-dry density (r ≥ 0.75) for Refl., 1/Refl., log(Refl.), and log(1/Refl.). Although the sensitive regions were different, the same band (i.e., the region of r ≥ 0.80) was present in the Vis spectral region (i.e., 350 to 391 nm) and the NIR spectral region (i.e., 1,932 to 1,933 nm), and this band was critical for the prediction of density. The linear (PLS) and nonlinear (SVM) models using sensitive regions demonstrated better accuracy in density estimation than those using less-sensitive regions, regardless of the spectral transformation. The GA-optimized SVM model based on the sensitive region performed better than both the PLS model and the GS-optimized SVM model. In conclusion, this study improved the prediction accuracy of the air-dry density models by selecting only those wavelengths that were most sensitive.
Contributor Notes
The authors are, respectively, Graduate Researcher, College of Engineering and Technol., Northeast Forestry Univ., Harbin, China (yingli@nefu.edu.cn); Director and Research Fellow, Forest Products Development Center, SFWS, Auburn Univ., Auburn, Alabama (brianvia@auburn.edu, qzc0007@auburn.edu); and Graduate Researcher and Professor, College of Engineering and Technol., Northeast Forestry Univ., Harbin, China (1874670424@qq.com, yaoxiangli@nefu.edu.cn [corresponding author]). This paper was received for publication in January 2019. Article no. 19-00004.