7. Published own research
228 J. Orzel et al. / Fuel 117 (2014) 224-229
Table 1
Discrimination results obtained from the PLS-DA models with the optimal number of factors, f. F and R indicate the number of samples from the full rate of duty and the rebated tax oil groups, respectively.
Binary dependent variable | The considered compound | Samples in the model set | Samples in the test set | f | Number of correctly discriminated samples in model set | Number of correctly discriminated samples in test set
n1 | SY124 | 30 (F = 15, R = 15) | 30 (F = 9, R = 21) | 3 | 30 (100%) | 30 (100%)
n2 | SR19 | 30 (F = 15, R = 15) | 30 (F = 6, R = 24) | 10 | 30 (100%) | 23 (77%)
n3 | SY124 and SR19 | 14 (F = 7, R = 7) | 46 (F = 2, R = 44) | 6 | 14 (100%) | 35 (78%)
n4 | SY124 or SR19 | 30 (F = 15, R = 15) | 30 (F = 9, R = 21) | 10 | 30 (100%) | 27 (90%)
After the PCA analysis, multivariate discriminant models were constructed. For each of the four discrimination problems, samples were split into model and test sets according to the following procedure:
(i) samples were divided into two subsets, denoted as '-1' and '1';
(ii) an equal number of samples was selected from each subset using the Kennard and Stone algorithm [26] (selecting samples with unique characteristics) and included in the model set, while the remaining samples formed the test set.
The numbers of full rate of duty and rebated tax oil samples included in each model and test set differ (see Table 1).
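The split procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the classic Kennard and Stone variant that starts from the two most distant samples and uses Euclidean distance, and the function names `kennard_stone` and `split_model_test` are hypothetical.

```python
import math


def kennard_stone(points, k):
    """Return indices of k points chosen by the Kennard-Stone algorithm."""
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of points")
    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(a, b) for b in points] for a in points]
    # Start with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < k:
        # Add the sample that is farthest from its nearest selected sample.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


def split_model_test(X, labels, per_class):
    """Pick per_class samples from each subset for the model set.

    The remaining samples form the test set.
    """
    model = []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        chosen = kennard_stone([X[i] for i in idx], per_class)
        model.extend(idx[c] for c in chosen)
    model_set = set(model)
    test = [i for i in range(len(X)) if i not in model_set]
    return sorted(model), test
```

Because the same number of samples is taken from each subset, the model set is balanced, while the composition of the test set depends on how many samples of each group are left over.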
The model and test sets were then used for the construction and validation of discriminant models. In the case of discrimination problem n1, three latent factors were found to be sufficient to construct a model with an acceptable performance, i.e. all model and test set samples are correctly recognized.
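To make the modeling step concrete, the sketch below fits a PLS discriminant model with a chosen number of latent factors using the NIPALS algorithm for a single dependent variable (PLS1). It is an illustration under stated assumptions, not the authors' code: the binary dependent variable is assumed to be coded as -1 and 1, a sample is assigned to class 1 when its predicted value exceeds 0, and the names `fit_pls_da` and `predict_class` are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pls_da(X, y, n_factors):
    """Fit a PLS1 model (NIPALS) to class labels coded as -1 / 1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    factors = []
    for _ in range(n_factors):
        # Weight vector: direction in X with maximal covariance with y.
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(p)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # nothing left in y to explain
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # scores
        tt = _dot(t, t)
        p_load = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(p)]
        q = _dot(f, t) / tt
        # Deflate X and y before extracting the next factor.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, p_load, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "factors": factors}


def predict_class(model, x):
    """Predict -1 or 1 for one sample (threshold at 0 is an assumption)."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p_load, q in model["factors"]:
        t = _dot(e, w)
        y_hat += q * t
        e = [ev - t * pv for ev, pv in zip(e, p_load)]
    return 1 if y_hat > 0 else -1
```

In practice the number of factors would be chosen by validation, and the fitted model would then be applied to the test set samples to count how many are correctly discriminated.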
For the second discrimination problem (n2), the optimal discrimination results were achieved with a model containing ten latent factors. However, its performance is worse than that of the previous model, i.e. all samples from the model set are recognized correctly but seven samples (23%) from the test set are incorrectly classified. A better performance is observed when the samples are discriminated with respect to the concentrations of both the marker and the dye: 86% and 93% of the test set samples are correctly classified using the discrimination models constructed for the n3 and n4 discrimination criteria, respectively. The discrimination results obtained from all of the models constructed are summarized in Table 1.
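As a small worked example of how the percentages are derived from the counts, the snippet below recomputes the test-set rates for criteria n1 and n2 from the counts listed in Table 1. The helper name `correct_rate` is hypothetical.

```python
def correct_rate(n_correct, n_total):
    """Return the percentage of correctly discriminated samples."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


# Test-set counts for criteria n1 and n2, as listed in Table 1.
for name, correct, total in [("n1", 30, 30), ("n2", 23, 30)]:
    rate = correct_rate(correct, total)
    print(f"{name}: {correct}/{total} correct ({rate:.0f}%), "
          f"{total - correct} misclassified")
```

For n2, 23 of 30 correctly discriminated samples gives 77%, so the seven misclassified samples amount to 23% of the test set.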
An analysis of the results obtained from the optimal models allows the conclusion to be drawn that discrimination criterion n1 is the best, i.e. the concentration of SY124 is a marker that indicates a possible sorption process with a high probability.
Samples misclassified according to discrimination criterion n2 are characterized by SR19 concentrations in the range from 0 to 8 mg L⁻¹. However, the concentration of SY124 in these samples is high, i.e. 9 (two samples) and 10 (five samples) mg L⁻¹. It can be concluded that the discrimination criterion based on the SR19 concentration is affected to some extent by the presence of SY124. Thus, criterion n2 can be expected to yield the worst model performance (of all possible discrimination schemes).
According to criterion n3, samples with a high SY124 concentration (i.e. 9 mg L⁻¹ and above) are incorrectly discriminated. For the four samples wrongly discriminated using criterion n4, the incorrect discrimination cannot be explained by their concentrations of SY124 and SR19.
EEM fingerprints can be modeled using the n-way approach, e.g. n-PLS [27]. The n-PLS method can be regarded as an extension of PLS to handle data with a tri-linear structure and, by including this property in the modeling, it may perform better than classic PLS. Initially, we modeled our data using the n-PLS approach. However, for our data, in general, n-PLS did not outperform standard PLS in terms of discrimination error. In the case of criterion n1, the PLS model (with only three factors) performed better than the more complex n-PLS model (with four factors). The optimal n-PLS model for criterion n2 contains eleven factors (one more than in PLS) and its performance is worse (the correct classification rate is 77% for PLS, whereas for n-PLS it is 70%). The n-PLS model constructed for criterion n3 requires six factors and performs virtually the same as the six-factor PLS model. Criterion n4 can be modeled using n-PLS with the same effectiveness using one factor less than for the optimal PLS model constructed. Since the n-PLS models are generally more complex and did not improve the discrimination, we have decided to discuss only the results obtained with classic PLS modeling.
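The difference between the two approaches lies in how the data are arranged. Classic PLS operates on a two-way matrix, so a set of EEMs (samples x excitation x emission) is typically unfolded so that each sample becomes one row, whereas n-PLS keeps the three-way structure. The sketch below shows such an unfolding. It is an assumption made for illustration, since the text does not state how the EEMs were arranged for PLS, and the name `unfold_eem` is hypothetical.

```python
def unfold_eem(cube):
    """Unfold a samples x excitation x emission array into a 2-D matrix.

    Each sample's EEM is flattened row by row into a single vector.
    """
    n_ex = len(cube[0])
    n_em = len(cube[0][0])
    rows = []
    for eem in cube:
        if len(eem) != n_ex or any(len(r) != n_em for r in eem):
            raise ValueError("all EEMs must share the same shape")
        rows.append([v for ex_row in eem for v in ex_row])
    return rows
```

After unfolding, the resulting matrix has one row per sample and (number of excitation wavelengths) x (number of emission wavelengths) columns, which is the form expected by a standard PLS routine.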
4. Conclusions
In this paper, a method to facilitate the discrimination of rebated tax diesel oil from samples after the sorption process was introduced. This goal was achieved using excitation-emission fluorescence spectroscopic signals that were preprocessed and then modeled using chemometric tools. The data exploration performed using PCA supported the hypothesis that Solvent Yellow 124 can be regarded as a possible marker for the detection of an illegal sorption process. The best discrimination results based on the PLS-DA model, i.e. 100% correctly discriminated samples (both in the model and test sets), were obtained by focusing on the marker concentration (discrimination criterion n1). Other discrimination models had a performance that was similar to the first model. The model with the worst predictive performance, only 77% of correctly classified samples, was constructed assuming the concentration of a dye, Solvent Red 19, as the discriminant criterion. The results that were obtained support the hypothesis that residues of a marker indicate a possible sorption process. The dye was practically completely removed during the sorption procedure, and thus the discrimination based on its concentration is the least effective. The method presented here can be considered as a solution for the detection of illegally preprocessed oil samples. However, in order to obtain a general solution, a larger pool of samples of commercially available diesel oil should be included in the data set that is used for the construction and validation of the model. To adapt our procedure to comply with the regulations of other countries, their laws need to be studied. A set of samples with the marker and the particular dye used in the considered country needs to be prepared, and the cut-off value corresponding to the concentration level set by that country's law has to be used.
The simulated sorption process used in our experiment is one of the possible activities that influence the properties of rebated tax diesel oil. To extend the proposed method for use in a forensic context (i.e. in a court trial), other activities (e.g. transport and storage conditions) need to be considered. This can be done after adopting, e.g., the likelihood ratio approach [28].
Acknowledgements
M.D. wishes to express his gratitude to the Minister of Science and Higher Education of the Polish Republic for funding the scholarship.