When comparing the advantages and disadvantages of the different subsampling algorithms, bootstrapping and random subsampling are the most suitable for splitting the data into calibration, test and validation subsets. As the user-definable ratio between the sizes of the different subsets allows high flexibility, the random subsampling procedure was used in this work to split the data into calibration, test and monitor data sets, whereas for most data sets a static external validation set was recorded and used. The monitor set for the early-stopping procedure of the neural networks (see section 2.7.3) was generated by a modified full cross-validation procedure, which speeds up learning and which is described in detail in [28].
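A minimal sketch of such a random subsampling split is given below. The function name, the 60/20/20 default ratio and the seed handling are illustrative assumptions, not the exact procedure used in this work; only the principle of a user-definable ratio between the subset sizes is taken from the text.

import numpy as np

def random_subsampling_split(n_samples, ratios=(0.6, 0.2, 0.2), seed=None):
    """Randomly partition sample indices into calibration, test and monitor sets.

    ratios: user-definable fractions for (calibration, test, monitor);
    assumed here to sum to 1. The remainder after the first two subsets
    forms the monitor set.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)          # random reordering of all samples
    n_cal = int(round(ratios[0] * n_samples))
    n_test = int(round(ratios[1] * n_samples))
    cal_idx = indices[:n_cal]
    test_idx = indices[n_cal:n_cal + n_test]
    monitor_idx = indices[n_cal + n_test:]
    return cal_idx, test_idx, monitor_idx

# Example: split 100 samples with an assumed 60/20/20 ratio
cal, test, mon = random_subsampling_split(100, ratios=(0.6, 0.2, 0.2), seed=0)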
Besides the averaging effect of the subsampling procedure, comparing the standard deviations of the predictions of the test data across the different subsets additionally allows an estimation of the robustness of the calibration method. A high standard deviation indicates that the calibration is sensitive to the random partitioning of the data. If the quality of the calibration and prediction depends significantly on the perturbation of the data subsets, the calibration method is not very robust.
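The following sketch illustrates this robustness estimate, assuming a user-supplied fit_predict callable (a hypothetical placeholder for any calibration model) and the repeated random splits described above; the spread of the test-set errors over the repetitions serves as the robustness indicator.

import numpy as np

def subset_prediction_spread(X, y, fit_predict, n_repeats=20,
                             ratios=(0.6, 0.2, 0.2), seed=0):
    """Repeat the random subsampling split and collect the test-set RMSE
    of each repetition; a high standard deviation of these errors indicates
    a calibration that is sensitive to the random partitioning."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rmses = []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        n_cal = int(round(ratios[0] * n))
        n_test = int(round(ratios[1] * n))
        cal, test = idx[:n_cal], idx[n_cal:n_cal + n_test]
        # fit_predict is assumed to train on the calibration subset and
        # return predictions for the test subset
        y_pred = fit_predict(X[cal], y[cal], X[test])
        rmses.append(np.sqrt(np.mean((y[test] - y_pred) ** 2)))
    return np.mean(rmses), np.std(rmses)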