Previous Topic Back Forward Next Topic
Print Page Frank Dieterle
 
Ph. D. ThesisPh. D. Thesis 2. Theory – Fundamentals of the Multivariate Data Analysis 2. Theory – Fundamentals of the Multivariate Data Analysis 2.4. Data Splitting and Validation2.4. Data Splitting and Validation 2.4.4. Kennard Stones2.4.4. Kennard Stones
Home
News
About Me
Ph. D. Thesis
  Abstract
  Table of Contents
  1. Introduction
  2. Theory – Fundamentals of the Multivariate Data Analysis
    2.1. Overview of the Multivariate Quantitative Data Analysis
    2.2. Experimental Design
    2.3. Data Preprocessing
    2.4. Data Splitting and Validation
      2.4.1. Crossvalidation
      2.4.2. Bootstrapping
      2.4.3. Random Subsampling
      2.4.4. Kennard Stones
      2.4.5. Kohonen Neural Networks
      2.4.6. Conclusions
    2.5. Calibration of Linear Relationships
    2.6. Calibration of Nonlinear Relationships
    2.7. Neural Networks – Universal Calibration Tools
    2.8. Too Much Information Deteriorates Calibration
    2.9. Measures of Error and Validation
  3. Theory – Quantification of the Refrigerants R22 and R134a: Part I
  4. Experiments, Setups and Data Sets
  5. Results – Kinetic Measurements
  6. Results – Multivariate Calibrations
  7. Results – Genetic Algorithm Framework
  8. Results – Growing Neural Network Framework
  9. Results – All Data Sets
  10. Results – Various Aspects of the Frameworks and Measurements
  11. Summary and Outlook
  12. References
  13. Acknowledgements
Publications
Research Tutorials
Downloads and Links
Contact
Search
Site Map
Print this Page Print this Page

2.4.4.   Kennard Stones

The Kennard Stones algorithm [23]-[25] has gained an increasing popularity for splitting data sets into two subsets. The algorithm starts by finding 2 samples that are the farthest apart from each other on the basis of the input variables. These 2 samples are removed from the original data set and put into the calibration data set. This procedure is repeated until the desired number of samples has been reached in the calibration set. The advantages of this algorithm are that the calibration samples map the measured region of the variable space completely and that the test samples all fall inside the measured region. Yet, this algorithm is only usable for a single subsampling run, as the partitioning of the data is unique rendering the algorithm for a resampling procedure unusable.

Page 37 © Frank Dieterle, 03.03.2019 Navigation