K-Fold Cross-Validation and Kruskal-Wallis H Test

Cross-validation is a technique used in machine learning and statistics to evaluate the performance of a predictive model. It involves partitioning a dataset into subsets, training the model on some of these subsets, and evaluating its performance on the remaining subset.

K-fold cross-validation is a specific approach to cross-validation in which the original dataset is divided into K folds of equal (or nearly equal) size. The model is trained on K-1 of these folds and validated on the remaining fold. This process is repeated K times, so that each fold serves as the validation set exactly once.
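As a minimal sketch in plain Python, the partition into K folds can be computed from sample indices alone. When the number of samples is not divisible by K the folds cannot be exactly equal, so here the first `n % k` folds get one extra sample. The names `k_fold_indices` and `k_fold_splits` are hypothetical, and the data is not shuffled; in practice it usually is, and scikit-learn's `KFold` class provides this kind of splitting with an optional `shuffle` parameter.

```python
# Hypothetical helpers: a sketch of K-fold index splitting, not the project's code.
def k_fold_indices(n_samples, k):
    """Partition indices 0..n_samples-1 into k contiguous folds.

    Fold sizes differ by at most one: the first n_samples % k folds
    receive one extra index.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) once per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, val_idx in enumerate(folds):
        # Training set = every fold except the one held out for validation.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


for train_idx, val_idx in k_fold_splits(10, 3):
    print(len(train_idx), val_idx)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```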

  • Split the dataset into K roughly equal parts (folds).
  • To train and validate: train the model on K-1 folds and validate it on the held-out fold.
  • Repeat this K times, holding out a different fold each time.
  • To evaluate performance: on each held-out fold, compute a metric such as accuracy or mean squared error.
  • To get the overall performance: average the metric over all K iterations.
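The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the project's code: the model is a hypothetical one-variable least-squares line (`fit_line` / `predict_line`), the metric is mean squared error, and the folds are formed by shuffling the indices once and dealing them out round-robin, which also gives K folds whose sizes differ by at most one.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Hypothetical model: ordinary least-squares fit of y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b  # (intercept, slope)


def predict_line(model, x):
    a, b = model
    return a + b * x


def cross_validate(xs, ys, k, fit, predict, seed=0):
    """Return the per-fold MSE scores and their average (hypothetical helper)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)       # shuffle once before splitting
    folds = [idx[i::k] for i in range(k)]  # round-robin: sizes differ by at most one
    scores = []
    for i in range(k):
        val = folds[i]  # held-out fold for this iteration
        train = [j for f in range(k) if f != i for j in folds[f]]  # the other K-1 folds
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        mse = mean((predict(model, xs[j]) - ys[j]) ** 2 for j in val)
        scores.append(mse)
    return scores, mean(scores)  # overall performance = average over the K folds


xs = [float(x) for x in range(20)]
ys = [3.0 * x + 1.0 + (0.5 if x % 2 else -0.5) for x in xs]  # a line plus alternating noise
scores, avg = cross_validate(xs, ys, k=5, fit=fit_line, predict=predict_line)
print(len(scores), round(avg, 3))
```

Any other `fit`/`predict` pair and metric can be swapped in; the loop itself only encodes the "train on K-1 folds, validate on the held-out fold, average over K" procedure.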

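In our project, we used the Kruskal-Wallis H test, a non-parametric statistical test that determines whether there are statistically significant differences between two or more independent groups (it is typically used for three or more). It ranks all observations together and compares the groups' ranks; when the group distributions have similar shapes, a significant result can be read as a difference in medians.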
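To show where the H statistic comes from, here is a from-scratch sketch in plain Python; the function names are hypothetical, and in practice a library routine such as SciPy's `scipy.stats.kruskal` would normally be used. All observations are pooled and ranked, with tied values sharing the average of their ranks. With N observations in total and R_i the rank sum of group i (of size n_i), H = 12 / (N(N + 1)) · Σ R_i² / n_i − 3(N + 1), which is then divided by the tie correction 1 − Σ(t³ − t) / (N³ − N), where t is the number of observations sharing each distinct value.

```python
from collections import Counter
from itertools import chain


# Hypothetical helpers: a sketch of the H statistic, not the project's code.
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # ≈ 7.2
```

Because H depends only on ranks, it is unaffected by outliers' magnitudes and by any monotonic transformation of the data, which is what makes the test non-parametric.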

  • The H statistic measures the overall difference in ranks among the groups being compared: the more the groups' mean ranks differ, the larger H is. The calculated H statistic is approximately 896.813.
  • The p-value is the probability of obtaining an H statistic at least as extreme as the one observed, assuming the null hypothesis (that all groups come from the same distribution) is true. In other words, it tells us how likely such a large H would be by chance alone if there were no actual differences between the groups.
  • The p-value in this output is approximately 1.817 × 10⁻¹⁹⁵, which is effectively zero.
  • Such a small p-value is very strong evidence against the null hypothesis. We can therefore reject the null hypothesis at any conventional significance level and conclude that at least one of the groups in the Kruskal-Wallis H test differs significantly from the others (in median, assuming similarly shaped distributions); the test alone does not say which groups differ.
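The p-value for H is usually taken from the chi-square distribution with k − 1 degrees of freedom, where k is the number of groups; this large-sample approximation is, for example, what `scipy.stats.kruskal` reports. Because the degrees of freedom are a whole number, the chi-square upper tail has closed forms: for even df it is exp(−H/2) · Σ_{r=0}^{df/2 − 1} (H/2)^r / r!, and for odd df an erfc term is added. The sketch below implements both in plain Python; `chi2_sf` and `kruskal_p_value` are hypothetical names, and each term is evaluated in log space so that a large H does not overflow. As a hedged consistency check, assuming our comparison involved three groups (df = 2; the number of groups is not stated above), the p-value reduces to exp(−H/2), and exp(−896.813/2) ≈ 1.82 × 10⁻¹⁹⁵, in line with the reported 1.817 × 10⁻¹⁹⁵.

```python
import math


# Hypothetical helpers: a sketch of the chi-square p-value, not the project's code.
def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1.

    Uses the closed forms for integer df (Abramowitz & Stegun 26.4.4 and 26.4.5);
    each series term is computed in log space to avoid overflow for large x.
    """
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        return sum(math.exp(-half + r * math.log(half) - math.lgamma(r + 1))
                   for r in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^((2r-1)/2) / (1*3*...*(2r-1))
    total = math.erfc(math.sqrt(half))
    for r in range(1, (df - 1) // 2 + 1):
        log_odd_factorial = math.lgamma(2 * r + 1) - r * math.log(2) - math.lgamma(r + 1)
        total += math.exp(math.log(2) - half - 0.5 * math.log(2 * math.pi)
                          + (2 * r - 1) / 2 * math.log(x) - log_odd_factorial)
    return total


def kruskal_p_value(h, n_groups):
    """Chi-square approximation to the Kruskal-Wallis p-value (df = n_groups - 1)."""
    return chi2_sf(h, n_groups - 1)


print(kruskal_p_value(7.2, 3))      # ≈ 0.0273
print(kruskal_p_value(896.813, 3))  # ≈ 1.82e-195
```

Rejecting the null hypothesis then amounts to checking whether this value falls below the chosen significance level, such as 0.05.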
