Multilayer Bootstrap Networks


Original paper:


MATLAB code for dimensionality reduction:
 

 

MBN code for dimensionality reduction (1.98 MB; version 3, last updated on 23 June 2016)
 

MATLAB code for clustering:

 
MBN code for clustering (0.57 MB; version 5, last updated on 23 June 2016). Notes:
  1. This code feeds the low-dimensional output of MBN into k-means clustering. You are free to replace k-means with any other clustering algorithm.
  2. Unlike the MBN clustering in the main text of the original paper, which reported the average clustering performance over multiple independent runs of k-means, this code runs k-means 50 times independently and takes the run with the best objective value (i.e., the minimum mean squared error) as the final result of MBN-based clustering.
  3. By default, the code assumes that the largest matrix each MATLAB worker can handle is 10000 x 10000 or an equivalent size; this limit is defined by the parameter 'kmax'. Under this default, k1 is only about 1700 on MNIST, so to reproduce the MNIST result in the paper you need to set the parameter k explicitly to [8000, 4000, 2000, 1000, 500, 250, 125, 62, 31, 15] (or k1 = 8000) instead of relying on the default 'kmax'. If your computer has enough memory (say, over 100 GB), you may run MBN with k1 = 35000, which gives NMI = 91.35% and ACC = 96.64%. The 10-dimensional features that produce this result can be downloaded here (5.2 MB).
  4. See here for an implementation of normalized mutual information (NMI), and see bestMap.m together with hungarian.m for the computation of clustering accuracy (ACC).
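Note 2 selects, among 50 independent k-means runs, the one with the smallest mean squared error. The following is a minimal Python sketch of that selection rule, not the MATLAB code distributed here: it uses plain Lloyd's k-means with random initialization, assumes "mean squared error" means the average squared Euclidean distance from each point to its assigned centroid, and the function names (`kmeans_once`, `best_of_kmeans`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(X, k, rng, max_iter=100):
    """One run of Lloyd's k-means from a random initialization.

    Returns (labels, mse); mse (assumed objective) is the mean squared
    distance of each point to its assigned centroid.
    """
    # Initialize centroids with k distinct, randomly chosen data points.
    centroids = [list(X[i]) for i in rng.sample(range(len(X)), k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(x, centroids[c])) for x in X]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [X[i] for i in range(len(X)) if labels[i] == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    mse = sum(sq_dist(X[i], centroids[labels[i]]) for i in range(len(X))) / len(X)
    return labels, mse


def best_of_kmeans(X, k, n_runs=50, seed=0):
    """Run k-means n_runs times; keep the run with the smallest MSE."""
    rng = random.Random(seed)
    best_labels, best_mse = None, float("inf")
    for _ in range(n_runs):
        labels, mse = kmeans_once(X, k, rng)
        if mse < best_mse:
            best_labels, best_mse = labels, mse
    return best_labels, best_mse


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels, mse = best_of_kmeans(X, k=2)
    print(labels, mse)
```

Keeping the minimum-objective run, rather than averaging metrics over runs as in the paper's main text, yields a single final partition. MATLAB's own `kmeans` applies a similar rule through its 'Replicates' option.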
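The two evaluation metrics in note 4 can be illustrated with the Python sketch below. It is not a port of the linked NMI code or of bestMap.m/hungarian.m. NMI is normalized here by sqrt(H(true) * H(pred)), which is one common choice and an assumption on our part. ACC is computed as the fraction of points that are correct under the best one-to-one cluster-to-class mapping. The Hungarian algorithm finds that mapping efficiently; this sketch uses exhaustive search over permutations instead, which is only practical for a handful of clusters. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def nmi(labels_true, labels_pred):
    """Normalized mutual information, normalized by sqrt(H(true) * H(pred))."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information I(true; pred) from the contingency counts.
    mi = sum(
        (nij / n) * math.log(n * nij / (count_true[i] * count_pred[j]))
        for (i, j), nij in joint.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())
    if h_true == 0 or h_pred == 0:
        # Degenerate case: at least one partition has a single group.
        return 1.0 if h_true == h_pred else 0.0
    return mi / math.sqrt(h_true * h_pred)


def clustering_accuracy(labels_true, labels_pred):
    """ACC: fraction of points correct under the best one-to-one
    cluster-to-class mapping, found by exhaustive search (a small-scale
    stand-in for the Hungarian algorithm)."""
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    # joint[(cluster, class)] = number of points in both.
    joint = Counter(zip(labels_pred, labels_true))
    # Pad with None so that surplus clusters can stay unmatched.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        hits = sum(joint[(c, t)] for c, t in zip(clusters, perm))
        best = max(best, hits)
    return best / len(labels_true)
```

For example, a predicted labeling that only renames the true classes scores ACC = 1 and NMI = 1, because both metrics are invariant to how clusters are numbered.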
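The k list in note 3 halves from one layer to the next, rounding down (125 / 2 = 62.5 becomes 62, and 31 / 2 = 15.5 becomes 15). The Python sketch below regenerates such a list. It assumes a decay ratio of 0.5 with rounding down and a hypothetical stopping threshold `k_min`. Neither the function name nor `k_min` is a parameter of the MATLAB code, and the MATLAB code may decide the number of layers differently.

```python
def k_schedule(k1, k_min, ratio=0.5):
    """Per-layer k values: start at k1, multiply by `ratio` (rounding
    down) at each layer, and stop before k drops below k_min."""
    if not 0 < ratio < 1 or k_min < 1:
        raise ValueError("need 0 < ratio < 1 and k_min >= 1")
    ks = []
    k = k1
    while k >= k_min:
        ks.append(k)
        k = int(k * ratio)
    return ks


print(k_schedule(8000, 15))
# [8000, 4000, 2000, 1000, 500, 250, 125, 62, 31, 15]
```

With k1 = 35000, the same rule starts 35000, 17500, 8750, ...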

     


 
Five demo data sets:
1. Wine.mat (0.02 MB)
2. New-Thyroid.mat (0.02 MB)
3. Dermathology.mat (0.02 MB)
4. MNIST_small_scale1.mat (2.5 MB)
5. USPS.mat (3.8 MB)
The last two demo data sets are not included in the MBN clustering code; please download them from the links above.

Visualizations of demo data sets:

1. Wine (178 instances, 13 dimensions, 3 classes)

 
2. New-Thyroid (215 instances, 5 dimensions, 3 classes)

 
3. Dermatology (366 instances, 34 dimensions, 6 classes)

 
4. MNIST_small_scale1 (5000 instances, 784 dimensions, 10 classes)

 
5. USPS (11000 instances, 256 dimensions, 10 classes)

Other benchmark data sets in Appendix E:

 



last updated on 07-03-2018