Differences

This shows you the differences between two versions of the page.


ewis:laboratoare:09 [2023/05/10 17:43]
alexandru.predescu [Exercises]
ewis:laboratoare:09 [2023/05/10 18:02] (current)
alexandru.predescu [K-Means Clustering]
Line 71: Line 71:
  
 <note tip>Clustering can also be used to predict new data based on the identified patterns. To predict the cluster for a new point, find the centroid it is closest to.</note> <note tip>Clustering can also be used to predict new data based on the identified patterns. To predict the cluster for a new point, find the centroid it is closest to.</note>
 +
 +The following code generates a random array of points and performs K-Means Clustering.
 +
 +<code python>
 +import numpy as np
 +import matplotlib.pyplot as plt
 +from sklearn.cluster import KMeans
 +from sklearn import metrics
 +
 +X = 10 * np.random.randn(100, 2) + 6
 +kmeans_model = KMeans(n_clusters=3)
 +kmeans_model.fit(X)
 +
 +plt.scatter(X[:, 0], X[:, 1], c=kmeans_model.labels_,
 +            cmap='rainbow', label="points")
 +
 +plt.show()
 +</code>
 +
 +{{ :ewis:laboratoare:lab9:random_points_clustering.png?400 |}}
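The tip above about predicting new points can be sketched as follows. This is a minimal example under stated assumptions: the random seed, the `n_init` and `random_state` arguments, and the `new_points` array are illustrative additions, not part of the lab code. It compares a manual nearest-centroid lookup with the `predict` method of the fitted model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same kind of random data as in the lab; the seed is an assumption added for reproducibility
rng = np.random.default_rng(0)
X = 10 * rng.standard_normal((100, 2)) + 6

kmeans_model = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans_model.fit(X)

# Hypothetical new points (illustrative values only)
new_points = np.array([[0.0, 0.0], [20.0, 15.0], [-5.0, 12.0]])

# Manual prediction: Euclidean distance from every new point to every centroid,
# then pick the index of the closest centroid
centroids = kmeans_model.cluster_centers_
distances = np.linalg.norm(new_points[:, None, :] - centroids[None, :, :], axis=2)
manual_labels = distances.argmin(axis=1)

# Built-in prediction on the fitted model
predicted_labels = kmeans_model.predict(new_points)

print(manual_labels)
print(predicted_labels)
```

Both printed arrays should contain the same cluster indices, since `predict` also assigns each point to its nearest centroid.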
  
 === Choosing the optimal number of clusters === === Choosing the optimal number of clusters ===
Line 93: Line 113:
   * $\bar{x_j}$ = cluster centroid $j$   * $\bar{x_j}$ = cluster centroid $j$
  
-== The Elbow Method == +The WCSS is already available on the fitted model as the //inertia_// attribute.
- +
-Below is a plot of sum of squared distances (WCSS). If the plot looks like an arm, then the elbow on the arm is optimal k. In this example, the optimal number of clusters is 4. +
- +
-{{ :​ewis:​laboratoare:​lab9:​elbow_method.png?​400 |}} +
- +
-The following code generates a random array of points and performs K-Means Clustering. ​The WCSS (inertia) is already provided in the result.+
  
 <code python> <code python>
-import numpy as np 
-from sklearn.cluster import KMeans 
-from sklearn import metrics 
-X = 10 * np.random.randn(100,​ 2) + 6 
-kmeans_model = KMeans(n_clusters=3) 
-kmeans_model.fit(X) 
-labels = kmeans_model.labels_ 
 inertia = kmeans_model.inertia_ inertia = kmeans_model.inertia_
 print(inertia) print(inertia)
 </​code>​ </​code>​
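As a sanity check, the WCSS formula above can be evaluated by hand and compared with the value reported by the model. The following is a sketch, assuming a fixed seed and explicit `n_init` and `random_state` values that are not in the lab code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Random data as in the lab; the seed is an assumption added for reproducibility
rng = np.random.default_rng(0)
X = 10 * rng.standard_normal((100, 2)) + 6

kmeans_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Look up the centroid assigned to each sample
assigned_centroids = kmeans_model.cluster_centers_[kmeans_model.labels_]

# WCSS: sum of squared Euclidean distances between each sample and its centroid
wcss = np.sum((X - assigned_centroids) ** 2)

print(wcss)
print(kmeans_model.inertia_)
```

The two printed values should agree up to floating-point rounding.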
 +
 +== The Elbow Method ==
 +
 +Below is a plot of the within-cluster sum of squared distances (WCSS) against the number of clusters k. If the plot looks like an arm, the elbow of the arm (the point where the WCSS stops decreasing sharply) marks the optimal k. In this example, the optimal number of clusters is 4.
 +
 +{{ :ewis:laboratoare:lab9:elbow_method.png?400 |}}
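The Elbow Method described above could be reproduced along these lines. This is only a sketch: the data set behind the figure is not given, so the example assumes synthetic data made of four Gaussian blobs (`blob_centers` is a hypothetical choice) and an arbitrary range of k from 1 to 8.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumed synthetic data: four Gaussian blobs, so that an elbow is visible
rng = np.random.default_rng(0)
blob_centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
X = np.vstack([c + rng.standard_normal((50, 2)) for c in blob_centers])

# Fit one model per candidate k and record its WCSS (inertia)
k_values = list(range(1, 9))
wcss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(model.inertia_)

for k, value in zip(k_values, wcss):
    print(k, round(value, 1))
```

Plotting `k_values` against `wcss` (for example with `plt.plot(k_values, wcss, marker='o')`) gives the arm-shaped curve on which the elbow is read.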
  
  
Line 123: Line 136:
   * -1 – the sample is assigned to the wrong cluster   * -1 – the sample is assigned to the wrong cluster
  
-The clustering evaluation using both Elbow Method and Silhouette Coefficient is shown below. In this example, the optimal number of clusters is 4, as shown by both methods (looks like an arm, has the highest silhouette coefficient, k=4). +The Silhouette Score is calculated using the scikit-learn function //silhouette_score//.
- +
-{{ :​ewis:​laboratoare:​lab9:​clustering_evaluation.png?​400 |}} +
- +
-The following code generates a random array of points and performs K-Means Clustering. ​The Silhouette Score is then calculated using the scikit-learn provided function //​silhouette_score//​.+
  
 <code python> <code python>
-import numpy as np +s = metrics.silhouette_score(X, kmeans_model.labels_, metric='euclidean')
-from sklearn.cluster import KMeans +
-from sklearn import metrics +
-X = 10 * np.random.randn(100,​ 2) + 6 +
-kmeans_model = KMeans(n_clusters=3) +
-kmeans_model.fit(X) +
-labels = kmeans_model.labels_ +
-s = metrics.silhouette_score(X, ​labels, metric='​euclidean'​)+
 print(s) print(s)
 </​code>​ </​code>​
 +
 +The clustering evaluation using both the Elbow Method and the Silhouette Coefficient is shown below. In this example, both methods indicate that the optimal number of clusters is 4: the WCSS curve has its elbow at k=4, and the silhouette coefficient is highest at k=4.
 +
 +{{ :ewis:laboratoare:lab9:clustering_evaluation.png?400 |}}
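Selecting k by the Silhouette Coefficient can be sketched in the same way. The example below again assumes synthetic four-blob data and a range of k from 2 to 8 (both are illustrative choices, not the data behind the figure); the silhouette score is only defined for at least 2 clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn import metrics

# Assumed synthetic data: four Gaussian blobs (hypothetical centers)
rng = np.random.default_rng(0)
blob_centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
X = np.vstack([c + rng.standard_normal((50, 2)) for c in blob_centers])

# Compute the mean silhouette score for each candidate k
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = metrics.silhouette_score(X, labels, metric='euclidean')

# The candidate with the highest score is taken as the best k
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

Comparing `best_k` with the elbow read from the WCSS curve is a simple way to cross-check the two methods.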
  
 <note tip> <note tip>
ewis/laboratoare/09.1683729820.txt.gz · Last modified: 2023/05/10 17:43 by alexandru.predescu
CC Attribution-Share Alike 3.0 Unported