Differences

This shows you the differences between two versions of the page.

ewis:laboratoare:09 [2023/05/10 17:59]
alexandru.predescu [K-Means Clustering]
ewis:laboratoare:09 [2023/05/10 18:02] (current)
alexandru.predescu [K-Means Clustering]
Line 71: Line 71:
  
 <note tip>Clustering can also be used to classify new data based on the identified patterns. To predict the cluster of a new point, find the centroid it is closest to.</note>
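The nearest-centroid rule from the note above can be sketched as follows. This is a minimal illustration, not part of the lab code: the three-blob dataset, the `new_points` array and the fixed `random_state` are assumptions made so the example is reproducible. It computes the assignment by hand and compares it with scikit-learn's `KMeans.predict`.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated blobs (illustrative data, not from the lab)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(30, 2)),
    rng.normal(loc=(10, 10), scale=0.5, size=(30, 2)),
    rng.normal(loc=(0, 10), scale=0.5, size=(30, 2)),
])
kmeans_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Hypothetical new points that were not part of the training data
new_points = np.array([[0.2, -0.1], [9.5, 10.3]])

# Manual prediction: Euclidean distance from every new point to every centroid
distances = np.linalg.norm(
    new_points[:, np.newaxis, :] - kmeans_model.cluster_centers_[np.newaxis, :, :],
    axis=2,
)
manual_labels = distances.argmin(axis=1)

# predict() assigns each point to its nearest centroid as well
predicted_labels = kmeans_model.predict(new_points)
print(manual_labels, predicted_labels)
```

Both arrays should contain the same labels; the label numbers themselves are arbitrary and depend on the initialization.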
 +
 +The following code generates a random array of points and performs K-Means Clustering.
 +
 +<code python>
 +import numpy as np
 +import matplotlib.pyplot as plt
 +from sklearn.cluster import KMeans
 +from sklearn import metrics
 +
 +# 100 random 2D points, normally distributed around (6, 6) with standard deviation 10
 +X = 10 * np.random.randn(100, 2) + 6
 +# group the points into 3 clusters
 +kmeans_model = KMeans(n_clusters=3)
 +kmeans_model.fit(X)
 +
 +# color each point by its assigned cluster label
 +plt.scatter(X[:, 0], X[:, 1], c=kmeans_model.labels_,
 +            cmap='rainbow', label="points")
 +
 +plt.show()
 +</code>
 +
 +{{ :​ewis:​laboratoare:​lab9:​random_points_clustering.png?​400 |}}
  
 === Choosing the optimal number of clusters ===
Line 92: Line 112:
   * $x_i$ = data point $i$
   * $\bar{x_j}$ = cluster centroid $j$
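As a rough check of the definition above, the WCSS can be computed by hand from the data points $x_i$ and their assigned centroids $\bar{x_j}$. The sketch below assumes the same kind of random data as the lab example; the seeded generator and the `n_init`/`random_state` arguments are additions for reproducibility, and the name `wcss` is only illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = 10 * rng.standard_normal((100, 2)) + 6

kmeans_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Centroid assigned to each data point x_i
assigned_centroids = kmeans_model.cluster_centers_[kmeans_model.labels_]

# WCSS: sum of squared distances between each point and its centroid
wcss = np.sum((X - assigned_centroids) ** 2)

print(wcss, kmeans_model.inertia_)
```

The two printed values should agree up to floating-point rounding, since scikit-learn reports the same quantity as `inertia_`.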
- 
-== The Elbow Method == 
- 
-Below is a plot of sum of squared distances (WCSS). If the plot looks like an arm, then the elbow on the arm is optimal k. In this example, the optimal number of clusters is 4. 
- 
-{{ :​ewis:​laboratoare:​lab9:​elbow_method.png?​400 |}} 
- 
-The following code generates a random array of points and performs K-Means Clustering. 
- 
-<code python> 
-import numpy as np 
-import matplotlib.pyplot as plt 
-from sklearn.cluster import KMeans 
-from sklearn import metrics 
- 
-X = 10 * np.random.randn(100,​ 2) + 6 
-kmeans_model = KMeans(n_clusters=3) 
-kmeans_model.fit(X) 
- 
-plt.scatter(X[:,​ 0], X[:, 1], c=kmeans_model.labels_,​ 
-            cmap='​rainbow',​ label="​points"​) 
- 
-plt.show() 
-</​code>​ 
- 
-{{ :​ewis:​laboratoare:​lab9:​random_points_clustering.png?​400 |}} 
  
 The WCSS (inertia) is already provided in the result.
Line 125: Line 119:
 print(inertia)
 </code>
 +
 +== The Elbow Method ==
 +
 +Below is a plot of the within-cluster sum of squares (WCSS) against the number of clusters k. If the plot looks like an arm, the elbow of the arm (the point after which the WCSS decreases only slowly) marks the optimal k. In this example, the optimal number of clusters is 4.
 +
 +{{ :​ewis:​laboratoare:​lab9:​elbow_method.png?​400 |}}
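One way to obtain the values for such an elbow plot is to fit K-Means for a range of k and record the inertia each time. The sketch below is an assumption-laden illustration, not the code that produced the figure: the four-blob dataset, the range of k and the names `k_values` and `wcss_values` are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Four blobs, so the elbow is expected near k=4 (illustrative data)
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 2)) for c in centers])

k_values = list(range(1, 9))
wcss_values = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss_values.append(model.inertia_)  # WCSS for this k

for k, wcss in zip(k_values, wcss_values):
    print(k, round(wcss, 1))
```

Plotting `wcss_values` against `k_values` (for instance with `plt.plot(k_values, wcss_values, marker='o')`) should give the arm-shaped curve, with the sharp drop ending around the true number of blobs.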
  
  
Line 136: Line 136:
   * -1 – the sample is assigned to the wrong cluster
  
-The clustering evaluation using both Elbow Method and Silhouette Coefficient is shown below. In this example, the optimal number of clusters is 4, as shown by both methods (looks like an arm, has the highest silhouette coefficient, k=4).
-
-{{ :ewis:laboratoare:lab9:clustering_evaluation.png?400 |}}
-
-The Silhouette Score is then calculated using the scikit-learn provided function //silhouette_score//.
 +The Silhouette Score is calculated using the //silhouette_score// function provided by scikit-learn.
  
 <code python>
Line 146: Line 142:
 print(s)
 </code>
 +
 +The clustering evaluation using both the Elbow Method and the Silhouette Coefficient is shown below. In this example, both methods indicate that the optimal number of clusters is k=4: the WCSS curve has its elbow there and the silhouette coefficient is highest.
 +
 +{{ :​ewis:​laboratoare:​lab9:​clustering_evaluation.png?​400 |}}
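The silhouette-based selection described above can be sketched by computing the score for several values of k and keeping the highest one. This is a hedged example under assumed data (four synthetic blobs) and assumed names (`scores`, `best_k`); it uses `sklearn.metrics.silhouette_score`, which requires at least two clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Four blobs (illustrative data, not the lab dataset)
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 2)) for c in centers])

scores = {}
for k in range(2, 9):  # the silhouette score is undefined for a single cluster
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest mean silhouette coefficient
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On real data the maximum is usually less pronounced than on synthetic blobs, so it is worth looking at the whole `scores` dictionary together with the elbow plot rather than at `best_k` alone.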
  
 <note tip>
ewis/laboratoare/09.1683730772.txt.gz · Last modified: 2023/05/10 17:59 by alexandru.predescu
CC Attribution-Share Alike 3.0 Unported