FuzzySKWIC: extracts keywords and categorizes (clusters) documents.
- Choose n = 200 keywords based on their inverse document frequency (IDF).
- For each document j, compute a frequency vector x_j whose kth component x_jk is based on the occurrences of the kth keyword in document j.
- Compute c_ik, the kth component (keyword frequency) of the ith cluster center vector c_i.
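- Compute the cosine-based dissimilarity between document j and cluster i along keyword k as D^k_ij = 1/n - x_jk * c_ik. Summed over all n keywords this gives 1 - x_j · c_i, which is 1 minus the cosine similarity when x_j and c_i both have unit length.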
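The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the FuzzySKWIC reference implementation: all function names are hypothetical, keeping the highest-IDF terms is an assumption (the notes only say "based on IDF"), the fuzzy-c-means-style center update with fuzzifier m = 2 is an assumed stand-in for the cluster center step, and no keyword weighting is applied.

```python
import math
from collections import Counter


def select_keywords(docs, n=200):
    """Keep n keywords ranked by IDF = log(N / df).

    ASSUMPTION: the highest-IDF terms are kept and ties are broken
    alphabetically; the notes only say "based on IDF".
    """
    num_docs = len(docs)
    # df: number of documents containing each term (counted once per document)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(num_docs / count) for term, count in df.items()}
    return sorted(idf, key=lambda term: (-idf[term], term))[:n]


def frequency_vector(doc, keywords):
    """x_j: count of each keyword in document j, scaled to unit Euclidean length."""
    counts = Counter(doc)
    x = [float(counts[term]) for term in keywords]
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else x


def cluster_center(X, u_i, m=2.0):
    """c_i as a fuzzy weighted mean of the document vectors.

    ASSUMPTION: fuzzy c-means style update
    c_ik = sum_j u_ij^m * x_jk / sum_j u_ij^m (memberships must not all be 0).
    """
    weights = [u ** m for u in u_i]
    total = sum(weights)
    n = len(X[0])
    return [sum(w * x[k] for w, x in zip(weights, X)) / total for k in range(n)]


def per_term_dissimilarity(x_j, c_i):
    """D^k_ij = 1/n - x_jk * c_ik for every keyword k."""
    n = len(x_j)
    return [1.0 / n - xk * ck for xk, ck in zip(x_j, c_i)]


def dissimilarity(x_j, c_i):
    """Sum over k; equals 1 - x_j . c_i (1 - cosine if both have unit length)."""
    return sum(per_term_dissimilarity(x_j, c_i))


if __name__ == "__main__":
    docs = [["apple", "banana", "apple"], ["banana", "cherry"], ["cherry", "date"]]
    keywords = select_keywords(docs, n=4)
    X = [frequency_vector(doc, keywords) for doc in docs]
    # hypothetical memberships of the three documents in a single cluster
    c = cluster_center(X, [0.9, 0.5, 0.1])
    for x in X:
        print(round(dissimilarity(x, c), 3))
```

Individual D^k_ij values can be negative; only their sum, 1 - x_j · c_i, has the cosine interpretation, and only when both vectors have unit length (the weighted-mean center in this sketch generally does not).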