What are clusters and how to do clustering?

(y minus y-bar) divided by the standard deviation of y. This is the same formula for x. When you do this, essentially what you’re doing is shifting your data points to the center. So these are my data points. Take all the data points and project them onto the x axis over here. Suppose x-bar is this point; this is x_i minus x-bar. Then take the same data points and project them onto the y axis. This is my y-bar, and this is my x-bar. So for x_i minus x-bar, these will become positives and these will become negatives. For y_i minus y-bar, these will be positives and these will be negatives.

What we’re essentially doing is shifting the origin to the center of the mathematical space. This is what z-score does. So now, in this new mathematical space, the data is centered. In terms of the values the points are different, but their position relative to one another in the mathematical space remains the same. It is the same as measuring height in meters versus measuring height in feet. Physically it is the same thing; the measurements change because the units change. [Audience question, partly inaudible, about finding anomalies.]
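A minimal sketch of the z-score step described above, in plain Python. The height values are made up for illustration, and the use of the population standard deviation is an assumption (the talk does not say which one is meant). The sketch only shows that the transformed values are centered on zero, keep their order, and come out the same whether the heights are recorded in centimeters or in feet.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Return (value - mean) / standard deviation for every value."""
    mean = fmean(values)
    sd = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mean) / sd for v in values]


# Hypothetical heights in centimeters, and the same heights in feet.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_ft = [h / 30.48 for h in heights_cm]

z_cm = zscore(heights_cm)
z_ft = zscore(heights_ft)

print([round(z, 3) for z in z_cm])
# Values below the mean become negative, values above it become positive.
print([z < 0 for z in z_cm])
# Changing the unit does not change the z-scores (up to rounding error).
print(all(abs(a - b) < 1e-9 for a, b in zip(z_cm, z_ft)))
```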

That also has to receive the same transformation, and only then can you do that. Every time, the model goes to the transformed data. But when I look at the data for analysis purposes, I want to have it reverse-transformed, so that I can look at the data and work out what the findings are, because the transformed data is only useful to me for building the model, not for any other purpose. It can really mislead many times [inaudible]. So when we’re transforming, please keep in mind we’re not changing any kind of distribution. Nothing.
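A small sketch of the reverse transformation mentioned above: the model works on z-scores, and a finding is mapped back to the original units before anyone reads it. The readings and the "largest absolute z-score" rule are hypothetical stand-ins, not the model used in the talk.

```python
from statistics import fmean, pstdev

# Hypothetical raw readings in their original units.
raw = [21.0, 23.5, 22.0, 35.0, 22.5]

mean = fmean(raw)
sd = pstdev(raw)

# Forward transform: this is what the model sees.
z = [(v - mean) / sd for v in raw]

# Stand-in for a model: flag the point with the largest absolute z-score.
flagged = max(range(len(z)), key=lambda i: abs(z[i]))

# Reverse transform: back to the original units for reporting.
recovered = z[flagged] * sd + mean

print(flagged, round(recovered, 6))
```

The reverse step is just the forward formula solved for the raw value, so it needs the same mean and standard deviation that were used going in.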

We’re not changing anything. We’re just measuring it using different scales. That’s all we’re doing. However, when I put this in production and a new query point comes in, it has to undergo the same transformation we did to build our anomaly models. However, when I’m doing EDA, exploratory data analytics, I always do it on the raw data first, and that’s why I am telling you this point is valid over there. Many times you’ll be advised to scale the data before you do clustering. I want to demonstrate to you here today that you need to be careful there. You can’t do it blindly.
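A sketch of the production point made above, assuming the scaling parameters are learned once from the training data and then reused unchanged for every new query point. The records and the `transform` helper are hypothetical.

```python
from statistics import fmean, pstdev

# Hypothetical training data with two attributes per record.
train = [(10.0, 200.0), (12.0, 220.0), (14.0, 240.0), (16.0, 260.0)]

# Learn the mean and standard deviation of each column from the training data only.
columns = list(zip(*train))
means = [fmean(c) for c in columns]
sds = [pstdev(c) for c in columns]


def transform(record):
    """Apply the stored training-time z-score parameters to one record."""
    return tuple((v - m) / s for v, m, s in zip(record, means, sds))


# A new query point is scaled with the stored parameters, not with its own statistics.
query = (13.0, 300.0)
print(transform(query))
```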

Many times, when you scale your data, your clusters might go wrong. A lot of times in the raw data we have attributes which naturally lead to clustering. They naturally bring in some kind of clustering. By scaling those dimensions we dilute the clusters. The natural tendency for clustering on that dimension is diluted when you actually scale it. So let’s see what happens. When you talk about clustering, we talk about a way of measuring the similarity or dissimilarity between two points, x_i and x_i-dash, on the j-th dimension, where j is the dimension.

x_i is one point and x_i-dash is the other; you can give the second point any name you like. So we need to have a distance calculation method. When you’re using hierarchical or K-means clustering, Euclidean distance is the norm. So this is what I’m telling you about: the impact of normalization. You’ll always be told to normalize. But what I’m saying is, be careful. Look at this. I generated some data and then I played around with it. I’ll share the code with you; it’s simple stuff.
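For reference, a short sketch of the Euclidean distance between two points, summed over every dimension j. The point values are made up; the second pair is there only to show how a dimension with a large numeric range dominates the result, which is why scaling changes what K-means sees.

```python
import math


def euclidean(a, b):
    """Square root of the summed squared differences over every dimension j."""
    return math.sqrt(sum((a_j - b_j) ** 2 for a_j, b_j in zip(a, b)))


x_i = (1.0, 2.0)
x_i_dash = (4.0, 6.0)

print(euclidean(x_i, x_i_dash))   # sqrt(3^2 + 4^2) = 5.0
print(math.dist(x_i, x_i_dash))   # the standard library computes the same distance

# The dimension with the larger numeric range dominates the distance.
p = (1.0, 1000.0)
q = (2.0, 3000.0)
print(euclidean(p, q))
```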

This is my data. These are some data points. We can visually see there are two clusters, but in practice we won’t know how many clusters there are. Right now, in this case, visually we’re seeing two clusters. When I run my K-means clustering on the raw data, without doing any normalization, the two clusters I get are the red and the blue. As you can see, they are mixed up. When I normalize my data (look at the scales, this is z-score; look at the difference here, this is z-score) it gives me the right clusters. Are you able to see the difference? However, look at this distribution. Here also we see two natural clusters. When I don’t normalize it, I get the two natural clusters.

When I normalize my data here using z-score again, look: do you see the change in clusters? They’re diffusing. So whether you should normalize or not depends on your understanding of the distributions of the data on the dimensions that you’re using for clustering. Unfortunately, some or most of the articles, books and authors I come across also just say to normalize the data. Agreed, most of the time it works, but not always. [Audience interjection, inaudible.] That’s what I’m saying. Don’t take it as given that normalization will lead you to good clustering. Not necessarily. It might lead to poor clusters.
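The two situations described above can be reproduced on toy data. This is a sketch under stated assumptions, not the code shown in the talk: both datasets are invented, and the K-means here is a plain-Python version with k=2 that tries every pair of points as starting centroids and keeps the lowest-inertia result, so the outcome does not depend on a random seed.

```python
from itertools import combinations
from statistics import fmean, pstdev


def zscore_columns(points):
    """Z-score each dimension separately."""
    stats = [(fmean(c), pstdev(c)) for c in zip(*points)]
    return [tuple((v - m) / s for v, (m, s) in zip(p, stats)) for p in points]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def lloyd(points, start, max_iter=100):
    """Standard K-means iterations (k=2) from the given starting centroids."""
    centroids, labels = list(start), None
    for _ in range(max_iter):
        new = [min((0, 1), key=lambda k: sq_dist(p, centroids[k])) for p in points]
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[k] = tuple(fmean(c) for c in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def two_means(points):
    """K-means with k=2, keeping the best run over every pair of starting points."""
    runs = (lloyd(points, pair) for pair in combinations(points, 2))
    return min(runs, key=lambda run: run[1])[0]


def matches(labels, truth):
    """True if labels equal the reference grouping, allowing 0 and 1 to swap."""
    return labels == truth or labels == [1 - t for t in truth]


# Case 1: natural clusters on a small-scale x, large-scale noise on y.
case1 = [(x, y) for x in (0.0, 0.1, 0.9, 1.0) for y in (0.0, 100.0, 200.0, 300.0)]
truth1 = [0 if x < 0.5 else 1 for x, _ in case1]

# Case 2: natural clusters on a large-scale x, a small-scale two-valued y.
case2 = [(x, y) for x in (0.0, 3.0, 9.0, 12.0) for y in (0.0, 0.1)]
truth2 = [0 if x < 6.0 else 1 for x, _ in case2]

for name, pts, truth in (("case 1", case1, truth1), ("case 2", case2, truth2)):
    raw_ok = matches(two_means(pts), truth)
    z_ok = matches(two_means(zscore_columns(pts)), truth)
    print(name, "| raw finds natural clusters:", raw_ok, "| z-score finds them:", z_ok)
```

On these two toy datasets, z-scoring recovers the natural clusters in case 1 and loses them in case 2, where the small-scale two-valued attribute is inflated to the same weight as the attribute that actually separates the groups.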

It is about how the data is distributed on the various dimensions. For example, look at the data distribution on this dimension and on this dimension. What is the difference between this dimension and this dimension in this plot? Can somebody quickly tell me what difference you’re seeing here in terms of the quality of the dimensions for creating clusters? This is a poor dimension for clustering. It is unable to segregate the two clusters. Compared to this dimension, this attribute is able to segregate the two clusters.

In your pair panel, in the diagonal, you’ll see two Gaussians on this dimension. But on this dimension you see only one Gaussian. When you have two dimensions and on both of them you’re able to see clusters, two Gaussians separated out, then you need to be careful: should we be applying z-score there or not? So it depends on your analysis of the data. There’s no straitjacket answer to your question. It depends on your understanding of the data on the various dimensions [inaudible].
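Without a plot, one rough numeric stand-in for "one Gaussian or two" on a single dimension is the share of that dimension’s variance explained by its best two-way cut. This heuristic and the function name `best_split_score` are illustrative assumptions, not something taken from the talk, and the attribute values are made up.

```python
from statistics import fmean


def best_split_score(values):
    """Share of a dimension's variance explained by its best two-way cut (0 to 1)."""
    ordered = sorted(values)
    n = len(ordered)
    mean = fmean(ordered)
    total = sum((v - mean) ** 2 for v in ordered)
    best = 0.0
    for cut in range(1, n):
        left, right = ordered[:cut], ordered[cut:]
        gap = fmean(right) - fmean(left)
        between = len(left) * len(right) / n * gap ** 2
        best = max(best, between / total)
    return best


# Hypothetical attribute values: one with two separated groups, one without.
separating = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
flat = [2.0, 2.5, 3.0, 3.5, 2.2, 2.8, 3.2, 3.6]

print(round(best_split_score(separating), 3))
print(round(best_split_score(flat), 3))
```

A score close to 1 suggests the dimension separates two groups cleanly; a clearly lower score suggests it does not. It is no substitute for looking at the distributions themselves.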

[Audience question, largely inaudible, apparently about the target variable.] In fact, when you’re doing clustering there is no concept of independent and target variables. All dimensions are equal.

However, if I’m very clear that there is a categorical variable which is going to act as my target variable for supervision, then I should remove it, because I really want to understand the relationship between the other dimensions: are there any clusters on those dimensions? I need to remove it. Otherwise the target variable will influence the clusters. [Audience follow-up, partly inaudible: with the classes car, bus and van, the moment we do the clustering a car has become a bus, although the actual counts of car, bus and van are the same. What is the reason such things happen?] In fact this is a very valuable analysis: with the target column included in the clustering and without the target column in the clustering, the clusters come out different. The reason why this happens is [inaudible; the transcript ends here].
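A small sketch of why leaving the target column in can change the clusters: it takes part in the distance calculation like any other dimension. The records, the column names, and the integer encoding of car, bus and van are all hypothetical.

```python
import math

# Hypothetical vehicle records: two measured features plus an encoded class label
# (0 = car, 1 = bus, 2 = van). Names and values are made up for illustration.
records = [
    {"length": 4.2, "height": 1.5, "vehicle_class": 0},
    {"length": 4.3, "height": 1.6, "vehicle_class": 2},
]
TARGET = "vehicle_class"


def features(record, drop=None):
    """Return the record's values in sorted key order, optionally leaving one column out."""
    return [record[k] for k in sorted(record) if k != drop]


a, b = records
with_target = math.dist(features(a), features(b))
without_target = math.dist(features(a, drop=TARGET), features(b, drop=TARGET))

print(round(with_target, 3), round(without_target, 3))
```

The two records are close on the measured features, but once the encoded class is included the distance between them is dominated by the label.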
