Classification

Machine Learning Classification Part -1

240 Comments

Zanbaz Ahmed Khan

3 years ago

Hello sir, suppose I have downloaded the data using the fetch_openml function. I see the data is in the form of a dictionary.

Now I wish to save the data to a file from Jupyter, so that each time I open the notebook I can load it from that file instead of running the sklearn command again.

How can I do that?

Also, when I read the saved data back using a file handle, it comes back as a string. I am not able to convert it back to the dictionary form it had when downloaded with fetch_openml.

Kindly help.

Secondly, what does it mean to have 'more than one class'?

Kindly help.

Thanks

Shubh Tripathi

3 years ago

If I understood you correctly, you want to know how to save the sklearn datasets locally. So, you can do that by converting the data to a pandas data frame and saving it locally in CSV format. There are other methods too to save files in different formats locally but working with CSV files is much handier.

Can you give me the context of 'more than one class'?

Zanbaz Ahmed Khan

3 years ago

I have saved it in text format as well as in CSV format. However, when I load it, it is loaded as a string and not as a dictionary.

I will just show you the exact situation.

Zanbaz Ahmed Khan

3 years ago

mnist = fetch_openml('mnist_784', version= 1, cache = True)

I used this command to download the data.

It looks like this, and it is a dictionary.

Now I have saved this data as a text file from my Jupyter notebook.

When I open it with a file handle and read its content using f.read(),

I get, obviously, a string.

Had I saved it as a CSV, it would still be read as a string, because data read from a file this way always comes back as a string.

Now the problem I am facing is that, in order to proceed further, I need it in the form of a dictionary, as it was when downloaded. I tried to convert it, but in vain.

For example:

So this is a string, whereas the earlier one was a dictionary.

I am not able to convert this string into a dictionary.

About the class thing:

I am not able to understand which 'class' the error is about.

Shubh Tripathi

3 years ago

You can read the CSV file by the pandas read_csv method. That directly loads it as a DataFrame.
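
A minimal sketch of that round trip, using a tiny stand-in table rather than the real MNIST download. The column names (pixel1, pixel2, class) and the file name are assumptions made up for illustration. Passing dtype for the label column keeps the labels as strings, which is how they arrive from OpenML.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for the MNIST table; the real frame would come from
# fetch_openml(...). Column names here are illustrative only.
df = pd.DataFrame({
    "pixel1": [0, 255, 12],
    "pixel2": [34, 0, 7],
    "class": ["5", "0", "9"],
})

# Hypothetical file name; any writable path works.
csv_path = os.path.join(tempfile.mkdtemp(), "mnist_sample.csv")
df.to_csv(csv_path, index=False)

# read_csv returns a DataFrame, not a raw string.
loaded = pd.read_csv(csv_path, dtype={"class": str})

# Split back into features and labels.
X = loaded.drop(columns="class").to_numpy()
y = loaded["class"].to_numpy()
print(X.shape, list(y))
```

After this, X and y can be used the same way as mnist['data'] and mnist['target'].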

Zanbaz Ahmed Khan

3 years ago

please guide with regard to the error in the SGDClassifier

Thanks

Shubh Tripathi

3 years ago

The error means the target variable y_train_9 contains only one unique class. To fix that, you'll have to check how the labels are loaded and compared.

Zanbaz Ahmed Khan

3 years ago

from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version= 1, cache = True)
#please have a look at the data. Its a dictionary
X,y = mnist['data'], mnist['target']

#looking at the datasamples
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
X = X.to_numpy()
some_digits = X[36000]
some_digits_image = some_digits.reshape(28,28)


plt.imshow(some_digits_image, cmap= matplotlib.cm.binary, interpolation='nearest')
plt.axis('off')
plt.show()

mnist.keys()

X_train , X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

np.random.seed(42)
shuffle_index = np.random.permutation(60000) #creates a random ordering of the 60000 indices
X_train = X_train[shuffle_index]
y_train = y_train[shuffle_index]
#shuffling of the data has been done

#binary classification using SGDClassifier
y_train_9 = y_train == 9
y_train_9


#picking a classifier to see whether it yields the right output
from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
sgd_clf.fit(X_train, y_train_9)

Here's my work

please point out the mistake so that i can proceed

Thanks

Shubh Tripathi

3 years ago

Hi,

The mistake is at :

y_train_9 = y_train == 9

If you print y_train, you will notice that it contains the digits as strings instead of real integers. It contains:

array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], dtype=object)

And here you can see the dtype is 'object' instead of 'int'.

So, you have to change this line to-

y_train_9 = y_train == '9'

to make it work.
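
A small sketch of why the comparison fails, using a five-element stand-in array instead of the real labels. It also shows an alternative fix: converting the labels to integers once and then comparing with an integer.

```python
import numpy as np

# Stand-in labels shaped like the OpenML target: digits stored as strings.
y_train = np.array(["0", "9", "5", "9", "3"], dtype=object)

wrong = (y_train == 9)      # string vs int: never equal, all False
right = (y_train == "9")    # string vs string: True where the label is '9'

# Alternative: convert the labels to integers once, then compare with ints.
y_int = y_train.astype(np.uint8)
also_right = (y_int == 9)

print(np.unique(wrong))   # a single class, which SGDClassifier rejects
print(np.unique(right))   # both classes present
```

Checking np.unique on the target before calling fit is a quick way to catch the "got 1 class" error early.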

Abolfazl Abdoli Arani

3 years ago

Hi
I have a question; thank you in advance, professor, for answering.
Can I do image processing on images taken with a Samsung A52s, so that I can recognize details a few millimeters in size in the image and label them?

Shubh Tripathi

3 years ago

Hi,

Yes, real-time models are trained to handle images of all quality. So yes, you can do the image processing in that case too.

Thanks

Nitin Singh

3 years ago

Getting error while downloading the mldata set:

from sklearn.datasets import fetch_mldata

mnist = fetch_mldata("MINST Original")

mnist

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-3757838d18bb> in <module>
----> 1 from sklearn.datasets import fetch_mldata
      2 mnist = fetch_mldata("MINST Original")
      3 mnist

ImportError: cannot import name 'fetch_mldata'

Abhinav Singh

3 years ago

Instead of fetch_mldata() please use fetch_openml()

Hope this helps.

Vedant Agarwal

4 years ago

unable to import datasets

Vagdevi K

4 years ago

Hi,

It is a warning which you may ignore.

Thanks.

Samaksh Chandra

4 years ago

Why have we shuffled only the training dataset? Why not shuffle the test dataset as well??

Vagdevi K

4 years ago

Hi,

When training, we are trying to get an optimal, generalized model that is not affected by the order of the training samples, so we shuffle the training data. While testing, we are not making any changes to the model; we are just using the already trained model to measure its performance (in terms of accuracy or other metrics) on unseen data. So we don't need to shuffle the test data.

Thanks.

Samaksh Chandra

4 years ago

Okay. Thank you for the clarification

Manuj Kumar Joshi

4 years ago

I would advise you to please provide hands-on at the beginning of a chapter, it would be easy to relate theory with the hands-on part.

Rajtilak Bhattacharjee

4 years ago

Hi,

We apply the theory for solving the hands-on. So without learning the theory, we cannot provide the hands-on. For example, without knowing what One-Hot Encoding is, one will not be able to apply it at the correct place.

Thanks.

Narasimha Murthy N

4 years ago

y_train has all images from 0 to 9. Whereas, y_train_5 has only false values. Should not y_train_5 have both true and false?

Rajtilak Bhattacharjee

4 years ago

Hi,

Yes, it should have both True and False. Why don't you write that code in a separate cell and try again? If you get the same results, check the code you used to create y_train_5.

Thanks.

Narasimha Murthy N

4 years ago

Referring to the line "X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]"

We used sklearn.model_selection.train_test_split for splitting the data in the previous projects, whereas in this topic we are using the above statement. Why are we following different approaches? Please clarify.

Vagdevi K

4 years ago

Hi,

train_test_split is a ready-made function that splits the data once you specify the proportion of the train and test sets. The method used here is manual: we state the indices up to which the rows go into the train set and from which they go into the test set. It is better to know both approaches.
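
Both approaches side by side, as a rough sketch on 70 stand-in samples instead of MNIST's 70,000 (the array contents are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 70 samples with 2 features each.
X = np.arange(140).reshape(70, 2)
y = np.arange(70) % 10

# Manual split by position: the first 60 rows train, the rest test.
X_train, X_test, y_train, y_test = X[:60], X[60:], y[:60], y[60:]

# Library split: rows are shuffled before being divided.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=10, random_state=42
)

print(X_train.shape, X_test.shape)
print(X_tr.shape, X_te.shape)
```

The manual slice keeps the original row order, which matters for MNIST because the dataset ships with a fixed train/test boundary at row 60,000.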

Thanks.

Narasimha Murthy N

4 years ago

The line "mnist = fetch_openml('mnist_784')" executes successfully. It created a folder scikit_learn_data in my home directory, and I can print the values of mnist. But I do not see any .mat file created in my home directory. Where is the dataset downloaded? Please clarify.

Rajtilak Bhattacharjee

4 years ago

Hi,

The data is in .gz format, you can check inside the scikit-learn folder that was created.

Thanks.

Bhanu Prakash

4 years ago

What are coef_ and intercept_ in SGDClassifier?

Rajtilak Bhattacharjee

4 years ago

Hi,

If you look at the equation of a straight line, it is as follows:

y = mx + c

Here, m is the coefficient, and c is the intercept.
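
To relate this to SGDClassifier, here is a short sketch on a toy two-feature dataset (the data is randomly generated for illustration). With more than one feature there is one coefficient per feature, stored in coef_, and a single intercept_ per binary classifier.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Toy problem: the class is True when the two features sum to more than 0.
rng = np.random.RandomState(42)
X = rng.randn(200, 2)
y = X.sum(axis=1) > 0

clf = SGDClassifier(random_state=42)
clf.fit(X, y)

# One weight per feature, plus one bias term.
print(clf.coef_.shape)       # (1, 2)
print(clf.intercept_.shape)  # (1,)

# The linear score is X @ coef_.T + intercept_, the multi-feature
# analogue of y = mx + c.
manual_scores = (X @ clf.coef_.T + clf.intercept_).ravel()
print(np.allclose(manual_scores, clf.decision_function(X)))
```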

Thanks.

Bhanu Prakash

4 years ago

Is it openml or mldata to fetch the data? There is a difference between what is shown in the video and what's given in the slides.

Rajtilak Bhattacharjee

4 years ago

Hi,

Try the following:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')

As of version 0.20, scikit-learn deprecated the fetch_mldata function.

Thanks.

Bhanu Prakash

4 years ago

thank you sir

Abhishek Kumar

4 years ago

Code:

from sklearn.datasets import fetch_openml
mnist = fetch_openml("MNIST Original")
mnist

Abhishek Kumar

4 years ago

Error:

HTTPError: HTTP Error 400: Bad Request

Vagdevi K

4 years ago

Hi,

Could you please attach the screenshot of your issue?

Thanks.

Sajja Tulasi Krishna

4 years ago

In precision, and recall which one should be high, which one should be low, please explain, what about sensitivity and specificity

Vagdevi K

4 years ago

Hi,

We have many evaluation metrics like precision, recall, accuracy, etc. But we can’t generalize something to be the best, like a one-size-fits-all solution. It often changes based on the scenario for which we want to build the model.

For example,

Consider the scenario where we want to build a model to classify a credit card transaction as fraudulent or not. Here it is more important for us to make sure no fraudulent transaction is mistakenly classified as non-fraudulent, because this is a monetary issue where security should be the utmost priority. Thus we can't afford False Negatives. So we shall focus on improving recall by reducing False Negatives. (recall = (true positives) / (true positives + false negatives)).

In some other situations, like spam email detection, it is sometimes acceptable to classify a spam email (positive) as non-spam (negative), but it is not acceptable to mark a non-spam email as spam, as the user might miss valuable information carried by a good email. So here we can't afford False Positives, and we care for high precision (precision = (true positives) / (true positives + false positives)), whereas in our fraud detection case we care for high recall. So, based on our need, we generally choose the features that improve performance in terms of the chosen metric.

Thanks.

Vagdevi K

4 years ago

Further, sensitivity is simply another name for recall. Sensitivity = (true positives)/(true positives + false negatives). In our credit card example, this answers the question: of all the fraudulent transactions, how many are correctly classified as fraudulent?

Specificity is the counterpart of sensitivity for the negative class. Specificity = (true negatives)/(true negatives + false positives). This answers the question: of all the non-fraudulent transactions, how many are correctly classified as non-fraudulent?
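
The formulas can be checked with a few lines of arithmetic. The counts below are hypothetical and chosen only to make the numbers easy to follow.

```python
# Hypothetical confusion-matrix counts for a fraud detector (illustrative only).
tp, fn = 80, 20     # fraudulent transactions: caught / missed
tn, fp = 890, 10    # legitimate transactions: passed / wrongly flagged

precision = tp / (tp + fp)      # of the flagged transactions, how many are fraud
recall = tp / (tp + fn)         # same quantity as sensitivity
specificity = tn / (tn + fp)    # of the legitimate transactions, how many passed

print(f"precision={precision:.3f} recall={recall:.3f} specificity={specificity:.3f}")
```

With these counts, the detector misses 20 of 100 frauds (recall 0.8) while wrongly flagging only 10 of 900 legitimate transactions.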

Dhanesh Kumar

4 years ago

How to download these slides?

Rajtilak Bhattacharjee

4 years ago

Hi,

If you hover your cursor over the slides, you will see an arrow icon on the top right of the slides. You can click on that to download these slides.

Thanks.

Vagdevi K

4 years ago

Hi,

Please click on the arrow at the top-right corner of the slides section:

Then in the new tab, you can either save the slides to your google drive, or click on print option and download it.

Thanks.

Nirav Raj

4 years ago

I did not understand .

How much you are putting in y_test and X_test??

please say

Vagdevi K

4 years ago

Hi,

As shown above, the test_set contains 10,000 samples and the train set contains 60,000 samples:

Thanks.

Nirav Raj

4 years ago

how is the 10,000 in test_set??

You are putting 60,000 in test set

Vagdevi K

4 years ago

Hi,

As shown above, the test_set contains 10,000 samples and the train set contains 60,000 samples:

Thanks.

Manjunath Malagavi

4 years ago

I am getting an error with the image when following the instructions given in the videos, hence I am getting confused.

Sandeep Giri

4 years ago

Hi Manjunath,

The dataset has changed a bit in recent version. So, the image at same index may be different.

Manjunath Malagavi

4 years ago

Bcz of this I am not able to practice..just listen to videos..

Manuj Kumar Joshi

4 years ago

Split your screen and open colab on the right screen...do the hands-on.

Jayateertha Rao D

4 years ago

Getting the above-mentioned error

Shashwat Verma

4 years ago

Hi I am unable to view the Jupyter notebook on the right side of the half screen. Also, I do not see an option to "Hide Playground" or "Show Playground" on the screen. Please help how to enable it so that I can run the code side by side while I am going through the learning material.

Thanks

Sachin Giri

4 years ago

Hi Shashwat,

As there is no assessment to perform here, we have not provided the side playground; you can still use lab services in a different tab.

Ankit Gokhale

4 years ago

Hello. I am trying to use SGDClassifier. But getting error. Please help. Thanks

from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
sgd_clf.fit(X_train, y_train_5)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-c2594dabc585> in <module>
      2 
      3 sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
----> 4 sgd_clf.fit(X_train, y_train_5)

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in fit(self, X, y, coef_init, intercept_init, sample_weight)
    709                          loss=self.loss, learning_rate=self.learning_rate,
    710                          coef_init=coef_init, intercept_init=intercept_init,
--> 711                          sample_weight=sample_weight)
    712 
    713 

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _fit(self, X, y, alpha, C, loss, learning_rate, coef_init, intercept_init, sample_weight)
    548 
    549         self._partial_fit(X, y, alpha, C, loss, learning_rate, self.max_iter,
--> 550                           classes, sample_weight, coef_init, intercept_init)
    551 
    552         if (self.tol is not None and self.tol > -np.inf

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _partial_fit(self, X, y, alpha, C, loss, learning_rate, max_iter, classes, sample_weight, coef_init, intercept_init)
    512             raise ValueError(
    513                 "The number of classes has to be greater than one;"
--> 514                 " got %d class" % n_classes)
    515 
    516         return self

ValueError: The number of classes has to be greater than one; got 1 class

Rajtilak Bhattacharjee

4 years ago

Hi,

Please check the y_train_5 train set, it needs to have more than 1 class. If it does not, please review your code where you created this dataset.

Thanks.

Ankit Gokhale

4 years ago

Hello

using

from sklearn.datasets import fetch_mldata always gives error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-f24642af8337> in <module>
----> 1 from sklearn.datasets import fetch_mldata

ImportError: cannot import name 'fetch_mldata'

Rajtilak Bhattacharjee

4 years ago

Hi,

fetch_mldata() has been deprecated. Please use fetch_openml() instead, you can find the updated code in the slides and notebook from our GitHub repository.

Thanks.

Ankit Gokhale

4 years ago

thanks

Sahoo Pk

4 years ago

Hi

The lab is not visible here. Can you please guide me?

Sachin Giri

4 years ago

Hi Sahoo,

As there is no assessment to perform here, we have not provided the side playground; you can still use lab services in a different tab.

Sujeet Pathak

4 years ago

y_train_9 = (y_train==9)

y_test_9 = (y_test==9)



from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state = 42, max_iter = 10)

sgd_clf.fit(X_train,y_train_9)

in above code, I am getting following error

ValueError: The number of classes has to be greater than one; got 1 class

Vagdevi K

4 years ago

Hi,

The classifier expects at least 2 unique class labels for training, whereas the target you are providing contains only one class label. So give it data with at least 2 class labels.

Thanks.

Ankit Gokhale

4 years ago

How can I give the data 2 class labels. I am stuck at this stage also and getting the same error.

Vagdevi K

4 years ago

Hi,

You could use the following:

array = ['5', '6']
df.loc[df['Class'].isin(array)]

This returns the rows with class labels '5' and '6'. You could modify the code as per your need. Hope this helps.

Thanks.

Bhavika Sehgal

4 years ago

Hello,

Can you please explain. what is decision function?

Thanks!

Rajtilak Bhattacharjee

4 years ago

Hi,

Good question!

Please find the detailed explanation of a decision function from the below link:

https://stats.stackexchange.com/questions/104988/what-is-the-difference-between-a-loss-function-and-decision-function

Thanks.

Bhavika Sehgal

4 years ago

Hi,

Thanks for replying. I tried to go through the link but was not able to understand it. Can you please explain it in simple terms.

Thanks!

Rajtilak Bhattacharjee

4 years ago

Hi,

A decision function is a function which takes a dataset as input and gives a decision as output. What the decision can be depends on the problem at hand. Examples include:

Estimation problems: the "decision" is the estimate.
Hypothesis testing problems: the decision is to reject or not reject the null hypothesis.
Classification problems: the decision is to classify a new observation (or observations) into a category.
Model selection problems: the decision is to choose one of the candidate models.
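
For the classification case in scikit-learn specifically, a rough sketch on toy data (randomly generated for illustration): decision_function returns a real-valued score per sample, and for a binary SGDClassifier the predicted class follows from comparing that score with a threshold of 0.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Toy binary target, for illustration only.
rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = X[:, 0] > 0

clf = SGDClassifier(random_state=42).fit(X, y)

scores = clf.decision_function(X)   # one real-valued score per sample

# predict() is equivalent to thresholding the score at 0.
default_pred = scores > 0
print(np.array_equal(default_pred, clf.predict(X)))

# Raising the threshold makes the classifier more conservative about
# predicting the positive class.
strict_pred = scores > 1.0
print(default_pred.sum(), strict_pred.sum())
```

Choosing a threshold other than 0 is how precision can be traded against recall.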

Thanks.

Taksham Gupta

4 years ago

hello,

I need MNIST dataset in offline mode, so could you please provide me the path to access the dataset.

Sandeep Giri

4 years ago

Hi Taksham,

When you use sklearn to download the data, it is saved to a file in a folder on your machine. You can copy it from there.

You can also download it from here: http://yann.lecun.com/exdb/mnist/

Taksham Gupta

4 years ago

Thank you so much sir !

Bhavika Sehgal

4 years ago

Hi,

I am using Stochastic Gradient Descent Classifier to check whether a digit is 5 or not. I am getting the below error :

I have checked the training set as well. It shows a count of 5421 for the digit 5, but when I check using y_train==5, I get only False values.

Please let me know the mistake that I am doing.

Thanks!

Vagdevi K

4 years ago

Hi,

When you want to classify the number in a given image, you should use the "predict" function on the trained model. Here, you are using model.fit, which is used to train a model based on the input data. So, first train your model on the train features and train labels. After training, use model.predict to predict the class of a given image. Hope this helps.

Thanks.

Bhavika Sehgal

4 years ago

Hi,

I have not trained the model yet. I am trying out binary classifier above. So, I am fitting the model to check whether digit is 5 or not.

Thanks!

Vagdevi K

4 years ago

Hi,

In that case, as your error says, the model expects more than 1 class (here you should pass data with 2 classes). So train the model by sending data with 2 classes, as expected by the classifier. Then you can predict an image as part of testing. Hope this helps.

Thanks.

Bhavika Sehgal

4 years ago

Hi,

While dividing the data into train and test sets, we are manually picking the first 60,000 rows and labelling them as train data. Would that not introduce bias, so that the train data obtained would not be random?

Can we use train_test_split with test_size=0.9 ?

Rajtilak Bhattacharjee

4 years ago

Hi,

Having a test size of 0.9 means that 90% of the data will be set aside for testing purposes, which is not practical because you need more data to train and less data to test.

Thanks.

Bhavika Sehgal

4 years ago

Hi,

Sorry, I wrote test_size=0.9 by mistake. I meant 90% of the data for training purposes. Can we split using train_test_split?

If we want to perform a StratifiedShuffleSplit, should the number of strata to be created be 10, i.e. equal to the number of digits?

Thanks.

Rajtilak Bhattacharjee

4 years ago

Hi,

A basic rule of Machine Learning/Deep Learning is that no single rule fits all cases. Depending on the data size, the problem at hand, and other factors, you need to decide what to do. Here, the dataset is already split into train and test sets, so you don't need to do that, but if you want to experiment you certainly can: merge the train and test sets and then use train_test_split on the result. StratifiedShuffleSplit does not have a strata hyperparameter at all: the strata are taken from the labels you pass as y, so stratifying on the digit labels gives one stratum per digit automatically. What you specify is the number of splits, using the n_splits hyperparameter, which does not need to be equal to the number of digits. Read more about it from the below link:

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html
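
A small sketch of StratifiedShuffleSplit on stand-in labels (10 digits, 10 samples each, made up for illustration). It shows that n_splits controls how many train/test pairs are generated, while the class proportions are preserved in each of them.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Stand-in labels: 10 digits, 10 samples each.
y = np.repeat(np.arange(10), 10)
X = np.zeros((len(y), 1))   # features are irrelevant to the split itself

# n_splits is the number of re-shuffled train/test pairs to generate,
# not the number of classes.
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=42)

splits = list(sss.split(X, y))
for train_idx, test_idx in splits:
    # Each test set keeps the class proportions of the full label set.
    print(np.bincount(y[test_idx], minlength=10))
```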

Thanks.

Bhavika Sehgal

4 years ago

Hi,

Thanks for the clarity.

GODISHALA ANIL KUMAR

5 years ago

I have a problem with the Playground. The show playground option is not visible in the Classification module. Please resolve it.

Sachin Giri

5 years ago

Hi Godishala,

The side playground is not available for the slides where there is nothing to evaluate.

Rohit Goyal

5 years ago

Hi,

1. It seems like until I run the below code, the output of y[36000] is 9 and not 5

def sort_by_target(mnist):
    reorder_train = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[:60000])]))[:, 1]
    reorder_test = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[60000:])]))[:, 1]
    mnist.data[:60000] = mnist.data[reorder_train]
    mnist.target[:60000] = mnist.target[reorder_train]
    mnist.data[60000:] = mnist.data[reorder_test + 60000]
    mnist.target[60000:] = mnist.target[reorder_test + 60000]

This code is not explicitly covered in the video. Can you please explain what is going on here and why is the output without this code different, secondly why is this not covered in the video or has an explanation of the use in the notebook?

2. If I run the below without using the snippet in #1 above

y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

I can see that my y_train still has 5 as the target variable but y_train_5 does not have any TRUE values (which is leading to the only 1 class error that most people got). My guess is that this is happening due to the data type for y_train and 5 not being the same. It gets fixed if I use the below: -

y_train_5 = (y_train.astype(np.int8) == int(5))
y_test_5 = (y_test.astype(np.int8) == int(5))

I also saw that this is being covered (again without an explicit note or purpose of why it is being done in the below snippet) in the github code -

mnist.target = mnist.target.astype(np.int8)
sort_by_target(mnist)

While I am glad that these things were not mentioned and I got to learn these things on my own, I think it will be better to have these things covered in the notebook/video and why they are being used

Thanks,

Rohit

Rajtilak Bhattacharjee

5 years ago

Hi,

1. It is a sorting function which sorts the dataset based on the target value.

2. With these 2 lines of code, we are simply marking any other target value other than '5' as False. We are doing this because we only want to classify the digit '5' now.

3. We would urge you to explore the code and try to find out how it works. If we explain everything, like I did just now, it would defeat the purpose of the course, where we are trying to help you learn. If you come across a piece of code, be it here or elsewhere, you can try to find out how it works by searching on Google or Stack Overflow.

Thanks.

Lav Deshpande

5 years ago

While plotting the precision and recall curves vs. the threshold, why have we done the indexing [:-1]?

   plt.plot(recalls[:-1], precisions[:-1], "b-", label="Precision")

Rajtilak Bhattacharjee

5 years ago

Hi,

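Good question. precision_recall_curve returns one more precision and recall value than it returns thresholds: the final pair (precision 1, recall 0) has no corresponding threshold. So when plotting against the thresholds, we omit that last element with [:-1] to make the arrays the same length.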
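
One way to see what the [:-1] does is to compare the lengths of the arrays returned by precision_recall_curve. The labels and scores below are toy values for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy labels and scores, for illustration only.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

precisions, recalls, thresholds = precision_recall_curve(y_true, y_scores)

# precisions and recalls carry one extra final point (precision 1, recall 0)
# that has no matching threshold.
print(len(precisions), len(recalls), len(thresholds))

# Dropping the last element aligns them with thresholds for plotting.
aligned = list(zip(thresholds, precisions[:-1], recalls[:-1]))
```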

Thanks.

Mani Deol

5 years ago

While running the following code:

from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train, y_train_5)

I am getting following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-c2594dabc585> in <module>
      2 
      3 sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
----> 4 sgd_clf.fit(X_train, y_train_5)

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in fit(self, X, y, coef_init, intercept_init, sample_weight)
    709                          loss=self.loss, learning_rate=self.learning_rate,
    710                          coef_init=coef_init, intercept_init=intercept_init,
--> 711                          sample_weight=sample_weight)
    712 
    713 

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _fit(self, X, y, alpha, C, loss, learning_rate, coef_init, intercept_init, sample_weight)
    548 
    549         self._partial_fit(X, y, alpha, C, loss, learning_rate, self.max_iter,
--> 550                           classes, sample_weight, coef_init, intercept_init)
    551 
    552         if (self.tol is not None and self.tol > -np.inf

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _partial_fit(self, X, y, alpha, C, loss, learning_rate, max_iter, classes, sample_weight, coef_init, intercept_init)
    512             raise ValueError(
    513                 "The number of classes has to be greater than one;"
--> 514                 " got %d class" % n_classes)
    515 
    516         return self

ValueError: The number of classes has to be greater than one; got 1 class

Please help. In fact, I have pasted the code as well.

Rajtilak Bhattacharjee

5 years ago

Hi,

Please run it from the beginning. If you are stuck, please refer to the code in our GitHub repository.

Thanks.

Saquib Khan

5 years ago

Sir

How is precision 10% in the case of always 5 classifier? Can you please show it for the exact data give in the video?

Rajtilak Bhattacharjee

5 years ago

Hi,

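The accuracy of the Never5Classifier is about 90% because the classifier always predicts "not 5" irrespective of the input. Since about 90% of the dataset consists of digits which are not 5, its output matches the labels without it doing any actual prediction. By the same reasoning, a classifier that always predicts 5 would be correct only for the roughly 10% of images that really are 5, so its precision would be about 10%.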
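
The arithmetic can be checked on stand-in labels where exactly 1 in 10 samples is a 5 (an idealized proportion; the real MNIST share is close to but not exactly 10%).

```python
import numpy as np

# Stand-in labels: 1,000 samples, exactly 100 of which are "5".
y_is_5 = (np.arange(1000) % 10 == 5)

never_5 = np.zeros(1000, dtype=bool)   # always predicts "not 5"
always_5 = np.ones(1000, dtype=bool)   # always predicts "5"

# Accuracy: share of predictions that match the labels.
accuracy_never = (never_5 == y_is_5).mean()

# Precision: true positives / all predicted positives.
precision_always = (always_5 & y_is_5).sum() / always_5.sum()

print(accuracy_never)     # 0.9
print(precision_always)   # 0.1
```

This is why accuracy alone is a poor measure on a skewed dataset.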

Thanks.

Preet Nandeshwar

5 years ago

sir ....at 35.15 (time) why we used random.permutation() ??

Rajtilak Bhattacharjee

5 years ago

Hi,

Good question!

random.permutation() randomly permutes a sequence, or returns a permuted range. We are using it here so that we can shuffle the dataset; it is somewhat similar to shuffling a deck of cards.
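
A tiny sketch of the same idea on five stand-in samples (values made up for illustration). The point is that one permutation is used to index both arrays, so every row stays paired with its own label.

```python
import numpy as np

X_train = np.arange(10).reshape(5, 2)          # 5 samples, 2 features
y_train = np.array(["a", "b", "c", "d", "e"])  # one label per sample

np.random.seed(42)
shuffle_index = np.random.permutation(len(X_train))  # a reordering of 0..4

# Indexing both arrays with the same permutation keeps each row
# paired with its own label.
X_shuffled = X_train[shuffle_index]
y_shuffled = y_train[shuffle_index]

print(shuffle_index)
print(y_shuffled)
```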

Thanks.

ayush ranjan

5 years ago

Hi,

Sir, I just wanted to say one thing. I am working in a company where I do not get enough time. I took this course with lots of expectations as it is associated with IIT. The content is nice, but the only problem is that its duration is very, very long. It gets really boring after some time and I drop it in between. Please reduce the duration in some way. MORE THAN 2-2.5 HOURS TAKES MY ENTIRE DAY TO COVER. A bit disappointed. The code is also very long and hard to understand.

Rajtilak Bhattacharjee

5 years ago

Hi,

Thank you for your feedback. If you check out the lecture videos, you will find that they contain numerous subtopics. I would suggest that, instead of covering an entire video, you cover a few subtopics each day. Also, take notes as you learn from the videos; you can also make flashcards. These will help you remember the concepts for a long time.

Thanks.

Thanneru rahul

5 years ago

sir i am getting this error.i have also gone through github notes but unable to resolve please help me

Rajtilak Bhattacharjee

5 years ago

Hi,

Please match your code with the code from our GitHub repository, it seems the training data was not prepared correctly.

Thanks.

Elite Coder

5 years ago

Where can I find how to reopen the playground? It appears to have closed.

Sachin Giri

5 years ago

Hi Elite Coder,

In those slides, where there is no evaluation to present, you will not see the side playground, although you can still open the lab in a separate tab from My Lab page.

Elite Coder

5 years ago

Thank you for your prompt response. I will remember that in the future.

Jerome Gomes

5 years ago

i am getting this error .... please help

Rajtilak Bhattacharjee

5 years ago

Hi,

The variable name on the first line should be sgd_clf, it has a typo. Let me know if this solves the issue. If not, then would request you to review your code against our notebooks from our GitHub repository.

Thanks.

Jerome Gomes

5 years ago

I am still getting the error. Can you send me the link to download this .ipynb file? I will be grateful.

Rajtilak Bhattacharjee

5 years ago

Hi,

Sure! Please find below the link to the classification.ipynb file:

https://github.com/cloudxlab/ml/blob/master/machine_learning/classification.ipynb

Thanks.

Jerome Gomes

5 years ago

In "sklearn.datasets" version 0.22 and above there is no function "fetch_mldata" how do i import the data set???

Rajtilak Bhattacharjee

5 years ago

Hi,

You can use fetch_openml as given in slide# 15.

Thanks.

Ravi Teja Pavuluri

5 years ago

At 1:16:32 in the video, Sgiri said regarding False Negative: "The model/image was actually not 5 but was classified or predicted as not 5, ok?"

That definition is for True Negative, but in the video it was given for False Negative, which is incorrect.

The correct statement for False Negative should be: "The image/model is 5 but is classified/predicted as not 5."

Rajtilak Bhattacharjee

5 years ago

Hi,

I checked; his voice is not clear at that instant. He said "of 5" and not "not 5", which makes it a False Negative. You are right about the definition of True Negative though.

Thanks.

Ravi Teja Pavuluri

5 years ago

In the video in the first 30 mins, the data was first split and then shuffled. What is the purpose of shuffling after splitting?

As per my knowledge, we shuffle before splitting to get mixed data which covers all types of samples in both the train and test sets.

I also think shuffling after splitting reduces the opportunity to test randomly, because we lose track of which record has which label and cannot check whether our predicted value equals the expected label once everything has been shuffled.

Rajtilak Bhattacharjee

5 years ago

Hi,

If you notice the comment just above that cell, we are shuffling here because we will be using cross validation next. We want to reduce bias even with the training and validation sets.

Thanks.

Ravi Teja Pavuluri

4 years ago

Thanks that explains my doubt

narenderfdrrr scholar

5 years ago

GOOD AFTERNOON SIR,

UNABLE TO GET THE PLAY GROUND (JUPYTER NOTEBOOK BESIDES THE CONTENT)

Rajtilak Bhattacharjee

5 years ago

Hi,

This is a lecture only slide, so this does not have a Jupyter notebook beside it.

Thanks.

sandeep sathyamurthy

5 years ago

Hi Rajtilak,

It is working fine now,thanks !

sandeep sathyamurthy

5 years ago

Hi,

When I run the code below, I get an error:

from sklearn.datasets import fetch_openml
import numpy as np

mnist = fetch_openml('mnist_784', version=1, cache=True)

error:

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/datasets/_openml.py:55: RuntimeWarning: Invalid cache, redownloading file
  warn("Invalid cache, redownloading file", RuntimeWarning)

Rajtilak Bhattacharjee

5 years ago

Hi,

OpenML is down right now. Please try after some time.

Thanks.

Rajtilak Bhattacharjee

5 years ago

Hi,

Can you try this again, it should be working now.

Thanks.

Dhruv Sinha

5 years ago

While performing SGDClassifier:

from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=20)
sgd_clf.fit(X_train,y_train_9)

Error:

ValueError: The number of classes has to be greater than one; got 1 class

I have changed the way the dataset was split into training and test sets by using train_test_split:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)

The y_train_9 is a numpy array containing 59500 values which are all False when I am doing it this way. Is it correct? Also, why are we using the y_train_9 instead of y_train?

Rajtilak Bhattacharjee

5 years ago

Hi,

This is a part of which assessment?

Thanks.

jayshree rathod

5 years ago

Hello sir,

I tried binary classification to predict the digit 1. I got a precision of 97%, recall of 91%, and f1_score of 94%, whereas in the lecture the precision, recall, and f1_score for the digit 5 are different, in the range of 70%. Do the precision, recall, and f1_score change depending on the target digit? And if they do, how should I know whether my model is giving good predictions?

Rajtilak Bhattacharjee

5 years ago

Hi,

To understand the difference you need to review the formula for precision, recall, and f1_score, and find out how many "5" and "1" images you have.

Thanks.
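The formulas referred to above can be written out directly from the confusion-matrix counts. In this sketch the counts for the two digits are hypothetical numbers chosen only to reproduce scores of roughly the sizes mentioned in the question; they are not results from the lecture.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of everything predicted positive, how much was right
    recall = tp / (tp + fn)      # of everything actually positive, how much was found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Hypothetical counts for a digit that is easy to separate from the rest.
easy_digit = precision_recall_f1(tp=910, fp=28, fn=90)

# Hypothetical counts for a digit that is often confused with others.
hard_digit = precision_recall_f1(tp=700, fp=300, fn=300)

print([round(v, 2) for v in easy_digit])
print([round(v, 2) for v in hard_digit])
```

The scores depend on how many true positives, false positives, and false negatives the classifier produces for the chosen target digit, so they will generally differ from one digit to another.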

Pooja Ramrakhiani

5 years ago

Can you please explain what is a decision function and what it does?

Rajtilak Bhattacharjee

5 years ago

Hi,

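A decision function takes an input sample and returns a score on which the decision is based. In Scikit-Learn, a classifier's decision_function() returns one score per instance; for a binary classifier such as SGDClassifier, predict() labels an instance positive when that score is greater than zero. Working with the raw scores lets you choose your own threshold and trade precision against recall.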

Thanks.
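As a rough illustration, the sketch below trains an SGDClassifier on a small synthetic dataset (generated with make_classification, not MNIST) and compares decision_function() scores with predict(). The threshold variable is an illustrative name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Small synthetic binary problem standing in for the "5 vs not 5" task.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

clf = SGDClassifier(random_state=42, max_iter=1000, tol=1e-3)
clf.fit(X, y)

scores = clf.decision_function(X[:5])  # one signed score per sample
default_pred = clf.predict(X[:5])      # class labels chosen by the model

# For a binary linear classifier, predict() corresponds to thresholding
# the score at zero; a different threshold gives different predictions.
threshold = 0.0
manual_pred = (scores > threshold).astype(int)

print(scores)
print(default_pred, manual_pred)
```

Raising the threshold above zero would make the classifier more conservative about predicting the positive class, which typically increases precision and lowers recall.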

Romil T

5 years ago

Hello,

Please help:

1: The 36000th image is not the number 5 as per the tutorial; it is the number 9. It is shuffled.

2: For y_train_9, "y" is an object, hence I had to change it to int by defining it as y=y.astype('int16')

3: While performing SGDClassifier:

from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=20)
sgd_clf.fit(X_train,y_train_9)

Error:

ValueError: The number of classes has to be greater than one; got 1 class

Please help. Because of this error, I am not able to complete the project. I also checked all the comments, but they did not help.

Rajtilak Bhattacharjee

5 years ago

Hi,

Please follow our notebook for a hint.

Thanks.

Romil T

5 years ago

Hello Sir,

Thank-You for your response.

I am unable to find the notebook on GitHub. I also checked the PPT a number of times, but it didn't help.

Please give me the link of the notebook.

Rajtilak Bhattacharjee

5 years ago

Hi,

This is the link to the notebook for Classification:

https://github.com/cloudxlab/ml/blob/master/machine_learning/classification.ipynb

Thanks.

Romil T

5 years ago

Thank You Sir...

Romil T

5 years ago

Hello As per the Notebook, I tried every single steps,just copying and pasting the code,only i have changed the Target value which is "9".

1: When i tried below code:

from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train,y_train_9)

ValueError: The number of classes has to be greater than one; got 1 class

2: When i tried without "9"

from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train,y_train)

It runs properly as shown in your notebook.

3: So whenever I try to run with y_train_9 (the targeted value), it gives an error in confusion_matrix.

Also while cross_val_predict:

from sklearn.model_selection import cross_val_predict
y_train_pred=cross_val_predict(sgd_clf,X_train,y_train_9,cv=3)

ValueError: The number of classes has to be greater than one; got 1 class

So there is a problem or error if i am trying to specify the targeted value.

Please help.

Thanks.

Rajtilak Bhattacharjee

5 years ago

Hi,

You may have to review the code from the first step of this assessment, and not just this one.

Thanks.

Romil T

5 years ago

Hello Sir,

Thank-You for your response:

I again checked from the beginning and now it is working correctly.

Thanks.

Dhruv Sinha

5 years ago

what did u do to remove that error?I copied the code from the Notebook and changed y_train_5 to y_train_9.

Somendra Tiwari

5 years ago

sir, why y_train_5 is used ?

Rajtilak Bhattacharjee

5 years ago

Hi,

y_train_5 is the target variable which points only to the "5" digit since this is a binary classification, even though the dataset contains all digits.

Thanks.

Prabhupad Mohapatra

5 years ago

At Slide no 88, and Video 2:01:53 it is mentioned as FN = 2

But FN =3

Please check.

Rajtilak Bhattacharjee

5 years ago

Hi,

That's correct! False Negatives should be 3. We will make the required updates.

Thanks.

Arun Kumar

5 years ago

whats wrong here?

Rajtilak Bhattacharjee

5 years ago

Hi,

This function has been deprecated. Please download our latest notebooks from our GitHub repository for the updated codes.

Thanks.

Amit Kumar

5 years ago

This prediction model not working fine as X[31000] is number '5' but prediction giving it false.

sgd_clf.predict([X[31000]])

Rajtilak Bhattacharjee

5 years ago

Hi,

Would request you to obtain the latest copy of our notebook from our repository.

Thanks.

Himanshu Kumar

5 years ago

fetch_mldata is not available

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-4-8a029c5f792a> in <module>
----> 1 from sklearn.datasets import fetch_mldata
      2 mnist = fetch_mldata("MNIST original")

ImportError: cannot import name 'fetch_mldata'

Rajtilak Bhattacharjee

5 years ago

Hi,

fetch_mldata has been deprecated. Please get the latest codes from our repository which contains an alternate command to download the dataset.

Thanks.

Vijay Saini

5 years ago

hi Team,

plt.imshow(255-some_digit_image, cmap = matplotlib.cm.binary, interpolation="nearest")

If I run the above code without interpolation, it does not show any difference. What, then, is the role of interpolation?

Rajtilak Bhattacharjee

5 years ago

Hi,

interpolation='nearest' simply displays an image without trying to interpolate between pixels if the display resolution is not the same as the image resolution (which is most often the case). It will result in an image in which each pixel is displayed as a square of multiple pixels.

Thanks.

Vijay Saini

5 years ago

Hi Team,

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py:557: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
  ConvergenceWarning)

why am I getting above warning while running code / what does it mean ?

from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
sgd_clf.fit(X_train, y_train_5)

Rajtilak Bhattacharjee

5 years ago

Hi,

This is a custom warning to capture convergence problems. You can disable it using the following method:

https://stackoverflow.com/questions/53784971/how-to-disable-convergencewarning-using-sklearn

Thanks.
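One way to silence this warning for a single call is Python's standard warnings module together with the ConvergenceWarning class from sklearn.exceptions. The sketch below uses a small synthetic dataset rather than MNIST, and the classifier variable names are illustrative. The alternative shown second is to give the solver more passes so that it can actually converge.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in data; the real notebook uses X_train and y_train_5.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Option 1: keep the small max_iter but ignore the warning inside this block only.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    few_iter_clf = SGDClassifier(random_state=42, max_iter=5, tol=1e-3)
    few_iter_clf.fit(X, y)

# Option 2: allow more passes over the data so training can stop on its own.
more_iter_clf = SGDClassifier(random_state=42, max_iter=1000, tol=1e-3)
more_iter_clf.fit(X, y)

print(few_iter_clf.n_iter_, more_iter_clf.n_iter_)
```

Ignoring the warning only hides the message; it does not change the fitted model, so raising max_iter is usually the better choice when training time allows.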

PRANAV KRISHNAN

5 years ago

1.why we used y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3) instead of y_train_pred=sgd_clf.predict(X_train)?

2.For prediction we should be passing only the training data and it should return the target variable ,why we are passing both?

3.In calculating scores the cross_val returned the scores for each fold , but here we got only one output.Can you explain how cross_val_predict work in this case?

Rajtilak Bhattacharjee

5 years ago

Hi,

1. Generate cross-validated estimates for each input data point. The data is split according to the cv parameter. Each sample belongs to exactly one test set, and its prediction is computed with an estimator fitted on the corresponding training set.

2. That is the syntax of the cross_val_predict() function.

3. The function cross_val_predict has a similar interface to cross_val_score, but returns, for each element in the input, the prediction that was obtained for that element when it was in the test set. Only cross-validation strategies that assign all elements to a test set exactly once can be used (otherwise, an exception is raised).

Thanks.
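The difference between the two functions is easiest to see from the shapes they return. This is a small sketch on synthetic data (not MNIST), with illustrative variable names.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score

# Synthetic stand-in for X_train and y_train_5.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
clf = SGDClassifier(random_state=42, max_iter=1000, tol=1e-3)

# cross_val_score: one accuracy value per fold.
fold_scores = cross_val_score(clf, X, y, cv=3, scoring="accuracy")

# cross_val_predict: one prediction per sample, each made by a model
# that did not see that sample during training.
out_of_fold_pred = cross_val_predict(clf, X, y, cv=3)

print(fold_scores.shape)       # one entry per fold
print(out_of_fold_pred.shape)  # one entry per training sample
```

Both functions need the labels because they refit the model on each training split. Calling clf.predict(X) on a model trained on the same X would instead score samples the model has already seen, which tends to give an overly optimistic confusion matrix.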

PRANAV KRISHNAN

5 years ago

Now its clear for me,Thank you

Sameer Rs

5 years ago

Hi Cloud X Team,

Apart from sklearn.datasets,, came across that MNIST dataset is also located in Keras & Tensorflow libraries of Python.

Is it possible to import the aforesaid dataset from Keras & TensorFlow libraries too???

https://www.tensorflow.org/...

This is what I came across while doing Google Search.

I believe that this is a possible alternative to fetch_mldata(), which has recently been deprecated from scikit-learn & ml.org, as follows:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Another alternative is:
http://yann.lecun.com/exdb/...

Kindly let me know your feedback.

CloudxLab

5 years ago

Hi,

The MNIST dataset referred here is a part of the Scikit-Learn dataset. However, you are right, Keras too contains the MNIST dataset. You can find the details of all the Keras datasets in the below link:

https://keras.io/api/datasets/

Thanks.

-- Rajtilak Bhattacharjee

Sameer Rs

5 years ago

Gr888 & thanks for your response...

Sameer Rs

5 years ago

Hi Cloud X Team,

In your video-recording tied at 01:21:38 approx wrt MNIST dataset----

a) The given large image further broken down into smaller images comprising of dimensions --- 28 x 28 pixels i.e. 784 features (pixels). In other words each pixel is converted into a column. Does this mean that each small image comprises of 28 columns & the given large image (in the slide) comprises of totally 784 columns.??? Is this what is meant by the Trainer's statement? Just want to understand the concept that has been grasped by me. Kindly correct me, in case if I have wrongly understood.

b) Now these 784 features (pixels) can be transformed into an array comprising of 784 blocks. .

c) 70,000 are the rows. How did 70,000 come into the picture or rather how was this figure derived or arrived at?

Would be glad if you can clear my doubts.

CloudxLab

5 years ago

Hi,

1. You can visualize each image as 28 rows x 28 columns of pixels, i.e. 784 pixels, each of which becomes one feature column.
2. Yes
3. The rows are the total number of images in the dataset, i.e. 70,000
Thanks.

-- Rajtilak Bhattacharjee
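The relationship between the 784 columns and the 28 x 28 image can be sketched with NumPy. The array below is a made-up stand-in with 5 blank images instead of the 70,000 real ones, and `side` is an illustrative name.

```python
import numpy as np

n_images, side = 5, 28  # 5 fake images here; MNIST has 70,000

# Each row is one flattened image: 28 * 28 = 784 pixel values (features).
X = np.zeros((n_images, side * side), dtype=np.uint8)

# Light up the pixel at row 3, column 7 of the first image.
X[0, 3 * side + 7] = 255

# Reshaping a row back to 28 x 28 recovers the picture for plotting.
some_digit_image = X[0].reshape(side, side)

print(X.shape)                 # (number of images, 784)
print(some_digit_image.shape)  # (28, 28)
```

So the full dataset is a table with one row per image and 784 columns, and a single row can be reshaped to 28 x 28 whenever it needs to be displayed.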

Sameer Rs

5 years ago

Dear Rajtilak,

Thanks for your crystal-clear explanation...Now I've understood it better....

Deepjyoti Saikia

5 years ago

class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        pass
    def predict(self, X):
        return np.zeros((len(X), 1), dtype=bool)

Why we are passing y=None by default ?

And I am not able to understand the predict function,Why are we returning all zeros ?

And I am getting True Negatives as 54579 and False Negatives as 5421. The True Positives are zero and the False Positives are also zero?

CloudxLab

5 years ago

Hi,

Please find the answer to your queries in the link below:
https://stackoverflow.com/q...
Thanks.

-- Rajtilak Bhattacharjee

Deepjyoti Saikia

5 years ago

There was only the explanation of fit, which I already understand. In the link they said we are only providing the 5s, but I don't think that is the case, because we are not choosing 5 in particular; we are passing the whole dataset X, which contains different numbers.
I have also asked why we set y=None by default, and why predict returns zeros of shape (len(X), 1).

Please give the explanation.

CloudxLab

5 years ago

Hi,

The Never5Classifier is just a toy classifier which always predicts False (meaning "not a 5"), without even looking at the image. The goal is to demonstrate that even such a bad classifier (which doesn't learn anything at all and doesn't even look at the images) can get pretty good accuracy if most images are not 5s.

You will find the explanation in the below link:

https://github.com/ageron/h...

Thanks.

-- Rajtilak Bhattacharjee
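The point about accuracy can be checked with a small sketch. This version is simplified for self-containment: it is a plain class without BaseEstimator, it returns a 1-D array, and it uses made-up labels in which exactly one tenth of the samples are 5s.

```python
import numpy as np


class Never5Classifier:
    """Toy classifier that ignores its input and always answers "not 5"."""

    def fit(self, X, y=None):
        # y defaults to None because nothing is learned from the labels;
        # the parameter exists only to match the usual fit(X, y) signature.
        return self

    def predict(self, X):
        # One False ("not 5") per input row.
        return np.zeros(len(X), dtype=bool)


# Made-up labels: digits 0..9 repeated, so exactly 10% are 5s.
y = np.arange(1000) % 10
X = np.zeros((1000, 4))  # features are irrelevant for this classifier
y_is_5 = (y == 5)

pred = Never5Classifier().fit(X, y_is_5).predict(X)

accuracy = float(np.mean(pred == y_is_5))
true_negatives = int(np.sum(~pred & ~y_is_5))
false_negatives = int(np.sum(~pred & y_is_5))
true_positives = int(np.sum(pred & y_is_5))
false_positives = int(np.sum(pred & ~y_is_5))

print(accuracy, true_negatives, false_negatives, true_positives, false_positives)
```

Since the classifier never predicts True, true positives and false positives are necessarily zero, every real 5 becomes a false negative, and accuracy simply equals the fraction of samples that are not 5s.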

Siddharth

5 years ago

Hi Cloudxlab:

1. It seems a few functions in the code provided in the PDF explained in video is deprecated. I had to put in a lot of unnecessary time to understand "fetch_mldata()" is deprecated. Please attach the latest updated git hub code link below the video.

2. Also, until 1 hour into this video the trainer is explaining from some other PDF which is not added here. Exactly at "1:12:00" is where he starts with the Classification PDF, the one which you have attached here in the course. Please give us the transcripts (more importantly the PDFs) of the explanation till "1:12:00".

CloudxLab

5 years ago

Hi,

1. We constantly try to update our notebooks as and when required. We have addressed this change too by updating our notebook in our GitHub repository. You can obtain them by forking our repository from the link below:

https://github.com/cloudxla...

If in future you face any such issues, would request you to either check our repository whether we have changed any codes, or let us know through your comments.

2. The first part of this lecture is a continuation of the End-to-End project, and you would find these slides under that topic.

Please let us know if you have further queries.

Thanks.

-- Rajtilak Bhattacharjee

Siddharth

5 years ago

Hi,
I can download the latest notebook from the link provided. But what worries me more is that your videos, which explain the code, are not updated as well. The trainer is still explaining the old code with "fetch_mldata()", while the code in the notebook has been updated.

How do you then expect us to follow the updated code then ? All by our self ?

CloudxLab

5 years ago

Hi,

We will update our videos soon. However, please follow the code given in our GitHub repository.

Thanks.

-- Rajtilak Bhattacharjee

Biru

5 years ago

Hi Team,

In the video, the tutor asks us to use the code below to load the MNIST dataset.

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata("MNIST original")
X, y = mnist["data"], mnist["target"]

But the above code is not working. This code is commented in latest code repository.And even if I try to run the above piece of code it is giving me error. on 'fetch_mldata'.

I can see some different lines of code in new file, which is below mentioned

from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version=1, cache=True)

I can see that the mnist dataset is a dictionary in both cases and has all the data. But I think the image ordering is different, because in the video y[36000] gives 5, which is the same as what is shown by matplotlib. With the latest code, which uses 'fetch_openml', y[36000] is 9.

Plus there is a function in new code. 'sort_by_target'.
Please let me know reasons for all these things.

Thanks!

CloudxLab

5 years ago

Hi,

The code fetch_mldata() has been deprecated. We have updated our notebooks, you can download the latest notebook from our GitHub repository.

Thanks.

-- Rajtilak Bhattacharjee

Mrityunjay

5 years ago

Hi, Team

in Classification
from sklearn.datasets import fetch_mldata
# fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
# in your home directory.
# you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
mnist = fetch_mldata("MNIST original")
mnist

Error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-caec9ae19f90> in <module>
----> 1 from sklearn.datasets import fetch_mldata
2 # fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
3 # in your home directory.
4 # you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
5 mnist = fetch_mldata("MNIST original")

ImportError: cannot import name 'fetch_mldata'

just tell me How should i practice..

CloudxLab

5 years ago

Hi,

Please note that fetch_mldata has been deprecated. We have updated our notebooks accordingly. Would request you to pull the updated notebook from our GitHub repository to reflect the same.

Thanks.

-- Rajtilak Bhattacharjee

Mrityunjay

5 years ago

Please tell me how to do the practice??

CloudxLab

5 years ago

Hi,

You can get the latest notebooks from our GitHub repository, study the code and understand how it works, and then apply the same understanding while working on the projects related to this topic.

Thanks.

-- Rajtilak Bhattacharjee

Vivek

5 years ago

please help

Mohini Singhal

5 years ago

The side playground is not shown to me. Please resolve.

CloudxLab

5 years ago

Hi Mohini,

Thank you for contacting us.
Take a look at the top right side of your screen, are you able to locate "Show Playground"? Just click on it.
Please feel free to let me know if you have any queries and I'll be glad to help.

Hope this helps.

Thanks.

-- Anupam Singh Vishal

Mohini Singhal

5 years ago

you can see in the screenshot sir, there no button named Show playground.

Srihari

5 years ago

I guess this session we have to do in our own jupyter notebook that is installed, since it is not graded.

CloudxLab

5 years ago

Hi, Srihari.

Yes, you can do it by following the tutorial and by creating with another Jupyter file.

All the best!

-- Satyajit Das

CloudxLab

5 years ago

Hi Mohini,

The playground will not show at the side of this topic, and a few more of them, as they do not have any assessments.

Thanks.

-- Rajtilak Bhattacharjee

Bhaskar Saikia

5 years ago

Please let me know where can I get the same dataset in local?

CloudxLab

5 years ago

Hi,

You can use the following command in your local Jupyter installation, and you will be able to access the same dataset:

from sklearn.datasets import fetch_openml
import numpy as np

mnist = fetch_openml('mnist_784', version=1, cache=True)
mnist.target = mnist.target.astype(np.int8)

Thanks.

-- Rajtilak Bhattacharjee

Debasish Deb

5 years ago

from sklearn.datasets import fetch_mldata

ImportError: cannot import name 'fetch_mldata'

CloudxLab

5 years ago

Hi,

Use this, it should work.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')

All the best!

-- Satyajit Das

Amit Kumar Srivastava

5 years ago

Hi,
I am trying to download but unable to do, please find the attached screen shot.

CloudxLab

5 years ago

Hi,

Try the following code instead:

from sklearn.datasets import fetch_openml
import numpy as np

mnist = fetch_openml('mnist_784', version=1, cache=True)

mnist.target = mnist.target.astype(np.int8)
sort_by_target(mnist)

Thanks.

-- Rajtilak Bhattacharjee

Gb

5 years ago

What is sort_by_target function?

CloudxLab

5 years ago

Hi,

Please comment out that line and try again. Also, please note that we have updated our notebooks with this code. Would request you to use the latest notebooks from our GitHub repository.

Thanks.

-- Rajtilak Bhattacharjee

Prachi Singla

5 years ago

Hi
The code after fitting classifier is supposed to give True boolean but it is giving False.Plz help
Thanks

CloudxLab

5 years ago

Hi,

Could you please share a screenshot of your code and the error that you are getting.

Thanks.

-- Rajtilak Bhattacharjee

Prachi Singla

5 years ago

Hi
Can i get all stuff of recordings and slides for future reference.
Thanks

CloudxLab

5 years ago

Hi Prachi,

You will have a lifetime free access to all the videos and the slides. You can even download the slides using the arrow button that shows up on the top right corner when you hover over them.

Thanks.

-- Rajtilak Bhattacharjee

Prachi Singla

5 years ago

OK Thanks

Dhyey Kotecha

5 years ago

Team,

I just wanted to highlight below two observations, due to change in the data source from MLDATA to OPENML:

1. The 36,000 th image in the video was '5' while in the OPENML dataset, it points to the value '9'. Perhaps, the order in this dataset seems shuffled.

2. When binary classification is attempted, the SGDClassifier gives out an error as "ValueError: The number of classes has to be greater than one; got 1 class". I came to know that, the issue lies with the data type of the dataset's labels. The data type of values of the 'target' key is 'object', which would not work when we create a boolean array of 5 (True) and not 5 (False). However, this can be resolved by changing the data type of labels using below code -

y = y.astype('int16')

I hope it helps everyone.
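The second observation can be reproduced without downloading anything. The sketch below uses a tiny made-up label array with string digits stored as objects, which is how the labels are described above; the variable names are illustrative.

```python
import numpy as np

# Made-up labels in the form described above: digit strings with dtype object.
y = np.array(["5", "0", "9", "5", "4"], dtype=object)

# Comparing strings with the integer 5 is False for every element,
# so the "is it a 5?" target ends up containing a single class.
broken_mask = (y == 5)

# Converting the labels to integers first makes the comparison meaningful.
y_int = y.astype(np.int8)
fixed_mask = (y_int == 5)

print(np.unique(broken_mask))  # only one class present
print(np.unique(fixed_mask))   # both classes present
```

The conversion has to happen before the train/test split and before building y_train_5; casting y afterwards leaves an already-created all-False target unchanged, which would still trigger the same ValueError.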

CloudxLab

5 years ago

Hi,

Thanks for pointing this out.

Thanks.

-- Rajtilak Bhattacharjee

Sanjay Ray

5 years ago

Great!!! thanks

Romil T

5 years ago

Hello,Even after changing the data type from object to int (y=y.astype('int16')),i am getting the same error:

"ValueError: The number of classes has to be greater than one; got 1 class"

Please help

Rajtilak Bhattacharjee

5 years ago

Hi,

This error means that there is some issue with your dataset and not its data type. Please follow our notebook for more details.

Thanks.

Vinay Singh

5 years ago

Hi Team,
Why am i getting this error.

CloudxLab

5 years ago

Hi,

Try the following code instead:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)

Thanks.

-- Rajtilak Bhattacharjee

Vinay Singh

5 years ago

Thanks Rajtilak. It worked.

Deepak Kumar

5 years ago

Classification import issue...

Vinay Singh

5 years ago

Try the following code instead:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)

Deepak Kumar

5 years ago

Thanks Vinay it worked. After proceeding further I am stuck with this error.

Dhyey Kotecha

5 years ago

Hi!

The possible resolution to this query is available in this post http://disq.us/p/28x5bs9 .

I hope it helps you.

CloudxLab

5 years ago

Hi Deepak,

Would request you to recheck your code, and check if you have formed the X_train and y_train_5 properly. If you want you can take a hint, or look at the answer to match with the code you wrote. If you are still stuck, would request you to post a screenshot of your code from the beginning.

Thanks.

-- Rajtilak Bhattacharjee

Rohit Raj Jalheria

5 years ago

Hello,
While running cross_val_score on the sgd_clf I'm getting the ConvergenceWarning in the result.
how could this be solved?

CloudxLab

5 years ago

Hi Rohit,

It is a warning, and not an error. If your results are fine then you need not be worried about it.

Thanks.

-- Rajtilak Bhattacharjee

Rohit Raj Jalheria

5 years ago

Hello,

I cannot find the jupyter notebook that is displayed on the right.
can someone help me with that?

Thanks in advance

CloudxLab

5 years ago

Hi Rohit,

This topic does not contain any assessment questions, so you would not find the playground on the right. However, if this is the issue you are facing with all topics, then would request you to restart your server using the following method:
https://discuss.cloudxlab.c...
Thanks.

-- Rajtilak Bhattacharjee

Rohit Raj Jalheria

5 years ago

okay. Thank you

Punit Bhilota

5 years ago

Hi,

I have imported data as mentioned below:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)

Problem1:

The image of digit for 36000 is attached. It is 9. It is mentioned in the pdf that 36000th image is '5', which is not the case.

The program gives warning after following code.

from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train, y_train)

WARNING:
-------
/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py:557: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
ConvergenceWarning)

Problem 2:

The next line of code:
some_digit = X[36000] # Taking the 36,000th image
sgd_clf.predict([some_digit])

It produces output as: array(['9'], dtype='<U1')
The output mentioned in the course material (pdf) as: array([True], dtype=bool)

Query 1:

As suggested in above warning I changed the max_iter to '15' and reran the code.

from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=15)
sgd_clf.fit(X_train, y_train)

some_digit = X[36000] # Taking the 36,000th image
sgd_clf.predict([some_digit])

The output I received is: array(['4'], dtype='<U1')

Can you please explain how max_iter is impacting the prediction from '9' to '4'?

Problem3:

y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train, y_train_5)

getting below error while using y_train_5
----> 3 sgd_clf.fit(X_train, y_train_5)
ValueError: The number of classes has to be greater than one; got 1 class

None of the code is working for 'y_train_5'

from sklearn.model_selection import cross_val_score
cross_val_score(sgd_clf, X_train, y_train_5, cv=3,scoring="accuracy")

output: array([nan, nan, nan])

never_5_clf = Never5Classifier()
never_5_pred = never_5_clf.predict(X_train)
cross_val_score(never_5_clf, X_train, y_train_5,cv=3, scoring="accuracy")
Output: array([1., 1., 1.])
Output as per pdf: Never5Classifier - a dumb classifier gave an accuracy of 90%

CloudxLab

5 years ago

Hi Punit,

For the first query: it is a warning, not an error, so you need not change max_iter. max_iter sets the maximum number of passes over the training data (epochs) that the solver may take to converge. Because the model has not yet converged at 10 or 15 passes, stopping at a different pass leaves it with different weights, which is why the prediction for the same image can change.

Thanks.

-- Rajtilak Bhattacharjee

Dhyey Kotecha

5 years ago

Hi!

You can refer this post http://disq.us/p/28x5bs9 for answer to your queries.

I hope it helps you.

Alpesh

5 years ago

Not able to find data set fetch_mldata although i have already pull git repository.
can you share the path?

CloudxLab

5 years ago

Hi Alpesh,

Please use the following code instead:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)

Thanks.

-- Rajtilak Bhattacharjee

Vivek Bohra

5 years ago

I am getting error on first line itself... I tried multiple times...

ImportError Traceback (most recent call last)
<ipython-input-3-caec9ae19f90> in <module>
----> 1 from sklearn.datasets import fetch_mldata
2 # fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
3 # in your home directory.
4 # you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
5 mnist = fetch_mldata("MNIST original")

ImportError: cannot import name 'fetch_mldata'

Jayant Kumar Dixit

5 years ago

Hi Vivek,
The same question I have had. However, I have searched on google about fetch_mldata and found that it is dead because it relied on a website that died. So, We need to replace it with fetch_openml(), which relies on https://openml.org, which is alive and kicking. The data set name is "mnist_784" on this website.

Hi Cloudxlab Team,

Can we proceed with fetch_openml() instead of fetch_mldata(). Please let us know your response.
@disqus_zQl19TrWvN:disqus Please help.

Regards,

Jayant

Ishvina Kapoor

5 years ago

After importing the SGDClassifier and creating it's instance , when I run the fit model from this object , it throws an error - ValueError: The number of classes has to be greater than one; got 1 class
Please help

Dhyey Kotecha

5 years ago

Hi!

You can refer this post http://disq.us/p/28x5bs9 for resolution of your concern.

I hope it helps you.

Vaskar Sarkar

5 years ago

from sklearn.datasets import fetch_mldata

mnist = fetch_mldata("MNIST original")

This is not working.
Showing

ImportError: cannot import name 'fetch_mldata'
Please help me out.

Harry

5 years ago

Hi , when I run :
"from sklearn.datasets import fetch_mldata"

It gives below error:
ImportError Traceback (most recent call last)
<ipython-input-2-1955b0fbdeec> in <module>
1 import sklearn
----> 2 from sklearn.datasets import fetch_mldata

ImportError: cannot import name 'fetch_mldata'

Please help with how to load the MNIST data.

Satyajit Das

5 years ago

Hi, Harry.

Kindly refer to this discussions :- https://discuss.cloudxlab.c...

All the best!

Vivek Bohra

5 years ago

In this video, you have given a google drive link as shared folder where all PPts are provided. Please share that link with me .

Satyajit Das

5 years ago

Hi, Vivek.

You can refer to this GitHub directory for any materials.
https://github.com/cloudxla...

All the best!

Vivek Bohra

5 years ago

At this path only notebooks and data is available. I want the PPT / PDF which are in google drive. Please share the google drive link to download PPT / PDF. You shared that in video with live attendees. While I am listening it now, I am not able to get those.
Thanks,

Gopendra Mohan

5 years ago

from sklearn.datasets import fetch_mldata
# fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
# in your home directory.
# you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
mnist = fetch_mldata("MNIST original")
mnist

ImportError: cannot import name 'fetch_mldata'

I am getting the above error. Please help me to resolve it.

Vijai Narayanan Nallan Chakrav

5 years ago

Please use the following code instead of 1st line
def sort_by_target(mnist):
    reorder_train=np.array(sorted([(target,i) for i, target in enumerate(mnist.target[:60000])]))[:,1]
    reorder_test=np.array(sorted([(target,i) for i, target in enumerate(mnist.target[60000:])]))[:,1]
    mnist.data[:60000]=mnist.data[reorder_train]
    mnist.target[:60000]=mnist.target[reorder_train]
    mnist.data[60000:]=mnist.data[reorder_test+60000]
    mnist.target[60000:]=mnist.target[reorder_test+60000]
import numpy as np
from sklearn.datasets import fetch_openml
#from sklearn.datasets import fetch_mldata
#from sklearn.datasets import fetch_openml
#mnist = fetch_openml('MNIST original')
# fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
# in your home directory.
# you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
mnist = fetch_openml('mnist_784',version=1)
mnist.target=mnist.target.astype(np.int8)
sort_by_target(mnist)
mnist

fetch_mldata fetched from a site that is currently down, so use fetch_openml instead. It returns the data with different attributes, so we have to sort the data and convert the string targets to ints.

Avishek Desarkar

5 years ago

"cannot import fetch_mldata" is the error I am getting on the first line itself. There is no scikit_learn folder in my home directory, please help! I created one as given and ran the -rvc command to pull it, but it's still not working.

Vinod Kumar Jodu

5 years ago

IF a Regression Model said to be performing well using performance metrics MAE or MSE, then what will be the ranges of MAE or MSE when data is not scaled? What will be the ranges of MAE and MSE if the data scaled in between 0 and 1 or -1 to 1?

Navjot

5 years ago

Hi sir, can you please upload the google drive link of slides so that every student can download them. Thanks

Atul

5 years ago

I got the following error when trying to download the MNIST data:
c:\users\mac\appdata\local\programs\python\python37-32\lib\site-packages\sklearn\utils\deprecation.py:77: DeprecationWarning: Function fetch_mldata is deprecated; fetch_mldata was deprecated in version 0.20 and will be removed in version 0.22
warnings.warn(msg, category=DeprecationWarning)
c:\users\mac\appdata\local\programs\python\python37-32\lib\site-packages\sklearn\utils\deprecation.py:77: DeprecationWarning: Function mldata_filename is deprecated; mldata_filename was deprecated in version 0.20 and will be removed in version 0.22
warnings.warn(msg, category=DeprecationWarning)

And nothing is downloaded...pl help

Anant Saraogi

5 years ago

..

Satyajit Das

5 years ago

Hi, Anant.

Can you please tell where you are facing the problem?

All the best.

Sahil Sawant

6 years ago

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata("MNIST original")
X, y = mnist["data"], mnist["target"]

this is not working . giving error in 2nd line. it takes a lot of time to run and in the end it shows the error: Connection Reset by peer

CloudxLab

6 years ago

This happens when too many people are downloading at once. You can try again after some time.