Classification


Machine Learning Classification, Part 1




240 Comments

Hello sir, suppose I have downloaded the data using the fetch_openml function. I see the data is in the form of a dictionary.

Now I wish to create a file in Jupyter and store the data there, so that each time I open Jupyter I can load it from that file instead of running the sklearn command again.

How can I do that?

Also, say I read the downloaded data using a file handle: when I read it, it is returned as a string. I am not able to convert it back to the dictionary form it had when it was downloaded using fetch_openml.

Kindly help.

And secondly, what does it mean to have 'more than one class'?

Kindly help.

 

Thanks


If I understood you correctly, you want to know how to save the sklearn datasets locally. So, you can do that by converting the data to a pandas data frame and saving it locally in CSV format. There are other methods too to save files in different formats locally but working with CSV files is much handier.

Can you give me the context of 'more than one class'?


I have saved the data in text format as well as in CSV format. However, when I load it, it is loaded as a string and not as a dictionary.

I will just show you the exact situation.


mnist = fetch_openml('mnist_784', version= 1, cache = True)

I used this command to download the data.

It looks like this, and it's a dictionary.

Now I have saved this data as a text file in my Jupyter notebook.

When I open it with a file handle and read its content using f.read(), I get, obviously, a string.

Had I saved it as a CSV, it would still be read as a string, because data read from a file with f.read() always comes back as a string.

Now the problem I am facing is that, in order to proceed further, I need it in the form of a dictionary, as it was when I downloaded it. I tried to convert it, but in vain.

For example:

So this is a string, whereas the earlier one was a dictionary.

I am not able to convert this string into a dictionary.

About the class thing:

I am not able to understand which 'class' the error is about.

 

 

 


You can read the CSV file by the pandas read_csv method. That directly loads it as a DataFrame.
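
A minimal sketch of that save-and-reload round trip, assuming a tiny stand-in DataFrame instead of the real MNIST data. The file name mnist_sample.csv and the column name "label" are hypothetical and chosen only for illustration:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small stand-in for mnist['data'] and mnist['target'] (the real data has 784 pixel columns).
X = pd.DataFrame(np.arange(12).reshape(3, 4),
                 columns=["pixel1", "pixel2", "pixel3", "pixel4"])
y = pd.Series(["5", "0", "4"], name="label")

# Save features and labels together in one CSV file.
csv_path = os.path.join(tempfile.mkdtemp(), "mnist_sample.csv")  # hypothetical file name
df = X.copy()
df["label"] = y
df.to_csv(csv_path, index=False)

# Later (for example in a new notebook session): load it back as a DataFrame.
loaded = pd.read_csv(csv_path, dtype={"label": str})  # keep the labels as strings
X_loaded = loaded.drop(columns="label")
y_loaded = loaded["label"]
```

Reading the file with pd.read_csv gives back typed columns directly, so there is no need to parse the string returned by f.read().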


please guide with regard to the error in the SGDClassifier

 

Thanks


The error means the target variable y_train_9 contains only one unique class. To fix that, you'll have to load the dataset and build the labels correctly.


from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version= 1, cache = True)
#please have a look at the data. Its a dictionary
X,y = mnist['data'], mnist['target']

#looking at the datasamples
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
X = X.to_numpy()
some_digits = X[36000]
some_digits_image = some_digits.reshape(28,28)


plt.imshow(some_digits_image, cmap= matplotlib.cm.binary, interpolation='nearest')
plt.axis('off')
plt.show()

mnist.keys()

X_train , X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

np.random.seed(42)
shuffle_index = np.random.permutation(60000) #creates a random permutation of the indices 0 to 59999
X_train = X_train[shuffle_index]
y_train = y_train[shuffle_index]
#shuffling of the data has been done

#binary classification using SGDClassifier
y_train_9 = y_train == 9
y_train_9


#picking the classifier to see whether it yields the right output
from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
sgd_clf.fit(X_train, y_train_9)

 

Here's my work

Please point out the mistake so that I can proceed.

 

Thanks


Hi,

The mistake is at :

y_train_9 = y_train == 9

If you print y_train, you will notice that it contains the digits as strings instead of real integers. Its unique values are:

array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], dtype=object)

And here you can see the dtype is 'object' instead of 'int'.

So, you have to change this line to-

y_train_9 = y_train == '9'

to make it work.
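
To see the difference on a small scale, here is a sketch that uses a toy label array (an assumption standing in for the real y_train) and compares it against the integer 9, the string '9', and an integer-converted copy:

```python
import numpy as np

# Toy stand-in for the MNIST targets: digits stored as strings, dtype=object.
y_train = np.array(['5', '0', '9', '1', '9'], dtype=object)

wrong = (y_train == 9)               # string vs. int: never equal, so all False
right = (y_train == '9')             # string vs. string: matches the two nines
as_int = (y_train.astype(int) == 9)  # alternative: convert the labels to int once

n_wrong, n_right, n_as_int = wrong.sum(), right.sum(), as_int.sum()
```

Either comparing with '9' or converting the labels with astype gives a target that contains both True and False.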


Hi
I had a question, and thank you to the professor for answering.
Can I do image processing on images taken with a Samsung A52s, such that I can recognize details a few millimeters in size in the image and label them?


Hi,

Yes, real-time models are trained to handle images of all quality. So yes, you can do the image processing in that case too.

Thanks

Getting error while downloading the mldata set:

 

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata("MINST Original")
mnist
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-3757838d18bb> in <module>
----> 1 from sklearn.datasets import fetch_mldata
      2 mnist = fetch_mldata("MINST Original")
      3 mnist

ImportError: cannot import name 'fetch_mldata'


Instead of fetch_mldata() please use fetch_openml()

Hope this helps.


unable to import datasets 


Hi,

It is a warning which you may ignore.

Thanks.


Why have we shuffled only the training dataset? Why not shuffle the test dataset as well??


Hi,

When training, we are trying to get an optimal generalized model which is not affected by the order of the training samples. So we shuffle the train data. While testing, we are not really making any changes to the model, but just using the already trained model to measure its performance (in terms of accuracy or other metrics) on unseen data. So we don't need to shuffle the test data.

Thanks.


Okay. Thank you for the clarification


I would advise you to please provide hands-on at the beginning of a chapter, it would be easy to relate theory with the hands-on part. 


Hi,

We apply the theory for solving the hands-on. So without learning the theory, we cannot provide the hands-on. For example, without knowing what One-Hot Encoding is, one will not be able to apply it at the correct place.

Thanks.


y_train has all images from 0 to 9. Whereas, y_train_5 has only false values. Should not y_train_5 have both true and false?

 


Hi,

Yes, it should have both True and False. Why don't you write that code in a separate cell and try again? If you get the same results, check the code you used to create y_train_5.

Thanks.


Referring to the line "X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]"

We used sklearn.model_selection.train_test_split for splitting the data in the previous projects. Whereas, in this topic, we are using the above statement. Why are we following different approaches? Pl clarify


Hi,

The train_test_split is like an API which we could use to split the data just by mentioning % of train and test sets. It's just like using an existing function. Whereas this method is manual where we manually mention the indices till which we wish to have train and test. It is better to know both of the approaches.
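
Both approaches can be sketched side by side on a tiny made-up array (the data and the split sizes here are assumptions for illustration only):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2

# Manual approach: slice at a fixed index (first 8 rows train, last 2 rows test).
X_train_m, X_test_m = X[:8], X[8:]
y_train_m, y_test_m = y[:8], y[8:]

# train_test_split: give the test fraction and let sklearn shuffle and split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

The manual slice keeps the rows in their original order, which suits a dataset that already comes arranged into train and test parts, while train_test_split shuffles by default.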

Thanks.


The line "mnist = fetch_openml('mnist_784')" is successfully getting executed. It created a folder scikit_learn_data in my home directory. I could print the values of mnist. But, I do not see any .mat file created in my home directory. Where is the dataset downloaded? Pl clarify.


Hi,

The data is in .gz format; you can check inside the scikit_learn_data folder that was created.

Thanks.


What are coef_ and intercept_ in SGDClassifier?


Hi,

If you look at the equation of a straight line, it is as follows:

y = mx + c

Here, m is the coefficient, and c is the intercept.
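
For a classifier with several input features, m becomes one weight per feature. A small sketch on synthetic data (the data is made up for illustration) shows where these values live on a fitted SGDClassifier and how they form the linear score:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic binary problem with 3 features.
rng = np.random.RandomState(42)
X = rng.randn(200, 3)
y = (X[:, 0] + 2 * X[:, 1] > 0)

clf = SGDClassifier(random_state=42)
clf.fit(X, y)

weights = clf.coef_       # shape (1, 3): one coefficient per feature, like m
bias = clf.intercept_     # shape (1,): the intercept, like c

# The score the classifier uses is the linear form w . x + b.
manual_scores = X @ weights.ravel() + bias[0]
```

Here manual_scores matches what clf.decision_function(X) returns.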

Thanks.


Is it fetch_openml or fetch_mldata to fetch the data? There is a difference between what is shown in the video and what's given in the slides.


Hi,

Try the following:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')

As of version 0.20, sklearn deprecates fetch_mldata function.

Thanks.


thank you sir


Code:

from sklearn.datasets import fetch_openml
mnist = fetch_openml("MNIST Original")
mnist


Error: 

HTTPError: HTTP Error 400: Bad Request

Hi,

Could you please attach the screenshot of your issue?

Thanks.


Of precision and recall, which one should be high and which one should be low? Please explain. What about sensitivity and specificity?


Hi,

We have many evaluation metrics like precision, recall, accuracy, etc. But we can’t generalize something to be the best, like a one-size-fits-all solution. It often changes based on the scenario for which we want to build the model. 

For example,

Consider the scenario where we want to build a model to classify a credit card transaction as fraudulent or not. Here it is more important to make sure no fraudulent transaction is mistakenly classified as non-fraudulent, because this is a monetary issue where security should be the utmost priority. Thus we can’t afford False Negatives, so we focus on improving recall by reducing False Negatives (recall = (true positives) / (true positives + false negatives)).

In some other situations, like spam email detection, it’s sometimes OK to classify a spam email (positive) as non-spam (negative), but it’s not OK to mark a non-spam email as spam, as the user might miss valuable information carried by a good email. Here we can’t afford False Positives, so we care about high precision (precision = (true positives) / (true positives + false positives)), whereas in the fraud detection case we care about high recall. So, based on our needs, we generally choose the features and model settings that give the best performance on the chosen metric.

Thanks.


Further, sensitivity is nothing but recall. Sensitivity = (true positives)/(true positives + false negatives). In our credit card example, this answers the question: of the total fraudulent transactions, how many are correctly classified as fraudulent?

Specificity is the counterpart of sensitivity for the negative class. Specificity = (true negatives)/(true negatives + false positives). This answers the question: of the total non-fraudulent transactions, how many are correctly classified as non-fraudulent?
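
These formulas can be checked on a small made-up set of labels and predictions (the ten values below are invented purely for illustration; 1 means fraudulent):

```python
from sklearn.metrics import confusion_matrix

# Invented ground truth and predictions: 4 fraudulent (1) and 6 genuine (0) transactions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)     # 3 / 4
recall = tp / (tp + fn)        # 3 / 4, the same thing as sensitivity
specificity = tn / (tn + fp)   # 5 / 6
```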


How to download these slides?


Hi,

If you hover your cursor over the slides, you will see an arrow icon on the top right of the slides. You can click on that to download these slides.

Thanks.


Hi,

Please click on the arrow mark of top-right corner in slides section:

Then in the new tab, you can either save the slides to your google drive, or click on print option and download it.

Thanks.


I did not understand.

How many samples are you putting in y_test and X_test?

Please explain.


Hi,

As shown above, the test_set contains 10,000 samples and the train set contains 60,000 samples:

Thanks.


How are there 10,000 in the test set?

You are putting 60,000 in the test set.


Hi,

As shown above, the test_set contains 10,000 samples and the train set contains 60,000 samples:
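
The slicing can be verified with a placeholder array that has the same number of rows as MNIST (70,000). The zero-filled arrays below are an assumption used only to check the shapes:

```python
import numpy as np

# Placeholder with as many rows as MNIST; the real data has 784 columns per row.
X = np.zeros((70000, 4))
y = np.zeros(70000)

# X[:60000] takes rows 0 to 59999; X[60000:] takes everything from row 60000 onward.
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

n_train, n_test = len(X_train), len(X_test)
```

So 60000 is only the index where the test slice starts; the slice itself holds the remaining 70,000 - 60,000 = 10,000 rows.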

Thanks.


The image shown differs from the one in the instructions given in the videos, hence I am getting confused.


Hi Manjunath,

The dataset has changed a bit in recent versions. So, the image at the same index may be different.


Because of this I am not able to practice, just listen to the videos.


Split your screen and open colab on the right screen...do the hands-on.


Getting the above-mentioned error.


Hi I am unable to view the Jupyter notebook on the right side of the half screen. Also, I do not see an option to "Hide Playground" or "Show Playground" on the screen. Please help how to enable it so that I can run the code side by side while I am going through the learning material.

Thanks


Hi Shashwat,

As there is no assessment to perform here, we have not provided the side playground. You can still use the lab services in a different tab.


Hello. I am trying to use SGDClassifier. But getting error. Please help. Thanks

from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
sgd_clf.fit(X_train, y_train_5)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-c2594dabc585> in <module>
      2 
      3 sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
----> 4 sgd_clf.fit(X_train, y_train_5)

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in fit(self, X, y, coef_init, intercept_init, sample_weight)
    709                          loss=self.loss, learning_rate=self.learning_rate,
    710                          coef_init=coef_init, intercept_init=intercept_init,
--> 711                          sample_weight=sample_weight)
    712 
    713 

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _fit(self, X, y, alpha, C, loss, learning_rate, coef_init, intercept_init, sample_weight)
    548 
    549         self._partial_fit(X, y, alpha, C, loss, learning_rate, self.max_iter,
--> 550                           classes, sample_weight, coef_init, intercept_init)
    551 
    552         if (self.tol is not None and self.tol > -np.inf

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _partial_fit(self, X, y, alpha, C, loss, learning_rate, max_iter, classes, sample_weight, coef_init, intercept_init)
    512             raise ValueError(
    513                 "The number of classes has to be greater than one;"
--> 514                 " got %d class" % n_classes)
    515 
    516         return self

ValueError: The number of classes has to be greater than one; got 1 class

 


Hi,

Please check the y_train_5 train set, it needs to have more than 1 class. If it does not, please review your code where you created this dataset.
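
One way to do that check is np.unique. The sketch below reproduces the error on a tiny made-up dataset (random features and string labels are assumptions standing in for MNIST) and then fits successfully once the target has two classes:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Tiny made-up dataset: 6 samples, 4 features, labels stored as strings.
X = np.random.RandomState(0).rand(6, 4)
y_str = np.array(['5', '3', '5', '0', '1', '5'], dtype=object)

# Comparing string labels with the integer 5 yields only False: a single class.
y_bad = (y_str == 5)
bad_classes = np.unique(y_bad)

try:
    SGDClassifier(random_state=42).fit(X, y_bad)
    raised = False
except ValueError as err:
    raised = True   # the "number of classes has to be greater than one" error
    message = str(err)

# Comparing with the string '5' yields both True and False, so fit() works.
y_good = (y_str == '5')
good_classes = np.unique(y_good)
clf = SGDClassifier(random_state=42).fit(X, y_good)
```

Running np.unique on the target before calling fit is a quick way to catch this situation.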

Thanks.


Hello

using

from sklearn.datasets import fetch_mldata always gives error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-f24642af8337> in <module>
----> 1 from sklearn.datasets import fetch_mldata

ImportError: cannot import name 'fetch_mldata'

 


Hi,

fetch_mldata() has been deprecated. Please use fetch_openml() instead, you can find the updated code in the slides and notebook from our GitHub repository.

Thanks.


thanks


Hi 

The lab is not visible here. Can you please guide me?

 


Hi Sahoo,

As there is no assessment to perform here, we have not provided the side playground. You can still use the lab services in a different tab.

y_train_9 = (y_train==9)

y_test_9 = (y_test==9)



from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state = 42, max_iter = 10)

sgd_clf.fit(X_train,y_train_9)

in above code, I am getting following error

 

ValueError: The number of classes has to be greater than one; got 1 class

Hi,

The classifier expects at least 2 unique class labels for training, whereas the target you are providing contains only one class label. So give it data with at least 2 class labels.

Thanks.


How can I give the data 2 class labels. I am stuck at this stage also and getting the same error.


Hi,

You could use the following:

array = ['5', '6']
df.loc[df['Class'].isin(array)]

This returns the rows with class labels '5' and '6'. You could modify the code as per your need. Hope this helps.

Thanks.


Hello,

Can you please explain what a decision function is?

Thanks!


Hi,

Good question!

Please find the detailed explanation of a decision function from the below link:

https://stats.stackexchange.com/questions/104988/what-is-the-difference-between-a-loss-function-and-decision-function

Thanks.


Hi,

Thanks for replying. I tried to go through the link but was not able to understand it. Can you please explain it in simple terms. 

Thanks!


Hi,

A decision function is a function which takes a dataset as input and gives a decision as output. What the decision can be depends on the problem at hand. Examples include:

  • Estimation problems: the "decision" is the estimate.
  • Hypothesis testing problems: the decision is to reject or not reject the null hypothesis.
  • Classification problems: the decision is to classify a new observation (or observations) into a category.
  • Model selection problems: the decision is to choose one of the candidate models.
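
For the classification case, scikit-learn's SGDClassifier exposes this idea through its decision_function method, which returns a score per sample. A sketch on synthetic data (the data and the threshold of 1.0 are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic binary problem.
rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = X[:, 0] > 0.5

clf = SGDClassifier(random_state=42).fit(X, y)

# One score per sample; the sign of the score determines the predicted class.
scores = clf.decision_function(X)

default_pred = scores > 0     # same decision that clf.predict(X) makes
strict_pred = scores > 1.0    # hypothetical higher threshold: fewer positives
```

Moving the threshold on these scores is how precision can be traded against recall.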

Thanks.


hello,

I need MNIST dataset in offline mode, so could you please provide me the path to access the dataset. 


Hi Taksham,

When you use sklearn to download data, it is cached on disk in the scikit_learn_data folder in your home directory. You can copy it from there.

You can also download it from here: http://yann.lecun.com/exdb/mnist/

 


Thank you so much sir !


Hi,

I am using Stochastic Gradient Descent Classifier to check whether a digit is 5 or not. I am getting the below error :

I have checked training set also. It is showing count for digit 5 as 5421 but when I am checking using y_train==5, I am getting false values only.

 

Please let me know the mistake that I am doing.

Thanks!


Hi,

When you want to classify the number in a given image, you should use the "predict" function on the trained model. Here, you are using model.fit, which is used to train a model based on the input data. So, first train your model on the train features and train labels. After training, use model.predict to predict the class of a given image. Hope this helps.

Thanks.


Hi,

I have not trained the model yet. I am trying out binary classifier above. So, I am fitting the model to check whether digit is 5 or not.

Thanks!


Hi,

In that case, as your error says, the model expects more than 1 class (here you should pass data with 2 classes). So train the model on data with 2 classes, as expected by the SGDClassifier. Then you could predict an image as part of testing. Hope this helps.

Thanks.


Hi,

While dividing the data into train and test sets, we are manually picking the first 60,000 rows and labelling them as train data. Would that not introduce bias, so that the train data obtained would not be random?

Can we use train_test_split with test_size=0.9 ?


Hi,

Having a test size of 0.9 means that 90% of the data will be set aside for testing purposes, which is not practical because you need more data to train and less data to test.

Thanks.


Hi,

Sorry, I wrote test_size=0.9 by mistake. I meant 90% of the data for training purposes. Can we split using train_test_split?

If we want to perform a StratifiedShuffleSplit, should the number of strata to be created be 10, i.e. equal to the number of digits?

Thanks.


Hi,

So there's a basic rule of Machine Learning/Deep Learning, that there is no one rule fits all. Depending on data size, the problem at hand, and other factors, you need to decide what you need to do. Here, the dataset is already split into train and test sets, so you don't need to do that, but if you want to experiment you sure can. Merge the train and test set and then use train_test_split on it. StratifiedShuffleSplit's number of strata has got nothing to do with the number of digits, actually it does not have a strata hyperparameter at all. Rather you need to specify the number of splits using the n_splits hyperparameter which does not need to be equal to the number of digits. Read more about it from the below link:

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html
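
A minimal sketch of StratifiedShuffleSplit on made-up labels (100 samples, 10 per digit, chosen so the proportions are easy to check). The strata come from the labels passed to split(), while n_splits only sets how many random train/test splits are generated:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Made-up data: 100 samples, exactly 10 of each digit 0 to 9.
y = np.repeat(np.arange(10), 10)
X = np.zeros((100, 1))

# One split, 10% of the data held out for testing.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=42)
train_idx, test_idx = next(sss.split(X, y))

# Each digit keeps its share: 9 samples in train and 1 in test.
train_counts = np.bincount(y[train_idx])
test_counts = np.bincount(y[test_idx])
```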

Thanks.


Hi,

Thanks for the clarity.


I have a problem with the Playground. The show playground option is not visible in the Classification module. Please resolve it.


Hi Godishala,

The side playground is not available for the slides where there is nothing to evaluate.


Hi,

1. It seems like until I run the below code, the output of y[36000] is 9 and not 5

def sort_by_target(mnist):
    reorder_train = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[:60000])]))[:, 1]
    reorder_test = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[60000:])]))[:, 1]
    mnist.data[:60000] = mnist.data[reorder_train]
    mnist.target[:60000] = mnist.target[reorder_train]
    mnist.data[60000:] = mnist.data[reorder_test + 60000]
    mnist.target[60000:] = mnist.target[reorder_test + 60000]

This code is not explicitly covered in the video. Can you please explain what is going on here and why is the output without this code different, secondly why is this not covered in the video or has an explanation of the use in the notebook?

2. If I run the below without using the snippet in #1 above

y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

I can see that my y_train still has 5 as the target variable but y_train_5 does not have any TRUE values (which is leading to the only 1 class error that most people got). My guess is that this is happening due to the data type for y_train and 5 not being the same. It gets fixed if I use the below: - 

y_train_5 = (y_train.astype(np.int8) == int(5))
y_test_5 = (y_test.astype(np.int8) == int(5))

I also saw that this is being covered (again without an explicit note or purpose of why it is being done in the below snippet) in the github code - 

mnist.target = mnist.target.astype(np.int8)
sort_by_target(mnist)

While I am glad that these things were not mentioned and I got to learn these things on my own, I think it will be better to have these things covered in the notebook/video and why they are being used

 

Thanks,

Rohit


Hi,

1. It is a sorting function which sorts the dataset based on the target value.

2. With these 2 lines of code, we are simply marking any other target value other than '5' as False. We are doing this because we only want to classify the digit '5' now.

3. We would urge you to explore the code and try to find out how it works. If we explain everything, like I did just now, it would defeat the purpose of the course, where we are trying to help you learn. If you come across some code, be it here or elsewhere, you can try to find out how it works by searching on Google or Stack Overflow.

Thanks.


While plotting the precision-recall curve vs the threshold, why have we done the indexing [:-1]?

plt.plot(recalls[:-1], precisions[:-1], "b-", label="Precision")

Hi,

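Good question. precision_recall_curve returns one more precision and recall value than thresholds: the last pair (precision 1, recall 0) has no corresponding threshold. Slicing with [:-1] drops that last point, so the precision and recall arrays have the same length as the thresholds array and can be plotted against it.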
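
The array lengths returned by scikit-learn's precision_recall_curve can be inspected on a tiny made-up example (the four labels and scores below are invented for illustration):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Invented labels and classifier scores.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

precisions, recalls, thresholds = precision_recall_curve(y_true, scores)

# precisions and recalls have one more element than thresholds; the final pair
# (precision 1.0, recall 0.0) has no threshold attached to it.
n_prec, n_rec, n_thr = len(precisions), len(recalls), len(thresholds)

# Dropping the last element makes the arrays line up with thresholds.
aligned_precisions = precisions[:-1]
aligned_recalls = recalls[:-1]
```

With the last element dropped, precisions[:-1] and recalls[:-1] can each be plotted against thresholds.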

Thanks.


While running the following code:

 

from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42, max_iter=10) 
sgd_clf.fit(X_train, y_train_5)

 

I am getting following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-c2594dabc585> in <module>
      2 
      3 sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
----> 4 sgd_clf.fit(X_train, y_train_5)

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in fit(self, X, y, coef_init, intercept_init, sample_weight)
    709                          loss=self.loss, learning_rate=self.learning_rate,
    710                          coef_init=coef_init, intercept_init=intercept_init,
--> 711                          sample_weight=sample_weight)
    712 
    713 

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _fit(self, X, y, alpha, C, loss, learning_rate, coef_init, intercept_init, sample_weight)
    548 
    549         self._partial_fit(X, y, alpha, C, loss, learning_rate, self.max_iter,
--> 550                           classes, sample_weight, coef_init, intercept_init)
    551 
    552         if (self.tol is not None and self.tol > -np.inf

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _partial_fit(self, X, y, alpha, C, loss, learning_rate, max_iter, classes, sample_weight, coef_init, intercept_init)
    512             raise ValueError(
    513                 "The number of classes has to be greater than one;"
--> 514                 " got %d class" % n_classes)
    515 
    516         return self

ValueError: The number of classes has to be greater than one; got 1 class

 

Please help. In fact, I have pasted the code as well.

 

 


Hi,

Please run it from the beginning. If you are stuck, please refer to the code in our GitHub repository.

Thanks.


Sir

How is precision 10% in the case of always 5 classifier? Can you please show it for the exact data give in the video?


Hi,

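The accuracy for the Never5Classifier is 90% because the classifier always predicts 'not 5' irrespective of the input. Since about 90% of the dataset consists of digits which are not 5, its output matches the true label most of the time without it doing any actual prediction. By the same logic, a classifier that always predicts 5 is right only for the roughly 10% of images that really are 5s, so its precision is about 10%.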
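
Both numbers can be reproduced on a made-up label set in which exactly 10% of the samples are positive (an assumption that roughly mirrors the share of 5s in MNIST):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels: 10 positives ("is a 5") and 90 negatives ("not a 5").
y_true = np.array([1] * 10 + [0] * 90)

never_5 = np.zeros_like(y_true)   # always predicts "not 5"
always_5 = np.ones_like(y_true)   # always predicts "5"

acc_never = accuracy_score(y_true, never_5)      # 90 of 100 correct
prec_always = precision_score(y_true, always_5)  # 10 true positives out of 100 predicted
rec_always = recall_score(y_true, always_5)      # all 10 positives are found
```

This is why accuracy alone is a poor measure on skewed datasets.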

Thanks.


Sir, at 35:15 in the video, why did we use random.permutation()?


Hi,

Good question!

np.random.permutation() randomly permutes a sequence, or returns a permuted range. We are using it here to shuffle the dataset; it is somewhat similar to shuffling a deck of cards.
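
A small sketch of that shuffle on made-up data (5 samples instead of 60,000), showing that using one permuted index array for both the features and the labels keeps each row paired with its own label:

```python
import numpy as np

# Made-up data: row i holds the values (2*i, 2*i + 1) and has the label letters[i].
X_train = np.arange(10).reshape(5, 2)
letters = np.array(['a', 'b', 'c', 'd', 'e'])
y_train = letters.copy()

rng = np.random.RandomState(42)
shuffle_index = rng.permutation(len(X_train))  # the indices 0..4 in random order

# Index both arrays with the same permutation so rows and labels move together.
X_shuffled = X_train[shuffle_index]
y_shuffled = y_train[shuffle_index]
```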

Thanks.


Hi,

Sir, I just wanted to say one thing. I am working in a company where I do not get enough time. I took this course with a lot of expectations, as it is associated with IIT. The content is nice, but the only problem is that its duration is very long. It gets really boring after some time and I drop it in between. Please reduce the duration in some way. A video of more than 2 to 2.5 hours takes my entire day to cover. I am a bit disappointed. The code is also very long and hard to understand.


Hi,

Thank you for your feedback. If you check out the lecture videos, you will find that they contain numerous subtopics. I would suggest that, instead of covering an entire video, you cover a few subtopics each day. Also, take notes as you learn from the videos; you can also make flashcards. These will help you remember the concepts for a long time.

Thanks.


Sir, I am getting this error. I have also gone through the GitHub notes but am unable to resolve it. Please help me.

Hi,

Please match your code with the code from our GitHub repository, it seems the training data was not prepared correctly.

Thanks.


Where can I find how to reopen the playground? It appears to have closed.


Hi Elite Coder,

In those slides, where there is no evaluation to present, you will not see the side playground, although you can still open the lab in a separate tab from My Lab page.


Thank you for your prompt response. I will remember that in the future.


i am getting this error .... please help


Hi,

The variable name on the first line should be sgd_clf, it has a typo. Let me know if this solves the issue. If not, then would request you to review your code against our notebooks from our GitHub repository.

Thanks.


I am still getting the error. Can you send me the link to download this .ipynb file? I will be grateful.


Hi,

Sure! Please find below the link to the classification.ipynb file:

https://github.com/cloudxlab/ml/blob/master/machine_learning/classification.ipynb

Thanks.


In "sklearn.datasets" version 0.22 and above there is no function "fetch_mldata" how do i import the data set???


Hi,

You can use fetch_openml as given in slide# 15.

Thanks.


At 1:16:32 in the video, Sgiri said regarding False Negative: "The model/image was actually not 5 but was classified or predicted as not 5, ok?"

Actually, that definition is for True Negative, but in the video it was given for False Negative, which is incorrect.

The correct statement for False Negative should be: "Where the image/model is 5 but classified/predicted as not 5."


Hi,

I checked; his voice is not clear at that instant. He said "of 5" and not "not 5", which makes it a False Negative. You are right about the definition of True Negative though.

Thanks.


In the first 30 minutes of the video, the data was first split and then shuffled. What is the purpose of shuffling after splitting?

As per my knowledge, we shuffle before splitting to get mixed data which covers all types of samples in both the train and test sets.

I also think shuffling after splitting reduces the opportunity to test randomly, because we lose track of which record has which label and cannot check whether our predicted value equals the expected label, as everything was shuffled.

 


Hi,

If you notice the comment just above that cell, we are shuffling here because we will be using cross validation next. We want to reduce bias even with the training and validation sets.

Thanks.


Thanks that explains my doubt


GOOD AFTERNOON SIR,

UNABLE TO GET THE PLAY GROUND (JUPYTER NOTEBOOK BESIDES THE CONTENT)


Hi,

This is a lecture only slide, so this does not have a Jupyter notebook beside it.

Thanks.


Hi Rajtilak,

It is working fine now,thanks !


Hi,

When I run the code below, I am getting an error:

from sklearn.datasets import fetch_openml
import numpy as np

mnist = fetch_openml('mnist_784', version=1, cache=True)

error:

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/datasets/_openml.py:55: RuntimeWarning: Invalid cache, redownloading file
  warn("Invalid cache, redownloading file", RuntimeWarning)

Hi,

OpenML is down right now. Please try after some time.

Thanks.


Hi,

Can you try this again, it should be working now.

Thanks.


While performing SGDClassifier:

from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=20)
sgd_clf.fit(X_train,y_train_9)

Error:

ValueError: The number of classes has to be greater than one; got 1 class

I have changed the way the dataset was split into training and test sets by using train_test_split:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)

 

The y_train_9 is a numpy array containing 59500 values which are all False when I am doing it this way. Is it correct? Also, why are we using the y_train_9 instead of y_train?

  Upvote    Share

Hi,

This is a part of which assessment?

Thanks.

  Upvote    Share

Hello sir,

I tried the binary classification of predicting the image of digit 1. I got a precision of 97%, a recall of 91%, and an f1_score of 94%, whereas in the lecture the precision, recall, and f1_score for digit 5 are different and in the range of 70%. Do the precision, recall, and f1_score change depending on the input digit? And if they change, how should I know whether my model is making good predictions?

  Upvote    Share

Hi,

To understand the difference you need to review the formula for precision, recall, and f1_score, and find out how many "5" and "1" images you have.
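For reference, the three formulas can be sketched in a few lines. The counts below are invented for illustration only; they are not the numbers from the lecture or from any MNIST run:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of everything predicted positive, how much was right
    recall = tp / (tp + fn)      # of everything actually positive, how much was found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# Hypothetical counts for two different target digits.
p_a, r_a, f1_a = precision_recall_f1(tp=90, fp=10, fn=30)
p_b, r_b, f1_b = precision_recall_f1(tp=80, fp=20, fn=20)

print(p_a, r_a, f1_a)
print(p_b, r_b, f1_b)
```

Because the counts depend on how many images of the target digit exist and how easily that digit is confused with others, the scores for "1" and "5" are expected to differ.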

Thanks.

  Upvote    Share

Can you please explain what is a decision function and what it does?

  Upvote    Share

Hi,

A decision function is a function which takes a dataset as input and gives a decision as output. What the decision can be depends on the problem at hand. Examples include: Estimation problems: the "decision" is the estimate.
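In the scikit-learn setting of this lecture, and assuming the question refers to the `decision_function` method of a binary SGDClassifier, the method returns one score per sample and `predict` reports True where that score is above zero. A small sketch on synthetic data (the two point clouds are made up, not MNIST):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Two synthetic, well-separated clusters (assumption: illustrative data only).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)),
               rng.normal(2, 1, size=(50, 2))])
y = np.array([False] * 50 + [True] * 50)

clf = SGDClassifier(random_state=42)
clf.fit(X, y)

# One score per sample; positive scores are on the "True" side of the boundary.
scores = clf.decision_function(X)

# Thresholding at 0 reproduces predict() for a binary classifier.
manual_pred = scores > 0
print(np.array_equal(manual_pred, clf.predict(X)))

# A higher threshold marks fewer (or the same number of) samples as positive.
print((scores > 2).sum() <= (scores > 0).sum())
```

Choosing a threshold other than zero is how precision can be traded against recall.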

Thanks.

  Upvote    Share

Hello,

Please help:

1: The 36000th image is not the number 5 as in the tutorial; it is the number 9. The data seems to be shuffled.

2: For y_train_9, "y" is an object array, so I had to change it to int by defining y = y.astype('int16').

3: While running SGDClassifier:

from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=20)
sgd_clf.fit(X_train,y_train_9)

 

Error:

ValueError: The number of classes has to be greater than one; got 1 class

Please help. Because of this error I am not able to complete the project. I also checked all the comments, but they did not help.

 

  Upvote    Share

Hi,

Please follow our notebook for a hint.

Thanks.

  Upvote    Share

Hello Sir,

Thank-You for your response.

I am unable to find the notebook on GitHub. I also checked the PPT a number of times, but it did not help.

Please give me the link of the notebook.

 

  Upvote    Share

Hi,

This is the link to the notebook for Classification:

https://github.com/cloudxlab/ml/blob/master/machine_learning/classification.ipynb

Thanks.

  Upvote    Share

Thank You Sir...

  Upvote    Share

Hello. Following the notebook, I tried every single step by copying and pasting the code; the only thing I changed is the target value, which is "9".

1: When i tried below code:

from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train,y_train_9)

ValueError: The number of classes has to be greater than one; got 1 class

2: When i tried without "9"

from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train,y_train)

It runs properly as shown in your notebook.

3: So whenever I try to run with y_train_9 (the targeted value), it gives an error in confusion_matrix.

Also while cross_val_predict:

from sklearn.model_selection import cross_val_predict
y_train_pred=cross_val_predict(sgd_clf,X_train,y_train_9,cv=3)

ValueError: The number of classes has to be greater than one; got 1 class

So there is a problem or error if i am trying to specify the targeted value.

Please help.

Thanks.

  Upvote    Share

Hi,

You may have to review the code from the first step of this assessment, and not just this one.

Thanks.

  Upvote    Share

Hello Sir,

Thank-You for your response:

I again checked from the beginning and now it is working correctly.

Thanks.

  Upvote    Share

What did you do to remove that error? I copied the code from the notebook and changed y_train_5 to y_train_9.

  Upvote    Share

sir, why y_train_5 is used ? 

  Upvote    Share

Hi,

y_train_5 is the target variable which points only to the "5" digit since this is a binary classification, even though the dataset contains all digits.
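As a small illustration of how such a target is built (the label values below are a made-up sample, not the real training labels), comparing the label array with 5 gives a boolean array that is True only for the 5s:

```python
import numpy as np

# Hypothetical integer labels for seven images.
y_train = np.array([5, 0, 4, 1, 9, 5, 3])

# Binary target: True where the digit is 5, False for every other digit.
y_train_5 = (y_train == 5)

print(y_train_5)
```

The classifier trained on this target only learns "5" versus "not 5", which is why y_train_5 is used instead of y_train for the binary classification part.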

Thanks.

  Upvote    Share

At Slide no 88, and Video 2:01:53 it is mentioned as FN = 2

But FN =3

Please check.

  Upvote    Share

Hi,

That's correct! False Negatives should be 3. We will make the required updates.

Thanks.

  Upvote    Share

 

  Upvote    Share

whats wrong here?

  Upvote    Share

Hi,

This function has been deprecated. Please download our latest notebooks from our GitHub repository for the updated codes.

Thanks.

  Upvote    Share

This prediction model not working fine as X[31000] is number '5' but prediction giving it false.

sgd_clf.predict([X[31000]])

  Upvote    Share

Hi,

Would request you to obtain the latest copy of our notebook from our repository.

Thanks.

  Upvote    Share

fetch_mldata is not available

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-4-8a029c5f792a> in <module>
----> 1 from sklearn.datasets import fetch_mldata
      2 mnist = fetch_mldata("MNIST original")

ImportError: cannot import name 'fetch_mldata'
  Upvote    Share

Hi,

fetch_mldata has been deprecated. Please get the latest codes from our repository which contains an alternate command to download the dataset.

Thanks.

  Upvote    Share

hi Team,

plt.imshow(255-some_digit_image, cmap = matplotlib.cm.binary, interpolation="nearest")

If I run the above code without interpolation, I do not see any difference. What is the role of the interpolation argument?

  Upvote    Share

Hi,

interpolation='nearest' simply displays the image without trying to interpolate between pixels when the display resolution is not the same as the image resolution (which is most often the case). It results in an image in which each pixel is displayed as a square of multiple screen pixels.

Thanks.

  Upvote    Share
Hi Team,

/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py:557: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
  ConvergenceWarning)

why am I getting above warning while running code / what does it mean ?

from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
sgd_clf.fit(X_train, y_train_5)

  Upvote    Share

Hi,

This is a custom warning to capture convergence problems. You can disable it using the following method:

https://stackoverflow.com/questions/53784971/how-to-disable-convergencewarning-using-sklearn
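One way to do this with the standard library `warnings` module, sketched on random placeholder data (the data and the very small `max_iter` are assumptions chosen only to provoke the warning):

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier

# Random placeholder data (not MNIST), just to have something to fit.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = rng.randint(0, 2, size=200).astype(bool)

sgd_clf = SGDClassifier(random_state=42, max_iter=2)

# Silence only ConvergenceWarning, and only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    sgd_clf.fit(X, y)

print(sgd_clf.coef_.shape)
```

The warning only says that training stopped at max_iter before the stopping criterion was met. Raising max_iter, as the message suggests, is the alternative to hiding it.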

Thanks.

  Upvote    Share

1.why we used  y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3) instead of y_train_pred=sgd_clf.predict(X_train)?

2.For prediction we should be passing only the training data and it should return the target variable ,why we are passing both?

3.In calculating scores the cross_val returned the scores for each fold , but here we got only one output.Can you explain how  cross_val_predict work in this case?

  Upvote    Share

Hi,

1. Generate cross-validated estimates for each input data point. The data is split according to the cv parameter. Each sample belongs to exactly one test set, and its prediction is computed with an estimator fitted on the corresponding training set.

2. That is the syntax of the cross_val_predict() function.

3. The function cross_val_predict has a similar interface to cross_val_score, but returns, for each element in the input, the prediction that was obtained for that element when it was in the test set. Only cross-validation strategies that assign all elements to a test set exactly once can be used (otherwise, an exception is raised).
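The difference in what the two functions return can be seen on a small synthetic dataset (the data below is made up; only the output shapes matter here):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score

# Synthetic two-class data (assumption: illustrative only, not MNIST).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-2, 1, size=(60, 2)),
               rng.normal(2, 1, size=(60, 2))])
y = np.array([False] * 60 + [True] * 60)

clf = SGDClassifier(random_state=42)

# One accuracy value per fold.
scores = cross_val_score(clf, X, y, cv=3, scoring="accuracy")

# One prediction per sample, each made by a model that did not train on that sample.
y_pred = cross_val_predict(clf, X, y, cv=3)

print(scores.shape)  # (3,)
print(y_pred.shape)  # (120,)
```

Calling sgd_clf.predict(X_train) instead would score the model on data it has already seen, which gives an overly optimistic confusion matrix.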

Thanks.

  Upvote    Share

Now its clear for me,Thank you

  Upvote    Share

Hi Cloud X Team,

Apart from sklearn.datasets,, came across that MNIST dataset is also located in Keras & Tensorflow libraries of Python.

Is it possible to import the aforesaid dataset from Keras & TensorFlow libraries too???

https://www.tensorflow.org/...

This is what I came across while doing Google Search.

I believe that this is a possible alternative to fetch_mldata(), which has recently been deprecated from scikit-learn & ml.org, as follows:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Another alternative is:
http://yann.lecun.com/exdb/...

Kindly let me know your feedback.

  Upvote    Share

Hi,

The MNIST dataset referred here is a part of the Scikit-Learn dataset. However, you are right, Keras too contains the MNIST dataset. You can find the details of all the Keras datasets in the below link:

https://keras.io/api/datasets/

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Gr888 & thanks for your response...

  Upvote    Share

Hi Cloud X Team,

In your video recording, at approximately 01:21:38, with respect to the MNIST dataset:

a) The given large image further broken down into smaller images comprising of dimensions --- 28 x 28 pixels i.e. 784 features (pixels). In other words each pixel is converted into a column. Does this mean that each small image comprises of 28 columns & the given large image (in the slide) comprises of totally 784 columns.??? Is this what is meant by the Trainer's statement? Just want to understand the concept that has been grasped by me. Kindly correct me, in case if I have wrongly understood.

b) Now these 784 features (pixels) can be transformed into an array comprising of 784 blocks. .

c) 70,000 are the rows. How did 70,000 come into the picture or rather how was this figure derived or arrived at?

Would be glad if you can clear my doubts.

  Upvote    Share

Hi,

1. You can visualize it as 28 rows x 28 columns
2. Yes
3. The rows are the total number of images in the dataset, i.e. 70,000
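The layout can be sketched with NumPy, using a small placeholder array of zeros in place of the real 70,000 x 784 data (the array contents are an assumption; only the shapes are the point):

```python
import numpy as np

# Placeholder for the dataset: 5 images instead of 70,000, each stored as 784 values.
X = np.zeros((5, 784))

# One row of X is one flattened image.
some_digit = X[0]

# Reshape the 784 values back into the 28 rows x 28 columns of the picture.
some_digit_image = some_digit.reshape(28, 28)

print(X.shape)                 # (5, 784)
print(some_digit.shape)        # (784,)
print(some_digit_image.shape)  # (28, 28)
```

So each image contributes one row with 784 columns, and 28 x 28 is only how that row is viewed when it is drawn.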
Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Dear Rajtilak,

Thanks for your crystal-clear explanation...Now I've understood it better....

  Upvote    Share

class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        pass
    def predict(self, X):
        return np.zeros((len(X), 1), dtype=bool)

Why we are passing y=None by default ?

And I am not able to understand the predict function,Why are we returning all zeros ?

And Iam getting True negative as 54579 and False Negative as 5421 ,The true positive is zero and False positive is also zero ?

  Upvote    Share

Hi,

Please find the answer to your queries in the link below:
https://stackoverflow.com/q...
Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

There was only the explanation of fit which I know what is doing ,But in the link they have said we are only providing the 5 ,but I don't think it is because we are not choosing particularly 5 we are passing the whole data set X which contains different numbers .
And I have asked why we set y=None by default, and why inside predict we return zeros of shape (len(X), 1).

Please give the explanation.

  Upvote    Share

Hi,

The Never5Classifier is just a toy classifier which always predicts False (meaning "not a 5"), without even looking at the image. The goal is to demonstrate that even such a bad classifier (which doesn't learn anything at all and doesn't even look at the images) can get pretty good accuracy if most images are not 5s.
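A sketch of that effect on made-up labels where 10% of the samples are "5" (the data is an assumption; also note that this version returns a 1-D array from predict and returns self from fit, which differs slightly from the class quoted above):

```python
import numpy as np
from sklearn.base import BaseEstimator

class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        # Nothing is learned; y is accepted only so the usual fit(X, y) call works.
        return self

    def predict(self, X):
        # Always answer False ("not a 5"), one answer per sample.
        return np.zeros(len(X), dtype=bool)

# Hypothetical data: 1000 samples, 100 of which are "5".
X = np.zeros((1000, 4))
y_is_5 = np.zeros(1000, dtype=bool)
y_is_5[:100] = True

y_pred = Never5Classifier().fit(X, y_is_5).predict(X)

accuracy = np.mean(y_pred == y_is_5)
true_negatives = np.sum(~y_pred & ~y_is_5)
false_negatives = np.sum(~y_pred & y_is_5)
true_positives = np.sum(y_pred & y_is_5)
false_positives = np.sum(y_pred & ~y_is_5)

print(accuracy, true_negatives, false_negatives, true_positives, false_positives)
```

Since the classifier never predicts True, the true positive and false positive counts are always zero, and every real 5 becomes a false negative. This matches the pattern reported above (54579 true negatives and 5421 false negatives).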

You will find the explanation in the below link:

https://github.com/ageron/h...

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Hi Cloudxlab:

1. It seems a few functions in the code provided in the PDF explained in video is deprecated. I had to put in a lot of unnecessary time to understand "fetch_mldata()" is deprecated. Please attach the latest updated git hub code link below the video.

2. Also, until 1 hour into this video the trainer is explaining from some other PDF which is not added here. Exactly at "1:12:00" is where he starts with the Classification PDF, the one which you have attached here in the course. Please give us the transcripts (more importantly the PDFs) of the explanation till "1:12:00".

  Upvote    Share

Hi,

1. We constantly try to update our notebooks as and when required. We have addressed this change too by updating our notebook in our GitHub repository. You can obtain it by forking our repository from the link below:

https://github.com/cloudxla...

If in future you face any such issues, would request you to either check our repository whether we have changed any codes, or let us know through your comments.

2. The first part of this lecture is a continuation of the End-to-End project, and you would find these slides under that topic.

Please let us know if you have further queries.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Hi,
I can download the latest notebook from the link provided. But what I'm more worried about is that your videos, which explain the code, are not updated as well. The trainer is still explaining the old code with "fetch_mldata()", while the code has been updated in the notebook.

How do you then expect us to follow the updated code then ? All by our self ?

  Upvote    Share

Hi,

We will update our videos soon. However, please follow the code given in our GitHub repository.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Hi Team,

In the video, the tutor asks us to use the code below to load the MNIST dataset.

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata("MNIST original")
X, y = mnist["data"], mnist["target"]

But the above code is not working. This code is commented out in the latest code repository, and if I try to run it, it gives me an error on 'fetch_mldata'.

I can see some different lines of code in new file, which is below mentioned

from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version=1, cache=True)

I can see that the mnist dataset is a dictionary in both cases and has all the data. But I think it behaves differently, because in the video y[36000] gives 5, which matches what matplotlib shows, whereas with the latest code using 'fetch_openml', y[36000] is 9.

Plus there is a function in new code. 'sort_by_target'.
Please let me know reasons for all these things.

Thanks!

  Upvote    Share

Hi,

The code fetch_mldata() has been deprecated. We have updated our notebooks, you can download the latest notebook from our GitHub repository.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Hi, Team

in Classification
from sklearn.datasets import fetch_mldata
# fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
# in your home directory.
# you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
mnist = fetch_mldata("MNIST original")
mnist

Error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-caec9ae19f90> in <module>
----> 1 from sklearn.datasets import fetch_mldata
2 # fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
3 # in your home directory.
4 # you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
5 mnist = fetch_mldata("MNIST original")

ImportError: cannot import name 'fetch_mldata'

just tell me How should i practice..

  Upvote    Share

Hi,

Please note that fetch_mldata has been deprecated. We have updated our notebooks accordingly. Would request you to pull the updated notebook from our GitHub repository to reflect the same.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Please tell me how to do the practice??

  Upvote    Share

Hi,

You can get the latest notebooks from our GitHub repository, study the code and understand how it works, and then apply the same understanding while working on the projects related to this topic.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

please help

  Upvote    Share

The side playground is not shown to me. Please resolve.

  Upvote    Share

Hi Mohini,

Thank you for contacting us.
Take a look at the top right side of your screen, are you able to locate "Show Playground"? Just click on it.
Please feel free to let me know if you have any queries and I'll be glad to help.

Hope this helps.

Thanks.

-- Anupam Singh Vishal

  Upvote    Share

you can see in the screenshot sir, there no button named Show playground.

  Upvote    Share

I guess this session we have to do in our own jupyter notebook that is installed, since it is not graded.

  Upvote    Share

Hi, Srihari.

Yes, you can do it by following the tutorial and by creating with another Jupyter file.

All the best!

-- Satyajit Das

  Upvote    Share

Hi Mohini,

The playground will not show at the side of this topic, and a few more of them, as they do not have any assessments.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Please let me know where can I get the same dataset in local?

  Upvote    Share

Hi,

You can use the following command in your local Jupyter installation, and you will be able to access the same dataset:

from sklearn.datasets import fetch_openml
import numpy as np

mnist = fetch_openml('mnist_784', version=1, cache=True)
mnist.target = mnist.target.astype(np.int8)

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

from sklearn.datasets import fetch_mldata

ImportError: cannot import name 'fetch_mldata'

  Upvote    Share

Hi,

Use this, it should work.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')

All the best!

-- Satyajit Das

  Upvote    Share

Hi,
I am trying to download but unable to do, please find the attached screen shot.

  Upvote    Share

Hi,

Try the following code instead:

from sklearn.datasets import fetch_openml
import numpy as np

mnist = fetch_openml('mnist_784', version=1, cache=True)

mnist.target = mnist.target.astype(np.int8)
sort_by_target(mnist)

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

What is sort_by_target function?

  Upvote    Share

Hi,

Please comment out that line and try again. Also, please note that we have updated our notebooks with this code. Would request you to use the latest notebooks from our GitHub repository.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Hi
After fitting the classifier, the code is supposed to give a True boolean, but it is giving False. Please help.
Thanks

  Upvote    Share

Hi,

Could you please share a screenshot of your code and the error that you are getting.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Hi
Can i get all stuff of recordings and slides for future reference.
Thanks

  Upvote    Share

Hi Prachi,

You will have a lifetime free access to all the videos and the slides. You can even download the slides using the arrow button that shows up on the top right corner when you hover over them.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

OK Thanks

  Upvote    Share

Hi Team,

I just wanted to highlight below two observations, due to change in the data source from MLDATA to OPENML:

1. The 36,000 th image in the video was '5' while in the OPENML dataset, it points to the value '9'. Perhaps, the order in this dataset seems shuffled.

2. When binary classification is attempted, the SGDClassifier gives out an error as "ValueError: The number of classes has to be greater than one; got 1 class". I came to know that, the issue lies with the data type of the dataset's labels. The data type of values of the 'target' key is 'object', which would not work when we create a boolean array of 5 (True) and not 5 (False). However, this can be resolved by changing the data type of labels using below code -

y = y.astype('int16')
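The cause described in point 2 can be reproduced on a toy array. This is a sketch under the assumption that the labels arrive as strings in an object array, as described above; the four labels are made up:

```python
import numpy as np

# Hypothetical labels stored as strings, the way the 'target' key is described above.
y = np.array(['5', '0', '4', '9'], dtype=object)

# Comparing strings with the integer 5 is False everywhere,
# so the binary target contains a single class.
y_5_wrong = (y == 5)
print(np.unique(y_5_wrong))   # [False]

# After converting the labels to integers, the comparison works as intended.
y_int = y.astype('int16')
y_5_right = (y_int == 5)
print(np.unique(y_5_right))   # [False  True]
```

A target with only False values is exactly what triggers "The number of classes has to be greater than one; got 1 class". The conversion has to happen before y_train_5 (or y_train_9) is created, otherwise the all-False target is still the one being used.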

I hope it helps everyone.

  Upvote    Share

Hi,

Thanks for pointing this out.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Great!!! thanks

  Upvote    Share

Hello,Even after changing the data type from object to int  (y=y.astype('int16')),i am getting the same error:

"ValueError: The number of classes has to be greater than one; got 1 class"

Please help

  Upvote    Share

Hi,

This error means that there is some issue with your dataset and not its data type. Please follow our notebook for more details.

Thanks.

  Upvote    Share

Hi Team,
Why am i getting this error.

  Upvote    Share

Hi,

Try the following code instead:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Thanks Rajtilak. It worked.

  Upvote    Share

Classification import issue...

  Upvote    Share

Try the following code instead:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)

  Upvote    Share

Thanks Vinay it worked. After proceeding further I am stuck with this error.

  Upvote    Share

Hi!

The possible resolution to this query is available in this post http://disq.us/p/28x5bs9 .

I hope it helps you.

  Upvote    Share

Hi Deepak,

Would request you to recheck your code, and check if you have formed the X_train and y_train_5 properly. If you want you can take a hint, or look at the answer to match with the code you wrote. If you are still stuck, would request you to post a screenshot of your code from the beginning.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Hello,
While running cross_val_score on the sgd_clf, I'm getting the ConvergenceWarning in the result.
how could this be solved?

  Upvote    Share

Hi Rohit,

It is a warning, and not an error. If your results are fine then you need not be worried about it.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Hello,

I cannot find the jupyter notebook that is displayed on the right.
can someone help me with that?

Thanks in advance

  Upvote    Share

Hi Rohit,

This topic does not contain any assessment questions, so you would not find the playground on the right. However, if this is the issue you are facing with all topics, then would request you to restart your server using the following method:
https://discuss.cloudxlab.c...
Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

okay. Thank you

  Upvote    Share

Hi,

I have imported data as mentioned below:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)

Problem1:

The image of digit for 36000 is attached. It is 9. It is mentioned in the pdf that 36000th image is '5', which is not the case.

The program gives warning after following code.

from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train, y_train)

WARNING:
-------
/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py:557: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
ConvergenceWarning)

Problem 2:

The next line of code:
some_digit = X[36000] # Taking the 36,000th image
sgd_clf.predict([some_digit])

It produces output as: array(['9'], dtype='<U1')
The output mentioned in the course material (pdf) is: array([True], dtype=bool)

Query 1:

As suggested in above warning I changed the max_iter to '15' and reran the code.

from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=15)
sgd_clf.fit(X_train, y_train)

some_digit = X[36000] # Taking the 36,000th image
sgd_clf.predict([some_digit])

The output I received is: array(['4'], dtype='<U1')

Can you please explain how max_iter is impacting the prediction from '9' to '4'?

Problem3:

y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train, y_train_5)

getting below error while using y_train_5
----> 3 sgd_clf.fit(X_train, y_train_5)
ValueError: The number of classes has to be greater than one; got 1 class

None of the code is working for 'y_train_5'

from sklearn.model_selection import cross_val_score
cross_val_score(sgd_clf, X_train, y_train_5, cv=3,scoring="accuracy")

output: array([nan, nan, nan])

never_5_clf = Never5Classifier()
never_5_pred = never_5_clf.predict(X_train)
cross_val_score(never_5_clf, X_train, y_train_5,cv=3, scoring="accuracy")
Output: array([1., 1., 1.])
Output as per pdf: Never5Classifier - a dumb classifier gave an accuracy of 90%

  Upvote    Share

Hi Punit,

For the first query, it is a warning, not an error, so you need not change max_iter. max_iter is the parameter that controls the maximum number of iterations (passes over the training data) allowed while training this model, that is, the maximum number of iterations the solver may take to converge.

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

Hi!

You can refer this post http://disq.us/p/28x5bs9 for answer to your queries.

I hope it helps you.

  Upvote    Share

I am not able to find the dataset for fetch_mldata, although I have already pulled the git repository.
can you share the path?

  Upvote    Share

Hi Alpesh,

Please use the following code instead:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)

Thanks.

-- Rajtilak Bhattacharjee

  Upvote    Share

I am getting error on first line itself... I tried multiple times...

ImportError Traceback (most recent call last)
<ipython-input-3-caec9ae19f90> in <module>
----> 1 from sklearn.datasets import fetch_mldata
2 # fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
3 # in your home directory.
4 # you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
5 mnist = fetch_mldata("MNIST original")

ImportError: cannot import name 'fetch_mldata'

  Upvote    Share

Hi Vivek,
The same question I have had. However, I have searched on google about fetch_mldata and found that it is dead because it relied on a website that died. So, We need to replace it with fetch_openml(), which relies on https://openml.org, which is alive and kicking. The data set name is "mnist_784" on this website.

Hi Cloudxlab Team,

Can we proceed with fetch_openml() instead of fetch_mldata(). Please let us know your response.
Please help.

Regards,

Jayant

  Upvote    Share

After importing the SGDClassifier and creating it's instance , when I run the fit model from this object , it throws an error - ValueError: The number of classes has to be greater than one; got 1 class
Please help

  Upvote    Share

Hi!

You can refer this post http://disq.us/p/28x5bs9 for resolution of your concern.

I hope it helps you.

  Upvote    Share

from sklearn.datasets import fetch_mldata

mnist = fetch_mldata("MNIST original")

This is not working.
It shows:

ImportError: cannot import name 'fetch_mldata'
Please help me out.

  Upvote    Share

Hi , when I run :
"from sklearn.datasets import fetch_mldata"

It gives below error:
ImportError Traceback (most recent call last)
<ipython-input-2-1955b0fbdeec> in <module>
1 import sklearn
----> 2 from sklearn.datasets import fetch_mldata

ImportError: cannot import name 'fetch_mldata'

Please help me with how to load the MNIST data.

  Upvote    Share

Hi, Harry.

Kindly refer to this discussion: https://discuss.cloudxlab.c...

All the best!

  Upvote    Share

In this video, you have given a google drive link as shared folder where all PPts are provided. Please share that link with me .

  Upvote    Share

Hi, Vivek.

You can refer to this GitHub directory for any materials.
https://github.com/cloudxla...

All the best!

 1  Upvote    Share

At this path only notebooks and data is available. I want the PPT / PDF which are in google drive. Please share the google drive link to download PPT / PDF. You shared that in video with live attendees. While I am listening it now, I am not able to get those.
Thanks,

  Upvote    Share

from sklearn.datasets import fetch_mldata
# fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
# in your home directory.
# you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
mnist = fetch_mldata("MNIST original")
mnist

ImportError: cannot import name 'fetch_mldata'

I am getting the above error. Please help me to resolve it.

  Upvote    Share

Please use the following code instead of the first line:

import numpy as np
from sklearn.datasets import fetch_openml
# from sklearn.datasets import fetch_mldata  (no longer available)

def sort_by_target(mnist):
    # Sort the training part (first 60,000 images) and the test part (the rest) by label.
    reorder_train = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[:60000])]))[:, 1]
    reorder_test = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[60000:])]))[:, 1]
    mnist.data[:60000] = mnist.data[reorder_train]
    mnist.target[:60000] = mnist.target[reorder_train]
    mnist.data[60000:] = mnist.data[reorder_test + 60000]
    mnist.target[60000:] = mnist.target[reorder_test + 60000]

# fetch_mldata used to download data into scikit_learn_data/mldata/mnist-original.mat
# in your home directory.
# you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
mnist = fetch_openml('mnist_784', version=1)
mnist.target = mnist.target.astype(np.int8)  # labels come as strings; convert to int
sort_by_target(mnist)
mnist

fetch_mldata fetched the data from a site that is currently down, so use fetch_openml instead. It returns the data with different attributes and ordering, so we have to sort the data and convert the string targets to int.

  Upvote    Share

"cannot import fetch_mldata" is the error I am getting on the first line itself. There is no scikit_learn folder in my home directory, please help! I created one as given and ran the -rvc command to pull it, but it is still not working.

  Upvote    Share

If a regression model is said to be performing well according to the MAE or MSE performance metrics, what will be the ranges of MAE or MSE when the data is not scaled? What will be the ranges of MAE and MSE if the data is scaled between 0 and 1, or between -1 and 1?

  Upvote    Share

Hi sir, can you please upload the google drive link of slides so that every student can download them. Thanks

  Upvote    Share

I get the following error when trying to download the MNIST data:
c:\users\mac\appdata\local\programs\python\python37-32\lib\site-packages\sklearn\utils\deprecation.py:77: DeprecationWarning: Function fetch_mldata is deprecated; fetch_mldata was deprecated in version 0.20 and will be removed in version 0.22
warnings.warn(msg, category=DeprecationWarning)
c:\users\mac\appdata\local\programs\python\python37-32\lib\site-packages\sklearn\utils\deprecation.py:77: DeprecationWarning: Function mldata_filename is deprecated; mldata_filename was deprecated in version 0.20 and will be removed in version 0.22
warnings.warn(msg, category=DeprecationWarning)

And nothing is downloaded. Please help.

  Upvote    Share

..

  Upvote    Share

Hi, Anant.

Can you please tell where you are facing the problem?

All the best.

  Upvote    Share

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata("MNIST original")
X, y = mnist["data"], mnist["target"]

This is not working. It gives an error on the 2nd line: it takes a lot of time to run and in the end shows the error "Connection reset by peer".

  Upvote    Share

This happens when too many people are downloading at the same time. Please try again after some time.

  Upvote    Share