Hello sir, suppose I have downloaded the data using the fetch_openml function. I see the data is in the form of a dictionary.
Now I wish to create a file in Jupyter and store the data there, so that each time I open Jupyter I load it from that file instead of running the sklearn command again.
How can I do that?
Also, when I read the downloaded data using a file handle, it is shown in the form of a string. I am not able to convert it back to the dictionary form it had when downloaded using fetch_openml.
Kindly help.
Secondly, what does it mean to have 'more than one class'?
If I understood you correctly, you want to know how to save the sklearn datasets locally. So, you can do that by converting the data to a pandas data frame and saving it locally in CSV format. There are other methods too to save files in different formats locally but working with CSV files is much handier.
Can you give me the context of 'more than one class'?
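To make the save-and-reload idea concrete, here is a minimal sketch. It assumes the downloaded object can be treated as a plain dictionary of NumPy arrays, and it uses pickle rather than the CSV route mentioned above, because pickle restores the dictionary structure directly. The small arrays and the file name are made up for illustration. Reading a file in text mode with f.read() always returns a string, which is why the binary modes "wb" and "rb" are used here.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for the downloaded dataset:
# a dictionary with 'data' and 'target' entries.
dataset = {
    "data": np.arange(12, dtype=np.float64).reshape(3, 4),
    "target": np.array(["5", "0", "9"], dtype=object),
}

# Save once. The file name is an assumption for this sketch.
path = os.path.join(tempfile.gettempdir(), "mnist_subset.pkl")
with open(path, "wb") as f:  # binary mode, not text mode
    pickle.dump(dataset, f)

# In a later session, load the file instead of downloading again.
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored object is a dictionary again, not a string.
X, y = restored["data"], restored["target"]
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.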
Now I have saved this data as a text file in my Jupyter notebook.
When I open it with a file handle and read its content using f.read(),
I get a string, as expected.
Had I saved it as a CSV, it would still be read as a string, because data read from a file with f.read() always comes back as a string.
The problem I am facing is that, in order to proceed further, I need it in the form of a dictionary, as it was when I downloaded it. I tried to convert it, but in vain.
For example:
So this is a string, whereas the earlier one was a dictionary.
I am not able to convert this string into a dictionary.
About the class thing:
I am not able to understand which 'class' the error is about.
from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version= 1, cache = True)
#please have a look at the data. Its a dictionary
X,y = mnist['data'], mnist['target']
#looking at the datasamples
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
X = X.to_numpy()
some_digits = X[36000]
some_digits_image = some_digits.reshape(28,28)
plt.imshow(some_digits_image, cmap= matplotlib.cm.binary, interpolation='nearest')
plt.axis('off')
plt.show()
mnist.keys()
X_train , X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
np.random.seed(42)
shuffle_index = np.random.permutation(60000) #creates a random permutation of the indices 0 to 59999
X_train = X_train[shuffle_index]
y_train = y_train[shuffle_index]
#shuffling of the data has been done
#binary classification using SGDClassifier
y_train_9 = y_train == 9
y_train_9
#picking the classifier to see whether it yields the right output
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
sgd_clf.fit(X_train, y_train_9)
Here's my work
please point out the mistake so that i can proceed
Hi
I have a question, and I thank the professor in advance for answering it.
Can I do image processing on images taken with a Samsung A52s, so that I can recognize details of a few millimeters in the image and label them?
When training, we are trying to get an optimal, generalized model which is not affected by the order of the training samples, so we shuffle the training data. While testing, we are not making any changes to the model; we are just using the already trained model to measure its performance (in terms of accuracy or other metrics) on unseen data. So we don't need to shuffle the test data.
We apply the theory for solving the hands-on. So without learning the theory, we cannot provide the hands-on. For example, without knowing what One-Hot Encoding is, one will not be able to apply it at the correct place.
Yes, it should have both True and False. Why don't you write that code in a separate cell and try again? If you get the same results, check the code you used to create y_train_5.
Referring to the line "X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]"
We used sklearn.model_selection.train_test_split for splitting the data in the previous projects. Whereas, in this topic, we are using the above statement. Why are we following different approaches? Pl clarify
The train_test_split is like an API which we could use to split the data just by mentioning % of train and test sets. It's just like using an existing function. Whereas this method is manual where we manually mention the indices till which we wish to have train and test. It is better to know both of the approaches.
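The two approaches can be compared side by side. This is a small sketch on made-up arrays, not the course notebook's code; the array sizes, test_size and random_state are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Manual split: the first 8 rows are train, the rest are test.
# The original order of the rows is preserved.
X_train_m, X_test_m = X[:8], X[8:]
y_train_m, y_test_m = y[:8], y[8:]

# train_test_split: shuffles the rows by default and splits by fraction.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Both give 8 training rows and 2 test rows here. The difference is that the manual slice relies on the dataset already being ordered the way you want, while train_test_split picks the rows at random.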
The line "mnist = fetch_openml('mnist_784')" is successfully getting executed. It created a folder scikit_learn_data in my home directory. I could print the values of mnist. But, I do not see any .mat file created in my home directory. Where is the dataset downloaded? Pl clarify.
We have many evaluation metrics like precision, recall, accuracy, etc. But we can’t generalize something to be the best, like a one-size-fits-all solution. It often changes based on the scenario for which we want to build the model.
For example,
Consider the scenario where we want to build a model to classify a credit card transaction as fraudulent or not. Here it is more important to make sure no fraudulent transaction is mistakenly classified as non-fraudulent, because this is a monetary issue where security should be the utmost priority. Thus we can't afford False Negatives, so we focus on improving recall by reducing False Negatives. (recall = (true positives) / (true positives + false negatives)).
In some other situations, like spam email detection, it’s sometimes ok to classify a spam-email(positive) as a non-spam-email(negative), but it’s not ok to mark a non-spam-email as a spam email, as the user might miss some valuable information carried by a good email. So here we can’t afford False Positives, and hence precision matters here. So here, we care for high precision(precision = (true positives)/(true positives+false positives)), whereas in our fraudulent detection case we care for high recall. So based on our necessity, we generally choose the features which positively affect the higher performance in terms of the chosen metric.
Further, sensitivity is nothing but recall. Sensitivity = (true positives)/(true positives + false negatives). In our credit card example, this answers the question: of all fraudulent transactions, how many are correctly classified as fraudulent?
Specificity is the counterpart of sensitivity for the negative class. Specificity = (true negatives)/(true negatives + false positives). This answers the question: of all non-fraudulent transactions, how many are correctly classified as non-fraudulent?
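The formulas above can be checked with a few lines of code. The confusion-matrix counts below are invented for illustration, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, recall, specificity) from confusion-matrix counts."""
    precision = tp / (tp + fp)    # of everything flagged positive, how much was right
    recall = tp / (tp + fn)       # same as sensitivity
    specificity = tn / (tn + fp)  # of all actual negatives, how many were kept negative
    return precision, recall, specificity


# Made-up counts: 100 fraudulent and 910 genuine transactions.
precision, recall, specificity = classification_metrics(tp=80, fp=10, tn=900, fn=20)
```

With these counts, 20 of the 100 fraudulent transactions are missed, so recall is the number to improve in the fraud scenario, even though precision and specificity already look high.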
Hi I am unable to view the Jupyter notebook on the right side of the half screen. Also, I do not see an option to "Hide Playground" or "Show Playground" on the screen. Please help how to enable it so that I can run the code side by side while I am going through the learning material.
Hello. I am trying to use SGDClassifier. But getting error. Please help. Thanks
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
sgd_clf.fit(X_train, y_train_5)
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-35-c2594dabc585> in <module>
2
3 sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
----> 4 sgd_clf.fit(X_train, y_train_5)
/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in fit(self, X, y, coef_init, intercept_init, sample_weight)
709 loss=self.loss, learning_rate=self.learning_rate,
710 coef_init=coef_init, intercept_init=intercept_init,
--> 711 sample_weight=sample_weight)
712
713
/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _fit(self, X, y, alpha, C, loss, learning_rate, coef_init, intercept_init, sample_weight)
548
549 self._partial_fit(X, y, alpha, C, loss, learning_rate, self.max_iter,
--> 550 classes, sample_weight, coef_init, intercept_init)
551
552 if (self.tol is not None and self.tol > -np.inf
/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _partial_fit(self, X, y, alpha, C, loss, learning_rate, max_iter, classes, sample_weight, coef_init, intercept_init)
512 raise ValueError(
513 "The number of classes has to be greater than one;"
--> 514 " got %d class" % n_classes)
515
516 return self
ValueError: The number of classes has to be greater than one; got 1 class
fetch_mldata() has been deprecated. Please use fetch_openml() instead, you can find the updated code in the slides and notebook from our GitHub repository.
The classifier expects at least 2 unique class labels for training, whereas the labels you are passing contain only one class. Check the unique values in y_train_9: it must contain both True and False.
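One way to see how a label array can end up with a single class is sketched below. It assumes the labels are stored as strings in an object array, which is not confirmed for every setup; the five labels are made up.

```python
import numpy as np

# Assumption: the labels are strings, not integers.
y_train = np.array(["5", "0", "9", "9", "1"], dtype=object)

# Comparing strings with the integer 9 never matches,
# so every entry is False and there is only one class.
y_train_9 = (y_train == 9)
classes_wrong = np.unique(y_train_9)

# Comparing with the string "9" gives both False and True.
y_train_9_fixed = (y_train == "9")
classes_ok = np.unique(y_train_9_fixed)
```

Calling np.unique on the label array before fitting is a quick check: if it shows only one value, the classifier will raise the "got 1 class" error.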
A decision function is a function which takes a dataset as input and gives a decision as output. What the decision can be depends on the problem at hand. Examples include:
Estimation problems: the "decision" is the estimate.
Hypothesis testing problems: the decision is to reject or not reject the null hypothesis.
Classification problems: the decision is to classify a new observation (or observations) into a category.
Model selection problems: the decision is to choose one of the candidate models.
When you want to classify the number in a given image, you should use the "predict" function on the trained model. Here, you are using model.fit, which is used to train a model on the input data. So, first train your model on the training features and training labels. After training, use model.predict to predict the class of a given image. Hope this helps.
In that case, as your error says, the number of classes the model is expecting is more than 1(here you should pass the data with 2 classes). So train the model by sending data with 2 classes as expected by the keras model. Then you could predict an image as part of testing. Hope this helps.
While dividing the data into train and test sets, we are manually picking the first 60,000 rows and labelling them as training data. Would that not introduce bias, so that the training data obtained is not random?
Having a test size of 0.9 means that 90% of the data will be set aside for testing purposes, which is not practical because you need more data to train and less data to test.
So there's a basic rule of Machine Learning/Deep Learning, that there is no one rule fits all. Depending on data size, the problem at hand, and other factors, you need to decide what you need to do. Here, the dataset is already split into train and test sets, so you don't need to do that, but if you want to experiment you sure can. Merge the train and test set and then use train_test_split on it. StratifiedShuffleSplit's number of strata has got nothing to do with the number of digits, actually it does not have a strata hyperparameter at all. Rather you need to specify the number of splits using the n_splits hyperparameter which does not need to be equal to the number of digits. Read more about it from the below link:
1. It seems like until I run the below code, the output of y[36000] is 9 and not 5
def sort_by_target(mnist):
    reorder_train = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[:60000])]))[:, 1]
    reorder_test = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[60000:])]))[:, 1]
    mnist.data[:60000] = mnist.data[reorder_train]
    mnist.target[:60000] = mnist.target[reorder_train]
    mnist.data[60000:] = mnist.data[reorder_test + 60000]
    mnist.target[60000:] = mnist.target[reorder_test + 60000]
This code is not explicitly covered in the video. Can you please explain what is going on here and why is the output without this code different, secondly why is this not covered in the video or has an explanation of the use in the notebook?
2. If I run the below without using the snippet in #1 above
I can see that my y_train still has 5 as the target variable but y_train_5 does not have any TRUE values (which is leading to the only 1 class error that most people got). My guess is that this is happening due to the data type for y_train and 5 not being the same. It gets fixed if I use the below: -
While I am glad that these things were not mentioned and I got to learn these things on my own, I think it will be better to have these things covered in the notebook/video and why they are being used
1. It is a sorting function which sorts the dataset based on the target value.
2. With these 2 lines of code, we are simply marking any other target value other than '5' as False. We are doing this because we only want to classify the digit '5' now.
3. We would urge you to explore the code and try to find out how it works. If we explain everything, like I did just now, it would defeat the purpose of the course, where we are trying to help you learn. If you come across a piece of code, be it here or elsewhere, you can try to find out how it works by searching on Google or Stack Overflow.
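The sorting idea from point 1 can be tried on a tiny example. This sketch uses np.argsort with a stable sort instead of the sorted() call on (target, index) pairs; both keep rows with equal targets in their original order. The four rows are made up.

```python
import numpy as np

# Toy "images" (one feature each) and their labels.
data = np.array([[30], [10], [20], [11]])
target = np.array([3, 1, 2, 1])

# Indices that would sort the labels; ties keep their original order.
order = np.argsort(target, kind="stable")

# Apply the same order to data and labels so they stay aligned.
data_sorted = data[order]
target_sorted = target[order]
```

Because sorting rearranges the rows, the row found at a fixed position such as index 36000 can differ between the sorted and the unsorted dataset.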
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-c2594dabc585> in <module>
2
3 sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
----> 4 sgd_clf.fit(X_train, y_train_5)
/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in fit(self, X, y, coef_init, intercept_init, sample_weight)
709 loss=self.loss, learning_rate=self.learning_rate,
710 coef_init=coef_init, intercept_init=intercept_init,
--> 711 sample_weight=sample_weight)
712
713
/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _fit(self, X, y, alpha, C, loss, learning_rate, coef_init, intercept_init, sample_weight)
548
549 self._partial_fit(X, y, alpha, C, loss, learning_rate, self.max_iter,
--> 550 classes, sample_weight, coef_init, intercept_init)
551
552 if (self.tol is not None and self.tol > -np.inf
/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py in _partial_fit(self, X, y, alpha, C, loss, learning_rate, max_iter, classes, sample_weight, coef_init, intercept_init)
512 raise ValueError(
513 "The number of classes has to be greater than one;"
--> 514 " got %d class" % n_classes)
515
516 return self
ValueError: The number of classes has to be greater than one; got 1 class
The accuracy of the Never5Classifier is about 90% because the classifier always predicts "not 5" irrespective of the input. Since roughly 90% of the dataset consists of digits which are not 5, its output matches the labels most of the time without it doing any actual prediction.
np.random.permutation() randomly permutes a sequence, or returns a permuted range. We use it here to shuffle the dataset, somewhat like shuffling a deck of cards.
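A small sketch of the shuffling step, on made-up arrays of five samples. The key point is that the same index array is applied to both the features and the labels, so each row keeps its own label.

```python
import numpy as np

np.random.seed(42)  # fixed seed so the shuffle is reproducible

# Toy data: row i is [2*i, 2*i + 1] and has label i.
X_train = np.arange(10).reshape(5, 2)
y_train = np.array([0, 1, 2, 3, 4])

# A random ordering of the indices 0..4.
shuffle_index = np.random.permutation(5)

# Index both arrays with the same permutation.
X_shuffled = X_train[shuffle_index]
y_shuffled = y_train[shuffle_index]
```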
Sir, I just wanted to say one thing: I am working in a company where I do not get enough time. I took this course with lots of expectations as it is associated with IIT. The content is nice, but the only problem is that its duration is very long. It gets really boring after some time and I drop it in between. Please reduce the duration in some way. Anything more than 2-2.5 hours takes my entire day to cover. I am a bit disappointed. The code is also very long and hard to understand.
Thank you for your feedback. If you check out the lecture videos, you will find that they contain numerous subtopics. I would suggest that instead of covering an entire video, you cover a few subtopics each day. Also, take notes while you are learning from the videos; you can also make flashcards. These will help you remember the concepts for a long time.
In those slides, where there is no evaluation to present, you will not see the side playground, although you can still open the lab in a separate tab from My Lab page.
The variable name on the first line should be sgd_clf, it has a typo. Let me know if this solves the issue. If not, then would request you to review your code against our notebooks from our GitHub repository.
I checked, his voice is not clear at that instant. He said "of 5" and not "not 5", which makes it a False Negative. You are right about the definition of True Negative though.
In the video in the first 30 mins, the data was first split and then shuffled. What is the purpose of shuffling after splitting?
As per my knowledge we shuffle before splitting to get the mixed data which covers all type of samples in both train and test sets.
I also think shuffling after splitting reduces the opportunity to test randomly, because we lose track of which record has which label and cannot check whether our predicted values equal the expected labels once everything is shuffled.
If you notice the comment just above that cell, we are shuffling here because we will be using cross validation next. We want to reduce bias even with the training and validation sets.
from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=20)
sgd_clf.fit(X_train,y_train_9)
Error:
ValueError: The number of classes has to be greater than one; got 1 class
I have changed the way the dataset was split into training and test sets by using train_test_split:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15).
The y_train_9 is a numpy array containing 59500 values which are all False when I am doing it this way. Is it correct? Also, why are we using the y_train_9 instead of y_train?
I tried binary classification for predicting the digit 1. I got a precision of 97%, a recall of 91% and an f1_score of 94%, whereas in the lecture the precision, recall and f1_score for the digit 5 were different, in the range of 70%. Do the precision, recall and f1_score change depending on the input digit? And if they change, how should I know whether my model is giving good predictions?
A decision function is a function which takes a dataset as input and gives a decision as output. What the decision can be depends on the problem at hand. Examples include: Estimation problems: the "decision" is the estimate.
y_train_5 is the target variable which points only to the "5" digit since this is a binary classification, even though the dataset contains all digits.
interpolation='nearest' simply displays an image without trying to interpolate between pixels if the display resolution is not the same as the image resolution (which is most often the case). It results in an image in which each pixel is displayed as a square of multiple screen pixels.
Hi Team,
/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py:557: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
ConvergenceWarning)
Why am I getting the above warning while running the code below, and what does it mean?
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
sgd_clf.fit(X_train, y_train_5)
1. Why did we use y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3) instead of y_train_pred = sgd_clf.predict(X_train)?
2. For prediction we should pass only the training data and it should return the target variable. Why are we passing both?
3. When calculating scores, cross_val_score returned a score for each fold, but here we got only one output. Can you explain how cross_val_predict works in this case?
1. Generate cross-validated estimates for each input data point. The data is split according to the cv parameter. Each sample belongs to exactly one test set, and its prediction is computed with an estimator fitted on the corresponding training set.
2. That is the syntax of the cross_val_predict() function.
3. The function cross_val_predict has a similar interface to cross_val_score, but returns, for each element in the input, the prediction that was obtained for that element when it was in the test set. Only cross-validation strategies that assign all elements to a test set exactly once can be used (otherwise, an exception is raised).
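The difference between the two functions can be seen from the shapes of their outputs. This is a sketch on a synthetic dataset generated with make_classification, not on MNIST; the sample count and random_state are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

clf = SGDClassifier(random_state=42)

# One accuracy score per fold: an array of 3 numbers for cv=3.
scores = cross_val_score(clf, X, y, cv=3, scoring="accuracy")

# One prediction per sample: each sample is predicted by the model
# trained on the folds that did not contain it.
y_pred = cross_val_predict(clf, X, y, cv=3)
```

Both functions need the labels because they train a fresh model on each set of training folds. Calling clf.predict(X) directly would instead predict on data the fitted model has already seen.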
The MNIST dataset referred here is a part of the Scikit-Learn dataset. However, you are right, Keras too contains the MNIST dataset. You can find the details of all the Keras datasets in the below link:
In your video-recording tied at 01:21:38 approx wrt MNIST dataset----
a) The given large image further broken down into smaller images comprising of dimensions --- 28 x 28 pixels i.e. 784 features (pixels). In other words each pixel is converted into a column. Does this mean that each small image comprises of 28 columns & the given large image (in the slide) comprises of totally 784 columns.??? Is this what is meant by the Trainer's statement? Just want to understand the concept that has been grasped by me. Kindly correct me, in case if I have wrongly understood.
b) Now these 784 features (pixels) can be transformed into an array comprising of 784 blocks. .
c) 70,000 are the rows. How did 70,000 come into the picture or rather how was this figure derived or arrived at?
The link only explained fit, which I already understand. The link says we are only providing the 5s, but I don't think that is right: we are not selecting 5 in particular, we are passing the whole dataset X, which contains different digits. I also asked why we have set y=0 by default, and why, inside predict, we are returning zeros of shape (len(x),1)?
The Never5Classifier is just a toy classifier which always predicts False (meaning "not a 5"), without even looking at the image. The goal is to demonstrate that even such a bad classifier (which doesn't learn anything at all and doesn't even look at the images) can get pretty good accuracy if most images are not 5s.
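The effect can be reproduced without any classifier at all. This sketch assumes a made-up label array in which exactly 10% of the entries are 5, which is only roughly the proportion in MNIST.

```python
import numpy as np

# 1000 made-up labels: each digit 0..9 appears exactly 100 times.
labels = np.tile(np.arange(10), 100)
is_5 = (labels == 5)

# "Never 5" predictions: always False, without looking at any image.
predictions = np.zeros(len(labels), dtype=bool)

# Fraction of samples where the prediction matches the true label.
accuracy = (predictions == is_5).mean()
```

The accuracy equals the share of non-5 labels, which is why accuracy alone is a poor metric on skewed datasets.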
1. It seems a few functions in the code provided in the PDF explained in video is deprecated. I had to put in a lot of unnecessary time to understand "fetch_mldata()" is deprecated. Please attach the latest updated git hub code link below the video.
2. Also, until 1 hour into this video the trainer is explaining from some other PDF which is not added here. Exactly at "1:12:00" is where he starts with the Classification PDF, the one which you have attached here in the course. Please give us the transcripts (more importantly the PDFs) of the explanation till "1:12:00".
1. We constantly try to update our notebooks as and when required. We have addressed this change too by updating our notebook in our GitHub repository. You can obtain it by forking our repository from the link below:
If in future you face any such issues, would request you to either check our repository whether we have changed any codes, or let us know through your comments.
2. The first part of this lecture is a continuation of the End-to-End project, and you would find these slides under that topic.
Hi, I can download the latest notebook from the link provided. But what I'm more worried about is your videos which explain the code is not updated as well. The trainer is still explaining the old code of "fetch_ml()". And then the code is updated in the notebook.
How do you then expect us to follow the updated code then ? All by our self ?
In the video, the tutor asks us to use the below code to load the MNIST dataset.
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata("MNIST original")
X, y = mnist["data"], mnist["target"]
But the above code is not working. This code is commented out in the latest code repository, and even if I try to run it, it gives me an error on 'fetch_mldata'.
I can see some different lines of code in the new file, which are mentioned below:
from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version=1, cache=True)
I can see that the mnist dataset is a dictionary in both cases and has all the data. But I think it behaves differently, because in the video y[36000] gives 5, which matches what matplotlib shows, whereas with the latest code using 'fetch_openml', y[36000] is 9.
Plus there is a new function in the new code, 'sort_by_target'. Please let me know the reasons for all these things.
in Classification
from sklearn.datasets import fetch_mldata
# fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
# in your home directory.
# you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
mnist = fetch_mldata("MNIST original")
mnist
Error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-caec9ae19f90> in <module>
----> 1 from sklearn.datasets import fetch_mldata
2 # fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
3 # in your home directory.
4 # you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
5 mnist = fetch_mldata("MNIST original")
Please note that fetch_mldata has been deprecated. We have updated our notebooks accordingly. Would request you to pull the updated notebook from our GitHub repository to reflect the same.
You can get the latest notebooks from our GitHub repository, study the code and understand how it works, and then apply the same understanding while working on the projects related to this topic.
Thank you for contacting us. Take a look at the top right side of your screen, are you able to locate "Show Playground"? Just click on it. Please feel free to let me know if you have any queries and I'll be glad to help.
Please comment out that line and try again. Also, please note that we have updated our notebooks with this code. Would request you to use the latest notebooks from our GitHub repository.
You will have a lifetime free access to all the videos and the slides. You can even download the slides using the arrow button that shows up on the top right corner when you hover over them.
I just wanted to highlight below two observations, due to change in the data source from MLDATA to OPENML:
1. The 36,000 th image in the video was '5' while in the OPENML dataset, it points to the value '9'. Perhaps, the order in this dataset seems shuffled.
2. When binary classification is attempted, the SGDClassifier gives out an error as "ValueError: The number of classes has to be greater than one; got 1 class". I came to know that, the issue lies with the data type of the dataset's labels. The data type of values of the 'target' key is 'object', which would not work when we create a boolean array of 5 (True) and not 5 (False). However, this can be resolved by changing the data type of labels using below code -
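A possible form of that conversion is sketched below. It assumes the labels are strings held in an object array; the exact code from the original comment is not shown, and the four labels here are made up.

```python
import numpy as np

# Assumption: labels arrive as strings with dtype 'object'.
y = np.array(["5", "0", "9", "5"], dtype=object)

# Convert the string labels to small unsigned integers.
y = y.astype(np.uint8)

# Now the comparison with the integer 5 yields both True and False.
y_5 = (y == 5)
```

After the conversion, a boolean target such as y_train_5 contains two classes, so a binary classifier can be fitted on it.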
Would request you to recheck your code, and check if you have formed the X_train and y_train_5 properly. If you want you can take a hint, or look at the answer to match with the code you wrote. If you are still stuck, would request you to post a screenshot of your code from the beginning.
This topic does not contain any assessment questions, so you would not find the playground on the right. However, if this is the issue you are facing with all topics, then would request you to restart your server using the following method: https://discuss.cloudxlab.c... Thanks.
I have imported data as mentioned below:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)
Problem1:
The image of digit for 36000 is attached. It is 9. It is mentioned in the pdf that 36000th image is '5', which is not the case.
The program gives warning after following code.
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train, y_train)
WARNING:
-------
/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py:557: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
ConvergenceWarning)
Problem 2:
The next line of code:
some_digit = X[36000] # Taking the 36,000th image
sgd_clf.predict([some_digit])
It produces output as:
array(['9'], dtype='<U1')
The output mentioned in the course material (pdf) is:
array([True], dtype=bool)
Query 1:
As suggested in above warning I changed the max_iter to '15' and reran the code.
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=15)
sgd_clf.fit(X_train, y_train)
some_digit = X[36000] # Taking the 36,000th image
sgd_clf.predict([some_digit])
The output I received is:
array(['4'], dtype='<U1')
Can you please explain how max_iter is impacting the prediction from '9' to '4'?
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train, y_train_5)
I get the below error while using y_train_5:
----> 3 sgd_clf.fit(X_train, y_train_5)
ValueError: The number of classes has to be greater than one; got 1 class
None of the code is working for 'y_train_5'
from sklearn.model_selection import cross_val_score
cross_val_score(sgd_clf, X_train, y_train_5, cv=3,scoring="accuracy")
output: array([nan, nan, nan])
never_5_clf = Never5Classifier()
never_5_pred = never_5_clf.predict(X_train)
cross_val_score(never_5_clf, X_train, y_train_5,cv=3, scoring="accuracy")
Output: array([1., 1., 1.])
Output as per pdf: Never5Classifier - a dumb classifier gave an accuracy of 90%
For the first query, it is a warning, not an error, so you need not change max_iter. max_iter is the parameter which controls the maximum number of passes over the training data that the solver may take to converge. Because the model has not converged after so few iterations, its weights are still changing from one iteration to the next, so the prediction for a borderline image can differ when max_iter changes.
I am getting error on first line itself... I tried multiple times...
ImportError Traceback (most recent call last)
<ipython-input-3-caec9ae19f90> in <module>
----> 1 from sklearn.datasets import fetch_mldata
2 # fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
3 # in your home directory.
4 # you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
5 mnist = fetch_mldata("MNIST original")
Hi Vivek, The same question I have had. However, I have searched on google about fetch_mldata and found that it is dead because it relied on a website that died. So, We need to replace it with fetch_openml(), which relies on https://openml.org, which is alive and kicking. The data set name is "mnist_784" on this website.
Hi Cloudxlab Team,
Can we proceed with fetch_openml() instead of fetch_mldata()? Please let us know your response. Please help.
After importing the SGDClassifier and creating it's instance , when I run the fit model from this object , it throws an error - ValueError: The number of classes has to be greater than one; got 1 class Please help
At this path only notebooks and data is available. I want the PPT / PDF which are in google drive. Please share the google drive link to download PPT / PDF. You shared that in video with live attendees. While I am listening it now, I am not able to get those. Thanks,
from sklearn.datasets import fetch_mldata
# fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
# in your home directory.
# you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
mnist = fetch_mldata("MNIST original")
mnist
ImportError: cannot import name 'fetch_mldata'
I am getting the above error. Please help me to resolve it.
Please use the following code instead of the 1st line:
def sort_by_target(mnist):
    reorder_train=np.array(sorted([(target,i) for i, target in enumerate(mnist.target[:60000])]))[:,1]
    reorder_test=np.array(sorted([(target,i) for i, target in enumerate(mnist.target[60000:])]))[:,1]
    mnist.data[:60000]=mnist.data[reorder_train]
    mnist.target[:60000]=mnist.target[reorder_train]
    mnist.data[60000:]=mnist.data[reorder_test+60000]
    mnist.target[60000:]=mnist.target[reorder_test+60000]
import numpy as np
from sklearn.datasets import fetch_openml
#from sklearn.datasets import fetch_mldata
#from sklearn.datasets import fetch_openml
#mnist = fetch_openml('MNIST original')
# fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
# in your home directory.
# you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
mnist = fetch_openml('mnist_784',version=1)
mnist.target=mnist.target.astype(np.int8)
sort_by_target(mnist)
mnist
fetch_mldata fetched the data from a site that is currently down, so use fetch_openml instead. It returns the data with different attributes, so we have to sort the data and convert the string targets to integers.
"Cannot import fetch_mldata" is the error I am getting on the first line itself. There is no scikit_learn folder in my home directory, please help! I created one as described and ran the -rvc command to pull it, but it is still not working.
If a regression model is said to be performing well according to MAE or MSE, what range should MAE or MSE fall in when the data is not scaled? What range should they fall in if the data is scaled between 0 and 1, or between -1 and 1?
The following error occurs when trying to download the MNIST data:
c:\users\mac\appdata\local\programs\python\python37-32\lib\site-packages\sklearn\utils\deprecation.py:77: DeprecationWarning: Function fetch_mldata is deprecated; fetch_mldata was deprecated in version 0.20 and will be removed in version 0.22
  warnings.warn(msg, category=DeprecationWarning)
c:\users\mac\appdata\local\programs\python\python37-32\lib\site-packages\sklearn\utils\deprecation.py:77: DeprecationWarning: Function mldata_filename is deprecated; mldata_filename was deprecated in version 0.20 and will be removed in version 0.22
  warnings.warn(msg, category=DeprecationWarning)
Hello sir, suppose I have downloaded the data using the fetch_openml function. I see the data is in the form of a dictionary.
Now I wish to create a file in Jupyter and store the data there, so that each time I open Jupyter I load it from that file instead of running the sklearn command.
How can I do that?
Also, say I read the downloaded data using a file handle: when I read it, it is shown in the form of a string. I am not able to convert it back to the dictionary form it had when downloaded using fetch_openml.
Kindly help.
Secondly, what does it mean to have 'more than one class'?
Kindly help.
Thanks
If I understood you correctly, you want to know how to save the sklearn datasets locally. So, you can do that by converting the data to a pandas data frame and saving it locally in CSV format. There are other methods too to save files in different formats locally but working with CSV files is much handier.
Can you give me the context of 'more than one class'?
I have saved it in text format as well as in CSV format. However, when I load it, it is loaded as a string and not as a dictionary.
i will just show you the exact situation.
mnist = fetch_openml('mnist_784', version= 1, cache = True)
I used this command to download the data.
It looks like this, and it is a dictionary.
Now I have saved this data as a text file in my Jupyter notebook.
When I load it using a file handle and read its contents using f.read(),
I get an obvious string.
Had I saved it as a CSV, it would still be read as a string, because data read from a file this way is always read as a string.
Now the problem I am facing is that in order to proceed further I need it in the form of a dictionary, as I had downloaded it. I tried to convert it, but in vain.
For example:
So this is a string, whereas the earlier one was a dictionary.
I am not able to convert this string into a dictionary.
About the class thing:
I am not able to understand which 'class' the error is about.
You can read the CSV file by the pandas read_csv method. That directly loads it as a DataFrame.
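A minimal sketch of that round trip, assuming a tiny made-up array in place of the real mnist['data'] and mnist['target'] (the column names and the file name mnist_sample.csv are only illustrative):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up stand-in for mnist['data'] and mnist['target'] (3 samples, 4 pixels each).
data = np.array([[0, 10, 20, 30], [5, 15, 25, 35], [1, 2, 3, 4]])
target = np.array(["5", "0", "4"])  # fetch_openml returns the labels as strings

# Put features and labels in one DataFrame and write it to disk once.
df = pd.DataFrame(data, columns=[f"pixel{i + 1}" for i in range(data.shape[1])])
df["target"] = target
csv_path = os.path.join(tempfile.mkdtemp(), "mnist_sample.csv")  # hypothetical file name
df.to_csv(csv_path, index=False)

# Later, e.g. in a new Jupyter session, load it back without calling sklearn.
loaded = pd.read_csv(csv_path)
X = loaded.drop(columns="target").to_numpy()
y = loaded["target"].to_numpy()  # read_csv infers numeric labels, so these are integers
```

Reading with read_csv gives typed columns directly, so there is no need to parse the raw string returned by f.read().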
Please guide me with regard to the error in the SGDClassifier.
Thanks
The error means Target variable y_train_9 contains only one unique class. To fix that you'll have to load the dataset correctly.
Here's my work
please point out the mistake so that i can proceed
Thanks
Hi,
The mistake is at :
If you print y_train, you will notice that it contains integers stored as strings instead of real integers. It contains:
And here you can see the dtype is 'object' instead of 'int'.
So, you have to change this line to-
to make it work.
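A small sketch of the dtype issue described above, using a made-up label array rather than MNIST itself. The names y_train_5_wrong and y_train_5 are only illustrative:

```python
import numpy as np

# Labels as strings with dtype 'object' (hypothetical sample values).
y_train = np.array(["5", "0", "4", "5", "9"], dtype=object)

# Comparing strings with the integer 5 is False everywhere,
# so the resulting target has a single class (all False).
y_train_5_wrong = (y_train == 5)

# Converting the labels to integers first gives both True and False values.
y_train_int = y_train.astype(np.int8)
y_train_5 = (y_train_int == 5)

print(np.unique(y_train_5_wrong))  # one class only
print(np.unique(y_train_5))        # two classes
```

A target with only one unique value is what triggers "The number of classes has to be greater than one; got 1 class" when fitting.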
Hi
I had a question; thank you to the professor for answering me.
Can I do image processing on images taken with a Samsung A52s, so that I can recognize details of a few millimeters in the image and do labeling?
Hi,
Yes, real-time models are trained to handle images of all quality. So yes, you can do the image processing in that case too.
Thanks
Instead of fetch_mldata() please use fetch_openml()
Hope this helps.
Hi,
It is a warning which you may ignore.
Thanks.
Why have we shuffled only the training dataset? Why not shuffle the test dataset as well?
Hi,
When training, we are trying to get an optimal, generalized model which is not affected by the order of the training samples, so we shuffle the training data. While testing, we are not making any changes to the model; we are just using the already trained model to measure its performance (in terms of accuracy or other metrics) on unseen data. So we don't need to shuffle the test data.
Thanks.
Okay. Thank you for the clarification
I would advise you to please provide hands-on at the beginning of a chapter, it would be easy to relate theory with the hands-on part.
Hi,
We apply the theory for solving the hands-on. So without learning the theory, we cannot provide the hands-on. For example, without knowing what One-Hot Encoding is, one will not be able to apply it at the correct place.
Thanks.
Hi,
Yes, it should have both True and False. Why don't you write that code in a separate cell and try again. If you get the same results, check your code using which you created y_train_5.
Thanks.
Referring to the line "X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]"
We used sklearn.model_selection.train_test_split for splitting the data in the previous projects, whereas in this topic we are using the above statement. Why are we following different approaches? Please clarify.
Hi,
The train_test_split is like an API which we could use to split the data just by mentioning % of train and test sets. It's just like using an existing function. Whereas this method is manual where we manually mention the indices till which we wish to have train and test. It is better to know both of the approaches.
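To make the two approaches concrete, here is a small sketch on a made-up 10-row array (not MNIST). The variable names with the _manual suffix are only illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Manual split: pick the boundary index yourself (rows keep their original order).
X_train_manual, X_test_manual = X[:8], X[8:]
y_train_manual, y_test_manual = y[:8], y[8:]

# train_test_split: give the test fraction; rows are shuffled before splitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train_manual.shape, X_test_manual.shape)
print(X_train.shape, X_test.shape)
```

Both give 8 training rows and 2 test rows here. The difference is that the manual slice relies on the dataset already being arranged with the training rows first, as MNIST is.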
Thanks.
The line "mnist = fetch_openml('mnist_784')" executes successfully. It created a folder scikit_learn_data in my home directory, and I could print the values of mnist. But I do not see any .mat file created in my home directory. Where is the dataset downloaded? Please clarify.
Hi,
The data is in .gz format, you can check inside the scikit-learn folder that was created.
Thanks.
What are coef_ and intercept_ in SGDClassifier?
Hi,
If you look at the equation of a straight line, it is as follows:
y = mx + c
Here, m is the coefficient, and c is the intercept.
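As a rough illustration of the same idea with more than one feature, the sketch below fits SGDClassifier on made-up two-feature data and checks that its score for each sample is the coefficients times the features plus the intercept. The data and hyperparameters are assumptions chosen only for this example:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Made-up, well separated two-class data with 2 features.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([False] * 20 + [True] * 20)

clf = SGDClassifier(random_state=42, max_iter=1000, tol=1e-3)
clf.fit(X, y)

print(clf.coef_.shape)       # one row of weights, one weight per feature
print(clf.intercept_.shape)  # a single intercept for the binary case

# The multi-feature version of y = mx + c: score = w . x + b
manual_scores = X @ clf.coef_.ravel() + clf.intercept_[0]
```

For MNIST there would be 784 coefficients, one per pixel, instead of the single m in the straight-line equation.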
Thanks.
Is it openml or mldata to fetch the data? There is a difference between what is shown in the video and what is given in the slides.
Hi,
Try the following:
As of version 0.20, sklearn deprecates fetch_mldata function.
Thanks.
Thank you sir
Code:
from sklearn.datasets import fetch_openml
mnist = fetch_openml("MNIST Original")
mnist
Error:
Hi,
Could you please attach the screenshot of your issue?
Thanks.
Of precision and recall, which one should be high and which one should be low? Please explain. What about sensitivity and specificity?
Hi,
We have many evaluation metrics like precision, recall, accuracy, etc. But we can’t generalize something to be the best, like a one-size-fits-all solution. It often changes based on the scenario for which we want to build the model.
For example,
Consider the scenario where we want to build a model to classify a credit card transaction as fraudulent or not. Here it is more important for us to make sure no fraudulent transaction is mistakenly classified as a non-fraudulent transaction, because this is a monetary issue where security should be the utmost priority. Thus we can't afford False Negatives. So we shall focus on improving recall by reducing False Negatives. (recall = (true positives) / (true positives + false negatives)).
In some other situations, like spam email detection, it’s sometimes ok to classify a spam-email(positive) as a non-spam-email(negative), but it’s not ok to mark a non-spam-email as a spam email, as the user might miss some valuable information carried by a good email. So here we can’t afford False Positives, and hence precision matters here. So here, we care for high precision(precision = (true positives)/(true positives+false positives)), whereas in our fraudulent detection case we care for high recall. So based on our necessity, we generally choose the features which positively affect the higher performance in terms of the chosen metric.
Thanks.
Further, sensitivity is nothing but recall. Sensitivity = (true positives)/(true positives + false negatives). In our credit card example, this answers the question: of the total fraudulent transactions, how many are correctly classified as fraudulent?
Specificity is the counterpart of sensitivity for the negative class. Specificity = (true negatives)/(true negatives + false positives). This answers the question: of the total non-fraudulent transactions, how many are correctly classified as non-fraudulent?
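The four formulas can be checked on a small worked example. The counts below are made up purely for illustration:

```python
# Hypothetical confusion-matrix counts for a fraud detector.
tp = 8   # fraudulent, predicted fraudulent
fp = 2   # non-fraudulent, predicted fraudulent
fn = 4   # fraudulent, predicted non-fraudulent
tn = 86  # non-fraudulent, predicted non-fraudulent

precision = tp / (tp + fp)    # of everything flagged, how much was really fraud
recall = tp / (tp + fn)       # of all real fraud, how much was caught
sensitivity = recall          # same quantity under a different name
specificity = tn / (tn + fp)  # of all non-fraud, how much was left alone

print(precision, recall, specificity)
```

With these counts precision is 0.8 while recall is only about 0.67, which shows that the two metrics can diverge on the same predictions.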
How to download these slides?
Hi,
If you hover your cursor over the slides, you will see an arrow icon on the top right of the slides. You can click on that to download these slides.
Thanks.
Hi,
Please click on the arrow mark of top-right corner in slides section:
Then in the new tab, you can either save the slides to your google drive, or click on print option and download it.
Thanks.
I did not understand.
How many samples are you putting in y_test and X_test?
Please tell me.
Hi,
As shown above, the test_set contains 10,000 samples and the train set contains 60,000 samples:
Thanks.
How is it 10,000 in the test set?
You are putting 60,000 in the test set.
Hi,
As shown above, the test_set contains 10,000 samples and the train set contains 60,000 samples:
Thanks.
The image shown differs from the one in the videos when I follow the instructions, so I am getting confused.
Hi Manjunath,
The dataset has changed a bit in recent version. So, the image at same index may be different.
Because of this I am not able to practice, I can only listen to the videos.
Split your screen and open Colab on the right side, then do the hands-on.
Getting the above mentioned error
Hi I am unable to view the Jupyter notebook on the right side of the half screen. Also, I do not see an option to "Hide Playground" or "Show Playground" on the screen. Please help how to enable it so that I can run the code side by side while I am going through the learning material.
Thanks
Hi Shashwat,
As there is no assessment to perform here, we have not provided the side playground. You can still use the lab services in a different tab.
Hello. I am trying to use SGDClassifier but am getting an error. Please help. Thanks
Error:
Hi,
Please check the y_train_5 train set, it needs to have more than 1 class. If it does not, please review your code where you created this dataset.
Thanks.
Hello
using
from sklearn.datasets import fetch_mldata always gives error:
Hi,
fetch_mldata() has been deprecated. Please use fetch_openml() instead, you can find the updated code in the slides and notebook from our GitHub repository.
Thanks.
Thanks
Hi
The lab is not visible here. Can you please guide me?
Hi Sahoo,
As there is no assessment to perform here, we have not provided the side playground you can still use lab services in a different tab.
In the above code, I am getting the following error
Hi,
The classifier expects at least 2 unique class labels for training, whereas you are providing data with only the "9" class label. So give it data with at least 2 class labels.
Thanks.
How can I give the data 2 class labels? I am stuck at this stage too and getting the same error.
Hi,
You could use the following:
This returns the rows with class labels '5' and '6'. You could modify the code as per your need. Hope this helps.
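One possible way to select the rows for two labels, sketched on a made-up array. The use of np.isin and the names X_two and y_two are assumptions for illustration, not necessarily the snippet referred to above:

```python
import numpy as np

# Made-up features and integer labels.
X = np.arange(12).reshape(6, 2)
y = np.array([5, 6, 9, 5, 0, 6])

# Boolean mask that is True only where the label is 5 or 6.
keep = np.isin(y, [5, 6])

# Keep only those rows, so the target now has exactly two classes.
X_two, y_two = X[keep], y[keep]

print(np.unique(y_two))
```

Note that this only works as intended when the labels are real integers. If they are strings, convert them first with astype.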
Thanks.
Hello,
Can you please explain what a decision function is?
Thanks!
Hi,
Good question!
Please find the detailed explanation of a decision function from the below link:
https://stats.stackexchange.com/questions/104988/what-is-the-difference-between-a-loss-function-and-decision-function
Thanks.
Hi,
Thanks for replying. I tried to go through the link but was not able to understand it. Can you please explain it in simple terms.
Thanks!
Hi,
A decision function is a function which takes a dataset as input and gives a decision as output. What the decision can be depends on the problem at hand. Examples include:
Thanks.
Hello,
I need the MNIST dataset in offline mode, so could you please provide me the path to access the dataset?
Hi Taksham,
When you use sklearn to download data, it is saved as a file inside a folder (scikit_learn_data in your home directory). You can copy it from there.
You can also download it from here: http://yann.lecun.com/exdb/mnist/
Thank you so much sir !
Hi,
I am using Stochastic Gradient Descent Classifier to check whether a digit is 5 or not. I am getting the below error :
I have checked training set also. It is showing count for digit 5 as 5421 but when I am checking using y_train==5, I am getting false values only.
Please let me know the mistake that I am doing.
Thanks!
Hi,
When you want to classify the number in a given image, you should use "predict" function on the trained model. Here, you are using model.fit, which is used to train a model based on the input data. So, first train you model on the train features and train labels. After training, use model.predict to predict the class of a given image. Hope this helps.
Thanks.
Hi,
I have not trained the model yet. I am trying out binary classifier above. So, I am fitting the model to check whether digit is 5 or not.
Thanks!
Hi,
In that case, as your error says, the model expects more than 1 class (here you should pass data with 2 classes). So train the model with data containing 2 classes, as the model expects. Then you can predict on an image as part of testing. Hope this helps.
Thanks.
Hi,
While dividing the data into train and test sets, we are manually picking the first 60,000 rows and labelling them as training data. Would that not introduce bias, so that the training data obtained is not random?
Can we use train_test_split with test_size=0.9?
Hi,
Having a test size of 0.9 means that 90% of the data will be set aside for testing purposes, which is not practical because you need more data to train and less data to test.
Thanks.
Hi,
Sorry, I wrote test_size=0.9 by mistake. I meant 90% of the data for training purposes. Can we split using train_test_split?
If we want to perform a stratified shuffle split, should the number of strata to be created be 10, i.e. equal to the number of digits?
Thanks.
Hi,
So there's a basic rule of Machine Learning/Deep Learning, that there is no one rule fits all. Depending on data size, the problem at hand, and other factors, you need to decide what you need to do. Here, the dataset is already split into train and test sets, so you don't need to do that, but if you want to experiment you sure can. Merge the train and test set and then use train_test_split on it. StratifiedShuffleSplit's number of strata has got nothing to do with the number of digits, actually it does not have a strata hyperparameter at all. Rather you need to specify the number of splits using the n_splits hyperparameter which does not need to be equal to the number of digits. Read more about it from the below link:
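A minimal sketch of StratifiedShuffleSplit on made-up labels (10 classes with 10 samples each, standing in for the digits). The sizes and random_state are arbitrary choices for this example:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Made-up labels: 10 classes, 10 samples per class.
y = np.repeat(np.arange(10), 10)
X = np.arange(len(y)).reshape(-1, 1)

# n_splits is how many train/test splits to generate, not the number of strata.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y))

# The strata come from y itself: every class keeps its share in the test set.
test_counts = np.bincount(y[test_idx])
print(test_counts)
```

Each of the 10 classes contributes the same number of samples to the test set, even though n_splits is 1.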
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html
Thanks.
Hi,
Thanks for the clarity.
I have a problem with the Playground. The "Show Playground" option is not visible in the Classification module. Please resolve it.
Hi Godishala,
The side playground is not available for the slides where there is nothing to evaluate.
Hi,
1. It seems like until I run the below code, the output of y[36000] is 9 and not 5
def sort_by_target(mnist):
    reorder_train = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[:60000])]))[:, 1]
    reorder_test = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[60000:])]))[:, 1]
    mnist.data[:60000] = mnist.data[reorder_train]
    mnist.target[:60000] = mnist.target[reorder_train]
    mnist.data[60000:] = mnist.data[reorder_test + 60000]
    mnist.target[60000:] = mnist.target[reorder_test + 60000]
This code is not explicitly covered in the video. Can you please explain what is going on here and why is the output without this code different, secondly why is this not covered in the video or has an explanation of the use in the notebook?
2. If I run the below without using the snippet in #1 above
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
I can see that my y_train still has 5 as the target variable but y_train_5 does not have any TRUE values (which is leading to the only 1 class error that most people got). My guess is that this is happening due to the data type for y_train and 5 not being the same. It gets fixed if I use the below: -
y_train_5 = (y_train.astype(np.int8) == int(5))
y_test_5 = (y_test.astype(np.int8) == int(5))
I also saw that this is being covered (again without an explicit note or purpose of why it is being done in the below snippet) in the github code -
mnist.target = mnist.target.astype(np.int8)
sort_by_target(mnist)
While I am glad that these things were not mentioned and I got to learn these things on my own, I think it will be better to have these things covered in the notebook/video and why they are being used
Thanks,
Rohit
Hi,
1. It is a sorting function which sorts the dataset based on the target value.
2. With these 2 lines of code, we are simply marking any other target value other than '5' as False. We are doing this because we only want to classify the digit '5' now.
3. We would urge you to explore the code and try to find out how it works. If we explain everything, like I did just now, it would defeat the purpose of the course, where we are trying to help you learn. If you come across a piece of code, be it here or elsewhere, you can try to find out how it works by searching on Google or Stack Overflow.
Thanks.
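While plotting the precision and recall curves vs the threshold, why have we done the indexing [:-1]?
Hi,
Good question. precision_recall_curve returns one more precision and recall value than it returns thresholds: the final pair (precision 1, recall 0) has no corresponding threshold. Slicing with [:-1] drops that last value so that the precision and recall arrays have the same length as the thresholds array and can be plotted against it.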
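The array lengths returned by precision_recall_curve can be checked on a tiny made-up example (the labels and scores below are arbitrary):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Made-up true labels and classifier scores.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

precisions, recalls, thresholds = precision_recall_curve(y_true, y_scores)

# precisions and recalls have one more element than thresholds.
print(len(precisions), len(recalls), len(thresholds))

# Dropping the last element makes the arrays line up for plotting against thresholds.
precisions_for_plot = precisions[:-1]
recalls_for_plot = recalls[:-1]
```

The extra final element is the point precision = 1, recall = 0, which has no threshold of its own, so [:-1] is needed before plotting against the thresholds.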
Thanks.
While running the following code:
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train, y_train_5)
I am getting following error:
Please help. In fact, I have pasted the code as well.
Hi,
Please run it from the beginning. If you are stuck, please refer to the code in our GitHub repository.
Thanks.
Sir
How is precision 10% in the case of the always-5 classifier? Can you please show it for the exact data given in the video?
Hi,
The accuracy of the Never5Classifier is about 90% because the classifier always predicts "not 5" irrespective of the input. Since most of the dataset consists of digits which are not 5, its output matches the labels most of the time without it doing any actual prediction. By the same reasoning, a classifier that always predicts 5 flags every image as positive, so its precision equals the fraction of images that really are 5, which is roughly 10%.
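Both numbers can be reproduced on a made-up label array in which 10% of the samples are 5s (the real MNIST proportion is close to this, but the array below is only an illustration):

```python
import numpy as np

# Made-up binary target: 10 images of 5 and 90 images of other digits.
y_is_5 = np.array([True] * 10 + [False] * 90)

# "Never 5" classifier: predicts False for everything.
never_5 = np.zeros(len(y_is_5), dtype=bool)
accuracy_never_5 = (never_5 == y_is_5).mean()

# "Always 5" classifier: predicts True for everything.
always_5 = np.ones(len(y_is_5), dtype=bool)
tp = np.sum(always_5 & y_is_5)    # real 5s predicted as 5
fp = np.sum(always_5 & ~y_is_5)   # non-5s predicted as 5
precision_always_5 = tp / (tp + fp)

print(accuracy_never_5, precision_always_5)
```

This is why accuracy alone is a poor measure on skewed datasets: a classifier that learns nothing still scores 90%.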
Thanks.
Sir, at 35:15 in the video, why did we use random.permutation()?
Hi,
Good question!
random.permutation() randomly permutes a sequence, or returns a permuted range. We are using it here to shuffle the dataset; it is somewhat similar to shuffling a deck of cards.
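A short sketch of shuffling features and labels together with a permuted index, on made-up arrays. The seed and the name shuffle_index are only illustrative:

```python
import numpy as np

# Made-up training data: row i holds [2*i, 2*i + 1] and has label i.
X_train = np.arange(10).reshape(5, 2)
y_train = np.arange(5)

# A permuted range of row indices, e.g. some reordering of 0..4.
rng = np.random.RandomState(42)
shuffle_index = rng.permutation(len(X_train))

# Indexing both arrays with the same permutation keeps each row with its label.
X_shuffled = X_train[shuffle_index]
y_shuffled = y_train[shuffle_index]

print(shuffle_index)
```

Because X and y are reordered with the same index array, every image still has its own label after shuffling.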
Thanks.
Hi,
Sir, I just wanted to say one thing. I am working in a company where I do not get enough time. I took this course with a lot of expectations as it is associated with IIT. The content is nice, but the only problem is that its duration is very long. It gets really boring after some time and I drop it midway. Please reduce the duration in some way. A video of more than 2-2.5 hours takes my entire day to cover. I am a bit disappointed. The code is also very long and hard to understand.
Hi,
Thank you for your feedback. If you check out the lecture videos, you will find that they contain numerous subtopics. I would suggest that instead of covering an entire video, you cover a few subtopics each day. Also, take notes as you learn from the videos; you can also make flashcards. These will help you remember the concepts for a long time.
Thanks.
Sir, I am getting this error. I have also gone through the GitHub notes but am unable to resolve it. Please help me.
Hi,
Please match your code with the code from our GitHub repository, it seems the training data was not prepared correctly.
Thanks.
Where can I find how to reopen the playground? It appears to have closed.
Hi Elite Coder,
In those slides, where there is no evaluation to present, you will not see the side playground, although you can still open the lab in a separate tab from My Lab page.
Thank you for your prompt response. I will remember that in the future.
I am getting this error. Please help.
Hi,
The variable name on the first line should be sgd_clf, it has a typo. Let me know if this solves the issue. If not, then would request you to review your code against our notebooks from our GitHub repository.
Thanks.
I am still getting the error. Can you send me the link to download this .ipynb file? I will be grateful.
Hi,
Sure! Please find below the link to the classification.ipynb file:
https://github.com/cloudxlab/ml/blob/master/machine_learning/classification.ipynb
Thanks.
In "sklearn.datasets" version 0.22 and above there is no function "fetch_mldata". How do I import the dataset?
Hi,
You can use fetch_openml as given in slide# 15.
Thanks.
At 1:16:32 in the video, Sgiri said regarding False Negative: "The model/image was actually not 5 but was classified or predicted as not 5, ok?"
Actually that definition is for True Negative but in the video it was explained for False Negative which is incorrect.
Actually correct statement for False Negative should be "Where the image/model is 5 but classified/predicted as not 5"
Hi,
I checked, his voice is not clear at that instance. He said "of 5" and not "not 5", which makes it a False Negative. You are right about the definition of True Negative though.
Thanks.
In the video in the first 30 mins, the data was first split and then shuffled. What is the purpose of shuffling after splitting?
As per my knowledge, we shuffle before splitting to get mixed data that covers all types of samples in both the train and test sets.
I also think shuffling after splitting reduces our ability to test randomly, because once everything is shuffled we lose track of which record has which label and cannot check whether our predicted value equals the expected label.
Hi,
If you notice the comment just above that cell, we are shuffling here because we will be using cross validation next. We want to reduce bias even with the training and validation sets.
Thanks.
Thanks, that clears my doubt.
Good afternoon sir,
I am unable to get the playground (the Jupyter notebook beside the content).
Hi,
This is a lecture only slide, so this does not have a Jupyter notebook beside it.
Thanks.
Hi Rajtilak,
It is working fine now, thanks!
Hi,
When I run the code below, I am getting an error:
from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version=1, cache=True)
error:
Hi,
OpenML is down right now. Please try after some time.
Thanks.
Hi,
Can you try this again, it should be working now.
Thanks.
While performing SGDClassifier:
from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=20)
sgd_clf.fit(X_train,y_train_9)
Error:
I have changed the way the dataset was split into training and test sets by using train_test_split:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15).
The y_train_9 is a numpy array containing 59500 values which are all False when I am doing it this way. Is it correct? Also, why are we using the y_train_9 instead of y_train?
Hi,
This is a part of which assessment?
Thanks.
Hello sir,
I tried the binary classification of predicting the digit 1. I got a precision of 97%, a recall of 91% and an f1_score of 94%, whereas in the lecture the precision, recall and f1_score for the digit 5 are different, in the range of 70%. Do the precision, recall and f1_score change depending on the target digit? And if they do, how should I know whether my model is making good predictions?
Hi,
To understand the difference you need to review the formula for precision, recall, and f1_score, and find out how many "5" and "1" images you have.
Thanks.
Can you please explain what is a decision function and what it does?
Hi,
A decision function is a function which takes a dataset as input and gives a decision as output. What the decision can be depends on the problem at hand. Examples include: Estimation problems: the "decision" is the estimate.
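For a classifier such as SGDClassifier, the decision function returns a score per sample, and the decision is made by comparing that score with a threshold. A rough sketch on made-up data (the threshold 2.0 is an arbitrary value chosen for illustration):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Made-up two-class data with 2 features.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([False] * 30 + [True] * 30)

clf = SGDClassifier(random_state=42).fit(X, y)

# One score per sample: positive means "looks like the True class".
scores = clf.decision_function(X)

# predict() is the same as thresholding the scores at 0.
default_pred = scores > 0

# A higher (hypothetical) threshold makes the classifier more conservative.
strict_pred = scores > 2.0

print(default_pred.sum(), strict_pred.sum())
```

Raising the threshold can only reduce the number of positive predictions, which is the mechanism behind the precision/recall tradeoff.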
Thanks.
Hello,
Please help:
1: The image at index 36000 is not the number 5 as in the tutorial; it is the number 9. It seems the data is shuffled.
2: y_train_9: here "y" has dtype object, hence I had to change it to int by defining it as y=y.astype('int16')
3: While performing SGDClassifier:
from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=20)
sgd_clf.fit(X_train,y_train_9)
Error:
Please help; because of this error I am not able to complete the project. I also checked all the comments, but they did not help.
Hi,
Please follow our notebook for a hint.
Thanks.
Hello Sir,
Thank you for your response.
I am unable to find the notebook on GitHub. I also checked the PPT a number of times, but it did not help.
Please give me the link of the notebook.
Hi,
This is the link to the notebook for Classification:
https://github.com/cloudxlab/ml/blob/master/machine_learning/classification.ipynb
Thanks.
Thank you Sir.
Hello. As per the notebook, I tried every single step, just copying and pasting the code. The only thing I changed is the target value, which is "9".
1: When i tried below code:
from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train,y_train_9)
2: When i tried without "9"
from sklearn.linear_model import SGDClassifier
sgd_clf= SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train,y_train)
It runs properly as shown in your notebook.
3: So whenever I try to run with y_train_9 (the target value), it gives an error in confusion_matrix.
Also while cross_val_predict:
from sklearn.model_selection import cross_val_predict
y_train_pred=cross_val_predict(sgd_clf,X_train,y_train_9,cv=3)
So there is a problem or error if i am trying to specify the targeted value.
Please help.
Thanks.
Hi,
You may have to review the code from the first step of this assessment, and not just this one.
Thanks.
Hello Sir,
Thank-You for your response:
I again checked from the beginning and now it is working correctly.
Thanks.
What did you do to remove that error? I copied the code from the notebook and changed y_train_5 to y_train_9.
Sir, why is y_train_5 used?
Hi,
y_train_5 is a boolean target vector that is True for images of the digit 5 and False for all other digits. We use it because this is a binary classification task, even though the dataset contains all digits.
Thanks.
At slide no. 88, and at 2:01:53 in the video, it is mentioned as FN = 2
But FN =3
Please check.
Hi,
That's correct! False Negatives should be 3. We will make the required updates.
Thanks.
What's wrong here?
Hi,
This function has been deprecated. Please download our latest notebooks from our GitHub repository for the updated codes.
Thanks.
This prediction model is not working correctly, as X[31000] is the number 5 but the prediction returns False.
sgd_clf.predict([X[31000]])
Hi,
Would request you to obtain the latest copy of our notebook from our repository.
Thanks.
fetch_mldata is not available
Hi,
fetch_mldata has been deprecated. Please get the latest codes from our repository which contains an alternate command to download the dataset.
Thanks.
Hi Team,
plt.imshow(255-some_digit_image, cmap = matplotlib.cm.binary, interpolation="nearest")
If I run the above code without interpolation, it does not show any difference. What, then, is the role of interpolation?
Hi,
interpolation='nearest' simply displays an image without trying to interpolate between pixels when the display resolution is not the same as the image resolution (which is most often the case). It results in an image in which each pixel is displayed as a square of multiple screen pixels.
Thanks.
Why am I getting the above warning while running the code, and what does it mean?
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10) # if you want reproducible results set the random_state value.
sgd_clf.fit(X_train, y_train_5)
Hi,
This is a custom warning to capture convergence problems. You can disable it using the following method:
https://stackoverflow.com/questions/53784971/how-to-disable-convergencewarning-using-sklearn
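A minimal sketch of one way to silence it, assuming the warning in question is scikit-learn's ConvergenceWarning. The data is made up, and max_iter=1 is used only to provoke the warning:

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier

# Made-up data, just so there is something to fit.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = rng.rand(200) > 0.5

# Ignore ConvergenceWarning only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    clf = SGDClassifier(random_state=42, max_iter=1).fit(X, y)
```

The warning means the optimizer stopped at max_iter before reaching its tolerance, so raising max_iter is often a better fix than hiding the message.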
Thanks.
1. Why did we use y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3) instead of y_train_pred=sgd_clf.predict(X_train)?
2. For prediction we should be passing only the training data and it should return the target variable, so why are we passing both?
3. When calculating scores, cross_val_score returned a score for each fold, but here we got only one output. Can you explain how cross_val_predict works in this case?
Hi,
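1. cross_val_predict generates cross-validated estimates for each input data point. The data is split according to the cv parameter. Each sample belongs to exactly one test set, and its prediction is computed with an estimator fitted on the corresponding training set, so every prediction comes from a model that never saw that sample during training, unlike sgd_clf.predict(X_train).
2. That is the syntax of the cross_val_predict() function: it needs the labels because it fits the estimator on the training folds before predicting on the held-out fold.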
3. The function cross_val_predict has a similar interface to cross_val_score, but returns, for each element in the input, the prediction that was obtained for that element when it was in the test set. Only cross-validation strategies that assign all elements to a test set exactly once can be used (otherwise, an exception is raised).
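The difference in output shape between the two functions can be seen on a small made-up dataset (30 samples here instead of the 60,000 training images):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score

# Made-up two-class data: 15 samples per class, 2 features.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-2, 1, (15, 2)), rng.normal(2, 1, (15, 2))])
y = np.array([False] * 15 + [True] * 15)

clf = SGDClassifier(random_state=42)

# One accuracy score per fold.
scores = cross_val_score(clf, X, y, cv=3, scoring="accuracy")

# One prediction per sample, each made while that sample was in the held-out fold.
preds = cross_val_predict(clf, X, y, cv=3)

print(scores.shape, preds.shape)
```

So cross_val_predict does not return "only one output": it returns one array with a prediction for every training sample, which is what confusion_matrix needs.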
Thanks.
Now it's clear to me, thank you.
Hi Cloud X Team,
Apart from sklearn.datasets, I came across that the MNIST dataset is also available in the Keras and TensorFlow libraries for Python.
Is it possible to import the dataset from the Keras and TensorFlow libraries too?
https://www.tensorflow.org/...
This is what I came across while doing Google Search.
I believe that this is a possible replacement for fetch_mldata(), which has recently been deprecated in scikit-learn (it relied on mldata.org), as follows:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
Another alternative is:
http://yann.lecun.com/exdb/...
Kindly let me know your feedback.
Hi,
The MNIST dataset referred to here is loaded through Scikit-Learn. However, you are right, Keras also contains the MNIST dataset. You can find the details of all the Keras datasets at the link below:
https://keras.io/api/datasets/
Thanks.
-- Rajtilak Bhattacharjee
Great, and thanks for your response.
Hi Cloud X Team,
In your video recording, at approximately 01:21:38, regarding the MNIST dataset:
a) The given large image is further broken down into smaller images of 28 x 28 pixels, i.e. 784 features (pixels). In other words, each pixel is converted into a column. Does this mean that each small image comprises 28 columns and the large image (in the slide) comprises 784 columns in total? Is this what the trainer's statement means? I just want to check my understanding. Kindly correct me if I have misunderstood.
b) These 784 features (pixels) can then be transformed into an array comprising 784 blocks.
c) There are 70,000 rows. How was this figure arrived at?
Would be glad if you can clear my doubts.
Hi,
1. You can visualize each image as 28 rows x 28 columns of pixels, which gives 784 features when flattened.
2. Yes.
3. The rows are the images: the dataset contains 70,000 images in total, one per row.
Thanks.
-- Rajtilak Bhattacharjee
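A quick sketch of the row-versus-image layout, using a made-up array of 3 "images" instead of the real 70,000 (X_demo is a hypothetical stand-in for X). Each row holds 784 values, and reshaping one row gives back the 28 x 28 grid.

```python
import numpy as np

# Three fake "images", one per row, 784 pixel columns each (illustrative values).
n_images = 3
X_demo = np.arange(n_images * 784).reshape(n_images, 784)

some_digit = X_demo[1]                         # one flat row of 784 features
some_digit_image = some_digit.reshape(28, 28)  # back to a 28 x 28 grid

print(X_demo.shape)            # (3, 784)
print(some_digit_image.shape)  # (28, 28)

# The pixel at row r, column c of the image is feature r * 28 + c of the flat row.
r, c = 2, 5
print(some_digit_image[r, c] == some_digit[r * 28 + c])  # True
```

With the real dataset the only difference is the number of rows: 70,000 instead of 3.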
Dear Rajtilak,
Thanks for your crystal-clear explanation. Now I understand it better.
from sklearn.base import BaseEstimator
import numpy as np
class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        pass
    def predict(self, X):
        return np.zeros((len(X), 1), dtype=bool)
Why are we passing y=None by default?
I am also not able to understand the predict function. Why are we returning all zeros?
And I am getting 54579 true negatives and 5421 false negatives, while the true positives and false positives are both zero. Why?
Hi,
Please find the answer to your queries in the link below:
https://stackoverflow.com/q...
Thanks.
-- Rajtilak Bhattacharjee
The link only explains fit, which I already understand. It also says we are providing only the 5s, but I don't think that is right: we are not selecting the 5s specifically, we are passing the whole dataset X, which contains different digits.
I asked why we set y=None by default, and why predict returns zeros of shape (len(X), 1).
Please explain.
Hi,
The Never5Classifier is just a toy classifier which always predicts False (meaning "not a 5"), without even looking at the image. The goal is to demonstrate that even such a bad classifier (which doesn't learn anything at all and doesn't even look at the images) can get pretty good accuracy if most images are not 5s.
You will find the explanation in the below link:
https://github.com/ageron/h...
Thanks.
-- Rajtilak Bhattacharjee
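Here is a small sketch of that point on made-up labels, where exactly 10% of the samples are 5s (digits, y_is_5 and X_demo are illustrative names, not the course data). Two details differ from the snippet in the question and are assumptions of this sketch: predict returns a 1-D array so the metric functions accept it directly, and fit returns self. The y=None default simply lets fit be called with or without labels, since this classifier ignores them.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.metrics import accuracy_score, confusion_matrix


class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        # Nothing to learn: the labels are accepted but ignored.
        return self

    def predict(self, X):
        # Always answer False ("not a 5"), one value per sample.
        return np.zeros(len(X), dtype=bool)


# Made-up labels: digits 0..9 repeated, so exactly 100 of 1000 are 5s.
digits = np.arange(1000) % 10
y_is_5 = digits == 5
X_demo = np.zeros((1000, 784))  # pixel values do not matter here

clf = Never5Classifier().fit(X_demo)
y_pred = clf.predict(X_demo)

acc = accuracy_score(y_is_5, y_pred)
cm = confusion_matrix(y_is_5, y_pred)
print(acc)  # 0.9
print(cm)   # [[900   0]
            #  [100   0]]
```

The second column of the confusion matrix is all zeros because the classifier never predicts True. That is the same pattern as the 54579 true negatives and 5421 false negatives reported above, with no true or false positives.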
Hi Cloudxlab:
1. It seems a few functions in the code from the PDF explained in the video are deprecated. I had to spend a lot of unnecessary time working out that "fetch_mldata()" is deprecated. Please attach the link to the latest GitHub code below the video.
2. Also, for the first hour of this video the trainer is explaining from some other PDF which is not added here. He starts with the Classification PDF, the one attached to this course, exactly at "1:12:00". Please give us the transcripts (and more importantly the PDFs) for the explanation up to "1:12:00".
Hi,
1. We constantly try to update our notebooks as and when required. We have addressed this change too by updating the notebook in our GitHub repository. You can obtain it by forking our repository from the link below:
https://github.com/cloudxla...
If you face any such issues in the future, we would request you to either check our repository to see whether we have changed any code, or let us know through your comments.
2. The first part of this lecture is a continuation of the End-to-End project, and you would find these slides under that topic.
Please let us know if you have further queries.
Thanks.
-- Rajtilak Bhattacharjee
Hi,
I can download the latest notebook from the link provided. But what worries me more is that your videos, which explain the code, have not been updated as well. The trainer is still explaining the old "fetch_mldata()" code, while the notebook contains the updated code.
How do you expect us to follow the updated code then? All by ourselves?
Hi,
We will update our videos soon. However, please follow the code given in our GitHub repository.
Thanks.
-- Rajtilak Bhattacharjee
Hi Team,
In the video, the tutor asks us to use the code below to load the MNIST dataset.
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata("MNIST original")
X, y = mnist["data"], mnist["target"]
But the above code is not working. It is commented out in the latest code repository, and if I try to run it anyway, it gives me an error on 'fetch_mldata'.
I can see some different lines of code in the new file, shown below:
from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version=1, cache=True)
I can see that the mnist dataset is a dictionary in both cases and that it has all the data. But I think the images are ordered differently: in the video, y[36000] gives 5, which matches what matplotlib shows, whereas with the latest code, which uses 'fetch_openml', y[36000] is 9.
Plus, there is a function in the new code called 'sort_by_target'.
Please let me know reasons for all these things.
Thanks!
Hi,
The function fetch_mldata() has been deprecated. We have updated our notebooks; you can download the latest notebook from our GitHub repository.
Thanks.
-- Rajtilak Bhattacharjee
Hi, Team
In Classification:
from sklearn.datasets import fetch_mldata
# fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
# in your home directory.
# you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
mnist = fetch_mldata("MNIST original")
mnist
Error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-caec9ae19f90> in <module>
----> 1 from sklearn.datasets import fetch_mldata
2 # fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
3 # in your home directory.
4 # you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
5 mnist = fetch_mldata("MNIST original")
ImportError: cannot import name 'fetch_mldata'
Just tell me how I should practice.
Hi,
Please note that fetch_mldata has been deprecated. We have updated our notebooks accordingly. We would request you to pull the updated notebook from our GitHub repository to reflect the same.
Thanks.
-- Rajtilak Bhattacharjee
Please tell me how to practice.
Hi,
You can get the latest notebooks from our GitHub repository, study the code and understand how it works, and then apply the same understanding while working on the projects related to this topic.
Thanks.
-- Rajtilak Bhattacharjee
Please help.
The side playground is not shown to me. Please resolve this.
Hi Mohini,
Thank you for contacting us.
Take a look at the top right side of your screen, are you able to locate "Show Playground"? Just click on it.
Please feel free to let me know if you have any queries and I'll be glad to help.
Hope this helps.
Thanks.
-- Anupam Singh Vishal
As you can see in the screenshot, sir, there is no button named Show Playground.
I guess we have to do this session in our own installed Jupyter notebook, since it is not graded.
Hi, Srihari.
Yes, you can do it by following the tutorial and creating a separate Jupyter notebook.
All the best!
-- Satyajit Das
Hi Mohini,
The playground will not show at the side of this topic, and a few more of them, as they do not have any assessments.
Thanks.
-- Rajtilak Bhattacharjee
Please let me know where I can get the same dataset locally.
Hi,
You can use the following command in your local Jupyter installation, and you will be able to access the same dataset:
from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version=1, cache=True)
mnist.target = mnist.target.astype(np.int8)
Thanks.
-- Rajtilak Bhattacharjee
from sklearn.datasets import fetch_mldata
ImportError: cannot import name 'fetch_mldata'
Hi,
Use this, it should work.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')
All the best!
-- Satyajit Das
Hi,
I am trying to download the dataset but am unable to. Please find the attached screenshot.
Hi,
Try the following code instead:
from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version=1, cache=True)
mnist.target = mnist.target.astype(np.int8)
sort_by_target(mnist)
Thanks.
-- Rajtilak Bhattacharjee
What is the sort_by_target function?
Hi,
sort_by_target is a helper function defined in our notebook, not a Scikit-Learn function. Please comment out that line and try again. Also, please note that we have updated our notebooks with this code. We would request you to use the latest notebooks from our GitHub repository.
Thanks.
-- Rajtilak Bhattacharjee
Hi
After fitting the classifier, the code is supposed to return True but it returns False. Please help.
Thanks
Hi,
Could you please share a screenshot of your code and the error that you are getting.
Thanks.
-- Rajtilak Bhattacharjee
Hi
Can I get all the recordings and slides for future reference?
Thanks
Hi Prachi,
You will have a lifetime free access to all the videos and the slides. You can even download the slides using the arrow button that shows up on the top right corner when you hover over them.
Thanks.
-- Rajtilak Bhattacharjee
OK, thanks.
@disqus_XTh3bUKOBh:disqus Team,
I just wanted to highlight the two observations below, which result from the change in data source from MLDATA to OPENML:
1. The 36,000th image in the video was a '5', while in the OPENML dataset it is a '9'. The order of this dataset appears to be different.
2. When binary classification is attempted, SGDClassifier raises "ValueError: The number of classes has to be greater than one; got 1 class". I found that the issue lies with the data type of the dataset's labels. The values under the 'target' key have dtype 'object' (the labels are strings such as '5'), so comparing them with the integer 5 gives False everywhere and the boolean array of 5 (True) and not 5 (False) contains only one class. This can be resolved by changing the data type of the labels using the code below:
y = y.astype('int16')
I hope it helps everyone.
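A tiny sketch of what goes wrong, using a handful of made-up string labels instead of the real target column (y_raw is an illustrative name):

```python
import numpy as np

# Labels as strings with dtype object, mimicking the situation described above.
y_raw = np.array(['5', '0', '4', '1', '9', '5'], dtype=object)

# Comparing strings with the integer 5 never matches, so only one class appears.
y_is_5_wrong = (y_raw == 5)
print(np.unique(y_is_5_wrong))  # [False]

# After converting to integers, both classes are present.
y_int = y_raw.astype('int16')
y_is_5 = (y_int == 5)
print(np.unique(y_is_5))        # [False  True]
```

The conversion has to happen before y_train_5 is built. If y_train was sliced from y before the conversion, it needs to be recreated, otherwise the same error comes back.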
Hi,
Thanks for pointing this out.
Thanks.
-- Rajtilak Bhattacharjee
Great! Thanks.
Hello, even after changing the data type from object to int (y = y.astype('int16')), I am getting the same error:
"ValueError: The number of classes has to be greater than one; got 1 class"
Please help
Hi,
This error means that there is some issue with your dataset and not its data type. Please follow our notebook for more details.
Thanks.
Hi Team,
Why am I getting this error?
Hi,
Try the following code instead:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)
Thanks.
-- Rajtilak Bhattacharjee
Thanks Rajtilak. It worked.
Classification import issue...
Try the following code instead:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)
Thanks Vinay, it worked. After proceeding further I am stuck with this error.
Hi!
The possible resolution to this query is available in this post http://disq.us/p/28x5bs9 .
I hope it helps you.
Hi Deepak,
We would request you to recheck your code and verify that you have formed X_train and y_train_5 properly. If you want, you can take a hint, or look at the answer and compare it with the code you wrote. If you are still stuck, please post a screenshot of your code from the beginning.
Thanks.
-- Rajtilak Bhattacharjee
Hello,
While running cross_val_score on sgd_clf, I'm getting a ConvergenceWarning in the result.
How can this be solved?
Hi Rohit,
It is a warning, and not an error. If your results are fine then you need not be worried about it.
Thanks.
-- Rajtilak Bhattacharjee
Hello,
I cannot find the Jupyter notebook that is displayed on the right.
Can someone help me with that?
Thanks in advance
Hi Rohit,
This topic does not contain any assessment questions, so you will not find the playground on the right. However, if you are facing this issue with all topics, then we would request you to restart your server using the following method:
https://discuss.cloudxlab.c...
Thanks.
-- Rajtilak Bhattacharjee
Okay. Thank you.
Hi,
I have imported data as mentioned below:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)
Problem1:
The image of digit for 36000 is attached. It is 9. It is mentioned in the pdf that 36000th image is '5', which is not the case.
The program gives a warning after the following code.
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train, y_train)
WARNING:
-------
/usr/local/anaconda/lib/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py:557: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
ConvergenceWarning)
Problem 2:
The next line of code:
some_digit = X[36000] # Taking the 36,000th image
sgd_clf.predict([some_digit])
It produces the output: array(['9'], dtype='<U1')
The output mentioned in the course material (pdf) is: array([True], dtype=bool)
Query 1:
As suggested in above warning I changed the max_iter to '15' and reran the code.
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=15)
sgd_clf.fit(X_train, y_train)
some_digit = X[36000] # Taking the 36,000th image
sgd_clf.predict([some_digit])
The output I received is: array(['4'], dtype='<U1')
Can you please explain how max_iter is impacting the prediction from '9' to '4'?
Problem 3:
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42, max_iter=10)
sgd_clf.fit(X_train, y_train_5)
I am getting the error below while using y_train_5:
----> 3 sgd_clf.fit(X_train, y_train_5)
ValueError: The number of classes has to be greater than one; got 1 class
None of the code is working for 'y_train_5'
from sklearn.model_selection import cross_val_score
cross_val_score(sgd_clf, X_train, y_train_5, cv=3,scoring="accuracy")
output: array([nan, nan, nan])
never_5_clf = Never5Classifier()
never_5_pred = never_5_clf.predict(X_train)
cross_val_score(never_5_clf, X_train, y_train_5,cv=3, scoring="accuracy")
Output: array([1., 1., 1.])
Output as per pdf: Never5Classifier - a dumb classifier gave an accuracy of 90%
Hi Punit,
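For the first query, it is a warning, not an error, so you need not change max_iter. max_iter is the parameter that controls the maximum number of passes over the training data (epochs) allowed while training this model. If the solver has not converged within that many passes, training stops and the warning is shown. Because training stops before convergence, the learned weights differ between max_iter=10 and max_iter=15, which is why the prediction for a borderline image can change.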
Thanks.
-- Rajtilak Bhattacharjee
Hi!
You can refer to this post http://disq.us/p/28x5bs9 for answers to your queries.
I hope it helps you.
I am not able to find the fetch_mldata dataset although I have already pulled the git repository.
Can you share the path?
Hi Alpesh,
Please use the following code instead:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)
Thanks.
-- Rajtilak Bhattacharjee
I am getting an error on the first line itself. I tried multiple times.
ImportError Traceback (most recent call last)
<ipython-input-3-caec9ae19f90> in <module>
----> 1 from sklearn.datasets import fetch_mldata
2 # fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
3 # in your home directory.
4 # you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
5 mnist = fetch_mldata("MNIST original")
ImportError: cannot import name 'fetch_mldata'
Hi Vivek,
I had the same question. I searched Google for fetch_mldata and found that it is dead because it relied on a website that no longer exists. So we need to replace it with fetch_openml(), which relies on https://openml.org, which is alive and kicking. The dataset name is "mnist_784" on this website.
Hi Cloudxlab Team,
Can we proceed with fetch_openml() instead of fetch_mldata()? Please let us know your response.
@disqus_zQl19TrWvN:disqus Please help.
Regards,
Jayant
After importing SGDClassifier and creating an instance of it, calling fit on that object throws an error: ValueError: The number of classes has to be greater than one; got 1 class
Please help.
Hi!
You can refer to this post http://disq.us/p/28x5bs9 for a resolution of your concern.
I hope it helps you.
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata("MNIST original")
This is not working.
It shows:
ImportError: cannot import name 'fetch_mldata'
Please help me out.
Hi , when I run :
"from sklearn.datasets import fetch_mldata"
It gives below error:
ImportError Traceback (most recent call last)
<ipython-input-2-1955b0fbdeec> in <module>
1 import sklearn
----> 2 from sklearn.datasets import fetch_mldata
ImportError: cannot import name 'fetch_mldata'
Please help me load the MNIST data.
Hi, Harry.
Kindly refer to this discussion: https://discuss.cloudxlab.c...
All the best!
In this video, you gave a Google Drive link to a shared folder where all the PPTs are provided. Please share that link with me.
Hi, Vivek.
You can refer to this GitHub directory for any materials.
https://github.com/cloudxla...
All the best!
Only notebooks and data are available at this path. I want the PPT / PDF files which are in Google Drive. Please share the Google Drive link to download them. You shared it in the video with the live attendees, but as I am watching the recording now, I am not able to get them.
Thanks,
from sklearn.datasets import fetch_mldata
# fetch_mldata downloads data in the file structure scikit_learn_data/mldata/mnist-original.mat
# in your home directory.
# you can also copy from our dataset using rsync -avz /cxldata/scikit_learn_data .
mnist = fetch_mldata("MNIST original")
mnist
ImportError: cannot import name 'fetch_mldata'
I am getting the above error. Please help me resolve it.
Please use the following code instead of the 1st line:
import numpy as np
from sklearn.datasets import fetch_openml

def sort_by_target(mnist):
    # Reorder the training part (first 60,000) and the test part (the rest) by label.
    reorder_train = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[:60000])]))[:, 1]
    reorder_test = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[60000:])]))[:, 1]
    mnist.data[:60000] = mnist.data[reorder_train]
    mnist.target[:60000] = mnist.target[reorder_train]
    mnist.data[60000:] = mnist.data[reorder_test + 60000]
    mnist.target[60000:] = mnist.target[reorder_test + 60000]

# fetch_mldata is no longer available, so fetch_openml is used instead.
mnist = fetch_openml('mnist_784', version=1)
mnist.target = mnist.target.astype(np.int8)  # the labels arrive as strings
sort_by_target(mnist)
mnist
fetch_mldata fetched the data from a site that is currently down, so use fetch_openml instead. It returns the data with different attributes, so we have to sort the data and convert the string targets to integers.
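For readers who only want to see what "sorting by target" means, here is a simplified sketch on made-up arrays (X_demo and y_demo are illustrative, and 20 samples stand in for the full dataset). It uses np.argsort with a stable sort rather than the sorted-tuples approach in the function above. That is a swapped-in technique, though both order the samples by label and then by original position.

```python
import numpy as np

# Made-up data: 20 samples with 784 features and labels 0..9 (not MNIST).
rng = np.random.RandomState(0)
y_demo = rng.randint(0, 10, size=20).astype(np.int8)
X_demo = rng.rand(20, 784)

# Indices that put the labels in ascending order; "stable" keeps ties in original order.
order = np.argsort(y_demo, kind="stable")

# Apply the same permutation to features and labels so rows stay aligned.
X_sorted = X_demo[order]
y_sorted = y_demo[order]

print(y_sorted)
```

Indexing both arrays with the same order array is what keeps each image paired with its label.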
"cannot import fetch_mldata" is the error I am getting on the first line itself. There is no scikit_learn folder in my home directory, please help! I created one as instructed and ran the -rvc command to pull it, but it is still not working.
If a regression model is said to be performing well according to the MAE or MSE metrics, what will the ranges of MAE or MSE be when the data is not scaled? What will the ranges of MAE and MSE be if the data is scaled to between 0 and 1, or between -1 and 1?
Hi sir, can you please upload the Google Drive link of the slides so that every student can download them? Thanks
The following error appears when trying to download the MNIST data:
c:\users\mac\appdata\local\programs\python\python37-32\lib\site-packages\sklearn\utils\deprecation.py:77: DeprecationWarning: Function fetch_mldata is deprecated; fetch_mldata was deprecated in version 0.20 and will be removed in version 0.22
warnings.warn(msg, category=DeprecationWarning)
c:\users\mac\appdata\local\programs\python\python37-32\lib\site-packages\sklearn\utils\deprecation.py:77: DeprecationWarning: Function mldata_filename is deprecated; mldata_filename was deprecated in version 0.20 and will be removed in version 0.22
warnings.warn(msg, category=DeprecationWarning)
And nothing is downloaded. Please help.
Hi, Anant.
Can you please tell where you are facing the problem?
All the best.
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata("MNIST original")
X, y = mnist["data"], mnist["target"]
This is not working. It gives an error on the 2nd line: it takes a long time to run and in the end shows the error "Connection Reset by peer".
This happens when too many people are downloading at once. You can try again after some time.