Splitting the dataset may look like an easy task, but it isn't. We can't just split our dataset at random, because a random split can introduce sampling bias.
Sampling bias is introduced when our sample fails to capture the actual distribution of our data.
So we split the data in such a way that the same test set is generated on every run and the sample captures the actual distribution of our dataset. We capture the actual distribution by using stratified sampling instead of random sampling.
In stratified sampling, we first divide our dataset into homogeneous subgroups called strata, which are based on some characteristic of our data. The right number of instances is then sampled from each stratum to guarantee that the test set is representative of the overall data.
In the college data example above, we tried to maintain the 65:35 ratio in our sample. That was stratified sampling, and the characteristic used to create the strata was gender.
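To make the idea concrete, here is a minimal sketch of a stratified split in plain Python. Everything in it is an assumption made for illustration: the `stratified_split` helper, the 1000-row `students` list, the `"M"`/`"F"` labels, the choice of which group makes up the 65%, and the 20% test size do not come from the original example. The fixed seed is what makes the test set identical on every run, and sampling the same fraction from each stratum is what preserves the 65:35 ratio.

```python
import random
from collections import Counter, defaultdict

def stratified_split(rows, key, test_ratio=0.2, seed=42):
    """Return (train, test), taking test_ratio of the rows from each stratum."""
    rng = random.Random(seed)  # fixed seed: the same test set on every run
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)  # group rows by the stratum label

    train, test = [], []
    for label in sorted(strata):  # sorted so the iteration order is deterministic
        members = strata[label][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_ratio)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Hypothetical college data with a 65:35 gender ratio (650 vs. 350 students).
students = [{"id": i, "gender": "M" if i < 650 else "F"} for i in range(1000)]

train, test = stratified_split(students, key=lambda s: s["gender"])
test_counts = Counter(s["gender"] for s in test)
print(test_counts["M"], test_counts["F"])  # 130 70, i.e. still 65:35
```

In practice a library routine would usually do this for you. For example, scikit-learn's `train_test_split` accepts `stratify` and `random_state` arguments that play the same two roles.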
Note: Stratified sampling doesn't remove sampling bias completely, but it reduces it to a great extent.