Project Assignment - Machine Learning

Instructions

Pick one of the projects from the list below. Once you have completed it, submit your work as follows:

  1. Create a GitHub repository by forking https://github.com/cloudxlab/ml
  2. Create a folder inside the “projects” folder of your newly forked repository and upload your notebook to that folder.
  3. Create a presentation to show your findings.
  4. Write a brief article of 300 words outlining your approach and send it to reachus@cloudxlab.com

Project 1 - Fashion-MNIST

Fashion-MNIST is a dataset of Zalando's (http://www.zalando.com) article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes. Fashion-MNIST serves as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms: it has the same image size and the same structure of training and testing splits. (See GitHub Repo)

The objective of the project is to use the Fashion-MNIST data set to identify the fashion product shown in each picture. Train several candidate models (ML algorithms) and report the values of the performance measures for each of them. Then report the model that performs best, fine-tune that model using one of the model fine-tuning techniques, and report the best combination of hyperparameters found for it. Lastly, use the fine-tuned model to make final predictions and report the values of the performance measures for those predictions.
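The workflow described above (compare models, fine-tune the best one, report final measures) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses scikit-learn, substitutes the small digits dataset bundled with scikit-learn for the course data, and the two candidate models, the hyperparameter grids, and the choice of accuracy and macro F1 as performance measures are all hypothetical choices, not requirements of the assignment.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 8x8 digit images bundled with scikit-learn,
# used only so the sketch runs without the course dataset.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hypothetical candidate models; any classifiers could be compared this way.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# Step 1: compare candidates by 3-fold cross-validated accuracy on the training set.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=3, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Step 2: fine-tune the best candidate with a small grid search (illustrative grids).
param_grids = {
    "logistic_regression": {"logisticregression__C": [0.1, 1.0, 10.0]},
    "random_forest": {"max_depth": [None, 10], "n_estimators": [50, 100]},
}
search = GridSearchCV(
    candidates[best_name], param_grids[best_name], cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

# Step 3: final predictions and performance measures on the held-out test set.
y_pred = search.best_estimator_.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred, average="macro")

print("cross-validation accuracy per model:", cv_scores)
print("best model:", best_name, "with hyperparameters", search.best_params_)
print("test accuracy:", test_accuracy, "test macro F1:", test_f1)
```

Keeping the test set untouched until Step 3 means the reported final measures are not influenced by the model selection or the tuning.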

Hint: You can use dimensionality reduction to simplify the problem.
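One possible way to follow this hint is principal component analysis (PCA). The sketch below assumes scikit-learn and again uses its bundled digits dataset as a stand-in for the course data; keeping 95% of the variance is an arbitrary illustrative threshold, and PCA is only one of several dimensionality reduction techniques you could pick.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 64-pixel digit images bundled with scikit-learn.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A float n_components asks PCA to keep just enough components to explain
# that fraction of the variance (0.95 is an illustrative choice).
pca = PCA(n_components=0.95)

# Fit on the training set only, then apply the same projection to the test set.
X_train_reduced = pca.fit_transform(X_train)
X_test_reduced = pca.transform(X_test)

retained_variance = pca.explained_variance_ratio_.sum()
print("features before:", X_train.shape[1], "after:", X_train_reduced.shape[1])
print("variance retained:", retained_variance)
```

The reduced arrays can then be passed to any model in place of the original pixel features.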

filePath = '/cxldata/datasets/project/fashion-mnist/'

Project 2 - MNIST

The objective of the project is to use the MNIST data set to identify the digit shown in each picture. Train several candidate models (ML algorithms) and report the values of the performance measures for each of them. Then report the model that performs best, fine-tune that model using one of the model fine-tuning techniques, and report the best combination of hyperparameters found for it. Lastly, use the fine-tuned model to make final predictions and report the values of the performance measures for those predictions.

Hint: You can use dimensionality reduction to simplify the problem.

Project 3 - Bikes Rental

The objective of the project is to forecast (predict) bike rental demand, i.e. the number of bike users ('cnt'), on an hourly basis, using historical usage patterns and weather data.

Use the “Bikes Rental” data set to predict the bike demand (bike users count, 'cnt'). Apply dimensionality reduction to the data set before using it to train the models. Train several candidate models (ML algorithms) and report the values of the performance measures for each of them. Then report the model that performs best, fine-tune that model using one of the model fine-tuning techniques, and report the best combination of hyperparameters found for it. Lastly, use the fine-tuned model to make final predictions and report the values of the performance measures for those predictions.
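Because 'cnt' is a number rather than a class, this project is a regression problem. A minimal sketch of "dimensionality reduction, then compare models" for regression might look like the following. It assumes scikit-learn and NumPy, uses synthetic data from make_regression instead of the bikes data, and the two models, the 95% variance threshold, and RMSE as the performance measure are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 500 rows, 12 numeric features.
X, y = make_regression(
    n_samples=500, n_features=12, n_informative=6, noise=10.0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical candidate regressors.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

rmse = {}
for name, model in models.items():
    # Scale, reduce dimensionality with PCA, then fit the regressor.
    pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95), model)
    pipeline.fit(X_train, y_train)
    predictions = pipeline.predict(X_test)
    # Root mean squared error, in the same units as the target.
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

print("RMSE per model:", rmse)
```

Putting PCA inside the pipeline ensures the projection is learned from the training rows only and reapplied unchanged at prediction time.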

filePath = '/cxldata/datasets/project/bikes.csv'

The dataset contains the following parameters:

  • instant: record index
  • dteday : date
  • season : season (1:spring, 2:summer, 3:fall, 4:winter)
  • yr : year (0: 2011, 1:2012)
  • mnth : month ( 1 to 12)
  • hr : hour (0 to 23)
  • holiday : whether the day is a holiday or not (extracted from [Web Link])
  • weekday : day of the week (0 to 6; 0 - Sunday, 6 - Saturday)
  • workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
  • weathersit :
      1: Clear, Few clouds, Partly cloudy
      2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
      3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
      4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

  • temp : Normalized temperature in Celsius. The values are derived via (t - t_min)/(t_max - t_min), t_min = -8, t_max = +39 (only in hourly scale)
  • atemp: Normalized feeling temperature in Celsius. The values are derived via (t - t_min)/(t_max - t_min), t_min = -16, t_max = +50 (only in hourly scale)
  • hum: Normalized humidity. The values are divided by 100 (max)
  • windspeed: Normalized wind speed. The values are divided by 67 (max)
  • casual: count of casual users
  • registered: count of registered users
  • cnt: count of total rental bikes including both casual and registered
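When interpreting results, it can help to convert the normalized weather columns back to physical values by inverting the formulas in the list above. The sketch below is plain Python. It assumes t_min is -8 for temp and -16 for atemp (the minus signs are read from the formulas above), and the sample record and helper function names are hypothetical.

```python
def denormalize_min_max(value, lower, upper):
    """Invert (t - t_min) / (t_max - t_min) to recover t."""
    return value * (upper - lower) + lower


def denormalize_max(value, maximum):
    """Invert a division by the maximum value."""
    return value * maximum


# Ranges taken from the column descriptions (assumed signs: -8 and -16).
TEMP_RANGE = (-8.0, 39.0)
ATEMP_RANGE = (-16.0, 50.0)
HUMIDITY_MAX = 100.0
WINDSPEED_MAX = 67.0

# Hypothetical record, not taken from the dataset.
record = {"temp": 0.5, "atemp": 0.5, "hum": 0.8, "windspeed": 0.25}

temp_celsius = denormalize_min_max(record["temp"], *TEMP_RANGE)
atemp_celsius = denormalize_min_max(record["atemp"], *ATEMP_RANGE)
humidity = denormalize_max(record["hum"], HUMIDITY_MAX)
windspeed = denormalize_max(record["windspeed"], WINDSPEED_MAX)

print("temperature (C):", temp_celsius)
print("feeling temperature (C):", atemp_celsius)
print("humidity:", humidity)
print("wind speed:", windspeed)
```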

The target data set ('y') should contain only one label, i.e. 'cnt'.
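A minimal sketch of separating the features from the target, assuming pandas. The tiny in-memory frame is hypothetical and holds only a few of the columns. Dropping 'casual' and 'registered' from the features is a suggested choice rather than a stated requirement: since 'cnt' includes both, leaving them in would reveal the target to the model.

```python
import pandas as pd

# Hypothetical rows with a subset of the columns, for illustration only.
# With the course data this could instead be:
#   bikes = pd.read_csv('/cxldata/datasets/project/bikes.csv')
bikes = pd.DataFrame(
    {
        "hr": [0, 1, 2],
        "temp": [0.24, 0.22, 0.22],
        "hum": [0.81, 0.80, 0.80],
        "casual": [3, 8, 5],
        "registered": [13, 32, 27],
        "cnt": [16, 40, 32],
    }
)

# Target: a single label, 'cnt'.
y = bikes["cnt"]

# Features: everything except the target and the two counts that sum to it.
X = bikes.drop(columns=["cnt", "casual", "registered"])

print("feature columns:", list(X.columns))
print("target name:", y.name)
```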

Acknowledgements

CloudxLab uses this “Bike Sharing Demand” problem to help its machine learning learners learn and practice. The dataset was provided by Hadi Fanaee Tork using data from Capital Bikeshare. We also thank the UCI Machine Learning Repository for hosting the dataset.

Fanaee-T, Hadi, and Gama, Joao, Event labeling combining ensemble detectors and background knowledge, Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg.

