DevOps: An Introduction

Learn the basic concepts of DevOps. You will also learn the benefits of using DevOps practices in your application development and delivery.

What is DevOps?

DevOps is a combination of cultural philosophies, practices, and tools that helps you deliver applications and services to a large number of users quickly and reliably.

Benefits of DevOps

  1. Rapid Delivery
    Increase the frequency and pace of releases so you can innovate and improve your product faster.
  2. Reliability
    Ensure the quality of application updates and infrastructure changes so you can reliably deliver at a more rapid pace.
  3. Scale
    Operate and manage your infrastructure and development processes at scale.
  4. Speed
    Innovate for customers faster, adapt to changing markets better, and grow more efficient.
  5. Better Collaboration
    Build more effective teams under a DevOps cultural model, which emphasizes values such as ownership and accountability.
  6. Security
    Move quickly while retaining control and preserving compliance.

Understanding DevOps

Tools used in DevOps

DevOps stands for:
1. Development and
2. Operations

There are 8 phases in DevOps as follows:

1. Plan
The planning phase involves planning for shorter-term goals; Scrum or other agile planning approaches are a good fit. The various tools you can use are:

1. Microsoft Office, Google Docs / Sheets
2. Project Management Tools – Microsoft Project
3. Task Management tools – Asana, Jira, Mantis

2. Code
Text editors and Integrated Development Environments
1. VS Code, Vim, Emacs
2. Eclipse, XCode, Visual Studio

Source Code Management (SCM) Tools
1. Git – the de facto choice; older options include SVN and CVS
2. Hosting – GitHub, GitLab, or your own server

Unit test case libraries – depend on the language
1. JUnit; Selenium (for browser-level tests)

3. Build
The build process involves compiling the code, copying assets, and generating configuration and documentation. It should work on the developer's machine as well as on an unattended build machine.

There are various build tools:
Maven, Ant, SBT, etc.
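
The tools above target JVM languages; for a Python project, the build step might be defined with setuptools. A minimal, illustrative sketch (the package name and version below are placeholders, not from any real project):

# setup.py - a minimal, illustrative build definition for a Python project
from setuptools import setup, find_packages

setup(
    name="sample_app",         # placeholder package name
    version="0.1.0",           # placeholder version
    packages=find_packages(),  # include every package found in the source tree
)

Running python setup.py sdist would then produce a distributable archive that the later phases can pick up.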

4. Test
Testing involves verifying that the code behaves as per the requirements. Unit testing starts at coding time; ideally, write test cases before you write the code.

Various types of testing are:
Manual testing, Unit testing, Integration testing, Stress testing

Tools: xUnit, Selenium, Scripts

To measure how much of the code the tests actually exercise, we use code coverage tools like Cobertura.
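
To make this concrete, here is a minimal unit test written with Python's built-in unittest module; the square function is a hypothetical example used only for illustration:

import unittest

def square(num):
    """Return the square of a number."""
    return num * num

class TestSquare(unittest.TestCase):
    def test_square_of_positive_number(self):
        self.assertEqual(square(5), 25)

    def test_square_of_negative_number(self):
        self.assertEqual(square(-3), 9)

if __name__ == "__main__":
    unittest.main()

In a Continuous Integration setup, a server such as Jenkins would run tests like these on every commit.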

5. Release
Once the testing has been completed successfully, the build is tagged as a release candidate (RC), which is ready to be deployed to production.

Tools like Jenkins are used to automate releases. Apache Maven repositories are also used for publishing the release binaries.

6. Deploy
Once the release is finalized, we can deploy it using different automation tools: Puppet, Chef, Ansible, SaltStack

Newer tools such as Docker and Kubernetes make it much easier to scale deployments quickly and reliably.

These days, Docker and Kubernetes are also used during testing as part of Continuous Integration.

7. Operate
Once the software is in production, users start using it and product managers can refine it based on real usage. During the operate phase, we can measure the effectiveness of changes using A/B testing.
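
A/B testing usually works by deterministically assigning each user to a variant and comparing how the variants perform. A small, illustrative sketch of such an assignment in Python (the bucketing scheme and experiment name are assumptions, not any specific product's implementation):

import hashlib

def assign_variant(user_id, experiment="new_homepage"):
    """Deterministically assign a user to variant 'A' or 'B'."""
    # Hash the user id together with the experiment name so that the
    # assignment is stable across sessions but differs per experiment.
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

print(assign_variant("user-42"))  # always prints the same variant for this user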

8. Monitor
We also need to monitor system resource consumption (CPU, memory, disk, and network) using tools such as Nagios.

We must also monitor the logs and errors produced by the system. See, for example, Apache Logging Services.

Visualization tools such as Grafana are used to represent metrics.
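
Applications can also expose their own metrics for such tools to scrape. The sketch below is only an illustration and assumes the prometheus_client Python library is installed; Prometheus and Grafana are covered later in the course mentioned below.

# Expose a request counter that a monitoring system such as Prometheus
# could scrape and Grafana could visualize.
from prometheus_client import Counter, start_http_server
import time

REQUESTS = Counter("app_requests_total", "Total requests handled by the app")

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/
    while True:
        REQUESTS.inc()       # simulate handling a request
        time.sleep(1)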

DevOps Practices


Continuous Integration
It refers to the build and unit testing stages of the software release process. Every code commit triggers an automatic workflow that builds and tests the code. This helps in finding and addressing bugs more quickly.

Continuous Delivery
It automates the entire software release process; Continuous Delivery extends Continuous Integration.
Every code commit triggers an automatic workflow that builds and tests the code (Continuous Integration) and then deploys it to staging and, finally, to production.
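
The pipeline itself is normally defined in a CI server such as Jenkins. Purely as an illustration of the stages involved, here is a Python sketch that chains them together; the build.sh, run_tests.sh, and deploy.sh scripts are hypothetical:

# An illustrative sketch of CI/CD stages, not a real Jenkins pipeline.
import subprocess
import sys

STAGES = [
    ["./build.sh"],                 # Continuous Integration: build
    ["./run_tests.sh"],             # Continuous Integration: test
    ["./deploy.sh", "staging"],     # Continuous Delivery: deploy to staging
    ["./deploy.sh", "production"],  # Continuous Delivery: deploy to production
]

for stage in STAGES:
    result = subprocess.run(stage)
    if result.returncode != 0:
        print("Stage", stage, "failed; stopping the pipeline.")
        sys.exit(1)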

Microservices
Design a single application as a set of small services. Each service runs in its own process and communicates with other services through a lightweight mechanism such as REST APIs. Microservices are built around business capabilities; each service caters to a single business purpose.
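
For instance, a single small service exposing one REST endpoint could look like the sketch below; it assumes the Flask library is installed, and the /price endpoint and its data are made up for illustration:

# A minimal, illustrative microservice exposing one REST endpoint.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/price/<item_id>")
def get_price(item_id):
    # In a real service this would query a datastore owned by this service.
    return jsonify({"item_id": item_id, "price": 9.99})

if __name__ == "__main__":
    app.run(port=5000)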

Infrastructure as Code
Infrastructure is provisioned and managed using code and software development techniques such as version control and continuous integration. This helps developers and system admins manage infrastructure at scale without manually setting up and configuring servers and resources.
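
Dedicated IaC tools (such as those listed under the Deploy phase above) are usually declarative. Purely as an illustration of driving infrastructure from version-controlled code, here is a sketch using the boto3 AWS SDK; the AMI id and instance type are placeholders, and AWS credentials are assumed to be configured:

import boto3

ec2 = boto3.client("ec2")

# The desired infrastructure is described as data that can live in version control.
desired_instance = {
    "ImageId": "ami-0123456789abcdef0",  # placeholder AMI id
    "InstanceType": "t2.micro",
    "MinCount": 1,
    "MaxCount": 1,
}

response = ec2.run_instances(**desired_instance)
print("Launched instance:", response["Instances"][0]["InstanceId"])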

Monitoring and Logging
Monitor metrics and logs to see how the application and infrastructure are performing, and take the necessary actions to fix bottlenecks.

Collaboration
DevOps processes set up strong cultural norms and best practices. Well-defined processes increase the quality of communication and collaboration among the various teams.

Check out the DevOps Certificate Course Offered by CloudxLab here.

In this course, we will learn to deploy applications to various environments like testing, staging, and production by building Continuous Integration (CI) and Continuous Delivery (CD) pipelines. We will also see end-to-end examples of how to deploy, scale, and monitor your application using Docker, Kubernetes, Prometheus, and Jenkins.

The course will be completely hands-on, and we will make sure to deliver a best-in-class learning experience and provide you with enough knowledge to start your career as a DevOps Engineer. The course also provides a solid foundation so that you can start preparing for certifications like DCA (Docker Certified Associate) and CKAD (Certified Kubernetes Application Developer).

MLOps (Machine Learning Operations) – A Complete Hands-On Guide with Case Study

Learn about MLOps with a case study and guided projects. This will also cover DevOps, Software Engineering, and System Engineering in the right proportions to ensure your understanding is complete.


Introduction

As part of this series of blogs, I want to help you understand MLOps. MLOps stands for Machine Learning Ops. It is basically a combination of Machine Learning, Software Development, and Operations. It is a vast topic. I want to first establish the value of MLOps and then discuss the various concepts to learn by way of guided projects in a very hands-on manner. If you are looking for a theoretical foundation on MLOps, please follow this documentation by Google.


Case Study

Objective

When I was working with an organization, we wanted to build and display recommendations. The main idea was to show something interesting to users in a small ad space, something they would want to engage with. Since we had the anonymized historical behavior of the users, we settled on showing recommendations of apps that they would also like, based on their usage of various apps at the time the ads were being displayed.

Continue reading “MLOps (Machine Learning Operations) – A Complete Hands-On Guide with Case Study”

Practice questions on Data Structures and Algorithms for Software Engineer Roles

Welcome!

You might have seen many people getting anxious about coding interviews. Coding interviews mostly test you on Data Structures and Algorithms, which can be quite challenging and stressful considering the vastness of the topic.

Software Engineers in the real world have to do a lot of problem-solving. They spend a good amount of time understanding the problem before actually coding it. The main reason to practice Data Structures and Algorithms is to improve your problem-solving skills, so a Software Engineer must have a good understanding of both. But where to practice?

CloudxLab offers a solution. We have come up with some amazing questions which will help you practice Data Structures and Algorithms and make you interview-ready.

So what are you waiting for? Encourage the aspiring Software Engineer in you, by waking up the problem solver in you. Practice the following questions: https://cloudxlab.com/assessment/playlist-intro/566/data-structures-and-algorithms-questions

All the best!

Practice questions for Machine Learning Engineer Roles

Welcome!

You might have come across several posts which focus only on the theoretical questions for you to prepare for a machine learning engineer role. But is the theoretical preparation enough?

ML Engineers in the real world do much more than just building models. They spend a good amount of time understanding the data before actually building a model. For this, they should be able to perform different operations on the data, build intuitions, and manipulate the data as per the needs. So an ML Engineer must know how to play with data and tell some intuition stories.

Pandas is a Python library for performing various operations on data. NumPy is a popular Python library for numerical computations. An ML Engineer is often expected to be well-versed in both of these libraries. But where to practice?

CloudxLab offers a solution. We have come up with some amazing questions which will help you practice Python, Pandas, and NumPy hands-on and make you interview-ready.

So what are you waiting for? Encourage the aspiring ML Engineer in you, by waking up the problem solver in you. Practice the following questions: https://cloudxlab.com/assessment/playlist-intro/862/machine-learning-prerequisite-mains-10th-july-2021.

All the best!

Getting Started with various tools at CloudxLab

Welcome!

We are happy to announce that we have come up with a new consolidated playlist, which summarizes the various tools present in the CloudxLab environment, how to use them, and where to learn about them.

This playlist will be incrementally improved as new technologies get installed on the lab.

You may find the playlist here.

In this playlist, there is a dedicated slide for each technology. For example, if you want to understand how to use Pandas on the lab, go to the slide named Pandas.

Upon clicking on Pandas, you would be able to see the Pandas guide as follows:

As you can see, this slide contains all the basic information needed, such as:

  • the purpose of the library
  • link for the official home page
  • link for the official documentation
  • related resources you could use to learn about the library.
  • instructions on how to use it on the CloudxLab environment.
  • 1-2 lines of sample code, such as how to import the library and how to check its version.

We hope that this will be a great starting guide for our users and will make their job of getting started easier.

Happy learning!

When to use While, For, and Map for iterations in Python?

Python has a really sophisticated way of handling iterations. The only thing it does not have is "GOTO" labels, which I think is a good thing.

Let us compare the three common ways of iterating in Python (while, for, and map) by way of an example. Imagine that you have a list of numbers and you would like to find the square of each number.

nums = [1,2,3,5,10]
result = []
for num in nums:
    result.append(num*num)
print(result)

It would print [1, 4, 9, 25, 100]
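
For comparison, here is a quick sketch of the same computation using while and using map; the full trade-offs are discussed in the rest of the post:

nums = [1, 2, 3, 5, 10]

# Using while: you manage the index yourself
result = []
i = 0
while i < len(nums):
    result.append(nums[i] * nums[i])
    i += 1
print(result)

# Using map: apply a function to every element
result = list(map(lambda num: num * num, nums))
print(result)

Both of these also print [1, 4, 9, 25, 100].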

Continue reading “When to use While, For, and Map for iterations in Python?”

How to handle Command Line Arguments in Python?

When you run Python programs from the command line, you can pass various arguments to the program, and your program can handle them.

Here is a quick snippet of code that I will be explaining later:

import sys
if __name__ == "__main__":
    print("You passed: ", sys.argv)

When you run this program from the command line, you will get this kind of results:

$ python cmdargs.py
 You passed:  ['cmdargs.py']

Notice that sys.argv is a list of strings containing all the arguments passed to the program, and the first value (at the zeroth index) is the name of the program itself. You can put all kinds of checks on it.
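
For instance, here is a small extension of the snippet above that validates the argument count and converts an argument to a number; the <port> argument is just a hypothetical example:

import sys

if __name__ == "__main__":
    # Expect exactly one argument besides the program name.
    if len(sys.argv) != 2:
        print("Usage: python cmdargs.py <port>")
        sys.exit(1)
    port = int(sys.argv[1])  # arguments arrive as strings, so convert explicitly
    print("Starting on port", port)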

Continue reading “How to handle Command Line Arguments in Python?”

How does YARN interact with Zookeeper to support High Availability?

In the Hadoop ecosystem, YARN, short for Yet Another Resource Negotiator, holds the responsibility of resource allocation and job scheduling/management. The Resource Manager (RM), one of the components of YARN, is primarily responsible for accomplishing these tasks by coordinating with the various nodes and interacting with the client.

To learn more about YARN, feel free to visit here.

Architecture of YARN

Hence, the Resource Manager in YARN is a single point of failure – meaning that if the Resource Manager is down for some reason, the whole system gets disturbed due to the interruption in resource allocation and job management, and thus we cannot run any jobs on the cluster.

To avoid this issue, we need to enable the High Availability (HA) feature in YARN. When HA is enabled, we run another Resource Manager in parallel on another node; this is known as the Standby Resource Manager. The idea is that when the Active Resource Manager goes down, the Standby Resource Manager becomes active and ensures smooth operations on the cluster, and the process continues.

Continue reading “How does YARN interact with Zookeeper to support High Availability?”

Parallel Computing with Dask

Dask collections and schedulers
Source: dask.org

I recently discovered a nice simple library called Dask.

Parallel computing basically means performing multiple tasks in parallel – it could be on the same machine or on multiple machines. When it is on multiple machines, it is called distributed computing.

There are various libraries that support parallel computing, such as Apache Spark and TensorFlow. A common characteristic you would find in most parallel computing libraries is the computational graph. A computational graph is essentially a directed acyclic graph or dependency graph.
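
In Dask, you build such a computational graph lazily and then execute it. A minimal sketch using dask.delayed, assuming Dask is installed:

from dask import delayed

@delayed
def inc(x):
    return x + 1

@delayed
def add(x, y):
    return x + y

a = inc(1)         # no work happens yet; this only builds the graph
b = inc(2)
total = add(a, b)  # depends on both a and b

print(total.compute())  # executes the graph (possibly in parallel) and prints 5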

Continue reading “Parallel Computing with Dask”

How to use a library in Apache Spark and process Avro and XML Files

What is Serialization? And why is it needed?

Before we start with the main topic, let me explain a very important idea called serialization and its utility.

The data in RAM is accessed based on an address (that is why it is called Random Access Memory), but the data on the disk is stored sequentially. On the disk, data is accessed using a file name, and the data inside a file is kept as a sequence of bits. So, there is an inherent mismatch between the format in which data is kept in memory and on the disk. You can watch this video to understand serialization further.

Serialization is converting an object into a sequence of bytes.
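
In Python, for example, the built-in pickle module converts an object into bytes and back; a quick illustration, separate from the Avro and XML discussion in the post itself:

import pickle

user = {"name": "Alice", "apps": ["maps", "music"]}

data = pickle.dumps(user)      # serialize: object -> sequence of bytes
print(type(data))              # <class 'bytes'>

restored = pickle.loads(data)  # deserialize: bytes -> object
print(restored == user)        # True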
Continue reading “How to use a library in Apache Spark and process Avro and XML Files”