Getting started with Machine Learning and Data Science with Python

Boolean Club
Jul 1, 2020

Regarded by many as the sexiest job of the 21st century, ML/data science has created a hype that is here to stay. More and more people are getting into the field for many reasons: its charm, its fascination and, of course, the money. The most appealing part of the job is that it doesn’t depend on your academic background; people from many different backgrounds have built careers in the field.

What is it?

Definition: “Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.”

As the name suggests, machine learning is what happens when a machine learns. It learns much like humans do, by processing data. For this computation we need computers and some knowledge of programming languages. There are various tools available to help us process and analyze data, both free and premium, the most common being programming languages like Python, R, Go and Julia, as well as tools like Excel.
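
To make "learning from data" a little more concrete, here is a minimal sketch in plain Python. It fits a straight line to a handful of points using ordinary least squares, one of the simplest learning techniques, and then uses that line to predict a value it has never seen. The function name `fit_line` and the hours/scores data are made up purely for illustration.

```python
# A minimal sketch of "learning from data": fit a line y = slope * x + intercept
# to example points with ordinary least squares (no external libraries).

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data (illustrative only): hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)

# The fitted line can now estimate a score for an input it was not trained on.
predicted = slope * 6 + intercept
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
print(f"predicted score for 6 hours: {predicted:.1f}")
```

Real ML libraries do far more than this, but the core idea is the same: the parameters come from the data, not from rules written by hand.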

Why machine learning?

Most industries working with large amounts of data have recognized the value of machine learning technology. By gleaning insights from this data — often in real time — organizations are able to work more efficiently or gain an advantage over competitors.

Several developments have made this possible: growing volumes and varieties of available data, computational processing that is cheaper and more powerful, and affordable data storage. Together, these make it possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results, even on a very large scale. And by building precise models, an organization has a better chance of identifying profitable opportunities or avoiding unknown risks.

In this particular track of Boolean, we will implement various ML techniques using Python. Thanks to its versatility, libraries, readability and ease of use, Python has become one of the most widely used tools in data science and machine learning.

Setting up the environment

The first thing we need in order to use Python is Python itself. The latest version of Python 3 is advised, as it includes the most recent bug fixes as well as features that previous versions lack.

You can download Python here.
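
Once Python is installed, a quick way to confirm which version you are running is to ask the interpreter itself. The snippet below is a small sketch using the standard `sys` module; the helper name `is_python3` is just an illustrative choice.

```python
import sys

def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple is Python 3 or newer."""
    return version_info[0] >= 3

# sys.version starts with the version number, e.g. "3.x.y".
print("Python version:", sys.version.split()[0])
print("Running Python 3:", is_python3())
```

If the second line prints `False`, you are most likely running an old Python 2 installation and should install Python 3 before going further.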

Now we need something called an IDE. An IDE, or Integrated Development Environment, is an application in which we write and run our code. There are various IDEs, but we will be using Jupyter notebooks. These are widely used, especially for data science and machine learning, and are pretty easy to use once you get the hang of them.

Now, there are various ways of installing Jupyter. The most common way is to download Anaconda, a Python distribution that bundles many of the IDEs and tools used for ML, such as Spyder and RStudio.

You can download it here.

If you want help with installing, using or setting up Jupyter or Anaconda, you can refer to these links below:
1. How to use Anaconda for Python programming
2. Executing python code on Jupyter Notebook on Anaconda
3. Install Anaconda Python, Jupyter Notebook on Windows 10
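
After installing, you may want to check that the pieces are actually available to Python. The sketch below uses the standard `importlib` module to test whether a package can be found without importing it. The package names in the checklist (`notebook`, `numpy`, `pandas`) are assumptions about what a typical setup includes; adjust them to whatever you installed.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None

# Hypothetical checklist: edit these names to match your own setup.
for name in ["notebook", "numpy", "pandas"]:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Anything reported as missing can usually be added through Anaconda or pip.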

Stuff to know before beginning ML

There are some things you should be familiar with before you begin your journey in machine learning. You don’t have to be an expert or know it all, but some familiarity with these topics and a bit of background definitely helps.

Mathematics: Machine learning is very math-heavy. If you are looking to build a career in machine learning, you must have a firm grasp of topics like statistics, probability and algebra. It is not necessary to master them before you start, but learning them alongside helps.

Coding: As everything mathematical will be implemented using programming languages and algorithms, you should have a working knowledge of programming concepts. People think ML coding is easy because you implement the same algorithms again and again. However, once you climb up the ladder and start applying techniques to real-world problems, you need to make your algorithms more efficient, your pipelines faster, and your data less storage-heavy. All this can only be done by someone who is well-versed in coding and has a good understanding of computation.

Practice, practice, practice: You literally can’t ignore the very basic principle of machine learning while learning machine learning. An algorithm will not produce the desired outcome unless it is trained on a large set of data. Likewise, you cannot master ML without practicing it regularly.
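
The idea of improving through repeated training can be sketched in a few lines. The example below uses gradient descent, a common training technique, to learn a single weight `w` so that `w * x` matches the data. On every pass it measures its errors and nudges `w` to reduce them. The function name `train`, the learning rate, the number of passes and the data are all illustrative assumptions, not a recipe.

```python
def train(xs, ys, learning_rate=0.01, epochs=200):
    """Fit y ≈ w * x by gradient descent on the mean squared error."""
    w = 0.0          # start with an uninformed guess
    history = []     # loss recorded at the start of each pass
    n = len(xs)
    for _ in range(epochs):
        # How wrong is the current guess on each example?
        errors = [w * x - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        history.append(loss)
        # Gradient of the mean squared error with respect to w.
        grad = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        # Adjust w a small step in the direction that reduces the loss.
        w -= learning_rate * grad
    return w, history

# Made-up data that follows y = 3x exactly (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w, history = train(xs, ys)
print(f"learned w = {w:.4f}")
print(f"loss on first pass: {history[0]:.4f}, on last pass: {history[-1]:.4f}")
```

The loss shrinks with each pass over the data, which is practice in miniature: measure the mistakes, adjust, repeat.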

Like your ML model, you too will learn from your mistakes, make adjustments and eventually hit your target on the path to learning ML. And we at Boolean are here to guide you further.

Written by Ashutosh Arya.
