Data Science for the New Generation: Arzoo Sabharwal Interview
“Data Science has the capability to bring life to the data in any other field.”
Ms. Arzoo has been a Data Scientist at IBM for a little over five years. She holds a master’s degree in Economics from the Delhi School of Economics and an undergraduate degree in the same field from the University of Delhi. She founded DataIQ School of Analysis in 2018 along with Mr. Grijesh Sharma to encourage students to take up live projects in the domain of Data Science.
Data Science can be picked up by any individual after their graduate studies, as it deals primarily with understanding why data matters and why analyzing it is important. The field is not limited to prediction. There’s another side to Data Science, known as Cognitive, which deals with a wider variety of tasks, ranging from analyzing documents to developing chatbots, where the computer is able to analyze or understand much more over the course of time. It compels an individual to think beyond just the code.
Predictive Modelling, the process of analyzing data to predict outcomes, requires us to work through many topics, such as Regression Analysis, Heteroscedasticity, Visualization, Cleansing, and more. The method one chooses depends on the kind of data one deals with and the domain it concerns. Details such as whether the data is structured or unstructured, whether it is skewed, and whether it contains outliers influence the entire process.
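As a rough illustration of checking data for skew and outliers before modelling, here is a minimal Python sketch. It is not taken from the interview: the function names, the sample values, and the choice of the common 1.5 × IQR rule for flagging outliers are all assumptions made for the example.

```python
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores.

    A value well above 0 suggests a long right tail, well below 0 a long
    left tail, and close to 0 a roughly symmetric distribution.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0  # all values identical, so there is no skew to measure
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    k=1.5 is a widely used convention, assumed here for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: nine similar values and one extreme value.
incomes = [32, 35, 36, 38, 40, 41, 43, 45, 47, 250]

print("skewness:", round(skewness(incomes), 2))
print("outliers:", iqr_outliers(incomes))
```

A strongly positive skewness or a non-empty outlier list would be a cue to consider a transformation, a more robust model, or a closer look at how those records were collected.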
Many people believe that problem-solving ability is a must for a data scientist. That is true, but one also needs to understand that the kind of problem-solving required differs from project to project. One thing that remains constant is the willingness to learn across domains. An economist might have to work on a project that involves health care, Human Resources, or even aviation. They’d spend a week or so just learning the vocabulary and general terms of the domain in order to build a better predictive or forecasting model. Hence, domain knowledge is required in addition to problem-solving, as it influences one’s method of dealing with certain data points.
Ideally, one should start with a three-step process. For a complete beginner, and especially for someone from a non-programming background, a course on Introduction to R can be a good stepping stone. In R, one can quickly move to visualizations, statistics, and complex operations. The brand of the course is unimportant as long as one gains expertise in their specific sector. Completing a course would surely add a lot of value to one’s resume, and if one can back up that skill with a certificate, it always proves helpful.
It can be followed up with an introductory course on Python. If one wishes to dive deep into the field, building complex models in R alone can be limiting, hence Python becomes a necessity. After an introduction to Python, one can move on to learn about Machine Learning in Python (Microsoft AZ-900, AI-900, DP-900, etc.). However, one mustn’t take all of the courses at once. One should start with one language or tool, build up one’s logical and analytical skills, and then move on to the next one.
Adapting to the real world and acquiring a skill before it becomes obsolete or saturated is critical for this generation. Beginners won’t have a clear idea of how to deal with large amounts of data; therefore, having a mentor, someone who guides them at every step, is always beneficial. One should look for a mentor whose mindset matches theirs and who is well aware of their vision and goals in life.
There are a lot of datasets, like the Iris Flower Dataset or the Boston Housing Dataset, which are used by almost every beginner, and there’s no harm in that. Once acquainted with these skills, one should definitely go for newer or different datasets. Just take a dataset that interests you, and apply your knowledge to it. This gives you a great opportunity to work with real-life data. There are several websites, such as GitHub and Kaggle, that provide mock datasets remarkably similar to your real datasets, with variables that closely match those in the real data. Choose the dataset that corresponds most closely to your problem statement and work on it to create a real-life model.
You see a surge of startups these days looking for business analysts, and one can definitely go for those opportunities. Internships in such roles give you a lot of hands-on experience in the field.
Showcase your live projects, practice, and actively publish on LinkedIn to get recognized by professionals in the field.
Lastly, you have to consider that the field of Data Science is vast. It can be applied to any other domain, such as Statistics, Economics, or History. Data Science in itself is not a concrete or separate domain. It has the capability to bring life to the data in any other field. It can be employed in Sports Analytics, Medical Analysis, HR Analytics, and more. It gives you scope to go back to the domain where you started, even if you find that Data Science itself is not the right fit for you.
To summarize, engage in numerous live projects, internships, and courses that will help you develop a strong profile and take you a long way.