My journey towards completing the “Professional Certificate in Data Science” from HarvardX (Part 1)
Data Science (DS) is one of the subjects I am most drawn to. During college, whenever I had a chance to work on projects pertaining to DS, I worked on them with a lot of zeal and passion. The field is huge and constantly evolving, so it is important for me to stay abreast of current trends as new techniques and algorithms are researched and developed.
On this path to becoming more skilled in DS, I started the Data Science program offered by Harvard University as part of its MOOC program, HarvardX. It can be taken on the edX platform. It is one of the best-known courses in the field, as it takes you right from the basics of R (which you can use in DS) to developing complex ML algorithms. It consists of 9 modules, each of which takes about 2 months to complete. Each module had some quizzes and programming assignments along the way. To earn the certificate for a module, we had to pass all of its assignments with a score of at least 70%. All in all, the course takes about 1.5 years to complete. Below I talk about the modules and what I learned from each of them.
- Data Science: R Basics : Most of the DS work I have done has been in Python, so here I got the opportunity to work on my DS skills using R. But the first step was to understand R and get my basics clear. In this module I got acquainted with functions and data types in R. I then learned how vectors are used in R, along with some of the sorting techniques and how indexing works. Using this basic knowledge, we started working on importing data from different sources and making basic plots such as line charts, bar charts and histograms. There was also some content on improving our programming basics. This course was fairly straightforward, since I had already learnt some programming languages in the past and could very quickly get my hands dirty with a new one.
- Data Science: Visualization : After we had completed our basics in R, the next module was on visualization. Here I learned about the different types of distributions and visualizations, along with some that were new to me, like boxplots. Then came perhaps the most interesting part of the course, when we were introduced to the ‘ggplot2’ library, a single, very effective library that can be used to plot numerous kinds of graphs in R. We were also introduced to another library named ‘dplyr’, which is used to manipulate and summarize data so that it can feed more complex charts. Lastly, we went through some data visualization principles, such as how important it is never to over-design your graphs and to make sure that the charts are simple to understand and user friendly. This was also a simple course, as I had gone through some of these visualization skills during my undergrad days.
- Data Science: Probability : This is where all the fun began. Probability plays a cardinal role in analytics and data science, and it is important to have these concepts clear before we move on. In this course I learned the concepts of discrete probability, combinations and permutations, and some of their applications, for example the Monty Hall problem. Then the concept of continuous probability was taught. This was followed by material on random variables, sampling, the Central Limit Theorem and so on. On the whole, this course improved my probability skills to a good extent, something that would come in handy in future modules.
- Data Science: Inference and Modeling : This next module was a step up from the knowledge we gained in the Probability module. First we saw how parameters and estimates are used in action. The Central Limit Theorem was put to use in some applications. Following that, we looked into confidence intervals and p-values, which are used quite extensively when working with null and alternative hypotheses. Then came a fun part: we started dabbling in Bayesian statistics, and we also applied all of these skills in our assignments, where we used what we had learnt in this course for election forecasting and for running some association tests. This was a somewhat tough course, as at every corner we were introduced to something new. But it was super exciting at the same time.
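To give a flavor of the Monty Hall problem mentioned under the Probability module, here is a minimal simulation sketch. It is written in Python rather than the R used in the course, and the function names (`play_monty_hall`, `win_rate`) are illustrative choices, not code from the course. It assumes the standard rules: three doors, one prize, and a host who always opens a losing door that the player did not pick.

```python
import random


def play_monty_hall(switch, rng):
    """Play one round; return True if the player ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Switch to the only door that is still closed and not yet picked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch, trials=10_000, seed=1):
    """Estimate the probability of winning by repeating the game many times."""
    rng = random.Random(seed)
    wins = sum(play_monty_hall(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

With enough trials, the estimate for switching should settle near 2/3 and the estimate for staying near 1/3, which is the well-known answer to the puzzle.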
The above 4 courses were really fun to do and had a steep learning curve (especially the last 2 courses).
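As a small taste of the inference material, the sketch below computes an approximate 95% confidence interval for a poll proportion using the normal approximation that the Central Limit Theorem justifies. It is in Python rather than R, the function name is hypothetical, and the poll numbers are made up purely for illustration.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z=1.96 corresponds to roughly 95% coverage.
    """
    p_hat = successes / n                          # sample proportion (the estimate)
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
    margin = z * std_err                           # margin of error
    return p_hat - margin, p_hat + margin


if __name__ == "__main__":
    # Made-up poll: 520 of 1000 respondents favor candidate A.
    lower, upper = proportion_confidence_interval(520, 1000)
    print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

With these made-up numbers the interval straddles 0.5, so a single poll like this would not be enough to call the race, which is the kind of reasoning that election forecasting relies on.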
So those were the first 4 modules I worked on in the Professional Certificate in Data Science by HarvardX. They took me ~6 months to complete, and if you are new to DS I would genuinely recommend that you start looking into them.
Since there are 5 more modules that I have worked on, putting all of them on one page feels like too much information to take in, so I’ll add those in the next article.