Data Science for Economics and Finance

Warning

These notes are very much a work in progress. I am still slowly assembling everything, and the site as it currently stands does not reflect the final product. The expectation is to have these notes done by some time in 2022.

In this part of the course we will focus on machine learning methods and databases. At this stage in the world of data science, three of the most frequently used languages are R, Python and SQL. This means that when you enter the job market, you will have to be proficient in at least one of these languages. In terms of statistical analysis R is the undoubted champion, while Python is a great general-purpose programming language that can be used effectively in machine learning. SQL is the language used to interact with databases. We will be using both Python and R as our preferred programming languages, touching briefly on the basics of SQL. You do not have to worry if you have never worked with Python; you can continue working in R if you are more comfortable focusing on just one language at a time.

Just to make it clear, in this course we officially use R, which means I will provide you with R code (even if we work with Python). For my section it is more important that you understand the machine learning concepts; you can then implement them in any language that you prefer. At this stage in your career as an economist / data scientist, you can focus on one programming language to fulfill most of your needs. Concepts will remain the same across most languages (except C++!) and as you develop your skills you will notice that the language you use becomes almost irrelevant. Someone who is trained in Python, for example, should be able to transfer those skills to other languages in a short time frame. The reason to showcase Python in this course is partly to introduce you to a new language and partly to show that you can easily switch between languages if needed. The general mantra is that you should not be afraid to use different tools for different problems. My goal is not to overwhelm you with syntax, but rather to show you how easy it can be to play around with different languages. In my personal workflow I leverage the speed of Julia for computational problems and switch to Python for machine learning problems due to its mature package ecosystem. Whenever I have to do basic data wrangling and visualisation, I find that R is the easiest to use.

Data Science 871

Topics

Readings