R and Python have many advantages over MS Excel:
- They are free
- They are open source (with a huge, active community!)
- R and Python have more data manipulation capabilities
- Both R and Python support larger data sets / Big Data applications
- You can automate your workflows
- Code is easily reproducible and scalable
- State-of-the-art statistics and graphics capabilities
- Thousands of libraries are available for specialized tasks
- You can do machine learning, build neural networks, and use other applications in the field of AI
- You can build interactive dashboards and share them online
- You don’t need any support from your IT/ERP Team to roll out BI Tools
- You will improve your position in the job market by strengthening your skills in data-driven, analytical decision making
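To make the points about automation and reproducibility a little more concrete, here is a minimal sketch in Python. It uses only the standard library, and the sample data, the column names (`region`, `month`, `revenue`) and the function name `summarize_revenue` are made up for illustration. In practice the input would typically be a file exported from a spreadsheet or an ERP system.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample data standing in for a CSV export from a spreadsheet.
SAMPLE_CSV = """region,month,revenue
North,2024-01,1200
North,2024-02,1500
South,2024-01,900
South,2024-02,1100
"""


def summarize_revenue(csv_text):
    """Return {region: (total, average)} for the revenue column."""
    by_region = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_region[row["region"]].append(float(row["revenue"]))
    return {
        region: (sum(values), mean(values))
        for region, values in by_region.items()
    }


summary = summarize_revenue(SAMPLE_CSV)
for region, (total, average) in sorted(summary.items()):
    print(f"{region}: total={total:.0f}, average={average:.0f}")
```

Because every step is written down as code, the same summary can be rerun unchanged whenever new data arrives, which is harder to guarantee with manual spreadsheet steps.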
A disadvantage of learning R or Python (or both!) could be the steep learning curve. Nowadays, this can be considered a minor problem. Many world-class online courses are available for learning data science and the associated programming languages. Sites like DataCamp, Udemy, Udacity, edX or Coursera (to name just a few prominent ones) enable you to learn R and Python anywhere, at any time.
In our experience, people can start using R effectively after about 3 months of practicing 1 to 2 hours per day. We are not saying that they will become fully proficient programmers or data scientists in that time, but it is enough training time for them to start adding value to their organization. We found that people with a strong background in quality management (e.g., Six Sigma, TQM) often have a solid foundation in statistical theory, which helps when learning data science.
What surprised us a bit was that trained mathematicians and people from a computer science background had trouble applying the tools to real-world problems. They lacked first-hand knowledge of the problems an organization faces. They could not fully grasp the business problems, and so could not implement data science solutions that improved processes or added value to the business. We think it is best to form a cross-functional team of experts from all parts of the organization affected by the problem at hand. Such a team should include somebody who understands the business problem and somebody who can code a solution for it. In our opinion, a good data scientist has to understand the problem first by talking to the people who face the challenge. Only after the problem has been fully understood should work on a solution begin. If you have somebody who both knows the business problem and understands data science, you have a very valuable asset.
Once you understand the basics of R or Python, it is like getting into a Formula One car instead of your old Excel street car. The additional capabilities that data science software packages offer are staggering. To date, we have not met a single person who went back to Excel for advanced data analysis or statistical problem solving. It is also worth mentioning that the visualization capabilities R offers with the ggplot2 package are, in our view, second to none.
A word of warning! To become an expert in the field of data science, it is more realistic to think in years of training, not months. The body of knowledge in this field is vast and complex, and it can take many years to become truly proficient. Becoming a data science professional is a lifelong journey of learning, not least because the field is constantly developing and improving (e.g., AI, big data, neural networks).