PoV: Data Science for a newbie.

Flavjo Xhelollari
Feb 15, 2021

After reading and reflecting on several articles on Data Science, I personally think that the article from DataJobs is quite informative. Also, although Wikipedia is usually underestimated in academia, I think that its article on data science is well formulated and accurate. Still, what made the DataJobs article more convincing was that it included the most essential information about all aspects of data science as a new field, whilst the others were either too short or not very inclusive. I also found the article to be more straightforward and down to earth. Moreover, it was easy to follow, and the graphical details included in it made the content easier to digest.

Source: https://cdn.searchenginejournal.com/wp-content/uploads/2019/12/when-to-use-data-science-in-seo-5def8e5b1c22c-760x400.png

All the articles revolved around the same axis. The essence of each of them is that Data Science is a new multidisciplinary field which uses data to analyze and solve problems. They also agreed upon the set of fields that make Data Science what it is. Basically, the pillars of Data Science are Mathematics, Hacking/technological skills, and Business understanding. There were many other points of agreement between them, including the historical foundations of Data Science as a new field, but it is important to mention that they were affirmative when it came to giving a proper, exact definition of Data Science. The fact that Data Science is going to become even more crucial for businesses in the near future was mentioned in most of the papers, which all agreed on its importance as a complex toolbox for profit maximization and efficiency-improving strategies. Therefore, the role of a data scientist in the market is gaining more and more significance, which will surely lead to Data Science becoming a necessity for a wide variety of businesses and for governments themselves. Some may even argue that this will make the rich richer and erode the privacy of citizens, but this is an ethical matter of huge significance, so we will not discuss it here.

I think that one of the most visible differences between the papers was the profile of a data scientist. As a prospective data scientist, I liked the approach of the DataJobs article, which defined a data scientist as a creative person whose mindset makes it possible for them to solve problems by indulging their creativity. It also emphasized the importance of creativity and curiosity in shaping the data scientist's role, a role which fits those who “look” beyond the obvious and who, having a sort of independence, dare to face difficult challenges. This approach to the data scientist’s role was not stressed enough in the other papers, so I found the DataJobs paper more insightful. Additionally, not all the papers focused on the importance of business knowledge in making a good data scientist. To produce better data-driven solutions, one needs to have a good understanding of how businesses work.

Since my first years of elementary school, I have been interested in engineering and computers. I have always seen myself as a person who has the proper skills to enter the world of engineering and computers. That is why I chose to major in Computer Science and to study abroad for a better education. After my first two years in college, I realized that I like Artificial Intelligence and Machine Learning. I am aware that this is a difficult road, but I think I will make it. Machine Learning, as an essential part of Data Science, is surely one of the jobs that relies entirely on data, so engineering data-driven solutions for businesses may be my main professional responsibility in the upcoming years.

Taking the 10 roles mentioned above into consideration, I think that I belong somewhere between three of them: Data Scientist, Programmer, and Engineer. As I said before, the lack of hands-on experience makes me doubt everything I know about these positions. At first sight, being a Data Scientist seems very fancy, and there is a lot of ‘black box’ that comes with it. If you had asked me in high school, I would have thought that Data Scientist was a white-coat lab job. Now, on the contrary, I see myself working in the Data Science field, but I don’t think of myself as a “scientist”, and I guess the terminology confuses other people too. Also, as an Eastern European, I prioritize the Engineering job position, whether I really like it or not. We tend to think that being an Engineer is better than being in some relatively new, risky job position (as seen from the Eastern European perspective). But apart from that, engineering means building, and building things involves creativity and passion, and that’s why I personally think that it is a better job position for me. Regarding the programming part, this is where Computer Science comes in handy. I personally like coding, but I don’t see myself in some office in front of a screen, coding most of the time. Again, that’s why engineering is superior to the other positions for me. One thing I may regret when entering the market is not having gained more knowledge about how a business functions. Since, in my opinion, data science is a strong business tool, it is crucial for a Data Scientist to know how business works.

To elaborate more on the notion of Data Science itself and the role of the data scientist, I think that Data Science is not about being able to build complex algorithms that analyze situations and predict outcomes with high accuracy, but rather about being able to create, manipulate, and use data in such a way that it maximizes the impact on the data-driven company. Generally speaking, companies do not care how complex the algorithm is or how you implement it, as long as you get the work done and fulfill each of the requirements of the project. According to opensource.com, data science is said to be “a branch of computer science dealing with capturing, processing and analyzing data to gain new insights…” [RH] This sounds familiar, but there is a big difference between opensource.com's definition and the others we have seen so far. Opensource.com considers Data Science to be a branch of computer science, whereas most of the other sources define it as a multidisciplinary blend of fields. They also tend to define Data Science as a strong engineering feature. Unsurprisingly, the importance of Data Science is indisputable. Over recent decades, the coming of Web 2.0 exponentially grew the amount of data circulating around companies. Obviously, this led to the need for more advanced data-management capabilities, to which hardware advances gave a huge push. The presence of this huge amount of data in an enormous global market led to the birth of Data Science as an exceptional tool which enhanced the way companies deal with data. This data, now accessible by nearly 4.7 billion people around the world, also gives them the ability to learn, grow and improve, but this means even more data in the pool [JJ]. Thus, new software was needed to manage the big data around us, such as Hadoop and Spark.

It is worth mentioning that statistics is one of the most important parts of being a data scientist. Usually, the process goes like this: gathering data, parsing and normalizing it, and creating routines for a computer to run in order to search for a pattern or visualize it [RH]. Programming languages like Python, R, and Julia are essential to a data scientist. Apart from these technical aspects, a data scientist needs to know how a business itself works and how to use data to achieve its goals [LD]. As for the career itself, data scientist is considered to be “the sexiest job of the 21st century”, and is currently one of the most desired jobs in the US [LD]. Also, since it is applicable not only to big companies and governments but also to startups and smaller businesses, data science is considered to be a vital skill for the coming decades. This is why salaries in data science are skyrocketing.
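To make the process described above a bit more concrete, here is a minimal Python sketch of the gather, parse, normalize, and search-for-a-pattern steps. Everything in it is an assumption made for illustration: the tiny ad-spend/sales dataset is made up, the function names are hypothetical, min-max scaling is just one possible way to normalize, and a Pearson correlation is just one simple stand-in for "searching for a pattern".

```python
import csv
import io
from math import sqrt

# Hypothetical raw data, made up for illustration: ad spend vs. sales.
RAW_CSV = """ad_spend,sales
10,25
20,45
30,65
40,85
"""


def gather(text):
    """Gather: read rows from a CSV source (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))


def parse(rows, column):
    """Parse: convert one column of text values into floats."""
    return [float(row[column]) for row in rows]


def normalize(values):
    """Normalize: min-max scale the values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Search for a pattern: Pearson correlation between two columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rows = gather(RAW_CSV)
spend = normalize(parse(rows, "ad_spend"))
sales = normalize(parse(rows, "sales"))
r = pearson(spend, sales)
print(f"correlation between ad spend and sales: {r:.2f}")
```

In a real project the gathering step would read from files, databases, or APIs, and the pattern search would likely rely on libraries built for the purpose, but the overall shape of the workflow stays the same.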

REFERENCES

  1. [RH] RedHat Inc. (n.d.). What is data science? Retrieved January 30, 2021, from https://opensource.com/resources/data-science
  2. [LD] Doyle, L. (2020, October 01). What does a data scientist do? Retrieved January 30, 2021, from https://www.northeastern.edu/graduate/blog/what-does-a-data-scientist-do/
  3. [JJ] Johnson, J. (2021, January 27). Internet users in the WORLD 2020. Retrieved February 05, 2021, from https://www.statista.com/statistics/617136/digital-population-worldwide/#:~:text=How%20many%20people%20use%20the,percent%20of%20the%20global%20population.


Computer Science and Mathematics recent graduate from @American University in Bulgaria, interested in Data Science and AI.