Exceptionally, Tyyppiarvo's Data Science Day interview series is published in English, as the event itself is held in English.
The ”Data Science Day Interviews” series introduces the main partners of Data Science Day. The next one in line is Data Scientist Timo Voipio from Solita.
Tell us briefly about yourself and how you ended up working for Solita.
Hi! I am Timo, a Data Scientist at Solita. My journey to Solita has wound through TKK/Aalto University (MSc in Engineering Physics), University of Eastern Finland (PhD in Photonics), and finally a short stay at a multinational IT company, where I worked on a large-scale data warehousing project. While my work there was quite interesting in itself, I yearned for a position involving more R, Python, and SQL and less Word and Excel. A friend, now also a colleague, hinted that I should apply to Solita, and about a month later I signed for a new set of keys and a new laptop. In addition to formal education, I have worked on hobby projects involving data at various scales.
Why do you work at Solita, and what is the role of Data Science in your company?
At Solita I have the privilege of working on projects with different scopes, different clients and industries, and with different technologies. Even though it is a cliché, no two days are the same, which suits me perfectly. The importance of data utilization is increasing rapidly, and our team of 10+ Data Scientists is at the core of helping our clients both to create entirely new business opportunities and to run their existing business more efficiently. One of the reasons why I like working at Solita is that our work has a real, measurable impact. Another significant part of why I work here is that we have a bona fide Data Science team: our aim is to never work alone on a project, and we are constantly learning from each other. Data in general and Data Science specifically are critical parts of Solita's business, and a growing share of our projects involve data analytics in some form or another.
Which tasks belong to your job description? Which statistical methods and programs do you use in your job?
A Data Scientist should be a versatile and flexible data professional; my tasks span everything from getting the data from various sources (ranging from CSV/Excel files to cloud-native data warehouses), through implementing and validating the machine learning model, to presenting results to the client: showing what observations we made from their data, how our machine learning model performs, and how it will impact their business. From the technical point of view, I also contribute to the design of the overall architecture when we deliver the machine learning model or statistical analysis as a component of a larger product; for example, how to deliver sales and delivery predictions as part of a dashboard used by the client's sales & operations planners and executive-level management.
Frequently occurring methods and models are, e.g., linear regression and generalized linear regression, principal component analysis, decision trees, and random forests. Since real-world data tends to have missing values, different imputation methods come into play. Cross-validation is used very frequently for accuracy and error estimates. My usual toolset for exploratory statistics and machine learning is R, RStudio, ggplot2 (for visualization), and RMarkdown, but for more production-grade implementations I would use Python. We tend to favor code-oriented (as opposed to visual, "drag-n-drop") tools due to their repeatability, natural support for version control, straightforward automation, and wide platform support.
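To make two of these staples concrete, here is a small illustrative sketch (my own, not from the interview; all function names are invented) that fits a simple linear regression by ordinary least squares and estimates its out-of-sample error with k-fold cross-validation, using only the Python standard library:

```python
# Illustrative sketch: least-squares line fit + k-fold cross-validation.
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    return a, b


def cv_rmse(xs, ys, k=5, seed=0):
    """Estimate out-of-sample RMSE with k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint test folds
    sq_errors = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score on the held-out fold only
        sq_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5


# Noisy synthetic data: y = 2 + 3x + Gaussian noise (sd = 0.5)
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 + 3 * x + rng.gauss(0, 0.5) for x in xs]
print(cv_rmse(xs, ys))  # cross-validated RMSE, close to the noise level
```

The same idea scales directly to the more complex models mentioned above: the model-fitting step changes, but the train/held-out split and the error estimate stay the same.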
Name three skills that you consider most important in your job.
1) Problem solving skills, naturally. This also includes formulating the problem and asking (yourself and the client) whether you are even solving the right problem or not.
2) Communications skills. No matter where you work as a data scientist, you need to be able to communicate with others to first understand the problem and its context, then communicate your methods, assumptions, results, possible error and bias sources, etc. Being able to construct effective, informative, and concise visualizations of data and results is a particularly important and difficult part.
3) Mathematical and algorithmic skills. There is no need to know the ins and outs of every algorithm by heart, but a solid foundation in, e.g., statistics and linear algebra is at the core of understanding different algorithms (and not just treating them as black boxes) and their limitations.
Are there any positions for students in your department, and what kind of assignments do students and new employees typically get to do?
At the moment we are looking for data scientists who already have significant experience under their belt. In general, we look at the skills and experience of an applicant, so students are also welcome to apply! Cultural fit is an important factor, including being passionate about and taking pride in your work, and caring about your colleagues and clients. New employees are assigned to client projects together with more experienced colleagues, so they are working on real client projects with real clients from the beginning. However, from time to time we organize academy-style induction training for people new to the field. The focus of the academy, and hence the profile of the participants we recruit, varies from year to year; if you want to stay posted, please check out our website www.solita.fi, follow us on social media (Facebook, LinkedIn, Twitter), or sign up for our open positions newsletter at https://www.solita.fi/avoimet-tyopaikat/.
What hints would you give to a young statistician who is aiming to work in your field?
Be curious, be open-minded, and be sure to develop a wide array of skills. A modern Data Scientist should be able to work with relational databases (no, NoSQL is not going to kill SQL!) and cloud services (e.g. AWS, Azure), and a working knowledge of good software development practices (source control systems, unit testing, DevOps) will both make your life easier and enhance your employment prospects. Working on projects on your own and publishing them on e.g. GitHub is a nice way of demonstrating your skills.
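As a small, hedged example of combining two of those skills (SQL and a quick sanity check of your own code), the sketch below uses Python's standard-library sqlite3 module against an in-memory database; the table and function names are invented for illustration:

```python
# Practice sketch: a SQL aggregation query wrapped in a testable function.
import sqlite3


def top_customers(conn, limit=3):
    """Return (customer, total_spend) pairs, largest spenders first."""
    return conn.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY total DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


# Quick check against an in-memory database (no server needed)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 10.0), ("acme", 5.0), ("beta", 12.0), ("gamma", 1.0)],
)
print(top_customers(conn, limit=2))  # [('acme', 15.0), ('beta', 12.0)]
```

Small self-contained exercises like this are exactly the kind of project that is easy to publish on GitHub as a demonstration of both SQL fluency and testing habits.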