Hi, Sami. Thank you for sharing your career story with Real World Data Science. Please tell us a little about yourself and your role in data science.
Hello! I’m Sami Rahman, a passionate head of data engineering and data platform at Penguin Random House, the book publisher that has enriched lives through literature. I started my career in data science five years ago and I’ve evolved into a data generalist with expertise in machine learning, data infrastructure, and data strategy.
What does your job involve?
My role is about harnessing the power of data to drive extraordinary outcomes. I lead a skilled team that empowers our company to use data and cutting-edge technologies for informed decisions and automation. I help shape our organisation’s capabilities in data science, analytics, machine learning, data management, and strategy.
What does “data science” mean to you?
Data science, to me, is a captivating fusion of modern data technologies and computational statistics that tackles business challenges, crafts intelligent automation, and generates insightful revelations.
What do you think is your most important skill as a data scientist?
Active listening is key. A data scientist must be surgical and precise in developing models, analysis, and tools that reinforce the company’s bottom line and operations. Data science exists to create value using data.
How did you get into data science?
I began with a psychology degree, which led to work as a business psychologist, where I discovered psychometric data analysis. After a master’s in countering organised crime and terrorism and a few short jobs in counter-terrorism and intelligence, I decided that it wasn’t for me. Embracing my love for statistics and research, I dove into data science, learned Python through online platforms, and secured my first data scientist role at a WPP agency called Essence.
What, or who, first inspired you to become a data scientist?
I always thought someone like me couldn’t work in data, let alone data science. Dr Suzy Moat’s fascinating talk on machine learning’s application to human behaviour and psychology showed me that a psychologist could make a significant impact in this field, and it inspired me to pursue a data science career.
What were the hurdles or challenges that you needed to overcome on your route into the profession?
Breaking into data science without a typical background in maths/computer science/physics was daunting. Building a Kaggle portfolio and coding models for fun prepared me for interviews. Another challenge was learning to harmonise my “data brain” and “business brain” to solve problems efficiently. Understanding how data solutions impact business problems will always propel you forward.
And what are the challenges that you face now, as a working data scientist?
As I’ve transitioned into management, maintaining my coding prowess is an ongoing challenge. I stay sharp by doing data science and infrastructure development for fun, leveraging tools like ChatGPT and AirOps where I’m rusty. I’m currently building my own cloud data platform and running a lot of image neural networks on it.
What was your first job in data science, and how does it compare to your current role?
As an analytics executive at WPP agency Essence, I tackled data science, cloud engineering, and analytics problems for clients. Those problems were far more singular and tactical in nature. Now, as head of data engineering and data platform at Penguin Random House, I focus on shaping data and technology strategy to align with the company’s broader vision.
What was the most important thing you learned in your first year on the job?
To always consider the bigger picture: how your work integrates with the organisation’s or client’s objectives, delivers value, and aligns with the aspirations of other stakeholders. Actionable insights and value are what matter most.
What have been your career highlights so far?
Two shining moments stand out: being the first of HSBC UK’s three fraud data science leaders, where each of our departments tackled a different type of crime and protected our customers; and developing data strategies and capabilities for analytics, science, and business intelligence at Penguin Random House.
Have there been any mistakes or regrets along the way?
I regret not delving deeper into natural language processing (NLP) or spatial data science, which are now more accessible and growing fields within data science. I reckon the NLP methodologies would’ve been extremely useful seeing as I’m at a publishing company now!
How do you think your role will evolve over the rest of your career?
As data technologies become more accessible, I anticipate data roles will transform. I envision a future where data professionals focus on general AI, quantum machine learning, and multi-dimensional data analytics as traditional specialisms become democratised.
If you were starting out in data science now, what three things would you put at the top of your reading/study list?
I’d recommend Skin in the Game by Nassim Nicholas Taleb, Calling Bullshit: The Art of Scepticism in a Data-Driven World by Jevin West and Carl Bergstrom, and Artificial Intelligence: How Machine Learning Will Shape the Next Decade by Matthew Burgess.
What personal or professional advice would you give for anyone wanting to be a data scientist now?
Success in data science hinges on understanding how it can transform organisations and on engaging with business stakeholders. My advice: never stop listening to the business, because the stakeholders are your biggest allies. I would also encourage you to find a niche that sets you apart from everyone else. When I first started in the field, mine was my expertise in computational psychology and behavioural machine learning.
What new ideas or developments in the field of data science are you personally most excited about or intrigued by?
Transfer learning excites me most, as numerous large technology companies now offer pre-trained models with billions or even trillions of parameters. This will revolutionise industries worldwide, as it will become easier to build high-performing models even when a company has relatively little data of its own.
What do you think will be the main challenges facing data science as a field in the next few years?
The challenge lies in staying relevant amidst the democratisation of data science. Through large language models, low-code, and transfer learning, advanced data science methods will become easier for non-specialists to do and use. Innovation and keeping up with modern data technologies will be crucial.
- Copyright and licence
- © 2023 Royal Statistical Society
This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) International licence. Photo of Sami Rahman is not covered by this licence.
- How to cite
- Tarran, Brian. 2023. “‘I always thought someone like me couldn’t work in data, let alone data science.’” Real World Data Science, April 24, 2023. URL