Mastering R: A Data Analyst's Journey of Fear and Fascination
By Andrew Tumwesigye
Data and Business Intelligence Analyst, Dimension Metrics
29/01/2025
A Fearful Beginning
Two years ago, while taking the Google Data Analytics Professional Certificate course, I encountered R programming for the first time. Alongside modules on Excel, SQL, and Tableau, R seemed like a foreign language I would struggle to understand. My initial reaction was to avoid it altogether and focus on what I could understand. To complete the course, I crammed my way through the module but promised myself never to pursue it again.
Fast forward to the end of the course: the instructors advised us that, in the absence of jobs, we should leverage online datasets to build data analytics portfolios. I took that to heart, and within three months I had completed 14 projects, albeit using only SQL and Excel.
The Turning Point
A few months later, while applying for data analytics jobs, I noticed a recurring trend: most job descriptions required knowledge of R or Python alongside SQL, Excel, and a visualization tool such as Tableau, Power BI, or Looker Studio. Realizing that knowledge of a coding language was non-negotiable, I decided to confront my fear. After all, others had mastered it; why couldn't I?
Revisiting the R module in my course, I worked through it systematically, dedicating a month to understanding the material concept by concept. To test my learning, I embarked on a project that required analysis of a city pollution dataset of 3,000 rows and 5 columns. Knowing that I was in deep water, I opted to keep the analysis simple, focusing on aggregation, grouping, and simple visualizations.
This small project unlocked a newfound interest in R. I discovered the power of the tidyverse, a collection of packages that includes ggplot2 for data visualization, dplyr for data manipulation, and tidyr for data cleaning.
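To give a flavor of that first project, here is a minimal sketch of the kind of aggregation, grouping, and plotting it involved. The data frame pollution_data and the columns city and pm25 are hypothetical names standing in for the actual dataset, not the real thing.

library(tidyverse)

# pollution_data, city and pm25 are hypothetical stand-ins for the real dataset
city_summary <- pollution_data %>%
  group_by(city) %>%                                  # group readings by city
  summarise(avg_pm25 = mean(pm25, na.rm = TRUE)) %>%  # aggregate to a city average
  arrange(desc(avg_pm25))                             # sort from highest to lowest

# Simple bar chart of the aggregated values
ggplot(city_summary, aes(x = reorder(city, avg_pm25), y = avg_pm25)) +
  geom_col() +
  coord_flip() +
  labs(x = "City", y = "Average PM2.5 reading")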
Over time, what began as an intimidatingly complex tool became a personal favorite. This is evidenced by the 57 projects I have completed to date, 40% of which leverage the power of R.
Why Analysts and Researchers Should Use R
R’s power lies in its flexibility and statistical focus, making it ideal for specific analytical needs. While I began with a tool-agnostic approach, I soon realized that statistical analyses are where R truly excels compared to SQL and Excel.
From my experience, here are some of the things that make R tick.
Comprehensive Statistical Analysis
R provides tools for advanced statistical computations (sketched briefly in code after this list), including:
Descriptive statistics: Measures of central tendency (mean, median, mode), dispersion (standard deviation), frequency distributions (histograms), and summary statistics.
Inferential statistics:
Hypothesis testing (e.g., chi-square tests, t-tests)
Correlation and regression analysis (linear and logistic)
Detecting outliers using the Interquartile Range (IQR)
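As a quick illustration of these techniques, here is a minimal sketch using base R and the built-in mtcars dataset rather than any of my project data:

# Descriptive statistics: central tendency, dispersion, and summaries
mean(mtcars$mpg)
median(mtcars$mpg)
sd(mtcars$mpg)
summary(mtcars$mpg)
hist(mtcars$mpg)                      # frequency distribution

# Inferential statistics
t.test(mpg ~ am, data = mtcars)       # hypothesis test: mpg by transmission type
cor(mtcars$mpg, mtcars$wt)            # correlation between mileage and weight
lm(mpg ~ wt, data = mtcars)           # simple linear regression

# Outlier detection with the interquartile range (IQR)
q <- quantile(mtcars$mpg, c(0.25, 0.75))
iqr <- IQR(mtcars$mpg)
mtcars$mpg[mtcars$mpg < q[1] - 1.5 * iqr | mtcars$mpg > q[2] + 1.5 * iqr]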
Advanced Visualizations
With the ggplot2 package, R allows you to create and customize complex visualizations that are essential for storytelling and presenting analyses.
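As an example, a handful of ggplot2 layers is enough to build a labelled scatter plot with a fitted trend line. This sketch uses the mpg dataset that ships with ggplot2:

library(ggplot2)

# Engine displacement vs highway mileage, coloured by vehicle class,
# with a single fitted trend line over all points
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Engine size vs highway mileage",
       x = "Displacement (litres)",
       y = "Highway miles per gallon")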
Efficiency for Statistical Computing
Initially developed for statistical computing in the early 1990s, R has a massive community and an extensive collection of packages available via the Comprehensive R Archive Network (CRAN). Together, these make it a one-stop shop for data science tasks.
A Simple R Example
R’s syntax is straightforward once you understand the logic. Here’s a brief example of creating a linear model to predict house prices:
# Linear regression model: price as a function of bedrooms and location
linear_model <- lm(house_prices ~ number_of_bedrooms + location, data = housing_data)
print(linear_model)  # shows the intercept and the estimated coefficients
For a logistic regression model, such as predicting disease presence based on age and blood pressure:
# Logistic regression model: disease presence as a function of blood pressure and age
logistic_result <- glm(disease_presence ~ blood_pressure + age, data = diseases, family = binomial)
print(logistic_result)  # coefficients are on the log-odds scale
As seen above, R lets you assign the result to a name, specify the function, define the dependent and independent variables, and reference the dataset, all in a logical sequence.
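The same logical flow continues once a model is fitted: inspect it, then use it. Here is a brief sketch, again assuming the hypothetical housing_data example from above and an equally hypothetical new_houses data frame with the same columns:

# Detailed output: coefficients, p-values, and R-squared
summary(linear_model)

# Predict prices for new observations (new_houses is a hypothetical data frame
# containing number_of_bedrooms and location columns)
predicted_prices <- predict(linear_model, newdata = new_houses)
head(predicted_prices)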
Conclusion
Today, a year and a half in, R has become my go-to tool for statistical analysis and, to a lesser extent, for data aggregation, grouping, filtering, and categorization. While other tools such as Python, SPSS, and SAS exist, R's versatility, strong community support, and comprehensive package ecosystem make it stand out.[1]
The once-intimidating lines of code are now second nature, proving that with patience and practice, what seems insurmountable can become a valuable skill. R isn't just worth learning for analysis; it's a must-have for any data professional.
[1] Python vs R for Data Science: Which One Should You Learn? DataCamp. https://www.datacamp.com/blog/python-vs-r-for-data-science-whats-the-difference