
3 DETAILED data portfolio projects ideas + datasets

Perfect for Beginner, Intermediate and Advanced data professionals 🔥

Are you ready to build your next portfolio project, but don’t know where to start? Or need a little inspiration on what to work on?

I got you.

I spent the better part of last week looking into Kaggle datasets and coming up with project ideas that would look good in a Data portfolio. I picked these ideas because (1) they are unique, and (2) I think many people would be interested in working in these industries.

For each of these projects, I have Beginner, Intermediate and Advanced level project ideas. As we go up the chain of difficulty, you’ll notice:

↳ The level of technical skills required increases. Beginner projects can be done entirely in SQL, while advanced projects would have to use both SQL & Python (& even a data visualization tool).
↳ The guidance for the projects becomes more ambiguous, because that’s how real-world data analytics work is.
↳ The projects become more comprehensive. Beginner projects involve extracting insights, while advanced projects run end to end, from data extraction to an actionable outcome.

Also, I wrote these projects such that the 3 levels are relevant to each other. You can start at any level, and slowly increase the depth of your project to the Advanced level.

If this is your first ever project, and you’re working in SQL, here are detailed instructions on how to set up for your first project.

Ok, no more yapping from me. Here are 3 Data project ideas that you can work on starting today!

Topic 1: YouTube Content Strategy Optimization

Beginner: Imagine you’re an analyst at a digital marketing agency. You’ve been asked to analyze video performance for a client who wants to understand what makes their videos successful.

  1. What are the top 5 keywords by average video views?

  2. Which videos have the highest like-to-view ratio, and what keywords do they use?

  3. What is the average sentiment score for comments on videos published in the last 30 days?

  4. Which keyword generates the most commented videos on average?

  5. Create a simple engagement score (views + likes + comments) and find the top 10 videos.
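If you want a feel for what questions 1 and 5 look like in practice, here is a minimal sketch. It runs plain SQL through Python's built-in `sqlite3` module so you can try it without installing anything. The `videos` table, its column names, and the sample rows are all made up for illustration; your actual dataset will have its own schema.

```python
import sqlite3

# Hypothetical schema and toy rows: these names are assumptions,
# not the columns of any specific Kaggle dataset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos ("
    "video_id TEXT, keyword TEXT, views INTEGER, likes INTEGER, comments INTEGER)"
)
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?, ?, ?)",
    [
        ("v1", "python", 1000, 100, 10),
        ("v2", "python", 3000, 150, 30),
        ("v3", "sql", 500, 80, 5),
        ("v4", "excel", 1500, 40, 20),
    ],
)

# Question 1: top 5 keywords by average video views.
top_keywords = conn.execute(
    """
    SELECT keyword, AVG(views) AS avg_views
    FROM videos
    GROUP BY keyword
    ORDER BY avg_views DESC
    LIMIT 5
    """
).fetchall()

# Question 5: simple engagement score (views + likes + comments), top 10 videos.
top_videos = conn.execute(
    """
    SELECT video_id, views + likes + comments AS engagement_score
    FROM videos
    ORDER BY engagement_score DESC
    LIMIT 10
    """
).fetchall()

print(top_keywords)
print(top_videos)
```

Note that a raw sum like this is dominated by views, since views are usually far larger than likes or comments. That is fine for a first pass, and weighting the components is a natural next step.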

Intermediate: Develop a content performance framework to help your clients understand what performs well (and why), so they can replicate that success across their videos.

  1. Calculate the 7-day, 30-day, and all-time performance percentiles for each video using window functions. Which videos are consistently high performers across all timeframes?

  2. Identify "engagement quality" by analyzing the correlation between comment sentiment distribution and video performance metrics.

  3. Build a cohort analysis: Group videos by publish month and track their average cumulative views over time to find the cohorts with strongest long-term performance.

  4. Create a "comment controversy score" by analyzing the variance in sentiment scores and comment likes within each video. Which keywords tend to generate the most controversial discussions?
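For question 4, one possible starting point is to treat the spread of sentiment within a video as the controversy signal. The sketch below uses only sentiment variance and leaves out comment likes to keep things short. The `(video_id, sentiment)` row shape and the numeric sentiment scale are assumptions for illustration.

```python
from collections import defaultdict
from statistics import pvariance

# Hypothetical comment rows: (video_id, sentiment score).
# The scale (0 = negative, 2 = positive) is an assumption.
comments = [
    ("v1", 2.0), ("v1", 2.0), ("v1", 2.0),               # everyone agrees
    ("v2", 0.0), ("v2", 2.0), ("v2", 0.0), ("v2", 2.0),  # split audience
]


def controversy_scores(rows):
    """Return {video_id: population variance of sentiment} for each video."""
    by_video = defaultdict(list)
    for video_id, sentiment in rows:
        by_video[video_id].append(sentiment)
    return {vid: pvariance(vals) for vid, vals in by_video.items()}


scores = controversy_scores(comments)
print(scores)
```

A video where commenters all feel the same way scores 0, while an evenly split audience scores high. From here you could join the scores back to keywords and average them per keyword.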

Advanced: Develop a Real-time Engagement Quality Monitor. Specifically, we want a system that identifies videos requiring intervention based on comment patterns:

  • Early warning system for negative sentiment spirals

  • Community health scores by content category

  • Moderator resource allocation optimizer

  • Impact analysis of creator responses on sentiment

  • Cross-platform sentiment comparison framework

Topic 2: Mental Health Treatment Gap

Beginner: Imagine you’re an analyst at a global health organization. Create a preliminary report on mental health treatment gaps to guide funding decisions.

  1. Which countries have the highest percentage of untreated anxiety disorders in the most recent year?

  2. What is the average prevalence of depressive disorders by region?

  3. Calculate total mental health burden by country and identify top 10

  4. Compare data coverage availability across different mental disorders

  5. Identify the most common depressive symptoms in the US population

Intermediate: Develop a comprehensive framework for measuring and predicting treatment gap trends across multiple conditions.

  1. Calculate year-over-year prevalence changes (maybe using window functions) to uncover any concerning trends

  2. Build cohort analysis tracking treatment gap evolution by initial prevalence quartiles

  3. Develop a multi-factor "Risk Score" predicting mental health crisis likelihood
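To make the first step concrete, here is a rough sketch of a year-over-year change using the `LAG` window function, again run through Python's built-in `sqlite3` (window functions need SQLite 3.25 or newer, which recent Python builds ship with). The `prevalence` table, its columns, and the numbers are invented for illustration.

```python
import sqlite3

# Hypothetical table and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prevalence (country TEXT, year INTEGER, pct REAL)")
conn.executemany(
    "INSERT INTO prevalence VALUES (?, ?, ?)",
    [
        ("A", 2017, 4.0), ("A", 2018, 4.5), ("A", 2019, 5.5),
        ("B", 2017, 3.0), ("B", 2018, 2.5),
    ],
)

# LAG pulls the previous year's value within each country,
# so the difference is the year-over-year change in percentage points.
yoy = conn.execute(
    """
    SELECT country,
           year,
           pct - LAG(pct) OVER (PARTITION BY country ORDER BY year) AS yoy_change
    FROM prevalence
    ORDER BY country, year
    """
).fetchall()

for row in yoy:
    print(row)
```

The first year for each country has no previous row, so its change comes back as `None` (SQL `NULL`). Decide early whether to drop or keep those rows in your trend analysis.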

Advanced: Choose one comprehensive project direction. Yes, I left these projects ambiguous intentionally. Pick whichever of these feels most interesting, and build out an end-to-end project with little guidance!

  • Real-Time Crisis Detection System: Early warning platform for mental health emergencies

  • Investment Optimization Platform: ROI modeling for mental health interventions

  • Equity Index & Prioritizer: Multi-dimensional tool for reducing mental health inequities

Topic 3: Delivery Performance & Customer Satisfaction Analysis

Beginner: Analyze the company’s delivery performance as a junior analyst investigating basic metrics and their impact on customer reviews.

  1. Calculate average delivery time vs. estimated time by state

  2. Compare review scores across different payment types and delivery speeds

  3. Identify percentage of late deliveries and their impact on satisfaction

  4. Analyze delivery performance across top 5 product categories

  5. Create monthly trends of order volume and review scores
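As a starting point for question 3, here is a small sketch that flags late orders and compares review scores between late and on-time deliveries. The `orders` table, its column names, and the rows are assumptions; map them to whatever your dataset actually provides.

```python
import sqlite3

# Hypothetical schema with ISO-formatted date strings, which compare
# correctly as text. Names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id TEXT, delivered_date TEXT, estimated_date TEXT, review_score INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("o1", "2018-01-05", "2018-01-10", 5),
        ("o2", "2018-01-12", "2018-01-10", 2),
        ("o3", "2018-01-08", "2018-01-09", 4),
        ("o4", "2018-01-20", "2018-01-15", 1),
    ],
)

# is_late is 1 when the order arrived after the estimated date, else 0.
late_summary = conn.execute(
    """
    SELECT delivered_date > estimated_date AS is_late,
           COUNT(*) AS n_orders,
           AVG(review_score) AS avg_review
    FROM orders
    GROUP BY is_late
    ORDER BY is_late
    """
).fetchall()

total_orders = sum(n for _, n, _ in late_summary)
late_orders = sum(n for is_late, n, _ in late_summary if is_late)
late_pct = 100.0 * late_orders / total_orders

print(late_summary)
print(f"Late deliveries: {late_pct:.1f}%")
```

A gap in average review score between the two groups suggests a relationship, but it is not proof that lateness caused the lower scores. Worth calling that out in your write-up.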

Intermediate: Develop a comprehensive delivery performance framework as a senior analyst responsible for optimizing operations across Brazil.

  1. Calculate seller delivery performance percentiles using window functions to identify consistent top performers

  2. Build customer cohort analysis linking first-order delivery experience to 6-month retention

  3. Analyze "delivery promise gap" impact on reviews, including text mining for delivery mentions

  4. Identify operational bottlenecks by analyzing timestamps between order status transitions

  5. Create composite "seller reliability score" and correlate with business outcomes
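For question 1, one way to rank sellers is `PERCENT_RANK`, which places each row on a 0 to 1 scale. This sketch assumes you have already aggregated to one row per seller. The `seller_stats` table and its values are hypothetical.

```python
import sqlite3

# Hypothetical pre-aggregated table: one row per seller.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seller_stats (seller_id TEXT, avg_delivery_days REAL)")
conn.executemany(
    "INSERT INTO seller_stats VALUES (?, ?)",
    [("s1", 3.0), ("s2", 5.0), ("s3", 8.0), ("s4", 12.0), ("s5", 20.0)],
)

# Ordering by days DESC inside the window means the slowest seller gets 0.0
# and the fastest seller gets 1.0, so a higher percentile is better.
percentiles = conn.execute(
    """
    SELECT seller_id,
           PERCENT_RANK() OVER (ORDER BY avg_delivery_days DESC) AS delivery_percentile
    FROM seller_stats
    ORDER BY delivery_percentile DESC
    """
).fetchall()

for seller_id, pct in percentiles:
    print(seller_id, pct)
```

To find consistent top performers, you could compute this percentile per month (add `PARTITION BY` on a month column) and keep the sellers who stay above a threshold in every month.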

Advanced: Design enterprise-scale delivery optimization systems as a data scientist working on strategic initiatives. Pick one of these big ideas and get started!

  • Predictive Delivery Risk System: ML model predicting delays with intervention strategies and ROI analysis

  • Dynamic Delivery Promise Engine: Personalized delivery estimates based on multiple factors with A/B testing framework

  • Seller Performance Platform: Automated monitoring, root cause analysis, and improvement recommendations with stakeholder management considerations

Thank you for reading until the end of this post and for sharing your time with me! Your time and attention are all I ask for.

But if you’d like to support me, here are a few ways you can do so: