Ask Data Dawn
3 DETAILED data portfolio project ideas + datasets
Perfect for Beginner, Intermediate and Advanced data professionals 🔥

Are you ready to build your next portfolio project, but don’t know where to start? Or need a little inspiration on what to work on?
I got you.
I spent the better part of last week digging through Kaggle datasets and coming up with project ideas that would look good in a data portfolio. I picked these ideas because (1) they are unique, and (2) I think many people would be interested in working in these industries.
For each of these projects, I have Beginner, Intermediate and Advanced level project ideas. As we go up the chain of difficulty, you’ll notice that:
↳ The level of technical skill required increases. Beginner projects can be done entirely in SQL, while advanced projects require both SQL & Python (& even a data visualization tool).
↳ The guidance for the projects becomes more ambiguous, because that’s what real-world data analytics work is like.
↳ The projects become more comprehensive. Beginner projects involve extracting insights, while advanced projects go end to end, from data extraction to actionable outcomes.
Also, I wrote these projects so that the 3 levels build on each other. You can start at any level and gradually deepen your project until it reaches the Advanced level.
If this is your first ever project, and you’re working in SQL, here are detailed instructions on how to set up for your first project.
Ok, no more yapping from me. Here are 3 Data project ideas that you can work on starting today!
Topic 1: YouTube Content Strategy Optimization
→ Link to dataset
Beginner: Imagine you’re an analyst at a digital marketing agency. You’ve been asked to analyze video performance for a client who wants to understand what makes their videos successful.
What are the top 5 keywords by average video views?
Which videos have the highest like-to-view ratio, and what keywords do they use?
What is the average sentiment score for comments on videos published in the last 30 days?
Which keyword generates the most comments per video on average?
Create a simple engagement score (views + likes + comments) and find the top 10 videos.
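If you want a starting point, here is a minimal sketch of three of the beginner questions. It uses Python's built-in sqlite3 module so the SQL runs without a database server. The table name `videos`, its columns and the rows are made up for illustration, so map them onto the dataset's actual columns before you use it.

```python
import sqlite3

# Toy rows standing in for the real data. The table name and the columns
# (title, keyword, views, likes, comments) are assumptions, not the dataset's schema.
rows = [
    ("Intro to SQL", "sql", 1000, 120, 30),
    ("Pandas tips", "python", 5000, 250, 80),
    ("Window functions", "sql", 3000, 450, 60),
    ("Data viz basics", "tableau", 800, 40, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos (title TEXT, keyword TEXT, views INTEGER, likes INTEGER, comments INTEGER)"
)
conn.executemany("INSERT INTO videos VALUES (?, ?, ?, ?, ?)", rows)

# Top keywords by average views (AVG returns a float in SQLite).
top_keywords = conn.execute("""
    SELECT keyword, AVG(views) AS avg_views
    FROM videos
    GROUP BY keyword
    ORDER BY avg_views DESC
    LIMIT 5
""").fetchall()

# Like-to-view ratio: multiplying by 1.0 avoids integer division, and NULLIF
# guards against zero-view videos (some SQL engines raise an error on division by zero).
by_ratio = conn.execute("""
    SELECT title, keyword, likes * 1.0 / NULLIF(views, 0) AS like_to_view
    FROM videos
    ORDER BY like_to_view DESC
""").fetchall()

# The simple engagement score from the list above: views + likes + comments.
top_engagement = conn.execute("""
    SELECT title, views + likes + comments AS engagement_score
    FROM videos
    ORDER BY engagement_score DESC
    LIMIT 10
""").fetchall()
```

Views are usually much larger than likes or comments, so this unweighted sum mostly ranks videos by views. Weighting or normalizing each term is a natural next step once the basic version works.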
Intermediate: Develop a content performance framework that helps your clients understand what performs well (and why) and lets them replicate that success across their videos.
Calculate the 7-day, 30-day, and all-time performance percentiles for each video using window functions. Which videos are consistently high performers across all timeframes?
Identify "engagement quality" by analyzing the correlation between comment sentiment distribution and video performance metrics.
Build a cohort analysis: group videos by publish month and track their average cumulative views over time to find the cohorts with the strongest long-term performance.
Create a "comment controversy score" by analyzing the variance in sentiment scores and comment likes within each video. Which keywords tend to generate the most controversial discussions?
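One way to approach the percentile question is sketched below with sqlite3 and `PERCENT_RANK()`. It assumes a hypothetical `daily_views` table that holds the views each video gained per day. The real dataset may only store cumulative totals, so treat the table, the integer `day` column and the "as of" day as illustrative assumptions.

```python
import sqlite3

# Hypothetical table: one row per video per day with the views gained that day.
# The dataset may only hold cumulative totals, so names and values here are assumptions.
daily = [
    ("a", 1, 100), ("a", 25, 50), ("a", 58, 40),
    ("b", 1, 500), ("b", 40, 10), ("b", 59, 5),
    ("c", 30, 20), ("c", 55, 300), ("c", 60, 200),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_views (video_id TEXT, day INTEGER, views INTEGER)")
conn.executemany("INSERT INTO daily_views VALUES (?, ?, ?)", daily)

# Sum views inside each timeframe ending on day :as_of, then rank videos with
# PERCENT_RANK = (rank - 1) / (rows - 1): lowest total gets 0.0, highest gets 1.0.
percentiles = conn.execute("""
    WITH totals AS (
        SELECT video_id,
               SUM(CASE WHEN day > :as_of - 7  THEN views ELSE 0 END) AS views_7d,
               SUM(CASE WHEN day > :as_of - 30 THEN views ELSE 0 END) AS views_30d,
               SUM(views) AS views_all
        FROM daily_views
        WHERE day <= :as_of
        GROUP BY video_id
    )
    SELECT video_id,
           PERCENT_RANK() OVER (ORDER BY views_7d)  AS pct_7d,
           PERCENT_RANK() OVER (ORDER BY views_30d) AS pct_30d,
           PERCENT_RANK() OVER (ORDER BY views_all) AS pct_all
    FROM totals
    ORDER BY video_id
""", {"as_of": 60}).fetchall()

# "Consistently high" here means the upper half in every timeframe;
# the 0.5 cutoff is an arbitrary choice for illustration.
consistent = [video for video, *pcts in percentiles if min(pcts) >= 0.5]
```

Video `a` shows why multiple timeframes matter. It is middle-of-the-pack recently but the weakest all-time, while `c` ranks highest in all three windows.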
Advanced: Develop a Real-Time Engagement Quality Monitor. Specifically, we want a system that identifies videos requiring intervention based on comment patterns:
Early warning system for negative sentiment spirals
Community health scores by content category
Moderator resource allocation optimizer
Impact analysis of creator responses on sentiment
Cross-platform sentiment comparison framework
Topic 2: Mental Health Treatment Gap
Beginner: Imagine you’re an analyst at a global health organization. Create a preliminary report on mental health treatment gaps to guide funding decisions.
Which countries have the highest percentage of untreated anxiety disorders in the most recent year?
What is the average prevalence of depressive disorders by region?
Calculate the total mental health burden by country and identify the top 10
Compare data availability (coverage) across different mental disorders
Identify the most common depressive symptoms in the US population
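"Most recent year" is trickier than it looks when countries report in different years. Here is a minimal sketch, again with sqlite3, that keeps each country's latest row using `ROW_NUMBER()` and then answers the first two questions. The `country_year` table, its columns and every value are hypothetical.

```python
import sqlite3

# Toy country-year rows; the table, column names and numbers are all hypothetical.
rows = [
    ("A", "Europe", 2018, 60.0, 4.0),
    ("A", "Europe", 2019, 55.0, 4.5),
    ("B", "Africa", 2019, 85.0, 5.0),
    ("C", "Africa", 2017, 90.0, 3.0),  # C has no data after 2017
    ("D", "Europe", 2019, 40.0, 6.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE country_year (
        country TEXT, region TEXT, year INTEGER,
        untreated_anxiety_pct REAL, depression_prevalence_pct REAL
    )
""")
conn.executemany("INSERT INTO country_year VALUES (?, ?, ?, ?, ?)", rows)

# Keep each country's most recent reported year, even when it differs
# from the latest year in the table as a whole.
conn.execute("""
    CREATE VIEW latest AS
    SELECT * FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY country ORDER BY year DESC) AS rn
        FROM country_year
    ) AS ranked
    WHERE rn = 1
""")

top_untreated = conn.execute("""
    SELECT country, year, untreated_anxiety_pct
    FROM latest
    ORDER BY untreated_anxiety_pct DESC
    LIMIT 10
""").fetchall()

region_depression = conn.execute("""
    SELECT region, AVG(depression_prevalence_pct) AS avg_prevalence
    FROM latest
    GROUP BY region
    ORDER BY avg_prevalence DESC
""").fetchall()
```

Filtering with `WHERE year = (SELECT MAX(year) FROM country_year)` would silently drop country C here. Whichever definition you choose, it is worth stating it in your report.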
Intermediate: Develop a comprehensive framework for measuring and predicting treatment gap trends across multiple conditions.
Calculate year-over-year prevalence changes (maybe using window functions) to uncover any concerning trends
Build a cohort analysis that tracks how the treatment gap evolves across initial-prevalence quartiles
Develop a multi-factor "Risk Score" predicting mental health crisis likelihood
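For the year-over-year question, `LAG()` is the usual window-function tool. The sketch below computes the change per country and flags countries whose prevalence rose every year. The `prevalence` table and its values are hypothetical, and "rose every year" is just one possible definition of a concerning trend.

```python
import sqlite3
from collections import defaultdict

# Hypothetical prevalence table (percent of population) with toy values.
rows = [
    ("A", 2017, 4.0), ("A", 2018, 4.5), ("A", 2019, 5.0),
    ("B", 2017, 3.0), ("B", 2018, 2.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prevalence (country TEXT, year INTEGER, prevalence_pct REAL)")
conn.executemany("INSERT INTO prevalence VALUES (?, ?, ?)", rows)

# LAG looks one row back within each country's year-ordered series, so each
# country's first year has no previous value and the change is NULL (None in Python).
yoy = conn.execute("""
    SELECT country, year,
           prevalence_pct - LAG(prevalence_pct) OVER (
               PARTITION BY country ORDER BY year
           ) AS yoy_change
    FROM prevalence
    ORDER BY country, year
""").fetchall()

# Collect the non-NULL changes per country, then keep countries where every change is positive.
changes = defaultdict(list)
for country, _, change in yoy:
    if change is not None:
        changes[country].append(change)
rising = sorted(c for c, deltas in changes.items() if all(d > 0 for d in deltas))
```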
Advanced: Choose one comprehensive project direction. Yes, I left these projects ambiguous intentionally. Pick whichever of these feels most interesting, and build out an end-to-end project with little guidance!
Real-Time Crisis Detection System: Early warning platform for mental health emergencies
Investment Optimization Platform: ROI modeling for mental health interventions
Equity Index & Prioritizer: Multi-dimensional tool for reducing mental health inequities
Topic 3: Delivery Performance & Customer Satisfaction Analysis
Beginner: Analyze the company’s delivery performance as a junior analyst investigating basic metrics and their impact on customer reviews.
Calculate average delivery time vs. estimated time by state
Compare review scores across different payment types and delivery speeds
Identify the percentage of late deliveries and their impact on satisfaction
Analyze delivery performance across top 5 product categories
Track monthly trends in order volume and review scores
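Here is a minimal sketch of the first few beginner questions in sqlite3. It covers actual vs. estimated delivery time by state, the share of late orders, and review scores for late vs. on-time orders. The `orders` table and its column names are assumptions, and the real dataset's timestamp columns may be named differently.

```python
import sqlite3

# Toy orders; the column names are assumptions, not the real dataset's schema.
orders = [
    # (order_id, customer_state, purchased_at, delivered_at, estimated_at, review_score)
    ("o1", "SP", "2018-01-01", "2018-01-06", "2018-01-10", 5),
    ("o2", "SP", "2018-01-01", "2018-01-12", "2018-01-10", 2),
    ("o3", "RJ", "2018-02-01", "2018-02-08", "2018-02-15", 4),
    ("o4", "RJ", "2018-02-01", "2018-02-10", "2018-02-15", 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id TEXT, customer_state TEXT, purchased_at TEXT,
        delivered_at TEXT, estimated_at TEXT, review_score INTEGER
    )
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)", orders)

# julianday() turns ISO-8601 text into a day count, so differences are in days.
# Comparing the ISO strings directly works because they all share one format.
by_state = conn.execute("""
    SELECT customer_state,
           AVG(julianday(delivered_at) - julianday(purchased_at)) AS avg_actual_days,
           AVG(julianday(estimated_at) - julianday(purchased_at)) AS avg_estimated_days,
           100.0 * AVG(CASE WHEN delivered_at > estimated_at THEN 1 ELSE 0 END) AS pct_late
    FROM orders
    WHERE delivered_at IS NOT NULL
    GROUP BY customer_state
    ORDER BY customer_state
""").fetchall()

# Average review score for late vs. on-time orders.
by_status = conn.execute("""
    SELECT CASE WHEN delivered_at > estimated_at THEN 'late' ELSE 'on_time' END AS status,
           AVG(review_score) AS avg_review
    FROM orders
    WHERE delivered_at IS NOT NULL
    GROUP BY status
    ORDER BY status
""").fetchall()
```

In this toy data both states average 8 days to deliver. SP still has half its orders late, though, because its estimates are tighter. That kind of gap is what the "actual vs. estimated" comparison is meant to surface.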
Intermediate: Develop a comprehensive delivery performance framework as a senior analyst responsible for optimizing operations across Brazil.
Calculate seller delivery performance percentiles using window functions to identify consistent top performers
Build customer cohort analysis linking first-order delivery experience to 6-month retention
Analyze "delivery promise gap" impact on reviews, including text mining for delivery mentions
Identify operational bottlenecks by analyzing timestamps between order status transitions
Create a composite "seller reliability score" and correlate it with business outcomes
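For the bottleneck question, one approach is to average the time spent between consecutive status timestamps and see which stage dominates. The sketch below assumes four hypothetical timestamp columns (`purchased_at`, `approved_at`, `handed_to_carrier_at`, `delivered_at`) and uses toy values.

```python
import sqlite3

# Toy order timestamps; the stage columns are hypothetical.
orders = [
    # (order_id, purchased_at, approved_at, handed_to_carrier_at, delivered_at)
    ("o1", "2018-03-01", "2018-03-02", "2018-03-05", "2018-03-10"),
    ("o2", "2018-03-01", "2018-03-01", "2018-03-04", "2018-03-15"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id TEXT, purchased_at TEXT, approved_at TEXT,
        handed_to_carrier_at TEXT, delivered_at TEXT
    )
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", orders)

# Average days spent between consecutive status transitions.
cur = conn.execute("""
    SELECT AVG(julianday(approved_at) - julianday(purchased_at)) AS approval_days,
           AVG(julianday(handed_to_carrier_at) - julianday(approved_at)) AS seller_handling_days,
           AVG(julianday(delivered_at) - julianday(handed_to_carrier_at)) AS transit_days
    FROM orders
    WHERE delivered_at IS NOT NULL
""")
# cursor.description holds one tuple per result column, with the column name first.
stage_days = dict(zip((col[0] for col in cur.description), cur.fetchone()))

# The stage with the largest average duration is the first bottleneck to investigate.
bottleneck = max(stage_days, key=stage_days.get)
```

Real timestamps may be missing or out of order for some orders, so it is worth checking for those rows before trusting the averages. Grouping the same query by seller gives you one input for the seller reliability score.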
Advanced: Design enterprise-scale delivery optimization systems as a data scientist working on strategic initiatives. Pick one of these big ideas and get started!
Predictive Delivery Risk System: ML model predicting delays with intervention strategies and ROI analysis
Dynamic Delivery Promise Engine: Personalized delivery estimates based on multiple factors with A/B testing framework
Seller Performance Platform: Automated monitoring, root cause analysis, and improvement recommendations with stakeholder management considerations

Thank you for reading until the end of this post and for sharing your time with me! Your time and attention are all I ask for.
But if you’d like to support me, here are a few ways you can do so:
Follow me on Instagram… I’m trying really hard to build out a presence on IG!
Follow me on LinkedIn
Use Interview Master, my SQL practice & interview prep platform
Consider purchasing my Product Data Science Interview ebook