The four pillars
Pillar 01 · SML · Subscription Companion
Dataset: Bank Marketing (Portuguese banking institution direct marketing campaigns)
🔗 kaggle.com/datasets/henriqueyamahata/bank-marketing
~41,000 records of marketing phone calls. Features include age, job, marital status, education, contact channel, prior outcomes, and economic indicators. Target: did the customer subscribe to a term deposit? The classes are imbalanced (~89% no, ~11% yes), and your handling of the imbalance is part of how you'll be judged.
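One way to approach the imbalance is sketched below. It is a minimal illustration, not the expected solution: the data is a synthetic stand-in built with scikit-learn's `make_classification` (roughly 89% negative, 11% positive), and the choice of logistic regression with `class_weight="balanced"` is an assumption made for brevity. The point is the pattern: stratify the split, reweight the classes, and report a metric that reflects the minority class instead of plain accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Bank Marketing table: ~89% "no", ~11% "yes".
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.89, 0.11], random_state=0,
)

# stratify=y keeps the yes/no ratio the same in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" weights errors inversely to class frequency.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]  # probability of "yes"
preds = model.predict(X_test)

# Accuracy is misleading here: always predicting "no" already scores about 0.89.
baseline_accuracy = 1.0 - y_test.mean()
pr_auc = average_precision_score(y_test, scores)  # area under precision-recall
f1_yes = f1_score(y_test, preds)                  # F1 on the "yes" class

print(f"majority-class accuracy: {baseline_accuracy:.3f}")
print(f"PR-AUC: {pr_auc:.3f}  F1 (yes): {f1_yes:.3f}")
```

Resampling (for example SMOTE) or threshold tuning are alternative ways to handle the same problem; whichever you pick, compare against the majority-class baseline.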
Pillar 02 · USML · Find Your Cluster
Dataset: Credit Card Applications (mbsoroush)
🔗 kaggle.com/datasets/mbsoroush/credit-cards-applications
No labels. Find structure in the applicant profiles that is interpretable and useful to a human.
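As a starting point for this pillar, the sketch below scales the features, tries several values of k with k-means, and picks the one with the best silhouette score. Everything here is an assumption for illustration: the data comes from `make_blobs` as a stand-in for numeric applicant features, and k-means is just one possible algorithm. The last step, per-cluster feature means in original units, is a simple way to begin describing clusters in terms a human can read.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for numeric applicant features.
X, _ = make_blobs(n_samples=600, centers=4, n_features=6, random_state=0)

# k-means is distance-based, so put every feature on the same scale first.
X_scaled = StandardScaler().fit_transform(X)

# Score each candidate k by silhouette (higher means tighter, better-separated clusters).
silhouettes = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    silhouettes[k] = silhouette_score(X_scaled, labels)

best_k = max(silhouettes, key=silhouettes.get)

# Refit with the chosen k and summarize each cluster in original units.
final = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_scaled)
profiles = np.vstack([X[final.labels_ == c].mean(axis=0) for c in range(best_k)])

print("best k:", best_k)
print(profiles.round(2))
```

The real dataset will likely need categorical encoding and missing-value handling before this step, and a silhouette score alone does not make clusters useful; check that each profile tells a story.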
Pillar 03 · CV · CIFAKE
Dataset: CIFAKE, Real vs AI-Generated Images (birdy654)
🔗 kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images
120,000 images at 32×32 resolution: 60K real from CIFAR-10 and 60K generated with Stable Diffusion. Task: binary classification (real vs. AI-generated).
Pillar 04 · NLP · Make Reviews Useful
Dataset: Women's E-Commerce Clothing Reviews (nicapotato)
🔗 kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews
~23,000 product reviews with free-text body, 1–5 ratings, recommend/not flags, and structured columns. The default task is 5-class rating prediction from the review text. Alternative framings are allowed only if you also submit a result on the default task.
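For the default task, a classical baseline is worth having before reaching for a transformer. The sketch below is one possible baseline, assuming a TF-IDF plus logistic regression pipeline from scikit-learn. The ten reviews are hypothetical toy examples written for this illustration, not rows from the dataset, and the score shown is computed on the training data, so it only confirms the pipeline runs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews, two per rating; the real data has ~23,000 rows.
reviews = [
    ("Fell apart after one wash, terrible quality", 1),
    ("Awful fit and the fabric feels cheap", 1),
    ("Runs very small and the color was off", 2),
    ("Disappointing, the seams are itchy", 2),
    ("It is okay, nothing special", 3),
    ("Average dress, fits fine but plain", 3),
    ("Nice fabric and a flattering cut", 4),
    ("Pretty top, slightly long in the sleeves", 4),
    ("Absolutely love it, perfect fit", 5),
    ("Gorgeous dress, so many compliments", 5),
]
texts = [text for text, _ in reviews]
ratings = [rating for _, rating in reviews]

# Unigrams + bigrams feed a multiclass logistic regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
model.fit(texts, ratings)

# Training-set macro F1: a smoke test only, not an estimate of real performance.
train_macro_f1 = f1_score(ratings, model.predict(texts), average="macro")
predicted = model.predict(["Love this dress, perfect fit"])[0]

print("train macro F1:", round(train_macro_f1, 3))
print("predicted rating:", predicted)
```

On the real data, hold out a stratified test set and report macro F1 alongside accuracy, since star ratings in review datasets tend to skew toward the high end.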
Recommended tools
Data wrangling & analysis: Python (pandas, polars, numpy), Jupyter, Google Colab, R + tidyverse.
Classical ML: scikit-learn, XGBoost, LightGBM, CatBoost.
Deep learning: PyTorch, TensorFlow, Hugging Face Transformers.
LLM fine-tuning: LoRA / QLoRA on Llama 3.2 1B/3B, Gemma 2 2B, or Phi-3 Mini, all realistic on a free Colab T4 GPU.
LLMs in the app layer (not counted in the Model score): OpenAI, Anthropic Claude, Google Gemini, Ollama for local models.
Visualization & demo UI: Plotly, Matplotlib, Streamlit, Gradio, Hugging Face Spaces.
Deployment: Streamlit Cloud, Hugging Face Spaces, Vercel, Replit, Google Colab.
Free compute & credits
- Google Colab — free GPU access
- Kaggle Notebooks — free GPU/TPU
- Hugging Face Spaces — free tier
- GitHub Student Developer Pack — DigitalOcean, MongoDB, JetBrains, and more
- [Sponsor-provided credits — to be announced at opening ceremony]
Learning resources
- Kaggle Learn — free micro-courses
- fast.ai Practical Deep Learning
- Google ML Crash Course
- Hugging Face NLP Course
- pandas 10 Minute Intro
Mentors
Mentors, identifiable by their AIX Data 2.0 staff shirts, will be on the floor throughout the Friday build window. Ask for help early, not in the last hour. Mentor support is included in the event: they're here to unblock you, not judge you.
Communication
Discord: https://discord.gg/KKPmj4hwt
Email: ai.researchcsulb@gmail.com
Website: datathon26.com
