The Hidden Problem
Machine learning optimizers come with bold performance claims, but how do we know how close they actually get to the true global minimum? In practice, we never know where the global minimum is, so optimizer comparisons rely on relative performance, which can be misleading.
The Blacklight Metaphor
Like a blacklight revealing hidden stains invisible under normal light, Project Blacklight exposes what optimizers actually do in the dark by testing them against problems where we do know the true answer.
Our Approach
Project Blacklight generates toy neural networks small enough that we can find the global minimum through exhaustive search or mathematical analysis. We then test various optimizers (SGD, Adam, RMSprop, BFGS, etc.) with statistical rigor to see how close they actually get.
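To make the pattern concrete, here is a minimal sketch, assuming PyTorch; the dimensions and hyperparameters are illustrative. A single linear layer under MSE loss is a convex warm-up case whose global minimum is the closed-form least-squares solution. The project's actual targets are small nonconvex networks, but the comparison works the same way.

```python
# Minimal sketch, assuming PyTorch. A linear model under MSE loss has a
# known global minimum (the least-squares solution), so the optimizer's
# shortfall is measurable exactly.
import torch

torch.manual_seed(0)
X = torch.randn(64, 4)                                 # toy inputs
y = X @ torch.randn(4, 1) + 0.1 * torch.randn(64, 1)   # noisy targets

# Closed-form global minimum and its loss
w_star = torch.linalg.lstsq(X, y).solution
loss_star = torch.mean((X @ w_star - y) ** 2)

# The same problem handed to an off-the-shelf optimizer
w = torch.zeros(4, 1, requires_grad=True)
opt = torch.optim.Adam([w], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.mean((X @ w - y) ** 2)
    loss.backward()
    opt.step()

print(f"true global minimum loss: {loss_star.item():.6f}")
print(f"Adam's final loss:        {loss.item():.6f}")
```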
Methodology
Our research protocol includes:
- Toy Problem Generation: Create neural networks small enough for exact solutions
- Statistical Rigor: a minimum of 20-50 independent runs per optimizer configuration
- Comprehensive Analysis: p-values, confidence intervals, quartiles, and variance measurements (see the sketch after this list)
- Multiple Architectures: Test across different network topologies and activation functions
- Hyperparameter Sensitivity: Systematic exploration of learning rates and other settings
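A hedged sketch of the statistics layer, assuming NumPy/SciPy; `run_once` is a hypothetical helper that performs one seeded optimization run and returns its final loss.

```python
# Per-configuration summary statistics over repeated seeded runs,
# assuming NumPy/SciPy. run_once(seed) is a hypothetical helper.
import numpy as np
from scipy import stats

def summarize(final_losses):
    """Mean, 95% confidence interval, quartiles, and variance over runs."""
    losses = np.asarray(final_losses)
    mean = losses.mean()
    ci = stats.t.interval(0.95, len(losses) - 1, loc=mean,
                          scale=stats.sem(losses))
    q1, median, q3 = np.percentile(losses, [25, 50, 75])
    return {"mean": mean, "ci95": ci,
            "quartiles": (q1, median, q3), "var": losses.var(ddof=1)}

# 50 seeded runs for one optimizer configuration:
# summary = summarize([run_once(seed) for seed in range(50)])
# p-value comparing two optimizers (nonparametric):
# stats.mannwhitneyu(losses_adam, losses_sgd)
```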
Expected Insights
This research aims to reveal:
- Optimizer Gaps: How far optimizers typically fall short of true optima (one candidate metric is sketched after this list)
- Consistency Patterns: Which optimizers are most reliable across problems
- Hyperparameter Sensitivity: How robust optimizers are to parameter choices
- Problem-Specific Performance: Which optimizers work best for different loss landscapes
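One candidate metric for the first item above (an assumption on our part, not a settled project choice): the relative excess loss over the known optimum.

```python
# Hypothetical gap metric: relative excess loss over the known global
# minimum. 0.0 means the optimizer found the exact optimum.
def optimality_gap(achieved_loss, global_min_loss, eps=1e-12):
    return (achieved_loss - global_min_loss) / (abs(global_min_loss) + eps)
```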
Impact
Project Blacklight aims to provide the machine learning community with:
- Evidence-Based Optimizer Selection: Move beyond marketing claims to actual performance data
- Realistic Expectations: Understand the true limitations of current optimization methods
- Research Direction: Identify where optimizer improvements are most needed
- Practical Guidelines: Provide actionable advice for practitioners
Technical Infrastructure
Development Platform: Forge system with RTX 5070 Ti (16GB VRAM)
Computational Requirements: Embarrassingly parallel, making it well suited to GPU acceleration (see the sketch below)
Timeline: Research phase active, implementation planned for 2025
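Why the embarrassingly parallel structure matters in practice: independent restarts can be batched into a single tensor so one GPU kernel advances every run at once. A minimal sketch, assuming PyTorch; since elementwise optimizers such as Adam keep per-parameter state, the batched update is equivalent to running the restarts separately.

```python
# Batched independent restarts on one GPU, assuming PyTorch. Each of the
# R parameter slices is an independent run; summing the per-run losses
# keeps gradients (and Adam's elementwise state) separate across runs.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
R, N, D = 1024, 64, 4                      # restarts, samples, parameters
X = torch.randn(N, D, device=device)
y = torch.randn(N, 1, device=device)

W = torch.randn(R, D, 1, device=device, requires_grad=True)
opt = torch.optim.Adam([W], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    preds = torch.einsum("nd,rdo->rno", X, W)          # (R, N, 1)
    loss = ((preds - y) ** 2).mean(dim=(1, 2)).sum()   # sum of per-run losses
    loss.backward()
    opt.step()
```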
Research Questions
Key questions we aim to answer:
- How much optimization performance do we actually lose by not finding global minima?
- Are newer optimizers like AdamW meaningfully better than SGD on problems we can solve exactly?
- Do optimizer performance patterns on toy problems predict real-world behavior?
- What loss landscape characteristics make certain optimizers succeed or fail?