Bone-age estimation has been a manual, time-consuming task for radiologists for more than 60 years. Since 2017, AI systems have been able to do it in about 5 seconds, and their accuracy now rivals that of expert radiologists.
Why is AI ideal for bone age?
Traditional Greulich-Pyle and Tanner-Whitehouse 2/3 methods:
- Require a trained pediatric radiologist (1-2 min/image)
- Inter-rater agreement is imperfect: a spread of about ±0.5 yr between readers of the same image is normal
- Same radiologist test-retest consistency: ±0.3 yr
- Pediatric radiologists in Turkey: ~300 (estimated)
AI systems:
- 3-5 seconds
- 100% consistency (same image → same result)
- 0.3-0.5 yr mean absolute error (MAE)
- 24/7 access, broadly scalable
Leading AI models (as of 2026)
1. BoneXpert (Visiana, Denmark)
The first FDA-approved AI bone-age system (2009, Class II). The 2026 v3:
- Range: 6 mo - 17 yrs
- Greulich-Pyle and TW2-RUS scoring
- MAE: 0.42 yr (manual radiologist: ~0.5 yr)
- 100,000+ clinical uses, 30+ European hospitals
- Integrated in pediatric endocrinology routine
2. RSNA Pediatric Bone Age Challenge (2017)
12,611 labeled hand radiographs in the training set, released as an open dataset that catalyzed the deep-learning boom in this space. The winning team (16 Bit Inc.) used an Inception V3-based network that also takes the patient's sex as input and reached an MAE of 4.265 months (≈0.36 yr), the best result in the challenge.
3. DeepASA (Stanford, 2020)
- 14,036 hand radiograph training set
- Vision Transformer (ViT) based
- MAE: 0.39 yr
- Open-source code (PyTorch)
4. Visual Genome BA Model (Google Research, 2024)
- 50,000+ radiographs (privately curated)
- Multi-task: BA + skeletal anomaly detection
- MAE: 0.31 yr (reported; prospective clinical validation ongoing)
5. Turkish local models
No FDA/CE-approved Turkish model yet. Istanbul Medical Faculty + Bilkent University joint work (2024-2025) reports 0.45 yr MAE (n=2,300 Turkish children).
Accuracy metrics explained
MAE (Mean Absolute Error)
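The average of |prediction − ground truth| over all test images. MAE is a mean, not a median, so a 0.4 yr MAE does not mean that half of predictions are within ±0.4 yr. If errors are roughly normally distributed, a 0.4 yr MAE corresponds to a standard deviation of about 0.5 yr: half of predictions fall within about ±0.34 yr, just under 60% within ±0.4 yr, and about 95% within ±1.0 yr.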
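To make the definition concrete, here is a minimal Python sketch. It computes MAE from paired readings and, under the stated assumption of zero-mean normal errors, estimates what share of predictions lands within a given tolerance. The function names and the sample readings are illustrative, not taken from any of the systems above.

```python
import math

def mean_absolute_error(predicted, reference):
    """Mean of |prediction - reference| over paired bone ages (years)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

def normal_coverage(mae, tolerance):
    """Share of predictions within +/- tolerance, ASSUMING zero-mean normal errors.

    For a zero-mean normal error, MAE = sigma * sqrt(2 / pi),
    so sigma = MAE * sqrt(pi / 2).
    """
    sigma = mae * math.sqrt(math.pi / 2)
    return math.erf(tolerance / (sigma * math.sqrt(2)))

# Hypothetical readings (years): model output vs. radiologist reference.
predicted = [10.2, 7.9, 13.6, 5.1]
reference = [10.0, 8.5, 13.0, 5.5]
mae = mean_absolute_error(predicted, reference)  # about 0.45 yr

# Coverage implied by a 0.4 yr MAE under the normality assumption.
within_04 = normal_coverage(0.4, 0.4)  # a bit under 0.6
within_10 = normal_coverage(0.4, 1.0)  # about 0.95
```

Real error distributions are often skewed or heavy-tailed, so the coverage figures are a rough guide. Where a paper reports percentiles of the absolute error directly, prefer those.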
Population validation
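Is the test set spread across ages, ethnicities, and disease states? A model validated only on, say, healthy white American boys aged 5-18 can score very well on that group and still perform markedly worse on a different population (the bias problem). A single headline number hides this, so ask for results per subgroup.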
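A per-subgroup breakdown is simple to compute once the test set carries demographic labels. The sketch below assumes each test row is a dictionary with `predicted` and `reference` bone ages in years plus grouping fields. The field names, cohort labels, and values are hypothetical.

```python
from collections import defaultdict

def mae_by_subgroup(records, key):
    """Return {subgroup: (n, MAE in years)} for records grouped by `key`."""
    errors = defaultdict(list)
    for rec in records:
        errors[rec[key]].append(abs(rec["predicted"] - rec["reference"]))
    return {group: (len(errs), sum(errs) / len(errs)) for group, errs in errors.items()}

# Hypothetical test-set rows; field names and values are illustrative only.
records = [
    {"sex": "F", "cohort": "A", "predicted": 9.8, "reference": 10.0},
    {"sex": "M", "cohort": "A", "predicted": 12.5, "reference": 12.0},
    {"sex": "F", "cohort": "B", "predicted": 7.0, "reference": 8.0},
    {"sex": "M", "cohort": "B", "predicted": 14.5, "reference": 13.5},
]

by_cohort = mae_by_subgroup(records, "cohort")
by_sex = mae_by_subgroup(records, "sex")
```

Returning the sample size alongside each MAE matters: a subgroup with only a handful of images can look good or bad by chance.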
Out-of-distribution performance
Performance on rare pathological images (congenital adrenal hyperplasia (CAH), Turner syndrome, achondroplasia) drops dramatically, because such cases are scarce in training data. As of 2026, AI on rare diseases is still weak.
Clinical validation — what we ask
Accuracy isn't just MAE. More critical questions:
- Did clinician decisions change? Did AI reduce decision time or alter the plan?
- Usability: How robust is the model on low-dose or motion-blurred images?
- Bias: How does performance vary across ethnic, age, sex subgroups?
- Adversarial robustness: Does small image noise dramatically change output?
- Explainable AI: Can the model show which anatomical area drove its prediction (heat maps)?
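The noise question in the list above can be probed with a simple stability check: perturb the image slightly many times and record the largest change in the predicted bone age. The sketch below is a minimal version under stated assumptions. It uses random noise, which is weaker than a true adversarial attack, and `toy_predict` is a hypothetical stand-in for a real model so that the example runs on its own.

```python
import random

def noise_sensitivity(predict, image, noise_level=2.0, trials=20, seed=0):
    """Largest change in predicted bone age (years) under small random pixel noise.

    `predict` is any callable mapping a 2-D list of 0-255 pixel values to years.
    """
    rng = random.Random(seed)
    baseline = predict(image)
    worst = 0.0
    for _ in range(trials):
        # Add uniform noise to every pixel and clamp to the valid range.
        noisy = [
            [min(255.0, max(0.0, px + rng.uniform(-noise_level, noise_level))) for px in row]
            for row in image
        ]
        worst = max(worst, abs(predict(noisy) - baseline))
    return worst

def toy_predict(image):
    """HYPOTHETICAL stand-in model: maps mean intensity linearly to 0-18 years."""
    pixels = [px for row in image for px in row]
    return 18.0 * (sum(pixels) / len(pixels)) / 255.0

# Synthetic 32x32 "image" with mid-range intensities.
image = [[100.0 + (r + c) % 50 for c in range(32)] for r in range(32)]
shift = noise_sensitivity(toy_predict, image)
```

With a real model, a worst-case shift that approaches the model's MAE for noise invisible to the eye would be a warning sign. Passing this check does not prove robustness against deliberately crafted perturbations.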
Limitations (as of 2026)
- AI doesn't replace pediatric radiologists — it's an assistant. Pathology detection (fracture, cyst, dysplasia) is still human work.
- Low explainability — black-box. When an endocrinologist asks "why 12.3 yr?" the AI mostly can't explain beyond a heat map.
- Ethnic training bias — most models trained on US/European samples, less tested on Asian/Turkish populations.
- Regulatory complexity — FDA Class II ≠ CE-MDR Class IIa ≠ Turkey's TİTCK approval.
- Data privacy: hand x-rays are health data, a special category of personal data under both KVKK and GDPR.
AI bone age in Turkey — current state
- Hospital deployment: None. No FDA-approved system imported.
- Research: Bilkent, ITU, Istanbul University at prototype stage.
- Legal: TİTCK Medical Device Classification — Class IIa approval requires clinical validation.
Çocuk Gelişim's AI preview
Our AI Bone-Age tool is at research preview level, not for clinical decisions. Roadmap:
- ✅ Phase 0 (May 2026): Mock prediction prototype, UX testing
- 🔄 Phase 1 (Q3 2026): ResNet-50 + RSNA 2017 dataset, MAE 0.5 yr target
- 🔄 Phase 2 (Q1 2027): Turkish population fine-tuning, IRB-approved clinical validation study (n=500)
- 🔄 Phase 3 (2027-2028): TİTCK Class IIa approval + commercial clinical use
Prerequisite: IRB approval + KVKK compliance + pediatric radiologist supervision.
FAQ
Will AI replace radiologists?
Not in the short or medium term. AI will handle triage and pre-reporting; final sign-off by a pediatric radiologist remains required. A 2024 NEJM report found that an AI + radiologist hybrid made 30% fewer errors than a radiologist alone.
Are there phone-based apps that estimate bone age from a skin photo?
Yes, but not clinically suitable. Bone-age from skin photos peaks at ~60% accuracy — without radiography you can't see internal anatomy.
Is AI bone age more accurate than Khamis-Roche?
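Wrong comparison: they use different inputs and answer different questions. AI bone age (BA) reads a hand x-ray and outputs skeletal age; Khamis-Roche predicts adult height from current height, weight, and mid-parental height (MPH), with no x-ray. You can feed an AI bone age into BA-based height predictions such as Bayley-Pinneau, but the bone-age error then carries through into the height estimate.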
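How a bone-age error carries into a height prediction can be sketched in a few lines. Bayley-Pinneau-style methods divide current height by the fraction of adult height already attained at the child's bone age. The curve below is a made-up linear placeholder, not the published tables, and all names are hypothetical. It only illustrates the direction and rough size of the effect.

```python
def predicted_adult_height(height_cm, bone_age, fraction_attained):
    """Current height divided by the fraction of adult height attained at this bone age."""
    return height_cm / fraction_attained(bone_age)

def placeholder_fraction(bone_age):
    """HYPOTHETICAL curve, NOT the published table: linear from 0.80 at 10 yr to 1.00 at 18 yr."""
    clamped = min(18.0, max(10.0, bone_age))
    return 0.80 + 0.20 * (clamped - 10.0) / 8.0

def height_range(height_cm, bone_age, ba_error, fraction_attained):
    """Adult-height estimates when bone age is off by -ba_error, 0, +ba_error."""
    return tuple(
        predicted_adult_height(height_cm, bone_age + delta, fraction_attained)
        for delta in (-ba_error, 0.0, ba_error)
    )

# A 150 cm child with bone age 12.0 yr and a 0.4 yr bone-age error.
low_ba, central, high_ba = height_range(150.0, 12.0, 0.4, placeholder_fraction)
spread_cm = low_ba - high_ba
```

With this placeholder curve, a ±0.4 yr bone-age error moves the adult-height estimate by roughly 2 cm in either direction. An underestimated bone age gives a taller prediction, and an overestimated one gives a shorter prediction. Real tables are not linear, so the actual spread depends on age and sex.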
Bottom line
As of 2026, AI bone-age systems match or exceed expert radiologist accuracy. But clinical use demands validation + regulation + explainability + bias control. Try our Bone-Age AI prototype free with Premium and join us in building Turkey's first validated clinical AI bone-age system.