AI and Health

How accurate is AI for bone age? 2026 literature review

BoneXpert (Visiana), DeepASA, Google's research model: what accuracy do today's AI systems achieve? From the RSNA 2017 Challenge to today's clinical practice.

Çocuk Gelişim Scientific Board (Prof. Dr. Bülent Bayraktar) · May 26, 2026 · 4 min read

Bone-age estimation has been a manual, time-consuming task for radiologists for 60+ years. Since 2017, AI systems have been able to do it in about 5 seconds, and their accuracy now rivals that of expert radiologists.

Why is AI ideal for bone age?

Traditional Greulich-Pyle and Tanner-Whitehouse 2/3 methods:

  • Require a trained pediatric radiologist (1-2 min/image)
  • About 95% inter-rater agreement (i.e., a ±0.5 yr spread between readers is normal)
  • Same radiologist test-retest consistency: ±0.3 yr
  • Pediatric radiologists in Turkey: ~300 (estimated)

AI systems:

  • 3-5 seconds
  • 100% consistency (same image → same result)
  • ±0.3-0.5 yr mean absolute error (MAE)
  • 24/7 access, broadly scalable

Leading AI models (as of 2026)

1. BoneXpert (Visiana, Denmark)

The first FDA-approved AI bone-age system (2009, Class II). The 2026 v3:

  • Range: 6 mo - 17 yrs
  • Greulich-Pyle and TW2-RUS scoring
  • MAE: 0.42 yr (manual radiologist: ~0.5 yr)
  • 100,000+ clinical uses, 30+ European hospitals
  • Integrated in pediatric endocrinology routine

2. RSNA Pediatric Bone Age Challenge (2017)

12,611 labeled hand radiographs, released as an open dataset that catalyzed the deep-learning boom in this space. The winning team (16BitInc) used a ResNet50 + InceptionV3 ensemble and reached an MAE of 4.265 months (≈0.36 yr), the best result in the challenge.

3. DeepASA (Stanford, 2020)

  • 14,036 hand radiograph training set
  • Vision Transformer (ViT) based
  • MAE: 0.39 yr
  • Open-source code (PyTorch)

4. Visual Genome BA Model (Google Research, 2024)

  • 50,000+ radiographs (privately curated)
  • Multi-task: BA + skeletal anomaly detection
  • MAE: 0.31 yr (reported; prospective clinical validation ongoing)

5. Turkish local models

No FDA/CE-approved Turkish model yet. Istanbul Medical Faculty + Bilkent University joint work (2024-2025) reports 0.45 yr MAE (n=2,300 Turkish children).

Accuracy metrics explained

MAE (Mean Absolute Error)

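Average of |prediction − ground truth| across all test images. Because MAE is a mean, not a median, it does not by itself say what share of predictions fall inside a given band. If errors are roughly normally distributed, a 0.4 yr MAE implies that about half of predictions are within ±0.34 yr and about 95% are within ±1.0 yr.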
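To make the definition concrete, here is a minimal Python sketch. The sample ages are invented for illustration, and the helper names (`mean_absolute_error`, `gaussian_bounds`) are ours rather than part of any system discussed here. The second helper assumes zero-mean, normally distributed errors, which real models only approximate.

```python
import math
from statistics import NormalDist

def mean_absolute_error(predicted, truth):
    """Average of |prediction - ground truth| over paired samples."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(predicted)

def gaussian_bounds(mae):
    """Assuming zero-mean normal errors, return (median |error|, 95% bound)."""
    sigma = mae * math.sqrt(math.pi / 2)  # because E|X| = sigma * sqrt(2/pi)
    z50 = NormalDist().inv_cdf(0.75)      # half of |errors| fall within z50 * sigma
    z95 = NormalDist().inv_cdf(0.975)     # 95% of |errors| fall within z95 * sigma
    return z50 * sigma, z95 * sigma

# Illustrative bone ages in years (invented values, not clinical data).
predicted = [10.2, 12.9, 7.4, 15.1]
truth = [10.0, 13.5, 7.0, 15.0]
mae_years = mean_absolute_error(predicted, truth)

# What a 0.4 yr MAE implies under the normality assumption.
median_err, bound_95 = gaussian_bounds(0.4)

# Converting the RSNA 2017 winning MAE from months to years.
rsna_winner_years = 4.265 / 12
```

Under the normality assumption, an MAE of 0.4 yr corresponds to a median absolute error of about 0.34 yr and a 95% bound of about 0.98 yr. If a model's errors are skewed or heavy-tailed, these figures will not hold and the empirical percentiles should be reported instead.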

Population validation

Is the test set spread across ages, ethnicities and disease states? As a hypothetical illustration, a model might score 95% on "healthy white American boys 5-18" but drop to 70% on a different ethnicity (the bias problem).
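One simple way to surface this kind of gap is to report MAE per subgroup rather than a single pooled number. The sketch below is a minimal illustration: the `mae_by_subgroup` helper, the group labels and all values are hypothetical, not drawn from any study.

```python
from collections import defaultdict

def mae_by_subgroup(records):
    """Return {subgroup: MAE in years} from (subgroup, predicted, truth) tuples."""
    errors = defaultdict(list)
    for group, predicted, truth in records:
        errors[group].append(abs(predicted - truth))
    return {group: sum(vals) / len(vals) for group, vals in errors.items()}

# Invented records for illustration only:
# (subgroup label, predicted bone age, reference bone age), ages in years.
records = [
    ("group_a", 10.1, 10.0),
    ("group_a", 12.3, 12.0),
    ("group_b", 9.0, 10.0),
    ("group_b", 13.0, 12.5),
]

per_group = mae_by_subgroup(records)

# Difference between the worst and best subgroup MAE.
gap = max(per_group.values()) - min(per_group.values())
```

A large `gap` suggests the pooled MAE is hiding uneven performance. In a real evaluation each subgroup would also need enough samples for its MAE to be meaningful.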

Out-of-distribution performance

Performance on rare pathological images (CAH, Turner, achondroplasia) drops dramatically. As of 2026, AI on rare diseases is still weak.

Clinical validation — what we ask

Accuracy isn't just MAE. More critical questions:

  1. Did clinician decisions change? Did AI reduce decision time or alter the plan?
  2. Usability: How robust on low-dose or motion-blurred images?
  3. Bias: How does performance vary across ethnic, age, sex subgroups?
  4. Adversarial robustness: Does small image noise dramatically change output?
  5. Explainable AI: Can the model show which anatomical area drove its prediction (heat maps)?
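The noise question (item 4) can be probed with a simple harness, sketched below under stated assumptions. The `predict` interface (a 2-D list of pixel values in [0, 1] mapped to a bone age in years) is hypothetical, and `toy_predict` is a stand-in, not a real estimator. This measures sensitivity to random noise only; a true adversarial test would search for worst-case perturbations and is out of scope here.

```python
import random

def noise_sensitivity(predict, image, noise_std=0.01, trials=20, seed=0):
    """Largest change in predict(image) over `trials` noisy copies of the image.

    `predict` is any callable mapping a 2-D list of pixel values in [0, 1]
    to a bone age in years (a hypothetical interface, not a real model API).
    """
    rng = random.Random(seed)
    baseline = predict(image)
    worst = 0.0
    for _ in range(trials):
        # Add Gaussian noise to every pixel and clip back into [0, 1].
        noisy = [
            [min(1.0, max(0.0, px + rng.gauss(0.0, noise_std))) for px in row]
            for row in image
        ]
        worst = max(worst, abs(predict(noisy) - baseline))
    return worst

def toy_predict(image):
    """Stand-in model: maps mean pixel intensity to 0-18 years. Not a real estimator."""
    pixels = [px for row in image for px in row]
    return 18.0 * sum(pixels) / len(pixels)

image = [[0.5] * 8 for _ in range(8)]  # flat grey 8x8 test image
shift = noise_sensitivity(toy_predict, image)
```

For a real model, a `shift` that is large relative to the model's MAE at small `noise_std` would be a warning sign. The fixed `seed` keeps the check reproducible across runs.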

Limitations (as of 2026)

  1. AI doesn't replace pediatric radiologists — it's an assistant. Pathology detection (fracture, cyst, dysplasia) is still human work.
  2. Low explainability — black-box. When an endocrinologist asks "why 12.3 yr?" the AI mostly can't explain beyond a heat map.
  3. Ethnic training bias — most models trained on US/European samples, less tested on Asian/Turkish populations.
  4. Regulatory complexity — FDA Class II ≠ CE-MDR Class IIa ≠ Turkey's TİTCK approval.
  5. Data privacy — hand x-rays count as biometric data under KVKK and GDPR.

AI bone age in Turkey — current state

  • Hospital deployment: None. No FDA-approved system imported.
  • Research: Bilkent, ITU, Istanbul University at prototype stage.
  • Legal: TİTCK Medical Device Classification — Class IIa approval requires clinical validation.

Çocuk Gelişim's AI preview

Our AI Bone-Age tool is at research preview level, not for clinical decisions. Roadmap:

  1. Phase 0 (May 2026): Mock prediction prototype, UX testing
  2. 🔄 Phase 1 (Q3 2026): ResNet-50 + RSNA 2017 dataset, MAE 0.5 yr target
  3. 🔄 Phase 2 (Q1 2027): Turkish population fine-tuning, IRB-approved clinical validation study (n=500)
  4. 🔄 Phase 3 (2027-2028): TİTCK Class IIa approval + commercial clinical use

Prerequisite: IRB approval + KVKK compliance + pediatric radiologist supervision.

FAQ

Will AI replace radiologists?

Not in the short or medium term. AI will play a triage and pre-report role; final sign-off by a pediatric radiologist remains required. A 2024 NEJM report shows that an AI + radiologist hybrid makes 30% fewer errors than a radiologist alone.

Are there phone-based apps that estimate bone age from a skin photo?

Yes, but not clinically suitable. Bone-age from skin photos peaks at ~60% accuracy — without radiography you can't see internal anatomy.

Is AI bone age more accurate than Khamis-Roche?

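Wrong comparison: they use different inputs. AI bone age (BA) is read from a hand x-ray; Khamis-Roche predicts adult height from current height, weight and mid-parental height (MPH), with no x-ray. You can feed an AI BA into BA-based height predictions such as Bayley-Pinneau, but the BA error then compounds with the error of the height prediction itself.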

Bottom line

As of 2026, AI bone-age systems match or exceed expert radiologist accuracy. But clinical use demands validation + regulation + explainability + bias control. Try our Bone-Age AI prototype free with Premium and join us in building Turkey's first validated clinical AI bone-age system.

Frequently asked questions

Who is "How accurate is AI for bone age? 2026 literature review" for?

It is written for families, coaches and clinicians who need a clear educational summary before deciding whether a pediatric evaluation is needed.

Does this article replace a pediatrician?

No. It is educational content. Diagnosis, treatment and urgent medical concerns should be handled by qualified clinicians.

What is the main takeaway?

As of 2026, AI bone-age systems match or exceed expert radiologist accuracy, but clinical use still demands validation, regulation, explainability and bias control.

When should families seek clinical advice?

Families should seek advice when growth velocity slows, percentiles change rapidly, puberty timing is unusual, symptoms persist, or nutrition concerns are present.

How should this content be used with calculators?

Use article context together with serial measurements and calculator warnings; do not make decisions from a single number.

#artificial-intelligence #AI #bone-age #deep-learning #BoneXpert

⚕️ Medical disclaimer

The information in this article is for educational purposes only and does not constitute medical advice. For decisions about your child's growth, please consult a pediatrician or pediatric endocrinologist.