This article proposes a framework for the automated analysis of inner ear anatomy in radiological data, which could facilitate preoperative planning and clinical research. A fully automated pipeline built around a single, dual-headed 3D U-Net was implemented, trained, and evaluated on manually labeled in-house datasets from cadaveric specimens and clinical practice. Model robustness was further evaluated on three independent open-source datasets of cadaveric specimen scans. Ablation studies showed a clear performance benefit from coupling landmark localization with segmentation, together with a dataset-dependent impact on segmentation performance.
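
To illustrate what coupling landmark localization with segmentation can mean at training time, the following is a minimal sketch of a joint objective for a two-headed network. It is not the implementation evaluated here: the soft Dice term, the heatmap regression term, the names `soft_dice_loss`, `heatmap_mse`, and `joint_loss`, and the `landmark_weight` parameter are all assumptions chosen for illustration, and flat lists of floats stand in for voxel volumes.

```python
from typing import Sequence


def soft_dice_loss(pred: Sequence[float], target: Sequence[float],
                   eps: float = 1e-6) -> float:
    """Return 1 - soft Dice overlap between foreground probabilities and a binary mask."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def heatmap_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a predicted and a reference landmark heatmap."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(seg_pred: Sequence[float], seg_target: Sequence[float],
               lm_pred: Sequence[float], lm_target: Sequence[float],
               landmark_weight: float = 1.0) -> float:
    """Weighted sum of the two head losses (assumed form, not taken from the article).

    Both heads share one encoder-decoder, so gradients from either term
    update the shared features.
    """
    seg_term = soft_dice_loss(seg_pred, seg_target)
    lm_term = heatmap_mse(lm_pred, lm_target)
    return seg_term + landmark_weight * lm_term
```

Under these assumptions, setting `landmark_weight=0.0` reduces the objective to segmentation alone, which is one plausible way to set up a segmentation-only ablation for comparison against the coupled model.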