
AI Classification Engine — Unifying Multi-Modal Sensor Data

CROWN's AI engine translates optical, spectroscopic, and electrical sensor data into unified hair profiles with equal accuracy across all ethnicities.

The Classification Challenge

The CROWN Diagnostic captures data through three distinct sensing modalities — optical micro-imaging, near-infrared spectroscopy, and impedance sensing. Each modality produces a different type of raw data: high-resolution images, spectral absorption curves, and frequency-dependent electrical measurements. The AI classification engine performs the essential work of integrating these heterogeneous data streams into a single, coherent CROWN Hair DNA profile.

This is a non-trivial machine learning problem. The engine must learn relationships between optical appearance, molecular composition, and electrical properties that have never been systematically mapped across the full diversity of human hair types. It must produce consistent results despite natural variation — the same person’s hair differs from crown to nape, from morning to evening, from winter to summer. And it must do all of this with equal accuracy for every hair type, from the straightest Type 1A to the tightest Type 4C coil.

Multi-Modal Data Fusion

Each sensor modality contributes a distinct perspective on hair properties:

Optical data provides structural information — fibre diameter, cross-section geometry, cuticle condition, surface morphology. These measurements capture the physical architecture of the hair shaft at microscopic resolution.

Spectroscopic data provides molecular information — protein integrity, hydration, lipid content, chemical residue signatures. These measurements reveal the internal composition of the fibre, including evidence of chemical treatments that may be invisible to optical inspection.

Impedance data provides functional information — how the hair interacts with moisture over time, how porous the cuticle is to water transfer, how electrical conductivity varies across the fibre. These measurements capture dynamic properties that affect real-world hair behaviour.

The classification engine learns to interpret these complementary data streams together. A high ellipticity index (optical) combined with elevated porosity (impedance) and disrupted disulfide bonds (spectroscopic) tells a different story than the same ellipticity index with low porosity and intact protein structure. The first pattern suggests chemically treated hair; the second suggests naturally textured hair in good condition. The engine must learn to distinguish these patterns reliably.
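The pattern interpretation described above can be sketched in code. This is an illustrative toy, not CROWN's actual model: all feature names, thresholds, and the rule logic are hypothetical assumptions standing in for what a trained classifier would learn.

```python
import numpy as np

def fuse_features(optical: dict, spectro: dict, impedance: dict) -> np.ndarray:
    """Concatenate per-modality features into a single input vector.
    All keys are hypothetical feature names."""
    return np.array([
        optical["ellipticity_index"],     # cross-section geometry
        optical["fibre_diameter_um"],     # shaft thickness
        spectro["disulfide_integrity"],   # 1.0 = intact, 0.0 = fully disrupted
        spectro["hydration_fraction"],
        impedance["porosity_score"],      # water-transfer rate through cuticle
    ])

def interpret(features: np.ndarray) -> str:
    """Toy rule mirroring the distinction in the text: high ellipticity plus
    high porosity plus disrupted bonds reads differently from high
    ellipticity alone."""
    ellipticity, _, disulfide, _, porosity = features
    if ellipticity > 1.5 and porosity > 0.7 and disulfide < 0.5:
        return "likely chemically treated"
    if ellipticity > 1.5 and porosity < 0.3 and disulfide > 0.8:
        return "naturally textured, good condition"
    return "indeterminate"
```

In a real system the hand-written rules would be replaced by a model trained on the CROWN Hair Commons, but the fusion step (a shared representation built from all three modalities) is the essential idea.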

Bias Prevention by Design

The historical pattern in diagnostic AI systems is well documented: models trained predominantly on data from majority populations perform systematically worse for underrepresented groups. In dermatology, AI models trained primarily on lighter skin tones have demonstrated reduced accuracy for darker skin. In facial recognition, training set imbalances have produced measurable disparities in error rates across demographic groups.

CROWN’s classification engine is designed to prevent this pattern from the outset. Equitable performance across all hair types is not an afterthought — it is a foundational engineering requirement that shapes every aspect of model development.

Training data composition. The CROWN Hair Commons is specifically designed to capture comprehensive representation across all hair types, textures, and ethnic backgrounds. Data collection protocols include demographic balance requirements to ensure that no hair type category is systematically underrepresented in the training set.

Performance monitoring by subgroup. Model accuracy is evaluated not only in aggregate but separately for each hair type category. A model that achieves 95% overall accuracy but only 80% accuracy for Type 4C hair would be considered unacceptable under CROWN’s performance standards. Equal accuracy across categories is the metric that matters.
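Per-subgroup evaluation of this kind is straightforward to express. The sketch below is a minimal, hypothetical version: the category labels and the equity tolerance are illustrative, not CROWN's published thresholds.

```python
from collections import defaultdict

def subgroup_accuracy(records):
    """records: iterable of (hair_type, predicted, actual) tuples.
    Returns accuracy computed separately for each hair-type category."""
    correct, total = defaultdict(int), defaultdict(int)
    for hair_type, predicted, actual in records:
        total[hair_type] += 1
        if predicted == actual:
            correct[hair_type] += 1
    return {t: correct[t] / total[t] for t in total}

def passes_equity_check(accuracies, max_gap=0.02):
    """Fail the model if any subgroup trails the best-performing
    subgroup by more than max_gap (tolerance is illustrative)."""
    return max(accuracies.values()) - min(accuracies.values()) <= max_gap
```

Under this check, the 95%-aggregate / 80%-for-Type-4C model described above would fail immediately, because the 15-point gap far exceeds any reasonable tolerance.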

Architectural decisions. The model architecture is designed to learn from shared underlying properties (fibre diameter, porosity, protein integrity) rather than surface-level pattern matching that might encode biased assumptions. This approach means the engine classifies hair based on what it measures, not on what it expects to see.

Continuous validation. As the Commons grows, the engine is continuously retrained and validated against expanding datasets. This ensures that classification accuracy improves for all hair types as more data becomes available, rather than improving only for types that are already well-represented.

Reproducibility Across Devices

A classification system is only useful if it produces consistent results. The same hair sample measured by different CROWN Diagnostic units in different locations — with different ambient lighting, different humidity levels, different operators — must produce statistically equivalent CROWN Hair DNA profiles.

This requirement imposes strict constraints on the AI engine. The model must be robust to the natural variation in sensor readings that arises from environmental and operational differences between measurement sites. It must distinguish genuine differences between hair samples from artefactual differences introduced by measurement conditions.

CROWN addresses this through a multi-layer calibration approach. At the hardware level, each Diagnostic unit undergoes standardisation against reference samples. At the software level, the AI engine incorporates normalisation layers that compensate for systematic differences between devices. At the validation level, cross-device concordance testing ensures that profile assignments remain consistent within defined tolerances.
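The software-level normalisation step can be illustrated with a simple calibration sketch. Assuming each device measures a shared set of reference samples, one minimal approach is a per-device linear correction fitted by least squares; the linear model is an assumption for illustration, and real cross-device calibration may be considerably more involved.

```python
import numpy as np

def fit_device_correction(device_readings, reference_values):
    """Least-squares fit of a gain and offset so that
    gain * reading + offset approximates the reference value."""
    readings = np.asarray(device_readings, dtype=float)
    A = np.vstack([readings, np.ones_like(readings)]).T
    gain, offset = np.linalg.lstsq(A, np.asarray(reference_values, dtype=float),
                                   rcond=None)[0]
    return gain, offset

def normalise(reading, gain, offset):
    """Map a raw reading from one device onto the common scale."""
    return gain * reading + offset
```

Each unit would carry its own fitted (gain, offset) pair, so downstream profiles are computed on a shared scale regardless of which device produced the raw measurement.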

Learning from Limited Data

A practical challenge for any new classification system is the cold-start problem: achieving reliable performance before the training dataset reaches full maturity. The CROWN Hair Commons will eventually contain hundreds of thousands of profiles, but early-stage models must perform credibly with smaller datasets.

CROWN’s approach draws on established techniques from medical AI, where training data scarcity is a familiar constraint. Transfer learning from related domains (material science spectroscopy, textile fibre analysis) provides initial model weights that capture general principles of fibre characterisation. Data augmentation techniques expand the effective training set by generating realistic synthetic variations. Active learning strategies prioritise the collection of training data for hair types where model uncertainty is highest, directing data collection resources where they will have the greatest impact on classification accuracy.
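The active learning strategy mentioned above is commonly implemented as uncertainty sampling: rank unlabelled samples by how unsure the model is and queue the most uncertain for expert labelling. The sketch below assumes a hypothetical `predict_proba` interface returning class probabilities.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability distribution; higher
    entropy means the model is less certain about the sample."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labelling(unlabelled, predict_proba, budget=10):
    """Return up to `budget` samples whose predictions are most
    uncertain, prioritising data collection where it helps most."""
    scored = [(entropy(predict_proba(sample)), sample) for sample in unlabelled]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sample for _, sample in scored[:budget]]
```

In practice this directs scarce labelling effort toward hair types the current model handles worst, which is exactly the behaviour the text describes.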

These techniques do not eliminate the need for comprehensive training data — they manage the path toward it. The long-term accuracy of CROWN’s classification engine depends on the growth of the CROWN Hair Commons, which is why every scan contributes to the collective dataset.

An Evolving System

The AI classification engine is not a fixed system deployed once and left unchanged. It is a research tool that evolves as the underlying science develops.

As new sensor modalities are explored — Raman spectroscopy, thermal analysis, fluorescence imaging — the engine’s architecture must accommodate additional data streams. As the CROWN Hair Commons grows across geographies and demographics, the model must be retrained to reflect expanding diversity. As researchers identify new clinically or socially significant hair properties, the engine must learn to classify them.

This is the nature of research infrastructure. It is built to support inquiry, not to deliver a finished answer. CROWN’s AI classification engine is designed with this principle at its core — flexible enough to incorporate new knowledge, rigorous enough to produce reliable results, and transparent enough to earn the trust of the research community that depends on it.
