A comprehensive study investigating gender localization patterns across FlauBERT model architectures
How is French grammatical gender for nouns and adjectives encoded in BERT embeddings? We investigate whether gender information is localized in a few specific dimensions or distributed across many, using four attribution methods and multiple FlauBERT architectures to identify minimal gender-predictive dimension subsets.
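As a minimal sketch of the probing setup, the snippet below extracts FlauBERT embeddings for isolated French words and fits a logistic-regression gender probe over all dimensions. The checkpoint name follows the public HuggingFace release; the toy word list, mean pooling, and probe choice are illustrative assumptions, not the study's actual data or pipeline.

```python
# Minimal probing sketch: FlauBERT embeddings -> gender probe.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL = "flaubert/flaubert_small_cased"  # one of the four FlauBERT variants
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

def embed(word: str) -> torch.Tensor:
    """Mean-pool the last hidden state over the word's subword tokens."""
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

# Toy labeled words (0 = masculine, 1 = feminine); the study used a much
# larger lexicon of nouns and adjectives.
words = [("mur", 0), ("livre", 0), ("table", 1), ("chaise", 1)]
X = torch.stack([embed(w) for w, _ in words]).numpy()
y = [g for _, g in words]

probe = LogisticRegression(max_iter=1000).fit(X, y)  # full-dimension baseline
```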
Three main hypotheses about gender localization patterns in French BERT embeddings
H1: If gender information is localized within individual models, all attribution methods should converge on shared dimensions at 1-5% thresholds.
H2: If gender follows universal localization principles, all methods should produce consistent dimensional subsets across all architectures.
H3: Small subsets of embedding dimensions suffice for effective gender classification; a probing sketch follows this list.
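H3 can be illustrated with a simple threshold sweep, sketched below: rank dimensions by absolute probe weight on a training split (a generic stand-in for the study's attribution methods) and measure held-out accuracy as the retained fraction shrinks. X and y are the embedding matrix and gender labels from the extraction sketch above, assuming a realistically sized word list.

```python
# Sketch of the H3 sweep: keep only the top fraction of dimensions, ranked
# by |probe weight| on a training split, and test on held-out words.
# Illustrative only; the study's four attribution methods are not shown.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
ranking = np.argsort(-np.abs(
    LogisticRegression(max_iter=1000).fit(X_tr, y_tr).coef_[0]))

for fraction in (0.01, 0.05, 0.10, 0.25, 1.00):
    k = max(1, int(fraction * X.shape[1]))
    dims = ranking[:k]
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:, dims], y_tr)
    acc = clf.score(X_te[:, dims], y_te)
    print(f"{fraction:>4.0%} of dims -> held-out accuracy {acc:.3f}")
```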
Four key findings that reshape our understanding of gender encoding in BERT
Gender information is represented through redundant encoding across many dimensions rather than by singular "gender neurons": 49-85% accuracy is achieved using only 1% of dimensions.
A stable "consensus core" emerges at 10-25% thresholds with significant inter-method agreement, enabling up to 75% dimensionality reduction.
FlauBERT-small-cased proved the most efficient model, often outperforming larger variants while using fewer computational resources.
French adjectives consistently outperform nouns in gender prediction (96-99% vs 89-94% peak accuracy) due to richer morphological markers.
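A hedged sketch of how such a consensus core can be identified: rank dimensions under several scoring functions, keep the top fraction from each, and intersect the resulting sets. The three scores below (probe weights, mutual information, per-dimension correlation) are generic stand-ins, not the study's four attribution methods.

```python
# Consensus-core sketch: intersect the top-k dimension sets produced by
# several independent scoring functions (stand-ins for the study's four
# attribution methods). X, y as in the extraction sketch above.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression

def consensus_core(X, y, fraction=0.25):
    k = max(1, int(fraction * X.shape[1]))
    scores = [
        np.abs(LogisticRegression(max_iter=1000).fit(X, y).coef_[0]),  # probe weights
        mutual_info_classif(X, y, random_state=0),                     # mutual information
        np.abs(np.corrcoef(X.T, np.asarray(y))[-1, :-1]),              # per-dim correlation
    ]
    top_sets = [set(np.argsort(-s)[:k]) for s in scores]
    return set.intersection(*top_sets)  # dimensions every method agrees on

core = consensus_core(X, y, fraction=0.25)
print(f"consensus core at 25%: {len(core)} of {X.shape[1]} dimensions")
```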
H1: LARGELY SUPPORTED (Within-model localization) • H2: REJECTED (Universal patterns) • H3: STRONGLY SUPPORTED (Minimal dimensions suffice)
Comprehensive four-experiment framework with multiple attribution methods and FlauBERT architectures
Comprehensive analysis across 16 experimental configurations, including minimal-subset performance results
Linear classifier results across all models
Multi-layer perceptron results across all models
French Nouns/Adjectives Gender Prediction Performance
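The two classifier panels above correspond to two probe families. The sketch below compares them on the same features with sklearn stand-ins; the MLP width and training settings are assumptions, not the study's configuration.

```python
# Sketch of the two probe families evaluated on identical features.
# Hyperparameters here are illustrative assumptions.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

probes = {
    "linear": LogisticRegression(max_iter=1000),
    "mlp": MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0),
}
for name, clf in probes.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:>6} probe: CV accuracy {acc:.3f}")
```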
Critical Finding: No single reliable "gender neuron" exists. Although individual dimensions show predictive capability, gender information is distributed across multiple correlated dimensions, requiring consensus analysis to identify stable predictive cores.
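A quick ablation makes the point concrete: if gender lived in a single neuron, deleting the top-ranked dimension would collapse accuracy, whereas under redundant encoding it should barely move. The sketch below reuses X and y from earlier and is illustrative only.

```python
# Ablation sketch: drop the single most predictive dimension and check how
# much accuracy survives. Under redundant encoding, the drop is small.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

weights = np.abs(LogisticRegression(max_iter=1000).fit(X, y).coef_[0])
top_dim = int(np.argmax(weights))

full = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
X_ablated = np.delete(X, top_dim, axis=1)  # remove the single best dimension
ablated = cross_val_score(LogisticRegression(max_iter=1000), X_ablated, y, cv=5).mean()

print(f"full: {full:.3f}  |  without dim {top_dim}: {ablated:.3f}")
```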
Transformative discoveries that challenge traditional views of linguistic feature encoding in neural models
BERT embeddings encode gender information through distributed, redundant patterns rather than singular dimensions. This challenges the "localization hypothesis" and supports distributed representation theories.
Up to 75% dimensional reduction is possible with less than 5% accuracy loss, enabling significant model compression for gender-specific tasks while maintaining linguistic competency; a reduction sketch follows below.
Smaller models often outperform larger ones for grammatical feature tasks. FlauBERT-small-cased (512 dims) frequently achieved superior results compared to larger variants.
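The reduction sketch referenced above: store and serve only the consensus-core dimensions. It reuses the core index set from the consensus snippet; the 75% reduction and <5% accuracy-loss figures are the study's findings, not something this toy demo guarantees.

```python
# Compression sketch: persist only consensus-core dimensions for
# gender-specific tasks. `core` comes from the consensus sketch above;
# the output filename is a placeholder.
import numpy as np

core_idx = np.array(sorted(core))
X_reduced = X[:, core_idx]  # e.g. 512 dims -> ~128 at a 25% core
np.savez_compressed("gender_core.npz", indices=core_idx, embeddings=X_reduced)

print(f"dimensionality reduced by {1 - X_reduced.shape[1] / X.shape[1]:.0%}")
```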
Deploy efficient gender-aware models with 75% fewer embedding dimensions while maintaining >95% accuracy. Ideal for mobile and edge computing applications.
Use consensus core dimensions (10-25% subset) for systematic bias analysis and fairness auditing in NLP systems across different architectures.
Leverage stable consensus cores to build interpretability tools that explain how neural models process grammatical features in systematic ways.
Extend methodology to Spanish, German, Italian, and other gendered languages to identify universal vs. language-specific encoding patterns.
Apply consensus core identification to other grammatical features (number, tense, aspect) for comprehensive linguistic analysis frameworks.
Transforming our understanding of gender encoding in neural language models with actionable insights
Distributed encoding confirmed: Gender forms a compact but redundant cluster, not singular "gender neurons"
Model-specific localization patterns: No universal subset exists across all architectures
Consensus core emergence: Stable agreement patterns form at 10-25% thresholds
Morphological superiority: Adjectives consistently outperform nouns due to richer gender markers
Advancing interpretability and efficiency in neural language processing
French grammatical gender in BERT embeddings is encoded through model-specific distributed patterns rather than universal localized dimensions. While individual models show strong within-architecture consensus (15 of 16 configurations at the 5% threshold), no universal subset exists across all architectures. A stable consensus core emerges at 10-25% thresholds, enabling significant model compression with minimal accuracy loss and advancing both theoretical understanding and practical applications in neural language processing.