Université de Lorraine – MSc in NLP – June 12, 2025

Intrinsic Evaluation of Word Embeddings
Attribution-Based Analysis of French Grammatical Gender in BERT Models

A comprehensive study investigating gender localization patterns across FlauBERT model architectures

Research Focus

How is French grammatical gender for nouns and adjectives encoded in BERT embeddings? We investigate whether gender information is localized within specific dimensions or distributed across multiple dimensions, using four attribution methods across multiple FlauBERT architectures to identify minimal gender-predictive subsets.

Research Team

Supervisor
Prof. David Langlois
david.langlois@univ-lorraine.fr
Reviewer
Franco Terranova
franco.terranova@univ-lorraine.fr
Student
Experiment + Analysis + Dashboard
Tayyab M.
muhammad.tayyab5@etu.univ-lorraine.fr
Student
Experiment + Dataset Processing
Celine Zyna-Rahme
celine.zyna-rahme6@etu.univ-lorraine.fr
Student
Dataset Processing
Stephanie Ounanian
stephanie.ounanian3@etu.univ-lorraine.fr
Student
Report Writing
Chenhan Gao
chenhan.gao5@etu.univ-lorraine.fr

Research Framework

Three main hypotheses investigating gender localization patterns in French BERT embeddings

H1

Within-Model Localization

LARGELY SUPPORTED

If gender information is localized within individual models, all attribution methods should converge on shared dimensions at 1-5% thresholds.

Result: 15/16 configurations at 5%
Strong within-model convergence demonstrated
H2

Cross-Model Universal Localization

REJECTED

If gender follows universal localization principles, all methods should produce consistent dimensional subsets across all architectures.

Result: No universal subset found
Gender encoding is model-specific
H3

Small Dimension Subsets Suffice

STRONGLY SUPPORTED

Small subsets of embedding dimensions are sufficient for effective gender classification.

Result: 1% dims = 49-85% accuracy
Minimal dimensions highly effective

Primary Research Questions

1
Dimension Localization
Can gender information be localized to particular embedding dimensions?
2
Method Convergence
Do various attribution methods agree on the same minimal dimension subset?
3
Minimal Requirements
What are the minimal dimensional requirements for gender prediction?
4
Model Size Impact
What is the effect of model size on gender information distribution?

Major Discoveries

Four key findings that reshape our understanding of gender encoding in BERT

Discovery 1: Distributed Gender Encoding

Gender information is represented through redundant encoding across many dimensions rather than by a single "gender neuron".

Evidence:
Only 1% of dimensions already yields 49-85% accuracy, confirming a distributed representation

Discovery 2: Stable Consensus Core

A stable "consensus core" emerges at 10-25% thresholds with significant inter-method agreement, enabling up to 75% dimensionality reduction.

Evidence:
10-25% threshold optimal

Discovery 3: FlauBERT-small Efficiency

FlauBERT-small-cased was found to be the most efficient model, often outperforming larger variants while using fewer computational resources.

Evidence:
512-dim model superior

Discovery 4: Task-Specific Performance

French adjectives consistently outperform nouns in gender prediction (96-99% vs 89-94% peak accuracy) due to richer morphological markers.

Evidence:
Adjectives > Nouns consistently

Hypothesis Evaluation Results

H1: LARGELY SUPPORTED (Within-model localization) • H2: REJECTED (Universal patterns) • H3: STRONGLY SUPPORTED (Minimal dimensions suffice)

15/16
Configurations showing convergence at 5%
0
Universal dimensions across all models
49-85%
Accuracy with just 1% of dimensions

Methodology

Comprehensive four-experiment framework with multiple attribution methods and FlauBERT architectures

FlauBERT Models Tested

Flaubert-small-cased
512 dimensions
Flaubert-base-cased
768 dimensions
Flaubert-base-uncased
768 dimensions
Flaubert-large-uncased
1024 dimensions
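
As a concrete illustration of the embedding-extraction step, the sketch below pulls a word-level vector from one FlauBERT checkpoint with the Hugging Face transformers library. The checkpoint name, the use of the last hidden layer, and mean-pooling over sub-word tokens are illustrative assumptions rather than the study's exact extraction settings.

```python
# Sketch: extracting a word-level embedding from a FlauBERT checkpoint.
# The pooling strategy (mean over sub-word tokens, last hidden layer) is
# an assumption, not necessarily the procedure used in this study.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "flaubert/flaubert_small_cased"  # 512-dimensional variant
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def word_embedding(word: str) -> torch.Tensor:
    """Mean-pool the last-layer hidden states of the word's sub-word tokens."""
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 512)
    # Drop the special tokens (<s> ... </s>) before pooling.
    return hidden[0, 1:-1].mean(dim=0)

vec = word_embedding("chanteuse")
print(vec.shape)  # torch.Size([512])
```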

Dataset Details

Source: Morphalou3 lexical resource
French Nouns: Up to ~16.5K balanced samples
French Adjectives: Up to ~16.5K balanced samples
Preprocessing: Balanced F/M distribution (50/50)
Split: Stratified 80/20 train-test split
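
A minimal sketch of the balancing and splitting step with pandas and scikit-learn follows; the CSV path and column names ("lemma", "gender") are hypothetical placeholders rather than the actual Morphalou3 export used in the study.

```python
# Sketch: balance the gender classes 50/50 and make a stratified 80/20 split.
# File and column names are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("morphalou3_nouns.csv")  # hypothetical export: lemma, gender

# Downsample the majority class to an even feminine/masculine distribution.
n = df["gender"].value_counts().min()
balanced = df.groupby("gender").sample(n=n, random_state=42)

train_df, test_df = train_test_split(
    balanced, test_size=0.20, stratify=balanced["gender"], random_state=42
)
print(len(train_df), len(test_df))
```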

Attribution Methods

SHAP (Permutation) Game-theoretic
LIME (Tabular) Local explanations
Random Forest Gini importance
Ekaterina's Method Cross-validation
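
To make the attribution step concrete, the sketch below computes three of the four importance scores (Random Forest Gini, SHAP permutation, LIME tabular) over an embedding matrix; Ekaterina's cross-validation method is not reproduced because its details are not spelled out here. The random arrays stand in for the FlauBERT embedding matrices, and the small MLP probe and all hyperparameters are illustrative assumptions.

```python
# Sketch of three of the four attribution methods. The random arrays stand in
# for the FlauBERT embedding matrices; the MLP probe and all hyperparameters
# are illustrative, and Ekaterina's method is not shown.
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(800, 512)), rng.normal(size=(200, 512))
y_train, y_test = rng.integers(0, 2, 800), rng.integers(0, 2, 200)

probe = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300,
                      random_state=42).fit(X_train, y_train)

# 1) Random Forest Gini importance: one global score per dimension.
rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
rf_importance = rf.feature_importances_

# 2) SHAP permutation explainer: mean |SHAP value| per dimension (class 1).
explainer = shap.PermutationExplainer(probe.predict_proba, X_train[:100],
                                      max_evals=2 * X_train.shape[1] + 1)
shap_values = explainer(X_test[:25])
shap_importance = np.abs(shap_values.values[..., 1]).mean(axis=0)

# 3) LIME tabular: aggregate absolute local weights over a sample of instances.
lime_exp = LimeTabularExplainer(X_train, mode="classification")
lime_importance = np.zeros(X_train.shape[1])
for x in X_test[:20]:
    exp = lime_exp.explain_instance(x, probe.predict_proba,
                                    num_features=X_train.shape[1])
    for dim, weight in exp.as_map()[1]:
        lime_importance[dim] += abs(weight)
```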

Experimental Framework

Experiment 1: Perceptron + French Nouns
Linear baseline for dimensional importance assessment
Experiment 2: Perceptron + French Adjectives
Linear model comparison within same architecture
Experiment 3: MLP + French Nouns
Non-linear feature combination analysis
Experiment 4: MLP + French Adjectives
Complete experimental matrix coverage
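
A minimal sketch of the two probe families in this 2×2 matrix, reusing the X_train / y_train / X_test / y_test arrays from the attribution sketch above; the hidden-layer size and iteration counts are assumptions, not the study's exact hyperparameters.

```python
# Sketch of the two probe families (linear Perceptron vs. MLP), reusing the
# embedding arrays defined in the attribution sketch above.
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier

def run_probe(probe, X_tr, y_tr, X_te, y_te):
    """Train a probe on full-dimensional embeddings and report test accuracy."""
    probe.fit(X_tr, y_tr)
    return accuracy_score(y_te, probe.predict(X_te))

linear_acc = run_probe(Perceptron(max_iter=1000, random_state=42),
                       X_train, y_train, X_test, y_test)
mlp_acc = run_probe(MLPClassifier(hidden_layer_sizes=(128,), max_iter=500,
                                  random_state=42),
                    X_train, y_train, X_test, y_test)
print(f"Perceptron: {linear_acc:.3f}  MLP: {mlp_acc:.3f}")
```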

Dimensionality Reduction Pipeline

1
Baseline Training
Full dimensions
2
Attribution Analysis
4 methods
3
Ranking
Importance scores
4
Thresholding
1%-75% subsets
5
Retraining
Reduced sets
6
Performance
Evaluation
7
Consensus
Overlap analysis
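
Steps 3-6 of this pipeline can be sketched as follows, reusing the arrays and the rf_importance vector from the sketches above; the threshold list and the Perceptron probe are illustrative choices, not the study's exact configuration.

```python
# Sketch of pipeline steps 3-6 (ranking, thresholding, retraining, evaluation),
# reusing the arrays and importance scores from the sketches above.
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

def top_dims(importance: np.ndarray, percent: float) -> np.ndarray:
    """Indices of the highest-ranked dimensions for a given threshold."""
    k = max(1, int(round(len(importance) * percent / 100)))
    return np.argsort(importance)[::-1][:k]

def accuracy_on_subset(dims, X_tr, y_tr, X_te, y_te):
    probe = Perceptron(max_iter=1000, random_state=42)
    probe.fit(X_tr[:, dims], y_tr)
    return accuracy_score(y_te, probe.predict(X_te[:, dims]))

baseline = accuracy_on_subset(np.arange(X_train.shape[1]),
                              X_train, y_train, X_test, y_test)
for pct in (1, 5, 10, 25, 50, 75):
    dims = top_dims(rf_importance, pct)
    acc = accuracy_on_subset(dims, X_train, y_train, X_test, y_test)
    print(f"{pct:>2}% ({len(dims):>3} dims): {acc:.3f} vs. baseline {baseline:.3f}")
```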

Experimental Results

Comprehensive analysis across 16 experimental configurations with minimal subset performance findings

Key Performance Findings

85%
Best 1% performance
MLP, Large model, RF method, Adjectives
99%
Peak adjective accuracy
Perceptron, Small model, SHAP, 25% threshold
75%
Max dimension reduction
With <5% accuracy loss
16
Total configurations
4 models × 2 classifiers × 2 tasks

Perceptron Experiments

Linear classifier results across all models

French Nouns - Minimal Subset Results

Flaubert-small (512 dims) Best Overall
1%: 63% (EKA)
10%: 87% (SHAP)
25%: 92% (SHAP)
Flaubert-large (1024 dims) Large Model
1%: 77% (EKA)
10%: 88% (EKA)
25%: 90% (EKA)
Flaubert-base-cased (768 dims) Medium
1%: 58% (EKA)
10%: 65% (EKA)
25%: 69% (SHAP)
Flaubert-base-uncased (768 dims) Lowest
1%: 55% (SHAP/EKA)
10%: 59% (LIME)
25%: 59% (LIME/EKA)

French Adjectives - Minimal Subset Results

Flaubert-small (512 dims) Outstanding
1%: 73% (EKA)
10%: 96% (SHAP)
25%: 99% (SHAP)
Flaubert-large (1024 dims) Strong
1%: 78% (EKA)
10%: 92% (EKA)
25%: 97% (SHAP)
Flaubert-base-cased (768 dims) Good
1%: 61% (EKA)
10%: 69% (SHAP)
25%: 79% (SHAP)
Flaubert-base-uncased (768 dims) Moderate
1%: 56% (EKA)
10%: 58% (SHAP/EKA)
25%: 60% (SHAP)

MLP Experiments

Multi-layer perceptron results across all models

French Nouns - MLP Minimal Subset Results

Flaubert-small (512 dims) Best MLP
1%: 67% (SHAP/RF)
10%: 92% (SHAP)
25%: 94% (SHAP)
Flaubert-large (1024 dims) Strong MLP
1%: 82% (RF)
10%: 90% (RF)
25%: 92% (SHAP)
Flaubert-base-cased (768 dims) Decent MLP
1%: 62% (RF)
10%: 71% (SHAP)
25%: 75% (SHAP)
Flaubert-base-uncased (768 dims) Lower MLP
1%: 60% (RF)
10%: 63% (RF)
25%: 63% (SHAP/RF)

French Adjectives - MLP Minimal Subset Results

Flaubert-small (512 dims) Excellent MLP
1%: 78% (RF)
10%: 95% (SHAP)
25%: 98% (SHAP)
Flaubert-large (1024 dims) Very Good MLP
1%: 85% (RF)
10%: 94% (RF)
25%: 94% (SHAP/RF)
Flaubert-base-cased (768 dims) Good MLP
1%: 66% (RF)
10%: 79% (RF)
25%: 78% (SHAP)
Flaubert-base-uncased (768 dims) Moderate MLP
1%: 62% (RF)
10%: 65% (RF)
25%: 65% (RF/SHAP)

Key Minimal Subset Insights

French Nouns/Adjectives Gender Prediction Performance

85%
Best 1%
MLP, Large, RF, Adjectives
93%
Best 5%
Perceptron, Small, SHAP, Adjectives
96%
Best 10%
Perceptron, Small, SHAP, Adjectives
99%
Peak 25%
Perceptron, Small, SHAP, Adjectives
+5%
Advantage
Adjectives over Nouns

Consensus Analysis: The Core Discovery

Minimal Consensus (1% threshold)

Triple agreement (All methods): 0-2 dimensions
Best consensus: 2 (Base Uncased, Adjectives)
Perceptron configs with consensus: 6/8 configs
MLP configs with consensus: 4/8 configs

Minimal Consensus (5% threshold)

Triple agreement (All methods): 3-13 dimensions
Best consensus: 13 (Base Uncased, Adjectives)
Perceptron configs with consensus: 8/8 configs
MLP configs with consensus: 7/8 configs

Critical Finding: No single reliable "gender neuron" exists. Although individual dimensions show predictive capability, gender information is distributed across multiple correlated dimensions, requiring consensus analysis to identify stable predictive cores.
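
The consensus figures above come from intersecting the dimension sets selected by the different attribution methods. A minimal sketch of that agreement computation, reusing the importance vectors from the attribution sketch above, is shown below; in the study it is repeated for every model, task, and probe configuration.

```python
# Sketch of the consensus (agreement) analysis: intersect the top-p% dimension
# sets chosen by each attribution method and count the shared dimensions.
import numpy as np

def top_set(importance: np.ndarray, percent: float) -> set:
    k = max(1, int(round(len(importance) * percent / 100)))
    return set(np.argsort(importance)[::-1][:k].tolist())

methods = {"RF": rf_importance, "SHAP": shap_importance, "LIME": lime_importance}

for pct in (1, 5, 10, 25):
    sets = [top_set(v, pct) for v in methods.values()]
    consensus = set.intersection(*sets)
    print(f"{pct:>2}% threshold: {len(consensus)} dimensions shared by all methods")
```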

[Dashboard figures: Feature Overlap Distribution, Consensus Trends, Feature Dimension Analysis]

Research Insights

Transformative discoveries that challenge traditional views of linguistic feature encoding in neural models

Distributed vs. Localized Encoding

BERT embeddings encode gender information through distributed, redundant patterns rather than singular dimensions. This challenges the "localization hypothesis" and supports distributed representation theories.

Evidence:
Even 1% of dimensions (5-10 dims) achieve 49-85% accuracy, indicating redundant encoding across multiple correlated features

Model Compression Potential

Up to 75% dimensional reduction is possible with less than 5% accuracy loss, enabling significant model compression for gender-specific tasks while maintaining linguistic competency.

Practical Impact:
4× smaller models for mobile deployment, faster inference, reduced memory requirements

Architecture Efficiency

Smaller models often outperform larger ones for grammatical feature tasks. FlauBERT-small-cased (512 dims) frequently achieved superior results compared to larger variants.

Evidence:
512-dim: 99% vs 768-dim: 63-82% vs 1024-dim: 92-96% peak performance

Practical Applications

Model Compression

Deploy efficient gender-aware classifiers that use 75% fewer embedding dimensions while maintaining >95% accuracy. Ideal for mobile and edge computing applications.

Bias Detection & Mitigation

Use consensus core dimensions (10-25% subset) for systematic bias analysis and fairness auditing in NLP systems across different architectures.

Interpretability Enhancement

Leverage stable consensus cores to build interpretability tools that explain how neural models process grammatical features in systematic ways.

Cross-Linguistic Research

Extend methodology to Spanish, German, Italian, and other gendered languages to identify universal vs. language-specific encoding patterns.

Feature Engineering

Apply consensus core identification to other grammatical features (number, tense, aspect) for comprehensive linguistic analysis frameworks.

Conclusions & Impact

Transforming our understanding of gender encoding in neural language models with actionable insights

Core Theoretical Contribution

Distributed encoding confirmed: gender is encoded in a compact but redundant cluster of dimensions, not in singular "gender neurons"

Model-specific localization patterns: No universal subset exists across all architectures

Consensus core emergence: Stable agreement patterns form at 10-25% thresholds

Morphological superiority: Adjectives consistently outperform nouns due to richer gender markers

Future Research Directions

Cross-Linguistic
Extend to Romance, Germanic, and Slavic languages for universal pattern identification
Grammatical Features
Apply methodology to number, tense, aspect using same attribution framework
Architectural Analysis
Layer-specific probing and attention mechanism investigation
Downstream Impact
Real-world NLU task performance with compressed models

Research Impact

Advancing interpretability and efficiency in neural language processing

Model compression achievable
75% dimension reduction, <5% accuracy loss
99%
Peak accuracy achieved
Adjective classification, small model
16
Experimental configurations
Comprehensive methodology validation
33K
Total samples analyzed
French nouns + adjectives across models
Acknowledgements
Morphalou3 Dataset • FlauBERT Team • LORIA Laboratory • Prof. Maxime Amblard

Main Conclusion

French grammatical gender in BERT embeddings is encoded through model-specific distributed patterns rather than universal localized dimensions. While individual models show strong within-architecture consensus (15/16 at 5% threshold), no universal subset exists across all architectures. A stable consensus core emerges at 10-25% thresholds, enabling significant model compression with minimal accuracy loss and advancing both theoretical understanding and practical applications in neural language processing.

H1: SUPPORTED
Within-model localization confirmed
H2: REJECTED
No universal cross-model patterns
H3: SUPPORTED
Minimal dimensions highly effective