You must specify the required LuceneVersion compatibility when creating CompoundWordTokenFilterBase:
As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
As of 4.4, CompoundWordTokenFilterBase no longer updates offsets.
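
The supplementary-character note in the 3.1 entry is easier to follow with a concrete case. The sketch below is not the library's code; it uses only the Java standard library, and the class and method names (`SupplementaryCharsDemo`, `lowerByChar`, `lowerByCodePoint`) are hypothetical. It assumes the relevant operation is case folding of dictionary entries, and shows why processing a string one UTF-16 code unit at a time misses characters outside the Basic Multilingual Plane, while processing it one code point at a time does not.

```java
public class SupplementaryCharsDemo {

    // Naive approach: lowercases each UTF-16 code unit on its own.
    // A supplementary character is stored as a surrogate pair, and
    // neither half has a case mapping, so it passes through unchanged.
    public static String lowerByChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(s.charAt(i)));
        }
        return sb.toString();
    }

    // Code-point-aware approach: reads a whole code point (one or two
    // chars), lowercases it, and advances by the number of chars consumed.
    public static String lowerByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I; its lowercase form is U+10428.
        String upper = new String(Character.toChars(0x10400));
        String lower = new String(Character.toChars(0x10428));

        System.out.println(upper.length());                        // 2 (one surrogate pair)
        System.out.println(lowerByChar(upper).equals(lower));      // false
        System.out.println(lowerByCodePoint(upper).equals(lower)); // true
    }
}
```

.NET strings are also UTF-16, so the same distinction between code units and code points applies there.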