Evaluating and improving a text classifier for subpopulations.
We use crime types as an example of such subpopulations. The report treats three issues: missingness of text fields, bias in population estimates due to modelling errors, and differences in model performance between crime types. Each issue is analysed and solutions are proposed. The variability in model performance across crime types appears to be the most difficult issue to tackle. The proposed method for evaluating model performance over subpopulations may also be useful in other settings where machine learning is used.