public class FieldCleaner
extends java.lang.Object
| Constructor and Description |
|---|
| FieldCleaner() |
| Modifier and Type | Method and Description |
|---|---|
| void | cleanField(java.util.Collection<DataRecord> records, java.lang.String fieldName, DataType fieldType, boolean guess) Clean the values in a particular field within a collection of DataRecords. |
| java.lang.String | guessValue(java.lang.String value, java.util.HashSet<java.lang.String> goodValues) Find the best match from a set of strings for a particular value. |
| void | replaceValue(java.lang.String oldValue, java.lang.String newValue, java.util.Collection<DataRecord> records, java.lang.String fieldName, DataType fieldType) Replace all occurrences of a particular value within a particular field. |
| void | setMaxDistance(int distance) Set the maximum distance for word replacement. |
| void | setMinCount(int count) Set the minimum count for a field value. |

public void cleanField(java.util.Collection<DataRecord> records, java.lang.String fieldName, DataType fieldType, boolean guess)

Clean the values in a particular field within a collection of DataRecords. For a value that occurs fewer times than the minimum count, if the guess flag is set, this method attempts to find a good value to replace it with. Otherwise the value is removed. The case of the values is also normalized if the guess flag is set.

Parameters:
records - the collection of records to clean
fieldName - the field name
fieldType - the field type. If null, the type is determined from the set of records
guess - whether to guess new values for those values which occur less than the minimum count

public java.lang.String guessValue(java.lang.String value, java.util.HashSet<java.lang.String> goodValues)

Find the best match from a set of strings for a particular value.

Parameters:
value - the string value to match
goodValues - the set of values to match against

public void replaceValue(java.lang.String oldValue, java.lang.String newValue, java.util.Collection<DataRecord> records, java.lang.String fieldName, DataType fieldType)

Replace all occurrences of a particular value within a particular field.

Parameters:
oldValue - the old value to replace
newValue - the new value
records - the collection of DataRecords
fieldName - the field where the value occurs
fieldType - the type of field

public void setMaxDistance(int distance)

Set the maximum distance for word replacement.

Parameters:
distance - the new maximum distance

public void setMinCount(int count)

Set the minimum count for a field value.

Parameters:
count - the minimum count
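
The interaction between the minimum count, the maximum distance, and the guess flag can be illustrated with a small self-contained sketch. It is not the FieldCleaner source: the class FieldCleaningSketch and its methods editDistance, bestMatch, and clean are hypothetical names, the sketch works on plain strings rather than DataRecord objects, and it assumes that "distance" means Levenshtein edit distance, which this page does not state. Removal is modelled by storing null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in only; names and behaviour are assumptions, not the FieldCleaner API.
class FieldCleaningSketch {

    // Levenshtein distance via two-row dynamic programming (assumed metric).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Closest good value within maxDistance (case-insensitive), or null if none qualifies.
    // Ties are broken by set iteration order, which is unspecified for a HashSet.
    static String bestMatch(String value, Set<String> goodValues, int maxDistance) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : goodValues) {
            int d = editDistance(value.toLowerCase(), candidate.toLowerCase());
            if (d <= maxDistance && d < bestDistance) {
                best = candidate;
                bestDistance = d;
            }
        }
        return best;
    }

    // Keep values seen at least minCount times; replace rarer ones with a guess,
    // or with null (standing in for removal) when guessing is off or finds nothing.
    static List<String> clean(List<String> values, int minCount, int maxDistance, boolean guess) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        Set<String> good = new HashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                good.add(e.getKey());
            }
        }
        List<String> cleaned = new ArrayList<>();
        for (String v : values) {
            if (good.contains(v)) {
                cleaned.add(v);
            } else {
                cleaned.add(guess ? bestMatch(v, good, maxDistance) : null);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> colors = List.of("red", "red", "red", "blue", "blue", "rde", "purple");
        // prints [red, red, red, blue, blue, red, null]
        System.out.println(clean(colors, 2, 2, true));
    }
}
```

With a minimum count of 2 and a maximum distance of 2, "rde" is two edits away from "red" and is replaced, while "purple" has no good value within range and is dropped. Raising the maximum distance makes guessing more aggressive; raising the minimum count marks more values as candidates for replacement.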