EntityExtractor

java.lang.Object
- prajna.entity.EntityExtractor

All Implemented Interfaces:

RecordReasoner

Direct Known Subclasses:

AlchemyExtractor, CalaisExtractor, CompositeExtractor, DateTimeVerifier, LingpipeExtractor, OpenNlpExtractor
```
public abstract class EntityExtractor
extends java.lang.Object
implements RecordReasoner
```
Abstract base class for entity extractors. This class provides basic utilities for extracting entities, and typed entities where the extractor supports them, from readers, files, or strings. It also parses DocData and DataRecord elements for entities of interest.

Each of the extract methods returns a Map whose keys are the extracted entities. If the entity extractor supports typing of the entities, the values of the map identify the type of each entity extracted.

Author:

Edward Swing

Constructor Summary

Constructors
Constructor and Description

EntityExtractor()

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`addTagFieldMapping(EntityType tag, java.lang.String field)` Associate a tag generated by the entity extractor with a particular field in a DataRecord.
`void`	`addTextField(java.lang.String field)` Add a field to the set of text fields queried for entity extraction.
`boolean`	`augmentRecord(DataRecord rec)` Augment the data record with information from the reasoner.
`java.util.Map<java.lang.String,EntityTag>`	`extract(java.io.BufferedReader reader)` Extract the mapping of entities from a reader.
`java.util.Map<java.lang.String,EntityTag>`	`extract(DocData data, boolean storeTokens)` Extract the mapping of entities from a DocData object.
`java.util.Map<java.lang.String,EntityTag>`	`extract(java.io.File file)` Extract the mapping of entities from a file.
`java.util.Map<java.lang.String,EntityTag>`	`extract(java.lang.String text)` Extract the mapping of entities from a string of text.
`protected abstract void`	`extractEntities(java.lang.String text, java.util.Map<java.lang.String,EntityTag> tokenMap)` Extract the entities, loading the tokenMap provided.
`abstract java.util.Map<java.lang.String,java.lang.String>`	`extractEntityRawTypes(java.lang.String text)` Retrieve the entities and the raw types for this entity extractor.
`java.lang.String`	`getDefaultTagField()` Get the default tag field used to store all extracted entities in the data record.
`java.lang.String`	`getModel()` Get the model string.
`boolean`	`isDebug()` Check whether the debug flag is set for this extractor.
`void`	`setDebug(boolean flag)` Set the debug flag.
`void`	`setDefaultTagField(java.lang.String field)` Set the default tag field used to store the extracted entities in the data record.
`void`	`setModel(java.lang.String model)` Set the model for this entity extractor.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - EntityExtractor
```
public EntityExtractor()
```
- Method Detail
  - addTagFieldMapping
```
public void addTagFieldMapping(EntityType tag,
                      java.lang.String field)
```
    Associate a tag generated by the entity extractor with a particular field in a DataRecord. When the augmentRecord method is called, any entities matched to the tag are added to the specified field. If the field is null, the entities will be added to the default tag field. If the default tag field is also null, the entities will be mapped to an internal field in the DataRecord.
    
    Parameters:
    tag - a tag generated by the entity extractor
    field - a field within a DataRecord which will contain the entities extracted
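    The fallback order described above (explicit mapping, then default tag field, then internal field) can be sketched as follows. This is a minimal illustration, not the Prajna implementation: `TagFieldResolver`, `resolveField`, and the `INTERNAL_FIELD` placeholder are hypothetical names, and tags are modeled as plain strings rather than `EntityType`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not part of the Prajna API: it models only the
// field-selection fallback, with tags represented as plain strings.
class TagFieldResolver {
    // Assumed placeholder for the internal field; the real name is not documented here.
    static final String INTERNAL_FIELD = "<internal>";

    private final Map<String, String> tagFieldMap = new HashMap<>();
    private String defaultTagField; // stays null until set

    void addTagFieldMapping(String tag, String field) {
        tagFieldMap.put(tag, field); // field may be null
    }

    void setDefaultTagField(String field) {
        defaultTagField = field;
    }

    // Resolve the target field: explicit mapping, then default, then internal.
    String resolveField(String tag) {
        String field = tagFieldMap.get(tag);
        if (field != null) {
            return field;
        }
        if (defaultTagField != null) {
            return defaultTagField;
        }
        return INTERNAL_FIELD;
    }

    public static void main(String[] args) {
        TagFieldResolver resolver = new TagFieldResolver();
        resolver.addTagFieldMapping("PERSON", "people");
        resolver.addTagFieldMapping("LOCATION", null);
        System.out.println(resolver.resolveField("PERSON"));
        System.out.println(resolver.resolveField("LOCATION"));
        resolver.setDefaultTagField("entities");
        System.out.println(resolver.resolveField("LOCATION"));
    }
}
```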
  - addTextField
```
public void addTextField(java.lang.String field)
```
    Add a field to the set of text fields queried for entity extraction. When the augmentRecord method is called, the text in all text fields will be used in the entity extraction process.
    
    Parameters:
    field - the text field name in the DataRecord
  - augmentRecord
```
public boolean augmentRecord(DataRecord rec)
```
    Augment the data record with information from the reasoner. This method extracts entities from every text field registered with the addTextField method, then adds the extracted entities to the data record as text values. If tag-to-field mappings have been added with addTagFieldMapping, the mapping determines which field each entity is added to. Otherwise, if the default tag field has been set, all entities are placed in that field. Otherwise, the entities are stored as internal values.
    
    Specified by:
    
    augmentRecord in interface RecordReasoner
    
    Parameters:
    rec - the data record to augment
    
    Returns:
    true if the record has been augmented, false otherwise
  - extract
```
public java.util.Map<java.lang.String,EntityTag> extract(java.io.BufferedReader reader)
                                                  throws java.io.IOException
```
    Extract the mapping of entities from a reader. This method returns a map of the entities and their types. Even if a particular entity extractor does not support typing the entities, the values of the map should be non-null. This method creates a HashMap and invokes the extractEntities(text, tokenMap) method. The map is loaded through successive reads of the reader until no input remains.
    
    Parameters:
    reader - The reader for reading the textual data.
    
    Returns:
    a map with the entities for the keys, and types for the values
    
    Throws:
    
    java.io.IOException - if there is a problem reading from the reader
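    The reader-based extraction described above follows a template-method shape: one map is created, then filled by repeated calls to the abstract extraction hook. The sketch below illustrates that shape under stated assumptions. `ReaderExtractorSketch` and `CapitalizedWordExtractor` are hypothetical stand-ins rather than Prajna classes, entity types are plain strings instead of `EntityTag`, and reading one line per call is an assumption, since the chunk size is not documented here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the pattern; it is not the Prajna class.
abstract class ReaderExtractorSketch {
    // Subclasses add the entities found in text to tokenMap.
    protected abstract void extractEntities(String text, Map<String, String> tokenMap);

    // Create one map, then load it through successive reads until the reader
    // is exhausted. Line-by-line reading is an assumption of this sketch.
    Map<String, String> extract(BufferedReader reader) throws IOException {
        Map<String, String> tokenMap = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            extractEntities(line, tokenMap);
        }
        return tokenMap;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader =
                new BufferedReader(new StringReader("Alice met Bob\nin Paris"));
        System.out.println(new CapitalizedWordExtractor().extract(reader).keySet());
    }
}

// Toy extractor: treats every capitalized word as an entity typed "UNKNOWN",
// so map values are non-null even though no real typing is performed.
class CapitalizedWordExtractor extends ReaderExtractorSketch {
    @Override
    protected void extractEntities(String text, Map<String, String> tokenMap) {
        for (String word : text.split("\\W+")) {
            if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                tokenMap.put(word, "UNKNOWN");
            }
        }
    }
}
```

    Because the same map is passed to every call, entities found in later reads accumulate alongside earlier ones, and a repeated entity simply overwrites its own entry.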
  - extract
```
public java.util.Map<java.lang.String,EntityTag> extract(DocData data,
                                                boolean storeTokens)
```
    Extract the mapping of entities from a DocData object. This method uses the body of the DocData object as the argument to the extract(String text) method. If the storeTokens flag is set, all extracted entities will be added to the set of keys for the DocData object.
    
    Parameters:
    data - the DocData object
    storeTokens - if the method should store the entities as keys for the DocData object
    
    Returns:
    a map with the entities for the keys, and types for the values
  - extract
```
public java.util.Map<java.lang.String,EntityTag> extract(java.io.File file)
                                                  throws java.io.IOException
```
    Extract the mapping of entities from a file. This method simply opens a reader and calls the extract(BufferedReader reader) method. This method does not distinguish any special format for the text file, treating all terms as equal.
    
    Parameters:
    file - The file containing the text.
    
    Returns:
    a map with the entities for the keys, and types for the values
    
    Throws:
    
    java.io.IOException - if there is a problem reading from the file
  - extract
```
public java.util.Map<java.lang.String,EntityTag> extract(java.lang.String text)
```
    Extract the mapping of entities from a string of text.
    
    Parameters:
    text - The body of text.
    
    Returns:
    a map with the entities for the keys, and types for the values
  - extractEntities
```
protected abstract void extractEntities(java.lang.String text,
                   java.util.Map<java.lang.String,EntityTag> tokenMap)
```
    Extract the entities, loading the tokenMap provided. This method is designed to be called numerous times during the extraction process for a file, document, or data record. After each call, the tokenMap contains the extracted terms as its keys, with an optional type as the value for each term.
    
    Parameters:
    text - the text to extract entities from
    tokenMap - the map to load with the extracted tokens
  - extractEntityRawTypes
```
public abstract java.util.Map<java.lang.String,java.lang.String> extractEntityRawTypes(java.lang.String text)
```
    Retrieve the entities and the raw types for this entity extractor. This method runs the extraction process on the text and generates a map that identifies each entity with a type. The types returned from this method are not normalized and depend on the entity extractor implementation.
    
    Parameters:
    text - the text to parse for entities
    
    Returns:
    the raw entity types
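    Since raw types differ between extractor implementations, a caller comparing results across extractors would typically fold them into a shared vocabulary. The sketch below shows one way to do that. Everything in it is illustrative: `RawTypeNormalizer` and its `Category` enum are hypothetical and do not reflect the values of Prajna's `EntityType`, and the raw labels are invented examples rather than the actual output of any listed extractor.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the Prajna API.
class RawTypeNormalizer {
    // Assumed shared categories for illustration only.
    enum Category { PERSON, LOCATION, ORGANIZATION, OTHER }

    // Map an extractor-specific label onto a shared category.
    // The labels handled here are invented examples.
    static Category normalize(String rawType) {
        if (rawType == null) {
            return Category.OTHER;
        }
        switch (rawType.trim().toLowerCase(Locale.ROOT)) {
            case "person":
            case "per":
                return Category.PERSON;
            case "location":
            case "loc":
            case "city":
            case "country":
                return Category.LOCATION;
            case "organization":
            case "org":
            case "company":
                return Category.ORGANIZATION;
            default:
                return Category.OTHER;
        }
    }

    // Normalize a whole entity-to-raw-type map, such as one shaped like the
    // return value of extractEntityRawTypes.
    static Map<String, Category> normalizeAll(Map<String, String> rawTypes) {
        Map<String, Category> result = new HashMap<>();
        for (Map.Entry<String, String> entry : rawTypes.entrySet()) {
            result.put(entry.getKey(), normalize(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("Paris", "City");
        raw.put("Acme", "Company");
        System.out.println(normalizeAll(raw));
    }
}
```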
  - getDefaultTagField
```
public java.lang.String getDefaultTagField()
```
    Get the default tag field used to store all extracted entities in the data record.
    
    Returns:
    the default tag field
  - getModel
```
public java.lang.String getModel()
```
    Get the model string. The interpretation of the model is dependent on the extractor.
    
    Returns:
    the model string. The default implementation returns null.
  - isDebug
```
public boolean isDebug()
```
    Check whether the debug flag is set for this extractor.
    
    Returns:
    true if the debug flag is set, false otherwise
  - setDebug
```
public void setDebug(boolean flag)
```
    Set the debug flag. When set, the extractor writes messages to stdout to help debug and diagnose that particular extractor.
    
    Parameters:
    flag - the debug flag
  - setDefaultTagField
```
public void setDefaultTagField(java.lang.String field)
```
    Set the default tag field used to store the extracted entities in the data record. This field will be used for any entities which have no field mapped to a particular tag.
    
    Parameters:
    field - the tag field
  - setModel
```
public void setModel(java.lang.String model)
```
    Set the model for this entity extractor. This method should be overridden for entity extractors that require a model or configuration information. The actual content of the model parameter is unspecified, and may be a file path, URL, or other configuration information.
    
    Parameters:
    model - the model for the entity extractor
