Dovetail Seeker uses the file document specification to index files that exist on the file system. It searches through the directories listed in the specification and indexes the content of the files it finds. Which files are indexed can be controlled by explicitly including or excluding them in the specification.
File document specifications define:
Security Note: The user account that the Dovetail Seeker Windows service or console runs under must have access to the paths in the file document specification in order to index their content.
To extract text from files, Dovetail Seeker uses a library called Apache Tika. Check the Apache Tika website for supported file formats.
The following is a specification for searching for documentation files:
<fileDocumentSpecification description="paths to where your documents live" tags="docs">
  <identification displayName="documentation" />
  <directories>
    <directory path="c:\documentation">
      <include name="*pdf" />
      <exclude name="bigfiles" />
    </directory>
  </directories>
</fileDocumentSpecification>
This file specification looks in the directory c:\documentation for all files whose names end in pdf and excludes any files that exist in a directory named bigfiles.
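As a further illustration, the sketch below applies the same include and exclude rules to two directories. It uses only the elements and attributes shown in the specification above. The paths, description, display name, and the archive directory name are hypothetical, and it assumes that the directories element accepts more than one directory entry, which the plural wording suggests but which should be verified against your Dovetail Seeker version.

```xml
<fileDocumentSpecification description="product manuals and release notes" tags="docs">
  <identification displayName="manuals" />
  <directories>
    <!-- hypothetical path: index files ending in pdf, skip anything in a directory named archive -->
    <directory path="c:\manuals">
      <include name="*pdf" />
      <exclude name="archive" />
    </directory>
    <!-- hypothetical second directory, assuming multiple directory elements are allowed -->
    <directory path="c:\releasenotes">
      <include name="*txt" />
    </directory>
  </directories>
</fileDocumentSpecification>
```

If this assumption holds, each directory carries its own include and exclude rules, so the archive exclusion would apply only to c:\manuals.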