.if \n1 .ll \n1n \" for page width.. . use cmd line arg -r1# to set width to #
.de Q \" puts quotes around the argument. End previous line with \c
\&「\&\\$1\&」\&\\$2\&\\c
..
.TH LOOKUP 1
.nr IN 3n
.ce 1
April 22nd, 1994

.SH NAME 
lookup \- interactive file search and display
.SH SYNOPSIS
.B lookup
[
args
]
[
.I file ...
]
.br
.SH DESCRIPTION
.I Lookup
allows the quick interactive search of text files.  It supports ASCII,
JIS-ROMAN, and Japanese EUC Packed formatted text, and has an
integrated romaji-to-kana converter.
.SH THIS MANUAL
.I Lookup
is flexible for a variety of applications. This manual will, however,
focus on the application of searching Jim Breen's
.I edict
(Japanese-English dictionary) and
.I kanjidic
(kanji database). Being familiar with the content and format of these
files would be helpful. See the INFO section near the end of this
manual for information on how
to obtain these files and their documentation.
.SH OVERVIEW OF MAJOR FEATURES
The following just mentions some major features to whet your appetite
to actually read the whole manual (-:
.TP
Romaji-to-Kana Converter
.I Lookup
can convert romaji to kana for you, even\c
.Q "on the fly"
as you type.
.TP
Fuzzy Searching
Searches can be a bit\c
.Q vague
or\c
.Q fuzzy ", "
so that you'll be able to
find\c
.Q 東京
even if you try to search for\c
.Q ときょ
(the proper yomikata being\c
.Q とうきょう ").  "
.TP
Regular Expressions
Uses the powerful and expressive
.I "regular expression"
for searching. One can easily specify complex searches that express ``I want
lines that look like such-and-such, but not like this-and-that, but that
also have this particular characteristic....''
.TP
Wildcard ``Glob'' Patterns
Optionally, can use well-known filename wildcard patterns instead of
full-fledged regular expressions.
.TP
Filters
You can have
.I lookup
not list certain lines that would otherwise match your search, yet can
optionally save them for quick review. For example, you could have all
name-only entries from
.I edict
filtered from normal output.
.TP
Automatic Modifications
Similarly, you can do a standard search-and-replace on lines just before
they print, perhaps to remove information you don't care to see on
most searches. For example, if you're generally not interested in
.IR kanjidic "'s"
info on Chinese readings, you can have them removed from lines before
printing.
.TP
Smart Word-Preference Mode
You can have
.I lookup
list only entries with
.I "whole words"
that match your search (as opposed to an
.I embedded
match, such as finding\c
.Q the
inside\c
.Q them "), "
but if no whole-word
matches exist, will go ahead and list any entry that matches the
search.
.TP
Handy Features
Other handy features include a dynamically settable and
parameterized prompt, automatic highlighting of that part of the line
that matches your search, an output pager, readline-like input with
horizontal scrolling for long input lines, a\c
.Q .lookup
startup file, automated programmability, and much more. Read on!
.SH REGULAR EXPRESSIONS
.I Lookup
makes liberal use of
.I "regular expressions"
(or
.I regex
for short) in controlling various aspects of the searches. If you are
not familiar with the important concepts of regexes, read the tutorial
appendix of this manual before continuing.
.SH JAPANESE CHARACTER ENCODING METHODS
Internally,
.I lookup
works with Japanese packed-format EUC, and all files loaded must be
encoded similarly. If you have files encoded in JIS or Shift-JIS, you
must first convert them to EUC before loading (see the INFO section for
programs that can do this).

Interactive input and output encoding, however,
may be selected via the -jis, -sjis, and -euc invocation flags
(default is -euc),
or by various commands to the program (described later).
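
As a sketch, a terminal that expects JIS might be served by an
invocation such as the following (the file path is only the one
assumed in this manual's examples):
.nf

        lookup -jis ~/lib/edict

.fi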

Make sure to use the encoding appropriate for your system.  If you're
using kterm under the X Window System, you can use
.IR lookup "'s"
-jis flag to match kterm's default JIS encoding. Or, you might use
kterm's\c
.Q "-km euc"
startup option (or menu selection) to put kterm into
EUC mode. Also, I have found kterm's scrollbar (\c
.Q "-sb -sl 500" ") "
to be quite useful.

With many\c
.Q English
fonts in Japan, the character that normally prints
as a backslash (half-width version of ＼) in The States appears as a
yen symbol (the half-width version of ￥). How it will appear on your
system is a function of what font you use and what output encoding
method you choose, which may be different from the font and method
that was used to print this manual (both of which may be different
from what's printed on your keyboard's appropriate key).  Make sure to
keep this in mind while reading.

.SH STARTUP
Let's assume that your copy of
.I edict
is in ~/lib/edict. You can start the program simply with
.nf

        lookup ~/lib/edict

.fi
You'll note that
.I lookup
spends some time building an index before the default\c
.Q "lookup>\ "
prompt appears.

.I Lookup
gains much of its search speed by constructing an index of the file(s)
to be searched. Since building the index can be time consuming itself,
you can have
.I lookup
write the built index to a file that can be
quickly loaded the next time you run the program.
Index files will be given a\c
.Q .jin
(Jeffrey's Index) ending.

Let's build the indices for
.I edict
and
.I kanjidic
now:
.nf

        lookup -write ~/lib/edict ~/lib/kanjidic

.fi
This will create the index files
.nf
       ~/lib/edict.jin
       ~/lib/kanjidic.jin
.fi
and exit.

You can now re-start
.IR lookup ,
automatically using the pre-computed index files as:
.nf

       lookup ~/lib/edict ~/lib/kanjidic

.fi
You should then be presented with the prompt without having to wait
for the index to be constructed (but see the section on Operating
System concerns for possible reasons of delay).
.SH INPUT
There are basically two types of input: searches and commands.
Commands do such things as tell
.I lookup
to load more files or set flags. Searches report lines of a file that
match some search specifier (where lines to search for are specified by
one or more regular expressions).

The input syntax may perhaps at first seem odd, but has been designed
to be powerful and concise. A bit of time invested to learn it
well will pay off greatly when you need it.
.SH BRIEF EXAMPLE
Assuming you've started
.I lookup
with
.I edict
and
.I kanjidic
as noted above, let's try a few searches. In these examples, the
.nf
    search [edict]> 
.fi
is the prompt.
Note that the space after the「>」is part of the prompt.

Given the input:
.nf

  search [edict]> tranquil

.fi
.I lookup
will report all lines with the string\c
.Q tranquil
in them. There are currently about
a dozen such lines, two of which look like:
.nf

  安らか [やすらか] /peaceful (an)/tranquil/calm/restful/
  安らぎ [やすらぎ] /peace/tranquility/

.fi
Notice that lines with\c
.Q tranquil
\fIand\fP\c
.Q tranquility
matched? This is because\c
.Q tranquil
was embedded in the
word「tranquility」.
You could restrict the search to only the
\fIword\fP\c
.Q tranquil
by prepending the special\c
.Q "start of word"
symbol「<」and appending the special\c
.Q "end of word"
symbol「>」to the regex, as in:
.nf

  search [edict]> <tranquil>

.fi
This is the regular expression that says ``the beginning of a word,
followed by a「t」,「r」, ...,「l」, which is at the end of a word.''  The
current version of
.I edict
has just three matching entries.

Let's try another:
.nf

  search [edict]> fukushima

.fi
This is a search for the\c
.Q English
fukushima -- ways to search for
kana or kanji will be explored later.  Note that among the several
lines selected and printed are:
.nf

   [դ] /Fukushima (pn,pl)/
  木曽福島 [きそふくしま] /Kisofukushima (pl)/

.fi
By default, searches are done in a case-insensitive
manner --「F」and「f」are treated the same by
.IR lookup ,
at least so far as the matching goes.  This is called
.IR "case folding" .

Let's give a command to turn this option off,
so that「f」and「F」won't
be considered the same.  Here's an odd point about
.I "lookup's"
input syntax: the default setting is that all command lines must begin
with a space.  The space is the (default) command-introduction
character and tells the input parser to expect a command rather than a
search regular expression.
.I
It is a common mistake at first to forget the leading space when
issuing a command.  Be careful.

Try the command\c
.Q "\ fold"
to report the current status of case-folding.
Notice that as soon as you type the space, the prompt changes to
.nf
  lookup command> 
.fi
as a reminder that now you're typing a command rather than a search
specification.

.nf

  lookup command>  fold

.fi
The reply should be\c
.Q "file #0's case folding is on"
.br

You can actually turn it off with\c
.Q " fold off" ".  "
Now try the search for\c
.Q fukushima
again. Notice that this time the entries with\c
.Q Fukushima
aren't listed? Now try the search string\c
.Q Fukushima
and see that the entries with\c
.Q fukushima
aren't listed.

Case folding is usually very convenient (it also makes corresponding
katakana and hiragana match the same), so don't forget to turn it back on:
.nf

  lookup command>  fold on

.fi
.SH JAPANESE INPUT
.I Lookup
has an automatic romaji-to-kana converter. A leading「/」indicates that
romaji is to follow. Try typing\c
.Q /tokyo
and you'll see it convert to\c
.Q /\&ときょ
as you type. When you hit return,
.I lookup
will list all lines that have a「ときょ」somewhere in them. Well, sort
of.  Look carefully at the lines which match. Among them (if you had
case folding back on) you'll see:
.nf

  キリスト教 [キリストきょう] /Christianity/
  東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
  凸鏡 [とっきょう] /convex lens/

.fi
The first one has「ときょ」in it (as「トきょ」,
where the katakana「ト」matches in a case-insensitive
manner to the hiragana「と」), but you
might consider the others unexpected, since they don't
have\c
.Q ときょ
in them.
They're close (「とうきょ」and「とっきょ」),
but not exact. This is the result of
.IR lookup "'s\c"
.Q fuzzification "\&."

Try the command\c
.Q "\ fuzz"
(again, don't forget the command-introduction space).
You'll see that fuzzification is turned on.  Turn it off with\c
.Q "\ fuzz off"
and try\c
.Q /tokyo
(which will convert as you type) again.
This time you only get the lines which have「ときょ」exactly
(well, case folding is still on, so it might match katakana as well).

In a fuzzy search, length of vowels is ignored --「と」is
considered the same as「とう」, for example. Also, the
presence or absence of any「っ」character is ignored, and the
pairs じ ぢ, ず づ, え ゑ, and お を are considered identical in a
fuzzy search.

It might be convenient to consider a fuzzy search to be a\c
.Q "pronunciation search" ".  "

Special note: fuzzification will not be performed if a regular expression\c
.Q "*" ,
.Q "+" ,
or\c
.Q "?"
modifies a non-ASCII character. This is not an issue when input patterns
are filename-like wildcard patterns (discussed below).

In addition to kana fuzziness, there's one special case for kanji when
fuzziness is on. The kanji repeater mark\c
.Q 々
will be recognized such that\c
.Q 時々
and\c
.Q 時時
will match each other.


Turn fuzzification back on (「fuzz on」), and search for all
.I "whole words"
which sound like「tokyo」. That search would be specified as:
.nf

  search [edict]> /<tokyo>

.fi
(again, the\c
.Q tokyo
will be converted to\c
.Q ときょ
as you type).
My copy of
.I edict
has the three lines
.nf

  東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
  特許 [とっきょ] /special permission/patent/
  凸鏡 [とっきょう] /convex lens/

.fi
This kind of whole-word romaji-to-kana search is so common, there's a
special short cut. Instead of typing「/<tokyo>」, you can
type\c
.Q [tokyo] ".  "
The leading「[」means「start romaji」\c
.I and\c
.Q "start of word" ".  "
Were you to type\c
.Q <tokyo>
instead (without a
leading「/」or「[」to indicate romaji-to-kana conversion), you would
get all lines with the
.I English
whole-word\c
.Q tokyo
in them.
That would be a reasonable request as well, but not what we want at the moment.

Besides the kana conversion, you can use any cut-and-paste that your
windowing system might provide to get Japanese text onto the search
line. Cut\c
.Q ときょ
from somewhere and paste onto the search line. When
hitting enter to run the search, you'll notice that it is done without
fuzzification (even if the fuzzification flag was\c
.Q on ").  "
That's because
there's no leading「/」. Not only does a leading「/」indicate that you
want the romaji-to-kana conversion, but that you want it done fuzzily.

So, if you'd like fuzzy cut-and-paste, just type a leading「/」before
pasting (or go back and prepend one after pasting).

These examples have all been pretty simple, but you can use all the
power that regexes have to offer. As a slightly more complex example,
the search\c
.Q <gr[ea]y>
would look for all lines with
the words\c
.Q grey
or\c
.Q gray
in them.  Since the「[」isn't the first character
of the line, it doesn't mean what was mentioned above (start-of-word romaji).
In this case, it's just the regular-expression\c
.Q class
indicator.

If you feel more comfortable using filename-like\c
.Q "*.txt"
wildcard patterns, you can use the\c
.Q "wildcard on"
command to have patterns be considered this way.

This has been a quick introduction to the basics of
.IR lookup .

It can be very powerful and much more complex. Below is a detailed
description of its various parts and features.
.SH READLINE INPUT
The actual keystrokes are read by a readline-ish package that is
pretty standard. In addition to just typing away, the following
keystrokes are available:
.nf

  ^B  / ^F     move left/right one character on the line
  ^A  / ^E     move to the start/end of the line
  ^H  / ^G     delete one character to the left/right of the cursor
  ^U  / ^K     delete all characters to the left/right of the cursor
  ^P  / ^N     previous/next lines on the history list
  ^L or ^R     redraw the line
  ^D           delete char under the cursor, or EOF if line is empty
  ^space       force romaji conversion (^@ on some systems)

.fi
If automatic romaji-to-kana conversion is turned on (as it is by
default), there are certain situations where the conversion will be
done, as we saw above. Lower-case romaji will be converted to
hiragana, while upper-case romaji to katakana.  This usually won't
matter, though, as case folding will treat hiragana and katakana the
same in the searches.

In exactly what situations the automatic conversion will be done is
intended to be rather intuitive once the basic idea is learned.
However, at
.IR "any time" ,
one can use control-space to convert the ASCII to the left of the
cursor to kana. This can be particularly useful when needing to enter
kana on a command line (where auto conversion is never done; see below)

.SH ROMAJI FLAVOR
Most flavors of romaji are recognized. Special or non-obvious items are
mentioned below. Lowercase are converted to hiragana, uppercase to katakana.

Long vowels can be entered by repeating the vowel, or with「-」or「^」.

In situations where an「n」could be vague, as
in「na」being な or んあ, use a single quote to force ん.
Therefore,「kenichi」→けにち while「ken'ichi」→けんいち.

The romaji has been richly extended with many non-standard
combinations such as ふぁ or ちぇ, which are represented in
intuitive ways:「fa」→ふぁ,「che」→ちぇ, etc.

Various other mappings of interest:
.nf

  wo を     we ゑ     wi ゐ
  VA ヴァ   VI ヴィ   VU ヴ     VE ヴェ   VO ヴォ
  di ぢ     dzi ぢ    dya ぢゃ  dyu ぢゅ  dyo ぢょ
  du づ     tzu づ    dzu づ

(the following kana are all smaller versions of the regular kana)

  xa ぁ     xi ぃ     xu ぅ     xe ぇ     xo ぉ
  xu ぅ     xtu っ    xwa ゎ    xka ヵ    xke ヶ
  xya ゃ    xyu ゅ    xyo ょ

.fi
.SH INPUT SYNTAX
Any input line beginning with a space (or whichever character is set as
the command-introduction character) is processed as a command to
.I lookup
rather than a search spec.
.I Automatic
kana conversion is never done on these lines (but
.I forced
conversion with control-space may be done at any time).

Other lines are taken as search regular expressions, with the
following special cases:
.TP
?
A line consisting of a single question mark will report the current
command-introduction character (the default is a space, but can be
changed with the\c
.Q cmdchar
command).
.TP
=
If a line begins with「=」, the line (without the「=」) is taken as a
search regular expression, and no automatic (or internal -- see below)
kana conversion is done anywhere on the line (although again,
conversion can always be forced with control-space).  This can be used
to initiate a search where the beginning of the regex is the
command-introduction character, or in certain situations where automatic kana
conversion is temporarily not desired.
.TP
/
A line beginning with「/」indicates romaji input for the whole line.
If automatic kana conversion is turned on, the conversion will be done
in real-time, as the romaji is typed. Otherwise it will be done
internally once the line is entered.
.IR Regardless ,
the presence of the leading「/」indicates that any kana (either
converted or cut-and-pasted in) should be\c
.Q fuzzified
if fuzzification is turned on.

As an addition to the above, if the line doesn't begin with「=」or the
command-introduction character (and automatic conversion is turned
on),「/」
.I anywhere
on the line initiates automatic conversion for the following word.
.TP
[
A line beginning with「[」is taken to be romaji (just as a line
beginning with「/」), and the converted romaji is subject to
fuzzification (if turned on).  However, if「[」is used rather
than「/」, an implied「<」\c
.Q "beginning of word"
is prepended to the resulting
kana regex.  Also, any ending「]」on such a line is converted to the\c
.Q "ending of word"
specifier「>」in the resulting regex.
.PP
In addition to the above, lines may have certain prefixes and suffixes
to control aspects of the search or command:
.TP
!
Various flags can be toggled for the duration of a particular search
by prepending a\c
.Q !!
sequence to the input line.

Sequences are shown below, along with commands related to each:
.nf

 !F!   Filtration is toggled for this line (filter)
 !M!   Modification is toggled for this line (modify)
 !w!   Word-preference mode is toggled for this line (word)
 !c!   Case folding is toggled for this line (fold)
 !f!   Fuzzification is toggled for this line (fuzz)
 !W!   Wildcard-pattern mode is toggled for this line (wildcard)
 !r!   Raw. Force fuzzification off for this line
 !h!   Highlighting is toggled for this line (highlight)
 !t!   Tagging is toggled for this line (tag)
 !d!   Displaying is on for this line (display)

.fi
The letters can be combined, as in\c
.Q "!cf!" .


The final「!」can be omitted if the first character
after the sequence is not an ASCII letter.

If no letters are given (\c
.Q !! "), "
.Q !f!
is the default.

These last two points can be conveniently combined in the common case of\c
.Q !/romaji
which would be the same as\c
.Q !f!/romaji ".  "


The special sequence\c
.Q !?
lists the above, as well as indicates which are currently turned on.

Note that the letters accepted in a\c
.Q !!
sequence are many of the indicators shown by the\c
.Q files
command.
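
As an illustration only (the search words are borrowed from the
examples earlier in this manual), with case folding and fuzzification
both on, the following would run one case-sensitive search and one
non-fuzzy romaji search without changing either flag afterwards:
.nf

  search [edict]> !c!Fukushima
  search [edict]> !/tokyo

.fi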
.TP
+
A「+」prepended to anything above will cause the final search
regex to be printed. This can be useful to see when and what kind of
fuzzification and/or internal kana conversion is happening. Consider:
.nf

  search [edict]> +/わかる
  a match isȤ[]*?[]*[]*

.fi
Due to the leading\c
.Q / ", "
the kana is fuzzified, which explains the
somewhat complex resulting regex. For comparison, note:
.nf

  search [edict]> +わかる
  a match is“わかる”
  search [edict]> +!/わかる
  a match is“わかる”

.fi
As the「+」shows, these are not fuzzified. The first one has no
leading「/」or「[」to induce fuzzification, while the second has
the「!」line prefix (which is the default version of\c
.Q !f! "), "
which toggles fuzzification mode to\c
.Q off
for that line.
.TP
\&,
The default of all searches and most commands is to work with the
first file loaded (\fIedict\fP in these examples). One can change this
default (see the\c
.Q select
command) or, by appending a comma+digit
sequence at the end of an input line, force that line to work with
another previously-loaded file. An appended\c
.Q ,1
works with first
extra file loaded (in these examples, \fIkanjidic\fP).  An appended\c
.Q ,2
works with the 2nd extra file loaded, etc.

An appended\c
.Q ,0
works with the original first file (and can be useful
if the default file has been changed via the\c
.Q select
command).

The following sequence shows a common usage:
.nf

  search [edict]> [ときょと]
  東京都 [とうきょうと] /Tokyo Metropolitan area/

.fi
cutting and pasting the 都 from above, and adding a\c
.Q ,1
to search
.IR kanjidic :
.nf

  search [edict]> 都,1
  都 4554 N4769 S11  .....  ト ツ みやこ {metropolis} {capital}

.fi

.SH FILENAME-LIKE WILDCARD MATCHING
When wildcard-pattern mode is selected, patterns are considered as
extended\c
.Q "*.txt" "-like"
patterns. This is often more convenient for users not familiar with
regular expressions. To have this mode selected by default, put
.nf

   default wildcard on

.fi
into your\c
.Q ".lookup"
file (see\c
.Q "STARTUP FILE"
below).

When wildcard mode is on, only \c
.Q "*" ,
.Q "?" ,
.Q "+" ,
and\c
.Q "." ,
are affected.
See the entry for the\c
.Q wildcard
command below for details.

Other features, such as the multiple-pattern searches (described below)
and other regular-expression metacharacters are available.

.SH MULTIPLE-PATTERN SEARCHES
You can put multiple patterns in a single search specifier.
For example consider
.nf

  search [edict]> china||japan

.fi
The first part (「china」) will select all lines that have\c
.Q china
in them. Then,
.IR "from among those lines" ,
the second part will select lines that have\c
.Q japan
in them.  The\c
.Q ||
is not part of any pattern -- it is
.IR lookup "'s\c"
.Q pipe
mechanism.

The above example is very different from the single pattern
\&「china|japan」which would select any line that
had either「china」\c
.I or\c
.Q japan ".  "
With\c
.Q china||japan ", "
you get lines that have\c
.Q china
.I "and then also"
have\c
.Q japan
as well.

Note that it is also different from the regular expression\c
.Q china.*japan
(or the wildcard pattern\c
.Q china*japan ")"
which would select lines having\c
.Q "china, then maybe some stuff, then japan" ".  "
But consider the case when\c
.Q japan
comes on the line before\c
.Q china .

Just for your comparison, the multiple-pattern
specifier「china||japan」is pretty
much the same as the single regular
expression「china.*japan|japan.*china」.

If you use「|!|」instead of「||」,
it will mean「...and then lines
.I not
matching...」.

Consider a way to find all lines of
.I kanjidic
that do have a Halpern number, but don't have a Nelson number:
.nf

    search [edict]> <H\\d+>|!|<N\\d+>

.fi
If you then wanted to restrict the listing to those that
.I also
had a「jinmeiyou」marking (\fIkanjidic\fP's「G9」field)
and had a reading of , you could make it:
.nf

    search [edict]> <H\\d+>|!|<N\\d+>||<G9>||<>

A prepended「+」would explain:

    a match is<H\\d+>
    and not<N\\d+>
    and<G9>
    and<>

.fi
The「|!|」and「||」can be used to make up to ten
separate regular expressions in any one search specification.

Again, it is important to stress that「||」does not
mean「or」(as it does in a C program,
or as「|」does within a regular expression).
You might find it convenient to read「||」as「\fIand\fP also」,
while reading「|!|」as「but \fInot\fP」.

It is also important to stress that any whitespace around the\c
.Q ||
and\c
.Q |!|
construct is
.I not
ignored, but kept as part of the regex on either side.
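
A minimal illustration of that last point, reusing the patterns from
above. The second line is a sketch of what the rule implies rather
than verified output:
.nf

  search [edict]> china||japan
  search [edict]> china || japan

.fi
The first looks for ``china'' and ``japan''.  The second looks
for ``china\ '' (with a trailing space) and ``\ japan'' (with a leading
space), so it should report only lines in which those spaces are present.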
.SH COMBINATION SLOTS
Each file, when loaded, is assigned to a\c
.Q slot
via which subsequent references to the file are then made.
The slot may then be searched, have filters and flags set, etc.

A special kind of slot, called a\c
.Q "combination slot" ,
rather than representing a single file, can represent multiple
previously-loaded slots. Searches against a combination slot
(or\c
.Q "combo slot"
for short) search all those previously-loaded slots associated with it
(called\c
.Q "component slots" "). "

Combo slots are set up with the
.I combine
command.

A Combo slot has no filter or modify spec, but can have a local prompt
and flags just like normal file slots.  The flags, however, have
special meanings with combo slots. Most combo-slot flags act as a mask
against the component-slot flags; when acted upon as a member of the
combo, a component-slot's flag will be disabled if the corresponding
combo-slot's flag is disabled.

Exceptions to this are the
.IR autokana ,
.IR fuzz ,
and
.I tag
flags.

The
.I autokana
and
.I fuzz
flags govern a combo slot exactly the same as a regular file slot.
When a slot is searched as a component of a combination slot, the
component slot's
.I fuzz
(and
.IR autokana )
flags, or lack thereof, are ignored.

The
.I tag
flag is quite different altogether; see the
.I tag
command for complete information.

Consider the following output from the
.I files
command:
.nf

  
   0F wcfh da I  2762k/usr/jfriedl/lib/edict
   1FM cf  da I   705k/usr/jfriedl/lib/kanjidic
   2F  cfh@da       1k/usr/jfriedl/lib/local.words
  *3FM cfhtda    combokotoba (#2, #0)
  

.fi
See the discussion of the
.I files
command below for basic explanation of the output.

As can be seen, slot #3 is a
.I "combination slot"
with the name\c
.Q kotoba
with
.I "component slots"
two and zero. When a search is initiated on this slot, first slot #2\c
.Q "local.words"
will be searched, then slot #0\c
.Q edict ".  "

Because the combo slot's
.I filter
flag is
.IR on ,
the component slots'
.I filter
flag will remain on during the search.
The combo slot's
.I word
flag is
.IR off ,
however, so slot #0's
.I word
flag will be forced off during the search.

See the
.I combine
command for information about creating combo slots.
.SH PAGER
.I Lookup
has a built-in pager (a la \fImore\fP).  Upon filling a screen with
text, the string
.nf
    --MORE [space,return,c,q]--
.fi
is shown. A space will allow another screen of text; a return will allow
one more line. A「c」will allow output text to continue unpaged until
the next command. A「q」will flush output of the current command.

If supported by the OS,
.IR lookup 's
idea of the screen size is automatically set upon startup and window resize.
.I Lookup
must know the width of the screen in doing both the horizontal
input-line scrolling, and for knowing when a long line wraps on the screen.

The pager parameters can be set manually with the\c
.Q pager
command.
.SH COMMANDS
Any line intended to be a command must begin with the
command-introduction character (the default is a space, but can be set
via the「cmdchar」command).  However, that character is not part of
the command itself and won't be shown in the following list of
commands.

There are a number of commands that work with the
.I "selected file" 
or
.I "selected slot"
(both meaning the same thing).
The selected file is the one indicated by an appended comma+digit, as
mentioned above. If no such indication is given, the default
.I "selected file"
is used (usually the first file loaded, but can be changed with
the「select」command).

Some commands accept a
.I boolean
argument, such as to turn a flag on or off. In all such cases,
a「1」or「on」means to turn the flag on,
while a「0」or「off」is used to
turn it off.  Some flags are per-file
(「fuzz」,「fold」, etc.), and a
command to set such a flag
normally sets the flag for the selected file only. However, the
default value inherited by subsequently loaded files can be set
by prepending\c
.Q default
to the command. This is particularly useful in the startup file
before any files are loaded (see the section STARTUP FILE).
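
For example, a startup file that wants every subsequently loaded file
to begin with fuzzification off might contain something like the
following (a sketch; the file name is the one assumed throughout this
manual):
.nf

  default fuzz off
  load ~/lib/edict

.fi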

Items separated by「|」are mutually exclusive possibilities (i.e. a
boolean argument is「1|on|0|off」).

Items shown in brackets (「[」and「]」)
are optional. All commands that
accept a boolean argument to set a flag or mode do so optionally --
with no argument the command will report the current status of the
mode or flag.

Any command that allows an argument in quotes (such as load, etc.)
allows the use of single or double quotes.
.PP
The commands:
.br
.so c_autokana.so
.so c_clear.so
.so c_cmdchar.so
.so c_combine.so
.so c_cmd_debug.so
.so c_debug.so
.so c_describe.so
.so c_encoding.so
.so c_files.so
.so c_filter.so
.so c_fold.so
.so c_fuzz.so
.so c_help.so
.so c_highlight.so
.so c_if.so
.so c_in_code.so
.so c_limit.so
.so c_log.so
.so c_load.so
.so c_modify.so
.so c_msg.so
.so c_out_code.so
.so c_pager.so
.so c_prompt.so
.so c_rdebug.so
.so c_list_size.so
.so c_select.so
.so c_show.so
.so c_source.so
.so c_spinner.so
.so c_stats.so
.so c_tag.so
.so c_verbose.so
.so c_version.so
.so c_wild.so
.so c_word.so
.so c_quit.so
.SH STARTUP FILE
If the file\c
.Q ~/.lookup
is present, commands are read from it during
.I lookup
startup.

The file is read in the same way as the
.I source
command reads files (see that entry for more information on file
format, etc.)

However, if there had been files loaded via command-line arguments,
commands within the startup file to load files (and their associated
commands such as to set per-file flags) are ignored.

Similarly, any use of the command-line flags -euc, -jis, or -sjis
will disable in the startup file the commands dealing with setting the
input and/or output encodings.

The special treatment mentioned in the above two paragraphs only applies
to commands within the startup file itself, and does not apply to commands
in command-files that might be
.IR source d
from within the startup file.

The following is a reasonable example of a startup file:
.nf
  ## turn verbose mode off during startup file processing
  verbose off

  prompt "%C([%#]%0)%!C(%w'*'%!f'raw '%n)> "
  spinner 200
  pager on

  ## The filter for edict will hit for entries that
  ## have only one English part, and that English part
  ## having a pl or pn designation.
  load ~/lib/edict
  filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
  highlight on
  word on

  ## The filter for kanjidic will hit for entries without a
  ## frequency-of-use number.  The modify spec will remove
  ## fields with the named initial code (U,N,Q,M,E, and Y)
  load ~/lib/kanjidic
  filter "uncommon" !/<F\\d+>/
  modify /( [UNQMEY]\S+)+//g

  ## Use the same filter for my local word file,
  ## but turn off by default.
  load ~/lib/local.words
  filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
  filter off
  highlight on
  word on
  ## Want a tag for my local words, but only when
  ## accessed via the combo below
  tag off ""

  combine "words" 2 0
  select words

  ## turn verbosity back on for interactive use.
  verbose on

.fi
.SH "COMMAND-LINE ARGUMENTS"
With the use of a startup file, command-line arguments are rarely needed.
In practical use, they are only needed to create an index file, as in:
.nf

    lookup -write \fItextfile\fP

.fi
Any command line arguments that aren't flags are taken to be files
which are loaded in turn during startup.
In this case, any「load」,「filter」, etc.
commands in the startup file are ignored.

The following flags are supported:
.TP
\-help\ \ \ 
Reports a short help message and exits.
.TP
\-write\ \ \ \&
Creates index files for the named files and exits. No
.I "startup file"
is read.
.TP
\-euc\ \ \ 
Sets the input and output encoding method to EUC (currently the default).
Exactly the same as the「encoding euc」command.
.TP
\-jis\ \ \ 
Sets the input and output encoding method to JIS.
Exactly the same as the「encoding jis」command.
.TP
\-sjis\ \ \ 
Sets the input and output encoding method to Shift-JIS.
Exactly the same as the「encoding sjis」command.
.TP
\-v \-version
Prints the version string and exits.
.TP
\-norc\ \ \ 
.br
Indicates that the startup file should not be read.
.TP
\-rc \fIfile\fP
The named file is used as the startup file, rather than the
default\c
.Q "~/.lookup" ".  "
It is an error for the file not to exist.
.TP
\-percent \fInum\fP
.br
When an index is built, letters that appear on more than
.I num
percent (default 50) of the lines are elided from the index.  The
thought is that if a search will have to check most of the lines in a
file anyway, one may as well save the large amount of space in the
index file needed to represent that information, and the time/space
tradeoff shifts, as the indexing of oft-occurring letters provides a
diminishing return.

Smaller indexes can be made by using a smaller number.
.TP
\-noindex
.br
Indicates that any files loaded via the command line should
not be loaded with any precomputed index, but recalculated on the fly.
.TP
\-verbose
.br
Has metric tons of stats spewed whenever an index is created.
.TP
\-port ###
For the (undocumented) server configuration only, tells which port to
listen on.
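.PP
A few illustrative invocations follow. The file
.I ~/lib/lookup.rc
is a hypothetical name, and combining \-percent with \-write on one
command line is an assumption based on the descriptions above rather
than something this manual states:
.nf

    lookup -norc ~/lib/edict
    lookup -rc ~/lib/lookup.rc
    lookup -percent 30 -write ~/lib/edict

.fi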

.SH OPERATING SYSTEM CONSIDERATIONS
I/O primitives and behaviors vary with the operating system. On my
operating system, I can「read」a file by mapping it into memory, which
is a pretty much instant procedure regardless of the size of the file.
When I later access that memory, the appropriate sections of the file
are automatically read into memory by the operating system as needed.

This results in
.I lookup
starting up and presenting a prompt very quickly, but causes the first
few searches that need to check a lot of lines in the file to go more
slowly (as lots of the file will need to be read in). However, once
the bulk of the file is in, searches will go very fast. The win here is
that the rather long file-load times are amortized over the first few
(or few dozen, depending upon the situation) searches rather than always
faced right at command startup time.

On the other hand, on an operating system without the mapping ability,
.I lookup
would start up very slowly as all the files and indexes are read into memory,
but would then search quickly from the beginning, all the file already
having been read.

To get around the slow startup, particularly when many files are loaded,
.I lookup
uses
.I "lazy loading"
if it can: a file is not actually read into memory at the time the
.I load
command is given. Rather, it will be read when first actually accessed.
Furthermore, files are loaded while
.I lookup
is idle, such as when waiting for user input. See the
.I files
command for more information.
.SH REGULAR EXPRESSIONS, A BRIEF TUTORIAL
.so regex.so
.SH BUGS
Needs full support for half-width katakana and JIS X 0212-1990.
.br
Non-EUC (JIS & SJIS) items not tested well.
.br
Probably won't work on non-UNIX systems.
.br
Screen control codes (for clear and highlight commands) are hard-coded
for ANSI/VT100/kterm.
.SH AUTHOR
Jeffrey Friedl (jfriedl@nff.ncl.omron.co.jp)
.SH INFO
Jim Breen's text files
.I edict
and
.I kanjidic
and their documentation can be found in\c
.Q pub/nihongo
on ftp.cc.monash.edu.au (130.194.1.106).

Information on input and output encoding and codes can be found in
Ken Lunde's
.I "Understanding Japanese Information Processing"
(日本語情報処理) published by O'Reilly and Associates.
ISBN 1-56592-043-0.  There is also a Japanese edition published
by SoftBank.

A program to convert files among the various encoding methods is
Dr. Ken Lunde's
.IR jconv ,
which can also be found on ftp.cc.monash.edu.au.
.I Jconv
is also useful for converting halfwidth katakana (which
.I lookup
doesn't yet support well) to full-width.
