SECTION 9 -- RDBMS
<> 09.001 What processes do what?
dmfacp archiver process
wakes up at intervals and copies committed transactions
on journalled tables from the log file to the journal file.
dmfrcp recovery process (recovery server in OpenIngres)
The dmfrcp process moves log records from log buffers
in memory and writes these records to the logging
system file. In case of server or system failure, the
dmfrcp recovers pending transactions by reading the
logging system file and performing appropriate recovery
actions. This is done during start-up. While the recovery
process is running the database is locked and users attempting
to connect see the message "The database is currently
unavailable."
iidbms INGRES RDBMS server process
the actual data management component. There may be multiple
servers per installation. OpenIngres allows customer defined
server classes.
iigcc communication server process
provides the network communication function of INGRES/Net.
There may be multiple instances of iigcc in an installation.
iigcn name server process
monitors and identifies all the INGRES servers in an
installation (eg iidbms and iigcc). There will be one per
installation, on each client and server.
iislave disc I/O process (Unix machines only)
actually performs disc I/O operations for iidbms. To
maximize concurrency, it is advisable to have at least one
iislave per disc drive. (See note 09.002 below.)
<> 09.002 How are extra iislave (disc) processes started?
It is generally advisable to have at least one iislave process per disc.
The required number of iislaves is requested by the user `ingres' with:
ingsetenv II_NUM_SLAVES=
where is the number of iislaves to fork when iistartup is
executed. This assignment is non-volatile and needs to be made only
once, or as new discs are added. If multiple servers are to be started
the value of II_NUM_SLAVES can be adjusted for each server prior to
starting it.
<> 09.003 How do I recover an inconsistent database?
Contact Computer Associates Ingres Technical Support. This answer is
not deliberately intended to be facetious, it is just that an
inconsistent database can be caused by a number of things. Technical
Support are best able to determine the cause of the inconsistency and
advise on the appropriate recovery procedure.
Some situations which cause a database to be marked inconsistent can
be pre-empted. See section 09.019 for more information.
It would be very foolish to do as some contributors suggest and just
use verifydb to force the database consistent. That option of verifydb
is a last-resort salvage tool, not a recovery mechanism. It fixes
nothing and leaves a mess in the database.
<> 09.004 What are "compressed" table structures?
In ordinary table structures space is allocated for data rows according
to the maximum width of the row. With compressed table structures
Ingres applies a compression algorithm in order to reduce the amount of
space required. The advantage is a reduction in storage requirements;
the disadvantages are twofold:
Firstly there is a small performance penalty whenever a row is stored
or retrieved, as it has to be compressed or uncompressed. Secondly
when a row in a compressed table is updated, it is unlikely to remain
the same size as the old row. If it is bigger then it won't fit in the
hole left by the old row, and so it is added somewhere else in the
table. Thus an UPDATE is effectively turned into a DELETE followed by
an INSERT, and a hole, ie unused space, is left in the table.
Eventually these holes occupy more and more of the table until a MODIFY
is required to reclaim the space.
For these reasons compressed structures are not recommended for tables
that are updated heavily. For other operations, ie: SELECT, INSERT and
DELETE there is a performance penalty because OpenIngres 1.x uses LZW
compression. (Ingres 6.4 just eliminates "blank" space at the end of
strings, which is not very intensive.)
Note that for CISAM and CBTREE structures only the data is compressed.
Data stored as part of the key is never compressed in the index.
<> 09.005 Why might I configure more than 1 DBMS server process?
The DBMS server is multithreading: it handles SQL requests on behalf of
many INGRES clients. Yet INGRES allows me to set up more than 1 DBMS
server. What reasons might I have to do this?
There are quite a number of reasons you might decide to do this:
o Multiprocessors. Although a DBMS server is multithreading, it can
only run a single thread at a time. If you have more than 1 CPU it
may be beneficial to have more than 1 DBMS server. As the number
of CPUs increases then you are likely to need more DBMS server
processes to exploit the extra power.
The optimal number depends on average SQL arrival rate and service
time. You could build a model using queueing theory. Alternatively
do what everyone else does: try different numbers and measure
which is best for your workload and configuration.
o Keep DBMS size manageable. You can get a lot of users on 1 DBMS
server. But you need to configure more and more DBMS resources as
the number of users grows. There comes a time when the DBMS just
gets too *enormous* to be manageable.
Splitting the workload across multiple servers allows you to run
without setting all the Kernel process limits up to ridiculous
values, negating their original purpose.
o Separate conflicting work types. Some types of work don't run well
together in the same DBMS server, e.g. OLTP and heavy batch, OLTP
and heavy scans (reports or MIS). There is no internal
mechanism for assigning and managing thread priorities within
the DBMS server, which can lead to undesirable interactions between
the different parts of the workload.
A very common scenario is to have a separate server for daytime
batch and reports. This prevents the batch/reports interfering
with OLTP throughput and response times.
Another possible scenario is to separate 2 OLTP systems. This
separation may help with performance analysis and tuning, and
prevent interactions between them. They each have their own set of
resources.
o Error containment. When a DBMS server crashes or decides to shut
down, all users of that server are affected. It might make sense
to partition your workload into multiple DBMS servers, so that
only part of the user population is affected by a DBMS failure.
This can be particularly useful if your workload is unstable, even
as a temporary measure while you find the real problem.
Note that currently if a DBMS connected to a shared cache fails,
then all other DBMS servers using that shared cache will fail too.
That is, the shared cache is the unit of failure.
<> 09.006 How can I direct batch or reports to a particular server?
OpenIngres supports customer-defined server classes. See section
09.018 for more information.
Ingres 6.4 does not offer such an elegant solution as OpenIngres.
At the moment, you have to bypass the Name Server to do this.
There are a number of steps:
o Set up a *private* DBMS server, that is: one the Name Server doesn't
know about. Do this by setting the DBMS startup flag 'nonames'.
No connections will be routed to this server by the Name Server.
o Store the private DBMS server ID in a Well-Known place, e.g. an
O/S file, an installation-level environment variable.
Remember this changes when INGRES is restarted. You need to
find out what it is and store it on every restart.
o Arrange that the II_DBMS_SERVER variable is set locally for batch
and reports which you want to run in the private DBMS server:
$ II_DBMS_SERVER=1234
$ export II_DBMS_SERVER
Where '1234' is the private server ID remembered earlier. This
routes the batch/report to the private server.
You will need to write some simple shell scripts to store and retrieve
the private DBMS server id. The iishutdown script looks for private
servers: you don't need to shut them down manually.
<> 09.007 How does INGRES use sort space?
If you don't do anything, INGRES will take sort space to modify tables
from the table location, and other sort space from the database home
location. For production databases this is probably not what you want.
You can tell INGRES to use sort work locations which can be set up
anywhere. See the DBA Guide for details.
There are some things the DBA Guide doesn't tell you:
o The sort workfiles are striped across the available sort locations
by the sorter. The size of the stripe is determined by the sorter
at run time.
o It follows that the input and output files of the sort phases are
striped across the same disks. So you're guaranteed disk
contention--and there's nothing you can do about it.
o Keep the sort locations about the same size. When one location
fills that work file is *full*. There may be Gbs free in the other
sort locations, that doesn't matter. The sorter just gives up on
the sort in progress and returns an error.
o Unlike some other system locations, sort locations can be changed
by resetting the relevant installation environment variables. The
sort locations are dereferenced by the DBMS server when it
starts up, so restart INGRES or at least the DBMS servers to pick
up the new sort configuration.
o Check your sort locations have been picked up like this:
$ isql iidbdb
SET TRACE POINT DM1440;
This gives a list of sort location pathnames known to the
DBMS server at the moment.
o There must be no gaps in your sequence of sort location numbers.
When the DBMS server gets to a gap it stops looking for any more.
o It follows that if sort location #1 is not defined then sort space
is allocated using the default scheme described earlier.
Note: space for temp tables is always taken from the database home
location, not from the sort locations.
<> 09.008 When does INGRES switch to a new journal?
Journals are switched automatically when you take a checkpoint. But
sometimes INGRES will switch to a new journal between checkpoints.
The journals are written by the Archiver Process (ACP), dmfacp. By
default this writes the journal in 8 kb `pages'. When it has written
1024 of these pages it will look to switch journals, i.e. after 8 Mb of
journal.
The ACP is woken up periodically to copy information from the log file
to the individual database journals. When it has completed its log
file scan and copied all it can to the journals, it then checks if the
journals need switching. So in practise the switch happens at 8 Mb + a
little bit.
In OpenIngres the journal switch point and page size are configurable.
<> 09.009 Should I use mirroring on my data locations?
Maybe.
There's no standard virtual disk package: each is implemented
differently. You need to ask your vendor this question: does your
virtual disk implementation guarantee that mirrors are in sync after a
crash or power-off?
o If the answer is yes, then it is probably safe to mirror your
data locations. The choice is then a trade-off of resilience
against performance.
o If the answer is no, then Ingres fast commit recovery will probably
leave the mirrors out of step. The same SQL query may give different
answers at different times. Avoid, or if you want to take the risk
then invest in an uninterruptable power supply. This will give you
some protection against the most common failure case.
o If the answer is fudged, it's probably best to assume a `no'.
If you do mirror your data locations, the disk I/O load on the total
system will increase. You need to configure more disks and controllers
to handle the load.
Note: the problem is that fast commit recovery algorithms have to probe
the database after a crash to see which updates made it to disk. If the
mirrors are out of step, different things will happen depending on which
mirror returns the data. There are a number of scenarios, some of which
are not very nice: duplicate rows on one mirror, missing rows on one
mirror, corrupt index on one mirror, etc.
<> 09.010 Can I use mirroring as a super-fast checkpoint?
Yes, but ONLY if you do so by modifying the checkpoint template and
using "ckpdb". This method will work for either on-line or off-line
checkpoints, and inherits all the respective limitations.
To implement this, your "mirroring" solution must be controllable from
the command line. You need to be able to reliably run a command that:
1) determines if the mirrors are all on-line and synchonized
2) disengages one of the mirror disks so it is no longer updated
3) re-engages the mirror disks
The general checkpoint template algorithms are:
1) In the BEGIN phase, assure all mirrors are on-line and
synchronized. If not, fail.
2) In the WORK phase, disengage one of the mirror disks for each
data location.
3) In the END phase, do nothing.
After the "ckpdb" command completes, you have a set of disks containing
a "copy" of your database. If you performed an off-line checkpoint,
this "copy" is perfect. If you performed an on-line checkpoint, this
copy is not perfect, but will be made perfect when you recover by
"rollforwarddb" properly applying the dump file records.
Since you probably want to re-engage your mirror disks to prepare for
the next checkpoint, you must now copy the data from the off-line
mirror set to tape. Then issue the proper command to "resync" the
mirrors. Remember this may or may not have adverse performance
implications. Some solutions will resync faster if the time spent
"off-line" is minimized.
If you use this solution, remember that from the time the ckpdb begins
until you resynchronize the mirrors, you are protected by one less
mirror set. You may want to consider a minimum of three mirror sets if
you choose to implement this strategy.
Make sure you properly modify the checkpoint template so you can
recover properly.
PLEASE NOTE! Simply using the OS or hardware to "break" a mirror
set for later backup to tape WILL NOT WORK unless Ingres
has been shut down normally. If there are any
transactions in the log file, or any unflushed cache
pages in an active DBMS server, this method will NOT
provide a "recoverable" backup.
<> 09.011 Can I mirror my log file?
In installations using FAST COMMIT (i.e. almost all Ingres
installations) committed transactions are not written through the DMF
cache to disc right away. Committed updates can exist in the cache and
the transaction log for quite some time without being written to disc.
The transaction log file is therefore a `critical single point of
failure' for an Ingres system. If fast commit is enabled and you lose
the log file, you WILL lose committed transactions. A mirrored log
file could protect against this situation.
OpenIngres supports a dual logging option to mirror the log. This
option should be exploited. See the System Reference Guide for
instructions. (Note that dual logging will only be effective if the
mirror is on an entirely separate drive.) The rest of this note
applies to Ingres 6.4 which does not have this valuable feature.
Ingres 6.4 does not support dual logging as an Ingres feature, but
mirroring can still be done. As always, there is a price: performance
in this case. If your workload is update- and commit-intensive you
will notice it most. On an extreme workload of this type (TP1) a
reduction in throughput of some 40% was measured.
But you can choose between performance and resilience on this single
point of failure. An alternative approach is to disable fast-commit,
but that will also have a severe performance penalty.
Some virtual disk packages don't support mirrors on raw disk. Check
the manuals.
If you don't mirror your transaction log and you do want to run
fast commit, you can reduce the amount of data that would be lost
in the event of a log failure by making the archiver run after
every consistency point. The default is to run after every fourth
consistency point.
<> 09.012 Can I have more than 1 DBMS page cache?
This is quite easy to arrange, but be sure it's really what you want to
do.
Simply starting multiple servers with the runrundbms utility will, by
default, start each server with its own private DMF page cache in its
internal data space. The Ingres locking system is used to maintain the
coherency of multiple caches, using a value block locking technique.
You may therefore need to configure additional lock resources.
If you want to have a number of DBMS servers using fast commit on the
same database, then you can use the shared_cache option. This
creates the DMF page cache in a shared memory space, which is then
connected to by all of the servers. There is a system of latches and
semaphores to safely coordinate their access to the common DMF page
cache. (Note that if several servers share a cache and one server
dies, all the servers attached to the same cache will immediately
die too.)
It is also possible to have several shared caches, but you must give
them different names. See the I & O Guide for details.
<> 09.013 Can I have more than 1 fast commit server?
Yes. Set -fast_commit and -shared_cache and DON'T set -sole_server.
Refer to Chapter 2 of the I&O Guide, "Using a Shared Buffer Cache" for
more information.
<> 09.014 What is the Ingres Search Accelerator?
The Ingres Search Accelerator is a hardware device that speeds
up certain types of database operation and is based on ICL's
SCAFS technology. (CAFS stands for Content Addressable File
Store and was developed by ICL for their VME mainframes. SCAFS
is the SCSI bus based version.) It is available for the ICL DRS6000
and DRS3000, the IBM RS/6000, and probably any Fujitsu box that both
it and Ingres run on.
The system works by passing a query, or part of a query, to an
intelligent disk controller. The disk controller applies this
query to the data that it has been asked to read from the disk
and only passes back the rows that match. This results in a
significant reduction in the amount of data being passed back to
the main processor giving the following benefits:
- there is reduced bus activity, thereby allowing other
system activities to run more quickly
- the CPU has fewer rows to process and therefore less work
to do, so queries execute more quickly and more CPU
cycles are available for other activities
- there is less data coming back to the CPU so Ingres data
caches are flushed less frequently
The following operators are supported by the Accelerator and
queries containing them are eligible to be "accelerated":
Operator Meaning
= equal
!=, <> not equal
< less than
> greater than
<= less than or equal, not greater
than
>= greater than or equal, not less
than
IS NULL TRUE if the column value is null
IS NOT NULL TRUE if the column value is not
null
LIKE TRUE if the column value conforms
to a match string
NOT LIKE TRUE if the column value does
not conform to a match string
Note: A character string containing a character range (eg
[A-H]) cannot be handled by the Accelerator.
The Accelerator can handle absolute dates but not date
intervals.
<> 09.015 When and how do I use the INGRES Search Accelerator?
Use of the Accelerator is completely transparent to the
application. The Query Optimizer decides whether or not to
invoke the Accelerator and there is no way, as with normal
queries, of hinting at or suggesting a plan to the Optimizer.
The Accelerator hardware can be fitted "retrospectively" and any
existing applications will benefit from performance increases
automatically.
The Accelerator is used to best effect when there are tables
that would normally be scanned rather than accessed through an
index. Typically this occurs in MIS systems where ad hoc
queries are issued by the users and the application designer
cannot determine the most effective indexing in advance.
<> 09.016 How can I find out more about the Ingres Search Accelerator?
See the "Ingres Search Accelerator Workload Trials Report"
which gives detailed performance figures for five different
real-life applications. The results presented in the report
were independently verified by management consultants KPMG, and
ranged from a worst case of 2.5 times more to a best case of 45
times more in terms of transaction throughput.
Details on the product can be had by pointing your Web browser to:
http://www.icl.co.uk/cguide/cg500009.html
<> 09.017 Why am I running out of QSF memory (E_OP0886 and E_QS0001)?
There are two possibilities: the pool is too small for your local
requirements, or the pool is too fragmented (or equivalently: your
queries are too complicated).
Unless otherwise specified, the size of the QSF (Query Storage
Facility) memory pool is estimated from the number of connected sessions
declared when the server is started. By default, the pool size is set
to 60kb+N*40kb, where N is the number of connected sessions allowed.
This formula is just a rule-of-thumb for sizing the pool. In any given
installation the type of work being done may mean this estimate is too
small.
The allowed number of connected sessions is defined using CBF in
OpenIngres, and in Ingres 6.4 by -connected_sessions in
$II_SYSTEM/ingres/files/rundbms.opt. It is possible to increase the
size of the QSF pool by adjusting the allowed number of connected
sessions upward.
A better solution might be to increase the size of the pool directly by
specifying a pool size using the -qsf.pool_size variable in the same
file (this line will usually have to be added to the file). To find
out the current pool size:
SET TRACE POINT QS501
After making the necessary changes, shut down and restart the server.
One suggestion seen on the net is to simply double the pool size;
in the absence of any better advice that suggestion is repeated here.
Note that increasing the size of the pool may not solve the problem, or
not for long anyway. Objects in the pool require contiguous chunks of
space. The pool may have enough total free space, but it may be so
fragmented that no one region is big enough. If enlarging the pool
size provides no lasting relief, there are three possible courses of
action: (i) review and simplify your searches if they are hand coded;
(ii) SET TRACE POINT QS506 to clear the pool completely, or (iii) start
another server, de-register the first one, and register the new one.
Breaking up a complex search with lots of 'ORs' into unions may help.
The rundbms.opt file and its contents are described in the Installation
and Operations manual. The behaviour of the QSF pool is described in
more detail in Advisor note US-16503.
<> 09.018 How can I assign particular jobs to particular servers?
The goal here is to have two or more database servers eg: 'LIVE_SERVER'
and 'REPORT_SERVER' running and to be able to select one or the other
when connecting to the database, eg:
sql report_database/REPORT_SERVER < report.sql &
sql live_database/LIVE_SERVER < live.sql &
The REPORT_SERVER could be running at a lower process priority than the
LIVE_SERVER (NB not a good idea to do this on the SAME database).
Existing server types exist in the following file:
II_SYSTEM/ingres/files/name/iiname.all (Unix)
II_SYSTEM:[ingres.files.name]iiname.all (VMS)
It contains something like:
INGRES local transient
COMSVR local transient
STAR local transient
NODE global
LOGIN global
LTICKET local
RTICKET local
To add your own server type add the lines:
LIVE_SERVER local transient
REPORT_SERVER local transient
Servers must be registrated with the name server manually, eg:
$ runrundbms rundbms_noname.opt
Checking INGRES installation environment...
INGRES installation is setup correctly
Starting a DBMS server (iidbms) ...
II_DBMS_SERVER = 2264
$ iinamu add REPORT_SERVER * 2264
It is possible to automate this with some judicious shell/DCL
programming. After this it will be possible to connect to the new
server as described above.
This has the distinct advantage over using II_DBMS_SERVER that the name
server is not bypassed, and so connections to remote hosts are
possible.
OpenINGRES supports the manipulation of server classes through the new
CBF (Configuration By Forms) and no manual jiggerypokery is required.
<> 09.019 Should I set II_DMFRCP_STOP_ON_INCONS_DB?
[This environment variable is apparently documented only in some
release notes for some platforms, although it is thought that this
feature is supported by all versions of OpenIngres. The following is a
posting taken nearly verbatim from comp.databases.ingres, explaining
the purpose of this environment variable. If your release notes do not
describe it then obviously you use it AT YOUR OWN PERIL. --Roy Hann]
[The] recovery algorithms are predicated on the notion that at any point
the RCP can be asked to step in and recover either a single transaction
or all transactions for a failed server. For example, if a server runs
out of resources or otherwise fails to recover a transaction, it can
ask the RCP to take over the transaction (this is known as a "pass
abort"). If the RCP fails to recover a transaction, it has nowhere
else to go for help: it is the court of last resort. If the RCP
encounters errors performing recovery, it marks the affected
database(s) inconsistent. Once a database has been marked
inconsistent, recovery from a checkpoint is necessary to ensure data
integrity.
Therefore it is critically important that we ensure that the RCP has
sufficient resources to be able to abort any transaction at any time
and to be able to recover all transactions for a server at any time.
In general, very few additional locking system resources are needed
during recovery, due to the effects of some of the Ingres recovery
protocols. Certain rare situations exist where many locks are required
to perform recovery. One such is UNDO recovery of a transaction which
has deleted rows in a btree table with overflow chains on the leaf
pages.
Currently, the locking system cannot guarantee that resources will be
available for the RCP during recovery. In the case of a pass abort, it
is possible for the RCP to deadlock with other transactions running in
the server. Either of these conditions can elicit a variety of error
messages, depending on the state of the RCP when the condition
manifests itself. The problem at Qantek seems to have been a failed
pass abort.
When a transaction is selected for UNDO recovery, it by definition has
not committed, and so it still holds all the normal transaction page
locks for data pages which the transaction updated. Since Before Image
recovery operates by replacing data pages with their before images from
the log, the data pages which are updated by Before Image recovery are
still locked by the transaction. Thus no new lock resources are needed
to restore data pages from their before images. The problem arises
when processing a pass abort which involves changes to btree leaf and
index pages. If the pass abort happened when the lock list was very
nearly full, the additional locks required to recover the leaf and
index pages may not be available. In fact, if the btree table has a
highly duplicative key, an arbitrary number of locks may be required to
lock the leaf page overflow chain. The attempt to escalate to a table
level lock may result in deadlock with transactions still running in
the server. Deadlock is also possible when attempting to lock the leaf
or index pages, even if sufficient locks are available. Any of these
situations will result in the database being marked inconsistent,
necessitating recovery from a checkpoint.
Recovery from checkpoint may not be acceptable in all situations, so an
alternate processing scheme which forces the system to shut down rather
than mark a database inconsistent is provided. To use this alternate
processing scheme, the INGRES environment variable
II_DMFRCP_STOP_ON_INCONS_DB must be set to "YES" with the ingsetenv
utiltiy. When INGRES shuts down, the open databases are left in a
state which will require the RCP to operate in REDO mode on restart.
When a transaction is selected for REDO recovery, the database
requiring REDO recovery is taken offline and no access is allowed to
the database except by the RCP (or CSP in the cluster node failure
case). REDO recovery takes advantage of this by forgoing normal
database, table, and page locking during REDO recovery. This
dramatically reduces the locking resources required to process any
transactions which must be REDOne or UNDOne. If the locking system has
been underconfigured, it is possible to reconfigure it before
restarting INGRES.
With II_DMFRCP_STOP_ON_INCONS_DB set to "YES", almost all conditions
which would normally result in a database being marked inconsistent can
be handled without rolling forward from a checkpoint. Since the RCP is
started before the DBMS servers, it has an opportunity to clean up the
open databases without risk of lock conflict.
Shutting down INGRES is a drastic measure. It should be avoided, but
in this case the alternative is worse. The best that can be done is to
minimize the frequency. To do this, avoid lock escalation. Design
applications so that table level locks, if required, are taken
initially, rather than after some number of pages of the table have
been locked. In particular, avoid escalations due to running out of
configured locks per transaction. Either configure the INGRES locking
system with enough locks per transaction to meet the needs of the
application, or modify the application to require fewer locks. Highly
duplicative keys should also be avoided.
<> 09.020 How do I recover after a fire/flood in the machine room?
1) Regularly take a valid checkpoint (on or off line) on your
production machine. Do this checkpoint to either tape or disk, but
copy it to tape so that it can be stored off-site. When disaster
strikes call tech support. They will talk you through step 2 and up.
2) Make sure your replacement system has a database by the same name
with the same locations and disk paths.
3) Shutdown Ingres on the replacement box.
4) Wipe out ALL the data for the replacement machine's default database
directory.
5) Copy the destroyed database's aaaaaaaa.cnf file from the checkpoint
tape to the default database location on the replacement system.
6) Make the jnl and dmp directories on the replacement box identical to
the destroyed machine.
7) Bring Ingres up on the replacement box.
8) Use "rollforward" with or without journals to "recover" the database
from the checkpoint.
The only thing to worry about is users, groups, and roles, that are
stored in the iidbdb.