Difference between revisions of "Recital Replication Configuration"
| Yvonnemilne  (Talk | contribs)  (→Multiple transactions) | Yvonnemilne  (Talk | contribs)   (→Recital Database Server) | ||
| Line 369: | Line 369: | ||
| == Recital Database Server == | == Recital Database Server == | ||
| − | In order for either of these services to work the Recital Database Server must be installed and running on the system publishing the data.  | + | In order for either of these services to work the Recital Database Server must be installed and running on the system publishing the data. | 
| − | + | ||
| == ODBC Data Source == | == ODBC Data Source == | ||
Revision as of 14:49, 8 June 2010
Template:YLM remove peer to peer
Contents
Recital Engine Configuration
Queue Table
In Recital replication works by storing transactions for publication in the queue table which is stored in the replication database. The replication queue table meta data is defined below.
| Column Name | Type | Width | Description | 
|---|---|---|---|
| PUBLISHER | Character | 20 | Publisher name | 
| PUBLICATION | Character | 60 | Publication number | 
| TRANID | Integer | 8 | Transaction ID | 
| DATA | Object | 8 | Transaction XML file | 
| CMDTYPE | Integer | 4 | Command type number | 
| INSERTDATE | DateTime | 8 | Insert date | 
| CMDSTATE | Character | 1 | Current transaction state | 
| TRANKEY | Character | 73 | Transaction key | 
| TRANTYPE | Character | 1 | Transaction type | 
Publisher column
This is the node name of the system that the transaction was processed on.
Publication column
This is the publication ID of the system that the transaction was processed on. This ID is specified during installation and is unique between all the systems that the replication services are installed on. It can be found in the recital.conf file and is defined by the DB_PUBLICATIONID variable. This ID should not be changed once the replication service has been started on the system as it is used to determine which system the transaction was processed on.
TranID column
The transaction ID is unique sequence number of each transaction in the queue table.
Data column
Each transaction is contained in a XML file which is stored in this column. The XML format of the transaction is explained in the XML format section.
CMDType column
This is the numeric value of the command for this transaction. A full list of supported commands and their numeric values can be found in the commands section.
InsertDate column
This contains the date and time that the transaction was performed.
CMDState column
The current state of the transaction. When a transaction is first inserted into the queue table the state is set to W. This indicates that this transaction is in a wait state. When new transactions are performed on the same row from the same table this flag is checked to see if the transaction can be optimized. When this transaction has been processed by a Slave service the state is set to P. Once a transaction's state has been set to Processed no more optimization can be performed on this row. See the section on optimization for more information.
TranKey column
A unique key to identify transactions performed on a row in a table. This key is made up of the table name and any open single index files plus the sequence of the row that this transaction was performed on.
TranType column
This is used to specify the type of transaction in the queue table. This value is always set to Q for the replication service.
Queue table indexes
The queue table contains the following indexes
| Expression | Tag Name | 
|---|---|
| CMDSTATE+TRANKEY+SYNCNUM | CMDSTATE | 
| SYNCNUM | SYNCNUM | 
XML format
Each transaction is stored in a XML file in the data column which is a Large binary object type. The XML file is defined by the following elements and attributes.
_rep_tn attribute
The RECITAL_DATAEXCHANGE element has an attribute called _rep_tn which will contain the table name for the transaction. The target name is defined when the table being replicated is opened. The format of the target attribute is dependent of how the table is opened. Tables with the same name on the same system can be replicated because the table name is prefixed to either the database name or directory path. The following table explains more about the target attribute format
| Format | Description | 
|---|---|
| databasename ! tablename | This is the format if the table is in a database. Tables being replicated this way must be located in the same database on each system. | 
| directorypath tablename | This is the format if the table is in a directory. Tables being replicated this way must be located in the same directory in each system. Directory links can be used to match the directory paths. | 
| tablename,indexname,... | If single index files are open on the table, then their name(s) are sorted in ascending order before being affixed in a comma separated list to the end of the table name. | 
RECORD content element
This element contains the record contents.
_rep_tt attribute
Each record element must contain a transaction type attribute. Transactions attributes can be one of the following types;
| Transaction Type | 
|---|
| INSERT | 
| PACK | 
| DELETE | 
| RECALL | 
| ZAP | 
| UPDATE | 
| ALTER 
 | 
_rep_wc attribute
If the _rep_tn attribute is an UPDATE then it must also contain a _rep_wc attribute. The format of this attribute is SYNCNUM=syncnum. The where condition is used when the slave replication service is performing the update on the target table.
_rep_id text element
Each record element must contain a text element called _rep_id. The text contained in this element is the unique publication ID from the system that the transaction originated from.
This ID is specified during installation and is unique between all the systems that the replication services are installed on. It can be found in the recital.conf and is defined by the DB_PUBLICATIONID variable. This ID should not be changed once the replication service has been started on the system as it is used to determine which system the transaction was processed on.
_rep_sn text element
Each record must contain a text element called _rep_sn. This value can be used to identify the row on which to perform the transaction.
Field name text elements
If the transaction type in the record element is an INSERT or UPDATE then an element for each column that is being changed will be included. Only columns that have been changed will be added, see the optimization section for more information. Each field name text element will contain a pair of data separated by a vertical bar |. For example a change to a column named balance from 350 to 125 would be formated like this;
<BALANCE>125|350</BALANCE>
All text stored in field name elements is converted to hex so that if will not conflict with any XML directives.
The following is an example XML file for the SQL command;
INSERT INTO accounts (last_name, first_name) VALUES (''Jefferson'', ''Frank'');
<_rep_tn>=''finance!accounts''</_rep_tn> <_rep_tt>=''INSERT''</_rep_tt> <_rep_id>01</_rep_id> <_rep_sn>01000000000000A2</_rep_sn> <LAST_NAME>|02A9E02</LAST_NAME> <FIRST_NAME>|AB075F1</FIRST_NAME>
SYNCNUM
When replication is enable in the Recital Engine a unique sequence number for each row in every table being replicated must be added. The first time a table is flagged for replication all rows must be updated with a sequence number, if the table contains a large number of row this process may take some time. You can also issue the recitalconvert replicate command at the OS prompt to convert all the tables in the current directory. Once it's been converted, the table is flagged so this process will not be done again. All new rows inserted after this point will automatically contain their own unique sequence number. For the life of a table these numbers will never be duplicated.
Optimization
There are several specific optimization procedures enabled for the replication process.
Record buffer
When replication is enabled on a table, a before image record of the row is kept. After each update on the row, this image is compared with the new data and if they are both the same the transaction is not added to the queue. This prevents multiple updates on the same row with the same data being added to the replication queue.
XML attributes
When replication is enabled on a table, a before image record of the row is kept. If the transaction type in the record element is an INSERT or UPDATE than an element for the columns are included in the XML file. Only columns that are different between the before image record and the new data are added to the transaction.
The XML file that contains the replication transaction will only contain data being changed, so generally it will be smaller. This will help with storage size of the queue tables and network traffic from the slave subscription service.
Multiple transactions
The queue table contains a column used to store the current transaction state. When the first transaction is added for a row on a table, this state is set to wait. It will remain this way until a slave service processes the transaction and sets the state of the transaction to Processed. When new UPDATE transactions are added to the queue, existing transactions for that row which are in a wait state are searched for. If a transaction is found for that row in a wait state and the transaction is an INSERT or UPDATE then the new transaction is merged in. If the column is not found in the current XML file it will be added. If the column is already in the XML then the data is updated with the new data.
If the transaction that is found is an INSERT then it will remain as an INSERT transaction and the data from the UPDATE transaction will be merge into it. In this case the transactions queue will have no record of an UPDATE being done.
Statements
Transactions that are performed by the Recital Database Engine are stored in the queue table for publication. The following table is a list of statements and commands that currently supported for replication.
| Statement | Type Number | 
|---|---|
| APPEND | 5042 | 
| DELETE | 5022 | 
| EDIT | 5082 | 
| INSERT | 5042 | 
| MERGE | 5042 | 
| PACK | 33 | 
| ZAP | 82 | 
| REPLACE | 5082 | 
| RECALL | 36 | 
| READ | 5082 | 
| BROWSE | 5082 | 
| SQL INSERT | 4042 | 
| SQL UPDATE | 5082 | 
| SQL DELETE | 5022 | 
Enabling Replication
IMPORTANT You must add the directory path or database name that contains the tables you want to replicate into the replication.allow file. Only tables that match in this file and are not matched in the replication.deny file will be replicated.
In order for transactions to be published in the queue table, replication must be set on first. This can be done by adding the SET REPLICATION ON command into a local or systemwide configuration file. The config.db stored in the conf directory, which is in the root recital installation directory, is used for systemwide settings, or you can update the config.db in the local directory. You may also add the command to any 4GL program file.
When replication is turned on all tables already open and all tables opened afterwards are flagged for replication. You may turn replication off to disable replication, but if you want to filter out tables from replication it's better to use the allow and deny access control lists. Index files associated with the tables are also updated. Multiple index tag files are handled automatically. Single index files are sorted by name and then added onto the table name when stored with the transaction.
To turn replication on you must have installed the Replication service on the system already. If this hasn't been done then a You must configure the replication service first with 'recitalreplication rrs' error will be returned. 
The first time a table is flagged for replication all rows must be updated with a sequence number, if the table contains a large number of row this process may take some time. Once it's been the table is flagged so this process will not be done again. All new rows inserted after this point will automatically contain their own unique sequence number. For the life of a table these numbers will never be duplicated.
Access Control Lists
You must add the directory path or database name that contains the tables you want to replicate into the replication.allow file. Only tables that match in this file and are not matched in the replication.deny file will be replicated.
If you wish to limit which tables are replicated you may do this with access control lists. Access control lists are defined in two files stored in the conf directory which is in the root recital installation directory. Comment lines can be added to these file by starting the line with # symbol. The access control specification is "directory path" or [database name!]tablename, where an optional database name may be specified. Wild cards may also be specified with ''*'' for all text matches and ''%'' for single character matches. Name expansion from environment variables can also be used.
| Example | Description | 
|---|---|
| southwind!* | This would include all tables in the southwind database. | 
| /tmp/* | This would include all tables in the directory "/tmp/" | 
| ${DB_TMPDIR}/* | This would include all tables in the directory expanded from the environment variable DB_TMPDIR | 
replication.allow
This file describes the names of the tables which will be replicated if REPLICATION is SET ON. If this list is not empty then only tables matching will be replicated.
If this file is empty then no tables will be replicated even if SET REPLICATION is ON.
replication.deny
This file describes the names of the tables which will NOT be replicated if REPLICATION is SET ON. If this list is not empty then any tables matching will be not be replicated.
Replication Services Configuration
There are two replication services that can be used in the Recital, a subscriber and a publisher.
Subscriber
The subscriber service is used by any system that needs to update its databases from published data. Each slave needs to start a subscriber service to connect to the published data on the Master server.
Publisher
The publication service is only used in a Peer to Peer configuration and only the master publisher would run this service. This service will retrieve the data from the replication queue tables on each peer. This service checks for conflicts before publishing the data for subscriber services.
Recital Database Server
In order for either of these services to work the Recital Database Server must be installed and running on the system publishing the data.
ODBC Data Source
In order for the replication services to connect to the Recital Database Server you must define a ODBC data source in the odbc.ini file. The Data Source name format is Recital Replication Service on plus the node name. The Driver name is Recital and the Database format is ODBC:RECITAL: The following options can also be added;
ServerName
This is the publishers server name that has the Recital Database Server running.
UserName
This is the username to used to login on the publisher server. This is optional here as it can be specified in the argument command line when starting the service.
Password
This is the password for the username used to login on the publisher server. This is optional here as it can be specified in the argument command line when starting the service.
Database
This is the default data database that will opened when the service has connected to the Recital Database Server. This should be set to replication
Exclusive
This specifies how the replication database is opened and should be set to false so it's opened shared.
odbc.ini example
[Recital Replication Service on jazz] Driver = Recital DATABASE=ODBC:RECITAL:SERVERNAME=jazz.recital.local;USERNAME=replication;PASSWORD=recital;DATABASE=replication;EXCLUSIVE=FALSE
Initial data setup
Before starting the replication services, the best way to initially load the data from the master server onto the peers or slaves is to use the Recital recitaldump and recitalrestore commands:
# recitaldump -d /directory/path
recitalreplication
The replication services are administrated with the recitalreplication command. See recitalreplication for full details.
Master Slave Configuration
A Master Slave configuration in Recital works with the published data stored in the queue table of the replication database. The Recital engine on the master system (the source of the database changes) writes all updates, inserts, deletes, recalls, packs and zaps into the queue table of the replication database. These transactions are stored in a XML format in the queue table.
Slaves are configured to subscribe to the master and to execute the published events on the slave's local databases. The Master is dumb in this scenario. Once replication has been enabled, all statements are published in the replication queue. If required, you can configure the Master to only publish events that apply to particular databases or tables
Each subscribed slave will get a copy of the changes published since the last time it connected. Slaves keep a record of the position within the published queue that they have read and processed. This means that multiple slaves can be connected to the master and executing different parts of the same published data. Because the slaves control this process, individual slaves can be connected and disconnected from the server without affecting the master's operation. Also, because each slave remembers its position within the published queue, it is possible for slaves to be disconnected, reconnect and then 'catch up' by continuing from the recorded position. 
Both the master and each slave must be configured with a unique id. In addition, the slave must be configured with information about the master host name, RTQ database name and position within that file. 
Starting MSR
You should start the Master Slave Replication processes in the following order:
Starting Master
On the system specified as the Master you must start the Recital Database Server first with the command;
# recitalreplication start
Starting Slaves
On the systems specified as Slaves you must start the Subscriber Service with the command:
# recitalreplication start
Checking status
Once you have started the subscriber services on the Slaves you can check their current status with the command:
# recitalreplication status
An example out put for the Slave Subscriber Service would be:
Subscriber service 'port058' is running on a 60 second delay. Connected to publisher 'jazz' in a Master Slave configuration. Current transaction queue size is 0 Last transaction processed was 0 at Remaining transactions in queue None
Processing Transactions
The subscriber service will sleep the specified number of seconds between each set of transaction. Then it will connect to the master publisher and retrieve all published data since it last connected. For each transaction that it processes it will attempt to open the table in the required mode. For ZAP and PACK transactions it must open the table exclusively, for all others it will open it shared. If it can't find the table specified via the path or can't open it in the required mode an error will be returned.
When the subscriber has to perform changes to existing rows in tables it will use the sync number specified in the transaction to locate the row in the specified table. As all sync number as ascending in order it uses a algorithm to locate the row by truncating the search into sections.
Maintenance
Clearing the data in the queue that has been replicated.
