Handling damaged files

From NEOSYS Technical Support Wiki
Revision as of 12:59, 4 July 2012 by Vineet

Symptoms

A Damaged file has been found.....
Group format Error:
OS File : BOOKING.TEXTS
Group# : 00000932
Record length incorrect or frame linkage error.

Cause

A rare but nasty problem on the server, 99% of the time caused by a power failure, by NEOSYS processes being killed, or by a server hardware or operating system failure. Questions to ask:

  • Did somebody kill NEOSYS processes, to reboot the server or for any other reason?
  • Was there a power cut?
  • Were there any expected or unexpected reboots in the server log? (see Event Viewer > System Log)
  • Is there enough free disk space, especially on servers not dedicated to NEOSYS?
  • Is third-party software running on the NEOSYS server?
  • Is RAID installed on the server?
  • Did a Windows update run?
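The free disk space question can be checked on the server itself. The sketch below uses Python's standard shutil.disk_usage; the function names are hypothetical and the 10% threshold is an assumed figure for illustration, not a NEOSYS requirement.

```python
import shutil

# ASSUMPTION: flag a drive with less than 10% free space.
# NEOSYS does not define this figure; adjust it to local policy.
MIN_FREE_FRACTION = 0.10


def free_fraction(total: int, free: int) -> float:
    """Return the fraction of the disk that is free (0.0 to 1.0)."""
    if total <= 0:
        raise ValueError("total size must be positive")
    return free / total


def check_disk(path: str, min_free: float = MIN_FREE_FRACTION) -> bool:
    """Return True if the drive holding `path` has at least `min_free` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    fraction = free_fraction(usage.total, usage.free)
    print(f"{path}: {usage.free / 2**30:.1f} GiB free ({fraction:.0%})")
    return fraction >= min_free
```

For example, check_disk("D:\\") checks the drive that holds D:\NEOSYS\NEOSYS\ in the example messages on this page.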

Solution

When a damaged file is detected, the currently executing program/user request is terminated immediately and the process restarts. Some damaged files are automatically fixed during the restart. Although the process "restarts", it may be internally corrupted in some way, so it is advisable to close it completely and start another process, or allow another process to start automatically.
However, the solution always depends on the cause and usually requires aggressive interrogation to discover the cause.

Guidelines:

  1. Identify the cause of the damaged file
  2. Depending on the type of damaged file, follow the relevant procedure below to handle it.
  3. Verify that the damaged file has been fixed in the database by following the instructions at Checking for corrupt database files.

Recovery

It is URGENT to stop using NEOSYS and fix the damaged file, because it will cause NEOSYS to crash every time it happens to update that part of the file. NEOSYS crashing while updating often produces unrecoverable inconsistencies in the database that can only be fixed by programmers spending hours editing the database, or by restoring from a backup, causing loss of work.

See the following sections for how to patch the damaged files.
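As a quick reference, the routes described in the sections below can be condensed into a lookup from damaged file name to recovery action. This is a hypothetical Python helper for support notes only: it returns advice text and does not run any NEOSYS command. Any file it does not recognise falls through to FIXFILE, which still needs the checks and client agreement described under FIXFILE.

```python
# Hypothetical triage helper summarising this page's guidance.
# It only returns advice text; it does not run any NEOSYS command.

AUTO_FIXED = {"LISTS", "LOCKS", "!INDEXING"}
RECREATE_COMMANDS = {
    "ADS": "CREATEADS",
    "ANALYSIS": "CREATEANALYSIS",
    "BRANDS": "CREATEBRANDS",
}


def recovery_action(filename: str) -> str:
    """Return the recovery route for a damaged NEOSYS file name."""
    name = filename.strip().upper()
    if name == "DEFINITIONS":
        return "DO NOT use FIXFILE: partial or full restore"
    if name in AUTO_FIXED:  # checked before the generic "!" index rule
        extra = " then REINDEXALL out of hours" if name == "!INDEXING" else ""
        return "auto-fixed on process startup" + extra
    if name == "STATISTICS":
        return "rebuilt automatically"
    if name in RECREATE_COMMANDS:
        return f"stop processes, F5 {RECREATE_COMMANDS[name]}"
    if name.startswith("!"):
        return "index file: run the matching REINDEX command"
    return "FIXFILE candidate: inform client, stop the dataset first"
```

Temporary files such as MATDEL_MEDIALIN_3 are not recognised by name here, since the page gives no naming rule for them.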

Files that are automatically fixed

  • LISTS
  • LOCKS
  • !INDEXING

The above files are automatically checked for corruption (and initialised to empty if damaged) whenever a NEOSYS process starts up, and do not require any manual intervention apart from locating and eliminating the initial cause of the damage.

The !INDEXING file is a temporary file that should not hold any records for longer than it takes to update the various indexes when a user saves a document. If it was damaged, you should also run REINDEXALL to reinstate any possibly missing index entries. Since this takes a long time, do it when everybody is off the system, probably in the evening. A few missing index entries may only mean some records missing from some reports, which may or may not be seriously inconvenient.

Each process has its own LISTS file, so damage to the LISTS file is limited to one process. The process number can be seen in the error message, for example:

Process=7

You can also see the process number on the server screen. For example, here is process 2:

13:17:31 TESTDB NEOS0002 0 Listening 7 hours, 49 mins ...
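Because LISTS damage is limited to one process, it helps to read the process number reliably. A minimal Python sketch, assuming the NEOSnnnn column on the server screen and the Process=n field in error messages always look like the examples on this page (NEOS0002 corresponds to process 2); the function names are hypothetical.

```python
import re
from typing import Optional

# ASSUMPTION: formats copied from the examples on this page.
SCREEN_PATTERN = re.compile(r"\bNEOS(\d{4})\b")
MESSAGE_PATTERN = re.compile(r"\bProcess=(\d+)\b")


def process_from_screen(line: str) -> Optional[int]:
    """Process number from a server screen line such as '... NEOS0002 ...'."""
    match = SCREEN_PATTERN.search(line)
    return int(match.group(1)) if match else None


def process_from_message(text: str) -> Optional[int]:
    """Process number from the 'Process=n' field of an error message."""
    match = MESSAGE_PATTERN.search(text)
    return int(match.group(1)) if match else None
```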

The LISTS file is a temporary file for sorting records into order for reports, popups etc. Therefore any damage usually causes no real inconvenience apart from the failed report or popup on which the user might receive the "damaged LISTS file" message.

The error message appears as follows:

SYSTEM ERROR:
An internal file LISTS was found
to be damaged and has been patched.
INFORM YOUR TECHNICAL SUPPORT
SOME DATA HAS PROBABLY BEEN LOST AND THE NEOSYS SYSTEM MAY MALFUNCTION
Server=NEOSYS-SERVER Install=D:\NEOSYS\NEOSYS\ Version=17:16:04 23 JAN 2011 Database=1D68EE63 ADLINED Process=2

The error could also appear as follows:

A DAMAGED FILE HAS BEEN FOUND.
YOU MUST GIVE THE FOLLOWING MESSAGE TO TECHNICAL SUPPORT NOW.
DO NOT CONTINUE USING THE SYSTEM!
 Group Format Error:
 File    : LISTS
 Group # : 00000014
 Record length incorrect or frame linkage error.

Server=NEOSYS-SERVER
Install=D:\NEOSYS\NEOSYS\
Version=17:16:04 23 JAN 2011
Database=1D68EE63 ADLINED
Process=2

OR

A DAMAGED FILE HAS BEEN FOUND.
YOU MUST GIVE THE FOLLOWING MESSAGE TO TECHNICAL SUPPORT NOW.
DO NOT CONTINUE USING THE SYSTEM!
 Group Format Error:
 File    : STATISTICS
 Group # : 00000005
 Record length incorrect or frame linkage error.
 114 files checked.
 Failed with 1 error
 !!! DANGER !!!
 !!! DAMAGED FILES HAVE BEEN FOUND IN THE NEOSYS SYSTEM !!!
 !!! BACKUP NOT DONE !!!

Note: The STATISTICS file contains information about user activity (as shown in the Support menu), so loss of its data is not critical, and it will be rebuilt automatically.

Errors related to temporary files might also appear, as follows:

A DAMAGED FILE HAS BEEN FOUND.
YOU MUST GIVE THE FOLLOWING MESSAGE TO TECHNICAL SUPPORT NOW.
DO NOT CONTINUE USING THE SYSTEM!

 Group Format Error:
 File    : MATDEL_MEDIALIN_3
 Group # : 13369375
 Wrong Overflow frame linked to group.

Server=NEOSYS-SERVER
Install=D:\NEOSYS\NEOSYS\
Version=11:26:27  26 JUL 2010
Database=23BF2931 MEDIALIN
Process=3
Client=100_100_100_77
User=HAMADA
Request=MEDIADIARY
@Id=
Data=

Since these are temporary files, there is no need to use FIXFILE; they are cleared automatically.
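When collecting damaged-file reports from users or backup emails, a small parser can pull out the file name, group number and the KEY=value context lines. This is a hypothetical Python sketch that assumes the multi-line layout of the examples on this page ("File" or "OS File", "Group #" or "Group#", then Server=, Process=, User= and so on); other NEOSYS versions may format the message differently.

```python
import re
from typing import Dict

# ASSUMPTION: labels and KEY=value lines as in the examples on this page.
LABEL_PATTERN = re.compile(r"^\s*(OS File|File|Group\s*#)\s*:\s*(\S+)")
KEYVALUE_PATTERN = re.compile(r"^\s*([A-Za-z@]+)=(.*)$")


def parse_damage_report(text: str) -> Dict[str, str]:
    """Extract File, Group and KEY=value fields from a damaged-file message."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        label = LABEL_PATTERN.match(line)
        if label:
            # "File" and "OS File" both name the damaged file.
            key = "File" if "File" in label.group(1) else "Group"
            fields[key] = label.group(2)
            continue
        pair = KEYVALUE_PATTERN.match(line)
        if pair:
            fields[pair.group(1)] = pair.group(2).strip()
    return fields
```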

Fixing damaged files that have a process to recreate them

The following files can be recreated from maintenance mode.

ADS file can be recreated with F5 CREATEADS

ANALYSIS file can be recreated with F5 CREATEANALYSIS

BRANDS file can be recreated with F5 CREATEBRANDS

If the file is actually damaged, it might be necessary to initialise it first using CLEARFILE filename.

The following is an example of the error reported for the ADS file during backup.

 Group Format Error:
 OS File : ADS
 Group # : 00009543
 Record length incorrect or frame linkage error.
 217 files checked.
 Failed with 1 errors
 !!! DANGER !!!
 !!! DAMAGED FILES HAVE BEEN FOUND IN THE NEOSYS SYSTEM !!!
 !!! BACKUP NOT DONE !!!
 Error while writing data.
 You have attempted to write to a read-only file.
                     - or -
 access to the file has been denied by the operating system.
 (operating system file name: "ADS")

Explanation: The last four lines are very rare and can appear together with the above error messages in the backup email. They indicate that some external application was trying to access the backup files at the time of the backup.

Before applying the solution below, check the number of damaged files in the database by running CHK.FILES in maintenance mode. (CHK.FILES can be run while users are online.)

Solution

NOTE: Stop all the processes before carrying out the procedure below. (See Closing NEOSYS Services for instructions.)

The procedure to run CREATEADS, CREATEANALYSIS or CREATEBRANDS is as follows:

Log in to maintenance mode

F5
CREATEXXXX

(XXXX = ADS, ANALYSIS OR BRANDS)

Select the appropriate option, start the procedure and press ENTER.

To verify that the above completed successfully and that the damaged file is fixed in the database, run CHK.FILES or (recommended) CHK.FILES FILENAME, as described in Checking for corrupt database files.

Fixing damaged index files (names starting with !)

Whilst backing up or using NEOSYS, the following error message might appear:

A Damaged file has been found.....
Group format Error:
OS File : !ADS
Group# : 00000932
Record length incorrect or frame linkage error.

To fix it:

  1. Ensure that no-one is using NEOSYS.
  2. Log into NEOSYS on the server.
  3. Press F5 to get a TCL command box.
  4. Type REINDEXADS and press ENTER.
  5. Wait until it finishes and you get back to the command box.
  6. Type OFF in the command box to quit NEOSYS.

Note: You can save time when reindexing very large files by rebuilding only selected indexes instead of all the indexes for the file. After you enter REINDEX filename, it presents a list of the file's indexed fields from which you can select the field(s) that need to be reindexed.

Other reindex commands are:

  • REINDEXACCOUNTS recreates !ACCOUNTS, !BATCHES and !VOUCHERS
  • REINDEXADDRESSES
  • REINDEXADS
  • REINDEXBOOKINGS recreates !BOOKING.ORDERS
  • REINDEXBRANDS
  • REINDEXCLIENTS
  • REINDEXINVS recreates !INVOICES
  • REINDEXJOBS
  • REINDEXMATERIALS
  • REINDEXPLANS
  • REINDEXPRODINVS recreates !PRODUCTION.INVOICES
  • REINDEXPRODORDERS recreates !PRODUCTION.ORDERS
  • REINDEXRATECARDS
  • REINDEXSCHEDULES
  • REINDEXSUPPLIERS
  • REINDEXTIMESHEETS
  • REINDEXVEHICLES
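To go from a damaged index file to the command that rebuilds it, the mappings stated above can be kept in a table. Where this page does not name the index file (for example REINDEXCLIENTS), the sketch assumes that !NAME is rebuilt by REINDEXNAME; that fallback is an assumption, not documented NEOSYS behaviour, and should be checked before use. The function name is hypothetical.

```python
from typing import Optional

# Index files named explicitly on this page.
EXPLICIT = {
    "!ACCOUNTS": "REINDEXACCOUNTS",
    "!BATCHES": "REINDEXACCOUNTS",
    "!VOUCHERS": "REINDEXACCOUNTS",
    "!ADS": "REINDEXADS",
    "!BOOKING.ORDERS": "REINDEXBOOKINGS",
    "!INVOICES": "REINDEXINVS",
    "!PRODUCTION.INVOICES": "REINDEXPRODINVS",
    "!PRODUCTION.ORDERS": "REINDEXPRODORDERS",
}

# Individual reindex commands listed on this page.
COMMANDS = {
    "REINDEXACCOUNTS", "REINDEXADDRESSES", "REINDEXADS", "REINDEXBOOKINGS",
    "REINDEXBRANDS", "REINDEXCLIENTS", "REINDEXINVS", "REINDEXJOBS",
    "REINDEXMATERIALS", "REINDEXPLANS", "REINDEXPRODINVS", "REINDEXPRODORDERS",
    "REINDEXRATECARDS", "REINDEXSCHEDULES", "REINDEXSUPPLIERS",
    "REINDEXTIMESHEETS", "REINDEXVEHICLES",
}


def reindex_command(index_file: str) -> Optional[str]:
    """Return the reindex command for a damaged '!' file, or None if unknown."""
    name = index_file.strip().upper()
    if name in EXPLICIT:
        return EXPLICIT[name]
    # ASSUMPTION: otherwise !NAME is rebuilt by REINDEXNAME, if that command exists.
    guess = "REINDEX" + name.lstrip("!")
    return guess if guess in COMMANDS else None
```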

REINDEXMEDIA runs all of the following:

  • REINDEXPLANS
  • REINDEXSCHEDULES
  • REINDEXADS
  • REINDEXMATERIALS
  • REINDEXBOOKINGS

REINDEXFILES runs all of the following:

  • REINDEXCLIENTS
  • REINDEXBRANDS
  • REINDEXSUPPLIERS
  • REINDEXVEHICLES
  • REINDEXRATECARDS

REINDEXPRODUCTION runs all of the following:

  • REINDEXJOBS
  • REINDEXPRODORDERS
  • REINDEXPRODINVS
  • REINDEXTIMESHEETS

REINDEXALL runs all of the following:

  • REINDEXMEDIA
  • REINDEXPRODUCTION
  • REINDEXFILES
  • REINDEXINVS
  • REINDEXACCOUNTS
  • REINDEXADDRESSES
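The composite commands nest (REINDEXALL includes REINDEXMEDIA, REINDEXPRODUCTION and REINDEXFILES), so it can be useful to see exactly which individual commands a composite covers. A small recursive expansion over the lists on this page, written in Python for illustration only:

```python
from typing import Dict, List

# Composite reindex commands as listed on this page.
COMPOSITES: Dict[str, List[str]] = {
    "REINDEXMEDIA": ["REINDEXPLANS", "REINDEXSCHEDULES", "REINDEXADS",
                     "REINDEXMATERIALS", "REINDEXBOOKINGS"],
    "REINDEXFILES": ["REINDEXCLIENTS", "REINDEXBRANDS", "REINDEXSUPPLIERS",
                     "REINDEXVEHICLES", "REINDEXRATECARDS"],
    "REINDEXPRODUCTION": ["REINDEXJOBS", "REINDEXPRODORDERS",
                          "REINDEXPRODINVS", "REINDEXTIMESHEETS"],
    "REINDEXALL": ["REINDEXMEDIA", "REINDEXPRODUCTION", "REINDEXFILES",
                   "REINDEXINVS", "REINDEXACCOUNTS", "REINDEXADDRESSES"],
}


def expand(command: str) -> List[str]:
    """Return the individual reindex commands that `command` covers, in order."""
    if command not in COMPOSITES:
        return [command]
    result: List[str] = []
    for part in COMPOSITES[command]:
        result.extend(expand(part))  # composites may contain composites
    return result
```

Expanding REINDEXALL gives 17 distinct commands, i.e. every individual reindex command listed on this page, which is why it takes a long time and is best run when everybody is off the system.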

Fixing damaged files that have no process for recreating them

Some damaged files can be rebuilt using NEOSYS utilities, especially index files with filenames starting with !. Other damaged files, like LOG20xx, can be “fixed” as follows to prevent NEOSYS crashes, but some records will most probably be lost. Depending on the file, the loss of a few records may not be too bad compared to restoring a database, but the client should be informed and should make the decision. Depending on the file, it is usually best to lose the records. We can make a list of files that can afford to lose records and of files that can be rebuilt programmatically from other files. Booking order texts can lose records, but inform the client, especially of the numbers mentioned by FIXFILE.

SHADOW is a type of log file that can lose some records without serious impact and can therefore also be fixed with FIXFILE.

FIXFILE will not run while NEOSYS services are still running, so shut down the processes for the dataset in question before running the FIXFILE command.

NOTE: If multiple datasets are running on the server, shut down only the dataset that has the problem. (See Closing NEOSYS Services for instructions.)

If you need to close processes during the day, first check whether any users are actively working on the dataset in question. If so, email the clients asking them to log off until the process is complete. If no users are actively working, you may proceed to complete the fix quickly, before users log in, so that they are not disturbed during their daily work.

Press F5

FIXFILE filename

(Select option 1 to update the file)

If you get any overwrite messages, type YES, as these are the actual files being updated.

The only alternative is to restore from a previous backup, in this case losing a day's work. This is nasty, and even worse, it will be impossible to reproduce the day's work, especially the exact serial numbers of critical documents like orders, invoices, receipts and payment advices.


Using FIXFILE to repair corrupted files

BEWARE: FIXFILE causes loss of data. If the data is critical, then using FIXFILE can be totally catastrophic, without any warning. In such situations, talk to NEOSYS technical staff first; they may be able to provide a proper fix, or at least a better assessment of the impact of the data loss, before FIXFILE is used.

Please Note:

  1. Using FIXFILE on some files is a guaranteed way to cause complete chaos for the users without them being aware of it until it is too late to do anything about it.
  2. Corrupt files will cause programs to abort every time they try to update that section of the file. Aborting programs causes possibly horrendous inconsistency in the database so it is important to stop all user activity and fix the corrupt file immediately.
  3. FIXFILE can technically be run on any file, but because it works by deleting any problematic data, it cannot realistically be used on some files at all.
  4. FIXFILE cannot and does not recover any corrupted records. Therefore, for files that have specific programs to rebuild them, FIXFILE is not the correct option.


Why use FIXFILE at all since it causes loss of data?

  1. Many files do not have programs to rebuild them, and for some of those files, using FIXFILE is a quick way to get the system back in operation BUT with missing records. Missing records cause various degrees of failure, ranging from the odd missing document, to totally misleading information like wrong account balances, to any kind of system failure.
  2. FIXFILE does not fix the consistency of information in the database. When the database is being updated with a transaction, usually a series of files are updated for the one transaction. A damaged file causes the update program to abort in the middle of the series of updates, so some of the updates are done and the remaining updates are not. This means that the data in the database (even after fixing the corrupt file) is inconsistent.
  3. FIXFILE merely fixes the technical structure of the file so that further program accesses to that portion of the file do not cause immediate program termination and the consequent, likely catastrophic, database inconsistency.

IMPORTANT: You must prevent anyone from accessing NEOSYS while FIXFILE is running.

List of Files Impacted by FIXFILE

  1. USERS File: The main data of any records lost from the USERS file will be recreated whenever the Authorisation File is saved. However, only the information in the Authorisation File is recreated, which means that all personal preferences, holiday records, login history etc. will be lost.

Fixing Damaged DEFINITIONS file

Non-Solution

You will probably not be able to log in if the DEFINITIONS file is damaged. You can press Alt+S and type EXECUTE to get a TCL command box, but this still doesn't help because you cannot use FIXFILE on DEFINITIONS.

CAUTION: DO NOT USE FIXFILE TO TRY AND FIX THE DEFINITIONS FILE.

DOUBLE CAUTION: IF YOU DECIDE TO IGNORE THIS, EXPECT ALL KINDS OF HORRIBLE PROBLEMS IN THE DATABASE, REQUIRING A COMPLETE RESTORE FROM A GOOD BACKUP AFTER A FEW DAYS OF PAIN, BECAUSE RANDOM OLD DATABASE UPGRADES/CONVERSIONS WILL BE PERFORMED *AGAIN* WITHOUT YOU KNOWING IT.

Partial Restore

You can restore recent copies of the DEFINITIONS.LK and DEFINITIONS.OV files in the DATA\datasetcode\GENERAL folder, for example from last night's backup or from testdata.

CAUTION: All changes made in the Configuration File, Authorisation File, Users File and various other changes will be reverted to the date of the DEFINITIONS files being restored. If you are restoring from last night's backup or testdata, this will not be much, but it could cause serious confusion to people who have changed their passwords etc.
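A partial restore amounts to copying the two DEFINITIONS files from a good copy of the dataset's GENERAL folder over the live ones. The sketch below is a hypothetical Python helper; the two folder arguments stand for DATA\datasetcode\GENERAL in the backup (or testdata) and in the live installation, and it assumes all NEOSYS processes for the dataset have already been stopped.

```python
import shutil
from pathlib import Path
from typing import List

# The two files named on this page for a partial restore.
DEFINITIONS_FILES = ("DEFINITIONS.LK", "DEFINITIONS.OV")


def restore_definitions(backup_general: Path, live_general: Path) -> List[Path]:
    """Copy DEFINITIONS.LK/.OV from a backup GENERAL folder into the live one.

    Both arguments are hypothetical DATA\\datasetcode\\GENERAL folders.
    Run only while all NEOSYS processes for the dataset are stopped.
    """
    sources = [backup_general / name for name in DEFINITIONS_FILES]
    missing = [str(p) for p in sources if not p.is_file()]
    if missing:
        # Copy both or neither, since this page restores them together.
        raise FileNotFoundError(f"missing in backup: {missing}")
    copied = []
    for src in sources:
        dest = live_general / src.name
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied
```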

Full Restore

Restore from your latest backup losing all work done since that time and date.

Preventing users from accessing NEOSYS in order to do maintenance or repairs

On most systems you can simply stop the NEOSYS processes. On other systems, where NEOSYS processes are started automatically in response to requests from the web, you need to disable the autostart by one of the following methods:

  1. Stop the autostart program/script
  2. Put a file called NET.CFG with a single line saying AUTOSTART=NO in the neosys\neosys folder
  3. Stop the IIS server
  4. Stop/Pause the particular web in question
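Method 2 can be scripted. A minimal Python sketch, assuming neosys_dir points to the neosys\neosys folder (D:\NEOSYS\NEOSYS\ in the example messages on this page) and assuming that deleting the NET.CFG file afterwards re-enables autostart; this page does not say how to undo the setting, so confirm that before relying on it. The function names are hypothetical.

```python
from pathlib import Path


def disable_autostart(neosys_dir: Path) -> Path:
    """Create NET.CFG containing the single line AUTOSTART=NO."""
    cfg = neosys_dir / "NET.CFG"
    if cfg.exists():
        # Don't silently overwrite a NET.CFG that may hold other settings.
        raise FileExistsError(f"{cfg} already exists; edit it by hand")
    cfg.write_text("AUTOSTART=NO\n")
    return cfg


def enable_autostart(neosys_dir: Path) -> None:
    """ASSUMPTION: removing NET.CFG restores autostart.

    Only removes the file if it holds nothing but AUTOSTART=NO.
    """
    cfg = neosys_dir / "NET.CFG"
    if cfg.exists() and cfg.read_text().strip() == "AUTOSTART=NO":
        cfg.unlink()
```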

Dealing with damaged files that cannot be fixed

If programmers are unable to fix the damage in a reasonable period of time, and the damage was caused by matters beyond NEOSYS control such as non-dedicated servers and power failures, then the data must be restored and the client advised why they are going to suffer loss of data.

THIS DOES NOT RELEASE NEOSYS SUPPORT STAFF FROM THE OBLIGATION TO PROACTIVELY INFORM THE CLIENT OF MEASURES TO PREVENT THE RECURRENCE OF THE EVENT.

The following are the steps to follow:

  1. Ascertain the cause of the damage to the database by looking at the internal logs
  2. Create a new database as a copy of the damaged one, since it contains the up-to-date (but damaged) data.
  3. Restore the database from the backup media so that the database is in the last known good state.
  4. From the copy of the damaged database, the client can see what work was done over the past couple of days, which will help them redo that work in the restored, out-of-date main database.
  5. Next, discuss the cause of the damage with the client and (a) what to do to prevent the damage occurring again, and (b) agree a procedure for performing a manual backup in case the automatic overnight backup is not performed for any reason.