Handling damaged files
Latest revision as of 13:56, 15 March 2018
Symptoms
"A Damaged file has been found..... ((the rest of this message to be completed)) Group format Error: OS File : BOOKING.TEXTS Group# : 00000932 Record length incorrect or frame linkage error.
Cause
This is a rare but serious problem on the server. In about 99% of cases it is caused by a power failure, by NEOSYS processes being killed, or by a server hardware or operating system failure.
- Did somebody kill NEOSYS processes, to reboot the server or for any other reason?
- Was there a power cut?
- Are there any expected or unexpected reboots in the server log? (see Event Viewer > System Log)
- Is there enough free disk space, especially on servers not dedicated to NEOSYS?
- Is third-party software running on the NEOSYS server?
- Is RAID installed on the server?
- Did a Windows update happen?
Solution
When a damaged file is detected, the currently executing program/user request is terminated immediately and the process restarts. Some damaged files are fixed automatically during the restart. Although the process "restarts", it may be internally corrupted in some way, so it is advisable to close it completely and start another process (or allow one to start automatically).
However, the solution always depends on the cause, and finding the cause usually requires persistent questioning.
Guidelines:
- Check for damaged files to confirm that the file is in fact damaged.
- If a damaged file is found, email the users about the downtime required to fix it. When emailing large clients, support MUST use less alarming language (unless alarming them is specifically intended), because scaring large clients may be counter-productive.
- Identify the cause of the damaged file and create a report.
- Depending on the type of damaged file, follow the appropriate procedure below.
- Verify that the damaged file has been fixed by following the instructions at Checking for corrupt database files.
Recovery
It is URGENT to stop using NEOSYS and fix the damaged file, because NEOSYS will crash every time it tries to update that part of the file. NEOSYS crashing during an update often produces unrecoverable inconsistencies in the database, which can only be fixed by programmers spending hours editing the database, or by restoring from a backup and losing work.
See the following sections for how to patch damaged files.
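As a quick triage aid, the choice of section can be expressed as a lookup from the damaged file name. The following is a minimal Python sketch, not a NEOSYS tool: `Procedure` and `classify` are hypothetical names, the file lists are copied from the sections of this page and are not exhaustive, and the assumption that DEFINITIONS may be reported as DEFINIT is labelled in the code. Anything not listed, including temporary files such as MATDEL_MEDIALIN_3 (which need no FIXFILE), falls into the last category and still needs judgment.

```python
from enum import Enum


class Procedure(Enum):
    AUTO_FIXED = "Checked and emptied automatically on process start-up"
    RECREATE = "Recreate with its maintenance-mode command"
    REINDEX = "Rebuild with the matching REINDEX command"
    RESTORE_DEFINITIONS = "Partial or full restore; NEVER use FIXFILE"
    REINSTALL = "Possibly a program file; reinstall NEOSYS"
    NO_REBUILD_PROCESS = "No rebuild process: consult before FIXFILE"


# File names taken from the sections of this page; these lists are not exhaustive.
AUTO_FIXED = {"LISTS", "LOCKS", "!INDEXING"}
RECREATABLE = {"BALANCES", "ADS", "ANALYSIS", "BRANDS"}
# DEFINITIONS is stored as DEFINIT.OV/DEFINIT.LK, so either name may be reported
# (an assumption about how the name appears in the error message).
DEFINITIONS_NAMES = {"DEFINITIONS", "DEFINIT"}
PROGRAM_FILES = {"VOC"}


def classify(filename: str) -> Procedure:
    """Map a damaged file name to the section of this page that applies."""
    name = filename.strip().upper()
    if name in AUTO_FIXED:  # checked before the "!" rule because of !INDEXING
        return Procedure.AUTO_FIXED
    if name in DEFINITIONS_NAMES:
        return Procedure.RESTORE_DEFINITIONS
    if name in RECREATABLE:
        return Procedure.RECREATE
    if name.startswith("!"):
        return Procedure.REINDEX
    if name in PROGRAM_FILES:
        return Procedure.REINSTALL
    return Procedure.NO_REBUILD_PROCESS
```

For example, `classify("!ADS")` points to the index-file section, while `classify("STATISTICS")` points to the section on files with no rebuild process.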
Files that are automatically fixed
- LISTS
- LOCKS
- !INDEXING
The files above are automatically checked for corruption (and initialised to empty if damaged) whenever a NEOSYS process starts up, and need no manual intervention apart from locating and eliminating the initial cause of the damage.
If the !INDEXING file was damaged, you should also run REINDEXALL to reinstate any index entries that may be missing. Since this takes a long time, do it when everybody is off the system, probably in the evening. A few missing index entries may only mean that some records are missing from some reports, which may or may not be seriously inconvenient. !INDEXING is a temporary file that should hold records only for as long as it takes to update the various indexes, which happens immediately when a user saves a document.
Each process has its own LISTS file, so damage to a LISTS file is limited to one process. The process number can be seen in the error message, for example:
Process=7
You can also see the process number on the server screen. For example, this is process 2:
13:17:31 TESTDB NEOS0002 0 Listening 7 hours, 49 mins ...
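When several processes are running, a small helper can read the process number from either source. This is a minimal Python sketch with hypothetical function names; the `Process=N` pattern comes from the error messages on this page, and the `NEOSnnnn` naming on the server screen is an assumption inferred from the single example above.

```python
import re
from typing import Optional


def process_from_error(message: str) -> Optional[int]:
    """Return N from a 'Process=N' field in an error message, or None."""
    match = re.search(r"\bProcess=(\d+)", message)
    return int(match.group(1)) if match else None


def process_from_screen(line: str) -> Optional[int]:
    """Return the process number from a server-screen line such as '... NEOS0002 ...'.

    Assumes the four-digit NEOSnnnn naming seen in the example on this page.
    """
    match = re.search(r"\bNEOS(\d{4})\b", line)
    return int(match.group(1)) if match else None
```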
The LISTS file is a temporary file for sorting records into order for reports, popups etc. Therefore any damage usually causes no real inconvenience apart from the failed report or popup on which the user might receive the "damaged LISTS file" message.
The error message appears as follows:
SYSTEM ERROR: An internal file LISTS was found to be damaged and has been patched. INFORM YOUR TECHNICAL SUPPORT
SOME DATA HAS PROBABLY BEEN LOST AND THE NEOSYS SYSTEM MAY MALFUNCTION
Server=NEOSYS-SERVER Install=D:\NEOSYS\NEOSYS\ Version=17:16:04 23 JAN 2011 Database=1D68EE63 ADLINED Process=2
The error could also appear as follows:
A DAMAGED FILE HAS BEEN FOUND. YOU MUST GIVE THE FOLLOWING MESSAGE TO TECHNICAL SUPPORT NOW. DO NOT CONTINUE USING THE SYSTEM!
Group Format Error: File : LISTS Group # : 00000014 Record length incorrect or frame linkage error. Server=NEOSYS-SERVER Install=D:\NEOSYS\NEOSYS\ Version=17:16:04 23 JAN 2011 Database=1D68EE63 ADLINED Process=2
OR
A DAMAGED FILE HAS BEEN FOUND. YOU MUST GIVE THE FOLLOWING MESSAGE TO TECHNICAL SUPPORT NOW. DO NOT CONTINUE USING THE SYSTEM!
Group Format Error: File : LISTS Group # : 00000005 Record length incorrect or frame linkage error. 114 files checked. Failed with 1 error !!!DANGER!!! !!! DAMAGED FILES HAVE BEEN FOUND IN THE NEOSYS SYSTEM !!! !!!BACKUP NOT DONE !!!
Errors may also occur in temporary files, for example:
A DAMAGED FILE HAS BEEN FOUND. YOU MUST GIVE THE FOLLOWING MESSAGE TO TECHNICAL SUPPORT NOW. DO NOT CONTINUE USING THE SYSTEM! Group Format Error: File : MATDEL_MEDIALIN_3 Group # : 13369375 Wrong Overflow frame linked to group. Server=NEOSYS-SERVER Install=D:\NEOSYS\NEOSYS\ Version=11:26:27 26 JUL 2010 Database=23BF2931 MEDIALIN Process=3 Client=100_100_100_77 User=HAMADA Request=MEDIADIARY @Id= Data=
Since these are temporary files, FIXFILE is not needed: they are cleared automatically.
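Support staff often receive these messages pasted into emails or backup reports. A minimal Python sketch for extracting the damaged file name, group number and reason is shown below. `parse_damage_message` and `DamageReport` are hypothetical names, and the pattern is an assumption based only on the message variants quoted on this page (`OS File :` or `File :`, `Group#` or `Group #`).

```python
import re
from typing import NamedTuple, Optional

# Matches "File : NAME Group # : 00000932 <reason>." and the "OS File :" / "Group#"
# variants seen in the example messages (assumed, not an official format).
_DAMAGE_RE = re.compile(
    r"File\s*:\s*(?P<file>\S+)\s+Group\s*#\s*:\s*(?P<group>\d+)\s+(?P<reason>[^.]+)",
    re.IGNORECASE,
)


class DamageReport(NamedTuple):
    file: str
    group: int
    reason: str


def parse_damage_message(message: str) -> Optional[DamageReport]:
    """Extract the damaged file, group number and reason, or None if absent."""
    match = _DAMAGE_RE.search(message)
    if match is None:
        return None
    return DamageReport(
        file=match.group("file"),
        group=int(match.group("group")),  # leading zeros are dropped
        reason=match.group("reason").strip(),
    )
```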
Fixing damaged files that have a process to recreate them
Identify the cause of the damaged file and create a report.
The following files can be recreated from maintenance mode:
BALANCES can be fixed/recreated using FIXFILE followed by CHK.POST
ADS file can be recreated with CREATEADS
ANALYSIS file can be recreated with F5 CREATEANALYSIS
BRANDS files can be recreated with F5 CREATEBRANDS
If the file is actually damaged it might be necessary to initialise it first using CLEARFILE filename.
The following is an example of the error in the ADS File at the time of backup.
Group Format Error: OS File : ADS Group # : 00009543 Record length incorrect or frame linkage error. 217 files checked. Failed with 1 errors !!! DANGER !!! !!! DAMAGED FILES HAVE BEEN FOUND IN THE NEOSYS SYSTEM !!! !!! BACKUP NOT DONE !!!
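The backup report ends with a file-check summary. If backup reports are monitored by a script, the counts can be read as in this minimal Python sketch; `parse_check_summary` is a hypothetical name, and the pattern is assumed from the two summaries quoted on this page ("Failed with 1 error" and "Failed with 1 errors").

```python
import re
from typing import NamedTuple, Optional


class CheckSummary(NamedTuple):
    files_checked: int
    errors: int
    backup_not_done: bool


_SUMMARY_RE = re.compile(
    r"(\d+)\s+files\s+checked\.\s+Failed\s+with\s+(\d+)\s+errors?", re.IGNORECASE
)


def parse_check_summary(text: str) -> Optional[CheckSummary]:
    """Read the file-check summary that accompanies a failed backup, if present."""
    match = _SUMMARY_RE.search(text)
    if match is None:
        return None
    return CheckSummary(
        files_checked=int(match.group(1)),
        errors=int(match.group(2)),
        backup_not_done="BACKUP NOT DONE" in text.upper(),
    )
```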
Before applying the solution below, check how many files in the database are damaged by running CHK.FILES in maintenance mode as shown in Checking for corrupt database files.
Solution
Before starting the procedure below, stop all NEOSYS processes for the database to be fixed (see Closing NEOSYS Services for instructions).
Open Maintenance mode and log into the database you want to run the fix on.
F5 CREATEXXXX
(XXXX = ADS, ANALYSIS OR BRANDS)
Select the appropriate option and press ENTER to start the procedure.
CREATEADS no longer asks you to choose CREATE/CHECK/REINDEX and now only does CREATE. CHECK was never used, and the REINDEXADS command is available separately.
To verify that the above was successful and that the damaged file is fixed, run CHK.FILES as shown in Checking for corrupt database files.
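The command sequence for each recreatable file can be written down as data, which helps when preparing instructions for another support person. This is a minimal Python sketch with a hypothetical `recreate_plan` function; the commands are those listed on this page, and the rule that BALANCES is never cleared first (because it is repaired with FIXFILE) is an assumption.

```python
from typing import Dict, List

# Recreate commands listed on this page, typed at the F5 prompt in maintenance mode.
RECREATE_COMMANDS: Dict[str, List[str]] = {
    "BALANCES": ["FIXFILE BALANCES", "CHK.POST"],
    "ADS": ["CREATEADS"],
    "ANALYSIS": ["CREATEANALYSIS"],
    "BRANDS": ["CREATEBRANDS"],
}


def recreate_plan(filename: str, clear_first: bool = False) -> List[str]:
    """Return the commands to recreate a damaged file, ending with a CHK.FILES check."""
    name = filename.strip().upper()
    if name not in RECREATE_COMMANDS:
        raise ValueError(f"{name} has no recreate process listed on this page")
    steps: List[str] = []
    # Initialise the file first only if it is actually damaged. BALANCES is
    # repaired with FIXFILE instead, so it is not cleared here (an assumption).
    if clear_first and name != "BALANCES":
        steps.append(f"CLEARFILE {name}")
    steps.extend(RECREATE_COMMANDS[name])
    steps.append("CHK.FILES")  # verify that no damaged files remain
    return steps
```

Remember that all NEOSYS processes for the database must be stopped before any of these commands are run.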
Fixing damaged index files (names starting with !)
Whilst backing up or using NEOSYS the following error message might appear:
"A Damaged file has been found..... Group format Error: OS File : !ADS Group# : 00000932 Record length incorrect or frame linkage error.
- Ensure that no one is using NEOSYS and that all the processes for that particular dataset are shut down.
- Identify the cause of the damaged file and create a report.
- Log into NEOSYS maintenance on the server.
- Press F5 to get a TCL command box.
- Type REINDEXADS and press Enter. (This example is for !ADS; other index files have their own REINDEX commands, listed below.)
- Wait until it finishes and you are returned to the command box.
- Verify that the damaged file is fixed using CHK.FILES as shown in Checking for corrupt database files.
- Type OFF in the command box to quit NEOSYS.
The REINDEX commands rebuild indexes. REINDEXALL should NOT be run during working hours on databases with large files, because it can take up to 4 to 5 hours depending on file size. When indexing very large files you can save time by rebuilding only selected indexes instead of all the indexes for the file: after you enter REINDEX filename, it presents a list of the file's indexed fields from which you can select the ones to rebuild.
Other reindex commands are:
- REINDEXBATCHES (batches = journals)
- REINDEXVOUCHERS
- REINDEXACCOUNTS recreates !ACCOUNTS
- REINDEXADDRESSES
- REINDEXADS
- REINDEXBOOKINGS recreates !BOOKING.ORDERS
- REINDEXBRANDS
- REINDEXCLIENTS
- REINDEXINVS recreates !INVOICES
- REINDEXJOBS
- REINDEXMATERIALS
- REINDEXPLANS
- REINDEXPRODINVS recreates !PRODUCTION.INVOICES (includes estimates)
- REINDEXPRODORDERS recreates !PRODUCTION.ORDERS (includes production invoices)
- REINDEXRATECARDS
- REINDEXSCHEDULES
- REINDEXSUPPLIERS
- REINDEXTIMESHEETS
- REINDEXVEHICLES
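Only some of the commands above state which index file they recreate. A minimal Python sketch of that lookup follows; `reindex_command` is a hypothetical name, and the table contains only the pairs stated on this page (plus !ADS from the procedure above). For any other index file it returns None rather than guessing; !INDEXING is fixed automatically on start-up.

```python
from typing import Optional

# Index files whose rebuild command is stated explicitly on this page.
INDEX_REBUILD = {
    "!ACCOUNTS": "REINDEXACCOUNTS",
    "!ADS": "REINDEXADS",
    "!BOOKING.ORDERS": "REINDEXBOOKINGS",
    "!INVOICES": "REINDEXINVS",
    "!PRODUCTION.INVOICES": "REINDEXPRODINVS",
    "!PRODUCTION.ORDERS": "REINDEXPRODORDERS",
}


def reindex_command(index_file: str) -> Optional[str]:
    """Return the documented REINDEX command for an index file, or None if not listed."""
    name = index_file.strip().upper()
    if not name.startswith("!"):
        raise ValueError(f"{name} is not an index file (names start with !)")
    return INDEX_REBUILD.get(name)
```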
REINDEXMEDIA does all the following
- REINDEXPLANS
- REINDEXSCHEDULES
- REINDEXADS
- REINDEXMATERIALS
- REINDEXBOOKINGS
REINDEXACCS does all of the following
- REINDEXACCOUNTS
- REINDEXBATCHES
- REINDEXVOUCHERS
REINDEXFILES does all the following
- REINDEXCLIENTS
- REINDEXBRANDS
- REINDEXSUPPLIERS
- REINDEXVEHICLES
- REINDEXRATECARDS
REINDEXPRODUCTION does all the following
- REINDEXJOBS
- REINDEXPRODORDERS
- REINDEXPRODINVS
- REINDEXTIMESHEETS
REINDEXALL does all the following
- REINDEXMEDIA
- REINDEXPRODUCTION
- REINDEXFILES
- REINDEXINVS
- REINDEXACCS
- REINDEXADDRESSES
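The composite commands form a small hierarchy. Flattening it shows exactly which individual commands a composite runs, and which composites already cover a given command, which helps when deciding whether a smaller command will do outside working hours. This is a minimal Python sketch with hypothetical function names; the table is copied from the lists above.

```python
from typing import Dict, List

# Composite commands and their parts, in the order listed on this page.
COMPOSITES: Dict[str, List[str]] = {
    "REINDEXMEDIA": ["REINDEXPLANS", "REINDEXSCHEDULES", "REINDEXADS",
                     "REINDEXMATERIALS", "REINDEXBOOKINGS"],
    "REINDEXACCS": ["REINDEXACCOUNTS", "REINDEXBATCHES", "REINDEXVOUCHERS"],
    "REINDEXFILES": ["REINDEXCLIENTS", "REINDEXBRANDS", "REINDEXSUPPLIERS",
                     "REINDEXVEHICLES", "REINDEXRATECARDS"],
    "REINDEXPRODUCTION": ["REINDEXJOBS", "REINDEXPRODORDERS", "REINDEXPRODINVS",
                          "REINDEXTIMESHEETS"],
    "REINDEXALL": ["REINDEXMEDIA", "REINDEXPRODUCTION", "REINDEXFILES",
                   "REINDEXINVS", "REINDEXACCS", "REINDEXADDRESSES"],
}


def expand(command: str) -> List[str]:
    """Flatten a (possibly composite) REINDEX command into individual commands."""
    name = command.upper()
    parts = COMPOSITES.get(name)
    if parts is None:
        return [name]
    result: List[str] = []
    for part in parts:
        result.extend(expand(part))
    return result


def composites_covering(command: str) -> List[str]:
    """List the other composite commands that already include everything command does."""
    name = command.upper()
    target = set(expand(name))
    return [c for c in COMPOSITES if c != name and target <= set(expand(c))]
```

For instance, `expand("REINDEXALL")` yields the 19 individual commands listed earlier, and `composites_covering("REINDEXADS")` returns `["REINDEXMEDIA", "REINDEXALL"]`.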
Fixing damaged files that have no process for recreating them
Some damaged files, especially index files (filenames starting with !), can be rebuilt using NEOSYS utilities. Other damaged files, such as LOG20xx, can be “fixed” as follows to prevent NEOSYS crashes, but some records will most probably be lost. Depending on the file, losing a few records may not be too bad compared to restoring the database, but the client should be informed and make the decision. For many files it is usually best to accept the loss of the records. A list could be made of the files that can afford to lose records and the files that can be rebuilt programmatically from other files. Booking order texts can lose records, but inform the client, especially of the numbers reported by FIXFILE.
Before you proceed with fixing the file, first check for damaged files in the database by running CHK.FILES in maintenance mode as shown in Checking for corrupt database files. The damage may have been in a temporary file that has already been fixed, or there may be other damaged files in the system.
If FIXFILE takes too long and leads to B703 error, and if the damaged file is not really necessary (e.g. REQUESTLOG), then use CLEARFILE instead.
SHADOW is a type of log file that can lose some records without serious impact and can therefore also be fixed with FIXFILE.
STATISTICS is another such file. It contains the user statistics shown on the support menu, so the loss of some data is not significant. There is no process to rebuild STATISTICS records, but the file can be fixed with FIXFILE.
A DAMAGED FILE HAS BEEN FOUND. YOU MUST GIVE THE FOLLOWING MESSAGE TO TECHNICAL SUPPORT NOW. DO NOT CONTINUE USING THE SYSTEM!
Group Format Error: File : STATISTICS Group # : 00000045 Record length incorrect or frame linkage error. 114 files checked. Failed with 1 error !!!DANGER!!! !!! DAMAGED FILES HAVE BEEN FOUND IN THE NEOSYS SYSTEM !!! !!!BACKUP NOT DONE !!!
FIXFILE will not run while NEOSYS services are still running, so shut down the processes for the dataset in question before running the FIXFILE command.
NOTE: If multiple datasets are running on the server, shut down only the dataset that has the problem (see Closing NEOSYS Services for instructions).
If you need to close processes during the day, first check whether any users are actively working on the dataset in question. If so, email the clients asking them to log off until the process is complete. If no users are actively working, you may proceed, but complete the fix quickly, before users log in, so that their daily work is not disturbed.
Press F5
FIXFILE filename
Choose "Yes" for the following message:
Do you want to actually update the file?
To view any error messages, type ED DOS XXXXX.MSG. To find the file name to use in this command, type:
F5 DIR *.msg
If you get any Overwrite messages, type YES, as these are the actual files being updated.
The only alternative is to restore from a previous backup, in this case losing a day's work. This is nasty and, even worse, it will be impossible to reproduce the day's work exactly, especially the serial numbers of critical documents such as orders, invoices, receipts and payment advices.
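When several .MSG files have accumulated, sorting them by modification time makes the output of the latest FIXFILE run easy to find. This is a minimal Python sketch for use on the server outside NEOSYS; `msg_files_newest_first` is a hypothetical name, and it assumes the .MSG files are in the folder NEOSYS runs from (for example the Install= folder shown in NEOSYS error messages), which you should confirm with DIR *.msg first.

```python
from pathlib import Path
from typing import List


def msg_files_newest_first(folder: Path) -> List[Path]:
    """Return the *.MSG files in folder (any letter case), most recently modified first."""
    files = [p for p in folder.iterdir() if p.is_file() and p.suffix.upper() == ".MSG"]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```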
Fixing damaged NEOSYS program files and other non-data files
The files listed below may be NEOSYS program files or other non-data files in the NEOSYS directory or its subdirectories, and can probably only be fixed by reinstalling the NEOSYS software.
- VOC
There could be other such files as well. You can check whether a particular file is a data file by pressing F5 in maintenance mode and typing:
ED FILES filename
If it is a data file then the first line will mention something like ..\DATA\DEMO\ACCOUNTS
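If you are checking several files, the same test can be applied to the first line of each FILES entry. This is a minimal Python sketch; `is_data_file` is a hypothetical name, and the rule that a data file's path contains a DATA folder is an assumption drawn from the single example on this page.

```python
from pathlib import PureWindowsPath


def is_data_file(files_first_line: str) -> bool:
    r"""Guess whether a FILES entry is a data file from its first line.

    Assumes (from the single example on this page) that data files live under
    a DATA folder, e.g. ..\DATA\DEMO\ACCOUNTS.
    """
    parts = PureWindowsPath(files_first_line.strip()).parts
    return any(part.upper() == "DATA" for part in parts)
```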
Using FIXFILE to repair corrupted files
BEWARE: FIXFILE causes loss of data. If the data is critical then use of FIXFILE can be totally catastrophic without any warning.
In such situations, it is advisable to consult NEOSYS technical staff before running FIXFILE, to see whether they can provide a proper fix or a better assessment of the impact of the data loss.
Please Note:
- Using FIXFILE on some files is a guaranteed way to cause complete chaos for the users without them being aware of it until too late to do anything about it.
- Corrupt files will cause programs to abort every time they try to update that section of the file. Aborting programs causes possibly horrendous inconsistency in the database so it is important to stop all user activity and fix the corrupt file immediately.
- FIXFILE can be run on any file, but it works by deleting any problematic data, so it cannot realistically be used on some files at all.
- FIXFILE cannot and does not recover any corrupted records therefore, for those files that have specific programs to rebuild them, FIXFILE is not the correct option.
Why use FIXFILE at all since it causes loss of data?
- Many files do not have programs to rebuild them, and for some of those files FIXFILE is a quick way to get the system back in operation, BUT with missing records. Missing records cause various degrees of failure, ranging from the odd missing document, to totally misleading information such as wrong account balances, to any kind of system failure.
- FIXFILE does not fix the consistency of information in the database. When the database is updated with a transaction, usually a series of files is updated for that one transaction. A damaged file causes the update program to abort in the middle of the series, so some of the updates are done and the rest are not. This means that the data in the database is inconsistent, even after the corrupt file has been fixed.
- FIXFILE merely fixes the technical structure of the file so that any further program accesses to that portion of the file do not cause immediate program termination and consequent likely catastrophic database inconsistency.
IMPORTANT : You must prevent anyone from accessing NEOSYS while FIXFILE is running.
List of Files Impacted by FIXFILE
- USERS File: The main data of any lost records from the USERS file is recreated whenever the Authorisation File is saved. However, only the information in the Authorisation File is recreated, so all personal preferences, holiday records, login history etc. will be lost.
Fixing Damaged DEFINITIONS file
Non-Solution
You will probably not be able to log in if the DEFINITIONS file is damaged. You can press Alt+S and type EXECUTE to get a TCL command box, but this still doesn't help because you cannot use FIXFILE on DEFINITIONS.
CAUTION: DO NOT USE FIXFILE TO TRY AND FIX THE DEFINITIONS FILE.
DOUBLE CAUTION: IN CASE YOU DECIDE TO IGNORE THIS EXPECT ALL KINDS OF HORRIBLE PROBLEMS IN THE DATABASE REQUIRING A COMPLETE RESTORE FROM A GOOD BACKUP AFTER A FEW DAYS OF PAIN BECAUSE RANDOM OLD DATABASE UPGRADE/CONVERSIONS WILL BE PERFORMED *AGAIN* WITHOUT YOU KNOWING IT.
Partial Restore
In this solution, only the damaged DEFINIT.OV and DEFINIT.LK files are replaced with good copies from a recent backup. The procedure is as follows:
- Restore the latest successful backup of the live dataset into the test dataset. For instructions, see How to Restore NEOSYS from Backup. BACKUP.ZIP is password protected, so you cannot copy individual files out of it; you have to RESTORE the backup first. Since only the DEFINIT.OV and DEFINIT.LK files are needed, restore the backup onto the test dataset and copy the required files from there.
- Copy the 2 files from test dataset folder i.e. DATA\testdatasetcode\GENERAL to live dataset folder i.e. DATA\datasetcode\GENERAL.
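The copy step can be scripted so that both files are checked before anything is overwritten. This is a minimal Python sketch, presumably to be run with the live dataset's processes stopped as for the other repairs on this page; `copy_definitions`, its parameters and the `damaged_dir` folder for keeping the damaged originals are hypothetical additions, not part of the documented procedure.

```python
import shutil
from pathlib import Path
from typing import List

DEFINITIONS_FILES = ("DEFINIT.OV", "DEFINIT.LK")


def copy_definitions(data_root: Path, test_dataset: str, live_dataset: str,
                     damaged_dir: Path) -> List[Path]:
    """Replace the live DEFINIT.OV/.LK with the copies from the restored test dataset."""
    src_dir = data_root / test_dataset / "GENERAL"
    dst_dir = data_root / live_dataset / "GENERAL"
    # Check both source files first, so a missing file is detected before anything is overwritten.
    for name in DEFINITIONS_FILES:
        if not (src_dir / name).is_file():
            raise FileNotFoundError(src_dir / name)
    damaged_dir.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for name in DEFINITIONS_FILES:
        target = dst_dir / name
        if target.exists():
            # Extra precaution, not in the documented procedure: keep the damaged copy.
            shutil.copy2(target, damaged_dir / name)
        shutil.copy2(src_dir / name, target)
        copied.append(target)
    return copied
```

Here `data_root` would be the DATA folder, and the dataset codes are the folder names under it.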
CAUTION: All changes made in the Configuration File, Authorisation File, Users File and various other places will revert to the date of the DEFINITIONS files being restored. If you are restoring from last night's backup or from testdata, this will not be much, but it could cause serious confusion to people who have changed their passwords etc.
Full Restore
Restore from your latest backup losing all work done since that time and date.
Preventing users from accessing NEOSYS in order to do maintenance or repairs
On most systems you can simply stop the NEOSYS processes. On systems where NEOSYS processes are started automatically in response to requests from the web, you need to disable autostart by one of the following methods:
- Stop the autostart program/script
- Put a file called NET.CFG with a single line saying AUTOSTART=NO in the neosys\neosys folder
- Stop the IIS server
- Stop/Pause the particular web in question
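The NET.CFG method can be scripted as below. This is a minimal Python sketch with hypothetical function names; it writes the single line described above into the neosys\neosys folder you pass in, refuses to overwrite a NET.CFG that holds anything else, and assumes (not confirmed on this page) that deleting the file afterwards re-enables autostart.

```python
from pathlib import Path

AUTOSTART_OFF = "AUTOSTART=NO"


def disable_autostart(neosys_dir: Path) -> Path:
    """Create NET.CFG containing the single line AUTOSTART=NO."""
    cfg = neosys_dir / "NET.CFG"
    if cfg.exists() and cfg.read_text().strip() != AUTOSTART_OFF:
        # Do not silently overwrite a NET.CFG that holds other settings.
        raise FileExistsError(f"{cfg} already has other content; edit it by hand")
    cfg.write_text(AUTOSTART_OFF + "\n")
    return cfg


def enable_autostart(neosys_dir: Path) -> bool:
    """Remove a NET.CFG created by disable_autostart; return True if one was removed."""
    cfg = neosys_dir / "NET.CFG"
    if not cfg.exists():
        return False
    if cfg.read_text().strip() != AUTOSTART_OFF:
        raise ValueError(f"{cfg} has other content; edit it by hand")
    cfg.unlink()
    return True
```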
Dealing with damaged files that cannot be fixed
If programmers are unable to fix the damage in a reasonable period of time, and the damage was caused by matters beyond NEOSYS control such as non-dedicated servers and power failures, then the data must be restored and the client advised why they are going to suffer loss of data.
THIS DOES NOT RELEASE NEOSYS SUPPORT STAFF FROM THE OBLIGATION TO PROACTIVELY INFORM THE CLIENT OF MEASURES TO PREVENT THE RECURRENCE OF THE EVENT.
The following are the steps to follow:
- Ascertain the cause of the damage to the database by looking at the internal logs
- Create a new database from the damaged one, since it contains the up-to-date (but damaged) data.
- Restore the database from the backup media so that the database is in the last known good state.
- The client can see what work was done over the past couple of days from the damaged database, which will help them redo the work in the restored, out-of-date main database.
- Next, discuss the cause of the damage with the client and agree (a) what to do to prevent the damage occurring again and (b) a procedure for performing a manual backup in case the automatic overnight backup is not performed for any reason.