ZAR out of memory error


Donnie
11th August 2007, 09:35
Running ZAR to recover data from a RAID5 array. I've been receiving out of memory errors today.

I added another 1GB of RAM and another 12GB of swap in the hope of avoiding the problem. My system now has 3GB RAM and 16GB swap.

The 2.7 million files ZAR has found on the volume are approximately the number of files that were on the array before the disk problem. ZAR was creating an autosave file when a fatal error (out of memory) box popped up.

Running ZAR 8.3RC1 on XP SP2, Opteron 185, 3GB RAM, 16GB swap. The ZAR process was using 1.83GB of RAM when the fatal error occurred.

http://www.apple2pl.us/moz-screenshot-1.jpg
http://www.apple2pl.us/moz-screenshot-2.jpg

Last 100 or so lines of the log are as follows:

ALERT!: Excessive MFT attribute size near ap 262; hint 91051
ALERT!: Excessive MFT attribute size near ap 262; hint 91051
ALERT!: Excessive MFT attribute size near ap 262; hint 91052
ALERT!: Excessive MFT attribute size near ap 262; hint 91052
ALERT!: Excessive MFT attribute size near ap 462; hint 91053
ALERT!: Excessive MFT attribute size near ap 462; hint 91053
ALERT!: Excessive MFT attribute size near ap 278; hint 91054
ALERT!: Excessive MFT attribute size near ap 278; hint 91054
ALERT!: Excessive MFT attribute size near ap 270; hint 91055
ALERT!: Excessive MFT attribute size near ap 270; hint 91055
ALERT!: Excessive MFT attribute size near ap 254; hint 91056
ALERT!: Excessive MFT attribute size near ap 254; hint 91056
ALERT!: Excessive MFT attribute size near ap 270; hint 91057
ALERT!: Excessive MFT attribute size near ap 270; hint 91057
ALERT!: Excessive MFT attribute size near ap 254; hint 91058
ALERT!: Excessive MFT attribute size near ap 254; hint 91058
ALERT!: Excessive MFT attribute size near ap 270; hint 91059
ALERT!: Excessive MFT attribute size near ap 270; hint 91059
ALERT!: Excessive MFT attribute size near ap 278; hint 91060
ALERT!: Excessive MFT attribute size near ap 278; hint 91060
ALERT!: Excessive MFT attribute size near ap 254; hint 91061
ALERT!: Excessive MFT attribute size near ap 254; hint 91061
ALERT!: Excessive MFT attribute size near ap 254; hint 91062
ALERT!: Excessive MFT attribute size near ap 254; hint 91062
ALERT!: Excessive MFT attribute size near ap 270; hint 91064
ALERT!: Excessive MFT attribute size near ap 270; hint 91064
ALERT!: Excessive MFT attribute size near ap 262; hint 91065
ALERT!: Excessive MFT attribute size near ap 262; hint 91065
ALERT!: Excessive MFT attribute size near ap 270; hint 91066
ALERT!: Excessive MFT attribute size near ap 270; hint 91066
ALERT!: Excessive MFT attribute size near ap 270; hint 91067
ALERT!: Excessive MFT attribute size near ap 270; hint 91067
ALERT!: Excessive MFT attribute size near ap 270; hint 91068
ALERT!: Excessive MFT attribute size near ap 270; hint 91068
ALERT!: Excessive MFT attribute size near ap 278; hint 91069
ALERT!: Excessive MFT attribute size near ap 278; hint 91069
ALERT!: Excessive MFT attribute size near ap 278; hint 91070
ALERT!: Excessive MFT attribute size near ap 278; hint 91070
ALERT!: Excessive MFT attribute size near ap 286; hint 91071
ALERT!: Excessive MFT attribute size near ap 286; hint 91071
ALERT!: zero-sized MFT attribute
ALERT!: zero-sized MFT attribute
ALERT!: Excessive MFT attribute size near ap 270; hint 91457
ALERT!: Excessive MFT attribute size near ap 270; hint 91457
ALERT!: Excessive MFT attribute size near ap 486; hint 91458
ALERT!: Excessive MFT attribute size near ap 486; hint 91458
ALERT!: Excessive MFT attribute size near ap 254; hint 91459
ALERT!: Excessive MFT attribute size near ap 254; hint 91459
ALERT!: Excessive MFT attribute size near ap 270; hint 91460
ALERT!: Excessive MFT attribute size near ap 270; hint 91460
ALERT!: Excessive MFT attribute size near ap 270; hint 91461
ALERT!: Excessive MFT attribute size near ap 270; hint 91461
ALERT!: Excessive MFT attribute size near ap 254; hint 91464
ALERT!: Excessive MFT attribute size near ap 254; hint 91464
ALERT!: Excessive MFT attribute size near ap 270; hint 91465
ALERT!: Excessive MFT attribute size near ap 270; hint 91465
ALERT!: Excessive MFT attribute size near ap 350; hint 91466
ALERT!: an exception has occured in TThreadParseNtfs.ReadIn() status Out of memory
Performance: NTFS read-in 16m 29s
Directory tree linking started - total 2739549 objects on volume
Directory structure loop for U ln
ses
Directory structure loop for Cookies
Performance: NTFS rebuild directory tree 1m 15s
Directory tree refinement and reconstruction started
Directory tree refinement and reconstruction pass 1 started for 2739549 objects
ALERT!: an exception has occured in TThreadParseNtfs.Relink() status Out of memory
Performance: NTFS directory tree cleanup 8s
Simple volume - Parse filesystem complete
Fragment reordering start
No fragments stored, rolling back
Performance: Ident data query (0 of 0) 0s
Performance: Ident data query (0 of 0) 0s
Performance: Ident data query (0 of 0) 0s
Performance: Ident data query (0 of 0) 0s
ALERT!: an exception has occured in TIdentList.slSaveToFile() status Out of memory
ALERT!: an exception has occured in TIdentList.slSaveToFile() status Out of memory
FATAL:
The fatal exeption has occured: Out of memory
Last observed error code was 8.
Unable to continue processing, program will now terminate.
Please file a bug report.
FATAL:
The fatal exeption has occured: Out of memory
Last observed error code was 8.
Unable to continue processing, program will now terminate.
Please file a bug report.
FATAL:
The fatal exeption has occured: Out of memory
Last observed error code was 8.
Unable to continue processing, program will now terminate.
Please file a bug report.
FATAL:
The fatal exeption has occured: Out of memory
Last observed error code was 8.
Unable to continue processing, program will now terminate.
Please file a bug report.
FATAL:
The fatal exeption has occured: Out of memory
Last observed error code was 8.
Unable to continue processing, program will now terminate.
Please file a bug report.

Donnie
11th August 2007, 11:08
I also tried adding the /3GB switch to XP's boot.ini file.

ZAR's log file still shows only 2047MB installed, although 3072MB is physically installed and detected by XP (see the screenshot in the post above):

Init start - misc
Init done - misc
Init start - types
Init done - types
Init start - math B
Vote : 1; Count=3 ; separation
Vote : 1; Count=3 ; separation
Vote : 10; Count=4 ; separation
Init done - math B
Init start - filters
Init done - filters
Init start - PDF validation
Init done - PDF validation
Init start - OSS validation
Init done - OSS validation
Init start - validation
Init done - validation
Init start - LDM parser
Init done - LDM parser
Init start - MFT regions
Vote : -1; Count=0 ; separation
Init done - MFT regions
Init start - caches
Init done - caches
Init start - lazy writer
Init done - lazy writer
Init start - copier
Thread startup: copier, TID 000007EC
Init done - copier
Init start - RAIDs
Init done - RAIDs
Init start - monitor
Init done - monitor
Init start - math A
Init done - math A
Init start - NTFS parameters
Init done - NTFS parameters
Init start - FAT parser
Init done - FAT parser
Init start - files
Init done - files
Init start - automatic update
Init done - automatic update
Init start - preview renderer
Init done - preview renderer
Init start - preview renderer
Init done - preview renderer
Init start - image recovery parser
Init done - image recovery parser
Init start - saveload
Init done - saveload
Init start - ident processing
Performance: Ident data query (3 of 3) 0s
Performance: Ident data query (9 of 9) 0s
Performance: Ident data query (2 of 9) 0s
Init done - ident processing
Init start - physical devices
Init done - physical devices
Init start - tasks
Init done - tasks
Init start - threads
Init done - threads
Init start - logging
Init done - logging
NT 5.1.2600 Service Pack 2; 2047 MB RAM

Alexey V. Gubin
11th August 2007, 11:39
As I read it, there are about 2.7 million objects on the volume, that is, about 2,700,000 files and directories combined. Is this the expected number of files?

Donnie
11th August 2007, 11:47
2.7 million files, and another 300,000 folders. This is the expected amount.

Alexey V. Gubin
11th August 2007, 15:04
That got me thinking. We tested up to 500,000 objects (files and folders combined). At about six times that, I expect some problems will occur.
I see from the log what the problem is, but I do not think we can resolve this fast enough to be practical.
Could you please PM me your order number so I can arrange the refund?

Donnie
11th August 2007, 16:08
I'm not opposed to a refund.

However, ZAR is the only program out of the several I've tried (DiskInternals NTFS Recovery, Easy Recovery Pro, Active@ File Recovery, PC Inspector File Recovery; none of them work, they all either hang or throw access violations) that looks as if it could recover these files. Most are digital photos of my kids from the day they were born, 2001 through 2006.

When I double-click the D: drive (the corrupted NTFS drive), XP says the disk is corrupted, but CHKDSK claims it can fix the problem. Running CHKDSK in read-only mode finds "minor problems" with thousands of files, which suggests the damage may indeed be minor, but I don't want to make things worse by letting CHKDSK try to fix them.

I'm going to install VMware Server, take a snapshot of the drive, and then run CHKDSK, so that any changes are written to the snapshot instead of directly to the drive. That way, if I still can't recover anything with CHKDSK, I can always roll back the snapshot.

Alexey V. Gubin
11th August 2007, 16:25
Please PM the order number - this qualifies for a refund regardless of what may happen later.

We're currently looking for some way to reduce the memory footprint so that it fits in 2GB. However, at this point the only option on the horizon is a major rework.

The current version holds all the files, file names, and cluster maps (where each file is located) in memory. That is about 400 bytes per file, plus a file name, plus a cluster map, which brings the memory requirement to roughly 1KB per file. With that and 2GB of address space available, the limit is around 2M objects (files and folders combined). Taking into account various caches, search indexes, and object lists, a more realistic estimate is around 1M objects at most.
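The back-of-the-envelope estimate above can be reproduced directly. This is a minimal sketch that takes the post's round figures as assumptions (exactly 1KB per object and a 2GB user-mode address space) and ignores the caches and indexes that push the practical limit down toward 1M objects:

```python
KB = 1024
GB = 1024 ** 3

BYTES_PER_OBJECT = 1 * KB  # assumed: ~400 B record + file name + cluster map, rounded to 1 KB
ADDRESS_SPACE = 2 * GB     # assumed: default user-mode address space of a 32-bit Windows process


def max_objects(address_space: int, per_object: int) -> int:
    """Upper bound on resident objects if nothing else used the address space."""
    return address_space // per_object


def required_bytes(objects: int, per_object: int) -> int:
    """Memory needed to keep `objects` entries resident at `per_object` bytes each."""
    return objects * per_object


limit = max_objects(ADDRESS_SPACE, BYTES_PER_OBJECT)
volume_objects = 2_700_000 + 300_000  # files + folders reported in this thread
need = required_bytes(volume_objects, BYTES_PER_OBJECT)

print(f"ceiling: {limit:,} objects")                                  # ceiling: 2,097,152 objects
print(f"needed: {need / GB:.2f} GB of {ADDRESS_SPACE / GB:.0f} GB")  # needed: 2.86 GB of 2 GB
```

Even before counting caches, a volume of about 3M objects needs more than the whole 2GB address space under these assumptions.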

I'll keep you informed of progress, but it is not going to be quick (it took about an hour just to create a test set with 3M files).

Donnie
12th August 2007, 09:54
I've sent you my order number. I've got time to wait for something that works. I'll try anything so long as it doesn't write to the original drives.

My idea of using a snapshot under VMware doesn't work when the guest uses a physical disk. I'd have to make an image first and then work with the image.

I tried installing Win2003 Enterprise, and ZAR then detected 3072MB. ZAR got a bit further in processing the drive, reporting that it was sorting fragments after finding 2842885 files in 313053 directories. Same error, though:

....
....
Init start - tasks
Init done - tasks
Init start - threads
Init done - threads
Init start - logging
Init done - logging
NT 5.2.3790 Service Pack 2; 3071 MB RAM
Open physical drive 00000100 success, 465 GB : ST350063 0AS - Port 3, Bus 1, Target 0, LUN 0; maxLBA=976773168
Open physical drive 00000101 success, 894 GB : NVIDIA RAID5 894.27G - Port 0, Bus 0, Target 1, LUN 0; maxLBA=1875427200
Open physical drive 00000102 success, 465 GB : ST350063 9QG1AWM7; maxLBA=976773168
TS: Stop validation
TS: Stopped validation
....
....
....
ALERT!: Excessive MFT attribute size near ap 278; hint 43074
ALERT!: Excessive MFT attribute size near ap 342; hint 43082
ALERT!: Excessive MFT attribute size near ap 342; hint 43082
ALERT!: Excessive MFT attribute size near ap 254; hint 43085
ALERT!: Excessive MFT attribute size near ap 254; hint 43085
ALERT!: Excessive MFT attribute size near ap 342; hint 43086
ALERT!: Excessive MFT attribute size near ap 342; hint 43086
ALERT!: Excessive MFT attribute size near ap 254; hint 43087
ALERT!: Excessive MFT attribute size near ap 254; hint 43087
ALERT!: an exception has occured in TThreadParseNtfs.ReadIn() status Out of memory
Performance: NTFS read-in 16m 27s
Directory tree linking started - total 2836887 objects on volume
Directory structure loop for U ln
ses
Directory structure loop for Cookies
Performance: NTFS rebuild directory tree 1m 5s
Directory tree refinement and reconstruction started
Directory tree refinement and reconstruction pass 1 started for 2836887 objects
ALERT!: an exception has occured in TThreadParseNtfs.Relink() status Out of memory
Performance: NTFS directory tree cleanup 7s
Simple volume - Parse filesystem complete
Fragment reordering start
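The capacities in the "Open physical drive" lines follow from maxLBA, which here appears to be the drive's sector count. A quick check, assuming 512-byte sectors and that ZAR reports binary gigabytes (GiB), an assumption that reproduces both the "465 GB" and "894.27G" figures in the log:

```python
SECTOR_BYTES = 512  # assumed logical sector size


def capacity_gib(sectors: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Convert a sector count to binary gigabytes (GiB)."""
    return sectors * sector_bytes / 2 ** 30


# maxLBA values copied from the log above, keyed by ZAR's device IDs
devices = {
    "00000100": 976_773_168,
    "00000101": 1_875_427_200,
}
for device_id, max_lba in devices.items():
    print(f"{device_id}: {capacity_gib(max_lba):.2f} GiB")
# 00000100: 465.76 GiB
# 00000101: 894.27 GiB
```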


If I split this drive into two images, each half the original size, would ZAR be just as likely to find files in them? If so, each run would need only half the memory required for the full disk.

Alexey V. Gubin
12th August 2007, 11:21
Refunded the order.

ZAR will not work on part of an image. In any case, the whole MFT has to be loaded into memory; otherwise only part of the directory tree is available.

Alexey V. Gubin
14th August 2007, 14:12
Our current estimate is that it will take about two more days of work.

Donnie
16th August 2007, 09:21
I can wait. Though my fingers are getting itchy to do a CHKDSK /R :-)

Alexey V. Gubin
16th August 2007, 11:53
I think we're getting close to it here.
In the meantime, if you have a log file from the RAID reconstruction run, search it for a fragment similar to the following:


00000268: Final nonzero RAID layout scores
00000268: RAID0 (Stripe set):66.67
00000268: RAID5 (MS/LDM):100.00
000007FC: Virtual RAID settings changed for Virtual RAID #0
000007FC: -------------------------------------
000007FC: Volume information for ID 00000000
000007FC: Origin : Reconstructed RAID
000007FC: Partition type : Unknown
000007FC: Capacity : 819 MB
000007FC: Number of sectors : 1677316
000007FC: RAID type : RAID5 (MS/LDM)
000007FC: Stripe size : 128
000007FC: Rotation parameters: 4 / 4
000007FC: Member 00 : 419329 sectors at LBA 63 on device 0204
000007FC: Member 01 : 419329 sectors at LBA 63 on device 0200
000007FC: Member 02 : 419329 sectors at LBA 63 on device 0201
000007FC: Member 03 : 419329 sectors at LBA 63 on device 0203
000007FC: Member 04 : 419329 sectors at LBA 63 on device 0202


Search for "Final nonzero" to find the data needed, and post a copy here. If these settings are available, we can use them to speed up the process so you do not need to rescan the RAID. The leftmost 8-digit hex numbers may be missing in your log (they are only recorded if debug logging is enabled). The part of interest starts with "Final nonzero RAID layout scores" and ends with the last "Member XX" entry.
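Pulling that fragment out of a long log can be scripted. The sketch below is a hypothetical helper, not part of ZAR: it takes the first "Final nonzero" block, strips the optional 8-digit hex prefix, and stops after the last consecutive "Member NN" line. It also checks the RAID5 arithmetic visible in the sample above, where one member's worth of space holds parity, so the volume's sector count should equal (members - 1) x member size (4 x 419329 = 1677316):

```python
import re
from typing import List, Optional

# Optional "000007FC: " prefix; only present when debug logging is enabled.
PREFIX = re.compile(r"^[0-9A-Fa-f]{8}: ")
MEMBER = re.compile(r"Member \d+\s*:\s*(\d+) sectors")
TOTAL = re.compile(r"Number of sectors\s*:\s*(\d+)")


def extract_layout_block(lines: List[str]) -> List[str]:
    """Return lines from the first 'Final nonzero' entry through the last 'Member NN' entry."""
    cleaned = [PREFIX.sub("", line.rstrip("\r\n")) for line in lines]
    start = next((i for i, line in enumerate(cleaned) if "Final nonzero" in line), None)
    if start is None:
        return []
    end: Optional[int] = None
    for i in range(start, len(cleaned)):
        if MEMBER.search(cleaned[i]):
            end = i
        elif end is not None:
            break  # first non-member line after the member list ends the block
    return cleaned[start:end + 1] if end is not None else []


def raid5_capacity_consistent(block: List[str]) -> bool:
    """RAID5 spends one member's worth on parity: total == (members - 1) * member size."""
    members = [int(m.group(1)) for line in block if (m := MEMBER.search(line))]
    totals = [int(t.group(1)) for line in block if (t := TOTAL.search(line))]
    if len(members) < 3 or not totals:
        return False
    return totals[0] == (len(members) - 1) * min(members)
```

For example, `extract_layout_block(open(path, errors="replace").readlines())`, where `path` is your ZAR log file. If the log holds several runs, this picks the first block.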

Donnie
16th August 2007, 12:58
ZAR never finished reconstructing the RAID5 array because of a power outage. Knowing the stripe size, I was able to work out the order of the drives by hand in about an hour. Finding a partition table on the first disk cut the number of permutations from 24 to six. It was easy to figure out from there.
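The drop from 24 to six is 4! versus 3!: once the disk holding the partition table is pinned as the first member, only the other three can move. A small sketch, with hypothetical disk labels and the assumption (as in the post) that the partition table sits on member 0:

```python
from itertools import permutations
from typing import Iterator, Optional, Sequence, Tuple


def candidate_orders(disks: Sequence[str], first: Optional[str] = None) -> Iterator[Tuple[str, ...]]:
    """Yield every member order, optionally pinning a known disk to position 0."""
    for order in permutations(disks):
        if first is None or order[0] == first:
            yield order


disks = ["A", "B", "C", "D"]  # hypothetical labels for the four members
print(len(list(candidate_orders(disks))))             # 24
print(len(list(candidate_orders(disks, first="A"))))  # 6
```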

When the array originally failed, CHKDSK attempted to repair it before I realized what was going on, which of course made things worse. Now Windows recognizes that the array holds an NTFS partition with the correct number of bytes used, but no files show up.

Alexey V. Gubin
16th August 2007, 13:22
Ah, so you have an order of disks then. Good.


Download and install http://www.z-a-recovery.com/zar-83-rc3.exe
In "Advanced Configuration":
on the "Common filesystem analysis" tab, check that "Quick Scan" is set to "Automatic";
on the "Overrides" tab, set "Sector offset of the 0th cluster" to 0. Note that this is a one-shot switch; you need to set it every time you start ZAR.
In the runtime control panel, set both cache sizes to the minimum.
Proceed with the recovery as before.

A run against 1M files has just completed using about 300MB of RAM. The crafted test set may differ from your actual data, but I hope it will do the trick.
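Scaling the test run's figure up to this volume gives a rough idea of headroom. This is only a linear extrapolation under stated assumptions: "about 300MB" is taken as 300 x 2^20 bytes, memory is assumed to grow in proportion to object count, and, as the post itself warns, the real data may behave differently:

```python
MB = 1024 ** 2

test_objects = 1_000_000
test_bytes = 300 * MB  # "about 300MB" for the 1M-file test run (assumed exact)

per_object = test_bytes / test_objects  # ~315 bytes per object
volume_objects = 2_700_000 + 300_000    # files + folders from this thread
projected = volume_objects * per_object

print(f"{per_object:.0f} B/object -> {projected / MB:.0f} MB projected")
# 315 B/object -> 900 MB projected
```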

Donnie
16th August 2007, 14:47
I've had RC3 running for about 20 minutes now. During the quick scan, it's finding many more data fragments than any of the previous versions did.

I'll keep you posted on the final results. RC2 took about 1.5 hours.

Alexey V. Gubin
16th August 2007, 14:58
There is a quirk in autosave. Autosave runs near the end of the scan, and with 3M files it might take a long time, during which the program appears frozen while still using 100% of one CPU. It will eventually unfreeze; just check Task Manager to confirm it is still working.

Donnie
17th August 2007, 14:38
RC3 ran all night and found many files, but they turned out to be useless. The photos (which I'm most interested in) had intact thumbnails in the JPGs, but opening a file displayed a jumble of mixed-up JPG images. I figure this is due to an incorrect RAID disk order on my part.

I have ZAR determining the order of the disks now. I'll post a log when it gets as far as "Final nonzero RAID layout scores".

Alexey V. Gubin
17th August 2007, 15:10
Unfortunately, we have not (so far) been able to improve RAID performance for the disk-scan part. RC3 contains some computational improvements, but I do not know how they will work out on a 1.2 TB array. The plan is to apply the same quick-scan logic to the RAID scan that we use for NTFS, but again this cannot be done quickly enough. The quick scan has been in the works for quite a while already, and reworking the RAID scanner the same way is not that easy.

Btw, do you use compression on that volume?

Donnie
17th August 2007, 16:27
Fortunately, no. :-)

Donnie
18th August 2007, 04:11
It has now completed 50% of RAID parameter detection. Based on the speed of the first half, it will take another 320 minutes to finish.

Donnie
18th August 2007, 05:31
ZAR is sitting at 99% completion of "Detecting parameters (RAID5 Checkerboard)". It claims to be using 0% CPU, which Windows Task Manager confirms; very occasionally ZAR reports 1% or 2% CPU usage. Task Manager also shows ZAR's memory usage increasing at about 6KB/second, with total memory usage at about 840MB. The last few lines of the log are as follows:

Vote : 1; Count=123 ; separation
Vote : 0; Count=102 ; separation
Vote : 1; Count=46 ; separation
Vote : 2; Count=122 ; separation
Vote : 2; Count=160 ; separation
Performance: BuildSwitchList 2h 9m 59s
Vote RAID stripe size: 128; Count=135283 ; separation 83.7%
Purged a total of 815778 known false positives (stripe size mismatched)
Remaining 209399 records
Vote Start offset: 0; Count=67517 ; separation 99.2%
RCRD/RSTR bias @55030
RAID: Test RAID0
Performance: Ident data query (206607 of 1981482) 0s
Parity for disk 0105 at 1
Performance: Ident data query (222265 of 2021636) 0s
Parity for disk 0106 at 0
Performance: Ident data query (226332 of 2027020) 0s
Parity for disk 0107 at 3
Performance: Ident data query (249587 of 2175207) 0s
Parity for disk 0108 at 2
Filtered 17359 ghost entries
RAID: Test RAID5/Checkerboard
Vote Starting disk: 264; Count=2336 ; separation 1.4%

Is this pause in activity normal?

Donnie
18th August 2007, 06:37
Task Manager reports that ZAR's memory usage has dropped from 840MB earlier this morning to 23MB now. Task Manager also now shows a memory discrepancy of about 850MB. The log file has not changed.

Donnie
3rd September 2007, 14:38
To close out this thread: I was able to recover my data. I found a Java app written by someone who had the same problem; he used it to recover a five-disk RAID5 array by stripping out the parity and creating an image he could mount via a loopback device under Linux. After three attempts at getting the disk order right on my array, I had results. I got all my business data, photos, Word files, MP3s, and home movies back.
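The Java app itself isn't shown in this thread, so the following is not that program; it is a minimal Python sketch of the same idea: read the members in order, skip each row's parity chunk, and concatenate the data chunks into one image. It assumes a left-symmetric or left-asymmetric layout (common software RAID5 layouts; NVIDIA's actual layout is not confirmed here), members already in the correct order, and in-memory byte strings for illustration, where a real tool would stream from image files. `build_array` is a hypothetical test helper that does the inverse, including XOR parity:

```python
from typing import List


def parity_disk(row: int, n: int) -> int:
    """Left-* layouts: parity starts on the last member and moves one member left per row."""
    return (n - 1) - (row % n)


def data_disk(row: int, d: int, n: int, symmetric: bool = True) -> int:
    """Member holding data chunk d (0..n-2) of a stripe row."""
    p = parity_disk(row, n)
    if symmetric:
        return (p + 1 + d) % n    # left-symmetric: data wraps around after the parity member
    return d if d < p else d + 1  # left-asymmetric: data in member order, skipping parity


def destripe(members: List[bytes], chunk: int, symmetric: bool = True) -> bytes:
    """Rebuild the logical volume by concatenating data chunks and dropping parity."""
    n = len(members)
    rows = min(len(m) for m in members) // chunk
    out = bytearray()
    for row in range(rows):
        off = row * chunk
        for d in range(n - 1):
            out += members[data_disk(row, d, n, symmetric)][off:off + chunk]
    return bytes(out)


def xor_blocks(blocks: List[bytes]) -> bytes:
    """RAID5 parity: byte-wise XOR of the data chunks in a row."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, x in enumerate(blk):
            out[i] ^= x
    return bytes(out)


def build_array(data: bytes, n: int, chunk: int, symmetric: bool = True) -> List[bytes]:
    """Inverse of destripe, for testing: spread data over n members and add parity."""
    per_row = (n - 1) * chunk
    if len(data) % per_row:
        raise ValueError("pad data to a whole number of stripe rows")
    members = [bytearray() for _ in range(n)]
    for row in range(len(data) // per_row):
        base = row * per_row
        blocks = [data[base + d * chunk: base + (d + 1) * chunk] for d in range(n - 1)]
        placed: List[bytes] = [b""] * n
        for d, blk in enumerate(blocks):
            placed[data_disk(row, d, n, symmetric)] = blk
        placed[parity_disk(row, n)] = xor_blocks(blocks)
        for disk, blk in enumerate(placed):
            members[disk] += blk
    return [bytes(m) for m in members]


data = bytes(range(240))  # 10 rows of 3 data chunks, 8 bytes each
members = build_array(data, n=4, chunk=8)
assert destripe(members, chunk=8) == data
```

With members in the wrong order, each chunk stays internally intact but lands in the wrong place. That is consistent with the symptom described earlier in the thread: data that fits inside one chunk, such as an embedded JPEG thumbnail, can survive while the rest of the file is scrambled.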

I greatly appreciate the effort put into getting this program to work for me. Most would have given up much sooner. And although it didn't work in my specific case, the optimizations made to handle the amount of data I had to recover will make success much more likely for others in the future.

Alexey V. Gubin
4th September 2007, 07:12
Yes. We already have something capable of handling RAID and 5M files in a reasonable time, but that's much too late for your particular case.