Secure ripping

Discuss the current and future development of Max.
woody77
Posts: 13
Joined: Sun Apr 16, 2006 1:36 am

Post by woody77 »

So, what's wrong with using paranoia? It seems to work nicely, and I've used a different port (I think it was different) that clearly was able to avoid the cache, as it routinely ran into discs of mine that it found issues with, and was able to deal or not deal with the issues depending on the depth of the scratches. Some of these scratches weren't enough to cause my Denon to notice, or it is *VERY* good at correcting on the fly (more likely).

So, why the need to move away from paranoia? Or is it just not working right for you on your hardware? (It seems to work correctly on my 12" PowerBook, under both 10.3 and now 10.4.)

woody77
Posts: 13
Joined: Sun Apr 16, 2006 1:36 am

Post by woody77 »

{Edited}

Ok, so I've been doing some serious digging on the 'net, and started running some tests. Tiger "broke" paranoia. I've had a customized version of paranoia for a while, and it was merrily rejecting heavily scratched discs. I just grabbed a few of those, ripped them twice, and md5'd each copy. Usually, I get the same results back. Previously, these discs would fill the progress meter in cdparanoia with '+' and '!' marks, and I'd hear the drive continually seeking back and forth over portions of the disc. No longer. It just quietly hums along, no seeks, no errors. I just ripped a disc with a pin-hole in the data layer (you can see light through it), and got a single '+' and '!' char in the rip results. No failure. This was while ripping with -z to enforce perfect results.

And my md5s are showing that the discs usually give the same results, but not always.

So, I think the issue is software, and it's something that changed in the OS. It might be something where we just need to find the right I/O control command to send to the drive to tell it to disable the read cache.

I have a few contacts I can work with to see what I can find.

This is finally the ripper I've been looking for. I'll gladly pitch in and help.

woody77
Posts: 13
Joined: Sun Apr 16, 2006 1:36 am

Post by woody77 »

I took your idea, and your port of paranoia that's in the Max code, and banged on it some yesterday afternoon.

I was able to get something working with a "known bad" disc. Instead of reading the whole track multiple times to build up some sort of statistical certainty about the accuracy of the read, I did it a block at a time, using a block of sectors slightly larger than the known size of the hardware buffer in the drive.

On my PowerBook with the UJ-835E, with a 2MB buffer, I was reading 892 sectors at a time.

I'd read block N, then N+1, then N again, into three different arrays in memory. Then I'd compare the two copies of N, move on to the next point, read N+2 and then N+1 again, and compare the two copies of N+1. I guess now I could read N, N+1, N, N+1, and do both comparisons, which would clean up my code more than a bit.
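In rough code, the compare pass looks something like this (a trimmed-down sketch- read_sectors() is just a stand-in for your osx_read_audio(), whose real signature differs, and error handling is elided):

    #include <cstring>
    #include <vector>

    const long kSectorSize = 2352;  // bytes per CDDA sector
    const long kBlockSize  = 892;   // sectors; slightly larger than the 2MB drive buffer

    // Stand-in for the real read call (osx_read_audio() in osx_interface.c);
    // reads `count` sectors starting at `lba` into `buf`.
    bool read_sectors(long lba, long count, char *buf);

    // Verify block N by reading N, then N+1 (to flush the drive's cache),
    // then N again, and comparing the two copies of N.
    bool verify_block(long n, std::vector<char> &verified)
    {
        std::vector<char> first(kBlockSize * kSectorSize);
        std::vector<char> spacer(kBlockSize * kSectorSize);
        std::vector<char> second(kBlockSize * kSectorSize);

        long lba = n * kBlockSize;
        read_sectors(lba, kBlockSize, &first[0]);                // read N
        read_sectors(lba + kBlockSize, kBlockSize, &spacer[0]);  // read N+1, evicting N
        read_sectors(lba, kBlockSize, &second[0]);               // re-read N off the disc

        if (std::memcmp(&first[0], &second[0], first.size()) != 0)
            return false;  // at least one sector differs between the two reads
        verified.swap(first);
        return true;
    }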

I'm just using your osx_find_cdrom(), osx_init_drive(), and osx_read_audio() methods in osx_interface.c.

There's still a whole giant mess of stuff to implement on top of that, but it's at least somewhat of a start, and the 2MB or so reads are manageable chunks in memory.

Once a chunk is known to be good, it can be written to disk, or streamed out to an encoder. Since the WAV files are a fixed size based on time, one could probably just write all the good chunks out to disk, writing all bad sectors' worth of samples as silence, and then backfilling the bad areas as they can be read.
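In code that's something like this (a sketch; assumes zeroed bytes are fine as silence, which they are for 16-bit PCM):

    #include <cstring>
    #include <set>
    #include <vector>

    const long kSectorSize = 2352;  // bytes per CDDA sector

    // Whole-track buffer: verified sectors get real data, everything else
    // stays silent (zeroed) until a later pass manages to read it.
    struct TrackBuffer {
        std::vector<char> samples;
        std::set<long>    badSectors;  // sectors still needing a good read

        explicit TrackBuffer(long sectorCount)
            : samples(sectorCount * kSectorSize, 0) {}

        // Drop a verified sector into place and cross it off the bad list.
        void store(long sector, const char *data) {
            std::memcpy(&samples[sector * kSectorSize], data, kSectorSize);
            badSectors.erase(sector);
        }
    };

Since the file size is known up front, you could also write the whole (partly silent) buffer to disk immediately and just seek back to patch sectors as they verify.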

One idea I had was to take the first bad sector after the full two-read pass, and then read 892 sectors starting from there again. Only look at the sectors that "previously failed", and compare; then move to the next sector beyond that block, read it, cache it, and then come back to the first set of "bad blocks".

Keep building up these sets of reads, interleaving between two locations that need to be read, so that one can compare them all later, and pull what appears to be a proper value for the sector/sample.

However, what all of this doesn't do is ensure that when you ask for sector 892, you don't get sector 893 or 891, or something starting halfway through sector 892. I'm not sure how much that actually happens on modern drives (this is what EAC's sample-offset feature addresses).

CDParanoia (the version I have, different from yours, I think) DOES report a '+' at the site of a scratch, but only a '+'. Two back-to-back album rips produce different files, though.

I've just been performing full bytewise comparisons, and it seems pretty fast. Although I'm not sure how much is read time and how much is comparison time.

sbooth
Site Admin
Posts: 2456
Joined: Fri Dec 23, 2005 7:45 am
Location: USA

Post by sbooth »

I like your approach to reading a batch of sectors at a time vs. the whole track- it saves both time and memory (but mostly time)! It seems like an efficient approach, as well.

Out of curiosity, how did you determine the size of the drive's cache? I know this can be done with the Disc Burning interface, but I haven't personally tried it.
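If I had to guess at the programmatic route, it would look something like this- completely untested, and I'm assuming the device info dictionary carries the buffer size under a key along the lines of kDRDeviceWriteBufferSizeKey (check DRDevice.h for the real name):

    #include <DiscRecording/DiscRecording.h>

    // Untested sketch: ask DiscRecording for a drive's buffer size in bytes.
    // The key name here is an assumption -- verify it against DRDevice.h.
    long device_buffer_size(DRDeviceRef device)
    {
        long size = 0;
        CFDictionaryRef info = DRDeviceCopyInfo(device);
        if (info != NULL) {
            CFNumberRef value =
                (CFNumberRef)CFDictionaryGetValue(info, kDRDeviceWriteBufferSizeKey);
            if (value != NULL)
                CFNumberGetValue(value, kCFNumberLongType, &size);
            CFRelease(info);
        }
        return size;  // 0 if the drive didn't report one
    }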

I had stopped development temporarily on my secure ripper because I ran into a logical difficulty. I have a bit array corresponding to the sectors (one bit per sector to save space), and once a sector was verified as good/correct the value in the array for that sector was set to 1 and the sector was saved to disk. The array was then re-scanned, and any sector corresponding to an array value of 0 was re-read and re-verified. The problem I was having was that when dealing with only one or two problematic sectors, the returned data would come from the drive's cache and would obviously not be reliable from an error-correcting point of view. Since there seems to be no way to disable drive caching in OS X, I was stuck.
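(The bit array itself is nothing fancy- one bit per sector, something like:)

    #include <vector>

    // One bit per sector: 0 = unverified, 1 = verified and written to disk.
    class SectorMap {
        std::vector<unsigned char> bits;
    public:
        explicit SectorMap(long sectors) : bits((sectors + 7) / 8, 0) {}
        void mark(long s)           { bits[s / 8] |= (1 << (s % 8)); }
        bool verified(long s) const { return (bits[s / 8] >> (s % 8)) & 1; }
    };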

Your method solves this problem by reading chunks equal to or larger than the drive's cache size. I was thinking in this direction as well, since it seems the only way to defeat the fact that drive caching cannot be disabled. Kudos for getting it done!

I think it would be advantageous if the number of matching reads was user-configurable, but this is mostly an implementation detail. This could be a bit taxing on memory, but I assume people who are concerned with bit-perfect rips would be willing to wait a while to get them.

I wonder if it is worth building on top of paranoia for this, or if we could bypass it altogether and just use the raw read/write interfaces. Since we are more interested in the data coming off the drive, I feel that we could likely just use a file descriptor for the raw device node.
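Something like this, I would imagine (a sketch using the CD ioctl from IOCDMediaBSDClient.h; the device node in the comment is just an example, and error handling is minimal):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <IOKit/storage/IOCDMediaBSDClient.h>  // dk_cd_read_t, DKIOCCDREAD
    #include <IOKit/storage/IOCDTypes.h>           // kCDSectorSizeCDDA, etc.

    // Read `count` CDDA sectors starting at `lba` from the raw device node.
    // Returns 0 on success, -1 on failure (see errno).
    int read_cdda(int fd, u_int32_t lba, u_int32_t count, void *buffer)
    {
        dk_cd_read_t cd_read;
        memset(&cd_read, 0, sizeof(cd_read));

        cd_read.offset       = (u_int64_t)lba * kCDSectorSizeCDDA;
        cd_read.sectorArea   = kCDSectorAreaUser;  // just the 2352 audio bytes
        cd_read.sectorType   = kCDSectorTypeCDDA;
        cd_read.buffer       = buffer;
        cd_read.bufferLength = count * kCDSectorSizeCDDA;

        return ioctl(fd, DKIOCCDREAD, &cd_read);
    }

    // Usage: int fd = open("/dev/rdisk1", O_RDONLY);  // actual node varies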

The higher-level ripper interface in Max just passes an array of sectors to be ripped to the ripper object; I think it should be fairly simple to plug in another ripper, either in place of or along with paranoia.

woody77
Posts: 13
Joined: Sun Apr 16, 2006 1:36 am

Post by woody77 »

sbooth wrote:I like your approach to reading a batch of sectors at a time vs. the whole track- it saves both time and memory (but mostly time)! It seems like an efficient approach, as well.
Thanks. It seemed like a much easier approach; the only rub is doing it in a large enough block to ensure that you've totally wiped the cache...

Alternatively, if you read in blocks of 2x the cache size, all the funky stuff with caching the previous read falls out, since by the time you've finished the first read, half the data you've read will have fallen out of the cache and been replaced by the second half. I'll make that change and see if it works better (it GREATLY simplifies the code).
Out of curiosity, how did you determine the size of the drive's cache? I know this can be done with the Disc Burning interface, but I haven't personally tried it.
I didn't try to do it programmatically, at least not yet. I used System Profiler to get the model number, and Google to get the cache size. I figured it would just be easiest to either default to something BIG, or to just make it user-selectable. A 32 or 64MB block size isn't really that much memory these days. 128MB of memory (two blocks) for dealing with the reads isn't a lot, but that's not the size I'd want to use for trying to re-read bad blocks. Maybe a good size for an initial double-pass to verify them, though.
I had stopped development temporarily on my secure ripper because I ran into a logical difficulty. I have a bit array corresponding to the sectors (one bit per sector to save space), and once a sector was verified as good/correct the value in the array for that sector was set to 1 and the sector was saved to disk. The array was then re-scanned, and any sector corresponding to an array value of 0 was re-read and re-verified. The problem I was having was that when dealing with only one or two problematic sectors, the returned data would come from the drive's cache and would obviously not be reliable from an error-correcting point of view. Since there seems to be no way to disable drive caching in OS X, I was stuck.

Your method solves this problem by reading chunks equal to or larger than the drive's cache size. I was thinking in this direction as well, since it seems the only way to defeat the fact that drive caching cannot be disabled. Kudos for getting it done!
Another idea I had was to just use the first n sectors as a "make the drive forget what it just read" area. That works fine for any sectors not in that block, and if you need to read sectors in that block, grab the end of the disc instead. Or, just do the whole read-2x-the-cache-size block, and only pull out the sectors that you need in order to get the new values.

You should be able to just sit and re-read 2x the cache size over and over again like paranoia does with single sectors...

Of course, one thing I've NOT been doing is trying to ensure that the stream is accurate with respect to synchronization. But I thought all modern drives are "AccurateStream" drives in EAC parlance.
I think it would be advantageous if the number of matching reads was user-configurable, but this is mostly an implementation detail. This could be a bit taxing on memory, but I assume people who are concerned with bit-perfect rips would be willing to wait a while to get them.
Well, from a programming point of view, it should be an easily tweakable value, which makes it really easy to make a pref in Cocoa...
I wonder if it is worth building on top of paranoia for this, or if we could bypass it all together and just use the raw read/write interfaces. Since we are more interested in the data coming off the drive, I feel that we could likely just use a file descriptor for the raw device node.

The higher-level ripper interface in Max just passes an array of sectors to be ripped to the ripper object; I think it should be fairly simple to plugin another ripper, either in place of or along with paranoia.
And this is where I start to get out of my depth as to what paranoia is actually doing.

So paranoia is doing a couple things:

- reading the raw sectors, and checking that the sectors match on repeat reads
- validating the C2 errors
- doing jitter correction (not needed on "AccurateStream" drives?)
- ??

C2 errors are something I just totally don't understand (mainly because I haven't tried to look up what they are). I thought the CDDA stream was just straight bits, arranged in sectors made of samples, with samples being the only "findable" piece in the system.

The app asks the drive for some sectors, and the drive responds with the right amount of data, starting very close to where it thinks it should be. Unfortunately, most of what I know about the process is based on the xiph.org website for paranoia and the EAC website. Luckily, I learn fast.

Any idea where to find more docs on stuff like C2 errors and the like? I'll be going back through the eac and xiph sites to see what I can find as far as links go.

Unrelated: your forums crash my Netgear AP/router. No idea why. But after a couple of page loads from here, it's time to power-cycle the AP/router.

sbooth
Site Admin
Posts: 2456
Joined: Fri Dec 23, 2005 7:45 am
Location: USA

Post by sbooth »

woody77 wrote:Alternatively, if you read in blocks of 2x the cache size, all the funky stuff with caching the previous read falls out, since by the time you've finished the first read, half the data you've read will have fallen out of the cache and been replaced by the second half. I'll make that change and see if it works better (it GREATLY simplifies the code).
I like this idea- it shouldn't take any time at all to read and discard 2MB or so worth of data off the disc.
I didn't try to do it programmatically, at least not yet. I used System Profiler to get the model number, and Google to get the cache size. I figured it would just be easiest to either default to something BIG, or to just make it user-selectable.
I will see if I can figure out the API for this- it shouldn't be too difficult. More later...
You should be able to just sit and re-read 2x the cache size over and over again like paranoia does with single sectors...
I like this approach because it is simple and elegant. EAC is great because it knows so much about the drive hardware and compensates for it, but Mac OS X seems to want developers to be at least semi-removed from the hardware specifics. I think following this paradigm in code for Mac OS X is worthwhile.
Of course, one thing I've NOT been doing is trying to ensure that the stream is accurate with respect to synchronization. But I thought all modern drives are "AccurateStream" drives in EAC parlance.
I'm not intimately familiar with the names EAC uses for drive features, but I believe that you are correct. I would also assume any drive in use on a 10.4 system would be reasonably modern.
And this is where I start to get out of my depth as to what paranoia is actually doing.

So paranoia is doing a couple things:

- reading the raw sectors, and checking that the sectors match on repeat reads
- validating the C2 errors
- doing jitter correction (not needed on "AccurateStream" drives?)
- ??
The more I think about it, the more I think that using a homegrown ripper would be advantageous. Paranoia doesn't support pregaps, or enhanced CDs, or drive offsets natively. I think that it would be simpler to implement these from scratch, along with a new ripper, than to back-patch paranoia.
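Drive offsets, at least, are mostly bookkeeping. A sketch (the sign convention here is an assumption- double-check it against how EAC defines read offsets):

    const long kSamplesPerSector = 588;  // 2352 bytes / 4 bytes per stereo sample

    struct OffsetWindow {
        long firstSector;  // first sector to actually request from the drive
        long sampleShift;  // stereo samples to skip within that first sector
    };

    // Assumed convention: a positive offset means the samples we want begin
    // `offsetSamples` samples later in the raw stream than the nominal start.
    OffsetWindow window_for(long trackFirstSector, long offsetSamples)
    {
        long absolute = trackFirstSector * kSamplesPerSector + offsetSamples;
        OffsetWindow w;
        w.firstSector = absolute / kSamplesPerSector;
        w.sampleShift = absolute % kSamplesPerSector;
        if (w.sampleShift < 0) {  // negative offsets can cross a sector boundary
            w.firstSector -= 1;
            w.sampleShift += kSamplesPerSector;
        }
        return w;
    }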
Any idea where to find more docs on stuff like C2 errors and the like? I'll be going back through the eac and xiph sites to see what I can find as far as links go.
I'm a big fan of http://www.chipchapin.com/CDMedia/cdda7.php3.

woody77
Posts: 13
Joined: Sun Apr 16, 2006 1:36 am

Post by woody77 »

So, I reworked my code a bit, and simplified things by reading 4MB blocks, and just reading each 4MB block twice in a row. Since the block is bigger than the drive's buffer, the first half of the read has already been evicted by the time the first pass finishes, so nothing on the second read comes out of the cache.

At least, the same CD showed the same 5-6 bad sectors.

Running like this, one can just read the same block over and over and over again until the sectors are cleaned up, and then move on to the next block. Reading 4MB at a time seems to take about 1-2 seconds once the disc is spun up, and reading continually when spun up seems to help.
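The main loop boils down to something like this now (a sketch- read_block() stands in for the raw 4MB read, and maxPasses keeps a trashed disc from spinning forever):

    #include <cstring>
    #include <vector>

    const long kBlockBytes = 4 * 1024 * 1024;  // 2x the 2MB drive buffer

    // Stand-in for the raw read: fills `buf` with kBlockBytes starting at `lba`.
    bool read_block(long lba, std::vector<char> &buf);

    // Re-read the same block until two consecutive reads agree, then move on.
    bool rip_block(long lba, std::vector<char> &out, int maxPasses)
    {
        std::vector<char> a(kBlockBytes), b(kBlockBytes);
        for (int pass = 0; pass < maxPasses; ++pass) {
            if (!read_block(lba, a) || !read_block(lba, b))
                continue;  // hard read error; just try again
            if (std::memcmp(&a[0], &b[0], kBlockBytes) == 0) {
                out.swap(a);
                return true;  // two matching reads in a row
            }
        }
        return false;  // still unstable after maxPasses attempts
    }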

Also, setting the read speed to something other than the fastest would probably help; the speed could be dropped to 1x for reading blocks with bad sectors, and then restored to full speed for the next block.

Did you send an e-mail to the Darwin list about the FUA bit? I stumbled across that on some searching today.

I spent some time poking at the Darwin code, comparing 10.3.9 and 10.4.6. Somewhere in there, they made a change that either enabled the cache, or started enabling it for CDDA. I've yet to find it, though. I also might be able to poke some connections and find out what's going on (advantages of being local to the Bay Area and having friends who are friends of people at Apple).

Anyway, if you want, I can roll up my current code into a C++ class and send it your way. It's a C++ main() with a few helper functions for reading/processing blocks and then validating them.

I definitely agree that OS X is leaning towards the "go simple" route with the reading; hopefully it works well enough...

sbooth
Site Admin
Posts: 2456
Joined: Fri Dec 23, 2005 7:45 am
Location: USA

Post by sbooth »

Yes, that was me asking about the FUA bit on the Darwin lists (http://lists.apple.com/archives/darwin- ... 00086.html). I also filed a bug report/feature request for FUA (Apple bug number 4424205) but for some reason their radar won't let me view the bug anymore.

I like the idea of slowing the drive speed when an error is encountered- this is something that we have an ioctl() for and would probably help in a majority of cases.
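For reference, it's just DKIOCCDSETSPEED (sketch; speeds are in KB/sec, and kCDSpeedMin from IOCDTypes.h is 176 KB/s, i.e. 1x audio):

    #include <sys/ioctl.h>
    #include <IOKit/storage/IOCDMediaBSDClient.h>  // DKIOCCDSETSPEED
    #include <IOKit/storage/IOCDTypes.h>           // kCDSpeedMin, kCDSpeedMax

    // Drop the drive to a slower speed for a problem block,
    // then restore full speed afterwards.
    int set_drive_speed(int fd, u_int16_t kbPerSecond)
    {
        return ioctl(fd, DKIOCCDSETSPEED, &kbPerSecond);
    }

    // e.g. set_drive_speed(fd, kCDSpeedMin);  // 1x for the retry pass
    //      set_drive_speed(fd, kCDSpeedMax);  // back to full speed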

You seem to be more savvy than I am about the Darwin code- I have never taken a look at it, so I don't know what they might have done for 10.4. I also did not own a Mac before 10.4 (unless you're counting System 7!), so I can't really say how things changed behaviorally.

I wonder what Leopard will do?

I would very much appreciate your code; it doesn't even have to be rolled into a class. I would likely have to redo the wrapper anyway to integrate it with Max.

woody77
Posts: 13
Joined: Sun Apr 16, 2006 1:36 am

Post by woody77 »

sbooth wrote:Yes, that was me asking about the FUA bit on the Darwin lists (http://lists.apple.com/archives/darwin- ... 00086.html). I also filed a bug report/feature request for FUA (Apple bug number 4424205) but for some reason their radar won't let me view the bug anymore.
I figured so.
I like the idea of slowing the drive speed when an error is encountered- this is something that we have an ioctl() for and would probably help in a majority of cases.
I think EAC does something like that, which may have been where I got the idea.
You seem to be more savvy than I am about the Darwin code- I have never taken a look at it, so I don't know what they might have done for 10.4. I also did not own a Mac before 10.4 (unless you're counting System 7!), so I can't really say how things changed behaviorally.

I wonder what Leopard will do?
I've done a fair bit of C++ development (professionally), and the Darwin code is amazingly clean. Very modular, very well commented. It's rather shocking, actually. It might be different in the kernel itself, but the IOKit and ATAPI code are all very clean.

My PowerBook is just old enough to have come with Panther, and I didn't get around to upgrading to Tiger until recently. Now I'm curious whether paranoia actually ever worked right in the past. I have seen it return '!' and 'V' marks before when ripping, though, so I'm pretty sure that it was mostly functional. I just never bothered to rip a scratched disc twice to ensure that it actually returned the same rip each time.
I would very much appreciate your code; it doesn't even have to be rolled into a class. I would likely have to redo the wrapper anyway to integrate it with Max.
I'll get it packaged up and e-mail it to you. Are you going to be at all interested in other people working on the Max code?

sbooth
Site Admin
Posts: 2456
Joined: Fri Dec 23, 2005 7:45 am
Location: USA

Post by sbooth »

woody77 wrote:I'll get it packaged up and e-mail it to you. Are you going to be at all interested in other people working on the Max code?
Yes, absolutely! (Though it would be hard for me to relinquish the reins as the only svn committer :))

woody77
Posts: 13
Joined: Sun Apr 16, 2006 1:36 am

Post by woody77 »

I probably won't be able to get it packaged up until tonight. I had other stuff going on last night, and today, well, I need to work. :)

I'm going to roll it into a class instead of the loose functions so that I can fix one last nagging bug. It will take maybe an hour, once I sit down to actually do it.

sbooth
Site Admin
Posts: 2456
Joined: Fri Dec 23, 2005 7:45 am
Location: USA

Update

Post by sbooth »

For all those interested, I've finished the first version of this ripper. It's all in svn, and the code should compile with the required frameworks (downloaded separately).

I look forward to all feedback!

shanecavanaugh
Posts: 68
Joined: Sat Jan 14, 2006 12:32 am

Post by shanecavanaugh »

I've tested three CDs and on all of them Max has gotten an error 33% into the last track, saying "An error occurred while ripping tracks from the disc 'Album name'. Unable to read from the CD." None of them are Enhanced. One has 4 tracks, one has 9, and one has 11. There's no need to go through the whole CD; it's reproducible just by ripping the final track.

sbooth
Site Admin
Posts: 2456
Joined: Fri Dec 23, 2005 7:45 am
Location: USA

Post by sbooth »

shanecavanaugh wrote:I've tested three CDs and on all of them Max has gotten an error 33% into the last track, saying "An error occurred while ripping tracks from the disc 'Album name'. Unable to read from the CD." None of them are Enhanced. One has 4 tracks, one has 9, and one has 11. There's no need to go through the whole CD; it's reproducible just by ripping the final track.
Ah, the old "off by one" error strikes again!

Turns out I was using the lead out as the last sector for the last track, instead of the sector prior to the lead out. I've fixed this in svn now and tested it on a disc, and it works.
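In other words (a sketch, not the actual Max code):

    // Before: the last track ran through the lead out itself (one sector too far):
    //     lastSector = leadOut;
    // After: stop at the sector immediately before the lead out:
    //     lastSector = leadOut - 1;
    long sectors_in_track(long firstSector, long leadOut)
    {
        return (leadOut - 1) - firstSector + 1;  // == leadOut - firstSector
    }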

Thanks for finding this so quickly!

LagunaSol
Posts: 24
Joined: Wed Jan 18, 2006 10:31 pm

Post by LagunaSol »

Can't wait to see the release of the secure ripping version. I dumped my PC and PowerMac for an Intel iMac but didn't manage to get EAC running using Parallels. I don't care to reboot (Boot Camp) just to rip CDs, so I'm anxious for secure ripping in Max.
