calculating MD5 hash of a large file

smithdale87 (Posts: 4,436, iPhone Dev SDK Supporter)
edited June 2011 in iPhone SDK Development
Hi,

I've downloaded a large file (> 100 MB) to my Documents directory, and I need to calculate the file's MD5.

I can't load the entire file into an NSData object using initWithContentsOfFile: because that crashes the app due to lack of memory.

How can I load pieces of the file, say 10MB at a time, and calculate the MD5 incrementally?

Thanks

Replies

  • ziconic (Posts: 64, Registered Users)
    edited May 2009
    Well, Objective-C supports all of regular C, so there's always the old-school fread to fall back on. :)
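
    A minimal sketch of the fread approach in plain C, which also compiles inside an Objective-C file. To keep it self-contained, it uses 64-bit FNV-1a as a stand-in for MD5; on the iPhone the three hash steps would presumably map to CC_MD5_Init, CC_MD5_Update and CC_MD5_Final. The names hash_ctx, hash_init, hash_update and hash_file_chunked are made up for this example, and the 4096-byte CHUNK_SIZE is an assumption.

```c
#include <stdint.h>
#include <stdio.h>

#define CHUNK_SIZE 4096  /* assumed value; small enough to live on the stack */

/* Stand-in for CC_MD5_CTX: running state of a 64-bit FNV-1a hash. */
typedef struct { uint64_t state; } hash_ctx;

/* Stand-in for CC_MD5_Init: load the FNV-1a offset basis. */
void hash_init(hash_ctx *ctx) { ctx->state = 14695981039346656037ULL; }

/* Stand-in for CC_MD5_Update: fold `len` bytes into the running state. */
void hash_update(hash_ctx *ctx, const void *data, size_t len) {
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        ctx->state ^= p[i];
        ctx->state *= 1099511628211ULL;  /* FNV-1a 64-bit prime */
    }
}

/* Hashes the file at `path` one chunk at a time, reusing a single buffer.
   Returns 0 on success, -1 if the file cannot be opened or read. */
int hash_file_chunked(const char *path, uint64_t *out) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;

    unsigned char buf[CHUNK_SIZE];
    hash_ctx ctx;
    hash_init(&ctx);

    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        hash_update(&ctx, buf, n);
    }

    int failed = ferror(fp);  /* distinguish a read error from end of file */
    fclose(fp);
    if (failed) return -1;

    *out = ctx.state;
    return 0;
}
```

    Because the same stack buffer is reused for every read, memory use stays at roughly CHUNK_SIZE bytes no matter how large the file is.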
  • jtara (Posts: 406, Registered Users)
    edited May 2009
    Even 10MB is too much in one gulp. And there's no reason to read in such large chunks anyway.

    Frankly, if you need help with this, you need to get a basic education in programming before you tackle an iPhone app. It sounds like you are just copying an example, and expect us to write your code for you.

    But I'll try to help you help yourself anyway. Do you think that initWithContentsOfFile: is the only way to read data from a file? iPhone has an extensive API (actually, several APIs...) for reading files. How about hitting the "Help" button in Xcode and doing some exploring?

    If that's too much, try the "Files" chapter in any iPhone development book. You do have one of those, don't you?

    After you've done that, if there's something you don't understand, feel free to come back and ask.
  • smithdale87 (Posts: 4,436, iPhone Dev SDK Supporter)
    edited May 2009
    Lol thanks ziconic for the fread idea. Thankfully I don't have to use C to do this.

    Thanks jtara for the @$$hole response. I was merely looking for direction, as you pointed to the "Files" chapter, not a breakdown of how incompetent I was.


    Here's my solution for others that may run into this same problem:
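    #import <Foundation/Foundation.h>
    #import <CommonCrypto/CommonDigest.h>
    
    #define CHUNK_SIZE 4096 // assumed value; not defined in the original post
    
    +(NSString*)fileMD5:(NSString*)path
    {
    	NSFileHandle *handle = [NSFileHandle fileHandleForReadingAtPath:path];
    	if( handle == nil ) return @"ERROR GETTING FILE MD5"; // file didn't exist or couldn't be opened
    	
    	CC_MD5_CTX md5;
    
    	CC_MD5_Init(&md5);
    	
    	// Feed the file to the hash one chunk at a time; a zero-length read means end of file.
    	BOOL done = NO;
    	while(!done)
    	{
    		NSData* fileData = [handle readDataOfLength: CHUNK_SIZE ];
    		CC_MD5_Update(&md5, [fileData bytes], (CC_LONG)[fileData length]);
    		if( [fileData length] == 0 ) done = YES;
    	}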
    	unsigned char digest[CC_MD5_DIGEST_LENGTH];
    	CC_MD5_Final(digest, &md5);
    	NSString* s = [NSString stringWithFormat: @"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
    				   digest[0], digest[1], 
    				   digest[2], digest[3],
    				   digest[4], digest[5],
    				   digest[6], digest[7],
    				   digest[8], digest[9],
    				   digest[10], digest[11],
    				   digest[12], digest[13],
    				   digest[14], digest[15]];
    	return s;
    }
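
    As an aside, the sixteen-argument format string above could be swapped for a loop. This is a rough sketch in plain C, not part of the original solution; digest_to_hex is a hypothetical helper name, and the caller is assumed to supply a buffer of at least 2*len + 1 bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes 2*len lowercase hex characters plus a terminating NUL into `out`.
   `out` must have room for at least 2*len + 1 bytes. */
void digest_to_hex(const unsigned char *digest, size_t len, char *out) {
    for (size_t i = 0; i < len; i++) {
        /* Each byte becomes exactly two hex digits; snprintf also writes a
           NUL, which the next iteration overwrites. */
        snprintf(out + 2 * i, 3, "%02x", (unsigned)digest[i]);
    }
    out[2 * len] = '\0';
}
```

    For an MD5 digest, len would be CC_MD5_DIGEST_LENGTH and the buffer 33 bytes; the result can then be wrapped with [NSString stringWithUTF8String:].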
    
  • edster (Posts: 4, New Users)
    edited November 2009
    @smithdale

    Thanks for posting your solution. Do you have this running in a production environment? I tried it against some large files and saw RAM consumption shoot up, because the autorelease pool never gets a chance to drain inside the loop.

    So I went with a more specific alloc and release:
    		NSData *fileData = [[NSData alloc] initWithData:[handle readDataOfLength:4096]];
    		CC_MD5_Update(&md5, [fileData bytes], [fileData length]);
    		
    		if( [fileData length] == 0 ) {
    			done = YES;
    		}
    		
    		[fileData release];	
    
  • smithdale87 (Posts: 4,436, iPhone Dev SDK Supporter)
    edited November 2009
    I wound up not using this code exactly as is.

    Since I was downloading files from a server, I just updated the MD5 with each chunk of data as it was received. That way I never have to spend time calculating the MD5 of a huge file all at once.

    Where are the files coming from in your situation?
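
    A rough sketch of that hash-as-you-download idea in plain C, which also compiles inside an Objective-C file. Each received chunk is written to disk and folded into a running hash in the same call, so the file never needs a second pass. The download_sink type and the sink_* functions are hypothetical names for this example, and 64-bit FNV-1a stands in for a CC_MD5_CTX to keep the block self-contained.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-download state: the output file plus a running hash.
   The hash is 64-bit FNV-1a, a stand-in for a CC_MD5_CTX. */
typedef struct {
    FILE *out;
    uint64_t hash;
} download_sink;

/* Opens the destination file and resets the hash. Returns 0 or -1. */
int sink_open(download_sink *s, const char *path) {
    s->out = fopen(path, "wb");
    s->hash = 14695981039346656037ULL;  /* FNV-1a offset basis */
    return s->out ? 0 : -1;
}

/* Call once per received chunk, e.g. from whatever callback delivers
   network data. Updates the hash, then appends the bytes to the file. */
int sink_write(download_sink *s, const void *chunk, size_t len) {
    const unsigned char *p = chunk;
    for (size_t i = 0; i < len; i++) {
        s->hash ^= p[i];
        s->hash *= 1099511628211ULL;  /* FNV-1a 64-bit prime */
    }
    return fwrite(chunk, 1, len, s->out) == len ? 0 : -1;
}

/* Closes the file and returns the hash of everything written. */
uint64_t sink_close(download_sink *s) {
    fclose(s->out);
    s->out = NULL;
    return s->hash;
}
```

    The result does not depend on how the data happens to be split into chunks, which is what makes this safe with network reads of unpredictable size.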
  • edster (Posts: 4, New Users)
    edited November 2009
    Hmm, interesting. My files are coming from the internet, but they are downloaded at a different point in time. Maybe I could move the hash check back in time to the point of download. In this case, there are several files that make up a single object, so that object checks the hashes on all of the files before marking itself ready. In some cases, some file components might be 30-40MB videos.

    Right now, though, I'm not having a problem with the app consuming too much memory with the current implementation. It doesn't seem to take very long to calculate the hash, so I'm not too concerned. It's a background thing anyway. I was seeing memory shoot up over 30MB into danger land with your code above. Right now my app is stable at around 8MB.
  • smithdale87 (Posts: 4,436, iPhone Dev SDK Supporter)
    edited November 2009
    So your explicit alloc/release keeps memory usage under control, whereas the autoreleased objects (in my solution) cause problems?
  • edster (Posts: 4, New Users)
    edited January 2010
    Time to circle back on an issue I thought was long resolved. Back when I first did this, it seemed like the explicit init and release was doing the trick. However, I'm getting some crash reports from a user with an old iPod touch that is running out of memory.

    So now I'm testing again and I'm seeing a huge spike in memory usage when it's calculating the hash. Without the MD5 check, my app stays around 9.5-11MB for its entire life while downloading as much as 200-300MB in the background. If I enable the MD5 check, memory shoots up as high as 60MB or so, which will kill old devices. Even after the spike, memory seems to creep up over time even though 'Leaks' is not showing any leaks. So regardless of the spike, I think the hash routines are not freeing some allocated memory.

    If you are having good luck with hashing the bytes as they stream down rather than on the file, I might look at switching to something like that.

    The investigation continues...
  • smithdale87 (Posts: 4,436, iPhone Dev SDK Supporter)
    edited January 2010
    Yeah, I never had much trouble calculating the hash as the file was downloading. Perhaps that's the direction you should head in.
  • JoeKun (Posts: 1, New Users)
    edited September 2010
    edster wrote: »
    Time to circle back on an issue I thought was long resolved. Back when I first did this, it seemed like the explicit init and release was doing the trick. However, I'm getting some crash reports from a user with an old iPod touch that is running out of memory.

    So now I'm testing again and I'm seeing a huge spike in memory usage when it's calculating the hash. Without the MD5 check, my app stays around 9.5-11MB for its entire life while downloading as much as 200-300MB in the background. If I enable the MD5 check, memory shoots up as high as 60MB or so, which will kill old devices. Even after the spike, memory seems to creep up over time even though 'Leaks' is not showing any leaks. So regardless of the spike, I think the hash routines are not freeing some allocated memory.

    Even though you use a non-autoreleased object called fileData, you still have an autoreleased object there:
    NSData *fileData = [[NSData alloc] initWithData:[handle readDataOfLength:4096]];
    

    Remember that the result of -readDataOfLength: was allocated and autoreleased. So, in reality, your solution is worse than the one proposed by smithdale87, because you end up allocating the same objects, plus one.

    I came up with an implementation that really works. I wrote an article about this efficient way to compute the MD5 hash of a large file.

    I hope this helps.
  • cncool (Posts: 44, Registered Users)
    edited June 2011
    What about wrapping the reads from the file in an autorelease pool?
    +(NSString*)fileMD5:(NSString*)path
    {
    	NSFileHandle *handle = [NSFileHandle fileHandleForReadingAtPath:path];
    	if( handle== nil ) return @"ERROR GETTING FILE MD5"; // file didnt exist
    	
    	CC_MD5_CTX md5;
    
    	CC_MD5_Init(&md5);
    	
    	BOOL done = NO;
    	while(!done)
    	{
    		NSAutoreleasePool * pool = [NSAutoreleasePool new];
    		NSData* fileData = [handle readDataOfLength: CHUNK_SIZE ];
    		CC_MD5_Update(&md5, [fileData bytes], [fileData length]);
    		if( [fileData length] == 0 ) done = YES;
    		[pool drain];
    	}
    	unsigned char digest[CC_MD5_DIGEST_LENGTH];
    	CC_MD5_Final(digest, &md5);
    	NSString* s = [NSString stringWithFormat: @"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
    				   digest[0], digest[1], 
    				   digest[2], digest[3],
    				   digest[4], digest[5],
    				   digest[6], digest[7],
    				   digest[8], digest[9],
    				   digest[10], digest[11],
    				   digest[12], digest[13],
    				   digest[14], digest[15]];
    	return s;
    }
    