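Well, it might be a "reasonable" assumption, but it's an incorrect one. NSString's initWithContentsOfFile: does not detect UTF-8. Unless the file begins with a byte-order mark, it apparently treats the contents as plain 8-bit text: I'd have said NSASCIIStringEncoding, though Apple's documentation calls it the "default C string encoding".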
This, I'm embarrassed to say, bit me today. I had the following code that I was using to load a UTF-8 file with non-ASCII characters:
NSString *sourcePath = [[NSBundle mainBundle] pathForResource:@"act texts" ofType:@"txt"];
NSString *sourceData = [[NSString alloc] initWithContentsOfFile:sourcePath];
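But the resulting strings didn't contain the non-ASCII characters that were in the file. Wherever there should have been a diacritical or other non-ASCII character, the string contained two characters with the high bit set (byte values of 128 or more). That's a telltale sign that UTF-8 data is being loaded as ASCII, or more precisely as some 8-bit encoding, since ASCII itself stops at 127. UTF-8 stores each of those characters as two or more bytes, and the 8-bit decoder turns every byte into a separate character. (UTF-16 looks even weirder when the encoding is missed, which makes it easier to catch.)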
The solution is simple enough: explicitly tell it which encoding to use by calling initWithContentsOfFile:encoding:error: instead:
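NSString *sourcePath = [[NSBundle mainBundle] pathForResource:@"act texts" ofType:@"txt"];
NSError *error = nil;  // set if the file can't be read or isn't valid UTF-8
NSString *sourceData = [[NSString alloc] initWithContentsOfFile:sourcePath encoding:NSUTF8StringEncoding error:&error];
if (sourceData == nil) NSLog(@"Couldn't read %@ as UTF-8: %@", sourcePath, error);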