So, I was reverse engineering some C# code I had developed in Visual Studio to create some UML class diagrams. When I ran it through StarUML, I got the following error:
Unrecoverable Parse Error 1Line 0Column
After poking around, it seems StarUML does not like the UTF-8 Byte Order Mark (BOM) that Visual Studio adds to the beginning of each file by default. The Byte Order Mark is the Unicode character U+FEFF, placed at the beginning of a Unicode text file to indicate the endianness (byte order) of its multi-byte code units. In UTF-8 it is encoded as the three-byte sequence 0xEF, 0xBB, 0xBF.
The byte order mark is not required for UTF-8, because UTF-8's code units are single bytes, so endianness is not an issue. However, for some reason Microsoft thinks it’s a good idea to put it there anyway. This mark must be removed from the file before StarUML will be able to parse it.
There are a few ways to remove it. In Visual Studio, choose Save As…, select “Save With Encoding…”, and then choose “UTF-8 without signature”. Once a file has been saved without the BOM, Visual Studio will not add it again. Unfortunately, there is no way to make this the default for all files in Visual Studio, so it has to be done by hand for every file, the first time it is saved. If you have access to a Linux machine or have Cygwin installed, you can batch modify existing files with this little script:
find . -name "*.cpp" -exec vim -c "set nobomb" -c wq! {} \;
Or, if you don’t have Cygwin but you do have vim, you can use this batch script:
for %%f in (*.cpp) do call vim -c "set nobomb" -c wq! %%f
Note that when running this from a batch script, I seem to need to hit [Return] each time vim exits, which isn’t the case with the Cygwin version.
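On a Linux machine or under Cygwin, the same cleanup can also be done without vim, and only on files that actually carry the mark. What follows is a minimal sketch, assuming a POSIX shell where `head -c` and `tail -c` are available (GNU coreutils and the BSD tools both provide them). The function names `has_bom`, `strip_bom` and `strip_bom_tree` are made up for this example. It compares the first three bytes of each file with 0xEF 0xBB 0xBF and leaves files without a BOM untouched.

```shell
#!/bin/sh
# Sketch only: assumes `head -c` and `tail -c` are available.
# Function names are hypothetical, chosen for this example.

# The UTF-8 BOM as raw bytes (octal escapes are portable in printf).
BOM=$(printf '\357\273\277')

# Succeeds if the file starts with the three BOM bytes.
has_bom() {
    [ "$(head -c 3 "$1")" = "$BOM" ]
}

# Drop the first three bytes. Writing back into the same file
# (instead of mv) keeps its permissions.
strip_bom() {
    tmp=$(mktemp) || return 1
    tail -c +4 "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Walk a directory tree and strip the BOM from every file whose name
# matches the given pattern, printing each file that was changed.
strip_bom_tree() {
    find "$1" -type f -name "$2" | while IFS= read -r f; do
        if has_bom "$f"; then
            strip_bom "$f" && echo "stripped: $f"
        fi
    done
}
```

For example, `strip_bom_tree . '*.cs'` would clean C# sources under the current directory. The pattern is only an illustration, so substitute whatever extension your files use.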
This problem, of course, is not limited to StarUML; the byte order mark can cause problems in many other programs as well. It just so happened that I was using StarUML when I discovered it. The root cause is Microsoft adding this unnecessary mark. The proper fix, however, would be for applications, including StarUML, to handle the mark, since the Unicode Standard allows it to be there even though it is neither required nor recommended for UTF-8.
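On the reading side, handling the mark can be as small as skipping it when it is the very first thing in the input. Here is a minimal sketch of that idea as a shell filter, assuming `head -c` and `tail -c` are available. `read_skipping_bom` is a hypothetical helper, not part of StarUML or any other tool.

```shell
#!/bin/sh
# Sketch of BOM-tolerant reading (hypothetical helper; assumes
# `head -c` and `tail -c` exist). Copies a file to stdout, dropping a
# leading UTF-8 BOM (0xEF 0xBB 0xBF) if present and leaving every
# other byte as it is. The file on disk is not modified.
read_skipping_bom() {
    if [ "$(head -c 3 "$1")" = "$(printf '\357\273\277')" ]; then
        tail -c +4 "$1"   # start at byte 4, i.e. just after the BOM
    else
        cat "$1"          # no BOM: pass the file through unchanged
    fi
}
```

A pipeline such as `read_skipping_bom Foo.cs | your_parser`, where `your_parser` stands in for whatever consumes the source, never sees the mark. Only a BOM at the start of the file is skipped, because that is the only position where it acts as an encoding signature.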