Character encoding
Latest revision as of 16:56, 6 June 2023
FreeMind stores Unicode characters outside printable ASCII as XML character entities in mind map files.
Reading of UTF-8 encoded mind map files is supported, with or without a UTF-8 byte order mark (BOM).
Reading of UTF-32BE, UTF-32LE, UTF-16BE and UTF-16LE seems to be supported as well, provided the mind map file starts with the corresponding BOM.
Writing of UTF-8 is probably not supported; there was a feature request for this.
FreeMind does not write any BOM natively; given that it writes XML character entities, so that the output stays within ASCII, a BOM would serve no purpose.
XML declaration
FreeMind does not write an XML declaration; the file starts directly with the map element. An XML declaration would look like one of the following:
- <?xml version = "1.0"?>
- <?xml version = "1.0" encoding = "UTF-8"?>
- <?xml version = "1.0" encoding = "iso-8859-1"?>
FreeMind does not read an XML declaration either; the mind map has to start with the map element.
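Since the mind map has to start with the map element, a file produced by another tool may need its XML declaration removed before FreeMind opens it. The following is a minimal sketch of such a preprocessing step; the class XmlDeclStripper and its method are hypothetical and not part of FreeMind.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of FreeMind: drops a leading XML declaration
// so that the text starts directly with the map element.
class XmlDeclStripper {
    // Matches optional whitespace, "<?xml" followed by whitespace, everything
    // up to the first "?>", and any whitespace after it, at the very start only.
    private static final Pattern DECLARATION =
            Pattern.compile("\\A\\s*<\\?xml\\s.*?\\?>\\s*", Pattern.DOTALL);

    static String strip(String xml) {
        return DECLARATION.matcher(xml).replaceFirst("");
    }
}
```

Text without a leading declaration is returned unchanged, and processing instructions such as <?xml-stylesheet ...?> are not touched because the pattern requires whitespace right after "<?xml".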
Implementation
Reading:
- Class FileReaderCreator in MindMapMapModel uses UTF-8 as the character encoding: 'return new UnicodeReader(new FileInputStream(mFile), "UTF-8");'
- FileReaderCreator is instantiated in the MindMapMapModel.loadTree(final File) method, which is called from MindMapMapModel.load(File file).
- Class UnicodeReader determines the encoding from the byte order mark (BOM), if any; it seems to fall back to the passed-in "UTF-8" when there is no BOM.
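The BOM-based detection described above can be sketched as follows. This is not FreeMind's UnicodeReader; the class BomSniffer and its methods are hypothetical, and the sketch only assumes the standard BOM byte sequences and the fallback to a passed-in default encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical illustration of BOM sniffing with a default encoding fallback.
class BomSniffer {
    /** Opens a Reader whose charset comes from a leading BOM, else from defaultEncoding. */
    static Reader open(InputStream in, String defaultEncoding) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 4);
        byte[] b = new byte[4];
        int n = 0;
        while (n < 4) {                       // read up to 4 bytes; input may be shorter
            int r = pin.read(b, n, 4 - n);
            if (r < 0) break;
            n += r;
        }
        String enc = defaultEncoding;
        int bomLen = 0;
        // The UTF-32 checks come first: the UTF-32LE BOM begins with the UTF-16LE BOM.
        if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == (byte) 0xFE && b[3] == (byte) 0xFF) {
            enc = "UTF-32BE"; bomLen = 4;
        } else if (n >= 4 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE && b[2] == 0x00 && b[3] == 0x00) {
            enc = "UTF-32LE"; bomLen = 4;
        } else if (n >= 3 && b[0] == (byte) 0xEF && b[1] == (byte) 0xBB && b[2] == (byte) 0xBF) {
            enc = "UTF-8"; bomLen = 3;
        } else if (n >= 2 && b[0] == (byte) 0xFE && b[1] == (byte) 0xFF) {
            enc = "UTF-16BE"; bomLen = 2;
        } else if (n >= 2 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE) {
            enc = "UTF-16LE"; bomLen = 2;
        }
        if (n > bomLen) {
            pin.unread(b, bomLen, n - bomLen); // push back everything that is not BOM
        }
        return new InputStreamReader(pin, enc);
    }

    /** Convenience wrapper: decodes a whole byte array to a String. */
    static String decode(byte[] data, String defaultEncoding) {
        try (Reader r = open(new ByteArrayInputStream(data), defaultEncoding)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this approach a UTF-16 or UTF-32 file without a BOM would be decoded with the default encoding, which matches the observation that those encodings seem to be supported only when a BOM is present.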
Writing:
- XMLElement.writeEncoded() writes every character whose code point is below 32 or above 126 as an XML character entity.
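The encoding rule above can be illustrated with a minimal sketch. It is not FreeMind's XMLElement.writeEncoded(); the class EntityEncoder is hypothetical, and both the hexadecimal &#x...; form and the escaping of markup characters are assumptions made for the illustration.

```java
// Hypothetical illustration of the "below 32 or above 126" rule.
class EntityEncoder {
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);       // iterate by code point, not by char
            i += Character.charCount(cp);
            switch (cp) {
                // Assumption: markup characters are escaped by name.
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp < 32 || cp > 126) {
                        // Assumption: hexadecimal numeric character reference.
                        out.append("&#x").append(Integer.toHexString(cp)).append(';');
                    } else {
                        out.append((char) cp);
                    }
            }
        }
        return out.toString();
    }
}
```

For example, EntityEncoder.encode("caf\u00e9") returns caf&#xe9;, so the output contains only printable ASCII regardless of the script of the input.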
Links:
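- MindMapMapModel.java, sourceforge.net: https://sourceforge.net/p/freemind/code/ci/master/tree/freemind/freemind/modes/mindmapmode/MindMapMapModel.java
- UnicodeReader.java, sourceforge.net: https://sourceforge.net/p/freemind/code/ci/ae2e0364a92e71de2f85d3ba1ae2129a5736985d/tree/freemind/freemind/common/UnicodeReader.java
- XMLElement.java, sourceforge.net: https://sourceforge.net/p/freemind/code/ci/master/tree/freemind/freemind/main/XMLElement.java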
Tracker items
- #860 RC4 regression: incorrect viewing of UTF-8 map, bug, 2010-03-10, sourceforge.net
- #998 Accents in RC11 and RC12, bug, 2010-12-05
- #882 Why not UTF-8 file?, FR, 2015-12-21, sourceforge.net
- #827 saving .mm files in UTF-8, FR, 2012-03-08, sourceforge.net
- #67 Utf-8 aware, and better Chinese characters, patches, 2006-04-07, sourceforge.net -- points to User:Jiangxin/Better chinese characters support
Limitations
- No support for an XML declaration at the top of the mind map file; it is neither written nor read.
- No way to declare, inside the file, which of a variety of encodings to use upon reading, e.g. through an encoding attribute of the map element or through the XML declaration if it were supported. This limitation seems very minor, given that one can use a conversion tool to convert from any encoding to UTF-8, or generate UTF-8 directly.
- No writing in UTF-8, as per above. This is a major limitation: one of the attractions of the XML format is that it is a plain text format that can be viewed in a plain text editor, but for non-Latin scripts, all characters end up as human-illegible character entities.
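The conversion to UTF-8 mentioned in the second item can be done with any conversion tool; as a minimal sketch in Java, assuming the source encoding is known in advance, it could look like the following. The class ToUtf8 and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: re-encodes text from a known charset to UTF-8.
class ToUtf8 {
    /** Decodes the bytes using the given charset and re-encodes them as UTF-8. */
    static byte[] transcode(byte[] input, Charset from) {
        return new String(input, from).getBytes(StandardCharsets.UTF_8);
    }

    /** Reads the source file in the given charset and writes the target file in UTF-8. */
    static void convertFile(Path source, Charset from, Path target) throws IOException {
        Files.write(target, transcode(Files.readAllBytes(source), from));
    }
}
```

The sketch loads the whole file into memory, which should be acceptable for typical mind map sizes; it does not add a BOM, which is fine because UTF-8 without a BOM is read as well.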