Rich text (development): Difference between revisions

Revision as of 15:52, 24 October 2007

Using HTML versus XHTML in rich text nodes

What follows is a preliminary analysis of the issues concerning HTML versus XHTML used for rich text in FreeMind.

AFAIK there are two separate questions: (1) should we store (a) HTML or (b) XHTML in nodes, and (2) should we (a) store only one thing, or (b) store plain text in one attribute and HTML/XHTML in another attribute where HTML is available, modelled on email systems?

As for the first question, using HTML has the advantage of being straightforward: it is supported by JLabel, it is supported by the Java HTML editing component, and it is the format mostly used in web pages today. The advantage of XHTML is that it is a flavor of XML and thus easily amenable to XSLT processing.

As for the second question, if we also stored plain text, it would automatically be available to all XSLT processing, which would make the first question less decisive. However, it would considerably increase the size of mind maps stored on disk: by my estimate, by a factor of about 1.5 once a lot of rich text is used.

Converting HTML to XHTML on the fly before XSLT processing is started from FreeMind would not really save the day, as XHTML would still need to be stored in mind maps on disk. If we used HTML internally, we would have to convert XHTML to HTML for all nodes upon loading a map, instead of doing so only when nodes are shown, for the purpose of JLabel.

If you find the other options more attractive, an important question is: is it really possible to process XHTML stored within an XML attribute? Can someone demonstrate that? If so, XHTML would be rather attractive. If not, using HTML would be as good as using XHTML.
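Whether this can be done in XSLT itself is exactly the open question. As a point of comparison, here is a minimal Java sketch of the two-step processing involved, using the standard javax.xml DOM API: the first parse turns the escaped TEXT attribute back into markup, and a second parse reads that markup as XML. The class and method names are hypothetical, and the second step only works if the stored markup is well-formed XHTML; HTML such as an unclosed <br> makes it fail.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class AttributeXhtmlDemo {

    // Parses a string of XML into a DOM document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Counts the <br> elements in the markup stored in the TEXT attribute
    // of the first <node> of a map.
    public static int countBreaksInFirstNode(String mapXml) throws Exception {
        Element node = (Element) parse(mapXml).getElementsByTagName("node").item(0);
        // The first parse has already turned &lt; and &gt; back into < and >.
        String markup = node.getAttribute("TEXT");
        // The second parse succeeds only if that markup is well-formed XML.
        return parse(markup).getElementsByTagName("br").getLength();
    }

    public static void main(String[] args) throws Exception {
        String xhtmlMap = "<map><node TEXT=\"&lt;html&gt;&lt;body&gt;Hello &lt;br/&gt; Dolly."
                + "&lt;/body&gt;&lt;/html&gt;\"/></map>";
        System.out.println(countBreaksInFirstNode(xhtmlMap)); // prints 1

        // Plain HTML with an unclosed <br> is rejected by the second parse.
        String htmlMap = "<map><node TEXT=\"&lt;html&gt;Hello &lt;br&gt; Dolly.\"/></map>";
        try {
            countBreaksInFirstNode(htmlMap);
        } catch (SAXException e) {
            System.out.println("Not well-formed XML: " + e.getMessage());
        }
    }
}
```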

Converting HTML from node attribute TEXT to plain text within XSLT

Converting HTML from the node attribute TEXT to plain text within XSLT is virtually impossible. XSLT does not feature regular-expression replacements, and it does not even feature simple text replacements (though these can be sort of implemented within XSLT). My old view on this option follows.

I think the most straightforward way of solving the related problems is to find out how to convert HTML into plain text within an XSLT script. I currently do not have sufficient knowledge of XSLT to judge this; it should be possible using several regular-expression replacements, modelled on what we already have in FreeMind in Java. Once we were able to do that, there might even be a regular-expression way to convert boldface and italics to the OpenDocument format of OpenOffice. Admittedly, instead of having one conversion routine from HTML to plain text in FreeMind, it would have to be replicated in every XSLT script dealing with FreeMind mind maps.
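For comparison, a minimal sketch of such a chain of regular-expression replacements on the Java side. The class and method names are hypothetical, and the choice of which tags become line breaks and which entities are decoded is an assumption; it targets simple markup and is not a general HTML parser. Every XSLT script would need to reproduce the same steps.

```java
public class HtmlToPlainText {

    // Converts a simple HTML fragment to plain text using a chain of
    // regular-expression replacements. Not a general HTML parser: it assumes
    // no '>' inside attribute values and decodes only a few entities.
    public static String htmlToPlain(String html) {
        String text = html;
        // <br> and the end of block elements become line breaks.
        text = text.replaceAll("(?i)<br\\s*/?>", "\n");
        text = text.replaceAll("(?i)</(p|div|li|tr|h[1-6])\\s*>", "\n");
        // Drop all remaining tags, e.g. <b>, </b>, <html>.
        text = text.replaceAll("<[^>]*>", "");
        // Decode common entities; &amp; last so that "&amp;lt;" becomes "&lt;", not "<".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of spaces and tabs, then trim each line and the whole text.
        text = text.replaceAll("[ \\t]+", " ");
        text = text.replaceAll("(?m)^ | $", "");
        return text.trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>Dolly</b>.</p><p>Bye &amp; see you</p></body></html>";
        System.out.println(htmlToPlain(html));
        // prints:
        // Hello Dolly.
        // Bye & see you
    }
}
```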

Storing XHTML on par with FreeMind XML

The option of storing XHTML elements on par with FreeMind XML elements like <node> would require considerable effort. The benefits of the effort would include:

+ better support for XSLT transformations
+ more readable XML of FreeMind mind maps

The costs would include:

- switching away from NanoXML/Lite to a more bloated technology for reading and saving mind maps, which would considerably slow down loading and saving. (That is not true; I was wrong. We would only have to adjust NanoXML so that it stops parsing XML within certain elements and instead reads everything within them as an uninterpreted string. --Danielpolansky 06:34, 13 May 2006 (PDT))
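The adjustment described in the note could look roughly like the following standalone sketch, which returns everything between an element's start tag and its matching end tag as an uninterpreted string. It does not use NanoXML's actual API; the class and method names are hypothetical, and it assumes lowercase tag names, no '>' inside the start tag's attribute values, and no self-closing form of the same element inside the content.

```java
public class RawElementReader {

    // Returns the text between the start tag of the first <tag> element found
    // at or after 'from' and its matching end tag, without parsing it, or null
    // if there is no such element. Nested elements with the same name are
    // counted so the matching end tag is found. Assumes lowercase tag names,
    // no '>' inside the start tag's attribute values, and no self-closing
    // <tag .../> or comments/CDATA mentioning the tag inside the content.
    public static String rawContent(String xml, String tag, int from) {
        String open = "<" + tag;
        String close = "</" + tag + ">";
        int start = indexOfStartTag(xml, open, from);
        if (start < 0) {
            return null;
        }
        int gt = xml.indexOf('>', start);
        if (gt < 0) {
            return null; // start tag is never closed
        }
        int contentStart = gt + 1;
        int depth = 1;
        int pos = contentStart;
        while (depth > 0) {
            int nextOpen = indexOfStartTag(xml, open, pos);
            int nextClose = xml.indexOf(close, pos);
            if (nextClose < 0) {
                return null; // no matching end tag
            }
            if (nextOpen >= 0 && nextOpen < nextClose) {
                depth++;
                pos = nextOpen + open.length();
            } else {
                depth--;
                pos = nextClose + close.length();
                if (depth == 0) {
                    return xml.substring(contentStart, nextClose);
                }
            }
        }
        return null;
    }

    // Finds "<tag" followed by '>' or whitespace, so that "<b" does not match "<body".
    private static int indexOfStartTag(String xml, String open, int from) {
        for (int i = xml.indexOf(open, from); i >= 0; i = xml.indexOf(open, i + 1)) {
            int after = i + open.length();
            if (after < xml.length()) {
                char c = xml.charAt(after);
                if (c == '>' || Character.isWhitespace(c)) {
                    return i;
                }
            }
        }
        return -1;
    }
}
```

For example, rawContent("<map><node><html><body>Hello <br/> Dolly.</body></html></node></map>", "html", 0) returns the string <body>Hello <br/> Dolly.</body>, which the loader could then keep as the node's text.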

By storing on par, I mean the following.

  <map>
    <node>
      <html>
        <body>Hello <br/> Dolly.</body>
      </html>
    </node>
  </map>

I recommend avoiding this option. Our current option is

  <map>
    <node TEXT="&lt;html&gt;Hello &lt;br&gt; Dolly."/>
  </map>

The discussion has shown the following possibilities for storing on par:

Storing HTML directly, like <node><html></html></node>
Storing HTML within a content element, like <node><content><html></html></content></node>, and
Storing plain text within the TEXT attribute of the node element
Storing plain text as <node><text>Plain</text></node>
Storing plain text as <node><content>Plain</content></node>

Converting HTML to XHTML in Java

I find converting HTML to XHTML possible with reasonable effort. I think there are not many more subtleties apart from those already addressed: closing tags such as <td>, <tr> and the like, and writing empty elements as self-closing tags such as <br/>, <hr/>, <img/> and several others. We would be able to discover all the subtleties by reading the XHTML standard and by empirical testing. (For reference, see html2xhtml at shredzone.net, thanks to Dimitri.)

However, W3C points out that converting HTML to XHTML is a tricky business.

The main problem of developing your own converter is that either you are sure your HTML is correct (and so you only need to fix cases, quotes in attributes, entities and close the few HTML empty tags) or you will go crazy trying to cope with all the possible errors that the "official" web browsers accept but that would kill any simple parser.
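In the first case, where the HTML is known to be correct, the remaining work is mechanical. A minimal sketch under that assumption (properly nested input, all non-empty elements explicitly closed): it lowercases element and attribute names, quotes attribute values, expands minimized attributes such as checked, closes the HTML 4.01 empty elements, and replaces &nbsp; with its numeric form. Other named entities (for example &eacute;) would need a similar mapping, and a '>' inside an attribute value is not handled. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlToXhtml {

    // HTML 4.01 empty elements; XHTML requires them to be written as <x />.
    private static final Set<String> EMPTY_ELEMENTS = new HashSet<>(Arrays.asList(
            "area", "base", "basefont", "br", "col", "frame", "hr", "img",
            "input", "isindex", "link", "meta", "param"));

    // A start or end tag: group 1 is "/" for end tags, group 2 the element
    // name, group 3 everything up to the closing '>'.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>");

    // One attribute: group 1 is the name, group 2 the value (double-quoted,
    // single-quoted or unquoted), or null for a minimized attribute like "checked".
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "([A-Za-z_:][-A-Za-z0-9_:.]*)(?:\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s\"'>]+))?");

    // Converts HTML that is assumed to be otherwise correct into XHTML.
    public static String convert(String html) {
        Matcher tag = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            boolean isEndTag = !tag.group(1).isEmpty();
            String name = tag.group(2).toLowerCase(Locale.ROOT);
            StringBuilder rebuilt = new StringBuilder("<").append(tag.group(1)).append(name);
            Matcher attr = ATTRIBUTE.matcher(tag.group(3));
            while (attr.find()) {
                String attrName = attr.group(1).toLowerCase(Locale.ROOT);
                String value = attr.group(2);
                if (value == null) {
                    value = "\"" + attrName + "\"";   // checked -> checked="checked"
                } else if (!value.startsWith("\"") && !value.startsWith("'")) {
                    value = "\"" + value + "\"";      // width=100 -> width="100"
                }
                rebuilt.append(' ').append(attrName).append('=').append(value);
            }
            boolean selfClosed = tag.group(3).trim().endsWith("/");
            if (!isEndTag && (selfClosed || EMPTY_ELEMENTS.contains(name))) {
                rebuilt.append(" /");
            }
            rebuilt.append('>');
            tag.appendReplacement(out, Matcher.quoteReplacement(rebuilt.toString()));
        }
        tag.appendTail(out);
        // &nbsp; is not one of XML's predefined entities; use the numeric form.
        return out.toString().replace("&nbsp;", "&#160;");
    }
}
```

For example, <P ALIGN=center>Hello<BR>Dolly&nbsp;<IMG SRC="a.png"></P> becomes <p align="center">Hello<br />Dolly&#160;<img src="a.png" /></p>. Anything outside this narrow set of fixes is the "go crazy" case the quote warns about.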

Converting XHTML to HTML in Java

Searching the web using the expressions

"XHTML to HTML" java
XHTML2HTML java GPL
XHTML2HTML java

I have found very little existing GNU GPL licenced code for converting XHTML to HTML in Java. Thus, my recommendation is to create a new method for that in the class Tools. The method would be created using [1] as a checklist. It would use regular-expression replacements, unless we see that this is too slow, which I do not think will be the case. (We already use regular-expression replacements in the method update() of NodeView.)

A preliminary version of the code is as follows.

   public static String xhtmlToHtml(String xhtmlText) {
      // Remove the '/' from <.../> for elements written without it in HTML,
      // e.g. <br/> becomes <br> and <img src="a.png" /> becomes <img src="a.png" >.
      return xhtmlText.
         replaceAll("<(("+
                    "br|area|base|basefont|"+
                    "bgsound|button|col|colgroup|embed|hr"+
                    "|img|input|isindex|keygen|link|meta"+
                    "|object|plaintext|spacer|wbr"+
                    ")(\\s[^>]*)?)/>",
                    "<$1>");
   }

Requirements on file format of FreeMind

We have identified the following requirements on the FreeMind file format. They have different priorities, and people may disagree about what those priorities are.

guarantee file format integrity, i.e. XML conformance.
be flexible for new features like (X)HTML, SVG, MathML, ...
stay reasonably backwards compatible, so that all existing generators of FreeMind mind maps work with later versions of FreeMind too. Put differently, create the new format only as an extension of the old format, that is, only by adding new elements and attributes.
the new format should allow import of all features of the old format (from stable versions).
limit redundancy of information.
keep it easy to do XSLT transformations.
keep the format as simple as possible.
make it easy to create and edit mm files manually in an editor like Vim, Emacs or Notepad.
make it easy to create mm files programmatically.
the solution should be fast.
the solution should be safe.
the file format of both notes and nodes should be XHTML (a further specification of the first requirement)

For Danielpolansky, the requirement of staying reasonably backwards compatible is important, and so is the requirement that the solution be fast; the requirement on XSLT transformations is completely unimportant, given the possibility of replacing XSLT transformations with small Java functions doing the same with a smaller footprint; the requirement of keeping the format simple is important; the requirement of making it easy to create mind maps programmatically is important.

Sources of HTML coming to mind maps

Rich text in the form of HTML will be coming into FreeMind from the following sources:

directly entered in FreeMind using WYSIWYG editor
pasted from web pages
pasted from Microsoft Word documents, and other applications exposing HTML to the clipboard

In my experience, pasting is much more common, and of higher volume, than direct editing.

Improved HTML editing

A long FreeMind node may contain HTML. However, it needs to be edited in its source form. We can improve on that by

providing a WYSIWYG HTML editor embedded in FreeMind, like the Java-based eKit (project page, LGPL licence).

enabling the use of an external WYSIWYG editor for editing HTML, like Microsoft FrontPage. This editor would be opened automatically upon clicking an HTML node, displaying it in a WYSIWYG way. It is not clear how to get the changed node back into FreeMind. One option is to generate a temporary file and pass it to the external editor when calling it. However, how does the external editor tell FreeMind that editing has ended? Furthermore, should such editing be modal? How do we ensure such modality without getting locked in it when the external editor crashes?

There is already some work done on integration of the WYSIWYG HTML editor Kafenio into FreeMind. --Danielpolansky 10:51, 6 Mar 2005 (PST)

Open-source WYSIWYG HTML editors in Java

The following open-source WYSIWYG HTML editors are written in Java:

SimplyHTML — GPL licenced; currently used by FreeMind
Kafenio — LGPL licenced, developed as a fork of eKit
eKit — LGPL licenced

The slowness of rendering of HTML nodes

Rendering quite long HTML nodes is slow. If you have an HTML page corresponding to ten paper pages, rendering the node upon unfolding takes several seconds. The related code is in the method update of the class NodeView. What takes so long is the statement

          setText(nodeText);

in the section

       if (nodeText.startsWith("<html>")) {
          // Make it possible to use relative img references in HTML using tag <base>.
          if (nodeText.indexOf("<img") >= 0 && nodeText.indexOf("<base ") < 0) {
             try {
                nodeText = "<html><base href=\"" +
                   map.getModel().getURL() + "\">" + nodeText.substring(6);
             } catch (MalformedURLException e) {
                // Leave nodeText unchanged if the map's URL cannot be determined.
             }
          }
          setText(nodeText);
          // ...
       }

This result does not give us much hope of improving the speed easily, as the statement just tells Java's JLabel to render the page. A solution would be to find a different HTML rendering component for Java. We could also wait until Sun's Java virtual machine improves the speed of JLabel's HTML rendering.

-- What about pre-loading nodes which are likely to be expanded, using threading? Keep some relatively small cache of nodes in proximity to the last expanded node, and swap in the expanded node when the unexpanded one is clicked. No?
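The cache suggested in this comment could be sketched as a small least-recently-used map built on java.util.LinkedHashMap in access order. The class name, the key type (for example a node) and the capacity are hypothetical. One caveat: Swing components such as JLabel are meant to be created and modified on the event dispatch thread, so whatever a background pre-loading thread puts into the cache would have to be work that is safe to do off that thread; the sketch leaves that choice open.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache for expensive per-node rendering results.
// Keeps the most recently used entries and evicts the least recently used
// one once capacity is exceeded. Methods are synchronized so that a
// background pre-loading thread and the UI thread can share one instance.
public class RenderCache<K, V> {
    private final Map<K, V> entries;

    public RenderCache(final int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached value for key, computing and storing it on a miss.
    // A pre-loading thread can call this for nodes near the last expanded one.
    public synchronized V get(K key, Function<K, V> render) {
        V value = entries.get(key);
        if (value == null) {
            value = render.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    // Checks for an entry without changing its recency.
    public synchronized boolean contains(K key) {
        return entries.containsKey(key);
    }
}
```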
