Possible hyperlink bug

Posted: Wed Mar 25, 2020 6:42 pm
by jgkoehn
Greetings Sergey,
A coworker found this.
Please load the attached RTF into a compiled demo using the latest RVF (18.3, I believe).
Now save it without changing anything and look at the RTF code. The hyperlink code appears to be doubled for the Unicode links.
Now load that saved file and save it again. The hyperlink code appears to double again for the Unicode links.
Attachment: Testing popups for lemma Greek.rtf (49.97 KiB)

Re: Possible hyperlink bug

Posted: Sat Mar 28, 2020 8:42 am
by Sergey Tkachenko
Sorry, I cannot reproduce the problem. At least, if I save RTF, open it, and save RTF again, these two new RTF files are identical.
Please tell me what exactly is wrong.

PS: there is one effect which, I believe, is undesired: the path to the RTF file is added to your custom hyperlinks, because the component thinks that these links are local.
There are two ways to avoid it:
1) Assign RichView.RTFReadProperties.BasePathLinks := False
2) Or assign your own function to the RVIsCustomURL variable from the RVFileFuncs unit.
It is defined as

Code: Select all

type
  TCustomRVIsURLFunction = function(const Word: TRVUnicodeString): Boolean;

const
  RVIsCustomURL: TCustomRVIsURLFunction = nil;
Assign a function that returns True for strings starting with 'tw://'.

Re: Possible hyperlink bug

Posted: Sat Mar 28, 2020 1:18 pm
by jgkoehn
Greetings Sergey. Thanks for the tip on the local link. I had turned that off.

When I load it in RVF 18.3, in a demo or in the component, save it to RTF, and then look at the code, the RTF for the Unicode link is doubled. It looks fine in the RVF viewer, but the underlying code in the RTF itself has the problem.

Re: Possible hyperlink bug

Posted: Sat Mar 28, 2020 1:26 pm
by Sergey Tkachenko
What do you mean by "doubled"?
Unicode characters may be duplicated by ANSI characters. This is an optional feature; it can be turned off by excluding rvrtfDuplicateUnicode from RichView.RTFOptions.
But in any case, RTF must be correct. RTF readers that understand Unicode ignore duplicate ANSI characters, RTF readers that do not understand Unicode ignore Unicode characters.

Re: Possible hyperlink bug

Posted: Sat Mar 28, 2020 1:29 pm
by jgkoehn
I will try to send a screenshot. It doubles then doubles again for each load and save. Thank you for all you do.

Re: Possible hyperlink bug

Posted: Sat Mar 28, 2020 4:38 pm
by jgkoehn
Ah, you are correct. No bug after multiple tests. I think it is the rvrtfDuplicateUnicode behavior you described, which is correct.
Thank you for working through this with us.

Re: Possible hyperlink bug

Posted: Sat Mar 28, 2020 5:51 pm
by jgkoehn
Ah, I see I misunderstood my co-worker on this one.
Here is the actual situation.
Note these two lines.
Hyperlink from MS Word (edit: after conversion by RVF):

Code: Select all

{\field{\*\fldinst HYPERLINK "tw://[strong]?t=\uc1\u7936 ?\u956 \'b5\u8053 ?\u957 ?\uc0"}{\fldrslt \plain \f6\ul\fs20\cf1 \u7936 }}
Hyperlink we make in our program:

Code: Select all

{\field{\*\fldinst HYPERLINK "tw://[strong]?\uc1\u225 \'e1\u188 \'bc\u8364 \'80\u206 \'ce\u188 \'bc\u225 \'e1\u189 \'bd\u181 \'b5\u206 \'ce\u189 \'bd\uc0"}{\fldrslt \plain \f6\fs20\cf1 \u7936 }}
For some reason, only part of the Unicode in the MS Word link is coming through, and we are not sure why. Please note that this Unicode is polytonic Greek. It is as if the ANSI is not getting fully converted. Is there a setting we have missed?

Re: Possible hyperlink bug

Posted: Sat Mar 28, 2020 7:47 pm
by Sergey Tkachenko
It's exactly what I described in my previous reply: for each Unicode character, its non-Unicode alternative is written. These non-Unicode characters are ignored by RTF readers that support Unicode in RTF (i.e. all modern rich text editors).
There is no cumulative duplication, just one Unicode character (\uNNN) followed by one non-Unicode character.
You can see, both TRichView and MS Word write these alternative non-Unicode characters ('?' in MS Word's RTF are these alternative characters as well).

If you exclude rvrtfDuplicateUnicode from RTFOptions, these non-Unicode alternative characters will not be written by TRichView. They are not necessary.
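
To make the "one Unicode character followed by one fallback character" rule concrete, here is a minimal Python sketch of how a Unicode-aware RTF reader might decode such a fragment. It is not TRichView code: `decode_rtf_unicode` and its `ansi_codepage` parameter are hypothetical names, the cp1252 default is an assumption, and only `\ucN`, `\uN` and `\'hh` are handled.

```python
import re

# Tokens: \ucN, \uN (each may be followed by one delimiter space),
# \'hh (one ANSI byte in hex), or any other single character.
TOKEN = re.compile(r"\\uc(\d+) ?|\\u(-?\d+) ?|\\'([0-9a-fA-F]{2})|(.)", re.S)

def decode_rtf_unicode(fragment: str, ansi_codepage: str = "cp1252") -> str:
    """Decode a small RTF text fragment, dropping the ANSI fallbacks after \\uN."""
    out = []
    uc = 1        # RTF default: one fallback character follows each \uN
    to_skip = 0   # fallback characters still to be ignored
    for match in TOKEN.finditer(fragment):
        uc_arg, u_arg, hex_arg, literal = match.groups()
        if uc_arg is not None:
            uc = int(uc_arg)
        elif u_arg is not None:
            code = int(u_arg)
            if code < 0:              # \uN is a signed 16-bit value
                code += 65536
            out.append(chr(code))
            to_skip = uc
        elif to_skip > 0:
            to_skip -= 1              # fallback character: ignored
        elif hex_arg is not None:
            out.append(bytes([int(hex_arg, 16)]).decode(ansi_codepage))
        else:
            out.append(literal)
    return "".join(out)

# Link text as saved by TRichView in the example above.
saved = r"\uc1\u7936 ?\u956 \'b5\u8053 ?\u957 ?\uc0"
print([hex(ord(ch)) for ch in decode_rtf_unicode(saved)])
# ['0x1f00', '0x3bc', '0x1f75', '0x3bd']
```

Each `?` or `\'hh` after a `\uN` is consumed as a fallback, so the fragment decodes to four Greek characters with nothing duplicated. The sketch does not combine surrogate pairs or handle nested groups.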

Re: Possible hyperlink bug

Posted: Sat Mar 28, 2020 9:23 pm
by jgkoehn
Greetings Sergey,
I think I understand. So when RVF reads an MS Word RTF, it brings part of the code in as '?'.
Here is the MS Word RTF for that same code.

Code: Select all

{\field\fldedit{\*\fldinst {\rtlch\fcs1 \af38 \ltrch\fcs0 \lang2057\langfe1041\langnp2057\insrsid2912444 \hich\af38\dbch\af11\loch\f38  
\hich\af38\dbch\af11\loch\f38 HYPERLINK "tw://[strong]?t=}{\rtlch\fcs1 \af38 \ltrch\fcs0 \lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af38\dbch\af11\hich\f38 \u7936\'3f}{\rtlch\fcs1 \af428 \ltrch\fcs0 
\f428\lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af428\dbch\af11\hich\f428 \'ec}{\rtlch\fcs1 \af38 \ltrch\fcs0 \lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af38\dbch\af11\hich\f38 \u8053\'3f}{\rtlch\fcs1 \af428 \ltrch\fcs0 
\f428\lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af428\dbch\af11\hich\f428 \'ed\loch\f428 "}{\rtlch\fcs1 \af38 \ltrch\fcs0 \lang2057\langfe1041\langnp2057\insrsid2912444 \hich\af38\dbch\af11\loch\f38  }{\rtlch\fcs1 \af38 \ltrch\fcs0 
\lang2057\langfe1041\langnp2057\insrsid2912444 {\*\datafield 
00d0c9ea79f9bace118c8200aa004ba90b0200000003000000e0c9ea79f9bace118c8200aa004ba90b42000000740077003a002f002f005b007300740072006f006e0067005d003f0074003d00001fbc03751fbd030000795881f43b1d7f48af2c825dc485276300000000a5ab0003}}}{\fldrslt {\rtlch\fcs1 \af38 
\ltrch\fcs0 \cs53\ul\cf24\lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af38\dbch\af11\hich\f38 \u7936\'3f}{\rtlch\fcs1 \af428 \ltrch\fcs0 \cs53\f428\ul\cf24\lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af428\dbch\af11\hich\f428 \'ec}{
\rtlch\fcs1 \af38 \ltrch\fcs0 \cs53\ul\cf24\lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af38\dbch\af11\hich\f38 \u8053\'3f}{\rtlch\fcs1 \af428 \ltrch\fcs0 \cs53\f428\ul\cf24\lang2057\langfe1041\langnp2057\insrsid2912444 
\loch\af428\dbch\af11\hich\f428 \'ed}}}

Re: Possible hyperlink bug

Posted: Sat Mar 28, 2020 9:44 pm
by jgkoehn
Hmm,
This does seem to load correctly in other editors, so I'm wondering if the other app we are working with needs to be changed.

Re: Possible hyperlink bug

Posted: Sat Mar 28, 2020 9:47 pm
by jgkoehn
Sorry to take so much of your time, Sergey.
I am confused as to why both of these show the same Greek characters in a Unicode-enabled RTF viewer, yet the second one has quite a few more characters. Is this the doubling you mentioned?
I recognize the RTF Unicode here:
\u7936 \u956 \u8053 \u957 <This line has far fewer characters. (Is this just a different Unicode format?)
\u225 \u188 \u8364 \u206 \u188 \u225 \u189 \u181 \u206 \u189 <This line has many more. (Is this just a different Unicode format?)

Code: Select all

{\field{\*\fldinst HYPERLINK "tw://[strong]?t=\uc1\u7936 ?\u956 \'b5\u8053 ?\u957 ?\uc0"}{\fldrslt \plain \f6\ul\fs20\cf1 \u7936 }}

Code: Select all

{\field{\*\fldinst HYPERLINK "tw://[strong]?\uc1\u225 \'e1\u188 \'bc\u8364 \'80\u206 \'ce\u188 \'bc\u225 \'e1\u189 \'bd\u181 \'b5\u206 \'ce\u189 \'bd\uc0"}{\fldrslt \plain \f6\fs20\cf1 \u7936 }}
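
One possible reading of the two code-point lists above, offered as an observation that fits the numbers and not as something confirmed in this thread: the longer list is what you get when the UTF-8 bytes of the same four Greek characters are interpreted as Windows-1252 characters, one character per byte. The Python sketch below checks this; the variable and function names are illustrative only.

```python
# The four code points from the MS Word link: U+1F00, U+03BC, U+1F75, U+03BD.
word_link = [7936, 956, 8053, 957]

# The ten code points from the link made in the other program.
program_link = [225, 188, 8364, 206, 188, 225, 189, 181, 206, 189]

def utf8_read_as_cp1252(code_points):
    """Encode the characters as UTF-8, then misread each byte as Windows-1252."""
    text = "".join(chr(cp) for cp in code_points)
    misread = text.encode("utf-8").decode("cp1252")
    return [ord(ch) for ch in misread]

print(utf8_read_as_cp1252(word_link) == program_link)  # True
```

If that reading holds, the second link is not a different Unicode format. Its target text was probably UTF-8 at some point and was converted to Unicode as if it were ANSI before being written to RTF.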

Re: Possible hyperlink bug

Posted: Sat Mar 28, 2020 11:18 pm
by jgkoehn
Also, some additional info from a co-worker:
You need to compare the files in a text editor to see the code. Is it normal to create three identical HYPERLINK fields for each Greek word, while G281 has only one HYPERLINK?
(Edit: fixed image)
msword-.jpg (243.99 KiB)
After saving:
aftersave.jpg (204.39 KiB)

Re: Possible hyperlink bug

Posted: Sun Mar 29, 2020 7:17 am
by Sergey Tkachenko
In this document, some characters are loaded as separate hyperlinks. I'll try to optimize it.

Re: Possible hyperlink bug

Posted: Mon Mar 30, 2020 6:52 pm
by Sergey Tkachenko
As I said before, in this RTF, the Unicode hyperlink is loaded in TRichView as several hypertext items. In TRichView, several hypertext items that have the same target are handled like a single hyperlink. When exporting to DocX, these items are saved as a single hyperlink as well. But when exporting to RTF, each item is saved as a separate hyperlink.

I re-checked this RTF. The link is loaded as several items because different characters in it have different Charsets.
TRichView has an option to ignore Charsets from RTF: assign RichView.RTFReadProperties.UseCharsetForUnicode := True. In this mode, RichView.RTFReadProperties.CharsetForUnicode will be applied to all text loaded from RTF, and this hyperlink will be loaded as a single item.

I modified the RTF saving code. In the next update, adjacent hypertext items having the same target will be exported to RTF as a single hyperlink.

Re: Possible hyperlink bug

Posted: Mon Mar 30, 2020 7:22 pm
by jgkoehn
Thank you, Sir. For now I can expose the setting you suggested as an option for the user. I look forward to the next update. Thanks for all your work. By the way, I work with Jon Graef and Costas Stergiou at theword.net.