Fastest Way to Get All RVF Text

Posted: Tue Aug 17, 2021 2:49 pm
by standay
Sergey,

What is the fastest way to get all the plain text from an RVF file? I assume I'd have to load the file into an rve or rv first.

I found rve.GetTextBuf() and RVGetTextRange(). If rve.GetTextBuf() is the best way, do you have example code on how to use that? RVGetTextRange is no problem if that's what I should use.

Thanks

Stan

Re: Fastest Way to Get All RVF Text

Posted: Wed Aug 18, 2021 8:38 am
by Sergey Tkachenko
1) Yes, you need to load the RVF file into a TRichView (richview.LoadRVF). If you do not want to create a visual component, you can load it into a TRVReportHelper instead (rvreporthelper.RichView.LoadRVF).
If you do not need to display the document, there is no need to format it (i.e. no need to call richview.Format / rvreporthelper.Init), so the slowest operation (formatting) can be skipped.

2) There are several ways to get the document as text.

If the text is meant to be read by a human, use GetAllText(richview) from the RVGetTextW unit (this function returns a Unicode string; GetAllText from RVGetText is the ANSI analog).
It produces a result very similar to richview.SaveTextToStreamW.

The alternative is RVGetTextRange(richview, 0, RVGetTextLength(richview)).
This function returns a string that has a one-to-one correspondence with the original document: if you know a character position in the document as (RVData, ItemNo, OffsetInItem), you can calculate the corresponding position in this string using RichViewToLinear, and go the other way using LinearToRichView.
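
A minimal sketch of the steps above, for illustration only: load an RVF file into a TRVReportHelper, skip formatting, and return the text with RVGetTextRange. The unit name RvfTextSketch and the function name ExtractRVFText are made up for this example. It assumes that assigning a fresh TRVStyle and leaving the RVF style-reading properties at their defaults is enough for loading, which may not hold for every document.

```delphi
unit RvfTextSketch; // hypothetical unit name

interface

// Returns the plain text of an RVF file, or '' if loading fails.
function ExtractRVFText(const FileName: string): string;

implementation

uses
  RVStyle, RVReport, RVLinear;

function ExtractRVFText(const FileName: string): string;
var
  Helper: TRVReportHelper;
begin
  Result := '';
  Helper := TRVReportHelper.Create(nil);
  try
    // Assumption: a default TRVStyle is sufficient here.
    // It is owned by Helper, so it is freed together with it.
    Helper.RichView.Style := TRVStyle.Create(Helper);
    if Helper.RichView.LoadRVF(FileName) then
      // No Init/Format call: only the text is needed, not the layout.
      Result := RVGetTextRange(Helper.RichView, 0,
        RVGetTextLength(Helper.RichView));
  finally
    Helper.Free;
  end;
end;

end.
```

For the display-oriented variant, the RVGetTextRange call could be replaced with GetAllText(Helper.RichView) after adding RVGetTextW to the uses clause.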

Re: Fastest Way to Get All RVF Text

Posted: Fri Aug 20, 2021 9:25 pm
by standay
Sergey,

Thanks for the ideas. I have both working, not sure which is faster. But they let me do my full text searches again so it's all good!

Stan

Re: Fastest Way to Get All RVF Text

Posted: Sat Aug 21, 2021 10:23 am
by Sergey Tkachenko
I think that the speed is about the same.
But if you use it for searching, RVGetTextRange is preferable, because its result does not contain representations of non-text items.

Moreover, if you find a substring in the text returned by RVGetTextRange, you can then find the position of that substring in the document.
See the demo https://www.trichview.com/forums/viewto ... f=3&t=9278

Re: Fastest Way to Get All RVF Text

Posted: Sat Aug 21, 2021 9:34 pm
by standay
I wound up using RVGetTextRange. Yes, I'm using it to search a set of RVF files. I run through them and add the text to a virtual string tree, then I search the tree. That way, the first search takes a few seconds, but after that searches are very fast.

Stan
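
The extract-once, search-many approach described in the last post could look roughly like the sketch below. It is not Stan's code: a TDictionary stands in for the virtual string tree, TTextCache and the file names are hypothetical, and the sample strings stand in for text that would really come from RVGetTextRange after loading each RVF file.

```delphi
program TextCacheSketch; // hypothetical example program
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.StrUtils, System.Generics.Collections;

type
  // Keeps the plain text of each file in memory, keyed by file name.
  TTextCache = class
  private
    FTexts: TDictionary<string, string>;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Add(const FileName, PlainText: string);
    function Search(const Needle: string): TArray<string>;
  end;

constructor TTextCache.Create;
begin
  inherited Create;
  FTexts := TDictionary<string, string>.Create;
end;

destructor TTextCache.Destroy;
begin
  FTexts.Free;
  inherited;
end;

procedure TTextCache.Add(const FileName, PlainText: string);
begin
  FTexts.AddOrSetValue(FileName, PlainText);
end;

// Returns the names of all cached files whose text contains Needle
// (case-insensitive). The order of the results is not defined.
function TTextCache.Search(const Needle: string): TArray<string>;
var
  Pair: TPair<string, string>;
  Found: TList<string>;
begin
  Found := TList<string>.Create;
  try
    for Pair in FTexts do
      if ContainsText(Pair.Value, Needle) then
        Found.Add(Pair.Key);
    Result := Found.ToArray;
  finally
    Found.Free;
  end;
end;

var
  Cache: TTextCache;
  Hit: string;
begin
  Cache := TTextCache.Create;
  try
    // Slow step, done once: in real use each text comes from an RVF file.
    Cache.Add('notes1.rvf', 'Meeting notes for Tuesday');
    Cache.Add('notes2.rvf', 'Shopping list');
    // Fast step, repeated: search the cached strings only.
    for Hit in Cache.Search('tuesday') do
      WriteLn(Hit); // only notes1.rvf contains the search term
  finally
    Cache.Free;
  end;
end.
```

The trade-off is memory for speed: every document's text stays in memory, so the cost of loading the RVF files is paid only on the first search.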