Line by line?
Hi, I'm about 2 days into my Word Macro. I'm pretty well set for Search (by just about any condition) / (replace with conditions).
What I'd really love to have is this. Start at the top, and line by line or word by word get... ThisText = ThatLine (or ThatWord)... Then, I'm comfortable with accessing ThisText ... look it over in VBScript and figure out if I want to mangle it up or not. After some code crunching, I'd end up with MyNewString, which is a chopped up version of the original ThisText. At this point, I want to change ThisText to MyNewString and then move to the next (Word, Paragraph).

Can I iterate line by line or word by word through the text of my document? I'd like to read a line or word such as... 1.1.2.3.2.1.5 and then decide to chop that back to be 1.2.3.2.1.5.

I have lines like this:

1.2.1.1 SomeHeading (ReqID 4035)

And other lines that say:

Submit Button - See [RQ4035]

I'd love to parse line by line and find the [RQ4035]. When I see that, I know I need to scan the doc for the line that contains (ReqID 4035)... and then parse that string to get the 1.2.1.1 out.. Go back to the See [RQ4035] and replace that with See 1.2.1.1.

I would really love to be doing some looping.. and getting 'next line'.. Anyone know how to do that, or if it's not possible and I can stop dreaming about it.. :-)

Thanks!
Macros use VBA, or did; I haven't followed MS stuff much lately. Maybe they switched to VBScript, I don't know. Either one will let you loop and do any other basic stuff. But again, I simply don't follow MS products any more, so sorry I couldn't be of more help.
O'Reilly has a VBA book out that would answer all your questions. I'm sure there's no problem looping, however; that's a basic scripting function.
Thanks TechAdmin...
The macro could be VBA or VBS.. I can handle the looping and parsing my strings and such... What I'm not able to get thus far is how to hook into the Word doc to get the lines or words out in sequence.

    For i = 1 to 1000000
        ThisLine = SomeHookIntoWord(i)
        NewLine = DoMyThingToThisLine(ThisLine)
        SomeHookIntoWord(i) = NewLine
    Next

or

    Do
        Word.MoveNextLine
        ThisLine = Word.GetText
        NewLine = DoMyThingToThisLine(ThisLine)
        Word.SetText = NewLine
    Loop Until Word.EOF

Once I can get the lines one at a time, I'll be able to stuff things into a VB memory structure, change things up a bit, then put them back in place.

How do I iterate through the text in Word? I would be happy to get either a line at a time as I see on the screen... or everything between paragraph markers... if anyone can point out how.

Most stuff I find out on the web in this regard deals with scripting a macro to the Replace... dialog, or to the Find dialog. I already do some good things with Find text via a dozen parameters, replace all with other text defined by a bunch of parameters.. I'm looking for stepping through line by line.. (as I would with a .txt file).

I don't think a VBA book will help me hook into Word like that. I think it would only help with VB syntax...

Thanks!
-Daron
I doubt there is a way to get true line by line, since the paragraph marker is really the only delineating element. If you open a .doc document in a text editor, you can see what it actually contains. For a straight text document, there is formatting information at the top and bottom, then the meta information, and the text sits in blocks, just like in a text editor, one paragraph per block.
However, if there are any advanced formatting items (lists, tables, etc.), this simple structure vanishes, and the actual lines in the file become much more complex. Word 2003, in its professional version (the one nobody has), can store the document in a proprietary XML format. Same with OpenOffice.org Writer, although that's an open XML standard, and it's compressed.

Of course, this stuff would be totally easy to do if you were just handling a straight text document; nothing to it, a little Perl and you're off and running. You can also give OpenOffice.org Writer a try. They use a different macro system, which might or might not work better for your needs. I'm not familiar with it, but you never know.
Thanks Tech...
I'd settle for iterating paragraph by paragraph.. :-)

I need to hand the macro off to a group of Word users, tech writers... They export a report from application X as a Word doc... Then they run the macro and hit save... Voilà.. Beautiful document, all fixed up. They wouldn't be willing to run it through an alternative editor.

Like I said, I've been able to do a lot of polishing with "Find, Replace All" (fonts, sizes, extra paragraph markers, etc..).. It's good.. it could be better if I could step through the document.

Thanks for your feedback... Since I'm not deep in the world of Word macros, I value your replies, and I now pretty much understand that it's not that I'm missing out on how to iterate through the doc; probably I can't do it.. I can stop searching for that method.

Again, if anyone happens by here and reads these posts and has gone a level or two deeper and can do what I'm asking, I'd sure appreciate that pointer.
-Daron
Greetings Daron,
I'm in a similar situation: wanting to export certain tables to Excel (create the spreadsheet, easy enough), using either a properties file or a user dialog to specify which. In my case they may or may not have captions, but they are generally preceded by a paragraph with a heading style and optional content before the table starts.

The following is just my opinion, and I'm still investigating (Word 2003), so feel free, anyone, to contradict me, civilly if at all possible, with evidence (i.e., working code) to support it :) Some of it borders on a rant, so apologies in advance.

If you look into the VBA help in Word (Alt+F11, Help > MS VisualBasic Help, MS VB Reference, Objects, D, Documents... whew!), you'll find that the document's properties include a lot of things like the Paragraphs and Tables collections, along with many others. Apparently there is not a more generic collection that lets you walk through, determine what type of thing you have via some property/name, and then move on to more specific things within it. The approach seems to be: have the user highlight, and use Selection to know where you are. I suspect Microsoft has intentionally lobotomized the Word object model to discourage us rocket scientists from writing our own converters (treads on their intellectual property, perhaps).

You can get collections of things, but not their physical ordering with respect to their peers: it seems a paragraph doesn't know if a table precedes or follows it, but they can separately be retrieved as groups, and appear to be in order. In my case, Tables don't have a Caption property and the ID is blank in my test document (and, I suspect, in the documents I'm wanting to process). Thanks, Mr. Gates! No chance of getting my upstream mates to help out with more discipline in their content, either.

I've also done work with FormFields in the past, and it's possible to know what kind of field you're dealing with, but I didn't care at that time what else was in the document. I went to the trouble of developing the document template that was used with them, and ensured that the fields had bookmarks, so they had identifying information to help.

In the present case, we both care about order: you want to walk through and do things, and I want to identify uniquely which table I want, cross-referenced to something a user will understand. The best you may be able to do is walk through ActiveDocument.Paragraphs and use the properties of each object to do text mangling as you go, but not without injected intelligence, I suspect.

This probably is not the whiz-bang solution you were hoping for, but hey, whadda ya want for free? :) Best of luck. If I get a breakthrough, I'll post it.
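A rough sketch of the "walk through ActiveDocument.Paragraphs" idea, for anyone who wants something to start from. It is a sketch under assumptions, not a tested solution for any particular document: DoMyThingToThisLine is borrowed from Daron's pseudocode and given a made-up body (collapsing double spaces), and paragraphs inside tables are skipped on the assumption that table cells should be left alone.

```vba
Option Explicit

' Hypothetical stand-in for the poster's DoMyThingToThisLine;
' here it only collapses runs of spaces into a single space.
Function DoMyThingToThisLine(ByVal s As String) As String
    Do While InStr(s, "  ") > 0
        s = Replace(s, "  ", " ")
    Loop
    DoMyThingToThisLine = s
End Function

Sub WalkParagraphs()
    Dim para As Paragraph
    Dim rng As Range
    Dim oldText As String, newText As String

    For Each para In ActiveDocument.Paragraphs
        Set rng = para.Range
        ' Assumption: leave table content untouched.
        If Not rng.Information(wdWithInTable) Then
            ' Shrink the range by one character so the paragraph
            ' mark itself is not overwritten.
            rng.MoveEnd Unit:=wdCharacter, Count:=-1
            oldText = rng.Text
            newText = DoMyThingToThisLine(oldText)
            ' Write back only when something actually changed.
            If newText <> oldText Then rng.Text = newText
        End If
    Next para
End Sub
```

Assigning to rng.Text replaces the whole paragraph body, so mixed character formatting inside a changed paragraph is likely to be flattened; that is one reason the write-back only happens when the text differs. The Information(wdWithInTable) check also gives each paragraph a way to report whether it sits inside a table.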
And as a trivial code fragment, this may point you in the direction you want:
:: Code ::
Sub enumerateWords()
    Dim oWord As Range
    ' Each item in the Words collection is a Range
    For Each oWord In ActiveDocument.Words
        Debug.Print oWord.Text   ' output goes to the Immediate Window
    Next oWord
End Sub

The output goes to the Immediate Window. Look into the properties of a Range object; each oWord above is actually a Range.

In my case, this actually makes its way through my tables as well, so if I remember what the first cell in each table contains via the Tables collection, it may be possible to do what I want, though not all that efficiently. The Document object supports a Sentences collection as well.

Good luck.
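Along the same lines, here is a hedged sketch of the [RQ4035] lookup from the first post, done in two passes: collect a ReqID-to-heading-number map from the Paragraphs collection, then let Find/Replace rewrite each reference. Assumptions to check against real documents: the heading number (1.2.1.1) is literal text at the start of the paragraph rather than Word auto-numbering, the macro runs on Windows (it late-binds VBScript.RegExp and Scripting.Dictionary), and the regular expression is a guess at the line layout based only on the two sample lines.

```vba
Option Explicit

Sub ResolveRequirementRefs()
    Dim re As Object        ' VBScript.RegExp, late bound
    Dim map As Object       ' Scripting.Dictionary: ReqID -> heading number
    Dim matches As Object
    Dim para As Paragraph
    Dim key As Variant

    Set re = CreateObject("VBScript.RegExp")
    ' Assumed layout: "1.2.1.1 SomeHeading (ReqID 4035)"
    re.Pattern = "^\s*(\d+(?:\.\d+)*)\s.*\(ReqID\s+(\d+)\)"
    Set map = CreateObject("Scripting.Dictionary")

    ' Pass 1: remember the heading number for every ReqID found.
    For Each para In ActiveDocument.Paragraphs
        Set matches = re.Execute(para.Range.Text)
        If matches.Count > 0 Then
            map(matches(0).SubMatches(1)) = matches(0).SubMatches(0)
        End If
    Next para

    ' Pass 2: replace each "[RQnnnn]" with its heading number.
    For Each key In map.Keys
        With ActiveDocument.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .MatchWildcards = False   ' treat the brackets literally
            .MatchCase = True
            .Execute FindText:="[RQ" & key & "]", _
                     ReplaceWith:=map(key), Replace:=wdReplaceAll
        End With
    Next key
End Sub
```

With the two sample lines from the first post, "Submit Button - See [RQ4035]" should end up reading "Submit Button - See 1.2.1.1". References whose ReqID never appears in a heading are simply left as they are, which makes leftovers easy to spot with an ordinary search for "[RQ".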
Thanks for a well thought out solution, zeromaster, your postings are refreshingly informative, hope to see you around here now and then if you have the time. Always nice to see someone who is actually thinking about questions here.
jeffd