+ Reply to Thread
Results 1 to 3 of 3

Replace html / RTF Tags in Excel with Formatted Text

Hybrid View

  1. #1
    Registered User
    Join Date
    09-03-2021
    Location
    Philadelphia
    MS-Off Ver
    Microsoft Office 2019
    Posts
    2

    Replace html / RTF Tags in Excel with Formatted Text

    Hello all,

    I'm new to this server but I've been working on excel VBA coding for a bit of time. I have made a pretty useful code that I want to share with the community.

    BIG NOTE: For some reason, I couldn't post this with the html tags, which is the whole purpose of this code. To make this code work, you'll need to replace "{" with "<" and replace "}" with ">". Sorry for the inconvenience.

    Here is the gist; I work with databases often, ms access in particular. I have encountered a problem before where I try to export Rich Text Format (RTF) into excel, but none of the formatting is retained. When you export directly from access, it may look like the sample below:
    {div}Hello {strong}{em}world{/em}{/strong}. I say &quot;Hi&quot;.{/div}
    With RTF this should appear this way:
    Hello world. I say "Hi".

    To apply this formatting to cells with less than 255 characters, a simple Range.Characters(i,j).Insert or Range.Characters(i,j).Delete method paired with a Range.Characters(i,j).Font.Bold would work to apply the formatting where the html tag for Bold ({strong}) appears. However, with RTFs this is usually not the case. So, I have developed the code below which can perform the same operation for cells with text longer than 255 characters:
    Sub ReplaceRichText()
    Dim xCell As Range
    Dim rng As Range
    Dim BoldArray(2, 100) As Integer
    Dim ItalicArray(2, 100) As Integer
    Dim UnderlineArray(2, 100) As Integer
    Dim CenterArray(2, 100) As Integer
    Dim i As Integer
    Dim Find As Integer
    Dim Start As Integer
    Dim xPos As Integer
    Dim yLen As Integer
    Dim ParsedString As String
    
     
    
    Set rng = Selection
           
    For Each xCell In rng
        xCell.Value = Replace(xCell.Value, "{div}", "") 'Remove div RTF code
        xCell.Value = Replace(xCell.Value, "{/div}", "") 'Remove div RTF code
        xCell.Value = Replace(xCell.Value, "&nbsp;", "  ") 'Remove "non-blank space" RTF code
        xCell.Value = Replace(xCell.Value, vbCr & vbCr, vbLf) 'Remove duplicated lines
        xCell.Value = Replace(xCell.Value, vbCr & vbCrLf, vbLf) 'Remove duplicated lines
        xCell.Value = Replace(xCell.Value, vbCr & vbLf, vbLf) 'Remove duplicated lines
        xCell.Value = Replace(xCell.Value, vbCrLf & vbCrLf, vbLf) 'Remove duplicated lines
        xCell.Value = Replace(xCell.Value, vbCrLf & vbCr, vbLf) 'Remove duplicated lines
        xCell.Value = Replace(xCell.Value, vbCrLf & vbLf, vbLf) 'Remove duplicated lines
        xCell.Value = Replace(xCell.Value, vbLf & vbLf, vbLf) 'Remove duplicated lines
        xCell.Value = Replace(xCell.Value, vbLf & vbCr, vbLf) 'Remove duplicated lines
        xCell.Value = Replace(xCell.Value, vbLf & vbCrLf, vbLf) 'Remove duplicated lines
        xCell.Value = Replace(xCell.Value, "{font face=Arial size=2 color=black}", "") 'Remove font RTF code
        xCell.Value = Replace(xCell.Value, "{/font}", "") 'Remove font RTF code
        xCell.Value = Replace(xCell.Value, "{font color=black}", "") 'Remove font RTF code
        xCell.Value = Replace(xCell.Value, "{font face=""Cambria Math"" size=2 color=black}", "") 'Remove font RTF code
        xCell.Value = Replace(xCell.Value, "&gt;", "}") 'Replace Greater Than RTF code with {
        xCell.Value = Replace(xCell.Value, "&lt;", "{") 'Replace Greater Than RTF code with {
        xCell.Value = Replace(xCell.Value, "&quot;", """") 'Replace Quote RTF code with "
        'Bold
            ParsedString = Replace(xCell.Value, "{em}", "") 'Remove Italic RTF code
            ParsedString = Replace(ParsedString, "{/em}", "") 'Remove Italic RTF code
            ParsedString = Replace(ParsedString, "{u}", "") 'Remove Underline RTF code
            ParsedString = Replace(ParsedString, "{/u}", "") 'Remove Underline RTF code
            ParsedString = Replace(ParsedString, "{center}", "") 'Remove Center Align RTF code
            ParsedString = Replace(ParsedString, "{/center}", "") 'Remove Center Align RTF code
            xPos = InStr(ParsedString, "{strong}")
            yLen = InStr(ParsedString, "{/strong}") - xPos - 8
            i = 1
            Start = 1
            Do While xPos } 0
                BoldArray(1, i) = xPos
                BoldArray(2, i) = yLen
                Start = InStr(Start, ParsedString, "{strong}") + 2
                Debug.Print "Bold", xCell.Address, xPos, yLen, Start
                xPos = InStr(Start, ParsedString, "{strong}") - (17 * i)
                yLen = InStr(Start + yLen + 17, ParsedString, "{/strong}") - InStr(Start, ParsedString, "{strong}") - 8
                i = i + 1
            Loop
        'Italic
            ParsedString = Replace(xCell.Value, "{strong}", "") 'Remove Bold RTF code
            ParsedString = Replace(ParsedString, "{/strong}", "") 'Remove Bold RTF code
            ParsedString = Replace(ParsedString, "{u}", "") 'Remove Underline RTF code
            ParsedString = Replace(ParsedString, "{/u}", "") 'Remove Underline RTF code
            ParsedString = Replace(ParsedString, "{center}", "") 'Remove Center Align RTF code
            ParsedString = Replace(ParsedString, "{/center}", "") 'Remove Center Align RTF code
            xPos = InStr(ParsedString, "{em}")
            yLen = InStr(ParsedString, "{/em}") - xPos - 4
            i = 1
            Start = 1
            Do While xPos } 0
                ItalicArray(1, i) = xPos
                ItalicArray(2, i) = yLen
                Start = InStr(Start, ParsedString, "{em}") + 2
                Debug.Print "Italic", xCell.Address, xPos, yLen, Start
                xPos = InStr(Start, ParsedString, "{em}") - (9 * i)
                yLen = InStr(Start + yLen + 9, ParsedString, "{/em}") - InStr(Start, ParsedString, "{em}") - 4
                i = i + 1
            Loop
        'Underline
            ParsedString = Replace(xCell.Value, "{strong}", "") 'Remove Bold RTF code
            ParsedString = Replace(ParsedString, "{/strong}", "") 'Remove Bold RTF code
            ParsedString = Replace(ParsedString, "{em}", "") 'Remove Underline RTF code
            ParsedString = Replace(ParsedString, "{/em}", "") 'Remove Underline RTF code
            ParsedString = Replace(ParsedString, "{center}", "") 'Remove Center Align RTF code
            ParsedString = Replace(ParsedString, "{/center}", "") 'Remove Center Align RTF code
            xPos = InStr(ParsedString, "{u}")
            yLen = InStr(ParsedString, "{/u}") - xPos - 3
            i = 1
            Start = 1
            Do While xPos } 0
                UnderlineArray(1, i) = xPos
                UnderlineArray(2, i) = yLen
                Start = InStr(Start, ParsedString, "{u}") + 2
                Debug.Print "Underline", xCell.Address, xPos, yLen, Start
                xPos = InStr(Start, ParsedString, "{u}") - (7 * i)
                yLen = InStr(Start + yLen + 7, ParsedString, "{/u}") - InStr(Start, ParsedString, "{u}") - 3
                i = i + 1
            Loop
        'Center
            ParsedString = Replace(xCell.Value, "{strong}", "") 'Remove Bold RTF code
            ParsedString = Replace(ParsedString, "{/strong}", "") 'Remove Bold RTF code
            ParsedString = Replace(ParsedString, "{em}", "") 'Remove Underline RTF code
            ParsedString = Replace(ParsedString, "{/em}", "") 'Remove Underline RTF code
            ParsedString = Replace(ParsedString, "{u}", "") 'Remove Center Align RTF code
            ParsedString = Replace(ParsedString, "{/u}", "") 'Remove Center Align RTF code
            xPos = InStr(ParsedString, "{center}")
            yLen = InStr(ParsedString, "{/center}") - xPos - 8
            i = 1
            Start = 1
            Do While xPos } 0
                CenterArray(1, i) = xPos
                CenterArray(2, i) = yLen
                Start = InStr(Start, ParsedString, "{center}") + 2
                Debug.Print "Center", xCell.Address, xPos, yLen, Start
                xPos = InStr(Start, ParsedString, "{center}") - (17 * i)
                yLen = InStr(Start + yLen + 17, ParsedString, "{/center}") - InStr(Start, ParsedString, "{center}") - 8
                i = i + 1
            Loop
        xCell.Value = Replace(xCell.Value, "{strong}", "")
        xCell.Value = Replace(xCell.Value, "{/strong}", "")
        xCell.Value = Replace(xCell.Value, "{em}", "")
        xCell.Value = Replace(xCell.Value, "{/em}", "")
        xCell.Value = Replace(xCell.Value, "{u}", "")
        xCell.Value = Replace(xCell.Value, "{/u}", "")
        xCell.Value = Replace(xCell.Value, "{center}", "")
        xCell.Value = Replace(xCell.Value, "{/center}", "")
        i = 1
        Do While BoldArray(1, i) } 0
            xCell.Characters(BoldArray(1, i), BoldArray(2, i)).Font.Bold = True
            Debug.Print "FrmtB", xCell.Address, BoldArray(1, i), BoldArray(2, i)
            i = i + 1
        Loop
        i = 1
        Do While ItalicArray(1, i) } 0
            xCell.Characters(ItalicArray(1, i), ItalicArray(2, i)).Font.Italic = True
            i = i + 1
        Loop
        i = 1
        Do While UnderlineArray(1, i) } 0
            xCell.Characters(UnderlineArray(1, i), UnderlineArray(2, i)).Font.Underline = True
            Debug.Print "FrmtU", xCell.Address, UnderlineArray(1, i), UnderlineArray(2, i)
            i = i + 1
        Loop
        i = 1
        Do While CenterArray(1, i) } 0
            xCell.Characters(CenterArray(1, i), CenterArray(2, i)).HorizontalAlignment = xlCenter
            Debug.Print "FrmtC", xCell.Address, CenterArray(1, i), CenterArray(2, i)
            i = i + 1
        Loop
        Erase BoldArray
        Erase ItalicArray
        Erase UnderlineArray
        Erase CenterArray
    Next
    End Sub
    Note: this code will work for the selected cells only.

    This code tackles each cell at a time, for this reason the operation may take longer for very large workbooks.

    The code works by first replacing the text codes which do not need any formatting applied to them; things like {div} are replaced with nothing, &quot; is replaced with ", and &nbsp; is replaced with two blank spaces. I have additional coding to remove duplicated carriage returns / line feeds which access seems to add for no reason. Also, I didn't bother to apply different font formatting here, for my purposes this is unnecessary.

    Next, the code inventories the now semi-stripped text for occurrences of html formatting codes. To get an accurate number of the position of the formatting code, all other formatting codes must be removed before inventorying the text. The position and length of the text to be formatted is stored in an array for each formatting style.

    Finally, the code strips the remaining formatting codes and cycles through each array to apply the different formatting styles. Before moving on to the next cell, each array is cleared.

    I believe this code should work very well for almost all MS Access formatting options (MS Access RTF is relegated to a short list of 10-15 formatting options as opposed to other html RTF applications). To expand on this code for your own purposes you can follow the following steps:
    1. Identify those html tags which DO NOT require formatting to be applied. These can be added to the first block of "Replace" operations. Follow this formula: xCell.Value = Replace(xCell.Value, {{{your formatting code}}}, {{{replacement text}}})
    2. Identify those html tags which DO require formatting to be applied. First, add a replace operation to each formatting loop. Follow this formula:
      ' Bold
      ParsedString = Replace(xCell.Value, {{{your formatting code}}}, "") ParsedString = Replace(ParsedString, {{{your formatting code}}}, "")
    3. Next, add a loop for your formatting code. Follow this formula:
      'Your Fomatting Loop ParsedString = Replace(xCell.Value, "{strong}", "") 'Remove Bold RTF code ParsedString = Replace(ParsedString, "{/strong}", "") 'Remove Bold RTF code ParsedString = Replace(ParsedString, "{em}", "") 'Remove Underline RTF code ParsedString = Replace(ParsedString, "{/em}", "") 'Remove Underline RTF code ParsedString = Replace(ParsedString, "{center}", "") 'Remove Center Align RTF code ParsedString = Replace(ParsedString, "{/center}", "") 'Remove Center Align RTF code {{{Add all other formatting options here as replace operations}}} xPos = InStr(ParsedString, {{{your formatting code}}}) yLen = InStr(ParsedString, {{{your formatting end code}}}) - xPos - Length of your formatting code i = 1 Start = 1 Do While xPos } 0 UnderlineArray(1, i) = xPos UnderlineArray(2, i) = yLen Start = InStr(Start, ParsedString, {{{your formatting code}}}) + 2 Debug.Print "Underline", xCell.Address, xPos, yLen, Start xPos = InStr(Start, ParsedString, {{{your formatting code}}}) - (Length of your formatting code plus the length of your formatting end code * i) yLen = InStr(Start + yLen + Length of your formatting code plus the length of your formatting end code, ParsedString, {{{your formatting end code}}}) - InStr(Start, ParsedString, {{{your formatting end code}}}) - Length of your formatting code i = i + 1 Loop

    If anyone has improvements to this code, suggestions, or questions please let me know. This made my life a lot easier, I hope it can help you out too.
    Last edited by alansidman; 09-09-2021 at 11:45 AM.

  2. #2
    Forum Moderator alansidman's Avatar
    Join Date
    02-02-2010
    Location
    Steamboat Springs, CO
    MS-Off Ver
    MS Office 365 insider Version 2504 Win 11
    Posts
    24,671

    Re: Replace html / RTF Tags in Excel with Formatted Text

    Code Tags Added
    Your post does not comply with Rule 2 of our Forum RULES. Use code tags around code.

    Posting code between [CODE] [/CODE] tags makes your code much easier to read and copy for testing, it also maintains VBA formatting.

    Highlight your code and click the # icon at the top of your post window. More information about these and other tags can be found at http://www.excelforum.com/forum-rule...rum-rules.html



    (I have added them for you today. Please take a few minutes to read all Forum Rules and comply in the future.)
    Alan עַם יִשְׂרָאֵל חַי


    Change an Ugly Report with Power Query
    Database Normalization
    Complete Guide to Power Query
    Man's Mind Stretched to New Dimensions Never Returns to Its Original Form

  3. #3
    Registered User
    Join Date
    09-03-2021
    Location
    Philadelphia
    MS-Off Ver
    Microsoft Office 2019
    Posts
    2

    Re: Replace html / RTF Tags in Excel with Formatted Text

    Thanks for the help.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. HTML translation file - extract the text between tags
    By blackbool in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 06-17-2021, 01:14 AM
  2. Convert HTML tags to Excel text format
    By CristianCGP in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 02-28-2020, 09:49 PM
  3. Formatted column text to HTML - Is it possible?
    By mpoker84 in forum Excel Programming / VBA / Macros
    Replies: 1
    Last Post: 08-09-2016, 04:37 PM
  4. Search and replace/insert HTML code into Master File using tags
    By Dave855 in forum Excel Formulas & Functions
    Replies: 0
    Last Post: 02-14-2014, 09:35 PM
  5. convert formated excel column to html source text with html tags
    By julia81 in forum Excel Programming / VBA / Macros
    Replies: 0
    Last Post: 12-19-2011, 04:22 PM
  6. Pasting HTML formatted text as plain text in Excel
    By JeffCutter in forum Excel General
    Replies: 1
    Last Post: 09-21-2007, 07:43 AM
  7. Converting formatted text into HTML tags
    By cammo in forum Excel General
    Replies: 0
    Last Post: 12-01-2005, 02:38 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1