+ Reply to Thread
Results 1 to 9 of 9

VBA "Scripting.Dictionary" Word Occurences under each category

Hybrid View

  1. #1
    Forum Contributor
    Join Date
    05-24-2014
    MS-Off Ver
    Microsoft Office 2013
    Posts
    113

    VBA "Scripting.Dictionary" Word Occurences under each category

    I am very new to dictionaries in VBA but I found a macro that does almost exactly what I need it to in a fraction of the time as a formula. I have a list of items that belong to different categories. In one column I have the category, in columns next to it I have the product title, description, and a few bullet points with more descriptions. My goal is to have a macro go through all the items in my sheet and return a list of words in those columns and their occurences in each category.

    I've attached a sample with some dummy data. Dummy_Data.xlsx

    Below is the code that I found at this page.

    Sub HTH() 
        Dim vArray As Variant 
        Dim lLoop As Long 
        Dim rCell As Range 
         
        With CreateObject("Scripting.Dictionary") 
            For Each rCell In Range("A1", Cells(Rows.Count, "A").End(xlUp)) 
                vArray = Split(rCell.Value, " ") 
                For lLoop = LBound(vArray) To UBound(vArray) 
                    If Not .exists(vArray(lLoop)) Then 
                        .Add vArray(lLoop), 1 
                    Else 
                        .Item(vArray(lLoop)) = .Item(vArray(lLoop)) + 1 
                    End If 
                Next lLoop 
            Next rCell 
            Range("B1").Resize(.Count).Value = Application.Transpose(.keys) 
            Range("C1").Resize(.Count).Value = Application.Transpose(.items) 
        End With 
    End Sub
    This code gives me the count of each word's occurence. The shortcoming here is that it includes special characters. For example: Ther could be the word "great" and "great," and each one would have their own count. My goal is to eliminate the special characters before counting. Also, I would like it to give me the word and the count within each category.

    This next request is only optional but would be great. I would also like to get a list of two word combinations and their counts. For example if a phrase contained "big dog" then it would be counted. If a phrase contained "Big bad dog" then "Big Bad" and "Bad Dog" would be counted.

    Also, I would like it so it isn't case sensitive when counting.

    Any help on this would be greatly appreciated.
    Last edited by manofcheese; 07-23-2015 at 01:37 PM.

  2. #2
    Forum Contributor
    Join Date
    11-15-2012
    Location
    Buffalo, NY
    MS-Off Ver
    Office 365
    Posts
    319

    Re: VBA "Scripting.Dictionary" Word Occurences under each category

    Well, you certainly provide some intriguing dummy data. However, you suggest that you want to see Yes/Big/24 as a result. When I look at all the words in the rows associated with Yes, I only see the word Big once (in the Title column, at cell B2). So I'm at a loss to see exactly what you are trying to accomplish!

    If I understand what you're saying, given this set of text columns, you want to create a completely different data structure that would have a list of all the different words in columns B through I, and a count of the number of times it appears in each category (which leads to an additional question about what is the difference between Yes (Row 2) and Yes (Row 4))

    I think this post is tied to the other one you did about columns and rows, in which case simply setting out what you want to happen (rather than describing the means to accomplish it) is the most important first step. As some cynic somewhere once said, if you don't know where you want to be, any route will do!

    Thanks,

    Tony

  3. #3
    Forum Moderator Leith Ross's Avatar
    Join Date
    01-15-2005
    Location
    San Francisco, Ca
    MS-Off Ver
    2000, 2003, & 2010
    Posts
    23,259

    Re: VBA "Scripting.Dictionary" Word Occurences under each category

    Hello manofcheese,

    This was quite a challenge. While the code you provide is small, I believe your actual data is probably quite large. I tweaked the macro several times to get the best performance possible.

    The data is output on "Sheet2" with each category's words and count listed in 2 columns with the category header in row 1. Each category is separated from the next by 1 empty column for readability.

    Provided that your actual data is laid out the same as the example and is located on "Sheet1", you should be able to paste the macro into your original workbook with no problems. If you do encounter a problem, please post the workbook for review.


    Macro Code
    Sub Summarize()
    
        Dim Categories  As Variant
        Dim Category    As Variant
        Dim Data        As Variant
        Dim Dict        As Object
        Dim index       As Long
        Dim info        As Variant
        Dim n           As Long
        Dim RegExp      As Object
        Dim Rng         As Range
        Dim row         As Long
        Dim text        As String
        Dim Wks         As Worksheet
        Dim Word        As Variant
        Dim Words       As Variant
            
            Set Wks = Worksheets("Sheet1")
            Set Rng = Wks.Range("A1").CurrentRegion
            
            Set Dict = CreateObject("Scripting.Dictionary")
            Dict.CompareMode = vbTextCompare
            
            Set RegExp = CreateObject("VBScript.RegExp")
            RegExp.Global = True
            RegExp.Pattern = "[^\s0-9A-Za-z]+"
                
                Categories = Intersect(Rng, Rng.Columns(1).Offset(1, 0)).Value
            
                Data = Intersect(Rng, Rng.Offset(1, 1)).Value
            
                For row = 1 To UBound(Categories, 1)
                    Category = Categories(row, 1)
                    
                    
                    If Not Dict.exists(Category) Then
                        ReDim info(1, 0)
                        GoSub GetWordCount
                        Dict.Add Category, info
                    Else
                        info = Dict(Category)
                        GoSub GetWordCount
                        Dict(Category) = info
                    End If
                Next row
                           
            Application.ScreenUpdating = False
            
            n = 0
            
            For Each Key In Dict.Keys
                Item = Dict(Key)
                Sheet2.Range("A1").Offset(0, n) = Key
                Sheet2.Range("A1").Offset(1, n).Resize(UBound(Item, 2), 2) = Application.Transpose(Item)
                n = n + 3
            Next Key
          
            Application.ScreenUpdating = True
            
    Exit Sub
    
    GetWordCount:
                Words = Application.index(Data, row, 0)
                text = Join(Words, " ")
                text = RegExp.Replace(text, "")
                Words = Split(text, " ")
                        
                For Each Word In Words
                    n = UBound(info, 2)
                    index = 0
                        
                    For j = 0 To UBound(info, 2) - 1
                        If StrComp(info(0, j), Word, vbTextCompare) = 0 Then
                            index = j
                            Exit For
                        End If
                    Next j
                            
                    If index = 0 Then
                        info(1, n) = info(1, n) + 1
                        info(0, n) = Word
                        ReDim Preserve info(1, n + 1)
                    Else
                        info(1, index) = info(1, index) + 1
                    End If
                Next Word
            Return
    
    End Sub
    Attached Images Attached Images
    Attached Files Attached Files
    Last edited by Leith Ross; 07-18-2015 at 06:52 PM. Reason: Uploaded Case Insensitive Version of Workbook
    Sincerely,
    Leith Ross

    Remember To Do the Following....

    1. Use code tags. Place [CODE] before the first line of code and [/CODE] after the last line of code.
    2. Thank those who have helped you by clicking the Star below the post.
    3. Please mark your post [SOLVED] if it has been answered satisfactorily.


    Old Scottish Proverb...
    Luathaid gu deanamh maille! (Rushing causes delays!)

  4. #4
    Forum Contributor
    Join Date
    05-24-2014
    MS-Off Ver
    Microsoft Office 2013
    Posts
    113

    Re: VBA "Scripting.Dictionary" Word Occurences under each category

    Leith,
    Thanks for your response. It looks like the macro works perfectly on my dummy data. However I can't get it to work with my actual data. I tried switching the layout to match my dummy data and then running it but it was still throwing an error. I have some better data now that you can take a look at. I put it back to the actual layout instead of my dummy data layout. I think it could be throwing an error because some of the cells contain thousands of characters. The file was too big to upload completely so this will give you a large amount of data but not 100%Test other method2.xlsm

  5. #5
    Forum Moderator Leith Ross's Avatar
    Join Date
    01-15-2005
    Location
    San Francisco, Ca
    MS-Off Ver
    2000, 2003, & 2010
    Posts
    23,259

    Re: VBA "Scripting.Dictionary" Word Occurences under each category

    Hello manofcheese,

    After a lot of testing, checking, re-checking, and code tweaking, this macro is optimized. When I first started rewriting the macro, it took 3 minutes. Now, it takes a little over 2 seconds. This should work with no problem on your larger workbook.

    If you have any problems, let me know.

    Updated Macro Code
    Sub Summarize()
    
        Dim Categories  As Variant
        Dim Category    As Variant
        Dim Data        As Variant
        Dim Dict        As Object
        Dim DstRng      As Object
        Dim DstWks      As Worksheet
        Dim Info        As Object
        Dim j           As Long
        Dim key         As Variant
        Dim n           As Long
        Dim RegExp      As Object
        Dim Rng         As Range
        Dim row         As Long
        Dim SrcWks      As Worksheet
        Dim text        As String
        Dim Word        As Variant
        Dim Words       As Variant
            
            StartTime = Timer
            
            Set DstWks = Worksheets("Sheet2")
            
            Set SrcWks = Worksheets("Sheet1")
            Set Rng = SrcWks.Range("A1").CurrentRegion
            
            Set Dict = CreateObject("Scripting.Dictionary")
            Dict.CompareMode = vbTextCompare
            
            Set RegExp = CreateObject("VBScript.RegExp")
            RegExp.Global = True
            RegExp.Pattern = "([\-]+)|([^\sA-Za-z]+)"
                
                Categories = Intersect(Rng, Rng.Columns(1).Offset(1, 0)).Value
            
                Data = Intersect(Rng, Rng.Offset(1, 2)).Value
            
                For row = 1 To UBound(Categories, 1)
                    DoEvents
                    
                    Category = Categories(row, 1)
                    
                    If Not Dict.Exists(Category) Then
                        Set Info = CreateObject("Scripting.Dictionary")
                        Info.CompareMode = vbTextCompare
                        GoSub GetWordCount
                        Dict.Add Category, Info
                    Else
                        Set Info = Dict(Category)
                        GoSub GetWordCount
                        Set Dict(Category) = Info
                    End If
                Next row
                           
            Application.ScreenUpdating = False
            
            n = 0
            
            For Each key In Dict.Keys
                Set Info = Dict(key)
                
                Set DstRng = DstWks.Range("A1:B1").Offset(0, n).Resize(Info.Count, 2)
                
                DstRng.Range("A1:B1").Value = Array(key, "Count")
                
                DstRng.Columns(1).Offset(1, 0).Value = Application.Transpose(Info.Keys)
                DstRng.Columns(1).Cells.NumberFormat = "@"
                
                DstRng.Columns(2).Offset(1, 0).Value = Application.Transpose(Info.Items)
                DstRng.Columns(2).Cells.NumberFormat = "0;;;@"
                
                With DstWks.Sort
                    .SortFields.Clear
                    .SortFields.Add key:=DstRng.Cells(1, 1), SortOn:=xlSortOnValues, Order:=xlAscending
                
                    .Header = xlYes
                    .MatchCase = False
                    .Orientation = xlTopToBottom
                    .SetRange DstRng
                    .Apply
                End With
                
                n = n + 3
            Next key
          
            Application.ScreenUpdating = True
            
            EndTime = Timer
            TotalTime = EndTime - StartTime
            
    Exit Sub
    
    GetWordCount:
                text = ""
                Words = Empty
                
                For j = 1 To UBound(Data, 2)
                    text = RegExp.Replace(Data(row, j), "$2 ")  ' Replace hyphens with a space.
                    text = RegExp.Replace(text, "$1")           ' Remove all other punctuation except for spaces.
                    Words = Split(text, " ")
                        
                    For Each Word In Words
                        If Len(Word) > 0 Then
                            
                            If Not Info.Exists(Word) Then
                                Info.Add Word, 1
                            Else
                                n = Info(Word)
                                Info(Word) = n + 1
                            End If
                            
                        End If
                    Next Word
                Next j
            
            Return
    
    End Sub
    Attached Files Attached Files

  6. #6
    Forum Contributor
    Join Date
    05-24-2014
    MS-Off Ver
    Microsoft Office 2013
    Posts
    113

    Re: VBA "Scripting.Dictionary" Word Occurences under each category

    Quote Originally Posted by Leith Ross View Post
    Hello manofcheese,

    After a lot of testing, checking, re-checking, and code tweaking, this macro is optimized. When I first started rewriting the macro, it took 3 minutes. Now, it takes a little over 2 seconds. This should work with no problem on your larger workbook.

    If you have any problems, let me know.

    Updated Macro Code
    Sub Summarize()
    
        Dim Categories  As Variant
        Dim Category    As Variant
        Dim Data        As Variant
        Dim Dict        As Object
        Dim DstRng      As Object
        Dim DstWks      As Worksheet
        Dim Info        As Object
        Dim j           As Long
        Dim key         As Variant
        Dim n           As Long
        Dim RegExp      As Object
        Dim Rng         As Range
        Dim row         As Long
        Dim SrcWks      As Worksheet
        Dim text        As String
        Dim Word        As Variant
        Dim Words       As Variant
            
            StartTime = Timer
            
            Set DstWks = Worksheets("Sheet2")
            
            Set SrcWks = Worksheets("Sheet1")
            Set Rng = SrcWks.Range("A1").CurrentRegion
            
            Set Dict = CreateObject("Scripting.Dictionary")
            Dict.CompareMode = vbTextCompare
            
            Set RegExp = CreateObject("VBScript.RegExp")
            RegExp.Global = True
            RegExp.Pattern = "([\-]+)|([^\sA-Za-z]+)"
                
                Categories = Intersect(Rng, Rng.Columns(1).Offset(1, 0)).Value
            
                Data = Intersect(Rng, Rng.Offset(1, 2)).Value
            
                For row = 1 To UBound(Categories, 1)
                    DoEvents
                    
                    Category = Categories(row, 1)
                    
                    If Not Dict.Exists(Category) Then
                        Set Info = CreateObject("Scripting.Dictionary")
                        Info.CompareMode = vbTextCompare
                        GoSub GetWordCount
                        Dict.Add Category, Info
                    Else
                        Set Info = Dict(Category)
                        GoSub GetWordCount
                        Set Dict(Category) = Info
                    End If
                Next row
                           
            Application.ScreenUpdating = False
            
            n = 0
            
            For Each key In Dict.Keys
                Set Info = Dict(key)
                
                Set DstRng = DstWks.Range("A1:B1").Offset(0, n).Resize(Info.Count, 2)
                
                DstRng.Range("A1:B1").Value = Array(key, "Count")
                
                DstRng.Columns(1).Offset(1, 0).Value = Application.Transpose(Info.Keys)
                DstRng.Columns(1).Cells.NumberFormat = "@"
                
                DstRng.Columns(2).Offset(1, 0).Value = Application.Transpose(Info.Items)
                DstRng.Columns(2).Cells.NumberFormat = "0;;;@"
                
                With DstWks.Sort
                    .SortFields.Clear
                    .SortFields.Add key:=DstRng.Cells(1, 1), SortOn:=xlSortOnValues, Order:=xlAscending
                
                    .Header = xlYes
                    .MatchCase = False
                    .Orientation = xlTopToBottom
                    .SetRange DstRng
                    .Apply
                End With
                
                n = n + 3
            Next key
          
            Application.ScreenUpdating = True
            
            EndTime = Timer
            TotalTime = EndTime - StartTime
            
    Exit Sub
    
    GetWordCount:
                text = ""
                Words = Empty
                
                For j = 1 To UBound(Data, 2)
                    text = RegExp.Replace(Data(row, j), "$2 ")  ' Replace hyphens with a space.
                    text = RegExp.Replace(text, "$1")           ' Remove all other punctuation except for spaces.
                    Words = Split(text, " ")
                        
                    For Each Word In Words
                        If Len(Word) > 0 Then
                            
                            If Not Info.Exists(Word) Then
                                Info.Add Word, 1
                            Else
                                n = Info(Word)
                                Info(Word) = n + 1
                            End If
                            
                        End If
                    Next Word
                Next j
            
            Return
    
    End Sub
    Thanks. This worked like a charm.

  7. #7
    Forum Guru
    Join Date
    08-15-2004
    Location
    Tokyo, Japan
    MS-Off Ver
    2013 O.365
    Posts
    22,835

    Re: VBA "Scripting.Dictionary" Word Occurences under each category

    Try the attched
    Sub test()
        Dim a, i As Long, ii As Long, e, s, dic As Object, myMax As Long, t As Long
        Set dic = CreateObject("Scripting.Dictionary")
        dic.CompareMode = 1
        a = Sheets("sheet1").Cells(1).CurrentRegion.Value
        With CreateObject("VBScript.RegExp")
            .Global = True
            .Pattern = "[\W_\d]"
            For i = 2 To UBound(a, 1)
                If Not dic.exists(a(i, 1)) Then
                    Set dic(a(i, 1)) = CreateObject("Scripting.Dictionary")
                    dic(a(i, 1)).CompareMode = 1
                End If
                For ii = 3 To UBound(a, 2)
                    For Each e In Split(Application.Trim(.Replace(a(i, ii), " ")))
                        dic(a(i, 1))(e) = dic(a(i, 1))(e) + 1
                    Next
                Next
                myMax = Application.Max(myMax, dic(a(i, 1)).Count)
            Next
            ReDim a(1 To myMax + 1, 1 To dic.Count * 3): t = -2
            For Each e In dic
                i = 1: t = t + 3
                a(i, t) = e: a(i, t + 1) = "Count"
                For Each s In dic(e)
                    i = i + 1: a(i, t) = CStr(s): a(i, t + 1) = dic(e)(s)
                Next
            Next
        End With
        Application.ScreenUpdating = False
        With Sheets("sheet3").Cells(1).Resize(UBound(a, 1), UBound(a, 2))
            .NumberFormat = "@": .Value = a
            For i = 1 To UBound(a, 2) Step 3
                .Columns(i + 1).NumberFormat = ""
                .Columns(i).Resize(, 2).Sort .Cells(1, i), 1, , , , , , True
            Next
            .Columns.AutoFit
        End With
        Application.ScreenUpdating = True
    End Sub
    Attached Files Attached Files

  8. #8
    Forum Contributor
    Join Date
    05-24-2014
    MS-Off Ver
    Microsoft Office 2013
    Posts
    113

    Re: VBA "Scripting.Dictionary" Word Occurences under each category

    Quote Originally Posted by jindon View Post
    Try the attched
    Sub test()
        Dim a, i As Long, ii As Long, e, s, dic As Object, myMax As Long, t As Long
        Set dic = CreateObject("Scripting.Dictionary")
        dic.CompareMode = 1
        a = Sheets("sheet1").Cells(1).CurrentRegion.Value
        With CreateObject("VBScript.RegExp")
            .Global = True
            .Pattern = "[\W_\d]"
            For i = 2 To UBound(a, 1)
                If Not dic.exists(a(i, 1)) Then
                    Set dic(a(i, 1)) = CreateObject("Scripting.Dictionary")
                    dic(a(i, 1)).CompareMode = 1
                End If
                For ii = 3 To UBound(a, 2)
                    For Each e In Split(Application.Trim(.Replace(a(i, ii), " ")))
                        dic(a(i, 1))(e) = dic(a(i, 1))(e) + 1
                    Next
                Next
                myMax = Application.Max(myMax, dic(a(i, 1)).Count)
            Next
            ReDim a(1 To myMax + 1, 1 To dic.Count * 3): t = -2
            For Each e In dic
                i = 1: t = t + 3
                a(i, t) = e: a(i, t + 1) = "Count"
                For Each s In dic(e)
                    i = i + 1: a(i, t) = CStr(s): a(i, t + 1) = dic(e)(s)
                Next
            Next
        End With
        Application.ScreenUpdating = False
        With Sheets("sheet3").Cells(1).Resize(UBound(a, 1), UBound(a, 2))
            .NumberFormat = "@": .Value = a
            For i = 1 To UBound(a, 2) Step 3
                .Columns(i + 1).NumberFormat = ""
                .Columns(i).Resize(, 2).Sort .Cells(1, i), 1, , , , , , True
            Next
            .Columns.AutoFit
        End With
        Application.ScreenUpdating = True
    End Sub
    Thanks. It looks like your solution worked as well. I'll keep trying both for my needs.

  9. #9
    Forum Moderator Leith Ross's Avatar
    Join Date
    01-15-2005
    Location
    San Francisco, Ca
    MS-Off Ver
    2000, 2003, & 2010
    Posts
    23,259

    Re: VBA "Scripting.Dictionary" Word Occurences under each category

    Hello manofcheese,

    You're welcome. Glad I help.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Replies: 5
    Last Post: 09-10-2014, 02:12 PM
  2. How to make a Pivot Table "Top 10" include an "other" category.
    By Melvinrobb in forum Excel Charting & Pivots
    Replies: 7
    Last Post: 09-09-2014, 05:15 AM
  3. [SOLVED] CreateObject("scripting.dictionary") Add Item Problem---->Empty Cells
    By HerryMarkowitz in forum Excel Programming / VBA / Macros
    Replies: 3
    Last Post: 08-15-2014, 05:04 PM
  4. Replies: 1
    Last Post: 01-15-2014, 08:53 AM
  5. [SOLVED] How to understand and use CreateObject("Scripting.Dictionary")
    By Darthzo in forum Excel Programming / VBA / Macros
    Replies: 11
    Last Post: 04-03-2013, 07:21 PM
  6. [SOLVED] How to Count number of "Error" and "OK" after the word "Instrument" found in table row
    By eltonlaw in forum Excel Programming / VBA / Macros
    Replies: 4
    Last Post: 05-17-2012, 06:26 AM
  7. Replies: 7
    Last Post: 05-13-2006, 05:02 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1