+ Reply to Thread
Results 1 to 3 of 3

Count and identify most frequently occurring words

Hybrid View

  1. #1
    Forum Contributor
    Join Date
    01-25-2010
    Location
    Virginia
    MS-Off Ver
    Excel 2013
    Posts
    104

    Count and identify most frequently occurring words

    Attached is a sample spreadsheet. There are three rows in which the first cell of each row contains a variable length text string. My actually files have MUCH more data.

    My goal is to create a sorted list of say, the top 12, most frequently occurring words across all of the cells. So, the result might be something like the following.

    the 6
    is 4
    by 3

    I also anticipate I might want to change the number of words of interest from 12 to some other number.

    What is the best way to do this?

    One more thing: I'd like to eventually be able to have a list of words that do not get counted, and that would be words like those above (the, is, by, ...) that are not "interesting".
    Last edited by SueWithQuestion; 02-24-2011 at 03:18 PM.

  2. #2
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: Count and identify most frequently occurring words

    below is a pretty ugly and inefficient UDF

    Function ModeWord(rngS As Range, Optional lngRank As Long = 1) As String
        Dim oDic As Object, RegExp As Object, RegExpMatch As Object
        Dim rngC As Range
        Dim lngKey As Long, lngInstance As Long
        Dim vTemp As Variant, vKeys As Variant, vKey As Variant
        Dim strTemp As String
        Set oDic = CreateObject("Scripting.Dictionary")
        Set RegExp = CreateObject("vbscript.regexp")
        With RegExp
            .Global = True
            .IgnoreCase = True
            .Pattern = "\w+"
        End With
        For Each rngC In rngS.Cells
            Set RegExpMatch = RegExp.Execute(Application.Trim(rngC.Value))
            For lngInstance = 1 To RegExpMatch.Count Step 1
                strTemp = LCase(RegExpMatch(lngInstance - 1))
                With oDic
                    If Not .exists(strTemp) Then
                        .Add strTemp, 1 + 1 - (1 + .Count) / 10000
                    Else
                        .Item(strTemp) = .Item(strTemp) + 1
                    End If
                End With
            Next lngInstance
        Next rngC
        Set RegExpMatch = Nothing
        With oDic
            If lngRank <= .Count Then
                ReDim vKeys(1 To .Count, 1 To 2)
                For Each vKey In .Keys
                    lngKey = lngKey + 1
                    vKeys(lngKey, 1) = vKey
                    vKeys(lngKey, 2) = .Item(vKey)
                Next vKey
                vTemp = Application.Match(Application.Large(Application.Index(vKeys, 0, 2), lngRank), Application.Index(vKeys, 0, 2), 0)
                ModeWord = vKeys(vTemp, 1) & " (" & Int(vKeys(vTemp, 2)) & ")"
            End If
        End With
        Set oDic = Nothing
    End Function
    Using your example strings in A1:A3 then

    A5:
    =MODEWORD($A$1:$A$3,ROWS(A$5:A5))
    copied down as far as desired
    IMO you'd be better off using a Sub Routine with an Input dialog to specify no. items required.

  3. #3
    Forum Contributor
    Join Date
    01-25-2010
    Location
    Virginia
    MS-Off Ver
    Excel 2013
    Posts
    104

    Re: Count and identify most frequently occurring words

    Wow, thank you! You're amazing! I'm going to try it out right now.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1