+ Reply to Thread
Results 1 to 21 of 21

fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds

  1. #1
    Forum Contributor
    Join Date
    10-30-2014
    Location
    England
    MS-Off Ver
    2007 / 365
    Posts
    281

    fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds

    I am reading text files line by line using the fso.OpenTextFile / Textstream.Readline method to get row counts, make some line amendments and then write the lines back out to a new file.

    The issue I have is that I often get text files with embedded linefeeds in some rows which is causing .Readline to end at that point and then start reading the rest of the line as a new line.

    Is there any change or addition can add to this method so it will ignore any embedded linefeeds and correctly read until the end of the line?

    The other method I was experimenting with was ADODB.stream, and with that .ReadText(adReadLine) works fine and seems to ignore the embedded linefeeds, but I've found ADODB to be slower the larger the files get.

    Unable to upload a workbook at the moment, but hopefully that shouldn't matter in this case.
    Last edited by EchoPassenger; 06-23-2023 at 06:37 PM.

  2. #2
    Forum Expert Keebellah's Avatar
    Join Date
    01-12-2014
    Location
    The Netherlands
    MS-Off Ver
    Office 2021 (Windows)
    Posts
    7,937

    Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds

    Hi, if you're reading the file line by line in a Loop then you could add a check to either remove linefeeds or carriage returns.
    Can you upload a text file?
    ---
    Hans
    "IT" Always crosses your path!
    May the (vba) code be with you... if it isn't; start debugging!
    If you like my answer, Click the * below to say thank-you

  3. #3
    Forum Expert
    Join Date
    10-11-2021
    Location
    Netherlands
    MS-Off Ver
    365
    Posts
    1,505

    Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds

    Is the array in variable "a" how you expect it to be? I just used carriage return instead of linefeed.

    Please Login or Register  to view this content.

  4. #4
    Forum Expert Keebellah's Avatar
    Join Date
    01-12-2014
    Location
    The Netherlands
    MS-Off Ver
    Office 2021 (Windows)
    Posts
    7,937

    Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds

    in vba you can also use Chr(10) or Chr(13)

  5. #5
    Forum Contributor
    Join Date
    10-30-2014
    Location
    England
    MS-Off Ver
    2007 / 365
    Posts
    281

    Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds

    Quote Originally Posted by JEC. View Post
    Is the array in variable "a" how you expect it to be? I just used carriage return instead of linefeed.

    Please Login or Register  to view this content.
    Unfortunately, due to some potentially very large files I have to read in line by line rather than all at once to split. I may add add in a file size check for more manageable files and use this approach.


    Quote Originally Posted by Keebellah View Post
    Hi, if you're reading the file line by line in a Loop then you could add a check to either remove linefeeds or carriage returns.
    Can you upload a text file?
    Quote Originally Posted by Keebellah View Post
    in vba you can also use Chr(10) or Chr(13)
    That would be ideal (potential performance issues aside) if there is a way to do that, but I am finding that the line is split at the point of reading (ts.ReadLine). So I can't check it once it's been read as it then exists as two separate lines.


    I've tried wrapping the .readline in a CLEAN / SUBSTITUTE functions but that has no effect. So it seems that it just reads until it hits just a single LF, then treats the rest of the line as a new line. I am assuming that ADODB.Stream must read until the CR/LF at the real end of the line. Sample text file attached - 3 lines, 2nd line has the LF. These are usually created when someone enters multiple lines within a single Excel cell (Alt+Enter). Running the below code will result in 4 lines within the collection.

    Please Login or Register  to view this content.
    Attached Files Attached Files
    Last edited by EchoPassenger; 06-24-2023 at 06:43 AM.

  6. #6
    Forum Expert
    Join Date
    10-11-2021
    Location
    Netherlands
    MS-Off Ver
    365
    Posts
    1,505

    Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds

    Did you put the quotes around the line with the line feed?
    Last edited by JEC.; 06-24-2023 at 07:28 AM.

  7. #7
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Arrow Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds


    According to post #5 csv attachment, as the End Of Line is different than Line Feed
    so this file can be read without any issue with the classic VBA function Input - or Line Input - like already demonstrated within this forum …

  8. #8
    Forum Contributor
    Join Date
    10-30-2014
    Location
    England
    MS-Off Ver
    2007 / 365
    Posts
    281

    Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds

    Quote Originally Posted by JEC. View Post
    Did you put the quotes around the line with the line feed?
    No that seems to be Excel behavior when saving as a text file. Presumably for the same reason it adds quotes to cell values containing commas.


    Quote Originally Posted by Marc L View Post

    According to post #5 csv attachment, as the End Of Line is different than Line Feed
    so this file can be read without any issue with the classic VBA function Input - or Line Input - like already demonstrated within this forum …
    Thanks. Line Input does indeed work, a line feed on it's own will be ignored. It also seems much quicker reading large files than the method above I was using.

    Are there any downsides to using this approach vs. filesystemobject?

    Please Login or Register  to view this content.
    Now the final question is, for very large files (possibly up to 2 million lines), what would be the most efficient way to clean up the line feeds before writing back out to a new file? As I would rather replace it with a space than just remove it, would wrapping the string in a REPLACE or SUBSTITUTE function when adding to the collection be the best option?

  9. #9
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Question Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds


    So the question is why using the slow Excel / VBA ?!

  10. #10
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Arrow Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds


    Anyway under VBA you can duplicate the original file to a new file then open the new file in Binary mode,
    load all data to a Byte array then check this array in order to replace each single line feed byte with a space directly in the file :
    an easy short code and if not enough fast leave VBA for a faster coding language …

  11. #11
    Forum Expert Keebellah's Avatar
    Join Date
    01-12-2014
    Location
    The Netherlands
    MS-Off Ver
    Office 2021 (Windows)
    Posts
    7,937

    Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds

    And like this?

    Please Login or Register  to view this content.

  12. #12
    Forum Expert
    Join Date
    10-11-2021
    Location
    Netherlands
    MS-Off Ver
    365
    Posts
    1,505

    Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds

    This will be slow on large datasets because there is sheet interaction in every loop.

  13. #13
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds


    As the purpose is to not put data within a worksheet but to modify a text file so a VBA Binary way like described in post #10 is faster
    and fastest without using Excel …

  14. #14
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Cool Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds


    I forgot in posts #7 & 10 : quotes & doubles quotes are very not a concern …

    If post #10 Binary way loading all in a Byte array is too heavy for the computer then the classic VBA way works as expected
    if only the End Of Line of the source text file is CrLf :

    PHP Code: 
    Sub Demo00()
        
    Dim VR%, L&, W%, S$
            
    Application.GetOpenFilename("Text files,*.csv", , "File to convert"):  If False Then Exit Sub
            R 
    FreeFile
            Open V 
    For Input As #R
            
    InStrRev(V"\")
            W = FreeFile
            Open Left(V, L) & "
    New " & Mid(V, L + 1) For Output As #W
        While Not EOF(R)
            Line Input #R, S
            Print #W, Replace(S, vbLf, " ")
        Wend
            Close #R, #W
            Application.Speech.Speak "
    Done!", True
    End Sub 
    ► Do you like it ? ► So thanks to click on bottom left star icon « Add Reputation » !
    Last edited by Marc L; 07-08-2023 at 09:40 PM. Reason: not such slow in fact ...

  15. #15
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Arrow Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds


    According to my previous post, in Binary way if loading all to a Byte array is too heavy
    so the Binary way should use a buffer Byte array rather than loading all …
    As some weird text files can not be well read in Input mode so Binary mode is the universal way to go !

  16. #16
    Forum Expert Keebellah's Avatar
    Join Date
    01-12-2014
    Location
    The Netherlands
    MS-Off Ver
    Office 2021 (Windows)
    Posts
    7,937
    Quote Originally Posted by JEC. View Post
    This will be slow on large datasets because there is sheet interaction in every loop.
    Not if you disable events, then the time loss is minimal

  17. #17
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Arrow Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds


    No as writing cell by cell is the slowest way to go, in particular here when Excel is not necessary
    as the purpose is to directly overwrite each single Line Feed character with a Space within a text file !

  18. #18
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Arrow Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds


    Quote Originally Posted by EchoPassenger View Post
    Unfortunately, due to some potentially very large files I have to read in line by line rather than all at once to split.
    So you choosed the slow way …

    According to tests achieved years ago the fast way as described in post #10 uses the same post #3 way
    to load all data in a Byte array in order to overwrite only single Line Feed chars with spaces in Binary mode,
    25 times faster than the classic full VBA Binary way …
    Last edited by Marc L; 06-26-2023 at 10:05 AM.

  19. #19
    Forum Contributor
    Join Date
    10-30-2014
    Location
    England
    MS-Off Ver
    2007 / 365
    Posts
    281

    Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds

    Quote Originally Posted by Marc L View Post

    I forgot in posts #7 & 10 : quotes & doubles quotes are very not a concern …

    If post #10 Binary way loading all in a Byte array is to heavy for the computer then the classic slow VBA way works as expected
    if only the End Of Line of the source text file is CrLf :

    PHP Code: 
    Sub Demo00()
        
    Dim VR%, L&, W%, S$
            
    Application.GetOpenFilename("Text files,*.csv", , "File to convert"):  If False Then Exit Sub
            R 
    FreeFile
            Open V 
    For Input As #R
            
    InStrRev(V"\")
            W = FreeFile
            Open Left(V, L) & "
    New " & Mid(V, L + 1) For Output As #W
        While Not EOF(R)
            Line Input #R, S
            Print #W, Replace(S, vbLf, " ")
        Wend
            Close #R, #W
            Application.Speech.Speak "
    Done!", True
    End Sub 
    ► Do you like it ? ► So thanks to click on bottom left star icon « Add Reputation » !
    Quote Originally Posted by Marc L View Post

    So you choosed the slow way …

    According to tests achieved years ago the fast way as described in post #10 uses the same post #3 way
    to load all data in a Byte array in order to overwrite only single Line Feed chars with spaces in Binary mode,
    25 times faster than the classic full VBA Binary way …

    Thank you. That way is definitely a bit faster, however, I am finding that it leaves a 0kb copy of the file?

    Also, related to a previous thread of mine, is there anyway to specify the file encoding as UTF-8 with this method?
    Last edited by EchoPassenger; 06-26-2023 at 11:37 AM.

  20. #20
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Question Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds


    Which way ?
    If you are stating about my VBA demonstration Demo00 (Edit : maybe not the fastest but fast enough) :
    not on my side as this procedure just creates a same size new file with space in place of single Line Feed character …

    For the fast way according to posts #3 & 10 and for the UTF8 encoding it depends on the missing big text file attachment
    like a crystal clear explanation of what you really mean with « embedded linefeeds » so
    does that mean any single Line Feed within the text - single meaning not with a previous Carriage Return character - or
    only Line Feed character within a string between double quotes or … ?

    Edit : as in Binary mode the encoding stays the same or
    if you need to convert an ASCII text file to UTF8 so yes ADODB.Stream is a slower way to go …
    As guessing can't be coding !
    Last edited by Marc L; 06-26-2023 at 07:44 PM.

  21. #21
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Arrow Re: fso.OpenTextFile / ts.Readline - Dealing with embedded Line Feeds


    Addendum and to answer finally to post #8 about « the most efficient way » …
    I review old tests via a test file with 440K lines, +50 millions chars :
    the most efficient way is not always the same, can vary with the context.

    For example if there are few LF chars to overwrite with spaces my post #14 Demo00 is ranked #3 (5.5s),
    the way described in post #10 is ranked #2 (4.8s) and the EZ way I forgot is ranked #1 (3.6s).

    The EZ way like post #10 description read all the data with FSO but to a string variable
    then just use the VBA function Instr to locate a LF char then a space overwrite directly the LF char in Binary mode.
    Post #10 way is the solution when there are multiple conditions toggles, easier to code.
    When there is no toggle or only one like between double quotes for example, using Instr should be faster.

    Now with a LF to replace every two lines so 220K chars : post #10 way becomes slower than my Demo00,
    the EZ Instr way stays the fastest but the gap with Demo00 is very reduced …

    Then if every row has a LF to be replaced so 440K chars : Demo00 is the winner with 5.6s,
    Instr way with 7s and post #10 way with 8s.

    I had not enough time to test other ways but you get now the main idea :
    without knowing exactly the context neither an attachment no clue of which way could be the most efficient.

    According to the encoding, you can use a tool text editor like Notepad++ …

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Code Logic for Allowing & Ignoring Line Feeds
    By maxeffort33 in forum Excel Programming / VBA / Macros
    Replies: 4
    Last Post: 04-13-2021, 08:30 AM
  2. Code Logic for Allowing & Ignoring Line Feeds
    By maxeffort33 in forum Excel General
    Replies: 0
    Last Post: 04-12-2021, 10:23 PM
  3. Replies: 3
    Last Post: 10-22-2019, 09:00 AM
  4. Multiple line invoice that feeds a database on a seperate sheet
    By vicken25 in forum Excel Programming / VBA / Macros
    Replies: 0
    Last Post: 03-17-2014, 07:32 AM
  5. Parsing of line feeds in text file
    By pedron in forum Excel General
    Replies: 1
    Last Post: 02-01-2012, 06:21 PM
  6. Replies: 0
    Last Post: 12-09-2008, 08:42 PM
  7. MS Query and Carriage Returns/Line Feeds
    By Gary Brown in forum Excel Programming / VBA / Macros
    Replies: 0
    Last Post: 03-02-2005, 10:24 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1