<h2>Powershell Example: Regex style file processing to remove quotation marks and insert new line breaks.</h2>

Powershell Example: Regex style file processing to remove quotation marks and insert new line breaks.

Working with machine generated data files, you often find yourself having to pre-process them into a friendlier form. For example, extracting Quality data out of an Electronic Health Record (EHR) system to send to an Insurance Company. This activity would be the “T” in Extract, Transform and Load (ETL) and this post illustrates how to do it with a tool you probably already have.

This ugly looking PowerShell command string strips out quotation marks (") and inserts a new line break after a particular string ("02").

Get-Content .\Testfile.txt | %{
    $_-replace “`””,”  `
      –replace “ 02”, “`r`n 02”
    } 
| Out-File –FilePath .\Testfile.txt –Force –Encoding ascii

How do you use it?

You can copy and paste this into a text file with a .ps1 file extension and run this from the PowerShell command line. Or you can type this at the prompt and run it as a command.

How does it work?

Step 1 Use Get-Content to read the text (or csv) file and pipe (|) the results line by line, into the Foreach-Object cmdlet (% is an alias).

Get-Content .\Testfile.txt | %{$_-replace "`"",""}

As each line comes in, use $_ variable to reference the current line and the –replace operator to replace the quotation with nothing.

Step 2 Using the results from the previous section read the stream and use the -Replace operator to find strings that look like “ 02” and insert a new line break (`n) after it.

| (Get-Content .\testfile.txt | Out-String) –Replace “ 02”, “ 02`n”

Step 3 Using the results from the earlier step, pipe them through the Out-File cmdlet to write to file on disk. -Force will overwrite the original file if desired. -Encoding forces the encoding to a text friendly ascii format.

| Out-File –FilePath .\Testfile.txt –Force –Encoding ascii
Adventures in Data...
Hey There!
What is This?