Quantcast
Channel: Scott Hanselman's Blog
Viewing all articles
Browse latest Browse all 1148

Unix Fight! - Sed, Grep, Awk, Cut and Pulling Groups out of a PowerShell Regular Expression Capture

$
0
0

There's a wonderful old programmers joke I've told for years:

"You've got a problem, and you've decided to use regular expressions to solve it.

Ok, now you've got two problems..."

A friend of mine was talking on a social network and said something like:

"That decade I spent in the Windows world stunted my growth. one teeny-tiny unix command grabbed certain values from an XML doc for me."

Now, of course, I took this immediately as a personal challenge and rose up in a rit of fealous jage and defended my employer. Nah, not really as I worked at Nike on Unix for a number of years and I get the power of sed and awk and what not. However, he said XML, and well, PowerShell rocks XML.

Because it's a dynamic language, you can refer to XML nodes just like this:

$a = ([xml](new-object net.webclient).downloadstring("http://feeds.feedburner.com/Hanselminutes"))
$a.rss.channel.item

The first line gets the feed and the second line gets all the items.

However, turns out my friend was actually trying to retrieve values within poorly-formed XML fragments within a larger SQL dump file. There's three kinds of XML. Well-formed, valid, and crap. He was sifting through crap for some values. Basically he had this crazy text file with some fragments of XML within it and wanted the values in-between elements: "<FancyPants>He wants this value</FancyPants>."

Something like this:

grep "<FancyPants>.*<.FancyPants>" test.txt | sed -e "s/^.*<FancyPants/<FancyPants/" | cut -f2 -d">"| cut -f1 -d"<" > fancyresults.txt

I'm old, but I'm not an expert in grep and sed so I'm sure there are ways he could have done it more tersely. There always is, right? With regular expressions, sometimes someone just types $@($*@)$(*@)(@*)@*(%@%# and Shakespeare pops out. You never know.

There's also a lot of different ways to do this in PowerShell, but since he used RegExes, who am I to disagree?

First, here's the one line answer.

cat test.txt | foreach-object {$null = $_ -match '<FancyPants>(?<x>.*)<.FancyPants>'; $matches.x}

But I thought I'd also sort them, remove duplicates...

cat test.txt | foreach-object {$null = $_ -match '<FancyPants>(?<x>.*)<.FancyPants>'; $matches.x} | sort | get-unique

But foreach-object can be aliased as % and get-unique can be just "gu" so the final answer is:

cat test.txt | % {$null = $_ -match '<FancyPants>(?<x>.*)<.FancyPants>';$matches.x} | sort | gu

I think we can agree at they are both hard to read. I still love PowerShell.

Related Links:



© 2011 Scott Hanselman. All rights reserved.



Viewing all articles
Browse latest Browse all 1148

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>