Friday, 23 August 2013

How to match a string of more than one character in a regex replace expression in VB.NET


Right, I'm trying to remove some quotes from an XML file I've downloaded from
Wikipedia. So far the text looks like this (ignore the line breaks, they are
just there to make it easier to read):
'''Anarchism''' is a political philosophy that advocates stateless
societies based on
non-hierarchical free associations.<ref name="iaf-ifa.org"/><ref>"That is why
Anarchy, when it works to destroy authority in all its aspects, when it
demands
the abrogation of laws and the abolition of the mechanism that serves to
impose them, when it refuses all hierarchical organization and preaches
free agreement - at the same time strives to maintain and enlarge the
precious kernel of social customs without which
no human or animal society can exist." Peter Kropotkin.
http://www.theanarchistlibrary.org/HTML/Petr_Kropotkin__Anarchism__its_philosophy_and_ideal.html
Anarchism: its philosophy and ideal</ref><ref>"anarchists are opposed to
irrational (e.g., illegitimate)
authority, in other words, hierarchy - hierarchy being the
institutionalisation of authority
within a society."
http://www.theanarchistlibrary.org/HTML/The_Anarchist_FAQ_Editorial_Collective__An_Anarchist_FAQ__03_17_.html#toc2
"B.1
Why are anarchists against authority and hierarchy?" in An
Anarchist FAQ</ref><ref>"ANARCHISM, a social philosophy that rejects
authoritarian government and maintains that voluntary institutions are best
suited to express man's natural social tendencies." George Woodcock.
"Anarchism" at The Encyclopedia of Philosophy</ref><ref>"In a society
developed on these lines, the voluntary
associations which already now begin to cover all the fields of human
activity
would take a still greater extension so as to substitute themselves for the
state in all its functions."
http://www.theanarchistlibrary.org/HTML/Petr_Kropotkin___Anarchism__from_the_Encyclopaedia_Britannica.html
Peter Kropotkin. "Anarchism" from the Encyclopædia Britannica</ref>
Anarchism holds the state
to be undesirable, unnecessary, or harmful
All I want from this block of text is this:
Anarchism is a political philosophy that advocates stateless societies
based on non-hierarchical free associations. Anarchism holds the state to
be undesirable, unnecessary, or harmful.
It seems to me that if I remove all text between "<ref" and "/ref>", I
should be able to capture all the undesirable text and remove it.
This is the code I have at the moment:
Dim temptext As String = newsrt.ToString
Dim expression As New Regex("(?<=\<ref)[^/ref>]+(?=/ref>)")
Dim resul As String = expression.Replace(temptext, "")
But this doesn't seem to work. None of the text between <ref and /ref> is
matched, so nothing is replaced with "".
Any help or advice would be great! Thanks.
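
One likely cause: [^/ref>] is a character class, so it matches any single character that is not "/", "r", "e", "f" or ">", rather than "anything up to the string /ref>". The run therefore stops at the first r, e or f in the text, and the lookahead (?=/ref>) fails there. The lookarounds would also leave "<ref" and "/ref>" behind even if the middle matched. Below is a minimal sketch of one alternative, assuming the tags appear literally as <ref ...>...</ref> or <ref .../> as in the sample above. The module name WikiCleanup and the helper StripRefs are hypothetical names used only for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WikiCleanup
    ' Matches either a self-closing <ref .../> tag or a full <ref ...>...</ref> pair.
    ' The self-closing alternative is listed first so it is not treated as an opening tag.
    ' Singleline lets "." match newlines, and ".*?" is lazy so each ref is removed separately.
    Private ReadOnly RefPattern As New Regex("<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>",
        RegexOptions.Singleline Or RegexOptions.IgnoreCase)

    ' Hypothetical helper (not from the original post): strips ref tags and bold markers.
    Public Function StripRefs(input As String) As String
        Dim withoutRefs As String = RefPattern.Replace(input, "")
        ' Remove the wiki bold markers (three consecutive apostrophes).
        Return withoutRefs.Replace("'''", "")
    End Function

    Sub Main()
        Dim sample As String = "'''Anarchism''' is a philosophy.<ref name=""a""/><ref>""quote"" source</ref> Anarchism holds"
        ' Prints: Anarchism is a philosophy. Anarchism holds
        Console.WriteLine(StripRefs(sample))
    End Sub
End Module
```

The lazy quantifier .*? is what stands in for "everything up to the next </ref>". A regex is only an approximation here and could misbehave on nested or malformed tags, so it is worth checking the result against a few real paragraphs.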
