Regex replace html tag content

I am trying to replace HTML content using a regular expression.

from

test test ZZZZZZ test test

to

test test AAAAAA test test

Note that only occurrences of ZZZ outside HTML tags should be replaced with AAA; anything inside a tag must stay unchanged.

Any idea? Thanks a lot in advance.

asked May 18, 2011 at 7:26

iwan

You could walk all the nodes and replace text only in the text nodes (those with nodeType === 3):

Something like:

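// element is assumed to be a jQuery object, e.g. $('#content')
element.find('*:contains(ZZZ)').contents().each(function () {
    if (this.nodeType === 3) // text node: attributes are never touched
        this.nodeValue = this.nodeValue.replace(/ZZZ/g, 'AAA');
});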

Or the same without jQuery:

function replaceText(element, from, to) {
    for (var child = element.firstChild; child !== null; child = child.nextSibling) {
        if (child.nodeType === 3)          // text node: rewrite its content
            child.nodeValue = child.nodeValue.replace(from, to);
        else if (child.nodeType === 1)     // element node: recurse into it
            replaceText(child, from, to);
    }
}

replaceText(element, /ZZZ/g, 'AAA');
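To check the recursive walk outside a browser, here is a minimal sketch of the same technique written against a hypothetical stand-in for DOM nodes (plain objects with nodeType, nodeValue and childNodes). The helper names text and element and the attributes field are assumptions for illustration only; in a real page you would pass an actual DOM element, whose childNodes list is iterable in the same way.

```javascript
const TEXT_NODE = 3;
const ELEMENT_NODE = 1;

// Recursively replace inside text nodes only. Attributes are not child
// nodes, so an href containing ZZZ is never visited.
function replaceTextNodes(node, from, to) {
  for (const child of node.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      child.nodeValue = child.nodeValue.replace(from, to);
    } else if (child.nodeType === ELEMENT_NODE) {
      replaceTextNodes(child, from, to);
    }
  }
}

// Hypothetical mock nodes: just enough structure for replaceTextNodes.
const text = (value) => ({ nodeType: TEXT_NODE, nodeValue: value, childNodes: [] });
const element = (attributes, ...childNodes) =>
  ({ nodeType: ELEMENT_NODE, nodeValue: null, attributes, childNodes });

// Roughly: <div><a href="ZZZ">test test ZZZ<b>test test ZZZ</b></a></div>
const link = element({ href: 'ZZZ' }, text('test test ZZZ'), element({}, text('test test ZZZ')));
const root = element({}, link);

replaceTextNodes(root, /ZZZ/g, 'AAA');
```

After the call, both text nodes read "test test AAA" while link.attributes.href is still "ZZZ".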

answered May 18, 2011 at 7:58

Suor

The best idea in this case is almost certainly not to use regular expressions for this, at least not on their own. Surely JavaScript has an HTML parser somewhere? (In a browser, the DOM itself, or DOMParser, is one.)

If you really must use regular expressions, you could try to look for every instance of ZZZ that is followed by a "<" before any ">". That would look like

ZZZ(?=[^>]*<)

This might break horribly if the markup contains HTML comments or script blocks, or is not well-formed.
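A quick check of the lookahead in JavaScript. The input strings are hypothetical, since the question's original markup did not survive. Because the lookahead does not consume the "<", two ZZZs in the same run of text are both replaced; a ZZZ in trailing text after the last tag is not, since no "<" follows it.

```javascript
// ZZZ, but only if a "<" appears before any ">" (i.e. we are not inside a tag).
const pattern = /ZZZ(?=[^>]*<)/g;

// Hypothetical input: ZZZ appears in an attribute and in text content.
const html = '<a href="ZZZ">test ZZZ</a><i>ZZZ ZZZ</i>';
const result = html.replace(pattern, 'AAA');

// Limitation: text after the last tag has no following "<", so it is skipped.
const trailing = '<b>x</b> ZZZ'.replace(pattern, 'AAA');
```

Here result is '<a href="ZZZ">test AAA</a><i>AAA AAA</i>' and trailing is unchanged.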

answered May 18, 2011 at 7:38

Jens

Assuming a well-formed HTML document with outer/enclosing tags, I would think the easiest way would be to look for the > and < signs:

/(\>[^\>\<]*)ZZZ([^\>\<]*\<)/$1AAA$2/

If you're dealing with HTML fragments that may not have enclosing tags, it gets a little more complicated: you'd also have to allow for the start and end of the string.

Example in JavaScript (sorry, I missed the javascript tag at first):

alert('test test ZZZZZZ test test'.replace(/(\>[^\>\<]*)ZZZ([^\>\<]*\<)/g, "$1AAA$2"));

Explanation: for each match that

  • starts with >: \>
  • is followed by any number of characters that are neither > nor <: [^\>\<]*
  • then contains "ZZZ"
  • is followed by any number of characters that are neither > nor <: [^\>\<]*
  • and ends with <: \<

Replace with

  • everything before the ZZZ, marked with the first capture group (parentheses): $1
  • AAA
  • everything after the ZZZ, marked with the second capture group (parentheses): $2

The "g" (global) flag ensures that every non-overlapping match is replaced, not just the first one.
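A small sketch of the pattern above on a hypothetical input (the question's original markup was lost). It also shows one caveat: each match consumes everything from ">" to "<", and the greedy first group grabs as much as it can, so a single pass replaces only the last ZZZ in each run of text between tags.

```javascript
// ">" + text without tags + ZZZ + text without tags + "<"
const pattern = /(\>[^\>\<]*)ZZZ([^\>\<]*\<)/g;

// Hypothetical input: ZZZ in an attribute and in two text runs.
const html = '<a href="ZZZ">test test ZZZ</a><b>ZZZ test</b>';
const result = html.replace(pattern, '$1AAA$2');

// Caveat: two ZZZs in the same run; only the last one is replaced.
const twice = '<p>ZZZ ZZZ</p>'.replace(pattern, '$1AAA$2');
```

Here result is '<a href="ZZZ">test test AAA</a><b>AAA test</b>', while twice is '<p>ZZZ AAA</p>'.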

answered May 18, 2011 at 7:34

Tao

Try this:

var str = '
ZZZ test test
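test test ZZZ';
// take the value of the first href attribute (throws if there is none)
var rpl = str.match(/href=\"(\w*)\"/i)[1];
// replace that value wherever a "<" follows before any ">", i.e. outside tags
console.log(str.replace(new RegExp(rpl + "(?=[^>]*<)", "gi"), "XXX"));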
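A self-contained sketch of the same idea, with a hypothetical input string standing in for the one lost from the example. Two small assumptions differ from the answer: \w+ instead of \w* (an empty href would otherwise produce a pattern that matches the empty string everywhere), and a guard for the case where no href is found.

```javascript
// Hypothetical input: the href value "ZZZ" also appears as link text.
const str = '<a href="ZZZ">ZZZ test test</a>';

// Extract the href value. \w+ allows only word characters, so the value
// can be dropped into a RegExp without escaping.
const m = str.match(/href="(\w+)"/i);

// Replace the value only where a "<" follows before any ">" (outside tags);
// leave the string alone if no href was found.
const out = m ? str.replace(new RegExp(m[1] + '(?=[^>]*<)', 'gi'), 'XXX') : str;
```

Here out is '<a href="ZZZ">XXX test test</a>': the attribute keeps its value and only the text changes.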

answered May 18, 2011 at 7:41

jerone

Have you tried this?

replace:

>([^<>]*)(ZZZ)([^<>]*)<

with:

>$1AAA$3<

But be aware of all the savvy advice in the post linked in the first comment on your question!
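A single pass of this pattern rewrites at most one ZZZ per run of text between tags, because each match consumes the whole run from ">" to "<". One way around that, sketched here as a variation rather than what the answer wrote, is to match each whole run and do the inner replacement in a replace() callback. The function name replaceOutsideTags and the inputs are hypothetical; from is assumed to be a global RegExp.

```javascript
// Match every run of text between ">" and "<" and rewrite it in a callback,
// so all occurrences in the run are replaced, not just one.
function replaceOutsideTags(html, from, to) {
  return html.replace(/>([^<>]*)</g, (match, text) => '>' + text.replace(from, to) + '<');
}

// Hypothetical input with two ZZZs in one run and one inside an attribute.
const input = '<p>ZZZ ZZZ</p><a href="ZZZ">test ZZZ</a>';
const result = replaceOutsideTags(input, /ZZZ/g, 'AAA');
```

Here result is '<p>AAA AAA</p><a href="ZZZ">test AAA</a>'. Text before the first tag or after the last one is still not matched, for the same reason as with the original pattern.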

answered May 18, 2011 at 7:46

sergio
