Guide: stripping HTML tags in JS

Question

cleanText = strInputCode.replace(/<\/?[^>]+(>|$)/g, "");

Distilled from this website (web.archive).

This regex looks for <, an optional slash /, one or more characters that are not >, then either > or $ (the end of the string).

Examples:

'<div>Hello</div>' ==> 'Hello'
 ^^^^^     ^^^^^^
'Unterminated Tag <b' ==> 'Unterminated Tag '
                  ^^

But it is not bulletproof:

'If you are < 13 you cannot register' ==> 'If you are '
            ^^^^^^^^^^^^^^^^^^^^^^^^
'<div data="score > 42">Hello</div>' ==> ' 42">Hello'
 ^^^^^^^^^^^^^^^^^^          ^^^^^^
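
These cases can be checked directly. The following is a minimal sketch: `stripTags` is a hypothetical helper name wrapping the regex from the question, and the tagged inputs are illustrative assumptions chosen to match the marked spans.

```javascript
// stripTags is a hypothetical helper; the regex is the one from the question.
function stripTags(input) {
  return input.replace(/<\/?[^>]+(>|$)/g, "");
}

// Illustrative inputs (assumed): two that behave well, two that do not.
const cases = [
  '<div>Hello</div>',
  'Unterminated Tag <b',
  'If you are < 13 you cannot register',
  '<div data="score > 42">Hello</div>',
];

for (const input of cases) {
  console.log(JSON.stringify(input), '==>', JSON.stringify(stripTags(input)));
}
```

The last two inputs show the failure mode: a bare < swallows everything up to the next > or the end of the string, and a > inside an attribute value ends the match early.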

If someone is trying to break your application, this regex will not protect you. It should only be used if you already know the format of your input. As other knowledgeable and mostly sane people have pointed out, to safely strip tags, you must use a parser.

If you do not have access to a convenient parser like the DOM, and you cannot trust your input to be in the right format, you may be better off using a package like sanitize-html; other sanitizers are also available.

Comments

Your script works great! Cheers!
this is so cool , i like it
function strip(html)
{
  var tmp = document.createElement("DIV");
  tmp.innerHTML = html;
  return tmp.textContent || tmp.innerText;
}
- This was even better for my needs. No issues with special characters etc…
- That is awful advice!
  If for some reason (like malicious intent of users) the html argument contains a script tag, you’ve now opened up for XSS attacks!!!
  Don’t use the DOM for something that doesn’t require it.
  Also, the DOM is really slow.
- This solution is great for showing a paragraph's inner content in a JS alert window: it strips nbsp and em effectively,
  thanks
- Pushpinder,
  Lovely. Worked great
- If you don’t need to support IE6, maybe try using the DOMParser directly as it won’t download images nor execute scripts:
```
function stripHtml(dirtyString) {
  const doc = new DOMParser().parseFromString(dirtyString, 'text/html');
  return doc.body.textContent || '';
}
```
  Now if you run something like stripHtml(""); it won’t cause issues while still allowing the browser to do the work.
- One-Liner:
  Here’s a one-liner if you happen to be using jQuery anyway:
  txt=$(document.createElement("DIV")).html('Hi').text();
hey!!!..this is so ridiculous..
Thank you for great example
Thanks, this does exactly what I need (and so concisely, too!)
Thanks! A quick note about the regexp: the “i” isn’t needed here because there are no characters to be case-insensitive about. However, it does exactly what you want either way.
Nice, but the parentheses are unnecessary.

.replace(/<[^>]+>/ig,"");
Hi :)
I saw your contact form and i must say i love it!
Do you have a tutorial or something like that? It’s a wonderful one :)^
Hope to hear some news of you,
A french reader,
Florian
Thanks for the script :)
@Ricard: If you want to make a copy of the contact form, just view source or save this page to your local machine ;)
Beautiful site, thank you for the great example
the /i for case insensitivity is definitely recommended.
When using contenteditable, IE produces upper case tags, mozilla would only create lower case… To strip those you need it case insensitive.
- DScout, this is incorrect. There are no specified alphabetical characters in the regular expression – the case insensitivity modifier therefore affects nothing.
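
A quick check of that point. This is a minimal sketch with an assumed upper-case sample string; the pattern contains no literal letters, so the i flag has nothing to act on.

```javascript
// Upper-case tags, as IE's contenteditable reportedly produced (sample string is an assumption).
const upper = '<P>Upper <B>case</B> tags</P>';

const withFlag = upper.replace(/<[^>]+>/ig, '');
const withoutFlag = upper.replace(/<[^>]+>/g, '');

console.log(withFlag);                 // Upper case tags
console.log(withFlag === withoutFlag); // true
```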
Hi
I have following code:
var text = '[$ ssIncludeXml(docName,"wcm:root/wcm:element[@name='innerpage_content']/text()") $]';
var StrippedString = text.replace(/(<([^>]+)>)/ig,"");
where '[$ ssIncludeXml(docName,"wcm:root/wcm:element[@name='innerpage_content']/text()") $]'
is an Idoc script that brings in a block of HTML from a placeholder. But I am getting an "unterminated string literal" error on the first line.
What i want to do is to remove or strip all HTML tags and to get plain text out of that markup.
Kindly let me know if there is any solution.
Thanks
Works great, but doesn’t strip whitespace…
Thank you! It was very useful for me and I think that is useful for everyone.
Thank you again!
Yeah, this solution removed all sorts of HTML, paragraph, line breaks, in-line styles etc etc
This does not work in IE. Please provide a solution for stripping tags in JavaScript that works in all browsers.
Thanks for this script.
It works great.

i am trying it on

var message;

    firstName = document.getElementById("username").value;

    if (firstName == null || firstName == "" || firstName == NaN || firstName == "First Name") {
        message = "Please Add some name.";
        document.body.insertAdjacentHTML("BeforeEnd", "" + message + "");
    }
    else {
        if (document.getElementById("myMessage")) {
            debugger;
            arguments = document.getElementById("myMessage").value.replace(/(<([^>]+)>)/ig, "");
        }
    }

but it is not working and saying

cannot call method ‘replace’ of undefined

Was wondering how this would be implemented if I only wanted to remove the href tags from a string of text, instead of removing all the tags? I’m trying to retrieve a page of text from a website but I only want the plain text with the formatting tags (p, ul, li).
Hope this makes sense, thanks in advance.
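
One way to approach this is to match only anchor tags and leave everything else alone. The following is a minimal sketch under the same caveats as the main regex (known, well-formed input); `stripLinks` is a hypothetical name, not something from the original post.

```javascript
// stripLinks is a hypothetical helper: it removes <a ...> and </a> but keeps the link text
// and every other tag. \b stops it from also matching tags such as <abbr> or <article>.
function stripLinks(input) {
  return input.replace(/<\/?a\b[^>]*>/gi, "");
}

console.log(stripLinks('<p>See <a href="https://example.com">this page</a>.</p><ul><li>item</li></ul>'));
```

Like the other regexes on this page, it is confused by a > inside an attribute value, so it is not suitable for untrusted input.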
This was excellent! Thanks!
your “\S” is missing… or not?
```
/(<\S([^>]+)>)/ig
```
- \S means not whitespace, and ^> means not greater than, so your modified regex only ensures that single character tags will not be replaced.
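
The difference shows up on a short sample. This is a minimal sketch; the sample string is an assumption.

```javascript
const original = /(<([^>]+)>)/ig;
const modified = /(<\S([^>]+)>)/ig; // \S consumes the first character, [^>]+ then needs at least one more

const sample = '<b>bold</b> and <div>block</div>';

const strippedOriginal = sample.replace(original, '');
const strippedModified = sample.replace(modified, '');

console.log(strippedOriginal); // bold and block
console.log(strippedModified); // <b>bold and block
```

The opening <b> survives because nothing is left between the b and the > for [^>]+ to match, while </b> is still removed because the slash counts as the \S character.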
Thanks,
Its working fine.
Cool! This is perfectly working…
What about < b r / > or < h r / > (the self closing tags) ?
Looks like “newInput” doesn’t do anything at all? So it’s either extraneous or there’s a problem with the code.
I have developed same thing using javascript Regular Expression.
It’ll strip all the html tags excluding tag provided in exclude list by user.
source code is also available on github
check here. HTML Tag Stripper
Nice, but it’s not that safe… I’d rather use jQuery:
$("
").text(' ').text();
document.body.innerText
b”> ~ fail
But this code is not working well with HTML table content.
How can strip all tags except anchor and img tags?
You can easily leave out the case sensitivity /i and the grouping ():
```
var noHtml = hasHtml.replace(/<[^>]+>/g, '')
```
using jQuery
jQuery(stringWithTags).text()
jQuery(stringWithTags).text();
it is what i want. tanx…
not working with AngularJS.
Mohammad Mustafa Ahmedzai
Probably the simplest solution I found online. Thanks a bunch for it. Worked just fine!

string.replace(/\n/g, "");
string.replace(/[\t ]+\</g, "<");
string.replace(/\>[\t ]+\</g, "><");
string.replace(/\>[\t ]+$/g, ">");
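
Since replace returns a new string rather than modifying the original, the result of each call has to be kept. The following is a minimal sketch that chains the whitespace replacements; `collapseHtmlWhitespace` is a hypothetical name, and the middle pattern (whitespace between two tags) is assumed from context.

```javascript
// collapseHtmlWhitespace is a hypothetical helper chaining the replacements above.
function collapseHtmlWhitespace(markup) {
  return markup
    .replace(/\n/g, "")          // drop newlines
    .replace(/[\t ]+</g, "<")    // whitespace before a tag
    .replace(/>[\t ]+</g, "><")  // whitespace between two tags
    .replace(/>[\t ]+$/g, ">");  // trailing whitespace after the last tag
}

console.log(collapseHtmlWhitespace("<ul>\n  <li>a</li>\n  <li>b</li>\n</ul>  "));
// <ul><li>a</li><li>b</li></ul>
```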

Doesn’t anyone see how this solution greatly affects this text:
Rounded amounts < 3 are way easier for people to use in calculations, since they are so tiny than numbers that are >=3
Becomes: Rounded amounts =3
- This one is better; phpjs.org/functions/strip_tags/

Safe way to use the DOM to strip html.

function striptags(content) {
  var frag = document.createDocumentFragment();
  var innerEl = document.createElement('div');
  frag.appendChild(innerEl);
  innerEl.innerHTML = content;
  return frag.firstChild.innerText;
}
striptags('');

I chucked together a function that allows some tags to be kept, similar to how the php function works.
As with PHP it comes with the following two caveats:
Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
and

This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.
```
/**
 * Native javascript function to emulate the PHP function strip_tags.
 * 
 * @param {string} str The original HTML string to filter.
 * @param {array|string} allowable_tags A tag name or array of tag
 * names to keep. Integers, objects, and strings that don't follow the
 * standard tag format of a letter followed by numbers and letters will
 * be ignored. This means that invalid tags will also be removed.
 * @return {string} The filtered HTML string.
 */
function strip_tags(str, allowable_tags) {
    allowable_tags = [].concat(allowable_tags);
    var keep = '';
    allowable_tags.forEach(function(tag) {
        if (('' + tag).match(/^[a-z][a-z0-9]*$/i))
            keep += (keep.length ? '|' : '') + tag;
    } );
    return str.replace(new RegExp(']+>', 'ig'), '');
}
```
Additional checks have been implemented to prevent invalid tags from being removed where possible, by ensuring that the opening of each tag starts with a potential tag name; it does not account for greater than symbols within attributes. Comments will be retained but can be removed with a similar regex.
```
var no_comments = strip_tags('This is not a comment. <!-- a comment -->').replace(/<!--[\s\S]*?-->/g, '');
```
- Hi!
  I hate to bother you, but it looks like the last line of your function has been corrupted somehow – that’s not a valid Regex. Any chance you could fix it?
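
The original pattern cannot be recovered from the corrupted line, so what follows is only one plausible reconstruction based on the behaviour described above: keep the listed tags, require a tag-like start, retain comments. The name `stripTagsKeeping` and the exact pattern are assumptions, not the author's code.

```javascript
// stripTagsKeeping is a hypothetical reconstruction, not the original function.
function stripTagsKeeping(str, allowableTags) {
  // Accept a single tag name or an array; ignore anything that is not a plain tag name.
  const keep = [].concat(allowableTags === undefined ? [] : allowableTags)
    .map(String)
    .filter((tag) => /^[a-z][a-z0-9]*$/i.test(tag));

  // Negative lookahead: skip opening and closing forms of the kept tags.
  const skip = keep.length ? '(?!\\/?(?:' + keep.join('|') + ')\\b)' : '';

  // Require an optional slash followed by a letter, so "< 3" and comments are left alone.
  const pattern = new RegExp('<' + skip + '\\/?[a-z][^>]*>', 'ig');
  return str.replace(pattern, '');
}

console.log(stripTagsKeeping('<p>Hi <b>there</b></p>', 'p')); // <p>Hi there</p>
console.log(stripTagsKeeping('a < 3 and <i>b</i>'));          // a < 3 and b
```

As with strip_tags() in PHP, attributes on the kept tags are passed through untouched, and a > inside an attribute value still breaks the match.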

Hi guys! I am currently facing a javascript problem with the regex / replace function you mention here.
I would like to strip some of the HTML tags from a text.

For this I use the function:

var regex = /(<([^>]+)>)/ig;
bodyValue = bodyValue.replace(regex, "");

Here all tags are deleted.

But I want to keep the <p> and <br> tags and found these two separate functions that worked for me:

var regex = /<(?!\s*\/?\s*p\b)[^>]*>/gi; // deletes all HTML except <p>

var regex = /<(?!br\s*\/?)[^>]+>/gi; // deletes all HTML except for <br>

Do you know how to combine the two conditions in one?
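
One possible way to combine them is a single negative lookahead with an alternation. This is a minimal sketch, assuming the two tags to keep are p and br, as the lookaheads in the two patterns suggest; the sample markup is made up.

```javascript
// Keeps <p>, </p>, <br> and <br/>; removes every other tag.
// \b prevents tags like <pre> from being treated as <p>.
const keepPAndBr = /<(?!\s*\/?\s*(?:p|br)\b)[^>]*>/gi;

const sampleHtml = '<div><p>One<br/>two</p> <span>three</span></div>';
const cleaned = sampleHtml.replace(keepPAndBr, '');

console.log(cleaned); // <p>One<br/>two</p> three
```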

This not only removes the offending characters, but also the rest of the text.

What’s the HTML you’re working with?

Why don’t you use Element.textContent?

Just what I needed…Thanks

.replace(/(<([^> ]+)>)/ig, "")
added a space after the chevron to allow for things like: “< heey >”

Another tip: use the browser’s ability to remove tags:

const fakeDiv = document.createElement("div");
fakeDiv.innerHTML = html;
document.getElementById("stripped").textContent = fakeDiv.textContent || fakeDiv.innerText || "";

Hello sir. Please I wish to know if I can get help from you.
I have a frontend submission which users can share their article but will want to remove every link on the form.
Is there a way to do this only for the post submitted by users who are not admin?
Thanks
I already have the frontend post set and it works properly except what I am seeking for help.