How to strip HTML tags in JavaScript

cleanText = strInputCode.replace(/<\/?[^>]+(>|$)/g, "");

Distilled from this website (web.archive).

Contents

  • How do you strip a tag in HTML?
  • How remove HTML tag from string in react?
  • How do I remove HTML tags from text in Excel?
  • How do I strip text formatting?

This regex looks for <, an optional slash /, one or more characters that are not >, then either > or $ (the end of the string, since the m flag is not set).

Examples:

'<div>Hello</div>' ==> 'Hello'
 ^^^^^     ^^^^^^
'Unterminated Tag <b' ==> 'Unterminated Tag '
                  ^^

But it is not bulletproof:

'If you are < 13 you cannot register' ==> 'If you are '
            ^^^^^^^^^^^^^^^^^^^^^^^^
'<div data="score > 42">Hello</div>' ==> ' 42">Hello'
 ^^^^^^^^^^^^^^^^^^          ^^^^^^
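The behaviour described above can be checked directly. This is a minimal sketch, assuming inputs like those in the examples; the helper name `stripTagsNaive` is illustrative and not part of the original answer.

```javascript
// Naive tag stripper built on the regex from the answer above.
// Hypothetical helper name; only suitable when the input format is known.
function stripTagsNaive(input) {
  return input.replace(/<\/?[^>]+(>|$)/g, "");
}

const samples = [
  "<div>Hello</div>",
  "Unterminated Tag <b",
  "If you are < 13 you cannot register",
  '<div data="score > 42">Hello</div>',
];

// Print each input next to its stripped result.
for (const sample of samples) {
  console.log(JSON.stringify(sample), "==>", JSON.stringify(stripTagsNaive(sample)));
}
```

The last two inputs show the failure modes: a bare < swallows the rest of the string, and a > inside an attribute value ends the match too early.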

If someone is trying to break your application, this regex will not protect you. It should only be used if you already know the format of your input. As other knowledgeable and mostly sane people have pointed out, to safely strip tags, you must use a parser.

If you do not have access to a convenient parser like the DOM, and you cannot trust your input to be in the right format, you may be better off using a package like sanitize-html; other sanitizers are also available.

Comments

  1. Your script works great! Cheers!

  2. this is so cool , i like it

  3. function strip(html)
    {
    var tmp = document.createElement("DIV");
    tmp.innerHTML = html;
    return tmp.textContent || tmp.innerText;
    }

    • This was even better for my needs. No issues with special characters etc…

    • That is awful advice!

      If for some reason (like malicious intent of users) the html argument contains a script tag, you’ve now opened up for XSS attacks!!!

      Don’t use the DOM for something that doesn’t require it.

      Also, the DOM is really slow.

    • This solution is great for using the inner content of a paragraph in a JS alert window. It strips nbsp and em effectively,
      thanks

    • Pushpinder,
      Lovely. Worked great

    • If you don’t need to support IE6, maybe try using the DOMParser directly as it won’t download images nor execute scripts:

      function stripHtml(dirtyString) {
        const doc = new DOMParser().parseFromString(dirtyString, 'text/html');
        return doc.body.textContent || '';
      }
      

      Now if you run something like stripHtml("…") on a string that contains image or script markup, it won’t cause issues while still allowing the browser to do the work.

    • One-Liner:

      Here’s a one-liner if you happen to be using jQuery anyway:

      txt=$(document.createElement("DIV")).html('Hi').text();

  4. hey!!!..this is so ridiculous..

  5. Thank you for great example

  6. Thanks, this does exactly what I need (and so concisely, too!)

  7. Thanks! A quick note about the regexp: the “i” isn’t needed here because there are no characters to be case-insensitive about. However, it does exactly what you want either way.

  8. Nice, but the parentheses are unnecessary.

    .replace(/<[^>]+>/ig, "");

  9. Hi :)

    I saw your contact form and I must say I love it!
    Do you have a tutorial or something like that? It’s a wonderful one :)
    Hope to hear some news of you,

    A french reader,

    Florian

  10. Thanks for the script :)

    @Ricard: If you want to make a copy of the contact form, just view source or save this page to your local machine ;)

  11. Beautiful site, thank you for the great example

  12. the /i for case insensitivity is definitely recommended.
    When using contenteditable, IE produces upper case tags, mozilla would only create lower case… To strip those you need it case insensitive.

    • DScout, this is incorrect. There are no specified alphabetical characters in the regular expression – the case insensitivity modifier therefore affects nothing.
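The point about the i flag can be verified with a short sketch. The sample string is an assumption standing in for the upper-case markup the comment attributes to IE.

```javascript
// The pattern /<[^>]+>/ contains no literal letters,
// so the "i" flag has nothing to act on.
const upperCaseMarkup = "<P>Typed in <B>IE</B></P>";

const withFlag = upperCaseMarkup.replace(/<[^>]+>/ig, "");
const withoutFlag = upperCaseMarkup.replace(/<[^>]+>/g, "");

console.log(withFlag === withoutFlag); // true
console.log(withoutFlag);              // "Typed in IE"
```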

  13. Hi

    I have following code:

    var text = '[$ ssIncludeXml(docName,"wcm:root/wcm:element[@name='innerpage_content']/text()") $]';
    var StrippedString = text.replace(/(<([^>]+)>)/ig, "");

    where '[$ ssIncludeXml(docName,"wcm:root/wcm:element[@name='innerpage_content']/text()") $]'
    is Idoc script that brings a block of HTML from a placeholder. But I am getting an “unterminated string literal” error at the first line.

    What i want to do is to remove or strip all HTML tags and to get plain text out of that markup.

    Kindly let me know if there is any solution.

    Thanks

  14. Works great, but doesn’t strip whitespace…
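One way to deal with the leftover whitespace is a second pass after the tags are removed. This is a sketch only; the helper name `stripTagsAndCollapse` is hypothetical, and it assumes that collapsing every run of whitespace into a single space is acceptable.

```javascript
// Strip tags with the regex from the article, then tidy whitespace.
// Hypothetical helper; not part of the original script.
function stripTagsAndCollapse(input) {
  return input
    .replace(/(<([^>]+)>)/ig, "") // remove anything that looks like a tag
    .replace(/\s+/g, " ")         // collapse runs of spaces, tabs, newlines
    .trim();                      // drop leading and trailing whitespace
}

console.log(stripTagsAndCollapse("<p>  Hello </p>\n<p>world</p>")); // "Hello world"
```

Adjacent tags with no whitespace between them still run their text together, so "<p>Hello</p><p>world</p>" yields "Helloworld".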

  15. Thank you! It was very useful for me and I think that is useful for everyone.
    Thank you again!

  16. Yeah, this solution removed all sorts of HTML, paragraph, line breaks, in-line styles etc etc

  17. This does not work in IE. Please provide a solution for stripping tags in JavaScript that works in all browsers.

  18. Thanks for this script
    It works great

  19. i am trying it on

    var message;

        firstName = document.getElementById("username").value;
    
        if (firstName == null || firstName == "" || firstName == NaN || firstName == "First Name") {
            message = "Please Add some name.";
            document.body.insertAdjacentHTML("BeforeEnd", "" + message + "");
        }
        else {
            if (document.getElementById("myMessage")) {
                debugger;
                arguments = document.getElementById("myMessage").value.replace(/(<([^>]+)>)/ig, "");
            }
        }
    

    but it is not working and saying

    cannot call method ‘replace’ of undefined
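That error means the expression in front of .replace evaluated to undefined, which happens when the element that was found has no value property (a div rather than an input or textarea, for instance). Here is a defensive sketch; `readStrippedValue` is a hypothetical helper that accepts any object so it can also be run outside the browser.

```javascript
// Read a string "value" from an element-like object and strip tags from it.
// Falls back to "" when the element is missing or has no string value.
function readStrippedValue(element) {
  const raw = element && typeof element.value === "string" ? element.value : "";
  return raw.replace(/(<([^>]+)>)/ig, "");
}

console.log(readStrippedValue({ value: "<b>Hi</b> there" })); // "Hi there"
console.log(readStrippedValue({}));                            // ""
console.log(readStrippedValue(null));                          // ""
```

In the page from the comment it would be called as readStrippedValue(document.getElementById("myMessage")).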

  20. Was wondering how this would be implemented if I only wanted to remove the href tags from a string of text, instead of removing all the tags? I’m trying to retrieve a page of text from a website but I only want the plain text with the formatting tags (p, ul, li).
    Hope this makes sense, thanks in advance.
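Removing only the link tags while leaving the other markup in place could look like the sketch below. The helper name `stripLinks` and the sample markup are assumptions, and the pattern shares the limits discussed earlier: a > inside an attribute value will confuse it.

```javascript
// Remove opening and closing <a> tags but keep their inner text
// and every other tag. Hypothetical helper for illustration.
function stripLinks(html) {
  // \b stops the pattern from matching tags such as <abbr> or <article>.
  return html.replace(/<\/?a\b[^>]*>/gi, "");
}

const page = '<p>Read <a href="https://example.com">this page</a>.</p><ul><li>One</li></ul>';
console.log(stripLinks(page)); // "<p>Read this page.</p><ul><li>One</li></ul>"
```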

  21. This was excellent! Thanks!

  22. your “\S” is missing… or not?

    /(<\S([^>]+)>)/ig
    
    • \S means not whitespace, and ^> means not greater than, so your modified regex only ensures that single character tags will not be replaced.
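The effect of the extra \S can be seen by running both patterns on the same input. The sample string is an assumption chosen to include a single-character tag.

```javascript
const markup = "<b>bold</b> <em>x</em>";

// With \S the pattern needs at least two characters between < and >,
// so the single-character opening tag <b> survives.
const withNonSpace = markup.replace(/(<\S([^>]+)>)/ig, "");

// The pattern from the article removes every tag.
const original = markup.replace(/(<([^>]+)>)/ig, "");

console.log(withNonSpace); // "<b>bold x"
console.log(original);     // "bold x"
```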

  23. Thanks,
    Its working fine.

  24. Cool! This is perfectly working…

  25. What about < b r / > or < h r / > (the self closing tags) ?
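Self-closing tags written without the extra spaces are matched like any other tag, because the slash is simply one of the characters that is not >. A quick sketch with an assumed sample string:

```javascript
const withVoidTags = "line one<br />line two<hr/>line three";

// The slash before > is consumed by [^>]+ like any other character.
const flattened = withVoidTags.replace(/(<([^>]+)>)/ig, "");

console.log(flattened); // "line oneline twoline three"
```

The text on either side of a br is joined with no separator, so a line break may need to be substituted before stripping if that matters.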

  26. Looks like “newInput” doesn’t do anything at all? So it’s either extraneous or there’s a problem with the code.

  27. I have developed same thing using javascript Regular Expression.
    It’ll strip all the html tags excluding tag provided in exclude list by user.
    source code is also available on github
    check here. HTML Tag Stripper

  28. Nice, but it’s not that safe… I’d rather use jQuery:

    $("<div>").text(stringWithTags).text();

  29. document.body.innerText

    b”> ~ fail

  30. But this code is not working well with HTML table content.

  31. How can strip all tags except anchor and img tags?

  32. You can easily leave out the case-insensitivity flag /i and the grouping ():

    var noHtml = hasHtml.replace(/<[^>]+>/g, '')
    
  33. using jQuery
    jQuery(stringWithTags).text()

  34. jQuery(stringWithTags).text();
    it is what i want. tanx…

  35. not working with AngularJS.

  36. Mohammad Mustafa Ahmedzai

    Probably the simplest solution I found online. Thanks a bunch for it. Worked just fine!

  37. string = string.replace(/\n/g, "");
    string = string.replace(/[\t ]+</g, "<");
    string = string.replace(/>[\t ]+</g, "><");
    string = string.replace(/>[\t ]+$/g, ">");
    
  38. Doesn’t anyone see how this solution greatly affects this text:

    Rounded amounts < 3 are way easier for people to use in calculations, since they are so tiny than numbers that are >=3

    Becomes: Rounded amounts =3
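A somewhat stricter pattern avoids this particular failure by only treating < as the start of a tag when a letter follows (optionally after a slash). This is a sketch and not a parser: the helper name `stripTagLike` is hypothetical, and text such as "a<b and c>d" would still be mangled.

```javascript
// Only strip sequences that start like a real tag: "<" or "</"
// followed immediately by a letter. Hypothetical helper.
function stripTagLike(input) {
  return input.replace(/<\/?[a-z][^>]*>/gi, "");
}

const sentence = "Rounded amounts < 3 are easier to use than numbers that are >=3";

console.log(stripTagLike(sentence) === sentence);            // true
console.log(stripTagLike("<p>Rounded <b>amounts</b></p>"));  // "Rounded amounts"
```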

  39. Safe way to use the DOM to strip html.

    function striptags(content) {
      var frag = document.createDocumentFragment();
      var innerEl = document.createElement('div');
      frag.appendChild(innerEl);
      innerEl.innerHTML = content;
      return frag.firstChild.innerText;
    }
    striptags('');
    
  40. I chucked together a function that allows some tags to be kept, similar to how the PHP function strip_tags() works.

    As with PHP it comes with the following two caveats:

    Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.

    and

    This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.

    /**
     * Native javascript function to emulate the PHP function strip_tags.
     * 
     * @param {string} str The original HTML string to filter.
     * @param {array|string} allowable_tags A tag name or array of tag
     * names to keep. Integers, objects, and strings that don't follow the
     * standard tag format of a letter followed by numbers and letters will
     * be ignored. This means that invalid tags will also be removed.
     * @return {string} The filtered HTML string.
     */
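    function strip_tags(str, allowable_tags) {
        // Treat a missing argument as "keep nothing".
        allowable_tags = allowable_tags == null ? [] : [].concat(allowable_tags);
        var keep = '';
        allowable_tags.forEach(function(tag) {
            // Accept single-letter names such as p, a, and b as well.
            if (('' + tag).match(/^[a-z][a-z0-9]*$/i))
                keep += (keep.length ? '|' : '') + tag;
        } );
        // The original pattern was garbled in extraction; this is a
        // reconstruction that skips tags whose name is in the keep list.
        var lookahead = keep ? '(?!\\/?(?:' + keep + ')\\b)' : '';
        return str.replace(new RegExp('<' + lookahead + '\\/?[a-z][^>]*>', 'ig'), '');
    }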
    

    Additional checks have been implemented to prevent invalid tags from being removed where possible, by ensuring that the opening of each tag starts with a potential tag name; it does not account for greater than symbols within attributes. Comments will be retained but can be removed with a similar regex.

    var no_comments = strip_tags('This is not a comment. <!-- This is a comment. -->').replace(/<!--[\s\S]*?-->/g, '');
    
  41. Hi guys! I am currently facing a JavaScript problem with the regex / replace function you mention here.
    I would like to strip some of the HTML tags from a text.

    For this I use the function:

    var regex = /(<([^>]+)>)/ig;
    bodyValue = bodyValue.replace(regex, "");
    

    Here all tags are deleted.

    But I want to keep the <p> and <br> tags and found these two separate functions that worked for me:

    var regex = /<(?!\s*\/?\s*p\b)[^>]*>/gi; // deletes all HTML except <p>

    var regex = /<(?!br\s*\/?)[^>]+>/gi; // deletes all HTML except for <br>
    

    Do you know how to combine the two conditions in one?
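One possible way to combine the two conditions is to put both tag names into a single negative lookahead using alternation. This is a sketch assuming only p and br need to be kept; the variable name `keepParagraphsAndBreaks` and the sample markup are illustrative.

```javascript
// Match any tag unless its name is p or br (opening, closing, or self-closing).
const keepParagraphsAndBreaks = /<(?!\s*\/?\s*(?:p|br)\b)[^>]*>/gi;

const bodyValue = "<div><p>First <b>line</b><br/>second</p></div>";
const cleaned = bodyValue.replace(keepParagraphsAndBreaks, "");

console.log(cleaned); // "<p>First line<br/>second</p>"
```

The \b after the group keeps longer names such as pre from being treated as p.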

  42. This not only removes the offending characters, but also the rest of the text.

    • What’s the HTML you’re working with?

  43. Why don’t you use Element.textContent?

  44. Just what I needed…Thanks

  45. .replace(/(<([^> ]+)>)/ig, "")
    added a space after the chevron to allow for things like: “< heey >”

  46. Another tip: use the browser’s ability to remove tags:

    const fakeDiv = document.createElement("div");
    fakeDiv.innerHTML = html;
    document.getElementById("stripped").textContent = fakeDiv.textContent || fakeDiv.innerText || "";
    
  47. Hello sir. I would like to know if I can get help from you.
    I have a frontend submission form where users can share their articles, but I want to remove every link from the submitted content.
    Is there a way to do this only for posts submitted by users who are not admins?
    Thanks
    I already have the frontend post form set up and it works properly, apart from what I am seeking help with.


How do you strip a tag in HTML?

In PHP, the strip_tags() function strips HTML, XML, and PHP tags from a string. Note: HTML comments are always stripped. This cannot be changed with the allow parameter. Note: This function is binary-safe.

How remove HTML tag from string in react?

To remove HTML tags from a string in React, call replace() with the regex /(<([^>]+)>)/ig. It removes tags together with their attributes and returns a new string.

How do I remove HTML tags from text in Excel?

  1. Open your project in Excel.
  2. Navigate to the cell with the HTML tags you want to delete.
  3. Press Ctrl + H.
  4. Type the HTML tags that you want to delete in the "Find what" field.
  5. Leave the "Replace with" field blank.
  6. Click Replace All.

How do I strip text formatting?

Use Ctrl + A to select all text in a document, then click the Clear All Formatting button to remove the formatting from the text (character-level formatting).