How do I get rid of HTML tags?

Instantly remove HTML tags from a string of content with this online tool. Enter all of the code for a web page, or just part of one, and the tool will automatically remove all the HTML elements, leaving just the text content you want.

This JavaScript-based tool will also extract the text of HTML button elements and the title element alongside regular text content.

If you need to remove HTML tags then give it a whirl - it works pretty darn well at stripping out those unwanted HTML elements.

How to Remove HTML Tags from Text

This is just a bit of a technical note about removing HTML elements using JavaScript code. If you're not into the technical details, skip this part and use the HTML stripper tool above.

Generally it's preferable to use an approach that leverages the DOM in a graceful way to find and remove the HTML content over an approach that just uses Regular Expressions to find and remove HTML tags.

Because you will encounter malformed HTML, the regex approach can fail in spectacular ways, so here I tried to leverage the JavaScript innerText property to get the job done in a more dependable way.
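To make the failure mode concrete, here is a minimal sketch of a tags-only regex tripping over a perfectly legal attribute value. The helper name `naiveStripTags` and the sample strings are made up for illustration, and the pattern is just a common naive one, not necessarily what any particular tool uses.

```javascript
// Naive approach: delete anything that looks like <...>.
// `naiveStripTags` is a hypothetical helper name used only for this sketch.
function naiveStripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Simple, well-formed markup works as expected.
const clean = naiveStripTags("<p>Hello <b>world</b></p>");

// A ">" inside an attribute value ends the match too early,
// so part of the attribute leaks into the output.
const leaky = naiveStripTags('<a title="a > b">link</a>');

console.log(JSON.stringify(clean)); // "Hello world"
console.log(JSON.stringify(leaky)); // " b\">link"
```

A real HTML parser knows that the `>` sits inside a quoted attribute value, which is one reason the DOM route tends to be more dependable.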

The Problem with Using innerText

Unfortunately, using the JavaScript innerText property to remove HTML tags doesn't work exactly how I wanted it to, so I had to sweeten the deal with some regular expressions to get the text output I wanted.

The big problem, for me, with using innerText to remove HTML tags was that it removed the script tags themselves but left the code between the opening and closing script tags in the text output. It did the same for style tags, so any on-page style rules also ended up in the output.
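One way to work around this, sketched below, is to drop script and style elements together with their contents before the markup ever reaches the DOM. This is an illustration under assumptions, not the tool's actual code: the function names `dropScriptAndStyle` and `htmlToText` are hypothetical, and the use of `DOMParser` is my choice for the example.

```javascript
// Remove <script> and <style> elements along with everything inside them.
// The \1 backreference makes the closing tag match the opening tag name.
function dropScriptAndStyle(html) {
  return html.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "");
}

// Browser-only step: parse the cleaned markup and read its text.
// DOMParser is a standard browser API and is not available in plain Node.js,
// so this function is defined here but not called.
function htmlToText(html) {
  const doc = new DOMParser().parseFromString(dropScriptAndStyle(html), "text/html");
  return doc.body.innerText;
}

const sample = "<p>Hi</p><script>alert(1)</script><style>p { color: red }</style>";
console.log(dropScriptAndStyle(sample)); // <p>Hi</p>
```

The regex only has to find two well-known element names, which is a much narrower job than stripping every tag, so it is less exposed to the malformed-HTML problems described earlier.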

Although it's optional, I also added a regex that collapses runs of consecutive line breaks, which makes the output more readable.
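A cleanup step along those lines might look like the following sketch. The helper name `collapseBlankLines` is hypothetical, and the decision to keep a single blank line between paragraphs is an assumption about what "more readable" means here.

```javascript
// Collapse runs of blank lines down to a single blank line.
// `collapseBlankLines` is a hypothetical helper name for this sketch.
function collapseBlankLines(text) {
  return text
    .replace(/\r\n?/g, "\n")      // normalize Windows and old Mac line endings
    .replace(/\n\s*\n+/g, "\n\n") // two or more line breaks become exactly two
    .trim();                      // drop leading and trailing whitespace
}

const messy = "Title\r\n\r\n\r\n\r\nBody\n \n\nEnd\n";
console.log(JSON.stringify(collapseBlankLines(messy))); // "Title\n\nBody\n\nEnd"
```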

Anyway, if none of these are deal breakers for you, then I would say just use the innerText property to remove HTML tags from your web content. Otherwise, you'll need to supplement it with some regex, as described above.


Removing HTML Tags from Text in Word

Aaron has a document that contains a number of HTML tags, and he would like to remove the tags but maintain the formatting they represent. For instance, if he has a phrase that appears this way: <i>a phrase</i>, he would like to remove the tags (<i> and </i>) but have "a phrase" appear in italics. Aaron is pretty sure this can be done with Find and Replace, but he's not quite sure how to go about it.

You are right, Aaron—you can use Find and Replace to accomplish the removal. The way you would do it is to follow these steps:

  1. Press Ctrl+H. Word displays the Replace tab of the Find and Replace dialog box.
  2. Click the More button, if it is available. (See Figure 1.)

    Figure 1. The Replace tab of the Find and Replace dialog box.

  3. Make sure the Use Wildcards check box is selected.
  4. In the Find What box, enter the following: \<i\>([!<]@)\</i\>
  5. In the Replace With box, enter the following: \1
  6. With the insertion point still in the Replace With box, press Ctrl+I once. The text "Italic" should appear just below the Replace With box.
  7. Click Replace All.

The code that you enter in the Find What box (step 4) may look a little daunting. All you are telling Word to do is to find the beginning HTML tag (<i>), followed by one or more characters that are not a less-than sign, and ending with the closing HTML tag (</i>). The backslashes are there because < and > have special meanings in a wildcard search. The very short entry in the Replace With box (step 5) simply says to replace whatever is found with the contents of the first element of the Find What box that is surrounded by parentheses, which just happens to be the text between the two HTML tags.
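If you are more at home with regular expressions than with Word wildcards, the same find-and-capture idea can be sketched in JavaScript. This is only an analogy and not something Word runs: Word's [!<]@ corresponds roughly to `[^<]+`, and \1 corresponds to `$1`. Because a plain string cannot carry italic formatting, the hypothetical `unwrapItalicTags` helper below marks the captured phrase with underscores instead.

```javascript
// Regex analogue of the Word wildcard search.
// `unwrapItalicTags` is a hypothetical name; the underscores stand in for
// the italic formatting that Word would apply.
function unwrapItalicTags(text) {
  // <i>, then one or more characters that are not "<" (captured), then </i>
  return text.replace(/<i>([^<]+)<\/i>/g, "_$1_");
}

console.log(unwrapItalicTags("This is <i>a phrase</i> in a sentence."));
// This is _a phrase_ in a sentence.
```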

If you want to eliminate the need to remember (or look up) the contents of the Find What box all the time, you can place the Find and Replace operation into a macro:

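Sub ConvertItalicTags()
    ' Clear any formatting left over from a previous search
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    ' Apply italic formatting to the replacement text
    Selection.Find.Replacement.Font.Italic = True
    With Selection.Find
        ' Wildcard pattern: <i>, one or more non-"<" characters, </i>
        .Text = "\<i\>([!<]@)\</i\>"
        ' \1 is the text captured by the parentheses
        .Replacement.Text = "\1"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub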

Assign the macro to a shortcut key, and you can remove the italic HTML tags anytime you need. You could also expand the macro to make similar changes relative to other HTML tags you may need to remove. You may even want to make sure that alternate tags are dealt with. For instance, HTML uses both <i> and <em> tags to display information in italic, which means you should account for the possibility of both sets of tags in your macro.
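As a rough illustration of handling both sets of tags in one pass, here is the idea expressed as a JavaScript regular expression rather than as a Word macro. The helper name `unwrapEmphasisTags` is hypothetical, and the underscores again stand in for italic formatting; in the macro itself you would more likely repeat the Find and Replace once per tag.

```javascript
// Handle <i>...</i> and <em>...</em> with a single pattern.
// `unwrapEmphasisTags` is a hypothetical name used only for this sketch.
function unwrapEmphasisTags(text) {
  // (i|em) matches either tag name; \1 requires the closing tag to use the same name.
  return text.replace(/<(i|em)>([^<]+)<\/\1>/gi, "_$2_");
}

console.log(unwrapEmphasisTags("<i>one</i> and <em>two</em>"));
// _one_ and _two_
```

Mismatched pairs such as `<i>three</em>` are left untouched, which makes leftover tags easy to spot afterward.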

Of course, there is an entirely different approach you could use to get rid of the HTML tags and still retain the formatting associated with those tags. That would be to save the HTML-encoded text into a text file, open it in your browser, copy the text within the browser window, and paste it directly into a Word document. If all goes well, you would have the desired formatted text in your finished document.

WordTips is your source for cost-effective Microsoft Word training. (Microsoft Word is the most popular word processing software in the world.) This tip (10308) applies to Microsoft Word 2007, 2010, 2013, 2016, 2019, and Word in Microsoft 365.

Author Bio

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company.

How do you remove tags in HTML?

Learn a quick way to remove HTML tags. For HTML tags, you can press Alt+Enter and select Remove tag instead of removing an opening tag and then a closing tag.

How do I get rid of HTML text?

Removing HTML Tags from Text

  1. Press Ctrl+H.
  2. Click the More button, if it is available.
  3. Make sure the Use Wildcards check box is selected.
  4. In the Find What box, enter the following: \<i\>([!<]@)\</i\>
  5. In the Replace With box, enter the following: \1
  6. With the insertion point still in the Replace With box, press Ctrl+I once.

How do I get rid of HTML in Google?

Removing Formatting & Stripping HTML

  1. Select all the text in the document by pressing "Ctrl" and "A" simultaneously.
  2. Press "Ctrl" and the backslash key ("\") to strip all formatting from the document.
  3. Download the document as an HTML file and open the file in your preferred plain text editor.
  4. Delete the stylesheet.

How do I strip a string in HTML?

JavaScript offers several ways to strip all the HTML tags from a string. You can use the string replace() function with a regular expression, or you can read the .textContent or .innerText property of an HTML DOM element.