Guide: remove .html from URL


Search engines may index both versions of these pages as duplicate content. To overcome this, add a meta tag in the HTML file.

Example:  

Good question, but it seems to have confused people. The answers are almost equally divided between those who thought Dave (the OP) was saving his HTML pages without the .html extension, and those who thought he was saving them as normal (with .html), but wanting the URL to show up without. While the question could have been worded a little better, I think it’s clear what he meant. If he was saving pages without .html, his two questions (‘how to remove .html’ and how to ‘redirect any url with .html’) would be exactly the same question! So that interpretation doesn’t make much sense. Also, his first comment (about avoiding an infinite loop) and his own answer seem to confirm this.

So let’s start by rephrasing the question and breaking down the task. We want to accomplish two things:

  1. Visibly remove the .html if it’s part of the requested URL (e.g. /page.html).
  2. Point the cropped URL (e.g. /page) back to the actual file (/page.html).

There’s nothing difficult about doing either of these things. (We could achieve the second one simply by enabling MultiViews.) The challenge here is doing them both without creating an infinite loop.
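For reference, the MultiViews shortcut is a one-line setting. This is a minimal sketch, assuming mod_negotiation is loaded and the server's AllowOverride setting permits Options in .htaccess. It covers only the second task and does nothing to remove .html from URLs that already include it:

```apache
# Let content negotiation map /page to /page.html (or another matching file)
Options +MultiViews
```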

Dave’s own answer got the job done, but it’s pretty convoluted and not at all portable. (Sorry Dave.) Łukasz Habrzyk seems to have cleaned up Anmol’s answer, and finally Amit Verma improved on them both. However, none of them explained how their solutions solved the fundamental problem: how to avoid an infinite loop. As I understand it, they work because the THE_REQUEST variable holds the original request line from the browser and is not changed by internal rewrites. As such, the condition (RewriteCond %{THE_REQUEST}) can only match the browser's original request. Since it doesn’t match again after a rewrite, you avoid the infinite loop scenario. But then you're dealing with the full HTTP request line (GET, HTTP and all), which partly explains some of the uglier regex examples on this page.
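To make the THE_REQUEST approach concrete, here is a minimal sketch of that style of rule. It is an illustration only, not the exact code from any of the answers mentioned, and the regex is my own assumption about what a reasonable pattern looks like:

```apache
RewriteEngine on

# THE_REQUEST holds the raw request line, e.g. "GET /page.html HTTP/1.1",
# and is not changed by internal rewrites.
# Match the path only (stop at "?" or whitespace) and capture it without .html.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]+)\.html[?\s] [nocase]
RewriteRule ^ /%1 [L,R=301]

# Internally map the extensionless URL back to the file.
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^ %{REQUEST_URI}.html [L]
```

Because the first condition tests the request line and not the current (possibly rewritten) URL, it cannot match again after the second rule has added .html internally.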

I’m going to offer one more approach, which I think is easier to understand. I hope this helps future readers understand the code they’re using, rather than just copying and pasting code they barely understand and hoping for the best.

RewriteEngine on

# Remove .html (or .htm) from visible URL (permanent redirect)
RewriteCond %{REQUEST_URI} ^/(.+)\.html?$ [nocase]
RewriteRule ^ /%1 [L,R=301]

# Quietly point back to the HTML file (internal rewrite; no redirect is sent to the browser):
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^ %{REQUEST_URI}.html [END]

Let’s break it down…

The first rule is pretty simple. The condition matches any URL ending in .html (or .htm), and the rule redirects to the same URL without the filename extension. It's a permanent redirect to indicate that the cropped URL is the canonical one.

The second rule is simple too. The first condition will only pass if the requested filename is not a valid directory (!-d). The second will only pass if the filename refers to a valid file (-f) with the .html extension added. If both conditions pass, the rewrite rule simply adds ‘.html’ to the filename. And then the magic happens… [END]. Yep, that’s all it takes to prevent an infinite loop. The Apache RewriteRule Flags documentation explains it:

Using the [END] flag terminates not only the current round of rewrite processing (like [L]) but also prevents any subsequent rewrite processing from occurring in per-directory (htaccess) context.
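One caveat: [END] is only available in Apache 2.3.9 and later. On an older server, a commonly used workaround is to skip the redirect rule whenever the request has already been rewritten internally. This is a minimal sketch, assuming the REDIRECT_STATUS environment variable is empty on the initial request and set after an internal rewrite, which is how mod_rewrite normally behaves in .htaccess context:

```apache
RewriteEngine on

# Only redirect on the browser's original request, not after an internal rewrite
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_URI} ^/(.+)\.html?$ [nocase]
RewriteRule ^ /%1 [L,R=301]

# Same internal rewrite as before, with [L] in place of [END]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^ %{REQUEST_URI}.html [L]
```

After the second rule fires, the rules run again on /page.html, but the first condition now fails, so no redirect is issued and the loop is avoided.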
