PostgreSQL: convert HTML to text
Use xpath
Feed your database with the XML datatype, not with "second class" TEXT, because it is very simple to convert HTML into XHTML (see HTML-Tidy or the standard DOM).
! IT IS FAST AND VERY SAFE !
The common information retrieval need is not the full content but something inside the XHTML, so the power of XPath is what you want.
Example: retrieve all paragraphs with ...
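The XPath approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the table `pages`, its `xhtml` column, and the `class="note"` attribute are hypothetical names, not taken from the original answer. It also assumes PostgreSQL was built with libxml support and that the stored document has no default XML namespace (a namespaced XHTML document would need the optional namespace-array argument of `xpath`).

```sql
-- Hypothetical table and sample row, for illustration only.
CREATE TEMP TABLE pages (id int, xhtml xml);

INSERT INTO pages VALUES
  (1, XMLPARSE(DOCUMENT
      '<html><body><p class="note">Hello <b>world</b></p><p>Other</p></body></html>'));

-- xpath() returns xml[]; select the text nodes under the matching <p>,
-- cast them to text[], and join the fragments into one string.
SELECT id,
       array_to_string(
         xpath('//p[@class="note"]//text()', xhtml)::text[],
         ''
       ) AS note_text
FROM pages;
-- note_text for id 1: Hello world
```

Selecting `//text()` under the paragraph picks up text inside nested tags such as `<b>`, which is why no tag stripping is needed afterwards.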
Regex solutions: I do not recommend them, because regex is not an "information retrieval" solution and, as @James and others commented here, the regex solution is not so safe. I like "pure SQL"; for me it is better than using Perl (see @Daniel's solution) or another language.
See this and many other variations at siafoo.net, eskpee.wordpress, ... and here at Stack Overflow.

Author - Kailash

Problem : How to create a function in Postgres that will remove HTML tags from a piece of text?

Solution : Create function in Postgres :

CREATE OR REPLACE FUNCTION strip_tags(TEXT) RETURNS TEXT AS $$
    SELECT regexp_replace($1, '<[^>]*>', '', 'g')
$$ LANGUAGE SQL;

How to use : SELECT strip_tags('

Note: This function removes everything between a < and a > symbol. If the HTML tags are malformed, part of your text may also be removed, so check your HTML before passing it through this function.
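As a hedged illustration of the regex approach and of the caveat in the note, the following sketch defines the same `strip_tags` function and calls it on two made-up input strings (the inputs are assumptions for demonstration, not the blog's original usage example).

```sql
-- Same definition as in the post: delete every "<...>" run, globally.
CREATE OR REPLACE FUNCTION strip_tags(TEXT) RETURNS TEXT AS $$
    SELECT regexp_replace($1, '<[^>]*>', '', 'g')
$$ LANGUAGE SQL;

-- Well-formed markup: only the tags are removed.
SELECT strip_tags('<p>Hello <b>world</b></p>');
-- result: Hello world

-- Plain text containing "<" and ">": everything between them is lost.
SELECT strip_tags('if a < b and c > d then');
-- result: if a  d then   (the span "< b and c >" was deleted)
```

The second call shows why the input should be checked first: the pattern cannot tell a real tag from a stray pair of angle brackets.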