Detect and remove URLs from textarea
Try (Corrected and improved after comments):
value = value.replace(/^(\[url=)?(https?:\/\/)?(www\.|\S+?\.)(\S+?\.)?\S+$\s*/mg, '');
Peeling the expression from end to start:
- An address might have two or three 'parts', besides the scheme
- An address might start with www or not
- It may be preceded by http:// or https://
- It may be enclosed inside [url=...]...[/url]
This expression does not enforce fully correct URL syntax; that would take a much tougher regex to write.
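To see what the expression does in practice, here is a minimal sketch run against a plain string instead of a real textarea. The sample text and the variable names are made up for illustration. Note that a line is only removed when the whole line looks like an address, and that a bare example.org counts as one.

```javascript
// Sample textarea content (invented for this illustration).
const sample = [
  'Hello there',
  'http://www.example.com',
  '[url=http://example.com]site[/url]',
  'example.org',
  'see http://example.com for details',
  'Bye'
].join('\n');

// The expression from the answer: a whole line that looks like an address,
// plus any whitespace (including the line break) that follows it.
const urlLine = /^(\[url=)?(https?:\/\/)?(www\.|\S+?\.)(\S+?\.)?\S+$\s*/mg;

const cleaned = sample.replace(urlLine, '');
console.log(cleaned);
// Hello there
// see http://example.com for details
// Bye
```

The line with surrounding words survives because the pattern is anchored at both ends of the line.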
A few improvements you might want:
1. Awareness of spaces
value = value.replace(/^\s*(\[\s*url\s*=\s*)?(https?:\/\/)?(www\.|\S+?\.)(\S+?\.)?\S+\s*$\s*/mg, '');
2. Enforce no dots in the last part
value = value.replace(/^(\[url=)?(https?:\/\/)?(www\.|\S+?\.)(\S+?\.)?[^.\s]+$\s*/mg, '');
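A rough sketch of what each variant changes, using invented sample strings. The space-aware version catches an address padded with blanks, which the first expression would leave alone. The second version refuses a line whose last character is a dot, so lines such as "e.g." or "Wait..." are kept (the first expression would strip all three lines of the second sample).

```javascript
// The two variants from the answer, given names for this illustration.
const withSpaces = /^\s*(\[\s*url\s*=\s*)?(https?:\/\/)?(www\.|\S+?\.)(\S+?\.)?\S+\s*$\s*/mg;
const noTrailingDot = /^(\[url=)?(https?:\/\/)?(www\.|\S+?\.)(\S+?\.)?[^.\s]+$\s*/mg;
// The first expression, for comparison.
const original = /^(\[url=)?(https?:\/\/)?(www\.|\S+?\.)(\S+?\.)?\S+$\s*/mg;

// 1. An address surrounded by spaces.
const padded = '  www.example.com  \nkeep me';
const paddedCleaned = padded.replace(withSpaces, '');
console.log(JSON.stringify(paddedCleaned)); // "keep me"

// 2. Lines that end in a dot are no longer treated as addresses.
const dotted = 'e.g.\nexample.com\nWait...';
const dottedCleaned = dotted.replace(noTrailingDot, '');
console.log(JSON.stringify(dottedCleaned)); // "e.g.\nWait..."
```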
Regarding your attempt at checking if there is a URL in the textarea:
if ($('textarea[name="test"]').val().indexOf('[url') >= 0 ||
$('textarea[name="test"]').val().match(/^http([s]?):\/\/.*/) ||
$('textarea[name="test"]').val().match(/^www.[0-9a-zA-Z',-]./)) {
Firstly, rather than getting the textarea value three times using multiple function calls, it would be better to store it in a variable before the checking, i.e.
var value = $('textarea[name="test"]').val();
The /^http([s]?):\/\/.*/ regex, because of the ^, will only match if the "http://..." is found right at the beginning of the textarea value. The same applies to the ^www. regex. Adding the multiline flag m to the end of the regex would make ^ match the start of each line, rather than just the start of the string.
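A quick illustration of the difference the m flag makes, with an invented two-line value where the URL is on the second line:

```javascript
// The URL is not at the start of the string, only at the start of a line.
const text = 'first line\nhttp://example.com';

const withoutFlag = /^https?:\/\//.test(text);  // ^ means start of the string
const withFlag = /^https?:\/\//m.test(text);    // ^ means start of any line

console.log(withoutFlag); // false
console.log(withFlag);    // true
```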
The .* in /^http([s]?):\/\/.*/ serves no purpose as it matches zero or more characters. The ([s]?) is better as s?.
In /^www.[0-9a-zA-Z',-]./, the . needs to be escaped to match a literal . if that is your intention, i.e. \., and I assume you mean to match more than one of the characters in the character class, so you need to follow it with +.
It is more efficient to use the RegExp test method rather than match when the actual matches are not required, so, combining the above, you could have:
if ( /^(\[url|https?:\/\/|www\.)/m.test( value ) ) {
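As a sketch, the combined check could be wrapped in a small helper. The function name containsUrl is hypothetical, and it takes a plain string rather than reading the textarea itself.

```javascript
// Hypothetical helper: true if any line starts with [url, http://, https:// or www.
function containsUrl(value) {
  // test returns a boolean and does not build an array of matches.
  return /^(\[url|https?:\/\/|www\.)/m.test(value);
}

console.log(containsUrl('hello\nwww.example.com'));     // true
console.log(containsUrl('[url=http://example.com]x'));  // true
console.log(containsUrl('hello www.example.com'));      // false, not at a line start
console.log(containsUrl('plain text'));                 // false
```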
There is little point in the check anyway if you are only using it to decide whether you need to call replace, because the check is implicit in the replace call itself.
Using the simple criterion that strings of non-space characters at the start of a line and beginning with http[s]://, [url or www. should be removed, you could use
value = value.replace( /^(?:https?:\/\/|\[url|www\.)\S+\s*/gm, '' );
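For illustration, here is that replacement applied to an invented multi-line value. Only the runs that start a line are removed; the www. address in the middle of a line is left alone.

```javascript
// Invented sample value; in the page it would come from the textarea.
const lines = [
  'Intro line',
  'http://example.com/page',
  'www.example.org',
  '[url=http://x.com]x[/url]',
  'Visit www.example.org today',
  'End'
].join('\n');

const lineStartCleaned = lines.replace(/^(?:https?:\/\/|\[url|www\.)\S+\s*/gm, '');
console.log(lineStartCleaned);
// Intro line
// Visit www.example.org today
// End
```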
If the urls can appear anywhere you could use \b, meaning word boundary, instead of ^, and remove the m flag.
value = value.replace( /(?:\bhttps?:\/\/|\bwww\.|\[url)\S+\s*/g, '' );
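A small sketch of the "anywhere" version on invented input. The second sample shows one of its limits: the match stops at the first space, so a [url=...] tag whose link text contains a space is only partly removed.

```javascript
const anywhere = /(?:\bhttps?:\/\/|\bwww\.|\[url)\S+\s*/g;

// URLs in the middle of a sentence are removed along with trailing whitespace.
const sentence = 'Visit www.example.org today or http://example.com/page now.';
const sentenceCleaned = sentence.replace(anywhere, '');
console.log(sentenceCleaned); // Visit today or now.

// Limitation: \S+ stops at the space inside the link text.
const tagged = '[url=http://x.com]my site[/url]';
const taggedCleaned = tagged.replace(anywhere, '');
console.log(taggedCleaned); // site[/url]
```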
It would be a waste of effort to try to offer a better regex solution without precise details of what forms of url may appear in the textarea, where they may appear and what characters may adjoin them.
If any valid url can appear anywhere in the textarea and be surrounded by any other characters, then there is no watertight solution.