Monday, March 30, 2015

Regular expression to extract the src attribute value from an image tag or HTML source text


A regular expression can pull almost any pattern-shaped data out of an input string. Here we use one to get the src attribute value from an image tag or from HTML text.

Regex to extract the image 'src' attribute:

<img[^>]*src=["']([^"']*)


Sample code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 *
 * @author Uttesh Kumar T.H.
 */
public class ImgTest {

    public static void main(String[] args) {

        String s = "<p><img src=\"38220.png\" alt=\"test\" title=\"test\" /> <img src=\"32222.png\" alt=\"test\" title=\"test\" /></p>";
        // Group 1 captures everything between the opening quote and the next quote.
        Pattern p = Pattern.compile("<img[^>]*src=[\"']([^\"']*)");
        Matcher m = p.matcher(s);
        while (m.find()) {
            // Read the captured value directly instead of slicing the full match.
            String src = m.group(1);
            System.out.println(src);
        }
    }

}
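
The same idea can be wrapped in a reusable helper. The following is only a sketch, not part of the original sample: the class name ImgSrcExtractor and the method extractSrc are hypothetical, and the pattern is a slightly stricter variant that assumes you also want to handle upper-case tags, spaces around '=', single-quoted values, and attributes such as data-src that merely end in "src". A regex is not a full HTML parser, so treat it as good enough for simple, well-formed markup only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, introduced here for illustration only.
public class ImgSrcExtractor {

    // CASE_INSENSITIVE lets <IMG SRC=...> match as well.
    // \ssrc requires whitespace before "src", so data-src is skipped.
    // Group 1 is the opening quote; \1 requires the same quote to close the value.
    // Group 2 is the src value itself.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns every src value found in the given HTML text, in document order.
    public static List<String> extractSrc(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p><img src=\"38220.png\" alt=\"test\" /> "
                + "<IMG alt='x' SRC = 'a b.png'></p>";
        System.out.println(extractSrc(html)); // prints [38220.png, a b.png]
    }
}
```

Returning a list instead of printing inside the loop keeps the extraction separate from whatever the caller wants to do with the values.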

