Android: how to download an RSS feed when the website contains link rel="alternate" type="application/rss+xml"

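I am building an RSS-related application.
I want to be able to download the RSS (XML) given only the URL of a website whose source contains:

link rel="alternate" type="application/rss+xml"

For example, the source of http://www.engadget.com contains: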

<link rel="alternate" type="application/rss+xml" title="Engadget" href="http://www.engadget.com/rss.xml">

I assume that if I open this site in an RSS application, it redirects to the http://www.engadget.com/rss.xml page.

My code for downloading the XML is as follows:

private boolean downloadXml(String url, String filename) {
        try {
            URL   urlxml = new URL(url);
            URLConnection ucon = urlxml.openConnection();
            ucon.setConnectTimeout(4000);
            ucon.setReadTimeout(4000);
            InputStream is = ucon.getInputStream();
            BufferedInputStream bis = new BufferedInputStream(is, 128);
            FileOutputStream fOut = openFileOutput(filename + ".xml", Context.MODE_WORLD_READABLE | Context.MODE_WORLD_WRITEABLE);
            OutputStreamWriter osw = new OutputStreamWriter(fOut);
            int current = 0;
            while ((current = bis.read()) != -1) {
                osw.write((byte) current);
            }
            osw.flush();
            osw.close();

        } catch (Exception e) {
            return false;
        }
        return true;
    }

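How can I download the RSS when I enter 'http://www.engadget.com', given that I do not know the 'http://www.engadget.com/rss.xml' URL?

Solution:

To achieve this, you need to:

1. Detect whether the URL points to an HTML file. See the isHtml method in the code below.
2. If the URL points to an HTML file, extract the RSS URL from it. See the extractRssUrl method in the code below.

The following code is a modified version of the code you pasted in your question. For I/O, I use Apache Commons IO for its useful IOUtils and FileUtils classes. IOUtils.toString converts the input stream to a string, as suggested in "In Java, how do I read/convert an InputStream to a String?"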
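If adding Apache Commons IO is not an option, the stream-to-string step can be approximated with the standard library alone. The following is a minimal sketch, not the answer's own code: the class name StreamText and the method readUtf8 are hypothetical, and it assumes the response is encoded as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of the original answer): a stdlib-only
// stand-in for IOUtils.toString(is, "UTF-8").
final class StreamText {
    private StreamText() {}

    // Reads the stream to its end and decodes the bytes as UTF-8 (an assumption).
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // Copy raw bytes first; decoding once at the end avoids corrupting
        // multi-byte characters that straddle a chunk boundary.
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Copying bytes and decoding once also avoids the problem of pushing individual bytes through a Writer, which can mangle non-ASCII characters.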

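extractRssUrl uses a regular expression to parse the HTML, even though doing so is heavily frowned upon. (See the rant in "RegEx match open tags except XHTML self-contained tags".) With that in mind, treat extractRssUrl as a starting point. The regular expression in extractRssUrl is basic and does not cover all cases; for example, it only recognizes double-quoted attribute values.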

Note that the call to isRss(str) is commented out. If you want to do RSS detection, see "How to detect if a page is an RSS or ATOM feed".
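As a rough idea of what such a check could look like, here is a sketch that inspects only the root element of the document. It is not taken from the linked question: the class name FeedSniffer is hypothetical, and the check assumes that a feed's root element is rss (RSS 2.0), feed (Atom), or rdf:RDF (RSS 1.0). A real XML parser would be more reliable.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of the original answer): a lightweight
// guess at whether a document is an RSS or Atom feed.
final class FeedSniffer {
    // Skips an optional BOM, then any processing instructions, comments,
    // or DOCTYPE, and finally requires a feed root element.
    private static final Pattern FEED_ROOT = Pattern.compile(
            "^\\uFEFF?\\s*(?:<\\?.*?\\?>\\s*|<!--.*?-->\\s*|<!DOCTYPE[^>]*>\\s*)*"
                    + "<(?:rss|feed|rdf:RDF)\\b",
            Pattern.DOTALL);

    private FeedSniffer() {}

    static boolean isRss(String str) {
        return FEED_ROOT.matcher(str).find();
    }
}
```

With a helper like this, the commented-out condition could become `else if (FeedSniffer.isRss(str))`, so that content that is neither HTML nor a feed is not saved.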

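// Imports needed by the methods below (Apache Commons IO must be on the classpath).
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;

private boolean downloadXml(String url, String filename) {
    InputStream is = null;
    try {
        URL urlxml = new URL(url);
        URLConnection ucon = urlxml.openConnection();
        ucon.setConnectTimeout(4000);
        ucon.setReadTimeout(4000);
        is = ucon.getInputStream();
        // Read the whole response into memory; UTF-8 is assumed here.
        String str = IOUtils.toString(is, "UTF-8");
        if (isHtml(str)) {
            // HTML page: look for the advertised feed and download that instead.
            String rssURL = extractRssUrl(str);
            if (rssURL != null && !url.equals(rssURL)) {
                return downloadXml(rssURL, filename);
            }
        } else { // if (isRss(str)) {
            // For now, we'll assume that we're an RSS feed at this point.
            // The ".xml" extension is added here so it is applied exactly once.
            FileUtils.write(new File(filename + ".xml"), str, "UTF-8");
            return true;
        }
    } catch (Exception e) {
        // Any failure (bad URL, timeout, I/O error) is reported as false.
    } finally {
        IOUtils.closeQuietly(is);
    }
    return false;
}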

private boolean isHtml(String str) {
    Pattern pattern = Pattern.compile("<html", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);
    Matcher matcher = pattern.matcher(str);
    return matcher.find();
}

private String extractRssUrl(String str) {
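    // Matches a <link> tag whose type is application/rss+xml or application/atom+xml
    // and captures its href, whether href comes before or after the type attribute.
    Pattern pattern = Pattern.compile("<link(?:\\s+href=\"([^\"]*)\"|\\s+[a-z\\-]+=\"[^\"]*\")*\\s+type=\"application/(?:rss|atom)\\+xml\"(?:\\s+href=\"([^\"]*)\"|\\s+[a-z\\-]+=\"[^\"]*\")*?\\s*/?>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);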
    Matcher matcher = pattern.matcher(str);
    if (matcher.find()) {
        for (int i = 1; i <= matcher.groupCount(); i++) {
            if (matcher.group(i) != null) {
                return matcher.group(i);
            }
        }
    }
    return null;
}
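
Since the single pattern in extractRssUrl is tied to double-quoted attributes, one possible way to loosen it is to split the work in two: find each link tag, then read its attributes one at a time. The sketch below illustrates that idea only. The class FeedLinkFinder and its method findFeedUrl are hypothetical names, it is still regex-based (so attribute values containing ">" will confuse it), and it assumes the feed is advertised with rel="alternate".

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the original answer): a more tolerant
// variant of extractRssUrl that does not depend on attribute order or quote style.
final class FeedLinkFinder {
    // One whole <link ...> tag; group 1 is the raw attribute text.
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // name="value" or name='value'; group 1 is the name, group 2 or 3 the value.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([a-zA-Z\\-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    private FeedLinkFinder() {}

    // Returns the href of the first RSS/Atom link tag, or null if there is none.
    static String findFeedUrl(String html) {
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            Map<String, String> attrs = new HashMap<String, String>();
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                attrs.put(attr.group(1).toLowerCase(Locale.ROOT), value);
            }
            String rel = lower(attrs.get("rel"));
            String type = lower(attrs.get("type"));
            boolean feedType = type.equals("application/rss+xml")
                    || type.equals("application/atom+xml");
            if (rel.equals("alternate") && feedType && attrs.get("href") != null) {
                return attrs.get("href");
            }
        }
        return null;
    }

    // Null-safe lowercase, so missing attributes compare as empty strings.
    private static String lower(String s) {
        return s == null ? "" : s.toLowerCase(Locale.ROOT);
    }
}
```

For anything beyond a quick heuristic, a proper HTML parser remains the safer choice.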

The downloadXml code above works for your Engadget example:

obj.downloadXml("http://www.engadget.com/", "rss");
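
One case worth keeping in mind: the Engadget link uses an absolute href, but other sites may advertise the feed with a relative one such as /rss.xml, which new URL(rssURL) cannot open on its own. A minimal sketch of resolving the href against the page URL first follows; the class FeedUrlResolver and its method resolve are hypothetical names, not part of the answer.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper (not part of the original answer): turns the href
// found in a page into an absolute URL.
final class FeedUrlResolver {
    private FeedUrlResolver() {}

    // Resolves href against pageUrl; returns null if either is malformed.
    static String resolve(String pageUrl, String href) {
        try {
            // The two-argument constructor treats pageUrl as the base and
            // leaves href unchanged when it is already absolute.
            return new URL(new URL(pageUrl), href).toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }
}
```

In downloadXml, this would mean passing the result of resolve(url, rssURL) to the recursive call instead of rssURL itself.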