Programming Notes

Removing the id, class, and style attributes from HTML tags with C# regular expressions

2024-07-21

To remove the id, class, and style attributes from every tag in an HTML string, you can use regular expressions. Here is a simple example:

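using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string htmlContent = "<div id=\"content\" class=\"main-content\" style=\"font-size: 16px;\">This is some content.</div>";

        // Remove the id, class, and style attributes
        string result = RemoveAttributes(htmlContent);

        Console.WriteLine(result);
    }

    static string RemoveAttributes(string html)
    {
        // Match an opening tag together with its attribute list
        string tagPattern = @"<\w+(?:\s+[^>]*)?>";

        // Match a single id, class, or style attribute (quoted or unquoted value)
        string attributePattern = @"\s+(?:id|class|style)\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)";

        // Inside each matched tag, delete only those attributes
        string result = Regex.Replace(html, tagPattern, m =>
            Regex.Replace(m.Value, attributePattern, "", RegexOptions.IgnoreCase));

        return result;
    }
}

This example uses two regular expressions. The first, <\w+(?:\s+[^>]*)?>, matches an opening HTML tag together with its attributes:

<\w+: matches < followed by the tag name; \w+ means one or more word characters (letters, digits, or underscore).

(?:\s+[^>]*)?: matches one or more whitespace characters followed by any characters up to the closing >; (?: ... ) is a non-capturing group, and the trailing ? makes the group optional.

The second, \s+(?:id|class|style)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+), matches a single id, class, or style attribute: leading whitespace, the attribute name, an = sign, and a value that is double-quoted, single-quoted, or unquoted. Because the name must directly follow whitespace, attributes such as data-id are not matched.

In the RemoveAttributes method, the outer Regex.Replace call finds each opening tag and the inner call deletes the three attributes from it, leaving the tag name and any other attributes (href, src, and so on) intact. For the sample input, the program prints <div>This is some content.</div>.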
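To see how this behaves on a tag that also carries attributes worth keeping, here is a minimal sketch that takes the attribute names as parameters. The names AttributeStripper and StripAttributes are hypothetical and not part of the original example. The sketch assumes reasonably well-formed markup in which attribute values do not contain a > character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class AttributeStripper
{
    // Hypothetical helper: removes the named attributes from every opening tag.
    public static string StripAttributes(string html, params string[] names)
    {
        // Nothing to remove, so return the input unchanged.
        if (names.Length == 0) return html;

        // Escape each name so it is matched literally, then build an alternation.
        string alternation = string.Join("|", names.Select(n => Regex.Escape(n)));

        // One attribute: whitespace, name, '=', then a quoted or unquoted value.
        var attribute = new Regex(
            @"\s+(?:" + alternation + @")\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)",
            RegexOptions.IgnoreCase);

        // Restrict the removal to the inside of opening tags.
        return Regex.Replace(html, @"<\w+(?:\s+[^>]*)?>",
            m => attribute.Replace(m.Value, ""));
    }
}

class Demo
{
    static void Main()
    {
        string html = "<a href=\"/home\" class=\"nav\" id=\"top\">Home</a>";

        // Prints: <a href="/home">Home</a>
        Console.WriteLine(AttributeStripper.StripAttributes(html, "id", "class", "style"));
    }
}
```

The href attribute survives because only the listed names are matched. For arbitrary real-world HTML, a dedicated HTML parser is generally more robust than regular expressions.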

雷达智富