'gb2312' is not a supported encoding name. For information on defining a custom encoding
2024-10-12
In .NET Core, when fetching a web page with HttpClient and decoding the response with Encoding.GetEncoding("gb2312").GetString(arr), the following exception is thrown: 'gb2312' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method. (Parameter 'name').
The code is as follows:
var handler = new HttpClientHandler() { AutomaticDecompression = System.Net.DecompressionMethods.GZip };
using (HttpClient client = new HttpClient(handler)) {
var res = await client.GetAsync(Url);
var arr = await res.Content.ReadAsByteArrayAsync();
var str = Encoding.GetEncoding("gb2312").GetString(arr);
}
The fix is to install the System.Text.Encoding.CodePages package and then register its provider before calling Encoding.GetEncoding:
EncodingProvider provider = CodePagesEncodingProvider.Instance;
Encoding.RegisterProvider(provider);
var handler = new HttpClientHandler() { AutomaticDecompression = System.Net.DecompressionMethods.GZip };
using (HttpClient client = new HttpClient(handler)) {
var res = await client.GetAsync(Url);
var arr = await res.Content.ReadAsByteArrayAsync();
var str = Encoding.GetEncoding("gb2312").GetString(arr);
}
Decoding with Encoding.UTF8.GetString instead does not throw, but because the page's Content-Type declares charset=gb2312, the Chinese text in the page comes out garbled.
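I later tried a few other approaches and found that whether you use ReadAsByteArrayAsync, ReadAsStreamAsync, or ReadAsStringAsync, a page in this encoding still requires this package in order to avoid both the exception and the garbled text.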
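As a variation on the fix above, the encoding name can be taken from the response's Content-Type header rather than hard-coded. The following is only a minimal sketch under some assumptions: the PageFetcher class, its FetchAsync and Decode methods, and the fallbackCharset parameter are hypothetical names that do not appear in the original code, and the sketch assumes the server reports the charset correctly.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the original post.
public static class PageFetcher
{
    static PageFetcher()
    {
        // Make code-page encodings such as gb2312 visible to Encoding.GetEncoding.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static async Task<string> FetchAsync(string url, string fallbackCharset = "gb2312")
    {
        var handler = new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip };
        using (var client = new HttpClient(handler))
        {
            var res = await client.GetAsync(url);
            var bytes = await res.Content.ReadAsByteArrayAsync();
            // CharSet is null when the server sends no charset in Content-Type.
            return Decode(bytes, res.Content.Headers.ContentType?.CharSet, fallbackCharset);
        }
    }

    public static string Decode(byte[] bytes, string charset, string fallbackCharset)
    {
        // Fall back when no charset was declared; some servers quote the value.
        var name = string.IsNullOrWhiteSpace(charset)
            ? fallbackCharset
            : charset.Trim().Trim('"');
        return Encoding.GetEncoding(name).GetString(bytes);
    }
}
```

Calling `await PageFetcher.FetchAsync(Url)` would then replace the body of the using block shown earlier. Registering the provider in a static constructor means it runs once, before the first decode.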
Reference: https://www.leavescn.com/Articles/Content/1284
If you know of a better approach, please leave a comment. Thanks.