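- I have already written two posts on converting between Simplified and Traditional Chinese, but both convert character by character. Some words, however, differ between mainland China and Hong Kong, and between Hong Kong and Taiwan. How can a conversion also respect local usage? That is the problem this post addresses.
- There are plenty of websites that convert between Simplified and Traditional Chinese, but most of them also work character by character. The classes and functions in "Simplified to Traditional Chinese (Part 2)" were in fact copied from one of these sites, with the JavaScript implementation ported to C#. After trying many sites, I finally found one that handles some local usage: https://brushes8.com/zhong-wen-jian-ti-fan-ti-zhuan-huan. Its output is only passable, but it is usable.
- Since this site does what we need, we don't want to copy the text into its input box, click the "開始轉換" (Start Conversion) button, and copy the result back every time we need a conversion. For a small amount of text that is tolerable; for a lot of text it is far too tedious. So what can we do?
We can write a web crawler that simulates the site's form submission and then reads back the result. It really is that simple. Enough talk, here's the code!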
```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public class ChineseConverter
{
    // The page whose form performs the conversion
    private const string Url = "https://brushes8.com/zhong-wen-jian-ti-fan-ti-zhuan-huan";

    public static string GetWebString(string cc)
    {
        // Certificate validation is left at its default (enabled)
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
        request.ProtocolVersion = HttpVersion.Version11;
        request.Method = "POST";
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36";
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3";
        request.ContentType = "application/x-www-form-urlencoded";
        request.AllowAutoRedirect = true;
        request.KeepAlive = true; // persistent connection
        // Let the framework decompress gzip/deflate responses instead of doing it by hand
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

        // Build the form data. cc is inserted without URL-encoding,
        // so an & inside it ends the data field (see section 3)
        string postData = string.Format("data={0}&variant=zh-tw&dochineseconversion=1&submit=開始轉換 (Ctrl + Enter)", cc);
        byte[] bytePostData = Encoding.UTF8.GetBytes(postData);
        request.ContentLength = bytePostData.Length;

        // Send the data; using disposes the stream at the end of the block
        using (Stream requestStm = request.GetRequestStream())
        {
            requestStm.Write(bytePostData, 0, bytePostData.Length);
        }

        // Read the response; both the response and the reader are disposed afterwards
        string text;
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            text = reader.ReadToEnd();
        }

        // Skip the page markup before the result. The offset 14457 matched the page
        // when this was written and will break if the page layout changes
        return text.Substring(14457);
    }
}
```
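The fixed offset in `text.Substring(14457)` only works as long as the page layout stays exactly the same. A minimal sketch of a sturdier alternative, assuming the result sits between two known strings in the returned HTML: `ResultExtractor`, `ExtractBetween` and the `<textarea id="result">` marker used in the example are hypothetical, so check the real page source for the actual markers.

```csharp
using System;
using System.Net;

public static class ResultExtractor
{
    // Returns the HTML-decoded text between startMarker and endMarker,
    // or null when either marker is missing from the page.
    // The markers are assumptions: look them up in the actual page source.
    public static string ExtractBetween(string html, string startMarker, string endMarker)
    {
        int start = html.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;
        start += startMarker.Length;

        int end = html.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;

        // Text embedded in HTML is entity-encoded (&lt;, &amp;, ...), so decode it
        return WebUtility.HtmlDecode(html.Substring(start, end - start));
    }
}
```

For example, `ResultExtractor.ExtractBetween(text, "<textarea id=\"result\">", "</textarea>")` would return the decoded result, or `null` when the page no longer contains the markers, which is easier to detect than a substring that is silently wrong.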
1. Get the FormData
(Screenshots: the analysis process, the individual parameters, and the joined parameter string.)
From the screenshots we can see the structure of the postData to submit: data={0}&variant=zh-tw&dochineseconversion=1&submit=開始轉換 (Ctrl + Enter), where {0} is the input string.
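Because {0} is pasted straight into this string, any `&`, `=`, `+` or `%` in the input changes the meaning of the form data. A minimal sketch of building the same body with every key and value percent-encoded by `Uri.EscapeDataString`; `FormBody.Build` is a hypothetical helper, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormBody
{
    // Joins pairs as application/x-www-form-urlencoded: key=value, separated by '&'.
    // Percent-encoding each part keeps '&', '=', '+' and non-ASCII text in the
    // input from being read as form syntax.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

Passing the four pairs data, variant, dochineseconversion and submit produces an equivalent body in which an `&` inside the data value travels as `%26` instead of splitting the field.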
2. Set the request parameters
Set each of these parameters on the request in code:
```csharp
request.Method = "POST";
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3";
request.ContentType = "application/x-www-form-urlencoded";
```
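On newer .NET versions `HttpClient` is the recommended replacement for `HttpWebRequest`. A rough sketch of the same request built as an `HttpRequestMessage`, assuming the site accepts it the same way; `ConverterRequest.Create` is a hypothetical name. `FormUrlEncodedContent` sets the `application/x-www-form-urlencoded` content type and encodes every field, so the body no longer has to be assembled by hand.

```csharp
using System.Collections.Generic;
using System.Net.Http;

public static class ConverterRequest
{
    const string Url = "https://brushes8.com/zhong-wen-jian-ti-fan-ti-zhuan-huan";

    // Builds (but does not send) a POST carrying the same headers and form fields
    public static HttpRequestMessage Create(string cc)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, Url);
        request.Headers.TryAddWithoutValidation("User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36");
        request.Headers.TryAddWithoutValidation("Accept",
            "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
        // Sets Content-Type: application/x-www-form-urlencoded and encodes each key and value
        request.Content = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["data"] = cc,
            ["variant"] = "zh-tw",
            ["dochineseconversion"] = "1",
            ["submit"] = "開始轉換 (Ctrl + Enter)",
        });
        return request;
    }
}
```

Sending it with a shared `HttpClient` via `SendAsync` and reading the response with `Content.ReadAsStringAsync()` returns the page HTML, as `GetWebString` does.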
3. The & problem
Suppose I have a lang.xml whose content looks like this:
```xml
<root>
<n k="嘖嘖。看到這些排列的整整齊齊的機器人,就忍不住想沖進去大鬧一場啊……" v="" />
<n k="至少說明我們來對了地方。準備動手吧!" v="" />
<n k="惡作劇發明家,今天我可是留手了哦。" v="" />
<n k="以你的戰斗力,留不留手根本無所謂。" v="" />
<n k="你們一碰面就會吵架嗎?難道你們從來沒配合過……" v="" />
<n k="畢昇&amp;達爾文" v="" />
<n k="(異口同聲)誰要和這樣的家伙配合???" v="" />
<n k="兩人對視片刻" v="" />
<n k="(扭頭)哼!" v="" />
<n k="(這個同步率……)" v="" />
<n k="前面就是主控室了……" v="" />
</root>
```
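We want to convert the text in each k="xxx" and write it into v="". Submitting this text through the web page gives the correct result, but if we call the converter like this:
```csharp
string text = File.ReadAllText(langPath, Encoding.UTF8);
text = ChineseConverter.GetWebString(text);
```
the conversion stops at the `&` (stored in the file as `&amp;`), and everything after it is discarded. The cause is that GetWebString inserts the text into postData without URL-encoding it, so the server reads that `&` as the separator between form fields and the data field ends there.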
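To see why the text is cut off, here is a simplified sketch of how a form body is split on the receiving side: fields on `&`, name and value on the first `=`. `FormParseDemo.Parse` is a hypothetical illustration, not the site's code, and it skips details such as decoding `+` as a space.

```csharp
using System;
using System.Collections.Generic;

public static class FormParseDemo
{
    // Splits a form body into fields on '&', then name/value on the first '='.
    // Simplified: '+' is not turned into a space here.
    public static Dictionary<string, string> Parse(string body)
    {
        var fields = new Dictionary<string, string>();
        foreach (string pair in body.Split('&'))
        {
            int eq = pair.IndexOf('=');
            string key = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            fields[Uri.UnescapeDataString(key)] = Uri.UnescapeDataString(value);
        }
        return fields;
    }
}
```

Parsing `data=畢昇&amp;達爾文&variant=zh-tw` yields a data field of only `畢昇`; the rest becomes a bogus field named `amp;達爾文`. Encoding the value with `Uri.EscapeDataString` before inserting it keeps the whole string in the data field.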
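I haven't found a really good solution yet. For now, my workaround is to replace & with some other string before the conversion and swap it back afterwards, for example: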
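```csharp
// Requires: using System; using System.IO; using System.Text; using System.Xml;
// GameAssetPath, LangType and AssetFixedName are project-specific; langPath can be any XML path
private static void ConvertToTraditonal()
{
    string langPath = GameAssetPath.Lang + "/" + LangType.tc + "/" + AssetFixedName.Lang_Xml;
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(langPath);
    XmlNodeList list = xmlDoc.DocumentElement.ChildNodes;

    // Send the whole file, with & masked so it cannot split the form data
    string text = File.ReadAllText(langPath, Encoding.UTF8);
    text = text.Replace("&", "#####");
    text = ChineseConverter2.GetWebString(text);

    // Remove the extra backslashes before quotes and \n, then restore &
    string dst = text.Replace("\\\"", "\"").Replace("#####", "&").Replace("\\\\n", "\\n");
    XmlDocument xmlDoc2 = new XmlDocument();
    xmlDoc2.LoadXml(dst);
    XmlNodeList list2 = xmlDoc2.DocumentElement.ChildNodes;
    if (list2.Count != list.Count)
        throw new InvalidOperationException("Converted XML has a different number of nodes");

    // Copy each converted k into the v attribute of the matching original node
    for (int i = 0; i < list2.Count; i++)
    {
        list[i].Attributes["v"].Value = list2[i].Attributes["k"].Value;
    }
    xmlDoc.Save(langPath);
}
```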
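The masking step can be wrapped in a pair of helpers. A minimal sketch, assuming the converter passes `#####` through unchanged: `AmpersandMask` is a hypothetical name, and the extra check guards against text that already contains the placeholder, which the plain `Replace` calls would silently corrupt on the way back.

```csharp
using System;

public static class AmpersandMask
{
    // Placeholder from the post; any string absent from the source text works
    public const string Placeholder = "#####";

    public static string Mask(string text)
    {
        // If the placeholder already occurs, Unmask would wrongly turn it into '&'
        if (text.Contains(Placeholder))
            throw new ArgumentException("Text already contains the placeholder " + Placeholder);
        return text.Replace("&", Placeholder);
    }

    public static string Unmask(string text) => text.Replace(Placeholder, "&");
}
```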
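The part that needs explaining is this line:
```csharp
string dst = text.Replace("\\\"", "\"").Replace("#####", "&").Replace("\\\\n", "\\n");
```
It is needed because in the returned result, every quote and every \n escape gets an extra \ in front of it.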
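A worked example of what that line undoes. The helper name `ResultUnescape.Undo` is hypothetical; it applies the same two backslash replacements (the `#####` restore is left out). Since the XML stores line breaks as the two characters `\` and `n`, a returned `\\n` has to become `\n` again.

```csharp
public static class ResultUnescape
{
    // Drops the backslash added before each quote:     \"  -> "
    // and before each stored line-break escape:        \\n -> \n
    public static string Undo(string text)
    {
        return text.Replace("\\\"", "\"").Replace("\\\\n", "\\n");
    }
}
```

For example, the returned text `k=\"a\\nb\"` becomes `k="a\nb"`.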
Replace langPath with the path to your own XML file; running the method then produces the correct result.
Perfect! That's it.