Usage
=====

Basic usage
-----------

The easiest way to use the Universal Encoding Detector library is with
the ``detect`` function.

Example: Using the ``detect`` function
--------------------------------------

The ``detect`` function takes one argument, a non-Unicode string. It
returns a dictionary containing the auto-detected character encoding and
a confidence level from ``0`` to ``1``.

.. code:: python

    >>> import urllib.request
    >>> rawdata = urllib.request.urlopen('http://yahoo.co.jp/').read()
    >>> import chardet
    >>> chardet.detect(rawdata)
    {'encoding': 'EUC-JP', 'confidence': 0.99}

Advanced usage
--------------

If you’re dealing with a large amount of text, you can call the
Universal Encoding Detector library incrementally, and it will stop as
soon as it is confident enough to report its results.

Create a ``UniversalDetector`` object, then call its ``feed`` method
repeatedly with each block of text. If the detector reaches a minimum
threshold of confidence, it will set ``detector.done`` to ``True``.

Once you’ve exhausted the source text, call ``detector.close()``, which
will do some final calculations in case the detector didn’t hit its
minimum confidence threshold earlier. Then ``detector.result`` will be a
dictionary containing the auto-detected character encoding and
confidence level (the same as the ``chardet.detect`` function
`returns <usage.html#example-using-the-detect-function>`__).

Example: Detecting encoding incrementally
-----------------------------------------

.. code:: python

    import urllib.request
    from chardet.universaldetector import UniversalDetector

    usock = urllib.request.urlopen('http://yahoo.co.jp/')
    detector = UniversalDetector()
    for line in usock.readlines():
        detector.feed(line)
        if detector.done:
            break
    detector.close()
    usock.close()
    print(detector.result)

.. code:: python

    {'encoding': 'EUC-JP', 'confidence': 0.99}

If you want to detect the encoding of multiple texts (such as separate
files), you can re-use a single ``UniversalDetector`` object. Just call
``detector.reset()`` at the start of each file, call ``detector.feed``
as many times as you like, and then call ``detector.close()`` and check
the ``detector.result`` dictionary for the file’s results.

Example: Detecting encodings of multiple files
----------------------------------------------

.. code:: python

    import glob
    from chardet.universaldetector import UniversalDetector

    detector = UniversalDetector()
    for filename in glob.glob('*.xml'):
        print(filename.ljust(60), end='')
        detector.reset()
        for line in open(filename, 'rb'):
            detector.feed(line)
            if detector.done:
                break
        detector.close()
        print(detector.result)
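The dictionary returned by ``detect`` can feed straight into ``bytes.decode``. A minimal end-to-end sketch (the sample byte string and the UTF-8 fallback are illustrative choices, not part of the library):

```python
import chardet

rawdata = b"hello, world"         # any byte string; plain ASCII here
result = chardet.detect(rawdata)  # e.g. {'encoding': 'ascii', ...}

# Fall back to UTF-8 if the detector cannot name an encoding
# (result['encoding'] is None when detection fails).
encoding = result['encoding'] or 'utf-8'
text = rawdata.decode(encoding)
print(text)
```

The ``or 'utf-8'`` guard matters in batch pipelines: very short or binary inputs can leave the detector without an answer, and ``decode(None)`` would raise ``TypeError``.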
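The incremental API works with any source of byte chunks, not just network sockets, and ``feed`` buffers correctly even when a chunk boundary splits a multi-byte character. A self-contained sketch using an in-memory stream (the sample text and 64-byte chunk size are illustrative):

```python
import io
from chardet.universaldetector import UniversalDetector

# Illustrative data: a UTF-8 document, as if read from a stream in chunks.
stream = io.BytesIO("Grüße aus Köln! ".encode("utf-8") * 100)

detector = UniversalDetector()
for chunk in iter(lambda: stream.read(64), b""):
    detector.feed(chunk)
    if detector.done:
        break
detector.close()
print(detector.result)
```

Because the loop stops as soon as ``detector.done`` is set, large inputs are usually not read to the end, which is the point of the incremental interface.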