Initial commit

commit b53fb99038
@@ -0,0 +1,3 @@
4.5.0
last_version: 4.4.0
source_branch: feature/mirror-update-pr-10
@@ -0,0 +1,90 @@
#!/bin/bash

# Initializes a git repository and pushes to a remote using credentials from arguments
#
# Usage: ./git-init-from-config.sh <git-url> <git-user> <git-token>
#
# Arguments:
#   git-url   - Repository URL (e.g., https://gitea.example.com/user/repo.git)
#   git-user  - Git username
#   git-token - Git token or password
#
# The branch name will be read from the VERSION file (first line)

set -e

if [ "$#" -lt 3 ]; then
    echo "Usage: $0 <git-url> <git-user> <git-token>"
    echo ""
    echo "Arguments:"
    echo "  git-url   - Repository URL (e.g., https://gitea.example.com/user/repo.git)"
    echo "  git-user  - Git username"
    echo "  git-token - Git token or password"
    exit 1
fi

GIT_URL="$1"
GIT_USER="$2"
GIT_TOKEN="$3"

if [ -z "$GIT_URL" ]; then
    echo "Error: git-url is required"
    exit 1
fi

if [ -z "$GIT_USER" ]; then
    echo "Error: git-user is required"
    exit 1
fi

if [ -z "$GIT_TOKEN" ]; then
    echo "Error: git-token is required"
    exit 1
fi

if [ ! -f "VERSION" ]; then
    echo "Error: VERSION file not found in current directory"
    exit 1
fi

GIT_BRANCH=$(head -n 1 VERSION | tr -d '[:space:]')

if [ -z "$GIT_BRANCH" ]; then
    echo "Error: VERSION file is empty"
    exit 1
fi

echo "Version detected: $GIT_BRANCH"

if [[ "$GIT_URL" == https://* ]]; then
    URL_WITHOUT_PROTOCOL="${GIT_URL#https://}"
    AUTH_URL="https://${GIT_USER}:${GIT_TOKEN}@${URL_WITHOUT_PROTOCOL}"
elif [[ "$GIT_URL" == http://* ]]; then
    URL_WITHOUT_PROTOCOL="${GIT_URL#http://}"
    AUTH_URL="http://${GIT_USER}:${GIT_TOKEN}@${URL_WITHOUT_PROTOCOL}"
else
    echo "Error: URL must start with http:// or https://"
    exit 1
fi

echo "Initializing git repository..."
git init

echo "Adding all files..."
git add .

echo "Creating initial commit..."
git commit -m "Initial commit" || echo "Nothing to commit or already committed"

echo "Setting branch to $GIT_BRANCH..."
git branch -M "$GIT_BRANCH"

echo "Adding remote origin..."
git remote remove origin 2>/dev/null || true
git remote add origin "$AUTH_URL"

echo "Pushing to remote..."
git push -u origin "$GIT_BRANCH"

echo ""
echo "Done! Repository pushed to $GIT_URL on branch $GIT_BRANCH"

Binary file not shown.
@@ -0,0 +1,845 @@
Metadata-Version: 2.4
Name: aiobotocore
Version: 2.25.2
Summary: Async client for aws services using botocore and aiohttp
Author-email: Nikolay Novik <nickolainovik@gmail.com>
License-Expression: Apache-2.0
Project-URL: Repository, https://github.com/aio-libs/aiobotocore
Project-URL: Documentation, https://aiobotocore.aio-libs.org
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Environment :: Web Environment
Classifier: Framework :: AsyncIO
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: aiohttp<4.0.0,>=3.9.2
Requires-Dist: aioitertools<1.0.0,>=0.5.1
Requires-Dist: botocore<1.40.71,>=1.40.46
Requires-Dist: python-dateutil<3.0.0,>=2.1
Requires-Dist: jmespath<2.0.0,>=0.7.1
Requires-Dist: multidict<7.0.0,>=6.0.0
Requires-Dist: wrapt<2.0.0,>=1.10.10
Provides-Extra: awscli
Requires-Dist: awscli<1.42.71,>=1.42.46; extra == "awscli"
Provides-Extra: boto3
Requires-Dist: boto3<1.40.71,>=1.40.46; extra == "boto3"
Provides-Extra: httpx
Requires-Dist: httpx<0.29,>=0.25.1; extra == "httpx"
Dynamic: license-file

aiobotocore
===========
.. |ci badge| image:: https://github.com/aio-libs/aiobotocore/actions/workflows/ci-cd.yml/badge.svg?branch=master
   :target: https://github.com/aio-libs/aiobotocore/actions/workflows/ci-cd.yml
   :alt: CI status of master branch
.. |pre-commit badge| image:: https://results.pre-commit.ci/badge/github/aio-libs/aiobotocore/master.svg
   :target: https://results.pre-commit.ci/latest/github/aio-libs/aiobotocore/master
   :alt: pre-commit.ci status
.. |coverage badge| image:: https://codecov.io/gh/aio-libs/aiobotocore/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/aio-libs/aiobotocore
   :alt: Coverage status on master branch
.. |docs badge| image:: https://readthedocs.org/projects/aiobotocore/badge/?version=latest
   :target: https://aiobotocore.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status
.. |pypi badge| image:: https://img.shields.io/pypi/v/aiobotocore.svg
   :target: https://pypi.python.org/pypi/aiobotocore
   :alt: Latest version on pypi
.. |gitter badge| image:: https://badges.gitter.im/Join%20Chat.svg
   :target: https://gitter.im/aio-libs/aiobotocore
   :alt: Chat on Gitter
.. |pypi downloads badge| image:: https://img.shields.io/pypi/dm/aiobotocore.svg?label=PyPI%20downloads
   :target: https://pypi.org/project/aiobotocore/
   :alt: Downloads Last Month
.. |conda badge| image:: https://img.shields.io/conda/dn/conda-forge/aiobotocore.svg?label=Conda%20downloads
   :target: https://anaconda.org/conda-forge/aiobotocore
   :alt: Conda downloads
.. |stackoverflow badge| image:: https://img.shields.io/badge/stackoverflow-Ask%20questions-blue.svg
   :target: https://stackoverflow.com/questions/tagged/aiobotocore
   :alt: Stack Overflow

|ci badge| |pre-commit badge| |coverage badge| |docs badge| |pypi badge| |gitter badge| |pypi downloads badge| |conda badge| |stackoverflow badge|

Async client for Amazon services using botocore_ and aiohttp_/asyncio_.

This library is a mostly full-featured asynchronous version of botocore.


Install
-------
::

    $ pip install aiobotocore


Basic Example
-------------

.. code:: python

    import asyncio
    from aiobotocore.session import get_session

    AWS_ACCESS_KEY_ID = "xxx"
    AWS_SECRET_ACCESS_KEY = "xxx"


    async def go():
        bucket = 'dataintake'
        filename = 'dummy.bin'
        folder = 'aiobotocore'
        key = '{}/{}'.format(folder, filename)

        session = get_session()
        async with session.create_client('s3', region_name='us-west-2',
                                         aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                                         aws_access_key_id=AWS_ACCESS_KEY_ID) as client:
            # upload object to amazon s3
            data = b'\x01'*1024
            resp = await client.put_object(Bucket=bucket,
                                           Key=key,
                                           Body=data)
            print(resp)

            # getting s3 object properties of file we just uploaded
            resp = await client.get_object_acl(Bucket=bucket, Key=key)
            print(resp)

            # get object from s3
            response = await client.get_object(Bucket=bucket, Key=key)
            # this will ensure the connection is correctly re-used/closed
            async with response['Body'] as stream:
                assert await stream.read() == data

            # list s3 objects using paginator
            paginator = client.get_paginator('list_objects_v2')
            async for result in paginator.paginate(Bucket=bucket, Prefix=folder):
                for c in result.get('Contents', []):
                    print(c)

            # delete object from s3
            resp = await client.delete_object(Bucket=bucket, Key=key)
            print(resp)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(go())


Context Manager Examples
------------------------

.. code:: python

    from contextlib import AsyncExitStack

    from aiobotocore.session import AioSession


    # How to use in existing context manager
    class Manager:
        def __init__(self):
            self._exit_stack = AsyncExitStack()
            self._s3_client = None

        async def __aenter__(self):
            session = AioSession()
            self._s3_client = await self._exit_stack.enter_async_context(session.create_client('s3'))
            return self  # return the manager so it is usable via ``async with ... as``

        async def __aexit__(self, exc_type, exc_val, exc_tb):
            await self._exit_stack.__aexit__(exc_type, exc_val, exc_tb)

    # How to use with an external exit_stack
    async def create_s3_client(session: AioSession, exit_stack: AsyncExitStack):
        # Create client and add cleanup
        client = await exit_stack.enter_async_context(session.create_client('s3'))
        return client


    async def non_manager_example():
        session = AioSession()

        async with AsyncExitStack() as exit_stack:
            s3_client = await create_s3_client(session, exit_stack)

            # do work with s3_client


Supported AWS Services
----------------------

This is a non-exhaustive list of the AWS services aiobotocore runs tests against. Not all methods are tested, but we aim to test the majority of
commonly used methods.

+----------------+-----------------------+
| Service        | Status                |
+================+=======================+
| S3             | Working               |
+----------------+-----------------------+
| DynamoDB       | Basic methods tested  |
+----------------+-----------------------+
| SNS            | Basic methods tested  |
+----------------+-----------------------+
| SQS            | Basic methods tested  |
+----------------+-----------------------+
| CloudFormation | Stack creation tested |
+----------------+-----------------------+
| Kinesis        | Basic methods tested  |
+----------------+-----------------------+

Due to the way boto3 is implemented, it's highly likely that even if a service is not
listed above, you can take any ``boto3.client('service')`` and
stick ``await`` in front of methods to make them async, e.g. ``await client.list_named_queries()``
would asynchronously list all of the named Athena queries.
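
As a minimal sketch of that pattern (the service, region, and printed field are
illustrative, and the call assumes valid AWS credentials):

.. code:: python

    import asyncio

    from aiobotocore.session import get_session


    async def main():
        session = get_session()
        # Athena is not in the table above, yet its generated client
        # methods can still be awaited like any other aiobotocore call
        async with session.create_client('athena', region_name='us-west-2') as client:
            resp = await client.list_named_queries()
            print(resp['NamedQueryIds'])


    asyncio.run(main())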

If a service is not listed here and you could do with some tests or examples, feel free to raise an issue.


Enable type checking and code completion
----------------------------------------

Install types-aiobotocore_ that contains type annotations for ``aiobotocore``
and all supported botocore_ services.

.. code:: bash

    # install aiobotocore type annotations
    # for ec2, s3, rds, lambda, sqs, dynamo and cloudformation
    python -m pip install 'types-aiobotocore[essential]'

    # or install annotations for services you use
    python -m pip install 'types-aiobotocore[acm,apigateway]'

    # Lite version does not provide session.create_client overloads
    # it is more RAM-friendly, but requires explicit type annotations
    python -m pip install 'types-aiobotocore-lite[essential]'

Now you should be able to run Pylance_, pyright_, or mypy_ for type checking
as well as code completion in your IDE.

For the ``types-aiobotocore-lite`` package, use explicit type annotations:

.. code:: python

    from aiobotocore.session import get_session
    from types_aiobotocore_s3.client import S3Client

    session = get_session()
    async with session.create_client("s3") as client:
        client: S3Client
        # type checking and code completion is now enabled for client


Full documentation for ``types-aiobotocore`` can be found here: https://youtype.github.io/types_aiobotocore_docs/


Requirements
------------
* Python_ 3.9+
* aiohttp_
* botocore_

.. _Python: https://www.python.org
.. _asyncio: https://docs.python.org/3/library/asyncio.html
.. _botocore: https://github.com/boto/botocore
.. _aiohttp: https://github.com/aio-libs/aiohttp
.. _types-aiobotocore: https://youtype.github.io/types_aiobotocore_docs/
.. _Pylance: https://marketplace.visualstudio.com/items?itemName=ms-python.vscode-pylance
.. _pyright: https://github.com/microsoft/pyright
.. _mypy: http://mypy-lang.org/

awscli & boto3
--------------

awscli and boto3 depend on a single version, or a narrow range of versions, of botocore.
However, aiobotocore only supports a specific range of botocore versions. To ensure you
install the latest version of awscli and boto3 that your specific combination of
aiobotocore and botocore can support, use::

    pip install -U 'aiobotocore[awscli,boto3]'

If you only need awscli and not boto3 (or vice versa), you can just install one extra or
the other.

Changes
-------

2.25.2 (2025-11-10)
^^^^^^^^^^^^^^^^^^^
* relax botocore dependency specification

2.25.1 (2025-10-28)
^^^^^^^^^^^^^^^^^^^
* relax botocore dependency specification

2.25.0 (2025-10-10)
^^^^^^^^^^^^^^^^^^^
* switch async test runner from pytest-asyncio to AnyIO
* turn ``AioClientArgsCreator.get_client_args()`` and ``AioClientCreator._get_client_args()`` into asynchronous methods
* bump botocore dependency specification

2.24.3 (2025-10-06)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.24.2 (2025-09-05)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.24.1 (2025-08-15)
^^^^^^^^^^^^^^^^^^^
* fix endpoint circular import error

2.24.0 (2025-07-31)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.23.2 (2025-07-24)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.23.1 (2025-07-16)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.23.0 (2025-06-12)
^^^^^^^^^^^^^^^^^^^
* drop support for Python 3.8 (EOL)
* bump botocore dependency specification
* add experimental support for ``httpx``. The backend can be activated when creating a new session: ``session.create_client(..., config=AioConfig(http_session_cls=aiobotocore.httpxsession.HttpxSession))``. It's not fully tested and some features from aiohttp have not been ported, but feedback on what you're missing and bug reports are very welcome.

2.22.0 (2025-04-29)
^^^^^^^^^^^^^^^^^^^
* fully patch ``ClientArgsCreator.get_client_args()``
* patch ``AioEndpoint.__init__()``
* patch ``EventStream._parse_event()``, ``ResponseParser`` and subclasses
* use SPDX license identifier for project metadata
* upstream support for the smithy-rpc-v2-cbor protocol
* bump botocore dependency specification

2.21.1 (2025-03-04)
^^^^^^^^^^^^^^^^^^^
* fix for refreshable credential account-id lookup

2.21.0 (2025-02-28)
^^^^^^^^^^^^^^^^^^^
* make `AioDeferredRefreshableCredentials` subclass of `DeferredRefreshableCredentials`
* make `AioSSOCredentialFetcher` subclass of `SSOCredentialFetcher`
* bump botocore dependency specification

2.20.1.dev0 (2025-02-24)
^^^^^^^^^^^^^^^^^^^^^^^^
* upstream http response header fixes to be more in-line with botocore

2.20.0 (2025-02-19)
^^^^^^^^^^^^^^^^^^^
* patch `AwsChunkedWrapper.read`
* bump botocore dependency specification

2.19.0 (2025-01-22)
^^^^^^^^^^^^^^^^^^^
* support custom `ttl_dns_cache` connector configuration
* relax botocore dependency specification

2.18.0 (2025-01-17)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.17.0 (2025-01-06)
^^^^^^^^^^^^^^^^^^^
* relax botocore dependency specification
* add missing dependencies `python-dateutil`, `jmespath`, `multidict`, and `urllib3`

2.16.1 (2024-12-26)
^^^^^^^^^^^^^^^^^^^
* relax botocore dependency specification

2.16.0 (2024-12-16)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.15.2 (2024-10-09)
^^^^^^^^^^^^^^^^^^^
* relax botocore dependency specification

2.15.1 (2024-09-19)
^^^^^^^^^^^^^^^^^^^
* relax botocore dependency specification

2.15.0 (2024-09-10)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.14.0 (2024-08-28)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.13.3 (2024-08-22)
^^^^^^^^^^^^^^^^^^^
* fix ``create_waiter_with_client()``
* relax botocore dependency specification

2.13.2 (2024-07-18)
^^^^^^^^^^^^^^^^^^^
* fix for #1125 due to missing patch of StreamingChecksumBody

2.13.1 (2024-06-24)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.13.0 (2024-05-16)
^^^^^^^^^^^^^^^^^^^
* address breaking change introduced in `aiohttp==3.9.2` #882

2.12.4 (2024-05-16)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.12.3 (2024-04-11)
^^^^^^^^^^^^^^^^^^^
* relax botocore dependency specification

2.12.2 (2024-04-01)
^^^^^^^^^^^^^^^^^^^
* expose configuration of ``http_session_cls`` in ``AioConfig``

2.12.1 (2024-03-04)
^^^^^^^^^^^^^^^^^^^
* fix use of proxies #1070

2.12.0 (2024-02-28)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.11.2 (2024-02-02)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.11.1 (2024-01-25)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.11.0 (2024-01-19)
^^^^^^^^^^^^^^^^^^^
* send project-specific `User-Agent` HTTP header #853

2.10.0 (2024-01-18)
^^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.9.1 (2024-01-17)
^^^^^^^^^^^^^^^^^^
* fix race condition in S3 Express identity cache #1072

2.9.0 (2023-12-12)
^^^^^^^^^^^^^^^^^^
* bump botocore dependency specification

2.8.0 (2023-11-28)
^^^^^^^^^^^^^^^^^^
* add AioStubber that returns AioAWSResponse()
* remove confusing `aiobotocore.session.Session` symbol
* bump botocore dependency specification

2.7.0 (2023-10-17)
^^^^^^^^^^^^^^^^^^
* add support for Python 3.12
* drop more Python 3.7 support (EOL)
* relax botocore dependency specification

2.6.0 (2023-08-11)
^^^^^^^^^^^^^^^^^^
* bump aiohttp minimum version to 3.7.4.post0
* drop python 3.7 support (EOL)

2.5.4 (2023-08-07)
^^^^^^^^^^^^^^^^^^
* fix __aenter__ attribute error introduced in refresh bugfix (#1031)

2.5.3 (2023-08-06)
^^^^^^^^^^^^^^^^^^
* add more support for Python 3.11
* bump botocore to 1.31.17
* add waiter.wait return
* fix SSO token refresh bug #1025

2.5.2 (2023-07-06)
^^^^^^^^^^^^^^^^^^
* fix issue #1020

2.5.1 (2023-06-27)
^^^^^^^^^^^^^^^^^^
* bump botocore to 1.29.161

2.5.0 (2023-03-06)
^^^^^^^^^^^^^^^^^^
* bump botocore to 1.29.76 (thanks @jakob-keller #999)

2.4.2 (2022-12-22)
^^^^^^^^^^^^^^^^^^
* fix retries (#988)

2.4.1 (2022-11-28)
^^^^^^^^^^^^^^^^^^
* Adds support for checksums in streamed request trailers (thanks @terrycain #962)

2.4.0 (2022-08-25)
^^^^^^^^^^^^^^^^^^
* bump botocore to 1.27.59

2.3.4 (2022-06-23)
^^^^^^^^^^^^^^^^^^
* fix select_object_content

2.3.3 (2022-06-07)
^^^^^^^^^^^^^^^^^^
* fix connect timeout while getting IAM creds
* fix test files appearing in distribution package

2.3.2 (2022-05-08)
^^^^^^^^^^^^^^^^^^
* fix 3.6 testing and actually fix 3.6 support

2.3.1 (2022-05-06)
^^^^^^^^^^^^^^^^^^
* fix 3.6 support
* AioConfig: allow keepalive_timeout to be None (thanks @dnlserrano #933)

2.3.0 (2022-05-05)
^^^^^^^^^^^^^^^^^^
* fix encoding issue by swapping to AioAWSResponse and AioAWSRequest to behave more
  like botocore
* fix exceptions mappings

2.2.0 (2022-03-16)
^^^^^^^^^^^^^^^^^^
* remove deprecated APIs
* bump to botocore 1.24.21
* re-enable retry of aiohttp.ClientPayloadError

2.1.2 (2022-03-03)
^^^^^^^^^^^^^^^^^^
* fix httpsession close call

2.1.1 (2022-02-10)
^^^^^^^^^^^^^^^^^^
* implement asynchronous non-blocking adaptive retry strategy

2.1.0 (2021-12-14)
^^^^^^^^^^^^^^^^^^
* bump to botocore 1.23.24
* fix aiohttp resolver config param #906

2.0.1 (2021-11-25)
^^^^^^^^^^^^^^^^^^
* revert accidental dupe of _register_s3_events #867 (thanks @eoghanmurray)
* Support customizing the aiohttp connector resolver class #893 (thanks @orf)
* fix timestream query #902

2.0.0 (2021-11-02)
^^^^^^^^^^^^^^^^^^
* bump to botocore 1.22.8
* turn off default ``AIOBOTOCORE_DEPRECATED_1_4_0_APIS`` env var to match botocore module. See notes in 1.4.0.

1.4.2 (2021-09-03)
^^^^^^^^^^^^^^^^^^
* Fix missing close() method on http session (thanks `@terrycain <https://github.com/terrycain>`_)
* Fix for verify=False

1.4.1 (2021-08-24)
^^^^^^^^^^^^^^^^^^
* put backwards incompatible changes behind ``AIOBOTOCORE_DEPRECATED_1_4_0_APIS`` env var. This means that `#876 <https://github.com/aio-libs/aiobotocore/issues/876>`_ will not work unless this env var has been set to 0.

1.4.0 (2021-08-20)
^^^^^^^^^^^^^^^^^^
* fix retries via config `#877 <https://github.com/aio-libs/aiobotocore/pull/877>`_
* remove AioSession and get_session top level names to match botocore_
* change exceptions raised to match those of botocore_, see `mappings <https://github.com/aio-libs/aiobotocore/pull/877/files#diff-b1675e1eb4276bfae81107cda919ba446e4ce1b1e228a9e878d65dd1f474bf8cR162-R181>`_

1.3.3 (2021-07-12)
^^^^^^^^^^^^^^^^^^
* fix AioJSONParser `#872 <https://github.com/aio-libs/aiobotocore/issues/872>`_

1.3.2 (2021-07-07)
^^^^^^^^^^^^^^^^^^
* Bump botocore_ to `1.20.106 <https://github.com/boto/botocore/tree/1.20.106>`_

1.3.1 (2021-06-11)
^^^^^^^^^^^^^^^^^^
* TCPConnector: change deprecated ssl_context to ssl
* fix non awaited generate presigned url calls `#868 <https://github.com/aio-libs/aiobotocore/issues/868>`_

1.3.0 (2021-04-09)
^^^^^^^^^^^^^^^^^^
* Bump botocore_ to `1.20.49 <https://github.com/boto/botocore/tree/1.20.49>`_ `#856 <https://github.com/aio-libs/aiobotocore/pull/856>`_

1.2.2 (2021-03-11)
^^^^^^^^^^^^^^^^^^
* Await call to async method _load_creds_via_assume_role `#858 <https://github.com/aio-libs/aiobotocore/pull/858>`_ (thanks `@puzza007 <https://github.com/puzza007>`_)

1.2.1 (2021-02-10)
^^^^^^^^^^^^^^^^^^
* verify strings are now correctly passed to aiohttp.TCPConnector `#851 <https://github.com/aio-libs/aiobotocore/pull/851>`_ (thanks `@FHTMitchell <https://github.com/FHTMitchell>`_)

1.2.0 (2021-01-11)
^^^^^^^^^^^^^^^^^^
* bump botocore to `1.19.52 <https://github.com/boto/botocore/tree/1.19.52>`_
* use passed in http_session_cls param to create_client `#797 <https://github.com/aio-libs/aiobotocore/issues/797>`_

1.1.2 (2020-10-07)
^^^^^^^^^^^^^^^^^^
* fix AioPageIterator search method #831 (thanks `@joseph-jones <https://github.com/joseph-jones>`_)

1.1.1 (2020-08-31)
^^^^^^^^^^^^^^^^^^
* fix s3 region redirect bug #825

1.1.0 (2020-08-18)
^^^^^^^^^^^^^^^^^^
* bump botocore to 1.17.44

1.0.7 (2020-06-04)
^^^^^^^^^^^^^^^^^^
* fix generate_db_auth_token via #816

1.0.6 (2020-06-04)
^^^^^^^^^^^^^^^^^^
* revert __getattr__ fix as it breaks ddtrace

1.0.5 (2020-06-03)
^^^^^^^^^^^^^^^^^^
* Fixed AioSession.get_service_data emit call #811 via #812
* Fixed async __getattr__ #789 via #803

1.0.4 (2020-04-15)
^^^^^^^^^^^^^^^^^^
* Fixed S3 Presigned Post not being async

1.0.3 (2020-04-09)
^^^^^^^^^^^^^^^^^^
* Fixes typo when using credential process

1.0.2 (2020-04-05)
^^^^^^^^^^^^^^^^^^
* Disable Client.__getattr__ emit for now #789

1.0.1 (2020-04-01)
^^^^^^^^^^^^^^^^^^
* Fixed signing requests with explicit credentials

1.0.0 (2020-03-31)
^^^^^^^^^^^^^^^^^^
* API breaking: The result of create_client is now a required async context class
* Credential refresh should now work
* generate_presigned_url is now an async call along with other credential methods
* Credentials.[access_key/secret_key/token] now raise NotImplementedError because
  they won't call refresh like botocore. Instead, use the ``get_frozen_credentials``
  async method
* Bump botocore and extras

0.12.0 (2020-02-23)
^^^^^^^^^^^^^^^^^^^
* Bump botocore and extras
* Drop support for 3.5 given we are unable to test it with moto
  and it will soon be unsupported
* Remove loop parameters for Python 3.8 compliance
* Remove deprecated AioPageIterator.next_page

0.11.1 (2020-01-03)
^^^^^^^^^^^^^^^^^^^
* Fixed event streaming API calls like S3 Select.

0.11.0 (2019-11-12)
^^^^^^^^^^^^^^^^^^^
* replace CaseInsensitiveDict with urllib3 equivalent #744
  (thanks to inspiration from @craigmccarter and @kevchentw)
* bump botocore to 1.13.14
* fix for mismatched botocore method replacements

0.10.4 (2019-10-24)
^^^^^^^^^^^^^^^^^^^
* Make AioBaseClient.close method async #724 (thanks @bsitruk)
* Bump awscli, boto3, botocore #735 (thanks @bbrendon)
* switch paginator to async_generator, add result_key_iters
  (deprecate next_page method)

0.10.3 (2019-07-17)
^^^^^^^^^^^^^^^^^^^
* Bump botocore and extras

0.10.2 (2019-02-11)
^^^^^^^^^^^^^^^^^^^
* Fix response-received emitted event #682

0.10.1 (2019-02-08)
^^^^^^^^^^^^^^^^^^^
* Make tests pass with pytest 4.1 #669 (thanks @yan12125)
* Support Python 3.7 #671 (thanks to @yan12125)
* Update RTD build config #672 (thanks @willingc)
* Bump to botocore 1.12.91 #679

0.10.0 (2018-12-09)
^^^^^^^^^^^^^^^^^^^
* Update to botocore 1.12.49 #639 (thanks @terrycain)

0.9.4 (2018-08-08)
^^^^^^^^^^^^^^^^^^
* Add ClientPayloadError as retryable exception

0.9.3 (2018-07-16)
^^^^^^^^^^^^^^^^^^
* Bring botocore up to date

0.9.2 (2018-05-05)
^^^^^^^^^^^^^^^^^^
* bump aiohttp requirement to fix read timeouts

0.9.1 (2018-05-04)
^^^^^^^^^^^^^^^^^^
* fix timeout bug introduced in last release

0.9.0 (2018-06-01)
^^^^^^^^^^^^^^^^^^
* bump aiohttp to 3.3.x
* remove unneeded set_socket_timeout

0.8.0 (2018-05-07)
^^^^^^^^^^^^^^^^^^
* Fix pagination #573 (thanks @adamrothman)
* Enabled several s3 tests via moto
* Bring botocore up to date

0.7.0 (2018-05-01)
^^^^^^^^^^^^^^^^^^
* Just version bump

0.6.1a0 (2018-05-01)
^^^^^^^^^^^^^^^^^^^^
* bump to aiohttp 3.1.x
* switch tests to Python 3.5+
* switch to native coroutines
* fix non-streaming body timeout retries

0.6.0 (2018-03-04)
^^^^^^^^^^^^^^^^^^
* Upgrade to aiohttp>=3.0.0 #536 (thanks @Gr1N)

0.5.3 (2018-02-23)
^^^^^^^^^^^^^^^^^^
* Fixed waiters #523 (thanks @dalazx)
* fix conn_timeout #485

0.5.2 (2017-12-06)
^^^^^^^^^^^^^^^^^^
* Updated awscli dependency #461

0.5.1 (2017-11-10)
^^^^^^^^^^^^^^^^^^
* Disabled compressed response #430

0.5.0 (2017-11-10)
^^^^^^^^^^^^^^^^^^
* Fix botocore error checking #190
* Update supported botocore requirement to: >=1.7.28, <=1.7.40
* Bump aiohttp requirement to support compressed responses correctly #298

0.4.5 (2017-09-05)
^^^^^^^^^^^^^^^^^^
* Added SQS examples and tests #336
* Changed requirements.txt structure #336
* bump to botocore 1.7.4
* Added DynamoDB examples and tests #340

0.4.4 (2017-08-16)
^^^^^^^^^^^^^^^^^^
* add the supported versions of boto3 to extras require #324

0.4.3 (2017-07-05)
^^^^^^^^^^^^^^^^^^
* add the supported versions of awscli to extras require #273 (thanks @graingert)

0.4.2 (2017-07-03)
^^^^^^^^^^^^^^^^^^
* update supported aiohttp requirement to: >=2.0.4, <=2.3.0
* update supported botocore requirement to: >=1.5.71, <=1.5.78

0.4.1 (2017-06-27)
^^^^^^^^^^^^^^^^^^
* fix redirects #268

0.4.0 (2017-06-19)
^^^^^^^^^^^^^^^^^^
* update botocore requirement to: botocore>=1.5.34, <=1.5.70
* fix read_timeout due to #245
* implement set_socket_timeout

0.3.3 (2017-05-22)
^^^^^^^^^^^^^^^^^^
* switch to PEP 440 version parser to support 'dev' versions

0.3.2 (2017-05-22)
^^^^^^^^^^^^^^^^^^
* Fix botocore integration
* Provisional fix for aiohttp 2.x stream support
* update botocore requirement to: botocore>=1.5.34, <=1.5.52

0.3.1 (2017-04-18)
^^^^^^^^^^^^^^^^^^
* Fixed Waiter support

0.3.0 (2017-04-01)
^^^^^^^^^^^^^^^^^^
* Added support for aiohttp>=2.0.4 (thanks @achimnol)
* update botocore requirement to: botocore>=1.5.0, <=1.5.33

0.2.3 (2017-03-22)
^^^^^^^^^^^^^^^^^^
* update botocore requirement to: botocore>=1.5.0, <1.5.29

0.2.2 (2017-03-07)
^^^^^^^^^^^^^^^^^^
* set aiobotocore.__all__ for * imports #121 (thanks @graingert)
* fix ETag in head_object response #132

0.2.1 (2017-02-01)
^^^^^^^^^^^^^^^^^^
* Normalize headers and handle redirection by botocore #115 (thanks @Fedorof)

0.2.0 (2017-01-30)
^^^^^^^^^^^^^^^^^^
* add support for proxies (thanks @jjonek)
* remove AioConfig verify_ssl connector_arg as this is handled by the
  create_client verify param
* remove AioConfig limit connector_arg as this is now handled
  by the Config `max_pool_connections` property (note default is 10)

0.1.1 (2017-01-16)
^^^^^^^^^^^^^^^^^^
* botocore updated to version 1.5.0

0.1.0 (2017-01-12)
^^^^^^^^^^^^^^^^^^
* Pass timeout to aiohttp.request to enforce read_timeout #86 (thanks @vharitonsky)
  (bumped up to next semantic version due to read_timeout enabling change)

0.0.6 (2016-11-19)
^^^^^^^^^^^^^^^^^^
* Added enforcement of plain response #57 (thanks @rymir)
* botocore updated to version 1.4.73 #74 (thanks @vas3k)

0.0.5 (2016-06-01)
^^^^^^^^^^^^^^^^^^
* Initial alpha release

@@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for aiobotocore
    </title>
  </head>
  <body>
    <h1>
      Links for aiobotocore
    </h1>
    <a href="/aiobotocore/aiobotocore-2.25.2-py3-none-any.whl#sha256=0cec45c6ba7627dd5e5460337291c86ac38c3b512ec4054ce76407d0f7f2a48f" data-requires-python=">=3.9" data-dist-info-metadata="sha256=a3258cb67c6aca2e2da130b3ff52743b5a025f76d815d744a7599e6b62e872c4">
      aiobotocore-2.25.2-py3-none-any.whl
    </a>
    <br />
  </body>
</html>

Binary file not shown.
@@ -0,0 +1,209 @@
Metadata-Version: 2.4
Name: aiofiles
Version: 25.1.0
Summary: File support for asyncio.
Project-URL: Changelog, https://github.com/Tinche/aiofiles#history
Project-URL: Bug Tracker, https://github.com/Tinche/aiofiles/issues
Project-URL: Repository, https://github.com/Tinche/aiofiles
Author-email: Tin Tvrtkovic <tinchester@gmail.com>
License: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Classifier: Development Status :: 5 - Production/Stable
Classifier: Framework :: AsyncIO
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# aiofiles: file support for asyncio

**aiofiles** is an Apache2 licensed library, written in Python, for handling local
disk files in asyncio applications.

Ordinary local file IO is blocking, and cannot easily and portably be made
asynchronous. This means doing file IO may interfere with asyncio applications,
which shouldn't block the executing thread. aiofiles helps with this by
introducing asynchronous versions of files that support delegating operations to
a separate thread pool.

```python
async with aiofiles.open('filename', mode='r') as f:
    contents = await f.read()
print(contents)
'My file contents'
```

Asynchronous iteration is also supported.

```python
async with aiofiles.open('filename') as f:
    async for line in f:
        ...
```

An asynchronous interface to the tempfile module is also provided.

```python
async with aiofiles.tempfile.TemporaryFile('wb') as f:
    await f.write(b'Hello, World!')
```

## Features

- a file API very similar to Python's standard, blocking API
- support for buffered and unbuffered binary files, and buffered text files
- support for `async`/`await` ([PEP 492](https://peps.python.org/pep-0492/)) constructs
- async interface to tempfile module

## Installation

To install aiofiles, simply:

```shell
pip install aiofiles
```

## Usage

Files are opened using the `aiofiles.open()` coroutine, which in addition to
mirroring the builtin `open` accepts optional `loop` and `executor`
arguments. If `loop` is absent, the default loop will be used, as per the
set asyncio policy. If `executor` is not specified, the default event loop
executor will be used.

In case of success, an asynchronous file object is returned with an
API identical to an ordinary file, except the following methods are coroutines
and delegate to an executor:

- `close`
- `flush`
- `isatty`
- `read`
- `readall`
- `read1`
- `readinto`
- `readline`
- `readlines`
- `seek`
- `seekable`
- `tell`
- `truncate`
- `writable`
- `write`
- `writelines`

In case of failure, one of the usual exceptions will be raised.

`aiofiles.stdin`, `aiofiles.stdout`, `aiofiles.stderr`,
`aiofiles.stdin_bytes`, `aiofiles.stdout_bytes`, and
`aiofiles.stderr_bytes` provide async access to `sys.stdin`,
`sys.stdout`, `sys.stderr`, and their corresponding `.buffer` properties.
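
A minimal, hedged sketch of the standard-stream wrappers (the printed text is
illustrative):

```python
import asyncio

import aiofiles


async def main():
    # aiofiles.stdout wraps sys.stdout; write and flush are coroutines
    # that delegate to the default executor
    await aiofiles.stdout.write("hello from aiofiles\n")
    await aiofiles.stdout.flush()


asyncio.run(main())
```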

The `aiofiles.os` module contains executor-enabled coroutine versions of
several useful `os` functions that deal with files (a short sketch follows
the list):

- `stat`
- `statvfs`
- `sendfile`
- `rename`
- `renames`
- `replace`
- `remove`
- `unlink`
- `mkdir`
- `makedirs`
- `rmdir`
- `removedirs`
- `link`
- `symlink`
- `readlink`
- `listdir`
- `scandir`
- `access`
- `getcwd`
- `path.abspath`
- `path.exists`
- `path.isfile`
- `path.isdir`
- `path.islink`
- `path.ismount`
- `path.getsize`
- `path.getatime`
- `path.getctime`
- `path.samefile`
- `path.sameopenfile`
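
A short, hedged sketch of that interface (the directory name is illustrative):

```python
import asyncio

import aiofiles.os


async def main():
    # each call delegates the blocking os function to the default executor
    await aiofiles.os.mkdir("scratch")
    print(await aiofiles.os.path.exists("scratch"))  # True
    await aiofiles.os.rmdir("scratch")


asyncio.run(main())
```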

### Tempfile

**aiofiles.tempfile** implements the following interfaces:

- TemporaryFile
- NamedTemporaryFile
- SpooledTemporaryFile
- TemporaryDirectory

Results are returned wrapped in a context manager, allowing use with `async with` and `async for`.

```python
async with aiofiles.tempfile.NamedTemporaryFile('wb+') as f:
    await f.write(b'Line1\n Line2')
    await f.seek(0)
    async for line in f:
        print(line)

async with aiofiles.tempfile.TemporaryDirectory() as d:
    filename = os.path.join(d, "file.ext")
```

### Writing tests for aiofiles

Real file IO can be mocked by patching `aiofiles.threadpool.sync_open`
as desired. The return type also needs to be registered with the
`aiofiles.threadpool.wrap` dispatcher:

```python
aiofiles.threadpool.wrap.register(mock.MagicMock)(
    lambda *args, **kwargs: aiofiles.threadpool.AsyncBufferedIOBase(*args, **kwargs)
)

async def test_stuff():
    write_data = 'data'
    read_file_chunks = [
        b'file chunks 1',
        b'file chunks 2',
        b'file chunks 3',
        b'',
    ]
    file_chunks_iter = iter(read_file_chunks)

    mock_file_stream = mock.MagicMock(
        read=lambda *args, **kwargs: next(file_chunks_iter)
    )

    with mock.patch('aiofiles.threadpool.sync_open', return_value=mock_file_stream) as mock_open:
        async with aiofiles.open('filename', 'w') as f:
            await f.write(write_data)
            assert await f.read() == b'file chunks 1'

        mock_file_stream.write.assert_called_once_with(write_data)
```

### Contributing

Contributions are very welcome. Tests can be run with `tox`; please ensure
the coverage at least stays the same before you submit a pull request.

@@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for aiofiles
    </title>
  </head>
  <body>
    <h1>
      Links for aiofiles
    </h1>
    <a href="/aiofiles/aiofiles-25.1.0-py3-none-any.whl#sha256=abe311e527c862958650f9438e859c1fa7568a141b22abcd015e120e86a85695" data-requires-python=">=3.9" data-dist-info-metadata="sha256=6b96b99073158a00ddb012a514834b48c3ec5f732ce059ffcde700481759a8b5">
      aiofiles-25.1.0-py3-none-any.whl
    </a>
    <br />
  </body>
</html>

Binary file not shown.
@@ -0,0 +1,123 @@
Metadata-Version: 2.3
Name: aiohappyeyeballs
Version: 2.6.1
Summary: Happy Eyeballs for asyncio
License: PSF-2.0
Author: J. Nick Koston
Author-email: nick@koston.org
Requires-Python: >=3.9
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: License :: OSI Approved :: Python Software Foundation License
Project-URL: Bug Tracker, https://github.com/aio-libs/aiohappyeyeballs/issues
Project-URL: Changelog, https://github.com/aio-libs/aiohappyeyeballs/blob/main/CHANGELOG.md
Project-URL: Documentation, https://aiohappyeyeballs.readthedocs.io
Project-URL: Repository, https://github.com/aio-libs/aiohappyeyeballs
Description-Content-Type: text/markdown

# aiohappyeyeballs

<p align="center">
  <a href="https://github.com/aio-libs/aiohappyeyeballs/actions/workflows/ci.yml?query=branch%3Amain">
    <img src="https://img.shields.io/github/actions/workflow/status/aio-libs/aiohappyeyeballs/ci-cd.yml?branch=main&label=CI&logo=github&style=flat-square" alt="CI Status" >
  </a>
  <a href="https://aiohappyeyeballs.readthedocs.io">
    <img src="https://img.shields.io/readthedocs/aiohappyeyeballs.svg?logo=read-the-docs&logoColor=fff&style=flat-square" alt="Documentation Status">
  </a>
  <a href="https://codecov.io/gh/aio-libs/aiohappyeyeballs">
    <img src="https://img.shields.io/codecov/c/github/aio-libs/aiohappyeyeballs.svg?logo=codecov&logoColor=fff&style=flat-square" alt="Test coverage percentage">
  </a>
</p>
<p align="center">
  <a href="https://python-poetry.org/">
    <img src="https://img.shields.io/badge/packaging-poetry-299bd7?style=flat-square&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA4AAAASCAYAAABrXO8xAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAJJSURBVHgBfZLPa1NBEMe/s7tNXoxW1KJQKaUHkXhQvHgW6UHQQ09CBS/6V3hKc/AP8CqCrUcpmop3Cx48eDB4yEECjVQrlZb80CRN8t6OM/teagVxYZi38+Yz853dJbzoMV3MM8cJUcLMSUKIE8AzQ2PieZzFxEJOHMOgMQQ+dUgSAckNXhapU/NMhDSWLs1B24A8sO1xrN4NECkcAC9ASkiIJc6k5TRiUDPhnyMMdhKc+Zx19l6SgyeW76BEONY9exVQMzKExGKwwPsCzza7KGSSWRWEQhyEaDXp6ZHEr416ygbiKYOd7TEWvvcQIeusHYMJGhTwF9y7sGnSwaWyFAiyoxzqW0PM/RjghPxF2pWReAowTEXnDh0xgcLs8l2YQmOrj3N7ByiqEoH0cARs4u78WgAVkoEDIDoOi3AkcLOHU60RIg5wC4ZuTC7FaHKQm8Hq1fQuSOBvX/sodmNJSB5geaF5CPIkUeecdMxieoRO5jz9bheL6/tXjrwCyX/UYBUcjCaWHljx1xiX6z9xEjkYAzbGVnB8pvLmyXm9ep+W8CmsSHQQY77Zx1zboxAV0w7ybMhQmfqdmmw3nEp1I0Z+FGO6M8LZdoyZnuzzBdjISicKRnpxzI9fPb+0oYXsNdyi+d3h9bm9MWYHFtPeIZfLwzmFDKy1ai3p+PDls1Llz4yyFpferxjnyjJDSEy9CaCx5m2cJPerq6Xm34eTrZt3PqxYO1XOwDYZrFlH1fWnpU38Y9HRze3lj0vOujZcXKuuXm3jP+s3KbZVra7y2EAAAAAASUVORK5CYII=" alt="Poetry">
  </a>
  <a href="https://github.com/astral-sh/ruff">
    <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Ruff">
  </a>
  <a href="https://github.com/pre-commit/pre-commit">
    <img src="https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white&style=flat-square" alt="pre-commit">
  </a>
</p>
<p align="center">
  <a href="https://pypi.org/project/aiohappyeyeballs/">
    <img src="https://img.shields.io/pypi/v/aiohappyeyeballs.svg?logo=python&logoColor=fff&style=flat-square" alt="PyPI Version">
  </a>
  <img src="https://img.shields.io/pypi/pyversions/aiohappyeyeballs.svg?style=flat-square&logo=python&logoColor=fff" alt="Supported Python versions">
  <img src="https://img.shields.io/pypi/l/aiohappyeyeballs.svg?style=flat-square" alt="License">
</p>

---

**Documentation**: <a href="https://aiohappyeyeballs.readthedocs.io" target="_blank">https://aiohappyeyeballs.readthedocs.io</a>

**Source Code**: <a href="https://github.com/aio-libs/aiohappyeyeballs" target="_blank">https://github.com/aio-libs/aiohappyeyeballs</a>

---

[Happy Eyeballs](https://en.wikipedia.org/wiki/Happy_Eyeballs)
([RFC 8305](https://www.rfc-editor.org/rfc/rfc8305.html))

## Use case

This library exists to allow connecting with
[Happy Eyeballs](https://en.wikipedia.org/wiki/Happy_Eyeballs)
([RFC 8305](https://www.rfc-editor.org/rfc/rfc8305.html))
when you
already have a list of addrinfo and not a DNS name.

The stdlib version of `loop.create_connection()`
will only work when you pass in an unresolved name, which
is not a good fit when using DNS caching or resolving
names via another method such as `zeroconf`.

## Installation

Install this via pip (or your favourite package manager):

`pip install aiohappyeyeballs`

## License

[aiohappyeyeballs is licensed under the same terms as cpython itself.](https://github.com/python/cpython/blob/main/LICENSE)

## Example usage

```python

addr_infos = await loop.getaddrinfo("example.org", 80)

socket = await start_connection(addr_infos)
socket = await start_connection(addr_infos, local_addr_infos=local_addr_infos, happy_eyeballs_delay=0.2)

transport, protocol = await loop.create_connection(
    MyProtocol, sock=socket, ...)

# Remove the first address for each family from addr_info
pop_addr_infos_interleave(addr_info, 1)

# Remove all matching address from addr_info
remove_addr_infos(addr_info, "dead::beef::")

# Convert a local_addr to local_addr_infos
local_addr_infos = addr_to_addr_infos(("127.0.0.1",0))
```

## Credits

This package contains code from cpython and is licensed under the same terms as cpython itself.

This package was created with
[Copier](https://copier.readthedocs.io/) and the
[browniebroke/pypackage-template](https://github.com/browniebroke/pypackage-template)
project template.

@@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for aiohappyeyeballs
    </title>
  </head>
  <body>
    <h1>
      Links for aiohappyeyeballs
    </h1>
    <a href="/aiohappyeyeballs/aiohappyeyeballs-2.6.1-py3-none-any.whl#sha256=f349ba8f4b75cb25c99c5c2d84e997e485204d2902a9597802b0371f09331fb8" data-requires-python=">=3.9" data-dist-info-metadata="sha256=3525e5849c007e2dfcd1e123028ec14383ff4d56a5f718b4aa4c995a26ccb153">
      aiohappyeyeballs-2.6.1-py3-none-any.whl
    </a>
    <br />
  </body>
</html>

Binary file not shown.
@@ -0,0 +1,262 @@
Metadata-Version: 2.4
Name: aiohttp
Version: 3.13.2
Summary: Async http client/server framework (asyncio)
Maintainer-email: aiohttp team <team@aiohttp.org>
License: Apache-2.0 AND MIT
Project-URL: Homepage, https://github.com/aio-libs/aiohttp
Project-URL: Chat: Matrix, https://matrix.to/#/#aio-libs:matrix.org
Project-URL: Chat: Matrix Space, https://matrix.to/#/#aio-libs-space:matrix.org
Project-URL: CI: GitHub Actions, https://github.com/aio-libs/aiohttp/actions?query=workflow%3ACI
Project-URL: Coverage: codecov, https://codecov.io/github/aio-libs/aiohttp
Project-URL: Docs: Changelog, https://docs.aiohttp.org/en/stable/changes.html
Project-URL: Docs: RTD, https://docs.aiohttp.org
Project-URL: GitHub: issues, https://github.com/aio-libs/aiohttp/issues
Project-URL: GitHub: repo, https://github.com/aio-libs/aiohttp
Classifier: Development Status :: 5 - Production/Stable
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: Operating System :: POSIX
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Internet :: WWW/HTTP
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
License-File: LICENSE.txt
License-File: vendor/llhttp/LICENSE
Requires-Dist: aiohappyeyeballs>=2.5.0
Requires-Dist: aiosignal>=1.4.0
Requires-Dist: async-timeout<6.0,>=4.0; python_version < "3.11"
Requires-Dist: attrs>=17.3.0
Requires-Dist: frozenlist>=1.1.1
Requires-Dist: multidict<7.0,>=4.5
Requires-Dist: propcache>=0.2.0
Requires-Dist: yarl<2.0,>=1.17.0
Provides-Extra: speedups
Requires-Dist: aiodns>=3.3.0; extra == "speedups"
Requires-Dist: Brotli; platform_python_implementation == "CPython" and extra == "speedups"
Requires-Dist: brotlicffi; platform_python_implementation != "CPython" and extra == "speedups"
Requires-Dist: backports.zstd; (platform_python_implementation == "CPython" and python_version < "3.14") and extra == "speedups"
Dynamic: license-file

==================================
Async http client/server framework
==================================

.. image:: https://raw.githubusercontent.com/aio-libs/aiohttp/master/docs/aiohttp-plain.svg
   :height: 64px
   :width: 64px
   :alt: aiohttp logo

.. image:: https://github.com/aio-libs/aiohttp/workflows/CI/badge.svg
   :target: https://github.com/aio-libs/aiohttp/actions?query=workflow%3ACI
   :alt: GitHub Actions status for master branch

.. image:: https://codecov.io/gh/aio-libs/aiohttp/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/aio-libs/aiohttp
   :alt: codecov.io status for master branch

.. image:: https://badge.fury.io/py/aiohttp.svg
   :target: https://pypi.org/project/aiohttp
   :alt: Latest PyPI package version

.. image:: https://img.shields.io/pypi/dm/aiohttp
   :target: https://pypistats.org/packages/aiohttp
   :alt: Downloads count

.. image:: https://readthedocs.org/projects/aiohttp/badge/?version=latest
   :target: https://docs.aiohttp.org/
   :alt: Latest Read The Docs

.. image:: https://img.shields.io/endpoint?url=https://codspeed.io/badge.json
   :target: https://codspeed.io/aio-libs/aiohttp
   :alt: Codspeed.io status for aiohttp

Key Features
============

- Supports both client and server side of HTTP protocol.
- Supports both client and server Web-Sockets out-of-the-box and avoids
  Callback Hell.
- Provides Web-server with middleware and pluggable routing.

Getting started
===============

Client
------

To get something from the web:

.. code-block:: python

   import aiohttp
   import asyncio

   async def main():
       async with aiohttp.ClientSession() as session:
           async with session.get('http://python.org') as response:
               print("Status:", response.status)
               print("Content-type:", response.headers['content-type'])

               html = await response.text()
               print("Body:", html[:15], "...")

   asyncio.run(main())

This prints:

.. code-block::

   Status: 200
   Content-type: text/html; charset=utf-8
   Body: <!doctype html> ...

Coming from `requests <https://requests.readthedocs.io/>`_ ? Read `why we need so many lines <https://aiohttp.readthedocs.io/en/latest/http_request_lifecycle.html>`_.

Server
------

An example using a simple server:

.. code-block:: python

   # examples/server_simple.py
   from aiohttp import web

   async def handle(request):
       name = request.match_info.get('name', "Anonymous")
       text = "Hello, " + name
       return web.Response(text=text)

   async def wshandle(request):
       ws = web.WebSocketResponse()
       await ws.prepare(request)

       async for msg in ws:
           if msg.type == web.WSMsgType.text:
               await ws.send_str("Hello, {}".format(msg.data))
           elif msg.type == web.WSMsgType.binary:
               await ws.send_bytes(msg.data)
           elif msg.type == web.WSMsgType.close:
               break

       return ws

   app = web.Application()
   app.add_routes([web.get('/', handle),
                   web.get('/echo', wshandle),
                   web.get('/{name}', handle)])

   if __name__ == '__main__':
       web.run_app(app)
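A quick way to exercise the ``/echo`` route above is aiohttp's own WebSocket client (a hedged companion sketch, not part of the upstream example; it assumes the server is already running on ``web.run_app``'s default port 8080):

.. code-block:: python

   # Hedged sketch: talk to the /echo WebSocket route with aiohttp's client.
   import asyncio
   import aiohttp

   async def main():
       async with aiohttp.ClientSession() as session:
           # ws_connect performs the HTTP upgrade handshake for us.
           async with session.ws_connect("http://localhost:8080/echo") as ws:
               await ws.send_str("ping")
               msg = await ws.receive()
               print(msg.data)  # -> Hello, ping

   asyncio.run(main())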
Documentation
=============

https://aiohttp.readthedocs.io/

Demos
=====

https://github.com/aio-libs/aiohttp-demos

External links
==============

* `Third party libraries
  <http://aiohttp.readthedocs.io/en/latest/third_party.html>`_
* `Built with aiohttp
  <http://aiohttp.readthedocs.io/en/latest/built_with.html>`_
* `Powered by aiohttp
  <http://aiohttp.readthedocs.io/en/latest/powered_by.html>`_

Feel free to make a Pull Request for adding your link to these pages!

Communication channels
======================

*aio-libs Discussions*: https://github.com/aio-libs/aiohttp/discussions

*Matrix*: `#aio-libs:matrix.org <https://matrix.to/#/#aio-libs:matrix.org>`_

We support `Stack Overflow
<https://stackoverflow.com/questions/tagged/aiohttp>`_.
Please add *aiohttp* tag to your question there.

Requirements
============

- attrs_
- multidict_
- yarl_
- frozenlist_

Optionally you may install the aiodns_ library (highly recommended for sake of speed).

.. _aiodns: https://pypi.python.org/pypi/aiodns
.. _attrs: https://github.com/python-attrs/attrs
.. _multidict: https://pypi.python.org/pypi/multidict
.. _frozenlist: https://pypi.org/project/frozenlist/
.. _yarl: https://pypi.python.org/pypi/yarl
.. _async-timeout: https://pypi.python.org/pypi/async_timeout

License
=======

``aiohttp`` is offered under the Apache 2 license.

Keepsafe
========

The aiohttp community would like to thank Keepsafe
(https://www.getkeepsafe.com) for its support in the early days of
the project.

Source code
===========

The latest developer version is available in a GitHub repository:
https://github.com/aio-libs/aiohttp

Benchmarks
==========

If you are interested in efficiency, the AsyncIO community maintains a
list of benchmarks on the official wiki:
https://github.com/python/asyncio/wiki/Benchmarks

--------

.. image:: https://img.shields.io/matrix/aio-libs:matrix.org?label=Discuss%20on%20Matrix%20at%20%23aio-libs%3Amatrix.org&logo=matrix&server_fqdn=matrix.org&style=flat
   :target: https://matrix.to/#/%23aio-libs:matrix.org
   :alt: Matrix Room — #aio-libs:matrix.org

.. image:: https://img.shields.io/matrix/aio-libs-space:matrix.org?label=Discuss%20on%20Matrix%20at%20%23aio-libs-space%3Amatrix.org&logo=matrix&server_fqdn=matrix.org&style=flat
   :target: https://matrix.to/#/%23aio-libs-space:matrix.org
   :alt: Matrix Space — #aio-libs-space:matrix.org

.. image:: https://insights.linuxfoundation.org/api/badge/health-score?project=aiohttp
   :target: https://insights.linuxfoundation.org/project/aiohttp
   :alt: LFX Health Score
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
|
||||||
|
(https://www.getkeepsafe.com) for its support in the early days of
|
||||||
|
the project.
|
||||||
|
|
||||||
|
|
||||||
|
Source code
|
||||||
|
===========
|
||||||
|
|
||||||
|
The latest developer version is available in a GitHub repository:
|
||||||
|
https://github.com/aio-libs/aiohttp
|
||||||
|
|
||||||
|
Benchmarks
|
||||||
|
==========
|
||||||
|
|
||||||
|
If you are interested in efficiency, the AsyncIO community maintains a
|
||||||
|
list of benchmarks on the official wiki:
|
||||||
|
https://github.com/python/asyncio/wiki/Benchmarks
|
||||||
|
|
||||||
|
--------
|
||||||
|
|
||||||
|
.. image:: https://img.shields.io/matrix/aio-libs:matrix.org?label=Discuss%20on%20Matrix%20at%20%23aio-libs%3Amatrix.org&logo=matrix&server_fqdn=matrix.org&style=flat
|
||||||
|
:target: https://matrix.to/#/%23aio-libs:matrix.org
|
||||||
|
:alt: Matrix Room — #aio-libs:matrix.org
|
||||||
|
|
||||||
|
.. image:: https://img.shields.io/matrix/aio-libs-space:matrix.org?label=Discuss%20on%20Matrix%20at%20%23aio-libs-space%3Amatrix.org&logo=matrix&server_fqdn=matrix.org&style=flat
|
||||||
|
:target: https://matrix.to/#/%23aio-libs-space:matrix.org
|
||||||
|
:alt: Matrix Space — #aio-libs-space:matrix.org
|
||||||
|
|
||||||
|
.. image:: https://insights.linuxfoundation.org/api/badge/health-score?project=aiohttp
|
||||||
|
:target: https://insights.linuxfoundation.org/project/aiohttp
|
||||||
|
:alt: LFX Health Score
|
||||||
|
|
@ -0,0 +1,40 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for aiohttp
    </title>
  </head>
  <body>
    <h1>
      Links for aiohttp
    </h1>
    <a href="/aiohttp/aiohttp-3.13.2-cp311-cp311-musllinux_1_2_x86_64.whl#sha256=9acda8604a57bb60544e4646a4615c1866ee6c04a8edef9b8ee6fd1d8fa2ddc8" data-requires-python=">=3.9" data-dist-info-metadata="sha256=df1afc67261322787dd3d4ea09d6612880b7ee0e674e0c8ff8d8ffca0942b390">
      aiohttp-3.13.2-cp311-cp311-musllinux_1_2_x86_64.whl
    </a>
    <br />
    <a href="/aiohttp/aiohttp-3.13.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl#sha256=a3b6fb0c207cc661fa0bf8c66d8d9b657331ccc814f4719468af61034b478592" data-requires-python=">=3.9" data-dist-info-metadata="sha256=df1afc67261322787dd3d4ea09d6612880b7ee0e674e0c8ff8d8ffca0942b390">
      aiohttp-3.13.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
    </a>
    <br />
    <a href="/aiohttp/aiohttp-3.13.2-cp310-cp310-musllinux_1_2_x86_64.whl#sha256=05c4dd3c48fb5f15db31f57eb35374cb0c09afdde532e7fb70a75aede0ed30f6" data-requires-python=">=3.9" data-dist-info-metadata="sha256=df1afc67261322787dd3d4ea09d6612880b7ee0e674e0c8ff8d8ffca0942b390">
      aiohttp-3.13.2-cp310-cp310-musllinux_1_2_x86_64.whl
    </a>
    <br />
    <a href="/aiohttp/aiohttp-3.13.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl#sha256=b59d13c443f8e049d9e94099c7e412e34610f1f49be0f230ec656a10692a5802" data-requires-python=">=3.9" data-dist-info-metadata="sha256=df1afc67261322787dd3d4ea09d6612880b7ee0e674e0c8ff8d8ffca0942b390">
      aiohttp-3.13.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
    </a>
    <br />
    <a href="/aiohttp/aiohttp-3.13.2-cp39-cp39-musllinux_1_2_x86_64.whl#sha256=04c3971421576ed24c191f610052bcb2f059e395bc2489dd99e397f9bc466329" data-requires-python=">=3.9" data-dist-info-metadata="sha256=df1afc67261322787dd3d4ea09d6612880b7ee0e674e0c8ff8d8ffca0942b390">
      aiohttp-3.13.2-cp39-cp39-musllinux_1_2_x86_64.whl
    </a>
    <br />
    <a href="/aiohttp/aiohttp-3.13.2-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl#sha256=3a92cf4b9bea33e15ecbaa5c59921be0f23222608143d025c989924f7e3e0c07" data-requires-python=">=3.9" data-dist-info-metadata="sha256=df1afc67261322787dd3d4ea09d6612880b7ee0e674e0c8ff8d8ffca0942b390">
      aiohttp-3.13.2-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
    </a>
    <br />
  </body>
</html>
Binary file not shown.
@ -0,0 +1,108 @@
Metadata-Version: 2.4
Name: aioitertools
Version: 0.13.0
Summary: itertools and builtins for AsyncIO and mixed iterables
Author-email: Amethyst Reese <amethyst@n7.gg>
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-Expression: MIT
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries
License-File: LICENSE
Requires-Dist: typing_extensions>=4.0; python_version < '3.10'
Project-URL: Changelog, https://aioitertools.omnilib.dev/en/latest/changelog.html
Project-URL: Documentation, https://aioitertools.omnilib.dev
Project-URL: Github, https://github.com/omnilib/aioitertools

aioitertools
============

Implementation of itertools, builtins, and more for AsyncIO and mixed-type iterables.

[](https://aioitertools.omnilib.dev)
[](https://pypi.org/project/aioitertools)
[](https://aioitertools.omnilib.dev/en/latest/changelog.html)
[](https://github.com/omnilib/aioitertools/blob/main/LICENSE)


Install
-------

aioitertools requires Python 3.9 or newer.
You can install it from PyPI:

```sh
$ pip install aioitertools
```


Usage
-----

aioitertools shadows the standard library whenever possible to provide
asynchronous versions of the modules and functions you already know. It's
fully compatible with standard iterators and async iterators alike, giving
you one unified, familiar interface for interacting with iterable objects:

```python
from aioitertools import iter, next, map, zip

something = iter(...)
first_item = await next(something)

async for item in iter(something):
    ...


async def fetch(url):
    response = await aiohttp.request(...)
    return response.json

async for value in map(fetch, MANY_URLS):
    ...


async for a, b in zip(something, something_else):
    ...
```


aioitertools emulates the entire `itertools` module, offering the same
function signatures, but as async generators. All functions support
standard iterables and async iterables alike, and can take functions or
coroutines:

```python
from aioitertools import chain, islice

async def generator1(...):
    yield ...

async def generator2(...):
    yield ...

async for value in chain(generator1(), generator2()):
    ...

async for value in islice(generator1(), 2, None, 2):
    ...
```


See [builtins.py][], [itertools.py][], and [more_itertools.py][] for full
documentation of functions and abilities.


License
-------

aioitertools is copyright [Amethyst Reese](https://noswap.com), and licensed under
the MIT license. I am providing code in this repository to you under an open
source license. This is my personal repository; the license you receive to
my code is from me and not from my employer. See the `LICENSE` file for details.


[builtins.py]: https://github.com/omnilib/aioitertools/blob/main/aioitertools/builtins.py
[itertools.py]: https://github.com/omnilib/aioitertools/blob/main/aioitertools/itertools.py
[more_itertools.py]: https://github.com/omnilib/aioitertools/blob/main/aioitertools/more_itertools.py
@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for aioitertools
    </title>
  </head>
  <body>
    <h1>
      Links for aioitertools
    </h1>
    <a href="/aioitertools/aioitertools-0.13.0-py3-none-any.whl#sha256=0be0292b856f08dfac90e31f4739432f4cb6d7520ab9eb73e143f4f2fa5259be" data-requires-python=">=3.9" data-dist-info-metadata="sha256=91cc60bedec3f5f9760a37747dc07830081560c7ef42c3f9a8eb75626b1be01d">
      aioitertools-0.13.0-py3-none-any.whl
    </a>
    <br />
  </body>
</html>
Binary file not shown.
@ -0,0 +1,112 @@
Metadata-Version: 2.4
Name: aiosignal
Version: 1.4.0
Summary: aiosignal: a list of registered asynchronous callbacks
Home-page: https://github.com/aio-libs/aiosignal
Maintainer: aiohttp team <team@aiohttp.org>
Maintainer-email: team@aiohttp.org
License: Apache 2.0
Project-URL: Chat: Gitter, https://gitter.im/aio-libs/Lobby
Project-URL: CI: GitHub Actions, https://github.com/aio-libs/aiosignal/actions
Project-URL: Coverage: codecov, https://codecov.io/github/aio-libs/aiosignal
Project-URL: Docs: RTD, https://docs.aiosignal.org
Project-URL: GitHub: issues, https://github.com/aio-libs/aiosignal/issues
Project-URL: GitHub: repo, https://github.com/aio-libs/aiosignal
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Development Status :: 5 - Production/Stable
Classifier: Operating System :: POSIX
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Framework :: AsyncIO
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: frozenlist>=1.1.0
Requires-Dist: typing-extensions>=4.2; python_version < "3.13"
Dynamic: license-file

=========
aiosignal
=========

.. image:: https://github.com/aio-libs/aiosignal/workflows/CI/badge.svg
   :target: https://github.com/aio-libs/aiosignal/actions?query=workflow%3ACI
   :alt: GitHub status for master branch

.. image:: https://codecov.io/gh/aio-libs/aiosignal/branch/master/graph/badge.svg?flag=pytest
   :target: https://codecov.io/gh/aio-libs/aiosignal?flags[0]=pytest
   :alt: codecov.io status for master branch

.. image:: https://badge.fury.io/py/aiosignal.svg
   :target: https://pypi.org/project/aiosignal
   :alt: Latest PyPI package version

.. image:: https://readthedocs.org/projects/aiosignal/badge/?version=latest
   :target: https://aiosignal.readthedocs.io/
   :alt: Latest Read The Docs

.. image:: https://img.shields.io/discourse/topics?server=https%3A%2F%2Faio-libs.discourse.group%2F
   :target: https://aio-libs.discourse.group/
   :alt: Discourse group for aio-libs

.. image:: https://badges.gitter.im/Join%20Chat.svg
   :target: https://gitter.im/aio-libs/Lobby
   :alt: Chat on Gitter

Introduction
============

A project to manage callbacks in `asyncio` projects.

``Signal`` is a list of registered asynchronous callbacks.

The signal's life-cycle has two stages: after creation its content
could be filled by using standard list operations: ``sig.append()``
etc.

After you call ``sig.freeze()`` the signal is *frozen*: adding, removing
and dropping callbacks is forbidden.

The only available operation is calling the previously registered
callbacks by using ``await sig.send(data)``.
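A minimal sketch of that life-cycle, assuming only what is described
above (the owner object and the ``on_event`` callback are illustrative
placeholders, not part of the library):

.. code-block:: python

    import asyncio

    from aiosignal import Signal

    async def on_event(data):
        print("received:", data)

    async def main():
        sig = Signal(owner=object())  # any object the signal should belong to
        sig.append(on_event)          # fill it like a list...
        sig.freeze()                  # ...then freeze it
        await sig.send("hello")       # invoke every registered callback

    asyncio.run(main())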
For concrete usage examples see the `Signals
<https://docs.aiohttp.org/en/stable/web_advanced.html#aiohttp-web-signals>`_
section of the `Web Server Advanced
<https://docs.aiohttp.org/en/stable/web_advanced.html>`_ chapter of the `aiohttp
documentation`_.


Installation
------------

::

   $ pip install aiosignal


Documentation
=============

https://aiosignal.readthedocs.io/

License
=======

``aiosignal`` is offered under the Apache 2 license.

Source code
===========

The project is hosted on GitHub_

Please file an issue in the `bug tracker
<https://github.com/aio-libs/aiosignal/issues>`_ if you have found a bug
or have some suggestions to improve the library.

.. _GitHub: https://github.com/aio-libs/aiosignal
.. _aiohttp documentation: https://docs.aiohttp.org/
@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for aiosignal
    </title>
  </head>
  <body>
    <h1>
      Links for aiosignal
    </h1>
    <a href="/aiosignal/aiosignal-1.4.0-py3-none-any.whl#sha256=053243f8b92b990551949e63930a839ff0cf0b0ebbe0597b0f3fb19e1a0fe82e" data-requires-python=">=3.9" data-dist-info-metadata="sha256=09247ef1da8bc696728d47130e702e4307f6f7d101d6c485bff9f3a71ce7008d">
      aiosignal-1.4.0-py3-none-any.whl
    </a>
    <br />
  </body>
</html>
Binary file not shown.
@ -0,0 +1,123 @@
Metadata-Version: 2.3
Name: aiosqlite
Version: 0.21.0
Summary: asyncio bridge to the standard sqlite3 module
Author-email: Amethyst Reese <amethyst@n7.gg>
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
Classifier: Development Status :: 5 - Production/Stable
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Software Development :: Libraries
Requires-Dist: typing_extensions >= 4.0
Requires-Dist: attribution==1.7.1 ; extra == "dev"
Requires-Dist: black==24.3.0 ; extra == "dev"
Requires-Dist: build>=1.2 ; extra == "dev"
Requires-Dist: coverage[toml]==7.6.10 ; extra == "dev"
Requires-Dist: flake8==7.0.0 ; extra == "dev"
Requires-Dist: flake8-bugbear==24.12.12 ; extra == "dev"
Requires-Dist: flit==3.10.1 ; extra == "dev"
Requires-Dist: mypy==1.14.1 ; extra == "dev"
Requires-Dist: ufmt==2.5.1 ; extra == "dev"
Requires-Dist: usort==1.0.8.post1 ; extra == "dev"
Requires-Dist: sphinx==8.1.3 ; extra == "docs"
Requires-Dist: sphinx-mdinclude==0.6.1 ; extra == "docs"
Project-URL: Documentation, https://aiosqlite.omnilib.dev
Project-URL: Github, https://github.com/omnilib/aiosqlite
Provides-Extra: dev
Provides-Extra: docs

aiosqlite\: Sqlite for AsyncIO
==============================

.. image:: https://readthedocs.org/projects/aiosqlite/badge/?version=latest
   :target: https://aiosqlite.omnilib.dev/en/latest/?badge=latest
   :alt: Documentation Status
.. image:: https://img.shields.io/pypi/v/aiosqlite.svg
   :target: https://pypi.org/project/aiosqlite
   :alt: PyPI Release
.. image:: https://img.shields.io/badge/change-log-blue
   :target: https://github.com/omnilib/aiosqlite/blob/master/CHANGELOG.md
   :alt: Changelog
.. image:: https://img.shields.io/pypi/l/aiosqlite.svg
   :target: https://github.com/omnilib/aiosqlite/blob/master/LICENSE
   :alt: MIT Licensed

aiosqlite provides a friendly, async interface to sqlite databases.

It replicates the standard ``sqlite3`` module, but with async versions
of all the standard connection and cursor methods, plus context managers for
automatically closing connections and cursors:

.. code-block:: python

    async with aiosqlite.connect(...) as db:
        await db.execute("INSERT INTO some_table ...")
        await db.commit()

        async with db.execute("SELECT * FROM some_table") as cursor:
            async for row in cursor:
                ...

It can also be used in the traditional, procedural manner:

.. code-block:: python

    db = await aiosqlite.connect(...)
    cursor = await db.execute('SELECT * FROM some_table')
    row = await cursor.fetchone()
    rows = await cursor.fetchall()
    await cursor.close()
    await db.close()

aiosqlite also replicates most of the advanced features of ``sqlite3``:

.. code-block:: python

    async with aiosqlite.connect(...) as db:
        db.row_factory = aiosqlite.Row
        async with db.execute('SELECT * FROM some_table') as cursor:
            async for row in cursor:
                value = row['column']

        await db.execute('INSERT INTO foo some_table')
        assert db.total_changes > 0


Install
-------

aiosqlite is compatible with Python 3.9 and newer.
You can install it from PyPI:

.. code-block:: console

    $ pip install aiosqlite


Details
-------

aiosqlite allows interaction with SQLite databases on the main AsyncIO event
loop without blocking execution of other coroutines while waiting for queries
or data fetches. It does this by using a single, shared thread per connection.
This thread executes all actions within a shared request queue to prevent
overlapping actions.

Connection objects are proxies to the real connections, contain the shared
execution thread, and provide context managers to handle automatically closing
connections. Cursors are similarly proxies to the real cursors, and provide
async iterators to query results.
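As a short sketch of what that design means in practice (the ``t`` table
and the ``count`` helper below are made up for illustration): several
coroutines can share one connection, and their queries are simply
serialized on its single worker thread:

.. code-block:: python

    import asyncio

    import aiosqlite

    async def main():
        async with aiosqlite.connect(":memory:") as db:
            await db.execute("CREATE TABLE t (x INTEGER)")
            await db.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(3)])
            await db.commit()

            async def count():
                async with db.execute("SELECT COUNT(*) FROM t") as cursor:
                    row = await cursor.fetchone()
                    return row[0]

            # both queries pass through the connection's shared request queue
            print(await asyncio.gather(count(), count()))  # [3, 3]

    asyncio.run(main())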
License
-------

aiosqlite is copyright `Amethyst Reese <https://noswap.com>`_, and licensed under the
MIT license. I am providing code in this repository to you under an open source
license. This is my personal repository; the license you receive to my code
is from me and not from my employer. See the `LICENSE`_ file for details.

.. _LICENSE: https://github.com/omnilib/aiosqlite/blob/master/LICENSE
@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for aiosqlite
    </title>
  </head>
  <body>
    <h1>
      Links for aiosqlite
    </h1>
    <a href="/aiosqlite/aiosqlite-0.21.0-py3-none-any.whl#sha256=2549cf4057f95f53dcba16f2b64e8e2791d7e1adedb13197dd8ed77bb226d7d0" data-requires-python=">=3.9" data-dist-info-metadata="sha256=cda37b4c739073560a18b8d818fad2214d6497014bd9640950afdc69172301cd">
      aiosqlite-0.21.0-py3-none-any.whl
    </a>
    <br />
  </body>
</html>
Binary file not shown.
@ -0,0 +1,590 @@
Metadata-Version: 2.4
Name: airium
Version: 0.2.7
Summary: Easy and quick html builder with natural syntax correspondence (python->html). No templates needed. Serves pure pythonic library with no dependencies.
Home-page: https://gitlab.com/kamichal/airium
Author: Michał Kaczmarczyk
Author-email: michal.s.kaczmarczyk@gmail.com
Maintainer: Michał Kaczmarczyk
Maintainer-email: michal.s.kaczmarczyk@gmail.com
License: MIT
Keywords: natural html generator compiler template-less
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: System Administrators
Classifier: Intended Audience :: Telecommunications Industry
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Programming Language :: Python
Classifier: Topic :: Database :: Front-Ends
Classifier: Topic :: Documentation
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Scientific/Engineering :: Visualization
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Utilities
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: check-manifest; extra == "dev"
Requires-Dist: flake8~=7.1; extra == "dev"
Requires-Dist: mypy~=1.10; extra == "dev"
Requires-Dist: pytest-cov~=3.0; extra == "dev"
Requires-Dist: pytest-mock~=3.6; extra == "dev"
Requires-Dist: pytest~=6.2; extra == "dev"
Requires-Dist: types-beautifulsoup4~=4.12; extra == "dev"
Requires-Dist: types-requests~=2.32; extra == "dev"
Provides-Extra: parse
Requires-Dist: requests<3,>=2.12.0; extra == "parse"
Requires-Dist: beautifulsoup4<5.0,>=4.10.0; extra == "parse"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: maintainer
Dynamic: maintainer-email
Dynamic: provides-extra
Dynamic: summary

## Airium

Bidirectional `HTML`-`python` translator.

[](https://pypi.python.org/pypi/airium/)
[](https://gitlab.com/kamichal/airium/-/commits/master)
[](https://gitlab.com/kamichal/airium/-/commits/master)
[](https://pypi.org/project/airium/)
[](https://pypi.python.org/pypi/airium/)
[](https://pypi.python.org/pypi/airium/)

Key features:

- simple, straight-forward
- template-less (just the python, you may say goodbye to all the templates)
- DOM structure is strictly represented by python indentation (with context-managers)
- gives much cleaner `HTML` than regular templates
- equipped with reverse translator: `HTML` to python
- can output either pretty (default) or minified `HTML` code

# Generating `HTML` code in python using `airium`

#### Basic `HTML` page (hello world)

```python
from airium import Airium

a = Airium()

a('<!DOCTYPE html>')
with a.html(lang="pl"):
    with a.head():
        a.meta(charset="utf-8")
        a.title(_t="Airium example")

    with a.body():
        with a.h3(id="id23409231", klass='main_header'):
            a("Hello World.")

html = str(a)  # casting to string extracts the value
# or directly to UTF-8 encoded bytes:
html_bytes = bytes(a)  # casting to bytes is a shortcut to str(a).encode('utf-8')

print(html)
```

Prints such a string:

```html
<!DOCTYPE html>
<html lang="pl">
  <head>
    <meta charset="utf-8" />
    <title>Airium example</title>
  </head>
  <body>
    <h3 id="id23409231" class="main_header">
      Hello World.
    </h3>
  </body>
</html>
```

In order to store it as a file, just:

```python
with open('that/file/path.html', 'wb') as f:
    f.write(bytes(a))
```

#### Simple image in a div

```python
from airium import Airium

a = Airium()

with a.div():
    a.img(src='source.png', alt='alt text')
    a('the text')

html_str = str(a)
print(html_str)
```

```html
<div>
  <img src="source.png" alt="alt text"/>
  the text
</div>
```

#### Table

```python
from airium import Airium

a = Airium()

with a.table(id='table_372'):
    with a.tr(klass='header_row'):
        a.th(_t='no.')
        a.th(_t='Firstname')
        a.th(_t='Lastname')

    with a.tr():
        a.td(_t='1.')
        a.td(id='jbl', _t='Jill')
        a.td(_t='Smith')  # can use _t or text

    with a.tr():
        a.td(_t='2.')
        a.td(_t='Roland', id='rmd')
        a.td(_t='Mendel')

table_str = str(a)
print(table_str)

# To store it to a file:
with open('/tmp/airium_www.example.com.py', 'w') as f:
    f.write(table_str)
```

Now `table_str` contains such a string:

```html
<table id="table_372">
  <tr class="header_row">
    <th>no.</th>
    <th>Firstname</th>
    <th>Lastname</th>
  </tr>
  <tr>
    <td>1.</td>
    <td id="jbl">Jill</td>
    <td>Smith</td>
  </tr>
  <tr>
    <td>2.</td>
    <td id="rmd">Roland</td>
    <td>Mendel</td>
  </tr>
</table>
```

### Chaining shortcut for elements with only one child

_New in version 0.2.2_

Having a structure with a large number of `with` statements:

```python
from airium import Airium

a = Airium()

with a.article():
    with a.table():
        with a.thead():
            with a.tr():
                a.th(_t='Column 1')
                a.th(_t='Column 2')
        with a.tbody():
            with a.tr():
                with a.td():
                    a.strong(_t='Value 1')
                a.td(_t='Value 2')

table_str = str(a)
print(table_str)
```

You may use a shortcut that is equivalent to:

```python
from airium import Airium

a = Airium()

with a.article().table():
    with a.thead().tr():
        a.th(_t="Column 1")
        a.th(_t="Column 2")
    with a.tbody().tr():
        a.td().strong(_t="Value 1")
        a.td(_t="Value 2")

table_str = str(a)
print(table_str)
```

```html
<article>
  <table>
    <thead>
      <tr>
        <th>Column 1</th>
        <th>Column 2</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>
          <strong>Value 1</strong>
        </td>
        <td>Value 2</td>
      </tr>
    </tbody>
  </table>
</article>
```

# Options

### Pretty or Minify
By default, airium builds `HTML` code indented with spaces, with line breaks being line feed (`\n`) characters.
This can be changed when creating an `Airium` instance. All available arguments with their default values are:

```python
a = Airium(
    base_indent='  ',  # str
    current_level=0,  # int
    source_minify=False,  # bool
    source_line_break_character="\n",  # str
)
```

#### minify

In this mode the size of the generated code is minimized, i.e. it contains as little whitespace as possible.
The option can be enabled with the `source_minify` argument, i.e.:

```python
a = Airium(source_minify=True)
```

If you need to explicitly add a line break in the source code (not a `<br/>`):

```python
a = Airium(source_minify=True)
a.h1(_t="Here's your table")
with a.table():
    with a.tr():
        a.break_source_line()
        a.th(_t="Cell 11")
        a.th(_t="Cell 12")
    with a.tr():
        a.break_source_line()
        a.th(_t="Cell 21")
        a.th(_t="Cell 22")
a.break_source_line()
a.p(_t="Another content goes here")
```

This results in the following code:

```html
<h1>Here's your table</h1><table><tr>
<th>Cell 11</th><th>Cell 12</th></tr><tr>
<th>Cell 21</th><th>Cell 22</th></tr>
</table><p>Another content goes here</p>
```

Note that `break_source_line` cannot be used
in [context manager chains](#chaining-shortcut-for-elements-with-only-one-child).

#### indent style

The default indent of the generated HTML code has two spaces per indent level.
You can change it to `\t` or 4 spaces by setting the `Airium` constructor argument, e.g.:

```python
a = Airium(base_indent="\t")  # one tab symbol
a = Airium(base_indent="    ")  # 4 spaces per each indentation level
a = Airium(base_indent=" ")  # 1 space per one level
# pick one of the above statements, it can be mixed with other arguments
```

Note that this setting is ignored when the `source_minify` argument is set to `True` (see above).

There is a special case when you set the base indent to an empty string. That disables indentation,
but line breaks will still be added. To get rid of line breaks as well, check the `source_minify` argument.

#### indent level

`current_level` is an integer that can be set to a non-negative
value, which makes `airium` start indenting with the level offset given by that number, as in the sketch below.
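For example, a small sketch (the tag contents here are illustrative):

```python
from airium import Airium

a = Airium(current_level=2)  # start two indentation levels deep
with a.div():
    a.span(_t='offset example')
print(str(a))
# expected output (two extra levels of leading indentation):
#     <div>
#       <span>offset example</span>
#     </div>
```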
#### line break character

By default, just a line feed (`\n`) is used for terminating lines of the generated code.
You can change it to a different style, e.g. `\r\n` or `\r`, by setting `source_line_break_character` to the desired value.

```python
a = Airium(source_line_break_character="\r\n")  # windows' style
```

Note that the setting has no effect when the `source_minify` argument is set to `True` (see above).

# Using airium with web-frameworks

Airium can be used with frameworks like Flask or Django. It can completely replace
template engines, reducing the scatter of code files, which may bring better code organization, among other benefits.

Here is an example of using airium with Django. It implements a reusable `basic_body` and a view called `index`.

```python
# file: your_app/views.py
import contextlib
import inspect

from airium import Airium
from django.http import HttpResponse


@contextlib.contextmanager
def basic_body(a: Airium, useful_name: str = ''):
    """Works like a Django/Ninja template."""

    a('<!DOCTYPE html>')
    with a.html(lang='en'):
        with a.head():
            a.meta(charset='utf-8')
            a.meta(content='width=device-width, initial-scale=1', name='viewport')
            # do not use CSS from this URL in a production, it's just for an educational purpose
            a.link(href='https://unpkg.com/@picocss/pico@1.4.1/css/pico.css', rel='stylesheet')
            a.title(_t='Hello World')

        with a.body():
            with a.div():
                with a.nav(klass='container-fluid'):
                    with a.ul():
                        with a.li():
                            with a.a(klass='contrast', href='./'):
                                a.strong(_t="⌨ Foo Bar")
                    with a.ul():
                        with a.li():
                            a.a(klass='contrast', href='#', **{'data-theme-switcher': 'auto'}, _t='Auto')
                        with a.li():
                            a.a(klass='contrast', href='#', **{'data-theme-switcher': 'light'}, _t='Light')
                        with a.li():
                            a.a(klass='contrast', href='#', **{'data-theme-switcher': 'dark'}, _t='Dark')

            with a.header(klass='container'):
                with a.hgroup():
                    a.h1(_t=f"You're on the {useful_name}")
                    a.h2(_t="It's a page made by our automatons with a power of steam engines.")

            with a.main(klass='container'):
                yield  # This is the point where main content gets inserted

            with a.footer(klass='container'):
                with a.small():
                    margin = 'margin: auto 10px;'
                    a.span(_t='© Airium HTML generator example', style=margin)

            # do not use JS from this URL in a production, it's just for an educational purpose
            a.script(src='https://picocss.com/examples/js/minimal-theme-switcher.js')


def index(request) -> HttpResponse:
    a = Airium()
    with basic_body(a, f'main page: {request.path}'):
        with a.article():
            a.h3(_t="Hello World from Django running Airium")
            with a.p().small():
                a("This bases on ")
                with a.a(href="https://picocss.com/examples/company/"):
                    a("Pico.css / Company example")

            with a.p():
                a("Instead of a HTML template, airium has been used.")
                a("The whole body is generated by a template "
                  "and the article code looks like that:")

            with a.code().pre():
                a(inspect.getsource(index))

    return HttpResponse(bytes(a))  # from django.http import HttpResponse
```

Route it in `urls.py` just like a regular view:

```python
# file: your_app/urls.py
from django.contrib import admin
from django.urls import path

import your_app.views

urlpatterns = [
    path('index/', your_app.views.index),
    path('admin/', admin.site.urls),
]
```

The resulting web page on my machine looks like this:



# Reverse translation

Airium is equipped with a transpiler `[HTML -> py]`.
It generates python code out of a given `HTML` string.

### Using reverse translator as a binary:

Ensure you have [installed](#installation) the `[parse]` extras. Then call in the command line:

```bash
airium http://www.example.com
```

That will fetch the document and translate it to python code.
The code calls `airium` statements that reproduce the `HTML` document given.
It may give a clue how to define an `HTML` structure for a given
web page using the `airium` package.

To store the translation's result in a file:

```bash
airium http://www.example.com > /tmp/airium_example_com.py
```

You can also parse local `HTML` files:

```bash
airium /path/to/your_file.html > /tmp/airium_my_file.py
```

You may also try to parse your Django templates. I'm not sure if it works,
but there will probably not be much to fix.

### Using reverse translator as python code:

```python
from airium import from_html_to_airium

# assume we have such a page given as a string:
html_str = """\
<!DOCTYPE html>
<html lang="pl">
  <head>
    <meta charset="utf-8" />
    <title>Airium example</title>
  </head>
  <body>
    <h3 id="id23409231" class="main_header">
      Hello World.
    </h3>
  </body>
</html>
"""

# to convert the html into python, just call:

py_str = from_html_to_airium(html_str)

# airium tests ensure that the result of the conversion is equal to the string:
assert py_str == """\
#!/usr/bin/env python
# File generated by reverse AIRIUM translator (version 0.2.7).
# Any change will be overridden on next run.
# flake8: noqa E501 (line too long)

from airium import Airium

a = Airium()

a('<!DOCTYPE html>')
with a.html(lang='pl'):
    with a.head():
        a.meta(charset='utf-8')
        a.title(_t='Airium example')
    with a.body():
        a.h3(klass='main_header', id='id23409231', _t='Hello World.')
"""
```

### <a name="transpiler_limitations">Transpiler limitations</a>

> so far in version 0.2.2:

- the result of translation does not keep the exact amount of leading whitespace
  within `<pre>` tags. It comes over-indented in the python code.

  This is not, however, an issue when code is generated from python to `HTML`.

- although it keeps the proper tag structure, the transpiler does not
  chain all the `with` statements, so in some cases the generated
  code may be deeply indented.

- it's not too fast

# <a name="installation">Installation</a>

If you need a new virtual environment, call:

```bash
virtualenv venv
source venv/bin/activate
```

Having it activated - you may install airium like this:

```bash
pip install airium
```

In order to use reverse translation - two additional packages are needed, run:

```bash
pip install airium[parse]
```

Then check if the transpiler works by calling:

```bash
airium --help
```

> Enjoy!
@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for airium
    </title>
  </head>
  <body>
    <h1>
      Links for airium
    </h1>
    <a href="/airium/airium-0.2.7-py3-none-any.whl#sha256=35e3ae334327b17b7c2fc39bb57ab2c48171ca849f8cf3dff11437d1e054952e" data-dist-info-metadata="sha256=48022884c676a59c85113445ae9e14ad7f149808fb5d62c2660f8c4567489fe5">
      airium-0.2.7-py3-none-any.whl
    </a>
    <br />
  </body>
</html>
Binary file not shown.
@ -0,0 +1,187 @@
Metadata-Version: 2.1
Name: apeye-core
Version: 1.1.5
Summary: Core (offline) functionality for the apeye library.
Project-URL: Homepage, https://github.com/domdfcoding/apeye-core
Project-URL: Issue Tracker, https://github.com/domdfcoding/apeye-core/issues
Project-URL: Source Code, https://github.com/domdfcoding/apeye-core
Author-email: Dominic Davis-Foster <dominic@davis-foster.co.uk>
License: Copyright (c) 2022, Dominic Davis-Foster

        Redistribution and use in source and binary forms, with or without modification,
        are permitted provided that the following conditions are met:

        * Redistributions of source code must retain the above copyright notice,
          this list of conditions and the following disclaimer.
        * Redistributions in binary form must reproduce the above copyright notice,
          this list of conditions and the following disclaimer in the documentation
          and/or other materials provided with the distribution.
        * Neither the name of the copyright holder nor the names of its contributors
          may be used to endorse or promote products derived from this software without
          specific prior written permission.

        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
        "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
        LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
        A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER
        OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
        EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
        PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
        PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
        LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
        NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
        SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
License-File: LICENSE
Keywords: url
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.6.1
Requires-Dist: domdf-python-tools>=2.6.0
Requires-Dist: idna>=2.5
Description-Content-Type: text/x-rst

===========
apeye-core
===========

.. start short_desc

**Core (offline) functionality for the apeye library.**

.. end short_desc


.. start shields

.. list-table::
   :stub-columns: 1
   :widths: 10 90

   * - Tests
     - |actions_linux| |actions_windows| |actions_macos| |coveralls|
   * - PyPI
     - |pypi-version| |supported-versions| |supported-implementations| |wheel|
   * - Anaconda
     - |conda-version| |conda-platform|
   * - Activity
     - |commits-latest| |commits-since| |maintained| |pypi-downloads|
   * - QA
     - |codefactor| |actions_flake8| |actions_mypy|
   * - Other
     - |license| |language| |requires|

.. |actions_linux| image:: https://github.com/domdfcoding/apeye-core/workflows/Linux/badge.svg
   :target: https://github.com/domdfcoding/apeye-core/actions?query=workflow%3A%22Linux%22
   :alt: Linux Test Status

.. |actions_windows| image:: https://github.com/domdfcoding/apeye-core/workflows/Windows/badge.svg
   :target: https://github.com/domdfcoding/apeye-core/actions?query=workflow%3A%22Windows%22
   :alt: Windows Test Status

.. |actions_macos| image:: https://github.com/domdfcoding/apeye-core/workflows/macOS/badge.svg
   :target: https://github.com/domdfcoding/apeye-core/actions?query=workflow%3A%22macOS%22
   :alt: macOS Test Status

.. |actions_flake8| image:: https://github.com/domdfcoding/apeye-core/workflows/Flake8/badge.svg
   :target: https://github.com/domdfcoding/apeye-core/actions?query=workflow%3A%22Flake8%22
   :alt: Flake8 Status

.. |actions_mypy| image:: https://github.com/domdfcoding/apeye-core/workflows/mypy/badge.svg
   :target: https://github.com/domdfcoding/apeye-core/actions?query=workflow%3A%22mypy%22
   :alt: mypy status

.. |requires| image:: https://dependency-dash.repo-helper.uk/github/domdfcoding/apeye-core/badge.svg
   :target: https://dependency-dash.repo-helper.uk/github/domdfcoding/apeye-core/
   :alt: Requirements Status

.. |coveralls| image:: https://img.shields.io/coveralls/github/domdfcoding/apeye-core/master?logo=coveralls
   :target: https://coveralls.io/github/domdfcoding/apeye-core?branch=master
   :alt: Coverage

.. |codefactor| image:: https://img.shields.io/codefactor/grade/github/domdfcoding/apeye-core?logo=codefactor
   :target: https://www.codefactor.io/repository/github/domdfcoding/apeye-core
   :alt: CodeFactor Grade

.. |pypi-version| image:: https://img.shields.io/pypi/v/apeye-core
   :target: https://pypi.org/project/apeye-core/
   :alt: PyPI - Package Version

.. |supported-versions| image:: https://img.shields.io/pypi/pyversions/apeye-core?logo=python&logoColor=white
   :target: https://pypi.org/project/apeye-core/
   :alt: PyPI - Supported Python Versions

.. |supported-implementations| image:: https://img.shields.io/pypi/implementation/apeye-core
   :target: https://pypi.org/project/apeye-core/
   :alt: PyPI - Supported Implementations

.. |wheel| image:: https://img.shields.io/pypi/wheel/apeye-core
   :target: https://pypi.org/project/apeye-core/
   :alt: PyPI - Wheel

.. |conda-version| image:: https://img.shields.io/conda/v/conda-forge/apeye-core?logo=anaconda
   :target: https://anaconda.org/conda-forge/apeye-core
   :alt: Conda - Package Version

.. |conda-platform| image:: https://img.shields.io/conda/pn/conda-forge/apeye-core?label=conda%7Cplatform
   :target: https://anaconda.org/conda-forge/apeye-core
   :alt: Conda - Platform

.. |license| image:: https://img.shields.io/github/license/domdfcoding/apeye-core
   :target: https://github.com/domdfcoding/apeye-core/blob/master/LICENSE
   :alt: License

.. |language| image:: https://img.shields.io/github/languages/top/domdfcoding/apeye-core
   :alt: GitHub top language

.. |commits-since| image:: https://img.shields.io/github/commits-since/domdfcoding/apeye-core/v1.1.5
   :target: https://github.com/domdfcoding/apeye-core/pulse
   :alt: GitHub commits since tagged version

.. |commits-latest| image:: https://img.shields.io/github/last-commit/domdfcoding/apeye-core
   :target: https://github.com/domdfcoding/apeye-core/commit/master
   :alt: GitHub last commit

.. |maintained| image:: https://img.shields.io/maintenance/yes/2024
   :alt: Maintenance

.. |pypi-downloads| image:: https://img.shields.io/pypi/dm/apeye-core
   :target: https://pypi.org/project/apeye-core/
   :alt: PyPI - Downloads

.. end shields

Installation
--------------

.. start installation

``apeye-core`` can be installed from PyPI or Anaconda.

To install with ``pip``:

.. code-block:: bash

    $ python -m pip install apeye-core

To install with ``conda``:

.. code-block:: bash

    $ conda install -c conda-forge apeye-core

.. end installation
@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for apeye-core
    </title>
  </head>
  <body>
    <h1>
      Links for apeye-core
    </h1>
    <a href="/apeye-core/apeye_core-1.1.5-py3-none-any.whl#sha256=dc27a93f8c9e246b3b238c5ea51edf6115ab2618ef029b9f2d9a190ec8228fbf" data-requires-python=">=3.6.1" data-dist-info-metadata="sha256=751bbcd20a27f156c12183849bc78419fbac8ca5a51c29fb5137e01e6aeb5e78">
      apeye_core-1.1.5-py3-none-any.whl
    </a>
    <br />
  </body>
</html>
Binary file not shown.
|
|
@ -0,0 +1,210 @@

Metadata-Version: 2.1
Name: apeye
Version: 1.4.1
Summary: Handy tools for working with URLs and APIs.
Keywords: api,cache,requests,rest,url
Author-email: Dominic Davis-Foster <dominic@davis-foster.co.uk>
Requires-Python: >=3.6.1
Description-Content-Type: text/x-rst
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Dist: apeye-core>=1.0.0b2
Requires-Dist: domdf-python-tools>=2.6.0
Requires-Dist: platformdirs>=2.3.0
Requires-Dist: requests>=2.24.0
Requires-Dist: cachecontrol[filecache]>=0.12.6 ; extra == "all"
Requires-Dist: lockfile>=0.12.2 ; extra == "all"
Requires-Dist: cachecontrol[filecache]>=0.12.6 ; extra == "limiter"
Requires-Dist: lockfile>=0.12.2 ; extra == "limiter"
Project-URL: Documentation, https://apeye.readthedocs.io/en/latest
Project-URL: Homepage, https://github.com/domdfcoding/apeye
Project-URL: Issue Tracker, https://github.com/domdfcoding/apeye/issues
Project-URL: Source Code, https://github.com/domdfcoding/apeye
Provides-Extra: all
Provides-Extra: limiter
======
apeye
======

.. start short_desc

**Handy tools for working with URLs and APIs.**

.. end short_desc


.. start shields

.. list-table::
    :stub-columns: 1
    :widths: 10 90

    * - Docs
      - |docs| |docs_check|
    * - Tests
      - |actions_linux| |actions_windows| |actions_macos| |coveralls|
    * - PyPI
      - |pypi-version| |supported-versions| |supported-implementations| |wheel|
    * - Anaconda
      - |conda-version| |conda-platform|
    * - Activity
      - |commits-latest| |commits-since| |maintained| |pypi-downloads|
    * - QA
      - |codefactor| |actions_flake8| |actions_mypy|
    * - Other
      - |license| |language| |requires|

.. |docs| image:: https://img.shields.io/readthedocs/apeye/latest?logo=read-the-docs
    :target: https://apeye.readthedocs.io/en/latest
    :alt: Documentation Build Status

.. |docs_check| image:: https://github.com/domdfcoding/apeye/workflows/Docs%20Check/badge.svg
    :target: https://github.com/domdfcoding/apeye/actions?query=workflow%3A%22Docs+Check%22
    :alt: Docs Check Status

.. |actions_linux| image:: https://github.com/domdfcoding/apeye/workflows/Linux/badge.svg
    :target: https://github.com/domdfcoding/apeye/actions?query=workflow%3A%22Linux%22
    :alt: Linux Test Status

.. |actions_windows| image:: https://github.com/domdfcoding/apeye/workflows/Windows/badge.svg
    :target: https://github.com/domdfcoding/apeye/actions?query=workflow%3A%22Windows%22
    :alt: Windows Test Status

.. |actions_macos| image:: https://github.com/domdfcoding/apeye/workflows/macOS/badge.svg
    :target: https://github.com/domdfcoding/apeye/actions?query=workflow%3A%22macOS%22
    :alt: macOS Test Status

.. |actions_flake8| image:: https://github.com/domdfcoding/apeye/workflows/Flake8/badge.svg
    :target: https://github.com/domdfcoding/apeye/actions?query=workflow%3A%22Flake8%22
    :alt: Flake8 Status

.. |actions_mypy| image:: https://github.com/domdfcoding/apeye/workflows/mypy/badge.svg
    :target: https://github.com/domdfcoding/apeye/actions?query=workflow%3A%22mypy%22
    :alt: mypy status

.. |requires| image:: https://dependency-dash.repo-helper.uk/github/domdfcoding/apeye/badge.svg
    :target: https://dependency-dash.repo-helper.uk/github/domdfcoding/apeye/
    :alt: Requirements Status

.. |coveralls| image:: https://img.shields.io/coveralls/github/domdfcoding/apeye/master?logo=coveralls
    :target: https://coveralls.io/github/domdfcoding/apeye?branch=master
    :alt: Coverage

.. |codefactor| image:: https://img.shields.io/codefactor/grade/github/domdfcoding/apeye?logo=codefactor
    :target: https://www.codefactor.io/repository/github/domdfcoding/apeye
    :alt: CodeFactor Grade

.. |pypi-version| image:: https://img.shields.io/pypi/v/apeye
    :target: https://pypi.org/project/apeye/
    :alt: PyPI - Package Version

.. |supported-versions| image:: https://img.shields.io/pypi/pyversions/apeye?logo=python&logoColor=white
    :target: https://pypi.org/project/apeye/
    :alt: PyPI - Supported Python Versions

.. |supported-implementations| image:: https://img.shields.io/pypi/implementation/apeye
    :target: https://pypi.org/project/apeye/
    :alt: PyPI - Supported Implementations

.. |wheel| image:: https://img.shields.io/pypi/wheel/apeye
    :target: https://pypi.org/project/apeye/
    :alt: PyPI - Wheel

.. |conda-version| image:: https://img.shields.io/conda/v/domdfcoding/apeye?logo=anaconda
    :target: https://anaconda.org/domdfcoding/apeye
    :alt: Conda - Package Version

.. |conda-platform| image:: https://img.shields.io/conda/pn/domdfcoding/apeye?label=conda%7Cplatform
    :target: https://anaconda.org/domdfcoding/apeye
    :alt: Conda - Platform

.. |license| image:: https://img.shields.io/github/license/domdfcoding/apeye
    :target: https://github.com/domdfcoding/apeye/blob/master/LICENSE
    :alt: License

.. |language| image:: https://img.shields.io/github/languages/top/domdfcoding/apeye
    :alt: GitHub top language

.. |commits-since| image:: https://img.shields.io/github/commits-since/domdfcoding/apeye/v1.4.1
    :target: https://github.com/domdfcoding/apeye/pulse
    :alt: GitHub commits since tagged version

.. |commits-latest| image:: https://img.shields.io/github/last-commit/domdfcoding/apeye
    :target: https://github.com/domdfcoding/apeye/commit/master
    :alt: GitHub last commit

.. |maintained| image:: https://img.shields.io/maintenance/yes/2023
    :alt: Maintenance

.. |pypi-downloads| image:: https://img.shields.io/pypi/dm/apeye
    :target: https://pypi.org/project/apeye/
    :alt: PyPI - Downloads

.. end shields


``apeye`` provides:

* ``pathlib.Path``\-like objects to represent URLs
* a JSON-backed cache decorator for functions
* a CacheControl_ adapter to limit the rate of requests

See `the documentation`_ for more details, and the short sketch below.

.. _CacheControl: https://github.com/ionrock/cachecontrol
.. _the documentation: https://apeye.readthedocs.io/en/latest/api/cache.html
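
A minimal sketch of the ``pathlib.Path``\-like URL objects from the list above. This is illustrative only: the ``/`` composition and the ``name``/``parent`` attributes follow our reading of the apeye docs, and the printed values are assumptions rather than captured output.

.. code-block:: python

    from apeye.url import URL

    # URLs compose with ``/`` just like pathlib paths
    base = URL("https://api.example.com/v1")
    endpoint = base / "users" / "42"

    print(endpoint)         # https://api.example.com/v1/users/42
    print(endpoint.name)    # 42
    print(endpoint.parent)  # https://api.example.com/v1/users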

Installation
--------------

.. start installation

``apeye`` can be installed from PyPI or Anaconda.

To install with ``pip``:

.. code-block:: bash

    $ python -m pip install apeye

To install with ``conda``:

* First add the required channels

  .. code-block:: bash

      $ conda config --add channels https://conda.anaconda.org/conda-forge
      $ conda config --add channels https://conda.anaconda.org/domdfcoding

* Then install

  .. code-block:: bash

      $ conda install apeye

.. end installation


.. attention::

    In v0.9.0 and above the ``rate_limiter`` module requires the ``limiter`` extra to be installed:

    .. code-block:: bash

        $ python -m pip install apeye[limiter]
@ -0,0 +1,20 @@

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta name="generator" content="simple503 version 0.4.0" />
        <meta name="pypi:repository-version" content="1.0" />
        <meta charset="UTF-8" />
        <title>
            Links for apeye
        </title>
    </head>
    <body>
        <h1>
            Links for apeye
        </h1>
        <a href="/apeye/apeye-1.4.1-py3-none-any.whl#sha256=44e58a9104ec189bf42e76b3a7fe91e2b2879d96d48e9a77e5e32ff699c9204e" data-requires-python=">=3.6.1" data-dist-info-metadata="sha256=c76bd745f0ea8d7105ed23a0827bf960cd651e8e071dbdeb62946a390ddf86c1">
            apeye-1.4.1-py3-none-any.whl
        </a>
        <br />
    </body>
</html>
Binary file not shown.
@ -0,0 +1,250 @@

Metadata-Version: 2.4
Name: asyncua
Version: 1.1.8
Summary: Pure Python OPC-UA client and server library
Project-URL: Homepage, http://freeopcua.github.io/
Project-URL: Repository, https://github.com/FreeOpcUa/opcua-asyncio
Author-email: Olivier Roulet-Dubonnet <olivier.roulet@gmail.com>
License: GNU Lesser General Public License v3 or later
License-File: COPYING
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: aiofiles
Requires-Dist: aiosqlite
Requires-Dist: cryptography>42.0.0
Requires-Dist: pyopenssl>23.2.0
Requires-Dist: python-dateutil
Requires-Dist: pytz
Requires-Dist: sortedcontainers
Requires-Dist: typing-extensions
Requires-Dist: wait-for2==0.3.2; python_version < '3.12'
Description-Content-Type: text/markdown

OPC UA / IEC 62541 Client and Server for Python >= 3.8 and pypy3.
http://freeopcua.github.io/, https://github.com/FreeOpcUa/opcua-asyncio

[CI status](https://github.com/FreeOpcUa/opcua-asyncio/actions)
[PyPI package](https://badge.fury.io/py/asyncua)

# opcua-asyncio

opcua-asyncio is an asyncio-based asynchronous OPC UA client and server based on python-opcua, removing support of python < 3.8.
Asynchronous programming allows for simpler code (e.g. less need for locks) and can potentially provide performance improvements.
This library also provides a [synchronous wrapper](https://github.com/FreeOpcUa/opcua-asyncio/blob/master/asyncua/sync.py) over the async API, which can be used in synchronous code instead of python-opcua.

---

The OPC UA binary protocol implementation has undergone extensive testing with various OPC UA stacks. The API offers both a low-level interface to send and receive all UA-defined structures and high-level classes allowing you to write a server or a client in a few lines. It is easy to mix high-level objects and low-level UA calls in one application. Most low-level code is autogenerated from the XML specification.

The test coverage reported by coverage.py is over 95%, with the majority of the non-tested code being autogenerated code that is not currently in use.

# Warnings

opcua-asyncio is open-source and comes with absolutely no warranty. We try to keep it as bug-free as possible, and try to keep the API stable, but bugs and API changes will happen! In particular, API changes are expected to take place prior to any 1.0 release.

Some methods have been renamed from get_xx to read_xx and from set_xx to write_xx to better follow OPC UA naming conventions.

Version 0.9.9 introduces some argument renaming due to more automatic code generation, especially the arguments to NodeId, BrowseName, LocalizedText and DataValue, which are now CamelCase instead of lower case, following the OPC UA conventions used in all other structures in this library.

# Installation

With uv/pip:

```
uv pip install asyncua
```

# Usage

We assume that you already have some experience with Python, the asyncio module, the async / await syntax and the concept of asyncio Tasks.

## Client class

The `Client` class provides a high level API for connecting to OPC UA servers, session management and access to basic address space services.
The client can be used as a context manager; it will then automatically connect and disconnect within the `with` block.

```python
from asyncua import Client

async with Client(url='opc.tcp://localhost:4840/freeopcua/server/') as client:
    while True:
        # Do something with client
        node = client.get_node('i=85')
        value = await node.read_value()
```

Of course, you can also call the `connect` and `disconnect` methods yourself if you do not want to use the context manager.
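
A short sketch of that manual variant; the `connect`/`disconnect` calls are the ones named above, while the surrounding asyncio boilerplate and the example node id are ours:

```python
import asyncio

from asyncua import Client


async def main():
    client = Client(url='opc.tcp://localhost:4840/freeopcua/server/')
    await client.connect()
    try:
        node = client.get_node('i=85')   # example node id
        print(await node.read_value())
    finally:
        # Always release the session, even if the body raises
        await client.disconnect()

asyncio.run(main())
```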

See the example folder and the code for more information on the client API.

## Node class

The `Node` class provides a high level API for management of nodes as well as data access services.
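
For example, a hedged sketch of node management and data access through `Node` (the node id is a placeholder; `read_value`/`write_value`/`read_browse_name` follow the read/write naming convention described above):

```python
from asyncua import Client


async def bump_value(url: str) -> None:
    async with Client(url=url) as client:
        node = client.get_node("ns=2;i=2")    # placeholder node id
        print(await node.read_browse_name())  # node management: browse name
        value = await node.read_value()       # data access: read
        await node.write_value(value + 1.0)   # data access: write
```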

## Subscription class

The `Subscription` class provides a high level API for management of monitored items.
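
A minimal sketch of a data-change subscription (handler method name per the examples in this repository; the node id and the 500 ms publish interval are ours):

```python
from asyncua import Client


class SubHandler:
    """Receives monitored-item callbacks from the subscription."""

    def datachange_notification(self, node, val, data):
        print(f"{node} changed to {val}")


async def watch(url: str) -> None:
    async with Client(url=url) as client:
        node = client.get_node("ns=2;i=2")                         # placeholder node id
        sub = await client.create_subscription(500, SubHandler())  # 500 ms publish interval
        await sub.subscribe_data_change(node)
```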

## Server class

The `Server` class provides a high level API for creation of OPC UA server instances.
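
A condensed sketch along the lines of the server-minimal example linked below (the namespace URI and variable names are placeholders):

```python
import asyncio

from asyncua import Server


async def main():
    server = Server()
    await server.init()
    server.set_endpoint("opc.tcp://0.0.0.0:4840/freeopcua/server/")

    # Register a namespace and expose one writable variable
    idx = await server.register_namespace("http://example.org")  # placeholder URI
    obj = await server.nodes.objects.add_object(idx, "MyObject")
    var = await obj.add_variable(idx, "MyVariable", 0.0)
    await var.set_writable()

    async with server:
        while True:
            await asyncio.sleep(1)
            await var.write_value(await var.read_value() + 0.1)

asyncio.run(main())
```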

# Documentation

The documentation is available here [ReadTheDocs](http://opcua-asyncio.readthedocs.org/en/latest/).

The API remains mostly unchanged with regard to [python-opcua](http://opcua-asyncio.rtfd.io/).
The main difference is that most methods are now asynchronous.
Please have a look at [the examples](https://github.com/FreeOpcUa/opcua-asyncio/blob/master/examples) and/or the code.

A simple GUI client is available at: https://github.com/FreeOpcUa/opcua-client-gui

Browse the examples: https://github.com/FreeOpcUa/opcua-asyncio/tree/master/examples

The minimal examples are a good starting point.
Minimal client example: https://github.com/FreeOpcUa/opcua-asyncio/blob/master/examples/client-minimal.py
Minimal server example: https://github.com/FreeOpcUa/opcua-asyncio/blob/master/examples/server-minimal.py

A set of command line tools is also available: https://github.com/FreeOpcUa/opcua-asyncio/tree/master/tools

- `uadiscover` (find_servers, get_endpoints and find_servers_on_network calls)
- `uals` (list children of a node)
- `uahistoryread`
- `uaread` (read attribute of a node)
- `uawrite` (write attribute of a node)
- `uacall` (call method of a node)
- `uasubscribe` (subscribe to a node and print datachange events)
- `uaclient` (connect to server and start python shell)
- `uaserver` (starts a demo OPC UA server)

  `tools/uaserver --populate --certificate cert.pem --private_key pk.pem`

How to generate a certificate: https://github.com/FreeOpcUa/opcua-asyncio/tree/master/examples/generate_certificate.sh

## Client support

What works:

- connection to server, opening channel, session
- browsing and reading attribute values
- getting nodes by path and nodeids
- creating subscriptions
- subscribing to items for data change
- subscribing to events
- adding nodes
- method call
- user and password
- history read
- login with certificate
- communication encryption
- removing nodes

Tested servers: freeopcua C++, freeopcua Python, prosys, kepware, beckhoff, winCC, B&R, …

Not implemented yet:

- localized text feature
- XML protocol
- UDP (PubSub stuff)
- WebSocket
- maybe automatic reconnection...

## Server support

What works:

- creating channel and sessions
- read/set attributes and browse
- getting nodes by path and nodeids
- autogenerate address space from spec
- adding nodes to address space
- datachange events
- events
- methods
- basic user implementation (one existing user called admin, which can be disabled, all others are read only)
- encryption
- certificate handling
- removing nodes
- history support for data change and events
- more high level solution to create custom structures

Tested clients: freeopcua C++, freeopcua Python, uaexpert, prosys, quickopc

Not yet implemented:

- UDP (PubSub stuff)
- WebSocket
- session restore
- alarms
- XML protocol
- views
- localized text features
- better security model with users and password

### Running a server on a Raspberry Pi

Setting up the standard address space from XML is the most time-consuming step of the startup process, which may lead to long startup times on less powerful devices like a Raspberry Pi. By passing a path to a cache file to the server constructor, a shelve holding the address space will be created during the first startup. All following startups will make use of the cache file, which leads to significantly better startup performance (~3.5 vs 125 seconds on a Raspberry Pi Model B).
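
A sketch of the cache-file startup described above. Treat the parameter name as an assumption: in recent asyncua versions the cache path is accepted by `Server.init()` as `shelf_file`, but check the signature in the version you run:

```python
from asyncua import Server


async def start_server(cache_path: str) -> Server:
    server = Server()
    # First run: parses the XML spec and writes the shelve cache.
    # Later runs: loads the cache instead, cutting startup time.
    await server.init(shelf_file=cache_path)  # parameter name assumed, see above
    server.set_endpoint("opc.tcp://0.0.0.0:4840/freeopcua/server/")
    await server.start()
    return server
```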

# Development

Code follows PEP8, apart from line length (max 160 characters) and OPC UA structures, which keep the camel case from the XML definition.

All protocol code is under the `asyncua` directory:

- `asyncua/ua` contains all UA structures from the specification; most are autogenerated
- `asyncua/common` contains high level objects and methods used both in server and client
- `asyncua/client` contains client-specific code
- `asyncua/server` contains server-specific code
- `asyncua/utils` contains some utility functions and classes
- `asyncua/tools` contains code for command-line tools
- `schemas` contains the XML and text files from the specification and the python scripts used to autogenerate code
- `tests` contains tests
- `docs` contains files to auto-generate documentation from doc strings
- `examples` contains many example files
- `examples/sync` contains many example files using the sync API
- `tools` contains python scripts that can be used to run command-line tools from the repository without installing

## Running a command for testing:

```
uv run uals -u opc.tcp://localhost:4840/myserver
```

## Running tests:

```
uv run pytest -v -s tests
```

## Coverage

```
uv run pytest -v -s --cov asyncua --cov-report=html
```

## Linting

To apply linting checks (including ruff and mypy) at each commit, run:

```bash
uv sync --group lint
uv run pre-commit install
```

You can also run all linters on all files with:

```bash
uv run pre-commit run -a
```
@ -0,0 +1,20 @@

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta name="generator" content="simple503 version 0.4.0" />
        <meta name="pypi:repository-version" content="1.0" />
        <meta charset="UTF-8" />
        <title>
            Links for asyncua
        </title>
    </head>
    <body>
        <h1>
            Links for asyncua
        </h1>
        <a href="/asyncua/asyncua-1.1.8-py3-none-any.whl#sha256=40c57151b93537beb77cb3f1a0190d75cef5326e8c40978de28b69e5b41e6ede" data-requires-python=">=3.9" data-dist-info-metadata="sha256=54ed6b16b77d680fef02438810d634209456528155261466b252b9bc8bfdfc71">
            asyncua-1.1.8-py3-none-any.whl
        </a>
        <br />
    </body>
</html>
Binary file not shown.

@ -0,0 +1,235 @@

Metadata-Version: 2.4
Name: attrs
Version: 25.4.0
Summary: Classes Without Boilerplate
Project-URL: Documentation, https://www.attrs.org/
Project-URL: Changelog, https://www.attrs.org/en/stable/changelog.html
Project-URL: GitHub, https://github.com/python-attrs/attrs
Project-URL: Funding, https://github.com/sponsors/hynek
Project-URL: Tidelift, https://tidelift.com/subscription/pkg/pypi-attrs?utm_source=pypi-attrs&utm_medium=pypi
Author-email: Hynek Schlawack <hs@ox.cx>
License-Expression: MIT
License-File: LICENSE
Keywords: attribute,boilerplate,class
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
<p align="center">
  <a href="https://www.attrs.org/">
    <img src="https://raw.githubusercontent.com/python-attrs/attrs/main/docs/_static/attrs_logo.svg" width="35%" alt="attrs" />
  </a>
</p>

*attrs* is the Python package that will bring back the **joy** of **writing classes** by relieving you from the drudgery of implementing object protocols (aka [dunder methods](https://www.attrs.org/en/latest/glossary.html#term-dunder-methods)).
Trusted by NASA for [Mars missions since 2020](https://github.com/readme/featured/nasa-ingenuity-helicopter)!

Its main goal is to help you to write **concise** and **correct** software without slowing down your code.


## Sponsors

*attrs* would not be possible without our [amazing sponsors](https://github.com/sponsors/hynek).
Especially those generously supporting us at the *The Organization* tier and higher:

<!-- sponsor-break-begin -->

<p align="center">

<!-- [[[cog
import pathlib, tomllib

for sponsor in tomllib.loads(pathlib.Path("pyproject.toml").read_text())["tool"]["sponcon"]["sponsors"]:
    print(f'<a href="{sponsor["url"]}"><img title="{sponsor["title"]}" src="https://www.attrs.org/en/25.4.0/_static/sponsors/{sponsor["img"]}" width="190" /></a>')
]]] -->
<a href="https://www.variomedia.de/"><img title="Variomedia AG" src="https://www.attrs.org/en/25.4.0/_static/sponsors/Variomedia.svg" width="190" /></a>
<a href="https://tidelift.com/?utm_source=lifter&utm_medium=referral&utm_campaign=hynek"><img title="Tidelift" src="https://www.attrs.org/en/25.4.0/_static/sponsors/Tidelift.svg" width="190" /></a>
<a href="https://privacy-solutions.org/"><img title="Privacy Solutions" src="https://www.attrs.org/en/25.4.0/_static/sponsors/Privacy-Solutions.svg" width="190" /></a>
<a href="https://filepreviews.io/"><img title="FilePreviews" src="https://www.attrs.org/en/25.4.0/_static/sponsors/FilePreviews.svg" width="190" /></a>
<a href="https://polar.sh/"><img title="Polar" src="https://www.attrs.org/en/25.4.0/_static/sponsors/Polar.svg" width="190" /></a>
<!-- [[[end]]] -->

</p>

<!-- sponsor-break-end -->

<p align="center">
  <strong>Please consider <a href="https://github.com/sponsors/hynek">joining them</a> to help make <em>attrs</em>’s maintenance more sustainable!</strong>
</p>

<!-- teaser-end -->

## Example

*attrs* gives you a class decorator and a way to declaratively define the attributes on that class:

<!-- code-begin -->

```pycon
>>> from attrs import asdict, define, make_class, Factory

>>> @define
... class SomeClass:
...     a_number: int = 42
...     list_of_numbers: list[int] = Factory(list)
...
...     def hard_math(self, another_number):
...         return self.a_number + sum(self.list_of_numbers) * another_number


>>> sc = SomeClass(1, [1, 2, 3])
>>> sc
SomeClass(a_number=1, list_of_numbers=[1, 2, 3])

>>> sc.hard_math(3)
19
>>> sc == SomeClass(1, [1, 2, 3])
True
>>> sc != SomeClass(2, [3, 2, 1])
True

>>> asdict(sc)
{'a_number': 1, 'list_of_numbers': [1, 2, 3]}

>>> SomeClass()
SomeClass(a_number=42, list_of_numbers=[])

>>> C = make_class("C", ["a", "b"])
>>> C("foo", "bar")
C(a='foo', b='bar')
```

After *declaring* your attributes, *attrs* gives you:

- a concise and explicit overview of the class's attributes,
- a nice human-readable `__repr__`,
- equality-checking methods,
- an initializer,
- and much more,

*without* writing dull boilerplate code again and again and *without* runtime performance penalties.

---

This example uses *attrs*'s modern APIs that have been introduced in version 20.1.0, and the *attrs* package import name that has been added in version 21.3.0.
The classic APIs (`@attr.s`, `attr.ib`, plus their serious-business aliases) and the `attr` package import name will remain **indefinitely**.

Check out [*On The Core API Names*](https://www.attrs.org/en/latest/names.html) for an in-depth explanation!


### Hate Type Annotations!?

No problem!
Types are entirely **optional** with *attrs*.
Simply assign `attrs.field()` to the attributes instead of annotating them with types:

```python
from attrs import define, field

@define
class SomeClass:
    a_number = field(default=42)
    list_of_numbers = field(factory=list)
```


## Data Classes

On the tin, *attrs* might remind you of `dataclasses` (and indeed, `dataclasses` [are a descendant](https://hynek.me/articles/import-attrs/) of *attrs*).
In practice it does a lot more and is more flexible.
For instance, it allows you to define [special handling of NumPy arrays for equality checks](https://www.attrs.org/en/stable/comparison.html#customization), allows more ways to [plug into the initialization process](https://www.attrs.org/en/stable/init.html#hooking-yourself-into-initialization), has a replacement for `__init_subclass__`, and allows for stepping through the generated methods using a debugger.

For more details, please refer to our [comparison page](https://www.attrs.org/en/stable/why.html#data-classes), but generally speaking, we are more likely to commit crimes against nature to make things work that one would expect to work, but that are quite complicated in practice.


## Project Information

- [**Changelog**](https://www.attrs.org/en/stable/changelog.html)
- [**Documentation**](https://www.attrs.org/)
- [**PyPI**](https://pypi.org/project/attrs/)
- [**Source Code**](https://github.com/python-attrs/attrs)
- [**Contributing**](https://github.com/python-attrs/attrs/blob/main/.github/CONTRIBUTING.md)
- [**Third-party Extensions**](https://github.com/python-attrs/attrs/wiki/Extensions-to-attrs)
- **Get Help**: use the `python-attrs` tag on [Stack Overflow](https://stackoverflow.com/questions/tagged/python-attrs)


### *attrs* for Enterprise

Available as part of the [Tidelift Subscription](https://tidelift.com/?utm_source=lifter&utm_medium=referral&utm_campaign=hynek).

The maintainers of *attrs* and thousands of other packages are working with Tidelift to deliver commercial support and maintenance for the open source packages you use to build your applications.
Save time, reduce risk, and improve code health, while paying the maintainers of the exact packages you use.

## Release Information

### Backwards-incompatible Changes

- Class-level `kw_only=True` behavior is now consistent with `dataclasses`.

  Previously, a class that sets `kw_only=True` makes all attributes keyword-only, including those from base classes.
  If an attribute sets `kw_only=False`, that setting is ignored, and it is still made keyword-only.

  Now, only the attributes defined in that class that don't explicitly set `kw_only=False` are made keyword-only.

  This shouldn't be a problem for most users, unless you have a pattern like this:

  ```python
  @attrs.define(kw_only=True)
  class Base:
      a: int
      b: int = attrs.field(default=1, kw_only=False)


  @attrs.define
  class Subclass(Base):
      c: int
  ```

  Here, we have a `kw_only=True` *attrs* class (`Base`) with an attribute that sets `kw_only=False` and has a default (`Base.b`), and then create a subclass (`Subclass`) with required arguments (`Subclass.c`).
  Previously this would work, since it would make `Base.b` keyword-only, but now this fails since `Base.b` is positional, and we have a required positional argument (`Subclass.c`) following another argument with defaults.
  [#1457](https://github.com/python-attrs/attrs/issues/1457)


### Changes

- Values passed to the `__init__()` method of `attrs` classes are now correctly passed to `__attrs_pre_init__()` instead of their default values (in cases where *kw_only* was not specified).
  [#1427](https://github.com/python-attrs/attrs/issues/1427)
- Added support for Python 3.14 and [PEP 749](https://peps.python.org/pep-0749/).
  [#1446](https://github.com/python-attrs/attrs/issues/1446),
  [#1451](https://github.com/python-attrs/attrs/issues/1451)
- `attrs.validators.deep_mapping()` now allows leaving out either *key_validator* or *value_validator* (but not both); see the sketch after this list.
  [#1448](https://github.com/python-attrs/attrs/issues/1448)
- `attrs.validators.deep_iterable()` and `attrs.validators.deep_mapping()` now accept lists and tuples for all validators and wrap them into an `attrs.validators.and_()`.
  [#1449](https://github.com/python-attrs/attrs/issues/1449)
- Added a new **experimental** way to inspect classes:

  `attrs.inspect(cls)` returns the _effective_ class-wide parameters that were used by *attrs* to construct the class.

  The returned class is the same data structure that *attrs* uses internally to decide how to construct the final class.
  [#1454](https://github.com/python-attrs/attrs/issues/1454)
- Fixed annotations for `attrs.field(converter=...)`.
  Previously, a `tuple` of converters was only accepted if it had exactly one element.
  [#1461](https://github.com/python-attrs/attrs/issues/1461)
- The performance of `attrs.asdict()` has been improved by 45–260%.
  [#1463](https://github.com/python-attrs/attrs/issues/1463)
- The performance of `attrs.astuple()` has been improved by 49–270%.
  [#1469](https://github.com/python-attrs/attrs/issues/1469)
- The type annotation for `attrs.validators.or_()` now allows for different types of validators.

  This was only an issue on Pyright.
  [#1474](https://github.com/python-attrs/attrs/issues/1474)
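
A small sketch of the `deep_mapping()` change called out in the list above; leaving out *key_validator* is what this release permits, the rest is standard *attrs* validator usage:

```python
from attrs import define, field, validators


@define
class Config:
    # Only the values are validated; key_validator is omitted,
    # which this release now allows.
    tags: dict = field(
        validator=validators.deep_mapping(
            value_validator=validators.instance_of(str),
        )
    )


Config(tags={"env": "prod"})  # passes
Config(tags={"env": 1})       # raises TypeError from instance_of
```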

---

[Full changelog →](https://www.attrs.org/en/stable/changelog.html)

@ -0,0 +1,20 @@

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta name="generator" content="simple503 version 0.4.0" />
        <meta name="pypi:repository-version" content="1.0" />
        <meta charset="UTF-8" />
        <title>
            Links for attrs
        </title>
    </head>
    <body>
        <h1>
            Links for attrs
        </h1>
        <a href="/attrs/attrs-25.4.0-py3-none-any.whl#sha256=adcf7e2a1fb3b36ac48d97835bb6d8ade15b8dcce26aba8bf1d14847b57a3373" data-requires-python=">=3.9" data-dist-info-metadata="sha256=d917abc63eda81c311c62c1d92dea50b682ea870269062821f5de75962bb6684">
            attrs-25.4.0-py3-none-any.whl
        </a>
        <br />
    </body>
</html>
Binary file not shown.

@ -0,0 +1,24 @@

Metadata-Version: 2.1
Name: beniget
Version: 0.4.2.post1
Summary: Extract semantic information about static Python code
Home-page: https://github.com/serge-sans-paille/beniget/
Author: serge-sans-paille
Author-email: serge.guelton@telecom-bretagne.eu
License: BSD 3-Clause
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.6
License-File: LICENSE
Requires-Dist: gast >=0.5.0

A static analyzer for Python code.

Beniget provides a static over-approximation of the global and local definitions inside Python Module/Class/Function.
It can also compute def-use chains from each definition.

@ -0,0 +1,20 @@

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta name="generator" content="simple503 version 0.4.0" />
        <meta name="pypi:repository-version" content="1.0" />
        <meta charset="UTF-8" />
        <title>
            Links for beniget
        </title>
    </head>
    <body>
        <h1>
            Links for beniget
        </h1>
        <a href="/beniget/beniget-0.4.2.post1-py3-none-any.whl#sha256=e1b336e7b5f2ae201e6cc21f533486669f1b9eccba018dcff5969cd52f1c20ba" data-requires-python=">=3.6" data-dist-info-metadata="sha256=93abce9ee6ad15817ded0cace625dacfe441a5a18b16267b7d36d5a85584412e">
            beniget-0.4.2.post1-py3-none-any.whl
        </a>
        <br />
    </body>
</html>
Binary file not shown.

@ -0,0 +1,186 @@

Metadata-Version: 2.1
Name: boto3
Version: 1.41.5
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
License: Apache-2.0
Project-URL: Documentation, https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
Project-URL: Source, https://github.com/boto/boto3
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >= 3.9
License-File: LICENSE
License-File: NOTICE
Requires-Dist: botocore (<1.42.0,>=1.41.5)
Requires-Dist: jmespath (<2.0.0,>=0.7.1)
Requires-Dist: s3transfer (<0.16.0,>=0.15.0)
Provides-Extra: crt
Requires-Dist: botocore[crt] (<2.0a0,>=1.21.0) ; extra == 'crt'
===============================
Boto3 - The AWS SDK for Python
===============================

|Version| |Python| |License|

Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to write software that makes use of services like Amazon S3 and Amazon EC2. You can find the latest, most up-to-date documentation at our `doc site`_, including a list of services that are supported.

Boto3 is maintained and published by `Amazon Web Services`_.

Boto (pronounced boh-toh) was named after the fresh water dolphin native to the Amazon river. The name was chosen by the author of the original Boto library, Mitch Garnaat, as a reference to the company.

Notices
-------

On 2026-04-29, support for Python 3.9 will end for Boto3. This follows the Python Software Foundation `end of support <https://peps.python.org/pep-0596/#lifespan>`__ for the runtime which occurred on 2025-10-31.

On 2025-04-22, support for Python 3.8 ended for Boto3. This follows the Python Software Foundation `end of support <https://peps.python.org/pep-0569/#lifespan>`__ for the runtime which occurred on 2024-10-07.

For more information on deprecations, see this `blog post <https://aws.amazon.com/blogs/developer/python-support-policy-updates-for-aws-sdks-and-tools/>`__.

.. _boto: https://docs.pythonboto.org/
.. _`doc site`: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
.. _`Amazon Web Services`: https://aws.amazon.com/what-is-aws/

.. |Python| image:: https://img.shields.io/pypi/pyversions/boto3.svg?style=flat
    :target: https://pypi.python.org/pypi/boto3/
    :alt: Python Versions
.. |Version| image:: http://img.shields.io/pypi/v/boto3.svg?style=flat
    :target: https://pypi.python.org/pypi/boto3/
    :alt: Package Version
.. |License| image:: http://img.shields.io/pypi/l/boto3.svg?style=flat
    :target: https://github.com/boto/boto3/blob/develop/LICENSE
    :alt: License

Getting Started
---------------
Assuming that you have a supported version of Python installed, you can first set up your environment with:

.. code-block:: sh

    $ python -m venv .venv
    ...
    $ . .venv/bin/activate

Then, you can install boto3 from PyPI with:

.. code-block:: sh

    $ python -m pip install boto3

or install from source with:

.. code-block:: sh

    $ git clone https://github.com/boto/boto3.git
    $ cd boto3
    $ python -m pip install -r requirements.txt
    $ python -m pip install -e .


Using Boto3
~~~~~~~~~~~~~~
After installing boto3, set up credentials (in e.g. ``~/.aws/credentials``):

.. code-block:: ini

    [default]
    aws_access_key_id = YOUR_KEY
    aws_secret_access_key = YOUR_SECRET

Then, set up a default region (in e.g. ``~/.aws/config``):

.. code-block:: ini

    [default]
    region=us-east-1

Other credential configuration methods can be found `here <https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html>`__.

Then, from a Python interpreter:

.. code-block:: python

    >>> import boto3
    >>> s3 = boto3.resource('s3')
    >>> for bucket in s3.buckets.all():
    ...     print(bucket.name)
Running Tests
~~~~~~~~~~~~~
You can run tests in all supported Python versions using ``tox``. By default, it will run all of the unit and functional tests, but you can also specify your own ``pytest`` options. Note that this requires that you have all supported versions of Python installed; otherwise you must pass ``-e`` or run the ``pytest`` command directly:

.. code-block:: sh

    $ tox
    $ tox -- unit/test_session.py
    $ tox -e py26,py33 -- integration/

You can also run individual tests with your default Python version:

.. code-block:: sh

    $ pytest tests/unit


Getting Help
------------

We use GitHub issues for tracking bugs and feature requests and have limited bandwidth to address them. Please use these community resources for getting help:

* Ask a question on `Stack Overflow <https://stackoverflow.com/>`__ and tag it with `boto3 <https://stackoverflow.com/questions/tagged/boto3>`__
* Open a support ticket with `AWS Support <https://console.aws.amazon.com/support/home#/>`__
* If it turns out that you may have found a bug, please `open an issue <https://github.com/boto/boto3/issues/new>`__


Contributing
------------

We value feedback and contributions from our community. Whether it's a bug report, new feature, correction, or additional documentation, we welcome your issues and pull requests. Please read through this `CONTRIBUTING <https://github.com/boto/boto3/blob/develop/CONTRIBUTING.rst>`__ document before submitting any issues or pull requests to ensure we have all the necessary information to effectively respond to your contribution.


Maintenance and Support for SDK Major Versions
----------------------------------------------

Boto3 was made generally available on 06/22/2015 and is currently in the full support phase of the availability life cycle.

For information about maintenance and support for SDK major versions and their underlying dependencies, see the following in the AWS SDKs and Tools Shared Configuration and Credentials Reference Guide:

* `AWS SDKs and Tools Maintenance Policy <https://docs.aws.amazon.com/sdkref/latest/guide/maint-policy.html>`__
* `AWS SDKs and Tools Version Support Matrix <https://docs.aws.amazon.com/sdkref/latest/guide/version-support-matrix.html>`__


More Resources
--------------

* `NOTICE <https://github.com/boto/boto3/blob/develop/NOTICE>`__
* `Changelog <https://github.com/boto/boto3/blob/develop/CHANGELOG.rst>`__
* `License <https://github.com/boto/boto3/blob/develop/LICENSE>`__

@ -0,0 +1,20 @@

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta name="generator" content="simple503 version 0.4.0" />
        <meta name="pypi:repository-version" content="1.0" />
        <meta charset="UTF-8" />
        <title>
            Links for boto3
        </title>
    </head>
    <body>
        <h1>
            Links for boto3
        </h1>
        <a href="/boto3/boto3-1.41.5-py3-none-any.whl#sha256=bb278111bfb4c33dca8342bda49c9db7685e43debbfa00cc2a5eb854dd54b745" data-requires-python=">= 3.9" data-dist-info-metadata="sha256=329053bf9a9139cc670ba9b8557fe3e7400b57d3137514c9baf0c3209ac04d1f">
            boto3-1.41.5-py3-none-any.whl
        </a>
        <br />
    </body>
</html>
Binary file not shown.

@ -0,0 +1,146 @@

Metadata-Version: 2.1
Name: botocore
Version: 1.40.70
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
License: Apache-2.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >= 3.9
License-File: LICENSE.txt
License-File: NOTICE
Requires-Dist: jmespath (<2.0.0,>=0.7.1)
Requires-Dist: python-dateutil (<3.0.0,>=2.1)
Requires-Dist: urllib3 (<1.27,>=1.25.4) ; python_version < "3.10"
Requires-Dist: urllib3 (!=2.2.0,<3,>=1.25.4) ; python_version >= "3.10"
Provides-Extra: crt
Requires-Dist: awscrt (==0.27.6) ; extra == 'crt'
botocore
========

|Version| |Python| |License|

A low-level interface to a growing number of Amazon Web Services. The botocore package is the foundation for the `AWS CLI <https://github.com/aws/aws-cli>`__ as well as `boto3 <https://github.com/boto/boto3>`__.

Botocore is maintained and published by `Amazon Web Services`_.

Notices
-------

On 2025-04-22, support for Python 3.8 ended for Botocore. This follows the Python Software Foundation `end of support <https://peps.python.org/pep-0569/#lifespan>`__ for the runtime which occurred on 2024-10-07.
For more information, see this `blog post <https://aws.amazon.com/blogs/developer/python-support-policy-updates-for-aws-sdks-and-tools/>`__.

.. _`Amazon Web Services`: https://aws.amazon.com/what-is-aws/
.. |Python| image:: https://img.shields.io/pypi/pyversions/botocore.svg?style=flat
    :target: https://pypi.python.org/pypi/botocore/
    :alt: Python Versions
.. |Version| image:: http://img.shields.io/pypi/v/botocore.svg?style=flat
    :target: https://pypi.python.org/pypi/botocore/
    :alt: Package Version
.. |License| image:: http://img.shields.io/pypi/l/botocore.svg?style=flat
    :target: https://github.com/boto/botocore/blob/develop/LICENSE.txt
    :alt: License
Getting Started
---------------
Assuming that you have Python and ``virtualenv`` installed, you can either set up your environment and install the required dependencies from source as shown below, or install the library using ``pip``:

.. code-block:: sh

    $ git clone https://github.com/boto/botocore.git
    $ cd botocore
    $ python -m venv .venv
    ...
    $ source .venv/bin/activate
    $ python -m pip install -r requirements.txt
    $ python -m pip install -e .

.. code-block:: sh

    $ pip install botocore

Using Botocore
~~~~~~~~~~~~~~
After installing botocore, set up credentials (in e.g. ``~/.aws/credentials``):

.. code-block:: ini

    [default]
    aws_access_key_id = YOUR_KEY
    aws_secret_access_key = YOUR_SECRET

Then, set up a default region (in e.g. ``~/.aws/config``):

.. code-block:: ini

    [default]
    region=us-east-1

Other credential configuration methods can be found `here <https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html>`__.

Then, from a Python interpreter:

.. code-block:: python

    >>> import botocore.session
    >>> session = botocore.session.get_session()
    >>> client = session.create_client('ec2')
    >>> print(client.describe_instances())


Getting Help
------------

We use GitHub issues for tracking bugs and feature requests and have limited bandwidth to address them. Please use these community resources for getting help. Please note many of the same resources available for ``boto3`` are applicable for ``botocore``:

* Ask a question on `Stack Overflow <https://stackoverflow.com/>`__ and tag it with `boto3 <https://stackoverflow.com/questions/tagged/boto3>`__
* Open a support ticket with `AWS Support <https://console.aws.amazon.com/support/home#/>`__
* If it turns out that you may have found a bug, please `open an issue <https://github.com/boto/botocore/issues/new/choose>`__


Contributing
------------

We value feedback and contributions from our community. Whether it's a bug report, new feature, correction, or additional documentation, we welcome your issues and pull requests. Please read through this `CONTRIBUTING <https://github.com/boto/botocore/blob/develop/CONTRIBUTING.rst>`__ document before submitting any issues or pull requests to ensure we have all the necessary information to effectively respond to your contribution.


Maintenance and Support for SDK Major Versions
----------------------------------------------

Botocore was made generally available on 06/22/2015 and is currently in the full support phase of the availability life cycle.

For information about maintenance and support for SDK major versions and their underlying dependencies, see the following in the AWS SDKs and Tools Reference Guide:

* `AWS SDKs and Tools Maintenance Policy <https://docs.aws.amazon.com/sdkref/latest/guide/maint-policy.html>`__
* `AWS SDKs and Tools Version Support Matrix <https://docs.aws.amazon.com/sdkref/latest/guide/version-support-matrix.html>`__


More Resources
--------------

* `NOTICE <https://github.com/boto/botocore/blob/develop/NOTICE>`__
* `Changelog <https://github.com/boto/botocore/blob/develop/CHANGELOG.rst>`__
* `License <https://github.com/boto/botocore/blob/develop/LICENSE.txt>`__
Binary file not shown.

@ -0,0 +1,151 @@

Metadata-Version: 2.1
Name: botocore
Version: 1.41.5
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
License: Apache-2.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >= 3.9
License-File: LICENSE.txt
License-File: NOTICE
Requires-Dist: jmespath (<2.0.0,>=0.7.1)
Requires-Dist: python-dateutil (<3.0.0,>=2.1)
Requires-Dist: urllib3 (<1.27,>=1.25.4) ; python_version < "3.10"
Requires-Dist: urllib3 (!=2.2.0,<3,>=1.25.4) ; python_version >= "3.10"
Provides-Extra: crt
Requires-Dist: awscrt (==0.29.0) ; extra == 'crt'
botocore
|
||||||
|
========
|
||||||
|
|
||||||
|
|Version| |Python| |License|
|
||||||
|
|
||||||
|
A low-level interface to a growing number of Amazon Web Services. The
|
||||||
|
botocore package is the foundation for the
|
||||||
|
`AWS CLI <https://github.com/aws/aws-cli>`__ as well as
|
||||||
|
`boto3 <https://github.com/boto/boto3>`__.
|
||||||
|
|
||||||
|
Botocore is maintained and published by `Amazon Web Services`_.
|
||||||
|
|
||||||
|
Notices
|
||||||
|
-------
|
||||||
|
|
||||||
|
On 2026-04-29, support for Python 3.9 will end for Botocore. This follows the
|
||||||
|
Python Software Foundation `end of support <https://peps.python.org/pep-0596/#lifespan>`__
|
||||||
|
for the runtime which occurred on 2025-10-31.
|
||||||
|
|
||||||
|
On 2025-04-22, support for Python 3.8 ended for Botocore. This follows the
|
||||||
|
Python Software Foundation `end of support <https://peps.python.org/pep-0569/#lifespan>`__
|
||||||
|
for the runtime which occurred on 2024-10-07.
|
||||||
|
|
||||||
|
For more information, see this `blog post <https://aws.amazon.com/blogs/developer/python-support-policy-updates-for-aws-sdks-and-tools/>`__.
|
||||||
|
|
||||||
|
.. _`Amazon Web Services`: https://aws.amazon.com/what-is-aws/
|
||||||
|
.. |Python| image:: https://img.shields.io/pypi/pyversions/botocore.svg?style=flat
|
||||||
|
:target: https://pypi.python.org/pypi/botocore/
|
||||||
|
:alt: Python Versions
|
||||||
|
.. |Version| image:: http://img.shields.io/pypi/v/botocore.svg?style=flat
|
||||||
|
:target: https://pypi.python.org/pypi/botocore/
|
||||||
|
:alt: Package Version
|
||||||
|
.. |License| image:: http://img.shields.io/pypi/l/botocore.svg?style=flat
|
||||||
|
:target: https://github.com/boto/botocore/blob/develop/LICENSE.txt
|
||||||
|
:alt: License
|
||||||
|
|
||||||
|
Getting Started
|
||||||
|
---------------
|
||||||
|
Assuming that you have Python and ``virtualenv`` installed, set up your environment and install the required dependencies like this or you can install the library using ``pip``:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ git clone https://github.com/boto/botocore.git
|
||||||
|
$ cd botocore
|
||||||
|
$ python -m venv .venv
|
||||||
|
...
|
||||||
|
$ source .venv/bin/activate
|
||||||
|
$ python -m pip install -r requirements.txt
|
||||||
|
$ python -m pip install -e .
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ pip install botocore
|
||||||
|
|
||||||
|
Using Botocore
|
||||||
|
~~~~~~~~~~~~~~
|
||||||
|
After installing botocore
|
||||||
|
|
||||||
|
Next, set up credentials (in e.g. ``~/.aws/credentials``):
|
||||||
|
|
||||||
|
.. code-block:: ini
|
||||||
|
|
||||||
|
[default]
|
||||||
|
aws_access_key_id = YOUR_KEY
|
||||||
|
aws_secret_access_key = YOUR_SECRET
|
||||||
|
|
||||||
|
Then, set up a default region (in e.g. ``~/.aws/config``):
|
||||||
|
|
||||||
|
.. code-block:: ini
|
||||||
|
|
||||||
|
[default]
|
||||||
|
region=us-east-1
|
||||||
|
|
||||||
|
Other credentials configuration method can be found `here <https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html>`__
|
||||||
|
|
||||||
|
Then, from a Python interpreter:
|
||||||
|
|
||||||
|
.. code-block:: python
|
||||||
|
|
||||||
|
>>> import botocore.session
|
||||||
|
>>> session = botocore.session.get_session()
|
||||||
|
>>> client = session.create_client('ec2')
|
||||||
|
>>> print(client.describe_instances())
|
||||||
|
|
||||||
|
|
||||||
|
Getting Help
|
||||||
|
------------
|
||||||
|
|
||||||
|
We use GitHub issues for tracking bugs and feature requests and have limited
|
||||||
|
bandwidth to address them. Please use these community resources for getting
|
||||||
|
help. Please note many of the same resources available for ``boto3`` are
|
||||||
|
applicable for ``botocore``:
|
||||||
|
|
||||||
|
* Ask a question on `Stack Overflow <https://stackoverflow.com/>`__ and tag it with `boto3 <https://stackoverflow.com/questions/tagged/boto3>`__
|
||||||
|
* Open a support ticket with `AWS Support <https://console.aws.amazon.com/support/home#/>`__
|
||||||
|
* If it turns out that you may have found a bug, please `open an issue <https://github.com/boto/botocore/issues/new/choose>`__
|
||||||
|
|
||||||
|
|
||||||
|
Contributing
|
||||||
|
------------
|
||||||
|
|
||||||
|
We value feedback and contributions from our community. Whether it's a bug report, new feature, correction, or additional documentation, we welcome your issues and pull requests. Please read through this `CONTRIBUTING <https://github.com/boto/botocore/blob/develop/CONTRIBUTING.rst>`__ document before submitting any issues or pull requests to ensure we have all the necessary information to effectively respond to your contribution.
|
||||||
|
|
||||||
|
|
||||||
|
Maintenance and Support for SDK Major Versions
|
||||||
|
----------------------------------------------
|
||||||
|
|
||||||
|
Botocore was made generally available on 06/22/2015 and is currently in the full support phase of the availability life cycle.
|
||||||
|
|
||||||
|
For information about maintenance and support for SDK major versions and their underlying dependencies, see the following in the AWS SDKs and Tools Reference Guide:
|
||||||
|
|
||||||
|
* `AWS SDKs and Tools Maintenance Policy <https://docs.aws.amazon.com/sdkref/latest/guide/maint-policy.html>`__
|
||||||
|
* `AWS SDKs and Tools Version Support Matrix <https://docs.aws.amazon.com/sdkref/latest/guide/version-support-matrix.html>`__
|
||||||
|
|
||||||
|
|
||||||
|
More Resources
|
||||||
|
--------------
|
||||||
|
|
||||||
|
* `NOTICE <https://github.com/boto/botocore/blob/develop/NOTICE>`__
|
||||||
|
* `Changelog <https://github.com/boto/botocore/blob/develop/CHANGELOG.rst>`__
|
||||||
|
* `License <https://github.com/boto/botocore/blob/develop/LICENSE.txt>`__
|
||||||
|
|
@ -0,0 +1,24 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for botocore
    </title>
  </head>
  <body>
    <h1>
      Links for botocore
    </h1>
    <a href="/botocore/botocore-1.41.5-py3-none-any.whl#sha256=3fef7fcda30c82c27202d232cfdbd6782cb27f20f8e7e21b20606483e66ee73a" data-requires-python=">= 3.9" data-dist-info-metadata="sha256=867c86c9f400df83088bb210e49402344febc90aa6b10d46a0cd02642ae1096c">
      botocore-1.41.5-py3-none-any.whl
    </a>
    <br />
    <a href="/botocore/botocore-1.40.70-py3-none-any.whl#sha256=4a394ad25f5d9f1ef0bed610365744523eeb5c22de6862ab25d8c93f9f6d295c" data-requires-python=">= 3.9" data-dist-info-metadata="sha256=ff124fb918cb0210e04c2c4396cb3ad31bbe26884306bf4d35b9535ece1feb27">
      botocore-1.40.70-py3-none-any.whl
    </a>
    <br />
  </body>
</html>
Binary file not shown.
@ -0,0 +1,107 @@
Metadata-Version: 2.4
Name: calver
Version: 2025.10.20
Summary: Setuptools extension for CalVer package versions
Author-email: Dustin Ingram <di@python.org>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/di/calver
Project-URL: Repository, https://github.com/di/calver
Keywords: calver
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Build Tools
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# CalVer

The `calver` package is a [setuptools](https://pypi.org/p/setuptools) extension
for automatically defining your Python package version as a calendar version.

## Usage

First, ensure `calver` is present during the project's build step by specifying
it as one of the build requirements:

`pyproject.toml`:
```toml
[build-system]
requires = ["setuptools>=42", "calver"]
```

To enable generating the version automatically based on the date, add the
following to `setup.py`:

`setup.py`:
```python
from setuptools import setup

setup(
    ...
    use_calver=True,
    setup_requires=['calver'],
    ...
)
```

You can test that it is working with:

```console
$ python setup.py --version
2020.6.16
```

## Configuration

By default, when setting `use_calver=True`, it uses the following to generate
the version string:

```pycon
>>> import datetime
>>> datetime.datetime.now(tz=datetime.timezone.utc).strftime("%Y.%m.%d")
2020.6.16
```

You can override the format string by passing it instead of `True`:

`setup.py`:
```python
from setuptools import setup

setup(
    ...
    use_calver="%Y.%m.%d.%H.%M",
    setup_requires=['calver'],
    ...
)
```

You can override the current date/time by passing the environment variable
`SOURCE_DATE_EPOCH`, which should be a Unix timestamp in seconds.
This is useful for reproducible builds (see https://reproducible-builds.org/docs/source-date-epoch/):

```console
env SOURCE_DATE_EPOCH=1743428011 python setup.py --version
```

You can override this entirely by passing a callable instead, which will be called
with no arguments at build time:

`setup.py`:
```python
import datetime
from setuptools import setup


def long_now_version():
    now = datetime.datetime.now(tz=datetime.timezone.utc)
    return now.strftime("%Y").zfill(5) + "." + now.strftime("%m.%d")


setup(
    ...
    use_calver=long_now_version,
    setup_requires=['calver'],
    ...
)
```
@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for calver
    </title>
  </head>
  <body>
    <h1>
      Links for calver
    </h1>
    <a href="/calver/calver-2025.10.20-py3-none-any.whl#sha256=d7d75224eed9d9263f4fb30008487615196d208d14752bfd93fea7af2c84f508" data-requires-python=">=3.9" data-dist-info-metadata="sha256=46b969834bb99e6a06ade2eefd609814ad6cbdb83459a7d4ed7ac9a04244634e">
      calver-2025.10.20-py3-none-any.whl
    </a>
    <br />
  </body>
</html>
Binary file not shown.
@ -0,0 +1,78 @@
Metadata-Version: 2.4
Name: certifi
Version: 2025.11.12
Summary: Python package for providing Mozilla's CA Bundle.
Home-page: https://github.com/certifi/python-certifi
Author: Kenneth Reitz
Author-email: me@kennethreitz.com
License: MPL-2.0
Project-URL: Source, https://github.com/certifi/python-certifi
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.7
License-File: LICENSE
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-python
Dynamic: summary

Certifi: Python SSL Certificates
================================

Certifi provides Mozilla's carefully curated collection of Root Certificates for
validating the trustworthiness of SSL certificates while verifying the identity
of TLS hosts. It has been extracted from the `Requests`_ project.

Installation
------------

``certifi`` is available on PyPI. Simply install it with ``pip``::

    $ pip install certifi

Usage
-----

To reference the installed certificate authority (CA) bundle, you can use the
built-in function::

    >>> import certifi

    >>> certifi.where()
    '/usr/local/lib/python3.7/site-packages/certifi/cacert.pem'

Or from the command line::

    $ python -m certifi
    /usr/local/lib/python3.7/site-packages/certifi/cacert.pem
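
As an illustrative aside (not part of the upstream certifi text), the returned
path is commonly handed to the standard library's ``ssl`` module; a minimal
sketch::

    import ssl

    import certifi

    # Build a client-side SSL context that validates servers against
    # the Mozilla CA bundle shipped with certifi.
    context = ssl.create_default_context(cafile=certifi.where())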

Enjoy!

.. _`Requests`: https://requests.readthedocs.io/en/master/

Addition/Removal of Certificates
--------------------------------

Certifi does not support any addition/removal or other modification of the
CA trust store content. This project is intended to provide a reliable and
highly portable root of trust to python deployments. Look to upstream projects
for methods to use alternate trust.
@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for certifi
    </title>
  </head>
  <body>
    <h1>
      Links for certifi
    </h1>
    <a href="/certifi/certifi-2025.11.12-py3-none-any.whl#sha256=97de8790030bbd5c2d96b7ec782fc2f7820ef8dba6db909ccf95449f2d062d4b" data-requires-python=">=3.7" data-dist-info-metadata="sha256=fc9a6b1aeff595649d1e5aee44129ca2b5e7adfbc10e1bd7ffa291afc1d06cb7">
      certifi-2025.11.12-py3-none-any.whl
    </a>
    <br />
  </body>
</html>
Binary file not shown.
@ -0,0 +1,68 @@
Metadata-Version: 2.4
Name: cffi
Version: 2.0.0
Summary: Foreign Function Interface for Python calling C code.
Author: Armin Rigo, Maciej Fijalkowski
Maintainer: Matt Davis, Matt Clay, Matti Picus
License-Expression: MIT
Project-URL: Documentation, https://cffi.readthedocs.io/
Project-URL: Changelog, https://cffi.readthedocs.io/en/latest/whatsnew.html
Project-URL: Downloads, https://github.com/python-cffi/cffi/releases
Project-URL: Contact, https://groups.google.com/forum/#!forum/python-cffi
Project-URL: Source Code, https://github.com/python-cffi/cffi
Project-URL: Issue Tracker, https://github.com/python-cffi/cffi/issues
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Free Threading :: 2 - Beta
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS
Requires-Dist: pycparser; implementation_name != "PyPy"
Dynamic: license-file

[CI](https://github.com/python-cffi/cffi/actions/workflows/ci.yaml?query=branch%3Amain++)
[PyPI](https://pypi.org/project/cffi)
[Documentation][Documentation]

CFFI
====

Foreign Function Interface for Python calling C code.

Please see the [Documentation], or the uncompiled version in the `doc/` subdirectory.
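
As a quick, hedged illustration (not text from the upstream README), the
ABI-level mode described in the documentation lets you declare and call a C
function in a few lines; on a POSIX system, calling `printf` from the C
standard library looks roughly like this:

```python
from cffi import FFI

ffi = FFI()
# Declare the C function signature in plain C syntax.
ffi.cdef("int printf(const char *format, ...);")
# dlopen(None) loads the C standard library of the current process (POSIX).
C = ffi.dlopen(None)
C.printf(b"hello, %s!\n", ffi.new("char[]", b"world"))
```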

Download
--------

[Download page](https://github.com/python-cffi/cffi/releases)

Source Code
-----------

Source code is publicly available on
[GitHub](https://github.com/python-cffi/cffi).

Contact
-------

[Mailing list](https://groups.google.com/forum/#!forum/python-cffi)

Testing/development tips
------------------------

After `git clone` or `wget && tar`, you will get a directory called `cffi` or `cffi-x.x.x`; we call it the `repo-directory`. To run tests under CPython, run the following in the `repo-directory`:

    pip install pytest
    pip install -e .   # editable install of CFFI for local development
    pytest src/c/ testing/

[Documentation]: http://cffi.readthedocs.org/
Binary file not shown.
@ -0,0 +1,68 @@
Metadata-Version: 2.4
Name: cffi
Version: 2.0.0
Summary: Foreign Function Interface for Python calling C code.
Author: Armin Rigo, Maciej Fijalkowski
Maintainer: Matt Davis, Matt Clay, Matti Picus
License-Expression: MIT
Project-URL: Documentation, https://cffi.readthedocs.io/
Project-URL: Changelog, https://cffi.readthedocs.io/en/latest/whatsnew.html
Project-URL: Downloads, https://github.com/python-cffi/cffi/releases
Project-URL: Contact, https://groups.google.com/forum/#!forum/python-cffi
Project-URL: Source Code, https://github.com/python-cffi/cffi
Project-URL: Issue Tracker, https://github.com/python-cffi/cffi/issues
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Free Threading :: 2 - Beta
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS
Requires-Dist: pycparser; implementation_name != "PyPy"
Dynamic: license-file

[CI](https://github.com/python-cffi/cffi/actions/workflows/ci.yaml?query=branch%3Amain++)
[PyPI](https://pypi.org/project/cffi)
[Documentation][Documentation]

CFFI
====

Foreign Function Interface for Python calling C code.

Please see the [Documentation], or the uncompiled version in the `doc/` subdirectory.

Download
--------

[Download page](https://github.com/python-cffi/cffi/releases)

Source Code
-----------

Source code is publicly available on
[GitHub](https://github.com/python-cffi/cffi).

Contact
-------

[Mailing list](https://groups.google.com/forum/#!forum/python-cffi)

Testing/development tips
------------------------

After `git clone` or `wget && tar`, you will get a directory called `cffi` or `cffi-x.x.x`; we call it the `repo-directory`. To run tests under CPython, run the following in the `repo-directory`:

    pip install pytest
    pip install -e .   # editable install of CFFI for local development
    pytest src/c/ testing/

[Documentation]: http://cffi.readthedocs.org/
Binary file not shown.
@ -0,0 +1,68 @@
Metadata-Version: 2.4
Name: cffi
Version: 2.0.0
Summary: Foreign Function Interface for Python calling C code.
Author: Armin Rigo, Maciej Fijalkowski
Maintainer: Matt Davis, Matt Clay, Matti Picus
License-Expression: MIT
Project-URL: Documentation, https://cffi.readthedocs.io/
Project-URL: Changelog, https://cffi.readthedocs.io/en/latest/whatsnew.html
Project-URL: Downloads, https://github.com/python-cffi/cffi/releases
Project-URL: Contact, https://groups.google.com/forum/#!forum/python-cffi
Project-URL: Source Code, https://github.com/python-cffi/cffi
Project-URL: Issue Tracker, https://github.com/python-cffi/cffi/issues
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Free Threading :: 2 - Beta
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS
Requires-Dist: pycparser; implementation_name != "PyPy"
Dynamic: license-file

[CI](https://github.com/python-cffi/cffi/actions/workflows/ci.yaml?query=branch%3Amain++)
[PyPI](https://pypi.org/project/cffi)
[Documentation][Documentation]

CFFI
====

Foreign Function Interface for Python calling C code.

Please see the [Documentation], or the uncompiled version in the `doc/` subdirectory.

Download
--------

[Download page](https://github.com/python-cffi/cffi/releases)

Source Code
-----------

Source code is publicly available on
[GitHub](https://github.com/python-cffi/cffi).

Contact
-------

[Mailing list](https://groups.google.com/forum/#!forum/python-cffi)

Testing/development tips
------------------------

After `git clone` or `wget && tar`, you will get a directory called `cffi` or `cffi-x.x.x`; we call it the `repo-directory`. To run tests under CPython, run the following in the `repo-directory`:

    pip install pytest
    pip install -e .   # editable install of CFFI for local development
    pytest src/c/ testing/

[Documentation]: http://cffi.readthedocs.org/
Binary file not shown.
@ -0,0 +1,68 @@
Metadata-Version: 2.4
Name: cffi
Version: 2.0.0
Summary: Foreign Function Interface for Python calling C code.
Author: Armin Rigo, Maciej Fijalkowski
Maintainer: Matt Davis, Matt Clay, Matti Picus
License-Expression: MIT
Project-URL: Documentation, https://cffi.readthedocs.io/
Project-URL: Changelog, https://cffi.readthedocs.io/en/latest/whatsnew.html
Project-URL: Downloads, https://github.com/python-cffi/cffi/releases
Project-URL: Contact, https://groups.google.com/forum/#!forum/python-cffi
Project-URL: Source Code, https://github.com/python-cffi/cffi
Project-URL: Issue Tracker, https://github.com/python-cffi/cffi/issues
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Free Threading :: 2 - Beta
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS
Requires-Dist: pycparser; implementation_name != "PyPy"
Dynamic: license-file

[CI](https://github.com/python-cffi/cffi/actions/workflows/ci.yaml?query=branch%3Amain++)
[PyPI](https://pypi.org/project/cffi)
[Documentation][Documentation]

CFFI
====

Foreign Function Interface for Python calling C code.

Please see the [Documentation], or the uncompiled version in the `doc/` subdirectory.

Download
--------

[Download page](https://github.com/python-cffi/cffi/releases)

Source Code
-----------

Source code is publicly available on
[GitHub](https://github.com/python-cffi/cffi).

Contact
-------

[Mailing list](https://groups.google.com/forum/#!forum/python-cffi)

Testing/development tips
------------------------

After `git clone` or `wget && tar`, you will get a directory called `cffi` or `cffi-x.x.x`; we call it the `repo-directory`. To run tests under CPython, run the following in the `repo-directory`:

    pip install pytest
    pip install -e .   # editable install of CFFI for local development
    pytest src/c/ testing/

[Documentation]: http://cffi.readthedocs.org/
Binary file not shown.
@ -0,0 +1,68 @@
Metadata-Version: 2.4
Name: cffi
Version: 2.0.0
Summary: Foreign Function Interface for Python calling C code.
Author: Armin Rigo, Maciej Fijalkowski
Maintainer: Matt Davis, Matt Clay, Matti Picus
License-Expression: MIT
Project-URL: Documentation, https://cffi.readthedocs.io/
Project-URL: Changelog, https://cffi.readthedocs.io/en/latest/whatsnew.html
Project-URL: Downloads, https://github.com/python-cffi/cffi/releases
Project-URL: Contact, https://groups.google.com/forum/#!forum/python-cffi
Project-URL: Source Code, https://github.com/python-cffi/cffi
Project-URL: Issue Tracker, https://github.com/python-cffi/cffi/issues
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Free Threading :: 2 - Beta
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS
Requires-Dist: pycparser; implementation_name != "PyPy"
Dynamic: license-file

[CI](https://github.com/python-cffi/cffi/actions/workflows/ci.yaml?query=branch%3Amain++)
[PyPI](https://pypi.org/project/cffi)
[Documentation][Documentation]

CFFI
====

Foreign Function Interface for Python calling C code.

Please see the [Documentation], or the uncompiled version in the `doc/` subdirectory.

Download
--------

[Download page](https://github.com/python-cffi/cffi/releases)

Source Code
-----------

Source code is publicly available on
[GitHub](https://github.com/python-cffi/cffi).

Contact
-------

[Mailing list](https://groups.google.com/forum/#!forum/python-cffi)

Testing/development tips
------------------------

After `git clone` or `wget && tar`, you will get a directory called `cffi` or `cffi-x.x.x`; we call it the `repo-directory`. To run tests under CPython, run the following in the `repo-directory`:

    pip install pytest
    pip install -e .   # editable install of CFFI for local development
    pytest src/c/ testing/

[Documentation]: http://cffi.readthedocs.org/
Binary file not shown.
@ -0,0 +1,68 @@
Metadata-Version: 2.4
Name: cffi
Version: 2.0.0
Summary: Foreign Function Interface for Python calling C code.
Author: Armin Rigo, Maciej Fijalkowski
Maintainer: Matt Davis, Matt Clay, Matti Picus
License-Expression: MIT
Project-URL: Documentation, https://cffi.readthedocs.io/
Project-URL: Changelog, https://cffi.readthedocs.io/en/latest/whatsnew.html
Project-URL: Downloads, https://github.com/python-cffi/cffi/releases
Project-URL: Contact, https://groups.google.com/forum/#!forum/python-cffi
Project-URL: Source Code, https://github.com/python-cffi/cffi
Project-URL: Issue Tracker, https://github.com/python-cffi/cffi/issues
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Free Threading :: 2 - Beta
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS
Requires-Dist: pycparser; implementation_name != "PyPy"
Dynamic: license-file

[CI](https://github.com/python-cffi/cffi/actions/workflows/ci.yaml?query=branch%3Amain++)
[PyPI](https://pypi.org/project/cffi)
[Documentation][Documentation]

CFFI
====

Foreign Function Interface for Python calling C code.

Please see the [Documentation], or the uncompiled version in the `doc/` subdirectory.

Download
--------

[Download page](https://github.com/python-cffi/cffi/releases)

Source Code
-----------

Source code is publicly available on
[GitHub](https://github.com/python-cffi/cffi).

Contact
-------

[Mailing list](https://groups.google.com/forum/#!forum/python-cffi)

Testing/development tips
------------------------

After `git clone` or `wget && tar`, you will get a directory called `cffi` or `cffi-x.x.x`; we call it the `repo-directory`. To run tests under CPython, run the following in the `repo-directory`:

    pip install pytest
    pip install -e .   # editable install of CFFI for local development
    pytest src/c/ testing/

[Documentation]: http://cffi.readthedocs.org/
@ -0,0 +1,40 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="generator" content="simple503 version 0.4.0" />
    <meta name="pypi:repository-version" content="1.0" />
    <meta charset="UTF-8" />
    <title>
      Links for cffi
    </title>
  </head>
  <body>
    <h1>
      Links for cffi
    </h1>
    <a href="/cffi/cffi-2.0.0-cp311-cp311-musllinux_1_2_x86_64.whl#sha256=5fed36fccc0612a53f1d4d9a816b50a36702c28a2aa880cb8a122b3466638743" data-requires-python=">=3.9" data-dist-info-metadata="sha256=b98ce7e3417af089bc12d5c73412d9b3b1683ccf8ec73c986c35ac8c9be1ba39">
      cffi-2.0.0-cp311-cp311-musllinux_1_2_x86_64.whl
    </a>
    <br />
    <a href="/cffi/cffi-2.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl#sha256=8941aaadaf67246224cee8c3803777eed332a19d909b47e29c9842ef1e79ac26" data-requires-python=">=3.9" data-dist-info-metadata="sha256=b98ce7e3417af089bc12d5c73412d9b3b1683ccf8ec73c986c35ac8c9be1ba39">
      cffi-2.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
    </a>
    <br />
    <a href="/cffi/cffi-2.0.0-cp310-cp310-musllinux_1_2_x86_64.whl#sha256=8ea985900c5c95ce9db1745f7933eeef5d314f0565b27625d9a10ec9881e1bfb" data-requires-python=">=3.9" data-dist-info-metadata="sha256=b98ce7e3417af089bc12d5c73412d9b3b1683ccf8ec73c986c35ac8c9be1ba39">
      cffi-2.0.0-cp310-cp310-musllinux_1_2_x86_64.whl
    </a>
    <br />
    <a href="/cffi/cffi-2.0.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl#sha256=fc7de24befaeae77ba923797c7c87834c73648a05a4bde34b3b7e5588973a453" data-requires-python=">=3.9" data-dist-info-metadata="sha256=b98ce7e3417af089bc12d5c73412d9b3b1683ccf8ec73c986c35ac8c9be1ba39">
      cffi-2.0.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
    </a>
    <br />
    <a href="/cffi/cffi-2.0.0-cp39-cp39-musllinux_1_2_x86_64.whl#sha256=89472c9762729b5ae1ad974b777416bfda4ac5642423fa93bd57a09204712322" data-requires-python=">=3.9" data-dist-info-metadata="sha256=b98ce7e3417af089bc12d5c73412d9b3b1683ccf8ec73c986c35ac8c9be1ba39">
      cffi-2.0.0-cp39-cp39-musllinux_1_2_x86_64.whl
    </a>
    <br />
    <a href="/cffi/cffi-2.0.0-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl#sha256=61d028e90346df14fedc3d1e5441df818d095f3b87d286825dfcbd6459b7ef63" data-requires-python=">=3.9" data-dist-info-metadata="sha256=b98ce7e3417af089bc12d5c73412d9b3b1683ccf8ec73c986c35ac8c9be1ba39">
      cffi-2.0.0-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
    </a>
    <br />
  </body>
</html>
Binary file not shown.
@ -0,0 +1,764 @@
Metadata-Version: 2.4
Name: charset-normalizer
Version: 3.4.4
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
Author-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
Maintainer-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
License: MIT
Project-URL: Changelog, https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/
Project-URL: Code, https://github.com/jawah/charset_normalizer
Project-URL: Issue tracker, https://github.com/jawah/charset_normalizer/issues
Keywords: encoding,charset,charset-detector,detector,normalization,unicode,chardet,detect
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: unicode-backport
Dynamic: license-file

<h1 align="center">Charset Detection, for Everyone 👋</h1>

<p align="center">
  <sup>The Real First Universal Charset Detector</sup><br>
  <a href="https://pypi.org/project/charset-normalizer">
    <img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
  </a>
  <a href="https://pepy.tech/project/charset-normalizer/">
    <img alt="Download Count Total" src="https://static.pepy.tech/badge/charset-normalizer/month" />
  </a>
  <a href="https://bestpractices.coreinfrastructure.org/projects/7297">
    <img src="https://bestpractices.coreinfrastructure.org/projects/7297/badge">
  </a>
</p>
<p align="center">
  <sup><i>Featured Packages</i></sup><br>
  <a href="https://github.com/jawah/niquests">
    <img alt="Static Badge" src="https://img.shields.io/badge/Niquests-Most_Advanced_HTTP_Client-cyan">
  </a>
  <a href="https://github.com/jawah/wassima">
    <img alt="Static Badge" src="https://img.shields.io/badge/Wassima-Certifi_Replacement-cyan">
  </a>
</p>
<p align="center">
  <sup><i>In other languages (unofficial ports, by the community)</i></sup><br>
  <a href="https://github.com/nickspring/charset-normalizer-rs">
    <img alt="Static Badge" src="https://img.shields.io/badge/Rust-red">
  </a>
</p>

> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
> I'm trying to resolve the issue by taking a new approach.
> All IANA character set names for which the Python core library provides codecs are supported.

<p align="center">
  >>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
</p>

This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

| Feature                                          | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
|--------------------------------------------------|:---------------------------------------------:|:------------------:|:-----------------------------------------------:|
| `Fast`                                           | ❌ | ✅ | ✅ |
| `Universal**`                                    | ❌ | ✅ | ❌ |
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
| `Reliable` **with** distinguishable standards    | ✅ | ✅ | ✅ |
| `License`                                        | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
| `Native Python`                                  | ✅ | ✅ | ❌ |
| `Detect spoken language`                         | ❌ | ✅ | N/A |
| `UnicodeDecodeError Safety`                      | ❌ | ✅ | ❌ |
| `Whl Size (min)`                                 | 193.6 kB | 42 kB | ~200 kB |
| `Supported Encoding`                             | 33 | 🎉 [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |

<p align="center">
  <img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
</p>

*\*\* : They are clearly using specific code for a specific encoding even if covering most of the used ones*<br>

## ⚡ Performance

This package offers better performance than its counterpart Chardet. Here are some numbers.

| Package                                       | Accuracy | Mean per file (ms) | File per sec (est) |
|-----------------------------------------------|:--------:|:------------------:|:------------------:|
| [chardet](https://github.com/chardet/chardet) | 86 %     | 63 ms              | 16 file/sec        |
| charset-normalizer                            | **98 %** | **10 ms**          | 100 file/sec       |

| Package                                       | 99th percentile | 95th percentile | 50th percentile |
|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
| [chardet](https://github.com/chardet/chardet) | 265 ms          | 71 ms           | 7 ms            |
| charset-normalizer                            | 100 ms          | 50 ms           | 5 ms            |

_updated as of December 2024 using CPython 3.12_

Chardet's performance on larger files (1MB+) is very poor. Expect a huge difference on large payloads.

> Stats are generated using 400+ files using default parameters. For more details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays depend heavily on your CPU capabilities. The factors should remain the same.
> Keep in mind that the stats are generous and that Chardet accuracy vs. ours is measured using Chardet's initial capability
> (e.g. supported encodings). Challenge them if you want.

## ✨ Installation

Using pip:

```sh
pip install charset-normalizer -U
```

## 🚀 Basic Usage

### CLI

This package comes with a CLI.

```
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
                  file [file ...]

The Real First Universal Charset Detector. Discover originating encoding used
on text file. Normalize text to unicode.

positional arguments:
  files                 File(s) to be analysed

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display complementary information about file if any.
                        Stdout will contain logs about the detection process.
  -a, --with-alternative
                        Output complementary possibilities if any. Top-level
                        JSON WILL be a list.
  -n, --normalize       Permit to normalize input file. If not set, program
                        does not write anything.
  -m, --minimal         Only output the charset detected to STDOUT. Disabling
                        JSON output.
  -r, --replace         Replace file when trying to normalize it instead of
                        creating a new one.
  -f, --force           Replace file without asking if you are sure, use this
                        flag with caution.
  -t THRESHOLD, --threshold THRESHOLD
                        Define a custom maximum amount of chaos allowed in
                        decoded content. 0. <= chaos <= 1.
  --version             Show version information and exit.
```

```bash
normalizer ./data/sample.1.fr.srt
```

or

```bash
python -m charset_normalizer ./data/sample.1.fr.srt
```

🎉 Since version 1.4.0 the CLI produces easily usable stdout results in JSON format.

```json
{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "encoding_aliases": [
        "1252",
        "windows_1252"
    ],
    "alternative_encodings": [
        "cp1254",
        "cp1256",
        "cp1258",
        "iso8859_14",
        "iso8859_15",
        "iso8859_16",
        "iso8859_3",
        "iso8859_9",
        "latin_1",
        "mbcs"
    ],
    "language": "French",
    "alphabets": [
        "Basic Latin",
        "Latin-1 Supplement"
    ],
    "has_sig_or_bom": false,
    "chaos": 0.149,
    "coherence": 97.152,
    "unicode_path": null,
    "is_preferred": true
}
```

### Python
*Just print out normalized text*
```python
from charset_normalizer import from_path

results = from_path('./my_subtitle.srt')

print(str(results.best()))
```

*Upgrade your code without effort*
```python
from charset_normalizer import detect
```

The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) backward-compatible result possible.
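
For instance (a minimal illustration, not part of the upstream README), the drop-in `detect` returns the familiar chardet-style dictionary:

```python
from charset_normalizer import detect

# Same result shape as chardet: 'encoding', 'language' and 'confidence' keys.
result = detect("Comment ça va ? Très bien !".encode("cp1252"))
print(result["encoding"], result["confidence"])
```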

See the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)

## 😇 Why

When I started using Chardet, I noticed that it was not suited to my expectations, and I wanted to propose a
reliable alternative using a completely different method. Also! I never back down from a good challenge!

I **don't care** about the **originating charset** encoding, because **two different tables** can
produce **two identical rendered strings.**
What I want is to get readable text, the best I can.

In a way, **I'm brute-forcing text decoding.** How cool is that? 😎

Don't confuse the package **ftfy** with charset-normalizer or chardet. ftfy's goal is to repair Unicode strings, whereas charset-normalizer's is to convert a raw file in an unknown encoding to Unicode.

## 🍰 How

- Discard all charset encoding tables that could not fit the binary content.
- Measure the noise, or the mess, once opened (by chunks) with a corresponding charset encoding.
- Extract the matches with the lowest mess detected.
- Additionally, we measure coherence / probe for a language (see the sketch below).

**Wait a minute**, what is noise/mess and coherence according to **YOU?**

*Noise:* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then
**I established** some ground rules about **what is obvious** when **it seems like** a mess (aka. defining noise in rendered text).
I know that my interpretation of noise is probably incomplete; feel free to contribute in order to
improve or rewrite it.

*Coherence:* For each language on earth, we have computed ranked letter-appearance occurrences (the best we can). So I thought
that intel is worth something here. So I use those records against decoded text to check if I can detect intelligent design.

## ⚡ Known limitations

- Language detection is unreliable when text contains two or more languages sharing identical letters. (e.g. HTML (English tags) + Turkish content (sharing Latin characters))
- Every charset detector heavily depends on sufficient content. In common cases, do not bother running detection on very tiny content.

## ⚠️ About Python EOLs

**If you are running:**

- Python >=2.7,<3.5: Unsupported
- Python 3.5: charset-normalizer < 2.1
- Python 3.6: charset-normalizer < 3.1
- Python 3.7: charset-normalizer < 4.0

Upgrade your Python interpreter as soon as possible.

## 👤 Contributing

Contributions, issues and feature requests are very much welcome.<br />
Feel free to check the [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.

## 📝 License

Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.

Characters frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)

## 💼 For Enterprise

Professional support for charset-normalizer is available as part of the [Tidelift
Subscription][1]. Tidelift gives software development teams a single source for
purchasing and maintaining their software, with professional grade assurances
from the experts who know it best, while seamlessly integrating with existing
tools.

[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme

[OpenSSF Best Practices](https://www.bestpractices.dev/projects/7297)

# Changelog
All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [3.4.4](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.4) (2025-10-13)

### Changed
- Bound `setuptools` to a specific constraint `setuptools>=68,<=81`.
- Raised upper bound of mypyc for the optional pre-built extension to v1.18.2

### Removed
- `setuptools-scm` as a build dependency.

### Misc
- Enforced hashes in `dev-requirements.txt` and created `ci-requirements.txt` for security purposes.
- Additional pre-built wheels for riscv64, s390x, and armv7l architectures.
- Restored `multiple.intoto.jsonl` in GitHub releases in addition to an individual attestation file per wheel.

## [3.4.3](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.3) (2025-08-09)

### Changed
- mypy(c) is no longer a required dependency at build time if `CHARSET_NORMALIZER_USE_MYPYC` isn't set to `1`. (#595) (#583)
- Automatically lower confidence on small bytes samples that are not Unicode in `detect` output legacy function. (#391)

### Added
- Custom build backend to overcome inability to mark mypy as an optional dependency in the build phase.
- Support for Python 3.14

### Fixed
- sdist archive contained useless directories.
- Automatically fall back on valid UTF-16 or UTF-32 even if the md says it's noisy. (#633)

### Misc
- SBOMs are automatically published to the relevant GitHub release to comply with regulatory changes.
  Each published wheel comes with its SBOM. We chose CycloneDX as the format.
- Prebuilt optimized wheels are no longer distributed by default for CPython 3.7 due to a change in cibuildwheel.

## [3.4.2](https://github.com/Ousret/charset_normalizer/compare/3.4.1...3.4.2) (2025-05-02)

### Fixed
- Addressed the DeprecationWarning in our CLI regarding `argparse.FileType` by backporting the target class into the package. (#591)
- Improved the overall reliability of the detector with CJK Ideographs. (#605) (#587)

### Changed
- Optional mypyc compilation upgraded to version 1.15 for Python >= 3.8

## [3.4.1](https://github.com/Ousret/charset_normalizer/compare/3.4.0...3.4.1) (2024-12-24)

### Changed
- Project metadata are now stored using `pyproject.toml` instead of `setup.cfg` using setuptools as the build backend.
- Enforce annotation delayed loading for simpler and consistent types in the project.
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8

### Added
- pre-commit configuration.
- noxfile.

### Removed
- `build-requirements.txt` as per using `pyproject.toml` native build configuration.
- `bin/integration.py` and `bin/serve.py` in favor of downstream integration tests (see noxfile).
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
- Unused `utils.range_scan` function.

### Fixed
- Converting content to Unicode bytes may insert `utf_8` instead of the preferred `utf-8`. (#572)
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+

## [3.4.0](https://github.com/Ousret/charset_normalizer/compare/3.3.2...3.4.0) (2024-10-08)

### Added
- Argument `--no-preemptive` in the CLI to prevent the detector from searching for hints.
- Support for Python 3.13 (#512)

### Fixed
- Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
- Improved the general reliability of the detector based on user feedback. (#520) (#509) (#498) (#407) (#537)
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#381)

## [3.3.2](https://github.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2) (2023-10-31)

### Fixed
- Unintentional memory usage regression when using a large payload that matches several encodings (#376)
- Regression on some detection cases showcased in the documentation (#371)

### Added
- Noise (md) probe that identifies malformed Arabic representation due to the presence of letters in isolated form (credit to my wife)

## [3.3.1](https://github.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1) (2023-10-22)

### Changed
- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
- Improved the general detection reliability based on reports from the community

## [3.3.0](https://github.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0) (2023-09-30)

### Added
- Allow executing the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
- Support for 9 forgotten encodings that are supported by Python but unlisted in `encoding.aliases` as they have no alias (#323)

### Removed
- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant

### Changed
- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8

### Fixed
- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in \_\_lt\_\_ (#350)

## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)

### Changed
- Typehint for function `from_path` no longer enforces `PathLike` as its first argument
- Minor improvement over the global detection reliability

### Added
- Introduce function `is_binary` that relies on main capabilities, and is optimized to detect binaries
- Propagate `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp` that allows deeper control over the detection (default True)
- Explicit support for Python 3.12

### Fixed
- Edge case detection failure where a file would contain a 'very-long' camel-cased word (Issue #289)

## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)

### Added
- Argument `should_rename_legacy` for legacy function `detect` and disregard any new arguments without errors (PR #262)

### Removed
- Support for Python 3.6 (PR #260)

### Changed
- Optional speedup provided by mypy/c 1.0.1

## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)

### Fixed
|
- Multi-bytes cutter/chunk generator did not always cut correctly (PR #233)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Speedup provided by mypy/c 0.990 on Python >= 3.7
|
||||||
|
|
||||||
|
## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
|
||||||
|
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
|
||||||
|
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
|
||||||
|
- `normalizer --version` now specify if current version provide extra speedup (meaning mypyc compilation whl)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Build with static metadata using 'build' frontend
|
||||||
|
- Make the language detection stricter
|
||||||
|
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- CLI with opt --normalize fail when using full path for files
|
||||||
|
- TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
|
||||||
|
- Sphinx warnings when generating the documentation
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Coherence detector no longer return 'Simple English' instead return 'English'
|
||||||
|
- Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'
|
||||||
|
- Breaking: Method `first()` and `best()` from CharsetMatch
|
||||||
|
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
|
||||||
|
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
|
||||||
|
- Breaking: Top-level function `normalize`
|
||||||
|
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
|
||||||
|
- Support for the backport `unicodedata2`
|
||||||
|
|
||||||
|
## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
|
||||||
|
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
|
||||||
|
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Build with static metadata using 'build' frontend
|
||||||
|
- Make the language detection stricter
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- CLI with opt --normalize fail when using full path for files
|
||||||
|
- TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Coherence detector no longer return 'Simple English' instead return 'English'
|
||||||
|
- Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'
|
||||||
|
|
||||||
|
## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- `normalizer --version` now specify if current version provide extra speedup (meaning mypyc compilation whl)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Breaking: Method `first()` and `best()` from CharsetMatch
|
||||||
|
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Sphinx warnings when generating the documentation
|
||||||
|
|
||||||
|
## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
|
||||||
|
- Breaking: Top-level function `normalize`
|
||||||
|
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
|
||||||
|
- Support for the backport `unicodedata2`
|
||||||
|
|
||||||
|
## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Function `normalize` scheduled for removal in 3.0
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Removed useless call to decode in fn is_unprintable (#206)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Third-party library (i18n xgettext) crashing not recognizing utf_8 (PEP 263) with underscore from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)
|
||||||
|
|
||||||
|
## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Output the Unicode table version when running the CLI with `--version` (PR #194)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Re-use decoded buffer for single byte character sets from [@nijel](https://github.com/nijel) (PR #175)
|
||||||
|
- Fixing some performance bottlenecks from [@deedy5](https://github.com/deedy5) (PR #183)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Workaround potential bug in cpython with Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space (PR #175)
|
||||||
|
- CLI default threshold aligned with the API threshold from [@oleksandr-kuzmenko](https://github.com/oleksandr-kuzmenko) (PR #181)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Support for Python 3.5 (PR #192)
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Use of backport unicodedata from `unicodedata2` as Python is quickly catching up, scheduled for removal in 3.0 (PR #194)
|
||||||
|
|
||||||
|
## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- ASCII miss-detection on rare cases (PR #170)
|
||||||
|
|
||||||
|
## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Explicit support for Python 3.11 (PR #164)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- The logging behavior have been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)
|
||||||
|
|
||||||
|
## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fallback match entries might lead to UnicodeDecodeError for large bytes sequence (PR #154)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Skipping the language-detection (CD) on ASCII (PR #155)
|
||||||
|
|
||||||
|
## [2.0.9](https://github.com/Ousret/charset_normalizer/compare/2.0.8...2.0.9) (2021-12-03)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Moderating the logging impact (since 2.0.8) for specific environments (PR #147)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Wrong logging level applied when setting kwarg `explain` to True (PR #146)
|
||||||
|
|
||||||
|
## [2.0.8](https://github.com/Ousret/charset_normalizer/compare/2.0.7...2.0.8) (2021-11-24)
|
||||||
|
### Changed
|
||||||
|
- Improvement over Vietnamese detection (PR #126)
|
||||||
|
- MD improvement on trailing data and long foreign (non-pure latin) data (PR #124)
|
||||||
|
- Efficiency improvements in cd/alphabet_languages from [@adbar](https://github.com/adbar) (PR #122)
|
||||||
|
- call sum() without an intermediary list following PEP 289 recommendations from [@adbar](https://github.com/adbar) (PR #129)
|
||||||
|
- Code style as refactored by Sourcery-AI (PR #131)
|
||||||
|
- Minor adjustment on the MD around european words (PR #133)
|
||||||
|
- Remove and replace SRTs from assets / tests (PR #139)
|
||||||
|
- Initialize the library logger with a `NullHandler` by default from [@nmaynes](https://github.com/nmaynes) (PR #135)
|
||||||
|
- Setting kwarg `explain` to True will add provisionally (bounded to function lifespan) a specific stream handler (PR #135)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix large (misleading) sequence giving UnicodeDecodeError (PR #137)
|
||||||
|
- Avoid using too insignificant chunk (PR #137)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Add and expose function `set_logging_handler` to configure a specific StreamHandler from [@nmaynes](https://github.com/nmaynes) (PR #135)
|
||||||
|
- Add `CHANGELOG.md` entries, format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) (PR #141)
|
||||||
|
|
||||||
|
## [2.0.7](https://github.com/Ousret/charset_normalizer/compare/2.0.6...2.0.7) (2021-10-11)
|
||||||
|
### Added
|
||||||
|
- Add support for Kazakh (Cyrillic) language detection (PR #109)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Further, improve inferring the language from a given single-byte code page (PR #112)
|
||||||
|
- Vainly trying to leverage PEP263 when PEP3120 is not supported (PR #116)
|
||||||
|
- Refactoring for potential performance improvements in loops from [@adbar](https://github.com/adbar) (PR #113)
|
||||||
|
- Various detection improvement (MD+CD) (PR #117)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Remove redundant logging entry about detected language(s) (PR #115)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)
|
||||||
|
|
||||||
|
## [2.0.6](https://github.com/Ousret/charset_normalizer/compare/2.0.5...2.0.6) (2021-09-18)
|
||||||
|
### Fixed
|
||||||
|
- Unforeseen regression with the loss of the backward-compatibility with some older minor of Python 3.5.x (PR #100)
|
||||||
|
- Fix CLI crash when using --minimal output in certain cases (PR #103)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)
|
||||||
|
|
||||||
|
## [2.0.5](https://github.com/Ousret/charset_normalizer/compare/2.0.4...2.0.5) (2021-09-14)
|
||||||
|
### Changed
|
||||||
|
- The project now comply with: flake8, mypy, isort and black to ensure a better overall quality (PR #81)
|
||||||
|
- The BC-support with v1.x was improved, the old staticmethods are restored (PR #82)
|
||||||
|
- The Unicode detection is slightly improved (PR #93)
|
||||||
|
- Add syntax sugar \_\_bool\_\_ for results CharsetMatches list-container (PR #91)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- The project no longer raise warning on tiny content given for detection, will be simply logged as warning instead (PR #92)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- In some rare case, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection (PR #95)
|
||||||
|
- Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection (PR #96)
|
||||||
|
- The MANIFEST.in was not exhaustive (PR #78)
|
||||||
|
|
||||||
|
## [2.0.4](https://github.com/Ousret/charset_normalizer/compare/2.0.3...2.0.4) (2021-07-30)
|
||||||
|
### Fixed
|
||||||
|
- The CLI no longer raise an unexpected exception when no encoding has been found (PR #70)
|
||||||
|
- Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
|
||||||
|
- The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (PR #72)
|
||||||
|
- Submatch factoring could be wrong in rare edge cases (PR #72)
|
||||||
|
- Multiple files given to the CLI were ignored when publishing results to STDOUT. (After the first path) (PR #72)
|
||||||
|
- Fix line endings from CRLF to LF for certain project files (PR #67)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Adjust the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
|
||||||
|
- Allow fallback on specified encoding if any (PR #71)
|
||||||
|
|
||||||
|
## [2.0.3](https://github.com/Ousret/charset_normalizer/compare/2.0.2...2.0.3) (2021-07-16)
|
||||||
|
### Changed
|
||||||
|
- Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results. Especially ASCII. (PR #63)
|
||||||
|
- According to the community wishes, the detection will fall back on ASCII or UTF-8 in a last-resort case. (PR #64)
|
||||||
|
|
||||||
|
## [2.0.2](https://github.com/Ousret/charset_normalizer/compare/2.0.1...2.0.2) (2021-07-15)
|
||||||
|
### Fixed
|
||||||
|
- Empty/Too small JSON payload miss-detection fixed. Report from [@tseaver](https://github.com/tseaver) (PR #59)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Don't inject unicodedata2 into sys.modules from [@akx](https://github.com/akx) (PR #57)
|
||||||
|
|
||||||
|
## [2.0.1](https://github.com/Ousret/charset_normalizer/compare/2.0.0...2.0.1) (2021-07-13)
|
||||||
|
### Fixed
|
||||||
|
- Make it work where there isn't a filesystem available, dropping assets frequencies.json. Report from [@sethmlarson](https://github.com/sethmlarson). (PR #55)
|
||||||
|
- Using explain=False permanently disable the verbose output in the current runtime (PR #47)
|
||||||
|
- One log entry (language target preemptive) was not show in logs when using explain=True (PR #47)
|
||||||
|
- Fix undesired exception (ValueError) on getitem of instance CharsetMatches (PR #52)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Public function normalize default args values were not aligned with from_bytes (PR #53)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- You may now use charset aliases in cp_isolation and cp_exclusion arguments (PR #47)
|
||||||
|
|
||||||
|
## [2.0.0](https://github.com/Ousret/charset_normalizer/compare/1.4.1...2.0.0) (2021-07-02)
|
||||||
|
### Changed
|
||||||
|
- 4x to 5 times faster than the previous 1.4.0 release. At least 2x faster than Chardet.
|
||||||
|
- Accent has been made on UTF-8 detection, should perform rather instantaneous.
|
||||||
|
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
|
||||||
|
- The detection mechanism has been slightly improved, now Turkish content is detected correctly (most of the time)
|
||||||
|
- The program has been rewritten to ease the readability and maintainability. (+Using static typing)+
|
||||||
|
- utf_7 detection has been reinstated.
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- This package no longer require anything when used with Python 3.5 (Dropped cached_property)
|
||||||
|
- Removed support for these languages: Catalan, Esperanto, Kazakh, Baque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbocroatian.
|
||||||
|
- The exception hook on UnicodeDecodeError has been removed.
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- The CLI output used the relative path of the file(s). Should be absolute.
|
||||||
|
|
||||||
|
## [1.4.1](https://github.com/Ousret/charset_normalizer/compare/1.4.0...1.4.1) (2021-05-28)
|
||||||
|
### Fixed
|
||||||
|
- Logger configuration/usage no longer conflict with others (PR #44)
|
||||||
|
|
||||||
|
## [1.4.0](https://github.com/Ousret/charset_normalizer/compare/1.3.9...1.4.0) (2021-05-21)
|
||||||
|
### Removed
|
||||||
|
- Using standard logging instead of using the package loguru.
|
||||||
|
- Dropping nose test framework in favor of the maintained pytest.
|
||||||
|
- Choose to not use dragonmapper package to help with gibberish Chinese/CJK text.
|
||||||
|
- Require cached_property only for Python 3.5 due to constraint. Dropping for every other interpreter version.
|
||||||
|
- Stop support for UTF-7 that does not contain a SIG.
|
||||||
|
- Dropping PrettyTable, replaced with pure JSON output in CLI.
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even if obviously present. Due to the sub-match factoring process.
|
||||||
|
- Not searching properly for the BOM when trying utf32/16 parent codec.
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Improving the package final size by compressing frequencies.json.
|
||||||
|
- Huge improvement over the larges payload.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- CLI now produces JSON consumable output.
|
||||||
|
- Return ASCII if given sequences fit. Given reasonable confidence.
|
||||||
|
|
||||||
|
## [1.3.9](https://github.com/Ousret/charset_normalizer/compare/1.3.8...1.3.9) (2021-05-13)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload (PR #40)
|
||||||
|
|
||||||
|
## [1.3.8](https://github.com/Ousret/charset_normalizer/compare/1.3.7...1.3.8) (2021-05-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Empty given payload for detection may cause an exception if trying to access the `alphabets` property. (PR #39)
|
||||||
|
|
||||||
|
## [1.3.7](https://github.com/Ousret/charset_normalizer/compare/1.3.6...1.3.7) (2021-05-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- The legacy detect function should return UTF-8-SIG if sig is present in the payload. (PR #38)
|
||||||
|
|
||||||
|
## [1.3.6](https://github.com/Ousret/charset_normalizer/compare/1.3.5...1.3.6) (2021-02-09)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Amend the previous release to allow prettytable 2.0 (PR #35)
|
||||||
|
|
||||||
|
## [1.3.5](https://github.com/Ousret/charset_normalizer/compare/1.3.4...1.3.5) (2021-02-08)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix error while using the package with a python pre-release interpreter (PR #33)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Dependencies refactoring, constraints revised.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Add python 3.9 and 3.10 to the supported interpreters
|
||||||
|
|
||||||
|
MIT License

Copyright (c) 2025 TAHRI Ahmed R.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Binary file not shown.
@ -0,0 +1,764 @@
Metadata-Version: 2.4
Name: charset-normalizer
Version: 3.4.4
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
Author-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
Maintainer-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
License: MIT
Project-URL: Changelog, https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/
Project-URL: Code, https://github.com/jawah/charset_normalizer
Project-URL: Issue tracker, https://github.com/jawah/charset_normalizer/issues
Keywords: encoding,charset,charset-detector,detector,normalization,unicode,chardet,detect
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: unicode-backport
Dynamic: license-file

<h1 align="center">Charset Detection, for Everyone 👋</h1>

<p align="center">
  <sup>The Real First Universal Charset Detector</sup><br>
  <a href="https://pypi.org/project/charset-normalizer">
    <img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
  </a>
  <a href="https://pepy.tech/project/charset-normalizer/">
    <img alt="Download Count Total" src="https://static.pepy.tech/badge/charset-normalizer/month" />
  </a>
  <a href="https://bestpractices.coreinfrastructure.org/projects/7297">
    <img src="https://bestpractices.coreinfrastructure.org/projects/7297/badge">
  </a>
</p>
<p align="center">
  <sup><i>Featured Packages</i></sup><br>
  <a href="https://github.com/jawah/niquests">
    <img alt="Static Badge" src="https://img.shields.io/badge/Niquests-Most_Advanced_HTTP_Client-cyan">
  </a>
  <a href="https://github.com/jawah/wassima">
    <img alt="Static Badge" src="https://img.shields.io/badge/Wassima-Certifi_Replacement-cyan">
  </a>
</p>
<p align="center">
  <sup><i>In other languages (unofficial ports - by the community)</i></sup><br>
  <a href="https://github.com/nickspring/charset-normalizer-rs">
    <img alt="Static Badge" src="https://img.shields.io/badge/Rust-red">
  </a>
</p>

> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
> I'm trying to resolve the issue by taking a new approach.
> All IANA character set names for which the Python core library provides codecs are supported.

<p align="center">
  >>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
</p>

This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

| Feature                                          | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer                                                                                 | [cChardet](https://github.com/PyYoshi/cChardet) |
|--------------------------------------------------|:---------------------------------------------:|:--------------------------------------------------------------------------------------------------:|:-----------------------------------------------:|
| `Fast`                                           | ❌                                             | ✅                                                                                                   | ✅                                               |
| `Universal**`                                    | ❌                                             | ✅                                                                                                   | ❌                                               |
| `Reliable` **without** distinguishable standards | ❌                                             | ✅                                                                                                   | ✅                                               |
| `Reliable` **with** distinguishable standards    | ✅                                             | ✅                                                                                                   | ✅                                               |
| `License`                                        | LGPL-2.1<br>_restrictive_                      | MIT                                                                                                  | MPL-1.1<br>_restrictive_                         |
| `Native Python`                                  | ✅                                             | ✅                                                                                                   | ❌                                               |
| `Detect spoken language`                         | ❌                                             | ✅                                                                                                   | N/A                                              |
| `UnicodeDecodeError Safety`                      | ❌                                             | ✅                                                                                                   | ❌                                               |
| `Whl Size (min)`                                 | 193.6 kB                                       | 42 kB                                                                                                | ~200 kB                                          |
| `Supported Encoding`                             | 33                                             | 🎉 [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings)  | 40                                               |

<p align="center">
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
</p>

*\*\*: They are clearly using specific code for a specific encoding, even if it covers most of the ones in use.*

## ⚡ Performance

This package offers better performance than its counterpart Chardet. Here are some numbers.

| Package                                       | Accuracy | Mean per file (ms) | File per sec (est) |
|-----------------------------------------------|:--------:|:------------------:|:------------------:|
| [chardet](https://github.com/chardet/chardet) | 86 %     | 63 ms              | 16 file/sec        |
| charset-normalizer                            | **98 %** | **10 ms**          | 100 file/sec       |

| Package                                       | 99th percentile | 95th percentile | 50th percentile |
|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
| [chardet](https://github.com/chardet/chardet) | 265 ms          | 71 ms           | 7 ms            |
| charset-normalizer                            | 100 ms          | 50 ms           | 5 ms            |

_Updated as of December 2024 using CPython 3.12_

Chardet's performance on larger files (1MB+) is very poor. Expect a huge difference on large payloads.

> Stats are generated using 400+ files with default parameters. For more details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays depend heavily on your CPU's capabilities. The factors should remain the same.
> Keep in mind that the stats are generous and that Chardet's accuracy versus ours is measured using Chardet's initial capability
> (e.g. supported encodings). Challenge them if you want.

## ✨ Installation

Using pip:

```sh
pip install charset-normalizer -U
```

## 🚀 Basic Usage

### CLI
This package comes with a CLI.

```
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
                  file [file ...]

The Real First Universal Charset Detector. Discover originating encoding used
on text file. Normalize text to unicode.

positional arguments:
  files                 File(s) to be analysed

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display complementary information about file if any.
                        Stdout will contain logs about the detection process.
  -a, --with-alternative
                        Output complementary possibilities if any. Top-level
                        JSON WILL be a list.
  -n, --normalize       Permit to normalize input file. If not set, program
                        does not write anything.
  -m, --minimal         Only output the charset detected to STDOUT. Disabling
                        JSON output.
  -r, --replace         Replace file when trying to normalize it instead of
                        creating a new one.
  -f, --force           Replace file without asking if you are sure, use this
                        flag with caution.
  -t THRESHOLD, --threshold THRESHOLD
                        Define a custom maximum amount of chaos allowed in
                        decoded content. 0. <= chaos <= 1.
  --version             Show version information and exit.
```

```bash
normalizer ./data/sample.1.fr.srt
```

or

```bash
python -m charset_normalizer ./data/sample.1.fr.srt
```

🎉 Since version 1.4.0 the CLI produces an easily usable stdout result in JSON format.

```json
{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "encoding_aliases": [
        "1252",
        "windows_1252"
    ],
    "alternative_encodings": [
        "cp1254",
        "cp1256",
        "cp1258",
        "iso8859_14",
        "iso8859_15",
        "iso8859_16",
        "iso8859_3",
        "iso8859_9",
        "latin_1",
        "mbcs"
    ],
    "language": "French",
    "alphabets": [
        "Basic Latin",
        "Latin-1 Supplement"
    ],
    "has_sig_or_bom": false,
    "chaos": 0.149,
    "coherence": 97.152,
    "unicode_path": null,
    "is_preferred": true
}
```
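
For scripting, the `-m`/`--minimal` flag shown in the help above skips the JSON report entirely. A quick sketch, reusing the sample file from above (the exact stdout value is assumed for illustration, not verified here):

```sh
# Print only the detected charset to STDOUT, no JSON envelope.
normalizer -m ./data/sample.1.fr.srt
# expected output: cp1252
```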

### Python
*Just print out normalized text*

```python
from charset_normalizer import from_path

results = from_path('./my_subtitle.srt')

print(str(results.best()))
```

*Upgrade your code without effort*

```python
from charset_normalizer import detect
```

The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) BC result possible.
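
As a rough sketch of that drop-in compatibility (the byte payload below is invented purely for illustration), the legacy function returns a chardet-style dictionary:

```python
# Minimal sketch of the chardet-compatible legacy API.
# The payload is fabricated for illustration; real-world results vary.
from charset_normalizer import detect

payload = "Bonjour, voici un texte accentué très classique.".encode("cp1252")
result = detect(payload)

# A chardet-style dict with keys 'encoding', 'language' and 'confidence'.
print(result["encoding"], result["language"], result["confidence"])
```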

See the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)

## 😇 Why

When I started using Chardet, I noticed that it did not meet my expectations, and I wanted to propose a
reliable alternative using a completely different method. Also, I never back down from a good challenge!

I **don't care** about the **originating charset** encoding, because **two different tables** can
produce **two identical rendered strings.**
What I want is to get readable text, the best I can.

In a way, **I'm brute forcing text decoding.** How cool is that? 😎

Don't confuse the package **ftfy** with charset-normalizer or chardet. ftfy's goal is to repair Unicode strings, whereas charset-normalizer converts a raw file in an unknown encoding to Unicode.

## 🍰 How

- Discard all charset encoding tables that could not fit the binary content.
- Measure the noise, or the mess, once opened (by chunks) with a corresponding charset encoding.
- Extract the matches with the lowest mess detected.
- Additionally, we measure coherence / probe for a language.

**Wait a minute**, what is noise/mess and coherence according to **YOU**?

*Noise:* I opened hundreds of text files **written by humans** with the wrong encoding table. **I observed**, then
**I established** some ground rules about **what is obvious** when **it seems like** a mess (i.e., defining noise in rendered text).
I know that my interpretation of what noise is is probably incomplete; feel free to contribute in order to
improve or rewrite it.

*Coherence:* For each language on Earth, we have computed ranked letter-appearance occurrences (as best we can). So I thought
that intel is worth something here. So I use those records against decoded text to check if I can detect intelligent design.
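
A small sketch of those two measures in action, assuming the `chaos` and `coherence` properties exposed on a match (the sample bytes are invented for illustration):

```python
# Inspect the mess (chaos) and coherence measures described above.
# The sample bytes are fabricated for illustration only.
from charset_normalizer import from_bytes

matches = from_bytes("Der größte Teil des Textes ist völlig gewöhnlich.".encode("cp1252"))
best = matches.best()  # the match with the lowest mess detected

if best is not None:
    print(best.encoding)   # winning charset table
    print(best.chaos)      # noise/mess ratio; lower is better
    print(best.coherence)  # language coherence ratio; higher is better
```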

## ⚡ Known limitations

- Language detection is unreliable when the text contains two or more languages sharing identical letters (e.g. HTML (English tags) + Turkish content (sharing Latin characters)).
- Every charset detector heavily depends on sufficient content. In common cases, do not bother running detection on very tiny content.

## ⚠️ About Python EOLs

**If you are running:**

- Python >=2.7,<3.5: Unsupported
- Python 3.5: charset-normalizer < 2.1
- Python 3.6: charset-normalizer < 3.1
- Python 3.7: charset-normalizer < 4.0
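
If you are stuck on one of these interpreters, pinning an upper bound taken from the list above keeps pip from resolving an incompatible release; for example, on Python 3.7:

```sh
pip install "charset-normalizer<4.0"
```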

Upgrade your Python interpreter as soon as possible.

## 👤 Contributing

Contributions, issues and feature requests are very much welcome.<br />
Feel free to check the [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.

## 📝 License

Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.

Character frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)

## 💼 For Enterprise

Professional support for charset-normalizer is available as part of the [Tidelift
Subscription][1]. Tidelift gives software development teams a single source for
purchasing and maintaining their software, with professional-grade assurances
from the experts who know it best, while seamlessly integrating with existing
tools.

[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme

[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/7297/badge)](https://www.bestpractices.dev/projects/7297)

# Changelog
All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [3.4.4](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.4) (2025-10-13)

### Changed
- Bound `setuptools` to a specific constraint `setuptools>=68,<=81`.
- Raised the upper bound of mypyc for the optional pre-built extension to v1.18.2

### Removed
- `setuptools-scm` as a build dependency.

### Misc
- Enforced hashes in `dev-requirements.txt` and created `ci-requirements.txt` for security purposes.
- Additional pre-built wheels for riscv64, s390x, and armv7l architectures.
- Restore `multiple.intoto.jsonl` in GitHub releases in addition to the individual attestation file per wheel.

## [3.4.3](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.3) (2025-08-09)

### Changed
- mypy(c) is no longer a required dependency at build time if `CHARSET_NORMALIZER_USE_MYPYC` isn't set to `1`. (#595) (#583)
- Automatically lower the confidence in the legacy `detect` output on small byte samples that are not Unicode. (#391)

### Added
- Custom build backend to overcome the inability to mark mypy as an optional dependency in the build phase.
- Support for Python 3.14

### Fixed
- sdist archive contained useless directories.
- Automatically fall back on valid UTF-16 or UTF-32 even if the mess detector (md) says it's noisy. (#633)

### Misc
- SBOMs are automatically published to the relevant GitHub release to comply with regulatory changes.
  Each published wheel comes with its SBOM. We chose CycloneDX as the format.
- Prebuilt optimized wheels are no longer distributed by default for CPython 3.7 due to a change in cibuildwheel.

## [3.4.2](https://github.com/Ousret/charset_normalizer/compare/3.4.1...3.4.2) (2025-05-02)

### Fixed
- Addressed the DeprecationWarning in our CLI regarding `argparse.FileType` by backporting the target class into the package. (#591)
- Improved the overall reliability of the detector with CJK Ideographs. (#605) (#587)

### Changed
- Optional mypyc compilation upgraded to version 1.15 for Python >= 3.8

## [3.4.1](https://github.com/Ousret/charset_normalizer/compare/3.4.0...3.4.1) (2024-12-24)

### Changed
- Project metadata is now stored in `pyproject.toml` instead of `setup.cfg`, using setuptools as the build backend.
- Enforce delayed annotation loading for simpler and more consistent types across the project.
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8

### Added
- pre-commit configuration.
- noxfile.

### Removed
- `build-requirements.txt`, as per using the `pyproject.toml` native build configuration.
- `bin/integration.py` and `bin/serve.py` in favor of a downstream integration test (see the noxfile).
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
- Unused `utils.range_scan` function.

### Fixed
- Converting content to Unicode bytes may insert `utf_8` instead of the preferred `utf-8`. (#572)
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+

## [3.4.0](https://github.com/Ousret/charset_normalizer/compare/3.3.2...3.4.0) (2024-10-08)

### Added
- Argument `--no-preemptive` in the CLI to prevent the detector from searching for hints.
- Support for Python 3.13 (#512)

### Fixed
- Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
- Improved the general reliability of the detector based on user feedback. (#520) (#509) (#498) (#407) (#537)
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#381)

## [3.3.2](https://github.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2) (2023-10-31)

### Fixed
- Unintentional memory usage regression when using large payloads that match several encodings (#376)
- Regression on some detection cases showcased in the documentation (#371)

### Added
- Noise (md) probe that identifies malformed Arabic representation due to the presence of letters in isolated form (credit to my wife)

## [3.3.1](https://github.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1) (2023-10-22)

### Changed
- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
- Improved the general detection reliability based on reports from the community

## [3.3.0](https://github.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0) (2023-09-30)

### Added
- Allow executing the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
- Support for 9 forgotten encodings that are supported by Python but unlisted in `encodings.aliases`, as they have no alias (#323)

### Removed
- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant

### Changed
- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8

### Fixed
- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close, due to an unreachable condition in \_\_lt\_\_ (#350)

## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)

### Changed
- Type hint for function `from_path` no longer enforces `PathLike` as its first argument
- Minor improvement over the global detection reliability

### Added
- Introduce function `is_binary` that relies on the main capabilities, optimized to detect binaries
- Propagate the `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp`, allowing deeper control over the detection (default True)
- Explicit support for Python 3.12

### Fixed
- Edge case detection failure where a file would contain a 'very-long' camel-cased word (Issue #289)

## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)

### Added
- Argument `should_rename_legacy` for the legacy function `detect`; any new arguments are disregarded without errors (PR #262)

### Removed
- Support for Python 3.6 (PR #260)

### Changed
- Optional speedup provided by mypy/c 1.0.1

## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)

### Fixed
- Multi-byte cutter/chunk generator did not always cut correctly (PR #233)

### Changed
- Speedup provided by mypy/c 0.990 on Python >= 3.7

## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)

### Added
- Extend the capability of explain=True when cp_isolation contains at most two entries (min. one); it will log the Mess-detector results in detail
- Support for an alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
- `normalizer --version` now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)

### Changed
- Build with static metadata using the 'build' frontend
- Make the language detection stricter
- Optional: Module `md.py` can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

### Fixed
- CLI with option --normalize failed when using a full path for files
- TooManyAccentuatedPlugin induced false positives in the mess detection when too few alpha characters had been fed to it
- Sphinx warnings when generating the documentation

### Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)

### Added
- Extend the capability of explain=True when cp_isolation contains at most two entries (min. one); it will log the Mess-detector results in detail
- Support for an alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio

### Changed
- Build with static metadata using the 'build' frontend
- Make the language detection stricter

### Fixed
- CLI with option --normalize failed when using a full path for files
- TooManyAccentuatedPlugin induced false positives in the mess detection when too few alpha characters had been fed to it

### Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead

## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)

### Added
- `normalizer --version` now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)

### Removed
- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)

### Fixed
- Sphinx warnings when generating the documentation

## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)

### Changed
- Optional: Module `md.py` can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

### Removed
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)

### Deprecated
- Function `normalize` scheduled for removal in 3.0

### Changed
- Removed a useless call to decode in the function is_unprintable (#206)

### Fixed
- Third-party library (i18n xgettext) crashing because utf_8 (PEP 263) with an underscore was not recognized, from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)

## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)

### Added
- Output the Unicode table version when running the CLI with `--version` (PR #194)

### Changed
- Re-use the decoded buffer for single-byte character sets, from [@nijel](https://github.com/nijel) (PR #175)
- Fixed some performance bottlenecks, from [@deedy5](https://github.com/deedy5) (PR #183)

### Fixed
- Work around a potential bug in CPython where Zero Width No-Break Space (located in Arabic Presentation Forms-B, Unicode 1.1) is not acknowledged as a space (PR #175)
- CLI default threshold aligned with the API threshold, from [@oleksandr-kuzmenko](https://github.com/oleksandr-kuzmenko) (PR #181)

### Removed
- Support for Python 3.5 (PR #192)

### Deprecated
- Use of the backport unicodedata from `unicodedata2`, as Python is quickly catching up; scheduled for removal in 3.0 (PR #194)

## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)

### Fixed
- ASCII misdetection in rare cases (PR #170)

## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)

### Added
- Explicit support for Python 3.11 (PR #164)

### Changed
- The logging behavior has been completely reviewed; now using only TRACE and DEBUG levels (PR #163 #165)

## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)

### Fixed
- Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)

### Changed
- Skipping the language detection (CD) on ASCII (PR #155)

## [2.0.9](https://github.com/Ousret/charset_normalizer/compare/2.0.8...2.0.9) (2021-12-03)

### Changed
- Moderating the logging impact (since 2.0.8) for specific environments (PR #147)

### Fixed
- Wrong logging level applied when setting the kwarg `explain` to True (PR #146)

## [2.0.8](https://github.com/Ousret/charset_normalizer/compare/2.0.7...2.0.8) (2021-11-24)

### Changed
- Improvement over Vietnamese detection (PR #126)
- MD improvement on trailing data and long foreign (non-pure-Latin) data (PR #124)
- Efficiency improvements in cd/alphabet_languages, from [@adbar](https://github.com/adbar) (PR #122)
- Call sum() without an intermediary list, following PEP 289 recommendations, from [@adbar](https://github.com/adbar) (PR #129)
- Code style as refactored by Sourcery-AI (PR #131)
- Minor adjustment of the MD around European words (PR #133)
- Remove and replace SRTs from assets / tests (PR #139)
- Initialize the library logger with a `NullHandler` by default, from [@nmaynes](https://github.com/nmaynes) (PR #135)
- Setting the kwarg `explain` to True will provisionally add (bound to the function's lifespan) a specific stream handler (PR #135)

### Fixed
- Fix large (misleading) sequences giving UnicodeDecodeError (PR #137)
- Avoid using too-insignificant chunks (PR #137)

### Added
- Add and expose the function `set_logging_handler` to configure a specific StreamHandler, from [@nmaynes](https://github.com/nmaynes) (PR #135)
- Add `CHANGELOG.md` entries; the format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) (PR #141)

## [2.0.7](https://github.com/Ousret/charset_normalizer/compare/2.0.6...2.0.7) (2021-10-11)

### Added
- Add support for Kazakh (Cyrillic) language detection (PR #109)

### Changed
- Further improve inferring the language from a given single-byte code page (PR #112)
- Stop vainly trying to leverage PEP 263 when PEP 3120 is not supported (PR #116)
- Refactoring for potential performance improvements in loops, from [@adbar](https://github.com/adbar) (PR #113)
- Various detection improvements (MD+CD) (PR #117)

### Removed
- Remove a redundant logging entry about detected language(s) (PR #115)

### Fixed
- Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)

## [2.0.6](https://github.com/Ousret/charset_normalizer/compare/2.0.5...2.0.6) (2021-09-18)

### Fixed
- Unforeseen regression with the loss of backward compatibility with some older minor versions of Python 3.5.x (PR #100)
- Fix CLI crash when using --minimal output in certain cases (PR #103)

### Changed
- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)

## [2.0.5](https://github.com/Ousret/charset_normalizer/compare/2.0.4...2.0.5) (2021-09-14)

### Changed
- The project now complies with flake8, mypy, isort and black to ensure better overall quality (PR #81)
- The BC support with v1.x was improved; the old staticmethods are restored (PR #82)
- The Unicode detection is slightly improved (PR #93)
- Add syntactic sugar \_\_bool\_\_ for the results CharsetMatches list-container (PR #91)

### Removed
- The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead (PR #92)

### Fixed
- In some rare cases, the chunk extractor could cut in the middle of a multi-byte character and mislead the mess detection (PR #95)
- Some rare 'space' characters could trip up the UnprintablePlugin/mess detection (PR #96)
- The MANIFEST.in was not exhaustive (PR #78)

## [2.0.4](https://github.com/Ousret/charset_normalizer/compare/2.0.3...2.0.4) (2021-07-30)

### Fixed
- The CLI no longer raises an unexpected exception when no encoding has been found (PR #70)
- Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
- The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (PR #72)
- Submatch factoring could be wrong in rare edge cases (PR #72)
- Multiple files given to the CLI were ignored when publishing results to STDOUT (after the first path) (PR #72)
- Fix line endings from CRLF to LF for certain project files (PR #67)

### Changed
- Adjust the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
- Allow fallback on the specified encoding, if any (PR #71)

## [2.0.3](https://github.com/Ousret/charset_normalizer/compare/2.0.2...2.0.3) (2021-07-16)

### Changed
- Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results, especially for ASCII. (PR #63)
- In line with community wishes, the detection will fall back on ASCII or UTF-8 as a last resort. (PR #64)

## [2.0.2](https://github.com/Ousret/charset_normalizer/compare/2.0.1...2.0.2) (2021-07-15)

### Fixed
- Empty/too-small JSON payload misdetection fixed. Report from [@tseaver](https://github.com/tseaver) (PR #59)

### Changed
- Don't inject unicodedata2 into sys.modules, from [@akx](https://github.com/akx) (PR #57)

## [2.0.1](https://github.com/Ousret/charset_normalizer/compare/2.0.0...2.0.1) (2021-07-13)

### Fixed
- Make it work where there isn't a filesystem available, dropping the assets frequencies.json. Report from [@sethmlarson](https://github.com/sethmlarson). (PR #55)
- Using explain=False permanently disabled the verbose output in the current runtime (PR #47)
- One log entry (language target preemptive) was not shown in the logs when using explain=True (PR #47)
- Fix an undesired exception (ValueError) on getitem of a CharsetMatches instance (PR #52)

### Changed
- The default argument values of the public function normalize were not aligned with from_bytes (PR #53)

### Added
- You may now use charset aliases in the cp_isolation and cp_exclusion arguments (PR #47)

## [2.0.0](https://github.com/Ousret/charset_normalizer/compare/1.4.1...2.0.0) (2021-07-02)

### Changed
- 4 to 5 times faster than the previous 1.4.0 release. At least 2x faster than Chardet.
- Emphasis has been placed on UTF-8 detection; it should perform nearly instantaneously.
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
|
||||||
|
- The detection mechanism has been slightly improved, now Turkish content is detected correctly (most of the time)
|
||||||
|
- The program has been rewritten to ease the readability and maintainability. (+Using static typing)+
|
||||||
|
- utf_7 detection has been reinstated.
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- This package no longer require anything when used with Python 3.5 (Dropped cached_property)
|
||||||
|
- Removed support for these languages: Catalan, Esperanto, Kazakh, Baque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbocroatian.
|
||||||
|
- The exception hook on UnicodeDecodeError has been removed.
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- The CLI output used the relative path of the file(s). Should be absolute.
|
||||||
|
|
||||||
|
## [1.4.1](https://github.com/Ousret/charset_normalizer/compare/1.4.0...1.4.1) (2021-05-28)
|
||||||
|
### Fixed
|
||||||
|
- Logger configuration/usage no longer conflict with others (PR #44)
|
||||||
|
|
||||||
|
## [1.4.0](https://github.com/Ousret/charset_normalizer/compare/1.3.9...1.4.0) (2021-05-21)
|
||||||
|
### Removed
|
||||||
|
- Using standard logging instead of using the package loguru.
|
||||||
|
- Dropping nose test framework in favor of the maintained pytest.
|
||||||
|
- Choose to not use dragonmapper package to help with gibberish Chinese/CJK text.
|
||||||
|
- Require cached_property only for Python 3.5 due to constraint. Dropping for every other interpreter version.
|
||||||
|
- Stop support for UTF-7 that does not contain a SIG.
|
||||||
|
- Dropping PrettyTable, replaced with pure JSON output in CLI.
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even if obviously present. Due to the sub-match factoring process.
|
||||||
|
- Not searching properly for the BOM when trying utf32/16 parent codec.
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Improving the package final size by compressing frequencies.json.
|
||||||
|
- Huge improvement over the larges payload.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- CLI now produces JSON consumable output.
|
||||||
|
- Return ASCII if given sequences fit. Given reasonable confidence.
|
||||||
|
|
||||||
|
## [1.3.9](https://github.com/Ousret/charset_normalizer/compare/1.3.8...1.3.9) (2021-05-13)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload (PR #40)
|
||||||
|
|
||||||
|
## [1.3.8](https://github.com/Ousret/charset_normalizer/compare/1.3.7...1.3.8) (2021-05-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Empty given payload for detection may cause an exception if trying to access the `alphabets` property. (PR #39)
|
||||||
|
|
||||||
|
## [1.3.7](https://github.com/Ousret/charset_normalizer/compare/1.3.6...1.3.7) (2021-05-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- The legacy detect function should return UTF-8-SIG if sig is present in the payload. (PR #38)
|
||||||
|
|
||||||
|
## [1.3.6](https://github.com/Ousret/charset_normalizer/compare/1.3.5...1.3.6) (2021-02-09)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Amend the previous release to allow prettytable 2.0 (PR #35)
|
||||||
|
|
||||||
|
## [1.3.5](https://github.com/Ousret/charset_normalizer/compare/1.3.4...1.3.5) (2021-02-08)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix error while using the package with a python pre-release interpreter (PR #33)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Dependencies refactoring, constraints revised.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Add python 3.9 and 3.10 to the supported interpreters
|
||||||
|
|
||||||
|
MIT License

Copyright (c) 2025 TAHRI Ahmed R.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Binary file not shown.

@ -0,0 +1,764 @@
Metadata-Version: 2.4
Name: charset-normalizer
Version: 3.4.4
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
Author-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
Maintainer-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
License: MIT
Project-URL: Changelog, https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/
Project-URL: Code, https://github.com/jawah/charset_normalizer
Project-URL: Issue tracker, https://github.com/jawah/charset_normalizer/issues
Keywords: encoding,charset,charset-detector,detector,normalization,unicode,chardet,detect
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: unicode-backport
Dynamic: license-file

<h1 align="center">Charset Detection, for Everyone 👋</h1>

<p align="center">
  <sup>The Real First Universal Charset Detector</sup><br>
  <a href="https://pypi.org/project/charset-normalizer">
    <img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
  </a>
  <a href="https://pepy.tech/project/charset-normalizer/">
    <img alt="Download Count Total" src="https://static.pepy.tech/badge/charset-normalizer/month" />
  </a>
  <a href="https://bestpractices.coreinfrastructure.org/projects/7297">
    <img src="https://bestpractices.coreinfrastructure.org/projects/7297/badge">
  </a>
</p>
<p align="center">
  <sup><i>Featured Packages</i></sup><br>
  <a href="https://github.com/jawah/niquests">
    <img alt="Static Badge" src="https://img.shields.io/badge/Niquests-Most_Advanced_HTTP_Client-cyan">
  </a>
  <a href="https://github.com/jawah/wassima">
    <img alt="Static Badge" src="https://img.shields.io/badge/Wassima-Certifi_Replacement-cyan">
  </a>
</p>
<p align="center">
  <sup><i>In other languages (unofficial ports, by the community)</i></sup><br>
  <a href="https://github.com/nickspring/charset-normalizer-rs">
    <img alt="Static Badge" src="https://img.shields.io/badge/Rust-red">
  </a>
</p>

> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
> I'm trying to resolve the issue by taking a new approach.
> All IANA character set names for which the Python core library provides codecs are supported.

<p align="center">
  >>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
</p>

This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
|---|:---:|:---:|:---:|
| `Fast` | ❌ | ✅ | ✅ |
| `Universal**` | ❌ | ✅ | ❌ |
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
| `Native Python` | ✅ | ✅ | ❌ |
| `Detect spoken language` | ❌ | ✅ | N/A |
| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
| `Whl Size (min)` | 193.6 kB | 42 kB | ~200 kB |
| `Supported Encoding` | 33 | 🎉 [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |

<p align="center">
  <img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
</p>

*\*\*: They are clearly using specific code for a specific encoding, even if covering most of the used ones.*<br>

## ⚡ Performance

This package offers better performance than its counterpart Chardet. Here are some numbers.

| Package | Accuracy | Mean per file (ms) | File per sec (est) |
|---|:---:|:---:|:---:|
| [chardet](https://github.com/chardet/chardet) | 86 % | 63 ms | 16 file/sec |
| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |

| Package | 99th percentile | 95th percentile | 50th percentile |
|---|:---:|:---:|:---:|
| [chardet](https://github.com/chardet/chardet) | 265 ms | 71 ms | 7 ms |
| charset-normalizer | 100 ms | 50 ms | 5 ms |

_updated as of December 2024 using CPython 3.12_

Chardet's performance on larger files (1 MB+) is very poor; expect a huge difference on large payloads.

> Stats are generated using 400+ files with default parameters. For more details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays depend heavily on your CPU capabilities. The factors should remain the same.
> Keep in mind that the stats are generous and that Chardet's accuracy vs. ours is measured using Chardet's initial capability
> (e.g., supported encodings). Challenge them if you want.

## ✨ Installation

Using pip:

```sh
pip install charset-normalizer -U
```

## 🚀 Basic Usage

### CLI

This package comes with a CLI.

```
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
                  file [file ...]

The Real First Universal Charset Detector. Discover originating encoding used
on text file. Normalize text to unicode.

positional arguments:
  files                 File(s) to be analysed

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display complementary information about file if any.
                        Stdout will contain logs about the detection process.
  -a, --with-alternative
                        Output complementary possibilities if any. Top-level
                        JSON WILL be a list.
  -n, --normalize       Permit to normalize input file. If not set, program
                        does not write anything.
  -m, --minimal         Only output the charset detected to STDOUT. Disabling
                        JSON output.
  -r, --replace         Replace file when trying to normalize it instead of
                        creating a new one.
  -f, --force           Replace file without asking if you are sure, use this
                        flag with caution.
  -t THRESHOLD, --threshold THRESHOLD
                        Define a custom maximum amount of chaos allowed in
                        decoded content. 0. <= chaos <= 1.
  --version             Show version information and exit.
```

```bash
normalizer ./data/sample.1.fr.srt
```

or

```bash
python -m charset_normalizer ./data/sample.1.fr.srt
```

🎉 Since version 1.4.0, the CLI produces an easily usable stdout result in JSON format.

```json
{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "encoding_aliases": [
        "1252",
        "windows_1252"
    ],
    "alternative_encodings": [
        "cp1254",
        "cp1256",
        "cp1258",
        "iso8859_14",
        "iso8859_15",
        "iso8859_16",
        "iso8859_3",
        "iso8859_9",
        "latin_1",
        "mbcs"
    ],
    "language": "French",
    "alphabets": [
        "Basic Latin",
        "Latin-1 Supplement"
    ],
    "has_sig_or_bom": false,
    "chaos": 0.149,
    "coherence": 97.152,
    "unicode_path": null,
    "is_preferred": true
}
```
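
Because the CLI emits plain JSON on stdout, it is easy to consume from a script. Below is a minimal sketch: the file path is hypothetical, and it assumes `normalizer` is on PATH and is invoked without `-a`, so the top-level value is a single object shaped like the example above.

```python
import json
import subprocess

# Hypothetical input file; any text file works here.
proc = subprocess.run(
    ["normalizer", "./data/sample.1.fr.srt"],
    capture_output=True,
    text=True,
    check=True,
)

report = json.loads(proc.stdout)  # one object, matching the shape shown above
print(report["encoding"], report["language"], report["is_preferred"])
```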

### Python

*Just print out normalized text*

```python
from charset_normalizer import from_path

results = from_path('./my_subtitle.srt')

print(str(results.best()))
```
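
`best()` returns a `CharsetMatch` (or `None` when nothing fits), which exposes more than the decoded text. A small sketch of the properties that back the JSON output shown earlier (the file path is again hypothetical):

```python
from charset_normalizer import from_path

best = from_path('./my_subtitle.srt').best()  # hypothetical file path

if best is not None:
    print(best.encoding)       # e.g. "cp1252"
    print(best.language)       # e.g. "French"
    print(str(best)[:80])      # the decoded (normalized) text itself
```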

*Upgrade your code without effort*

```python
from charset_normalizer import detect
```

The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) BC result possible.
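
For instance, a drop-in call shaped like chardet's (a minimal sketch; the sample bytes are illustrative):

```python
from charset_normalizer import detect

result = detect("Héllo, wörld".encode("cp1252"))  # illustrative payload
# chardet-style dict: {'encoding': ..., 'language': ..., 'confidence': ...}
print(result)
```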

See the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)

## 😇 Why

When I started using Chardet, I noticed that it was not suited to my expectations, and I wanted to propose a
reliable alternative using a completely different method. Also! I never back down on a good challenge!

I **don't care** about the **originating charset** encoding, because **two different tables** can
produce **two identical rendered strings.**
What I want is to get readable text, the best I can.

In a way, **I'm brute forcing text decoding.** How cool is that? 😎

Don't confuse the package **ftfy** with charset-normalizer or chardet. ftfy's goal is to repair Unicode strings, whereas charset-normalizer's is to convert a raw file in an unknown encoding to Unicode.

## 🍰 How

- Discard all charset encoding tables that could not fit the binary content.
- Measure noise, or the mess, once opened (by chunks) with a corresponding charset encoding.
- Extract matches with the lowest mess detected.
- Additionally, we measure coherence / probe for a language.

**Wait a minute**, what is noise/mess and coherence according to **YOU?**

*Noise:* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then
**I established** some ground rules about **what is obvious** when **it seems like** a mess (i.e., defining noise in rendered text).
I know that my interpretation of what is noise is probably incomplete; feel free to contribute in order to
improve or rewrite it.

*Coherence:* For each language on earth, we have computed ranked letter-appearance occurrences (the best we can). I figured
that intel is worth something here, so I use those records against decoded text to check whether I can detect intelligent design.
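
Put together, the loop looks roughly like the following self-contained sketch. It is illustrative only: `measure_noise` here is a toy stand-in for the real mess detector (md.py), and the actual library works chunk-wise and also scores coherence.

```python
from typing import List, Optional

def measure_noise(text: str) -> float:
    """Toy noise score: fraction of unprintable characters.
    The real mess detector is far more elaborate."""
    if not text:
        return 0.0
    bad = sum(1 for ch in text if not ch.isprintable() and ch not in "\n\r\t ")
    return bad / len(text)

def sketch_detect(payload: bytes, candidates: List[str]) -> Optional[str]:
    best, lowest = None, float("inf")
    for codec in candidates:
        try:
            decoded = payload.decode(codec)       # discard tables that cannot fit
        except (UnicodeDecodeError, LookupError):
            continue
        noise = measure_noise(decoded)            # measure the mess once decoded
        if noise < lowest:
            best, lowest = codec, noise           # keep the least messy match
    return best

print(sketch_detect("café".encode("cp1252"), ["utf_8", "cp1252", "ascii"]))
```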

## ⚡ Known limitations

- Language detection is unreliable when text contains two or more languages sharing identical letters (e.g., HTML (English tags) + Turkish content (sharing Latin characters)).
- Every charset detector heavily depends on sufficient content. In common cases, do not bother running detection on very tiny content.

## ⚠️ About Python EOLs

**If you are running:**

- Python >=2.7,<3.5: Unsupported
- Python 3.5: charset-normalizer < 2.1
- Python 3.6: charset-normalizer < 3.1
- Python 3.7: charset-normalizer < 4.0
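
If you are stuck on one of the EOL interpreters listed above, you can pin the last compatible release explicitly (example constraint only):

```sh
# e.g., for a Python 3.6 environment
pip install "charset-normalizer<3.1"
```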

Upgrade your Python interpreter as soon as possible.

## 👤 Contributing

Contributions, issues, and feature requests are very much welcome.<br />
Feel free to check the [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.

## 📝 License

Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.

Character frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)

## 💼 For Enterprise

Professional support for charset-normalizer is available as part of the [Tidelift Subscription][1]. Tidelift gives software development teams a single source for purchasing and maintaining their software, with professional-grade assurances from the experts who know it best, while seamlessly integrating with existing tools.

[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme

[![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/7297/badge)](https://www.bestpractices.dev/projects/7297)

# Changelog

All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [3.4.4](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.4) (2025-10-13)

### Changed

- Bound `setuptools` to a specific constraint `setuptools>=68,<=81`.
- Raised the upper bound of mypyc for the optional pre-built extension to v1.18.2

### Removed

- `setuptools-scm` as a build dependency.

### Misc

- Enforced hashes in `dev-requirements.txt` and created `ci-requirements.txt` for security purposes.
- Additional pre-built wheels for riscv64, s390x, and armv7l architectures.
- Restore `multiple.intoto.jsonl` in GitHub releases in addition to an individual attestation file per wheel.

## [3.4.3](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.3) (2025-08-09)

### Changed

- mypy(c) is no longer a required dependency at build time if `CHARSET_NORMALIZER_USE_MYPYC` isn't set to `1`. (#595) (#583)
- Automatically lower the confidence on small byte samples that are not Unicode in the legacy `detect` output function. (#391)

### Added

- Custom build backend to overcome the inability to mark mypy as an optional dependency in the build phase.
- Support for Python 3.14

### Fixed

- The sdist archive contained useless directories.
- Automatically fall back on valid UTF-16 or UTF-32 even if the MD says it's noisy. (#633)

### Misc

- SBOMs are automatically published to the relevant GitHub release to comply with regulatory changes.
  Each published wheel comes with its SBOM. We chose CycloneDX as the format.
- Prebuilt optimized wheels are no longer distributed by default for CPython 3.7 due to a change in cibuildwheel.

## [3.4.2](https://github.com/Ousret/charset_normalizer/compare/3.4.1...3.4.2) (2025-05-02)

### Fixed

- Addressed the DeprecationWarning in our CLI regarding `argparse.FileType` by backporting the target class into the package. (#591)
- Improved the overall reliability of the detector with CJK ideographs. (#605) (#587)

### Changed

- Optional mypyc compilation upgraded to version 1.15 for Python >= 3.8

## [3.4.1](https://github.com/Ousret/charset_normalizer/compare/3.4.0...3.4.1) (2024-12-24)

### Changed

- Project metadata are now stored using `pyproject.toml` instead of `setup.cfg`, using setuptools as the build backend.
- Enforce delayed annotation loading for simpler and more consistent types in the project.
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8

### Added

- pre-commit configuration.
- noxfile.

### Removed

- `build-requirements.txt`, as per using the `pyproject.toml` native build configuration.
- `bin/integration.py` and `bin/serve.py` in favor of downstream integration tests (see the noxfile).
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
- Unused `utils.range_scan` function.

### Fixed

- Converting content to Unicode bytes may insert `utf_8` instead of the preferred `utf-8`. (#572)
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+

## [3.4.0](https://github.com/Ousret/charset_normalizer/compare/3.3.2...3.4.0) (2024-10-08)

### Added

- Argument `--no-preemptive` in the CLI to prevent the detector from searching for hints.
- Support for Python 3.13 (#512)

### Fixed

- Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
- Improved the general reliability of the detector based on user feedback. (#520) (#509) (#498) (#407) (#537)
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#381)

## [3.3.2](https://github.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2) (2023-10-31)

### Fixed

- Unintentional memory usage regression when using large payloads that match several encodings (#376)
- Regression on some detection cases showcased in the documentation (#371)

### Added

- Noise (md) probe that identifies malformed Arabic representation due to the presence of letters in isolated form (credit to my wife)

## [3.3.1](https://github.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1) (2023-10-22)

### Changed

- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
- Improved the general detection reliability based on reports from the community

## [3.3.0](https://github.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0) (2023-09-30)

### Added

- Allow executing the CLI (e.g., normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
- Support for 9 forgotten encodings that are supported by Python but unlisted in `encoding.aliases`, as they have no alias (#323)

### Removed

- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant

### Changed

- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8

### Fixed

- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close, due to an unreachable condition in `__lt__` (#350)

## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)

### Changed

- The type hint for function `from_path` no longer enforces `PathLike` as its first argument
- Minor improvement over the global detection reliability

### Added

- Introduce function `is_binary` that relies on the main capabilities and is optimized to detect binaries
- Propagate the `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp`, allowing deeper control over the detection (default True)
- Explicit support for Python 3.12
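
As a quick, hedged sketch of how these 3.2.0 additions can be used (the sample payloads are illustrative, and the fallback comment reflects the last-resort ASCII/UTF-8 behavior described in the README):

```python
from charset_normalizer import from_bytes, is_binary

print(is_binary(b"\x00\x01\x02\xff"))           # likely True: looks binary
print(is_binary("plain text".encode("utf_8")))  # False: decodable text

# enable_fallback=False disables the last-resort fallback matches.
matches = from_bytes("Bonjour".encode("utf_8"), enable_fallback=False)
best = matches.best()
print(best.encoding if best is not None else None)
```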

### Fixed

- Edge-case detection failure where a file would contain a 'very long' camel-cased word (Issue #289)

## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)

### Added

- Argument `should_rename_legacy` for the legacy function `detect`, which now also disregards any new arguments without errors (PR #262)

### Removed

- Support for Python 3.6 (PR #260)

### Changed

- Optional speedup provided by mypy/c 1.0.1

## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)

### Fixed

- Multi-byte cutter/chunk generator did not always cut correctly (PR #233)

### Changed

- Speedup provided by mypy/c 0.990 on Python >= 3.7

## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)

### Added

- Extend the capability of explain=True when cp_isolation contains at most two entries (min one); details of the mess-detector results will be logged
- Support for an alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
- `normalizer --version` now specifies whether the current version provides extra speedup (meaning a mypyc-compiled wheel)
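
A minimal sketch of the `language_threshold` parameter listed above (the payload and threshold value are illustrative):

```python
from charset_normalizer import from_bytes

payload = "Bonjour tout le monde, ceci est un exemple.".encode("cp1252")

# Raise the minimum coherence ratio required before a language is retained.
best = from_bytes(payload, language_threshold=0.2).best()
if best is not None:
    print(best.encoding, best.language)
```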

### Changed

- Build with static metadata using the 'build' frontend
- Make the language detection stricter
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup, up to 4x faster than v2.1

### Fixed

- The CLI with option --normalize failed when using a full path for files
- TooManyAccentuatedPlugin induced false positives in the mess detection when too few alpha characters had been fed to it
- Sphinx warnings when generating the documentation

### Removed

- The coherence detector no longer returns 'Simple English'; it returns 'English' instead
- The coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)

### Added

- Extend the capability of explain=True when cp_isolation contains at most two entries (min one); details of the mess-detector results will be logged
- Support for an alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio

### Changed

- Build with static metadata using the 'build' frontend
- Make the language detection stricter

### Fixed

- The CLI with option --normalize failed when using a full path for files
- TooManyAccentuatedPlugin induced false positives in the mess detection when too few alpha characters had been fed to it

### Removed

- The coherence detector no longer returns 'Simple English'; it returns 'English' instead
- The coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead

## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)

### Added

- `normalizer --version` now specifies whether the current version provides extra speedup (meaning a mypyc-compiled wheel)

### Removed

- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)

### Fixed

- Sphinx warnings when generating the documentation

## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)

### Changed

- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup, up to 4x faster than v2.1

### Removed

- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)

### Deprecated

- Function `normalize` scheduled for removal in 3.0

### Changed

- Removed a useless call to decode in the function is_unprintable (#206)

### Fixed

- A third-party library (i18n xgettext) crashing by not recognizing utf_8 (PEP 263) with an underscore, from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)

## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)

### Added

- Output the Unicode table version when running the CLI with `--version` (PR #194)

### Changed

- Re-use the decoded buffer for single-byte character sets, from [@nijel](https://github.com/nijel) (PR #175)
- Fixed some performance bottlenecks, from [@deedy5](https://github.com/deedy5) (PR #183)

### Fixed

- Worked around a potential bug in CPython where a Zero Width No-Break Space located in Arabic Presentation Forms-B (Unicode 1.1) is not acknowledged as a space (PR #175)
- CLI default threshold aligned with the API threshold, from [@oleksandr-kuzmenko](https://github.com/oleksandr-kuzmenko) (PR #181)

### Removed

- Support for Python 3.5 (PR #192)

### Deprecated

- Use of the backport unicodedata from `unicodedata2`, as Python is quickly catching up; scheduled for removal in 3.0 (PR #194)

## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)

### Fixed

- ASCII misdetection in rare cases (PR #170)

## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)

### Added

- Explicit support for Python 3.11 (PR #164)

### Changed

- The logging behavior has been completely reviewed; it now uses only TRACE and DEBUG levels (PR #163 #165)

## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)

### Fixed

- Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)

### Changed

- Skip the language detection (CD) on ASCII (PR #155)

## [2.0.9](https://github.com/Ousret/charset_normalizer/compare/2.0.8...2.0.9) (2021-12-03)

### Changed

- Moderating the logging impact (since 2.0.8) for specific environments (PR #147)

### Fixed

- Wrong logging level applied when setting kwarg `explain` to True (PR #146)

## [2.0.8](https://github.com/Ousret/charset_normalizer/compare/2.0.7...2.0.8) (2021-11-24)

### Changed

- Improvement over Vietnamese detection (PR #126)
- MD improvement on trailing data and long foreign (non-pure Latin) data (PR #124)
- Efficiency improvements in cd/alphabet_languages from [@adbar](https://github.com/adbar) (PR #122)
- Call `sum()` without an intermediary list, following PEP 289 recommendations, from [@adbar](https://github.com/adbar) (PR #129)
- Code style as refactored by Sourcery-AI (PR #131)
- Minor adjustment on the MD around European words (PR #133)
- Remove and replace SRTs from assets / tests (PR #139)
- Initialize the library logger with a `NullHandler` by default from [@nmaynes](https://github.com/nmaynes) (PR #135)
- Setting kwarg `explain` to True provisionally adds (bounded to the function lifespan) a specific stream handler (PR #135)

### Fixed

- Fix large (misleading) sequence giving UnicodeDecodeError (PR #137)
- Avoid using insignificant chunks (PR #137)

### Added

- Add and expose function `set_logging_handler` to configure a specific StreamHandler from [@nmaynes](https://github.com/nmaynes) (PR #135)
- Add `CHANGELOG.md` entries; the format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) (PR #141)

## [2.0.7](https://github.com/Ousret/charset_normalizer/compare/2.0.6...2.0.7) (2021-10-11)

### Added

- Add support for Kazakh (Cyrillic) language detection (PR #109)

### Changed

- Further improve inferring the language from a given single-byte code page (PR #112)
- Vainly trying to leverage PEP 263 when PEP 3120 is not supported (PR #116)
- Refactoring for potential performance improvements in loops from [@adbar](https://github.com/adbar) (PR #113)
- Various detection improvements (MD+CD) (PR #117)

### Removed

- Remove redundant logging entry about detected language(s) (PR #115)

### Fixed

- Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)

## [2.0.6](https://github.com/Ousret/charset_normalizer/compare/2.0.5...2.0.6) (2021-09-18)

### Fixed

- Unforeseen regression that broke backward compatibility with some older minor releases of Python 3.5.x (PR #100)
- Fix CLI crash when using --minimal output in certain cases (PR #103)

### Changed

- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)

## [2.0.5](https://github.com/Ousret/charset_normalizer/compare/2.0.4...2.0.5) (2021-09-14)

### Changed

- The project now complies with flake8, mypy, isort, and black to ensure better overall quality (PR #81)
- Backward compatibility with v1.x was improved; the old staticmethods are restored (PR #82)
- The Unicode detection is slightly improved (PR #93)
- Add syntactic sugar `__bool__` for the CharsetMatches results list-container (PR #91)

### Removed

- The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead (PR #92)

### Fixed

- In some rare cases, the chunk extractor could cut in the middle of a multi-byte character and mislead the mess detection (PR #95)
- Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection (PR #96)
- The MANIFEST.in was not exhaustive (PR #78)

## [2.0.4](https://github.com/Ousret/charset_normalizer/compare/2.0.3...2.0.4) (2021-07-30)

### Fixed

- The CLI no longer raises an unexpected exception when no encoding has been found (PR #70)
- Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
- The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (PR #72)
- Submatch factoring could be wrong in rare edge cases (PR #72)
- Multiple files given to the CLI were ignored (after the first path) when publishing results to STDOUT (PR #72)
- Fix line endings from CRLF to LF for certain project files (PR #67)

### Changed

- Adjust the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
- Allow fallback on specified encoding if any (PR #71)

## [2.0.3](https://github.com/Ousret/charset_normalizer/compare/2.0.2...2.0.3) (2021-07-16)

### Changed

- Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results, especially for ASCII (PR #63)
- In accordance with community wishes, the detection will fall back on ASCII or UTF-8 as a last resort (PR #64)

## [2.0.2](https://github.com/Ousret/charset_normalizer/compare/2.0.1...2.0.2) (2021-07-15)

### Fixed

- Fixed misdetection of empty or too-small JSON payloads. Report from [@tseaver](https://github.com/tseaver) (PR #59)

### Changed

- Don't inject unicodedata2 into sys.modules from [@akx](https://github.com/akx) (PR #57)

## [2.0.1](https://github.com/Ousret/charset_normalizer/compare/2.0.0...2.0.1) (2021-07-13)

### Fixed

- Make it work where there isn't a filesystem available by dropping the assets file frequencies.json. Report from [@sethmlarson](https://github.com/sethmlarson). (PR #55)
- Using explain=False permanently disabled the verbose output in the current runtime (PR #47)
- One log entry (language target preemptive) was not shown in logs when using explain=True (PR #47)
- Fix undesired exception (ValueError) on getitem of a CharsetMatches instance (PR #52)

### Changed

- Default argument values of the public function normalize were not aligned with from_bytes (PR #53)

### Added

- You may now use charset aliases in the cp_isolation and cp_exclusion arguments (PR #47)

## [2.0.0](https://github.com/Ousret/charset_normalizer/compare/1.4.1...2.0.0) (2021-07-02)

### Changed

- 4x to 5x faster than the previous 1.4.0 release. At least 2x faster than Chardet.
- Emphasis has been put on UTF-8 detection, which should now be near-instantaneous.
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
- The detection mechanism has been slightly improved; Turkish content is now detected correctly (most of the time).
- The program has been rewritten for readability and maintainability (now using static typing).
- utf_7 detection has been reinstated.

### Removed

- This package no longer requires anything when used with Python 3.5 (dropped cached_property).
- Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbo-Croatian.
- The exception hook on UnicodeDecodeError has been removed.

### Deprecated

- Methods coherence_non_latin, w_counter, and chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0

### Fixed

- The CLI output used the relative path of the file(s); it should be absolute.

## [1.4.1](https://github.com/Ousret/charset_normalizer/compare/1.4.0...1.4.1) (2021-05-28)

### Fixed

- Logger configuration/usage no longer conflicts with others (PR #44)

## [1.4.0](https://github.com/Ousret/charset_normalizer/compare/1.3.9...1.4.0) (2021-05-21)

### Removed

- Use standard logging instead of the package loguru.
- Dropped the nose test framework in favor of the maintained pytest.
- Chose not to use the dragonmapper package to help with gibberish Chinese/CJK text.
- Require cached_property only for Python 3.5 due to a constraint; dropped for every other interpreter version.
- Stopped supporting UTF-7 that does not contain a SIG.
- Dropped PrettyTable; replaced with pure JSON output in the CLI.

### Fixed

- The BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even if obviously present, due to the sub-match factoring process.
- The BOM was not searched for properly when trying the utf32/16 parent codec.

### Changed

- Improved the final package size by compressing frequencies.json.
- Huge improvement on the largest payloads.

### Added

- The CLI now produces JSON-consumable output.
- Return ASCII if given sequences fit, with reasonable confidence.

## [1.3.9](https://github.com/Ousret/charset_normalizer/compare/1.3.8...1.3.9) (2021-05-13)

### Fixed

- In some very rare cases, you could end up getting encode/decode errors due to a bad bytes payload (PR #40)

## [1.3.8](https://github.com/Ousret/charset_normalizer/compare/1.3.7...1.3.8) (2021-05-12)

### Fixed

- An empty payload given for detection could cause an exception when accessing the `alphabets` property (PR #39)

## [1.3.7](https://github.com/Ousret/charset_normalizer/compare/1.3.6...1.3.7) (2021-05-12)

### Fixed

- The legacy detect function now returns UTF-8-SIG if a SIG is present in the payload (PR #38)

## [1.3.6](https://github.com/Ousret/charset_normalizer/compare/1.3.5...1.3.6) (2021-02-09)

### Changed

- Amend the previous release to allow prettytable 2.0 (PR #35)

## [1.3.5](https://github.com/Ousret/charset_normalizer/compare/1.3.4...1.3.5) (2021-02-08)

### Fixed

- Fix an error when using the package with a Python pre-release interpreter (PR #33)

### Changed

- Dependencies refactoring, constraints revised.

### Added

- Add Python 3.9 and 3.10 to the supported interpreters

MIT License

Copyright (c) 2025 TAHRI Ahmed R.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Binary file not shown.

@ -0,0 +1,764 @@
Metadata-Version: 2.4
Name: charset-normalizer
Version: 3.4.4
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
Author-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
Maintainer-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
License: MIT
Project-URL: Changelog, https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/
Project-URL: Code, https://github.com/jawah/charset_normalizer
Project-URL: Issue tracker, https://github.com/jawah/charset_normalizer/issues
Keywords: encoding,charset,charset-detector,detector,normalization,unicode,chardet,detect
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: unicode-backport
Dynamic: license-file

<h1 align="center">Charset Detection, for Everyone 👋</h1>

<p align="center">
  <sup>The Real First Universal Charset Detector</sup><br>
  <a href="https://pypi.org/project/charset-normalizer">
    <img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
  </a>
  <a href="https://pepy.tech/project/charset-normalizer/">
    <img alt="Download Count Total" src="https://static.pepy.tech/badge/charset-normalizer/month" />
  </a>
  <a href="https://bestpractices.coreinfrastructure.org/projects/7297">
    <img src="https://bestpractices.coreinfrastructure.org/projects/7297/badge">
  </a>
</p>
<p align="center">
  <sup><i>Featured Packages</i></sup><br>
  <a href="https://github.com/jawah/niquests">
    <img alt="Static Badge" src="https://img.shields.io/badge/Niquests-Most_Advanced_HTTP_Client-cyan">
  </a>
  <a href="https://github.com/jawah/wassima">
    <img alt="Static Badge" src="https://img.shields.io/badge/Wassima-Certifi_Replacement-cyan">
  </a>
</p>
<p align="center">
  <sup><i>In other languages (unofficial ports, by the community)</i></sup><br>
  <a href="https://github.com/nickspring/charset-normalizer-rs">
    <img alt="Static Badge" src="https://img.shields.io/badge/Rust-red">
  </a>
</p>

> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
> I'm trying to resolve the issue by taking a new approach.
> All IANA character set names for which the Python core library provides codecs are supported.

<p align="center">
  >>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
</p>
This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.
|
||||||
|
|
||||||
|
| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
|
||||||
|
|--------------------------------------------------|:---------------------------------------------:|:--------------------------------------------------------------------------------------------------:|:-----------------------------------------------:|
|
||||||
|
| `Fast` | ❌ | ✅ | ✅ |
|
||||||
|
| `Universal**` | ❌ | ✅ | ❌ |
|
||||||
|
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
|
||||||
|
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
|
||||||
|
| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
|
||||||
|
| `Native Python` | ✅ | ✅ | ❌ |
|
||||||
|
| `Detect spoken language` | ❌ | ✅ | N/A |
|
||||||
|
| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
|
||||||
|
| `Whl Size (min)` | 193.6 kB | 42 kB | ~200 kB |
|
||||||
|
| `Supported Encoding` | 33 | 🎉 [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
*\*\* : They clearly use specific code for a specific encoding, even if it covers most of the encodings in use*<br>
|
||||||
|
|
||||||
|
## ⚡ Performance
|
||||||
|
|
||||||
|
This package offers better performance than its counterpart, Chardet. Here are some numbers.
|
||||||
|
|
||||||
|
| Package | Accuracy | Mean per file (ms) | File per sec (est) |
|
||||||
|
|-----------------------------------------------|:--------:|:------------------:|:------------------:|
|
||||||
|
| [chardet](https://github.com/chardet/chardet) | 86 % | 63 ms | 16 file/sec |
|
||||||
|
| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |
|
||||||
|
|
||||||
|
| Package | 99th percentile | 95th percentile | 50th percentile |
|
||||||
|
|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
|
||||||
|
| [chardet](https://github.com/chardet/chardet) | 265 ms | 71 ms | 7 ms |
|
||||||
|
| charset-normalizer | 100 ms | 50 ms | 5 ms |
|
||||||
|
|
||||||
|
_updated as of December 2024 using CPython 3.12_
|
||||||
|
|
||||||
|
Chardet's performance on larger files (1 MB+) is very poor. Expect a huge difference on large payloads.
|
||||||
|
|
||||||
|
> Stats are generated using 400+ files with default parameters. For more details on the files used, see the GHA workflows.
|
||||||
|
> And yes, these results might change at any time. The dataset can be updated to include more files.
|
||||||
|
> The actual delays depend heavily on your CPU capabilities. The factors should remain the same.
|
||||||
|
> Keep in mind that the stats are generous and that Chardet's accuracy versus ours is measured using only Chardet's initial capabilities
|
||||||
|
> (e.g. supported encodings). Challenge them if you want.
|
||||||
|
|
||||||
|
## ✨ Installation
|
||||||
|
|
||||||
|
Using pip:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
pip install charset-normalizer -U
|
||||||
|
```
|
||||||
|
|
||||||
|
## 🚀 Basic Usage
|
||||||
|
|
||||||
|
### CLI
|
||||||
|
This package comes with a CLI.
|
||||||
|
|
||||||
|
```
|
||||||
|
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
|
||||||
|
file [file ...]
|
||||||
|
|
||||||
|
The Real First Universal Charset Detector. Discover originating encoding used
|
||||||
|
on text file. Normalize text to unicode.
|
||||||
|
|
||||||
|
positional arguments:
|
||||||
|
files File(s) to be analysed
|
||||||
|
|
||||||
|
optional arguments:
|
||||||
|
-h, --help show this help message and exit
|
||||||
|
-v, --verbose Display complementary information about file if any.
|
||||||
|
Stdout will contain logs about the detection process.
|
||||||
|
-a, --with-alternative
|
||||||
|
Output complementary possibilities if any. Top-level
|
||||||
|
JSON WILL be a list.
|
||||||
|
-n, --normalize Permit to normalize input file. If not set, program
|
||||||
|
does not write anything.
|
||||||
|
-m, --minimal Only output the charset detected to STDOUT. Disabling
|
||||||
|
JSON output.
|
||||||
|
-r, --replace Replace file when trying to normalize it instead of
|
||||||
|
creating a new one.
|
||||||
|
-f, --force Replace file without asking if you are sure, use this
|
||||||
|
flag with caution.
|
||||||
|
-t THRESHOLD, --threshold THRESHOLD
|
||||||
|
Define a custom maximum amount of chaos allowed in
|
||||||
|
decoded content. 0. <= chaos <= 1.
|
||||||
|
--version Show version information and exit.
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
normalizer ./data/sample.1.fr.srt
|
||||||
|
```
|
||||||
|
|
||||||
|
or
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m charset_normalizer ./data/sample.1.fr.srt
|
||||||
|
```
|
||||||
|
|
||||||
|
🎉 Since version 1.4.0, the CLI produces an easily consumable stdout result in JSON format.
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
|
||||||
|
"encoding": "cp1252",
|
||||||
|
"encoding_aliases": [
|
||||||
|
"1252",
|
||||||
|
"windows_1252"
|
||||||
|
],
|
||||||
|
"alternative_encodings": [
|
||||||
|
"cp1254",
|
||||||
|
"cp1256",
|
||||||
|
"cp1258",
|
||||||
|
"iso8859_14",
|
||||||
|
"iso8859_15",
|
||||||
|
"iso8859_16",
|
||||||
|
"iso8859_3",
|
||||||
|
"iso8859_9",
|
||||||
|
"latin_1",
|
||||||
|
"mbcs"
|
||||||
|
],
|
||||||
|
"language": "French",
|
||||||
|
"alphabets": [
|
||||||
|
"Basic Latin",
|
||||||
|
"Latin-1 Supplement"
|
||||||
|
],
|
||||||
|
"has_sig_or_bom": false,
|
||||||
|
"chaos": 0.149,
|
||||||
|
"coherence": 97.152,
|
||||||
|
"unicode_path": null,
|
||||||
|
"is_preferred": true
|
||||||
|
}
|
||||||
|
```
|
||||||
|
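If you script around the CLI, the JSON report shown above is straightforward to consume. A minimal sketch, assuming `normalizer` is on your PATH and the sample file exists:

```python
import json
import subprocess

# Run the CLI on a file and parse the JSON report it prints to stdout.
completed = subprocess.run(
    ["normalizer", "./data/sample.1.fr.srt"],
    capture_output=True,
    text=True,
    check=True,
)

report = json.loads(completed.stdout)
print(report["encoding"], report["language"], report["is_preferred"])
```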
|
||||||
|
### Python
|
||||||
|
*Just print out normalized text*
|
||||||
|
```python
|
||||||
|
from charset_normalizer import from_path
|
||||||
|
|
||||||
|
results = from_path('./my_subtitle.srt')
|
||||||
|
|
||||||
|
print(str(results.best()))
|
||||||
|
```
|
||||||
|
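*Work from bytes you already hold in memory*

A minimal sketch using `from_bytes` instead of `from_path`; the payload and the printed values are illustrative:

```python
from charset_normalizer import from_bytes

# Detect straight from bytes and inspect the best match.
payload = "Bonjour, le monde ! Comment ça va ?".encode("cp1252")

best_guess = from_bytes(payload).best()

if best_guess is not None:  # best() returns None when nothing fits
    print(best_guess.encoding)  # e.g. "cp1252"
    print(best_guess.language)  # e.g. "French"
    print(str(best_guess))      # the decoded Unicode text
```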
|
||||||
|
*Upgrade your code without effort*
|
||||||
|
```python
|
||||||
|
from charset_normalizer import detect
|
||||||
|
```
|
||||||
|
|
||||||
|
The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) backward-compatible result possible.
|
||||||
|
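For instance, a hedged drop-in sketch (the sample bytes are illustrative):

```python
from charset_normalizer import detect

# `detect` returns the same dict shape as chardet's:
# "encoding", "language" and "confidence".
result = detect("Привет, мир!".encode("cp1251"))

print(result["encoding"])    # best guess, e.g. "cp1251"
print(result["confidence"])  # a float between 0. and 1.
print(result["language"])    # detected language, when known
```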
|
||||||
|
See the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)
|
||||||
|
|
||||||
|
## 😇 Why
|
||||||
|
|
||||||
|
When I started using Chardet, I noticed that it was not suited to my expectations, and I wanted to propose a
|
||||||
|
reliable alternative using a completely different method. Also! I never back down on a good challenge!
|
||||||
|
|
||||||
|
I **don't care** about the **originating charset** encoding, because **two different tables** can
|
||||||
|
produce **two identical rendered strings.**
|
||||||
|
What I want is to get readable text, the best I can.
|
||||||
|
|
||||||
|
In a way, **I'm brute forcing text decoding.** How cool is that ? 😎
|
||||||
|
|
||||||
|
Don't confuse the **ftfy** package with charset-normalizer or chardet. ftfy's goal is to repair broken Unicode strings, whereas charset-normalizer's is to convert a raw file in an unknown encoding to Unicode.
|
||||||
|
|
||||||
|
## 🍰 How
|
||||||
|
|
||||||
|
- Discard all charset encoding tables that could not fit the binary content.
|
||||||
|
- Measure the noise, i.e. the mess, once the content is opened (in chunks) with a corresponding charset encoding.
|
||||||
|
- Extract matches with the lowest mess detected.
|
||||||
|
- Additionally, we measure coherence / probe for a language (see the sketch below).
|
||||||
|
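Here is an illustrative sketch of that ranking through the public API; each surviving match exposes a mess score (`chaos`) and a language score (`coherence`). The file path reuses the sample from the usage section and is assumed to exist:

```python
from charset_normalizer import from_bytes

with open("./data/sample.1.fr.srt", "rb") as fp:
    payload = fp.read()

for match in from_bytes(payload):
    # Lower chaos and higher coherence rank a match higher.
    print(match.encoding, match.chaos, match.coherence)
```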
|
||||||
|
**Wait a minute**, what is noise/mess and coherence according to **YOU ?**
|
||||||
|
|
||||||
|
*Noise :* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then
|
||||||
|
**I established** some ground rules about **what is obvious** when **it seems like** a mess (aka. defining noise in rendered text).
|
||||||
|
I know that my interpretation of noise is probably incomplete; feel free to contribute in order to
|
||||||
|
improve or rewrite it.
|
||||||
|
|
||||||
|
*Coherence :* For each language on earth, we have computed ranked letter-appearance occurrences (as best we can). So I thought
|
||||||
|
that this intel is worth something here. So I use those records against the decoded text to check if I can detect intelligent design.
|
||||||
|
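A toy illustration of that coherence idea, not the library's actual implementation; the reference ranking below is an assumption for the example:

```python
from collections import Counter

# Assumed top-10 letter ranking for French, for illustration only.
FRENCH_TOP = ["e", "a", "s", "i", "t", "n", "r", "u", "l", "o"]

def coherence_score(text, reference):
    # Compare the text's most frequent letters against the reference ranking.
    letters = [c for c in text.lower() if c.isalpha()]
    ranked = [letter for letter, _ in Counter(letters).most_common(len(reference))]
    return sum(1 for letter in ranked if letter in reference) / len(reference)

print(coherence_score("Le renard brun saute par-dessus le chien paresseux", FRENCH_TOP))
```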
|
||||||
|
## ⚡ Known limitations
|
||||||
|
|
||||||
|
- Language detection is unreliable when the text contains two or more languages sharing identical letters (e.g. HTML (English tags) + Turkish content (sharing Latin characters)).
|
||||||
|
- Every charset detector heavily depends on sufficient content. In common cases, do not bother running detection on very tiny content.
|
||||||
|
|
||||||
|
## ⚠️ About Python EOLs
|
||||||
|
|
||||||
|
**If you are running:**
|
||||||
|
|
||||||
|
- Python >=2.7,<3.5: Unsupported
|
||||||
|
- Python 3.5: charset-normalizer < 2.1
|
||||||
|
- Python 3.6: charset-normalizer < 3.1
|
||||||
|
- Python 3.7: charset-normalizer < 4.0
|
||||||
|
|
||||||
|
Upgrade your Python interpreter as soon as possible.
|
||||||
|
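If you must stay on an EOL interpreter for a while, a hedged workaround is to pin the last compatible release from the table above, e.g. on Python 3.6:

```sh
pip install "charset-normalizer<3.1"
```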
|
||||||
|
## 👤 Contributing
|
||||||
|
|
||||||
|
Contributions, issues and feature requests are very much welcome.<br />
|
||||||
|
Feel free to check [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.
|
||||||
|
|
||||||
|
## 📝 License
|
||||||
|
|
||||||
|
Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
|
||||||
|
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.
|
||||||
|
|
||||||
|
Characters frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)
|
||||||
|
|
||||||
|
## 💼 For Enterprise
|
||||||
|
|
||||||
|
Professional support for charset-normalizer is available as part of the [Tidelift
|
||||||
|
Subscription][1]. Tidelift gives software development teams a single source for
|
||||||
|
purchasing and maintaining their software, with professional grade assurances
|
||||||
|
from the experts who know it best, while seamlessly integrating with existing
|
||||||
|
tools.
|
||||||
|
|
||||||
|
[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme
|
||||||
|
|
||||||
|
[](https://www.bestpractices.dev/projects/7297)
|
||||||
|
|
||||||
|
# Changelog
|
||||||
|
All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||||
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
|
||||||
|
|
||||||
|
## [3.4.4](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.4) (2025-10-13)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Bound `setuptools` to a specific constraint `setuptools>=68,<=81`.
|
||||||
|
- Raised upper bound of mypyc for the optional pre-built extension to v1.18.2
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- `setuptools-scm` as a build dependency.
|
||||||
|
|
||||||
|
### Misc
|
||||||
|
- Enforced hashes in `dev-requirements.txt` and created `ci-requirements.txt` for security purposes.
|
||||||
|
- Additional pre-built wheels for riscv64, s390x, and armv7l architectures.
|
||||||
|
- Restore `multiple.intoto.jsonl` in GitHub releases in addition to the individual attestation file per wheel.
|
||||||
|
|
||||||
|
## [3.4.3](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.3) (2025-08-09)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- mypy(c) is no longer a required dependency at build time if `CHARSET_NORMALIZER_USE_MYPYC` isn't set to `1`. (#595) (#583)
|
||||||
|
- Automatically lower the confidence on small byte samples that are not Unicode, in the output of the legacy `detect` function. (#391)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Custom build backend to overcome inability to mark mypy as an optional dependency in the build phase.
|
||||||
|
- Support for Python 3.14
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- sdist archive contained useless directories.
|
||||||
|
- Automatically fall back on valid UTF-16 or UTF-32, even if the mess detector (md) says it's noisy. (#633)
|
||||||
|
|
||||||
|
### Misc
|
||||||
|
- SBOMs are automatically published to the relevant GitHub release to comply with regulatory changes.
|
||||||
|
Each published wheel comes with its SBOM. We chose CycloneDX as the format.
|
||||||
|
- Prebuilt optimized wheels are no longer distributed by default for CPython 3.7 due to a change in cibuildwheel.
|
||||||
|
|
||||||
|
## [3.4.2](https://github.com/Ousret/charset_normalizer/compare/3.4.1...3.4.2) (2025-05-02)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Addressed the DeprecationWarning in our CLI regarding `argparse.FileType` by backporting the target class into the package. (#591)
|
||||||
|
- Improved the overall reliability of the detector with CJK Ideographs. (#605) (#587)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional mypyc compilation upgraded to version 1.15 for Python >= 3.8
|
||||||
|
|
||||||
|
## [3.4.1](https://github.com/Ousret/charset_normalizer/compare/3.4.0...3.4.1) (2024-12-24)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Project metadata are now stored using `pyproject.toml` instead of `setup.cfg` using setuptools as the build backend.
|
||||||
|
- Enforce delayed annotation loading for simpler and more consistent types in the project.
|
||||||
|
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- pre-commit configuration.
|
||||||
|
- noxfile.
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- `build-requirements.txt` as per using `pyproject.toml` native build configuration.
|
||||||
|
- `bin/integration.py` and `bin/serve.py` in favor of downstream integration test (see noxfile).
|
||||||
|
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
|
||||||
|
- Unused `utils.range_scan` function.
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Converting content to Unicode bytes may insert `utf_8` instead of preferred `utf-8`. (#572)
|
||||||
|
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+
|
||||||
|
|
||||||
|
## [3.4.0](https://github.com/Ousret/charset_normalizer/compare/3.3.2...3.4.0) (2024-10-08)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Argument `--no-preemptive` in the CLI to prevent the detector from searching for hints (see the example below).
|
||||||
|
- Support for Python 3.13 (#512)
|
||||||
|
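A short usage example of the new flag, assuming the sample file from the README exists:

```sh
# Skip preemptive hints (e.g. a charset declared inside the content).
normalizer --no-preemptive ./data/sample.1.fr.srt
```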
|
||||||
|
### Fixed
|
||||||
|
- Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
|
||||||
|
- Improved the general reliability of the detector based on user feedback. (#520) (#509) (#498) (#407) (#537)
|
||||||
|
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#381)
|
||||||
|
|
||||||
|
## [3.3.2](https://github.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2) (2023-10-31)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Unintentional memory usage regression when using a large payload that matches several encodings (#376)
|
||||||
|
- Regression on some detection case showcased in the documentation (#371)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Noise (md) probe that identifies malformed Arabic representations due to the presence of letters in isolated form (credit to my wife)
|
||||||
|
|
||||||
|
## [3.3.1](https://github.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1) (2023-10-22)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
|
||||||
|
- Improved the general detection reliability based on reports from the community
|
||||||
|
|
||||||
|
## [3.3.0](https://github.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0) (2023-09-30)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Allow executing the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
|
||||||
|
- Support for 9 forgotten encodings that are supported by Python but unlisted in `encoding.aliases` as they have no alias (#323)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
|
||||||
|
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
|
||||||
|
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in \_\_lt\_\_ (#350)
|
||||||
|
|
||||||
|
## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Typehint for function `from_path` no longer enforces `PathLike` as its first argument
|
||||||
|
- Minor improvement over the global detection reliability
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Introduce function `is_binary`, which relies on the main capabilities and is optimized to detect binaries (see the sketch below)
|
||||||
|
- Propagate the `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp`, allowing deeper control over the detection (default: True)
|
||||||
|
- Explicit support for Python 3.12
|
||||||
|
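A minimal sketch of `is_binary`; the byte payloads are illustrative and the printed outcomes are expectations, not guarantees:

```python
from charset_normalizer import is_binary

print(is_binary(b"\x00\xff\x00\xff"))                   # likely True
print(is_binary("Hello, plain text!".encode("utf-8")))  # likely False
```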
|
||||||
|
### Fixed
|
||||||
|
- Edge case detection failure where a file would contain 'very-long' camel cased word (Issue #289)
|
||||||
|
|
||||||
|
## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Argument `should_rename_legacy` for the legacy function `detect`, which now also disregards any new arguments without errors (PR #262); see the sketch below
|
||||||
|
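A minimal sketch of the new argument; the sample bytes are illustrative:

```python
from charset_normalizer import detect

# Report modern encoding names instead of chardet's legacy
# spellings where the two differ.
result = detect("Привет, мир!".encode("cp1251"), should_rename_legacy=True)
print(result["encoding"])
```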
|
||||||
|
### Removed
|
||||||
|
- Support for Python 3.6 (PR #260)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional speedup provided by mypy/c 1.0.1
|
||||||
|
|
||||||
|
## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Multi-bytes cutter/chunk generator did not always cut correctly (PR #233)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Speedup provided by mypy/c 0.990 on Python >= 3.7
|
||||||
|
|
||||||
|
## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Extend the capability of explain=True when cp_isolation contains at most two entries (min. one); it will log the details of the mess-detector results
|
||||||
|
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
|
||||||
|
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio (see the sketch below)
|
||||||
|
- `normalizer --version` now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)
|
||||||
|
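A minimal sketch of `language_threshold`; the 0.2 value is an arbitrary illustration of raising the minimum coherence ratio a match must reach:

```python
from charset_normalizer import from_bytes

payload = "Ceci est un exemple de texte français.".encode("utf-8")
matches = from_bytes(payload, language_threshold=0.2)
print(matches.best())
```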
|
||||||
|
### Changed
|
||||||
|
- Build with static metadata using 'build' frontend
|
||||||
|
- Make the language detection stricter
|
||||||
|
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup of up to 4x over v2.1
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- CLI with option --normalize failed when using a full path for files
|
||||||
|
- TooManyAccentuatedPlugin induced false positives in the mess detection when too few alphabetic characters were fed to it
|
||||||
|
- Sphinx warnings when generating the documentation
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
|
||||||
|
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
|
||||||
|
- Breaking: Methods `first()` and `best()` from CharsetMatch
|
||||||
|
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
|
||||||
|
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
|
||||||
|
- Breaking: Top-level function `normalize`
|
||||||
|
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
|
||||||
|
- Support for the backport `unicodedata2`
|
||||||
|
|
||||||
|
## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Extend the capability of explain=True when cp_isolation contains at most two entries (min. one); it will log the details of the mess-detector results
|
||||||
|
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
|
||||||
|
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Build with static metadata using 'build' frontend
|
||||||
|
- Make the language detection stricter
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- CLI with option --normalize failed when using a full path for files
|
||||||
|
- TooManyAccentuatedPlugin induced false positives in the mess detection when too few alphabetic characters were fed to it
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
|
||||||
|
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
|
||||||
|
|
||||||
|
## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- `normalizer --version` now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Breaking: Methods `first()` and `best()` from CharsetMatch
|
||||||
|
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Sphinx warnings when generating the documentation
|
||||||
|
|
||||||
|
## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup of up to 4x over v2.1
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
|
||||||
|
- Breaking: Top-level function `normalize`
|
||||||
|
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
|
||||||
|
- Support for the backport `unicodedata2`
|
||||||
|
|
||||||
|
## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Function `normalize` scheduled for removal in 3.0
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Removed useless call to decode in fn is_unprintable (#206)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Third-party library (i18n xgettext) crashed by not recognizing utf_8 (PEP 263) written with an underscore, from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)
|
||||||
|
|
||||||
|
## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Output the Unicode table version when running the CLI with `--version` (PR #194)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Re-use decoded buffer for single byte character sets from [@nijel](https://github.com/nijel) (PR #175)
|
||||||
|
- Fixing some performance bottlenecks from [@deedy5](https://github.com/deedy5) (PR #183)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Workaround potential bug in cpython with Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space (PR #175)
|
||||||
|
- CLI default threshold aligned with the API threshold from [@oleksandr-kuzmenko](https://github.com/oleksandr-kuzmenko) (PR #181)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Support for Python 3.5 (PR #192)
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Use of backport unicodedata from `unicodedata2` as Python is quickly catching up, scheduled for removal in 3.0 (PR #194)
|
||||||
|
|
||||||
|
## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- ASCII mis-detection in rare cases (PR #170)
|
||||||
|
|
||||||
|
## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Explicit support for Python 3.11 (PR #164)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)
|
||||||
|
|
||||||
|
## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Skipping the language-detection (CD) on ASCII (PR #155)
|
||||||
|
|
||||||
|
## [2.0.9](https://github.com/Ousret/charset_normalizer/compare/2.0.8...2.0.9) (2021-12-03)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Moderating the logging impact (since 2.0.8) for specific environments (PR #147)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Wrong logging level applied when setting kwarg `explain` to True (PR #146)
|
||||||
|
|
||||||
|
## [2.0.8](https://github.com/Ousret/charset_normalizer/compare/2.0.7...2.0.8) (2021-11-24)
|
||||||
|
### Changed
|
||||||
|
- Improvement over Vietnamese detection (PR #126)
|
||||||
|
- MD improvement on trailing data and long foreign (non-pure latin) data (PR #124)
|
||||||
|
- Efficiency improvements in cd/alphabet_languages from [@adbar](https://github.com/adbar) (PR #122)
|
||||||
|
- call sum() without an intermediary list following PEP 289 recommendations from [@adbar](https://github.com/adbar) (PR #129)
|
||||||
|
- Code style as refactored by Sourcery-AI (PR #131)
|
||||||
|
- Minor adjustment on the MD around european words (PR #133)
|
||||||
|
- Remove and replace SRTs from assets / tests (PR #139)
|
||||||
|
- Initialize the library logger with a `NullHandler` by default from [@nmaynes](https://github.com/nmaynes) (PR #135)
|
||||||
|
- Setting kwarg `explain` to True will add provisionally (bounded to function lifespan) a specific stream handler (PR #135)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix large (misleading) sequence giving UnicodeDecodeError (PR #137)
|
||||||
|
- Avoid using too insignificant chunk (PR #137)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Add and expose function `set_logging_handler` to configure a specific StreamHandler, from [@nmaynes](https://github.com/nmaynes) (PR #135); see the sketch below
|
||||||
|
- Add `CHANGELOG.md` entries, format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) (PR #141)
|
||||||
|
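A minimal sketch of the helper; the level is an assumption, adjust to taste:

```python
import logging
from charset_normalizer import set_logging_handler

# Attach a StreamHandler so the detection traces become visible.
set_logging_handler(level=logging.DEBUG)
```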
|
||||||
|
## [2.0.7](https://github.com/Ousret/charset_normalizer/compare/2.0.6...2.0.7) (2021-10-11)
|
||||||
|
### Added
|
||||||
|
- Add support for Kazakh (Cyrillic) language detection (PR #109)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Further improve inferring the language from a given single-byte code page (PR #112)
|
||||||
|
- Vainly trying to leverage PEP263 when PEP3120 is not supported (PR #116)
|
||||||
|
- Refactoring for potential performance improvements in loops from [@adbar](https://github.com/adbar) (PR #113)
|
||||||
|
- Various detection improvement (MD+CD) (PR #117)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Remove redundant logging entry about detected language(s) (PR #115)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)
|
||||||
|
|
||||||
|
## [2.0.6](https://github.com/Ousret/charset_normalizer/compare/2.0.5...2.0.6) (2021-09-18)
|
||||||
|
### Fixed
|
||||||
|
- Unforeseen regression with the loss of backward compatibility with some older minor releases of Python 3.5.x (PR #100)
|
||||||
|
- Fix CLI crash when using --minimal output in certain cases (PR #103)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)
|
||||||
|
|
||||||
|
## [2.0.5](https://github.com/Ousret/charset_normalizer/compare/2.0.4...2.0.5) (2021-09-14)
|
||||||
|
### Changed
|
||||||
|
- The project now complies with flake8, mypy, isort and black to ensure a better overall quality (PR #81)
|
||||||
|
- The BC-support with v1.x was improved, the old staticmethods are restored (PR #82)
|
||||||
|
- The Unicode detection is slightly improved (PR #93)
|
||||||
|
- Add syntax sugar \_\_bool\_\_ for results CharsetMatches list-container (PR #91)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead (PR #92)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- In some rare cases, the chunk extractor could cut in the middle of a multi-byte character and mislead the mess detection (PR #95)
|
||||||
|
- Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection (PR #96)
|
||||||
|
- The MANIFEST.in was not exhaustive (PR #78)
|
||||||
|
|
||||||
|
## [2.0.4](https://github.com/Ousret/charset_normalizer/compare/2.0.3...2.0.4) (2021-07-30)
|
||||||
|
### Fixed
|
||||||
|
- The CLI no longer raises an unexpected exception when no encoding has been found (PR #70)
|
||||||
|
- Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
|
||||||
|
- The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (PR #72)
|
||||||
|
- Submatch factoring could be wrong in rare edge cases (PR #72)
|
||||||
|
- Multiple files given to the CLI were ignored when publishing results to STDOUT. (After the first path) (PR #72)
|
||||||
|
- Fix line endings from CRLF to LF for certain project files (PR #67)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Adjust the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
|
||||||
|
- Allow fallback on specified encoding if any (PR #71)
|
||||||
|
|
||||||
|
## [2.0.3](https://github.com/Ousret/charset_normalizer/compare/2.0.2...2.0.3) (2021-07-16)
|
||||||
|
### Changed
|
||||||
|
- Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results. Especially ASCII. (PR #63)
|
||||||
|
- According to the community wishes, the detection will fall back on ASCII or UTF-8 in a last-resort case. (PR #64)
|
||||||
|
|
||||||
|
## [2.0.2](https://github.com/Ousret/charset_normalizer/compare/2.0.1...2.0.2) (2021-07-15)
|
||||||
|
### Fixed
|
||||||
|
- Empty or too-small JSON payload mis-detection fixed. Report from [@tseaver](https://github.com/tseaver) (PR #59)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Don't inject unicodedata2 into sys.modules from [@akx](https://github.com/akx) (PR #57)
|
||||||
|
|
||||||
|
## [2.0.1](https://github.com/Ousret/charset_normalizer/compare/2.0.0...2.0.1) (2021-07-13)
|
||||||
|
### Fixed
|
||||||
|
- Make it work where there isn't a filesystem available, dropping assets frequencies.json. Report from [@sethmlarson](https://github.com/sethmlarson). (PR #55)
|
||||||
|
- Using explain=False permanently disabled the verbose output in the current runtime (PR #47)
|
||||||
|
- One log entry (language target preemptive) was not shown in logs when using explain=True (PR #47)
|
||||||
|
- Fix undesired exception (ValueError) on getitem of instance CharsetMatches (PR #52)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- The public function normalize's default argument values were not aligned with from_bytes (PR #53)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- You may now use charset aliases in cp_isolation and cp_exclusion arguments (PR #47)
|
||||||
|
|
||||||
|
## [2.0.0](https://github.com/Ousret/charset_normalizer/compare/1.4.1...2.0.0) (2021-07-02)
|
||||||
|
### Changed
|
||||||
|
- 4 to 5 times faster than the previous 1.4.0 release. At least 2x faster than Chardet.
|
||||||
|
- Emphasis has been put on UTF-8 detection; it should perform almost instantaneously.
|
||||||
|
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
|
||||||
|
- The detection mechanism has been slightly improved; Turkish content is now detected correctly (most of the time)
|
||||||
|
- The program has been rewritten to ease readability and maintainability (+ using static typing).
|
||||||
|
- utf_7 detection has been reinstated.
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- This package no longer requires anything when used with Python 3.5 (dropped cached_property)
|
||||||
|
- Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbo-Croatian.
|
||||||
|
- The exception hook on UnicodeDecodeError has been removed.
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- The CLI output used the relative path of the file(s); it should be absolute.
|
||||||
|
|
||||||
|
## [1.4.1](https://github.com/Ousret/charset_normalizer/compare/1.4.0...1.4.1) (2021-05-28)
|
||||||
|
### Fixed
|
||||||
|
- Logger configuration/usage no longer conflict with others (PR #44)
|
||||||
|
|
||||||
|
## [1.4.0](https://github.com/Ousret/charset_normalizer/compare/1.3.9...1.4.0) (2021-05-21)
|
||||||
|
### Removed
|
||||||
|
- Using standard logging instead of using the package loguru.
|
||||||
|
- Dropping nose test framework in favor of the maintained pytest.
|
||||||
|
- Choose to not use dragonmapper package to help with gibberish Chinese/CJK text.
|
||||||
|
- Require cached_property only for Python 3.5 due to constraint. Dropping for every other interpreter version.
|
||||||
|
- Stop support for UTF-7 that does not contain a SIG.
|
||||||
|
- Dropping PrettyTable, replaced with pure JSON output in CLI.
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- BOM marker in a CharsetNormalizerMatch instance could be False in rare cases, even if obviously present, due to the sub-match factoring process.
|
||||||
|
- Not searching properly for the BOM when trying utf32/16 parent codec.
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Improving the package final size by compressing frequencies.json.
|
||||||
|
- Huge improvement on the largest payloads.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- CLI now produces JSON consumable output.
|
||||||
|
- Return ASCII if given sequences fit. Given reasonable confidence.
|
||||||
|
|
||||||
|
## [1.3.9](https://github.com/Ousret/charset_normalizer/compare/1.3.8...1.3.9) (2021-05-13)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload (PR #40)
|
||||||
|
|
||||||
|
## [1.3.8](https://github.com/Ousret/charset_normalizer/compare/1.3.7...1.3.8) (2021-05-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Empty given payload for detection may cause an exception if trying to access the `alphabets` property. (PR #39)
|
||||||
|
|
||||||
|
## [1.3.7](https://github.com/Ousret/charset_normalizer/compare/1.3.6...1.3.7) (2021-05-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- The legacy detect function should return UTF-8-SIG if sig is present in the payload. (PR #38)
|
||||||
|
|
||||||
|
## [1.3.6](https://github.com/Ousret/charset_normalizer/compare/1.3.5...1.3.6) (2021-02-09)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Amend the previous release to allow prettytable 2.0 (PR #35)
|
||||||
|
|
||||||
|
## [1.3.5](https://github.com/Ousret/charset_normalizer/compare/1.3.4...1.3.5) (2021-02-08)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix error while using the package with a python pre-release interpreter (PR #33)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Dependencies refactoring, constraints revised.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Add python 3.9 and 3.10 to the supported interpreters
|
||||||
|
|
||||||
|
MIT License
|
||||||
|
|
||||||
|
Copyright (c) 2025 TAHRI Ahmed R.
|
||||||
|
|
||||||
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||||
|
of this software and associated documentation files (the "Software"), to deal
|
||||||
|
in the Software without restriction, including without limitation the rights
|
||||||
|
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||||
|
copies of the Software, and to permit persons to whom the Software is
|
||||||
|
furnished to do so, subject to the following conditions:
|
||||||
|
|
||||||
|
The above copyright notice and this permission notice shall be included in all
|
||||||
|
copies or substantial portions of the Software.
|
||||||
|
|
||||||
|
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||||
|
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||||
|
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||||
|
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||||
|
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||||
|
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||||
|
SOFTWARE.
|
||||||
Binary file not shown.
|
|
@ -0,0 +1,763 @@
|
||||||
|
Metadata-Version: 2.1
|
||||||
|
Name: charset-normalizer
|
||||||
|
Version: 3.4.4
|
||||||
|
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
|
||||||
|
Author-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
|
||||||
|
Maintainer-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
|
||||||
|
License: MIT
|
||||||
|
Project-URL: Changelog, https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md
|
||||||
|
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/
|
||||||
|
Project-URL: Code, https://github.com/jawah/charset_normalizer
|
||||||
|
Project-URL: Issue tracker, https://github.com/jawah/charset_normalizer/issues
|
||||||
|
Keywords: encoding,charset,charset-detector,detector,normalization,unicode,chardet,detect
|
||||||
|
Classifier: Development Status :: 5 - Production/Stable
|
||||||
|
Classifier: Intended Audience :: Developers
|
||||||
|
Classifier: Operating System :: OS Independent
|
||||||
|
Classifier: Programming Language :: Python
|
||||||
|
Classifier: Programming Language :: Python :: 3
|
||||||
|
Classifier: Programming Language :: Python :: 3.7
|
||||||
|
Classifier: Programming Language :: Python :: 3.8
|
||||||
|
Classifier: Programming Language :: Python :: 3.9
|
||||||
|
Classifier: Programming Language :: Python :: 3.10
|
||||||
|
Classifier: Programming Language :: Python :: 3.11
|
||||||
|
Classifier: Programming Language :: Python :: 3.12
|
||||||
|
Classifier: Programming Language :: Python :: 3.13
|
||||||
|
Classifier: Programming Language :: Python :: 3.14
|
||||||
|
Classifier: Programming Language :: Python :: 3 :: Only
|
||||||
|
Classifier: Programming Language :: Python :: Implementation :: CPython
|
||||||
|
Classifier: Programming Language :: Python :: Implementation :: PyPy
|
||||||
|
Classifier: Topic :: Text Processing :: Linguistic
|
||||||
|
Classifier: Topic :: Utilities
|
||||||
|
Classifier: Typing :: Typed
|
||||||
|
Requires-Python: >=3.7
|
||||||
|
Description-Content-Type: text/markdown
|
||||||
|
License-File: LICENSE
|
||||||
|
Provides-Extra: unicode_backport
|
||||||
|
|
||||||
|
<h1 align="center">Charset Detection, for Everyone 👋</h1>
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<sup>The Real First Universal Charset Detector</sup><br>
|
||||||
|
<a href="https://pypi.org/project/charset-normalizer">
|
||||||
|
<img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
|
||||||
|
</a>
|
||||||
|
<a href="https://pepy.tech/project/charset-normalizer/">
|
||||||
|
<img alt="Download Count Total" src="https://static.pepy.tech/badge/charset-normalizer/month" />
|
||||||
|
</a>
|
||||||
|
<a href="https://bestpractices.coreinfrastructure.org/projects/7297">
|
||||||
|
<img src="https://bestpractices.coreinfrastructure.org/projects/7297/badge">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<sup><i>Featured Packages</i></sup><br>
|
||||||
|
<a href="https://github.com/jawah/niquests">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Niquests-Most_Advanced_HTTP_Client-cyan">
|
||||||
|
</a>
|
||||||
|
<a href="https://github.com/jawah/wassima">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Wassima-Certifi_Replacement-cyan">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<sup><i>In other language (unofficial port - by the community)</i></sup><br>
|
||||||
|
<a href="https://github.com/nickspring/charset-normalizer-rs">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Rust-red">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
|
||||||
|
> I'm trying to resolve the issue by taking a new approach.
|
||||||
|
> All IANA character set names for which the Python core library provides codecs are supported.
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
>>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
|
||||||
|
</p>
|
||||||
|
|
||||||
|
This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.
|
||||||
|
|
||||||
|
| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
|
||||||
|
|--------------------------------------------------|:---------------------------------------------:|:--------------------------------------------------------------------------------------------------:|:-----------------------------------------------:|
|
||||||
|
| `Fast` | ❌ | ✅ | ✅ |
|
||||||
|
| `Universal**` | ❌ | ✅ | ❌ |
|
||||||
|
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
|
||||||
|
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
|
||||||
|
| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
|
||||||
|
| `Native Python` | ✅ | ✅ | ❌ |
|
||||||
|
| `Detect spoken language` | ❌ | ✅ | N/A |
|
||||||
|
| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
|
||||||
|
| `Whl Size (min)` | 193.6 kB | 42 kB | ~200 kB |
|
||||||
|
| `Supported Encoding` | 33 | 🎉 [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
*\*\* : They clearly use specific code for a specific encoding, even if it covers most of the encodings in use*<br>
|
||||||
|
|
||||||
|
## ⚡ Performance
|
||||||
|
|
||||||
|
This package offers better performance than its counterpart, Chardet. Here are some numbers.
|
||||||
|
|
||||||
|
| Package | Accuracy | Mean per file (ms) | File per sec (est) |
|
||||||
|
|-----------------------------------------------|:--------:|:------------------:|:------------------:|
|
||||||
|
| [chardet](https://github.com/chardet/chardet) | 86 % | 63 ms | 16 file/sec |
|
||||||
|
| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |
|
||||||
|
|
||||||
|
| Package | 99th percentile | 95th percentile | 50th percentile |
|
||||||
|
|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
|
||||||
|
| [chardet](https://github.com/chardet/chardet) | 265 ms | 71 ms | 7 ms |
|
||||||
|
| charset-normalizer | 100 ms | 50 ms | 5 ms |
|
||||||
|
|
||||||
|
_updated as of December 2024 using CPython 3.12_
|
||||||
|
|
||||||
|
Chardet's performance on larger files (1 MB+) is very poor. Expect a huge difference on large payloads.
|
||||||
|
|
||||||
|
> Stats are generated using 400+ files with default parameters. For more details on the files used, see the GHA workflows.
|
||||||
|
> And yes, these results might change at any time. The dataset can be updated to include more files.
|
||||||
|
> The actual delays depend heavily on your CPU capabilities. The factors should remain the same.
|
||||||
|
> Keep in mind that the stats are generous and that Chardet's accuracy versus ours is measured using only Chardet's initial capabilities
|
||||||
|
> (e.g. supported encodings). Challenge them if you want.
|
||||||
|
|
||||||
|
## ✨ Installation
|
||||||
|
|
||||||
|
Using pip:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
pip install charset-normalizer -U
|
||||||
|
```
|
||||||
|
|
||||||
|
## 🚀 Basic Usage
|
||||||
|
|
||||||
|
### CLI
|
||||||
|
This package comes with a CLI.
|
||||||
|
|
||||||
|
```
|
||||||
|
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
|
||||||
|
file [file ...]
|
||||||
|
|
||||||
|
The Real First Universal Charset Detector. Discover originating encoding used
|
||||||
|
on text file. Normalize text to unicode.
|
||||||
|
|
||||||
|
positional arguments:
|
||||||
|
files File(s) to be analysed
|
||||||
|
|
||||||
|
optional arguments:
|
||||||
|
-h, --help show this help message and exit
|
||||||
|
-v, --verbose Display complementary information about file if any.
|
||||||
|
Stdout will contain logs about the detection process.
|
||||||
|
-a, --with-alternative
|
||||||
|
Output complementary possibilities if any. Top-level
|
||||||
|
JSON WILL be a list.
|
||||||
|
-n, --normalize Permit to normalize input file. If not set, program
|
||||||
|
does not write anything.
|
||||||
|
-m, --minimal Only output the charset detected to STDOUT. Disabling
|
||||||
|
JSON output.
|
||||||
|
-r, --replace Replace file when trying to normalize it instead of
|
||||||
|
creating a new one.
|
||||||
|
-f, --force Replace file without asking if you are sure, use this
|
||||||
|
flag with caution.
|
||||||
|
-t THRESHOLD, --threshold THRESHOLD
|
||||||
|
Define a custom maximum amount of chaos allowed in
|
||||||
|
decoded content. 0. <= chaos <= 1.
|
||||||
|
--version Show version information and exit.
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
normalizer ./data/sample.1.fr.srt
|
||||||
|
```
|
||||||
|
|
||||||
|
or
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m charset_normalizer ./data/sample.1.fr.srt
|
||||||
|
```
|
||||||
|
|
||||||
|
🎉 Since version 1.4.0, the CLI produces an easily consumable stdout result in JSON format.
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
|
||||||
|
"encoding": "cp1252",
|
||||||
|
"encoding_aliases": [
|
||||||
|
"1252",
|
||||||
|
"windows_1252"
|
||||||
|
],
|
||||||
|
"alternative_encodings": [
|
||||||
|
"cp1254",
|
||||||
|
"cp1256",
|
||||||
|
"cp1258",
|
||||||
|
"iso8859_14",
|
||||||
|
"iso8859_15",
|
||||||
|
"iso8859_16",
|
||||||
|
"iso8859_3",
|
||||||
|
"iso8859_9",
|
||||||
|
"latin_1",
|
||||||
|
"mbcs"
|
||||||
|
],
|
||||||
|
"language": "French",
|
||||||
|
"alphabets": [
|
||||||
|
"Basic Latin",
|
||||||
|
"Latin-1 Supplement"
|
||||||
|
],
|
||||||
|
"has_sig_or_bom": false,
|
||||||
|
"chaos": 0.149,
|
||||||
|
"coherence": 97.152,
|
||||||
|
"unicode_path": null,
|
||||||
|
"is_preferred": true
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Python
|
||||||
|
*Just print out normalized text*
|
||||||
|
```python
|
||||||
|
from charset_normalizer import from_path
|
||||||
|
|
||||||
|
results = from_path('./my_subtitle.srt')
|
||||||
|
|
||||||
|
print(str(results.best()))
|
||||||
|
```
|
||||||
|
|
||||||
|
*Upgrade your code without effort*
|
||||||
|
```python
|
||||||
|
from charset_normalizer import detect
|
||||||
|
```
|
||||||
|
|
||||||
|
The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) backward-compatible result possible.
|
||||||
|
|
||||||
|
See the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)
|
||||||
|
|
||||||
|
## 😇 Why
|
||||||
|
|
||||||
|
When I started using Chardet, I noticed that it was not suited to my expectations, and I wanted to propose a
|
||||||
|
reliable alternative using a completely different method. Also! I never back down on a good challenge!
|
||||||
|
|
||||||
|
I **don't care** about the **originating charset** encoding, because **two different tables** can
|
||||||
|
produce **two identical rendered strings.**
|
||||||
|
What I want is to get readable text, the best I can.
|
||||||
|
|
||||||
|
In a way, **I'm brute forcing text decoding.** How cool is that ? 😎
|
||||||
|
|
||||||
|
Don't confuse the **ftfy** package with charset-normalizer or chardet. ftfy's goal is to repair broken Unicode strings, whereas charset-normalizer's is to convert a raw file in an unknown encoding to Unicode.
|
||||||
|
|
||||||
|
## 🍰 How
|
||||||
|
|
||||||
|
- Discard all charset encoding tables that could not fit the binary content.
|
||||||
|
- Measure the noise, i.e. the mess, once the content is opened (in chunks) with a corresponding charset encoding.
|
||||||
|
- Extract matches with the lowest mess detected.
|
||||||
|
- Additionally, we measure coherence / probe for a language.
|
||||||
|
|
||||||
|
**Wait a minute**, what is noise/mess and coherence according to **YOU ?**
|
||||||
|
|
||||||
|
*Noise :* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then
|
||||||
|
**I established** some ground rules about **what is obvious** when **it seems like** a mess (aka. defining noise in rendered text).
|
||||||
|
I know that my interpretation of noise is probably incomplete; feel free to contribute in order to
|
||||||
|
improve or rewrite it.
|
||||||
|
|
||||||
|
*Coherence :* For each language on earth, we have computed ranked letter-appearance occurrences (as best we can). So I thought
|
||||||
|
that this intel is worth something here. So I use those records against the decoded text to check if I can detect intelligent design.
|
||||||
|
|
||||||
|
## ⚡ Known limitations
|
||||||
|
|
||||||
|
- Language detection is unreliable when the text contains two or more languages sharing identical letters (e.g. HTML (English tags) + Turkish content (sharing Latin characters)).
|
||||||
|
- Every charset detector heavily depends on sufficient content. In common cases, do not bother running detection on very tiny content.
|
||||||
|
|
||||||
|
## ⚠️ About Python EOLs
|
||||||
|
|
||||||
|
**If you are running:**
|
||||||
|
|
||||||
|
- Python >=2.7,<3.5: Unsupported
|
||||||
|
- Python 3.5: charset-normalizer < 2.1
|
||||||
|
- Python 3.6: charset-normalizer < 3.1
|
||||||
|
- Python 3.7: charset-normalizer < 4.0
|
||||||
|
|
||||||
|
Upgrade your Python interpreter as soon as possible.
|
||||||
|
|
||||||
|
## 👤 Contributing
|
||||||
|
|
||||||
|
Contributions, issues and feature requests are very much welcome.<br />
|
||||||
|
Feel free to check [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.
|
||||||
|
|
||||||
|
## 📝 License
|
||||||
|
|
||||||
|
Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
|
||||||
|
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.
|
||||||
|
|
||||||
|
Characters frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)
|
||||||
|
|
||||||
|
## 💼 For Enterprise
|
||||||
|
|
||||||
|
Professional support for charset-normalizer is available as part of the [Tidelift
|
||||||
|
Subscription][1]. Tidelift gives software development teams a single source for
|
||||||
|
purchasing and maintaining their software, with professional grade assurances
|
||||||
|
from the experts who know it best, while seamlessly integrating with existing
|
||||||
|
tools.
|
||||||
|
|
||||||
|
[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme
|
||||||
|
|
||||||
|
[](https://www.bestpractices.dev/projects/7297)
|
||||||
|
|
||||||
|
# Changelog
|
||||||
|
All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||||
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
|
||||||
|
|
||||||
|
## [3.4.4](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.4) (2025-10-13)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Bound `setuptools` to a specific constraint `setuptools>=68,<=81`.
|
||||||
|
- Raised upper bound of mypyc for the optional pre-built extension to v1.18.2
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- `setuptools-scm` as a build dependency.
|
||||||
|
|
||||||
|
### Misc
|
||||||
|
- Enforced hashes in `dev-requirements.txt` and created `ci-requirements.txt` for security purposes.
|
||||||
|
- Additional pre-built wheels for riscv64, s390x, and armv7l architectures.
|
||||||
|
- Restore `multiple.intoto.jsonl` in GitHub releases in addition to the individual attestation file per wheel.
|
||||||
|
|
||||||
|
## [3.4.3](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.3) (2025-08-09)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- mypy(c) is no longer a required dependency at build time if `CHARSET_NORMALIZER_USE_MYPYC` isn't set to `1`. (#595) (#583)
|
||||||
|
- Automatically lower the confidence on small byte samples that are not Unicode, in the output of the legacy `detect` function. (#391)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Custom build backend to overcome inability to mark mypy as an optional dependency in the build phase.
|
||||||
|
- Support for Python 3.14
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- sdist archive contained useless directories.
|
||||||
|
- Automatically fall back on valid UTF-16 or UTF-32, even if the mess detector (md) says it's noisy. (#633)
|
||||||
|
|
||||||
|
### Misc
|
||||||
|
- SBOMs are automatically published to the relevant GitHub release to comply with regulatory changes.
|
||||||
|
Each published wheel comes with its SBOM. We chose CycloneDX as the format.
|
||||||
|
- Prebuilt optimized wheels are no longer distributed by default for CPython 3.7 due to a change in cibuildwheel.
|
||||||
|
|
||||||
|
## [3.4.2](https://github.com/Ousret/charset_normalizer/compare/3.4.1...3.4.2) (2025-05-02)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Addressed the DeprecationWarning in our CLI regarding `argparse.FileType` by backporting the target class into the package. (#591)
|
||||||
|
- Improved the overall reliability of the detector with CJK Ideographs. (#605) (#587)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional mypyc compilation upgraded to version 1.15 for Python >= 3.8
|
||||||
|
|
||||||
|
## [3.4.1](https://github.com/Ousret/charset_normalizer/compare/3.4.0...3.4.1) (2024-12-24)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Project metadata are now stored using `pyproject.toml` instead of `setup.cfg` using setuptools as the build backend.
|
||||||
|
- Enforce delayed annotation loading for simpler and more consistent types in the project.
|
||||||
|
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- pre-commit configuration.
|
||||||
|
- noxfile.
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- `build-requirements.txt` as per using `pyproject.toml` native build configuration.
|
||||||
|
- `bin/integration.py` and `bin/serve.py` in favor of downstream integration test (see noxfile).
|
||||||
|
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
|
||||||
|
- Unused `utils.range_scan` function.
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Converting content to Unicode bytes may insert `utf_8` instead of preferred `utf-8`. (#572)
|
||||||
|
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+
|
||||||
|
|
||||||
|
## [3.4.0](https://github.com/Ousret/charset_normalizer/compare/3.3.2...3.4.0) (2024-10-08)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Argument `--no-preemptive` in the CLI to prevent the detector from searching for hints.
|
||||||
|
- Support for Python 3.13 (#512)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
|
||||||
|
- Improved the general reliability of the detector based on user feedback. (#520) (#509) (#498) (#407) (#537)
|
||||||
|
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#381)
|
||||||
|
|
||||||
|
## [3.3.2](https://github.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2) (2023-10-31)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Unintentional memory usage regression when using a large payload that matches several encodings (#376)
|
||||||
|
- Regression on some detection case showcased in the documentation (#371)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Noise (md) probe that identifies malformed Arabic representations due to the presence of letters in isolated form (credit to my wife)
|
||||||
|
|
||||||
|
## [3.3.1](https://github.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1) (2023-10-22)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
|
||||||
|
- Improved the general detection reliability based on reports from the community
|
||||||
|
|
||||||
|
## [3.3.0](https://github.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0) (2023-09-30)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Allow to execute the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
|
||||||
|
- Support for 9 forgotten encoding that are supported by Python but unlisted in `encoding.aliases` as they have no alias (#323)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
|
||||||
|
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
|
||||||
|
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in \_\_lt\_\_ (#350)
|
||||||
|
|
||||||
|
## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Typehint for function `from_path` no longer enforce `PathLike` as its first argument
|
||||||
|
- Minor improvement over the global detection reliability
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Introduce function `is_binary` that relies on main capabilities, and optimized to detect binaries
|
||||||
|
- Propagate `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp` that allow a deeper control over the detection (default True)
|
||||||
|
- Explicit support for Python 3.12
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Edge case detection failure where a file would contain 'very-long' camel cased word (Issue #289)
|
||||||
|
|
||||||
|
## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Argument `should_rename_legacy` for legacy function `detect` and disregard any new arguments without errors (PR #262)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Support for Python 3.6 (PR #260)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional speedup provided by mypy/c 1.0.1
|
||||||
|
|
||||||
|
## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Multi-bytes cutter/chunk generator did not always cut correctly (PR #233)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Speedup provided by mypy/c 0.990 on Python >= 3.7
|
||||||
|
|
||||||
|
## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
|
||||||
|
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
|
||||||
|
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
|
||||||
|
- `normalizer --version` now specify if current version provide extra speedup (meaning mypyc compilation whl)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Build with static metadata using 'build' frontend
|
||||||
|
- Make the language detection stricter
|
||||||
|
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- CLI with opt --normalize fail when using full path for files
|
||||||
|
- TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
|
||||||
|
- Sphinx warnings when generating the documentation
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Coherence detector no longer return 'Simple English' instead return 'English'
|
||||||
|
- Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'
|
||||||
|
- Breaking: Method `first()` and `best()` from CharsetMatch
|
||||||
|
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
|
||||||
|
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
|
||||||
|
- Breaking: Top-level function `normalize`
|
||||||
|
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
|
||||||
|
- Support for the backport `unicodedata2`
|
||||||
|
|
||||||
|
## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
|
||||||
|
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
|
||||||
|
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Build with static metadata using 'build' frontend
|
||||||
|
- Make the language detection stricter
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- CLI with opt --normalize fail when using full path for files
|
||||||
|
- TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Coherence detector no longer return 'Simple English' instead return 'English'
|
||||||
|
- Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'
|
||||||
|
|
||||||
|
## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- `normalizer --version` now specify if current version provide extra speedup (meaning mypyc compilation whl)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Breaking: Method `first()` and `best()` from CharsetMatch
|
||||||
|
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Sphinx warnings when generating the documentation
|
||||||
|
|
||||||
|
## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
|
||||||
|
- Breaking: Top-level function `normalize`
|
||||||
|
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
|
||||||
|
- Support for the backport `unicodedata2`
|
||||||
|
|
||||||
|
## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Function `normalize` scheduled for removal in 3.0
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Removed useless call to decode in fn is_unprintable (#206)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Third-party library (i18n xgettext) crashing not recognizing utf_8 (PEP 263) with underscore from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)
|
||||||
|
|
||||||
|
## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Output the Unicode table version when running the CLI with `--version` (PR #194)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Re-use decoded buffer for single byte character sets from [@nijel](https://github.com/nijel) (PR #175)
|
||||||
|
- Fixing some performance bottlenecks from [@deedy5](https://github.com/deedy5) (PR #183)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Workaround potential bug in cpython with Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space (PR #175)
|
||||||
|
- CLI default threshold aligned with the API threshold from [@oleksandr-kuzmenko](https://github.com/oleksandr-kuzmenko) (PR #181)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Support for Python 3.5 (PR #192)
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Use of backport unicodedata from `unicodedata2` as Python is quickly catching up, scheduled for removal in 3.0 (PR #194)
|
||||||
|
|
||||||
|
## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- ASCII miss-detection on rare cases (PR #170)
|
||||||
|
|
||||||
|
## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Explicit support for Python 3.11 (PR #164)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- The logging behavior have been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)
|
||||||
|
|
||||||
|
## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fallback match entries might lead to UnicodeDecodeError for large bytes sequence (PR #154)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Skipping the language-detection (CD) on ASCII (PR #155)
|
||||||
|
|
||||||
|
## [2.0.9](https://github.com/Ousret/charset_normalizer/compare/2.0.8...2.0.9) (2021-12-03)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Moderating the logging impact (since 2.0.8) for specific environments (PR #147)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Wrong logging level applied when setting kwarg `explain` to True (PR #146)
|
||||||
|
|
||||||
|
## [2.0.8](https://github.com/Ousret/charset_normalizer/compare/2.0.7...2.0.8) (2021-11-24)
|
||||||
|
### Changed
|
||||||
|
- Improvement over Vietnamese detection (PR #126)
|
||||||
|
- MD improvement on trailing data and long foreign (non-pure latin) data (PR #124)
|
||||||
|
- Efficiency improvements in cd/alphabet_languages from [@adbar](https://github.com/adbar) (PR #122)
|
||||||
|
- call sum() without an intermediary list following PEP 289 recommendations from [@adbar](https://github.com/adbar) (PR #129)
|
||||||
|
- Code style as refactored by Sourcery-AI (PR #131)
|
||||||
|
- Minor adjustment on the MD around european words (PR #133)
|
||||||
|
- Remove and replace SRTs from assets / tests (PR #139)
|
||||||
|
- Initialize the library logger with a `NullHandler` by default from [@nmaynes](https://github.com/nmaynes) (PR #135)
|
||||||
|
- Setting kwarg `explain` to True will add provisionally (bounded to function lifespan) a specific stream handler (PR #135)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix large (misleading) sequence giving UnicodeDecodeError (PR #137)
|
||||||
|
- Avoid using too insignificant chunk (PR #137)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Add and expose function `set_logging_handler` to configure a specific StreamHandler from [@nmaynes](https://github.com/nmaynes) (PR #135)
|
||||||
|
- Add `CHANGELOG.md` entries, format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) (PR #141)
|
||||||
|
|
||||||
|
## [2.0.7](https://github.com/Ousret/charset_normalizer/compare/2.0.6...2.0.7) (2021-10-11)
|
||||||
|
### Added
|
||||||
|
- Add support for Kazakh (Cyrillic) language detection (PR #109)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Further, improve inferring the language from a given single-byte code page (PR #112)
|
||||||
|
- Vainly trying to leverage PEP263 when PEP3120 is not supported (PR #116)
|
||||||
|
- Refactoring for potential performance improvements in loops from [@adbar](https://github.com/adbar) (PR #113)
|
||||||
|
- Various detection improvement (MD+CD) (PR #117)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Remove redundant logging entry about detected language(s) (PR #115)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)
|
||||||
|
|
||||||
|
## [2.0.6](https://github.com/Ousret/charset_normalizer/compare/2.0.5...2.0.6) (2021-09-18)
|
||||||
|
### Fixed
|
||||||
|
- Unforeseen regression with the loss of the backward-compatibility with some older minor of Python 3.5.x (PR #100)
|
||||||
|
- Fix CLI crash when using --minimal output in certain cases (PR #103)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)
|
||||||
|
|
||||||
|
## [2.0.5](https://github.com/Ousret/charset_normalizer/compare/2.0.4...2.0.5) (2021-09-14)
|
||||||
|
### Changed
|
||||||
|
- The project now comply with: flake8, mypy, isort and black to ensure a better overall quality (PR #81)
|
||||||
|
- The BC-support with v1.x was improved, the old staticmethods are restored (PR #82)
|
||||||
|
- The Unicode detection is slightly improved (PR #93)
|
||||||
|
- Add syntax sugar \_\_bool\_\_ for results CharsetMatches list-container (PR #91)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- The project no longer raise warning on tiny content given for detection, will be simply logged as warning instead (PR #92)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- In some rare case, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection (PR #95)
|
||||||
|
- Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection (PR #96)
|
||||||
|
- The MANIFEST.in was not exhaustive (PR #78)
|
||||||
|
|
||||||
|
## [2.0.4](https://github.com/Ousret/charset_normalizer/compare/2.0.3...2.0.4) (2021-07-30)
|
||||||
|
### Fixed
|
||||||
|
- The CLI no longer raise an unexpected exception when no encoding has been found (PR #70)
|
||||||
|
- Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
|
||||||
|
- The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (PR #72)
|
||||||
|
- Submatch factoring could be wrong in rare edge cases (PR #72)
|
||||||
|
- Multiple files given to the CLI were ignored when publishing results to STDOUT. (After the first path) (PR #72)
|
||||||
|
- Fix line endings from CRLF to LF for certain project files (PR #67)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Adjust the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
|
||||||
|
- Allow fallback on specified encoding if any (PR #71)
|
||||||
|
|
||||||
|
## [2.0.3](https://github.com/Ousret/charset_normalizer/compare/2.0.2...2.0.3) (2021-07-16)
|
||||||
|
### Changed
|
||||||
|
- Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results. Especially ASCII. (PR #63)
|
||||||
|
- According to the community wishes, the detection will fall back on ASCII or UTF-8 in a last-resort case. (PR #64)
|
||||||
|
|
||||||
|
## [2.0.2](https://github.com/Ousret/charset_normalizer/compare/2.0.1...2.0.2) (2021-07-15)
|
||||||
|
### Fixed
|
||||||
|
- Empty/Too small JSON payload miss-detection fixed. Report from [@tseaver](https://github.com/tseaver) (PR #59)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Don't inject unicodedata2 into sys.modules from [@akx](https://github.com/akx) (PR #57)
|
||||||
|
|
||||||
|
## [2.0.1](https://github.com/Ousret/charset_normalizer/compare/2.0.0...2.0.1) (2021-07-13)
|
||||||
|
### Fixed
|
||||||
|
- Make it work where there isn't a filesystem available, dropping assets frequencies.json. Report from [@sethmlarson](https://github.com/sethmlarson). (PR #55)
|
||||||
|
- Using explain=False permanently disable the verbose output in the current runtime (PR #47)
|
||||||
|
- One log entry (language target preemptive) was not show in logs when using explain=True (PR #47)
|
||||||
|
- Fix undesired exception (ValueError) on getitem of instance CharsetMatches (PR #52)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Public function normalize default args values were not aligned with from_bytes (PR #53)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- You may now use charset aliases in cp_isolation and cp_exclusion arguments (PR #47)
|
||||||
|
|
||||||
|
## [2.0.0](https://github.com/Ousret/charset_normalizer/compare/1.4.1...2.0.0) (2021-07-02)
|
||||||
|
### Changed
|
||||||
|
- 4x to 5 times faster than the previous 1.4.0 release. At least 2x faster than Chardet.
|
||||||
|
- Accent has been made on UTF-8 detection, should perform rather instantaneous.
|
||||||
|
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
|
||||||
|
- The detection mechanism has been slightly improved, now Turkish content is detected correctly (most of the time)
|
||||||
|
- The program has been rewritten to ease the readability and maintainability. (+Using static typing)+
|
||||||
|
- utf_7 detection has been reinstated.
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- This package no longer require anything when used with Python 3.5 (Dropped cached_property)
|
||||||
|
- Removed support for these languages: Catalan, Esperanto, Kazakh, Baque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbocroatian.
|
||||||
|
- The exception hook on UnicodeDecodeError has been removed.
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- The CLI output used the relative path of the file(s). Should be absolute.
|
||||||
|
|
||||||
|
## [1.4.1](https://github.com/Ousret/charset_normalizer/compare/1.4.0...1.4.1) (2021-05-28)
|
||||||
|
### Fixed
|
||||||
|
- Logger configuration/usage no longer conflict with others (PR #44)
|
||||||
|
|
||||||
|
## [1.4.0](https://github.com/Ousret/charset_normalizer/compare/1.3.9...1.4.0) (2021-05-21)
|
||||||
|
### Removed
|
||||||
|
- Using standard logging instead of using the package loguru.
|
||||||
|
- Dropping nose test framework in favor of the maintained pytest.
|
||||||
|
- Choose to not use dragonmapper package to help with gibberish Chinese/CJK text.
|
||||||
|
- Require cached_property only for Python 3.5 due to constraint. Dropping for every other interpreter version.
|
||||||
|
- Stop support for UTF-7 that does not contain a SIG.
|
||||||
|
- Dropping PrettyTable, replaced with pure JSON output in CLI.
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even if obviously present. Due to the sub-match factoring process.
|
||||||
|
- Not searching properly for the BOM when trying utf32/16 parent codec.
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Improving the package final size by compressing frequencies.json.
|
||||||
|
- Huge improvement over the larges payload.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- CLI now produces JSON consumable output.
|
||||||
|
- Return ASCII if given sequences fit. Given reasonable confidence.
|
||||||
|
|
||||||
|
## [1.3.9](https://github.com/Ousret/charset_normalizer/compare/1.3.8...1.3.9) (2021-05-13)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload (PR #40)
|
||||||
|
|
||||||
|
## [1.3.8](https://github.com/Ousret/charset_normalizer/compare/1.3.7...1.3.8) (2021-05-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Empty given payload for detection may cause an exception if trying to access the `alphabets` property. (PR #39)
|
||||||
|
|
||||||
|
## [1.3.7](https://github.com/Ousret/charset_normalizer/compare/1.3.6...1.3.7) (2021-05-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- The legacy detect function should return UTF-8-SIG if sig is present in the payload. (PR #38)
|
||||||
|
|
||||||
|
## [1.3.6](https://github.com/Ousret/charset_normalizer/compare/1.3.5...1.3.6) (2021-02-09)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Amend the previous release to allow prettytable 2.0 (PR #35)
|
||||||
|
|
||||||
|
## [1.3.5](https://github.com/Ousret/charset_normalizer/compare/1.3.4...1.3.5) (2021-02-08)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix error while using the package with a python pre-release interpreter (PR #33)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Dependencies refactoring, constraints revised.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Add python 3.9 and 3.10 to the supported interpreters
|
||||||
|
|
||||||
|
MIT License

Copyright (c) 2025 TAHRI Ahmed R.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Binary file not shown.
@ -0,0 +1,763 @@
Metadata-Version: 2.1
Name: charset-normalizer
Version: 3.4.4
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
Author-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
Maintainer-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
License: MIT
Project-URL: Changelog, https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/
Project-URL: Code, https://github.com/jawah/charset_normalizer
Project-URL: Issue tracker, https://github.com/jawah/charset_normalizer/issues
Keywords: encoding,charset,charset-detector,detector,normalization,unicode,chardet,detect
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: unicode_backport

<h1 align="center">Charset Detection, for Everyone 👋</h1>

<p align="center">
  <sup>The Real First Universal Charset Detector</sup><br>
  <a href="https://pypi.org/project/charset-normalizer">
    <img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
  </a>
  <a href="https://pepy.tech/project/charset-normalizer/">
    <img alt="Download Count Total" src="https://static.pepy.tech/badge/charset-normalizer/month" />
  </a>
  <a href="https://bestpractices.coreinfrastructure.org/projects/7297">
    <img src="https://bestpractices.coreinfrastructure.org/projects/7297/badge">
  </a>
</p>
<p align="center">
  <sup><i>Featured Packages</i></sup><br>
  <a href="https://github.com/jawah/niquests">
    <img alt="Static Badge" src="https://img.shields.io/badge/Niquests-Most_Advanced_HTTP_Client-cyan">
  </a>
  <a href="https://github.com/jawah/wassima">
    <img alt="Static Badge" src="https://img.shields.io/badge/Wassima-Certifi_Replacement-cyan">
  </a>
</p>
<p align="center">
  <sup><i>In other languages (unofficial ports by the community)</i></sup><br>
  <a href="https://github.com/nickspring/charset-normalizer-rs">
    <img alt="Static Badge" src="https://img.shields.io/badge/Rust-red">
  </a>
</p>

> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
> I'm trying to resolve the issue by taking a new approach.
> All IANA character set names for which the Python core library provides codecs are supported.

<p align="center">
  >>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
</p>

This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
|--------------------------------------------------|:---------------------------------------------:|:--------------------------------------------------------------------------------------------------:|:-----------------------------------------------:|
| `Fast` | ❌ | ✅ | ✅ |
| `Universal**` | ❌ | ✅ | ❌ |
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
| `Native Python` | ✅ | ✅ | ❌ |
| `Detect spoken language` | ❌ | ✅ | N/A |
| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
| `Whl Size (min)` | 193.6 kB | 42 kB | ~200 kB |
| `Supported Encoding` | 33 | 🎉 [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |

<p align="center">
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
</p>

*\*\* : They are clearly using specific code for a specific encoding even if it covers most of the used ones*<br>

## ⚡ Performance

This package offers better performance than its counterpart, Chardet. Here are some numbers.

| Package | Accuracy | Mean per file (ms) | File per sec (est) |
|-----------------------------------------------|:--------:|:------------------:|:------------------:|
| [chardet](https://github.com/chardet/chardet) | 86 % | 63 ms | 16 file/sec |
| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |

| Package | 99th percentile | 95th percentile | 50th percentile |
|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
| [chardet](https://github.com/chardet/chardet) | 265 ms | 71 ms | 7 ms |
| charset-normalizer | 100 ms | 50 ms | 5 ms |

_updated as of December 2024 using CPython 3.12_

Chardet's performance on larger files (1MB+) is very poor. Expect a huge difference on large payloads.

> Stats are generated using 400+ files with default parameters. For more details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays heavily depend on your CPU capabilities. The factors should remain the same.
> Keep in mind that the stats are generous and that Chardet's accuracy versus ours is measured using Chardet's initial capability
> (e.g. supported encodings). Challenge them if you want.

## ✨ Installation

Using pip:

```sh
pip install charset-normalizer -U
```

## 🚀 Basic Usage

### CLI
This package comes with a CLI.

```
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
                  file [file ...]

The Real First Universal Charset Detector. Discover originating encoding used
on text file. Normalize text to unicode.

positional arguments:
  files                 File(s) to be analysed

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display complementary information about file if any.
                        Stdout will contain logs about the detection process.
  -a, --with-alternative
                        Output complementary possibilities if any. Top-level
                        JSON WILL be a list.
  -n, --normalize       Permit to normalize input file. If not set, program
                        does not write anything.
  -m, --minimal         Only output the charset detected to STDOUT. Disabling
                        JSON output.
  -r, --replace         Replace file when trying to normalize it instead of
                        creating a new one.
  -f, --force           Replace file without asking if you are sure, use this
                        flag with caution.
  -t THRESHOLD, --threshold THRESHOLD
                        Define a custom maximum amount of chaos allowed in
                        decoded content. 0. <= chaos <= 1.
  --version             Show version information and exit.
```

```bash
normalizer ./data/sample.1.fr.srt
```

or

```bash
python -m charset_normalizer ./data/sample.1.fr.srt
```

🎉 Since version 1.4.0 the CLI produces an easily usable stdout result in JSON format.

```json
{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "encoding_aliases": [
        "1252",
        "windows_1252"
    ],
    "alternative_encodings": [
        "cp1254",
        "cp1256",
        "cp1258",
        "iso8859_14",
        "iso8859_15",
        "iso8859_16",
        "iso8859_3",
        "iso8859_9",
        "latin_1",
        "mbcs"
    ],
    "language": "French",
    "alphabets": [
        "Basic Latin",
        "Latin-1 Supplement"
    ],
    "has_sig_or_bom": false,
    "chaos": 0.149,
    "coherence": 97.152,
    "unicode_path": null,
    "is_preferred": true
}
```

### Python
*Just print out normalized text*
```python
from charset_normalizer import from_path

results = from_path('./my_subtitle.srt')

print(str(results.best()))
```

*Upgrade your code without effort*
```python
from charset_normalizer import detect
```

The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) BC result possible.
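
As a quick illustration of that drop-in compatibility, here is a minimal sketch; the sample string and its encoding are made up for the example:

```python
from charset_normalizer import detect  # same call shape as chardet's detect

# Illustrative payload only: any raw bytes read from a file work the same way.
raw = "Bonjour, où est le café ?".encode("cp1252")

result = detect(raw)  # returns a chardet-compatible dict
print(result["encoding"], result["language"], result["confidence"])
```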

See the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)

## 😇 Why

When I started using Chardet, I noticed that it did not meet my expectations, and I wanted to propose a
reliable alternative using a completely different method. Also! I never back down from a good challenge!

I **don't care** about the **originating charset** encoding, because **two different tables** can
produce **two identical rendered strings.**
What I want is to get readable text, the best I can.

In a way, **I'm brute forcing text decoding.** How cool is that? 😎

Don't confuse the package **ftfy** with charset-normalizer or chardet. ftfy's goal is to repair Unicode strings, whereas charset-normalizer converts raw files in an unknown encoding to Unicode.

## 🍰 How

- Discard all charset encoding tables that could not fit the binary content.
- Measure the noise, or the mess, once opened (by chunks) with a corresponding charset encoding.
- Extract the matches with the lowest mess detected.
- Additionally, we measure coherence / probe for a language.

**Wait a minute**, what is noise/mess and coherence according to **YOU?**

*Noise:* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then
**I established** some ground rules about **what is obvious** when **it seems like** a mess (aka. defining noise in rendered text).
I know that my interpretation of what is noise is probably incomplete; feel free to contribute in order to
improve or rewrite it.

*Coherence:* For each language there is on Earth, we have computed ranked letter-appearance occurrences (the best we can). So I thought
that intel is worth something here. So I use those records against decoded text to check if I can detect intelligent design.
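
Both measures surface on every match. A minimal sketch inspecting them, assuming a made-up sample payload:

```python
from charset_normalizer import from_bytes

# Illustrative bytes only: some accented French text in a legacy code page.
payload = "Cette phrase comporte des caractères accentués : é, à, ô.".encode("cp1252")

best = from_bytes(payload).best()
if best is not None:
    print(best.encoding)   # the codec retained as the best fit
    print(best.chaos)      # noise/mess ratio, lower is better
    print(best.coherence)  # language coherence ratio, higher is better
    print(best.language)   # probed language, e.g. 'French'
```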

## ⚡ Known limitations

- Language detection is unreliable when the text contains two or more languages sharing identical letters. (e.g. HTML (English tags) + Turkish content (sharing Latin characters))
- Every charset detector heavily depends on sufficient content. In common cases, do not bother running detection on very tiny content.

## ⚠️ About Python EOLs

**If you are running:**

- Python >=2.7,<3.5: Unsupported
- Python 3.5: charset-normalizer < 2.1
- Python 3.6: charset-normalizer < 3.1
- Python 3.7: charset-normalizer < 4.0

Upgrade your Python interpreter as soon as possible.

## 👤 Contributing

Contributions, issues and feature requests are very much welcome.<br />
Feel free to check the [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.

## 📝 License

Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.

Characters frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)

## 💼 For Enterprise

Professional support for charset-normalizer is available as part of the [Tidelift
Subscription][1]. Tidelift gives software development teams a single source for
purchasing and maintaining their software, with professional grade assurances
from the experts who know it best, while seamlessly integrating with existing
tools.

[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme

[](https://www.bestpractices.dev/projects/7297)
# Changelog
|
||||||
|
All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||||
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
|
||||||
|
|
||||||
|
## [3.4.4](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.4) (2025-10-13)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Bound `setuptools` to a specific constraint `setuptools>=68,<=81`.
|
||||||
|
- Raised upper bound of mypyc for the optional pre-built extension to v1.18.2
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- `setuptools-scm` as a build dependency.
|
||||||
|
|
||||||
|
### Misc
|
||||||
|
- Enforced hashes in `dev-requirements.txt` and created `ci-requirements.txt` for security purposes.
|
||||||
|
- Additional pre-built wheels for riscv64, s390x, and armv7l architectures.
|
||||||
|
- Restore ` multiple.intoto.jsonl` in GitHub releases in addition to individual attestation file per wheel.
|
||||||
|
|
||||||
|
## [3.4.3](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.3) (2025-08-09)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- mypy(c) is no longer a required dependency at build time if `CHARSET_NORMALIZER_USE_MYPYC` isn't set to `1`. (#595) (#583)
|
||||||
|
- automatically lower confidence on small bytes samples that are not Unicode in `detect` output legacy function. (#391)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Custom build backend to overcome inability to mark mypy as an optional dependency in the build phase.
|
||||||
|
- Support for Python 3.14
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- sdist archive contained useless directories.
|
||||||
|
- automatically fallback on valid UTF-16 or UTF-32 even if the md says it's noisy. (#633)
|
||||||
|
|
||||||
|
### Misc
|
||||||
|
- SBOM are automatically published to the relevant GitHub release to comply with regulatory changes.
|
||||||
|
Each published wheel comes with its SBOM. We choose CycloneDX as the format.
|
||||||
|
- Prebuilt optimized wheel are no longer distributed by default for CPython 3.7 due to a change in cibuildwheel.
|
||||||
|
|
||||||
|
## [3.4.2](https://github.com/Ousret/charset_normalizer/compare/3.4.1...3.4.2) (2025-05-02)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Addressed the DeprecationWarning in our CLI regarding `argparse.FileType` by backporting the target class into the package. (#591)
|
||||||
|
- Improved the overall reliability of the detector with CJK Ideographs. (#605) (#587)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional mypyc compilation upgraded to version 1.15 for Python >= 3.8
|
||||||
|
|
||||||
|
## [3.4.1](https://github.com/Ousret/charset_normalizer/compare/3.4.0...3.4.1) (2024-12-24)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Project metadata are now stored using `pyproject.toml` instead of `setup.cfg` using setuptools as the build backend.
|
||||||
|
- Enforce annotation delayed loading for a simpler and consistent types in the project.
|
||||||
|
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- pre-commit configuration.
|
||||||
|
- noxfile.
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- `build-requirements.txt` as per using `pyproject.toml` native build configuration.
|
||||||
|
- `bin/integration.py` and `bin/serve.py` in favor of downstream integration test (see noxfile).
|
||||||
|
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
|
||||||
|
- Unused `utils.range_scan` function.
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Converting content to Unicode bytes may insert `utf_8` instead of preferred `utf-8`. (#572)
|
||||||
|
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+
|
||||||
|
|
||||||
|
## [3.4.0](https://github.com/Ousret/charset_normalizer/compare/3.3.2...3.4.0) (2024-10-08)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Argument `--no-preemptive` in the CLI to prevent the detector to search for hints.
|
||||||
|
- Support for Python 3.13 (#512)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything else than a CharsetMatch.
|
||||||
|
- Improved the general reliability of the detector based on user feedbacks. (#520) (#509) (#498) (#407) (#537)
|
||||||
|
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#381)
|
||||||
|
|
||||||
|
## [3.3.2](https://github.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2) (2023-10-31)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Unintentional memory usage regression when using large payload that match several encoding (#376)
|
||||||
|
- Regression on some detection case showcased in the documentation (#371)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Noise (md) probe that identify malformed arabic representation due to the presence of letters in isolated form (credit to my wife)
|
||||||
|
|
||||||
|
## [3.3.1](https://github.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1) (2023-10-22)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
|
||||||
|
- Improved the general detection reliability based on reports from the community
|
||||||
|
|
||||||
|
## [3.3.0](https://github.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0) (2023-09-30)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Allow to execute the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
|
||||||
|
- Support for 9 forgotten encoding that are supported by Python but unlisted in `encoding.aliases` as they have no alias (#323)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
|
||||||
|
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
|
||||||
|
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in \_\_lt\_\_ (#350)
|
||||||
|
|
||||||
|
## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Typehint for function `from_path` no longer enforce `PathLike` as its first argument
|
||||||
|
- Minor improvement over the global detection reliability
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Introduce function `is_binary` that relies on main capabilities, and optimized to detect binaries
|
||||||
|
- Propagate `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp` that allow a deeper control over the detection (default True)
|
||||||
|
- Explicit support for Python 3.12
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Edge case detection failure where a file would contain 'very-long' camel cased word (Issue #289)
|
||||||
|
|
||||||
|
## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Argument `should_rename_legacy` for legacy function `detect` and disregard any new arguments without errors (PR #262)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Support for Python 3.6 (PR #260)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional speedup provided by mypy/c 1.0.1
|
||||||
|
|
||||||
|
## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Multi-bytes cutter/chunk generator did not always cut correctly (PR #233)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Speedup provided by mypy/c 0.990 on Python >= 3.7
|
||||||
|
|
||||||
|
## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
|
||||||
|
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
|
||||||
|
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
|
||||||
|
- `normalizer --version` now specify if current version provide extra speedup (meaning mypyc compilation whl)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Build with static metadata using 'build' frontend
|
||||||
|
- Make the language detection stricter
|
||||||
|
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- CLI with opt --normalize fail when using full path for files
|
||||||
|
- TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
|
||||||
|
- Sphinx warnings when generating the documentation
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Coherence detector no longer return 'Simple English' instead return 'English'
|
||||||
|
- Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'
|
||||||
|
- Breaking: Method `first()` and `best()` from CharsetMatch
|
||||||
|
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
|
||||||
|
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
|
||||||
|
- Breaking: Top-level function `normalize`
|
||||||
|
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
|
||||||
|
- Support for the backport `unicodedata2`
|
||||||
|
|
||||||
|
## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
|
||||||
|
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
|
||||||
|
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Build with static metadata using 'build' frontend
|
||||||
|
- Make the language detection stricter
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- CLI with opt --normalize fail when using full path for files
|
||||||
|
- TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Coherence detector no longer return 'Simple English' instead return 'English'
|
||||||
|
- Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'
|
||||||
|
|
||||||
|
## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- `normalizer --version` now specify if current version provide extra speedup (meaning mypyc compilation whl)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Breaking: Method `first()` and `best()` from CharsetMatch
|
||||||
|
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Sphinx warnings when generating the documentation
|
||||||
|
|
||||||
|
## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
|
||||||
|
- Breaking: Top-level function `normalize`
|
||||||
|
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
|
||||||
|
- Support for the backport `unicodedata2`
|
||||||
|
|
||||||
|
## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Function `normalize` scheduled for removal in 3.0
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Removed useless call to decode in fn is_unprintable (#206)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Third-party library (i18n xgettext) crashing not recognizing utf_8 (PEP 263) with underscore from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)
|
||||||
|
|
||||||
|
## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Output the Unicode table version when running the CLI with `--version` (PR #194)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Re-use decoded buffer for single byte character sets from [@nijel](https://github.com/nijel) (PR #175)
|
||||||
|
- Fixing some performance bottlenecks from [@deedy5](https://github.com/deedy5) (PR #183)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Workaround potential bug in cpython with Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space (PR #175)
|
||||||
|
- CLI default threshold aligned with the API threshold from [@oleksandr-kuzmenko](https://github.com/oleksandr-kuzmenko) (PR #181)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Support for Python 3.5 (PR #192)
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Use of backport unicodedata from `unicodedata2` as Python is quickly catching up, scheduled for removal in 3.0 (PR #194)
|
||||||
|
|
||||||
|
## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- ASCII miss-detection on rare cases (PR #170)
|
||||||
|
|
||||||
|
## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Explicit support for Python 3.11 (PR #164)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- The logging behavior have been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)
|
||||||
|
|
||||||
|
## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fallback match entries might lead to UnicodeDecodeError for large bytes sequence (PR #154)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Skipping the language-detection (CD) on ASCII (PR #155)
|
||||||
|
|
||||||
|
## [2.0.9](https://github.com/Ousret/charset_normalizer/compare/2.0.8...2.0.9) (2021-12-03)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Moderating the logging impact (since 2.0.8) for specific environments (PR #147)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Wrong logging level applied when setting kwarg `explain` to True (PR #146)
|
||||||
|
|
||||||
|
## [2.0.8](https://github.com/Ousret/charset_normalizer/compare/2.0.7...2.0.8) (2021-11-24)
|
||||||
|
### Changed
|
||||||
|
- Improvement over Vietnamese detection (PR #126)
|
||||||
|
- MD improvement on trailing data and long foreign (non-pure latin) data (PR #124)
|
||||||
|
- Efficiency improvements in cd/alphabet_languages from [@adbar](https://github.com/adbar) (PR #122)
|
||||||
|
- call sum() without an intermediary list following PEP 289 recommendations from [@adbar](https://github.com/adbar) (PR #129)
|
||||||
|
- Code style as refactored by Sourcery-AI (PR #131)
|
||||||
|
- Minor adjustment on the MD around european words (PR #133)
|
||||||
|
- Remove and replace SRTs from assets / tests (PR #139)
|
||||||
|
- Initialize the library logger with a `NullHandler` by default from [@nmaynes](https://github.com/nmaynes) (PR #135)
|
||||||
|
- Setting kwarg `explain` to True will add provisionally (bounded to function lifespan) a specific stream handler (PR #135)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix large (misleading) sequence giving UnicodeDecodeError (PR #137)
|
||||||
|
- Avoid using too insignificant chunk (PR #137)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Add and expose function `set_logging_handler` to configure a specific StreamHandler from [@nmaynes](https://github.com/nmaynes) (PR #135)
|
||||||
|
- Add `CHANGELOG.md` entries, format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) (PR #141)
|
||||||
|
|
||||||
|
## [2.0.7](https://github.com/Ousret/charset_normalizer/compare/2.0.6...2.0.7) (2021-10-11)
|
||||||
|
### Added
|
||||||
|
- Add support for Kazakh (Cyrillic) language detection (PR #109)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Further, improve inferring the language from a given single-byte code page (PR #112)
|
||||||
|
- Vainly trying to leverage PEP263 when PEP3120 is not supported (PR #116)
|
||||||
|
- Refactoring for potential performance improvements in loops from [@adbar](https://github.com/adbar) (PR #113)
|
||||||
|
- Various detection improvement (MD+CD) (PR #117)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Remove redundant logging entry about detected language(s) (PR #115)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)
|
||||||
|
|
||||||
|
## [2.0.6](https://github.com/Ousret/charset_normalizer/compare/2.0.5...2.0.6) (2021-09-18)
|
||||||
|
### Fixed
|
||||||
|
- Unforeseen regression with the loss of the backward-compatibility with some older minor of Python 3.5.x (PR #100)
|
||||||
|
- Fix CLI crash when using --minimal output in certain cases (PR #103)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)
|
||||||
|
|
||||||
|
## [2.0.5](https://github.com/Ousret/charset_normalizer/compare/2.0.4...2.0.5) (2021-09-14)
|
||||||
|
### Changed
|
||||||
|
- The project now comply with: flake8, mypy, isort and black to ensure a better overall quality (PR #81)
|
||||||
|
- The BC-support with v1.x was improved, the old staticmethods are restored (PR #82)
|
||||||
|
- The Unicode detection is slightly improved (PR #93)
|
||||||
|
- Add syntax sugar \_\_bool\_\_ for results CharsetMatches list-container (PR #91)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- The project no longer raise warning on tiny content given for detection, will be simply logged as warning instead (PR #92)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- In some rare case, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection (PR #95)
|
||||||
|
- Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection (PR #96)
|
||||||
|
- The MANIFEST.in was not exhaustive (PR #78)
|
||||||
|
|
||||||
|
## [2.0.4](https://github.com/Ousret/charset_normalizer/compare/2.0.3...2.0.4) (2021-07-30)

### Fixed

- The CLI no longer raises an unexpected exception when no encoding has been found (PR #70)
- Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
- The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (PR #72)
- Submatch factoring could be wrong in rare edge cases (PR #72)
- Multiple files given to the CLI were ignored when publishing results to STDOUT (after the first path) (PR #72)
- Fix line endings from CRLF to LF for certain project files (PR #67)

### Changed

- Adjust the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
- Allow fallback on a specified encoding if any (PR #71)

## [2.0.3](https://github.com/Ousret/charset_normalizer/compare/2.0.2...2.0.3) (2021-07-16)

### Changed

- Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results, especially for ASCII. (PR #63)
- According to the community's wishes, the detection will fall back on ASCII or UTF-8 as a last resort. (PR #64)

## [2.0.2](https://github.com/Ousret/charset_normalizer/compare/2.0.1...2.0.2) (2021-07-15)

### Fixed

- Empty or too-small JSON payload mis-detection fixed. Report from [@tseaver](https://github.com/tseaver) (PR #59)

### Changed

- Don't inject unicodedata2 into sys.modules, from [@akx](https://github.com/akx) (PR #57)

## [2.0.1](https://github.com/Ousret/charset_normalizer/compare/2.0.0...2.0.1) (2021-07-13)

### Fixed

- Make it work where there isn't a filesystem available by dropping the assets frequencies.json. Report from [@sethmlarson](https://github.com/sethmlarson). (PR #55)
- Using explain=False would permanently disable the verbose output in the current runtime (PR #47)
- One log entry (language target preemptive) was not shown in logs when using explain=True (PR #47)
- Fix an undesired exception (ValueError) on getitem of a CharsetMatches instance (PR #52)

### Changed

- The public function normalize's default argument values were not aligned with from_bytes (PR #53)

### Added

- You may now use charset aliases in the cp_isolation and cp_exclusion arguments (PR #47)

## [2.0.0](https://github.com/Ousret/charset_normalizer/compare/1.4.1...2.0.0) (2021-07-02)

### Changed

- 4 to 5 times faster than the previous 1.4.0 release. At least 2x faster than Chardet.
- Emphasis has been placed on UTF-8 detection; it should perform nearly instantaneously.
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
- The detection mechanism has been slightly improved; Turkish content is now detected correctly (most of the time).
- The program has been rewritten to improve readability and maintainability (now using static typing).
- utf_7 detection has been reinstated.

### Removed

- This package no longer requires anything when used with Python 3.5 (dropped cached_property)
- Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbo-Croatian.
- The exception hook on UnicodeDecodeError has been removed.

### Deprecated

- Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0

### Fixed

- The CLI output used the relative path of the file(s); it should be absolute.

## [1.4.1](https://github.com/Ousret/charset_normalizer/compare/1.4.0...1.4.1) (2021-05-28)

### Fixed

- Logger configuration/usage no longer conflicts with others (PR #44)

## [1.4.0](https://github.com/Ousret/charset_normalizer/compare/1.3.9...1.4.0) (2021-05-21)

### Removed

- Using standard logging instead of the package loguru.
- Dropping the nose test framework in favor of the maintained pytest.
- Choosing not to use the dragonmapper package to help with gibberish Chinese/CJK text.
- Require cached_property only for Python 3.5 due to a constraint; dropping it for every other interpreter version.
- Stop supporting UTF-7 that does not contain a SIG.
- Dropping PrettyTable; replaced with pure JSON output in the CLI.

### Fixed

- The BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even if obviously present, due to the sub-match factoring process.
- Not searching properly for the BOM when trying the utf32/16 parent codecs.

### Changed

- Improving the package's final size by compressing frequencies.json.
- Huge improvement on the largest payloads.

### Added

- CLI now produces JSON-consumable output.
- Return ASCII if the given sequences fit, given reasonable confidence.

## [1.3.9](https://github.com/Ousret/charset_normalizer/compare/1.3.8...1.3.9) (2021-05-13)

### Fixed

- In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload (PR #40)

## [1.3.8](https://github.com/Ousret/charset_normalizer/compare/1.3.7...1.3.8) (2021-05-12)

### Fixed

- An empty payload given for detection could cause an exception when trying to access the `alphabets` property. (PR #39)

## [1.3.7](https://github.com/Ousret/charset_normalizer/compare/1.3.6...1.3.7) (2021-05-12)

### Fixed

- The legacy detect function should return UTF-8-SIG if a SIG is present in the payload. (PR #38)

## [1.3.6](https://github.com/Ousret/charset_normalizer/compare/1.3.5...1.3.6) (2021-02-09)

### Changed

- Amend the previous release to allow prettytable 2.0 (PR #35)

## [1.3.5](https://github.com/Ousret/charset_normalizer/compare/1.3.4...1.3.5) (2021-02-08)

### Fixed

- Fix an error while using the package with a Python pre-release interpreter (PR #33)

### Changed

- Dependencies refactoring, constraints revised.

### Added

- Add Python 3.9 and 3.10 to the supported interpreters

MIT License

Copyright (c) 2025 TAHRI Ahmed R.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Binary file not shown.

@ -0,0 +1,764 @@
Metadata-Version: 2.4
Name: charset-normalizer
Version: 3.4.4
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
Author-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
Maintainer-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
License: MIT
Project-URL: Changelog, https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/
Project-URL: Code, https://github.com/jawah/charset_normalizer
Project-URL: Issue tracker, https://github.com/jawah/charset_normalizer/issues
Keywords: encoding,charset,charset-detector,detector,normalization,unicode,chardet,detect
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: unicode-backport
Dynamic: license-file

<h1 align="center">Charset Detection, for Everyone 👋</h1>

<p align="center">
  <sup>The Real First Universal Charset Detector</sup><br>
  <a href="https://pypi.org/project/charset-normalizer">
    <img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
  </a>
  <a href="https://pepy.tech/project/charset-normalizer/">
    <img alt="Download Count Total" src="https://static.pepy.tech/badge/charset-normalizer/month" />
  </a>
  <a href="https://bestpractices.coreinfrastructure.org/projects/7297">
    <img src="https://bestpractices.coreinfrastructure.org/projects/7297/badge">
  </a>
</p>
<p align="center">
  <sup><i>Featured Packages</i></sup><br>
  <a href="https://github.com/jawah/niquests">
    <img alt="Static Badge" src="https://img.shields.io/badge/Niquests-Most_Advanced_HTTP_Client-cyan">
  </a>
  <a href="https://github.com/jawah/wassima">
    <img alt="Static Badge" src="https://img.shields.io/badge/Wassima-Certifi_Replacement-cyan">
  </a>
</p>
<p align="center">
  <sup><i>In other languages (unofficial ports, by the community)</i></sup><br>
  <a href="https://github.com/nickspring/charset-normalizer-rs">
    <img alt="Static Badge" src="https://img.shields.io/badge/Rust-red">
  </a>
</p>

> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
> I'm trying to resolve the issue by taking a new approach.
> All IANA character set names for which the Python core library provides codecs are supported.

<p align="center">
  >>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
</p>

This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

| Feature                                          | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
|--------------------------------------------------|:---------------------------------------------:|:------------------:|:-----------------------------------------------:|
| `Fast`                                           | ❌ | ✅ | ✅ |
| `Universal**`                                    | ❌ | ✅ | ❌ |
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
| `Reliable` **with** distinguishable standards    | ✅ | ✅ | ✅ |
| `License`                                        | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
| `Native Python`                                  | ✅ | ✅ | ❌ |
| `Detect spoken language`                         | ❌ | ✅ | N/A |
| `UnicodeDecodeError Safety`                      | ❌ | ✅ | ❌ |
| `Whl Size (min)`                                 | 193.6 kB | 42 kB | ~200 kB |
| `Supported Encoding`                             | 33 | 🎉 [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |

<p align="center">
  <img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
</p>

*\*\* : They clearly use encoding-specific code, even if it covers most of the encodings in use.*<br>

## ⚡ Performance

This package offers better performance than its counterpart Chardet. Here are some numbers.

| Package                                       | Accuracy | Mean per file (ms) | File per sec (est) |
|-----------------------------------------------|:--------:|:------------------:|:------------------:|
| [chardet](https://github.com/chardet/chardet) | 86 %     | 63 ms              | 16 file/sec        |
| charset-normalizer                            | **98 %** | **10 ms**          | 100 file/sec       |

| Package                                       | 99th percentile | 95th percentile | 50th percentile |
|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
| [chardet](https://github.com/chardet/chardet) | 265 ms          | 71 ms           | 7 ms            |
| charset-normalizer                            | 100 ms          | 50 ms           | 5 ms            |

_Updated as of December 2024 using CPython 3.12_

Chardet's performance on larger files (1 MB+) is very poor. Expect a huge difference on large payloads.

> Stats are generated using 400+ files with default parameters. For more details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays heavily depend on your CPU capabilities. The factors should remain the same.
> Keep in mind that the stats are generous and that Chardet's accuracy vs. ours is measured using Chardet's initial capability
> (e.g. supported encodings). Challenge them if you want.

## ✨ Installation

Using pip:

```sh
pip install charset-normalizer -U
```

## 🚀 Basic Usage

### CLI
This package comes with a CLI.

```
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
                  file [file ...]

The Real First Universal Charset Detector. Discover originating encoding used
on text file. Normalize text to unicode.

positional arguments:
  files                 File(s) to be analysed

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display complementary information about file if any.
                        Stdout will contain logs about the detection process.
  -a, --with-alternative
                        Output complementary possibilities if any. Top-level
                        JSON WILL be a list.
  -n, --normalize       Permit to normalize input file. If not set, program
                        does not write anything.
  -m, --minimal         Only output the charset detected to STDOUT. Disabling
                        JSON output.
  -r, --replace         Replace file when trying to normalize it instead of
                        creating a new one.
  -f, --force           Replace file without asking if you are sure, use this
                        flag with caution.
  -t THRESHOLD, --threshold THRESHOLD
                        Define a custom maximum amount of chaos allowed in
                        decoded content. 0. <= chaos <= 1.
  --version             Show version information and exit.
```

```bash
normalizer ./data/sample.1.fr.srt
```

or

```bash
python -m charset_normalizer ./data/sample.1.fr.srt
```

🎉 Since version 1.4.0, the CLI produces an easily usable stdout result in JSON format.

```json
{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "encoding_aliases": [
        "1252",
        "windows_1252"
    ],
    "alternative_encodings": [
        "cp1254",
        "cp1256",
        "cp1258",
        "iso8859_14",
        "iso8859_15",
        "iso8859_16",
        "iso8859_3",
        "iso8859_9",
        "latin_1",
        "mbcs"
    ],
    "language": "French",
    "alphabets": [
        "Basic Latin",
        "Latin-1 Supplement"
    ],
    "has_sig_or_bom": false,
    "chaos": 0.149,
    "coherence": 97.152,
    "unicode_path": null,
    "is_preferred": true
}
```

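
That report is easy to consume programmatically. A minimal sketch that shells out to the CLI and reads the fields shown above, assuming `normalizer` is on your PATH and the sample file exists:

```python
import json
import subprocess

# Invoke the CLI and capture its JSON report (shape shown above).
completed = subprocess.run(
    ["normalizer", "./data/sample.1.fr.srt"],
    capture_output=True,
    text=True,
    check=True,
)
report = json.loads(completed.stdout)

print(report["encoding"], report["language"])  # e.g. "cp1252 French"
```
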
### Python

*Just print out normalized text*

```python
from charset_normalizer import from_path

results = from_path('./my_subtitle.srt')

print(str(results.best()))
```
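
`best()` returns a `CharsetMatch` (or `None` when nothing fits) that exposes the detection details beyond the decoded text. A minimal sketch, with illustrative printed values:

```python
from charset_normalizer import from_path

# best() returns the most plausible CharsetMatch, or None if nothing fit.
best_guess = from_path('./my_subtitle.srt').best()

if best_guess is not None:
    print(best_guess.encoding)   # e.g. "cp1252"
    print(best_guess.language)   # e.g. "French"
    print(best_guess.chaos)      # mess ratio; lower is better
    utf8_payload = str(best_guess).encode("utf-8")  # re-encode as UTF-8 bytes
```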

*Upgrade your code without effort*

```python
from charset_normalizer import detect
```

The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) backward-compatible result possible.
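
For instance, the drop-in call mirrors chardet's dict result. A short sketch; the exact values printed are illustrative:

```python
from charset_normalizer import detect

# Same call shape as chardet.detect(): bytes in, dict out.
result = detect('Bonjour, où êtes-vous ?'.encode('cp1252'))

print(result['encoding'], result['language'], result['confidence'])
```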

See the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)

## 😇 Why

When I started using Chardet, I noticed that it did not meet my expectations, and I wanted to propose a
reliable alternative using a completely different method. Also! I never back down on a good challenge!

I **don't care** about the **originating charset** encoding, because **two different tables** can
produce **two identical rendered strings.**
What I want is to get readable text, the best I can.

In a way, **I'm brute forcing text decoding.** How cool is that? 😎

Don't confuse the package **ftfy** with charset-normalizer or chardet. ftfy's goal is to repair Unicode strings, whereas charset-normalizer's is to convert a raw file in an unknown encoding to Unicode.

## 🍰 How

- Discard all charset encoding tables that could not fit the binary content.
- Measure the noise, or mess, once opened (by chunks) with a corresponding charset encoding.
- Extract the matches with the lowest mess detected.
- Additionally, we measure coherence / probe for a language (a naive sketch follows this list).
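
To make the pipeline concrete, here is a deliberately naive sketch of that brute-force loop. It is not the library's implementation: the candidate list and the mess heuristic (counting unprintable characters) are stand-in assumptions.

```python
import codecs
from typing import Optional

# Stand-in candidate tables; the real library walks far more encodings.
CANDIDATES = ["utf_8", "cp1252", "latin_1", "utf_16", "mac_roman"]

def naive_mess(text: str) -> float:
    """Stand-in heuristic: ratio of unprintable, non-whitespace characters."""
    if not text:
        return 1.0
    bad = sum(1 for ch in text if not ch.isprintable() and not ch.isspace())
    return bad / len(text)

def naive_detect(payload: bytes) -> Optional[str]:
    best_codec, best_score = None, 1.0
    for name in CANDIDATES:
        try:
            decoded = codecs.decode(payload, name)  # discard tables that cannot fit
        except (UnicodeDecodeError, LookupError):
            continue
        score = naive_mess(decoded)  # keep the lowest-mess candidate
        if score < best_score:
            best_codec, best_score = name, score
    return best_codec

print(naive_detect("héllo wörld".encode("cp1252")))  # -> "cp1252" here
```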

**Wait a minute**, what are noise/mess and coherence according to **YOU?**

*Noise:* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then
**I established** some ground rules about **what is obvious** when **it seems like** a mess (i.e., defining noise in rendered text).
I know that my interpretation of noise is probably incomplete; feel free to contribute in order to
improve or rewrite it.

*Coherence:* For each language on Earth, we have computed ranked letter-appearance occurrences (the best we can). So I thought
that intel is worth something here. I use those records against the decoded text to check whether I can detect intelligent design (see the sketch below).

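A similarly naive sketch of the coherence probe, assuming a hypothetical top-letter ranking for French; the real frequency records are far richer:

```python
from collections import Counter

# Hypothetical reference ranking: common French letters, most frequent first.
FRENCH_TOP_LETTERS = ["e", "a", "s", "i", "t", "n", "r", "u", "l", "o"]

def naive_coherence(decoded: str) -> float:
    """Share of the reference letters that rank highly in the decoded text."""
    counts = Counter(ch for ch in decoded.lower() if ch.isalpha())
    observed = {letter for letter, _ in counts.most_common(len(FRENCH_TOP_LETTERS))}
    hits = sum(1 for letter in FRENCH_TOP_LETTERS if letter in observed)
    return hits / len(FRENCH_TOP_LETTERS)

print(naive_coherence("Le texte est lisible et naturel."))  # closer to 1.0 = more French-like
```
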
## ⚡ Known limitations

- Language detection is unreliable when the text contains two or more languages sharing identical letters (e.g. HTML (English tags) + Turkish content (sharing Latin characters)).
- Every charset detector heavily depends on sufficient content. In common cases, do not bother running detection on very tiny content.

## ⚠️ About Python EOLs

**If you are running:**

- Python >=2.7,<3.5: Unsupported
- Python 3.5: charset-normalizer < 2.1
- Python 3.6: charset-normalizer < 3.1
- Python 3.7: charset-normalizer < 4.0 (see the pinning example below)
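
A hypothetical pin for one of these legacy interpreters (here, Python 3.6):

```sh
# Hypothetical pin for a legacy Python 3.6 environment
pip install "charset-normalizer<3.1"
```
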
Upgrade your Python interpreter as soon as possible.

## 👤 Contributing

Contributions, issues, and feature requests are very much welcome.<br />
Feel free to check the [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.

## 📝 License

Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.

Character frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)

## 💼 For Enterprise

Professional support for charset-normalizer is available as part of the [Tidelift
Subscription][1]. Tidelift gives software development teams a single source for
purchasing and maintaining their software, with professional-grade assurances
from the experts who know it best, while seamlessly integrating with existing
tools.

[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme

[](https://www.bestpractices.dev/projects/7297)

# Changelog

All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [3.4.4](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.4) (2025-10-13)

### Changed

- Bound `setuptools` to a specific constraint `setuptools>=68,<=81`.
- Raised the upper bound of mypyc for the optional pre-built extension to v1.18.2

### Removed

- `setuptools-scm` as a build dependency.

### Misc

- Enforced hashes in `dev-requirements.txt` and created `ci-requirements.txt` for security purposes.
- Additional pre-built wheels for the riscv64, s390x, and armv7l architectures.
- Restore `multiple.intoto.jsonl` in GitHub releases in addition to the individual attestation file per wheel.

## [3.4.3](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.3) (2025-08-09)

### Changed

- mypy(c) is no longer a required dependency at build time if `CHARSET_NORMALIZER_USE_MYPYC` isn't set to `1`. (#595) (#583)
- Automatically lower the confidence on small byte samples that are not Unicode in the legacy `detect` output function. (#391)

### Added

- Custom build backend to overcome the inability to mark mypy as an optional dependency in the build phase.
- Support for Python 3.14

### Fixed

- The sdist archive contained useless directories.
- Automatically fall back on valid UTF-16 or UTF-32 even if the MD says it's noisy. (#633)

### Misc

- SBOMs are automatically published to the relevant GitHub release to comply with regulatory changes.
  Each published wheel comes with its SBOM. We chose CycloneDX as the format.
- Prebuilt optimized wheels are no longer distributed by default for CPython 3.7 due to a change in cibuildwheel.

## [3.4.2](https://github.com/Ousret/charset_normalizer/compare/3.4.1...3.4.2) (2025-05-02)

### Fixed

- Addressed the DeprecationWarning in our CLI regarding `argparse.FileType` by backporting the target class into the package. (#591)
- Improved the overall reliability of the detector with CJK Ideographs. (#605) (#587)

### Changed

- Optional mypyc compilation upgraded to version 1.15 for Python >= 3.8

## [3.4.1](https://github.com/Ousret/charset_normalizer/compare/3.4.0...3.4.1) (2024-12-24)

### Changed

- Project metadata are now stored using `pyproject.toml` instead of `setup.cfg`, using setuptools as the build backend.
- Enforce delayed annotation loading for simpler and more consistent types in the project.
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8

### Added

- pre-commit configuration.
- noxfile.

### Removed

- `build-requirements.txt`, as per using the `pyproject.toml` native build configuration.
- `bin/integration.py` and `bin/serve.py` in favor of downstream integration tests (see the noxfile).
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
- Unused `utils.range_scan` function.

### Fixed

- Converting content to Unicode bytes may insert `utf_8` instead of the preferred `utf-8`. (#572)
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+

## [3.4.0](https://github.com/Ousret/charset_normalizer/compare/3.3.2...3.4.0) (2024-10-08)

### Added

- Argument `--no-preemptive` in the CLI to prevent the detector from searching for hints.
- Support for Python 3.13 (#512)

### Fixed

- Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
- Improved the general reliability of the detector based on user feedback. (#520) (#509) (#498) (#407) (#537)
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#381)

## [3.3.2](https://github.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2) (2023-10-31)

### Fixed

- Unintentional memory usage regression when using a large payload that matches several encodings (#376)
- Regression on some detection cases showcased in the documentation (#371)

### Added

- Noise (MD) probe that identifies malformed Arabic representation due to the presence of letters in isolated form (credit to my wife)

## [3.3.1](https://github.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1) (2023-10-22)

### Changed

- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
- Improved the general detection reliability based on reports from the community

## [3.3.0](https://github.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0) (2023-09-30)

### Added

- Allow executing the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
- Support for 9 forgotten encodings that are supported by Python but unlisted in `encodings.aliases`, as they have no alias (#323)

### Removed

- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant

### Changed

- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8

### Fixed

- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close, due to an unreachable condition in \_\_lt\_\_ (#350)

## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)

### Changed

- The type hint for function `from_path` no longer enforces `PathLike` as its first argument
- Minor improvement to the global detection reliability

### Added

- Introduce function `is_binary`, which relies on the main capabilities and is optimized to detect binaries
- Propagate the `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp`, allowing deeper control over the detection (default True)
- Explicit support for Python 3.12

### Fixed

- Edge case detection failure where a file would contain a 'very-long' camel-cased word (Issue #289)

## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)

### Added

- Argument `should_rename_legacy` for legacy function `detect`, and disregard any new arguments without errors (PR #262)

### Removed

- Support for Python 3.6 (PR #260)

### Changed

- Optional speedup provided by mypy/c 1.0.1

## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)

### Fixed

- Multi-byte cutter/chunk generator did not always cut correctly (PR #233)

### Changed

- Speedup provided by mypy/c 0.990 on Python >= 3.7

## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)

### Added

- Extend the capability of explain=True when cp_isolation contains at most two entries (min one); it will log details of the mess-detector results
- Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
- `normalizer --version` now specifies whether the current version provides extra speedup (meaning a mypyc-compiled wheel)

### Changed

- Build with static metadata using the 'build' frontend
- Make the language detection stricter
- Optional: Module `md.py` can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

### Fixed

- CLI with opt --normalize failed when using full paths for files
- TooManyAccentuatedPlugin induced false positives on the mess detection when too few alpha characters had been fed to it
- Sphinx warnings when generating the documentation

### Removed

- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)

### Added

- Extend the capability of explain=True when cp_isolation contains at most two entries (min one); it will log details of the mess-detector results
- Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio

### Changed

- Build with static metadata using the 'build' frontend
- Make the language detection stricter

### Fixed

- CLI with opt --normalize failed when using full paths for files
- TooManyAccentuatedPlugin induced false positives on the mess detection when too few alpha characters had been fed to it

### Removed

- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead

## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)

### Added

- `normalizer --version` now specifies whether the current version provides extra speedup (meaning a mypyc-compiled wheel)

### Removed

- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)

### Fixed

- Sphinx warnings when generating the documentation

## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)

### Changed

- Optional: Module `md.py` can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

### Removed

- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)

### Deprecated

- Function `normalize` scheduled for removal in 3.0

### Changed

- Removed a useless call to decode in fn is_unprintable (#206)

### Fixed

- Third-party library (i18n xgettext) crashing by not recognizing utf_8 (PEP 263) with an underscore, from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)

## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)

### Added

- Output the Unicode table version when running the CLI with `--version` (PR #194)

### Changed

- Re-use the decoded buffer for single-byte character sets from [@nijel](https://github.com/nijel) (PR #175)
- Fixed some performance bottlenecks from [@deedy5](https://github.com/deedy5) (PR #183)

### Fixed

- Work around a potential bug in CPython where Zero Width No-Break Space, located in Arabic Presentation Forms-B, Unicode 1.1, is not acknowledged as a space (PR #175)
- CLI default threshold aligned with the API threshold from [@oleksandr-kuzmenko](https://github.com/oleksandr-kuzmenko) (PR #181)

### Removed

- Support for Python 3.5 (PR #192)

### Deprecated

- Use of the backport unicodedata from `unicodedata2`, as Python is quickly catching up; scheduled for removal in 3.0 (PR #194)

## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)

### Fixed

- ASCII mis-detection in rare cases (PR #170)

## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)

### Added

- Explicit support for Python 3.11 (PR #164)

### Changed

- The logging behavior has been completely reviewed; it now uses only TRACE and DEBUG levels (PR #163 #165)

## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)

### Fixed

- Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)

### Changed

- Skip the language detection (CD) on ASCII (PR #155)

## [2.0.9](https://github.com/Ousret/charset_normalizer/compare/2.0.8...2.0.9) (2021-12-03)

### Changed

- Moderating the logging impact (since 2.0.8) for specific environments (PR #147)

### Fixed

- Wrong logging level applied when setting kwarg `explain` to True (PR #146)

## [2.0.8](https://github.com/Ousret/charset_normalizer/compare/2.0.7...2.0.8) (2021-11-24)

### Changed

- Improvement over Vietnamese detection (PR #126)
- MD improvement on trailing data and long foreign (non-pure Latin) data (PR #124)
- Efficiency improvements in cd/alphabet_languages from [@adbar](https://github.com/adbar) (PR #122)
- Call sum() without an intermediary list, following PEP 289 recommendations, from [@adbar](https://github.com/adbar) (PR #129)
- Code style as refactored by Sourcery-AI (PR #131)
- Minor adjustment on the MD around european words (PR #133)
|
||||||
|
- Remove and replace SRTs from assets / tests (PR #139)
|
||||||
|
- Initialize the library logger with a `NullHandler` by default from [@nmaynes](https://github.com/nmaynes) (PR #135)
|
||||||
|
- Setting kwarg `explain` to True will add provisionally (bounded to function lifespan) a specific stream handler (PR #135)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix large (misleading) sequence giving UnicodeDecodeError (PR #137)
|
||||||
|
- Avoid using too insignificant chunk (PR #137)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Add and expose function `set_logging_handler` to configure a specific StreamHandler from [@nmaynes](https://github.com/nmaynes) (PR #135)
|
||||||
|
- Add `CHANGELOG.md` entries, format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) (PR #141)
|
||||||
|
|
||||||
|
## [2.0.7](https://github.com/Ousret/charset_normalizer/compare/2.0.6...2.0.7) (2021-10-11)
|
||||||
|
### Added
|
||||||
|
- Add support for Kazakh (Cyrillic) language detection (PR #109)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Further, improve inferring the language from a given single-byte code page (PR #112)
|
||||||
|
- Vainly trying to leverage PEP263 when PEP3120 is not supported (PR #116)
|
||||||
|
- Refactoring for potential performance improvements in loops from [@adbar](https://github.com/adbar) (PR #113)
|
||||||
|
- Various detection improvement (MD+CD) (PR #117)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- Remove redundant logging entry about detected language(s) (PR #115)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)
|
||||||
|
|
||||||
|
## [2.0.6](https://github.com/Ousret/charset_normalizer/compare/2.0.5...2.0.6) (2021-09-18)
|
||||||
|
### Fixed
|
||||||
|
- Unforeseen regression with the loss of the backward-compatibility with some older minor of Python 3.5.x (PR #100)
|
||||||
|
- Fix CLI crash when using --minimal output in certain cases (PR #103)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)
|
||||||
|
|
||||||
|
## [2.0.5](https://github.com/Ousret/charset_normalizer/compare/2.0.4...2.0.5) (2021-09-14)
|
||||||
|
### Changed
|
||||||
|
- The project now comply with: flake8, mypy, isort and black to ensure a better overall quality (PR #81)
|
||||||
|
- The BC-support with v1.x was improved, the old staticmethods are restored (PR #82)
|
||||||
|
- The Unicode detection is slightly improved (PR #93)
|
||||||
|
- Add syntax sugar \_\_bool\_\_ for results CharsetMatches list-container (PR #91)
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- The project no longer raise warning on tiny content given for detection, will be simply logged as warning instead (PR #92)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- In some rare case, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection (PR #95)
|
||||||
|
- Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection (PR #96)
|
||||||
|
- The MANIFEST.in was not exhaustive (PR #78)
|
||||||
|
|
||||||
|
## [2.0.4](https://github.com/Ousret/charset_normalizer/compare/2.0.3...2.0.4) (2021-07-30)
|
||||||
|
### Fixed
|
||||||
|
- The CLI no longer raise an unexpected exception when no encoding has been found (PR #70)
|
||||||
|
- Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
|
||||||
|
- The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (PR #72)
|
||||||
|
- Submatch factoring could be wrong in rare edge cases (PR #72)
|
||||||
|
- Multiple files given to the CLI were ignored when publishing results to STDOUT. (After the first path) (PR #72)
|
||||||
|
- Fix line endings from CRLF to LF for certain project files (PR #67)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Adjust the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
|
||||||
|
- Allow fallback on specified encoding if any (PR #71)
|
||||||
|
|
||||||
|
## [2.0.3](https://github.com/Ousret/charset_normalizer/compare/2.0.2...2.0.3) (2021-07-16)
|
||||||
|
### Changed
|
||||||
|
- Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results. Especially ASCII. (PR #63)
|
||||||
|
- According to the community wishes, the detection will fall back on ASCII or UTF-8 in a last-resort case. (PR #64)
|
||||||
|
|
||||||
|
## [2.0.2](https://github.com/Ousret/charset_normalizer/compare/2.0.1...2.0.2) (2021-07-15)
|
||||||
|
### Fixed
|
||||||
|
- Empty/Too small JSON payload miss-detection fixed. Report from [@tseaver](https://github.com/tseaver) (PR #59)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Don't inject unicodedata2 into sys.modules from [@akx](https://github.com/akx) (PR #57)
|
||||||
|
|
||||||
|
## [2.0.1](https://github.com/Ousret/charset_normalizer/compare/2.0.0...2.0.1) (2021-07-13)
|
||||||
|
### Fixed
|
||||||
|
- Make it work where there isn't a filesystem available, dropping assets frequencies.json. Report from [@sethmlarson](https://github.com/sethmlarson). (PR #55)
|
||||||
|
- Using explain=False permanently disable the verbose output in the current runtime (PR #47)
|
||||||
|
- One log entry (language target preemptive) was not show in logs when using explain=True (PR #47)
|
||||||
|
- Fix undesired exception (ValueError) on getitem of instance CharsetMatches (PR #52)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Public function normalize default args values were not aligned with from_bytes (PR #53)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- You may now use charset aliases in cp_isolation and cp_exclusion arguments (PR #47)
|
||||||
|
|
||||||
|
## [2.0.0](https://github.com/Ousret/charset_normalizer/compare/1.4.1...2.0.0) (2021-07-02)
|
||||||
|
### Changed
|
||||||
|
- 4x to 5 times faster than the previous 1.4.0 release. At least 2x faster than Chardet.
|
||||||
|
- Accent has been made on UTF-8 detection, should perform rather instantaneous.
|
||||||
|
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
|
||||||
|
- The detection mechanism has been slightly improved, now Turkish content is detected correctly (most of the time)
|
||||||
|
- The program has been rewritten to ease the readability and maintainability. (+Using static typing)+
|
||||||
|
- utf_7 detection has been reinstated.
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- This package no longer require anything when used with Python 3.5 (Dropped cached_property)
|
||||||
|
- Removed support for these languages: Catalan, Esperanto, Kazakh, Baque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbocroatian.
|
||||||
|
- The exception hook on UnicodeDecodeError has been removed.
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
- Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- The CLI output used the relative path of the file(s). Should be absolute.
|
||||||
|
|
||||||
|
## [1.4.1](https://github.com/Ousret/charset_normalizer/compare/1.4.0...1.4.1) (2021-05-28)
|
||||||
|
### Fixed
|
||||||
|
- Logger configuration/usage no longer conflict with others (PR #44)
|
||||||
|
|
||||||
|
## [1.4.0](https://github.com/Ousret/charset_normalizer/compare/1.3.9...1.4.0) (2021-05-21)
|
||||||
|
### Removed
|
||||||
|
- Using standard logging instead of using the package loguru.
|
||||||
|
- Dropping nose test framework in favor of the maintained pytest.
|
||||||
|
- Choose to not use dragonmapper package to help with gibberish Chinese/CJK text.
|
||||||
|
- Require cached_property only for Python 3.5 due to constraint. Dropping for every other interpreter version.
|
||||||
|
- Stop support for UTF-7 that does not contain a SIG.
|
||||||
|
- Dropping PrettyTable, replaced with pure JSON output in CLI.
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even if obviously present. Due to the sub-match factoring process.
|
||||||
|
- Not searching properly for the BOM when trying utf32/16 parent codec.
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Improving the package final size by compressing frequencies.json.
|
||||||
|
- Huge improvement over the larges payload.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- CLI now produces JSON consumable output.
|
||||||
|
- Return ASCII if given sequences fit. Given reasonable confidence.
|
||||||
|
|
||||||
|
## [1.3.9](https://github.com/Ousret/charset_normalizer/compare/1.3.8...1.3.9) (2021-05-13)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload (PR #40)
|
||||||
|
|
||||||
|
## [1.3.8](https://github.com/Ousret/charset_normalizer/compare/1.3.7...1.3.8) (2021-05-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Empty given payload for detection may cause an exception if trying to access the `alphabets` property. (PR #39)
|
||||||
|
|
||||||
|
## [1.3.7](https://github.com/Ousret/charset_normalizer/compare/1.3.6...1.3.7) (2021-05-12)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- The legacy detect function should return UTF-8-SIG if sig is present in the payload. (PR #38)
|
||||||
|
|
||||||
|
## [1.3.6](https://github.com/Ousret/charset_normalizer/compare/1.3.5...1.3.6) (2021-02-09)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Amend the previous release to allow prettytable 2.0 (PR #35)
|
||||||
|
|
||||||
|
## [1.3.5](https://github.com/Ousret/charset_normalizer/compare/1.3.4...1.3.5) (2021-02-08)
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Fix error while using the package with a python pre-release interpreter (PR #33)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Dependencies refactoring, constraints revised.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Add python 3.9 and 3.10 to the supported interpreters
|
||||||
|
|
||||||
|
MIT License
|
||||||
|
|
||||||
|
Copyright (c) 2025 TAHRI Ahmed R.
|
||||||
|
|
||||||
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||||
|
of this software and associated documentation files (the "Software"), to deal
|
||||||
|
in the Software without restriction, including without limitation the rights
|
||||||
|
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||||
|
copies of the Software, and to permit persons to whom the Software is
|
||||||
|
furnished to do so, subject to the following conditions:
|
||||||
|
|
||||||
|
The above copyright notice and this permission notice shall be included in all
|
||||||
|
copies or substantial portions of the Software.
|
||||||
|
|
||||||
|
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||||
|
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||||
|
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||||
|
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||||
|
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||||
|
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||||
|
SOFTWARE.
|
||||||
Binary file not shown.
|
|
@ -0,0 +1,764 @@
|
||||||
|
Metadata-Version: 2.4
|
||||||
|
Name: charset-normalizer
|
||||||
|
Version: 3.4.4
|
||||||
|
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
|
||||||
|
Author-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
|
||||||
|
Maintainer-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
|
||||||
|
License: MIT
|
||||||
|
Project-URL: Changelog, https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md
|
||||||
|
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/
|
||||||
|
Project-URL: Code, https://github.com/jawah/charset_normalizer
|
||||||
|
Project-URL: Issue tracker, https://github.com/jawah/charset_normalizer/issues
|
||||||
|
Keywords: encoding,charset,charset-detector,detector,normalization,unicode,chardet,detect
|
||||||
|
Classifier: Development Status :: 5 - Production/Stable
|
||||||
|
Classifier: Intended Audience :: Developers
|
||||||
|
Classifier: Operating System :: OS Independent
|
||||||
|
Classifier: Programming Language :: Python
|
||||||
|
Classifier: Programming Language :: Python :: 3
|
||||||
|
Classifier: Programming Language :: Python :: 3.7
|
||||||
|
Classifier: Programming Language :: Python :: 3.8
|
||||||
|
Classifier: Programming Language :: Python :: 3.9
|
||||||
|
Classifier: Programming Language :: Python :: 3.10
|
||||||
|
Classifier: Programming Language :: Python :: 3.11
|
||||||
|
Classifier: Programming Language :: Python :: 3.12
|
||||||
|
Classifier: Programming Language :: Python :: 3.13
|
||||||
|
Classifier: Programming Language :: Python :: 3.14
|
||||||
|
Classifier: Programming Language :: Python :: 3 :: Only
|
||||||
|
Classifier: Programming Language :: Python :: Implementation :: CPython
|
||||||
|
Classifier: Programming Language :: Python :: Implementation :: PyPy
|
||||||
|
Classifier: Topic :: Text Processing :: Linguistic
|
||||||
|
Classifier: Topic :: Utilities
|
||||||
|
Classifier: Typing :: Typed
|
||||||
|
Requires-Python: >=3.7
|
||||||
|
Description-Content-Type: text/markdown
|
||||||
|
License-File: LICENSE
|
||||||
|
Provides-Extra: unicode-backport
|
||||||
|
Dynamic: license-file
|
||||||
|
|
||||||
|
<h1 align="center">Charset Detection, for Everyone 👋</h1>
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<sup>The Real First Universal Charset Detector</sup><br>
|
||||||
|
<a href="https://pypi.org/project/charset-normalizer">
|
||||||
|
<img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
|
||||||
|
</a>
|
||||||
|
<a href="https://pepy.tech/project/charset-normalizer/">
|
||||||
|
<img alt="Download Count Total" src="https://static.pepy.tech/badge/charset-normalizer/month" />
|
||||||
|
</a>
|
||||||
|
<a href="https://bestpractices.coreinfrastructure.org/projects/7297">
|
||||||
|
<img src="https://bestpractices.coreinfrastructure.org/projects/7297/badge">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<sup><i>Featured Packages</i></sup><br>
|
||||||
|
<a href="https://github.com/jawah/niquests">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Niquests-Most_Advanced_HTTP_Client-cyan">
|
||||||
|
</a>
|
||||||
|
<a href="https://github.com/jawah/wassima">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Wassima-Certifi_Replacement-cyan">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<sup><i>In other language (unofficial port - by the community)</i></sup><br>
|
||||||
|
<a href="https://github.com/nickspring/charset-normalizer-rs">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Rust-red">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
|
||||||
|
> I'm trying to resolve the issue by taking a new approach.
|
||||||
|
> All IANA character set names for which the Python core library provides codecs are supported.
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
>>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
|
||||||
|
</p>
|
||||||
|
|
||||||
|
This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.
|
||||||
|
|
||||||
|
| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
|
||||||
|
|--------------------------------------------------|:---------------------------------------------:|:--------------------------------------------------------------------------------------------------:|:-----------------------------------------------:|
|
||||||
|
| `Fast` | ❌ | ✅ | ✅ |
|
||||||
|
| `Universal**` | ❌ | ✅ | ❌ |
|
||||||
|
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
|
||||||
|
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
|
||||||
|
| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
|
||||||
|
| `Native Python` | ✅ | ✅ | ❌ |
|
||||||
|
| `Detect spoken language` | ❌ | ✅ | N/A |
|
||||||
|
| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
|
||||||
|
| `Whl Size (min)` | 193.6 kB | 42 kB | ~200 kB |
|
||||||
|
| `Supported Encoding` | 33 | 🎉 [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
*\*\* : They are clearly using specific code for a specific encoding even if covering most of used one*<br>

## ⚡ Performance

This package offers better performance than its counterpart, Chardet. Here are some numbers.

| Package | Accuracy | Mean per file (ms) | File per sec (est) |
|-----------------------------------------------|:--------:|:------------------:|:------------------:|
| [chardet](https://github.com/chardet/chardet) | 86 % | 63 ms | 16 file/sec |
| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |

| Package | 99th percentile | 95th percentile | 50th percentile |
|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
| [chardet](https://github.com/chardet/chardet) | 265 ms | 71 ms | 7 ms |
| charset-normalizer | 100 ms | 50 ms | 5 ms |

_updated as of December 2024 using CPython 3.12_

Chardet's performance on larger files (1 MB+) is very poor. Expect a huge difference on large payloads.

> Stats are generated using 400+ files with default parameters. For more details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays heavily depend on your CPU capabilities. The factors should remain the same.
> Keep in mind that the stats are generous and that Chardet's accuracy versus ours is measured using Chardet's initial capabilities
> (e.g., supported encodings). Challenge them if you want.

## ✨ Installation

Using pip:

```sh
pip install charset-normalizer -U
```

## 🚀 Basic Usage

### CLI

This package comes with a CLI.

```
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
                  file [file ...]

The Real First Universal Charset Detector. Discover originating encoding used
on text file. Normalize text to unicode.

positional arguments:
  files                 File(s) to be analysed

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display complementary information about file if any.
                        Stdout will contain logs about the detection process.
  -a, --with-alternative
                        Output complementary possibilities if any. Top-level
                        JSON WILL be a list.
  -n, --normalize       Permit to normalize input file. If not set, program
                        does not write anything.
  -m, --minimal         Only output the charset detected to STDOUT. Disabling
                        JSON output.
  -r, --replace         Replace file when trying to normalize it instead of
                        creating a new one.
  -f, --force           Replace file without asking if you are sure, use this
                        flag with caution.
  -t THRESHOLD, --threshold THRESHOLD
                        Define a custom maximum amount of chaos allowed in
                        decoded content. 0. <= chaos <= 1.
  --version             Show version information and exit.
```

```bash
normalizer ./data/sample.1.fr.srt
```

or

```bash
python -m charset_normalizer ./data/sample.1.fr.srt
```

🎉 Since version 1.4.0, the CLI produces an easily usable stdout result in JSON format.

```json
{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "encoding_aliases": [
        "1252",
        "windows_1252"
    ],
    "alternative_encodings": [
        "cp1254",
        "cp1256",
        "cp1258",
        "iso8859_14",
        "iso8859_15",
        "iso8859_16",
        "iso8859_3",
        "iso8859_9",
        "latin_1",
        "mbcs"
    ],
    "language": "French",
    "alphabets": [
        "Basic Latin",
        "Latin-1 Supplement"
    ],
    "has_sig_or_bom": false,
    "chaos": 0.149,
    "coherence": 97.152,
    "unicode_path": null,
    "is_preferred": true
}
```
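
Because the report lands on stdout as JSON, other tooling can consume it directly. A minimal sketch, assuming the `normalizer` executable is on your `PATH` and the sample file above exists:

```python
import json
import subprocess

# Run the CLI and parse the JSON report it prints to stdout.
completed = subprocess.run(
    ["normalizer", "./data/sample.1.fr.srt"],
    capture_output=True,
    text=True,
    check=True,
)

report = json.loads(completed.stdout)
print(report["encoding"], report["language"], report["is_preferred"])
```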

### Python

*Just print out normalized text*

```python
from charset_normalizer import from_path

results = from_path('./my_subtitle.srt')

print(str(results.best()))
```
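
The object returned by `best()` also exposes the detection details shown in the JSON report above. A short sketch along the same lines; the attribute names mirror the report keys, and `best()` returns `None` when no match is found:

```python
from charset_normalizer import from_path

best_guess = from_path('./my_subtitle.srt').best()

if best_guess is None:
    print("no suitable encoding found")
else:
    print(best_guess.encoding)   # e.g. "cp1252"
    print(best_guess.language)   # e.g. "French"
    print(best_guess.chaos)      # mess ratio, lower is better
    print(best_guess.coherence)  # language coherence, higher is better
```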

*Upgrade your code without effort*

```python
from charset_normalizer import detect
```

The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) BC result possible.
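
A rough sketch of that drop-in compatibility; the byte string is only an illustration, and the returned dictionary mirrors chardet's `encoding`/`language`/`confidence` keys:

```python
from charset_normalizer import detect

# chardet-style call: raw bytes in, plain dict out.
payload = "Ceci était une petite démonstration.".encode("cp1252")
result = detect(payload)

print(result["encoding"], result["language"], result["confidence"])
```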

See the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)

## 😇 Why

When I started using Chardet, I noticed that it did not meet my expectations, and I wanted to propose a
reliable alternative using a completely different method. Also! I never back down from a good challenge!

I **don't care** about the **originating charset** encoding, because **two different tables** can
produce **two identical rendered strings.**
What I want is to get readable text, the best I can.

In a way, **I'm brute forcing text decoding.** How cool is that? 😎

Don't confuse the **ftfy** package with charset-normalizer or chardet. ftfy's goal is to repair broken Unicode strings, whereas charset-normalizer's is to convert a raw file in an unknown encoding to Unicode.
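
To illustrate that split in responsibilities, a small sketch; it assumes ftfy is installed, uses `ftfy.fix_text` (its documented entry point), and borrows a classic mojibake sample:

```python
import ftfy
from charset_normalizer import from_bytes

# ftfy repairs text that was already decoded badly (mojibake)...
print(ftfy.fix_text("âœ” No problems"))  # -> "✔ No problems"

# ...while charset-normalizer decodes raw bytes in an unknown encoding.
print(str(from_bytes("déjà vu".encode("cp1252")).best()))  # -> "déjà vu"
```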

## 🍰 How

- Discard all charset encoding tables that could not fit the binary content.
- Measure the noise, or the mess, once opened (by chunks) with a corresponding charset encoding.
- Extract the matches with the lowest mess detected.
- Additionally, we measure coherence / probe for a language.

**Wait a minute**, what is noise/mess and coherence according to **YOU?**

*Noise:* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then
**I established** some ground rules about **what is obvious** when **it seems like** a mess (aka. defining noise in rendered text).
I know that my interpretation of what is noise is probably incomplete; feel free to contribute in order to
improve or rewrite it.

*Coherence:* For each language on earth, we have computed ranked letter-appearance occurrences (the best we can). So I thought
that intel is worth something here. So I use those records against decoded text to check if I can detect intelligent design.
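
To make the idea concrete, here is a deliberately naive sketch of the loop described above. It is not the library's actual implementation: the candidate list and the mess heuristic are hypothetical stand-ins for the real mess (md) and coherence (cd) detectors:

```python
def naive_charset_guess(payload, candidate_encodings):
    """Deliberately naive illustration of the detection loop above."""
    best_encoding, best_mess = None, float("inf")
    for encoding in candidate_encodings:
        try:
            # Step 1: discard tables that cannot fit the binary content.
            text = payload.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
        # Step 2: hypothetical mess measure; the real detector inspects
        # chunks for unprintable glyphs, stray accents, isolated forms, etc.
        mess = sum(1 for ch in text if not ch.isprintable() and not ch.isspace())
        # Step 3: keep the match with the lowest mess detected.
        if mess < best_mess:
            best_encoding, best_mess = encoding, mess
    return best_encoding

print(naive_charset_guess("héllo wörld".encode("cp1252"), ["utf_8", "cp1252", "latin_1"]))
# cp1252
```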

## ⚡ Known limitations

- Language detection is unreliable when the text contains two or more languages sharing identical letters. (e.g. HTML (English tags) + Turkish content (sharing Latin characters))
- Every charset detector heavily depends on sufficient content. In common cases, do not bother running detection on very tiny content.

## ⚠️ About Python EOLs

**If you are running:**

- Python >=2.7,<3.5: Unsupported
- Python 3.5: charset-normalizer < 2.1
- Python 3.6: charset-normalizer < 3.1
- Python 3.7: charset-normalizer < 4.0

Upgrade your Python interpreter as soon as possible.

## 👤 Contributing

Contributions, issues and feature requests are very much welcome.<br />
Feel free to check the [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.

## 📝 License

Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.

Character frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)

## 💼 For Enterprise

Professional support for charset-normalizer is available as part of the [Tidelift
Subscription][1]. Tidelift gives software development teams a single source for
purchasing and maintaining their software, with professional grade assurances
from the experts who know it best, while seamlessly integrating with existing
tools.

[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme

[](https://www.bestpractices.dev/projects/7297)

# Changelog

All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [3.4.4](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.4) (2025-10-13)

### Changed
- Bound `setuptools` to a specific constraint `setuptools>=68,<=81`.
- Raised upper bound of mypyc for the optional pre-built extension to v1.18.2

### Removed
- `setuptools-scm` as a build dependency.

### Misc
- Enforced hashes in `dev-requirements.txt` and created `ci-requirements.txt` for security purposes.
- Additional pre-built wheels for riscv64, s390x, and armv7l architectures.
- Restore `multiple.intoto.jsonl` in GitHub releases in addition to the individual attestation file per wheel.

## [3.4.3](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.3) (2025-08-09)

### Changed
- mypy(c) is no longer a required dependency at build time if `CHARSET_NORMALIZER_USE_MYPYC` isn't set to `1`. (#595) (#583)
- automatically lower confidence on small bytes samples that are not Unicode in `detect` output legacy function. (#391)

### Added
- Custom build backend to overcome inability to mark mypy as an optional dependency in the build phase.
- Support for Python 3.14

### Fixed
- sdist archive contained useless directories.
- automatically fallback on valid UTF-16 or UTF-32 even if the md says it's noisy. (#633)

### Misc
- SBOM are automatically published to the relevant GitHub release to comply with regulatory changes.
  Each published wheel comes with its SBOM. We choose CycloneDX as the format.
- Prebuilt optimized wheel are no longer distributed by default for CPython 3.7 due to a change in cibuildwheel.

## [3.4.2](https://github.com/Ousret/charset_normalizer/compare/3.4.1...3.4.2) (2025-05-02)

### Fixed
- Addressed the DeprecationWarning in our CLI regarding `argparse.FileType` by backporting the target class into the package. (#591)
- Improved the overall reliability of the detector with CJK Ideographs. (#605) (#587)

### Changed
- Optional mypyc compilation upgraded to version 1.15 for Python >= 3.8

## [3.4.1](https://github.com/Ousret/charset_normalizer/compare/3.4.0...3.4.1) (2024-12-24)

### Changed
- Project metadata are now stored using `pyproject.toml` instead of `setup.cfg` using setuptools as the build backend.
- Enforce annotation delayed loading for a simpler and consistent types in the project.
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8

### Added
- pre-commit configuration.
- noxfile.

### Removed
- `build-requirements.txt` as per using `pyproject.toml` native build configuration.
- `bin/integration.py` and `bin/serve.py` in favor of downstream integration test (see noxfile).
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
- Unused `utils.range_scan` function.

### Fixed
- Converting content to Unicode bytes may insert `utf_8` instead of preferred `utf-8`. (#572)
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+

## [3.4.0](https://github.com/Ousret/charset_normalizer/compare/3.3.2...3.4.0) (2024-10-08)

### Added
- Argument `--no-preemptive` in the CLI to prevent the detector to search for hints.
- Support for Python 3.13 (#512)

### Fixed
- Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything else than a CharsetMatch.
- Improved the general reliability of the detector based on user feedbacks. (#520) (#509) (#498) (#407) (#537)
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#381)

## [3.3.2](https://github.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2) (2023-10-31)

### Fixed
- Unintentional memory usage regression when using large payload that match several encoding (#376)
- Regression on some detection case showcased in the documentation (#371)

### Added
- Noise (md) probe that identify malformed arabic representation due to the presence of letters in isolated form (credit to my wife)

## [3.3.1](https://github.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1) (2023-10-22)

### Changed
- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
- Improved the general detection reliability based on reports from the community

## [3.3.0](https://github.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0) (2023-09-30)

### Added
- Allow to execute the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
- Support for 9 forgotten encoding that are supported by Python but unlisted in `encoding.aliases` as they have no alias (#323)

### Removed
- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant

### Changed
- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8

### Fixed
- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in \_\_lt\_\_ (#350)

## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)

### Changed
- Typehint for function `from_path` no longer enforce `PathLike` as its first argument
- Minor improvement over the global detection reliability

### Added
- Introduce function `is_binary` that relies on main capabilities, and optimized to detect binaries
- Propagate `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp` that allow a deeper control over the detection (default True)
- Explicit support for Python 3.12

### Fixed
- Edge case detection failure where a file would contain 'very-long' camel cased word (Issue #289)

## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)

### Added
- Argument `should_rename_legacy` for legacy function `detect` and disregard any new arguments without errors (PR #262)

### Removed
- Support for Python 3.6 (PR #260)

### Changed
- Optional speedup provided by mypy/c 1.0.1

## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)

### Fixed
- Multi-bytes cutter/chunk generator did not always cut correctly (PR #233)

### Changed
- Speedup provided by mypy/c 0.990 on Python >= 3.7

## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)

### Added
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
- `normalizer --version` now specify if current version provide extra speedup (meaning mypyc compilation whl)

### Changed
- Build with static metadata using 'build' frontend
- Make the language detection stricter
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1

### Fixed
- CLI with opt --normalize fail when using full path for files
- TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
- Sphinx warnings when generating the documentation

### Removed
- Coherence detector no longer return 'Simple English' instead return 'English'
- Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'
- Breaking: Method `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)

### Added
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio

### Changed
- Build with static metadata using 'build' frontend
- Make the language detection stricter

### Fixed
- CLI with opt --normalize fail when using full path for files
- TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it

### Removed
- Coherence detector no longer return 'Simple English' instead return 'English'
- Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'

## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)

### Added
- `normalizer --version` now specify if current version provide extra speedup (meaning mypyc compilation whl)

### Removed
- Breaking: Method `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)

### Fixed
- Sphinx warnings when generating the documentation

## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)

### Changed
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1

### Removed
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)

### Deprecated
- Function `normalize` scheduled for removal in 3.0

### Changed
- Removed useless call to decode in fn is_unprintable (#206)

### Fixed
- Third-party library (i18n xgettext) crashing not recognizing utf_8 (PEP 263) with underscore from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)

## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)

### Added
- Output the Unicode table version when running the CLI with `--version` (PR #194)

### Changed
- Re-use decoded buffer for single byte character sets from [@nijel](https://github.com/nijel) (PR #175)
- Fixing some performance bottlenecks from [@deedy5](https://github.com/deedy5) (PR #183)

### Fixed
- Workaround potential bug in cpython with Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space (PR #175)
- CLI default threshold aligned with the API threshold from [@oleksandr-kuzmenko](https://github.com/oleksandr-kuzmenko) (PR #181)

### Removed
- Support for Python 3.5 (PR #192)

### Deprecated
- Use of backport unicodedata from `unicodedata2` as Python is quickly catching up, scheduled for removal in 3.0 (PR #194)

## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)

### Fixed
- ASCII miss-detection on rare cases (PR #170)

## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)

### Added
- Explicit support for Python 3.11 (PR #164)

### Changed
- The logging behavior have been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)

## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)

### Fixed
- Fallback match entries might lead to UnicodeDecodeError for large bytes sequence (PR #154)

### Changed
- Skipping the language-detection (CD) on ASCII (PR #155)

## [2.0.9](https://github.com/Ousret/charset_normalizer/compare/2.0.8...2.0.9) (2021-12-03)

### Changed
- Moderating the logging impact (since 2.0.8) for specific environments (PR #147)

### Fixed
- Wrong logging level applied when setting kwarg `explain` to True (PR #146)

## [2.0.8](https://github.com/Ousret/charset_normalizer/compare/2.0.7...2.0.8) (2021-11-24)

### Changed
- Improvement over Vietnamese detection (PR #126)
- MD improvement on trailing data and long foreign (non-pure latin) data (PR #124)
- Efficiency improvements in cd/alphabet_languages from [@adbar](https://github.com/adbar) (PR #122)
- call sum() without an intermediary list following PEP 289 recommendations from [@adbar](https://github.com/adbar) (PR #129)
- Code style as refactored by Sourcery-AI (PR #131)
- Minor adjustment on the MD around european words (PR #133)
- Remove and replace SRTs from assets / tests (PR #139)
- Initialize the library logger with a `NullHandler` by default from [@nmaynes](https://github.com/nmaynes) (PR #135)
- Setting kwarg `explain` to True will add provisionally (bounded to function lifespan) a specific stream handler (PR #135)

### Fixed
- Fix large (misleading) sequence giving UnicodeDecodeError (PR #137)
- Avoid using too insignificant chunk (PR #137)

### Added
- Add and expose function `set_logging_handler` to configure a specific StreamHandler from [@nmaynes](https://github.com/nmaynes) (PR #135)
- Add `CHANGELOG.md` entries, format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) (PR #141)

## [2.0.7](https://github.com/Ousret/charset_normalizer/compare/2.0.6...2.0.7) (2021-10-11)

### Added
- Add support for Kazakh (Cyrillic) language detection (PR #109)

### Changed
- Further, improve inferring the language from a given single-byte code page (PR #112)
- Vainly trying to leverage PEP263 when PEP3120 is not supported (PR #116)
- Refactoring for potential performance improvements in loops from [@adbar](https://github.com/adbar) (PR #113)
- Various detection improvement (MD+CD) (PR #117)

### Removed
- Remove redundant logging entry about detected language(s) (PR #115)

### Fixed
- Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)

## [2.0.6](https://github.com/Ousret/charset_normalizer/compare/2.0.5...2.0.6) (2021-09-18)

### Fixed
- Unforeseen regression with the loss of the backward-compatibility with some older minor of Python 3.5.x (PR #100)
- Fix CLI crash when using --minimal output in certain cases (PR #103)

### Changed
- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)

## [2.0.5](https://github.com/Ousret/charset_normalizer/compare/2.0.4...2.0.5) (2021-09-14)

### Changed
- The project now comply with: flake8, mypy, isort and black to ensure a better overall quality (PR #81)
- The BC-support with v1.x was improved, the old staticmethods are restored (PR #82)
- The Unicode detection is slightly improved (PR #93)
- Add syntax sugar \_\_bool\_\_ for results CharsetMatches list-container (PR #91)

### Removed
- The project no longer raise warning on tiny content given for detection, will be simply logged as warning instead (PR #92)

### Fixed
- In some rare case, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection (PR #95)
- Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection (PR #96)
- The MANIFEST.in was not exhaustive (PR #78)

## [2.0.4](https://github.com/Ousret/charset_normalizer/compare/2.0.3...2.0.4) (2021-07-30)

### Fixed
- The CLI no longer raise an unexpected exception when no encoding has been found (PR #70)
- Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
- The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (PR #72)
- Submatch factoring could be wrong in rare edge cases (PR #72)
- Multiple files given to the CLI were ignored when publishing results to STDOUT. (After the first path) (PR #72)
- Fix line endings from CRLF to LF for certain project files (PR #67)

### Changed
- Adjust the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
- Allow fallback on specified encoding if any (PR #71)

## [2.0.3](https://github.com/Ousret/charset_normalizer/compare/2.0.2...2.0.3) (2021-07-16)

### Changed
- Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results. Especially ASCII. (PR #63)
- According to the community wishes, the detection will fall back on ASCII or UTF-8 in a last-resort case. (PR #64)

## [2.0.2](https://github.com/Ousret/charset_normalizer/compare/2.0.1...2.0.2) (2021-07-15)

### Fixed
- Empty/Too small JSON payload miss-detection fixed. Report from [@tseaver](https://github.com/tseaver) (PR #59)

### Changed
- Don't inject unicodedata2 into sys.modules from [@akx](https://github.com/akx) (PR #57)

## [2.0.1](https://github.com/Ousret/charset_normalizer/compare/2.0.0...2.0.1) (2021-07-13)

### Fixed
- Make it work where there isn't a filesystem available, dropping assets frequencies.json. Report from [@sethmlarson](https://github.com/sethmlarson). (PR #55)
- Using explain=False permanently disable the verbose output in the current runtime (PR #47)
- One log entry (language target preemptive) was not show in logs when using explain=True (PR #47)
- Fix undesired exception (ValueError) on getitem of instance CharsetMatches (PR #52)

### Changed
- Public function normalize default args values were not aligned with from_bytes (PR #53)

### Added
- You may now use charset aliases in cp_isolation and cp_exclusion arguments (PR #47)

## [2.0.0](https://github.com/Ousret/charset_normalizer/compare/1.4.1...2.0.0) (2021-07-02)

### Changed
- 4x to 5 times faster than the previous 1.4.0 release. At least 2x faster than Chardet.
- Accent has been made on UTF-8 detection, should perform rather instantaneous.
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
- The detection mechanism has been slightly improved, now Turkish content is detected correctly (most of the time)
- The program has been rewritten to ease the readability and maintainability. (+ Using static typing)
- utf_7 detection has been reinstated.

### Removed
- This package no longer require anything when used with Python 3.5 (Dropped cached_property)
- Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbocroatian.
- The exception hook on UnicodeDecodeError has been removed.

### Deprecated
- Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0

### Fixed
- The CLI output used the relative path of the file(s). Should be absolute.

## [1.4.1](https://github.com/Ousret/charset_normalizer/compare/1.4.0...1.4.1) (2021-05-28)

### Fixed
- Logger configuration/usage no longer conflict with others (PR #44)

## [1.4.0](https://github.com/Ousret/charset_normalizer/compare/1.3.9...1.4.0) (2021-05-21)

### Removed
- Using standard logging instead of using the package loguru.
- Dropping nose test framework in favor of the maintained pytest.
- Choose to not use dragonmapper package to help with gibberish Chinese/CJK text.
- Require cached_property only for Python 3.5 due to constraint. Dropping for every other interpreter version.
- Stop support for UTF-7 that does not contain a SIG.
- Dropping PrettyTable, replaced with pure JSON output in CLI.

### Fixed
- BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even if obviously present. Due to the sub-match factoring process.
- Not searching properly for the BOM when trying utf32/16 parent codec.

### Changed
- Improving the package final size by compressing frequencies.json.
- Huge improvement over the largest payloads.

### Added
- CLI now produces JSON consumable output.
- Return ASCII if given sequences fit. Given reasonable confidence.

## [1.3.9](https://github.com/Ousret/charset_normalizer/compare/1.3.8...1.3.9) (2021-05-13)

### Fixed
- In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload (PR #40)

## [1.3.8](https://github.com/Ousret/charset_normalizer/compare/1.3.7...1.3.8) (2021-05-12)

### Fixed
- Empty given payload for detection may cause an exception if trying to access the `alphabets` property. (PR #39)

## [1.3.7](https://github.com/Ousret/charset_normalizer/compare/1.3.6...1.3.7) (2021-05-12)

### Fixed
- The legacy detect function should return UTF-8-SIG if sig is present in the payload. (PR #38)

## [1.3.6](https://github.com/Ousret/charset_normalizer/compare/1.3.5...1.3.6) (2021-02-09)

### Changed
- Amend the previous release to allow prettytable 2.0 (PR #35)

## [1.3.5](https://github.com/Ousret/charset_normalizer/compare/1.3.4...1.3.5) (2021-02-08)

### Fixed
- Fix error while using the package with a python pre-release interpreter (PR #33)

### Changed
- Dependencies refactoring, constraints revised.

### Added
- Add python 3.9 and 3.10 to the supported interpreters

MIT License

Copyright (c) 2025 TAHRI Ahmed R.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Binary file not shown.

@ -0,0 +1,764 @@
Metadata-Version: 2.4
Name: charset-normalizer
Version: 3.4.4
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
Author-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
Maintainer-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
License: MIT
Project-URL: Changelog, https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/
Project-URL: Code, https://github.com/jawah/charset_normalizer
Project-URL: Issue tracker, https://github.com/jawah/charset_normalizer/issues
Keywords: encoding,charset,charset-detector,detector,normalization,unicode,chardet,detect
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: unicode-backport
Dynamic: license-file
|
||||||
|
|
||||||
|
<h1 align="center">Charset Detection, for Everyone 👋</h1>
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<sup>The Real First Universal Charset Detector</sup><br>
|
||||||
|
<a href="https://pypi.org/project/charset-normalizer">
|
||||||
|
<img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
|
||||||
|
</a>
|
||||||
|
<a href="https://pepy.tech/project/charset-normalizer/">
|
||||||
|
<img alt="Download Count Total" src="https://static.pepy.tech/badge/charset-normalizer/month" />
|
||||||
|
</a>
|
||||||
|
<a href="https://bestpractices.coreinfrastructure.org/projects/7297">
|
||||||
|
<img src="https://bestpractices.coreinfrastructure.org/projects/7297/badge">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<sup><i>Featured Packages</i></sup><br>
|
||||||
|
<a href="https://github.com/jawah/niquests">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Niquests-Most_Advanced_HTTP_Client-cyan">
|
||||||
|
</a>
|
||||||
|
<a href="https://github.com/jawah/wassima">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Wassima-Certifi_Replacement-cyan">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<sup><i>In other language (unofficial port - by the community)</i></sup><br>
|
||||||
|
<a href="https://github.com/nickspring/charset-normalizer-rs">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Rust-red">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
|
||||||
|
> I'm trying to resolve the issue by taking a new approach.
|
||||||
|
> All IANA character set names for which the Python core library provides codecs are supported.
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
>>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
|
||||||
|
</p>
|
||||||
|
|
||||||
|
This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.
|
||||||
|
|
||||||
|
| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
|
||||||
|
|--------------------------------------------------|:---------------------------------------------:|:--------------------------------------------------------------------------------------------------:|:-----------------------------------------------:|
|
||||||
|
| `Fast` | ❌ | ✅ | ✅ |
|
||||||
|
| `Universal**` | ❌ | ✅ | ❌ |
|
||||||
|
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
|
||||||
|
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
|
||||||
|
| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
|
||||||
|
| `Native Python` | ✅ | ✅ | ❌ |
|
||||||
|
| `Detect spoken language` | ❌ | ✅ | N/A |
|
||||||
|
| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
|
||||||
|
| `Whl Size (min)` | 193.6 kB | 42 kB | ~200 kB |
|
||||||
|
| `Supported Encoding` | 33 | 🎉 [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
*\*\* : They are clearly using specific code for a specific encoding even if covering most of used one*<br>
|
||||||
|
|
||||||
|
## ⚡ Performance
|
||||||
|
|
||||||
|
This package offer better performance than its counterpart Chardet. Here are some numbers.
|
||||||
|
|
||||||
|
| Package | Accuracy | Mean per file (ms) | File per sec (est) |
|
||||||
|
|-----------------------------------------------|:--------:|:------------------:|:------------------:|
|
||||||
|
| [chardet](https://github.com/chardet/chardet) | 86 % | 63 ms | 16 file/sec |
|
||||||
|
| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |
|
||||||
|
|
||||||
|
| Package | 99th percentile | 95th percentile | 50th percentile |
|
||||||
|
|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
|
||||||
|
| [chardet](https://github.com/chardet/chardet) | 265 ms | 71 ms | 7 ms |
|
||||||
|
| charset-normalizer | 100 ms | 50 ms | 5 ms |
|
||||||
|
|
||||||
|
_updated as of december 2024 using CPython 3.12_
|
||||||
|
|
||||||
|
Chardet's performance on larger file (1MB+) are very poor. Expect huge difference on large payload.
|
||||||
|
|
||||||
|
> Stats are generated using 400+ files using default parameters. More details on used files, see GHA workflows.
|
||||||
|
> And yes, these results might change at any time. The dataset can be updated to include more files.
|
||||||
|
> The actual delays heavily depends on your CPU capabilities. The factors should remain the same.
|
||||||
|
> Keep in mind that the stats are generous and that Chardet accuracy vs our is measured using Chardet initial capability
|
||||||
|
> (e.g. Supported Encoding) Challenge-them if you want.
|
||||||
|
|
||||||
|
## ✨ Installation
|
||||||
|
|
||||||
|
Using pip:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
pip install charset-normalizer -U
|
||||||
|
```
|
||||||
|
|
||||||
|
## 🚀 Basic Usage
|
||||||
|
|
||||||
|
### CLI
|
||||||
|
This package comes with a CLI.
|
||||||
|
|
||||||
|
```
|
||||||
|
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
|
||||||
|
file [file ...]
|
||||||
|
|
||||||
|
The Real First Universal Charset Detector. Discover originating encoding used
|
||||||
|
on text file. Normalize text to unicode.
|
||||||
|
|
||||||
|
positional arguments:
|
||||||
|
files File(s) to be analysed
|
||||||
|
|
||||||
|
optional arguments:
|
||||||
|
-h, --help show this help message and exit
|
||||||
|
-v, --verbose Display complementary information about file if any.
|
||||||
|
Stdout will contain logs about the detection process.
|
||||||
|
-a, --with-alternative
|
||||||
|
Output complementary possibilities if any. Top-level
|
||||||
|
JSON WILL be a list.
|
||||||
|
-n, --normalize Permit to normalize input file. If not set, program
|
||||||
|
does not write anything.
|
||||||
|
-m, --minimal Only output the charset detected to STDOUT. Disabling
|
||||||
|
JSON output.
|
||||||
|
-r, --replace Replace file when trying to normalize it instead of
|
||||||
|
creating a new one.
|
||||||
|
-f, --force Replace file without asking if you are sure, use this
|
||||||
|
flag with caution.
|
||||||
|
-t THRESHOLD, --threshold THRESHOLD
|
||||||
|
Define a custom maximum amount of chaos allowed in
|
||||||
|
decoded content. 0. <= chaos <= 1.
|
||||||
|
--version Show version information and exit.
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
normalizer ./data/sample.1.fr.srt
|
||||||
|
```
|
||||||
|
|
||||||
|
or
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m charset_normalizer ./data/sample.1.fr.srt
|
||||||
|
```
|
||||||
|
|
||||||
|
🎉 Since version 1.4.0 the CLI produce easily usable stdout result in JSON format.
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
|
||||||
|
"encoding": "cp1252",
|
||||||
|
"encoding_aliases": [
|
||||||
|
"1252",
|
||||||
|
"windows_1252"
|
||||||
|
],
|
||||||
|
"alternative_encodings": [
|
||||||
|
"cp1254",
|
||||||
|
"cp1256",
|
||||||
|
"cp1258",
|
||||||
|
"iso8859_14",
|
||||||
|
"iso8859_15",
|
||||||
|
"iso8859_16",
|
||||||
|
"iso8859_3",
|
||||||
|
"iso8859_9",
|
||||||
|
"latin_1",
|
||||||
|
"mbcs"
|
||||||
|
],
|
||||||
|
"language": "French",
|
||||||
|
"alphabets": [
|
||||||
|
"Basic Latin",
|
||||||
|
"Latin-1 Supplement"
|
||||||
|
],
|
||||||
|
"has_sig_or_bom": false,
|
||||||
|
"chaos": 0.149,
|
||||||
|
"coherence": 97.152,
|
||||||
|
"unicode_path": null,
|
||||||
|
"is_preferred": true
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Python
|
||||||
|
*Just print out normalized text*
|
||||||
|
```python
|
||||||
|
from charset_normalizer import from_path
|
||||||
|
|
||||||
|
results = from_path('./my_subtitle.srt')
|
||||||
|
|
||||||
|
print(str(results.best()))
|
||||||
|
```
|
||||||
|
|
||||||
|
*Upgrade your code without effort*
|
||||||
|
```python
|
||||||
|
from charset_normalizer import detect
|
||||||
|
```
|
||||||
|
|
||||||
|
The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) BC result possible.
|
||||||
|
|
||||||
|
See the docs for advanced usage : [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)
|
||||||
|
|
||||||
|
## 😇 Why
|
||||||
|
|
||||||
|
When I started using Chardet, I noticed that it was not suited to my expectations, and I wanted to propose a
|
||||||
|
reliable alternative using a completely different method. Also! I never back down on a good challenge!
|
||||||
|
|
||||||
|
I **don't care** about the **originating charset** encoding, because **two different tables** can
|
||||||
|
produce **two identical rendered string.**
|
||||||
|
What I want is to get readable text, the best I can.
|
||||||
|
|
||||||
|
In a way, **I'm brute forcing text decoding.** How cool is that ? 😎
|
||||||
|
|
||||||
|
Don't confuse package **ftfy** with charset-normalizer or chardet. ftfy goal is to repair Unicode string whereas charset-normalizer to convert raw file in unknown encoding to unicode.
|
||||||
|
|
||||||
|
## 🍰 How
|
||||||
|
|
||||||
|
- Discard all charset encoding table that could not fit the binary content.
|
||||||
|
- Measure noise, or the mess once opened (by chunks) with a corresponding charset encoding.
|
||||||
|
- Extract matches with the lowest mess detected.
|
||||||
|
- Additionally, we measure coherence / probe for a language.
|
||||||
|
|
||||||
|
**Wait a minute**, what is noise/mess and coherence according to **YOU ?**
|
||||||
|
|
||||||
|
*Noise :* I opened hundred of text files, **written by humans**, with the wrong encoding table. **I observed**, then
|
||||||
|
**I established** some ground rules about **what is obvious** when **it seems like** a mess (aka. defining noise in rendered text).
|
||||||
|
I know that my interpretation of what is noise is probably incomplete, feel free to contribute in order to
|
||||||
|
improve or rewrite it.
|
||||||
|
|
||||||
|
*Coherence :* For each language there is on earth, we have computed ranked letter appearance occurrences (the best we can). So I thought
|
||||||
|
that intel is worth something here. So I use those records against decoded text to check if I can detect intelligent design.
|
||||||
|
|
||||||
|
## ⚡ Known limitations
|
||||||
|
|
||||||
|
- Language detection is unreliable when text contains two or more languages sharing identical letters. (eg. HTML (english tags) + Turkish content (Sharing Latin characters))
|
||||||
|
- Every charset detector heavily depends on sufficient content. In common cases, do not bother run detection on very tiny content.
|
||||||
|
|
||||||
|
## ⚠️ About Python EOLs
|
||||||
|
|
||||||
|
**If you are running:**
|
||||||
|
|
||||||
|
- Python >=2.7,<3.5: Unsupported
|
||||||
|
- Python 3.5: charset-normalizer < 2.1
|
||||||
|
- Python 3.6: charset-normalizer < 3.1
|
||||||
|
- Python 3.7: charset-normalizer < 4.0
|
||||||
|
|
||||||
|
Upgrade your Python interpreter as soon as possible.
|
||||||
|
|
||||||
|
## 👤 Contributing
|
||||||
|
|
||||||
|
Contributions, issues and feature requests are very much welcome.<br />
|
||||||
|
Feel free to check [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.
|
||||||
|
|
||||||
|
## 📝 License
|
||||||
|
|
||||||
|
Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
|
||||||
|
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.
|
||||||
|
|
||||||
|
Characters frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)
|
||||||
|
|
||||||
|
## 💼 For Enterprise
|
||||||
|
|
||||||
|
Professional support for charset-normalizer is available as part of the [Tidelift
|
||||||
|
Subscription][1]. Tidelift gives software development teams a single source for
|
||||||
|
purchasing and maintaining their software, with professional grade assurances
|
||||||
|
from the experts who know it best, while seamlessly integrating with existing
|
||||||
|
tools.
|
||||||
|
|
||||||
|
[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme
|
||||||
|
|
||||||
|
[](https://www.bestpractices.dev/projects/7297)
|
||||||
|
|
||||||
|
# Changelog
|
||||||
|
All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||||
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
|
||||||
|
|
||||||
|
## [3.4.4](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.4) (2025-10-13)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Bound `setuptools` to a specific constraint `setuptools>=68,<=81`.
|
||||||
|
- Raised upper bound of mypyc for the optional pre-built extension to v1.18.2
|
||||||
|
|
||||||
|
### Removed
|
||||||
|
- `setuptools-scm` as a build dependency.
|
||||||
|
|
||||||
|
### Misc
|
||||||
|
- Enforced hashes in `dev-requirements.txt` and created `ci-requirements.txt` for security purposes.
|
||||||
|
- Additional pre-built wheels for riscv64, s390x, and armv7l architectures.
|
||||||
|
- Restore ` multiple.intoto.jsonl` in GitHub releases in addition to individual attestation file per wheel.
|
||||||
|
|
||||||
|
## [3.4.3](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.3) (2025-08-09)
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- mypy(c) is no longer a required dependency at build time if `CHARSET_NORMALIZER_USE_MYPYC` isn't set to `1`. (#595) (#583)
|
||||||
|
- automatically lower confidence on small bytes samples that are not Unicode in `detect` output legacy function. (#391)
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- Custom build backend to overcome inability to mark mypy as an optional dependency in the build phase.
|
||||||
|
- Support for Python 3.14
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- sdist archive contained useless directories.
|
||||||
|
- automatically fallback on valid UTF-16 or UTF-32 even if the md says it's noisy. (#633)
|
||||||
|
|
||||||
|
### Misc
|
||||||
|
- SBOM are automatically published to the relevant GitHub release to comply with regulatory changes.
|
||||||
|
Each published wheel comes with its SBOM. We choose CycloneDX as the format.
|
||||||
|
- Prebuilt optimized wheel are no longer distributed by default for CPython 3.7 due to a change in cibuildwheel.
|
||||||
|
|
||||||
|
## [3.4.2](https://github.com/Ousret/charset_normalizer/compare/3.4.1...3.4.2) (2025-05-02)

### Fixed

- Addressed the DeprecationWarning in our CLI regarding `argparse.FileType` by backporting the target class into the package. (#591)
- Improved the overall reliability of the detector with CJK Ideographs. (#605) (#587)

### Changed

- Optional mypyc compilation upgraded to version 1.15 for Python >= 3.8

## [3.4.1](https://github.com/Ousret/charset_normalizer/compare/3.4.0...3.4.1) (2024-12-24)

### Changed

- Project metadata is now stored in `pyproject.toml` instead of `setup.cfg`, using setuptools as the build backend.
- Enforce delayed annotation loading for simpler and more consistent types across the project.
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8

### Added

- pre-commit configuration.
- noxfile.

### Removed

- `build-requirements.txt`, superseded by the `pyproject.toml` native build configuration.
- `bin/integration.py` and `bin/serve.py` in favor of downstream integration tests (see noxfile).
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
- Unused `utils.range_scan` function.

### Fixed

- Converting content to Unicode bytes may insert `utf_8` instead of the preferred `utf-8`. (#572)
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+

## [3.4.0](https://github.com/Ousret/charset_normalizer/compare/3.3.2...3.4.0) (2024-10-08)

### Added

- Argument `--no-preemptive` in the CLI to prevent the detector from searching for hints.
- Support for Python 3.13 (#512)

### Fixed

- Relaxed the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
- Improved the general reliability of the detector based on user feedback. (#520) (#509) (#498) (#407) (#537)
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#381)

## [3.3.2](https://github.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2) (2023-10-31)

### Fixed

- Unintentional memory usage regression when using a large payload that matches several encodings (#376)
- Regression in some detection cases showcased in the documentation (#371)

### Added

- Noise (md) probe that identifies malformed Arabic representation due to the presence of letters in isolated form (credit to my wife)

## [3.3.1](https://github.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1) (2023-10-22)

### Changed

- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
- Improved the general detection reliability based on reports from the community

## [3.3.0](https://github.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0) (2023-09-30)

### Added

- Allow executing the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer` (see the sketch after this release's notes)
- Support for 9 forgotten encodings that are supported by Python but unlisted in `encodings.aliases` as they have no alias (#323)

### Removed

- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant

### Changed

- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8

### Fixed

- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close, due to an unreachable condition in `__lt__` (#350)
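
The module-level entry points named in the 3.3.0 notes can be exercised without a console script; here is a minimal sketch (not part of the changelog itself, and `sample.txt` is a placeholder path):

```python
# Invoke the charset-normalizer CLI via the interpreter, as enabled in 3.3.0.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-m", "charset_normalizer", "sample.txt"],
    capture_output=True,
    text=True,
)
# Per the 1.4.0 notes below, the CLI emits a JSON-consumable report.
print(result.stdout)
```
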
## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)

### Changed

- Typehint for function `from_path` no longer enforces `PathLike` as its first argument
- Minor improvement to the global detection reliability

### Added

- Introduce function `is_binary`, which relies on the main capabilities and is optimized to detect binaries (usage sketch below)
- Propagate the `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp`, allowing deeper control over the detection (default True)
- Explicit support for Python 3.12

### Fixed

- Edge-case detection failure where a file would contain a 'very-long' camel-cased word (Issue #289)
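
A minimal sketch of the two 3.2.0 additions above; the byte payloads are made up for illustration:

```python
# Demonstrates is_binary and the enable_fallback flag added in 3.2.0.
from charset_normalizer import from_bytes, is_binary

payload = "Bonjour tout le monde, comment allez-vous ?".encode("cp1252")

# is_binary answers a simpler question than full detection: text or not?
print(is_binary(payload))          # False for this text payload
print(is_binary(b"\x00\x01\x02"))  # likely True for raw binary bytes

# enable_fallback=False disables the last-resort guesses, so .best() may
# return None instead of a low-confidence match.
best_guess = from_bytes(payload, enable_fallback=False).best()
print(best_guess.encoding if best_guess else "no confident match")
```
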
## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)

### Added

- Argument `should_rename_legacy` for the legacy function `detect`, which now also disregards any unknown arguments without raising errors (PR #262) (usage sketch below)

### Removed

- Support for Python 3.6 (PR #260)

### Changed

- Optional speedup provided by mypyc 1.0.1
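
A short sketch of the legacy shim mentioned above; the payload is illustrative only:

```python
# The legacy, chardet-style entry point with the 3.1.0 rename flag.
from charset_normalizer import detect

payload = "Hello, world".encode("utf_16")

# Returns a chardet-compatible dict:
# {'encoding': ..., 'language': ..., 'confidence': ...}
print(detect(payload))

# should_rename_legacy=True maps charset-normalizer's names onto the
# spellings chardet historically used.
print(detect(payload, should_rename_legacy=True))
```
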
## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)

### Fixed

- Multi-byte cutter/chunk generator did not always cut correctly (PR #233)

### Changed

- Speedup provided by mypyc 0.990 on Python >= 3.7

## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)

### Added

- Extended the capability of explain=True: when cp_isolation contains at most two entries (minimum one), the Mess-detector results are logged in detail (see the sketch after this release's notes)
- Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
- Parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
- `normalizer --version` now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)

### Changed

- Build with static metadata using the 'build' frontend
- Make the language detection stricter
- Optional: Module `md.py` can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

### Fixed

- CLI with opt --normalize failed when using a full path for files
- TooManyAccentuatedPlugin induced false positives on the mess detection when too few alpha characters were fed to it
- Sphinx warnings when generating the documentation

### Removed

- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`
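
A sketch of the detailed Mess-detector logging described in the 3.0.0 notes; the payload and codec choices are illustrative:

```python
# explain=True with a narrow cp_isolation set (at most two entries) logs
# the Mess-detector's reasoning in detail, as introduced in 3.0.0.
from charset_normalizer import from_bytes

payload = "Příliš žluťoučký kůň úpěl ďábelské ódy.".encode("utf_8")

matches = from_bytes(
    payload,
    cp_isolation=["utf_8", "cp1250"],  # restrict the candidate codecs
    explain=True,                      # emit detailed detection logs
    language_threshold=0.1,            # minimum expected coherence ratio
)
print(matches.best())
```
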
## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)

### Added

- Extended the capability of explain=True: when cp_isolation contains at most two entries (minimum one), the Mess-detector results are logged in detail
- Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
- Parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio

### Changed

- Build with static metadata using the 'build' frontend
- Make the language detection stricter

### Fixed

- CLI with opt --normalize failed when using a full path for files
- TooManyAccentuatedPlugin induced false positives on the mess detection when too few alpha characters were fed to it

### Removed

- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead

## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)

### Added

- `normalizer --version` now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)

### Removed

- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)

### Fixed

- Sphinx warnings when generating the documentation

## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)

### Changed

- Optional: Module `md.py` can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

### Removed

- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`
## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)

### Deprecated

- Function `normalize` scheduled for removal in 3.0

### Changed

- Removed a useless call to decode in fn is_unprintable (#206)

### Fixed

- Third-party library (i18n xgettext) crashing by not recognizing utf_8 (PEP 263) with underscore, from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)

## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)

### Added

- Output the Unicode table version when running the CLI with `--version` (PR #194)

### Changed

- Re-use the decoded buffer for single-byte character sets, from [@nijel](https://github.com/nijel) (PR #175)
- Fixed some performance bottlenecks, from [@deedy5](https://github.com/deedy5) (PR #183)

### Fixed

- Workaround for a potential bug in cpython: Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1, not acknowledged as a space (PR #175)
- CLI default threshold aligned with the API threshold, from [@oleksandr-kuzmenko](https://github.com/oleksandr-kuzmenko) (PR #181)

### Removed

- Support for Python 3.5 (PR #192)

### Deprecated

- Use of the backport unicodedata from `unicodedata2`, as Python is quickly catching up; scheduled for removal in 3.0 (PR #194)
## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)

### Fixed

- ASCII mis-detection in rare cases (PR #170)

## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)

### Added

- Explicit support for Python 3.11 (PR #164)

### Changed

- The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)

## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)

### Fixed

- Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)

### Changed

- Skipping the language detection (CD) on ASCII (PR #155)

## [2.0.9](https://github.com/Ousret/charset_normalizer/compare/2.0.8...2.0.9) (2021-12-03)

### Changed

- Moderating the logging impact (since 2.0.8) for specific environments (PR #147)

### Fixed

- Wrong logging level applied when setting kwarg `explain` to True (PR #146)
## [2.0.8](https://github.com/Ousret/charset_normalizer/compare/2.0.7...2.0.8) (2021-11-24)

### Changed

- Improvement to Vietnamese detection (PR #126)
- MD improvement on trailing data and long foreign (non-pure-latin) data (PR #124)
- Efficiency improvements in cd/alphabet_languages, from [@adbar](https://github.com/adbar) (PR #122)
- Call sum() without an intermediary list, following PEP 289 recommendations, from [@adbar](https://github.com/adbar) (PR #129)
- Code style as refactored by Sourcery-AI (PR #131)
- Minor adjustment of the MD around European words (PR #133)
- Remove and replace SRTs from assets / tests (PR #139)
- Initialize the library logger with a `NullHandler` by default, from [@nmaynes](https://github.com/nmaynes) (PR #135)
- Setting kwarg `explain` to True will provisionally add (bound to the function's lifespan) a specific stream handler (PR #135)

### Fixed

- Fix large (misleading) sequences giving UnicodeDecodeError (PR #137)
- Avoid using too-insignificant chunks (PR #137)

### Added

- Add and expose function `set_logging_handler` to configure a specific StreamHandler, from [@nmaynes](https://github.com/nmaynes) (PR #135) (usage sketch below)
- Add `CHANGELOG.md` entries; the format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) (PR #141)
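
A minimal sketch of wiring up the handler named above; the level and format string are arbitrary choices for illustration, not defaults mandated by the changelog:

```python
# Attach a StreamHandler to the library's logger, as exposed in 2.0.8.
import logging

from charset_normalizer import from_bytes, set_logging_handler

set_logging_handler(
    name="charset_normalizer",
    level=logging.DEBUG,
    format_string="%(asctime)s | %(levelname)s | %(message)s",
)

# Detection now emits diagnostic records through the configured handler.
from_bytes("Ceci est un exemple.".encode("cp1252")).best()
```
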
## [2.0.7](https://github.com/Ousret/charset_normalizer/compare/2.0.6...2.0.7) (2021-10-11)

### Added

- Add support for Kazakh (Cyrillic) language detection (PR #109)

### Changed

- Further improved inferring the language from a given single-byte code page (PR #112)
- Vainly trying to leverage PEP 263 when PEP 3120 is not supported (PR #116)
- Refactoring for potential performance improvements in loops, from [@adbar](https://github.com/adbar) (PR #113)
- Various detection improvements (MD+CD) (PR #117)

### Removed

- Remove redundant logging entry about detected language(s) (PR #115)

### Fixed

- Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)

## [2.0.6](https://github.com/Ousret/charset_normalizer/compare/2.0.5...2.0.6) (2021-09-18)

### Fixed

- Unforeseen regression with the loss of backward compatibility with some older minor versions of Python 3.5.x (PR #100)
- Fix CLI crash when using --minimal output in certain cases (PR #103)

### Changed

- Minor improvement to detection efficiency (less than 1%) (PR #106 #101)

## [2.0.5](https://github.com/Ousret/charset_normalizer/compare/2.0.4...2.0.5) (2021-09-14)

### Changed

- The project now complies with flake8, mypy, isort and black to ensure better overall quality (PR #81)
- Backward compatibility with v1.x was improved; the old staticmethods are restored (PR #82)
- The Unicode detection is slightly improved (PR #93)
- Add syntactic sugar `__bool__` for the results CharsetMatches list-container (PR #91)

### Removed

- The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead (PR #92)

### Fixed

- In some rare cases, the chunk extractor could cut in the middle of a multi-byte character and mislead the mess detection (PR #95)
- Some rare 'space' characters could trip up the UnprintablePlugin/mess detection (PR #96)
- The MANIFEST.in was not exhaustive (PR #78)

## [2.0.4](https://github.com/Ousret/charset_normalizer/compare/2.0.3...2.0.4) (2021-07-30)

### Fixed

- The CLI no longer raises an unexpected exception when no encoding has been found (PR #70)
- Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
- The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (PR #72)
- Submatch factoring could be wrong in rare edge cases (PR #72)
- Multiple files given to the CLI were ignored when publishing results to STDOUT (after the first path) (PR #72)
- Fix line endings from CRLF to LF for certain project files (PR #67)

### Changed

- Adjusted the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
- Allow fallback on the specified encoding, if any (PR #71)

## [2.0.3](https://github.com/Ousret/charset_normalizer/compare/2.0.2...2.0.3) (2021-07-16)

### Changed

- Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results, especially for ASCII. (PR #63)
- Per the community's wishes, detection will fall back on ASCII or UTF-8 as a last resort. (PR #64)

## [2.0.2](https://github.com/Ousret/charset_normalizer/compare/2.0.1...2.0.2) (2021-07-15)

### Fixed

- Empty/too-small JSON payload mis-detection fixed. Report from [@tseaver](https://github.com/tseaver) (PR #59)

### Changed

- Don't inject unicodedata2 into sys.modules, from [@akx](https://github.com/akx) (PR #57)

## [2.0.1](https://github.com/Ousret/charset_normalizer/compare/2.0.0...2.0.1) (2021-07-13)

### Fixed

- Make it work where there isn't a filesystem available by dropping assets/frequencies.json. Report from [@sethmlarson](https://github.com/sethmlarson). (PR #55)
- Using explain=False permanently disabled the verbose output in the current runtime (PR #47)
- One log entry (language target preemptive) was not shown in logs when using explain=True (PR #47)
- Fix undesired exception (ValueError) on getitem of a CharsetMatches instance (PR #52)

### Changed

- Public function normalize's default argument values were not aligned with from_bytes (PR #53)

### Added

- You may now use charset aliases in the cp_isolation and cp_exclusion arguments (PR #47)
## [2.0.0](https://github.com/Ousret/charset_normalizer/compare/1.4.1...2.0.0) (2021-07-02)

### Changed

- 4x to 5x faster than the previous 1.4.0 release. At least 2x faster than Chardet.
- Emphasis has been placed on UTF-8 detection, which should perform nearly instantaneously.
- Backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
- The detection mechanism has been slightly improved; Turkish content is now detected correctly (most of the time)
- The program has been rewritten for readability and maintainability (now using static typing)
- utf_7 detection has been reinstated.

### Removed

- This package no longer requires anything when used with Python 3.5 (dropped cached_property)
- Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbo-Croatian.
- The exception hook on UnicodeDecodeError has been removed.

### Deprecated

- Methods coherence_non_latin, w_counter, and chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0

### Fixed

- The CLI output used the relative path of the file(s) instead of the absolute path.

## [1.4.1](https://github.com/Ousret/charset_normalizer/compare/1.4.0...1.4.1) (2021-05-28)

### Fixed

- Logger configuration/usage no longer conflicts with others (PR #44)

## [1.4.0](https://github.com/Ousret/charset_normalizer/compare/1.3.9...1.4.0) (2021-05-21)

### Removed

- Using standard logging instead of the loguru package.
- Dropping the nose test framework in favor of the maintained pytest.
- Chose not to use the dragonmapper package to help with gibberish Chinese/CJK text.
- Require cached_property only for Python 3.5 due to a constraint; dropping it for every other interpreter version.
- Stopped supporting UTF-7 that does not contain a SIG.
- Dropping PrettyTable, replaced with pure JSON output in the CLI.

### Fixed

- The BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even if obviously present, due to the sub-match factoring process.
- Not searching properly for the BOM when trying the utf32/16 parent codec.

### Changed

- Improved the final package size by compressing frequencies.json.
- Huge improvement on the largest payloads.

### Added

- CLI now produces JSON-consumable output.
- Return ASCII if the given sequences fit, with reasonable confidence.

## [1.3.9](https://github.com/Ousret/charset_normalizer/compare/1.3.8...1.3.9) (2021-05-13)

### Fixed

- In some very rare cases, you could end up getting encode/decode errors due to a bad bytes payload (PR #40)

## [1.3.8](https://github.com/Ousret/charset_normalizer/compare/1.3.7...1.3.8) (2021-05-12)

### Fixed

- An empty payload given for detection could cause an exception when accessing the `alphabets` property. (PR #39)

## [1.3.7](https://github.com/Ousret/charset_normalizer/compare/1.3.6...1.3.7) (2021-05-12)

### Fixed

- The legacy detect function should return UTF-8-SIG if a sig is present in the payload. (PR #38)

## [1.3.6](https://github.com/Ousret/charset_normalizer/compare/1.3.5...1.3.6) (2021-02-09)

### Changed

- Amend the previous release to allow prettytable 2.0 (PR #35)

## [1.3.5](https://github.com/Ousret/charset_normalizer/compare/1.3.4...1.3.5) (2021-02-08)

### Fixed

- Fix an error when using the package with a Python pre-release interpreter (PR #33)

### Changed

- Dependencies refactoring; constraints revised.

### Added

- Add Python 3.9 and 3.10 to the supported interpreters
MIT License

Copyright (c) 2025 TAHRI Ahmed R.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@ -0,0 +1,52 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content="simple503 version 0.4.0" />
<meta name="pypi:repository-version" content="1.0" />
<meta charset="UTF-8" />
<title>
Links for charset-normalizer
</title>
</head>
<body>
<h1>
Links for charset-normalizer
</h1>
<a href="/charset-normalizer/charset_normalizer-3.4.4-py3-none-any.whl#sha256=7a32c560861a02ff789ad905a2fe94e3f840803362c84fecf1851cb4cf3dc37f" data-requires-python=">=3.7" data-dist-info-metadata="sha256=8d5b94141b62f1d6afd7d60bbd68acb138a155d176a33518e0a28cc3b8dd9014">
charset_normalizer-3.4.4-py3-none-any.whl
</a>
<br />
<a href="/charset-normalizer/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_x86_64.whl#sha256=ebf3e58c7ec8a8bed6d66a75d7fb37b55e5015b03ceae72a8e7c74495551e224" data-requires-python=">=3.7" data-dist-info-metadata="sha256=8d5b94141b62f1d6afd7d60bbd68acb138a155d176a33518e0a28cc3b8dd9014">
charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_x86_64.whl
</a>
<br />
<a href="/charset-normalizer/charset_normalizer-3.4.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl#sha256=840c25fb618a231545cbab0564a799f101b63b9901f2569faecd6b222ac72381" data-requires-python=">=3.7" data-dist-info-metadata="sha256=8d5b94141b62f1d6afd7d60bbd68acb138a155d176a33518e0a28cc3b8dd9014">
charset_normalizer-3.4.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
</a>
<br />
<a href="/charset-normalizer/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_x86_64.whl#sha256=cc00f04ed596e9dc0da42ed17ac5e596c6ccba999ba6bd92b0e0aef2f170f2d6" data-requires-python=">=3.7" data-dist-info-metadata="sha256=8d5b94141b62f1d6afd7d60bbd68acb138a155d176a33518e0a28cc3b8dd9014">
charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_x86_64.whl
</a>
<br />
<a href="/charset-normalizer/charset_normalizer-3.4.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl#sha256=9d1bb833febdff5c8927f922386db610b49db6e0d4f4ee29601d71e7c2694313" data-requires-python=">=3.7" data-dist-info-metadata="sha256=8d5b94141b62f1d6afd7d60bbd68acb138a155d176a33518e0a28cc3b8dd9014">
charset_normalizer-3.4.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
</a>
<br />
<a href="/charset-normalizer/charset_normalizer-3.4.4-cp39-cp39-musllinux_1_2_x86_64.whl#sha256=cb01158d8b88ee68f15949894ccc6712278243d95f344770fa7593fa2d94410c" data-requires-python=">=3.7" data-dist-info-metadata="sha256=8d5b94141b62f1d6afd7d60bbd68acb138a155d176a33518e0a28cc3b8dd9014">
charset_normalizer-3.4.4-cp39-cp39-musllinux_1_2_x86_64.whl
</a>
<br />
<a href="/charset-normalizer/charset_normalizer-3.4.4-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl#sha256=4fe7859a4e3e8457458e2ff592f15ccb02f3da787fcd31e0183879c3ad4692a1" data-requires-python=">=3.7" data-dist-info-metadata="sha256=8d5b94141b62f1d6afd7d60bbd68acb138a155d176a33518e0a28cc3b8dd9014">
charset_normalizer-3.4.4-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
</a>
<br />
<a href="/charset-normalizer/charset_normalizer-3.4.4-cp38-cp38-musllinux_1_2_x86_64.whl#sha256=5cb4d72eea50c8868f5288b7f7f33ed276118325c1dfd3957089f6b519e1382a" data-requires-python=">=3.7" data-dist-info-metadata="sha256=47aaaa4790e1bdc8c54ab5bf6e35ee86e979b65a95d41e888c72639919cbb5c3">
charset_normalizer-3.4.4-cp38-cp38-musllinux_1_2_x86_64.whl
</a>
<br />
<a href="/charset-normalizer/charset_normalizer-3.4.4-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl#sha256=f155a433c2ec037d4e8df17d18922c3a0d9b3232a396690f17175d2946f0218d" data-requires-python=">=3.7" data-dist-info-metadata="sha256=47aaaa4790e1bdc8c54ab5bf6e35ee86e979b65a95d41e888c72639919cbb5c3">
charset_normalizer-3.4.4-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
</a>
<br />
</body>
</html>
Binary file not shown.
@ -0,0 +1,191 @@
Metadata-Version: 2.4
Name: choreographer
Version: 1.2.1
Summary: Devtools Protocol implementation for chrome.
Author-email: Andrew Pikul <ajpikul@gmail.com>, Neyberson Atencio <neyberatencio@gmail.com>
Maintainer-email: Andrew Pikul <ajpikul@gmail.com>
License: # MIT License

Copyright (c) Plotly, Inc.

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

Project-URL: Homepage, https://github.com/plotly/choreographer
Project-URL: Repository, https://github.com/plotly/choreographer
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: logistro>=2.0.1
Requires-Dist: simplejson>=3.19.3
Dynamic: license-file

# Choreographer

choreographer allows remote control of browsers from Python.
It was created to support image generation from browser-based charting tools,
but can be used for other purposes as well.

choreographer is available on [PyPI](https://pypi.org/project/choreographer) and [GitHub](https://github.com/plotly/choreographer).

## Wait—I Thought This Was Kaleido?

[Kaleido][kaleido] is a cross-platform library for generating static images of
plots. The original implementation included a custom build of Chrome, which has
proven very difficult to maintain. In contrast, this package uses the Chrome
binary on the user's machine in the same way as testing tools like
[Puppeteer][puppeteer]; the next step is to re-implement Kaleido as a layer on
top of it.

## Status

choreographer is a work in progress: only Chrome-ish browsers are supported at
the moment, though we hope to add others. (Pull requests are greatly
appreciated.)

Note that we strongly recommend using async/await with this package, but it is
not absolutely required. The synchronous functions in this package are intended
as building blocks for other asynchronous strategies that Python may favor over
async/await in the future.

## Testing

### Process Control Tests

- Verbose: `pytest -W error -vvv tests/test_process.py`
- Quiet: `pytest -W error -v tests/test_process.py`

### Browser Interaction Tests

- Verbose: `pytest --debug -W error -vvv --ignore=tests/test_process.py`
- Quiet: `pytest -W error -v --ignore=tests/test_process.py`

You can also add "--no-headless" if you want to see the browser pop up.

### Writing Tests

- Separate async and sync test files. Add `_sync.py` to the names of synchronous test files.
- For process tests, copy the fixtures in the `test_process.py` file.
- For API tests, use `test_placeholder.py` as the minimum template.

## Help Wanted

We need your help to test this package on different platforms
and for different use cases.
To get started:

1. Clone this repository.
1. Create and activate a Python virtual environment.
1. Install this repository using `pip install .` or the equivalent.
1. Run `dtdoctor` and paste the output into an issue in this repository.

## Quickstart with `asyncio`

Save the following code to `example.py` and run it with Python.

```python
import asyncio
import choreographer as choreo


async def example():
    browser = await choreo.Browser(headless=False)
    tab = await browser.create_tab("https://google.com")
    await asyncio.sleep(3)
    await tab.send_command("Page.navigate", params={"url": "https://github.com"})
    await asyncio.sleep(3)


if __name__ == "__main__":
    asyncio.run(example())
```

Step by step, this example:

1. Imports the required libraries.
1. Defines an `async` function
   (because `await` can only be used inside `async` functions).
1. Asks choreographer to create a browser.
   `headless=False` tells it to display the browser on the screen;
   the default is no display.
1. Creates a tab and waits three seconds.
   (Note that users can't rearrange programmatically-generated tabs using the
   mouse, but that's OK: we're not trying to replace testing tools like
   [Puppeteer][puppeteer].)
1. Navigates that tab to another URL.
1. Sleeps again.
1. Runs the example function.

See [the devtools reference][devtools-ref] for a list of possible commands.

### Subscribing to Events

Try adding the following to the example shown above:

```python
# Callback for printing result
async def dump_event(response):
    print(str(response))


# Callback for raising result as error
async def error_event(response):
    raise Exception(str(response))


browser.subscribe("Target.targetCrashed", error_event)
new_tab.subscribe("Page.loadEventFired", dump_event)
browser.subscribe("Target.*", dump_event)  # dumps all "Target" events
response = await new_tab.subscribe_once("Page.lifecycleEvent")
# do something with response
browser.unsubscribe("Target.*")
# events are always sent to a browser or tab,
# but the documentation isn't always clear which.
# Dumping all: `browser.subscribe("*", dump_event)` (on tab too)
# can be useful (but verbose) for debugging.
```

## Synchronous Use

You can use this library without `asyncio`:

```python
my_browser = choreo.Browser()  # blocking until open
```

However, you must then call `browser.pipe.read_jsons(blocking=True|False)`
manually and organize the results yourself (see the sketch at the end of this
section).

`browser.run_output_thread()` starts another thread that constantly prints
messages received from the browser, but it can't be used with `asyncio`,
nor will it play nicely with any other read.

In other words, unless you're really, really sure you know what you're doing,
use `asyncio`.
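
Below is a minimal sketch of the manual polling loop implied above. It uses only the calls the README names, and it assumes (not stated above) that `read_jsons` returns a list of decoded messages:

```python
# Synchronous use: drain the browser pipe yourself, as described above.
import choreographer as choreo

browser = choreo.Browser()  # blocking until open

# Without an event loop there is no dispatcher, so every message the
# browser sends arrives here and must be routed by hand.
for _ in range(10):  # arbitrary bound, just for this sketch
    for message in browser.pipe.read_jsons(blocking=True):
        print(message)  # organize/dispatch the results as needed
```
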
## Low-Level Use

We provide a `Browser` and `Tab` interface, but there are lower-level `Target`
and `Session` interfaces if needed.

[devtools-ref]: https://chromedevtools.github.io/devtools-protocol/
[kaleido]: https://pypi.org/project/kaleido/
[puppeteer]: https://pptr.dev/
@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content="simple503 version 0.4.0" />
<meta name="pypi:repository-version" content="1.0" />
<meta charset="UTF-8" />
<title>
Links for choreographer
</title>
</head>
<body>
<h1>
Links for choreographer
</h1>
<a href="/choreographer/choreographer-1.2.1-py3-none-any.whl#sha256=9af5385effa3c204dbc337abf7ac74fd8908ced326a15645dc31dde75718c77e" data-requires-python=">=3.8" data-dist-info-metadata="sha256=3a851fd5d74a1119d96770e59a3696f059066ca75809eda394166e941475ae9c">
choreographer-1.2.1-py3-none-any.whl
</a>
<br />
</body>
</html>