https://github.com/huggingface/tokenizers

bert gpt language-model natural-language-processing natural-language-understanding nlp transformers

Last synced: about 1 month ago

Repository metadata:

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production


Owner metadata:


Committers metadata

Last synced: about 2 months ago

Total Commits: 1,719
Total Committers: 112
Avg Commits per committer: 15.348
Development Distribution Score (DDS): 0.457

Commits in past year: 95
Committers in past year: 28
Avg Commits per committer in past year: 3.393
Development Distribution Score (DDS) in past year: 0.705
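
The derived figures above can be reproduced from the raw counts. The sketch below assumes the common definition of the Development Distribution Score, namely one minus the share of commits made by the single most active committer; the page itself does not state the formula, so treat it as an assumption that happens to match the listed all-time values (934 commits by the top committer out of 1,719). The function names are illustrative only.

```python
def average_commits(total_commits: int, total_committers: int) -> float:
    """Mean number of commits per committer."""
    if total_committers <= 0:
        raise ValueError("total_committers must be positive")
    return total_commits / total_committers


def development_distribution_score(top_committer_commits: int, total_commits: int) -> float:
    """Assumed DDS definition: 1 - (top committer's commits / all commits).

    A value near 0 means one person wrote almost everything;
    a value near 1 means work is spread across many committers.
    """
    if total_commits <= 0:
        raise ValueError("total_commits must be positive")
    return 1 - top_committer_commits / total_commits


# All-time figures taken from the listing above.
print(round(average_commits(1719, 112), 3))                 # 15.348
print(round(development_distribution_score(934, 1719), 3))  # 0.457
```

Under the same assumption, the higher past-year score (0.705 versus 0.457) indicates that recent commits are spread more evenly across committers than the project's full history.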

Name Email Commits
Anthony MOI m****i@g****m 934
Nicolas Patry p****s@p****m 253
Pierric Cistac p****c@h****o 136
epwalsh e****0@g****m 48
Arthur Zucker a****r@g****m 45
dependabot[bot] 4****] 32
Arthur 4****r 27
Sebastian Pütz s****z@u****e 26
Mishig Davaadorj d****g@g****m 22
Morgan Funtowicz m****n@h****o 17
Funtowicz Morgan m****z 9
Morgan Funtowicz f****o@g****m 9
Bjarte Johansen b****n@g****m 6
Chris Ha h****9@g****m 6
Sylvain Gugger s****r@g****m 6
thomwolf t****f@g****m 6
Bjarte Johansen b****h@e****m 5
Roy Hvaara h****a@g****m 5
Connor Boyle c****o@g****m 4
Clement c****e@g****m 4
Luc Georges M****e 4
Lysandre l****t@r****r 3
Julien Chaumond c****d@g****m 3
dctelus 9****s 3
François Garillot f****s@g****t 3
mert-kurttutan k****t@g****m 2
SeongBeomLEE 2****r@n****m 2
Sebastian Pütz s****z@g****m 2
Mario Šaško m****7@g****m 2
Lucain l****p@g****m 2
and 82 more...

Issue and Pull Request metadata

Last synced: about 1 month ago


Package metadata

pypi: tokenizers

  • Homepage: https://github.com/huggingface/tokenizers
  • Documentation: https://tokenizers.readthedocs.io/
  • Licenses: Apache Software License
  • Latest release: 0.19.1 (published 8 months ago)
  • Last Synced: 2024-11-05T17:17:00.174Z (about 2 months ago)
  • Versions: 102
  • Dependent Packages: 380
  • Dependent Repositories: 14,571
  • Downloads: 30,374,895 Last month
  • Docker Downloads: 42,285,347
  • Rankings:
    • Downloads: 0.057%
    • Dependent repos count: 0.068%
    • Dependent packages count: 0.086%
    • Docker downloads count: 0.599%
    • Stargazers count: 0.619%
    • Average: 0.626%
    • Forks count: 2.329%
  • Maintainers (4)

cargo: tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

  • Homepage: https://github.com/huggingface/tokenizers
  • Documentation: https://docs.rs/tokenizers/
  • Licenses: Apache-2.0
  • Latest release: 0.19.1 (published 8 months ago)
  • Last Synced: 2024-11-05T17:37:04.891Z (about 2 months ago)
  • Versions: 29
  • Dependent Packages: 60
  • Dependent Repositories: 281
  • Downloads: 1,415,780 Total
  • Docker Downloads: 23,287,869
  • Rankings:
    • Stargazers count: 1.247%
    • Forks count: 1.477%
    • Dependent repos count: 2.361%
    • Average: 2.519%
    • Dependent packages count: 2.982%
    • Docker downloads count: 3.074%
    • Downloads: 3.97%
  • Maintainers (3)

npm: tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

  • Homepage: https://github.com/huggingface/tokenizers/tree/master/bindings/node
  • Licenses: Apache-2.0
  • Latest release: 0.13.3 (published over 1 year ago)
  • Last Synced: 2024-11-11T00:39:09.889Z (about 1 month ago)
  • Versions: 38
  • Dependent Packages: 6
  • Dependent Repositories: 23
  • Downloads: 1,480 Last month
  • Docker Downloads: 130
  • Rankings:
    • Stargazers count: 1.166%
    • Docker downloads count: 1.479%
    • Forks count: 1.502%
    • Dependent repos count: 2.673%
    • Average: 3.134%
    • Dependent packages count: 4.385%
    • Downloads: 7.598%
  • Maintainers (4)

go: github.com/huggingface/tokenizers

  • Homepage:
  • Documentation: https://pkg.go.dev/github.com/huggingface/tokenizers#section-documentation
  • Licenses: apache-2.0
  • Latest release: v0.19.1 (published 8 months ago)
  • Last Synced: 2024-11-11T00:40:09.370Z (about 1 month ago)
  • Versions: 33
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Rankings:
    • Stargazers count: 0.809%
    • Forks count: 1.187%
    • Average: 3.794%
    • Dependent repos count: 4.802%
    • Dependent packages count: 8.376%

alpine: py3-tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

  • Homepage: https://github.com/huggingface/tokenizers
  • Licenses: Apache-2.0
  • Latest release: 0.15.2-r1 (published 8 months ago)
  • Last Synced: 2024-11-06T01:01:46.940Z (about 2 months ago)
  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 2.289%
    • Forks count: 4.25%
    • Average: 5.295%
    • Dependent packages count: 14.641%
  • Maintainers (1)

conda: tokenizers

  • Homepage: https://pypi.org/project/tokenizers/
  • Licenses: Apache-2.0
  • Latest release: 0.13.1 (published about 2 years ago)
  • Last Synced: 2024-10-29T18:23:03.906Z (about 2 months ago)
  • Versions: 16
  • Dependent Packages: 6
  • Dependent Repositories: 35
  • Rankings:
    • Stargazers count: 4.188%
    • Dependent repos count: 6.114%
    • Average: 6.559%
    • Forks count: 6.898%
    • Dependent packages count: 9.034%

alpine: py3-tokenizers-pyc

Precompiled Python bytecode for py3-tokenizers

  • Homepage: https://github.com/huggingface/tokenizers
  • Licenses: Apache-2.0
  • Latest release: 0.15.2-r1 (published 8 months ago)
  • Last Synced: 2024-11-06T01:01:59.518Z (about 2 months ago)
  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Average: 6.693%
    • Dependent packages count: 13.386%
  • Maintainers (1)

spack: py-tokenizers

Fast and Customizable Tokenizers.

  • Homepage: https://github.com/huggingface/tokenizers
  • Licenses: none listed
  • Latest release: 0.15.0 (published about 1 year ago)
  • Last Synced: 2024-10-29T08:30:40.259Z (about 2 months ago)
  • Versions: 7
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 1.653%
    • Forks count: 3.946%
    • Average: 8.417%
    • Dependent packages count: 28.067%
  • Maintainers (1)

conda: tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

  • Homepage: https://github.com/huggingface/tokenizers
  • Licenses: Apache-2.0
  • Latest release: 0.15.1 (published 10 months ago)
  • Last Synced: 2024-10-29T18:21:47.374Z (about 2 months ago)
  • Versions: 7
  • Dependent Packages: 3
  • Dependent Repositories: 35
  • Rankings:
    • Stargazers count: 9.936%
    • Forks count: 14.321%
    • Average: 23.106%
    • Dependent repos count: 27.231%
    • Dependent packages count: 40.938%

pypi: divyanx-tokenizers

  • Homepage: https://github.com/huggingface/tokenizers
  • Documentation: https://divyanx-tokenizers.readthedocs.io/
  • Licenses: Apache Software License
  • Latest release:
  • Last Synced: 2024-11-11T00:38:57.799Z (about 1 month ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 41 Last month
  • Rankings:
    • Dependent packages count: 10.454%
    • Average: 34.65%
    • Dependent repos count: 58.847%
  • Maintainers (1)

pypi: real-wordpiece

A score-based implementation of WordPiece tokenization training, compatible with HuggingFace tokenizers.

  • Homepage:
  • Documentation: https://real-wordpiece.readthedocs.io/
  • Licenses: Apache-2.0
  • Latest release:
  • Last Synced: 2024-11-11T00:38:47.457Z (about 1 month ago)
  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 241 Last month
  • Rankings:
    • Dependent packages count: 10.718%
    • Average: 35.542%
    • Dependent repos count: 60.366%
  • Maintainers (1)

pypi: charylu-tokenizer

Library of tokenizers created by Luis Chary

  • Homepage:
  • Documentation: https://charylu-tokenizer.readthedocs.io/
  • Licenses: apache-2.0
  • Latest release:
  • Last Synced: 2024-11-11T00:38:46.501Z (about 1 month ago)
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 307 Last month
  • Rankings:
    • Dependent packages count: 10.763%
    • Average: 35.69%
    • Dependent repos count: 60.617%
  • Maintainers (1)

pypi: tokenizers-gt

  • Homepage: https://github.com/huggingface/tokenizers
  • Documentation: https://tokenizers-gt.readthedocs.io/
  • Licenses: Apache Software License
  • Latest release: 0.15.2.post0 (published 10 months ago)
  • Last Synced: 2024-11-11T00:38:55.864Z (about 1 month ago)
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 5,655 Last month
  • Rankings:
    • Dependent packages count: 10.044%
    • Average: 38.708%
    • Dependent repos count: 67.371%
  • Maintainers (1)

npm: tokenizers-node

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

  • Homepage: https://github.com/huggingface/tokenizers/tree/master/bindings/node
  • Licenses: Apache-2.0
  • Latest release: 0.14.2-dev0 (published 9 months ago)
  • Last Synced: 2024-11-11T00:38:13.989Z (about 1 month ago)
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 2 Last month
  • Rankings:
    • Dependent repos count: 32.533%
    • Average: 39.609%
    • Dependent packages count: 46.685%
  • Maintainers (1)

Dependencies

.github/workflows/docs-check.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
  • actions/upload-artifact v2 composite
.github/workflows/node-release.yml actions
  • actions-rs/toolchain v1 composite
  • actions/cache v1 composite
  • actions/checkout v1 composite
  • actions/setup-node v1 composite
  • actions/setup-python v1 composite
.github/workflows/node.yml actions
  • actions-rs/cargo v1 composite
  • actions-rs/toolchain v1 composite
  • actions/cache v1 composite
  • actions/checkout v1 composite
  • actions/setup-node v1 composite
.github/workflows/python-release-conda.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v2 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/python-release-extra.yml actions
  • actions/checkout v1 composite
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
.github/workflows/python-release.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v2 composite
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
  • actions/setup-python v4 composite
.github/workflows/python.yml actions
  • actions-rs/cargo v1 composite
  • actions-rs/toolchain v1 composite
  • actions/cache v1 composite
  • actions/checkout v1 composite
  • actions/setup-python v2 composite
.github/workflows/rust-release.yml actions
  • actions-rs/toolchain v1 composite
  • actions/cache v1 composite
  • actions/checkout v1 composite
.github/workflows/rust.yml actions
  • actions-rs/cargo v1 composite
  • actions-rs/toolchain v1 composite
  • actions/checkout v1 composite
bindings/python/Cargo.toml cargo
  • pyo3 0.17.2 development
  • tempfile 3.1 development
  • env_logger 0.7.1
  • itertools 0.9
  • libc 0.2
  • ndarray 0.13
  • numpy 0.17.2
  • onig 6.0
  • pyo3 0.17.2
  • rayon 1.3
  • serde 1.0
  • serde_json 1.0
  • tokenizers *
tokenizers/Cargo.toml cargo
  • assert_approx_eq 1.1 development
  • criterion 0.4 development
  • tempfile 3.1 development
  • aho-corasick 0.7
  • cached-path 0.6
  • clap 4.0
  • derive_builder 0.12
  • dirs 3.0
  • esaxx-rs 0.1
  • fancy-regex 0.10
  • getrandom 0.2.6
  • indicatif 0.15
  • itertools 0.9
  • lazy_static 1.4
  • log 0.4
  • macro_rules_attribute 0.1.2
  • onig 6.0
  • paste 1.0.6
  • rand 0.8
  • rayon 1.3
  • rayon-cond 0.1
  • regex 1.3
  • regex-syntax 0.6
  • reqwest 0.11
  • serde 1.0
  • serde_json 1.0
  • spm_precompiled 0.1
  • thiserror 1.0.30
  • unicode-normalization-alignments 0.1
  • unicode-segmentation 1.6
  • unicode_categories 0.1
tokenizers/examples/unstable_wasm/Cargo.toml cargo
  • wasm-bindgen-test 0.3.13 development
  • console_error_panic_hook 0.1.6
  • wasm-bindgen 0.2.63
  • wee_alloc 0.4.5
bindings/node/package-lock.json npm
  • 627 dependencies
bindings/node/package.json npm
  • @types/jest ^26.0.24 development
  • @typescript-eslint/eslint-plugin ^3.10.1 development
  • @typescript-eslint/parser ^3.10.1 development
  • eslint ^7.32.0 development
  • eslint-config-prettier ^6.15.0 development
  • eslint-plugin-jest ^23.20.0 development
  • eslint-plugin-jsdoc ^30.7.13 development
  • eslint-plugin-prettier ^3.4.1 development
  • eslint-plugin-simple-import-sort ^5.0.3 development
  • jest ^26.6.3 development
  • neon-cli ^0.9.1 development
  • prettier ^2.5.1 development
  • shelljs ^0.8.3 development
  • ts-jest ^26.5.6 development
  • typescript ^3.9.10 development
  • @types/node ^13.13.52
  • node-pre-gyp ^0.14.0
tokenizers/examples/unstable_wasm/www/package-lock.json npm
  • 312 dependencies
tokenizers/examples/unstable_wasm/www/package.json npm
  • copy-webpack-plugin ^11.0.0 development
  • webpack ^5.75.0 development
  • webpack-cli ^5.0.1 development
  • webpack-dev-server ^4.10.0 development
  • unstable_wasm file:../pkg