https://github.com/huggingface/tokenizers
bert gpt language-model natural-language-processing natural-language-understanding nlp transformers
Last synced: about 1 month ago
Repository metadata:
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
- Host: GitHub
- URL: https://github.com/huggingface/tokenizers
- Owner: huggingface
- License: apache-2.0
- Created: 2019-11-01T17:52:20.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2024-11-04T17:16:15.000Z (about 2 months ago)
- Last Synced: 2024-11-04T17:26:21.478Z (about 2 months ago)
- Topics: bert, gpt, language-model, natural-language-processing, natural-language-understanding, nlp, transformers
- Language: Rust
- Homepage: https://huggingface.co/docs/tokenizers
- Size: 9.8 MB
- Stars: 9,031
- Watchers: 120
- Forks: 796
- Open Issues: 56
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
  - Citation: CITATION.cff
Owner metadata:
- Name: Hugging Face
- Login: huggingface
- Email:
- Kind: organization
- Description: The AI community building the future.
- Website: https://huggingface.co/
- Location: NYC + Paris
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/25720743?v=4
- Repositories: 123
- Last Synced at: 2023-04-09T14:51:31.532Z
- Profile URL: https://github.com/huggingface
- Sponsor URL:
Committers metadata
Last synced: about 2 months ago
Total Commits: 1,719
Total Committers: 112
Avg Commits per committer: 15.348
Development Distribution Score (DDS): 0.457
Commits in past year: 95
Committers in past year: 28
Avg Commits per committer in past year: 3.393
Development Distribution Score (DDS) in past year: 0.705
Name | Email | Commits |
---|---|---|
Anthony MOI | m****i@g****m | 934 |
Nicolas Patry | p****s@p****m | 253 |
Pierric Cistac | p****c@h****o | 136 |
epwalsh | e****0@g****m | 48 |
Arthur Zucker | a****r@g****m | 45 |
dependabot[bot] | 4****] | 32 |
Arthur | 4****r | 27 |
Sebastian Pütz | s****z@u****e | 26 |
Mishig Davaadorj | d****g@g****m | 22 |
Morgan Funtowicz | m****n@h****o | 17 |
Funtowicz Morgan | m****z | 9 |
Morgan Funtowicz | f****o@g****m | 9 |
Bjarte Johansen | b****n@g****m | 6 |
Chris Ha | h****9@g****m | 6 |
Sylvain Gugger | s****r@g****m | 6 |
thomwolf | t****f@g****m | 6 |
Bjarte Johansen | b****h@e****m | 5 |
Roy Hvaara | h****a@g****m | 5 |
Connor Boyle | c****o@g****m | 4 |
Clement | c****e@g****m | 4 |
Luc Georges | M****e | 4 |
Lysandre | l****t@r****r | 3 |
Julien Chaumond | c****d@g****m | 3 |
dctelus | 9****s | 3 |
François Garillot | f****s@g****t | 3 |
mert-kurttutan | k****t@g****m | 2 |
SeongBeomLEE | 2****r@n****m | 2 |
Sebastian Pütz | s****z@g****m | 2 |
Mario Šaško | m****7@g****m | 2 |
Lucain | l****p@g****m | 2 |
and 82 more... |
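
The averages and the all-time DDS listed above can be reproduced from the commit counts on this page. The sketch below is a minimal illustration and assumes that DDS is defined as 1 minus the top committer's share of all commits; that definition is not stated on this page, but it is consistent with the figures shown. The function names are hypothetical.

```python
def avg_commits(total_commits: int, total_committers: int) -> float:
    """Average number of commits per committer."""
    return total_commits / total_committers


def dds(top_committer_commits: int, total_commits: int) -> float:
    """Development Distribution Score.

    Assumed definition (not stated on the page): 1 minus the share of
    commits made by the single most active committer.
    """
    return 1 - top_committer_commits / total_commits


# Figures taken from the committers metadata and table above.
all_time_avg = round(avg_commits(1719, 112), 3)
past_year_avg = round(avg_commits(95, 28), 3)
all_time_dds = round(dds(934, 1719), 3)  # 934 = commits by the top committer

print(all_time_avg, past_year_avg, all_time_dds)
```

The past-year DDS cannot be recomputed the same way, because the page does not list the top committer's commit count for the past year.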
Issue and Pull Request metadata
Last synced: about 1 month ago
Package metadata
- Total packages: 14
- Total downloads:
  - pypi: 30,381,139 last-month
  - cargo: 1,415,780 total
  - npm: 1,482 last-month
- Total docker downloads: 65,573,346
- Total dependent packages: 456 (may contain duplicates)
- Total dependent repositories: 14,946 (may contain duplicates)
- Total versions: 274
- Total maintainers: 18
pypi: tokenizers
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://tokenizers.readthedocs.io/
- Licenses: Apache Software License
- Latest release: 0.19.1 (published 8 months ago)
- Last Synced: 2024-11-05T17:17:00.174Z (about 2 months ago)
- Versions: 102
- Dependent Packages: 380
- Dependent Repositories: 14,571
- Downloads: 30,374,895 Last month
- Docker Downloads: 42,285,347
- Rankings:
  - Downloads: 0.057%
  - Dependent repos count: 0.068%
  - Dependent packages count: 0.086%
  - Docker downloads count: 0.599%
  - Stargazers count: 0.619%
  - Average: 0.626%
  - Forks count: 2.329%
- Maintainers (4)
cargo: tokenizers
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://docs.rs/tokenizers/
- Licenses: Apache-2.0
- Latest release: 0.19.1 (published 8 months ago)
- Last Synced: 2024-11-05T17:37:04.891Z (about 2 months ago)
- Versions: 29
- Dependent Packages: 60
- Dependent Repositories: 281
- Downloads: 1,415,780 Total
- Docker Downloads: 23,287,869
- Rankings:
  - Stargazers count: 1.247%
  - Forks count: 1.477%
  - Dependent repos count: 2.361%
  - Average: 2.519%
  - Dependent packages count: 2.982%
  - Docker downloads count: 3.074%
  - Downloads: 3.97%
- Maintainers (3)
npm: tokenizers
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
- Homepage: https://github.com/huggingface/tokenizers/tree/master/bindings/node
- Licenses: Apache-2.0
- Latest release: 0.13.3 (published over 1 year ago)
- Last Synced: 2024-11-11T00:39:09.889Z (about 1 month ago)
- Versions: 38
- Dependent Packages: 6
- Dependent Repositories: 23
- Downloads: 1,480 Last month
- Docker Downloads: 130
- Rankings:
  - Stargazers count: 1.166%
  - Docker downloads count: 1.479%
  - Forks count: 1.502%
  - Dependent repos count: 2.673%
  - Average: 3.134%
  - Dependent packages count: 4.385%
  - Downloads: 7.598%
- Maintainers (4)
go: github.com/huggingface/tokenizers
- Homepage:
- Documentation: https://pkg.go.dev/github.com/huggingface/tokenizers#section-documentation
- Licenses: apache-2.0
- Latest release: v0.19.1 (published 8 months ago)
- Last Synced: 2024-11-11T00:40:09.370Z (about 1 month ago)
- Versions: 33
- Dependent Packages: 0
- Dependent Repositories: 1
- Rankings:
  - Stargazers count: 0.809%
  - Forks count: 1.187%
  - Average: 3.794%
  - Dependent repos count: 4.802%
  - Dependent packages count: 8.376%
alpine: py3-tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Licenses: Apache-2.0
- Latest release: 0.15.2-r1 (published 8 months ago)
- Last Synced: 2024-11-06T01:01:46.940Z (about 2 months ago)
- Versions: 13
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
  - Dependent repos count: 0.0%
  - Stargazers count: 2.289%
  - Forks count: 4.25%
  - Average: 5.295%
  - Dependent packages count: 14.641%
- Maintainers (1)
conda: tokenizers
- Homepage: https://pypi.org/project/tokenizers/
- Licenses: Apache-2.0
- Latest release: 0.13.1 (published about 2 years ago)
- Last Synced: 2024-10-29T18:23:03.906Z (about 2 months ago)
- Versions: 16
- Dependent Packages: 6
- Dependent Repositories: 35
- Rankings:
  - Stargazers count: 4.188%
  - Dependent repos count: 6.114%
  - Average: 6.559%
  - Forks count: 6.898%
  - Dependent packages count: 9.034%
alpine: py3-tokenizers-pyc
Precompiled Python bytecode for py3-tokenizers
- Homepage: https://github.com/huggingface/tokenizers
- Licenses: Apache-2.0
- Latest release: 0.15.2-r1 (published 8 months ago)
- Last Synced: 2024-11-06T01:01:59.518Z (about 2 months ago)
- Versions: 12
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
  - Dependent repos count: 0.0%
  - Average: 6.693%
  - Dependent packages count: 13.386%
- Maintainers (1)
spack: py-tokenizers
Fast and Customizable Tokenizers.
- Homepage: https://github.com/huggingface/tokenizers
- Licenses: []
- Latest release: 0.15.0 (published about 1 year ago)
- Last Synced: 2024-10-29T08:30:40.259Z (about 2 months ago)
- Versions: 7
- Dependent Packages: 1
- Dependent Repositories: 0
- Rankings:
  - Dependent repos count: 0.0%
  - Stargazers count: 1.653%
  - Forks count: 3.946%
  - Average: 8.417%
  - Dependent packages count: 28.067%
- Maintainers (1)
conda: tokenizers
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
- Homepage: https://github.com/huggingface/tokenizers
- Licenses: Apache-2.0
- Latest release: 0.15.1 (published 10 months ago)
- Last Synced: 2024-10-29T18:21:47.374Z (about 2 months ago)
- Versions: 7
- Dependent Packages: 3
- Dependent Repositories: 35
- Rankings:
  - Stargazers count: 9.936%
  - Forks count: 14.321%
  - Average: 23.106%
  - Dependent repos count: 27.231%
  - Dependent packages count: 40.938%
pypi: divyanx-tokenizers
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://divyanx-tokenizers.readthedocs.io/
- Licenses: Apache Software License
- Latest release:
- Last Synced: 2024-11-11T00:38:57.799Z (about 1 month ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 41 Last month
- Rankings:
  - Dependent packages count: 10.454%
  - Average: 34.65%
  - Dependent repos count: 58.847%
- Maintainers (1)
pypi: real-wordpiece
A score-based implementation of WordPiece tokenization training, compatible with HuggingFace tokenizers.
- Homepage:
- Documentation: https://real-wordpiece.readthedocs.io/
- Licenses: Apache-2.0
- Latest release:
- Last Synced: 2024-11-11T00:38:47.457Z (about 1 month ago)
- Versions: 7
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 241 Last month
- Rankings:
  - Dependent packages count: 10.718%
  - Average: 35.542%
  - Dependent repos count: 60.366%
- Maintainers (1)
pypi: charylu-tokenizer
Library of tokenizers created by Luis Chary
- Homepage:
- Documentation: https://charylu-tokenizer.readthedocs.io/
- Licenses: apache-2.0
- Latest release:
- Last Synced: 2024-11-11T00:38:46.501Z (about 1 month ago)
- Versions: 3
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 307 Last month
- Rankings:
  - Dependent packages count: 10.763%
  - Average: 35.69%
  - Dependent repos count: 60.617%
- Maintainers (1)
pypi: tokenizers-gt
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://tokenizers-gt.readthedocs.io/
- Licenses: Apache Software License
- Latest release: 0.15.2.post0 (published 10 months ago)
- Last Synced: 2024-11-11T00:38:55.864Z (about 1 month ago)
- Versions: 3
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 5,655 Last month
- Rankings:
  - Dependent packages count: 10.044%
  - Average: 38.708%
  - Dependent repos count: 67.371%
- Maintainers (1)
npm: tokenizers-node
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
- Homepage: https://github.com/huggingface/tokenizers/tree/master/bindings/node
- Licenses: Apache-2.0
- Latest release: 0.14.2-dev0 (published 9 months ago)
- Last Synced: 2024-11-11T00:38:13.989Z (about 1 month ago)
- Versions: 3
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 2 Last month
- Rankings:
  - Dependent repos count: 32.533%
  - Average: 39.609%
  - Dependent packages count: 46.685%
- Maintainers (1)
Dependencies
- actions-rs/toolchain v1 composite
- actions/checkout v1 composite
- actions/setup-python v1 composite
- actions/upload-artifact v2 composite
- actions-rs/toolchain v1 composite
- actions/cache v1 composite
- actions/checkout v1 composite
- actions/setup-node v1 composite
- actions/setup-python v1 composite
- actions-rs/cargo v1 composite
- actions-rs/toolchain v1 composite
- actions/cache v1 composite
- actions/checkout v1 composite
- actions/setup-node v1 composite
- actions-rs/toolchain v1 composite
- actions/checkout v2 composite
- conda-incubator/setup-miniconda v2 composite
- actions/checkout v1 composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
- actions-rs/toolchain v1 composite
- actions/checkout v2 composite
- actions/checkout v1 composite
- actions/setup-python v1 composite
- actions/setup-python v4 composite
- actions-rs/cargo v1 composite
- actions-rs/toolchain v1 composite
- actions/cache v1 composite
- actions/checkout v1 composite
- actions/setup-python v2 composite
- actions-rs/toolchain v1 composite
- actions/cache v1 composite
- actions/checkout v1 composite
- actions-rs/cargo v1 composite
- actions-rs/toolchain v1 composite
- actions/checkout v1 composite
- pyo3 0.17.2 development
- tempfile 3.1 development
- env_logger 0.7.1
- itertools 0.9
- libc 0.2
- ndarray 0.13
- numpy 0.17.2
- onig 6.0
- pyo3 0.17.2
- rayon 1.3
- serde 1.0
- serde_json 1.0
- tokenizers *
- assert_approx_eq 1.1 development
- criterion 0.4 development
- tempfile 3.1 development
- aho-corasick 0.7
- cached-path 0.6
- clap 4.0
- derive_builder 0.12
- dirs 3.0
- esaxx-rs 0.1
- fancy-regex 0.10
- getrandom 0.2.6
- indicatif 0.15
- itertools 0.9
- lazy_static 1.4
- log 0.4
- macro_rules_attribute 0.1.2
- onig 6.0
- paste 1.0.6
- rand 0.8
- rayon 1.3
- rayon-cond 0.1
- regex 1.3
- regex-syntax 0.6
- reqwest 0.11
- serde 1.0
- serde_json 1.0
- spm_precompiled 0.1
- thiserror 1.0.30
- unicode-normalization-alignments 0.1
- unicode-segmentation 1.6
- unicode_categories 0.1
- wasm-bindgen-test 0.3.13 development
- console_error_panic_hook 0.1.6
- wasm-bindgen 0.2.63
- wee_alloc 0.4.5
- 627 dependencies
- @types/jest ^26.0.24 development
- @typescript-eslint/eslint-plugin ^3.10.1 development
- @typescript-eslint/parser ^3.10.1 development
- eslint ^7.32.0 development
- eslint-config-prettier ^6.15.0 development
- eslint-plugin-jest ^23.20.0 development
- eslint-plugin-jsdoc ^30.7.13 development
- eslint-plugin-prettier ^3.4.1 development
- eslint-plugin-simple-import-sort ^5.0.3 development
- jest ^26.6.3 development
- neon-cli ^0.9.1 development
- prettier ^2.5.1 development
- shelljs ^0.8.3 development
- ts-jest ^26.5.6 development
- typescript ^3.9.10 development
- @types/node ^13.13.52
- node-pre-gyp ^0.14.0
- 312 dependencies
- copy-webpack-plugin ^11.0.0 development
- webpack ^5.75.0 development
- webpack-cli ^5.0.1 development
- webpack-dev-server ^4.10.0 development
- unstable_wasm file:../pkg