Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the
Working on byte-sized tokens, transformers scale poorly because each token needs to "attend" to every other token, so the cost of attention grows quadratically with sequence length.
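To make the scaling argument concrete, here is a minimal sketch that counts the pairwise attention scores a single full self-attention head would compute for the same text at byte level and at a coarser subword level. The helper name `attention_pairs`, the sample sentence, and the figure of roughly 4 bytes per subword token are illustrative assumptions, not values from any particular model or tokenizer.

```python
def attention_pairs(seq_len: int) -> int:
    """Count query-key score entries for one full self-attention head.

    Every token attends to every token (itself included), so the score
    matrix has seq_len * seq_len entries.
    """
    return seq_len * seq_len


# Hypothetical sample text, used only for illustration.
text = "State space models scale linearly with sequence length."

# Byte-level tokenization: one token per UTF-8 byte.
byte_tokens = len(text.encode("utf-8"))

# Assumption: a subword tokenizer averages about 4 bytes per token.
# Ceiling division gives the approximate subword token count.
ASSUMED_BYTES_PER_SUBWORD = 4
subword_tokens = -(-byte_tokens // ASSUMED_BYTES_PER_SUBWORD)

for name, n in (("bytes", byte_tokens), ("subwords", subword_tokens)):
    print(f"{name}: {n} tokens, {attention_pairs(n)} attention scores")
```

Because the count is quadratic, a sequence that is k times longer needs about k squared times as many scores, which is why moving from subword tokens to bytes is so costly for full attention.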