/content/project
TTSTrainer start 528.226516734
Initializing trainer with hparams:
{'attention_dim': 128,
 'attention_location_kernel_size': 31,
 'attention_location_n_filters': 32,
 'attention_rnn_dim': 1024,
 'batch_size': 10,
 'checkpoint_name': 'my-very-epic-model',
 'checkpoint_path': '/content/drive/MyDrive/tacotron',
 'coarse_n_frames_per_step': None,
 'config': 'tacotron2_config.json',
 'cudnn_enabled': True,
 'dataset_path': '.',
 'debug': False,
 'decay_rate': 8000,
 'decay_start': 15000,
 'decoder_rnn_dim': 1024,
 'distributed_run': False,
 'encoder_embedding_dim': 512,
 'encoder_kernel_size': 5,
 'encoder_n_convolutions': 3,
 'epochs': 69420,
 'epochs_per_checkpoint': 10,
 'filter_length': 1024,
 'fp16_run': False,
 'gate_threshold': 0.5,
 'grad_clip_thresh': 1.0,
 'gst_dim': 2304,
 'gst_type': 'torchmoji',
 'has_speaker_embedding': True,
 'hop_length': 256,
 'ignore_layers': ['speaker_embedding.weight', 'spkr_lin.weight', 'spkr_lin.bias'],
 'include_f0': False,
 'is_validate': True,
 'learning_rate': 0.001118033988749895,
 'log_dir': '/content/project/logs',
 'lrdecay_min': 0.00011180339887498949,
 'lrdecay_start': 150,
 'lrdecay_steps': 350,
 'mask_padding': True,
 'max_decoder_steps': 1000,
 'max_wav_value': 32768.0,
 'mel_fmax': 8000,
 'mel_fmin': 0,
 'n_frames_per_step_initial': 1,
 'n_mel_channels': 80,
 'n_speakers': 1,
 'num_heads': 8,
 'p_arpabet': 1.0,
 'p_attention_dropout': 0.1,
 'p_decoder_dropout': 0.1,
 'p_teacher_forcing': 1.0,
 'pos_weight': None,
 'postnet_embedding_dim': 512,
 'postnet_kernel_size': 5,
 'postnet_n_convolutions': 5,
 'prenet_dim': 256,
 'prenet_f0_dim': 1,
 'prenet_f0_kernel_size': 1,
 'prenet_f0_n_layers': 1,
 'prenet_fms_kernel_size': 1,
 'prenet_rms_dim': 0,
 'reduction_window_schedule': [{'batch_size': 16, 'n_frames_per_step': 1, 'until_step': 10000},
                               {'batch_size': 16, 'n_frames_per_step': 1, 'until_step': 50000},
                               {'batch_size': 16, 'n_frames_per_step': 1, 'until_step': 60000},
                               {'batch_size': 16, 'n_frames_per_step': 1, 'until_step': 70000},
                               {'batch_size': 16, 'n_frames_per_step': 1, 'until_step': None}],
 'ref_enc_filters': [32, 32, 64, 64, 128, 128],
 'ref_enc_gru_size': 128,
 'ref_enc_pad': [1, 1],
 'ref_enc_size': [3, 3],
 'ref_enc_strides': [2, 2],
 'sample_inference_speaker_ids': [0],
 'sample_inference_text': 'That quick beige fox jumped in the air loudly over '
                          'each thin dog.',
 'sampling_rate': 22050,
 'seed': 123,
 'speaker_embedding_dim': 128,
 'steps_per_sample': 15,
 'symbol_set': 'nvidia_taco2',
 'symbols_embedding_dim': 512,
 'text_cleaners': ['english_cleaners'],
 'torchmoji_model_file': 'pytorch_model.bin',
 'torchmoji_vocabulary_file': 'vocabulary.json',
 'training_audiopaths_and_text': 'transcription.txt',
 'val_audiopaths_and_text': 'transcription_val.txt',
 'warm_start_name': '/content/base_aitch.pt',
 'weight_decay': 1e-06,
 'win_length': 1024,
 'with_gst': True}
/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/models/torchmoji.py:1475: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
  nn.init.uniform(self.embed.weight.data, a=-0.5, b=0.5)
/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/models/torchmoji.py:1477: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  nn.init.xavier_uniform(t)
/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/models/torchmoji.py:1479: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
  nn.init.orthogonal(t)
/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/models/torchmoji.py:1481: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  nn.init.constant(t, 0)
/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/models/torchmoji.py:1483: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  nn.init.xavier_uniform(self.output_layer[0].weight.data)
start train 529.488128475
/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/models/common.py:230: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = pad_center(fft_window, filter_length)
/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/models/common.py:357: FutureWarning: Pass sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  mel_basis = librosa_mel(
Initialized Torchmoji GST
Starting warm_start 533.831198587
WARNING! Attempting to load a model with out the speaker_embedding.weight layer. This could lead to unexpected results during evaluation.
Ending warm_start 533.91961055
Error while getting data: ['speakers/0000_WAV/WAV/8.wav', 'Easier said than done, I thought you were the one listening to my movie.', '0']
[Errno 2] No such file or directory: 'speakers/0000_WAV/WAV/8.wav'
Exception raised while training: [Errno 2] No such file or directory: 'speakers/0000_WAV/WAV/8.wav'
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/exec/train_tacotron2.py", line 49, in <module>
    run(None, None, hparams)
  File "/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/exec/train_tacotron2.py", line 30, in run
    raise e
  File "/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/exec/train_tacotron2.py", line 26, in run
    trainer.train()
  File "/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/trainer/tacotron2.py", line 462, in train
    for batch_idx, batch in enumerate(train_loader):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 671, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/data_loader.py", line 232, in __getitem__
    data = self._get_data(self.audiopaths_and_text[idx])
  File "/usr/local/lib/python3.9/dist-packages/uberduck_ml_dev/data_loader.py", line 190, in _get_data
    sampling_rate, wav_data = read(path)
  File "/usr/local/lib/python3.9/dist-packages/scipy/io/wavfile.py", line 647, in read
    fid = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'speakers/0000_WAV/WAV/8.wav'
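The run dies inside the data loader: scipy.io.wavfile.read is handed the relative path 'speakers/0000_WAV/WAV/8.wav' from the training filelist (transcription.txt), and a relative path is resolved against the process's current working directory, so the file is either missing or the notebook is not running from the directory the filelist expects. One way to catch every bad entry before starting a long run is to check the filelists up front. This is a minimal sketch under stated assumptions: each line is assumed to be "path|text|speaker_id" (the log shows three fields but not the delimiter), paths are checked relative to a base directory of ".", and find_missing_audio is a hypothetical helper, not part of uberduck_ml_dev.

```python
import os
from typing import List, Tuple


def find_missing_audio(
    filelist_path: str, base_dir: str = ".", delimiter: str = "|"
) -> List[Tuple[int, str]]:
    """Return (line_number, audio_path) for each entry whose audio file is missing.

    Assumption: every non-empty line looks like "path|text|speaker_id".
    Change `delimiter` if your filelist uses a different separator.
    """
    missing = []
    with open(filelist_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, but keep counting them in line_no
            audio_path = line.split(delimiter, 1)[0]
            # Relative paths are joined onto base_dir; with base_dir="." this
            # matches how open() resolves them against the working directory.
            full_path = os.path.join(base_dir, audio_path)
            if not os.path.isfile(full_path):
                missing.append((line_no, audio_path))
    return missing


if __name__ == "__main__":
    # Filelist names taken from the hparams in the log (hypothetical usage).
    for filelist in ("transcription.txt", "transcription_val.txt"):
        if not os.path.exists(filelist):
            print(f"{filelist}: filelist not found")
            continue
        missing = find_missing_audio(filelist)
        for line_no, path in missing:
            print(f"{filelist}:{line_no}: missing {path}")
        print(f"{filelist}: {len(missing)} missing file(s)")
```

The check only confirms that each path exists as a regular file; an entry that exists but is not a readable WAV would still fail later in scipy.io.wavfile.read.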