
This function checks whether a file or dataset is in a valid parquet format. It prints the number of lines and columns and returns a tibble with information about the columns.





path to the file or dataset


a tibble with information on the parquet dataset/file's columns, with two columns: field name and arrow type


This function will:

* open the parquet dataset/file to check that it is valid
* print the number of lines
* print the number of columns
* return a tibble with 2 columns:
  * the column name (string)
  * the arrow type (string)

You can find a list of arrow types in the arrow documentation.
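The steps listed above can be approximated directly with the arrow package. The following is a minimal sketch, not the package's own implementation: `describe_parquet()` is a hypothetical helper name, and it assumes that `arrow::open_dataset()` failing with an error is an acceptable signal that the input is not valid parquet.

```r
library(arrow)
library(tibble)

# Hypothetical helper (not part of the documented package): mimics the
# documented steps using arrow only.
describe_parquet <- function(path) {
  # open_dataset() accepts a single file or a dataset directory and raises
  # an error if the input cannot be read as parquet
  ds <- arrow::open_dataset(path)

  # print the number of lines and columns
  message("number of lines:   ", nrow(ds))
  message("number of columns: ", ncol(ds))

  # build a two-column tibble from the schema: field name and arrow type name
  fields <- ds$schema$fields
  tibble::tibble(
    name = vapply(fields, function(f) f$name, character(1)),
    type = vapply(fields, function(f) f$type$name, character(1))
  )
}
```

Calling `describe_parquet()` on a parquet file or dataset directory returns one row per column, which makes it easy to compare schemas between two outputs.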


# check a parquet file
#>  checking: /home/runner/work/_temp/Library/parquetize/extdata/iris.parquet
#>  loading dataset:   ok
#>  number of lines:   150
#>  number of columns: 5
#> # A tibble: 5 × 2
#>   name         type      
#>   <chr>        <chr>     
#> 1 Sepal.Length double    
#> 2 Sepal.Width  double    
#> 3 Petal.Length double    
#> 4 Petal.Width  double    
#> 5 Species      dictionary

# check a parquet dataset
#>  checking: /home/runner/work/_temp/Library/parquetize/extdata/iris_dataset
#>  loading dataset:   ok
#>  number of lines:   150
#>  number of columns: 5
#> # A tibble: 5 × 2
#>   name         type  
#>   <chr>        <chr> 
#> 1 Sepal.Length double
#> 2 Sepal.Width  double
#> 3 Petal.Length double
#> 4 Petal.Width  double
#> 5 Species      utf8