Hive Metastore Utils
About Hive Metastore
The Hive Metastore is a database with metadata for Hive tables.
To configure SparklySession to work with an external Hive Metastore, set the hive.metastore.uris option.
You can do this via the hive-site.xml file in the Spark config directory ($SPARK_HOME/conf/hive-site.xml):
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<n.n.n.n>:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
Alternatively, set it dynamically via the SparklySession options:
from sparkly import SparklySession

class MySession(SparklySession):
    options = {
        'hive.metastore.uris': 'thrift://<n.n.n.n>:9083',
    }
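Both configuration routes carry the same value. As a minimal sketch, the snippet below builds that value once and renders it both as a hive-site.xml property and as a SparklySession-style options dict. The helper names `metastore_uris` and `hive_site_property` and the host addresses are hypothetical placeholders; the only fact relied on is that hive.metastore.uris accepts a comma-separated list of thrift:// URIs.

```python
import xml.etree.ElementTree as ET

# Default Thrift port of the Hive Metastore, as used in the examples above.
DEFAULT_PORT = 9083


def metastore_uris(hosts, port=DEFAULT_PORT):
    """Build a comma-separated hive.metastore.uris value (hypothetical helper)."""
    if not hosts:
        raise ValueError('at least one metastore host is required')
    return ','.join('thrift://{}:{}'.format(host, port) for host in hosts)


def hive_site_property(value):
    """Render the <property> element for hive-site.xml (hypothetical helper)."""
    prop = ET.Element('property')
    ET.SubElement(prop, 'name').text = 'hive.metastore.uris'
    ET.SubElement(prop, 'value').text = value
    # encoding='unicode' returns a str without an XML declaration.
    return ET.tostring(prop, encoding='unicode')


# Placeholder hosts; replace them with your own metastore addresses.
uris = metastore_uris(['10.0.0.1', 'metastore.example.com'])

# The same value, in the shape used by SparklySession options.
options = {'hive.metastore.uris': uris}

print(hive_site_property(uris))
```

Keeping the URI construction in one place avoids the XML file and the session options drifting apart when a metastore host changes.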
Table management
Why: sometimes you need to do more than just create a table, e.g. check that it exists, rename it, or drop it.
from sparkly import SparklySession

spark = SparklySession()

# Rename and then drop the table, but only if it exists in the metastore.
if spark.catalog_ext.has_table('my_table'):
    spark.catalog_ext.rename_table('my_table', 'my_new_table')
    spark.catalog_ext.drop_table('my_new_table')
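For comparison, the helpers above correspond roughly to standard Spark SQL DDL statements that could otherwise be passed to spark.sql(). This is a sketch under assumptions, not what sparkly necessarily runs internally: the function names are hypothetical, and modelling checkfirst=True as DROP TABLE IF EXISTS is an assumption. Identifiers are quoted with backticks, doubling any embedded backtick.

```python
def quote_table(name):
    """Quote a possibly db-qualified table name, e.g. db.t -> `db`.`t` (hypothetical helper)."""
    return '.'.join('`{}`'.format(part.replace('`', '``')) for part in name.split('.'))


def drop_table_sql(table_name, checkfirst=True):
    """Build a DROP TABLE statement (hypothetical helper)."""
    # Assumption: checkfirst=True behaves like DROP TABLE IF EXISTS.
    if_exists = 'IF EXISTS ' if checkfirst else ''
    return 'DROP TABLE {}{}'.format(if_exists, quote_table(table_name))


def rename_table_sql(old_table_name, new_table_name):
    """Build an ALTER TABLE ... RENAME TO statement (hypothetical helper)."""
    return 'ALTER TABLE {} RENAME TO {}'.format(
        quote_table(old_table_name), quote_table(new_table_name))


print(rename_table_sql('my_table', 'my_new_table'))
print(drop_table_sql('my_new_table'))
```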
Table properties management
Why: sometimes you want to assign custom attributes to a table, e.g. creation time, last update, purpose, or data source. Out of the box, the only way to work with table properties in Spark is through raw SQL queries; we implemented a more convenient interface to keep your code cleaner.
from sparkly import SparklySession
spark = SparklySession()
spark.catalog_ext.set_table_property('my_table', 'foo', 'bar')
assert spark.catalog_ext.get_table_property('my_table', 'foo') == 'bar'
assert spark.catalog_ext.get_table_properties('my_table') == {'foo': 'bar'}
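To show what the raw-SQL route looks like, here is a sketch that builds the Spark SQL statements for writing and reading table properties (ALTER TABLE ... SET TBLPROPERTIES and SHOW TBLPROPERTIES). The function names are hypothetical, the table name is assumed to be a trusted identifier, and string literals are escaped with backslashes, which assumes Spark's default parser setting for string literals.

```python
def quote_literal(value):
    """Quote a Spark SQL string literal, escaping backslashes and single quotes."""
    return "'{}'".format(value.replace('\\', '\\\\').replace("'", "\\'"))


def set_property_sql(table_name, key, value):
    """Build a statement that sets one table property (hypothetical helper)."""
    return 'ALTER TABLE {} SET TBLPROPERTIES ({} = {})'.format(
        table_name, quote_literal(key), quote_literal(value))


def show_property_sql(table_name, key=None):
    """Build a statement that reads one property, or all of them when key is None."""
    if key is None:
        return 'SHOW TBLPROPERTIES {}'.format(table_name)
    return 'SHOW TBLPROPERTIES {} ({})'.format(table_name, quote_literal(key))


print(set_property_sql('my_table', 'foo', 'bar'))
print(show_property_sql('my_table', 'foo'))
```

Every caller has to repeat this quoting and then parse the result rows, which is the boilerplate the catalog_ext methods are meant to hide.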
Note: properties are stored as strings. If you need other types, consider using a serialisation format such as JSON.
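A minimal sketch of the JSON approach using only the standard library: non-string values are serialised before being stored as properties and deserialised after being read back. The property names and values are hypothetical; a plain dict stands in for the metastore.

```python
import json

# Hypothetical values to attach to a table; only strings can be stored as properties.
sources = ['events', 'users']
row_count = 1024

# Serialise before writing...
stored = {
    'sources': json.dumps(sources),
    'row_count': json.dumps(row_count),
}

# ...and deserialise after reading.
restored_sources = json.loads(stored['sources'])
restored_count = json.loads(stored['row_count'])
```

With sparkly, the write side would be spark.catalog_ext.set_table_property('my_table', 'sources', json.dumps(sources)). On the read side, passing to_type=json.loads to get_table_property should work, assuming to_type accepts any one-argument callable as its `function` type suggests.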
API documentation
class sparkly.catalog.SparklyCatalog(spark)
A set of tools to interact with the Hive Metastore.
drop_table(table_name, checkfirst=True)
Drop a table from the metastore.
Note: follow the official documentation to understand the DROP TABLE semantics: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropTable
Parameters:
- table_name (str) – A table name.
- checkfirst (bool) – Only issue DROPs for tables that are present in the database.
get_table_properties(table_name)
Get table properties from the metastore.
Parameters: table_name (str) – A table name.
Returns: Key/value pairs of the table properties.
Return type: dict[str, str]
get_table_property(table_name, property_name, to_type=None)
Get a table property value from the metastore.
Parameters:
- table_name (str) – A table name. Might contain a db name, e.g. "my_table" or "default.my_table".
- property_name (str) – A property name to read the value for.
- to_type (function) – Cast the value to the given type, e.g. int or float.
Returns: Any
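To illustrate what to_type does, here is a sketch that reads one entry from a dict[str, str] (the shape get_table_properties returns) and optionally casts it. The helper `get_property` is hypothetical, and its handling of a missing property (returning None without calling to_type) is an assumption; check sparkly's source for the exact behaviour.

```python
def get_property(properties, property_name, to_type=None):
    """Read one property and optionally cast it (hypothetical helper).

    Assumption: a missing property yields None and is not passed to to_type.
    """
    value = properties.get(property_name)
    if value is None or to_type is None:
        return value
    return to_type(value)


# Property values are always strings in the metastore.
properties = {'foo': 'bar', 'retries': '3', 'ratio': '0.25'}

print(get_property(properties, 'retries', to_type=int))
print(get_property(properties, 'ratio', to_type=float))
```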
has_table(table_name, db_name=None)
Check if a table is available in the metastore.
Parameters:
- table_name (str) – A table name.
- db_name (str) – A database name.
Returns: bool
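has_table takes the database as a separate argument, while get_table_property accepts a db-qualified name such as "default.my_table". The sketch below normalises both forms into a (db, table) pair. The helper `split_table_name` is hypothetical, and its rules are assumptions rather than documented sparkly behaviour: at most one dot, None meaning the current database, and conflicting database names rejected.

```python
def split_table_name(table_name, db_name=None):
    """Split an optionally qualified name into (db, table) (hypothetical helper)."""
    if '.' in table_name:
        qualified_db, table = table_name.split('.', 1)
        # Reject an explicit db_name that contradicts the qualified name.
        if db_name is not None and db_name != qualified_db:
            raise ValueError('conflicting database names: {} vs {}'.format(db_name, qualified_db))
        return qualified_db, table
    # No qualifier: db_name may be None, i.e. the current database.
    return db_name, table_name
```

For example, db, table = split_table_name('default.my_table') could be followed by spark.catalog_ext.has_table(table, db_name=db).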
rename_table(old_table_name, new_table_name)
Rename a table in the metastore.
Note: follow the official documentation to understand the ALTER TABLE semantics: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenameTable
Parameters:
- old_table_name (str) – The current table name.
- new_table_name (str) – The new table name.