This came up in the context of using Hive: how should I handle mapping “original” to “safe” field names?
Currently, a non-alphanumeric field name in an Avro schema causes an error when the schema is used with Hive. That’s fine, but while researching it, I saw that field naming is a generally unresolved issue.
Avro implementations almost always process field names as UTF-8. That keeps them on par with JSON, which is nice.
But there was talk of possibly restricting names to alphanumerics (perhaps plus underscore). Apologies, I don’t have the bug numbers.
Looks like we have aliases:
https://issues.apache.org/jira/browse/AVRO-600
Do I just pop the “original” field name in as an alias and use the “safe” (alphanumeric+underscore) one as the primary name?
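Here is a minimal sketch of what I mean, in Python using only the standard library. The helper names (`safe_name`, `make_field`), the record name `Example`, and the sample field names are all made up for illustration. It also assumes the Avro implementation in use accepts a non-alphanumeric string as a field alias, which I have not verified.

```python
import json
import re


def safe_name(original):
    """Return an alphanumeric+underscore version of a field name (hypothetical helper)."""
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", original)
    # Prefix an underscore when the result is empty or starts with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def make_field(original, avro_type):
    """Build a field dict: safe primary name, original name kept as an alias."""
    field = {"name": safe_name(original), "type": avro_type}
    # Only add an alias when the name actually had to change.
    if field["name"] != original:
        field["aliases"] = [original]
    return field


# Illustrative schema; "user-id" is an assumed example of an unsafe name.
schema = {
    "type": "record",
    "name": "Example",
    "fields": [make_field("user-id", "string"), make_field("count", "long")],
}

print(json.dumps(schema, indent=2))
```

One caveat with this approach: two different original names (say `a-b` and `a.b`) map to the same safe name, so a real version would need some collision handling.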
-Charles