Notes on Using DataX, an ETL Data Extraction and Transformation Tool
Published: 2019-06-29


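DataX is a tool for high-speed data exchange between heterogeneous databases and file systems. It supports moving data between arbitrary data processing systems (RDBMS / HDFS / local filesystem).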

DataX product description

Reading data from Oracle and printing it to the console

// Launch script
#!/bin/bash
source ~/.bashrc
python /home/hadoop/ceshi/datax/bin/datax.py /home/hadoop/test/jobJson/test2.json

// Job JSON
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "****",
                        "password": "****",
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select callingtel,calledtel from trecord where calledtel <= 100 group by callingtel,calledtel"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@192.168.140.30:1521:TEST"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "visible": true,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
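When several jobs differ only in the query or the connection, it can be convenient to generate the job file instead of editing it by hand. The following is a minimal sketch, not part of the original setup: the helper name build_stream_job is hypothetical, and the structure simply mirrors the oraclereader-to-streamwriter job shown above.

```python
import json


def build_stream_job(jdbc_url, username, password, query_sql, channel=5):
    """Build a job dict with the same shape as the job above.

    build_stream_job is a hypothetical helper for illustration only.
    """
    return {
        "job": {
            "setting": {"speed": {"channel": channel}},
            "content": [
                {
                    "reader": {
                        "name": "oraclereader",
                        "parameter": {
                            "username": username,
                            "password": password,
                            "connection": [
                                {"querySql": [query_sql], "jdbcUrl": [jdbc_url]}
                            ],
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"visible": True, "encoding": "UTF-8"},
                    },
                }
            ],
        }
    }


job = build_stream_job(
    jdbc_url="jdbc:oracle:thin:@192.168.140.30:1521:TEST",
    username="****",
    password="****",
    query_sql=(
        "select callingtel,calledtel from trecord "
        "where calledtel <= 100 group by callingtel,calledtel"
    ),
)

# Serialize to JSON text; this string can be saved as the job file.
job_text = json.dumps(job, indent=4)
```

The resulting job_text can be written to a file such as test2.json and passed to datax.py exactly as in the launch script.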

Exporting data from Oracle to a CSV file (for importing into Neo4j)

// Job JSON; the launch command is similar to the one above [note the different channel setting?]
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "connection": [
                            {
                                "querySql": [
                                    "SELECT CALLINGTEL AS START_ID, (CASE DATATYPE WHEN 0 THEN 'voice' WHEN 3 THEN 'sms' ELSE '' END) calltype, (BEGINTIME - TO_DATE('1970-01-01', 'yyyy-mm-dd')) * 24 * 60 * 60 * 1000 AS BeginTime, ((BEGINTIME - TO_DATE('1970-01-01', 'yyyy-mm-dd')) * 24 * 60 * 60 * 1000) + (SPAN * 1000) AS EndTime, SPAN AS Span, CALLEDTEL AS END_ID, (CASE DATATYPE WHEN 0 THEN 'voice' WHEN 3 THEN 'sms' ELSE '' END) TYPE FROM TRECORD WHERE CALLINGTEL != CALLEDTEL AND CALLINGTEL IS NOT NULL AND CALLEDTEL IS NOT NULL"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@10.1.140.30:1521:TEST"
                                ]
                            }
                        ],
                        "password": "test",
                        "username": "test"
                    }
                },
                "writer": {
                    "name": "txtfilewriter",
                    "parameter": {
                        "path": "/home/hadoop/test/data/",
                        "fileName": "rel",
                        "fileType": "csv",
                        "fieldDelimiter": ",",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "10"
            }
        }
    }
}
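The query above turns BEGINTIME into milliseconds since 1970-01-01: subtracting two Oracle DATE values yields a number of days, which is then multiplied by 24 * 60 * 60 * 1000, and EndTime adds SPAN (in seconds) times 1000. The sketch below reproduces that arithmetic in Python so the exported values can be spot-checked. The function names are hypothetical, and it assumes, as the SQL does, that the stored timestamps carry no time zone.

```python
from datetime import datetime

# Same reference point as TO_DATE('1970-01-01', 'yyyy-mm-dd') in the query.
EPOCH = datetime(1970, 1, 1)


def to_epoch_millis(begin_time):
    """Milliseconds between EPOCH and begin_time (whole seconds only)."""
    delta = begin_time - EPOCH
    return (delta.days * 24 * 60 * 60 + delta.seconds) * 1000


def end_millis(begin_time, span_seconds):
    """Mirror of the EndTime column: BeginTime plus SPAN * 1000."""
    return to_epoch_millis(begin_time) + span_seconds * 1000


one_day = to_epoch_millis(datetime(1970, 1, 2))            # 86400000
short_call = end_millis(datetime(1970, 1, 1, 0, 0, 1), 2)  # 3000
```

Because no time zone is applied anywhere, the result only equals a true Unix timestamp if BEGINTIME is stored in UTC; otherwise it is offset by the database's local time difference.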

Exporting data from Oracle to HDFS

// In querySql mode, the reader's parameter "column" does not need to be specified
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "connection": [
                            {
                                "querySql": [
                                    "select callingtel,calledtel from trecord where to_char(rectime,'yyyy-mm-dd')=to_char(sysdate - 1,'yyyy-mm-dd') group by callingtel,calledtel"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@192.168.140.30:1521:TEST"
                                ]
                            }
                        ],
                        "password": "****",
                        "username": "****"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "callingtel",
                                "type": "INT"
                            },
                            {
                                "name": "calledtel",
                                "type": "INT"
                            }
                        ],
                        "compress": "",
                        "defaultFS": "hdfs://192.168.140.11:9000",
                        "fieldDelimiter": " ",
                        "fileName": "trecord",
                        "fileType": "text",
                        "path": "/user/test/data/",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "2"
            }
        }
    }
}
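The querySql above selects yesterday's rows by comparing to_char(rectime, ...) with to_char(sysdate - 1, ...). Wrapping the column in a function usually keeps a plain index on rectime from being used, so one possible alternative is a half-open date range. The sketch below builds such a query string. It is an assumption-laden variant, not the original job: yesterday_query is a hypothetical helper, and the date comes from the caller rather than from the database's sysdate, so the two only agree when both clocks are on the same calendar day.

```python
from datetime import date, timedelta


def yesterday_query(today):
    """Build a querySql string covering [yesterday 00:00, today 00:00).

    Hypothetical helper; uses Oracle's ANSI date literal syntax (date 'YYYY-MM-DD').
    """
    start = today - timedelta(days=1)
    return (
        "select callingtel,calledtel from trecord "
        f"where rectime >= date '{start.isoformat()}' "
        f"and rectime < date '{today.isoformat()}' "
        "group by callingtel,calledtel"
    )


# Fixed date for a reproducible example; pass date.today() in a daily job.
sql = yesterday_query(date(2019, 6, 29))
```

The generated string would go into the querySql array of the reader in place of the to_char version.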

Reposted from: https://my.oschina.net/sunyouling/blog/1528394
