How can I improve LOAD DATA performance on large InnoDB tables?
I have a table with more than 7 million rows, and I am loading more data into it with LOAD DATA LOCAL INFILE, on the order of 0.5 million rows at a time. The first few loads were fast, but each addition is taking increasingly long, probably due to indexing overhead:
CREATE TABLE `orthograph_ests` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`digest` char(32) NOT NULL,
`taxid` int(10) unsigned NOT NULL,
`date` int(10) unsigned DEFAULT NULL,
`header` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `digest` (`digest`),
KEY `taxid` (`taxid`),
KEY `header` (`header`)
) ENGINE=InnoDB AUTO_INCREMENT=12134266 DEFAULT CHARSET=latin1
I am developing an application that will run on pre-existing databases. I most likely have no control over server variables unless I make changes to them mandatory (which I would prefer not to), so I'm afraid suggestions like these are of limited use.
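If session-scoped settings count as acceptable (they require no server reconfiguration), something like the following sketch would apply; note that unique_checks = 0 defers duplicate detection on secondary unique indexes, which conflicts with my need to catch digest collisions:

```sql
-- Session-scoped settings: no server reconfiguration required.
-- Caution: unique_checks = 0 skips duplicate detection on secondary
-- unique indexes, so it is not safe if digest collisions must be caught.
SET SESSION unique_checks = 0;
SET SESSION foreign_key_checks = 0;

-- ... run the LOAD DATA LOCAL INFILE batches here ...

SET SESSION unique_checks = 1;
SET SESSION foreign_key_checks = 1;
```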
I have read that minimizing the keys on this table will help. However, I need those keys for later queries. I'm guessing that dropping and re-creating them would take a long time as well, but I have not tested this. I have also read that the UNIQUE constraint in particular slows insertion. The digest column will hold SHA256 digests that must be unique, and I can't guarantee there is no collision (very unlikely, I know, but possible).
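To test the drop-and-recreate idea, the non-unique secondary keys could be removed before the bulk loads and rebuilt once at the end (the UNIQUE digest key would stay in place so collisions are still caught); a sketch:

```sql
-- Drop the non-unique secondary indexes before the bulk load...
ALTER TABLE orthograph_ests DROP KEY taxid, DROP KEY header;

-- ... run the LOAD DATA LOCAL INFILE batches ...

-- ...then rebuild them in one pass afterwards.
ALTER TABLE orthograph_ests ADD KEY taxid (taxid), ADD KEY header (header);
```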
Would partitioning help, as suggested here? Could I improve the indexing, e.g., by limiting the key length on the digest column? Should I change to MyISAM, which supports DISABLE KEYS during transactions? What else could I do to improve LOAD DATA performance?
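On the key-length idea: a shorter prefix index on digest is possible, though shortening the UNIQUE key itself would change its semantics (it would then enforce uniqueness of the prefix, rejecting rows that merely share a prefix). A sketch of a non-unique prefix index; the prefix length 8 is an arbitrary assumption, not a tuned value:

```sql
-- A non-unique prefix index keeps digest lookups reasonably fast with a
-- smaller index; it does NOT enforce uniqueness, so this only makes
-- sense if the full UNIQUE key is handled some other way.
ALTER TABLE orthograph_ests ADD KEY digest_prefix (digest(8));
```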
After the large insertion, this table is used for SELECTs only, no more writes. This large loading is mostly a once-and-done operation; however, about 1,000 datasets (of 0.5M rows each) need to be uploaded before it is finished.
I will be using the digest to look up rows, which is why I indexed that column. If there should be a collision, that individual row should not be uploaded.
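For the collision rule (skip the offending row rather than abort the whole load), LOAD DATA's IGNORE modifier seems to fit; the file name and column list here are placeholders:

```sql
-- IGNORE converts duplicate-key errors into warnings and skips the row,
-- so a colliding digest simply does not get inserted.
LOAD DATA LOCAL INFILE '/path/to/batch.tsv'
IGNORE INTO TABLE orthograph_ests
FIELDS TERMINATED BY '\t'
(digest, taxid, date, header);
```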
Storing the sequence blob in an external file system is probably not a viable option, since I cannot easily impose file system changes on the users.